Tuesday, July 26, 2005

JavaScript and RDF - (almost) perfect together

JavaScript and RDF. a match made in heaven. or perhaps on earth, rather. what do i mean by that? well, let me explain.

the match between JavaScript and RDF, not being forged in heaven, could never be perfect. it is a fine match, nonetheless. and we gain much if we remember that there is no perfection down here on earth. many of us share the continual experience that the more data we accumulate, and the more perspectives we acquire, the less crisp and clean the lines of our theories appear to be. the boundaries drawn by our theories are constantly being scratched out and redrawn as we learn more, and for some of us they look more like blurry smudges than sharp lines. fine, you say, but what does any of this have to do with JavaScript and RDF? what does an age-old antagonism between Platonic idealism and Epicurean empiricism have to do with RDF and JavaScript?

today we live in a world with ever more digital data from an ever increasing number of sources. and a world in which all of this data is ever more connected via the web. information technology, no longer controlled by an ordained elite with the power to dictate by whom, how, and wherefore information is created, processed, and distributed, is now largely in the hands of "the people", who are using the means at their disposal to create massive amounts of data with an unprecedented level of freedom and ease, driving unprecedented levels of creativity and innovation, as well as noise. several important open standards for how this data is represented and distributed have been critical in enabling this tidal wave of information to set forth - TCP/IP, HTTP, and HTML being chief among them. the philosophy of "open source" computer code has been important, as well.

okay, we know all this, i hear you saying. get to the point, you say. we're gettin there ...

by and large the data in this tidal wave is unstructured. HTML being in large part a standard for marking up unstructured text, this makes sense. while Google does an admirable job of helping you harvest this sea of unstructured data, it can't help you with all that structured data out there, much of which is locked up in relational databases behind firewalls, only presented to the outside world in chopped up, regurgitated, mixed-with-HTML form. what's missing is a standard for structured data that will scale to the broad, decentralized, and open nature of the web. old models of data that worked well within isolated, well-controlled domains will not scale to meet the requirements of a massive, global web of data.

but i misspoke. we do have such a model of data, and if you're interested enough to have read this far you probably know what i'm about to say: RDF. in RDF, resources are named with identifiers, called URIs, which are global in scope. more importantly, RDF's structural properties give it the flexibility to accommodate all of the world's structured data in one big structured database - the fabled "Semantic Web" - that could be queried with a language as powerful as SQL is for relational databases. don't underestimate the gravity and presumption of this statement. all of the data now locked up in relational database silos, and in non-relational ones, with the great multitude of world views, concepts, and prejudices that the schemas underlying those databases embody, could be united into one giant database. and then, at any time, anything, anywhere, could be related to anything anywhere else in the world, in any way, by merely creating a labeled pointer, and then a query involving the relationship between these two things could be executed. the phrases "at any time" and "in any way" are key here. in RDF the relationships are dynamic, rather than being predefined by a schema as they are in the relational world.
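to make the "labeled pointer" idea a bit more concrete, here is a minimal sketch in JavaScript. it is not a real RDF library, just an array of subject/predicate/object statements with a tiny pattern matcher. every URI in it, and the names `relate` and `match`, are made up for illustration.

```javascript
// a toy statement store: each entry is one labeled pointer
// (subject --predicate--> object). not a real RDF library.
const triples = [];

// all URIs below are hypothetical, chosen only for this example
const ALICE = "http://example.org/people/alice";
const ACME = "http://example.org/orgs/acme";
const BOSTON = "http://example.org/places/boston";
const WORKS_FOR = "http://example.org/terms/worksFor";
const LOCATED_IN = "http://example.org/terms/locatedIn";

// relate any thing to any other thing, at any time, with no schema up front
function relate(subject, predicate, object) {
  triples.push({ subject, predicate, object });
}

// return the statements matching a pattern; a missing (or null) field is a wildcard
function match(pattern) {
  return triples.filter(t =>
    ["subject", "predicate", "object"].every(
      key => pattern[key] == null || pattern[key] === t[key]
    )
  );
}

relate(ALICE, WORKS_FOR, ACME);
// later, someone else entirely adds a new kind of relationship
relate(ACME, LOCATED_IN, BOSTON);

// "who does alice work for?"
const employers = match({ subject: ALICE, predicate: WORKS_FOR });
console.log(employers.map(t => t.object));
```

the point of the sketch is only that a new kind of relationship (`LOCATED_IN` here) shows up as one more statement, not as a schema migration. a real query language over RDF does far more than this matcher.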

"wow - data integration nirvana!", some who have worked in enterprise data integration might say. but then they would scratch their heads and say, "it's not so simple as that". there are all kinds of issues surrounding how data from different sources was modeled, the meanings of the different fields and tables and such, formatting issues, and all that dirty data out there. but this would only underscore RDF's unique potential as a model of structured data for the web. these sorts of problems have perennially plagued those working in the trenches of enterprise data integration efforts. many of these problems are in large part due to the fact that there is no perfect schema; the corporate data model is a myth; or as Clay Shirky would say: "ontology is overrated". and rather than going away, these problems are only magnified exponentially when you scale out to the web. the genius of RDF is that it doesn't see resolving all of these "ontological" issues as a prerequisite for integration (that is, unless you're in the ontology-oriented RDF camp, in which case you see the use of ontologies modeled in languages like OWL as a key component of the semantic web. i actually believe that the dissonance in the discourse about RDF and the semantic web, between discussions of its fundamental flexibility on the one hand and very esoteric discussions about ontologies on the other, is largely responsible for the confusion surrounding it, and for how slow RDF has been on the uptake). we can unify and connect all of the world's structured data even though it's all quite messy, complicated, and multi-faceted. and even as there is ever more data produced, and the lines we draw in the data are continually erased and redrawn, RDF accommodates all of this roiling diversity, change, instability, and uncertainty quite well.

ok, rather than trying to drive the point home any further, i'm going to assume that you're with me on the notion that RDF, with its inherent flexibility, is an ideal data platform for the web. that you get how, rather than requiring the kind of Platonic purity of forms that the relational paradigm implies, it allows for a more organic florescence of structured data. and i'll take it for granted that you think this is a good thing, a worthy thing. so what of JavaScript? it's just some scripting language used to spice up HTML and make web pages more flashy, right? HA! that's what they used to say about Java in the early days, before folks started realizing its potential ...

the seed of my sense of the affinity between RDF and JavaScript was planted when i was working on an RDF project at my last company. one of my colleagues jokingly labelled my goal of spreading RDF as "hashmaps everywhere". i laughed at the truth embedded in that joke, but i wasn't fully aware of how true it was. for those of you who don't know, a hashmap is a widely used implementation of the Map interface in the Java programming language. maps are otherwise known as "associative arrays", "hashes", or "dictionaries" in other languages. in a very real way, the RDF model of data could be described as interlinked associative arrays. this simplification and reduction to something akin to an essence of RDF was in the back of my mind months later, when i was working on an AJAX application, using JSON as a data interchange format. prior to this, i had never looked too deeply into JavaScript, but the similarities between RDF and JSON were apparent. both are very general, minimalist means of representing data, with simplicity being a primary virtue. both can be modeled very simply as sets of connected associative arrays, with the distinction that JSON is more suitable for representing tree-like sets of data than a global graph of data. in essence, JSON - which is essentially a serialization of JavaScript's object model - is very suitable for representing localized subsets of the uber-graph of data - "the semantic web" - represented in RDF. in fact, in JavaScript an object is an associative array; therefore the properties of any object are completely dynamic.
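here is a rough sketch of what "interlinked associative arrays" might look like in plain JavaScript, and of how a tree-shaped, JSON-friendly chunk could be cut out of a graph that has cycles in it. the URIs, the property names, the `subtree` helper, and the convention that an array value holds links to other resources are all assumptions made up for this example.

```javascript
// a graph as interlinked associative arrays, keyed by (hypothetical) URI.
// assumption for this sketch: an array value holds links to other resources.
const graph = {
  "http://example.org/people/alice": {
    name: "Alice",
    knows: ["http://example.org/people/bob"]
  },
  "http://example.org/people/bob": {
    name: "Bob",
    // points back at alice: a cycle, fine in a graph but not in a JSON tree
    knows: ["http://example.org/people/alice"]
  }
};

// properties are dynamic: add a new one at any time, no schema to change
graph["http://example.org/people/bob"].homepage = "http://example.org/~bob";

// cut a tree out of the graph by following links down to a fixed depth;
// past that depth (or for an unknown URI) only the bare URI is kept
function subtree(uri, depth) {
  const node = graph[uri];
  const out = { uri };
  if (!node || depth === 0) return out;
  for (const [key, value] of Object.entries(node)) {
    out[key] = Array.isArray(value)
      ? value.map(linked => subtree(linked, depth - 1))
      : value;
  }
  return out;
}

// a localized, serializable view of the graph centered on alice
const json = JSON.stringify(subtree("http://example.org/people/alice", 1));
console.log(json);
```

the depth cutoff is what turns the graph into a tree: serializing the cycle between alice and bob directly would never terminate, so the view stops at a bare URI that a client could follow later if it cared to.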

JavaScript is a prototype-based programming language. in traditional object-oriented programming languages, you need to define a class model, sometimes called an object model, for your data. class models, like RDBMS schemas, are essentially ontologies, and define a narrow, prescriptive container for your data. anything that doesn't fit within the model isn't allowed. the assumption in early waterfall models of software development is that you create the perfect model for your data upfront, and then design your programs around that assumption of perfection.

of course, the class model is rarely perfect and often changes. iterative development styles and refactoring techniques arose to address this reality. more recently, reflection-based techniques and dynamic byte-code manipulation are the rage, allowing for programs that are more robust and flexible in the face of variability in class structures. but these techniques are rather cumbersome to use, and seem like a big ugly patch on a language that is fundamentally statically typed. prototype-based languages, on the other hand, start out with the assumption that you cannot predefine a perfect class model. there are no classes of data, only instances. some of those instances may serve as prototypes for other instances, but by and large the language is much more empirically oriented than formally oriented.
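a small sketch of the "no classes, only instances" idea. the `book` and `paperback` objects and their properties are invented for the example, and `Object.create` is just one of several ways to set up a prototype link in JavaScript.

```javascript
// no class definition: start from a concrete instance
const book = {
  title: "untitled",
  describe() {
    return this.title + " (" + (this.format || "unknown format") + ")";
  }
};

// a second instance that uses `book` as its prototype
const paperback = Object.create(book);
paperback.title = "Moby-Dick";
paperback.format = "paperback"; // a property the prototype never declared

// the prototype itself can grow later, and existing instances see the change
book.isbn = function () {
  return this.isbnNumber || "none";
};

console.log(paperback.describe());
console.log(paperback.isbn());
```

nothing here was declared up front: `format` exists on one instance only, and `isbn` was bolted onto the prototype after `paperback` had already been created.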

and so, with JavaScript, you have for your application tier what you have with RDF for your data tier: a programming model that is built to accommodate a world of data and function most of which does not fit nicely into clean Platonic shapes, that is more interested in accommodating whatever you throw at it than in being a tool for designing the perfect glove. a match made in heaven. oops, i mean on earth.

i think it is no mere coincidence that RDF and JavaScript are both relatively young technologies, both having arisen after the rise of the web. they are both a product of the times, in which change is increasingly rapid, time increasingly scarce, data increasingly abundant and interconnected, and knowledge, or understanding of the data, decreasingly perfect. now i realize that JavaScript has heretofore been relegated largely to cosmetic client-side web page enhancements, and has made virtually no inroads into the server side where most of the meat of applications today is considered to reside (Netscape's failed LiveWire technology notwithstanding). but there are new projects that are reviving the concept of JavaScript on the server side, and with the emergence of the AJAX web programming model we should be seeing more intelligence moving to the client side.

so what is my vision of RDF, JavaScript, and the web of the future? well, i'm not quite sure, but it involves web apps with lots of JavaScript manipulating RDF that is shuffled around in JSON format. and, somehow, the art of programming starts to look more like jazz. but more on that in a future post ...