These are safe bets, but they fail to capture the Web's disruptive trajectory. The real transformation under way is more akin to what Sun's John Gage had in mind in 1988 when he famously said, "The network is the computer." He was talking about the company's vision of the thin-client desktop, but his phrase neatly sums up the destiny of the Web: As the OS for a megacomputer that encompasses the Internet, all its services, all peripheral chips and affiliated devices from scanners to satellites, and the billions of human minds entangled in this global network. This gargantuan Machine already exists in a primitive form. In the coming decade, it will evolve into an integral extension not only of our senses and bodies but our minds.

Later he remarks:
By 2015, desktop operating systems will be largely irrelevant. The Web will be the only OS worth coding for.

This vision is similar to previous pipe dreams, like The Intergalactic Foundation (which I happened to be a big believer in during my college and fresh-out-of-college years), except it doesn't seem like such a pipe dream anymore. The web has taught us a great deal about what is necessary to make a truly "intergalactic" web platform work, and if we look at the evolution from pipe dream towards realistic vision, we see a trend towards increasing simplicity of the model. SOAP was a revelation because it looked like CORBA reincarnated on a more lightweight web substrate. In its first incarnation it did not require any specialized software other than a web server and an XML parser, which are much easier to come by, and simpler beasts, than CORBA ORBs. Unfortunately, SOAP seems to be following the path of CORBA in a spiral of increasing complexity towards irrelevance. For that reason, REST appears to be the architecture of choice for these emerging "web service" applications.
The common thread through all of these discussions about distributed computing platforms is the notion of APIs, and so the RPC (remote procedure call), in some form, remains to this day the central figure in the vision of Web 2.0. But what I think has been absent from these discussions is consideration of a DBMS for the web. For decades now, some sort of DBMS has served as the backbone for the vast majority of "data-driven" applications, which happen to comprise virtually 100% of corporate IT systems and "business apps". The reason is simple: a standard, consistent, elegant data management platform is not a trivial undertaking, and yet it is a requirement for all such applications. For most software developers, developing these applications would be unthinkable without a DBMS, usually an RDBMS.
Databases often serve as an integration point between several applications that share the same data (in fact, this was one of the primary motivations for the development of the first database management systems). Sometimes the quickest way to extend the functionality of an existing application that you've inherited is to go around the code, look at the database, and build a new app directly against it. This is frowned upon but fairly common, in my experience, often because the existing code either doesn't provide an API per se, or the API is deficient in some way (functionally or non-functionally). Still, the philosophy that one shouldn't access a database directly, and should go through APIs instead, persists, and this is still the way many systems are integrated. What are the reasons for this?
Well, one reason is that you want to protect your database from "corruption". There are often complex rules surrounding how records get updated that cannot be fully expressed through the "data integrity" machinery of the DBMS, and so some sort of API call (which might be a stored procedure in the RDBMS), backed by code that enforces these rules, is required. Furthermore, the space and shape of update operations is usually pretty well understood and to some degree fixed. The application designers can usually map out the majority of useful write operations and provide API calls, or end-user functionality, that accomplish them. Not so with reading the data. Application developers often find that users need to generate "reports" about the data that were not foreseen. There are myriad ways a user might want to filter, sort, count, or see relationships amongst the different data elements, and the chances of predicting ahead of time all of the ones users will want are slim. Thus the robust market for reporting and OLAP software that hits the database directly, as well as the trend of building data warehouses - large uber-databases with data culled and integrated from multiple systems across an enterprise, to which OLAP software is then applied.
Another reason for the persistence of this API-oriented thinking, I think, is that the notion of the importance of "encapsulation" is still ingrained in our collective software engineering unconscious. We were taught the importance of writing, and writing to, abstract interfaces in our software development, and to treat the implementations of these interfaces as "black boxes" that cannot, and should not, be seen into. It was thought that encapsulation could not only provide greater security, but also prevent users of software libraries from building dependencies on the parts of a library most likely to change (the implementations vs. the more stable interfaces), which would cause the client system to break. While this interface vs. implementation concept has a lot of merit when developing software frameworks, from a practical standpoint its value is negligible in the context of pure read access to data, particularly when the database software and database schema of a production application are the things least likely to change. Even when the schema does change, this usually requires a change to the interfaces representing the data anyway, since there is usually a straight mapping from database schema to these interfaces. The open-source era has also taught us a lot about the relative value of this black-box notion of software components. Contrary to our prior intuition, in a globally networked environment with constant, instant, and open communication, lots of eyes looking deep into software can increase its safety and reliability. Our ability to respond to changes in software components that break the apps we build on top of them is also enhanced.
A Case Study
Recently, I wrote a Greasemonkey script that reinforced my belief in the need for a web database service for Web 2.0 apps. While it was a fairly trivial script that I wrote simply to tinker around, it highlights some of the shortcomings of a purely API-centric approach to these new cross-web applications. Basically what the script does is replace the photos in the slideshows of city guides on the Yahoo travel site with Flickr photos that are tagged with that city's name and have been flagged by the Flickr system as "interesting".
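To make the setup concrete, here's a stripped-down sketch of the script's skeleton. The @include pattern, the page heuristics, and the fetchInterestingPhotos helper are placeholders of my own for illustration, not the actual script:

```javascript
// ==UserScript==
// @name        Flickr Interestingness Slideshow (sketch)
// @namespace   http://example.org/gm-sketches
// @include     http://travel.yahoo.com/*
// ==/UserScript==

// Hypothetical outline: pull the city name off the page, fetch "interesting"
// Flickr photos tagged with it, and swap them into the slideshow's images.
var city = document.title.split(',')[0];              // placeholder heuristic for the city name
var slides = document.getElementsByTagName('img');    // placeholder; the real script targets the slideshow markup

fetchInterestingPhotos(city, function (photoUrls) {
  for (var i = 0; i < slides.length && i < photoUrls.length; i++) {
    slides[i].src = photoUrls[i];
  }
});

// This helper is where the trouble described below starts.
function fetchInterestingPhotos(tag, callback) {
  // ... GM_xmlhttpRequest calls against Flickr go here (see the sketches that follow) ...
  callback([]);
}
```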
Well, the first problem is that the Flickr API does not give you a way to retrieve interesting photos. There is a search method that allows you to retrieve photos with the tags you specify, but "interestingness" is a special system attribute which is not modeled as a tag. In a situation like this, where the method hard-codes a limited set of ways in which you can query the data, you're pretty much up shit creek if you want to query the data in a way the developers didn't anticipate. You can ask the Flickr development team to provide it, and hope that they honor your request and implement it within a reasonable timeframe, but your deadline will likely have passed by then. Luckily for me, there's a screen I can scrape to grab the photos I need - an inelegant hack, but one that does the job.
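For comparison, here's roughly what the two options look like from Greasemonkey. The first uses the real flickr.photos.search REST method (with your own api_key); the scraping fallback hits the public "interesting" page for a tag, whose URL pattern and markup I'm treating as assumptions, since they aren't a published contract and can change without notice:

```javascript
// Option 1: the official API. Tags work, but there's no way to ask for "interesting".
var searchUrl = 'http://api.flickr.com/services/rest/' +
    '?method=flickr.photos.search&api_key=YOUR_API_KEY' +
    '&tags=' + encodeURIComponent(city) + '&per_page=20';

// Option 2: scrape the "interesting" page for the tag.
// The URL pattern and the regex over the HTML are assumptions, not a stable interface.
GM_xmlhttpRequest({
  method: 'GET',
  url: 'http://www.flickr.com/photos/tags/' + encodeURIComponent(city) + '/interesting/',
  onload: function (response) {
    var urls = [];
    var re = /src="(http:\/\/static\.flickr\.com\/[^"]+\.jpg)"/g;
    var match;
    while ((match = re.exec(response.responseText)) !== null) {
      urls.push(match[1]);
    }
    // urls now holds the photo addresses pulled out of the page markup
  }
});
```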
The second problem I had was that I wanted to filter out any photos tagged as "nude", not wanting to offend the users of my script with the sight of unwanted genitalia while they're exploring possible vacation destinations. There is no exclude-tag option for the search method, and no easy way to do this. I could, if I wanted to, put a loop in my program that repeatedly calls the search method (assuming the search method actually provided an option to specify "interesting" photos), and for each photo in the result page invokes the Flickr service again to find out all of that photo's tags, throwing it away if it has a "nude" tag, until I have the number of photos I need to fill the slideshow. Now, it's unlikely that the search method would need to be invoked more than twice, but I have to code for an indefinite number of iterations of this loop because I can't know for certain, at any time, for any given city, how many nude photos will be in the results. And two invocations of the search method is already more than I should have to make. Not only is this solution more work to implement, but it has very unfavorable performance characteristics and puts unnecessary load on the server. Instead of making one service call over the network, I have to make (N+1)*X calls, where N is the number of results in each page, and X is the number of pages that need to be processed to fill the slideshow. In this case, the requirement turned out not to be worth the effort and performance impact, so I let it go.
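To make the call count concrete, here's a sketch of what that loop would have looked like. It assumes, counterfactually, that the search method could be told to return only "interesting" photos, and it leans on a hypothetical flickrCall() helper (wrapping GM_xmlhttpRequest and response parsing) plus the real flickr.photos.getInfo method, which does return a photo's tags:

```javascript
// flickrCall(methodName, params, callback) is a hypothetical helper that invokes
// the named Flickr REST method and hands the parsed result to the callback.
function fillSlideshow(city, needed, done) {
  var kept = [];
  var page = 1;

  function fetchNextPage() {
    // One search call per page of results (X pages in total)...
    flickrCall('flickr.photos.search',
               { tags: city, interesting: 1, page: page },   // the "interesting" param is imaginary
               function (photos) {
      if (photos.length === 0) { done(kept); return; }
      var pending = photos.length;
      // ...plus N getInfo calls, one per photo, just to read its tags: (N+1)*X calls.
      photos.forEach(function (photo) {
        flickrCall('flickr.photos.getInfo', { photo_id: photo.id }, function (info) {
          if (kept.length < needed && info.tags.indexOf('nude') === -1) {
            kept.push(photo);
          }
          pending--;
          if (pending === 0) {
            // Can't know up front how many pages it takes to dodge the nudes.
            if (kept.length < needed) { page++; fetchNextPage(); } else { done(kept); }
          }
        });
      });
    });
  }
  fetchNextPage();
}
```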
The third problem I encountered was a consequence of the screen scraping approach I was forced to take. I wanted to display the title of each photo, just like the default Yahoo slideshow does. The search method of the Flickr API returns the title of each photo in the results, but the screen that shows a page of "interesting" photos with a given tag unfortunately does not. If I want to display the title of each photo in the slideshow, I have the same (N+1)*X problem I had with filtering out nude photos; I'd have to make a separate call to get the title for each photo in the page. This was not such an easy requirement to let go of, so I was forced to pay the performance penalty.
Now, this was a very small script with very limited functionality, but you can see the issues that crop up when you want to build a real-world web app using a purely API-based approach. A set of name-value pairs, which is essentially what the input to a method/REST endpoint is, cannot approximate the power of a full relational/pattern-matching calculus of the kind embodied in a typical database query language like SQL (the usual way around this is to allow one of the name-value pairs to represent a query that gets executed directly against the database, which is nothing more than proxying the DB query interface through the method call). It is also generally much more efficient to look at a diagram of a data model to figure out what query to run against a database than it is to read a functional API spec to figure out how to orchestrate a set of API calls to accomplish what one query could.
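The "query in a parameter" workaround looks something like this (the endpoint and schema are invented for illustration); once you need real query power, the name-value pairs collapse into a thin wrapper around the database's own language:

```javascript
// Fixed name-value pairs: any question the designers didn't anticipate is unaskable.
var fixedUrl = 'http://api.example.com/photos?tag=paris&sort=date&page=1';

// The escape hatch: one "parameter" carries a query in the database's own language,
// proxied straight through the method call. Endpoint and schema are hypothetical.
var query = 'SELECT url, title FROM photos ' +
            "WHERE tag = 'paris' AND interesting = 1 " +
            "AND id NOT IN (SELECT photo_id FROM photo_tags WHERE tag = 'nude')";
var proxiedUrl = 'http://api.example.com/query?q=' + encodeURIComponent(query);
```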
We need a WDBMS (Web Database Management System) or WDBS (Web Database Service)
I say, let's use APIs when appropriate (for most write access to data), and give access to DBMS query interfaces when appropriate (which is often the case for read access to rich data repositories). In REST we have a good architecture for web services/APIs, one that is proving itself in real and prominent (press-worthy, at least) apps. Where's our web database architecture, one that can complement REST in its simplicity and ability to scale to a global level? Well, as I've expounded in previous posts, I think RDF is it.
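To make that concrete, here's roughly what the whole case study collapses to against an RDF store exposing a query endpoint that speaks something like the emerging SPARQL language. The endpoint URL and the property names (flickr:tag, flickr:interesting, and so on) are invented for illustration; the point is that the tag match, the interestingness test, the nude-photo exclusion, and the titles all come back in one call:

```javascript
// One declarative query instead of (N+1)*X API calls. Vocabulary and endpoint are hypothetical.
var sparql =
  'PREFIX flickr: <http://example.org/flickr-vocab#> ' +
  'SELECT ?url ?title WHERE { ' +
  '  ?photo flickr:tag "paris" ; ' +
  '         flickr:interesting true ; ' +
  '         flickr:url ?url ; ' +
  '         flickr:title ?title . ' +
  '  OPTIONAL { ?photo flickr:tag ?t . FILTER(?t = "nude") } ' +
  '  FILTER(!bound(?t)) ' +
  '}';

GM_xmlhttpRequest({
  method: 'POST',
  url: 'http://data.example.org/sparql',    // hypothetical query endpoint
  headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
  data: 'query=' + encodeURIComponent(sparql),
  onload: function (response) {
    // one round trip; the response carries every photo's url and title, nudes excluded
  }
});
```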
Another point to consider is that as these mash-ups get more sophisticated, they will no longer be pure mash-ups. Instead of merely exploiting existing relationships between data on different web sites, they will allow for the creation and storage of new relationships amongst data that is globally distributed across the web. These applications will need write access to their own databases, built on DBMSs designed for the web.
Being designed for the web means, first of all, that these databases should be available as online services accessible over the web. There should be a consistent serialization defined from an arbitrary dataset to an "on-the-wire" transport format in the lingua franca of the web, XML, which RDF provides, or alternatively into another web format that is simpler and better: JSON. (This requirement could naively be satisfied by storing your data as XML with some sort of XML database technology, but XML has many problems as a data model, not the least of which is that it violates the KISS principle.) Physically, these databases should look like the web, with a similar topology and the ability to be massively distributed and decentralized, with distributed query mechanisms that can work in a peer-to-peer fashion. As the data substrate underpinning the sophisticated mash-ups of the future, I see them filling in what might be viewed as the currently "negative space" of the web, the gaps between web sites. I can see these kinds of database services really coming into their own serving as data hubs between multiple sites.
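As an illustration of the kind of on-the-wire form I mean, here's one naive JSON-flavored rendering of a handful of triples (the property URIs are again invented); the same graph could just as well be serialized as RDF/XML without the consumer caring which:

```javascript
// One possible JSON shape for a tiny graph of triples: subject, predicate, object.
// The URIs are made up; the point is that an arbitrary dataset flattens into a
// dumb, transport-friendly structure that any client on the web can consume.
var graph = [
  { s: 'http://flickr.com/photos/12345',
    p: 'http://example.org/vocab#tag',
    o: 'paris' },
  { s: 'http://flickr.com/photos/12345',
    p: 'http://example.org/vocab#title',
    o: 'Eiffel Tower at dusk' },
  { s: 'http://flickr.com/photos/12345',
    p: 'http://example.org/vocab#interesting',
    o: true }
];
```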
As an experiment, I will be putting a first stab at such a WDBS online in the near future: a web app that I'm putting together using Kowari's RDF database engine. It will be available for free use by mash-up experimentalists who just have a Mozilla browser with Greasemonkey at their disposal and need some place online to store their data. More news on that coming up ...