IBM Mashup Summit

    May 8, 2007

I’m at the IBM Mashup Summit in San Francisco today.  As we are within the bowels of a large enterprise, there is a process to follow to get wifi access, so I’m offline. 

We’re here to talk about mashups within the enterprise.  With all the innovation on the web around mashups and widgets, real work needs to be done on standards, identity, process and security to bring them into the enterprise.  We aren’t just talking about the technical work to make things interoperate, but how to market it before security FUD curbs innovation.

Dion Hinchliffe phoned in an introductory talk about what he is seeing in the space (I’ll try to find notes to link to, and update this post later).

Then Pete Kaminski (Socialtext CTO and co-founder) and I gave a short talk about the things we have seen and done, and the questions we have.  Unfortunately, open data is being seen as enough in this space: be like Flickr, just put up an API, let people innovate, and that’s good enough.  But it isn’t good enough for enterprises, which is an opportunity for us to work on standards that may conversely enhance the consumer web.  For an enterprise to build upon a service, it needs to know whether its effort is at risk, because the API may change, especially when services are built upon services upon services.  Enterprises also pay particular mind to switching costs and lock-in risk, and standards and open source provide ways of reducing cost and managing risk.

Mostly we talked about Amo, which means "to carry" in Hawaiian and is a REST API for wikis we hope becomes an ad-hoc standard.  It incorporates the Atom Publishing Protocol, thanks to some good work Chris Dent did over a weekend.  Unfortunately, the folks really working on this stuff are up at our hackathon in Vancouver this week, but it gave us a chance to share what we have and haven’t heard from enterprises over the last four years.  Customers have strong needs to integrate with directory systems for single sign-on, and despite our efforts to make auth pluggable, the lack of standardization in this area is a problem not just for deploying a wiki; it signals the complexity, and perhaps the greatest risk, in enterprise mashups: identity.  When a mashup platform spans multiple services and multiple logins, where and how credentials are stored becomes a combinatorial problem that puts security and system cost in conflict with usability.

We didn’t get customers coming to us asking for mashups in numbers, but we did get people asking for open access to their data and for offline editing.  We created RSS and Atom feeds for every page, tag, search query, watchlist, weblog and wiki.  In the absence of other clients, we used the Atom API for offline editing with Ecto, a blog editor.  More recently, we created SocialPoint for SharePoint portal integration using our SOAP API, and with the REST API we worked with Jeremy Ruston to create Socialtext Unplugged for offline wiki reading and editing.
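To make the Atom Publishing Protocol angle concrete, here is a minimal sketch of what an APP-style wiki edit looks like on the wire: build an Atom entry and PUT it to a page’s edit URI.  The endpoint URL and workspace path are invented for illustration, not Amo’s actual layout.

```python
# Sketch of an Atom Publishing Protocol edit against a hypothetical wiki
# endpoint; the URL and path below are made up for illustration.
import urllib.request
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"

def atom_entry(title, body):
    """Build a minimal Atom entry document for an APP PUT/POST."""
    ET.register_namespace("", ATOM_NS)
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM_NS}}}content", type="text")
    content.text = body
    return ET.tostring(entry, encoding="unicode")

def build_edit_request(page_url, title, body):
    """Prepare a PUT of a revised entry back to the page's edit URI."""
    data = atom_entry(title, body).encode("utf-8")
    return urllib.request.Request(
        page_url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/atom+xml;type=entry"},
    )

req = build_edit_request(
    "https://wiki.example.com/workspace/pages/HomePage",  # hypothetical
    "HomePage",
    "Updated from an offline client.",
)
print(req.get_method(), req.full_url)
```

An offline client like Socialtext Unplugged would queue requests of this shape and replay them when connectivity returns, which is also where the revision-conflict question below comes in.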

Pete made an interesting point about the currency and quality of data that reminded me of a post by Allen Morgan I read yesterday.  Pete pointed out how the rise of the convenient cell phone has changed user expectations for call quality on landlines themselves.  Allen is exploring similar trends in audio and video: fidelity declines with the rise of convenience.  Pete gave the example of how a user of Socialtext Unplugged can board an airplane to Hong Kong with a reasonable expectation that they will be working with less current information the further they travel.  What expectations and education will users have when mashing up different data from multiple processes?  This is an important question because it also informs how expensive these systems should be to build and operate.  Rod Smith suggested there should be a "freshness dial."

I emphasized that there are some areas you don’t want to automate, such as merging revision conflicts, because people are better than algorithms at many things, and suggested other service providers borrow elements of wiki design, like revision history.

I shared our experience with open source application licensing.  From the conversation, I think people understood the need for a different license for open source web applications compared to infrastructure.  But it also was clear to me that I’ve not communicated our current status, as someone in the know asked if we were "Open Source."  Nobody owns the term, and anyone can bend it their own way, but there is a significant role for the OSI in accrediting projects as OSI Certified.  Socialtext is almost six months into the process of getting its license OSI Certified; we don’t claim we are yet, but we do rightly say we are a commercial open source provider.  We are about to submit a third revision of our license, so I’ll write more later, but if the process concludes in the negative, we will choose a different OSI license.  Not because it will suit our needs, in fact it decidedly will not, but because of the role we want to play in the community.  We’ll see what the other 15 MPL+Attribution projects do.  But attribution is an important issue for mashups, and people here seemed to be in favor of it.

Stephan from Kapow Technologies sees the stack as mashup builders, like QEDwiki, Teqlo and Excel, and mashup enablers, like Kapow and RSSBus.  Because we don’t have the UDDI and WSDL of the web services world, we need service discovery through a central service repository and builder-specific repositories.  How do I find the data I need and get it into the format I need?  Within the enterprise, users want to get to data without involving IT.  An example of this is IBM’s Mashup Hub, and while more service descriptors are needed, people just want to grab two values off of different sites (using Kapow’s web scraping) and put them together in Excel or SocialCalc.  Services need to communicate through WS-* (he assumes SOAP is what legacy systems speak; someone pointed out that at the MySQL conference nobody knew about SOAP, and he countered that people in Europe don’t know REST), REST, RSS/Atom feeds, the Atom Publishing Protocol and APIs, with data access over HTTP and HTTPS.  His suggested solution: define microformats to describe each type of service, define a simple way to inform builders of the existence of services, and define a simple way for enablers to request service information from central repositories.
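The microformat idea above amounts to publishing service descriptions as ordinary HTML with an agreed class vocabulary, which any builder or enabler can scrape.  Here is a rough sketch of how that might work; the class names (service, url, format) and the example URL are invented here, not an actual spec.

```python
# Sketch of scraping a hypothetical service-description microformat.
# The class vocabulary below is made up for illustration.
from html.parser import HTMLParser

SNIPPET = """
<div class="service">
  <a class="url" href="https://weather.example.com/feed">Weather feed</a>
  <span class="format">application/atom+xml</span>
</div>
"""

class ServiceScraper(HTMLParser):
    """Collect {url, format} records from 'service'-classed blocks."""
    def __init__(self):
        super().__init__()
        self.services = []
        self._current = None   # record being built, if inside a service div
        self._field = None     # name of the text field we expect next

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if "service" in classes:
            self._current = {}
        elif self._current is not None and "url" in classes:
            self._current["url"] = dict(attrs).get("href")
        elif self._current is not None and "format" in classes:
            self._field = "format"

    def handle_data(self, data):
        if self._field and self._current is not None:
            self._current[self._field] = data.strip()
            self._field = None

    def handle_endtag(self, tag):
        if tag == "div" and self._current is not None:
            self.services.append(self._current)
            self._current = None

scraper = ServiceScraper()
scraper.feed(SNIPPET)
print(scraper.services)
# → [{'url': 'https://weather.example.com/feed', 'format': 'application/atom+xml'}]
```

A central repository could then be nothing more than a page aggregating such blocks, which keeps the barrier to publishing a service description as low as publishing a web page.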

At a certain point the notion of having a market of services that people could purchase on a granular billable basis came up.  I suggested to start from the opposite side, encouraging the commons.  Or more specifically this group could go to Creative Commons and try to host a directory of CC licensed APIs.  We also discussed availability, and I pointed out that in other industries we would start with conversations about standardizing SLAs.

Paul Raymond is in the commercial division of AccuWeather, which provides weather info to 106 million Americans each day.  Their primary asset is their brand; they copyright much of their material and want to syndicate under control.  Web scraping creates new business models for them, even if it is just linking back.  They co-brand over 20,000 affiliate sites, provide a number of mapping web services, work with other mapping services, have a number of widgets and more.  Other business models: subscription and fixed pricing that is secure and authenticated, or CPM-based with control over content, campaign, source and cost.  Their basic approach is to let people hack upon it, but largely encourage marketing attribution in return.

I had to leave before the afternoon sessions by SnapLogic, Jeff Nolan, Reuters and Mashery.  We still haven’t really talked about security, or the marketing thereof, which is the elephant in the room. It will be interesting to see if a common roadmap emerges.

A guy from the EPA was asked about the politicizing of data.  He shared how there is a law under which you can dispute the bias or accuracy of data and gain resolution.  He told the story of how a US satellite over the pole started picking up anomalies in ozone levels; scientists believed it was impossible, so they normalized the data they syndicated.  It wasn’t until British scientists used balloons to find the unreported change that they opened up the logs and corrected the feed.

Data is political and when you have so much change it is the politics, as much as the technology, that needs to be worked out by the community.

UPDATE: More coverage from Jeff Nolan and SnapLogic; otherwise I’m disappointed more participants aren’t blogging this.