22 Oct 2007

Google, Microsoft, and the Open Content Alliance

The Open Content Alliance, better known as “those people who run Archive.org,” is finally getting some good press.

Although I’m a bit disappointed that the NYT decided to focus on the negative side of the equation – libraries shunning Google and Microsoft – rather than the positive, it’s nice to see a project like this being mentioned either way. And it brings up a lot of important issues concerning restrictions on digital materials, particularly public-domain materials, being compiled by corporations.

Unfortunately, scanning books (non-destructively) is an expensive business, even if you have volunteers to pick up the bulk of the page-flipping and book-moving tasks. So I’m not sure how many libraries are going to be able to partner with OCA, since it doesn’t pay for the scanning like Google and MS do. And on the surface, the deal those two offer doesn’t seem that bad.

Although I haven’t seen a copy of the actual agreements Google and Microsoft have with libraries, the gist seems to be that, in return for paying for the scanning itself, the corporate partner gets exclusive use of the content in its databases. The main excluded party would be other Internet-search firms: Google wants to lock out Microsoft (and Yahoo), and Microsoft wants to lock out Google.

And so the net result is that there will be two separate for-profit digital libraries, Microsoft’s and Google’s, and they won’t be sharing with each other. Google has the New York Public Library’s and Harvard’s collections, Microsoft has the Universities of California and Toronto (the largest university libraries in the U.S. and Canada); once an institution has taken sides, it’s basically committed. Scholars will have to search both to get a complete picture of what’s available on a particular subject.

OCA represents a third path: a library partnering with them has to pay for the scanning itself (or find outside funding to do it), but the materials thus digitized are available to everyone.