22 Nov 2010

What the Internet is Reading (Apparently)

For a while now I’ve been thinking about getting a Kindle. When they first came out, I was pretty negative on the whole concept: expensive, too much vendor lock-in, and not enough of a step up over the existing product (paper books). But three years have changed my assessment somewhat.

First, I’ve started to do a lot of traveling again. I say ‘again’ because I did a fair bit of travel back in 2007/08 and then stopped for a while, but it’s picking up now. For me, travel means lots of time sitting around in train stations and airports and on various modes of transport, often without Internet access or AC power. So I do a lot of reading.

Second, the price of the Kindle has come way down. At its original $400 price tag, I wasn’t interested. But at the current sub-$200 price (and that’s for the 3G model; the base model is only a relatively small step up from a C-note), it goes from ‘expensive toy’ to something I could put in my briefcase without being obsessively concerned about it.

Third, the ebook marketplace has matured. Amazon now has competition, and although there is still a lot of vendor lock-in with the Kindle, you can get ebooks from sources besides Amazon. That’s really important; I wouldn’t have bought an iPod if the only source for music had been the iTunes Music Store (and, I strongly suspect, very few others would have either). Anyone who buys a hardware device that can only be used with content purchased from one vendor is a fool.

Now, I’m not saying that the Kindle marketplace is bad or wrong, or that you shouldn’t buy stuff from it. I suspect that if I do get a Kindle, I’ll be spending some significant money there. But I wouldn’t even consider the Kindle if that were the only way to load content onto it.

Which brings me around to The Pirate Bay. TPB now has an “E-Books” section; it’s a top-level category, right next to “Music” and “Movies”. If you ever needed a sign that electronic books are here to stay, that’s it. A quick glance suggests that many of them are PDFs, meaning they’re probably meant for reading on a computer with an RGB display rather than on an e-ink device like the Kindle or Nook. Still, e-reader-friendly formats like MOBI and EPUB (and venerable old ASCII, with a smattering of HTML) seem to be reasonably popular.

The really fascinating part is what seems to be getting read. After all, this is a marketplace where the only “cost” to a user is a few minutes of their Internet connection. By sorting the list by seeders or by leechers, we can see what users are choosing to redistribute to others, or merely downloading for themselves. The top picks are interesting:

Sorted by leechers, as of 22 November 2010, 0331 UTC:

Unless this is some sort of joke or codeword for something else, it’s a 300+ MB scan of an out-of-print chef’s reference from 2002. Apparently there’s a huge unmet demand in the bowels of the Internet for ingredient identification. Who knew?

This is pretty interesting. The #2 entry on the list isn’t a book per se; it’s a huge collection of books, so big that the listing of the archive’s contents is provided as a separate torrent. This is presumably so that you can pick out the titles you want and tell your BitTorrent client to fetch only those. However, judging by the number of users involved, I suspect that most people are just grabbing the whole thing.

There’s a whole separate post for another day in just going through what gets included in archives like this. Real-world librarians have to balance constraints like shelf space when deciding which books to keep and which to cull, but someone putting together a torrent has no limitation except disk space, and that’s not much of an issue today. This particular collection weighs in at 23.41 GiB (about 25.1 GB) compressed, and claims to unpack to 32 GB. (Depending on the format, the books inside would still be compressed, though.)

I’m pretty sure this is bigger than the fiction collections of many local libraries, so the idea that someone can just download something like this in one go is a bit mind-blowing.

Another scanned cookbook as a PDF. I think I may actually own this one on paper; I picked it up in the bargain section at B&N. What strikes me as odd is that it doesn’t seem like it would translate well to reading on a computer. I leave it sitting out on the bar and flip through it occasionally when I’m looking for a new drink to try. Advances in e-readers aside, that’s still easier on paper than in bits.

If you have a computer in front of you, there are much better options for learning about drinks: sites like WebTender and iDrink can automatically list the drinks you can make with the ingredients at hand. If I were going to go to the trouble of scanning in a book and pirating it, this wouldn’t be at the top of my list… and yet hundreds of people are downloading it.

This makes a bit more sense. It’s another big collection, this time of various “xyz For Dummies” books. Although I still suspect these fall squarely in the minority of books that are better on paper than on a screen, the piracy appeal at least seems clearer. They tend to be fairly expensive and have a short lifespan: I tend to buy them when I want to learn about a specific product or skill, and then move on to more specialized resources once I’ve gotten the basics down.

Now this is interesting. Not only is Anonymous terribly interested in cooking and mixology, the Internet would also like to know about such varied topics as “The Destruction of Sodom, Gomorrah, and Jericho: Geological, Climatological, and Archaeological Background” and “Lyndon Johnson and the Escalation of the Vietnam War”.

More seriously, I suspect the cachet here is not so much the subject matter (although the archive appears to contain so much material that you’d have to be a real vegetable not to find something of interest in it) as its perceived value. OUP books tend to be fairly expensive, and at an average of $40 per title, the archive works out to a ‘street value’ of over twenty-five thousand dollars. (Naturally, like the RIAA’s figures for the ‘value’ of downloaded MP3 files, that’s ridiculous. First, you could get the paper books used for a fraction of the new price. Second, I doubt that more than a rare handful of the users downloading the PDFs would ever have purchased any of the paper books.)

Another surprising choice. But the entire back catalog of Fine Woodworking has a lot more lasting value than most technology-oriented magazines, so perhaps it’s not so strange. A 1996 issue of Macworld is pretty much only good for laughs and/or nostalgia, but an FW issue from the same year could still be handy if it contained a project you were interested in. As a result, the Fine Woodworking collection, taken as a whole, is pretty valuable in a way that many other magazine archives wouldn’t be.

It’s not hard to think of other magazines that might be similarly interesting if put together in a big collection. Hobbyist magazines in the ‘project per month’ format are probably the best candidates, and long-form journalism is probably up there too, while pure news magazines would seem to have less value.

I’m not sure what it means that this book is the first in-print title to appear simultaneously on the Pirate Bay’s top ebooks list and on the New York Times Best-Seller list (where it is currently #1). On the one hand, maybe it cuts across demographics. On the other, maybe the demographic represented by TPB’s users is interested in the book’s content but doesn’t want to pay for it.

My personal guess is that it’s not politically motivated, and that any book getting as much press coverage as ‘Decision Points’ would show up on the Pirate Bay’s first page. (Tracking TPB against the NYT list would be interesting, though.)

Joking aside, the Pirate Bay’s listings provide a window into a corner of the market that’s under-represented in other measures, like the NYT’s lists. That under-representation is justified if all you care about are financial transactions, since this sort of piracy occurs without any money changing hands. But that doesn’t mean this activity doesn’t reflect, and perhaps affect, other aspects of the market.

Given the shift we’re about to see towards ebooks and electronic publishing, you’d have to be foolish to ignore what’s going on outside the squeaky-clean, vendor-approved channels.

This entry was converted from an older version of the site; if desired, it can be viewed in its original format.