Kadin2048's Weblog

Mon, 22 Nov 2010

What the Internet is Reading (Apparently)

For a while now I’ve been thinking about getting a Kindle. When they first came out, I was pretty negative on the whole concept: expensive, too much vendor lock-in, not enough of a step up over existing product (paper books). Anyway, three years has changed my assessment somewhat.

First, I’ve started to do a lot of traveling again. I say ‘again,’ because I was doing a fair bit of travel back in 2007/08, but then stopped for a while. But it’s now picking up again, and for me, travel means lots of time sitting around in train stations, airports, and on various modes of transport, often without Internet access or AC power. So I do a lot of reading.

Second, the price of the Kindle has come way down. At its original $400 pricetag, I wasn’t interested. But at the current sub-$200 price (and that’s for the 3G model; the base model is only a relatively small step up from a C-note), it goes from ‘expensive toy’ to something I could put in my briefcase and not be obsessively concerned about.

Third, the ebook marketplace has matured. Amazon now has competition, and although there is still a lot of vendor lock-in with the Kindle, you can get ebooks from sources besides AMZ. And that’s really important; I wouldn’t have bought an iPod if the only source for music had been the ITMS (and, I strongly suspect, very few others would have either). Anyone who buys a hardware device that can only be used in conjunction with content purchased from one vendor is a fool.

Now, I’m not saying that the Kindle marketplace is bad or wrong, or that you shouldn’t buy stuff from it. I suspect that if I do get a Kindle, I’ll probably be spending some significant money there. But I wouldn’t even consider the Kindle if that was the only way to load content onto it.

Which brings me around to The Pirate Bay. TPB now has an “E-Books” section — it’s a top-level category, right next to “Music” and “Movies”. If you ever needed a sign that electronic books are here to stay, that’s it. A quick glance seems to suggest that many of them are PDFs, meaning that they’re probably meant for consumption on a computer with an RGB display, rather than an e-ink device like the Kindle or Nook, but e-reader-friendly formats like MOBI and E-PUB (and venerable old ASCII, with a smattering of HTML) seem to be reasonably popular.

The really fascinating part is what seems to be getting read. After all, this is a marketplace where the only “cost” to a user is a few minutes of their Internet connection. By sorting the list by either seeders or leechers, we can find out what users are either choosing to redistribute to others, or merely downloading for themselves. The top picks are interesting:

Sorted by leechers, as of 22 November 2010, 0331 UTC:

Cooking Ingredients

Unless this is some sort of joke or codeword for something else, it’s a 300+ MB scan of an out-of-print chef’s reference from 2002. Apparently there’s a huge unmet demand in the bowels of the Internet for ingredient identification. Who knew?

Largest fiction library (english), ebooks - 80000 authors -9000

This is pretty interesting. The #2 entry on the list isn’t a book per se; it’s actually a huge collection of books, so big that the contents of the archive are provided as a separate torrent. This is presumably so that you can pick out various titles that you want, and tell your BT client to only get the ones you care about. However, I suspect that most people are just grabbing the whole thing, judging by the number of users involved.

There’s a whole separate post for another day, just going through what gets included in archives like this. Real-world librarians have to balance demands like shelf space when deciding what books to keep and which to cull … but someone putting together a torrent doesn’t have any limitation except disk space, and that’s not much of an issue today. This particular collection weighs in at 23.41 GiB, compressed, and claims to unpack to 32 GB. (Depending on the format, the books inside would still be compressed, though.)

I’m pretty sure this has to be bigger than the fiction collections of many local libraries, so the idea that someone can just download something like this in one go is a bit mind-blowing.

The Ultimate Encyclopedia of Wine Beer Spirits & Liqueurs

Another scanned cookbook as a PDF. I think I may actually own this one, on paper, picked up in the bargain section of B&N. What strikes me as odd about this one is that it doesn’t seem like it would translate well to electronic reading on a computer. I leave it sitting out on the bar and flip through it occasionally when I’m looking for a new drink to try out — advances in e-readers aside, that’s still easier on paper than on bits.

If you have a computer in front of you, there are much better options for learning about drinks: sites like WebTender and iDrink can automatically provide you a list of drinks you can make given ingredients at hand, etc. If you were going to go to the work of scanning in a book and pirating it, this wouldn’t be at the top of my list… and yet, hundreds of people are downloading it.

For Dummies Ebooks A-Z

This makes a bit more sense. It’s another big collection, this time of various “xyz For Dummies” books. Although I still question whether these aren’t squarely in that minority of books that are still better on paper than on a screen, the piracy appeal at least seems more clear. They tend to be fairly expensive and have a short lifespan: I tend to purchase them when I want to learn about a specific product or skill, and then move on to more specialized resources once I’ve gotten the basics down.

Oxford University Press Ebook Pack 652 Books

Now this is interesting. Not only is Anonymous terribly interested in cooking and mixology, the Internet would also like to know about such varied topics as “The Destruction of Sodom, Gomorrah, and Jericho: Geological, Climatological, and Archaeological Background” and “Lyndon Johnson and the Escalation of the Vietnam War”.

More seriously, I suspect the cachet here is not so much the subject matter (although it does appear to contain so much stuff that you’d have to be a real vegetable to not find something of interest in there) but its perceived value: OUP books tend to be fairly expensive, and if we use $40 as an average figure, the 652 books work out to a ‘street value’ of over twenty-five thousand dollars for the archive. (Naturally, like the RIAA’s figures for the ‘value’ of downloaded MP3 files, that’s ridiculous: first because you could get the paper books used for a fraction of the new price, and second because I doubt that more than a rare handful of users getting the PDFs would ever have purchased any of the paper books.)

Fine Woodworking 2010 - All Issues

Another surprising choice. But the entire back catalog of Fine Woodworking does have a lot more value than most technology-oriented magazines, so perhaps it’s not so strange. A 1996 issue of MacWorld is pretty much only good for laughs and/or nostalgia value, but FW from the same year could still be handy, if it contained a project that you were interested in. As a result, the collection of Fine Woodworking taken as a whole is pretty valuable, in a way that many other magazines wouldn’t be.

It’s not hard to think of other magazines that might be similarly interesting if put together in a big collection; hobbyist mags of the ‘project per month’ format are probably the best candidates, and long-form journalism is probably up there too, while purely news magazines would seem to have less value.

Decision Points - George W. Bush

I’m not sure what it means that this book is the first in-print title to be simultaneously on the Pirate Bay top ebooks list and also the New York Times Best-Seller list (where it is currently #1). On one hand, maybe it cuts across demographics. Or, maybe the demographic represented by TPB’s users is interested in the book’s content, but doesn’t want to pay for it.

My personal guess is that it’s not politically motivated, and you’d see any book with the amount of press coverage that ‘Decision Points’ is getting represented on the Pirate Bay’s first page. (Tracking TPB versus the NYT would be interesting, though.)

Joking aside, the Pirate Bay’s listings provide a window into a corner of the market that’s under-represented in other measures, like the NYT’s lists. It’s a justified under-representation if all you care about are financial transactions, since this sort of piracy occurs without any money changing hands. But that doesn’t mean it doesn’t both reflect and perhaps affect other aspects of the market.

Given the shift we’re about to see towards ebooks and electronic publishing, you’d have to be foolish to ignore what’s going on outside the squeaky-clean, vendor-approved channels.

0 Comments, 0 Trackbacks

[/technology/web] permalink

Wed, 13 Oct 2010

Removing Dot-Underscore Files

This is just a quick note, mostly for my own reference, of a few ways to easily delete the dot-underscore (._foo, ._bar, etc.) files created by (badly-behaved) Mac OS X systems on non-AFP server volumes.

First of all, if you’re in a mixed-platform environment, you probably want to run this command on your Mac:

defaults write com.apple.desktopservices DSDontWriteNetworkStores true

This doesn’t stop the creation of dot-underscore (resource fork) files, but it does at least cut down on the creation of their equally-obnoxious cousin, the “.DS_Store” file. I’m not aware of a way to automatically and persistently suppress the creation of resource fork files on platforms that don’t deal with resource forks, though.

For the record, it’s not that I’m against the idea of resource forks or filesystem metadata … I think metadata is great and I wish filesystems supported more of it! But hacky solutions like .DS_Store and dot-underscore resource forks are not going to convince anyone who’s on the fence, and give the Mac OS a reputation for crapping all over shared network resources.

To get rid of the dot-underscore files, the most efficient way is using find from the Unix side of things:

find . -name '._*' -exec ls {} \;

Once you’ve verified that you’re only looking at files you want to delete, kill them with:

find . -name '._*' -exec rm -v {} \;

And if you have .DS_Store files around that you need to zap as well, then you’d just do:

find . -name '.DS_Store' -exec rm -v {} \;

The -v switch on rm isn’t strictly necessary, of course, but I like it just so I can see what’s going on. If you’re hardcore, you can omit it. Note that the quotes around the search string being passed to find are crucial; if you leave the pattern unquoted, your shell will (more than likely, depending on the shell and on what’s in the current directory) expand it before it gets to find. Not good. Single quotes are the safest habit, since the shell leaves everything inside them alone.

A certain amount of caution is advised when running this — although the files are basically useless on any non-Mac platform, they do contain Finder comments and HFS+ EAs, which are significant on OS X and could be important to some users. This is not something you’d want to run globally on a shared system, for instance, unless it was as part of a script that checked to see whether the dot-underscore file was an orphan, or something with similar safeguards.
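The orphan check mentioned above could be sketched roughly like this. This is a minimal Python 3 sketch, not something I have run against a real share: the function name find_orphans is made up for illustration, and it assumes that "orphan" simply means a ._name file with no file or directory called name sitting next to it. It only lists candidates and deletes nothing.

```python
import os


def find_orphans(root):
    """Return paths of '._name' files under root that have no sibling 'name'.

    Assumption: a dot-underscore file counts as an orphan when no file or
    directory with the same name (minus the '._' prefix) exists beside it.
    """
    orphans = []
    for dirpath, dirnames, filenames in os.walk(root):
        # A dot-underscore file can accompany either a file or a directory.
        siblings = set(filenames) | set(dirnames)
        for name in filenames:
            if name.startswith("._") and len(name) > 2:
                if name[2:] not in siblings:
                    orphans.append(os.path.join(dirpath, name))
    return sorted(orphans)

# Hypothetical usage: print the candidates and review them before removing any.
# for path in find_orphans("."):
#     print(path)
```

Listing first and deleting second mirrors the ls-then-rm approach with find: the risky step stays a separate, deliberate action.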

Unfortunately I don’t see the need for this going away, unless Apple finds some more elegant solution for dealing with Mac-specific metadata in mixed environments. It would be great if copying files from a Mac to a Linux-backed SMB share automatically preserved all the HFS+ metadata and turned it into ext4 extended attributes, obviating the need for the dot-underscore files… I am not going to hold my breath for that, though.


[/technology/software] permalink

Tue, 07 Sep 2010

“Optimistic” Thought Experiments and the ‘Equity Premium Puzzle’

Peter Thiel, formerly of PayPal, more recently of the Founders Fund and Clarium Capital, has an interesting article in the Hoover Institution Policy Review, called “The Optimistic Thought Experiment.” The best one-sentence summary is probably his own: “In the long run, there are no good bets against globalization.”

However, the article is more interesting than just that, and even if you disagree with that particular conclusion — perhaps especially if you disagree with that particular conclusion — it’s worth a read.

The argument that I found most interesting is that, as a result of more powerful technologies and a more complex and interconnected world, a greater risk exists of “secular apocalypse,” a complete, system-wide failure of the current capitalist framework, than has ever existed in the past. What might have been local panics or crashes now reverberate globally, even as new failure modes have emerged.

This, in particular, struck me:

[T]he extreme valuations of recent times may be an indirect measure of the narrowness of the path set before us. Thus, to take but one recent example, in 1999 investors would not have risked as much on internet stocks if they still believed that there might be a future anywhere else. […] It is often claimed that the mass delusion reached its peak in March 2000; but what if the opposite also were true, and this was in certain respects a peak of clarity? Perhaps with unprecedented clarity, at the market’s peak investors and employees could see the farthest: They perceived that in the long run the Old Economy was surely doomed and believed that the New Economy, no matter what the risks, represented the only chance. Eventually, their hopes shifted elsewhere, to housing or China or hedge funds — but the unarticulated sense of anxiety has remained.

I am not sure exactly how convinced I am of this — it has a sort of exceptionalist tinge to it that I am intrinsically skeptical of — but it’s a very interesting theory. To some extent, casual observation bears it out: the last few years, we have seen speculative bubbles pop up in various places as investors have moved from one market to the next in search of returns.

Much has been made of the fact that some of these bubbles — real estate in particular — just never made a whole lot of sense, or at least not enough sense to justify the amount of money that was being pumped into them, or the fervor with which it was being pumped, not just by I-banks and hedge funds, but by individuals ‘flipping’ houses, taking on second homes, and getting involved in shady high-return investment schemes (which are still being advertised on sketchy hand-drawn yard signs at major intersections in my area).

Viewed through the lens of Thiel’s thought experiment’s premise, it starts to look a whole lot less irrational and a whole lot more rational — albeit desperate.

It also made me wonder about the long-standing arguments regarding the ‘equity premium’ (the premium paid by equities versus ‘risk-free’ investments like Treasuries). Some people argue, generally by analyzing U.S. market returns during the 20th century, that the equity premium is around 5-8%. However, others suggest that in the future, it might be more like 3-4% over cash. The jury is definitely still out on this, but it certainly seems like there is a developing consensus that the equity premium is in decline.

The equity premium has long been considered something of a mystery, because it’s higher than you’d expect given investor behavior. With an equity premium of 7% over bonds, you’d have to be almost ridiculously risk-averse to not buy equities. But if the premium is as low as some suggest it may be going forward, then the mystery might be the other way around: why hold equities when you can have less risky bonds instead, at a small discount?

My completely speculative theory is that perhaps this is due, in part, to the kind of attitude Thiel discusses. If investors suspect, consciously or unconsciously, that a scenario in which their S&P 500 fund becomes worthless would also be one where T-Bills or even cash are worthless (or, less extreme, that they’d lose significant value as well), then they might not agree that the difference in risk is great enough to justify the lower yield of the ‘safer’ investment.

Of course, the market could just be irrational. I’m not sure that anything other than time is going to tell.


[/finance] permalink

Mon, 24 May 2010

RITEKS04: Learn from my mistake

(Or, “In Which Your Narrator Learns a Valuable Lesson About The Perils of Cheap Media and the Proper Use of the Verify Button.”)

A while back I picked up a 50-pack ‘cake box’ of surprisingly low-priced DVD+R Dual Layer discs, possibly at Costco (although it could have been Best Buy or Staples, at this point I’m not sure). The price was good and I’d been thinking that it’d be nice to be able to cram more than an hour’s worth of DVD-Video onto a disc, and I naively had some idea about using them for backup purposes as well.

The discs were TDK branded, and say “DVD+R Double Layer 8x Speed 8.5GB”. Unfortunately, what the label doesn’t say is that, as far as I can tell, the only relationship they have with actual recordable DVDs is that they’re roughly the same size and they have a hole in the middle. But more on that later.

Anyway, I bought the discs, brought them home, and promptly forgot about them. I don’t burn a ton of discs anymore, but it’s good to have media on hand…right? So there they sat, lurking.

A few months ago, I went to burn a fairly big photography project to disc for safekeeping. Of course I had it backed up to a second hard drive (via Aperture’s ‘Vaults’ feature), but I figured an additional copy on optical wouldn’t hurt. And so I clicked away in Toast and burned the files to one of the discs.

Now I generally make a point of always letting Toast verify any disc I burn, but I’d be a liar if I told you that I actually remember doing it. It’s possible that I just mounted the disc immediately after burning, cataloged it, and then ejected it and put it with my other backup discs. Rinse, repeat … about a half a dozen times or so, over the space of a few months. Each time the same process — I burn a disc without problems, mount it, catalogue it, and then put it away in a binder. The process wasn’t flawless — sometimes I’d get failures due to ‘sense key’ errors and have to try a couple of times to get a successful write, and writing 8.5GB at the maximum speed of 2X that my writer supported was slow — but I figured that was par for the course.

Up until earlier this week, when I did something different. I burned a bootable disc image (long story) and attempted to use it on another computer. It failed. In fact, the disc wouldn’t even mount when I put it in the other computer’s drive. Thinking it was a compatibility issue, I brought it back to my desktop where I’d burned it — nope, nothing. Odd, I thought — so I tried it again, and got the same result.

Although the disc had mounted just fine immediately after burning, once ejected from the drive and reinserted, it would never mount again.

Not only was the boot disc bad, but every disc I’d burned on the media turned out to be bad. That’s right: one hundred percent mis-burns.

In the interest of science, I blew through most of the rest of the stack trying different burning programs, data, and writers. I achieved identical results with my desktop’s Pioneer DVR-109 and my laptop’s LG GSA-S10N6, using both Toast and Apple’s Disk Utility. No dice under any combination. (On the LG, the discs seem to fail more reliably during burning; on the Pioneer they sometimes complete and fail silently.) However, the error does seem to be reliably caught if full post-burn verification is run on the resulting disc.

So, lessons learned:

Always run a full verification cycle on every disc, every time. I got sloppy, and as a result let a few bad discs slip through into my backups. No harm came of it, and they’re a belt-and-suspenders thing anyway, but it’s not hard to imagine how it could have been a nastier surprise. If it’s worth burning, it’s worth verifying.
Brand names on optical media are totally meaningless. I’ve discovered the crummy “RITEKS04” stuff under both the TDK and Memorex names. It seems to be the case that if you’re getting a ‘good deal’ on double layer DVD+R 8X media, what you’re getting is probably the RITEK crap. Avoid it like the plague it is.
Allegedly, the only decent DVD+R DL media around is Verbatim. I wasn’t able to find any in any local stores in my area, although it is available mail-order. In general, the whole double layer system seems half-baked and worth avoiding unless you really need it.
DVD burning isn’t like CD burning. Or at least not like CD burning in the last ten years. I was cavalier about burning DVDs because I mostly burn CDs, and CD writing technology has been pretty reliable — even with cheap non-mail-order media — for the better part of a decade now. DVD burning, especially double layer, isn’t like that; it’s like 1999 all over again with the weird incompatibilities.

It’s entirely possible that there are burners out there that handle the steaming pile that is RITEKS04 just fine, but both of my drives are common models (both are stock Apple parts) and work fine with better grades of Verbatim media.

So much for a good deal on a box of discs.


[/technology] permalink

Wed, 28 Apr 2010

Flickr Referrer Data: Going Once, Going Twice…

According to the powers that be at Flickr, you have until June 1, 2010, 12:00 PM PDT to get your historical referrer-log data, if you are a Pro member and are interested. You can download it as CSVs from the bottom of your stats page.

Apparently it is “not sustainable” to keep the data available forever. I suspect this translates to ‘it’s really expensive and only 0.001% of our users actually care or are even aware of it.’ In the future, they will be providing access to 28 days worth of data via the API, but probably nothing beyond that.

I wasn’t even aware that this feature existed until I saw the notice that it was going away, and although their reasons for terminating it are understandable, it is surprisingly interesting data. My strong recommendation is that, if you’re a Pro member and you have a few megabytes of disk and transfer to spare, you might as well take thirty seconds and download them.

Note to anyone thinking of being clever and using curl or wget to batch-download all the files at once: don’t bother, it’s really not worth your time. (Trust me on this, I looked into it.) You’d have to authenticate using Flickr’s API and it’s guaranteed to take longer than just pointing your browser’s download-destination folder to some appropriate place and Alt-clicking on them.

If you download your logs as CSV files — and really why wouldn’t you? Excel gives you no advantage here — you can use this small Python script I wrote to dump them into a SQLite database. The script requires Python 2.5 or later (or possibly an older version with the appropriate PySQLite add-on package, but I haven’t tested that and probably won’t). Bug reports and enhancements are welcomed although it’s not meant to be pretty, since it won’t be of much use to anyone after June 1. The schema should be obvious just from looking at the script; it’s two tables — one for daily and the other for weekly data — and the columns in the CSVs are all carried over into the DB. There’s no date conversion or other fancy stuff.
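For anyone who wants the general shape of the CSV-to-SQLite step, here is a minimal sketch. It is not the script mentioned above: it is written for Python 3 rather than 2.5, the function name load_csv is invented for illustration, and the table name is whatever the caller passes in (it must be a trusted name, since it goes straight into the SQL). It makes no assumptions about Flickr's column names; it takes them from each file's header row and stores everything as text, with no date conversion.

```python
import csv
import sqlite3


def load_csv(conn, table, csv_path):
    """Append the rows of one CSV file to `table`, creating it if needed.

    Columns are taken from the CSV header row and stored as TEXT.
    Returns the number of rows inserted.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            return 0  # empty file: nothing to load
        # Quote column names so spaces or odd characters in headers are safe.
        cols = ", ".join('"%s" TEXT' % c.replace('"', '""') for c in header)
        conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
        marks = ", ".join("?" for _ in header)
        # Skip malformed rows whose width does not match the header.
        rows = [r for r in reader if len(r) == len(header)]
        conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), rows)
    conn.commit()
    return len(rows)

# Hypothetical usage, with made-up file and table names:
# conn = sqlite3.connect("flickr_stats.db")
# load_csv(conn, "daily", "some_daily_export.csv")
```

Calling it once per file, with one table name for the daily files and another for the weekly ones, gives the two-table layout described above. Loading the same file twice will duplicate its rows, so a real version would want some de-duplication.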

What you do with it once you get it there is your business; I haven’t really decided what, if anything, to do with it, but it seemed like having everything in a couple of DB tables was a lot more convenient, whatever I might decide to do with it, than having it in dozens of CSVs. (Do keep the CSVs though. They’re small and there’s no good reason not to.) If you have any suggestions of interesting things to do with or ways to analyze the data, let me know by SDF email or in the comments.

Maybe once they get the API access set up, I’ll write something to grab new stats and shove them into the same SQLite DB on top of the existing records. It would only have to run once a month or so to stay on top of the feed, which isn’t that bad.


[/technology/web] permalink

Wed, 07 Apr 2010

On the Cost of Digital Photography, Pt. 2

So I finally got around to taking that trip to Yellowstone that I was talking about back in May. Rather than going in late August 2009, as I had planned, I actually ended up going last month (that’d be March, 2010). For those of you not familiar with weather in the Northern Hemisphere, that meant going in what amounts to the dead of winter, instead of the height of summer — and as far as I’m concerned, it was the best thing that could have happened to the trip.

While we didn’t have the Park to ourselves, exactly, we were much closer to it than we would have been in August. And wildlife was much easier to spot as well. Although I didn’t snag a shot of one of Yellowstone’s coveted wolves, I did get some nice images of the local bison, coyotes, birds, etc. All in all, a great trip, and I highly recommend a winter excursion to the park for anyone who hasn’t done it already. You won’t be disappointed.

But that’s not my purpose here. In my earlier post I made some guesses about what I thought my photography habit was going to set me back, in terms of consumables (in the form of storage), for the trip. In that post I had guessed that I’d fill two 4GB cards, which I thought would be fine for my relatively low-resolution (by 2010 standards) DSLR.

As it turned out, I was a little low.

Over the course of nine days, I took a total of 2,299 frames, equivalent to about 64 rolls of 135, and consuming just under 20GB. Just sorting through them and making a ‘first cut’ is a project in itself.

Part of the reason I ended up taking so many frames (and I’m using the word “frames” rather than “images” carefully here) is because I had brought my laptop along with me on the trip and as a result knew I didn’t have any reason to conserve storage space. The limiting factor on my shooting wasn’t storage, but instead camera batteries. With a 4GB and 8GB card, I could easily shoot all day and then dump the contents to my laptop at night for storage and immediate backup to a DVD.

This led to an immediate change in my shooting style that I never would have made, if I’d been shooting film or even using the digital without the hundreds of gigabytes of disk storage that the laptop represented: I turned on three-frame, +/-0.5 EV bracket mode on the first day and never turned it off. That’s something I’ve never felt rich enough to do on my film Maxxum.

So those 2299 frames really represent something like 800 images (I did take a few without bracket mode, so that’s a low estimate), much closer to my initial estimate of 20 or so 35mm rolls worth. It’s just that, rather than only having one frame for each image, with the digital I have two “insurance” frames, in case my judgement of the light was a bit off or I just decide that slightly lighter or darker is preferable. Although not earth-shattering by any means, I do think the bracketing saved a few marginal images that otherwise would have been garbage if I’d only had the “center” one. And that’s enough to make me a pretty happy photographer.
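The arithmetic behind those figures is easy to check. The sketch below assumes 36-exposure rolls (the post doesn't say, but that is what makes 2,299 frames come out to about 64 rolls) and treats every frame as part of a three-frame bracket, which gives the floor on the image count.

```python
FRAMES = 2299          # total frames shot, from the post
FRAMES_PER_ROLL = 36   # assumption: 36-exposure rolls of 135 film
BRACKET = 3            # three-frame bracket mode

rolls_of_frames = FRAMES / FRAMES_PER_ROLL             # every frame counted
bracketed_images = FRAMES / BRACKET                    # one image per bracket
rolls_of_images = bracketed_images / FRAMES_PER_ROLL   # images as film rolls

print(round(rolls_of_frames), round(bracketed_images), round(rolls_of_images))
# prints: 64 766 21
```

The 766 figure is the minimum, since any frame shot outside bracket mode is an image on its own; that is how the real count gets to something like 800, or a bit over 20 rolls.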


[/photography] permalink