
28 Apr 2010

Flickr Referrer Data: Going Once, Going Twice...

According to the powers that be at Flickr, you have until June 1, 2010, 12:00 PM PDT to get your historical referrer-log data, if you are a Pro member and are interested. You can download it as CSVs from the bottom of your stats page.

Apparently it is “not sustainable” to keep the data available forever. I suspect this translates to ‘it’s really expensive and only 0.001% of our users actually care or are even aware of it.’ In the future, they will be providing access to 28 days’ worth of data via the API, but probably nothing beyond that.

I wasn’t even aware that this feature existed until I saw the notice that it was going away, and although their reasons for terminating it are understandable, it is surprisingly interesting data. My strong recommendation is that, if you’re a Pro member and you have a few megabytes of disk and transfer to spare, you might as well take thirty seconds and download them.

Note to anyone thinking of being clever and using curl or wget to batch-download all the files at once: don’t bother, it’s really not worth your time. (Trust me on this, I looked into it.) You’d have to authenticate using Flickr’s API and it’s guaranteed to take longer than just pointing your browser’s download-destination folder to some appropriate place and Alt-clicking on them.

If you download your logs as CSV files – and really, why wouldn’t you? Excel gives you no advantage here – you can use this small Python script I wrote to dump them into a SQLite database. The script requires Python 2.5 or later (or possibly an older version with the appropriate PySQLite add-on package, but I haven’t tested that and probably won’t). Bug reports and enhancements are welcome, although it’s not meant to be pretty, since it won’t be of much use to anyone after June 1. The schema should be obvious just from looking at the script: it’s two tables, one for daily and the other for weekly data, and the columns in the CSVs are all carried over into the DB. There’s no date conversion or other fancy stuff.
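For anyone who just wants the general shape of the approach, here is a minimal sketch in modern Python 3. It is not the script mentioned above, only an illustration of the same idea: two tables, columns copied straight from the CSV headers, no type or date conversion. The function names and the rule that a file with "weekly" in its name goes to the weekly table are my assumptions for the example; adjust them to match whatever your downloaded files are actually called.

```python
import csv
import sqlite3
from pathlib import Path


def quote_ident(name):
    """Quote a CSV header so it is safe to use as a SQLite identifier."""
    return '"' + name.replace('"', '""') + '"'


def load_csv(conn, table, csv_path):
    """Append every row of one CSV file to `table`, creating it if needed."""
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if not header:
            return 0  # empty file, nothing to load
        # Columns are declared without types, so values are stored as-is.
        columns = ", ".join(quote_ident(name) for name in header)
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})"
        )
        placeholders = ", ".join("?" for _ in header)
        # Skip malformed rows whose field count does not match the header.
        rows = [row for row in reader if len(row) == len(header)]
        conn.executemany(
            f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
        )
    return len(rows)


def load_directory(conn, directory):
    """Route each CSV to the daily or weekly table based on its file name."""
    counts = {"daily": 0, "weekly": 0}
    for csv_path in sorted(Path(directory).glob("*.csv")):
        # Assumption: the file name says which kind of export it is.
        kind = "weekly" if "weekly" in csv_path.name.lower() else "daily"
        counts[kind] += load_csv(conn, kind, csv_path)
    conn.commit()
    return counts
```

To use it, open a database with `sqlite3.connect("flickr_referrers.db")` (the file name is arbitrary) and call `load_directory(conn, ".")` from the folder holding the CSVs. The sketch assumes every file of a given kind has the same header row; if the headers differ between files, the insert will fail rather than guess.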

What you do with it once you get it there is your business; I haven’t really decided what, if anything, to do with it, but it seemed like having everything in a couple of DB tables was a lot more convenient, whatever I might decide to do with it, than having it in dozens of CSVs. (Do keep the CSVs though. They’re small and there’s no good reason not to.) If you have any suggestions of interesting things to do with or ways to analyze the data, let me know by SDF email or in the comments.

Maybe once they get the API access set up, I’ll write something to grab new stats and shove them into the same SQLite DB on top of the existing records. It would only have to run once a month or so to stay on top of the feed, which isn’t that bad.
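The fetching half of that idea depends on an API that doesn't exist yet, so I won't guess at it. The "on top of the existing records" half can be sketched now, though. Since each monthly run would pull a 28-day window that may overlap the previous one, the main thing is not to insert the same row twice. One way to do that, purely as an illustration, is a unique index across all columns plus `INSERT OR IGNORE`. The function name and the column names in the usage note are hypothetical.

```python
import sqlite3


def merge_rows(conn, table, columns, rows):
    """Insert rows into `table`, skipping any that are exact duplicates."""
    quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({quoted})')
    # The unique index over every column is what makes OR IGNORE work.
    # Creating it fails if the table already contains duplicate rows.
    conn.execute(
        f'CREATE UNIQUE INDEX IF NOT EXISTS "{table}_uniq" '
        f'ON "{table}" ({quoted})'
    )
    before = conn.total_changes
    conn.executemany(
        f'INSERT OR IGNORE INTO "{table}" VALUES ({placeholders})', rows
    )
    conn.commit()
    # total_changes only counts rows that were actually inserted.
    return conn.total_changes - before
```

Calling `merge_rows(conn, "daily", ["Date", "Referrer", "Views"], new_rows)` returns how many rows were genuinely new. Note that this only catches exact duplicates: if a later fetch reports a different view count for the same day and referrer, both rows are kept, and you would need a narrower key to treat that as an update.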

This entry was converted from an older version of the site; if desired, it can be viewed in its original format.