Making Twitter Safe for Historians

Mon 21 May 2007

Enamored as I am of Twitter, I wonder about the long-term availability of the system. The business model might run its course or be overtaken by others offering the same service. People might grow tired of posting to or reading it.

Twitter’s ephemeral (and increasingly banal) form may make you question the utility of preserving all that content. Years from now, will anyone care what you were eating for breakfast? Yet in the decades, centuries, and millennia to come, it will be exactly these everyday, seemingly pointless details that will be of most interest to the historian. Wouldn’t you want to read what people would have Twittered about (had the service been around) 50, 500, or 5000 years ago?

I for one want my children’s children to be able to read what my friends and I have posted there, regardless of the accessibility or even existence of http://twitter.com.

As a step in that direction, I am releasing a Twitter archiving tool called Aviary.

git clone git://github.com/mja/aviary.git

Once you’ve cloned the repository, run aviary.rb with Ruby.

ruby aviary.rb -u USERNAME -p PASSWORD --updates [new|all] [--page XXX]

Note: you’ll also need the Hpricot and builder gems.

Supply your Twitter username and password, and the program will create a directory called USERNAME and start filling it with XML-formatted versions of all of your tweets, parsed from your account archive page. With --updates all, the script looks through every archive page for tweets to download. With --updates new, it stops as soon as it encounters a tweet that has already been downloaded. Your best bet is to run it with all until your entire back catalog has been saved (use the --page switch to start gathering tweets from a particular page). Afterwards, run it occasionally with new to update your archive.
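
To make the difference between all and new concrete, here is a minimal sketch of the stopping rule. It is not Aviary's actual code: the method name archive_pages, the in-memory pages standing in for scraped archive pages, the one-file-per-tweet naming, and the XML element names are all assumptions made for illustration.

```ruby
require "fileutils"

# Escape the characters that are unsafe inside XML text nodes.
def xml_escape(text)
  text.gsub(/[&<>]/, "&" => "&amp;", "<" => "&lt;", ">" => "&gt;")
end

# pages: array of archive pages, newest first; each page is an array of
#        hashes like { id: 42, text: "..." } (hypothetical stand-in for
#        what a scraper would return).
# dir:   directory to fill, e.g. "USERNAME".
# mode:  :all visits every page; :new stops at the first tweet already on disk.
# start_page: 1-based page to start from, like the --page switch.
# Returns the ids of the tweets written during this run.
def archive_pages(pages, dir, mode: :new, start_page: 1)
  FileUtils.mkdir_p(dir)
  saved = []
  pages.each_with_index do |tweets, index|
    next if index + 1 < start_page
    tweets.each do |tweet|
      path = File.join(dir, "#{tweet[:id]}.xml")
      if File.exist?(path)
        # In :new mode an already-saved tweet means we have caught up.
        return saved if mode == :new
        next
      end
      xml = "<status><id>#{tweet[:id]}</id>" \
            "<text>#{xml_escape(tweet[:text])}</text></status>\n"
      File.write(path, xml)
      saved << tweet[:id]
    end
  end
  saved
end
```

Because pages are assumed to be newest first, stopping at the first known tweet in new mode is enough: everything older was saved by an earlier run.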

Now you have a directory filled with your tweets. How will you repurpose them and save them for the future?
