If you’ve run Damon Cortesi’s handy curl command to download all (or the last 3200) of the tweets from your Twitter account, you’ll have a directory full of files with names like
user_timeline.xml?count=100&page=1. Not only that, but they include a large amount of redundant profile stuff in the
<user> element. And not only that, but Twitter sometimes returns a “Twitter is over capacity” page instead of your tweets.
What we want to do is a) detect any files which don’t contain tweets, b) remove the redundant user profile, and c) combine the results into a single file.
Well, friends, here is a shell script to do exactly that. You’ll need zsh and xsltproc, both of which are standard on Mac OS X and most sane Linuxen.
zsh is needed to sort the input files in numeric, as opposed to lexicographic, order, so that page=10 follows page=9 rather than page=1. If you know of a way to do this in bash, let me know...
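For what it's worth, one portable way to get the numeric ordering without zsh might be to let sort(1) key on the page number. This is only a sketch and not what the script itself does: the name sort_by_page is made up for illustration, and it assumes the file names follow the user_timeline.xml?count=100&page=N pattern and contain no newlines.

```shell
# Hypothetical helper (not part of tweetcombine): print the given file
# names one per line, ordered by page number.
# With '=' as the field separator, a name such as
#   user_timeline.xml?count=100&page=12
# splits into three fields, and the third one is the page number.
sort_by_page() {
  printf '%s\n' "$@" | sort -t= -k3,3n
}
```

Running sort_by_page user_timeline.xml\?count=100\&page=* should then list the downloaded pages in page order, whichever shell expanded the glob.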
tweetcombine writes its output to stdout, so just redirect to your filename of choice:
$ tweetcombine user_timeline.xml\?count=100\&page=* > tweet_archive.xml
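Before combining, it can be handy to see which pages came back as the “over capacity” page, so that they can be downloaded again. A minimal sketch of such a check follows. It assumes that a page holding tweets contains at least one <status> element; find_bad_pages is a made-up name and not part of tweetcombine.

```shell
# Hypothetical helper: print the names of files that do not appear to
# contain any tweets.
# Assumption: every tweet is wrapped in a <status> element, so a file
# without that tag is an error page (or an empty timeline page).
find_bad_pages() {
  for f in "$@"; do
    grep -q '<status>' "$f" || printf '%s\n' "$f"
  done
}
```

Calling find_bad_pages user_timeline.xml\?count=100\&page=* prints one suspect file name per line, and prints nothing if every page looks fine.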