Comments on: Archiving Tweets
http://girtby.net/archives/2009/08/23/archiving-tweets/
this blog is girtby.net
Wed, 30 Sep 2009 01:44:34 -0400

By: Aristotle Pagaltzis, Mon, 24 Aug 2009 05:14:21 +0000
http://girtby.net/archives/2009/08/23/archiving-tweets/comment-page-1/#comment-14641

I concede that JSON::XS (http://search.cpan.org/perldoc?JSON::XS) is less likely to be installed than libxslt.

I started out with your stylesheet actually, but eventually I got too annoyed at all the effort that XSLT takes for very simple cases like this one.

The deciding factor was JSON + dynamic language, so Ruby would work as well as Perl here; I guess it would look cleaner at the expense of a longer command. (Python’s not much for one-liners, however.) Of course you’d ultimately put this in a script, so that’s neither here nor there.

As for the spamminess, that was probably because somehow all the underscores in my code block got turned into &#95; character references, and spelling ASCII characters as NCRs is a popular filter-blinding technique. (On both sides of the war, actually – we use it against spammers too, cf. mailto: hiding.)

By: alastair, Sun, 23 Aug 2009 23:31:48 +0000
http://girtby.net/archives/2009/08/23/archiving-tweets/comment-page-1/#comment-14638

A weird sort of easier:

% curl -k -u randomphrase:shh -o tweets-#1.json ...
zsh: no matches found: tweets-#1.json
% curl -k -u randomphrase:shh -o tweets-\#1.json ...
[...]
% perl -MJSON::XS -E'...' -- tweets-* | json_xs
zsh: command not found: json_xs
Can't locate JSON/XS.pm in @INC (@INC contains: ... )
.
BEGIN failed--compilation aborted.

Despite the snarky comment above, yes it does work nicely after installing libjson-xs-perl.

Another nice to have would be to resolve shortened URLs – this is probably a lot easier to do in Perl than XSLT…
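
As a rough illustration of resolving shortened URLs, here is a minimal sketch in Python rather than the Perl used elsewhere in this thread. It assumes that a shortener answers with ordinary HTTP redirects, so following them yields the long URL. The names `follow_redirects` and `expand_urls` and the deliberately simple URL pattern are hypothetical choices for this example, not part of anyone's script here.

```python
import re
import urllib.request

# Deliberately simple pattern (an assumption): a URL runs until whitespace,
# a quote or an angle bracket, so trailing punctuation is not stripped.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def follow_redirects(url, timeout=10):
    """Return the final URL after redirects, or the original URL on a network error."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.geturl()
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return url


def expand_urls(text, resolve=follow_redirects):
    """Replace every URL found in text with whatever resolve() returns for it."""
    return URL_RE.sub(lambda match: resolve(match.group(0)), text)
```

Passing the resolver in as a parameter keeps the text rewriting separate from the network access, so the substitution can be tried out with a lookup table before pointing it at a real archive.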

BTW: Your comment was marked as “Very Spammy” by Defensio, and had to be manually rescued. This makes me sad.

By: Aristotle Pagaltzis, Sun, 23 Aug 2009 16:40:35 +0000
http://girtby.net/archives/2009/08/23/archiving-tweets/comment-page-1/#comment-14634

A weird sort of power.

  1. You can use [01-32] instead of [1-32] to get filenames that sort correctly, and with the -o switch you can clean up the filenames further.
  2. You can download JSON rather than XML.

Bottom line:

curl -k -u user:pass -o tweets-#1.json 'https://twitter.com/statuses/user_timeline.json?count=100&page=[01-32]'
perl -MJSON::XS -E'@s=map{local@ARGV=$_;@{decode_json<>}}@ARGV;delete@{$_}{qw(user source)}for@s;say encode_json\@s' -- tweets-* | json_xs

Tad easier…
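
For readers who would rather put this in a script, the merge step of the one-liner above can be sketched in Python (a swapped-in language; the thread itself uses Perl). This assumes each downloaded file holds one JSON array of tweet objects, as the one-liner does. The function name `merge_tweets` and its `drop` parameter are hypothetical names for this example.

```python
import json
import sys


def merge_tweets(paths, drop=("user", "source")):
    """Concatenate the tweet arrays from each file and delete the given keys."""
    tweets = []
    for path in paths:
        with open(path, encoding="utf-8") as fh:
            # Each file is assumed to contain a single JSON array of tweets.
            tweets.extend(json.load(fh))
    for tweet in tweets:
        for key in drop:
            tweet.pop(key, None)  # ignore tweets that lack the key
    return tweets


if __name__ == "__main__":
    # Pretty-printing here stands in for the final "| json_xs" stage.
    json.dump(merge_tweets(sys.argv[1:]), sys.stdout, indent=2, ensure_ascii=False)
    print()
```

Saved under a name of your choosing, it would take the same file list as the Perl version, with the shell glob tweets-* supplying the pages in sorted order.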
