Comments on: Wide Finder in C++ http://girtby.net/archives/2007/10/09/wide-finder-in-c/ (girtby.net, Wed, 30 Sep 2009 01:44:34 -0400)

By: Michel S. (Tue, 09 Oct 2007 11:07:00 +0000) http://girtby.net/archives/2007/10/09/wide-finder-in-c/comment-page-1/#comment-1629

Hi Alastair,

I did a similar implementation in C++, but without the multimap. It was slower than Ruby in the single-threaded case, and on par in speed (but with higher CPU usage) with 2 and 4 threads on a dual-core machine.

I guess C++ is not the language for string processing.

By: Richard A (Tue, 09 Oct 2007 11:07:00 +0000) http://girtby.net/archives/2007/10/09/wide-finder-in-c/comment-page-1/#comment-1630

Tim has started collecting results for each WF implementation on his new toy Sun server (http://www.tbray.org/ongoing/When/200x/2007/10/30/WF-Results), but it seems he hasn’t yet been able to run C++ programs such as yours on this highly minimal platform. Is there anything you can do to help with this? Is it a problem with building Boost? I’d like to see how the C++ version does against the various speed demons in that table…

Regarding the explicit iterator usage, you might be able to use boost::lambda or boost::bind to keep the code terse (i.e. through the _1 placeholders), but doing that might just increase the portability problems.

As far as regex performance goes, C++ is limited here only by the implementation used: there’s nothing magic about the regex engines in Ruby (or Perl, for that matter). There’s no good reason why you couldn’t simply embed calls to either language’s regex engine in a C++ program to get the best of both worlds.

One question I have: does the app perform any better if the sorting is done after the filtering is completed? Theoretically, the code needs a global lock on the map before writing new results to it, since two threads may try to increase a count from 0 (not in the map) to 1 simultaneously. Additionally, once a key is already in the map, that key’s entry must be locked while it is incremented beyond 1. That would make the map a major bottleneck in any multithreaded system. I’m pretty sure the target platform’s atomic int behaviour won’t save you here, since for an expression like i = i + 1 the atomicity covers only the individual read(i) and write(i), not the whole expression. Don’t even think about making this a multiprocess system without some fundamental restructuring…

That said, I can imagine a sophisticated compiler and library that could automatically determine which data structures can live in which process, and how each would be isolated from the others. Similar (but simpler!) analysis already drives the inlining and escape analysis that some virtual-machine-based languages use to turn heap-allocated objects into stack-allocated ones.
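To make the locking concern concrete, here is a minimal sketch of the "one shared map, one global lock" design Richard describes. It is written in modern C++ (C++17); std::mutex and std::thread only arrived in C++11, after this thread, when Boost.Thread would have been the equivalent. The SharedCounts class is hypothetical and is not taken from Alastair's code.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical shared counter: one std::map guarded by one mutex.
// Every increment takes the lock, so the find-or-insert of the key and the
// read-modify-write of its count happen as one indivisible step.
class SharedCounts {
public:
    void increment(const std::string& key) {
        std::lock_guard<std::mutex> lock(mutex_);
        ++counts_[key];  // operator[] value-initialises a new entry to 0
    }

    // Copy of the current counts, taken under the same lock.
    std::map<std::string, long> snapshot() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return counts_;
    }

private:
    mutable std::mutex mutex_;  // mutable so snapshot() can lock it
    std::map<std::string, long> counts_;
};
```

The counts come out correct, but every worker thread queues on the same mutex for every matching line, which is exactly the bottleneck described above.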

By: Alastair (Tue, 09 Oct 2007 11:07:00 +0000) http://girtby.net/archives/2007/10/09/wide-finder-in-c/comment-page-1/#comment-1631

Excellent comments, Richard.

I have been waiting to see if Tim gets to give my code a run on his 8-core box. For the record, I’m not expecting much; as I said above, my implementation was mainly about conciseness and readability, not performance.

I did look at boost::lambda and boost::bind but, like I said, they didn’t do anything for readability or conciseness. Quite happy to admit operator error though; I haven’t had much experience with boost::lambda.

I’ve thought a bit about performance lately, particularly in light of recently-acquired knowledge (http://girtby.net/archives/2007/11/6/required-viewing), and would like to tinker some more in this space. A couple of optimisations have presented themselves, mainly as a result of reading about other attempts, particularly the interesting Python implementation (http://effbot.org/zone/wide-finder.htm).

One thing the Python implementation does is filter the input lines with a non-regex search first. This allows classic Boyer-Moore (or similar) string search algorithms to kick in, which are apparently a lot more efficient than a regex search. I see no reason why I couldn’t do the same in my implementation, even just using std::search!
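A minimal sketch of that pre-filter in C++17, assuming the usual Wide Finder task of pulling article keys out of Apache log lines. The literal and the regex below are assumptions modelled on that task, and extract_key is a hypothetical helper, not code from the post.

```cpp
#include <algorithm>
#include <optional>
#include <regex>
#include <string>

// Assumed literal: only lines containing it can possibly match the regex.
const std::string kLiteral = "GET /ongoing/When/";
// Assumed pattern, modelled on the Wide Finder task; group 1 is the key.
const std::regex kPattern(R"(GET /ongoing/When/\d\d\dx/(\d\d\d\d/\d\d/\d\d/[^ .]+) )");

// Returns the captured key if the line matches, or std::nullopt otherwise.
std::optional<std::string> extract_key(const std::string& line) {
    // Cheap substring scan first: skip the regex entirely for lines
    // that do not even contain the literal prefix.
    if (std::search(line.begin(), line.end(), kLiteral.begin(), kLiteral.end()) == line.end())
        return std::nullopt;
    std::smatch m;
    if (std::regex_search(line, m, kPattern))
        return m[1].str();
    return std::nullopt;
}
```

Plain std::search makes no promise about which algorithm it uses; C++17 also lets you pass a std::boyer_moore_searcher to std::search if you want Boyer-Moore explicitly. How much the pre-filter saves depends on how many lines fail it, which this sketch does not measure.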

Parallelizing the code is trickier, obviously, but the same technique as in the Python implementation would apply: chunk the input and process the chunks in multiple threads. Again, there’s no real obstacle to adapting this to my code.
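One way to do the chunking, sketched in C++17 under the assumption that the whole log is already in memory as one std::string: cut the buffer into roughly equal byte ranges, push each cut forward past the next newline so no line is split, and give each thread its own output slot. split_into_chunks and count_lines_containing are hypothetical names, and the per-line work is a simple substring test standing in for the real matching.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <thread>
#include <utility>
#include <vector>

// Split `data` into at most `n` contiguous byte ranges [begin, end), moving
// each cut forward to just past the next '\n' so no line straddles two chunks.
std::vector<std::pair<std::size_t, std::size_t>>
split_into_chunks(const std::string& data, std::size_t n) {
    std::vector<std::pair<std::size_t, std::size_t>> chunks;
    if (n == 0) n = 1;  // treat 0 as "one chunk"
    std::size_t begin = 0;
    for (std::size_t i = 1; i <= n && begin < data.size(); ++i) {
        std::size_t end = (i == n) ? data.size() : data.size() * i / n;
        if (end < begin) end = begin;  // previous chunk already ran past here
        if (end < data.size()) {
            std::size_t nl = data.find('\n', end);
            end = (nl == std::string::npos) ? data.size() : nl + 1;
        }
        chunks.emplace_back(begin, end);
        begin = end;
    }
    return chunks;
}

// One thread per chunk; each writes only its own slot of `per_chunk`,
// so the threads share no mutable state and need no locks.
std::size_t count_lines_containing(const std::string& data,
                                   const std::string& needle,
                                   std::size_t n_threads) {
    auto chunks = split_into_chunks(data, n_threads);
    std::vector<std::size_t> per_chunk(chunks.size(), 0);
    std::vector<std::thread> workers;
    for (std::size_t c = 0; c < chunks.size(); ++c) {
        workers.emplace_back([&, c] {
            std::size_t pos = chunks[c].first;
            const std::size_t end = chunks[c].second;
            while (pos < end) {
                std::size_t eol = data.find('\n', pos);
                if (eol == std::string::npos || eol > end) eol = end;
                std::string_view line(data.data() + pos, eol - pos);
                if (line.find(needle) != std::string_view::npos) ++per_chunk[c];
                pos = eol + 1;
            }
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t v : per_chunk) total += v;  // serial combine step
    return total;
}
```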

There is no need to synchronise writes to the counts_by_key map: just give each thread its own local copy of the map. You would need to synchronise writes to the keys_by_count multimap, of course, but again a thread-local multimap might be the way to go, followed by a final non-parallel aggregation step. Something to think about, anyway. Like you, I have some doubts as to whether there would be any practical benefit here (again, Herb Sutter’s memory latency talk is looming large).
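A sketch of the thread-local approach, again in C++17: each thread counts into its own map, and only after all threads have joined is there one serial step that merges the maps and builds the count-ordered multimap. The names counts_by_key and keys_by_count come from the comment above, but their types, the CountsByKey/KeysByCount aliases and count_in_parallel are assumptions. To keep it short, each thread is handed a pre-extracted list of keys rather than a raw chunk of the log.

```cpp
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Assumed types; only the variable names mirror the comment.
using CountsByKey = std::map<std::string, long>;
using KeysByCount = std::multimap<long, std::string, std::greater<long>>;  // highest count first

KeysByCount count_in_parallel(const std::vector<std::vector<std::string>>& keys_per_thread) {
    // Phase 1 (parallel): each thread increments only its own map, so no locks.
    std::vector<CountsByKey> local(keys_per_thread.size());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < keys_per_thread.size(); ++t) {
        workers.emplace_back([&, t] {
            for (const auto& key : keys_per_thread[t]) ++local[t][key];
        });
    }
    for (auto& w : workers) w.join();

    // Phase 2 (serial): merge the per-thread maps into counts_by_key.
    CountsByKey counts_by_key;
    for (const auto& m : local)
        for (const auto& [key, n] : m) counts_by_key[key] += n;

    // Sorting happens only here, once all counting has finished.
    KeysByCount keys_by_count;
    for (const auto& [key, n] : counts_by_key) keys_by_count.emplace(n, key);
    return keys_by_count;
}
```

Because keys_by_count is built only after counting has finished, it is never touched concurrently and needs no lock, which is also the "sort after filtering" arrangement Richard asks about. Whether it is actually faster is the open question both commenters raise; the sketch does not measure it.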

By: Alastair (Tue, 09 Oct 2007 11:07:00 +0000) http://girtby.net/archives/2007/10/09/wide-finder-in-c/comment-page-1/#comment-1632

I see no reason why I couldn
