Caret7:Development/OpenMP tricks


This page describes quirks discovered while using OpenMP in Caret, as well as some general parallelization guidelines.

Quirks

  • OpenMP's default loop schedule is static, with a chunk size of roughly (number of indexes) / (number of threads). Each thread is assigned one equal-sized contiguous chunk of indexes, and when it finishes its chunk it STOPS and waits for the other threads. For things like correlation, where the workload varies between indexes, this is inefficient: as soon as the most lightly loaded thread finishes, the job runs with one fewer thread than is available. The fix for such jobs is to add the clause "schedule(dynamic, 1)" to the "for" pragma (dynamic scheduling with a chunk size of 1; the default chunk size for dynamic is 1, so technically it does not need to be specified). Each thread then works on one index at a time and is handed a new index as soon as it finishes, so no thread sits idle until all indexes are claimed. Other schedule kinds can also solve the problem, such as "guided" (which starts with large chunks and tapers down to the specified chunk size at the end, again handing out chunks as threads request them). A larger chunk size may be better when the job is known to always contain many indexes, to reduce scheduling overhead. If the inner loop is extremely fast, dynamic scheduling may slow execution slightly due to the overhead of mutexes and scheduling logic, so for jobs with equal workloads the default static schedule is probably fine. A sketch of the dynamic-schedule usage appears after this list.
  • The "private" clause is broken in some compilers (notably, the mac compiler for a recent version of XCode), so instead, use the "parallel" pragma by itself first, declare what needs to be private but persist through the loop, then use the "for" pragma on the loop itself.

General guidelines

  • Parallelize at the lowest level that doesn't introduce much overhead. For example, if you can parallelize an operation on a single column of a metric, that is better than going parallel only when a metric has multiple columns, because the code then runs in parallel even for single-column input; see the sketch below.
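
A sketch of that guideline, using hypothetical smoothColumn and smoothAllColumns functions (these are not Caret's actual API, and the per-node computation is a placeholder). The per-node loop inside the column operation is parallelized, so the work runs in parallel even when only one column is processed, while the outer per-column loop stays serial.

  #include <vector>
  
  // Hypothetical helper: process one column of per-node values.
  // The per-node loop carries the parallelism.
  void smoothColumn(const std::vector<float>& input, std::vector<float>& output)
  {
      int numNodes = (int)input.size();
      output.resize(numNodes);
  #pragma omp parallel for
      for (int i = 0; i < numNodes; ++i)
      {
          output[i] = input[i] * 0.5f; // illustrative per-node computation
      }
  }
  
  // The per-column loop stays serial; parallelism comes from inside smoothColumn,
  // so a single-column metric still uses all threads.
  void smoothAllColumns(const std::vector<std::vector<float> >& columns,
                        std::vector<std::vector<float> >& results)
  {
      results.resize(columns.size());
      for (size_t col = 0; col < columns.size(); ++col)
      {
          smoothColumn(columns[col], results[col]);
      }
  }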