Our algorithms are dealing with a 3D volume or surface. I am going to concentrate on the 3D mesh of vertexs, faces, and edges for this discussion.
Consider the problem of finding a maximum in a list of values. Serial code scanning from one end will easily find either the first or last of equal maxima, and get the same answer every time. Parallel code may find any of the maxima, and find a different one every time.
A similar problem is calculating an average. Floating point addition is not associative, (A+B)+C might differ slightly from A+(B+C). In serial code, the exact order of evaluation will differ depending on compiler optimizations. In parallel code it may differ depending on the number of threads, and the assignment of loop iterations to threads by the omp implementation.
A surface folding algorithm calculates a value for each point on a surface using floating point operations, and one of several almost identical values will be the fold point, resulting in different answers depending on very small differences.
Reproducible OMP programming (ROMP)
With this in mind, a loop can be categorized as
- Is being experimented on to see if it support parallel execution
- Is known to get slightly acceptably different answers
- Is thought to get exactly the same answer regardless of the number of threads
- Is verified by eye or by checking code to do so
The romp_support.h allows all the parallel loops to be annotated saying which of these it is, and then to suppress executing the earlier ones when reproducibility is needed.
Which loops to make parallel
The ROMP support is includes timing code, which produces a report on program exit showing which loops executed, how much time they used, and which of the above categories they were in.
This report has enabled us to identify all the important loops executed during recon-all, and enough of them have been modified to satisfy level 3 criteria at least, that there is no good reason to do the lower levels in parallel.