mrisAsynchronousTimeStep_optionalDxDyDzUpdate

The mrisAsynchronousTimeStep_optionalDxDyDzUpdate function in mrisurf.c implements an algorithm that is moving all the vertexs of the surface slightly, in preferred directions, while constrained to not letting faces intersect.

The serial alternately swept through the vertexs from low numbered to the high numbered, and high to low. However since each vertex movement moves a face, the behavior of the first vertex moved has the potential, and does, affect the movement of some of the subsequent vertexs. This algorithm can not be parallelized easily and still get the same answers.

I replaced this code with one that, for each sweep, does three things

It calculates for each vertex the [xLo.xHi, yLo..yHi, zLo..zHi] box that bounds its possible new positions
It partitions the volume into subvolumes, and for each face it calculates a set of subvolumes that will contain that face regardless of how its vertexs move within their bounding box. If that is a single subvolume, the face is assigned to it, otherwise the face is assigned to the whole volume. Even getting close to the surface of a subvolume causes assignment to the whole volume, to allow for rounding errors.
If every face that a vertex belongs to is in the same subvolume, the vertex is assigned to that subvolume, otherwise the vertex is assigned to the whole volume.

Fortunately all the above steps are easily parallelized.

Once this is done, the subvolumes are processed in parallel, to decide on the movement of all the vertexs assigned to them. Because the faces within them are not moving far enough to impact the others subvolumes, this can be done in parallel.

I did not similarly subdivide the mrishash.c implementation, which speeds up finding intersecting faces, but I did introduce locks into it to support parallelism. Each subvolume is processed with the original serial algorithm. After all the subvolumes are done in parallel, the remaining vertexs, which were assigned to the whole volume, are also processed by the serial algorithm. If this last step was slow (which it is not), my intention was to repartition the space so that most of these would fall into a single subvolume, and repeat.

This approach - exploiting the physical dimensions of the surface to partition the work into mostly independent chunks, and then doing the rest some how - does complicate the code, but it matches how the real world does things such as inflating crumpled balloons.