The mrisAsynchronousTimeStep_optionalDxDyDzUpdate function in mrisurf.c implements an algorithm that is moving all the vertexs of the surface slightly, in preferred directions, while constrained to not letting faces intersect.

The serial alternately swept through the vertexs from low numbered to the high numbered, and high to low. However since each vertex movement moves a face, the behavior of the first vertex moved has the potential, and does, affect the movement of some of the subsequent vertexs. This algorithm can not be parallelized easily and still get the same answers.

I replaced this code with one that, for each sweep, does three things

  1. It calculates for each vertex the [xLo.xHi, yLo..yHi, zLo..zHi] box that bounds its possible new positions
  2. It partitions the volume into subvolumes, and for each face it calculates a set of subvolumes that will contain that face regardless of how its vertexs move within their bounding box. If that is a single subvolume, the face is assigned to it, otherwise the face is assigned to the whole volume. Even getting close to the surface of a subvolume causes assignment to the whole volume, to allow for rounding errors.
  3. If every face that a vertex belongs to is in the same subvolume, the vertex is assigned to that subvolume, otherwise the vertex is assigned to the whole volume.

Fortunately all the above steps are easily parallelized.

Once this is done, the subvolumes are processed in parallel, to decide on the movement of all the vertexs assigned to them. Because the faces within them are not moving far enough to impact the others subvolumes, this can be done in parallel.

This approach - exploiting the physical dimensions of the surface to partition the work into mostly independent chunks, and then doing the rest some how - does complicate the code, but it matches how the real world does things such as inflating crumpled balloons.