
===Overview===

A divergence between two runs that should produce identical results is detected by a difference either in the stdout lines or in the created files. The problem is to find when the divergence first happens.

My general approach is

===Details===

The most useful additional output was the hash of the MRIS, because the MRIS is the main data that one step outputs and the next step takes as input. If its hash is the same in both runs at a given point, the final outputs are likely to be the same as well.

The comparison soon revealed where the divergence happened, but to pin it down I added code to romp_support.c that counts how many parallel loops are executed (not iterations, but how many times each loop as a whole is run). To enable this tracing, at the start of utils/romp_support.c there is a line

Rebuild. Each run now prints stdout lines showing when the parallel loops execute and, at the end, a list of all the loops that executed.
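The tracing code in romp_support.c is not reproduced here. As a hedged illustration of the idea only, the sketch below counts how many times each parallel loop is entered, keyed by source file and line, prints a trace line on each entry, and prints a summary at the end. `LOOP_TRACE`, `loop_trace_enter`, `loop_trace_summary` and `halve` are hypothetical names, not the romp_support API.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical registry of parallel-loop sites (not the romp_support API). */
#define MAX_LOOP_SITES 256

typedef struct {
    const char *file; /* source file containing the loop */
    int line;         /* source line of the LOOP_TRACE() call */
    long count;       /* times the loop as a whole was entered */
} LoopSite;

static LoopSite loop_sites[MAX_LOOP_SITES];
static int n_loop_sites = 0;
static long loop_entries = 0; /* total loop entries so far, in execution order */

/* Call from serial code just before a parallel loop; not thread-safe. */
static void loop_trace_enter(const char *file, int line) {
    int i;
    for (i = 0; i < n_loop_sites; i++)
        if (loop_sites[i].line == line && strcmp(loop_sites[i].file, file) == 0)
            break;
    if (i == n_loop_sites) { /* first time this loop runs: register it */
        assert(n_loop_sites < MAX_LOOP_SITES);
        loop_sites[i].file = file;
        loop_sites[i].line = line;
        loop_sites[i].count = 0;
        n_loop_sites++;
    }
    loop_sites[i].count++;
    loop_entries++;
    printf("parallel loop #%ld at %s:%d\n", loop_entries, file, line);
}

#define LOOP_TRACE() loop_trace_enter(__FILE__, __LINE__)

/* At the end of the run: list every loop that executed and how often. */
static void loop_trace_summary(void) {
    for (int i = 0; i < n_loop_sites; i++)
        printf("%s:%d executed %ld times\n", loop_sites[i].file,
               loop_sites[i].line, loop_sites[i].count);
}

/* Example caller: the trace call sits just outside the parallel region. */
static void halve(float *v, int n) {
    LOOP_TRACE();
#pragma omp parallel for
    for (int i = 0; i < n; i++)
        v[i] *= 0.5f;
}
```

Because the counter is updated only from serial code, the sequence of `parallel loop #k` lines is the same whatever the thread count, so two runs' traces can be diffed line by line.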

Next, add fprintf calls near those parallel loops to show their inputs and outputs. To show an MRIS, simply print its hash using mris_print_hash. An MRIS in the same state always has the same hash, and it is highly unlikely that two different states will have the same hash.
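mris_print_hash is FreeSurfer's own function and its internals are not shown here. The sketch below only illustrates the same idea on a hypothetical `Surface` stand-in for the MRIS (the real structure has many more fields), using FNV-1a as a swapped-in choice of hash. `Surface`, `Vertex`, `surface_hash`, `print_surface_hash` and `scale_surface` are all hypothetical names.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-ins for the MRIS state. */
typedef struct { float x, y, z; } Vertex;
typedef struct { int nvertices; Vertex *vertices; } Surface;

#define FNV_OFFSET 14695981039346656037ULL /* 64-bit FNV offset basis */
#define FNV_PRIME  1099511628211ULL        /* 64-bit FNV prime */

/* FNV-1a: fold each byte into the running hash. */
static uint64_t fnv1a(uint64_t h, const void *data, size_t len) {
    const unsigned char *p = (const unsigned char *)data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hash field by field so struct padding bytes never enter the hash. */
static uint64_t surface_hash(const Surface *s) {
    uint64_t h = FNV_OFFSET;
    h = fnv1a(h, &s->nvertices, sizeof s->nvertices);
    for (int i = 0; i < s->nvertices; i++) {
        const Vertex *v = &s->vertices[i];
        h = fnv1a(h, &v->x, sizeof v->x);
        h = fnv1a(h, &v->y, sizeof v->y);
        h = fnv1a(h, &v->z, sizeof v->z);
    }
    return h;
}

static void print_surface_hash(FILE *out, const char *where, const Surface *s) {
    fprintf(out, "%s: surface hash %016" PRIx64 "\n", where, surface_hash(s));
}

/* Example: bracket a parallel loop with hash prints of its input and output. */
static void scale_surface(Surface *s, float factor) {
    print_surface_hash(stdout, "scale_surface input", s);
#pragma omp parallel for
    for (int i = 0; i < s->nvertices; i++) {
        s->vertices[i].x *= factor;
        s->vertices[i].y *= factor;
        s->vertices[i].z *= factor;
    }
    print_surface_hash(stdout, "scale_surface output", s);
}
```

The hash is over the exact bit patterns, so even a last-bit floating-point difference between runs changes it, which is what a determinism check wants.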

Diffing the outputs of the two runs should then let you zoom in on where they first differ. The biggest problem is finding shared variables: variables written by one thread while being read or written by another. Intel Inspector is designed to find these and is very powerful.
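A plain `diff` of the two captured stdout files is usually enough. When only the start of the divergence matters, a small helper can report just the first differing line. This is a minimal sketch; `first_diff_line` is a hypothetical name, and it assumes both outputs have already been read into NUL-terminated buffers.

```c
#include <stddef.h>
#include <string.h>

/* Return the 1-based number of the first line at which two captured outputs
   differ, or 0 if they are identical. */
static size_t first_diff_line(const char *a, const char *b) {
    size_t line = 1;
    for (;;) {
        const char *ea = strchr(a, '\n');
        const char *eb = strchr(b, '\n');
        size_t la = ea ? (size_t)(ea - a) : strlen(a);
        size_t lb = eb ? (size_t)(eb - b) : strlen(b);
        /* Different length, different bytes, or one buffer ends here: differ. */
        if (la != lb || memcmp(a, b, la) != 0 || (ea == NULL) != (eb == NULL))
            return line;
        if (ea == NULL) /* both ended on this line and matched */
            return 0;
        a = ea + 1;
        b = eb + 1;
        line++;
    }
}
```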

A simple, fast-executing alternative that finds some of these violations is to have the code track, for each object, which iterations of an omp parallel loop have read or written it, and check that any object that is written is accessed by only one iteration, while every other object is only read. Sadly, this requires adding significant code. In some projects I have worked on, major classes have a member that specifies the object's current owner.
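A minimal sketch of that check, assuming a hypothetical `AccessRecord` attached to each tracked object and a checking build that runs the loop body serially. Running serially still exposes every cross-iteration conflict, whatever schedule the threads would use; a truly concurrent version would need atomic updates. `access_note`, `access_conflict` and `count_conflicts` are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-object access record for a serial checking run. */
typedef struct {
    int first_iter; /* first loop iteration that touched the object, -1 if none */
    int shared;     /* 1 once a second, different iteration touches it */
    int written;    /* 1 once any iteration writes it */
} AccessRecord;

static void access_reset(AccessRecord *r) {
    r->first_iter = -1;
    r->shared = 0;
    r->written = 0;
}

/* Record that iteration 'iter' read (is_write == 0) or wrote the object. */
static void access_note(AccessRecord *r, int iter, int is_write) {
    assert(iter >= 0);
    if (r->first_iter < 0)
        r->first_iter = iter;
    else if (r->first_iter != iter)
        r->shared = 1;
    if (is_write)
        r->written = 1;
}

/* Safe if the object is only read, or only touched by one iteration. */
static int access_conflict(const AccessRecord *r) {
    return r->shared && r->written;
}

#define NOBJ 5 /* objects 0..3: per-iteration outputs; object 4: read-only input */

/* Simulate a loop body's accesses; 'buggy' adds a read of a neighbour's output. */
static int count_conflicts(int buggy) {
    AccessRecord rec[NOBJ];
    for (int k = 0; k < NOBJ; k++)
        access_reset(&rec[k]);
    for (int i = 0; i < NOBJ - 1; i++) {
        access_note(&rec[NOBJ - 1], i, 0); /* every iteration reads the input */
        access_note(&rec[i], i, 1);        /* iteration i writes its own output */
        if (buggy)
            access_note(&rec[(i + 1) % (NOBJ - 1)], i, 0); /* reads another's output */
    }
    int conflicts = 0;
    for (int k = 0; k < NOBJ; k++)
        if (access_conflict(&rec[k])) {
            printf("object %d: written and touched by several iterations\n", k);
            conflicts++;
        }
    return conflicts;
}
```

In real code the `access_note` calls would go wherever the loop body touches a tracked object, which is where the significant extra code comes from.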

Another alternative, once the problem loop has been identified, is to run its iterations in a different order (say, high to low), printing out the behavior of each iteration. Each iteration should behave the same regardless of order, so the iterations that do different things give a hint as to the problem.
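A minimal sketch of the order-reversal check, assuming a hypothetical loop `body` with a hidden shared variable `carry` that one iteration writes and another reads. Running low-to-high and high-to-low and comparing per-iteration results singles out the iteration whose behavior depends on order. All names here are hypothetical.

```c
#include <stdio.h>

#define NITER 6

/* Hypothetical hidden shared state: written by iteration 2, read by iteration 4. */
static int carry;

static int body(int i) {
    int result = 10 * i;
    if (i == 2)
        carry = 1;       /* write to shared state */
    if (i == 4)
        result += carry; /* read of shared state */
    return result;
}

/* Run all iterations low-to-high (reverse == 0) or high-to-low, saving each result. */
static void run_loop(int reverse, int out[NITER]) {
    carry = 0;
    for (int k = 0; k < NITER; k++) {
        int i = reverse ? NITER - 1 - k : k;
        out[i] = body(i);
    }
}

/* Print the iterations whose behavior depends on order; return how many. */
static int report_order_dependence(void) {
    int fwd[NITER], rev[NITER], ndiff = 0;
    run_loop(0, fwd);
    run_loop(1, rev);
    for (int i = 0; i < NITER; i++)
        if (fwd[i] != rev[i]) {
            printf("iteration %d: lo-to-hi %d, hi-to-lo %d\n", i, fwd[i], rev[i]);
            ndiff++;
        }
    return ndiff;
}
```

Here only iteration 4 is reported, pointing at `carry` as the shared variable.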

===Getting the same answer for different reasons===

Sometimes the code searches a list of candidates for any one that matches a criterion, e.g., does this face intersect any other face?

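The function that returns true or false should also return at least one matching candidate, so that if one run finds a match and another does not, we know at least one specific candidate to investigate. Logging the candidate also exposes runs that return the same answer for different reasons, because they report different matching candidates.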
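A minimal sketch of such a search, using a hypothetical 1-D `Segment` (an interval) in place of a face and a hypothetical `any_overlap` that returns the boolean and also reports the first matching candidate. Logging that candidate in both runs shows which object to investigate when the answers, or the reasons behind them, differ.

```c
#include <stdio.h>

/* Hypothetical 1-D stand-in for a face: the interval [lo, hi]. */
typedef struct { double lo, hi; } Segment;

static int overlaps(const Segment *a, const Segment *b) {
    return a->lo <= b->hi && b->lo <= a->hi;
}

/* Return 1 if segment 'self' overlaps any other segment, and report the
   first matching candidate in *match (-1 if there is none). */
static int any_overlap(const Segment *segs, int n, int self, int *match) {
    *match = -1;
    for (int j = 0; j < n; j++) {
        if (j == self)
            continue;
        if (overlaps(&segs[self], &segs[j])) {
            *match = j;
            return 1;
        }
    }
    return 0;
}

/* Log both the answer and the candidate so two runs' outputs can be diffed. */
static int any_overlap_logged(const Segment *segs, int n, int self) {
    int match;
    int found = any_overlap(segs, n, self, &match);
    printf("any_overlap(%d) = %d, candidate %d\n", self, found, match);
    return found;
}
```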