Notes on FreeSurfer code optimization
This page is for free-form notes on ways to optimize the FreeSurfer code base, from simple fixes to larger-scale problems.
Format is: name:, <short label> - description
* nick:, -ffast-math - Try the -ffast-math flag of gcc v4.x. Prior experiments with this on the AMD compiler showed output differences in recon-all, but perhaps selective use of this flag is possible.
* nick:, SSE math lib - Replace instances of sin, cos, log and exp with routines optimized for the SSE instructions found on Intel processors. See http://gruntthepeon.free.fr/ssemath/. Tried this, but ran into problems and gave up. With some wrangling it could become an optional part of the build, enabled via #ifdefs. Note that it is not actually a lib but a header file.
* Richard:, MRI data structure - pointer-chasing implied by `***slices` is horrific for the CPU caches (and a non-starter on the GPU). The 'chunking' alternative is much better, but needs to be used uniformly.
* Richard:, MATRIX data type - If this is only used for 4x4 affine transformations, it should be coded as such. If used more generally too, then a separate 'Affine' class should be considered (this exists for the GPU already in the file `affinegpu.cu`).
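
To make Richard's two notes concrete, here is a minimal sketch in C, not the actual FreeSurfer code. The names `Volume`, `volume_index`, `Affine4` and `affine4_apply` are hypothetical and exist only for illustration; the real MRI and MATRIX structures have more fields and different layouts. The first half stores the whole volume in one contiguous chunk and computes the offset arithmetically instead of following three pointers. The second half hard-codes the 4x4 size so the compiler can unroll the loops and no per-matrix allocation is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical volume type: one contiguous allocation instead of a
   float ***slices pointer table. This is NOT the FreeSurfer MRI struct. */
typedef struct {
    size_t width, height, depth;
    float *chunk; /* width * height * depth voxels, x varies fastest */
} Volume;

/* Allocate a zero-filled volume; returns NULL on allocation failure. */
Volume *volume_alloc(size_t width, size_t height, size_t depth)
{
    Volume *v = malloc(sizeof *v);
    if (v == NULL)
        return NULL;
    v->width = width;
    v->height = height;
    v->depth = depth;
    v->chunk = calloc(width * height * depth, sizeof *v->chunk);
    if (v->chunk == NULL) {
        free(v);
        return NULL;
    }
    return v;
}

void volume_free(Volume *v)
{
    if (v != NULL) {
        free(v->chunk);
        free(v);
    }
}

/* Offset of voxel (x, y, z): pure arithmetic, no dependent loads. */
size_t volume_index(const Volume *v, size_t x, size_t y, size_t z)
{
    assert(x < v->width && y < v->height && z < v->depth);
    return (z * v->height + y) * v->width + x;
}

/* A whole-volume pass becomes one linear sweep through memory. */
double volume_sum(const Volume *v)
{
    size_t n = v->width * v->height * v->depth;
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += v->chunk[i];
    return sum;
}

/* Hypothetical fixed-size affine transform (row-major, last row 0 0 0 1). */
typedef struct {
    float m[4][4];
} Affine4;

/* Apply the transform to a 3D point; the bottom row is implied. */
void affine4_apply(const Affine4 *a, const float in[3], float out[3])
{
    for (int r = 0; r < 3; r++)
        out[r] = a->m[r][0] * in[0] + a->m[r][1] * in[1]
               + a->m[r][2] * in[2] + a->m[r][3];
}
```

Voxel access then reads `v->chunk[volume_index(v, x, y, z)]`. The same flat buffer can be copied to a GPU in a single transfer, which a table of row pointers does not allow. Whether x should be the fastest-varying axis depends on the dominant access pattern in the existing code, so treat that ordering as an assumption.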