mri_ca_register timing info
- tests conducted by NJS from 17-20 march 2012
- using subject 'ernie'
- using 'dev' build(s)
- commandline:
  mri_ca_register \
    -nobigventricles \
    -T transforms/talairach.lta \
    -align-after \
    -mask brainmask.mgz \
    norm.mgz \
    /autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca \
    transforms/talairach.m3z
- the opteron was a seychelles node (node0355), running CentOS5
- the 2.66GHz intel was machine 'namic', running CentOS6
- the 3.3GHz intel was machine 'monster', running CentOS6
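A timing sweep over thread counts, like the one reported in the table, could be scripted roughly as follows. This is a minimal sketch, not the procedure actually used for these tests: it assumes `mri_ca_register` is on the PATH, that it is run from the subject's mri directory, and that the OpenMP build takes its thread count from the standard `OMP_NUM_THREADS` environment variable. The helper names `format_runtime`, `time_run` and `sweep` are hypothetical.

```python
import os
import subprocess
import time

# Arguments copied from the command line above. The relative paths assume the
# command is run from the subject's mri directory (an assumption).
CA_REGISTER_ARGS = [
    "-nobigventricles",
    "-T", "transforms/talairach.lta",
    "-align-after",
    "-mask", "brainmask.mgz",
    "norm.mgz",
    "/autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca",
    "transforms/talairach.m3z",
]


def format_runtime(seconds):
    """Format elapsed seconds in the 'H hours, M minutes' style of the table."""
    total_minutes = int(seconds // 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"


def time_run(threads, command="mri_ca_register", args=CA_REGISTER_ARGS):
    """Run the command once with OMP_NUM_THREADS set; return elapsed seconds."""
    # Assumption: the binary honors OMP_NUM_THREADS rather than overriding it.
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.monotonic()
    subprocess.run([command, *args], env=env, check=True)
    return time.monotonic() - start


def sweep(thread_counts, **kwargs):
    """Time one run per thread count and return {threads: seconds}."""
    return {n: time_run(n, **kwargs) for n in thread_counts}
```

Calling `sweep(range(1, 9))` and passing each value through `format_runtime` would produce one runtime per thread count in the same form as the table. Each run takes the better part of an hour or more, so a single run per setting, as here, gives no estimate of run-to-run variation.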
||processor||gcc version||flags||OMP threads||mri_ca_register runtime||
||2GHz AMD Opteron 246||3.4.6||-O3 -msse2 -mfpmath=sse||NA||12 hours, 46 minutes||
||2.66GHz Intel Xeon E5430 (Core)||3.4.6||-O3 -msse2 -mfpmath=sse||NA|| hours, minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||3 hours, 8 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.1.2||-O3 -msse2 -mfpmath=sse||NA||3 hours, 10 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-O3 -msse2 -mfpmath=sse||NA||1 hours, 56 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||1||1 hours, 58 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||2||1 hours, 14 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||3||0 hours, 57 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||4||0 hours, 50 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||5||0 hours, 44 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||6||0 hours, 41 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||7||0 hours, 40 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||8||0 hours, 38 minutes||
observations
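- nehalem architecture makes a huge difference (compared to amd opteron 200 series): with the same gcc 3.4.6 and flags, the 3.3GHz W5590 took 3 hours, 8 minutes versus 12 hours, 46 minutes on the 2GHz opteron 246, about 4x faster, though the clock speeds also differ
- gcc 4.4.5 alone drops over 1 hour of time on the W5590 (1 hour, 56 minutes versus 3 hours, 10 minutes with gcc 4.1.2)
- the -ftree-vectorize -msse4.1 flags don't make any difference over -msse2 (1 hour, 58 minutes with one thread versus 1 hour, 56 minutes)
- adding omp threads gives a modest, sublinear improvement with diminishing returns per added thread: 8 threads take 38 minutes versus 1 hour, 58 minutes for 1 thread, about 3.1x faster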
- asegstatsdiff comparisons show minimal differences in results
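
The thread-scaling observation can be checked with a little arithmetic on the gcc 4.4.5 `-fopenmp` rows of the table. The sketch below is an illustration, not part of the original tests: it converts those runtimes to minutes and computes speedup relative to one thread, parallel efficiency (speedup divided by thread count), and the Karp-Flatt estimate of the serial fraction. The function names are hypothetical, and each runtime comes from a single run, so the derived numbers are rough.

```python
# Wall-clock runtimes in minutes for the gcc 4.4.5 -fopenmp rows of the table,
# keyed by OMP thread count (e.g. 1 hours, 58 minutes -> 118).
RUNTIME_MINUTES = {1: 118, 2: 74, 3: 57, 4: 50, 5: 44, 6: 41, 7: 40, 8: 38}


def speedup(threads, runtimes=RUNTIME_MINUTES):
    """Speedup relative to the single-thread run."""
    return runtimes[1] / runtimes[threads]


def efficiency(threads, runtimes=RUNTIME_MINUTES):
    """Parallel efficiency: speedup divided by the number of threads."""
    return speedup(threads, runtimes) / threads


def serial_fraction(threads, runtimes=RUNTIME_MINUTES):
    """Karp-Flatt metric: experimentally determined serial fraction."""
    if threads < 2:
        raise ValueError("the Karp-Flatt metric needs at least 2 threads")
    s = speedup(threads, runtimes)
    return (1 / s - 1 / threads) / (1 - 1 / threads)


for n in sorted(RUNTIME_MINUTES):
    line = f"{n} threads: speedup {speedup(n):.2f}x, efficiency {efficiency(n):.0%}"
    if n > 1:
        line += f", serial fraction {serial_fraction(n):.2f}"
    print(line)
```

At 8 threads the speedup is 118/38, about 3.1x, which is roughly 39% parallel efficiency. Linear scaling would mean 8x at 8 threads, so the gain here is well short of linear.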