mri_ca_register timing info

  • tests conducted by NJS from 17-20 March 2012
  • using subject 'ernie'
  • using 'dev' build(s)
  • command line:

mri_ca_register \
  -nobigventricles \
  -T transforms/talairach.lta \
  -align-after \
  -mask brainmask.mgz \
  norm.mgz \
  /autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca \
  transforms/talairach.m3z
  • the Opteron was a seychelles node (node0355), running CentOS5
  • the 2.66GHz Intel was machine 'namic', running CentOS6
  • the 3.3GHz Intel was machine 'monster', running CentOS6

||processor||gcc version||flags||OMP threads||mri_ca_register runtime||
||2GHz AMD Opteron 246||3.4.6||-O3 -msse2 -mfpmath=sse||NA||12 hours, 46 minutes||
||2.66GHz Intel Xeon E5430 (Core)||3.4.6||-O3 -msse2 -mfpmath=sse||NA|| hours, minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||3 hours, 8 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.1.2||-O3 -msse2 -mfpmath=sse||NA||3 hours, 10 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-O3 -msse2 -mfpmath=sse||NA||1 hours, 56 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||1||1 hours, 58 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||2||1 hours, 14 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||3||0 hours, 57 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||4||0 hours, 50 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||5||0 hours, 44 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||6||0 hours, 41 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||7||0 hours, 40 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||8||0 hours, 38 minutes||
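
To compare the single-threaded rows directly, it may help to convert the runtime strings to minutes. The following is only a sketch: the `to_minutes` helper and the `ROWS` list are illustrative names, not part of FreeSurfer, and the values are copied by hand from the rows that use `-O3 -msse2 -mfpmath=sse` with no OMP threads. The E5430 row is omitted because its runtime was not recorded.

```python
import re

# Single-threaded rows copied from the timing table: (processor, gcc version, runtime).
# The 2.66GHz E5430 row is left out because no runtime is recorded for it.
ROWS = [
    ("2GHz AMD Opteron 246", "3.4.6", "12 hours, 46 minutes"),
    ("3.3GHz Intel Xeon W5590 (Nehalem)", "3.4.6", "3 hours, 8 minutes"),
    ("3.3GHz Intel Xeon W5590 (Nehalem)", "4.1.2", "3 hours, 10 minutes"),
    ("3.3GHz Intel Xeon W5590 (Nehalem)", "4.4.5", "1 hours, 56 minutes"),
]


def to_minutes(runtime):
    """Convert a string like '3 hours, 8 minutes' to total minutes."""
    match = re.fullmatch(r"(\d+) hours?, (\d+) minutes?", runtime)
    if match is None:
        raise ValueError(f"unrecognized runtime: {runtime!r}")
    hours, mins = (int(group) for group in match.groups())
    return hours * 60 + mins


totals = [to_minutes(runtime) for _, _, runtime in ROWS]
baseline = totals[0]  # Opteron 246 with gcc 3.4.6

for (cpu, gcc, _), total in zip(ROWS, totals):
    ratio = baseline / total
    print(f"{cpu:35s} gcc {gcc}: {total:4d} min ({ratio:.2f}x vs Opteron)")
```

On these figures the W5590 with gcc 3.4.6 comes out roughly 4x faster than the Opteron 246, and switching to gcc 4.4.5 on the same machine widens that further.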

observations

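  • Nehalem architecture makes a huge difference compared to the AMD Opteron 200 series: 3 hours, 8 minutes versus 12 hours, 46 minutes with gcc 3.4.6 and the same flags, about 4x faster
  • gcc 4.4.5 alone saves more than 1 hour (1 hour, 56 minutes versus 3 hours, 10 minutes with gcc 4.1.2)
  • the -ftree-vectorize -msse4.1 flags don't make any measurable difference over -msse2 (1 hour, 58 minutes with 1 OMP thread versus 1 hour, 56 minutes without OpenMP)
  • adding OMP threads gives a modest, sublinear performance improvement: 1 hour, 58 minutes with 1 thread, 50 minutes with 4, and 38 minutes with 8 (about 3.1x)
  • asegstatsdiff comparisons show minimal differences in results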
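
The thread-scaling observation can be checked with a little arithmetic. This sketch computes speedup and parallel efficiency (speedup divided by thread count) relative to the 1-thread run. The `RUNTIME_MIN` dictionary and `scaling` function are illustrative names, and the minute values are transcribed by hand from the gcc 4.4.5 `-fopenmp` rows.

```python
# OMP thread count -> runtime in minutes, transcribed from the -fopenmp rows
# (for example, "1 hours, 58 minutes" becomes 118).
RUNTIME_MIN = {1: 118, 2: 74, 3: 57, 4: 50, 5: 44, 6: 41, 7: 40, 8: 38}


def scaling(runtimes):
    """Return {threads: (speedup, efficiency)} relative to the 1-thread run."""
    base = runtimes[1]
    result = {}
    for threads, minutes in sorted(runtimes.items()):
        speedup = base / minutes
        result[threads] = (speedup, speedup / threads)
    return result


for threads, (speedup, efficiency) in scaling(RUNTIME_MIN).items():
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
```

With these numbers, 8 threads run roughly 3.1x faster than 1 thread, so efficiency falls below 50% at the highest thread counts. Most of the gain arrives in the first four threads.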

CaRegTimings (last edited 2012-06-20 21:04:57 by NickSchmansky)