
mri_ca_register timing info

  • tests conducted by NJS from 17-20 March 2012
  • using subject 'ernie'
  • using 'dev' build(s)
  • command line:

mri_ca_register \
  -nobigventricles \
  -T transforms/talairach.lta \
  -align-after \
  -mask brainmask.mgz \
  norm.mgz \
  /autofs/cluster/freesurfer/centos6_x86_64/dev/average/RB_all_2008-03-26.gca \
  transforms/talairach.m3z
  • the Opteron was a 'seychelles' cluster node (node0355), running CentOS 4.8
  • the 2.66GHz Intel was machine 'namic', running CentOS 6.2
  • the 3GHz Intel was a 'launchpad' cluster node, running CentOS 5
  • the 3.3GHz Intel was machine 'monster', which has 8 processors, running CentOS 6.0
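
The page does not say how each run was timed or how the OMP thread count was chosen. The following is a minimal sketch of one way to collect such timings, assuming the binary was built with -fopenmp and honours the standard OMP_NUM_THREADS environment variable. The helper names `time_command` and `format_runtime` are hypothetical, and the stand-in command only keeps the sketch runnable on any machine.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, omp_threads=None):
    """Run cmd (a list of arguments) and return (elapsed_seconds, return_code)."""
    env = dict(os.environ)
    if omp_threads is not None:
        # Assumption: the program is an OpenMP build that reads OMP_NUM_THREADS.
        env["OMP_NUM_THREADS"] = str(omp_threads)
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env)
    elapsed = time.perf_counter() - start
    return elapsed, result.returncode


def format_runtime(seconds):
    """Format a duration in the 'H hours, M minutes' style used in the table."""
    total_minutes = int(round(seconds / 60))
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours} hours, {minutes} minutes"


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; substitute the
    # mri_ca_register argument list shown above to time a real run.
    cmd = [sys.executable, "-c", "pass"]
    for threads in (1, 2):
        elapsed, rc = time_command(cmd, omp_threads=threads)
        print(threads, format_runtime(elapsed), "exit code", rc)
```

Passing the command as an argument list avoids shell quoting issues with the long .gca path. Wall-clock time is measured, which is what the runtime column reports.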

||processor||gcc version||flags||OMP threads||mri_ca_register runtime||
||2GHz AMD Opteron 246||3.4.6||-O3 -msse2 -mfpmath=sse||NA||12 hours, 46 minutes||
||2.66GHz Intel Xeon E5430 (Core)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||6 hours, 3 minutes||
||3GHz Intel Xeon E5472 (Core)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||5 hours, 46 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||3.4.6||-O3 -msse2 -mfpmath=sse||NA||3 hours, 8 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.1.2||-O3 -msse2 -mfpmath=sse||NA||3 hours, 10 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-O3 -msse2 -mfpmath=sse||NA||1 hours, 56 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||1||1 hours, 58 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||2||1 hours, 14 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||3||0 hours, 57 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||4||0 hours, 50 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||5||0 hours, 44 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||6||0 hours, 41 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||7||0 hours, 40 minutes||
||3.3GHz Intel Xeon W5590 (Nehalem)||4.4.5||-fopenmp -O3 -ftree-vectorize -msse4.1 -mfpmath=sse||8||0 hours, 38 minutes||
||GPU: Tesla C2050|| || || ||0 hours, 19 minutes||

observations

  • asegstatsdiff comparisons show minimal differences in results
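  • Nehalem architecture makes a difference compared to the AMD Opteron 200 series (3 hours, 8 minutes vs. 12 hours, 46 minutes with gcc 3.4.6 and the same flags)
  • gcc 4.4.5 alone saves more than 1 hour (1 hour, 56 minutes vs. 3 hours, 10 minutes with gcc 4.1.2 on the same machine with the same flags)
  • the -ftree-vectorize -msse4.1 flags do not make any difference over -msse2 (1 hour, 58 minutes with one OMP thread vs. 1 hour, 56 minutes)
  • OMP threads plot: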

runtimes.jpg
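
The thread-scaling runs can also be summarized numerically. The sketch below converts the W5590 / gcc 4.4.5 OMP rows of the table to minutes and computes speedup and parallel efficiency relative to the 1-thread run. The runtimes are copied from the table; the function name `scaling_table` is hypothetical and introduced only for illustration.

```python
# OMP thread count -> runtime in minutes, copied from the
# W5590 / gcc 4.4.5 / -fopenmp rows of the table (e.g. 1 h 58 min = 118).
omp_runtimes_min = {1: 118, 2: 74, 3: 57, 4: 50, 5: 44, 6: 41, 7: 40, 8: 38}


def scaling_table(runtimes):
    """Return (threads, minutes, speedup, efficiency) rows, sorted by threads.

    speedup    = single-thread runtime / runtime with N threads
    efficiency = speedup / N  (1.0 would be perfect linear scaling)
    """
    base = runtimes[1]
    rows = []
    for threads in sorted(runtimes):
        speedup = base / runtimes[threads]
        efficiency = speedup / threads
        rows.append((threads, runtimes[threads], speedup, efficiency))
    return rows


if __name__ == "__main__":
    for threads, minutes, speedup, eff in scaling_table(omp_runtimes_min):
        print(f"{threads} threads: {minutes:3d} min, "
              f"speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

With these numbers, 8 threads give a speedup of about 3.1x over 1 thread, and efficiency falls with every added thread, so the gain from extra threads tapers off. For comparison, the Tesla C2050 run (19 minutes) is roughly 6x faster than the 1-thread CPU run.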

CaRegTimings (last edited 2012-06-20 21:04:57 by NickSchmansky)