AVX2 Performance Test

Summary

In 2013, Intel introduced the 'Haswell' chip architecture, which included silicon to improve performance via new 256-bit-wide instructions, known as AVX2 (Advanced Vector Extensions 2).

At the Martinos Center, the machine 'nike' has this chip and runs CentOS 7, whose gcc 4.8 supports the flag that enables AVX2 instructions.
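One way to confirm that a Linux machine such as nike exposes AVX2 is to look for the `avx2` token in the `flags` line of `/proc/cpuinfo`. The sketch below is illustrative only and was not part of the original test; the helper name `has_avx2` and the sample text are hypothetical.

```python
def has_avx2(cpuinfo_text):
    """Return True if a 'flags' line in /proc/cpuinfo-style text lists avx2."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # Match the whole token so that plain 'avx' does not count as 'avx2'.
        if sep and key.strip() == "flags" and "avx2" in value.split():
            return True
    return False


# Hypothetical excerpt for illustration. On a real Linux machine use:
#   has_avx2(open("/proc/cpuinfo").read())
sample = "processor\t: 0\nflags\t\t: fpu sse2 avx avx2 bmi2\n"
print(has_avx2(sample))
```

A CPU without the flag would either fail to run an AVX2 build or crash with an illegal-instruction error, so a check like this is worth doing before comparing builds.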

To test AVX2 performance, freesurfer was built with and without AVX2 support, and recon-all was run on nike with each build.

In summary, only a tiny improvement was found: roughly 2 to 3 minutes (1 to 2%) was saved on a 'parallelized' run, and about 20 minutes (about 5%) on a non-parallelized run.

Test Setup

The test data is here:

cd /autofs/space/nike_001/users/nicks/subjects/Haswell-AVX2-Testing
setenv SUBJECTS_DIR $PWD

The avx2 and non-avx2 builds, created by z.kaufman, are links:

/space/nike/1/users/zkaufman/freesurfer_centos7/stable_install_haswsell
/space/nike/1/users/zkaufman/freesurfer_centos7/stable_install_no_haswsell

The haswell build included this flag in the freesurfer source build:

-march=avx2

and also in the VXL build, as VXL performs many of freesurfer's math functions.

The script 'runloop' was used to run recon-all with and without parallelization, just after nike was rebooted and during a time when nobody was running other tasks:

#! /bin/tcsh

# Repeat the full set of four runs four times.
foreach r (1 2 3 4)
 # AVX2 build, parallelized run
 pushd stable_install_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-haswell-1 -all -parallel -openmp 8 -clean

 # non-AVX2 build, parallelized run
 pushd stable_install_no_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-NOhaswell-1 -all -parallel -openmp 8 -clean

 # AVX2 build, non-parallelized run
 pushd stable_install_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-haswell-2 -all

 # non-AVX2 build, non-parallelized run
 pushd stable_install_no_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-NOhaswell-2 -all -clean
end

Grepping for 'recon-all-run-time' in each subject's recon-all.log shows the total runtime of the completed run.
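The same lookup can be scripted when several logs need to be compared. This is a minimal sketch, assuming the log reports the value on a line of the form `recon-all-run-time-hours <number>`; that line format, the helper name `run_time_hours`, and the sample text are assumptions for illustration and should be checked against a real recon-all.log.

```python
import re

# Assumed line format: "... recon-all-run-time-hours 2.888"
_RUNTIME = re.compile(r"recon-all-run-time\S*\s+([0-9]*\.?[0-9]+)")


def run_time_hours(log_text):
    """Return the last reported run time as a float, or None if absent."""
    matches = _RUNTIME.findall(log_text)
    # A log may hold several runs; the last match belongs to the latest one.
    return float(matches[-1]) if matches else None


# Hypothetical log excerpt, not copied from a real log.
sample_log = "some earlier output\n#@#%# recon-all-run-time-hours 2.888\n"
print(run_time_hours(sample_log))
```

Taking the last match matters here because the loop reuses the same subject names, so a log may contain more than one run time.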

Results

||run||w/o avx2 (hours)||w/ avx2 (hours)||% difference||
||-parallel -openmp 8||2.913||2.888||0.86%||
||-parallel -openmp 8||2.879||2.835||1.53%||
||single cpu||6.499||6.181||4.89%||
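The percentage column appears to be the time saved relative to the non-AVX2 run. The sketch below recomputes it from the reported run times and converts the saving to minutes, assuming the times are in hours; the function names are hypothetical.

```python
def percent_saved(without_avx2, with_avx2):
    """Time saved by the AVX2 build, as a percentage of the non-AVX2 time."""
    return 100.0 * (without_avx2 - with_avx2) / without_avx2


def minutes_saved(without_avx2, with_avx2):
    """Time saved in minutes, assuming both inputs are in hours."""
    return 60.0 * (without_avx2 - with_avx2)


# Run times as reported in the results table: (label, w/o avx2, w/ avx2).
runs = [
    ("-parallel -openmp 8", 2.913, 2.888),
    ("-parallel -openmp 8", 2.879, 2.835),
    ("single cpu", 6.499, 6.181),
]

for label, base, avx2 in runs:
    pct = percent_saved(base, avx2)
    mins = minutes_saved(base, avx2)
    print(f"{label}: {pct:.2f}% saved, {mins:.1f} minutes")
```

Dividing by the non-AVX2 time treats that build as the baseline, which reproduces the percentages reported in the table.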

Conclusion

Avx2Testing (last edited 2021-05-03 08:07:10 by DevaniCordero)