Haswell AVX2 Performance Test


In 2013, Intel introduced the 'Haswell' chip architecture, which included silicon to improve performance via the new 256-bit-wide AVX2 instructions.

At the Martinos Center, the machine 'nike' has this chip and runs CentOS 7, whose gcc 4.8 supports the compiler flag that enables AVX2 instructions.

To test AVX2 performance, freesurfer was built with and without AVX2 support, and recon-all was executed on nike with each build.

In summary, only a tiny improvement was found: under 3 minutes (about 1%) was saved on a 'parallelized' run, and about 19 minutes (about 5%) on a non-parallelized run.

Test Setup

The test data is here:

cd /autofs/space/nike_001/users/nicks/subjects/Haswell-AVX2-Testing

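The avx2 and non-avx2 builds, created by z.kaufman, are links in that directory; the run script below refers to them as stable_install_haswsell and stable_install_no_haswsell.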


The haswell build enabled the AVX2 compiler flag in the freesurfer source build and also in the VXL build, as VXL performs a lot of freesurfer math functions.

The script 'runloop' was used to run recon-all with and without parallelization, just after nike was rebooted and during a time when nobody was running other tasks:

foreach r (1 2 3 4)
 # each block selects a build, then returns to the test directory;
 # the popd lines and the closing 'end' are assumed, the original listing lacks them
 pushd stable_install_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-haswell-1 -all -parallel -openmp 8 -clean

 pushd stable_install_no_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-NOhaswell-1 -all -parallel -openmp 8 -clean

 pushd stable_install_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-haswell-2 -all

 pushd stable_install_no_haswsell
 source /homes/11/nicks/bin/setfreesurferhome
 popd
 recon-all -s nick-NOhaswell-2 -all -clean
end

Grepping for 'recon-all-run-time' in each subject's recon-all.log shows the total run time of the completed run.
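The same lookup can be scripted when several subjects need to be compared. The following is a minimal sketch, not part of the original test: it assumes the log line contains the marker 'recon-all-run-time' followed by the elapsed hours as a decimal number, and that the log sits at scripts/recon-all.log under the subject directory. The function names and the sample line are hypothetical.

```python
import re
from pathlib import Path

# Assumption: the run-time line contains this marker, followed later on the
# same line by the elapsed time in hours as a decimal number.
MARKER = "recon-all-run-time"


def run_time_hours(log_text):
    """Return the hours on the last line containing MARKER, or None."""
    hours = None
    for line in log_text.splitlines():
        if MARKER in line:
            # Look only at the text after the marker and keep the last number.
            numbers = re.findall(r"\d+(?:\.\d+)?", line.split(MARKER, 1)[1])
            if numbers:
                hours = float(numbers[-1])
    return hours


def run_time_for_subject(subjects_dir, subject):
    """Read <subjects_dir>/<subject>/scripts/recon-all.log (assumed layout)."""
    log = Path(subjects_dir) / subject / "scripts" / "recon-all.log"
    return run_time_hours(log.read_text(errors="replace"))


# Hypothetical sample line, used only to exercise the parser.
sample = "#@#%# recon-all-run-time-hours 2.888\n"
print(run_time_hours(sample))
```

If a log holds several completed runs, this sketch reports the most recent one, since it keeps the last matching line.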



                       w/o avx2                          w/ avx2
-parallel -openmp 8    2.913 hours (nick-NOhaswell-1)    2.888 hours (nick-haswell-1)
-parallel -openmp 8    2.879 hours (nick-NOhaswell-1)    2.835 hours (nick-haswell-1)
(single cpu)           6.499 hours (nick-NOhaswell-2)    6.181 hours (nick-haswell-2)
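To put the run times above in perspective, the time saved can be expressed in minutes and as a percentage of the non-avx2 run time. This is a small arithmetic sketch using only the hours reported above; the function name savings is illustrative.

```python
# (configuration, hours without avx2, hours with avx2), copied from the
# run times reported above.
RUNS = [
    ("-parallel -openmp 8", 2.913, 2.888),
    ("-parallel -openmp 8", 2.879, 2.835),
    ("(single cpu)", 6.499, 6.181),
]


def savings(hours_without, hours_with):
    """Return (minutes saved, percent of the non-avx2 run time saved)."""
    saved_hours = hours_without - hours_with
    return saved_hours * 60.0, 100.0 * saved_hours / hours_without


for label, without, with_avx2 in RUNS:
    minutes, percent = savings(without, with_avx2)
    print(f"{label:22s} {minutes:5.1f} min saved ({percent:.1f}%)")
```

This works out to about 1.5 and 2.6 minutes (roughly 1%) for the two parallelized runs and about 19 minutes (roughly 5%) for the single-cpu run.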


These commands were run to determine whether avx2 changes the results:

asegstatsdiff nick-haswell-1 nick-NOhaswell-1
asegstatsdiff nick-haswell-2 nick-NOhaswell-2
aparcstatsdiff nick-haswell-1 nick-NOhaswell-1 lh aparc thickness
aparcstatsdiff nick-haswell-2 nick-NOhaswell-2 lh aparc thickness

It was found that results DO differ: for example, left hippocampus by -1.36%, right hippocampus by 0.93%, left entorhinal by -1.07%, and right entorhinal by 3.58% (!?). These differences were identical between the parallelized and non-parallelized runs, confirming that -openmp does not change results (also confirmed by comparing nick-NOhaswell-1 and nick-NOhaswell-2).


It does not appear worthwhile to provide a Haswell AVX2-optimized build to the public, given the maintenance overhead and the minimal performance increase.

Anyone who wants an avx2-enabled build can still create one by passing these flags to configure on a CentOS 7 system:

--with-avx2 --with-vxl-dir=/usr/pubsw/packages/1.14.0_centos7_build

Note: running avx2-enabled binaries on a processor that lacks AVX2 support (for example, a pre-Haswell Intel chip) will cause the binary to core dump.
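One way to guard against this is to check for AVX2 support before launching an avx2-enabled binary. The sketch below is an illustration rather than anything used in the original test: it assumes an x86 Linux system, where /proc/cpuinfo lists CPU features on a 'flags' line, and the function names are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Collect the CPU feature flags from /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux reports features on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def has_avx2(cpuinfo_path="/proc/cpuinfo"):
    """Return True if the kernel reports the avx2 feature flag."""
    return "avx2" in cpu_flags(Path(cpuinfo_path).read_text())


if __name__ == "__main__":
    if Path("/proc/cpuinfo").exists():
        if has_avx2():
            print("avx2 supported")
        else:
            print("avx2 NOT supported; use the regular build")
    else:
        print("no /proc/cpuinfo; cannot check on this system")
```

A wrapper script could call has_avx2() and fall back to the non-avx2 install when it returns False.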

Avx2Testing (last edited 2021-05-03 08:07:10 by DevaniCordero)