CUDA Developers Guide

See also: GpuDevelopersGuide

This page is the root of the CUDA development documentation; all related pages should be created beneath it. For instance, a CUDATesting page would go at http://surfer.nmr.mgh.harvard.edu/fswiki/CUDADevelopersGuide/CUDATesting, and a link to that page should be added here.

Enabling CUDA in the Build Environment

The CUDAEnabling page describes the changes that have been made to the build environment and the changes you need to make to add a CUDA-enabled binary to it.

Development Notes

Running within recon-all

The convention for running CUDA-enabled executables within recon-all is as follows. Rather than adding a --use-cuda switch to a single executable, a separate executable is built with the suffix _cuda, e.g. mri_em_register_cuda (paired with mri_em_register). The recon-all script itself accepts a -use-cuda switch, which makes it run the CUDA-enabled executable in place of the default (i.e. mri_em_register_cuda instead of mri_em_register).

Refer to CUDAEnabling, particularly the Makefile.am example.
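
The naming convention described above can be illustrated with a minimal sketch. This is not how recon-all is implemented (recon-all is a shell script); the function name `select_executable` and the `CUDA_ENABLED` set are hypothetical, and the fallback to the CPU binary for tools without a GPU build is an assumption of this sketch. The set lists only the tools named in the Benchmarks section of this page.

```python
# Hypothetical illustration of the "_cuda" suffix convention.
# Assumption: only the tools listed in the Benchmarks section have a GPU build.
CUDA_ENABLED = {
    "mri_em_register", "mri_ca_register", "mri_watershed", "mri_ca_label",
    "mri_robust_register", "mri_glmfit", "mris_sphere", "mris_inflate",
    "mris_flatten", "mris_volmask",
}


def select_executable(base_name: str, use_cuda: bool) -> str:
    """Return the binary name to run for one processing step."""
    # The GPU build is a separate binary named <cpu_binary>_cuda.
    if use_cuda and base_name in CUDA_ENABLED:
        return base_name + "_cuda"
    # Assumed fallback: run the default CPU binary.
    return base_name


if __name__ == "__main__":
    print(select_executable("mri_em_register", use_cuda=False))
    print(select_executable("mri_em_register", use_cuda=True))
```

Keeping the GPU build as a separate binary means the choice is made once, by name, in the calling script, and the CPU executable is left unchanged.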

Benchmarks

Executables with the _cuda suffix are the GPU-enabled versions.

||CPU||GPU||Approximate speed-up||
||mri_em_register: 33min||mri_em_register_cuda: 3min||10x||
||mri_ca_register||mri_ca_register_cuda||||
||mri_watershed: 10min||mri_watershed_cuda||10x||
||mri_ca_label: 60min||mri_ca_label_cuda||15x||
||mri_robust_register: 30min||mri_robust_register_cuda||15x||
||mri_glmfit (monte-carlo sim): 600min||mri_glmfit_cuda||150x||
||mri_glmfit (permutation): 300min||mri_glmfit_cuda||100x||
||mris_sphere: 140min||mris_sphere_cuda: 20min||7x||
||mris_inflate||mris_inflate_cuda||||
||mris_flatten||mris_flatten_cuda||||
||mris_volmask: 60min||mris_volmask_cuda||||
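
As a rough check on the figures above, the speed-up can be recomputed as CPU time divided by GPU time for the two rows that list both timings. For rows that list only a CPU time and a speed-up factor, the same relation gives an implied GPU time; those values are derived estimates, not measurements reported on this page. The function names `speedup` and `implied_gpu_minutes` are hypothetical.

```python
# (CPU minutes, GPU minutes) for the rows where both timings are listed.
MEASURED_MIN = {
    "mri_em_register": (33, 3),
    "mris_sphere": (140, 20),
}


def speedup(cpu_minutes: float, gpu_minutes: float) -> float:
    """Speed-up factor: CPU run time divided by GPU run time."""
    return cpu_minutes / gpu_minutes


def implied_gpu_minutes(cpu_minutes: float, factor: float) -> float:
    """GPU run time implied by a CPU time and a quoted speed-up (an estimate)."""
    return cpu_minutes / factor


if __name__ == "__main__":
    for tool, (cpu, gpu) in MEASURED_MIN.items():
        print(f"{tool}: {speedup(cpu, gpu):.1f}x")
    # mri_glmfit (monte-carlo sim): 600 min on CPU at a quoted 150x
    print(f"mri_glmfit_cuda (implied): {implied_gpu_minutes(600, 150):.1f} min")
```

The mri_em_register row works out to 11x, so the 10x in the table reads as a rounded figure.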

HPC w/ CUDA Tutorials

NVIDIA's HPC with CUDA Tutorials

CUDADevelopersGuide (last edited 2012-04-09 10:34:24 by NickSchmansky)