''See also:'' [[CUDADevelopersGuide]]

== Detection of CUDA ==

Any script (such as recon-all) can detect CUDA at run time by executing a binary called `cudadetect`. If it exits with status 0, CUDA is available; otherwise it is not. The following csh fragment shows how the detection scheme can be used in your own scripts:

{{{
# probe for CUDA; cudadetect exits 0 when a usable setup is found
cudadetect
if ($status == 0) then
    setenv CUDAENABLED
endif

# ... later in the script ...
if ($?CUDAENABLED) then
    # run the CUDA-enabled binary
else
    # run the normal CPU binary
endif
}}}

There are a few reasons why cudadetect might not detect your CUDA setup:

 * LD_LIBRARY_PATH or DYLD_LIBRARY_PATH does not contain the path where the CUDA libraries reside, which is usually `/usr/local/cuda/lib`. Check whether these environment variables include the CUDA library path; it is best to set them in `.profile` or `.cshrc`.
 * `cudadetect` works by using the `dlopen()` library call to load a shared library and look up a function in it. In this case, `cudadetect` tries to call `cudaGetDeviceCount` from `libcudart`. If either the name of the library or the name of the function changes, `cudadetect` might not work. Currently `cudadetect` works for CUDA version 2.0, and effort will be taken to ensure that it calls the correct library and API whenever CUDA is upgraded.

See also the section in [[CUDADevelopersGuide]] about how to install libcuda on a machine that doesn't have a CUDA GPU card (to support building CUDA code on such machines).

== Changes to enable CUDA during configuration (configure.in) ==

This section gives an overview of how `configure.in` is changed and what it exports. More often than not, one need not tamper with configure.in -- it has already been changed to accommodate CUDA.

 * CUDA support is enabled by passing the `--with-cuda=` switch to ./configure. If the CUDA installation is in `/usr/local/cuda` (usually the default path), the `./configure` command would carry the switch `--with-cuda=/usr/local/cuda`. Note, however, that configure.in looks for `/usr/local/cuda` and uses it if found, so `--with-cuda=/usr/local/cuda` is not strictly necessary. The script then checks whether it can find `nvcc`, the CUDA compiler. This is a sanity check: if it finds `nvcc`, ./configure assumes CUDA exists and exports several variables that are helpful for compiling CUDA programs (and that are meant to be used in Makefile.am).

=== Flags which configure.in exports ===

These flags are typically used in the Makefile.am of any binary which has a CUDA replacement.

 * '''CUDA_CFLAGS''': contains the include path of the CUDA header files (usually something like CUDA_CFLAGS="-I/usr/local/cuda/include").
 * '''CUDA_LIBS''': contains the CUDA libraries to link against (usually something like CUDA_LIBS="-L/usr/local/cuda/lib -lcuda -lcudart").
 * '''BUILDCUDA''': this important variable is exported as an AM_CONDITIONAL. If configure finds CUDA, it sets this flag; any Makefile.am can then test the flag and, if it is set, compile the CUDA-specific parts. This is illustrated in the next section.
 * '''LIBS_CUDA_MGH''': used in a Makefile.am when it is desired to build against libutils_cuda.a instead of libutils.a.
 * configure.in does '''not''' export a #define directive (such as '''FS_CUDA'''). Instead, each Makefile.am should define '''FS_CUDA''' where needed (see the examples below). This flag should be used in C sources inside #ifdef/#endif blocks, which is the preferred approach when only a few CUDA functions replace standard ones. A typical usage looks like this:

{{{
#ifdef FS_CUDA
#include "cudaproc.h"
  cuda_procedure(args);
#else
  cpu_procedure(args);
#endif // FS_CUDA
}}}

Specific rules in Makefile.am can then be used to compile the `.cu` file where `cuda_procedure()` usually resides; `cudaproc.h` contains the declaration of the `cuda_procedure()` function. More on this in the next section.
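To make the pattern concrete, here is a hedged sketch of what such a `cudaproc.h` might contain; `cuda_procedure()` and its arguments are illustrative names, not the actual FreeSurfer code. Since `nvcc` compiles the host-side code of a `.cu` file as C++, the declaration needs `extern "C"` so that plain C sources can link against it:

{{{
#ifndef CUDAPROC_H
#define CUDAPROC_H

/* Hypothetical header for a CUDA replacement routine.  nvcc compiles
 * host code in .cu files as C++, so the symbol must be given C
 * linkage to be callable from plain C sources. */
#ifdef __cplusplus
extern "C" {
#endif

/* GPU replacement for cpu_procedure(); defined in a .cu file */
void cuda_procedure( float *data, int n );

#ifdef __cplusplus
}
#endif

#endif /* CUDAPROC_H */
}}}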
If there are a large number of functions which are to be replaced this way, a separate source file should be written with a `_cuda` suffix. For instance, if `foo_file.c` is rewritten with functions calling CUDA code, the file is named `foo_file_cuda.c`.

== Tweaking Makefile.am of the binary ==

'''Listing of the Makefile.am of `mri_em_register`, illustrating how a Makefile.am changes:'''

{{{
##
## Makefile.am
##

AM_CFLAGS=-I$(top_srcdir)/include
AM_LDFLAGS=

bin_PROGRAMS = mri_em_register
mri_em_register_SOURCES=mri_em_register.c
mri_em_register_LDADD= $(addprefix $(top_builddir)/, $(LIBS_MGH))
mri_em_register_LDFLAGS=$(OS_LDFLAGS)

## ----
## CUDA
## ----

# BUILDCUDA is defined if configure.in finds CUDA
if BUILDCUDA

# rules for building cuda files
.cu.o:
	$(NVCC) -o $@ -c $< $(NVCCFLAGS) $(AM_CFLAGS) $(MNI_CFLAGS)

bin_PROGRAMS += mri_em_register_cuda
mri_em_register_cuda_SOURCES = mri_em_register.c \
	computelogsampleprob.cu findoptimaltransform.cu findoptimaltranslation.cu
mri_em_register_cuda_CFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DFS_CUDA
mri_em_register_cuda_CXXFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DFS_CUDA
mri_em_register_cuda_LDADD = $(addprefix $(top_builddir)/, $(LIBS_CUDA_MGH)) $(CUDA_LIBS)
mri_em_register_cuda_LDFLAGS = $(OS_LDFLAGS)
mri_em_register_cuda_LINK = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) \
	$(LIBTOOLFLAGS) --mode=link $(CCLD) $(mri_em_register_cuda_CFLAGS) \
	$(CFLAGS) $(mri_em_register_cuda_LDFLAGS) $(LDFLAGS) -o $@

endif

# Our release target. Include files to be excluded here. They will be
# found and removed after 'make install' is run during the 'make
# release' target.
EXCLUDE_FILES=""

include $(top_srcdir)/Makefile.extra
}}}

As one can see, the Makefile.am of a CUDA-enabled binary differs from a normal Makefile.am only in the `if BUILDCUDA ... endif` block.

'''Notes:'''

 * CUDA support is enclosed in an `if BUILDCUDA ... endif` block (BUILDCUDA is exported by configure.in if it finds CUDA).
 * The `.cu.o:` suffix rule tells make how to build any `.cu` source file, which is how CUDA C files are named; a hypothetical example of such a file appears after these notes. Note that `NVCCFLAGS`, which contains the default nvcc flags, is also exported by `configure.in`. In addition, `mri_em_register_cuda` needs `MNI_CFLAGS` to compile.
 * `bin_PROGRAMS` - the `+=` assignment makes the Makefile build `mri_em_register_cuda` in addition to `mri_em_register`.
 * `mri_em_register_cuda_SOURCES` - should contain all the C files (mri_em_register.c, which makes use of the #ifdef FS_CUDA code blocks) and all the sources containing the CUDA functions it calls (the *.cu files).
 * `mri_em_register_cuda_CFLAGS` - in addition to Automake's CFLAGS, it contains the `CUDA_CFLAGS` exported by `configure.in` and, most importantly, -DFS_CUDA, which is what distinguishes the mri_em_register_cuda binary from mri_em_register.
 * `mri_em_register_cuda_LDADD` - in addition to the other FreeSurfer libraries, it contains the `CUDA_LIBS` exported by `configure.in`, which names the CUDA libraries the binary links against. It also links against libutils_cuda.a (via $(LIBS_CUDA_MGH)) instead of libutils.a.
 * `mri_em_register_cuda_LINK` - the rule specifying how to link the compiled CUDA code.
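As mentioned in the notes, here is a minimal, hypothetical example of the kind of `.cu` source file the `.cu.o:` rule compiles. The kernel and the wrapper continue the `cuda_procedure()` sketch from earlier; none of this is the actual mri_em_register code:

{{{
// cudaproc.cu -- hypothetical CUDA source, built by the .cu.o: rule
#include <cuda_runtime.h>
#include "cudaproc.h"

// Illustrative kernel: scale every element of the buffer by 2
__global__ void ScaleKernel( float *d_data, int n ) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if( i < n ) {
    d_data[i] *= 2.0f;
  }
}

// Host wrapper with C linkage, callable from #ifdef FS_CUDA blocks
extern "C" void cuda_procedure( float *data, int n ) {
  float *d_data;
  size_t bytes = n * sizeof(float);

  // Move the input to the GPU
  cudaMalloc( (void**)&d_data, bytes );
  cudaMemcpy( d_data, data, bytes, cudaMemcpyHostToDevice );

  // Launch enough 256-thread blocks to cover all n elements
  int threads = 256;
  int blocks = ( n + threads - 1 ) / threads;
  ScaleKernel<<<blocks, threads>>>( d_data, n );

  // Copy the result back and release the device memory
  cudaMemcpy( data, d_data, bytes, cudaMemcpyDeviceToHost );
  cudaFree( d_data );
}
}}}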
== Steps to CUDA-ize a binary ==

Suppose you have a CUDA replacement for a binary (like the `mri_em_register` case above). The following steps serve as guidelines:

 * The file could be named with a `_cuda.c` suffix and put in the ''same directory'' where the `*.c` files of the binary reside. But if at all possible, keep the original source and add #ifdef FS_CUDA blocks instead, as this allows maintenance of only one file (as is done with mri_em_register.c).
 * All the `*.cu` files go in that same directory, and all the `*.h` files go in the `dev/include` directory.
 * If the code needs timing support, look at `dev/include/chronometer.h` and `dev/utils/chronometer.c` to see how to incorporate the functions for profiling your CUDA as well as CPU code.
 * Modify the `Makefile.am` of the binary to include the CUDA-specific information -- the `if BUILDCUDA ... endif` block. The following is a template (note that all the per-target variables must carry the `dummy_cuda_` prefix, matching the program name added to `bin_PROGRAMS`):

{{{
if BUILDCUDA

# rules for building cuda files
.cu.o:
	$(NVCC) -o $@ -c $< $(NVCCFLAGS) $(AM_CFLAGS)

bin_PROGRAMS += dummy_cuda
dummy_cuda_SOURCES = dummy.c \
	dummy.cu
dummy_cuda_CFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DFS_CUDA
dummy_cuda_CXXFLAGS = $(AM_CFLAGS) $(CUDA_CFLAGS) -DFS_CUDA
dummy_cuda_LDADD = $(addprefix $(top_builddir)/, $(LIBS_CUDA_MGH)) $(CUDA_LIBS)
dummy_cuda_LDFLAGS = $(OS_LDFLAGS)
dummy_cuda_LINK = $(LIBTOOL) --tag=CC $(AM_LIBTOOLFLAGS) \
	$(LIBTOOLFLAGS) --mode=link $(CCLD) $(dummy_cuda_CFLAGS) \
	$(CFLAGS) $(dummy_cuda_LDFLAGS) $(LDFLAGS) -o $@

endif
}}}

At the top of your 'main' program, you will want to acquire the CUDA device. Users can set the FREESURFER_CUDA_DEVICE environment variable to specify which of their (possibly numerous) GPUs should be used (they can find the device numbers using the deviceQuery SDK program). Your 'main' routine should be modified as follows:

{{{
// In the preamble
#ifdef FS_CUDA
#include "devicemanagement.h"
#endif

// And for main
int main( int argc, char **argv ) {
  // Variable declarations
  // ...
#ifdef FS_CUDA
  AcquireCUDADevice();
#endif
  // Original program continues
  // ...
}
}}}

If your program just uses accelerated routines from libutils_cuda, this is all you need to do.

== libutils_cuda.a ==

When CUDA code has to replace or optimize a routine in the dev/utils directory (i.e. libutils), the situation is handled as follows:

 * dev/Makefile.am is set up to build, in addition to the usual libutils.a, a second library: libutils_cuda.a. This library is built from all the same source files as libutils, plus whatever .cu files are necessary, and FS_CUDA is defined when building all of the files. For example, mrifilter.c contains CUDA code within #ifdef FS_CUDA blocks. When libutils_cuda.a is built, all the utils files are compiled with FS_CUDA defined, so the programmer just replaces the CPU code with the GPU code where needed (a sketch of this pattern is given at the end of this page). A binary such as mri_em_register_cuda or mri_ca_register_cuda then simply links against libutils_cuda.a (see the Makefile.am examples above).

== Attribution ==

A lot of the ideas on Autotools and CUDA integration were taken from [[http://code.google.com/p/beagle-lib/|beagle-lib]].
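== Example: FS_CUDA replacement inside libutils ==

As referenced in the libutils_cuda.a section above, here is a hedged sketch of the in-place replacement pattern used in files like mrifilter.c. The function, its arguments, and the GPU routine it calls are illustrative inventions, not the actual FreeSurfer code:

{{{
/* Hypothetical libutils routine: the CPU loop is replaced by a GPU
 * call when the file is compiled with -DFS_CUDA, i.e. when it is
 * being built into libutils_cuda.a rather than libutils.a. */
#ifdef FS_CUDA
#include "cudaproc.h"   /* declares the GPU routine */
#endif

void ScaleBuffer( float *data, int n )
{
#ifdef FS_CUDA
  /* GPU path: implemented in a .cu file, linked into libutils_cuda.a */
  cuda_procedure( data, n );
#else
  /* CPU path: kept as-is for the plain libutils.a build */
  int i;
  for( i = 0; i < n; i++ ) {
    data[i] *= 2.0f;
  }
#endif /* FS_CUDA */
}
}}}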