01-10-2013 09:01 AM - edited 01-10-2013 09:12 AM
This took me a while to work out, so I thought I'd post a quick guide showing my solution. The following provides instructions on calling CUDA (nVidia's GPU computing platform) code from LabWindows. Specifically, I adapted the library "cufft", which nVidia provides as a convenient way to perform batch FFT processing without having to worry about hardware optimization.
This is the solution I used as a model:
http://stackoverflow.com/questions/9363827/building-gpl-c-program-with-cuda-module
Basically the idea is quite simple.
Step one: Write an intermediate C source file and header that "wrap" the CUDA code you wish to call in plain vanilla C code, as in the example I linked. You will call these C functions in LabWindows to execute the CUDA code. For example, this is the prototype for the wrapper function I wrote for CUDA's malloc function:
int TOMCUFFT_cudaMalloc(void** allocatedMemory, size_t numberOfBytes);
TOMCUFFT_cudaMalloc() can be called from within LabWindows. The body of the function simply calls cudaMalloc for numberOfBytes of memory, and stores the resulting address in the pointer allocatedMemory.
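To make the pattern concrete, here is a minimal sketch of what such a wrapper pair might look like (file names and the TOMCUFFT_cudaFree companion are illustrative, not from the attached files):

```cuda
/* tomcufft_wrapper.h -- plain C header, safe to include from LabWindows */
#ifndef TOMCUFFT_WRAPPER_H
#define TOMCUFFT_WRAPPER_H
#include <stddef.h>
int TOMCUFFT_cudaMalloc(void **allocatedMemory, size_t numberOfBytes);
int TOMCUFFT_cudaFree(void *memory);
#endif

/* tomcufft_wrapper.cu -- compiled by nvcc, never seen by LabWindows */
#include <cuda_runtime.h>
#include "tomcufft_wrapper.h"

extern "C" int TOMCUFFT_cudaMalloc(void **allocatedMemory, size_t numberOfBytes)
{
    /* cudaMalloc stores the device address in *allocatedMemory and
       returns a cudaError_t; cudaSuccess is 0 */
    return (int)cudaMalloc(allocatedMemory, numberOfBytes);
}

extern "C" int TOMCUFFT_cudaFree(void *memory)
{
    return (int)cudaFree(memory);
}
```

Returning the CUDA error code as a plain int keeps the header free of any CUDA types, so LabWindows never needs the CUDA headers at all.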
Step two: Generate C object code from your CUDA files. This involves compiling them with the nVidia CUDA compiler (nvcc) to object code. On Windows this produces a file with the extension ".obj" (the -c flag for nvcc; this is the default behavior when compiling through Visual Studio). Object code is compiled binary code which has not yet been linked into an executable. LabWindows is capable of linking external binary files into your project in the process of creating the final ".exe" executable. If you are using an IDE such as Visual Studio configured to compile CUDA code, you can instead compile to a static library (a ".lib" file in Windows), though I can't think of any reason why this would be more desirable than an object file.
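If you prefer the command line to Visual Studio, the compile step looks something like this (the file names are illustrative, and you need the CUDA toolkit installed so nvcc is on your path):

```shell
# -c   : compile to object code, do not link
# -m32 : target 32-bit, to match a 32-bit LabWindows project
nvcc -c -m32 tomcufft_wrapper.cu -o tomcufft_wrapper.obj
```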
IMPORTANT! Make sure to compile the object code / library with C-style linkage! Do this by adjusting the settings of the IDE you build the CUDA code in, or by wrapping the contents of the source file in:
extern "C" { }
Otherwise, LabWindows's linker will not be able to link these files in, as LabWindows only seems to support C linkage, not C++ linkage.
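One common way to satisfy both compilers from a single header is the standard __cplusplus guard, sketched here: nvcc (a C++ compiler) sees the extern "C", while LabWindows's plain C compiler skips it.

```c
/* In the shared wrapper header: C linkage for nvcc,
   invisible to a plain C compiler such as LabWindows/CVI. */
#ifdef __cplusplus
extern "C" {
#endif

int TOMCUFFT_cudaMalloc(void **allocatedMemory, size_t numberOfBytes);

#ifdef __cplusplus
}
#endif
```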
Step three: Add the files to your LabWindows project. You now have an object file (or static library) of C-linkage functions containing already-compiled CUDA code. The header file lets you use these functions in LabWindows. Add your header file and the object code / static library to the LabWindows project. Finally, because static libraries do not actually contain all of their dependencies, you have to add in some of the CUDA libraries provided by nVidia. The one dependency you will always need is "cudart.lib", the CUDA runtime. There are two copies of this file (32 and 64 bit), so make sure you get the right one.
LabWindows should now be able to compile the rest of your project, and during linking it will link in cudart.lib and your object / library file. Good luck!
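From the LabWindows side, calling the wrapper then looks like ordinary C. A sketch (the header name is hypothetical; 0 is cudaSuccess):

```c
#include "tomcufft_wrapper.h"  /* hypothetical wrapper header name */

void *deviceBuffer = NULL;

/* Allocate 1024 floats on the GPU through the wrapper. */
if (TOMCUFFT_cudaMalloc(&deviceBuffer, 1024 * sizeof(float)) != 0)
{
    /* allocation on the GPU failed; handle the error here */
}
```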
The implementation I made and tested uses the CUDA library "cufft", which is designed to perform a batch Fourier transform without the coder having to make decisions about block/grid sizes, optimization, and so on. I am attaching:
- The CUDA source file, which I compiled to a static library in Visual Studio 10, configured to compile .cu files with nvcc to object files (the default CUDA configuration) and to produce a ".lib" from them instead of an ".exe".
- The header file used to "wrap" the CUDA object code, and to add a few definitions which are present in "cufft.lib". I did not want to add "cufft.lib" directly to my LabWindows project (there's something in there the linker doesn't like), so I re-defined the items I needed (such as the data type "cufftComplex") in the header file.
- The static library, which you can try adding to your own LabWindows project if you want to perform batch Fourier transforms on the GPU without having to do anything else.
(I had to add a ".txt" extension to get the forum to accept the files, just remove it again)
EDIT: the header file comments are partially incorrect:
- the CUDA host memory allocation function is now "cudaMallocHost" instead of "cudaHostAlloc"
- instead of passing the cufft plan to the functions as a void*, it is now of the cufftHandle datatype, which I typedef at the top of the file to match the definition in the cufft library (unsigned int)
- the provided library is for 32 bit applications