I converted a program from IDL to CUDA that performs some calculations on a 256x256xn cube of densities and renders a 2-D image. The program works correctly, but all the pre-processing (reading in the density cube, etc.) is still done in IDL, which passes that data to a C wrapper function (via call_external) that then calls the CUDA code.
I am now trying to optimize the program and would like to use the NVIDIA Visual Profiler to check my memory coalescing. Is there a way to get the Visual Profiler to run when the CUDA part of the program is called from IDL?
I currently can't profile the CUDA code on its own: there are far too many variables to hard-code into the CUDA function, and without the values passed from IDL to C to CUDA it cannot run.
I do have it set up so that I can run the IDL code, have it stop, and then call the C wrapper function manually, instead of having IDL run everything automatically.
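One possible way around the hard-coding problem, sketched here under assumptions rather than tested against the real program: have the C wrapper dump the inputs it receives from IDL to a binary file once, then reload them in a small standalone executable that the profiler can launch directly. The `render_params` struct, its fields, and the function names `save_inputs` and `load_inputs` are hypothetical placeholders for whatever the wrapper actually receives.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical parameter block: a stand-in for the values IDL passes in. */
typedef struct {
    int nx, ny, nz;   /* cube dimensions, e.g. 256 x 256 x n */
    float scale;      /* placeholder for one of the many scalar inputs */
} render_params;

/* Write the parameters and the density cube to a binary file.
   Returns 0 on success, -1 on failure. */
int save_inputs(const char *path, const render_params *p, const float *cube)
{
    FILE *f = fopen(path, "wb");
    if (!f) return -1;
    size_t n = (size_t)p->nx * p->ny * p->nz;
    int ok = fwrite(p, sizeof *p, 1, f) == 1 &&
             fwrite(cube, sizeof *cube, n, f) == n;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Read the parameters and cube back. The caller frees the returned
   buffer. Returns NULL on failure. */
float *load_inputs(const char *path, render_params *p)
{
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    float *cube = NULL;
    if (fread(p, sizeof *p, 1, f) == 1 &&
        p->nx > 0 && p->ny > 0 && p->nz > 0) {
        size_t n = (size_t)p->nx * p->ny * p->nz;
        cube = malloc(n * sizeof *cube);
        if (cube && fread(cube, sizeof *cube, n, f) != n) {
            free(cube);
            cube = NULL;
        }
    }
    fclose(f);
    return cube;
}
```

The idea would be to call `save_inputs` from the wrapper during one normal run from IDL, then build a tiny `main` that calls `load_inputs` and passes the result to the existing wrapper function. That executable needs no IDL session, so it could be given to the profiler as the application to launch. The file is written in the machine's native layout, so it is only meant to be read back on the same system.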
Thanks