5 questions about the driver API, occupancy & profiling

Hello, here are some questions I have

#1

Is it possible to split a computation across multiple executables using the runtime API?
I.e., one executable loads data onto the device, the next one performs some computation, …, and so on.
I tried this, but could not get it to work.
So I switched to the driver API, and with that approach I have no problems. But the runtime API is a lot nicer to work with.

#2

Is there a nice way to give the name of the kernel to cuModuleGetFunction() if the kernel is an instance of a template?

#3

What should be specified in cuFuncSetSharedSize()? This is discussed here http://forums.nvidia.com/index.php?showtopic=68192
The SDK examples threadMigration and matrixMulDrv seem to specify the size of the shared memory allocated for the kernel (statically allocated, in those examples), but not the size of the kernel parameters (which also reside in shared memory, if I’m not mistaken).

#4

When launching a kernel with only 32 threads per block, why does each thread allocate twice the registers needed?
This is what happens according to the occupancy calculator; in register allocations per block, the formula in the spreadsheet is written as:
=CEILING(MyWarpsPerBlock*2; 4)*16*MyRegCount
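
To spell out what I mean, here is a small Python sketch of how I read that formula, i.e. as CEILING(MyWarpsPerBlock*2; 4) * 16 * MyRegCount. The function names, the warp size of 32, and the 10 registers per thread are my own assumptions for illustration; this only reproduces the spreadsheet arithmetic and says nothing definitive about what the hardware does.

```python
import math


def ceiling_to_multiple(value, multiple):
    """Round value up to the next multiple, like the spreadsheet's CEILING(value; multiple)."""
    return math.ceil(value / multiple) * multiple


def registers_per_block(threads_per_block, regs_per_thread, warp_size=32):
    """Registers allocated per block, following my reading of the spreadsheet formula."""
    # Assumption: MyWarpsPerBlock is the thread count rounded up to whole warps.
    warps_per_block = math.ceil(threads_per_block / warp_size)
    return ceiling_to_multiple(warps_per_block * 2, 4) * 16 * regs_per_thread


if __name__ == "__main__":
    regs = 10  # hypothetical register count per thread
    for threads in (32, 64, 96, 128):
        allocated = registers_per_block(threads, regs)
        needed = threads * regs
        print(f"{threads} threads: allocated {allocated}, needed {needed}, "
              f"ratio {allocated / needed:.2f}")
```

With 10 registers per thread, a 32-thread block comes out at 640 registers, twice the 320 it actually needs, and a 64-thread block also gets 640. So the doubling looks like a side effect of rounding up to a multiple of 4 in CEILING, which only bites when the warp count per block is odd.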

The values from the occupancy calculator are the same as those measured with the visual profiler.
Why do things work like this? Will future devices also work in the same manner? (Or maybe it’s not tied to how the hardware works, and it will change in the next driver release? :-)

#5

Has anyone been able to use the visual profiler for timing kernels run by MEX files in MATLAB under Windows? I have tried but failed. I’m using Windows XP Pro and MATLAB 2007b, and have tried many versions of the visual profiler (the newest being 1.0.11).

I have tried the method described here:
http://forums.nvidia.com/index.php?showtop…ndpost&p=313762
but I can’t get it to work.

If I try, for example, to profile the script speed_fft.m (which comes with the MATLAB plug-in
http://developer.nvidia.com/object/matlab_cuda.html) I get: “Error -97 in reading profiler output. Unable to open file”.
The profiler creates some .csv files, all of which contain the error message “Failed to open profile config file.”.

The error is reported before the MATLAB script has even started; after that, the script runs.
(I get the same error for every other MATLAB script I’ve tried to run.)

Bump… I still have not solved these things.
Please let me know if I’ve been too vague; if so, I’ll try to make the questions clearer.

Thanks in advance.

I am running Ubuntu 8.04 and attempting to run the cudaprof tool. I am working through the exercises created by NVIDIA. When trying to profile reverseArray_multiblock, I get the following error:

Error -97 in reading profiler output.

Unable to open file.

I found that it creates (and then deletes) a .conf file in the directory where the profiler is run. However, I have yet to see any other files being created.

Any suggestions would be much appreciated.