Some progress. What I did was create a profiler .csv file before running the profiler
and then import this file via the File menu. This avoids trying to run my exe via the profiler.
That is, I use the Linux commands
setenv CUDA_PROFILE 1
setenv CUDA_PROFILE_CSV 1
setenv CUDA_PROFILE_LOG test.csv
setenv CUDA_PROFILE_CONFIG myfile
setenv LD_LIBRARY_PATH "$LD_LIBRARY_PATH":/usr/local/cuda/computeprof/bin
run my CUDA program
and then run /usr/local/cuda/computeprof/bin/computeprof
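For anyone on bash rather than csh, the same setup would look like the sketch below (the `setenv` lines translated to `export`; paths as in the post, the program and launch commands left as comments since they depend on your own executable):

```shell
# bash equivalents of the csh setenv commands above
export CUDA_PROFILE=1
export CUDA_PROFILE_CSV=1
export CUDA_PROFILE_LOG=test.csv
export CUDA_PROFILE_CONFIG=myfile
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH":/usr/local/cuda/computeprof/bin

# then run your program and open the resulting test.csv via computeprof's File menu:
#   ./my_cuda_program                              # hypothetical name for your exe
#   /usr/local/cuda/computeprof/bin/computeprof
```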
I am of course still not sure what the numbers mean, but attached should be a nice plot showing
the best compute performance I have squeezed out of half of a GTX 295 (265 instructions per microsecond).
I think this is the actual performance for one of the GTX 295's multiprocessor blocks. Given the clock is 1.24 GHz,
this seems to mean an average of one instruction every 4.7 clock ticks.
This is for an artificial compute-bound kernel with zero divergence and zero warp serialisation (shared memory is
used, but no constants).
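For the record, the 4.7 figure is just the shader clock divided by the measured throughput; a quick sanity check of that arithmetic (numbers taken from above):

```shell
# 1.24 GHz shader clock = 1240 clock ticks per microsecond;
# divide by the measured 265 instructions per microsecond
# to get average clock ticks per instruction.
awk -v clocks_per_us=1240 -v instr_per_us=265 \
    'BEGIN { printf "%.1f clocks per instruction\n", clocks_per_us / instr_per_us }'
# prints: 4.7 clocks per instruction
```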
In contrast, the kernel I actually want to use works out at about 51 instructions per microsecond.
Does anyone else have figures they are prepared to share?
PS Under CentOS, computeprof's help seems to create an assistant process which, when computeprof is exited,
often goes into a CPU-hogging loop and has to be killed by hand (kill PID).
PPS I should have said the above is for 32 threads per block. It increases to 374 instructions per microsecond
with 96 threads per block (and the same for 128, 256 and 512).
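Applying the same arithmetic (mine, not the profiler's) to the 96-thread figure gives roughly one instruction every 3.3 clock ticks:

```shell
# same calculation as before, with the 96-threads-per-block throughput
awk -v clocks_per_us=1240 -v instr_per_us=374 \
    'BEGIN { printf "%.1f clocks per instruction\n", clocks_per_us / instr_per_us }'
# prints: 3.3 clocks per instruction
```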