Errors with CUDA 2.0 beta 2 & Visual Profiler 1.0

I guess I can/need to post my errors here now. I am experiencing the following errors on T10P:

  1. The visual profiler does not demangle method names anymore.

  2. When I click the height plot icon, I get a segmentation fault if the session ran on GT200 (and has mangled method names). Sessions that I ran earlier on an 8800GTX (which have demangled function names) show the height plot without trouble.

  3. Running simpleVoteIntrinsics does not select the right card. After changing the code to select the GT200, the test runs but indicates failure:

simpleVoteIntrinsics: Using Device 0: “GT200”
[VOTE Kernel Test 1/3]
Running <<Vote.Any>> kernel1 …
<Vote.Any>[0 - 31] = 1023-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!
<Vote.Any>[32 - 63] = -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!
<Vote.Any>[64 - 95] = -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!
<Vote.Any>[96 - 127] = -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!

[VOTE Kernel Test 2/3]
Running <<Vote.All>> kernel2 …
<Vote.All>[0 - 31] = 1023-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!
<Vote.All>[96 - 127] = -1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1-1 - FAILED!

[VOTE Kernel Test 3/3]
Running <<Vote.Any>> kernel3 …


This looks like it might be a bug. Can you save the project and attach the cudaprof project files? If you’re not able to post the files publicly, please send me a PM, and we’ll arrange another mechanism.


Tomorrow when back at work, I’ll take a project file and send it by PM to you.

That reminds me: I still have to make a small test kernel that gives different results for 32-thread blocks with and without __syncthreads(). I’ll do that tomorrow too and send it along (if the results still differ with the new driver and toolkit).

Which OS are you using when you’re seeing this problem?
Can you provide the executable that you’re running as well?

I am running a simulation in MATLAB that calls CUDA via MEX.
Actually, I have 2 projects that show the same crash; both run inside MATLAB.
On Monday I will try to profile one of the SDK examples to see if it happens there as well.

I made a CUDA 2.0 b2 application for Windows x86. It’s an .EXE / IJW/CLR made using Managed C++. It uses CUDA via a plugin DLL that is loaded at run time with LoadLibrary(). When I try to profile it using the CUDA visual profiler, I get these errors:

=== Start profiling for session 'Session1' ===

Start program 'E:/myApp/myApp.exe' run #1 ...

Program run #1 failed, exit code: 1

Error -3 in program execution.

I tried everything: enabling and disabling timestamps, profiler counters, etc., all with the same result.

I used VS2005 SP1, Windows XP Pro x86 with SP3 installed. Got the CUDA 2.0 b2 SDK and toolkit installed and running ok. Any idea why this happens?


I think your .exe is returning an error code. That is why the profiler is complaining.
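
A minimal sketch of that point, assuming nothing about the actual application: the profiler log above marks a run as failed when the process exits with a nonzero code ("Program run #1 failed, exit code: 1"), so main() should return a success code when the run went fine. The helper name exitCodeFor and the runApp() call mentioned in the comment are hypothetical, chosen only for illustration.

```cpp
#include <cstdlib>

// Hypothetical helper: maps the application's outcome to a process exit code.
// EXIT_SUCCESS signals success to the caller (it is 0 on Windows and Linux);
// EXIT_FAILURE is a nonzero value that signals failure.
int exitCodeFor(bool succeeded) {
    return succeeded ? EXIT_SUCCESS : EXIT_FAILURE;
}

// In a real application this would be used from main(), for example:
//   int main() { bool ok = runApp(); return exitCodeFor(ok); }
// where runApp() stands for whatever the program actually does.
```

The design choice here is simply to decide the exit code in one place, so that a forgotten return value or a stray "return 1" on the success path cannot make the profiler treat a good run as failed.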

You were right! I changed main() to return correct error codes and that error no longer appears, but now I get a new one:

=== Start profiling for session 'Session1' ===

Start program 'E:/myApp/myApp.exe' run #1 ...

Program run #1 completed.

Start program 'E:/myApp/myApp.exe' run #2 ...

Program run #2 completed.

Start program 'E:/myApp/myApp.exe' run #3 ...

Program run #3 completed.

Error -96 in reading profiler output.

No data rows in profiler output file.

Thanks for sending the saved cudaprof project files. I suppose Session6 is the latest session for the application run on GT200 for which you have reported the problems.

Using the project, we could not reproduce the height plot crash on FC8 (we also tried other platforms). We see a correct height plot for Session6 (and also for sessions 1-5). Can you confirm that you are using the latest version of cudaprof, v1.0.11, which was released along with CUDA 2.0 beta 2? You could also try opening the saved project and checking whether the height plot works.

Also, we find that the method names for Session6 are correctly demangled when the main menu option “Options->Demangle Method Names” is enabled. Can you check that this option is enabled before running the app?

Regarding point (3), “running simpleVoteIntrinsics does not select the right card”: I suppose you have multiple CUDA-capable devices in your system. You can check by running the CUDA SDK deviceQuery sample. Note that device 0 is used by default. If you need to select any device other than 0, you will need to modify the code (as you have already done).
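
A minimal sketch of such a selection, written as plain C++ so that it compiles without the CUDA toolkit. DeviceInfo, supportsVote and pickVoteCapableDevice are hypothetical names, not part of the SDK sample. In real code the major/minor values would come from cudaGetDeviceProperties() for each index below the count returned by cudaGetDeviceCount(), and the chosen index would then have to be passed to cudaSetDevice() before any kernel launch, since reading a device's properties does not by itself make that device current. The 1.2 threshold is the one the simpleVoteIntrinsics sample checks for.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the cudaDeviceProp fields used here.
struct DeviceInfo {
    std::string name;
    int major;  // compute capability, major part
    int minor;  // compute capability, minor part
};

// True if the device reports compute capability 1.2 or higher.
bool supportsVote(const DeviceInfo& d) {
    return d.major > 1 || (d.major == 1 && d.minor >= 2);
}

// Returns the index of the first device that passes supportsVote(),
// or -1 if no device in the list qualifies.
int pickVoteCapableDevice(const std::vector<DeviceInfo>& devices) {
    for (std::size_t i = 0; i < devices.size(); ++i) {
        if (supportsVote(devices[i])) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Scanning all devices and picking by capability avoids hard-coding an index such as 1, which only works as long as the cards happen to be enumerated in that order.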

I downloaded the latest version of cudaprof, but I am afraid I extracted an older version that I had downloaded earlier. Now that I have extracted the latest version, I do indeed get a nice graph and demangled names. Sorry about the false bug report.

About simpleVoteIntrinsics: the output that I posted is what I get after modifying the code like this (probably not the nicest way, but it got the T10P selected):

    cudaDeviceProp deviceProp;
    int dev = 1, warp_size;

    CUDA_SAFE_CALL(cudaChooseDevice(&dev, &deviceProp));
    CUDA_SAFE_CALL(cudaGetDeviceProperties(&deviceProp, 1));

    warp_size = 1;

    if ((deviceProp.major > 1 || deviceProp.minor >= 2)) {
        printf("simpleVoteIntrinsics: Using Device %d: \"%s\"\n", dev, deviceProp.name);
    } else {
        printf("simpleVoteIntrinsics: requires Compute Capability 1.2 or higher\n");
        printf("Aborting test\n");
        printf("TEST PASSED\n");
    }

    warp_size = deviceProp.warpSize;


It does print GT200 as the selected device, so I still think something else is wrong. Here is my nvcc -V output:

nvcc -V

nvcc: NVIDIA (R) Cuda compiler driver

Copyright (c) 2005-2007 NVIDIA Corporation

Built on Tue_Jun_10_04:42:57_PDT_2008

Cuda compilation tools, release 1.1, V0.2.1221

I think this is the latest version given the date it was built?