NVProf error on samples

I have the same problem as mev3000. When I try to run the same script from https://devblogs.nvidia.com/parallelforall/even-easier-introduction-cuda/ I get:

Error: unified memory profiling failed

When I turn off unified memory profiling, it runs.

Also, I checked my environment variables and ensured that there is no second version of CUDA installed on my system. Everything looks as it should. I am running CUDA 8, and I compile for sm_61 using nvcc.

I am having the same problem. I tried all the suggested solutions and nvprof still doesn’t work. It takes 0.07 seconds without the GPU and 0.6 seconds with the GPU. From drivers to simply checking if the GPU is working, everything is littered with problems. Please spend less on marketing and more on good developers.

Facing the same problem here. Using a remote server which has a couple of GTX 1080 Ti cards, and profiling it returns the “No kernels were profiled” message. Turning off the unified memory profiling indeed picks up the kernels.

The funny thing is, doing the exact same profiling on a server with GTX 1080 cards (no Ti) causes no problem, even with unified memory profiling on. Both servers use the same distributed file system, so it is impossible that directory write permissions, or the installation, are at fault here.

However, I am interested in the unified memory profiling results, so fixing this bug would be highly appreciated.

Edit: in response to @txbob’s reply below, upgrading to CUDA 9 indeed solves this issue. Thanks!

Try upgrading to the latest version of CUDA 9. A new version was released today.

Is this

==5448== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.

message about non-synchronized CUDA streams still running when the app is closed?

For those who are getting this with CUDA 10.0, try turning off unified memory profiling:

--unified-memory-profiling off

==196396== Warning: Some profiling data are not recorded. Make sure cudaProfilerStop() or cuProfilerStop() is called before application exit to flush profile data.
======== Error: Application received signal 139

Using CUDA 11.1.
Same problem.
cudaProfilerStop() doesn’t help.
Is this a CUDA bug or something wrong with my code?

It may mean that your application is crashing (illegal behavior detected at runtime; I can’t be more specific here), but in any event it means the application returned a non-zero error code. You must fix that before you can profile it. The profiler will refuse to work if the application returns a non-zero error code.
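
A note on the number in "Application received signal 139": assuming nvprof reports the value using the common Unix shell convention, where a process killed by a signal gets status 128 plus the signal number, 139 would be 128 + 11, and signal 11 is SIGSEGV (a segmentation fault in the application). The helper below is only a sketch of that arithmetic; `describe_status` is a hypothetical name and is not part of nvprof or CUDA.

```cpp
#include <csignal>
#include <string>

// Hypothetical helper (not part of nvprof or CUDA): interpret a status value
// under the shell convention "128 + signal number" for signal-terminated runs.
std::string describe_status(int status) {
    if (status == 0) {
        return "success";
    }
    if (status > 128) {
        int sig = status - 128;  // recover the signal number
        if (sig == SIGSEGV) {
            return "terminated by SIGSEGV (invalid memory access)";
        }
        if (sig == SIGABRT) {
            return "terminated by SIGABRT (abort)";
        }
        return "terminated by signal " + std::to_string(sig);
    }
    // Anything else is an ordinary non-zero exit code from the application.
    return "exited with error code " + std::to_string(status);
}
```

Calling `describe_status(139)` yields the SIGSEGV description. If that matches your case, running the application on its own (without nvprof) or under a debugger should show the same crash, which is the thing to fix before profiling.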