No GPU devices in Session

I am new to CUDA, so please excuse any mistakes or misunderstandings.

I am trying to profile a simple C++ program from Chapter 2 of the book “Learn CUDA Programming”, which you can find at the link below.

When I run the Visual Profiler, the “Overall GPU usage” tab shows “No GPU devices in Session”, which, as far as I understand, means that no GPU was used.

My Asus laptop has two video cards: an integrated Intel one and an Nvidia GTX 960M.

I suspected that the Visual Profiler was using the integrated video card, so in the “Nvidia Control Panel”, under “Manage 3D Settings->Program Settings”, I changed the default video card for this specific application to the “High-Performance NVIDIA Processor”.

Nothing changed. Also, I noticed that the Nvidia display icon in the notification area is not reporting any applications that are using the video card.

What seems to be the problem here? How can I enable the Nvidia GPU for both the Visual Profiler and the command-line nvprof.exe application?

Best Regards,

PS: I installed CUDA Version 11

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:48_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.relgpu_drvr445TC445_37.28540450_0

What is the OS version?
Is the NVIDIA driver correctly installed?
Did you install the driver as part of the CUDA 11 installation?
Please check the driver version under NVIDIA Control Panel “Help->System Information”.

Hi

My OS is Windows 10 64-bit. Everything works fine except this. The CUDA SDK installer updated my video card driver, since the installed one was a little older.

I have successfully changed the video card for other apps in the Nvidia Control Panel, but the Visual Profiler and the CLI profiler do not seem to take this into account. Is there any chance that they are too old to recognize multiple video cards?

I have also installed Nsight, but I haven’t tested it yet; I am fairly sure it will work. I will give it a try later.

In this case, I just want to follow the book instructions with these specific tools.

Regards,

Edited: Take a look at the attached images. It seems Nsight does not work for me either.

  1. You mentioned:
    “I changed the default video card for this specific application, under the “Nvidia Control Panel” and “Manage 3D Settings->Program Settings” to use the “High-Performance NVIDIA Processor”.”
    These control panel settings are not required for using nvprof or the Visual Profiler.

  2. Can you please confirm that you can run the same CUDA applications standalone successfully (without any profiler)?

  3. You mentioned:
    “Running the Visual Profiler, in the “Overall GPU usage” tab, I get “No GPU devices in Session”, which means, as far as I can understand, that no GPUs were used.”
    Can you see the timeline view in the Visual Profiler?
    It would be best to first check whether nvprof works.

  4. Note that your GPU, the Nvidia GTX 960M, is a Maxwell-architecture GPU, which is not supported by Nsight Compute. Please refer to the Nsight Compute documentation: https://docs.nvidia.com/nsight-compute/ReleaseNotes/index.html#gpu-support

Thanks,

  1. OK, I thought the problem might be that the high-performance video card was not being activated.
  2. Yes, I can confirm that the application at the link below (aos.cu and soa.cu) runs successfully without any errors.
    https://github.com/PacktPublishing/Learn-CUDA-Programming/tree/master/Chapter02/02_memory_overview/03_aos_soa
  3. Yes, I can see the timeline in the Visual Profiler.

    The CLI nvprof reports “No kernels were profiled.” but it shows some statistics like those below.
    https://ibb.co/26Nsd8X
  4. I didn’t know this, sorry. I will check it.

Hi,

I looked at the source code you pointed out
https://github.com/PacktPublishing/Learn-CUDA-Programming/tree/master/Chapter02/02_memory_overview/03_aos_soa

I can see that the makefile builds two executables, “aos_soa” and “aos_soa_solved”, but I cannot see the corresponding source files in the directory. The listed source files are “aos.cu” and “soa.cu”. From your screenshot it seems you ran aos_soa.exe. Did you make any changes to the original GitHub code to build aos_soa.exe?

I tried building “aos.cu” and “soa.cu” at my end and profiled both executables in nvvp/nvprof. It works perfectly fine for me.

Here is the nvprof output; you can see that one kernel was profiled:

$ nvprof ./aos
==14494== NVPROF is profiling process 14494, command: ./aos
==14494== Profiling application: ./aos
==14494== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
GPU activities:  100.00%  114.72us         1  114.72us  114.72us  114.72us  complicatedCalculation(Coefficients_AOS*)
      API calls:   99.23%  202.92ms         1  202.92ms  202.92ms  202.92ms  cudaMalloc
                    0.38%  786.05us       100  7.8600us     278ns  487.64us  cuDeviceGetAttribute
                    0.24%  483.75us         1  483.75us  483.75us  483.75us  cuDeviceTotalMem

However, there are a few reasons why nvprof can end up reporting “No kernels were profiled”. Can you please try a few things and see whether you get the kernel result?

Make an explicit synchronization call, cudaDeviceSynchronize(), after your kernel launch and before freeing device memory, e.g.

complicatedCalculation<<<num_blocks,NUM_THREADS>>>(d_x);
cudaDeviceSynchronize();

OR

Call nvprof with the flag --unified-memory-profiling off, e.g.
> nvprof --unified-memory-profiling off ./aos
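
A minimal sketch of the first suggestion with error checks added, in case it helps narrow things down. The kernel `scale` and the helper `checkCuda` are hypothetical stand-ins and not code from the book's sample; the idea is only that a launch which fails (for example because the binary holds no code the installed GPU can run) gets reported instead of passing silently, which is one situation where a profiler has no kernel to show.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the CUDA error (if any) and returns true on success.
static bool checkCuda(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical stand-in kernel; the book's sample launches complicatedCalculation.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int n = 1 << 20;
    float* d_x = nullptr;
    if (!checkCuda(cudaMalloc(&d_x, n * sizeof(float)), "cudaMalloc")) return 1;
    if (!checkCuda(cudaMemset(d_x, 0, n * sizeof(float)), "cudaMemset")) return 1;

    scale<<<(n + 255) / 256, 256>>>(d_x, n);

    // Launch errors are not returned by the <<<>>> syntax itself,
    // so query them explicitly right after the launch.
    bool ok = checkCuda(cudaGetLastError(), "kernel launch");

    // Wait for the kernel to finish before freeing device memory.
    ok = checkCuda(cudaDeviceSynchronize(), "cudaDeviceSynchronize") && ok;

    cudaFree(d_x);
    return ok ? 0 : 1;
}
```

If this prints an error after the launch, the kernel never ran, and the message usually points at the cause.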

Problem solved. As a beginner in the CUDA world, I didn’t know that I should add the -gencode parameter when compiling my CUDA files from the command line. (The Visual Studio projects of the CUDA SDK samples already have these parameters, which is why I saw GPU activity with them.)

So, for my Maxwell-architecture GPU with CUDA capability major/minor version 5.0, the full command line should look like this:

nvcc -run -m64 -gencode arch=compute_50,code=sm_50 -o aos_soa.exe aos_soa.cu
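
For anyone unsure which values to put after -gencode, here is a small sketch that asks the CUDA runtime for each device's compute capability and prints a matching option. The helper `gencodeFlag` is a hypothetical name I made up for illustration; the runtime calls are `cudaGetDeviceCount` and `cudaGetDeviceProperties`. It launches no kernels, so it should run even when built with the default nvcc options.

```cuda
#include <cstdio>
#include <string>
#include <cuda_runtime.h>

// Hypothetical helper: builds the -gencode option for a compute capability,
// e.g. (5, 0) -> "-gencode arch=compute_50,code=sm_50".
static std::string gencodeFlag(int major, int minor) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceCount failed: %s\n",
                     cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // Print the device name, its compute capability, and the matching option.
        std::printf("Device %d: %s (compute capability %d.%d)\n  %s\n",
                    dev, prop.name, prop.major, prop.minor,
                    gencodeFlag(prop.major, prop.minor).c_str());
    }
    return 0;
}
```

On a compute capability 5.0 card such as mine, the printed option should match the one used in the command above.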

Unfortunately, my first book, “Learn CUDA Programming” from Packt Publishing, says on page 49 that I should compile with ONLY the following parameters, even though the source code includes a “Makefile” that contains all of the parameters above (it is Linux-only, so I ignored it).

$ nvcc -o aos_soa ./aos_soa.cu

Now I can see my GPU statistics under nvprof.

nvcc -run -m64 -gencode arch=compute_50,code=sm_50 -o aos_soa.exe aos_soa.cu

nvprof aos_soa.exe
==18308== NVPROF is profiling process 18308, command: aos_soa.exe
==18308== Profiling application: aos_soa.exe
==18308== Profiling result:
            Type  Time(%)      Time     Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  1.1421ms         1  1.1421ms  1.1421ms  1.1421ms  complicatedCalculation(Coefficients_SOA*)
      API calls:   83.40%  226.57ms         1  226.57ms  226.57ms  226.57ms  cudaMalloc
                   15.90%  43.183ms         1  43.183ms  43.183ms  43.183ms  cuDevicePrimaryCtxRelease
                    0.58%  1.5790ms         1  1.5790ms  1.5790ms  1.5790ms  cudaFree
                    0.07%  198.40us         1  198.40us  198.40us  198.40us  cuModuleUnload
                    0.03%  70.100us         1  70.100us  70.100us  70.100us  cudaLaunchKernel
                    0.01%  26.800us         1  26.800us  26.800us  26.800us  cuDeviceTotalMem
                    0.01%  20.200us       101     200ns     100ns  3.3000us  cuDeviceGetAttribute
                    0.00%  11.600us         1  11.600us  11.600us  11.600us  cuDeviceGetPCIBusId
                    0.00%  1.4000us         3     466ns     200ns     700ns  cuDeviceGetCount
                    0.00%  1.4000us         2     700ns     200ns  1.2000us  cuDeviceGet
                    0.00%     600ns         1     600ns     600ns     600ns  cuDeviceGetName
                    0.00%     400ns         1     400ns     400ns     400ns  cuDeviceGetLuid
                    0.00%     300ns         1     300ns     300ns     300ns  cuDeviceGetUuid

Best Regards,

PS: @rameshgunjal, what video card do you have? I see a huge speedup in your GPU metric results.

Good to know that your problem is solved.

I use a GeForce RTX 2080 Ti (compute capability 7.5).


Thanks,
Ramesh