I am new to CUDA, so please excuse any mistakes or misunderstandings.
I am trying to profile a simple C++ program from Chapter 2 of the book “Learn CUDA Programming”, which you can find at the link below.
When I run the Visual Profiler, the “Overall GPU usage” tab shows “No GPU devices in Session”, which, as far as I can understand, means that no GPUs were used.
My Asus laptop has two video cards: an integrated Intel one and an Nvidia GTX 960M.
I suspected that the Visual Profiler was using the integrated video card, so in the Nvidia Control Panel, under “Manage 3D Settings -> Program Settings”, I changed the default video card for this specific application to “High-performance NVIDIA processor”.
Nothing changed. Also, I noticed that the Nvidia display icon in the notification area is not reporting any applications that are using the video card.
What seems to be the problem here? How can I enable the Nvidia GPU for both the Visual Profiler and the command-line nvprof.exe application?
Best Regards,
PS: I installed CUDA Version 11
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2020 NVIDIA Corporation
Built on Thu_Jun_11_22:26:48_Pacific_Daylight_Time_2020
Cuda compilation tools, release 11.0, V11.0.194
Build cuda_11.0_bu.relgpu_drvr445TC445_37.28540450_0
What is the OS version?
Is the NVIDIA driver correctly installed?
Did you install the driver as part of the CUDA 11 installation?
Please check the driver version under NVIDIA Control Panel “Help->System Information”.
Hi
My OS is Windows 10 64-bit. Everything works fine except this. The CUDA SDK installer updated my video card driver, since the installed one was a little older.
I have successfully changed the video card for other apps in the Nvidia Control Panel, but the visual and command-line profilers do not seem to take this into account. Is there any chance that they are too old to recognize multiple video cards?
I have also installed Nsight, but I haven't tested it yet; I am pretty sure it will work. I will give it a try later.
In this case, I just want to follow the book's instructions with these specific tools.
Regards,
Edited: Take a look at the attached images. It seems Nsight does not work for me either.
Hi,
I looked at the source code you pointed out
https://github.com/PacktPublishing/Learn-CUDA-Programming/tree/master/Chapter02/02_memory_overview/03_aos_soa
I can see that the makefile builds two executables, “aos_soa” and “aos_soa_solved”, but I cannot see the corresponding source files in the directory. The listed source files are “aos.cu” and “soa.cu”. From your screenshot it seems you ran aos_soa.exe. Did you make any changes to the original GitHub code to build aos_soa.exe?
I tried building “aos.cu” and “soa.cu” on my end and profiled both executables with nvvp and nvprof. It works perfectly fine for me.
Here is the nvprof output; you can see that one kernel was profiled:
$ nvprof ./aos
==14494== NVPROF is profiling process 14494, command: ./aos
==14494== Profiling application: ./aos
==14494== Profiling result:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  114.72us      1  114.72us  114.72us  114.72us  complicatedCalculation(Coefficients_AOS*)
      API calls:   99.23%  202.92ms      1  202.92ms  202.92ms  202.92ms  cudaMalloc
                    0.38%  786.05us    100  7.8600us     278ns  487.64us  cuDeviceGetAttribute
                    0.24%  483.75us      1  483.75us  483.75us  483.75us  cuDeviceTotalMem
However, there are a few reasons why nvprof can end up reporting “No kernels were profiled”. Could you please try the following and see whether the kernel shows up in the results?
Make an explicit synchronization call, cudaDeviceSynchronize(), after your kernel launch and before freeing device memory, e.g.
complicatedCalculation<<<num_blocks,NUM_THREADS>>>(d_x);
cudaDeviceSynchronize();
OR
Call nvprof with flag --unified-memory-profiling off
e.g. > nvprof --unified-memory-profiling off ./aos
Problem solved. As a beginner in the CUDA world, I didn’t know that I should add the -gencode parameter when compiling my CUDA files from the command line (the Visual Studio projects of the CUDA SDK samples already have these parameters, which is why I saw GPU activity with them).
So, for my Maxwell architecture with CUDA Capability Major/Minor version number 5.0, the full parameter list on the command line should look like this:
nvcc -run -m64 -gencode arch=compute_50,code=sm_50 -o aos_soa.exe aos_soa.cu
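To illustrate how the compute capability maps onto that flag, here is a minimal C++ sketch. The helper name `gencode_flag` is hypothetical (it is not part of nvcc or the CUDA toolkit); it only concatenates the major and minor numbers, so 5.0 becomes compute_50 and sm_50. As far as I know, nvcc in CUDA 11 targets compute capability 5.2 when no architecture is given, which would explain why a default build shows no GPU activity on a 5.0 device.

```cpp
#include <string>

// Hypothetical helper, for illustration only: builds the -gencode argument
// for a device with the given compute capability (major.minor).
// Assumption: virtual (compute_XY) and real (sm_XY) architectures use the
// same number, as in the command above.
std::string gencode_flag(int major, int minor) {
    const std::string cc = std::to_string(major) + std::to_string(minor);
    return "-gencode arch=compute_" + cc + ",code=sm_" + cc;
}
```

Under these assumptions, `gencode_flag(5, 0)` gives the flag used in the command above, and a device with compute capability 7.5 would get `-gencode arch=compute_75,code=sm_75`.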
Unfortunately, my first book, “Learn CUDA Programming” from Packt Publishing, says on page 49 to compile with ONLY the following command, even though the source code includes a “Makefile” that contains all of the parameters above (it is for Linux only, so I ignored it).
$ nvcc -o aos_soa ./aos_soa.cu
Now I can see my GPU statistics under nvprof.
nvcc -run -m64 -gencode arch=compute_50,code=sm_50 -o aos_soa.exe aos_soa.cu
nvprof aos_soa.exe
==18308== NVPROF is profiling process 18308, command: aos_soa.exe
==18308== Profiling application: aos_soa.exe
==18308== Profiling result:
            Type  Time(%)      Time  Calls       Avg       Min       Max  Name
 GPU activities:  100.00%  1.1421ms      1  1.1421ms  1.1421ms  1.1421ms  complicatedCalculation(Coefficients_SOA*)
      API calls:   83.40%  226.57ms      1  226.57ms  226.57ms  226.57ms  cudaMalloc
                   15.90%  43.183ms      1  43.183ms  43.183ms  43.183ms  cuDevicePrimaryCtxRelease
                    0.58%  1.5790ms      1  1.5790ms  1.5790ms  1.5790ms  cudaFree
                    0.07%  198.40us      1  198.40us  198.40us  198.40us  cuModuleUnload
                    0.03%  70.100us      1  70.100us  70.100us  70.100us  cudaLaunchKernel
                    0.01%  26.800us      1  26.800us  26.800us  26.800us  cuDeviceTotalMem
                    0.01%  20.200us    101     200ns     100ns  3.3000us  cuDeviceGetAttribute
                    0.00%  11.600us      1  11.600us  11.600us  11.600us  cuDeviceGetPCIBusId
                    0.00%  1.4000us      3     466ns     200ns     700ns  cuDeviceGetCount
                    0.00%  1.4000us      2     700ns     200ns  1.2000us  cuDeviceGet
                    0.00%     600ns      1     600ns     600ns     600ns  cuDeviceGetName
                    0.00%     400ns      1     400ns     400ns     400ns  cuDeviceGetLuid
                    0.00%     300ns      1     300ns     300ns     300ns  cuDeviceGetUuid
Best Regards,
PS: @rameshgunjal, what video card do you have? I see a huge speedup in your GPU metric results compared with mine.
Good to know that your problem is solved.
I use a GeForce RTX 2080 Ti (compute capability 7.5).
–
Thanks,
Ramesh