I was trying to profile my code with the NVIDIA Visual Profiler, but every run showed me the same error, regardless of the executable.
Which error did you get?
Can you profile using the nvprof command line?
I am getting this error:
Unified memory profiling failed
Can you provide more details?
Which OS/Toolkit/GPU are you using?
Does the error prevent any profile result from being generated, or does it only show up as a warning?
If possible, can you paste all the console output here?
You can also work around the issue by using nvprof --unified-memory-profiling off.
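For anyone scripting this workaround, here is a minimal Python sketch of how the command line could be assembled. The helper names `nvprof_command` and `run_nvprof` are made up for illustration; only the `--unified-memory-profiling off` option comes from the suggestion above.

```python
import shutil
import subprocess

def nvprof_command(executable, args=(), unified_memory_profiling=False):
    """Build the nvprof argument list for `executable` (hypothetical helper)."""
    cmd = ["nvprof"]
    if not unified_memory_profiling:
        # The workaround from this thread: switch unified memory profiling off.
        cmd += ["--unified-memory-profiling", "off"]
    cmd.append(executable)
    cmd.extend(args)
    return cmd

def run_nvprof(executable, args=()):
    """Run nvprof if it is on PATH; return the completed process, else None."""
    if shutil.which("nvprof") is None:
        return None
    # nvprof's own messages and the program output are captured as text.
    return subprocess.run(
        nvprof_command(executable, args), capture_output=True, text=True
    )
```

Passing the command as a list rather than a single string avoids quoting problems with executable paths that contain spaces.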
Hey, thanks, that did work.
Here are the details you asked for.
OS : Windows 10 64 bit
Toolkit : 8.0.61
GPU : Nvidia Geforce 930M
These are the messages in the log file that was generated:
==12088== NVPROF is profiling process 12088, command: F:\Theoretical Computer Science\Experiments\CUDA-local\CUDA-programs\Windows Environment\Debug\Windows Environment.exe
==12088== Warning: Unified Memory Profiling is not supported on the underlying platform. System requirements for unified memory can be found at: http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#um-requirements
==12088== Generated result file: C:\Users\Pradeep Kumar\nvvp_workspace\.metadata\.plugins\com.nvidia.viper\launch\0\api_12088.log
This is with Unified Memory Profiling turned on.
The error disappeared when Unified Memory Profiling was turned off.
But why did this error occur? Do you have an answer for that?
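As a quick way to see what a log like the one above actually reports, here is a small Python sketch that separates nvprof's own messages (the lines prefixed with `==<pid>==`) into warnings and errors. The function names are hypothetical, the sample log is abbreviated from the one posted above, and the `Error:` prefix is an assumption that mirrors the `Warning:` form seen in that log.

```python
import re

# nvprof prefixes its own messages with ==<pid>==; program output has no prefix.
NVPROF_LINE = re.compile(r"^==(\d+)==\s*(.*)$")

def nvprof_messages(log_text):
    """Return (pid, message) pairs for lines written by nvprof itself."""
    messages = []
    for line in log_text.splitlines():
        match = NVPROF_LINE.match(line.strip())
        if match:
            messages.append((int(match.group(1)), match.group(2)))
    return messages

def warnings_and_errors(log_text):
    """Split nvprof messages into warnings and errors by their leading keyword."""
    warnings, errors = [], []
    for _pid, message in nvprof_messages(log_text):
        if message.startswith("Warning:"):
            warnings.append(message)
        elif message.startswith("Error:"):  # assumed to mirror the Warning: form
            errors.append(message)
    return warnings, errors

# Abbreviated copy of the log posted above (paths shortened for readability).
log = """==12088== NVPROF is profiling process 12088, command: Windows Environment.exe
==12088== Warning: Unified Memory Profiling is not supported on the underlying platform.
==12088== Generated result file: api_12088.log"""

warnings, errors = warnings_and_errors(log)
print(len(warnings), "warning(s),", len(errors), "error(s)")
```

A log with warnings but no errors, together with a "Generated result file" line, suggests that profiling itself completed.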
I was getting a similar problem, and then found other posts that suggest running the same profiling command with “sudo”; after that, things work, even without sudo…
I tried with sudo in my Linux environment, but it said “Command not found”.
I also tried running nvprof as administrator on Windows, but that didn’t work either.
I just want to know why it didn’t work.
From the log you provided, I can see that a warning was printed, not an error.
If the warning is printed, it means your configuration does not meet the requirements described at the link given in the warning message.
PS: on Windows, if you have 2 GPUs with different architectures, the UM requirements are also not met.
In this case, it will only work if you set CUDA_VISIBLE_DEVICES so that only 1 GPU is visible.
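To make the CUDA_VISIBLE_DEVICES suggestion concrete, here is a minimal Python sketch that launches the profiler with only one GPU exposed. The helper names are invented for illustration, and the sketch assumes nvprof is on PATH.

```python
import os
import subprocess

def env_with_visible_devices(device_indices, base_env=None):
    """Return a copy of the environment that exposes only the given GPUs to CUDA."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA reads this variable when the runtime initializes, so it has to be
    # set before the profiled process starts. Device indices are zero-based.
    env["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in device_indices)
    return env

def profile_on_single_gpu(executable, device_index=0):
    """Run nvprof on `executable` with exactly one GPU visible (hypothetical helper)."""
    return subprocess.run(
        ["nvprof", executable],
        env=env_with_visible_devices([device_index]),
        capture_output=True,
        text=True,
    )
```

Because the indices are zero-based, a machine with a single GPU would use `0`. A value of `1` selects the second GPU, and on a single-GPU system that leaves no device visible to CUDA.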
I didn’t try setting CUDA_VISIBLE_DEVICES,
but selecting the device with the following function
cudaError_t cudaSetDevice(int device)
worked on Windows, although after some time it stopped working as well.
It doesn’t work on Linux (Ubuntu 17.04).
The problem is that the warning shows up at times and sometimes it doesn’t.
My system satisfies the minimum requirements.
Setting CUDA_VISIBLE_DEVICES to 1 also didn’t work.
According to your description, it should work.
Let me clarify:
Under CUDA 8.0.61:
On Linux, the warning should not be printed in any case if you run ‘nvprof ./sample’. If you still have the issue, please check whether an SDK sample such as 0_Simple/matrixMul has the same problem.
On Windows 10, if there are not 2 GPUs visible (you can run 1_Utilities/deviceQuery to check) and the visible device is not Pascal, the warning should not be printed. If you still have the issue, please check whether an SDK sample such as 0_Simple/matrixMul has the same problem.
PS: Are you using the driver that is included in the toolkit?
If you still hit this, I will find a GeForce 930M or a GPU with the same architecture to check.
Sorry for the late reply. I am getting the same problem with the GeForce 930M on both Windows and Linux.
I found a GeForce GTX 750 (GM107) to try this on. It should be similar to your GeForce 930M (GM108).
I cannot reproduce the issue. No warning is printed.
I can also do UVM profiling if there is UVM usage in the sample, as in 0_Simple/UnifiedMemoryStreams.
OS: Win10 RS2 (build id: 15063)
GPU: GeForce GTX 750
So if this warning does not block your work, please ignore it for now.
C:\ProgramData\NVIDIA Corporation\CUDA Samples\v8.0\bin\win64\Release>nvprof ./matrixMul.exe
[Matrix Multiply Using CUDA] - Starting...
==4772== NVPROF is profiling process 4772, command: ./matrixMul.exe
GPU Device 0: "GeForce GTX 750" with compute capability 5.0
Computing result using CUDA Kernel...
Performance= 145.67 GFlop/s, Time= 0.900 msec, Size= 131072000 Ops, WorkgroupSize= 1024 threads/block
Checking computed result for correctness: Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
==4772== Profiling application: ./matrixMul.exe
==4772== Profiling result:
Time(%)      Time     Calls       Avg       Min       Max  Name
 99.88%  268.57ms       301  892.27us  833.10us  931.91us  void matrixMulCUDA<int=32>(float*, float*, float*, int, int)
  0.07%  192.00us         2  96.000us  64.576us  127.43us  [CUDA memcpy HtoD]
  0.05%  132.74us         1  132.74us  132.74us  132.74us  [CUDA memcpy DtoH]
==4772== API calls:
Time(%)      Time     Calls       Avg       Min       Max  Name
 60.95%  268.45ms         1  268.45ms  268.45ms  268.45ms  cudaEventSynchronize
 29.02%  127.82ms         3  42.606ms  238.37us  127.33ms  cudaMalloc
  8.66%  38.148ms         1  38.148ms  38.148ms  38.148ms  cuDevicePrimaryCtxRelease
  0.31%  1.3751ms       301  4.5680us  3.6160us  41.285us  cudaLaunch
  0.30%  1.3253ms         1  1.3253ms  1.3253ms  1.3253ms  cudaDeviceSynchronize
  0.20%  884.77us         3  294.92us  76.845us  591.85us  cudaMemcpy
  0.15%  653.63us        91  7.1820us       0ns  307.38us  cuDeviceGetAttribute
  0.14%  601.80us         1  601.80us  601.80us  601.80us  cudaGetDeviceProperties
  0.10%  462.27us         3  154.09us  125.36us  204.01us  cudaFree
  0.06%  270.31us      1505     179ns       0ns  9.9450us  cudaSetupArgument
  0.04%  177.20us         1  177.20us  177.20us  177.20us  cuModuleUnload
  0.03%  132.29us         1  132.29us  132.29us  132.29us  cuDeviceGetName
  0.02%  71.115us       301     236ns       0ns  10.245us  cudaConfigureCall
  0.00%  17.780us         1  17.780us  17.780us  17.780us  cuDeviceTotalMem
  0.00%  13.862us         2  6.9310us     904ns  12.958us  cudaEventCreate
  0.00%  11.752us         1  11.752us  11.752us  11.752us  cudaEventElapsedTime
  0.00%  8.1370us         1  8.1370us  8.1370us  8.1370us  cudaGetDevice
  0.00%  7.8350us         2  3.9170us  1.8080us  6.0270us  cudaEventRecord
  0.00%  4.5210us         3  1.5070us     302ns  3.3150us  cuDeviceGetCount
  0.00%  1.5060us         3     502ns     301ns     904ns  cuDeviceGet
You don’t need root access to fix this. Since you are using CUDA 8.0, this should work. The problem lies in the ‘iBUS’ config. All you have to do is delete the folder /home/[user]/.config/ibus/bus, and the problem will be gone.