I’m trying to run this. I’m on Ubuntu 16.04 with 2x GTX 1080 Tis and a GT 1030, CUDA 9.1.
Initially, running with cuda-memcheck resulted in this error:
“Program hit cudaErrorInvalidConfiguration (error 9) due to “invalid configuration argument” on CUDA API call to cudaLaunch”
After reading stuff online I thought setting COMPUTE_PROFILE=0 might help, now I’m getting a different error (and unsetting the variable does not help):
“Internal Memcheck Error: Memcheck failed initialization as some other tools is currently attached. Please make sure that nvprof and Nsight Visual Studio Edition are not being run simultaneously”
Works fine for me (see below): CUDA 8, Win 7, Quadro P2000. Note that this program does not use any status checking, presumably to prevent the code from becoming cluttered. I would suggest adding that in. There may be an error long before the code gets to the kernel launch.
Running this app under cuda-memcheck crashes the display driver on my system, presumably due to hitting TDR (Windows’s two-second GUI watchdog timer) although it is not clear to me why this would be the case.
c:\Users\Norbert\My Programs>nvcc -arch=sm_61 -o cudamanaged.exe cudamanaged.cu
nvcc warning : nvcc support for Microsoft Visual Studio 2010 and earlier has been deprecated and is no longer being maintained
cudamanaged.cu
support for Microsoft Visual Studio 2010 has been deprecated!
Creating library cudamanaged.lib and object cudamanaged.exp
c:\Users\Norbert\My Programs>cudamanaged
Max error: 0
c:\Users\Norbert\My Programs>nvprof cudamanaged.exe
==6500== NVPROF is profiling process 6500, command: cudamanaged.exe
Max error: 0
==6500== Profiling application: cudamanaged.exe
==6500== Profiling result:
Time(%)      Time  Calls       Avg       Min       Max  Name
100.00%  251.00ms      1  251.00ms  251.00ms  251.00ms  add(int, float*, float*)
==6500== Unified Memory profiling result:
Device "Quadro P2000 (0)"
   Count  Avg Size  Min Size  Max Size  Total Size  Total Time  Name
    2048  4.0000KB  4.0000KB  4.0000KB  8.000000MB  21.04488ms  Host To Device
     384  32.000KB  32.000KB  32.000KB  12.00000MB  7.825223ms  Device To Host
==6500== API calls:
Time(%)      Time  Calls       Avg       Min       Max  Name
 41.68%  287.57ms      2  143.78ms  4.2846ms  283.28ms  cudaMallocManaged
 36.42%  251.23ms      1  251.23ms  251.23ms  251.23ms  cudaDeviceSynchronize
 13.47%  92.920ms      1  92.920ms  92.920ms  92.920ms  cuDevicePrimaryCtxRelease
  7.53%  51.971ms      1  51.971ms  51.971ms  51.971ms  cudaLaunch
  0.72%  4.9578ms      2  2.4789ms  1.7708ms  3.1870ms  cudaFree
  0.11%  773.23us     91  8.4970us       0ns  365.06us  cuDeviceGetAttribute
  0.06%  422.53us      1  422.53us  422.53us  422.53us  cuModuleUnload
  0.00%  15.540us      1  15.540us  15.540us  15.540us  cudaConfigureCall
  0.00%  10.849us      1  10.849us  10.849us  10.849us  cuDeviceTotalMem
  0.00%  4.9850us      3  1.6610us     294ns  4.1050us  cuDeviceGetCount
  0.00%  3.2260us      3  1.0750us     587ns  1.4660us  cudaSetupArgument
  0.00%  1.4660us      3     488ns     293ns     879ns  cuDeviceGet
  0.00%  1.1720us      1  1.1720us  1.1720us  1.1720us  cuDeviceGetName
You’re using CUDA 8, I’m using CUDA 9. You’re using Windows 7 (why are you using Win7?!?), I’m on Ubuntu. I’m using GTX 1080 Tis, you’re using a Quadro P2000. It’s hard to imagine a more different configuration. But thanks for the tip on error checking. How does one do that?
I am using what I have in front of me. Yes, my setup is quite different from yours, but the quick check shows that there is nothing fundamentally wrong with the code from the blog. That’s about as much work as I am willing to do for free.
Your favorite internet search engine will return pages of relevant information at the click of a button.
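For readers landing on this thread, here is a minimal sketch of the status checking suggested earlier. It is not the blog's code: the `CUDA_CHECK` macro is a hypothetical helper, and the problem size, launch configuration, and kernel body are assumptions (only the kernel signature `add(int, float*, float*)` is taken from the nvprof output above). The idea is to test the return value of every runtime API call, and to call `cudaGetLastError()` right after the launch so that a launch failure such as "invalid configuration argument" is reported at the line where it happens.

```cuda
#include <cmath>
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not part of the CUDA toolkit): print a readable
// message and exit if a runtime API call does not return cudaSuccess.
#define CUDA_CHECK(call)                                                 \
    do {                                                                 \
        cudaError_t err_ = (call);                                       \
        if (err_ != cudaSuccess) {                                       \
            fprintf(stderr, "CUDA error %d (%s) at %s:%d\n", (int)err_,  \
                    cudaGetErrorString(err_), __FILE__, __LINE__);       \
            exit(EXIT_FAILURE);                                          \
        }                                                                \
    } while (0)

// Signature as seen in the nvprof output; the grid-stride body is an assumption.
__global__ void add(int n, float *x, float *y)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        y[i] = x[i] + y[i];
}

int main()
{
    const int N = 1 << 20;  // assumed problem size
    float *x = nullptr;
    float *y = nullptr;

    // Every allocation is checked, so a failure is caught before the launch.
    CUDA_CHECK(cudaMallocManaged(&x, N * sizeof(float)));
    CUDA_CHECK(cudaMallocManaged(&y, N * sizeof(float)));

    for (int i = 0; i < N; i++) {
        x[i] = 1.0f;
        y[i] = 2.0f;
    }

    const int blockSize = 256;                              // assumed
    const int numBlocks = (N + blockSize - 1) / blockSize;  // assumed
    add<<<numBlocks, blockSize>>>(N, x, y);

    CUDA_CHECK(cudaGetLastError());       // errors raised by the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    // Every element should now be 1.0f + 2.0f.
    float maxError = 0.0f;
    for (int i = 0; i < N; i++)
        maxError = fmaxf(maxError, fabsf(y[i] - 3.0f));
    printf("Max error: %g\n", maxError);

    CUDA_CHECK(cudaFree(x));
    CUDA_CHECK(cudaFree(y));
    return 0;
}
```

The two checks after the launch are separate on purpose: a kernel launch returns no status of its own, so `cudaGetLastError()` picks up launch failures, while the checked `cudaDeviceSynchronize()` picks up errors that only appear once the kernel has actually executed.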
It’s possible that there is a bug in CUDA 9.x, but given that the code from the blog is a trivial example, I would expect any such bug to be found during regression testing and never make it into a CUDA or driver release. Is Ubuntu 16.04 on the list of officially supported operating systems for CUDA 9.1? Are you running with the stock kernel for Ubuntu 16.04?
Are you certain that you are running the code in the blog verbatim, and have not changed the n variable?
Is n set to 256 in your code?
If n is set to 256 in your code, have you verified your CUDA install? Instructions for verification are in the CUDA Linux installation guide and amount to building and running a few sample projects such as vectorAdd.
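As a quick sanity check alongside the samples, a short device-enumeration program can show which GPUs the runtime sees and what their launch limits are. This is only a sketch under assumptions: the `blockSize` value below is a placeholder for whatever your own code passes as threads per block, and the program is a much smaller stand-in for the toolkit's deviceQuery sample, not a replacement for it. On a machine with mixed GPUs it also shows which device index maps to which card.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices found: %d\n", count);

    const int blockSize = 256;  // placeholder: threads per block used by your launch

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties(%d) failed: %s\n",
                    dev, cudaGetErrorString(err));
            return 1;
        }

        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
        printf("  max threads per block: %d\n", prop.maxThreadsPerBlock);
        printf("  max grid size (x):     %d\n", prop.maxGridSize[0]);
        printf("  managed memory:        %s\n", prop.managedMemory ? "yes" : "no");

        // A block size above this limit is one common cause of
        // "invalid configuration argument" at launch time.
        if (blockSize > prop.maxThreadsPerBlock)
            printf("  WARNING: blockSize %d exceeds this device's limit\n", blockSize);
    }
    return 0;
}
```

If the call to `cudaGetDeviceCount` itself fails, the problem is more likely in the driver or toolkit install than in the blog code.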
Don’t know what happened, but it works today. I’ve had CUDA installed for 6+ months and use it almost every day, but never needed to actually write any CUDA code until now. Thanks anyways!