I have used CUDA on and off for 3-4 years now. After upgrading to the latest version (CUDA 11.7) and running “make” to build the Samples, I find that many of them no longer work. For example, nbody crashes with the following messages:
Compute 6.1 CUDA device: [NVIDIA GeForce GTX 1060 with Max-Q Design]
CUDA error at bodysystemcuda_impl.h:191 code=999(cudaErrorUnknown) “cudaGraphicsGLRegisterBuffer(&m_pGRes[i], m_pbo[i], cudaGraphicsMapFlagsNone)”
and volumeRender crashes with:
CUDA error at volumeRender.cpp:424 code=999(cudaErrorUnknown) “cudaGraphicsGLRegisterBuffer( &cuda_pbo_resource, pbo, cudaGraphicsMapFlagsWriteDiscard)”
My own previously working code also crashes. In every case, the failure appears to come from code like the following:
cudaGraphicsGLRegisterBuffer(&cudaPBORes, glPBO, cudaGraphicsMapFlagsWriteDiscard);
cudaGraphicsMapResources(1, &cudaPBORes, 0);
Can somebody help? This is a clean installation (I installed everything from scratch yesterday). System details:
OS: Ubuntu 22.04
nvidia-smi output: NVIDIA-SMI 515.43.04 Driver Version: 515.43.04 CUDA Version: 11.7
deviceQuery output:
Detected 1 CUDA Capable device(s)
Device 0: “NVIDIA GeForce GTX 1060 with Max-Q Design”
CUDA Driver Version / Runtime Version 11.7 / 11.7
CUDA Capability Major/Minor version number: 6.1
Total amount of global memory: 6078 MBytes (6373376000 bytes)
(010) Multiprocessors, (128) CUDA Cores/MP: 1280 CUDA Cores
GPU Max Clock rate: 1480 MHz (1.48 GHz)
Memory Clock rate: 4004 Mhz
Memory Bus Width: 192-bit
L2 Cache Size: 1572864 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 98304 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.7, CUDA Runtime Version = 11.7, NumDevs = 1
Result = PASS