How to report a bug

To report bugs to NVIDIA, first register with our developer program here. Once registered, you can file bugs and get feedback on them at the following link.

Please be prepared to provide the following details:

  • Summary
  • Relevant Area
  • Description
  • NVIDIA GPU or System
  • NVIDIA Software Version
  • OS
  • Other Details
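
Before filing, it can help to collect these details in one place. The following is a minimal sketch, not an official NVIDIA tool: the function name `format_bug_report`, the choice to treat "Other Details" as optional, and the sample values are all assumptions made for illustration. Only the field names come from the list above.

```python
import platform

# Field names taken from the list above, in the same order.
REPORT_FIELDS = [
    "Summary",
    "Relevant Area",
    "Description",
    "NVIDIA GPU or System",
    "NVIDIA Software Version",
    "OS",
    "Other Details",
]

# Assumption: "Other Details" may be left empty; every other field is required.
OPTIONAL_FIELDS = {"Other Details"}


def format_bug_report(details):
    """Return a plain-text report, one 'Field: value' line per field.

    Raises ValueError if a required field is missing or empty.
    """
    details = dict(details)  # copy so the caller's dict is not modified
    # Fill in the OS from the local machine when the caller did not supply it.
    details.setdefault("OS", platform.platform())
    missing = [
        name
        for name in REPORT_FIELDS
        if name not in OPTIONAL_FIELDS and not details.get(name)
    ]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return "\n".join(f"{name}: {details.get(name, '')}" for name in REPORT_FIELDS)


if __name__ == "__main__":
    # Sample values are invented purely to show the output shape.
    print(
        format_bug_report(
            {
                "Summary": "Example summary of the problem",
                "Relevant Area": "Example library or tool",
                "Description": "Steps to reproduce, expected and actual behavior",
                "NVIDIA GPU or System": "Example GPU model",
                "NVIDIA Software Version": "Example toolkit and driver versions",
            }
        )
    )
```

Pasting the printed text into the bug form keeps the report complete, and the `ValueError` flags any field you forgot before you submit.
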
Calling cuSparse library on Tesla A100 with CUDA11.1 is much slower than that on Tesla P100 with CUDA9.0
NPP - functions that perform an operation where a constant is on the device?
CUDA Toolkit 11.3 could not find Visual Studio 2019 Community
Got out of memory from cudaMemcpy
Dynamic SM with Dynamic Parallelism
cuBLAS gemv incx != 0 restriction
Bug in cudaMemsetAsync or in Nsight VS Edition when visualizing cudaMemsetAsync execution
Cuda memory pool performance issue
RGB to YUV conversion Color conversion
Can we specify a CUDA core dump location?
Order of registers in MMA calls
Local memory layout and 32-bit words
Ubuntu 20.04, GCC 9.3, Cuda Toolkit 11.3 - not a supported combination?
Impact of cudaMalloc() on CPU LLC
Is cudaMemcpyDeviceToDevice between a WDDM device and a TCC device possible?
VPI Gaussian Blur Max Kernel Size Limitation?
Fault is of type FAULT_PDE ACCESS_TYPE_VIRT_READ kernel problem or driver issue?
CUDA sample bicubicTexture not working
cuModuleLoadData Segment Fault Using cuda 11.4, Driver 470.57.02
Why is this CUDA kernel repeating indices with a 3D grid?
Where is the ptxas documentation?
Peaks and slow performance with cudaDeviceSynchronize
Best practices for cudaDeviceScheduleBlockingSync usage pattern on Linux
Consuming a populated JIT cache with read-only permissions
Cuda slow performance after process sleep/wait
Single cudaMemcpy across multiple allocations
cuBLAS GEMM INT8 is much slower than FP16 in T4
Issue with cooperative_groups::memcpy_async
cudaOccupancyMaxActiveBlocks returns the blocks by taking into account other co-running kernels?
Nppi resize doesn't work with 1x1px
Are there any branch non-divergence hints for the compiler?
What is the stream-ordered equivalent of cudaMallocPitch?
NPPI Label MakersUF Return Incorrect results in Cuda 11.4
cudaMemset in 11.4: what causes it to give cudaErrorInvalidValue?
cufftPlan creation deadly slow on CUDA 11+
Get function/global name from pointer using CUDA Driver API
nvmlDeviceGetMigDeviceHandleByIndex return wrong MIG devices when some MIG devices deleted
Calling NPP helper with large image gives kernel execution error
All CUDA-capable devices busy or unavailable
Suq.*.b32 other than suq.width.b32 and suq.height.b32 causes cudaError 801/500
Reducing binary size while using accelerated libs
cudaArray, used size and layout
Question about getting libcuda debug symbols
Compute Capability support in desktop NVIDIA RTX A2000
cusolverDnSgetrf() fails on A100 (but not on A10) when called in a tight loop
__nanosleep not working as expected
Cannot peek at last error after a call to a dlsym()-ed function
Significant speedup of OpenCL vs CUDA
Speed difference between different driver versions
Task Manager GPU usage disabled: Windows Server 2019, Tesla V100
Feature Request: Host and Heap allocated memory transfers
NVJPEG issues and inconsistencies with transcoding
Very poor performance with NPP CrossCorrValid
Theoretical TFLOPS for FP16, BF16 and TF32 for tensor and non-tensor
Newer Drivers fail when allocating Memory Chunks of 2MB + 1 byte on multiple devices
Cublas Bug
Ampere 16x8x256 BMMA
Creating a compressed texture object
We are going to Abandon Cuda without Mingw support on windows
Malloc in Kernel Complexity (10.2)
Bug Report for nppiNV32ToBGR_8u_P2C4R_Ctx and nppiNV21ToRGB_8u_P2C4R_Ctx
Libdevice functions causing PTXAS segfault
Cusparse cholesky & structural zeros - preconditioned conjugate gradient
Debug segfault in libnvvm
Performance varies greatly with different nvcc compilers
cudaExternalMemoryGetMappedMipmappedArray for ID3D11Texture3D fails in most cases
Inconsistent performance on the A100