| Topic | Replies | Views | Activity |
|---|---|---|---|
| How to report a bug | 2 | 18604 | May 27, 2024 |
| Convert ID3d11Resource to fp32 tensor in CUDA | 1 | 11 | May 23, 2025 |
| Blackwell Integer | 131 | 1987 | May 23, 2025 |
| Clarification on Implicit Synchronization Behavior Change in CUDA 12.8+ | 0 | 16 | May 23, 2025 |
| cub::DeviceRadixSort output wrong result when number of items larger than 4800 | 3 | 60 | May 23, 2025 |
| Possibilities to further optimize PoC programme using custom copy kernels | 28 | 99 | May 22, 2025 |
| Streams not running concurrently | 4 | 31 | May 22, 2025 |
| Matrix multiplication: Two codes with similar assembly, but different performance? | 0 | 21 | May 22, 2025 |
| Call to cuImportExternalSemaphore always fails | 0 | 13 | May 21, 2025 |
| How to get the most dot products of batched vectors out of L4 GPU | 8 | 39 | May 20, 2025 |
| Large Matrix Batch Transpose | 5 | 33 | May 19, 2025 |
| Memcpy performance on GH200 | 1 | 27 | May 20, 2025 |
| Efficient in-place transpose of multiple square float matrices | 10 | 5474 | October 10, 2019 |
| Help with ldmatrix instruction | 0 | 14 | May 19, 2025 |
| How to use mma.sp PTX instruction | 0 | 15 | May 19, 2025 |
| Using CUDA virtual memory API for host allocation | 4 | 23 | May 17, 2025 |
| Resolving CUDA struct alignment mismatch for Atom in atoms.h | 5 | 28 | May 17, 2025 |
| Troubleshooting uncommon error 98 "invalid device function" | 3 | 2559 | May 16, 2025 |
| Inconsistent OpenCL headers in cuda 12.9 | 0 | 27 | May 16, 2025 |
| Understanding Linker Behavior and Symbol Resolution in GPU Device Compilation (CUDA/NVLink) | 0 | 21 | May 16, 2025 |
| Concurrent Memory Transfers Broken After Upgrading from Windows 10 to Windows 11 | 12 | 40 | May 16, 2025 |
| How to restrict CUDA memory pool to a fixed size | 1 | 26 | May 16, 2025 |
| How to set up shared memory allocated per block for a 3D structured data? | 9 | 30 | May 15, 2025 |
| Re: Earlier Topic - Registers usage behaviour | 0 | 17 | May 15, 2025 |
| Registers usage behaviour | 8 | 86 | May 15, 2025 |
| Two GPUs; if both used, the 2080 crashes | 4 | 40 | May 15, 2025 |
| Question about interoperability of CUDA Graphs Green Context across multiple processes | 3 | 37 | May 14, 2025 |
| The cycles cost by _ballot api and popcll api | 4 | 21 | May 14, 2025 |
| Spdlog doesn't work with nvcc (12.6) and c++20 (bug in nvcc?) | 2 | 42 | May 14, 2025 |
| Cub Device RadixSort for sorting structs public documentation doesn't work | 1 | 19 | May 13, 2025 |