| cudaMemcpyBatchAsync cannot aggregate D2D copy operations | 13 | 127 | December 9, 2025 |
| Deadlock when using cuStreamWaitValue32/cuStreamWriteValue32 for async cross-stream ordering | 8 | 67 | December 8, 2025 |
| Implementing clang-tidy checks for CUDA C++ Guidelines for Safety Critical Programming | 3 | 69 | December 8, 2025 |
| Question about CTA/warp lifecycle | 5 | 60 | December 8, 2025 |
| Help needed to execute tcgen05.mma_cta_group::2 instructions | 0 | 60 | December 7, 2025 |
| Which offers lower latency for NV12 to RGB conversion, NPP or CV-CUDA? | 1 | 72 | December 5, 2025 |
| Any advice for best pipeline of least latency to display NV12 textures to monitor (Windows 10) | 1 | 27 | December 5, 2025 |
| The best way to convert NV12 to RGBA | 6 | 3533 | December 5, 2025 |
| CUDA core dump does not work properly when many device asserts happen | 2 | 169 | December 4, 2025 |
| Cooperative_groups::cluster_group _CG_HAS_CLUSTER_GROUP does not get #define'd | 1 | 80 | December 4, 2025 |
| Cannot allocate any memory with cudaMallocHost or cudaMallocManaged | 0 | 58 | December 4, 2025 |
| Unexpected Performance Behavior with CUDA Software Prefetcher, Warm-Up Kernel and GEMV | 10 | 114 | December 3, 2025 |
| Asymmetric PCIe bandwidth in bidirectional transfers: H2D drops 56% while D2H maintains performance | 1 | 57 | December 2, 2025 |
| What's the difference between special registers and general registers? | 6 | 95 | December 2, 2025 |
| Delay between cudaMemcpy and kernel launch with MPS | 0 | 27 | December 2, 2025 |
| Integer NTT on RTX 20xx, A100 vs RTX 30xx, 40xx, 50xx | 27 | 522 | November 30, 2025 |
| Optimizing PTX mma ops on Volta to surpass wmma | 2 | 61 | November 30, 2025 |
| Large pinned host memory limit when using cudaHostRegister() on DAX-mapped memory (only ~124 GiB usable on a 128 GiB node) | 1 | 99 | November 29, 2025 |
| I created a bitmap image processor using CUDA | 0 | 54 | November 27, 2025 |
| Correct usage of mbarriers on Ampere / Ada | 0 | 49 | November 27, 2025 |
| CuteDSL error: target SM ARCH unknown | 4 | 131 | November 26, 2025 |
| Random errors occur when calling the same kernel function in a multi-threaded manner | 9 | 122 | November 26, 2025 |
| Why the kernel nodes in the device graph cannot use dynamic parallelism | 2 | 108 | November 26, 2025 |
| Can Thrust programmers access shared, constant, or texture memory without dropping down to CUDA? | 4 | 61 | November 24, 2025 |
| Error when running CUDA | 5 | 104 | November 24, 2025 |
| Different CTAs Accessing the Same Shared Memory Address on RTX 5090: Is This Expected? | 9 | 109 | November 21, 2025 |
| Shared Memory "Bank Conflicts": I'm confused... | 14 | 3668 | November 20, 2025 |
| CUDA code coverage and static analysis tools | 6 | 3907 | November 20, 2025 |
| How to understand the bank conflict of shared_mem | 16 | 14252 | November 19, 2025 |
| What happens under MPS oversubscription | 5 | 80 | November 19, 2025 |