| Topic | Replies | Views | Last activity |
|---|---|---|---|
| What is this kernel 'nvjet_tst_112x64_64x9_1x2_h_bz_bias_TNN'? Which CUDA API do I need? | 5 | 235 | October 26, 2025 |
| Why is the execution time of cudaMalloc so variable (when using hotspot benchmark from Rodinia Benchmark suite)? | 5 | 332 | October 26, 2025 |
| Windows torchrun DDP fails on RTX 50-series only ("use_libuv…"); same code OK on 40/20 series – guidance on rendezvous/libuv/CUDA stack | 1 | 90 | October 25, 2025 |
| Long Scoreboard Stalls On L1 Cache Hits | 4 | 67 | October 25, 2025 |
| Question about the assignment of SMs through Green Context | 5 | 77 | October 24, 2025 |
| A kernel's performance depends on sortedness of data, but sorting the data would take more time than the performance gained | 2 | 42 | October 23, 2025 |
| H100 DSMEM: A kernel launch error has occurred due to cluster misconfiguration | 3 | 42 | October 23, 2025 |
| Odd behavior. Bug in OpenCL implementation? | 6 | 6408 | October 22, 2025 |
| How does the .satfinite modifier work in mma.sync PTX instructions? | 3 | 73 | October 22, 2025 |
| Do all threads in a warp share the same PC? | 9 | 90 | November 5, 2025 |
| Compatibility between RTX4070 and RTX Mobile GPU | 2 | 53 | October 21, 2025 |
| How to lower the num_regs when using cuLinkComplete | 1 | 45 | October 21, 2025 |
| Documentation on cudaGraphXXX() Graph Management functions? | 3 | 648 | October 20, 2025 |
| Inconsistent \_\_all_sync() behavior between CUDA 10.1 (RTX 2080 Ti) and CUDA 13.0 (RTX 4060 Ti) | 4 | 79 | October 20, 2025 |
| Unexecuted warp shuffle function makes my program slower | 5 | 62 | October 20, 2025 |
| Warp-shuffle - Shared Memory Performance Comparison For Reduction, Between RTX4070 and RTX5070 | 7 | 88 | November 2, 2025 |
| GB200 vs H200 NVL: cuMemCreate(1 GiB) is \~80–90 ms vs \~0.08–0.13 ms; expected on GB200? | 2 | 82 | October 19, 2025 |
| Control panel stopped working? Won't open | 5 | 2250 | October 18, 2025 |
| The Usage Scenarios of Green Context | 5 | 84 | October 17, 2025 |
| cudaGraphInstantiate() hangs with CUDA device graphs | 3 | 53 | October 17, 2025 |
| Incorrect PTX compiler warning? Potential performance loss for WGMMA pipeline crossing func boundary | 0 | 35 | October 17, 2025 |
| Fusion of two GEMM operators using CUDA | 5 | 77 | October 17, 2025 |
| Is Independent Thread Scheduling reconvergence based on Program Counter or Instruction? | 8 | 92 | October 16, 2025 |
| Gaussian Splatting (Nerfstudio) in RTX 5090 | 1 | 83 | October 15, 2025 |
| Access shared memory array via pointers | 2 | 50 | October 15, 2025 |
| Questions about Resource Isolation and Execution Control using CUDA Green Contexts + MPS | 8 | 183 | October 14, 2025 |
| Barnes-Hut CUDA Simulation Performance | 14 | 277 | October 28, 2025 |
| Dynamic Parallelism memory consistency | 7 | 94 | October 27, 2025 |
| Can I run CUDA tasks and Vulkan tasks within the same process simultaneously? | 0 | 29 | October 13, 2025 |
| What does cuLibraryGetUnifiedFunction actually do? | 3 | 360 | October 11, 2025 |