| Double4 is deprecated, but the preferred double4_32a is unrecognized? | 6 | 127 | December 16, 2025 |
| How to sync Cuda and Vulkan? | 2 | 49 | December 16, 2025 |
| Nvcc, syntax error in cuda.h(7451): error: expected a ")" | 3 | 73 | December 16, 2025 |
| Wmma vs Wgmma On H100 GPU | 5 | 129 | December 15, 2025 |
| Thrust device allocator vs std allocator | 3 | 68 | December 15, 2025 |
| Architectural insights needed: Why is the MIG 3g.71gb instance consistently the "Efficiency Sweet Spot" on H200? | 4 | 137 | December 15, 2025 |
| Weekend project: Very accurate double-precision sincos() implementation for a restricted domain | 0 | 51 | December 14, 2025 |
| Pixel Shader vs NPP - Which is faster for batch processing NV12 to RGB conversions and display directly to screen? | 5 | 96 | December 14, 2025 |
| Register usage spike in SASS with division slow/full path | 13 | 283 | December 12, 2025 |
| Question about the cacheConfig value in nsight systems | 6 | 84 | December 12, 2025 |
| Is the CUDA tile kernel submitted to GPU still using the cuLaunchKernel? | 3 | 97 | December 12, 2025 |
| Unexpected results on cub::DeviceRadixSort::SortKeys and SortPairs with 128 bit keys | 6 | 66 | December 12, 2025 |
| How many tensor cores to execute the wmma.mma.sync.aligned.{alayout}.{blayout}.m16n16k16 instruction? | 23 | 266 | December 12, 2025 |
| __frsqrt_rn is not accurate 0.5ulp? I found a number | 4 | 70 | December 10, 2025 |
| FFMA with Uniform register | 3 | 138 | December 9, 2025 |
| Is it possible having compressible memory & memory pools over the same array on device? | 0 | 50 | December 9, 2025 |
| cudaMemcpyBatchAsync cannot aggregate D2D copy operations | 13 | 166 | December 9, 2025 |
| Deadlock when using cuStreamWaitValue32/cuStreamWriteValue32 for async cross-stream ordering | 8 | 84 | December 8, 2025 |
| Implementing clang-tidy checks for CUDA C++ Guidelines for Safety Critical Programming | 3 | 91 | December 8, 2025 |
| Question about CTA/warp lifecycle | 5 | 93 | December 8, 2025 |
| Help needed to execute tcgen05.mma_cta_group::2 instructions | 0 | 71 | December 7, 2025 |
| Which offers lower latency for NV12 to RGB conversion, NPP or CV-CUDA? | 1 | 92 | December 5, 2025 |
| Any advice for best pipeline of least latency to display NV12 textures to monitor (Windows 10) | 1 | 40 | December 5, 2025 |
| the best way convert nv12 to RGBA | 6 | 3565 | December 5, 2025 |
| Cuda core dump does not work properly when many device assert happens | 2 | 200 | December 4, 2025 |
| Cooperative_groups::cluster_group _CG_HAS_CLUSTER_GROUP does not get #define'd | 1 | 84 | December 4, 2025 |
| Cannot allocate any memory with cudaMallocHost or cudaMallocManaged | 0 | 75 | December 4, 2025 |
| Unexpected Performance Behavior with CUDA Software Prefetcher, Warm-Up Kernel and GEMV | 10 | 154 | December 3, 2025 |
| Asymmetric PCIe bandwidth in bidirectional transfers: H2D drops 56% while D2H maintains performance | 1 | 80 | December 2, 2025 |
| What's the difference between special registers and general registers? | 6 | 104 | December 2, 2025 |