| How to report a bug | 2 | 19503 | May 27, 2024 |
| A new GPU-accelerated prime sieve using constant-cost structural elimination to overcome memory bandwidth limits at massive scales | 0 | 23 | January 11, 2026 |
| clEnqueueNDRangeKernel call takes too much time on Nvidia GPUs | 4 | 157 | January 10, 2026 |
| OpenCL clBuildProgram caches source, and does not recompile if #include'd source changes | 3 | 1191 | January 8, 2026 |
| Source file updating Problem with C++ and multiple source files | 5 | 1215 | January 8, 2026 |
| How to enable P2P atomic operations between two GPUs connected via PCIe Switch? | 1 | 34 | January 7, 2026 |
| Unnecessary traffic in Load-Reduce and Store operations on a Multicast object | 0 | 33 | January 7, 2026 |
| [Announcement] PTX Inject & Stack PTX: Runtime PTX injection for CUDA kernels without recompilation | 2 | 93 | January 6, 2026 |
| CUDA 13.1 cccl errors | 0 | 37 | January 6, 2026 |
| Bandwidth test of pageable memory is much different on 2 computers | 18 | 87 | January 6, 2026 |
| Clarification on cudaMemcpy synchronization behavior with pageable memory and non-blocking streams | 2 | 38 | January 6, 2026 |
| Implementing H100 TMA multicast with cuda::ptx:: functions but it's slower than 8 independent TMA operations fetching same tile in cluster | 7 | 107 | January 5, 2026 |
| Tcgen05{.ld, .st} matrix fragments | 0 | 19 | January 5, 2026 |
| About setmaxnreg | 4 | 39 | January 4, 2026 |
| cudaErrorIllegalAddress Encountered: "CUDA error: an illegal memory access was encountered" | 0 | 65 | January 4, 2026 |
| CUDA Kernel Launch Fails with “Invalid Configuration Argument” on RTX 30xx | 0 | 11 | January 3, 2026 |
| Agentic Formal Verification | 0 | 21 | January 2, 2026 |
| CUDA->Vulkan interop shows (uninitialized memory?) artifacts depending on the value written by CUDA | 3 | 48 | January 2, 2026 |
| Is it correct to assume that 0 is an invalid value for cudaTextureObject_t? | 3 | 33 | January 2, 2026 |
| FP64 Performance - Power Limitation - H100 vs A100 | 12 | 119 | January 1, 2026 |
| RTX 5090 Peak BF16 Tensor TFLOPS | 1 | 205 | December 30, 2025 |
| Look-Up Table vs __sincosf for Large-Scale Random Phase Calculations in Radio Astronomy Pipeline | 20 | 115 | December 30, 2025 |
| Unable to Run Parallel Inference on Two GPUs Using Python (Multi-Model, Multi-Queue Setup) | 4 | 77 | December 29, 2025 |
| CUDA Error / Ubuntu / Ampere / 3090 - Constant CUDA error: an illegal instruction was encountered | 8 | 79 | December 28, 2025 |
| How to tell the PTX version? | 3 | 43 | December 27, 2025 |
| Im2col Illegal Instruction Encountered on Supported Architecture (H100) | 3 | 57 | December 27, 2025 |
| Dead code for local memory stores | 1 | 38 | December 25, 2025 |
| Why does MemcpyAsync happen in DToD? | 1 | 33 | December 25, 2025 |
| Introducing CUDA Online Judge - Learn CUDA Programming Without GPU Hardware | 0 | 94 | December 25, 2025 |
| cub::DeviceSelect::Flagged does not work for large num_items | 1 | 22 | December 24, 2025 |