|
How to report a bug
|
|
2
|
19763
|
May 27, 2024
|
|
HW Power Brake Slowdown on a pair of RTX Ada R6000
|
|
4
|
50
|
April 9, 2026
|
|
Maximum number of operations in a stream, continued
|
|
2
|
29
|
April 8, 2026
|
|
How to solve a misaligned address error?
|
|
2
|
18719
|
April 8, 2026
|
|
Binary logic has reached its limit. I noticed this back in 2018
|
|
4
|
56
|
April 8, 2026
|
|
"Has anyone else hit the physical TGP ceiling on the Mobile 4060? I wrote a PyTorch cold-boot script that sustains 26.3+ TFLOPS across massive 163K x
|
|
1
|
15
|
April 8, 2026
|
|
PyTorch/CUDA Veilsight SAA Telemetry Daemon
|
|
0
|
13
|
April 7, 2026
|
|
Pushing RTX 4060 Mobile to 20.9 TFLOPS: Procedural 1M+ Vertex 3D Generation via PyTorch
|
|
0
|
13
|
April 7, 2026
|
|
Live demo: 3 Trillion dimension manifold processed on RTX 4060 Laptop GPU using pure phase-slicing zero-allocation — stable at only 1.87 GB VRAM
|
|
1
|
17
|
April 7, 2026
|
|
NCCL P2P hang on dual RTX PRO 6000 Blackwell Workstation Edition (WRX90E-SAGE SE)
|
|
4
|
94
|
April 7, 2026
|
|
Nsight Compute seems to be giving the wrong number of bank conflicts
|
|
5
|
94
|
April 6, 2026
|
|
SU(7) Phase-Lattice Engine: Vector Resonance Model for High-Performance Multi-Layer Processing
|
|
32
|
88
|
April 4, 2026
|
|
Increasing number of seemingly superfluous instructions in integer-only code
|
|
0
|
21
|
April 3, 2026
|
|
Is CUDA MPS a good choice for running inference on multiple YOLO models in parallel?
|
|
1
|
27
|
April 3, 2026
|
|
atomicCAS: CUDA 13.2 unexpectedly generates system-scope atomic instruction starting from CC100
|
|
5
|
55
|
April 3, 2026
|
|
Is fence required after mbar init when using transaction count?
|
|
0
|
28
|
April 3, 2026
|
|
Issues running certain APK-based tools on NVIDIA GPUs, CUDA compatibility or something else?
|
|
0
|
24
|
April 2, 2026
|
|
What is Warp Allocation Granularity for?
|
|
7
|
59
|
April 2, 2026
|
|
Unusual Latency Discrepancy between NCU and Standalone %clock64 (SM 8.9)
|
|
5
|
35
|
April 2, 2026
|
|
What is this kernel 'nvjet_tst_112x64_64x9_1x2_h_bz_bias_TNN'? Which CUDA API do I need?
|
|
7
|
606
|
April 2, 2026
|
|
Implementation of asinf() with improved accuracy and without negative performance impact
|
|
0
|
17
|
April 2, 2026
|
|
Granularity of L1 and L2 Cache
|
|
3
|
78
|
April 1, 2026
|
|
Beyond the Binary Wall: Anchor4-TC — A CMOS Specification for Resonance-Based Adaptive Lattice Computing
|
|
2
|
26
|
April 1, 2026
|
|
Ldmatrix.x4 with Swizzle<3,3,4> Shows Bank Conflicts When grid > (1,1,1), But Not with grid=(1,1,1) — RTX 3060 (SM86)
|
|
2
|
29
|
March 31, 2026
|
|
Cross-thread pageable D2H copy appears to block cudaLaunchKernel in another thread
|
|
2
|
43
|
March 30, 2026
|
|
Cross-thread pageable D2H copy appears to delay `cudaLaunchKernel` in another thread
|
|
1
|
40
|
March 30, 2026
|
|
Implementation of acosf() with improved accuracy and without negative performance impact
|
|
2
|
57
|
March 27, 2026
|
|
Unable to determine the device handle for GPU0: 0000:01:00.0: GPU is lost. Reboot the system to recover this GPU
|
|
2
|
76
|
March 26, 2026
|
|
I don't achieve peak performance of the 4090 in PyTorch
|
|
0
|
33
|
March 25, 2026
|
|
Periodic and/or hybrid functional acceleration
|
|
1
|
90
|
March 24, 2026
|