Topic | Replies | Views | Activity
About the CUDA Programming and Performance category | 0 | 508 | February 1, 2020
How to report a bug | 0 | 2924 | November 28, 2018
flop_sp_efficiency vs single_precision_fu_utilization | 7 | 450 | January 16, 2021
Interchangeability of block X and Y configuration on launch | 3 | 39 | January 16, 2021
Very fast ramp-down from high to low clock speeds leading to increased time repeatedly ramping up | 13 | 30 | January 15, 2021
Blockchain drivers | 55 | 4764 | January 15, 2021
11.2 > cudaMemPool_t and Peer2Peer | 4 | 84 | January 14, 2021
multi-threaded kernel concurrent execution on a single GPU | 3 | 5288 | January 14, 2021
Atomic operation and variable access | 3 | 36 | January 13, 2021
Half performance on a100 | 0 | 41 | January 13, 2021
Ncu profiling l2 cache compression rate | 1 | 45 | January 13, 2021
NSight Debugger Freezes When Stepping into/over Shared Memory in Visual Studio 2019 | 4 | 28 | January 12, 2021
cudaMemcpy to non-pinned memory | 4 | 67 | January 12, 2021
Atomic Adding to a Clamped Value | 2 | 38 | January 11, 2021
Managed memory slow to copy back to host | 2 | 44 | January 11, 2021
Selecting the 8 bytes banks of shared memory | 9 | 869 | January 11, 2021
Why is this a uint64_t subtraction and bit shift generating bottleneck? | 2 | 59 | January 10, 2021
Delay between multiple kernel calls | 2 | 47 | January 8, 2021
Cuda Run time library unload | 5 | 968 | January 8, 2021
CUDA copies serialised when using CUDA IPC | 0 | 37 | January 8, 2021
ASC Student Supercomputer Challenge - Request for Optimized HPL for Tesla V100 | 5 | 284 | January 7, 2021
Preventing NVRTC from looking for header files on disk | 0 | 35 | January 7, 2021
Illegal memory access was encountered when launch some kernels in a loop | 3 | 41 | January 7, 2021
Quadro GV100 gives so low memory bandwidth | 12 | 88 | January 6, 2021
cudaPointerGetAttributes returns cudaErrorInvalidValue for host-pinned mem on Win 32-bit build | 2 | 35 | January 6, 2021
Does this count as instantiating the template function? | 0 | 21 | January 6, 2021
p2pBandwidthLatencyTest in python | 0 | 24 | January 5, 2021
Using dlink-time-opt together with gencode in CMAKE | 0 | 31 | January 5, 2021
Visual Studio update has caused existing CUDA to no longer work | 15 | 338 | January 5, 2021
Determining registers holding the data after executing LDG.E.128 | 5 | 102 | January 5, 2021