| Cuda stream stalls due to memcpyAsync --- even when memory copy performing is idle? | 1 | 18 | September 28, 2023 |
| Nsight Profiler Hangs on OpenMP Initialization | 8 | 342 | September 26, 2023 |
| Profiling elapsed time and energy around extreme clock frequencies | 1 | 110 | September 26, 2023 |
| Applying Timeline View Filter to Stats System View | 2 | 88 | September 25, 2023 |
| Confusion about the (d/f/h)(mul/add/fma) count in the nsight compute | 2 | 151 | September 8, 2023 |
| Nsight compute failed to profile with nvtx ranges in pytorch | 4 | 145 | September 19, 2023 |
| Jetson orin performace profiling | 2 | 124 | August 30, 2023 |
| Higher L2 cache hit rate but larger device memory tranfer size | 1 | 107 | August 13, 2023 |
| How to make an executable file run on NVIDIA gpus | 0 | 129 | July 27, 2023 |
| How can I get the analysis like nvprof in nsys? | 3 | 203 | July 21, 2023 |
| How to make an executable file run on NVIDIA gpus | 0 | 126 | July 19, 2023 |
| Is there any way to solely collect the total duration of the CUDA kernels within each nvtx range | 1 | 182 | July 11, 2023 |
| Weird behavior of cuda event | 3 | 145 | June 23, 2023 |
| [Question] NSys CUDA Profiler - Page Migration and Number of CPU/GPU page faults | 1 | 262 | June 23, 2023 |
| Max block size limiting factor | 3 | 161 | July 5, 2023 |
| Segmented memory copy to/from device | 4 | 679 | June 20, 2023 |
| Cannot get tensor core metrics with latest NSight system | 4 | 832 | June 20, 2023 |
| How to profile multiple tensorrt model inference simultaneously using CUPTI | 6 | 251 | June 13, 2023 |
| Why is shared memory configuration size is limiting the occupancy | 2 | 205 | June 4, 2023 |
| Unable to profile with NCU -- WARNING: No Kernels were profiled | 3 | 966 | May 15, 2023 |
| Kernel time discrepancy between nsys profile and cudaEventElapsedTime | 4 | 415 | April 28, 2023 |
| Using Nsight Compute (ncu) alongside srun | 6 | 1764 | April 24, 2023 |
| Code profiling tool for VPI Python and PyCUDA application | 2 | 217 | May 2, 2023 |
| CUDA application binary in windows is not calling the kernels, how to solve it? | 2 | 611 | March 4, 2023 |
| Nsight Compute does not detect kernel launches for OpenMP offloaded code | 11 | 914 | February 28, 2023 |
| How to get the exec. time inner the kernel function? | 6 | 564 | February 27, 2023 |
| Running perf inside a docker container / Docker seccomp profiles | 2 | 407 | January 27, 2023 |
| ==ERROR== Launching the target application failed | 7 | 743 | January 23, 2023 |
| Detect memory coalescing from SASS file | 1 | 373 | January 6, 2023 |
| Can I profile an application on jetson nano using NVIDIA Visual profiler? | 2 | 277 | February 1, 2023 |