Latest CUDA Programming and Performance topics

Topic	Tags	Replies	Views	Activity
How to report a bug		2	18767	May 27, 2024
Multiple cuda IPC mappings without closing previous handles causing `st.volatile.global`,`ld.volatile.global` peer-memory accesses to misbehave		0	15	June 23, 2025
GPU cache coherence problem		8	5603	June 23, 2025
LRU Cache in Shared Memory (Need help)	cuda	2	29	June 22, 2025
Question about warp execution and the warp scheduler		4	71	June 21, 2025
Link errors caused by thrust complex in template	cuda	1	15	June 21, 2025
cuSOLVER QR decomposition solving with non-square matrices		11	5661	June 20, 2025
Simple MPI CUDA Fortran Test Issue		3	26	June 20, 2025
What does CU_STREAM_WAIT_VALUE_FLUSH flush when call cuStreamWaitValue64	cuda	0	10	June 20, 2025
Poor half performance		13	2384	June 19, 2025
MPS vs no MPS: drastic increase in kernel latency		3	48	June 19, 2025
How to overlap CUDA core and tensor core computing		4	54	June 19, 2025
Difference in error handling between driver api and runtime api		2	22	June 19, 2025
Ptx version is decided by nvcc_version?		8	53	June 19, 2025
cuDeviceGetAttribute shows i can use fabric handle, but actually i cannot		2	29	June 18, 2025
Blackwell Integer		137	2602	June 18, 2025
The call to cuEventRecord failed. cuEventRecord return value: 709		2	857	June 18, 2025
Availability of MPS on Geforce RTX GPUs		0	19	June 18, 2025
Synchronizing only subset of CUDA warps in block		12	966	June 18, 2025
Thrust::device_vector Causing a Segmentation Fault in NVTX		1	22	June 18, 2025
Sparse matrix manipulation		22	1232	June 17, 2025
Using CUDA to transfer texture between EGL and OpenGL (GLX) contexts in the same process	cuda, opengl	0	18	June 17, 2025
What is the access speed of tensor memory compared to shared memory?		2	44	June 17, 2025
Why does access cudaMallocManaged memory throw exception?		2	21	June 17, 2025
What's the cudaMalloc's implicit synchronize means?		0	14	June 17, 2025
Different stream still can not use event synchronize in stream capture	cuda	1	13	June 17, 2025
Possibilities to further optimize PoC programme using custom copy kernels	cuda	34	147	June 16, 2025
Please delay CUDA deprecation of Volta		5	63	June 16, 2025
Migrating CUDA capable container from JP4.6 to JP6.2	containers, jetson, jetson-nano, jetson-orin	2	40	June 14, 2025
Why does this simple example produce error?		7	76	June 13, 2025
