It seems like nvtxNameCudaStreamA in CUDA 11.3.0 has an off-by-one bug of sorts. I have an application with three “compute” streams and one “data transfer” stream. I name the three compute streams “compute 1”, “compute 2”, and “compute 3”, and the data transfer stream “data stream”. When I profile the app with nsys and open the report in Nsight Systems, I see the kernel activity on the three compute streams and the data transfer on the data transfer stream as expected, but the labels are shifted: compute stream 1 is labeled with a default name, compute stream 2 is labeled “compute 1”, compute stream 3 is labeled “compute 2”, and the data transfer stream is labeled “compute 3”. No stream is labeled “data stream”.

When I export to HDF5, I can see that all the stream names are present in the “TARGET_INFO_NVTX_CUDA_STREAM” dataset, but their “streamId” values are all one higher than the values used in the “CUPTI_ACTIVITY_KIND_KERNEL” and “CUPTI_ACTIVITY_KIND_MEMCPY” datasets. For example, the kernels on the compute streams have streamId values 14, 15, and 16, and the memcpy on the data transfer stream has streamId 17, but the four named streams in TARGET_INFO_NVTX_CUDA_STREAM have streamIds 15 (compute 1), 16 (compute 2), 17 (compute 3), and 18 (data stream).
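A small Python sketch of the mismatch, using only the IDs quoted from the export. It assumes (not confirmed by any documentation) that the viewer attaches a name to each stream by looking up the activity record's streamId in the NVTX name table; under that assumption, a constant shift of one reproduces exactly the labels seen in the timeline. The helpers label_streams and infer_offset are hypothetical, written only for this illustration.

```python
# streamIds taken from CUPTI_ACTIVITY_KIND_KERNEL / CUPTI_ACTIVITY_KIND_MEMCPY
# (compute streams 1-3, then the data transfer stream), as reported above.
activity_ids = [14, 15, 16, 17]

# streamId -> name, as found in TARGET_INFO_NVTX_CUDA_STREAM.
nvtx_names = {15: "compute 1", 16: "compute 2", 17: "compute 3", 18: "data stream"}


def label_streams(activity_ids, nvtx_names, offset=0):
    """Hypothetical join: label each activity streamId by looking up
    streamId + offset in the NVTX name table (assumed viewer behaviour)."""
    return {sid: nvtx_names.get(sid + offset, "<default name>") for sid in activity_ids}


def infer_offset(activity_ids, nvtx_ids):
    """Return the constant shift d with sorted(nvtx) == sorted(activity) + d,
    or None if the two ID sets are not related by a single shift."""
    a, n = sorted(activity_ids), sorted(nvtx_ids)
    if len(a) != len(n):
        return None
    diffs = {y - x for x, y in zip(a, n)}
    return diffs.pop() if len(diffs) == 1 else None


if __name__ == "__main__":
    # Direct lookup: stream 14 gets no name and "data stream" is never used,
    # matching what the timeline shows.
    print(label_streams(activity_ids, nvtx_names))
    shift = infer_offset(activity_ids, nvtx_names.keys())
    print("constant shift:", shift)
    # Compensating for the shift gives the names that were actually intended.
    print(label_streams(activity_ids, nvtx_names, offset=shift))
```

With the shift applied, streams 14 through 17 map to “compute 1” through “data stream”, which is consistent with the names having been recorded against stream IDs one higher than the ones the kernel and memcpy records use.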
I don’t see how I could possibly have introduced an off-by-one error myself, since I’m only passing cudaStream_t handles around. Has anyone else noticed this problem?