Hello CUPTI team,
I am using CUDA 10 and the CUPTI version that ships with it.
I am collecting the CUPTI_ACTIVITY_KIND_DRIVER and CUPTI_ACTIVITY_KIND_RUNTIME activity kinds, as well as kernel and memcpy activity.
The processId I get from the CUpti_ActivityAPI records is valid. However, the threadId looks like a garbage value: it is very large and does not equal the PID, even when I run a single-threaded program on Linux.
I ran the CUPTI/sample/activity_trace_async sample and see the same issue. As the output below shows, the reported value is definitely not the correct thread ID; I also checked with top.
CUPTI/sample/activity_trace_async $ LD_LIBRARY_PATH=../../lib64/ ./activity_trace_async
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_SIZE = 8388608
CUPTI_ACTIVITY_ATTR_DEVICE_BUFFER_POOL_LIMIT = 100
Device Name: NVIDIA Quadro P4000
DEVICE NVIDIA Quadro P4000 (0), capability 6.1, global memory (bandwidth 232 GB/s, size 8119 MB), multiprocessors 14, clock 1480 MHz
DRIVER cbid=2 [ 41943451 - 41944873 ] process 9799, thread 1667285504, correlation 1
DRIVER cbid=1 [ 41945605 - 41947821 ] process 9799, thread 1667285504, correlation 2
DRIVER cbid=4 [ 41978138 - 41978527 ] process 9799, thread 1667285504, correlation 3
DRIVER cbid=3 [ 41978977 - 41979468 ] process 9799, thread 1667285504, correlation 4
DRIVER cbid=5 [ 41980534 - 41999726 ] process 9799, thread 1667285504, correlation 5
DRIVER cbid=259 [ 42000642 - 42335869 ] process 9799, thread 1667285504, correlation 6
I see the same behavior on other GPUs and other machines.
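To narrow this down, one hypothesis I want to check (my own assumption, not something I have found stated in the CUPTI documentation) is that on Linux the threadId field holds the calling thread's pthread_t from pthread_self(), truncated to 32 bits, rather than the kernel thread ID that top shows. The minimal C sketch below uses hypothetical helper names (kernel_tid, pthread_self_low32, print_thread_ids) and prints both values for the calling thread, so they can be compared with the "process ..., thread ..." fields in the trace. It would be called from the thread that makes the CUDA calls.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel thread ID (the value top -H and /proc report). glibc only added a
   gettid() wrapper in 2.30, so the raw syscall is used for portability. */
static long kernel_tid(void)
{
    return (long)syscall(SYS_gettid);
}

/* Hypothetical helper: low 32 bits of pthread_self(). pthread_t is opaque;
   on glibc/Linux it is an unsigned long, so this cast works there, but
   POSIX does not guarantee it. */
static uint32_t pthread_self_low32(void)
{
    return (uint32_t)(uintptr_t)pthread_self();
}

/* Print all three IDs for the calling thread. */
static void print_thread_ids(void)
{
    printf("pid %ld, kernel tid %ld, pthread_self (low 32 bits) %u\n",
           (long)getpid(), kernel_tid(), (unsigned)pthread_self_low32());
}
```

If the third number matches the "thread" value CUPTI reports while the second matches what top shows, that would support the hypothesis.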
Is this a known issue, or am I missing something? Alternatively, is there another way to find out which thread made a given driver/runtime call?
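One workaround I am considering (a sketch under assumptions, not a confirmed CUPTI recipe) is to record the kernel TID myself when each API call starts and join it with the activity records by correlation ID. The plan would be to subscribe with cuptiSubscribe(), enable CUPTI_CB_DOMAIN_RUNTIME_API and CUPTI_CB_DOMAIN_DRIVER_API with cuptiEnableDomain(), and, when cbInfo->callbackSite == CUPTI_API_ENTER, call record_call_tid(cbInfo->correlationId). The C code below covers only the bookkeeping part. record_call_tid, lookup_call_tid and TABLE_SIZE are hypothetical names. It assumes correlation IDs start at 1, as in the output above, and that no more than TABLE_SIZE calls are pending at once.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical fixed-size table indexed by correlationId % TABLE_SIZE.
   A newer ID overwrites an older one that maps to the same slot. */
#define TABLE_SIZE 4096

typedef struct {
    uint32_t correlation_id; /* 0 means "empty" (assumes IDs start at 1) */
    long tid;                /* kernel thread ID from gettid */
} tid_entry;

static tid_entry table[TABLE_SIZE];
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Intended to run inside the CUPTI callback at CUPTI_API_ENTER, i.e. on
   the thread that is making the driver/runtime call. */
static void record_call_tid(uint32_t correlation_id)
{
    long tid = (long)syscall(SYS_gettid);
    tid_entry *slot = &table[correlation_id % TABLE_SIZE];

    pthread_mutex_lock(&table_lock);
    slot->correlation_id = correlation_id;
    slot->tid = tid;
    pthread_mutex_unlock(&table_lock);
}

/* Intended to run while processing activity records. Returns -1 if the ID
   was never recorded or its slot has since been reused. */
static long lookup_call_tid(uint32_t correlation_id)
{
    long tid = -1;
    const tid_entry *slot = &table[correlation_id % TABLE_SIZE];

    pthread_mutex_lock(&table_lock);
    if (correlation_id != 0 && slot->correlation_id == correlation_id)
        tid = slot->tid;
    pthread_mutex_unlock(&table_lock);
    return tid;
}
```

When handling a CUpti_ActivityAPI record, the kernel TID would then be lookup_call_tid(api->correlationId), with -1 meaning the call was not captured.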
Thank you
Sujan