CUPTI: Add more events to the trace

I am trying to trace 3 counters in a CUDA program as follows:

void *
sampling_func(void *arg)
  CUptiResult cuptiErr;
  CUpti_EventGroup eventGroup;
  CUpti_EventID gldrID, gldhitID, gldmissID; //eventId;
  size_t bytesRead;
  uint64_t eventVal;

  cuptiErr = cuptiSetEventCollectionMode(context,
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiSetEventCollectionMode");

  cuptiErr = cuptiEventGroupCreate(context, &eventGroup, 0);
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGroupCreate");

  cuptiErr = cuptiEventGetIdFromName(device, gld_request, &gldrID); //"gld_request"
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGetIdFromName");

  cuptiErr = cuptiEventGetIdFromName(device, l1_gld_hit, &gldhitID); //"l1_global_load_hit"
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGetIdFromName");

  cuptiErr = cuptiEventGetIdFromName(device, l1_gld_miss, &gldmissID); //"l1_global_load_miss"
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGetIdFromName");

  cuptiErr = cuptiEventGroupAddEvent(eventGroup, gldrID);
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGroupAddEvent");

  cuptiErr = cuptiEventGroupAddEvent(eventGroup, gldhitID);
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGroupAddEvent");

  cuptiErr = cuptiEventGroupAddEvent(eventGroup, gldmissID);
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGroupAddEvent");

  cuptiErr = cuptiEventGroupEnable(eventGroup);
  CHECK_CUPTI_ERROR(cuptiErr, "cuptiEventGroupEnable");

However, at run time I get the following error:

Error CUPTI_ERROR_INVALID_EVENT_ID for CUPTI API function 'cuptiEventGroupAddEvent'.

This points to the second call to cuptiEventGroupAddEvent, the one for gldhitID. I tried various combinations, and it seems I cannot add more than one counter at a time, which I believe should not be the case. Each of these counters works fine on its own.

I am using a device with CUDA compute capability 2.0.

Any suggestions? What am I doing wrong? I am using CUPTI sample code as reference.

Cross reference:

This is outside my area of expertise. I believe the following holds, but it should be treated as intelligent speculation for now:

(1) Hardware counters are a scarce resource, so the total number of counters that can be operated in a single run is tightly limited. The limit is architecture specific, and sm_20 had the tightest restrictions. I want to say it provided four counters, but it may have been fewer; I really don’t recall, since sm_20 is very old hardware.

(2) Countable events are arranged in groups (presumably based on where in the hardware relevant signals are extracted). The number of events selectable from the same group for a run is limited. Restrictions are architecture specific.

That’s me again! I thought this would be a better forum.

So the problem is associated with the domains of these counters: {“gld_request”} and {“l1_global_load_hit”, “l1_global_load_miss”} fall in different domains, and a single event group cannot mix them. So I created a separate event group for each domain and associated both groups with the same context. This solved the problem.
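For anyone who lands here with the same error, the fix can be sketched as a small planning step: bucket the event IDs by domain first, then create one event group per bucket. The C below is only a stand-alone illustration of that bucketing and does not call CUPTI. The types event_id_t, domain_id_t, event_info_t and group_plan_t, the function plan_groups_by_domain, and the two capacity constants are all hypothetical names made up for this sketch. In real code the domain of each event would have to be queried from CUPTI (I believe cuptiEventGetAttribute with CUPTI_EVENT_ATTR_DOMAIN does this, but check the signature for your CUPTI version).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capacities for this sketch; not CUPTI or hardware limits. */
#define MAX_GROUPS 8
#define MAX_EVENTS_PER_GROUP 8

/* Hypothetical stand-ins for CUpti_EventID and CUpti_EventDomainID. */
typedef uint32_t event_id_t;
typedef uint32_t domain_id_t;

typedef struct {
    event_id_t  id;
    domain_id_t domain;  /* with CUPTI this would be queried per event */
} event_info_t;

typedef struct {
    domain_id_t domain;
    size_t      count;
    event_id_t  events[MAX_EVENTS_PER_GROUP];
} group_plan_t;

/*
 * Partition the events into one planned group per domain, preserving the
 * order in which domains first appear. Returns the number of groups, or -1
 * if a fixed capacity of this sketch would be exceeded.
 */
int plan_groups_by_domain(const event_info_t *events, size_t n_events,
                          group_plan_t *groups, size_t max_groups)
{
    size_t n_groups = 0;

    assert(events != NULL && groups != NULL);

    for (size_t i = 0; i < n_events; ++i) {
        /* Look for an already planned group with the same domain. */
        size_t g = 0;
        while (g < n_groups && groups[g].domain != events[i].domain)
            ++g;

        if (g == n_groups) {
            /* First event seen from this domain: start a new group. */
            if (n_groups == max_groups)
                return -1;
            groups[g].domain = events[i].domain;
            groups[g].count = 0;
            ++n_groups;
        }

        if (groups[g].count == MAX_EVENTS_PER_GROUP)
            return -1;
        groups[g].events[groups[g].count++] = events[i].id;
    }
    return (int)n_groups;
}
```

With the plan in hand, the real code would call cuptiEventGroupCreate once per planned group, cuptiEventGroupAddEvent for each event in it, and cuptiEventGroupEnable on every group. Note that the sketch does not model the per-domain limit on simultaneously active hardware counters, so adding an event to its group can still fail on real hardware.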

Well, I got it with a bit of luck. The error message was not very helpful in that regard (how about saying: domain incompatibility). Maybe someone should look into that.