How to map GPU cards to GPU metrics in nsys profile output

Hi,
I’m trying to get GPU metrics using Nsight Systems. I already have the output_report.nsys-rep file, and I can see the GPU metrics for each GPU card in the Nsight Systems GUI.
But when I query the data with sqlite, I can’t find a way to map the GPU metrics to the GPU cards.
For example, my server has 2 GPU cards, and I can query the information for both cards in the TARGET_INFO_GPU table.

sqlite> select * from TARGET_INFO_GPU;
281474976710656|1|NVIDIA TITAN RTX|0000:05:00.0|1|6291456|25190727680|672096000000|1770000000|72|1|732d8f7e-e197-1ef3-8a6e-566198ad80bf|0|TU102|1|||161|0|65536|4|65536|49152|65536|65536|65536|32|3|32|16|1024|1024|1024|64|2147483647|65535|65535|7|5|7|5
281474976710656|0|NVIDIA TITAN RTX|0000:01:00.0|1|6291456|25190727680|672096000000|1770000000|72|0|411eda04-f007-8d8b-570d-360b58276d44|0|TU102|0|||161|0|65536|4|65536|49152|65536|65536|65536|32|3|32|16|1024|1024|1024|64|2147483647|65535|65535|7|5|7|5

I can also see 2 different typeId values in the GPU_METRICS table:

sqlite> select * from GPU_METRICS where metricID == 7;
221026|221026|281479271677952|7|25
523583|523583|281479271677953|7|80

But how can I know which typeId (281479271677952 or 281479271677953) belongs to which GPU card (0 or 1)?

Thanks

@jkreibich can you help?

Just so you know, if you’re poking around in the SQLite data, it might help to issue the commands .header on and .mode column. The first command will turn on display of column headers, and the second will make the output more human readable.

The .schema command will also help with exploration. If we look at the structure of the GPU_METRICS table, we can see a reference to the TARGET_INFO_GPU_METRICS table.

sqlite> .schema GPU_METRICS
CREATE TABLE GPU_METRICS (
    -- GPU Metrics, events and values.

    rawTimestamp                INTEGER   NOT NULL,                    -- Raw event timestamp recorded during profiling.
    timestamp                   INTEGER   NOT NULL,                    -- Event timestamp (ns).
    typeId                      INTEGER   NOT NULL,                    -- REFERENCES TARGET_INFO_GPU_METRICS(typeId) and GENERIC_EVENT_TYPES(typeId)
    metricId                    INTEGER   NOT NULL,                    -- REFERENCES TARGET_INFO_GPU_METRICS(metricId)
    value                       INTEGER   NOT NULL                     -- Counter data value
);

The TARGET_INFO_GPU_METRICS table references GENERIC_EVENT_SOURCES via the sourceId, which is the value you want.

sqlite> .schema TARGET_INFO_GPU_METRICS
CREATE TABLE TARGET_INFO_GPU_METRICS (
    -- GPU Metrics, metric names and ids.

    typeId                      INTEGER   NOT NULL,                    -- REFERENCES GENERIC_EVENT_TYPES(typeId)
    sourceId                    INTEGER   NOT NULL,                    -- REFERENCES GENERIC_EVENT_SOURCES(sourceId)
    typeName                    TEXT      NOT NULL,                    -- Name of event type.
    metricId                    INTEGER   NOT NULL,                    -- Id of metric in event; not assumed to be stable.
    metricName                  TEXT      NOT NULL                     -- Definitive name of metric.
);
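
The two schemas above suggest a join on typeId and metricId to attach metric names to the raw samples. Below is a minimal sketch of that join using Python's built-in sqlite3 module against an in-memory stand-in database. The GPU_METRICS rows are the two samples posted earlier in this thread; the TARGET_INFO_GPU_METRICS rows (sourceId, typeName, metricName) are made-up placeholders, not values from a real report.

```python
import sqlite3

# In-memory stand-in; for real data, connect to the SQLite file that
# holds the exported report instead of ":memory:".
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE GPU_METRICS (
    rawTimestamp INTEGER NOT NULL, timestamp INTEGER NOT NULL,
    typeId INTEGER NOT NULL, metricId INTEGER NOT NULL, value INTEGER NOT NULL);
CREATE TABLE TARGET_INFO_GPU_METRICS (
    typeId INTEGER NOT NULL, sourceId INTEGER NOT NULL, typeName TEXT NOT NULL,
    metricId INTEGER NOT NULL, metricName TEXT NOT NULL);
-- The two sample rows posted earlier in this thread.
INSERT INTO GPU_METRICS VALUES
    (221026, 221026, 281479271677952, 7, 25),
    (523583, 523583, 281479271677953, 7, 80);
-- Hypothetical metadata: sourceId, typeName and metricName are placeholders.
INSERT INTO TARGET_INFO_GPU_METRICS VALUES
    (281479271677952, 0, 'example type A', 7, 'example metric'),
    (281479271677953, 0, 'example type B', 7, 'example metric');
""")

# Join each sample to its metadata row on both typeId and metricId,
# and show typeId in hex so the bit fields are easier to read.
rows = conn.execute("""
    SELECT m.timestamp, printf('0x%X', m.typeId) AS typeIdHex,
           i.metricName, m.value
    FROM GPU_METRICS AS m
    JOIN TARGET_INFO_GPU_METRICS AS i
      ON i.typeId = m.typeId AND i.metricId = m.metricId
    ORDER BY m.timestamp
""").fetchall()

for row in rows:
    print(row)
```

Joining on both columns matters because, per the schema comment, metricId is not assumed to be stable on its own.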

According to the schema of the GENERIC_EVENT_SOURCES table, the sourceId is a GlobalID value:

sqlite> .schema GENERIC_EVENT_SOURCES
CREATE TABLE GENERIC_EVENT_SOURCES (
    -- Generic event source modules

    sourceId                    INTEGER   NOT NULL   PRIMARY KEY,      -- Serialized GlobalId.
    nameId                      INTEGER   NOT NULL,                    -- REFERENCES StringIds(id) -- Event source name
    timeSourceId                INTEGER   NOT NULL,                    -- REFERENCES ENUM_NSYS_GENERIC_EVENT_SOURCE(id)
    sourceGroupId               INTEGER   NOT NULL,                    -- REFERENCES ENUM_NSYS_GENERIC_EVENT_GROUP(id)
    hyperType                   TEXT,                                  -- Hypervisor Type
    hyperVersion                TEXT,                                  -- Hypervisor Version
    hyperStructPrefix           TEXT,                                  -- Hypervisor Struct Prefix
    hyperMacroPrefix            TEXT,                                  -- Hypervisor Macro Prefix
    hyperFilterFlags            INTEGER,                               -- Hypervisor Custom Filter Flags
    hyperDomain                 TEXT,                                  -- Hypervisor Domain
    data                        TEXT                                   -- JSON encoded generic event source description.
);

The exact format of a GlobalID value depends on how it is used, but it tends to be a compound number made up of bit fields. For this sourceId, I believe it has an 8-bit VmId field (which actually identifies a virtual GPU, not a traditional virtual machine) and an 8-bit GPU HwId. It helps to look at GlobalID values in hex, for example SELECT format('0x%X', sourceId) AS sourceId FROM TARGET_INFO_GPU_METRICS. You should be able to spot a correspondence between that value and the vmId and/or id values in TARGET_INFO_GPU. TARGET_INFO_GPU.vmId is also a GlobalID, so looking at it in hex will help as well.

Let us know if that’s not working out or still confusing.

Thanks for helping me out.
I tried selecting from TARGET_INFO_GPU and TARGET_INFO_GPU_METRICS, but I still can’t find the relationship between the two values: vmId in TARGET_INFO_GPU and typeId in TARGET_INFO_GPU_METRICS.

sqlite> SELECT printf('0x%X', vmId) AS vmId FROM TARGET_INFO_GPU;
vmId           
---------------
0x1000000000000
0x1000000000000
sqlite> SELECT printf('0x%X', typeId) AS typeId FROM TARGET_INFO_GPU_METRICS;
typeId         
---------------
0x1000100000001
...
0x1000100000000
...

Does that bottom bit of TARGET_INFO_GPU_METRICS.typeId appear to map to the GPU id?

It might help if you can post a simple SELECT * FROM ... for TARGET_INFO_GPU and TARGET_INFO_GPU_METRICS.

Yes, I was able to confirm that the bottom eight bits of the typeId are the GPU ID value.

From the User Guide (nsight-systems 2024.3 documentation):

GENERIC_EVENTS.typeId is a composite bit field that combines HW ID, VM ID, source ID, and type ID with the following structure:

<Hardware ID:8><VM ID:8><Source ID:16><Type ID:32>
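
To make the layout above concrete, here is a small Python sketch that splits a 64-bit typeId into those four fields using shifts and masks. The function name decode_type_id and the dictionary keys are my own, chosen for illustration; the two input values are the typeId values from the GPU_METRICS output earlier in this thread.

```python
def decode_type_id(type_id: int) -> dict:
    """Split a typeId into <Hardware ID:8><VM ID:8><Source ID:16><Type ID:32>."""
    return {
        "hw_id": (type_id >> 56) & 0xFF,        # top 8 bits
        "vm_id": (type_id >> 48) & 0xFF,        # next 8 bits
        "source_id": (type_id >> 32) & 0xFFFF,  # next 16 bits
        "type_id": type_id & 0xFFFFFFFF,        # low 32 bits
    }


for raw in (281479271677952, 281479271677953):
    print(hex(raw), decode_type_id(raw))
```

For these two values, the only field that differs is the low 32-bit type ID (0 versus 1); the hardware, VM, and source fields are identical.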

The type ID is yet another composite bit field that combines the GPU metrics event tag and the GPU ID. To extract the latter, you need to get the lower 8 bits:

SELECT typeId & 0xFF AS gpuId FROM GENERIC_EVENTS
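
Applied to the original question, the same mask can be used to join GPU_METRICS against TARGET_INFO_GPU. The following Python sketch uses an in-memory stand-in database filled with the values posted earlier in this thread. It assumes that the masked value matches the id column of TARGET_INFO_GPU, and the reduced TARGET_INFO_GPU table (in particular the name and busLocation column names) is a simplification for illustration, so check it against your own .schema output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Reduced stand-in: the real TARGET_INFO_GPU has many more columns.
-- Only id is relied on; the name/busLocation column names are assumed.
CREATE TABLE TARGET_INFO_GPU (
    vmId INTEGER, id INTEGER, name TEXT, busLocation TEXT);
CREATE TABLE GPU_METRICS (
    rawTimestamp INTEGER NOT NULL, timestamp INTEGER NOT NULL,
    typeId INTEGER NOT NULL, metricId INTEGER NOT NULL, value INTEGER NOT NULL);
-- Values copied from the query output earlier in this thread.
INSERT INTO TARGET_INFO_GPU VALUES
    (281474976710656, 1, 'NVIDIA TITAN RTX', '0000:05:00.0'),
    (281474976710656, 0, 'NVIDIA TITAN RTX', '0000:01:00.0');
INSERT INTO GPU_METRICS VALUES
    (221026, 221026, 281479271677952, 7, 25),
    (523583, 523583, 281479271677953, 7, 80);
""")

# Match each sample to a GPU by comparing the low 8 bits of typeId
# with TARGET_INFO_GPU.id (assumption: these are the same GPU ID).
rows = conn.execute("""
    SELECT g.id AS gpuId, g.busLocation, m.metricId, m.value
    FROM GPU_METRICS AS m
    JOIN TARGET_INFO_GPU AS g ON g.id = (m.typeId & 0xFF)
    WHERE m.metricId = 7
    ORDER BY g.id
""").fetchall()

for row in rows:
    print(row)
```

With the sample data, the value 25 lands on GPU 0 (bus 0000:01:00.0) and the value 80 on GPU 1 (bus 0000:05:00.0). If a report contains GPUs under more than one vmId, the join would probably need to compare the VM field as well.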

Thanks for answering.
Now I can check the correct metrics for each GPU.
