cupti_metric_properties demo causes the computer to restart

When I use Nsight Compute to profile the nvlink_bandwidth demo, the computer always restarts.
I also tried the cupti_metric_properties demo to obtain the metric, but the computer restarted there as well.
I have two RTX A6000 cards, and the driver is 536.25 on Windows 10.

Hi, @923525984

nvlink_bandwidth uses the CUPTI API. Both CUPTI and Nsight Compute need to reserve driver resources for performance monitoring, so they can’t be used together.

How can I get NVLink information in Nsight Compute? I tried using Nsight Compute to profile the OptixNvlink demo, but it still caused the computer to restart.

And when I used Nsight Compute, I did not use the CUPTI API.

So OptixNvlink runs successfully on your machine, but when you profile it with Nsight Compute, the computer restarts, right? (Are you using interactive or non-interactive profiling? Have you enabled the NVLINK section? Which GPU? Which CUDA/driver/Nsight Compute versions?)

Also, where can I get the OptixNvlink demo?

Yes. I used interactive profiling. The problem occurs regardless of whether the NVLINK section is enabled. The GPU is an RTX A6000, the driver version is 536.25, and the Nsight Compute version is 2023.2.2.0.
I used OptiX 7.6: https://developer.nvidia.com/designworks/optix/downloads/legacy
The problem only occurs if you add the “-p nvlink” parameter when running this demo.

CUDA version is 12.2
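
For anyone hitting a similar issue, the version details requested above can be collected in one pass. This is only a minimal sketch: it assumes `nvidia-smi`, `ncu`, and `nvcc` are on PATH (any that are missing are reported rather than treated as errors), and the helper name `collect_versions` is made up for illustration.

```python
import shutil
import subprocess

# Tools assumed to be on PATH; a missing tool is reported, not fatal.
COMMANDS = {
    "gpu_and_driver": ["nvidia-smi", "--query-gpu=name,driver_version",
                       "--format=csv,noheader"],
    "nsight_compute": ["ncu", "--version"],
    "cuda_toolkit": ["nvcc", "--version"],
}


def collect_versions(commands=COMMANDS, which=shutil.which, run=subprocess.run):
    """Run each version command and return {label: output or failure note}."""
    report = {}
    for label, cmd in commands.items():
        if which(cmd[0]) is None:
            report[label] = "not found on PATH"
            continue
        try:
            result = run(cmd, capture_output=True, text=True, timeout=30)
            # Some tools print their version banner on stderr.
            report[label] = (result.stdout or result.stderr).strip()
        except (OSError, subprocess.SubprocessError) as exc:
            report[label] = f"failed: {exc}"
    return report
```

Calling `collect_versions()` and printing each entry gives a block of text that can be pasted straight into a reply.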

Hi, @923525984

We can’t reproduce the restart issue with
Nsight Compute 2023.2.2.0 + Optix 7.6 + 536.25 + RTX6000 + OptixNvlink with and without “-p nvlink”

Does this sample run successfully without Nsight Compute?

Yes, it runs successfully without Nsight Compute. Is there a way to get the GPU log on Windows?

The following error is displayed on Windows:

The description for Event ID 0 from source nvlddmkm cannot be found. Either the component that raises this event is not installed on your local computer or the installation is corrupted. You can install or repair the component on the local computer.

If the event originated on another computer, the display information had to be saved with the event.

The following information was included with the event:

\Device\Video3
Error occurred on GPUID: 2d00

The message resource is present but the message was not found in the message table
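
One possible way to pull such entries out of the Windows System event log from a script is the built-in `wevtutil` tool. The sketch below is an assumption-laden illustration, not an official NVIDIA procedure: it assumes the events are logged under a provider named `nvlddmkm` (matching the source shown above), and the helper names `build_query`, `extract_gpu_ids`, and `recent_driver_events` are hypothetical.

```python
import re
import subprocess
import sys


def build_query(provider="nvlddmkm", count=10):
    """Build a wevtutil command listing the newest System events for a provider."""
    xpath = f"*[System[Provider[@Name='{provider}']]]"
    return ["wevtutil", "qe", "System", f"/q:{xpath}",
            f"/c:{count}", "/rd:true", "/f:text"]


def extract_gpu_ids(event_text):
    """Return every hex GPUID mentioned in the event text."""
    return re.findall(r"GPUID:\s*([0-9A-Fa-f]+)", event_text)


def recent_driver_events(count=10):
    """Run the query; only meaningful on Windows, where wevtutil exists."""
    if sys.platform != "win32":
        raise RuntimeError("wevtutil is only available on Windows")
    result = subprocess.run(build_query(count=count),
                            capture_output=True, text=True, check=True)
    return result.stdout
```

On the affected machine, `extract_gpu_ids(recent_driver_events())` would list the GPU IDs named in the most recent driver events, which can help tell which of the two cards raised the error.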

Can you use the command-line tool “ncu” and check the output, please?

When I run “ncu --query-metrics-mode all” on the command line to obtain all metrics, the computer is also forced to restart.

This command is tested internally, and we have never encountered a restart. Is it possible that your environment has a resource problem?

Hi, @923525984

Is there any other help we can provide? If not, can I close the topic?
You can always submit a new topic if you have issues with our tools. Thanks!
