Rpc Start(.Agent.StartRequest) returns (.Agent.EmptyMessage); is canceled because the timeout period is expired

/dvs/p4/build/sw/devtools/Agora/Rel/CUDA12.4/QuadD/Common/AgentAPI/Src/SessionImpl.cpp(18): rpc Start(.Agent.StartRequest) returns (.Agent.EmptyMessage); is canceled because the timeout period is expired

I found this post: Linux: Cannot Start Profile, Cannot Start Daemon

I collected the logs as that post describes. Could you help?

The machine also has a Docker container running DCGM, but the issue remains even after I stop that container.

error.log (3.3 MB)

@liuyis can you comment?

Hi @cuda_new_bird, which Nsys command were you using before seeing this error? I noticed you were using Nsys 2023.4.4 from the CUDA Toolkit 12.4 release. That version is quite outdated. Could you try the latest release, 2024.7, from Nsight Systems - Get Started | NVIDIA Developer?

I tried the latest version. It turns out that my machine was running a DCGM container.

Does Nsight Systems conflict with DCGM?

That’s the reason I found.

Both DCGM and Nsight Systems use NVIDIA CUPTI under the covers to get information about CUDA kernels. Unfortunately, CUPTI does not support multiple subscribers, so only one of them can successfully attach and get data.

Long term we are working on a way to get around this.
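
Until then, one way to avoid the conflict is to check for a running DCGM container before starting a profile. The following is only a rough sketch, not an official tool: it assumes DCGM runs in Docker and that the container name or image contains the string "dcgm", which may not hold on your system. The function names are made up for illustration.

```python
import subprocess


def find_dcgm_containers(docker_ps_output: str, needle: str = "dcgm") -> list[str]:
    """Return names of containers whose name or image mentions `needle`.

    Expects one container per line in the form "<name>\t<image>".
    Matching on the substring "dcgm" is an assumption, not a guarantee.
    """
    matches = []
    for line in docker_ps_output.splitlines():
        if not line.strip():
            continue
        name, _, image = line.partition("\t")
        if needle in name.lower() or needle in image.lower():
            matches.append(name.strip())
    return matches


def running_dcgm_containers() -> list[str]:
    """List running containers that look like DCGM, using the Docker CLI."""
    result = subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_dcgm_containers(result.stdout)
```

Calling `running_dcgm_containers()` before launching Nsight Systems returns the containers to stop first; an empty list means none were found by this heuristic. It will not detect DCGM running outside Docker. The DCGM documentation also describes pausing metric collection with `dcgmi profile --pause` and resuming with `dcgmi profile --resume`, which may be an alternative to stopping the container.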

