Unknown Error on device 0 when running ncu

I'm trying to use ncu to check the GPU usage of a simple function, but I get an error. I am running inside a container built from:

ARG CUDA_VERSION=12.1.0
ARG CUDNN_VERSION=8
ARG OS_VERSION=22.04
FROM nvidia/cuda:${CUDA_VERSION}-cudnn${CUDNN_VERSION}-devel-ubuntu${OS_VERSION}

My device is a single RTX 3090 GPU, and ncu -v reports 2023.1.0.0 (build 32376155) (public-release).
The container is already started with cap_add: SYS_ADMIN and SYS_PTRACE.

$ nvidia-smi
NVIDIA-SMI 560.35.02    Driver Version: 560.94    CUDA Version: 12.6

My host also has CUDA 12.1 installed (the same version as the container).
I checked the documentation: Ampere should be supported since the 2022 update, and the container does start with admin capabilities. My host OS is Windows 11 and I installed the Ubuntu version of ncu, so I don't know what else might be causing this issue.
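For reference, the profiling-related part of the compose file looks roughly like this (a sketch; the service name is a placeholder and the GPU reservation block depends on your Compose version):

services:
  cuda-dev:                    # placeholder service name
    image: nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
    cap_add:
      - SYS_ADMIN              # needed so ncu can access GPU performance counters
      - SYS_PTRACE
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]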

import torch

def test_cuda():
    device = torch.device("cuda:0")
    x = torch.randn(1000, 1000, device=device)
    y = torch.matmul(x, x)
    torch.cuda.synchronize()  # Ensure GPU computation completes
    print(y)

if __name__ == "__main__":
    test_cuda()
ncu python3 test_ncn.py
==PROF== Connected to process 7698 (/usr/bin/python3.10)
==ERROR== Unknown Error on device 0.
tensor([[ -1.6078,   3.2960, -25.2283,  ..., -39.3344,  19.4053,  12.0484],
        [-10.6675, -40.8973, -17.8383,  ..., -44.4900,  55.2061, -15.0428],
        [-13.4432,  18.6459,  -2.8020,  ...,  22.4246,  31.6181,  21.3463],
        ...,
        [ -9.2275, -18.6414,  -3.8433,  ..., -36.3853,  19.7588,  44.5568],
        [-14.8734,  46.8846,  21.3486,  ...,  -1.1413,  49.1632,  28.2038],
        [-23.3245, -21.7096,   0.9777,  ..., -40.5297,  30.1500,  20.3629]],
       device='cuda:0')
==PROF== Disconnected from process 7698
==WARNING== No kernels were profiled.
==WARNING== Profiling kernels launched by child processes requires the --target-processes all option.
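For reference, the option mentioned in the last warning would be passed like this, though it does not seem to be the root cause here, since the ==ERROR== already appears on the directly connected process:

ncu --target-processes all python3 test_ncn.py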

Hi, @user96785

Can you clarify how to set up the repro environment?
You mentioned Windows, a container, Ubuntu, etc.

My device is running Windows 11, and I am using Docker on it. The Docker image I am using is Ubuntu 22.04. I am not sure what specific information might be needed, so I am providing everything that might be relevant. I apologize if this causes any confusion.

I have reorganised my question.

I am trying to use ncu to check the GPU usage of a simple function, but I encountered an error.

Here is my environment setup:

• I am running inside a Docker container based on the image:

nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04

• My device has a single NVIDIA RTX 3090 GPU.

• ncu version: 2023.1.0.0 (build 32376155) (public release).

• The container is started with cap_add permissions: SYS_ADMIN and CAP_SYS_PTRACE.

Here are some additional details:

• Output of nvidia-smi inside the container:

NVIDIA-SMI 560.35.02    Driver Version: 560.94    CUDA Version: 12.6

• My local system is also installed with CUDA 12.1, which matches the version in the container.

• According to the documentation, the Ampere architecture (RTX 3090) should be supported after the 2022 update.

I am running on Windows 11 and installed the Ubuntu version of ncu (I believe it is the one inside the Docker image). I have no idea what else might be causing the issue. I have tried rebooting and recreating the container, but still no luck. Could you help identify the problem?

OK. Thanks for the clarification.
Our devtools don't support running in a Windows Docker container yet.

If you want a Linux environment on a Windows platform, the suggested method is WSL2.
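A rough outline, assuming the Windows driver already supports WSL and Ubuntu-22.04 is the distro you pick (adjust names as needed):

# From an elevated PowerShell on Windows:
wsl --install -d Ubuntu-22.04

# Inside the WSL2 shell, after installing the CUDA toolkit / Nsight Compute there:
nvidia-smi                   # the RTX 3090 should be visible through the WSL driver
ncu python3 test_ncn.py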

Thanks a lot for answering my question. What about using an Ubuntu Docker image on an Ubuntu host, can ncu work there?

Yes. This scenario should work.
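On a Linux host, bringing up such a container for profiling would look roughly like this (a sketch; flags other than --gpus and --cap-add depend on your setup):

docker run --rm -it --gpus all \
    --cap-add=SYS_ADMIN --cap-add=SYS_PTRACE \
    nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04 bash
# then, inside the container, install ncu / PyTorch as needed and run:
# ncu python3 test_ncn.py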

This topic was automatically closed after 10 hours. New replies are no longer allowed.