Importing CUDA.jl under Nsight Systems exits the process

When I run the command nsys launch julia, the process quits unexpectedly.

I also tried to profile a CUDA kernel written in Julia using the following command:

nsys profile --trace=cuda,nvtx julia test.jl

But the profiling report did not capture any CUDA events. The analysis report always showed the following messages:

| Analysis | 1408 | 00:04.805 | NVTX profiling might not have started correctly.
| Analysis | 1408 | 00:04.805 | No NVTX events collected. Does the process use NVTX?
| Analysis |   | 00:04.805 | CUDA profiling might not have started correctly.
| Analysis |   | 00:04.805 | No CUDA events collected. Does the process use CUDA?

I’m currently using this version of Nsight Systems:

(base) PS C:\Users\huiyu> nsys --version
NVIDIA Nsight Systems version 2024.4.2.133-244234382004v0

I also tested with a newer version of nsys, but the problem remains unchanged.

For more details, please see the downstream issue: Windows: Importing CUDA.jl under Nsight Systems exits the process · Issue #2546 · JuliaGPU/CUDA.jl · GitHub

@liuyis

Hi @huiyux, could you share more details regarding how to reproduce this issue? I tried the following but it was not crashing:

$ nsys launch julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.1
 _/ |\__'_|_|_|\__'_|  |  Ubuntu ⛬  julia/1.4.1+dfsg-1
|__/                   |

julia> using CUDA
[ Info: Running under Nsight Systems, CUDA.@profile will automatically start the profiler
julia>                                                                                        

Where could we find test.jl?

How about updating Julia to the latest version and launching again, @liuyis?

(base) PS C:\Users\huiyu\.julia\dev> julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.11.1 (2024-10-16)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia>

This is my version.

Version 1.4.1 might be a little too old. Also, are you using Linux or Windows? This crash seems to be reproducible only on Windows.

test.jl is just my simple CUDA kernel example for trying out Nsight Systems with Julia. You can write any kernel you like to test the profiling, or feel free to use mine if you don’t mind.

using CUDA

# Define the kernel to add elements of two arrays
function vector_add_kernel(a, b, c, n)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= n
        @inbounds c[i] = a[i] + b[i]
    end
    return
end

# Host code to set up and launch the kernel
function vector_add_test()
    n = 1024
    a = CUDA.fill(1.0f0, n)  # Initialize array `a` with 1.0 (float32)
    b = CUDA.fill(2.0f0, n)  # Initialize array `b` with 2.0 (float32)
    c = CUDA.zeros(Float32, n)  # Output array `c`

    # Define grid and block dimensions
    threads = 256
    blocks = cld(n, threads)

    # Launch the kernel
    @cuda threads = threads blocks = blocks vector_add_kernel(a, b, c, n)

    # Transfer result back to host
    result = Array(c)

    # Verify the result
    is_correct = all(result .== 3.0f0)  # Each element should be 1.0 + 2.0 = 3.0
    if is_correct
        println("Test PASSED")
    else
        println("Test FAILED")
    end
end

# Run the test
vector_add_test()

Also make sure that your CUDA.jl version is the latest one. Mine is 5.5.2:

julia> CUDA.versioninfo()
CUDA runtime 12.6, artifact installation
CUDA driver 12.7
NVIDIA driver 565.90.0

CUDA libraries:
- CUBLAS: 12.6.3
- CURAND: 10.3.7
- CUFFT: 11.3.0
- CUSOLVER: 11.7.1
- CUSPARSE: 12.5.4
- CUPTI: 2024.3.2 (API 24.0.0)
- NVML: 12.0.0+565.90

Julia packages:
- CUDA: 5.5.2
- CUDA_Driver_jll: 0.10.3+0
- CUDA_Runtime_jll: 0.15.3+0

Toolchain:
- Julia: 1.11.1
- LLVM: 16.0.6

1 device:
  0: NVIDIA GeForce RTX 4060 Laptop GPU (sm_89, 7.534 GiB / 7.996 GiB available)

@huiyux Thanks for the suggestions. It turned out I was running Julia on Linux. After switching to Windows, I can successfully reproduce the issue.

I attached WinDbg to the Julia process during profiling and noticed the following exception before it crashed:

This is the same as an existing issue in JuliaGPU’s GitHub repo, which you also mentioned: Crash on Windows · Issue #37 · JuliaGPU/NVTX.jl · GitHub. It is not really related to Nsys and needs to be fixed on the Julia side.

Until it is fixed, you can work around the issue by removing nvtx from the --trace= option. NVTX annotations will no longer be traced, but CUDA tracing will still work. I have verified that with --trace=cuda, the CUDA activities from the test.jl script you shared are captured successfully.
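
For anyone scripting the workaround, here is a minimal sketch that drops nvtx from a comma-separated trace list and prints the resulting command. The helper name strip_nvtx is hypothetical (it is not part of nsys), and the sketch assumes a POSIX shell; only the printed nsys command line comes from this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of nsys): remove "nvtx" from a
# comma-separated --trace list, keeping the other entries in order.
strip_nvtx() {
    printf '%s\n' "$1" | tr ',' '\n' | grep -vx 'nvtx' | paste -sd, -
}

# Trace list from the original report.
TRACE=$(strip_nvtx "cuda,nvtx")

# Print the command instead of running it, so the sketch works
# on machines without nsys or Julia installed.
echo "nsys profile --trace=$TRACE julia test.jl"
```

The printed command can be run unchanged from PowerShell on Windows.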


Thanks for your info and picture - looks like it is indeed an issue from Julia.

Thanks again for taking the time to look into this issue and for your suggestions @liuyis !
