When I run the following command to launch Julia through nsys,
nsys launch [path\to\julia]
the Julia REPL starts successfully. But when I then run
using CUDA
I get the following warning:
┌ Warning: CUDA runtime library cupti64_120.dll was loaded from a system path. This may cause errors.
│ Ensure that you have not set the LD_LIBRARY_PATH environment variable, or that it does not contain paths to CUDA libraries.
└ @ CUDA C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\initialization.jl:173
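One way to see which copy of the library the warning refers to is to list the shared libraries the Julia process has loaded. The following is a minimal sketch, not something taken from CUDA.jl itself: it uses `Libdl.dllist()` from Julia's standard library, the helper name `loaded_libs_matching` is hypothetical, and filtering on the substring "cupti" is an assumption based on the DLL name in the warning.

```julia
using Libdl

# Hypothetical helper: return the paths of all loaded shared libraries
# whose file name contains `needle` (case-insensitive).
function loaded_libs_matching(needle::AbstractString)
    needle = lowercase(needle)
    return filter(p -> occursin(needle, lowercase(basename(p))), Libdl.dllist())
end

# Run this after `using CUDA` in the session started through nsys.
foreach(println, loaded_libs_matching("cupti"))
```

If the printed path points outside the `.julia` directory, that would be consistent with the warning that the library was picked up from a system path.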
If I continue by creating a CUDA array and doing a basic computation,
a = CUDA.rand(5);
sin.(a)
I get a long error message:
ERROR: Failed to compile PTX code (ptxas exited with code 3221225477)
Invocation arguments: --generate-line-info --verbose --gpu-name sm_86 --output-file C:\Users\hugo1\AppData\Local\Temp\jl_1adCNWXW9k.cubin C:\Users\hugo1\AppData\Local\Temp\jl_ZydkBwTu7O.ptx
ptxas info : 24 bytes gmem
ptxas info : Compiling entry function '_Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE3sinS4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EEEES6_' for 'sm_86'
ptxas info : Function properties for _Z16broadcast_kernel15CuKernelContext13CuDeviceArrayI7Float32Li1ELi1EE11BroadcastedI12CuArrayStyleILi1EE5TupleI5OneToI5Int64EE3sinS4_I8ExtrudedIS0_IS1_Li1ELi1EES4_I4BoolES4_IS6_EEEES6_
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 456 bytes cmem[0], 8 bytes cmem[2]
If you think this is a bug, please file an issue and attach C:\Users\hugo1\AppData\Local\Temp\jl_ZydkBwTu7O.ptx
Stacktrace:
[1] error(s::String)
@ Base .\error.jl:35
[2] compile(job::GPUCompiler.CompilerJob)
@ CUDA C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\compiler\compilation.jl:188
[3] actual_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, world::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::typeof(CUDA.compile), linker::typeof(CUDA.link))
@ GPUCompiler C:\Users\hugo1\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:125
[4] cached_compilation(cache::Dict{Any, CuFunction}, src::Core.MethodInstance, cfg::GPUCompiler.CompilerConfig{GPUCompiler.PTXCompilerTarget, CUDA.CUDACompilerParams}, compiler::Function, linker::Function)
@ GPUCompiler C:\Users\hugo1\.julia\packages\GPUCompiler\YO8Uj\src\execution.jl:103
[5] macro expansion
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:318 [inlined]
[6] macro expansion
@ .\lock.jl:267 [inlined]
[7] cufunction(f::GPUArrays.var"#broadcast_kernel#26", tt::Type{Tuple{CUDA.CuKernelContext, CuDeviceVector{Float32, 1}, Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{CuDeviceVector{Float32, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64}}; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
@ CUDA C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:313
[8] cufunction
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:310 [inlined]
[9] macro expansion
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\compiler\execution.jl:104 [inlined]
[10] #launch_heuristic#1080
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\gpuarrays.jl:17 [inlined]
[11] launch_heuristic
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\gpuarrays.jl:15 [inlined]
[12] _copyto!
@ C:\Users\hugo1\.julia\packages\GPUArrays\5XhED\src\host\broadcast.jl:65 [inlined]
[13] copyto!
@ C:\Users\hugo1\.julia\packages\GPUArrays\5XhED\src\host\broadcast.jl:46 [inlined]
[14] copy
@ C:\Users\hugo1\.julia\packages\GPUArrays\5XhED\src\host\broadcast.jl:37 [inlined]
[15] materialize(bc::Base.Broadcast.Broadcasted{CUDA.CuArrayStyle{1}, Nothing, typeof(sin), Tuple{CuArray{Float32, 1, CUDA.Mem.DeviceBuffer}}})
@ Base.Broadcast .\broadcast.jl:873
[16] top-level scope
@ REPL[2]:1
[17] top-level scope
@ C:\Users\hugo1\.julia\packages\CUDA\tVtYo\src\initialization.jl:185
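The ptxas exit code in the error above is easier to interpret in hexadecimal. A small sketch of the conversion, assuming the code is a Windows NTSTATUS value reported as an unsigned 32-bit integer:

```julia
# Exit code reported in the error message above.
exit_code = 3221225477

# Reinterpret as an unsigned 32-bit value and format it in hexadecimal.
hex_code = "0x" * uppercase(string(UInt32(exit_code), base = 16))
println(hex_code)  # prints 0xC0000005
```

0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION. Since the ptxas info lines show the kernel being compiled normally, this suggests that ptxas crashed rather than rejected the PTX code.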
Last but not least, I am using an RTX 3050 with the following software:
C:\Windows\System32>nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Fri_Jan__6_19:04:39_Pacific_Standard_Time_2023
Cuda compilation tools, release 12.0, V12.0.140
Build cuda_12.0.r12.0/compiler.32267302_0
C:\Windows\System32>nsys --version
NVIDIA Nsight Systems version 2022.4.2.50-32196742v0
BTW, I also run into the “command ignored” message when I try to launch Julia, and the workarounds mentioned in
Nsys launch julia hangs on on Windows 11
don’t always work.
Thx for the help.