Newbie 5090 passing CUDA_LAUNCH_BLOCKING=1 problem

Hi,
I am trying to learn AI to get a new job, but I am not familiar with coding and all these problems.
I will be paying for this system for 10 more months :/
I have an RTX 5090 and am using Windows 11
AMD 9800X3D
1500 W PSU, etc.
My system benchmarks are OK and the ROPs are correct.

When I try to install Wan 2.1 with Pinokio, it fails with this error:

RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

(env) (base) C:\pinokio\api\wan.git\app>

I executed: pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 in the env folder.
I updated all the drivers (NVIDIA Nsight Graphics | NVIDIA Developer) but couldn’t manage to fix it; I am having so many problems with all text-to-video software.
Even after so many difficulties I managed to run ComfyUI and followed a guide, but at the end I got this error:
DWPreprocessor
nvrtc: error: invalid value for --gpu-architecture (-arch)

nvrtc compilation failed:

#define NAN __int_as_float(0x7fffffff)
#define POS_INFINITY __int_as_float(0x7f800000)
#define NEG_INFINITY __int_as_float(0xff800000)

template<typename T>
__device__ T maximum(T a, T b) {
  return isnan(a) ? a : (a > b ? a : b);
}

template<typename T>
__device__ T minimum(T a, T b) {
  return isnan(a) ? a : (a < b ? a : b);
}

extern "C" __global__
void fused_sigmoid_mul(float* tinput_2, float* aten_mul) {
{
  float tinput_2_1 = __ldg(tinput_2 + (long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x));
  aten_mul[(long long)(threadIdx.x) + 512ll * (long long)(blockIdx.x)] = tinput_2_1 * (1.f / (1.f + (expf(0.f - tinput_2_1))));
}
}
Can anyone help me in a way that I can understand, please? :(

I have no idea what “pinokio” is. The immediate issue seems to be:

CUDA error: no kernel image is available for execution on the device

which seems to be a consequence of an earlier error:

nvrtc: error: invalid value for --gpu-architecture (-arch)

Because an invalid compute capability was passed to the compiler, no kernel was built, and therefore no binary kernel image is available at run time.

What compute capability did you specify for compilation? From NVIDIA’s published list, RTX 5090 has compute capability 10.0, so the corresponding compiler architecture designations are likely sm_100 and compute_100.

Since the RTX 5090 just started shipping, make sure you are using the latest NVIDIA software components (latest driver package, latest CUDA version).
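
Checking what is currently installed only takes a few lines of CUDA host code. This is just a sketch of mine (not part of Pinokio or any of the tools mentioned above) that prints the highest CUDA version the installed driver supports and the runtime version the program was built against; compile it with nvcc and run it:

// check_versions.cu -- sketch: report CUDA driver and runtime versions
// Build: nvcc check_versions.cu -o check_versions
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int driverVersion = 0, runtimeVersion = 0;
    cudaDriverGetVersion(&driverVersion);    // highest CUDA version the installed driver supports
    cudaRuntimeGetVersion(&runtimeVersion);  // CUDA runtime version this binary was built against
    printf("driver supports CUDA : %d.%d\n", driverVersion / 1000, (driverVersion % 100) / 10);
    printf("runtime version      : %d.%d\n", runtimeVersion / 1000, (runtimeVersion % 100) / 10);
    return 0;
}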


I’m fairly confident the RTX 5090 compute capability is 12.0.
It can be verified in situ by running the deviceQuery sample code.
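
For anyone who does not want to build the full deviceQuery sample, the relevant part can be reproduced in a few lines. This is only a stripped-down sketch of mine, not the actual sample code; on an RTX 5090 it should report 12.0:

// cc_query.cu -- minimal stand-in for deviceQuery: print each GPU's compute capability
// Build: nvcc cc_query.cu -o cc_query
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("No CUDA-capable device found\n");
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("Device %d: %s, compute capability %d.%d\n",
               dev, prop.name, prop.major, prop.minor);
    }
    return 0;
}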

Yes, I realize NVIDIA has published info that is different; I am working to correct that.


Thanks for the help, Njuffa and Robert.
The guide I am following is a text-to-video project and the author is doing it with an RTX 4060, so a 5090 should handle it much more easily. I checked every step very carefully and tried more than 10 times, but I always get the same problems.
Maybe I should wait for new drivers, because the ones on my computer are already up to date, and I couldn't find anyone who is using this software with a 5000-series card.

Unfortunately things don’t work quite that easily.

RTX 4060 has compute capability 8.9, while RTX 5090 has compute capability 12.0. This means (1) the RTX 4060 will work fine with older NVIDIA software while the RTX 5090 requires the latest NVIDIA software stack; (2) the GPU architecture (= compute capability) appropriate for the GPU used needs to be supplied in the compilation step.

First you would want to install the latest NVIDIA driver package and CUDA software (CUDA Toolkit 12.8 Update 1). Then you need to find the nvrtc invocation(s), check what --gpu-architecture or -arch settings are being passed in, and modify them to the value appropriate for the RTX 5090.
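
To make the last point concrete, this is roughly what such an nvrtc invocation looks like from the host side. It is a sketch of mine, not the code ComfyUI or PyTorch actually contains; the kernel source is a placeholder, and the point of interest is the --gpu-architecture option, which must name an architecture the installed toolkit knows (compute_120 for an RTX 5090 with CUDA 12.8; an older toolkit, or a wrong string, fails with exactly the "invalid value for --gpu-architecture (-arch)" message shown above):

// nvrtc_arch_demo.cpp -- sketch: compile a trivial kernel with nvrtc for a given architecture
// Build: nvcc nvrtc_arch_demo.cpp -lnvrtc -o nvrtc_arch_demo
#include <cstdio>
#include <vector>
#include <nvrtc.h>

static const char *kSource =
    "extern \"C\" __global__ void dummy(float *out) { out[threadIdx.x] = 1.0f; }\n";

int main() {
    nvrtcProgram prog;
    nvrtcCreateProgram(&prog, kSource, "dummy.cu", 0, nullptr, nullptr);

    // This is the option the application passes internally. compute_120 matches an
    // RTX 5090; a toolkit that does not know this architecture, or a wrong string,
    // reports "invalid value for --gpu-architecture (-arch)".
    const char *opts[] = { "--gpu-architecture=compute_120" };
    nvrtcResult res = nvrtcCompileProgram(prog, 1, opts);

    size_t logSize = 0;
    nvrtcGetProgramLogSize(prog, &logSize);
    std::vector<char> log(logSize + 1, '\0');
    nvrtcGetProgramLog(prog, log.data());
    printf("compile result: %s\nlog:\n%s\n", nvrtcGetErrorString(res), log.data());

    nvrtcDestroyProgram(&prog);
    return res == NVRTC_SUCCESS ? 0 : 1;
}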

I am trying to learn AI … but I am not familiar with coding

In my head that translates to “I am going to be an ice road trucker but I do not have a driver’s license yet.” I would strongly suggest finding someone local who can assist you one-on-one. We all need to walk before we run, and crawl before we walk. This is going to be a process. It’s doable, but it takes time and dedication. I don’t know where you are in the world or what your circumstances are, but maybe there is a group of hobbyists that meet regularly near you, or there are relevant classes offered at a community college, or maybe even just a course at the local library.

Best of luck in your endeavors.


Could this be the problem?
In the latest CUDA Toolkit (12.8 Update 1) it says CUDA Compatibility is Linux only.

I am not sure what you are looking at. The link in your post goes to a page that clearly indicates that Windows is a supported platform for CUDA (as it has been since CUDA first became available 18 years ago).

Which system platforms the Pinokio environment is supported on I could not say, as I am not familiar with this software; this thread is the first time I have encountered the name.


I am from Türkiye, and I actually very much agree with you. I am searching for local people to get help with the basics, but since I haven't found anyone yet, I am trying to find answers online.
But yes, I am not even capable of understanding the answers right now :)
Thanks for your answers and advice.
Best wishes
Umut

I acknowledge that it is easy to get the ordinary meaning of the term compatibility mixed up with a specific feature called “CUDA compatibility”. As you can see, this is something that applies only to aarch64-jetson platforms, that is, NVIDIA’s integrated platforms based on 64-bit ARM processors.

The relevant platform in your context is x86_64, that is, 64-bit processors based on Intel / AMD architecture.
