4070 Ti Supers only kind of working

Hello,

I recently purchased two 4070 Ti Supers for rendering and machine learning. However, the latest drivers (550) only give me a black screen with a flashing underscore in the upper left corner. I tried the 550 drivers directly from Nvidia, as well as other options from Ubuntu, all with similar results. So I went back to 535 and 525, and both worked and brought up the Ubuntu GUI, but PyTorch says it doesn’t see any CUDA devices. Blender sees the devices, but when I try to render with them the cards’ power draw doesn’t increase, their fan speeds don’t seem to increase either, and render times don’t decrease.

I don’t believe there is a bottleneck in my system, and I have also tried disabling secure boot. Any other suggestions or insight would be greatly appreciated! Thank you.

Please run nvidia-bug-report.sh as root and attach the resulting nvidia-bug-report.log.gz file to your post.

nvidia-bug-report.log.gz (657.7 KB)

Both GPUs fail to properly initialize with an Xid 62. On a previous boot, one GPU was falling off the bus. Since both GPUs are only using an x1 PCIe connection, I suspect you’re connecting them through risers, and that those risers are bad.

Are you saying that using risers in general is bad, or that the specific risers I am using are bad? I am using risers because I eventually want to have more than 2 GPUs connected.

Thanks so much for your answers so far!

I’m not too fond of risers in general, especially at PCIe 4 and higher speeds. You might want to check whether you can limit the PCIe gen in the BIOS, or try different risers.

I’m sorry, what do you mean by PCIe gen? How would I check that?

Thank you for the help. I will try new risers and hope it increases performance. I’m not sure how else I could use more GPUs than by using risers, but if the risers prevent the GPUs from using CUDA then it isn’t worth it.

PCIe generation (1, 2, 3, 4, 5); each higher generation runs the link at a higher speed and clock.

So I tried lowering the PCIe gen and using different risers, and neither fixed it. I did update to the new Ubuntu 24.04, and now nvidia-smi shows no errors and I am using the latest drivers with no graphical issues. However, Blender can detect the cards, but when rendering the fans don’t spin up and I don’t think the GPUs are speeding up the render. nvidia-smi does show increased memory usage on the cards, though.

On the plus side, PyTorch detects 2 CUDA devices! I am just not sure how to get both devices fully working for rendering and machine learning, but I feel closer than I was on Ubuntu 22.04.
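
Detection alone does not show that the cards can actually run work. As a rough check, a small matrix multiply can be launched on each visible device and timed. This is only a sketch, assuming PyTorch with CUDA support is installed; the function name `check_cuda_devices` and the matrix size are illustrative choices, not anything from the driver or from PyTorch itself.

```python
import time


def check_cuda_devices(size=2048):
    """Run a small matmul on every visible CUDA device.

    Returns one dict per device. Returns an empty list when PyTorch is
    missing or reports no usable CUDA device.
    """
    try:
        import torch  # imported lazily so the script still loads without PyTorch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []

    results = []
    for i in range(torch.cuda.device_count()):
        entry = {"index": i}
        try:
            device = torch.device(f"cuda:{i}")
            entry["name"] = torch.cuda.get_device_name(i)
            a = torch.randn(size, size, device=device)
            b = torch.randn(size, size, device=device)
            torch.cuda.synchronize(device)  # finish setup before timing
            start = time.perf_counter()
            c = a @ b
            torch.cuda.synchronize(device)  # wait for the kernel to complete
            entry["seconds"] = time.perf_counter() - start
            entry["ok"] = bool(torch.isfinite(c).all().item())
        except RuntimeError as exc:
            # CUDA failures (for example context creation errors) surface here
            entry["ok"] = False
            entry["error"] = str(exc)
        results.append(entry)
    return results


for result in check_cuda_devices():
    print(result)
```

If a device is listed but its entry comes back with `ok` set to False, or the call hangs, that points at the link or driver rather than at Blender or the training script.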

Hi @generix, I am now on Ubuntu 24.04, and with it the GPUs are recognized by everything: Blender, PyTorch, Minecraft, and nvidia-smi (with no errors). However, they all still have trouble actually using the GPUs. I have Secure Boot and Wayland disabled and have tried all the flavors of the 550 drivers, yet PyTorch sees no acceleration and Blender always throws an error. The error changes; currently I am getting “Failed to retain CUDA context (Unknown error)”, but sometimes it is different. I generated another nvidia-bug-report. Would you mind giving it a look? I set the PCIe lanes to gen 3, which seems to be the highest setting at which the screen renders without errors.

Any help is greatly appreciated, thank you!

nvidia-bug-report.log.gz (1.1 MB)

The first GPU is running at gen 3 and looks fine, but the secondary GPU is still running at gen 4 and reporting bus errors.
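
The per-GPU link generation and width can also be read without going through the full bug report. The sketch below assumes `nvidia-smi` is on the PATH and uses its `--query-gpu` fields `pcie.link.gen.current` and `pcie.link.width.current`; the helper names `parse_pcie_links` and `query_pcie_links` are made up for this example. Note that the current generation can drop while a GPU is idle because of power saving, so it is best read while the card is under load.

```python
import subprocess

# nvidia-smi query fields: GPU index, current PCIe generation, current lane width
QUERY = "index,pcie.link.gen.current,pcie.link.width.current"


def parse_pcie_links(csv_text):
    """Parse 'index, gen, width' rows (CSV without a header) into dicts."""
    links = []
    for line in csv_text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        # Skip malformed rows and rows with non-numeric values such as [N/A]
        if len(fields) != 3 or not all(field.isdigit() for field in fields):
            continue
        index, gen, width = (int(field) for field in fields)
        links.append({"index": index, "gen": gen, "width": width})
    return links


def query_pcie_links():
    """Ask nvidia-smi for the current PCIe link state of every GPU."""
    completed = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_pcie_links(completed.stdout)
```

Calling `query_pcie_links()` on the affected machine returns one entry per GPU, so a card that is still at gen 4 after changing the BIOS setting shows up directly.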