Torch crashes driver on H100

CUDA Version: 12.9
torch Version: 2.7.1+cu128
I am using a single H100 PCIe on Paperspace.

>>> torch.cuda.is_available()
True

This shows that CUDA is available.
However, when I run

x = torch.randn(1, 3, 224, 224, device="cuda") 

torch tries to initialize CUDA and I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tyrin/Desktop/CCAMSync/.pixi/envs/flash/lib/python3.10/site-packages/torch/cuda/__init__.py", line 372, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 802: system not yet initialized
>>> torch.cuda.is_available()

I’ve looked at other posts and seen that this error is usually related to fabricmanager; however, I am using a single-GPU system.

Do you get the same error with CUDA 12.8, which seems to be the version your torch build is based on?
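
In case it helps with checking this: the `+cu128` suffix in the torch version string names the CUDA toolkit the wheel was built against (12.8), while the 12.9 quoted above is presumably the version the driver reports (for example via `nvidia-smi`). Below is a minimal sketch that pulls the build version out of the version string and compares it with the driver-side number. The helper names `cuda_build_from_torch_version` and `parse_cuda_version` are made up for illustration and are not part of torch; the two version strings are copied from the post.

```python
import re


def cuda_build_from_torch_version(version):
    """Return (major, minor) of the CUDA build tag in a torch version string.

    '2.7.1+cu128' -> (12, 8). Returns None if there is no '+cuNNN' tag
    (for example a CPU-only build such as '2.7.1+cpu').
    Hypothetical helper, not a torch API.
    """
    match = re.search(r"\+cu(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, the rest is the major version:
    # cu128 -> 12.8, cu118 -> 11.8.
    return int(digits[:-1]), int(digits[-1])


def parse_cuda_version(text):
    """Turn a 'major.minor' string such as '12.9' into (12, 9)."""
    major, minor = text.strip().split(".")[:2]
    return int(major), int(minor)


# Values copied from the post above.
build = cuda_build_from_torch_version("2.7.1+cu128")
driver = parse_cuda_version("12.9")

print("torch built against CUDA:", build)
print("driver reports CUDA:     ", driver)
# True when the driver-side version is at least as new as the build version.
print("driver >= build:", build is not None and build <= driver)
```

In a live session, `torch.version.cuda` reports the build version directly, so the string parsing is only needed when all you have is a package listing. A driver reporting a newer CUDA version than the wheel's build version is normally a supported combination, so this check mainly rules out the opposite case (a wheel built for a newer CUDA than the driver supports).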