I am using Qwen2 7B, which I loaded with Unsloth like this:
from unsloth import FastLanguageModel
import torch
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=MODEL_DIR, max_seq_length=MAX_SEQ_LEN, dtype=torch.float16, load_in_4bit=True, device_map='cuda:0')
I run a standard generation pipeline with this script:
inputs = tokenizer(prompt, return_tensors="pt", padding=False, truncation=False)
input_ids = inputs["input_ids"].to('cuda:0')
attention_mask = inputs["attention_mask"].to('cuda:0')
prompt_len = input_ids.shape[1]
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids, attention_mask=attention_mask, temperature=0.7, top_p=0.9,
        do_sample=True, use_cache=True, return_dict_in_generate=True)
outputs.sequences
The first time I run it, it works fine. But when I run the same script again to generate another response, it raises an error:
AcceleratorError Traceback (most recent call last)
Cell In[24], line 2
      1 inputs = tokenizer(prompt, return_tensors="pt", padding=False, truncation=False)
----> 2 input_ids = inputs["input_ids"].to('cuda:0')
      3 attention_mask = inputs["attention_mask"].to('cuda:0')
      4 prompt_len = input_ids.shape[1]
AcceleratorError: CUDA error: device-side assert triggered
Search for `cudaErrorAssert' in https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__TYPES.html for more information.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
I followed the link mentioned above, looked for cudaErrorAssert, and found:
cudaErrorAssert = 710
An assert triggered in device code during kernel execution. The device cannot be used again. All existing allocations are invalid. To continue using CUDA, the process must be terminated and relaunched.
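Read literally, this description suggests the second run may fail not because of anything on the line shown in the traceback, but because an earlier assert left the CUDA context unusable for the rest of the process. A minimal sketch of running work in a fresh child process, so that a fatal error there does not take the parent session down with it. Note that `run_isolated` is a hypothetical helper name, and the child snippets are placeholders, not actual model generation:

```python
import subprocess
import sys


def run_isolated(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Run a Python snippet in a fresh interpreter process (hypothetical helper)."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


# Placeholder child code: a real job would load the model and call generate().
ok = run_isolated("print('generated text')")
# Stand-in for a child that dies with a fatal error.
crashed = run_isolated("import sys; sys.exit(3)")

print(ok.returncode, ok.stdout.strip())
print(crashed.returncode)  # the parent process is still usable at this point
```

The trade-off is that each child process has to reload the model, so this is mainly useful while hunting for the input that triggers the assert.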
I am using a Tesla V100-PCIE-32GB GPU with:
| NVIDIA-SMI 535.54.03 Driver Version: 535.54.03 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
I searched online for a way to fix this problem and found these settings:
os.environ['TORCH_USE_CUDA_DSA'] = "1"
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
But they didn't fix the issue in my case. I am using two GPUs:
os.environ["CUDA_VISIBLE_DEVICES"] = "3,4"
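For context, my understanding is that these variables are only read when CUDA is initialized, so they need to be set before that point, and that `CUDA_LAUNCH_BLOCKING` helps locate the failing kernel rather than fix it. The error message says "Compile with `TORCH_USE_CUDA_DSA`", which reads like a build-time option, so setting it at runtime may have no effect on a prebuilt PyTorch. A minimal sketch of that ordering, where `configure_cuda_env` is a hypothetical helper and checking for an already-imported `torch` is a conservative assumption, not an official rule:

```python
import os
import sys


def configure_cuda_env(visible_devices: str = "3,4") -> bool:
    """Set CUDA-related variables; return True if torch was not yet imported."""
    set_early = "torch" not in sys.modules
    os.environ["CUDA_VISIBLE_DEVICES"] = visible_devices
    # Synchronous kernel launches, so an error is reported at the call that caused it.
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return set_early


set_early = configure_cuda_env()
n_visible = len(os.environ["CUDA_VISIBLE_DEVICES"].split(","))
print(f"set before torch import: {set_early}")
print(f"visible devices are renumbered as cuda:0..cuda:{n_visible - 1}")
# import torch  # would go here, after the environment is configured
```

With `CUDA_VISIBLE_DEVICES` set to "3,4", physical GPUs 3 and 4 appear inside the process as `cuda:0` and `cuda:1`, so `device_map='cuda:0'` refers to physical GPU 3.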
How can I fix this error?