Hello.
I am trying to fine-tune a model using AutoTrain. At some point it fails with a CUDA out-of-memory error. Here is the full error message:
ERROR train has failed due to an exception:
ERROR Traceback (most recent call last):
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/autotrain/utils.py", line 280, in wrapper
    return func(*args, **kwargs)
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/autotrain/trainers/clm/main.py", line 168, in train
    model = AutoModelForCausalLM.from_pretrained(
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 566, in from_pretrained
    return model_class.from_pretrained(
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3480, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3870, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 751, in _load_state_dict_into_meta_model
    set_module_quantized_tensor_to_device(
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/transformers/integrations/bitsandbytes.py", line 98, in set_module_quantized_tensor_to_device
    new_value = bnb.nn.Params4bit(new_value, requires_grad=False, **kwargs).to(device)
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 179, in to
    return self.cuda(device)
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/bitsandbytes/nn/modules.py", line 157, in cuda
    w_4bit, quant_state = bnb.functional.quantize_4bit(w, blocksize=self.blocksize, compress_statistics=self.compress_statistics, quant_type=self.quant_type)
  File "/anaconda/envs/customenv/lib/python3.10/site-packages/bitsandbytes/functional.py", line 816, in quantize_4bit
    out = torch.zeros(((n+1)//2, 1), dtype=torch.uint8, device=A.device)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB. GPU 0 has a total capacty of 31.74 GiB of which 87.31 MiB is free. Including non-PyTorch memory, this process has 31.65 GiB memory in use. Of the allocated memory 31.17 GiB is allocated by PyTorch, and 128.73 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
How can I solve this error?
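The error message itself suggests setting max_split_size_mb to avoid fragmentation. My understanding (I may be wrong) is that this goes through the PYTORCH_CUDA_ALLOC_CONF environment variable and has to be set before the first CUDA allocation, roughly like the sketch below; the value 512 is just a placeholder I picked, and when launching through the autotrain CLI I assume it would instead be exported in the shell before running the command:

import os

# Must be set before the first CUDA allocation to take effect; 512 is only an
# illustrative value I chose, not a recommendation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch

x = torch.zeros(1, device="cuda")  # the caching allocator now honours the setting above

I am not sure this addresses the root cause though, since the error says GPU 0 has only 87.31 MiB free when the allocation fails.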
The compute instance I am using has 8 x NVIDIA Tesla V100 GPUs with 32 GB of VRAM each.
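Only GPU 0 appears in the error, so to double-check what PyTorch actually sees on this instance I would run something like the snippet below (nothing AutoTrain-specific, just plain torch):

import torch

# Sanity check: with 8 x V100 I would expect device_count() == 8 and roughly
# 32 GiB total per device before training starts.
for i in range(torch.cuda.device_count()):
    free, total = torch.cuda.mem_get_info(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}, "
          f"{free / 2**30:.1f} GiB free / {total / 2**30:.1f} GiB total")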
The autotrain command being used is this:
autotrain llm --train --project_name myprojectname --model meta-llama/Llama-2-70b-hf --data_path my/data/path/on/hf --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 2 --num_train_epochs 3 --trainer sft --model_max_length 4096 --token my_token
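For context, my rough understanding of what --use_int4 does (an assumption on my part, not AutoTrain's actual code) is that it loads the checkpoint in 4-bit via transformers and bitsandbytes, roughly like this; the exact BitsAndBytesConfig values and the device_map="auto" setting are my guesses:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Rough sketch of what I assume --use_int4 translates to; the exact
# quantization settings AutoTrain uses may differ.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",
    quantization_config=bnb_config,
    device_map="auto",  # should let the shards spread across all visible GPUs
)

From the traceback it looks like the quantized weights end up on GPU 0 only, which is where the allocation fails.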
#nvidiainception