Grounding DINO: out of memory

I am a bit surprised: I cannot evaluate the model using the Grounding DINO notebook.

• Hardware: RTX 4060 (8 GB dedicated / 16 GB shared GPU memory)

Docker and the notebooks run inside WSL. Here is the evaluation log:
2025-01-20 14:26:08,889 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']
2025-01-20 14:26:08,925 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt
2025-01-20 14:26:08,941 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True
sys:1: UserWarning:
'evaluate.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/hydra/hydra_runner.py:107: UserWarning:
'evaluate.yaml' is validated against ConfigStore schema with the same name.
This behavior is deprecated in Hydra 1.1 and will be removed in Hydra 1.2.
See https://hydra.cc/docs/next/upgrades/1.0_to_1.1/automatic_schema_matching for migration instructions.
_run_hydra(
/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py:119: UserWarning: Future Hydra versions will no longer change working directory at job runtime by default.
See … for more information.
ret = run_job(
/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/loggers/api_logging.py:236: UserWarning: Log file already exists at /results/evaluate/status.json
rank_zero_warn(
/usr/local/lib/python3.10/dist-packages/torch/functional.py:512: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/pytorch/pytorch/aten/src/ATen/native/TensorShape.cpp:3553.)
Evaluate results will be saved at: /results/evaluate
final text_encoder_type: bert-base-uncased

tokenizer_config.json: 100%|██████████| 48.0/48.0 [00:00<00:00, 561kB/s]
config.json: 100%|██████████| 570/570 [00:00<00:00, 6.46MB/s]
vocab.txt: 100%|██████████| 232k/232k [00:00<00:00, 1.97MB/s]
tokenizer.json: 100%|██████████| 466k/466k [00:00<00:00, 2.40MB/s]
final text_encoder_type: bert-base-uncased
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
The following callbacks returned in LightningModule.configure_callbacks will override existing callbacks passed to Trainer: ModelCheckpoint
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
Testing DataLoader 0: 0%| | 0/25 [00:00<?, ?it/s]
/usr/local/lib/python3.10/dist-packages/transformers/modeling_utils.py:993: FutureWarning: The device argument is deprecated and will be removed in v5 of Transformers.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/torch/utils/checkpoint.py:91: UserWarning: None of the inputs have requires_grad=True. Gradients will be None
warnings.warn(

Testing DataLoader 0: 12%|█▏ | 3/25 [00:49<06:01, 0.06it/s]
Error executing job with overrides: ['evaluate.checkpoint=/workspace/tao-experiments/grounding_dino/grounding_dino_vgrounding_dino_swin_tiny_commercial_trainable_v1.0/grounding_dino_swin_tiny_commercial_trainable.pth', 'results_dir=/results']
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 69, in _func
raise e
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/core/decorators/workflow.py", line 48, in _func
runner(cfg, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/grounding_dino/scripts/evaluate.py", line 81, in main
run_experiment(experiment_config=cfg)
File "/usr/local/lib/python3.10/dist-packages/nvidia_tao_pytorch/cv/grounding_dino/scripts/evaluate.py", line 61, in run_experiment
trainer.test(model, datamodule=dm)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 753, in test
return call._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/call.py", line 44, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 793, in _test_impl
results = self._run(model, ckpt_path=ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 986, in _run
results = self._run_stage()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1025, in _run_stage
return self._evaluation_loop.run()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/utilities.py", line 182, in _decorator
return loop_run(self, *args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/evaluation_loop.py", line 128, in run
batch, batch_idx, dataloader_idx = next(data_fetcher)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fetchers.py", line 133, in __next__
batch = super().__next__()
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/loops/fetchers.py", line 60, in __next__
batch = next(self.iterator)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/combined_loader.py", line 341, in __next__
out = next(self._iterator)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/utilities/combined_loader.py", line 142, in __next__
out = next(self.iterators[0])
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 631, in __next__
data = self._next_data()
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1346, in _next_data
return self._process_data(data)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py", line 1372, in _process_data
data.reraise()
File "/usr/local/lib/python3.10/dist-packages/torch/_utils.py", line 705, in reraise
raise exception
RuntimeError: Caught RuntimeError in pin memory thread for device 0.
Original Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 37, in do_one_step
data = pin_memory(data, device)
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 79, in pin_memory
return [pin_memory(sample, device) for sample in data]  # Backwards compatibility.
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 79, in <listcomp>
return [pin_memory(sample, device) for sample in data]  # Backwards compatibility.
File "/usr/local/lib/python3.10/dist-packages/torch/utils/data/_utils/pin_memory.py", line 58, in pin_memory
return data.pin_memory(device)
RuntimeError: CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

I have a similar problem when using UNet.

I also have an Intel GPU in this machine; maybe I need to change the GPU index?

For WSL, we do not know its status since TAO is not verified on WSL. Also, since you are running from a notebook, there can be memory limitations in the notebook itself; refer to "python - How to increase Jupyter notebook Memory limit? - Stack Overflow" to increase the Jupyter notebook memory limit. For WSL, please also try to increase the swap size.
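For example, on WSL2 the memory and swap limits can be raised through a .wslconfig file in the Windows user profile. A minimal sketch (the 12GB/16GB values and the <YourWindowsUser> path are placeholders to adapt to your machine):

# Create or edit %UserProfile%\.wslconfig from inside WSL
cat > /mnt/c/Users/<YourWindowsUser>/.wslconfig <<'EOF'
[wsl2]
memory=12GB
swap=16GB
EOF

# Restart WSL so the new limits take effect (this closes all running WSL sessions)
wsl.exe --shutdown

After WSL restarts, free -h should show the new memory and swap sizes.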

OK… I used the notebook because it was recommended. I don't know whether ready-to-run Python code for it exists somewhere; that would save me from copying and pasting the Python code.

No problem, I can use standalone Python. My WSL setup already works with Python and the GPU, but there may be some limitations on GPU memory usage, and I don't know how to check that.

My Docker also runs inside WSL (for licensing reasons). At the moment I don't really know how the NVIDIA Docker image interacts with the GPU, but I think the image needs to access the NVIDIA GPU and is failing to do so. Is there a simple way to check that NVIDIA Docker runs fine?

You can run the container with docker run; the Docker image information can be found via "$ tao info --verbose".

Please install nvidia-docker2.

curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | \
sudo apt-key add -
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | \
sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
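To answer the earlier question about checking that NVIDIA Docker runs fine: a quick sanity check is to run nvidia-smi on the WSL side and then inside a container. A rough sketch (the nvidia/cuda image tag is only an example; any CUDA-enabled image works):

# Confirm the GPU and its memory are visible from WSL
nvidia-smi

# Confirm a container can reach the GPU through the NVIDIA runtime
sudo docker run --rm --runtime=nvidia nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi

If both commands list the RTX 4060, the NVIDIA container runtime is working.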

nvidia-docker2 is installed

Here is my tao info --verbose log:

Configuration of the TAO Toolkit Instance

task_group:
  model:
    dockers:
      nvidia/tao/tao-toolkit:
        5.5.0-pyt:
          docker_registry: nvcr.io
          tasks:
            1. action_recognition
            2. centerpose
            3. visual_changenet
            4. deformable_detr
            5. dino
            6. grounding_dino
            7. mask_grounding_dino
            8. mask2former
            9. mal
            10. ml_recog
            11. ocdnet
            12. ocrnet
            13. optical_inspection
            14. pointpillars
            15. pose_classification
            16. re_identification
            17. classification_pyt
            18. segformer
            19. bevfusion
        5.0.0-tf1.15.5:
          docker_registry: nvcr.io
          tasks:
            1. bpnet
            2. classification_tf1
            3. converter
            4. detectnet_v2
            5. dssd
            6. efficientdet_tf1
            7. faster_rcnn
            8. fpenet
            9. lprnet
            10. mask_rcnn
            11. multitask_classification
            12. retinanet
            13. ssd
            14. unet
            15. yolo_v3
            16. yolo_v4
            17. yolo_v4_tiny
        5.5.0-tf2:
          docker_registry: nvcr.io
          tasks:
            1. classification_tf2
            2. efficientdet_tf2
  dataset:
    dockers:
      nvidia/tao/tao-toolkit:
        5.5.0-data-services:
          docker_registry: nvcr.io
          tasks:
            1. augmentation
            2. auto_label
            3. annotations
            4. analytics
  deploy:
    dockers:
      nvidia/tao/tao-toolkit:
        5.5.0-deploy:
          docker_registry: nvcr.io
          tasks:
            1. visual_changenet
            2. centerpose
            3. classification_pyt
            4. classification_tf1
            5. classification_tf2
            6. deformable_detr
            7. detectnet_v2
            8. dino
            9. dssd
            10. efficientdet_tf1
            11. efficientdet_tf2
            12. faster_rcnn
            13. grounding_dino
            14. mask_grounding_dino
            15. mask2former
            16. lprnet
            17. mask_rcnn
            18. ml_recog
            19. multitask_classification
            20. ocdnet
            21. ocrnet
            22. optical_inspection
            23. retinanet
            24. segformer
            25. ssd
            26. trtexec
            27. unet
            28. yolo_v3
            29. yolo_v4
            30. yolo_v4_tiny
format_version: 3.0
toolkit_version: 5.5.0
published_date: 08/26/2024

So, you can use the tao launcher or docker run to access the TAO container.
For example,
docker run --runtime=nvidia -it --rm -v /localhome/morganh:/localhome/morganh nvcr.io/nvidia/tao/tao-toolkit:5.5.0-pyt /bin/bash
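From inside that container, the Grounding DINO evaluation can then be run without the notebook. A rough sketch, assuming the container exposes the task entrypoints directly and that the spec and checkpoint paths below are replaced with files you mounted via -v:

# Hypothetical paths; point them at your mounted spec, checkpoint, and results directory
grounding_dino evaluate \
  -e /workspace/tao-experiments/grounding_dino/specs/evaluate.yaml \
  evaluate.checkpoint=/workspace/tao-experiments/grounding_dino/grounding_dino_swin_tiny_commercial_trainable.pth \
  results_dir=/results

The evaluate.checkpoint and results_dir overrides are the same ones shown in your error log, so the spec file from the notebook can be reused as-is.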