Nvidia Cosmos running on Jetson

🚀 Thrilled to announce a major milestone in my journey with NVIDIA Cosmos™! 🌌

I successfully ported the revolutionary NVIDIA Cosmos™ platform to the Jetson AGX Orin, along with the Transformer Engine, making both fully containerized with Docker for a true plug-and-play experience. Cosmos is a groundbreaking platform of generative world foundation models (WFMs), advanced tokenizers, and an accelerated data processing pipeline, purpose-built to advance Physical AI in autonomous vehicles and robotics.

This work was recently showcased at CES 2025, where Cosmos took center stage as a transformative technology for developers and industries worldwide. With the port to Jetson AGX Orin, we're unlocking the power of Cosmos and the Transformer Engine for edge applications, allowing developers to leverage its physics-based synthetic data generation, model fine-tuning capabilities, and highly efficient inference on compact, efficient systems.

Porting Cosmos and the Transformer Engine wasn't just about integration; it's about empowering developers to harness the future of AI-driven robotics and autonomous systems. With its modular, scalable design, Cosmos is a key enabler for innovation, helping the industry address challenges like data scarcity and variability through synthetic environments that are both photoreal and physics-based.

Let's shape the future of Physical AI together! Feel free to connect, collaborate, and share your insights on this exciting journey. 🚀

#NVIDIA #cosmos #NvidiaCosmos #TransformerEngine #PhysicalAI #GenerativeAI #JetsonAGXOrin #robotics #edgecomputing #AutonomousVehicles #ai #Innovation #CES2025 #docker #SyntheticData

Hello Johnny

Can you please share, step by step, how you installed it on the AGX Orin? (I have the 32 GB version on JetPack 6.1.)

NOTE: When I followed the instructions on the page below and ran them, the process got "killed".

Thanks

Using jetson-containers and my Docker image:

https://hub.docker.com/r/johnnync/r36.4.0-cu126-cp310-cosmos

I am downloading Cosmos from the Docker page now for testing, but I cannot find it on Dusty's GitHub portal. How should I pull it from Dusty's container page?

Thanks

That is because @dusty_nv is still working on other promising things.

OK, no problem. I finished downloading it from the Docker page. I am running it as sudo docker run johnnync/r36.4.0-cu126-cp310-cosmos, but it does not start the Cosmos container. There were no errors during installation. Any suggestions on how to run it properly?

jetson-containers run -it -v $(pwd):/workspace johnnync/r36.4.0-cu126-cp310-cosmos (my docker)

Here $(pwd) points to the Cosmos clone. If you download the models inside the container and the container breaks, you lose the models, and downloading them again takes a very long time.

Also, @shahizat has replicated my process. Can you help him?

Hello @kalustian

You can also use this command. I confirm that @johnnynunez's container image works.

docker run --runtime nvidia -it --rm -v ./cosmos:/models --network=host johnnync/r36.4.0-cu126-cp310-cosmos:latest
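
A minimal sketch of the mount idea behind the commands above: create the host directory first, then bind-mount it at /models so downloaded checkpoints survive when --rm removes the container. The image name and docker flags are the ones posted in this thread; the function name run_cosmos and the DRY_RUN switch are assumptions made up for illustration.

```shell
#!/bin/sh
# Sketch only: start the Cosmos container with a host directory mounted at /models.
# run_cosmos and DRY_RUN are hypothetical names, not part of the image or of Docker.

IMAGE="johnnync/r36.4.0-cu126-cp310-cosmos:latest"

run_cosmos() {
    models_dir="${1:-$PWD/cosmos}"
    # Create the host directory up front so it is owned by the current user.
    mkdir -p "$models_dir" || return 1
    # Build the argument list in the positional parameters (POSIX sh has no arrays).
    set -- docker run --runtime nvidia -it --rm \
        -v "$models_dir:/models" --network=host "$IMAGE"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "$*"
    else
        "$@"
    fi
}
```

Setting DRY_RUN=1 before calling run_cosmos prints the command so you can check the mount path before launching anything.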

Checking...

Hi shahizat

I ran Johnny's command and yours, and I was able to get into the container. Once inside, I typed the following commands:

PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves.
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints.
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes,
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting.
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."

PYTHONPATH=$(pwd) python cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --offload_prompt_upsampler \
    --video_save_name Cosmos-1.0-Diffusion-7B-Text2World

and I got this:

"Python not found"

@kalustian, please use the commands below. Mount a host directory into the container with the -v option so the models are downloaded there.

First, download the models:

PYTHONPATH=/opt/Cosmos python3 /opt/Cosmos/cosmos1/scripts/download_diffusion.py --model_sizes 7B 14B --model_types Text2World Video2World

Then set the prompt:

PROMPT="The video is a dynamic and immersive driving experience captured from the perspective of a car's dashboard camera, likely mounted on the windshield. The setting is a narrow, two-lane road surrounded by lush greenery, suggesting a scenic route through a forested area. The road is marked with a single yellow line in the center, indicating a one-way traffic direction. The camera remains mostly static, providing a consistent view of the road ahead, while the car moves swiftly around a sharp curve to the right. The surroundings are dense with tall trees, and the road is flanked by a guardrail on the left side, which adds to the sense of speed and adventure. The weather appears overcast, with a misty atmosphere that enhances the feeling of being enveloped in nature. The car's speed is evident from the blurred background and the consistent motion of the road's edge. The video captures the thrill of driving through a picturesque landscape, emphasizing the connection between the driver and the natural environment. The camera's perspective remains focused on the road, with no visible pedestrians or other vehicles, creating an uninterrupted driving experience."

and finally run:

PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py  \
    --checkpoint_dir /models/checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --video_save_name /models/New_Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
    --offload_prompt_upsampler \
    --offload_tokenizer \
    --offload_diffusion_transformer
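
Since both the download and the generation are slow, it can help to fail early when a model directory is missing. The sketch below assumes checkpoints are laid out as <checkpoint_dir>/<model_name>, which is what the --checkpoint_dir and --diffusion_transformer_dir flags above suggest but is not confirmed in this thread. check_checkpoint is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: verify that a model directory exists before starting a long run.
# Assumption: checkpoints live at <checkpoint_dir>/<model_name>.
# check_checkpoint is a made-up helper, not part of Cosmos.

check_checkpoint() {
    checkpoint_dir="$1"
    model_name="$2"
    if [ -d "$checkpoint_dir/$model_name" ]; then
        echo "found: $checkpoint_dir/$model_name"
    else
        echo "missing: $checkpoint_dir/$model_name (download it first)" >&2
        return 1
    fi
}
```

For example, `check_checkpoint /models/checkpoints Cosmos-1.0-Diffusion-7B-Text2World` could be run before the inference command, so a wrong mount path shows up immediately.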

You can also join our Discord channel: Jetson AI Lab Research Group Community

Please allow me 10-15 minutes to test drive it. I will provide feedback soon.

Change python to python3.
Also use all of the offload options.
With Jetson Thor, we will be able to run every model in memory.
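
The "Python not found" error above came from calling python in a container that only provides python3. A small sketch of how a script could pick whichever interpreter name exists; first_available is a hypothetical helper name, and `command -v` is the standard POSIX way to test for a command on PATH.

```shell
#!/bin/sh
# Sketch: print the first command from the argument list that exists on PATH.
# first_available is a made-up helper name used only for illustration.

first_available() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    # Nothing in the list was found.
    return 1
}
```

Usage could look like `PY=$(first_available python3 python) || echo "no Python interpreter found"`, after which "$PY" replaces the hard-coded interpreter name.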

Environment:

  • JetPack 6.1
  • AGX Orin, 32 GB RAM
  • Swap increased from 15 GB to 55 GB
  • CPU/GPU clocks set to maximum: 2.2 GHz / 1.3 GHz
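
To confirm that a swap change like the one listed above actually took effect, the total can be read from /proc/meminfo. This is a sketch: swap_total_gib is a hypothetical helper name, and the 55 GB figure is simply what was used on this board, not a documented requirement.

```shell
#!/bin/sh
# Sketch: report total swap in whole GiB from a meminfo-style file.
# swap_total_gib is a made-up helper; /proc/meminfo lists SwapTotal in kB.

swap_total_gib() {
    meminfo="${1:-/proc/meminfo}"
    # 1048576 kB per GiB; %d truncates to a whole number.
    awk '/^SwapTotal:/ { printf "%d\n", $2 / 1048576 }' "$meminfo"
}
```

Calling `swap_total_gib` with no argument reads the live system; passing a file path makes it easy to try on saved output.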

Here are the steps I took (thanks to Johnny and Shahizat):

1) Run the Docker container:
sudo docker run --runtime nvidia -it --rm -v ./cosmos:/models --network=host johnnync/r36.4.0-cu126-cp310-cosmos:latest

2) Install the Hugging Face CLI and log in:
pip install -U "huggingface_hub[cli]"
huggingface-cli login

3) Download the model:
PYTHONPATH=/opt/Cosmos python3 /opt/Cosmos/cosmos1/scripts/download_diffusion.py --model_sizes 7B --model_types Text2World Video2World

4) Set a prompt:

PROMPT="A sleek, humanoid robot stands in a vast warehouse filled with neatly stacked cardboard boxes on industrial shelves.
The robot's metallic body gleams under the bright, even lighting, highlighting its futuristic design and intricate joints.
A glowing blue light emanates from its chest, adding a touch of advanced technology. The background is dominated by rows of boxes,
suggesting a highly organized storage system. The floor is lined with wooden pallets, enhancing the industrial setting.
The camera remains static, capturing the robot's poised stance amidst the orderly environment, with a shallow depth of
field that keeps the focus on the robot while subtly blurring the background for a cinematic effect."

or

PROMPT="The video is a dynamic and immersive driving experience captured from the perspective of a car's dashboard camera, likely mounted on the windshield. The setting is a narrow, two-lane road surrounded by lush greenery, suggesting a scenic route through a forested area. The road is marked with a single yellow line in the center, indicating a one-way traffic direction. The camera remains mostly static, providing a consistent view of the road ahead, while the car moves swiftly around a sharp curve to the right. The surroundings are dense with tall trees, and the road is flanked by a guardrail on the left side, which adds to the sense of speed and adventure. The weather appears overcast, with a misty atmosphere that enhances the feeling of being enveloped in nature. The car's speed is evident from the blurred background and the consistent motion of the road's edge. The video captures the thrill of driving through a picturesque landscape, emphasizing the connection between the driver and the natural environment. The camera's perspective remains focused on the road, with no visible pedestrians or other vehicles, creating an uninterrupted driving experience."

5) Run it:
PYTHONPATH=$(pwd) python3 cosmos1/models/diffusion/inference/text2world.py \
    --checkpoint_dir checkpoints \
    --diffusion_transformer_dir Cosmos-1.0-Diffusion-7B-Text2World \
    --prompt "$PROMPT" \
    --video_save_name Cosmos-1.0-Diffusion-7B-Text2World_memory_efficient \
    --offload_tokenizer \
    --offload_diffusion_transformer \
    --offload_text_encoder_model \
    --offload_prompt_upsampler \
    --offload_guardrail_models
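
Because a single missing line continuation is enough to drop flags silently, one option is to assemble the command in a small function and inspect it before running. This is a sketch: the script path and the flags are copied from the posts in this thread, while text2world_cmd and DRY_RUN are hypothetical names. It assumes you run it from the Cosmos checkout inside the container.

```shell
#!/bin/sh
# Sketch: build the text2world command from this thread with every offload flag.
# text2world_cmd and DRY_RUN are made-up names; flags are taken from the posts above.

text2world_cmd() {
    prompt="$1"
    checkpoint_dir="${2:-checkpoints}"
    model="Cosmos-1.0-Diffusion-7B-Text2World"
    # Keep the arguments in the positional parameters so quoting is preserved.
    set -- python3 cosmos1/models/diffusion/inference/text2world.py \
        --checkpoint_dir "$checkpoint_dir" \
        --diffusion_transformer_dir "$model" \
        --prompt "$prompt" \
        --video_save_name "${model}_memory_efficient" \
        --offload_tokenizer \
        --offload_diffusion_transformer \
        --offload_text_encoder_model \
        --offload_prompt_upsampler \
        --offload_guardrail_models
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # One argument per line, so a multi-word prompt stays readable.
        printf '%s\n' "$@"
    else
        PYTHONPATH="$PWD" "$@"
    fi
}
```

With DRY_RUN=1 set, `text2world_cmd "$PROMPT"` lists the arguments one per line; unset it to launch the actual run.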

6) Success!
After almost 3 hours at 60 W, a 5-second video was created. Special thanks to Johnny and Shahizat.
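
For a rough sense of the cost of that run, energy is power multiplied by time. The sketch below assumes a constant 60 W draw for the full 3 hours, which is a simplification of the figures above and so gives an upper estimate. energy_wh is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: rough energy estimate, energy (Wh) = power (W) * time (h).
# energy_wh is a made-up helper; constant power draw is an assumption.

energy_wh() {
    awk -v watts="$1" -v hours="$2" 'BEGIN { printf "%.0f\n", watts * hours }'
}

energy_wh 60 3   # prints 180, i.e. about 0.18 kWh for the 5-second clip
```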
