Dear NVIDIA - why are you so awesome and terrible at the same time?

Dear NVIDIA team,
I've been really excited about using my NVIDIA products. On my primary dev box I'm running two 2080 Ti Founders Edition GPUs, and now I'm working with an NVIDIA Jetson Nano.

I haven't gotten much value out of any of them, because every time I try to do something locally I spend more time setting up my ecosystem than working on AI/ML.

Let's take the Jetson Nano as an example. This is NVIDIA hardware, with a flash image from NVIDIA. Why do I spend hours installing NVIDIA components?

What's more annoying is that I can't port code from machine to machine. I got YOLOv5 running on my desktop, and now I want to get it running on a live feed on my Jetson. Four hours in, and I'm still failing to load libcudart.so.10.0 when I call PyTorch.

I don't want to deal with Docker, VMs, setting environment variables, etc.

I want to control my packages and what's running. I just want to be able to load a generic Python script and run it.

It can’t be that hard.

Who runs the NVIDIA ecosystem? They need to be fired.

I would also work with Google to get TensorFlow sorted out. It's amazing to me how a package dependency in NumPy can bust an entire AI workflow.

What happened to microservices and microcomponents? Things should be forward and backward compatible.
Linux makes this so easy with symbolic links!

Well, that's my rant… but seriously, you guys offer so much joy and disappointment at the same time.

Hi @JamesCEO, you may want to reconsider using containers, because they address many of the package-setup issues it sounds like you have experienced. ML frameworks in particular have complex dependency chains; the containers not only save you from having to install these yourself, but also make it easy to maintain separate environments without cross-pollution, given the sheer number of Python packages involved. You can mount any number of directories from your host filesystem into the containers to access your datasets and work. Since I began using and building containers myself, my environment has stayed much cleaner - they are commonly used for ML on PCs and servers too.
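As a rough sketch, launching one of the l4t-pytorch containers with a host directory mounted in looks something like this - note the image tag shown here is only an example, and the `~/datasets` path is a placeholder; pick the tag that matches your installed JetPack-L4T version:

```shell
# Run the l4t-pytorch container with the NVIDIA runtime and a mounted
# host directory. Replace the tag (r32.4.4-pth1.6-py3 is an example)
# with the one matching your JetPack-L4T release.
sudo docker run -it --rm \
    --runtime nvidia \
    --network host \
    -v ~/datasets:/datasets \
    nvcr.io/nvidia/l4t-pytorch:r32.4.4-pth1.6-py3
```

Anything you write under `/datasets` inside the container lands in `~/datasets` on the host, so your work survives the `--rm` cleanup when the container exits.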

Take the PyTorch issue you mentioned, for example - that error is likely because you installed a PyTorch wheel that wasn't compiled against the version of JetPack-L4T you have installed. The l4t-pytorch containers are already pre-built for each JetPack release and are labeled with tags that clearly associate them with their JetPack-L4T version, so that problem would be avoided. If you don't wish to use the container, then double-check that the PyTorch wheel you download is compatible with your JetPack version - the wheels are categorized in the PyTorch thread and on the Jetson Zoo page.
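If you want to confirm which CUDA runtime library is actually present on the system before blaming the wheel, a quick stdlib-only check like this sketch can help (the library names in the default tuple are assumptions - adjust them to the CUDA version your JetPack release ships):

```python
import ctypes

def find_cuda_runtime(names=("libcudart.so.10.2", "libcudart.so.10.0", "libcudart.so")):
    """Try to dlopen each candidate CUDA runtime; return the first that loads, else None."""
    for name in names:
        try:
            ctypes.CDLL(name)  # raises OSError if the library can't be found/loaded
            return name
        except OSError:
            continue
    return None

found = find_cuda_runtime()
if found:
    print(f"CUDA runtime found: {found}")
else:
    print("No CUDA runtime found - the PyTorch wheel may target a different JetPack/CUDA version")
```

If this finds, say, `libcudart.so.10.2` while PyTorch is asking for `libcudart.so.10.0`, that mismatch is exactly the wheel-vs-JetPack problem described above.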

Also, regarding flashing: you should just have to flash the SD card image with the Etcher tool (which takes around 10 minutes or so), and it comes preloaded with all the JetPack components, including CUDA/cuDNN/TensorRT/OpenCV/etc. PyTorch and TensorFlow are add-ons that are installed afterwards from the pip wheels or containers. The Jetson Zoo indexes these add-ons along with their download links and install procedures here:

https://eLinux.org/Jetson_Zoo
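If you go the bare-metal route instead of containers, the add-on install typically looks something like the sketch below - the wheel filename is a placeholder, and the apt packages shown are an assumption about what the Jetson PyTorch wheels need at runtime; follow the exact steps for your JetPack version from the Jetson Zoo page:

```shell
# Hypothetical sketch - get the real wheel URL for your JetPack
# release from the Jetson Zoo before running this.
sudo apt-get update
sudo apt-get install -y python3-pip libopenblas-base
# <version> is a placeholder for the wheel matching your JetPack release
pip3 install torch-<version>-linux_aarch64.whl
```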

Best wishes and hope this helps.