how to use DataParallel on TX2?


When I use DataParallel on the TX2, my Python 3.5 code gets killed,

but on my GPU desktop server the same code works fine.

I import DataParallel from torch.nn.parallel

The GPU desktop server's PyTorch version is 0.4.1

The TX2's PyTorch version is 0.4.0a0+3749c58

Maybe it's a PyTorch version issue, or something else?

How can I solve this issue?
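For reference, a minimal sketch of the kind of setup described above, using a generic stand-in model (the actual network is not shown in this thread). A common pattern is to wrap in DataParallel only when more than one GPU is present, so the wrapper is skipped entirely on a single-GPU board like the TX2:

```python
import torch
import torch.nn as nn

# Stand-in model; the real network in this thread is YOLO-based.
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 2))

# Wrap in DataParallel only when more than one GPU is present;
# on a single-GPU board like the TX2 the wrapper only adds overhead.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

out = model(torch.randn(4, 8).to(device))
```

The same script then runs unchanged on the multi-GPU desktop and on the TX2.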


You say “killed”. Did it run out of memory? Run something like “htop” and watch RAM usage to check whether that is the case.


I ran “htop” and my code at the same time;

when RAM reached 100%, my code was killed.

So how can I solve this issue?

Maybe release memory, or some other method?

I found this discussion (

maybe it is good for me, but my disk space is not enough…

/dev/root 28G 26G 578M 98% /

I can’t give a complete answer, but I’m sure someone else will comment on this particular case.

In general, anything using CUDA can’t use swap space…but other competing processes probably can, so there is some use in adding swap.

Closing unneeded programs is of course another way to help if there is anything running and consuming RAM which you don’t need at that moment.

Sometimes in cases where lots of threads are being generated you can cut back to one or two threads and the memory required will go down (it still goes through all of the logic, but not at the same time).

Can someone suggest ways to lower RAM usage from a Python based CUDA program?
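The point above about cutting back threads can be sketched in PyTorch terms with the DataLoader's `num_workers` setting; the dataset here is a hypothetical toy stand-in:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset standing in for the real training/inference data.
dataset = TensorDataset(torch.randn(64, 8), torch.randint(0, 2, (64,)))

# num_workers=0 keeps data loading in the main process. Each extra worker
# is a separate process with its own memory footprint, so dropping to zero
# workers trades loading speed for lower peak RAM usage.
loader = DataLoader(dataset, batch_size=4, shuffle=True, num_workers=0)

for x, y in loader:
    pass  # training/inference step would go here
```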


Maybe you can try a smaller batch size to decrease the memory usage.
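A minimal sketch of low-memory inference, assuming a hypothetical stand-in model: processing one sample at a time under `torch.no_grad()` avoids holding both a large batch and autograd bookkeeping in RAM at once:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # hypothetical stand-in for the real model
model.eval()

inputs = torch.randn(16, 8)

# Run inference one sample at a time and without autograd bookkeeping;
# both choices cut peak memory compared to one large batch with gradients.
with torch.no_grad():
    outputs = torch.cat([model(inputs[i:i + 1]) for i in range(inputs.size(0))])
```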


I already cut down to one model on the TX2…

Now I want to compress my model to make it smaller.

Is this a good approach?


Hi AastaLLL,

The batch size is already set to 1, so it can’t get any smaller.

Thanks for your answer.

Also, my classmate suggested using TensorRT; maybe I will try it?


Sure, TensorRT will reduce memory consumption considerably.

But why do you want to use DataParallel?
Do you want to use it for multi-GPU training?
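Part of TensorRT's memory savings comes from reduced-precision (e.g. FP16) execution, which the TX2 supports. The storage side of that effect can be seen even in plain PyTorch; the layer below is just an illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(1024, 1024)  # illustrative layer, not the real network

# Count the bytes used by the parameters in fp32, then convert to fp16.
# TensorRT gets part of its memory savings from this kind of
# reduced-precision storage and execution.
fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
model_fp16 = model.half()
fp16_bytes = sum(p.numel() * p.element_size() for p in model_fp16.parameters())
```

Here `fp16_bytes` is exactly half of `fp32_bytes`, since each parameter shrinks from 4 bytes to 2.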



Because I ran it on the GPU server first,

and then I wanted to run it on the TX2.

But the TX2 has only one GPU, so DataParallel doesn’t really do anything there, I think.

So I will look into TensorRT as the next step.

Because I changed YOLO to Tiny-YOLO on the TX2, it increased the fps but lost accuracy…