Jetson TX1 PyTorch

Hello, I have a fresh install of JetPack 3.1 (L4T 28.1) with CUDA and cuDNN. OpenCV and the other components built without errors. My problem is that when I use this
[url]https://gist.github.com/dusty-nv/ef2b372301c00c0a9d3203e42fd83426[/url]
install procedure, the “python setup.py develop” command freezes the Jetson and then gives a segmentation fault. I opened the system monitor and saw that one of the build steps overflows the RAM.
Any idea what to do?
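For anyone hitting the same freeze, it can help to confirm the out-of-memory kill from a second terminal while the build runs, and to limit the number of parallel compile jobs. A minimal sketch, assuming a source checkout in ~/pytorch; note that MAX_JOBS is an assumption here: it only has an effect if your checkout's build scripts actually read that environment variable (recent PyTorch versions do), so if it changes nothing, enabling swap is the reliable fix.

# In a second terminal: report free memory every 5 seconds during the build
free -m -s 5

# Retry the build with fewer parallel jobs to lower peak RAM usage
cd ~/pytorch
MAX_JOBS=2 python setup.py develop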

Hi again, today after various tests (build error, change, build error, …) PyTorch built successfully with Python 3. I used this link: [url]https://github.com/andrewadare/jetson-tx2-pytorch[/url]. I also changed the CMake version; I don't know whether that was the problem or not. But I think I have solved it, and I will also try it with Python 2.

Thanks for the update, and I hope to hear the results with Python 2.
Thanks.

Hello, my update is: the Python 2 build also ran out of memory, so I rebuilt the kernel with swap enabled. After that PyTorch compiled just fine. Actually, I don't get why you didn't activate it in the first place. Now my problem is that an old version of PyTorch gets installed no matter what I do: the installed version is 0.1.10+ac9245a, but the git checkout is version 0.4.0a0.
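If an old copy of torch is shadowing the develop install, it may help to check which module Python is actually importing and to remove any previously pip-installed wheel before rebuilding. A minimal sketch (the paths and package names are the usual ones, worth double-checking on your system):

# Show the version and location of the torch module Python actually loads
python -c "import torch; print(torch.__version__); print(torch.__file__)"

# Remove a previously installed copy that may shadow the source build
pip uninstall torch

# Re-run the develop install from the source checkout
cd ~/pytorch && python setup.py develop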

If anyone has successfully installed PyTorch, can you share your version?

Hi durmushalil, I have this repo building against PyTorch v0.3.0 on TX2 (no swap necessary):

[url]https://github.com/dusty-nv/jetson-reinforcement/blob/master/CMakePreBuild.sh[/url]

PyTorch master (v0.4.0+) builds too, but PyTorch keeps their tutorials/samples updated against the latest binary release (currently v0.3.0), so to maintain compatibility with the majority of PyTorch scripts, I check out v0.3.0 in the script above.
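For reference, pinning a fresh clone to the v0.3.0 release tag looks roughly like this (a sketch, not the exact contents of the script linked above):

# Clone PyTorch with its submodules and check out the v0.3.0 tag
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
git checkout v0.3.0
git submodule update --init --recursive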

Hi,
I ran into trouble when running the “python setup.py develop” command:
"
[ 29%] Building NVCC (Device) object CMakeFiles/THC.dir/THC_generated_THCTensorMode.cu.o
/home/ubuntu/pytorch/torch/lib/THC/THCNumerics.cuh(38): warning: integer conversion resulted in a change of sign

Killed
CMake Error at THC_generated_THCTensorIndex.cu.o.cmake:267 (message):
Error generating file
/home/ubuntu/pytorch/torch/lib/build/THC/CMakeFiles/THC.dir//./THC_generated_THCTensorIndex.cu.o

"
What should I do?

If I use Docker for PyTorch, will it work?

@TTL, if you are building on TX1, you probably ran out of memory while compiling and need to enable SWAP.

Thank you, @dusty_nv! So how can I enable swap on TX1?

If the kernel version you are using needs swap enabled, see this thread: https://devtalk.nvidia.com/default/topic/916777/?comment=4807307

Then after attaching external storage (ideally via SATA or PCIe), create a SWAP partition and mount it like so: https://help.ubuntu.com/community/SwapFaq#How_do_I_add_or_modify_a_swap_partition.3F
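If the kernel already has swap support compiled in, a swap file on the attached storage works as well and is simpler than repartitioning. A minimal sketch, assuming the external drive is mounted at /mnt/sata (a hypothetical path; adjust the path and size to your setup):

# Create an 8 GB swap file on the external drive and enable it
sudo fallocate -l 8G /mnt/sata/swapfile
sudo chmod 600 /mnt/sata/swapfile
sudo mkswap /mnt/sata/swapfile
sudo swapon /mnt/sata/swapfile

# Verify that the swap space is active
free -m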

I have used the JetsonHacks tutorials. He even shows how to build the kernel with swap. It is easy to do, but my advice is to use JetPack 3.2, which has swap enabled. Also, PyTorch supports CUDA 9.0, so with 3.2 you can use the JetsonHacks swap code and then build your PyTorch.

Thanks for your advice @durmushalil, but I can't use JetPack 3.2. When I run ‘JetPack-L4T-3.2-linux-x64_b196.run’, it is always interrupted by a ‘manifest file was broken’ error. I have tried all the solutions on this forum and it still doesn't work. Is there anyone who can help me?

What happens if you try a fresh JetPack 3.2 in an empty directory?

Are you behind a network firewall? What geographic region are you downloading from?

When I ran a fresh JetPack 3.2 in an empty directory, it still output the same error: ‘manifest file is broken’. I am not behind a network firewall. After creating a swap file on the Jetson TX1, I tried to build PyTorch; it was no longer interrupted by ‘Killed’, but I got an error:
[ 85%] Building NVCC (Device) object src/ATen/CMakeFiles/ATen_cuda.dir/native/cuda/ATen_cuda_generated_TensorFactories.cu.o
In file included from tmpxft_00004a33_00000000-4_SoftMax.cudafe1.stub.c:1:0:
/tmp/tmpxft_00004a33_00000000-4_SoftMax.cudafe1.stub.c:41:17: error: parse error in template argument list
template<> __specialization_static void __wrapper__device_stub_cunn_SoftMaxForward<2, ::at::cuda::type , ::at::acc_type<double, (bool)1> , ::at::native::operator ::LogSoftMaxForwardEpilogue>( _ZN2at4cuda4typeIdEE *&__cuda_0,_ZN2at4cuda4typeIdEE *&__cuda_1,int &__cuda_2){__device_stub__ZN2at6native66_GLOBAL__N__42_tmpxft_00004a33_00000000_7_SoftMax_cpp1_ii_826a462619cunn_SoftMaxForwardILi2EddNS1_25LogSoftMaxForwardEpilogueEEEvPT0_S5_i( (_ZN2at4cuda4typeIdEE *&)__cuda_0,(_ZN2at4cuda4typeIdEE *&)__cuda_1,(int &)__cuda_2);}}}}

What is this ‘parse error’? Should I change the CMake version? It is currently 3.11.1.

Please see this post; there has been an issue with some China-based ISPs recently, and the DNS issue is still being fixed. Please stay tuned.

Which version of PyTorch are you using? I’m able to build and run v0.3.0. Master (v0.4.0) has changes which aren’t totally ironed out yet.

After switching network operators, I was able to download some packages with JetPack v3.2 or JetPack v3.1.
I am using PyTorch v0.3, CUDA v8.0, cuDNN v5.0, CMake v3.11.1, and gcc v5.4 or gcc v4.9. Should I change the version of cuDNN? I have changed the versions of all of them to build PyTorch, except CUDA and cuDNN.

With PyTorch v0.3.0 I am using JetPack 3.2 — which comes with CUDA9 and cuDNN 7.0.5.

Here is the build script that I use. It configures this repo that uses PyTorch on Jetson.
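For reference, after checking out v0.3.0 as shown earlier in the thread, the remaining from-source steps usually look something like this (a sketch only; see the linked script for the exact packages and flags it uses):

# Install the Python-side build prerequisites
sudo apt-get install python-pip cmake
pip install -U setuptools
pip install pyyaml numpy

# Build and install PyTorch from the source checkout
cd pytorch
python setup.py install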

Thank you very much! I will try it! Hopefully I will have good news to report.

After I installed JetPack 3.2, PyTorch built fine, but I got another error:

RuntimeError: cuda runtime error (7) : too many resources requested for launch at /home/nvidia/pytorch/torch/lib/THCUNN/generic/SpatialUpSamplingBilinear.cu:63

I know that the kernel in ‘SpatialUpSamplingBilinear.cu’ missing a __launch_bounds__(1024) annotation leads to this error, but I don't know how to fix it…

Hi,

CUDA error 7 means cudaErrorLaunchOutOfResources.

This error usually indicates that the user has attempted to pass too many arguments to the device kernel, or the kernel launch specifies too many threads for the kernel’s register count.

Could you monitor the system status and share the results with us?

sudo ./tegrastats

Thanks.

Hi Experts

I am running JetPack 3.3 and Python 3 on my TX1.

Does anybody have a link to a script that installs PyTorch on the above setup?

Needed for some reinforcement learning experiments…

sojohans