Torch Tensor.cuda() very slow

pranay731 · June 2, 2020, 10:49am

The tensor.cuda() call is very slow. I am using Torch 1.1.0 and CUDA 10.0. Interestingly, the call for 5 different tensors, ranging in size from (1,3,400,300) to (1,3,800,600), takes anywhere from 0.003 to 1.48 seconds.
Shouldn't this be fast, since the GPU and CPU share the same memory?
Is there any way to speed it up?
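Before comparing numbers, it is worth ruling out a measurement issue. CUDA work is queued asynchronously, so a wall-clock timer around `tensor.cuda()` can also pick up kernels still running from earlier calls (for example, the model's forward pass), because the copy has to wait for them. The sketch below assumes a PyTorch build where `torch.cuda.synchronize()` is available. It synchronizes before and after the copy so only the transfer is measured. The helper names `_sync` and `timed_copy` are hypothetical, and the code falls back to CPU when no GPU is present:

```python
import time
from typing import Tuple

import torch


def _sync(device: torch.device) -> None:
    # Block until all queued CUDA work on this device has finished; no-op on CPU.
    if device.type == "cuda":
        torch.cuda.synchronize(device)


def timed_copy(t: torch.Tensor, device: torch.device) -> Tuple[torch.Tensor, float]:
    """Copy `t` to `device` and return (copied_tensor, elapsed_seconds)."""
    _sync(device)                 # drain earlier work so it is not billed to this copy
    start = time.perf_counter()
    out = t.to(device)            # same transfer as t.cuda() when device is CUDA
    _sync(device)                 # make sure the copy itself has completed
    return out, time.perf_counter() - start


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    for shape in [(1, 3, 400, 300), (1, 3, 800, 600)]:
        _, secs = timed_copy(torch.rand(shape), device)
        print(f"{shape}: {secs * 1e3:.2f} ms")
```

If the large spread disappears once the timer is synchronized, the time was going to pending GPU work rather than to the copy.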

dusty_nv · June 2, 2020, 3:38pm

Hi @pranay731, is it the very first call to tensor.cuda() that is taking the longest?

In my experience, the first time you use GPU in torch, it can take a bit of extra time to initialize.
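If initialization is the suspect, one common approach is to pay that one-time cost up front before timing anything. The sketch below creates the CUDA context and runs one tiny kernel and one device-to-host copy. The function name `warm_up_cuda` is hypothetical, and it does nothing on a machine without CUDA:

```python
import torch


def warm_up_cuda() -> None:
    """Absorb CUDA context creation and first-use overhead before any timed work."""
    if not torch.cuda.is_available():
        return                          # nothing to warm up without a GPU
    torch.cuda.init()                   # explicitly initialize CUDA state
    x = torch.zeros(1, device="cuda")   # first device allocation
    (x + 1).cpu()                       # one kernel launch plus a device-to-host copy
    torch.cuda.synchronize()            # wait until all of the above has finished


warm_up_cuda()
```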

pranay731 · June 3, 2020, 4:11am

Hi @dusty_nv
It's not the first call. I am copying these tensors after copying the model.
Moreover, the first tensor takes the least time, and it is also the smallest. I thought it might be a storage issue or something, but almost 2 GB of memory is always unused. I also tried deleting the previous tensor before copying the next one, with the same results.
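On the memory point: PyTorch uses a caching allocator, so memory released by `del` is normally kept reserved for reuse rather than returned to the driver. That fits with deleting the previous tensor not changing the timings. If per-call allocation is still a concern, one option is to allocate a single device buffer up front and copy each input into a contiguous view of it. This is a sketch under assumptions: `DeviceStaging` and `max_numel` are hypothetical names, and each returned view is overwritten by the next `upload` call:

```python
import torch


class DeviceStaging:
    """Hypothetical helper: one flat device buffer, reused for every input."""

    def __init__(self, max_numel: int, device: torch.device, dtype=torch.float32):
        # Allocate once, sized for the largest expected input.
        self.buf = torch.empty(max_numel, device=device, dtype=dtype)

    def upload(self, t: torch.Tensor) -> torch.Tensor:
        if t.numel() > self.buf.numel():
            raise ValueError("input larger than the preallocated buffer")
        # A prefix of a flat buffer reshaped with .view() stays contiguous.
        view = self.buf[: t.numel()].view(t.shape)
        view.copy_(t)  # copies (and casts, if needed) into the existing memory
        return view


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    stage = DeviceStaging(max_numel=1 * 3 * 800 * 600, device=device)
    frame = stage.upload(torch.rand(1, 3, 400, 300))
    print(frame.shape, frame.device)
```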

dusty_nv · June 3, 2020, 5:26pm

I don’t believe PyTorch takes advantage of CUDA zeroCopy memory, so it may be allocating the CUDA device memory and then performing cudaMemcpy() operations. PyTorch does however support pinned memory for fast CPU<->GPU memory copies.
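Here is a sketch of the pinned-memory path, assuming a CUDA-enabled PyTorch build (`upload_pinned` is a hypothetical name). `Tensor.pin_memory()` and `.to(..., non_blocking=True)` are standard PyTorch calls. Pinning itself allocates and copies, so the benefit comes from pinning once and reusing the buffer, or from letting `DataLoader(pin_memory=True)` do it, rather than pinning on every call:

```python
import torch


def upload_pinned(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Copy a CPU tensor to `device` through page-locked (pinned) host memory."""
    if device.type != "cuda":
        return t.to(device)             # pinning only matters for CUDA transfers
    # Pinned host memory can be transferred directly by DMA; pageable memory
    # has to be staged through a driver-managed pinned buffer first.
    src = t if t.is_pinned() else t.pin_memory()
    # non_blocking=True lets the host continue while the copy runs. If `src` is a
    # buffer you reuse, synchronize before overwriting it.
    return src.to(device, non_blocking=True)


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    gpu_t = upload_pinned(torch.rand(1, 3, 800, 600), device)
    print(gpu_t.device)
```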

In case the clocks have gone idle, have you tried running sudo jetson_clocks beforehand?

pranay731 · June 6, 2020, 6:40pm

Thanks for the help.

I already set the power mode to all cores max using the drop-down menu in the top right corner, next to the clock. Are both the same, or does jetson_clocks do something more?

dusty_nv · June 11, 2020, 4:20pm

The drop-down menu sets the nvpmodel, which sets the min/max clock frequencies and the number of CPU cores that are online. Frequency scaling is still enabled, which dynamically scales the frequencies at runtime based on workload.

jetson_clocks disables frequency scaling, and locks the clocks to their maximums for the current nvpmodel. So they do different things.

Also, while running your pytorch script, do you get any kernel log messages from dmesg?
