For TX1, are the device and host memories the same and identical?

For TX1, are the device and host memories the same? Does it still require a copy from host to device and vice versa?

You’ll have to specify what you mean by “device” and “host”. The GPU is not attached via PCIe; it is wired directly to the memory controller, so it (and anything else with a direct memory-controller connection) uses the same physical memory as the kernel and user space. On a desktop system you’d expect the video card/GPU to have its own memory and require copies back and forth.

Thanks for your reply. So, for example, if an image is loaded into main memory using OpenCV, one would not need to copy it into GPU memory in order to apply an NPP/CUDA function to it, whereas on a desktop system, as you mentioned, you would have to. Right?

I can’t answer OpenCV/CUDA questions, but it may be useful to know that CUDA code uses pinned memory, which is not swapped out and may also be accessed directly via the memory controller or DMA. Data still needs to reach that memory, but no PCIe bus is involved, and there is no transfer between physically separate memory devices. Someone else would need to answer in more detail.
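As a sketch of the zero-copy path described above, the CUDA runtime lets you allocate mapped pinned memory and obtain a device-side alias for it; on Tegra parts like the TX1, both pointers refer to the same physical DRAM. This is only an illustration under that assumption (buffer name and size are made up), not TX1-verified code:

```cuda
#include <cuda_runtime.h>

int main() {
    // Ask the runtime to allow mapping of pinned host allocations.
    cudaSetDeviceFlags(cudaDeviceMapHost);

    // Mapped ("zero-copy") pinned allocation: not swappable, GPU-accessible.
    float* hostBuf = nullptr;
    cudaHostAlloc((void**)&hostBuf, 1024 * sizeof(float), cudaHostAllocMapped);

    // Device-side alias of the very same allocation.
    float* devAlias = nullptr;
    cudaHostGetDevicePointer((void**)&devAlias, hostBuf, 0);

    // Fill from the CPU; a kernel launched with devAlias reads this data
    // directly -- no cudaMemcpy and, on TX1, no separate device memory.
    for (int i = 0; i < 1024; ++i) hostBuf[i] = 1.0f;

    // ... launch a kernel that takes devAlias ...

    cudaFreeHost(hostBuf);
    return 0;
}
```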

Hi hd_ali,

Would you please provide more details about your use case?
That would help us point the suggestions in the right direction.


Hi guys

For example, the following snippet shows how a row filter would be applied to an image on a desktop system. Here my filter kernel, “hostKernel”, has to be copied into CUDA device memory before it is used in “nppiFilterRow_32f_C1R”.
My question is: on an embedded system like the TX1, do I still need to copy “hostKernel” to CUDA memory as on a desktop system, or can I pass “hostKernel” directly to “nppiFilterRow_32f_C1R”?

Npp32f hostKernel[3] = {1, 1, 1};
Npp32s kernelSize = 3;
Npp32s kernelAnchor = 1;

// Desktop-style path: allocate device memory and copy the kernel over.
Npp32f* deviceKernel;
NPP_CHECK_CUDA(cudaMalloc((void**)&deviceKernel, kernelSize * sizeof(Npp32f)));
NPP_CHECK_CUDA(cudaMemcpy(deviceKernel, hostKernel, kernelSize * sizeof(Npp32f), cudaMemcpyHostToDevice));

int pixelSize = 4;  // bytes per Npp32f pixel
NppiSize ROI2 = {380, 620};
int xROI = 4;
int yROI = 4;

// pitch() is in bytes, so do the offset arithmetic on a byte pointer.
Npp32f* pSrcOffset = (Npp32f*)((Npp8u*)oDeviceSrc->data() + yROI * oDeviceSrc->pitch() + xROI * pixelSize);

nppiFilterRow_32f_C1R(pSrcOffset, oDeviceSrc->pitch(),
                      oDeviceDst->data(), oDeviceDst->pitch(),
                      ROI2, deviceKernel, kernelSize, kernelAnchor);
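For comparison, here is how the same kernel setup might look with mapped pinned memory, so the cudaMemcpy step disappears. This is a hedged sketch, not tested on a TX1: it assumes the standard cudaHostAlloc/cudaHostGetDevicePointer API and that NPP accepts the mapped device pointer; whether this actually outperforms the cudaMalloc path on your board is something to measure.

```cuda
// Allocate the 3-tap filter kernel in mapped pinned memory instead of
// cudaMalloc'ing a separate device copy.
Npp32f* hostKernel = nullptr;
NPP_CHECK_CUDA(cudaHostAlloc((void**)&hostKernel,
                             kernelSize * sizeof(Npp32f),
                             cudaHostAllocMapped));
hostKernel[0] = hostKernel[1] = hostKernel[2] = 1.0f;

// Device-side alias of the same physical memory (no copy issued on Tegra).
Npp32f* deviceKernel = nullptr;
NPP_CHECK_CUDA(cudaHostGetDevicePointer((void**)&deviceKernel, hostKernel, 0));

// Pass the alias to NPP exactly as before; the host-to-device copy is gone.
nppiFilterRow_32f_C1R(pSrcOffset, oDeviceSrc->pitch(),
                      oDeviceDst->data(), oDeviceDst->pitch(),
                      ROI2, deviceKernel, kernelSize, kernelAnchor);

cudaFreeHost(hostKernel);
```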

Sorry, “linuxdev”, I only just saw your reply. Thank you very much. I have also posted a new reply below.

Hi hd_ali,

We support the CUDA 7.0 Toolkit on TX1, so it should be no problem to run a CUDA program with the same design on both the TX1 and your desktop system.

Have you run into any specific issue while running your code?

For specific CUDA programming issues, you could post to the CUDA Programming and Performance forum to get more assistance:


Hi kaycc,

Thank you very much for your reply and the performance discussion link.