EzPizzy
December 12, 2022, 12:08pm
1
The CPU and iGPU share the SoC DRAM memory, so what should I do if I want to copy one array to another?
For example:
float *a;
float *b;
cudaMallocManaged((void **)&a, 10 * sizeof(float));  // managed: accessible from CPU and iGPU
cudaMallocManaged((void **)&b, 10 * sizeof(float));
cudaMemcpy(a, b, 10 * sizeof(float), cudaMemcpyDeviceToDevice);  // dst = a, src = b
I mean, what should the flag be? Is cudaMemcpyDeviceToDevice the correct choice, or should I do something else?
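For illustration, here is a minimal self-contained sketch (not from the original post) that uses cudaMemcpyDefault instead; with managed allocations the runtime can infer the transfer direction from the pointer attributes:

#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: with cudaMallocManaged both pointers are known to the runtime,
// so cudaMemcpyDefault lets it infer the copy direction.
int main(void) {
    float *a, *b;
    cudaMallocManaged((void **)&a, 10 * sizeof(float));
    cudaMallocManaged((void **)&b, 10 * sizeof(float));

    for (int i = 0; i < 10; ++i) b[i] = (float)i;   // fill the source on the CPU

    cudaMemcpy(a, b, 10 * sizeof(float), cudaMemcpyDefault);  // dst = a, src = b
    cudaDeviceSynchronize();

    printf("a[5] = %f\n", a[5]);

    cudaFree(a);
    cudaFree(b);
    return 0;
}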
EzPizzy
December 13, 2022, 3:15am
4
Thanks. I also wonder which side executes the copy, the CPU or the GPU, when the flag is H2D, D2H, or D2D.
Is it CPU, CPU, and GPU respectively?
I ask because I found a thread here saying that cudaMemcpy is not slower, but even faster, than a naive copy kernel.
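For reference, a naive copy kernel of the kind such comparisons usually measure might look like this (a sketch, not taken from the linked thread; the launch parameters are illustrative):

#include <cuda_runtime.h>

// Naive element-wise copy: one element per thread, coalesced accesses.
__global__ void copyKernel(float *dst, const float *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

// Illustrative launch for n elements (dst and src must be device-accessible):
//   int threads = 256;
//   int blocks  = (n + threads - 1) / threads;
//   copyKernel<<<blocks, threads>>>(dst, src, n);
//   cudaDeviceSynchronize();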
Hi,
The behavior of Jetson memory is slightly different.
Please check the document below for details:
https://docs.nvidia.com/cuda/cuda-for-tegra-appnote/index.html
Thanks.
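As a quick way to confirm that a board has the shared-memory behavior the app note describes, the integrated flag in the device properties can be queried; a minimal sketch (not from the reply):

#include <cuda_runtime.h>
#include <stdio.h>

// Sketch: prop.integrated is 1 on integrated GPUs such as Jetson's iGPU,
// which share SoC DRAM with the CPU.
int main(void) {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    printf("%s: integrated = %d\n", prop.name, prop.integrated);
    return 0;
}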
EzPizzy
December 16, 2022, 7:40am
6
Thanks. Actually, I think my question is not specific to Jetson but applies to all GPU devices.
So could you tell me which side executes the copy in most scenarios?
That is, when the flag is H2D, D2H, or D2D, which one performs the copy, the CPU or the GPU?
Hi,
The GPU copies the memory (for all flags).
For a naive kernel, the memory is accessed with the user-implemented pattern.
Performance might decrease if the access pattern causes too many page misses.
Thanks.
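One way to see how the two paths compare on a particular device is to time them with CUDA events; the following is a rough sketch (sizes and launch parameters are illustrative, and a warm-up pass would make the comparison fairer by taking first-touch page faults out of the measurement):

#include <cuda_runtime.h>
#include <stdio.h>

__global__ void copyKernel(float *dst, const float *src, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) dst[i] = src[i];
}

int main(void) {
    const int n = 1 << 24;
    float *a, *b;
    cudaMallocManaged((void **)&a, n * sizeof(float));
    cudaMallocManaged((void **)&b, n * sizeof(float));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms;

    // Path 1: cudaMemcpy with the DeviceToDevice flag
    cudaEventRecord(start);
    cudaMemcpy(a, b, n * sizeof(float), cudaMemcpyDeviceToDevice);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("cudaMemcpy D2D: %.3f ms\n", ms);

    // Path 2: naive copy kernel
    int threads = 256, blocks = (n + threads - 1) / threads;
    cudaEventRecord(start);
    copyKernel<<<blocks, threads>>>(a, b, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms, start, stop);
    printf("copy kernel:    %.3f ms\n", ms);

    cudaFree(a);
    cudaFree(b);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}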
system
Closed
January 11, 2023, 2:49am
10
This topic was automatically closed 14 days after the last reply. New replies are no longer allowed.