Currently, I need to sum several matrices and compute their average.
The data come from a camera (6004*7920 pixels, 8 bits per pixel). Given the amount of data, the GPU looks like the most efficient way to sum 5 of these images.
For that, I started from the vectorAdd sample. I am very happy with the computation time: about 0.08 ms, against 700 ms on the CPU (sequential).
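For reference, the sequential CPU version of this sum and average could look roughly like the sketch below. The function names `sumImages` and `averageFromSum` are hypothetical and not taken from the thread; the only assumption carried over is that the inputs are 8-bit images of equal size and that the sum is stored as unsigned 16-bit values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sum several 8-bit images pixel by pixel into one 16-bit image.
// With 5 images the largest possible sum is 5 * 255 = 1275, which fits in uint16_t.
std::vector<std::uint16_t> sumImages(const std::vector<std::vector<std::uint8_t>>& images) {
    // 257 * 255 = 65535 is the largest count that cannot overflow uint16_t.
    assert(!images.empty() && images.size() <= 257);
    const std::size_t n = images[0].size();
    std::vector<std::uint16_t> sum(n, 0);
    for (const auto& img : images) {
        assert(img.size() == n);  // all images must have the same size
        for (std::size_t i = 0; i < n; ++i) {
            sum[i] = static_cast<std::uint16_t>(sum[i] + img[i]);
        }
    }
    return sum;
}

// Per-pixel average from the summed image (integer division, truncating).
std::vector<std::uint8_t> averageFromSum(const std::vector<std::uint16_t>& sum,
                                         std::size_t count) {
    assert(count > 0);
    std::vector<std::uint8_t> avg(sum.size());
    for (std::size_t i = 0; i < sum.size(); ++i) {
        avg[i] = static_cast<std::uint8_t>(sum[i] / count);
    }
    return avg;
}
```

The 16-bit accumulator is why the result image is twice the size of one input image, which matters for the transfer figures discussed next.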
My problem is the data transfer time.
I copy 5 vectors of 45 MB each (about 225 MB in total) from the CPU to the GPU, and I measure a transfer rate of about 2 GB/s.
Then I copy 1 vector of 90 MB (unsigned short), and I measure a transfer rate of about 1 GB/s.
Can you confirm these figures?
Moreover, is there a way to increase the transfer rate, for example by using another kind of memory?
Thank you for your answer; I changed the way I allocate memory.
I have a question.
I am now using cudaMallocHost to get pinned memory.
The host-to-device transfer rate does not change: it is still about 2.2 GB/s, which seems reasonable.
On the other hand, the device-to-host transfer changes a lot!
I handle an image of 7920*6004 pixels in unsigned short, i.e. 90.7 MB.
Without pinned memory, downloading the image takes about 90 ms (~1 GB/s).
With pinned memory, it takes about 0.3 ms (~300 GB/s).
The final result is correct, but I do not understand that speed.
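The sizes and rates quoted above can be cross-checked with a little arithmetic. The sketch below is only that, a check of the numbers; the helper names `imageMiB` and `throughputGiBps` are hypothetical, and it assumes the megabytes in this thread are binary (1024*1024 bytes), since that is what reproduces the 45 and 90.7 figures.

```cpp
#include <cassert>
#include <cstddef>

const std::size_t kWidth = 7920;   // image width in pixels, from the thread
const std::size_t kHeight = 6004;  // image height in pixels, from the thread

// Size of one image in MiB for a given number of bytes per pixel
// (1 for unsigned char, 2 for unsigned short).
inline double imageMiB(std::size_t bytesPerPixel) {
    return static_cast<double>(kWidth * kHeight * bytesPerPixel) / (1024.0 * 1024.0);
}

// Transfer rate in GiB/s for a buffer of `mib` MiB copied in `ms` milliseconds.
inline double throughputGiBps(double mib, double ms) {
    assert(ms > 0.0);
    return (mib / 1024.0) / (ms / 1000.0);
}
```

With these helpers, `imageMiB(1)` is about 45.3 and `imageMiB(2)` about 90.7, so five 8-bit inputs are roughly 227 MiB. `throughputGiBps(imageMiB(2), 90.0)` is a little under 1, and `throughputGiBps(imageMiB(2), 0.3)` is about 295, which matches the ~1 GB/s and ~300 GB/s quoted above.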
With that method, I have a problem copying elements. In the following code, I simply try to copy an image using the GPU, and I run into the following difficulty.
The image I handle is an array of unsigned char provided by OpenCV (see the code).
For now, this is just a way to simulate the amount of data provided by a camera.
With the TX2, I should be able to access all the data in RAM, since the CPU and GPU share it. For that, we have to allocate memory and handle the data through pointers.
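int height = 6004;
int width = 7920;
int NumElement = height * width;   // number of pixels per image
unsigned char *img1 = NULL;        // input image, host pointer
unsigned short *imgf = NULL;       // result image, host pointer
unsigned char *img1_d = NULL;      // input image, device pointer
unsigned short *imgf_d = NULL;     // result image, device pointer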
src1 is allocated by cv::Mat, which gives a general CPU buffer.
Once you assign the buffer address with img1 = src1.data, the buffer type of img1 changes from mapped memory to a general CPU buffer that cannot be accessed by the GPU.
But if you copy the buffer by value, only the values change, not the buffer type.
Could you double-check whether this is the cause of the issue?
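To make the difference between the two patterns concrete, here is a minimal host-only sketch. The helper name `copyFrameInto` is hypothetical, and the sketch assumes the cv::Mat data is stored contiguously, so that it can be copied in one block. Nothing here calls CUDA or OpenCV; it only shows that a copy by value leaves the destination pointer on its original allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Pattern from the question, which loses the mapped buffer:
//     img1 = src1.data;   // img1 now points at cv::Mat's own CPU memory
//
// Pattern suggested in the reply, which keeps it:
//     copyFrameInto(img1, src1.data, NumElement);

// Copy `count` bytes from `src` into the existing buffer `dst`.
// `dst` keeps pointing at its original allocation; only its contents change.
inline void copyFrameInto(unsigned char* dst, const unsigned char* src, std::size_t count) {
    assert(dst != nullptr && src != nullptr);
    std::memcpy(dst, src, count);
}
```

In the real program, `dst` would be the pointer returned by the mapped or pinned allocation and `src` would be `src1.data`. The extra copy costs some time per frame, so it is worth measuring whether it is acceptable for your frame rate.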