Hi!
I need to do a simple operation. I have five 48 MP images, and I need to calculate the mean image.
I have two methods, one using the GPU and one using the CPU.
Using the CPU, I have:
#pragma omp parallel for simd
for (int i = 0; i < height * width; i++) {
    img_sum[i] = (unsigned char)(((unsigned short)(img1[i]) + img2[i] + img3[i] + img4[i] + img5[i]) / 5);
}
I use multithreading with OpenMP. The calculation time is around 25 ms.
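For reference, the CPU loop can be wrapped in a small function so the arithmetic is easy to check on tiny buffers. This is only a sketch: the function name `mean5` and the single length parameter `n` (standing in for `height*width`) are hypothetical, and the `_OPENMP` guard is there so the code also builds when OpenMP is not enabled.

```c
#include <stddef.h>

/* Per-pixel mean of five 8-bit images, same arithmetic as the loop above.
   In C the unsigned char operands are promoted to int before the additions,
   so the largest possible sum (5 * 255 = 1275) cannot overflow.
   The integer division truncates toward zero.
   mean5 and n are hypothetical names used for illustration only. */
void mean5(const unsigned char *img1, const unsigned char *img2,
           const unsigned char *img3, const unsigned char *img4,
           const unsigned char *img5, unsigned char *img_sum, int n)
{
    /* Without OpenMP enabled, the loop simply runs serially. */
#ifdef _OPENMP
#pragma omp parallel for simd
#endif
    for (int i = 0; i < n; i++) {
        img_sum[i] = (unsigned char)(((unsigned short)(img1[i]) + img2[i]
                                      + img3[i] + img4[i] + img5[i]) / 5);
    }
}
```

With five pixels of value 255 the result stays 255, and pixel values 0, 1, 2, 3, 3 give 9 / 5 = 1 because of the truncating division.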
Using the GPU, I have:
__global__ void
vectorAdd(const unsigned char *A, const unsigned char *B, const unsigned char *C, const unsigned char *D, const unsigned char *E, unsigned short *F, const int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) {
        F[i] = (unsigned short)(A[i] + B[i] + C[i] + D[i] + E[i])/5;
    }
}
....
int threadsPerBlock = 256;
int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
vectorAdd<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, d_D, d_E, d_S, numElements);
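The launch configuration above rounds the block count up so that every element gets its own thread. A minimal sketch of that arithmetic in plain C, assuming exactly 48,000,000 single-channel pixels (a hypothetical count, since "48 MP" is not an exact figure); the helper name `blocks_per_grid` is also hypothetical:

```c
/* Ceiling division used to size the grid: the smallest number of blocks
   of threads_per_block threads that covers num_elements.
   blocks_per_grid is a hypothetical helper, not part of the CUDA API. */
int blocks_per_grid(int num_elements, int threads_per_block)
{
    return (num_elements + threads_per_block - 1) / threads_per_block;
}
```

Under that assumption, `blocks_per_grid(48000000, 256)` gives 187500 blocks, so 256 is the number of threads per block and the kernel launches one thread per pixel in total. One extra pixel would add one mostly idle block, which is what the `i < numElements` check in the kernel guards against.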
The calculation time is about 35 ms.
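To put the two timings on a common scale, they can be converted into effective memory throughput. This is only a rough sketch with several assumptions: 48,000,000 one-byte pixels per image, timings that cover only the loop or kernel (no host-to-device copies), and 1 GB = 1e9 bytes. The helper name `throughput_gb_per_s` is hypothetical.

```c
/* Effective throughput in GB/s (1 GB = 1e9 bytes) when `bytes` bytes are
   read and written in `ms` milliseconds.
   throughput_gb_per_s is a hypothetical helper for illustration. */
double throughput_gb_per_s(double bytes, double ms)
{
    return bytes / (ms * 1e-3) / 1e9;
}

/* Assumed traffic, with 48e6 pixels per image:
   CPU: 5 reads + 1 write of unsigned char      = 6 * 48e6     = 288e6 bytes
   GPU: 5 reads of unsigned char
        + 1 write of unsigned short (2 bytes)   = 5*48e6+96e6  = 336e6 bytes */
```

Under these assumptions, `throughput_gb_per_s(288e6, 25.0)` is about 11.5 GB/s for the CPU version and `throughput_gb_per_s(336e6, 35.0)` is about 9.6 GB/s for the GPU version. If the 35 ms also includes memory copies, the kernel itself would be faster than this figure suggests.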
I expected a better result from the GPU, but apparently that is not the case. It is a very simple operation, and I thought the GPU would be more efficient than the CPU for it.
Is it because of the number of cores, only 256? Or maybe I am not using blocks and threads correctly.
Is the Jetson's GPU efficient for image processing? Or is the GPU more efficient only for tasks like AI? I ask because I also wrote a simple median filter using shared memory, and there too the CPU version was faster.
Thank you!