Unified memory with CUDA on Jetson Nano needs memcpy?

After reading this guide I thought I no longer needed to use memcpy with unified memory on Tegra, but apparently I am wrong:

My main problem is that in this code, if I don't do the memcpy but instead pass the array directly to the CUDA kernel via unified memory, it doesn't work. It seems absurd to have to use memcpy on Tegra when there is unified memory, because on the SoC the GPU and CPU really do share the same memory:

inputImage = PPM_import("img/inputimage.ppm");

imageWidth = Image_getWidth(inputImage);
imageHeight = Image_getHeight(inputImage);
imageChannels = Image_getChannels(inputImage);

outputImage = Image_new(imageWidth, imageHeight, imageChannels);

hostInputImageData = Image_getData(inputImage);
hostOutputImageData = Image_getData(outputImage);
cudaMallocManaged((void **) &deviceInputImageData, imageWidth * imageHeight *
imageChannels * sizeof(float));
cudaMallocManaged((void **) &deviceOutputImageData, imageWidth * imageHeight *
imageChannels * sizeof(float));
cudaMallocManaged((void **) &deviceMaskData, maskRows * maskCols * sizeof(float));
//memcpy(deviceInputImageData, hostInputImageData, imageWidth * imageHeight * imageChannels * sizeof(float)); // <- WORKS
deviceInputImageData = Image_getData(inputImage); // <- DOES NOT WORK

cudaMemcpy(deviceMaskData, hostMaskData, maskRows * maskCols * sizeof(float), cudaMemcpyHostToDevice);

dim3 dimGrid(ceil((float) imageWidth/TILE_WIDTH),
ceil((float) imageHeight/TILE_WIDTH));
dim3 dimBlock(TILE_WIDTH,TILE_WIDTH,1);

myKernelProcessing<<<dimGrid,dimBlock>>>(deviceInputImageData, deviceMaskData, deviceOutputImageData,imageChannels, imageWidth, imageHeight);

I don’t understand where I’m wrong.

Thanks in advance

You may try these examples for a way of using unified memory:

In short, first allocate unified memory. You can then use the same address for both CPU and GPU processing: for example, read data into it from the CPU, then transform it from the GPU.

Thanks, but I did allocate the unified memory with cudaMallocManaged. In that case, shouldn't memcpy no longer be needed? Could you please modify my code so I can understand where I am going wrong in allocating the memory I pass to the CUDA kernel?

Thanks again

I took a few minutes to give you those links because they would help you understand this better.
You replied before trying any of them.
Sorry, but I'm not working for you.

I had already read those links before making the post; I apologize if it came across that way… Unfortunately, in the posts you linked (one is a topic that I created and that you answered brilliantly), the code is written with OpenCV, and I don't understand where I am wrong in my CUDA code. Sorry, but I searched the forum thoroughly before posting. If you want to be paid, I am still willing to pay you for the consultation.

1 Like

Sorry, I didn't remember your username. OK, I'll try to have a look at your code and help if I can, but I can't promise a timeframe, for personal reasons.

No problem,
thanks again

I fail to understand what your code does in some functions, but you could try to:

  1. First, allocate unified memory buffers for the kernel (mask), input, and output. These will be available from the CPU and GPU at the same address.
  2. Fill your unified-memory kernel buffer from the CPU. It will be available from the GPU as well.
  3. Get your input data from the CPU and copy it into the unified-memory input buffer.
  4. Process it from CUDA as a GPU buffer at the same address, using the kernel's unified-memory address. That should produce the output buffer.
  5. You may have to call cudaDeviceSynchronize() here in some cases.
  6. Read the GPU-processed output buffer from the CPU at the same unified output-buffer address.
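The steps above can be sketched as a minimal standalone program. This is not the poster's code: the buffer size, kernel body, and names are placeholders chosen only to illustrate the workflow.

```cpp
// Sketch only: illustrates the unified-memory workflow described above.
// The kernel body and sizes are stand-ins, not the poster's convolution code.
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scaleKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * 2.0f;  // placeholder for real processing
}

int main() {
    const int n = 1024;
    float *inBuf, *outBuf;

    // 1. Allocate unified memory: the same pointers are valid on CPU and GPU.
    cudaMallocManaged(&inBuf, n * sizeof(float));
    cudaMallocManaged(&outBuf, n * sizeof(float));

    // 2./3. Fill the input from the CPU. Write INTO the managed buffer;
    //       do not overwrite the managed pointer with another pointer.
    for (int i = 0; i < n; ++i) inBuf[i] = (float)i;

    // 4. Process on the GPU using the same addresses.
    scaleKernel<<<(n + 255) / 256, 256>>>(inBuf, outBuf, n);

    // 5. Synchronize before the CPU touches GPU-written data.
    cudaDeviceSynchronize();

    // 6. Read the result from the CPU through the same pointer.
    printf("outBuf[10] = %f\n", outBuf[10]);

    cudaFree(inBuf);
    cudaFree(outBuf);
    return 0;
}
```

Build with `nvcc` on the Jetson itself; on Tegra the managed allocations live in the shared physical memory, so no explicit host-device copies are needed here.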

Dear Honey, thank you. The problem was that in the posted code, since I was not using OpenCV, I was messing up the pointers. I followed your suggestions step by step and realized that I was not using managed memory when passing the pointer to the kernel. I rewrote everything using OpenCV and saw where I was wrong, thanks to your suggestions.

Thanks again
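For readers landing here later, the pointer mistake described in this thread can be sketched as follows (names are taken from the original post; `Image_getData` is assumed to return an ordinary host pointer):

```cpp
// Managed buffer allocated for use by both CPU and GPU:
cudaMallocManaged((void **)&deviceInputImageData, numBytes);

// WORKS: copies the pixel data into the managed buffer,
// which the kernel can then read.
memcpy(deviceInputImageData, Image_getData(inputImage), numBytes);

// DOES NOT WORK: overwrites the managed pointer with a plain host
// pointer (and leaks the managed allocation); the kernel then
// dereferences memory that was never allocated with cudaMallocManaged.
deviceInputImageData = Image_getData(inputImage);
```

So memcpy is not required by unified memory itself; it is simply one way to get the data into the managed buffer while keeping the managed pointer intact.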