After reading this guide I thought I would no longer need to use memcpy with unified memory on Tegra, but apparently I am wrong:
My main problem is that in this code, if I don't do the memcpy and instead pass the array directly to the CUDA kernel through unified memory, it doesn't work. It seems absurd to have to use memcpy on Tegra when there is unified memory, because on the SoC the GPU and CPU really do share the same physical memory:
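Something like this simplified sketch (placeholder kernel and names, not my exact code): a managed buffer is allocated, but the kernel ends up being launched with the plain host array.

```cpp
#include <cuda_runtime.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1024;
    float hostData[N];                 // plain CPU array, not visible to the GPU
    for (int i = 0; i < N; ++i) hostData[i] = float(i);

    float *managed = nullptr;
    cudaMallocManaged(&managed, N * sizeof(float));

    // WRONG: the kernel receives the plain host pointer instead of the
    // managed one, so the GPU cannot access it.
    scale<<<(N + 255) / 256, 256>>>(hostData, N);
    cudaDeviceSynchronize();

    cudaFree(managed);
    return 0;
}
```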
You may try these examples to see one way of using unified memory:
In short, first allocate unified memory. You can then use the same address for both CPU and GPU processing: for example, read data into it from the CPU, then transform it from the GPU.
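For example, a minimal sketch along those lines (generic kernel and sizes, just for illustration):

```cpp
#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main() {
    const int N = 1024;
    float *data = nullptr;

    // One allocation, one pointer, usable from both CPU and GPU.
    cudaMallocManaged(&data, N * sizeof(float));

    // "Read from CPU into it": fill the buffer on the CPU side.
    for (int i = 0; i < N; ++i) data[i] = float(i);

    // "Transform it from GPU": launch the kernel on the same pointer.
    scale<<<(N + 255) / 256, 256>>>(data, N);
    cudaDeviceSynchronize();   // required before the CPU touches it again

    printf("data[1] = %f\n", data[1]);  // CPU reads the result directly
    cudaFree(data);
    return 0;
}
```

Note the cudaDeviceSynchronize() before the CPU reads the result back; even with managed memory you still have to synchronize between GPU work and CPU access.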
Thanks, but I did use cudaMallocManaged to allocate the unified memory. In that case, shouldn't memcpy no longer be needed? May I ask you to modify my code so I can understand where I am going wrong in allocating the memory that I pass to the CUDA kernel?
I had already read those links before making the post; I apologize if it came across that way... Unfortunately, in the posts you linked (one is a topic that I created, and which you answered brilliantly) the code is written with OpenCV, and I still don't understand where I am going wrong in my plain CUDA code. Sorry, but I did search the forum thoroughly before posting. If you would like to be paid, I am still willing to pay you for the consultation.
Dear Honey, thank you. The problem was that in the posted code, since I was not using OpenCV, I was mishandling the pointers. I followed your suggestions step by step and realized that I was not passing managed memory to the kernel. I rewrote everything using OpenCV and, thanks to your suggestions, found where I was wrong.
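For anyone hitting the same issue, the working pattern was roughly this (a sketch with placeholder sizes and format; the key point is that the cv::Mat and cv::cuda::GpuMat wrap the same managed pointer, so no memcpy is needed):

```cpp
#include <cuda_runtime.h>
#include <opencv2/core.hpp>
#include <opencv2/core/cuda.hpp>

int main() {
    const int width = 1920, height = 1080;   // placeholder frame size
    unsigned char *unified = nullptr;

    // One managed allocation shared by CPU and GPU.
    cudaMallocManaged(&unified, width * height);

    // Both views wrap the SAME pointer: no copy between them.
    cv::Mat cpuFrame(height, width, CV_8UC1, unified);
    cv::cuda::GpuMat gpuFrame(height, width, CV_8UC1, unified);

    // ... fill cpuFrame on the CPU, process gpuFrame on the GPU ...
    cudaDeviceSynchronize();   // sync before switching sides

    cudaFree(unified);
    return 0;
}
```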