I’m a newbie trying to code something in CUDA as part of my project.
Please see the attached code.
Here I’m trying to do some small-scale image manipulation.
1. The input images are read (500 files, 64x64 pixels, 16 bits per pixel, raw format) into the ‘fpInputFrames’ array.
2. They are copied to the GPU into fpInputFramesGpu (line 155).
3. An output array, fpOutputFramesGpu, is allocated.
4. The kernel is run. I gave a block count and a thread count of 64 each, so that each thread handles only 1 pixel of a 64x64 frame (in the uploaded code the value is 1, which I changed while testing). The kernel is invoked in a loop (commented out now).
5. Copy the processed data from device to host.
6. Free the resources.
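For reference, here is a minimal sketch of what the steps above are meant to do. It is not the attached code: the `unsigned short` pixel type, the `processFrame` kernel (which just inverts each pixel), and the `processFrames` wrapper are assumptions made for illustration. Only the buffer names fpInputFramesGpu and fpOutputFramesGpu come from my code.

```cuda
#include <cuda_runtime.h>
#include <cstddef>

const int kWidth = 64;
const int kHeight = 64;
const int kPixelsPerFrame = kWidth * kHeight;  // 4096 pixels = 64 blocks x 64 threads

// Hypothetical stand-in kernel: one thread per pixel, inverts the 16-bit value.
__global__ void processFrame(const unsigned short* in, unsigned short* out, int pixelCount)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < pixelCount)  // guard against threads past the end of the frame
        out[idx] = static_cast<unsigned short>(65535 - in[idx]);
}

// Hypothetical wrapper for steps 2 to 6; returns true if every CUDA call succeeded.
bool processFrames(const unsigned short* hostIn, unsigned short* hostOut, int frameCount)
{
    size_t bytes = (size_t)frameCount * kPixelsPerFrame * sizeof(unsigned short);
    unsigned short* fpInputFramesGpu = NULL;
    unsigned short* fpOutputFramesGpu = NULL;

    // Steps 2 and 3: allocate both device buffers and upload the input frames.
    bool ok = cudaMalloc((void**)&fpInputFramesGpu, bytes) == cudaSuccess
           && cudaMalloc((void**)&fpOutputFramesGpu, bytes) == cudaSuccess
           && cudaMemcpy(fpInputFramesGpu, hostIn, bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    // Step 4: one launch per frame, 64 blocks of 64 threads each.
    for (int f = 0; ok && f < frameCount; ++f) {
        size_t offset = (size_t)f * kPixelsPerFrame;
        processFrame<<<64, 64>>>(fpInputFramesGpu + offset,
                                 fpOutputFramesGpu + offset,
                                 kPixelsPerFrame);
        ok = cudaGetLastError() == cudaSuccess;
    }

    // Step 5: copy the results back (this call also waits for the kernels to finish).
    ok = ok && cudaMemcpy(hostOut, fpOutputFramesGpu, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;

    // Step 6: free the device buffers (cudaFree accepts a null pointer).
    cudaFree(fpInputFramesGpu);
    cudaFree(fpOutputFramesGpu);
    return ok;
}
```

With 500 frames this would be called as `processFrames(fpInputFrames, hostOutput, 500)`, where `hostOutput` is a hypothetical host buffer of the same size as the input.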
Problem seen in debug mode (not in emulation):
The device-to-host memory copy either copies junk or returns cudaErrorLaunchFailure.
What could be the reason? :sad: :crying:
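Since kernel launches are asynchronous, a failure inside the kernel can be reported by a later call such as the device-to-host copy. Checking right after the launch might show whether the kernel itself is at fault. Below is a rough sketch of such a check; `checkCuda`, `touchPixels` and `launchAndCheck` are hypothetical names used only for illustration and are not part of the attached code.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: reports a failed CUDA call and returns false.
static bool checkCuda(cudaError_t err, const char* what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Placeholder kernel standing in for the real one: writes each pixel's index.
__global__ void touchPixels(unsigned short* data, int count)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < count)  // an out-of-range write here is a typical cause of a launch failure
        data[idx] = static_cast<unsigned short>(idx);
}

// Launches the kernel and checks both the launch and the execution.
bool launchAndCheck(unsigned short* devData, int count)
{
    int threads = 64;
    int blocks = (count + threads - 1) / threads;  // round up to cover every pixel
    touchPixels<<<blocks, threads>>>(devData, count);

    // Catches invalid launch configurations.
    if (!checkCuda(cudaGetLastError(), "kernel launch"))
        return false;

    // Waits for the kernel, so errors raised while it runs surface here
    // instead of at the next cudaMemcpy.
    return checkCuda(cudaDeviceSynchronize(), "kernel execution");
}
```

If the synchronize call already reports the error, the copy itself is probably fine and the problem is inside the kernel.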
Thanks in advance.
Temp.zip (2.09 KB)