Help needed

Friends, I am trying to port my C++ code to CUDA, but I am not getting the expected output, and the CUDA error check reports no errors.
I have tried my best. It is an FWI code in which I call two kernels: one for the calculation and the other for updating my values. Please help…
My code: the commented-out for loops are the code that I have converted to CUDA.
If you uncomment the for loops after the kernel calls, and comment out the kernel calls and the CUDA memcpy at the end (for the vz array), you will see the correct output.

My Cuda Code where Conversion took place

My Full repo link

Please have a look and let me know if you need any more information of any sort.
NOTE: If you try to run the code, there may be an error during deallocation at the end of the program. Please ignore it, as it has no effect on the output.

Can anyone please have a look?

Hi @anasmd4u

It would be helpful if you could narrow the scope a bit. Try a small proof of concept, because the code is too large to analyse quickly and give you a concrete answer. Try to create a small problem that behaves in a similar way to your real problem.

Regards,
Leon.

I have changed my code and made it reproducible on your system. I have also added a check that exits the program if the CPU and GPU outputs do not match. Unfortunately, it exits after the 3rd iteration. The code can be found here.
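
For anyone following along, a CPU-versus-GPU check of this kind could look roughly like the host-side sketch below. It is not taken from the linked code: the name first_mismatch and the tolerance values are assumptions made for illustration. It compares with a tolerance instead of exact equality, because floating-point results from a GPU kernel often differ from the CPU loop in the last few bits even when the port is correct.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not from the original code): returns the index of the
// first element where the CPU and GPU results disagree, or -1 if all n
// elements agree within the given tolerances.
long first_mismatch(const float* cpu, const float* gpu, std::size_t n,
                    float rel_tol = 1e-5f, float abs_tol = 1e-6f) {
    assert(rel_tol >= 0.0f && abs_tol >= 0.0f);
    for (std::size_t i = 0; i < n; ++i) {
        const float a = cpu[i];
        const float b = gpu[i];
        // NaN or infinity on either side: require exact equality.
        // NaN never equals anything, so any NaN is reported as a mismatch.
        if (!std::isfinite(a) || !std::isfinite(b)) {
            if (a != b) return static_cast<long>(i);
            continue;
        }
        // Mixed absolute/relative tolerance, scaled by the larger magnitude.
        const float diff = std::fabs(a - b);
        const float scale = std::fmax(std::fabs(a), std::fabs(b));
        if (diff > abs_tol + rel_tol * scale) return static_cast<long>(i);
    }
    return -1;
}
```

One way to use it, assuming the vz array is copied back to the host after every time step: call first_mismatch on the CPU and GPU copies of vz and print the returned index together with both values. The index of the first differing element usually narrows the search to a specific boundary or indexing expression in the kernel.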