I'm a new CUDA user, and I use it for the matrix calculations in my project. The rest of the code (if/else logic, memory allocation with malloc, etc.) runs on the CPU.
The CPU code looks like this:
The problem is that the CUDA calculation itself is fast, but there are many data transfers between the CPU and the GPU, and those transfers take almost all of the run time.
So my questions are:
1. Is it possible to run "logic code" such as malloc inside CUDA (in a thread)?
2. Are there any topics or blog posts that show how to do this?
3. Or is there another solution?