Hi,
I'm a new CUDA user. In my project I use CUDA for the matrix calculations, while the rest of the code (if/else logic, memory allocation with malloc, etc.) runs on the CPU.
The CPU-side code looks like this:
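malloc(matrix1);      // allocate on the CPU
cuda_calc1(matrix1);  // calculate on the GPU (data copied to the GPU and back)
malloc(matrix2);      // allocate on the CPU
cuda_calc2(matrix2);  // calculate on the GPU (data copied to the GPU and back)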
The problem is that although the CUDA calculation itself is fast, there are many data transfers between the CPU and the GPU, and these transfers take up almost all of the run time.
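One common way to cut these transfers is to keep the data resident in GPU memory between the two calculations instead of copying each matrix back to the CPU after every step. A minimal sketch of that pattern, assuming both steps work on float arrays of the same length; `calc1_kernel`, `calc2_kernel` and `run_pipeline` are hypothetical stand-ins for `cuda_calc1`/`cuda_calc2`, and their arithmetic is placeholder:

```cuda
#include <cuda_runtime.h>

// Hypothetical stand-in for the work done in cuda_calc1 (placeholder arithmetic).
__global__ void calc1_kernel(float *m, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) m[i] = m[i] * 2.0f;
}

// Hypothetical stand-in for cuda_calc2; it reads calc1's result straight
// from GPU memory, so nothing is copied back to the CPU in between.
__global__ void calc2_kernel(const float *a, float *b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = a[i] + 1.0f;
}

// One host->device copy, two kernels, one device->host copy.
// Returns 0 on success, -1 on any CUDA error.
int run_pipeline(const float *h_in, float *h_out, int n) {
    size_t bytes = (size_t)n * sizeof(float);
    float *d_a = nullptr, *d_b = nullptr;
    int rc = -1;

    if (cudaMalloc(&d_a, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_b, bytes) != cudaSuccess) { cudaFree(d_a); return -1; }

    if (cudaMemcpy(d_a, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        calc1_kernel<<<blocks, threads>>>(d_a, n);
        calc2_kernel<<<blocks, threads>>>(d_a, d_b, n);  // same stream: runs after calc1
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_out, d_b, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            rc = 0;  // the blocking copy above also waited for both kernels
        }
    }
    cudaFree(d_a);
    cudaFree(d_b);
    return rc;
}
```

Both kernels go into the default stream, so `calc2_kernel` starts only after `calc1_kernel` finishes, and only the final result crosses back to the CPU.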
So my questions are:
1. Is it possible to run logic code such as malloc in CUDA (inside a thread)?
2. Are there any topics or blog posts that cover this?
3. Or is there another solution?
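On the first question: CUDA does support `malloc()` and `free()` inside device code (compute capability 2.x and later). They allocate from a device heap that is 8 MB by default and can be enlarged with `cudaDeviceSetLimit(cudaLimitMallocHeapSize, ...)` before the first kernel that uses it is launched. Ordinary if/else logic is also allowed in kernels. A minimal sketch, assuming each thread needs a scratch buffer of `k` ints; `scratch_kernel` and `run_scratch` are hypothetical names and the arithmetic is placeholder:

```cuda
#include <cstdlib>
#include <cuda_runtime.h>

// Each thread allocates a scratch buffer from the device heap, uses
// ordinary if/else logic on its input, and writes one result.
__global__ void scratch_kernel(const int *in, int *out, int n, int k) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    int *tmp = (int *)malloc(k * sizeof(int));    // device-side malloc
    if (tmp == nullptr) { out[i] = -1; return; }  // device heap exhausted

    int sum = 0;
    for (int j = 0; j < k; ++j) {
        if (in[i] % 2 == 0) tmp[j] = in[i] + j;
        else                tmp[j] = in[i] - j;
        sum += tmp[j];
    }
    out[i] = sum;
    free(tmp);  // memory from device malloc must be released with device free
}

// Returns 0 on success, -1 on any CUDA error.
int run_scratch(const int *h_in, int *h_out, int n, int k) {
    // Optional: grow the device heap (default 8 MB). This only succeeds
    // before the first launch of a kernel that calls malloc/free, so the
    // result is ignored and the default heap is used otherwise.
    (void)cudaDeviceSetLimit(cudaLimitMallocHeapSize, 32 * 1024 * 1024);
    (void)cudaGetLastError();  // clear any error left by the call above

    size_t bytes = (size_t)n * sizeof(int);
    int *d_in = nullptr, *d_out = nullptr;
    int rc = -1;
    if (cudaMalloc(&d_in, bytes) != cudaSuccess) return -1;
    if (cudaMalloc(&d_out, bytes) != cudaSuccess) { cudaFree(d_in); return -1; }

    if (cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice) == cudaSuccess) {
        int threads = 128;
        int blocks = (n + threads - 1) / threads;
        scratch_kernel<<<blocks, threads>>>(d_in, d_out, n, k);
        if (cudaGetLastError() == cudaSuccess &&
            cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess) {
            rc = 0;
        }
    }
    cudaFree(d_in);
    cudaFree(d_out);
    return rc;
}
```

Device-side allocation is usually slower than allocating once with `cudaMalloc` on the host and reusing the buffer, so it mainly helps when a size is only known inside the kernel.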