I have some fully working code that does Java => JNI => C++ => Kernel => C++ => JNI => Java, so I think I’ve got most of the problems covered.
However, when I try to expand the code (incrementally moving work from the host to the device) by adding the segment below, it compiles fine but appears to segfault when cudaMallocPitch executes.
Most likely I haven’t quite understood the documentation for cudaMallocPitch, but as far as I can tell, I’m calling it correctly.
[codebox]vector<vector<float> > inputVector;
//Read file and put data into inputVector. When done,
//inputVector basically looks like inputVector[59][10000]
float* kernel_inputArray;
size_t* size;
int error = 0;
error = cudaMallocPitch((void**)&kernel_inputArray, kernel_size, inputVector.size() * sizeof(float), inputVector[0].size());[/codebox]
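For comparison, here is a minimal host-side sketch of how the arguments for a pitched allocation could be prepared. It rests on a few assumptions that are not in the code above: `HostMatrix` and `flatten` are hypothetical names, each inner vector is treated as one row (so the width is `inputVector[0].size() * sizeof(float)` and the height is `inputVector.size()`), and the CUDA calls appear only in a comment so the block builds without the toolkit. As far as I know, the runtime declares `cudaMallocPitch(void** devPtr, size_t* pitch, size_t width, size_t height)` and writes the pitch through the second argument, so that argument has to be the address of an existing `size_t` rather than an uninitialized `size_t*`.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original post): host-side description
// of a 2D float array laid out contiguously, one row after another.
struct HostMatrix {
    std::vector<float> data;   // height * cols floats, row-major
    std::size_t widthBytes;    // bytes in one row: the "width" argument
    std::size_t height;        // number of rows: the "height" argument
};

// Copy a nested vector into one contiguous block. Each inner vector owns a
// separate heap allocation, so the nested form cannot be copied in one call.
HostMatrix flatten(const std::vector<std::vector<float> >& input) {
    HostMatrix m;
    m.height = input.size();
    const std::size_t cols = input.empty() ? 0 : input[0].size();
    m.widthBytes = cols * sizeof(float);
    m.data.reserve(m.height * cols);
    for (std::size_t r = 0; r < input.size(); ++r) {
        if (input[r].size() != cols) throw std::invalid_argument("ragged rows");
        m.data.insert(m.data.end(), input[r].begin(), input[r].end());
    }
    return m;
}

// With a CUDA toolkit available, the allocation and copy would then be
// along these lines (kept as a comment so this block builds without CUDA):
//
//   float* devPtr = NULL;
//   size_t pitch = 0;  // a real size_t; the runtime writes the pitch into it
//   cudaError_t err = cudaMallocPitch((void**)&devPtr, &pitch,
//                                     m.widthBytes, m.height);
//   if (err == cudaSuccess)
//       err = cudaMemcpy2D(devPtr, pitch, &m.data[0], m.widthBytes,
//                          m.widthBytes, m.height, cudaMemcpyHostToDevice);
```

The flattening step is there because `cudaMemcpy2D` reads its source as one contiguous block with a fixed row stride, which a vector of vectors does not guarantee. If the data is meant to be stored transposed on the device, the width and height would swap accordingly.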