I am developing a Java application that uses CUDA through a native DLL. The problem I ran into recently is related to calling the CUDA code from two Java threads in parallel.
A simplified version of the C++ code is below (I omitted the kernel code):
void calcDistances(...) {
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    HANDLE_ERROR(cudaMalloc(...));
    // ... more cudaMalloc ...
    HANDLE_ERROR(cudaMemcpyAsync(...));
    for (int index = 0; index < anglesCount; index++) {
        kernel1<<<...>>>(...);
        kernel2<<<...>>>(...);
        kernel3<<<...>>>(...);
        rotAngle += angleStep;
    }
    HANDLE_ERROR(cudaMemcpyAsync(...));
    HANDLE_ERROR(cudaFree(dFloats1));
    // ... more cudaFree() ...
}
Symptoms:
- When this code is called serially (with “synchronized” on the Java side), it works.
- When two Java threads call this code in parallel, it fails with “unknown error” about 50% of the time (so sometimes it works), on a random line after the cudaMemcpyAsync() that copies the results back to the host.
- Commenting out cudaFree() makes the error disappear.
- Decreasing the input data sizes also makes the GPU happy: no error.
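For comparison, below is a minimal, self-contained sketch of the same per-call pattern, called from two host threads at once. It is not the original code: the kernel `addAngle`, the function name `calcDistancesSketch`, the `HANDLE_ERROR` definition and the data sizes are hypothetical placeholders. The sketch assumes every copy and kernel launch is meant to run on the private `stream`, and it waits on that stream before freeing device memory.

```cuda
#include <cstdio>
#include <cstdlib>
#include <thread>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical stand-in for HANDLE_ERROR: print the CUDA error with file/line and abort.
#define HANDLE_ERROR(call)                                              \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "%s at %s:%d\n",                       \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Placeholder kernel (stands in for kernel1..kernel3): adds `angle` to each element.
__global__ void addAngle(float* data, int n, float angle) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += angle;
}

// Same structure as calcDistances, with everything ordered on one private stream.
void calcDistancesSketch(const float* hIn, float* hOut, int n,
                         int anglesCount, float angleStep) {
    cudaStream_t stream;
    HANDLE_ERROR(cudaStreamCreate(&stream));

    float* dFloats1 = nullptr;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    HANDLE_ERROR(cudaMalloc(&dFloats1, bytes));
    HANDLE_ERROR(cudaMemcpyAsync(dFloats1, hIn, bytes, cudaMemcpyHostToDevice, stream));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    float rotAngle = 0.0f;
    for (int index = 0; index < anglesCount; index++) {
        // Fourth launch parameter puts the kernel on `stream`, not the default stream.
        addAngle<<<blocks, threads, 0, stream>>>(dFloats1, n, rotAngle);
        HANDLE_ERROR(cudaGetLastError());  // reports launch errors right here
        rotAngle += angleStep;
    }

    HANDLE_ERROR(cudaMemcpyAsync(hOut, dFloats1, bytes, cudaMemcpyDeviceToHost, stream));
    // Wait for this stream only; errors from the kernels above are reported here.
    HANDLE_ERROR(cudaStreamSynchronize(stream));

    HANDLE_ERROR(cudaFree(dFloats1));
    HANDLE_ERROR(cudaStreamDestroy(stream));
}

int main() {
    const int n = 1 << 20;
    std::vector<float> inA(n, 1.0f), outA(n), inB(n, 2.0f), outB(n);
    // Two host threads calling concurrently, like the two Java threads.
    std::thread tA(calcDistancesSketch, inA.data(), outA.data(), n, 4, 0.5f);
    std::thread tB(calcDistancesSketch, inB.data(), outB.data(), n, 4, 0.5f);
    tA.join();
    tB.join();
    // Angles added are 0, 0.5, 1.0, 1.5, so 1 + 3 = 4 and 2 + 3 = 5.
    std::printf("%f %f\n", outA[0], outB[0]);
    return 0;
}
```

Kernel errors are asynchronous and are reported by whichever later runtime call happens to observe them, which is one reason an error can show up on a seemingly random line. Checking `cudaGetLastError()` after each launch and calling `cudaStreamSynchronize()` before `cudaFree()` narrows down where such an error originates. With the two-argument `<<<blocks, threads>>>` form, kernels are issued to the default stream rather than to `stream`.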
Thanks for your comments and suggestions, guys.
PS: I posted this question on stackoverflow.com but got no reply there.