I’m porting a well-known implementation of SIFT feature extraction from images to OpenCL.
It’s quite a complex task, so I used 3 big kernels and 4 smaller ones.
I adopted a fairly linear pattern where, more or less, each kernel waits for the previous one to complete. The kernel calling sequence is the following:
1 - Many calls to the small kernels to preprocess the input image and prepare in global memory the (quite big) data structure needed by the algorithm.
2 - 4-5 subsequent calls to the first big kernel to process different areas of the input data structure, producing 4-5 output vectors
3 - One call to the second big kernel
4 - One call to the third big kernel
Each of these stages waits for the previous one to complete. At the very end, all GPU buffers and events are released.
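To make the sequence concrete, here is a rough model of it in plain C. It is only a mock: it makes no OpenCL calls, every name in it (`mock_buffer`, `mock_stage`, `process_image`) is hypothetical, and the many small-kernel calls of step 1 are collapsed into a single call. It is not my actual host code. What it shows is the lifecycle I am aiming for: each run acquires its own resources, runs the four steps in order, and releases everything at the end, so a later run should never touch anything released by an earlier one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-run GPU resource (buffer, event, ...). */
typedef struct {
    int *storage; /* NULL once released */
} mock_buffer;

static int mock_acquire(mock_buffer *b, size_t n)
{
    assert(b != NULL);
    b->storage = calloc(n, sizeof *b->storage);
    return b->storage != NULL ? 0 : -1;
}

static void mock_release(mock_buffer *b)
{
    free(b->storage);
    b->storage = NULL; /* makes any later use of this handle detectable */
}

/* Stand-in for "enqueue a kernel and wait"; refuses a released buffer. */
static int mock_stage(mock_buffer *b, int step)
{
    if (b->storage == NULL)
        return -1;
    b->storage[0] = step; /* record the last step that ran */
    return 0;
}

/* One full run over one image: 0 on success, the failing step (1-4)
   if a stage fails, or -1 if the resources could not be acquired. */
static int process_image(void)
{
    mock_buffer data = { NULL };
    int failed = 0;
    int i;

    if (mock_acquire(&data, 16) != 0)
        return -1;

    if (mock_stage(&data, 1) != 0)                 /* step 1: small kernels */
        failed = 1;
    for (i = 0; i < 5 && failed == 0; ++i)         /* step 2: first big kernel, 5 calls */
        if (mock_stage(&data, 2) != 0)
            failed = 2;
    if (failed == 0 && mock_stage(&data, 3) != 0)  /* step 3: second big kernel */
        failed = 3;
    if (failed == 0 && mock_stage(&data, 4) != 0)  /* step 4: third big kernel */
        failed = 4;

    mock_release(&data); /* everything is released at the very end */
    return failed;
}
```

Calling `process_image()` twice in a row corresponds to the case that fails in my real program at step 2 of the second run.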
Everything works if I process a single image (the outputs are as expected), but if I execute the algorithm more than once in sequence (the second execution starts only after the first returns, and I call clFinish between them), the second execution crashes. It crashes while enqueueing the first big kernel for the first time (the beginning of step 2 in the list above).
When I say “crash” I mean that clEnqueueNDRangeKernel itself crashes with the following message: “Unhandled exception at 0x02a674bd in OpenCLComputing.exe: 0xC0000005: Access violation reading location 0x00000070.” So I get no error code from the function, and I don’t know what to do to make it work.
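One observation about the faulting address: 0x00000070 is very small, which would be consistent with a read of a field at offset 0x70 through a NULL pointer. The sketch below only illustrates that arithmetic in plain C. `fake_object` and its layout are made up for illustration (I have no idea what the driver's real structures look like), and the "first page is never mapped" threshold is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the driver's real structure. */
struct fake_object {
    unsigned char header[0x70]; /* 0x70 bytes in front of the field */
    int state;                  /* lands at offset 0x70 */
};

/* Sanity check on the made-up layout. */
static void check_offset(void)
{
    assert(offsetof(struct fake_object, state) == 0x70);
}

/* Address that a read of obj->state would touch; nothing is dereferenced. */
static uintptr_t field_address(const struct fake_object *obj)
{
    return (uintptr_t)obj + offsetof(struct fake_object, state);
}

/* Returns 1 if the address looks like "NULL plus a small field offset".
   Assumption: the first 4 KiB of the address space is never mapped. */
static int looks_like_null_plus_offset(uintptr_t addr)
{
    return addr < 0x1000;
}
```

With a NULL `obj`, `field_address` gives 0x70 on the usual platforms where a null pointer converts to 0, which matches the address in the exception message. That would point at some object being NULL or already released when the kernel is enqueued, but this is only a guess.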
The weird thing is that I had already called that kernel 5 times during the first execution, and all the other kernels too. And how is it possible that the second execution gets through the smaller kernels before crashing?
I’m working with an Asus GF 9800 GT, Windows 7 64-bit, drivers 196.21 and GPU Computing SDK 3.0 beta.
Has anybody faced similar problems?
Thanks a lot