I’m trying to use CUDA from a DirectShow filter that I wrote, and I’m running into trouble. I created a lib with my CUDA functions, which I call from my DirectShow filter DLL. The cudaMalloc and cudaMemcpy calls succeed, but when I launch my kernel, the cudaGetLastError() call that follows returns cudaErrorInvalidDeviceFunction. When I link the same lib into a regular exe, I have no problem. What could be the problem?
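For reference, here’s a stripped-down sketch of what the lib does (the kernel body and all names are simplified stand-ins, but the call pattern is the same):

```cpp
#include <cuda_runtime.h>

// dummy per-pixel kernel standing in for the real filter operation
__global__ void processFrame(unsigned char* d_pixels, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d_pixels[i] = 255 - d_pixels[i];
}

// entry point exported by the CUDA lib, called from the filter DLL
extern "C" cudaError_t runFilter(unsigned char* h_pixels, int n)
{
    unsigned char* d_pixels = 0;
    size_t bytes = n * sizeof(unsigned char);

    // these two calls succeed...
    cudaMalloc((void**)&d_pixels, bytes);
    cudaMemcpy(d_pixels, h_pixels, bytes, cudaMemcpyHostToDevice);

    // ...but the launch fails
    processFrame<<<(n + 255) / 256, 256>>>(d_pixels, n);
    cudaError_t err = cudaGetLastError();  // cudaErrorInvalidDeviceFunction here

    if (err == cudaSuccess)
        err = cudaMemcpy(h_pixels, d_pixels, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_pixels);
    return err;
}
```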
A wild guess: perhaps the DirectShow filter code that launches the kernel is executed on a different thread than the code that does the memory allocation? That would give it a different CUDA context, which may be unable to access the device at that point (for some reason).
Sorry I couldn’t be of more help. Coding DirectShow stuff is a bit of a specialty, so I don’t know how many other people on here will be able to answer that question in specific detail. If you search the General GPU Computing forum, I believe someone wrote a DirectShow filter for their webcam a while back that used CUDA to do some transformations on the video. Perhaps you could take a look at that code and compare it to yours.
In my case, it is the same thread for both (I think).
I also PMed someone at NVIDIA and got this answer:
"I have no idea; I’ve used CUDA in very simple DShow filters with no problems. Sorry I can’t be more helpful. Only thing I can think of might be some name-mangling related to the runtime API or something like that. "
I thought it might work with the new /MD version of the CUDA runtime library (my library is a multi-threaded DLL), but I couldn’t test it, since I only have VS2005 and the /MD version of the library has only been released for VS2008.
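(In case anyone with VS2008 wants to try it: as far as I understand, the runtime model is picked when the CUDA lib is built, roughly like this. The file names are made up and the flags are from memory, so double-check them.)

```
REM build the CUDA lib against the multi-threaded DLL runtime (/MD) so it
REM matches the DirectShow filter DLL -- untested on my machine, since the
REM /MD build of cudart has only been released for VS2008
nvcc -c mykernels.cu -Xcompiler "/MD" -o mykernels.obj
lib /OUT:mycudalib.lib mykernels.obj
```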
I checked the thread IDs: the thread where cudaSetDevice is called is the same as the thread launching the kernel, so the CUDA context should be fine.
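For anyone who wants to run the same check, I basically logged GetCurrentThreadId() in both places, along these lines (the function names are mine):

```cpp
#include <windows.h>
#include <cstdio>
#include <cuda_runtime.h>

static DWORD g_initThread = 0;

// called once from the filter's startup path
void initCuda()
{
    g_initThread = GetCurrentThreadId();
    cudaSetDevice(0);  // the context is tied to this thread
}

// called right before each kernel launch
void checkLaunchThread()
{
    DWORD launchThread = GetCurrentThreadId();
    printf("init thread %lu, launch thread %lu -> %s\n",
           g_initThread, launchThread,
           g_initThread == launchThread ? "same" : "DIFFERENT");
}
```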
I’m still looking into the problem and will update you guys if I find anything.