CUDA and DirectShow

Hello,

I’m trying to use CUDA from a DirectShow filter that I wrote, and I’m having trouble. I built a lib containing my CUDA functions and call it from my DirectShow filter DLL. The cudaMalloc and cudaMemcpy calls succeed, but when I launch my kernel, the cudaGetLastError() call that follows returns cudaErrorInvalidDeviceFunction. When I link the same lib into a regular exe, I have no problem. What could be the cause?
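A minimal sketch of that sequence, with every step checked, might look like the following. The kernel addOne and the helpers cudaOk and runAddOne are made-up names for illustration, not the actual filter code, and cudaDeviceSynchronize assumes a reasonably recent CUDA runtime.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints a labelled message when a CUDA runtime call
// fails and returns true when it succeeded.
static bool cudaOk(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical stand-in kernel: adds 1 to each element.
__global__ void addOne(int* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

// Allocate, copy in, launch, copy back, checking each step separately.
bool runAddOne(int* host, int n) {
    int* dev = nullptr;
    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    if (!cudaOk(cudaMalloc(&dev, bytes), "cudaMalloc")) return false;

    bool ok = cudaOk(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice),
                     "cudaMemcpy host to device");
    if (ok) {
        int threads = 256;
        int blocks = (n + threads - 1) / threads;
        addOne<<<blocks, threads>>>(dev, n);
        // Launch failures such as cudaErrorInvalidDeviceFunction show up here.
        ok = cudaOk(cudaGetLastError(), "kernel launch");
    }
    // Errors raised while the kernel runs show up at the next synchronizing call.
    if (ok) ok = cudaOk(cudaDeviceSynchronize(), "kernel execution");
    if (ok) ok = cudaOk(cudaMemcpy(host, dev, bytes, cudaMemcpyDeviceToHost),
                        "cudaMemcpy device to host");
    cudaFree(dev);
    return ok;
}
```

Checking the launch separately from the copies at least confirms that the failure comes from the launch itself and is not a leftover error from an earlier call.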

Thank you very much for your help.

I ran into the same problem… does anyone have some advice on that?

A wild guess: perhaps the DirectShow filter code that launches the kernel runs on a different thread from the code that does the memory allocation? That would give it a different context, which may be unable to access the device at that point (for some reason).
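One way to test that guess is to record which host thread makes the first CUDA call and compare it at launch time. This is only a sketch: CudaThreadGuard is a made-up helper, not part of CUDA or DirectShow, and it is plain host code that can sit in the same .cu file.

```cuda
#include <cstdio>
#include <thread>

// Hypothetical guard: remembers which host thread first touched CUDA and
// reports whether a later call comes from that same thread.
class CudaThreadGuard {
public:
    // Call right before the first CUDA call (e.g. cudaSetDevice or cudaMalloc).
    void recordOwner() { owner_ = std::this_thread::get_id(); }

    // Call right before each kernel launch.
    bool onOwnerThread(const char* where) const {
        bool same = (std::this_thread::get_id() == owner_);
        if (!same) {
            std::fprintf(stderr, "%s: not on the thread that did the allocation\n", where);
        }
        return same;
    }

private:
    std::thread::id owner_;
};
```

If onOwnerThread ever returns false inside the filter, the allocation and the launch really are happening on different threads.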

Sorry I couldn’t be more help. Coding DirectShow stuff is a bit of a specialty, so I don’t know how many other people on here will be able to answer that question with specific details. If you search the General GPU Computing forum, I believe that someone wrote a DirectShow filter for their webcam a little while back that used CUDA to do some transformations on the video. Perhaps you could take a look at that code and compare it to yours.

Had the same problem, but could not figure out the answer:

http://forums.nvidia.com/index.php?showtopic=86779&hl=

http://forums.nvidia.com/index.php?showtopic=93617&hl=

In my case, it is the same thread for both (I think).

I also PMed someone at NVIDIA and got this answer:

"I have no idea; I’ve used CUDA in very simple DShow filters with no problems. Sorry I can’t be more helpful. Only thing I can think of might be some name-mangling related to the runtime API or something like that."

I thought that maybe it would work with the new MD version of CUDA (my library is built as multi-threaded DLL), but I could not test it, since I only have VS2005 and the MD version of the library has only been released for VS2008.
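On the name-mangling idea from that reply, one thing that could be tried is to expose the kernel launch through an extern "C" function in the .cu file, so the DLL only links against an unmangled symbol. This is a sketch under that assumption and may well not be the cause of the error; invertPixels and launchInvertPixels are made-up names.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel standing in for the filter's real processing.
__global__ void invertPixels(unsigned char* pixels, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) pixels[i] = 255 - pixels[i];
}

// Hypothetical entry point, compiled by nvcc inside the .cu file.
// extern "C" gives it an unmangled name, so the filter's C++ code only needs
// this one declaration and never sees the <<<...>>> launch syntax.
extern "C" cudaError_t launchInvertPixels(unsigned char* devPixels, int n) {
    if (n <= 0) return cudaErrorInvalidValue;
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    invertPixels<<<blocks, threads>>>(devPixels, n);
    // Report launch failures to the caller instead of swallowing them.
    return cudaGetLastError();
}
```

The filter side would then declare extern "C" cudaError_t launchInvertPixels(unsigned char*, int); and call it with a pointer obtained from cudaMalloc.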

Hi,

Thanks for the advice!

I checked the thread IDs: the thread where cudaSetDevice is called is the same as the thread launching the kernel, so the CUDA context should be fine.

I’m still looking into the problem and will update you guys if I find anything.

Yilei