I need to process a large amount of data, and if I process all of it at once the program crashes. So I followed the stream management section in the Programming Guide to divide the data into several streams and process a small portion at a time. It works when the number of streams is small, but fails once it gets larger, say 8. :(
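Here is a minimal sketch of the kind of thing I am doing (my real kernel is more involved; process_chunk below is just a stand-in for it):

    // Sketch: split one big buffer into per-stream chunks, one stream per chunk.
    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void process_chunk(float *d, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) d[i] *= 2.0f;              // stand-in for the real work
    }

    int main(void)
    {
        const int nstreams = 8, chunk = 1 << 20;
        const int total = nstreams * chunk;
        float *h, *d;
        cudaMallocHost((void**)&h, total * sizeof(float));   // pinned memory so async copies can overlap
        cudaMalloc((void**)&d, total * sizeof(float));

        cudaStream_t streams[nstreams];
        for (int s = 0; s < nstreams; ++s) cudaStreamCreate(&streams[s]);

        for (int s = 0; s < nstreams; ++s) {
            float *hs = h + s * chunk, *ds = d + s * chunk;
            cudaMemcpyAsync(ds, hs, chunk * sizeof(float), cudaMemcpyHostToDevice, streams[s]);
            process_chunk<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(ds, chunk);
            cudaMemcpyAsync(hs, ds, chunk * sizeof(float), cudaMemcpyDeviceToHost, streams[s]);
        }
        cudaDeviceSynchronize();

        for (int s = 0; s < nstreams; ++s) cudaStreamDestroy(streams[s]);
        cudaFree(d); cudaFreeHost(h);
        return 0;
    }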
I believe what he’s trying to say is that if you increase the number of streams in the CUDA SDK streams example from 2 to 8, it behaves oddly. I’ve noticed this as well. In fact, I’ve yet to find a way to reliably use streams in real-world scenarios and always end up reverting to a custom queuing scheme. Further, cudaStreamQuery and cudaEventQuery are unreliable with more than 2 events, or in programs with more than one thread. I’ve submitted numerous reproduction examples to nVidia.
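To be concrete about the pattern I mean, here is a rough sketch of the per-stream event polling that keeps giving me trouble (the memset is just stand-in work, not my real code):

    // Sketch: record one event per stream, then poll cudaEventQuery until each completes.
    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        const int n = 8;
        cudaStream_t streams[n];
        cudaEvent_t  done[n];
        float *d_buf;
        cudaMalloc((void**)&d_buf, n * 1024 * sizeof(float));

        for (int i = 0; i < n; ++i) {
            cudaStreamCreate(&streams[i]);
            cudaEventCreate(&done[i]);
            cudaMemsetAsync(d_buf + i * 1024, 0, 1024 * sizeof(float), streams[i]);  // stand-in work
            cudaEventRecord(done[i], streams[i]);
        }

        for (int i = 0; i < n; ++i) {
            while (cudaEventQuery(done[i]) == cudaErrorNotReady)
                ;                              // busy-wait; cudaEventSynchronize would also do
            printf("stream %d finished\n", i);
        }

        for (int i = 0; i < n; ++i) { cudaEventDestroy(done[i]); cudaStreamDestroy(streams[i]); }
        cudaFree(d_buf);
        return 0;
    }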
Are you sure you are not trying to use more than 512 threads per block? I see no error-checking for the kernel call whatsoever, so my guess is you are trying to launch more than 512 threads per block.
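For reference, a bare-bones check around a launch looks something like this (a quick sketch I put together, not your code; dummy_kernel is just a stand-in):

    #include <cstdio>
    #include <cuda_runtime.h>

    __global__ void dummy_kernel(int *out)
    {
        out[threadIdx.x] = threadIdx.x;
    }

    int main(void)
    {
        int *d_out;
        cudaMalloc((void**)&d_out, 2048 * sizeof(int));

        // Deliberately exceed the per-block thread limit (512 on hardware of that era)
        // so the launch fails and the checks below report it.
        dummy_kernel<<<1, 2048>>>(d_out);

        cudaError_t err = cudaGetLastError();      // catches invalid launch configurations
        if (err != cudaSuccess)
            printf("launch error: %s\n", cudaGetErrorString(err));

        err = cudaDeviceSynchronize();             // catches errors raised while the kernel runs
        if (err != cudaSuccess)
            printf("execution error: %s\n", cudaGetErrorString(err));

        cudaFree(d_out);
        return 0;
    }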
Well, you cannot claim you have trouble with streams if, in fact, your kernel code is buggy. That is something you can check by skipping streams for a moment.
Getting this thread back on track with what the original poster was asking: when does nVidia plan to address the deficiencies in cudaStream_t and cudaEvent_t? The simplest reproduction case is to take the CUDA SDK stream example and set the number of streams to 8. From debugging the example, it seems to be data mismanagement or a race condition of sorts in the CUDA runtime, as the driver responds correctly to its (the runtime library’s) ioctl calls.
Obviously, fixing an issue of this nature requires modification to the CUDA runtime. Thus, taking the obvious course of action, I emailed two different contacts at nVidia (about 2 weeks prior to the original post in this thread) to report that I had found said deficiency, along with a reproduction case, a fairly detailed system call trace, and a fairly detailed step-by-step memory dump of the affected regions (of the runtime and the relevant driver variables). This course of action falls under the “Captain Obvious” realm of dealing with product support at any software company that has ever existed on the planet.
You have the official title of “CUDA Forums Captain Obvious”. You post completely obvious comments on everyone’s threads that have absolutely no value in resolving anything or aiding anyone. Further, you sidetrack legitimate threads with banter instead of taking it to a different thread, like this one.
When I first saw this thread, I saw that it lacked certain details, was unaddressed, and matched a bug I had encountered myself. Therefore, by posting, I thought the following could be accomplished:
Put another user at ease that the issue has been reported and that he’s not doing something wrong, as I’ve encountered this in almost every decently complex CUDA-enabled program.
Further, point out to the people I’ve submitted the bug to (they read these forums) that the priority placed on resolving this bug should perhaps be raised.
I am not sure what your problem is, but this thread started with a somewhat vague bug report. I pointed that out. Afterwards, someone else reported a bug (with code), whom I tried to help.
You may call it obvious, but that is all I can do. I am not someone who makes system call traces, memory dumps, and such; I would have to learn how first.
What I do is help when I can.
edit: I just re-read the thread and noticed you posted that you had submitted numerous bug reports. Sorry that I missed/forgot that point. My remark about the PM was therefore stupid.