I have the following question: I’m launching slightly fewer than 500 threads (493, to be precise) in a single block.
However, when I run in debug mode I get the error “cudaErrorLaunchOutOfResources” (msg: too many resources requested for launch).
I’m using a GeForce 9600GT, which should support 512 threads per block.
So my questions are:
Why am I already running out of resources?
Does it have something to do with the number of registers allocated per thread? If so, how can I find out how many registers my kernel currently uses?
(Note: my kernel code is fairly simple. It uses only global memory and performs some computations in a loop. I can post it if that helps your analysis.)
In emudebug mode everything works fine. Does this mode not check the launch configuration at all? (If the problem is register allocation, I suppose the emulator cannot know how many registers will be used on the actual device?)
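
For the register question, this is the kind of check I have in mind: a minimal sketch, assuming the runtime API calls `cudaFuncGetAttributes` and `cudaGetDeviceProperties` report the numbers I need. `myKernel` is a hypothetical stand-in for my actual kernel, and `fitsRegisterBudget` is a helper made up for illustration; it ignores register allocation granularity, so it is only a rough estimate. As far as I understand, compiling with `nvcc --ptxas-options=-v` should also print the per-kernel register count at build time.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the real kernel: global memory only, plus a loop.
__global__ void myKernel(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float v = data[i];
        for (int k = 0; k < 100; ++k)
            v = v * 0.5f + 1.0f;
        data[i] = v;
    }
}

// Rough estimate only (assumption: ignores allocation granularity):
// a block fits if threads * registers-per-thread stays within the
// register file available to one block.
bool fitsRegisterBudget(int threadsPerBlock, int regsPerThread, int regsPerBlock)
{
    return threadsPerBlock * regsPerThread <= regsPerBlock;
}

int main()
{
    const int threadsPerBlock = 493;  // the launch size from this post

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        printf("could not query device 0\n");
        return 1;
    }

    cudaFuncAttributes attr;
    if (cudaFuncGetAttributes(&attr, myKernel) != cudaSuccess) {
        printf("could not query kernel attributes\n");
        return 1;
    }

    printf("registers per thread:            %d\n", attr.numRegs);
    printf("registers per block (device):    %d\n", prop.regsPerBlock);
    printf("max threads per block (device):  %d\n", prop.maxThreadsPerBlock);
    printf("max threads per block (kernel):  %d\n", attr.maxThreadsPerBlock);
    printf("493-thread block fits (estimate): %s\n",
           fitsRegisterBudget(threadsPerBlock, attr.numRegs, prop.regsPerBlock)
               ? "yes" : "no");
    return 0;
}
```

If the device reports 8192 registers per block, this estimate would put the limit for 493 threads at 16 registers per thread (493 × 16 = 7888 fits, 493 × 17 = 8381 does not), which might explain the error even though the block is below 512 threads.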
Any help is appreciated,