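Perhaps it is an alignment problem.
Try using MemAllocPitch (cuMemAllocPitch in the driver API) for 2D memory.
As far as I can tell a MemAllocPitch3D does not exist in the driver API, though the runtime API does have cudaMalloc3D.
And try one of the 3D memcpy methods (cuMemcpy3D or so) later on…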
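To show what I mean by alignment, here is a rough host-only C sketch of how addressing in a pitched allocation works. It makes no CUDA calls. The 256-byte alignment and the names padded_pitch, offset_2d and offset_3d are made up by me for illustration; the real pitch is whatever cuMemAllocPitch hands back and may differ per device.

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMPTION: 256 bytes is only an example value. The real row
   alignment is chosen by the driver and reported through the pitch
   output of cuMemAllocPitch. */
#define ASSUMED_ALIGN 256u

/* Round a row width in bytes up to the next multiple of align. */
size_t padded_pitch(size_t width_bytes, size_t align)
{
    assert(align != 0);
    return (width_bytes + align - 1) / align * align;
}

/* Byte offset of element (x, y) in a pitched 2D allocation.
   Rows are `pitch` bytes apart, not width * elem_size bytes. */
size_t offset_2d(size_t x, size_t y, size_t pitch, size_t elem_size)
{
    return y * pitch + x * elem_size;
}

/* Byte offset of element (x, y, z) when a 3D volume is stored as
   z slices of `height` pitched rows stacked behind each other. */
size_t offset_3d(size_t x, size_t y, size_t z,
                 size_t pitch, size_t height, size_t elem_size)
{
    return (z * height + y) * pitch + x * elem_size;
}
```

With these assumed numbers a row of 100 floats (400 bytes) gets padded to 512 bytes. A kernel that steps through rows using 400 instead of the pitch would read the wrong data from the second row onwards, which is the kind of alignment bug I have in mind.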
Just guessing what could be wrong… it could also be corrupted memory on the graphics card.
Try other CUDA software/kernels and see if they have problems too… if they run fine, the problem is probably still in your code.
Soon I will be able to run kernels from my own API calls (in a different language).
Perhaps later I will try to run your kernel from my own API implementation.
I am too scared to run anybody's kernel in Visual Studio… too scared that it might mess something up ! =D
But running PTX code should be a bit safer ?
Then again… PTX is still text that gets compiled by the PTX compiler in nvcuda.dll or whatever… so if there are bugs in the compiler they could be exploited to do nasty stuff.
I think there might also be “binary versions” which are precompiled, less portable but perhaps safer… they can be loaded into device memory… like an image ?!?
How exactly it works I don’t know yet… I suspect there is still some string searching in there for the kernel entry point… or perhaps that is provided via the binary too.
Binary is interesting stuff.