My code has been running on CUDA 3.0 and the 195 drivers pretty much since their respective first beta versions were released, without any problems.
Today I got my first GTX480 cards and my code dies with an unspecified launch failure. I looked in the programming guide hoping to find “Things to keep in mind when migrating from GTX2xx -> GTX4xx”, but didn’t find anything interesting.
Has something similar hit anyone else?
When one gets freed up, I’m gonna take a 295 and put it in the new computer, to rule out the other hardware.
So I’m not using all shared memory. Getting close on the registers though.
Not sure about the other numbers. Am I using all my constant memory? There are an awful lot of parameters to the kernel, so I don’t rule that out.
Also, did you include the PTX version in the exe file?
I’m not sure what that means?
Out of bounds shared memory accesses will cause a ULF on 480 but not 295.
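To illustrate the point about out-of-bounds shared memory accesses, here is a minimal sketch of a bounds-guarded kernel together with host-side checks that report where a failure surfaces. The names guardedFill, launchGuardedFill and runGuardedFill are made up for this example and are not from the code under discussion. cudaDeviceSynchronize is the current runtime call; on CUDA 3.0 the equivalent was cudaThreadSynchronize.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: each thread writes one slot of a dynamic shared array.
// The guard keeps every index inside the `count` floats that were requested,
// even when the block has more threads than array elements.
__global__ void guardedFill(float *out, int count)
{
    extern __shared__ float tile[];
    int i = threadIdx.x;
    if (i < count) {
        tile[i] = 2.0f * i;
        out[i] = tile[i];
    }
}

// Returns the first error seen, either from the launch itself or from
// the kernel's execution on the device.
cudaError_t launchGuardedFill(float *d_out, int count, int threads)
{
    guardedFill<<<1, threads, count * sizeof(float)>>>(d_out, count);
    cudaError_t err = cudaGetLastError();   // invalid launch configuration
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // faults raised while the kernel ran
}

// Allocates the output buffer, runs the kernel and reports any error.
bool runGuardedFill(int count, int threads)
{
    float *d_out = NULL;
    if (cudaMalloc((void **)&d_out, count * sizeof(float)) != cudaSuccess)
        return false;
    cudaError_t err = launchGuardedFill(d_out, count, threads);
    if (err != cudaSuccess)
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    cudaFree(d_out);
    return err == cudaSuccess;
}
```

Calling runGuardedFill(32, 64) launches more threads than there are shared slots; without the `i < count` guard, threads 32 to 63 would index past the end of the shared array.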
I’m gonna guess this is it then. I’ll be testing some more to verify things come out right, but if I remove the one code path which uses a lot of shared memory, and replace it with non-shared, the ULF goes away.
In the general kernel, each array(0,1,) will be N elements long, but even this degenerate case with a single float, float, float4 causes ULFs. And yes, I triple checked that the correct value is being set for shared memory in the kernel launch configuration.
I find that if I rearrange the order of the arrays to float4, float, float, the ULF goes away!
Is there some new memory alignment rule for vector loads from shared memory? I don’t recall seeing that in the guide. For that matter, I am following the guide’s recommendation to the letter for assigning dynamically allocated shared memory arrays, except with floats and float4s instead of shorts, floats and ints.
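One possible explanation, offered as an assumption and not a confirmed diagnosis: float4 has a 16-byte alignment requirement, and with the order float, float, float4 and N = 1 the float4 would start at byte offset 8 of the shared buffer. The sketch below shows the float4-first layout that made the failure go away, plus a small helper for padding an offset when the original order has to be kept. The names alignUp, layoutDemo and runLayoutDemo are hypothetical and not taken from the actual kernel.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Rounds `offset` up to the next multiple of `alignment` (in bytes).
__host__ __device__ inline size_t alignUp(size_t offset, size_t alignment)
{
    return (offset + alignment - 1) / alignment * alignment;
}

// Hypothetical kernel: carves three arrays of n elements each out of one
// dynamic shared buffer. The float4 array goes first so that it starts at
// the base of the buffer, which is declared as float4 and thus 16-byte aligned.
__global__ void layoutDemo(float4 *out, int n)
{
    extern __shared__ float4 smem[];

    float4 *vec = smem;                                // byte offset 0
    float  *a   = reinterpret_cast<float *>(vec + n);  // byte offset 16 * n
    float  *b   = a + n;                               // byte offset 20 * n

    int i = threadIdx.x;
    if (i < n) {
        a[i] = 1.0f;
        b[i] = 2.0f;
        vec[i] = make_float4(a[i], b[i], 3.0f, 4.0f);  // aligned vector store
        out[i] = vec[i];                               // aligned vector load
    }
}

// Runs the kernel with one block of n threads and checks the first result.
bool runLayoutDemo(int n)
{
    float4 *d_out = NULL;
    if (cudaMalloc((void **)&d_out, n * sizeof(float4)) != cudaSuccess)
        return false;

    size_t smemBytes = n * (sizeof(float4) + 2 * sizeof(float));
    layoutDemo<<<1, n, smemBytes>>>(d_out, n);
    cudaError_t err = cudaDeviceSynchronize();

    float4 h = make_float4(0.0f, 0.0f, 0.0f, 0.0f);
    if (err == cudaSuccess)
        err = cudaMemcpy(&h, d_out, sizeof(float4), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    return err == cudaSuccess &&
           h.x == 1.0f && h.y == 2.0f && h.z == 3.0f && h.w == 4.0f;
}
```

If the float, float, float4 order has to stay, the float4 array could instead start at alignUp(2 * n * sizeof(float), 16) bytes from the base, with the shared memory size in the launch configuration increased to cover the padding.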