I have a small test program that fails on large datasets, but works on smaller ones.
Actually I have two versions: a fast one that “works” on all sizes (it takes ~1.4 seconds on the big case),
and a slow one that uses only global memory. The slow version works on the smaller cases, but on the big one the program ends after ~11 s with no error message and the output array is all zeros. It looks like all the threads were ‘killed’ before they finished.
I am aware of the 5 s time limit on Windows, but what about Linux? I found nothing about it in the FAQ.
I’m using Red Hat 4 Update 4 with CUDA 1.0.
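
For reference, here is a minimal sketch of the kind of check that should turn a silent failure like this into an error message. It is not my actual program: `slowKernel`, the `check` helper and the sizes are hypothetical stand-ins. It also assumes a recent toolkit, where the synchronizing call is `cudaDeviceSynchronize`; as far as I know the equivalent in CUDA 1.0 is `cudaThreadSynchronize`. If a watchdog is stopping the kernel, I would expect the synchronize step to report a launch timeout instead of returning quietly.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical stand-in for the slow, global-memory-only kernel.
__global__ void slowKernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = 1.0f;
}

// Hypothetical helper: print a message and return false on any CUDA error.
static bool check(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main()
{
    const int n = 1 << 20;  // illustrative size only
    float *d_out = NULL;

    if (!check(cudaMalloc((void **)&d_out, n * sizeof(float)), "cudaMalloc"))
        return 1;
    if (!check(cudaMemset(d_out, 0, n * sizeof(float)), "cudaMemset"))
        return 1;

    slowKernel<<<(n + 255) / 256, 256>>>(d_out, n);

    // Errors in the launch itself (bad grid/block configuration, etc.).
    if (!check(cudaGetLastError(), "kernel launch"))
        return 1;

    // Kernel launches are asynchronous: errors raised while the kernel runs
    // only become visible once the host waits for it to finish.
    // (On CUDA 1.0, assume cudaThreadSynchronize() in place of this call.)
    if (!check(cudaDeviceSynchronize(), "kernel execution"))
        return 1;

    cudaFree(d_out);
    printf("kernel completed\n");
    return 0;
}
```

Without the two checks after the launch, a later `cudaMemcpy` back to the host would also fail, and if its return value is ignored the host buffer simply keeps whatever it was initialized to. That would be consistent with the all-zero output I am seeing.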