Why does my k-means example fail for large datasets?

I have written a k-means example that works as expected for small datasets but produces a segmentation fault for large ones (and by "large" I mean less than 100 MB).
I’m attaching my code.
It works for, say, 1024 elements and 4 clusters, but fails for 1024*1024 elements and 1024*1024/256 clusters.
Even allowing for memory leaks, the dataset is pretty small for a 1 GB card.
I generated the two input files in MATLAB with a=rand(1024*1024,1) and b=rand(1024*1024,1), but I cannot attach them because they are 20 MB each.
I have a GTS 450 card and have tried compiling both with and without -arch=compute_20.

my_k-means_map_tid_reduce_no_atomic_large.cu (7.81 KB)

The problems in that code have nothing to do with CUDA. Don't statically declare those large host-side data arrays and you might find it works.

You were right.
I replaced the static declarations with malloc and no longer get a segmentation fault, although I have another problem.
As the dataset gets bigger, my example stops calling the reduce function.
For example, with NUMBER_OF_ELEMENTS=17 everything works, but if I set NUMBER_OF_ELEMENTS=1024*1024 the program doesn't execute the reduce phase and classifies every element into cluster -1.
I'm attaching the new file.
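A symptom like this (the output labels keeping their -1 initialization) is consistent with a kernel that silently fails to launch or run, since kernel launches don't report errors on their own. A hedged sketch of the usual check, with hypothetical launch parameters and argument names:

```cuda
// If the launch is invalid, the kernel simply never runs and the
// output buffer keeps whatever it was initialized with (e.g. -1).
map<<<256, 256>>>(d_points, d_centroids, d_labels, n, k);

cudaError_t err = cudaGetLastError();        // launch-time errors
if (err != cudaSuccess)
    fprintf(stderr, "map launch: %s\n", cudaGetErrorString(err));

err = cudaDeviceSynchronize();               // execution-time errors
if (err != cudaSuccess)
    fprintf(stderr, "map run: %s\n", cudaGetErrorString(err));
```

Checking after every launch and every `cudaMemcpy` usually pinpoints where a large-dataset run first goes wrong.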

my_k-means_map_tid_reduce_no_atomic_large.cu (8.08 KB)

I think I found the problem with the larger dataset. In my main I'm calling map with 256 threads and 256 blocks, which past a certain point is not enough to cover all elements.
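For what it's worth, 256 blocks of 256 threads cover only 256*256 = 65536 indices, so with 1024*1024 elements most of the array is never touched in a plain one-thread-per-element scheme. One common fix is a grid-stride loop, which lets a fixed launch configuration cover any N. The kernel below is an illustrative sketch for 1-D points (all names are assumptions, not the poster's actual code):

```cuda
// Each thread handles indices i, i+stride, i+2*stride, ... so the
// kernel is correct for any n, regardless of the launch configuration.
__global__ void map_kernel(const float *points, const float *centroids,
                           int *labels, int n, int k)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride) {
        int best = 0;
        float bestd = fabsf(points[i] - centroids[0]);
        for (int c = 1; c < k; ++c) {
            float d = fabsf(points[i] - centroids[c]);
            if (d < bestd) { bestd = d; best = c; }
        }
        labels[i] = best;   // every element gets a real cluster, never -1
    }
}
```

The alternative is to size the grid to the data, e.g. launch `(n + 255) / 256` blocks of 256 threads, but the grid-stride form also stays within the 65535-blocks-per-dimension limit of compute capability 2.x cards like the GTS 450.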

Can anyone tell me why my k-means map function doesn't work as it should for large datasets?
I even modified it to look exactly like the one in the attached publication and it still fails.
I'm attaching a variant that uses Thrust for the reduction stage.
Thanks a lot.
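For reference, the usual Thrust pattern for the k-means reduction stage is sort-by-label followed by `reduce_by_key`. This is a generic sketch under assumed names, not the attached file's code: it produces per-cluster sums for 1-D points, which divided by the per-cluster counts give the new centroids.

```cuda
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/reduce.h>

// labels[i] is the cluster assigned to points[i] by the map phase.
// After the call, out_labels/out_sums hold one entry per non-empty cluster.
void cluster_sums(thrust::device_vector<int>& labels,
                  thrust::device_vector<float>& points,
                  thrust::device_vector<int>& out_labels,
                  thrust::device_vector<float>& out_sums)
{
    // Sort points by label so equal labels are contiguous...
    thrust::sort_by_key(labels.begin(), labels.end(), points.begin());
    // ...then sum each contiguous run of equal labels in one pass.
    thrust::reduce_by_key(labels.begin(), labels.end(), points.begin(),
                          out_labels.begin(), out_sums.begin());
}
```

Note that `sort_by_key` permutes both vectors, so this should run on copies if the original element order matters later.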

PDPTA08-farivar2.pdf (180 KB)

my_k-means_map_tid.cu (9.27 KB)