Marching cubes in the CUDA toolkit

The marching cubes example in the CUDA toolkit says it uses one thread per voxel.
But the volume is very small (32 x 32 x 32 voxels), and the number of threads used is 128.
I am confused. Could someone please explain how the threads are assigned to the voxels?
If I use 512 x 512 x 512 voxels, how many voxels can I pass to each thread, given that the maximum number of threads is 1024?
If anyone has implemented marching cubes in CUDA with a larger number of voxels, could you please advise me or share some code, if that is not asking too much?
Many thanks!