Output to stdout is buffered. When stdout is a terminal it is usually line-buffered, so nothing appears on screen until the buffer fills up, you print a newline, or you call fflush(stdout). You can write to stderr instead, which is unbuffered, or turn buffering off with setbuf(stdout, NULL).
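As a rough sketch of those options in plain C (the helper names print_now, make_stdout_unbuffered and debug_now are made up for illustration; only fputs, fflush and setbuf come from the standard library):

```c
#include <stdio.h>

/* Hypothetical helper: write a message without a trailing newline and
   flush stdout right away so it shows up immediately.
   Returns 0 on success, -1 on failure. */
int print_now(const char *msg)
{
    if (fputs(msg, stdout) == EOF)
        return -1;
    return fflush(stdout) == 0 ? 0 : -1;
}

/* Hypothetical helper: switch stdout to unbuffered mode.
   Call it once, before anything has been written to stdout. */
void make_stdout_unbuffered(void)
{
    setbuf(stdout, NULL);
}

/* Hypothetical helper: stderr is unbuffered by default, so a plain
   write is enough. Returns 0 on success, -1 on failure. */
int debug_now(const char *msg)
{
    return fputs(msg, stderr) == EOF ? -1 : 0;
}
```

Any one of the three should be enough on its own: either flush after each print, make stdout unbuffered once at the top of main(), or send the debug output to stderr.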
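Is the printf() running on the GPU? If so, then as far as I remember its output goes into a device-side buffer that is only flushed to the host at synchronization points (for example cudaDeviceSynchronize()), so nothing shows up while the kernel is still running.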
If not, remember that kernel launches are asynchronous with respect to the host (i.e. the launch returns immediately and the host code continues while the GPU is still running), so you would need some kind of synchronization, for instance cudaDeviceSynchronize(), before the printf.