Problems with printf() in kernel

Hey there,
I have Win7 x64 with 32-bit CUDA 3.2 and the ForceWare driver… I’m using a GTX 460, and whenever I use printf() in my kernel and compile with -arch=sm_20, I get the error: "error: calling a host function from a device/global function is not allowed"

Just can’t get it to work :(
Any ideas?
Thanks!

Do you #include <stdio.h> at the top of the source file?
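For reference, here is a minimal sketch of a kernel that calls printf(). The kernel name helloKernel and the launch configuration are made up for illustration, and it assumes a compute capability 2.0 or higher card built with -arch=sm_20 as in your post. I believe cudaDeviceSynchronize() only arrived with CUDA 4.0, so on 3.2 you would probably need cudaThreadSynchronize() in its place.

```cuda
#include <stdio.h>          // needed so printf is declared, as asked above
#include <cuda_runtime.h>

// Hypothetical kernel: each thread prints its own block and thread index.
__global__ void helloKernel()
{
    printf("Hello from block %d, thread %d\n", blockIdx.x, threadIdx.x);
}

int main()
{
    // Launch one block of four threads (arbitrary illustrative sizes).
    helloKernel<<<1, 4>>>();

    // Device-side printf output is buffered and only shows up on the host
    // at certain points, such as an explicit synchronization.
    // Assumption: on CUDA 3.2, use cudaThreadSynchronize() here instead.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Saved as, say, hello.cu, this should build with something like: nvcc -arch=sm_20 hello.cu -o hello. If the synchronize call is left out, the program may exit before any of the kernel's output is flushed.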

Oh no, I didn’t… Thanks for that. Another question: is it possible to use printf() when the CUDA code lives in a MEX file (a MATLAB executable called from within MATLAB)?