Can I print-to-file from a kernel?

There is only a device-side printf(); there is no device-side fprintf(). Device-side printf() works by depositing data into a buffer that is copied back to the host and written to stdout there. Note that the buffer can overflow if a kernel produces a lot of output; it is circular, so older output is overwritten and lost. Programmers can select a size different from the default (I seem to recall it is 1 MB) by calling:

cudaDeviceSetLimit (cudaLimitPrintfFifoSize, size_t size)
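To make the call above concrete, here is a minimal sketch that queries the current printf buffer size, requests a larger one, and reads back what the runtime actually set. The helper set_printf_fifo_size(), the kernel chatty_kernel, and the 8 MB figure are my own placeholders, not anything from the CUDA API. As far as I know, the limit has to be set before the first kernel that uses printf() is launched, and the driver may adjust the requested value, which is why the sketch reads it back with cudaDeviceGetLimit().

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of the CUDA API): requests a new printf
// buffer size and returns the size the runtime reports afterwards,
// or 0 if either call fails.
static size_t set_printf_fifo_size (size_t bytes)
{
    if (cudaDeviceSetLimit (cudaLimitPrintfFifoSize, bytes) != cudaSuccess) {
        return 0;
    }
    size_t actual = 0;
    if (cudaDeviceGetLimit (&actual, cudaLimitPrintfFifoSize) != cudaSuccess) {
        return 0;
    }
    return actual;
}

// Placeholder kernel that produces one line of output per thread.
__global__ void chatty_kernel (void)
{
    printf ("block %d thread %d\n", blockIdx.x, threadIdx.x);
}

int main ()
{
    size_t before = 0;
    cudaDeviceGetLimit (&before, cudaLimitPrintfFifoSize);
    printf ("printf buffer size before: %zu bytes\n", before);

    // Set the limit before launching any kernel that calls printf().
    // 8 MB is an arbitrary value chosen for illustration.
    size_t after = set_printf_fifo_size (8u * 1024u * 1024u);
    printf ("printf buffer size after:  %zu bytes\n", after);

    chatty_kernel<<<64,256>>>();
    cudaDeviceSynchronize();   // device-side output is flushed to stdout here
    return EXIT_SUCCESS;
}
```

If the kernel's output still looks truncated, the buffer is probably still too small for the amount of text produced, so either raise the limit further or print from fewer threads.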

On the host side, one can redirect the stdout stream using the standard freopen() function from <cstdio>. A simple example:

#include <cstdio>
#include <cstdlib>

__global__ void kernel1 (void)
{
    printf ("Written by kernel 1\n");
}
__global__ void kernel2 (void)
{
    printf ("Written by kernel 2\n");
}

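int main ()
{
    // freopen() closes the old stream itself; calling fclose() first would
    // leave stdout invalid, and using it afterwards is undefined behavior
    if (!freopen ("kernel1_output.txt", "w", stdout)) {
        perror ("kernel1_output.txt");
        return EXIT_FAILURE;
    }
    kernel1<<<1,1>>>();
    cudaDeviceSynchronize();  // device printf buffer is flushed to stdout here
    fflush (stdout);
    if (!freopen ("kernel2_output.txt", "w", stdout)) {
        perror ("kernel2_output.txt");
        return EXIT_FAILURE;
    }
    kernel2<<<1,1>>>();
    cudaDeviceSynchronize();
    fflush (stdout);
    fclose (stdout);
    return EXIT_SUCCESS;
}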

After running this program, and assuming permissions and disk space allowed the files to be written, there should be two files in the current directory, one containing the output from kernel1 and the other the output from kernel2.