I have a task that runs a number of kernels in a loop, which may execute anywhere from 1 to 64k times and which sporadically produces 8-byte results: roughly 260 occurrences per 2.5 seconds, with one loop taking 11.8 ms, in the current test setup on a single GTX 1060.
Because the output is so variable and each result is so small, my thought was to obtain the results via a printf("%16llx", …) in the last kernel.
Is this a realistic and efficient option?
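One detail worth checking before building a reader around this format: "%16llx" renders the value as text, so each result should occupy 16 characters on the stream rather than 8 raw bytes. The following is a minimal host-only sketch of that round trip, not device code; the helper names format_result and parse_result are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Render one 64-bit result as text the way printf("%16llx", ...) does.
   Returns the number of characters produced (not counting the NUL). */
int format_result(unsigned long long value, char *text, size_t size)
{
    int n = snprintf(text, size, "%16llx", value);
    assert(n > 0 && (size_t)n < size);   /* caller must supply a large enough buffer */
    return n;
}

/* Recover the value from the text. strtoull skips the leading padding
   spaces that the field width of 16 adds for short values. */
unsigned long long parse_result(const char *text)
{
    return strtoull(text, NULL, 16);
}
```

With this format a reader that consumes 8 bytes at a time would see only half of each record, so the record length on the reading side would need to be 16 characters, followed by a conversion such as strtoull.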
Due to a lack of both CUDA and C coding experience, I am having difficulty capturing the output stream for further processing, which entails loading an eight-byte buffer, testing it, and repeating for as long as results are arriving:
for (i = start; i < end; i++) {
    kernel1<<<BLOCKS, THREADS, 0, 0>>>();
    kernel2<<<BLOCKS, THREADS, 0, 0>>>();
    kernel3<<<BLOCKS, THREADS, 0, 0>>>();
    kernel4<<<BLOCKS, THREADS, 0, 0>>>();
}
int fd = 0;        // dup() returns an int, and -1 on failure
uint8_t buf[8];
fd = dup(1);       // dup stdout and close it to prevent screen clutter.
close(1);
if (fd >= 0) {
    while ((read(fd, buf, 8)) == 8) {
        // Test results in buf
    }
    exit(EXIT_SUCCESS);
} else {
..........
From this point on, messages are sent via fprintf(stderr, …). Unfortunately, testing so far, printing to both file and screen, indicates that nothing is getting into buf, with or without close(1).
Looking at the man page for read():
“On files that support seeking, the read operation commences at the file
offset, and the file offset is incremented by the number of bytes read.
If the file offset is at or past the end of file, no bytes are read,
and read() returns zero.”
Is read() not appropriate here? (Is seeking possible?)
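For context on why buf stays empty: dup(1) only yields a second descriptor for the same destination (typically the terminal), so reading from it does not return what the program itself wrote. One common way to capture a process's own stdout is to point descriptor 1 at a pipe and read from the other end. Below is a minimal host-only sketch under stated assumptions: an ordinary printf stands in for the device-side printf, each record is 16 hex characters as produced by "%16llx", and the function names are hypothetical. In the real program, device printf output is only delivered to stdout when the device buffer is flushed, for example at a cudaDeviceSynchronize() call, so a synchronization point would be needed before reading.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define RECORD_LEN 16   /* "%16llx" yields 16 characters per result */

/* Redirect stdout into a fresh pipe. Returns the read end of the pipe and
   stores a duplicate of the original stdout in *saved_out; -1 on failure. */
int capture_stdout_begin(int *saved_out)
{
    int fds[2];
    fflush(stdout);                       /* do not capture older buffered text */
    if (pipe(fds) != 0)
        return -1;
    *saved_out = dup(STDOUT_FILENO);
    if (*saved_out < 0 || dup2(fds[1], STDOUT_FILENO) < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);                        /* descriptor 1 now is the write end */
    return fds[0];
}

/* Restore the original stdout. This closes the last write end of the pipe,
   so the reader sees end-of-file once the buffered data is consumed. */
void capture_stdout_end(int saved_out)
{
    fflush(stdout);                       /* push pending text into the pipe */
    dup2(saved_out, STDOUT_FILENO);
    close(saved_out);
}

/* Read fixed-width hex records from fd until EOF, error, or max records.
   Returns the number of values stored in out. */
size_t read_results(int fd, unsigned long long *out, size_t max)
{
    char rec[RECORD_LEN + 1];
    size_t count = 0;
    while (count < max) {
        size_t got = 0;
        while (got < RECORD_LEN) {        /* read() may return short counts */
            ssize_t n = read(fd, rec + got, RECORD_LEN - got);
            if (n <= 0)
                return count;             /* EOF or error ends the loop */
            got += (size_t)n;
        }
        rec[RECORD_LEN] = '\0';
        out[count++] = strtoull(rec, NULL, 16);
    }
    return count;
}
```

A caller would invoke capture_stdout_begin(), run the code that prints, call capture_stdout_end(), and then pass the returned descriptor to read_results(). A pipe has finite capacity (commonly 64 KiB on Linux), so in a single-threaded program it would need to be drained regularly, or read from a separate thread or process, to keep the writer from blocking.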
I have spent a lot of time reading over the last few days, so any help is very much appreciated. I am certainly not expecting a fully worked solution, just a few pointers (pun not intended) and, indeed, advice on whether this is even the right approach.
Kind regards.