Weird behaviour with cuPrintf, need help

I’m having a problem with cuPrintf and I can’t work out what is causing it. Scaled down so that only the relevant parts are left, my code looks like this (to be clear, this is also exactly the code I’m running while testing):

__global__ static void testKernel(float* program, struct Result* result, struct BarWrapper* weekData, struct settings* Settings,
		struct wfSettings* wfSet, int nrLinesSettingsFile) {
	// Dynamic shared memory; its size is given as the third launch parameter.
	extern __shared__ int array[];
	// Flatten the 2D block coordinates into a single block index.
	int blockIndex = blockIdx.x + gridDim.x*blockIdx.y;
	int* optValues = (int*)array;

	if(threadIdx.x==0){
		// Decode blockIndex into one value per settings row.
		int cumultVal = 1;
		for(int i=0;i<nrLinesSettingsFile;i++){
			optValues[i] = wfSet[i].start + ((blockIndex/cumultVal)%wfSet[i].amount)*wfSet[i].step;
			cumultVal *= wfSet[i].amount;
		}
		cuPrintf("%d %d %d  \n", optValues[0],optValues[1],optValues[2]);
		a++; // note: `a` is not declared anywhere in this excerpt
	}
}

I’m passing quite a few parameters to my kernel, but they are for other functionality that is not used at the moment while I’m trying to find out what is happening with cuPrintf. The only one of interest now is struct wfSettings* wfSet, a struct which looks like this:

struct wfSettings{
	int start;
	int stop;
	int step;
	int amount;
};

What I want to do here is find all possible combinations of some variables that are read from a file, so that each block uses one particular variable setup. I read these settings from a file and send them to the kernel. Now to the weird part.

Case 1: If my settings file looks like this

5 9 1
1 3 1
5 8 1

there are 60 possible combinations (5 x 3 x 4). The first column is the start value, the second is the end value and the third is the step value. So the first row can take the values 5,6,7,8,9, the second row 1,2,3, and so on. The number of blocks I launch the kernel with is based on how many combinations there are. When I run the program with this settings file I get this output:

[0, 0]: 5 1 5
[1, 0]: 6 1 5
[2, 0]: 7 1 5

and so on, listing all possible combinations. So far everything works as I expected. But now I make a slight change to my settings file so that it looks like this:

5 9 1
1 3 1
4 8 1

I only changed the start value on the third row from 5 to 4, so now there are 75 possible combinations (5 x 3 x 5). But now I get no output from cuPrintf. The program does not crash, and it seems to produce correct results if I do some trivial calculation in the kernel and copy it back, but for some reason I get no output from cuPrintf.
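For reference, the indexing in the kernel treats the block index as a mixed-radix number with one digit per settings row. Below is a minimal host-side sketch of the same arithmetic in plain C++, so it can be checked without a GPU. The helper names makeSetting, countCombinations and decodeCombination are hypothetical, and it assumes that amount is computed as (stop - start)/step + 1, which the post does not show but which matches the 60 and 75 combinations mentioned above.

```cpp
#include <cstddef>
#include <vector>

// Mirrors the wfSettings struct from the post.
struct wfSettings {
    int start;
    int stop;
    int step;
    int amount;  // assumption: number of values in [start, stop] for this step
};

// Hypothetical helper: build one entry from a "start stop step" row.
wfSettings makeSetting(int start, int stop, int step) {
    return {start, stop, step, (stop - start) / step + 1};
}

// Total number of combinations is the product of all amounts.
int countCombinations(const std::vector<wfSettings>& settings) {
    int total = 1;
    for (const wfSettings& w : settings) total *= w.amount;
    return total;
}

// Same arithmetic as the loop in testKernel: row i is the i-th "digit"
// of blockIndex, with radix settings[i].amount.
std::vector<int> decodeCombination(const std::vector<wfSettings>& settings,
                                   int blockIndex) {
    std::vector<int> values(settings.size());
    int cumultVal = 1;
    for (std::size_t i = 0; i < settings.size(); ++i) {
        values[i] = settings[i].start +
                    ((blockIndex / cumultVal) % settings[i].amount) * settings[i].step;
        cumultVal *= settings[i].amount;
    }
    return values;
}
```

With the case 1 file, block indices 0, 1 and 2 decode to 5 1 5, 6 1 5 and 7 1 5, matching the output listed above. The decoding itself is valid for the second file too, which is consistent with the problem being on the cuPrintf side rather than in this logic.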

I launch my kernel like this:

testKernel<<<blockDimensions, 32, nrLinesInFile*sizeof(int)>>>(kernel_program, kernel_result, kernel_wrapper_bars, SettingsKernel, wfSettingsArrayKernel, nrLinesInFile);

nrLinesInFile (as the name implies) is just the number of rows in the settings file, so in this case it is 3. blockDimensions is 30x2x1 in the case where cuPrintf works and 25x3x1 in the case where it does not. I calculate suitable dimensions based on how many combinations there are. I will need this later because there are limits on how big the grid can be in each direction (65535 x 65535 x 1), so if I have, say, 100000 combinations I can’t use only the x-direction.
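One possible way to pick grid dimensions under that 65535-per-direction limit is sketched below in plain host-side C++. This is not how the 30x2 and 25x3 grids above were chosen (that calculation is not shown); the GridDims struct and the chooseGrid and flatBlockIndex helpers are hypothetical names used only for illustration.

```cpp
#include <algorithm>

// Hypothetical holder for the x and y grid dimensions (z stays 1).
struct GridDims {
    unsigned x;
    unsigned y;
};

// Fill the x-direction first, then add as many rows in y as needed.
// The result may contain more blocks than combinations, so the kernel
// would need a guard such as: if (blockIndex >= nrCombinations) return;
// The caller should also check that y does not exceed maxDim.
GridDims chooseGrid(unsigned combinations, unsigned maxDim = 65535u) {
    if (combinations == 0) return {1u, 1u};      // nothing to do; one guarded block
    unsigned x = std::min(combinations, maxDim);
    unsigned y = (combinations - 1u) / x + 1u;   // ceil(combinations / x)
    return {x, y};
}

// Same flattening as in the kernel: blockIdx.x + gridDim.x * blockIdx.y.
unsigned flatBlockIndex(unsigned bx, unsigned by, unsigned gridX) {
    return bx + gridX * by;
}
```

For 100000 combinations this gives a 65535 x 2 grid, i.e. 131070 blocks, so the surplus blocks have to exit early via the guard mentioned in the comment.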

Does anyone have an idea what the problem might be? Is there some buffer size I’m exceeding, or something else that stops cuPrintf from producing any output? In other programs I have printed quite a lot of lines, if I recall correctly, and in this case there are not very many. Do I have some obvious problem in my logic? Even if I put a cuPrintf statement somewhere else in the code and just write something like cuPrintf("test %d", threadIdx.x), I get no output from that either.

Some new input on another weird situation. In my kernel code I have a for-loop (for debugging) that uses cuPrintf to print some stuff, like this:

for (int j = 0; j < blockIndex; j++)
	cuPrintf("program %d %f %d \n",j, sh_progs[blockIndex], blockIndex);

That gives no output at all. But if I add a useless cuPrintf statement before the for-loop, like cuPrintf("hello \n");, then that gets written AND the output from the loop also gets written!?

However, the output from the loop is only from the last iteration, for some unknown reason; it seems a particular thread can only output one row. Everything seems very unreliable and behaves weirdly.

Is there some memory issue with cuPrintf that I’m not familiar with? It feels that way.

It turned out my problem was related to my Makefile…

I had forgotten to include -arch=sm_13 (in my case), so the code was probably compiled in some “older” way, which I guess led to the problems.