Unexpected results

I am currently running into two problems. The first is that when I build the executable with --ptxas-options=-v I get the following output:

ptxas info: 0 bytes lmem, 3300 bytes sm, -1073746256 cm, 17 reg

The cm value is the part that really doesn’t seem right. Does anyone know if this is at all normal, and if not, what could be wrong?
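(For reference, that report comes from a compile command along these lines; the file name here is just a placeholder, not the actual project file.)

nvcc --ptxas-options=-v -c matrix_invert.cu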

The other problem, which I believe could be linked to the first, is that I am trying to calculate multiple matrix inverses at the same time. I can successfully run the program with 11 16x16 32-bit float matrices. If I add a 12th, the result for just that matrix is only accurate to about 3 decimal places. If I add a 13th, all of the inverse values are incorrect, mostly NaN and large negative numbers. Does anyone know what could cause this behavior, and whether the two issues are linked?

P.S. I realized after posting this that I posted it in the wrong section of the forums. I guess that is what happens when you average 3 hours of sleep a night. Future programming questions will go in the right section, but I could still really use help on this issue.

I can’t say anything about the cm issue except that it looks like a signed/unsigned overflow.
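Purely as an illustration of that guess (this has nothing to do with what ptxas actually does internally): an unsigned byte count that exceeds INT_MAX comes out as a large negative number when it is printed as a signed int. On a typical two's-complement platform the arbitrary value below prints exactly the -1073746256 from the report above:

#include <stdio.h>

int main(void)
{
    /* An arbitrary unsigned count that happens to exceed INT_MAX. */
    unsigned int bytes = 3221221040u;

    /* On typical two's-complement platforms this prints -1073746256,
       the same kind of large negative number as in the ptxas line. */
    printf("%d\n", (int)bytes);
    return 0;
}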

As for your second question, please give a little more detail on how you process the matrices: what the threads are doing, how many threads you are launching… Is there any difference in the execution configuration between 11 and 12 matrices?

One more piece of information on the first issue: I write nothing to constant memory, which I think supports the overflow theory, but I don’t know where it is coming from.

More info on the second issue: I am launching 128 threads per block. Each block loads one matrix into shared memory based on its block index, then uses a pivoting algorithm to invert the matrix and copies the result back over the matrix it loaded from. When I run 11 matrices I have 11 blocks in the grid; with 12 matrices, 12 blocks. The number of threads per block stays the same, as does the rest of the configuration, except for the extra matrix in global memory. My end goal is to get 32 of these running, and I was slowly stepping the load up when I hit the problem.
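To make that concrete, a rough sketch of that scheme is below. This is not the actual code from the program in question; the kernel name, the row-major layout, and the assumption that the matrices are stored back to back in global memory are all made up for illustration, and there is no guard against a singular matrix.

#define N 16   // matrix dimension from this thread: 16x16 floats

// One block inverts one N x N matrix using Gauss-Jordan elimination with
// partial pivoting.  The inverse overwrites the input matrix.
// Assumed launch: invertMatrices<<<numMatrices, 128>>>(d_mats);
__global__ void invertMatrices(float *d_mats)
{
    __shared__ float A[N][N];     // working copy of this block's matrix
    __shared__ float Inv[N][N];   // starts as identity, ends as the inverse
    __shared__ float factor[N];   // per-row elimination factors
    __shared__ int   pivotRow;    // row picked by the pivot search

    const int tid = threadIdx.x;                 // 0..127
    float *mat = d_mats + blockIdx.x * N * N;    // this block's matrix

    // 128 threads cover the 256 elements in two strides.
    for (int e = tid; e < N * N; e += blockDim.x) {
        int i = e / N, j = e % N;
        A[i][j]   = mat[e];
        Inv[i][j] = (i == j) ? 1.0f : 0.0f;
    }
    __syncthreads();

    for (int k = 0; k < N; ++k) {
        // Partial pivoting: one thread scans column k at and below the diagonal.
        if (tid == 0) {
            int best = k;
            for (int i = k + 1; i < N; ++i)
                if (fabsf(A[i][k]) > fabsf(A[best][k])) best = i;
            pivotRow = best;
        }
        __syncthreads();

        // Swap row k with the pivot row in both matrices (one thread per column).
        if (pivotRow != k && tid < N) {
            float t = A[k][tid];   A[k][tid]   = A[pivotRow][tid];   A[pivotRow][tid]   = t;
            t       = Inv[k][tid]; Inv[k][tid] = Inv[pivotRow][tid]; Inv[pivotRow][tid] = t;
        }
        __syncthreads();

        // Normalize the pivot row (no check here for a zero pivot).
        float pivot = A[k][k];
        __syncthreads();
        if (tid < N) {
            A[k][tid]   /= pivot;
            Inv[k][tid] /= pivot;
        }
        __syncthreads();

        // Record each row's elimination factor before column k is overwritten.
        if (tid < N) factor[tid] = A[tid][k];
        __syncthreads();

        // Eliminate column k from every other row; each thread updates two elements.
        for (int e = tid; e < N * N; e += blockDim.x) {
            int i = e / N, j = e % N;
            if (i != k) {
                A[i][j]   -= factor[i] * A[k][j];
                Inv[i][j] -= factor[i] * Inv[k][j];
            }
        }
        __syncthreads();
    }

    // Copy the inverse back over the original matrix in global memory.
    for (int e = tid; e < N * N; e += blockDim.x)
        mat[e] = Inv[e / N][e % N];
}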

Well, it is not because of an incorrect execution configuration, I think :)
Check your code for overflow or writing past the end of an array; those can cause this kind of error. I can’t think of anything else at the moment. If you can’t find anything, please post the source code here so that we can take a look at it.

When I added up my memory usage by hand, I think I discovered the source of the second error: I was allocating over 1.5 times the maximum amount of shared memory per block. The part that still confuses me is why the ptxas info said I was only at about 3K bytes, well under the limit. I also still don’t know what is causing the constant memory reporting error; it still prints that value even with the fix that reduced shared memory consumption.
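One possible explanation for the mismatch, if any of the shared memory is requested dynamically at launch time: ptxas -v can only count the statically declared __shared__ arrays it sees at compile time, so launch-time shared memory would not appear in that line. Below is a minimal sketch of a host-side sanity check against the per-block limit; the byte count is an assumed example (two 16x16 float arrays plus a little pivot bookkeeping), not the real figure from this program.

#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    // Assumed example: two 16x16 float arrays plus pivot bookkeeping.
    size_t neededPerBlock = 2 * 16 * 16 * sizeof(float) + 17 * sizeof(float);

    printf("need %lu bytes of shared memory per block, device allows %lu\n",
           (unsigned long)neededPerBlock, (unsigned long)prop.sharedMemPerBlock);

    if (neededPerBlock > prop.sharedMemPerBlock)
        printf("over the per-block shared memory limit\n");

    return 0;
}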

P.S. Thanks for your time, AndreiB.

I also often see negative values reported for cmem. I have no clue why either, as my kernels run just fine.

I have not seen negative values, but I have seen large bogus values (1.2MB) for kernels that use no constant memory.

I’ve filed a bug against the compiler for this problem. Thanks for reporting it.

Cheers,

Mark