CUDA 3.1 ptxas error on sm_13

I may well be wrong, but it seems that CUDA 3.1 is not able to compile this on sm_13:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

__device__ unsigned int atomic_cnt = 0;

__device__ int queue_fetch()
{
  __shared__ bool amFirst;
  __shared__ bool amLast;

  const uint id = threadIdx.z + threadIdx.y*blockDim.z + threadIdx.x*blockDim.z*blockDim.y;
  const uint nblocks = gridDim.z*gridDim.y*gridDim.x;

  if (0 == id) {
    unsigned int ticket = 0;
    ticket = atomicInc(&atomic_cnt, nblocks);
    amFirst = (ticket == 0);
    amLast  = (ticket == (nblocks-1));
  }

  __syncthreads();

  // other code here which was removed to simplify it
  return 0;
}

__global__ void kernel()
{
  queue_fetch();
}

int main()
{
}

while CUDA 2.3 compiles it without any problem.

Any ideas?

Hi,
Looks like this one:
http://forums.nvidia.com/index.php?showtopic=179452

I initially thought that as well, but changing the pair of shared memory booleans to a 32-bit type doesn’t help. The PTX it generates looks OK to my inexpert eye, but the assembler is choking on something.
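
For reference, the 32-bit variant I tried was essentially this (paraphrasing from memory; only the flag type and the assignments change):

__device__ int queue_fetch()
{
  __shared__ unsigned int amFirst;   // was: __shared__ bool amFirst;
  __shared__ unsigned int amLast;    // was: __shared__ bool amLast;

  const uint id = threadIdx.z + threadIdx.y*blockDim.z + threadIdx.x*blockDim.z*blockDim.y;
  const uint nblocks = gridDim.z*gridDim.y*gridDim.x;

  if (0 == id) {
    unsigned int ticket = atomicInc(&atomic_cnt, nblocks);
    amFirst = (ticket == 0) ? 1u : 0u;
    amLast  = (ticket == (nblocks-1)) ? 1u : 0u;
  }

  __syncthreads();
  return 0;
}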

I think it was suggested in the other thread that the if statement somehow causes this… the OP also has an if statement in his code.

eyal

Judging from the weird output, I really think there is a bug inside ptxas which produces that print as a side effect.
Just look at the characters after the “line 0:” part.
It may be in a ptxas code path specific to sm_13, which I think is much less tested nowadays.

For developers targeting primarily compute 1.x devices there are currently not a lot of good reasons to update to the 3.0 and 3.1 toolkits.

You are not going to believe this:

Leave gridDim.z out of the nblocks declaration and it will compile (sm_13 on CUDA 3.1).

AFAIK gridDim.z is always 1 on compute 1.3 anyway?
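
To be concrete, only the nblocks line needs to change; a minimal sketch of the two variants:

// fails in ptxas for sm_13 with CUDA 3.1:
const uint nblocks = gridDim.z*gridDim.y*gridDim.x;

// assembles fine, and should give the same value since gridDim.z is 1 on compute 1.x:
const uint nblocks = gridDim.y*gridDim.x;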

Hey… that’s right! Nice spot!

Now I understand why a similar construct is OK in another piece of code…

Thank you so much!

Thank you for bringing this issue to our attention. I was able to reproduce the problem on 64-bit Linux (RHEL 5.3) with CUDA 3.1. Interestingly, it does not reproduce on 64-bit Windows (although the code won’t compile there as posted, because “uint” is undefined). I will follow up with our toolchain team.
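
For anyone trying the repro on Windows: a quick, unofficial workaround is to replace “uint” with “unsigned int”, or to add a typedef near the top of the file, e.g.:

// uint is not defined by the MSVC headers, so spell it out for the Windows build:
typedef unsigned int uint;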
