CUDA 3.1 ptxas error on sm_13

d.rossetti · September 6, 2010, 10:34am

I may well be wrong, but it seems that CUDA 3.1 is not able to compile this on sm_13:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <cuda.h>

__device__ unsigned int atomic_cnt = 0;

__device__ int queue_fetch()
{
  __shared__ bool amFirst;
  __shared__ bool amLast;

  const uint id = threadIdx.z + threadIdx.y*blockDim.z + threadIdx.x*blockDim.z*blockDim.y;
  const uint nblocks = gridDim.z*gridDim.y*gridDim.x;

  if(0==id) {
    unsigned int ticket = 0;
    ticket = atomicInc(&atomic_cnt, nblocks);
    amFirst = (ticket == 0);
    amLast = (ticket == (nblocks-1));
  }
  __syncthreads();

  // other code here which was removed to simplify it
  return 0;
}

__global__ void kernel()
{
  queue_fetch();
}

int main()
{
}

CUDA 2.3, by contrast, compiles it without error.

Any ideas?

eyalhir74 · September 6, 2010, 10:54am

Hi,
Looks like this one:
http://forums.nvidia.com/index.php?showtopic=179452

avidday · September 6, 2010, 10:58am

I initially thought that as well, but changing the pair of shared memory booleans to a 32-bit type doesn’t help. The PTX it generates looks OK to my inexpert eye, but the assembler is choking on something.

eyalhir74 · September 6, 2010, 11:00am

I think it was suggested in the other thread that the if statement somehow causes this… the OP also has an if statement in his code.

eyal

d.rossetti · September 6, 2010, 1:39pm

Judging from the weird output, I really think there is a bug inside ptxas, with that print as a side effect.
Just look at the characters after the “line 0:” part, and the “”.
It may be in a ptxas code path specific to sm_13, which I think gets much less testing nowadays.

cbuchner1 · September 6, 2010, 1:43pm

For developers primarily targeting compute 1.x devices, there are currently not many good reasons to update to the 3.0 and 3.1 toolkits.

jan.heckman · September 6, 2010, 7:39pm

You are not going to believe this:

Leave gridDim.z out of the nblocks declaration and it will compile (sm_13 with CUDA 3.1).

AFAIK gridDim.z is always 1 on compute 1.3 anyway?
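
A minimal sketch of that workaround, assuming the kernel is only ever launched on a 1D or 2D grid (as far as I know, that is all compute 1.x hardware supports), so dropping gridDim.z does not change the block count. The return value of queue_fetch, the flags array, and the host helper one_first_one_last are hypothetical additions that make the result observable; they are not part of the original post.

```cuda
#include <vector>
#include <cuda_runtime.h>

__device__ unsigned int atomic_cnt = 0;

// Ticket logic from the original post, with gridDim.z left out of nblocks.
// Unlike the original, it returns the two flags packed into an int
// (bit 0 = amFirst, bit 1 = amLast) so the caller can inspect them.
__device__ int queue_fetch()
{
  __shared__ bool amFirst;
  __shared__ bool amLast;

  const unsigned int id = threadIdx.z + threadIdx.y*blockDim.z + threadIdx.x*blockDim.z*blockDim.y;
  const unsigned int nblocks = gridDim.y*gridDim.x;   // workaround: no gridDim.z

  if (0 == id) {
    // atomic_cnt is never reset here, so the flags are only
    // meaningful for the first launch after module load.
    const unsigned int ticket = atomicInc(&atomic_cnt, nblocks);
    amFirst = (ticket == 0);
    amLast  = (ticket == (nblocks-1));
  }
  __syncthreads();

  return (amFirst ? 1 : 0) | (amLast ? 2 : 0);
}

__global__ void kernel(int *flags)
{
  const int f = queue_fetch();
  // One thread per block records that block's flags.
  if (threadIdx.x == 0 && threadIdx.y == 0 && threadIdx.z == 0)
    flags[blockIdx.y*gridDim.x + blockIdx.x] = f;
}

// Hypothetical host helper: launches once on a gx-by-gy grid and checks
// that exactly one block saw amFirst and exactly one saw amLast.
bool one_first_one_last(unsigned int gx, unsigned int gy)
{
  const unsigned int n = gx*gy;
  int *d_flags = 0;
  if (cudaMalloc(&d_flags, n*sizeof(int)) != cudaSuccess) return false;

  kernel<<<dim3(gx, gy), dim3(8, 4, 2)>>>(d_flags);

  std::vector<int> h(n);
  const cudaError_t err =
      cudaMemcpy(h.data(), d_flags, n*sizeof(int), cudaMemcpyDeviceToHost);
  cudaFree(d_flags);
  if (err != cudaSuccess) return false;

  int first = 0, last = 0;
  for (unsigned int i = 0; i < n; ++i) {
    first += h[i] & 1;
    last  += (h[i] >> 1) & 1;
  }
  return first == 1 && last == 1;
}
```

Calling one_first_one_last(4, 3) once from main exercises the same code path as the original kernel. Whether this variant also avoids the ptxas failure on a given toolkit is something to confirm by compiling it for sm_13 there.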

d.rossetti · September 7, 2010, 10:11am

Hey… that’s right! Nice spot!

Now I understand why a similar construct is OK in another piece of code…

Thank you so much!

njuffa · September 7, 2010, 9:30pm

Thank you for bringing this issue to our attention. I was able to reproduce the problem on 64-bit Linux (RHEL 5.3) with CUDA 3.1. Interestingly, it does not reproduce on 64-bit Windows (though the code as posted won’t compile there, because “uint” is undefined). I will follow up with our toolchain team.
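
On the “uint” point: it is not a standard C++ or CUDA type. On Linux it typically comes in through glibc’s <sys/types.h>, which would explain why the code as posted builds there but not with the Windows host compiler. A minimal sketch of a portable spelling, assuming one simply writes unsigned int instead; the names linear_thread_id, write_ids, and ids_are_dense are hypothetical and only serve to make the example self-contained.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Same linear thread index as in the original post, spelled with
// "unsigned int" so it does not depend on a platform-specific typedef.
__device__ unsigned int linear_thread_id()
{
  return threadIdx.z + threadIdx.y*blockDim.z + threadIdx.x*blockDim.z*blockDim.y;
}

__global__ void write_ids(unsigned int *out)
{
  const unsigned int id = linear_thread_id();
  out[id] = id;   // each thread writes its own slot
}

// Hypothetical host helper: runs a single block of x*y*z threads and
// checks that the computed ids cover 0 .. x*y*z-1 exactly once.
bool ids_are_dense(unsigned int x, unsigned int y, unsigned int z)
{
  const unsigned int n = x*y*z;
  unsigned int *d_out = 0;
  if (cudaMalloc(&d_out, n*sizeof(unsigned int)) != cudaSuccess) return false;
  // Fill with 0xFF bytes so a slot no thread wrote fails the check below.
  cudaMemset(d_out, 0xFF, n*sizeof(unsigned int));

  write_ids<<<1, dim3(x, y, z)>>>(d_out);

  std::vector<unsigned int> h(n);
  const cudaError_t err =
      cudaMemcpy(h.data(), d_out, n*sizeof(unsigned int), cudaMemcpyDeviceToHost);
  cudaFree(d_out);
  if (err != cudaSuccess) return false;

  for (unsigned int i = 0; i < n; ++i)
    if (h[i] != i) return false;
  return true;
}
```

An alternative, if the original spelling should stay, would be a local `typedef unsigned int uint;` near the top of the file; that may clash with the system typedef on some platforms, so spelling the type out seems the safer choice.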
