Simple surface compilation problem on CUDA

I’m trying to compile the example code from Section 3.2.5 of the CUDA reference manual, but get the following error from ptxas:

[i]1>ptxas C:/Temp/tmpxft_00002200_00000000-7_surfTest.compute_10.ptx, line 48; Ç=8ü: Parsing error near ‘.surf’: syntax error

1>ptxas Ç=8ü: Ptx assembly aborted due to errors[/i]

I get the same errors from my other code when I simply add the line

surface<void,2> surfRef;

System spec:

Windows XP

CUDA +SDK v3.1

Quadro fx 3700 (Compute Compatibility 1.1)

Visual Studio 2008 Express

DisplayDriver 257.21

Code snippet below:

[codebox]#include <stdio.h>

#include <string.h>

#include <cutil_inline.h>

#include <cuda_runtime_api.h>

// 2D surfaces

surface<void, 2> inputSurfRef;

surface<void, 2> outputSurfRef;

// Simple copy kernel

global void copyKernel(int width, int height) {

// Calculate surface coordinates 

unsigned int x = blockIdx.x * blockDim.x + threadIdx.x; 

unsigned int y = blockIdx.y * blockDim.y + threadIdx.y; 

if (x < width && y < height) { 

	uchar4 data; 

	// Read from input surface 

	surf2Dread(&data, inputSurfRef, x * 4, y); 

	// Write to output surface 

	surf2Dwrite(data, outputSurfRef, x * 4, y); 

} 

}

extern “C” void surfTest() {

int width = 10;

int height = 10;

int size = width * height * sizeof(float);

float *h_data = (float*) malloc(size);

// Allocate CUDA arrays in device memory 

cudaChannelFormatDesc channelDesc = cudaCreateChannelDesc(8, 8, 8, 8, cudaChannelFormatKindUnsigned); 

cudaArray* cuInputArray; 

cudaMallocArray(&cuInputArray, &channelDesc, width, height, cudaArraySurfaceLoadStore); 

cudaArray* cuOutputArray; 

cudaMallocArray(&cuOutputArray, &channelDesc, width, height, cudaArraySurfaceLoadStore); 

// Copy to device memory some data located at address h_data 

// in host memory 

cudaMemcpyToArray(cuInputArray, 0, 0, h_data, size, cudaMemcpyHostToDevice); 

// Bind the arrays to the surface references 

cudaBindSurfaceToArray(inputSurfRef, cuInputArray); 

cudaBindSurfaceToArray(outputSurfRef, cuOutputArray); 

// Invoke kernel 

dim3 dimBlock(16, 16); 

dim3 dimGrid((width + dimBlock.x - 1) / dimBlock.x, 

			 (height + dimBlock.y - 1) / dimBlock.y); 

copyKernel<<<dimGrid, dimBlock>>>(width, height); 

// Free device memory 

cudaFreeArray(cuInputArray); 

cudaFreeArray(cuOutputArray); 

}

[/codebox]

Unfortunately there are no examples that I could find in the SDK which use surfaces, so I can’t test whether these compile.

Any ideas?

  • Richard

I am getting a similar result on Windows Server 2008 x64 using nvcc 3.1

I have traced it to array indexing in my code (which I cannot post), but will describe here. My compile output is:

2>C:/CUDA/bin64/nvcc.exe C:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/source/Matlab/lib/3rdParty/Siemens/gpu/PMatrixRecon/gpuPMatrixRecon.cu -arch sm_11 --ptxas-options=-v --keep -maxrregcount=32 -Xcompiler /EHsc,/W3,/nologo,/Od,/Zi,/MTd -m64 -DgpuPMatrixRecon_EXPORTS -ccbin "C:/Program Files (x86)/Microsoft Visual Studio 9.0/VC/bin" -DNVCC -c -o C:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/build/dvip4-Win64/source/Matlab/lib/3rdParty/Siemens/gpu/PMatrixRecon/Debug/gpuPMatrixRecon_generated_gpuPMatrixRecon.cu.obj -IC:/CUDA/include "-IC:/ProgramData/NVIDIA Corporation/NVIDIA GPU Computing SDK/C/common/inc" -IC:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/source/cpp/lib/3rdParty/Win/boost-cmake-1_41_0 -IC:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/build/Windows-6.0/install/include/vtk-5.6 -IC:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/build/Windows-6.0/install/include/VTKEdge "-IC:/Program Files/MATLAB/R2010a/extern/include" -IC:/Users/bdavis5/Documents/QS/QS-NIH/source/branches/trunk/source/Matlab/lib/3rdParty/Siemens/gpu/PMatrixRecon/. -IC:/CUDA/include

2>gpuPMatrixRecon.cu

...

...

2>gpuPMatrixRecon.cudafe2.gpu

2>ptxas gpuPMatrixRecon.ptx, line 108; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 109; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 110; error   : Arguments mismatch for instruction 'shl'

2>ptxas gpuPMatrixRecon.ptx, line 111; error   : Arguments mismatch for instruction 'or'

2>ptxas gpuPMatrixRecon.ptx, line 146; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 147; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 148; error   : Arguments mismatch for instruction 'shl'

2>ptxas gpuPMatrixRecon.ptx, line 149; error   : Arguments mismatch for instruction 'or'

2>ptxas gpuPMatrixRecon.ptx, line 191; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 192; error   : Arguments mismatch for instruction 'and'

2>ptxas gpuPMatrixRecon.ptx, line 193; error   : Arguments mismatch for instruction 'shl'

2>ptxas gpuPMatrixRecon.ptx, line 194; error   : Arguments mismatch for instruction 'or'

2>ptxas gpuPMatrixRecon.ptx, line 266; warning : Double is not supported. Demoting to float

2>ptxas Hâ∞(Ç=¥╔R: Ptx assembly aborted due to errors

The code is that seemingly causes this (I have verified this by commenting in/out the code relates to array indexing into a pointer to floats. In my code I have a structure:

typedef struct Matrix

{

	size_t width;

	size_t height;

	float* data;

	bool column_major;

} Matrix_t;

And the code that causes my issue where z is of type Matrix in kernel code:

z.data[z_index] = z.data[z_index] + x.data[x_index] * y.data[y_index];

//				z.data[z_index] = z.data[z_index] + x.data[x_index] * y.data[y_index];

//				*(z.data + z_index) = *(z.data + z_index) + *(x.data + x_index ) * *(y.data + y_index);

I have tried some variants both array indexing and pointer arithmetic differencing to float. The odd thing I was experiencing that this code would compile and run. The really odd thing is on my Laptop Win7 x64 I could get it to break, then rebuild and it would compile and run I would then later run into the same problem even though I was making unrelated changes to the code. I changed something which now causes this error consistently. If I figure it out I will post or if someone has had the same issue…please post.

I am very curious to know if:

is Unicode Swahili for “I took your goat” or if converted to decimal winning lottery ticket numbers (which state I am not sure. I need more go go on here.). I have seen cryptic error messages, but NVIDIA has really out done themselves here.

Sorry, surfaces aren’t very well documented yet. They are only supported on Fermi (compute capability 2.0 devices).

I agree the error reporting should be better!

First off I wanted to clarify why I posted to this thread. I had to do with error code generated by nvcc

ptxas Ç=8ü: Ptx assembly aborted due to errors

Which after searching google for “ptxas Ptx assembly aborted due to errors” led me here. My posts have nothing to do with surfaces, but everything to do with ptxas cryptic errors.

There must be a random assembly generator inside nvcc as my code(in the kernel):

Matrix_t v = {0};

Now generates:

1>ptxas C:/Users/bdavis/AppData/Local/Temp/tmpxft_000010f0_00000000-4_gpuPMatrixRecon.ptx, line 144; error   : Arguments mismatch for instruction 'and'

1>ptxas C:/Users/bdavis/AppData/Local/Temp/tmpxft_000010f0_00000000-4_gpuPMatrixRecon.ptx, line 145; error   : Arguments mismatch for instruction 'and'

1>ptxas C:/Users/bdavis/AppData/Local/Temp/tmpxft_000010f0_00000000-4_gpuPMatrixRecon.ptx, line 146; error   : Arguments mismatch for instruction 'shl'

1>ptxas C:/Users/bdavis/AppData/Local/Temp/tmpxft_000010f0_00000000-4_gpuPMatrixRecon.ptx, line 147; error   : Arguments mismatch for instruction 'or'

1>ptxas C:/Users/bdavis/AppData/Local/Temp/tmpxft_000010f0_00000000-4_gpuPMatrixRecon.ptx, line 250; warning : Double is not supported. Demoting to float

1>ptxas fatal   : Ptx assembly aborted due to errors

but

Matrix_t v;

does not. I am certain this compiled and ran on the gpu.

And the code:

z.data[z_index] = z.data[z_index] + x.data[x_index] * y.data[y_index];

Which I had commented out now compiles in. This is frustrating and obscure. So my code is compiling again, but I haven’t a clue why and I am just waiting for the random assembly generator to tell me I have another cryptic obscure error. This smells of data corruption or invalid memory access inside nvcc IMHO.

Thanks for the fast response Simon.

It would be really useful if surface support (along with any other unsupported functions / types) could be included in Appendix G of the CUDA programming guide.

  • Richard

I had exactly the same issue in my Linux computer. I’ve solved it by compiling with the -arch=sm_20 option:

nvcc -arch=sm_20 myFile.out myFile.cu

I didn’t know that surfaces were an only-Fermi feature, so that explains why you have to target platforms with compute capability 2.0 or higher, as currently the compiler defaults to compute capability 1.0.

Thank you, I compiled without problems with -arch=sm_20.