Struct in CUDA: can I use this struct in CUDA?

Quoc_Vinh · October 30, 2008, 12:08am

Hi everybody.

I have a problem, please help me.

I have this “struct”, and it works perfectly in C++, but I don’t know whether it can work in CUDA:

struct Mystruct
{
  int n;
  double f;

public:
  Mystruct(const double& d)
  {
    double dn;
    f = modf(d, &dn);
    n = static_cast<int>(dn);
  }

  Mystruct(const int& n, const double& f)
    : n(n), f(f) {}
};

Thank you very much.

Quoc_Vinh · October 31, 2008, 9:33am

I tried to use this struct but could not get it to work, so I think CUDA does not support this kind of struct.

Simon_Green · October 31, 2008, 11:15am

We don’t officially support C++ in CUDA kernel code. Some stuff works (e.g. templates), but we don’t recommend using these features in production code.

Quoc_Vinh · November 1, 2008, 10:18am

Thanks, Simon Green.

So CUDA kernel code only supports the C language.

alex_dubinsky · November 1, 2008, 3:14pm

That struct should work fine. What do you mean, more specifically, by ‘it doesn’t work’?
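For readers landing here later, a minimal sketch of the struct from the first post with its constructors marked `__host__ __device__`, which is how current CUDA toolkits let a constructor be called from both host and kernel code. The `#ifndef __CUDACC__` fallback macros, `to_double` and `example_usage` are assumptions added for illustration and are not from the thread. On a device without double-precision support the `double` member would still be a problem.

```cpp
#include <cassert>
#include <math.h>

// When a plain C++ compiler builds this file, the CUDA qualifiers are
// defined away so the same source still compiles (assumption of this sketch).
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

struct Mystruct
{
    int n;     // integer part
    double f;  // fractional part

    // Splits d into its integer and fractional parts.
    __host__ __device__ Mystruct(const double& d)
    {
        double dn;
        f = modf(d, &dn);
        n = static_cast<int>(dn);
    }

    __host__ __device__ Mystruct(const int& n, const double& f)
        : n(n), f(f) {}
};

// Hypothetical helper, not from the thread: rebuilds the original value.
__host__ __device__ inline double to_double(const Mystruct& s)
{
    return s.n + s.f;
}

// Host-side usage: 3.75 splits into n == 3 and f == 0.75.
inline void example_usage()
{
    Mystruct m(3.75);
    assert(m.n == 3);
    assert(m.f == 0.75);
}
```

Built with nvcc as a `.cu` file, the same struct could then be constructed inside a kernel. Built with an ordinary C++ compiler, it behaves like the original.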

Aditi · January 15, 2009, 3:52am

Hi,

I face a similar problem but with C.

I have written code in C + CUDA that passes arrays between host and device.

Now I want to use “struct” as the data structure but it is giving me a lot of errors.

Eg. if M is a struct and if I use commands like (just for instance):

CUDA_SAFE_CALL(cudaMalloc((void**)&M,structsize));

CUDA_SAFE_CALL(cudaMemset(M,1,structsize));

CUDA_SAFE_CALL(cudaMemcpy(Md,Mh, structsize, cudaMemcpyHostToDevice));

CUDA_SAFE_CALL(cudaFree(M));

etc.

it gives me errors in places where I try to access the elements of the struct on the device and where I declare M to be struct.

If M is a pointer to a struct, it gives me many more compilation errors; many of them hint that CUDA is expecting some “C++ class” instead of a struct. Whatever I do, the problem is not solved, and I am not very familiar with C++ either.

No logical “struct” handling in C works to resolve this. Can someone please post an example of struct being passed between the device and the host, allocation on host and device, etc.?

Thanks a lot.

Aditi

T.B · January 15, 2009, 11:02am

Does this help?

[codebox]#include <stdio.h>

struct SWhatever
{
    float a;
    unsigned b;
    __device__ float Get() {return a*b;}
};

__global__ void test(SWhatever *data, float * result)
{
    result[threadIdx.x]=data[threadIdx.x].Get();
}

int main()
{
    const unsigned N=128;
    SWhatever *host_data=new SWhatever[N];
    for(unsigned i=0;i<N;++i)
    {
        host_data[i].a=i;
        host_data[i].b=i*i;
    }

    SWhatever *device_data;
    cudaMalloc((void**)&device_data,N*sizeof(SWhatever));
    cudaMemcpy(device_data,host_data,N*sizeof(SWhatever),cudaMemcpyHostToDevice);

    float *device_result;
    cudaMalloc((void**)&device_result,N*sizeof(float));
    test<<<1,N>>>(device_data,device_result);

    float* host_result=new float[N];
    cudaMemcpy(host_result,device_result,N*sizeof(float),cudaMemcpyDeviceToHost);

    for(unsigned i=0;i<N;++i)
    {
        if(i*i*i!=host_result[i])
        {
            printf("Error at index %u. Should be %f but is %f.\n",i,1.f*i*i*i,host_result[i]);
        }
    }

    cudaFree(device_result);
    cudaFree(device_data);
    delete[] host_data;   // arrays allocated with new[] must be released with delete[]
    delete[] host_result;
}[/codebox]

Aditi · January 16, 2009, 2:21am

Hi,

Thanks for your reply. My code scheme is not very different from yours, still it gives me the error. The only difference is that I have a dynamic array of integers within the struct.

I am pasting my code here. It is a simple 4x4 matrix multiplication code (embarrassingly parallel) that I am using to experiment with using structs with CUDA. After the code, I have pasted the error msgs that I get when I compile the code.

Please note two things: in transferring the data between device and host (cudaMemcpy), if I use “matrixsize (i.e. the size of the array within the struct)”, there are no errors for lines 261, 266, 271 and 292. But if I use “sizeof(Matrix)” like in the version below, these lines give the following error: “error: incomplete type is not allowed”.

I am totally perplexed. Will be thankful if someone can look into it and point out the possible reasons of errors.

/************************************************************

sample.cu

This is a example of the CUDA program.

*********/

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <cuda.h>

#include <cutil.h> /* includes project */

#include "cuda_runtime.h"

#include "cuda_runtime_api.h"

#define WIDTH 4

/* ------------------------------- declaration of functions -------------------------------- */

bool InitCUDA(void);

/************************************************************

************/

/* Init CUDA */

/************************************************************

************/

#if DEVICE_EMULATION

bool InitCUDA(void){return true;}

#else

bool InitCUDA(void)

{
int count = 0;

int i = 0;

cudaGetDeviceCount(&count);

if(count == 0) {

	fprintf(stderr, "There is no device.\n");

	return false;

}

for(i = 0; i < count; i++) {

	cudaDeviceProp prop;

	if(cudaGetDeviceProperties(&prop, i) == cudaSuccess) {

		if(prop.major >= 1) {

			break;

		}

	}

}

if(i == count) {

	fprintf(stderr, "There is no device supporting CUDA.\n");

	return false;

}

cudaSetDevice(i);

printf("CUDA initialized.\n");

return true;
}

#endif

/************************************************************

************/

/* My First CUDA Code */

/************************************************************

************/

// Code for Multiplication in GPU vs CPU.

// A suffix 'd' suggests operation on the Device.

// A suffix 'h' suggests operation on the Host.

// Matrix Multiplication required: P = M x N.

// One thread handles one element of P. Each thread:

// * Loads a row of Matrix M.

// * Loads a column of Matrix N.

// * For each pair of elements (Mij and Nji), it performs a multiplication and then addition.

// However, here the matrices have been used as one-dimensional arrays.

// Shared memory usage not employed now. Only one block of thread will compute the matrix P.

// So the size of the matrix P (also M & N) is limited by the number of threads allowed in a block.

/* ----------------------------------------- global variables ------------------------------------- */

typedef struct {

int* elements;

} Matrix;

/* ----------------------------------------- global Functions ------------------------------------- */

// extern “C”

__global__ static void MatrixMul_DeviceKernel(int* Md, int* Nd, int* Pd)

{

// Performs Matrix Multiplication on the device.

// Set-up configuration (grid, block, etc.) details available from main().

// Temporary variables

int i;

int tx;

int ty;

int M_element = 0;

int N_element = 0;

int P_element = 0;

// 2D Thread ID

tx = threadIdx.x;

ty = threadIdx.y;

// Perform Multiplication

// Each thread is supposed to pick a row in Md and a column in Nd, multiply corresponding elements and add them.

__syncthreads();

for (i = 0; i < WIDTH; i++)
{

	M_element = Md.elements[tx*WIDTH + i];

	N_element = Nd.elements[i*WIDTH + ty];

	P_element += M_element* N_element;

}
Pd.elements[tx*WIDTH + ty] = P_element;

__syncthreads();

}

/* --------------------------------------------- Host’s (CPU) Main( ) Code ----------------------------------- */

int main(void) {

if(!InitCUDA()) {
	return 0;

}
int i;

struct Matrix *Mh, *Nh, *Ph, *Md, *Nd, *Pd;

int matrixsize = WIDTH*WIDTH*sizeof(int);

cudaError_t err;

// Allocate and initialize the matrices on the CPU

CUDA_SAFE_CALL(cudaMallocHost((void**)&Mh,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Mh malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMallocHost((void**)&Nh,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nh malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMallocHost((void**)&Ph,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Ph malloc error: %s.\n",cudaGetErrorString(err));
memset(Mh,1,matrixsize);

memset(Nh,1,matrixsize);

memset(Ph,0,matrixsize);

// Allocate and initialize the elements array in the Matrices on the CPU

CUDA_SAFE_CALL(cudaMallocHost((void**)&Mh.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Mh.elements malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMallocHost((void**)&Nh.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nh.elements malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMallocHost((void**)&Ph.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Ph.elements malloc error: %s.\n",cudaGetErrorString(err));
memset(Mh.elements,1,matrixsize);

memset(Nh.elements,1,matrixsize);

memset(Ph.elements,0,matrixsize);

// Assign/Fetch values of matrices M and N

for(i=0;i<WIDTH*WIDTH;i++)

{
Mh.elements[i]=5;

Nh.elements[i]=1;

Ph.elements[i]=0;
}

// Allocates enough memory for matrices Md, Nd and Pd on the Device.

CUDA_SAFE_CALL(cudaMalloc((void**)&Md,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Md malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMalloc((void**)&Nd,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nd malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMalloc((void**)&Pd,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Pd malloc error: %s.\n",cudaGetErrorString(err));
// Initializes matrices Md, Nd and Pd on the Device.

CUDA_SAFE_CALL(cudaMemset(Md,1,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Md memset error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemset(Nd,1,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nd memset error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemset(Pd,0,sizeof(Matrix)));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Pd memset error: %s.\n",cudaGetErrorString(err));
// Allocate the elements array in matrices Md, Nd and Pd on the GPU

CUDA_SAFE_CALL(cudaMalloc((void**)&Md.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Md.elements malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMalloc((void**)&Nd.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nd.elements malloc error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMalloc((void**)&Pd.elements,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Pd.elements malloc error: %s.\n",cudaGetErrorString(err));
// Initializes the elements array in matrices Md, Nd and Pd on the GPU

CUDA_SAFE_CALL(cudaMemset(Md.elements,1,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Md.elements memset error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemset(Nd.elements,1,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Nd.elements memset error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemset(Pd.elements,0,matrixsize));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "Pd.elements memset error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemcpy(Md, Mh, sizeof(Matrix), cudaMemcpyHostToDevice));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaMemcpyHostToDevice error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemcpy(Nd, Nh, sizeof(Matrix), cudaMemcpyHostToDevice));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaMemcpyHostToDevice error: %s.\n",cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaMemcpy(Pd, Ph, sizeof(Matrix), cudaMemcpyHostToDevice));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaMemcpyHostToDevice error: %s.\n",cudaGetErrorString(err));
// Perform Multiplication

// Set-up the execution configuration

dim3 dimGrid(1, 1); /* the grid has only 1 block in this code */

dim3 dimBlock(WIDTH, WIDTH); /* # elements in the matrix = # threads in the block */

// Launch a kernel of threads to perform Matrix Multiplication on the Device

// The function (MatrixMul_DeviceKernel) performs the Matrix Multiplication

MatrixMul_DeviceKernel<<<dimGrid,dimBlock>>>(Md,Nd,Pd);

// This launches a kernel of threads in the "Block" in the "Grid", all of whose threads perform the function defined in the global function MatrixMul_DeviceKernel and need arguments Md, Nd and Pd to do that.

// Multiplication Over

// Read and copy output matrix Pd from the device to the output matrix P on the host

CUDA_SAFE_CALL(cudaMemcpy(Ph, Pd, sizeof(Matrix), cudaMemcpyDeviceToHost));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaMemcpyDeviceToHost for result of Multiplication error: %s.\n",cudaGetErrorString(err));
// Print the output matrix

for (i=0;i<WIDTH*WIDTH;i++){
printf("Ph.elements[%d] = %d\n",i,Ph.elements[i]);

}
// Free device memory

CUDA_SAFE_CALL(cudaFree(Md.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Md.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFree(Nd.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Nd.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFree(Pd.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Pd.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFree(Md));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Md) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFree(Nd));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Nd) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFree(Pd));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFree(Pd) error: %s.\n",i,cudaGetErrorString(err));
// Free matrices allocated on the CPU

CUDA_SAFE_CALL(cudaFreeHost(Mh.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Mh.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFreeHost(Nh.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Nh.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFreeHost(Ph.elements));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Ph.elements) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFreeHost(Mh));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Mh) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFreeHost(Nh));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Nh) error: %s.\n",i,cudaGetErrorString(err));
CUDA_SAFE_CALL(cudaFreeHost(Ph));

err = cudaGetLastError();

if( cudaSuccess != err)
fprintf(stderr, "cudaFreeHost(Ph) error: %s.\n",i,cudaGetErrorString(err));
return 0;

}

/* ------------------------------------------------------------------------------------------------------------------------- */

ERROR LOG:

1>------ Build started: Project: CUDAWinApp1_MatrixMul, Configuration: EmuDebug Win32 ------

1>Compiling…

1>sample.cu

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(114): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(115): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(119): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(142): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(147): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(152): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(163): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(168): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(173): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(178): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(179): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(180): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(186): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(187): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(188): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(193): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(198): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(203): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(210): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(215): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(220): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(227): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(232): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(237): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(244): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(249): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(254): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(261): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(266): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(271): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(285): error: argument of type “Matrix *” is incompatible with parameter of type “int *”

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(285): error: argument of type “Matrix *” is incompatible with parameter of type “int *”

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(285): error: argument of type “Matrix *” is incompatible with parameter of type “int *”

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(292): error: incomplete type is not allowed

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(299): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(304): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(308): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(312): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(332): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(336): error: expression must have class type

1>c:/Documents and Settings/fantom/Desktop/CPU_Project/cuda_00/CUDAWinApp1_MatrixMul//sample.cu(340): error: expression must have class type

1>41 errors detected in the compilation of “C:\DOCUME~1\fantom\LOCALS~1\Temp/tmpxft_00000c28_00000000-6_sample.cpp1.ii”.

1>Build log was saved at “file://c:\Documents and Settings\fantom\Desktop\CPU_Project\cuda_00\CUDAWinApp1_MatrixMul\EmuDebug\BuildLog.htm”

1>CUDAWinApp1_MatrixMul - 41 error(s), 0 warning(s)

========== Build: 0 succeeded, 1 failed, 0 up-to-date, 0 skipped ==========

T.B · January 16, 2009, 9:05am

Hi again,

I just ran through a few of your compiler issues and none of them are CUDA related. Without sounding too much like a jerk, I would recommend that you work on your C/C++ skills a bit more before going on, especially w.r.t. pointer usage. If concepts like that are giving you trouble, then working with CUDA is not going to be a pleasant experience. Grab a book, make sure you understand the language, and you’re going to be fine.
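To make the pointer issues concrete, here is a small host-only sketch. It assumes the posted code's `Matrix` typedef and `WIDTH`; `make_matrix` and `free_matrix` are hypothetical helper names, and `std::malloc` stands in for `cudaMallocHost` so it runs without a GPU. Two points it illustrates: `Mh` is a pointer, so its member is reached with `Mh->elements` (writing `Mh.elements` is what produces "expression must have class type"), and the typedef name is plain `Matrix`. The "incomplete type" errors are consistent with `struct Matrix *Mh` introducing a separate, never-defined type, though that is an inference from the log rather than something verified here.

```cpp
#include <cassert>
#include <cstdlib>

#define WIDTH 4

typedef struct {
    int* elements;
} Matrix;

// Allocates a Matrix and its element array, filling every element with value.
// std::malloc stands in for cudaMallocHost so the sketch runs without a GPU.
Matrix* make_matrix(int value)
{
    // The typedef name is plain "Matrix", not "struct Matrix".
    Matrix* m = static_cast<Matrix*>(std::malloc(sizeof(Matrix)));
    assert(m != NULL);

    // m is a pointer, so members are reached with "->", not ".".
    m->elements = static_cast<int*>(std::malloc(WIDTH * WIDTH * sizeof(int)));
    assert(m->elements != NULL);

    for (int i = 0; i < WIDTH * WIDTH; i++) {
        m->elements[i] = value;
    }
    return m;
}

// Frees the element array first, then the struct that pointed to it.
void free_matrix(Matrix* m)
{
    std::free(m->elements);
    std::free(m);
}
```

The last group of errors in the log is separate: the kernel is declared with `int*` parameters but is launched with `Matrix*` arguments, so either the signature or the arguments would have to change.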

mischan · March 15, 2009, 4:01pm

This code snippet is great! Thank you so much for sharing.

I just have one question, are you allowed to use pointers in the struct?

Would something like this be legal:

struct Vertices {
  int *a;  //set to point to arrays in main
  int *b;

  __device__ int Geta(int i) {return a[i];}
  __device__ int Getb(int i) {return b[i];}
};

__global__ void test(Vertices *data, int * result)
{
  int idx = threadIdx.x;
  result[threadIdx.x]= data->Geta(idx) + data->Getb(idx);
}

I can call Geta/Getb from the kernel with no problem, except that when a and b are pointers, Geta and Getb just return 0.

Is the kernel not able to work with pointers?

SPWorley · March 15, 2009, 11:23pm

Are you sure you’re initializing your Vertex objects using DEVICE pointers?
It’s a really common mistake to accidentally use host pointers when doing manual structure initialization on the host.

mischan · March 16, 2009, 1:23am

You are right, and I was making that mistake.

I got around it by copying the arrays a and b to device pointers d_a and d_b with the usual cudaMemcpy, and then calling a kernel function with Vertices *data, d_a and d_b, and setting data->a = d_a and data->b = d_b.

This works fine now, but seems a bit ad-hoc. Is this the correct way of doing it, or there are better ways?

Thank you!

seibert · March 16, 2009, 2:31am

Generally, the approach is to avoid more than one level of pointer indirection if at all possible. This usually has the side-effect of forcing you to structure your data in a flatter way that allows for more memory coalescing.
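One way to read the "flatter" advice, sketched under assumptions: pack both arrays into a single contiguous buffer and keep only a count in the struct, so the struct carries no pointers that could accidentally be host addresses. `FlatVertices`, `pack` and the layout (a first, then b) are hypothetical names and choices for illustration, not something posted in the thread. The `#ifndef __CUDACC__` macros only let the same source build with a plain C++ compiler.

```cpp
#include <cassert>
#include <vector>

#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Hypothetical flat layout: a occupies buffer[0, count),
// b occupies buffer[count, 2*count). The struct holds no pointers.
struct FlatVertices
{
    int count;

    __host__ __device__ int Geta(const int* buffer, int i) const { return buffer[i]; }
    __host__ __device__ int Getb(const int* buffer, int i) const { return buffer[count + i]; }
};

// Packs two equally sized host arrays into one contiguous buffer (host code).
std::vector<int> pack(const std::vector<int>& a, const std::vector<int>& b)
{
    assert(a.size() == b.size());
    std::vector<int> buffer(a);
    buffer.insert(buffer.end(), b.begin(), b.end());
    return buffer;
}
```

On the GPU side, the packed buffer would be copied to the device with a single `cudaMemcpy`, and the kernel would receive that device pointer together with the small struct passed by value.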

KUNDAN_KUMAR · March 16, 2009, 5:36am

I have also used a struct in CUDA, and it worked fine.

What I did was create the structure on the host, allocate memory on the device, copy the structure from host to device, and pass it as a parameter at kernel launch.

One more thing: you have used double as a data member in the structure. Double precision is not supported by all devices, so use float in place of double.

asegovia · June 25, 2009, 9:35pm

This might sound obvious, but have you checked that you don’t have circular references and are trying to calculate sizeof(Matrix) before the proper header is actually #included ?

yahastu · June 26, 2009, 2:13am

Is that because templates are not optimized as well?
