Running a simple CUDA function fails (I think it's a small error)

Hey guys,

I'm trying to run the simple CUDA function below. At first it looked as if nothing happened, but actually I get a safecall error after the function executes, when I try to copy the array back to the host. What did I do wrong? I'm sure it's a very silly mistake, but I've been trying for an hour and can't find the solution :(

__global__ void moveAcc()
{
    int x = threadIdx.x;
    float sec = 1000.0f / (float) timestep;
    gravitation = 2.0;

    // update positions
    particles[x * particleAttributes]     += particles[x+3] / sec;
    particles[x * particleAttributes + 1] += particles[x+4] / sec;
    particles[x * particleAttributes + 2] += particles[x+5] / sec;

    // update speed (gravitation)
    particles[x+5] -= gravitation/sec;

    __syncthreads();
}

void move(){ moveAcc<<<1,particleCount>>>(); cudaThreadSynchronize(); }

I set gravitation to 2.0 there to see whether anything happens, but it is still 9.81 after the execution. particles is an initialized float device array; the other variables are shared variables.

Hi,

What is the size of particleCount?

It should not exceed the maximum number of threads per block, which is 512 or 1024 depending on the compute capability.
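A minimal sketch for checking that limit at run time and for spreading the particles over several blocks once particleCount exceeds it. The helper names maxThreadsPerBlockFor, blocksFor and scaleKernel are hypothetical; cudaGetDeviceProperties and the maxThreadsPerBlock field of cudaDeviceProp are part of the CUDA runtime API.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: maximum threads per block for the given device,
// or -1 if the device properties cannot be queried.
int maxThreadsPerBlockFor(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess) {
        return -1;
    }
    return prop.maxThreadsPerBlock;
}

// Hypothetical helper: number of blocks needed to cover n items with
// threadsPerBlock threads each (integer ceiling division).
int blocksFor(int n, int threadsPerBlock)
{
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

// Hypothetical example kernel: one thread per item across all blocks.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) {                                     // skip surplus threads in the last block
        data[i] *= factor;
    }
}
```

A launch would then look like scaleKernel<<<blocksFor(n, 256), 256>>>(data, n, 2.0f); the i < n guard matters because the last block usually has more threads than items left.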

It is 20, for testing :P

Does anyone have an idea?

It's hard to find the problem in a fragment of code that refers to several variables that aren't declared, defined or initialized anywhere. Generally speaking, if you're going to ask people on the internet to find a bug in your code, at the very least post code that compiles.

That said, try calling cudaGetLastError after moveAcc<<< >>> and see whether the error code tells you anything.

P.S. Also try calling cudaGetLastError before moveAcc.
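A minimal sketch of those checks, using hypothetical helper names (report, emptyKernel, checkedLaunch). cudaGetLastError reports the error through its return value, so calling it without inspecting that value shows nothing. Launch-configuration errors (such as too many threads per block) appear in cudaGetLastError right after the launch, while faults that happen while the kernel runs only appear in the next call that waits for the GPU. cudaDeviceSynchronize is the non-deprecated replacement for cudaThreadSynchronize and returns such errors.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical helper: prints a message if err is not cudaSuccess, then returns err.
static cudaError_t report(cudaError_t err, const char *what)
{
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", what, cudaGetErrorString(err));
    }
    return err;
}

__global__ void emptyKernel() {}

// Hypothetical helper: launches emptyKernel with `threads` threads in one block
// and returns the first error found at any of the three check points.
cudaError_t checkedLaunch(int threads)
{
    // 1. Before the launch: report (and clear) an error left by earlier calls.
    cudaError_t err = report(cudaGetLastError(), "before launch");
    if (err != cudaSuccess) return err;

    emptyKernel<<<1, threads>>>();

    // 2. Right after the launch: catches invalid launch configurations.
    err = report(cudaGetLastError(), "launch");
    if (err != cudaSuccess) return err;

    // 3. After waiting for the kernel: catches faults raised while it ran,
    //    e.g. dereferencing an invalid device pointer.
    return report(cudaDeviceSynchronize(), "kernel execution");
}
```

With a block size above the device limit (4096 threads, say), step 2 fails; a kernel that faults at run time would instead fail at step 3.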

Thanks for your reply. cudaGetLastError was a nice tip, but sadly it didn't help in this case.

Here is the whole code:

#include "ParticleSystem_kernel.h"

__constant__ int p1;
__constant__ int p2;
__constant__ int p3;

__device__ float * particles;
__device__ int * hash;

__shared__ int particleAttributes;
__shared__ int particleCount;
__shared__ int timestep;
__shared__ float gravitation;

extern "C"
{

void initCudaMem(float * f , int length, int partAttr, int time)
{
    size_t size = length * partAttr * sizeof(float);
    timestep = time;
    particleCount = length;
    particleAttributes = partAttr;
    cutilSafeCall(  cudaMalloc(&particles ,size) );
    cudaMemcpy(particles, f, size, cudaMemcpyHostToDevice);
    gravitation = 9.81;
}

__global__ void moveAcc()
{
    int x = threadIdx.x;
    float sec = 1000.0f / (float) timestep;
    gravitation *= 2.00f;

    // update positions
    particles[x * particleAttributes]     += particles[x+3] / sec;
    particles[x * particleAttributes + 1] += particles[x+4] / sec;
    particles[x * particleAttributes + 2] += particles[x+5] / sec;

    // update speed (gravitation)
    particles[x+5] -= gravitation/sec;

    __syncthreads();
}

void move(){ moveAcc<<<1,particleCount>>>(); cudaThreadSynchronize(); }

int writeBackParticleCount(){ return particleCount; }

void writeBackParticles(float * f)
{
    cudaGetLastError();                // NOTHING HERE
    printf("%f\n\n", gravitation);     // PRINTS 9.81 ATM
    size_t size = particleCount * particleAttributes * sizeof(float);
    cutilSafeCall( cudaMemcpy(f, particles, size, cudaMemcpyDeviceToHost) );
}

void collide(){}

The functions are called from a C++ class in this order:

initCudaMem(vars);
move();
collide();
writeBackParticles();

If I don't call move(), there isn't any error and the write-back works as it should.

The error is:

ParticleSystem_kernel.cu(75) : cudaSafeCall() Runtime API error : unspecified launch failure.

Edit: the failing safeCall is the one around the cudaMemcpy in writeBackParticles.
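Because kernel launches are asynchronous, a fault inside moveAcc is not reported by the launch itself but by the next call that waits for the GPU, which here is the cudaMemcpy in writeBackParticles. A plausible cause, assuming the file-scope declarations from the full code: the CUDA programming guide states that the address of a __device__ variable is only usable in device code, so in host code cudaMalloc(&particles, size) stores the allocation in a host-side copy and the particles pointer the kernel reads is never set. Host assignments such as gravitation = 9.81 likewise never reach the device, which would also explain why the host printf keeps showing 9.81. The runtime documents cudaErrorLaunchFailure ("unspecified launch failure") as commonly caused by dereferencing an invalid device pointer. The following sketch keeps file-scope variables but moves values through cudaMemcpyToSymbol and cudaMemcpyFromSymbol; it assumes gravitation is changed from __shared__ to __device__, because shared memory cannot be reached from the host at all, and the names doubleGravitation and runSymbolDemo are hypothetical.

```cuda
#include <cuda_runtime.h>

// File-scope device variables. NOTE: gravitation is __device__ here (assumption),
// not __shared__ as in the original code.
__device__ float *particles;   // the pointer value itself lives in device memory
__device__ float gravitation;

// Hypothetical kernel, launched with a single thread.
__global__ void doubleGravitation()
{
    gravitation *= 2.0f;
    particles[0] = gravitation;   // only valid once the symbol holds a real allocation
}

// Hypothetical host function: publishes a device allocation and a gravity value
// to the __device__ symbols, runs the kernel, and reads both results back.
cudaError_t runSymbolDemo(float g, float *gravOut, float *firstOut)
{
    float *dPtr = nullptr;   // ordinary host variable that will hold a device address
    cudaError_t err = cudaMalloc(&dPtr, 4 * sizeof(float));
    if (err != cudaSuccess) return err;

    // Copy the pointer VALUE into the __device__ symbol the kernel reads.
    err = cudaMemcpyToSymbol(particles, &dPtr, sizeof(dPtr));
    if (err == cudaSuccess) err = cudaMemcpyToSymbol(gravitation, &g, sizeof(g));
    if (err == cudaSuccess) {
        doubleGravitation<<<1, 1>>>();
        err = cudaGetLastError();
    }
    // Both copies below wait for the kernel on the default stream before copying.
    if (err == cudaSuccess) err = cudaMemcpyFromSymbol(gravOut, gravitation, sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(firstOut, dPtr, sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dPtr);
    return err;
}
```

Reading gravitation back with cudaMemcpyFromSymbol, rather than printing the host-side variable, is what would show the value the kernel actually computed.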

Thanks to everyone who takes the time to help me :)

Hi smoes,

the errors in your code are:

  1. CUDA doesn't let host code use __device__ and __shared__ variables declared globally like this: host code can't set them by plain assignment, and shared variables can be used only inside a kernel.

__device__ float * particles;
__device__ int * hash;
__shared__ int particleAttributes;
__shared__ int particleCount;
__shared__ int timestep;
__shared__ float gravitation;

  2. We have to pass the buffers and variables to the kernel as parameters instead.

__global__ void moveAcc();

Change it like this:

__global__ void moveAcc(float * particles, int timestep, float gravitation, int particleAttributes);
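A minimal sketch of the file reworked along these lines, assuming each particle's record is laid out as [px, py, pz, vx, vy, vz, ...] with particleAttributes floats per particle and that timestep is in milliseconds. Two changes go beyond the answer and rest on those assumptions: velocities are indexed relative to the particle's own record (x * particleAttributes + 3 and so on) instead of particles[x+3], which for x > 0 reads the wrong slot, and the launch uses several blocks with a bounds check so particleCount is not limited to one block. Host-side state lives in ordinary host variables; d_particles, the h_* variables and freeCudaMem are hypothetical names, and functions return error codes instead of using cutilSafeCall. The trailing __syncthreads() is dropped because no thread reads another thread's data.

```cuda
#include <cuda_runtime.h>

// Each thread updates one particle; all inputs arrive as kernel parameters.
__global__ void moveAcc(float *particles, int particleCount,
                        int particleAttributes, int timestep, float gravitation)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    if (x >= particleCount) return;                   // surplus threads in the last block

    float sec = 1000.0f / (float) timestep;           // steps per second (timestep in ms, assumed)
    float *p = particles + x * particleAttributes;    // this particle's record

    // update positions
    p[0] += p[3] / sec;
    p[1] += p[4] / sec;
    p[2] += p[5] / sec;

    // update speed (gravitation)
    p[5] -= gravitation / sec;
}

// Hypothetical host-side state: plain host variables, no __device__/__shared__.
static float *d_particles = nullptr;   // device buffer, only dereferenced in kernels
static int   h_particleCount = 0;
static int   h_particleAttributes = 0;
static int   h_timestep = 0;
static float h_gravitation = 9.81f;

cudaError_t initCudaMem(const float *f, int length, int partAttr, int time)
{
    h_particleCount = length;
    h_particleAttributes = partAttr;
    h_timestep = time;
    size_t size = (size_t) length * partAttr * sizeof(float);
    cudaError_t err = cudaMalloc(&d_particles, size);   // d_particles is a host variable
    if (err != cudaSuccess) return err;
    return cudaMemcpy(d_particles, f, size, cudaMemcpyHostToDevice);
}

cudaError_t move()
{
    const int threads = 256;
    int blocks = (h_particleCount + threads - 1) / threads;
    moveAcc<<<blocks, threads>>>(d_particles, h_particleCount,
                                 h_particleAttributes, h_timestep, h_gravitation);
    cudaError_t err = cudaGetLastError();   // launch-configuration errors
    if (err != cudaSuccess) return err;
    return cudaDeviceSynchronize();         // errors raised while the kernel ran
}

cudaError_t writeBackParticles(float *f)
{
    size_t size = (size_t) h_particleCount * h_particleAttributes * sizeof(float);
    return cudaMemcpy(f, d_particles, size, cudaMemcpyDeviceToHost);
}

void freeCudaMem()
{
    cudaFree(d_particles);
    d_particles = nullptr;
}
```

Called in the same order as before (initCudaMem, move, collide, writeBackParticles), move() now reports a failing kernel itself instead of leaving it to the later memcpy. The host functions can stay inside extern "C" if the header declares them that way.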

Thanks a lot, Sijo. I thought it would be a stupid mistake :)
Have a nice day!