Problem subtracting two vectors

Hello,

I would like to write code that subtracts two vectors, i.e. vectTest[576] - vectNormal[576].

The problem is that when I execute the code, the result is wrong. This is my code.

If anybody has a solution, please help!

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>

// includes, project
#include <cutil_inline.h>
#include <assert.h>
#include <cuda.h>

// includes, kernels
//#include <subvect_kernel.cu>

#define SI 24

#define NBRE 10

float *a_d; // pointer to device memory
float *b_h;
float *res_d;
float *b_d;

int N = 576;

////////////////////////////////////////////////////////////////////////////////
// declaration, forward
__global__ void incrementArrayOnDevice(float *a, float *c, float *res, int N);

/////////////////declaration des tableaux/////////////////////////////////////
float *vectTest; //vector of test
float *vectNormal; //vector of normalization

__global__ void incrementArrayOnDevice(float *a,float *c,float *res ,int N)
{
int idx = blockIdx.x*blockDim.x + threadIdx.x;
if (idx<N) res[idx]=a[idx]-c[idx];
}

////////////////////////////////////////////////////////////////////////////////
// Program main
////////////////////////////////////////////////////////////////////////////////
int
main(int argc, char** argv)
{

vectTest =(float*)malloc(SI*SI*sizeof(float));
vectNormal =(float*)malloc(SI*SI*sizeof(float));

for(int i=0;i<576;i++) vectTest[i]=(float)i;
for(int i=0;i<576;i++) vectNormal[i]=(float)i/2;

b_h =(float*)malloc(SI*SI*sizeof(float));
size_t size = N*sizeof(float);
// allocate array on device
cudaMalloc((void **) &a_d, size);
cudaMalloc((void **) &b_d, size);
cudaMalloc((void **) &res_d, size);

// copy data from host to device
cudaMemcpy(a_d, vectTest, sizeof(float)*N, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, vectNormal, sizeof(float)*N, cudaMemcpyHostToDevice);

// do calculation on device:
// Part 1 of 2. Compute execution configuration
int blockSize = 16;
int nBlocks = N/blockSize + (N%blockSize == 0?0:1);
// Part 2 of 2. Call incrementArrayOnDevice kernel
incrementArrayOnDevice <<< nBlocks, blockSize >>> (a_d,b_d,res_d, N);
// Retrieve result from device and store in b_h
cudaMemcpy(b_h, res_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

// display result
for (int i=0;i<576;i++)
printf("b[%d]=%f\n",i,b_h[i]);
free(vectTest); cudaFree(a_d);
}

The result is only b_h[i]=0 for every element …

Check for errors after the kernel launch, for example with cudaGetLastError().
The code works for me, so it’s probably an installation issue rather than a problem with the code.

Hello tera,

When you execute the code, do you get the correct result, like 0 0.5 1 1.5 2 2.5 3 …?

Just for information: I run my CUDA code (.cu) in Code::Blocks (under Ubuntu) and also with make, and I have the same problem in both cases.

Also, I don't understand what you mean when you say it could be an installation problem.

What I do is copy your code above into a file test.cu, comment out the unneeded include of <cutil_inline.h> which I don’t have, compile it with nvcc test.cu, and run it, which gives

> ./a.out
b[0]=0.000000
b[1]=0.500000
b[2]=1.000000
b[3]=1.500000
b[4]=2.000000
b[5]=2.500000
b[6]=3.000000
b[7]=3.500000
b[8]=4.000000
b[9]=4.500000
b[10]=5.000000

and so on.

Are you able to successfully run examples from the SDK?

Hello tera,

Yes, I can run the examples from the SDK.

I ran the code again and got the correct result.

But now I would like to read the data from a file and store it in the arrays (otherwise the same code).

This is my code:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <math.h>
#include <assert.h>
#include <cuda.h>

#define SI 24 // image size 24x24
#define NBRE 10 // number of images in the database

float *a_d; // pointer to device memory
float *b_h;
float *res_d;
float *b_d;

int N =576;

void NormVect(int n);

// declaration, forward
__global__ void incrementArrayOnDevice(float *a, float *c, float *res, int N);

///////////////// array declarations /////////////////////////////////////
float *vectTest; // test vectors
float *vectNormal; // normalization vectors
float **subSpace; // test vectors

__global__ void incrementArrayOnDevice(float *a,float *c,float *res ,int N)
{
int idx = blockIdx.x*blockDim.x + threadIdx.x;
if (idx<N) res[idx]=a[idx]-c[idx];
}

int main(int argc, char **argv) {
FILE *fichier =NULL;
vectTest =(float*)malloc(SI*SI*sizeof(float));
vectNormal =(float*)malloc(SI*SI*sizeof(float));
subSpace=(float **)malloc(SI*SI*sizeof(float *));
for(int i=0;i<SI*SI;i++)
subSpace[i]=(float *)malloc(NBRE*sizeof(float));

fichier=fopen("donne.txt","r");
if(fichier !=NULL)
{
for (int i=0 ; i<SI*SI ; i++){                      //read data from donne.txt and store in vectTest[]
fscanf(fichier,"%f\t",&vectTest[i]);
fscanf(fichier,"\r\n");
 }
for (int i=0 ; i<SI*SI ; i++){
fscanf(fichier,"%f\t",&vectNormal[i]);
fscanf(fichier,"\r\n");
 }
for (int i=0 ; i<576 ; i++)
  for(int j=0; j<10 ;j++)
  fscanf(fichier,"%f\t",&subSpace[i][j]);
  fscanf(fichier,"\r\n");
fclose(fichier) ;
}
NormVect(SI*SI);
}

// subtract //////////////
void NormVect(int n) {
b_h =(float*)malloc(SI*SI*sizeof(float));
size_t size = N*sizeof(float);
// allocate array on device
cudaMalloc((void **) &a_d, size);
cudaMalloc((void **) &b_d, size);
cudaMalloc((void **) &res_d, size);
// copy data from host to device
cudaMemcpy(a_d, vectTest, sizeof(float)*N, cudaMemcpyHostToDevice);
cudaMemcpy(b_d, vectNormal, sizeof(float)*N, cudaMemcpyHostToDevice);
dim3 dimGrid2(1,1);
dim3 dimBlock2(n,1);
incrementArrayOnDevice <<< dimGrid2, dimBlock2 >>> (a_d,b_d,res_d, N);
cudaMemcpy(b_h, res_d, sizeof(float)*N, cudaMemcpyDeviceToHost);
// display result
 for (int i=0;i<n;i++)
 printf("b[%d]=%f\n",i,b_h[i]);
}

The problem is that the result is wrong; I should get results like

b[0]=13.2
b[1]=13.9
b[2]=23.1
b[3]=35.7
...

The problem is probably in reading the data from donne.txt, which is attached to this reply.

So please, can you tell me the cause of this problem?
donne.txt (64.6 KB)

Unless your device is of compute capability 2.0 or higher, your kernel will not launch because it has too many threads per block (576, where only 512 are allowed). Split the workload into multiple blocks, just as you did in your first example.

And please, check the return codes. You could have detected yourself that the kernel does not launch.
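
The advice about splitting the work into multiple blocks can be sketched on the host side. This is plain C rather than CUDA: a minimal sketch that only emulates how block and thread indices map onto array elements. The names blocks_needed and emulate_subtract are hypothetical helpers, not part of the CUDA API, and the 512-thread limit is the one quoted above for devices below compute capability 2.0.

```c
#include <assert.h>

/* Hypothetical helper: number of blocks needed to cover n elements
   with block_size threads per block, rounded up. */
int blocks_needed(int n, int block_size)
{
    assert(block_size > 0);
    return (n + block_size - 1) / block_size;
}

/* Host-side emulation of the kernel's indexing, for illustration only.
   Each (block, thread) pair computes the same idx that
   blockIdx.x * blockDim.x + threadIdx.x yields inside the kernel.
   Returns the number of elements actually written. */
int emulate_subtract(const float *a, const float *c, float *res,
                     int n, int block_size)
{
    int written = 0;
    int n_blocks = blocks_needed(n, block_size);
    for (int block = 0; block < n_blocks; block++) {
        for (int thread = 0; thread < block_size; thread++) {
            int idx = block * block_size + thread;
            if (idx < n) {   /* guard: the last block may be partly idle */
                res[idx] = a[idx] - c[idx];
                written++;
            }
        }
    }
    return written;
}
```

With N = 576, blocks_needed(576, 256) gives 3, so a launch of 3 blocks of 256 threads stays under the 512-thread limit, and the idx < N guard keeps the 192 surplus threads of the last block from writing past the end of the arrays.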

Hello tera,

Thank you. Yes, I tried the grid and block configuration from the first example and the code works successfully.

But the problem now is how to read the data from file.txt.

Also, may I ask whether you have any idea how to compile CUDA code in Code::Blocks?
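
On reading the data from the text file, here is a minimal sketch in plain C, assuming the file holds whitespace-separated floating-point values (the exact layout of donne.txt is not shown in this thread, so that is an assumption). The helper name read_floats is hypothetical. The main point is to check the return value of fscanf so that a short or malformed file is detected instead of leaving the arrays partly uninitialized.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read up to count floats from an open text file
   into dst. Returns how many values were actually parsed, so the caller
   can detect a short or malformed file. */
int read_floats(FILE *fp, float *dst, int count)
{
    assert(fp != NULL && dst != NULL);
    int got = 0;
    /* "%f" skips leading whitespace (spaces, tabs, \r, \n) on its own,
       so no separate fscanf call for line endings is needed. */
    while (got < count && fscanf(fp, "%f", &dst[got]) == 1)
        got++;
    return got;
}
```

In the program above, one could call read_floats(fichier, vectTest, SI*SI) and compare the return value with SI*SI before copying the data to the device; a smaller value means the file ended early or contained something that is not a number.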
