core dumped on accessing memory

Hi !

Can we call printf() on data allocated in device memory?

Here is my code, please help me understand where the problem is.

/* Includes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <assert.h>
// CUDA library
#include <cutil.h>
// header file
#include "full_scan_2.h"
// kernel code
#include ""

void read_file(char *nom, struct data *db);
// loads the data
int formated(int n, char c);
// small helper that prints the character 'c' 'n' times
void runProg(int argc, char **argv);
// Device code
__global__ void scan_ORF();

int main(int argc, char **argv)
fprintf(stderr, "usage : ./<nom_prog> *.tfa\n");

runProg(argc, argv);

CUT_EXIT(argc, argv);

void runProg(int argc, char **argv)
// start the GPGPU card


int i/*, j, k, l, m, n*/; // loop variables etc.
int nb_files = argc-1;
unsigned int timer = 0;
unsigned long mem_size_db = 1000 * nb_files;
unsigned int mem_size_fasta = 0;
for(i=0; i<nb_files; i++)
    mem_size_fasta += strlen(argv[i+1]);
mem_size_fasta *= nb_files;

char **h_n_fasta; // array of FASTA file names
h_n_fasta = (char **)calloc(nb_files, sizeof(char *));
assert(h_n_fasta != NULL);

for(i=0; i<nb_files; i++){
    h_n_fasta[i] = (char *)calloc(strlen(argv[i+1]),sizeof(char));
    assert(h_n_fasta[i] != NULL);
    h_n_fasta[i] = strcpy(h_n_fasta[i],argv[i+1]);
}

struct data **h_db;
h_db = (struct data **) malloc(sizeof(struct data *) * nb_files);
for(i=0; i<nb_files; i++)
    h_db[i] = (struct data *) malloc(sizeof(struct data) * TAILLE_D);

struct data *d_db;
CUDA_SAFE_CALL(cudaMalloc((void **)&d_db, mem_size_db));
char *d_n_fasta;
CUDA_SAFE_CALL(cudaMalloc((void **)&d_n_fasta, mem_size_fasta));

CUDA_SAFE_CALL(cudaMemcpy(d_n_fasta, h_n_fasta, mem_size_fasta, cudaMemcpyHostToDevice));


printf("\nfichier : %d\n", d_n_fasta[0]); // the problem might be HERE !

for(i=0; i<nb_files; i++)
    read_file(h_n_fasta[i], h_db[i]);

// Check kernel execution error
CUT_CHECK_ERROR("L'execution du kernel code a echouee\n");

CUDA_SAFE_CALL(cudaMemcpy(d_db, h_db, mem_size_db, cudaMemcpyDeviceToHost));

printf("Processing time : %f (ms)\n", cutGetTimerValue(timer));

for(i=0; i<nb_files; i++)
for(i=0; i<nb_files; i++)


Greetings,


Device memory exists on the device. You must copy values to host memory before you can print them.

… but printf() works in emulation mode.

Find examples of the device emulation macros in the sample projects to see more.

Yes, of course it works in emulation mode, since the memory is then allocated in system RAM rather than on the device.

Hi !

I have found an example of the device emulation macros in the Monte Carlo project, thanks a lot! :D

In fact, the printf() is only there to test the copy (cudaMemcpy) from host to device.

When I try to print d_n_fasta (allocated with cudaMalloc) on screen, I get a core dump (segmentation fault).

Is it because I must put the "device emulation" macro definition before these instructions, or is there another reason?

Another question:

Don't I need to declare the results matrix as __device__ type in order to allocate it in GPU global memory?

You should simply never reference an array allocated on the device from the host (and vice versa).
In emulation mode you can reference host memory from device code, but it will crash when run in release mode. For debugging purposes you could do what you want to do, but personally I prefer to copy the data back with cudaMemcpyDeviceToHost and then print it from host memory.
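
A related point about the copy in the first post: h_n_fasta is an array of pointers, so a single flat copy of it moves pointer values rather than the file names themselves. Below is a minimal sketch in plain C, assuming the goal is to send the names as one block: pack them into a contiguous buffer first. The helper pack_names and its parameters are hypothetical and not part of the posted code.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not from the original post).
 * Packs `count` NUL-terminated strings into one contiguous buffer.
 * On success returns the buffer (the caller frees it), stores the total
 * byte count in *total and the start of string i in offsets[i].
 * Returns NULL if the allocation fails. */
char *pack_names(char **names, int count, size_t *offsets, size_t *total)
{
    assert(names != NULL && offsets != NULL && total != NULL);

    size_t size = 0;
    for (int i = 0; i < count; i++)
        size += strlen(names[i]) + 1;   /* +1 keeps each terminating NUL */

    char *flat = (char *)malloc(size > 0 ? size : 1);
    if (flat == NULL)
        return NULL;

    size_t pos = 0;
    for (int i = 0; i < count; i++) {
        size_t len = strlen(names[i]) + 1;
        offsets[i] = pos;               /* where name i starts in the block */
        memcpy(flat + pos, names[i], len);
        pos += len;
    }
    *total = size;
    return flat;
}
```

The resulting buffer and total are what one would then hand to a single host-to-device copy as source and size; offsets[i] locates name i inside the block.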

Thank you, I shall use your advice as wisely as possible.

For the time being I have another cudaMemcpy error.

I have searched the forum for a clue, unsuccessfully so far.

The problem is: I am trying to copy a matrix into another matrix like this:


  t_mot *h_Wd;

  h_Wd = (t_mot *) malloc(sizeof(t_mot) * TAILLE_M);


Here, some computations store their results in h_Wd (passed as a pointer argument to a void function).


  unsigned int small_size = TAILLE_M / TAILLE_H;

  h_Wd_small = (t_mot **) malloc(TAILLE_H * sizeof(t_mot *));
  assert(h_Wd_small != NULL);

  for(i=0; i<TAILLE_H; i++)
    {
      h_Wd_small[i] = (t_mot *) malloc(small_size * sizeof(t_mot));
      assert(h_Wd_small[i] != NULL);
    }

  CUDA_SAFE_CALL(cudaMemcpy(h_Wd_small, h_Wd,
       TAILLE_H * small_size * sizeof(t_mot),
       cudaMemcpyHostToHost));



After compiling and running in emudebug mode I get a core dump (segmentation fault) on this cudaMemcpy (something wrong with the pointers?).

Please look at my problem and help me find a solution, I am out of ideas!


I am thinking of a simple solution for this.

Obviously, you can't fool cudaMemcpy (even HostToHost) by giving it a (t_mot **) where it expects a (t_mot *).
The copy treats its destination as one flat block; it does not follow the row addresses stored in the (t_mot **) array of pointers.

I need to change my code in order to avoid this.
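
One way to make that change, sketched in plain C under the assumption that h_Wd holds TAILLE_H * small_size elements laid out row after row: copy one row at a time, so that the destination of each copy is a row buffer rather than the table of pointers. elem_t and split_rows are hypothetical stand-ins (the real t_mot lives in the poster's header), and memcpy stands in for the host-to-host cudaMemcpy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in element type; an assumption, not the poster's t_mot. */
typedef struct { int value; } elem_t;

/* Hypothetical helper: splits a flat array of rows * cols elements into
 * separately allocated rows. Returns NULL on allocation failure. */
elem_t **split_rows(const elem_t *flat, size_t rows, size_t cols)
{
    assert(flat != NULL && rows > 0 && cols > 0);

    elem_t **out = (elem_t **)malloc(rows * sizeof(elem_t *));
    if (out == NULL)
        return NULL;

    for (size_t i = 0; i < rows; i++) {
        out[i] = (elem_t *)malloc(cols * sizeof(elem_t));
        if (out[i] == NULL) {
            while (i > 0)               /* release the rows already made */
                free(out[--i]);
            free(out);
            return NULL;
        }
        /* One copy per row: the destination is the row buffer out[i],
         * never the pointer table `out` itself. */
        memcpy(out[i], flat + i * cols, cols * sizeof(elem_t));
    }
    return out;
}
```

With the names from the post, the call would look like split_rows(h_Wd, TAILLE_H, small_size), each row receiving small_size elements.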

Here I report an attempt to copy (t_mot *h_word_small[i]) from host memory (CPU memory?) into (t_mot *d_Wd) in device memory (GPU global memory?).

Access to the host memory goes wrong when attempting to copy its contents into device memory. I can't say much more about it, other than by giving a piece of code and the emudebug and gdb output. ^^



  for(i=0; i<TAILLE_H; i++)
    {
      h_word_small[i] = (t_mot *) malloc(small_size * sizeof(t_mot));
      assert(h_word_small[i] != NULL);

      for(j=0; j<small_size; j++)
        {
          h_word_small[i][j].seq = (t_lettre *) malloc(TAILLE_MOT * sizeof(t_lettre));
          assert(h_word_small[i][j].seq != NULL);

          for(k=0; k<TAILLE_MOT; k++)
            {
              h_word_small[i][j].seq[k].nt |= h_Wd[i*small_size+j].seq[k].nt;
              // printf("copy %d\n", h_word_small[i][j].seq[k].nt);
            }
        }
    }

  for(i=0; i<nb_req; i++)
    read_query(h_Query[i], h_Rq[i]);

  for(i=0; i<nb_req; i++)
    mem_result[i] = h_Rq[i]->taille;

  h_Out = (unsigned int **) malloc(TAILLE_H * sizeof(unsigned int *));
  for(i=0; i<TAILLE_H; i++)
    {
      h_Out[i] = (unsigned int *) malloc(mem_result[0] * sizeof(unsigned int));
      assert(h_Out[i] != NULL);

      for(j=0; j<mem_result[0]; j++)
        h_Out[i][j] = 0;
    }

  printf("device_init : requete test (%d nt) \n", mem_result[0]);


  CUDA_SAFE_CALL(cudaMalloc((void **) &d_Rq, mem_result[0] * sizeof(t_lettre *)));
  CUDA_SAFE_CALL(cudaMemcpy(d_Rq, h_Rq[0],
       sizeof(struct query),
       cudaMemcpyHostToDevice));

  for(i=0; i<TAILLE_H; i++)
    {
      CUDA_SAFE_CALL(cudaMalloc((void **) &d_Wd, sizeof(t_mot)*small_size));
      CUDA_SAFE_CALL(cudaMalloc((void **) &d_Out, mem_result[0]*sizeof(unsigned int)));

      printf("copy %d %d %d\n", 

      CUDA_SAFE_CALL(cudaMemcpy(d_Wd, h_word_small[i],
           small_size * sizeof(t_mot),
           cudaMemcpyHostToDevice));

      printf("test bits : \n");
      printf("host   : %#2X %#2X %#2X\n", h_Rq[0][1], h_Rq[0][2], h_Rq[0][3]);
      printf("device : %#2X %#2X %#2X\n", d_Rq[1], d_Rq[2], d_Rq[3]);
    }



emudebug execution

$ ./full_scan_3 firmicutes_5.tfa *.seq

0 3 2

device_init : requete test (1103 nt) 

Erreur de segmentation (core dumped)


Program received signal SIGSEGV, Segmentation fault.

[Switching to Thread 47099714733504 (LWP 18176)]

0x0000000000404c06 in _Z7runProgiPPc (argc=32, argv=0x7fff6bf49288) at

(gdb) bt

#0  0x0000000000404c06 in _Z7runProgiPPc (argc=32, argv=0x7fff6bf49288) at

#1  0x00000000004041f3 in main (argc=32, argv=0x7fff6bf49288) at

	CUDA_SAFE_CALL(cudaMemcpy(d_Wd, h_word_small[i], 

                                  small_size * sizeof(t_mot), 


Is there something wrong with the order of execution of these lines of code?

I wonder if I should change this order to make it work properly.

Or maybe there is something wrong with the pointers, but I can't see what!

As I suspected, the order of allocations is very important.
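
A further thing that may be worth checking, given that each t_mot in the code above carries its own malloc'ed seq pointer: a flat copy of the t_mot array copies those host pointers, not the letters behind them. A minimal sketch in plain C, assuming fixed-length words, gathers all letters into one contiguous buffer that a single copy can then move. letter_t, word_t and flatten_words are hypothetical stand-ins for t_lettre, t_mot and whatever helper the real program would use.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in types; assumptions modelled on the fields used in the post. */
typedef struct { unsigned char nt; } letter_t;   /* plays the role of t_lettre */
typedef struct { letter_t *seq; } word_t;        /* plays the role of t_mot */

/* Hypothetical helper: gathers the letters of n_words words, each of
 * word_len letters, into one contiguous block. Word j occupies
 * flat[j * word_len] .. flat[(j + 1) * word_len - 1].
 * Returns NULL on allocation failure; the caller frees the result. */
letter_t *flatten_words(const word_t *words, size_t n_words, size_t word_len)
{
    assert(words != NULL && n_words > 0 && word_len > 0);

    letter_t *flat = (letter_t *)malloc(n_words * word_len * sizeof(letter_t));
    if (flat == NULL)
        return NULL;

    for (size_t j = 0; j < n_words; j++) {
        assert(words[j].seq != NULL);
        /* Follow the seq pointer on the host and copy the letters themselves. */
        memcpy(flat + j * word_len, words[j].seq, word_len * sizeof(letter_t));
    }
    return flat;
}
```

The block returned here contains no pointers, so its n_words * word_len * sizeof(letter_t) bytes mean the same thing wherever they are copied.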