Passing a dynamic array in C++ to CUDA kernel

Hi all,

I’m having a problem passing a dynamically allocated array from C++ into a CUDA kernel. Here is a snippet of the program:

char *pWordListArray[WORDLENGTH] ; // Dynamic allocated array with a very large number of WORDLENGTH
char *dev_pWordListArray; // Pointer to the GPU (device) memory

// Allocate memory
const int dev_pWordListArray_sizeof = (WORDLENGTH)*sizeof(char);
cudaMalloc((void**)&(dev_pWordListArray), dev_pWordListArray_sizeof);

// Copy from Host to Device
cudaMemcpy(dev_pWordListArray, pWordListArray, dev_pWordListArray_sizeof, cudaMemcpyHostToDevice);

// Launch the kernel
insert <<< 1, 1>>>(dev_pWordListArray);

// Inside __global__ function insert

__global__ void insert(char *dev_pWordListArray) {

 for(int i=0; i < 5; i++) {
	 printf("%s\n", dev_pWordListArray[i]);
 }
}

When my program reaches the __global__ function, it is killed unexpectedly and cannot print the strings stored in dev_pWordListArray:

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application’s support team for more information.
Press any key to continue . . .

I would expect the program above to produce the same output as the standard C++ code below:

for(int i=0; i < 5; i++) {
printf("%s\n", pWordListArray[i]);
}

Please check the array types in your code:

pWordListArray is ‘char**’
pWordListArray[i] is ‘char*’
dev_pWordListArray is ‘char*’
dev_pWordListArray[i] is type ‘char’

Therefore, when you print ‘dev_pWordListArray[i]’ inside the kernel, you are passing a single char where printf expects a pointer. Check your cudaMemcpy as well: you are copying the array of string pointers ‘pWordListArray’ (not the contents of the strings) into a single string ‘dev_pWordListArray’. If your intention was to cudaMemcpy the pointers and not their contents, the CPU memory has to be allocated as mapped memory to be accessible from the device.
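
If the goal is to copy the contents, one option is to pack all strings into a single contiguous buffer and pass the kernel a table of start offsets. The sketch below is only an illustration under that assumption: the names print_words, print_words_on_gpu, flat and offsets are hypothetical and do not come from the original program.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Kernel: word i starts at flat[offsets[i]] and is NUL-terminated.
__global__ void print_words(const char *flat, const int *offsets, int count) {
    for (int i = 0; i < count; ++i) {
        printf("%s\n", flat + offsets[i]);
    }
}

// Hypothetical helper: packs the host strings into one buffer, copies the
// buffer and the offset table to the device, and launches the kernel.
// Returns the first CUDA error encountered, or cudaSuccess.
cudaError_t print_words_on_gpu(char *const *words, int count) {
    std::vector<char> flat;     // all strings back to back, each with its '\0'
    std::vector<int> offsets;   // start index of each string inside `flat`
    for (int i = 0; i < count; ++i) {
        offsets.push_back(static_cast<int>(flat.size()));
        flat.insert(flat.end(), words[i], words[i] + std::strlen(words[i]) + 1);
    }

    char *dev_flat = nullptr;
    int *dev_offsets = nullptr;
    const size_t offset_bytes = offsets.size() * sizeof(int);

    cudaError_t err = cudaMalloc((void **)&dev_flat, flat.size());
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_offsets, offset_bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dev_flat, flat.data(), flat.size(), cudaMemcpyHostToDevice);
    if (err == cudaSuccess)
        err = cudaMemcpy(dev_offsets, offsets.data(), offset_bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        print_words<<<1, 1>>>(dev_flat, dev_offsets, count);
        err = cudaGetLastError();            // reports launch failures
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // waits and flushes device printf

    cudaFree(dev_offsets);
    cudaFree(dev_flat);
    return err;
}
```

Calling print_words_on_gpu(pWordListArray, 5) from main would stand in for the original launch. With a single allocation and a single copy for the text, the kernel never has to dereference a host pointer.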

My intention is to copy the content. In this case, what is the best way to solve this problem?
Any input?

Also, how can I allocate the CPU memory as mapped memory so that it is accessible from the device?
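
For reference, a minimal sketch of mapped (zero-copy) host memory using cudaHostAlloc with the cudaHostAllocMapped flag and cudaHostGetDevicePointer. The helper name print_from_mapped_memory and the kernel print_mapped are hypothetical. The sketch assumes a device that can map host memory; older setups may also need cudaSetDeviceFlags(cudaDeviceMapHost) before any other CUDA call.

```cuda
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

// Kernel reads the string directly from mapped host memory.
__global__ void print_mapped(const char *text) {
    printf("%s\n", text);
}

// Hypothetical helper: places `word` in pinned, mapped host memory and lets
// the kernel read it in place, with no cudaMemcpy.
cudaError_t print_from_mapped_memory(const char *word) {
    const size_t bytes = std::strlen(word) + 1;
    char *host_buf = nullptr;   // host-side address of the pinned buffer
    char *dev_view = nullptr;   // device-side address of the same buffer

    cudaError_t err = cudaHostAlloc((void **)&host_buf, bytes, cudaHostAllocMapped);
    if (err != cudaSuccess) return err;
    std::memcpy(host_buf, word, bytes);

    // Ask the runtime for the address the device must use for this buffer.
    err = cudaHostGetDevicePointer((void **)&dev_view, host_buf, 0);
    if (err == cudaSuccess) {
        print_mapped<<<1, 1>>>(dev_view);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();  // kernel must finish before freeing

    cudaFreeHost(host_buf);
    return err;
}
```

Every access from the kernel goes to host memory over the bus, so this tends to suit data that is read once rather than a large word list that is scanned repeatedly.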

I made some modifications to the code, but it still doesn't work:

char *pWordListArray [WORDLENGTH];
char *dev_pWordListArray[WORDLENGTH];

const int dev_pWordListArray_sizeof = (WORDLENGTH)*sizeof(char);
cudaMalloc((void**)&(dev_pWordListArray), dev_pWordListArray_sizeof);

for(int i=0; i < 6; i++) {
cudaMemcpy(dev_pWordListArray[i], pWordListArray[i], (strlen(pWordListArray[i]) + 1), cudaMemcpyHostToDevice);
}

// Launch the kernel
insert <<< 1, 1>>>(dev_pWordListArray);

// Inside __global__ function insert

__global__ void insert(char **dev_pWordListArray) {

for(int i=0; i < 6; i++) {
printf("%s\n", dev_pWordListArray[i]);
}
}

However, passing the array of strings to a function works in the standard C++ flow:

// Call function printit
printit(pWordListArray);

// function printit

void printit(char **word) {

for(int i=0; i < 6; i++) {
	printf("%s, %d\n", word[i], (strlen(word[i]) + 1));
}

}

Also, following the existing thread (https://devtalk.nvidia.com/default/topic/498920/cudamemcpy-seg-fault-segmentation-fault-copying-array/), I tried allocating memory as below:

for(int i=0; i < 6; i++) {
cudaMalloc((void**)&dev_pWordListArray[i], (strlen(pWordListArray[i]) + 1)*sizeof(char));
}

It still doesn't work and gives me an error.
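
One likely cause is that dev_pWordListArray is itself a host array: the per-string cudaMalloc calls fill it with device addresses, but the kernel then receives the host address of that table. A possible way to complete the approach is to copy the pointer table to the device as well and to check every return code. The sketch below assumes this; insert_on_gpu, staged and dev_words are hypothetical names, and the kernel takes an extra count parameter that the original does not have.

```cuda
#include <cstdio>
#include <cstring>
#include <vector>
#include <cuda_runtime.h>

// Kernel: dev_words is a device array of device pointers.
__global__ void insert(char **dev_words, int count) {
    for (int i = 0; i < count; ++i) {
        printf("%s\n", dev_words[i]);
    }
}

// Hypothetical helper: deep-copies `count` host strings to the device.
cudaError_t insert_on_gpu(char *const *words, int count) {
    std::vector<char *> staged(count, nullptr);  // host array of device addresses
    char **dev_words = nullptr;                  // device copy of that array
    cudaError_t err = cudaSuccess;

    // Step 1: one device buffer per string, filled with the string's bytes.
    for (int i = 0; i < count && err == cudaSuccess; ++i) {
        const size_t bytes = std::strlen(words[i]) + 1;
        err = cudaMalloc((void **)&staged[i], bytes);
        if (err == cudaSuccess)
            err = cudaMemcpy(staged[i], words[i], bytes, cudaMemcpyHostToDevice);
    }

    // Step 2: copy the table of device addresses to the device.
    const size_t table_bytes = count * sizeof(char *);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_words, table_bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(dev_words, staged.data(), table_bytes, cudaMemcpyHostToDevice);

    // Step 3: launch, then surface launch and execution errors.
    if (err == cudaSuccess) {
        insert<<<1, 1>>>(dev_words, count);
        err = cudaGetLastError();
    }
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess)
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));

    cudaFree(dev_words);
    for (char *p : staged) cudaFree(p);  // cudaFree(nullptr) is a no-op
    return err;
}
```

Printing cudaGetErrorString for the first failing call usually narrows down which step is at fault, which is more informative than the generic runtime termination message.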

I would appreciate it if someone could provide input on solving this problem.