I'm having a problem passing a dynamic array of strings from C++ into a CUDA kernel. Here is a snippet of the program:
char *pWordListArray[WORDLENGTH]; // Dynamically allocated array; WORDLENGTH is very large
char *dev_pWordListArray; // Pointer to the GPU (device) memory
// Allocate memory
const int dev_pWordListArray_sizeof = (WORDLENGTH)*sizeof(char);
cudaMalloc((void**)&(dev_pWordListArray), dev_pWordListArray_sizeof);
// Copy from Host to Device
cudaMemcpy(dev_pWordListArray, pWordListArray, dev_pWordListArray_sizeof, cudaMemcpyHostToDevice);
// Launch the kernel
insert <<< 1, 1>>>(dev_pWordListArray);
// Inside the __global__ function insert
__global__ void insert(char *dev_pWordListArray) {
    for(int i=0; i < 5; i++) {
        printf("%s\n", dev_pWordListArray[i]);
    }
}
When my program reaches the __global__ function, it is killed unexpectedly and does not print the strings stored in dev_pWordListArray. The output is:
This application has requested the Runtime to terminate it in an unusual way.
Please contact the application’s support team for more information.
Press any key to continue . . .
I would expect the program above to produce the same output as the standard C++ code below:
for(int i=0; i < 5; i++) {
    printf("%s\n", pWordListArray[i]);
}
pWordListArray is ‘char**’
pWordListArray[i] is ‘char*’
dev_pWordListArray is ‘char*’
dev_pWordListArray[i] is type ‘char’
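Therefore, when you print ‘dev_pWordListArray[i]’ inside the kernel, you pass a single char where the %s format expects a char* pointer. The device then treats that character value as an address, which most likely causes an illegal memory access and kills the kernel. Check your cudaMemcpy as well: it copies the array of host pointers ‘pWordListArray’ (not the strings they point to) into the single device string ‘dev_pWordListArray’. Because the size is WORDLENGTH*sizeof(char) rather than WORDLENGTH*sizeof(char*), it copies only part of that pointer array, and the pointers it does copy are ordinary host addresses. If your intention was to cudaMemcpy the pointers and not their content, the CPU memory has to be allocated as mapped memory to be accessible from the device.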
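One common way to copy the string contents instead of host pointers is to pack all the words into one contiguous, NUL-separated buffer plus an offset table. Both are flat arrays of plain data, so each can be copied to the device with a single cudaMemcpy, and the kernel can then print `chars + offsets[i]`, which is a real char* as %s expects. Below is a minimal host-side sketch, assuming the words are ordinary NUL-terminated strings. `PackedWords`, `packWords` and `printWords` are hypothetical names for illustration, not part of CUDA.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical container: all words stored back to back in one buffer,
// plus the index where each word starts.
struct PackedWords {
    std::vector<char> chars;   // e.g. "apple\0banana\0"
    std::vector<int>  offsets; // start index of each word inside chars
};

// Hypothetical helper: copies the *contents* of count C strings into
// one contiguous buffer, recording where each word begins.
PackedWords packWords(const char* const* words, int count) {
    PackedWords packed;
    for (int i = 0; i < count; ++i) {
        packed.offsets.push_back(static_cast<int>(packed.chars.size()));
        for (const char* p = words[i]; *p != '\0'; ++p) {
            packed.chars.push_back(*p);
        }
        packed.chars.push_back('\0'); // keep each word NUL-terminated
    }
    return packed;
}

// The same loop a kernel would run on the device copies of the two
// arrays: chars + offsets[i] points at the i-th word, so "%s" is valid.
void printWords(const char* chars, const int* offsets, int count) {
    for (int i = 0; i < count; ++i) {
        std::printf("%s\n", chars + offsets[i]);
    }
}
```

To move this to the GPU, the two vectors would be copied with cudaMemcpy(..., cudaMemcpyHostToDevice) into device buffers of chars.size()*sizeof(char) and offsets.size()*sizeof(int) bytes, and a kernel taking (const char*, const int*, int) would run the same loop as printWords. For example, packing {"apple", "banana"} gives offsets {0, 6} and a 13-byte buffer.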