Memory access violation when reading a text file and sending its content to the GPU

Silvernight1 · March 5, 2012, 8:38am

Hello everyone,

I’m quite new to C++ and have just started CUDA programming as well for my master’s thesis, but so far all I have encountered is thousands of different problems.

For days now I have been struggling to get a program to read a text file with a list of words (patterns) and then send it to the GPU for further processing (pattern matching against some other text).

For this I found a text file of “bad words”, and I want to try running a kernel for each line of this file.

(The goal of the program as a whole is to be a tiny prototype of an IDS that scans network packets for known signatures, like Snort; however, that seems a few years away for now.)

I have tried many sorts of guides, but none of them manage to send a dynamic character array (or a list of character arrays) to the GPU.

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include <stdio.h>
#include <list>
#include <iostream>
#include <fstream>
#include <string>

using namespace std;

struct Dict{
    char *list;
    int offsets[480]; //Can this be made dynamically? its 480 patterns in the badwords.txt
    int max_len;
    int num_patterns;

    void genDictionary(const char * filename){
        max_len =0;
        num_patterns =0;
        char * cstr;
        string temp, line;
        int i =0;
        int curr_offset =0;
        ifstream myfile (filename);
        if (myfile.is_open())
        {
            while ( myfile.good() )
            {
                getline (myfile, line);
                cstr = new char[line.size()+1];
                strcpy(cstr, line.c_str());
                temp += cstr;
                curr_offset +=line.size();
                offsets[i] = curr_offset;
                num_patterns++;
                if (max_len < line.size()) max_len = line.size();
                i++;
            }
            myfile.close();
        }
        list = new char[temp.size()+1];
        strcpy(list, temp.c_str());
    }
};

__global__ void runKernel( char** g_list, int *g_offset)
{
    printf("Hello thread %d\n", threadIdx.x);
    //Do something with the input
}

void runEngine(){
    Dict dictionary;
    dictionary.genDictionary("badwords.txt");
    int num_threads = dictionary.num_patterns;
    int mem_size_list = sizeof(char) *strlen(dictionary.list);
    int mem_size_offsets = sizeof(int)* sizeof(dictionary.offsets);

    // allocate device memory for list
    char** d_list;
    cudaMalloc( (void**) &d_list, mem_size_list);
    cudaMemcpy( d_list, dictionary.list, mem_size_list, cudaMemcpyHostToDevice) ;

    //// allocate device memory for offset
    int* d_offset;
    cudaMalloc( (void**) &d_offset, mem_size_offsets);
    cudaMemcpy( d_offset, dictionary.offsets, mem_size_offsets, cudaMemcpyHostToDevice) ;

    // copy host memory to device
    runKernel<<<1,1>>>(d_list, d_offset);  //start 1 kernel just for testing
}

int main(){
    runEngine();
    cudaDeviceReset();
    return 0;
}

Can anyone please see if they can locate the error, or give me a link to some guides that handle lists of character arrays?

Appreciate all efforts made!

I have attached my whole project (a VS 2010 project) together with badwords.txt.

Windows 7 64bit (program 32bit)

CUDA 4.1

SDK 4.1

Compiled for compute capability 2.0.
Test.rar (3.05 MB)

A.Goude · March 5, 2012, 12:20pm

Hello

According to my understanding of your solution, you are trying to create one long character array containing all patterns, and then use an offset variable to determine where each pattern starts. In principle, this should be a good approach.
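Here is a minimal sketch of how a thread could locate its own pattern in such a flat layout. It assumes an offsets table with num_patterns + 1 entries, where offsets[k] is the start of pattern k and the last entry is the total length. This convention is an assumption; the posted genDictionary stores end offsets instead. PatternView, getPattern and the HD macro are hypothetical names. HD expands to __host__ __device__ under nvcc and to nothing otherwise, so the helper can also be tested on the CPU.

```cpp
// HD: hypothetical helper macro so the function compiles with nvcc
// (as __host__ __device__) and with an ordinary C++ compiler.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Assumed layout: all patterns concatenated into one char buffer with no
// separators, plus offsets[0..numPatterns] where offsets[k] is the start
// of pattern k and offsets[numPatterns] is the total length.
struct PatternView {
    const char* data;  // first character of the pattern (not '\0'-terminated)
    int length;        // number of characters in the pattern
};

HD inline PatternView getPattern(const char* flat, const int* offsets, int k) {
    PatternView v;
    v.data = flat + offsets[k];
    v.length = offsets[k + 1] - offsets[k];
    return v;
}
```

Inside a kernel, k would typically be the global thread index threadIdx.x + blockIdx.x * blockDim.x, checked against the number of patterns before use.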

I noticed some errors in the memory allocation part of the code. First, since you have merged all patterns into one long string, this one should be of type char* and not char**, which is an array of pointers.

Second, every time you allocate memory using malloc or cudaMalloc, the size is given in bytes, so you need to multiply the number of elements in the array by the size of the type you want to allocate, i.e.

cudaMalloc( (void**) &d_offset, dictionary.num_patterns*sizeof(int));

The same applies to cudaMemcpy.
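To make the byte-count rule concrete for the posted code: sizeof(dictionary.offsets) already evaluates to 480 * sizeof(int) bytes (1920 with a 4-byte int). So sizeof(int) * sizeof(dictionary.offsets) requests 7680 bytes, and the cudaMemcpy then reads far past the end of the 1920-byte host array, which is a plausible source of the access violation. The sketch below uses hypothetical names (bytesFor, DictOffsets) and just spells out two rules: bytes = count * sizeof(T), and count = sizeof(array) / sizeof(array[0]).

```cpp
#include <cstddef>

// cudaMalloc and cudaMemcpy take sizes in bytes: element count * element size.
template <typename T>
constexpr std::size_t bytesFor(std::size_t count) {
    return count * sizeof(T);
}

// Mirrors the fixed-size member in the question's Dict struct.
struct DictOffsets {
    int offsets[480];
};

// sizeof applied to an array is already a byte count, so multiplying it by
// sizeof(int) again makes the size sizeof(int) times too large.
// The element count, if needed, is sizeof(array) / sizeof(array[0]).
static_assert(sizeof(DictOffsets::offsets) == bytesFor<int>(480),
              "sizeof on an array already yields bytes");
```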

Besides these errors, if you allocate arrays dynamically with new[], you should remember to call delete[] on them when you have finished using them.

If you want to use dynamic memory allocation for the offsets variable, you either need to let your program count the number of lines in advance before starting to read them, or you could allocate a suitable size, and when that size is too small, you allocate a larger array and copy all data to the new array.
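One way to avoid both the fixed offsets[480] and the manual new/delete is to let std::vector do the grow-and-copy described above. This is a sketch under the following assumptions: one pattern per line, blank lines skipped, and the num_patterns + 1 offsets convention (offsets[k] is the start of pattern k). FlatDict and readPatterns are hypothetical names. The sketch also tests the result of std::getline instead of myfile.good(). With the good() loop, a file ending in a newline runs the loop body one extra time with an empty line. If the file has exactly 480 lines, that extra iteration would write to offsets[480], one past the end of the array.

```cpp
#include <istream>
#include <sstream>  // lets the function be tried on an in-memory std::istringstream
#include <string>
#include <vector>

// One flat buffer plus an offsets table (offsets[k] = start of pattern k,
// offsets.back() = total length). std::vector grows as needed, so the
// number of lines need not be known in advance and nothing is new'ed.
struct FlatDict {
    std::string flat;          // all patterns concatenated, no separators
    std::vector<int> offsets;  // numPatterns + 1 entries
    int maxLen = 0;

    int numPatterns() const { return static_cast<int>(offsets.size()) - 1; }
};

FlatDict readPatterns(std::istream& in) {
    FlatDict d;
    d.offsets.push_back(0);
    std::string line;
    // Looping on getline's result stops exactly at end of file.
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
        if (line.empty()) continue;                                  // skip blank lines
        d.flat += line;
        d.offsets.push_back(static_cast<int>(d.flat.size()));
        if (static_cast<int>(line.size()) > d.maxLen)
            d.maxLen = static_cast<int>(line.size());
    }
    return d;
}
```

With std::ifstream file("badwords.txt"); FlatDict d = readPatterns(file);, the buffers to copy would be d.flat.data() (d.flat.size() bytes) and d.offsets.data() (d.offsets.size() * sizeof(int) bytes).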

Regards

Anders

Silvernight1 · March 5, 2012, 2:04pm


Thank you for the reply!

I got the char array into the kernel now, and the offsets too (at least, what I have printf’ed so far looks good).

Now to figure out how to handle output!

I would still love links to examples of char arrays with two or more dimensions, if anyone knows of any.

A.Goude · March 5, 2012, 2:35pm

In principle, you can do it like the following, assuming that you have your data in the two-dimensional array cpuarray. For simplicity, I made the array contain 10 strings:

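// cpuarray holds the 10 host strings to copy.
char* array[10];   // host array that will hold 10 *device* pointers
char** cudaarray;  // device copy of that pointer array
int strlength;

for(int k=0;k<10;k++) {
  strlength=strlen(cpuarray[k])+1;  // +1 so the '\0' terminator is copied too
  cudaMalloc( (void**) &array[k], strlength*sizeof(char));
  cudaMemcpy( array[k], cpuarray[k], strlength*sizeof(char), cudaMemcpyHostToDevice);
}

// Copy the 10 device pointers themselves to the device.
cudaMalloc( (void**) &cudaarray, 10*sizeof(char*));
cudaMemcpy( cudaarray, array, 10*sizeof(char*), cudaMemcpyHostToDevice);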

However, if you do not need to do it this way, I would not really recommend using arrays of pointers on the GPU unless it is necessary. First, it requires a lot of small memory allocations and memory copies, which are likely slower than one large allocation and copy. Besides, I have personally had problems using pointers to pointers on the GPU (see an earlier topic on The Official NVIDIA Forums), and I would say that having all data in one single array is a more reliable solution.
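On the earlier question about arrays with two or more dimensions: one layout that keeps everything in a single allocation is a fixed-width block. Every string gets a row of the same width (the longest string plus one for the terminator, i.e. max_len + 1 in the posted Dict), and row k starts at k * pitch. This is a sketch with hypothetical names (packFixedWidth, charAt, and the HD macro standing for __host__ __device__ under nvcc). It trades some padding for indexing that needs no offsets table.

```cpp
#include <cstring>
#include <string>
#include <vector>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Packs the strings into one row-major block of rows * pitch chars, where
// pitch = longest string + 1. Shorter rows are padded with '\0', so every
// row is a valid C string. The whole block needs one cudaMalloc/cudaMemcpy.
std::vector<char> packFixedWidth(const std::vector<std::string>& words, int& pitch) {
    std::size_t longest = 0;
    for (const std::string& w : words)
        if (w.size() > longest) longest = w.size();
    pitch = static_cast<int>(longest) + 1;
    std::vector<char> block(words.size() * pitch, '\0');
    for (std::size_t k = 0; k < words.size(); ++k)
        std::memcpy(&block[k * pitch], words[k].data(), words[k].size());
    return block;
}

// 2D access into the 1D block: row k, column j.
HD inline char charAt(const char* block, int pitch, int k, int j) {
    return block[k * pitch + j];
}
```

The resulting block.data() (block.size() bytes) would go through a single cudaMalloc/cudaMemcpy. For larger 2D data, the runtime's cudaMallocPitch and cudaMemcpy2D handle row padding on the device side.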

Regards

Anders

Silvernight1 · March 5, 2012, 3:25pm

Thank you for the good advice ;)

As the goal is to make it as fast as possible, one single large memcpy would be ideal, as you say (and as I’ve read in the CUDA books I have). An array of pointers would just have made the kernel code easier to handle, but I guess I’ll have to manage ;)

What I’m struggling with now is how I can get all the threads to write their results to the same output character array, if that is possible at all.

If thread x finds the pattern in the input string, it should just append a “notification” of it at the current end of g_output and continue parsing the input until it is done.

But this would create a race condition on g_output, would it not?

Is there some sort of in-kernel atomicAdd for combining character arrays?

And how would one actually go about combining the current g_output with this new notification? Is there some CUDA function like strcat from the standard library?

Example: g_output = (g_output + “notification”);?

My code currently looks something like this. However, it does not work because of the strcat, and frankly I’m not sure the rest is correct either; I’m still fresh to C:

__global__ void runKernel(char* g_pattern_list, int *g_pattern_offset, int pattern_size, char* g_packet_input, int packet_size, char * g_output)
{
    int tid = threadIdx.x + blockIdx.x * blockDim.x;
    int index = 0;
    for(int i=0; i<packet_size; i++){
        for(int j=g_pattern_offset[tid]; j< g_pattern_offset[tid+1]; j++){
            if(g_pattern_list[j] != g_packet_input[i+index]) break; //Break if not the same
            else
                index++;
            if(index == g_pattern_offset[tid+1]) //found the whole pattern
                strcat(g_output, "found pattern at: "+g_pattern_offset[tid]);
        }
    }
}

A.Goude · March 5, 2012, 3:51pm

One way you could try is to use atomicAdd on integers. Keep an offset value that marks where the next data should be inserted. Once you have your data and know how long it is, call atomicAdd on the offset value to reserve space for the data. Then you should be able to safely insert your data into the array at the offset returned by atomicAdd (the value before the addition) without any race conditions.
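Here is a minimal sketch of that reservation scheme. To keep it compilable and testable without a GPU, it uses std::atomic<int>::fetch_add on the host as a stand-in. In a kernel, atomicAdd(&counter, len) plays the same role, since it also returns the counter's old value, so each thread owns the range [pos, pos + len). appendRecord and its capacity check are hypothetical additions.

```cpp
#include <atomic>
#include <cstring>

// Host-side analog: on the GPU the reservation would be
//   int pos = atomicAdd(&d_next, len);
// Like fetch_add, that returns the value *before* the addition.
int appendRecord(std::atomic<int>& next, char* out, int capacity,
                 const char* msg, int len) {
    int pos = next.fetch_add(len);        // reserve len chars
    if (pos + len > capacity) return -1;  // buffer full: record dropped
    std::memcpy(out + pos, msg, len);     // no other writer owns this range
    return pos;
}
```

Because the counter keeps growing even when a record is dropped, reading it back afterwards tells the host whether the output buffer was too small.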

Regarding strcat, it can be implemented using a normal for-loop.
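Note also that "found pattern at: "+g_pattern_offset[tid] in the posted kernel does not append a number. It is pointer arithmetic that advances the string literal's address by that many characters. Below is a sketch of strcat-style appending and integer formatting as plain loops that could be used in device code. appendStr, appendInt, formatMatch and the HD macro (standing for __host__ __device__ under nvcc) are hypothetical names.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// strcat-style copy as a plain loop: copies src (up to its '\0') to dst
// and returns the number of characters written (terminator not written).
HD inline int appendStr(char* dst, const char* src) {
    int n = 0;
    while (src[n] != '\0') {
        dst[n] = src[n];
        ++n;
    }
    return n;
}

// Writes a non-negative integer in decimal; returns characters written.
HD inline int appendInt(char* dst, int value) {
    char digits[12];
    int n = 0;
    do {
        digits[n++] = static_cast<char>('0' + value % 10);
        value /= 10;
    } while (value > 0);
    for (int i = 0; i < n; ++i) dst[i] = digits[n - 1 - i];  // reverse order
    return n;
}

// Builds "found pattern at: <pos>\n" in dst and '\0'-terminates it.
HD inline int formatMatch(char* dst, int pos) {
    int len = appendStr(dst, "found pattern at: ");
    len += appendInt(dst + len, pos);
    dst[len++] = '\n';
    dst[len] = '\0';
    return len;  // length excluding the terminator
}
```

Combined with the atomicAdd reservation, a thread could format its message into a small local buffer, reserve the returned length in g_output, and then copy it there.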

If your intention is to write fast code, I recommend studying how GPU warps work. In principle, your goal is to keep all threads within a warp executing exactly the same code all the time. Since your kernel appears to contain several if statements etc., this is probably something you need to consider for optimal performance.
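Here is one hedged illustration of the warp advice (a sketch, not a tuned matcher). The posted inner loop breaks at the first mismatching character, so threads of a warp leave the loop at different times. The helpers below compare every character of a pattern and combine the results, so threads whose patterns have the same length follow the same control flow; patterns of different lengths still diverge. They also avoid two problems in the posted loop: index is never reset to 0, and it is compared against an absolute offset into the pattern list rather than the pattern's length. matchesAt, countMatches and HD are hypothetical names.

```cpp
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Returns true if pat[0..patLen) occurs in text at position pos.
// Every character is compared (no early break) and the results combined.
HD inline bool matchesAt(const char* text, const char* pat, int patLen, int pos) {
    bool match = true;
    for (int j = 0; j < patLen; ++j)
        match = match & (text[pos + j] == pat[j]);
    return match;
}

// Counts occurrences of one pattern in the text with a naive O(n*m) scan:
// the work one GPU thread would do for its own pattern.
HD inline int countMatches(const char* text, int textLen, const char* pat, int patLen) {
    if (patLen <= 0) return 0;
    int count = 0;
    for (int pos = 0; pos + patLen <= textLen; ++pos)
        count += static_cast<int>(matchesAt(text, pat, patLen, pos));
    return count;
}
```

In a kernel, each thread would call countMatches (or record each position where matchesAt is true) for its own pattern, after checking that its thread index is below the number of patterns.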

Regards
Anders
