I am implementing the AES algorithm in ECB mode with CUDA. My basic data type is a custom struct (Block) that contains an unsigned char[4][4] and represents one 4x4 block.
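For reference, here is a sketch of such a struct. The member name `s` is a guess, since the post doesn't show the definition. The `static_assert` records that the struct has no padding, so an array of N blocks is exactly N * 16 bytes and can be moved with a single copy.

```cuda
// One AES state: 16 bytes arranged as a 4x4 matrix.
// The member name "s" is an assumption for illustration.
struct Block {
    unsigned char s[4][4];
};

// unsigned char has alignment 1, so no padding is added:
// number_of_blocks * sizeof(Block) is exactly 16 bytes per block.
static_assert(sizeof(Block) == 16, "Block must be exactly 16 bytes");
```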
First, I read a text file and store its contents in these blocks. For example, if the text file contains 16,000 characters, I create an array of 1,000 structs (4x4 blocks), so the number of blocks is 1,000 (used later as dim3 numBlocks(1, number_of_blocks)).
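A minimal host-side sketch of that packing step. The post doesn't say how bytes are placed in the 4x4 array or how a final partial block is handled, so this assumes the FIPS-197 column-major state layout (s[r][c] = in[r + 4c]) and zero padding (real ECB code often uses PKCS#7 padding instead). `textToBlocks` is a hypothetical name.

```cuda
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct Block {
    unsigned char s[4][4];
};

// Hypothetical helper: split text into 16-byte AES blocks.
// Assumptions: bytes fill each state column by column (s[r][c] = in[r + 4*c]),
// and the last partial block is zero-padded.
std::vector<Block> textToBlocks(const std::string& text) {
    const std::size_t numBlocks = (text.size() + 15) / 16;  // ceil(n / 16)
    std::vector<Block> blocks(numBlocks);                   // value-initialized: all zero
    for (std::size_t i = 0; i < text.size(); ++i) {
        const std::size_t b = i / 16;  // which block this byte belongs to
        const std::size_t k = i % 16;  // byte offset inside that block
        blocks[b].s[k % 4][k / 4] = static_cast<unsigned char>(text[i]);
    }
    return blocks;
}
```

With this layout, 16,000 characters give exactly 1,000 blocks, and 16,001 characters would give 1,001.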
After that, I allocate memory on the host and the device for the plaintext, the ciphertext, and the encrypted text, all of the same size. Each of them is an array of structs, allocated like this:
cudaMallocHost((void**)&h_plaintext, number_of_blocks * sizeof(Block));  // pinned host memory
cudaMalloc((void**)&d_plaintext, number_of_blocks * sizeof(Block));      // device memory
Then I use cudaMemcpy to copy the data from the host to the device.
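A sketch of that allocate-and-copy step with every CUDA return code checked. `uploadBlocks` and `ok` are hypothetical names. Host and device pointers are kept in separate variables, because reusing one variable for both calls would overwrite the host pointer with the device pointer.

```cuda
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <cuda_runtime.h>

struct Block {
    unsigned char s[4][4];
};

// Report a failed CUDA call; returns true on success.
static bool ok(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

// Hypothetical helper: allocate pinned host memory and device memory for n
// blocks, fill the host buffer from src, and copy it to the GPU.
// On failure everything allocated here is freed; on success the caller owns both buffers.
bool uploadBlocks(const Block* src, std::size_t n, Block** h_buf, Block** d_buf) {
    const std::size_t bytes = n * sizeof(Block);  // n blocks of 16 bytes each
    if (!ok(cudaMallocHost((void**)h_buf, bytes), "cudaMallocHost")) return false;
    if (!ok(cudaMalloc((void**)d_buf, bytes), "cudaMalloc")) {
        cudaFreeHost(*h_buf);
        return false;
    }
    std::memcpy(*h_buf, src, bytes);
    if (!ok(cudaMemcpy(*d_buf, *h_buf, bytes, cudaMemcpyHostToDevice), "cudaMemcpy")) {
        cudaFree(*d_buf);
        cudaFreeHost(*h_buf);
        return false;
    }
    return true;
}
```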
Before calling the kernel, I set up:
dim3 threadsPerBlock(4, 4);
dim3 numBlocks(1, number_of_blocks);
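One limit worth checking against this setup (the kernel itself isn't shown, so this is a sketch under assumptions): CUDA caps gridDim.y at 65,535 on every GPU, while gridDim.x can go up to 2^31 - 1 on compute capability 3.0 and newer. With numBlocks(1, number_of_blocks) the largest launchable input is 65,535 x 16 = 1,048,560 characters, which is right around the one-million mark. The sketch below puts one CUDA block per AES block along x instead. `makeGrid` and `addRoundKeyKernel` are hypothetical names, and the kernel only performs AddRoundKey (XOR with a 16-byte round key) as a stand-in for the full sequence of AES rounds.

```cuda
#include <cassert>
#include <cuda_runtime.h>

struct Block {
    unsigned char s[4][4];
};

// One CUDA block per AES block, laid out along x, which allows far more
// than the 65,535 blocks permitted in the y dimension.
dim3 makeGrid(unsigned int numAesBlocks) {
    return dim3(numAesBlocks, 1, 1);
}

// Illustrative kernel: launched with dim3(4, 4) threads, one thread per state byte.
// Only AddRoundKey is done here, as a placeholder for real AES rounds.
__global__ void addRoundKeyKernel(Block* data, const Block* roundKey, unsigned int n) {
    const unsigned int b = blockIdx.x;  // which AES block
    if (b >= n) return;                 // guard in case the grid is rounded up
    const unsigned int r = threadIdx.y; // row 0..3
    const unsigned int c = threadIdx.x; // column 0..3
    data[b].s[r][c] ^= roundKey->s[r][c];
}
```

A launch would then look like addRoundKeyKernel<<<makeGrid(n), dim3(4, 4)>>>(d_data, d_key, n);. If a 2D grid is preferred, the flat block index becomes blockIdx.y * gridDim.x + blockIdx.x, keeping the same bounds check.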
Now here is the problem: if the input text file contains fewer than one million characters, everything works fine, but if it contains more than one million characters, the kernel returns an empty array with nothing in it.
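Since the symptom is an output array that is simply empty, it may help to check for errors right after the launch. A kernel whose launch configuration is rejected (for example, a grid with more than 65,535 blocks in y) never runs, so the output buffer keeps whatever it held before. A minimal sketch; `checkLaunch` and `emptyKernel` are hypothetical names.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Call right after a kernel launch. cudaGetLastError reports launch and
// configuration errors; cudaDeviceSynchronize reports errors raised while
// the kernel was running.
cudaError_t checkLaunch(const char* label) {
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();
    }
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
    }
    return err;
}

// Trivial kernel used only to exercise different grid shapes.
__global__ void emptyKernel() {}
```

For example, emptyKernel<<<dim3(1, 70000), dim3(4, 4)>>>(); followed by checkLaunch("y = 70000") should report an invalid configuration, while the same block count placed along x launches normally.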
What could be the problem? Is it more likely to be a memory problem or a bug in my code (I'd put the chances of the latter at 90%)?
I can provide my code later if necessary.