Allocating memory for 2D Array

Hi,

I’m completely new to CUDA and I’m trying to learn it at the moment, but I’m having some difficulties and I hope someone out there can point me in the right direction.

I’m trying to run the following code:

[codebox]unsigned char d_pixelDate = NULL;
unsigned char h_pixelDate[100][100];

//Allocate required memory on device
size_t *pitch = NULL;
cudaMallocPitch((void **)d_pixelDate, pitch, 100, 100);

//copy pixel data to the device
cudaMemcpy2D((void *)d_pixelDate, *pitch, h_pixelDate, 100, 100, 100, cudaMemcpyHostToDevice);[/codebox]

On the cudaMallocPitch line I get an error which, translated from German into English, reads something like: “unauthorised access while writing to address 0x00000000”.

I’d be very grateful if anybody can point me to why I’m getting this error in this piece of code. Thanks.

Anybody? I’m getting desperate :(

These look like straight C errors. Did you mean something like this?

[codebox]unsigned char* d_pixelDate = NULL; // Pointer to your allocated destination buffer
unsigned char h_pixelDate[100][100]; // Source data you plan to copy

//Allocate required memory on device
size_t pitch = 0; // Dst buffer pitch
cudaMallocPitch(&d_pixelDate, &pitch, 100, 100); // Alloc dst buffer, note returning both buffer pointer and pitch

//copy pixel data to the device
cudaMemcpy2D(d_pixelDate, pitch, h_pixelDate, 100, 100, 100, cudaMemcpyHostToDevice); // Copy buffers, note src and dst pitches[/codebox]

Code fixes off the top of my head, may not compile.

Hi Cudabean,

thanks for your reply.

That’s very likely, considering that I haven’t done anything in C in ages and was never that good at it to begin with.

I tried your suggestions, but I get the following error:

“argument of type “unsigned char **” is incompatible with parameter of type “void **””

That is why I cast it in my version with “(void **)d_pixelDate”. After casting, I still get the same error message.

I just corrected it to “(void **)&d_pixelDate” and it works :thumbup:

Good to hear you are having some success. I purposely removed those casts in case they hid real type conversion errors.

Now, after modifying the code a little bit, I’m facing some other problem I can’t understand… Here is the code - please notice the enlarged characters colored red, that’s the interesting part of the code:

[codebox]
unsigned char **device_pixelData = NULL;
unsigned char **host_pixelData = NULL;
unsigned char *buffer;

int height = 10;
int width = 5;

host_pixelData = new unsigned char *[height];
for (int a = 0; a < height; a++)
    host_pixelData[a] = new unsigned char[width];

buffer = new unsigned char[width];
for (int i = 0; i < width; i++) buffer[i] = i;

for (int row = (height - 1); row >= 0; row--)
{
    for (int line = 0; line < width; line++)
        host_pixelData[row][line] = buffer[line];
}

cudaError_t errorMessage1, errorMessage2;

//Allocate required memory on device
size_t pitch = 0;
size_t spitch = width*sizeof(unsigned char);
errorMessage1 = cudaMallocPitch((void **)&device_pixelData, &pitch, spitch, height);

//copy pixel data to the device
errorMessage2 = cudaMemcpy2D(device_pixelData, pitch, host_pixelData, spitch, width, height, cudaMemcpyHostToDevice);[/codebox]

I noticed that the success or failure of the code depends on the values of height and width, and I can’t understand why. Below are some examples of height and width values I tried and the errorMessages.

when:

  • height = 10 and width = 5: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess
  • height = 100 and width = 50: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess
  • height = 50 and width = 100: errorMessage1 = cudaSuccess; executing the line with errorMessage2 leads to an access violation while trying to read from position 0x00361000
  • height = 5 and width = 100: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess
  • height = 5 and width = 10: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess
  • height = 384 and width = 1536: errorMessage1 = cudaSuccess; executing the line with errorMessage2 leads to an access violation while trying to read from position 0x00361010
  • height = 1536 and width = 384: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess
  • height = 1536 and width = 1536: errorMessage1 = cudaSuccess, errorMessage2 = cudaSuccess

I can’t understand why certain configurations of height and width lead to access violations. Can anybody help point me to the source of the problem? Thanks.

You can’t use cudaMemcpy2D() to copy the kind of array of pointers you have on the host side to the device. “Pitched” memory on the device is just linear, 1D memory in which each row has been padded and aligned for optimal performance on the device.

Hi avidday,

thanks for your reply. I still don’t understand why I can’t do that. Could you be a little more explicit as to why not? I was under the impression that the pitch parameter would take care of the alignment issues. And why does it work with some configurations of height and width, but not with others? Also, what kind of arrays can I then copy from host to device using cudaMemcpy2D()?

The basic problem is that your storage host_pixelData is a one dimensional array of pointers. This is not the same as a statically declared two dimensional array, nor is it the same as a one dimensional statically declared or dynamically allocated array of the same size. Consider the following code:

[codebox]#include <string.h>
#include <stdlib.h>
#include <stdio.h>

int main()
{
    int i,j;
    int One[4][4];   /* 16 ints in one contiguous block */
    int Two[16];     /* 16 ints in one contiguous block */
    int **Three;     /* table of 4 pointers, each to a separately allocated row */

    fprintf(stdout,"Allocating Three\n");
    Three = malloc((size_t)4*sizeof(int *));
    for(i=0; i<4; i++)
        Three[i] = malloc((size_t)4*sizeof(int));

    for(i=0; i<4; i++)
        for(j=0; j<4; j++)
            Three[i][j] = j + 4*i;

    fprintf(stdout,"copying Three->One .... ");
    (void)memcpy(One, Three, 16*sizeof(int));
    fprintf(stdout,"done\n");

    for(i=0; i<4; i++)
        for(j=0; j<4; j++)
            fprintf(stdout,"i=%d j=%d One=%d\n", i, j, One[i][j]);

    fprintf(stdout,"copying Three->Two .... ");
    (void)memcpy(Two, Three, 16*sizeof(int));
    fprintf(stdout,"done\n");

    for(i=0; i<4; i++)
        for(j=0; j<4; j++)
            fprintf(stdout,"i=%d j=%d Two=%d\n", i, j, Two[j + 4*i]);

    fprintf(stdout,"Freeing Three\n");
    for(i=0; i<4; i++)
        free(Three[i]);
    free(Three);

    return 0;
}[/codebox]

You might imagine that One, Two and Three are functionally equivalent to one another, but they are not. One and Two are interchangeable. Three is not. I suggest you study the above code until you understand why it doesn’t work as you might expect it to.

Excellent illustration avidday, thanks a lot, I feel stupid.