cudaFree, segmentation fault

I have the following code:

```cpp
#include <cstdlib>
#include <cuda_runtime.h>

int main(){
  int n = 1000;
  int k = 1000000;

  float *X_h;
  X_h = (float *)malloc(n*k*sizeof(float));
  int count = 0;
  for (int i = 0; i < n; i++){
    for (int j = 0; j < k; j++){
      X_h[count] = 0.0;
      count++;
    }
  }
  float *X_d;
  cudaMalloc((void **) &X_d, n*k*sizeof(float));
  cudaMemcpy(X_d, X_h, sizeof(float)*n*k, cudaMemcpyHostToDevice);
  cudaFree(X_d);

  return(0);
}
```

After executing it, I get a “Segmentation fault” at “cudaFree(X_d)”.

Any idea why this is happening? Thanks a lot.

This code runs fine on Ocelot (http://code.google.com/p/gpuocelot/), although it uses a huge amount of memory (6-8 GB on my machine); running it on Ocelot under Valgrind reports 8,000,393,726 bytes allocated. I would guess that cudaMalloc is failing on your card. Try allocating less memory.

It fails at cudaFree. Also, when I run it in -deviceemu mode, it works, but in normal mode it fails. It seems as if one operation has not finished before the next one starts.

See this modified code, which checks the return value of every CUDA call:

```cpp
#include <cstdlib>
#include <iostream>
#include <cuda_runtime.h>

int main(){
  int n = 1000;
  int k = 1000000;

  float *X_h;
  X_h = (float *)malloc(n*k*sizeof(float));

  int count = 0;
  for (int i = 0; i < n; i++){
    for (int j = 0; j < k; j++){
      X_h[count] = 0.0;
      count++;
    }
  }

  float *X_d;
  cudaError_t error = cudaMalloc((void **) &X_d, n*k*sizeof(float));
  if( error != cudaSuccess )
  {
    std::cout << "Failed at malloc: " << cudaGetErrorString(error) << "\n";
    return 1;
  }

  error = cudaMemcpy(X_d, X_h, sizeof(float)*n*k, cudaMemcpyHostToDevice);
  if( error != cudaSuccess )
  {
    std::cout << "Failed at memcpy: " << cudaGetErrorString(error) << "\n";
    return 1;
  }

  error = cudaFree(X_d);
  if( error != cudaSuccess )
  {
    std::cout << "Failed at free: " << cudaGetErrorString(error) << "\n";
    return 1;
  }

  free(X_h);
  std::cout << "Passed\n";
  return(0);
}
```

It fails at cudaMalloc on my card. The problem you were probably having before is that cudaMalloc was failing and left X_d without a valid device pointer (it still held garbage), which you then passed to cudaMemcpy and cudaFree, causing the segfault. In general, you should check the return value of every CUDA API call to make sure it succeeds.

Indeed, the problem is in cudaMalloc. Thanks!