Why does cuCtxDestroy crash?

Nerei · August 25, 2010, 5:20pm

Below is a small program that tries to initialize the second GPU, allocates memory, frees it, and then releases the second GPU.

In this code cuCtxDestroy crashes. If I comment out the first malloc (on the first GPU) or the second malloc (on the second GPU), everything is OK.

Where is the mistake? The guide promises that I can mix the runtime and driver APIs.

int main(int argc, char* argv[])
{
	CUcontext ctxs[10];

	cuSafeCall( cuInit(0) );

	void *ptr;
	size_t step;
	cudaMallocPitch(&ptr, &step, 100, 100);

	//cuSafeCall( cuCtxAttach(&ctxs[0], 0) );
	//cuSafeCall( cuCtxDetach(ctxs[0]) );

	CUdevice dev;
	cuDeviceGet(&dev, 1);
	cuCtxCreate(&ctxs[1], 0, dev);

	void *ptr2;
	size_t step2;
	cudaMallocPitch(&ptr2, &step2, 100, 100);
	cudaFree( ptr2 );

	//crash
	cuCtxDestroy(ctxs[1]);

	cout << "unreacheble place in code" << endl;
}

Configuration:

GeForce 470
GeForce 9600
Toolkit 3.1 32-bit
Win7 64-bit
Compiled with “-gencode arch=compute_11,code=sm_11”

jan.heckman · August 25, 2010, 7:25pm

I get no crash. The only thing I changed was cuSafeCall() to cutilDrvSafeCall().

Maybe this helps, or it might be another problem (no idea what).

Using Win 7 64-bit with toolkit 3.1, compiled and ran as Win32 debug and release; output:

unreacheble place in code

Press any key to continue . . .
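
A note on the checking macro: the definition of cuSafeCall does not appear in the posted code, so what it does on failure is unknown. A minimal sketch of such a checker is given below, assuming a plain integer status in which 0 means success (CUDA_SUCCESS is 0 in the driver API). The names Status, kSuccess, checkStatus, SAFE_CALL, and the two fake calls are hypothetical and exist only for illustration; this is not the cutil implementation.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical status type standing in for CUresult (assumption: 0 == success).
typedef int Status;
const Status kSuccess = 0;

// Throws std::runtime_error naming the failing expression, file, and line
// when the status is anything other than kSuccess.
inline void checkStatus(Status status, const char* expr, const char* file, int line)
{
    if (status == kSuccess)
        return;
    std::ostringstream msg;
    msg << expr << " failed with status " << status
        << " at " << file << ":" << line;
    throw std::runtime_error(msg.str());
}

// Wraps a call so its return value is always inspected.
#define SAFE_CALL(expr) checkStatus((expr), #expr, __FILE__, __LINE__)

// Hypothetical stand-ins for driver calls, used only to exercise the macro.
inline Status fakeCallThatSucceeds() { return 0; }
inline Status fakeCallThatFails()    { return 1; }
```

With the real driver API, Status would be CUresult, and the unchecked calls in the program above (cuDeviceGet, cuCtxCreate, cuCtxDestroy) could be wrapped the same way, so that a failing call is reported at its own line rather than possibly surfacing later as a crash.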

Nerei · August 26, 2010, 9:10am

Thank you for the answer. I realised that this code really does work, even on my machine. A slightly updated version is below, and this one crashes. If I comment out the call to getCudaEnabledDeviceCount, the crash no longer appears. This function is from the trunk version of OpenCV (compiled with the WITH_CUDA flag set in CMake); it just calls cudaGetDeviceCount and returns the result. I attached the code together with a CMake file in case anyone wants to check it. OpenCV can be downloaded here: https://code.ros.org/svn/opencv/trunk/opencv The library has no static constructors, so no CUDA code other than cudaGetDeviceCount is called explicitly.

#include "opencv2/gpu/gpu.hpp"

int main(int argc, char* argv[])
{
	CUcontext ctxs[10];

	cuInit(0);

	cout << cv::gpu::getCudaEnabledDeviceCount() << endl;

	void *ptr;
	size_t step;
	cudaMallocPitch(&ptr, &step, 100, 100);

	CUdevice dev;
	cuDeviceGet(&dev, 1);
	cuCtxCreate(&ctxs[1], 0, dev);

	void *ptr2;
	size_t step2;
	cudaMallocPitch(&ptr2, &step2, 100, 100);

	//crash
	cuCtxDestroy(ctxs[1]);

	cout << "unreacheble place in code" << endl;
}

test.zip (2.33 KB)

jan.heckman · August 28, 2010, 6:04pm

[codebox]#include <iostream>
#include <cuda.h>
//#include <cutil.h>
#include <cutil_inline.h>
#include <cutil_inline_drvapi.h>

using namespace std;

int main(int argc, char* argv[])
{
	int devices;
	CUcontext ctxs[10];

	cuInit(0);

	//cout << cv::gpu::getCudaEnabledDeviceCount() << endl;
	cudaGetDeviceCount(&devices);
	printf("%d cuda devices\n",devices);

	void *ptr; size_t step;
	cudaMallocPitch(&ptr, &step, 100, 100);

	CUdevice dev; cuDeviceGet(&dev, 1);
	cuCtxCreate(&ctxs[1], 0, dev);

	void *ptr2; size_t step2;
	cudaMallocPitch(&ptr2, &step2, 100, 100);

	//crash
	cuCtxDestroy(ctxs[1]);

	cout << "unreacheble place in code" << endl;

	return 0;
}[/codebox]

Output:

2 cuda devices

unreacheble place in code

Press any key to continue . . .

Nerei · September 3, 2010, 4:03pm

I attached an update that has no OpenCV dependencies and is 40 lines in total. However, it crashes only when started under the debugger (F5 in Visual Studio). I also have a sample that crashes even without Visual Studio, but it includes much more CUDA code.
TestOpenCV.zip (5.65 KB)
TestOpenCV_vs8.0.zip (4.55 KB)
