Error question regarding CUDA Sort

#include <iostream>
#include <cuda.h>
#include <time.h>
#include <math.h>

using namespace std;

// For testing, keep the data size around 10,000 for now.
// It's 1D, so just use a block size of 512.

// Errors started at 100,000 elements. The problem is probably either in
// generating the random numbers or in GPU memory.
// If we were sorting screen data, 2560x1600 = 4,096,000 pixels, so it may be
// the random-number generation rather than GPU memory that is failing...
#define DATASIZE   20100
#define BLOCK_SIZE    512

__global__ void oddevensort(int * input, int * output, unsigned int len)
{
	// Coordinate within the block
	unsigned int tx = threadIdx.x;

	// Global coordinate
	unsigned int x = tx + blockDim.x * blockIdx.x;
	// Temporary variable used for swapping
	int temp;

	// Loop over the length of the data. When the index (i) is even, compare
	// each even position with the next element; when it is odd, compare each
	// odd position with the next element.
	for (int i = 0; i < len; i++)
	{
		if (i % 2 == 0)
		{
			// Careful: without the length check, stale data left over in the
			// block could leak into the result.
			if (input[x] > input[x + 1] && x < len && x % 2 == 0)
			{
				temp = input[x + 1];
				input[x + 1] = input[x];
				input[x] = temp;
			}
		}
		else
		{
			if (input[x] > input[x + 1] && x < len && x % 2 != 0)
			{
				temp = input[x + 1];
				input[x + 1] = input[x];
				input[x] = temp;
			}
		}
	}

	output[x] = input[x];
	__syncthreads();
}

int main()
{
	// Generate the test numbers
	int TestInput[DATASIZE], TestOutput[DATASIZE];

	srand(time(NULL));

	for (int i = 0; i < DATASIZE; i++)
	{
		TestInput[i] = rand() % 500;
	}

	// Device setup
	int *devInput, *devOutput;
	// We already know the size
	unsigned int MemDataSize = DATASIZE * sizeof(int);

	// Allocate the device buffers
	cudaMalloc((void**)&devInput, MemDataSize);
	cudaMalloc((void**)&devOutput, MemDataSize);
	cudaMemset(devOutput, 0, MemDataSize);

	// Once allocated, copy the input over
	cudaMemcpy(devInput, TestInput, MemDataSize, cudaMemcpyHostToDevice);

	// Set the block size
	// It's 1D, so keep it simple...
	dim3    dimBlocksize(BLOCK_SIZE);
	dim3    dimGridsize(ceil((DATASIZE - 1) / (float)BLOCK_SIZE) + 1);

	// First we need the max and min values.
	// A single pass from start to end, so leave that to the CPU.

	oddevensort << < dimGridsize, dimBlocksize >> > (devInput, devOutput, DATASIZE);

	// Copy the result back
	cudaMemcpy(TestOutput, devOutput, MemDataSize, cudaMemcpyDeviceToHost);

	for (int i = 0; i < DATASIZE; i++)
	{
		cout << TestOutput[i] << ", ";
		if ((i + 1) % 10 == 0)
		{
			cout << endl;
		}
	}

	// Free the GPU memory reserved above. Easiest to free things as you go.
	cudaFree(devInput);
	cudaFree(devOutput);

	return 0;
}

If I set DATASIZE to more than 100,000, I get an error. Please tell me the reason and a solution.

What kind of error?

Are you on Windows? If so, check whether you are running into a Windows TDR event: https://docs.microsoft.com/en-us/windows-hardware/drivers/display/timeout-detection-and-recovery

By default, the Windows graphical subsystem limits kernel execution time to about 2 seconds, before the watchdog timer times out and resets the driver, destroying the CUDA context in the process.

In that case, you can either limit kernels to < 2 second run time (via any combination of: smaller problem size, more highly tuned code, faster hardware), use a GPU supported by the TCC driver (and thus not associated with the Windows GUI), or experiment with changing the Windows TDR settings: https://docs.microsoft.com/en-us/windows-hardware/drivers/display/tdr-registry-keys

When I compile and run your code with cuda-memcheck, I get all sorts of errors.

There is no point exploring a larger DATASIZE; debug what you have now.

Any time you are having trouble with a CUDA code, it's good practice to use proper CUDA error checking, and run your code with cuda-memcheck.

It’s fairly evident you are on Windows. You may wish to enable the memory checker in Visual Studio, to get similar information to cuda-memcheck. But cuda-memcheck can be used on Windows as well.

Having large stack-based variables like this:

int TestInput[DATASIZE], TestOutput[DATASIZE];

is generally a bad idea. Use dynamic allocation instead.

Thank you for the answers.

The error output is -842150421 -842150421 -842150421 -842150421 … for every element.

I tried the suggestions above, but it still fails. Maybe it’s a memory problem.

  1. An odd-even sort cannot work correctly without some form of synchronization between the loops. You don’t have any synchronization, so your code cannot work reliably in any fashion.

  2. Even if we have a convenient grid-wide sync mechanism (there are at least 2 possibilities) we would still have to address the question of what happens at threadblock boundaries.

  3. __syncthreads() as the very last line in any kernel serves no useful purpose. I’m not sure why people write code like that.

  4. Creating large stack-based allocations (in host code) is a bad idea. Use dynamic allocation (e.g. malloc, new, std::vector, etc.) instead.

Essentially all of these problems are addressed in this code:

http://techguru10x.blogspot.com/p/implementable.html

and there is a CUDA sample code demonstrating odd-even sorting:

https://docs.nvidia.com/cuda/cuda-samples/index.html#cuda-sorting-networks

If you still have problems, make sure you are not running into a WDDM TDR timeout:

https://docs.nvidia.com/gameworks/content/developertools/desktop/timeout_detection_recovery.htm

Robert_Crovella, thank you very much. The odd-even sort code was very helpful. I wish you good luck.