GTX Titan Black Memory Issues

Ermeng · December 8, 2016, 9:21pm

Hello,

I am using a GTX Titan Black (2880 cores, 6 GB memory) and CUDA 6.0 for my matrix computation.
The input matrix has a size of:
200 * 190 * 500 (single type) * 4 B = 70 MB.
I also defined some variables and allocated memory for them. The size of all the variables (float type) allocated with cudaMalloc() came to:
(200 * 190 * 500 * 3 + 200 * 190 * 500 * 3 * 8) * 4 B = 1.91 GB,
which is less than my total 6 GB of memory. The program crashed, so I reduced the matrix size to 200 * 190 * 440 * 4 B = 62.3 MB
and kept the other variables the same. These variables took
(200 * 190 * 440 * 3 + 200 * 190 * 440 * 3 * 8) * 4 B = 1.6817 GB of memory, and the program ran with no problem. cudaMemGetInfo() returned 5.97 GB total memory, 5.66 GB free before allocation, and 3.97 GB free after allocation. The difference, 5.66 - 3.97 = 1.69 GB, is in line with the 1.68 GB I calculated for my variables. So far, there is no problem with my code.

Then I slightly increased my matrix to 200 * 190 * 441 * 4 B = 62.4 MB and kept all the other variables the same, which took 1.6856 GB of memory (slightly more than 1.6817 GB). My program started to crash and gave me some weird results. I also tried changing the matrix to 200 * 190 * {442, 443, 444, …} and got the same problem and results.

By the way, my block size was (8,8,8) and my grid size was (200/8, 190/8 * 440/8) = (25, 1320). I do not know what the problem is, and I really want to get this GPU to work as I spent a lot of money on it. I would appreciate it if anyone could help me out.

Thanks.
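
The grid size quoted above only comes out to (25, 1320) if the block counts are rounded up, since 190 is not a multiple of 8. A minimal host-side sketch of that arithmetic follows. The names `ceil_div`, `Grid2D`, and `fold_grid` are hypothetical and are not taken from the poster's attached code; the sketch assumes the y and z block counts are packed into the grid's y dimension, as the quoted expression suggests.

```cpp
// Number of blocks of size `block` needed to cover `n` elements (rounds up).
constexpr unsigned ceil_div(unsigned n, unsigned block) {
    return (n + block - 1) / block;
}

// Hypothetical 2D grid description (stand-in for a CUDA dim3 with z = 1).
struct Grid2D {
    unsigned x;
    unsigned y;
};

// Assumption: a 3D problem (nx, ny, nz) is folded onto a 2D grid by
// multiplying the y and z block counts together.
constexpr Grid2D fold_grid(unsigned nx, unsigned ny, unsigned nz, unsigned block) {
    return Grid2D{ceil_div(nx, block), ceil_div(ny, block) * ceil_div(nz, block)};
}

// 200/8 = 25 blocks in x; ceil(190/8) = 24 and 440/8 = 55 give 24 * 55 = 1320 in y.
static_assert(fold_grid(200, 190, 440, 8).x == 25, "x block count");
static_assert(fold_grid(200, 190, 440, 8).y == 1320, "folded y block count");
```

Because 24 blocks of 8 threads cover 192 rows while the matrix has only 190, a kernel launched with such a grid needs a bounds check so that the extra threads do not read or write past the end of the arrays. The folded y count of 1320 is well below the 65535 limit on the grid's y dimension.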

tera · December 8, 2016, 9:59pm

What platform and driver? The WDDM driver under Windows has limitations similar to what you describe. The TCC driver and Linux don’t.

Is the program compiled for 64 bit?

Another thing to check would be that the size calculations are performed with 64 bits. However that wouldn’t explain problems below 2 GB.
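
As a rough illustration of the 64-bit point, the sketch below computes the allocation sizes from the first post with every operand widened to 64 bits, then checks whether the result would still fit in a signed 32-bit integer. The helper names are hypothetical. The count of 27 arrays (3 + 3 * 8) is read off the poster's formula, and the sketch assumes `sizeof(float)` is 4.

```cpp
#include <cstdint>
#include <limits>

// Bytes needed for `count` float arrays of nx * ny * nz elements each.
// All operands are 64-bit, so the product cannot wrap for sizes like these.
constexpr std::uint64_t bytes_needed(std::uint64_t nx, std::uint64_t ny,
                                     std::uint64_t nz, std::uint64_t count) {
    return nx * ny * nz * count * sizeof(float);  // assumes sizeof(float) == 4
}

// True if a byte count would survive being stored in a signed 32-bit int.
constexpr bool fits_in_int32(std::uint64_t bytes) {
    return bytes <= static_cast<std::uint64_t>(
                        std::numeric_limits<std::int32_t>::max());
}

// Sizes from the first post: 27 arrays of 200 * 190 * nz floats.
static_assert(bytes_needed(200, 190, 500, 27) == 2052000000ULL, "500 slices");
static_assert(bytes_needed(200, 190, 440, 27) == 1805760000ULL, "440 slices");
static_assert(bytes_needed(200, 190, 441, 27) == 1809864000ULL, "441 slices");
```

All three totals stay below 2^31 - 1 = 2147483647, so a 32-bit size calculation would not have wrapped at the 440 to 441 boundary, which matches the remark that overflow would not explain problems below 2 GB. A somewhat larger case, for example 600 slices, would exceed that limit and would need the 64-bit arithmetic.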

Ermeng · December 8, 2016, 10:12pm

Thank you very much tera. I am using a CentOS 6.7 Linux system (64-bit) and downloaded the driver from here:
Geforce Driver Results | NVIDIA.

And yes, the program was compiled for 64-bit.

Any comments?

Thanks,
Fumin

Robert_Crovella · December 8, 2016, 10:16pm

Just because your program crashed when you changed matrix dimensions slightly does not mean that is a GPU problem or a GPU memory problem.

You could easily have an error in your code.

Do proper CUDA error checking and run your code with cuda-memcheck. If cuda-memcheck reports any errors in the failing case, it usually indicates a coding defect on your part.

It is possible that a simple kernel timeout is occurring even under Linux, but proper CUDA error checking usually gives you an indication of that (the error code will indicate a kernel timeout).

Ermeng · December 12, 2016, 9:16pm

Thank you txbob. I have attached my demo code (runme.txt is actually runme.m, as .m files are not allowed here) and would appreciate it if you could take a look. I tried a matrix size of 700 * 700 * 690, which did not work. I reduced it to 700 * 700 * 689 and it worked, as did smaller sizes. I have no idea what is happening.
testCuda.cu (4.18 KB)
runme.txt (226 Bytes)

njuffa · December 12, 2016, 9:56pm

You say the program “crashed”. Crashed how? Is there a status check for every CUDA API call and every kernel invocation? Does any of them report an error? If so, what error is reported? What happens if you run the code under the control of cuda-memcheck?

Given the dimensions of the 3D matrix, I wonder whether the code might run into (signed) integer overflow in some intermediate computations, although the difference in behavior between dimensions of 689 and 690 would seem to suggest that that is probably not the issue.

Robert_Crovella · December 12, 2016, 11:05pm

Your error checking is broken. This:

void Check_CUDA_Error(const char *message)
{
    cudaError_t error = cudaGetLastError();
    if(error ==cudaSuccess) 
     {
       mexPrintf("no ERROR: %s: %s\n", message, cudaGetErrorString(error));
       //exit(-1);
     }                        
}

should be this:

void Check_CUDA_Error(const char *message)
{
    cudaError_t error = cudaGetLastError();
    if(error !=cudaSuccess)  // *********note change on this line only**************
     {
       mexPrintf("no ERROR: %s: %s\n", message, cudaGetErrorString(error));
       //exit(-1);
     }                        
}

I won’t be able to spend any more time on it unless you provide a stand-alone test case. Remove the dependency on MATLAB and mex.

Then, after fixing your error checking, with the standalone test case (assuming it fails), run your code with cuda-memcheck.

Provide the full output from that (copy and paste the output into this thread) and I’ll take another look as time permits.

Ermeng · December 13, 2016, 11:22pm

I had the error check written the way you show, but forgot to change it back after testing the Check_CUDA_Error() function.

I have also provided a standalone test demo. I compiled it with nvcc -o cuda.out cuda.cu and ran it under cuda-memcheck, with the following output:

========= CUDA-MEMCHECK
========= Error: process didn’t terminate successfully
========= The application may have hit an error when dereferencing Unified Memory from the host. Please rerun the application under cuda-gdb or Nsight Eclipse Edition to catch host side errors.
========= ERROR SUMMARY: 0 errors

Then I ran the code as ./cuda.out 5 5 to multiply two 5*5 matrices element by element. Below is a copy of the output:
iNx is: 5, iNy is: 5.
blocksInX is: 1, blocksInY is: 1
dimBlock.x is: 8, dimGrid.x is: 1
dimBlock.y is: 8, dimGrid.y is: 1
pf_o[0] is: 0.000000
pf_o[1] is: 1.000000
pf_o[2] is: 2.000000
pf_o[3] is: 3.000000
pf_o[4] is: 4.000000.

This is what I expected, but when I increased the size of the two matrices the results were wrong. For example, I ran the code as ./cuda.out 10000 10000 and got the following output:

iNx is: 10000, iNy is: 10000.
blocksInX is: 1250, blocksInY is: 1250
dimBlock.x is: 8, dimGrid.x is: 1250
dimBlock.y is: 8, dimGrid.y is: 1250
pf_o[99999990] is: 99999992.000000
pf_o[99999991] is: 99999992.000000
pf_o[99999992] is: 99999992.000000
pf_o[99999993] is: 99999992.000000
pf_o[99999994] is: 99999992.000000
pf_o[99999995] is: 99999992.000000
pf_o[99999996] is: 100000000.000000
pf_o[99999997] is: 100000000.000000
pf_o[99999998] is: 100000000.000000
pf_o[99999999] is: 100000000.000000

I noticed that the outputs for lower indices were correct as opposed to higher indices as shown above. Any comments?
cuda.cu (3.61 KB)

njuffa · December 14, 2016, 12:40am

What is your code doing? If it is summing ‘float’ data, it looks like you are running into an issue with limited granularity of the ‘float’ data type, which then causes data that should mathematically sum to different results round to the same floating-point number.

The resolution provided by ‘float’ is only about 7 decimal digits, and your numbers above exceed that length. Try switching your data to ‘double’.
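
The granularity effect can be reproduced on the host without any GPU code. The sketch below is not the poster's program; it assumes only that a value equal to the element index ends up stored in a 'float', which the posted output suggests. The helper names are hypothetical. Between 2^26 and 2^27 adjacent 'float' values are 8 apart, so indices near 10^8 snap to the nearest multiple of 8.

```cpp
#include <cstdint>

// Round-trip an integer through 'float' to expose the rounding.
inline std::int64_t through_float(std::int64_t i) {
    return static_cast<std::int64_t>(static_cast<float>(i));
}

// Same round-trip through 'double', which has a 53-bit significand
// and therefore represents every integer of this size exactly.
inline std::int64_t through_double(std::int64_t i) {
    return static_cast<std::int64_t>(static_cast<double>(i));
}

// With round-to-nearest-even conversion (the usual default):
//   through_float(99999990)  == 99999992
//   through_float(99999995)  == 99999992
//   through_float(99999996)  == 100000000   (tie, resolved to the even significand)
//   through_float(99999999)  == 100000000
//   through_double(99999990) == 99999990
```

These values line up with the pf_o[99999990] through pf_o[99999999] lines in the output above, which points to rounding in the data type rather than a memory fault on the card.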
