Problem assigning data to a large array in a kernel

ScottWang · October 28, 2011, 1:07am

Hello everyone,

I am new to CUDA and am having a problem assigning data to a large array in parallel.

The hardware I am using is a Tesla C2070. I made a simple example that shows the problem.

This is the code:

#include <stdio.h>
#include <stdlib.h>
#include <cutil_inline.h>
#include <shrQATest.h>
#include <string.h>
#include <fstream>
#include <iostream>

__global__ void func1(float* A, int N)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < N)
    {
        if (i % 2 == 0)
            A[i] = 3.5;
        else
            A[i] = 7.5;
    }
}

int main(int argc, char** argv)
{
    float *A_CPU, *A_GPU;
    int len = 2500000;
    A_CPU = new float[len*2];
    memset(A_CPU, 0, len*8);
    cudaMalloc((void**)&A_GPU, sizeof(cufftComplex)*len);

    int threadsPerBlock = 256;
    int blocksPerGrid = (len*2 + threadsPerBlock - 1) / threadsPerBlock;
    func1<<<blocksPerGrid, threadsPerBlock>>>(A_GPU, len*2);
    cudaMemcpy(A_CPU, A_GPU, len*8, cudaMemcpyDeviceToHost);

    std::string resultFile = "/home/qdi_admin/Downloads/parallelResult";
    std::ofstream f_result;
    f_result.open((char*)resultFile.c_str(), std::ios::out | std::ios::binary);
    if (!f_result.write ((char*)A_CPU, len*8))
    {
        std::cout << "Write-to File Error !!!!" << std::endl;
    }
    f_result.close();
}

This code just assigns the complex value 3.5+7.5j to every element of a vector of size 25000000.

The code compiles and runs without any error or warning. I then verified the output data in Matlab. Every element should be 3.5+7.5j, but many values are wrong, for example strange numbers like -1.9984e+18 - 1.9984e+18j.

If I reduce the length of the data from 25000000 to 2500000, the result is correct.

Can anyone give me any advice?

Thanks a lot.

ScottWang · October 28, 2011, 1:17am

The Tesla has 6 GB of memory, so I don't think running out of memory is the issue when the length is 25000000.

zzz256 · October 28, 2011, 7:52am

Hello,

On your Tesla C2070 (compute capability 2.0), the maximum x- or y-dimension of a grid of thread blocks is 65535.
In your code, for len = 25000000, blocksPerGrid is 195313.

You should check for errors with cudaGetLastError() after your kernel call,
which returns cudaErrorInvalidConfiguration for me in this case.

You can fix your code by using both x- and y-dimensions of the grid.

Timo

ScottWang · October 29, 2011, 6:14am

To Timo,

Thanks very much.
