dim3 data type cannot launch under MPI

I use OSU MVAPICH2 2.2b (MPI 3.1.4) and ran the simpleMPI sample from the CUDA SDK. The graphics card is a Titan X, the SDK is 7.5, and the system is CentOS 6.7. The unmodified sample works well. However, it fails after I changed the kernel launch configuration arguments from int to dim3:

void computeGPU(float *hostData, int blockSize, int gridSize)
{
        int dataSize = blockSize * gridSize;

        // Allocate data on GPU memory
        float *deviceInputData = NULL;
        CUDA_CHECK(cudaMalloc((void **)&deviceInputData, dataSize * sizeof(float)));

        float *deviceOutputData = NULL;
        CUDA_CHECK(cudaMalloc((void **)&deviceOutputData, dataSize * sizeof(float)));

        // Copy to GPU memory
        CUDA_CHECK(cudaMemcpy(deviceInputData, hostData, dataSize * sizeof(float), cudaMemcpyHostToDevice));
        
        dim3 gridSize1 = dim3(gridSize, gridSize, gridSize);
        dim3 blockSize1 = dim3(blockSize, blockSize, blockSize);
        // Run kernel
        //simpleMPIKernel<<<gridSize, blockSize>>>(deviceInputData, deviceOutputData);
        simpleMPIKernel<<<gridSize1, blockSize1>>>(deviceInputData, deviceOutputData);
        cudaCheckError();
        // Copy data back to CPU memory
        CUDA_CHECK(cudaMemcpy(hostData, deviceOutputData, dataSize * sizeof(float), cudaMemcpyDeviceToHost));

        // Free GPU memory
        CUDA_CHECK(cudaFree(deviceInputData));
        CUDA_CHECK(cudaFree(deviceOutputData));
}

Then I built it with the same Makefile provided by the SDK and ran it with this command:
mpirun_rsh -hostfile mf -n 2 MV2_USE_GPUDIRECT_GDRCOPY=0 MV2_USE_CUDA=1 MV2_USE_GPUDIRECT=1 ./simpleMPI

I got this error message:

Cuda failure simpleMPI.cu:83: ‘invalid configuration argument’
Cuda failure simpleMPI.cu:83: ‘invalid configuration argument’

line 81: simpleMPIKernel<<<gridSize1, blockSize1>>>(deviceInputData, deviceOutputData);
line 82: cudaCheckError()

Assuming the grid was a 1-D grid before the change, and the block was a 1-D block, you would want:

dim3 gridSize1 = dim3(gridSize);
dim3 blockSize1 = dim3(blockSize);
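
For reference (a general CUDA fact, not anything specific to this app): the dim3 constructor fills unspecified dimensions with 1, so the above is equivalent to

dim3 gridSize1 = dim3(gridSize, 1, 1);
dim3 blockSize1 = dim3(blockSize, 1, 1);

which reproduces the original 1-D launch configuration.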

Actually, with just one dimension it works fine, just as when I pass an int. But with two or three dimensions it doesn't work. I used this simple sample only to show that the kernel doesn't accept 2D or 3D dimensions when the executable is built with mpicxx.

My own code uses a 2D grid and 2D data, and it doesn't work either.

Very likely when you specify this:

dim3 blockSize1 = dim3(blockSize, blockSize, blockSize);

you are exceeding the maximum number of threads per block for a CUDA GPU. “Invalid configuration argument” is exactly the error that would be reported if a kernel launch failed for this reason.
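
To make that concrete, here is a minimal sketch (not part of the simpleMPI sample; the helper name blockFits and the device index are my own) that queries the device limits and compares them against the requested block shape before launching:

#include <cstdio>
#include <cuda_runtime.h>

// Returns true if the requested block shape fits within the device limits.
bool blockFits(dim3 block, int device)
{
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
                return false;

        unsigned int threads = block.x * block.y * block.z;
        if (threads > (unsigned int)prop.maxThreadsPerBlock ||
            block.x > (unsigned int)prop.maxThreadsDim[0] ||
            block.y > (unsigned int)prop.maxThreadsDim[1] ||
            block.z > (unsigned int)prop.maxThreadsDim[2])
        {
                printf("Block %ux%ux%u = %u threads exceeds the device limit of %d threads per block\n",
                       block.x, block.y, block.z, threads, prop.maxThreadsPerBlock);
                return false;
        }
        return true;
}

On a Titan X the limit is 1024 threads per block, so even blockSize = 16 gives 16*16*16 = 4096 threads and the launch fails with “invalid configuration argument”; larger values of blockSize are worse still.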

You’ll need to learn more about CUDA programming in general, and this app (simpleMPI) in particular.

You can’t take any arbitrary CUDA app that was written to run in a 1D grid and simply change the grid dimensions to 3D and expect things to work. It might work, but it might not. So even if you limit the 3D threadblock dimensions to have a maximum total number of threads that is 1024 or less, the simpleMPI app still may not run correctly (although you’ll likely get past the “invalid configuration” error).
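
To illustrate why: as far as I recall (treat this as a sketch, not the exact code in simpleMPI.cu), the sample's kernel builds its flat index from the x dimension only, roughly

int tid = blockIdx.x * blockDim.x + threadIdx.x;

so with a 3D grid of 3D blocks many threads would collide on the same tid while the host still allocates only blockSize * gridSize elements. A genuinely 3D launch would need both a 3D-aware index and matching buffer sizes, along these lines:

// Hedged sketch of a 3D-aware kernel; the name simpleMPIKernel3D and the
// sqrtf() operation are assumptions for illustration, not the sample's code.
__global__ void simpleMPIKernel3D(float *input, float *output)
{
        // Flatten the 3D block index and the 3D thread index into one
        // global 1D index.
        int block  = blockIdx.x
                   + blockIdx.y * gridDim.x
                   + blockIdx.z * gridDim.x * gridDim.y;
        int thread = threadIdx.x
                   + threadIdx.y * blockDim.x
                   + threadIdx.z * blockDim.x * blockDim.y;
        int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
        int tid = block * threadsPerBlock + thread;

        output[tid] = sqrtf(input[tid]);
}

// The host side would then have to allocate
// gridSize1.x * gridSize1.y * gridSize1.z * threadsPerBlock elements,
// not blockSize * gridSize.

Again, this is only an illustration of the kind of change required, not a drop-in fix for simpleMPI.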

Thanks a lot. Let me check the parameters.