Kernel Launch Failure Very simple kernel

alucard74 · September 14, 2011, 4:10pm

Hi

I wanted to try CUDA programming, so I found a very simple program with a very simple kernel to try it out.

Here is the code, which is supposed to square an array:

#include "cuda/cuda_runtime.h"
#include "cuda/cuda.h"
#include <iostream>
#include <algorithm>
#include <iterator>
#include <sys/time.h>
#include <stdio.h>

// Kernel that executes on the CUDA device
__global__ void square_array(float *a, int N)
{
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  if (idx<N) a[idx] = a[idx] * a[idx];
}

// main routine that executes on the host
int main(void)
{
  float *a_h, *a_d;  // Pointer to host & device arrays
  const int N = 10;  // Number of elements in arrays
  size_t size = N * sizeof(float);
  a_h = (float *)malloc(size);        // Allocate array on host
  cudaMalloc((void **) &a_d, size);   // Allocate array on device

  // Initialize host array and copy it to CUDA device
  for (int i=0; i<N; i++) a_h[i] = (float)i;
  cudaMemcpy(a_d, a_h, size, cudaMemcpyHostToDevice);

  // Do calculation on device:
  int block_size = 4;
  int n_blocks = N/block_size + (N%block_size == 0 ? 0:1);
  square_array <<< n_blocks, block_size >>> (a_d, N);

  // Retrieve result from device and store it in host array
  cudaMemcpy(a_h, a_d, sizeof(float)*N, cudaMemcpyDeviceToHost);

  // Print results
  for (int i=0; i<N; i++) printf("%d %f\n", i, a_h[i]);

  // Cleanup
  free(a_h); cudaFree(a_d);
}

The problem is that the kernel is not launched: the output array is not modified, and cudaGetLastError confirms that the launch failed, but I don’t know why.

I’m running on CentOS 6.1.

Here are the specs of my NVIDIA card, which should be sufficient:

./mon_exec

CUDA Device Query...
There are 1 CUDA devices.

CUDA Device #0
Major revision number:         1
Minor revision number:         1
Name:                          Quadro NVS 295
Total global memory:           267714560
Total shared memory per block: 16384
Total registers per block:     8192
Warp size:                     32
Maximum memory pitch:          2147483647
Maximum threads per block:     512
Maximum dimension 0 of block:  512
Maximum dimension 1 of block:  512
Maximum dimension 2 of block:  64
Maximum dimension 0 of grid:   65535
Maximum dimension 1 of grid:   65535
Maximum dimension 2 of grid:   1
Clock rate:                    1300000
Total constant memory:         65536
Texture alignment:             256
Concurrent copy and execution: No
Number of multiprocessors:     1
Kernel execution timeout:      Yes

Press any key to exit...

If I use cudaCheckError, I get the following message; the only problem is that I don’t have any device…

cudaCheckError() failed at main.cu:101 : invalid device function .

I have looked at several forums but didn’t find an answer. I can copy data to and from GPU memory without any problem; the only thing that fails is the kernel launch.

Thank you in advance for your help, feel free to ask me if you need more information.

Thomas

jpaulorio · September 14, 2011, 4:23pm

You should pass the address of “a_d”, shouldn’t you?

square_array <<< n_blocks, block_size >>> (&a_d, N);

Regards,

JP

avidday · September 14, 2011, 5:04pm

No, he should not.

To the original poster: if you are getting invalid device function, that usually means the code was compiled for a different GPU architecture than the one you are trying to run it on. Could you edit your original question to show how you are compiling this code, and add which CUDA version and OS you are using?
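
To make the architecture mismatch concrete, here is a minimal host-only C++ sketch of the compatibility rule as I understand it from the nvcc documentation: a binary built for sm_XY loads only on devices with the same major revision and an equal or higher minor revision, while PTX built for compute_XY can be JIT-compiled for any device of equal or higher capability. The struct and function names are hypothetical and not part of the CUDA API.

```cpp
// Hypothetical helper type: a compute capability such as 1.1 or 2.0.
struct Capability {
    int major_rev;
    int minor_rev;
};

// A device binary (cubin) built for sm_XY runs on a device only if the
// major revision matches and the device's minor revision is at least Y.
bool cubin_runs_on(Capability built_for, Capability device) {
    return built_for.major_rev == device.major_rev &&
           device.minor_rev >= built_for.minor_rev;
}

// PTX built for compute_XY can be JIT-compiled by the driver for any
// device whose capability is equal to or higher than X.Y.
bool ptx_runs_on(Capability built_for, Capability device) {
    if (device.major_rev != built_for.major_rev) {
        return device.major_rev > built_for.major_rev;
    }
    return device.minor_rev >= built_for.minor_rev;
}
```

Under this rule, a binary built only for sm_20 cannot load on a 1.1 device, and a launch in that situation would be expected to fail with an error such as invalid device function.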

alucard74 · September 14, 2011, 5:36pm

Thank you for your quick and accurate answer, avidday. The architecture was indeed the problem.

I changed the options in my makefile and it worked, so thank you again.

For those who have the same problem, the nvcc compile options that select the architecture are:

nvcc x.cu -arch=compute_13 -code=sm_13

- arch specifies the virtual architecture, which can be compute_10, compute_11, etc.
- code specifies the real architecture, which can be sm_10, sm_11, etc.
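
As a rough illustration of how these two options relate to the "Major revision number" and "Minor revision number" lines of a device query, here is a small host-only C++ sketch. The function name is hypothetical, and it assumes both flags should simply match the device's reported compute capability and that the installed toolkit still supports that architecture.

```cpp
#include <string>

// Hypothetical helper: build the nvcc architecture flags from the major and
// minor revision numbers printed by a device query.
// Assumption: the virtual (-arch) and real (-code) architectures are both
// set to the device's own compute capability.
std::string nvcc_arch_flags(int major_rev, int minor_rev) {
    const std::string cc = std::to_string(major_rev) + std::to_string(minor_rev);
    return "-arch=compute_" + cc + " -code=sm_" + cc;
}
```

For a device that reports major 1 and minor 1, such as the Quadro NVS 295 in the query above, this produces -arch=compute_11 -code=sm_11.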
