For-Loop is not executed

weichertv · November 9, 2012, 1:49pm

Hello!

I have a problem with a simple for-loop:

#include <stdio.h> 
#include <cuda.h>

__global__ void test(int *a) {
 for (int i = 0; i < 4; i++) {
   a[i] = i;
   printf("%d %d\n", threadIdx.x, i);
 }
}

int main(int argc, char **argv) {
  int *a, *b;
  cudaMalloc((void **) &a, 4 * sizeof(int));
  b = (int *) malloc(4 * sizeof(int));
  test <<< 1,1 >>>(a);
  cudaMemcpy(b, a, 4 * sizeof(int), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 4; i++) printf("%d ", b[i]);
  printf("\n");
  cudaFree(a);
  free(b);
  return 0;
}

Basically, the kernel runs on one thread and executes a for loop that runs from 0 to 3 and sets a value in a global array. Also, some info is printed.

I compiled this code with toolkit 4.2 using
nvcc -gencode=arch=compute_20,code=compute_20 -gencode=arch=compute_20,code=sm_20 -lcudart

The output should be

0 0
0 1
0 2
0 3
0 1 2 3

but instead I get

0 0 
0 0 0 0

I tried the program on a GTX 580 and a GTX 550 Ti. The result is the same. Can anybody explain that to me? I thought about this for a long time, but I just don’t get it.

Volker

Edit: I changed the code to include the correct kernel call.

jgonzac · November 9, 2012, 2:00pm

Have you launched “test(a)” as a CUDA kernel? I mean:

test <<< 1,1 >>>(a);

It works for me.

weichertv · November 9, 2012, 2:03pm

Yes, the kernel call is test <<< 1,1 >>>(a). That did not show up in the code. Sorry.

jgonzac · November 9, 2012, 2:12pm

Sorry then. This code works for me on a GTX 580, Win7, 32/64-bit, toolkit 4.2.

njuffa · November 9, 2012, 8:20pm

To get to the bottom of the failures, I would suggest adding error status checks after every CUDA API call and after every kernel launch.
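
As a rough illustration of that suggestion, here is a minimal sketch of the same test program with a status check after each API call and after the kernel launch. The CUDA_CHECK macro and the run_test function are hypothetical names introduced here, not part of the original post. The runtime calls it relies on (cudaGetLastError, cudaDeviceSynchronize, cudaGetErrorString) are standard, but this is only one possible way to structure the checks.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Hypothetical helper (not from the original post): report the failing
// call site with a readable message, then abort.
#define CUDA_CHECK(call)                                               \
  do {                                                                 \
    cudaError_t err_ = (call);                                         \
    if (err_ != cudaSuccess) {                                         \
      fprintf(stderr, "CUDA error at %s:%d: %s\n", __FILE__, __LINE__, \
              cudaGetErrorString(err_));                               \
      exit(EXIT_FAILURE);                                              \
    }                                                                  \
  } while (0)

// Same kernel as in the question.
__global__ void test(int *a) {
  for (int i = 0; i < 4; i++) {
    a[i] = i;
    printf("%d %d\n", threadIdx.x, i);
  }
}

// Hypothetical wrapper: runs the kernel and copies the 4 results into b.
void run_test(int b[4]) {
  int *a = NULL;
  CUDA_CHECK(cudaMalloc((void **)&a, 4 * sizeof(int)));

  test<<<1, 1>>>(a);
  CUDA_CHECK(cudaGetLastError());       // catches launch/configuration errors
  CUDA_CHECK(cudaDeviceSynchronize());  // catches errors raised while the kernel ran

  CUDA_CHECK(cudaMemcpy(b, a, 4 * sizeof(int), cudaMemcpyDeviceToHost));
  CUDA_CHECK(cudaFree(a));
}

int main(void) {
  int b[4] = {0};
  run_test(b);
  for (int i = 0; i < 4; i++) printf("%d ", b[i]);
  printf("\n");
  return 0;
}
```

If any call fails, the message names the file and line, which should narrow down whether the launch itself, the kernel execution, or the copy back is the failing step.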

weichertv · December 5, 2012, 11:17am

Thank you for the replies. The problem was caused by the printf in the kernel loop. I ran this example on Linux with toolkit 4.2 and a GTX 580. Removing the printf solved the issue.
