Different thread values in the same cycle


I wrote a small CUDA program, but the tid value it stores is different from run to run, and sometimes it even changes within a single run, always within a small range. I’ve already googled this problem but found nothing… Please help me!

Here’s my code:

#include "cuda_runtime.h"
#include "device_launch_parameters.h"
#include "device_functions.h"
#include <iostream>

using namespace std;

__global__ void kernelFunc(int* nums) {
	int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
	int threadPosInBlock = threadIdx.x + 
		blockDim.x * threadIdx.y +
		blockDim.x * blockDim.y * threadIdx.z;
	int blockPosInGrid = blockIdx.x +
		gridDim.x * blockIdx.y +
		gridDim.x * gridDim.y * blockIdx.z;
	int tid = blockPosInGrid * threadsPerBlock + threadPosInBlock;

	// Every thread writes its own tid into the same 10 elements.
	for (int i = 0; i < 10; i++) {
		nums[i] = tid;
	}
}


int main() {
	const int cudaSize = 10;
	int csakazertis[cudaSize];

	int* d_csakazertis;
	cudaMalloc((void**)&d_csakazertis, cudaSize * sizeof(int));

	dim3 block = dim3(8, 8, 8);
	dim3 grid = dim3(16, 16);

	kernelFunc<<<grid,block>>>(d_csakazertis, d_result2, 1, d_result);

	cudaMemcpy(csakazertis, d_csakazertis, cudaSize * sizeof(int), cudaMemcpyDeviceToHost);

	for(int i=0; i<cudaSize; i++) {
		cout << csakazertis[i] << ", ";
	}
}


My output is usually: 130303, 130303, 130303, 130431, 130431, 130431, 130431, … (the value changes after the 3rd element). Why?
But sometimes it is: 131007, 131007, 131007, … (no change). Why?

As you can see, the values keep changing, and two different runs produce different values…
Does anyone know why this happens? How can I fix it? (During debugging, the Warp Watch shows the correct values… I don’t understand why this is happening…)

Thanks in advance!
Zoli :)

Sorry, the kernel call should have only one parameter (d_csakazertis); I just copied that line from my bigger code… :)

You’re launching 131,072 threads (16 × 16 blocks of 8 × 8 × 8 threads) that are all writing to the same 10 locations, so they are stepping on each other: whichever thread happens to write last wins. The order of thread/block execution is not specified and is not guaranteed to be the same from run to run.
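
One way to avoid the collision, sketched here under the assumption that the goal is simply to record each thread's tid, is to give every thread its own output element instead of having all threads share 10. The kernel name writeOwnSlot and the host helper runWriteOwnSlot are hypothetical names made up for this illustration; the tid calculation is the one from the question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread computes its flat global id and writes ONLY to its own element,
// so no two threads ever touch the same location.
__global__ void writeOwnSlot(int* nums, int n) {
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int threadPosInBlock = threadIdx.x
        + blockDim.x * threadIdx.y
        + blockDim.x * blockDim.y * threadIdx.z;
    int blockPosInGrid = blockIdx.x
        + gridDim.x * blockIdx.y
        + gridDim.x * gridDim.y * blockIdx.z;
    int tid = blockPosInGrid * threadsPerBlock + threadPosInBlock;

    if (tid < n) {          // guard in case the launch has more threads than elements
        nums[tid] = tid;
    }
}

// Hypothetical host helper: allocates one int per launched thread, runs the
// kernel, and returns the results (an empty vector on a CUDA error).
std::vector<int> runWriteOwnSlot(dim3 grid, dim3 block) {
    int n = static_cast<int>(grid.x * grid.y * grid.z *
                             block.x * block.y * block.z);
    std::vector<int> host(n, -1);

    int* d_nums = nullptr;
    if (cudaMalloc((void**)&d_nums, n * sizeof(int)) != cudaSuccess) {
        return {};
    }

    writeOwnSlot<<<grid, block>>>(d_nums, n);

    // cudaMemcpy waits for the kernel to finish before copying.
    cudaError_t err = cudaMemcpy(host.data(), d_nums, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_nums);
    if (err != cudaSuccess) {
        return {};
    }
    return host;
}
```

Calling runWriteOwnSlot(dim3(16, 16), dim3(8, 8, 8)) from main should then return a vector in which element i holds i on every run, because the result no longer depends on the order in which blocks and warps execute.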

Thanks for your reply! :)

If I start fewer threads…

For example, with:
dim3 block = dim3(2,2,2);
dim3 grid = dim3(2,2);

It gives me 31, 31, 31, …
and sometimes 15, 15, 15, …

as output.

In this case I didn’t notice the values changing within a single run…
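
The smaller launch still has the same race: 2 × 2 blocks of 2 × 2 × 2 threads is 32 threads with tid 0 to 31, all writing to the same 10 elements, so the value that survives still depends on which thread writes last. If the same value is wanted on every run while all threads keep writing to every element, one option is an atomic operation whose result does not depend on execution order. The following is a minimal sketch that assumes "keep the largest tid" is an acceptable rule; writeMaxTid and runWriteMaxTid are hypothetical names, while atomicMax and cudaMemset are standard CUDA calls.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Every thread still touches all n elements, but atomicMax keeps the largest
// tid seen so far, so the final contents do not depend on execution order.
__global__ void writeMaxTid(int* nums, int n) {
    int threadsPerBlock = blockDim.x * blockDim.y * blockDim.z;
    int threadPosInBlock = threadIdx.x
        + blockDim.x * threadIdx.y
        + blockDim.x * blockDim.y * threadIdx.z;
    int blockPosInGrid = blockIdx.x
        + gridDim.x * blockIdx.y
        + gridDim.x * gridDim.y * blockIdx.z;
    int tid = blockPosInGrid * threadsPerBlock + threadPosInBlock;

    for (int i = 0; i < n; i++) {
        atomicMax(&nums[i], tid);
    }
}

// Hypothetical host helper: returns the n results (empty vector on a CUDA error).
std::vector<int> runWriteMaxTid(dim3 grid, dim3 block, int n) {
    std::vector<int> host(n, -1);

    int* d_nums = nullptr;
    if (cudaMalloc((void**)&d_nums, n * sizeof(int)) != cudaSuccess) {
        return {};
    }
    // Start from 0 so the maximum is well defined (every tid is >= 0).
    cudaMemset(d_nums, 0, n * sizeof(int));

    writeMaxTid<<<grid, block>>>(d_nums, n);

    cudaError_t err = cudaMemcpy(host.data(), d_nums, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_nums);
    if (err != cudaSuccess) {
        return {};
    }
    return host;
}
```

With dim3 grid(2, 2) and dim3 block(2, 2, 2), runWriteMaxTid(grid, block, 10) should return ten copies of 31 on every run, since 31 is the largest tid in that launch. The atomics serialize the writes to each element, so this is a sketch for understanding the behavior rather than a fast pattern.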