Output is correct in EmuDebug mode but wrong in Debug mode

Hi,

I am facing a problem with my code.

CPU CODE


char *** symbol;
int *** tree;
// Allocate memory to tree and symbol;

dp<<<1,3>>>(tree,symbol,threshold);

GPU CODE


__global__ void dp(int ***tree, char **symbol1, int threshold)
{
    int ind_tree = blockDim.x * blockIdx.x + threadIdx.x;
    for (int i = 0; i < 3; i++)
    {
        for (int j = 0; j < pow(4.0, i); j++)
        {
            // value of symbol1 updated in order to update symbol on CPU
        }
    }
}

The code works fine when I run it in EmuDebug mode, but in Debug mode something goes wrong in the kernel.
The values I get back in ***symbol are the same as before the kernel call. The values of tree, however, are correctly updated once the kernel call has finished.

Your code fragment explains very little.

It is very likely you are passing host memory, or pointers to host memory, or pointers to pointers to host memory, to your kernel.

Yes, I am passing a pointer to host memory. The data structure is arranged as a tree, and dp is a call that does a dominant pass over one tree. I am attaching the .cu file in case that helps. The troublesome kernel call is on line 207 and the kernel code starts at line 124. The weird thing is that a very similar data structure in the code (int *** tree) is being updated correctly.

code.txt (6.97 KB)

A kernel can only be called with pointers to global (device) memory.

Oops, I am so very sorry. I meant global memory only. Very sorry again.

Also I am getting an exception when the program terminates that says

First-chance exception at 0x76b742eb in ezw_cuda16.exe: Microsoft C++ exception: cudaError_enum at memory location 0x0012cdd8…

I get this exception only on some runs, not on every run of the code.
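A first-chance cudaError_enum exception usually means some CUDA runtime call hit an error internally. One way to narrow down which call is responsible is to check the status after each launch. The following is only a sketch: `launch_checked` is a hypothetical helper name, and the kernel is the small `updatep` test kernel from this thread rather than the original `dp` kernel.

```cuda
#include <stdio.h>
#include <cuda_runtime.h>

// Small test kernel from this thread: writes 'e' through the pointer it is given.
__global__ void updatep(char *p1)
{
    *p1 = 'e';
}

// Hypothetical helper (not from the original post): launches the kernel
// and returns the first error reported by the runtime, if any.
cudaError_t launch_checked(char *ptr)
{
    updatep<<<1, 1>>>(ptr);

    // Errors detected at launch time (bad configuration, etc.).
    cudaError_t err = cudaGetLastError();

    // Errors raised while the kernel was running (e.g. an invalid address).
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();

    if (err != cudaSuccess)
        fprintf(stderr, "updatep failed: %s\n", cudaGetErrorString(err));
    return err;
}
```

With a pointer obtained from cudaMalloc this should return cudaSuccess. With a pointer obtained from malloc it will typically report an error instead of failing silently.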

My problem is quite simple as I now see it. Can someone please explain why the value pointed to by p is not updated in Debug mode? *p is updated in EmuDebug mode but not in Debug mode. Thank you.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <conio.h>
char *p;
__global__ void updatep(char *p1)
{
    *p1='e';
}
int main(int argc, char **argv)
{
    p=(char *)malloc(sizeof(char));
    *p='a';
    printf("%c",*p);
    updatep<<<1,1>>>(p);
    printf("\n%c",*p);
    getch();
    return 0;
}

How is the GPU going to update a value held on host memory? This works in emulation because then the GPU and host memory are not distinct.

#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <conio.h>

char *p;

__global__ void updatep(char *p1)
{
    *p1='e';
}

int main(int argc, char** argv)
{
    p=(char *)malloc(sizeof(char));
    *p='a';
    printf("%c",*p);
    // Here you need to allocate global memory on gpu, copy p to that allocated memory.
    updatep<<<1,1>>>(p);
    // Here you need to copy global memory from gpu back to the host.
    printf("\n%c",*p);
    getch();
    return 0;
}
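The two comments in the code above could be filled in roughly as follows. This is a minimal sketch, not the original poster's final code: `run_updatep` and `d_p` are hypothetical names introduced here, and error handling is kept to a bare minimum.

```cuda
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel from the thread: writes 'e' through a device pointer.
__global__ void updatep(char *p1)
{
    *p1 = 'e';
}

// Hypothetical helper (not in the original post): round-trips one char
// through device memory and returns the value the kernel left there.
// If the device allocation fails, the input is returned unchanged.
char run_updatep(char initial)
{
    char host_value = initial;
    char *d_p = NULL;  // device-side copy of the char

    // Allocate global memory on the GPU and copy the host value into it.
    if (cudaMalloc((void **)&d_p, sizeof(char)) != cudaSuccess)
        return initial;
    cudaMemcpy(d_p, &host_value, sizeof(char), cudaMemcpyHostToDevice);

    // The kernel receives the device pointer, never the host pointer.
    updatep<<<1, 1>>>(d_p);

    // Copy the result back; this copy waits for the kernel to finish.
    cudaMemcpy(&host_value, d_p, sizeof(char), cudaMemcpyDeviceToHost);
    cudaFree(d_p);
    return host_value;
}
```

Calling `run_updatep('a')` from `main` and printing the result should show `e` on a machine with a working CUDA device, in both emulation and real device builds.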

I want to thank you for your patience in helping me fix my mistakes. I had to modify my code to make everything work in one dimension. I am happy that the code is now running fine. Thanks for your time.
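For anyone landing here with the same problem, "one dimension" could look something like the sketch below. It is an assumption about the layout, not the poster's actual code: the level sizes (4^i nodes at level i, 3 levels) come from the loops in the first post, while `level_offset`, `flat_index`, `mark_symbols`, `fill_symbols` and the values written are hypothetical. A single flat buffer needs only one cudaMalloc and one cudaMemcpy, and contains no host pointers for the kernel to trip over.

```cuda
#include <cuda_runtime.h>

#define LEVELS 3  // loop bound taken from the kernel in the first post

// Number of nodes in all levels above `level`: 1 + 4 + ... + 4^(level-1).
__host__ __device__ inline int level_offset(int level)
{
    return ((1 << (2 * level)) - 1) / 3;
}

// Position of node j of the given level inside the flat array.
__host__ __device__ inline int flat_index(int level, int j)
{
    return level_offset(level) + j;
}

// Hypothetical kernel: a single thread walks the levels the same way the
// original loops do and writes a placeholder value ('a' + level) per node.
__global__ void mark_symbols(char *symbol)
{
    if (blockIdx.x != 0 || threadIdx.x != 0) return;
    for (int i = 0; i < LEVELS; i++)
        for (int j = 0; j < (1 << (2 * i)); j++)  // 4^i nodes at level i
            symbol[flat_index(i, j)] = 'a' + i;
}

// Hypothetical host helper: fills out[0 .. level_offset(LEVELS)) via the GPU.
// Returns 0 on success, -1 on any CUDA error.
int fill_symbols(char *out)
{
    int total = level_offset(LEVELS);  // 1 + 4 + 16 = 21 nodes
    char *d_symbol = NULL;
    if (cudaMalloc((void **)&d_symbol, total) != cudaSuccess) return -1;

    mark_symbols<<<1, 1>>>(d_symbol);

    cudaError_t err = cudaMemcpy(out, d_symbol, total, cudaMemcpyDeviceToHost);
    cudaFree(d_symbol);
    return err == cudaSuccess ? 0 : -1;
}
```

The integer shift `1 << (2 * i)` stands in for `pow(4.0, i)` from the original loop and avoids floating point in the loop bound.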