Kernel doesn't run

Hello,
I am currently writing a CUDA program on a Jetson TX2 running Ubuntu 16.04. The main program is in a ".cpp" file and calls a function defined in a ".cu" file. For example, "Test.cpp" calls into "simple_printf.cu" through a declaration with C linkage, such as extern "C" int simple_printf();.
In short, my program looks like this:

kernel:
__global__ void simple_printf()
{
    printf("Get into kernel successful\n");
}

extern "C" int test()
{
    ...
    int *d_image = NULL;
    CHECK(cudaMalloc((void**)&d_image, data_size));
    CHECK(cudaMemcpy(d_image, data, data_size, cudaMemcpyHostToDevice));

    simple_printf<<<2,200>>>();
    return 0;
}
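The post uses a CHECK macro without showing its definition. A common shape for such a macro is sketched below; the macro body and the helper upload_roundtrip() are hypothetical, not the poster's actual code. Printing cudaGetErrorName() next to the numeric code turns a bare "err code 30" into the corresponding enum name, which is easier to look up.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical definition: the post uses CHECK(...) but does not show it.
// Wraps a CUDA runtime call; on failure it prints file, line, numeric code,
// enum name and message, then terminates the program.
#define CHECK(call)                                                  \
    do {                                                             \
        const cudaError_t err_ = (call);                             \
        if (err_ != cudaSuccess) {                                   \
            fprintf(stderr, "%s:%d: CUDA error %d (%s): %s\n",       \
                    __FILE__, __LINE__, (int)err_,                   \
                    cudaGetErrorName(err_), cudaGetErrorString(err_)); \
            exit(EXIT_FAILURE);                                      \
        }                                                            \
    } while (0)

// Hypothetical helper exercising the macro: allocate a device buffer,
// upload host data into it, and free it again. Because CHECK exits on
// failure, reaching the return statement means every call succeeded.
int upload_roundtrip(const int *data, size_t data_size)
{
    int *d_image = NULL;
    CHECK(cudaMalloc((void **)&d_image, data_size));
    CHECK(cudaMemcpy(d_image, data, data_size, cudaMemcpyHostToDevice));
    CHECK(cudaFree(d_image));
    return 0;
}
```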

The problem: if I run the program as root ("sudo ./Test"), it runs, but the kernel does not seem to execute (I cannot see its output). If I run "./Test" without sudo, the following line fails with "err code 30, Unknown error":

CHECK(cudaMalloc((void**)&d_image, data_size));

The key problem is that the kernel (simple_printf) does not seem to launch.

This has troubled me for a long time, so I came here to see whether anyone knows how to solve it.

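Kernel launches are asynchronous, so test() returns (and the program can exit) before the kernel has run, and device-side printf output is only flushed at synchronization points. Add cudaDeviceSynchronize() after the kernel call.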
http://15418.courses.cs.cmu.edu/spring2013/article/15
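A minimal sketch of the fix, assuming the same two-file layout as the question (a host wrapper test() exported with extern "C" from the .cu file and declared in Test.cpp). The data upload is left out because its details were elided in the post. The wrapper checks cudaGetLastError() right after the launch, which reports problems with the launch itself, and then calls cudaDeviceSynchronize(), which waits for the kernel to finish, returns any error raised while it ran, and flushes the device-side printf buffer so the output appears.

```cuda
// simple_printf.cu: assumed to be built together with Test.cpp,
// e.g. nvcc Test.cpp simple_printf.cu -o Test
#include <cstdio>
#include <cuda_runtime.h>

__global__ void simple_printf()
{
    // Device-side printf writes into a buffer that the host flushes
    // to stdout at synchronization points.
    printf("Get into kernel successful\n");
}

// Host wrapper with C linkage so Test.cpp can call it after declaring
//     extern "C" int test();
// Returns 0 on success and -1 on any CUDA error.
extern "C" int test()
{
    // The launch is asynchronous: control returns here immediately,
    // before the kernel has executed.
    simple_printf<<<2, 200>>>();

    // Reports errors in the launch itself (e.g. an invalid configuration).
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "launch failed: %s\n", cudaGetErrorString(err));
        return -1;
    }

    // Waits for the kernel to finish, reports errors that occurred while
    // it ran, and flushes the printf buffer.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return -1;
    }
    return 0;
}
```

Test.cpp then only needs the declaration extern "C" int test(); and a call such as return test(); in main(). Checking the launch and the synchronization separately tells you whether the kernel failed to start or failed while running.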

To saulocpp:
Thanks a lot, it works well now. I really need to learn how to use synchronization properly.