GPGPU programming: CPU code and CUDA kernel launch do not execute sequentially

Hi all,

I’m trying to write a GPGPU program for my research project. However, when I compiled and ran my CUDA program on an Nvidia GeForce 640M, I found that the CPU code and the kernel launch do not execute in sequential order. Here is a snapshot of the code:

__global__ void insert( … ) {
    // …
}

__global__ void match( … ) {
    // …
}

int main () {

    // launch the kernel “insert”
    insert<<<6, 8>>>(dev_pHash_, dev_pTable_Size_, dev_pWordListArray, dev_pWordListLength,
                     dev_pTotalWordListArray, dev_pBitMask, dev_pBF_bit_table);

    // Call input function to input string for pattern matching

    // Call function StringMatch() to invoke kernel “match”

}
When I run the program, I notice that the StringCapture() function runs before the kernel named “insert” gets executed.

Is there any way to solve this issue?

I’m using the Visual Studio 2008 IDE on Windows 7.

Are you doing proper CUDA error checking on all kernel launches and all CUDA API calls? Your insert kernel may not be running properly due to errors. As a quick test, you could also try running your code with cuda-memcheck.
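For reference, here is a minimal sketch of what such error checking could look like. The CUDA_CHECK macro, the dev_out buffer and the simplified one-argument insert kernel are hypothetical stand-ins for illustration, not the code from the original post; only the 6 x 8 launch shape is taken from it.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): prints a message
// and exits if a CUDA runtime call returns anything other than cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error: %s at %s:%d\n",              \
                    cudaGetErrorString(err_), __FILE__, __LINE__);    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Placeholder kernel standing in for "insert"; the real kernel takes
// different arguments and does different work.
__global__ void insert(int *out) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    out[i] = i;
}

int main() {
    const int n = 6 * 8;  // 6 blocks of 8 threads, as in the post
    int *dev_out = NULL;
    CUDA_CHECK(cudaMalloc(&dev_out, n * sizeof(int)));

    insert<<<6, 8>>>(dev_out);
    CUDA_CHECK(cudaGetLastError());       // reports launch/configuration errors
    CUDA_CHECK(cudaDeviceSynchronize());  // reports errors raised while the kernel ran

    CUDA_CHECK(cudaFree(dev_out));
    return 0;
}
```

A kernel launch returns no status of its own, which is why the sketch queries cudaGetLastError() right after the launch and then checks the result of the synchronization call as well.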

Shouldn’t ‘cudaDeviceSynchronize’ be called before ‘StringCapture’? As far as I remember, kernels are executed asynchronously from the host’s point of view. Checking for CUDA errors would be helpful too.
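Something along these lines might illustrate the idea. This is only a sketch: the one-argument insert kernel, the dev_flag buffer and the body of StringCapture() are placeholders I made up, since the real ones are not shown in the post.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel standing in for "insert" (the real arguments are elided
// in the post): one thread sets a flag so the host can see that it ran.
__global__ void insert(int *flag) {
    if (blockIdx.x == 0 && threadIdx.x == 0) {
        *flag = 1;
    }
}

// Placeholder for the host-side input function named in the post.
static void StringCapture(void) {
    printf("host: StringCapture() running\n");
}

int main() {
    int *dev_flag = NULL;
    int host_flag = 0;
    cudaMalloc(&dev_flag, sizeof(int));
    cudaMemset(dev_flag, 0, sizeof(int));

    // The launch only queues the kernel; control returns to the host at once.
    insert<<<6, 8>>>(dev_flag);

    // Block the host until all previously queued device work has finished.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "insert failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    // The host function now starts only after the kernel has completed.
    StringCapture();

    cudaMemcpy(&host_flag, dev_flag, sizeof(int), cudaMemcpyDeviceToHost);
    printf("flag written by kernel: %d\n", host_flag);

    cudaFree(dev_flag);
    return 0;
}
```

Without the cudaDeviceSynchronize() call, the host is free to enter StringCapture() while the kernel is still queued or running, which would match the ordering described in the question.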