Returning to host code in pyCUDA after asynchronous kernel launch [pyCUDA]

gpu_control · May 8, 2015, 4:31pm

I am trying to launch a kernel in pyCUDA and then terminate the kernel by writing to a GPU global memory location. Here is a simple example kernel that I would like to be able to terminate at some point after it enters the infinite while loop:

__global__ void countUp(u16 *inShot, u64 *counter) {
  while(inShot[0]) {
    counter[0]++;
  }
}

From what I have read about streams in CUDA, I should be able to launch this kernel after creating a stream, and the launch will be non-blocking on the host, i.e., I should be able to do other work on the host while the kernel is running. I compile the above kernel to a cubin file and launch it in pyCUDA like so:

import numpy as np
from pycuda import driver, compiler, gpuarray, tools
# -- initialize the device
import pycuda.autoinit

strm1 = driver.Stream()

# -- copy the loop flag and the counter to the device on strm1
h_inShot = np.zeros((1, 1))
d_inShot = gpuarray.to_gpu_async(h_inShot.astype(np.uint16), stream=strm1)
h_inShot = np.ones((1, 1))
h_counter = np.zeros((1, 1))
d_counter = gpuarray.to_gpu_async(h_counter.astype(np.uint64), stream=strm1)

# -- load the precompiled kernel from the cubin file
testCubin = "testKernel.cubin"
mod = driver.module_from_file(testCubin)
countUp = mod.get_function("countUp")

# -- launch a single thread on strm1
countUp(d_inShot, d_counter,
        grid=(1, 1, 1),
        block=(1, 1, 1),
        stream=strm1)

Running this script causes the kernel to enter an infinite while loop for obvious reasons. When I launch the script from the IPython environment, control does not seem to return to the host after the kernel launch (I can’t input new commands as I guess it's waiting for the kernel to finish). I would like control to return to the host so that I can change the value that d_inShot points to in GPU global memory and have the kernel exit the while loop. Is this even possible, and if so, how do I do it in pyCUDA? Thanks.
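For reference, here is a minimal sketch of the pattern being asked about, written under several assumptions rather than as a confirmed fix. It assumes the device can overlap a host-to-device copy with a running kernel, that the flag pointer needs to be `volatile` so the loop re-reads it on every iteration, and that the `u16`/`u64` typedefs match the original ones. The kernel is compiled from source with `SourceModule` instead of being loaded from the cubin, and the names `run_stoppable_kernel`, `kernel_stream`, `copy_stream` and `host_work_s` are hypothetical.

```python
# CUDA source kept as a string so PyCUDA can compile it at run time.
# Assumption: u16/u64 are plain unsigned integer typedefs.
# `volatile` makes the loop re-read the flag from memory on every iteration.
KERNEL_SRC = r"""
typedef unsigned short u16;
typedef unsigned long long u64;

__global__ void countUp(volatile u16 *inShot, u64 *counter)
{
    while (inShot[0]) {
        counter[0]++;
    }
}
"""


def run_stoppable_kernel(host_work_s=0.5):
    """Launch countUp, do host work, then clear the flag from a second stream."""
    import time
    import numpy as np
    import pycuda.autoinit  # noqa: F401  (creates the CUDA context)
    from pycuda import driver
    from pycuda.compiler import SourceModule

    count_up = SourceModule(KERNEL_SRC).get_function("countUp")

    # Separate streams: work queued on one does not wait for the other.
    kernel_stream = driver.Stream()
    copy_stream = driver.Stream()

    # Page-locked host buffers, so the later async copy can really be async.
    h_flag = driver.pagelocked_empty(1, np.uint16)
    h_flag[0] = 1  # non-zero keeps the kernel looping
    h_counter = driver.pagelocked_zeros(1, np.uint64)

    d_flag = driver.mem_alloc(h_flag.nbytes)
    d_counter = driver.mem_alloc(h_counter.nbytes)
    driver.memcpy_htod(d_flag, h_flag)
    driver.memcpy_htod(d_counter, h_counter)

    count_up(d_flag, d_counter,
             grid=(1, 1, 1), block=(1, 1, 1), stream=kernel_stream)

    time.sleep(host_work_s)  # stand-in for real host-side work

    # Clear the flag from the second stream while the kernel is still running.
    h_flag[0] = 0
    driver.memcpy_htod_async(d_flag, h_flag, stream=copy_stream)

    kernel_stream.synchronize()  # returns once the kernel has left the loop
    driver.memcpy_dtoh(h_counter, d_counter)
    return int(h_counter[0])


if __name__ == "__main__":
    print("iterations counted:", run_stoppable_kernel())
```

Whether the copy actually overlaps with the running kernel depends on the hardware and driver, so on some setups the call may still block. A GPU that also drives a display may have long-running kernels killed by the watchdog timer.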

little_jimmy · May 9, 2015, 1:57pm

Do you have additional device calls after the kernel launch, perhaps a supposedly asynchronous memory copy from device to host that is not actually acting asynchronously?

“I can’t input new commands as I guess it's waiting for the kernel to finish”

Are you unable to input new commands at all, or does the input have no effect?
There is a difference.
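One way to tell the two cases apart is to poll the stream instead of waiting on it. The sketch below assumes only that the object passed in has an `is_done()` method, as a PyCUDA `driver.Stream` does; the helper name `wait_until_done` and its parameters are hypothetical. If this helper can be called and returns `False` right after the launch, the host was not blocked by the launch itself and the kernel is simply still running.

```python
import time


def wait_until_done(stream, timeout_s=1.0, poll_s=0.01):
    """Poll stream.is_done() without blocking the host indefinitely.

    Returns True if the queued work finished within timeout_s, else False.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if stream.is_done():
            return True
        time.sleep(poll_s)  # yield the CPU between polls
    return False
```

With the script from the question, this could be called as `wait_until_done(strm1)` right after the `countUp(...)` launch.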

Robert_Crovella · May 9, 2015, 7:48pm

It seems the answer has already been found:

python - Returning to host code in pyCUDA after asynchronous kernel launch - Stack Overflow
