Can we have more than 1 Kernel function in a CUDA application?
E.g.
main()
{
// Host code 1
…
kernel_fn1<<<grid, block>>>();
…
// Host code 2
kernel_fn2<<<grid, block>>>();
…
}

__global__ void kernel_fn1()
{
}

__global__ void kernel_fn2()
{
}
Thanks and Regards,
Priya
Yes.
If more than one kernel function is allowed in a program, how does the GPU retrieve the old state of the
application? By old state I mean, say, the host program did a cudaMalloc and got some GPU memory,
and it wants to reuse that memory after host code 2 in kernel function 2. How will the GPU know that the host had
already allocated some x amount of memory?
OR does the host have to take the initiative and make some sort of context-management calls?
main()
{
// Host code 1
float *mem1;
cudaMalloc((void **)&mem1, size);
…
kernel_fn1<<<grid, block>>>(mem1);
…
// Host code 2
kernel_fn2<<<grid, block>>>(mem1);
…
cudaFree(mem1);
}
On the contrary, why would allocated memory go poof and disappear after a mere function call? That would be a rather silly way to do things.
Of course. Read all about it in the programming guide.
If you are using the runtime API, one host thread = one context.
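Here is a minimal compilable sketch of what that means in practice (the kernel names scale_fn1/offset_fn2, the buffer size N, and the launch configuration are made up for illustration): the buffer allocated once in the host thread's context stays valid across both kernel launches and across whatever host code runs in between, so nothing special has to be done to "restore" it before the second launch.
[codebox]
#include <cuda_runtime.h>

__global__ void scale_fn1(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                       // kernel 1 writes the buffer
}

__global__ void offset_fn2(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;                       // kernel 2 reuses the same buffer
}

int main()
{
    const int N = 1024;
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, N * sizeof(float));   // allocated once by the host
    cudaMemset(d_buf, 0, N * sizeof(float));

    scale_fn1<<<(N + 255) / 256, 256>>>(d_buf, N);    // first launch
    /* ... arbitrary host code here; the allocation persists ... */
    offset_fn2<<<(N + 255) / 256, 256>>>(d_buf, N);   // second launch, same memory
    cudaDeviceSynchronize();

    cudaFree(d_buf);                                  // host frees it when done
    return 0;
}
[/codebox]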
The host process will not disappear right after allocating, but suppose it gets scheduled out just after executing cudaMalloc(),
or after kernel_fn1 has been invoked but before the cudaFree() call. Who maintains the information about the size of the
malloced area in global memory?
Does the malloced area (the physical global memory) always remain reserved for this process until
it frees it, or can the same physical location be given to some other process by swapping out the contents
of the old process?
Thanks and Regards,
Priya.
As far as CUDA is concerned, memory allocated on the card does not move or disappear until it is freed by the host or the host process terminates. (That is to say, you cannot leak global memory by forgetting to free it at the end of your program.) There is no concept of “swapping” out the contents of global memory for other processes.
Can the CUDA layer detect that a host process has terminated?
If there is no concept of memory swapping, then suppose two host processes have
allocated cudaMalloc'ed memory (leaving almost no global memory for new processes) and have been scheduled out; some other host process might not be able to allocate memory and may have to retry only after process 1 and/or 2 release their memory.
Thanks.
Yes, when the host process terminates, the GPU memory is automatically freed.
This is correct. If a host process allocates all the GPU memory, then no other process will be able to allocate any GPU memory until the first process is done.
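If you want your process to cope with that, a small sketch (using nothing beyond the standard runtime API) is to query the currently free global memory with cudaMemGetInfo() and check the return value of cudaMalloc() instead of assuming it succeeded:
[codebox]
#include <cuda_runtime.h>
#include <stdio.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);          // memory left on the device right now
    printf("free: %zu of %zu bytes\n", freeBytes, totalBytes);

    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, (size_t)256 * 1024 * 1024);
    if (err != cudaSuccess) {
        /* Another process may be holding the memory; retry later or fail gracefully. */
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ... use d_buf ... */

    cudaFree(d_buf);
    return 0;
}
[/codebox]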
That’s what I did earlier, but it seems the shared memory is getting overwritten by something else. At least that’s what cuda-gdb indicates. I’m pasting a cuda-gdb session to clarify.
[codebox]
Breakpoint 1, computeSAR () at SAR_kernel.cu:106
106 sData[i] = gObsPoints_x[i];
Current language: auto; currently c++
(cuda-gdb) p i
$1 = 0
(cuda-gdb) n
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:107
107 sData[nAptPos + i] = gObsPoints_y[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:108
108 sData[2 * nAptPos + i] = gObsPoints_z[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:109
109 sData[3 * nAptPos + i] = gFreq[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:110
110 sData[4 * nAptPos + i] = gRange[i];
(cuda-gdb) p sData[3 * nAptPos]
$2 = 5693.09619
(cuda-gdb) p nAptPos
$3 = 587
(cuda-gdb) p gFreq[i]
$4 = 9.28808038e+09
[/codebox]
Even though I’m copying gFreq[i] to sData[3 * nAptPos], they have different values after the assignment. The indices are within bounds, and no other thread in the same block writes to that location.
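One thing worth double-checking here is that the shared array really has room for all five segments. If sData (or the dynamic shared-memory size passed at launch) is smaller than 5 * nAptPos floats, the later segments can land on top of other shared data and look exactly like another thread overwriting them. A hedged sketch of the load pattern, reusing the identifier names from the session above but otherwise assumed, not the actual computeSAR code:
[codebox]
// Sketch only: sData, nAptPos, gObsPoints_x/y/z, gFreq and gRange are taken from the
// cuda-gdb session above; the kernel signature and launch line are assumptions.
extern __shared__ float sData[];   // must cover at least 5 * nAptPos floats

__global__ void computeSAR_load(const float *gObsPoints_x, const float *gObsPoints_y,
                                const float *gObsPoints_z, const float *gFreq,
                                const float *gRange, int nAptPos)
{
    // Each thread copies a strided slice of the five input arrays into shared memory.
    for (int i = threadIdx.x; i < nAptPos; i += blockDim.x) {
        sData[0 * nAptPos + i] = gObsPoints_x[i];
        sData[1 * nAptPos + i] = gObsPoints_y[i];
        sData[2 * nAptPos + i] = gObsPoints_z[i];
        sData[3 * nAptPos + i] = gFreq[i];
        sData[4 * nAptPos + i] = gRange[i];
    }
    __syncthreads();   // after this, all of sData is valid for every thread

    /* ... computation using sData ... */
}

// The launch must reserve the full dynamic shared-memory size, e.g.:
// computeSAR_load<<<grid, block, 5 * nAptPos * sizeof(float)>>>(...);
[/codebox]
With nAptPos = 587 this needs roughly 11.5 KB of shared memory, which still fits the 16 KB per-block limit of older GPUs, so an undersized declaration or launch argument is a more likely culprit than a hardware limit.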