Can we have more than 1 Kernel function in a CUDA application?
E.g.
main()
{
// Host code 1
…
kernel_fn1<<<grid, block>>>();
…
// Host code 2
kernel_fn2<<<grid, block>>>();
…
}

__global__ void kernel_fn1()
{
}

__global__ void kernel_fn2()
{
}
Thanks and Regards,
Priya
Yes.
If more than one kernel function is allowed in a program, how does the GPU retrieve the old state of the
application? By old state I mean, say, the host program did a cudaMalloc and got some GPU memory,
and it wants to reuse that memory after host code 2 in kernel function 2. How will the GPU know that the host had
already allocated some x amount of memory?
OR does the host have to take the initiative and make some sort of context-management calls?
main()
{
// Host code 1
float *mem1;
cudaMalloc((void **)&mem1, size);
…
kernel_fn1<<<grid, block>>>(mem1);
…
// Host code 2
kernel_fn2<<<grid, block>>>(mem1);
…
cudaFree(mem1);
}
On the contrary, why would allocated memory go poof and disappear after a mere function call? That would be a rather silly way to do things.
Of course. Read all about it in the programming guide.
If you are using the runtime API, one host thread = one context.
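Here is a minimal compilable sketch of what that means in practice (the kernel names scale_fn1/offset_fn2, the buffer size N, and the launch configuration are made up for illustration): the buffer allocated once in the host thread's context stays valid across both kernel launches and across whatever host code runs in between, so nothing special has to be done to "restore" it before the second launch.
[codebox]
#include <cuda_runtime.h>

__global__ void scale_fn1(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;                       // kernel 1 writes the buffer
}

__global__ void offset_fn2(float *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;                       // kernel 2 reuses the same buffer
}

int main()
{
    const int N = 1024;
    float *d_buf = NULL;
    cudaMalloc((void **)&d_buf, N * sizeof(float));   // allocated once by the host
    cudaMemset(d_buf, 0, N * sizeof(float));

    scale_fn1<<<(N + 255) / 256, 256>>>(d_buf, N);    // first launch
    /* ... arbitrary host code here; the allocation persists ... */
    offset_fn2<<<(N + 255) / 256, 256>>>(d_buf, N);   // second launch, same memory
    cudaDeviceSynchronize();

    cudaFree(d_buf);                                  // host frees it when done
    return 0;
}
[/codebox]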
The host process will not disappear right after allocating, but suppose it gets scheduled out just after executing cudaMalloc(),
or after kernel_fn1 has been invoked but before the cudaFree() call. Who maintains the information about the size of the
malloced area in global memory?
Does the malloced area (the physical global memory) always remain reserved for this process until
it frees it, or can the same physical location be given to some other process by swapping out the contents
of the old process?
Thanks and Regards,
Priya.
As far as CUDA is concerned, memory allocated on the card does not move or disappear until it is freed by the host or the host process terminates. (That is to say, you cannot leak global memory by forgetting to free it at the end of your program.) There is no concept of “swapping” out the contents of global memory for other processes.
Can the CUDA layer detect that a host process has terminated?
If there is no concept of memory swapping, then suppose two host processes have
allocated cudaMalloc'ed memory (leaving almost no global memory for new processes) and have been scheduled out; some other host process might not be able to allocate memory and may have to retry only after process 1 and/or 2 release their memory.
Thanks.
Yes, when the host process terminates, the GPU memory is automatically freed.
This is correct. If a host process allocates all the GPU memory, then no other process will be able to allocate any GPU memory until the first process is done.
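If you want your process to cope with that, a small sketch (using nothing beyond the standard runtime API) is to query the currently free global memory with cudaMemGetInfo() and check the return value of cudaMalloc() instead of assuming it succeeded:
[codebox]
#include <cuda_runtime.h>
#include <stdio.h>

int main()
{
    size_t freeBytes = 0, totalBytes = 0;
    cudaMemGetInfo(&freeBytes, &totalBytes);          // memory left on the device right now
    printf("free: %zu of %zu bytes\n", freeBytes, totalBytes);

    float *d_buf = NULL;
    cudaError_t err = cudaMalloc((void **)&d_buf, (size_t)256 * 1024 * 1024);
    if (err != cudaSuccess) {
        /* Another process may be holding the memory; retry later or fail gracefully. */
        printf("cudaMalloc failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    /* ... use d_buf ... */

    cudaFree(d_buf);
    return 0;
}
[/codebox]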
That’s what I did earlier, but it seems the shared memory is getting overwritten by something else. At least that’s what cuda-gdb indicates. I’m pasting a cuda-gdb session to clarify.
[codebox]
Breakpoint 1, computeSAR () at SAR_kernel.cu:106
106 sData[i] = gObsPoints_x[i];
Current language: auto; currently c++
(cuda-gdb) p i
$1 = 0
(cuda-gdb) n
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:107
107 sData[nAptPos + i] = gObsPoints_y[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:108
108 sData[2 * nAptPos + i] = gObsPoints_z[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:109
109 sData[3 * nAptPos + i] = gFreq[i];
(cuda-gdb)
[Current CUDA Thread <<<(0,0),(0,0,0)>>>]
computeSAR () at SAR_kernel.cu:110
110 sData[4 * nAptPos + i] = gRange[i];
(cuda-gdb) p sData[3 * nAptPos]
$2 = 5693.09619
(cuda-gdb) p nAptPos
$3 = 587
(cuda-gdb) p gFreq[i]
$4 = 9.28808038e+09
[/codebox]
Even though I’m copying gFreq[i] to sData[3 * nAptPos], they have different values after the assignment. The indices are within bounds, and no other thread in the same block writes to that location.
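One thing worth double-checking here is that the shared array really has room for all five segments. If sData (or the dynamic shared-memory size passed at launch) is smaller than 5 * nAptPos floats, the later segments can land on top of other shared data and look exactly like another thread overwriting them. A hedged sketch of the load pattern, reusing the identifier names from the session above but otherwise assumed, not the actual computeSAR code:
[codebox]
// Sketch only: sData, nAptPos, gObsPoints_x/y/z, gFreq and gRange are taken from the
// cuda-gdb session above; the kernel signature and launch line are assumptions.
extern __shared__ float sData[];   // must cover at least 5 * nAptPos floats

__global__ void computeSAR_load(const float *gObsPoints_x, const float *gObsPoints_y,
                                const float *gObsPoints_z, const float *gFreq,
                                const float *gRange, int nAptPos)
{
    // Each thread copies a strided slice of the five input arrays into shared memory.
    for (int i = threadIdx.x; i < nAptPos; i += blockDim.x) {
        sData[0 * nAptPos + i] = gObsPoints_x[i];
        sData[1 * nAptPos + i] = gObsPoints_y[i];
        sData[2 * nAptPos + i] = gObsPoints_z[i];
        sData[3 * nAptPos + i] = gFreq[i];
        sData[4 * nAptPos + i] = gRange[i];
    }
    __syncthreads();   // after this, all of sData is valid for every thread

    /* ... computation using sData ... */
}

// The launch must reserve the full dynamic shared-memory size, e.g.:
// computeSAR_load<<<grid, block, 5 * nAptPos * sizeof(float)>>>(...);
[/codebox]
With nAptPos = 587 this needs roughly 11.5 KB of shared memory, which still fits the 16 KB per-block limit of older GPUs, so an undersized declaration or launch argument is a more likely culprit than a hardware limit.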