Hello to all,
I have two questions about launch bounds as described in the CUDA_C_Programming guide for CUDA 4.0 RC2.
Page 143 of the guide reads:
“If launch bounds are specified, the compiler first derives from them the upper limit L on the number of
registers the kernel should use to ensure that minBlocksPerMultiprocessor blocks (or a single block if
minBlocksPerMultiprocessor is not specified) of maxThreadsPerBlock threads can reside on the multiprocessor
(see Section 4.2 for the relationship between the number of registers used by a kernel and the number of
registers allocated per block). The compiler then optimizes register usage in the following way:…”
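For context, here is a minimal sketch of how the two bounds are attached to a kernel, with a host-side query of the register count the compiler actually chose. The kernel `scaleKernel` and the bound values 256 and 2 are hypothetical, picked only for illustration. `cudaFuncGetAttributes` and the `numRegs` / `maxThreadsPerBlock` fields of `cudaFuncAttributes` are part of the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical bounds chosen for illustration only.
constexpr int kMaxThreadsPerBlock = 256;        // maxThreadsPerBlock
constexpr int kMinBlocksPerMultiprocessor = 2;  // minBlocksPerMultiprocessor

// The compiler derives the per-thread register limit L from these two values.
__global__ void
__launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerMultiprocessor)
scaleKernel(float* data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        data[i] *= factor;
    }
}

int main()
{
    // Ask the runtime what the compiler produced for this kernel.
    cudaFuncAttributes attr;
    cudaError_t err = cudaFuncGetAttributes(&attr, scaleKernel);
    if (err != cudaSuccess) {
        std::printf("cudaFuncGetAttributes failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    std::printf("registers per thread:  %d\n", attr.numRegs);
    std::printf("max threads per block: %d\n", attr.maxThreadsPerBlock);
    return 0;
}
```

Compiling with `nvcc -Xptxas -v` also reports the per-kernel register count at build time, which is a quick way to see whether the compiler stayed at or below L.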
Q1.) What happens if the upper limit L on the number of registers the kernel would use exceeds the number of registers
available per multiprocessor? The guide does not mention this case.
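To make the arithmetic behind L concrete, here is a simplified model. It assumes L is the per-multiprocessor register file divided by minBlocksPerMultiprocessor times maxThreadsPerBlock, then clamped to the per-thread hardware maximum. The real compiler also accounts for register allocation granularity (Section 4.2), so its numbers can differ. The figures of 32768 registers per multiprocessor and 63 registers per thread are those of compute capability 2.x devices and serve only as an example; `boundsLimit` and `registerLimit` are hypothetical helpers.

```cuda
#include <cstdio>

// Registers per thread that would let minBlocks blocks of maxThreads threads
// share the register file (simplified: no allocation granularity, no warp rounding).
constexpr int boundsLimit(int regsPerMultiprocessor, int maxThreadsPerBlock,
                          int minBlocksPerMultiprocessor)
{
    return regsPerMultiprocessor / (minBlocksPerMultiprocessor * maxThreadsPerBlock);
}

// Hypothetical helper: L is also capped by the per-thread register maximum.
constexpr int registerLimit(int regsPerMultiprocessor, int maxThreadsPerBlock,
                            int minBlocksPerMultiprocessor, int maxRegsPerThread)
{
    return boundsLimit(regsPerMultiprocessor, maxThreadsPerBlock,
                       minBlocksPerMultiprocessor) < maxRegsPerThread
               ? boundsLimit(regsPerMultiprocessor, maxThreadsPerBlock,
                             minBlocksPerMultiprocessor)
               : maxRegsPerThread;
}

int main()
{
    // Example figures for a compute capability 2.x device (assumption).
    const int regsPerSM = 32768;
    const int maxRegsPerThread = 63;

    // 32768 / (2 * 256) = 64, capped to 63.
    std::printf("L(256, 2) = %d\n", registerLimit(regsPerSM, 256, 2, maxRegsPerThread));
    // 32768 / (2 * 512) = 32.
    std::printf("L(512, 2) = %d\n", registerLimit(regsPerSM, 512, 2, maxRegsPerThread));
    // 32768 / (1 * 128) = 256, capped to 63.
    std::printf("L(128, 1) = %d\n", registerLimit(regsPerSM, 128, 1, maxRegsPerThread));
    return 0;
}
```

In this simplified model the quotient itself can exceed the per-thread hardware maximum (the 256- and 128-thread cases), which is where the clamp matters.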
Q2.) If the launch bounds are evaluated and the optimization is done at compile time, will compilation fail even if the execution configuration
<<<n_blocks_grid, m_threads_block>>> is such that n_blocks_grid = minBlocksPerMultiprocessor and m_threads_block < maxThreadsPerBlock?
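One way to check Q2 empirically is a small test program like the sketch below. It builds a kernel with hypothetical bounds (256, 2), launches it with the configuration from the question (2 blocks of 128 threads) and, for comparison, with more threads per block than maxThreadsPerBlock, then prints what the runtime reports for each launch. It does not assume a particular result; `fillKernel` and `tryLaunch` are hypothetical names.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical bounds used only for this experiment.
constexpr int kMaxThreadsPerBlock = 256;
constexpr int kMinBlocksPerMultiprocessor = 2;

__global__ void
__launch_bounds__(kMaxThreadsPerBlock, kMinBlocksPerMultiprocessor)
fillKernel(int* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = i;
    }
}

// Launch with the given configuration and report what the runtime says.
static void tryLaunch(int blocks, int threads, int* d_out, int n)
{
    fillKernel<<<blocks, threads>>>(d_out, n);
    cudaError_t launchErr = cudaGetLastError();     // launch/configuration errors
    cudaError_t syncErr = cudaDeviceSynchronize();  // errors during execution
    std::printf("<<<%d, %d>>>: launch=%s, sync=%s\n", blocks, threads,
                cudaGetErrorString(launchErr), cudaGetErrorString(syncErr));
}

int main()
{
    const int n = kMinBlocksPerMultiprocessor * kMaxThreadsPerBlock;
    int* d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) {
        std::printf("cudaMalloc failed\n");
        return 1;
    }
    // The case from Q2: n_blocks_grid = minBlocksPerMultiprocessor and
    // m_threads_block < maxThreadsPerBlock.
    tryLaunch(kMinBlocksPerMultiprocessor, kMaxThreadsPerBlock / 2, d_out, n);
    // For comparison: more threads per block than maxThreadsPerBlock.
    tryLaunch(kMinBlocksPerMultiprocessor, kMaxThreadsPerBlock * 2, d_out, n);
    cudaFree(d_out);
    return 0;
}
```

Since the execution configuration is only known when the launch happens, checking both the compile step and the runtime error codes separates the two halves of the question.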