I’m having a problem with __launch_bounds__(). I want to limit the number of registers per thread manually, so I put __launch_bounds__() in the definition of my __global__ function. However, the build fails: I’m always told that a function with a maximum of 64 registers per thread can’t
call a function that uses 70 registers. As I understand it, this means I can’t get the register count I asked for. Then I realized that my __global__ function calls a __device__ function, and I’m pretty sure that is what causes the trouble. However, I don’t know how to deal with this, because __launch_bounds__() only applies to __global__ functions.
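
To make the situation concrete, here is a minimal sketch of this kind of setup. It is not the original code: the names helper, myKernel and runExample, the bounds (256, 4), and the arithmetic are all hypothetical. It assumes the __device__ function can be defined in the same translation unit as the kernel and marked __forceinline__, in which case the compiler can fold it into the kernel instead of compiling it as a separate function with its own register count.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical device helper. __forceinline__ asks the compiler to inline it
// into the caller, so it is compiled as part of the kernel and falls under
// the kernel's launch bounds rather than getting its own register budget.
__device__ __forceinline__ float helper(float x) {
    return x * x + 1.0f;
}

// __launch_bounds__(maxThreadsPerBlock, minBlocksPerMultiprocessor)
// The values 256 and 4 are assumptions chosen only for illustration.
__global__ void __launch_bounds__(256, 4)
myKernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = helper(in[i]);
    }
}

// Runs the kernel once and returns 0 on success, 1 on any CUDA error
// or on a wrong result.
int runExample() {
    const int n = 1024;
    float* in = nullptr;
    float* out = nullptr;
    if (cudaMallocManaged(&in, n * sizeof(float)) != cudaSuccess) return 1;
    if (cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess) {
        cudaFree(in);
        return 1;
    }
    for (int i = 0; i < n; ++i) in[i] = static_cast<float>(i);

    // The block size must not exceed the first __launch_bounds__ argument.
    myKernel<<<(n + 255) / 256, 256>>>(in, out, n);
    cudaError_t err = cudaDeviceSynchronize();

    int status = (err == cudaSuccess) ? 0 : 1;
    for (int i = 0; i < n && status == 0; ++i) {
        float x = static_cast<float>(i);
        if (out[i] != x * x + 1.0f) status = 1;
    }
    cudaFree(in);
    cudaFree(out);
    return status;
}

int main() {
    int status = runExample();
    printf("%s\n", status == 0 ? "ok" : "failed");
    return status;
}
```

Building with nvcc -Xptxas -v prints the register usage per function, which helps to see whether the helper was actually inlined. If the __device__ function has to stay in a separately compiled file, one option that may apply is to compile that file with -maxrregcount=64, since that flag limits every function in the file and not only the kernels.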