ptxas warning : Stack size for entry function '..' cannot be statically determined

I am seeing a number of warnings of the same type (Stack size for entry function … cannot be statically determined) from ptxas. One possible reason could be recursion in the code, but the symbols are too long to read comfortably even once demangled. Is there any built-in method for tracking down the cause of such warnings? Could ptxas provide some more information?

I found cuobjdump. I was thinking that dumping the PTX / assembly might allow me to look for any recursion.

ptxas warning : Stack size for entry function '_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_17for_each_n_detail18for_each_n_closureINS_12zip_iteratorINS_5tupleINS7_INS8_INS_17counting_iteratorIiNS_11use_defaultESA_SA_EENS_17constant_iteratorIPdSA_SA_EENSC_IiSA_SA_EENS_9null_typeESG_SG_SG_SG_SG_SG_EEEENS_20permutation_iteratorINS_6detail15normal_iteratorINS_10device_ptrI10devcomplexIdEEEEENS_18transform_iteratorIN13strided_rangeISQ_E14stride_functorENS9_IlSA_SA_SA_EESA_SA_EEEESG_SG_SG_SG_SG_SG_SG_SG_EEEEjNSK_30device_unary_transform_functorI36SpecialIncoherentResonanceCalculatorEENS3_20blocked_thread_arrayEEEEEvT_' cannot be statically determined
ptxas warning : Stack size for entry function '_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_13reduce_detail24unordered_reduce_closureINS_18transform_iteratorI26SpecialResonanceIntegratorNS_12zip_iteratorINS_5tupleINS_17counting_iteratorIiNS_11use_defaultESC_SC_EENS_17constant_iteratorIPdSC_SC_EENS_9null_typeESH_SH_SH_SH_SH_SH_SH_EEEE10devcomplexIdESC_EElSL_NS_6detail15normal_iteratorINS_7pointerISL_NS2_3tagESC_SC_EEEENS_4plusISL_EENS3_20blocked_thread_arrayEEEEEvT_' cannot be statically determined
ptxas warning : Stack size for entry function '_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_13reduce_detail24unordered_reduce_closureINS_18transform_iteratorI23SpecialDalitzIntegratorNS_12zip_iteratorINS_5tupleINS_17counting_iteratorIiNS_11use_defaultESC_SC_EENS_17constant_iteratorIPdSC_SC_EENS_9null_typeESH_SH_SH_SH_SH_SH_SH_EEEENSA_IddddddSH_SH_SH_SH_EESC_EElSK_NS_6detail15normal_iteratorINS_7pointerISK_NS2_3tagESC_SC_EEEE17SpecialComplexSumNS3_20blocked_thread_arrayEEEEEvT_' cannot be statically determined
ptxas warning : Stack size for entry function '_ZN6thrust6system4cuda6detail6detail23launch_closure_by_valueINS2_17for_each_n_detail18for_each_n_closureINS_12zip_iteratorINS_5tupleINS7_INS8_INS_17counting_iteratorIiNS_11use_defaultESA_SA_EENS_17constant_iteratorIiSA_SA_EENSC_IPdSA_SA_EENS_9null_typeESG_SG_SG_SG_SG_SG_EEEENS_6detail15normal_iteratorINS_10device_ptrIdEEEESG_SG_SG_SG_SG_SG_SG_SG_EEEEjNSJ_30device_unary_transform_functorI11MetricTakerEENS3_20blocked_thread_arrayEEEEEvT_' cannot be statically determined

These warnings are not necessarily alarming. As far as I recall, CUDA normally allocates enough stack for every kernel launch based on the stack usage determined at compile time. For recursive functions the compiler cannot do this, because the recursion depth is unknown, so it issues a warning. It is then up to the programmer to provide sufficient stack. The default stack allocation may well be sufficient; the warning only says that the compiler cannot guarantee it.
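
To make "provide sufficient stack" concrete, here is a minimal sketch using the runtime API calls cudaDeviceGetLimit and cudaDeviceSetLimit with cudaLimitStackSize. The recursive function fib, the kernel fib_kernel, and the 4096-byte figure are my own illustrative assumptions, not anything taken from the code in this thread; the right value depends on the actual recursion depth and frame size.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical recursive device function: ptxas cannot bound its depth,
// so kernels calling it typically get the "cannot be statically determined" warning.
__device__ unsigned long long fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

__global__ void fib_kernel(int n, unsigned long long *out)
{
    *out = fib(n);
}

int main()
{
    // Query the current per-thread stack size.
    size_t stack_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
    std::printf("current per-thread stack: %zu bytes\n", stack_bytes);

    // Assumed value for illustration only: request 4096 bytes per thread.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 4096);
    if (err != cudaSuccess) {
        std::printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned long long *d_out = nullptr;
    cudaMalloc(&d_out, sizeof(*d_out));

    fib_kernel<<<1, 1>>>(20, d_out);

    // A stack overflow on the device would surface here as a launch failure.
    err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::printf("kernel failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    unsigned long long h_out = 0;
    cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost);
    std::printf("fib(20) = %llu\n", h_out);

    cudaFree(d_out);
    return 0;
}
```

The limit applies per thread and has to be set before the kernel launch that needs it. Since the stack is reserved for every resident thread, raising it far beyond what is needed costs device memory.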

This is not very different from recursive functions running on a CPU, which may likewise exceed the stack size. CPU-side code may incorporate a stack probe mechanism to grow the stack automatically if need be, although as I recall that is not 100% effective.

In this case the affected functions seem to be inside Thrust, and given the very lengthy demangled names (I used http://demangler.com/) I do not know how to locate the responsible function in the Thrust sources. Without inspecting the source code to see how the functions’ recursion depth depends on the size of the array, sequence, list, etc. that they process, I do not see how one can meaningfully anticipate their stack usage.
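
One possibility worth checking: each mangled name above includes a user-defined functor type (MetricTaker, SpecialDalitzIntegrator, and so on), so the call that ptxas cannot resolve might sit in the functor rather than in Thrust itself. Below is a small sketch of how that could look. The names fib and RecursiveFunctor are hypothetical and not from the original code, and whether the warning actually appears depends on whether the compiler manages to optimize the recursion away.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/sequence.h>
#include <thrust/transform.h>

// Hypothetical recursive helper called from a user functor.
__host__ __device__ int fib(int n)
{
    return n < 2 ? n : fib(n - 1) + fib(n - 2);
}

// Hypothetical user functor; Thrust instantiates its kernel on this type,
// so any warning is reported under the long Thrust entry-function name.
struct RecursiveFunctor
{
    __host__ __device__ int operator()(int n) const { return fib(n); }
};

int main()
{
    thrust::device_vector<int> in(8), out(8);
    thrust::sequence(in.begin(), in.end());   // fills in with 0, 1, ..., 7

    // The recursion lives in the functor, but it is compiled into Thrust's kernel.
    thrust::transform(in.begin(), in.end(), out.begin(), RecursiveFunctor());

    thrust::host_vector<int> result = out;    // copy back to the host
    for (size_t i = 0; i < result.size(); ++i)
        std::printf("fib(%zu) = %d\n", i, result[i]);
    return 0;
}
```

If a reduced case like this reproduces the warning, swapping the functors in one at a time would be one way to narrow down which of them is responsible, without having to read the full mangled names.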

I wonder whether some sort of dynamic memory allocation could also be the source of these warnings. I forget whether CUDA offers an alloca()-style allocation that dynamically grows the stack.

It would be interesting to hear from other users of Thrust how they typically deal with these warnings.