Receiving ptxas warnings "Stack size for entry function cannot be statically determined" building Tensorflow 2.0 kernel

jlt · August 27, 2019, 1:08pm

We are building a TensorFlow 2.0 kernel with CUDA 10.1 and receiving the ptxas warning “Stack size for entry function cannot be statically determined.”

The TensorFlow 2.0 source file tensorflow/fake_quant_ops_gpu.cu.cc (at r2.0 in the tensorflow/tensorflow GitHub repository) produces this ptxas warning:

ptxas warning : Stack size for entry function ‘_ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_16TensorChippingOpILl0ENS_9TensorMapINS_6TensorIfLi1ELi1ElEELi16ENS_11MakePointerEEEEEKNS_17TensorReductionOpINS0_10SumReducerIfEEKNS_6DSizesIlLi1EEEKNS_19TensorCwiseBinaryOpINS0_17scalar_product_opIKfSJ_EEKNS4_ILl1EKNS5_INS6_ISJ_Li2ELi1ElEELi16ES8_EEEEKNS_14TensorSelectOpIKNSH_INS0_13scalar_cmp_opISJ_SJ_LNS0_14ComparisonNameE1EEESP_KNS_20TensorCwiseNullaryOpINS0_18scalar_constant_opISJ_EESP_EEEESY_SY_EEEES8_EEEENS_9GpuDeviceEEElEEvT_T0_’ cannot be statically determined
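The mangled name is hard to read, and demangling it at least shows which Eigen expression the kernel evaluates. Below is a minimal Python sketch of how we do that. It assumes the binutils tool c++filt is on the PATH, and the helper name demangle is ours. The symbol should be copied from the raw build log, where Itanium-mangled names begin with _Z.

```python
import shutil
import subprocess


def demangle(symbol: str) -> str:
    """Demangle an Itanium C++ symbol with c++filt.

    Returns the symbol unchanged if c++filt is not installed
    or produces no output.
    """
    tool = shutil.which("c++filt")  # assumption: binutils (or LLVM) c++filt is available
    if tool is None:
        return symbol
    result = subprocess.run(
        [tool, symbol], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or symbol


# Tiny hypothetical symbol for illustration; paste the kernel name from the log instead.
print(demangle("_Z3foov"))
```

This does not explain the warning by itself, but it narrows down which templated expression in the source file generates the kernel in question.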

How can the problem that is causing the ptxas warning be diagnosed?

Building with the --ptxas-options=-v option shows:

ptxas info : Compiling entry function ‘_ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_16TensorChippingOpILl0ENS_9TensorMapINS_6TensorIfLi1ELi1ElEELi16ENS_11MakePointerEEEEEKNS_17TensorReductionOpINS0_10SumReducerIfEEKNS_6DSizesIlLi1EEEKNS_19TensorCwiseBinaryOpINS0_17scalar_product_opIKfSJ_EEKNS4_ILl1EKNS5_INS6_ISJ_Li2ELi1ElEELi16ES8_EEEEKNS_14TensorSelectOpIKNSH_INS0_13scalar_cmp_opISJ_SJ_LNS0_14ComparisonNameE1EEESP_KNS_20TensorCwiseNullaryOpINS0_18scalar_constant_opISJ_EESP_EEEESY_SY_EEEES8_EEEENS_9GpuDeviceEEElEEvT_T0_’ for ‘sm_70’
ptxas info : Function properties for _ZN5Eigen8internal15EigenMetaKernelINS_15TensorEvaluatorIKNS_14TensorAssignOpINS_16TensorChippingOpILl0ENS_9TensorMapINS_6TensorIfLi1ELi1ElEELi16ENS_11MakePointerEEEEEKNS_17TensorReductionOpINS0_10SumReducerIfEEKNS_6DSizesIlLi1EEEKNS_19TensorCwiseBinaryOpINS0_17scalar_product_opIKfSJ_EEKNS4_ILl1EKNS5_INS6_ISJ_Li2ELi1ElEELi16ES8_EEEEKNS_14TensorSelectOpIKNSH_INS0_13scalar_cmp_opISJ_SJ_LNS0_14ComparisonNameE1EEESP_KNS_20TensorCwiseNullaryOpINS0_18scalar_constant_opISJ_EESP_EEEESY_SY_EEEES8_EEEENS_9GpuDeviceEEElEEvT_T0_
800 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 58 registers, 1160 bytes cmem[0]

The --ptxas-options=-v output appears to conflict with the warning: it reports a stack frame size (800 bytes) for the same entry function whose stack size supposedly cannot be statically determined.
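To check whether the same pattern holds for other kernels in the build, we summarize the verbose output with a short script. This is only a sketch: the regular expressions are based on the few ptxas lines quoted above, the function name summarize and the sample kernel name _Z6kernelPf are made up for illustration, and other ptxas versions may format these lines differently.

```python
import re

# Patterns modeled on the ptxas lines quoted in this post (an assumption about the format).
# \W? tolerates a straight or curly quote around the symbol name.
WARN_RE = re.compile(
    r"Stack size for entry function\s+\W?(\w+)\W?\s+cannot be statically determined"
)
ENTRY_RE = re.compile(r"Compiling entry function\s+\W?(\w+)\W?\s+for\s+\W?(sm_\d+)")
STACK_RE = re.compile(
    r"(\d+) bytes stack frame, (\d+) bytes spill stores, (\d+) bytes spill loads"
)
REGS_RE = re.compile(r"Used (\d+) registers")


def summarize(log_text):
    """Collect per-kernel resource usage and stack-size warnings from a build log."""
    kernels = {}
    current = None  # record of the entry function currently being described
    for line in log_text.splitlines():
        m = WARN_RE.search(line)
        if m:
            kernels.setdefault(m.group(1), {})["stack_warning"] = True
            continue
        m = ENTRY_RE.search(line)
        if m:
            current = kernels.setdefault(m.group(1), {})
            current["arch"] = m.group(2)
            continue
        if current is None:
            continue
        m = STACK_RE.search(line)
        if m:
            frame, stores, loads = map(int, m.groups())
            current.update(stack_frame=frame, spill_stores=stores, spill_loads=loads)
            continue
        m = REGS_RE.search(line)
        if m:
            current["registers"] = int(m.group(1))
    return kernels


# Hypothetical log excerpt with a short kernel name, same shape as the output above.
sample = """\
ptxas warning : Stack size for entry function '_Z6kernelPf' cannot be statically determined
ptxas info : Compiling entry function '_Z6kernelPf' for 'sm_70'
ptxas info : Function properties for _Z6kernelPf
    800 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 58 registers, 1160 bytes cmem[0]
"""

for name, info in summarize(sample).items():
    print(name, info)
```

Kernels that carry both a stack_warning flag and a reported stack_frame are the ones showing the apparent conflict. When a kernel is compiled for several architectures, this sketch keeps only the last set of numbers for each name.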

How can the problem or problems that are causing the ptxas warnings be diagnosed? Are there other build options that would produce additional diagnostic data?

Are these messages actually warnings, or are they informational messages instead?

Is there any documentation on which situations can cause these ptxas warnings? The nvcc compiler is not open source, so its code cannot be reviewed to determine what checks are being made.

Other threads on the developer forums hint that this warning may be caused by recursion, but from reviewing the TensorFlow kernel code, there doesn’t appear to be any recursion in this case. What other code sequences would cause the warning?

Thanks for any additional information.