#pragma hd_warning_disable causes nvcc to generate incorrect code (CUDA 9.1)

Hi all,

I ran into a problem trying to suppress a warning about calling a __host__ function from a __device__ function. I already submitted a bug report, but thought I’d post here as well, since I spent a lot of time chasing this.

I’m writing some code that uses the Eigen library. Eigen uses templates heavily, which triggers spurious warnings about calling a __host__ function from a __host__ __device__ function. Any attempt to suppress this warning using

#pragma hd_warning_disable

causes nvcc to generate incorrect code.
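
For readers who have not seen this warning, here is a minimal sketch of the pattern that typically produces it, and of where the pragma is usually placed. The names HD_FUNC, HostOnlyScale, apply and host_usage_example are made up for illustration and do not come from Eigen or from the repository linked below. The pragma is undocumented, so its placement here is an assumption based on common usage, and this small example is not claimed to reproduce the miscompilation. The CUDA annotations are compiled out under a plain host compiler.

```cpp
#include <cassert>

// Hypothetical helper macro: expands to the CUDA execution-space
// specifiers under nvcc and to nothing under a plain host compiler.
#if defined(__CUDACC__)
#define HD_FUNC __host__ __device__
#else
#define HD_FUNC
#endif

// Hypothetical functor whose call operator is host-only
// (it carries no __device__ annotation).
struct HostOnlyScale {
    float factor;
    float operator()(float x) const { return factor * x; }
};

// Under nvcc, instantiating apply<HostOnlyScale> is the kind of code that
// triggers the "calling a __host__ function from a __host__ __device__
// function" warning. The pragma (assumed placement: directly before the
// template declaration) asks nvcc not to report it.
#if defined(__CUDACC__)
#pragma hd_warning_disable
#endif
template <typename F>
HD_FUNC float apply(const F& f, float x) {
    return f(x);
}

// Host-side usage: this path is always valid, with or without the pragma.
inline void host_usage_example() {
    float y = apply(HostOnlyScale{2.0f}, 3.0f);
    assert(y == 6.0f);
}
```

The pragma only hides the diagnostic. If apply<HostOnlyScale> is actually reached from device code, the call to the host-only operator is still invalid.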

An example is available in the GitHub repository konstantin-azarov/nvcc-pragma-bug ("Demonstration of nvcc bug with pragmas"). The pragma is inserted in Transform.h, at commit eed4d35545a0662265dc1f2aa664cf84620c190c.

If this code is compiled without this pragma (as found in the repository, using build.sh), the SASS code looks as expected:

Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = cuda
host = linux
compile_size = 64bit

	code for sm_30
		Function : _Z9TransformN5Eigen9TransformIfLi3ELi2ELi0EEENS_6MatrixIfLi3ELi1ELi0ELi3ELi1EEE
	.headerflags    @"EF_CUDA_SM30 EF_CUDA_PTX_SM(EF_CUDA_SM30)"
                                                                     /* 0x22823232028042b7 */
        /*0008*/                   MOV R1, c[0x0][0x44];             /* 0x2800400110005de4 */
        /*0010*/                   IADD32I R1, R1, -0x18;            /* 0x0bffffffa0105c02 */
        /*0018*/                   F2F.F64.F32 R2, c[0x0] [0x180];   /* 0x1000400601309c04 */
        /*0020*/                   LOP.OR R6, R1, c[0x0][0x24];      /* 0x6800400090119c43 */
        /*0028*/                   F2F.F64.F32 R8, c[0x0] [0x184];   /* 0x1000400611321c04 */
        /*0030*/                   F2F.F64.F32 R10, c[0x0] [0x188];  /* 0x1000400621329c04 */
        /*0038*/                   LOP32I.AND R12, R6, 0xffffff;     /* 0x3803fffffc631c02 */
                                                                     /* 0x22b2b042e042e047 */
        /*0048*/                   STL.64 [R12], R2;                 /* 0xc800000000c09ca5 */
        /*0050*/                   MOV32I R4, 0x0;                   /* 0x1800000000011de2 */
        /*0058*/                   STL.64 [R12+0x8], R8;             /* 0xc800000020c21ca5 */
        /*0060*/                   MOV32I R5, 0x0;                   /* 0x1800000000015de2 */
        /*0068*/                   STL.64 [R12+0x10], R10;           /* 0xc800000040c29ca5 */
        /*0070*/                   MOV R7, RZ;                       /* 0x28000000fc01dde4 */
        /*0078*/                   JCAL 0x0;                         /* 0x1000000000011c07 */
                                                                     /* 0x20000000000002e7 */
        /*0088*/                   EXIT;                             /* 0x8000000000001de7 */
        /*0090*/                   BRA 0x90;                         /* 0x4003ffffe0001de7 */
        /*0098*/                   NOP;                              /* 0x4000000000001de4 */
        /*00a0*/                   NOP;                              /* 0x4000000000001de4 */
        /*00a8*/                   NOP;                              /* 0x4000000000001de4 */
        /*00b0*/                   NOP;                              /* 0x4000000000001de4 */
        /*00b8*/                   NOP;                              /* 0x4000000000001de4 */
		..........................................................................................

If, however, the pragma is enabled by uncommenting the definition in test.cu, the code comes out wrong. The whole kernel body is reduced to a BPT.TRAP instruction:

Fatbin elf code:
================
arch = sm_30
code version = [1,7]
producer = cuda
host = linux
compile_size = 64bit

	code for sm_30
		Function : _Z9TransformN5Eigen9TransformIfLi3ELi2ELi0EEENS_6MatrixIfLi3ELi1ELi0ELi3ELi1EEE
	.headerflags    @"EF_CUDA_SM30 EF_CUDA_PTX_SM(EF_CUDA_SM30)"
                                                          /* 0x2000000002f2f307 */
        /*0008*/                   MOV R1, c[0x0][0x44];  /* 0x2800400110005de4 */
        /*0010*/                   BPT.TRAP 0x1;          /* 0xd00000000400c007 */
        /*0018*/                   EXIT;                  /* 0x8000000000001de7 */
        /*0020*/                   BRA 0x20;              /* 0x4003ffffe0001de7 */
        /*0028*/                   NOP;                   /* 0x4000000000001de4 */
        /*0030*/                   NOP;                   /* 0x4000000000001de4 */
        /*0038*/                   NOP;                   /* 0x4000000000001de4 */
		..........................................................................................

Hi Konstantin,
I stumbled upon this post while researching a similar issue: warnings in Eigen caused by __host__ __device__ methods, and the impact of #pragma nv_exec_check_disable and #pragma hd_warning_disable.

Indeed, I can reproduce the code generation problem with your test case.

However, I noticed that if I add

EIGEN_DISABLE_HD_WARNING

to all the specialisations of transform_right_product_impl::run(), as in

diff --git a/eigen/Eigen/src/Geometry/Transform.h b/eigen/Eigen/src/Geometry/Transform.h
index 8e95886..fedd919 100644
--- a/eigen/Eigen/src/Geometry/Transform.h
+++ b/eigen/Eigen/src/Geometry/Transform.h
@@ -1313,6 +1313,7 @@ struct transform_right_product_impl< TransformType, MatrixType, 0, RhsCols>
 {
   typedef typename MatrixType::PlainObject ResultType;
 
+  EIGEN_DISABLE_HD_WARNING
   static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
   {
     return T.matrix() * other;
@@ -1331,6 +1332,7 @@ struct transform_right_product_impl< TransformType, MatrixType, 1, RhsCols>
 
   typedef typename MatrixType::PlainObject ResultType;
 
+  EIGEN_DISABLE_HD_WARNING
   static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
   {
     EIGEN_STATIC_ASSERT(OtherRows==HDim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES);
@@ -1357,6 +1359,7 @@ struct transform_right_product_impl< TransformType, MatrixType, 2, RhsCols>
 
   typedef typename MatrixType::PlainObject ResultType;
 
+  EIGEN_DISABLE_HD_WARNING
   static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
   {
     EIGEN_STATIC_ASSERT(OtherRows==Dim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES);
@@ -1382,6 +1385,7 @@ struct transform_right_product_impl< TransformType, MatrixType, 2, 1> // rhs is
 
   typedef typename MatrixType::PlainObject ResultType;
 
+  EIGEN_DISABLE_HD_WARNING
   static EIGEN_STRONG_INLINE ResultType run(const TransformType& T, const MatrixType& other)
   {
     EIGEN_STATIC_ASSERT(OtherRows==Dim, YOU_MIXED_MATRICES_OF_DIFFERENT_SIZES);
diff --git a/test.cu b/test.cu
index ce8c749..8551762 100644
--- a/test.cu
+++ b/test.cu
@@ -2,7 +2,7 @@
 #include <stdio.h>
 
 // Uncomment the pragma below to break things 
-#define EIGEN_DISABLE_HD_WARNING // #pragma hd_warning_disable
+#define EIGEN_DISABLE_HD_WARNING #pragma hd_warning_disable
 #include <Eigen/Eigen>
 
 namespace e = Eigen;

I get the correct SASS code again:

code for sm_30
                Function : _Z9TransformN5Eigen9TransformIfLi3ELi2ELi0EEENS_6MatrixIfLi3ELi1ELi0ELi3ELi1EEE
        .headerflags    @"EF_CUDA_SM30 EF_CUDA_PTX_SM(EF_CUDA_SM30)"
                                                                     /* 0x22823232028042b7 */
        /*0008*/                   MOV R1, c[0x0][0x44];             /* 0x2800400110005de4 */
        /*0010*/                   IADD32I R1, R1, -0x18;            /* 0x0bffffffa0105c02 */
        /*0018*/                   F2F.F64.F32 R2, c[0x0] [0x180];   /* 0x1000400601309c04 */
        /*0020*/                   LOP.OR R6, R1, c[0x0][0x24];      /* 0x6800400090119c43 */
        /*0028*/                   F2F.F64.F32 R8, c[0x0] [0x184];   /* 0x1000400611321c04 */
        /*0030*/                   F2F.F64.F32 R10, c[0x0] [0x188];  /* 0x1000400621329c04 */
        /*0038*/                   LOP32I.AND R12, R6, 0xffffff;     /* 0x3803fffffc631c02 */
                                                                     /* 0x22b2b042e042e047 */
        /*0048*/                   STL.64 [R12], R2;                 /* 0xc800000000c09ca5 */
        /*0050*/                   MOV32I R4, 0x0;                   /* 0x1800000000011de2 */
        /*0058*/                   STL.64 [R12+0x8], R8;             /* 0xc800000020c21ca5 */
        /*0060*/                   MOV32I R5, 0x0;                   /* 0x1800000000015de2 */
        /*0068*/                   STL.64 [R12+0x10], R10;           /* 0xc800000040c29ca5 */
        /*0070*/                   MOV R7, RZ;                       /* 0x28000000fc01dde4 */
        /*0078*/                   JCAL 0x0;                         /* 0x1000000000011c07 */
                                                                     /* 0x20000000000002e7 */
        /*0088*/                   EXIT;                             /* 0x8000000000001de7 */
        /*0090*/                   BRA 0x90;                         /* 0x4003ffffe0001de7 */
        /*0098*/                   NOP;                              /* 0x4000000000001de4 */
        /*00a0*/                   NOP;                              /* 0x4000000000001de4 */
        /*00a8*/                   NOP;                              /* 0x4000000000001de4 */
        /*00b0*/                   NOP;                              /* 0x4000000000001de4 */
        /*00b8*/                   NOP;                              /* 0x4000000000001de4 */

Not sure what is going on, but it might help you.

Hi, thanks for the answer.

I filed a bug with NVIDIA, and they pointed out that the Eigen code is not actually correct, since transform_right_product_impl::run() is called from device code but is not marked as a __device__ function.

It is now not clear to me why this is a warning and not an error, or why it generates correct code without the #pragma. Maybe something weird to do with inlining?

Anyway, I fixed the problem in my code by adding more __device__ attributes to the Eigen code, which got rid of all the warnings. I think that is a better way. It’s not complete (it only covers the parts that I need), but I can share it if you’re interested.
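
As a rough illustration of that approach (not the actual patch), the sketch below annotates the callee itself so that no warning suppression is needed. MY_DEVICE_FUNC, Vec3, Scale3, transform and host_usage_example are hypothetical names chosen for this example. Eigen has its own macro for this purpose, EIGEN_DEVICE_FUNC, and the sketch assumes the fix amounts to adding such annotations to the functions on the device call path.

```cpp
#include <cassert>

// Hypothetical annotation macro: __host__ __device__ under nvcc,
// empty under a plain host compiler.
#if defined(__CUDACC__)
#define MY_DEVICE_FUNC __host__ __device__
#else
#define MY_DEVICE_FUNC
#endif

struct Vec3 {
    float x, y, z;
};

// The implementation struct's run() is annotated, so it may legally be
// called from both host and device code.
struct Scale3 {
    float s;
    MY_DEVICE_FUNC Vec3 run(const Vec3& v) const {
        return Vec3{s * v.x, s * v.y, s * v.z};
    }
};

// The caller is annotated as well; since every callee on the path is
// annotated, nvcc has nothing to warn about and no pragma is required.
template <typename Impl>
MY_DEVICE_FUNC Vec3 transform(const Impl& impl, const Vec3& v) {
    return impl.run(v);
}

// Host-side usage of the annotated functions.
inline void host_usage_example() {
    Vec3 r = transform(Scale3{2.0f}, Vec3{1.0f, 2.0f, 3.0f});
    assert(r.x == 2.0f && r.y == 4.0f && r.z == 6.0f);
}
```

The downside, as noted in this thread, is that every function on the device call path has to be annotated by hand.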

I see… I guess I will go in that direction myself, adding __device__ as needed.

It’s a pity there is no general solution for propagating attributes (like __device__) through templates, though.