Can't recognize the nvcuda namespace when compiling

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  Off  | 00000000:1A:00.0 Off |                    0 |
| N/A   35C    P0    56W / 300W |   1479MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2...  Off  | 00000000:3D:00.0 Off |                    0 |
| N/A   50C    P0    71W / 300W |   4333MiB / 32768MiB |     69%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
| N/A   29C    P0    39W / 300W |      2MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  Tesla V100-SXM2...  Off  | 00000000:B2:00.0 Off |                    0 |
| N/A   29C    P0    38W / 300W |      3MiB / 32768MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
(base) wurui@node10:~/wurui/experiment$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Sun_Mar_21_19:15:46_PDT_2021
Cuda compilation tools, release 11.3, V11.3.58
Build cuda_11.3.r11.3/compiler.29745058_0
(base) wurui@node10:~/wurui/experiment$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/mnt/7T/lly/gcc-7.5.0/libexec/gcc/x86_64-pc-linux-gnu/7.5.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../configure --disable-checking --enable-languages=c,c++ --disable-multilib --prefix=/mnt/7T/lly/gcc-7.5.0 --enable-threads=posix
Thread model: posix
gcc version 7.5.0 (GCC) 
pytorch   1.11.0      py3.8_cuda11.3_cudnn8.2.0_0    pytorch

The code structure is:

|-- TCGNN_kernel.cu
|-- TCGNN.cpp
|-- setup.py

The code is:

// TCGNN_kernel.cu
#include <torch/extension.h>
#include <stdio.h>
#include <vector>
#include <iostream>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <cuda_fp16.h>
#include <cuda.h>
#include <mma.h>
#include <cuda_runtime.h>

#include "config.h"
#define WPB 8

using namespace std;
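using namespace nvcuda;

// Forward declaration: the kernel is defined after spmm_forward_cuda, which launches it.
__global__ void spmm_forward_cuda_kernel(
	const int * __restrict__ nodePointer,
	const int *__restrict__ edgeList,
	const int *__restrict__ blockPartition,
	const int *__restrict__ edgeToColumn,
	const int *__restrict__ edgeToRow,
	const int numNodes,
	const int numEdges,
	const int embedding_dim,
	const float *__restrict__ input,
	float *output
);
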
std::vector<torch::Tensor> spmm_forward_cuda(
    torch::Tensor nodePointer,
    torch::Tensor edgeList,
    torch::Tensor blockPartition, 
    torch::Tensor edgeToColumn,
    torch::Tensor edgeToRow,
              int num_nodes,
              int num_edges,
              int embedding_dim,
    torch::Tensor input
) 
{
    auto output = torch::zeros_like(input);
    const int num_row_windows = blockPartition.size(0);
    const int WARPperBlock = WPB;

    dim3 grid(num_row_windows, 1, 1);
    dim3 block(WARP_SIZE, WARPperBlock, 1);

    const int dimTileNum = (embedding_dim + BLK_H - 1) / BLK_H;
	const int dynamic_shared_size = dimTileNum * BLK_W * BLK_H * sizeof(float); // dynamic shared memory.
	printf("break2\n");
    spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                    nodePointer.data_ptr<int>(),
                                                                    edgeList.data_ptr<int>(),
                                                                    blockPartition.data_ptr<int>(),
                                                                    edgeToColumn.data_ptr<int>(),
                                                                    edgeToRow.data_ptr<int>(),
                                                                    num_nodes,
                                                                    num_edges,
                                                                    embedding_dim,
                                                                    input.data_ptr<float>(),
                                                                    output.data_ptr<float>()
                                                                );
	printf("break10\n");

    // check for error
    cudaError_t error = cudaGetLastError();
    if(error != cudaSuccess)
    {
        // print the CUDA error message and exit
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        exit(-1);
    }
    
    return {output};
}
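For reference, the launch geometry that spmm_forward_cuda computes can be reproduced on the host. This is a sketch under assumptions: BLK_H = 16, BLK_W = 8 and WARP_SIZE = 32 are guessed because config.h is not shown (16x8 matches the kernel's TC_block comment and the "float [128]" in the error log), and launch_config is a hypothetical helper. It also makes two things visible: the host rounds the number of 16-wide dimension tiles up while the kernel rounds down, and because only warps with wid < dimTileNum do work, WPB = 8 warps cover at most 8 * 16 = 128 embedding dimensions.

```python
# Hypothetical host-side mirror of the launch math in spmm_forward_cuda.
# BLK_H, BLK_W and WARP_SIZE are assumed values (config.h is not in the post).
BLK_H = 16
BLK_W = 8
WARP_SIZE = 32
WPB = 8            # warps per block, from "#define WPB 8"
SIZEOF_FLOAT = 4


def launch_config(num_row_windows, embedding_dim):
    """Return (grid, block, dynamic_shared_bytes, covered_dims)."""
    grid = (num_row_windows, 1, 1)           # one block per 16-row window
    block = (WARP_SIZE, WPB, 1)              # threadIdx.y selects the warp
    # Host code rounds the number of 16-wide dimension tiles up...
    host_tiles = (embedding_dim + BLK_H - 1) // BLK_H
    dynamic_shared_bytes = host_tiles * BLK_W * BLK_H * SIZEOF_FLOAT
    # ...while the kernel rounds down, and only warps with wid < dimTileNum
    # run the WMMA code, so at most WPB tiles of output are ever written.
    kernel_tiles = embedding_dim // BLK_H
    covered_dims = min(kernel_tiles, WPB) * BLK_H
    return grid, block, dynamic_shared_bytes, covered_dims
```

For example, launch_config(10, 100) reserves 3584 bytes of dynamic shared memory but only the first 96 output dimensions get computed. Padding embedding_dim to a multiple of 16, as the kernel comment asks, removes the round-up/round-down mismatch; going past 128 dimensions would presumably need more warps per block or a loop over tiles inside each warp.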



__global__ void spmm_forward_cuda_kernel(
	const int * __restrict__ nodePointer,		// node pointer.
	const int *__restrict__ edgeList,			// edge list.
	const int *__restrict__ blockPartition, 	// number of TC_blocks (16x8) in each row_window.
	const int *__restrict__ edgeToColumn, 		// eid -> col within each row_window.
	const int *__restrict__ edgeToRow, 		    // eid -> row (node index) of the edge.
	const int numNodes,
	const int numEdges,
	const int embedding_dim,				    // embedding dimension.
	const float *__restrict__ input,		    // input feature matrix.
	float *output							    // aggregated output feature matrix.
) {
    const unsigned bid = blockIdx.x;								// block_index == row_window_index
	const unsigned wid = threadIdx.y;								// warp_index handling multi-dimension > 16.
	const unsigned laneid = threadIdx.x;							// lane index within the warp.
	const unsigned tid = threadIdx.y * blockDim.x + laneid;			// threadid of each block.
	const unsigned warpSize = blockDim.x;							// number of threads per warp.
	const unsigned threadPerBlock = blockDim.x * blockDim.y;		// number of threads per block.
	if(bid == 0 && tid == 0) printf("break3\n");
	const unsigned dimTileNum = embedding_dim / BLK_H;              // number of tiles along the dimension
	const unsigned nIdx_start = bid * BLK_H;					    // starting nodeIdx of current row_window.
	const unsigned nIdx_end = min((bid + 1) * BLK_H, numNodes);		// ending nodeIdx of current row_window.
	
	const unsigned eIdx_start = nodePointer[nIdx_start];			// starting edgeIdx of current row_window.
	const unsigned eIdx_end = nodePointer[nIdx_end];				// ending edgeIdx of the current row_window.
	const unsigned num_TC_blocks = blockPartition[bid]; 			// number of TC_blocks of the current row_window.
	const unsigned dense_bound = numNodes * embedding_dim;
	printf("break4\n");
	__shared__ float sparse_A[BLK_H * BLK_W];					// row-major sparse matrix shared memory store.
	__shared__ int sparse_AToX_index[BLK_W];					// TC_block col to dense_tile row.
	// __shared__ float dense_X[dimTileNum * BLK_W * BLK_H];	// column-major dense tile [dimTileNum, BLK_W, BLK_H]
	extern __shared__ float dense_X[];
	nvcuda::wmma::fragment<nvcuda::wmma::matrix_a, BLK_H, BLK_H, BLK_W, nvcuda::wmma::precision::tf32, nvcuda::wmma::row_major> a_frag;
	wmma::fragment<wmma::matrix_b, BLK_H, BLK_H, BLK_W, wmma::precision::tf32, wmma::col_major> b_frag;
	wmma::fragment<wmma::accumulator, BLK_H, BLK_H, BLK_W, float> acc_frag;
	wmma::fill_fragment(acc_frag, 0.0f);
	printf("break5\n");
	// Processing TC_blocks along the column dimension of Sparse A.
	for (unsigned i = 0; i < num_TC_blocks; i++){

		// Init A_colToX_row with dummy values.
		if (tid < BLK_W){
			sparse_AToX_index[tid] = numNodes + 1;
		}

		__syncthreads();

		// Init sparse_A with zero values.
		#pragma unroll
		for (unsigned idx = tid; idx < BLK_W * BLK_H; idx += threadPerBlock){
			sparse_A[idx] = 0;
		}

		// Init dense_X with zero values.
		#pragma unroll
		for (unsigned idx = tid; idx < dimTileNum * BLK_W * BLK_H; idx += threadPerBlock){
			dense_X[idx] = 0;
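		}

		// Make sure the zero-initialization above is complete before any thread
		// scatters edges into sparse_A.
		__syncthreads();

		// Fill sparse_A using all threads of the block: each thread scans a strided
		// subset of the row window's edges (all neighbors of the current nodes)
		// and keeps those that fall into the current TC_block column frame.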
		#pragma unroll
		for (unsigned eIdx = eIdx_start + tid; eIdx < eIdx_end; eIdx += threadPerBlock){
			unsigned col = edgeToColumn[eIdx];
			if (i * BLK_W <= col && col < (i + 1) * BLK_W){			// if the edge in the current TC_block frame of column.
				unsigned row_local = edgeToRow[eIdx] % BLK_H;
				unsigned col_local = col % BLK_W;
				sparse_A[row_local * BLK_W + col_local] = 1;		// set the edge of the sparse_A.
				sparse_AToX_index[col_local] = edgeList[eIdx];		// record the mapping from sparse_A colId to rowId of dense_X.
			}		
		}

		__syncthreads();

		// Initialize dense_X by column-major store,
		// Threads of a warp for fetching a dense_X.
		// each warp identify by wid.
		if (wid < dimTileNum)
			#pragma unroll
			for (unsigned idx = laneid; idx < BLK_W * BLK_H; idx += warpSize){
				unsigned dense_rowIdx = sparse_AToX_index[idx % BLK_W];						// TC_block_col to dense_tile_row.
				unsigned dense_dimIdx = idx / BLK_W;										// dimIndex of the dense tile.
				unsigned source_idx = dense_rowIdx * embedding_dim + wid * BLK_H + dense_dimIdx;
				unsigned target_idx = wid * BLK_W * BLK_H + idx;
				// boundary test.
				if (source_idx >= dense_bound)
					dense_X[target_idx] = 0;
				else
					dense_X[target_idx] = input[source_idx];
			}

		__syncthreads();

		if (wid < dimTileNum)
		{
			wmma::load_matrix_sync(a_frag, sparse_A, BLK_W);
			wmma::load_matrix_sync(b_frag, dense_X + wid * BLK_W * BLK_H, BLK_W);

			#pragma unroll
			for (unsigned t = 0; t < a_frag.num_elements; t++) {
				a_frag.x[t] =  wmma::__float_to_tf32(a_frag.x[t]);
			}

			#pragma unroll
			for (unsigned t = 0; t < b_frag.num_elements; t++) {
				b_frag.x[t] =  wmma::__float_to_tf32(b_frag.x[t]);
			}
			// Perform the matrix multiplication.
			wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
		}
	}

	if (wid < dimTileNum)
		// Store the matrix to output matrix.
		// * Note * embedding dimension should be padded to a multiple of BLK_H for output correctness.
		wmma::store_matrix_sync(output + bid * BLK_H * embedding_dim + wid * BLK_H, acc_frag, embedding_dim, wmma::mem_row_major);
}
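To check what the kernel is supposed to compute, independently of Tensor Cores, the same row-window tiling can be emulated on the CPU. This is a pure-Python sketch, not the TCGNN code, and it rests on assumptions: BLK_H = 16 and BLK_W = 8; nodePointer/edgeList form a CSR adjacency; edgeToRow[e] is the row of edge e; edgeToColumn[e] is the compacted index of the neighbor within its row window, as the kernel comments state; blockPartition[w] is the number of 8-column tiles in window w. build_tiling is a hypothetical reconstruction of the preprocessing, since TCGNN.cpp is not shown. Like the kernel, it writes a binary 1 into the tile, so duplicate edges collapse.

```python
BLK_H, BLK_W = 16, 8  # assumed config.h values


def build_tiling(node_pointer, edge_list, num_nodes):
    """Hypothetical preprocessing: derive edgeToRow, edgeToColumn, blockPartition."""
    num_windows = (num_nodes + BLK_H - 1) // BLK_H
    edge_to_row = [0] * len(edge_list)
    edge_to_col = [0] * len(edge_list)
    block_partition = [0] * num_windows
    for w in range(num_windows):
        first, last = w * BLK_H, min((w + 1) * BLK_H, num_nodes)
        window_edges = edge_list[node_pointer[first]:node_pointer[last]]
        # Compact the distinct neighbors of the window into columns 0..k-1.
        col_of = {nbr: c for c, nbr in enumerate(sorted(set(window_edges)))}
        for row in range(first, last):
            for e in range(node_pointer[row], node_pointer[row + 1]):
                edge_to_row[e] = row
                edge_to_col[e] = col_of[edge_list[e]]
        block_partition[w] = (len(col_of) + BLK_W - 1) // BLK_W
    return edge_to_row, edge_to_col, block_partition


def spmm_tiled(node_pointer, edge_list, edge_to_row, edge_to_col,
               block_partition, num_nodes, x):
    """Aggregate neighbor features one 16 x 8 binary tile at a time, like the kernel."""
    dim = len(x[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for w, num_tc_blocks in enumerate(block_partition):
        first, last = w * BLK_H, min((w + 1) * BLK_H, num_nodes)
        for i in range(num_tc_blocks):
            sparse_a = [[0.0] * BLK_W for _ in range(BLK_H)]  # role of sparse_A
            to_x = [None] * BLK_W                             # role of sparse_AToX_index
            for e in range(node_pointer[first], node_pointer[last]):
                col = edge_to_col[e]
                if i * BLK_W <= col < (i + 1) * BLK_W:
                    sparse_a[edge_to_row[e] % BLK_H][col % BLK_W] = 1.0
                    to_x[col % BLK_W] = edge_list[e]
            # out[window rows] += sparse_a @ x[to_x]. Empty columns (to_x None) match
            # the kernel's dummy index numNodes + 1, which loads zeros.
            for r in range(last - first):
                for c in range(BLK_W):
                    if sparse_a[r][c]:
                        src = x[to_x[c]]
                        for d in range(dim):
                            out[first + r][d] += sparse_a[r][c] * src[d]
    return out
```

Comparing spmm_tiled against a plain neighbor sum on a small graph is a quick way to validate the preprocessing arrays before feeding them to the CUDA kernel.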

setup.py looks like this:

import torch
from setuptools import setup
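from torch.utils.cpp_extension import BuildExtension, CUDAExtension

# cxx_args / nvcc_args are not defined in the snippet as posted; these empty
# lists are placeholders so the file runs (PyTorch then uses its default flags).
cxx_args = []
nvcc_args = []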
setup(
    name='TCGNN',
    ext_modules=[
        CUDAExtension('TCGNN', [
            'TCGNN.cpp',
            'TCGNN_kernel.cu',
        ], extra_compile_args={'cxx': cxx_args, 'nvcc': nvcc_args})
    ],
    cmdclass={
        'build_ext': BuildExtension
    })
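With this setup.py, torch.utils.cpp_extension chooses the -gencode targets itself: when TORCH_CUDA_ARCH_LIST is not set, it uses the compute capability of the visible GPUs, which for the Tesla V100s listed by nvidia-smi is 7.0. For context, the CUDA C++ Programming Guide documents the tf32 WMMA precision (together with __float_to_tf32) as requiring compute capability 8.0 or higher. The two helpers below are hypothetical and only mirror the flag format nvcc receives in this build; they are not part of PyTorch.

```python
# Hypothetical helpers; they only mirror the -gencode format seen in the build.
def gencode_flags(major, minor):
    """PTX plus SASS -gencode flags for one compute capability."""
    arch = f"{major}{minor}"
    return [
        f"-gencode=arch=compute_{arch},code=compute_{arch}",
        f"-gencode=arch=compute_{arch},code=sm_{arch}",
    ]


def supports_tf32_wmma(major, minor):
    """tf32 WMMA fragments and __float_to_tf32 need compute capability >= 8.0."""
    return (major, minor) >= (8, 0)
```

For a V100, gencode_flags(7, 0) reproduces the two -gencode flags on the nvcc command line, and supports_tf32_wmma(7, 0) is False.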

But when I run python setup.py install, compiling TCGNN_kernel.cu fails with errors such as:
error: name followed by "::" must be a class or namespace name
error: no instance of function template "nvcuda::wmma::fill_fragment" matches the argument list, argument types are: (<error-type>, float)

Here is the output:

(tcgnn) liyang@node10:~/ly/experiment$ python setup.py install
running install
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
creating TCGNN.egg-info
writing manifest file 'TCGNN.egg-info/SOURCES.txt'
writing manifest file 'TCGNN.egg-info/SOURCES.txt'
running install_lib
running build_ext
creating /home/liyang/ly/experiment/build
creating /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8
Emitting ninja build file /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/2] /usr/local/cuda-11.3/bin/nvcc  -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/TH -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/include/python3.8 -c -c /home/liyang/ly/experiment/TCGNN_kernel.cu -o /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=TCGNN -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
FAILED: /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN_kernel.o 
/usr/local/cuda-11.3/bin/nvcc  -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/TH -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/include/python3.8 -c -c /home/liyang/ly/experiment/TCGNN_kernel.cu -o /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=TCGNN -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 -std=c++14
/home/liyang/ly/experiment/TCGNN_kernel.cu(127): error: name followed by "::" must be a class or namespace name

/home/liyang/ly/experiment/TCGNN_kernel.cu(127): error: incomplete type is not allowed

/home/liyang/ly/experiment/TCGNN_kernel.cu(128): error: name followed by "::" must be a class or namespace name

/home/liyang/ly/experiment/TCGNN_kernel.cu(128): error: incomplete type is not allowed

/home/liyang/ly/experiment/TCGNN_kernel.cu(129): error: incomplete type is not allowed

/home/liyang/ly/experiment/TCGNN_kernel.cu(130): error: no instance of function template "nvcuda::wmma::fill_fragment" matches the argument list
            argument types are: (<error-type>, float)

/home/liyang/ly/experiment/TCGNN_kernel.cu(191): error: no instance of overloaded function "nvcuda::wmma::load_matrix_sync" matches the argument list
            argument types are: (<error-type>, float [128], int)

/home/liyang/ly/experiment/TCGNN_kernel.cu(192): error: no instance of overloaded function "nvcuda::wmma::load_matrix_sync" matches the argument list
            argument types are: (<error-type>, float *, int)

/home/liyang/ly/experiment/TCGNN_kernel.cu(196): error: namespace "nvcuda::wmma" has no member "__float_to_tf32"

/home/liyang/ly/experiment/TCGNN_kernel.cu(201): error: namespace "nvcuda::wmma" has no member "__float_to_tf32"

10 errors detected in the compilation of "/home/liyang/ly/experiment/TCGNN_kernel.cu".
[2/2] c++ -MMD -MF /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN.o.d -pthread -B /mnt/7T/lly/anaconda3/envs/tcgnn/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/TH -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/include/python3.8 -c -c /home/liyang/ly/experiment/TCGNN.cpp -o /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN.o -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=TCGNN -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++14
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
/home/liyang/ly/experiment/TCGNN.cpp:103:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
     #pragma omp parallel for
 
/home/liyang/ly/experiment/TCGNN.cpp:109:0: warning: ignoring #pragma omp parallel [-Wunknown-pragmas]
     #pragma omp parallel for reduction(+:block_counter)
 
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp: In function ‘std::vector<at::Tensor> spmm_forward(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor)’:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:44:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(input);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:45:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(nodePointer);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:46:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(edgeList);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:47:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(blockPartition);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:48:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(edgeToColumn);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/DeviceType.h:8:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Device.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/core/Allocator.h:6,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:7,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/home/liyang/ly/experiment/TCGNN.cpp:27:42: warning: ‘at::DeprecatedTypeProperties& at::Tensor::type() const’ is deprecated: Tensor.type() is deprecated. Instead use Tensor.options(), which in many cases (e.g. in a constructor) is a drop-in replacement. If you were using data from type(), that is now available from Tensor itself, so instead of tensor.type().scalar_type(), use tensor.scalar_type() instead and instead of tensor.type().backend() use tensor.device(). [-Wdeprecated-declarations]
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                                          ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/macros/Macros.h:204:64: note: in definition of macro ‘C10_UNLIKELY’
 #define C10_UNLIKELY(expr) (__builtin_expect(static_cast<bool>(expr), 0))
                                                                ^~~~
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/c10/util/Exception.h:460:7: note: in expansion of macro ‘C10_UNLIKELY_OR_CONST’
   if (C10_UNLIKELY_OR_CONST(!(cond))) {            \
       ^~~~~~~~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:27:23: note: in expansion of macro ‘TORCH_CHECK’
 #define CHECK_CUDA(x) TORCH_CHECK(x.type().is_cuda(), #x " must be a CUDA tensor")
                       ^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:29:24: note: in expansion of macro ‘CHECK_CUDA’
 #define CHECK_INPUT(x) CHECK_CUDA(x); CHECK_CONTIGUOUS(x)
                        ^~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:49:3: note: in expansion of macro ‘CHECK_INPUT’
   CHECK_INPUT(edgeToRow);
   ^
In file included from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/Tensor.h:3:0,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/DeviceGuard.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/ATen.h:11,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/types.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader_options.h:4,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/base.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader/stateful.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data/dataloader.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/data.h:3,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/all.h:8,
                 from /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/extension.h:4,
                 from /home/liyang/ly/experiment/TCGNN.cpp:1:
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:210:30: note: declared here
   DeprecatedTypeProperties & type() const {
                              ^~~~
/home/liyang/ly/experiment/TCGNN.cpp: In function ‘std::map<unsigned int, unsigned int> inplace_deduplication(unsigned int*, unsigned int)’:
/home/liyang/ly/experiment/TCGNN.cpp:71:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     while (cur < length){
            ~~~~^~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp: In function ‘void preprocess(at::Tensor, at::Tensor, int, int, int, at::Tensor, at::Tensor, at::Tensor)’:
/home/liyang/ly/experiment/TCGNN.cpp:104:32: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (unsigned nid = 0; nid < num_nodes; nid++){
                            ~~~~^~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:105:51: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
         for (unsigned eid = nodePointer[nid]; eid < nodePointer[nid+1]; eid++)
/home/liyang/ly/experiment/TCGNN.cpp:110:34: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
     for (unsigned iter = 0; iter < num_nodes + 1; iter +=  blockSize_h){
                             ~~~~~^~~~~~~~~~~~~~~
/home/liyang/ly/experiment/TCGNN.cpp:11:25: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
 #define min(x, y) (((x) < (y))? (x) : (y))
                     ~~~~^~~~~
/home/liyang/ly/experiment/TCGNN.cpp:113:42: note: in expansion of macro ‘min’
         unsigned block_end = nodePointer[min(iter + blockSize_h, num_nodes)];
                                          ^
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1740, in _run_ninja_build
    subprocess.run(
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/subprocess.py", line 516, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "setup.py", line 15, in <module>
    setup(
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 148, in setup
    return run_commands(dist)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/core.py", line 163, in run_commands
    dist.run_commands()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 967, in run_commands
    self.run_command(cmd)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/install.py", line 74, in run
    self.do_egg_install()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/install.py", line 116, in do_egg_install
    self.run_command('bdist_egg')
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 164, in run
    cmd = self.call_command('install_lib', warn_dir=0)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/bdist_egg.py", line 150, in call_command
    self.run_command(cmdname)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/install_lib.py", line 11, in run
    self.build()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/command/install_lib.py", line 107, in build
    self.run_command('build_ext')
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/dist.py", line 986, in run_command
    cmd_obj.run()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 339, in run
    self.build_extensions()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 741, in build_extensions
    build_ext.build_extensions(self)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 448, in build_extensions
    self._build_extensions_serial()
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 473, in _build_extensions_serial
    self.build_extension(ext)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
    _build_ext.build_extension(self, ext)
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/_distutils/command/build_ext.py", line 528, in build_extension
    objects = self.compiler.compile(sources,
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 562, in unix_wrap_ninja_compile
    _write_ninja_file_and_compile_objects(
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1419, in _write_ninja_file_and_compile_objects
    _run_ninja_build(
  File "/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/utils/cpp_extension.py", line 1756, in _run_ninja_build
    raise RuntimeError(message) from e
RuntimeError: Error compiling objects for extension

But when I run the command TORCH_CUDA_ARCH_LIST="8.6" python setup.py install, it seems to compile fine. Here is the output:

running install
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/setuptools/command/easy_install.py:156: EasyInstallDeprecationWarning: easy_install command is deprecated. Use build and pip and other standards-based tools.
  warnings.warn(
running bdist_egg
running egg_info
writing manifest file 'TCGNN.egg-info/SOURCES.txt'
running install_lib
running build_ext
Emitting ninja build file /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/build.ninja...
Compiling objects...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
[1/1] /usr/local/cuda-11.3/bin/nvcc  -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/TH -I/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/THC -I/usr/local/cuda-11.3/include -I/mnt/7T/lly/anaconda3/envs/tcgnn/include/python3.8 -c -c /home/liyang/ly/experiment/TCGNN_kernel.cu -o /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN_kernel.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=TCGNN -D_GLIBCXX_USE_CXX11_ABI=0 -gencode=arch=compute_86,code=sm_86 -std=c++14
/home/liyang/ly/experiment/TCGNN_kernel.cu: In function ‘std::vector<at::Tensor> spmm_forward_cuda(at::Tensor, at::Tensor, at::Tensor, at::Tensor, at::Tensor, int, int, int, at::Tensor)’:
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:126: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                              ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:150: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                      ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:180: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                                                    ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:208: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                                                                                ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:233: warning: ‘T* at::Tensor::data() const [with T = int]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                                                                                                         ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:293: warning: ‘T* at::Tensor::data() const [with T = float]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                                                                                                                                                                     ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
/home/liyang/ly/experiment/TCGNN_kernel.cu:61:317: warning: ‘T* at::Tensor::data() const [with T = float]’ is deprecated: Tensor.data<T>() is deprecated. Please use Tensor.data_ptr<T>() instead. [-Wdeprecated-declarations]
     spmm_forward_cuda_kernel<<<grid, block, dynamic_shared_size>>>(
                                                                                                                                                                                                                                                                                                                             ^
/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/include/ATen/core/TensorBody.h:232:1: note: declared here
   T * data() const {
 ^ ~~
creating build/lib.linux-x86_64-3.8
g++ -pthread -shared -B /mnt/7T/lly/anaconda3/envs/tcgnn/compiler_compat -L/mnt/7T/lly/anaconda3/envs/tcgnn/lib -Wl,-rpath=/mnt/7T/lly/anaconda3/envs/tcgnn/lib -Wl,--no-as-needed -Wl,--sysroot=/ /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN.o /home/liyang/ly/experiment/build/temp.linux-x86_64-3.8/TCGNN_kernel.o -L/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/torch/lib -L/usr/local/cuda-11.3/lib64 -lc10 -ltorch -ltorch_cpu -ltorch_python -lcudart -lc10_cuda -ltorch_cuda_cu -ltorch_cuda_cpp -o build/lib.linux-x86_64-3.8/TCGNN.cpython-38-x86_64-linux-gnu.so
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libstdc++.so: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010001
/usr/bin/x86_64-linux-gnu-ld: warning: /mnt/7T/lly/anaconda3/envs/tcgnn/lib/libgcc_s.so.1: unsupported GNU_PROPERTY_TYPE (5) type: 0xc0010002
creating build/bdist.linux-x86_64
creating build/bdist.linux-x86_64/egg
byte-compiling build/bdist.linux-x86_64/egg/TCGNN.py to TCGNN.cpython-38.pyc
creating build/bdist.linux-x86_64/egg/EGG-INFO
copying TCGNN.egg-info/PKG-INFO -> build/bdist.linux-x86_64/egg/EGG-INFO
copying TCGNN.egg-info/SOURCES.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying TCGNN.egg-info/dependency_links.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
copying TCGNN.egg-info/top_level.txt -> build/bdist.linux-x86_64/egg/EGG-INFO
zip_safe flag not set; analyzing archive contents...
__pycache__.TCGNN.cpython-38: module references __file__
creating dist
removing 'build/bdist.linux-x86_64/egg' (and everything under it)
removing '/mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/TCGNN-0.0.0-py3.8-linux-x86_64.egg' (and everything under it)
creating /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/TCGNN-0.0.0-py3.8-linux-x86_64.egg
Extracting TCGNN-0.0.0-py3.8-linux-x86_64.egg to /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages
byte-compiling /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/TCGNN-0.0.0-py3.8-linux-x86_64.egg/TCGNN.py to TCGNN.cpython-38.pyc
TCGNN 0.0.0 is already the active version in easy-install.pth

Installed /mnt/7T/lly/anaconda3/envs/tcgnn/lib/python3.8/site-packages/TCGNN-0.0.0-py3.8-linux-x86_64.egg
Processing dependencies for TCGNN==0.0.0
Finished processing dependencies for TCGNN==0.0.0

But when I run the program, the following error occurs:

(tcgnn) wurui@node10:~/wurui/experiment$ ./1_bench_gcn.py
=> citeseer, hiddn: 16
Namespace(classes=6, dataset='citeseer', dim=3703, epochs=200, hidden=16, model='gcn', num_layers=2, single_kernel=False)
TC_Blocks:      1197
Exp_Edges:      153216
Prep. (ms):     1.615
torch.Size([3327, 3703]) torch.Size([3703, 16])
cuda:0 cuda:0
tensor([[-8.7582e+01,  2.4211e+01, -8.3589e-01,  ...,  2.6267e+01,
         -1.1155e+02,  8.8293e+01],
        [ 8.3507e+01,  3.4958e+00,  5.1139e+01,  ..., -9.2721e+01,
         -2.4189e+01, -8.7180e+01],
        [ 3.0647e+01,  1.0059e+01, -2.5137e+01,  ..., -1.7531e+01,
          1.6680e+02, -1.7924e+01],
        ...,
        [-9.7669e+01,  6.3489e-02,  3.4873e+01,  ...,  3.0643e+01,
          3.0913e+01, -7.3062e+01],
        [-1.7428e+01, -5.8077e+01, -8.2955e+01,  ..., -1.5578e+01,
          5.6977e+01,  3.3946e+01],
        [ 7.7431e+01,  4.6048e+01,  3.6308e+01,  ..., -5.4706e+01,
          2.2290e+01, -2.6366e+01]], device='cuda:0')
torch.Size([3327, 16]) cuda:0
break1
break2
break10
CUDA error: no kernel image is available for execution on the device

The print statements show that the program never enters the kernel. When I delete the code related to nvcuda, it compiles and runs fine, so there seems to be some problem with the nvcuda namespace.

Any response would be appreciated!

Tesla V100 has compute capability 7.0, but it seems like you are specifying compute capability 8.6 via TORCH_CUDA_ARCH_LIST. As GPU architectures lack binary compatibility, that will not work. Try specifying TORCH_CUDA_ARCH_LIST="7.0".
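To make the suggestion concrete, here is a simplified Python sketch of roughly how a numeric TORCH_CUDA_ARCH_LIST value turns into nvcc -gencode flags when an extension is built. This is an assumption-based model, not PyTorch's actual code: the function name gencode_flags is hypothetical, and named architectures such as "Volta" are not handled.

```python
from typing import List


def gencode_flags(arch_list: str) -> List[str]:
    """Return nvcc -gencode flags for a numeric TORCH_CUDA_ARCH_LIST-style string.

    Simplified sketch: only entries like "7.0" or "8.6+PTX" are handled.
    """
    flags: List[str] = []
    # Entries may be separated by semicolons or spaces.
    for entry in arch_list.replace(" ", ";").split(";"):
        if not entry:
            continue  # skip empty pieces left by repeated separators
        want_ptx = entry.endswith("+PTX")
        version = entry[: -len("+PTX")] if want_ptx else entry
        major, minor = version.split(".")
        num = major + minor
        # Machine code (SASS) for exactly this architecture.
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
        if want_ptx:
            # Also embed PTX, which the driver can JIT-compile for newer GPUs.
            flags.append(f"-gencode=arch=compute_{num},code=compute_{num}")
    return flags


if __name__ == "__main__":
    for value in ("8.6", "7.0", "7.0;8.0+PTX"):
        print(f'TORCH_CUDA_ARCH_LIST="{value}" ->', gencode_flags(value))
```

Under this model, "8.6" embeds only sm_86 machine code and no PTX, so a cc7.0 V100 has nothing it can load, whereas "7.0" produces sm_70 code that the V100 can run.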

I am not familiar with Torch, so if that does not fix the issue you will have to wait for feedback from other forum participants.

Thanks for your response! I tried TORCH_CUDA_ARCH_LIST="7.0", but I get the same compile error as above. The error also occurs with TORCH_CUDA_ARCH_LIST="7.5". With TORCH_CUDA_ARCH_LIST="8.0" it compiles fine, but I get the same runtime problem as with TORCH_CUDA_ARCH_LIST="8.6".

My memory is not very good. I was, possibly incorrectly, under the impression that WMMA was introduced with Volta, meaning a Tesla V100 with compute capability 7.0 should support it, and so should software that targets compute capability 7.0.

But your observation would appear to indicate that WMMA is only supported when targeting compute capability 8.0 or higher. If that is in fact so, you cannot use this software on a device with compute capability 7.0 such as the Tesla V100.

[Later:]

I checked NVIDIA’s documentation, and my recollection appears to be accurate in that nvcuda::wmma is supported for compute capability 7.0 or higher:

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#wmma

I don't know what is going on in your Torch build environment. You would probably want to make use of the Torch support infrastructure to resolve this, whatever that is (GitHub?).

Not all wmma variants appeared in the cc7.0 arch/timeframe. If the code compiles correctly for cc8.6 but not for cc7.0, it's a good indication that the code has a dependency on a newer arch and cannot be run on a cc7.0 device as-is.

Certainly TF32 support was not available in Volta/cc7.0; it requires a cc8.0 or higher arch/processor.
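As a rough illustration of why some wmma code builds for cc8.x but not for cc7.0, here is a small Python lookup of the minimum compute capability per WMMA element type. The values follow my reading of the wmma target notes in the PTX ISA documentation (worth double-checking against the docs for your CUDA version); the type labels and helper names are hypothetical.

```python
from typing import Iterable, Tuple

# Minimum compute capability per WMMA element type (PTX-style labels):
# f16 needs sm_70; s8/u8 need sm_72; s4/u4/b1 need sm_75; bf16/tf32/f64 need sm_80.
WMMA_MIN_CC = {
    "f16": (7, 0),
    "s8": (7, 2),
    "u8": (7, 2),
    "s4": (7, 5),
    "u4": (7, 5),
    "b1": (7, 5),
    "bf16": (8, 0),
    "tf32": (8, 0),
    "f64": (8, 0),
}


def min_cc_for(types: Iterable[str]) -> Tuple[int, int]:
    """Smallest compute capability supporting every WMMA type a kernel uses."""
    return max(WMMA_MIN_CC[t] for t in types)


def supported_on(types: Iterable[str], device_cc: Tuple[int, int]) -> bool:
    """True if a device with compute capability device_cc supports all the types."""
    # Tuples compare lexicographically, so (8, 6) >= (8, 0) and (7, 0) < (8, 0).
    return device_cc >= min_cc_for(types)


if __name__ == "__main__":
    v100 = (7, 0)
    for types in (["f16"], ["s8"], ["tf32"], ["bf16", "f16"]):
        print(types, "needs cc", min_cc_for(types), "-> OK on V100:", supported_on(types, v100))
```

For example, supported_on(["tf32"], (7, 0)) is False while supported_on(["f16"], (7, 0)) is True, which matches the point that TF32 requires cc8.0 or newer.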

And of course, if you compile any CUDA code for a cc8.x arch, but try to run it on a cc7.0 device, you are going to have trouble.

Since it compiles correctly with arch 8.x, I have no idea why it terminates when I run the code. Thanks for your reply!

Since it compiles correctly with arch 8.x, I have no idea why it terminates when I run the code.

CUDA error: no kernel image is available for execution on the device

Because of what I said here:

This is a basic CUDA principle. You must compile for a compatible architecture. A cc8.x compilation target is not a compatible architecture to run on a cc7.0 device.

It would be like specifying a CPU compilation target of the Sandy Bridge architecture but trying to run that code on a Nehalem device. It probably won't work; you'll get an illegal instruction trap or something similar.

When you compile for a cc8.0 architecture, you are generating cc8.0 machine code. cc8.0 machine code cannot/will not run on a cc7.0 device, and the error message you will get is likely to be "no kernel image is available for execution on the device".

Let me state it differently. cc7.0 does not support TF32. How do you expect to run a CUDA code that requires TF32 on a cc7.0 device?

Just because you can compile that code for a device other than cc7.0, does not mean you can run it on a cc7.0 device.
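The compatibility rule can be modelled in a few lines. This is a simplified Python sketch, not the driver's actual algorithm (arch-specific targets such as sm_90a are ignored), of how a loadable image might be picked from a fat binary: a cubin for sm_XY runs only on devices with the same major version X and a minor version of at least Y, while PTX for compute_XY can be JIT-compiled for any device of compute capability X.Y or newer. pick_image is a hypothetical name.

```python
from typing import Iterable, Optional, Tuple

CC = Tuple[int, int]  # (major, minor) compute capability


def pick_image(cubins: Iterable[CC], ptx: Iterable[CC], device: CC) -> Optional[str]:
    """Simplified model of choosing code for a device from a fat binary.

    Returns None when nothing fits, i.e. the situation the CUDA runtime
    reports as "no kernel image is available for execution on the device".
    """
    # Prefer real machine code: same major version, minor version not newer than the device.
    compatible = [c for c in cubins if c[0] == device[0] and c[1] <= device[1]]
    if compatible:
        major, minor = max(compatible)
        return f"sm_{major}{minor} cubin"
    # Otherwise fall back to JIT-compiling PTX that targets the device's arch or an older one.
    jit = [p for p in ptx if p <= device]
    if jit:
        major, minor = max(jit)
        return f"compute_{major}{minor} PTX (JIT)"
    return None


if __name__ == "__main__":
    v100 = (7, 0)
    print(pick_image([(8, 6)], [], v100))        # built with TORCH_CUDA_ARCH_LIST="8.6"
    print(pick_image([(8, 0)], [(8, 0)], v100))  # sm_80 cubin plus compute_80 PTX
    print(pick_image([(7, 0)], [], v100))        # built for cc7.0
```

In this model, pick_image([(8, 6)], [], (7, 0)) returns None, and even PTX for compute_80 does not help a cc7.0 device, because PTX can only be JIT-compiled forward to the same or newer architectures.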

So the Tesla V100 is a cc7.0 device? I think I understand the reason now. Thanks a lot!

Yes, it is cc7.0.

njuffa already stated that here:
