Invoke cudaFuncGetAttributes in kernel

343694462 · July 6, 2023, 6:03am

Hi, when I call cudaFuncGetAttributes inside a __global__ kernel function, it returns cudaErrorUnknown. Here is my code:

#include <cstdio>

__global__ void foo() {
    return;
}

__global__ void test_kernel() {
    cudaFuncAttributes attr;
    auto ret = cudaFuncGetAttributes(&attr, foo);
    
    printf("%d\n", ret);
    printf("%p\n", (void *)foo);
    printf("%d\n", attr.binaryVersion);
    return;
}

int main() {
    test_kernel<<<1,1>>>();
    cudaDeviceSynchronize();

    cudaFuncAttributes attr;
    auto ret = cudaFuncGetAttributes(&attr, foo);
    
    printf("%d\n", ret);
    printf("%p\n", (void *)foo);
    printf("%d\n", attr.binaryVersion);
    return 0;
}

On the host side I can get the address and the attributes of foo, but on the device side test_kernel prints 0 for foo's address and cudaFuncGetAttributes returns cudaErrorUnknown.

I am using an A100 with CUDA 11.6.

Robert_Crovella · July 6, 2023, 3:00pm

Using the device runtime API usually requires compiling relocatable device code with device linking. I also recommend explicitly linking against the device runtime library, although this last step may not be necessary depending on the CUDA version being used.

Your code runs with expected output for me when I add the compilation switches -rdc=true -lcudadevrt.

$ cat t4.cu
#include <cstdio>

__global__ void foo() {
    return;
}

__global__ void test_kernel() {
    cudaFuncAttributes attr;
    auto ret = cudaFuncGetAttributes(&attr, foo);

    printf("%d\n", ret);
    printf("%p\n", foo);
    printf("%d\n", attr.binaryVersion);
    return;
}

int main() {
    test_kernel<<<1,1>>>();
    cudaDeviceSynchronize();

    cudaFuncAttributes attr;
    auto ret = cudaFuncGetAttributes(&attr, foo);

    printf("%d\n", ret);
    printf("%p\n", foo);
    printf("%d\n", attr.binaryVersion);
    return 0;
}
$ nvcc -o t4 t4.cu -rdc=true -lcudadevrt
$ compute-sanitizer ./t4
========= COMPUTE-SANITIZER
0
0x7f93bf079e00
90
0
0x55e8a6667060
90
========= ERROR SUMMARY: 0 errors
$

It’s not reasonable to expect foo to print the same pointer value on the host side and the device side, because in CUDA it is generally undefined behavior (UB) to take the address of a device entity in host code.
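
If it helps with diagnosing this kind of failure, one option is to hand the device-side return code back to the host, where cudaGetErrorString can decode it, instead of printing a raw integer from the kernel. The following is only a minimal sketch under assumptions: the DeviceQuery struct, the query_kernel name, the ok helper, and the file name in the build comment are hypothetical and not from the code above. It uses the same call form and the same build switches as the listing above.

```cuda
// Build with the switches from the listing above (file name is arbitrary):
//   nvcc -o query query.cu -rdc=true -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

__global__ void foo() {}

// Hypothetical result record for this sketch; not part of the CUDA API.
struct DeviceQuery {
    cudaError_t status;  // return code of the device-side call
    int binaryVersion;   // meaningful only when status == cudaSuccess
};

__global__ void query_kernel(DeviceQuery *out) {
    cudaFuncAttributes attr;
    // Same device-side call as in test_kernel above.
    out->status = cudaFuncGetAttributes(&attr, foo);
    out->binaryVersion = (out->status == cudaSuccess) ? attr.binaryVersion : -1;
}

// Report a failed host-side runtime call; returns true on success.
static bool ok(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        return false;
    }
    return true;
}

int main() {
    DeviceQuery *d_out = nullptr;
    if (!ok(cudaMalloc(&d_out, sizeof(DeviceQuery)), "cudaMalloc")) return 1;

    query_kernel<<<1, 1>>>(d_out);
    // Launch errors and execution errors are reported separately.
    if (!ok(cudaGetLastError(), "kernel launch")) return 1;
    if (!ok(cudaDeviceSynchronize(), "kernel execution")) return 1;

    DeviceQuery h_out;
    if (!ok(cudaMemcpy(&h_out, d_out, sizeof(h_out), cudaMemcpyDeviceToHost),
            "cudaMemcpy")) return 1;

    printf("device-side cudaFuncGetAttributes: %d (%s)\n",
           static_cast<int>(h_out.status), cudaGetErrorString(h_out.status));
    if (h_out.status == cudaSuccess) {
        printf("binaryVersion seen from device: %d\n", h_out.binaryVersion);
    }

    cudaFree(d_out);
    return h_out.status == cudaSuccess ? 0 : 1;
}
```

Checking cudaGetLastError and cudaDeviceSynchronize separately keeps a failed launch distinguishable from a failure reported by the device-side call itself.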

