Hi, when I call cudaFuncGetAttributes from inside a __global__ kernel function, I get cudaErrorUnknown. Here is my code:
#include <cstdio>

__global__ void foo() {
    return;
}
__global__ void test_kernel() {
    cudaFuncAttributes attr;
    // Device-side query of foo's attributes
    auto ret = cudaFuncGetAttributes(&attr, foo);
    printf("%d\n", ret);
    printf("%p\n", (void *)foo);
    printf("%d\n", attr.binaryVersion);
    return;
}
int main() {
    test_kernel<<<1,1>>>();
    cudaDeviceSynchronize();
    cudaFuncAttributes attr;
    // Host-side query of foo's attributes
    auto ret = cudaFuncGetAttributes(&attr, foo);
    printf("%d\n", ret);
    printf("%p\n", (void *)foo);
    printf("%d\n", attr.binaryVersion);
    return 0;
}
On the host side I can get the address and the attributes of foo, but on the device side test_kernel prints 0 for foo’s address and cudaFuncGetAttributes returns cudaErrorUnknown.
Using the device runtime API usually requires compiling relocatable device code with device linking. I also recommend explicitly linking against the device runtime library, although that last step may not be necessary depending on the CUDA version being used.
Your code runs with the expected output for me when I add the compilation switches -rdc=true -lcudadevrt.
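For reference, here is a minimal sketch of the device-side query with the return codes checked, so a failure shows up as an error code rather than as an uninitialized attr being printed. The file name func_attr.cu and the kernel name query_kernel are placeholders of mine, not from the original post; the build flags are the ones mentioned above.

```cuda
// Build (file name is hypothetical; flags are the ones from the reply):
//   nvcc -rdc=true -lcudadevrt -o func_attr func_attr.cu
#include <cstdio>

__global__ void foo() {}

// Hypothetical kernel name; queries foo's attributes from device code.
__global__ void query_kernel() {
    cudaFuncAttributes attr;
    // Device runtime call: needs relocatable device code and device linking.
    cudaError_t err = cudaFuncGetAttributes(&attr, foo);
    if (err != cudaSuccess) {
        printf("device: cudaFuncGetAttributes failed with code %d\n", (int)err);
        return;
    }
    printf("device: binaryVersion = %d\n", attr.binaryVersion);
}

int main() {
    query_kernel<<<1, 1>>>();
    cudaError_t err = cudaGetLastError();   // reports launch failures
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // reports failures during execution
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "host: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

If the program is built without -rdc=true, I would expect the device-side call to fail as in the original report, and the error branch makes that visible.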
It’s not reasonable to expect the foo pointer to print the same value on the host side and the device side, because in CUDA it is generally undefined behavior to take the address of a device entity in host code.
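To see the point about addresses concretely, one option is to capture the address of foo inside a kernel, copy it back, and print it next to the value obtained in host code. This is only a sketch: capture_address and addr_compare.cu are names I made up, and the two printed values should not be expected to match.

```cuda
// Build (file name is hypothetical; flags are the ones from the reply):
//   nvcc -rdc=true -lcudadevrt -o addr_compare addr_compare.cu
#include <cstdio>

__global__ void foo() {}

// Hypothetical helper kernel: stores foo's address as seen from device code.
__global__ void capture_address(void **out) {
    *out = (void *)foo;
}

int main() {
    void **d_out = nullptr;        // device buffer holding one pointer
    void *device_view = nullptr;   // host copy of the device-side address

    if (cudaMalloc(&d_out, sizeof(void *)) != cudaSuccess) return 1;
    capture_address<<<1, 1>>>(d_out);
    // cudaMemcpy waits for the kernel on the default stream before copying.
    if (cudaMemcpy(&device_view, d_out, sizeof(void *),
                   cudaMemcpyDeviceToHost) != cudaSuccess) return 1;
    cudaFree(d_out);

    printf("host view:   %p\n", (void *)foo);   // value seen by host code
    printf("device view: %p\n", device_view);   // value seen by device code
    return 0;
}
```

Neither value should be passed across the host/device boundary and used as a function pointer on the other side.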