Passing by reference to a __global__ function

I have a question about passing arguments by reference to __global__ functions.
When I searched for it, I found a thread saying it is not supported:
https://forums.developer.nvidia.com/t/passing-by-reference-in-global-functs-arguments/26976

But when I tested the following code on Ubuntu 18.04, it worked fine.
Am I misunderstanding something?
Or is passing a reference to a __global__ function actually legal?

#include <cuda_runtime.h>
#include <iostream>

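template <typename T>
__global__ void kernel(T& a)
{
  // Every thread increments the same value, so the update must be atomic.
  // atomicAdd on double requires compute capability 6.0 or higher.
  atomicAdd(&a, static_cast<T>(1));
}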

int main(int argc, char *argv[])
{
  double *ptr;
  cudaMalloc(&ptr, sizeof(double));
  cudaMemset(ptr, 0, sizeof(double));

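  // *ptr is only bound to the reference parameter here; the host never reads device memory.
  kernel <<< 1, 1024 >>> (*ptr);

  // Wait for the kernel to finish and collect any error before reading the result.
  cudaDeviceSynchronize();
  auto status = cudaPeekAtLastError();

  double check;
  cudaMemcpy(&check, ptr, sizeof(double), cudaMemcpyDeviceToHost);

  std::cout << "STATUS " << cudaGetErrorString(status) << std::endl;
  std::cout << "RESULT " << check << std::endl;

  cudaFree(ptr);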
}

A reference parameter does not cause a problem in itself. An illegal memory access occurs when the reference refers to host memory. Here you pass a reference to a double that resides in device memory, so the kernel is able to access it.
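
To make the difference concrete, here is a minimal sketch that launches the same kind of kernel twice: once with a reference to device memory and once with a reference to a host stack variable. The names `increment` and `run` are hypothetical and exist only for this illustration. The kernel is a template to mirror the one in the question. The second launch is expected to report an error on a typical setup, but on systems where the GPU can access pageable host memory it may succeed, so treat the outcome as platform-dependent.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Hypothetical kernel for illustration: writes through a reference parameter.
template <typename T>
__global__ void increment(T& a)
{
  a += static_cast<T>(1);  // launched with a single thread, so no atomic is needed
}

// Hypothetical helper: launch the kernel, then return the first error seen.
static cudaError_t run(double& target)
{
  increment<<<1, 1>>>(target);
  cudaError_t err = cudaGetLastError();   // errors detected at launch time
  if (err == cudaSuccess)
    err = cudaDeviceSynchronize();        // errors raised while the kernel ran
  return err;
}

int main()
{
  double* dev = nullptr;
  if (cudaMalloc(&dev, sizeof(double)) != cudaSuccess) return 1;
  cudaMemset(dev, 0, sizeof(double));

  // Case 1: reference to device memory. *dev is bound, not read, on the host.
  cudaError_t err = run(*dev);
  std::printf("device reference: %s\n", cudaGetErrorString(err));

  // Case 2: reference to a host stack variable. The kernel receives a host
  // address, which the device usually cannot dereference.
  double host = 0.0;
  err = run(host);
  std::printf("host reference:   %s\n", cudaGetErrorString(err));

  // No cleanup after case 2: if it faulted, the CUDA context is left in an
  // error state and further runtime calls would fail anyway.
  return 0;
}
```

The failing case is placed last on purpose, because an illegal memory access leaves the context in an error state for the rest of the process. If you want the call site to make the address space obvious, passing `double*` instead of `double&` is the more conventional choice.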