Looking for the simplest CMake file to build a CUDA Hello World project on Linux

I’m using Ubuntu 18.04.5 with gcc-9.3.0 and CUDA compilation tools, release 11.0, V11.0.194. I’m trying to build this simple CUDA project, which consists of a single file, cud.cu:

#include<stdio.h>
#include<stdlib.h>

__global__ void print_from_gpu(void) {
  printf("Hello World! from thread [%d,%d] From device\n", threadIdx.x,blockIdx.x);
}

int main() {
  printf("Hello World from host!\n");
  print_from_gpu<<<1,1>>>();
  cudaDeviceSynchronize();
}

I can build it, but the resulting program prints only from the host, not the device:

(base) paul@extra:~/st/cu$ nvcc -o cud cud.cu
(base) paul@extra:~/st/cu$ ./cud
Hello World from host!
(base) paul@extra:~/st/cu$ 

I have two questions:

  1. How can I build it so that it also prints from the device?
  2. How can I do the same using CMake?

I tried this CMakeLists.txt:

cmake_minimum_required(VERSION 3.19)
project(cud CXX CUDA)

add_executable(cud cud.cu)

but the result is the same, i.e. nothing is printed from the device. I do have an NVIDIA GPU, of course:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.102.04   Driver Version: 450.102.04   CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GT 730      Off  | 00000000:01:00.0 N/A |                  N/A |
| 30%   28C    P0    N/A /  N/A |    487MiB /  1998MiB |     N/A      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

I was missing:

set_property(TARGET cud PROPERTY CUDA_ARCHITECTURES 35)

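in my CMakeLists.txt. The default value of CUDA_ARCHITECTURES was 52, so the generated device code could not run on my old GPU, which needs architecture 35. The failure was silent because the program never checks the error codes returned by the CUDA runtime. The same applies to the plain nvcc build, where the target architecture can be selected with -arch=sm_35.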
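To make this kind of failure visible instead of silent, the launch and the synchronization can be checked for errors. Below is a minimal sketch of cud.cu with such checks. The CUDA_CHECK macro is a hypothetical helper introduced here for illustration and is not part of the original code; cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString are standard CUDA runtime calls.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not in the original post): if a CUDA runtime call
// returns an error, print where it happened and what it was, then exit.
#define CUDA_CHECK(call)                                          \
  do {                                                            \
    cudaError_t err_ = (call);                                    \
    if (err_ != cudaSuccess) {                                    \
      fprintf(stderr, "%s:%d: %s failed: %s\n", __FILE__,         \
              __LINE__, #call, cudaGetErrorString(err_));         \
      exit(EXIT_FAILURE);                                         \
    }                                                             \
  } while (0)

__global__ void print_from_gpu(void) {
  printf("Hello World! from thread [%d,%d] From device\n",
         threadIdx.x, blockIdx.x);
}

int main() {
  printf("Hello World from host!\n");
  print_from_gpu<<<1, 1>>>();
  // A kernel launch returns nothing, so ask the runtime whether it failed
  // (for example, because no device code matches this GPU's architecture).
  CUDA_CHECK(cudaGetLastError());
  // Wait for the kernel to finish and report errors raised while it ran.
  CUDA_CHECK(cudaDeviceSynchronize());
  return 0;
}
```

Built for the wrong architecture, a program like this should stop with an error message from the launch check rather than quietly printing only the host line.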