Do CUDA Runtime APIs launch kernels internally?

Do CUDA Runtime APIs launch kernels internally? Or are they fundamentally different from kernels?

A few may. I am reasonably certain that cudaMemcpy() uses a kernel to copy the data when both the source and the destination are in device memory.

Most CUDA API calls manipulate host-side control structures: primarily the CUDA context, but also (indirectly) OS control structures, through calls to operating system APIs. Much of this is single-threaded CPU activity.

I believe cudaMemset() and cudaMemsetAsync() may also launch a kernel under some circumstances.
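
One way to check the two suspected cases yourself is to run a program that defines no kernel of its own under a profiler. The following is only a minimal sketch: the `CHECK` macro, the `run_probe` function, and the 64 MiB buffer size are my own choices for illustration, not anything prescribed by CUDA. Since the program contains no `__global__` function, any kernel that shows up in a profile would have been launched by the runtime or driver internally.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper (not part of CUDA): abort with a message if a runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            std::fprintf(stderr, "%s failed: %s\n", #call,            \
                         cudaGetErrorString(err_));                   \
            std::exit(EXIT_FAILURE);                                  \
        }                                                             \
    } while (0)

// Issues only runtime API calls. Returns 0 if the data arrived intact, 1 otherwise.
int run_probe(size_t bytes) {
    void *src = nullptr;
    void *dst = nullptr;
    CHECK(cudaMalloc(&src, bytes));
    CHECK(cudaMalloc(&dst, bytes));

    // Candidate 1: fill device memory with a byte pattern.
    CHECK(cudaMemset(src, 0xAB, bytes));

    // Candidate 2: copy with both source and destination in device memory.
    CHECK(cudaMemcpy(dst, src, bytes, cudaMemcpyDeviceToDevice));
    CHECK(cudaDeviceSynchronize());

    // Copy one byte back to the host to confirm both operations took effect.
    unsigned char first = 0;
    CHECK(cudaMemcpy(&first, dst, 1, cudaMemcpyDeviceToHost));

    CHECK(cudaFree(src));
    CHECK(cudaFree(dst));
    return first == 0xAB ? 0 : 1;
}

int main() {
    int rc = run_probe(static_cast<size_t>(64) << 20);  // 64 MiB, an arbitrary size
    std::printf("probe %s\n", rc == 0 ? "ok" : "FAILED");
    return rc;
}
```

Assuming the file is saved as probe.cu (a name chosen here), it can be built with `nvcc probe.cu -o probe` and run under Nsight Systems with `nsys profile --stats=true ./probe`. What the report shows is likely to depend on the GPU, driver, and toolkit version, and a profiler may list these operations as memory operations rather than as kernels, so treat the result as evidence for one configuration only.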