Crash Course in CUDA Terminology and Theory?

I’ve been looking to get deeper into CUDA programming. My only experience so far is with CuPy, which is basically NumPy implemented on the GPU. It doesn’t require any CUDA terminology except streams for concurrency, and that feature hasn’t worked well for me, so I haven’t used it to write anything useful. I figured it would be a good idea to read up on the terminology for CUDA and GPU programming in general before diving into language- or framework-specific code. Can anyone recommend a free, preferably not too long resource for learning the terminology and theory required for CUDA programming?

I’m still new as well, but I found this series of posts by Mark Harris to be a very good place to start.

It does not separate the code from the theory, however; it introduces both together.

Hope that helps.

A much longer introduction is here.