about thread context switching

I don’t know if “core” is the right name, but I will call it a core, as in the CPU. I believe the GPU has hundreds if not thousands of them, and each one is able to execute a thread.

My doubt is:
In the GPU, are cores multithreaded? Obviously they will execute only one thread at a time, but I was wondering if it is possible to assign more than one thread to a core; maybe different streams can share the same cores.

My doubt arose when I read this fragment:
“A few characteristics of CUDA programming model are very different from CPU based parallel programming model. One difference is that there is very little overhead creating GPU threads. In addition to fast thread creation, context switches, where threads change from active to inactive and vice versa are very fast compared to CPU threads. The reason context switching is essentially instantaneous on GPU, is that the GPU does not have to store state”

Why does the GPU not have to store state? How does it remember its variables when it returns to a thread?

you assign a thread as program/code to a CPU core; you assign a kernel as program/code to a GPU SM
you assign kernel dimensions to the kernel: the number of thread blocks, and the number of threads per block
each thread ‘participates’ in the kernel and its execution, together with the other threads of the kernel, according to the kernel’s dimensions
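as a sketch of those points (the kernel name `scale` and the sizes here are my own, not from your quote): the dimensions are attached at launch with `<<<blocks, threadsPerBlock>>>`, and each thread locates its share of the work from those dimensions:

```cuda
#include <cstdio>

// hypothetical kernel: each thread handles one element of the array
__global__ void scale(float *data, float factor, int n)
{
    // each thread computes its own global index from the kernel dimensions
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)              // threads past the end simply do nothing
        data[i] *= factor;
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));

    // kernel dimensions: enough blocks of 256 threads to cover n elements
    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock;
    scale<<<blocks, threadsPerBlock>>>(d, 2.0f, n);

    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}
```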
each thread has an instruction pointer into the kernel’s code
threads of a kernel can be switched on/off - mostly via their instruction pointer, such that it is implied that said threads would/would not participate in the following x instructions; threads cannot enter a kernel, nor leave a kernel, once the kernel commences (at that point, the kernel dimensions are considered constant)
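a minimal sketch of that switching on/off (the kernel `branchy` is my own example): on a branch, the hardware masks some threads inactive for one arm and the rest for the other arm, rather than doing a CPU-style save/restore - each thread’s registers stay resident the whole time, which is why there is no state to store:

```cuda
__global__ void branchy(int *out)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // even-numbered lanes stay active for the first arm while odd lanes
    // are switched off; then the roles reverse for the else arm - each
    // inactive thread keeps its registers and simply skips instructions
    if (threadIdx.x % 2 == 0)
        out[i] = 1;   // even lanes active, odd lanes masked off
    else
        out[i] = 2;   // odd lanes active, even lanes masked off

    // no thread has entered or left the kernel here: the kernel
    // dimensions are constant from launch to completion
}
```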