Dear all,
I have some questions.
- What is the real shared memory size of my GPU (Titan V)?
I wrote a program that reads the sharedMemPerBlock property, and it reports a shared memory capacity of 64 KB per block. However, I saw a post saying there are 96 KB of shared memory on Volta (https://devtalk.nvidia.com/default/topic/1052021/cuda-programming-and-performance/shared-memory-size-per-thread-block/). The NVIDIA manual gives 128 KB, which may include the unified data cache (Programming Guide :: CUDA Toolkit Documentation). So which figure is correct?
- Do the threads of a warp run serially? Is the warp the basic scheduling unit on all architectures?
If so, do the threads of a warp run serially on a single SP at the hardware level?
- Is the shared memory capacity per block the same as the shared memory capacity per SM?
If so, and a block occupies all of the shared memory, can only one block be resident on the SM?
Also, suppose one SM has 64 SPs, which I take to mean that it can execute 64 warps in parallel. If my block size is 128 (4 warps) and the block occupies all of the shared memory, does the SM run only 4 threads in parallel at the hardware level?
- The Programming Guide (Programming Guide :: CUDA Toolkit Documentation) says:
“The Volta architecture introduces Independent Thread Scheduling which changes the way threads are scheduled on the GPU. For code relying on specific behavior of SIMT scheduling in previous architectures, Independent Thread Scheduling may alter the set of participating threads, leading to incorrect results. To aid migration while implementing the corrective actions detailed in Independent Thread Scheduling, Volta developers can opt-in to Pascal’s thread scheduling with the compiler option combination -arch=compute_60 -code=sm_70.”
Does this mean that threads are no longer scheduled in warps on Volta?
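To make the scenario in my third question concrete, here is a minimal sketch of the arithmetic as I understand it. It only counts how many blocks and warps can be resident on one SM when shared memory is the sole limit; registers, the per-SM thread limit and the per-SM block limit are ignored. The 96 KB per-SM figure is an assumption taken from the linked thread, and all constant and function names are made up for this illustration.

```cpp
#include <cstddef>

// All figures are assumptions for illustration, not values measured on my card.
constexpr std::size_t kWarpSize        = 32;         // threads per warp
constexpr std::size_t kSmemPerSm       = 96 * 1024;  // assumed shared memory per SM (bytes)
constexpr std::size_t kSmemPerBlock    = 96 * 1024;  // the block requests all of it
constexpr std::size_t kThreadsPerBlock = 128;        // block size from the question

// Blocks that fit on one SM when shared memory is the only limit.
// Precondition: smemPerBlock > 0.
constexpr std::size_t residentBlocksPerSm(std::size_t smemPerSm,
                                          std::size_t smemPerBlock) {
    return smemPerSm / smemPerBlock;  // integer division rounds down
}

// Warps needed by one block; a partially filled warp still counts as a warp.
constexpr std::size_t warpsPerBlock(std::size_t threadsPerBlock) {
    return (threadsPerBlock + kWarpSize - 1) / kWarpSize;
}

constexpr std::size_t kResidentBlocks =
    residentBlocksPerSm(kSmemPerSm, kSmemPerBlock);
constexpr std::size_t kResidentWarps =
    kResidentBlocks * warpsPerBlock(kThreadsPerBlock);

// Checked at compile time: one resident block, hence 4 resident warps.
static_assert(kResidentBlocks == 1, "a block using all shared memory leaves room for no other");
static_assert(kResidentWarps == 4, "128 threads / 32 threads per warp = 4 warps");
```

Under these assumptions only one block, and therefore 4 warps (128 threads), would be resident on the SM at a time. What I cannot tell from this arithmetic is how those 4 warps are mapped onto the SPs, which is what I am asking.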
Please excuse my poor English. Thank you very much.