Hi, I am profiling a really simple kernel that does not use shared memory:
__global__ void stride(const float *A, float *B) {
    int idx = threadIdx.x + blockDim.x * blockIdx.x;
    B[idx] = A[idx * Stride];
}
In the Memory section of Nsight Compute, why is some shared memory usage reported?
l1tex__data_pipe_lsu_wavefronts_mem_shared is 65536.
Thanks for the help!