Windows and Linux execution


I am trying to run the same code on Linux and on Windows. Here is the code:

template <typename Tin, typename TOut>
__global__ void SepColkernelRGB(Tin *ptImIn, TOut *ptImROut, TOut *ptImGOut, TOut *ptImBOut,
                                int w, int h, double *ptKernel, int ikernelSizeX)
{
    // Dynamically sized shared memory; the byte count is passed at launch.
    extern __shared__ double LocalS[];
}


Call of the function:
    dim3 blocks(ceil((float)iwidth_ / BLOCK_SIZE_X), ceil((float)iheight_ / BLOCK_SIZE_Y));
    dim3 threads(BLOCK_SIZE_X, BLOCK_SIZE_Y);

    int iKernelSize = KGaussianX_G1->iwidth_;
    int isharedMemSize = ((BLOCK_SIZE_X + iKernelSize) * BLOCK_SIZE_Y * 3 + iKernelSize) * sizeof(double);
    std::cout << "iKernelSize " << iKernelSize << " isharedMemSize " << isharedMemSize << std::endl;
    std::cout << "blocks " << blocks.x << " " << blocks.y << " threads " << threads.x << " " << threads.y << std::endl;

On Linux the program works properly. On Windows, as soon as iKernelSize is larger than 31, I get this error:

When launching the program:
CUDA Runtime API error 11: invalid argument.

When I use cuda-memcheck:
CUDA Runtime API error 9: invalid configuration argument.
========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunch.

I think the problem comes from the shared memory size. Is more shared memory available on Linux than on Windows?

I then looked at the deviceQuery output. Both outputs are the same except for ECC:

Device has ECC support:                        Disabled (Windows) / Yes (Linux)

Can someone help me with this?

Can someone explain how to find the root of my problem?

So what is being computed for shared memory size on Linux and Windows, respectively?

The code was first run on Linux, where it works properly. I then wanted to run the same code on Windows, and there it crashes.

On Windows, only the kernel launch crashes. I removed all the content of the kernel, as in my previous post. There is nothing left in the code.

The block, thread, and shared memory sizes are the same on Linux and on Windows. There is nothing in the kernel (except the declaration of the shared memory).
On Linux it works, and on Windows it crashes with the errors above.

If you have a problem with the shmem size, the first step is to compute and print this size in both executables.
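As a minimal sketch of that first step, the size formula from the question can be moved into a small host-side helper and printed in both builds. The names sharedMemBytes and printSharedMemReport are hypothetical, and the block dimensions are left as parameters because the post does not state BLOCK_SIZE_X or BLOCK_SIZE_Y.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper: same formula as isharedMemSize in the question,
// computed entirely in std::size_t to avoid mixing int and size_t.
std::size_t sharedMemBytes(std::size_t blockX, std::size_t blockY, std::size_t kernelSize)
{
    const std::size_t elements = (blockX + kernelSize) * blockY * 3 + kernelSize;
    return elements * sizeof(double);
}

// Prints every input that feeds into the size, so the output of the
// Linux and Windows executables can be compared line by line.
void printSharedMemReport(std::size_t blockX, std::size_t blockY, std::size_t kernelSize)
{
    std::cout << "sizeof(double) " << sizeof(double)
              << " blockX " << blockX
              << " blockY " << blockY
              << " kernelSize " << kernelSize
              << " sharedMemBytes " << sharedMemBytes(blockX, blockY, kernelSize)
              << std::endl;
}
```

Calling printSharedMemReport(BLOCK_SIZE_X, BLOCK_SIZE_Y, iKernelSize) right before the launch on both systems would show whether the two builds really request the same number of bytes.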

The next idea is to check the compile commands. Describe your compiler, driver, and GPU model here.

The shmem size was limited to 16 KB on SM 1.0 devices, so if you are stuck with an older nvcc, that may be the reason.
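To see how such a limit interacts with the size formula from the question, here is a rough host-side sketch that finds the largest kernel size still fitting under a given per-block limit. The helper names are hypothetical, and the block shape is an assumption because the post does not give BLOCK_SIZE_X or BLOCK_SIZE_Y. The limit of the actual device can be read from the sharedMemPerBlock field of cudaDeviceProp, filled in by cudaGetDeviceProperties.

```cpp
#include <cstddef>

// Same size formula as isharedMemSize in the question (hypothetical helper name).
std::size_t sharedMemBytes(std::size_t blockX, std::size_t blockY, std::size_t kernelSize)
{
    return ((blockX + kernelSize) * blockY * 3 + kernelSize) * sizeof(double);
}

// True if the requested dynamic shared memory fits within the per-block limit.
bool fitsInSharedMem(std::size_t requestedBytes, std::size_t limitBytes)
{
    return requestedBytes <= limitBytes;
}

// Largest kernel size whose shared memory request still fits for the given
// block shape; returns 0 if not even a kernel size of 1 fits.
std::size_t largestKernelSizeThatFits(std::size_t blockX, std::size_t blockY,
                                      std::size_t limitBytes)
{
    std::size_t best = 0;
    // The request grows with every step of k, so the loop stops at the first miss.
    for (std::size_t k = 1;
         fitsInSharedMem(sharedMemBytes(blockX, blockY, k), limitBytes);
         ++k)
    {
        best = k;
    }
    return best;
}
```

For an assumed 16x16 block and 8-byte doubles, a 16 KB limit allows kernel sizes up to 26, while a 48 KB limit allows up to 109. If the threshold seen on Windows matches the value computed for 16 KB with the real block dimensions, that would point at the build targeting an old architecture.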

If you can, just publish a ready-to-build project here that reproduces the error.