Problem with shared memory

I am using a Tesla K80 GPU with 48 KB of shared memory per SM. I have launched a kernel like this:

__global__ void kernel( some inputs )
{
    // i is the equation number
    int i = threadIdx.x;

    // allocate shared memory for all 6 arrays
    /*  __shared__ double a[n];    __shared__ double b[n];    // case 1
        __shared__ double c[n];    __shared__ double d[n];
        __shared__ double e[n];    __shared__ double x[n];
    */
    extern __shared__ double array[];                         // case 2

    double *a = array;     double *b = &a[n];
    double *c = &b[n];     double *d = &c[n];
    double *e = &d[n];     double *x = &e[n];

    // rest of kernel unnecessary for this question
}

int main(int argc, char **argv)
{
    // launching the kernel
    // kernel<<<1, n>>>(some inputs);                        // case 1
    kernel<<<1, n, 6 * n * sizeof(double)>>>(some inputs);   // case 2
    checkCuda( cudaPeekAtLastError() );
    checkCuda( cudaDeviceSynchronize() );
}
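
Case 2 works only if the third launch parameter covers all six arrays, because each pointer starts n doubles after the previous one. The host-only C++ sketch below mimics that layout with an ordinary buffer standing in for `array`; `carve_six` is a hypothetical helper name, and nothing here runs on the GPU.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Host-side analogue of case 2 (hypothetical helper, not CUDA code):
// carve six arrays a, b, c, d, e, x of n doubles each out of one
// contiguous buffer, the same way the kernel does with
// `extern __shared__ double array[]`.
std::array<double*, 6> carve_six(double* base, std::size_t n) {
    assert(base != nullptr && n > 0);
    std::array<double*, 6> p{};
    p[0] = base;                      // a = array
    for (std::size_t k = 1; k < p.size(); ++k) {
        p[k] = p[k - 1] + n;          // b = &a[n], c = &b[n], ...
    }
    return p;
}
```

With a buffer of 6 * n doubles, the last array (x) ends exactly at the end of the buffer, which matches the 6 * n * sizeof(double) bytes requested at launch.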

For case 1, I can launch the kernel with at most about 750 threads, but for case 2 the maximum increases to 896. Six double arrays of size 896 need 6 * 8 * 896 = 43008 bytes = 43.008 kB. I know that I cannot use all 48 KB, but 43 KB seems a little low to me. Is there a justification? Also, what is the difference between case 1 and case 2, and why does the maximum number of threads differ?
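
The arithmetic above can be checked at compile time with a few lines of host-side C++. This sketch assumes that "48 KB" means 48 * 1024 = 49152 bytes per block and ignores any other shared memory the kernel might use; the helper names are hypothetical.

```cpp
#include <cstddef>

// Bytes needed by six arrays of n doubles (the layout used in the kernel).
constexpr std::size_t six_double_arrays_bytes(std::size_t n) {
    return 6 * n * sizeof(double);          // 6 * 8 * n
}

// Largest n whose six double arrays fit in limit_bytes of shared memory.
constexpr std::size_t max_elements(std::size_t limit_bytes) {
    return limit_bytes / (6 * sizeof(double));
}

// Assumption: 48 KB = 48 * 1024 bytes available per block.
constexpr std::size_t kSharedLimit = 48 * 1024;

static_assert(six_double_arrays_bytes(896) == 43008, "6 * 8 * 896");
static_assert(max_elements(kSharedLimit) == 1024, "49152 / 48");
```

Under that assumption, the six arrays alone would still fit at n = 1024, so by this count shared memory by itself does not explain a ceiling of 896.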

Thanks in advance.