Variation in MaxThreadsPerBlock property value as reported by MATLAB 'gpuDevice' and by a CUDA kernel

I get two different values for the MaxThreadsPerBlock property, as explained below:

  1. The following properties are displayed by the MATLAB gpuDevice command:
    parallel.gpu.CUDADevice handle
    Package: parallel.gpu

Name: 'Tesla C2075'
Index: 1
ComputeCapability: '2.0'
SupportsDouble: 1
DriverVersion: 7.5
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [65535 65535]
SIMDWidth: 32
TotalMemory: 5.6368e+09
FreeMemory: 5.5524e+09
MultiprocessorCount: 14
ClockRateKHz: 1147000
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 0
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

  2. However, when I launch the CUDA kernel, the following is displayed:
    parallel.gpu.CUDAKernel handle
    Package: parallel.gpu

ThreadBlockSize: [1 1 1]
MaxThreadsPerBlock: 512
GridSize: [1 1]
SharedMemorySize: 0
EntryPoint: '_Z10testKernelPfPtS_S_S0_S_S0_S0_S0_'
MaxNumLHSArguments: 9
NumRHSArguments: 9
ArgumentTypes: {1x9 cell}

In the first case the value is 1024, but in the second case it is shown as 512. Any solutions, please?
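For reference, the two values can be printed side by side with something like the following. This is only a minimal sketch: the PTX file name is taken from the compile command given later in this thread, while the source file name eN10.cu is an assumption, since it is not shown anywhere here.

```matlab
% Device-wide limit, as reported by the GPU itself.
dev = gpuDevice;

% Kernel object built from the compiled PTX.
% 'eN10.cu' is an assumed file name; it is only used to read the kernel prototype.
k = parallel.gpu.CUDAKernel('eN10.ptx', 'eN10.cu');

% Print both limits next to each other.
fprintf('Device MaxThreadsPerBlock: %d\n', dev.MaxThreadsPerBlock);
fprintf('Kernel MaxThreadsPerBlock: %d\n', k.MaxThreadsPerBlock);
```

The first number is a property of the device object, the second a property of this particular kernel object.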

How did you compile the CUDA kernel? Be specific: provide the CUDA toolkit version and the exact compile command you used.

Thanks for the reply.

Toolkit version 7.5
with MSVS 2012 and MATLAB R2012a (with patch from MATLAB)

I used the following command for compilation:
nvcc -m 64 -ptx -arch sm_20 -ccbin "C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin" -I "C:\Program Files\MATLAB\R2012a\extern\include" --output-file eN10.ptx