Pinned memory slowdown in CUDA kernels

I have some code that accesses shared pinned memory, which is passed into the kernel as a pointer.

The pointer is used very rarely by the CUDA kernel.

However, there is a huge slowdown in computations per second whenever the kernel merely contains code that could write to the shared memory. Performance goes down even though the branches that perform the write are never taken.
All the kernels write to the same location.
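
To make the pattern concrete, here is a minimal sketch of the kind of code described above. It is not the actual code. It assumes that "shared pinned memory" means page-locked host memory mapped into the device address space with `cudaHostAlloc(..., cudaHostAllocMapped)`. The names `compute` and `launch_and_read_flag`, the arithmetic, and the threshold are hypothetical stand-ins.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err_ = (call);                                 \
        if (err_ != cudaSuccess) {                                 \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err_));                     \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

// Hypothetical kernel: mostly arithmetic, with one rarely taken branch
// that writes to mapped pinned host memory through `flag`.
__global__ void compute(float *out, int n, int *flag, float threshold)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float x = i * 0.001f;
    for (int k = 0; k < 1000; ++k)
        x = x * 0.999f + 0.001f;   // stand-in for the real computation
    out[i] = x;

    if (x > threshold)             // never true for the threshold used below
        *flag = 1;                 // every thread would write the same location
}

// Launches the kernel once, stores the elapsed time in *elapsed_ms,
// and returns the value of the pinned flag after the kernel finishes.
int launch_and_read_flag(int n, float *elapsed_ms)
{
    float *d_out = nullptr;
    int *h_flag = nullptr;   // host view of the pinned allocation
    int *d_flag = nullptr;   // device view of the same allocation

    CHECK(cudaMalloc((void **)&d_out, n * sizeof(float)));
    // Assumption: the pinned memory is allocated as mapped (zero-copy) memory.
    CHECK(cudaHostAlloc((void **)&h_flag, sizeof(int), cudaHostAllocMapped));
    *h_flag = 0;
    CHECK(cudaHostGetDevicePointer((void **)&d_flag, h_flag, 0));

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    CHECK(cudaEventRecord(start));
    compute<<<blocks, threads>>>(d_out, n, d_flag, 1e30f);
    CHECK(cudaGetLastError());
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));
    CHECK(cudaEventElapsedTime(elapsed_ms, start, stop));

    int flag = *h_flag;      // safe to read: the kernel has completed

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFreeHost(h_flag));
    CHECK(cudaFree(d_out));
    return flag;
}

int main()
{
    float ms = 0.0f;
    int flag = launch_and_read_flag(1 << 20, &ms);
    printf("kernel time: %.3f ms, flag = %d\n", ms, flag);
    return 0;
}
```

Assuming the sketch is saved as `repro.cu`, it can be built with `nvcc repro.cu -o repro`. Timing it against a copy in which the `if (x > threshold)` block is removed is one way to isolate the slowdown described here.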

Any suggestions?