GPUs have multiple constant banks. One of these banks provides the programmer-visible constant memory, which is limited to 64 KB across all currently supported GPUs (see CUDA Programming Guide). Another bank is used for passing kernel arguments, and yet another for literal constants from the source code or those created by the compiler. When you build with -Xptxas -v, the compiler reports the usage of each constant bank (cmem[X]). Example:
ptxas info : 10 bytes gmem, 65536 bytes cmem[3]
ptxas info : Used 13 registers, 328 bytes cmem[0], 216 bytes cmem[2]
The bank assignments and the sizes of the non-programmer-visible banks vary from architecture to architecture. Best I can tell, for sm_61 kernel arguments use bank 0, literal constants use bank 2, and programmer-visible constant memory uses bank 3. Based on your description, it seems you are running out of space in bank 2. This is unusual; I cannot recall having encountered this issue in a dozen years of CUDA programming.
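To illustrate how bank 2 can fill up, here is a contrived sketch (kernel name hypothetical): wide literals that the compiler does not fold into immediates are typically materialized in the literal-constant bank, so a kernel with many distinct double-precision literals accumulates usage there.

```cuda
// Contrived sketch: each distinct double literal below may occupy
// 8 bytes in the literal-constant bank (cmem[2] on sm_61), since
// double-precision constants generally cannot be encoded as immediates.
__global__ void poly(double *out, double x)
{
    out[0] = 1.2345678901 * x * x * x
           + 2.3456789012 * x * x
           + 3.4567890123 * x
           + 4.5678901234;
}
```

Machine-generated code (e.g. from code generators emitting large coefficient tables inline) is one plausible way to exceed the bank this way.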
The obvious thing to try is to move your data into programmer-visible constant memory, i.e. bank 3. If that is not possible, check whether you can compress the data in some way, for example by storing some data as 'float' instead of 'double' or by using narrower integer types. If that is not feasible, store the data in global or shared memory, as appropriate.
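A minimal sketch of the first suggestion, with hypothetical names (coeffs, apply, upload): declare a __constant__ array, which lives in the programmer-visible 64 KB bank, and fill it from the host with cudaMemcpyToSymbol before launching the kernel.

```cuda
#include <cuda_runtime.h>

// Hypothetical coefficient table placed in programmer-visible constant
// memory; 4096 doubles = 32 KB, under the 64 KB total limit.
__constant__ double coeffs[4096];

__global__ void apply(const double *in, double *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] * coeffs[i & 4095];
}

// Copy host data into the __constant__ array before kernel launch.
void upload(const double *host_coeffs)
{
    cudaMemcpyToSymbol(coeffs, host_coeffs, sizeof(coeffs));
}
```

Note that constant memory is optimized for the case where all threads in a warp read the same address; divergent indexing serializes the accesses, which is one reason global or shared memory can be preferable for irregularly accessed data.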