Issues with Tensors having more than 2^31-1 elements

Hi, it seems that cuBLAS uses signed int32 for indices, so it's impossible to create Tensors with more than 2^31-1 elements.
With float16, the biggest possible tensor therefore occupies just under 4 GiB of GPU memory.
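
To make the arithmetic concrete, here is a quick sketch in plain Python (no GPU needed). It takes the 2^31-1 element cap described above as given and only multiplies it by the element size; the helper name `max_tensor_bytes` and the dtype table are just for illustration, not part of any library.

```python
# Largest value a signed 32-bit index can hold (the assumed element cap).
INT32_MAX = 2**31 - 1

# Bytes per element for a few common floating-point dtypes.
BYTES_PER_ELEMENT = {"float16": 2, "float32": 4, "float64": 8}


def max_tensor_bytes(dtype: str) -> int:
    """Size in bytes of a tensor with INT32_MAX elements of the given dtype."""
    return INT32_MAX * BYTES_PER_ELEMENT[dtype]


for dtype in BYTES_PER_ELEMENT:
    size = max_tensor_bytes(dtype)
    print(f"{dtype}: {size} bytes ({size / 2**30:.2f} GiB)")
```

So for float16 the cap works out to 4,294,967,294 bytes, which is 2 bytes short of 4 GiB, while float32 and float64 tensors could reach roughly 8 GiB and 16 GiB under the same element limit.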