About limitations in data scale of Batch Normalization in cuDNN

The details of the issue are in: [Performance] Data size of Batch Normalization using cuDNN in inference. · Issue #17406 · microsoft/onnxruntime · GitHub
It looks like cuDNN limits the maximum data size that Batch Normalization can handle. Is there a solution or workaround for this?

Hi @neuqrm ,
Apologies for the delayed response.
In the v7 API, the n dimension currently supported must be <= 65535.
Could you please share more details about your issue? Also, which version are you using?

Sorry for the late reply. The details of the issue are described in the GitHub issue from ONNX:

I am using CUDA 11.x with cuDNN 8. The problem occurs when I use the batch normalization provided by ONNX on data of shape (N*C), where C is the number of channels, in the range 16~128, and N is the number of samples in a batch, typically about 100,000 or more. This seems strange to me, because the channel count is not out of range, so it should be able to handle data of this shape.
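
One possible workaround, sketched here in plain Python rather than with cuDNN or ONNX calls: in inference mode, batch normalization uses fixed mean and variance, so every row is normalized independently and the batch can be split along N into chunks of at most 65535 rows (the limit quoted in the reply above). This assumes (N*C) means N rows of C channels. The names `chunk_bounds`, `batch_norm_inference`, and `batch_norm_inference_chunked` are hypothetical helpers for illustration, not part of any library, and this is not confirmed as what cuDNN or ONNX Runtime do internally.

```python
import math

# Limit on the n dimension quoted in the reply above (v7 API).
MAX_N = 65535


def chunk_bounds(n, max_n=MAX_N):
    """Yield (start, stop) row ranges that each cover at most max_n rows."""
    for start in range(0, n, max_n):
        yield start, min(start + max_n, n)


def batch_norm_inference(rows, scale, bias, mean, var, eps=1e-5):
    """Reference inference-mode batch norm on a list of rows of length C.

    Stand-in for the real kernel call (assumption: the real call takes the
    same per-channel scale, bias, mean and variance).
    """
    inv_std = [1.0 / math.sqrt(v + eps) for v in var]
    return [
        [scale[c] * (x[c] - mean[c]) * inv_std[c] + bias[c] for c in range(len(x))]
        for x in rows
    ]


def batch_norm_inference_chunked(rows, scale, bias, mean, var, eps=1e-5, max_n=MAX_N):
    """Apply inference-mode batch norm chunk by chunk along N.

    Because the statistics are fixed, each chunk gives the same values it
    would get as part of the full batch.
    """
    out = []
    for start, stop in chunk_bounds(len(rows), max_n):
        out.extend(batch_norm_inference(rows[start:stop], scale, bias, mean, var, eps))
    return out
```

With N = 100,000 this gives two chunks, rows 0 to 65534 and rows 65535 to 99999. The same splitting would not be valid for training-mode batch normalization, where the statistics are computed over the whole batch.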