The “size” here is what we refer to as the variable's dimension in the physical sense. For example, most physical quantities are scalars, so they have a dimension of 1 at a given point. In the Darcy example, the input has size=1 since it's the permeability field.
In the Pythonic or DL sense, size here is the channel size of the input tensor. FNO operates on a Euclidean grid, like an image, so I find it makes the most sense to think of the FNO input as an image. It's just that Modulus focuses on connecting DL with physical quantities, so it's named slightly differently in the docs.
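
To make the "size = channels" idea concrete, here is a minimal NumPy sketch. It is not Modulus code: the helper `stack_channels`, the variable names, and the grid dimensions are all hypothetical, and it assumes a channels-first layout (batch, channels, height, width) like the usual image convention.

```python
import numpy as np

def stack_channels(variable_sizes, batch, height, width):
    """Build one image-like input array from named physical variables.

    Hypothetical helper: each variable contributes `size` channels,
    and all variables are concatenated along the channel axis (axis=1).
    """
    fields = [
        np.zeros((batch, size, height, width), dtype=np.float32)
        for size in variable_sizes.values()
    ]
    return np.concatenate(fields, axis=1)

# Darcy-style input: a single scalar field (permeability), so size=1.
darcy_input = stack_channels({"permeability": 1}, batch=4, height=32, width=32)
print(darcy_input.shape)  # (4, 1, 32, 32)

# Assumed extra case: adding a 2-component vector field gives 1 + 2 = 3 channels.
mixed_input = stack_channels(
    {"permeability": 1, "velocity": 2}, batch=4, height=32, width=32
)
print(mixed_input.shape)  # (4, 3, 32, 32)
```

So a scalar field behaves like a one-channel (grayscale) image, and each extra component of a physical variable adds one more channel.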