Do Vision Transformers See Like Convolutional Neural Networks?

Move over, CNNs: there is a new player in town that will determine if it's Hotdog or Not Hotdog :)
