1D convolution when choosing FFT or brute force convolution?

I have to perform 1D convolution of a 4k samples vector with a longer (infinite) vector.
In general a convolution may be performed by performing FFT or performing the usual scalar product. By experience (and by theoretical estimates) whenever the smaller vector is longer then 10-100 samples FFT strategy is advantageous. This empirical threshold level changes according to the architectures and to the implementations: what’s its value on a TESLA?
I already performed some preliminary tests and it seems FFT strategy to be best performing in this case too, but to be sincere, my brute-force-convolution (BFC) code is absolutely not optimized, and maybe through a good optimization it may happen to the ratio FFT time / BFC time to dramatically decrease (now FFT is ~80 times faster).
Did anybody has already some experience in that?

francesco