Performance of virtual functions in CUDA 4.0

What performance penalty can one expect when using the
new virtual function support in CUDA 4.0?

Should virtual functions be avoided in high-performance kernels?