By “creating a context”, do you mean the time taken to compile the context rather than the time taken to create the context object? I saw something similar: compilation takes 1-2 seconds on my Quadro K4000 but 5-10 seconds on a machine with two Tesla K40 accelerators.
I did some profiling and found that on the machine with two Teslas, OptiX runs the compilation twice, once for each card, even though the cards are identical. Also, if your remote machine has to read your PTX files over a network (as mine does), that could be slowing things down as well.
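
If you want to check which of these applies on your machine, one option is to time each step separately. Below is a minimal C++ sketch, not what I actually used for profiling: `time_ms` and `read_file` are helper names made up for illustration, the path in the usage comment is a placeholder, and the OptiX calls are left as comments so the snippet builds without the SDK.

```cpp
#include <chrono>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in milliseconds.
template <typename Work>
double time_ms(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Hypothetical helper: reads a whole file (for example a PTX file) into
// memory, so the cost of a network filesystem can be measured on its own.
std::string read_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Example use (placeholder path; substitute your own OptiX calls):
//   std::string ptx;
//   double read_time    = time_ms([&] { ptx = read_file("/path/to/kernels.ptx"); });
//   double create_time  = time_ms([&] { /* create the context object */ });
//   double compile_time = time_ms([&] { /* compile the context, or do the first launch */ });
```

If `read_time` is a large share of the total, the network read is the likely culprit. If `compile_time` dominates and roughly scales with the number of cards, that would be consistent with the per-card compile I saw.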