Titan X (with latest drivers) slower than Titan Black with older drivers

The canonical way to trigger CUDA context initialization used to be a call to cudaFree(0). I don’t think that has changed. Any performance measurements of CUDA APIs should not include context initialization time. I would have thought that this is common knowledge eight years into CUDA’s public existence, but maybe not.
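For anyone who hasn’t seen the pattern, here is a minimal sketch (nothing to do with the code being discussed here): cudaFree(0) goes before any timers start, and CUDA events bracket only the work being measured.

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        // Force lazy context creation up front, before any timers start,
        // so the measurement below covers only the work of interest.
        cudaFree(0);

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        // ... the API calls / kernel launches being benchmarked go here ...
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        printf("elapsed: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
        return 0;
    }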

NVIDIA may want to consider adding a sticky post to these forums pointing this out.

Have you tried using the TCC driver with the Titan X? If so, did it help your bottleneck?
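If you want to verify which mode the card is currently in, here is a quick check through the standard runtime API (just a sketch, not code from this thread; switching to TCC is done through nvidia-smi, if I remember right, and needs a reboot):

    #include <cuda_runtime.h>
    #include <cstdio>

    int main()
    {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);   // device 0; adjust as needed
        printf("%s: %s driver model\n", prop.name,
               prop.tccDriver ? "TCC" : "WDDM");
        return 0;
    }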

I didn’t even know about TCC on Titan X until a couple of days ago.

I am currently configured to use v344 drivers with my old Titan Black since it is faster, but I will try TCC with the Titan X soon.

Just installed the latest and greatest v358.50 driver.

Of the three mex functions, two are within 10% of the v344 timings on the Titan Black (one of them is 10% slower).

The 3rd, which is the most I/O-intensive (largest number of inputs), is still 30% slower (7.1 ms vs. 5.5 ms) on the Titan Black with the v358 driver. This timing is taken AFTER the data has been transferred to the GPU; the ONLY difference is the driver.
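For reference, the kernel-only timing can be set up like the sketch below, built on the standard gpuArray MEX API with a hypothetical single-precision kernel standing in for the real ones: the input is already a gpuArray, so no host-to-device copy happens inside the mex function, and the events bracket only the launch.

    #include "mex.h"
    #include "gpu/mxGPUArray.h"
    #include <cuda_runtime.h>

    // Hypothetical stand-in for the real kernel (assumes single-precision input).
    __global__ void scaleKernel(float const *in, float *out, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = 2.0f * in[i];
    }

    void mexFunction(int nlhs, mxArray *plhs[], int nrhs, mxArray const *prhs[])
    {
        mxInitGPU();

        // Input is already a gpuArray; no host-to-device copy happens here.
        mxGPUArray const *A = mxGPUCreateFromMxArray(prhs[0]);
        float const *d_in = (float const *)mxGPUGetDataReadOnly(A);

        mxGPUArray *B = mxGPUCreateGPUArray(mxGPUGetNumberOfDimensions(A),
                                            mxGPUGetDimensions(A),
                                            mxGPUGetClassID(A),
                                            mxGPUGetComplexity(A),
                                            MX_GPU_DO_NOT_INITIALIZE);
        float *d_out = (float *)mxGPUGetData(B);
        int n = (int)mxGPUGetNumberOfElements(A);

        // Time only the kernel, not the gpuArray plumbing around it.
        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start, 0);
        scaleKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
        cudaEventRecord(stop, 0);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        mexPrintf("kernel time: %.3f ms\n", ms);

        cudaEventDestroy(start);
        cudaEventDestroy(stop);

        plhs[0] = mxGPUCreateMxArrayOnGPU(B);
        mxGPUDestroyGPUArray(A);
        mxGPUDestroyGPUArray(B);
    }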

Now I will re-install the Titan X and see if that offers any improvement.

With the Titan X on v358, one function is twice as fast as on the Titan Black with v344, and another is 20% faster.

And the problem child is still 10% slower (vs. 30% slower on Titan Black v358).

So while the driver is getting better, it is still slower for functions that pass a large number of inputs to the kernel.
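For concreteness, a hypothetical example of what "a large number of inputs to the kernel" looks like (simplified, not the actual kernel); each additional argument adds to the parameter block the driver marshals at every launch:

    __global__ void manyInputKernel(const float *a, const float *b, const float *c,
                                    const float *d, const float *e, const float *f,
                                    const float *g, const float *h,
                                    float alpha, float beta, int n, float *out)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = alpha * (a[i] + b[i] + c[i] + d[i])
                   + beta  * (e[i] + f[i] + g[i] + h[i]);
    }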

And our entire algorithm, running in MATLAB with a mix of native gpuArray functions and mex CUDA functions, takes 2.5x longer on the Titan X with v358 than it did on the Titan Black with v344.

So it seems GeForce driver development still has a long way to go for Titan X.