GeForce Titan X (3072 cores) and Quadro K2000M (384 cores) show the same performance

I am running an OpenCL benchmark program through JavaCL, a Java wrapper library for OpenCL. It reports the same performance on the GeForce Titan X and on the Quadro K2000M. I do not know whether there are any tuning parameters for the Titan X. I ran both with block sizes of 1024, 512, etc., but do not see much improvement on the GeForce Titan X.
The link to the code: https://github.com/klonikar/nativelibs4java/blob/master/libraries/OpenCL/Core/src/test/java/com/nativelibs4java/opencl/OpenCL4JavaBasicTest.java

java -Xmx1G -Xms1G -cp …\scalacl\javacl-1.0.0-RC3-shaded.jar;bin trial.javacl.OpenCL4JavaBasicTest 10000000
java -Xmx1G -Xms1G -cp …\scalacl\javacl-1.0.0-RC3-shaded.jar;bin trial.javacl.OpenCL4JavaBasicTest 10000000 double
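
To make the block sizes mentioned above concrete, here is a minimal sketch of the launch arithmetic for 10,000,000 elements. It assumes the usual OpenCL 1.x convention that the global work size is rounded up to a multiple of the local (block) size; whether the linked benchmark pads this way is an assumption, and the class and method names are hypothetical.

```java
public class WorkGroupSizing {
    // Round n up to the next multiple of blockSize. OpenCL 1.x requires the
    // global work size to be a multiple of the local work size.
    static long roundUp(long n, long blockSize) {
        return ((n + blockSize - 1) / blockSize) * blockSize;
    }

    public static void main(String[] args) {
        long n = 10_000_000L; // element count passed on the command lines above
        for (long blockSize : new long[] {1024, 512}) {
            long globalSize = roundUp(n, blockSize);
            long workGroups = globalSize / blockSize;
            System.out.println("blockSize=" + blockSize
                    + " globalSize=" + globalSize
                    + " workGroups=" + workGroups);
        }
        // prints:
        // blockSize=1024 globalSize=10000384 workGroups=9766
        // blockSize=512 globalSize=10000384 workGroups=19532
    }
}
```

With padding like this, the kernel needs a bounds check so that the extra 384 work-items do not write past the end of the array.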

Without going into details, the program computes an expression for every element of an array, which is a basic map operation. I get almost 290 ms for computing the expression on 10 million numbers (float or double). The breakdown between compute time and data transfer time differs slightly, but the total time is almost the same.
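
For reference, a map operation of this kind can be sketched on the CPU in plain Java, with the setup and compute phases timed separately. This is only an illustration: the expression a[i] * sin(b[i]) + 1, the input values, and all names are assumptions and not necessarily what the linked benchmark uses, and no GPU or JavaCL call is involved.

```java
public class MapTiming {
    // Hypothetical per-element expression; the linked benchmark may use another one.
    static float expr(float a, float b) {
        return a * (float) Math.sin(b) + 1.0f;
    }

    // The map operation: out[i] depends only on a[i] and b[i].
    static float[] map(float[] a, float[] b) {
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = expr(a[i], b[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;

        long t0 = System.nanoTime();
        float[] a = new float[n];
        float[] b = new float[n];
        for (int i = 0; i < n; i++) { // fill inputs with arbitrary values
            a[i] = i;
            b[i] = i * 0.001f;
        }
        long t1 = System.nanoTime();
        float[] out = map(a, b);      // the part a GPU kernel would replace
        long t2 = System.nanoTime();

        System.out.printf("fill: %.1f ms, compute: %.1f ms, total: %.1f ms%n",
                (t1 - t0) / 1e6, (t2 - t1) / 1e6, (t2 - t0) / 1e6);
        System.out.println("out[0] = " + out[0]);
    }
}
```

Timing each phase separately, as done here, gives a single-threaded CPU baseline to set against the GPU's compute and transfer figures.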

Am I misinterpreting the results? Would CUDA show a significant performance difference?

You are posting in the wrong sub forum.

The GTX Titan X is probably about 4-8 times faster than that Quadro for compute. This mainly applies to 32-bit performance, and it appears that you are using 64-bit computation, so try with 32-bit computation. Because you are utilizing GPU resources in an atypical way (OpenCL via Java), that probably has something to do with the issue as well.

CUDA is a far more developed language than OpenCL and has a more robust software ecosystem. I suggest using CUDA C with C++ rather than Java with OpenCL.

If you want an explicit comparison, run the CUDA-Z utility on your PC, which will show you the performance difference between the two GPUs.

http://cuda-z.sourceforge.net/

Thanks for replying. Can you tell me which is the correct forum/sub forum for this kind of question?

I tried with 32 bit (float) as well, but the difference wasn’t much, certainly not 4-8 times.

I would like to point out that running through Java did not make a difference. I tried OpenCL via C as well, and the result was similar. The reason is that Java ultimately calls the OpenCL C APIs and the computation happens on the GPU, while the Java code just waits for the computation to finish.

However, I agree that CUDA may give better results compared to OpenCL. I will certainly try CUDA and see if it makes any difference. The CUDA-Z tool showed pretty large differences between the Titan X and the Quadro. Thanks for the link. Let me verify with direct CUDA code too.

Is it?? Have a look at JCuda, which is a pretty good CUDA wrapper for Java:

http://www.jcuda.org/