I am running an OpenCL benchmark program through JavaCL, a Java wrapper library. It reports the same performance on a GeForce Titan X as on a Quadro K2000M. I do not know whether the Titan X needs any tuning parameters. I ran both with block sizes of 1024, 512, etc., but saw little improvement on the GeForce Titan X.
The link to the code: https://github.com/klonikar/nativelibs4java/blob/master/libraries/OpenCL/Core/src/test/java/com/nativelibs4java/opencl/OpenCL4JavaBasicTest.java
java -Xmx1G -Xms1G -cp …\scalacl\javacl-1.0.0-RC3-shaded.jar;bin trial.javacl.OpenCL4JavaBasicTest 10000000
java -Xmx1G -Xms1G -cp …\scalacl\javacl-1.0.0-RC3-shaded.jar;bin trial.javacl.OpenCL4JavaBasicTest 10000000 double
Without going into details, the program computes an expression for every element of an array, i.e. a basic map operation. Computing the expression on 10 million numbers (float or double) takes about 290 ms. The breakdown between compute time and data transfer time differs slightly, but the total time is almost the same.
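To make the shape of the workload concrete, here is a minimal plain-Java sketch of this kind of map operation, which could also serve as a CPU baseline for the GPU timings. The class name `MapBaseline`, the methods `expr` and `map`, and the expression itself are hypothetical stand-ins, not the code from the linked test.

```java
public class MapBaseline {
    // Hypothetical per-element expression; the real benchmark computes its own.
    static float expr(float x) {
        return x * 2.0f + 1.0f;
    }

    // Basic map: apply expr to every element of the input array.
    static float[] map(float[] in) {
        float[] out = new float[in.length];
        for (int i = 0; i < in.length; i++) {
            out[i] = expr(in[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Element count defaults to 10 million, matching the runs above.
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 10_000_000;
        float[] in = new float[n];
        for (int i = 0; i < n; i++) {
            in[i] = i;
        }

        // Time only the map itself, excluding array setup.
        long start = System.nanoTime();
        float[] out = map(in);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println("CPU map over " + n + " elements: " + elapsedMs + " ms");
        System.out.println("Sample result out[1] = " + out[1]);
    }
}
```

A single untimed warm-up call to `map` before the timed one would probably give a fairer number, since the first call includes JIT compilation.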
Am I misinterpreting the results? Would CUDA show a significant performance difference?