I am adding CUDA support to an existing Java application of mine that performs matrix multiplication/reduction on large arrays of numbers. The Java code calls a C++ DLL I wrote, which performs many iterations of multiplication/reduction before returning.
The code works correctly, but while evaluating the program's performance I noticed that after a number of iterations, huge delays start to appear.
Here is some timing data I collected. The columns are (1) iteration number, (2) time of that iteration and (3) total time so far, all in nanoseconds:
1 72,355 72,355
2 86,534 158,889
3 69,911 228,800
4 69,422 298,222
5 69,422 367,644
6 154,489 522,133
7 70,889 593,022
8 69,422 662,444
9 69,912 732,356
10 69,911 802,267
11 70,400 872,667
12 98,266 970,933
13 70,889 1,041,822
14 75,778 1,117,600
15 69,911 1,187,511
16 69,911 1,257,422
After a while, though:
760 187,244 71,368,009
761 73,333 71,441,342
762 70,400 71,511,742
763 64,045 71,575,787
764 65,022 71,640,809
765 68,444 71,709,253
766 34,750,227 106,459,480
767 70,400 106,529,880
768 64,533 106,594,413
769 64,534 106,658,947
770 64,044 106,722,991
771 64,045 106,787,036
772 123,533,926 230,320,962
773 73,334 230,394,296
774 77,733 230,472,029
775 69,911 230,541,940
776 67,467 230,609,407
777 69,422 230,678,829
778 69,911 230,748,740
779 107,514,503 338,263,243
780 69,422 338,332,665
781 65,511 338,398,176
782 67,467 338,465,643
783 73,822 338,539,465
784 68,933 338,608,398
785 122,981,483 461,589,881
786 69,422 461,659,303
787 69,911 461,729,214
788 69,422 461,798,636
789 84,089 461,882,725
790 65,511 461,948,236
791 107,980,414 569,928,650
So after a while, every 5 or 6 normal iterations are followed by one that takes roughly 100 ms (35 to 124 ms in the data above) instead of the usual ~70 µs. Originally I thought it was a Java problem, but nothing changed when I moved more of the code into the C++ DLL, so that Java no longer calls for each iteration but makes a single call that has the card run a whole series of iterations (~1000). Given the distinctive signature, I am hoping someone will recognize the problem.
A related question: how do I make sure that Windows is not trying to use the card (GTX 275) for display? Could screen refresh requests be causing these delays? Since I have another video adapter, I want the 275 reserved for HPC use only.
Thanks,
Tim