I am new to CUDA. I wanted to test some code and run it on a GPU, so I took a sample program that I found online. The program reports its execution time on the GPU. I then tried different combinations of the number of blocks and the number of threads for the kernel, under the condition that their product is at most 512 (number of blocks * number of threads <= 512).
I ran the program and am attaching the output.
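To make the experiment concrete, here is a minimal sketch of such a sweep. It is not the attached code: the kernel `scaleKernel`, the helpers `isValidConfig` and `timeConfig`, the array size, the power-of-two step and the printed column layout are all assumptions chosen for illustration. It times each launch with CUDA events and does one warm-up launch first, so that one-time start-up cost is not charged to the first measurement.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (an assumption, not the attached code). It uses a
// grid-stride loop, so any blocks x threads combination covers the array.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        data[i] *= factor;
}

// Host-side check for the constraint used in the experiment.
bool isValidConfig(int blocks, int threads, int limit)
{
    return blocks > 0 && threads > 0 && blocks * threads <= limit;
}

// Times one launch configuration with CUDA events.
// Returns milliseconds, or a negative value if a CUDA call fails.
float timeConfig(float *dData, int n, int blocks, int threads)
{
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    cudaEventRecord(start);
    scaleKernel<<<blocks, threads>>>(dData, n, 1.0001f);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);  // wait until the kernel has finished

    float ms = -1.0f;
    if (cudaGetLastError() == cudaSuccess)
        cudaEventElapsedTime(&ms, start, stop);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}

int main()
{
    const int n = 1 << 20;   // illustrative problem size
    const int limit = 512;   // blocks * threads must not exceed this
    float *dData = nullptr;
    if (cudaMalloc(&dData, n * sizeof(float)) != cudaSuccess) {
        std::printf("no usable GPU or allocation failed\n");
        return 1;
    }
    cudaMemset(dData, 0, n * sizeof(float));

    // Warm-up launch; its time is discarded.
    timeConfig(dData, n, 1, 1);

    // Illustrative output layout: blocks, threads, time in ms.
    for (int blocks = 1; blocks <= limit; blocks *= 2)
        for (int threads = 1; isValidConfig(blocks, threads, limit); threads *= 2)
            std::printf("%d %d %f\n", blocks, threads,
                        timeConfig(dData, n, blocks, threads));

    cudaFree(dData);
    return 0;
}
```

Timing with events measures the kernel on the GPU itself, so it does not depend on when the host happens to read a CPU clock. Running each configuration several times and comparing the repeats is one way to tell whether a peak is repeatable or just noise.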
( the output format is )
Then I plotted the output file in MATLAB. The graph was not what I expected...
There were peaks (major fluctuations) at many places in the graph, along with minor fluctuations (which I guess can be ignored).
My goal is to find the optimum combination of threads and blocks for the program (I want to do the same for my project some time later).
By the way, I am connected to a remote system that has the GPU in it.
I am attaching the code and the output file.
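As a possible starting point for that search, the CUDA runtime can suggest a block size for a given kernel through `cudaOccupancyMaxPotentialBlockSize`. The sketch below assumes a placeholder kernel `scaleKernel` and a hypothetical helper `blocksFor`; neither is from the attached code. The suggestion maximizes occupancy only, which is not guaranteed to give the shortest run time, so it would still need to be checked against measurements.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel (an assumption): one element per thread, with a
// bounds check for the last, partially filled block.
__global__ void scaleKernel(float *data, int n, float factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= factor;
}

// Hypothetical helper: number of blocks needed so that
// blocks * threads >= n (rounds up).
int blocksFor(int n, int threads)
{
    return (n + threads - 1) / threads;
}

int main()
{
    int minGridSize = 0;  // smallest grid that can reach full occupancy
    int blockSize = 0;    // suggested threads per block

    // No dynamic shared memory (0) and no upper limit on block size (0).
    cudaError_t err = cudaOccupancyMaxPotentialBlockSize(
        &minGridSize, &blockSize, scaleKernel, 0, 0);
    if (err != cudaSuccess) {
        std::printf("occupancy query failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    const int n = 1 << 20;  // illustrative problem size
    std::printf("suggested threads per block: %d\n", blockSize);
    std::printf("minimum grid size for full occupancy: %d\n", minGridSize);
    std::printf("blocks needed to cover n = %d: %d\n", n, blocksFor(n, blockSize));
    return 0;
}
```

The suggested values depend on the kernel and on the GPU in the remote machine, so they may differ from system to system.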