Hi friends and experts,
We have a Tesla S1070 system with 4 T10 (C1060) GPUs.
We work on one GPU of the system.
The following code is used in our program:
    N = (read from input file);
    nblocks = (N + 319) / 320;
    for (i = 0; i <= 100; i++)
        kernel<<<nblocks, 320>>>(…);
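As a rough sketch of the launch pattern above (not our actual kernel): the kernel signature and body, the names `blocksFor` and `runPartition`, the bounds guard, and the error check are all assumptions added for illustration. It mainly shows how the block count follows from N: N = 9600 gives exactly 30 blocks of 320 threads, and any larger N gives 31 or more.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Threads per block, as in the launch above.
constexpr int kThreads = 320;

// Number of blocks needed to cover n elements (ceiling division).
// Hypothetical helper, equivalent to (N + 319) / 320.
__host__ __device__ inline int blocksFor(int n) {
    return (n + kThreads - 1) / kThreads;
}

// Hypothetical kernel: the real arguments and body are not shown in the post.
__global__ void kernel(const float *in, float *out, int n) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n) {  // the last block may be only partly filled
        out[idx] = 2.0f * in[idx];
    }
}

// Launch the kernel on one partition and report any error.
// Returns true on success. Assumes n >= 1.
bool runPartition(const float *d_in, float *d_out, int n) {
    kernel<<<blocksFor(n), kThreads>>>(d_in, d_out, n);
    cudaError_t err = cudaGetLastError();   // errors from the launch itself
    if (err == cudaSuccess) {
        err = cudaDeviceSynchronize();      // errors raised during execution
    }
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Calling `runPartition(d_in, d_out, N)` once per partition would stand in for the bare launch inside the loop; `d_in` and `d_out` are assumed to be device buffers holding at least N floats.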
We have 100 partitions of data and work on each partition in turn, with each partition having size <= 9600.
When we re-create the 100 partitions so that each partition is larger than 9600 (the partition size is what is passed as N), the output values of the kernel change.
What could be the problem?
Another question: on Compute Capability 1.3, where concurrent kernel execution is not supported, is there any way to implement it?