Executing a kernel in a loop on Compute Capability 1.3

Hi friends and experts,

We have a Tesla S1070 system with 4 T10 (C1060) GPUs.

We work on one GPU of the system.

The following code is used in our program:

N = (read from input file);

nblocks = (N + 319) / 320;

for (i = 0; i <= 100; i++)
    kernel<<<nblocks, 320>>>(…);

We have 100 partitions of data, and we work on each partition in turn. Each partition has a size <= 9600.

When we recreate the 100 partitions so that each partition size (which is passed as N) is greater than 9600, the output values of the kernel change.

What could be the problem?
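One thing that may be worth checking, offered as a guess rather than a diagnosis: with 320 threads per block, N = 9600 gives exactly 30 blocks, and any larger N gives more. Whenever N is not a multiple of 320, the last block contains threads whose index is >= N, so the kernel needs a bounds check. The sketch below uses a hypothetical helper `blocks_for` and a hypothetical kernel `scale_kernel`; neither name comes from the original program, and the kernel body is only a placeholder.

```cuda
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements.
// This is the same rounding-up division as (N + 319) / 320 in the post.
static int blocks_for(int n, int threads_per_block)
{
    return (n + threads_per_block - 1) / threads_per_block;
}

// Hypothetical kernel: the real kernel body is not shown in the post.
__global__ void scale_kernel(float *data, int n)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < n)              // threads past the end of the partition do nothing
        data[idx] *= 2.0f;    // placeholder for the real per-element work
}
```

For example, `blocks_for(9600, 320)` is 30 and `blocks_for(9601, 320)` is 31, so for N = 9601 the final block has 319 threads with no element to process. Without the `idx < n` guard those threads would read and write past the end of the partition.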

Another question: concurrent kernel execution is not supported on Compute Capability 1.3, so how can we achieve something equivalent?
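Concurrent kernel execution only arrived with compute capability 2.0, so it cannot be switched on for a 1.3 device. One possible workaround, sketched here under the assumption that the partitions are independent and stored back to back with a fixed stride, is to handle several partitions in a single launch and let `blockIdx.y` pick the partition. The names `batched_kernel`, `batched_grid`, `sizes`, and `stride` are all hypothetical, and the kernel body is a placeholder.

```cuda
#include <cuda_runtime.h>

// Hypothetical batched kernel: blockIdx.y selects the partition,
// blockIdx.x and threadIdx.x select the element inside that partition.
// Assumes partition p starts at data[p * stride] and holds sizes[p] elements.
__global__ void batched_kernel(float *data, const int *sizes, int stride)
{
    int part = blockIdx.y;
    int idx  = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < sizes[part])                       // guard against padding threads
        data[part * stride + idx] += 1.0f;       // placeholder for the real work
}

// Hypothetical host helper: a 2D grid that covers `parts` partitions,
// each with at most `max_n` elements.
static dim3 batched_grid(int parts, int max_n, int threads_per_block)
{
    return dim3((max_n + threads_per_block - 1) / threads_per_block, parts);
}
```

A launch would then look like `batched_kernel<<<batched_grid(100, max_n, 320), 320>>>(d_data, d_sizes, stride);`, where `d_data`, `d_sizes`, `max_n`, and `stride` are assumed to be set up by the caller. This replaces the loop of 100 launches with one, which only makes sense if no partition depends on the result of another.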

I’d like to ask that question myself. Are there any problems with your code? Do you get error messages? Are the results different from what you expect?

Without any mention of a problem it is difficult to help you…
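To find out whether the launches report an error at all, each launch in the loop can be followed by a check like the one below. This is a minimal sketch: `check` is a hypothetical helper, not part of the CUDA runtime, while `cudaGetLastError`, `cudaDeviceSynchronize`, and `cudaGetErrorString` are runtime API calls.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print a message and return false if err is not cudaSuccess.
static bool check(cudaError_t err, const char *label)
{
    if (err != cudaSuccess) {
        fprintf(stderr, "%s: %s\n", label, cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

Usage, right after each `kernel<<<nblocks, 320>>>(…);` in the loop: call `check(cudaGetLastError(), "launch")` to catch configuration errors, then `check(cudaDeviceSynchronize(), "execution")` to catch errors raised while the kernel was running. Toolkits older than CUDA 4.0 use `cudaThreadSynchronize()` in place of `cudaDeviceSynchronize()`.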