I am running my code on two machines:
- GeForce 8800 GT: compute capability 1.1
- Quadro FX 5800: compute capability 1.3
I use a single atomicInc() to assign a value to a variable; the atomic increment operates on an unsigned int variable in device memory.
I am seeing a serious timing overhead when doing this on the Quadro machine.
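The original kernel is not shown, so here is a minimal sketch of the pattern as described, assuming every thread uses atomicInc() to take a unique value from one unsigned int counter in global memory. The names claimSlots and slotsAreUnique, the block size of 256, and the wrap limit 0xFFFFFFFF are hypothetical. atomicInc() on global memory requires compute capability 1.1 or higher, which both cards meet.

```cuda
#include <cassert>
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: every thread claims one value from a single global counter.
__global__ void claimSlots(unsigned int* counter, unsigned int* slotOfThread, unsigned int n)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) {
        // atomicInc returns the old value and stores (old >= limit) ? 0 : old + 1.
        // With limit = 0xFFFFFFFF it acts as a plain increment for any realistic n.
        slotOfThread[tid] = atomicInc(counter, 0xFFFFFFFFu);
    }
}

// Hypothetical host helper: launches claimSlots and returns true only if the
// values 0..n-1 were each handed out exactly once and the counter ends at n.
bool slotsAreUnique(unsigned int n)
{
    unsigned int* dCounter = nullptr;
    unsigned int* dSlots = nullptr;
    if (cudaMalloc(&dCounter, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&dSlots, n * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(dCounter);
        return false;
    }
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;
    claimSlots<<<blocks, threads>>>(dCounter, dSlots, n);

    // cudaMemcpy on the default stream waits for the kernel to finish.
    std::vector<unsigned int> slots(n);
    unsigned int finalCount = 0;
    bool ok = cudaGetLastError() == cudaSuccess
        && cudaMemcpy(slots.data(), dSlots, n * sizeof(unsigned int),
                      cudaMemcpyDeviceToHost) == cudaSuccess
        && cudaMemcpy(&finalCount, dCounter, sizeof(unsigned int),
                      cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dCounter);
    cudaFree(dSlots);
    if (!ok) {
        std::fprintf(stderr, "CUDA error while claiming slots\n");
        return false;
    }

    // Every thread must have received a distinct value in [0, n).
    std::vector<bool> seen(n, false);
    for (unsigned int s : slots) {
        if (s >= n || seen[s]) return false;
        seen[s] = true;
    }
    return finalCount == n;
}
```

Because all threads update the same address, the hardware has to serialize those updates; the sketch only checks correctness, not speed.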
My kernel with atomicInc() takes:
- Quadro: 5.8 ms
- 8800 GT: 4.7 ms
Both timings are for exactly the same code.
My kernel without atomicInc() (i.e., with the result of atomicInc() replaced by zero) takes:
- Quadro: 1.9 ms
- 8800 GT: 3.2 ms
So the code without the atomic function scales well from the 8800 GT to the Quadro, but performs very badly on the Quadro when using atomicInc(). Is there any reason for this behavior?
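For anyone trying to reproduce the comparison, here is a hedged sketch of the with/without measurement using CUDA events. withAtomic, withoutAtomic, and timeKernel are hypothetical stand-ins for the real kernel, which is not shown; they isolate only the cost of the atomic, not the rest of the original workload.

```cuda
#include <cassert>
#include <cuda_runtime.h>

// Variant A (hypothetical): each thread takes its value from one global counter.
__global__ void withAtomic(unsigned int* counter, unsigned int* out, unsigned int n)
{
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = atomicInc(counter, 0xFFFFFFFFu);
}

// Variant B (hypothetical): identical, but the atomicInc result is replaced by zero.
__global__ void withoutAtomic(unsigned int* counter, unsigned int* out, unsigned int n)
{
    (void)counter;  // kept so both kernels have the same signature
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < n) out[tid] = 0u;
}

// Launches the chosen variant on n threads; returns GPU time in ms, or -1 on error.
float timeKernel(bool useAtomic, unsigned int n)
{
    unsigned int* dCounter = nullptr;
    unsigned int* dOut = nullptr;
    if (cudaMalloc(&dCounter, sizeof(unsigned int)) != cudaSuccess) return -1.0f;
    if (cudaMalloc(&dOut, n * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(dCounter);
        return -1.0f;
    }
    const unsigned int threads = 256;
    const unsigned int blocks = (n + threads - 1) / threads;

    // Warm-up launch so one-time startup cost is not included in the measurement.
    if (useAtomic) withAtomic<<<blocks, threads>>>(dCounter, dOut, n);
    else           withoutAtomic<<<blocks, threads>>>(dCounter, dOut, n);
    cudaMemset(dCounter, 0, sizeof(unsigned int));

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    // Timed launch, bracketed by events on the default stream.
    cudaEventRecord(start);
    if (useAtomic) withAtomic<<<blocks, threads>>>(dCounter, dOut, n);
    else           withoutAtomic<<<blocks, threads>>>(dCounter, dOut, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = -1.0f;
    if (cudaGetLastError() != cudaSuccess
        || cudaEventElapsedTime(&ms, start, stop) != cudaSuccess) {
        ms = -1.0f;
    }

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(dCounter);
    cudaFree(dOut);
    return ms;
}
```

Comparing timeKernel(true, n) with timeKernel(false, n) on each card separates the atomic's cost from everything else. In this synthetic case every thread hits the same address, which is the most contended pattern for a global atomic.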
Thanks, everyone, for your help!