Atomic functions: better programming on a cheaper GPU?

ErenS · July 17, 2007, 11:02am

I have an NVIDIA GeForce 8800 GTX, but when I tried to use atomic functions I found that only the 8600 GPUs support them. The more expensive GPU cannot use those functions, so I cannot use global variables the way I want to.

The 8800 GTX's architecture is compute capability 1.0, but architecture 1.1 has to be passed to the compiler (-arch sm_11) to use atomic functions.
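For illustration, here is a minimal sketch of the kind of kernel this refers to: every thread, from every block, adds into one global counter. The names `atomicSum` and `sumWithAtomics` are hypothetical and do not come from the post. Integer `atomicAdd` on global memory needs compute capability 1.1 or higher, which is why the post's `-arch sm_11` flag is required on toolkits of that era and why the kernel cannot run on 1.0 hardware.

```cuda
#include <cuda_runtime.h>

// Each thread adds one element of `data` into the single global counter
// `*total`. The hardware serializes colliding updates, so none are lost.
__global__ void atomicSum(const int *data, int n, int *total)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        atomicAdd(total, data[i]);
}

// Hypothetical host helper: copies the input to the device, launches the
// kernel and returns the total. Error checking is omitted for brevity.
int sumWithAtomics(const int *hostData, int n)
{
    if (n <= 0)
        return 0;

    int *dData = 0;
    int *dTotal = 0;
    cudaMalloc((void **)&dData, n * sizeof(int));
    cudaMalloc((void **)&dTotal, sizeof(int));
    cudaMemcpy(dData, hostData, n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemset(dTotal, 0, sizeof(int));

    const int threads = 256;                        // assumed block size
    const int blocks = (n + threads - 1) / threads;
    atomicSum<<<blocks, threads>>>(dData, n, dTotal);

    int total = 0;
    cudaMemcpy(&total, dTotal, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dTotal);
    return total;
}
```

Calling `sumWithAtomics` on an array holding 1, 2, 3, 4, 5 should give 15 on hardware that supports the instruction.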

Will future releases support atomic functions on the 1.0 architecture, or should I downgrade my GPU to an 8600?

pkeir · July 17, 2007, 1:41pm

Unfortunately, I would suggest you downgrade, as I did.

Simon_Green · July 17, 2007, 4:46pm

G80-based cards (GeForce 8800 / Quadro FX 5600) only have hardware for compute capability 1.0 (sm_10). There is no way atomic operations can be enabled by a software upgrade.

It is quite common for NVIDIA to introduce new features on low-end cards first since these are often later in the product cycle. Future products will support compute 1.1 or higher.

If you can tell us what you’re trying to do with atomics, we may be able to suggest workarounds for earlier hardware.

ErenS · July 18, 2007, 5:36am

I was trying to parallelize a process that calculates values such as the mean and standard deviation. All of these require different blocks and threads to access the same variable.

For threads within a block the solution seems easy: use a shared variable and let one thread do the job after a __syncthreads() call.

Communication between blocks is the problem. A global variable could solve it the long way: allocate an array with one element per block, have each block store its result at index blockIdx.x so that every block writes to its own memory location, and then call a second kernel to sum the array. I cannot have every block update a single global variable inside the kernel, because two blocks may access it at the same time: both read the old value, both add to it, and one of the updates is lost, so the calculation comes out wrong.
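The "long way" described above can be sketched roughly as follows. This assumes a plain sum of floats; the names `blockPartialSums`, `sumPartials` and `twoPassSum` and the block size of 128 are hypothetical. Nothing here needs atomics, because each block writes only to its own slot.

```cuda
#include <cuda_runtime.h>

#define THREADS 128  // threads per block, an assumption for this sketch

// Pass 1: each block sums its own slice of the input and stores one partial
// result at partial[blockIdx.x], so no two blocks write the same address.
__global__ void blockPartialSums(const float *data, int n, float *partial)
{
    __shared__ float vals[THREADS];
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    vals[threadIdx.x] = (i < n) ? data[i] : 0.0f;
    __syncthreads();                 // make every thread's load visible

    if (threadIdx.x == 0) {          // one thread does the job for the block
        float s = 0.0f;
        for (unsigned int t = 0; t < blockDim.x; ++t)
            s += vals[t];
        partial[blockIdx.x] = s;
    }
}

// Pass 2: a single thread adds up the per-block partial sums.
__global__ void sumPartials(const float *partial, int numBlocks, float *result)
{
    float s = 0.0f;
    for (int b = 0; b < numBlocks; ++b)
        s += partial[b];
    *result = s;
}

// Hypothetical host helper; error checking is omitted for brevity.
float twoPassSum(const float *hostData, int n)
{
    if (n <= 0)
        return 0.0f;

    const int blocks = (n + THREADS - 1) / THREADS;
    float *dData = 0, *dPartial = 0, *dResult = 0;
    cudaMalloc((void **)&dData, n * sizeof(float));
    cudaMalloc((void **)&dPartial, blocks * sizeof(float));
    cudaMalloc((void **)&dResult, sizeof(float));
    cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);

    blockPartialSums<<<blocks, THREADS>>>(dData, n, dPartial);
    sumPartials<<<1, 1>>>(dPartial, blocks, dResult);  // runs after pass 1

    float result = 0.0f;
    cudaMemcpy(&result, dResult, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dPartial);
    cudaFree(dResult);
    return result;
}
```

The second launch only starts after the first has finished, since both are issued to the same stream. That ordering is what stands in for the missing synchronization between blocks.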

Is there a faster way to do this, without calling two different kernel functions?

Simon_Green · July 18, 2007, 9:13am

Yes, it is possible to do these kinds of operations in CUDA without using atomics by using what are called parallel reductions. Here are some good references:

http://courses.ece.uiuc.edu/ece498/al/lect...performance.ppt
http://developer.download.nvidia.com/compute/cuda/sdk/website/projects/scan/doc/scan.pdf
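As a rough illustration of what a parallel reduction looks like for this use case, the sketch below computes per-block sums and sums of squares with a shared-memory tree reduction and finishes the mean and standard deviation on the host. It is not taken from the references above. The names `reduceSumAndSquares` and `meanAndStd`, the block size of 256 and the choice of the population standard deviation are all assumptions.

```cuda
#include <cuda_runtime.h>
#include <math.h>
#include <stdlib.h>

#define BLOCK 256  // threads per block; must be a power of two for this loop

// Tree reduction in shared memory: each step halves the number of active
// threads until element 0 holds the block's sum and sum of squares.
__global__ void reduceSumAndSquares(const float *data, int n,
                                    float *blockSum, float *blockSumSq)
{
    __shared__ float s[BLOCK];
    __shared__ float sq[BLOCK];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x + tid;
    float x = (i < (unsigned int)n) ? data[i] : 0.0f;
    s[tid] = x;
    sq[tid] = x * x;
    __syncthreads();

    for (unsigned int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride) {
            s[tid] += s[tid + stride];
            sq[tid] += sq[tid + stride];
        }
        __syncthreads();  // outside the branch so every thread reaches it
    }
    if (tid == 0) {
        blockSum[blockIdx.x] = s[0];
        blockSumSq[blockIdx.x] = sq[0];
    }
}

// Hypothetical host helper: combines the per-block results on the CPU.
// Error checking is omitted for brevity.
void meanAndStd(const float *hostData, int n, float *mean, float *stddev)
{
    *mean = 0.0f;
    *stddev = 0.0f;
    if (n <= 0)
        return;

    const int blocks = (n + BLOCK - 1) / BLOCK;
    float *dData = 0, *dSum = 0, *dSumSq = 0;
    cudaMalloc((void **)&dData, n * sizeof(float));
    cudaMalloc((void **)&dSum, blocks * sizeof(float));
    cudaMalloc((void **)&dSumSq, blocks * sizeof(float));
    cudaMemcpy(dData, hostData, n * sizeof(float), cudaMemcpyHostToDevice);

    reduceSumAndSquares<<<blocks, BLOCK>>>(dData, n, dSum, dSumSq);

    float *hSum = (float *)malloc(blocks * sizeof(float));
    float *hSumSq = (float *)malloc(blocks * sizeof(float));
    cudaMemcpy(hSum, dSum, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    cudaMemcpy(hSumSq, dSumSq, blocks * sizeof(float), cudaMemcpyDeviceToHost);

    double sum = 0.0, sumSq = 0.0;   // accumulate in double on the host
    for (int b = 0; b < blocks; ++b) {
        sum += hSum[b];
        sumSq += hSumSq[b];
    }
    double m = sum / n;
    double var = sumSq / n - m * m;  // population variance, E[x^2] - mean^2
    if (var < 0.0)
        var = 0.0;                   // guard against rounding error
    *mean = (float)m;
    *stddev = (float)sqrt(var);

    free(hSum);
    free(hSumSq);
    cudaFree(dData);
    cudaFree(dSum);
    cudaFree(dSumSq);
}
```

Finishing on the host keeps this to a single kernel launch, at the cost of copying one pair of floats per block back to the CPU. The E[x^2] - mean^2 form is numerically sensitive when the mean is large relative to the spread, so treat it as a starting point rather than a robust implementation.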
