I must admit I’d never heard of these memory consistency models before now (I’m guessing they apply more to distributed programming than to parallel programming), but I’d say CUDA is a mix of a weak memory consistency model and a sequential consistency model, depending on which type of memory you’re referring to: shared, constant, texture, and global memory all behave differently, have different rules, and in some cases may or may not be guaranteed to be visible to other threads, blocks, or kernels depending on the circumstances.
Honestly though, it sounds like you haven’t read the CUDA Programming Guide, which will answer all of your questions, and more.
Thanks for your answer. I have been reading the CUDA Programming Guide and could not find an answer; that is why I decided to post this topic.
Maybe I should have given an example:
Imagine you have 2 variables x and y that are initially 0, then:
Thread1 executes: x=1; a=y; (in that order)
Thread2 executes: y=1; b=x; (in that order)
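This can be sketched as a CUDA kernel (a minimal sketch: the variable names match the example above, both threads are assumed to run in the same kernel launch, and the variables live in global memory, declared volatile so the compiler does not cache them in registers):

```cuda
// Classic "store buffering" litmus test, written as a CUDA kernel.
// x and y are shared flags; a and b record what each thread observed.
__device__ volatile int x = 0, y = 0;
__device__ int a = 0, b = 0;

__global__ void litmus(void)
{
    if (threadIdx.x == 0) {          // Thread1
        x = 1;                       // store first...
        a = y;                       // ...then load the other flag
    } else if (threadIdx.x == 1) {   // Thread2
        y = 1;
        b = x;
    }
}
```

Whether a run can end with a == 0 and b == 0 is exactly the question: under sequential consistency it cannot, under a weak model it can.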
In a sequentially consistent memory model, after you execute the code it would be impossible to find that a and b are both equal to 0. You may get one of the following after the execution:

a=1, b=0
a=0, b=1
a=1, b=1

The last case would mean that Thread1 and Thread2 executed the code at exactly the same time (completely in parallel).
This would be a strong memory consistency model.
On the other hand, there are some systems where it is possible to find that a and b are both equal to 0. This can happen because the compiler may reorder the instructions (a=y; x=1; in Thread1, for example) as an optimization, or because the hardware reorders the memory operations to speed up execution. Either way, this would be a weak memory consistency model.
My question is: which model does the NVIDIA GPU / CUDA support, strong or weak?
I believe this is the reason for __threadfence(). I think in general the model is weak, but __threadfence() can be used to enforce ordering when you need it.
So for example
Thread1 executes: x=1; __threadfence(); a=y;
Thread2 executes: y=1; __threadfence(); b=x;
Then a=y must occur after x=1 has been flushed to memory and is visible to other threads, and likewise b=x must occur after y=1 is visible to other threads. There is no guarantee as to which will occur first, but at least one must write before the other reads, assuming appropriate use of volatile.
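A sketch of that fenced version (same assumed layout as the earlier example: volatile global-memory flags, both threads in one kernel launch):

```cuda
// Same litmus test, but with __threadfence() between the store and
// the load. The fence orders this thread's memory operations: the
// store before the fence must be visible device-wide before any
// memory operation issued after the fence.
__device__ volatile int x = 0, y = 0;
__device__ int a = 0, b = 0;

__global__ void litmus_fenced(void)
{
    if (threadIdx.x == 0) {          // Thread1
        x = 1;
        __threadfence();             // make x=1 visible to all threads
        a = y;
    } else if (threadIdx.x == 1) {   // Thread2
        y = 1;
        __threadfence();             // make y=1 visible to all threads
        b = x;
    }
}
```

The fence only constrains ordering within each thread; it does not pick a winner, which is why either thread may still observe the other's store or miss it, but not both miss.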
I believe that CUDA has strong consistency within a given warp, and weak consistency overall. Thus, if Thread1 and Thread2 are in the same warp, then a and b will both be equal to 1. If the threads are in different warps, all bets are off.
Of course, there’s no guarantee that the compiler won’t mess it up in either case.