If I have a
__device__ int data;
- If I launch a kernel in which multiple threads write integer values between 1 and 10 to data and also read from it, is it possible to read a value of data that is not an integer between 1 and 10?
- If all the threads write the same integer A to data, is it possible to read a value that is not A?
- If the data is not corrupted, for which data sizes does that hold? 8 bits? 16? 32? 64?
No atomics or synchronization are used for either the reads or the writes.
cross-posting:
[url]https://stackoverflow.com/questions/55090270/cuda-data-writing-and-corruption[/url]
- no (assuming the initial value is 1-10 or it is guaranteed that any read occurs after at least 1 write).
- no, again assuming that somehow the read is guaranteed to take place after at least 1 write.
- for any supported data size (1, 2, 4, 8, or 16 bytes per thread), assuming a consistent transaction size across threads and natural alignment:
[url]https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-memory-accesses[/url]
The PTX memory model, which is instructive for CUDA users, is defined in some detail:
[url]https://docs.nvidia.com/cuda/parallel-thread-execution/index.html#memory-consistency-model[/url]
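For anyone who wants to try this on their own hardware, here is a minimal sketch of the experiment described in the question. The names race_kernel, bad_reads and count_bad_reads are made up for illustration, and the volatile pointer is my own addition so that each store and load really goes to memory instead of being kept in a register. data starts at 1 so the "read after at least one write" assumption is covered. A clean run is an observation, not a proof.

```cuda
#include <cuda_runtime.h>

__device__ int data = 1;                // initial value is already inside 1..10
__device__ unsigned int bad_reads = 0;  // number of reads that fell outside 1..10

// Every thread repeatedly writes a value in 1..10 and reads the variable back,
// with no atomics or synchronization on data itself.
__global__ void race_kernel(int iterations)
{
    volatile int *p = &data;  // assumption: volatile forces real loads/stores
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < iterations; ++i) {
        *p = (tid + i) % 10 + 1;            // racy 4-byte write
        int v = *p;                         // racy 4-byte read
        if (v < 1 || v > 10) {
            atomicAdd(&bad_reads, 1u);      // atomic used only for reporting
        }
    }
}

// Hypothetical host helper: runs the kernel and returns the bad-read count.
unsigned int count_bad_reads(int blocks, int threads, int iterations)
{
    unsigned int zero = 0, result = 0;
    int one = 1;
    cudaMemcpyToSymbol(bad_reads, &zero, sizeof(zero));
    cudaMemcpyToSymbol(data, &one, sizeof(one));
    race_kernel<<<blocks, threads>>>(iterations);
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&result, bad_reads, sizeof(result));
    return result;
}
```

Calling count_bad_reads(64, 256, 1000) from main and printing the result is enough to run it. Which of the written values a given read returns is still unpredictable; the only claim being checked is that it is one of them.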
Thanks for your answer (I have deleted the cross-post).
According to the first link:
“Global memory instructions support reading or writing words of size equal to 1, 2, 4, 8, or 16 bytes. Any access (via a variable or a pointer) to data residing in global memory compiles to a single global memory instruction if and only if the size of the data type is 1, 2, 4, 8, or 16 bytes and the data is naturally aligned (i.e., its address is a multiple of that size).”
Does this mean a partial write can never happen when writing these data types (assuming the data is aligned)?
By a partial write I mean that, on some x86 systems, an 8-byte write is split into two 4-byte writes. If you read the 8-byte value when the write is only half done, you get corrupted data that no one wrote.
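To make the "half done" case concrete, here is a rough sketch of how a torn 8-byte write could be detected. Every writer stores a 64-bit value whose upper and lower 32-bit halves are identical, so a read whose halves differ would have to be a mix of two different writes. The names wide, torn_reads, is_torn, torn_write_kernel and count_torn_reads are hypothetical, and volatile is again an assumption used to force real memory accesses.

```cuda
#include <cuda_runtime.h>

__device__ unsigned long long wide = 0ULL;  // halves match initially
__device__ unsigned int torn_reads = 0;     // reads whose halves differed

// True if the upper and lower 32-bit halves of v differ.
__host__ __device__ inline bool is_torn(unsigned long long v)
{
    return (unsigned int)(v >> 32) != (unsigned int)(v & 0xFFFFFFFFull);
}

__global__ void torn_write_kernel(int iterations)
{
    volatile unsigned long long *p = &wide;
    unsigned int tid = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < iterations; ++i) {
        unsigned long long half = tid + (unsigned int)i;  // 32-bit pattern
        *p = (half << 32) | half;          // racy 8-byte write, halves equal
        if (is_torn(*p)) {                 // racy 8-byte read
            atomicAdd(&torn_reads, 1u);    // atomic used only for reporting
        }
    }
}

// Hypothetical host helper: runs the kernel and returns the torn-read count.
unsigned int count_torn_reads(int blocks, int threads, int iterations)
{
    unsigned int zero = 0, result = 0;
    unsigned long long init = 0ULL;
    cudaMemcpyToSymbol(torn_reads, &zero, sizeof(zero));
    cudaMemcpyToSymbol(wide, &init, sizeof(init));
    torn_write_kernel<<<blocks, threads>>>(iterations);
    cudaDeviceSynchronize();
    cudaMemcpyFromSymbol(&result, torn_reads, sizeof(result));
    return result;
}
```

If the aligned 8-byte access really is a single instruction as the quoted passage says, the count should stay at zero; a nonzero count would mean the write was split.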
To be certain, you have to have some familiarity with compiler behavior, or else inspect the generated SASS code directly.
If you’re dealing with aligned POD data types (int, float, unsigned, long, etc.) then yes, aligned reads/writes should always be a single transaction/instruction. If you use the so-called vector types properly (e.g. int2, int4, float2, float4, double2, etc.) then I think these would be in the same category.
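As a small sketch of what "naturally aligned" means for these types, the checks below confirm at compile time that a few of the built-in vector types have a size in the supported set and an alignment equal to that size. is_naturally_aligned is a hypothetical helper written for this post, not a CUDA API; it just restates the rule from the programming guide for an arbitrary pointer.

```cuda
#include <cstddef>
#include <cstdint>
#include <cuda_runtime.h>

// Built-in vector types whose alignment equals their size.
static_assert(sizeof(int2) == 8 && alignof(int2) == 8, "int2: 8 bytes, 8-aligned");
static_assert(sizeof(float2) == 8 && alignof(float2) == 8, "float2: 8 bytes, 8-aligned");
static_assert(sizeof(int4) == 16 && alignof(int4) == 16, "int4: 16 bytes, 16-aligned");
static_assert(sizeof(float4) == 16 && alignof(float4) == 16, "float4: 16 bytes, 16-aligned");
static_assert(sizeof(double2) == 16 && alignof(double2) == 16, "double2: 16 bytes, 16-aligned");

// Hypothetical helper: true if an access of `size` bytes at `ptr` satisfies
// the rule quoted above (size is 1, 2, 4, 8 or 16 and the address is a
// multiple of size).
__host__ __device__ inline bool is_naturally_aligned(const void *ptr, std::size_t size)
{
    bool supported = size == 1 || size == 2 || size == 4 || size == 8 || size == 16;
    return supported && (reinterpret_cast<std::uintptr_t>(ptr) % size == 0);
}
```

A pointer obtained by casting (for example reinterpreting an int* at an odd element index as int2*) is the kind of improper usage that would fail this check.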
However, for other data types, you need to be careful. The case I would give as an example is a structure copy. Here’s an example of a possible issue to be aware of:
[url]https://stackoverflow.com/questions/53214072/cudas-nvvp-reports-non-ideal-memory-access-pattern-but-bandwidth-is-almost-pea/53232717#53232717[/url]
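Here is a small sketch of the kind of structure I have in mind (my own illustration, not code from the linked answer). Pair and AlignedPair are hypothetical types. Both are 8 bytes, but only AlignedPair has 8-byte alignment, so only it meets the "size and natural alignment" condition quoted earlier. For the plain Pair the compiler is free to copy the two members with separate 4-byte accesses, in which case a concurrent reader could see x from one write and y from another.

```cuda
#include <cuda_runtime.h>

// Plain struct: 8 bytes in size but only 4-byte aligned.
struct Pair {
    int x;
    int y;
};

// Same layout, but explicitly 8-byte aligned via the CUDA __align__ specifier.
struct __align__(8) AlignedPair {
    int x;
    int y;
};

static_assert(sizeof(Pair) == 8 && alignof(Pair) == 4, "Pair is not naturally aligned");
static_assert(sizeof(AlignedPair) == 8 && alignof(AlignedPair) == 8, "AlignedPair is naturally aligned");

// Structure copy that may be emitted as two 4-byte loads and stores per element.
__global__ void copy_pairs(const Pair *in, Pair *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Structure copy that is eligible for a single 8-byte load and store per element.
__global__ void copy_aligned_pairs(const AlignedPair *in, AlignedPair *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}
```

What the compiler actually emits can differ between versions and optimization levels, so dumping the SASS (for example with cuobjdump -sass) for both kernels is the way to confirm it.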