The original code used relaxed ordering, which does not prevent
reordering of the surrounding memory accesses.
In fact, relaxed atomic loads and stores compile to the same plain
moves as volatile accesses [on x86 at least].
The compare-and-swap was wrong too: it was not resetting the variable
to false correctly.
Stores were using a stricter ordering than is actually required.
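
For illustration only, here is a minimal sketch of the ordering pattern
described above, assuming the variable is a boolean "busy" flag guarding
a critical section. FlagLock, try_lock, lock, unlock and run_demo are
hypothetical names, not taken from the changed code. The compare-and-swap
takes the flag with acquire ordering, and the store that resets it to
false uses release ordering rather than anything stricter. Note that
compare_exchange_strong overwrites its "expected" argument on failure,
so the sketch re-initialises it to false on every attempt; whether that
was the original bug is an assumption.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical flag-based lock; all names here are illustrative.
class FlagLock {
    std::atomic<bool> busy{false};

public:
    bool try_lock() {
        // Must be false on every attempt: a failed compare-exchange
        // overwrites `expected` with the value it observed.
        bool expected = false;
        // Acquire on success so the critical section cannot move above
        // the lock; relaxed on failure since nothing was acquired.
        return busy.compare_exchange_strong(expected, true,
                                            std::memory_order_acquire,
                                            std::memory_order_relaxed);
    }

    void lock() {
        while (!try_lock()) {
            std::this_thread::yield();
        }
    }

    void unlock() {
        // Releasing a flag that is not held would be a caller bug.
        assert(busy.load(std::memory_order_relaxed));
        // Release is sufficient: it publishes the critical section's
        // writes to the next thread that acquires the flag.
        busy.store(false, std::memory_order_release);
    }
};

// Increments a plain int from several threads under the lock and
// returns the final count.
int run_demo(int threads, int iterations) {
    FlagLock lock;
    int counter = 0;  // non-atomic on purpose; protected by `lock`
    std::vector<std::thread> workers;
    for (int t = 0; t < threads; ++t) {
        workers.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    return counter;
}
```

With acquire/release pairing, run_demo(4, 10000) should count every
increment; with relaxed ordering on both sides the language gives no
such guarantee, even if x86 happens to hide the problem.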