<div dir="ltr"><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername">Jeff Epler</b> <span dir="ltr"><<a href="mailto:jepler@unpythonic.net">jepler@unpythonic.net</a>></span><br>Date: Wed, Jul 22, 2015 at 2:16 AM<br>Subject: [Emc-developers] More trouble for LinuxCNC's memory model on ARM<br>To: <a href="mailto:emc-developers@lists.sourceforge.net">emc-developers@lists.sourceforge.net</a><br><br><br>A year ago I wrote about how ARM's memory model is not strong enough to<br>

reliably transport double-precision floats across HAL pins:<br>

    <a href="http://mid.gmane.org/20140702141237.GB65254%40unpythonic.net" rel="noreferrer" target="_blank">http://mid.gmane.org/20140702141237.GB65254%40unpythonic.net</a><br>

<br>

Today after looking at rare ARM-only buildbot failures, some of us<br>

researched the ARM memory model a bit more, and found some unfortunate<br>

assumptions that seem to hold up on x86 but not on ARM.<br>

<br>

You can find the lengthy PDF document "ARM Architecture Reference Manual<br>

ARMv7-A and ARMv7-R edition" by your favorite search engine.  Down in<br>

appendix G.2.2 is a nice section explaining the observed failures, all<br>

of which seemed to happen only on ARM, the impact being that sometimes<br>

halsampler prints 0 instead of an expected value.<br>

<br>

    Weakly-ordered message passing problem<br>

        P1:<br>

            STR R5, [R1]        ; set new data<br>

            STR R0, [R2]        ; send flag indicating data ready<br>

<br>

        P2:<br>

            WAIT([R2]==1)       ; wait on flag<br>

            LDR R5, [R1]        ; read new data<br>

<br>

        In the absence of barriers, an end result of P2: R5=0 is<br>

        permissible<br>

<br>

The fix is to use "barrier instructions": "DMB [ST]" on the writer side<br>

and "DMB" on the reader side.  ("DMB [ST]" seems to mean either "DMB" or<br>

"DMB ST"; "DMB" is the strongest barrier, and "DMB ST" is a weaker<br>

barrier that orders only stores.)<br>

<br>

It appears that the gcc built-in function __sync_synchronize will<br>

generate the required instruction on ARM.  On x86 it generates the odd<br>

instruction 'lock orl $0, (%esp)', and on x86_64 (or x86 with<br>

-march=pentium4) the 'mfence' instruction, which will cause a small<br>

performance hit and, as far as I know, is not necessary.  In particular,<br>

it's not required in this case according to this summary of the Intel<br>

SDM in <a href="http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf" rel="noreferrer" target="_blank">http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf</a> :<br>

    Example 8-1 Stores are not reordered with other stores<br>

<br>

        Proc 0                          Proc 1<br>

    MOV [x] <- 1                     MOV EAX <- [y]<br>

    MOV [y] <- 1                     MOV EBX <- [x]<br>

<br>

    Forbidden final state: Proc 1:EAX=1 and Proc 1:EBX=0<br>

<br>
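As a rough illustration of where the barriers go, here is a minimal C<br>
sketch of the ARM message-passing pattern above using __sync_synchronize.<br>
The struct and function names (mailbox, mailbox_send, mailbox_try_recv)<br>
are made up for this example and are not LinuxCNC code.  'volatile' is<br>
assumed here only to keep the compiler from caching or reordering the<br>
accesses; the built-in supplies the hardware barrier.<br>

```c
#include <stdint.h>

/* Hypothetical one-slot mailbox; names are illustrative, not LinuxCNC's. */
struct mailbox {
    volatile int32_t data;   /* payload, plays the role of [R1] */
    volatile int32_t ready;  /* flag, plays the role of [R2] */
};

/* Writer (P1): store the data, then raise the flag. */
static void mailbox_send(struct mailbox *m, int32_t value)
{
    m->data = value;
    __sync_synchronize();    /* data store is ordered before the flag store */
    m->ready = 1;
}

/* Reader (P2): returns 1 and fills *out if the flag is set, else 0. */
static int mailbox_try_recv(struct mailbox *m, int32_t *out)
{
    if (!m->ready)
        return 0;
    __sync_synchronize();    /* flag load is ordered before the data load */
    *out = m->data;
    return 1;
}
```

__sync_synchronize is always a full barrier, so this sketch uses the<br>
same call on both sides; it has no store-only variant that would<br>
correspond to "DMB ST" on the writer.<br>
<br>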

We identified some locations in LinuxCNC where these barriers definitely<br>

need to be added:<br>

    streamer/sampler<br>

    halscope<br>

    task/motion<br>

    nml shared memory regions<br>

    mutex operations (may already be right)<br>

but probably we will not immediately identify all such places and fix<br>

them all.  I hope to work up a branch this weekend for further testing,<br>

particularly if I can reproduce the behavior on my ARM board (which is<br>

the odroid u3, same as in the buildbot farm).  If it's not too invasive,<br>

I'll propose it for inclusion in 2.7, but I'm likely to make it<br>

cumulative with my rework of streamer/sampler since it centralizes what<br>

was 2 distinct sets of code before.<br>

<br>

... since I first tried to send this message, sourceforge had their<br>

little meltdown and I did further research.<br>

<br>

First, I did targeted testing on my ARM and found that with a new test I<br>

coded up, the sampler bug showed up on average more than once per<br>

minute; and that with the addition of barriers it went way down to zero,<br>

or at least less than once per 16 hours.  I have not placed this work on<br>

a tree on <a href="http://git.linuxcnc.org" rel="noreferrer" target="_blank">git.linuxcnc.org</a> yet, but I plan to rework the basic fix for<br>

streamer/sampler *not* on top of the experimental library-ized streamer<br>

but in a way that is suitable for 2.7.  I also now believe that nml<br>

shared memory regions are safe, due to use of OS mutexes which should<br>

already contain the required barriers.<br>

<br>
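A targeted test of this kind can be sketched roughly as follows.  This is<br>
not the test mentioned above; it is a hypothetical stand-alone pthreads<br>
fragment, and count_stale_reads and the variable names are invented for<br>
illustration.  The writer publishes an increasing sequence number as<br>
data and then as flag; the reader counts how often it sees data older<br>
than the flag it just read.<br>

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static volatile uint32_t data;  /* payload, written first */
static volatile uint32_t flag;  /* sequence number, written second */
static uint32_t rounds;         /* set before the threads start */

static void *writer(void *arg)
{
    (void)arg;
    for (uint32_t i = 1; i <= rounds; i++) {
        data = i;
        __sync_synchronize();   /* writer-side barrier */
        flag = i;
    }
    return NULL;
}

static void *reader(void *arg)
{
    unsigned long *stale = arg;
    uint32_t seen;
    do {
        seen = flag;
        __sync_synchronize();   /* reader-side barrier */
        if (data < seen)        /* data lags the flag: stale read */
            (*stale)++;
    } while (seen < rounds);
    return NULL;
}

/* Runs one writer and one reader for n rounds; returns stale-read count. */
unsigned long count_stale_reads(uint32_t n)
{
    pthread_t w, r;
    unsigned long stale = 0;

    data = 0;
    flag = 0;
    rounds = n;
    pthread_create(&r, NULL, reader, &stale);
    pthread_create(&w, NULL, writer, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return stale;
}
```

Build with gcc -pthread.  With both barriers in place the count should<br>
stay at 0; with them removed, a weakly-ordered multi-core ARM may report<br>
a nonzero count, though how often will depend on the hardware.<br>
<br>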

Jeff<br>

<br>



</div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature">Regards,<br>Artur Kozubski<br></div>

</div>