[PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Frederic Weisbecker 5 days, 14 hours ago
Add some missing critical pieces of explanation to understand the need
for full memory barriers throughout the whole grace period state machine,
thanks to Paul's explanations.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Uladzislau Rezki <urezki@gmail.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
---
 .../Tree-RCU-Memory-Ordering.rst              | 33 +++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 11cdab037bff..f21432115627 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -112,6 +112,39 @@ on PowerPC.
 The ``smp_mb__after_unlock_lock()`` invocations prevent this
 ``WARN_ON()`` from triggering.
 
++-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
++-----------------------------------------------------------------------+
+| But the whole chain of rnp locking is enough for the readers to see   |
+| all the pre-grace-period accesses from the updater and for the updater|
+| to see all the accesses from the readers performed before the end of  |
+| the grace period. So why do we need to enforce full ordering at all   |
+| through smp_mb__after_unlock_lock()?                                  |
++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| Because we still need to take care of the lockless counterparts of    |
+| RCU. The first key example here is grace period polling. Using        |
+| poll_state_synchronize_rcu() or cond_synchronize_rcu(), an updater    |
+| can rely solely on lockless full ordering to benefit from the usual   |
+| TREE RCU ordering guarantees.                                         |
+|                                                                       |
+| The second example lies behind the fact that a grace period still     |
+| claims to imply full memory ordering. Therefore in the following      |
+| scenario:                                                             |
+|                                                                       |
+| CPU 0                     CPU 1                                       |
+| ----                      ----                                        |
+| WRITE_ONCE(X, 1)          WRITE_ONCE(Y, 1)                            |
+| synchronize_rcu()         smp_mb()                                    |
+| r0 = READ_ONCE(Y)         r1 = READ_ONCE(X)                           |
+|                                                                       |
+| It must be impossible to have r0 == 0 && r1 == 0 after both CPUs      |
+| have completed their sequences, even if CPU 1 is in an RCU extended   |
+| quiescent state (idle mode) and thus won't report a quiescent state   |
+| throughout the common rnp locking chain.                              |
++-----------------------------------------------------------------------+
+
 This approach must be extended to include idle CPUs, which need
 RCU's grace-period memory ordering guarantee to extend to any
 RCU read-side critical sections preceding and following the current
-- 
2.25.1

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Paul E. McKenney 5 days, 13 hours ago
On Thu, Jun 10, 2021 at 05:50:29PM +0200, Frederic Weisbecker wrote:
> Add some missing critical pieces of explanation to understand the need
> for full memory barriers throughout the whole grace period state machine,
> thanks to Paul's explanations.
> 
> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> Cc: Joel Fernandes <joel@joelfernandes.org>
> Cc: Uladzislau Rezki <urezki@gmail.com>
> Cc: Boqun Feng <boqun.feng@gmail.com>

Nice!!!  And not bad wording either, though I still could not resist the
urge to wordsmith further.  Plus I combined your two examples, in order to
provide a trivial example use of the polling interfaces, if nothing else.

Please let me know if I messed anything up.

							Thanx, Paul

------------------------------------------------------------------------

commit f21b8fbdf9a59553da825265e92cedb639b4ba3c
Author: Frederic Weisbecker <frederic@kernel.org>
Date:   Thu Jun 10 17:50:29 2021 +0200

    rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()
    
    Add some missing critical pieces of explanation to understand the need
    for full memory barriers throughout the whole grace period state machine,
    thanks to Paul's explanations.
    
    Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
    Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
    Cc: Joel Fernandes <joel@joelfernandes.org>
    Cc: Uladzislau Rezki <urezki@gmail.com>
    Cc: Boqun Feng <boqun.feng@gmail.com>
    Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 11cdab037bff..3cd5cb4d86e5 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -112,6 +112,35 @@ on PowerPC.
 The ``smp_mb__after_unlock_lock()`` invocations prevent this
 ``WARN_ON()`` from triggering.
 
++-----------------------------------------------------------------------+
+| **Quick Quiz**:                                                       |
++-----------------------------------------------------------------------+
+| But the whole chain of rcu_node-structure locking guarantees that     |
+| readers see all pre-grace-period accesses from the updater and        |
+| also guarantees that the updater sees all post-grace-period           |
+| accesses from the readers.  So why do we need all of those calls      |
+| to smp_mb__after_unlock_lock()?                                       |
++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| Because we must provide ordering for RCU's polling grace-period       |
+| primitives, for example, get_state_synchronize_rcu() and              |
+| poll_state_synchronize_rcu().  For example:                           |
+|                                                                       |
+| CPU 0                                     CPU 1                       |
+| ----                                      ----                        |
+| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
+| g = get_state_synchronize_rcu()           smp_mb()                    |
+| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
+|         continue;                                                     |
+| r0 = READ_ONCE(Y)                                                     |
+|                                                                       |
+| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
+| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
+| or offline) and thus won't interact directly with the RCU core        |
+| processing at all.                                                    |
++-----------------------------------------------------------------------+
+
 This approach must be extended to include idle CPUs, which need
 RCU's grace-period memory ordering guarantee to extend to any
 RCU read-side critical sections preceding and following the current

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Akira Yokosawa 5 days, 6 hours ago
On Thu, 10 Jun 2021 09:57:10 -0700, Paul E. McKenney wrote:
> On Thu, Jun 10, 2021 at 05:50:29PM +0200, Frederic Weisbecker wrote:
>> Add some missing critical pieces of explanation to understand the need
>> for full memory barriers throughout the whole grace period state machine,
>> thanks to Paul's explanations.
>> 
>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
>> Cc: Joel Fernandes <joel@joelfernandes.org>
>> Cc: Uladzislau Rezki <urezki@gmail.com>
>> Cc: Boqun Feng <boqun.feng@gmail.com>
> 
> Nice!!!  And not bad wording either, though I still could not resist the
> urge to wordsmith further.  Plus I combined your two examples, in order to
> provide a trivial example use of the polling interfaces, if nothing else.
> 
> Please let me know if I messed anything up.

Hi Paul,

See minor tweaks below to satisfy sphinx.

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> commit f21b8fbdf9a59553da825265e92cedb639b4ba3c
> Author: Frederic Weisbecker <frederic@kernel.org>
> Date:   Thu Jun 10 17:50:29 2021 +0200
> 
>     rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()
>     
>     Add some missing critical pieces of explanation to understand the need
>     for full memory barriers throughout the whole grace period state machine,
>     thanks to Paul's explanations.
>     
>     Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>     Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
>     Cc: Joel Fernandes <joel@joelfernandes.org>
>     Cc: Uladzislau Rezki <urezki@gmail.com>
>     Cc: Boqun Feng <boqun.feng@gmail.com>
>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> index 11cdab037bff..3cd5cb4d86e5 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> @@ -112,6 +112,35 @@ on PowerPC.
>  The ``smp_mb__after_unlock_lock()`` invocations prevent this
>  ``WARN_ON()`` from triggering.
>  
> ++-----------------------------------------------------------------------+
> +| **Quick Quiz**:                                                       |
> ++-----------------------------------------------------------------------+
> +| But the whole chain of rcu_node-structure locking guarantees that     |
> +| readers see all pre-grace-period accesses from the updater and        |
> +| also guarantees that the updater to see all post-grace-period         |
> +| accesses from the readers.  So why do we need all of those calls      |
> +| to smp_mb__after_unlock_lock()?                                       |
> ++-----------------------------------------------------------------------+
> +| **Answer**:                                                           |
> ++-----------------------------------------------------------------------+
> +| Because we must provide ordering for RCU's polling grace-period       |
> +| primitives, for example, get_state_synchronize_rcu() and              |
> +| poll_state_synchronize_rcu().  For example:                           |
> +|                                                                       |
> +| CPU 0                                     CPU 1                       |
> +| ----                                      ----                        |
> +| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
> +| g = get_state_synchronize_rcu()           smp_mb()                    |
> +| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
> +|         continue;                                                     |

This indent causes warnings from sphinx:

Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:135: WARNING: Unexpected indentation.
Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:137: WARNING: Block quote ends without a blank line; unexpected unindent

> +| r0 = READ_ONCE(Y)                                                     |
> +|                                                                       |
> +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
> +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> +| or offline) and thus won't interact directly with the RCU core        |
> +| processing at all.                                                    |
> ++-----------------------------------------------------------------------+
> +
>  This approach must be extended to include idle CPUs, which need
>  RCU's grace-period memory ordering guarantee to extend to any
>  RCU read-side critical sections preceding and following the current

The code block in the answer can be fixed as follows:

++-----------------------------------------------------------------------+
+| **Answer**:                                                           |
++-----------------------------------------------------------------------+
+| Because we must provide ordering for RCU's polling grace-period       |
+| primitives, for example, get_state_synchronize_rcu() and              |
+| poll_state_synchronize_rcu().  For example::                          |
+|                                                                       |
+|  CPU 0                                     CPU 1                      |
+|  ----                                      ----                       |
+|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
+|  g = get_state_synchronize_rcu()           smp_mb()                   |
+|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
+|          continue;                                                    |
+|  r0 = READ_ONCE(Y)                                                    |
+|                                                                       |
+| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
+| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
+| or offline) and thus won't interact directly with the RCU core        |
+| processing at all.                                                    |
++-----------------------------------------------------------------------+

Hint: Use of "::" and indented code block.
 
       Thanks, Akira

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Paul E. McKenney 5 days, 5 hours ago
On Fri, Jun 11, 2021 at 09:28:10AM +0900, Akira Yokosawa wrote:
> On Thu, 10 Jun 2021 09:57:10 -0700, Paul E. McKenney wrote:
> > On Thu, Jun 10, 2021 at 05:50:29PM +0200, Frederic Weisbecker wrote:
> >> Add some missing critical pieces of explanation to understand the need
> >> for full memory barriers throughout the whole grace period state machine,
> >> thanks to Paul's explanations.
> >> 
> >> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> >> Cc: Joel Fernandes <joel@joelfernandes.org>
> >> Cc: Uladzislau Rezki <urezki@gmail.com>
> >> Cc: Boqun Feng <boqun.feng@gmail.com>
> > 
> > Nice!!!  And not bad wording either, though I still could not resist the
> > urge to wordsmith further.  Plus I combined your two examples, in order to
> > provide a trivial example use of the polling interfaces, if nothing else.
> > 
> > Please let me know if I messed anything up.
> 
> Hi Paul,
> 
> See minor tweaks below to satisfy sphinx.
> 
> > 
> > 							Thanx, Paul
> > 
> > ------------------------------------------------------------------------
> > 
> > commit f21b8fbdf9a59553da825265e92cedb639b4ba3c
> > Author: Frederic Weisbecker <frederic@kernel.org>
> > Date:   Thu Jun 10 17:50:29 2021 +0200
> > 
> >     rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()
> >     
> >     Add some missing critical pieces of explanation to understand the need
> >     for full memory barriers throughout the whole grace period state machine,
> >     thanks to Paul's explanations.
> >     
> >     Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
> >     Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
> >     Cc: Joel Fernandes <joel@joelfernandes.org>
> >     Cc: Uladzislau Rezki <urezki@gmail.com>
> >     Cc: Boqun Feng <boqun.feng@gmail.com>
> >     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> > 
> > diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > index 11cdab037bff..3cd5cb4d86e5 100644
> > --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > @@ -112,6 +112,35 @@ on PowerPC.
> >  The ``smp_mb__after_unlock_lock()`` invocations prevent this
> >  ``WARN_ON()`` from triggering.
> >  
> > ++-----------------------------------------------------------------------+
> > +| **Quick Quiz**:                                                       |
> > ++-----------------------------------------------------------------------+
> > +| But the whole chain of rcu_node-structure locking guarantees that     |
> > +| readers see all pre-grace-period accesses from the updater and        |
> > +| also guarantees that the updater to see all post-grace-period         |
> > +| accesses from the readers.  So why do we need all of those calls      |
> > +| to smp_mb__after_unlock_lock()?                                       |
> > ++-----------------------------------------------------------------------+
> > +| **Answer**:                                                           |
> > ++-----------------------------------------------------------------------+
> > +| Because we must provide ordering for RCU's polling grace-period       |
> > +| primitives, for example, get_state_synchronize_rcu() and              |
> > +| poll_state_synchronize_rcu().  For example:                           |
> > +|                                                                       |
> > +| CPU 0                                     CPU 1                       |
> > +| ----                                      ----                        |
> > +| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
> > +| g = get_state_synchronize_rcu()           smp_mb()                    |
> > +| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
> > +|         continue;                                                     |
> 
> This indent causes warnings from sphinx:
> 
> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:135: WARNING: Unexpected indentation.
> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:137: WARNING: Block quote ends without a blank line; unexpected unindent
> 
> > +| r0 = READ_ONCE(Y)                                                     |
> > +|                                                                       |
> > +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
> > +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> > +| or offline) and thus won't interact directly with the RCU core        |
> > +| processing at all.                                                    |
> > ++-----------------------------------------------------------------------+
> > +
> >  This approach must be extended to include idle CPUs, which need
> >  RCU's grace-period memory ordering guarantee to extend to any
> >  RCU read-side critical sections preceding and following the current
> 
> The code block in the answer can be fixed as follows:
> 
> ++-----------------------------------------------------------------------+
> +| **Answer**:                                                           |
> ++-----------------------------------------------------------------------+
> +| Because we must provide ordering for RCU's polling grace-period       |
> +| primitives, for example, get_state_synchronize_rcu() and              |
> +| poll_state_synchronize_rcu().  For example::                          |
> +|                                                                       |
> +|  CPU 0                                     CPU 1                      |
> +|  ----                                      ----                       |
> +|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
> +|  g = get_state_synchronize_rcu()           smp_mb()                   |
> +|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
> +|          continue;                                                    |
> +|  r0 = READ_ONCE(Y)                                                    |
> +|                                                                       |
> +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
> +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> +| or offline) and thus won't interact directly with the RCU core        |
> +| processing at all.                                                    |
> ++-----------------------------------------------------------------------+
> 
> Hint: Use of "::" and indented code block.

Thank you!

As in with the following patch to be merged into Frederic's original,
with attribution?

							Thanx, Paul

------------------------------------------------------------------------

diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
index 3cd5cb4d86e5..bc884ebf88bb 100644
--- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
+++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
@@ -125,15 +125,15 @@ The ``smp_mb__after_unlock_lock()`` invocations prevent this
 +-----------------------------------------------------------------------+
 | Because we must provide ordering for RCU's polling grace-period       |
 | primitives, for example, get_state_synchronize_rcu() and              |
-| poll_state_synchronize_rcu().  For example:                           |
+| poll_state_synchronize_rcu().  For example::                          |
 |                                                                       |
-| CPU 0                                     CPU 1                       |
-| ----                                      ----                        |
-| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
-| g = get_state_synchronize_rcu()           smp_mb()                    |
-| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
-|         continue;                                                     |
-| r0 = READ_ONCE(Y)                                                     |
+|  CPU 0                                     CPU 1                      |
+|  ----                                      ----                       |
+|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
+|  g = get_state_synchronize_rcu()           smp_mb()                   |
+|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
+|          continue;                                                    |
+|  r0 = READ_ONCE(Y)                                                    |
 |                                                                       |
 | RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
 | happen, even if CPU 1 is in an RCU extended quiescent state (idle     |

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Akira Yokosawa 5 days, 5 hours ago
On Thu, 10 Jun 2021 17:48:13 -0700, Paul E. McKenney wrote:
> On Fri, Jun 11, 2021 at 09:28:10AM +0900, Akira Yokosawa wrote:
>> On Thu, 10 Jun 2021 09:57:10 -0700, Paul E. McKenney wrote:
>>> On Thu, Jun 10, 2021 at 05:50:29PM +0200, Frederic Weisbecker wrote:
>>>> Add some missing critical pieces of explanation to understand the need
>>>> for full memory barriers throughout the whole grace period state machine,
>>>> thanks to Paul's explanations.
>>>>
>>>> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>>> Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
>>>> Cc: Joel Fernandes <joel@joelfernandes.org>
>>>> Cc: Uladzislau Rezki <urezki@gmail.com>
>>>> Cc: Boqun Feng <boqun.feng@gmail.com>
>>>
>>> Nice!!!  And not bad wording either, though I still could not resist the
>>> urge to wordsmith further.  Plus I combined your two examples, in order to
>>> provide a trivial example use of the polling interfaces, if nothing else.
>>>
>>> Please let me know if I messed anything up.
>>
>> Hi Paul,
>>
>> See minor tweaks below to satisfy sphinx.
>>
>>>
>>> 							Thanx, Paul
>>>
>>> ------------------------------------------------------------------------
>>>
>>> commit f21b8fbdf9a59553da825265e92cedb639b4ba3c
>>> Author: Frederic Weisbecker <frederic@kernel.org>
>>> Date:   Thu Jun 10 17:50:29 2021 +0200
>>>
>>>     rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()
>>>     
>>>     Add some missing critical pieces of explanation to understand the need
>>>     for full memory barriers throughout the whole grace period state machine,
>>>     thanks to Paul's explanations.
>>>     
>>>     Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
>>>     Cc: Neeraj Upadhyay <neeraju@codeaurora.org>
>>>     Cc: Joel Fernandes <joel@joelfernandes.org>
>>>     Cc: Uladzislau Rezki <urezki@gmail.com>
>>>     Cc: Boqun Feng <boqun.feng@gmail.com>
>>>     Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
>>>
>>> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
>>> index 11cdab037bff..3cd5cb4d86e5 100644
>>> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
>>> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
>>> @@ -112,6 +112,35 @@ on PowerPC.
>>>  The ``smp_mb__after_unlock_lock()`` invocations prevent this
>>>  ``WARN_ON()`` from triggering.
>>>  
>>> ++-----------------------------------------------------------------------+
>>> +| **Quick Quiz**:                                                       |
>>> ++-----------------------------------------------------------------------+
>>> +| But the whole chain of rcu_node-structure locking guarantees that     |
>>> +| readers see all pre-grace-period accesses from the updater and        |
>>> +| also guarantees that the updater to see all post-grace-period         |
>>> +| accesses from the readers.  So why do we need all of those calls      |
>>> +| to smp_mb__after_unlock_lock()?                                       |
>>> ++-----------------------------------------------------------------------+
>>> +| **Answer**:                                                           |
>>> ++-----------------------------------------------------------------------+
>>> +| Because we must provide ordering for RCU's polling grace-period       |
>>> +| primitives, for example, get_state_synchronize_rcu() and              |
>>> +| poll_state_synchronize_rcu().  For example:                           |
>>> +|                                                                       |
>>> +| CPU 0                                     CPU 1                       |
>>> +| ----                                      ----                        |
>>> +| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
>>> +| g = get_state_synchronize_rcu()           smp_mb()                    |
>>> +| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
>>> +|         continue;                                                     |
>>
>> This indent causes warnings from sphinx:
>>
>> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:135: WARNING: Unexpected indentation.
>> Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst:137: WARNING: Block quote ends without a blank line; unexpected unindent
>>
>>> +| r0 = READ_ONCE(Y)                                                     |
>>> +|                                                                       |
>>> +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
>>> +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
>>> +| or offline) and thus won't interact directly with the RCU core        |
>>> +| processing at all.                                                    |
>>> ++-----------------------------------------------------------------------+
>>> +
>>>  This approach must be extended to include idle CPUs, which need
>>>  RCU's grace-period memory ordering guarantee to extend to any
>>>  RCU read-side critical sections preceding and following the current
>>
>> The code block in the answer can be fixed as follows:
>>
>> ++-----------------------------------------------------------------------+
>> +| **Answer**:                                                           |
>> ++-----------------------------------------------------------------------+
>> +| Because we must provide ordering for RCU's polling grace-period       |
>> +| primitives, for example, get_state_synchronize_rcu() and              |
>> +| poll_state_synchronize_rcu().  For example::                          |
>> +|                                                                       |
>> +|  CPU 0                                     CPU 1                      |
>> +|  ----                                      ----                       |
>> +|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
>> +|  g = get_state_synchronize_rcu()           smp_mb()                   |
>> +|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
>> +|          continue;                                                    |
>> +|  r0 = READ_ONCE(Y)                                                    |
>> +|                                                                       |
>> +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
>> +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
>> +| or offline) and thus won't interact directly with the RCU core        |
>> +| processing at all.                                                    |
>> ++-----------------------------------------------------------------------+
>>
>> Hint: Use of "::" and indented code block.
> 
> Thank you!
> 
> As in with the following patch to be merged into Frederic's original,
> with attribution?

Sounds good to me!

        Thanks, Akira

> 
> 							Thanx, Paul
> 
> ------------------------------------------------------------------------
> 
> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> index 3cd5cb4d86e5..bc884ebf88bb 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> @@ -125,15 +125,15 @@ The ``smp_mb__after_unlock_lock()`` invocations prevent this
>  +-----------------------------------------------------------------------+
>  | Because we must provide ordering for RCU's polling grace-period       |
>  | primitives, for example, get_state_synchronize_rcu() and              |
> -| poll_state_synchronize_rcu().  For example:                           |
> +| poll_state_synchronize_rcu().  For example::                          |
>  |                                                                       |
> -| CPU 0                                     CPU 1                       |
> -| ----                                      ----                        |
> -| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
> -| g = get_state_synchronize_rcu()           smp_mb()                    |
> -| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
> -|         continue;                                                     |
> -| r0 = READ_ONCE(Y)                                                     |
> +|  CPU 0                                     CPU 1                      |
> +|  ----                                      ----                       |
> +|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
> +|  g = get_state_synchronize_rcu()           smp_mb()                   |
> +|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
> +|          continue;                                                    |
> +|  r0 = READ_ONCE(Y)                                                    |
>  |                                                                       |
>  | RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
>  | happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> 

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Frederic Weisbecker 4 days, 20 hours ago
On Thu, Jun 10, 2021 at 09:57:10AM -0700, Paul E. McKenney wrote:
> diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> index 11cdab037bff..3cd5cb4d86e5 100644
> --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> @@ -112,6 +112,35 @@ on PowerPC.
>  The ``smp_mb__after_unlock_lock()`` invocations prevent this
>  ``WARN_ON()`` from triggering.
>  
> ++-----------------------------------------------------------------------+
> +| **Quick Quiz**:                                                       |
> ++-----------------------------------------------------------------------+
> +| But the whole chain of rcu_node-structure locking guarantees that     |
> +| readers see all pre-grace-period accesses from the updater and        |
> +| also guarantees that the updater to see all post-grace-period         |

Should it be either "that the updater see" or "the updater to see"?

> +| accesses from the readers.

Is it really post-grace-period that you meant here? The updater can't see
the future. It's rather all reader accesses before the end of the grace period?

>  So why do we need all of those calls      |
> +| to smp_mb__after_unlock_lock()?                                       |
> ++-----------------------------------------------------------------------+
> +| **Answer**:                                                           |
> ++-----------------------------------------------------------------------+
> +| Because we must provide ordering for RCU's polling grace-period       |
> +| primitives, for example, get_state_synchronize_rcu() and              |
> +| poll_state_synchronize_rcu().  For example:                           |

Two times "for example" (sorry I'm nitpicking...)

> +|                                                                       |
> +| CPU 0                                     CPU 1                       |
> +| ----                                      ----                        |
> +| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
> +| g = get_state_synchronize_rcu()           smp_mb()                    |
> +| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
> +|         continue;                                                     |
> +| r0 = READ_ONCE(Y)                                                     |

Good point, it's a nice merge of the initial examples!

> +|                                                                       |
> +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |

One "that" has to die here.

> +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> +| or offline) and thus won't interact directly with the RCU core        |
> +| processing at all.                                                    |

Thanks a lot!

> ++-----------------------------------------------------------------------+
> +
>  This approach must be extended to include idle CPUs, which need
>  RCU's grace-period memory ordering guarantee to extend to any
>  RCU read-side critical sections preceding and following the current

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Paul E. McKenney 4 days, 13 hours ago
On Fri, Jun 11, 2021 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> On Thu, Jun 10, 2021 at 09:57:10AM -0700, Paul E. McKenney wrote:
> > diff --git a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > index 11cdab037bff..3cd5cb4d86e5 100644
> > --- a/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > +++ b/Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.rst
> > @@ -112,6 +112,35 @@ on PowerPC.
> >  The ``smp_mb__after_unlock_lock()`` invocations prevent this
> >  ``WARN_ON()`` from triggering.
> >  
> > ++-----------------------------------------------------------------------+
> > +| **Quick Quiz**:                                                       |
> > ++-----------------------------------------------------------------------+
> > +| But the whole chain of rcu_node-structure locking guarantees that     |
> > +| readers see all pre-grace-period accesses from the updater and        |
> > +| also guarantees that the updater to see all post-grace-period         |
> 
> Should it be either "that the updater see" or "the updater to see"?

Good catch, I have reworked this paragraph.

> > +| accesses from the readers.
> 
> Is it really post-grace-period that you meant here? The updater can't see
> the future. It's rather all reader accesses before the end of the grace period?

I have reworked this to talk about old and new readers on the one hand
and the updater's pre- and post-grace-period accesses on the other.

> >  So why do we need all of those calls      |
> > +| to smp_mb__after_unlock_lock()?                                       |
> > ++-----------------------------------------------------------------------+
> > +| **Answer**:                                                           |
> > ++-----------------------------------------------------------------------+
> > +| Because we must provide ordering for RCU's polling grace-period       |
> > +| primitives, for example, get_state_synchronize_rcu() and              |
> > +| poll_state_synchronize_rcu().  For example:                           |
> 
> Two times "for example" (sorry I'm nitpicking...)

But the example has two threads!

Kidding aside, I substituted "Consider this code" for the second
"For example".

> > +|                                                                       |
> > +| CPU 0                                     CPU 1                       |
> > +| ----                                      ----                        |
> > +| WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)            |
> > +| g = get_state_synchronize_rcu()           smp_mb()                    |
> > +| while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)           |
> > +|         continue;                                                     |
> > +| r0 = READ_ONCE(Y)                                                     |
> 
> Good point, it's a nice merge of the initial examples!

Glad you like it!

> > +|                                                                       |
> > +| RCU guarantees that that the outcome r0 == 0 && r1 == 0 will not      |
> 
> One "that" has to die here.

Can we instead show clemency and banish it to some other paragraph?

> > +| happen, even if CPU 1 is in an RCU extended quiescent state (idle     |
> > +| or offline) and thus won't interact directly with the RCU core        |
> > +| processing at all.                                                    |
> 
> Thanks a lot!

Glad to help, and I will reach out to you should someone make the mistake
of insisting that I write something in French.  ;-)

> > ++-----------------------------------------------------------------------+
> > +
> >  This approach must be extended to include idle CPUs, which need
> >  RCU's grace-period memory ordering guarantee to extend to any
> >  RCU read-side critical sections preceding and following the current

How about like this?

+-----------------------------------------------------------------------+
| **Quick Quiz**:                                                       |
+-----------------------------------------------------------------------+
| But the chain of rcu_node-structure lock acquisitions guarantees      |
| that new readers will see all of the updater's pre-grace-period       |
| accesses and also guarantees that the updater's post-grace-period     |
| accesses will see all of the old readers' accesses.  So why do we     |
| need all of those calls to smp_mb__after_unlock_lock()?               |
+-----------------------------------------------------------------------+
| **Answer**:                                                           |
+-----------------------------------------------------------------------+
| Because we must provide ordering for RCU's polling grace-period       |
| primitives, for example, get_state_synchronize_rcu() and              |
| poll_state_synchronize_rcu().  Consider this code::                   |
|                                                                       |
|  CPU 0                                     CPU 1                      |
|  ----                                      ----                       |
|  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
|  g = get_state_synchronize_rcu()           smp_mb()                   |
|  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
|          continue;                                                    |
|  r0 = READ_ONCE(Y)                                                    |
|                                                                       |
| RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
| happen, even if CPU 1 is in an RCU extended quiescent state           |
| (idle or offline) and thus won't interact directly with the RCU       |
| core processing at all.                                               |
+-----------------------------------------------------------------------+

							Thanx, Paul
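
The litmus test in the quick quiz above has the classic store-buffering
shape, so its ordering requirement can be approximated in user space.
The sketch below is only an analogue, not kernel code: C11 relaxed atomics
stand in for WRITE_ONCE()/READ_ONCE(), and a sequentially consistent fence
stands in both for smp_mb() on CPU 1 and, by assumption, for the full
ordering that the polled grace period provides on CPU 0.  The names
cpu0(), cpu1() and count_forbidden_outcomes() are made up for
illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Shared variables of the litmus test, reset to zero before each trial. */
static atomic_int X, Y;
/* Per-thread results, read by the parent only after pthread_join(). */
static int r0, r1;

/* Plays CPU 0: store to X, full fence, load from Y. */
static void *cpu0(void *unused)
{
	(void)unused;
	atomic_store_explicit(&X, 1, memory_order_relaxed);
	/* Assumption: stands in for the grace period's full ordering. */
	atomic_thread_fence(memory_order_seq_cst);
	r0 = atomic_load_explicit(&Y, memory_order_relaxed);
	return NULL;
}

/* Plays CPU 1: store to Y, full fence (smp_mb() analogue), load from X. */
static void *cpu1(void *unused)
{
	(void)unused;
	atomic_store_explicit(&Y, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);
	r1 = atomic_load_explicit(&X, memory_order_relaxed);
	return NULL;
}

/* Run the test `trials` times and count r0 == 0 && r1 == 0 outcomes. */
int count_forbidden_outcomes(int trials)
{
	int forbidden = 0;

	for (int i = 0; i < trials; i++) {
		pthread_t t0, t1;
		int rc;

		atomic_store(&X, 0);
		atomic_store(&Y, 0);

		rc = pthread_create(&t0, NULL, cpu0, NULL);
		assert(rc == 0);
		rc = pthread_create(&t1, NULL, cpu1, NULL);
		assert(rc == 0);
		(void)rc;

		pthread_join(t0, NULL);
		pthread_join(t1, NULL);

		if (r0 == 0 && r1 == 0)
			forbidden++;
	}
	return forbidden;
}
```

With a full fence on both sides, the C11 memory model forbids both loads
from returning zero, so count_forbidden_outcomes() returns 0 for any
number of trials.  Dropping the fence from cpu0() removes that guarantee,
which mirrors what the quick quiz says would be lost if the grace-period
machinery relied on lock-based ordering alone.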

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Frederic Weisbecker 4 days, 7 hours ago
On Fri, Jun 11, 2021 at 10:25:14AM -0700, Paul E. McKenney wrote:
> On Fri, Jun 11, 2021 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> Glad to help, and I will reach out to you should someone make the mistake
> of insisting that I write something in French.  ;-)

If that can help, we still have frenglish for neutral territories such as airports.
Not easy to master though...

> 
> > > ++-----------------------------------------------------------------------+
> > > +
> > >  This approach must be extended to include idle CPUs, which need
> > >  RCU's grace-period memory ordering guarantee to extend to any
> > >  RCU read-side critical sections preceding and following the current
> 
> How about like this?
> 
> +-----------------------------------------------------------------------+
> | **Quick Quiz**:                                                       |
> +-----------------------------------------------------------------------+
> | But the chain of rcu_node-structure lock acquisitions guarantees      |
> | that new readers will see all of the updater's pre-grace-period       |
> | accesses and also guarantees that the updater's post-grace-period     |
> | accesses will see all of the old readers' accesses.  So why do we     |
> | need all of those calls to smp_mb__after_unlock_lock()?               |
> +-----------------------------------------------------------------------+
> | **Answer**:                                                           |
> +-----------------------------------------------------------------------+
> | Because we must provide ordering for RCU's polling grace-period       |
> | primitives, for example, get_state_synchronize_rcu() and              |
> | poll_state_synchronize_rcu().  Consider this code::                   |
> |                                                                       |
> |  CPU 0                                     CPU 1                      |
> |  ----                                      ----                       |
> |  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
> |  g = get_state_synchronize_rcu()           smp_mb()                   |
> |  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
> |          continue;                                                    |
> |  r0 = READ_ONCE(Y)                                                    |
> |                                                                       |
> | RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
> | happen, even if CPU 1 is in an RCU extended quiescent state           |
> | (idle or offline) and thus won't interact directly with the RCU       |
> | core processing at all.                                               |
> +-----------------------------------------------------------------------+

Very good, thanks a lot :o)

Re: [PATCH] rcu/doc: Add a quick quiz to explain further why we need smp_mb__after_unlock_lock()

Posted by Paul E. McKenney 4 days, 6 hours ago
On Sat, Jun 12, 2021 at 12:45:17AM +0200, Frederic Weisbecker wrote:
> On Fri, Jun 11, 2021 at 10:25:14AM -0700, Paul E. McKenney wrote:
> > On Fri, Jun 11, 2021 at 12:34:32PM +0200, Frederic Weisbecker wrote:
> > Glad to help, and I will reach out to you should someone make the mistake
> > of insisting that I write something in French.  ;-)
> 
> If that can help, we still have frenglish for neutral territories such as airports.
> Not easy to master though...

That does sound dangerous!  ;-)

> > > > ++-----------------------------------------------------------------------+
> > > > +
> > > >  This approach must be extended to include idle CPUs, which need
> > > >  RCU's grace-period memory ordering guarantee to extend to any
> > > >  RCU read-side critical sections preceding and following the current
> > 
> > How about like this?
> > 
> > +-----------------------------------------------------------------------+
> > | **Quick Quiz**:                                                       |
> > +-----------------------------------------------------------------------+
> > | But the chain of rcu_node-structure lock acquisitions guarantees      |
> > | that new readers will see all of the updater's pre-grace-period       |
> > | accesses and also guarantees that the updater's post-grace-period     |
> > | accesses will see all of the old readers' accesses.  So why do we     |
> > | need all of those calls to smp_mb__after_unlock_lock()?               |
> > +-----------------------------------------------------------------------+
> > | **Answer**:                                                           |
> > +-----------------------------------------------------------------------+
> > | Because we must provide ordering for RCU's polling grace-period       |
> > | primitives, for example, get_state_synchronize_rcu() and              |
> > | poll_state_synchronize_rcu().  Consider this code::                   |
> > |                                                                       |
> > |  CPU 0                                     CPU 1                      |
> > |  ----                                      ----                       |
> > |  WRITE_ONCE(X, 1)                          WRITE_ONCE(Y, 1)           |
> > |  g = get_state_synchronize_rcu()           smp_mb()                   |
> > |  while (!poll_state_synchronize_rcu(g))    r1 = READ_ONCE(X)          |
> > |          continue;                                                    |
> > |  r0 = READ_ONCE(Y)                                                    |
> > |                                                                       |
> > | RCU guarantees that the outcome r0 == 0 && r1 == 0 will not           |
> > | happen, even if CPU 1 is in an RCU extended quiescent state           |
> > | (idle or offline) and thus won't interact directly with the RCU       |
> > | core processing at all.                                               |
> > +-----------------------------------------------------------------------+
> 
> Very good, thanks a lot :o)

And thank you!

							Thanx, Paul