[libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread

John Ferlan posted 1 patch 5 years, 5 months ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/libvirt tags/patchew/20181101174839.14863-1-jferlan@redhat.com
src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
1 file changed, 28 insertions(+)
[libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by John Ferlan 5 years, 5 months ago
Commit cdbe1332 neglected to document the API. So let's add some
details about the algorithm and why it was used to help future
readers understand the issues encountered. Based largely off a
description provided by Erik Skultety <eskultet@redhat.com>.

NB: Management of the processing udev device notification is a
delicate balance between the udev process, the scheduler, and when
exactly the data from/for the socket is received. The balance is
particularly important for environments when multiple devices are
added into the system more or less simultaneously such as is done
for mdev. In these cases scheduler blocking on the udev recv() call
is more noticeable. It's expected that future devices will follow
similar algorithms. Even though the algorithm does present some
challenges for older OS's (such as Centos 6), trying to rewrite
the algorithm to fit both models would be more complex and involve
pulling the monitor object out of the private data lockable object
and would need to be guarded by a separate lock. Devising such an
algorithm to work around issues with older OS's at the expense
of more modern OS algorithms in newer event processing code may
result in unexpected issues, so the choice is to encourage use
of newer OS's with newer udev event processing code.

Signed-off-by: John Ferlan <jferlan@redhat.com>
---

 v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html

 Changes are from code review with some minor tweaks/editing as I
 went along. Mistakes are my own!

 src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 22897591de..f2c2299d4d 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
 }
 
 
+/**
+ * udevEventHandleThread
+ * @opaque: unused
+ *
+ * Thread to handle the udevEventHandleCallback processing when udev
+ * tells us there's a device change for us (add, modify, delete, etc).
+ *
+ * Once notified there is data to be processed, the actual @device
+ * data retrieval by libudev may be delayed due to how threads are
+ * scheduled. In fact, the event loop could be scheduled earlier than
+ * the handler thread, thus potentially emitting the very same event
+ * the handler thread is currently trying to process, simply because
+ * the data hadn't been retrieved from the socket.
+ *
+ * NB: Usage of event based socket algorithm causes some issues with
+ * older platforms such as CentOS 6 in which the data socket is opened
+ * without the NONBLOCK bit set. That can be avoided by moving the reset
+ * priv->dataReady to just after the udev_monitor_receive_device; however,
+ * scheduler issues would still come into play.
+ */
 static void
 udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 {
@@ -1637,6 +1657,9 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
                 return;
             }
 
+            /* Trying to move the reset of the @priv->dataReady flag to
+             * after the udev_monitor_receive_device wouldn't help much
+             * due to event mgmt and scheduler timing. */
             virObjectLock(priv);
             priv->dataReady = false;
             virObjectUnlock(priv);
@@ -1646,6 +1669,11 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
 
         udevHandleOneDevice(device);
         udev_device_unref(device);
+
+        /* Instead of waiting for the next event after processing @device
+         * data, let's keep reading from the udev monitor and only wait
+         * for the next event once either a EAGAIN or a EWOULDBLOCK error
+         * is encountered. */
     }
 }
 
-- 
2.17.2

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by Erik Skultety 5 years, 5 months ago
On Thu, Nov 01, 2018 at 01:48:39PM -0400, John Ferlan wrote:
> Commit cdbe1332 neglected to document the API. So let's add some
> details about the algorithm and why it was used to help future
> readers understand the issues encountered. Based largely off a
> description provided by Erik Skultety <eskultet@redhat.com>.

Oh, I missed ^this last sentence. Since you're blaming the commit already and
the description based on a mailing list thread will not be part of the log, you
should IMHO drop it.

>
> NB: Management of the processing udev device notification is a
> delicate balance between the udev process, the scheduler, and when
> exactly the data from/for the socket is received. The balance is
> particularly important for environments when multiple devices are
> added into the system more or less simultaneously such as is done
> for mdev.

"or SRIOV"

Note: I can't really remember what I used for reproducer, mdevs or a SRIOV
card.

> In these cases scheduler blocking on the udev recv() call

I don't think scheduler is blocking anywhere, it's rather old libudev blocking
on the recv call ;)

> is more noticeable. It's expected that future devices will follow
> similar algorithms. Even though the algorithm does present some
> challenges for older OS's (such as Centos 6), trying to rewrite
> the algorithm to fit both models would be more complex and involve
> pulling the monitor object out of the private data lockable object
> and would need to be guarded by a separate lock. Devising such an
> algorithm to work around issues with older OS's at the expense
> of more modern OS algorithms in newer event processing code may
> result in unexpected issues, so the choice is to encourage use
> of newer OS's with newer udev event processing code.
>
> Signed-off-by: John Ferlan <jferlan@redhat.com>
> ---
>
>  v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html
>
>  Changes are from code review with some minor tweaks/editing as I
>  went along. Mistakes are my own!
>
>  src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
>  1 file changed, 28 insertions(+)
>
> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> index 22897591de..f2c2299d4d 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
>  }
>
>
> +/**
> + * udevEventHandleThread
> + * @opaque: unused
> + *
> + * Thread to handle the udevEventHandleCallback processing when udev
> + * tells us there's a device change for us (add, modify, delete, etc).
> + *
> + * Once notified there is data to be processed, the actual @device
> + * data retrieval by libudev may be delayed due to how threads are
> + * scheduled. In fact, the event loop could be scheduled earlier than
> + * the handler thread, thus potentially emitting the very same event
> + * the handler thread is currently trying to process, simply because
> + * the data hadn't been retrieved from the socket.
> + *

...

> + * NB: Usage of event based socket algorithm causes some issues with
> + * older platforms such as CentOS 6 in which the data socket is opened

^Sounds a bit too generic, as an event based algorithm is not forced to open a
socket without O_NONBLOCK - just put something like "older platforms' libudev
opens sockets without the NONBLOCK flag which might cause issues with event
based algorithm" or something alike in there.

otherwise looks okay, I don't think a v3 is necessary:
Reviewed-by: Erik Skultety <eskultet@redhat.com>

> + * without the NONBLOCK bit set. That can be avoided by moving the reset
> + * priv->dataReady to just after the udev_monitor_receive_device; however,
> + * scheduler issues would still come into play.
> + */
>  static void
>  udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>  {
> @@ -1637,6 +1657,9 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>                  return;
>              }
>
> +            /* Trying to move the reset of the @priv->dataReady flag to
> +             * after the udev_monitor_receive_device wouldn't help much
> +             * due to event mgmt and scheduler timing. */
>              virObjectLock(priv);
>              priv->dataReady = false;
>              virObjectUnlock(priv);
> @@ -1646,6 +1669,11 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>
>          udevHandleOneDevice(device);
>          udev_device_unref(device);
> +
> +        /* Instead of waiting for the next event after processing @device
> +         * data, let's keep reading from the udev monitor and only wait
> +         * for the next event once either a EAGAIN or a EWOULDBLOCK error
> +         * is encountered. */
>      }
>  }
>
> --
> 2.17.2
>
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by John Ferlan 5 years, 5 months ago

On 11/2/18 3:48 AM, Erik Skultety wrote:
> On Thu, Nov 01, 2018 at 01:48:39PM -0400, John Ferlan wrote:
>> Commit cdbe1332 neglected to document the API. So let's add some
>> details about the algorithm and why it was used to help future
>> readers understand the issues encountered. Based largely off a
>> description provided by Erik Skultety <eskultet@redhat.com>.
> 
> Oh, I missed ^this last sentence. Since you're blaming the commit already and
> the description based on a mailing list thread will not be part of the log, you
> should IMHO drop it.
> 
>>
>> NB: Management of the processing udev device notification is a
>> delicate balance between the udev process, the scheduler, and when
>> exactly the data from/for the socket is received. The balance is
>> particularly important for environments when multiple devices are
>> added into the system more or less simultaneously such as is done
>> for mdev.
> 
> "or SRIOV"
> 
> Note: I can't really remember what I used for reproducer, mdevs or a SRIOV
> card.
> 
>> In these cases scheduler blocking on the udev recv() call
> 
> I don't think scheduler is blocking anywhere, it's rather old libudev blocking
> on the recv call ;)
> 
>> is more noticeable. It's expected that future devices will follow
>> similar algorithms. Even though the algorithm does present some
>> challenges for older OS's (such as Centos 6), trying to rewrite
>> the algorithm to fit both models would be more complex and involve
>> pulling the monitor object out of the private data lockable object
>> and would need to be guarded by a separate lock. Devising such an
>> algorithm to work around issues with older OS's at the expense
>> of more modern OS algorithms in newer event processing code may
>> result in unexpected issues, so the choice is to encourage use
>> of newer OS's with newer udev event processing code.
>>
>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>> ---
>>
>>  v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html
>>
>>  Changes are from code review with some minor tweaks/editing as I
>>  went along. Mistakes are my own!
>>
>>  src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
>>  1 file changed, 28 insertions(+)
>>
>> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
>> index 22897591de..f2c2299d4d 100644
>> --- a/src/node_device/node_device_udev.c
>> +++ b/src/node_device/node_device_udev.c
>> @@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
>>  }
>>
>>
>> +/**
>> + * udevEventHandleThread
>> + * @opaque: unused
>> + *
>> + * Thread to handle the udevEventHandleCallback processing when udev
>> + * tells us there's a device change for us (add, modify, delete, etc).
>> + *
>> + * Once notified there is data to be processed, the actual @device
>> + * data retrieval by libudev may be delayed due to how threads are
>> + * scheduled. In fact, the event loop could be scheduled earlier than
>> + * the handler thread, thus potentially emitting the very same event
>> + * the handler thread is currently trying to process, simply because
>> + * the data hadn't been retrieved from the socket.
>> + *
> 
> ...
> 
>> + * NB: Usage of event based socket algorithm causes some issues with
>> + * older platforms such as CentOS 6 in which the data socket is opened
> 
> ^Sounds a bit too generic, as an event based algorithm is not forced to open a
> socket without O_NONBLOCK - just put something like "older platforms' libudev
> opens sockets without the NONBLOCK flag which might cause issues with event
> based algorithm" or something alike in there.
> 

 * NB: Some older distros, such as CentOS 6, libudev opens sockets
 * without the NONBLOCK flag which might cause issues with event
 * based algorithm. Although the issue can be mitigated by resetting
 * priv->dataReady for each event found; however, the scheduler issues
 * would still come into play.

Should I drop the "Although the issue..." as well? IDC... mainly trying
to avoid the "trap" of patches looking to fix older distros...

Tks -

John


> otherwise looks okay, I don't think a v3 is necessary:
> Reviewed-by: Erik Skultety <eskultet@redhat.com>
> 
>> + * without the NONBLOCK bit set. That can be avoided by moving the reset
>> + * priv->dataReady to just after the udev_monitor_receive_device; however,
>> + * scheduler issues would still come into play.
>> + */
>>  static void
>>  udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>>  {
>> @@ -1637,6 +1657,9 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>>                  return;
>>              }
>>
>> +            /* Trying to move the reset of the @priv->dataReady flag to
>> +             * after the udev_monitor_receive_device wouldn't help much
>> +             * due to event mgmt and scheduler timing. */
>>              virObjectLock(priv);
>>              priv->dataReady = false;
>>              virObjectUnlock(priv);
>> @@ -1646,6 +1669,11 @@ udevEventHandleThread(void *opaque ATTRIBUTE_UNUSED)
>>
>>          udevHandleOneDevice(device);
>>          udev_device_unref(device);
>> +
>> +        /* Instead of waiting for the next event after processing @device
>> +         * data, let's keep reading from the udev monitor and only wait
>> +         * for the next event once either a EAGAIN or a EWOULDBLOCK error
>> +         * is encountered. */
>>      }
>>  }
>>
>> --
>> 2.17.2
>>
>> --
>> libvir-list mailing list
>> libvir-list@redhat.com
>> https://www.redhat.com/mailman/listinfo/libvir-list

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by Erik Skultety 5 years, 5 months ago
On Fri, Nov 02, 2018 at 07:46:35AM -0400, John Ferlan wrote:
>
>
> On 11/2/18 3:48 AM, Erik Skultety wrote:
> > On Thu, Nov 01, 2018 at 01:48:39PM -0400, John Ferlan wrote:
> >> Commit cdbe1332 neglected to document the API. So let's add some
> >> details about the algorithm and why it was used to help future
> >> readers understand the issues encountered. Based largely off a
> >> description provided by Erik Skultety <eskultet@redhat.com>.
> >
> > Oh, I missed ^this last sentence. Since you're blaming the commit already and
> > the description based on a mailing list thread will not be part of the log, you
> > should IMHO drop it.
> >
> >>
> >> NB: Management of the processing udev device notification is a
> >> delicate balance between the udev process, the scheduler, and when
> >> exactly the data from/for the socket is received. The balance is
> >> particularly important for environments when multiple devices are
> >> added into the system more or less simultaneously such as is done
> >> for mdev.
> >
> > "or SRIOV"
> >
> > Note: I can't really remember what I used for reproducer, mdevs or a SRIOV
> > card.
> >
> >> In these cases scheduler blocking on the udev recv() call
> >
> > I don't think scheduler is blocking anywhere, it's rather old libudev blocking
> > on the recv call ;)
> >
> >> is more noticeable. It's expected that future devices will follow
> >> similar algorithms. Even though the algorithm does present some
> >> challenges for older OS's (such as Centos 6), trying to rewrite
> >> the algorithm to fit both models would be more complex and involve
> >> pulling the monitor object out of the private data lockable object
> >> and would need to be guarded by a separate lock. Devising such an
> >> algorithm to work around issues with older OS's at the expense
> >> of more modern OS algorithms in newer event processing code may
> >> result in unexpected issues, so the choice is to encourage use
> >> of newer OS's with newer udev event processing code.
> >>
> >> Signed-off-by: John Ferlan <jferlan@redhat.com>
> >> ---
> >>
> >>  v1: https://www.redhat.com/archives/libvir-list/2018-October/msg01053.html
> >>
> >>  Changes are from code review with some minor tweaks/editing as I
> >>  went along. Mistakes are my own!
> >>
> >>  src/node_device/node_device_udev.c | 28 ++++++++++++++++++++++++++++
> >>  1 file changed, 28 insertions(+)
> >>
> >> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> >> index 22897591de..f2c2299d4d 100644
> >> --- a/src/node_device/node_device_udev.c
> >> +++ b/src/node_device/node_device_udev.c
> >> @@ -1591,6 +1591,26 @@ udevEventMonitorSanityCheck(udevEventDataPtr priv,
> >>  }
> >>
> >>
> >> +/**
> >> + * udevEventHandleThread
> >> + * @opaque: unused
> >> + *
> >> + * Thread to handle the udevEventHandleCallback processing when udev
> >> + * tells us there's a device change for us (add, modify, delete, etc).
> >> + *
> >> + * Once notified there is data to be processed, the actual @device
> >> + * data retrieval by libudev may be delayed due to how threads are
> >> + * scheduled. In fact, the event loop could be scheduled earlier than
> >> + * the handler thread, thus potentially emitting the very same event
> >> + * the handler thread is currently trying to process, simply because
> >> + * the data hadn't been retrieved from the socket.
> >> + *
> >
> > ...
> >
> >> + * NB: Usage of event based socket algorithm causes some issues with
> >> + * older platforms such as CentOS 6 in which the data socket is opened
> >
> > ^Sounds a bit too generic, as an event based algorithm is not forced to open a
> > socket without O_NONBLOCK - just put something like "older platforms' libudev
> > opens sockets without the NONBLOCK flag which might cause issues with event
> > based algorithm" or something alike in there.
> >
>
>  * NB: Some older distros, such as CentOS 6, libudev opens sockets
>  * without the NONBLOCK flag which might cause issues with event
>  * based algorithm. Although the issue can be mitigated by resetting
>  * priv->dataReady for each event found; however, the scheduler issues
>  * would still come into play.

Sounds good...

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by Andrea Bolognani 5 years, 5 months ago
On Fri, 2018-11-02 at 07:46 -0400, John Ferlan wrote:
> > > + * NB: Usage of event based socket algorithm causes some issues with
> > > + * older platforms such as CentOS 6 in which the data socket is opened
> > 
> > ^Sounds a bit too generic, as an event based algorithm is not forced to open a
> > socket without O_NONBLOCK - just put something like "older platforms' libudev
> > opens sockets without the NONBLOCK flag which might cause issues with event
> > based algorithm" or something alike in there.
> 
>  * NB: Some older distros, such as CentOS 6, libudev opens sockets
>  * without the NONBLOCK flag which might cause issues with event
>  * based algorithm. Although the issue can be mitigated by resetting
>  * priv->dataReady for each event found; however, the scheduler issues
>  * would still come into play.

I haven't looked into this at all so I might be missing something
entirely obvious, but I'd just like to point out that we no longer
target RHEL 6 / CentOS 6, so any issue that's specific to those
platforms we no longer have to concern ourselves with.

-- 
Andrea Bolognani / Red Hat / Virtualization

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v2] nodedev: Document the udevEventHandleThread
Posted by Erik Skultety 5 years, 5 months ago
On Fri, Nov 02, 2018 at 01:33:32PM +0100, Andrea Bolognani wrote:
> On Fri, 2018-11-02 at 07:46 -0400, John Ferlan wrote:
> > > > + * NB: Usage of event based socket algorithm causes some issues with
> > > > + * older platforms such as CentOS 6 in which the data socket is opened
> > >
> > > ^Sounds a bit too generic, as an event based algorithm is not forced to open a
> > > socket without O_NONBLOCK - just put something like "older platforms' libudev
> > > opens sockets without the NONBLOCK flag which might cause issues with event
> > > based algorithm" or something alike in there.
> >
> >  * NB: Some older distros, such as CentOS 6, libudev opens sockets
> >  * without the NONBLOCK flag which might cause issues with event
> >  * based algorithm. Although the issue can be mitigated by resetting
> >  * priv->dataReady for each event found; however, the scheduler issues
> >  * would still come into play.
>
> I haven't looked into this at all so I might be missing something
> entirely obvious, but I'd just like to point out that we no longer
> target RHEL 6 / CentOS 6, so any issue that's specific to those
> platforms we no longer have to concern ourselves with.

Right, I pointed that out in my review for the patch which was the impetus for
this patch. But John's right here that it's better we document the reasons for
the algorithm used which might save us from NACKing any further attempts to
change the code to suit old platforms, actually we might benefit from the
description ourselves at some point in the future, so I'm fine with that.

Erik

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list