[libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd

[libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by Erik Skultety 8 years, 2 months ago
The event loop may get scheduled earlier than the udev event handler
thread, which means that it would keep invoking the handler callback with
"new" events, while in fact it's most likely still the same event which
the handler thread hasn't managed to remove from the socket queue yet.
This leads to unwanted increments of the number of events queuing on the
socket and causes the handler thread to spam the logs with an error about
libudev returning NULL (errno == EAGAIN).

Signed-off-by: Erik Skultety <eskultet@redhat.com>
---
 src/node_device/node_device_udev.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
index 96760d1e4..70e15ffb8 100644
--- a/src/node_device/node_device_udev.c
+++ b/src/node_device/node_device_udev.c
@@ -1696,6 +1696,7 @@ udevEventCheckMonitorFD(struct udev_monitor *udev_monitor,
 static void
 udevEventHandleThread(void *opaque)
 {
+    udevPrivate *priv = NULL;
     udevEventThreadDataPtr privateData = opaque;
     struct udev_device *device = NULL;
     struct udev_monitor *udev_monitor = NULL;
@@ -1715,6 +1716,7 @@ udevEventHandleThread(void *opaque)
         virMutexUnlock(&privateData->lock);
 
         nodeDeviceLock();
+        priv = driver->privateData;
         udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
 
         if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
@@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
         device = udev_monitor_receive_device(udev_monitor);
         nodeDeviceUnlock();
 
+        /* Re-enable polling for new events on the @udev_monitor */
+        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
+
         if (!device) {
             virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
                            _("udev_monitor_receive_device returned NULL"));
@@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
                         int events ATTRIBUTE_UNUSED,
                         void *opaque)
 {
+    udevPrivate *priv = NULL;
     struct udev_monitor *udev_monitor = NULL;
     udevEventThreadDataPtr threadData = opaque;
 
     nodeDeviceLock();
+    priv = driver->privateData;
     udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
 
     if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
@@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
     threadData->nevents++;
     virCondSignal(&threadData->threadCond);
     virMutexUnlock(&threadData->lock);
+
+    /* Due to scheduling, the eventloop might poll the udev monitor before the
+     * handler thread actually removes the data from the socket, thus causing it
+     * to spam errors about data not being ready yet, because
+     * udevEventHandleCallback would be invoked multiple times incrementing the
+     * counter of events queuing for a single event.
+     * Therefore we need to disable polling on the fd until the thread actually
+     * removes the data from the socket.
+     */
+    virEventUpdateHandle(priv->watch, 0);
 }
 
 
-- 
2.13.5

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by John Ferlan 8 years, 2 months ago

On 09/18/2017 12:34 PM, Erik Skultety wrote:
> The event loop may get scheduled earlier than the udev event handler
> thread, which means that it would keep invoking the handler callback with
> "new" events, while in fact it's most likely still the same event which
> the handler thread hasn't managed to remove from the socket queue yet.
> This leads to unwanted increments of the number of events queuing on the
> socket and causes the handler thread to spam the logs with an error about
> libudev returning NULL (errno == EAGAIN).
> 
> Signed-off-by: Erik Skultety <eskultet@redhat.com>
> ---
>  src/node_device/node_device_udev.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
> 
> diff --git a/src/node_device/node_device_udev.c b/src/node_device/node_device_udev.c
> index 96760d1e4..70e15ffb8 100644
> --- a/src/node_device/node_device_udev.c
> +++ b/src/node_device/node_device_udev.c
> @@ -1696,6 +1696,7 @@ udevEventCheckMonitorFD(struct udev_monitor *udev_monitor,
>  static void
>  udevEventHandleThread(void *opaque)
>  {
> +    udevPrivate *priv = NULL;
>      udevEventThreadDataPtr privateData = opaque;
>      struct udev_device *device = NULL;
>      struct udev_monitor *udev_monitor = NULL;
> @@ -1715,6 +1716,7 @@ udevEventHandleThread(void *opaque)
>          virMutexUnlock(&privateData->lock);
>  
>          nodeDeviceLock();
> +        priv = driver->privateData;
>          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>  
>          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
>          device = udev_monitor_receive_device(udev_monitor);
>          nodeDeviceUnlock();
>  
> +        /* Re-enable polling for new events on the @udev_monitor */
> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
> +

I think this should only be done when privateData->nevents == 0?  If we
have multiple events to read, then calling virEventPollUpdateHandle
(eventually) for every pass through the loop seems like a bit of
overkill, especially if udevEventHandleCallback turns right around and
disables it again.
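
Roughly what I'm picturing - an untested sketch, reusing the names from
this patch (the @nevents check being the only new bit):

    /* in udevEventHandleThread, after pulling a device off the monitor:
     * only re-enable polling once there's nothing left on the socket
     * that we already know about */
    virMutexLock(&privateData->lock);
    if (privateData->nevents == 0)
        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
    virMutexUnlock(&privateData->lock);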

Also fortunately there isn't more than one udev thread sending the
events since you access the priv->watch without the driver lock...

Conversely perhaps we only disable if events > 1... Then again, how does
one get to 2 events queued if we're disabling as soon as we increment
nevents? Wouldn't this totally obviate the need for nevents?

I would think it would be overkill to disable/enable for just 1 event.
If multiple are queued, then yeah we're starting to get behind... Of
course that could affect the re-enable when events == 1 (or some
MAGIC_CONSTANT_NUMBER_THRESHOLD_TO_DISABLE_POLLING)

IIRC from the previous trip down this rabbit hole (I did briefly go back
and read the comments), what you're trying to avoid is the following
message spamming the error logs because of a scheduling mishap... Thus
perhaps the following message could only be displayed if nevents > 1 and
nothing was found knowing that we have this "timing problem" between
udev, this thread, and the fd polling scanner of when a device should be
"found"...  and then reset nevents == 0 with the appropriate lock.

Curious on your thoughts before an R-B/ACK though.

John


>          if (!device) {
>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>                             _("udev_monitor_receive_device returned NULL"));
> @@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>                          int events ATTRIBUTE_UNUSED,
>                          void *opaque)
>  {
> +    udevPrivate *priv = NULL;
>      struct udev_monitor *udev_monitor = NULL;
>      udevEventThreadDataPtr threadData = opaque;
>  
>      nodeDeviceLock();
> +    priv = driver->privateData;
>      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>  
>      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
> @@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>      threadData->nevents++;
>      virCondSignal(&threadData->threadCond);
>      virMutexUnlock(&threadData->lock);
> +
> +    /* Due to scheduling, the eventloop might poll the udev monitor before the
> +     * handler thread actually removes the data from the socket, thus causing it
> +     * to spam errors about data not being ready yet, because
> +     * udevEventHandleCallback would be invoked multiple times incrementing the
> +     * counter of events queuing for a single event.
> +     * Therefore we need to disable polling on the fd until the thread actually
> +     * removes the data from the socket.
> +     */
> +    virEventUpdateHandle(priv->watch, 0);
>  }
>  
>  
> 

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by Erik Skultety 8 years, 2 months ago
[...]

> >
> >          nodeDeviceLock();
> > +        priv = driver->privateData;
> >          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> >
> >          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
> > @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
> >          device = udev_monitor_receive_device(udev_monitor);
> >          nodeDeviceUnlock();
> >
> > +        /* Re-enable polling for new events on the @udev_monitor */
> > +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
> > +
>
> I think this should only be done when privateData->nevents == 0?  If we
> have multiple events to read, then calling virEventPollUpdateHandle
> (eventually) for every pass through the loop seems like a bit of
> overkill, especially if udevEventHandleCallback turns right around and
> disables it again.
>
> Also fortunately there isn't more than one udev thread sending the
> events since you access the priv->watch without the driver lock...
>
> Conversely perhaps we only disable if events > 1... Then again, how does
> one get to 2 events queued if we're disabling as soon as we increment

Very good point - technically events would still get queued, we just wouldn't
check, and yes, we would process 1 event at a time. Not optimal, but if you look
at the original code and compare it with this one performance-wise (and I hope
I haven't missed anything that would render everything I write next complete
rubbish), the time complexity hasn't changed, the space complexity hasn't
changed; what changed is code complexity, which makes the code a bit slower due
to the excessive locking and toggling the FD polling back and forth. So
essentially you've got the same thing as you had before... but asynchronous.
However, yes, the usage of @nevents is completely useless now (hadn't realized
that immediately, thanks) and a simple signalling should suffice.

So how could we make it faster though? I thought more about the idea you shared
in one of the previous reviews, letting the thread actually pull all the data
from the monitor, to which I IIRC replied something in the sense that the event
counting mechanism wouldn't allow that and it would break. Okay, let's drop the
event counting. What if we now let both the udev handler thread and the event
loop "poll" the file descriptor, IOW let the event loop keep polling the monitor
fd, thus invoking udevHandleCallback, which would in turn keep signalling the
handler thread that there are some data. The difference now would be that the
handler thread wouldn't blindly trust the callback about the data because of the
scheduling issue; instead, it would keep poking the monitor itself until it gets
either EAGAIN or EWOULDBLOCK from recvmsg() called in libudev, which would then
be the signal to start waiting on the condition. The next time an event appears,
a signal from udevHandleCallback would finally have a meaning and wouldn't be
ignored.

This way, it's actually the handler thread who's 'the boss', as most of the
signals from udevHandleCallback would be lost/NOPs.
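
In rough pseudo-C, the handler thread's loop would then look something
like this (untested sketch; @dataReady is a hypothetical flag replacing
@nevents in the thread data, and the locking around the monitor itself
is left out for brevity):

    while (1) {
        virMutexLock(&privateData->lock);
        while (!privateData->dataReady)
            virCondWait(&privateData->threadCond, &privateData->lock);
        virMutexUnlock(&privateData->lock);

        errno = 0;
        device = udev_monitor_receive_device(udev_monitor);

        if (!device) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* the queue is drained - only now does a signal from
                 * udevEventHandleCallback mean genuinely new data */
                virMutexLock(&privateData->lock);
                privateData->dataReady = false;
                virMutexUnlock(&privateData->lock);
            }
            /* other errnos would be reported as real errors */
            continue;
        }

        udevHandleOneDevice(device);
        udev_device_unref(device);
    }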

Would you agree to such an approach?

Erik

> nevents? Wouldn't this totally obviate the need for nevents?
>
> I would think it would be overkill to disable/enable for just 1 event.
> If multiple are queued, then yeah we're starting to get behind... Of
> course that could affect the re-enable when events == 1 (or some
> MAGIC_CONSTANT_NUMBER_THRESHOLD_TO_DISABLE_POLLING)

This magic threshold approach still wouldn't work because you don't know
anything about the event at that point; you'd have to process them to actually
find out whether that's a legitimate event or it's an already noted event that
hasn't been processed yet, so it would still mess with your counters.

>
> IIRC from the previous trip down this rabbit hole (I did briefly go back
> and read the comments), what you're trying to avoid is the following
> message spamming the error logs because of a scheduling mishap... Thus
> perhaps the following message could only be displayed if nevents > 1 and
> nothing was found knowing that we have this "timing problem" between
> udev, this thread, and the fd polling scanner of when a device should be
> "found"...  and then reset nevents == 0 with the appropriate lock.
>
> Curious on your thoughts before an R-B/ACK though.
>
> John
>
>
> >          if (!device) {
> >              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> >                             _("udev_monitor_receive_device returned NULL"));
> > @@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> >                          int events ATTRIBUTE_UNUSED,
> >                          void *opaque)
> >  {
> > +    udevPrivate *priv = NULL;
> >      struct udev_monitor *udev_monitor = NULL;
> >      udevEventThreadDataPtr threadData = opaque;
> >
> >      nodeDeviceLock();
> > +    priv = driver->privateData;
> >      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> >
> >      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
> > @@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
> >      threadData->nevents++;
> >      virCondSignal(&threadData->threadCond);
> >      virMutexUnlock(&threadData->lock);
> > +
> > +    /* Due to scheduling, the eventloop might poll the udev monitor before the
> > +     * handler thread actually removes the data from the socket, thus causing it
> > +     * to spam errors about data not being ready yet, because
> > +     * udevEventHandleCallback would be invoked multiple times incrementing the
> > +     * counter of events queuing for a single event.
> > +     * Therefore we need to disable polling on the fd until the thread actually
> > +     * removes the data from the socket.
> > +     */
> > +    virEventUpdateHandle(priv->watch, 0);
> >  }
> >
> >
> >

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by John Ferlan 8 years, 2 months ago

On 09/28/2017 06:00 AM, Erik Skultety wrote:
> [...]
> 
>>>
>>>          nodeDeviceLock();
>>> +        priv = driver->privateData;
>>>          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
>>> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
>>>          device = udev_monitor_receive_device(udev_monitor);
>>>          nodeDeviceUnlock();
>>>
>>> +        /* Re-enable polling for new events on the @udev_monitor */
>>> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
>>> +
>>
>> I think this should only be done when privateData->nevents == 0?  If we
>> have multiple events to read, then calling virEventPollUpdateHandle
>> (eventually) for every pass through the loop seems like a bit of
>> overkill, especially if udevEventHandleCallback turns right around and
>> disables it again.
>>
>> Also fortunately there isn't more than one udev thread sending the
>> events since you access the priv->watch without the driver lock...
>>
>> Conversely perhaps we only disable if events > 1... Then again, how does
>> one get to 2 events queued if we're disabling as soon as we increment
> 
> Very good point - technically events would still get queued, we just wouldn't
> check, and yes, we would process 1 event at a time. Not optimal, but if you look
> at the original code and compare it with this one performance-wise (and I hope
> I haven't missed anything that would render everything I write next complete
> rubbish), the time complexity hasn't changed, the space complexity hasn't
> changed; what changed is code complexity, which makes the code a bit slower due
> to the excessive locking and toggling the FD polling back and forth. So
> essentially you've got the same thing as you had before... but asynchronous.
> However, yes, the usage of @nevents is completely useless now (hadn't realized
> that immediately, thanks) and a simple signalling should suffice.

Having it "slower" is necessarily bad ;-)  That gives some of the other
slower buggers a chance to fill in the details we need. Throwing the
control back to udev quicker could aid in that too.

> 
> So how could we make it faster though? I thought more about the idea you shared
> in one of the previous reviews, letting the thread actually pull all the data
> from the monitor, to which I IIRC replied something in the sense that the event
> counting mechanism wouldn't allow that and it would break. Okay, let's drop the
> event counting. What if we now let both the udev handler thread and the event
> loop "poll" the file descriptor, IOW let the event loop keep polling the monitor
> fd, thus invoking udevHandleCallback, which would in turn keep signalling the
> handler thread that there are some data. The difference now would be that the
> handler thread wouldn't blindly trust the callback about the data because of the
> scheduling issue; instead, it would keep poking the monitor itself until it gets
> either EAGAIN or EWOULDBLOCK from recvmsg() called in libudev, which would then
> be the signal to start waiting on the condition. The next time an event appears,
> a signal from udevHandleCallback would finally have a meaning and wouldn't be
> ignored.

Is making it faster really a goal?  It's preferable that it works
consistently, I think. There are the various errno possibilities, and the
desire to avoid the "udev_monitor_receive_device returned NULL" message
processing because we had too many "cooks in the kitchen" trying to
determine whether a new device was really available or whether this was
just another notification for something we're already processing.

Also, not that I expect the udev code to change, but if a new errno is
added, then we may have to keep up... Always a fear, especially if we're
using the errno to dictate our algorithm.

> 
> This way, it's actually the handler thread who's 'the boss', as most of the
> signals from udevHandleCallback would be lost/NOPs.
> 
> Would you agree to such an approach?

At some point we get too fancy and think too hard about a problem
letting it consume us.  I think when I presented my idea - I wasn't
really understanding the details of how we consume the udev data. Now
that I have a bit more of a clue - perhaps the original idea wasn't so
good. If we receive multiple notifications that a device is ready to be
processed even though we could be processing it - how are we to truly
know whether we missed one or we really got it and udev was just
reminding us again?

I'm not against looking at a different mechanism - the question then
becomes from your perspective is it worth it?

John

> 
> Erik
> 
>> nevents? Wouldn't this totally obviate the need for nevents?
>>
>> I would think it would be overkill to disable/enable for just 1 event.
>> If multiple are queued, then yeah we're starting to get behind... Of
>> course that could affect the re-enable when events == 1 (or some
>> MAGIC_CONSTANT_NUMBER_THRESHOLD_TO_DISABLE_POLLING)
> 
> This magic threshold approach still wouldn't work because you don't know
> anything about the event at that point; you'd have to process them to actually
> find out whether that's a legitimate event or it's an already noted event that
> hasn't been processed yet, so it would still mess with your counters.
> 
>>
>> IIRC from the previous trip down this rabbit hole (I did briefly go back
>> and read the comments), what you're trying to avoid is the following
>> message spamming the error logs because of a scheduling mishap... Thus
>> perhaps the following message could only be displayed if nevents > 1 and
>> nothing was found knowing that we have this "timing problem" between
>> udev, this thread, and the fd polling scanner of when a device should be
>> "found"...  and then reset nevents == 0 with the appropriate lock.
>>
>> Curious on your thoughts before an R-B/ACK though.
>>
>> John
>>
>>
>>>          if (!device) {
>>>              virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
>>>                             _("udev_monitor_receive_device returned NULL"));
>>> @@ -1750,10 +1755,12 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>                          int events ATTRIBUTE_UNUSED,
>>>                          void *opaque)
>>>  {
>>> +    udevPrivate *priv = NULL;
>>>      struct udev_monitor *udev_monitor = NULL;
>>>      udevEventThreadDataPtr threadData = opaque;
>>>
>>>      nodeDeviceLock();
>>> +    priv = driver->privateData;
>>>      udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>
>>>      if (!udevEventCheckMonitorFD(udev_monitor, fd)) {
>>> @@ -1771,6 +1778,16 @@ udevEventHandleCallback(int watch ATTRIBUTE_UNUSED,
>>>      threadData->nevents++;
>>>      virCondSignal(&threadData->threadCond);
>>>      virMutexUnlock(&threadData->lock);
>>> +
>>> +    /* Due to scheduling, the eventloop might poll the udev monitor before the
>>> +     * handler thread actually removes the data from the socket, thus causing it
>>> +     * to spam errors about data not being ready yet, because
>>> +     * udevEventHandleCallback would be invoked multiple times incrementing the
>>> +     * counter of events queuing for a single event.
>>> +     * Therefore we need to disable polling on the fd until the thread actually
>>> +     * removes the data from the socket.
>>> +     */
>>> +    virEventUpdateHandle(priv->watch, 0);
>>>  }
>>>
>>>
>>>

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by Erik Skultety 8 years, 2 months ago
On Fri, Sep 29, 2017 at 09:46:48AM -0400, John Ferlan wrote:
>
>
> On 09/28/2017 06:00 AM, Erik Skultety wrote:
> > [...]
> >
> >>>
> >>>          nodeDeviceLock();
> >>> +        priv = driver->privateData;
> >>>          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
> >>>
> >>>          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
> >>> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
> >>>          device = udev_monitor_receive_device(udev_monitor);
> >>>          nodeDeviceUnlock();
> >>>
> >>> +        /* Re-enable polling for new events on the @udev_monitor */
> >>> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
> >>> +
> >>
> >> I think this should only be done when privateData->nevents == 0?  If we
> >> have multiple events to read, then calling virEventPollUpdateHandle
> >> (eventually) for every pass through the loop seems like a bit of
> >> overkill, especially if udevEventHandleCallback turns right around and
> >> disables it again.
> >>
> >> Also fortunately there isn't more than one udev thread sending the
> >> events since you access the priv->watch without the driver lock...
> >>
> >> Conversely perhaps we only disable if events > 1... Then again, how does
> >> one get to 2 events queued if we're disabling as soon as we increment
> >
> > Very good point - technically events would still get queued, we just wouldn't
> > check, and yes, we would process 1 event at a time. Not optimal, but if you look
> > at the original code and compare it with this one performance-wise (and I hope
> > I haven't missed anything that would render everything I write next complete
> > rubbish), the time complexity hasn't changed, the space complexity hasn't
> > changed; what changed is code complexity, which makes the code a bit slower due
> > to the excessive locking and toggling the FD polling back and forth. So
> > essentially you've got the same thing as you had before... but asynchronous.
> > However, yes, the usage of @nevents is completely useless now (hadn't realized
> > that immediately, thanks) and a simple signalling should suffice.
>
> Having it "slower" is necessarily bad ;-)  That gives some of the other
> slower buggers a chance to fill in the details we need. Throwing the
> control back to udev quicker could aid in that too.

I'm sorry, but I don't follow - how do you hand control back over to something
you can't control? udev is not paused in any way during the phase libvirt is
processing the events, so it keeps pushing new events to the socket queue
until it can't push any more.

>
> >
> > So how could we make it faster though? I thought more about the idea you shared
> > in one of the previous reviews, letting the thread actually pull all the data
> > from the monitor, to which I IIRC replied something in the sense that the event
> > counting mechanism wouldn't allow that and it would break. Okay, let's drop the
> > event counting. What if we now let both the udev handler thread and the event
> > loop "poll" the file descriptor, IOW let the event loop keep polling the monitor
> > fd, thus invoking udevHandleCallback, which would in turn keep signalling the
> > handler thread that there are some data. The difference now would be that the
> > handler thread wouldn't blindly trust the callback about the data because of the
> > scheduling issue; instead, it would keep poking the monitor itself until it gets
> > either EAGAIN or EWOULDBLOCK from recvmsg() called in libudev, which would then
> > be the signal to start waiting on the condition. The next time an event appears,
> > a signal from udevHandleCallback would finally have a meaning and wouldn't be
> > ignored.
>
> Is making it faster really a goal?  It's preferable that it works

"I would think it would be overkill to disable/enable for just 1 event." I was
trying to come up with something faster based on that statement.

> consistently, I think. There are the various errno possibilities, and the
> desire to avoid the "udev_monitor_receive_device returned NULL" message
> processing because we had too many "cooks in the kitchen" trying to
> determine whether a new device was really available or whether this was
> just another notification for something we're already processing.
>
> Also, not that I expect the udev code to change, but if a new errno is
> added, then we may have to keep up... Always a fear, especially if we're

Not necessarily - that would mean that either libudev replaces recvmsg with
something else or that recvmsg starts returning more/different errnos to signal
that there are no data to be pulled - since there are already 2 such errnos,
it's highly unlikely IMHO. Unless libudev changes substantially, in terms of
error signaling, I think we can safely ignore the other errnos as the actual
outcome for libvirt is that libudev failed because of some socket error, but
since that can either come from kernel or from libudev itself, the best thing
we can do is shrug our shoulders and say "libudev failed for some reason, but
we can't tell why".

> using the errno to dictate our algorithm.
>
> >
> > This way, it's actually the handler thread who's 'the boss', as most of the
> > signals from udevHandleCallback would be lost/NOPs.
> >
> > Would you agree to such an approach?
>
> At some point we get too fancy and think too hard about a problem
> letting it consume us.  I think when I presented my idea - I wasn't
> really understanding the details of how we consume the udev data. Now
> that I have a bit more of a clue - perhaps the original idea wasn't so
> good. If we receive multiple notifications that a device is ready to be
> processed even though we could be processing it - how are we to truly
> know whether we missed one or we really got it and udev was just
> reminding us again?

udev doesn't remind us of anything, it's our event loop that does - if we don't
pre-process the events somehow, we can't tell them apart and even if we did, it
would be utterly unreliable since you'd have to include some kind of nonce and
a hash to be really sure that it's a duplicate event and not a device that was
un-plugged and re-plugged again in a very short period of time. So, this issue
is not a matter of our communication with udev, it's about how well we can
manage our discover-pull-process event algorithm.

>
> I'm not against looking at a different mechanism - the question then
> becomes, from your perspective, is it worth it?

I think the ultimate question is how fast we want to deliver a fix. Of
course I'm open to explore more approaches, but we can do that even with a
"preliminary" fix included (even though, honestly, once this is merged, I doubt
anyone will find time to do any kind of experiments unless there's another
breakage to fix in this code). So, based on what I just wrote, I'm inclined to
say no, it really wouldn't be of much worth.

Even though I came up with another proposal, I personally still like the
previous one-by-one event version, since despite (perhaps) being a bit
slower, it's a more transparent and consistent (minus the event counter)
solution than the one I proposed the last time.

Erik

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by John Ferlan 8 years, 2 months ago

On 10/02/2017 03:19 AM, Erik Skultety wrote:
> On Fri, Sep 29, 2017 at 09:46:48AM -0400, John Ferlan wrote:
>>
>>
>> On 09/28/2017 06:00 AM, Erik Skultety wrote:
>>> [...]
>>>
>>>>>
>>>>>          nodeDeviceLock();
>>>>> +        priv = driver->privateData;
>>>>>          udev_monitor = DRV_STATE_UDEV_MONITOR(driver);
>>>>>
>>>>>          if (!udevEventCheckMonitorFD(udev_monitor, privateData->monitor_fd)) {
>>>>> @@ -1725,6 +1727,9 @@ udevEventHandleThread(void *opaque)
>>>>>          device = udev_monitor_receive_device(udev_monitor);
>>>>>          nodeDeviceUnlock();
>>>>>
>>>>> +        /* Re-enable polling for new events on the @udev_monitor */
>>>>> +        virEventUpdateHandle(priv->watch, VIR_EVENT_HANDLE_READABLE);
>>>>> +
>>>>
>>>> I think this should only be done when privateData->nevents == 0?  If we
>>>> have multiple events to read, then calling virEventPollUpdateHandle
>>>> (eventually) for every pass through the loop seems like a bit of
>>>> overkill, especially if udevEventHandleCallback turns right around and
>>>> disables it again.
>>>>
>>>> Also fortunately there isn't more than one udev thread sending the
>>>> events since you access the priv->watch without the driver lock...
>>>>
>>>> Conversely perhaps we only disable if events > 1... Then again, how does
>>>> one get to 2 events queued if we're disabling as soon as we increment
>>>
>>> Very good point - technically events would still get queued, we just wouldn't
>>> check, and yes, we would process 1 event at a time. Not optimal, but if you look
>>> at the original code and compare it with this one performance-wise (and I hope
>>> I haven't missed anything that would render everything I write next complete
>>> rubbish), the time complexity hasn't changed, the space complexity hasn't
>>> changed; what changed is code complexity, which makes the code a bit slower due
>>> to the excessive locking and toggling the FD polling back and forth. So
>>> essentially you've got the same thing as you had before... but asynchronous.
>>> However, yes, the usage of @nevents is completely useless now (hadn't realized
>>> that immediately, thanks) and a simple signalling should suffice.
>>
>> Having it "slower" is necessarily bad ;-)  That gives some of the other
>> slower buggers a chance to fill in the details we need. Throwing the
>> control back to udev quicker could aid in that too.
> 
> I'm sorry, but I don't follow - how do you hand control back over to something
> you can't control? udev is not paused in any way during the phase libvirt is
> processing the events, so it keeps pushing new events to the socket queue
> until it can't push any more.
> 

Hmmm... If the problem we're chasing is that some other consumer isn't
processing "fast enough" for our needs (for whatever reason), then by
making the libvirt implementation a bit slower - because now we have to
"wait" for the thread to be signaled that something's there - perhaps,
just maybe, with any good fortune the thing that was causing the problem
before would have "just enough time" to complete, such that we'll never
have to enter the timing loop in subsequent patches. Hopefully that
helps explain.


>>
>>>
>>> So how could we make it faster though? I thought more about the idea you shared
>>> in one of the previous reviews, letting the thread actually pull all the data
>>> from the monitor, to which I IIRC replied something in the sense that the event
>>> counting mechanism wouldn't allow that and it would break. Okay, let's drop the
>>> event counting. What if we now let both the udev handler thread and the event
>>> loop "poll" the file descriptor, IOW let the event loop keep polling the monitor
>>> fd, thus invoking udevHandleCallback, which would in turn keep signalling the
>>> handler thread that there are some data. The difference now would be that the
>>> handler thread wouldn't blindly trust the callback about the data because of the
>>> scheduling issue; instead, it would keep poking the monitor itself until it gets
>>> either EAGAIN or EWOULDBLOCK from recvmsg() called in libudev, which would then
>>> be the signal to start waiting on the condition. The next time an event appears,
>>> a signal from udevHandleCallback would finally have a meaning and wouldn't be
>>> ignored.
>>
>> Is making it faster really a goal?  It's preferable that it works
> 
> "I would think it would be overkill to disable/enable for just 1 event." I was
> trying to come up with something faster based on that statement.
> 

True and understood... But can the thread rely on two factors - "A" we
were told there was an event and "B" we process until we get the NULL
(w/ EAGAIN) from udev_monitor_receive_device?

If so, then having the thread process until empty is perhaps the way out
of the disable/enable for just 1 event and keeping track of the count of
events. If not, then I think I agree with the sentiment later in this
response about the one-by-one version.

If we go with process until NULL type logic, would the thread need some
sort of wakeup timeout to ensure we don't miss something because of some
scheduling thing? I don't think so, but had to ask anyway.
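
(Answering myself a bit: as long as the producer side does its
set-and-signal under the same lock the thread checks the flag with -
hypothetical sketch, with @dataReady as in your proposal:

    /* udevEventHandleCallback side */
    virMutexLock(&threadData->lock);
    threadData->dataReady = true;
    virCondSignal(&threadData->threadCond);
    virMutexUnlock(&threadData->lock);

then a wakeup can't be lost, so no timeout needed AFAICT.)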

>> consistently, I think. There are the various errno possibilities, and the
>> desire to avoid the "udev_monitor_receive_device returned NULL" message
>> processing because we had too many "cooks in the kitchen" trying to
>> determine whether a new device was really available or whether this was
>> just another notification for something we're already processing.
>>
>> Also, not that I expect the udev code to change, but if a new errno is
>> added, then we may have to keep up... Always a fear, especially if we're
> 
> Not necessarily - that would mean that either libudev replaces recvmsg with
> something else or that recvmsg starts returning more/different errnos to signal
> that there are no data to be pulled - since there are already 2 such errnos,
> it's highly unlikely IMHO. Unless libudev changes substantially, in terms of
> error signaling, I think we can safely ignore the other errnos as the actual
> outcome for libvirt is that libudev failed because of some socket error, but
> since that can either come from kernel or from libudev itself, the best thing
> we can do is shrug our shoulders and say "libudev failed for some reason, but
> we can't tell why".
> 

OK - fair enough... Just exhausting all options/excuses!

>> using the errno to dictate our algorithm.
>>
>>>
>>> This way, it's actually the handler thread who's 'the boss', as most of the
>>> signals from udevHandleCallback would be lost/NOPs.
>>>
>>> Would you agree to such an approach?
>>
>> At some point we get too fancy and think too hard about a problem
>> letting it consume us.  I think when I presented my idea - I wasn't
>> really understanding the details of how we consume the udev data. Now
>> that I have a bit more of a clue - perhaps the original idea wasn't so
>> good. If we receive multiple notifications that a device is ready to be
>> processed even though we could be processing it - how are we to truly
>> know whether we missed one or we really got it and udev was just
>> reminding us again?
> 
> udev doesn't remind us of anything, it's our event loop that does - if we don't
> pre-process the events somehow, we can't tell them apart and even if we did, it
> would be utterly unreliable since you'd have to include some kind of nonce and
> a hash to be really sure that it's a duplicate event and not a device that was
> un-plugged and re-plugged again in a very short period of time. So, this issue
> is not a matter of our communication with udev, it's about how well we can
> manage our discover-pull-process event algorithm.
> 

OK - that's just me and my shortcut interpretation of how we're notified.

>>
>> I'm not against looking at a different mechanism - the question then
>> becomes, from your perspective, is it worth it?
> 
> I think the ultimate question is how fast we want to deliver a fix. Of
> course I'm open to explore more approaches, but we can do that even with a
> "preliminary" fix included (even though, honestly, once this is merged, I doubt
> anyone will find time to do any kind of experiments unless there's another
> breakage to fix in this code). So, based on what I just wrote, I'm inclined to
> say no, it really wouldn't be of much worth.

Can we win the race to fix the root cause? I think we've explored our
alternatives thoroughly and am fine with libvirt making an adjustment
since we cannot rely on when something else is fixed. There are others
that have a different opinion. Still while it's MDEV this time, the last
time it was VHBA, and the next time it will be something different -
guaranteed. If we can provide a way to keep our customers from having
to deal with "intermittent" and "unexplained" failures because we're
waiting for someone else to fix the problem, then I say we fix it and
move on. Especially when we can understand the problem. If eventually
they fix their problem, then our solution gathers cobwebs, but so what.

> 
> Even though I came up with another proposal, I personally still like the
> previous one-by-one event version, since despite (perhaps) being a bit
> slower, it's a more transparent and consistent (minus the event counter)
> solution than the one I proposed the last time.
> 
> Erik
> 

I don't have that version paged into short-term memory ;-)... If you
have a thread that processes events until udev_monitor_receive_device
returns NULL, then does that do the trick? Regardless of event counting?


John

Re: [libvirt] [PATCH v4 5/7] nodedev: Disable/re-enable polling on the udev fd
Posted by Erik Skultety 8 years, 2 months ago
[...]

> > Even though I came up with another proposal, I personally still like the
> > previous one-by-one event version, since despite (perhaps) being a bit
> > slower, it's a more transparent and consistent (minus the event counter)
> > solution than the one I proposed the last time.
> >
> > Erik
> >
>
> I don't have that version paged into short-term memory ;-)... If you
> have a thread that processes events until udev_monitor_receive_device
> returns NULL, then does that do the trick? Regardless of event counting?

Well, so far it's just an idea - I haven't tried it yet, it just appeared
to me like it could. I'll try it and get back with results.

Erik
