[libvirt] [PATCH 1/2] qemu: don't kill qemu process on restart if networkNotify fails

Laine Stump posted 2 patches 8 years, 9 months ago
There is a newer version of this series
[libvirt] [PATCH 1/2] qemu: don't kill qemu process on restart if networkNotify fails
Posted by Laine Stump 8 years, 9 months ago
Nothing that could happen during networkNotifyActualDevice() could
justify unceremoniously killing the qemu process, but that's what we
were doing.

In particular, new code added in commit 85bcc022 (first appearred in
libvirt-3.2.0) attempts to reattach tap devices to their assigned
bridge devices when libvirtd restarts (to make it easier to recover
from a restart of a libvirt network). But if the network has been
stopped and *not* restarted, the bridge device won't exist and
networkNotifyActualDevice() will fail.

This patch changes qemuProcessNotifyNets() to return void, so that
qemuProcessReconnect() will soldier on regardless of what happens (any
errors will still be logged, but we won't terminate the guest).

Partially resolves: https://bugzilla.redhat.com/1442700
---

NB: there are many other things that might fail during reconnection of
a qemu process that shouldn't lead to killing the qemu
process. qemuProcessReconnect() could use a good audit to change what
happens when certain items fail.



 src/qemu/qemu_process.c | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 53170d7..fcb5763 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -2825,7 +2825,7 @@ int qemuProcessStopCPUs(virQEMUDriverPtr driver,
 
 
 
-static int
+static void
 qemuProcessNotifyNets(virDomainDefPtr def)
 {
     size_t i;
@@ -2840,10 +2840,8 @@ qemuProcessNotifyNets(virDomainDefPtr def)
         if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
            ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
 
-        if (networkNotifyActualDevice(def, net) < 0)
-            return -1;
+        networkNotifyActualDevice(def, net);
     }
-    return 0;
 }
 
 static int
@@ -3480,8 +3478,7 @@ qemuProcessReconnect(void *opaque)
     if (qemuSecurityReserveLabel(driver->securityManager, obj->def, obj->pid) < 0)
         goto error;
 
-    if (qemuProcessNotifyNets(obj->def) < 0)
-        goto error;
+    qemuProcessNotifyNets(obj->def);
 
     if (qemuProcessFiltersInstantiate(obj->def))
         goto error;
-- 
2.9.3

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] qemu: don't kill qemu process on restart if networkNotify fails
Posted by John Ferlan 8 years, 9 months ago

On 04/25/2017 12:33 PM, Laine Stump wrote:
> Nothing that could happen during networkNotifyActualDevice() could
> justify unceremoniously killing the qemu process, but that's what we
> were doing.

So that galaxy in far far away land circa commit id '04711a0f' when all
this was added -

> 
> In particular, new code added in commit 85bcc022 (first appearred in

appeared

> libvirt-3.2.0) attempts to reattach tap devices to their assigned
> bridge devices when libvirtd restarts (to make it easier to recover
> from a restart of a libvirt network). But if the network has been
> stopped and *not* restarted, the bridge device won't exist and
> networkNotifyActualDevice() will fail.
> 
> This patch changes qemuProcessNotifyNets() to return void, so that
> qemuProcessReconnect() will soldier on regardless of what happens (any
> errors will still be logged, but we won't terminate the guest).
> 
> Partially resolves: https://bugzilla.redhat.com/1442700
> ---
> 
> NB: there are many other things that might fail during reconnection of
> a qemu process that shouldn't lead to killing the qemu
> process. qemuProcessReconnect() could use a good audit to change what
> happens when certain items fail.
> 

Nose game, not me ;-)

> 
> 
>  src/qemu/qemu_process.c | 9 +++------
>  1 file changed, 3 insertions(+), 6 deletions(-)
> 
> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
> index 53170d7..fcb5763 100644
> --- a/src/qemu/qemu_process.c
> +++ b/src/qemu/qemu_process.c
> @@ -2825,7 +2825,7 @@ int qemuProcessStopCPUs(virQEMUDriverPtr driver,
>  
>  
>  
> -static int
> +static void
>  qemuProcessNotifyNets(virDomainDefPtr def)
>  {
>      size_t i;
> @@ -2840,10 +2840,8 @@ qemuProcessNotifyNets(virDomainDefPtr def)
>          if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
>             ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
>  
> -        if (networkNotifyActualDevice(def, net) < 0)
> -            return -1;
> +        networkNotifyActualDevice(def, net);

Seems like networkNotifyActualDevice should then return a void too?
And have it's comments/content changed...


IIUC thought, essentially if the network goes away, the domain isn't
necessarily shut down - it gets notified somehow that the network isn't
there any more and soldiers on. So from that aspect I can certain agree
that just because we discover an error when libvirtd restarts doesn't
mean we should now kill the domain.

I think the concept behind the patch is fine - it's the "how much more"
should be done...  (and yes with the eye on keeping backports to a
minimum, yep understood).


John

>      }
> -    return 0;
>  }
>  
>  static int
> @@ -3480,8 +3478,7 @@ qemuProcessReconnect(void *opaque)
>      if (qemuSecurityReserveLabel(driver->securityManager, obj->def, obj->pid) < 0)
>          goto error;
>  
> -    if (qemuProcessNotifyNets(obj->def) < 0)
> -        goto error;
> +    qemuProcessNotifyNets(obj->def);
>  
>      if (qemuProcessFiltersInstantiate(obj->def))
>          goto error;
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] qemu: don't kill qemu process on restart if networkNotify fails
Posted by Laine Stump 8 years, 9 months ago
On 04/26/2017 02:39 PM, John Ferlan wrote:
> 
> 
> On 04/25/2017 12:33 PM, Laine Stump wrote:
>> Nothing that could happen during networkNotifyActualDevice() could
>> justify unceremoniously killing the qemu process, but that's what we
>> were doing.
> 
> So that galaxy in far far away land circa commit id '04711a0f' when all
> this was added -

Sure, in 2011. I think the statute of limitations has passed on that
though (actually I'm surprised there's still any evidence of it in the
git blame, what with all the code reorganization etc going on - that
tends to obscure meaningful/functional changes...)

> 
>>
>> In particular, new code added in commit 85bcc022 (first appearred in
> 
> appeared
> 
>> libvirt-3.2.0) attempts to reattach tap devices to their assigned
>> bridge devices when libvirtd restarts (to make it easier to recover
>> from a restart of a libvirt network). But if the network has been
>> stopped and *not* restarted, the bridge device won't exist and
>> networkNotifyActualDevice() will fail.
>>
>> This patch changes qemuProcessNotifyNets() to return void, so that
>> qemuProcessReconnect() will soldier on regardless of what happens (any
>> errors will still be logged, but we won't terminate the guest).
>>
>> Partially resolves: https://bugzilla.redhat.com/1442700
>> ---
>>
>> NB: there are many other things that might fail during reconnection of
>> a qemu process that shouldn't lead to killing the qemu
>> process. qemuProcessReconnect() could use a good audit to change what
>> happens when certain items fail.
>>
> 
> Nose game, not me ;-)

Nor me. I've done my part by pointing it out :-P

> 
>>
>>
>>  src/qemu/qemu_process.c | 9 +++------
>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>
>> diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
>> index 53170d7..fcb5763 100644
>> --- a/src/qemu/qemu_process.c
>> +++ b/src/qemu/qemu_process.c
>> @@ -2825,7 +2825,7 @@ int qemuProcessStopCPUs(virQEMUDriverPtr driver,
>>  
>>  
>>  
>> -static int
>> +static void
>>  qemuProcessNotifyNets(virDomainDefPtr def)
>>  {
>>      size_t i;
>> @@ -2840,10 +2840,8 @@ qemuProcessNotifyNets(virDomainDefPtr def)
>>          if (virDomainNetGetActualType(net) == VIR_DOMAIN_NET_TYPE_DIRECT)
>>             ignore_value(virNetDevMacVLanReserveName(net->ifname, false));
>>  
>> -        if (networkNotifyActualDevice(def, net) < 0)
>> -            return -1;
>> +        networkNotifyActualDevice(def, net);
> 
> Seems like networkNotifyActualDevice should then return a void too?
> And have it's comments/content changed...

That depends. If we go through the entire call stack and obliterate the
return codes, then in the future if we decide there is something that we
really do want to fail on, there will be more stuff to "un-undo". Also,
I'm not as comfortable obliterating the return code of a function that
called from another file. That's partially just superstition though.

> 
> 
> IIUC thought, essentially if the network goes away, the domain isn't
> necessarily shut down - it gets notified somehow that the network isn't
> there any more

Yeah, it's "notified" mainly by the fact that it's no longer receiving
any traffic. We *might* want to think about setting the guest's
interface ~IFF_UP, but who knows what unwanted side effects that could
have in the case of someone just restarting a network.

> and soldiers on. So from that aspect I can certain agree
> that just because we discover an error when libvirtd restarts doesn't
> mean we should now kill the domain.

Yeah, it's*very* passive-agressive (nay, agressive-agressive) to kill
the entire guest just because one little part of it has stopped working.
Much nicer to leave it running so that some amount of sane shutdown
might be performed even if the non-working piece can't be restored.


> 
> I think the concept behind the patch is fine - it's the "how much more"
> should be done...  (and yes with the eye on keeping backports to a
> minimum, yep understood).

I guess I can also nuke the return value from
networkNotiyActualDevice(). I'll redo it that way and resend.

> 
> 
> John
> 
>>      }
>> -    return 0;
>>  }
>>  
>>  static int
>> @@ -3480,8 +3478,7 @@ qemuProcessReconnect(void *opaque)
>>      if (qemuSecurityReserveLabel(driver->securityManager, obj->def, obj->pid) < 0)
>>          goto error;
>>  
>> -    if (qemuProcessNotifyNets(obj->def) < 0)
>> -        goto error;
>> +    qemuProcessNotifyNets(obj->def);
>>  
>>      if (qemuProcessFiltersInstantiate(obj->def))
>>          goto error;
>>
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list