[PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL

Stefan Berger posted 1 patch 1 year, 3 months ago
There is a newer version of this series
Posted by Stefan Berger 1 year, 3 months ago
To prevent getting stuck on waitpid() in case the target process does
not terminate on SIGTERM, poll on waitpid() for 10s and, if the target
process has not changed state by then, send a SIGKILL to it.

Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
---
 tests/qtest/libqtest.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
index 2fbc3b88f3..362b1f724f 100644
--- a/tests/qtest/libqtest.c
+++ b/tests/qtest/libqtest.c
@@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
 {
 #ifndef _WIN32
     pid_t pid;
+    uint64_t end;
+
+    /* poll for up to 10s before sending SIGKILL */
+    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
+
+    do {
+        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
+        if (pid != 0) {
+            break;
+        }
+        g_usleep(100 * 1000);
+    } while (g_get_monotonic_time() < end);
+
+    if (pid == 0) {
+        kill(s->qemu_pid, SIGKILL);
+        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
+    }
 
-    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
     assert(pid == s->qemu_pid);
 #else
     DWORD ret;
-- 
2.39.0
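
For readers who want to try the pattern outside of libqtest, the following is a minimal standalone sketch of the same poll-then-SIGKILL logic. It is not the patch itself: it assumes plain POSIX calls (clock_gettime(), nanosleep()) in place of the GLib helpers, and the names now_ms(), wait_or_kill() and spawn_stubborn_child() as well as the 10 ms polling tick are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock in milliseconds; stands in for g_get_monotonic_time(). */
static long long now_ms(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (long long)ts.tv_sec * 1000 + ts.tv_nsec / 1000000;
}

/*
 * Reap `pid`, polling with WNOHANG for up to timeout_ms before escalating
 * to SIGKILL. Returns true if SIGKILL had to be sent. (Hypothetical helper.)
 */
static bool wait_or_kill(pid_t pid, int timeout_ms, int *wstatus)
{
    const struct timespec tick = { 0, 10 * 1000 * 1000 }; /* 10 ms */
    long long end = now_ms() + timeout_ms;
    bool killed = false;
    pid_t ret;

    do {
        ret = waitpid(pid, wstatus, WNOHANG);
        if (ret != 0) {
            break; /* child changed state, or waitpid() failed */
        }
        nanosleep(&tick, NULL);
    } while (now_ms() < end);

    if (ret == 0) {
        /* Still running after the grace period: force termination. */
        kill(pid, SIGKILL);
        killed = true;
        do {
            ret = waitpid(pid, wstatus, 0);
        } while (ret == -1 && errno == EINTR);
    }

    assert(ret == pid);
    return killed;
}

/*
 * Hypothetical test helper: fork a child that ignores SIGTERM. SIGTERM is
 * ignored in the parent across fork() so the child inherits the disposition
 * without a race, then the parent's old handler is restored.
 */
static pid_t spawn_stubborn_child(void)
{
    void (*old)(int) = signal(SIGTERM, SIG_IGN);
    pid_t pid = fork();

    if (pid == 0) {
        for (;;) {
            pause();
        }
    }
    signal(SIGTERM, old);
    return pid;
}
```

Sending SIGTERM to a child created with spawn_stubborn_child() and then calling wait_or_kill() with a short timeout should report that SIGKILL was needed, with the wait status showing termination by SIGKILL.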
Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL
Posted by Philippe Mathieu-Daudé 1 year, 3 months ago
On 11/1/23 23:30, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>   tests/qtest/libqtest.c | 18 +++++++++++++++++-
>   1 file changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>   {
>   #ifndef _WIN32
>       pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;

Maybe we could use getenv() to make this value tunable?
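
A rough sketch of what such a knob might look like, purely as an illustration of the idea: the helper name kill_timeout_sec() and any environment variable name passed to it are hypothetical, and nothing like this exists in the patch or in libqtest. Malformed or non-positive values fall back to the 10s default.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define DEFAULT_KILL_TIMEOUT_SEC 10

/*
 * Return the grace period (in seconds) to wait before sending SIGKILL.
 * env_name is a hypothetical variable name chosen by the caller; an unset,
 * empty, malformed or non-positive value yields the default.
 */
static long kill_timeout_sec(const char *env_name)
{
    const char *val;
    char *end = NULL;
    long secs;

    assert(env_name != NULL);

    val = getenv(env_name);
    if (val == NULL || *val == '\0') {
        return DEFAULT_KILL_TIMEOUT_SEC;
    }

    errno = 0;
    secs = strtol(val, &end, 10);
    if (errno != 0 || *end != '\0' || secs <= 0) {
        return DEFAULT_KILL_TIMEOUT_SEC; /* reject garbage and overflow */
    }
    return secs;
}
```

The deadline computation would then multiply the returned value by G_TIME_SPAN_SECOND instead of using the literal 10.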

> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>   
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>       assert(pid == s->qemu_pid);
>   #else
>       DWORD ret;
Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL
Posted by Daniel P. Berrangé 1 year, 3 months ago
On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
> On 11/1/23 23:30, Stefan Berger wrote:
> > To prevent getting stuck on waitpid() in case the target process does
> > not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> > process has not changed state until then send a SIGKILL to it.
> > 
> > Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> > ---
> >   tests/qtest/libqtest.c | 18 +++++++++++++++++-
> >   1 file changed, 17 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> > index 2fbc3b88f3..362b1f724f 100644
> > --- a/tests/qtest/libqtest.c
> > +++ b/tests/qtest/libqtest.c
> > @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
> >   {
> >   #ifndef _WIN32
> >       pid_t pid;
> > +    uint64_t end;
> > +
> > +    /* poll for 10s until sending SIGKILL */
> > +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> 
> Maybe we could use getenv() to allow tuning / using different value?

I'd rather we picked a value large enough that it will work
reliably out of the box for all scenarios with no magic
env required. We're just trying to prevent infinite waits if
something unexpected happens. We don't need to use an
aggressively short value, as most users will never hit this
scenario. I think 30 seconds is large enough to be reliable
but we could easily go higher to 60/120 if we want to be
really really sure.


With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|


Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL
Posted by Philippe Mathieu-Daudé 1 year, 3 months ago
On 12/1/23 10:54, Daniel P. Berrangé wrote:
> On Thu, Jan 12, 2023 at 10:18:01AM +0100, Philippe Mathieu-Daudé wrote:
>> On 11/1/23 23:30, Stefan Berger wrote:
>>> To prevent getting stuck on waitpid() in case the target process does
>>> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
>>> process has not changed state until then send a SIGKILL to it.
>>>
>>> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
>>> ---
>>>    tests/qtest/libqtest.c | 18 +++++++++++++++++-
>>>    1 file changed, 17 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
>>> index 2fbc3b88f3..362b1f724f 100644
>>> --- a/tests/qtest/libqtest.c
>>> +++ b/tests/qtest/libqtest.c
>>> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>>>    {
>>>    #ifndef _WIN32
>>>        pid_t pid;
>>> +    uint64_t end;
>>> +
>>> +    /* poll for 10s until sending SIGKILL */
>>> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
>>
>> Maybe we could use getenv() to allow tuning / using different value?
> 
> I'd rather we picked a value large enough that it will work
> reliably out of the box for all scenarios with no magic
> env required. We're just trying to prevent infinite waits if
> something unexpected happens. We don't need to use an
> aggressively short value, as most users will never hit this
> scenario. I think 30 seconds is large enough to be reliable
> but we could easily go higher to 60/120 if we want to be
> really really sure.

I read your other comment later and I agree with you.


Re: [PATCH] tests/qtest: Poll on waitpid() for a while before sending SIGKILL
Posted by Daniel P. Berrangé 1 year, 3 months ago
On Wed, Jan 11, 2023 at 05:30:18PM -0500, Stefan Berger wrote:
> To prevent getting stuck on waitpid() in case the target process does
> not terminate on SIGTERM, poll on waitpid() for 10s and if the target
> process has not changed state until then send a SIGKILL to it.
> 
> Signed-off-by: Stefan Berger <stefanb@linux.ibm.com>
> ---
>  tests/qtest/libqtest.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)

Since this is a test suite and we know our CI system gets very
heavily loaded, I think we should wait more than 10 secs to
ensure QEMU has time to shut down; flushing pending I/O in
particular is what is most likely to delay things. If you bump
the time to 30 secs, then

  Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>

> 
> diff --git a/tests/qtest/libqtest.c b/tests/qtest/libqtest.c
> index 2fbc3b88f3..362b1f724f 100644
> --- a/tests/qtest/libqtest.c
> +++ b/tests/qtest/libqtest.c
> @@ -202,8 +202,24 @@ void qtest_wait_qemu(QTestState *s)
>  {
>  #ifndef _WIN32
>      pid_t pid;
> +    uint64_t end;
> +
> +    /* poll for 10s until sending SIGKILL */
> +    end = g_get_monotonic_time() + 10 * G_TIME_SPAN_SECOND;
> +
> +    do {
> +        pid = waitpid(s->qemu_pid, &s->wstatus, WNOHANG);
> +        if (pid != 0) {
> +            break;
> +        }
> +        g_usleep(100 * 1000);
> +    } while (g_get_monotonic_time() < end);
> +
> +    if (pid == 0) {
> +        kill(s->qemu_pid, SIGKILL);
> +        TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
> +    }
>  
> -    TFR(pid = waitpid(s->qemu_pid, &s->wstatus, 0));
>      assert(pid == s->qemu_pid);
>  #else
>      DWORD ret;
> -- 
> 2.39.0
> 

With regards,
Daniel