[PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job

Thomas Huth posted 6 patches 2 years, 3 months ago
There is a newer version of this series
[PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Thomas Huth 2 years, 3 months ago
We can get rid of the build-coroutine-sigaltstack job by moving
the configure flags that should be tested here to other jobs:
Move --with-coroutine=sigaltstack to the build-without-defaults job
and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 .gitlab-ci.d/buildtest.yml   | 14 ++------------
 .gitlab-ci.d/crossbuilds.yml |  2 +-
 2 files changed, 3 insertions(+), 13 deletions(-)

diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
index 91c7467a66..1438797a1c 100644
--- a/.gitlab-ci.d/buildtest.yml
+++ b/.gitlab-ci.d/buildtest.yml
@@ -533,19 +533,8 @@ build-tci:
     - QTEST_QEMU_BINARY="./qemu-system-s390x" ./tests/qtest/pxe-test -m slow
     - make check-tcg
 
-# Alternate coroutines implementations are only really of interest to KVM users
-# However we can't test against KVM on Gitlab-CI so we can only run unit tests
-build-coroutine-sigaltstack:
-  extends: .native_build_job_template
-  needs:
-    job: amd64-ubuntu2004-container
-  variables:
-    IMAGE: ubuntu2004
-    CONFIGURE_ARGS: --with-coroutine=sigaltstack --disable-tcg
-                    --enable-trace-backends=ftrace
-    MAKE_CHECK_ARGS: check-unit
-
 # Check our reduced build configurations
+# (and an alternative coroutine implementation)
 build-without-defaults:
   extends: .native_build_job_template
   needs:
@@ -559,6 +548,7 @@ build-without-defaults:
       --disable-pie
       --disable-qom-cast-debug
       --disable-strip
+      --with-coroutine=sigaltstack
     TARGETS: avr-softmmu mips64-softmmu s390x-softmmu sh4-softmmu
       sparc64-softmmu hexagon-linux-user i386-linux-user s390x-linux-user
     MAKE_CHECK_ARGS: check-unit check-qtest-avr check-qtest-mips64
diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
index 8dbbb8f881..027d2088da 100644
--- a/.gitlab-ci.d/crossbuilds.yml
+++ b/.gitlab-ci.d/crossbuilds.yml
@@ -159,7 +159,7 @@ cross-s390x-kvm-only:
     job: s390x-debian-cross-container
   variables:
     IMAGE: debian-s390x-cross
-    EXTRA_CONFIGURE_OPTS: --disable-tcg
+    EXTRA_CONFIGURE_OPTS: --disable-tcg --enable-trace-backends=ftrace
 
 cross-mips64el-kvm-only:
   extends: .cross_accel_build_job
-- 
2.31.1
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Daniel P. Berrangé 2 years, 3 months ago
On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
> We can get rid of the build-coroutine-sigaltstack job by moving
> the configure flags that should be tested here to other jobs:
> Move --with-coroutine=sigaltstack to the build-without-defaults job
> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.

The biggest user of coroutines is the block layer. So we probably
ought to have coroutines aligned with a job that triggers the
'make check-block' for iotests.  IIUC,  the without-defaults
job won't do that. How about, arbitrarily, using either the
'check-system-debian' or 'check-system-ubuntu' job. Those distros
are closely related, so getting sigaltstack vs ucontext coverage
between them is a good win, and they both trigger the block jobs
IIUC.

Incidentally sigaltstack is also covered by our Cirrus CI job
for macOS.

> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  .gitlab-ci.d/buildtest.yml   | 14 ++------------
>  .gitlab-ci.d/crossbuilds.yml |  2 +-
>  2 files changed, 3 insertions(+), 13 deletions(-)
> 
> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
> index 91c7467a66..1438797a1c 100644
> --- a/.gitlab-ci.d/buildtest.yml
> +++ b/.gitlab-ci.d/buildtest.yml
> @@ -533,19 +533,8 @@ build-tci:
>      - QTEST_QEMU_BINARY="./qemu-system-s390x" ./tests/qtest/pxe-test -m slow
>      - make check-tcg
>  
> -# Alternate coroutines implementations are only really of interest to KVM users
> -# However we can't test against KVM on Gitlab-CI so we can only run unit tests
> -build-coroutine-sigaltstack:
> -  extends: .native_build_job_template
> -  needs:
> -    job: amd64-ubuntu2004-container
> -  variables:
> -    IMAGE: ubuntu2004
> -    CONFIGURE_ARGS: --with-coroutine=sigaltstack --disable-tcg
> -                    --enable-trace-backends=ftrace
> -    MAKE_CHECK_ARGS: check-unit
> -
>  # Check our reduced build configurations
> +# (and an alternative coroutine implementation)
>  build-without-defaults:
>    extends: .native_build_job_template
>    needs:
> @@ -559,6 +548,7 @@ build-without-defaults:
>        --disable-pie
>        --disable-qom-cast-debug
>        --disable-strip
> +      --with-coroutine=sigaltstack
>      TARGETS: avr-softmmu mips64-softmmu s390x-softmmu sh4-softmmu
>        sparc64-softmmu hexagon-linux-user i386-linux-user s390x-linux-user
>      MAKE_CHECK_ARGS: check-unit check-qtest-avr check-qtest-mips64
> diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
> index 8dbbb8f881..027d2088da 100644
> --- a/.gitlab-ci.d/crossbuilds.yml
> +++ b/.gitlab-ci.d/crossbuilds.yml
> @@ -159,7 +159,7 @@ cross-s390x-kvm-only:
>      job: s390x-debian-cross-container
>    variables:
>      IMAGE: debian-s390x-cross
> -    EXTRA_CONFIGURE_OPTS: --disable-tcg
> +    EXTRA_CONFIGURE_OPTS: --disable-tcg --enable-trace-backends=ftrace
>  
>  cross-mips64el-kvm-only:
>    extends: .cross_accel_build_job
> -- 
> 2.31.1
> 
> 

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Thomas Huth 2 years, 3 months ago
On 30/01/2023 11.58, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>> We can get rid of the build-coroutine-sigaltstack job by moving
>> the configure flags that should be tested here to other jobs:
>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
> 
> The biggest user of coroutines is the block layer. So we probably
> ought to have coroutines aligned with a job that triggers the
> 'make check-block' for iotests.  IIUC,  the without-defaults
> job won't do that. How about, arbitrarily, using either the
> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
> are closely related, so getting sigaltstack vs ucontext coverage
> between them is a good win, and they both trigger the block jobs
> IIUC.

I gave it a try with the ubuntu job, but this apparently trips up the iotests:

  https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212

Does anybody have a clue what could be going wrong here?

  Thomas


Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Kevin Wolf 2 years, 3 months ago
Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
> > On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
> > > We can get rid of the build-coroutine-sigaltstack job by moving
> > > the configure flags that should be tested here to other jobs:
> > > Move --with-coroutine=sigaltstack to the build-without-defaults job
> > > and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
> > 
> > The biggest user of coroutines is the block layer. So we probably
> > ought to have coroutines aligned with a job that triggers the
> > 'make check-block' for iotests.  IIUC,  the without-defaults
> > job won't do that. How about, arbitrarily, using either the
> > 'check-system-debian' or 'check-system-ubuntu' job. Those distros
> > are closely related, so getting sigaltstack vs ucontext coverage
> > between them is a good win, and they both trigger the block jobs
> > IIUC.
> 
> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
> 
>  https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
> 
> Does anybody have a clue what could be going wrong here?

I'm not sure how changing the coroutine backend could cause it, but
primarily this looks like an assertion failure in migration code.

Dave, Juan, any ideas what this assertion checks and why it could be
failing?

Kevin
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Juan Quintela 2 years, 3 months ago
Kevin Wolf <kwolf@redhat.com> wrote:
> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>> > On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>> > > We can get rid of the build-coroutine-sigaltstack job by moving
>> > > the configure flags that should be tested here to other jobs:
>> > > Move --with-coroutine=sigaltstack to the build-without-defaults job
>> > > and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>> > 
>> > The biggest user of coroutines is the block layer. So we probably
>> > ought to have coroutines aligned with a job that triggers the
>> > 'make check-block' for iotests.  IIUC,  the without-defaults
>> > job won't do that. How about, arbitrarily, using either the
>> > 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>> > are closely related, so getting sigaltstack vs ucontext coverage
>> > between them is a good win, and they both trigger the block jobs
>> > IIUC.
>> 
>> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>> 
>>  https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>> 
>> Does anybody have a clue what could be going wrong here?
>
> I'm not sure how changing the coroutine backend could cause it, but
> primarily this looks like an assertion failure in migration code.

Adding Peter here, as he is the last one touching that code O:-)

> Dave, Juan, any ideas what this assertion checks and why it could be
> failing?

Really no.

+QEMU_PROG: ../migration/ram.c:874: pss_find_next_dirty: Assertion `pss->host_page_end' failed.
+./common.rc: line 195: 78727 Aborted                 (core dumped) ( if [ -n "${QEMU_NEED_PID}" ]; then
+    echo $BASHPID > "${QEMU_TEST_DIR}/qemu-${_QEMU_HANDLE}.pid";
+fi; GDB=""; if [ -n "${GDB_OPTIONS}" ]; then
+    GDB="gdbserver ${GDB_OPTIONS}";
+fi; VALGRIND_QEMU="${VALGRIND_QEMU_VM}" _qemu_proc_exec "${VALGRIND_LOGFILE}" $GDB "$QEMU_PROG" $QEMU_OPTIONS "$@" )

pss_find_next_dirty() is only called from three places:

find . -type f -exec grep --color=auto -nH --null -e pss_find_next_dirty \{\} +
./ram.c.847: * pss_find_next_dirty: find the next dirty page of current ramblock
./ram.c.857:static void pss_find_next_dirty(PageSearchStatus *pss)
./ram.c.1562:    pss_find_next_dirty(pss);
./ram.c.2391:        pss_find_next_dirty(pss);
./ram.c.2476:        pss_find_next_dirty(pss);

I can't see how this can be affected by coroutines changes.

What is the test that is failing, and what is the change that I have to
do to try to reproduce it?

Later, Juan.
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Thomas Huth 2 years, 3 months ago
On 03/02/2023 13.08, Kevin Wolf wrote:
> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>> the configure flags that should be tested here to other jobs:
>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>
>>> The biggest user of coroutines is the block layer. So we probably
>>> ought to have coroutines aligned with a job that triggers the
>>> 'make check-block' for iotests.  IIUC,  the without-defaults
>>> job won't do that. How about, arbitrarily, using either the
>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>> are closely related, so getting sigaltstack vs ucontext coverage
>>> between them is a good win, and they both trigger the block jobs
>>> IIUC.
>>
>> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>>
>>   https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>
>> Does anybody have a clue what could be going wrong here?
> 
> I'm not sure how changing the coroutine backend could cause it, but
> primarily this looks like an assertion failure in migration code.
> 
> Dave, Juan, any ideas what this assertion checks and why it could be
> failing?

Ah, I think it's the bug that will be fixed by:

  https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/

The fix hasn't hit the master branch yet (I think), and I had another patch 
in my CI that disables the aarch64 binary in that runner, so the iotests 
suddenly have been executed with the alpha binary there --> migration fails.

So never mind, it will be fixed as soon as Juan's pull request gets included.

  Thomas



Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Peter Maydell 2 years, 3 months ago
On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>
> On 03/02/2023 13.08, Kevin Wolf wrote:
> > Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
> >> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
> >>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
> >>>> We can get rid of the build-coroutine-sigaltstack job by moving
> >>>> the configure flags that should be tested here to other jobs:
> >>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
> >>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
> >>>
> >>> The biggest user of coroutines is the block layer. So we probably
> >>> ought to have coroutines aligned with a job that triggers the
> >>> 'make check-block' for iotests.  IIUC,  the without-defaults
> >>> job won't do that. How about, arbitrarily, using either the
> >>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
> >>> are closely related, so getting sigaltstack vs ucontext coverage
> >>> between them is a good win, and they both trigger the block jobs
> >>> IIUC.
> >>
> >> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
> >>
> >>   https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
> >>
> >> Does anybody have a clue what could be going wrong here?
> >
> > I'm not sure how changing the coroutine backend could cause it, but
> > primarily this looks like an assertion failure in migration code.
> >
> > Dave, Juan, any ideas what this assertion checks and why it could be
> > failing?
>
> Ah, I think it's the bug that will be fixed by:
>
>   https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>
> The fix hasn't hit the master branch yet (I think), and I had another patch
> in my CI that disables the aarch64 binary in that runner, so the iotests
> suddenly have been executed with the alpha binary there --> migration fails.
>
> So never mind, it will be fixed as soon as Juan's pull request gets included.

The migration tests have been flaky for a while now,
including setups where host and guest page sizes are the same.
(For instance, my x86 macos box pretty reliably sees failures
when the machine is under load.)

-- PMM
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Juan Quintela 2 years, 3 months ago
Peter Maydell <peter.maydell@linaro.org> wrote:
> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>>
>> On 03/02/2023 13.08, Kevin Wolf wrote:
>> > Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>> >> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>> >>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>> >>>> We can get rid of the build-coroutine-sigaltstack job by moving
>> >>>> the configure flags that should be tested here to other jobs:
>> >>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>> >>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>> >>>
>> >>> The biggest user of coroutines is the block layer. So we probably
>> >>> ought to have coroutines aligned with a job that triggers the
>> >>> 'make check-block' for iotests.  IIUC,  the without-defaults
>> >>> job won't do that. How about, arbitrarily, using either the
>> >>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>> >>> are closely related, so getting sigaltstack vs ucontext coverage
>> >>> between them is a good win, and they both trigger the block jobs
>> >>> IIUC.
>> >>
>> >> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>> >>
>> >>   https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>> >>
>> >> Does anybody have a clue what could be going wrong here?
>> >
>> > I'm not sure how changing the coroutine backend could cause it, but
>> > primarily this looks like an assertion failure in migration code.
>> >
>> > Dave, Juan, any ideas what this assertion checks and why it could be
>> > failing?
>>
>> Ah, I think it's the bug that will be fixed by:
>>
>>   https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>>
>> The fix hasn't hit the master branch yet (I think), and I had another patch
>> in my CI that disables the aarch64 binary in that runner, so the iotests
>> suddenly have been executed with the alpha binary there --> migration fails.
>>
>> So never mind, it will be fixed as soon as Juan's pull request gets included.
>
> The migration tests have been flaky for a while now,
> including setups where host and guest page sizes are the same.
> (For instance, my x86 macos box pretty reliably sees failures
> when the machine is under load.)

I *thought* that we had fixed all of those.

But it is difficult for me to know because:
- I only happens when one runs "make check"
- running ./migration-test have never failed to me
- When it fails (and it has been a while since it has failed to me)
  it is impossible to me to detect what is going on, and as said, I have
  never been able to reproduce running only migration-test.

I will try to run several at the same time and see if it happens.

And as Thomas said, I *think* that the fix that Peter Xu posted should
fix this issue.  Famous last words.

Later, Juan.
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Thomas Huth 2 years, 3 months ago
On 03/02/2023 22.14, Juan Quintela wrote:
> Peter Maydell <peter.maydell@linaro.org> wrote:
>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>>>
>>> On 03/02/2023 13.08, Kevin Wolf wrote:
>>>> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>>>>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>>>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>>>>> the configure flags that should be tested here to other jobs:
>>>>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>>>>
>>>>>> The biggest user of coroutines is the block layer. So we probably
>>>>>> ought to have coroutines aligned with a job that triggers the
>>>>>> 'make check-block' for iotests.  IIUC,  the without-defaults
>>>>>> job won't do that. How about, arbitrarily, using either the
>>>>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>>>>> are closely related, so getting sigaltstack vs ucontext coverage
>>>>>> between them is a good win, and they both trigger the block jobs
>>>>>> IIUC.
>>>>>
>>>>> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>>>>>
>>>>>    https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>>>>
>>>>> Does anybody have a clue what could be going wrong here?
>>>>
>>>> I'm not sure how changing the coroutine backend could cause it, but
>>>> primarily this looks like an assertion failure in migration code.
>>>>
>>>> Dave, Juan, any ideas what this assertion checks and why it could be
>>>> failing?
>>>
>>> Ah, I think it's the bug that will be fixed by:
>>>
>>>    https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>>>
>>> The fix hasn't hit the master branch yet (I think), and I had another patch
>>> in my CI that disables the aarch64 binary in that runner, so the iotests
>>> suddenly have been executed with the alpha binary there --> migration fails.
>>>
>>> So never mind, it will be fixed as soon as Juan's pull request gets included.
>>
>> The migration tests have been flaky for a while now,
>> including setups where host and guest page sizes are the same.
>> (For instance, my x86 macos box pretty reliably sees failures
>> when the machine is under load.)
> 
> I *thought* that we had fixed all of those.
> 
> But it is difficult for me to know because:
> - I only happens when one runs "make check"
> - running ./migration-test have never failed to me
> - When it fails (and it has been a while since it has failed to me)
>    it is impossible to me to detect what is going on, and as said, I have
>    never been able to reproduce running only migration-test.
> 
> I will try to run several at the same time and see if it happens.
> 
> And as Thomas said, I *think* that the fix that Peter Xu posted should
> fix this issue.  Famous last words.

The patch from Peter should fix my problems that I triggered via the iotests 
- but the migration-qtest is still unstable independent from that issue, I 
think. See for example the latest staging pipeline:

  https://gitlab.com/qemu-project/qemu/-/pipelines/767961842

The migration qtest failed in both, the x86-freebsd-build and the 
ubuntu-20.04-s390x-all pipelin.

  Thomas



Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Juan Quintela 2 years, 3 months ago
Thomas Huth <thuth@redhat.com> wrote:
> On 03/02/2023 22.14, Juan Quintela wrote:
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>>> On Fri, 3 Feb 2023 at 15:44, Thomas Huth <thuth@redhat.com> wrote:
>>>>
>>>> On 03/02/2023 13.08, Kevin Wolf wrote:
>>>>> Am 03.02.2023 um 12:23 hat Thomas Huth geschrieben:
>>>>>> On 30/01/2023 11.58, Daniel P. Berrangé wrote:
>>>>>>> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>>>>>>>> We can get rid of the build-coroutine-sigaltstack job by moving
>>>>>>>> the configure flags that should be tested here to other jobs:
>>>>>>>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>>>>>>>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
>>>>>>>
>>>>>>> The biggest user of coroutines is the block layer. So we probably
>>>>>>> ought to have coroutines aligned with a job that triggers the
>>>>>>> 'make check-block' for iotests.  IIUC,  the without-defaults
>>>>>>> job won't do that. How about, arbitrarily, using either the
>>>>>>> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
>>>>>>> are closely related, so getting sigaltstack vs ucontext coverage
>>>>>>> between them is a good win, and they both trigger the block jobs
>>>>>>> IIUC.
>>>>>>
>>>>>> I gave it a try with the ubuntu job, but this apparently trips up the iotests:
>>>>>>
>>>>>>    https://gitlab.com/thuth/qemu/-/jobs/3705965062#L212
>>>>>>
>>>>>> Does anybody have a clue what could be going wrong here?
>>>>>
>>>>> I'm not sure how changing the coroutine backend could cause it, but
>>>>> primarily this looks like an assertion failure in migration code.
>>>>>
>>>>> Dave, Juan, any ideas what this assertion checks and why it could be
>>>>> failing?
>>>>
>>>> Ah, I think it's the bug that will be fixed by:
>>>>
>>>>    https://lore.kernel.org/qemu-devel/20230202160640.2300-2-quintela@redhat.com/
>>>>
>>>> The fix hasn't hit the master branch yet (I think), and I had another patch
>>>> in my CI that disables the aarch64 binary in that runner, so the iotests
>>>> suddenly have been executed with the alpha binary there --> migration fails.
>>>>
>>>> So never mind, it will be fixed as soon as Juan's pull request gets included.
>>>
>>> The migration tests have been flaky for a while now,
>>> including setups where host and guest page sizes are the same.
>>> (For instance, my x86 macos box pretty reliably sees failures
>>> when the machine is under load.)
>> I *thought* that we had fixed all of those.
>> But it is difficult for me to know because:
>> - I only happens when one runs "make check"
>> - running ./migration-test have never failed to me
>> - When it fails (and it has been a while since it has failed to me)
>>    it is impossible to me to detect what is going on, and as said, I have
>>    never been able to reproduce running only migration-test.
>> I will try to run several at the same time and see if it happens.
>> And as Thomas said, I *think* that the fix that Peter Xu posted
>> should
>> fix this issue.  Famous last words.
>
> The patch from Peter should fix my problems that I triggered via the
> iotests - but the migration-qtest is still unstable independent from
> that issue, I think. See for example the latest staging pipeline:
>
>  https://gitlab.com/qemu-project/qemu/-/pipelines/767961842
>
> The migration qtest failed in both, the x86-freebsd-build and the
> ubuntu-20.04-s390x-all pipelin.
>
>  Thomas

 31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test                   ERROR          48.23s   killed by signal 6 SIGABRT
>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64 MALLOC_PERTURB_=124 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
stderr:
Broken pipe
../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
TAP parsing error: Too few tests run (expected 41, got 12)
(test program exited with status code -6)
――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――

I don't know hat to do with this:
- this is aarch64 tcg
- this *works* on f37, or at least I can't reproduce any error with make
  check on my box, and I *think* my configuration is quite extensive (as
  far as I know everything that can be compiled in fedora with packages
  in the distro):

configure file: /mnt/code/qemu/full/configure
--enable-trace-backends=log
	--prefix=/usr
	--sysconfdir=/etc/sysconfig/
	--audio-drv-list=pa,alsa
	--with-coroutine=ucontext
	--with-git-submodules=validate
	--enable-alsa
	--enable-attr
	--enable-auth-pam
	--enable-avx2
	--enable-avx512f
	--enable-bochs
	--enable-bpf
	--enable-brlapi
	--disable-bsd-user
	--enable-bzip2
	--enable-cap-ng
	--enable-capstone
	--disable-cfi
	--disable-cfi-debug
	--enable-cloop
	--disable-cocoa
	--enable-containers
	--disable-coreaudio
	--enable-coroutine-pool
	--enable-crypto-afalg
	--enable-curl
	--enable-curses
	--enable-dbus-display
	--enable-debug-info
	--disable-debug-mutex
	--disable-debug-stack-usage
	--disable-debug-tcg
	--enable-dmg
	--enable-docs
	--disable-dsound
	--enable-fdt
	--enable-fuse
	--enable-fuse-lseek
	--disable-fuzzing
	--disable-gcov
	--disable-gcrypt
	--enable-gettext
	--enable-gio
	--enable-glusterfs
	--enable-gnutls
	--disable-gprof
	--enable-gtk
	--enable-guest-agent
	--disable-guest-agent-msi
	--disable-hax
	--disable-hvf
	--enable-iconv
	--enable-install-blobs
	--enable-jack
	--enable-keyring
	--enable-kvm
	--enable-l2tpv3
	--enable-libdaxctl
	--enable-libiscsi
	--enable-libnfs
	--enable-libpmem
	--enable-libssh
	--enable-libudev
	--enable-libusb
	--enable-linux-aio
	--enable-linux-io-uring
	--enable-linux-user
	--enable-live-block-migration
	--disable-lto
	--disable-lzfse
	--enable-lzo
	--disable-malloc-trim
	--enable-membarrier
	--enable-module-upgrades
	--enable-modules
	--enable-mpath
	--enable-multiprocess
	--disable-netmap
	--enable-nettle
	--enable-numa
	--disable-nvmm
	--enable-opengl
	--enable-oss
	--enable-pa
	--enable-parallels
	--enable-pie
	--enable-plugins
	--enable-png
	--disable-profiler
	--enable-pvrdma
	--enable-qcow1
	--enable-qed
	--disable-qom-cast-debug
	--enable-rbd
	--enable-rdma
	--enable-replication
	--enable-rng-none
	--disable-safe-stack
	--disable-sanitizers
	--enable-stack-protector
	--enable-sdl
	--enable-sdl-image
	--enable-seccomp
	--enable-selinux
	--enable-slirp
	--enable-slirp-smbd
	--enable-smartcard
	--enable-snappy
	--enable-sparse
	--enable-spice
	--enable-spice-protocol
	--enable-system
	--enable-tcg
	--disable-tcg-interpreter
	--enable-tools
	--enable-tpm
	--disable-tsan
	--disable-u2f
	--enable-usb-redir
	--enable-user
	--disable-vde
	--enable-vdi
	--enable-vhost-crypto
	--enable-vhost-kernel
	--enable-vhost-net
	--enable-vhost-user
	--enable-vhost-user-blk-server
	--enable-vhost-vdpa
	--enable-virglrenderer
	--enable-virtfs
	--enable-virtiofsd
	--enable-vnc
	--enable-vnc-jpeg
	--enable-vnc-sasl
	--enable-vte
	--enable-vvfat
	--enable-werror
	--disable-whpx
	--enable-xen
	--enable-xen-pci-passthrough
	--enable-xkbcommon
	--enable-zstd

- It gives a segmentation fault.  Nothing else.

Can we get at least a backtrace to work from there?

Thanks, Juan.
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Peter Maydell 2 years, 3 months ago
On Mon, 6 Feb 2023 at 08:46, Juan Quintela <quintela@redhat.com> wrote:
>  31/659 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test                   ERROR          48.23s   killed by signal 6 SIGABRT
> >>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh QTEST_QEMU_IMG=./qemu-img QTEST_QEMU_BINARY=./qemu-system-aarch64 MALLOC_PERTURB_=124 QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon /home/gitlab-runner/builds/-LCfcJ2T/0/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀  ―――――――――――――――――――――――――――――――――――――
> stderr:
> Broken pipe
> ../tests/qtest/libqtest.c:190: kill_qemu() detected QEMU death from signal 11 (Segmentation fault) (core dumped)
> TAP parsing error: Too few tests run (expected 41, got 12)
> (test program exited with status code -6)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> I don't know hat to do with this:
> - this is aarch64 tcg
> - this *works* on f37, or at least I can't reproduce any error with make
>   check on my box, and I *think* my configuration is quite extensive (as
>   far as I know everything that can be compiled in fedora with packages
>   in the distro):

> - It gives a segmentation fault.  Nothing else.
>
> Can we get at least a backtrace to work from there?

Improving the test program / test harness to provide better
information would help. Ultimately if we're going to be
doing CI in gitlab this is a necessity -- all we are ever
going to have is whatever the test program and harness can
print to the logs...

thanks
-- PMM
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Peter Maydell 2 years, 3 months ago
On Fri, 3 Feb 2023 at 21:14, Juan Quintela <quintela@redhat.com> wrote:
>
> Peter Maydell <peter.maydell@linaro.org> wrote:
> > The migration tests have been flaky for a while now,
> > including setups where host and guest page sizes are the same.
> > (For instance, my x86 macos box pretty reliably sees failures
> > when the machine is under load.)
>
> I *thought* that we had fixed all of those.
>
> But it is difficult for me to know because:
> - I only happens when one runs "make check"
> - running ./migration-test have never failed to me
> - When it fails (and it has been a while since it has failed to me)
>   it is impossible to me to detect what is going on, and as said, I have
>   never been able to reproduce running only migration-test.

Yes. If we could improve the logging in the test so that when
an intermittent failure does happen the test prints better
clues about what happened, I think that would help a lot.

https://lore.kernel.org/qemu-devel/CAFEAcA8x_iM3hN2-P9F+huXnXFXy+D6FzE+Leq4erLdg7zkVGw@mail.gmail.com/
is the thread from late December about the macos failures.

-- PMM
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Juan Quintela 2 years, 3 months ago
Peter Maydell <peter.maydell@linaro.org> wrote:
> On Fri, 3 Feb 2023 at 21:14, Juan Quintela <quintela@redhat.com> wrote:
>>
>> Peter Maydell <peter.maydell@linaro.org> wrote:
>> > The migration tests have been flaky for a while now,
>> > including setups where host and guest page sizes are the same.
>> > (For instance, my x86 macos box pretty reliably sees failures
>> > when the machine is under load.)
>>
>> I *thought* that we had fixed all of those.
>>
>> But it is difficult for me to know because:
>> - I only happens when one runs "make check"
>> - running ./migration-test have never failed to me
>> - When it fails (and it has been a while since it has failed to me)
>>   it is impossible to me to detect what is going on, and as said, I have
>>   never been able to reproduce running only migration-test.
>
> Yes. If we could improve the logging in the test so that when
> an intermittent failure does happen the test prints better
> clues about what happened, I think that would help a lot.
>
> https://lore.kernel.org/qemu-devel/CAFEAcA8x_iM3hN2-P9F+huXnXFXy+D6FzE+Leq4erLdg7zkVGw@mail.gmail.com/
> is the thread from late December about the macos failures.

We (red hat) found a similar problem with aarch64, but only when using
zero copy.  Will try to see if I can reproduce this other there.

https://bugzilla.redhat.com/show_bug.cgi?id=2160929

the similar thing to what you have is:
- they are trying to cancel
- they are on aarch64

but:

- they can only reproduce with zero copy enabled.

Later, Juan.
Re: [PATCH 6/6] gitlab-ci.d/buildtest: Disintegrate the build-coroutine-sigaltstack job
Posted by Thomas Huth 2 years, 3 months ago
On 30/01/2023 11.58, Daniel P. Berrangé wrote:
> On Mon, Jan 30, 2023 at 11:44:46AM +0100, Thomas Huth wrote:
>> We can get rid of the build-coroutine-sigaltstack job by moving
>> the configure flags that should be tested here to other jobs:
>> Move --with-coroutine=sigaltstack to the build-without-defaults job
>> and --enable-trace-backends=ftrace to the cross-s390x-kvm-only job.
> 
> The biggest user of coroutines is the block layer. So we probably
> ought to have coroutines aligned with a job that triggers the
> 'make check-block' for iotests.  IIUC,  the without-defaults
> job won't do that. How about, arbitrarily, using either the
> 'check-system-debian' or 'check-system-ubuntu' job. Those distros
> are closely related, so getting sigaltstack vs ucontext coverage
> between them is a good win, and they both trigger the block jobs
> IIUC.

Ok ... but let's put qemu-block@nongnu.org on CC: first, to see what the 
block layer folks think about this.

> Incidentally sigaltstack is also covered by our Cirrus CI job
> for macOS.

Oh, nice, so we already have some "check-block" test coverage there!

  Thomas


>> ---
>>   .gitlab-ci.d/buildtest.yml   | 14 ++------------
>>   .gitlab-ci.d/crossbuilds.yml |  2 +-
>>   2 files changed, 3 insertions(+), 13 deletions(-)
>>
>> diff --git a/.gitlab-ci.d/buildtest.yml b/.gitlab-ci.d/buildtest.yml
>> index 91c7467a66..1438797a1c 100644
>> --- a/.gitlab-ci.d/buildtest.yml
>> +++ b/.gitlab-ci.d/buildtest.yml
>> @@ -533,19 +533,8 @@ build-tci:
>>       - QTEST_QEMU_BINARY="./qemu-system-s390x" ./tests/qtest/pxe-test -m slow
>>       - make check-tcg
>>   
>> -# Alternate coroutines implementations are only really of interest to KVM users
>> -# However we can't test against KVM on Gitlab-CI so we can only run unit tests
>> -build-coroutine-sigaltstack:
>> -  extends: .native_build_job_template
>> -  needs:
>> -    job: amd64-ubuntu2004-container
>> -  variables:
>> -    IMAGE: ubuntu2004
>> -    CONFIGURE_ARGS: --with-coroutine=sigaltstack --disable-tcg
>> -                    --enable-trace-backends=ftrace
>> -    MAKE_CHECK_ARGS: check-unit
>> -
>>   # Check our reduced build configurations
>> +# (and an alternative coroutine implementation)
>>   build-without-defaults:
>>     extends: .native_build_job_template
>>     needs:
>> @@ -559,6 +548,7 @@ build-without-defaults:
>>         --disable-pie
>>         --disable-qom-cast-debug
>>         --disable-strip
>> +      --with-coroutine=sigaltstack
>>       TARGETS: avr-softmmu mips64-softmmu s390x-softmmu sh4-softmmu
>>         sparc64-softmmu hexagon-linux-user i386-linux-user s390x-linux-user
>>       MAKE_CHECK_ARGS: check-unit check-qtest-avr check-qtest-mips64
>> diff --git a/.gitlab-ci.d/crossbuilds.yml b/.gitlab-ci.d/crossbuilds.yml
>> index 8dbbb8f881..027d2088da 100644
>> --- a/.gitlab-ci.d/crossbuilds.yml
>> +++ b/.gitlab-ci.d/crossbuilds.yml
>> @@ -159,7 +159,7 @@ cross-s390x-kvm-only:
>>       job: s390x-debian-cross-container
>>     variables:
>>       IMAGE: debian-s390x-cross
>> -    EXTRA_CONFIGURE_OPTS: --disable-tcg
>> +    EXTRA_CONFIGURE_OPTS: --disable-tcg --enable-trace-backends=ftrace
>>   
>>   cross-mips64el-kvm-only:
>>     extends: .cross_accel_build_job
>> -- 
>> 2.31.1
>>
>>
> 
> With regards,
> Daniel