From nobody Wed May 14 12:16:36 2025
From: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
To: libvir-list@redhat.com
Date: Wed, 18 Apr 2018 17:44:43 +0300
Message-Id: <1524062684-854425-5-git-send-email-nshirokovskiy@virtuozzo.com>
In-Reply-To: <1524062684-854425-1-git-send-email-nshirokovskiy@virtuozzo.com>
References: <1524062684-854425-1-git-send-email-nshirokovskiy@virtuozzo.com>
Subject: [libvirt] [PATCH REBASE 4/5] qemu: fix domain object wait to handle monitor errors
List-Id: Development discussions about the libvirt library & tools
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Block job abort cannot properly handle a qemu crash that occurs while we
are waiting for abort/pivot completion. The deadlock scenario is as
follows:

- qemuDomainBlockJobAbort waits for pivot/abort completion
- qemu crashes; qemuProcessBeginStopJob broadcasts the VM condition and
  then waits for the job condition (held by qemuDomainBlockJobAbort)
- qemuDomainBlockJobAbort wakes up, but nothing has really changed: the
  VM is still active (vm->def->id != -1), so the thread starts waiting
  for completion again.

Now the two threads are deadlocked.

First, let's add a condition besides the domain active status that the
waiting thread can check when it wakes up. Second, let's move the
signalling to the place where that condition is set, namely the monitor
EOF/error handlers. Signalling from qemuProcessBeginStopJob is not
useful.

The patch copies the monitor error into the domain state because by the
time the waiting thread wakes up the monitor may already be gone, and it
is useful to report the monitor error to the user.

The patch has a drawback: when destroying a domain while waiting for an
event from the monitor, the waiting API returns a not very convenient
error message such as 'EOF from monitor'. On the other hand, if qemu
crashes this is more useful than 'domain is not running'. The first case
will be addressed in another patch.

The patch also fixes the other places where we wait for an event from
qemu, namely device removal and tray ejection handling. The remaining
users of virDomainObjWait (dump, migration, save) were fine because they
run in async jobs, which allow a concurrent destroy job.
Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@virtuozzo.com>
---
 src/conf/domain_conf.c    | 43 -------------------------------------------
 src/conf/domain_conf.h    |  3 ---
 src/libvirt_private.syms  |  2 --
 src/qemu/qemu_domain.c    | 45 ++++++++++++++++++++++++++++++++++++++++++++
 src/qemu/qemu_domain.h    |  5 ++++-
 src/qemu/qemu_driver.c    |  6 +++---
 src/qemu/qemu_hotplug.c   |  4 ++--
 src/qemu/qemu_migration.c | 12 ++++++------
 src/qemu/qemu_process.c   | 27 ++++++++++++++++++++++-----
 9 files changed, 82 insertions(+), 65 deletions(-)

diff --git a/src/conf/domain_conf.c b/src/conf/domain_conf.c
index 5f1af91..e046008 100644
--- a/src/conf/domain_conf.c
+++ b/src/conf/domain_conf.c
@@ -3249,49 +3249,6 @@ virDomainObjBroadcast(virDomainObjPtr vm)
 }
 
 
-int
-virDomainObjWait(virDomainObjPtr vm)
-{
-    if (virCondWait(&vm->cond, &vm->parent.lock) < 0) {
-        virReportSystemError(errno, "%s",
-                             _("failed to wait for domain condition"));
-        return -1;
-    }
-
-    if (!virDomainObjIsActive(vm)) {
-        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
-                       _("domain is not running"));
-        return -1;
-    }
-
-    return 0;
-}
-
-
-/**
- * Waits for domain condition to be triggered for a specific period of time.
- *
- * Returns:
- *  -1 in case of error
- *   0 on success
- *   1 on timeout
- */
-int
-virDomainObjWaitUntil(virDomainObjPtr vm,
-                      unsigned long long whenms)
-{
-    if (virCondWaitUntil(&vm->cond, &vm->parent.lock, whenms) < 0) {
-        if (errno != ETIMEDOUT) {
-            virReportSystemError(errno, "%s",
-                                 _("failed to wait for domain condition"));
-            return -1;
-        }
-        return 1;
-    }
-    return 0;
-}
-
-
 /*
  * Mark the current VM config as transient. Ensures transient hotplug
  * operations do not persist past shutdown.
diff --git a/src/conf/domain_conf.h b/src/conf/domain_conf.h
index 122a051..b4d5bdb 100644
--- a/src/conf/domain_conf.h
+++ b/src/conf/domain_conf.h
@@ -2747,9 +2747,6 @@ bool virDomainObjTaint(virDomainObjPtr obj,
                        virDomainTaintFlags taint);
 
 void virDomainObjBroadcast(virDomainObjPtr vm);
-int virDomainObjWait(virDomainObjPtr vm);
-int virDomainObjWaitUntil(virDomainObjPtr vm,
-                          unsigned long long whenms);
 
 void virDomainPanicDefFree(virDomainPanicDefPtr panic);
 void virDomainResourceDefFree(virDomainResourceDefPtr resource);
diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms
index 6bbbf77..ed5498e 100644
--- a/src/libvirt_private.syms
+++ b/src/libvirt_private.syms
@@ -493,8 +493,6 @@ virDomainObjSetMetadata;
 virDomainObjSetState;
 virDomainObjTaint;
 virDomainObjUpdateModificationImpact;
-virDomainObjWait;
-virDomainObjWaitUntil;
 virDomainOSTypeFromString;
 virDomainOSTypeToString;
 virDomainParseMemory;
diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index 672f08b..1f40ff1 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -1909,6 +1909,7 @@ qemuDomainObjPrivateFree(void *data)
     qemuDomainObjFreeJob(priv);
     VIR_FREE(priv->lockState);
     VIR_FREE(priv->origname);
+    virResetError(&priv->monError);
 
     virChrdevFree(priv->devs);
 
@@ -11881,3 +11882,47 @@ qemuProcessEventFree(struct qemuProcessEvent *event)
     }
     VIR_FREE(event);
 }
+
+
+/**
+ * Waits for domain condition to be triggered for a specific period of time.
+ * If @until is 0 then waits indefinitely.
+ *
+ * Returns:
+ *  -1 on error
+ *   0 on success
+ *   1 on timeout
+ */
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    int rc;
+
+    if (until)
+        rc = virCondWaitUntil(&vm->cond, &vm->parent.lock, until);
+    else
+        rc = virCondWait(&vm->cond, &vm->parent.lock);
+
+    if (rc < 0) {
+        if (until && errno == ETIMEDOUT)
+            return 1;
+
+        virReportSystemError(errno, "%s",
+                             _("failed to wait for domain condition"));
+        return -1;
+    }
+
+    if (!virDomainObjIsActive(vm)) {
+        virReportError(VIR_ERR_OPERATION_FAILED, "%s",
+                       _("domain is not running"));
+        return -1;
+    }
+
+    if (priv->monError.code != VIR_ERR_OK) {
+        virSetError(&priv->monError);
+        return -1;
+    }
+
+    return 0;
+}
diff --git a/src/qemu/qemu_domain.h b/src/qemu/qemu_domain.h
index 40d9a6f..494ed35 100644
--- a/src/qemu/qemu_domain.h
+++ b/src/qemu/qemu_domain.h
@@ -262,7 +262,7 @@ struct _qemuDomainObjPrivate {
     qemuMonitorPtr mon;
     virDomainChrSourceDefPtr monConfig;
     bool monJSON;
-    bool monError;
+    virError monError;
     unsigned long long monStart;
 
     qemuAgentPtr agent;
@@ -994,4 +994,7 @@ qemuDomainPrepareDiskSource(virDomainDiskDefPtr disk,
                            qemuDomainObjPrivatePtr priv,
                            virQEMUDriverConfigPtr cfg);
 
+int
+qemuDomainObjWait(virDomainObjPtr vm, unsigned long long until);
+
 #endif /* __QEMU_DOMAIN_H__ */
diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c
index 0dd6032..03969d8 100644
--- a/src/qemu/qemu_driver.c
+++ b/src/qemu/qemu_driver.c
@@ -2727,7 +2727,7 @@ qemuDomainGetControlInfo(virDomainPtr dom,
 
     memset(info, 0, sizeof(*info));
 
-    if (priv->monError) {
+    if (priv->monError.code != VIR_ERR_OK) {
         info->state = VIR_DOMAIN_CONTROL_ERROR;
         info->details = VIR_DOMAIN_CONTROL_ERROR_REASON_MONITOR;
     } else if (priv->job.active) {
@@ -3726,7 +3726,7 @@ qemuDumpWaitForCompletion(virDomainObjPtr vm)
 
     VIR_DEBUG("Waiting for dump completion");
     while (!priv->job.dumpCompleted && !priv->job.abortJob) {
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm, 0) < 0)
             return -1;
     }
 
@@ -16924,7 +16924,7 @@ qemuDomainBlockJobAbort(virDomainPtr dom,
         qemuDomainDiskPrivatePtr diskPriv = QEMU_DOMAIN_DISK_PRIVATE(disk);
         qemuBlockJobUpdate(driver, vm, QEMU_ASYNC_JOB_NONE, disk, NULL);
         while (diskPriv->blockjob) {
-            if (virDomainObjWait(vm) < 0) {
+            if (qemuDomainObjWait(vm, 0) < 0) {
                 ret = -1;
                 goto endjob;
             }
diff --git a/src/qemu/qemu_hotplug.c b/src/qemu/qemu_hotplug.c
index 8d3191f..527ce3f 100644
--- a/src/qemu/qemu_hotplug.c
+++ b/src/qemu/qemu_hotplug.c
@@ -206,7 +206,7 @@ qemuHotplugWaitForTrayEject(virQEMUDriverPtr driver,
         return -1;
 
     while (disk->tray_status != VIR_DOMAIN_DISK_TRAY_OPEN) {
-        if ((rc = virDomainObjWaitUntil(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
+        if ((rc = qemuDomainObjWait(vm, now + CHANGE_MEDIA_TIMEOUT)) < 0)
             return -1;
 
         if (rc > 0) {
@@ -4602,7 +4602,7 @@ qemuDomainWaitForDeviceRemoval(virDomainObjPtr vm)
         until += qemuDomainRemoveDeviceWaitTime;
 
     while (priv->unplug.alias) {
-        if ((rc = virDomainObjWaitUntil(vm, until)) == 1)
+        if ((rc = qemuDomainObjWait(vm, until)) == 1)
             return 0;
 
         if (rc < 0) {
diff --git a/src/qemu/qemu_migration.c b/src/qemu/qemu_migration.c
index 88b8253..82e1c7f 100644
--- a/src/qemu/qemu_migration.c
+++ b/src/qemu/qemu_migration.c
@@ -738,7 +738,7 @@ qemuMigrationSrcCancelDriveMirror(virQEMUDriverPtr driver,
         if (failed && !err)
             err = virSaveLastError();
 
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm, 0) < 0)
             goto cleanup;
     }
 
@@ -877,7 +877,7 @@ qemuMigrationSrcDriveMirror(virQEMUDriverPtr driver,
             goto cleanup;
         }
 
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm, 0) < 0)
             goto cleanup;
     }
 
@@ -1181,7 +1181,7 @@ qemuMigrationSrcWaitForSpice(virDomainObjPtr vm)
 
     VIR_DEBUG("Waiting for SPICE to finish migration");
     while (!priv->job.spiceMigrated && !priv->job.abortJob) {
-        if (virDomainObjWait(vm) < 0)
+        if (qemuDomainObjWait(vm, 0) < 0)
             return -1;
     }
     return 0;
@@ -1460,7 +1460,7 @@ qemuMigrationSrcWaitForCompletion(virQEMUDriverPtr driver,
         return rv;
 
     if (events) {
-        if (virDomainObjWait(vm) < 0) {
+        if (qemuDomainObjWait(vm, 0) < 0) {
             jobInfo->status = QEMU_DOMAIN_JOB_STATUS_FAILED;
             return -2;
         }
@@ -1513,7 +1513,7 @@ qemuMigrationDstWaitForCompletion(virQEMUDriverPtr driver,
 
     while ((rv = qemuMigrationAnyCompleted(driver, vm, asyncJob,
                                            NULL, flags)) != 1) {
-        if (rv < 0 || virDomainObjWait(vm) < 0)
+        if (rv < 0 || qemuDomainObjWait(vm, 0) < 0)
             return -1;
     }
 
@@ -3464,7 +3464,7 @@ qemuMigrationSrcRun(virQEMUDriverPtr driver,
         if (priv->monJSON) {
             while (virDomainObjGetState(vm, NULL) == VIR_DOMAIN_RUNNING) {
                 priv->signalStop = true;
-                rc = virDomainObjWait(vm);
+                rc = qemuDomainObjWait(vm, 0);
                 priv->signalStop = false;
                 if (rc < 0)
                     goto error;
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 6a5262a..d76809e 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -268,6 +268,23 @@ qemuConnectAgent(virQEMUDriverPtr driver, virDomainObjPtr vm)
     return 0;
 }
 
+static void
+qemuProcessNotifyMonitorError(virDomainObjPtr vm,
+                              qemuMonitorPtr mon)
+{
+    qemuDomainObjPrivatePtr priv = vm->privateData;
+    virErrorPtr err = qemuMonitorLastError(mon);
+
+    virCopyError(err, &priv->monError);
+
+    /* set an error code if due to OOM conditions we failed to set it before */
+    if (priv->monError.code == VIR_ERR_OK)
+        priv->monError.code = VIR_ERR_INTERNAL_ERROR;
+
+    /* Wake up anything waiting for events from monitor */
+    virDomainObjBroadcast(vm);
+    virFreeError(err);
+}
 
 /*
  * This is a callback registered with a qemuMonitorPtr instance,
@@ -286,6 +303,8 @@ qemuProcessHandleMonitorEOF(qemuMonitorPtr mon,
 
     virObjectLock(vm);
 
+    qemuProcessNotifyMonitorError(vm, mon);
+
     VIR_DEBUG("Received EOF on %p '%s'", vm, vm->def->name);
 
     priv = vm->privateData;
@@ -338,7 +357,8 @@ qemuProcessHandleMonitorError(qemuMonitorPtr mon ATTRIBUTE_UNUSED,
 
     virObjectLock(vm);
 
-    ((qemuDomainObjPrivatePtr) vm->privateData)->monError = true;
+    qemuProcessNotifyMonitorError(vm, mon);
+
     event = virDomainEventControlErrorNewFromObj(vm);
     qemuDomainEventQueue(driver, event);
 
@@ -5727,7 +5747,7 @@ qemuProcessPrepareDomain(virQEMUDriverPtr driver,
         goto cleanup;
 
     priv->monJSON = true;
-    priv->monError = false;
+    virResetError(&priv->monError);
     priv->monStart = 0;
     priv->gotShutdown = false;
 
@@ -6483,9 +6503,6 @@ qemuProcessBeginStopJob(virQEMUDriverPtr driver,
     if (qemuProcessKill(vm, killFlags) < 0)
         goto cleanup;
 
-    /* Wake up anything waiting on domain condition */
-    virDomainObjBroadcast(vm);
-
     if (qemuDomainObjBeginJob(driver, vm, job) < 0)
         goto cleanup;
 
-- 
1.8.3.1

-- 
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list