From nobody Sat Apr 27 21:44:27 2024
From: "Dr.
David Alan Gilbert (git)"
To: qemu-devel@nongnu.org, quintela@redhat.com, amit.shah@redhat.com
Cc: zhang.zhanghailiang@huawei.com
Date: Thu, 2 Feb 2017 15:59:08 +0000
Message-Id: <20170202155909.31784-2-dgilbert@redhat.com>
In-Reply-To: <20170202155909.31784-1-dgilbert@redhat.com>
References: <20170202155909.31784-1-dgilbert@redhat.com>
Subject: [Qemu-devel] [PATCH 1/2] Postcopy: Reset state to avoid cleanup assert

From: "Dr. David Alan Gilbert"

On a destination host with no userfault support an incoming postcopy
would cause the state to enter ADVISE before it realised there was
no support, and because it was in ADVISE state it would perform a
cleanup at the end. Since there was no support the cleanup function
should be unreachable, but ends up being called and asserting.

Reset the state when we realise we have no support, thus the cleanup
doesn't happen.

Signed-off-by: Dr.
David Alan Gilbert
---
 migration/savevm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/savevm.c b/migration/savevm.c
index e8e5ff5..de86db0 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1355,6 +1355,7 @@ static int loadvm_postcopy_handle_advise(MigrationIncomingState *mis)
     }
 
     if (!postcopy_ram_supported_by_host()) {
+        postcopy_state_set(POSTCOPY_INCOMING_NONE);
         return -1;
     }
 
-- 
2.9.3

From nobody Sat Apr 27 21:44:27 2024
From: "Dr. David Alan Gilbert (git)"
To: qemu-devel@nongnu.org, quintela@redhat.com, amit.shah@redhat.com
Cc: zhang.zhanghailiang@huawei.com
Date: Thu, 2 Feb 2017 15:59:09 +0000
Message-Id: <20170202155909.31784-3-dgilbert@redhat.com>
In-Reply-To: <20170202155909.31784-1-dgilbert@redhat.com>
References: <20170202155909.31784-1-dgilbert@redhat.com>
Subject: [Qemu-devel] [PATCH 2/2] postcopy: Recover block devices on early failure

From: "Dr. David Alan Gilbert"

An early postcopy failure can be recovered from as long as we know
we haven't sent the command to run the destination.
We have to undo the bdrv_inactivate_all by calling
bdrv_invalidate_cache_all.

Note that I'm not using ms->block_inactive because once we've
sent the postcopy package we don't want anything else to try
and recover the block storage on the source; the destination
might have started writing to it.

Signed-off-by: Dr.
David Alan Gilbert
---
 migration/migration.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/migration/migration.c b/migration/migration.c
index 2766d2f..283677c 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1605,6 +1605,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     QIOChannelBuffer *bioc;
     QEMUFile *fb;
     int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+    bool restart_block = false;
     migrate_set_state(&ms->state, MIGRATION_STATUS_ACTIVE,
                       MIGRATION_STATUS_POSTCOPY_ACTIVE);
 
@@ -1624,6 +1625,7 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
     if (ret < 0) {
         goto fail;
     }
+    restart_block = true;
 
     /*
      * Cause any non-postcopiable, but iterative devices to
@@ -1680,6 +1682,18 @@ static int postcopy_start(MigrationState *ms, bool *old_vm_running)
 
     /* <><> end of stuff going into the package */
 
+    /* Last point of recovery; as soon as we send the package the destination
+     * can open devices and potentially start running.
+     * Lets just check again we've not got any errors.
+     */
+    ret = qemu_file_get_error(ms->to_dst_file);
+    if (ret) {
+        error_report("postcopy_start: Migration stream errored (pre package)");
+        goto fail_closefb;
+    }
+
+    restart_block = false;
+
     /* Now send that blob */
     if (qemu_savevm_send_packaged(ms->to_dst_file, bioc->data, bioc->usage)) {
         goto fail_closefb;
@@ -1717,6 +1731,17 @@ fail_closefb:
 fail:
     migrate_set_state(&ms->state, MIGRATION_STATUS_POSTCOPY_ACTIVE,
                           MIGRATION_STATUS_FAILED);
+    if (restart_block) {
+        /* A failure happened early enough that we know the destination hasn't
+         * accessed block devices, so we're safe to recover.
+         */
+        Error *local_err = NULL;
+
+        bdrv_invalidate_cache_all(&local_err);
+        if (local_err) {
+            error_report_err(local_err);
+        }
+    }
     qemu_mutex_unlock_iothread();
     return -1;
 }
-- 
2.9.3
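
For readers who want to see the recovery window of patch 2/2 in isolation, here is a minimal standalone sketch of the restart_block pattern. It is not QEMU code: start_sketch(), inactivate_blocks(), reactivate_blocks(), blocks_active and the step enum are hypothetical stand-ins, with reactivate_blocks() playing the role of bdrv_invalidate_cache_all and the "send" step standing in for qemu_savevm_send_packaged. It only models the control flow, under the assumption that a failure is injected at one chosen step.

```c
#include <stdbool.h>

/* Hypothetical failure-injection points for this sketch; not QEMU names. */
enum step { STEP_NONE, STEP_INACTIVATE, STEP_BUILD_PACKAGE, STEP_SEND_PACKAGE };

/* true while the source side still owns its block devices */
static bool blocks_active = true;

/* Stand-in for inactivating the block devices on the source. */
static void inactivate_blocks(void) { blocks_active = false; }

/* Stand-in for the recovery call that undoes the inactivation. */
static void reactivate_blocks(void) { blocks_active = true; }

/*
 * Models the flow of the patched function: returns 0 on success and -1 on
 * failure. fail_at selects the step at which a failure is simulated.
 */
static int start_sketch(enum step fail_at)
{
    bool restart_block = false;

    if (fail_at == STEP_INACTIVATE) {
        goto fail;            /* nothing was inactivated, nothing to undo */
    }
    inactivate_blocks();
    restart_block = true;     /* a failure from here on must undo the above */

    if (fail_at == STEP_BUILD_PACKAGE) {
        goto fail;            /* still before the last point of recovery */
    }

    /* Last point of recovery: clear the flag before attempting the send. */
    restart_block = false;
    if (fail_at == STEP_SEND_PACKAGE) {
        goto fail;            /* the destination may already be running */
    }
    return 0;

fail:
    if (restart_block) {
        reactivate_blocks();  /* safe: the destination has not started */
    }
    return -1;
}
```

Calling start_sketch(STEP_BUILD_PACKAGE) leaves blocks_active true again, while start_sketch(STEP_SEND_PACKAGE) leaves it false. The second case is the one the commit message describes: once the package may have gone out, the source must not reclaim the storage. Clearing the flag before the send, rather than after it succeeds, is what makes a failed send count as unrecoverable.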