From nobody Tue May 13 07:03:27 2025 Delivered-To: importer2@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass(p=none dis=none) header.from=redhat.com ARC-Seal: i=1; a=rsa-sha256; t=1675110257; cv=none; d=zohomail.com; s=zohoarc; b=UpvdR3Wu3+5r1xD4TlF+U7iPQyPFAP5TlS1Pl7ByF3tD5f95NIgyTyAFUP6+aeupdq/aP5XyphjUBzMRYtNlouaRjN7qlTrFX8smUky9ooaGGQKsPs5CIxVbgM95PT0WMF1MCiEB7NoZj/8ArL2kS/0M7uHO1SyBZ3mMlWP74PE= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1675110257; h=Content-Type:Content-Transfer-Encoding:Cc:Date:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:To; bh=UAwSvHSOl+UoCWd5gVq+4zPfPKuJ/FxuHvkI36xoePk=; b=PWTahlQ5hS1JCoVcj+WIGAuaxRMJSuhndaGIrTazjy/0qmKJKVJNm59lmgQLJUsczhBuxYMwreTsYs1FCqH+/rrUNAFg4cBccw8tMyTPC6RvJnqPXrgiyEHiEkdieedMVCoBTU2lYlIKT3+2uKP6AP5FBbdu9d4GDJ6i7gYflec= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass header.from= (p=none dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTP id 1675110257362302.53171595869765; Mon, 30 Jan 2023 12:24:17 -0800 (PST) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pMadf-0004Nz-Vx; Mon, 30 Jan 2023 15:20:28 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pMadT-0003RB-Gq for qemu-devel@nongnu.org; Mon, 30 Jan 2023 15:20:16 -0500 Received: from us-smtp-delivery-124.mimecast.com ([170.10.129.124]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pMadQ-00076S-BV for qemu-devel@nongnu.org; Mon, 30 Jan 2023 15:20:15 -0500 Received: from mail-ed1-f72.google.com (mail-ed1-f72.google.com [209.85.208.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-299-ulUkELp9My-vNQlZ2ATScQ-1; Mon, 30 Jan 2023 15:20:08 -0500 Received: by mail-ed1-f72.google.com with SMTP id t26-20020aa7d71a000000b004a244cc7ee8so2816827edq.2 for ; Mon, 30 Jan 2023 12:20:08 -0800 (PST) Received: from redhat.com ([2.52.144.173]) by smtp.gmail.com with ESMTPSA id gn19-20020a1709070d1300b008512e1379dbsm7505598ejc.171.2023.01.30.12.20.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 30 Jan 2023 12:20:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1675110010; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=UAwSvHSOl+UoCWd5gVq+4zPfPKuJ/FxuHvkI36xoePk=; b=hyejNH1MkH3I4gctVQzmEgL3cR1k79yUOInzoIJtY4yGuY91FdZYVVZvpXO15QjlqSJQ3R gFHizw2PQKCFhPZwAOyJrTyh4Nf3jDktKEwLEUbHcVVlGKwmRyQA620/XPZ/VcDJcybYNn 5o6T47CAIREZ/cQ0HoIcmiW1YBjN8CU= X-MC-Unique: ulUkELp9My-vNQlZ2ATScQ-1 X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-transfer-encoding:content-disposition :mime-version:references:message-id:subject:cc:to:from:date :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=UAwSvHSOl+UoCWd5gVq+4zPfPKuJ/FxuHvkI36xoePk=; b=MaOpSVcs7Iegi+U08cDcyEevJPe0zIeNa67DIavt9rraD14Yc0og32a5RQBZStX30I CPyfcs/eE3O4uHtlBevTTeDEr1x6ordgVC/pIF3xqh+vXPOCyV1+LypCGHqbZomvbTRN jJFcUbF4vUZQ6UrUQRRwAEennpWDoP3Bl+qPUEA/PcYSCXJjG2wO4xVfkrCC/Fk9LUJI 2JTNdyYCCGAcN7VUMGxkcVnTpU90TsnSvimScsOGzjpqflCpGSUIJ+fYaNxsjcrPUEje fOMKoyW6r1nGo+945UQa5BDp0mQc4/1NW4V+Lrhlb8gqBwN4/L7CUgWYbPfSsDF0StVo BQrQ== X-Gm-Message-State: AO0yUKVK3AZKogE6NnQUd+josIT5oUAgzcRyruEFgpVS8V1QnWY5IHjH wFNF7WLbf1UWWeU2jiK8+St8cyaR/G2aK/Z82sgCKaetuG0PdVwW5FfEQBWONFKE7HhC41GCeyj 9+TuiSOuIC2t6HfGxno525LDd42rG2zmJcgs99jmntkGa0BW41ykIrcTac25L X-Received: by 2002:a17:906:b7ca:b0:878:814d:bc99 with SMTP id fy10-20020a170906b7ca00b00878814dbc99mr16014110ejb.66.1675110004022; Mon, 30 Jan 2023 12:20:04 -0800 (PST) X-Google-Smtp-Source: AK7set8WEbaok4QRm+tfQUwP43WlgpJvFsOb3kX+Eo92Am5pQD1LgLnseQotKea5csNMeOlgTHprHg== X-Received: by 2002:a17:906:b7ca:b0:878:814d:bc99 with SMTP id fy10-20020a170906b7ca00b00878814dbc99mr16014075ejb.66.1675110003753; Mon, 30 Jan 2023 12:20:03 -0800 (PST) Date: Mon, 30 Jan 2023 15:19:59 -0500 From: "Michael S. Tsirkin" To: qemu-devel@nongnu.org Cc: Peter Maydell , "Jason A. Donenfeld" , x86@kernel.org, Philippe =?utf-8?Q?Mathieu-Daud=C3=A9?= , "H . Peter Anvin" , Borislav Petkov , Eric Biggers , Eric Biggers , Mathias Krause , Sergio Lopez , Paolo Bonzini , Richard Henderson , Eduardo Habkost , Marcel Apfelbaum , Gerd Hoffmann Subject: [PULL 10/56] x86: don't let decompressed kernel image clobber setup_data Message-ID: <20230130201810.11518-11-mst@redhat.com> References: <20230130201810.11518-1-mst@redhat.com> MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" Content-Disposition: inline Content-Transfer-Encoding: quoted-printable In-Reply-To: <20230130201810.11518-1-mst@redhat.com> X-Mailer: git-send-email 2.27.0.106.g8ac3dc51b1 X-Mutt-Fcc: =sent Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer2=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=170.10.129.124; envelope-from=mst@redhat.com; helo=us-smtp-delivery-124.mimecast.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIMWL_WL_HIGH=-0.001, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, RCVD_IN_MSPIKE_H2=-0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer2=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @redhat.com) X-ZM-MESSAGEID: 1675110259067100001 From: "Jason A. Donenfeld" The setup_data links are appended to the compressed kernel image. Since the kernel image is typically loaded at 0x100000, setup_data lives at `0x100000 + compressed_size`, which does not get relocated during the kernel's boot process. The kernel typically decompresses the image starting at address 0x1000000 (note: there's one more zero there than the compressed image above). This usually is fine for most kernels. However, if the compressed image is actually quite large, then setup_data will live at a `0x100000 + compressed_size` that extends into the decompressed zone at 0x1000000. In other words, if compressed_size is larger than `0x1000000 - 0x100000`, then the decompression step will clobber setup_data, resulting in crashes. Visually, what happens now is that QEMU appends setup_data to the kernel image: kernel image setup_data |--------------------------||----------------| 0x100000 0x100000+l1 0x100000+l1+l2 The problem is that this decompresses to 0x1000000 (one more zero). So if l1 is > (0x1000000-0x100000), then this winds up looking like: kernel image setup_data |--------------------------||----------------| 0x100000 0x100000+l1 0x100000+l1+l2 d e c o m p r e s s e d k e r n e l |-----------------------------------------------------= --------| 0x1000000 = 0x1000000+l3 The decompressed kernel seemingly overwriting the compressed kernel image isn't a problem, because that gets relocated to a higher address early on in the boot process, at the end of startup_64. setup_data, however, stays in the same place, since those links are self referential and nothing fixes them up. So the decompressed kernel clobbers it. Fix this by appending setup_data to the cmdline blob rather than the kernel image blob, which remains at a lower address that won't get clobbered. This could have been done by overwriting the initrd blob instead, but that poses big difficulties, such as no longer being able to use memory mapped files for initrd, hurting performance, and, more importantly, the initrd address calculation is hard coded in qboot, and it always grows down rather than up, which means lots of brittle semantics would have to be changed around, incurring more complexity. In contrast, using cmdline is simple and doesn't interfere with anything. The microvm machine has a gross hack where it fiddles with fw_cfg data after the fact. So this hack is updated to account for this appending, by reserving some bytes. Fixup-by: Michael S. Tsirkin Cc: x86@kernel.org Cc: Philippe Mathieu-Daud=C3=A9 Cc: H. Peter Anvin Cc: Borislav Petkov Cc: Eric Biggers Signed-off-by: Jason A. Donenfeld Message-Id: <20221230220725.618763-1-Jason@zx2c4.com> Message-ID: <20230128061015-mutt-send-email-mst@kernel.org> Reviewed-by: Michael S. Tsirkin Signed-off-by: Michael S. Tsirkin Tested-by: Eric Biggers Tested-by: Mathias Krause --- include/hw/i386/microvm.h | 5 ++-- include/hw/nvram/fw_cfg.h | 9 +++++++ hw/i386/microvm.c | 15 +++++++---- hw/i386/x86.c | 52 +++++++++++++++++++++------------------ hw/nvram/fw_cfg.c | 9 +++++++ 5 files changed, 59 insertions(+), 31 deletions(-) diff --git a/include/hw/i386/microvm.h b/include/hw/i386/microvm.h index fad97a891d..e8af61f194 100644 --- a/include/hw/i386/microvm.h +++ b/include/hw/i386/microvm.h @@ -50,8 +50,9 @@ */ =20 /* Platform virtio definitions */ -#define VIRTIO_MMIO_BASE 0xfeb00000 -#define VIRTIO_CMDLINE_MAXLEN 64 +#define VIRTIO_MMIO_BASE 0xfeb00000 +#define VIRTIO_CMDLINE_MAXLEN 64 +#define VIRTIO_CMDLINE_TOTAL_MAX_LEN ((VIRTIO_CMDLINE_MAXLEN + 1) * 16) =20 #define GED_MMIO_BASE 0xfea00000 #define GED_MMIO_BASE_MEMHP (GED_MMIO_BASE + 0x100) diff --git a/include/hw/nvram/fw_cfg.h b/include/hw/nvram/fw_cfg.h index 2e503904dc..990dcdbb2e 100644 --- a/include/hw/nvram/fw_cfg.h +++ b/include/hw/nvram/fw_cfg.h @@ -139,6 +139,15 @@ void fw_cfg_add_bytes_callback(FWCfgState *s, uint16_t= key, void *data, size_t len, bool read_only); =20 +/** + * fw_cfg_read_bytes_ptr: + * @s: fw_cfg device being modified + * @key: selector key value for new fw_cfg item + * + * Reads an existing fw_cfg data pointer. + */ +void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key); + /** * fw_cfg_add_string: * @s: fw_cfg device being modified diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c index 170a331e3f..29f30dd6d3 100644 --- a/hw/i386/microvm.c +++ b/hw/i386/microvm.c @@ -378,7 +378,8 @@ static void microvm_fix_kernel_cmdline(MachineState *ma= chine) MicrovmMachineState *mms =3D MICROVM_MACHINE(machine); BusState *bus; BusChild *kid; - char *cmdline; + char *cmdline, *existing_cmdline; + size_t len; =20 /* * Find MMIO transports with attached devices, and add them to the ker= nel @@ -387,7 +388,8 @@ static void microvm_fix_kernel_cmdline(MachineState *ma= chine) * Yes, this is a hack, but one that heavily improves the UX without * introducing any significant issues. */ - cmdline =3D g_strdup(machine->kernel_cmdline); + existing_cmdline =3D fw_cfg_read_bytes_ptr(x86ms->fw_cfg, FW_CFG_CMDLI= NE_DATA); + cmdline =3D g_strdup(existing_cmdline); bus =3D sysbus_get_default(); QTAILQ_FOREACH(kid, &bus->children, sibling) { DeviceState *dev =3D kid->child; @@ -411,9 +413,12 @@ static void microvm_fix_kernel_cmdline(MachineState *m= achine) } } =20 - fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) = + 1); - fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline); - + len =3D strlen(cmdline); + if (len > VIRTIO_CMDLINE_TOTAL_MAX_LEN + strlen(existing_cmdline)) { + fprintf(stderr, "qemu: virtio mmio cmdline too large, skipping\n"); + } else { + memcpy(existing_cmdline, cmdline, len + 1); + } g_free(cmdline); } =20 diff --git a/hw/i386/x86.c b/hw/i386/x86.c index 78cc131926..eaff4227bd 100644 --- a/hw/i386/x86.c +++ b/hw/i386/x86.c @@ -50,6 +50,7 @@ #include "hw/intc/i8259.h" #include "hw/rtc/mc146818rtc.h" #include "target/i386/sev.h" +#include "hw/i386/microvm.h" =20 #include "hw/acpi/cpu_hotplug.h" #include "hw/irq.h" @@ -813,12 +814,18 @@ void x86_load_linux(X86MachineState *x86ms, const char *kernel_filename =3D machine->kernel_filename; const char *initrd_filename =3D machine->initrd_filename; const char *dtb_filename =3D machine->dtb; - const char *kernel_cmdline =3D machine->kernel_cmdline; + char *kernel_cmdline; SevKernelLoaderContext sev_load_ctx =3D {}; enum { RNG_SEED_LENGTH =3D 32 }; =20 - /* Align to 16 bytes as a paranoia measure */ - cmdline_size =3D (strlen(kernel_cmdline) + 16) & ~15; + /* + * Add the NUL terminator, some padding for the microvm cmdline fiddli= ng + * hack, and then align to 16 bytes as a paranoia measure + */ + cmdline_size =3D (strlen(machine->kernel_cmdline) + 1 + + VIRTIO_CMDLINE_TOTAL_MAX_LEN + 16) & ~15; + /* Make a copy, since we might append arbitrary bytes to it later. */ + kernel_cmdline =3D g_strndup(machine->kernel_cmdline, cmdline_size); =20 /* load the kernel header */ f =3D fopen(kernel_filename, "rb"); @@ -959,12 +966,6 @@ void x86_load_linux(X86MachineState *x86ms, initrd_max =3D x86ms->below_4g_mem_size - acpi_data_size - 1; } =20 - fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr); - fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(kernel_cmdline) + 1= ); - fw_cfg_add_string(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline); - sev_load_ctx.cmdline_data =3D (char *)kernel_cmdline; - sev_load_ctx.cmdline_size =3D strlen(kernel_cmdline) + 1; - if (protocol >=3D 0x202) { stl_p(header + 0x228, cmdline_addr); } else { @@ -1091,27 +1092,24 @@ void x86_load_linux(X86MachineState *x86ms, exit(1); } =20 - setup_data_offset =3D QEMU_ALIGN_UP(kernel_size, 16); - kernel_size =3D setup_data_offset + sizeof(SetupData) + dtb_size; - kernel =3D g_realloc(kernel, kernel_size); - - - setup_data =3D (SetupData *)(kernel + setup_data_offset); + setup_data_offset =3D cmdline_size; + cmdline_size +=3D sizeof(SetupData) + dtb_size; + kernel_cmdline =3D g_realloc(kernel_cmdline, cmdline_size); + setup_data =3D (void *)kernel_cmdline + setup_data_offset; setup_data->next =3D cpu_to_le64(first_setup_data); - first_setup_data =3D prot_addr + setup_data_offset; + first_setup_data =3D cmdline_addr + setup_data_offset; setup_data->type =3D cpu_to_le32(SETUP_DTB); setup_data->len =3D cpu_to_le32(dtb_size); - load_image_size(dtb_filename, setup_data->data, dtb_size); } =20 - if (!legacy_no_rng_seed) { - setup_data_offset =3D QEMU_ALIGN_UP(kernel_size, 16); - kernel_size =3D setup_data_offset + sizeof(SetupData) + RNG_SEED_L= ENGTH; - kernel =3D g_realloc(kernel, kernel_size); - setup_data =3D (SetupData *)(kernel + setup_data_offset); + if (!legacy_no_rng_seed && protocol >=3D 0x209) { + setup_data_offset =3D cmdline_size; + cmdline_size +=3D sizeof(SetupData) + RNG_SEED_LENGTH; + kernel_cmdline =3D g_realloc(kernel_cmdline, cmdline_size); + setup_data =3D (void *)kernel_cmdline + setup_data_offset; setup_data->next =3D cpu_to_le64(first_setup_data); - first_setup_data =3D prot_addr + setup_data_offset; + first_setup_data =3D cmdline_addr + setup_data_offset; setup_data->type =3D cpu_to_le32(SETUP_RNG_SEED); setup_data->len =3D cpu_to_le32(RNG_SEED_LENGTH); qemu_guest_getrandom_nofail(setup_data->data, RNG_SEED_LENGTH); @@ -1122,6 +1120,12 @@ void x86_load_linux(X86MachineState *x86ms, fw_cfg_add_bytes(fw_cfg, FW_CFG_KERNEL_DATA, kernel, kernel_size); } =20 + fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_ADDR, cmdline_addr); + fw_cfg_add_i32(fw_cfg, FW_CFG_CMDLINE_SIZE, cmdline_size); + fw_cfg_add_bytes(fw_cfg, FW_CFG_CMDLINE_DATA, kernel_cmdline, cmdline_= size); + sev_load_ctx.cmdline_data =3D (char *)kernel_cmdline; + sev_load_ctx.cmdline_size =3D cmdline_size; + fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_ADDR, prot_addr); fw_cfg_add_i32(fw_cfg, FW_CFG_KERNEL_SIZE, kernel_size); sev_load_ctx.kernel_data =3D (char *)kernel; @@ -1134,7 +1138,7 @@ void x86_load_linux(X86MachineState *x86ms, * kernel on the other side of the fw_cfg interface matches the hash o= f the * file the user passed in. */ - if (!sev_enabled()) { + if (!sev_enabled() && first_setup_data) { SetupDataFixup *fixup =3D g_malloc(sizeof(*fixup)); =20 memcpy(setup, header, MIN(sizeof(header), setup_size)); diff --git a/hw/nvram/fw_cfg.c b/hw/nvram/fw_cfg.c index a00881bc64..432754eda4 100644 --- a/hw/nvram/fw_cfg.c +++ b/hw/nvram/fw_cfg.c @@ -741,6 +741,15 @@ void fw_cfg_add_bytes(FWCfgState *s, uint16_t key, voi= d *data, size_t len) fw_cfg_add_bytes_callback(s, key, NULL, NULL, NULL, data, len, true); } =20 +void *fw_cfg_read_bytes_ptr(FWCfgState *s, uint16_t key) +{ + int arch =3D !!(key & FW_CFG_ARCH_LOCAL); + + key &=3D FW_CFG_ENTRY_MASK; + assert(key < fw_cfg_max_entry(s)); + return s->entries[arch][key].data; +} + void fw_cfg_add_string(FWCfgState *s, uint16_t key, const char *value) { size_t sz =3D strlen(value) + 1; --=20 MST