:p
atchew
Login
The following changes since commit e3debd5e7d0ce031356024878a0a18b9d109354a: Merge tag 'pull-request-2023-03-24' of https://gitlab.com/thuth/qemu into staging (2023-03-24 16:08:46 +0000) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to d8fbf9aa85aed64450907580a1d70583f097e9df: block/export: Fix graph locking in blk_get_geometry() call (2023-03-27 15:16:05 +0200) ---------------------------------------------------------------- Block layer patches - aio-posix: Fix race during epoll upgrade - vhost-user-blk/VDUSE export: Fix a potential deadlock and an assertion failure when the export runs in an iothread - NBD server: Push pending frames after sending reply to fix performance especially when used with TLS ---------------------------------------------------------------- Florian Westphal (1): nbd/server: push pending frames after sending reply Kevin Wolf (1): block/export: Fix graph locking in blk_get_geometry() call Stefan Hajnoczi (2): block/export: only acquire AioContext once for vhost_user_server_stop() aio-posix: fix race between epoll upgrade and aio_set_fd_handler() include/block/block-io.h | 4 +++- include/sysemu/block-backend-io.h | 5 ++++- block.c | 5 +++-- block/block-backend.c | 7 +++++-- block/export/virtio-blk-handler.c | 7 ++++--- nbd/server.c | 3 +++ util/fdmon-epoll.c | 25 ++++++++++++++++++------- util/vhost-user-server.c | 5 +---- 8 files changed, 41 insertions(+), 20 deletions(-)
From: Florian Westphal <fw@strlen.de> qemu-nbd doesn't set TCP_NODELAY on the tcp socket. Kernel waits for more data and avoids transmission of small packets. Without TLS this is barely noticeable, but with TLS this really shows. Booting a VM via qemu-nbd on localhost (with tls) takes more than 2 minutes on my system. tcpdump shows frequent wait periods, where no packets get sent for a 40ms period. Add explicit (un)corking when processing (and responding to) requests. "TCP_CORK, &zero" after earlier "CORK, &one" will flush pending data. VM Boot time: main: no tls: 23s, with tls: 2m45s patched: no tls: 14s, with tls: 15s VM Boot time, qemu-nbd via network (same lan): main: no tls: 18s, with tls: 1m50s patched: no tls: 17s, with tls: 18s Future optimization: if we could detect if there is another pending request we could defer the uncork operation because more data would be appended. Signed-off-by: Florian Westphal <fw@strlen.de> Message-Id: <20230324104720.2498-1-fw@strlen.de> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- nbd/server.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/nbd/server.c b/nbd/server.c index XXXXXXX..XXXXXXX 100644 --- a/nbd/server.c +++ b/nbd/server.c @@ -XXX,XX +XXX,XX @@ static coroutine_fn void nbd_trip(void *opaque) goto disconnect; } + qio_channel_set_cork(client->ioc, true); + if (ret < 0) { /* It wasn't -EIO, so, according to nbd_co_receive_request() * semantics, we should return the error to the client. */ @@ -XXX,XX +XXX,XX @@ static coroutine_fn void nbd_trip(void *opaque) goto disconnect; } + qio_channel_set_cork(client->ioc, false); done: nbd_request_put(req); nbd_client_put(client); -- 2.39.2
From: Stefan Hajnoczi <stefanha@redhat.com> vhost_user_server_stop() uses AIO_WAIT_WHILE(). AIO_WAIT_WHILE() requires that AioContext is only acquired once. Since blk_exp_request_shutdown() already acquires the AioContext it shouldn't be acquired again in vhost_user_server_stop(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230323145853.1345527-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- util/vhost-user-server.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ static void vu_accept(QIONetListener *listener, QIOChannelSocket *sioc, aio_context_release(server->ctx); } +/* server->ctx acquired by caller */ void vhost_user_server_stop(VuServer *server) { - aio_context_acquire(server->ctx); - qemu_bh_delete(server->restart_listener_bh); server->restart_listener_bh = NULL; @@ -XXX,XX +XXX,XX @@ void vhost_user_server_stop(VuServer *server) AIO_WAIT_WHILE(server->ctx, server->co_trip); } - aio_context_release(server->ctx); - if (server->listener) { qio_net_listener_disconnect(server->listener); object_unref(OBJECT(server->listener)); -- 2.39.2
From: Stefan Hajnoczi <stefanha@redhat.com> If another thread calls aio_set_fd_handler() while the IOThread event loop is upgrading from ppoll(2) to epoll(7) then we might miss new AioHandlers. The epollfd will not monitor the new AioHandler's fd, resulting in hangs. Take the AioHandler list lock while upgrading to epoll. This prevents AioHandlers from changing while epoll is being set up. If we cannot lock because we're in a nested event loop, then don't upgrade to epoll (it will happen next time we're not in a nested call). The downside to taking the lock is that the aio_set_fd_handler() thread has to wait until the epoll upgrade is finished, which involves many epoll_ctl(2) system calls. However, this scenario is rare and I couldn't think of another solution that is still simple. Reported-by: Qing Wang <qinwang@redhat.com> Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2090998 Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Fam Zheng <fam@euphon.net> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230323144859.1338495-1-stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- util/fdmon-epoll.c | 25 ++++++++++++++++++------- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c index XXXXXXX..XXXXXXX 100644 --- a/util/fdmon-epoll.c +++ b/util/fdmon-epoll.c @@ -XXX,XX +XXX,XX @@ static bool fdmon_epoll_try_enable(AioContext *ctx) bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd) { + bool ok; + if (ctx->epollfd < 0) { return false; } @@ -XXX,XX +XXX,XX @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd) return false; } - if (npfd >= EPOLL_ENABLE_THRESHOLD) { - if (fdmon_epoll_try_enable(ctx)) { - return true; - } else { - fdmon_epoll_disable(ctx); - } + if (npfd < EPOLL_ENABLE_THRESHOLD) { + return false; + } + + /* The list must not change while we add fds to epoll */ + if (!qemu_lockcnt_dec_if_lock(&ctx->list_lock)) { + return false; + } + + ok = fdmon_epoll_try_enable(ctx); + + qemu_lockcnt_inc_and_unlock(&ctx->list_lock); + + if (!ok) { + fdmon_epoll_disable(ctx); } - return false; + return ok; } void fdmon_epoll_setup(AioContext *ctx) -- 2.39.2
blk_get_geometry() eventually calls bdrv_nb_sectors(), which is a co_wrapper_mixed_bdrv_rdlock. This means that when it is called from coroutine context, it already assume to have the graph locked. However, virtio_blk_sect_range_ok() in block/export/virtio-blk-handler.c (used by vhost-user-blk and VDUSE exports) runs in a coroutine, but doesn't take the graph lock - blk_*() functions are generally expected to do that internally. This causes an assertion failure when accessing an export for the first time if it runs in an iothread. This is an example of the crash: $ ./storage-daemon/qemu-storage-daemon --object iothread,id=th0 --blockdev file,filename=/home/kwolf/images/hd.img,node-name=disk --export vhost-user-blk,addr.type=unix,addr.path=/tmp/vhost.sock,node-name=disk,id=exp0,iothread=th0 qemu-storage-daemon: ../block/graph-lock.c:268: void assert_bdrv_graph_readable(void): Assertion `qemu_in_main_thread() || reader_count()' failed. (gdb) bt #0 0x00007ffff6eafe5c in __pthread_kill_implementation () from /lib64/libc.so.6 #1 0x00007ffff6e5fa76 in raise () from /lib64/libc.so.6 #2 0x00007ffff6e497fc in abort () from /lib64/libc.so.6 #3 0x00007ffff6e4971b in __assert_fail_base.cold () from /lib64/libc.so.6 #4 0x00007ffff6e58656 in __assert_fail () from /lib64/libc.so.6 #5 0x00005555556337a3 in assert_bdrv_graph_readable () at ../block/graph-lock.c:268 #6 0x00005555555fd5a2 in bdrv_co_nb_sectors (bs=0x5555564c5ef0) at ../block.c:5847 #7 0x00005555555ee949 in bdrv_nb_sectors (bs=0x5555564c5ef0) at block/block-gen.c:256 #8 0x00005555555fd6b9 in bdrv_get_geometry (bs=0x5555564c5ef0, nb_sectors_ptr=0x7fffef7fedd0) at ../block.c:5884 #9 0x000055555562ad6d in blk_get_geometry (blk=0x5555564cb200, nb_sectors_ptr=0x7fffef7fedd0) at ../block/block-backend.c:1624 #10 0x00005555555ddb74 in virtio_blk_sect_range_ok (blk=0x5555564cb200, block_size=512, sector=0, size=512) at ../block/export/virtio-blk-handler.c:44 #11 0x00005555555dd80d in virtio_blk_process_req (handler=0x5555564cbb98, in_iov=0x7fffe8003830, out_iov=0x7fffe8003860, in_num=1, out_num=0) at ../block/export/virtio-blk-handler.c:189 #12 0x00005555555dd546 in vu_blk_virtio_process_req (opaque=0x7fffe8003800) at ../block/export/vhost-user-blk-server.c:66 #13 0x00005555557bf4a1 in coroutine_trampoline (i0=-402635264, i1=32767) at ../util/coroutine-ucontext.c:177 #14 0x00007ffff6e75c20 in ?? () from /lib64/libc.so.6 #15 0x00007fffefffa870 in ?? () #16 0x0000000000000000 in ?? () Fix this by creating a new blk_co_get_geometry() that takes the lock, and changing blk_get_geometry() to be a co_wrapper_mixed around it. To make the resulting code cleaner, virtio-blk-handler.c can directly call the coroutine version now (though that wouldn't be necessary for fixing the bug, taking the lock in blk_co_get_geometry() is what fixes it). Fixes: 8ab8140a04cf771d63e9754d6ba6c1e676bfe507 Reported-by: Lukáš Doktor <ldoktor@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230327113959.60071-1-kwolf@redhat.com> Reviewed-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block-io.h | 4 +++- include/sysemu/block-backend-io.h | 5 ++++- block.c | 5 +++-- block/block-backend.c | 7 +++++-- block/export/virtio-blk-handler.c | 7 ++++--- 5 files changed, 19 insertions(+), 9 deletions(-) diff --git a/include/block/block-io.h b/include/block/block-io.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block-io.h +++ b/include/block/block-io.h @@ -XXX,XX +XXX,XX @@ int64_t co_wrapper bdrv_get_allocated_file_size(BlockDriverState *bs); BlockMeasureInfo *bdrv_measure(BlockDriver *drv, QemuOpts *opts, BlockDriverState *in_bs, Error **errp); -void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); + +void coroutine_fn GRAPH_RDLOCK +bdrv_co_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr); int coroutine_fn GRAPH_RDLOCK bdrv_co_delete_file(BlockDriverState *bs, Error **errp); diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backend-io.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -XXX,XX +XXX,XX @@ void co_wrapper blk_eject(BlockBackend *blk, bool eject_flag); int64_t coroutine_fn blk_co_getlength(BlockBackend *blk); int64_t co_wrapper_mixed blk_getlength(BlockBackend *blk); -void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr); +void coroutine_fn blk_co_get_geometry(BlockBackend *blk, + uint64_t *nb_sectors_ptr); +void co_wrapper_mixed blk_get_geometry(BlockBackend *blk, + uint64_t *nb_sectors_ptr); int64_t coroutine_fn blk_co_nb_sectors(BlockBackend *blk); int64_t co_wrapper_mixed blk_nb_sectors(BlockBackend *blk); diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ int64_t coroutine_fn bdrv_co_getlength(BlockDriverState *bs) } /* return 0 as number of sectors if no device present or error */ -void bdrv_get_geometry(BlockDriverState *bs, uint64_t *nb_sectors_ptr) +void coroutine_fn bdrv_co_get_geometry(BlockDriverState *bs, + uint64_t *nb_sectors_ptr) { - int64_t nb_sectors = bdrv_nb_sectors(bs); + int64_t nb_sectors = bdrv_co_nb_sectors(bs); IO_CODE(); *nb_sectors_ptr = nb_sectors < 0 ? 0 : nb_sectors; diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ int64_t coroutine_fn blk_co_getlength(BlockBackend *blk) return bdrv_co_getlength(blk_bs(blk)); } -void blk_get_geometry(BlockBackend *blk, uint64_t *nb_sectors_ptr) +void coroutine_fn blk_co_get_geometry(BlockBackend *blk, + uint64_t *nb_sectors_ptr) { IO_CODE(); + GRAPH_RDLOCK_GUARD(); + if (!blk_bs(blk)) { *nb_sectors_ptr = 0; } else { - bdrv_get_geometry(blk_bs(blk), nb_sectors_ptr); + bdrv_co_get_geometry(blk_bs(blk), nb_sectors_ptr); } } diff --git a/block/export/virtio-blk-handler.c b/block/export/virtio-blk-handler.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/virtio-blk-handler.c +++ b/block/export/virtio-blk-handler.c @@ -XXX,XX +XXX,XX @@ struct virtio_blk_inhdr { unsigned char status; }; -static bool virtio_blk_sect_range_ok(BlockBackend *blk, uint32_t block_size, - uint64_t sector, size_t size) +static bool coroutine_fn +virtio_blk_sect_range_ok(BlockBackend *blk, uint32_t block_size, + uint64_t sector, size_t size) { uint64_t nb_sectors; uint64_t total_sectors; @@ -XXX,XX +XXX,XX @@ static bool virtio_blk_sect_range_ok(BlockBackend *blk, uint32_t block_size, if ((sector << VIRTIO_BLK_SECTOR_BITS) % block_size) { return false; } - blk_get_geometry(blk, &total_sectors); + blk_co_get_geometry(blk, &total_sectors); if (sector > total_sectors || nb_sectors > total_sectors - sector) { return false; } -- 2.39.2
The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a: Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +0000) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to d570177b50c389f379f93183155a27d44856ab46: qemu-img: Change info key names for protocol nodes (2023-02-01 16:52:33 +0100) v4: - Fixed the 'qemu-img-close-errors' test case to run only on Linux and only with the file protocol, use qemu-io instead of truncate v3: - Make the compiler happier on BSD and CentOS Stream 8 v2: - Rebased to resolve merge conflicts in coroutine.h ---------------------------------------------------------------- Block layer patches - qemu-img info: Show protocol-level information - Move more functions to coroutines - Make coroutine annotations ready for static analysis - qemu-img: Fix exit code for errors closing the image - qcow2 bitmaps: Fix theoretical corruption in error path - pflash: Only load non-zero parts of backend image to save memory - Code cleanup and test case improvements ---------------------------------------------------------------- Alberto Faria (2): coroutine: annotate coroutine_fn for libclang block: Add no_coroutine_fn and coroutine_mixed_fn marker Emanuele Giuseppe Esposito (14): block-coroutine-wrapper: support void functions block: Convert bdrv_io_plug() to co_wrapper block: Convert bdrv_io_unplug() to co_wrapper block: Convert bdrv_is_inserted() to co_wrapper block: Rename refresh_total_sectors to bdrv_refresh_total_sectors block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed block-backend: use bdrv_getlength instead of blk_getlength block: use bdrv_co_refresh_total_sectors when possible block: Convert bdrv_get_allocated_file_size() to co_wrapper block: Convert bdrv_get_info() to co_wrapper_mixed block: Convert bdrv_eject() to co_wrapper block: Convert bdrv_lock_medium() to co_wrapper block: Convert bdrv_debug_event() to co_wrapper_mixed block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate() Hanna Reitz (12): block: Improve empty format-specific info dump block/file: Add file-specific image info block/vmdk: Change extent info type block: Split BlockNodeInfo off of ImageInfo qemu-img: Use BlockNodeInfo block/qapi: Let bdrv_query_image_info() recurse block/qapi: Introduce BlockGraphInfo block/qapi: Add indentation to bdrv_node_info_dump() iotests: Filter child node information iotests/106, 214, 308: Read only one size line qemu-img: Let info print block graph qemu-img: Change info key names for protocol nodes Kevin Wolf (4): qcow2: Fix theoretical corruption in store_bitmap() error path qemu-img commit: Report errors while closing the image qemu-img bitmap: Report errors while closing the image qemu-iotests: Test qemu-img bitmap/commit exit code on error Paolo Bonzini (2): qemu-io: do not reinvent the blk_pwrite_zeroes wheel block: remove bdrv_coroutine_enter Philippe Mathieu-Daudé (1): block/nbd: Add missing <qemu/bswap.h> include Thomas Huth (2): tests/qemu-iotests/312: Mark "quorum" as required driver tests/qemu-iotests/262: Check for availability of "blkverify" first Xiang Zheng (1): pflash: Only read non-zero parts of backend image qapi/block-core.json | 123 +++++++- include/block/block-common.h | 11 +- include/block/block-io.h | 41 ++- include/block/block_int-common.h | 26 +- include/block/block_int-io.h | 5 +- include/block/nbd.h | 1 + include/block/qapi.h | 14 +- include/qemu/osdep.h | 44 +++ include/sysemu/block-backend-io.h | 31 +- block.c | 88 +++--- block/blkdebug.c | 11 +- block/blkio.c | 15 +- block/blklogwrites.c | 6 +- block/blkreplay.c | 6 +- block/blkverify.c | 6 +- block/block-backend.c | 38 +-- block/commit.c | 4 +- block/copy-on-read.c | 18 +- block/crypto.c | 14 +- block/curl.c | 10 +- block/file-posix.c | 137 +++++---- block/file-win32.c | 18 +- block/filter-compress.c | 20 +- block/gluster.c | 23 +- block/io.c | 76 ++--- block/iscsi.c | 17 +- block/mirror.c | 6 +- block/monitor/block-hmp-cmds.c | 2 +- block/nbd.c | 8 +- block/nfs.c | 4 +- block/null.c | 13 +- block/nvme.c | 14 +- block/preallocate.c | 16 +- block/qapi.c | 317 ++++++++++++++++----- block/qcow.c | 5 +- block/qcow2-bitmap.c | 5 +- block/qcow2-refcount.c | 2 +- block/qcow2.c | 17 +- block/qed.c | 11 +- block/quorum.c | 8 +- block/raw-format.c | 25 +- block/rbd.c | 9 +- block/replication.c | 6 +- block/ssh.c | 4 +- block/throttle.c | 6 +- block/vdi.c | 7 +- block/vhdx.c | 5 +- block/vmdk.c | 22 +- block/vpc.c | 5 +- blockdev.c | 8 +- hw/block/block.c | 36 ++- hw/scsi/scsi-disk.c | 5 + qemu-img.c | 100 +++++-- qemu-io-cmds.c | 62 +--- tests/unit/test-block-iothread.c | 3 + scripts/block-coroutine-wrapper.py | 20 +- tests/qemu-iotests/iotests.py | 18 +- block/meson.build | 1 + tests/qemu-iotests/065 | 2 +- tests/qemu-iotests/106 | 4 +- tests/qemu-iotests/214 | 6 +- tests/qemu-iotests/262 | 3 +- tests/qemu-iotests/302.out | 5 + tests/qemu-iotests/308 | 4 +- tests/qemu-iotests/312 | 1 + tests/qemu-iotests/common.filter | 22 +- tests/qemu-iotests/common.rc | 22 +- tests/qemu-iotests/tests/qemu-img-close-errors | 96 +++++++ tests/qemu-iotests/tests/qemu-img-close-errors.out | 23 ++ 69 files changed, 1209 insertions(+), 552 deletions(-) create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out