:p
atchew
Login
The following changes since commit aa9bbd865502ed517624ab6fe7d4b5d89ca95e43: Merge tag 'pull-ppc-20230528' of https://gitlab.com/danielhb/qemu into staging (2023-05-29 14:31:52 -0700) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to 60f782b6b78211c125970768be726c9f380dbd61: aio: remove aio_disable_external() API (2023-05-30 17:37:26 +0200) ---------------------------------------------------------------- Block layer patches - Fix blockdev-create with iothreads - Remove aio_disable_external() API ---------------------------------------------------------------- Kevin Wolf (12): block-coroutine-wrapper: Take AioContext lock in no_co_wrappers block: Clarify locking rules for bdrv_open(_inherit)() block: Take main AioContext lock when calling bdrv_open() block-backend: Fix blk_new_open() for iothreads mirror: Hold main AioContext lock for calling bdrv_open_backing_file() qcow2: Fix open with 'file' in iothread raw-format: Fix open with 'file' in iothread copy-before-write: Fix open with child in iothread block: Take AioContext lock in bdrv_open_driver() block: Fix AioContext locking in bdrv_insert_node() iotests: Make verify_virtio_scsi_pci_or_ccw() public iotests: Test blockdev-create in iothread Stefan Hajnoczi (20): block-backend: split blk_do_set_aio_context() hw/qdev: introduce qdev_is_realized() helper virtio-scsi: avoid race between unplug and transport event virtio-scsi: stop using aio_disable_external() during unplug util/vhost-user-server: rename refcount to in_flight counter block/export: wait for vhost-user-blk requests when draining block/export: stop using is_external in vhost-user-blk server hw/xen: do not use aio_set_fd_handler(is_external=true) in xen_xenstore block: add blk_in_drain() API block: drain from main loop thread in bdrv_co_yield_to_drain() xen-block: implement BlockDevOps->drained_begin() hw/xen: do not set is_external=true on evtchn fds block/export: rewrite vduse-blk drain code block/export: don't require AioContext lock around blk_exp_ref/unref() block/fuse: do not set is_external=true on FUSE fd virtio: make it possible to detach host notifier from any thread virtio-blk: implement BlockDevOps->drained_begin() virtio-scsi: implement BlockDevOps->drained_begin() virtio: do not set is_external=true on host notifiers aio: remove aio_disable_external() API hw/block/dataplane/xen-block.h | 2 + include/block/aio.h | 57 ------------ include/block/block-common.h | 3 + include/block/block_int-common.h | 72 +++++++-------- include/block/export.h | 2 + include/hw/qdev-core.h | 17 +++- include/hw/scsi/scsi.h | 14 +++ include/qemu/vhost-user-server.h | 8 +- include/sysemu/block-backend-common.h | 25 ++--- include/sysemu/block-backend-global-state.h | 1 + util/aio-posix.h | 1 - block.c | 46 ++++++--- block/blkio.c | 15 +-- block/block-backend.c | 104 ++++++++++++--------- block/copy-before-write.c | 21 ++++- block/curl.c | 10 +- block/export/export.c | 13 ++- block/export/fuse.c | 56 ++++++++++- block/export/vduse-blk.c | 128 ++++++++++++++++++-------- block/export/vhost-user-blk-server.c | 52 +++++++++-- block/io.c | 16 ++-- block/io_uring.c | 4 +- block/iscsi.c | 3 +- block/linux-aio.c | 4 +- block/mirror.c | 6 ++ block/nfs.c | 5 +- block/nvme.c | 8 +- block/qapi-sysemu.c | 3 + block/qcow2.c | 8 +- block/raw-format.c | 5 + block/ssh.c | 4 +- block/win32-aio.c | 6 +- blockdev.c | 29 ++++-- hw/block/dataplane/virtio-blk.c | 23 +++-- hw/block/dataplane/xen-block.c | 42 ++++++--- hw/block/virtio-blk.c | 38 +++++++- hw/block/xen-block.c | 24 ++++- hw/i386/kvm/xen_xenstore.c | 2 +- hw/scsi/scsi-bus.c | 46 ++++++++- hw/scsi/scsi-disk.c | 27 +++++- hw/scsi/virtio-scsi-dataplane.c | 32 +++++-- hw/scsi/virtio-scsi.c | 127 +++++++++++++++++++------ hw/virtio/virtio.c | 9 +- hw/xen/xen-bus.c | 11 ++- io/channel-command.c | 6 +- io/channel-file.c | 3 +- io/channel-socket.c | 3 +- migration/rdma.c | 16 ++-- qemu-nbd.c | 4 + tests/unit/test-aio.c | 27 +----- tests/unit/test-bdrv-drain.c | 15 +-- tests/unit/test-block-iothread.c | 4 +- tests/unit/test-fdmon-epoll.c | 73 --------------- tests/unit/test-nested-aio-poll.c | 9 +- util/aio-posix.c | 20 +--- util/aio-win32.c | 8 +- util/async.c | 3 +- util/fdmon-epoll.c | 10 -- util/fdmon-io_uring.c | 8 +- util/fdmon-poll.c | 3 +- util/main-loop.c | 7 +- util/qemu-coroutine-io.c | 7 +- util/vhost-user-server.c | 33 ++++--- scripts/block-coroutine-wrapper.py | 25 +++-- tests/qemu-iotests/iotests.py | 2 +- hw/scsi/trace-events | 2 + tests/qemu-iotests/256 | 2 +- tests/qemu-iotests/tests/iothreads-create | 67 ++++++++++++++ tests/qemu-iotests/tests/iothreads-create.out | 4 + tests/unit/meson.build | 3 - 70 files changed, 931 insertions(+), 562 deletions(-) delete mode 100644 tests/unit/test-fdmon-epoll.c create mode 100755 tests/qemu-iotests/tests/iothreads-create create mode 100644 tests/qemu-iotests/tests/iothreads-create.out
All of the functions that currently take a BlockDriverState, BdrvChild or BlockBackend as their first parameter expect the associated AioContext to be locked when they are called. In the case of no_co_wrappers, they are called from bottom halves directly in the main loop, so no other caller can be expected to take the lock for them. This can result in assertion failures because a lock that isn't taken is released in nested event loops. Looking at the first parameter is already done by co_wrappers to decide where the coroutine should run, so doing the same in no_co_wrappers is only consistent. Take the lock in the generated bottom halves to fix the problem. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-2-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block-common.h | 3 +++ block/block-backend.c | 7 ++++++- scripts/block-coroutine-wrapper.py | 25 +++++++++++++++---------- 3 files changed, 24 insertions(+), 11 deletions(-) diff --git a/include/block/block-common.h b/include/block/block-common.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -XXX,XX +XXX,XX @@ * scheduling a BH in the bottom half that runs the respective non-coroutine * function. The coroutine yields after scheduling the BH and is reentered when * the wrapped function returns. + * + * If the first parameter of the function is a BlockDriverState, BdrvChild or + * BlockBackend pointer, the AioContext lock for it is taken in the wrapper. */ #define no_co_wrapper diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ void blk_op_unblock_all(BlockBackend *blk, Error *reason) AioContext *blk_get_aio_context(BlockBackend *blk) { - BlockDriverState *bs = blk_bs(blk); + BlockDriverState *bs; IO_CODE(); + if (!blk) { + return qemu_get_aio_context(); + } + + bs = blk_bs(blk); if (bs) { AioContext *ctx = bdrv_get_aio_context(blk_bs(blk)); assert(ctx == blk->ctx); diff --git a/scripts/block-coroutine-wrapper.py b/scripts/block-coroutine-wrapper.py index XXXXXXX..XXXXXXX 100644 --- a/scripts/block-coroutine-wrapper.py +++ b/scripts/block-coroutine-wrapper.py @@ -XXX,XX +XXX,XX @@ def __init__(self, wrapper_type: str, return_type: str, name: str, raise ValueError(f"no_co function can't be rdlock: {self.name}") self.target_name = f'{subsystem}_{subname}' - t = self.args[0].type - if t == 'BlockDriverState *': - ctx = 'bdrv_get_aio_context(bs)' - elif t == 'BdrvChild *': - ctx = 'bdrv_get_aio_context(child->bs)' - elif t == 'BlockBackend *': - ctx = 'blk_get_aio_context(blk)' - else: - ctx = 'qemu_get_aio_context()' - self.ctx = ctx + self.ctx = self.gen_ctx() self.get_result = 's->ret = ' self.ret = 'return s.ret;' @@ -XXX,XX +XXX,XX @@ def __init__(self, wrapper_type: str, return_type: str, name: str, self.co_ret = '' self.return_field = '' + def gen_ctx(self, prefix: str = '') -> str: + t = self.args[0].type + if t == 'BlockDriverState *': + return f'bdrv_get_aio_context({prefix}bs)' + elif t == 'BdrvChild *': + return f'bdrv_get_aio_context({prefix}child->bs)' + elif t == 'BlockBackend *': + return f'blk_get_aio_context({prefix}blk)' + else: + return 'qemu_get_aio_context()' + def gen_list(self, format: str) -> str: return ', '.join(format.format_map(arg.__dict__) for arg in self.args) @@ -XXX,XX +XXX,XX @@ def gen_no_co_wrapper(func: FuncDecl) -> str: static void {name}_bh(void *opaque) {{ {struct_name} *s = opaque; + AioContext *ctx = {func.gen_ctx('s->')}; + aio_context_acquire(ctx); {func.get_result}{name}({ func.gen_list('s->{name}') }); + aio_context_release(ctx); aio_co_wake(s->co); }} -- 2.40.1
These functions specify that the caller must hold the "@filename AioContext lock". This doesn't make sense, file names don't have an AioContext. New BlockDriverStates always start in the main AioContext, so this is what we really need here. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-3-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 10 ++-------- 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ out: * should be opened. If specified, neither options nor a filename may be given, * nor can an existing BDS be reused (that is, *pbs has to be NULL). * - * The caller must always hold @filename AioContext lock, because this - * function eventually calls bdrv_refresh_total_sectors() which polls - * when called from non-coroutine context. + * The caller must always hold the main AioContext lock. */ static BlockDriverState * no_coroutine_fn bdrv_open_inherit(const char *filename, const char *reference, QDict *options, @@ -XXX,XX +XXX,XX @@ close_and_fail: return NULL; } -/* - * The caller must always hold @filename AioContext lock, because this - * function eventually calls bdrv_refresh_total_sectors() which polls - * when called from non-coroutine context. - */ +/* The caller must always hold the main AioContext lock. */ BlockDriverState *bdrv_open(const char *filename, const char *reference, QDict *options, int flags, Error **errp) { -- 2.40.1
The function documentation already says that all callers must hold the main AioContext lock, but not all of them do. This can cause assertion failures when functions called by bdrv_open() try to drop the lock. Fix a few more callers to take the lock before calling bdrv_open(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-4-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 3 +++ block/block-backend.c | 2 ++ block/qapi-sysemu.c | 3 +++ blockdev.c | 29 +++++++++++++++++++++++------ qemu-nbd.c | 4 ++++ tests/unit/test-block-iothread.c | 3 +++ 6 files changed, 38 insertions(+), 6 deletions(-) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ void bdrv_img_create(const char *filename, const char *fmt, return; } + aio_context_acquire(qemu_get_aio_context()); + /* Create parameter list */ create_opts = qemu_opts_append(create_opts, drv->create_opts); create_opts = qemu_opts_append(create_opts, proto_drv->create_opts); @@ -XXX,XX +XXX,XX @@ out: qemu_opts_del(opts); qemu_opts_free(create_opts); error_propagate(errp, local_err); + aio_context_release(qemu_get_aio_context()); } AioContext *bdrv_get_aio_context(BlockDriverState *bs) diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new_open(const char *filename, const char *reference, } blk = blk_new(qemu_get_aio_context(), perm, shared); + aio_context_acquire(qemu_get_aio_context()); bs = bdrv_open(filename, reference, options, flags, errp); + aio_context_release(qemu_get_aio_context()); if (!bs) { blk_unref(blk); return NULL; diff --git a/block/qapi-sysemu.c b/block/qapi-sysemu.c index XXXXXXX..XXXXXXX 100644 --- a/block/qapi-sysemu.c +++ b/block/qapi-sysemu.c @@ -XXX,XX +XXX,XX @@ void qmp_blockdev_change_medium(const char *device, qdict_put_str(options, "driver", format); } + aio_context_acquire(qemu_get_aio_context()); medium_bs = bdrv_open(filename, NULL, options, bdrv_flags, errp); + aio_context_release(qemu_get_aio_context()); + if (!medium_bs) { goto fail; } diff --git a/blockdev.c b/blockdev.c index XXXXXXX..XXXXXXX 100644 --- a/blockdev.c +++ b/blockdev.c @@ -XXX,XX +XXX,XX @@ err_no_opts: /* Takes the ownership of bs_opts */ BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp) { + BlockDriverState *bs; int bdrv_flags = 0; GLOBAL_STATE_CODE(); @@ -XXX,XX +XXX,XX @@ BlockDriverState *bds_tree_init(QDict *bs_opts, Error **errp) bdrv_flags |= BDRV_O_INACTIVE; } - return bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp); + aio_context_acquire(qemu_get_aio_context()); + bs = bdrv_open(NULL, NULL, bs_opts, bdrv_flags, errp); + aio_context_release(qemu_get_aio_context()); + + return bs; } void blockdev_close_all_bdrv_states(void) @@ -XXX,XX +XXX,XX @@ static void external_snapshot_action(TransactionAction *action, } qdict_put_str(options, "driver", format); } + aio_context_release(aio_context); + aio_context_acquire(qemu_get_aio_context()); state->new_bs = bdrv_open(new_image_file, snapshot_ref, options, flags, errp); + aio_context_release(qemu_get_aio_context()); + /* We will manually add the backing_hd field to the bs later */ if (!state->new_bs) { - goto out; + return; } + aio_context_acquire(aio_context); + /* * Allow attaching a backing file to an overlay that's already in use only * if the parents don't assume that they are already seeing a valid image. @@ -XXX,XX +XXX,XX @@ static void drive_backup_action(DriveBackup *backup, if (format) { qdict_put_str(options, "driver", format); } + aio_context_release(aio_context); + aio_context_acquire(qemu_get_aio_context()); target_bs = bdrv_open(backup->target, NULL, options, flags, errp); + aio_context_release(qemu_get_aio_context()); + if (!target_bs) { - goto out; + return; } /* Honor bdrv_try_change_aio_context() context acquisition requirements. */ old_context = bdrv_get_aio_context(target_bs); - aio_context_release(aio_context); aio_context_acquire(old_context); ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp); @@ -XXX,XX +XXX,XX @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp) if (format) { qdict_put_str(options, "driver", format); } + aio_context_release(aio_context); /* Mirroring takes care of copy-on-write using the source's backing * file. */ + aio_context_acquire(qemu_get_aio_context()); target_bs = bdrv_open(arg->target, NULL, options, flags, errp); + aio_context_release(qemu_get_aio_context()); + if (!target_bs) { - goto out; + return; } zero_target = (arg->sync == MIRROR_SYNC_MODE_FULL && @@ -XXX,XX +XXX,XX @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp) /* Honor bdrv_try_change_aio_context() context acquisition requirements. */ old_context = bdrv_get_aio_context(target_bs); - aio_context_release(aio_context); aio_context_acquire(old_context); ret = bdrv_try_change_aio_context(target_bs, aio_context, NULL, errp); diff --git a/qemu-nbd.c b/qemu-nbd.c index XXXXXXX..XXXXXXX 100644 --- a/qemu-nbd.c +++ b/qemu-nbd.c @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) qdict_put_str(raw_opts, "driver", "raw"); qdict_put_str(raw_opts, "file", bs->node_name); qdict_put_int(raw_opts, "offset", dev_offset); + + aio_context_acquire(qemu_get_aio_context()); bs = bdrv_open(NULL, NULL, raw_opts, flags, &error_fatal); + aio_context_release(qemu_get_aio_context()); + blk_remove_bs(blk); blk_insert_bs(blk, bs, &error_fatal); bdrv_unref(bs); diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-block-iothread.c +++ b/tests/unit/test-block-iothread.c @@ -XXX,XX +XXX,XX @@ static void test_attach_second_node(void) qdict_put_str(options, "driver", "raw"); qdict_put_str(options, "file", "base"); + /* FIXME raw_open() should take ctx's lock internally */ aio_context_acquire(ctx); + aio_context_acquire(main_ctx); filter = bdrv_open(NULL, NULL, options, BDRV_O_RDWR, &error_abort); + aio_context_release(main_ctx); aio_context_release(ctx); g_assert(blk_get_aio_context(blk) == ctx); -- 2.40.1
This fixes blk_new_open() to not assume that bs is in the main context. In particular, the BlockBackend must be created with the right AioContext because it will refuse to move to a different context afterwards. (blk->allow_aio_context_change is false.) Use this opportunity to use blk_insert_bs() instead of duplicating the bdrv_root_attach_child() call. This is consistent with what blk_new_with_bs() does. Add comments to document the locking rules. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-5-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/block-backend.c | 27 +++++++++++++++++++++------ 1 file changed, 21 insertions(+), 6 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new(AioContext *ctx, uint64_t perm, uint64_t shared_perm) * Both sets of permissions can be changed later using blk_set_perm(). * * Return the new BlockBackend on success, null on failure. + * + * Callers must hold the AioContext lock of @bs. */ BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm, uint64_t shared_perm, Error **errp) @@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new_with_bs(BlockDriverState *bs, uint64_t perm, /* * Creates a new BlockBackend, opens a new BlockDriverState, and connects both. - * The new BlockBackend is in the main AioContext. + * By default, the new BlockBackend is in the main AioContext, but if the + * parameters connect it with any existing node in a different AioContext, it + * may end up there instead. * * Just as with bdrv_open(), after having called this function the reference to * @options belongs to the block layer (even on failure). * + * Called without holding an AioContext lock. + * * TODO: Remove @filename and @flags; it should be possible to specify a whole * BDS tree just by specifying the @options QDict (or @reference, * alternatively). At the time of adding this function, this is not possible, @@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new_open(const char *filename, const char *reference, { BlockBackend *blk; BlockDriverState *bs; + AioContext *ctx; uint64_t perm = 0; uint64_t shared = BLK_PERM_ALL; @@ -XXX,XX +XXX,XX @@ BlockBackend *blk_new_open(const char *filename, const char *reference, shared = BLK_PERM_CONSISTENT_READ | BLK_PERM_WRITE_UNCHANGED; } - blk = blk_new(qemu_get_aio_context(), perm, shared); aio_context_acquire(qemu_get_aio_context()); bs = bdrv_open(filename, reference, options, flags, errp); aio_context_release(qemu_get_aio_context()); if (!bs) { - blk_unref(blk); return NULL; } - blk->root = bdrv_root_attach_child(bs, "root", &child_root, - BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, - perm, shared, blk, errp); + /* bdrv_open() could have moved bs to a different AioContext */ + ctx = bdrv_get_aio_context(bs); + blk = blk_new(bdrv_get_aio_context(bs), perm, shared); + blk->perm = perm; + blk->shared_perm = shared; + + aio_context_acquire(ctx); + blk_insert_bs(blk, bs, errp); + bdrv_unref(bs); + aio_context_release(ctx); + if (!blk->root) { blk_unref(blk); return NULL; @@ -XXX,XX +XXX,XX @@ void blk_remove_bs(BlockBackend *blk) /* * Associates a new BlockDriverState with @blk. + * + * Callers must hold the AioContext lock of @bs. */ int blk_insert_bs(BlockBackend *blk, BlockDriverState *bs, Error **errp) { -- 2.40.1
bdrv_open_backing_file() calls bdrv_open_inherit(), so all callers must hold the main AioContext lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-6-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 2 ++ block/mirror.c | 6 ++++++ 2 files changed, 8 insertions(+) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd, * itself, all options starting with "${bdref_key}." are considered part of the * BlockdevRef. * + * The caller must hold the main AioContext lock. + * * TODO Can this be unified with bdrv_open_image()? */ int bdrv_open_backing_file(BlockDriverState *bs, QDict *parent_options, diff --git a/block/mirror.c b/block/mirror.c index XXXXXXX..XXXXXXX 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -XXX,XX +XXX,XX @@ static int mirror_exit_common(Job *job) bool abort = job->ret < 0; int ret = 0; + GLOBAL_STATE_CODE(); + if (s->prepared) { return 0; } s->prepared = true; + aio_context_acquire(qemu_get_aio_context()); + mirror_top_bs = s->mirror_top_bs; bs_opaque = mirror_top_bs->opaque; src = mirror_top_bs->backing->bs; @@ -XXX,XX +XXX,XX @@ static int mirror_exit_common(Job *job) bdrv_unref(mirror_top_bs); bdrv_unref(src); + aio_context_release(qemu_get_aio_context()); + return ret; } -- 2.40.1
qcow2_open() doesn't work correctly when opening the 'file' child moves bs to an iothread, for several reasons: - It uses BDRV_POLL_WHILE() to wait for the qcow2_open_entry() coroutine, which involves dropping the AioContext lock for bs when it is not in the main context - but we don't hold it, so this crashes. - It runs the qcow2_open_entry() coroutine in the current thread instead of the new AioContext of bs. - qcow2_open_entry() doesn't notify the main loop when it's done. This patches fixes these issues around delegating work to a coroutine. Temporarily dropping the main AioContext lock is not necessary because we know we run in the main thread. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-7-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 6 ++++++ block/qcow2.c | 8 ++++++-- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ done: * BlockdevRef. * * The BlockdevRef will be removed from the options QDict. + * + * @parent can move to a different AioContext in this function. Callers must + * make sure that their AioContext locking is still correct after this. */ BdrvChild *bdrv_open_child(const char *filename, QDict *options, const char *bdref_key, @@ -XXX,XX +XXX,XX @@ BdrvChild *bdrv_open_child(const char *filename, /* * Wrapper on bdrv_open_child() for most popular case: open primary child of bs. + * + * @parent can move to a different AioContext in this function. Callers must + * make sure that their AioContext locking is still correct after this. */ int bdrv_open_file_child(const char *filename, QDict *options, const char *bdref_key, diff --git a/block/qcow2.c b/block/qcow2.c index XXXXXXX..XXXXXXX 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -XXX,XX +XXX,XX @@ static void coroutine_fn qcow2_open_entry(void *opaque) qoc->ret = qcow2_do_open(qoc->bs, qoc->options, qoc->flags, true, qoc->errp); qemu_co_mutex_unlock(&s->lock); + + aio_wait_kick(); } static int qcow2_open(BlockDriverState *bs, QDict *options, int flags, @@ -XXX,XX +XXX,XX @@ static int qcow2_open(BlockDriverState *bs, QDict *options, int flags, assert(!qemu_in_coroutine()); assert(qemu_get_current_aio_context() == qemu_get_aio_context()); - qemu_coroutine_enter(qemu_coroutine_create(qcow2_open_entry, &qoc)); - BDRV_POLL_WHILE(bs, qoc.ret == -EINPROGRESS); + + aio_co_enter(bdrv_get_aio_context(bs), + qemu_coroutine_create(qcow2_open_entry, &qoc)); + AIO_WAIT_WHILE_UNLOCKED(NULL, qoc.ret == -EINPROGRESS); return qoc.ret; } -- 2.40.1
When opening the 'file' child moves bs to an iothread, we need to hold the AioContext lock of it before we can call raw_apply_options() (and more specifically, bdrv_getlength() inside of it). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-8-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/raw-format.c | 5 +++++ tests/unit/test-block-iothread.c | 3 --- 2 files changed, 5 insertions(+), 3 deletions(-) diff --git a/block/raw-format.c b/block/raw-format.c index XXXXXXX..XXXXXXX 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -XXX,XX +XXX,XX @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags, Error **errp) { BDRVRawState *s = bs->opaque; + AioContext *ctx; bool has_size; uint64_t offset, size; BdrvChildRole file_role; @@ -XXX,XX +XXX,XX @@ static int raw_open(BlockDriverState *bs, QDict *options, int flags, bs->file->bs->filename); } + ctx = bdrv_get_aio_context(bs); + aio_context_acquire(ctx); ret = raw_apply_options(bs, s, offset, has_size, size, errp); + aio_context_release(ctx); + if (ret < 0) { return ret; } diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-block-iothread.c +++ b/tests/unit/test-block-iothread.c @@ -XXX,XX +XXX,XX @@ static void test_attach_second_node(void) qdict_put_str(options, "driver", "raw"); qdict_put_str(options, "file", "base"); - /* FIXME raw_open() should take ctx's lock internally */ - aio_context_acquire(ctx); aio_context_acquire(main_ctx); filter = bdrv_open(NULL, NULL, options, BDRV_O_RDWR, &error_abort); aio_context_release(main_ctx); - aio_context_release(ctx); g_assert(blk_get_aio_context(blk) == ctx); g_assert(bdrv_get_aio_context(bs) == ctx); -- 2.40.1
The AioContext lock must not be held for bdrv_open_child(), but it is necessary for the following operations, in particular those using nested event loops in coroutine wrappers. Temporarily dropping the main AioContext lock is not necessary because we know we run in the main thread. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-9-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/copy-before-write.c | 21 ++++++++++++++++----- 1 file changed, 16 insertions(+), 5 deletions(-) diff --git a/block/copy-before-write.c b/block/copy-before-write.c index XXXXXXX..XXXXXXX 100644 --- a/block/copy-before-write.c +++ b/block/copy-before-write.c @@ -XXX,XX +XXX,XX @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, int64_t cluster_size; g_autoptr(BlockdevOptions) full_opts = NULL; BlockdevOptionsCbw *opts; + AioContext *ctx; int ret; full_opts = cbw_parse_options(options, errp); @@ -XXX,XX +XXX,XX @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, return -EINVAL; } + ctx = bdrv_get_aio_context(bs); + aio_context_acquire(ctx); + if (opts->bitmap) { bitmap = block_dirty_bitmap_lookup(opts->bitmap->node, opts->bitmap->name, NULL, errp); if (!bitmap) { - return -EINVAL; + ret = -EINVAL; + goto out; } } s->on_cbw_error = opts->has_on_cbw_error ? opts->on_cbw_error : @@ -XXX,XX +XXX,XX @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, s->bcs = block_copy_state_new(bs->file, s->target, bitmap, errp); if (!s->bcs) { error_prepend(errp, "Cannot create block-copy-state: "); - return -EINVAL; + ret = -EINVAL; + goto out; } cluster_size = block_copy_cluster_size(s->bcs); s->done_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp); if (!s->done_bitmap) { - return -EINVAL; + ret = -EINVAL; + goto out; } bdrv_disable_dirty_bitmap(s->done_bitmap); /* s->access_bitmap starts equal to bcs bitmap */ s->access_bitmap = bdrv_create_dirty_bitmap(bs, cluster_size, NULL, errp); if (!s->access_bitmap) { - return -EINVAL; + ret = -EINVAL; + goto out; } bdrv_disable_dirty_bitmap(s->access_bitmap); bdrv_dirty_bitmap_merge_internal(s->access_bitmap, @@ -XXX,XX +XXX,XX @@ static int cbw_open(BlockDriverState *bs, QDict *options, int flags, qemu_co_mutex_init(&s->lock); QLIST_INIT(&s->frozen_read_reqs); - return 0; + ret = 0; +out: + aio_context_release(ctx); + return ret; } static void cbw_close(BlockDriverState *bs) -- 2.40.1
bdrv_refresh_total_sectors() and bdrv_refresh_limits() expect to be called under the AioContext lock of the node. Take the lock. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-10-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ static int no_coroutine_fn GRAPH_UNLOCKED bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name, QDict *options, int open_flags, Error **errp) { + AioContext *ctx; Error *local_err = NULL; int i, ret; GLOBAL_STATE_CODE(); @@ -XXX,XX +XXX,XX @@ bdrv_open_driver(BlockDriverState *bs, BlockDriver *drv, const char *node_name, bs->supported_read_flags |= BDRV_REQ_REGISTERED_BUF; bs->supported_write_flags |= BDRV_REQ_REGISTERED_BUF; + /* Get the context after .bdrv_open, it can change the context */ + ctx = bdrv_get_aio_context(bs); + aio_context_acquire(ctx); + ret = bdrv_refresh_total_sectors(bs, bs->total_sectors); if (ret < 0) { error_setg_errno(errp, -ret, "Could not refresh total sector count"); + aio_context_release(ctx); return ret; } bdrv_graph_rdlock_main_loop(); bdrv_refresh_limits(bs, NULL, &local_err); bdrv_graph_rdunlock_main_loop(); + aio_context_release(ctx); if (local_err) { error_propagate(errp, local_err); -- 2.40.1
While calling bdrv_new_open_driver_opts(), the main AioContext lock must be held, not the lock of the AioContext of the block subtree it will be added to afterwards. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-11-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ static void bdrv_delete(BlockDriverState *bs) * empty set of options. The reference to the QDict belongs to the block layer * after the call (even on failure), so if the caller intends to reuse the * dictionary, it needs to use qobject_ref() before calling bdrv_open. + * + * The caller holds the AioContext lock for @bs. It must make sure that @bs + * stays in the same AioContext, i.e. @options must not refer to nodes in a + * different AioContext. */ BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options, int flags, Error **errp) { ERRP_GUARD(); int ret; + AioContext *ctx = bdrv_get_aio_context(bs); BlockDriverState *new_node_bs = NULL; const char *drvname, *node_name; BlockDriver *drv; @@ -XXX,XX +XXX,XX @@ BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options, GLOBAL_STATE_CODE(); + aio_context_release(ctx); + aio_context_acquire(qemu_get_aio_context()); new_node_bs = bdrv_new_open_driver_opts(drv, node_name, options, flags, errp); + aio_context_release(qemu_get_aio_context()); + aio_context_acquire(ctx); + assert(bdrv_get_aio_context(bs) == ctx); + options = NULL; /* bdrv_new_open_driver() eats options */ if (!new_node_bs) { error_prepend(errp, "Could not create node: "); -- 2.40.1
It has no internal callers, so its only use is being called from individual test cases. If the name starts with an underscore, it is considered private and linters warn against calling it. 256 only gets away with it currently because it's on the exception list for linters. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-12-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- tests/qemu-iotests/iotests.py | 2 +- tests/qemu-iotests/256 | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py index XXXXXXX..XXXXXXX 100644 --- a/tests/qemu-iotests/iotests.py +++ b/tests/qemu-iotests/iotests.py @@ -XXX,XX +XXX,XX @@ def _verify_virtio_blk() -> None: if 'virtio-blk' not in out: notrun('Missing virtio-blk in QEMU binary') -def _verify_virtio_scsi_pci_or_ccw() -> None: +def verify_virtio_scsi_pci_or_ccw() -> None: out = qemu_pipe('-M', 'none', '-device', 'help') if 'virtio-scsi-pci' not in out and 'virtio-scsi-ccw' not in out: notrun('Missing virtio-scsi-pci or virtio-scsi-ccw in QEMU binary') diff --git a/tests/qemu-iotests/256 b/tests/qemu-iotests/256 index XXXXXXX..XXXXXXX 100755 --- a/tests/qemu-iotests/256 +++ b/tests/qemu-iotests/256 @@ -XXX,XX +XXX,XX @@ import os import iotests from iotests import log -iotests._verify_virtio_scsi_pci_or_ccw() +iotests.verify_virtio_scsi_pci_or_ccw() iotests.script_initialize(supported_fmts=['qcow2']) size = 64 * 1024 * 1024 -- 2.40.1
If blockdev-create references an existing node in an iothread (e.g. as it's 'file' child), then suddenly all of the image creation code must run in that AioContext, too. Test that this actually works. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230525124713.401149-13-kwolf@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- tests/qemu-iotests/tests/iothreads-create | 67 +++++++++++++++++++ tests/qemu-iotests/tests/iothreads-create.out | 4 ++ 2 files changed, 71 insertions(+) create mode 100755 tests/qemu-iotests/tests/iothreads-create create mode 100644 tests/qemu-iotests/tests/iothreads-create.out diff --git a/tests/qemu-iotests/tests/iothreads-create b/tests/qemu-iotests/tests/iothreads-create new file mode 100755 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qemu-iotests/tests/iothreads-create @@ -XXX,XX +XXX,XX @@ +#!/usr/bin/env python3 +# group: rw quick +# +# Copyright (C) 2023 Red Hat, Inc. +# +# This program is free software; you can redistribute it and/or modify +# it under the terms of the GNU General Public License as published by +# the Free Software Foundation; either version 2 of the License, or +# (at your option) any later version. +# +# This program is distributed in the hope that it will be useful, +# but WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with this program. If not, see <http://www.gnu.org/licenses/>. +# +# Creator/Owner: Kevin Wolf <kwolf@redhat.com> + +import asyncio +import iotests + +iotests.script_initialize(supported_fmts=['qcow2', 'qcow', 'qed', 'vdi', + 'vmdk', 'parallels']) +iotests.verify_virtio_scsi_pci_or_ccw() + +with iotests.FilePath('disk.img') as img_path, \ + iotests.VM() as vm: + + iotests.qemu_img_create('-f', 'raw', img_path, '0') + + vm.add_object('iothread,id=iothread0') + vm.add_blockdev(f'file,node-name=img-file,read-only=on,' + f'filename={img_path}') + vm.add_device('virtio-scsi,iothread=iothread0') + vm.add_device('scsi-hd,drive=img-file,share-rw=on') + + vm.launch() + + iotests.log(vm.qmp( + 'blockdev-reopen', + options=[{ + 'driver': 'file', + 'filename': img_path, + 'node-name': 'img-file', + 'read-only': False, + }], + )) + iotests.log(vm.qmp( + 'blockdev-create', + job_id='job0', + options={ + 'driver': iotests.imgfmt, + 'file': 'img-file', + 'size': 1024 * 1024, + }, + )) + + # Should succeed and not time out + try: + vm.run_job('job0', wait=5.0) + vm.shutdown() + except asyncio.TimeoutError: + # VM may be stuck, kill it + vm.kill() + raise diff --git a/tests/qemu-iotests/tests/iothreads-create.out b/tests/qemu-iotests/tests/iothreads-create.out new file mode 100644 index XXXXXXX..XXXXXXX --- /dev/null +++ b/tests/qemu-iotests/tests/iothreads-create.out @@ -XXX,XX +XXX,XX @@ +{"return": {}} +{"return": {}} +{"execute": "job-dismiss", "arguments": {"id": "job0"}} +{"return": {}} -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> blk_set_aio_context() is not fully transactional because blk_do_set_aio_context() updates blk->ctx outside the transaction. Most of the time this goes unnoticed but a BlockDevOps.drained_end() callback that invokes blk_get_aio_context() fails assert(ctx == blk->ctx). This happens because blk->ctx is only assigned after BlockDevOps.drained_end() is called and we're in an intermediate state where BlockDrvierState nodes already have the new context and the BlockBackend still has the old context. Making blk_set_aio_context() fully transactional solves this assertion failure because the BlockBackend's context is updated as part of the transaction (before BlockDevOps.drained_end() is called). Split blk_do_set_aio_context() in order to solve this assertion failure. This helper function actually serves two different purposes: 1. It drives blk_set_aio_context(). 2. It responds to BdrvChildClass->change_aio_ctx(). Get rid of the helper function. Do #1 inside blk_set_aio_context() and do #2 inside blk_root_set_aio_ctx_commit(). This simplifies the code. The only drawback of the fully transactional approach is that blk_set_aio_context() must contend with blk_root_set_aio_ctx_commit() being invoked as part of the AioContext change propagation. This can be solved by temporarily setting blk->allow_aio_context_change to true. Future patches call blk_get_aio_context() from BlockDevOps->drained_end(), so this patch will become necessary. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-2-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/block-backend.c | 61 ++++++++++++++++--------------------------- 1 file changed, 23 insertions(+), 38 deletions(-) diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ static AioContext *blk_aiocb_get_aio_context(BlockAIOCB *acb) return blk_get_aio_context(blk_acb->blk); } -static int blk_do_set_aio_context(BlockBackend *blk, AioContext *new_context, - bool update_root_node, Error **errp) +int blk_set_aio_context(BlockBackend *blk, AioContext *new_context, + Error **errp) { + bool old_allow_change; BlockDriverState *bs = blk_bs(blk); - ThrottleGroupMember *tgm = &blk->public.throttle_group_member; int ret; - if (bs) { - bdrv_ref(bs); - - if (update_root_node) { - /* - * update_root_node MUST be false for blk_root_set_aio_ctx_commit(), - * as we are already in the commit function of a transaction. - */ - ret = bdrv_try_change_aio_context(bs, new_context, blk->root, errp); - if (ret < 0) { - bdrv_unref(bs); - return ret; - } - } - /* - * Make blk->ctx consistent with the root node before we invoke any - * other operations like drain that might inquire blk->ctx - */ - blk->ctx = new_context; - if (tgm->throttle_state) { - bdrv_drained_begin(bs); - throttle_group_detach_aio_context(tgm); - throttle_group_attach_aio_context(tgm, new_context); - bdrv_drained_end(bs); - } + GLOBAL_STATE_CODE(); - bdrv_unref(bs); - } else { + if (!bs) { blk->ctx = new_context; + return 0; } - return 0; -} + bdrv_ref(bs); -int blk_set_aio_context(BlockBackend *blk, AioContext *new_context, - Error **errp) -{ - GLOBAL_STATE_CODE(); - return blk_do_set_aio_context(blk, new_context, true, errp); + old_allow_change = blk->allow_aio_context_change; + blk->allow_aio_context_change = true; + + ret = bdrv_try_change_aio_context(bs, new_context, NULL, errp); + + blk->allow_aio_context_change = old_allow_change; + + bdrv_unref(bs); + return ret; } typedef struct BdrvStateBlkRootContext { @@ -XXX,XX +XXX,XX @@ static void blk_root_set_aio_ctx_commit(void *opaque) { BdrvStateBlkRootContext *s = opaque; BlockBackend *blk = s->blk; + AioContext *new_context = s->new_ctx; + ThrottleGroupMember *tgm = &blk->public.throttle_group_member; - blk_do_set_aio_context(blk, s->new_ctx, false, &error_abort); + blk->ctx = new_context; + if (tgm->throttle_state) { + throttle_group_detach_aio_context(tgm); + throttle_group_attach_aio_context(tgm, new_context); + } } static TransactionActionDrv set_blk_root_context = { -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Add a helper function to check whether the device is realized without requiring the Big QEMU Lock. The next patch adds a second caller. The goal is to avoid spreading DeviceState field accesses throughout the code. Suggested-by: Philippe Mathieu-Daudé <philmd@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-3-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/hw/qdev-core.h | 17 ++++++++++++++--- hw/scsi/scsi-bus.c | 3 +-- 2 files changed, 15 insertions(+), 5 deletions(-) diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/qdev-core.h +++ b/include/hw/qdev-core.h @@ -XXX,XX +XXX,XX @@ #ifndef QDEV_CORE_H #define QDEV_CORE_H +#include "qemu/atomic.h" #include "qemu/queue.h" #include "qemu/bitmap.h" #include "qemu/rcu.h" @@ -XXX,XX +XXX,XX @@ typedef struct { /** * DeviceState: - * @realized: Indicates whether the device has been fully constructed. - * When accessed outside big qemu lock, must be accessed with - * qatomic_load_acquire() * @reset: ResettableState for the device; handled by Resettable interface. * * This structure should not be accessed directly. We declare it here @@ -XXX,XX +XXX,XX @@ DeviceState *qdev_new(const char *name); */ DeviceState *qdev_try_new(const char *name); +/** + * qdev_is_realized: + * @dev: The device to check. + * + * May be called outside big qemu lock. + * + * Returns: %true% if the device has been fully constructed, %false% otherwise. + */ +static inline bool qdev_is_realized(DeviceState *dev) +{ + return qatomic_load_acquire(&dev->realized); +} + /** * qdev_realize: Realize @dev. * @dev: device to realize diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -XXX,XX +XXX,XX @@ static SCSIDevice *do_scsi_device_find(SCSIBus *bus, * the user access the device. */ - if (retval && !include_unrealized && - !qatomic_load_acquire(&retval->qdev.realized)) { + if (retval && !include_unrealized && !qdev_is_realized(&retval->qdev)) { retval = NULL; } -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Only report a transport reset event to the guest after the SCSIDevice has been unrealized by qdev_simple_device_unplug_cb(). qdev_simple_device_unplug_cb() sets the SCSIDevice's qdev.realized field to false so that scsi_device_find/get() no longer see it. scsi_target_emulate_report_luns() also needs to be updated to filter out SCSIDevices that are unrealized. Change virtio_scsi_push_event() to take event information as an argument instead of the SCSIDevice. This allows virtio_scsi_hotunplug() to emit a VIRTIO_SCSI_T_TRANSPORT_RESET event after the SCSIDevice has already been unrealized. These changes ensure that the guest driver does not see the SCSIDevice that's being unplugged if it responds very quickly to the transport reset event. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-4-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/scsi/scsi-bus.c | 3 +- hw/scsi/virtio-scsi.c | 86 ++++++++++++++++++++++++++++++------------- 2 files changed, 63 insertions(+), 26 deletions(-) diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -XXX,XX +XXX,XX @@ static bool scsi_target_emulate_report_luns(SCSITargetReq *r) DeviceState *qdev = kid->child; SCSIDevice *dev = SCSI_DEVICE(qdev); - if (dev->channel == channel && dev->id == id && dev->lun != 0) { + if (dev->channel == channel && dev->id == id && dev->lun != 0 && + qdev_is_realized(&dev->qdev)) { store_lun(tmp, dev->lun); g_byte_array_append(buf, tmp, 8); len += 8; diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_reset(VirtIODevice *vdev) s->events_dropped = false; } -static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev, - uint32_t event, uint32_t reason) +typedef struct { + uint32_t event; + uint32_t reason; + union { + /* Used by messages specific to a device */ + struct { + uint32_t id; + uint32_t lun; + } address; + }; +} VirtIOSCSIEventInfo; + +static void virtio_scsi_push_event(VirtIOSCSI *s, + const VirtIOSCSIEventInfo *info) { VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s); VirtIOSCSIReq *req; VirtIOSCSIEvent *evt; VirtIODevice *vdev = VIRTIO_DEVICE(s); + uint32_t event = info->event; + uint32_t reason = info->reason; if (!(vdev->status & VIRTIO_CONFIG_S_DRIVER_OK)) { return; @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_push_event(VirtIOSCSI *s, SCSIDevice *dev, memset(evt, 0, sizeof(VirtIOSCSIEvent)); evt->event = virtio_tswap32(vdev, event); evt->reason = virtio_tswap32(vdev, reason); - if (!dev) { - assert(event == VIRTIO_SCSI_T_EVENTS_MISSED); - } else { + if (event != VIRTIO_SCSI_T_EVENTS_MISSED) { evt->lun[0] = 1; - evt->lun[1] = dev->id; + evt->lun[1] = info->address.id; /* Linux wants us to keep the same encoding we use for REPORT LUNS. */ - if (dev->lun >= 256) { - evt->lun[2] = (dev->lun >> 8) | 0x40; + if (info->address.lun >= 256) { + evt->lun[2] = (info->address.lun >> 8) | 0x40; } - evt->lun[3] = dev->lun & 0xFF; + evt->lun[3] = info->address.lun & 0xFF; } trace_virtio_scsi_event(virtio_scsi_get_lun(evt->lun), event, reason); - + virtio_scsi_complete_req(req); } static void virtio_scsi_handle_event_vq(VirtIOSCSI *s, VirtQueue *vq) { if (s->events_dropped) { - virtio_scsi_push_event(s, NULL, VIRTIO_SCSI_T_NO_EVENT, 0); + VirtIOSCSIEventInfo info = { + .event = VIRTIO_SCSI_T_NO_EVENT, + }; + virtio_scsi_push_event(s, &info); } } @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_change(SCSIBus *bus, SCSIDevice *dev, SCSISense sense) if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_CHANGE) && dev->type != TYPE_ROM) { + VirtIOSCSIEventInfo info = { + .event = VIRTIO_SCSI_T_PARAM_CHANGE, + .reason = sense.asc | (sense.ascq << 8), + .address = { + .id = dev->id, + .lun = dev->lun, + }, + }; + virtio_scsi_acquire(s); - virtio_scsi_push_event(s, dev, VIRTIO_SCSI_T_PARAM_CHANGE, - sense.asc | (sense.ascq << 8)); + virtio_scsi_push_event(s, &info); virtio_scsi_release(s); } } @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotplug(HotplugHandler *hotplug_dev, DeviceState *dev, } if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) { + VirtIOSCSIEventInfo info = { + .event = VIRTIO_SCSI_T_TRANSPORT_RESET, + .reason = VIRTIO_SCSI_EVT_RESET_RESCAN, + .address = { + .id = sd->id, + .lun = sd->lun, + }, + }; + virtio_scsi_acquire(s); - virtio_scsi_push_event(s, sd, - VIRTIO_SCSI_T_TRANSPORT_RESET, - VIRTIO_SCSI_EVT_RESET_RESCAN); + virtio_scsi_push_event(s, &info); scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED)); virtio_scsi_release(s); } @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, VirtIOSCSI *s = VIRTIO_SCSI(vdev); SCSIDevice *sd = SCSI_DEVICE(dev); AioContext *ctx = s->ctx ?: qemu_get_aio_context(); - - if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) { - virtio_scsi_acquire(s); - virtio_scsi_push_event(s, sd, - VIRTIO_SCSI_T_TRANSPORT_RESET, - VIRTIO_SCSI_EVT_RESET_REMOVED); - scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED)); - virtio_scsi_release(s); - } + VirtIOSCSIEventInfo info = { + .event = VIRTIO_SCSI_T_TRANSPORT_RESET, + .reason = VIRTIO_SCSI_EVT_RESET_REMOVED, + .address = { + .id = sd->id, + .lun = sd->lun, + }, + }; aio_disable_external(ctx); qdev_simple_device_unplug_cb(hotplug_dev, dev, errp); @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, blk_set_aio_context(sd->conf.blk, qemu_get_aio_context(), NULL); virtio_scsi_release(s); } + + if (virtio_vdev_has_feature(vdev, VIRTIO_SCSI_F_HOTPLUG)) { + virtio_scsi_acquire(s); + virtio_scsi_push_event(s, &info); + scsi_bus_set_ua(&s->bus, SENSE_CODE(REPORTED_LUNS_CHANGED)); + virtio_scsi_release(s); + } } static struct SCSIBusInfo virtio_scsi_scsi_info = { -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> This patch is part of an effort to remove the aio_disable_external() API because it does not fit in a multi-queue block layer world where many AioContexts may be submitting requests to the same disk. The SCSI emulation code is already in good shape to stop using aio_disable_external(). It was only used by commit 9c5aad84da1c ("virtio-scsi: fixed virtio_scsi_ctx_check failed when detaching scsi disk") to ensure that virtio_scsi_hotunplug() works while the guest driver is submitting I/O. Ensure virtio_scsi_hotunplug() is safe as follows: 1. qdev_simple_device_unplug_cb() -> qdev_unrealize() -> device_set_realized() calls qatomic_set(&dev->realized, false) so that future scsi_device_get() calls return NULL because they exclude SCSIDevices with realized=false. That means virtio-scsi will reject new I/O requests to this SCSIDevice with VIRTIO_SCSI_S_BAD_TARGET even while virtio_scsi_hotunplug() is still executing. We are protected against new requests! 2. scsi_qdev_unrealize() already contains a call to scsi_device_purge_requests() so that in-flight requests are cancelled synchronously. This ensures that no in-flight requests remain once qdev_simple_device_unplug_cb() returns. Thanks to these two conditions we don't need aio_disable_external() anymore. Cc: Zhengui Li <lizhengui@huawei.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Daniil Tatianin <d-tatianin@yandex-team.ru> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-5-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/scsi/virtio-scsi.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, VirtIODevice *vdev = VIRTIO_DEVICE(hotplug_dev); VirtIOSCSI *s = VIRTIO_SCSI(vdev); SCSIDevice *sd = SCSI_DEVICE(dev); - AioContext *ctx = s->ctx ?: qemu_get_aio_context(); VirtIOSCSIEventInfo info = { .event = VIRTIO_SCSI_T_TRANSPORT_RESET, .reason = VIRTIO_SCSI_EVT_RESET_REMOVED, @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, }, }; - aio_disable_external(ctx); qdev_simple_device_unplug_cb(hotplug_dev, dev, errp); - aio_enable_external(ctx); if (s->ctx) { virtio_scsi_acquire(s); -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> The VuServer object has a refcount field and ref/unref APIs. The name is confusing because it's actually an in-flight request counter instead of a refcount. Normally a refcount destroys the object upon reaching zero. The VuServer counter is used to wake up the vhost-user coroutine when there are no more requests. Avoid confusing by renaming refcount and ref/unref to in_flight and inc/dec. Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-6-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/qemu/vhost-user-server.h | 6 +++--- block/export/vhost-user-blk-server.c | 11 +++++++---- util/vhost-user-server.c | 14 +++++++------- 3 files changed, 17 insertions(+), 14 deletions(-) diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h index XXXXXXX..XXXXXXX 100644 --- a/include/qemu/vhost-user-server.h +++ b/include/qemu/vhost-user-server.h @@ -XXX,XX +XXX,XX @@ typedef struct { const VuDevIface *vu_iface; /* Protected by ctx lock */ - unsigned int refcount; + unsigned int in_flight; bool wait_idle; VuDev vu_dev; QIOChannel *ioc; /* The I/O channel with the client */ @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, void vhost_user_server_stop(VuServer *server); -void vhost_user_server_ref(VuServer *server); -void vhost_user_server_unref(VuServer *server); +void vhost_user_server_inc_in_flight(VuServer *server); +void vhost_user_server_dec_in_flight(VuServer *server); void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx); void vhost_user_server_detach_aio_context(VuServer *server); diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -XXX,XX +XXX,XX @@ static void vu_blk_req_complete(VuBlkReq *req, size_t in_len) free(req); } -/* Called with server refcount increased, must decrease before returning */ +/* + * Called with server in_flight counter increased, must decrease before + * returning. + */ static void coroutine_fn vu_blk_virtio_process_req(void *opaque) { VuBlkReq *req = opaque; @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque) in_num, out_num); if (in_len < 0) { free(req); - vhost_user_server_unref(server); + vhost_user_server_dec_in_flight(server); return; } vu_blk_req_complete(req, in_len); - vhost_user_server_unref(server); + vhost_user_server_dec_in_flight(server); } static void vu_blk_process_vq(VuDev *vu_dev, int idx) @@ -XXX,XX +XXX,XX @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx) Coroutine *co = qemu_coroutine_create(vu_blk_virtio_process_req, req); - vhost_user_server_ref(server); + vhost_user_server_inc_in_flight(server); qemu_coroutine_enter(co); } } diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf) error_report("vu_panic: %s", buf); } -void vhost_user_server_ref(VuServer *server) +void vhost_user_server_inc_in_flight(VuServer *server) { assert(!server->wait_idle); - server->refcount++; + server->in_flight++; } -void vhost_user_server_unref(VuServer *server) +void vhost_user_server_dec_in_flight(VuServer *server) { - server->refcount--; - if (server->wait_idle && !server->refcount) { + server->in_flight--; + if (server->wait_idle && !server->in_flight) { aio_co_wake(server->co_trip); } } @@ -XXX,XX +XXX,XX @@ static coroutine_fn void vu_client_trip(void *opaque) /* Keep running */ } - if (server->refcount) { + if (server->in_flight) { /* Wait for requests to complete before we can unmap the memory */ server->wait_idle = true; qemu_coroutine_yield(); server->wait_idle = false; } - assert(server->refcount == 0); + assert(server->in_flight == 0); vu_deinit(vu_dev); -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Each vhost-user-blk request runs in a coroutine. When the BlockBackend enters a drained section we need to enter a quiescent state. Currently any in-flight requests race with bdrv_drained_begin() because it is unaware of vhost-user-blk requests. When blk_co_preadv/pwritev()/etc returns it wakes the bdrv_drained_begin() thread but vhost-user-blk request processing has not yet finished. The request coroutine continues executing while the main loop thread thinks it is in a drained section. One example where this is unsafe is for blk_set_aio_context() where bdrv_drained_begin() is called before .aio_context_detached() and .aio_context_attach(). If request coroutines are still running after bdrv_drained_begin(), then the AioContext could change underneath them and they race with new requests processed in the new AioContext. This could lead to virtqueue corruption, for example. (This example is theoretical, I came across this while reading the code and have not tried to reproduce it.) It's easy to make bdrv_drained_begin() wait for in-flight requests: add a .drained_poll() callback that checks the VuServer's in-flight counter. VuServer just needs an API that returns true when there are requests in flight. The in-flight counter needs to be atomic. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-7-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/qemu/vhost-user-server.h | 4 +++- block/export/vhost-user-blk-server.c | 13 +++++++++++++ util/vhost-user-server.c | 18 ++++++++++++------ 3 files changed, 28 insertions(+), 7 deletions(-) diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h index XXXXXXX..XXXXXXX 100644 --- a/include/qemu/vhost-user-server.h +++ b/include/qemu/vhost-user-server.h @@ -XXX,XX +XXX,XX @@ typedef struct { int max_queues; const VuDevIface *vu_iface; + unsigned int in_flight; /* atomic */ + /* Protected by ctx lock */ - unsigned int in_flight; bool wait_idle; VuDev vu_dev; QIOChannel *ioc; /* The I/O channel with the client */ @@ -XXX,XX +XXX,XX @@ void vhost_user_server_stop(VuServer *server); void vhost_user_server_inc_in_flight(VuServer *server); void vhost_user_server_dec_in_flight(VuServer *server); +bool vhost_user_server_has_in_flight(VuServer *server); void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx); void vhost_user_server_detach_aio_context(VuServer *server); diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -XXX,XX +XXX,XX @@ static void vu_blk_exp_resize(void *opaque) vu_config_change_msg(&vexp->vu_server.vu_dev); } +/* + * Ensures that bdrv_drained_begin() waits until in-flight requests complete. + * + * Called with vexp->export.ctx acquired. + */ +static bool vu_blk_drained_poll(void *opaque) +{ + VuBlkExport *vexp = opaque; + + return vhost_user_server_has_in_flight(&vexp->vu_server); +} + static const BlockDevOps vu_blk_dev_ops = { + .drained_poll = vu_blk_drained_poll, .resize_cb = vu_blk_exp_resize, }; diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf) void vhost_user_server_inc_in_flight(VuServer *server) { assert(!server->wait_idle); - server->in_flight++; + qatomic_inc(&server->in_flight); } void vhost_user_server_dec_in_flight(VuServer *server) { - server->in_flight--; - if (server->wait_idle && !server->in_flight) { - aio_co_wake(server->co_trip); + if (qatomic_fetch_dec(&server->in_flight) == 1) { + if (server->wait_idle) { + aio_co_wake(server->co_trip); + } } } +bool vhost_user_server_has_in_flight(VuServer *server) +{ + return qatomic_load_acquire(&server->in_flight) > 0; +} + static bool coroutine_fn vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) { @@ -XXX,XX +XXX,XX @@ static coroutine_fn void vu_client_trip(void *opaque) /* Keep running */ } - if (server->in_flight) { + if (vhost_user_server_has_in_flight(server)) { /* Wait for requests to complete before we can unmap the memory */ server->wait_idle = true; qemu_coroutine_yield(); server->wait_idle = false; } - assert(server->in_flight == 0); + assert(!vhost_user_server_has_in_flight(server)); vu_deinit(vu_dev); -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> vhost-user activity must be suspended during bdrv_drained_begin/end(). This prevents new requests from interfering with whatever is happening in the drained section. Previously this was done using aio_set_fd_handler()'s is_external argument. In a multi-queue block layer world the aio_disable_external() API cannot be used since multiple AioContext may be processing I/O, not just one. Switch to BlockDevOps->drained_begin/end() callbacks. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-8-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/export/vhost-user-blk-server.c | 28 ++++++++++++++++++++++++++-- util/vhost-user-server.c | 10 +++++----- 2 files changed, 31 insertions(+), 7 deletions(-) diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -XXX,XX +XXX,XX @@ static void blk_aio_attached(AioContext *ctx, void *opaque) { VuBlkExport *vexp = opaque; + /* + * The actual attach will happen in vu_blk_drained_end() and we just + * restore ctx here. + */ vexp->export.ctx = ctx; - vhost_user_server_attach_aio_context(&vexp->vu_server, ctx); } static void blk_aio_detach(void *opaque) { VuBlkExport *vexp = opaque; - vhost_user_server_detach_aio_context(&vexp->vu_server); + /* + * The actual detach already happened in vu_blk_drained_begin() but from + * this point on we must not access ctx anymore. + */ vexp->export.ctx = NULL; } @@ -XXX,XX +XXX,XX @@ static void vu_blk_exp_resize(void *opaque) vu_config_change_msg(&vexp->vu_server.vu_dev); } +/* Called with vexp->export.ctx acquired */ +static void vu_blk_drained_begin(void *opaque) +{ + VuBlkExport *vexp = opaque; + + vhost_user_server_detach_aio_context(&vexp->vu_server); +} + +/* Called with vexp->export.blk AioContext acquired */ +static void vu_blk_drained_end(void *opaque) +{ + VuBlkExport *vexp = opaque; + + vhost_user_server_attach_aio_context(&vexp->vu_server, vexp->export.ctx); +} + /* * Ensures that bdrv_drained_begin() waits until in-flight requests complete. * @@ -XXX,XX +XXX,XX @@ static bool vu_blk_drained_poll(void *opaque) } static const BlockDevOps vu_blk_dev_ops = { + .drained_begin = vu_blk_drained_begin, + .drained_end = vu_blk_drained_end, .drained_poll = vu_blk_drained_poll, .resize_cb = vu_blk_exp_resize, }; diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ set_watch(VuDev *vu_dev, int fd, int vu_evt, vu_fd_watch->fd = fd; vu_fd_watch->cb = cb; qemu_socket_set_nonblock(fd); - aio_set_fd_handler(server->ioc->ctx, fd, true, kick_handler, + aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler, NULL, NULL, NULL, vu_fd_watch); vu_fd_watch->vu_dev = vu_dev; vu_fd_watch->pvt = pvt; @@ -XXX,XX +XXX,XX @@ static void remove_watch(VuDev *vu_dev, int fd) if (!vu_fd_watch) { return; } - aio_set_fd_handler(server->ioc->ctx, fd, true, + aio_set_fd_handler(server->ioc->ctx, fd, false, NULL, NULL, NULL, NULL, NULL); QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); @@ -XXX,XX +XXX,XX @@ void vhost_user_server_stop(VuServer *server) VuFdWatch *vu_fd_watch; QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false, NULL, NULL, NULL, NULL, vu_fd_watch); } @@ -XXX,XX +XXX,XX @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx) qio_channel_attach_aio_context(server->ioc, ctx); QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(ctx, vu_fd_watch->fd, true, kick_handler, NULL, + aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL, NULL, NULL, vu_fd_watch); } @@ -XXX,XX +XXX,XX @@ void vhost_user_server_detach_aio_context(VuServer *server) VuFdWatch *vu_fd_watch; QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(server->ctx, vu_fd_watch->fd, true, + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false, NULL, NULL, NULL, NULL, vu_fd_watch); } -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> There is no need to suspend activity between aio_disable_external() and aio_enable_external(), which is mainly used for the block layer's drain operation. This is part of ongoing work to remove the aio_disable_external() API. Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: Paul Durrant <paul@xen.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-9-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/i386/kvm/xen_xenstore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index XXXXXXX..XXXXXXX 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -XXX,XX +XXX,XX @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) error_setg(errp, "Xenstore evtchn port init failed"); return; } - aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), true, + aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false, xen_xenstore_event, NULL, NULL, NULL, s); s->impl = xs_impl_create(xen_domid); -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> The BlockBackend quiesce_counter is greater than zero during drained sections. Add an API to check whether the BlockBackend is in a drained section. The next patch will use this API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-10-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/sysemu/block-backend-global-state.h | 1 + block/block-backend.c | 7 +++++++ 2 files changed, 8 insertions(+) diff --git a/include/sysemu/block-backend-global-state.h b/include/sysemu/block-backend-global-state.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/block-backend-global-state.h +++ b/include/sysemu/block-backend-global-state.h @@ -XXX,XX +XXX,XX @@ void blk_activate(BlockBackend *blk, Error **errp); int blk_make_zero(BlockBackend *blk, BdrvRequestFlags flags); void blk_aio_cancel(BlockAIOCB *acb); int blk_commit_all(void); +bool blk_in_drain(BlockBackend *blk); void blk_drain(BlockBackend *blk); void blk_drain_all(void); void blk_set_on_error(BlockBackend *blk, BlockdevOnError on_read_error, diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ blk_check_byte_request(BlockBackend *blk, int64_t offset, int64_t bytes) return 0; } +/* Are we currently in a drained section? */ +bool blk_in_drain(BlockBackend *blk) +{ + GLOBAL_STATE_CODE(); /* change to IO_OR_GS_CODE(), if necessary */ + return qatomic_read(&blk->quiesce_counter); +} + /* To be called between exactly one pair of blk_inc/dec_in_flight() */ static void coroutine_fn blk_wait_while_drained(BlockBackend *blk) { -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> For simplicity, always run BlockDevOps .drained_begin/end/poll() callbacks in the main loop thread. This makes it easier to implement the callbacks and avoids extra locks. Move the function pointer declarations from the I/O Code section to the Global State section for BlockDevOps, BdrvChildClass, and BlockDriver. Narrow IO_OR_GS_CODE() to GLOBAL_STATE_CODE() where appropriate. The test-bdrv-drain test case calls bdrv_drain() from an IOThread. This is now only allowed from coroutine context, so update the test case to run in a coroutine. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-11-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block_int-common.h | 72 +++++++++++++-------------- include/sysemu/block-backend-common.h | 25 +++++----- block/io.c | 14 ++++-- tests/unit/test-bdrv-drain.c | 14 +++--- 4 files changed, 67 insertions(+), 58 deletions(-) diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -XXX,XX +XXX,XX @@ struct BlockDriver { void (*bdrv_attach_aio_context)(BlockDriverState *bs, AioContext *new_context); + /** + * bdrv_drain_begin is called if implemented in the beginning of a + * drain operation to drain and stop any internal sources of requests in + * the driver. + * bdrv_drain_end is called if implemented at the end of the drain. + * + * They should be used by the driver to e.g. manage scheduled I/O + * requests, or toggle an internal state. After the end of the drain new + * requests will continue normally. + * + * Implementations of both functions must not call aio_poll(). + */ + void (*bdrv_drain_begin)(BlockDriverState *bs); + void (*bdrv_drain_end)(BlockDriverState *bs); + /** * Try to get @bs's logical and physical block size. * On success, store them in @bsz and return zero. @@ -XXX,XX +XXX,XX @@ struct BlockDriver { void coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_io_unplug)( BlockDriverState *bs); - /** - * bdrv_drain_begin is called if implemented in the beginning of a - * drain operation to drain and stop any internal sources of requests in - * the driver. - * bdrv_drain_end is called if implemented at the end of the drain. - * - * They should be used by the driver to e.g. manage scheduled I/O - * requests, or toggle an internal state. After the end of the drain new - * requests will continue normally. - * - * Implementations of both functions must not call aio_poll(). - */ - void (*bdrv_drain_begin)(BlockDriverState *bs); - void (*bdrv_drain_end)(BlockDriverState *bs); - bool (*bdrv_supports_persistent_dirty_bitmap)(BlockDriverState *bs); bool coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_can_store_new_dirty_bitmap)( @@ -XXX,XX +XXX,XX @@ struct BdrvChildClass { void GRAPH_WRLOCK_PTR (*attach)(BdrvChild *child); void GRAPH_WRLOCK_PTR (*detach)(BdrvChild *child); + /* + * If this pair of functions is implemented, the parent doesn't issue new + * requests after returning from .drained_begin() until .drained_end() is + * called. + * + * These functions must not change the graph (and therefore also must not + * call aio_poll(), which could change the graph indirectly). + * + * Note that this can be nested. If drained_begin() was called twice, new + * I/O is allowed only after drained_end() was called twice, too. + */ + void (*drained_begin)(BdrvChild *child); + void (*drained_end)(BdrvChild *child); + + /* + * Returns whether the parent has pending requests for the child. This + * callback is polled after .drained_begin() has been called until all + * activity on the child has stopped. + */ + bool (*drained_poll)(BdrvChild *child); + /* * Notifies the parent that the filename of its child has changed (e.g. * because the direct child was removed from the backing chain), so that it @@ -XXX,XX +XXX,XX @@ struct BdrvChildClass { const char *(*get_name)(BdrvChild *child); AioContext *(*get_parent_aio_context)(BdrvChild *child); - - /* - * If this pair of functions is implemented, the parent doesn't issue new - * requests after returning from .drained_begin() until .drained_end() is - * called. - * - * These functions must not change the graph (and therefore also must not - * call aio_poll(), which could change the graph indirectly). - * - * Note that this can be nested. If drained_begin() was called twice, new - * I/O is allowed only after drained_end() was called twice, too. - */ - void (*drained_begin)(BdrvChild *child); - void (*drained_end)(BdrvChild *child); - - /* - * Returns whether the parent has pending requests for the child. This - * callback is polled after .drained_begin() has been called until all - * activity on the child has stopped. - */ - bool (*drained_poll)(BdrvChild *child); }; extern const BdrvChildClass child_of_bds; diff --git a/include/sysemu/block-backend-common.h b/include/sysemu/block-backend-common.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/block-backend-common.h +++ b/include/sysemu/block-backend-common.h @@ -XXX,XX +XXX,XX @@ typedef struct BlockDevOps { */ bool (*is_medium_locked)(void *opaque); + /* + * Runs when the backend receives a drain request. + */ + void (*drained_begin)(void *opaque); + /* + * Runs when the backend's last drain request ends. + */ + void (*drained_end)(void *opaque); + /* + * Is the device still busy? + */ + bool (*drained_poll)(void *opaque); + /* * I/O API functions. These functions are thread-safe. * @@ -XXX,XX +XXX,XX @@ typedef struct BlockDevOps { * Runs when the size changed (e.g. monitor command block_resize) */ void (*resize_cb)(void *opaque); - /* - * Runs when the backend receives a drain request. - */ - void (*drained_begin)(void *opaque); - /* - * Runs when the backend's last drain request ends. - */ - void (*drained_end)(void *opaque); - /* - * Is the device still busy? - */ - bool (*drained_poll)(void *opaque); } BlockDevOps; /* diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore) void bdrv_parent_drained_end_single(BdrvChild *c) { - IO_OR_GS_CODE(); + GLOBAL_STATE_CODE(); assert(c->quiesced_parent); c->quiesced_parent = false; @@ -XXX,XX +XXX,XX @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore, void bdrv_parent_drained_begin_single(BdrvChild *c) { - IO_OR_GS_CODE(); + GLOBAL_STATE_CODE(); assert(!c->quiesced_parent); c->quiesced_parent = true; @@ -XXX,XX +XXX,XX @@ typedef struct { bool bdrv_drain_poll(BlockDriverState *bs, BdrvChild *ignore_parent, bool ignore_bds_parents) { - IO_OR_GS_CODE(); + GLOBAL_STATE_CODE(); if (bdrv_parent_drained_poll(bs, ignore_parent, ignore_bds_parents)) { return true; @@ -XXX,XX +XXX,XX @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs, if (ctx != co_ctx) { aio_context_release(ctx); } - replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data); + replay_bh_schedule_oneshot_event(qemu_get_aio_context(), + bdrv_co_drain_bh_cb, &data); qemu_coroutine_yield(); /* If we are resumed from some other event (such as an aio completion or a @@ -XXX,XX +XXX,XX @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent, return; } + GLOBAL_STATE_CODE(); + /* Stop things in parent-to-child order */ if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) { aio_disable_external(bdrv_get_aio_context(bs)); @@ -XXX,XX +XXX,XX @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent) { int old_quiesce_counter; + IO_OR_GS_CODE(); + if (qemu_in_coroutine()) { bdrv_co_yield_to_drain(bs, false, parent, false); return; } assert(bs->quiesce_counter > 0); + GLOBAL_STATE_CODE(); /* Re-enable things in child-to-parent order */ old_quiesce_counter = qatomic_fetch_dec(&bs->quiesce_counter); diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-bdrv-drain.c +++ b/tests/unit/test-bdrv-drain.c @@ -XXX,XX +XXX,XX @@ struct test_iothread_data { BlockDriverState *bs; enum drain_type drain_type; int *aio_ret; + bool co_done; }; -static void test_iothread_drain_entry(void *opaque) +static void coroutine_fn test_iothread_drain_co_entry(void *opaque) { struct test_iothread_data *data = opaque; - aio_context_acquire(bdrv_get_aio_context(data->bs)); do_drain_begin(data->drain_type, data->bs); g_assert_cmpint(*data->aio_ret, ==, 0); do_drain_end(data->drain_type, data->bs); - aio_context_release(bdrv_get_aio_context(data->bs)); - qemu_event_set(&done_event); + data->co_done = true; + aio_wait_kick(); } static void test_iothread_aio_cb(void *opaque, int ret) @@ -XXX,XX +XXX,XX @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) BlockDriverState *bs; BDRVTestState *s; BlockAIOCB *acb; + Coroutine *co; int aio_ret; struct test_iothread_data data; @@ -XXX,XX +XXX,XX @@ static void test_iothread_common(enum drain_type drain_type, int drain_thread) } break; case 1: - aio_bh_schedule_oneshot(ctx_a, test_iothread_drain_entry, &data); - qemu_event_wait(&done_event); + co = qemu_coroutine_create(test_iothread_drain_co_entry, &data); + aio_co_enter(ctx_a, co); + AIO_WAIT_WHILE_UNLOCKED(NULL, !data.co_done); break; default: g_assert_not_reached(); -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Detach event channels during drained sections to stop I/O submission from the ring. xen-block is no longer reliant on aio_disable_external() after this patch. This will allow us to remove the aio_disable_external() API once all other code that relies on it is converted. Extend xen_device_set_event_channel_context() to allow ctx=NULL. The event channel still exists but the event loop does not monitor the file descriptor. Event channel processing can resume by calling xen_device_set_event_channel_context() with a non-NULL ctx. Factor out xen_device_set_event_channel_context() calls in hw/block/dataplane/xen-block.c into attach/detach helper functions. Incidentally, these don't require the AioContext lock because aio_set_fd_handler() is thread-safe. It's safer to register BlockDevOps after the dataplane instance has been created. The BlockDevOps .drained_begin/end() callbacks depend on the dataplane instance, so move the blk_set_dev_ops() call after xen_block_dataplane_create(). Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-12-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/block/dataplane/xen-block.h | 2 ++ hw/block/dataplane/xen-block.c | 42 +++++++++++++++++++++++++--------- hw/block/xen-block.c | 24 ++++++++++++++++--- hw/xen/xen-bus.c | 7 ++++-- 4 files changed, 59 insertions(+), 16 deletions(-) diff --git a/hw/block/dataplane/xen-block.h b/hw/block/dataplane/xen-block.h index XXXXXXX..XXXXXXX 100644 --- a/hw/block/dataplane/xen-block.h +++ b/hw/block/dataplane/xen-block.h @@ -XXX,XX +XXX,XX @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane, unsigned int protocol, Error **errp); void xen_block_dataplane_stop(XenBlockDataPlane *dataplane); +void xen_block_dataplane_attach(XenBlockDataPlane *dataplane); +void xen_block_dataplane_detach(XenBlockDataPlane *dataplane); #endif /* HW_BLOCK_DATAPLANE_XEN_BLOCK_H */ diff --git a/hw/block/dataplane/xen-block.c b/hw/block/dataplane/xen-block.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/dataplane/xen-block.c +++ b/hw/block/dataplane/xen-block.c @@ -XXX,XX +XXX,XX @@ void xen_block_dataplane_destroy(XenBlockDataPlane *dataplane) g_free(dataplane); } +void xen_block_dataplane_detach(XenBlockDataPlane *dataplane) +{ + if (!dataplane || !dataplane->event_channel) { + return; + } + + /* Only reason for failure is a NULL channel */ + xen_device_set_event_channel_context(dataplane->xendev, + dataplane->event_channel, + NULL, &error_abort); +} + +void xen_block_dataplane_attach(XenBlockDataPlane *dataplane) +{ + if (!dataplane || !dataplane->event_channel) { + return; + } + + /* Only reason for failure is a NULL channel */ + xen_device_set_event_channel_context(dataplane->xendev, + dataplane->event_channel, + dataplane->ctx, &error_abort); +} + void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) { XenDevice *xendev; @@ -XXX,XX +XXX,XX @@ void xen_block_dataplane_stop(XenBlockDataPlane *dataplane) xendev = dataplane->xendev; - aio_context_acquire(dataplane->ctx); - if (dataplane->event_channel) { - /* Only reason for failure is a NULL channel */ - xen_device_set_event_channel_context(xendev, dataplane->event_channel, - qemu_get_aio_context(), - &error_abort); + if (!blk_in_drain(dataplane->blk)) { + xen_block_dataplane_detach(dataplane); } + + aio_context_acquire(dataplane->ctx); /* Xen doesn't have multiple users for nodes, so this can't fail */ blk_set_aio_context(dataplane->blk, qemu_get_aio_context(), &error_abort); aio_context_release(dataplane->ctx); @@ -XXX,XX +XXX,XX @@ void xen_block_dataplane_start(XenBlockDataPlane *dataplane, blk_set_aio_context(dataplane->blk, dataplane->ctx, NULL); aio_context_release(old_context); - /* Only reason for failure is a NULL channel */ - aio_context_acquire(dataplane->ctx); - xen_device_set_event_channel_context(xendev, dataplane->event_channel, - dataplane->ctx, &error_abort); - aio_context_release(dataplane->ctx); + if (!blk_in_drain(dataplane->blk)) { + xen_block_dataplane_attach(dataplane); + } return; diff --git a/hw/block/xen-block.c b/hw/block/xen-block.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/xen-block.c +++ b/hw/block/xen-block.c @@ -XXX,XX +XXX,XX @@ static void xen_block_resize_cb(void *opaque) xen_device_backend_printf(xendev, "state", "%u", state); } +/* Suspend request handling */ +static void xen_block_drained_begin(void *opaque) +{ + XenBlockDevice *blockdev = opaque; + + xen_block_dataplane_detach(blockdev->dataplane); +} + +/* Resume request handling */ +static void xen_block_drained_end(void *opaque) +{ + XenBlockDevice *blockdev = opaque; + + xen_block_dataplane_attach(blockdev->dataplane); +} + static const BlockDevOps xen_block_dev_ops = { - .resize_cb = xen_block_resize_cb, + .resize_cb = xen_block_resize_cb, + .drained_begin = xen_block_drained_begin, + .drained_end = xen_block_drained_end, }; static void xen_block_realize(XenDevice *xendev, Error **errp) @@ -XXX,XX +XXX,XX @@ static void xen_block_realize(XenDevice *xendev, Error **errp) return; } - blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev); - if (conf->discard_granularity == -1) { conf->discard_granularity = conf->physical_block_size; } @@ -XXX,XX +XXX,XX @@ static void xen_block_realize(XenDevice *xendev, Error **errp) blockdev->dataplane = xen_block_dataplane_create(xendev, blk, conf->logical_block_size, blockdev->props.iothread); + + blk_set_dev_ops(blk, &xen_block_dev_ops, blockdev); } static void xen_block_frontend_changed(XenDevice *xendev, diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -XXX,XX +XXX,XX @@ void xen_device_set_event_channel_context(XenDevice *xendev, NULL, NULL, NULL, NULL, NULL); channel->ctx = ctx; - aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true, - xen_device_event, NULL, xen_device_poll, NULL, channel); + if (ctx) { + aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), + true, xen_device_event, NULL, xen_device_poll, NULL, + channel); + } } XenEventChannel *xen_device_bind_event_channel(XenDevice *xendev, -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> is_external=true suspends fd handlers between aio_disable_external() and aio_enable_external(). The block layer's drain operation uses this mechanism to prevent new I/O from sneaking in between bdrv_drained_begin() and bdrv_drained_end(). The previous commit converted the xen-block device to use BlockDevOps .drained_begin/end() callbacks. It no longer relies on is_external=true so it is safe to pass is_external=false. This is part of ongoing work to remove the aio_disable_external() API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-13-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/xen/xen-bus.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -XXX,XX +XXX,XX @@ void xen_device_set_event_channel_context(XenDevice *xendev, } if (channel->ctx) - aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true, + aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false, NULL, NULL, NULL, NULL, NULL); channel->ctx = ctx; if (ctx) { aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), - true, xen_device_event, NULL, xen_device_poll, NULL, - channel); + false, xen_device_event, NULL, xen_device_poll, + NULL, channel); } } @@ -XXX,XX +XXX,XX @@ void xen_device_unbind_event_channel(XenDevice *xendev, QLIST_REMOVE(channel, list); - aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), true, + aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false, NULL, NULL, NULL, NULL, NULL); if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) { -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> vduse_blk_detach_ctx() waits for in-flight requests using AIO_WAIT_WHILE(). This is not allowed according to a comment in bdrv_set_aio_context_commit(): /* * Take the old AioContex when detaching it from bs. * At this point, new_context lock is already acquired, and we are now * also taking old_context. This is safe as long as bdrv_detach_aio_context * does not call AIO_POLL_WHILE(). */ Use this opportunity to rewrite the drain code in vduse-blk: - Use the BlockExport refcount so that vduse_blk_exp_delete() is only called when there are no more requests in flight. - Implement .drained_poll() so in-flight request coroutines are stopped by the time .bdrv_detach_aio_context() is called. - Remove AIO_WAIT_WHILE() from vduse_blk_detach_ctx() to solve the .bdrv_detach_aio_context() constraint violation. It's no longer needed due to the previous changes. - Always handle the VDUSE file descriptor, even in drained sections. The VDUSE file descriptor doesn't submit I/O, so it's safe to handle it in drained sections. This ensures that the VDUSE kernel code gets a fast response. - Suspend virtqueue fd handlers in .drained_begin() and resume them in .drained_end(). This eliminates the need for the aio_set_fd_handler(is_external=true) flag, which is being removed from QEMU. This is a long list but splitting it into individual commits would probably lead to git bisect failures - the changes are all related. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-14-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/export/vduse-blk.c | 132 +++++++++++++++++++++++++++------------ 1 file changed, 93 insertions(+), 39 deletions(-) diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -XXX,XX +XXX,XX @@ typedef struct VduseBlkExport { VduseDev *dev; uint16_t num_queues; char *recon_file; - unsigned int inflight; + unsigned int inflight; /* atomic */ + bool vqs_started; } VduseBlkExport; typedef struct VduseBlkReq { @@ -XXX,XX +XXX,XX @@ typedef struct VduseBlkReq { static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp) { - vblk_exp->inflight++; + if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) { + /* Prevent export from being deleted */ + aio_context_acquire(vblk_exp->export.ctx); + blk_exp_ref(&vblk_exp->export); + aio_context_release(vblk_exp->export.ctx); + } } static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp) { - if (--vblk_exp->inflight == 0) { + if (qatomic_fetch_dec(&vblk_exp->inflight) == 1) { + /* Wake AIO_WAIT_WHILE() */ aio_wait_kick(); + + /* Now the export can be deleted */ + aio_context_acquire(vblk_exp->export.ctx); + blk_exp_unref(&vblk_exp->export); + aio_context_release(vblk_exp->export.ctx); } } @@ -XXX,XX +XXX,XX @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) { VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev); + if (!vblk_exp->vqs_started) { + return; /* vduse_blk_drained_end() will start vqs later */ + } + aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), - true, on_vduse_vq_kick, NULL, NULL, NULL, vq); + false, on_vduse_vq_kick, NULL, NULL, NULL, vq); /* Make sure we don't miss any kick afer reconnecting */ eventfd_write(vduse_queue_get_fd(vq), 1); } @@ -XXX,XX +XXX,XX @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) { VduseBlkExport *vblk_exp = vduse_dev_get_priv(dev); + int fd = vduse_queue_get_fd(vq); - aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), - true, NULL, NULL, NULL, NULL, NULL); + if (fd < 0) { + return; + } + + aio_set_fd_handler(vblk_exp->export.ctx, fd, false, + NULL, NULL, NULL, NULL, NULL); } static const VduseOps vduse_blk_ops = { @@ -XXX,XX +XXX,XX @@ static void on_vduse_dev_kick(void *opaque) static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx) { - int i; - aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), - true, on_vduse_dev_kick, NULL, NULL, NULL, + false, on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); - for (i = 0; i < vblk_exp->num_queues; i++) { - VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); - int fd = vduse_queue_get_fd(vq); - - if (fd < 0) { - continue; - } - aio_set_fd_handler(vblk_exp->export.ctx, fd, true, - on_vduse_vq_kick, NULL, NULL, NULL, vq); - } + /* Virtqueues are handled by vduse_blk_drained_end() */ } static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp) { - int i; - - for (i = 0; i < vblk_exp->num_queues; i++) { - VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); - int fd = vduse_queue_get_fd(vq); - - if (fd < 0) { - continue; - } - aio_set_fd_handler(vblk_exp->export.ctx, fd, - true, NULL, NULL, NULL, NULL, NULL); - } aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), - true, NULL, NULL, NULL, NULL, NULL); + false, NULL, NULL, NULL, NULL, NULL); - AIO_WAIT_WHILE(vblk_exp->export.ctx, vblk_exp->inflight > 0); + /* Virtqueues are handled by vduse_blk_drained_begin() */ } @@ -XXX,XX +XXX,XX @@ static void vduse_blk_resize(void *opaque) (char *)&config.capacity); } +static void vduse_blk_stop_virtqueues(VduseBlkExport *vblk_exp) +{ + for (uint16_t i = 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); + vduse_blk_disable_queue(vblk_exp->dev, vq); + } + + vblk_exp->vqs_started = false; +} + +static void vduse_blk_start_virtqueues(VduseBlkExport *vblk_exp) +{ + vblk_exp->vqs_started = true; + + for (uint16_t i = 0; i < vblk_exp->num_queues; i++) { + VduseVirtq *vq = vduse_dev_get_queue(vblk_exp->dev, i); + vduse_blk_enable_queue(vblk_exp->dev, vq); + } +} + +static void vduse_blk_drained_begin(void *opaque) +{ + BlockExport *exp = opaque; + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + + vduse_blk_stop_virtqueues(vblk_exp); +} + +static void vduse_blk_drained_end(void *opaque) +{ + BlockExport *exp = opaque; + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + + vduse_blk_start_virtqueues(vblk_exp); +} + +static bool vduse_blk_drained_poll(void *opaque) +{ + BlockExport *exp = opaque; + VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); + + return qatomic_read(&vblk_exp->inflight) > 0; +} + static const BlockDevOps vduse_block_ops = { - .resize_cb = vduse_blk_resize, + .resize_cb = vduse_blk_resize, + .drained_begin = vduse_blk_drained_begin, + .drained_end = vduse_blk_drained_end, + .drained_poll = vduse_blk_drained_poll, }; static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, @@ -XXX,XX +XXX,XX @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, vblk_exp->handler.serial = g_strdup(vblk_opts->serial ?: ""); vblk_exp->handler.logical_block_size = logical_block_size; vblk_exp->handler.writable = opts->writable; + vblk_exp->vqs_started = true; config.capacity = cpu_to_le64(blk_getlength(exp->blk) >> VIRTIO_BLK_SECTOR_BITS); @@ -XXX,XX +XXX,XX @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, vduse_dev_setup_queue(vblk_exp->dev, i, queue_size); } - aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), true, + aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false, on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, vblk_exp); - blk_set_dev_ops(exp->blk, &vduse_block_ops, exp); + /* + * We handle draining ourselves using an in-flight counter and by disabling + * virtqueue fd handlers. Do not queue BlockBackend requests, they need to + * complete so the in-flight counter reaches zero. + */ + blk_set_disable_request_queuing(exp->blk, true); + return 0; err: vduse_dev_destroy(vblk_exp->dev); @@ -XXX,XX +XXX,XX @@ static void vduse_blk_exp_delete(BlockExport *exp) VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); int ret; + assert(qatomic_read(&vblk_exp->inflight) == 0); + + vduse_blk_detach_ctx(vblk_exp); blk_remove_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, vblk_exp); ret = vduse_dev_destroy(vblk_exp->dev); @@ -XXX,XX +XXX,XX @@ static void vduse_blk_exp_delete(BlockExport *exp) g_free(vblk_exp->handler.serial); } +/* Called with exp->ctx acquired */ static void vduse_blk_exp_request_shutdown(BlockExport *exp) { VduseBlkExport *vblk_exp = container_of(exp, VduseBlkExport, export); - aio_context_acquire(vblk_exp->export.ctx); - vduse_blk_detach_ctx(vblk_exp); - aio_context_acquire(vblk_exp->export.ctx); + vduse_blk_stop_virtqueues(vblk_exp); } const BlockExportDriver blk_exp_vduse_blk = { -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> The FUSE export calls blk_exp_ref/unref() without the AioContext lock. Instead of fixing the FUSE export, adjust blk_exp_ref/unref() so they work without the AioContext lock. This way it's less error-prone. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-15-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/export.h | 2 ++ block/export/export.c | 13 ++++++------- block/export/vduse-blk.c | 4 ---- 3 files changed, 8 insertions(+), 11 deletions(-) diff --git a/include/block/export.h b/include/block/export.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/export.h +++ b/include/block/export.h @@ -XXX,XX +XXX,XX @@ struct BlockExport { * Reference count for this block export. This includes strong references * both from the owner (qemu-nbd or the monitor) and clients connected to * the export. + * + * Use atomics to access this field. */ int refcount; diff --git a/block/export/export.c b/block/export/export.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/export.c +++ b/block/export/export.c @@ -XXX,XX +XXX,XX @@ fail: return NULL; } -/* Callers must hold exp->ctx lock */ void blk_exp_ref(BlockExport *exp) { - assert(exp->refcount > 0); - exp->refcount++; + assert(qatomic_read(&exp->refcount) > 0); + qatomic_inc(&exp->refcount); } /* Runs in the main thread */ @@ -XXX,XX +XXX,XX @@ static void blk_exp_delete_bh(void *opaque) aio_context_release(aio_context); } -/* Callers must hold exp->ctx lock */ void blk_exp_unref(BlockExport *exp) { - assert(exp->refcount > 0); - if (--exp->refcount == 0) { + assert(qatomic_read(&exp->refcount) > 0); + if (qatomic_fetch_dec(&exp->refcount) == 1) { /* Touch the block_exports list only in the main thread */ aio_bh_schedule_oneshot(qemu_get_aio_context(), blk_exp_delete_bh, exp); @@ -XXX,XX +XXX,XX @@ void qmp_block_export_del(const char *id, if (!has_mode) { mode = BLOCK_EXPORT_REMOVE_MODE_SAFE; } - if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE && exp->refcount > 1) { + if (mode == BLOCK_EXPORT_REMOVE_MODE_SAFE && + qatomic_read(&exp->refcount) > 1) { error_setg(errp, "export '%s' still in use", exp->id); error_append_hint(errp, "Use mode='hard' to force client " "disconnect\n"); diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -XXX,XX +XXX,XX @@ static void vduse_blk_inflight_inc(VduseBlkExport *vblk_exp) { if (qatomic_fetch_inc(&vblk_exp->inflight) == 0) { /* Prevent export from being deleted */ - aio_context_acquire(vblk_exp->export.ctx); blk_exp_ref(&vblk_exp->export); - aio_context_release(vblk_exp->export.ctx); } } @@ -XXX,XX +XXX,XX @@ static void vduse_blk_inflight_dec(VduseBlkExport *vblk_exp) aio_wait_kick(); /* Now the export can be deleted */ - aio_context_acquire(vblk_exp->export.ctx); blk_exp_unref(&vblk_exp->export); - aio_context_release(vblk_exp->export.ctx); } } -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> This is part of ongoing work to remove the aio_disable_external() API. Use BlockDevOps .drained_begin/end/poll() instead of aio_set_fd_handler(is_external=true). As a side-effect the FUSE export now follows AioContext changes like the other export types. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20230516190238.8401-16-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/export/fuse.c | 56 +++++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 54 insertions(+), 2 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -XXX,XX +XXX,XX @@ typedef struct FuseExport { struct fuse_session *fuse_session; struct fuse_buf fuse_buf; + unsigned int in_flight; /* atomic */ bool mounted, fd_handler_set_up; char *mountpoint; @@ -XXX,XX +XXX,XX @@ static void read_from_fuse_export(void *opaque); static bool is_regular_file(const char *path, Error **errp); +static void fuse_export_drained_begin(void *opaque) +{ + FuseExport *exp = opaque; + + aio_set_fd_handler(exp->common.ctx, + fuse_session_fd(exp->fuse_session), false, + NULL, NULL, NULL, NULL, NULL); + exp->fd_handler_set_up = false; +} + +static void fuse_export_drained_end(void *opaque) +{ + FuseExport *exp = opaque; + + /* Refresh AioContext in case it changed */ + exp->common.ctx = blk_get_aio_context(exp->common.blk); + + aio_set_fd_handler(exp->common.ctx, + fuse_session_fd(exp->fuse_session), false, + read_from_fuse_export, NULL, NULL, NULL, exp); + exp->fd_handler_set_up = true; +} + +static bool fuse_export_drained_poll(void *opaque) +{ + FuseExport *exp = opaque; + + return qatomic_read(&exp->in_flight) > 0; +} + +static const BlockDevOps fuse_export_blk_dev_ops = { + .drained_begin = fuse_export_drained_begin, + .drained_end = fuse_export_drained_end, + .drained_poll = fuse_export_drained_poll, +}; + static int fuse_export_create(BlockExport *blk_exp, BlockExportOptions *blk_exp_args, Error **errp) @@ -XXX,XX +XXX,XX @@ static int fuse_export_create(BlockExport *blk_exp, } } + blk_set_dev_ops(exp->common.blk, &fuse_export_blk_dev_ops, exp); + + /* + * We handle draining ourselves using an in-flight counter and by disabling + * the FUSE fd handler. Do not queue BlockBackend requests, they need to + * complete so the in-flight counter reaches zero. + */ + blk_set_disable_request_queuing(exp->common.blk, true); + init_exports_table(); /* @@ -XXX,XX +XXX,XX @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint, g_hash_table_insert(exports, g_strdup(mountpoint), NULL); aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), true, + fuse_session_fd(exp->fuse_session), false, read_from_fuse_export, NULL, NULL, NULL, exp); exp->fd_handler_set_up = true; @@ -XXX,XX +XXX,XX @@ static void read_from_fuse_export(void *opaque) blk_exp_ref(&exp->common); + qatomic_inc(&exp->in_flight); + do { ret = fuse_session_receive_buf(exp->fuse_session, &exp->fuse_buf); } while (ret == -EINTR); @@ -XXX,XX +XXX,XX @@ static void read_from_fuse_export(void *opaque) fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf); out: + if (qatomic_fetch_dec(&exp->in_flight) == 1) { + aio_wait_kick(); /* wake AIO_WAIT_WHILE() */ + } + blk_exp_unref(&exp->common); } @@ -XXX,XX +XXX,XX @@ static void fuse_export_shutdown(BlockExport *blk_exp) if (exp->fd_handler_set_up) { aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), true, + fuse_session_fd(exp->fuse_session), false, NULL, NULL, NULL, NULL, NULL); exp->fd_handler_set_up = false; } -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> virtio_queue_aio_detach_host_notifier() does two things: 1. It removes the fd handler from the event loop. 2. It processes the virtqueue one last time. The first step can be peformed by any thread and without taking the AioContext lock. The second step may need the AioContext lock (depending on the device implementation) and runs in the thread where request processing takes place. virtio-blk and virtio-scsi therefore call virtio_queue_aio_detach_host_notifier() from a BH that is scheduled in AioContext. The next patch will introduce a .drained_begin() function that needs to call virtio_queue_aio_detach_host_notifier(). .drained_begin() functions cannot call aio_poll() to wait synchronously for the BH. It is possible for a .drained_poll() callback to asynchronously wait for the BH, but that is more complex than necessary here. Move the virtqueue processing out to the callers of virtio_queue_aio_detach_host_notifier() so that the function can be called from any thread. This is in preparation for the next patch. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-17-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/block/dataplane/virtio-blk.c | 7 +++++++ hw/scsi/virtio-scsi-dataplane.c | 14 ++++++++++++++ hw/virtio/virtio.c | 3 --- 3 files changed, 21 insertions(+), 3 deletions(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -XXX,XX +XXX,XX @@ static void virtio_blk_data_plane_stop_bh(void *opaque) for (i = 0; i < s->conf->num_queues; i++) { VirtQueue *vq = virtio_get_queue(s->vdev, i); + EventNotifier *host_notifier = virtio_queue_get_host_notifier(vq); virtio_queue_aio_detach_host_notifier(vq, s->ctx); + + /* + * Test and clear notifier after disabling event, in case poll callback + * didn't have time to run. + */ + virtio_queue_host_notifier_read(host_notifier); } } diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_dataplane_stop_bh(void *opaque) { VirtIOSCSI *s = opaque; VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(s); + EventNotifier *host_notifier; int i; virtio_queue_aio_detach_host_notifier(vs->ctrl_vq, s->ctx); + host_notifier = virtio_queue_get_host_notifier(vs->ctrl_vq); + + /* + * Test and clear notifier after disabling event, in case poll callback + * didn't have time to run. + */ + virtio_queue_host_notifier_read(host_notifier); + virtio_queue_aio_detach_host_notifier(vs->event_vq, s->ctx); + host_notifier = virtio_queue_get_host_notifier(vs->event_vq); + virtio_queue_host_notifier_read(host_notifier); + for (i = 0; i < vs->conf.num_queues; i++) { virtio_queue_aio_detach_host_notifier(vs->cmd_vqs[i], s->ctx); + host_notifier = virtio_queue_get_host_notifier(vs->cmd_vqs[i]); + virtio_queue_host_notifier_read(host_notifier); } } diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index XXXXXXX..XXXXXXX 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -XXX,XX +XXX,XX @@ void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ct void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx) { aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL); - /* Test and clear notifier before after disabling event, - * in case poll callback didn't have time to run. */ - virtio_queue_host_notifier_read(&vq->host_notifier); } void virtio_queue_host_notifier_read(EventNotifier *n) -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Detach ioeventfds during drained sections to stop I/O submission from the guest. virtio-blk is no longer reliant on aio_disable_external() after this patch. This will allow us to remove the aio_disable_external() API once all other code that relies on it is converted. Take extra care to avoid attaching/detaching ioeventfds if the data plane is started/stopped during a drained section. This should be rare, but maybe the mirror block job can trigger it. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-18-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/block/dataplane/virtio-blk.c | 16 ++++++++------ hw/block/virtio-blk.c | 38 ++++++++++++++++++++++++++++++++- 2 files changed, 47 insertions(+), 7 deletions(-) diff --git a/hw/block/dataplane/virtio-blk.c b/hw/block/dataplane/virtio-blk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/dataplane/virtio-blk.c +++ b/hw/block/dataplane/virtio-blk.c @@ -XXX,XX +XXX,XX @@ int virtio_blk_data_plane_start(VirtIODevice *vdev) } /* Get this show started by hooking up our callbacks */ - aio_context_acquire(s->ctx); - for (i = 0; i < nvqs; i++) { - VirtQueue *vq = virtio_get_queue(s->vdev, i); + if (!blk_in_drain(s->conf->conf.blk)) { + aio_context_acquire(s->ctx); + for (i = 0; i < nvqs; i++) { + VirtQueue *vq = virtio_get_queue(s->vdev, i); - virtio_queue_aio_attach_host_notifier(vq, s->ctx); + virtio_queue_aio_attach_host_notifier(vq, s->ctx); + } + aio_context_release(s->ctx); } - aio_context_release(s->ctx); return 0; fail_aio_context: @@ -XXX,XX +XXX,XX @@ void virtio_blk_data_plane_stop(VirtIODevice *vdev) s->stopping = true; trace_virtio_blk_data_plane_stop(s); - aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s); + if (!blk_in_drain(s->conf->conf.blk)) { + aio_wait_bh_oneshot(s->ctx, virtio_blk_data_plane_stop_bh, s); + } aio_context_acquire(s->ctx); diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -XXX,XX +XXX,XX @@ static void virtio_blk_resize(void *opaque) aio_bh_schedule_oneshot(qemu_get_aio_context(), virtio_resize_cb, vdev); } +/* Suspend virtqueue ioeventfd processing during drain */ +static void virtio_blk_drained_begin(void *opaque) +{ + VirtIOBlock *s = opaque; + VirtIODevice *vdev = VIRTIO_DEVICE(opaque); + AioContext *ctx = blk_get_aio_context(s->conf.conf.blk); + + if (!s->dataplane || !s->dataplane_started) { + return; + } + + for (uint16_t i = 0; i < s->conf.num_queues; i++) { + VirtQueue *vq = virtio_get_queue(vdev, i); + virtio_queue_aio_detach_host_notifier(vq, ctx); + } +} + +/* Resume virtqueue ioeventfd processing after drain */ +static void virtio_blk_drained_end(void *opaque) +{ + VirtIOBlock *s = opaque; + VirtIODevice *vdev = VIRTIO_DEVICE(opaque); + AioContext *ctx = blk_get_aio_context(s->conf.conf.blk); + + if (!s->dataplane || !s->dataplane_started) { + return; + } + + for (uint16_t i = 0; i < s->conf.num_queues; i++) { + VirtQueue *vq = virtio_get_queue(vdev, i); + virtio_queue_aio_attach_host_notifier(vq, ctx); + } +} + static const BlockDevOps virtio_block_ops = { - .resize_cb = virtio_blk_resize, + .resize_cb = virtio_blk_resize, + .drained_begin = virtio_blk_drained_begin, + .drained_end = virtio_blk_drained_end, }; static void virtio_blk_device_realize(DeviceState *dev, Error **errp) -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> The virtio-scsi Host Bus Adapter provides access to devices on a SCSI bus. Those SCSI devices typically have a BlockBackend. When the BlockBackend enters a drained section, the SCSI device must temporarily stop submitting new I/O requests. Implement this behavior by temporarily stopping virtio-scsi virtqueue processing when one of the SCSI devices enters a drained section. The new scsi_device_drained_begin() API allows scsi-disk to message the virtio-scsi HBA. scsi_device_drained_begin() uses a drain counter so that multiple SCSI devices can have overlapping drained sections. The HBA only sees one pair of .drained_begin/end() calls. After this commit, virtio-scsi no longer depends on hw/virtio's ioeventfd aio_set_event_notifier(is_external=true). This commit is a step towards removing the aio_disable_external() API. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-19-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/hw/scsi/scsi.h | 14 ++++++++++++ hw/scsi/scsi-bus.c | 40 +++++++++++++++++++++++++++++++++ hw/scsi/scsi-disk.c | 27 +++++++++++++++++----- hw/scsi/virtio-scsi-dataplane.c | 18 +++++++++------ hw/scsi/virtio-scsi.c | 38 +++++++++++++++++++++++++++++++ hw/scsi/trace-events | 2 ++ 6 files changed, 127 insertions(+), 12 deletions(-) diff --git a/include/hw/scsi/scsi.h b/include/hw/scsi/scsi.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/scsi/scsi.h +++ b/include/hw/scsi/scsi.h @@ -XXX,XX +XXX,XX @@ struct SCSIBusInfo { void (*save_request)(QEMUFile *f, SCSIRequest *req); void *(*load_request)(QEMUFile *f, SCSIRequest *req); void (*free_request)(SCSIBus *bus, void *priv); + + /* + * Temporarily stop submitting new requests between drained_begin() and + * drained_end(). Called from the main loop thread with the BQL held. + * + * Implement these callbacks if request processing is triggered by a file + * descriptor like an EventNotifier. Otherwise set them to NULL. + */ + void (*drained_begin)(SCSIBus *bus); + void (*drained_end)(SCSIBus *bus); }; #define TYPE_SCSI_BUS "SCSI" @@ -XXX,XX +XXX,XX @@ struct SCSIBus { SCSISense unit_attention; const SCSIBusInfo *info; + + int drain_count; /* protected by BQL */ }; /** @@ -XXX,XX +XXX,XX @@ void scsi_req_cancel_complete(SCSIRequest *req); void scsi_req_cancel(SCSIRequest *req); void scsi_req_cancel_async(SCSIRequest *req, Notifier *notifier); void scsi_req_retry(SCSIRequest *req); +void scsi_device_drained_begin(SCSIDevice *sdev); +void scsi_device_drained_end(SCSIDevice *sdev); void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense); void scsi_device_set_ua(SCSIDevice *sdev, SCSISense sense); void scsi_device_report_change(SCSIDevice *dev, SCSISense sense); diff --git a/hw/scsi/scsi-bus.c b/hw/scsi/scsi-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/scsi-bus.c +++ b/hw/scsi/scsi-bus.c @@ -XXX,XX +XXX,XX @@ void scsi_device_purge_requests(SCSIDevice *sdev, SCSISense sense) scsi_device_set_ua(sdev, sense); } +void scsi_device_drained_begin(SCSIDevice *sdev) +{ + SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus); + if (!bus) { + return; + } + + assert(qemu_get_current_aio_context() == qemu_get_aio_context()); + assert(bus->drain_count < INT_MAX); + + /* + * Multiple BlockBackends can be on a SCSIBus and each may begin/end + * draining at any time. Keep a counter so HBAs only see begin/end once. + */ + if (bus->drain_count++ == 0) { + trace_scsi_bus_drained_begin(bus, sdev); + if (bus->info->drained_begin) { + bus->info->drained_begin(bus); + } + } +} + +void scsi_device_drained_end(SCSIDevice *sdev) +{ + SCSIBus *bus = DO_UPCAST(SCSIBus, qbus, sdev->qdev.parent_bus); + if (!bus) { + return; + } + + assert(qemu_get_current_aio_context() == qemu_get_aio_context()); + assert(bus->drain_count > 0); + + if (bus->drain_count-- == 1) { + trace_scsi_bus_drained_end(bus, sdev); + if (bus->info->drained_end) { + bus->info->drained_end(bus); + } + } +} + static char *scsibus_get_dev_path(DeviceState *dev) { SCSIDevice *d = SCSI_DEVICE(dev); diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -XXX,XX +XXX,XX @@ static void scsi_disk_reset(DeviceState *dev) s->qdev.scsi_version = s->qdev.default_scsi_version; } +static void scsi_disk_drained_begin(void *opaque) +{ + SCSIDiskState *s = opaque; + + scsi_device_drained_begin(&s->qdev); +} + +static void scsi_disk_drained_end(void *opaque) +{ + SCSIDiskState *s = opaque; + + scsi_device_drained_end(&s->qdev); +} + static void scsi_disk_resize_cb(void *opaque) { SCSIDiskState *s = opaque; @@ -XXX,XX +XXX,XX @@ static bool scsi_cd_is_medium_locked(void *opaque) } static const BlockDevOps scsi_disk_removable_block_ops = { - .change_media_cb = scsi_cd_change_media_cb, + .change_media_cb = scsi_cd_change_media_cb, + .drained_begin = scsi_disk_drained_begin, + .drained_end = scsi_disk_drained_end, .eject_request_cb = scsi_cd_eject_request_cb, - .is_tray_open = scsi_cd_is_tray_open, .is_medium_locked = scsi_cd_is_medium_locked, - - .resize_cb = scsi_disk_resize_cb, + .is_tray_open = scsi_cd_is_tray_open, + .resize_cb = scsi_disk_resize_cb, }; static const BlockDevOps scsi_disk_block_ops = { - .resize_cb = scsi_disk_resize_cb, + .drained_begin = scsi_disk_drained_begin, + .drained_end = scsi_disk_drained_end, + .resize_cb = scsi_disk_resize_cb, }; static void scsi_disk_unit_attention_reported(SCSIDevice *dev) diff --git a/hw/scsi/virtio-scsi-dataplane.c b/hw/scsi/virtio-scsi-dataplane.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/virtio-scsi-dataplane.c +++ b/hw/scsi/virtio-scsi-dataplane.c @@ -XXX,XX +XXX,XX @@ int virtio_scsi_dataplane_start(VirtIODevice *vdev) s->dataplane_starting = false; s->dataplane_started = true; - aio_context_acquire(s->ctx); - virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx); - virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx); + if (s->bus.drain_count == 0) { + aio_context_acquire(s->ctx); + virtio_queue_aio_attach_host_notifier(vs->ctrl_vq, s->ctx); + virtio_queue_aio_attach_host_notifier_no_poll(vs->event_vq, s->ctx); - for (i = 0; i < vs->conf.num_queues; i++) { - virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx); + for (i = 0; i < vs->conf.num_queues; i++) { + virtio_queue_aio_attach_host_notifier(vs->cmd_vqs[i], s->ctx); + } + aio_context_release(s->ctx); } - aio_context_release(s->ctx); return 0; fail_host_notifiers: @@ -XXX,XX +XXX,XX @@ void virtio_scsi_dataplane_stop(VirtIODevice *vdev) } s->dataplane_stopping = true; - aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s); + if (s->bus.drain_count == 0) { + aio_wait_bh_oneshot(s->ctx, virtio_scsi_dataplane_stop_bh, s); + } blk_drain_all(); /* ensure there are no in-flight requests */ diff --git a/hw/scsi/virtio-scsi.c b/hw/scsi/virtio-scsi.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/virtio-scsi.c +++ b/hw/scsi/virtio-scsi.c @@ -XXX,XX +XXX,XX @@ static void virtio_scsi_hotunplug(HotplugHandler *hotplug_dev, DeviceState *dev, } } +/* Suspend virtqueue ioeventfd processing during drain */ +static void virtio_scsi_drained_begin(SCSIBus *bus) +{ + VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus); + VirtIODevice *vdev = VIRTIO_DEVICE(s); + uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED + + s->parent_obj.conf.num_queues; + + if (!s->dataplane_started) { + return; + } + + for (uint32_t i = 0; i < total_queues; i++) { + VirtQueue *vq = virtio_get_queue(vdev, i); + virtio_queue_aio_detach_host_notifier(vq, s->ctx); + } +} + +/* Resume virtqueue ioeventfd processing after drain */ +static void virtio_scsi_drained_end(SCSIBus *bus) +{ + VirtIOSCSI *s = container_of(bus, VirtIOSCSI, bus); + VirtIODevice *vdev = VIRTIO_DEVICE(s); + uint32_t total_queues = VIRTIO_SCSI_VQ_NUM_FIXED + + s->parent_obj.conf.num_queues; + + if (!s->dataplane_started) { + return; + } + + for (uint32_t i = 0; i < total_queues; i++) { + VirtQueue *vq = virtio_get_queue(vdev, i); + virtio_queue_aio_attach_host_notifier(vq, s->ctx); + } +} + static struct SCSIBusInfo virtio_scsi_scsi_info = { .tcq = true, .max_channel = VIRTIO_SCSI_MAX_CHANNEL, @@ -XXX,XX +XXX,XX @@ static struct SCSIBusInfo virtio_scsi_scsi_info = { .get_sg_list = virtio_scsi_get_sg_list, .save_request = virtio_scsi_save_request, .load_request = virtio_scsi_load_request, + .drained_begin = virtio_scsi_drained_begin, + .drained_end = virtio_scsi_drained_end, }; void virtio_scsi_common_realize(DeviceState *dev, diff --git a/hw/scsi/trace-events b/hw/scsi/trace-events index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/trace-events +++ b/hw/scsi/trace-events @@ -XXX,XX +XXX,XX @@ scsi_req_cancel(int target, int lun, int tag) "target %d lun %d tag %d" scsi_req_data(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d" scsi_req_data_canceled(int target, int lun, int tag, int len) "target %d lun %d tag %d len %d" scsi_req_dequeue(int target, int lun, int tag) "target %d lun %d tag %d" +scsi_bus_drained_begin(void *bus, void *sdev) "bus %p sdev %p" +scsi_bus_drained_end(void *bus, void *sdev) "bus %p sdev %p" scsi_req_continue(int target, int lun, int tag) "target %d lun %d tag %d" scsi_req_continue_canceled(int target, int lun, int tag) "target %d lun %d tag %d" scsi_req_parsed(int target, int lun, int tag, int cmd, int mode, int xfer) "target %d lun %d tag %d command %d dir %d length %d" -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> Host notifiers can now use is_external=false since virtio-blk and virtio-scsi no longer rely on is_external=true for drained sections. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-20-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- hw/virtio/virtio.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index XXXXXXX..XXXXXXX 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -XXX,XX +XXX,XX @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n) void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, true, + aio_set_event_notifier(ctx, &vq->host_notifier, false, virtio_queue_host_notifier_read, virtio_queue_host_notifier_aio_poll, virtio_queue_host_notifier_aio_poll_ready); @@ -XXX,XX +XXX,XX @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx) */ void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, true, + aio_set_event_notifier(ctx, &vq->host_notifier, false, virtio_queue_host_notifier_read, NULL, NULL); } void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, true, NULL, NULL, NULL); + aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL); } void virtio_queue_host_notifier_read(EventNotifier *n) -- 2.40.1
From: Stefan Hajnoczi <stefanha@redhat.com> All callers now pass is_external=false to aio_set_fd_handler() and aio_set_event_notifier(). The aio_disable_external() API that temporarily disables fd handlers that were registered is_external=true is therefore dead code. Remove aio_disable_external(), aio_enable_external(), and the is_external arguments to aio_set_fd_handler() and aio_set_event_notifier(). The entire test-fdmon-epoll test is removed because its sole purpose was testing aio_disable_external(). Parts of this patch were generated using the following coccinelle (https://coccinelle.lip6.fr/) semantic patch: @@ expression ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque; @@ - aio_set_fd_handler(ctx, fd, is_external, io_read, io_write, io_poll, io_poll_ready, opaque) + aio_set_fd_handler(ctx, fd, io_read, io_write, io_poll, io_poll_ready, opaque) @@ expression ctx, notifier, is_external, io_read, io_poll, io_poll_ready; @@ - aio_set_event_notifier(ctx, notifier, is_external, io_read, io_poll, io_poll_ready) + aio_set_event_notifier(ctx, notifier, io_read, io_poll, io_poll_ready) Reviewed-by: Juan Quintela <quintela@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-Id: <20230516190238.8401-21-stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/aio.h | 57 ------------------------ util/aio-posix.h | 1 - block.c | 7 --- block/blkio.c | 15 +++---- block/curl.c | 10 ++--- block/export/fuse.c | 8 ++-- block/export/vduse-blk.c | 10 ++--- block/io.c | 2 - block/io_uring.c | 4 +- block/iscsi.c | 3 +- block/linux-aio.c | 4 +- block/nfs.c | 5 +-- block/nvme.c | 8 ++-- block/ssh.c | 4 +- block/win32-aio.c | 6 +-- hw/i386/kvm/xen_xenstore.c | 2 +- hw/virtio/virtio.c | 6 +-- hw/xen/xen-bus.c | 8 ++-- io/channel-command.c | 6 +-- io/channel-file.c | 3 +- io/channel-socket.c | 3 +- migration/rdma.c | 16 +++---- tests/unit/test-aio.c | 27 +----------- tests/unit/test-bdrv-drain.c | 1 - tests/unit/test-fdmon-epoll.c | 73 ------------------------------- tests/unit/test-nested-aio-poll.c | 9 ++-- util/aio-posix.c | 20 +++------ util/aio-win32.c | 8 +--- util/async.c | 3 +- util/fdmon-epoll.c | 10 ----- util/fdmon-io_uring.c | 8 +--- util/fdmon-poll.c | 3 +- util/main-loop.c | 7 ++- util/qemu-coroutine-io.c | 7 ++- util/vhost-user-server.c | 11 +++-- tests/unit/meson.build | 3 -- 36 files changed, 80 insertions(+), 298 deletions(-) delete mode 100644 tests/unit/test-fdmon-epoll.c diff --git a/include/block/aio.h b/include/block/aio.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -XXX,XX +XXX,XX @@ struct AioContext { */ QEMUTimerListGroup tlg; - int external_disable_cnt; - /* Number of AioHandlers without .io_poll() */ int poll_disable_cnt; @@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking); */ void aio_set_fd_handler(AioContext *ctx, int fd, - bool is_external, IOHandler *io_read, IOHandler *io_write, AioPollFn *io_poll, @@ -XXX,XX +XXX,XX @@ void aio_set_fd_handler(AioContext *ctx, */ void aio_set_event_notifier(AioContext *ctx, EventNotifier *notifier, - bool is_external, EventNotifierHandler *io_read, AioPollFn *io_poll, EventNotifierHandler *io_poll_ready); @@ -XXX,XX +XXX,XX @@ static inline void aio_timer_init(AioContext *ctx, */ int64_t aio_compute_timeout(AioContext *ctx); -/** - * aio_disable_external: - * @ctx: the aio context - * - * Disable the further processing of external clients. - */ -static inline void aio_disable_external(AioContext *ctx) -{ - qatomic_inc(&ctx->external_disable_cnt); -} - -/** - * aio_enable_external: - * @ctx: the aio context - * - * Enable the processing of external clients. - */ -static inline void aio_enable_external(AioContext *ctx) -{ - int old; - - old = qatomic_fetch_dec(&ctx->external_disable_cnt); - assert(old > 0); - if (old == 1) { - /* Kick event loop so it re-arms file descriptors */ - aio_notify(ctx); - } -} - -/** - * aio_external_disabled: - * @ctx: the aio context - * - * Return true if the external clients are disabled. - */ -static inline bool aio_external_disabled(AioContext *ctx) -{ - return qatomic_read(&ctx->external_disable_cnt); -} - -/** - * aio_node_check: - * @ctx: the aio context - * @is_external: Whether or not the checked node is an external event source. - * - * Check if the node's is_external flag is okay to be polled by the ctx at this - * moment. True means green light. - */ -static inline bool aio_node_check(AioContext *ctx, bool is_external) -{ - return !is_external || !qatomic_read(&ctx->external_disable_cnt); -} - /** * aio_co_schedule: * @ctx: the aio context diff --git a/util/aio-posix.h b/util/aio-posix.h index XXXXXXX..XXXXXXX 100644 --- a/util/aio-posix.h +++ b/util/aio-posix.h @@ -XXX,XX +XXX,XX @@ struct AioHandler { #endif int64_t poll_idle_timeout; /* when to stop userspace polling */ bool poll_ready; /* has polling detected an event? */ - bool is_external; }; /* Add a handler to a ready list */ diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ static void bdrv_detach_aio_context(BlockDriverState *bs) bs->drv->bdrv_detach_aio_context(bs); } - if (bs->quiesce_counter) { - aio_enable_external(bs->aio_context); - } bs->aio_context = NULL; } @@ -XXX,XX +XXX,XX @@ static void bdrv_attach_aio_context(BlockDriverState *bs, BdrvAioNotifier *ban, *ban_tmp; GLOBAL_STATE_CODE(); - if (bs->quiesce_counter) { - aio_disable_external(new_context); - } - bs->aio_context = new_context; if (bs->drv && bs->drv->bdrv_attach_aio_context) { diff --git a/block/blkio.c b/block/blkio.c index XXXXXXX..XXXXXXX 100644 --- a/block/blkio.c +++ b/block/blkio.c @@ -XXX,XX +XXX,XX @@ static void blkio_attach_aio_context(BlockDriverState *bs, { BDRVBlkioState *s = bs->opaque; - aio_set_fd_handler(new_context, - s->completion_fd, - false, - blkio_completion_fd_read, - NULL, + aio_set_fd_handler(new_context, s->completion_fd, + blkio_completion_fd_read, NULL, blkio_completion_fd_poll, - blkio_completion_fd_poll_ready, - bs); + blkio_completion_fd_poll_ready, bs); } static void blkio_detach_aio_context(BlockDriverState *bs) { BDRVBlkioState *s = bs->opaque; - aio_set_fd_handler(bdrv_get_aio_context(bs), - s->completion_fd, - false, NULL, NULL, NULL, NULL, NULL); + aio_set_fd_handler(bdrv_get_aio_context(bs), s->completion_fd, NULL, NULL, + NULL, NULL, NULL); } /* Call with s->blkio_lock held to submit I/O after enqueuing a new request */ diff --git a/block/curl.c b/block/curl.c index XXXXXXX..XXXXXXX 100644 --- a/block/curl.c +++ b/block/curl.c @@ -XXX,XX +XXX,XX @@ static gboolean curl_drop_socket(void *key, void *value, void *opaque) CURLSocket *socket = value; BDRVCURLState *s = socket->s; - aio_set_fd_handler(s->aio_context, socket->fd, false, + aio_set_fd_handler(s->aio_context, socket->fd, NULL, NULL, NULL, NULL, NULL); return true; } @@ -XXX,XX +XXX,XX @@ static int curl_sock_cb(CURL *curl, curl_socket_t fd, int action, trace_curl_sock_cb(action, (int)fd); switch (action) { case CURL_POLL_IN: - aio_set_fd_handler(s->aio_context, fd, false, + aio_set_fd_handler(s->aio_context, fd, curl_multi_do, NULL, NULL, NULL, socket); break; case CURL_POLL_OUT: - aio_set_fd_handler(s->aio_context, fd, false, + aio_set_fd_handler(s->aio_context, fd, NULL, curl_multi_do, NULL, NULL, socket); break; case CURL_POLL_INOUT: - aio_set_fd_handler(s->aio_context, fd, false, + aio_set_fd_handler(s->aio_context, fd, curl_multi_do, curl_multi_do, NULL, NULL, socket); break; case CURL_POLL_REMOVE: - aio_set_fd_handler(s->aio_context, fd, false, + aio_set_fd_handler(s->aio_context, fd, NULL, NULL, NULL, NULL, NULL); break; } diff --git a/block/export/fuse.c b/block/export/fuse.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -XXX,XX +XXX,XX @@ static void fuse_export_drained_begin(void *opaque) FuseExport *exp = opaque; aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), false, + fuse_session_fd(exp->fuse_session), NULL, NULL, NULL, NULL, NULL); exp->fd_handler_set_up = false; } @@ -XXX,XX +XXX,XX @@ static void fuse_export_drained_end(void *opaque) exp->common.ctx = blk_get_aio_context(exp->common.blk); aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), false, + fuse_session_fd(exp->fuse_session), read_from_fuse_export, NULL, NULL, NULL, exp); exp->fd_handler_set_up = true; } @@ -XXX,XX +XXX,XX @@ static int setup_fuse_export(FuseExport *exp, const char *mountpoint, g_hash_table_insert(exports, g_strdup(mountpoint), NULL); aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), false, + fuse_session_fd(exp->fuse_session), read_from_fuse_export, NULL, NULL, NULL, exp); exp->fd_handler_set_up = true; @@ -XXX,XX +XXX,XX @@ static void fuse_export_shutdown(BlockExport *blk_exp) if (exp->fd_handler_set_up) { aio_set_fd_handler(exp->common.ctx, - fuse_session_fd(exp->fuse_session), false, + fuse_session_fd(exp->fuse_session), NULL, NULL, NULL, NULL, NULL); exp->fd_handler_set_up = false; } diff --git a/block/export/vduse-blk.c b/block/export/vduse-blk.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vduse-blk.c +++ b/block/export/vduse-blk.c @@ -XXX,XX +XXX,XX @@ static void vduse_blk_enable_queue(VduseDev *dev, VduseVirtq *vq) } aio_set_fd_handler(vblk_exp->export.ctx, vduse_queue_get_fd(vq), - false, on_vduse_vq_kick, NULL, NULL, NULL, vq); + on_vduse_vq_kick, NULL, NULL, NULL, vq); /* Make sure we don't miss any kick afer reconnecting */ eventfd_write(vduse_queue_get_fd(vq), 1); } @@ -XXX,XX +XXX,XX @@ static void vduse_blk_disable_queue(VduseDev *dev, VduseVirtq *vq) return; } - aio_set_fd_handler(vblk_exp->export.ctx, fd, false, + aio_set_fd_handler(vblk_exp->export.ctx, fd, NULL, NULL, NULL, NULL, NULL); } @@ -XXX,XX +XXX,XX @@ static void on_vduse_dev_kick(void *opaque) static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx) { aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), - false, on_vduse_dev_kick, NULL, NULL, NULL, + on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); /* Virtqueues are handled by vduse_blk_drained_end() */ @@ -XXX,XX +XXX,XX @@ static void vduse_blk_attach_ctx(VduseBlkExport *vblk_exp, AioContext *ctx) static void vduse_blk_detach_ctx(VduseBlkExport *vblk_exp) { aio_set_fd_handler(vblk_exp->export.ctx, vduse_dev_get_fd(vblk_exp->dev), - false, NULL, NULL, NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); /* Virtqueues are handled by vduse_blk_drained_begin() */ } @@ -XXX,XX +XXX,XX @@ static int vduse_blk_exp_create(BlockExport *exp, BlockExportOptions *opts, vduse_dev_setup_queue(vblk_exp->dev, i, queue_size); } - aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), false, + aio_set_fd_handler(exp->ctx, vduse_dev_get_fd(vblk_exp->dev), on_vduse_dev_kick, NULL, NULL, NULL, vblk_exp->dev); blk_add_aio_context_notifier(exp->blk, blk_aio_attached, blk_aio_detach, diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static void bdrv_do_drained_begin(BlockDriverState *bs, BdrvChild *parent, /* Stop things in parent-to-child order */ if (qatomic_fetch_inc(&bs->quiesce_counter) == 0) { - aio_disable_external(bdrv_get_aio_context(bs)); bdrv_parent_drained_begin(bs, parent); if (bs->drv && bs->drv->bdrv_drain_begin) { bs->drv->bdrv_drain_begin(bs); @@ -XXX,XX +XXX,XX @@ static void bdrv_do_drained_end(BlockDriverState *bs, BdrvChild *parent) bs->drv->bdrv_drain_end(bs); } bdrv_parent_drained_end(bs, parent); - aio_enable_external(bdrv_get_aio_context(bs)); } } diff --git a/block/io_uring.c b/block/io_uring.c index XXXXXXX..XXXXXXX 100644 --- a/block/io_uring.c +++ b/block/io_uring.c @@ -XXX,XX +XXX,XX @@ int coroutine_fn luring_co_submit(BlockDriverState *bs, int fd, uint64_t offset, void luring_detach_aio_context(LuringState *s, AioContext *old_context) { - aio_set_fd_handler(old_context, s->ring.ring_fd, false, + aio_set_fd_handler(old_context, s->ring.ring_fd, NULL, NULL, NULL, NULL, s); qemu_bh_delete(s->completion_bh); s->aio_context = NULL; @@ -XXX,XX +XXX,XX @@ void luring_attach_aio_context(LuringState *s, AioContext *new_context) { s->aio_context = new_context; s->completion_bh = aio_bh_new(new_context, qemu_luring_completion_bh, s); - aio_set_fd_handler(s->aio_context, s->ring.ring_fd, false, + aio_set_fd_handler(s->aio_context, s->ring.ring_fd, qemu_luring_completion_cb, NULL, qemu_luring_poll_cb, qemu_luring_poll_ready, s); } diff --git a/block/iscsi.c b/block/iscsi.c index XXXXXXX..XXXXXXX 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -XXX,XX +XXX,XX @@ iscsi_set_events(IscsiLun *iscsilun) if (ev != iscsilun->events) { aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsi), - false, (ev & POLLIN) ? iscsi_process_read : NULL, (ev & POLLOUT) ? iscsi_process_write : NULL, NULL, NULL, @@ -XXX,XX +XXX,XX @@ static void iscsi_detach_aio_context(BlockDriverState *bs) IscsiLun *iscsilun = bs->opaque; aio_set_fd_handler(iscsilun->aio_context, iscsi_get_fd(iscsilun->iscsi), - false, NULL, NULL, NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); iscsilun->events = 0; if (iscsilun->nop_timer) { diff --git a/block/linux-aio.c b/block/linux-aio.c index XXXXXXX..XXXXXXX 100644 --- a/block/linux-aio.c +++ b/block/linux-aio.c @@ -XXX,XX +XXX,XX @@ int coroutine_fn laio_co_submit(int fd, uint64_t offset, QEMUIOVector *qiov, void laio_detach_aio_context(LinuxAioState *s, AioContext *old_context) { - aio_set_event_notifier(old_context, &s->e, false, NULL, NULL, NULL); + aio_set_event_notifier(old_context, &s->e, NULL, NULL, NULL); qemu_bh_delete(s->completion_bh); s->aio_context = NULL; } @@ -XXX,XX +XXX,XX @@ void laio_attach_aio_context(LinuxAioState *s, AioContext *new_context) { s->aio_context = new_context; s->completion_bh = aio_bh_new(new_context, qemu_laio_completion_bh, s); - aio_set_event_notifier(new_context, &s->e, false, + aio_set_event_notifier(new_context, &s->e, qemu_laio_completion_cb, qemu_laio_poll_cb, qemu_laio_poll_ready); diff --git a/block/nfs.c b/block/nfs.c index XXXXXXX..XXXXXXX 100644 --- a/block/nfs.c +++ b/block/nfs.c @@ -XXX,XX +XXX,XX @@ static void nfs_set_events(NFSClient *client) int ev = nfs_which_events(client->context); if (ev != client->events) { aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context), - false, (ev & POLLIN) ? nfs_process_read : NULL, (ev & POLLOUT) ? nfs_process_write : NULL, NULL, NULL, client); @@ -XXX,XX +XXX,XX @@ static void nfs_detach_aio_context(BlockDriverState *bs) NFSClient *client = bs->opaque; aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context), - false, NULL, NULL, NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); client->events = 0; } @@ -XXX,XX +XXX,XX @@ static void nfs_client_close(NFSClient *client) if (client->context) { qemu_mutex_lock(&client->mutex); aio_set_fd_handler(client->aio_context, nfs_get_fd(client->context), - false, NULL, NULL, NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); qemu_mutex_unlock(&client->mutex); if (client->fh) { nfs_close(client->context, client->fh); diff --git a/block/nvme.c b/block/nvme.c index XXXXXXX..XXXXXXX 100644 --- a/block/nvme.c +++ b/block/nvme.c @@ -XXX,XX +XXX,XX @@ static int nvme_init(BlockDriverState *bs, const char *device, int namespace, } aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier[MSIX_SHARED_IRQ_IDX], - false, nvme_handle_event, nvme_poll_cb, + nvme_handle_event, nvme_poll_cb, nvme_poll_ready); if (!nvme_identify(bs, namespace, errp)) { @@ -XXX,XX +XXX,XX @@ static void nvme_close(BlockDriverState *bs) g_free(s->queues); aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier[MSIX_SHARED_IRQ_IDX], - false, NULL, NULL, NULL); + NULL, NULL, NULL); event_notifier_cleanup(&s->irq_notifier[MSIX_SHARED_IRQ_IDX]); qemu_vfio_pci_unmap_bar(s->vfio, 0, s->bar0_wo_map, 0, sizeof(NvmeBar) + NVME_DOORBELL_SIZE); @@ -XXX,XX +XXX,XX @@ static void nvme_detach_aio_context(BlockDriverState *bs) aio_set_event_notifier(bdrv_get_aio_context(bs), &s->irq_notifier[MSIX_SHARED_IRQ_IDX], - false, NULL, NULL, NULL); + NULL, NULL, NULL); } static void nvme_attach_aio_context(BlockDriverState *bs, @@ -XXX,XX +XXX,XX @@ static void nvme_attach_aio_context(BlockDriverState *bs, s->aio_context = new_context; aio_set_event_notifier(new_context, &s->irq_notifier[MSIX_SHARED_IRQ_IDX], - false, nvme_handle_event, nvme_poll_cb, + nvme_handle_event, nvme_poll_cb, nvme_poll_ready); for (unsigned i = 0; i < s->queue_count; i++) { diff --git a/block/ssh.c b/block/ssh.c index XXXXXXX..XXXXXXX 100644 --- a/block/ssh.c +++ b/block/ssh.c @@ -XXX,XX +XXX,XX @@ static void restart_coroutine(void *opaque) AioContext *ctx = bdrv_get_aio_context(bs); trace_ssh_restart_coroutine(restart->co); - aio_set_fd_handler(ctx, s->sock, false, NULL, NULL, NULL, NULL, NULL); + aio_set_fd_handler(ctx, s->sock, NULL, NULL, NULL, NULL, NULL); aio_co_wake(restart->co); } @@ -XXX,XX +XXX,XX @@ static coroutine_fn void co_yield(BDRVSSHState *s, BlockDriverState *bs) trace_ssh_co_yield(s->sock, rd_handler, wr_handler); aio_set_fd_handler(bdrv_get_aio_context(bs), s->sock, - false, rd_handler, wr_handler, NULL, NULL, &restart); + rd_handler, wr_handler, NULL, NULL, &restart); qemu_coroutine_yield(); trace_ssh_co_yield_back(s->sock); } diff --git a/block/win32-aio.c b/block/win32-aio.c index XXXXXXX..XXXXXXX 100644 --- a/block/win32-aio.c +++ b/block/win32-aio.c @@ -XXX,XX +XXX,XX @@ int win32_aio_attach(QEMUWin32AIOState *aio, HANDLE hfile) void win32_aio_detach_aio_context(QEMUWin32AIOState *aio, AioContext *old_context) { - aio_set_event_notifier(old_context, &aio->e, false, NULL, NULL, NULL); + aio_set_event_notifier(old_context, &aio->e, NULL, NULL, NULL); aio->aio_ctx = NULL; } @@ -XXX,XX +XXX,XX @@ void win32_aio_attach_aio_context(QEMUWin32AIOState *aio, AioContext *new_context) { aio->aio_ctx = new_context; - aio_set_event_notifier(new_context, &aio->e, false, - win32_aio_completion_cb, NULL, NULL); + aio_set_event_notifier(new_context, &aio->e, win32_aio_completion_cb, + NULL, NULL); } QEMUWin32AIOState *win32_aio_init(void) diff --git a/hw/i386/kvm/xen_xenstore.c b/hw/i386/kvm/xen_xenstore.c index XXXXXXX..XXXXXXX 100644 --- a/hw/i386/kvm/xen_xenstore.c +++ b/hw/i386/kvm/xen_xenstore.c @@ -XXX,XX +XXX,XX @@ static void xen_xenstore_realize(DeviceState *dev, Error **errp) error_setg(errp, "Xenstore evtchn port init failed"); return; } - aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), false, + aio_set_fd_handler(qemu_get_aio_context(), xen_be_evtchn_fd(s->eh), xen_xenstore_event, NULL, NULL, NULL, s); s->impl = xs_impl_create(xen_domid); diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c index XXXXXXX..XXXXXXX 100644 --- a/hw/virtio/virtio.c +++ b/hw/virtio/virtio.c @@ -XXX,XX +XXX,XX @@ static void virtio_queue_host_notifier_aio_poll_end(EventNotifier *n) void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, false, + aio_set_event_notifier(ctx, &vq->host_notifier, virtio_queue_host_notifier_read, virtio_queue_host_notifier_aio_poll, virtio_queue_host_notifier_aio_poll_ready); @@ -XXX,XX +XXX,XX @@ void virtio_queue_aio_attach_host_notifier(VirtQueue *vq, AioContext *ctx) */ void virtio_queue_aio_attach_host_notifier_no_poll(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, false, + aio_set_event_notifier(ctx, &vq->host_notifier, virtio_queue_host_notifier_read, NULL, NULL); } void virtio_queue_aio_detach_host_notifier(VirtQueue *vq, AioContext *ctx) { - aio_set_event_notifier(ctx, &vq->host_notifier, false, NULL, NULL, NULL); + aio_set_event_notifier(ctx, &vq->host_notifier, NULL, NULL, NULL); } void virtio_queue_host_notifier_read(EventNotifier *n) diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c index XXXXXXX..XXXXXXX 100644 --- a/hw/xen/xen-bus.c +++ b/hw/xen/xen-bus.c @@ -XXX,XX +XXX,XX @@ void xen_device_set_event_channel_context(XenDevice *xendev, } if (channel->ctx) - aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false, + aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), NULL, NULL, NULL, NULL, NULL); channel->ctx = ctx; if (ctx) { aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), - false, xen_device_event, NULL, xen_device_poll, - NULL, channel); + xen_device_event, NULL, xen_device_poll, NULL, + channel); } } @@ -XXX,XX +XXX,XX @@ void xen_device_unbind_event_channel(XenDevice *xendev, QLIST_REMOVE(channel, list); - aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), false, + aio_set_fd_handler(channel->ctx, qemu_xen_evtchn_fd(channel->xeh), NULL, NULL, NULL, NULL, NULL); if (qemu_xen_evtchn_unbind(channel->xeh, channel->local_port) < 0) { diff --git a/io/channel-command.c b/io/channel-command.c index XXXXXXX..XXXXXXX 100644 --- a/io/channel-command.c +++ b/io/channel-command.c @@ -XXX,XX +XXX,XX @@ static void qio_channel_command_set_aio_fd_handler(QIOChannel *ioc, void *opaque) { QIOChannelCommand *cioc = QIO_CHANNEL_COMMAND(ioc); - aio_set_fd_handler(ctx, cioc->readfd, false, - io_read, NULL, NULL, NULL, opaque); - aio_set_fd_handler(ctx, cioc->writefd, false, - NULL, io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, cioc->readfd, io_read, NULL, NULL, NULL, opaque); + aio_set_fd_handler(ctx, cioc->writefd, NULL, io_write, NULL, NULL, opaque); } diff --git a/io/channel-file.c b/io/channel-file.c index XXXXXXX..XXXXXXX 100644 --- a/io/channel-file.c +++ b/io/channel-file.c @@ -XXX,XX +XXX,XX @@ static void qio_channel_file_set_aio_fd_handler(QIOChannel *ioc, void *opaque) { QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc); - aio_set_fd_handler(ctx, fioc->fd, false, io_read, io_write, - NULL, NULL, opaque); + aio_set_fd_handler(ctx, fioc->fd, io_read, io_write, NULL, NULL, opaque); } static GSource *qio_channel_file_create_watch(QIOChannel *ioc, diff --git a/io/channel-socket.c b/io/channel-socket.c index XXXXXXX..XXXXXXX 100644 --- a/io/channel-socket.c +++ b/io/channel-socket.c @@ -XXX,XX +XXX,XX @@ static void qio_channel_socket_set_aio_fd_handler(QIOChannel *ioc, void *opaque) { QIOChannelSocket *sioc = QIO_CHANNEL_SOCKET(ioc); - aio_set_fd_handler(ctx, sioc->fd, false, - io_read, io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, sioc->fd, io_read, io_write, NULL, NULL, opaque); } static GSource *qio_channel_socket_create_watch(QIOChannel *ioc, diff --git a/migration/rdma.c b/migration/rdma.c index XXXXXXX..XXXXXXX 100644 --- a/migration/rdma.c +++ b/migration/rdma.c @@ -XXX,XX +XXX,XX @@ static void qio_channel_rdma_set_aio_fd_handler(QIOChannel *ioc, { QIOChannelRDMA *rioc = QIO_CHANNEL_RDMA(ioc); if (io_read) { - aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, - false, io_read, io_write, NULL, NULL, opaque); - aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, - false, io_read, io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, rioc->rdmain->recv_comp_channel->fd, io_read, + io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, rioc->rdmain->send_comp_channel->fd, io_read, + io_write, NULL, NULL, opaque); } else { - aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, - false, io_read, io_write, NULL, NULL, opaque); - aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, - false, io_read, io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, rioc->rdmaout->recv_comp_channel->fd, io_read, + io_write, NULL, NULL, opaque); + aio_set_fd_handler(ctx, rioc->rdmaout->send_comp_channel->fd, io_read, + io_write, NULL, NULL, opaque); } } diff --git a/tests/unit/test-aio.c b/tests/unit/test-aio.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-aio.c +++ b/tests/unit/test-aio.c @@ -XXX,XX +XXX,XX @@ static void *test_acquire_thread(void *opaque) static void set_event_notifier(AioContext *ctx, EventNotifier *notifier, EventNotifierHandler *handler) { - aio_set_event_notifier(ctx, notifier, false, handler, NULL, NULL); + aio_set_event_notifier(ctx, notifier, handler, NULL, NULL); } static void dummy_notifier_read(EventNotifier *n) @@ -XXX,XX +XXX,XX @@ static void test_flush_event_notifier(void) event_notifier_cleanup(&data.e); } -static void test_aio_external_client(void) -{ - int i, j; - - for (i = 1; i < 3; i++) { - EventNotifierTestData data = { .n = 0, .active = 10, .auto_set = true }; - event_notifier_init(&data.e, false); - aio_set_event_notifier(ctx, &data.e, true, event_ready_cb, NULL, NULL); - event_notifier_set(&data.e); - for (j = 0; j < i; j++) { - aio_disable_external(ctx); - } - for (j = 0; j < i; j++) { - assert(!aio_poll(ctx, false)); - assert(event_notifier_test_and_clear(&data.e)); - event_notifier_set(&data.e); - aio_enable_external(ctx); - } - assert(aio_poll(ctx, false)); - set_event_notifier(ctx, &data.e, NULL); - event_notifier_cleanup(&data.e); - } -} - static void test_wait_event_notifier_noflush(void) { EventNotifierTestData data = { .n = 0 }; @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) g_test_add_func("/aio/event/wait", test_wait_event_notifier); g_test_add_func("/aio/event/wait/no-flush-cb", test_wait_event_notifier_noflush); g_test_add_func("/aio/event/flush", test_flush_event_notifier); - g_test_add_func("/aio/external-client", test_aio_external_client); g_test_add_func("/aio/timer/schedule", test_timer_schedule); g_test_add_func("/aio/coroutine/queue-chaining", test_queue_chaining); diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-bdrv-drain.c +++ b/tests/unit/test-bdrv-drain.c @@ -XXX,XX +XXX,XX @@ static void test_graph_change_drain_all(void) g_assert_cmpint(bs_b->quiesce_counter, ==, 0); g_assert_cmpint(b_s->drain_count, ==, 0); - g_assert_cmpint(qemu_get_aio_context()->external_disable_cnt, ==, 0); bdrv_unref(bs_b); blk_unref(blk_b); diff --git a/tests/unit/test-fdmon-epoll.c b/tests/unit/test-fdmon-epoll.c deleted file mode 100644 index XXXXXXX..XXXXXXX --- a/tests/unit/test-fdmon-epoll.c +++ /dev/null @@ -XXX,XX +XXX,XX @@ -/* SPDX-License-Identifier: GPL-2.0-or-later */ -/* - * fdmon-epoll tests - * - * Copyright (c) 2020 Red Hat, Inc. - */ - -#include "qemu/osdep.h" -#include "block/aio.h" -#include "qapi/error.h" -#include "qemu/main-loop.h" - -static AioContext *ctx; - -static void dummy_fd_handler(EventNotifier *notifier) -{ - event_notifier_test_and_clear(notifier); -} - -static void add_event_notifiers(EventNotifier *notifiers, size_t n) -{ - for (size_t i = 0; i < n; i++) { - event_notifier_init(¬ifiers[i], false); - aio_set_event_notifier(ctx, ¬ifiers[i], false, - dummy_fd_handler, NULL, NULL); - } -} - -static void remove_event_notifiers(EventNotifier *notifiers, size_t n) -{ - for (size_t i = 0; i < n; i++) { - aio_set_event_notifier(ctx, ¬ifiers[i], false, NULL, NULL, NULL); - event_notifier_cleanup(¬ifiers[i]); - } -} - -/* Check that fd handlers work when external clients are disabled */ -static void test_external_disabled(void) -{ - EventNotifier notifiers[100]; - - /* fdmon-epoll is only enabled when many fd handlers are registered */ - add_event_notifiers(notifiers, G_N_ELEMENTS(notifiers)); - - event_notifier_set(¬ifiers[0]); - assert(aio_poll(ctx, true)); - - aio_disable_external(ctx); - event_notifier_set(¬ifiers[0]); - assert(aio_poll(ctx, true)); - aio_enable_external(ctx); - - remove_event_notifiers(notifiers, G_N_ELEMENTS(notifiers)); -} - -int main(int argc, char **argv) -{ - /* - * This code relies on the fact that fdmon-io_uring disables itself when - * the glib main loop is in use. The main loop uses fdmon-poll and upgrades - * to fdmon-epoll when the number of fds exceeds a threshold. - */ - qemu_init_main_loop(&error_fatal); - ctx = qemu_get_aio_context(); - - while (g_main_context_iteration(NULL, false)) { - /* Do nothing */ - } - - g_test_init(&argc, &argv, NULL); - g_test_add_func("/fdmon-epoll/external-disabled", test_external_disabled); - return g_test_run(); -} diff --git a/tests/unit/test-nested-aio-poll.c b/tests/unit/test-nested-aio-poll.c index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/test-nested-aio-poll.c +++ b/tests/unit/test-nested-aio-poll.c @@ -XXX,XX +XXX,XX @@ static void test(void) /* Make the event notifier active (set) right away */ event_notifier_init(&td.poll_notifier, 1); - aio_set_event_notifier(td.ctx, &td.poll_notifier, false, + aio_set_event_notifier(td.ctx, &td.poll_notifier, io_read, io_poll_true, io_poll_ready); /* This event notifier will be used later */ event_notifier_init(&td.dummy_notifier, 0); - aio_set_event_notifier(td.ctx, &td.dummy_notifier, false, + aio_set_event_notifier(td.ctx, &td.dummy_notifier, io_read, io_poll_false, io_poll_never_ready); /* Consume aio_notify() */ @@ -XXX,XX +XXX,XX @@ static void test(void) /* Run io_poll()/io_poll_ready() one more time to show it keeps working */ g_assert(aio_poll(td.ctx, true)); - aio_set_event_notifier(td.ctx, &td.dummy_notifier, false, - NULL, NULL, NULL); - aio_set_event_notifier(td.ctx, &td.poll_notifier, false, NULL, NULL, NULL); + aio_set_event_notifier(td.ctx, &td.dummy_notifier, NULL, NULL, NULL); + aio_set_event_notifier(td.ctx, &td.poll_notifier, NULL, NULL, NULL); event_notifier_cleanup(&td.dummy_notifier); event_notifier_cleanup(&td.poll_notifier); aio_context_unref(td.ctx); diff --git a/util/aio-posix.c b/util/aio-posix.c index XXXXXXX..XXXXXXX 100644 --- a/util/aio-posix.c +++ b/util/aio-posix.c @@ -XXX,XX +XXX,XX @@ static bool aio_remove_fd_handler(AioContext *ctx, AioHandler *node) void aio_set_fd_handler(AioContext *ctx, int fd, - bool is_external, IOHandler *io_read, IOHandler *io_write, AioPollFn *io_poll, @@ -XXX,XX +XXX,XX @@ void aio_set_fd_handler(AioContext *ctx, new_node->io_poll = io_poll; new_node->io_poll_ready = io_poll_ready; new_node->opaque = opaque; - new_node->is_external = is_external; if (is_new) { new_node->pfd.fd = fd; @@ -XXX,XX +XXX,XX @@ static void aio_set_fd_poll(AioContext *ctx, int fd, void aio_set_event_notifier(AioContext *ctx, EventNotifier *notifier, - bool is_external, EventNotifierHandler *io_read, AioPollFn *io_poll, EventNotifierHandler *io_poll_ready) { - aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), is_external, + aio_set_fd_handler(ctx, event_notifier_get_fd(notifier), (IOHandler *)io_read, NULL, io_poll, (IOHandler *)io_poll_ready, notifier); } @@ -XXX,XX +XXX,XX @@ bool aio_pending(AioContext *ctx) /* TODO should this check poll ready? */ revents = node->pfd.revents & node->pfd.events; - if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read && - aio_node_check(ctx, node->is_external)) { + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) { result = true; break; } - if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write && - aio_node_check(ctx, node->is_external)) { + if (revents & (G_IO_OUT | G_IO_ERR) && node->io_write) { result = true; break; } @@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node) QLIST_INSERT_HEAD(&ctx->poll_aio_handlers, node, node_poll); } if (!QLIST_IS_INSERTED(node, node_deleted) && - poll_ready && revents == 0 && - aio_node_check(ctx, node->is_external) && - node->io_poll_ready) { + poll_ready && revents == 0 && node->io_poll_ready) { /* * Remove temporarily to avoid infinite loops when ->io_poll_ready() * calls aio_poll() before clearing the condition that made the poll @@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node) if (!QLIST_IS_INSERTED(node, node_deleted) && (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) && - aio_node_check(ctx, node->is_external) && node->io_read) { node->io_read(node->opaque); @@ -XXX,XX +XXX,XX @@ static bool aio_dispatch_handler(AioContext *ctx, AioHandler *node) } if (!QLIST_IS_INSERTED(node, node_deleted) && (revents & (G_IO_OUT | G_IO_ERR)) && - aio_node_check(ctx, node->is_external) && node->io_write) { node->io_write(node->opaque); progress = true; @@ -XXX,XX +XXX,XX @@ static bool run_poll_handlers_once(AioContext *ctx, AioHandler *tmp; QLIST_FOREACH_SAFE(node, &ctx->poll_aio_handlers, node_poll, tmp) { - if (aio_node_check(ctx, node->is_external) && - node->io_poll(node->opaque)) { + if (node->io_poll(node->opaque)) { aio_add_poll_ready_handler(ready_list, node); node->poll_idle_timeout = now + POLL_IDLE_INTERVAL_NS; diff --git a/util/aio-win32.c b/util/aio-win32.c index XXXXXXX..XXXXXXX 100644 --- a/util/aio-win32.c +++ b/util/aio-win32.c @@ -XXX,XX +XXX,XX @@ struct AioHandler { GPollFD pfd; int deleted; void *opaque; - bool is_external; QLIST_ENTRY(AioHandler) node; }; @@ -XXX,XX +XXX,XX @@ static void aio_remove_fd_handler(AioContext *ctx, AioHandler *node) void aio_set_fd_handler(AioContext *ctx, int fd, - bool is_external, IOHandler *io_read, IOHandler *io_write, AioPollFn *io_poll, @@ -XXX,XX +XXX,XX @@ void aio_set_fd_handler(AioContext *ctx, node->opaque = opaque; node->io_read = io_read; node->io_write = io_write; - node->is_external = is_external; if (io_read) { bitmask |= FD_READ | FD_ACCEPT | FD_CLOSE; @@ -XXX,XX +XXX,XX @@ void aio_set_fd_handler(AioContext *ctx, void aio_set_event_notifier(AioContext *ctx, EventNotifier *e, - bool is_external, EventNotifierHandler *io_notify, AioPollFn *io_poll, EventNotifierHandler *io_poll_ready) @@ -XXX,XX +XXX,XX @@ void aio_set_event_notifier(AioContext *ctx, node->e = e; node->pfd.fd = (uintptr_t)event_notifier_get_handle(e); node->pfd.events = G_IO_IN; - node->is_external = is_external; QLIST_INSERT_HEAD_RCU(&ctx->aio_handlers, node, node); g_source_add_poll(&ctx->source, &node->pfd); @@ -XXX,XX +XXX,XX @@ bool aio_poll(AioContext *ctx, bool blocking) /* fill fd sets */ count = 0; QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { - if (!node->deleted && node->io_notify - && aio_node_check(ctx, node->is_external)) { + if (!node->deleted && node->io_notify) { assert(count < MAXIMUM_WAIT_OBJECTS); events[count++] = event_notifier_get_handle(node->e); } diff --git a/util/async.c b/util/async.c index XXXXXXX..XXXXXXX 100644 --- a/util/async.c +++ b/util/async.c @@ -XXX,XX +XXX,XX @@ aio_ctx_finalize(GSource *source) g_free(bh); } - aio_set_event_notifier(ctx, &ctx->notifier, false, NULL, NULL, NULL); + aio_set_event_notifier(ctx, &ctx->notifier, NULL, NULL, NULL); event_notifier_cleanup(&ctx->notifier); qemu_rec_mutex_destroy(&ctx->lock); qemu_lockcnt_destroy(&ctx->list_lock); @@ -XXX,XX +XXX,XX @@ AioContext *aio_context_new(Error **errp) QSLIST_INIT(&ctx->scheduled_coroutines); aio_set_event_notifier(ctx, &ctx->notifier, - false, aio_context_notifier_cb, aio_context_notifier_poll, aio_context_notifier_poll_ready); diff --git a/util/fdmon-epoll.c b/util/fdmon-epoll.c index XXXXXXX..XXXXXXX 100644 --- a/util/fdmon-epoll.c +++ b/util/fdmon-epoll.c @@ -XXX,XX +XXX,XX @@ static int fdmon_epoll_wait(AioContext *ctx, AioHandlerList *ready_list, int i, ret = 0; struct epoll_event events[128]; - /* Fall back while external clients are disabled */ - if (qatomic_read(&ctx->external_disable_cnt)) { - return fdmon_poll_ops.wait(ctx, ready_list, timeout); - } - if (timeout > 0) { ret = qemu_poll_ns(&pfd, 1, timeout); if (ret > 0) { @@ -XXX,XX +XXX,XX @@ bool fdmon_epoll_try_upgrade(AioContext *ctx, unsigned npfd) return false; } - /* Do not upgrade while external clients are disabled */ - if (qatomic_read(&ctx->external_disable_cnt)) { - return false; - } - if (npfd < EPOLL_ENABLE_THRESHOLD) { return false; } diff --git a/util/fdmon-io_uring.c b/util/fdmon-io_uring.c index XXXXXXX..XXXXXXX 100644 --- a/util/fdmon-io_uring.c +++ b/util/fdmon-io_uring.c @@ -XXX,XX +XXX,XX @@ static int fdmon_io_uring_wait(AioContext *ctx, AioHandlerList *ready_list, unsigned wait_nr = 1; /* block until at least one cqe is ready */ int ret; - /* Fall back while external clients are disabled */ - if (qatomic_read(&ctx->external_disable_cnt)) { - return fdmon_poll_ops.wait(ctx, ready_list, timeout); - } - if (timeout == 0) { wait_nr = 0; /* non-blocking */ } else if (timeout > 0) { @@ -XXX,XX +XXX,XX @@ static bool fdmon_io_uring_need_wait(AioContext *ctx) return true; } - /* Are we falling back to fdmon-poll? */ - return qatomic_read(&ctx->external_disable_cnt); + return false; } static const FDMonOps fdmon_io_uring_ops = { diff --git a/util/fdmon-poll.c b/util/fdmon-poll.c index XXXXXXX..XXXXXXX 100644 --- a/util/fdmon-poll.c +++ b/util/fdmon-poll.c @@ -XXX,XX +XXX,XX @@ static int fdmon_poll_wait(AioContext *ctx, AioHandlerList *ready_list, assert(npfd == 0); QLIST_FOREACH_RCU(node, &ctx->aio_handlers, node) { - if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events - && aio_node_check(ctx, node->is_external)) { + if (!QLIST_IS_INSERTED(node, node_deleted) && node->pfd.events) { add_pollfd(node); } } diff --git a/util/main-loop.c b/util/main-loop.c index XXXXXXX..XXXXXXX 100644 --- a/util/main-loop.c +++ b/util/main-loop.c @@ -XXX,XX +XXX,XX @@ void qemu_set_fd_handler(int fd, void *opaque) { iohandler_init(); - aio_set_fd_handler(iohandler_ctx, fd, false, - fd_read, fd_write, NULL, NULL, opaque); + aio_set_fd_handler(iohandler_ctx, fd, fd_read, fd_write, NULL, NULL, + opaque); } void event_notifier_set_handler(EventNotifier *e, EventNotifierHandler *handler) { iohandler_init(); - aio_set_event_notifier(iohandler_ctx, e, false, - handler, NULL, NULL); + aio_set_event_notifier(iohandler_ctx, e, handler, NULL, NULL); } diff --git a/util/qemu-coroutine-io.c b/util/qemu-coroutine-io.c index XXXXXXX..XXXXXXX 100644 --- a/util/qemu-coroutine-io.c +++ b/util/qemu-coroutine-io.c @@ -XXX,XX +XXX,XX @@ typedef struct { static void fd_coroutine_enter(void *opaque) { FDYieldUntilData *data = opaque; - aio_set_fd_handler(data->ctx, data->fd, false, - NULL, NULL, NULL, NULL, NULL); + aio_set_fd_handler(data->ctx, data->fd, NULL, NULL, NULL, NULL, NULL); qemu_coroutine_enter(data->co); } @@ -XXX,XX +XXX,XX @@ void coroutine_fn yield_until_fd_readable(int fd) data.ctx = qemu_get_current_aio_context(); data.co = qemu_coroutine_self(); data.fd = fd; - aio_set_fd_handler( - data.ctx, fd, false, fd_coroutine_enter, NULL, NULL, NULL, &data); + aio_set_fd_handler(data.ctx, fd, fd_coroutine_enter, NULL, NULL, NULL, + &data); qemu_coroutine_yield(); } diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ set_watch(VuDev *vu_dev, int fd, int vu_evt, vu_fd_watch->fd = fd; vu_fd_watch->cb = cb; qemu_socket_set_nonblock(fd); - aio_set_fd_handler(server->ioc->ctx, fd, false, kick_handler, + aio_set_fd_handler(server->ioc->ctx, fd, kick_handler, NULL, NULL, NULL, vu_fd_watch); vu_fd_watch->vu_dev = vu_dev; vu_fd_watch->pvt = pvt; @@ -XXX,XX +XXX,XX @@ static void remove_watch(VuDev *vu_dev, int fd) if (!vu_fd_watch) { return; } - aio_set_fd_handler(server->ioc->ctx, fd, false, - NULL, NULL, NULL, NULL, NULL); + aio_set_fd_handler(server->ioc->ctx, fd, NULL, NULL, NULL, NULL, NULL); QTAILQ_REMOVE(&server->vu_fd_watches, vu_fd_watch, next); g_free(vu_fd_watch); @@ -XXX,XX +XXX,XX @@ void vhost_user_server_stop(VuServer *server) VuFdWatch *vu_fd_watch; QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false, + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, NULL, NULL, NULL, NULL, vu_fd_watch); } @@ -XXX,XX +XXX,XX @@ void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx) qio_channel_attach_aio_context(server->ioc, ctx); QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(ctx, vu_fd_watch->fd, false, kick_handler, NULL, + aio_set_fd_handler(ctx, vu_fd_watch->fd, kick_handler, NULL, NULL, NULL, vu_fd_watch); } @@ -XXX,XX +XXX,XX @@ void vhost_user_server_detach_aio_context(VuServer *server) VuFdWatch *vu_fd_watch; QTAILQ_FOREACH(vu_fd_watch, &server->vu_fd_watches, next) { - aio_set_fd_handler(server->ctx, vu_fd_watch->fd, false, + aio_set_fd_handler(server->ctx, vu_fd_watch->fd, NULL, NULL, NULL, NULL, vu_fd_watch); } diff --git a/tests/unit/meson.build b/tests/unit/meson.build index XXXXXXX..XXXXXXX 100644 --- a/tests/unit/meson.build +++ b/tests/unit/meson.build @@ -XXX,XX +XXX,XX @@ if have_block if nettle.found() or gcrypt.found() tests += {'test-crypto-pbkdf': [io]} endif - if config_host_data.get('CONFIG_EPOLL_CREATE1') - tests += {'test-fdmon-epoll': [testblock]} - endif endif if have_system -- 2.40.1
The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a: Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +0000) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to d570177b50c389f379f93183155a27d44856ab46: qemu-img: Change info key names for protocol nodes (2023-02-01 16:52:33 +0100) v4: - Fixed the 'qemu-img-close-errors' test case to run only on Linux and only with the file protocol, use qemu-io instead of truncate v3: - Make the compiler happier on BSD and CentOS Stream 8 v2: - Rebased to resolve merge conflicts in coroutine.h ---------------------------------------------------------------- Block layer patches - qemu-img info: Show protocol-level information - Move more functions to coroutines - Make coroutine annotations ready for static analysis - qemu-img: Fix exit code for errors closing the image - qcow2 bitmaps: Fix theoretical corruption in error path - pflash: Only load non-zero parts of backend image to save memory - Code cleanup and test case improvements ---------------------------------------------------------------- Alberto Faria (2): coroutine: annotate coroutine_fn for libclang block: Add no_coroutine_fn and coroutine_mixed_fn marker Emanuele Giuseppe Esposito (14): block-coroutine-wrapper: support void functions block: Convert bdrv_io_plug() to co_wrapper block: Convert bdrv_io_unplug() to co_wrapper block: Convert bdrv_is_inserted() to co_wrapper block: Rename refresh_total_sectors to bdrv_refresh_total_sectors block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed block-backend: use bdrv_getlength instead of blk_getlength block: use bdrv_co_refresh_total_sectors when possible block: Convert bdrv_get_allocated_file_size() to co_wrapper block: Convert bdrv_get_info() to co_wrapper_mixed block: Convert bdrv_eject() to co_wrapper block: Convert bdrv_lock_medium() to co_wrapper block: Convert bdrv_debug_event() to co_wrapper_mixed block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate() Hanna Reitz (12): block: Improve empty format-specific info dump block/file: Add file-specific image info block/vmdk: Change extent info type block: Split BlockNodeInfo off of ImageInfo qemu-img: Use BlockNodeInfo block/qapi: Let bdrv_query_image_info() recurse block/qapi: Introduce BlockGraphInfo block/qapi: Add indentation to bdrv_node_info_dump() iotests: Filter child node information iotests/106, 214, 308: Read only one size line qemu-img: Let info print block graph qemu-img: Change info key names for protocol nodes Kevin Wolf (4): qcow2: Fix theoretical corruption in store_bitmap() error path qemu-img commit: Report errors while closing the image qemu-img bitmap: Report errors while closing the image qemu-iotests: Test qemu-img bitmap/commit exit code on error Paolo Bonzini (2): qemu-io: do not reinvent the blk_pwrite_zeroes wheel block: remove bdrv_coroutine_enter Philippe Mathieu-Daudé (1): block/nbd: Add missing <qemu/bswap.h> include Thomas Huth (2): tests/qemu-iotests/312: Mark "quorum" as required driver tests/qemu-iotests/262: Check for availability of "blkverify" first Xiang Zheng (1): pflash: Only read non-zero parts of backend image qapi/block-core.json | 123 +++++++- include/block/block-common.h | 11 +- include/block/block-io.h | 41 ++- include/block/block_int-common.h | 26 +- include/block/block_int-io.h | 5 +- include/block/nbd.h | 1 + include/block/qapi.h | 14 +- include/qemu/osdep.h | 44 +++ include/sysemu/block-backend-io.h | 31 +- block.c | 88 +++--- block/blkdebug.c | 11 +- block/blkio.c | 15 +- block/blklogwrites.c | 6 +- block/blkreplay.c | 6 +- block/blkverify.c | 6 +- block/block-backend.c | 38 +-- block/commit.c | 4 +- block/copy-on-read.c | 18 +- block/crypto.c | 14 +- block/curl.c | 10 +- block/file-posix.c | 137 +++++---- block/file-win32.c | 18 +- block/filter-compress.c | 20 +- block/gluster.c | 23 +- block/io.c | 76 ++--- block/iscsi.c | 17 +- block/mirror.c | 6 +- block/monitor/block-hmp-cmds.c | 2 +- block/nbd.c | 8 +- block/nfs.c | 4 +- block/null.c | 13 +- block/nvme.c | 14 +- block/preallocate.c | 16 +- block/qapi.c | 317 ++++++++++++++++----- block/qcow.c | 5 +- block/qcow2-bitmap.c | 5 +- block/qcow2-refcount.c | 2 +- block/qcow2.c | 17 +- block/qed.c | 11 +- block/quorum.c | 8 +- block/raw-format.c | 25 +- block/rbd.c | 9 +- block/replication.c | 6 +- block/ssh.c | 4 +- block/throttle.c | 6 +- block/vdi.c | 7 +- block/vhdx.c | 5 +- block/vmdk.c | 22 +- block/vpc.c | 5 +- blockdev.c | 8 +- hw/block/block.c | 36 ++- hw/scsi/scsi-disk.c | 5 + qemu-img.c | 100 +++++-- qemu-io-cmds.c | 62 +--- tests/unit/test-block-iothread.c | 3 + scripts/block-coroutine-wrapper.py | 20 +- tests/qemu-iotests/iotests.py | 18 +- block/meson.build | 1 + tests/qemu-iotests/065 | 2 +- tests/qemu-iotests/106 | 4 +- tests/qemu-iotests/214 | 6 +- tests/qemu-iotests/262 | 3 +- tests/qemu-iotests/302.out | 5 + tests/qemu-iotests/308 | 4 +- tests/qemu-iotests/312 | 1 + tests/qemu-iotests/common.filter | 22 +- tests/qemu-iotests/common.rc | 22 +- tests/qemu-iotests/tests/qemu-img-close-errors | 96 +++++++ tests/qemu-iotests/tests/qemu-img-close-errors.out | 23 ++ 69 files changed, 1209 insertions(+), 552 deletions(-) create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out