:p
atchew
Login
The following changes since commit 804b30d25f8d70dc2dea951883ea92235274a50c: Merge remote-tracking branch 'remotes/legoater/tags/pull-ppc-20220130' into staging (2022-01-31 11:10:08 +0000) are available in the Git repository at: https://gitlab.com/kmwolf/qemu.git tags/for-upstream for you to fetch changes up to fc176116cdea816ceb8dd969080b2b95f58edbc0: block/rbd: workaround for ceph issue #53784 (2022-02-01 15:16:32 +0100) ---------------------------------------------------------------- Block layer patches - rbd: fix handling of holes in .bdrv_co_block_status - Fix potential crash in bdrv_set_backing_hd() - vhost-user-blk export: Fix shutdown with requests in flight - FUSE export: Fix build failure on FreeBSD - Documentation improvements ---------------------------------------------------------------- Emanuele Giuseppe Esposito (1): block.h: remove outdated comment Hanna Reitz (2): qsd: Document fuse's allow-other option qemu-img: Unify [-b [-F]] documentation Kevin Wolf (2): qemu-storage-daemon: Fix typo in vhost-user-blk help block/export: Fix vhost-user-blk shutdown with requests in flight Peter Lieven (2): block/rbd: fix handling of holes in .bdrv_co_block_status block/rbd: workaround for ceph issue #53784 Philippe Mathieu-Daudé (2): block/export/fuse: Rearrange if-else-if ladder in fuse_fallocate() block/export/fuse: Fix build failure on FreeBSD Vladimir Sementsov-Ogievskiy (1): block: bdrv_set_backing_hd(): use drained section docs/tools/qemu-img.rst | 2 +- docs/tools/qemu-storage-daemon.rst | 9 +++++-- include/block/block.h | 1 - include/qemu/vhost-user-server.h | 5 ++++ block.c | 4 +++ block/export/fuse.c | 45 +++++++++++++++++-------------- block/export/vhost-user-blk-server.c | 5 ++++ block/rbd.c | 52 +++++++++++++++++++++++++++++++----- storage-daemon/qemu-storage-daemon.c | 4 +-- util/vhost-user-server.c | 22 +++++++++++++++ qemu-img-cmds.hx | 4 +-- 11 files changed, 118 insertions(+), 35 deletions(-)
The syntax of the fd passing case misses the "addr.type=" key. Add it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220125151514.49035-1-kwolf@redhat.com> Reviewed-by: Hanna Reitz <hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- storage-daemon/qemu-storage-daemon.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c index XXXXXXX..XXXXXXX 100644 --- a/storage-daemon/qemu-storage-daemon.c +++ b/storage-daemon/qemu-storage-daemon.c @@ -XXX,XX +XXX,XX @@ static void help(void) " export the specified block node as a\n" " vhost-user-blk device over UNIX domain socket\n" " --export [type=]vhost-user-blk,id=<id>,node-name=<node-name>,\n" -" fd,addr.str=<fd>[,writable=on|off]\n" +" addr.type=fd,addr.str=<fd>[,writable=on|off]\n" " [,logical-block-size=<block-size>][,num-queues=<num-queues>]\n" " export the specified block node as a\n" " vhost-user-blk device over file descriptor\n" -- 2.31.1
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Graph modifications should be done in drained section. stream_prepare() handler of block stream job call bdrv_set_backing_hd() without using drained section and it's theoretically possible that some IO request will interleave with graph modification and will use outdated pointers to removed block nodes. Some other callers use bdrv_set_backing_hd() not caring about drained sections too. So it seems good to make a drained section exactly in bdrv_set_backing_hd(). Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Message-Id: <20220124173741.2984056-1-vsementsov@virtuozzo.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd, int ret; Transaction *tran = tran_new(); + bdrv_drained_begin(bs); + ret = bdrv_set_backing_noperm(bs, backing_hd, tran, errp); if (ret < 0) { goto out; @@ -XXX,XX +XXX,XX @@ int bdrv_set_backing_hd(BlockDriverState *bs, BlockDriverState *backing_hd, out: tran_finalize(tran, ret); + bdrv_drained_end(bs); + return ret; } -- 2.31.1
The vhost-user-blk export runs requests asynchronously in their own coroutine. When the vhost connection goes away and we want to stop the vhost-user server, we need to wait for these coroutines to stop before we can unmap the shared memory. Otherwise, they would still access the unmapped memory and crash. This introduces a refcount to VuServer which is increased when spawning a new request coroutine and decreased before the coroutine exits. The memory is only unmapped when the refcount reaches zero. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <20220125151435.48792-1-kwolf@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/qemu/vhost-user-server.h | 5 +++++ block/export/vhost-user-blk-server.c | 5 +++++ util/vhost-user-server.c | 22 ++++++++++++++++++++++ 3 files changed, 32 insertions(+) diff --git a/include/qemu/vhost-user-server.h b/include/qemu/vhost-user-server.h index XXXXXXX..XXXXXXX 100644 --- a/include/qemu/vhost-user-server.h +++ b/include/qemu/vhost-user-server.h @@ -XXX,XX +XXX,XX @@ typedef struct { const VuDevIface *vu_iface; /* Protected by ctx lock */ + unsigned int refcount; + bool wait_idle; VuDev vu_dev; QIOChannel *ioc; /* The I/O channel with the client */ QIOChannelSocket *sioc; /* The underlying data channel with the client */ @@ -XXX,XX +XXX,XX @@ bool vhost_user_server_start(VuServer *server, void vhost_user_server_stop(VuServer *server); +void vhost_user_server_ref(VuServer *server); +void vhost_user_server_unref(VuServer *server); + void vhost_user_server_attach_aio_context(VuServer *server, AioContext *ctx); void vhost_user_server_detach_aio_context(VuServer *server); diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/vhost-user-blk-server.c +++ b/block/export/vhost-user-blk-server.c @@ -XXX,XX +XXX,XX @@ vu_blk_discard_write_zeroes(VuBlkExport *vexp, struct iovec *iov, return VIRTIO_BLK_S_IOERR; } +/* Called with server refcount increased, must decrease before returning */ static void coroutine_fn vu_blk_virtio_process_req(void *opaque) { VuBlkReq *req = opaque; @@ -XXX,XX +XXX,XX @@ static void coroutine_fn vu_blk_virtio_process_req(void *opaque) } vu_blk_req_complete(req); + vhost_user_server_unref(server); return; err: free(req); + vhost_user_server_unref(server); } static void vu_blk_process_vq(VuDev *vu_dev, int idx) @@ -XXX,XX +XXX,XX @@ static void vu_blk_process_vq(VuDev *vu_dev, int idx) Coroutine *co = qemu_coroutine_create(vu_blk_virtio_process_req, req); + + vhost_user_server_ref(server); qemu_coroutine_enter(co); } } diff --git a/util/vhost-user-server.c b/util/vhost-user-server.c index XXXXXXX..XXXXXXX 100644 --- a/util/vhost-user-server.c +++ b/util/vhost-user-server.c @@ -XXX,XX +XXX,XX @@ static void panic_cb(VuDev *vu_dev, const char *buf) error_report("vu_panic: %s", buf); } +void vhost_user_server_ref(VuServer *server) +{ + assert(!server->wait_idle); + server->refcount++; +} + +void vhost_user_server_unref(VuServer *server) +{ + server->refcount--; + if (server->wait_idle && !server->refcount) { + aio_co_wake(server->co_trip); + } +} + static bool coroutine_fn vu_message_read(VuDev *vu_dev, int conn_fd, VhostUserMsg *vmsg) { @@ -XXX,XX +XXX,XX @@ static coroutine_fn void vu_client_trip(void *opaque) /* Keep running */ } + if (server->refcount) { + /* Wait for requests to complete before we can unmap the memory */ + server->wait_idle = true; + qemu_coroutine_yield(); + server->wait_idle = false; + } + assert(server->refcount == 0); + vu_deinit(vu_dev); /* vu_deinit() should have called remove_watch() */ -- 2.31.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> In order to safely maintain a mixture of #ifdef'ry with if-else-if ladder, rearrange the last statement (!mode) first. Since it is mutually exclusive with the other conditions, checking it first doesn't make any logical difference, but allows to add #ifdef'ry around in a more cleanly way. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220201112655.344373-2-f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/export/fuse.c | 41 +++++++++++++++++++++-------------------- 1 file changed, 21 insertions(+), 20 deletions(-) diff --git a/block/export/fuse.c b/block/export/fuse.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -XXX,XX +XXX,XX @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode, length = MIN(length, blk_len - offset); } - if (mode & FALLOC_FL_PUNCH_HOLE) { + if (!mode) { + /* We can only fallocate at the EOF with a truncate */ + if (offset < blk_len) { + fuse_reply_err(req, EOPNOTSUPP); + return; + } + + if (offset > blk_len) { + /* No preallocation needed here */ + ret = fuse_do_truncate(exp, offset, true, PREALLOC_MODE_OFF); + if (ret < 0) { + fuse_reply_err(req, -ret); + return; + } + } + + ret = fuse_do_truncate(exp, offset + length, true, + PREALLOC_MODE_FALLOC); + } + else if (mode & FALLOC_FL_PUNCH_HOLE) { if (!(mode & FALLOC_FL_KEEP_SIZE)) { fuse_reply_err(req, EINVAL); return; @@ -XXX,XX +XXX,XX @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode, } while (ret == 0 && length > 0); } #endif /* CONFIG_FALLOCATE_ZERO_RANGE */ - else if (!mode) { - /* We can only fallocate at the EOF with a truncate */ - if (offset < blk_len) { - fuse_reply_err(req, EOPNOTSUPP); - return; - } - - if (offset > blk_len) { - /* No preallocation needed here */ - ret = fuse_do_truncate(exp, offset, true, PREALLOC_MODE_OFF); - if (ret < 0) { - fuse_reply_err(req, -ret); - return; - } - } - - ret = fuse_do_truncate(exp, offset + length, true, - PREALLOC_MODE_FALLOC); - } else { + else { ret = -EOPNOTSUPP; } -- 2.31.1
From: Philippe Mathieu-Daudé <f4bug@amsat.org> When building on FreeBSD we get: [816/6851] Compiling C object libblockdev.fa.p/block_export_fuse.c.o ../block/export/fuse.c:628:16: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE' if (mode & FALLOC_FL_KEEP_SIZE) { ^ ../block/export/fuse.c:651:16: error: use of undeclared identifier 'FALLOC_FL_PUNCH_HOLE' if (mode & FALLOC_FL_PUNCH_HOLE) { ^ ../block/export/fuse.c:652:22: error: use of undeclared identifier 'FALLOC_FL_KEEP_SIZE' if (!(mode & FALLOC_FL_KEEP_SIZE)) { ^ 3 errors generated. FAILED: libblockdev.fa.p/block_export_fuse.c.o Meson indeed reported FALLOC_FL_PUNCH_HOLE is not available: C compiler for the host machine: cc (clang 10.0.1 "FreeBSD clang version 10.0.1") Checking for function "fallocate" : NO Checking for function "posix_fallocate" : YES Header <linux/falloc.h> has symbol "FALLOC_FL_PUNCH_HOLE" : NO Header <linux/falloc.h> has symbol "FALLOC_FL_ZERO_RANGE" : NO ... Similarly to commit 304332039 ("block/export/fuse.c: fix musl build"), guard the code requiring FALLOC_FL_KEEP_SIZE / FALLOC_FL_PUNCH_HOLE definitions under CONFIG_FALLOCATE_PUNCH_HOLE #ifdef'ry. Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Message-Id: <20220201112655.344373-3-f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/export/fuse.c | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/block/export/fuse.c b/block/export/fuse.c index XXXXXXX..XXXXXXX 100644 --- a/block/export/fuse.c +++ b/block/export/fuse.c @@ -XXX,XX +XXX,XX @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode, return; } +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE if (mode & FALLOC_FL_KEEP_SIZE) { length = MIN(length, blk_len - offset); } +#endif /* CONFIG_FALLOCATE_PUNCH_HOLE */ if (!mode) { /* We can only fallocate at the EOF with a truncate */ @@ -XXX,XX +XXX,XX @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode, ret = fuse_do_truncate(exp, offset + length, true, PREALLOC_MODE_FALLOC); } +#ifdef CONFIG_FALLOCATE_PUNCH_HOLE else if (mode & FALLOC_FL_PUNCH_HOLE) { if (!(mode & FALLOC_FL_KEEP_SIZE)) { fuse_reply_err(req, EINVAL); @@ -XXX,XX +XXX,XX @@ static void fuse_fallocate(fuse_req_t req, fuse_ino_t inode, int mode, length -= size; } while (ret == 0 && length > 0); } +#endif /* CONFIG_FALLOCATE_PUNCH_HOLE */ #ifdef CONFIG_FALLOCATE_ZERO_RANGE else if (mode & FALLOC_FL_ZERO_RANGE) { if (!(mode & FALLOC_FL_KEEP_SIZE) && offset + length > blk_len) { -- 2.31.1
From: Emanuele Giuseppe Esposito <eesposit@redhat.com> The comment "disk I/O throttling" doesn't make any sense at all any more. It was added in commit 0563e191516 to describe bdrv_io_limits_enable()/disable(), which were removed in commit 97148076, so the comment is just a forgotten leftover. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com> Message-Id: <20220131125615.74612-1-eesposit@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ typedef unsigned int BdrvChildRole; char *bdrv_perm_names(uint64_t perm); uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission qapi_perm); -/* disk I/O throttling */ void bdrv_init(void); void bdrv_init_with_whitelist(void); bool bdrv_uses_whitelist(void); -- 2.31.1
From: Hanna Reitz <hreitz@redhat.com> We did not add documentation to the storage daemon's man page for fuse's allow-other option when it was introduced, so do that now. Fixes: 8fc54f9428b9763f800 ("export/fuse: Add allow-other option") Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220131103124.20325-1-hreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- docs/tools/qemu-storage-daemon.rst | 9 +++++++-- storage-daemon/qemu-storage-daemon.c | 2 +- 2 files changed, 8 insertions(+), 3 deletions(-) diff --git a/docs/tools/qemu-storage-daemon.rst b/docs/tools/qemu-storage-daemon.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/tools/qemu-storage-daemon.rst +++ b/docs/tools/qemu-storage-daemon.rst @@ -XXX,XX +XXX,XX @@ Standard options: .. option:: --export [type=]nbd,id=<id>,node-name=<node-name>[,name=<export-name>][,writable=on|off][,bitmap=<name>] --export [type=]vhost-user-blk,id=<id>,node-name=<node-name>,addr.type=unix,addr.path=<socket-path>[,writable=on|off][,logical-block-size=<block-size>][,num-queues=<num-queues>] --export [type=]vhost-user-blk,id=<id>,node-name=<node-name>,addr.type=fd,addr.str=<fd>[,writable=on|off][,logical-block-size=<block-size>][,num-queues=<num-queues>] - --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>[,growable=on|off][,writable=on|off] + --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>[,growable=on|off][,writable=on|off][,allow-other=on|off|auto] is a block export definition. ``node-name`` is the block node that should be exported. ``writable`` determines whether or not the export allows write @@ -XXX,XX +XXX,XX @@ Standard options: mounted). Consequently, applications that have opened the given file before the export became active will continue to see its original content. If ``growable`` is set, writes after the end of the exported file will grow the - block node to fit. + block node to fit. The ``allow-other`` option controls whether users other + than the user running the process will be allowed to access the export. Note + that enabling this option as a non-root user requires enabling the + user_allow_other option in the global fuse.conf configuration file. Setting + ``allow-other`` to auto (the default) will try enabling this option, and on + error fall back to disabling it. .. option:: --monitor MONITORDEF diff --git a/storage-daemon/qemu-storage-daemon.c b/storage-daemon/qemu-storage-daemon.c index XXXXXXX..XXXXXXX 100644 --- a/storage-daemon/qemu-storage-daemon.c +++ b/storage-daemon/qemu-storage-daemon.c @@ -XXX,XX +XXX,XX @@ static void help(void) "\n" #ifdef CONFIG_FUSE " --export [type=]fuse,id=<id>,node-name=<node-name>,mountpoint=<file>\n" -" [,growable=on|off][,writable=on|off]\n" +" [,growable=on|off][,writable=on|off][,allow-other=on|off|auto]\n" " export the specified block node over FUSE\n" "\n" #endif /* CONFIG_FUSE */ -- 2.31.1
From: Hanna Reitz <hreitz@redhat.com> qemu-img convert documents the backing file and backing format options as follows: [-B backing_file [-F backing_fmt]] whereas qemu-img create has this: [-b backing_file] [-F backing_fmt] That is, for convert, we document that -F cannot be given without -B, while for create, way say that they are independent. Indeed, it is technically possible to give -F without -b, because it is left to the block driver to decide whether this is an error or not, so sometimes it is: $ qemu-img create -f qed -F qed test.qed 64M Formatting 'test.qed', fmt=qed size=67108864 backing_fmt=qed [...] And sometimes it is not: $ qemu-img create -f qcow2 -F qcow2 test.qcow2 64M Formatting 'test.qcow2', fmt=qcow2 cluster_size=65536 [...] qemu-img: test.qcow2: Backing format cannot be used without backing file Generally, it does not make much sense, though, and users should only give -F with -b, so document it that way, as we have already done for qemu-img convert (commit 1899bf47375ad40555dcdff12ba49b4b8b82df38). Reported-by: Tingting Mao <timao@redhat.com> Signed-off-by: Hanna Reitz <hreitz@redhat.com> Message-Id: <20220131135908.32393-1-hreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- docs/tools/qemu-img.rst | 2 +- qemu-img-cmds.hx | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst index XXXXXXX..XXXXXXX 100644 --- a/docs/tools/qemu-img.rst +++ b/docs/tools/qemu-img.rst @@ -XXX,XX +XXX,XX @@ Command description: ``--skip-broken-bitmaps`` is also specified to copy only the consistent bitmaps. -.. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE] [-F BACKING_FMT] [-u] [-o OPTIONS] FILENAME [SIZE] +.. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE [-F BACKING_FMT]] [-u] [-o OPTIONS] FILENAME [SIZE] Create the new disk image *FILENAME* of size *SIZE* and format *FMT*. Depending on the file format, you can add one or more *OPTIONS* diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index XXXXXXX..XXXXXXX 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -XXX,XX +XXX,XX @@ SRST ERST DEF("create", img_create, - "create [--object objectdef] [-q] [-f fmt] [-b backing_file] [-F backing_fmt] [-u] [-o options] filename [size]") + "create [--object objectdef] [-q] [-f fmt] [-b backing_file [-F backing_fmt]] [-u] [-o options] filename [size]") SRST -.. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE] [-F BACKING_FMT] [-u] [-o OPTIONS] FILENAME [SIZE] +.. option:: create [--object OBJECTDEF] [-q] [-f FMT] [-b BACKING_FILE [-F BACKING_FMT]] [-u] [-o OPTIONS] FILENAME [SIZE] ERST DEF("dd", img_dd, -- 2.31.1
From: Peter Lieven <pl@kamp.de> the assumption that we can't hit a hole if we do not diff against a snapshot was wrong. We can see a hole in an image if we diff against base if there exists an older snapshot of the image and we have discarded blocks in the image where the snapshot has data. Fix this by simply handling a hole like an unallocated area. There are no callbacks for unallocated areas so just bail out if we hit a hole. Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b Suggested-by: Ilya Dryomov <idryomov@gmail.com> Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20220113144426.4036493-2-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/rbd.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index XXXXXXX..XXXXXXX 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -XXX,XX +XXX,XX @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len, RBDDiffIterateReq *req = opaque; assert(req->offs + req->bytes <= offs); - /* - * we do not diff against a snapshot so we should never receive a callback - * for a hole. - */ - assert(exists); + + /* treat a hole like an unallocated area and bail out */ + if (!exists) { + return 0; + } if (!req->exists && offs > req->offs) { /* -- 2.31.1
From: Peter Lieven <pl@kamp.de> librbd had a bug until early 2022 that affected all versions of ceph that supported fast-diff. This bug results in reporting of incorrect offsets if the offset parameter to rbd_diff_iterate2 is not object aligned. This patch works around this bug for pre Quincy versions of librbd. Fixes: 0347a8fd4c3faaedf119be04c197804be40a384b Cc: qemu-stable@nongnu.org Signed-off-by: Peter Lieven <pl@kamp.de> Message-Id: <20220113144426.4036493-3-pl@kamp.de> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Tested-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/rbd.c | 42 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index XXXXXXX..XXXXXXX 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, int status, r; RBDDiffIterateReq req = { .offs = offset }; uint64_t features, flags; + uint64_t head = 0; assert(offset + bytes <= s->image_size); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, return status; } - r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true, +#if LIBRBD_VERSION_CODE < LIBRBD_VERSION(1, 17, 0) + /* + * librbd had a bug until early 2022 that affected all versions of ceph that + * supported fast-diff. This bug results in reporting of incorrect offsets + * if the offset parameter to rbd_diff_iterate2 is not object aligned. + * Work around this bug by rounding down the offset to object boundaries. + * This is OK because we call rbd_diff_iterate2 with whole_object = true. + * However, this workaround only works for non cloned images with default + * striping. + * + * See: https://tracker.ceph.com/issues/53784 + */ + + /* check if RBD image has non-default striping enabled */ + if (features & RBD_FEATURE_STRIPINGV2) { + return status; + } + +#pragma GCC diagnostic push +#pragma GCC diagnostic ignored "-Wdeprecated-declarations" + /* + * check if RBD image is a clone (= has a parent). + * + * rbd_get_parent_info is deprecated from Nautilus onwards, but the + * replacement rbd_get_parent is not present in Luminous and Mimic. + */ + if (rbd_get_parent_info(s->image, NULL, 0, NULL, 0, NULL, 0) != -ENOENT) { + return status; + } +#pragma GCC diagnostic pop + + head = req.offs & (s->object_size - 1); + req.offs -= head; + bytes += head; +#endif + + r = rbd_diff_iterate2(s->image, NULL, req.offs, bytes, true, true, qemu_rbd_diff_iterate_cb, &req); if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) { return status; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs, status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID; } - *pnum = req.bytes; + assert(req.bytes > head); + *pnum = req.bytes - head; return status; } -- 2.31.1
The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a: Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +0000) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to d570177b50c389f379f93183155a27d44856ab46: qemu-img: Change info key names for protocol nodes (2023-02-01 16:52:33 +0100) v4: - Fixed the 'qemu-img-close-errors' test case to run only on Linux and only with the file protocol, use qemu-io instead of truncate v3: - Make the compiler happier on BSD and CentOS Stream 8 v2: - Rebased to resolve merge conflicts in coroutine.h ---------------------------------------------------------------- Block layer patches - qemu-img info: Show protocol-level information - Move more functions to coroutines - Make coroutine annotations ready for static analysis - qemu-img: Fix exit code for errors closing the image - qcow2 bitmaps: Fix theoretical corruption in error path - pflash: Only load non-zero parts of backend image to save memory - Code cleanup and test case improvements ---------------------------------------------------------------- Alberto Faria (2): coroutine: annotate coroutine_fn for libclang block: Add no_coroutine_fn and coroutine_mixed_fn marker Emanuele Giuseppe Esposito (14): block-coroutine-wrapper: support void functions block: Convert bdrv_io_plug() to co_wrapper block: Convert bdrv_io_unplug() to co_wrapper block: Convert bdrv_is_inserted() to co_wrapper block: Rename refresh_total_sectors to bdrv_refresh_total_sectors block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed block-backend: use bdrv_getlength instead of blk_getlength block: use bdrv_co_refresh_total_sectors when possible block: Convert bdrv_get_allocated_file_size() to co_wrapper block: Convert bdrv_get_info() to co_wrapper_mixed block: Convert bdrv_eject() to co_wrapper block: Convert bdrv_lock_medium() to co_wrapper block: Convert bdrv_debug_event() to co_wrapper_mixed block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate() Hanna Reitz (12): block: Improve empty format-specific info dump block/file: Add file-specific image info block/vmdk: Change extent info type block: Split BlockNodeInfo off of ImageInfo qemu-img: Use BlockNodeInfo block/qapi: Let bdrv_query_image_info() recurse block/qapi: Introduce BlockGraphInfo block/qapi: Add indentation to bdrv_node_info_dump() iotests: Filter child node information iotests/106, 214, 308: Read only one size line qemu-img: Let info print block graph qemu-img: Change info key names for protocol nodes Kevin Wolf (4): qcow2: Fix theoretical corruption in store_bitmap() error path qemu-img commit: Report errors while closing the image qemu-img bitmap: Report errors while closing the image qemu-iotests: Test qemu-img bitmap/commit exit code on error Paolo Bonzini (2): qemu-io: do not reinvent the blk_pwrite_zeroes wheel block: remove bdrv_coroutine_enter Philippe Mathieu-Daudé (1): block/nbd: Add missing <qemu/bswap.h> include Thomas Huth (2): tests/qemu-iotests/312: Mark "quorum" as required driver tests/qemu-iotests/262: Check for availability of "blkverify" first Xiang Zheng (1): pflash: Only read non-zero parts of backend image qapi/block-core.json | 123 +++++++- include/block/block-common.h | 11 +- include/block/block-io.h | 41 ++- include/block/block_int-common.h | 26 +- include/block/block_int-io.h | 5 +- include/block/nbd.h | 1 + include/block/qapi.h | 14 +- include/qemu/osdep.h | 44 +++ include/sysemu/block-backend-io.h | 31 +- block.c | 88 +++--- block/blkdebug.c | 11 +- block/blkio.c | 15 +- block/blklogwrites.c | 6 +- block/blkreplay.c | 6 +- block/blkverify.c | 6 +- block/block-backend.c | 38 +-- block/commit.c | 4 +- block/copy-on-read.c | 18 +- block/crypto.c | 14 +- block/curl.c | 10 +- block/file-posix.c | 137 +++++---- block/file-win32.c | 18 +- block/filter-compress.c | 20 +- block/gluster.c | 23 +- block/io.c | 76 ++--- block/iscsi.c | 17 +- block/mirror.c | 6 +- block/monitor/block-hmp-cmds.c | 2 +- block/nbd.c | 8 +- block/nfs.c | 4 +- block/null.c | 13 +- block/nvme.c | 14 +- block/preallocate.c | 16 +- block/qapi.c | 317 ++++++++++++++++----- block/qcow.c | 5 +- block/qcow2-bitmap.c | 5 +- block/qcow2-refcount.c | 2 +- block/qcow2.c | 17 +- block/qed.c | 11 +- block/quorum.c | 8 +- block/raw-format.c | 25 +- block/rbd.c | 9 +- block/replication.c | 6 +- block/ssh.c | 4 +- block/throttle.c | 6 +- block/vdi.c | 7 +- block/vhdx.c | 5 +- block/vmdk.c | 22 +- block/vpc.c | 5 +- blockdev.c | 8 +- hw/block/block.c | 36 ++- hw/scsi/scsi-disk.c | 5 + qemu-img.c | 100 +++++-- qemu-io-cmds.c | 62 +--- tests/unit/test-block-iothread.c | 3 + scripts/block-coroutine-wrapper.py | 20 +- tests/qemu-iotests/iotests.py | 18 +- block/meson.build | 1 + tests/qemu-iotests/065 | 2 +- tests/qemu-iotests/106 | 4 +- tests/qemu-iotests/214 | 6 +- tests/qemu-iotests/262 | 3 +- tests/qemu-iotests/302.out | 5 + tests/qemu-iotests/308 | 4 +- tests/qemu-iotests/312 | 1 + tests/qemu-iotests/common.filter | 22 +- tests/qemu-iotests/common.rc | 22 +- tests/qemu-iotests/tests/qemu-img-close-errors | 96 +++++++ tests/qemu-iotests/tests/qemu-img-close-errors.out | 23 ++ 69 files changed, 1209 insertions(+), 552 deletions(-) create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out