:p
atchew
Login
The following changes since commit b34181056c04e05db6c632063012beaee7006a37: Merge remote-tracking branch 'remotes/rth/tags/pull-sh4-20180709' into staging (2018-07-09 22:44:22 +0100) are available in the git repository at: git://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to cd47d792d7a27a57f4b621e2ff1ed8f4e83de1e9: block: Use common write req handling in truncate (2018-07-10 16:46:22 +0200) ---------------------------------------------------------------- Block layer patches: - Copy offloading fixes for when the copy increases the image size - Temporary revert of the removal of deprecated -drive options - Fix request serialisation in the image fleecing scenario - Fix copy-on-read crash with unaligned image size - Fix another drain crash ---------------------------------------------------------------- Ari Sundholm (2): qapi/block-core.json: Add missing documentation for blklogwrites log-append option block/blklogwrites: Make sure the log sector size is not too small Cornelia Huck (4): Revert "block: Remove dead deprecation warning code" Revert "block: Remove deprecated -drive option serial" Revert "block: Remove deprecated -drive option addr" Revert "block: Remove deprecated -drive geometry options" Fam Zheng (11): iotests: 222: Don't run with luks block: Prefix file driver trace points with "file_" block: Add copy offloading trace points block: Use BdrvChild to discard block: Use uint64_t for BdrvTrackedRequest byte fields block: Extract common write req handling block: Fix handling of image enlarging write block: Use common req handling for discard block: Use common req handling in copy offloading block: Fix bdrv_co_truncate overlap check block: Use common write req handling in truncate Kevin Wolf (3): block: Poll after drain on attaching a node test-bdrv-drain: Test bdrv_append() to drained node block: Fix copy-on-read crash with partial final cluster Vladimir Sementsov-Ogievskiy (4): block/io: fix copy_range block: split flags in copy_range block: add BDRV_REQ_SERIALISING flag block/backup: fix fleecing scheme: use serialized writes qapi/block-core.json | 2 + include/block/block.h | 41 +++++- include/block/block_int.h | 21 ++- include/hw/block/block.h | 1 + include/sysemu/block-backend.h | 3 +- include/sysemu/blockdev.h | 3 + block.c | 2 +- block/backup.c | 20 ++- block/blkdebug.c | 2 +- block/blklogwrites.c | 7 +- block/blkreplay.c | 2 +- block/block-backend.c | 8 +- block/copy-on-read.c | 2 +- block/file-posix.c | 25 ++-- block/file-win32.c | 2 +- block/io.c | 318 ++++++++++++++++++++++++++++------------- block/iscsi.c | 12 +- block/mirror.c | 2 +- block/qcow2-refcount.c | 2 +- block/qcow2.c | 20 +-- block/raw-format.c | 26 ++-- block/throttle.c | 2 +- blockdev.c | 110 ++++++++++++++ device-hotplug.c | 4 + hw/block/block.c | 27 ++++ hw/block/nvme.c | 1 + hw/block/virtio-blk.c | 1 + hw/ide/qdev.c | 1 + hw/scsi/scsi-disk.c | 1 + hw/usb/dev-storage.c | 1 + qemu-img.c | 2 +- tests/ahci-test.c | 6 +- tests/hd-geo-test.c | 37 ++++- tests/ide-test.c | 8 +- tests/test-bdrv-drain.c | 43 ++++++ block/trace-events | 10 +- hmp-commands.hx | 1 + qemu-doc.texi | 15 ++ qemu-options.hx | 14 +- tests/qemu-iotests/197 | 9 ++ tests/qemu-iotests/197.out | 8 ++ tests/qemu-iotests/222 | 2 + 42 files changed, 647 insertions(+), 177 deletions(-)
Commit dcf94a23b1 ('block: Don't poll in parent drain callbacks') removed polling in bdrv_child_cb_drained_begin() on the grounds that the original bdrv_drain() already will poll and BdrvChildRole.drained_begin calls must not cause graph changes (and therefore must not call aio_poll() or the recursion through the graph will break. This reasoning is correct for calls through bdrv_do_drained_begin(). However, BdrvChildRole.drained_begin is also called when a node that is already in a drained section (i.e. bdrv_do_drained_begin() has already returned and therefore can't poll any more) is attached to a new parent. In this case, we must explicitly poll to have all requests completed before the drained new child can be attached to the parent. In bdrv_replace_child_noperm(), we know that we're not inside the recursion of bdrv_do_drained_begin() because graph changes are not allowed there, and bdrv_replace_child_noperm() is a graph change. The call of BdrvChildRole.drained_begin() must therefore be followed by a BDRV_POLL_WHILE() that waits for the completion of requests. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 8 ++++++++ include/block/block_int.h | 3 +++ block.c | 2 +- block/io.c | 26 ++++++++++++++++++++------ 4 files changed, 32 insertions(+), 7 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore, bool ignore_bds_parents); /** + * bdrv_parent_drained_begin_single: + * + * Begin a quiesced section for the parent of @c. If @poll is true, wait for + * any pending activity to cease. + */ +void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll); + +/** * bdrv_parent_drained_end: * * End a quiesced section of all users of @bs. This is part of diff --git a/include/block/block_int.h b/include/block/block_int.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -XXX,XX +XXX,XX @@ struct BdrvChildRole { * requests after returning from .drained_begin() until .drained_end() is * called. * + * These functions must not change the graph (and therefore also must not + * call aio_poll(), which could change the graph indirectly). + * * Note that this can be nested. If drained_begin() was called twice, new * I/O is allowed only after drained_end() was called twice, too. */ diff --git a/block.c b/block.c index XXXXXXX..XXXXXXX 100644 --- a/block.c +++ b/block.c @@ -XXX,XX +XXX,XX @@ static void bdrv_replace_child_noperm(BdrvChild *child, } assert(num >= 0); for (i = 0; i < num; i++) { - child->role->drained_begin(child); + bdrv_parent_drained_begin_single(child, true); } } diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ void bdrv_parent_drained_begin(BlockDriverState *bs, BdrvChild *ignore, if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) { continue; } - if (c->role->drained_begin) { - c->role->drained_begin(c); - } + bdrv_parent_drained_begin_single(c, false); } } @@ -XXX,XX +XXX,XX @@ void bdrv_parent_drained_end(BlockDriverState *bs, BdrvChild *ignore, } } +static bool bdrv_parent_drained_poll_single(BdrvChild *c) +{ + if (c->role->drained_poll) { + return c->role->drained_poll(c); + } + return false; +} + static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore, bool ignore_bds_parents) { @@ -XXX,XX +XXX,XX @@ static bool bdrv_parent_drained_poll(BlockDriverState *bs, BdrvChild *ignore, if (c == ignore || (ignore_bds_parents && c->role->parent_is_bds)) { continue; } - if (c->role->drained_poll) { - busy |= c->role->drained_poll(c); - } + busy |= bdrv_parent_drained_poll_single(c); } return busy; } +void bdrv_parent_drained_begin_single(BdrvChild *c, bool poll) +{ + if (c->role->drained_begin) { + c->role->drained_begin(c); + } + if (poll) { + BDRV_POLL_WHILE(c->bs, bdrv_parent_drained_poll_single(c)); + } +} + static void bdrv_merge_limits(BlockLimits *dst, const BlockLimits *src) { dst->opt_transfer = MAX(dst->opt_transfer, src->opt_transfer); -- 2.13.6
Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- tests/test-bdrv-drain.c | 43 +++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 43 insertions(+) diff --git a/tests/test-bdrv-drain.c b/tests/test-bdrv-drain.c index XXXXXXX..XXXXXXX 100644 --- a/tests/test-bdrv-drain.c +++ b/tests/test-bdrv-drain.c @@ -XXX,XX +XXX,XX @@ static void test_detach_by_driver_cb(void) test_detach_indirect(false); } +static void test_append_to_drained(void) +{ + BlockBackend *blk; + BlockDriverState *base, *overlay; + BDRVTestState *base_s, *overlay_s; + + blk = blk_new(BLK_PERM_ALL, BLK_PERM_ALL); + base = bdrv_new_open_driver(&bdrv_test, "base", BDRV_O_RDWR, &error_abort); + base_s = base->opaque; + blk_insert_bs(blk, base, &error_abort); + + overlay = bdrv_new_open_driver(&bdrv_test, "overlay", BDRV_O_RDWR, + &error_abort); + overlay_s = overlay->opaque; + + do_drain_begin(BDRV_DRAIN, base); + g_assert_cmpint(base->quiesce_counter, ==, 1); + g_assert_cmpint(base_s->drain_count, ==, 1); + g_assert_cmpint(base->in_flight, ==, 0); + + /* Takes ownership of overlay, so we don't have to unref it later */ + bdrv_append(overlay, base, &error_abort); + g_assert_cmpint(base->in_flight, ==, 0); + g_assert_cmpint(overlay->in_flight, ==, 0); + + g_assert_cmpint(base->quiesce_counter, ==, 1); + g_assert_cmpint(base_s->drain_count, ==, 1); + g_assert_cmpint(overlay->quiesce_counter, ==, 1); + g_assert_cmpint(overlay_s->drain_count, ==, 1); + + do_drain_end(BDRV_DRAIN, base); + + g_assert_cmpint(base->quiesce_counter, ==, 0); + g_assert_cmpint(base_s->drain_count, ==, 0); + g_assert_cmpint(overlay->quiesce_counter, ==, 0); + g_assert_cmpint(overlay_s->drain_count, ==, 0); + + bdrv_unref(base); + blk_unref(blk); +} + int main(int argc, char **argv) { int ret; @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) g_test_add_func("/bdrv-drain/detach/parent_cb", test_detach_by_parent_cb); g_test_add_func("/bdrv-drain/detach/driver_cb", test_detach_by_driver_cb); + g_test_add_func("/bdrv-drain/attach/drain", test_append_to_drained); + ret = g_test_run(); qemu_event_destroy(&done_event); return ret; -- 2.13.6
If the virtual disk size isn't aligned to full clusters, bdrv_co_do_copy_on_readv() may get pnum == 0 before having the full cluster completed, which will let it run into an assertion failure: qemu-io: block/io.c:1203: bdrv_co_do_copy_on_readv: Assertion `skip_bytes < pnum' failed. Check for EOF, assert that we read at least as much as the read request originally wanted to have (which is true at EOF because otherwise bdrv_check_byte_request() would already have returned an error) and return success early even though we couldn't copy the full cluster. Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 6 ++++++ tests/qemu-iotests/197 | 9 +++++++++ tests/qemu-iotests/197.out | 8 ++++++++ 3 files changed, 23 insertions(+) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_do_copy_on_readv(BdrvChild *child, pnum = MIN(cluster_bytes, max_transfer); } + /* Stop at EOF if the image ends in the middle of the cluster */ + if (ret == 0 && pnum == 0) { + assert(progress >= bytes); + break; + } + assert(skip_bytes < pnum); if (ret <= 0) { diff --git a/tests/qemu-iotests/197 b/tests/qemu-iotests/197 index XXXXXXX..XXXXXXX 100755 --- a/tests/qemu-iotests/197 +++ b/tests/qemu-iotests/197 @@ -XXX,XX +XXX,XX @@ $QEMU_IO -f qcow2 -c map "$TEST_WRAP" _check_test_img $QEMU_IMG compare -f $IMGFMT -F qcow2 "$TEST_IMG" "$TEST_WRAP" +echo +echo '=== Partial final cluster ===' +echo + +_make_test_img 1024 +$QEMU_IO -f $IMGFMT -C -c 'read 0 1024' "$TEST_IMG" | _filter_qemu_io +$QEMU_IO -f $IMGFMT -c map "$TEST_IMG" +_check_test_img + # success, all done echo '*** done' status=0 diff --git a/tests/qemu-iotests/197.out b/tests/qemu-iotests/197.out index XXXXXXX..XXXXXXX 100644 --- a/tests/qemu-iotests/197.out +++ b/tests/qemu-iotests/197.out @@ -XXX,XX +XXX,XX @@ can't open device TEST_DIR/t.wrap.qcow2: Can't use copy-on-read on read-only dev 1023.938 MiB (0x3fff0000) bytes not allocated at offset 3 GiB (0xc0010000) No errors were found on the image. Images are identical. + +=== Partial final cluster === + +Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1024 +read 1024/1024 bytes at offset 0 +1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec) +1 KiB (0x400) bytes allocated at offset 0 bytes (0x0) +No errors were found on the image. *** done -- 2.13.6
From: Fam Zheng <famz@redhat.com> Luks needs special parameters to operate the image. Since this test is focusing on image fleecing, skip skip that format. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- tests/qemu-iotests/222 | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tests/qemu-iotests/222 b/tests/qemu-iotests/222 index XXXXXXX..XXXXXXX 100644 --- a/tests/qemu-iotests/222 +++ b/tests/qemu-iotests/222 @@ -XXX,XX +XXX,XX @@ import iotests from iotests import log, qemu_img, qemu_io, qemu_io_silent iotests.verify_platform(['linux']) +iotests.verify_image_format(supported_fmts=['qcow2', 'qcow', 'qed', 'vmdk', + 'vhdx', 'raw']) patterns = [("0x5d", "0", "64k"), ("0xd5", "1M", "64k"), -- 2.13.6
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Here two things are fixed: 1. Architecture On each recursion step, we go to the child of src or dst, only for one of them. So, it's wrong to create tracked requests for both on each step. It leads to tracked requests duplication. 2. Wait for serializing requests on write path independently of BDRV_REQ_NO_SERIALISING Before commit 9ded4a01149 "backup: Use copy offloading", BDRV_REQ_NO_SERIALISING was used for only one case: read in copy-on-write operation during backup. Also, the flag was handled only on read path (in bdrv_co_preadv and bdrv_aligned_preadv). After 9ded4a01149, flag is used for not waiting serializing operations on backup target (in same case of copy-on-write operation). This behavior change is unsubstantiated and potentially dangerous, let's drop it and add additional asserts and documentation. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 12 ++++++++++++ block/io.c | 42 +++++++++++++++++++++++++++--------------- 2 files changed, 39 insertions(+), 15 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ typedef enum { * opened with BDRV_O_UNMAP. */ BDRV_REQ_MAY_UNMAP = 0x4, + + /* + * The BDRV_REQ_NO_SERIALISING flag is only valid for reads and means that + * we don't want wait_serialising_requests() during the read operation. + * + * This flag is used for backup copy-on-write operations, when we need to + * read old data before write (write notifier triggered). It is okay since + * we already waited for other serializing requests in the initiating write + * (see bdrv_aligned_pwritev), and it is necessary if the initiating write + * is already serializing (without the flag, the read would deadlock + * waiting for the serialising write to complete). + */ BDRV_REQ_NO_SERIALISING = 0x8, BDRV_REQ_FUA = 0x10, BDRV_REQ_WRITE_COMPRESSED = 0x20, diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align); + /* BDRV_REQ_NO_SERIALISING is only for read operation */ + assert(!(flags & BDRV_REQ_NO_SERIALISING)); waited = wait_serialising_requests(req); assert(!waited || !req->serialising); assert(req->overlap_offset <= offset); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, BdrvRequestFlags flags, bool recurse_src) { - BdrvTrackedRequest src_req, dst_req; + BdrvTrackedRequest req; int ret; if (!dst || !dst->bs) { @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, || src->bs->encrypted || dst->bs->encrypted) { return -ENOTSUP; } - bdrv_inc_in_flight(src->bs); - bdrv_inc_in_flight(dst->bs); - tracked_request_begin(&src_req, src->bs, src_offset, - bytes, BDRV_TRACKED_READ); - tracked_request_begin(&dst_req, dst->bs, dst_offset, - bytes, BDRV_TRACKED_WRITE); - if (!(flags & BDRV_REQ_NO_SERIALISING)) { - wait_serialising_requests(&src_req); - wait_serialising_requests(&dst_req); - } if (recurse_src) { + bdrv_inc_in_flight(src->bs); + tracked_request_begin(&req, src->bs, src_offset, bytes, + BDRV_TRACKED_READ); + + if (!(flags & BDRV_REQ_NO_SERIALISING)) { + wait_serialising_requests(&req); + } + ret = src->bs->drv->bdrv_co_copy_range_from(src->bs, src, src_offset, dst, dst_offset, bytes, flags); + + tracked_request_end(&req); + bdrv_dec_in_flight(src->bs); } else { + bdrv_inc_in_flight(dst->bs); + tracked_request_begin(&req, dst->bs, dst_offset, bytes, + BDRV_TRACKED_WRITE); + + /* BDRV_REQ_NO_SERIALISING is only for read operation, + * so we ignore it in flags. + */ + wait_serialising_requests(&req); + ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, src, src_offset, dst, dst_offset, bytes, flags); + + tracked_request_end(&req); + bdrv_dec_in_flight(dst->bs); } - tracked_request_end(&src_req); - tracked_request_end(&dst_req); - bdrv_dec_in_flight(src->bs); - bdrv_dec_in_flight(dst->bs); + return ret; } -- 2.13.6
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Pass read flags and write flags separately. This is needed to handle coming BDRV_REQ_NO_SERIALISING clearly in following patches. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 3 ++- include/block/block_int.h | 14 +++++++++---- include/sysemu/block-backend.h | 3 ++- block/backup.c | 2 +- block/block-backend.c | 5 +++-- block/file-posix.c | 21 +++++++++++-------- block/io.c | 46 +++++++++++++++++++++++------------------- block/iscsi.c | 9 ++++++--- block/qcow2.c | 20 +++++++++--------- block/raw-format.c | 24 ++++++++++++++-------- qemu-img.c | 2 +- 11 files changed, 90 insertions(+), 59 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host); **/ int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); #endif diff --git a/include/block/block_int.h b/include/block/block_int.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -XXX,XX +XXX,XX @@ struct BlockDriver { BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags); + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); /* Map [offset, offset + nbytes) range onto a child of bs to copy data to, * and invoke bdrv_co_copy_range_to(child, src, ...), or perform the copy @@ -XXX,XX +XXX,XX @@ struct BlockDriver { BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags); + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); /* * Building block for bdrv_block_status[_above] and @@ -XXX,XX +XXX,XX @@ void blockdev_close_all_bdrv_states(void); int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags); + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); int refresh_total_sectors(BlockDriverState *bs, int64_t hint); diff --git a/include/sysemu/block-backend.h b/include/sysemu/block-backend.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/block-backend.h +++ b/include/sysemu/block-backend.h @@ -XXX,XX +XXX,XX @@ void blk_unregister_buf(BlockBackend *blk, void *host); int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, BlockBackend *blk_out, int64_t off_out, - int bytes, BdrvRequestFlags flags); + int bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags); #endif diff --git a/block/backup.c b/block/backup.c index XXXXXXX..XXXXXXX 100644 --- a/block/backup.c +++ b/block/backup.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, hbitmap_reset(job->copy_bitmap, start / job->cluster_size, nr_clusters); ret = blk_co_copy_range(blk, start, job->target, start, nbytes, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0); + is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0); if (ret < 0) { trace_backup_do_cow_copy_range_fail(job, start, ret); hbitmap_set(job->copy_bitmap, start / job->cluster_size, diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ void blk_unregister_buf(BlockBackend *blk, void *host) int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, BlockBackend *blk_out, int64_t off_out, - int bytes, BdrvRequestFlags flags) + int bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int r; r = blk_check_byte_request(blk_in, off_in, bytes); @@ -XXX,XX +XXX,XX @@ int coroutine_fn blk_co_copy_range(BlockBackend *blk_in, int64_t off_in, } return bdrv_co_copy_range(blk_in->root, off_in, blk_out->root, off_out, - bytes, flags); + bytes, read_flags, write_flags); } diff --git a/block/file-posix.c b/block/file-posix.c index XXXXXXX..XXXXXXX 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -XXX,XX +XXX,XX @@ static void raw_abort_perm_update(BlockDriverState *bs) raw_handle_perm_lock(bs, RAW_PL_ABORT, 0, 0, NULL); } -static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) +static int coroutine_fn raw_co_copy_range_from( + BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, + BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, + BdrvRequestFlags read_flags, BdrvRequestFlags write_flags) { - return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); + return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); } static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVRawState *s = bs->opaque; BDRVRawState *src_s; diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ void bdrv_unregister_buf(BlockDriverState *bs, void *host) } } -static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, - uint64_t src_offset, - BdrvChild *dst, - uint64_t dst_offset, - uint64_t bytes, - BdrvRequestFlags flags, - bool recurse_src) +static int coroutine_fn bdrv_co_copy_range_internal( + BdrvChild *src, uint64_t src_offset, BdrvChild *dst, + uint64_t dst_offset, uint64_t bytes, + BdrvRequestFlags read_flags, BdrvRequestFlags write_flags, + bool recurse_src) { BdrvTrackedRequest req; int ret; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, if (ret) { return ret; } - if (flags & BDRV_REQ_ZERO_WRITE) { - return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, flags); + if (write_flags & BDRV_REQ_ZERO_WRITE) { + return bdrv_co_pwrite_zeroes(dst, dst_offset, bytes, write_flags); } if (!src || !src->bs) { @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ); - if (!(flags & BDRV_REQ_NO_SERIALISING)) { + if (!(read_flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(&req); } ret = src->bs->drv->bdrv_co_copy_range_from(src->bs, src, src_offset, dst, dst_offset, - bytes, flags); + bytes, + read_flags, write_flags); tracked_request_end(&req); bdrv_dec_in_flight(src->bs); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, tracked_request_begin(&req, dst->bs, dst_offset, bytes, BDRV_TRACKED_WRITE); - /* BDRV_REQ_NO_SERIALISING is only for read operation, - * so we ignore it in flags. - */ + /* BDRV_REQ_NO_SERIALISING is only for read operation */ + assert(!(write_flags & BDRV_REQ_NO_SERIALISING)); wait_serialising_requests(&req); ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, src, src_offset, dst, dst_offset, - bytes, flags); + bytes, + read_flags, write_flags); tracked_request_end(&req); bdrv_dec_in_flight(dst->bs); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal(BdrvChild *src, * semantics. */ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, - bytes, flags, true); + bytes, read_flags, write_flags, true); } /* Copy range from @src to @dst. @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, * semantics. */ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, - bytes, flags, false); + bytes, read_flags, write_flags, false); } int coroutine_fn bdrv_co_copy_range(BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { return bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, - bytes, flags); + bytes, read_flags, write_flags); } static void bdrv_parent_cb_resize(BlockDriverState *bs) diff --git a/block/iscsi.c b/block/iscsi.c index XXXXXXX..XXXXXXX 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn iscsi_co_copy_range_from(BlockDriverState *bs, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags) + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { - return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, flags); + return bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); } static struct scsi_task *iscsi_xcopy_task(int param_len) @@ -XXX,XX +XXX,XX @@ static int coroutine_fn iscsi_co_copy_range_to(BlockDriverState *bs, BdrvChild *dst, uint64_t dst_offset, uint64_t bytes, - BdrvRequestFlags flags) + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { IscsiLun *dst_lun = dst->bs->opaque; IscsiLun *src_lun; diff --git a/block/qcow2.c b/block/qcow2.c index XXXXXXX..XXXXXXX 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn qcow2_co_copy_range_from(BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVQcow2State *s = bs->opaque; int ret; unsigned int cur_bytes; /* number of bytes in current iteration */ BdrvChild *child = NULL; - BdrvRequestFlags cur_flags; + BdrvRequestFlags cur_write_flags; assert(!bs->encrypted); qemu_co_mutex_lock(&s->lock); @@ -XXX,XX +XXX,XX @@ qcow2_co_copy_range_from(BlockDriverState *bs, uint64_t copy_offset = 0; /* prepare next request */ cur_bytes = MIN(bytes, INT_MAX); - cur_flags = flags; + cur_write_flags = write_flags; ret = qcow2_get_cluster_offset(bs, src_offset, &cur_bytes, ©_offset); if (ret < 0) { @@ -XXX,XX +XXX,XX @@ qcow2_co_copy_range_from(BlockDriverState *bs, if (bs->backing && bs->backing->bs) { int64_t backing_length = bdrv_getlength(bs->backing->bs); if (src_offset >= backing_length) { - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; } else { child = bs->backing; cur_bytes = MIN(cur_bytes, backing_length - src_offset); copy_offset = src_offset; } } else { - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; } break; case QCOW2_CLUSTER_ZERO_PLAIN: case QCOW2_CLUSTER_ZERO_ALLOC: - cur_flags |= BDRV_REQ_ZERO_WRITE; + cur_write_flags |= BDRV_REQ_ZERO_WRITE; break; case QCOW2_CLUSTER_COMPRESSED: @@ -XXX,XX +XXX,XX @@ qcow2_co_copy_range_from(BlockDriverState *bs, ret = bdrv_co_copy_range_from(child, copy_offset, dst, dst_offset, - cur_bytes, cur_flags); + cur_bytes, read_flags, cur_write_flags); qemu_co_mutex_lock(&s->lock); if (ret < 0) { goto out; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn qcow2_co_copy_range_to(BlockDriverState *bs, BdrvChild *src, uint64_t src_offset, BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + uint64_t bytes, BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { BDRVQcow2State *s = bs->opaque; int offset_in_cluster; @@ -XXX,XX +XXX,XX @@ qcow2_co_copy_range_to(BlockDriverState *bs, ret = bdrv_co_copy_range_to(src, src_offset, bs->file, cluster_offset + offset_in_cluster, - cur_bytes, flags); + cur_bytes, read_flags, write_flags); qemu_co_mutex_lock(&s->lock); if (ret < 0) { goto fail; diff --git a/block/raw-format.c b/block/raw-format.c index XXXXXXX..XXXXXXX 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -XXX,XX +XXX,XX @@ static int raw_probe_geometry(BlockDriverState *bs, HDGeometry *geo) } static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int ret; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_copy_range_from(BlockDriverState *bs, return ret; } return bdrv_co_copy_range_from(bs->file, src_offset, dst, dst_offset, - bytes, flags); + bytes, read_flags, write_flags); } static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, - BdrvChild *src, uint64_t src_offset, - BdrvChild *dst, uint64_t dst_offset, - uint64_t bytes, BdrvRequestFlags flags) + BdrvChild *src, + uint64_t src_offset, + BdrvChild *dst, + uint64_t dst_offset, + uint64_t bytes, + BdrvRequestFlags read_flags, + BdrvRequestFlags write_flags) { int ret; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_copy_range_to(BlockDriverState *bs, return ret; } return bdrv_co_copy_range_to(src, src_offset, bs->file, dst_offset, bytes, - flags); + read_flags, write_flags); } BlockDriver bdrv_raw = { diff --git a/qemu-img.c b/qemu-img.c index XXXXXXX..XXXXXXX 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn convert_co_copy_range(ImgConvertState *s, int64_t sector ret = blk_co_copy_range(blk, offset, s->target, sector_num << BDRV_SECTOR_BITS, - n << BDRV_SECTOR_BITS, 0); + n << BDRV_SECTOR_BITS, 0, 0); if (ret < 0) { return ret; } -- 2.13.6
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Serialized writes should be used in copy-on-write of backup(sync=none) for image fleecing scheme. We need to change an assert in bdrv_aligned_pwritev, added in 28de2dcd88de. The assert may fail now, because call to wait_serialising_requests here may become first call to it for this request with serializing flag set. It occurs if the request is aligned (otherwise, we should already set serializing flag before calling bdrv_aligned_pwritev and correspondingly waited for all intersecting requests). However, for aligned requests, we should not care about outdating of previously read data, as there no such data. Therefore, let's just update an assert to not care about aligned requests. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 14 +++++++++++++- block/io.c | 28 +++++++++++++++++++++++++++- 2 files changed, 40 insertions(+), 2 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ typedef enum { * content. */ BDRV_REQ_WRITE_UNCHANGED = 0x40, + /* + * BDRV_REQ_SERIALISING forces request serialisation for writes. + * It is used to ensure that writes to the backing file of a backup process + * target cannot race with a read of the backup target that defers to the + * backing file. + * + * Note, that BDRV_REQ_SERIALISING is _not_ opposite in meaning to + * BDRV_REQ_NO_SERIALISING. A more descriptive name for the latter might be + * _DO_NOT_WAIT_FOR_SERIALISING, except that is too long. + */ + BDRV_REQ_SERIALISING = 0x80, + /* Mask of valid flags */ - BDRV_REQ_MASK = 0x7f, + BDRV_REQ_MASK = 0xff, } BdrvRequestFlags; typedef struct BlockSizes { diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align) req->overlap_bytes = MAX(req->overlap_bytes, overlap_bytes); } +static bool is_request_serialising_and_aligned(BdrvTrackedRequest *req) +{ + /* + * If the request is serialising, overlap_offset and overlap_bytes are set, + * so we can check if the request is aligned. Otherwise, don't care and + * return false. + */ + + return req->serialising && (req->offset == req->overlap_offset) && + (req->bytes == req->overlap_bytes); +} + /** * Round a region to cluster boundaries */ @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_preadv(BdrvChild *child, mark_request_serialising(req, bdrv_get_cluster_size(bs)); } + /* BDRV_REQ_SERIALISING is only for write operation */ + assert(!(flags & BDRV_REQ_SERIALISING)); + if (!(flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(req); } @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, /* BDRV_REQ_NO_SERIALISING is only for read operation */ assert(!(flags & BDRV_REQ_NO_SERIALISING)); + + if (flags & BDRV_REQ_SERIALISING) { + mark_request_serialising(req, bdrv_get_cluster_size(bs)); + } + waited = wait_serialising_requests(req); - assert(!waited || !req->serialising); + assert(!waited || !req->serialising || + is_request_serialising_and_aligned(req)); assert(req->overlap_offset <= offset); assert(offset + bytes <= req->overlap_offset + req->overlap_bytes); if (flags & BDRV_REQ_WRITE_UNCHANGED) { @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal( tracked_request_begin(&req, src->bs, src_offset, bytes, BDRV_TRACKED_READ); + /* BDRV_REQ_SERIALISING is only for write operation */ + assert(!(read_flags & BDRV_REQ_SERIALISING)); if (!(read_flags & BDRV_REQ_NO_SERIALISING)) { wait_serialising_requests(&req); } @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal( /* BDRV_REQ_NO_SERIALISING is only for read operation */ assert(!(write_flags & BDRV_REQ_NO_SERIALISING)); + if (write_flags & BDRV_REQ_SERIALISING) { + mark_request_serialising(&req, bdrv_get_cluster_size(dst->bs)); + } wait_serialising_requests(&req); ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, -- 2.13.6
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Fleecing scheme works as follows: we want a kind of temporary snapshot of active drive A. We create temporary image B, with B->backing = A. Then we start backup(sync=none) from A to B. From this point, B reads as point-in-time snapshot of A (A continues to be active drive, accepting guest IO). This scheme needs some additional synchronization between reads from B and backup COW operations, otherwise, the following situation is theoretically possible: (assume B is qcow2, client is NBD client, reading from B) 1. client starts reading and take qcow2 mutex in qcow2_co_preadv, and goes up to l2 table loading (assume cache miss) 2) guest write => backup COW => qcow2 write => try to take qcow2 mutex => waiting 3. l2 table loaded, we see that cluster is UNALLOCATED, go to "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before bdrv_co_preadv(bs->backing, ...) 4) aha, mutex unlocked, backup COW continues, and we finally finish guest write and change cluster in our active disk A 5. actually, do bdrv_co_preadv(bs->backing, ...) and read _new updated_ data. To avoid this, let's make backup writes serializing, to not intersect with reads from B. Note: we expand range of handled cases from (sync=none and B->backing = A) to just (A in backing chain of B), to finally allow safe reading from B during backup for all cases when A in backing chain of B, i.e. B formally looks like point-in-time snapshot of A. Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/backup.c | 20 ++++++++++++++------ 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/block/backup.c b/block/backup.c index XXXXXXX..XXXXXXX 100644 --- a/block/backup.c +++ b/block/backup.c @@ -XXX,XX +XXX,XX @@ typedef struct BackupBlockJob { HBitmap *copy_bitmap; bool use_copy_range; int64_t copy_range_size; + + bool serialize_target_writes; } BackupBlockJob; static const BlockJobDriver backup_job_driver; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, QEMUIOVector qiov; BlockBackend *blk = job->common.blk; int nbytes; + int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; + int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0; hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1); nbytes = MIN(job->cluster_size, job->len - start); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, iov.iov_len = nbytes; qemu_iovec_init_external(&qiov, &iov, 1); - ret = blk_co_preadv(blk, start, qiov.size, &qiov, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0); + ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags); if (ret < 0) { trace_backup_do_cow_read_fail(job, start, ret); if (error_is_read) { @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job, if (qemu_iovec_is_zero(&qiov)) { ret = blk_co_pwrite_zeroes(job->target, start, - qiov.size, BDRV_REQ_MAY_UNMAP); + qiov.size, write_flags | BDRV_REQ_MAY_UNMAP); } else { ret = blk_co_pwritev(job->target, start, - qiov.size, &qiov, - job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0); + qiov.size, &qiov, write_flags | + (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0)); } if (ret < 0) { trace_backup_do_cow_write_fail(job, start, ret); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, int nr_clusters; BlockBackend *blk = job->common.blk; int nbytes; + int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0; + int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0; assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size)); nbytes = MIN(job->copy_range_size, end - start); @@ -XXX,XX +XXX,XX @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job, hbitmap_reset(job->copy_bitmap, start / job->cluster_size, nr_clusters); ret = blk_co_copy_range(blk, start, job->target, start, nbytes, - is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0); + read_flags, write_flags); if (ret < 0) { trace_backup_do_cow_copy_range_fail(job, start, ret); hbitmap_set(job->copy_bitmap, start / job->cluster_size, @@ -XXX,XX +XXX,XX @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs, sync_bitmap : NULL; job->compress = compress; + /* Detect image-fleecing (and similar) schemes */ + job->serialize_target_writes = bdrv_chain_contains(target, bs); + /* If there is no backing file on the target, we cannot rely on COW if our * backup cluster size is smaller than the target cluster size. Even for * targets with a backing file, try to avoid COW if possible. */ -- 2.13.6
From: Ari Sundholm <ari@tuxera.com> This was accidentally omitted. Thanks to Eric Blake for spotting this. Signed-off-by: Ari Sundholm <ari@tuxera.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- qapi/block-core.json | 2 ++ 1 file changed, 2 insertions(+) diff --git a/qapi/block-core.json b/qapi/block-core.json index XXXXXXX..XXXXXXX 100644 --- a/qapi/block-core.json +++ b/qapi/block-core.json @@ -XXX,XX +XXX,XX @@ # @log-sector-size: sector size used in logging writes to @file, determines # granularity of offsets and sizes of writes (default: 512) # +# @log-append: append to an existing log (default: false) +# # @log-super-update-interval: interval of write requests after which the log # super block is updated to disk (default: 4096) # -- 2.13.6
From: Ari Sundholm <ari@tuxera.com> The sector size needs to be large enough to accommodate the data structures for the log super block and log write entries. This was previously not properly checked, which made it possible to cause QEMU to badly misbehave. Signed-off-by: Ari Sundholm <ari@tuxera.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/blklogwrites.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/block/blklogwrites.c b/block/blklogwrites.c index XXXXXXX..XXXXXXX 100644 --- a/block/blklogwrites.c +++ b/block/blklogwrites.c @@ -XXX,XX +XXX,XX @@ static inline uint32_t blk_log_writes_log2(uint32_t value) static inline bool blk_log_writes_sector_size_valid(uint32_t sector_size) { - return sector_size < (1ull << 24) && is_power_of_2(sector_size); + return is_power_of_2(sector_size) && + sector_size >= sizeof(struct log_write_super) && + sector_size >= sizeof(struct log_write_entry) && + sector_size < (1ull << 24); } static uint64_t blk_log_writes_find_cur_log_sector(BdrvChild *log, -- 2.13.6
From: Cornelia Huck <cohuck@redhat.com> This reverts commit 6266e900b8083945cb766b45c124fb3c42932cb3. Some deprecated -drive options were still in use by libvirt, only fixed with libvirt commit b340c6c614 ("qemu: format serial and geometry on frontend disk device"), which is not yet in any released version of libvirt. So let's hold off removing the deprecated options for one more QEMU release. Reported-by: Christian Borntraeger <borntraeger@de.ibm.com> Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- blockdev.c | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/blockdev.c b/blockdev.c index XXXXXXX..XXXXXXX 100644 --- a/blockdev.c +++ b/blockdev.c @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) const char *filename; Error *local_err = NULL; int i; + const char *deprecated[] = { + }; /* Change legacy command line options into QMP ones */ static const struct { @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) goto fail; } + /* Other deprecated options */ + if (!qtest_enabled()) { + for (i = 0; i < ARRAY_SIZE(deprecated); i++) { + if (qemu_opt_get(legacy_opts, deprecated[i]) != NULL) { + error_report("'%s' is deprecated, please use the corresponding " + "option of '-device' instead", deprecated[i]); + } + } + } + /* Media type */ value = qemu_opt_get(legacy_opts, "media"); if (value) { -- 2.13.6
From: Cornelia Huck <cohuck@redhat.com> This reverts commit b0083267444a5e0f28391f6c2831a539f878d424. Hold off removing this for one more QEMU release (current libvirt release still uses it.) Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/hw/block/block.h | 1 + include/sysemu/blockdev.h | 1 + block/block-backend.c | 1 + blockdev.c | 10 ++++++++++ hw/block/block.c | 13 +++++++++++++ hw/block/nvme.c | 1 + hw/block/virtio-blk.c | 1 + hw/ide/qdev.c | 1 + hw/scsi/scsi-disk.c | 1 + hw/usb/dev-storage.c | 1 + tests/ahci-test.c | 6 +++--- tests/ide-test.c | 8 ++++---- qemu-doc.texi | 5 +++++ qemu-options.hx | 6 +++++- 14 files changed, 48 insertions(+), 8 deletions(-) diff --git a/include/hw/block/block.h b/include/hw/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/hw/block/block.h +++ b/include/hw/block/block.h @@ -XXX,XX +XXX,XX @@ static inline unsigned int get_physical_block_exp(BlockConf *conf) /* Configuration helpers */ +void blkconf_serial(BlockConf *conf, char **serial); bool blkconf_geometry(BlockConf *conf, int *trans, unsigned cyls_max, unsigned heads_max, unsigned secs_max, Error **errp); diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/blockdev.h +++ b/include/sysemu/blockdev.h @@ -XXX,XX +XXX,XX @@ struct DriveInfo { bool is_default; /* Added by default_drive() ? */ int media_cd; QemuOpts *opts; + char *serial; QTAILQ_ENTRY(DriveInfo) next; }; diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ static void drive_info_del(DriveInfo *dinfo) return; } qemu_opts_del(dinfo->opts); + g_free(dinfo->serial); g_free(dinfo); } diff --git a/blockdev.c b/blockdev.c index XXXXXXX..XXXXXXX 100644 --- a/blockdev.c +++ b/blockdev.c @@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_legacy_drive_opts = { .type = QEMU_OPT_STRING, .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)", },{ + .name = "serial", + .type = QEMU_OPT_STRING, + .help = "disk serial number", + },{ .name = "file", .type = QEMU_OPT_STRING, .help = "file name", @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) const char *werror, *rerror; bool read_only = false; bool copy_on_read; + const char *serial; const char *filename; Error *local_err = NULL; int i; const char *deprecated[] = { + "serial" }; /* Change legacy command line options into QMP ones */ @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) goto fail; } + /* Serial number */ + serial = qemu_opt_get(legacy_opts, "serial"); + /* no id supplied -> create one */ if (qemu_opts_id(all_opts) == NULL) { char *new_id; @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo->type = type; dinfo->bus = bus_id; dinfo->unit = unit_id; + dinfo->serial = g_strdup(serial); blk_set_legacy_dinfo(blk, dinfo); diff --git a/hw/block/block.c b/hw/block/block.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/block.c +++ b/hw/block/block.c @@ -XXX,XX +XXX,XX @@ #include "qapi/qapi-types-block.h" #include "qemu/error-report.h" +void blkconf_serial(BlockConf *conf, char **serial) +{ + DriveInfo *dinfo; + + if (!*serial) { + /* try to fall back to value set with legacy -drive serial=... */ + dinfo = blk_legacy_dinfo(conf->blk); + if (dinfo) { + *serial = g_strdup(dinfo->serial); + } + } +} + void blkconf_blocksizes(BlockConf *conf) { BlockBackend *blk = conf->blk; diff --git a/hw/block/nvme.c b/hw/block/nvme.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/nvme.c +++ b/hw/block/nvme.c @@ -XXX,XX +XXX,XX @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp) return; } + blkconf_serial(&n->conf, &n->serial); if (!n->serial) { error_setg(errp, "serial property not set"); return; diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/virtio-blk.c +++ b/hw/block/virtio-blk.c @@ -XXX,XX +XXX,XX @@ static void virtio_blk_device_realize(DeviceState *dev, Error **errp) return; } + blkconf_serial(&conf->conf, &conf->serial); if (!blkconf_apply_backend_options(&conf->conf, blk_is_read_only(conf->conf.blk), true, errp)) { diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index XXXXXXX..XXXXXXX 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -XXX,XX +XXX,XX @@ static void ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind, Error **errp) return; } + blkconf_serial(&dev->conf, &dev->serial); if (kind != IDE_CD) { if (!blkconf_geometry(&dev->conf, &dev->chs_trans, 65535, 16, 255, errp)) { diff --git a/hw/scsi/scsi-disk.c b/hw/scsi/scsi-disk.c index XXXXXXX..XXXXXXX 100644 --- a/hw/scsi/scsi-disk.c +++ b/hw/scsi/scsi-disk.c @@ -XXX,XX +XXX,XX @@ static void scsi_realize(SCSIDevice *dev, Error **errp) return; } + blkconf_serial(&s->qdev.conf, &s->serial); blkconf_blocksizes(&s->qdev.conf); if (s->qdev.conf.logical_block_size > diff --git a/hw/usb/dev-storage.c b/hw/usb/dev-storage.c index XXXXXXX..XXXXXXX 100644 --- a/hw/usb/dev-storage.c +++ b/hw/usb/dev-storage.c @@ -XXX,XX +XXX,XX @@ static void usb_msd_storage_realize(USBDevice *dev, Error **errp) return; } + blkconf_serial(&s->conf, &dev->serial); blkconf_blocksizes(&s->conf); if (!blkconf_apply_backend_options(&s->conf, blk_is_read_only(blk), true, errp)) { diff --git a/tests/ahci-test.c b/tests/ahci-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/ahci-test.c +++ b/tests/ahci-test.c @@ -XXX,XX +XXX,XX @@ static AHCIQState *ahci_boot(const char *cli, ...) s = ahci_vboot(cli, ap); va_end(ap); } else { - cli = "-drive if=none,id=drive0,file=%s,cache=writeback,format=%s" + cli = "-drive if=none,id=drive0,file=%s,cache=writeback,serial=%s" + ",format=%s" " -M q35 " "-device ide-hd,drive=drive0 " - "-global ide-hd.serial=%s " "-global ide-hd.ver=%s"; - s = ahci_boot(cli, tmp_path, imgfmt, "testdisk", "version"); + s = ahci_boot(cli, tmp_path, "testdisk", imgfmt, "version"); } return s; diff --git a/tests/ide-test.c b/tests/ide-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/ide-test.c +++ b/tests/ide-test.c @@ -XXX,XX +XXX,XX @@ static void test_bmdma_no_busmaster(void) static void test_bmdma_setup(void) { ide_test_start( - "-drive file=%s,if=ide,cache=writeback,format=raw " - "-global ide-hd.serial=%s -global ide-hd.ver=%s", + "-drive file=%s,if=ide,serial=%s,cache=writeback,format=raw " + "-global ide-hd.ver=%s", tmp_path, "testdisk", "version"); qtest_irq_intercept_in(global_qtest, "ioapic"); } @@ -XXX,XX +XXX,XX @@ static void test_identify(void) int ret; ide_test_start( - "-drive file=%s,if=ide,cache=writeback,format=raw " - "-global ide-hd.serial=%s -global ide-hd.ver=%s", + "-drive file=%s,if=ide,serial=%s,cache=writeback,format=raw " + "-global ide-hd.ver=%s", tmp_path, "testdisk", "version"); dev = get_pci_device(&bmdma_bar, &ide_bar); diff --git a/qemu-doc.texi b/qemu-doc.texi index XXXXXXX..XXXXXXX 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -XXX,XX +XXX,XX @@ with ``-device ...,netdev=x''), or ``-nic user,smb=/some/dir'' (for embedded NICs). The new syntax allows different settings to be provided per NIC. +@subsection -drive serial=... (since 2.10.0) + +The drive serial argument is replaced by the the serial argument +that can be specified with the ``-device'' parameter. + @subsection -usbdevice (since 2.10.0) The ``-usbdevice DEV'' argument is now a synonym for setting diff --git a/qemu-options.hx b/qemu-options.hx index XXXXXXX..XXXXXXX 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -XXX,XX +XXX,XX @@ ETEXI DEF("drive", HAS_ARG, QEMU_OPTION_drive, "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" " [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n" - " [,snapshot=on|off][,rerror=ignore|stop|report]\n" + " [,snapshot=on|off][,serial=s][,rerror=ignore|stop|report]\n" " [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n" " [,readonly=on|off][,copy-on-read=on|off]\n" " [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n" @@ -XXX,XX +XXX,XX @@ The default mode is @option{cache=writeback}. Specify which disk @var{format} will be used rather than detecting the format. Can be used to specify format=raw to avoid interpreting an untrusted format header. +@item serial=@var{serial} +This option specifies the serial number to assign to the device. This +parameter is deprecated, use the corresponding parameter of @code{-device} +instead. @item werror=@var{action},rerror=@var{action} Specify which @var{action} to take on write and read errors. Valid actions are: "ignore" (ignore the error and try to continue), "stop" (pause QEMU), -- 2.13.6
From: Cornelia Huck <cohuck@redhat.com> This reverts commit eae3bd1eb7c6b105d30ec06008b3bc3dfc5f45bb. Reverted to avoid conflicts for geometry options revert. Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/sysemu/blockdev.h | 1 + blockdev.c | 17 ++++++++++++++++- device-hotplug.c | 4 ++++ qemu-doc.texi | 5 +++++ qemu-options.hx | 5 ++++- 5 files changed, 30 insertions(+), 2 deletions(-) diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/blockdev.h +++ b/include/sysemu/blockdev.h @@ -XXX,XX +XXX,XX @@ typedef enum { } BlockInterfaceType; struct DriveInfo { + const char *devaddr; BlockInterfaceType type; int bus; int unit; diff --git a/blockdev.c b/blockdev.c index XXXXXXX..XXXXXXX 100644 --- a/blockdev.c +++ b/blockdev.c @@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_legacy_drive_opts = { .type = QEMU_OPT_STRING, .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)", },{ + .name = "addr", + .type = QEMU_OPT_STRING, + .help = "pci address (virtio only)", + },{ .name = "serial", .type = QEMU_OPT_STRING, .help = "disk serial number", @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) DriveMediaType media = MEDIA_DISK; BlockInterfaceType type; int max_devs, bus_id, unit_id, index; + const char *devaddr; const char *werror, *rerror; bool read_only = false; bool copy_on_read; @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) Error *local_err = NULL; int i; const char *deprecated[] = { - "serial" + "serial", "addr" }; /* Change legacy command line options into QMP ones */ @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) } /* Add virtio block device */ + devaddr = qemu_opt_get(legacy_opts, "addr"); + if (devaddr && type != IF_VIRTIO) { + error_report("addr is not supported by this bus type"); + goto fail; + } + if (type == IF_VIRTIO) { QemuOpts *devopts; devopts = qemu_opts_create(qemu_find_opts("device"), NULL, 0, @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) } qemu_opt_set(devopts, "drive", qdict_get_str(bs_opts, "id"), &error_abort); + if (devaddr) { + qemu_opt_set(devopts, "addr", devaddr, &error_abort); + } } filename = qemu_opt_get(legacy_opts, "file"); @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo->type = type; dinfo->bus = bus_id; dinfo->unit = unit_id; + dinfo->devaddr = devaddr; dinfo->serial = g_strdup(serial); blk_set_legacy_dinfo(blk, dinfo); diff --git a/device-hotplug.c b/device-hotplug.c index XXXXXXX..XXXXXXX 100644 --- a/device-hotplug.c +++ b/device-hotplug.c @@ -XXX,XX +XXX,XX @@ void hmp_drive_add(Monitor *mon, const QDict *qdict) if (!dinfo) { goto err; } + if (dinfo->devaddr) { + monitor_printf(mon, "Parameter addr not supported\n"); + goto err; + } switch (dinfo->type) { case IF_NONE: diff --git a/qemu-doc.texi b/qemu-doc.texi index XXXXXXX..XXXXXXX 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -XXX,XX +XXX,XX @@ provided per NIC. The drive serial argument is replaced by the the serial argument that can be specified with the ``-device'' parameter. +@subsection -drive addr=... (since 2.10.0) + +The drive addr argument is replaced by the the addr argument +that can be specified with the ``-device'' parameter. + @subsection -usbdevice (since 2.10.0) The ``-usbdevice DEV'' argument is now a synonym for setting diff --git a/qemu-options.hx b/qemu-options.hx index XXXXXXX..XXXXXXX 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -XXX,XX +XXX,XX @@ ETEXI DEF("drive", HAS_ARG, QEMU_OPTION_drive, "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" " [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n" - " [,snapshot=on|off][,serial=s][,rerror=ignore|stop|report]\n" + " [,snapshot=on|off][,serial=s][,addr=A][,rerror=ignore|stop|report]\n" " [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n" " [,readonly=on|off][,copy-on-read=on|off]\n" " [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n" @@ -XXX,XX +XXX,XX @@ an untrusted format header. This option specifies the serial number to assign to the device. This parameter is deprecated, use the corresponding parameter of @code{-device} instead. +@item addr=@var{addr} +Specify the controller's PCI address (if=virtio only). This parameter is +deprecated, use the corresponding parameter of @code{-device} instead. @item werror=@var{action},rerror=@var{action} Specify which @var{action} to take on write and read errors. Valid actions are: "ignore" (ignore the error and try to continue), "stop" (pause QEMU), -- 2.13.6
From: Cornelia Huck <cohuck@redhat.com> This reverts commit a7aff6dd10b16b67e8b142d0c94c5d92c3fe88f6. Hold off removing this for one more QEMU release (current libvirt release still uses it.) Signed-off-by: Cornelia Huck <cohuck@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/sysemu/blockdev.h | 1 + blockdev.c | 75 ++++++++++++++++++++++++++++++++++++++++++++++- hw/block/block.c | 14 +++++++++ tests/hd-geo-test.c | 37 ++++++++++++++++++----- hmp-commands.hx | 1 + qemu-doc.texi | 5 ++++ qemu-options.hx | 7 ++++- 7 files changed, 131 insertions(+), 9 deletions(-) diff --git a/include/sysemu/blockdev.h b/include/sysemu/blockdev.h index XXXXXXX..XXXXXXX 100644 --- a/include/sysemu/blockdev.h +++ b/include/sysemu/blockdev.h @@ -XXX,XX +XXX,XX @@ struct DriveInfo { int auto_del; /* see blockdev_mark_auto_del() */ bool is_default; /* Added by default_drive() ? */ int media_cd; + int cyls, heads, secs, trans; QemuOpts *opts; char *serial; QTAILQ_ENTRY(DriveInfo) next; diff --git a/blockdev.c b/blockdev.c index XXXXXXX..XXXXXXX 100644 --- a/blockdev.c +++ b/blockdev.c @@ -XXX,XX +XXX,XX @@ QemuOptsList qemu_legacy_drive_opts = { .type = QEMU_OPT_STRING, .help = "interface (ide, scsi, sd, mtd, floppy, pflash, virtio)", },{ + .name = "cyls", + .type = QEMU_OPT_NUMBER, + .help = "number of cylinders (ide disk geometry)", + },{ + .name = "heads", + .type = QEMU_OPT_NUMBER, + .help = "number of heads (ide disk geometry)", + },{ + .name = "secs", + .type = QEMU_OPT_NUMBER, + .help = "number of sectors (ide disk geometry)", + },{ + .name = "trans", + .type = QEMU_OPT_STRING, + .help = "chs translation (auto, lba, none)", + },{ .name = "addr", .type = QEMU_OPT_STRING, .help = "pci address (virtio only)", @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) QemuOpts *legacy_opts; DriveMediaType media = MEDIA_DISK; BlockInterfaceType type; + int cyls, heads, secs, translation; int max_devs, bus_id, unit_id, index; const char *devaddr; const char *werror, *rerror; @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) Error *local_err = NULL; int i; const char *deprecated[] = { - "serial", "addr" + "serial", "trans", "secs", "heads", "cyls", "addr" }; /* Change legacy command line options into QMP ones */ @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) type = block_default_type; } + /* Geometry */ + cyls = qemu_opt_get_number(legacy_opts, "cyls", 0); + heads = qemu_opt_get_number(legacy_opts, "heads", 0); + secs = qemu_opt_get_number(legacy_opts, "secs", 0); + + if (cyls || heads || secs) { + if (cyls < 1) { + error_report("invalid physical cyls number"); + goto fail; + } + if (heads < 1) { + error_report("invalid physical heads number"); + goto fail; + } + if (secs < 1) { + error_report("invalid physical secs number"); + goto fail; + } + } + + translation = BIOS_ATA_TRANSLATION_AUTO; + value = qemu_opt_get(legacy_opts, "trans"); + if (value != NULL) { + if (!cyls) { + error_report("'%s' trans must be used with cyls, heads and secs", + value); + goto fail; + } + if (!strcmp(value, "none")) { + translation = BIOS_ATA_TRANSLATION_NONE; + } else if (!strcmp(value, "lba")) { + translation = BIOS_ATA_TRANSLATION_LBA; + } else if (!strcmp(value, "large")) { + translation = BIOS_ATA_TRANSLATION_LARGE; + } else if (!strcmp(value, "rechs")) { + translation = BIOS_ATA_TRANSLATION_RECHS; + } else if (!strcmp(value, "auto")) { + translation = BIOS_ATA_TRANSLATION_AUTO; + } else { + error_report("'%s' invalid translation type", value); + goto fail; + } + } + + if (media == MEDIA_CDROM) { + if (cyls || secs || heads) { + error_report("CHS can't be set with media=cdrom"); + goto fail; + } + } + /* Device address specified by bus/unit or index. * If none was specified, try to find the first free one. */ bus_id = qemu_opt_get_number(legacy_opts, "bus", 0); @@ -XXX,XX +XXX,XX @@ DriveInfo *drive_new(QemuOpts *all_opts, BlockInterfaceType block_default_type) dinfo = g_malloc0(sizeof(*dinfo)); dinfo->opts = all_opts; + dinfo->cyls = cyls; + dinfo->heads = heads; + dinfo->secs = secs; + dinfo->trans = translation; + dinfo->type = type; dinfo->bus = bus_id; dinfo->unit = unit_id; diff --git a/hw/block/block.c b/hw/block/block.c index XXXXXXX..XXXXXXX 100644 --- a/hw/block/block.c +++ b/hw/block/block.c @@ -XXX,XX +XXX,XX @@ bool blkconf_geometry(BlockConf *conf, int *ptrans, unsigned cyls_max, unsigned heads_max, unsigned secs_max, Error **errp) { + DriveInfo *dinfo; + + if (!conf->cyls && !conf->heads && !conf->secs) { + /* try to fall back to value set with legacy -drive cyls=... */ + dinfo = blk_legacy_dinfo(conf->blk); + if (dinfo) { + conf->cyls = dinfo->cyls; + conf->heads = dinfo->heads; + conf->secs = dinfo->secs; + if (ptrans) { + *ptrans = dinfo->trans; + } + } + } if (!conf->cyls && !conf->heads && !conf->secs) { hd_geometry_guess(conf->blk, &conf->cyls, &conf->heads, &conf->secs, diff --git a/tests/hd-geo-test.c b/tests/hd-geo-test.c index XXXXXXX..XXXXXXX 100644 --- a/tests/hd-geo-test.c +++ b/tests/hd-geo-test.c @@ -XXX,XX +XXX,XX @@ static void setup_mbr(int img_idx, MBRcontents mbr) static int setup_ide(int argc, char *argv[], int argv_sz, int ide_idx, const char *dev, int img_idx, - MBRcontents mbr) + MBRcontents mbr, const char *opts) { char *s1, *s2, *s3; @@ -XXX,XX +XXX,XX @@ static int setup_ide(int argc, char *argv[], int argv_sz, s3 = g_strdup(",media=cdrom"); } argc = append_arg(argc, argv, argv_sz, - g_strdup_printf("%s%s%s", s1, s2, s3)); + g_strdup_printf("%s%s%s%s", s1, s2, s3, opts)); g_free(s1); g_free(s2); g_free(s3); @@ -XXX,XX +XXX,XX @@ static void test_ide_mbr(bool use_device, MBRcontents mbr) for (i = 0; i < backend_last; i++) { cur_ide[i] = &hd_chst[i][mbr]; dev = use_device ? (is_hd(cur_ide[i]) ? "ide-hd" : "ide-cd") : NULL; - argc = setup_ide(argc, argv, ARGV_SIZE, i, dev, i, mbr); + argc = setup_ide(argc, argv, ARGV_SIZE, i, dev, i, mbr, ""); } args = g_strjoinv(" ", argv); qtest_start(args); @@ -XXX,XX +XXX,XX @@ static void test_ide_drive_user(const char *dev, bool trans) const CHST expected_chst = { secs / (4 * 32) , 4, 32, trans }; argc = setup_common(argv, ARGV_SIZE); - opts = g_strdup_printf("%s,%scyls=%d,heads=%d,secs=%d", - dev, trans ? "bios-chs-trans=lba," : "", + opts = g_strdup_printf("%s,%s%scyls=%d,heads=%d,secs=%d", + dev ?: "", + trans && dev ? "bios-chs-" : "", + trans ? "trans=lba," : "", expected_chst.cyls, expected_chst.heads, expected_chst.secs); cur_ide[0] = &expected_chst; - argc = setup_ide(argc, argv, ARGV_SIZE, 0, opts, backend_small, mbr_chs); + argc = setup_ide(argc, argv, ARGV_SIZE, + 0, dev ? opts : NULL, backend_small, mbr_chs, + dev ? "" : opts); g_free(opts); args = g_strjoinv(" ", argv); qtest_start(args); @@ -XXX,XX +XXX,XX @@ static void test_ide_drive_user(const char *dev, bool trans) } /* + * Test case: IDE device (if=ide) with explicit CHS + */ +static void test_ide_drive_user_chs(void) +{ + test_ide_drive_user(NULL, false); +} + +/* + * Test case: IDE device (if=ide) with explicit CHS and translation + */ +static void test_ide_drive_user_chst(void) +{ + test_ide_drive_user(NULL, true); +} + +/* * Test case: IDE device (if=none) with explicit CHS */ static void test_ide_device_user_chs(void) @@ -XXX,XX +XXX,XX @@ static void test_ide_drive_cd_0(void) for (i = 0; i <= backend_empty; i++) { ide_idx = backend_empty - i; cur_ide[ide_idx] = &hd_chst[i][mbr_blank]; - argc = setup_ide(argc, argv, ARGV_SIZE, ide_idx, NULL, i, mbr_blank); + argc = setup_ide(argc, argv, ARGV_SIZE, + ide_idx, NULL, i, mbr_blank, ""); } args = g_strjoinv(" ", argv); qtest_start(args); @@ -XXX,XX +XXX,XX @@ int main(int argc, char **argv) qtest_add_func("hd-geo/ide/drive/mbr/blank", test_ide_drive_mbr_blank); qtest_add_func("hd-geo/ide/drive/mbr/lba", test_ide_drive_mbr_lba); qtest_add_func("hd-geo/ide/drive/mbr/chs", test_ide_drive_mbr_chs); + qtest_add_func("hd-geo/ide/drive/user/chs", test_ide_drive_user_chs); + qtest_add_func("hd-geo/ide/drive/user/chst", test_ide_drive_user_chst); qtest_add_func("hd-geo/ide/drive/cd_0", test_ide_drive_cd_0); qtest_add_func("hd-geo/ide/device/mbr/blank", test_ide_device_mbr_blank); qtest_add_func("hd-geo/ide/device/mbr/lba", test_ide_device_mbr_lba); diff --git a/hmp-commands.hx b/hmp-commands.hx index XXXXXXX..XXXXXXX 100644 --- a/hmp-commands.hx +++ b/hmp-commands.hx @@ -XXX,XX +XXX,XX @@ ETEXI .params = "[-n] [[<domain>:]<bus>:]<slot>\n" "[file=file][,if=type][,bus=n]\n" "[,unit=m][,media=d][,index=i]\n" + "[,cyls=c,heads=h,secs=s[,trans=t]]\n" "[,snapshot=on|off][,cache=on|off]\n" "[,readonly=on|off][,copy-on-read=on|off]", .help = "add drive to PCI storage controller", diff --git a/qemu-doc.texi b/qemu-doc.texi index XXXXXXX..XXXXXXX 100644 --- a/qemu-doc.texi +++ b/qemu-doc.texi @@ -XXX,XX +XXX,XX @@ with ``-device ...,netdev=x''), or ``-nic user,smb=/some/dir'' (for embedded NICs). The new syntax allows different settings to be provided per NIC. +@subsection -drive cyls=...,heads=...,secs=...,trans=... (since 2.10.0) + +The drive geometry arguments are replaced by the the geometry arguments +that can be specified with the ``-device'' parameter. + @subsection -drive serial=... (since 2.10.0) The drive serial argument is replaced by the the serial argument diff --git a/qemu-options.hx b/qemu-options.hx index XXXXXXX..XXXXXXX 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -XXX,XX +XXX,XX @@ ETEXI DEF("drive", HAS_ARG, QEMU_OPTION_drive, "-drive [file=file][,if=type][,bus=n][,unit=m][,media=d][,index=i]\n" + " [,cyls=c,heads=h,secs=s[,trans=t]][,snapshot=on|off]\n" " [,cache=writethrough|writeback|none|directsync|unsafe][,format=f]\n" - " [,snapshot=on|off][,serial=s][,addr=A][,rerror=ignore|stop|report]\n" + " [,serial=s][,addr=A][,rerror=ignore|stop|report]\n" " [,werror=ignore|stop|report|enospc][,id=name][,aio=threads|native]\n" " [,readonly=on|off][,copy-on-read=on|off]\n" " [,discard=ignore|unmap][,detect-zeroes=on|off|unmap]\n" @@ -XXX,XX +XXX,XX @@ This option defines where is connected the drive by using an index in the list of available connectors of a given interface type. @item media=@var{media} This option defines the type of the media: disk or cdrom. +@item cyls=@var{c},heads=@var{h},secs=@var{s}[,trans=@var{t}] +Force disk physical geometry and the optional BIOS translation (trans=none or +lba). These parameters are deprecated, use the corresponding parameters +of @code{-device} instead. @item snapshot=@var{snapshot} @var{snapshot} is "on" or "off" and controls snapshot mode for the given drive (see @option{-snapshot}). -- 2.13.6
From: Fam Zheng <famz@redhat.com> With in one module, trace points usually have a common prefix named after the module name. paio_submit and paio_submit_co are the only two trace points so far in the two file protocol drivers. As we are adding more, having a common prefix here is better so that trace points can be enabled with a glob. Rename them. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/file-posix.c | 2 +- block/file-win32.c | 2 +- block/trace-events | 4 ++-- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/block/file-posix.c b/block/file-posix.c index XXXXXXX..XXXXXXX 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -XXX,XX +XXX,XX @@ static int paio_submit_co_full(BlockDriverState *bs, int fd, assert(qiov->size == bytes); } - trace_paio_submit_co(offset, bytes, type); + trace_file_paio_submit_co(offset, bytes, type); pool = aio_get_thread_pool(bdrv_get_aio_context(bs)); return thread_pool_submit_co(pool, aio_worker, acb); } diff --git a/block/file-win32.c b/block/file-win32.c index XXXXXXX..XXXXXXX 100644 --- a/block/file-win32.c +++ b/block/file-win32.c @@ -XXX,XX +XXX,XX @@ static BlockAIOCB *paio_submit(BlockDriverState *bs, HANDLE hfile, acb->aio_nbytes = count; acb->aio_offset = offset; - trace_paio_submit(acb, opaque, offset, count, type); + trace_file_paio_submit(acb, opaque, offset, count, type); pool = aio_get_thread_pool(bdrv_get_aio_context(bs)); return thread_pool_submit_aio(pool, aio_worker, acb, cb, opaque); } diff --git a/block/trace-events b/block/trace-events index XXXXXXX..XXXXXXX 100644 --- a/block/trace-events +++ b/block/trace-events @@ -XXX,XX +XXX,XX @@ qmp_block_stream(void *bs, void *job) "bs %p job %p" # block/file-win32.c # block/file-posix.c -paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d" -paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d" +file_paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d" +file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d" # block/qcow2.c qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d" -- 2.13.6
From: Fam Zheng <famz@redhat.com> A few trace points that can help reveal what is happening in a copy offloading I/O path. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/file-posix.c | 2 ++ block/io.c | 4 ++++ block/iscsi.c | 3 +++ block/trace-events | 6 ++++++ 4 files changed, 15 insertions(+) diff --git a/block/file-posix.c b/block/file-posix.c index XXXXXXX..XXXXXXX 100644 --- a/block/file-posix.c +++ b/block/file-posix.c @@ -XXX,XX +XXX,XX @@ static ssize_t handle_aiocb_copy_range(RawPosixAIOData *aiocb) ssize_t ret = copy_file_range(aiocb->aio_fildes, &in_off, aiocb->aio_fd2, &out_off, bytes, 0); + trace_file_copy_file_range(aiocb->bs, aiocb->aio_fildes, in_off, + aiocb->aio_fd2, out_off, bytes, 0, ret); if (ret == 0) { /* No progress (e.g. when beyond EOF), let the caller fall back to * buffer I/O. */ diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_copy_range_from(BdrvChild *src, uint64_t src_offset, BdrvRequestFlags read_flags, BdrvRequestFlags write_flags) { + trace_bdrv_co_copy_range_from(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, bytes, read_flags, write_flags, true); } @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_copy_range_to(BdrvChild *src, uint64_t src_offset, BdrvRequestFlags read_flags, BdrvRequestFlags write_flags) { + trace_bdrv_co_copy_range_to(src, src_offset, dst, dst_offset, bytes, + read_flags, write_flags); return bdrv_co_copy_range_internal(src, src_offset, dst, dst_offset, bytes, read_flags, write_flags, false); } diff --git a/block/iscsi.c b/block/iscsi.c index XXXXXXX..XXXXXXX 100644 --- a/block/iscsi.c +++ b/block/iscsi.c @@ -XXX,XX +XXX,XX @@ #include "qapi/qmp/qstring.h" #include "crypto/secret.h" #include "scsi/utils.h" +#include "trace.h" /* Conflict between scsi/utils.h and libiscsi! :( */ #define SCSI_XFER_NONE ISCSI_XFER_NONE @@ -XXX,XX +XXX,XX @@ retry: } out_unlock: + + trace_iscsi_xcopy(src_lun, src_offset, dst_lun, dst_offset, bytes, r); g_free(iscsi_task.task); qemu_mutex_unlock(&dst_lun->mutex); g_free(iscsi_task.err_str); diff --git a/block/trace-events b/block/trace-events index XXXXXXX..XXXXXXX 100644 --- a/block/trace-events +++ b/block/trace-events @@ -XXX,XX +XXX,XX @@ bdrv_co_preadv(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs bdrv_co_pwritev(void *bs, int64_t offset, int64_t nbytes, unsigned int flags) "bs %p offset %"PRId64" nbytes %"PRId64" flags 0x%x" bdrv_co_pwrite_zeroes(void *bs, int64_t offset, int count, int flags) "bs %p offset %"PRId64" count %d flags 0x%x" bdrv_co_do_copy_on_readv(void *bs, int64_t offset, unsigned int bytes, int64_t cluster_offset, int64_t cluster_bytes) "bs %p offset %"PRId64" bytes %u cluster_offset %"PRId64" cluster_bytes %"PRId64 +bdrv_co_copy_range_from(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x" +bdrv_co_copy_range_to(void *src, uint64_t src_offset, void *dst, uint64_t dst_offset, uint64_t bytes, int read_flags, int write_flags) "src %p offset %"PRIu64" dst %p offset %"PRIu64" bytes %"PRIu64" rw flags 0x%x 0x%x" # block/stream.c stream_one_iteration(void *s, int64_t offset, uint64_t bytes, int is_allocated) "s %p offset %" PRId64 " bytes %" PRIu64 " is_allocated %d" @@ -XXX,XX +XXX,XX @@ qmp_block_stream(void *bs, void *job) "bs %p job %p" # block/file-posix.c file_paio_submit_co(int64_t offset, int count, int type) "offset %"PRId64" count %d type %d" file_paio_submit(void *acb, void *opaque, int64_t offset, int count, int type) "acb %p opaque %p offset %"PRId64" count %d type %d" +file_copy_file_range(void *bs, int src, int64_t src_off, int dst, int64_t dst_off, int64_t bytes, int flags, int64_t ret) "bs %p src_fd %d offset %"PRIu64" dst_fd %d offset %"PRIu64" bytes %"PRIu64" flags %d ret %"PRId64 # block/qcow2.c qcow2_writev_start_req(void *co, int64_t offset, int bytes) "co %p offset 0x%" PRIx64 " bytes %d" @@ -XXX,XX +XXX,XX @@ nvme_free_req_queue_wait(void *q) "q %p" nvme_cmd_map_qiov(void *s, void *cmd, void *req, void *qiov, int entries) "s %p cmd %p req %p qiov %p entries %d" nvme_cmd_map_qiov_pages(void *s, int i, uint64_t page) "s %p page[%d] 0x%"PRIx64 nvme_cmd_map_qiov_iov(void *s, int i, void *page, int pages) "s %p iov[%d] %p pages %d" + +# block/iscsi.c +iscsi_xcopy(void *src_lun, uint64_t src_off, void *dst_lun, uint64_t dst_off, uint64_t bytes, int ret) "src_lun %p offset %"PRIu64" dst_lun %p offset %"PRIu64" bytes %"PRIu64" ret %d" -- 2.13.6
From: Fam Zheng <famz@redhat.com> Other I/O functions are already using a BdrvChild pointer in the API, so make discard do the same. It makes it possible to initiate the same permission checks before doing I/O, and much easier to share the helper functions for this, which will be added and used by write, truncate and copy range paths. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block.h | 4 ++-- block/blkdebug.c | 2 +- block/blklogwrites.c | 2 +- block/blkreplay.c | 2 +- block/block-backend.c | 2 +- block/copy-on-read.c | 2 +- block/io.c | 18 +++++++++--------- block/mirror.c | 2 +- block/qcow2-refcount.c | 2 +- block/raw-format.c | 2 +- block/throttle.c | 2 +- 11 files changed, 20 insertions(+), 20 deletions(-) diff --git a/include/block/block.h b/include/block/block.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block.h +++ b/include/block/block.h @@ -XXX,XX +XXX,XX @@ AioWait *bdrv_get_aio_wait(BlockDriverState *bs); bdrv_get_aio_context(bs_), \ cond); }) -int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int bytes); -int bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes); +int bdrv_pdiscard(BdrvChild *child, int64_t offset, int bytes); +int bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int bytes); int bdrv_has_zero_init_1(BlockDriverState *bs); int bdrv_has_zero_init(BlockDriverState *bs); bool bdrv_unallocated_blocks_are_zero(BlockDriverState *bs); diff --git a/block/blkdebug.c b/block/blkdebug.c index XXXXXXX..XXXXXXX 100644 --- a/block/blkdebug.c +++ b/block/blkdebug.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkdebug_co_pdiscard(BlockDriverState *bs, return err; } - return bdrv_co_pdiscard(bs->file->bs, offset, bytes); + return bdrv_co_pdiscard(bs->file, offset, bytes); } static int coroutine_fn blkdebug_co_block_status(BlockDriverState *bs, diff --git a/block/blklogwrites.c b/block/blklogwrites.c index XXXXXXX..XXXXXXX 100644 --- a/block/blklogwrites.c +++ b/block/blklogwrites.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn blk_log_writes_co_do_file_flush(BlkLogWritesFileReq *fr) static int coroutine_fn blk_log_writes_co_do_file_pdiscard(BlkLogWritesFileReq *fr) { - return bdrv_co_pdiscard(fr->bs->file->bs, fr->offset, fr->bytes); + return bdrv_co_pdiscard(fr->bs->file, fr->offset, fr->bytes); } static int coroutine_fn diff --git a/block/blkreplay.c b/block/blkreplay.c index XXXXXXX..XXXXXXX 100755 --- a/block/blkreplay.c +++ b/block/blkreplay.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn blkreplay_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes) { uint64_t reqid = blkreplay_next_id(); - int ret = bdrv_co_pdiscard(bs->file->bs, offset, bytes); + int ret = bdrv_co_pdiscard(bs->file, offset, bytes); block_request_create(reqid, bs, qemu_coroutine_self()); qemu_coroutine_yield(); diff --git a/block/block-backend.c b/block/block-backend.c index XXXXXXX..XXXXXXX 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -XXX,XX +XXX,XX @@ int blk_co_pdiscard(BlockBackend *blk, int64_t offset, int bytes) return ret; } - return bdrv_co_pdiscard(blk_bs(blk), offset, bytes); + return bdrv_co_pdiscard(blk->root, offset, bytes); } int blk_co_flush(BlockBackend *blk) diff --git a/block/copy-on-read.c b/block/copy-on-read.c index XXXXXXX..XXXXXXX 100644 --- a/block/copy-on-read.c +++ b/block/copy-on-read.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn cor_co_pwrite_zeroes(BlockDriverState *bs, static int coroutine_fn cor_co_pdiscard(BlockDriverState *bs, int64_t offset, int bytes) { - return bdrv_co_pdiscard(bs->file->bs, offset, bytes); + return bdrv_co_pdiscard(bs->file, offset, bytes); } diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ int bdrv_flush(BlockDriverState *bs) } typedef struct DiscardCo { - BlockDriverState *bs; + BdrvChild *child; int64_t offset; int bytes; int ret; @@ -XXX,XX +XXX,XX @@ static void coroutine_fn bdrv_pdiscard_co_entry(void *opaque) { DiscardCo *rwco = opaque; - rwco->ret = bdrv_co_pdiscard(rwco->bs, rwco->offset, rwco->bytes); + rwco->ret = bdrv_co_pdiscard(rwco->child, rwco->offset, rwco->bytes); } -int coroutine_fn bdrv_co_pdiscard(BlockDriverState *bs, int64_t offset, - int bytes) +int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int bytes) { BdrvTrackedRequest req; int max_pdiscard, ret; int head, tail, align; + BlockDriverState *bs = child->bs; - if (!bs->drv) { + if (!bs || !bs->drv) { return -ENOMEDIUM; } @@ -XXX,XX +XXX,XX @@ out: return ret; } -int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int bytes) +int bdrv_pdiscard(BdrvChild *child, int64_t offset, int bytes) { Coroutine *co; DiscardCo rwco = { - .bs = bs, + .child = child, .offset = offset, .bytes = bytes, .ret = NOT_DONE, @@ -XXX,XX +XXX,XX @@ int bdrv_pdiscard(BlockDriverState *bs, int64_t offset, int bytes) bdrv_pdiscard_co_entry(&rwco); } else { co = qemu_coroutine_create(bdrv_pdiscard_co_entry, &rwco); - bdrv_coroutine_enter(bs, co); - BDRV_POLL_WHILE(bs, rwco.ret == NOT_DONE); + bdrv_coroutine_enter(child->bs, co); + BDRV_POLL_WHILE(child->bs, rwco.ret == NOT_DONE); } return rwco.ret; diff --git a/block/mirror.c b/block/mirror.c index XXXXXXX..XXXXXXX 100644 --- a/block/mirror.c +++ b/block/mirror.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_mirror_top_do_write(BlockDriverState *bs, break; case MIRROR_METHOD_DISCARD: - ret = bdrv_co_pdiscard(bs->backing->bs, offset, bytes); + ret = bdrv_co_pdiscard(bs->backing, offset, bytes); break; default: diff --git a/block/qcow2-refcount.c b/block/qcow2-refcount.c index XXXXXXX..XXXXXXX 100644 --- a/block/qcow2-refcount.c +++ b/block/qcow2-refcount.c @@ -XXX,XX +XXX,XX @@ void qcow2_process_discards(BlockDriverState *bs, int ret) /* Discard is optional, ignore the return value */ if (ret >= 0) { - bdrv_pdiscard(bs->file->bs, d->offset, d->bytes); + bdrv_pdiscard(bs->file, d->offset, d->bytes); } g_free(d); diff --git a/block/raw-format.c b/block/raw-format.c index XXXXXXX..XXXXXXX 100644 --- a/block/raw-format.c +++ b/block/raw-format.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn raw_co_pdiscard(BlockDriverState *bs, if (ret) { return ret; } - return bdrv_co_pdiscard(bs->file->bs, offset, bytes); + return bdrv_co_pdiscard(bs->file, offset, bytes); } static int64_t raw_getlength(BlockDriverState *bs) diff --git a/block/throttle.c b/block/throttle.c index XXXXXXX..XXXXXXX 100644 --- a/block/throttle.c +++ b/block/throttle.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn throttle_co_pdiscard(BlockDriverState *bs, ThrottleGroupMember *tgm = bs->opaque; throttle_group_co_io_limits_intercept(tgm, bytes, true); - return bdrv_co_pdiscard(bs->file->bs, offset, bytes); + return bdrv_co_pdiscard(bs->file, offset, bytes); } static int throttle_co_flush(BlockDriverState *bs) -- 2.13.6
From: Fam Zheng <famz@redhat.com> This matches the types used for bytes in the rest parts of block layer. In the case of bdrv_co_truncate, new_bytes can be the image size which probably doesn't fit in a 32 bit int. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- include/block/block_int.h | 4 ++-- block/io.c | 8 +++++--- 2 files changed, 7 insertions(+), 5 deletions(-) diff --git a/include/block/block_int.h b/include/block/block_int.h index XXXXXXX..XXXXXXX 100644 --- a/include/block/block_int.h +++ b/include/block/block_int.h @@ -XXX,XX +XXX,XX @@ enum BdrvTrackedRequestType { typedef struct BdrvTrackedRequest { BlockDriverState *bs; int64_t offset; - unsigned int bytes; + uint64_t bytes; enum BdrvTrackedRequestType type; bool serialising; int64_t overlap_offset; - unsigned int overlap_bytes; + uint64_t overlap_bytes; QLIST_ENTRY(BdrvTrackedRequest) list; Coroutine *co; /* owner, used for deadlock detection */ diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static void tracked_request_end(BdrvTrackedRequest *req) static void tracked_request_begin(BdrvTrackedRequest *req, BlockDriverState *bs, int64_t offset, - unsigned int bytes, + uint64_t bytes, enum BdrvTrackedRequestType type) { + assert(bytes <= INT64_MAX && offset <= INT64_MAX - bytes); + *req = (BdrvTrackedRequest){ .bs = bs, .offset = offset, @@ -XXX,XX +XXX,XX @@ static void tracked_request_begin(BdrvTrackedRequest *req, static void mark_request_serialising(BdrvTrackedRequest *req, uint64_t align) { int64_t overlap_offset = req->offset & ~(align - 1); - unsigned int overlap_bytes = ROUND_UP(req->offset + req->bytes, align) + uint64_t overlap_bytes = ROUND_UP(req->offset + req->bytes, align) - overlap_offset; if (!req->serialising) { @@ -XXX,XX +XXX,XX @@ static int bdrv_get_cluster_size(BlockDriverState *bs) } static bool tracked_request_overlaps(BdrvTrackedRequest *req, - int64_t offset, unsigned int bytes) + int64_t offset, uint64_t bytes) { /* aaaa bbbb */ if (offset >= req->overlap_offset + req->overlap_bytes) { -- 2.13.6
From: Fam Zheng <famz@redhat.com> As a mechanical refactoring patch, this is the first step towards unified and more correct write code paths. This is helpful because multiple BlockDriverState fields need to be updated after modifying image data, and it's hard to maintain in multiple places such as copy offload, discard and truncate. Suggested-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 91 +++++++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 57 insertions(+), 34 deletions(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ fail: return ret; } +static inline int coroutine_fn +bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes, + BdrvTrackedRequest *req, int flags) +{ + BlockDriverState *bs = child->bs; + bool waited; + int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); + + if (bs->read_only) { + return -EPERM; + } + + /* BDRV_REQ_NO_SERIALISING is only for read operation */ + assert(!(flags & BDRV_REQ_NO_SERIALISING)); + assert(!(bs->open_flags & BDRV_O_INACTIVE)); + assert((bs->open_flags & BDRV_O_NO_IO) == 0); + assert(!(flags & ~BDRV_REQ_MASK)); + + if (flags & BDRV_REQ_SERIALISING) { + mark_request_serialising(req, bdrv_get_cluster_size(bs)); + } + + waited = wait_serialising_requests(req); + + assert(!waited || !req->serialising || + is_request_serialising_and_aligned(req)); + assert(req->overlap_offset <= offset); + assert(offset + bytes <= req->overlap_offset + req->overlap_bytes); + + if (flags & BDRV_REQ_WRITE_UNCHANGED) { + assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE)); + } else { + assert(child->perm & BLK_PERM_WRITE); + } + assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE); + return notifier_with_return_list_notify(&bs->before_write_notifiers, req); +} + +static inline void coroutine_fn +bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes, + BdrvTrackedRequest *req, int ret) +{ + int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); + BlockDriverState *bs = child->bs; + + atomic_inc(&bs->write_gen); + bdrv_set_dirty(bs, offset, bytes); + + stat64_max(&bs->wr_highest_offset, offset + bytes); + + if (ret == 0) { + bs->total_sectors = MAX(bs->total_sectors, end_sector); + } +} + /* * Forwards an already correctly aligned write request to the BlockDriver, * after possibly fragmenting it. @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, { BlockDriverState *bs = child->bs; BlockDriver *drv = bs->drv; - bool waited; int ret; - int64_t end_sector = DIV_ROUND_UP(offset + bytes, BDRV_SECTOR_SIZE); uint64_t bytes_remaining = bytes; int max_transfer; @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, assert((offset & (align - 1)) == 0); assert((bytes & (align - 1)) == 0); assert(!qiov || bytes == qiov->size); - assert((bs->open_flags & BDRV_O_NO_IO) == 0); - assert(!(flags & ~BDRV_REQ_MASK)); max_transfer = QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_transfer, INT_MAX), align); - /* BDRV_REQ_NO_SERIALISING is only for read operation */ - assert(!(flags & BDRV_REQ_NO_SERIALISING)); - - if (flags & BDRV_REQ_SERIALISING) { - mark_request_serialising(req, bdrv_get_cluster_size(bs)); - } - - waited = wait_serialising_requests(req); - assert(!waited || !req->serialising || - is_request_serialising_and_aligned(req)); - assert(req->overlap_offset <= offset); - assert(offset + bytes <= req->overlap_offset + req->overlap_bytes); - if (flags & BDRV_REQ_WRITE_UNCHANGED) { - assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE)); - } else { - assert(child->perm & BLK_PERM_WRITE); - } - assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE); - - ret = notifier_with_return_list_notify(&bs->before_write_notifiers, req); + ret = bdrv_co_write_req_prepare(child, offset, bytes, req, flags); if (!ret && bs->detect_zeroes != BLOCKDEV_DETECT_ZEROES_OPTIONS_OFF && !(flags & BDRV_REQ_ZERO_WRITE) && drv->bdrv_co_pwrite_zeroes && @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_aligned_pwritev(BdrvChild *child, } bdrv_debug_event(bs, BLKDBG_PWRITEV_DONE); - atomic_inc(&bs->write_gen); - bdrv_set_dirty(bs, offset, bytes); - - stat64_max(&bs->wr_highest_offset, offset + bytes); - if (ret >= 0) { - bs->total_sectors = MAX(bs->total_sectors, end_sector); ret = 0; } + bdrv_co_write_req_finish(child, offset, bytes, req, ret); return ret; } @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pwritev(BdrvChild *child, if (!bs->drv) { return -ENOMEDIUM; } - if (bs->read_only) { - return -EPERM; - } - assert(!(bs->open_flags & BDRV_O_INACTIVE)); ret = bdrv_check_byte_request(bs, offset, bytes); if (ret < 0) { -- 2.13.6
From: Fam Zheng <famz@redhat.com> Two problems exist when a write request that enlarges the image (i.e. write beyond EOF) finishes: 1) parent is not notified about size change; 2) dirty bitmap is not resized although we try to set the dirty bits; Fix them just like how bdrv_co_truncate works. Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static AioWait drain_all_aio_wait; +static void bdrv_parent_cb_resize(BlockDriverState *bs); static int coroutine_fn bdrv_co_do_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int bytes, BdrvRequestFlags flags); @@ -XXX,XX +XXX,XX @@ bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes, BlockDriverState *bs = child->bs; atomic_inc(&bs->write_gen); - bdrv_set_dirty(bs, offset, bytes); stat64_max(&bs->wr_highest_offset, offset + bytes); - if (ret == 0) { - bs->total_sectors = MAX(bs->total_sectors, end_sector); + if (ret == 0 && + end_sector > bs->total_sectors) { + bs->total_sectors = end_sector; + bdrv_parent_cb_resize(bs); + bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS); } + bdrv_set_dirty(bs, offset, bytes); } /* -- 2.13.6
From: Fam Zheng <famz@redhat.com> Reuse the new bdrv_co_write_req_prepare/finish helpers. The variation here is that discard requests don't affect bs->wr_highest_offset, and it cannot extend the image. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 33 +++++++++++++++++++++++---------- 1 file changed, 23 insertions(+), 10 deletions(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes, atomic_inc(&bs->write_gen); - stat64_max(&bs->wr_highest_offset, offset + bytes); - + /* + * Discard cannot extend the image, but in error handling cases, such as + * when reverting a qcow2 cluster allocation, the discarded range can pass + * the end of image file, so we cannot assert about BDRV_TRACKED_DISCARD + * here. Instead, just skip it, since semantically a discard request + * beyond EOF cannot expand the image anyway. + */ if (ret == 0 && - end_sector > bs->total_sectors) { + end_sector > bs->total_sectors && + req->type != BDRV_TRACKED_DISCARD) { bs->total_sectors = end_sector; bdrv_parent_cb_resize(bs); bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS); } - bdrv_set_dirty(bs, offset, bytes); + if (req->bytes) { + switch (req->type) { + case BDRV_TRACKED_WRITE: + stat64_max(&bs->wr_highest_offset, offset + bytes); + /* fall through, to set dirty bits */ + case BDRV_TRACKED_DISCARD: + bdrv_set_dirty(bs, offset, bytes); + break; + default: + break; + } + } } /* @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int bytes) ret = bdrv_check_byte_request(bs, offset, bytes); if (ret < 0) { return ret; - } else if (bs->read_only) { - return -EPERM; } - assert(!(bs->open_flags & BDRV_O_INACTIVE)); /* Do nothing if disabled. */ if (!(bs->open_flags & BDRV_O_UNMAP)) { @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int bytes) bdrv_inc_in_flight(bs); tracked_request_begin(&req, bs, offset, bytes, BDRV_TRACKED_DISCARD); - ret = notifier_with_return_list_notify(&bs->before_write_notifiers, &req); + ret = bdrv_co_write_req_prepare(child, offset, bytes, &req, 0); if (ret < 0) { goto out; } @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_pdiscard(BdrvChild *child, int64_t offset, int bytes) } ret = 0; out: - atomic_inc(&bs->write_gen); - bdrv_set_dirty(bs, req.offset, req.bytes); + bdrv_co_write_req_finish(child, req.offset, req.bytes, &req, ret); tracked_request_end(&req); bdrv_dec_in_flight(bs); return ret; -- 2.13.6
From: Fam Zheng <famz@redhat.com> This brings the request handling logic inline with write and discard, fixing write_gen, resize_cb, dirty bitmaps and image size refreshing. The last of these issues broke iotest case 222, which is now fixed. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 24 ++++++++++-------------- 1 file changed, 10 insertions(+), 14 deletions(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ static int coroutine_fn bdrv_co_copy_range_internal( bdrv_inc_in_flight(dst->bs); tracked_request_begin(&req, dst->bs, dst_offset, bytes, BDRV_TRACKED_WRITE); - - /* BDRV_REQ_NO_SERIALISING is only for read operation */ - assert(!(write_flags & BDRV_REQ_NO_SERIALISING)); - if (write_flags & BDRV_REQ_SERIALISING) { - mark_request_serialising(&req, bdrv_get_cluster_size(dst->bs)); - } - wait_serialising_requests(&req); - - ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, - src, src_offset, - dst, dst_offset, - bytes, - read_flags, write_flags); - + ret = bdrv_co_write_req_prepare(dst, dst_offset, bytes, &req, + write_flags); + if (!ret) { + ret = dst->bs->drv->bdrv_co_copy_range_to(dst->bs, + src, src_offset, + dst, dst_offset, + bytes, + read_flags, write_flags); + } + bdrv_co_write_req_finish(dst, dst_offset, bytes, &req, ret); tracked_request_end(&req); bdrv_dec_in_flight(dst->bs); } -- 2.13.6
From: Fam Zheng <famz@redhat.com> If we are growing the image and potentially using preallocation for the new area, we need to make sure that no write requests are made to the "preallocated" area which is [@old_size, @offset), not [@offset, offset * 2 - @old_size). Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, } bdrv_inc_in_flight(bs); - tracked_request_begin(&req, bs, offset, new_bytes, BDRV_TRACKED_TRUNCATE); + tracked_request_begin(&req, bs, offset - new_bytes, new_bytes, + BDRV_TRACKED_TRUNCATE); /* If we are growing the image and potentially using preallocation for the * new area, we need to make sure that no write requests are made to it -- 2.13.6
From: Fam Zheng <famz@redhat.com> Truncation is the last to convert from open coded req handling to reusing helpers. This time the permission check in prepare has to adapt to the new caller: it checks a different permission bit, and doesn't trigger the before write notifier. Also, truncation should always trigger a bs->total_sectors update and in turn call parent resize_cb. Update the condition in finish helper, too. It's intended to do a duplicated bs->read_only check before calling bdrv_co_write_req_prepare() so that we can be more informative with the error message, as bdrv_co_write_req_prepare() doesn't have Error parameter. Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> --- block/io.c | 55 +++++++++++++++++++++++++++++++++++-------------------- 1 file changed, 35 insertions(+), 20 deletions(-) diff --git a/block/io.c b/block/io.c index XXXXXXX..XXXXXXX 100644 --- a/block/io.c +++ b/block/io.c @@ -XXX,XX +XXX,XX @@ bdrv_co_write_req_prepare(BdrvChild *child, int64_t offset, uint64_t bytes, is_request_serialising_and_aligned(req)); assert(req->overlap_offset <= offset); assert(offset + bytes <= req->overlap_offset + req->overlap_bytes); + assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE); - if (flags & BDRV_REQ_WRITE_UNCHANGED) { - assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE)); - } else { - assert(child->perm & BLK_PERM_WRITE); + switch (req->type) { + case BDRV_TRACKED_WRITE: + case BDRV_TRACKED_DISCARD: + if (flags & BDRV_REQ_WRITE_UNCHANGED) { + assert(child->perm & (BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE)); + } else { + assert(child->perm & BLK_PERM_WRITE); + } + return notifier_with_return_list_notify(&bs->before_write_notifiers, + req); + case BDRV_TRACKED_TRUNCATE: + assert(child->perm & BLK_PERM_RESIZE); + return 0; + default: + abort(); } - assert(end_sector <= bs->total_sectors || child->perm & BLK_PERM_RESIZE); - return notifier_with_return_list_notify(&bs->before_write_notifiers, req); } static inline void coroutine_fn @@ -XXX,XX +XXX,XX @@ bdrv_co_write_req_finish(BdrvChild *child, int64_t offset, uint64_t bytes, * beyond EOF cannot expand the image anyway. */ if (ret == 0 && - end_sector > bs->total_sectors && - req->type != BDRV_TRACKED_DISCARD) { + (req->type == BDRV_TRACKED_TRUNCATE || + end_sector > bs->total_sectors) && + req->type != BDRV_TRACKED_DISCARD) { bs->total_sectors = end_sector; bdrv_parent_cb_resize(bs); bdrv_dirty_bitmap_truncate(bs, end_sector << BDRV_SECTOR_BITS); @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, int64_t old_size, new_bytes; int ret; - assert(child->perm & BLK_PERM_RESIZE); /* if bs->drv == NULL, bs is closed, so there's nothing to do here */ if (!drv) { @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, * concurrently or they might be overwritten by preallocation. */ if (new_bytes) { mark_request_serialising(&req, 1); - wait_serialising_requests(&req); + } + if (bs->read_only) { + error_setg(errp, "Image is read-only"); + ret = -EACCES; + goto out; + } + ret = bdrv_co_write_req_prepare(child, offset - new_bytes, new_bytes, &req, + 0); + if (ret < 0) { + error_setg_errno(errp, -ret, + "Failed to prepare request for truncation"); + goto out; } if (!drv->bdrv_co_truncate) { @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, ret = -ENOTSUP; goto out; } - if (bs->read_only) { - error_setg(errp, "Image is read-only"); - ret = -EACCES; - goto out; - } - - assert(!(bs->open_flags & BDRV_O_INACTIVE)); ret = drv->bdrv_co_truncate(bs, offset, prealloc, errp); if (ret < 0) { @@ -XXX,XX +XXX,XX @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, int64_t offset, } else { offset = bs->total_sectors * BDRV_SECTOR_SIZE; } - bdrv_dirty_bitmap_truncate(bs, offset); - bdrv_parent_cb_resize(bs); - atomic_inc(&bs->write_gen); + /* It's possible that truncation succeeded but refresh_total_sectors + * failed, but the latter doesn't affect how we should finish the request. + * Pass 0 as the last parameter so that dirty bitmaps etc. are handled. */ + bdrv_co_write_req_finish(child, offset - new_bytes, new_bytes, &req, 0); out: tracked_request_end(&req); -- 2.13.6
The following changes since commit 13356edb87506c148b163b8c7eb0695647d00c2a: Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging (2023-01-24 09:45:33 +0000) are available in the Git repository at: https://repo.or.cz/qemu/kevin.git tags/for-upstream for you to fetch changes up to d570177b50c389f379f93183155a27d44856ab46: qemu-img: Change info key names for protocol nodes (2023-02-01 16:52:33 +0100) v4: - Fixed the 'qemu-img-close-errors' test case to run only on Linux and only with the file protocol, use qemu-io instead of truncate v3: - Make the compiler happier on BSD and CentOS Stream 8 v2: - Rebased to resolve merge conflicts in coroutine.h ---------------------------------------------------------------- Block layer patches - qemu-img info: Show protocol-level information - Move more functions to coroutines - Make coroutine annotations ready for static analysis - qemu-img: Fix exit code for errors closing the image - qcow2 bitmaps: Fix theoretical corruption in error path - pflash: Only load non-zero parts of backend image to save memory - Code cleanup and test case improvements ---------------------------------------------------------------- Alberto Faria (2): coroutine: annotate coroutine_fn for libclang block: Add no_coroutine_fn and coroutine_mixed_fn marker Emanuele Giuseppe Esposito (14): block-coroutine-wrapper: support void functions block: Convert bdrv_io_plug() to co_wrapper block: Convert bdrv_io_unplug() to co_wrapper block: Convert bdrv_is_inserted() to co_wrapper block: Rename refresh_total_sectors to bdrv_refresh_total_sectors block: Convert bdrv_refresh_total_sectors() to co_wrapper_mixed block-backend: use bdrv_getlength instead of blk_getlength block: use bdrv_co_refresh_total_sectors when possible block: Convert bdrv_get_allocated_file_size() to co_wrapper block: Convert bdrv_get_info() to co_wrapper_mixed block: Convert bdrv_eject() to co_wrapper block: Convert bdrv_lock_medium() to co_wrapper block: Convert bdrv_debug_event() to co_wrapper_mixed block: Rename bdrv_load/save_vmstate() to bdrv_co_load/save_vmstate() Hanna Reitz (12): block: Improve empty format-specific info dump block/file: Add file-specific image info block/vmdk: Change extent info type block: Split BlockNodeInfo off of ImageInfo qemu-img: Use BlockNodeInfo block/qapi: Let bdrv_query_image_info() recurse block/qapi: Introduce BlockGraphInfo block/qapi: Add indentation to bdrv_node_info_dump() iotests: Filter child node information iotests/106, 214, 308: Read only one size line qemu-img: Let info print block graph qemu-img: Change info key names for protocol nodes Kevin Wolf (4): qcow2: Fix theoretical corruption in store_bitmap() error path qemu-img commit: Report errors while closing the image qemu-img bitmap: Report errors while closing the image qemu-iotests: Test qemu-img bitmap/commit exit code on error Paolo Bonzini (2): qemu-io: do not reinvent the blk_pwrite_zeroes wheel block: remove bdrv_coroutine_enter Philippe Mathieu-Daudé (1): block/nbd: Add missing <qemu/bswap.h> include Thomas Huth (2): tests/qemu-iotests/312: Mark "quorum" as required driver tests/qemu-iotests/262: Check for availability of "blkverify" first Xiang Zheng (1): pflash: Only read non-zero parts of backend image qapi/block-core.json | 123 +++++++- include/block/block-common.h | 11 +- include/block/block-io.h | 41 ++- include/block/block_int-common.h | 26 +- include/block/block_int-io.h | 5 +- include/block/nbd.h | 1 + include/block/qapi.h | 14 +- include/qemu/osdep.h | 44 +++ include/sysemu/block-backend-io.h | 31 +- block.c | 88 +++--- block/blkdebug.c | 11 +- block/blkio.c | 15 +- block/blklogwrites.c | 6 +- block/blkreplay.c | 6 +- block/blkverify.c | 6 +- block/block-backend.c | 38 +-- block/commit.c | 4 +- block/copy-on-read.c | 18 +- block/crypto.c | 14 +- block/curl.c | 10 +- block/file-posix.c | 137 +++++---- block/file-win32.c | 18 +- block/filter-compress.c | 20 +- block/gluster.c | 23 +- block/io.c | 76 ++--- block/iscsi.c | 17 +- block/mirror.c | 6 +- block/monitor/block-hmp-cmds.c | 2 +- block/nbd.c | 8 +- block/nfs.c | 4 +- block/null.c | 13 +- block/nvme.c | 14 +- block/preallocate.c | 16 +- block/qapi.c | 317 ++++++++++++++++----- block/qcow.c | 5 +- block/qcow2-bitmap.c | 5 +- block/qcow2-refcount.c | 2 +- block/qcow2.c | 17 +- block/qed.c | 11 +- block/quorum.c | 8 +- block/raw-format.c | 25 +- block/rbd.c | 9 +- block/replication.c | 6 +- block/ssh.c | 4 +- block/throttle.c | 6 +- block/vdi.c | 7 +- block/vhdx.c | 5 +- block/vmdk.c | 22 +- block/vpc.c | 5 +- blockdev.c | 8 +- hw/block/block.c | 36 ++- hw/scsi/scsi-disk.c | 5 + qemu-img.c | 100 +++++-- qemu-io-cmds.c | 62 +--- tests/unit/test-block-iothread.c | 3 + scripts/block-coroutine-wrapper.py | 20 +- tests/qemu-iotests/iotests.py | 18 +- block/meson.build | 1 + tests/qemu-iotests/065 | 2 +- tests/qemu-iotests/106 | 4 +- tests/qemu-iotests/214 | 6 +- tests/qemu-iotests/262 | 3 +- tests/qemu-iotests/302.out | 5 + tests/qemu-iotests/308 | 4 +- tests/qemu-iotests/312 | 1 + tests/qemu-iotests/common.filter | 22 +- tests/qemu-iotests/common.rc | 22 +- tests/qemu-iotests/tests/qemu-img-close-errors | 96 +++++++ tests/qemu-iotests/tests/qemu-img-close-errors.out | 23 ++ 69 files changed, 1209 insertions(+), 552 deletions(-) create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out