From nobody Sun Dec 29 19:49:31 2024 Delivered-To: importer2@patchew.org Authentication-Results: mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass(p=quarantine dis=none) header.from=bytedance.com ARC-Seal: i=1; a=rsa-sha256; t=1720441206; cv=none; d=zohomail.com; s=zohoarc; b=hTF8FZrkUo0LnmHSdrWyS9kdLmJDt27Q9NBLGXeEgoSiqnRZ/BBWfQ6qgutMTCFeNB68bAEHULTrEYi+WJCeh3v/CKuQypi2qko/qEr9LFCs7rag/pfazFGOGKyrSZrtKF1QaaQioP+UWI4RsobVH0T3hKfR/Y3PS+pNewhIi9k= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=zohomail.com; s=zohoarc; t=1720441206; h=Content-Transfer-Encoding:Cc:Cc:Date:Date:From:From:In-Reply-To:List-Subscribe:List-Post:List-Id:List-Archive:List-Help:List-Unsubscribe:MIME-Version:Message-ID:References:Sender:Subject:Subject:To:To:Message-Id:Reply-To; bh=kjAdu6ZTaJ2H/T12j7/Y4iOTA7PHN1pPSyvKcs9msGY=; b=iqqyIOFtcMaYWhnoN8XfN/wUZhyq3Ptc40ZgxqYDp5asQGT7qaA0mRl8OLblNVXlEJfBYwSrntKaqlPxUPe6CBjdRvzjndJGKeyUEXbv+hzvgVp41jIwdKsgy7YPJUD0fYXSf4oFENRV7HEPxOKk0jJSeBM09et8nobsfJbKpJM= ARC-Authentication-Results: i=1; mx.zohomail.com; dkim=pass; spf=pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer2=patchew.org@nongnu.org; dmarc=pass header.from= (p=quarantine dis=none) Return-Path: Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) by mx.zohomail.com with SMTPS id 1720441206514757.6393539975903; Mon, 8 Jul 2024 05:20:06 -0700 (PDT) Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sQnL4-0001m1-TI; Mon, 08 Jul 2024 08:19:27 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sQnKy-0001ed-Lb for qemu-devel@nongnu.org; Mon, 08 Jul 2024 08:19:20 -0400 Received: from mail-pl1-x629.google.com ([2607:f8b0:4864:20::629]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sQnKu-0001UQ-5s for qemu-devel@nongnu.org; Mon, 08 Jul 2024 08:19:19 -0400 Received: by mail-pl1-x629.google.com with SMTP id d9443c01a7336-1f65a3abd01so26709025ad.3 for ; Mon, 08 Jul 2024 05:19:09 -0700 (PDT) Received: from n37-006-243.byted.org ([180.184.49.4]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-1fb4fa258f7sm56587195ad.169.2024.07.08.05.19.04 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 08 Jul 2024 05:19:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1720441148; x=1721045948; darn=nongnu.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=kjAdu6ZTaJ2H/T12j7/Y4iOTA7PHN1pPSyvKcs9msGY=; b=l0+vdNwNc0W6KWjmtcpK7bmQJ7cS6dXI+bD0yjm+crPFFXOyyogGTWFTtRYa1h82Ei yiqIVaGVhCFbDuEgzfRa2zdYnmLiJWKnwgS1unYbDDNLzzb6u3vlx0D2fOTqm6t9aqjA 4yAKAEcOm7ctv5OtWlzN6c035VQccxBLensDpt5JCN738YartrqO+qhzTzxouZovWisR lPyPOJ7gXyS2dwTs2iVXasUFe4BemWNRnad8JHO+Blf42HMcxbLgpbjrsxR7I392zhFc taPK8IE2K1BESxFvFfB9bkJ2t9I+78JSvULakY6COxln4rpnEdPv7uiziefHNxqyQgpv OKKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720441148; x=1721045948; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kjAdu6ZTaJ2H/T12j7/Y4iOTA7PHN1pPSyvKcs9msGY=; b=QLKtzlLIZkUNRP4HEhbGY7oP8A2BMBSlyGU6qbkHIqg21S9Fm4flq7WHB9Dwe7HH7i 1fKRbonnzFAjKScuEy94nrdfEIXHCo9hCluRcy9Jate8TCmlIxtZxQ32kyQkytxU48oa 1OXEedMcwF0rXFdu+j+93ogVJ1X2h35Dd3j/UlXiGyoc21otlFubmKy9aRSDCZmiALHv ECnk8uvYmgZQY59r1VFZaJnPsjgWGEryWRploIwXCBiE7axsuDdmshhYuyi1wBC4Jybp nmBsMKPgJRmOu9rmzh3WKfOs1hpm5oHRpRtFpRlieKrrFIvTLFhHlLsgjaTJnzvIw4+U oKKQ== X-Forwarded-Encrypted: i=1; AJvYcCU9iawPbUfJDuthLJNMdigwZdtIJr/nAt1CXX4e584XUgSvlqwRwvZn9XcjwdQy/zZU/F1c2nPPm+ffpo1GF1+WKkfC9ss= X-Gm-Message-State: AOJu0YwMkHY4rgAKIMsYJT7zMXkOrHE5SKNeo/24rsy6UjfzhQkmy1BK Us5UgZypgBMivwVvd1PLK9e5ZwFymAAeGf3OfkbHqZlyHrq/L4Lw+tvmq0swCFw= X-Google-Smtp-Source: AGHT+IFbePtQxCaJxct0+trh8hQfL8KXZirFLnzvFywkBXfcqY/YFGFvJpnXyUsthlxGKlcEVX5bOQ== X-Received: by 2002:a17:902:ecd2:b0:1fb:e31:b4e9 with SMTP id d9443c01a7336-1fb33ee5f55mr137493525ad.53.1720441148202; Mon, 08 Jul 2024 05:19:08 -0700 (PDT) From: Changqi Lu To: qemu-block@nongnu.org, qemu-devel@nongnu.org Cc: kwolf@redhat.com, hreitz@redhat.com, stefanha@redhat.com, fam@euphon.net, ronniesahlberg@gmail.com, pbonzini@redhat.com, pl@dlhnet.de, kbusch@kernel.org, its@irrelevant.dk, foss@defmacro.it, philmd@linaro.org, pizhenwei@bytedance.com, Changqi Lu Subject: [PATCH v8 01/10] block: add persistent reservation in/out api Date: Mon, 8 Jul 2024 20:18:45 +0800 Message-Id: <20240708121854.1318876-2-luchangqi.123@bytedance.com> X-Mailer: git-send-email 2.20.1 In-Reply-To: <20240708121854.1318876-1-luchangqi.123@bytedance.com> References: <20240708121854.1318876-1-luchangqi.123@bytedance.com> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Received-SPF: pass (zohomail.com: domain of gnu.org designates 209.51.188.17 as permitted sender) client-ip=209.51.188.17; envelope-from=qemu-devel-bounces+importer2=patchew.org@nongnu.org; helo=lists.gnu.org; Received-SPF: pass client-ip=2607:f8b0:4864:20::629; envelope-from=luchangqi.123@bytedance.com; helo=mail-pl1-x629.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=unavailable autolearn_force=no X-Spam_action: no action X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: qemu-devel-bounces+importer2=patchew.org@nongnu.org Sender: qemu-devel-bounces+importer2=patchew.org@nongnu.org X-ZohoMail-DKIM: pass (identity @bytedance.com) X-ZM-MESSAGEID: 1720441207984100003 Content-Type: text/plain; charset="utf-8" Add persistent reservation in/out operations at the block level. The following operations are included: - read_keys: retrieves the list of registered keys. - read_reservation: retrieves the current reservation status. - register: registers a new reservation key. - reserve: initiates a reservation for a specific key. - release: releases a reservation for a specific key. - clear: clears all existing reservations. - preempt: preempts a reservation held by another key. Signed-off-by: Changqi Lu Signed-off-by: zhenwei pi Reviewed-by: Stefan Hajnoczi --- block/block-backend.c | 403 ++++++++++++++++++++++++++++++ block/io.c | 164 ++++++++++++ include/block/block-common.h | 40 +++ include/block/block-io.h | 20 ++ include/block/block_int-common.h | 84 +++++++ include/sysemu/block-backend-io.h | 24 ++ 6 files changed, 735 insertions(+) diff --git a/block/block-backend.c b/block/block-backend.c index db6f9b92a3..b74aaba23f 100644 --- a/block/block-backend.c +++ b/block/block-backend.c @@ -1770,6 +1770,409 @@ BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsign= ed long int req, void *buf, return blk_aio_prwv(blk, req, 0, buf, blk_aio_ioctl_entry, 0, cb, opaq= ue); } =20 +typedef struct BlkPrInCo { + BlockBackend *blk; + uint32_t *generation; + uint32_t num_keys; + BlockPrType *type; + uint64_t *keys; + int ret; +} BlkPrInCo; + +typedef struct BlkPrInCB { + BlockAIOCB common; + BlkPrInCo prco; + bool has_returned; +} BlkPrInCB; + +static const AIOCBInfo blk_pr_in_aiocb_info =3D { + .aiocb_size =3D sizeof(BlkPrInCB), +}; + +static void blk_pr_in_complete(BlkPrInCB *acb) +{ + if (acb->has_returned) { + acb->common.cb(acb->common.opaque, acb->prco.ret); + + /* This is paired with blk_inc_in_flight() in blk_aio_pr_in(). */ + blk_dec_in_flight(acb->prco.blk); + qemu_aio_unref(acb); + } +} + +static void blk_pr_in_complete_bh(void *opaque) +{ + BlkPrInCB *acb =3D opaque; + assert(acb->has_returned); + blk_pr_in_complete(acb); +} + +static BlockAIOCB *blk_aio_pr_in(BlockBackend *blk, uint32_t *generation, + uint32_t num_keys, BlockPrType *type, + uint64_t *keys, CoroutineEntry co_entry, + BlockCompletionFunc *cb, void *opaque) +{ + BlkPrInCB *acb; + Coroutine *co; + + /* This is paired with blk_dec_in_flight() in blk_pr_in_complete(). */ + blk_inc_in_flight(blk); + acb =3D blk_aio_get(&blk_pr_in_aiocb_info, blk, cb, opaque); + acb->prco =3D (BlkPrInCo) { + .blk =3D blk, + .generation =3D generation, + .num_keys =3D num_keys, + .type =3D type, + .ret =3D NOT_DONE, + .keys =3D keys, + }; + acb->has_returned =3D false; + + co =3D qemu_coroutine_create(co_entry, acb); + aio_co_enter(qemu_get_current_aio_context(), co); + + acb->has_returned =3D true; + if (acb->prco.ret !=3D NOT_DONE) { + replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(), + blk_pr_in_complete_bh, acb); + } + + return &acb->common; +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_read_keys(BlockBackend *blk, uint32_t *generation, + uint32_t num_keys, uint64_t *keys) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_read_keys(blk_bs(blk), generation, num_keys, keys); +} + +static void coroutine_fn blk_aio_pr_read_keys_entry(void *opaque) +{ + BlkPrInCB *acb =3D opaque; + BlkPrInCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_read_keys(prco->blk, prco->generation, + prco->num_keys, prco->keys); + blk_pr_in_complete(acb); +} + +BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation, + uint32_t num_keys, uint64_t *keys, + BlockCompletionFunc *cb, void *opaque) +{ + IO_CODE(); + return blk_aio_pr_in(blk, generation, num_keys, NULL, keys, + blk_aio_pr_read_keys_entry, cb, opaque); +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_read_reservation(BlockBackend *blk, uint32_t *generation, + uint64_t *key, BlockPrType *type) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_read_reservation(blk_bs(blk), generation, key, type); +} + +static void coroutine_fn blk_aio_pr_read_reservation_entry(void *opaque) +{ + BlkPrInCB *acb =3D opaque; + BlkPrInCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_read_reservation(prco->blk, prco->generati= on, + prco->keys, prco->type); + blk_pr_in_complete(acb); +} + +BlockAIOCB *blk_aio_pr_read_reservation(BlockBackend *blk, uint32_t *gener= ation, + uint64_t *key, BlockPrType *type, + BlockCompletionFunc *cb, void *opa= que) +{ + IO_CODE(); + return blk_aio_pr_in(blk, generation, 0, type, key, + blk_aio_pr_read_reservation_entry, cb, opaque); +} + +typedef struct BlkPrOutCo { + BlockBackend *blk; + uint64_t old_key; + uint64_t new_key; + bool ptpl; + BlockPrType type; + bool ignore_key; + bool abort; + int ret; +} BlkPrOutCo; + +typedef struct BlkPrOutCB { + BlockAIOCB common; + BlkPrOutCo prco; + bool has_returned; +} BlkPrOutCB; + +static const AIOCBInfo blk_pr_out_aiocb_info =3D { + .aiocb_size =3D sizeof(BlkPrOutCB), +}; + +static void blk_pr_out_complete(BlkPrOutCB *acb) +{ + if (acb->has_returned) { + acb->common.cb(acb->common.opaque, acb->prco.ret); + + /* This is paired with blk_inc_in_flight() in blk_aio_pr_out(). */ + blk_dec_in_flight(acb->prco.blk); + qemu_aio_unref(acb); + } +} + +static void blk_pr_out_complete_bh(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + assert(acb->has_returned); + blk_pr_out_complete(acb); +} + +static BlockAIOCB *blk_aio_pr_out(BlockBackend *blk, uint64_t old_key, + uint64_t new_key, bool ptpl, + BlockPrType type, bool ignore_key, + bool abort, CoroutineEntry co_entry, + BlockCompletionFunc *cb, void *opaque) +{ + BlkPrOutCB *acb; + Coroutine *co; + + /* This is paired with blk_dec_in_flight() in blk_pr_out_complete(). */ + blk_inc_in_flight(blk); + acb =3D blk_aio_get(&blk_pr_out_aiocb_info, blk, cb, opaque); + acb->prco =3D (BlkPrOutCo) { + .blk =3D blk, + .old_key =3D old_key, + .new_key =3D new_key, + .ptpl =3D ptpl, + .type =3D type, + .ignore_key =3D ignore_key, + .abort =3D abort, + .ret =3D NOT_DONE, + }; + acb->has_returned =3D false; + + co =3D qemu_coroutine_create(co_entry, acb); + aio_co_enter(qemu_get_current_aio_context(), co); + + acb->has_returned =3D true; + if (acb->prco.ret !=3D NOT_DONE) { + replay_bh_schedule_oneshot_event(qemu_get_current_aio_context(), + blk_pr_out_complete_bh, acb); + } + + return &acb->common; +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_register(BlockBackend *blk, uint64_t old_key, + uint64_t new_key, BlockPrType type, + bool ptpl, bool ignore_key) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_register(blk_bs(blk), old_key, new_key, type, + ptpl, ignore_key); +} + +static void coroutine_fn blk_aio_pr_register_entry(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + BlkPrOutCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_register(prco->blk, prco->old_key, prco->n= ew_key, + prco->type, prco->ptpl, + prco->ignore_key); + blk_pr_out_complete(acb); +} + +BlockAIOCB *blk_aio_pr_register(BlockBackend *blk, uint64_t old_key, + uint64_t new_key, BlockPrType type, + bool ptpl, bool ignore_key, + BlockCompletionFunc *cb, + void *opaque) +{ + IO_CODE(); + return blk_aio_pr_out(blk, old_key, new_key, ptpl, type, ignore_key, f= alse, + blk_aio_pr_register_entry, cb, opaque); +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_reserve(BlockBackend *blk, uint64_t key, BlockPrType type) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_reserve(blk_bs(blk), key, type); +} + +static void coroutine_fn blk_aio_pr_reserve_entry(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + BlkPrOutCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_reserve(prco->blk, prco->old_key, + prco->type); + blk_pr_out_complete(acb); +} + + +BlockAIOCB *blk_aio_pr_reserve(BlockBackend *blk, uint64_t key, + BlockPrType type, + BlockCompletionFunc *cb, + void *opaque) +{ + IO_CODE(); + return blk_aio_pr_out(blk, key, 0, false, type, false, false, + blk_aio_pr_reserve_entry, cb, opaque); +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_release(BlockBackend *blk, uint64_t key, BlockPrType type) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_release(blk_bs(blk), key, type); +} + +static void coroutine_fn blk_aio_pr_release_entry(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + BlkPrOutCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_release(prco->blk, prco->old_key, prco->ty= pe); + blk_pr_out_complete(acb); +} + + +BlockAIOCB *blk_aio_pr_release(BlockBackend *blk, uint64_t key, + BlockPrType type, BlockCompletionFunc *cb, + void *opaque) +{ + IO_CODE(); + return blk_aio_pr_out(blk, key, 0, false, type, false, false, + blk_aio_pr_release_entry, cb, opaque); +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_clear(BlockBackend *blk, uint64_t key) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_clear(blk_bs(blk), key); +} + +static void coroutine_fn blk_aio_pr_clear_entry(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + BlkPrOutCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_clear(prco->blk, prco->old_key); + blk_pr_out_complete(acb); +} + + +BlockAIOCB *blk_aio_pr_clear(BlockBackend *blk, uint64_t key, + BlockCompletionFunc *cb, void *opaque) +{ + IO_CODE(); + return blk_aio_pr_out(blk, key, 0, false, 0, false, false, + blk_aio_pr_clear_entry, cb, opaque); +} + +/* To be called between exactly one pair of blk_inc/dec_in_flight() */ +static int coroutine_fn +blk_aio_pr_do_preempt(BlockBackend *blk, uint64_t cr_key, + uint64_t pr_key, BlockPrType type, bool abort) +{ + IO_CODE(); + + blk_wait_while_drained(blk); + GRAPH_RDLOCK_GUARD(); + + if (!blk_co_is_available(blk)) { + return -ENOMEDIUM; + } + + return bdrv_co_pr_preempt(blk_bs(blk), cr_key, pr_key, type, abort); +} + +static void coroutine_fn blk_aio_pr_preempt_entry(void *opaque) +{ + BlkPrOutCB *acb =3D opaque; + BlkPrOutCo *prco =3D &acb->prco; + + prco->ret =3D blk_aio_pr_do_preempt(prco->blk, prco->old_key, + prco->new_key, prco->type, + prco->abort); + blk_pr_out_complete(acb); +} + + +BlockAIOCB *blk_aio_pr_preempt(BlockBackend *blk, uint64_t cr_key, + uint64_t pr_key, BlockPrType type, + bool abort, BlockCompletionFunc *cb, + void *opaque) +{ + IO_CODE(); + return blk_aio_pr_out(blk, cr_key, pr_key, false, type, false, abort, + blk_aio_pr_preempt_entry, cb, opaque); +} + /* To be called between exactly one pair of blk_inc/dec_in_flight() */ static int coroutine_fn blk_co_do_pdiscard(BlockBackend *blk, int64_t offset, int64_t bytes) diff --git a/block/io.c b/block/io.c index 7217cf811b..105161930e 100644 --- a/block/io.c +++ b/block/io.c @@ -146,6 +146,7 @@ static void bdrv_merge_limits(BlockLimits *dst, const B= lockLimits *src) src->min_mem_alignment); dst->max_iov =3D MIN_NON_ZERO(dst->max_iov, src->max_iov); dst->max_hw_iov =3D MIN_NON_ZERO(dst->max_hw_iov, src->max_hw_iov); + dst->pr_cap |=3D src->pr_cap; } =20 typedef struct BdrvRefreshLimitsState { @@ -3220,6 +3221,169 @@ out: return co.ret; } =20 +int coroutine_fn bdrv_co_pr_read_keys(BlockDriverState *bs, + uint32_t *generation, uint32_t num_keys, + uint64_t *keys) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_read_keys) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_read_keys(bs, generation, num_keys, keys); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_read_reservation(BlockDriverState *bs, + uint32_t *generation, uint64_t *key, BlockPrType *type) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_read_reservation) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_read_reservation(bs, generation, key, type); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_register(BlockDriverState *bs, uint64_t old_ke= y, + uint64_t new_key, BlockPrType type, bool ptpl, + bool ignore_key) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_register) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_register(bs, old_key, new_key, type, + ptpl, ignore_key); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_reserve(BlockDriverState *bs, uint64_t key, + BlockPrType type) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_reserve) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_reserve(bs, key, type); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_release(BlockDriverState *bs, uint64_t key, + BlockPrType type) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_release) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_release(bs, key, type); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_clear(BlockDriverState *bs, uint64_t key) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_clear) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_clear(bs, key); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + +int coroutine_fn bdrv_co_pr_preempt(BlockDriverState *bs, uint64_t cr_key, + uint64_t pr_key, BlockPrType type, bool abort) +{ + BlockDriver *drv =3D bs->drv; + CoroutineIOCompletion co =3D { + .coroutine =3D qemu_coroutine_self(), + }; + + IO_CODE(); + assert_bdrv_graph_readable(); + + bdrv_inc_in_flight(bs); + if (!drv || !drv->bdrv_co_pr_preempt) { + co.ret =3D -ENOTSUP; + goto out; + } + + co.ret =3D drv->bdrv_co_pr_preempt(bs, cr_key, pr_key, type, abort); +out: + bdrv_dec_in_flight(bs); + return co.ret; +} + int coroutine_fn bdrv_co_zone_report(BlockDriverState *bs, int64_t offset, unsigned int *nr_zones, BlockZoneDescriptor *zones) diff --git a/include/block/block-common.h b/include/block/block-common.h index a846023a09..7ca4e2328f 100644 --- a/include/block/block-common.h +++ b/include/block/block-common.h @@ -524,6 +524,46 @@ typedef enum { BDRV_FIX_ERRORS =3D 2, } BdrvCheckMode; =20 +/** + * According SCSI protocol(chapter 5.9 of SCSI Primary Commands - 4) + * and NVMe protocol(chapter 7.2 of NVMe Base Specification 2.0), + * the persistent reservation types and persistent capabilities of + * the public layer block are abstracted. + */ +typedef enum { + BLK_PR_WRITE_EXCLUSIVE =3D 0x1, + BLK_PR_EXCLUSIVE_ACCESS =3D 0x2, + BLK_PR_WRITE_EXCLUSIVE_REGS_ONLY =3D 0x3, + BLK_PR_EXCLUSIVE_ACCESS_REGS_ONLY =3D 0x4, + BLK_PR_WRITE_EXCLUSIVE_ALL_REGS =3D 0x5, + BLK_PR_EXCLUSIVE_ACCESS_ALL_REGS =3D 0x6, +} BlockPrType; + +typedef enum BLKPrCap { + /* Persist Through Power Loss */ + BLK_PR_CAP_PTPL =3D 1 << 0, + /* Write Exclusive reservation type */ + BLK_PR_CAP_WR_EX =3D 1 << 1, + /* Exclusive Access reservation type */ + BLK_PR_CAP_EX_AC =3D 1 << 2, + /* Write Exclusive Registrants Only reservation type */ + BLK_PR_CAP_WR_EX_RO =3D 1 << 3, + /* Exclusive Access Registrants Only reservation type */ + BLK_PR_CAP_EX_AC_RO =3D 1 << 4, + /* Write Exclusive All Registrants reservation type */ + BLK_PR_CAP_WR_EX_AR =3D 1 << 5, + /* Exclusive Access All Registrants reservation type */ + BLK_PR_CAP_EX_AC_AR =3D 1 << 6, + + BLK_PR_CAP_ALL =3D (BLK_PR_CAP_PTPL | + BLK_PR_CAP_WR_EX | + BLK_PR_CAP_EX_AC | + BLK_PR_CAP_WR_EX_RO | + BLK_PR_CAP_EX_AC_RO | + BLK_PR_CAP_WR_EX_AR | + BLK_PR_CAP_EX_AC_AR), +} BLKPrCap; + typedef struct BlockSizes { uint32_t phys; uint32_t log; diff --git a/include/block/block-io.h b/include/block/block-io.h index b49e0537dd..908361862b 100644 --- a/include/block/block-io.h +++ b/include/block/block-io.h @@ -106,6 +106,26 @@ void bdrv_aio_cancel_async(BlockAIOCB *acb); int coroutine_fn GRAPH_RDLOCK bdrv_co_ioctl(BlockDriverState *bs, int req, void *buf); =20 +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_read_keys(BlockDriverState *bs, uint32_t *generation, + uint32_t num_keys, uint64_t *keys); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_read_reservation(BlockDriverState *bs, uint32_t *generation, + uint64_t *key, BlockPrType *type); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_register(BlockDriverState *bs, uint64_t old_key, + uint64_t new_key, BlockPrType type, + bool ptpl, bool ignore_key); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_reserve(BlockDriverState *bs, uint64_t key, BlockPrType type); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_release(BlockDriverState *bs, uint64_t key, BlockPrType type); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_clear(BlockDriverState *bs, uint64_t key); +int coroutine_fn GRAPH_RDLOCK +bdrv_co_pr_preempt(BlockDriverState *bs, uint64_t cr_key, uint64_t pr_key, + BlockPrType type, bool abort); + /* Ensure contents are flushed to disk. */ int coroutine_fn GRAPH_RDLOCK bdrv_co_flush(BlockDriverState *bs); =20 diff --git a/include/block/block_int-common.h b/include/block/block_int-com= mon.h index 761276127e..6e628069e9 100644 --- a/include/block/block_int-common.h +++ b/include/block/block_int-common.h @@ -766,6 +766,87 @@ struct BlockDriver { int coroutine_fn GRAPH_RDLOCK_PTR (*bdrv_co_ioctl)( BlockDriverState *bs, unsigned long int req, void *buf); =20 + /* + * Persistent reservation series api. + * Please refer to chapter 5.9 of SCSI Primary Commands - 4 or + * chapter 7 of NVMe Base Specification 2.0. + * + * The block layer driver should implement all the following APIs + * or none at all, including: bdrv_co_pr_read_keys, + * bdrv_co_pr_read_reservation, bdrv_co_pr_register, + * bdrv_co_pr_reserve, bdrv_co_pr_release, + * bdrv_co_pr_clear and bdrv_co_pr_preempt. + * + * Read the registered keys and return them in the @keys. + * @generation: The generation of the reservation key. + * @num_keys: The maximum number of keys that can be transmitted. + * @keys: Registered keys array. + * + * On success, store generation in @generation and store keys @keys + * and return the number of @keys. + * On failure return -errno. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_read_keys)( + BlockDriverState *bs, uint32_t *generation, + uint32_t num_keys, uint64_t *keys); + /* + * Read the reservation key and store it in the @key. + * @generation: The generation of the reservation key. + * @key: The reservation key. + * @type: Type of the reservation key. + * + * On success, store generation in @generation, store the + * reservation key in @key and return the number of @key + * which used to determine whether the reservation key exists. + * On failure return -errno. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_read_reservation)( + BlockDriverState *bs, uint32_t *generation, + uint64_t *key, BlockPrType *type); + /* + * Register, unregister, or replace a reservation key. + * @old_key: The current reservation key associated with the host. + * @new_key: The new reservation Key. + * @type: Type of the reservation key. + * @ignore_key: Ignore or not @old_key. + * @ptpl: Whether to support Persist Through Power Loss(PTPL). + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_register)( + BlockDriverState *bs, uint64_t old_key, + uint64_t new_key, BlockPrType type, + bool ptpl, bool ignore_key); + /* + * Acquire a reservation on a host. + * @key: The current reservation key associated with the host. + * @type: Type of the reservation key. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_reserve)( + BlockDriverState *bs, uint64_t key, BlockPrType type); + /* + * Release a reservation on a host. + * @key: The current reservation key associated with the host. + * @type: Type of the reservation key. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_release)( + BlockDriverState *bs, uint64_t key, BlockPrType type); + /** + * Clear reservations on a host. + * @key: The current reservation key associated with the host. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_clear)( + BlockDriverState *bs, uint64_t key); + /* + * Preempt a reservation held on a host. + * @cr_key: The current reservation key associated with the host. + * @pr_key: The preempt reservation Key which to be + * unregistered from the namespace. + * @type: Type of the reservation key. + * @abort: Whether to abort a reservation held on a host. + */ + int coroutine_fn GRAPH_RDLOCK_PTR(*bdrv_co_pr_preempt)( + BlockDriverState *bs, uint64_t cr_key, + uint64_t pr_key, BlockPrType type, bool abort); + /* * Returns 0 for completed check, -errno for internal errors. * The check results are stored in result. @@ -899,6 +980,9 @@ typedef struct BlockLimits { uint32_t max_active_zones; =20 uint32_t write_granularity; + + /* Persistent reservation capacities. */ + uint8_t pr_cap; } BlockLimits; =20 typedef struct BdrvOpBlocker BdrvOpBlocker; diff --git a/include/sysemu/block-backend-io.h b/include/sysemu/block-backe= nd-io.h index d174275a5c..b3d49a3c6f 100644 --- a/include/sysemu/block-backend-io.h +++ b/include/sysemu/block-backend-io.h @@ -62,6 +62,30 @@ void blk_aio_cancel_async(BlockAIOCB *acb); BlockAIOCB *blk_aio_ioctl(BlockBackend *blk, unsigned long int req, void *= buf, BlockCompletionFunc *cb, void *opaque); =20 +BlockAIOCB *blk_aio_pr_read_keys(BlockBackend *blk, uint32_t *generation, + uint32_t num_keys, uint64_t *keys, + BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *blk_aio_pr_read_reservation(BlockBackend *blk, uint32_t *gener= ation, + uint64_t *key, BlockPrType *type, + BlockCompletionFunc *cb, void *opa= que); +BlockAIOCB *blk_aio_pr_register(BlockBackend *blk, uint64_t old_key, + uint64_t new_key, BlockPrType type, + bool ptpl, bool ignore_key, + BlockCompletionFunc *cb, + void *opaque); +BlockAIOCB *blk_aio_pr_reserve(BlockBackend *blk, uint64_t key, + BlockPrType type, + BlockCompletionFunc *cb, + void *opaque); +BlockAIOCB *blk_aio_pr_release(BlockBackend *blk, uint64_t key, + BlockPrType type, BlockCompletionFunc *cb, + void *opaque); +BlockAIOCB *blk_aio_pr_clear(BlockBackend *blk, uint64_t key, + BlockCompletionFunc *cb, void *opaque); +BlockAIOCB *blk_aio_pr_preempt(BlockBackend *blk, uint64_t cr_key, + uint64_t pr_key, BlockPrType type, bool abo= rt, + BlockCompletionFunc *cb, void *opaque); + void blk_inc_in_flight(BlockBackend *blk); void blk_dec_in_flight(BlockBackend *blk); =20 --=20 2.20.1