This series addresses the problem described in these bug reports:
https://gitlab.com/qemu-project/qemu/-/issues/1330
https://bugzilla.redhat.com/show_bug.cgi?id=2147617

qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
However, when the function is called through blk_unref(), in the case of
such errors, while an error message is written to stderr, the callers
never see an error return. Specifically, 'qemu-img bitmap/commit' are
reported to exit with an exit code 0 despite the errors.

The solution taken here is inactivating the images first, which can
still return errors, but already performs all of the write operations.
Only then the images are actually blk_unref()-ed.

If we agree that this is the way to go (as a potential alternative,
allowing blk_unref() to fail would require changes in all kinds of
places, many of which probably wouldn't even know what to do with the
error), then I suppose doing the same for other qemu-img subcommands
would make sense, too.

As a bonus fix, I found an endianness confusion in an error path of
store_bitmap(). The reported error message "qcow2_free_clusters failed:
No space left on device" looked too suspicious to ignore. Freeing an
actually existing cluster should never run into ENOSPC.

Kevin Wolf (4):
  qcow2: Fix theoretical corruption in store_bitmap() error path
  qemu-img commit: Report errors while closing the image
  qemu-img bitmap: Report errors while closing the image
  qemu-iotests: Test qemu-img bitmap/commit exit code on error

 block/qcow2-bitmap.c                          |  5 +-
 qemu-img.c                                    | 24 +++++
 .../qemu-iotests/tests/qemu-img-close-errors  | 95 +++++++++++++++++++
 .../tests/qemu-img-close-errors.out           | 23 +++++
 4 files changed, 145 insertions(+), 2 deletions(-)
 create mode 100755 tests/qemu-iotests/tests/qemu-img-close-errors
 create mode 100644 tests/qemu-iotests/tests/qemu-img-close-errors.out

-- 
2.38.1
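The ordering the cover letter describes — inactivate first, where errors can still be returned, then a void unref — can be reduced to a toy model. Everything below is illustrative (a hypothetical `Image` type and `image_*` names), not the code from the series:

```c
#include <assert.h>

/* Toy model of the fix. All names are illustrative, not QEMU's API:
 * image_inactivate() performs the fallible write-back, image_unref()
 * only tears down. */

typedef struct Image {
    int refcnt;
    int dirty;        /* pending bitmap data to write back */
    int disk_full;    /* simulate ENOSPC on write-back */
    int inactivated;
} Image;

/* Can fail; performs all of the write operations. */
int image_inactivate(Image *img)
{
    if (img->inactivated) {
        return 0;
    }
    if (img->dirty && img->disk_full) {
        return -1;                /* error is visible to the caller */
    }
    img->dirty = 0;
    img->inactivated = 1;
    return 0;
}

/* Cannot fail: by the time the last ref goes away, either nothing is
 * dirty or the failure was already reported by image_inactivate(). */
void image_unref(Image *img)
{
    if (--img->refcnt == 0) {
        img->inactivated = 1;     /* close path; no writes left to fail */
    }
}

/* What a qemu-img subcommand does after the fix, schematically. */
int img_subcommand_exit_code(Image *img)
{
    int ret = image_inactivate(img);  /* report errors here ... */
    image_unref(img);                 /* ... then tear down silently */
    return ret < 0 ? 1 : 0;
}
```

The point of the split is that the error-producing work happens while the caller still has a return path for it; the unref that follows has nothing left that can fail.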
On 12.01.23 20:14, Kevin Wolf wrote:
> This series addresses the problem described in these bug reports:
> https://gitlab.com/qemu-project/qemu/-/issues/1330
> https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>
> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> However, when the function is called through blk_unref(), in the case of
> such errors, while an error message is written to stderr, the callers
> never see an error return. Specifically, 'qemu-img bitmap/commit' are
> reported to exit with an exit code 0 despite the errors.
>
> The solution taken here is inactivating the images first, which can
> still return errors, but already performs all of the write operations.
> Only then the images are actually blk_unref()-ed.

Reviewed-by: Hanna Czenczek <hreitz@redhat.com>
Drive-by comment...

Kevin Wolf <kwolf@redhat.com> writes:

> This series addresses the problem described in these bug reports:
> https://gitlab.com/qemu-project/qemu/-/issues/1330
> https://bugzilla.redhat.com/show_bug.cgi?id=2147617
>
> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> However, when the function is called through blk_unref(), in the case of
> such errors, while an error message is written to stderr, the callers
> never see an error return. Specifically, 'qemu-img bitmap/commit' are
> reported to exit with an exit code 0 despite the errors.

After having read the "potential alternative" below, I figure this
failure happens within blk_unref(). But I can't see a call chain. Am I
confused?

> The solution taken here is inactivating the images first, which can
> still return errors, but already performs all of the write operations.
> Only then the images are actually blk_unref()-ed.
>
> If we agree that this is the way to go (as a potential alternative,
> allowing blk_unref() to fail would require changes in all kinds of
> places, many of which probably wouldn't even know what to do with the
> error),

blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
Correct?

We have a bunch of "unref" functions in the tree, and, as far as I can
tell from a quick grep, none of them can fail. Supports your apparent
preference for not changing blk_unref().

> then I suppose doing the same for other qemu-img subcommands
> would make sense, too.

I was about to ask whether there could be more silent failures like the
ones in commit and bitmap. This suggests there are.

Say we do the same for all known such failures. Would any remaining (or
new) such failures be programming errors?

[...]
Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben:
> Drive-by comment...
>
> Kevin Wolf <kwolf@redhat.com> writes:
>
> > This series addresses the problem described in these bug reports:
> > https://gitlab.com/qemu-project/qemu/-/issues/1330
> > https://bugzilla.redhat.com/show_bug.cgi?id=2147617
> >
> > qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate().
> > However, when the function is called through blk_unref(), in the case of
> > such errors, while an error message is written to stderr, the callers
> > never see an error return. Specifically, 'qemu-img bitmap/commit' are
> > reported to exit with an exit code 0 despite the errors.
>
> After having read the "potential alternative" below, I figure this
> failure happens within blk_unref(). But I can't see a call chain. Am I
> confused?

When I put an abort() into the error path:

#0  0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6
#1  0x00007ffff6a54d76 in raise () from /lib64/libc.so.6
#2  0x00007ffff6a287f3 in abort () from /lib64/libc.so.6
#3  0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705
#4  0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741
#5  0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770
#6  0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939
#7  0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330
#8  0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850
#9  0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207
#10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895
#11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479
#12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537
#13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820
#14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450

> > The solution taken here is inactivating the images first, which can
> > still return errors, but already performs all of the write operations.
> > Only then the images are actually blk_unref()-ed.
> >
> > If we agree that this is the way to go (as a potential alternative,
> > allowing blk_unref() to fail would require changes in all kinds of
> > places, many of which probably wouldn't even know what to do with the
> > error),
>
> blk_unref() could fail only when it destroys @blk (refcnt goes to zero).
> Correct?

I think so, yes.

> We have a bunch of "unref" functions in the tree, and, as far as I can
> tell from a quick grep, none of them can fail. Supports your apparent
> preference for not changing blk_unref().
>
> > then I suppose doing the same for other qemu-img subcommands
> > would make sense, too.
>
> I was about to ask whether there could be more silent failures like the
> ones in commit and bitmap. This suggests there are.
>
> Say we do the same for all known such failures. Would any remaining (or
> new) such failures be programming errors?

Let's be honest: What I'm proposing here is not pretty and not a full
solution, it only covers the simplest part of the problem, which happens
to be the part that has shown up in practice.

If you have a good idea how to solve the general problem, I'm all ears.

I haven't checked other qemu-img subcommands, but I don't see why they
wouldn't be able to run into an error in .bdrv_close. They could be
fixed the same way.

The next level in difficulty might be QMP block-delete. It's still easy
because like in qemu-img, we know that we're freeing the last reference,
and so we could actually do the same here. Well, more or less the same
at least: Obviously not inactivate_all(), but just for a single node.
We also need to do this recursively for children, except only for those
that would actually go away together with our parent node and aren't
referenced elsewhere. Even if we manage to implement this correctly,
what do we do with the error? Would returning a QMP error imply that we
didn't actually close the image and it's still valid (and not
inactivated)?

Too easy? Let's make it a bit harder. Let's say a commit job completes
and we're now removing the intermediate nodes. One of these images could
in theory fail in .bdrv_close. We have successfully committed the data,
the new graph is ready and in good state. Just one of the old images
we're throwing out runs into ENOSPC in its .bdrv_close. Where do we
report that error? We don't even necessarily have a QMP command here, we
could only let the whole block job fail, which is probably not a good
way to let libvirt know what was happening. Also, we can't just
unconditionally inactivate the image beforehand there, it might still be
in use by other references. Which may actually be dropped while we're
draining the node in bdrv_close().

Not enough headaches yet? There are plenty of places in QEMU that just
want to make sure that the node doesn't go away while they are still
doing something with it. So they use a bdrv_ref/unref pair locally.
These places could end up freeing the last reference if the node would
have gone away otherwise. They are almost certainly a very confusing
place to report the error. They might not even be places that can return
errors at all currently.

So the main reason why I'm not doing this properly by returning the
errors from qcow2_close() (and .bdrv_close in all other drivers) through
bdrv_unref() down to the callers of that is not only that it would be a
major conversion that would touch lots of places, but also that I
wouldn't even know what to do with the error in most callers. And that
I'm not sure what the semantics of an error in a close function should
be.
Another thing that could be tried is making failure in .bdrv_close less
likely by doing things earlier. At least ENOSPC could probably be
avoided if dirty bitmap clusters were allocated during the write
request that first sets a bit in them (I know too little about the
details of how bitmaps are implemented in qcow2, though, maybe Vladimir
can help here). But ultimately, you'll always get some I/O requests in
.bdrv_close and they could fail even if we made it less likely.

Kevin
Kevin Wolf <kwolf@redhat.com> writes: > Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben: >> Drive-by comment... >> >> Kevin Wolf <kwolf@redhat.com> writes: >> >> > This series addresses the problem described in these bug reports: >> > https://gitlab.com/qemu-project/qemu/-/issues/1330 >> > https://bugzilla.redhat.com/show_bug.cgi?id=2147617 >> > >> > qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate(). >> > However, when the function is called through blk_unref(), in the case of >> > such errors, while an error message is written to stderr, the callers >> > never see an error return. Specifically, 'qemu-img bitmap/commit' are >> > reported to exit with an exit code 0 despite the errors. >> >> After having tead the "potential alternative" below, I figure this >> failure happens within blk_unref(). But I can't see a call chain. Am I >> confused? > > When I put an abort() into the error path: > > #0 0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6 > #1 0x00007ffff6a54d76 in raise () from /lib64/libc.so.6 > #2 0x00007ffff6a287f3 in abort () from /lib64/libc.so.6 > #3 0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705 > #4 0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741 > #5 0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770 > #6 0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939 > #7 0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330 > #8 0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850 > #9 0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207 > #10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895 > #11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479 > #12 0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at 
../block/block-backend.c:537 > #13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820 > #14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450 Thanks! >> > The solution taken here is inactivating the images first, which can >> > still return errors, but already performs all of the write operations. >> > Only then the images are actually blk_unref()-ed. >> > >> > If we agree that this is the way to go (as a potential alternative, >> > allowing blk_unref() to fail would require changes in all kinds of >> > places, many of which probably wouldn't even know what to do with the >> > error), >> >> blk_unref() could fail only when it destroys @blk (refcnt goes to zero). >> Correct? > > I think so, yes. Thanks again! >> We have a bunch of "unref" functions in the tree, and, as far as I can >> tell from a quick grep, none of them can fail. Supports your apparent >> preference for not changing blk_unref(). >> >> > then I suppose doing the same for other qemu-img subcommands >> > would make sense, too. >> >> I was about to ask whether there could be more silent failures like the >> ones in commit and bitmap. This suggests there are. >> >> Say we do the same for all known such failures. Would any remaining (or >> new) such failures be programming errors? > > Let's be honest: What I'm proposing here is not pretty and not a full > solution, it only covers the simplest part of the problem, which happens > to be the part that has shown up in practice. > > If you have a good idea how to solve the general problem, I'm all ears. > > I haven't checked other qemu-img subcommands, but I don't see why they > wouldn't be able to run into an error in .bdrv_close. They could be > fixed the same way. > > The next level in difficulty might be QMP block-delete. It's still easy > because like in qemu-img, we know that we're freeing the last reference, > and so we could actually do the same here. 
Well, more or less the same > at least: Obviously not inactivate_all(), but just for a single node. We > also need to do this recursively for children, except only for those > that would actually go away together with our parent node and aren't > referenced elsewhere. Even if we manage to implement this correctly, > what do we do with the error? Would returning a QMP error imply that we > didn't actually close the image and it's still valid (and not > inactivated)? > > Too easy? Let's make it a bit harder. Let's say a commit job completes > and we're now removing the intermediate nodes. One of these images could > in theory fail in .bdrv_close. We have successfully committed the data, > the new graph is ready and in good state. Just one of the old images > we're throwing out runs into ENOSPC in its .bdrv_close. Where do we > report that error? We don't even necessarily have a QMP command here, we > could only let the whole block job fail, which is probably not a good > way to let libvirt know what was happening. Also, we can't just > unconditionally inactivate the image beforehand there, it might still be > in use by other references. Which may actually be dropped while we're > draining the node in bdrv_close(). > > Not enough headaches yet? There are plenty of places in QEMU that just > want to make sure that the node doesn't go away while they are still > doing something with it. So they use a bdrv_ref/unref pair locally. > These places could end up freeing the last reference if the node would > have gone away otherwise. They are almost certainly a very confusing > place to report the error. They might not even be places that can return > errors at all currently. Yes. 
> So the main reason why I'm not doing this properly by returning the > errors from qcow2_close() (and .bdrv_close in all other drivers) through > bdrv_unref() down to the callers of that is not only that it would be a > major conversion that would touch lots of places, but also that I > wouldn't even know what to do with the error in most callers. And that > I'm not sure what the semantics of an error in a close function should > be. Understand. > Another thing that could be tried is making failure in .bdrv_close less > likely by doing things earlier. At least ENOSPC could probably be > avoided if dirty bitmaps clusters were allocated during the write > request that first sets a bit in them (I know too little about the > details how bitmaps are implemented in qcow2, though, maybe Vladimir can > help here). But ultimately, you'll always get some I/O requests in > .bdrv_close and they could fail even if we made it less likely. Let me try to summarize to make sure I understand. Closing an image can fail for the same reason close() can fail: flushing caches can fail, and not caching is not an option. The close is commonly hidden within a bdrv_unref(). It closes when the last reference goes away. Sometimes we know which bdrv_unref() will close. Sometimes we don't. Some bdrv_unref() callers can report errors sanely. Others simply can't. Some failures to close can be safely ignored, such as closing a temporary image that is going away anyway. But it's hard to tell when this is the case. Ideally, things fail cleanly: we either do what's asked and succeed, or do nothing and fail. A failure to close is commonly unclean. So, even if we can report it, recovery can be hard or impossible. A common criticism of garbage collection is that finalization is delayed and runs "out of context". The above shows that reference counting isn't all that better. We could have two variants of bdrv_unref(), one that must not fail, and one that can fail and must be checked. 
But as you explained, ensuring failure only happens in places where we can handle an error sanely is somewhere between hard and impossible. No better ideas, I'm afraid.
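The two-variant idea Markus floats could look roughly like this in C (hypothetical `node_*` names; not an actual QEMU interface, just a sketch of the contract):

```c
#include <assert.h>
#include <stdio.h>

/* Sketch of the two-variant idea: a checked unref for callers that can
 * handle an error sanely, and a void unref for ref/unref guard brackets
 * that have nowhere to report one. Hypothetical names throughout. */

typedef struct Node {
    int refcnt;
    int close_fails;   /* simulate a flush error in the close path */
} Node;

int node_close(Node *n)
{
    return n->close_fails ? -1 : 0;
}

/* Can fail, and the result must be checked. */
int node_unref_checked(Node *n)
{
    if (--n->refcnt == 0) {
        return node_close(n);
    }
    return 0;
}

/* Must not fail; all it can do with a close error is log it. */
void node_unref(Node *n)
{
    if (node_unref_checked(n) < 0) {
        fprintf(stderr, "close failed, error dropped\n");
    }
}
```

The sketch also shows the hard part the thread identifies: the checked variant only produces an error when it happens to drop the last reference, and the caller usually cannot know in advance whether it will be the one.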
On 15.02.23 16:07, Markus Armbruster wrote: > Kevin Wolf <kwolf@redhat.com> writes: > >> Am 13.01.2023 um 08:30 hat Markus Armbruster geschrieben: >>> Drive-by comment... >>> >>> Kevin Wolf <kwolf@redhat.com> writes: >>> >>>> This series addresses the problem described in these bug reports: >>>> https://gitlab.com/qemu-project/qemu/-/issues/1330 >>>> https://bugzilla.redhat.com/show_bug.cgi?id=2147617 >>>> >>>> qcow2 can fail when writing back dirty bitmaps in qcow2_inactivate(). >>>> However, when the function is called through blk_unref(), in the case of >>>> such errors, while an error message is written to stderr, the callers >>>> never see an error return. Specifically, 'qemu-img bitmap/commit' are >>>> reported to exit with an exit code 0 despite the errors. >>> >>> After having tead the "potential alternative" below, I figure this >>> failure happens within blk_unref(). But I can't see a call chain. Am I >>> confused? >> >> When I put an abort() into the error path: >> >> #0 0x00007ffff6aa156c in __pthread_kill_implementation () from /lib64/libc.so.6 >> #1 0x00007ffff6a54d76 in raise () from /lib64/libc.so.6 >> #2 0x00007ffff6a287f3 in abort () from /lib64/libc.so.6 >> #3 0x00005555556108f3 in qcow2_inactivate (bs=0x555555879a30) at ../block/qcow2.c:2705 >> #4 0x0000555555610a08 in qcow2_do_close (bs=0x555555879a30, close_data_file=true) at ../block/qcow2.c:2741 >> #5 0x0000555555610b38 in qcow2_close (bs=0x555555879a30) at ../block/qcow2.c:2770 >> #6 0x00005555555a1b4e in bdrv_close (bs=0x555555879a30) at ../block.c:4939 >> #7 0x00005555555a2ad4 in bdrv_delete (bs=0x555555879a30) at ../block.c:5330 >> #8 0x00005555555a5b49 in bdrv_unref (bs=0x555555879a30) at ../block.c:6850 >> #9 0x000055555559d6c5 in bdrv_root_unref_child (child=0x555555873300) at ../block.c:3207 >> #10 0x00005555555c7beb in blk_remove_bs (blk=0x5555558796e0) at ../block/block-backend.c:895 >> #11 0x00005555555c6c3f in blk_delete (blk=0x5555558796e0) at ../block/block-backend.c:479 >> #12 
0x00005555555c6fb0 in blk_unref (blk=0x5555558796e0) at ../block/block-backend.c:537 >> #13 0x0000555555587dc9 in img_bitmap (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:4820 >> #14 0x0000555555589807 in main (argc=7, argv=0x7fffffffd760) at ../qemu-img.c:5450 > > Thanks! > >>>> The solution taken here is inactivating the images first, which can >>>> still return errors, but already performs all of the write operations. >>>> Only then the images are actually blk_unref()-ed. >>>> >>>> If we agree that this is the way to go (as a potential alternative, >>>> allowing blk_unref() to fail would require changes in all kinds of >>>> places, many of which probably wouldn't even know what to do with the >>>> error), >>> >>> blk_unref() could fail only when it destroys @blk (refcnt goes to zero). >>> Correct? >> >> I think so, yes. > > Thanks again! > >>> We have a bunch of "unref" functions in the tree, and, as far as I can >>> tell from a quick grep, none of them can fail. Supports your apparent >>> preference for not changing blk_unref(). >>> >>>> then I suppose doing the same for other qemu-img subcommands >>>> would make sense, too. >>> >>> I was about to ask whether there could be more silent failures like the >>> ones in commit and bitmap. This suggests there are. >>> >>> Say we do the same for all known such failures. Would any remaining (or >>> new) such failures be programming errors? >> >> Let's be honest: What I'm proposing here is not pretty and not a full >> solution, it only covers the simplest part of the problem, which happens >> to be the part that has shown up in practice. >> >> If you have a good idea how to solve the general problem, I'm all ears. >> >> I haven't checked other qemu-img subcommands, but I don't see why they >> wouldn't be able to run into an error in .bdrv_close. They could be >> fixed the same way. >> >> The next level in difficulty might be QMP block-delete. 
It's still easy >> because like in qemu-img, we know that we're freeing the last reference, >> and so we could actually do the same here. Well, more or less the same >> at least: Obviously not inactivate_all(), but just for a single node. We >> also need to do this recursively for children, except only for those >> that would actually go away together with our parent node and aren't >> referenced elsewhere. Even if we manage to implement this correctly, >> what do we do with the error? Would returning a QMP error imply that we >> didn't actually close the image and it's still valid (and not >> inactivated)? >> >> Too easy? Let's make it a bit harder. Let's say a commit job completes >> and we're now removing the intermediate nodes. One of these images could >> in theory fail in .bdrv_close. We have successfully committed the data, >> the new graph is ready and in good state. Just one of the old images >> we're throwing out runs into ENOSPC in its .bdrv_close. Where do we >> report that error? We don't even necessarily have a QMP command here, we >> could only let the whole block job fail, which is probably not a good >> way to let libvirt know what was happening. Also, we can't just >> unconditionally inactivate the image beforehand there, it might still be >> in use by other references. Which may actually be dropped while we're >> draining the node in bdrv_close(). >> >> Not enough headaches yet? There are plenty of places in QEMU that just >> want to make sure that the node doesn't go away while they are still >> doing something with it. So they use a bdrv_ref/unref pair locally. >> These places could end up freeing the last reference if the node would >> have gone away otherwise. They are almost certainly a very confusing >> place to report the error. They might not even be places that can return >> errors at all currently. > > Yes. 
> >> So the main reason why I'm not doing this properly by returning the >> errors from qcow2_close() (and .bdrv_close in all other drivers) through >> bdrv_unref() down to the callers of that is not only that it would be a >> major conversion that would touch lots of places, but also that I >> wouldn't even know what to do with the error in most callers. And that >> I'm not sure what the semantics of an error in a close function should >> be. > > Understand. > >> Another thing that could be tried is making failure in .bdrv_close less >> likely by doing things earlier. At least ENOSPC could probably be >> avoided if dirty bitmaps clusters were allocated during the write >> request that first sets a bit in them (I know too little about the >> details how bitmaps are implemented in qcow2, though, maybe Vladimir can >> help here). But ultimately, you'll always get some I/O requests in >> .bdrv_close and they could fail even if we made it less likely. > > Let me try to summarize to make sure I understand. > > Closing an image can fail for the same reason close() can fail: flushing > caches can fail, and not caching is not an option. > > The close is commonly hidden within a bdrv_unref(). It closes when the > last reference goes away. > > Sometimes we know which bdrv_unref() will close. Sometimes we don't. > > Some bdrv_unref() callers can report errors sanely. Others simply > can't. > > Some failures to close can be safely ignored, such as closing a > temporary image that is going away anyway. But it's hard to tell when > this is the case. > > Ideally, things fail cleanly: we either do what's asked and succeed, or > do nothing and fail. A failure to close is commonly unclean. So, even > if we can report it, recovery can be hard or impossible. > > > A common criticism of garbage collection is that finalization is delayed > and runs "out of context". The above shows that reference counting > isn't all that better. 
> We could have two variants of bdrv_unref(), one that must not fail, and
> one that can fail and must be checked. But as you explained, ensuring
> failure only happens in places where we can handle an error sanely is
> somewhere between hard and impossible.
>
> No better ideas, I'm afraid.

Let me just add my thought: if the user worries about correctly closing
any block node, the user should control when the node is closed. And we
have all the instruments for that: the user should create each node
personally by blockdev-add (or -blockdev), and then, before terminating
the QEMU process, do blockdev-del correspondingly for all nodes. This
way the moment when the node is finally closed is obvious: it's
blockdev-del, where the user can get an appropriate error message (and
retry deletion if needed).

To achieve this we need a similar additional bdrv_inactivate() call in
qmp_blockdev_del() to report an error (and no bdrv_unref() in case of
failed inactivation). And we can add a boolean "force" argument to
blockdev-del to skip this additional bdrv_inactivate() call.

(Yes, this will not work for qemu-img. But qsd may be used instead to
create more reliable scenarios.)

-- 
Best regards,
Vladimir
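The blockdev-del semantics Vladimir suggests can be modelled in a few lines (hypothetical names; the real qmp_blockdev_del() signature and error reporting differ):

```c
#include <assert.h>

/* Model of the suggested blockdev-del semantics: inactivate first so an
 * error reaches the user, keep the node on failure so deletion can be
 * retried, and let force skip the check. Hypothetical names. */

typedef struct BlkNode {
    int refcnt;
    int inactivate_fails;   /* simulate ENOSPC while writing bitmaps back */
} BlkNode;

int blknode_inactivate(BlkNode *n)
{
    return n->inactivate_fails ? -1 : 0;
}

void blknode_unref(BlkNode *n)
{
    n->refcnt--;
}

int blockdev_del(BlkNode *n, int force)
{
    if (blknode_inactivate(n) < 0 && !force) {
        return -1;          /* report the error; node stays, user may retry */
    }
    blknode_unref(n);
    return 0;
}
```

The key property is that a failed deletion leaves the node in place, so the error is both visible and recoverable, which a void unref can never offer.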
A half-baked thought has been sloshing around in my head. Perhaps I can
bake it some more by writing it up.

Reference-counting and finalizers that can fail are uneasy partners.

When managing lifetimes manually, you control where finalization
happens. When finalization can fail, you're as empowered as you could
be to make it fail in a place where you can handle the failure sensibly.

Manual resource management is tedious and error prone, and that's a
serious problem. Garbage collection takes it off your hands. Good.
But now finalization happens at some future time, and in garbage
collection context. Fine when finalization's side effects are
sufficiently harmless. But what if finalization can fail? We trade one
serious problem (manual resource management) for another one (handling
finalization failures).

Reference counting is slightly different. Here, finalization can only
happen at unref, which means you retain more control than with garbage
collection. However, we do need unrefs in places where we can't
sensibly handle failure. For instance, when code operates on an object
whose reference count can be dropped concurrently, we need to guard with
a ref/unref bracket to keep the object alive while the code is messing
with it.

The only way out I can see is to systematically avoid finalizers that
can fail, by extracting the part that can fail into a shutdown method,
to be called in a suitable context, and before finalization. Yes, this
takes us back to manual resource management, only we manage shutdown
instead of death.

Finalizing something that has not been shut down would be a programming
error. A recoverable one, I guess; we can have finalize attempt to shut
down then, and if it fails, just weep into the logs and move on.

We gain a "shut down" state, and new problems may well come with it.
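The shutdown-before-finalize split described above might be sketched like this (made-up `Obj`/`obj_*` names, purely for illustration):

```c
#include <assert.h>
#include <stdio.h>

/* The split described above: obj_shutdown() carries the fallible work
 * and is meant to be called explicitly, in a context that can handle
 * failure; obj_finalize() must not fail and only retries-and-logs when
 * shutdown was missed. Names are made up for illustration. */

typedef struct Obj {
    int shut_down;
    int flush_fails;   /* simulate a failing flush during shutdown */
} Obj;

int obj_shutdown(Obj *o)
{
    if (o->shut_down) {
        return 0;                     /* idempotent */
    }
    if (o->flush_fails) {
        return -1;                    /* caller can recover or report */
    }
    o->shut_down = 1;
    return 0;
}

void obj_finalize(Obj *o)
{
    if (!o->shut_down && obj_shutdown(o) < 0) {
        /* programming error: weep into the logs and move on */
        fprintf(stderr, "finalize: shutdown failed, error lost\n");
    }
    /* release memory, drop references, etc. */
}
```

The "shut down" state mentioned at the end shows up here as the `shut_down` flag: every other method would now have to decide what it means to be called on a shut-down object.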
On Wed, Feb 22, 2023 at 01:08:05PM +0100, Markus Armbruster wrote: > A half-baked thought has been sloshing around in my head. Perhaps I can > bake it some more by writing it up. > > Reference-counting and finalizers that can fail are uneasy partners. > > When managing lifetimes manually, you control where finalization > happens. When finalization can fail, you're as empowered as you could > be to make it fail in a place where you can handle the failure sensibly. > > Manual resource management is tedious and error prone, and that's a > serious problem. Garbage collection takes it off your hands. Good. > But now finalization happens at some future time, and in garbage > collection context. Fine when finalization's side effects are > sufficiently harmless. But what if finalization can fail? We trade one > serious problem (manual resource management) for another one (handling > finalization failures). > > Reference counting is slightly different. Here, finalization can only > happen at unref, which means you retain more control than with garbage > collection. However, we do need unrefs in places where we can't > sensibly handle failure. For instance, when code operates on an object > whose reference count can be dropped concurrently, we need to guard with > a ref/unref bracket to keep the object alive while the code is messing > with it. > > The only way out I can see is to systematically avoid finalizers that > can fail, by extracting the part that can fail into a shutdown method, > to be called in a suitable context, and before finalization. Yes, I concur with pretty much everything you say above. Since finalizers can occur in any context the logic that runs in them needs to be quite clearly defined and free of side effects or unpredictable failure scenarios. 
I would probably go further and say that finalizers need to be able to
execute in finite time, so that callers do not have execution of their
thread blocked arbitrarily if they happen to be the one that releases
the last reference.

Finalizers should be releasing resources that are already in a "safe"
state. Releasing memory, decrementing references, unregistering
callbacks, are typical safe things. Performing I/O is clearly a bad
idea / inappropriate for a finalizer by this standard.

Garbage collection vs reference counts is tangential to this problem;
as you say, they'll both share the same problem we're facing.

> Yes, this takes us back to manual resource management, only we manage
> shutdown instead of death.
>
> Finalizing something that has not been shut down would be a programming
> error. A recoverable one, I guess; we can have finalize attempt to shut
> down then, and if it fails, just weep into the logs and move on.

This is approximately what I did with QIOChannel. There is a
qio_channel_close() method that is best practice to invoke to release
resources associated with the channel, possibly flushing pending I/O,
and reporting failures via an Error **errp.

If this is not called, however, the finalizer will call close on your
behalf, discarding errors. It'll probably be OK much of the time, and
if we find it isn't, then the missing explicit close call needs to be
addressed.

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|
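The contract Daniel describes can be reduced to a small sketch. The `Err` type and `chan_*` names below are simplified stand-ins, not the real qio_channel_* API:

```c
#include <assert.h>
#include <stddef.h>

/* Simplified model of the QIOChannel contract: an explicit close that
 * reports errors through an out-parameter, and a finalizer that falls
 * back to a best-effort close, discarding errors. Stand-in types only. */

typedef struct Err {
    const char *msg;
} Err;

typedef struct Chan {
    int open;
    int flush_fails;   /* simulate pending I/O that fails to flush */
} Chan;

int chan_close(Chan *c, Err *errp)
{
    if (!c->open) {
        return 0;                      /* closing twice is harmless */
    }
    c->open = 0;
    if (c->flush_fails) {
        if (errp) {
            errp->msg = "flushing pending I/O failed";
        }
        return -1;
    }
    return 0;
}

/* Last-resort cleanup: closes on your behalf, errors are discarded. */
void chan_finalize(Chan *c)
{
    chan_close(c, NULL);
    /* freeing the object would go here */
}
```

Callers that care about the flush outcome call `chan_close()` themselves and check it; the finalizer only guarantees the resource is released, not that the release succeeded.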
On 13.01.23 14:29, Kevin Wolf wrote:
> Another thing that could be tried is making failure in .bdrv_close less
> likely by doing things earlier. At least ENOSPC could probably be
> avoided if dirty bitmaps clusters were allocated during the write
> request that first sets a bit in them (I know too little about the
> details how bitmaps are implemented in qcow2, though, maybe Vladimir can
> help here).

That's possible but not trivial :)

qcow2 does nothing with dirty bitmaps during normal operation. Only on
close, it finds all persistent bitmaps and stores them somehow, mostly
allocating new clusters on the fly. So the simplest way looks like:

- add a generic handler .bitmap_changed in BlockDriver, to handle bitmap
  changes in qcow2 (that's not only writes, but may be a bitmap_merge
  operation).

- in the new handler, allocate some clusters to produce a pool for dirty
  bitmap saving (we will need clusters for bitmap data and metadata
  (bitmap table, bitmap directory))

- in block/qcow2-bitmap.c, switch qcow2_alloc_cluster() to a wrapper
  that first tries to get a cluster from the pool and, if it's empty,
  falls back to qcow2_alloc_cluster()

Note also that this will increase fragmentation. Or, maybe more
effective would be to preallocate clusters on bitmap creation (and
therefore on image resize).

More difficult would be reworking the whole code to bind allocated
clusters to each persistent dirty bitmap.

-- 
Best regards,
Vladimir
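The pool idea in the steps above might be sketched like this. Everything here is hypothetical: the real qcow2 allocator and bitmap code are far more involved, and `cluster_alloc()` only stands in for qcow2_alloc_cluster():

```c
#include <assert.h>

/* Hypothetical sketch of the pool idea: reserve clusters from the write
 * path (where ENOSPC can still be reported sanely), and let the
 * close-time bitmap store draw from the pool first. */

#define POOL_MAX   16
#define ENOSPC_ERR (-28)

typedef struct Pool {
    long clusters[POOL_MAX];
    int n;
} Pool;

/* Stand-in for qcow2_alloc_cluster(); space_left simulates the disk. */
long cluster_alloc(int *space_left)
{
    if (*space_left <= 0) {
        return ENOSPC_ERR;
    }
    (*space_left)--;
    return 0x10000L * (*space_left + 1);   /* fake host offset */
}

/* Called from a .bitmap_changed-style handler, where failure is
 * reportable to the originating request. */
int pool_reserve(Pool *p, int need, int *space_left)
{
    while (p->n < need && p->n < POOL_MAX) {
        long c = cluster_alloc(space_left);
        if (c < 0) {
            return (int)c;
        }
        p->clusters[p->n++] = c;
    }
    return 0;
}

/* Called from the close path: prefer the pool, fall back to allocating. */
long pool_get(Pool *p, int *space_left)
{
    if (p->n > 0) {
        return p->clusters[--p->n];
    }
    return cluster_alloc(space_left);
}
```

With reservations done eagerly, the close path can only hit ENOSPC if the pool was undersized, which is exactly the "less likely, not impossible" trade-off Kevin points out.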