Technically, this is v4 of:

https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html

However, this implements a different approach than any of the previous
versions.

One of the problems with the previous version was that it was too
complicated. The main reason for that was that we could not close the
connection whilst there was a file locked. So we had to invent a
mechanism that would prevent that (on the client side).

These patches implement a different approach. They rely on the
secdriver's transactions, which bring all the paths we want to label
into one place so that they can be relabelled within a different
namespace. I'm extending this idea so that transactions run all the
time (regardless of domain namespacing) and only at the very last
moment is it decided which namespace the relabelling runs in.

Metadata locking is then as easy as putting lock/unlock calls around
one function.

You can find the patches at my github too:

https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt

Michal Prívozník (23):
  qemu_security: Fully implement qemuSecurityDomainSetPathLabel
  qemu_security: Fully implement qemuSecurity{Set,Restore}SavedStateLabel
  qemu_security: Require full wrappers for APIs that might touch a file
  virSecurityManagerTransactionCommit: Accept pid == -1
  qemu_security: Run transactions more frequently
  virlockspace: Allow caller to specify start and length offset in
    virLockSpaceAcquireResource
  lock_driver_lockd: Introduce
    VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_METADATA flag
  lock_driver: Introduce new VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON
  _virLockManagerLockDaemonPrivate: Move @hasRWDisks into dom union
  lock_driver: Introduce VIR_LOCK_MANAGER_RESOURCE_TYPE_METADATA
  lock_driver: Introduce VIR_LOCK_MANAGER_ACQUIRE_ROLLBACK
  lock_daemon_dispatch: Check for ownerPid rather than ownerId
  lock_manager: Allow disabling configFile for virLockManagerPluginNew
  qemu_conf: Introduce metadata_lock_manager
  security_manager: Load lock plugin on init
  security_manager: Introduce metadata locking APIs
  security_dac: Move transaction handling up one level
  security_dac: Fix info messages when chown()-ing
  security_dac: Lock metadata when running transaction
  virSecuritySELinuxRestoreFileLabel: Rename 'err' label
  virSecuritySELinuxRestoreFileLabel: Adjust code pattern
  security_selinux: Move transaction handling up one level
  security_selinux: Lock metadata when running transaction

 cfg.mk                             |   4 +-
 src/locking/lock_daemon_dispatch.c |  25 ++-
 src/locking/lock_driver.h          |  12 ++
 src/locking/lock_driver_lockd.c    | 417 +++++++++++++++++++++++++------------
 src/locking/lock_driver_lockd.h    |   1 +
 src/locking/lock_driver_sanlock.c  |  44 ++--
 src/locking/lock_manager.c         |  10 +-
 src/lxc/lxc_controller.c           |   3 +-
 src/lxc/lxc_driver.c               |   2 +-
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf                 |   8 +
 src/qemu/qemu_conf.c               |  13 ++
 src/qemu/qemu_conf.h               |   1 +
 src/qemu/qemu_domain.c             |   3 +-
 src/qemu/qemu_driver.c             |  10 +-
 src/qemu/qemu_process.c            |  15 +-
 src/qemu/qemu_security.c           | 272 +++++++++++++++++-------
 src/qemu/qemu_security.h           |  18 +-
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 src/security/security_dac.c        | 134 ++++++++----
 src/security/security_manager.c    | 171 ++++++++++++++-
 src/security/security_manager.h    |   9 +
 src/security/security_selinux.c    | 118 ++++++++---
 src/util/virlockspace.c            |  15 +-
 src/util/virlockspace.h            |   4 +
 tests/seclabeltest.c               |   2 +-
 tests/securityselinuxlabeltest.c   |   2 +-
 tests/securityselinuxtest.c        |   2 +-
 tests/testutilsqemu.c              |   2 +-
 tests/virlockspacetest.c           |  29 ++-
 30 files changed, 1006 insertions(+), 342 deletions(-)

--
2.16.4

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
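[Editorial sketch, not part of the series: the "lock/unlock calls around one function" idea above can be illustrated with plain POSIX byte-range locks. This is hedged, illustrative Python rather than libvirt's actual virtlockd RPC code; `metadata_locked`, `METADATA_OFF`, and `METADATA_LEN` are hypothetical names, echoing the series' extension of virLockSpaceAcquireResource to take a start offset and length so metadata locks can use a different byte range than content locks.]

```python
import fcntl
import os
from contextlib import contextmanager

# Illustrative offset/length for the "metadata" byte range; the real series
# plumbs a start/length pair through virLockSpaceAcquireResource.
METADATA_OFF, METADATA_LEN = 1, 1

@contextmanager
def metadata_locked(paths):
    """Hold an exclusive byte-range lock on every path for the duration of
    the caller's relabel operation -- the lock/relabel/unlock flow the
    cover letter describes, reduced to fcntl advisory locks."""
    fds = []
    try:
        for p in paths:
            fd = os.open(p, os.O_RDWR)
            # Exclusive lock on the metadata byte range only.
            fcntl.lockf(fd, fcntl.LOCK_EX, METADATA_LEN, METADATA_OFF,
                        os.SEEK_SET)
            fds.append(fd)
        yield
    finally:
        for fd in fds:
            fcntl.lockf(fd, fcntl.LOCK_UN, METADATA_LEN, METADATA_OFF,
                        os.SEEK_SET)
            os.close(fd)
```

Usage would then be exactly one wrapper around the transaction commit: `with metadata_locked(disk_paths): relabel(disk_paths)`.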
On 09/10/2018 11:36 AM, Michal Privoznik wrote:
> [...]

Hopefully, fixed all the nits John found, and pushed. Thanks John for
the review!

Michal
Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
> Technically, this is v4 of:
>
> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html
>
> [...]
>
> You can find the patches at my github too:
>
> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt

Hey Michal,

I was running a quick test with this patch series with two domains
sharing a disk image without <shareable/> and SELinux enabled. When
starting the second domain, the whole libvirtd daemon hangs for almost
a minute until giving the error that the image is locked. I haven't
debugged it yet to figure out what happens.

Otherwise it's looking good, relabeling is prevented as expected.

Bjoern
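[Editorial note: for readers unfamiliar with the scenario, sharing one disk image between two domains requires the `<shareable/>` element in each domain's disk definition. A minimal sketch of such a disk element follows; the source path and target device are illustrative, not taken from Bjoern's setup.]

```xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/shared.qcow2'/>
  <target dev='vdb' bus='virtio'/>
  <shareable/>
</disk>
```

Without `<shareable/>`, libvirt treats the image as exclusively owned, which is exactly the contended case being tested here.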
On 09/12/2018 07:19 AM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
>> [...]
>
> Hey Michal,
>
> I was running a quick test with this patch series with two domains
> sharing a disk image without <shareable/> and SELinux enabled. When
> starting the second domain, the whole libvirtd daemon hangs for almost
> a minute until giving the error that the image is locked. I haven't
> debugged it yet to figure out what happens.

Is this on two different hosts or one?

I'm unable to reproduce, so can you please attach a debugger and share
't a a bt' output with me?

When I try to reproduce I always get one domain running and the other
failing to start because qemu is unable to do its locking. When I put
<shareable/> in both, they start successfully.

TBH, I don't run a SELinux-enabled host so I'm testing DAC only, but
the changes to the code are nearly the same. So I'm wondering if this
is really an issue in my SELinux impl or somewhere else.

BTW: the one minute timeout comes from patch 16/23:

  #define LOCK_ACQUIRE_TIMEOUT 60

Michal
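[Editorial sketch: `LOCK_ACQUIRE_TIMEOUT 60` explains the observed one-minute stall — a contended lock is retried until a deadline before the operation errors out. The helper below is a hedged, illustrative Python shape of that behaviour, not libvirt's implementation; `acquire_with_timeout` and its `try_lock` callable are hypothetical names.]

```python
import time

# Mirrors the quoted constant from patch 16/23 (seconds).
LOCK_ACQUIRE_TIMEOUT = 60

def acquire_with_timeout(try_lock, timeout=LOCK_ACQUIRE_TIMEOUT, interval=0.1):
    """Poll a non-blocking lock attempt until it succeeds or the deadline
    passes. `try_lock` returns True once the lock is acquired. A caller
    blocked on a lock held by another domain spins here for the full
    timeout before reporting failure -- the hang Bjoern describes."""
    deadline = time.monotonic() + timeout
    while True:
        if try_lock():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With this shape, a second domain contending for an image held by the first would block the dispatching worker thread for the whole 60 seconds before the "image is locked" error surfaces.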
Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> On 09/12/2018 07:19 AM, Bjoern Walk wrote:
> > [...]
>
> Is this on two different hosts or one?

On the same host.

> I'm unable to reproduce, so can you please attach a debugger and share
> 't a a bt' output with me?

(gdb) thread apply all bt

Thread 17 (Thread 0x3ff48dff910 (LWP 193353)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff6678c5a8 in udevEventHandleThread (opaque=<optimized out>) at node_device/node_device_udev.c:1603
#3  0x000003ff8c6bad16 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 16 (Thread 0x3ff4acfe910 (LWP 193312)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 15 (Thread 0x3ff4b4ff910 (LWP 193311)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 14 (Thread 0x3ff64efd910 (LWP 193310)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 13 (Thread 0x3ff656fe910 (LWP 193309)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 12 (Thread 0x3ff65eff910 (LWP 193308)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 11 (Thread 0x3ff67fff910 (LWP 193307)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 10 (Thread 0x3ff84bf7910 (LWP 193306)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 9 (Thread 0x3ff853f8910 (LWP 193305)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 8 (Thread 0x3ff85bf9910 (LWP 193304)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 7 (Thread 0x3ff863fa910 (LWP 193303)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 6 (Thread 0x3ff86bfb910 (LWP 193302)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 5 (Thread 0x3ff873fc910 (LWP 193301)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3ff87bfd910 (LWP 193300)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3ff883fe910 (LWP 193299)):
#0  0x000003ff8b7113d4 in read () from /lib64/libpthread.so.0
#1  0x000003ff8c65a648 in saferead () from /lib64/libvirt.so.0
#2  0x000003ff8c65a718 in saferead_lim () from /lib64/libvirt.so.0
#3  0x000003ff8c65abe4 in virFileReadHeaderFD () from /lib64/libvirt.so.0
#4  0x000003ff8c69cb5a in virProcessRunInMountNamespace () from /lib64/libvirt.so.0
#5  0x000003ff8c767bd6 in virSecuritySELinuxTransactionCommit () from /lib64/libvirt.so.0
#6  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
#7  0x000003ff8c75bb70 in virSecurityStackTransactionCommit () from /lib64/libvirt.so.0
#8  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
#9  0x000003ff6612c492 in qemuSecuritySetAllLabel () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#10 0x000003ff660bcfd6 in qemuProcessLaunch () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x000003ff660c0916 in qemuProcessStart () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#12 0x000003ff66124a00 in qemuDomainObjStart.constprop.52 () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#13 0x000003ff66125030 in qemuDomainCreateWithFlags () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#14 0x000003ff8c87bfca in virDomainCreate () from /lib64/libvirt.so.0
#15 0x000002aa1f2d516c in remoteDispatchDomainCreate (server=<optimized out>, msg=0x2aa4c488b00, args=<optimized out>, rerr=0x3ff883fdae8, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:4434
#16 remoteDispatchDomainCreateHelper (server=<optimized out>, client=<optimized out>, msg=0x2aa4c488b00, rerr=0x3ff883fdae8, args=<optimized out>, ret=0x3ff78004c80) at remote/remote_daemon_dispatch_stubs.h:4410
#17 0x000003ff8c78bc80 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#18 0x000003ff8c792aba in virNetServerHandleJob () from /lib64/libvirt.so.0
#19 0x000003ff8c6bbb16 in virThreadPoolWorker () from /lib64/libvirt.so.0
#20 0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#21 0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#22 0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3ff88bff910 (LWP 193298)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3ff8cbd7970 (LWP 193297)):
#0  0x000003ff8b5ee502 in poll () from /lib64/libc.so.6
#1  0x000003ff8c65629c in virEventPollRunOnce () from /lib64/libvirt.so.0
#2  0x000003ff8c654caa in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#3  0x000003ff8c7921de in virNetDaemonRun () from /lib64/libvirt.so.0
#4  0x000002aa1f2a4c12 in main (argc=<optimized out>, argv=<optimized out>) at remote/remote_daemon.c:1461

> When I try to reproduce I always get one domain running and the other
> failing to start because qemu is unable to do its locking. When I put
> <shareable/> in both, they start successfully.

Yes, I get the same (expected) behaviour, just with the timeout.

> TBH, I don't run a SELinux-enabled host so I'm testing DAC only, but
> the changes to the code are nearly the same. So I'm wondering if this
> is really an issue in my SELinux impl or somewhere else.
>
> BTW: the one minute timeout comes from patch 16/23:
>
>   #define LOCK_ACQUIRE_TIMEOUT 60
>
> Michal

It's entirely possible that my setup is broken, this was just a quick
test. If you can't reproduce it or make sense out of the backtrace then
it's fine, I will try to put some time to debug it next week. I was
just interested if this would fix a long-standing bug where this
scenario would crash the first domain because the label changed.

Bjoern

--
IBM Systems
Linux on Z & Virtualization Development
------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
------------------------------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
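[Editorial sketch: thread 3 in the backtrace above is blocked in read() inside virProcessRunInMountNamespace — the parent forks a child to do the relabel inside the domain's mount namespace and then blocks reading the child's result from a pipe, so a child stuck waiting on a lock stalls the whole worker thread. Below is a hedged, illustrative Python reduction of that fork-and-report pattern, not libvirt's C code; `run_in_child` is a hypothetical name and the namespace switch itself is omitted.]

```python
import os
import pickle

def run_in_child(func):
    """Run func() in a forked child and block reading its result from a
    pipe. The parent's f.read() is the analogue of the read()/saferead()
    frames in the backtrace: if the child hangs (e.g. waiting on a
    contended lock), the parent thread hangs right there with it."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: do the work (in libvirt, after switching mount namespace)
        # and report success or failure back through the pipe.
        os.close(r)
        try:
            os.write(w, pickle.dumps((True, func())))
        except Exception as exc:  # report errors instead of crashing
            os.write(w, pickle.dumps((False, str(exc))))
        os._exit(0)
    os.close(w)
    with os.fdopen(r, "rb") as f:
        data = f.read()   # parent blocks here until the child finishes
    os.waitpid(pid, 0)
    ok, payload = pickle.loads(data)
    if not ok:
        raise RuntimeError(payload)
    return payload
```

This makes the observed symptom plausible: the 60-second lock wait happens in the forked child, while the dispatching libvirtd thread sits in the blocking read for the same duration.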
Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> > [...]
> >
> > I'm unable to reproduce, so can you please attach a debugger and
> > share 't a a bt' output with me?
>
> [full 'thread apply all bt' output snipped]
>
> It's entirely possible that my setup is broken, this was just a quick
> test. If you can't reproduce it or make sense out of the backtrace
> then it's fine, I will try to put some time to debug it next week. I
> was just interested if this would fix a long-standing bug where this
> scenario would crash the first domain because the label changed.
>
> Bjoern

Still seeing the same timeout. Is this expected behaviour?
On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>
> Still seeing the same timeout. Is this expected behaviour?

Nope. I wonder if something has locked the path and forgot to unlock it
(however, virtlockd should have unlocked all the paths owned by PID on
connection close), or something is still holding the lock and connection
opened.

Can you see the timeout even when you turn off the selinux driver
(security_driver = "none" in qemu.conf)? I tried to reproduce the issue
yesterday and was unsuccessful. Do you have any steps to reproduce?

Michal
Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>
>> Still seeing the same timeout. Is this expected behaviour?
>
> Nope. I wonder if something has locked the path and forgot to unlock it
> (however, virtlockd should have unlocked all the paths owned by PID on
> connection close), or something is still holding the lock and connection
> opened.

I can reproduce on a freshly booted machine. There should be no rogue
lock held anywhere.

> Can you see the timeout even when you turn off the selinux driver
> (security_driver = "none" in qemu.conf)? I tried to reproduce the issue

Yes, same issue.

> yesterday and was unsuccessful. Do you have any steps to reproduce?

0. Host setup:

   # /usr/sbin/sestatus
   SELinux status:                 enabled
   SELinuxfs mount:                /sys/fs/selinux
   SELinux root directory:         /etc/selinux
   Loaded policy name:             targeted
   Current mode:                   enforcing
   Mode from config file:          enforcing
   Policy MLS status:              enabled
   Policy deny_unknown status:     allowed
   Memory protection checking:     actual (secure)
   Max kernel policy version:      31

   # grep -E "^[^#]" /etc/libvirt/qemu.conf
   lock_manager = "lockd"
   metadata_lock_manager = "lockd"

1. Define two domains, u1604-1 and u1604-2, using the same image, not
   shared:

   <domain type='kvm'>
     <name>u1604-1</name>
     <uuid>e49679de-eca9-4a72-8d1a-56f437541157</uuid>
     <memory unit='KiB'>524288</memory>
     <currentMemory unit='KiB'>524288</currentMemory>
     <vcpu placement='static'>2</vcpu>
     <os>
       <type arch='s390x' machine='s390-ccw-virtio-2.12'>hvm</type>
       <boot dev='hd'/>
     </os>
     <clock offset='utc'/>
     <on_poweroff>destroy</on_poweroff>
     <on_reboot>restart</on_reboot>
     <on_crash>preserve</on_crash>
     <devices>
       <emulator>/usr/bin/qemu-system-s390x</emulator>
       <disk type='file' device='disk'>
         <driver name='qemu' type='qcow2'/>
         <source file='/var/lib/libvirt/images/u1604.qcow2'/>
         <target dev='vda' bus='virtio'/>
         <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
       </disk>
       <console type='pty'>
         <target type='sclp' port='0'/>
       </console>
       <memballoon model='virtio'>
         <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
       </memballoon>
       <panic model='s390'/>
     </devices>
   </domain>

2. Start domain u1604-1:

   # ls -Z /var/lib/libvirt/images/u1604.qcow2
   system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

3. Start domain u1604-2.

4. Result: libvirtd hangs for 60 seconds, after which the expected
   locking message appears. Security labels of the image are not changed:

   # virsh start u1604-2
   error: Failed to start domain u1604-2
   error: internal error: child reported: resource busy: Lockspace
   resource '/var/lib/libvirt/images/u1604.qcow2' is locked

   # ls -Z /var/lib/libvirt/images/u1604.qcow2
   system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

Backtrace is the same I sent earlier. Let me know if you need more info
or if anything is borked in my setup.

> Michal
Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>
>> Still seeing the same timeout. Is this expected behaviour?
>
> Nope. I wonder if something has locked the path and forgot to unlock it
> (however, virtlockd should have unlocked all the paths owned by PID on
> connection close), or something is still holding the lock and connection
> opened.
>
> Can you see the timeout even when you turn off the selinux driver
> (security_driver = "none" in qemu.conf)? I tried to reproduce the issue
> yesterday and was unsuccessful. Do you have any steps to reproduce?

So, I haven't been able to actually dig into the debugging but we have
been able to reproduce this behaviour on x86 (both with SELinux and
DAC). Looks like a general problem, if a problem at all, because from
what I can see in the code, the 60 seconds timeout is actually intended,
or not? The security manager does try for 60 seconds to acquire the lock
and only then bails out. Why is this?

However, an actual bug is that while the security manager waits for the
lock acquire the whole libvirtd hangs, but from what I understood and
Marc explained to me, this is more of a pathological error in libvirt
behaviour with various domain locks.
On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>
>>> Still seeing the same timeout. Is this expected behaviour?
>>
>> Nope. I wonder if something has locked the path and forgot to
>> unlock it (however, virtlockd should have unlocked all the paths
>> owned by PID on connection close), or something is still holding
>> the lock and connection opened.
>>
>> Can you see the timeout even when you turn off the selinux driver
>> (security_driver = "none" in qemu.conf)? I tried to reproduce the
>> issue yesterday and was unsuccessful. Do you have any steps to
>> reproduce?
>
> So, I haven't been able to actually dig into the debugging but we
> have been able to reproduce this behaviour on x86 (both with SELinux
> and DAC). Looks like a general problem, if a problem at all, because
> from what I can see in the code, the 60 seconds timeout is actually
> intended, or not? The security manager does try for 60 seconds to
> acquire the lock and only then bails out. Why is this?

The ideal solution would be to just tell virtlockd "these are the paths
I want you to lock on my behalf" and virtlockd would use F_SETLKW so
that the moment all paths are unlocked virtlockd will lock them and
libvirtd can continue its execution (i.e. chown() and setfcon()).
However, we can't do this because virtlockd runs single threaded [1] and
therefore if one thread is waiting for a lock to be acquired there is no
other thread to unlock the path.

Therefore I had to move the logic into libvirtd, which tries repeatedly
to lock all the paths needed. And this is where the timeout steps in:
the lock acquiring attempts are capped at 60 seconds. This is an
arbitrarily chosen timeout. We can make it smaller, but that will not
solve your problem - only mask it.

>
> However, an actual bug is that while the security manager waits for
> the lock acquire the whole libvirtd hangs, but from what I understood
> and Marc explained to me, this is more of a pathological error in
> libvirt behaviour with various domain locks.
>

Whole libvirtd shouldn't hang. Only those threads which try to acquire
the domain object lock. IOW it should be possible to run APIs over
different domains (or other objects for that matter). For instance,
dumpxml over a different domain works just fine.

Anyway, we need to get to the bottom of this. Looks like something keeps
the file locked and then when libvirt wants to lock it for metadata the
timeout is hit and the whole operation fails. Do you mind running
'lslocks -u' when starting a domain and before the timeout is hit?

Michal

1: The reason that virtlockd has to run single threaded is stupidity of
POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
...) and entering a critical section. If another thread does open() +
close() on the same file the file is unlocked. Because we can't
guarantee this will not happen in multithreaded libvirt we had to
introduce a separate process to take care of that. And this process has
to be single threaded so there won't ever be a second thread to call
close() and unintentionally release the lock.

Perhaps we could use OFD locks but those are not part of POSIX and are
available on Linux only.
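[Editor's note] The close() pitfall from the footnote above is easy to see in a small, self-contained sketch (file paths and helper names here are invented for the demo; this is not libvirt code): one process write-locks byte 0 of a file, then merely open()s and close()s a second descriptor to the same file. Per POSIX fcntl() semantics, that close() drops every record lock the process holds on the file, which a second process can observe via F_GETLK.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Check from a *separate* process whether byte 0 of @path is
 * write-locked (F_GETLK only reports locks held by other processes). */
static int is_locked(const char *path)
{
    pid_t pid = fork();
    if (pid == 0) {
        int fd = open(path, O_RDWR);
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                            .l_start = 0, .l_len = 1 };
        fcntl(fd, F_GETLK, &fl);
        /* l_type is set to F_UNLCK when no conflicting lock exists */
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    waitpid(pid, &status, 0);
    return WEXITSTATUS(status); /* 1 = locked, 0 = unlocked */
}

/* Demonstrate the pitfall: returns 1 if our lock survived an unrelated
 * open()+close() of the same file, 0 if it was silently dropped. */
static int lock_survives_open_close(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET,
                        .l_start = 0, .l_len = 1 };
    fcntl(fd, F_SETLK, &fl);        /* lock byte 0 */

    int fd2 = open(path, O_RDONLY); /* some other thread opens the file... */
    close(fd2);                     /* ...and closes it: ALL of this
                                     * process's locks on @path are gone */
    int held = is_locked(path);
    close(fd);
    unlink(path);
    return held;
}
```

This is exactly why a close() in any other thread of a multithreaded daemon would break the locking, and hence why virtlockd is a separate, single-threaded process.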
On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik <mprivozn@redhat.com> wrote: > On 09/27/2018 09:01 AM, Bjoern Walk wrote: >> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]: >>> On 09/19/2018 11:17 AM, Bjoern Walk wrote: >>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]: >>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM >>>>> +0200]: >>> >>>>> >>>> >>>> Still seeing the same timeout. Is this expected behaviour? >>>> >>> >>> Nope. I wonder if something has locked the path and forgot to >>> unlock it (however, virtlockd should have unlocked all the paths >>> owned by PID on connection close), or something is still holding >>> the lock and connection opened. >>> >>> Can you see the timeout even when you turn off the selinux driver >>> (security_driver="none' in qemu.conf)? I tried to reproduce the >>> issue yesterday and was unsuccessful. Do you have any steps to >>> reproduce? >> >> So, I haven't been able to actually dig into the debugging but we >> have been able to reproduce this behaviour on x86 (both with SELinux >> and DAC). Looks like a general problem, if a problem at all, because >> from what I can see in the code, the 60 seconds timeout is actually >> intended, or not? The security manager does try for 60 seconds to >> acquire the lock and only then bails out. Why is this? > > The ideal solution would be to just tell virlockd "these are the paths I > want you to lock on my behalf" and virtlockd would use F_SETLKW so that > the moment all paths are unlocked virtlockd will lock them and libvirtd > can continue its execution (i.e. chown() and setfcon()). However, we > can't do this because virtlockd runs single threaded [1] and therefore > if one thread is waiting for lock to be acquired there is no other > thread to unlock the path. > > Therefore I had to move the logic into libvirtd which tries repeatedly > to lock all the paths needed. 
And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrarily chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.
>
>>
>> However, an actual bug is that while the security manager waits for
>> the lock acquire the whole libvirtd hangs, but from what I understood
>> and Marc explained to me, this is more of a pathological error in
>> libvirt behaviour with various domain locks.
>>
>
> Whole libvirtd shouldn't hang.

The main loop doesn't hang.

> Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Oh sorry, my assumption that thread A is also holding the
virDomainObjList lock is wrong! Here is the actual backtrace of a
waiting thread (in this case the worker thread for the "virsh list"
command):

(gdb) bt
#0  0x00007f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f850c26cfd9 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3  0x00007f850c2455df in virObjectLock (anyobj=<optimized out>) at util/virobject.c:429
#4  0x00007f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=flags@entry=1) at conf/virdomainobjlist.c:919
#5  0x00007f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=1) at conf/virdomainobjlist.c:961
#6  0x00007f850c2d01a6 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=<optimized out>, flags=<optimized out>) at conf/virdomainobjlist.c:1042
#7  0x00007f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493
#8  0x000055a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:1372
#9  remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=<optimized out>, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348
#10 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
…

virDomainObjListFilter needs the lock of every domain… That's the
actual problem.

>
> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?
>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.
>
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
>

--
Kind regards / Beste Grüße
Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:15AM +0200]: > On 09/27/2018 09:01 AM, Bjoern Walk wrote: > > Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]: > >> On 09/19/2018 11:17 AM, Bjoern Walk wrote: > >>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]: > >>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM > >>>> +0200]: > >> > >>>> > >>> > >>> Still seeing the same timeout. Is this expected behaviour? > >>> > >> > >> Nope. I wonder if something has locked the path and forgot to > >> unlock it (however, virtlockd should have unlocked all the paths > >> owned by PID on connection close), or something is still holding > >> the lock and connection opened. > >> > >> Can you see the timeout even when you turn off the selinux driver > >> (security_driver="none' in qemu.conf)? I tried to reproduce the > >> issue yesterday and was unsuccessful. Do you have any steps to > >> reproduce? > > > > So, I haven't been able to actually dig into the debugging but we > > have been able to reproduce this behaviour on x86 (both with SELinux > > and DAC). Looks like a general problem, if a problem at all, because > > from what I can see in the code, the 60 seconds timeout is actually > > intended, or not? The security manager does try for 60 seconds to > > acquire the lock and only then bails out. Why is this? > > The ideal solution would be to just tell virlockd "these are the paths I > want you to lock on my behalf" and virtlockd would use F_SETLKW so that > the moment all paths are unlocked virtlockd will lock them and libvirtd > can continue its execution (i.e. chown() and setfcon()). However, we > can't do this because virtlockd runs single threaded [1] and therefore > if one thread is waiting for lock to be acquired there is no other > thread to unlock the path. > > Therefore I had to move the logic into libvirtd which tries repeatedly > to lock all the paths needed. 
And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrarily chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.

I still don't understand why we need a timeout at all. If virtlockd is
unable to get the lock, just bail and continue with what you did after
the timeout runs out. Is this some kind of safety-measure? Against what?

>
>>
>> However, an actual bug is that while the security manager waits for
>> the lock acquire the whole libvirtd hangs, but from what I understood
>> and Marc explained to me, this is more of a pathological error in
>> libvirt behaviour with various domain locks.
>>
>
> Whole libvirtd shouldn't hang. Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Yes, but from a user perspective, this is still pretty bad and
unexpected. libvirt should continue to operate as usual while virtlockd
is figuring out the locking.

> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?

There IS a lock held on the image, from the FIRST domain that we
started. The second domain, which is using the SAME image, unshared,
runs into the locking timeout. Sorry if I failed to describe this setup
appropriately. I am starting to believe that this is expected behaviour,
although it is not intuitive.

# lslocks -u
COMMAND      PID TYPE  SIZE MODE  M START END PATH
...
virtlockd 199062 POSIX 1.5G WRITE 0     0   0 /var/lib/libvirt/images/u1604.qcow2
...

>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.

Thanks for the explanation, I will have some reading to do to get a
better overview of the locking process in Linux.

> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
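[Editor's note] For reference, the retry-until-deadline behaviour being discussed can be sketched as follows. The function names metadataLockAcquire()/metadataLockTryAcquire() are made up, and the try-or-fail call into virtlockd is stubbed to report contention twice before succeeding; only LOCK_ACQUIRE_TIMEOUT comes from patch 16/23.

```c
#include <errno.h>
#include <stddef.h>
#include <time.h>

#define LOCK_ACQUIRE_TIMEOUT 60   /* seconds, as in patch 16/23 */

static int attempts;

/* Stub for the try-or-fail call into virtlockd: here it reports
 * contention (EBUSY) twice before succeeding. */
static int metadataLockTryAcquire(const char **paths, size_t npaths)
{
    (void)paths; (void)npaths;
    if (++attempts < 3) {
        errno = EBUSY;
        return -1;
    }
    return 0;
}

/* Retry the try-or-fail primitive until it succeeds, hits a hard
 * error, or the deadline expires. */
static int metadataLockAcquire(const char **paths, size_t npaths)
{
    time_t deadline = time(NULL) + LOCK_ACQUIRE_TIMEOUT;

    for (;;) {
        if (metadataLockTryAcquire(paths, npaths) == 0)
            return 0;                 /* all paths locked */
        if (errno != EBUSY || time(NULL) >= deadline)
            return -1;                /* hard error, or timed out */
        /* real code would sleep briefly here before retrying */
    }
}
```

The timeout exists because the primitive is try-or-fail: without retries, any transient contention would fail the operation immediately; without a cap, a stuck holder would block the caller forever.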
On 09/27/2018 11:11 AM, Bjoern Walk wrote: > Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:15AM +0200]: >> On 09/27/2018 09:01 AM, Bjoern Walk wrote: >>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]: >>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote: >>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]: >>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM >>>>>> +0200]: >>>> >>>>>> >>>>> >>>>> Still seeing the same timeout. Is this expected behaviour? >>>>> >>>> >>>> Nope. I wonder if something has locked the path and forgot to >>>> unlock it (however, virtlockd should have unlocked all the paths >>>> owned by PID on connection close), or something is still holding >>>> the lock and connection opened. >>>> >>>> Can you see the timeout even when you turn off the selinux driver >>>> (security_driver="none' in qemu.conf)? I tried to reproduce the >>>> issue yesterday and was unsuccessful. Do you have any steps to >>>> reproduce? >>> >>> So, I haven't been able to actually dig into the debugging but we >>> have been able to reproduce this behaviour on x86 (both with SELinux >>> and DAC). Looks like a general problem, if a problem at all, because >>> from what I can see in the code, the 60 seconds timeout is actually >>> intended, or not? The security manager does try for 60 seconds to >>> acquire the lock and only then bails out. Why is this? >> >> The ideal solution would be to just tell virlockd "these are the paths I >> want you to lock on my behalf" and virtlockd would use F_SETLKW so that >> the moment all paths are unlocked virtlockd will lock them and libvirtd >> can continue its execution (i.e. chown() and setfcon()). However, we >> can't do this because virtlockd runs single threaded [1] and therefore >> if one thread is waiting for lock to be acquired there is no other >> thread to unlock the path. 
>> >> Therefore I had to move the logic into libvirtd which tries repeatedly >> to lock all the paths needed. And this is where the timeout steps in - >> the lock acquiring attempts are capped at 60 seconds. This is an >> arbitrary chosen timeout. We can make it smaller, but that will not >> solve your problem - only mask it. > > I still don't understand why we need a timeout at all. If virtlockd is > unable to get the lock, just bail and continue with what you did after > the timeout runs out. Is this some kind of safety-measure? Against what? When there are two threads fighting over the lock. Imagine two threads trying to set security label over the same file. Or imagine two daemons on two distant hosts trying to set label on the same file on NFS. virtlockd implements try-or-fail approach. So we need to call lock() repeatedly until we succeed (or timeout). > >> >>> >>> However, an actual bug is that while the security manager waits for >>> the lock acquire the whole libvirtd hangs, but from what I understood >>> and Marc explained to me, this is more of a pathological error in >>> libvirt behaviour with various domain locks. >>> >> >> Whole libvirtd shouldn't hang. Only those threads which try to acquire >> domain object lock. IOW it should be possible to run APIs over different >> domains (or other objects for that matter). For instance dumpxml over >> different domain works just fine. > > Yes, but from a user perspective, this is still pretty bad and > unexpected. libvirt should continue to operate as usual while virtlockd > is figuring out the locking. Agreed. I will look into this. > >> Anyway, we need to get to the bottom of this. Looks like something keeps >> the file locked and then when libvirt wants to lock if for metadata the >> timeout is hit and whole operation fails. Do you mind running 'lslocks >> -u' when starting a domain and before the timeout is hit? > > There IS a lock held on the image, from the FIRST domain that we > started. 
The second domain, which is using the SAME image, unshared, > runs into the locking timeout. Sorry if I failed to describe this setup > appropriately. I am starting to believe that this is expected behaviour, > although it is not intuitive. > > # lslocks -u > COMMAND PID TYPE SIZE MODE M START END PATH > ... > virtlockd 199062 POSIX 1.5G WRITE 0 0 0 /var/lib/libvirt/images/u1604.qcow2 But this should not clash, because metadata lock use different bytes: the regular locking uses offset=0, metadata lock uses offset=1. Both locks lock just one byte in length. However, from the logs it looks like virtlockd doesn't try to actually acquire the lock because it found in internal records that the path is locked (even though it is locked with different offset). Anyway, now I am finally able to reproduce the issue and I'll look into this too. Quick and dirty patch might look something like this: diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c index 79fa48d365..3c51d7926b 100644 --- i/src/util/virlockspace.c +++ w/src/util/virlockspace.c @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace, virMutexLock(&lockspace->lock); if ((res = virHashLookup(lockspace->resources, resname))) { - if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) && - (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) { + if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED && + flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) || + start == 1) { if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0) goto cleanup; But as I say, this is very dirty hack. I need to find a better solution. At least you might have something to test. Michal -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 12:07PM +0200]: > On 09/27/2018 11:11 AM, Bjoern Walk wrote: > > I still don't understand why we need a timeout at all. If virtlockd is > > unable to get the lock, just bail and continue with what you did after > > the timeout runs out. Is this some kind of safety-measure? Against what? > > When there are two threads fighting over the lock. Imagine two threads > trying to set security label over the same file. Or imagine two daemons > on two distant hosts trying to set label on the same file on NFS. > virtlockd implements try-or-fail approach. So we need to call lock() > repeatedly until we succeed (or timeout). One thread wins and the other fails with lock contention? I don't see how repeatedly trying to acquire the lock will improve things, but maybe I am not getting it right now. > > There IS a lock held on the image, from the FIRST domain that we > > started. The second domain, which is using the SAME image, unshared, > > runs into the locking timeout. Sorry if I failed to describe this setup > > appropriately. I am starting to believe that this is expected behaviour, > > although it is not intuitive. > > > > # lslocks -u > > COMMAND PID TYPE SIZE MODE M START END PATH > > ... > > virtlockd 199062 POSIX 1.5G WRITE 0 0 0 /var/lib/libvirt/images/u1604.qcow2 > > But this should not clash, because metadata lock use different bytes: > the regular locking uses offset=0, metadata lock uses offset=1. Both > locks lock just one byte in length. It's the only lock in the output for the image. Should I see the metadata lock at start=1 as well? > However, from the logs it looks like virtlockd doesn't try to actually > acquire the lock because it found in internal records that the path is > locked (even though it is locked with different offset). Anyway, now I > am finally able to reproduce the issue and I'll look into this too. Good. 
> Quick and dirty patch might look something like this:
>
> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
> index 79fa48d365..3c51d7926b 100644
> --- i/src/util/virlockspace.c
> +++ w/src/util/virlockspace.c
> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace,
>      virMutexLock(&lockspace->lock);
>
>      if ((res = virHashLookup(lockspace->resources, resname))) {
> -        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
> -            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
> +        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
> +             flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
> +            start == 1) {
>
>              if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>                  goto cleanup;

Had a quick test with this, but now it seems like the whole metadata
locking does nothing. When starting the second domain, I get an
internal error from QEMU, failing to get the write lock. The SELinux
labels of the image are changed as well. This is the same behaviour as
with metadata locking disabled. Not entirely sure what should happen or
if this is an error in my setup or not. I will have to think about this.

> But as I say, this is very dirty hack. I need to find a better solution.
> At least you might have something to test.
>
> Michal

--
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On 09/27/2018 01:16 PM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 12:07PM +0200]:
>> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
>>> I still don't understand why we need a timeout at all. If virtlockd is
>>> unable to get the lock, just bail and continue with what you did after
>>> the timeout runs out. Is this some kind of safety-measure? Against what?
>>
>> When there are two threads fighting over the lock. Imagine two threads
>> trying to set security label over the same file. Or imagine two daemons
>> on two distant hosts trying to set label on the same file on NFS.
>> virtlockd implements try-or-fail approach. So we need to call lock()
>> repeatedly until we succeed (or timeout).
>
> One thread wins and the other fails with lock contention? I don't see
> how repeatedly trying to acquire the lock will improve things, but maybe
> I am not getting it right now.

The point of metadata locking is not to prevent metadata change, but to
be able to change it atomically. As I said in the other thread, there is
not much point in this feature alone, since chown() and setfcon() are
atomic themselves. But it prepares the code for original-label
remembering, which will be stored in XATTRs, at which point the whole
operation is not going to be atomic.

https://www.redhat.com/archives/libvir-list/2018-September/msg01349.html

Long story short, at the first chown() libvirt will record the original
owner of the file into XATTRs, and on the last restore it will use that
owner to return the file to, instead of root:root. The locks are there
because reading the XATTR, parsing it, increasing/decreasing the
refcounter and possibly doing chown() is not atomic. But with the locks
it is.

>>> There IS a lock held on the image, from the FIRST domain that we
>>> started. The second domain, which is using the SAME image, unshared,
>>> runs into the locking timeout. Sorry if I failed to describe this setup
>>> appropriately. I am starting to believe that this is expected behaviour,
>>> although it is not intuitive.
>>>
>>> # lslocks -u
>>> COMMAND     PID    TYPE  SIZE MODE  M START END PATH
>>> ...
>>> virtlockd   199062 POSIX 1.5G WRITE 0 0     0   /var/lib/libvirt/images/u1604.qcow2
>>
>> But this should not clash, because metadata lock use different bytes:
>> the regular locking uses offset=0, metadata lock uses offset=1. Both
>> locks lock just one byte in length.
>
> It's the only lock in the output for the image. Should I see the
> metadata lock at start=1 as well?

You should see it if you run lslocks at the right moment. But since
metadata locking happens only for a fraction of a second, it is very
unlikely that you'll be able to run it at the right moment.

>> However, from the logs it looks like virtlockd doesn't try to actually
>> acquire the lock because it found in internal records that the path is
>> locked (even though it is locked with different offset). Anyway, now I
>> am finally able to reproduce the issue and I'll look into this too.
>
> Good.
>
>> Quick and dirty patch might look something like this:
>>
>> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
>> index 79fa48d365..3c51d7926b 100644
>> --- i/src/util/virlockspace.c
>> +++ w/src/util/virlockspace.c
>> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace,
>>      virMutexLock(&lockspace->lock);
>>
>>      if ((res = virHashLookup(lockspace->resources, resname))) {
>> -        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
>> -            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
>> +        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
>> +             flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
>> +            start == 1) {
>>
>>              if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>>                  goto cleanup;
>
> Had a quick test with this, but now it seems like the whole metadata
> locking does nothing. When starting the second domain, I get an
> internal error from QEMU, failing to get the write lock. The SELinux
> labels of the image are changed as well. This is the same behaviour as
> with metadata locking disabled. Not entirely sure what should happen or
> if this is an error in my setup or not. I will have to think about this.

Metadata locking is not supposed to prevent you from running two domains
with the same disk, nor to prevent chown() on the file. But since you
are using virtlockd to lock the disk contents, it should prevent running
such domains: libvirt should refuse the start, not qemu. I wonder if
this is a misconfiguration or something. Perhaps one domain has the disk
RW and the other has it RO + shared?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>>
>>> Still seeing the same timeout. Is this expected behaviour?
>>>
>> Nope. I wonder if something has locked the path and forgot to unlock it
>> (however, virtlockd should have unlocked all the paths owned by PID on
>> connection close), or something is still holding the lock and connection
>> opened.
>>
>> Can you see the timeout even when you turn off the selinux driver
>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>
> So, I haven't been able to actually dig into the debugging but we have
> been able to reproduce this behaviour on x86 (both with SELinux and
> DAC). Looks like a general problem, if a problem at all, because from
> what I can see in the code, the 60 seconds timeout is actually intended,
> or not? The security manager does try for 60 seconds to acquire the lock
> and only then bails out. Why is this?

Backtrace of libvirtd where it’s hanging (in thread A):

(gdb) bt
#0  read () from /lib64/libpthread.so.0
#1  read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
#2  saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
#3  saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
#4  virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
#5  virProcessRunInMountNamespace () at util/virprocess.c:1149
#6  virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
#7  virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
#8  virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
#9  virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
#10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
#12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
…
#25 in start_thread () from /lib64/libpthread.so.0
#26 in clone () from /lib64/libc.so.6

Backtrace of the forked process where the process is trying to get the
metadata lock for 60 seconds:

#0  0x00007f8508572730 in nanosleep () from target:/lib64/libc.so.6
#1  0x00007f850859dff4 in usleep () from target:/lib64/libc.so.6
#2  0x00007f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
#3  0x00007f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
#4  0x00007f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
#5  0x00007f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at util/virprocess.c:1100
#6  virProcessRunInMountNamespace () at util/virprocess.c:1140
#7  0x00007f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_dac.c:565
#8  0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
#9  0x00007f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
#10 0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
#11 0x00007f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#12 0x00007f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
#13 0x00007f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
#14 0x00007f84e057f5cd in qemuDomainObjStart (conn=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=0x7f848c1c3a50, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7296
#15 0x00007f84e057fc19 in qemuDomainCreateWithFlags (dom=0x7f84e4001620, flags=0) at qemu/qemu_driver.c:7349
#16 0x00007f850c426d57 in virDomainCreate (domain=domain@entry=0x7f84e4001620) at libvirt-domain.c:6534
#17 0x000055a82b2f1cae in remoteDispatchDomainCreate (server=0x55a82c9e16d0, msg=0x55a82ca53e00, args=<optimized out>, rerr=0x7f84f3fa8960, client=0x55a82ca5d180) at remote/remote_daemon_dispatch_stubs.h:4434
#18 remoteDispatchDomainCreateHelper (server=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00, rerr=0x7f84f3fa8960, args=<optimized out>, ret=0x7f84e4003800) at remote/remote_daemon_dispatch_stubs.h:4410
#19 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53e00, client=0x55a82ca5d180, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
#20 virNetServerProgramDispatch (prog=0x55a82ca4be70, server=server@entry=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00) at rpc/virnetserverprogram.c:304
#21 0x00007f850c33ec3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55a82c9e16d0) at rpc/virnetserver.c:144
#22 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a82c9e16d0) at rpc/virnetserver.c:165
#23 0x00007f850c26db20 in virThreadPoolWorker (opaque=opaque@entry=0x55a82c9ebe30) at util/virthreadpool.c:167
#24 0x00007f850c26ce2c in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#25 0x00007f8508872594 in start_thread () from target:/lib64/libpthread.so.0
#26 0x00007f85085a5e6f in clone () from target:/lib64/libc.so.6

> However, an actual bug is that while the security manager waits for the
> lock acquire the whole libvirtd hangs, but from what I understood and
> Marc explained to me, this is more of a pathological error in libvirt
> behaviour with various domain locks.

Since thread A owns the lock for the virDomainObjList, other operations
like 'virsh list' won’t return for about 60s.

Hope this information will help :)

> --
> IBM Systems
> Linux on Z & Virtualization Development
> --------------------------------------------------
> IBM Deutschland Research & Development GmbH
> Schönaicher Str. 220, 71032 Böblingen
> Phone: +49 7031 16 1819
> --------------------------------------------------
> Vorsitzende des Aufsichtsrats: Martina Koederitz
> Geschäftsführung: Dirk Wittkopp
> Sitz der Gesellschaft: Böblingen
> Registergericht: Amtsgericht Stuttgart, HRB 243294

--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>>>
>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>
>>> Nope. I wonder if something has locked the path and forgot to unlock it
>>> (however, virtlockd should have unlocked all the paths owned by PID on
>>> connection close), or something is still holding the lock and connection
>>> opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we have
>> been able to reproduce this behaviour on x86 (both with SELinux and
>> DAC). Looks like a general problem, if a problem at all, because from
>> what I can see in the code, the 60 seconds timeout is actually intended,
>> or not? The security manager does try for 60 seconds to acquire the lock
>> and only then bails out. Why is this?
>
> Backtrace of libvirtd where it’s hanging (in thread A):
>
> (gdb) bt
> #0  read () from /lib64/libpthread.so.0
> [...]
> #12 in qemuProcessStart (...) at qemu/qemu_process.c:6816

This is just waiting for the forked child to finish. Nothing suspicious
here.

> …
>
> #25 in start_thread () from /lib64/libpthread.so.0
> #26 in clone () from /lib64/libc.so.6
>
> Backtrace of the forked process where the process is trying to get the
> metadata lock for 60 seconds:
>
> #0  0x00007f8508572730 in nanosleep () from target:/lib64/libc.so.6
> #1  0x00007f850859dff4 in usleep () from target:/lib64/libc.so.6
> #2  0x00007f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
> #3  0x00007f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
> [...]
> #26 0x00007f85085a5e6f in clone () from target:/lib64/libc.so.6

Right, so this is just waiting for virtlockd to lock the paths.
virtlockd is obviously unable to do that (as I suggested in my previous
e-mail - is perhaps some other process holding the lock?).

Can you please enable debug logs for virtlockd, reproduce and share the
log with me? Setting this in /etc/libvirt/virtlockd.conf should be
enough:

log_level=1
log_filters="4:event 3:rpc"
log_outputs="1:file:/var/log/virtlockd.log"

Thanks,
Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
On Thu, Sep 27, 2018 at 10:14 AM +0200, Michal Privoznik <mprivozn@redhat.com> wrote:
> On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
>> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
>>> [...]
>>>
>>> So, I haven't been able to actually dig into the debugging but we have
>>> been able to reproduce this behaviour on x86 (both with SELinux and
>>> DAC). Looks like a general problem, if a problem at all, because from
>>> what I can see in the code, the 60 seconds timeout is actually intended,
>>> or not? The security manager does try for 60 seconds to acquire the lock
>>> and only then bails out. Why is this?
>>
>> Backtrace of libvirtd where it’s hanging (in thread A):
>>
>> (gdb) bt
>> [...]
>
> This is just waiting for forked child to finish. Nothing suspicious here.
>
>> Backtrace of the forked process where the process is trying to get the
>> metadata lock for 60 seconds:
>>
>> [...]
>
> Right, so this is just waiting for virtlockd to lock the paths.
> virtlockd is obviously unable to do that (as I suggested in my previous
> e-mail - is perhaps some other process holding the lock?).

The other domain is holding the lock (obviously).

> Can you please enable debug logs for virtlockd, reproduce and share the
> log with me? Setting this in /etc/libvirt/virtlockd.conf should be
> enough:

Sure, but I don’t think this is needed. We’re trying to start two
domains which have disks backed by the same image. So the current
behavior is probably intended (apart from the problem with the later API
calls). In my opinion, the timeout is just way too high.

> log_level=1
> log_filters="4:event 3:rpc"
> log_outputs="1:file:/var/log/virtlockd.log"
>
> Thanks,
> Michal

--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:14AM +0200]:
> Right, so this is just waiting for virtlockd to lock the paths.
> virtlockd is obviously unable to do that (as I suggested in my previous
> e-mail - is perhaps some other process holding the lock?).

It cannot lock the paths, because those paths are already locked by the
first domain. This is intentional. But it should not take 60 seconds to
resolve this and block the whole libvirt daemon in the process.

> Can you please enable debug logs for virtlockd, reproduce and share the
> log with me? Setting this in /etc/libvirt/virtlockd.conf should be enough:
>
> log_level=1
> log_filters="4:event 3:rpc"
> log_outputs="1:file:/var/log/virtlockd.log"

See attached.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list