[libvirt] [PATCH v4 00/23] Introduce metadata locking

Michal Privoznik posted 23 patches 5 years, 7 months ago
[libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 7 months ago
Technically, this is v4 of:

https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html

However, it implements a different approach than any of the previous
versions.

One of the problems with the previous versions was that they were too
complicated. The main reason was that we could not close the connection
while a file was still locked, so we had to invent a mechanism (on the
client side) to prevent that.

These patches take a different approach. They rely on the secdriver's
transactions, which gather all the paths we want to label into one place
so that they can be relabelled within a different namespace. I'm
extending this idea so that transactions run all the time (regardless of
domain namespacing) and only at the very last moment do we decide which
namespace the relabelling runs in.

Metadata locking is then as easy as putting lock/unlock calls around one
function.
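
To illustrate the idea, here is a rough, hypothetical sketch; the
function names and the generic manager pointer are made up for
illustration and are not the actual APIs added by this series. The
transaction gathers the paths, and the commit step simply brackets the
relabel work with metadata lock/unlock calls:

    #include <stddef.h>
    #include <sys/types.h>

    /* Hypothetical helpers standing in for the real ones; declarations
     * only, for illustration. */
    int exampleMetadataLock(void *mgr, const char **paths, size_t npaths);
    int exampleMetadataUnlock(void *mgr, const char **paths, size_t npaths);
    int exampleRelabelAllPaths(void *mgr, const char **paths,
                               size_t npaths, pid_t pid);

    static int
    exampleTransactionCommit(void *mgr,
                             const char **paths,
                             size_t npaths,
                             pid_t pid)
    {
        int ret = -1;

        /* Lock the metadata of every path we are about to touch. */
        if (exampleMetadataLock(mgr, paths, npaths) < 0)
            return -1;

        /* Relabel, possibly inside the namespace of @pid
         * (pid == -1 meaning the host namespace). */
        if (exampleRelabelAllPaths(mgr, paths, npaths, pid) < 0)
            goto cleanup;

        ret = 0;
     cleanup:
        exampleMetadataUnlock(mgr, paths, npaths);
        return ret;
    }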

You can find the patches at my github too:

https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt


Michal Prívozník (23):
  qemu_security: Fully implement qemuSecurityDomainSetPathLabel
  qemu_security: Fully implement
    qemuSecurity{Set,Restore}SavedStateLabel
  qemu_security: Require full wrappers for APIs that might touch a file
  virSecurityManagerTransactionCommit: Accept pid == -1
  qemu_security: Run transactions more frequently
  virlockspace: Allow caller to specify start and length offset in
    virLockSpaceAcquireResource
  lock_driver_lockd: Introduce
    VIR_LOCK_SPACE_PROTOCOL_ACQUIRE_RESOURCE_METADATA flag
  lock_driver: Introduce new VIR_LOCK_MANAGER_OBJECT_TYPE_DAEMON
  _virLockManagerLockDaemonPrivate: Move @hasRWDisks into dom union
  lock_driver: Introduce VIR_LOCK_MANAGER_RESOURCE_TYPE_METADATA
  lock_driver: Introduce VIR_LOCK_MANAGER_ACQUIRE_ROLLBACK
  lock_daemon_dispatch: Check for ownerPid rather than ownerId
  lock_manager: Allow disabling configFile for virLockManagerPluginNew
  qemu_conf: Introduce metadata_lock_manager
  security_manager: Load lock plugin on init
  security_manager: Introduce metadata locking APIs
  security_dac: Move transaction handling up one level
  security_dac: Fix info messages when chown()-ing
  security_dac: Lock metadata when running transaction
  virSecuritySELinuxRestoreFileLabel: Rename 'err' label
  virSecuritySELinuxRestoreFileLabel: Adjust code pattern
  security_selinux: Move transaction handling up one level
  security_selinux: Lock metadata when running transaction

 cfg.mk                             |   4 +-
 src/locking/lock_daemon_dispatch.c |  25 ++-
 src/locking/lock_driver.h          |  12 ++
 src/locking/lock_driver_lockd.c    | 417 +++++++++++++++++++++++++------------
 src/locking/lock_driver_lockd.h    |   1 +
 src/locking/lock_driver_sanlock.c  |  44 ++--
 src/locking/lock_manager.c         |  10 +-
 src/lxc/lxc_controller.c           |   3 +-
 src/lxc/lxc_driver.c               |   2 +-
 src/qemu/libvirtd_qemu.aug         |   1 +
 src/qemu/qemu.conf                 |   8 +
 src/qemu/qemu_conf.c               |  13 ++
 src/qemu/qemu_conf.h               |   1 +
 src/qemu/qemu_domain.c             |   3 +-
 src/qemu/qemu_driver.c             |  10 +-
 src/qemu/qemu_process.c            |  15 +-
 src/qemu/qemu_security.c           | 272 +++++++++++++++++-------
 src/qemu/qemu_security.h           |  18 +-
 src/qemu/test_libvirtd_qemu.aug.in |   1 +
 src/security/security_dac.c        | 134 ++++++++----
 src/security/security_manager.c    | 171 ++++++++++++++-
 src/security/security_manager.h    |   9 +
 src/security/security_selinux.c    | 118 ++++++++---
 src/util/virlockspace.c            |  15 +-
 src/util/virlockspace.h            |   4 +
 tests/seclabeltest.c               |   2 +-
 tests/securityselinuxlabeltest.c   |   2 +-
 tests/securityselinuxtest.c        |   2 +-
 tests/testutilsqemu.c              |   2 +-
 tests/virlockspacetest.c           |  29 ++-
 30 files changed, 1006 insertions(+), 342 deletions(-)

-- 
2.16.4

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 7 months ago
On 09/10/2018 11:36 AM, Michal Privoznik wrote:
> 

Hopefully I've fixed all the nits John found, and pushed.

Thanks John for the review!

Michal

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 7 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
> Technically, this is v4 of:
> 
> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html
> 
> However, this is implementing different approach than any of the
> previous versions.
> 
> One of the problems with previous version was that it was too
> complicated. The main reason for that was that we could not close the
> connection whilst there was a file locked. So we had to invent a
> mechanism that would prevent that (on the client side).
> 
> These patches implement different approach. They rely on secdriver's
> transactions which bring all the paths we want to label into one place
> so that they can be relabelled within different namespace.
> I'm extending this idea so that transactions run all the time
> (regardless of domain namespacing) and only at the very last moment is
> decided which namespace would the relabeling run in.
> 
> Metadata locking is then as easy as putting lock/unlock calls around one
> function.
> 
> You can find the patches at my github too:
> 
> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt

Hey Michal,

I was running a quick test with this patch series: two domains sharing
a disk image without <shareable/>, with SELinux enabled. When starting
the second domain, the whole libvirtd daemon hangs for almost a minute
before giving the error that the image is locked. I haven't debugged it
yet to figure out what happens.

Otherwise it's looking good; relabeling is prevented as expected.

Bjoern
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 7 months ago
On 09/12/2018 07:19 AM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
>> Technically, this is v4 of:
>>
>> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html
>>
>> However, this is implementing different approach than any of the
>> previous versions.
>>
>> One of the problems with previous version was that it was too
>> complicated. The main reason for that was that we could not close the
>> connection whilst there was a file locked. So we had to invent a
>> mechanism that would prevent that (on the client side).
>>
>> These patches implement different approach. They rely on secdriver's
>> transactions which bring all the paths we want to label into one place
>> so that they can be relabelled within different namespace.
>> I'm extending this idea so that transactions run all the time
>> (regardless of domain namespacing) and only at the very last moment is
>> decided which namespace would the relabeling run in.
>>
>> Metadata locking is then as easy as putting lock/unlock calls around one
>> function.
>>
>> You can find the patches at my github too:
>>
>> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt
> 
> Hey Michal,
> 
> is was running a quick test with this patch series with two domains
> sharing a disk image without <shareable/> and SELinux enabled. When
> starting the second domain, the whole libvirtd daemon hangs for almost a
> minute until giving the error that the image is locked. I haven't
> debugged it yet to figure out what happens.

Is this on two different hosts or one?

I'm unable to reproduce, so can you please attach a debugger and share
the 't a a bt' output with me?

When I try to reproduce I always get one domain running and the other
failing to start because qemu is unable to do its locking. When I put
<shareable/> in both, they start successfully.

TBH, I don't run a SELinux-enabled host so I'm testing DAC only, but
the changes to the code are largely the same. So I'm wondering whether
this is really an issue in my SELinux implementation or somewhere else.

BTW: The one minute timeout comes from patch 16/23:

  #define LOCK_ACQUIRE_TIMEOUT 60

Michal

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 7 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> On 09/12/2018 07:19 AM, Bjoern Walk wrote:
> > Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
> >> Technically, this is v4 of:
> >>
> >> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html
> >>
> >> However, this is implementing different approach than any of the
> >> previous versions.
> >>
> >> One of the problems with previous version was that it was too
> >> complicated. The main reason for that was that we could not close the
> >> connection whilst there was a file locked. So we had to invent a
> >> mechanism that would prevent that (on the client side).
> >>
> >> These patches implement different approach. They rely on secdriver's
> >> transactions which bring all the paths we want to label into one place
> >> so that they can be relabelled within different namespace.
> >> I'm extending this idea so that transactions run all the time
> >> (regardless of domain namespacing) and only at the very last moment is
> >> decided which namespace would the relabeling run in.
> >>
> >> Metadata locking is then as easy as putting lock/unlock calls around one
> >> function.
> >>
> >> You can find the patches at my github too:
> >>
> >> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt
> > 
> > Hey Michal,
> > 
> > is was running a quick test with this patch series with two domains
> > sharing a disk image without <shareable/> and SELinux enabled. When
> > starting the second domain, the whole libvirtd daemon hangs for almost a
> > minute until giving the error that the image is locked. I haven't
> > debugged it yet to figure out what happens.
> 
> Is this on two different hosts or one?

On the same host.

> I'm unable to reproduce, so can you please attach debugger and share 't
> a a bt' output with me?

(gdb) thread apply all bt

Thread 17 (Thread 0x3ff48dff910 (LWP 193353)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff6678c5a8 in udevEventHandleThread (opaque=<optimized out>) at node_device/node_device_udev.c:1603
#3  0x000003ff8c6bad16 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 16 (Thread 0x3ff4acfe910 (LWP 193312)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 15 (Thread 0x3ff4b4ff910 (LWP 193311)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 14 (Thread 0x3ff64efd910 (LWP 193310)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 13 (Thread 0x3ff656fe910 (LWP 193309)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 12 (Thread 0x3ff65eff910 (LWP 193308)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 11 (Thread 0x3ff67fff910 (LWP 193307)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 10 (Thread 0x3ff84bf7910 (LWP 193306)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 9 (Thread 0x3ff853f8910 (LWP 193305)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 8 (Thread 0x3ff85bf9910 (LWP 193304)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 7 (Thread 0x3ff863fa910 (LWP 193303)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 6 (Thread 0x3ff86bfb910 (LWP 193302)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 5 (Thread 0x3ff873fc910 (LWP 193301)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 4 (Thread 0x3ff87bfd910 (LWP 193300)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 3 (Thread 0x3ff883fe910 (LWP 193299)):
#0  0x000003ff8b7113d4 in read () from /lib64/libpthread.so.0
#1  0x000003ff8c65a648 in saferead () from /lib64/libvirt.so.0
#2  0x000003ff8c65a718 in saferead_lim () from /lib64/libvirt.so.0
#3  0x000003ff8c65abe4 in virFileReadHeaderFD () from /lib64/libvirt.so.0
#4  0x000003ff8c69cb5a in virProcessRunInMountNamespace () from /lib64/libvirt.so.0
#5  0x000003ff8c767bd6 in virSecuritySELinuxTransactionCommit () from /lib64/libvirt.so.0
#6  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
#7  0x000003ff8c75bb70 in virSecurityStackTransactionCommit () from /lib64/libvirt.so.0
#8  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
#9  0x000003ff6612c492 in qemuSecuritySetAllLabel () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#10 0x000003ff660bcfd6 in qemuProcessLaunch () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#11 0x000003ff660c0916 in qemuProcessStart () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#12 0x000003ff66124a00 in qemuDomainObjStart.constprop.52 () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#13 0x000003ff66125030 in qemuDomainCreateWithFlags () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
#14 0x000003ff8c87bfca in virDomainCreate () from /lib64/libvirt.so.0
#15 0x000002aa1f2d516c in remoteDispatchDomainCreate (server=<optimized out>, msg=0x2aa4c488b00, args=<optimized out>, rerr=0x3ff883fdae8, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:4434
#16 remoteDispatchDomainCreateHelper (server=<optimized out>, client=<optimized out>, msg=0x2aa4c488b00, rerr=0x3ff883fdae8, args=<optimized out>, ret=0x3ff78004c80) at remote/remote_daemon_dispatch_stubs.h:4410
#17 0x000003ff8c78bc80 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
#18 0x000003ff8c792aba in virNetServerHandleJob () from /lib64/libvirt.so.0
#19 0x000003ff8c6bbb16 in virThreadPoolWorker () from /lib64/libvirt.so.0
#20 0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#21 0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#22 0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 2 (Thread 0x3ff88bff910 (LWP 193298)):
#0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
#2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
#3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
#4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
#5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6

Thread 1 (Thread 0x3ff8cbd7970 (LWP 193297)):
#0  0x000003ff8b5ee502 in poll () from /lib64/libc.so.6
#1  0x000003ff8c65629c in virEventPollRunOnce () from /lib64/libvirt.so.0
#2  0x000003ff8c654caa in virEventRunDefaultImpl () from /lib64/libvirt.so.0
#3  0x000003ff8c7921de in virNetDaemonRun () from /lib64/libvirt.so.0
#4  0x000002aa1f2a4c12 in main (argc=<optimized out>, argv=<optimized out>) at remote/remote_daemon.c:1461

> When I try to reproduce I always get one domain running and the other
> failing to start because qemu is unable to do its locking. When I put
> <shareable/> in both, they start successfully.

Yes, I get the same (expected) behaviour, just with the timeout.

> TBH, I don't run SELinux enabled host so I'm testing DAC only, but the
> changes to the code are merely the same. So I'm wondering if this is
> really an issue in my SELinux impl or somewhere else.
> 
> BTW: The one minute timeout comes from patch 16/23:
> 
>   #define LOCK_ACQUIRE_TIMEOUT 60
> 
> Michal

It's entirely possible that my setup is broken; this was just a quick
test. If you can't reproduce it or make sense of the backtrace, that's
fine; I will try to find some time to debug it next week. I was mainly
interested in whether this would fix a long-standing bug where this
scenario would crash the first domain because the label changed.

Bjoern

-- 
IBM Systems
Linux on Z & Virtualization Development
------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
------------------------------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 7 months ago
Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> > On 09/12/2018 07:19 AM, Bjoern Walk wrote:
> > > Michal Privoznik <mprivozn@redhat.com> [2018-09-10, 11:36AM +0200]:
> > >> Technically, this is v4 of:
> > >>
> > >> https://www.redhat.com/archives/libvir-list/2018-August/msg01627.html
> > >>
> > >> However, this is implementing different approach than any of the
> > >> previous versions.
> > >>
> > >> One of the problems with previous version was that it was too
> > >> complicated. The main reason for that was that we could not close the
> > >> connection whilst there was a file locked. So we had to invent a
> > >> mechanism that would prevent that (on the client side).
> > >>
> > >> These patches implement different approach. They rely on secdriver's
> > >> transactions which bring all the paths we want to label into one place
> > >> so that they can be relabelled within different namespace.
> > >> I'm extending this idea so that transactions run all the time
> > >> (regardless of domain namespacing) and only at the very last moment is
> > >> decided which namespace would the relabeling run in.
> > >>
> > >> Metadata locking is then as easy as putting lock/unlock calls around one
> > >> function.
> > >>
> > >> You can find the patches at my github too:
> > >>
> > >> https://github.com/zippy2/libvirt/tree/disk_metadata_lock_v4_alt
> > > 
> > > Hey Michal,
> > > 
> > > is was running a quick test with this patch series with two domains
> > > sharing a disk image without <shareable/> and SELinux enabled. When
> > > starting the second domain, the whole libvirtd daemon hangs for almost a
> > > minute until giving the error that the image is locked. I haven't
> > > debugged it yet to figure out what happens.
> > 
> > Is this on two different hosts or one?
> 
> On the same host.
> 
> > I'm unable to reproduce, so can you please attach debugger and share 't
> > a a bt' output with me?
> 
> (gdb) thread apply all bt
> 
> Thread 17 (Thread 0x3ff48dff910 (LWP 193353)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff6678c5a8 in udevEventHandleThread (opaque=<optimized out>) at node_device/node_device_udev.c:1603
> #3  0x000003ff8c6bad16 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 16 (Thread 0x3ff4acfe910 (LWP 193312)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 15 (Thread 0x3ff4b4ff910 (LWP 193311)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 14 (Thread 0x3ff64efd910 (LWP 193310)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 13 (Thread 0x3ff656fe910 (LWP 193309)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 12 (Thread 0x3ff65eff910 (LWP 193308)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 11 (Thread 0x3ff67fff910 (LWP 193307)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 10 (Thread 0x3ff84bf7910 (LWP 193306)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 9 (Thread 0x3ff853f8910 (LWP 193305)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 8 (Thread 0x3ff85bf9910 (LWP 193304)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 7 (Thread 0x3ff863fa910 (LWP 193303)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbba0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 6 (Thread 0x3ff86bfb910 (LWP 193302)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 5 (Thread 0x3ff873fc910 (LWP 193301)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 4 (Thread 0x3ff87bfd910 (LWP 193300)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 3 (Thread 0x3ff883fe910 (LWP 193299)):
> #0  0x000003ff8b7113d4 in read () from /lib64/libpthread.so.0
> #1  0x000003ff8c65a648 in saferead () from /lib64/libvirt.so.0
> #2  0x000003ff8c65a718 in saferead_lim () from /lib64/libvirt.so.0
> #3  0x000003ff8c65abe4 in virFileReadHeaderFD () from /lib64/libvirt.so.0
> #4  0x000003ff8c69cb5a in virProcessRunInMountNamespace () from /lib64/libvirt.so.0
> #5  0x000003ff8c767bd6 in virSecuritySELinuxTransactionCommit () from /lib64/libvirt.so.0
> #6  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
> #7  0x000003ff8c75bb70 in virSecurityStackTransactionCommit () from /lib64/libvirt.so.0
> #8  0x000003ff8c75fb1c in virSecurityManagerTransactionCommit () from /lib64/libvirt.so.0
> #9  0x000003ff6612c492 in qemuSecuritySetAllLabel () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #10 0x000003ff660bcfd6 in qemuProcessLaunch () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #11 0x000003ff660c0916 in qemuProcessStart () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #12 0x000003ff66124a00 in qemuDomainObjStart.constprop.52 () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #13 0x000003ff66125030 in qemuDomainCreateWithFlags () from /usr/lib64/libvirt/connection-driver/libvirt_driver_qemu.so
> #14 0x000003ff8c87bfca in virDomainCreate () from /lib64/libvirt.so.0
> #15 0x000002aa1f2d516c in remoteDispatchDomainCreate (server=<optimized out>, msg=0x2aa4c488b00, args=<optimized out>, rerr=0x3ff883fdae8, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:4434
> #16 remoteDispatchDomainCreateHelper (server=<optimized out>, client=<optimized out>, msg=0x2aa4c488b00, rerr=0x3ff883fdae8, args=<optimized out>, ret=0x3ff78004c80) at remote/remote_daemon_dispatch_stubs.h:4410
> #17 0x000003ff8c78bc80 in virNetServerProgramDispatch () from /lib64/libvirt.so.0
> #18 0x000003ff8c792aba in virNetServerHandleJob () from /lib64/libvirt.so.0
> #19 0x000003ff8c6bbb16 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #20 0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #21 0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #22 0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 2 (Thread 0x3ff88bff910 (LWP 193298)):
> #0  0x000003ff8b70d7b8 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
> #1  0x000003ff8c6baf92 in virCondWait () from /lib64/libvirt.so.0
> #2  0x000003ff8c6bbbe0 in virThreadPoolWorker () from /lib64/libvirt.so.0
> #3  0x000003ff8c6bace6 in virThreadHelper () from /lib64/libvirt.so.0
> #4  0x000003ff8b7079a8 in start_thread () from /lib64/libpthread.so.0
> #5  0x000003ff8b5f9706 in thread_start () from /lib64/libc.so.6
> 
> Thread 1 (Thread 0x3ff8cbd7970 (LWP 193297)):
> #0  0x000003ff8b5ee502 in poll () from /lib64/libc.so.6
> #1  0x000003ff8c65629c in virEventPollRunOnce () from /lib64/libvirt.so.0
> #2  0x000003ff8c654caa in virEventRunDefaultImpl () from /lib64/libvirt.so.0
> #3  0x000003ff8c7921de in virNetDaemonRun () from /lib64/libvirt.so.0
> #4  0x000002aa1f2a4c12 in main (argc=<optimized out>, argv=<optimized out>) at remote/remote_daemon.c:1461
> 
> > When I try to reproduce I always get one domain running and the other
> > failing to start because qemu is unable to do its locking. When I put
> > <shareable/> in both, they start successfully.
> 
> Yes, I get the same (expected) behaviour, just with the timeout.
> 
> > TBH, I don't run SELinux enabled host so I'm testing DAC only, but the
> > changes to the code are merely the same. So I'm wondering if this is
> > really an issue in my SELinux impl or somewhere else.
> > 
> > BTW: The one minute timeout comes from patch 16/23:
> > 
> >   #define LOCK_ACQUIRE_TIMEOUT 60
> > 
> > Michal
> 
> It's entirely possible that my setup is broken, this was just a quick
> test. If you can't reproduce it or make sense out of the backtrace then
> it's fine, I will try to put some time to debug it next week. I was just
> interested if this would fix a long-standing bug where this scenario
> would crash the first domain because the label changed.
> 
> Bjoern
> 

Still seeing the same timeout. Is this expected behaviour?

-- 
IBM Systems
Linux on Z & Virtualization Development
------------------------------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
------------------------------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 7 months ago
On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:

>>
> 
> Still seeing the same timeout. Is this expected behaviour?
> 

Nope. I wonder if something has locked the path and forgot to unlock it
(however, virtlockd should have unlocked all the paths owned by a PID on
connection close), or something is still holding the lock with the
connection open.

Can you see the timeout even when you turn off the SELinux driver
(security_driver="none" in qemu.conf)? I tried to reproduce the issue
yesterday and was unsuccessful. Do you have any steps to reproduce?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 7 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> > Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
> >> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> 
> >>
> > 
> > Still seeing the same timeout. Is this expected behaviour?
> > 
> 
> Nope. I wonder if something has locked the path and forgot to unlock it
> (however, virtlockd should have unlocked all the paths owned by PID on
> connection close), or something is still holding the lock and connection
> opened.

I can reproduce on a freshly booted machine. There should be no rogue
lock held anywhere.

> 
> Can you see the timeout even when you turn off the selinux driver
> (security_driver="none' in qemu.conf)? I tried to reproduce the issue

Yes, same issue.

> yesterday and was unsuccessful. Do you have any steps to reproduce?

0. Host setup:

    # /usr/sbin/sestatus
    SELinux status:                 enabled
    SELinuxfs mount:                /sys/fs/selinux
    SELinux root directory:         /etc/selinux
    Loaded policy name:             targeted
    Current mode:                   enforcing
    Mode from config file:          enforcing
    Policy MLS status:              enabled
    Policy deny_unknown status:     allowed
    Memory protection checking:     actual (secure)
    Max kernel policy version:      31

    # grep -E "^[^#]" /etc/libvirt/qemu.conf
    lock_manager = "lockd"
    metadata_lock_manager = "lockd"

1. Define two domains, u1604-1 and u1604-2, using the same image, not
   shared:

    <domain type='kvm'>
      <name>u1604-1</name>
      <uuid>e49679de-eca9-4a72-8d1a-56f437541157</uuid>
      <memory unit='KiB'>524288</memory>
      <currentMemory unit='KiB'>524288</currentMemory>
      <vcpu placement='static'>2</vcpu>
      <os>
        <type arch='s390x' machine='s390-ccw-virtio-2.12'>hvm</type>
        <boot dev='hd'/>
      </os>
      <clock offset='utc'/>
      <on_poweroff>destroy</on_poweroff>
      <on_reboot>restart</on_reboot>
      <on_crash>preserve</on_crash>
      <devices>
        <emulator>/usr/bin/qemu-system-s390x</emulator>
        <disk type='file' device='disk'>
          <driver name='qemu' type='qcow2'/>
          <source file='/var/lib/libvirt/images/u1604.qcow2'/>
          <target dev='vda' bus='virtio'/>
          <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0000'/>
        </disk>
        <console type='pty'>
          <target type='sclp' port='0'/>
        </console>
        <memballoon model='virtio'>
          <address type='ccw' cssid='0xfe' ssid='0x0' devno='0x0001'/>
        </memballoon>
        <panic model='s390'/>
      </devices>
    </domain>

2. Start domain u1604-1:

    # ls -Z /var/lib/libvirt/images/u1604.qcow2
    system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

3. Start domain u1604-2.

4. Result: libvirtd hangs for 60 seconds, after which the expected
   locking error appears. The security labels of the image are not
   changed:

    # virsh start u1604-2
    error: Failed to start domain u1604-2
    error: internal error: child reported: resource busy: Lockspace resource '/var/lib/libvirt/images/u1604.qcow2' is locked

    # ls -Z /var/lib/libvirt/images/u1604.qcow2
    system_u:object_r:svirt_image_t:s0:c387,c937 /var/lib/libvirt/images/u1604.qcow2

The backtrace is the same one I sent earlier.

Let me know if you need more info or if anything is borked in my setup.

> 
> Michal
> 
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 6 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> > Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
> >> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
> 
> >>
> > 
> > Still seeing the same timeout. Is this expected behaviour?
> > 
> 
> Nope. I wonder if something has locked the path and forgot to unlock it
> (however, virtlockd should have unlocked all the paths owned by PID on
> connection close), or something is still holding the lock and connection
> opened.
> 
> Can you see the timeout even when you turn off the selinux driver
> (security_driver="none' in qemu.conf)? I tried to reproduce the issue
> yesterday and was unsuccessful. Do you have any steps to reproduce?

So, I haven't been able to dig into the debugging yet, but we have
been able to reproduce this behaviour on x86 (both with SELinux and
DAC). It looks like a general problem, if it is a problem at all,
because from what I can see in the code the 60-second timeout is
actually intended, or is it? The security manager tries for 60 seconds
to acquire the lock and only then bails out. Why is this?

However, an actual bug is that while the security manager waits for the
lock acquisition the whole libvirtd hangs, but from what I understood
and what Marc explained to me, this is more of a pathological pattern in
how libvirt takes its various domain locks.

-- 
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 6 months ago
On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM
>>>> +0200]:
>> 
>>>> 
>>> 
>>> Still seeing the same timeout. Is this expected behaviour?
>>> 
>> 
>> Nope. I wonder if something has locked the path and forgot to
>> unlock it (however, virtlockd should have unlocked all the paths
>> owned by PID on connection close), or something is still holding
>> the lock and connection opened.
>> 
>> Can you see the timeout even when you turn off the selinux driver 
>> (security_driver="none' in qemu.conf)? I tried to reproduce the
>> issue yesterday and was unsuccessful. Do you have any steps to
>> reproduce?
> 
> So, I haven't been able to actually dig into the debugging but we
> have been able to reproduce this behaviour on x86 (both with SELinux
> and DAC). Looks like a general problem, if a problem at all, because
> from what I can see in the code, the 60 seconds timeout is actually
> intended, or not? The security manager does try for 60 seconds to
> acquire the lock and only then bails out. Why is this?

The ideal solution would be to just tell virtlockd "these are the paths
I want you to lock on my behalf" and virtlockd would use F_SETLKW so
that the moment all paths are unlocked virtlockd locks them and libvirtd
can continue its execution (i.e. chown() and setfilecon()). However, we
can't do this because virtlockd runs single-threaded [1] and therefore
if one thread is waiting for a lock to be acquired there is no other
thread to unlock the path.

Therefore I had to move the logic into libvirtd, which tries repeatedly
to lock all the paths needed. And this is where the timeout steps in -
the lock acquisition attempts are capped at 60 seconds. This is an
arbitrarily chosen timeout. We can make it smaller, but that will not
solve your problem - only mask it.
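
A minimal sketch of that retry loop, assuming a non-blocking tryAcquire
callback that fails with EBUSY/EAGAIN while the path is still locked
(this only illustrates the behaviour described above, it is not the code
from the series):

    #include <errno.h>
    #include <time.h>
    #include <unistd.h>

    #define LOCK_ACQUIRE_TIMEOUT 60  /* seconds, as in patch 16/23 */

    static int
    exampleAcquireWithTimeout(int (*tryAcquire)(void *opaque), void *opaque)
    {
        time_t start = time(NULL);

        while (time(NULL) - start < LOCK_ACQUIRE_TIMEOUT) {
            if (tryAcquire(opaque) == 0)
                return 0;              /* all locks acquired */
            if (errno != EBUSY && errno != EAGAIN)
                return -1;             /* real error, stop retrying */
            usleep(100 * 1000);        /* back off briefly, then retry */
        }

        errno = EBUSY;                 /* timed out: "resource busy" */
        return -1;
    }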

> 
> However, an actual bug is that while the security manager waits for
> the lock acquire the whole libvirtd hangs, but from what I understood
> and Marc explained to me, this is more of a pathological error in
> libvirt behaviour with various domain locks.
> 

The whole libvirtd shouldn't hang, only those threads which try to
acquire the domain object lock. IOW it should be possible to run APIs
against different domains (or other objects, for that matter). For
instance, dumpxml of a different domain works just fine.

Anyway, we need to get to the bottom of this. It looks like something
keeps the file locked, and then when libvirt wants to lock it for
metadata the timeout is hit and the whole operation fails. Do you mind
running 'lslocks -u' while starting the domain, before the timeout is
hit?

Michal

1: The reason that virtlockd has to run single-threaded is the
stupidity of POSIX file locks. Imagine one thread doing open() +
fcntl(fd, F_SETLKW, ...) and entering a critical section. If another
thread does open() + close() on the same file, the file is unlocked.
Because we can't guarantee this will not happen in the multithreaded
libvirtd, we had to introduce a separate process to take care of it, and
that process has to be single-threaded so there is never a second thread
to call close() and unintentionally release the lock.

Perhaps we could use OFD locks, but those are not part of POSIX and are
available on Linux only.
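
The pitfall and the OFD alternative can be shown in a few lines
(illustrative only; F_OFD_SETLK is the Linux-specific command and needs
_GNU_SOURCE):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    int
    example_lock_pitfall(const char *path)
    {
        struct flock fl = {
            .l_type = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start = 0,
            .l_len = 0,              /* lock the whole file */
        };

        int fd1 = open(path, O_RDWR);
        if (fd1 < 0 || fcntl(fd1, F_SETLKW, &fl) < 0)
            return -1;               /* thread A now holds the lock... */

        /* ...but POSIX locks are owned by the process, so if thread B
         * merely does open() + close() on the same file, the lock is
         * silently dropped: */
        int fd2 = open(path, O_RDONLY);
        if (fd2 >= 0)
            close(fd2);              /* releases thread A's lock too */

        /* OFD locks are tied to the open file description instead and
         * survive a close() of another descriptor (l_pid must be 0). */
        fl.l_pid = 0;
        if (fcntl(fd1, F_OFD_SETLK, &fl) < 0)
            return -1;

        close(fd1);
        return 0;
    }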

Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Marc Hartmayer 5 years, 6 months ago
On Thu, Sep 27, 2018 at 10:15 AM +0200, Michal Privoznik <mprivozn@redhat.com> wrote:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM
>>>>> +0200]:
>>>
>>>>>
>>>>
>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>
>>>
>>> Nope. I wonder if something has locked the path and forgot to
>>> unlock it (however, virtlockd should have unlocked all the paths
>>> owned by PID on connection close), or something is still holding
>>> the lock and connection opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>> (security_driver="none' in qemu.conf)? I tried to reproduce the
>>> issue yesterday and was unsuccessful. Do you have any steps to
>>> reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we
>> have been able to reproduce this behaviour on x86 (both with SELinux
>> and DAC). Looks like a general problem, if a problem at all, because
>> from what I can see in the code, the 60 seconds timeout is actually
>> intended, or not? The security manager does try for 60 seconds to
>> acquire the lock and only then bails out. Why is this?
>
> The ideal solution would be to just tell virlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfcon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
>
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrary chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.
>
>>
>> However, an actual bug is that while the security manager waits for
>> the lock acquire the whole libvirtd hangs, but from what I understood
>> and Marc explained to me, this is more of a pathological error in
>> libvirt behaviour with various domain locks.
>>
>
> Whole libvirtd shouldn't hang.

The main loop doesn’t hang.

> Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Oh sorry, my assumption that thread A is also holding the
virDomainObjList lock is wrong!

Here is the actual backtrace of a waiting thread (in this case the
worker thread for the "virsh list" command):

(gdb) bt
#0  0x00007f850887b7ed in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f8508874cf4 in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007f850c26cfd9 in virMutexLock (m=<optimized out>) at util/virthread.c:89
#3  0x00007f850c2455df in virObjectLock (anyobj=<optimized out>) at util/virobject.c:429
#4  0x00007f850c2ceaba in virDomainObjListFilter (list=list@entry=0x7f84f47a97c0, nvms=nvms@entry=0x7f84f47a97c8, conn=conn@entry=0x7f84dc01ad00, filter=filter@entry=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=flags@entry=1) at conf/virdomainobjlist.c:919
#5  0x00007f850c2cfef2 in virDomainObjListCollect (domlist=0x7f848c10e290, conn=conn@entry=0x7f84dc01ad00, vms=vms@entry=0x7f84f47a9820, nvms=nvms@entry=0x7f84f47a9830, filter=0x7f850c318c70 <virConnectListAllDomainsCheckACL>, flags=1) at conf/virdomainobjlist.c:961
#6  0x00007f850c2d01a6 in virDomainObjListExport (domlist=<optimized out>, conn=0x7f84dc01ad00, domains=0x7f84f47a9890, filter=<optimized out>, flags=<optimized out>) at conf/virdomainobjlist.c:1042
#7  0x00007f850c426be9 in virConnectListAllDomains (conn=0x7f84dc01ad00, domains=0x7f84f47a9890, flags=1) at libvirt-domain.c:6493
#8  0x000055a82b2d905d in remoteDispatchConnectListAllDomains (server=0x55a82c9e16d0, msg=0x55a82ca53380, args=0x7f84dc01ae30, args=0x7f84dc01ae30, ret=0x7f84dc005bb0, rerr=0x7f84f47a9960, client=<optimized out>) at remote/remote_daemon_dispatch_stubs.h:1372
#9  remoteDispatchConnectListAllDomainsHelper (server=0x55a82c9e16d0, client=<optimized out>, msg=0x55a82ca53380, rerr=0x7f84f47a9960, args=0x7f84dc01ae30, ret=0x7f84dc005bb0) at remote/remote_daemon_dispatch_stubs.h:1348
#10 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53380, client=0x55a82ca52a40, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
…

virDomainObjListFilter needs the lock of every domain… That’s the actual
problem.
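
A simplified illustration of that pattern (not libvirt's actual code):
the list API takes each per-domain mutex in turn, so it blocks as soon
as it reaches a domain whose lock is held by a long-running job, such as
the start job waiting on the 60-second lock acquisition.

    #include <pthread.h>
    #include <stddef.h>

    struct example_domain {
        pthread_mutex_t lock;
        int active;
    };

    /* Runs in the "virsh list" worker thread. */
    static size_t
    example_list_active(struct example_domain *doms, size_t ndoms)
    {
        size_t count = 0;

        for (size_t i = 0; i < ndoms; i++) {
            /* Blocks here if doms[i].lock is currently held by the
             * thread that is still starting that domain. */
            pthread_mutex_lock(&doms[i].lock);
            if (doms[i].active)
                count++;
            pthread_mutex_unlock(&doms[i].lock);
        }
        return count;
    }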

>
> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock if for metadata the
> timeout is hit and whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?
>
> Michal
>
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.
>
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
>
--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 6 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:15AM +0200]:
> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
> > Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
> >> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
> >>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
> >>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM
> >>>> +0200]:
> >> 
> >>>> 
> >>> 
> >>> Still seeing the same timeout. Is this expected behaviour?
> >>> 
> >> 
> >> Nope. I wonder if something has locked the path and forgot to
> >> unlock it (however, virtlockd should have unlocked all the paths
> >> owned by PID on connection close), or something is still holding
> >> the lock and connection opened.
> >> 
> >> Can you see the timeout even when you turn off the selinux driver 
> >> (security_driver="none' in qemu.conf)? I tried to reproduce the
> >> issue yesterday and was unsuccessful. Do you have any steps to
> >> reproduce?
> > 
> > So, I haven't been able to actually dig into the debugging but we
> > have been able to reproduce this behaviour on x86 (both with SELinux
> > and DAC). Looks like a general problem, if a problem at all, because
> > from what I can see in the code, the 60 seconds timeout is actually
> > intended, or not? The security manager does try for 60 seconds to
> > acquire the lock and only then bails out. Why is this?
> 
> The ideal solution would be to just tell virlockd "these are the paths I
> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
> the moment all paths are unlocked virtlockd will lock them and libvirtd
> can continue its execution (i.e. chown() and setfcon()). However, we
> can't do this because virtlockd runs single threaded [1] and therefore
> if one thread is waiting for lock to be acquired there is no other
> thread to unlock the path.
> 
> Therefore I had to move the logic into libvirtd which tries repeatedly
> to lock all the paths needed. And this is where the timeout steps in -
> the lock acquiring attempts are capped at 60 seconds. This is an
> arbitrary chosen timeout. We can make it smaller, but that will not
> solve your problem - only mask it.

I still don't understand why we need a timeout at all. If virtlockd is
unable to get the lock, just bail out immediately and do whatever would
have been done once the timeout ran out. Is this some kind of safety
measure? Against what?

> 
> > 
> > However, an actual bug is that while the security manager waits for
> > the lock acquire the whole libvirtd hangs, but from what I understood
> > and Marc explained to me, this is more of a pathological error in
> > libvirt behaviour with various domain locks.
> > 
> 
> Whole libvirtd shouldn't hang. Only those threads which try to acquire
> domain object lock. IOW it should be possible to run APIs over different
> domains (or other objects for that matter). For instance dumpxml over
> different domain works just fine.

Yes, but from a user perspective, this is still pretty bad and
unexpected. libvirt should continue to operate as usual while virtlockd
is figuring out the locking.

> Anyway, we need to get to the bottom of this. Looks like something keeps
> the file locked and then when libvirt wants to lock it for metadata the
> timeout is hit and the whole operation fails. Do you mind running 'lslocks
> -u' when starting a domain and before the timeout is hit?

There IS a lock held on the image, from the FIRST domain that we
started. The second domain, which is using the SAME image, unshared,
runs into the locking timeout. Sorry if I failed to describe this setup
appropriately. I am starting to believe that this is expected behaviour,
although it is not intuitive.

	# lslocks -u
	COMMAND            PID   TYPE SIZE MODE  M START        END PATH
	...
	virtlockd       199062  POSIX 1.5G WRITE 0     0          0 /var/lib/libvirt/images/u1604.qcow2
	...

> 
> Michal
> 
> 1: The reason that virtlockd has to run single threaded is stupidity of
> POSIX file locks. Imagine one thread doing: open() + fcntl(fd, F_SETLKW,
> ...) and entering critical section. If another thread does open() +
> close() on the same file the file is unlocked. Because we can't
> guarantee this will not happen in multithreaded libvirt we had to
> introduce a separate process to take care of that. And this process has
> to be single threaded so there won't ever be the second thread to call
> close() and unintentionally release the lock.

Thanks for the explanation, I will have some reading to do to get a
better overview of the locking process in Linux.

> 
> Perhaps we could use OFD locks but those are not part of POSIX and are
> available on Linux only.
> 

-- 
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 6 months ago
On 09/27/2018 11:11 AM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:15AM +0200]:
>> On 09/27/2018 09:01 AM, Bjoern Walk wrote:
>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM
>>>>>> +0200]:
>>>>
>>>>>>
>>>>>
>>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>>
>>>>
>>>> Nope. I wonder if something has locked the path and forgot to
>>>> unlock it (however, virtlockd should have unlocked all the paths
>>>> owned by PID on connection close), or something is still holding
>>>> the lock and connection opened.
>>>>
>>>> Can you see the timeout even when you turn off the selinux driver 
>>>> (security_driver="none" in qemu.conf)? I tried to reproduce the
>>>> issue yesterday and was unsuccessful. Do you have any steps to
>>>> reproduce?
>>>
>>> So, I haven't been able to actually dig into the debugging but we
>>> have been able to reproduce this behaviour on x86 (both with SELinux
>>> and DAC). Looks like a general problem, if a problem at all, because
>>> from what I can see in the code, the 60 seconds timeout is actually
>>> intended, or not? The security manager does try for 60 seconds to
>>> acquire the lock and only then bails out. Why is this?
>>
>> The ideal solution would be to just tell virlockd "these are the paths I
>> want you to lock on my behalf" and virtlockd would use F_SETLKW so that
>> the moment all paths are unlocked virtlockd will lock them and libvirtd
>> can continue its execution (i.e. chown() and setfcon()). However, we
>> can't do this because virtlockd runs single threaded [1] and therefore
>> if one thread is waiting for lock to be acquired there is no other
>> thread to unlock the path.
>>
>> Therefore I had to move the logic into libvirtd which tries repeatedly
>> to lock all the paths needed. And this is where the timeout steps in -
>> the lock acquiring attempts are capped at 60 seconds. This is an
>> arbitrarily chosen timeout. We can make it smaller, but that will not
>> solve your problem - only mask it.
> 
> I still don't understand why we need a timeout at all. If virtlockd is
> unable to get the lock, just bail and continue with what you did after
> the timeout runs out. Is this some kind of safety-measure? Against what?

Against the case where two threads are fighting over the lock. Imagine two
threads trying to set a security label on the same file, or two daemons on
two distant hosts trying to set a label on the same file on NFS. virtlockd
implements a try-or-fail approach, so we need to call lock() repeatedly
until we succeed (or time out).

> 
>>
>>>
>>> However, an actual bug is that while the security manager waits for
>>> the lock acquire the whole libvirtd hangs, but from what I understood
>>> and Marc explained to me, this is more of a pathological error in
>>> libvirt behaviour with various domain locks.
>>>
>>
>> Whole libvirtd shouldn't hang. Only those threads which try to acquire
>> domain object lock. IOW it should be possible to run APIs over different
>> domains (or other objects for that matter). For instance dumpxml over
>> different domain works just fine.
> 
> Yes, but from a user perspective, this is still pretty bad and
> unexpected. libvirt should continue to operate as usual while virtlockd
> is figuring out the locking.

Agreed. I will look into this.

> 
>> Anyway, we need to get to the bottom of this. Looks like something keeps
>> the file locked and then when libvirt wants to lock it for metadata the
>> timeout is hit and the whole operation fails. Do you mind running 'lslocks
>> -u' when starting a domain and before the timeout is hit?
> 
> There IS a lock held on the image, from the FIRST domain that we
> started. The second domain, which is using the SAME image, unshared,
> runs into the locking timeout. Sorry if I failed to describe this setup
> appropriately. I am starting to believe that this is expected behaviour,
> although it is not intuitive.
> 
> 	# lslocks -u
> 	COMMAND            PID   TYPE SIZE MODE  M START        END PATH
> 	...
> 	virtlockd       199062  POSIX 1.5G WRITE 0     0          0 /var/lib/libvirt/images/u1604.qcow2

But this should not clash, because the metadata lock uses different bytes:
regular locking uses offset=0, the metadata lock uses offset=1. Both
locks cover just one byte in length.
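
A sketch of what the different byte offsets mean at the fcntl() level
(illustrative only, not the actual virtlockd code):

/* Two non-overlapping one-byte ranges on the same file never conflict:
 * offset 0 = disk-content lock, offset 1 = metadata lock. */
#include <fcntl.h>

static int lock_byte(int fd, off_t offset)
{
    struct flock fl = {
        .l_type   = F_WRLCK,
        .l_whence = SEEK_SET,
        .l_start  = offset,      /* which byte to lock */
        .l_len    = 1,           /* exactly one byte */
    };
    return fcntl(fd, F_SETLK, &fl);
}

/* lock_byte(fd, 0) and lock_byte(fd, 1) can both succeed at the same time,
 * even when taken by two different processes. */
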

However, from the logs it looks like virtlockd doesn't actually try to
acquire the lock because it found in its internal records that the path is
already locked (even though it is locked at a different offset). Anyway, now
I am finally able to reproduce the issue and I'll look into this too.

Quick and dirty patch might look something like this:

diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
index 79fa48d365..3c51d7926b 100644
--- i/src/util/virlockspace.c
+++ w/src/util/virlockspace.c
@@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace,
     virMutexLock(&lockspace->lock);

     if ((res = virHashLookup(lockspace->resources, resname))) {
-        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
-            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
+        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
+            flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
+            start == 1) {

             if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
                 goto cleanup;


But as I say, this is very dirty hack. I need to find a better solution.
At least you might have something to test.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 6 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 12:07PM +0200]:
> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
> > I still don't understand why we need a timeout at all. If virtlockd is
> > unable to get the lock, just bail and continue with what you did after
> > the timeout runs out. Is this some kind of safety-measure? Against what?
> 
> When there are two threads fighting over the lock. Imagine two threads
> trying to set security label over the same file. Or imagine two daemons
> on two distant hosts trying to set label on the same file on NFS.
> virtlockd implements try-or-fail approach. So we need to call lock()
> repeatedly until we succeed (or timeout).

One thread wins and the other fails with lock contention? I don't see
how repeatedly trying to acquire the lock will improve things, but maybe
I am not getting it right now.

> > There IS a lock held on the image, from the FIRST domain that we
> > started. The second domain, which is using the SAME image, unshared,
> > runs into the locking timeout. Sorry if I failed to describe this setup
> > appropriately. I am starting to believe that this is expected behaviour,
> > although it is not intuitive.
> > 
> > 	# lslocks -u
> > 	COMMAND            PID   TYPE SIZE MODE  M START        END PATH
> > 	...
> > 	virtlockd       199062  POSIX 1.5G WRITE 0     0          0 /var/lib/libvirt/images/u1604.qcow2
> 
> But this should not clash, because the metadata lock uses different bytes:
> the regular locking uses offset=0, metadata lock uses offset=1. Both
> locks lock just one byte in length.

It's the only lock in the output for the image. Should I see the
metadata lock at start=1 as well?

> However, from the logs it looks like virtlockd doesn't try to actually
> acquire the lock because it found in internal records that the path is
> locked (even though it is locked with different offset). Anyway, now I
> am finally able to reproduce the issue and I'll look into this too.

Good.

> Quick and dirty patch might look something like this:
> 
> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
> index 79fa48d365..3c51d7926b 100644
> --- i/src/util/virlockspace.c
> +++ w/src/util/virlockspace.c
> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace,
>      virMutexLock(&lockspace->lock);
> 
>      if ((res = virHashLookup(lockspace->resources, resname))) {
> -        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
> -            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
> +        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
> +            flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
> +            start == 1) {
> 
>              if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>                  goto cleanup;

Had a quick test with this, but now it seems like the whole metadata
locking does nothing. When starting the second domain, I get an internal
error from QEMU, failing to get the write lock. The SELinux labels of the
image are changed as well. This is the same behaviour as with metadata
locking disabled. Not entirely sure what should happen, or whether this is
an error in my setup. I will have to think about this.

> But as I say, this is very dirty hack. I need to find a better solution.
> At least you might have something to test.
> 
> Michal
> 

-- 
IBM Systems
Linux on Z & Virtualization Development
--------------------------------------------------
IBM Deutschland Research & Development GmbH
Schönaicher Str. 220, 71032 Böblingen
Phone: +49 7031 16 1819
--------------------------------------------------
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294 
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 6 months ago
On 09/27/2018 01:16 PM, Bjoern Walk wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 12:07PM +0200]:
>> On 09/27/2018 11:11 AM, Bjoern Walk wrote:
>>> I still don't understand why we need a timeout at all. If virtlockd is
>>> unable to get the lock, just bail and continue with what you did after
>>> the timeout runs out. Is this some kind of safety-measure? Against what?
>>
>> When there are two threads fighting over the lock. Imagine two threads
>> trying to set security label over the same file. Or imagine two daemons
>> on two distant hosts trying to set label on the same file on NFS.
>> virtlockd implements try-or-fail approach. So we need to call lock()
>> repeatedly until we succeed (or timeout).
> 
> One thread wins and the other fails with lock contention? I don't see
> how repeatedly trying to acquire the lock will improve things, but maybe
> I am not getting it right now.

The point of metadata locking is not to prevent metadata changes, but to
be able to change them atomically. As I said in the other thread, there is
not much point in this feature alone since chown() and setfilecon() are
atomic themselves. But this is preparing the code for original-label
remembering, which will be stored in XATTRs, at which point the whole
operation is no longer atomic.

https://www.redhat.com/archives/libvir-list/2018-September/msg01349.html

Long story short, at the first chown() libvirt will record the original
owner of the file into XATTRs and then, on the last restore, it will use
that owner to return the file to instead of root:root. The locks are there
because reading the XATTR, parsing it, increasing/decreasing the refcounter
and possibly doing chown() is not atomic as a whole. With the locks it is.
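
A rough sketch of that scheme (illustrative only; the XATTR name and the
omitted refcounting are invented here, not libvirt's actual implementation):

/* On the first chown() remember the original owner in an XATTR so the last
 * restore can put it back; read-xattr + write-xattr + chown() is only
 * atomic as a whole when done under the metadata lock. */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/xattr.h>
#include <unistd.h>

static int remember_and_chown(const char *path, uid_t uid, gid_t gid)
{
    char buf[64];
    struct stat sb;

    if (stat(path, &sb) < 0)
        return -1;

    /* Record the original owner only once (hypothetical attribute name). */
    if (getxattr(path, "trusted.libvirt.example", buf, sizeof(buf)) < 0) {
        snprintf(buf, sizeof(buf), "%u:%u",
                 (unsigned)sb.st_uid, (unsigned)sb.st_gid);
        if (setxattr(path, "trusted.libvirt.example", buf,
                     strlen(buf) + 1, 0) < 0)
            return -1;
    }

    return chown(path, uid, gid);
}
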

> 
>>> There IS a lock held on the image, from the FIRST domain that we
>>> started. The second domain, which is using the SAME image, unshared,
>>> runs into the locking timeout. Sorry if I failed to describe this setup
>>> appropriately. I am starting to believe that this is expected behaviour,
>>> although it is not intuitive.
>>>
>>> 	# lslocks -u
>>> 	COMMAND            PID   TYPE SIZE MODE  M START        END PATH
>>> 	...
>>> 	virtlockd       199062  POSIX 1.5G WRITE 0     0          0 /var/lib/libvirt/images/u1604.qcow2
>>
>> But this should not clash, because the metadata lock uses different bytes:
>> the regular locking uses offset=0, metadata lock uses offset=1. Both
>> locks lock just one byte in length.
> 
> It's the only lock in the output for the image. Should I see the
> metadata lock at start=1 as well?

You should see it if you run lslocks at the right moment. But since
metadata locking happens only for a fraction of a second, it is very
unlikely that you'll be able to run it at the right moment.

> 
>> However, from the logs it looks like virtlockd doesn't try to actually
>> acquire the lock because it found in internal records that the path is
>> locked (even though it is locked with different offset). Anyway, now I
>> am finally able to reproduce the issue and I'll look into this too.
> 
> Good.
> 
>> Quick and dirty patch might look something like this:
>>
>> diff --git i/src/util/virlockspace.c w/src/util/virlockspace.c
>> index 79fa48d365..3c51d7926b 100644
>> --- i/src/util/virlockspace.c
>> +++ w/src/util/virlockspace.c
>> @@ -630,8 +630,9 @@ int virLockSpaceAcquireResource(virLockSpacePtr lockspace,
>>      virMutexLock(&lockspace->lock);
>>
>>      if ((res = virHashLookup(lockspace->resources, resname))) {
>> -        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) &&
>> -            (flags & VIR_LOCK_SPACE_ACQUIRE_SHARED)) {
>> +        if ((res->flags & VIR_LOCK_SPACE_ACQUIRE_SHARED &&
>> +            flags & VIR_LOCK_SPACE_ACQUIRE_SHARED) ||
>> +            start == 1) {
>>
>>              if (VIR_EXPAND_N(res->owners, res->nOwners, 1) < 0)
>>                  goto cleanup;
> 
> Had a quick test with this, but now it seems like that the whole
> metadata locking does nothing. When starting the second domain, I get an
> internal error from QEMU, failing to get the write lock. The SELinux
> labels of the image are changed as well. This is the same behaviour as
> with metadata locking disabled. Not entirely sure what should happen or
> if this is a error in my setup or not. I will have to think about this.

Metadata locking is not supposed to prevent you from running two domains
with the same disk, nor to prevent chown() on the file.
But since you are using virtlockd to lock the disk contents, it should be
virtlockd that prevents running such domains, not qemu. I wonder if this is
a misconfiguration or something. Perhaps one domain has the disk RW and the
other has it RO + shared?

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Marc Hartmayer 5 years, 6 months ago
On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>> > Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>> >> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>
>> >>
>> >
>> > Still seeing the same timeout. Is this expected behaviour?
>> >
>>
>> Nope. I wonder if something has locked the path and forgot to unlock it
>> (however, virtlockd should have unlocked all the paths owned by PID on
>> connection close), or something is still holding the lock and connection
>> opened.
>>
>> Can you see the timeout even when you turn off the selinux driver
>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>
> So, I haven't been able to actually dig into the debugging but we have
> been able to reproduce this behaviour on x86 (both with SELinux and
> DAC). Looks like a general problem, if a problem at all, because from
> what I can see in the code, the 60 seconds timeout is actually intended,
> or not? The security manager does try for 60 seconds to acquire the lock
> and only then bails out. Why is this?

Backtrace of libvirtd where it’s hanging (in thread A)

(gdb) bt
#0 read () from /lib64/libpthread.so.0
#1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
#2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
#3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
#4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
#5 virProcessRunInMountNamespace () at util/virprocess.c:1149
#6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
#7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
#8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
#9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
#10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
#12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816

…

#25 in start_thread () from /lib64/libpthread.so.0
#26 in clone () from /lib64/libc.so.6


Backtrace of the forked process where the process is trying to get the
metadata lock for 60 seconds.

#0  0x00007f8508572730 in nanosleep () from target:/lib64/libc.so.6
#1  0x00007f850859dff4 in usleep () from target:/lib64/libc.so.6
#2  0x00007f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
#3  0x00007f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
#4  0x00007f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
#5  0x00007f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at util/virprocess.c:1100
#6  virProcessRunInMountNamespace () at util/virprocess.c:1140
#7  0x00007f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_dac.c:565
#8  0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
#9  0x00007f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
#10 0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
#11 0x00007f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
#12 0x00007f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
#13 0x00007f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
#14 0x00007f84e057f5cd in qemuDomainObjStart (conn=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=0x7f848c1c3a50, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7296
#15 0x00007f84e057fc19 in qemuDomainCreateWithFlags (dom=0x7f84e4001620, flags=0) at qemu/qemu_driver.c:7349
#16 0x00007f850c426d57 in virDomainCreate (domain=domain@entry=0x7f84e4001620) at libvirt-domain.c:6534
#17 0x000055a82b2f1cae in remoteDispatchDomainCreate (server=0x55a82c9e16d0, msg=0x55a82ca53e00, args=<optimized out>, rerr=0x7f84f3fa8960, client=0x55a82ca5d180) at remote/remote_daemon_dispatch_stubs.h:4434
#18 remoteDispatchDomainCreateHelper (server=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00, rerr=0x7f84f3fa8960, args=<optimized out>, ret=0x7f84e4003800) at remote/remote_daemon_dispatch_stubs.h:4410
#19 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53e00, client=0x55a82ca5d180, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
#20 virNetServerProgramDispatch (prog=0x55a82ca4be70, server=server@entry=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00) at rpc/virnetserverprogram.c:304
#21 0x00007f850c33ec3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55a82c9e16d0) at rpc/virnetserver.c:144
#22 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a82c9e16d0) at rpc/virnetserver.c:165
#23 0x00007f850c26db20 in virThreadPoolWorker (opaque=opaque@entry=0x55a82c9ebe30) at util/virthreadpool.c:167
#24 0x00007f850c26ce2c in virThreadHelper (data=<optimized out>) at util/virthread.c:206
#25 0x00007f8508872594 in start_thread () from target:/lib64/libpthread.so.0
#26 0x00007f85085a5e6f in clone () from target:/lib64/libc.so.6


>
> However, an actual bug is that while the security manager waits for the
> lock acquire the whole libvirtd hangs, but from what I understood and
> Marc explained to me, this is more of a pathological error in libvirt
> behaviour with various domain locks.

Since thread A owns the lock for the virDomainObjList, other operations
like 'virsh list' won’t return for about 60s.

Hope this information will help :)

--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Michal Privoznik 5 years, 6 months ago
On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>>
>>>>>
>>>>
>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>
>>>
>>> Nope. I wonder if something has locked the path and forgot to unlock it
>>> (however, virtlockd should have unlocked all the paths owned by PID on
>>> connection close), or something is still holding the lock and connection
>>> opened.
>>>
>>> Can you see the timeout even when you turn off the selinux driver
>>>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>>
>> So, I haven't been able to actually dig into the debugging but we have
>> been able to reproduce this behaviour on x86 (both with SELinux and
>> DAC). Looks like a general problem, if a problem at all, because from
>> what I can see in the code, the 60 seconds timeout is actually intended,
>> or not? The security manager does try for 60 seconds to acquire the lock
>> and only then bails out. Why is this?
> 
> Backtrace of libvirtd where it’s hanging (in thread A)
> 
> (gdb) bt
> #0 read () from /lib64/libpthread.so.0
> #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
> #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
> #3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
> #4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
> #5 virProcessRunInMountNamespace () at util/virprocess.c:1149
> #6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
> #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
> #8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
> #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
> #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
> #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
> #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816


This is just waiting for the forked child to finish. Nothing suspicious here.

> 
> …
> 
> #25 in start_thread () from /lib64/libpthread.so.0
> #26 in clone () from /lib64/libc.so.6
> 
> 
> Backtrace of the forked process where the process is trying to get the
> metadata lock for 60 seconds.
>
> #0  0x00007f8508572730 in nanosleep () from target:/lib64/libc.so.6
> #1  0x00007f850859dff4 in usleep () from target:/lib64/libc.so.6
> #2  0x00007f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
> #3  0x00007f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
> #4  0x00007f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
> #5  0x00007f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at util/virprocess.c:1100
> #6  virProcessRunInMountNamespace () at util/virprocess.c:1140
> #7  0x00007f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_dac.c:565
> #8  0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
> #9  0x00007f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
> #10 0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
> #11 0x00007f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
> #12 0x00007f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
> #13 0x00007f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
> #14 0x00007f84e057f5cd in qemuDomainObjStart (conn=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=0x7f848c1c3a50, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7296
> #15 0x00007f84e057fc19 in qemuDomainCreateWithFlags (dom=0x7f84e4001620, flags=0) at qemu/qemu_driver.c:7349
> #16 0x00007f850c426d57 in virDomainCreate (domain=domain@entry=0x7f84e4001620) at libvirt-domain.c:6534
> #17 0x000055a82b2f1cae in remoteDispatchDomainCreate (server=0x55a82c9e16d0, msg=0x55a82ca53e00, args=<optimized out>, rerr=0x7f84f3fa8960, client=0x55a82ca5d180) at remote/remote_daemon_dispatch_stubs.h:4434
> #18 remoteDispatchDomainCreateHelper (server=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00, rerr=0x7f84f3fa8960, args=<optimized out>, ret=0x7f84e4003800) at remote/remote_daemon_dispatch_stubs.h:4410
> #19 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53e00, client=0x55a82ca5d180, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
> #20 virNetServerProgramDispatch (prog=0x55a82ca4be70, server=server@entry=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00) at rpc/virnetserverprogram.c:304
> #21 0x00007f850c33ec3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55a82c9e16d0) at rpc/virnetserver.c:144
> #22 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a82c9e16d0) at rpc/virnetserver.c:165
> #23 0x00007f850c26db20 in virThreadPoolWorker (opaque=opaque@entry=0x55a82c9ebe30) at util/virthreadpool.c:167
> #24 0x00007f850c26ce2c in virThreadHelper (data=<optimized out>) at util/virthread.c:206
> #25 0x00007f8508872594 in start_thread () from target:/lib64/libpthread.so.0
> #26 0x00007f85085a5e6f in clone () from target:/lib64/libc.so.6

Right, so this is just waiting for virtlockd to lock the paths.
virtlockd is obviously unable to do that (as I suggested in my previous
e-mail - perhaps some other process is holding the lock?).

Can you please enable debug logs for virtlockd, reproduce and share the
log with me? Setting this in /etc/libvirt/virtlockd.conf should be enough:

log_level=1
log_filters="4:event 3:rpc"
log_outputs="1:file:/var/log/virtlockd.log"

Thanks,
Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Marc Hartmayer 5 years, 6 months ago
On Thu, Sep 27, 2018 at 10:14 AM +0200, Michal Privoznik <mprivozn@redhat.com> wrote:
> On 09/27/2018 09:57 AM, Marc Hartmayer wrote:
>> On Thu, Sep 27, 2018 at 09:01 AM +0200, Bjoern Walk <bwalk@linux.ibm.com> wrote:
>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-19, 11:45AM +0200]:
>>>> On 09/19/2018 11:17 AM, Bjoern Walk wrote:
>>>>> Bjoern Walk <bwalk@linux.ibm.com> [2018-09-12, 01:17PM +0200]:
>>>>>> Michal Privoznik <mprivozn@redhat.com> [2018-09-12, 11:32AM +0200]:
>>>>
>>>>>>
>>>>>
>>>>> Still seeing the same timeout. Is this expected behaviour?
>>>>>
>>>>
>>>> Nope. I wonder if something has locked the path and forgot to unlock it
>>>> (however, virtlockd should have unlocked all the paths owned by PID on
>>>> connection close), or something is still holding the lock and connection
>>>> opened.
>>>>
>>>> Can you see the timeout even when you turn off the selinux driver
>>>> (security_driver="none" in qemu.conf)? I tried to reproduce the issue
>>>> yesterday and was unsuccessful. Do you have any steps to reproduce?
>>>
>>> So, I haven't been able to actually dig into the debugging but we have
>>> been able to reproduce this behaviour on x86 (both with SELinux and
>>> DAC). Looks like a general problem, if a problem at all, because from
>>> what I can see in the code, the 60 seconds timeout is actually intended,
>>> or not? The security manager does try for 60 seconds to acquire the lock
>>> and only then bails out. Why is this?
>>
>> Backtrace of libvirtd where it’s hanging (in thread A)
>>
>> (gdb) bt
>> #0 read () from /lib64/libpthread.so.0
>> #1 read (__nbytes=1024, __buf=0x7f84e401afd0, __fd=31) at /usr/include/bits/unistd.h:44
>> #2 saferead (fd=fd@entry=31, buf=0x7f84e401afd0, count=count@entry=1024) at util/virfile.c:1061
>> #3 saferead_lim (fd=31, max_len=max_len@entry=1024, length=length@entry=0x7f84f3fa8240) at util/virfile.c:1345
>> #4 virFileReadHeaderFD (fd=<optimized out>, maxlen=maxlen@entry=1024, buf=buf@entry=0x7f84f3fa8278) at util/virfile.c:1376
>> #5 virProcessRunInMountNamespace () at util/virprocess.c:1149
>> #6 virSecuritySELinuxTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_selinux.c:1106
>> #7 virSecurityManagerTransactionCommit (mgr=0x7f848c0ffd60, pid=pid@entry=23977) at security/security_manager.c:324
>> #8 virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
>> #9 virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
>> #10 qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
>> #11 in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
>> #12 in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
>
>
> This is just waiting for forked child to finish. Nothing suspicious here.
>
>>
>> …
>>
>> #25 in start_thread () from /lib64/libpthread.so.0
>> #26 in clone () from /lib64/libc.so.6
>>
>>
>> Backtrace of the forked process where the process is trying to get the
>> metadata lock for 60 seconds.
>>
>> #0  0x00007f8508572730 in nanosleep () from target:/lib64/libc.so.6
>> #1  0x00007f850859dff4 in usleep () from target:/lib64/libc.so.6
>> #2  0x00007f850c26efea in virTimeBackOffWait (var=var@entry=0x7f84f3fa81a0) at util/virtime.c:453
>> #3  0x00007f850c3108d8 in virSecurityManagerMetadataLock (mgr=0x7f848c183850, paths=<optimized out>, npaths=<optimized out>) at security/security_manager.c:1345
>> #4  0x00007f850c30e44b in virSecurityDACTransactionRun (pid=pid@entry=23977, opaque=opaque@entry=0x7f84e4021450) at security/security_dac.c:226
>> #5  0x00007f850c250021 in virProcessNamespaceHelper (opaque=0x7f84e4021450, cb=0x7f850c30e330 <virSecurityDACTransactionRun>, pid=23977, errfd=33) at util/virprocess.c:1100
>> #6  virProcessRunInMountNamespace () at util/virprocess.c:1140
>> #7  0x00007f850c30e55c in virSecurityDACTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_dac.c:565
>> #8  0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183850, pid=pid@entry=23977) at security/security_manager.c:324
>> #9  0x00007f850c30b36b in virSecurityStackTransactionCommit (mgr=<optimized out>, pid=23977) at security/security_stack.c:166
>> #10 0x00007f850c30eeca in virSecurityManagerTransactionCommit (mgr=0x7f848c183760, pid=pid@entry=23977) at security/security_manager.c:324
>> #11 0x00007f84e0586bf2 in qemuSecuritySetAllLabel (driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, stdin_path=stdin_path@entry=0x0) at qemu/qemu_security.c:56
>> #12 0x00007f84e051b7fd in qemuProcessLaunch (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, incoming=incoming@entry=0x0, snapshot=snapshot@entry=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6329
>> #13 0x00007f84e051ee2e in qemuProcessStart (conn=conn@entry=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=vm@entry=0x7f848c1c3a50, updatedCPU=updatedCPU@entry=0x0, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_START, migrateFrom=migrateFrom@entry=0x0, migrateFd=-1, migratePath=0x0, snapshot=0x0, vmop=VIR_NETDEV_VPORT_PROFILE_OP_CREATE, flags=17) at qemu/qemu_process.c:6816
>> #14 0x00007f84e057f5cd in qemuDomainObjStart (conn=0x7f84c0003e40, driver=driver@entry=0x7f848c108c60, vm=0x7f848c1c3a50, flags=flags@entry=0, asyncJob=QEMU_ASYNC_JOB_START) at qemu/qemu_driver.c:7296
>> #15 0x00007f84e057fc19 in qemuDomainCreateWithFlags (dom=0x7f84e4001620, flags=0) at qemu/qemu_driver.c:7349
>> #16 0x00007f850c426d57 in virDomainCreate (domain=domain@entry=0x7f84e4001620) at libvirt-domain.c:6534
>> #17 0x000055a82b2f1cae in remoteDispatchDomainCreate (server=0x55a82c9e16d0, msg=0x55a82ca53e00, args=<optimized out>, rerr=0x7f84f3fa8960, client=0x55a82ca5d180) at remote/remote_daemon_dispatch_stubs.h:4434
>> #18 remoteDispatchDomainCreateHelper (server=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00, rerr=0x7f84f3fa8960, args=<optimized out>, ret=0x7f84e4003800) at remote/remote_daemon_dispatch_stubs.h:4410
>> #19 0x00007f850c338734 in virNetServerProgramDispatchCall (msg=0x55a82ca53e00, client=0x55a82ca5d180, server=0x55a82c9e16d0, prog=0x55a82ca4be70) at rpc/virnetserverprogram.c:437
>> #20 virNetServerProgramDispatch (prog=0x55a82ca4be70, server=server@entry=0x55a82c9e16d0, client=0x55a82ca5d180, msg=0x55a82ca53e00) at rpc/virnetserverprogram.c:304
>> #21 0x00007f850c33ec3c in virNetServerProcessMsg (msg=<optimized out>, prog=<optimized out>, client=<optimized out>, srv=0x55a82c9e16d0) at rpc/virnetserver.c:144
>> #22 virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x55a82c9e16d0) at rpc/virnetserver.c:165
>> #23 0x00007f850c26db20 in virThreadPoolWorker (opaque=opaque@entry=0x55a82c9ebe30) at util/virthreadpool.c:167
>> #24 0x00007f850c26ce2c in virThreadHelper (data=<optimized out>) at util/virthread.c:206
>> #25 0x00007f8508872594 in start_thread () from target:/lib64/libpthread.so.0
>> #26 0x00007f85085a5e6f in clone () from target:/lib64/libc.so.6
>
> Right, so this is just waiting for virtlockd to lock the paths.
> virtlockd is obviously unable to do that (as I suggested in my previous
> e-mail - is perhaps some other process holding the lock?).

The other domain is holding the lock (obviously).

>
> Can you please enable debug logs for virtlockd, reproduce and share the
> log with me? Setting this in /etc/libvirt/virtlockd.conf should be
> enough:

Sure, but I don’t think this is needed. We’re trying to start two
domains which have disks backed by the same image. So the current
behavior is probably intended (apart from the problem with the later API
calls). In my opinion, the timeout is just way too high.

>
> log_level=1
> log_filters="4:event 3:rpc"
> log_outputs="1:file:/var/log/virtlockd.log"
>
> Thanks,
> Michal
>
--
Kind regards / Beste Grüße
   Marc Hartmayer

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschäftsführung: Dirk Wittkopp
Sitz der Gesellschaft: Böblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294


--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH v4 00/23] Introduce metadata locking
Posted by Bjoern Walk 5 years, 6 months ago
Michal Privoznik <mprivozn@redhat.com> [2018-09-27, 10:14AM +0200]:
> Right, so this is just waiting for virtlockd to lock the paths.
> virtlockd is obviously unable to do that (as I suggested in my previous
> e-mail - is perhaps some other process holding the lock?).

It cannot lock the paths, because those paths are already locked by the
first domain. This is intentional. But it should not take 60 seconds to
resolve this, and it should not block the whole libvirt daemon in the process.

> Can you please enable debug logs for virtlockd, reproduce and share the
> log with me? Setting this in /etc/libvirt/virtlockd.conf should be enough:
> 
> log_level=1
> log_filters="4:event 3:rpc"
> log_outputs="1:file:/var/log/virtlockd.log"

See attached.
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list