[v2] Implement alternative metadata locking

[libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Posted by Michal Privoznik 6 years, 8 months ago

In the next commit the virSecurityManagerMetadataLock() is going
to be turned thread unsafe. Therefore, we have to spawn a
separate process for it. Always.

Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
---
 src/security/security_dac.c     | 12 ++++++------
 src/security/security_selinux.c | 12 ++++++------
 2 files changed, 12 insertions(+), 12 deletions(-)

diff --git a/src/security/security_dac.c b/src/security/security_dac.c
index da4a6c72fe..21db3b9684 100644
--- a/src/security/security_dac.c
+++ b/src/security/security_dac.c
@@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
         goto cleanup;
     }
 
-    if ((pid == -1 &&
-         virSecurityDACTransactionRun(pid, list) < 0) ||
-        (pid != -1 &&
-         virProcessRunInMountNamespace(pid,
-                                       virSecurityDACTransactionRun,
-                                       list) < 0))
+    if (pid == -1)
+        pid = getpid();
+
+    if (virProcessRunInMountNamespace(pid,
+                                      virSecurityDACTransactionRun,
+                                      list) < 0)
         goto cleanup;
 
     ret = 0;
diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c
index 467d1e6bfe..3c847d8dcb 100644
--- a/src/security/security_selinux.c
+++ b/src/security/security_selinux.c
@@ -1103,12 +1103,12 @@ virSecuritySELinuxTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
         goto cleanup;
     }
 
-    if ((pid == -1 &&
-         virSecuritySELinuxTransactionRun(pid, list) < 0) ||
-        (pid != -1 &&
-         virProcessRunInMountNamespace(pid,
-                                       virSecuritySELinuxTransactionRun,
-                                       list) < 0))
+    if (pid == -1)
+        pid = getpid();
+
+    if (virProcessRunInMountNamespace(pid,
+                                      virSecuritySELinuxTransactionRun,
+                                      list) < 0)
         goto cleanup;
 
     ret = 0;
-- 
2.18.0

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Posted by John Ferlan 6 years, 8 months ago


On 10/11/18 8:03 AM, Michal Privoznik wrote:
> In the next commit the virSecurityManagerMetadataLock() is going
> to be turned thread unsafe. Therefore, we have to spawn a
> separate process for it. Always.
> 
> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
> ---
>  src/security/security_dac.c     | 12 ++++++------
>  src/security/security_selinux.c | 12 ++++++------
>  2 files changed, 12 insertions(+), 12 deletions(-)
> 
> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
> index da4a6c72fe..21db3b9684 100644
> --- a/src/security/security_dac.c
> +++ b/src/security/security_dac.c
> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
^^^^

The virSecurityDACTransactionCommit description would need some
adjustment.... Well a lot of adjustment... and the caller
virSecurityManagerTransactionCommit would also require some adjustment.

>          goto cleanup;
>      }
>  
> -    if ((pid == -1 &&
> -         virSecurityDACTransactionRun(pid, list) < 0) ||
> -        (pid != -1 &&
> -         virProcessRunInMountNamespace(pid,
> -                                       virSecurityDACTransactionRun,
> -                                       list) < 0))
> +    if (pid == -1)
> +        pid = getpid();
> +
> +    if (virProcessRunInMountNamespace(pid,
> +                                      virSecurityDACTransactionRun,
> +                                      list) < 0)

As described in the v1 series cover letter, namespaces would need to be
enabled for this to work. Whether they are enabled true everywhere,
who's to say. I always assumed namespaces were used for a somewhat
narrow band of things and didn't pay that close attention to them.

I will note that will a couple of well placed VIR_WARN()'s in each of
these functions, I can see DAC and SELinux transactionCommit's getting
called. Before the guest actually starts they're called with -1, then
once it starts there's a single call with the domain PID. So I guess I'm
using NS and didn't know.

However, looking at the qemu.conf namespace setting I note things like
"privileged" and "!defined(HAVE_SYS_ACL_H) || !defined(WITH_SELINUX)"
So this all then assumes MountNamespace is being used. And what if it
isn't? Does everything that's security related start failing? and the
only way to back that out is revert this?

I guess, what I don't understand is why this usage pattern cannot be
limited to meta locking. That is, if we're metalocking, then add the
pid; otherwise, business as usual.  At least that limits the scope.

FWIW:
In my "limited" testing, I have a guest that failed to start with this
code running:

# virsh start f23
error: Failed to start domain f23
error: internal error: Child process (6767) unexpected fatal signal 11

...

in the code running in the debugger I get:

2018-10-11 18:38:28.491+0000: 5088: error : virProcessWait:274 :
internal error: Child process (6220) unexpected fatal signal 11
2018-10-11 18:38:28.732+0000: 6211: error : learnIPAddressThread:688 :
encountered an error on interface vnet0 index 11: Invalid argument

I narrowed the failure down to my iSCSI disks such as:

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw'/>
      <source protocol='iscsi'
name='iqn.2013-12.com.example:iscsi-chap-netpool/3'>
        <host name='192.168.122.1' port='3260'/>
        <auth username='redhat'>
          <secret type='iscsi' usage='libvirtiscsi'/>
        </auth>
      </source>
      <target dev='vdc' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x0a'
function='0x0'/>
    </disk>

which doesn't make sense given the error, but who really knows... I
don't have the patience to debug much further either. Even stranger, the
same guest with <hostdev>:

    <hostdev mode='subsystem' type='scsi' managed='no'>
      <source protocol='iscsi'
name='iqn.2013-12.com.example:iscsi-chap-netpool/2'>
        <host name='192.168.122.1' port='3260'/>
        <auth username='redhat'>
          <secret type='iscsi' usage='libvirtiscsi'/>
        </auth>
      </source>
      <address type='drive' controller='0' bus='0' target='0' unit='0'/>
    </hostdev>

would work.

My selinux is "Permissive".

John

>          goto cleanup;
>  
>      ret = 0;
> diff --git a/src/security/security_selinux.c b/src/security/security_selinux.c
> index 467d1e6bfe..3c847d8dcb 100644
> --- a/src/security/security_selinux.c
> +++ b/src/security/security_selinux.c
> @@ -1103,12 +1103,12 @@ virSecuritySELinuxTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
>          goto cleanup;
>      }
>  
> -    if ((pid == -1 &&
> -         virSecuritySELinuxTransactionRun(pid, list) < 0) ||
> -        (pid != -1 &&
> -         virProcessRunInMountNamespace(pid,
> -                                       virSecuritySELinuxTransactionRun,
> -                                       list) < 0))
> +    if (pid == -1)
> +        pid = getpid();
> +
> +    if (virProcessRunInMountNamespace(pid,
> +                                      virSecuritySELinuxTransactionRun,
> +                                      list) < 0)
>          goto cleanup;
>  
>      ret = 0;
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Posted by Michal Privoznik 6 years, 8 months ago

On 10/11/2018 11:06 PM, John Ferlan wrote:
> 
> 
> On 10/11/18 8:03 AM, Michal Privoznik wrote:
>> In the next commit the virSecurityManagerMetadataLock() is going
>> to be turned thread unsafe. Therefore, we have to spawn a
>> separate process for it. Always.
>>
>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>> ---
>>  src/security/security_dac.c     | 12 ++++++------
>>  src/security/security_selinux.c | 12 ++++++------
>>  2 files changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
>> index da4a6c72fe..21db3b9684 100644
>> --- a/src/security/security_dac.c
>> +++ b/src/security/security_dac.c
>> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
> ^^^^
> 
> The virSecurityDACTransactionCommit description would need some
> adjustment.... Well a lot of adjustment... and the caller
> virSecurityManagerTransactionCommit would also require some adjustment.
> 
>>          goto cleanup;
>>      }
>>  
>> -    if ((pid == -1 &&
>> -         virSecurityDACTransactionRun(pid, list) < 0) ||
>> -        (pid != -1 &&
>> -         virProcessRunInMountNamespace(pid,
>> -                                       virSecurityDACTransactionRun,
>> -                                       list) < 0))
>> +    if (pid == -1)
>> +        pid = getpid();
>> +
>> +    if (virProcessRunInMountNamespace(pid,
>> +                                      virSecurityDACTransactionRun,
>> +                                      list) < 0)
> 
> As described in the v1 series cover letter, namespaces would need to be
> enabled for this to work.

Not really. I mention namespaces just to show that on majority of setups
we do fork() when relabeling paths anyway. Some might be worried that
fork()-ing is very expensive and therefore might be against this
approach. I am trying to prove them otherwise. In fact, this patch is
just about that - fork() every time, even when namespaces are disabled.

Again, I have no numbers to support that it's the majority, but since
namespaces are turned on by default (and no new bugs regarding
namespaces were to be seen recently) I am assuming that is the case.

> Whether they are enabled true everywhere,
> who's to say. I always assumed namespaces were used for a somewhat
> narrow band of things and didn't pay that close attention to them.

No. Unless you take the extra step and disable them in qemu.conf you're
using them. And the fact, that you haven't noticed feels like pat on a
shoulder. Good job :-) You can verify this by running the following:

# nsenter -m -t $(pgrep qemu) ls -l /dev

(works if there's just one qemu process, or just replace pgrep with qemu
pid)

You will see very few items in /dev compared to how 'real' /dev looks:

# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
15
# ls -l /dev | wc -l
188

> 
> I will note that will a couple of well placed VIR_WARN()'s in each of
> these functions, I can see DAC and SELinux transactionCommit's getting
> called. Before the guest actually starts they're called with -1, then
> once it starts there's a single call with the domain PID. So I guess I'm
> using NS and didn't know.

Yes. However, during the cmd line construction process some files are
created and qemuSecurityDomainSetPathLabel() is called even if there is
no qemu running yet. Therefore, security driver sees pid == -1. For
instance, qemuDomainWriteMasterKeyFile() is such function. But this is
not a problem (from metadata locking POV).

> 
> However, looking at the qemu.conf namespace setting I note things like
> "privileged" and "!defined(HAVE_SYS_ACL_H) || !defined(WITH_SELINUX)"
> So this all then assumes MountNamespace is being used. And what if it
> isn't? Does everything that's security related start failing? and the
> only way to back that out is revert this?

No. Namespaces have nothing to do with this feature. Namespaces are
Linux feature and therefore when qemu driver initializes itself it has
to make decision on the defaults. And currently the default is to enable
namespaces if supported by host (= Linux && (at least one of {ACL,
SELinux} was enabled in the build) ). I'm not going to explain that
process now as it has nothing to do with this feature.

> 
> I guess, what I don't understand is why this usage pattern cannot be
> limited to meta locking. That is, if we're metalocking, then add the
> pid; otherwise, business as usual.  At least that limits the scope.

Namespaces and metadata locking are orthogonal features.

> 
> FWIW:
> In my "limited" testing, I have a guest that failed to start with this
> code running:
> 
> # virsh start f23
> error: Failed to start domain f23
> error: internal error: Child process (6767) unexpected fatal signal 11

Ouch. After some debugging I've found the problem. Previously, NULL
paths were not passed to virSecurityManagerMetadataLock() because they
were filtered out in DAC and SELinux drivers. But in v2 I've
de-duplicated the code. And the first thing that
virSecurityManagerMetadataLock() does is sort the paths (in order to
avoid AB/BA locking issue). For that, I've used qsort() with plain
strcmp() which can't handle NULL strings. The fix is quite easy:

diff --git i/src/security/security_manager.c
w/src/security/security_manager.c
index c054c74540..1f3e43b246 100644
--- i/src/security/security_manager.c
+++ w/src/security/security_manager.c
@@ -1257,8 +1257,17 @@ struct _virSecurityManagerMetadataLockState {
 static int
 cmpstringp(const void *p1, const void *p2)
 {
+    const char *s1 = *(char * const *) p1;
+    const char *s2 = *(char * const *) p2;
+
+    if (!s1 && !s2)
+        return 0;
+
+    if (!s1 || !s2)
+        return s2 ? -1 : 1;
+
     /* from man 3 qsort */
-    return strcmp(* (char * const *) p1, * (char * const *) p2);
+    return strcmp(s1, s2);
 }

 #define METADATA_OFFSET 1

I've squashed this into patch #2.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Posted by John Ferlan 6 years, 8 months ago

On 10/12/18 4:45 AM, Michal Privoznik wrote:
> On 10/11/2018 11:06 PM, John Ferlan wrote:
>>
>>
>> On 10/11/18 8:03 AM, Michal Privoznik wrote:
>>> In the next commit the virSecurityManagerMetadataLock() is going
>>> to be turned thread unsafe. Therefore, we have to spawn a
>>> separate process for it. Always.
>>>
>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>> ---
>>>  src/security/security_dac.c     | 12 ++++++------
>>>  src/security/security_selinux.c | 12 ++++++------
>>>  2 files changed, 12 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
>>> index da4a6c72fe..21db3b9684 100644
>>> --- a/src/security/security_dac.c
>>> +++ b/src/security/security_dac.c
>>> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
>> ^^^^
>>
>> The virSecurityDACTransactionCommit description would need some
>> adjustment.... Well a lot of adjustment... and the caller
>> virSecurityManagerTransactionCommit would also require some adjustment.
>>
>>>          goto cleanup;
>>>      }
>>>  
>>> -    if ((pid == -1 &&
>>> -         virSecurityDACTransactionRun(pid, list) < 0) ||
>>> -        (pid != -1 &&
>>> -         virProcessRunInMountNamespace(pid,
>>> -                                       virSecurityDACTransactionRun,
>>> -                                       list) < 0))
>>> +    if (pid == -1)
>>> +        pid = getpid();
>>> +
>>> +    if (virProcessRunInMountNamespace(pid,
>>> +                                      virSecurityDACTransactionRun,
>>> +                                      list) < 0)
>>
>> As described in the v1 series cover letter, namespaces would need to be
>> enabled for this to work.
> 
> Not really. I mention namespaces just to show that on majority of setups
> we do fork() when relabeling paths anyway. Some might be worried that
> fork()-ing is very expensive and therefore might be against this
> approach. I am trying to prove them otherwise. In fact, this patch is
> just about that - fork() every time, even when namespaces are disabled.

Considering the number of fork's out there anyway, that expense wasn't
part of my consideration. It's a means to an end.

If someone disabled Namespace's for whatever reason, then how would this
path work?

    virProcessRunInMountNamespace => virProcessNamespaceHelper (child)

I get that it's "just" a fork, but my point is more that the namespace
is Namespace and if that's not true, then isn't the premise of the patch
not true?

And no IDC about non Linux (where Namespace wouldn't be available) since
non Linux wouldn't have DAC and/or SELinux either.

Perhaps the light dawning over marblehead moment for me is that the
security driver will use Namespaces similar to how the QEMU driver uses
them, but instead in the context of whatever process/thread that isn't
necessarily the domain process (in this case the libvirtd process).

Since it's been proven that the code works for the QEMU domain
processing, ergo it should "just work" for the libvirtd security driver
processing. That's very different from the commit message above and
perhaps the "assumption" I've been thinking under. There isn't a
"relationship" between the qemu.conf usage of namespaces and what's
being done here except for the libvirtd process as opposed to the qemu-*
process will use fork's and Namespaces.

Something perhaps for both the commit message *and* at least the two
modules that change here to more clearly describe the @pid usage. The
virSecurityManagerTransactionCommit will need adjustment too in order to
indicate that the underlying code will run in the Namespace specified by
the caller rather than the Namespace of the caller (a really vague, but
important difference).

> 
> Again, I have no numbers to support that it's the majority, but since
> namespaces are turned on by default (and no new bugs regarding
> namespaces were to be seen recently) I am assuming that is the case.
> 
>> Whether they are enabled true everywhere,
>> who's to say. I always assumed namespaces were used for a somewhat
>> narrow band of things and didn't pay that close attention to them.
> 
> No. Unless you take the extra step and disable them in qemu.conf you're
> using them. 

Well disabling isn't really well described in the qemu.conf description,
but I'm fairly certain it's by uncommenting the entry and removing the
"mount" from the [ ], leaving just "namespaces = [ ]"

For some more data points using the previously failing guest that now
works with the alteration below.

Running with the "default":

# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
16
# ls -l /dev | wc -l
269
#

By changing to "namespaces = [ ]", I get:

# nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
269
# ls -l /dev | wc -l
269
#

Which I'm not sure I expected, but maybe that's right.

Next, if I run without the changes in this series and adding a couple of
debug messages to show when virProcessNamespaceHelper is called, when
namespaces = [ ] there are no calls to to the helper; whereas, there are
two using the default (I assume one each for DAC and SELinux).

With this series applied, when namespaces = [ ] there are 8 calls to to
the helper (all from the "parent" gdb pid); whereas, for the default the
last 2 come from the domain.

I assume that's all expected (at this point).

> And the fact, that you haven't noticed feels like pat on a
> shoulder. Good job :-) You can verify this by running the following:
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev
> 
> (works if there's just one qemu process, or just replace pgrep with qemu
> pid)
> 
> You will see very few items in /dev compared to how 'real' /dev looks:
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
> 15
> # ls -l /dev | wc -l
> 188
> 
>>
>> I will note that will a couple of well placed VIR_WARN()'s in each of
>> these functions, I can see DAC and SELinux transactionCommit's getting
>> called. Before the guest actually starts they're called with -1, then
>> once it starts there's a single call with the domain PID. So I guess I'm
>> using NS and didn't know.
> 
> Yes. However, during the cmd line construction process some files are
> created and qemuSecurityDomainSetPathLabel() is called even if there is
> no qemu running yet. Therefore, security driver sees pid == -1. For
> instance, qemuDomainWriteMasterKeyFile() is such function. But this is
> not a problem (from metadata locking POV).
> 
>>
>> However, looking at the qemu.conf namespace setting I note things like
>> "privileged" and "!defined(HAVE_SYS_ACL_H) || !defined(WITH_SELINUX)"
>> So this all then assumes MountNamespace is being used. And what if it
>> isn't? Does everything that's security related start failing? and the
>> only way to back that out is revert this?
> 
> No. Namespaces have nothing to do with this feature. Namespaces are
> Linux feature and therefore when qemu driver initializes itself it has
> to make decision on the defaults. And currently the default is to enable
> namespaces if supported by host (= Linux && (at least one of {ACL,
> SELinux} was enabled in the build) ). I'm not going to explain that
> process now as it has nothing to do with this feature.
> 
>>
>> I guess, what I don't understand is why this usage pattern cannot be
>> limited to meta locking. That is, if we're metalocking, then add the
>> pid; otherwise, business as usual.  At least that limits the scope.
> 
> Namespaces and metadata locking are orthogonal features.
> 

As I note above, I think I've had the light dawning moment now. I think
prior to this point I've been under the impression that the QEMU domain
setting for namespaces was somehow "affecting" or "related to" what's
being done/changed here. In reality it's making use or reusing the same
code that QEMU uses.

Still the one reservation is - what happens if setns isn't available.
For that I think you need to somehow decide that it isn't available and
just pass -1, unless you have some other grand idea.

Whether you build in some "disable" switch just in case is up to you.
You feel confident that things will just work and I can agree, but you
know there are pessimists that may think otherwise.

John

>>
>> FWIW:
>> In my "limited" testing, I have a guest that failed to start with this
>> code running:
>>
>> # virsh start f23
>> error: Failed to start domain f23
>> error: internal error: Child process (6767) unexpected fatal signal 11
> 
> Ouch. After some debugging I've found the problem. Previously, NULL
> paths were not passed to virSecurityManagerMetadataLock() because they
> were filtered out in DAC and SELinux drivers. But in v2 I've
> de-duplicated the code. And the first thing that
> virSecurityManagerMetadataLock() does is sort the paths (in order to
> avoid AB/BA locking issue). For that, I've used qsort() with plain
> strcmp() which can't handle NULL strings. The fix is quite easy:
> 
> diff --git i/src/security/security_manager.c
> w/src/security/security_manager.c
> index c054c74540..1f3e43b246 100644
> --- i/src/security/security_manager.c
> +++ w/src/security/security_manager.c
> @@ -1257,8 +1257,17 @@ struct _virSecurityManagerMetadataLockState {
>  static int
>  cmpstringp(const void *p1, const void *p2)
>  {
> +    const char *s1 = *(char * const *) p1;
> +    const char *s2 = *(char * const *) p2;
> +
> +    if (!s1 && !s2)
> +        return 0;
> +
> +    if (!s1 || !s2)
> +        return s2 ? -1 : 1;
> +
>      /* from man 3 qsort */
> -    return strcmp(* (char * const *) p1, * (char * const *) p2);
> +    return strcmp(s1, s2);
>  }
> 
>  #define METADATA_OFFSET 1
> 
> >
> I've squashed this into patch #2.
> 
> Michal
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH v2 01/11] security: Always spawn process for transactions

Posted by Michal Privoznik 6 years, 8 months ago

On 10/12/2018 04:20 PM, John Ferlan wrote:
> 
> 
> On 10/12/18 4:45 AM, Michal Privoznik wrote:
>> On 10/11/2018 11:06 PM, John Ferlan wrote:
>>>
>>>
>>> On 10/11/18 8:03 AM, Michal Privoznik wrote:
>>>> In the next commit the virSecurityManagerMetadataLock() is going
>>>> to be turned thread unsafe. Therefore, we have to spawn a
>>>> separate process for it. Always.
>>>>
>>>> Signed-off-by: Michal Privoznik <mprivozn@redhat.com>
>>>> ---
>>>>  src/security/security_dac.c     | 12 ++++++------
>>>>  src/security/security_selinux.c | 12 ++++++------
>>>>  2 files changed, 12 insertions(+), 12 deletions(-)
>>>>
>>>> diff --git a/src/security/security_dac.c b/src/security/security_dac.c
>>>> index da4a6c72fe..21db3b9684 100644
>>>> --- a/src/security/security_dac.c
>>>> +++ b/src/security/security_dac.c
>>>> @@ -562,12 +562,12 @@ virSecurityDACTransactionCommit(virSecurityManagerPtr mgr ATTRIBUTE_UNUSED,
>>> ^^^^
>>>
>>> The virSecurityDACTransactionCommit description would need some
>>> adjustment.... Well a lot of adjustment... and the caller
>>> virSecurityManagerTransactionCommit would also require some adjustment.
>>>
>>>>          goto cleanup;
>>>>      }
>>>>  
>>>> -    if ((pid == -1 &&
>>>> -         virSecurityDACTransactionRun(pid, list) < 0) ||
>>>> -        (pid != -1 &&
>>>> -         virProcessRunInMountNamespace(pid,
>>>> -                                       virSecurityDACTransactionRun,
>>>> -                                       list) < 0))
>>>> +    if (pid == -1)
>>>> +        pid = getpid();
>>>> +
>>>> +    if (virProcessRunInMountNamespace(pid,
>>>> +                                      virSecurityDACTransactionRun,
>>>> +                                      list) < 0)
>>>
>>> As described in the v1 series cover letter, namespaces would need to be
>>> enabled for this to work.
>>
>> Not really. I mention namespaces just to show that on majority of setups
>> we do fork() when relabeling paths anyway. Some might be worried that
>> fork()-ing is very expensive and therefore might be against this
>> approach. I am trying to prove them otherwise. In fact, this patch is
>> just about that - fork() every time, even when namespaces are disabled.
> 
> Considering the number of fork's out there anyway, that expense wasn't
> part of my consideration. It's a means to an end.
> 
> If someone disabled Namespace's for whatever reason, then how would this
> path work?
> 
>     virProcessRunInMountNamespace => virProcessNamespaceHelper (child)
> 
> I get that it's "just" a fork, but my point is more that the namespace
> is Namespace and if that's not true, then isn't the premise of the patch
> not true?

Every Linux process runs in a namespace (unless kernel was compiled
without NS support). That is, even libvirtd runs in a namespace:

# ls -l /proc/$(pgrep libvirtd)/ns
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 cgroup -> 'cgroup:[4026531835]'
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 ipc -> 'ipc:[4026531839]'
lrwxrwxrwx. 1 root root 0 Oct 16 09:06 mnt -> 'mnt:[4026531840]'
...


> 
> And no IDC about non Linux (where Namespace wouldn't be available) since
> non Linux wouldn't have DAC and/or SELinux either.

Actually, we have to care about non Linux, e.g. BSD does have DAC. Or
even Linux where namespaces were disabled (unlikely but not impossible).

> 
> Perhaps the light dawning over marblehead moment for me is that the
> security driver will use Namespaces similar to how the QEMU driver uses
> them, but instead in the context of whatever process/thread that isn't
> necessarily the domain process (in this case the libvirtd process).
> 
> Since it's been proven that the code works for the QEMU domain
> processing, ergo it should "just work" for the libvirtd security driver
> processing. That's very different from the commit message above and
> perhaps the "assumption" I've been thinking under. There isn't a
> "relationship" between the qemu.conf usage of namespaces and what's
> being done here except for the libvirtd process as opposed to the qemu-*
> process will use fork's and Namespaces.
> 
> Something perhaps for both the commit message *and* at least the two
> modules that change here to more clearly describe the @pid usage. The
> virSecurityManagerTransactionCommit will need adjustment too in order to
> indicate that the underlying code will run in the Namespace specified by
> the caller rather than the Namespace of the caller (a really vague, but
> important difference).
> 
>>
>> Again, I have no numbers to support that it's the majority, but since
>> namespaces are turned on by default (and no new bugs regarding
>> namespaces were to be seen recently) I am assuming that is the case.
>>
>>> Whether they are enabled true everywhere,
>>> who's to say. I always assumed namespaces were used for a somewhat
>>> narrow band of things and didn't pay that close attention to them.
>>
>> No. Unless you take the extra step and disable them in qemu.conf you're
>> using them. 
> 
> Well disabling isn't really well described in the qemu.conf description,
> but I'm fairly certain it's by uncommenting the entry and removing the
> "mount" from the [ ], leaving just "namespaces = [ ]"
> 
> For some more data points using the previously failing guest that now
> works with the alteration below.
> 
> Running with the "default":
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
> 16
> # ls -l /dev | wc -l
> 269
> #
> 
> By changing to "namespaces = [ ]", I get:
> 
> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
> 269
> # ls -l /dev | wc -l
> 269
> #
> 
> Which I'm not sure I expected, but maybe that's right.

It is, because this means that for namespaces=[] case there is exactly
the same number of items in your host /dev and what qemu sees. Compared
to previous case when namespaces=["mnt"] (the default) host /dev
contained more entries and than what qemu saw.

> 
> Next, if I run without the changes in this series and adding a couple of
> debug messages to show when virProcessNamespaceHelper is called, when
> namespaces = [ ] there are no calls to to the helper; whereas, there are
> two using the default (I assume one each for DAC and SELinux).
> 
> With this series applied, when namespaces = [ ] there are 8 calls to to
> the helper (all from the "parent" gdb pid); whereas, for the default the
> last 2 come from the domain.
> 
> I assume that's all expected (at this point).
> 
>> And the fact, that you haven't noticed feels like pat on a
>> shoulder. Good job :-) You can verify this by running the following:
>>
>> # nsenter -m -t $(pgrep qemu) ls -l /dev
>>
>> (works if there's just one qemu process, or just replace pgrep with qemu
>> pid)
>>
>> You will see very few items in /dev compared to how 'real' /dev looks:
>>
>> # nsenter -m -t $(pgrep qemu) ls -l /dev | wc -l
>> 15
>> # ls -l /dev | wc -l
>> 188
>>
>>>
>>> I will note that will a couple of well placed VIR_WARN()'s in each of
>>> these functions, I can see DAC and SELinux transactionCommit's getting
>>> called. Before the guest actually starts they're called with -1, then
>>> once it starts there's a single call with the domain PID. So I guess I'm
>>> using NS and didn't know.
>>
>> Yes. However, during the cmd line construction process some files are
>> created and qemuSecurityDomainSetPathLabel() is called even if there is
>> no qemu running yet. Therefore, security driver sees pid == -1. For
>> instance, qemuDomainWriteMasterKeyFile() is such function. But this is
>> not a problem (from metadata locking POV).
>>
>>>
>>> However, looking at the qemu.conf namespace setting I note things like
>>> "privileged" and "!defined(HAVE_SYS_ACL_H) || !defined(WITH_SELINUX)"
>>> So this all then assumes MountNamespace is being used. And what if it
>>> isn't? Does everything that's security related start failing? and the
>>> only way to back that out is revert this?
>>
>> No. Namespaces have nothing to do with this feature. Namespaces are
>> Linux feature and therefore when qemu driver initializes itself it has
>> to make decision on the defaults. And currently the default is to enable
>> namespaces if supported by host (= Linux && (at least one of {ACL,
>> SELinux} was enabled in the build) ). I'm not going to explain that
>> process now as it has nothing to do with this feature.
>>
>>>
>>> I guess, what I don't understand is why this usage pattern cannot be
>>> limited to meta locking. That is, if we're metalocking, then add the
>>> pid; otherwise, business as usual.  At least that limits the scope.
>>
>> Namespaces and metadata locking are orthogonal features.
>>
> 
> As I note above, I think I've had the light dawning moment now. I think
> prior to this point I've been under the impression that the QEMU domain
> setting for namespaces was somehow "affecting" or "related to" what's
> being done/changed here. In reality it's making use or reusing the same
> code that QEMU uses.
> 
> Still the one reservation is - what happens if setns isn't available.
> For that I think you need to somehow decide that it isn't available and
> just pass -1, unless you have some other grand idea.

Yeah, I need to have a patch that introduces "virProcessRunInFork" which
will just fork() and run the callback in the child. Something like
virProcessRunInMountNamespace but this will be called when namespaces
are not available.

Will send v3 in a while.

Michal

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list