[libvirt] [PATCH 1/2] storage: Resolve storage driver crash

John Ferlan posted 2 patches 7 years, 6 months ago
[libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by John Ferlan 7 years, 6 months ago
Resolve a storage driver crash as a result of a long running
storageVolCreateXML when the virStorageVolPoolRefreshThread is
run as a result of a storageVolUpload complete and ran the
virStoragePoolObjClearVols without checking if the creation
code was currently processing a buildVol after incrementing
the driver->asyncjob count.

The refreshThread needs to wait until all creation threads
are completed so as to not alter the volume list while the
volume is being built.

Crash from valgrind is as follows (with a bit of editing):

==21309== Invalid read of size 8
==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
==21309==    by 0x153DE29E: storageVolCreateXML
==21309==    by 0x562035B: virStorageVolCreateXML
==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
...
==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
==21309==    at 0x4C2F2BB: free
==21309==    by 0x54CB9FA: virFree
==21309==    by 0x55BC800: virStorageVolDefFree
==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
...
==21309==  Block was alloc'd at
==21309==    at 0x4C300A5: calloc
==21309==    by 0x54CB483: virAlloc
==21309==    by 0x55BDC1F: virStorageVolDefParseXML
==21309==    by 0x55BDC1F: virStorageVolDefParseNode
==21309==    by 0x55BE5A4: virStorageVolDefParse
==21309==    by 0x153DDFF1: storageVolCreateXML
==21309==    by 0x562035B: virStorageVolCreateXML
==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
...

Signed-off-by: John Ferlan <jferlan@redhat.com>
---
 src/storage/storage_driver.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
index b0edf9f885..5e920fc14c 100644
--- a/src/storage/storage_driver.c
+++ b/src/storage/storage_driver.c
@@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque)
     virStorageBackendPtr backend;
     virObjectEventPtr event = NULL;
 
+ retry:
     storageDriverLock();
     if (cbdata->vol_path) {
         if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0)
@@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque)
     if (!(backend = virStorageBackendForType(def->type)))
         goto cleanup;
 
+    /* Some thread is creating a new volume in the pool, we need to retry */
+    if (virStoragePoolObjGetAsyncjobs(obj) > 0) {
+        virStoragePoolObjUnlock(obj);
+        storageDriverUnlock();
+        usleep(100 * 1000);
+        goto retry;
+    }
+
     virStoragePoolObjClearVols(obj);
     if (backend->refreshPool(NULL, obj) < 0)
         VIR_DEBUG("Failed to refresh storage pool");
-- 
2.13.6

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by Cedric Bosdonnat 7 years, 6 months ago
On Mon, 2017-11-06 at 15:53 -0500, John Ferlan wrote:
> Resolve a storage driver crash as a result of a long running
> storageVolCreateXML when the virStorageVolPoolRefreshThread is
> run as a result of a storageVolUpload complete and ran the
> virStoragePoolObjClearVols without checking if the creation
> code was currently processing a buildVol after incrementing
> the driver->asyncjob count.
> 
> The refreshThread needs to wait until all creation threads
> are completed so as to not alter the volume list while the
> volume is being built.
> 
> Crash from valgrind is as follows (with a bit of editing):
> 
> ==21309== Invalid read of size 8
> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
> ==21309==    by 0x153DE29E: storageVolCreateXML
> ==21309==    by 0x562035B: virStorageVolCreateXML
> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
> ...
> ==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
> ==21309==    at 0x4C2F2BB: free
> ==21309==    by 0x54CB9FA: virFree
> ==21309==    by 0x55BC800: virStorageVolDefFree
> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
> ...
> ==21309==  Block was alloc'd at
> ==21309==    at 0x4C300A5: calloc
> ==21309==    by 0x54CB483: virAlloc
> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
> ==21309==    by 0x55BE5A4: virStorageVolDefParse
> ==21309==    by 0x153DDFF1: storageVolCreateXML
> ==21309==    by 0x562035B: virStorageVolCreateXML
> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
> ...
> 
> Signed-off-by: John Ferlan <jferlan@redhat.com>
> ---
>  src/storage/storage_driver.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
> index b0edf9f885..5e920fc14c 100644
> --- a/src/storage/storage_driver.c
> +++ b/src/storage/storage_driver.c
> @@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque)
>      virStorageBackendPtr backend;
>      virObjectEventPtr event = NULL;
>  
> + retry:
>      storageDriverLock();
>      if (cbdata->vol_path) {
>          if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0)
> @@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque)
>      if (!(backend = virStorageBackendForType(def->type)))
>          goto cleanup;
>  
> +    /* Some thread is creating a new volume in the pool, we need to retry */
> +    if (virStoragePoolObjGetAsyncjobs(obj) > 0) {
> +        virStoragePoolObjUnlock(obj);
> +        storageDriverUnlock();
> +        usleep(100 * 1000);
> +        goto retry;
> +    }
> +
>      virStoragePoolObjClearVols(obj);
>      if (backend->refreshPool(NULL, obj) < 0)
>          VIR_DEBUG("Failed to refresh storage pool");

ACK, does the job here. I'm rather surprised to see you implementing it
with sleep, while you pointed me towards virCondWait yesterday. But using
sleep is a way less intrusive change.

--
Cedric

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by John Ferlan 7 years, 6 months ago

On 11/07/2017 04:18 AM, Cedric Bosdonnat wrote:
> On Mon, 2017-11-06 at 15:53 -0500, John Ferlan wrote:
>> Resolve a storage driver crash as a result of a long running
>> storageVolCreateXML when the virStorageVolPoolRefreshThread is
>> run as a result of a storageVolUpload complete and ran the
>> virStoragePoolObjClearVols without checking if the creation
>> code was currently processing a buildVol after incrementing
>> the driver->asyncjob count.
>>
>> The refreshThread needs to wait until all creation threads
>> are completed so as to not alter the volume list while the
>> volume is being built.
>>
>> Crash from valgrind is as follows (with a bit of editing):
>>
>> ==21309== Invalid read of size 8
>> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>> ==21309==    by 0x153DE29E: storageVolCreateXML
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>> ==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
>> ==21309==    at 0x4C2F2BB: free
>> ==21309==    by 0x54CB9FA: virFree
>> ==21309==    by 0x55BC800: virStorageVolDefFree
>> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>> ...
>> ==21309==  Block was alloc'd at
>> ==21309==    at 0x4C300A5: calloc
>> ==21309==    by 0x54CB483: virAlloc
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>> ==21309==    by 0x55BE5A4: virStorageVolDefParse
>> ==21309==    by 0x153DDFF1: storageVolCreateXML
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>>
>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>> ---
>>  src/storage/storage_driver.c | 9 +++++++++
>>  1 file changed, 9 insertions(+)
>>
>> diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
>> index b0edf9f885..5e920fc14c 100644
>> --- a/src/storage/storage_driver.c
>> +++ b/src/storage/storage_driver.c
>> @@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque)
>>      virStorageBackendPtr backend;
>>      virObjectEventPtr event = NULL;
>>  
>> + retry:
>>      storageDriverLock();
>>      if (cbdata->vol_path) {
>>          if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0)
>> @@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque)
>>      if (!(backend = virStorageBackendForType(def->type)))
>>          goto cleanup;
>>  
>> +    /* Some thread is creating a new volume in the pool, we need to retry */
>> +    if (virStoragePoolObjGetAsyncjobs(obj) > 0) {
>> +        virStoragePoolObjUnlock(obj);
>> +        storageDriverUnlock();
>> +        usleep(100 * 1000);
>> +        goto retry;
>> +    }
>> +
>>      virStoragePoolObjClearVols(obj);
>>      if (backend->refreshPool(NULL, obj) < 0)
>>          VIR_DEBUG("Failed to refresh storage pool");
> 
> ACK, does the job here. I'm rather surprised to see you implementing it
> with sleep, while you pointed me towards virCondWait yesterday. But using
> sleep is a way less intrusive change.
> 

Well you only asked about an alternative mechanism and condition
variable was the first thing that popped into my mind; however, as I got
to actually doing more thinking about it - asyncjobs is not blocking
multiple creates from occurring; whereas, a condition variable would be
waiting for one thing to complete.

My response to Jan also lists 2 other alternatives. This was just the
"easiest" of the 3.  If there's other ideas, I'm open to suggestions.

John

> --
> Cedric
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by Cedric Bosdonnat 7 years, 6 months ago
On Tue, 2017-11-07 at 08:43 -0500, John Ferlan wrote:
> 
> On 11/07/2017 04:18 AM, Cedric Bosdonnat wrote:
> > On Mon, 2017-11-06 at 15:53 -0500, John Ferlan wrote:
> > > Resolve a storage driver crash as a result of a long running
> > > storageVolCreateXML when the virStorageVolPoolRefreshThread is
> > > run as a result of a storageVolUpload complete and ran the
> > > virStoragePoolObjClearVols without checking if the creation
> > > code was currently processing a buildVol after incrementing
> > > the driver->asyncjob count.
> > > 
> > > The refreshThread needs to wait until all creation threads
> > > are completed so as to not alter the volume list while the
> > > volume is being built.
> > > 
> > > Crash from valgrind is as follows (with a bit of editing):
> > > 
> > > ==21309== Invalid read of size 8
> > > ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
> > > ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
> > > ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
> > > ==21309==    by 0x153DE29E: storageVolCreateXML
> > > ==21309==    by 0x562035B: virStorageVolCreateXML
> > > ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
> > > ...
> > > ==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
> > > ==21309==    at 0x4C2F2BB: free
> > > ==21309==    by 0x54CB9FA: virFree
> > > ==21309==    by 0x55BC800: virStorageVolDefFree
> > > ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
> > > ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
> > > ...
> > > ==21309==  Block was alloc'd at
> > > ==21309==    at 0x4C300A5: calloc
> > > ==21309==    by 0x54CB483: virAlloc
> > > ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
> > > ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
> > > ==21309==    by 0x55BE5A4: virStorageVolDefParse
> > > ==21309==    by 0x153DDFF1: storageVolCreateXML
> > > ==21309==    by 0x562035B: virStorageVolCreateXML
> > > ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
> > > ...
> > > 
> > > Signed-off-by: John Ferlan <jferlan@redhat.com>
> > > ---
> > >  src/storage/storage_driver.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/src/storage/storage_driver.c b/src/storage/storage_driver.c
> > > index b0edf9f885..5e920fc14c 100644
> > > --- a/src/storage/storage_driver.c
> > > +++ b/src/storage/storage_driver.c
> > > @@ -2257,6 +2257,7 @@ virStorageVolPoolRefreshThread(void *opaque)
> > >      virStorageBackendPtr backend;
> > >      virObjectEventPtr event = NULL;
> > >  
> > > + retry:
> > >      storageDriverLock();
> > >      if (cbdata->vol_path) {
> > >          if (virStorageBackendPloopRestoreDesc(cbdata->vol_path) < 0)
> > > @@ -2270,6 +2271,14 @@ virStorageVolPoolRefreshThread(void *opaque)
> > >      if (!(backend = virStorageBackendForType(def->type)))
> > >          goto cleanup;
> > >  
> > > +    /* Some thread is creating a new volume in the pool, we need to retry */
> > > +    if (virStoragePoolObjGetAsyncjobs(obj) > 0) {
> > > +        virStoragePoolObjUnlock(obj);
> > > +        storageDriverUnlock();
> > > +        usleep(100 * 1000);
> > > +        goto retry;
> > > +    }
> > > +
> > >      virStoragePoolObjClearVols(obj);
> > >      if (backend->refreshPool(NULL, obj) < 0)
> > >          VIR_DEBUG("Failed to refresh storage pool");
> > 
> > ACK, does the job here. I'm rather surprised to see you implementing it
> > with sleep, while you pointed me towards virCondWait yesterday. But using
> > sleep is a way less intrusive change.
> > 
> 
> Well you only asked about an alternative mechanism and condition
> variable was the first thing that popped into my mind; however, as I got
> to actually doing more thinking about it - asyncjobs is not blocking
> multiple creates from occurring; whereas, a condition variable would be
> waiting for one thing to complete.
> 
> My response to Jan also lists 2 other alternatives. This was just the
> "easiest" of the 3.  If there's other ideas, I'm open to suggestions.
> 
It looks fine to me, I was just surprised to see the sleep version ;)

--
Cedric

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by Ján Tomko 7 years, 6 months ago
On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>Resolve a storage driver crash as a result of a long running
>storageVolCreateXML when the virStorageVolPoolRefreshThread is
>run as a result of a storageVolUpload complete and ran the
>virStoragePoolObjClearVols without checking if the creation
>code was currently processing a buildVol after incrementing
>the driver->asyncjob count.

I thought the whole point of refreshing the pool in a thread
was to have the storage driver usable for other things.
This effectively disables pool refresh during long-running
jobs.

>
>The refreshThread needs to wait until all creation threads
>are completed so as to not alter the volume list while the
>volume is being built.
>
>Crash from valgrind is as follows (with a bit of editing):
>
>==21309== Invalid read of size 8
>==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>==21309==    by 0x153DE29E: storageVolCreateXML

This is a fault of storageVolCreateXML for using invalid objects
after dropping the locks.

Jan

>==21309==    by 0x562035B: virStorageVolCreateXML
>==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>...
>==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336 free'd
>==21309==    at 0x4C2F2BB: free
>==21309==    by 0x54CB9FA: virFree
>==21309==    by 0x55BC800: virStorageVolDefFree
>==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>...
>==21309==  Block was alloc'd at
>==21309==    at 0x4C300A5: calloc
>==21309==    by 0x54CB483: virAlloc
>==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>==21309==    by 0x55BE5A4: virStorageVolDefParse
>==21309==    by 0x153DDFF1: storageVolCreateXML
>==21309==    by 0x562035B: virStorageVolCreateXML
>==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>...
>
>Signed-off-by: John Ferlan <jferlan@redhat.com>
>---
> src/storage/storage_driver.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by John Ferlan 7 years, 6 months ago

On 11/07/2017 04:36 AM, Ján Tomko wrote:
> On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>> Resolve a storage driver crash as a result of a long running
>> storageVolCreateXML when the virStorageVolPoolRefreshThread is
>> run as a result of a storageVolUpload complete and ran the
>> virStoragePoolObjClearVols without checking if the creation
>> code was currently processing a buildVol after incrementing
>> the driver->asyncjob count.
> 
> I thought the whole point of refreshing the pool in a thread
> was to have the storage driver usable for other things.
> This effectively disables pool refresh during long-running
> jobs.
> 

This would "disable" the refresh during a long running create/build job,
but that's already being done as other than another create, not much can
be done while a create/build is "in progress". The pool undefine,
destroy, delete, and refresh calls are all blocked (they fail) until the
create completes.

The reason for virStorageVolPoolRefreshThread is that updateVol can be
long running and once it's done (the stream is closed), it was desired
that the pool sizing values be updated. That meant either refresh the
entire pool (the easy way - theoretically) or update just the volume and
adjust the pool sizing values with the delta from the upload (similar to
how create/build alters the values).

Unfortunately the latter is not possible (yet) because only a subset of
pool types that support uploadVol also support refreshVol. But this
really is only a "problem" for the intersection of those that support
the buildVol too. That is the only reason we'd be blocking is if a pool
type supports both buildVol/buildVolFrom and uploadVol - so the fs,
logical, and vstorage pools.

The virStoragePoolFCRefreshThread used similar type logic w/r/t
refreshing the pool after a createVport completed. Here though, we're
waiting for udev to work it's magic in creating the vHBA LUN. The
asyncjob won't be > 0 for scsi pools yet, so perhaps patch 2 isn't
necessary, but it's preventative.

In any case, at this point in time I would think it's more desirable to
block during the build/upload condition as opposed to crash.

An "alternative" is to refresh the pool after a create vol rather than
just altering the pool available/allocation values from the created
volume target allocation. That way the VolPoolRefreshThread could just
"exit" knowing that volCreate will do the update. However, this refresh
could only happen when there are no asyncjobs outstanding. In the mean
time the pool sizing values probably would still need to be updated if
asyncjobs > 0 to avoid multiple create jobs from "exhausting" the pool
because the sizing values weren't properly updated.

>>
>> The refreshThread needs to wait until all creation threads
>> are completed so as to not alter the volume list while the
>> volume is being built.
>>
>> Crash from valgrind is as follows (with a bit of editing):
>>
>> ==21309== Invalid read of size 8
>> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>> ==21309==    by 0x153DE29E: storageVolCreateXML
> 
> This is a fault of storageVolCreateXML for using invalid objects
> after dropping the locks.
> 

Partially, but the code also goes through the trouble of incrementing
the pool->asyncjobs count and setting the voldef->building flag for what
appears to be the reasoning that nothing else should modify the pool
volume list while we create/build a new volume, but the refresh thread
just ignores that. The asyncjobs was added in much simpler times in
commit id '2cd9b2d8'.

And this brings up a possible 3rd alternative. Perhaps the more real
problem is that virStoragePoolObjClearVols and refreshPool processing is
overly invasive with respect to free-ing the entire volume list and
rebuilding it rather than perhaps doing some sort of more intelligent
processing to "add" new volumes found in a pool that are not in the
list, "remove" ones no longer in the pool that are in the list, and
"update" those that are still in the list. That to me is outside the
scope of this change, but a noble goal at some point in time.

John

> Jan
> 
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>> ==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336
>> free'd
>> ==21309==    at 0x4C2F2BB: free
>> ==21309==    by 0x54CB9FA: virFree
>> ==21309==    by 0x55BC800: virStorageVolDefFree
>> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>> ...
>> ==21309==  Block was alloc'd at
>> ==21309==    at 0x4C300A5: calloc
>> ==21309==    by 0x54CB483: virAlloc
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>> ==21309==    by 0x55BE5A4: virStorageVolDefParse
>> ==21309==    by 0x153DDFF1: storageVolCreateXML
>> ==21309==    by 0x562035B: virStorageVolCreateXML
>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>> ...
>>
>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>> ---
>> src/storage/storage_driver.c | 9 +++++++++
>> 1 file changed, 9 insertions(+)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by John Ferlan 7 years, 6 months ago

On 11/07/2017 08:38 AM, John Ferlan wrote:
> 
> 
> On 11/07/2017 04:36 AM, Ján Tomko wrote:
>> On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>>> Resolve a storage driver crash as a result of a long running
>>> storageVolCreateXML when the virStorageVolPoolRefreshThread is
>>> run as a result of a storageVolUpload complete and ran the
>>> virStoragePoolObjClearVols without checking if the creation
>>> code was currently processing a buildVol after incrementing
>>> the driver->asyncjob count.
>>
>> I thought the whole point of refreshing the pool in a thread
>> was to have the storage driver usable for other things.
>> This effectively disables pool refresh during long-running
>> jobs.
>>

Jan -

So I'm curious to know whether your comments amount to a NAK of the
design of the patches of if the explanation provided here was sufficient?

Tks -

John

> 
> This would "disable" the refresh during a long running create/build job,
> but that's already being done as other than another create, not much can
> be done while a create/build is "in progress". The pool undefine,
> destroy, delete, and refresh calls are all blocked (they fail) until the
> create completes.
> 
> The reason for virStorageVolPoolRefreshThread is that updateVol can be
> long running and once it's done (the stream is closed), it was desired
> that the pool sizing values be updated. That meant either refresh the
> entire pool (the easy way - theoretically) or update just the volume and
> adjust the pool sizing values with the delta from the upload (similar to
> how create/build alters the values).
> 
> Unfortunately the latter is not possible (yet) because only a subset of
> pool types that support uploadVol also support refreshVol. But this
> really is only a "problem" for the intersection of those that support
> the buildVol too. That is the only reason we'd be blocking is if a pool
> type supports both buildVol/buildVolFrom and uploadVol - so the fs,
> logical, and vstorage pools.
> 
> The virStoragePoolFCRefreshThread used similar type logic w/r/t
> refreshing the pool after a createVport completed. Here though, we're
> waiting for udev to work it's magic in creating the vHBA LUN. The
> asyncjob won't be > 0 for scsi pools yet, so perhaps patch 2 isn't
> necessary, but it's preventative.
> 
> In any case, at this point in time I would think it's more desirable to
> block during the build/upload condition as opposed to crash.
> 
> An "alternative" is to refresh the pool after a create vol rather than
> just altering the pool available/allocation values from the created
> volume target allocation. That way the VolPoolRefreshThread could just
> "exit" knowing that volCreate will do the update. However, this refresh
> could only happen when there are no asyncjobs outstanding. In the mean
> time the pool sizing values probably would still need to be updated if
> asyncjobs > 0 to avoid multiple create jobs from "exhausting" the pool
> because the sizing values weren't properly updated.
> 
>>>
>>> The refreshThread needs to wait until all creation threads
>>> are completed so as to not alter the volume list while the
>>> volume is being built.
>>>
>>> Crash from valgrind is as follows (with a bit of editing):
>>>
>>> ==21309== Invalid read of size 8
>>> ==21309==    at 0x153E47AF: storageBackendUpdateVolTargetInfo
>>> ==21309==    by 0x153E4C30: virStorageBackendUpdateVolInfo
>>> ==21309==    by 0x153E52DE: virStorageBackendVolRefreshLocal
>>> ==21309==    by 0x153DE29E: storageVolCreateXML
>>
>> This is a fault of storageVolCreateXML for using invalid objects
>> after dropping the locks.
>>
> 
> Partially, but the code also goes through the trouble of incrementing
> the pool->asyncjobs count and setting the voldef->building flag for what
> appears to be the reasoning that nothing else should modify the pool
> volume list while we create/build a new volume, but the refresh thread
> just ignores that. The asyncjobs was added in much simpler times in
> commit id '2cd9b2d8'.
> 
> And this brings up a possible 3rd alternative. Perhaps the more real
> problem is that virStoragePoolObjClearVols and refreshPool processing is
> overly invasive with respect to free-ing the entire volume list and
> rebuilding it rather than perhaps doing some sort of more intelligent
> processing to "add" new volumes found in a pool that are not in the
> list, "remove" ones no longer in the pool that are in the list, and
> "update" those that are still in the list. That to me is outside the
> scope of this change, but a noble goal at some point in time.
> 
> John
> 
>> Jan
>>
>>> ==21309==    by 0x562035B: virStorageVolCreateXML
>>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>>> ...
>>> ==21309==  Address 0x2590a720 is 64 bytes inside a block of size 336
>>> free'd
>>> ==21309==    at 0x4C2F2BB: free
>>> ==21309==    by 0x54CB9FA: virFree
>>> ==21309==    by 0x55BC800: virStorageVolDefFree
>>> ==21309==    by 0x55BF1D8: virStoragePoolObjClearVols
>>> ==21309==    by 0x153D967E: virStorageVolPoolRefreshThread
>>> ...
>>> ==21309==  Block was alloc'd at
>>> ==21309==    at 0x4C300A5: calloc
>>> ==21309==    by 0x54CB483: virAlloc
>>> ==21309==    by 0x55BDC1F: virStorageVolDefParseXML
>>> ==21309==    by 0x55BDC1F: virStorageVolDefParseNode
>>> ==21309==    by 0x55BE5A4: virStorageVolDefParse
>>> ==21309==    by 0x153DDFF1: storageVolCreateXML
>>> ==21309==    by 0x562035B: virStorageVolCreateXML
>>> ==21309==    by 0x147366: remoteDispatchStorageVolCreateXML
>>> ...
>>>
>>> Signed-off-by: John Ferlan <jferlan@redhat.com>
>>> ---
>>> src/storage/storage_driver.c | 9 +++++++++
>>> 1 file changed, 9 insertions(+)
> 
> --
> libvir-list mailing list
> libvir-list@redhat.com
> https://www.redhat.com/mailman/listinfo/libvir-list
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/2] storage: Resolve storage driver crash
Posted by Ján Tomko 7 years, 5 months ago
On Wed, Nov 15, 2017 at 06:21:11PM -0500, John Ferlan wrote:
>
>
>On 11/07/2017 08:38 AM, John Ferlan wrote:
>>
>>
>> On 11/07/2017 04:36 AM, Ján Tomko wrote:
>>> On Mon, Nov 06, 2017 at 03:53:08PM -0500, John Ferlan wrote:
>>>> Resolve a storage driver crash as a result of a long running
>>>> storageVolCreateXML when the virStorageVolPoolRefreshThread is
>>>> run as a result of a storageVolUpload complete and ran the
>>>> virStoragePoolObjClearVols without checking if the creation
>>>> code was currently processing a buildVol after incrementing
>>>> the driver->asyncjob count.
>>>
>>> I thought the whole point of refreshing the pool in a thread
>>> was to have the storage driver usable for other things.
>>> This effectively disables pool refresh during long-running
>>> jobs.
>>>
>
>Jan -
>
>So I'm curious to know whether your comments amount to a NAK of the
>design of the patches of if the explanation provided here was sufficient?
>

NACK to the use of usleep for this.
I loathe usleep. It is reasonable for waiting on something external or racy
things like session daemon startup, but I do not see why it is necessary
here.

>Tks -
>
>John
>

[...]

>>
>> An "alternative" is to refresh the pool after a create vol rather than
>> just altering the pool available/allocation values from the created
>> volume target allocation. That way the VolPoolRefreshThread could just
>> "exit" knowing that volCreate will do the update. However, this refresh
>> could only happen when there are no asyncjobs outstanding. In the mean
>> time the pool sizing values probably would still need to be updated if
>> asyncjobs > 0 to avoid multiple create jobs from "exhausting" the pool
>> because the sizing values weren't properly updated.
>>

I did not realize VolPoolRefreshThread was only called after volume
upload. I think quietly doing nothing in this case would be reasonable,
considering that actual storagePoolRefresh errors out in that case.

As far as noble goals go, this is volUpload being more agressive
in keeping the pool allocation/capacity values up to date.
Since volCreate only changes the values based on the request by
the user, it does not take into account how much space is required
on-disk (e.g. metadata for LUKS and qcow2 volumes).
I am not sure how to reasonably solve this without refreshing the
whole pool every time (which gives us the advantage of catching
changes out of our scope - e.g. a fs pool that does not reside
on a separate partition).

Jan

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list