[libvirt] [PATCH 1/3] Add a plugin dlm for lock manager

river posted 3 patches 7 years, 3 months ago
[libvirt] [PATCH 1/3] Add a plugin dlm for lock manager
Posted by river 7 years, 3 months ago
dlm is implemented by linux kernel, provides userspace API by
'libdlm' to lock/unlock resource, cooperated with cluster
communication software, such as corosync, could satisfy the
demands of libvirt lock manager.

Signed-off-by: river <lfu@suse.com>
---
 configure.ac                  |    6 +
 m4/virt-cpg.m4                |   37 ++
 m4/virt-dlm.m4                |   36 ++
 src/Makefile.am               |   15 +
 src/locking/lock_driver_dlm.c | 1056 +++++++++++++++++++++++++++++++++++++++++
 src/util/virlist.h            |  110 +++++
 6 files changed, 1260 insertions(+)
 create mode 100644 m4/virt-cpg.m4
 create mode 100644 m4/virt-dlm.m4
 create mode 100644 src/locking/lock_driver_dlm.c
 create mode 100644 src/util/virlist.h

diff --git a/configure.ac b/configure.ac
index 4cccf7f4d..4ad90470d 100644
--- a/configure.ac
+++ b/configure.ac
@@ -265,6 +265,8 @@ LIBVIRT_ARG_PM_UTILS
 LIBVIRT_ARG_POLKIT
 LIBVIRT_ARG_READLINE
 LIBVIRT_ARG_SANLOCK
+LIBVIRT_ARG_CPG
+LIBVIRT_ARG_DLM
 LIBVIRT_ARG_SASL
 LIBVIRT_ARG_SELINUX
 LIBVIRT_ARG_SSH2
@@ -307,6 +309,8 @@ LIBVIRT_CHECK_POLKIT
 LIBVIRT_CHECK_PTHREAD
 LIBVIRT_CHECK_READLINE
 LIBVIRT_CHECK_SANLOCK
+LIBVIRT_CHECK_CPG
+LIBVIRT_CHECK_DLM
 LIBVIRT_CHECK_SASL
 LIBVIRT_CHECK_SELINUX
 LIBVIRT_CHECK_SSH2
@@ -1005,6 +1009,8 @@ LIBVIRT_RESULT_POLKIT
 LIBVIRT_RESULT_RBD
 LIBVIRT_RESULT_READLINE
 LIBVIRT_RESULT_SANLOCK
+LIBVIRT_RESULT_CPG
+LIBVIRT_RESULT_DLM
 LIBVIRT_RESULT_SASL
 LIBVIRT_RESULT_SELINUX
 LIBVIRT_RESULT_SSH2
diff --git a/m4/virt-cpg.m4 b/m4/virt-cpg.m4
new file mode 100644
index 000000000..27acda665
--- /dev/null
+++ b/m4/virt-cpg.m4
@@ -0,0 +1,37 @@
+dnl The libcpg.so library
+dnl
+dnl Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
+dnl
+dnl This library is free software; you can redistribute it and/or
+dnl modify it under the terms of the GNU Lesser General Public
+dnl License as published by the Free Software Foundation; either
+dnl version 2.1 of the License, or (at your option) any later version.
+dnl
+dnl This library is distributed in the hope that it will be useful,
+dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+dnl Lesser General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU Lesser General Public
+dnl License along with this library.  If not, see
+dnl <http://www.gnu.org/licenses/>.
+dnl
+
+AC_DEFUN([LIBVIRT_ARG_CPG],[
+  LIBVIRT_ARG_WITH_FEATURE([CPG], [cluster engine CPG library], [check])
+])
+
+AC_DEFUN([LIBVIRT_CHECK_CPG],[
+  dnl in some distribution, Version is `UNKNOW` in libcpg.pc
+  if test "x$with_cpg" != "xno"; then
+    PKG_CHECK_MODULES([CPG], [libcpg], [
+      with_cpg=yes
+    ],[
+      with_cpg=no
+    ])
+  fi
+])
+
+AC_DEFUN([LIBVIRT_RESULT_CPG],[
+  LIBVIRT_RESULT_LIB([CPG])
+])
diff --git a/m4/virt-dlm.m4 b/m4/virt-dlm.m4
new file mode 100644
index 000000000..dc5f5f152
--- /dev/null
+++ b/m4/virt-dlm.m4
@@ -0,0 +1,36 @@
+dnl The libdlm.so library
+dnl
+dnl Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
+dnl
+dnl This library is free software; you can redistribute it and/or
+dnl modify it under the terms of the GNU Lesser General Public
+dnl License as published by the Free Software Foundation; either
+dnl version 2.1 of the License, or (at your option) any later version.
+dnl
+dnl This library is distributed in the hope that it will be useful,
+dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
+dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+dnl Lesser General Public License for more details.
+dnl
+dnl You should have received a copy of the GNU Lesser General Public
+dnl License along with this library.  If not, see
+dnl <http://www.gnu.org/licenses/>.
+dnl
+
+AC_DEFUN([LIBVIRT_ARG_DLM],[
+  LIBVIRT_ARG_WITH_FEATURE([DLM], [Distributed Lock Manager library], [check])
+])
+
+AC_DEFUN([LIBVIRT_CHECK_DLM],[
+  AC_REQUIRE([LIBVIRT_CHECK_CPG])
+  LIBVIRT_CHECK_PKG([DLM], [libdlm], [4.0.0])
+
+  if test "x$with_dlm" == "xyes" && test "x$with_cpg" != "xyes"; then
+    AC_MSG_ERROR([You must install libcpg to build dlm lock])
+  fi
+])
+
+AC_DEFUN([LIBVIRT_RESULT_DLM],[
+  AC_REQUIRE([LIBVIRT_RESULT_CPG])
+  LIBVIRT_RESULT_LIB([DLM])
+])
diff --git a/src/Makefile.am b/src/Makefile.am
index 79adc9ba5..8742921fa 100644
--- a/src/Makefile.am
+++ b/src/Makefile.am
@@ -305,6 +305,10 @@ DRIVER_SOURCES = \
 		logging/log_manager.c logging/log_manager.h \
 		$(NULL)
 
+LOCK_DRIVER_DLM_SOURCES = \
+		locking/lock_driver_dlm.c \
+		$(NULL)
+
 LOCK_DRIVER_SANLOCK_SOURCES = \
 		locking/lock_driver_sanlock.c
 
@@ -2879,6 +2883,17 @@ virtlogd.socket: logging/virtlogd.socket.in $(top_builddir)/config.status
 	    < $< > $@-t && \
 	    mv $@-t $@
 
+if WITH_DLM
+lockdriver_LTLIBRARIES += dlm.la
+dlm_la_SOURCES = $(LOCK_DRIVER_DLM_SOURCES)
+dlm_la_CFLAGS = -I$(srcdir)/conf $(AM_CFLAGS)
+dlm_la_LDFLAGS = -module -avoid-version $(AM_LDFLAGS)
+dlm_la_LIBADD = ../gnulib/lib/libgnu.la \
+				$(CPG_LIBS) \
+				$(DLM_LIBS)
+else ! WITH_DLM
+EXTRA_DIST += $(LOCK_DRIVER_DLM_SOURCES)
+endif
 
 if WITH_SANLOCK
 lockdriver_LTLIBRARIES += sanlock.la
diff --git a/src/locking/lock_driver_dlm.c b/src/locking/lock_driver_dlm.c
new file mode 100644
index 000000000..e197c0bdf
--- /dev/null
+++ b/src/locking/lock_driver_dlm.c
@@ -0,0 +1,1056 @@
+/*
+ * lock_driver_dlm.c: a lock driver for dlm
+ *
+ * Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#include <config.h>
+
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <stdint.h>
+#include <errno.h>
+#include <fcntl.h>
+#include <stdio.h>
+
+#include <corosync/cpg.h>
+#include <libdlm.h>
+
+#include "lock_driver.h"
+#include "viralloc.h"
+#include "virconf.h"
+#include "vircrypto.h"
+#include "virerror.h"
+#include "virfile.h"
+#include "virlist.h"
+#include "virlog.h"
+#include "virstring.h"
+#include "virthread.h"
+#include "viruuid.h"
+
+#define VIR_FROM_THIS VIR_FROM_LOCKING
+
+#define DLM_LOCKSPACE_MODE  0600
+#define DLM_LOCKSPACE_NAME  "libvirt"
+
+#define LOCK_RECORD_FILE_MODE       0644
+#define LOCK_RECORD_FILE_PATH       "/tmp/libvirtd-dlm-file"
+
+#define PRMODE  "PRMODE"
+#define EXMODE  "EXMODE"
+
+#define STATUS             "STATUS"
+#define RESOURCE_NAME      "RESOURCE_NAME"
+#define LOCK_ID            "LOCK_ID"
+#define LOCK_MODE          "LOCK_MODE"
+#define VM_PID             "VM_PID"
+
+#define BUFFERLEN          128
+
+/* This will be set after dlm_controld is started. */
+#define DLM_CLUSTER_NAME_PATH "/sys/kernel/config/dlm/cluster/cluster_name"
+
+VIR_LOG_INIT("locking.lock_driver_dlm");
+
+typedef struct _virLockInformation virLockInformation;
+typedef virLockInformation *virLockInformationPtr;
+
+typedef struct _virLockManagerDlmResource virLockManagerDlmResource;
+typedef virLockManagerDlmResource *virLockManagerDlmResourcePtr;
+
+typedef struct _virLockManagerDlmPrivate virLockManagerDlmPrivate;
+typedef virLockManagerDlmPrivate *virLockManagerDlmPrivatePtr;
+
+typedef struct _virLockManagerDlmDriver virLockManagerDlmDriver;
+typedef virLockManagerDlmDriver *virLockManagerDlmDriverPtr;
+
+typedef struct _virListWait virListWait;
+typedef virListWait *virListWaitPtr;
+
+struct _virLockInformation {
+    virListHead entry;
+    char    *name;
+    uint32_t mode;
+    uint32_t lkid;
+    pid_t    vm_pid;
+};
+
+struct _virLockManagerDlmResource {
+    char    *name;
+    uint32_t mode;
+};
+
+struct _virLockManagerDlmPrivate {
+    unsigned char vm_uuid[VIR_UUID_BUFLEN];
+    char         *vm_name;
+    pid_t         vm_pid;
+    int           vm_id;
+
+    size_t        nresources;
+    virLockManagerDlmResourcePtr resources;
+
+    bool          hasRWDisks;
+};
+
+struct _virLockManagerDlmDriver {
+    bool  autoDiskLease;
+    bool  requireLeaseForDisks;
+
+	bool  purgeLockspace;
+	char *lockspaceName;
+    char *lockRecordFilePath;
+};
+
+struct _virListWait {
+    virMutex listMutex;
+    virMutex fileMutex;
+    virListHead list;
+};
+
+static virLockManagerDlmDriverPtr driver;
+static dlm_lshandle_t lockspace;
+static virListWait lockListWait;
+
+static int virLockManagerDlmLoadConfig(const char *configFile)
+{
+    virConfPtr conf = NULL;
+    int ret = -1;
+
+    if (access(configFile, R_OK) == -1) {
+        if (errno != ENOENT) {
+            virReportSystemError(errno,
+                                 _("Unable to access config file %s"),
+                                 configFile);
+            return -1;
+        }
+        return 0;
+    }
+
+	if (!(conf = virConfReadFile(configFile, 0)))
+		return -1;
+
+    if (virConfGetValueBool(conf, "auto_disk_leases", &driver->autoDiskLease) < 0)
+        goto cleanup;
+
+    driver->requireLeaseForDisks = !driver->autoDiskLease;
+    if (virConfGetValueBool(conf, "require_lease_for_disks", &driver->requireLeaseForDisks) < 0)
+        goto cleanup;
+
+    if (virConfGetValueBool(conf, "purge_lockspace", &driver->purgeLockspace) < 0)
+        goto cleanup;
+
+    if (virConfGetValueString(conf, "lockspace_name", &driver->lockspaceName) < 0)
+        goto cleanup;
+
+    if (virConfGetValueString(conf, "lock_record_file_path", &driver->lockRecordFilePath) < 0)
+        goto cleanup;
+
+    ret = 0;
+
+ cleanup:
+    virConfFree(conf);
+    return ret;
+}
+
+static int virLockManagerDlmToModeUint(const char *token)
+{
+    if (STREQ(token, PRMODE))
+        return LKM_PRMODE;
+    if (STREQ(token, EXMODE))
+        return LKM_EXMODE;
+
+    return 0;
+}
+
+static const char *virLockManagerDlmToModeText(const uint32_t mode)
+{
+    switch (mode) {
+    case LKM_PRMODE:
+        return PRMODE;
+    case LKM_EXMODE:
+        return EXMODE;
+    default:
+        return NULL;
+    }
+}
+
+static virLockInformationPtr virLockManagerDlmRecordLock(const char *name,
+                                                         const uint32_t mode,
+                                                         const uint32_t lkid,
+                                                         const pid_t vm_pid)
+{
+    virLockInformationPtr lock = NULL;
+
+    if (VIR_ALLOC(lock) < 0)
+        goto error;
+
+    if (VIR_STRDUP(lock->name, name) < 0)
+        goto error;
+
+    lock->mode = mode;
+    lock->lkid = lkid;
+    lock->vm_pid = vm_pid;
+
+    virMutexLock(&(lockListWait.listMutex));
+    virListAddTail(&lock->entry, &(lockListWait.list));
+    virMutexUnlock(&(lockListWait.listMutex));
+
+    VIR_DEBUG("record lock sucessfully, lockName=%s lockMode=%s lockId=%d",
+              NULLSTR(name), NULLSTR(virLockManagerDlmToModeText(mode)), lkid);
+
+    return lock;
+
+ error:
+    if (lock)
+        VIR_FREE(lock->name);
+    VIR_FREE(lock);
+    return NULL;
+}
+
+static void virLockManagerDlmWriteLock(virLockInformationPtr lock, int fd, bool status)
+{
+    char buffer[BUFFERLEN] = {0};
+    off_t offset = 0, rv = 0;
+
+    if (!lock) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("lock is NULL"));
+        return;
+    }
+
+    /*
+     * STATUS RESOURCE_NAME LOCK_MODE VM_PID\n
+     *      6            64         9     10
+     * 93 = 6 + 1 + 64 + 1 + 9 + 1 + 10 + 1
+     */
+    offset = 93 * lock->lkid;
+	rv = lseek(fd, offset, SEEK_SET);
+	if (rv < 0) {
+		virReportSystemError(errno,
+							 _("unable to lseek fd '%d'"),
+                             fd);
+        return;
+    }
+
+    snprintf(buffer, sizeof(buffer), "%6d %64s %9s %10jd\n", \
+             status, lock->name,
+             NULLSTR(virLockManagerDlmToModeText(lock->mode)),
+             (intmax_t)lock->vm_pid);
+
+    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
+        virReportSystemError(errno,
+                             _("unable to write lock information '%s' to file '%s'"),
+                             buffer, NULLSTR(driver->lockRecordFilePath));
+        return;
+    }
+
+    VIR_DEBUG("write '%s' to fd=%d", buffer, fd);
+
+    fdatasync(fd);
+
+    return;
+}
+
+static void virLockManagerDlmAdoptLock(char *raw) {
+    char *str = NULL, *subtoken = NULL, *saveptr = NULL, *endptr = NULL;
+    int i = 0, status = 0;
+    char *name = NULL;
+    uint32_t mode = 0;
+    pid_t vm_pid = 0;
+    struct dlm_lksb lksb = {0};
+
+    /* every line is the following format:
+     *   STATUS RESOURCE_NAME LOCK_MODE VM_PID
+     */
+    for (i = 0, str = raw, status = 0; ; str = NULL, i++) {
+        subtoken = strtok_r(str, " \n", &saveptr);
+        if (subtoken == NULL)
+            break;
+
+        switch(i) {
+        case 0:
+            if (virStrToLong_i(subtoken, &endptr, 10, &status) < 0) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("cannot extract lock status '%s'"), subtoken);
+                goto cleanup;
+            }
+            break;
+        case 1:
+            if (VIR_STRDUP(name, subtoken) != 1)
+                goto cleanup;
+            break;
+        case 2:
+            mode = virLockManagerDlmToModeUint(subtoken);
+            if (!mode)
+                goto cleanup;
+            break;
+        case 3:
+            if ((virStrToLong_i(subtoken, &endptr, 10, &vm_pid) < 0) || !vm_pid) {
+                virReportError(VIR_ERR_INTERNAL_ERROR,
+                               _("cannot extract lock vm_pid '%s'"), subtoken);
+                goto cleanup;
+            }
+            break;
+        default:
+            goto cleanup;
+            break;
+        }
+
+        if (status != 1)
+            goto cleanup;
+    }
+
+    if (i != 4)
+        goto cleanup;
+
+    /* copy from `lm_adopt_dlm` in daemons/lvmlockd/lvmlockd-dlm.c of lvm2:
+	 *   dlm returns 0 for success, -EAGAIN if an orphan is
+     *   found with another mode, and -ENOENT if no orphan.
+     *
+     *   cast/bast/param are (void *)1 because the kernel
+     *   returns errors if some are null.
+     */
+
+    status = dlm_ls_lockx(lockspace, mode, &lksb, LKF_PERSISTENT|LKF_ORPHAN,
+                          name, strlen(name), 0,
+                          (void *)1, (void *)1, (void *)1,
+                          NULL, NULL);
+    if (status) {
+        virReportSystemError(errno,
+                             _("unable to adopt lock, rv=%d lockName=%s lockMode=%s"),
+                             status, name, NULLSTR(virLockManagerDlmToModeText(mode)));
+        goto cleanup;
+    }
+
+    if (!virLockManagerDlmRecordLock(name, mode, lksb.sb_lkid, vm_pid)) {
+        virReportSystemError(errno,
+                             _("unable to record lock information, "
+                                 "lockName=%s lockMode=%s lockId=%d vm_pid=%jd"),
+                             NULLSTR(name), NULLSTR(virLockManagerDlmToModeText(mode)),
+                             lksb.sb_lkid, (intmax_t)vm_pid);
+    }
+
+
+ cleanup:
+    if (name)
+        VIR_FREE(name);
+
+    return;
+}
+
+static int virLockManagerDlmPrepareLockList(const char *lockRecordFilePath)
+{
+    FILE *fp = NULL;
+    int line = 0;
+    size_t n = 0;
+    ssize_t count = 0;
+    char *buffer = NULL;
+
+    fp = fopen(lockRecordFilePath, "r");
+    if (!fp) {
+        virReportSystemError(errno,
+                             _("unable to open '%s'"), lockRecordFilePath);
+        return -1;
+    }
+
+    /* lock information is from the second line */
+    for (line = 0; !feof(fp); line++) {
+        count = getline(&buffer, &n, fp);
+        if (count <= 0)
+            break;
+
+        switch (line) {
+        case 0:
+            break;
+        default:
+            virLockManagerDlmAdoptLock(buffer);
+            break;
+        }
+    }
+
+    VIR_FORCE_FCLOSE(fp);
+    VIR_FREE(buffer);
+
+    return 0;
+}
+
+static int virLockManagerDlmGetLocalNodeId(uint32_t *nodeId)
+{
+    cpg_handle_t handle = 0;
+    int rv = -1;
+
+    if (cpg_model_initialize(&handle, CPG_MODEL_V1, NULL, NULL) != CS_OK) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("unable to create a new connection to the CPG service"));
+		return -1;
+    }
+
+	if( cpg_local_get(handle, nodeId) != CS_OK) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("unable to get the local node id by the CPG service"));
+        goto cleanup;
+    }
+
+    VIR_DEBUG("the local nodeid=%u", *nodeId);
+
+    rv = 0;
+
+ cleanup:
+    if (cpg_finalize(handle) != CS_OK)
+        VIR_WARN("unable to finalize the CPG service");
+
+	return rv;
+}
+
+static int virLockManagerDlmDumpLockList(const char *lockRecordFilePath)
+{
+    virLockInformationPtr theLock = NULL;
+    char buffer[BUFFERLEN] = {0};
+    int fd = -1, rv = -1;
+
+    /* not need mutex because of only one instance would be initialized */
+    fd = open(lockRecordFilePath, O_WRONLY|O_CREAT|O_TRUNC, LOCK_RECORD_FILE_MODE);
+    if (fd < 0) {
+        virReportSystemError(errno,
+                             _("unable to open '%s'"),
+                             lockRecordFilePath);
+        return -1;
+    }
+
+    snprintf(buffer, sizeof(buffer), "%6s %64s %9s %10s\n", \
+                    STATUS, RESOURCE_NAME, LOCK_MODE, VM_PID);
+    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
+        virReportSystemError(errno,
+                             _("unable to write '%s' to '%s'"),
+                             buffer, lockRecordFilePath);
+        goto cleanup;
+    }
+
+    virListForEachEntry(theLock, &(lockListWait.list), entry) {
+        virLockManagerDlmWriteLock(theLock, fd, 1);
+    }
+
+    if (VIR_CLOSE(fd) < 0) {
+        virReportSystemError(errno,
+                             _("unable to close file '%s'"),
+                             lockRecordFilePath);
+        goto cleanup;
+    }
+
+    rv = 0;
+
+ cleanup:
+    if (rv)
+        VIR_FORCE_CLOSE(fd);
+    return rv;
+}
+
+static int virLockManagerDlmSetupLockRecordFile(const char *lockRecordFilePath,
+                                                const bool newLockspace,
+                                                const bool purgeLockspace)
+{
+    uint32_t nodeId = 0;
+
+    /* there maybe some orphan locks recorded in the lock record file which
+     * should be adopted if lockspace is opened instead of created, we adopt
+     * them then add them in the list.
+     */
+    if (!newLockspace &&
+        virLockManagerDlmPrepareLockList(lockRecordFilePath)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("unable to adopt locks from '%s'"),
+                       NULLSTR(lockRecordFilePath));
+        return -1;
+    }
+
+    /* purgeLockspace flag means purging orphan locks belong to any process
+     * in this lockspace.
+     */
+    if (purgeLockspace && !virLockManagerDlmGetLocalNodeId(&nodeId)) {
+        if (dlm_ls_purge(lockspace, nodeId, 0)) {
+            VIR_WARN("node=%u purge DLM locks failed in lockspace=%s",
+                     nodeId, NULLSTR(driver->lockspaceName));
+        }
+        else
+            VIR_DEBUG("node=%u purge DLM locks success in lockspace=%s",
+                      nodeId, NULLSTR(driver->lockspaceName));
+    }
+
+    /* initialize the lock record file */
+    if (virLockManagerDlmDumpLockList(lockRecordFilePath)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("unable to initialize the lock record file '%s'"),
+                       lockRecordFilePath);
+        return -1;
+    }
+
+    return 0;
+}
+
+static int virLockManagerDlmSetup(void)
+{
+    bool newLockspace = false;
+
+    virListHeadInit(&(lockListWait.list));
+    if ((virMutexInit(&(lockListWait.listMutex)) < 0) ||
+        (virMutexInit(&(lockListWait.fileMutex)) < 0)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("unable to initialize mutex"));
+        return -1;
+    }
+
+
+    /* check whether dlm is running or not */
+    if (access(DLM_CLUSTER_NAME_PATH, F_OK)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("check dlm_controld, ensure it has setuped"));
+        return -1;
+    }
+
+    /* open lockspace, create it if it doesn't exist */
+    lockspace = dlm_open_lockspace(driver->lockspaceName);
+    if (!lockspace) {
+        lockspace = dlm_create_lockspace(driver->lockspaceName, DLM_LOCKSPACE_MODE);
+        if (!lockspace) {
+            virReportSystemError(errno, "%s",
+                                 _("unable to open and create DLM lockspace"));
+            return -1;
+        }
+        newLockspace = true;
+    }
+
+    /* create thread to receive notification from kernel */
+    if (dlm_ls_pthread_init(lockspace)) {
+        if (errno != EEXIST) {
+            virReportSystemError(errno, "%s",
+                                 _("unable to initialize lockspace"));
+            return -1;
+        }
+    }
+
+    /* we need file to record lock information used by rebooted libvirtd */
+    if (virLockManagerDlmSetupLockRecordFile(driver->lockRecordFilePath,
+                                             newLockspace,
+                                             driver->purgeLockspace)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("unable to initialize DLM lock file"));
+        return -1;
+    }
+
+    return 0;
+}
+
+static int virLockManagerDlmDeinit(void);
+
+static int virLockManagerDlmInit(unsigned int version,
+                                 const char *configFile,
+                                 unsigned int flags)
+{
+    VIR_DEBUG("version=%u configFile=%s flags=0x%x", version, NULLSTR(configFile), flags);
+
+    virCheckFlags(0, -1);
+
+    if (driver)
+        return 0;
+
+    if (geteuid() != 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("dlm lock requires root privileges"));
+        return -1;
+    }
+
+    if (VIR_ALLOC(driver) < 0)
+        return -1;
+
+    driver->autoDiskLease = true;
+    driver->requireLeaseForDisks = !driver->autoDiskLease;
+    driver->purgeLockspace = true;
+
+    if (virAsprintf(&driver->lockspaceName,
+                  "%s", DLM_LOCKSPACE_NAME) < 0)
+        goto error;
+
+    if (virAsprintf(&driver->lockRecordFilePath,
+                  "%s", LOCK_RECORD_FILE_PATH) < 0)
+        goto error;
+
+    if (virLockManagerDlmLoadConfig(configFile) < 0)
+        goto error;
+
+    if (virLockManagerDlmSetup() < 0)
+        goto error;
+
+    return 0;
+
+ error:
+    virLockManagerDlmDeinit();
+    return -1;
+}
+
+static int virLockManagerDlmDeinit(void)
+{
+    virLockInformationPtr theLock = NULL;
+
+    if (!driver)
+        return 0;
+
+    if(lockspace)
+        dlm_close_lockspace(lockspace);
+
+    /* not care about whether adopting lock or not,
+     * just release those to prevent memory leak
+     */
+    virListForEachEntry(theLock, &(lockListWait.list), entry) {
+        virListDelete(&(theLock->entry));
+        VIR_FREE(theLock->name);
+        VIR_FREE(theLock);
+    }
+
+    VIR_FREE(driver->lockspaceName);
+    VIR_FREE(driver->lockRecordFilePath);
+    VIR_FREE(driver);
+
+    return 0;
+}
+
+static int virLockManagerDlmNew(virLockManagerPtr lock,
+                                unsigned int type,
+                                size_t nparams,
+                                virLockManagerParamPtr params,
+                                unsigned int flags)
+{
+    virLockManagerDlmPrivatePtr priv = NULL;
+    size_t i;
+
+    virCheckFlags(VIR_LOCK_MANAGER_NEW_STARTED, -1);
+
+    if (!driver) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("dlm plugin is not initialized"));
+        return -1;
+    }
+
+    if (type != VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN) {
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                       _("unsupported object type %d"), type);
+        return -1;
+    }
+
+    if (VIR_ALLOC(priv) < 0)
+        return -1;
+
+    for (i = 0; i< nparams; i++) {
+        if (STREQ(params[i].key, "uuid")) {
+            memcpy(priv->vm_uuid, params[i].value.uuid, VIR_UUID_BUFLEN);
+        } else if (STREQ(params[i].key, "name")) {
+            if (VIR_STRDUP(priv->vm_name, params[i].value.str) < 0)
+                return -1;
+        } else if (STREQ(params[i].key, "id")) {
+            priv->vm_id = params[i].value.ui;
+        } else if (STREQ(params[i].key, "pid")) {
+            priv->vm_pid = params[i].value.iv;
+        } else if (STREQ(params[i].key, "uri")) {
+            /* there would be a warning in some case according to the history patch,
+             * so ignored
+             */
+        } else {
+            virReportError(VIR_ERR_INTERNAL_ERROR,
+                           _("unexpected parameter %s for object"),
+                           params[i].key);
+        }
+    }
+
+    /* check the following to prevent some unexpexted state in some case */
+    if (priv->vm_pid == 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("missing PID parameter for domain object"));
+        return -1;
+    }
+    if (!priv->vm_name) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("missing name parameter for domain object"));
+        return -1;
+    }
+
+    if (priv->vm_id == 0) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("missing ID parameter for domain object"));
+        return -1;
+    }
+    if (!virUUIDIsValid(priv->vm_uuid)) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("missing UUID parameter for domain object"));
+        return -1;
+    }
+
+    lock->privateData = priv;
+
+    return 0;
+}
+
+static void virLockManagerDlmFree(virLockManagerPtr lock)
+{
+    virLockManagerDlmPrivatePtr priv = lock->privateData;
+    size_t i;
+
+    if (!priv)
+        return;
+
+    for (i = 0; i < priv->nresources; i++)
+        VIR_FREE(priv->resources[i].name);
+
+    VIR_FREE(priv->resources);
+    VIR_FREE(priv->vm_name);
+    VIR_FREE(priv);
+    lock->privateData = NULL;
+
+    return;
+}
+
+static int virLockManagerDlmAddResource(virLockManagerPtr lock,
+                                        unsigned int type, const char *name,
+                                        size_t nparams,
+                                        virLockManagerParamPtr params,
+                                        unsigned int flags)
+{
+    virLockManagerDlmPrivatePtr priv = lock->privateData;
+    char *newName = NULL;
+
+    virCheckFlags(VIR_LOCK_MANAGER_RESOURCE_READONLY |
+                  VIR_LOCK_MANAGER_RESOURCE_SHARED, -1);
+
+    /* Treat read only resources as a no-op lock request */
+    if (flags & VIR_LOCK_MANAGER_RESOURCE_READONLY)
+        return 0;
+
+    switch (type) {
+    case VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK:
+            if (params || nparams) {
+                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                               _("unexpected parameters for disk resource"));
+                return -1;
+            }
+
+            if (!driver->autoDiskLease) {
+                if (!(flags & (VIR_LOCK_MANAGER_RESOURCE_SHARED |
+                               VIR_LOCK_MANAGER_RESOURCE_READONLY))) {
+                    priv->hasRWDisks = true;
+                    /* ignore disk resource without error */
+                    return 0;
+                }
+            }
+
+            if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, &newName) < 0)
+                goto cleanup;
+
+        break;
+
+    case VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE:
+        /* we need format the lock information, so the lock name must be the constant length */
+        if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, &newName) < 0)
+            goto cleanup;
+
+        break;
+
+    default:
+        virReportError(VIR_ERR_INTERNAL_ERROR,
+                _("unknown lock manager object type %d"),
+                type);
+        return -1;
+    }
+
+    if (VIR_EXPAND_N(priv->resources, priv->nresources, 1) < 0)
+        goto cleanup;
+
+    priv->resources[priv->nresources-1].name = newName;
+
+    if (!!(flags & VIR_LOCK_MANAGER_RESOURCE_SHARED))
+        priv->resources[priv->nresources-1].mode = LKM_PRMODE;
+    else
+        priv->resources[priv->nresources-1].mode = LKM_EXMODE;
+
+    return 0;
+
+ cleanup:
+    VIR_FREE(newName);
+
+    return -1;
+}
+
+static int virLockManagerDlmAcquire(virLockManagerPtr lock,
+                                    const char *state ATTRIBUTE_UNUSED,
+                                    unsigned int flags,
+                                    virDomainLockFailureAction action ATTRIBUTE_UNUSED,
+                                    int *fd)
+{
+    virLockManagerDlmPrivatePtr priv = lock->privateData;
+    virLockInformationPtr theLock = NULL;
+    struct dlm_lksb lksb = {0};
+    int rv = -1, theFd = -1;
+    size_t i;
+
+    virCheckFlags(VIR_LOCK_MANAGER_ACQUIRE_REGISTER_ONLY |
+                  VIR_LOCK_MANAGER_ACQUIRE_RESTRICT, -1);
+
+    /* allowed to start a guest which has read/write disks, but without any leases */
+    if (priv->nresources == 0 &&
+        priv->hasRWDisks &&
+        driver->requireLeaseForDisks) {
+        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
+                       _("read/write, exclusive access, disks were present, but no leases specified"));
+        return -1;
+    }
+
+    /* accorting to git patch history, add `fd` parameter in order to
+     * 'ensure sanlock socket is labelled with the VM process label',
+     * however, fixing sanlock socket security labelling remove related
+     * code. Now, `fd` parameter is useless.
+     */
+    if (fd)
+        *fd = -1;
+
+    if(!lockspace) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("lockspace is not opened"));
+        return -1;
+    }
+
+    if (!(flags & VIR_LOCK_MANAGER_ACQUIRE_REGISTER_ONLY)) {
+        VIR_DEBUG("Acquiring object %zu", priv->nresources);
+
+        theFd = open(driver->lockRecordFilePath, O_RDWR);
+        if (theFd < 0) {
+            virReportSystemError(errno,
+                                 _("unable to open '%s'"), driver->lockRecordFilePath);
+            return -1;
+        }
+
+        for (i = 0; i < priv->nresources; i++) {
+            VIR_DEBUG("Acquiring object %zu", priv->nresources);
+
+            memset(&lksb, 0, sizeof(lksb));
+            rv = dlm_ls_lock_wait(lockspace, priv->resources[i].mode,
+                                  &lksb, LKF_NOQUEUE|LKF_PERSISTENT,
+                                  priv->resources[i].name, strlen(priv->resources[i].name),
+                                  0, NULL, NULL, NULL);
+            /* both `rv` and `lksb.sb_status` equal 0 means lock sucessfully */
+            if (rv || lksb.sb_status) {
+                if (lksb.sb_status == EAGAIN)
+                    virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                                   _("failed to acquire lock: the lock could not be granted"));
+                else {
+                    virReportSystemError(errno,
+                                         _("failed to acquire lock: rv=%d lockStatus=%d"),
+                                         rv, lksb.sb_status);
+                }
+                /* rv would be 0 although acquiring lock failed */
+                rv = -1;
+                goto cleanup;
+            }
+
+            theLock = virLockManagerDlmRecordLock(priv->resources[i].name,
+                                                  priv->resources[i].mode,
+                                                  lksb.sb_lkid,
+                                                  priv->vm_pid);
+            if (!theLock) {
+			virReportSystemError(errno,
+                                     _("unable to record lock information, "
+                                        "lockName=%s lockMode=%s lockId=%d vm_pid=%jd"),
+                                     NULLSTR(priv->resources[i].name),
+                                     NULLSTR(virLockManagerDlmToModeText(priv->resources[i].mode)),
+                                     lksb.sb_lkid, (intmax_t)priv->vm_pid);
+                /* record lock failed, we can't save lock information in memory, so release it */
+                rv = dlm_ls_unlock_wait(lockspace, lksb.sb_lkid, 0, &lksb);
+                if (!rv)
+                    virReportSystemError(errno,
+                                         _("failed to release lock: rv=%d lockStatue=%d"),
+                                         rv, lksb.sb_status);
+                rv = -1;
+                goto cleanup;
+            }
+
+            virMutexLock(&(lockListWait.fileMutex));
+            virLockManagerDlmWriteLock(theLock, theFd, 1);
+            virMutexUnlock(&(lockListWait.fileMutex));
+        }
+
+        if(VIR_CLOSE(theFd) < 0) {
+            virReportSystemError(errno,
+                                 _("unable to save file '%s'"),
+                                driver->lockRecordFilePath);
+            goto cleanup;
+        }
+    }
+
+    if (flags & VIR_LOCK_MANAGER_ACQUIRE_RESTRICT) {
+        /* no daemon watches this fd, do nothing here, just close lockspace before `execv`
+         * `dlm_close_lockspace` always return 0, so ignore return value
+         */
+        ignore_value(dlm_close_lockspace(lockspace));
+        lockspace = NULL;
+    }
+
+    rv = 0;
+
+ cleanup:
+    if (rv)
+        VIR_FORCE_CLOSE(theFd);
+    return rv;
+}
+
+static void virLockManagerDlmDeleteLock(const virLockInformationPtr lock,
+                                        const char *lockRecordFilePath)
+{
+    int fd = -1;
+
+    if (!lock)
+        return;
+
+    virMutexLock(&(lockListWait.listMutex));
+    virListDelete(&(lock->entry));
+    virMutexUnlock(&(lockListWait.listMutex));
+
+    fd = open(lockRecordFilePath, O_RDWR);
+    if (fd < 0) {
+        virReportSystemError(errno,
+                             _("unable to open '%s'"), lockRecordFilePath);
+        goto cleanup;
+    }
+
+    virMutexLock(&(lockListWait.fileMutex));
+    virLockManagerDlmWriteLock(lock, fd, 0);
+    virMutexUnlock(&(lockListWait.fileMutex));
+
+    if (VIR_CLOSE(fd) < 0) {
+        virReportSystemError(errno,
+                             _("unable to save file '%s'"),
+                             lockRecordFilePath);
+        VIR_FORCE_CLOSE(fd);
+    }
+
+ cleanup:
+    VIR_FREE(lock->name);
+    VIR_FREE(lock);
+}
+
+static int virLockManagerDlmRelease(virLockManagerPtr lock,
+                                    char **state,
+                                    unsigned int flags)
+{
+    virLockManagerDlmPrivatePtr priv = lock->privateData;
+    virLockManagerDlmResourcePtr resource = NULL;
+    virLockInformationPtr theLock = NULL;
+    struct dlm_lksb lksb = {0};
+    int rv = -1;
+    size_t i;
+
+    virCheckFlags(0, -1);
+
+    if(state)
+        *state = NULL;
+
+    if(!lockspace) {
+        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
+                       _("lockspace is not opened"));
+        return -1;
+    }
+
+    for (i = 0; i < priv->nresources; i++) {
+        resource = priv->resources + i;
+
+        virListForEachEntry (theLock, &(lockListWait.list), entry) {
+            if((theLock->vm_pid == priv->vm_pid)    &&
+               STREQ(theLock->name, resource->name) &&
+               (theLock->mode == resource->mode)) {
+
+                /*
+                 * there are some locks from adopting, the existence of `(void *)1`
+                 * when adopting makes 'terminated by signal SIGSEGV (Address
+                 * boundary error)' error appear.
+                 *
+                 * The following code reference to lvm2 project's implement.
+                 */
+                lksb.sb_lkid = theLock->lkid;
+                rv = dlm_ls_lock_wait(lockspace, LKM_NLMODE,
+                                      &lksb, LKF_CONVERT,
+                                      resource->name,
+                                      strlen(resource->name),
+                                      0, NULL, NULL, NULL);
+
+                if (rv < 0) {
+                    virReportSystemError(errno,
+                                         _("failed to convert lock: rv=%d lockStatus=%d"),
+                                         rv, lksb.sb_status);
+                    goto cleanup;
+                }
+
+                /* don't care whether the lock is released or not,
+                 * it will be automatically released after the libvirtd dead
+                 */
+                virLockManagerDlmDeleteLock(theLock, driver->lockRecordFilePath);
+
+                rv = dlm_ls_unlock_wait(lockspace, lksb.sb_lkid, 0, &lksb);
+                if (rv < 0) {
+                    virReportSystemError(errno,
+                                         _("failed to release lock: rv=%d lockStatus=%d"),
+                                         rv, lksb.sb_status);
+                    goto cleanup;
+                }
+
+                break;
+            }
+        }
+    }
+
+    rv = 0;
+
+ cleanup:
+    return rv;
+}
+
+static int virLockManagerDlmInquire(virLockManagerPtr lock ATTRIBUTE_UNUSED,
+                                    char **state,
+                                    unsigned int flags)
+{
+    /* not support mannual lock, so this function almost does nothing */
+    virCheckFlags(0, -1);
+
+    if (state)
+        *state = NULL;
+
+    return 0;
+}
+
+virLockDriver virLockDriverImpl =
+{
+    .version = VIR_LOCK_MANAGER_VERSION,
+
+    .flags = VIR_LOCK_MANAGER_USES_STATE, // currently not used
+
+    .drvInit = virLockManagerDlmInit,
+    .drvDeinit = virLockManagerDlmDeinit,
+
+    .drvNew = virLockManagerDlmNew,
+    .drvFree = virLockManagerDlmFree,
+
+    .drvAddResource = virLockManagerDlmAddResource,
+
+    .drvAcquire = virLockManagerDlmAcquire,
+    .drvRelease = virLockManagerDlmRelease,
+    .drvInquire = virLockManagerDlmInquire,
+};
diff --git a/src/util/virlist.h b/src/util/virlist.h
new file mode 100644
index 000000000..4ac3626c7
--- /dev/null
+++ b/src/util/virlist.h
@@ -0,0 +1,110 @@
+/*
+ * virlist.h: methods for managing list, logic is copied from Linux Kernel
+ *
+ * Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library.  If not, see
+ * <http://www.gnu.org/licenses/>.
+ *
+ */
+
+#ifndef __VIR_LIST_H
+#define __VIR_LIST_H
+
+#include <stdbool.h>
+
+typedef struct _virListHead virListHead;
+typedef virListHead *virListHeadPtr;
+
+struct _virListHead {
+    struct _virListHead *next, *prev;
+};
+
+static inline void
+virListHeadInit(virListHeadPtr name)
+{
+    name->next = name;
+    name->prev = name;
+}
+
+static inline void
+__virListAdd(virListHeadPtr entry,
+             virListHeadPtr prev, virListHeadPtr next)
+{
+    next->prev = entry;
+    entry->next = next;
+    entry->prev = prev;
+    prev->next = entry;
+}
+
+static inline void
+virListAdd(virListHeadPtr entry, virListHeadPtr head)
+{
+    __virListAdd(entry, head, head->next);
+}
+
+static inline void
+virListAddTail(virListHeadPtr entry, virListHeadPtr head)
+{
+    __virListAdd(entry, head->prev, head);
+}
+
+static inline void
+__virListDelete(virListHeadPtr prev, virListHeadPtr next)
+{
+    next->prev = prev;
+    prev->next = next;
+}
+
+static inline void
+virListDelete(virListHeadPtr entry)
+{
+    __virListDelete(entry->prev, entry->next);
+}
+
+static inline bool
+virListEmpty(virListHeadPtr head)
+{
+    return head->next == head;
+}
+
+#ifndef virContainerOf
+#define virContainerOf(ptr, type, member) \
+    (type *)((char *)(ptr) - (char *) &((type *)0)->member)
+#endif
+
+#define virListEntry(ptr, type, member) \
+    virContainerOf(ptr, type, member)
+
+#define virListFirstEntry(ptr, type, member) \
+    virListEntry((ptr)->next, type, member)
+
+#define virListLastEntry(ptr, type, member) \
+    virListEntry((ptr)->prev, type, member)
+
+#define __virContainerOf(ptr, sample, member)             \
+    (void *)virContainerOf((ptr), typeof(*(sample)), member)
+
+#define virListForEachEntry(pos, head, member)              \
+    for (pos = __virContainerOf((head)->next, pos, member);       \
+     &pos->member != (head);                    \
+     pos = __virContainerOf(pos->member.next, pos, member))
+
+#define virListForEachEntrySafe(pos, tmp, head, member)         \
+    for (pos = __virContainerOf((head)->next, pos, member),       \
+     tmp = __virContainerOf(pos->member.next, pos, member);       \
+     &pos->member != (head);                    \
+     pos = tmp, tmp = __virContainerOf(pos->member.next, tmp, member))
+
+#endif
-- 
2.15.1

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/3] Add a plugin dlm for lock manager
Posted by John Ferlan 7 years, 3 months ago

On 02/05/2018 04:07 AM, river wrote:
> dlm is implemented by linux kernel, provides userspace API by
> 'libdlm' to lock/unlock resource, cooperated with cluster
> communication software, such as corosync, could satisfy the
> demands of libvirt lock manager.

This is lacking in any description of the design or what this particular
patch is accomplishing - although there is some of that in the cover. I
imagine it would take too long to write up everything this single patch
does though...

> 
> Signed-off-by: river <lfu@suse.com>

According to how we prefer individuals to identify themselves for
libvirt patches to be pushed, 'river' is not preferred. A legal name is
preferred - there's plenty of examples in libvirt.git...

There are a number of other guidelines at:

https://libvirt.org/hacking.html

One of the guidelines is to "Split large changes into a series of
smaller patches, self-contained if possible, with an explanation of each
patch and an explanation of how the sequence of patches fits together."
- seeing a 1056 line C file and multiple m4 files makes me wonder if
things could be split up a bit more...

Furthermore it states to ensure to run certain tests - based on my git
am of your patches and subsequent attempt to build, I can say for
certainty you did not do.  I don't even have to have the dlm and cpg
installed to figure that out.

Finally - fair warning - I used to work in OpenVMS development. It's
been a while since I've thought about the OpenVMS Lock Manager; however,
I was quite familiar with the concepts when I worked there as it was a
large part of the project I worked on (one task of the tool was to help
track/manage/fix lock issues). I then was in Tru64 Unix which
essentially "rewrote" the OpenVMS Lock Manager into what it called DLM
and I assume ends up being the basis for the open source version.

In any case, whether you can "shoehorn" the dlm world/model into the
virtlockd world/model for the usage of "cluster wide" resource
management - I dunno.

Lots of nitpick details follow - perhaps hard to follow. I tried not to
repeat too much, but there are certainly some things that were so
pervasive that I had to leave it up to you to go back through and deal
with. I tried to reread after writing, but it's been a long day, I'm
tired, and I have some other things to get to. Still I figured at least
a looksee would be beneficial.

> ---
>  configure.ac                  |    6 +
>  m4/virt-cpg.m4                |   37 ++
>  m4/virt-dlm.m4                |   36 ++
>  src/Makefile.am               |   15 +
>  src/locking/lock_driver_dlm.c | 1056 +++++++++++++++++++++++++++++++++++++++++
>  src/util/virlist.h            |  110 +++++
>  6 files changed, 1260 insertions(+)
>  create mode 100644 m4/virt-cpg.m4
>  create mode 100644 m4/virt-dlm.m4
>  create mode 100644 src/locking/lock_driver_dlm.c
>  create mode 100644 src/util/virlist.h
> 

There's no update to docs/locking.html.in nor creation of a
docs/locking-dlm.html.in...

That's where I'd expect to find the details of why someone would want to
choose dlm over lockd or sanlock.  What goodies are provided? What
advantage is there?

FWIW: configure, m4, and make files are not an area I know well enough
to comment...  Still seems like there's a dlm kernel model required, a
libdlm required, and some sort of cluster wide id thing (cpg).

> diff --git a/configure.ac b/configure.ac
> index 4cccf7f4d..4ad90470d 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -265,6 +265,8 @@ LIBVIRT_ARG_PM_UTILS
>  LIBVIRT_ARG_POLKIT
>  LIBVIRT_ARG_READLINE
>  LIBVIRT_ARG_SANLOCK
> +LIBVIRT_ARG_CPG
> +LIBVIRT_ARG_DLM
>  LIBVIRT_ARG_SASL
>  LIBVIRT_ARG_SELINUX
>  LIBVIRT_ARG_SSH2
> @@ -307,6 +309,8 @@ LIBVIRT_CHECK_POLKIT
>  LIBVIRT_CHECK_PTHREAD
>  LIBVIRT_CHECK_READLINE
>  LIBVIRT_CHECK_SANLOCK
> +LIBVIRT_CHECK_CPG
> +LIBVIRT_CHECK_DLM
>  LIBVIRT_CHECK_SASL
>  LIBVIRT_CHECK_SELINUX
>  LIBVIRT_CHECK_SSH2
> @@ -1005,6 +1009,8 @@ LIBVIRT_RESULT_POLKIT
>  LIBVIRT_RESULT_RBD
>  LIBVIRT_RESULT_READLINE
>  LIBVIRT_RESULT_SANLOCK
> +LIBVIRT_RESULT_CPG
> +LIBVIRT_RESULT_DLM
>  LIBVIRT_RESULT_SASL
>  LIBVIRT_RESULT_SELINUX
>  LIBVIRT_RESULT_SSH2
> diff --git a/m4/virt-cpg.m4 b/m4/virt-cpg.m4
> new file mode 100644
> index 000000000..27acda665
> --- /dev/null
> +++ b/m4/virt-cpg.m4
> @@ -0,0 +1,37 @@
> +dnl The libcpg.so library
> +dnl
> +dnl Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
> +dnl
> +dnl This library is free software; you can redistribute it and/or
> +dnl modify it under the terms of the GNU Lesser General Public
> +dnl License as published by the Free Software Foundation; either
> +dnl version 2.1 of the License, or (at your option) any later version.
> +dnl
> +dnl This library is distributed in the hope that it will be useful,
> +dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
> +dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +dnl Lesser General Public License for more details.
> +dnl
> +dnl You should have received a copy of the GNU Lesser General Public
> +dnl License along with this library.  If not, see
> +dnl <http://www.gnu.org/licenses/>.
> +dnl
> +
> +AC_DEFUN([LIBVIRT_ARG_CPG],[
> +  LIBVIRT_ARG_WITH_FEATURE([CPG], [cluster engine CPG library], [check])
> +])
> +
> +AC_DEFUN([LIBVIRT_CHECK_CPG],[
> +  dnl in some distribution, Version is `UNKNOW` in libcpg.pc

UNKNOWN

doesn't make it very useful does it?

> +  if test "x$with_cpg" != "xno"; then
> +    PKG_CHECK_MODULES([CPG], [libcpg], [
> +      with_cpg=yes
> +    ],[
> +      with_cpg=no
> +    ])
> +  fi
> +])
> +
> +AC_DEFUN([LIBVIRT_RESULT_CPG],[
> +  LIBVIRT_RESULT_LIB([CPG])
> +])
> diff --git a/m4/virt-dlm.m4 b/m4/virt-dlm.m4
> new file mode 100644
> index 000000000..dc5f5f152
> --- /dev/null
> +++ b/m4/virt-dlm.m4
> @@ -0,0 +1,36 @@
> +dnl The libdlm.so library
> +dnl
> +dnl Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
> +dnl
> +dnl This library is free software; you can redistribute it and/or
> +dnl modify it under the terms of the GNU Lesser General Public
> +dnl License as published by the Free Software Foundation; either
> +dnl version 2.1 of the License, or (at your option) any later version.
> +dnl
> +dnl This library is distributed in the hope that it will be useful,
> +dnl but WITHOUT ANY WARRANTY; without even the implied warranty of
> +dnl MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +dnl Lesser General Public License for more details.
> +dnl
> +dnl You should have received a copy of the GNU Lesser General Public
> +dnl License along with this library.  If not, see
> +dnl <http://www.gnu.org/licenses/>.
> +dnl
> +
> +AC_DEFUN([LIBVIRT_ARG_DLM],[
> +  LIBVIRT_ARG_WITH_FEATURE([DLM], [Distributed Lock Manager library], [check])
> +])
> +
> +AC_DEFUN([LIBVIRT_CHECK_DLM],[
> +  AC_REQUIRE([LIBVIRT_CHECK_CPG])
> +  LIBVIRT_CHECK_PKG([DLM], [libdlm], [4.0.0])
> +
> +  if test "x$with_dlm" == "xyes" && test "x$with_cpg" != "xyes"; then
> +    AC_MSG_ERROR([You must install libcpg to build dlm lock])
> +  fi
> +])
> +
> +AC_DEFUN([LIBVIRT_RESULT_DLM],[
> +  AC_REQUIRE([LIBVIRT_RESULT_CPG])
> +  LIBVIRT_RESULT_LIB([DLM])
> +])
> diff --git a/src/Makefile.am b/src/Makefile.am
> index 79adc9ba5..8742921fa 100644
> --- a/src/Makefile.am
> +++ b/src/Makefile.am
> @@ -305,6 +305,10 @@ DRIVER_SOURCES = \
>  		logging/log_manager.c logging/log_manager.h \
>  		$(NULL)
>  
> +LOCK_DRIVER_DLM_SOURCES = \
> +		locking/lock_driver_dlm.c \
> +		$(NULL)
> +
>  LOCK_DRIVER_SANLOCK_SOURCES = \
>  		locking/lock_driver_sanlock.c
>  
> @@ -2879,6 +2883,17 @@ virtlogd.socket: logging/virtlogd.socket.in $(top_builddir)/config.status
>  	    < $< > $@-t && \
>  	    mv $@-t $@
>  
> +if WITH_DLM
> +lockdriver_LTLIBRARIES += dlm.la
> +dlm_la_SOURCES = $(LOCK_DRIVER_DLM_SOURCES)
> +dlm_la_CFLAGS = -I$(srcdir)/conf $(AM_CFLAGS)
> +dlm_la_LDFLAGS = -module -avoid-version $(AM_LDFLAGS)
> +dlm_la_LIBADD = ../gnulib/lib/libgnu.la \
> +				$(CPG_LIBS) \
> +				$(DLM_LIBS)
> +else ! WITH_DLM
> +EXTRA_DIST += $(LOCK_DRIVER_DLM_SOURCES)
> +endif
>  
>  if WITH_SANLOCK
>  lockdriver_LTLIBRARIES += sanlock.la


> diff --git a/src/locking/lock_driver_dlm.c b/src/locking/lock_driver_dlm.c
> new file mode 100644
> index 000000000..e197c0bdf
> --- /dev/null
> +++ b/src/locking/lock_driver_dlm.c
> @@ -0,0 +1,1056 @@
> +/*
> + * lock_driver_dlm.c: a lock driver for dlm
> + *
> + * Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#include <config.h>
> +
> +#include <sys/types.h>
> +#include <sys/stat.h>
> +#include <stdint.h>
> +#include <errno.h>
> +#include <fcntl.h>
> +#include <stdio.h>
> +
> +#include <corosync/cpg.h>
> +#include <libdlm.h>
> +
> +#include "lock_driver.h"
> +#include "viralloc.h"
> +#include "virconf.h"
> +#include "vircrypto.h"
> +#include "virerror.h"
> +#include "virfile.h"
> +#include "virlist.h"
> +#include "virlog.h"
> +#include "virstring.h"
> +#include "virthread.h"
> +#include "viruuid.h"
> +
> +#define VIR_FROM_THIS VIR_FROM_LOCKING
> +
> +#define DLM_LOCKSPACE_MODE  0600
> +#define DLM_LOCKSPACE_NAME  "libvirt"
> +
> +#define LOCK_RECORD_FILE_MODE       0644
> +#define LOCK_RECORD_FILE_PATH       "/tmp/libvirtd-dlm-file"

Is somewhere in /tmp the best/only place to put this? What not similar
to lockd/sanlock using /var/lib/libvirt/dlm... (which would seemingly
need to be created on installation).

> +
> +#define PRMODE  "PRMODE"
> +#define EXMODE  "EXMODE"

So the two modes will be "Protected Read" (shared) and "Exclusive"
(unshared).

My recollection is a resource took a NL (e.g., I'm interested) lock,
then would promote/demote from there as it used/needed a resource.
Having the NL is perhaps especially useful to know when the resource in
question has had some other adjustment to it.  In the OpenVMS world, you
could define a function to be called when some other process/node in the
environment became interested or requested a higher mode (those details
are way too foggy to remember).

Not really quite sure what using PR and EX will really buy you in the
way of "features". Something w/ PR will block EX until PR is dropped
(and vice versa). Depending on "how" you envision things to work, I
would think you could use NL, CR, PR, and PW rather usefully.

> +
> +#define STATUS             "STATUS"
> +#define RESOURCE_NAME      "RESOURCE_NAME"
> +#define LOCK_ID            "LOCK_ID"
> +#define LOCK_MODE          "LOCK_MODE"
> +#define VM_PID             "VM_PID"
> +
> +#define BUFFERLEN          128

Yuck - for a stack variable...

> +
> +/* This will be set after dlm_controld is started. */
> +#define DLM_CLUSTER_NAME_PATH "/sys/kernel/config/dlm/cluster/cluster_name"
> +
> +VIR_LOG_INIT("locking.lock_driver_dlm");

This should be closer to the VIR_FROM_THIS - it's lost in the middle here.

> +
> +typedef struct _virLockInformation virLockInformation;
> +typedef virLockInformation *virLockInformationPtr;
> +
> +typedef struct _virLockManagerDlmResource virLockManagerDlmResource;
> +typedef virLockManagerDlmResource *virLockManagerDlmResourcePtr;

Instead of "Dlm" why not use "DLM" in names?  This is quite pervasive.

> +
> +typedef struct _virLockManagerDlmPrivate virLockManagerDlmPrivate;
> +typedef virLockManagerDlmPrivate *virLockManagerDlmPrivatePtr;
> +
> +typedef struct _virLockManagerDlmDriver virLockManagerDlmDriver;
> +typedef virLockManagerDlmDriver *virLockManagerDlmDriverPtr;
> +
> +typedef struct _virListWait virListWait;
> +typedef virListWait *virListWaitPtr;
> +
> +struct _virLockInformation {

Could just have gone with _virLockInfo to cut down on chars to type.

> +    virListHead entry;

I would think rather than (potentially long) linked lists, using hash
tables would be a bit more efficient. As would using the libvirt object
model.

> +    char    *name;
> +    uint32_t mode;
> +    uint32_t lkid;

I would think uint32_t would be limiting...  Why not just unsigned int?

> +    pid_t    vm_pid;
> +};
> +
> +struct _virLockManagerDlmResource {
> +    char    *name;
> +    uint32_t mode;
> +};
> +
> +struct _virLockManagerDlmPrivate {
> +    unsigned char vm_uuid[VIR_UUID_BUFLEN];
> +    char         *vm_name;
> +    pid_t         vm_pid;
> +    int           vm_id;
> +
> +    size_t        nresources;
> +    virLockManagerDlmResourcePtr resources;
> +
> +    bool          hasRWDisks;
> +};
> +
> +struct _virLockManagerDlmDriver {
> +    bool  autoDiskLease;
> +    bool  requireLeaseForDisks;
> +
> +	bool  purgeLockspace;
> +	char *lockspaceName;

Alignment problems - looks like usage of a <tab> instead of spaces.

> +    char *lockRecordFilePath;
> +};
> +
> +struct _virListWait {
> +    virMutex listMutex;
> +    virMutex fileMutex;
> +    virListHead list;
> +};
> +
> +static virLockManagerDlmDriverPtr driver;
> +static dlm_lshandle_t lockspace;
> +static virListWait lockListWait;

Multiple globals?  Why aren't the last 2 part of the @driver?

> +

Not a single comment for any of this functions - they're not all self
documenting.  Each should provide some sort of hint about arguments,
purpose, and returns.  Not just this function either.

> +static int virLockManagerDlmLoadConfig(const char *configFile)

None of your code uses our more standard declaration syntax:

static int
virLockManagerDlmLoadConfig(const char *configFile)


Also, I think the config "feature" should be introduced in a separate
patch that would include the config stuff from patch 3...

> +{
> +    virConfPtr conf = NULL;
> +    int ret = -1;
> +
> +    if (access(configFile, R_OK) == -1) {
> +        if (errno != ENOENT) {
> +            virReportSystemError(errno,
> +                                 _("Unable to access config file %s"),
> +                                 configFile);
> +            return -1;
> +        }
> +        return 0;
> +    }
> +
> +	if (!(conf = virConfReadFile(configFile, 0)))
       ^
The indention is wrong here.

> +		return -1;

and really wrong here.

> +
> +    if (virConfGetValueBool(conf, "auto_disk_leases", &driver->autoDiskLease) < 0)

These are some really long lines - typically like to keep "around" 80
characters with perhaps 1-3 chars of "))) {" type overflow.

Not everyone keeps 132 char wide screens

> +        goto cleanup;
> +
> +    driver->requireLeaseForDisks = !driver->autoDiskLease;
> +    if (virConfGetValueBool(conf, "require_lease_for_disks", &driver->requireLeaseForDisks) < 0)
> +        goto cleanup;
> +
> +    if (virConfGetValueBool(conf, "purge_lockspace", &driver->purgeLockspace) < 0)
> +        goto cleanup;
> +
> +    if (virConfGetValueString(conf, "lockspace_name", &driver->lockspaceName) < 0)

This leaks driver->lockspaceName from virLockManagerDlmInit

> +        goto cleanup;
> +
> +    if (virConfGetValueString(conf, "lock_record_file_path", &driver->lockRecordFilePath) < 0)'

The leaks driver->lockRecordFilePath from virLockManagerDlmInit

> +        goto cleanup;
> +
> +    ret = 0;
> +
> + cleanup:
> +    virConfFree(conf);
> +    return ret;
> +}

Two blank lines between functions for new code is the new normal.

> +
> +static int virLockManagerDlmToModeUint(const char *token)
> +{
> +    if (STREQ(token, PRMODE))
> +        return LKM_PRMODE;
> +    if (STREQ(token, EXMODE))
> +        return LKM_EXMODE;
> +
> +    return 0;
> +}
> +
> +static const char *virLockManagerDlmToModeText(const uint32_t mode)
> +{
> +    switch (mode) {
> +    case LKM_PRMODE:
> +        return PRMODE;
> +    case LKM_EXMODE:
> +        return EXMODE;
> +    default:
> +        return NULL;
> +    }
> +}

I actually think these two should be implemented via the
VIR_ENUM_{DECL|IMPL} macros.

That would result in API's that would allow conversion between type and
string (e.g. vir*ToString and vir*FromString)


> +
> +static virLockInformationPtr virLockManagerDlmRecordLock(const char *name,
> +                                                         const uint32_t mode,
> +                                                         const uint32_t lkid,
> +                                                         const pid_t vm_pid)

This feels "separable" as in something done after you've created the
base code and then want to add in whatever "feature" or "functionality"
this provides.

> +{
> +    virLockInformationPtr lock = NULL;
> +
> +    if (VIR_ALLOC(lock) < 0)
> +        goto error;> +
> +    if (VIR_STRDUP(lock->name, name) < 0)
> +        goto error;
> +
> +    lock->mode = mode;
> +    lock->lkid = lkid;
> +    lock->vm_pid = vm_pid;
> +
> +    virMutexLock(&(lockListWait.listMutex));
> +    virListAddTail(&lock->entry, &(lockListWait.list));
> +    virMutexUnlock(&(lockListWait.listMutex));

What is this list for?  I think this also is completely separable.  I
wouldn't use a list either... 10s of domains probably works fine.  100s
of domains perhaps not as well 1000s of domains starts being a
bottleneck for sure.

> +
> +    VIR_DEBUG("record lock sucessfully, lockName=%s lockMode=%s lockId=%d",

successfully

> +              NULLSTR(name), NULLSTR(virLockManagerDlmToModeText(mode)), lkid);
> +
> +    return lock;
> +
> + error:
> +    if (lock)
> +        VIR_FREE(lock->name);
> +    VIR_FREE(lock);

Since you'll be having @lock contain something that needs VIR_FREE,
implement a virLockDlmLockInfoFree type function that would free the
lock, e.g.:

    if (!lock)
        return;
    VIR_FREE(lock->name);
    VIR_FREE(lock);

> +    return NULL;
> +}
> +
> +static void virLockManagerDlmWriteLock(virLockInformationPtr lock, int fd, bool status)

When you had multiple arguments in a previous decl you split them across
multiple lines - should do the same here.

> +{
> +    char buffer[BUFFERLEN] = {0};

Rather than a stack based length...  Allocate on the fly.

> +    off_t offset = 0, rv = 0;
> +
> +    if (!lock) {

So is this because you don't want to support calling (NULL, fd, status)
or that you don't want to support a @lock parameter that happens to be
NULL "for some reason".

If the former, then use ATTRIBUTE_NONNULL
If the latter, then why is this a void function how do we know if it
succeed or failed?

> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("lock is NULL"));
> +        return;
> +    }
> +
> +    /*
> +     * STATUS RESOURCE_NAME LOCK_MODE VM_PID\n
> +     *      6            64         9     10
> +     * 93 = 6 + 1 + 64 + 1 + 9 + 1 + 10 + 1

Each of the +1 is a "separator" char ' '?  And it's guaranteed somewhere
that someone hasn't created/used a resource name of "happy gilmore".

> +     */
> +    offset = 93 * lock->lkid;

'*'?? Wait what?

> +	rv = lseek(fd, offset, SEEK_SET);

alignment issues (e.g. <tab>)

> +	if (rv < 0) {
> +		virReportSystemError(errno,
> +							 _("unable to lseek fd '%d'"),
> +                             fd);

Even more and different alignment issues.

> +        return;
> +    }
> +
> +    snprintf(buffer, sizeof(buffer), "%6d %64s %9s %10jd\n", \
> +             status, lock->name,
> +             NULLSTR(virLockManagerDlmToModeText(lock->mode)),> +             (intmax_t)lock->vm_pid);

The virAsprintf(buffer, would I think be a better mechanism...

Having a NULL returned from *ToModeText would be killer wouldn't it?
Why does the mode matter anyway in the buffer you're creating?

An "intmax_t" does that work consistently and as expected everywhere?
and 10 places only - would seem intmax could be larger than 10 numbers
so that would seem to indicate some truncation could eventually happen
thus causing algorithm failures.

> +
> +    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
> +        virReportSystemError(errno,
> +                             _("unable to write lock information '%s' to file '%s'"),
> +                             buffer, NULLSTR(driver->lockRecordFilePath));
> +        return;
> +    }

Not sure I understand the reason of the file

> +
> +    VIR_DEBUG("write '%s' to fd=%d", buffer, fd);
> +
> +    fdatasync(fd);
> +
> +    return;

How do we know if it failed?

> +}
> +
> +static void virLockManagerDlmAdoptLock(char *raw) {
> +    char *str = NULL, *subtoken = NULL, *saveptr = NULL, *endptr = NULL;
> +    int i = 0, status = 0;
> +    char *name = NULL;
> +    uint32_t mode = 0;
> +    pid_t vm_pid = 0;
> +    struct dlm_lksb lksb = {0};
> +
> +    /* every line is the following format:
> +     *   STATUS RESOURCE_NAME LOCK_MODE VM_PID
> +     */

See/use virStringSplit* I think it would be easier that whatever it is
you're doing below.

> +    for (i = 0, str = raw, status = 0; ; str = NULL, i++) {
> +        subtoken = strtok_r(str, " \n", &saveptr);
> +        if (subtoken == NULL)
> +            break;
> +
> +        switch(i) {
> +        case 0:

These are magic numbers?

> +            if (virStrToLong_i(subtoken, &endptr, 10, &status) < 0) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR,
> +                               _("cannot extract lock status '%s'"), subtoken);
> +                goto cleanup;
> +            }
> +            break;
> +        case 1:
> +            if (VIR_STRDUP(name, subtoken) != 1)
> +                goto cleanup;
> +            break;
> +        case 2:
> +            mode = virLockManagerDlmToModeUint(subtoken);
> +            if (!mode)
> +                goto cleanup;
> +            break;
> +        case 3:
> +            if ((virStrToLong_i(subtoken, &endptr, 10, &vm_pid) < 0) || !vm_pid) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR,
> +                               _("cannot extract lock vm_pid '%s'"), subtoken);
> +                goto cleanup;
> +            }
> +            break;
> +        default:
> +            goto cleanup;
> +            break;
> +        }
> +
> +        if (status != 1)
> +            goto cleanup;
> +    }
> +
> +    if (i != 4)
> +        goto cleanup;
> +
> +    /* copy from `lm_adopt_dlm` in daemons/lvmlockd/lvmlockd-dlm.c of lvm2:
> +	 *   dlm returns 0 for success, -EAGAIN if an orphan is

bad indent

> +     *   found with another mode, and -ENOENT if no orphan.
> +     *
> +     *   cast/bast/param are (void *)1 because the kernel
> +     *   returns errors if some are null.
> +     */
> +
> +    status = dlm_ls_lockx(lockspace, mode, &lksb, LKF_PERSISTENT|LKF_ORPHAN,
> +                          name, strlen(name), 0,
> +                          (void *)1, (void *)1, (void *)1,

^^ These would seem to relate to the same parametes used for
dlm_lock[_wait].  They are actually quite useful when it comes to lock
conversions.  You may want to considering investigating that...

Still seems like a "bug" to me to need to pass 1 as a callback function
or a blocking callback function.

> +                          NULL, NULL);

Why not dlm_is_lock() instead since you're not using xid and timeout

> +    if (status) {

if (status < 0) or (status == -1)

> +        virReportSystemError(errno,
> +                             _("unable to adopt lock, rv=%d lockName=%s lockMode=%s"),

s/rv/status

> +                             status, name, NULLSTR(virLockManagerDlmToModeText(mode)));
> +        goto cleanup;
> +    }
> +
> +    if (!virLockManagerDlmRecordLock(name, mode, lksb.sb_lkid, vm_pid)) {

Hmm.. I really don't understand the Recording yet.

> +        virReportSystemError(errno,
> +                             _("unable to record lock information, "
> +                                 "lockName=%s lockMode=%s lockId=%d vm_pid=%jd"),
> +                             NULLSTR(name), NULLSTR(virLockManagerDlmToModeText(mode)),

How can name be possible NULL here but non NULL above?

> +                             lksb.sb_lkid, (intmax_t)vm_pid);

(intmax_t) limited...

> +    }
> +
> +
> + cleanup:
> +    if (name)

Unnecessary to have if (name), VIR_FREE does that for you

> +        VIR_FREE(name);> +
> +    return;

So how would any one know this is successful or not?

> +}
> +
> +static int virLockManagerDlmPrepareLockList(const char *lockRecordFilePath)
> +{
> +    FILE *fp = NULL;
> +    int line = 0;
> +    size_t n = 0;
> +    ssize_t count = 0;
> +    char *buffer = NULL;
> +
> +    fp = fopen(lockRecordFilePath, "r");
> +    if (!fp) {
> +        virReportSystemError(errno,
> +                             _("unable to open '%s'"), lockRecordFilePath);
> +        return -1;
> +    }
> +
> +    /* lock information is from the second line */

The format of this file really needs to be documented somehow

> +    for (line = 0; !feof(fp); line++) {
> +        count = getline(&buffer, &n, fp);
> +        if (count <= 0)
> +            break;
> +
> +        switch (line) {
> +        case 0:
> +            break;
> +        default:
> +            virLockManagerDlmAdoptLock(buffer);
> +            break;
> +        }
> +    }
> +
> +    VIR_FORCE_FCLOSE(fp);
> +    VIR_FREE(buffer);
> +
> +    return 0;
> +}
> +
> +static int virLockManagerDlmGetLocalNodeId(uint32_t *nodeId)
> +{
> +    cpg_handle_t handle = 0;
> +    int rv = -1;
> +
> +    if (cpg_model_initialize(&handle, CPG_MODEL_V1, NULL, NULL) != CS_OK) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("unable to create a new connection to the CPG service"));
> +		return -1;

alignment issue

> +    }
> +
> +	if( cpg_local_get(handle, nodeId) != CS_OK) {

    if (cpg...

Might be nice to know what error is returned

    xx = cpg*
    if (xx != CS_OK) {

> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("unable to get the local node id by the CPG service"));

include xx in your output.

> +        goto cleanup;
> +    }
> +
> +    VIR_DEBUG("the local nodeid=%u", *nodeId);
> +
> +    rv = 0;
> +
> + cleanup:
> +    if (cpg_finalize(handle) != CS_OK)
> +        VIR_WARN("unable to finalize the CPG service");

but still OK to return @rv...

> +
> +	return rv;
> +}
> +
> +static int virLockManagerDlmDumpLockList(const char *lockRecordFilePath)
> +{
> +    virLockInformationPtr theLock = NULL;
> +    char buffer[BUFFERLEN] = {0};

Why a stack variable?

> +    int fd = -1, rv = -1;
> +
> +    /* not need mutex because of only one instance would be initialized */
> +    fd = open(lockRecordFilePath, O_WRONLY|O_CREAT|O_TRUNC, LOCK_RECORD_FILE_MODE);

That's a very strange set of O_* values for a dump function.

> +    if (fd < 0) {
> +        virReportSystemError(errno,
> +                             _("unable to open '%s'"),
> +                             lockRecordFilePath);
> +        return -1;
> +    }
> +
> +    snprintf(buffer, sizeof(buffer), "%6s %64s %9s %10s\n", \

Why the \

> +                    STATUS, RESOURCE_NAME, LOCK_MODE, VM_PID);

Again virAsprintf

> +    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
> +        virReportSystemError(errno,
> +                             _("unable to write '%s' to '%s'"),
> +                             buffer, lockRecordFilePath);
> +        goto cleanup;
> +    }
> +
> +    virListForEachEntry(theLock, &(lockListWait.list), entry) {
> +        virLockManagerDlmWriteLock(theLock, fd, 1);
> +    }

No need for the { } because just one line... There are those that really
don't like that type of obfuscation.

> +
> +    if (VIR_CLOSE(fd) < 0) {
> +        virReportSystemError(errno,
> +                             _("unable to close file '%s'"),
> +                             lockRecordFilePath);
> +        goto cleanup;
> +    }
> +
> +    rv = 0;
> +
> + cleanup:
> +    if (rv)
> +        VIR_FORCE_CLOSE(fd);
> +    return rv;
> +}
> +
> +static int virLockManagerDlmSetupLockRecordFile(const char *lockRecordFilePath,
> +                                                const bool newLockspace,
> +                                                const bool purgeLockspace)
> +{
> +    uint32_t nodeId = 0;
> +
> +    /* there maybe some orphan locks recorded in the lock record file which
> +     * should be adopted if lockspace is opened instead of created, we adopt
> +     * them then add them in the list.
> +     */
> +    if (!newLockspace &&
> +        virLockManagerDlmPrepareLockList(lockRecordFilePath)) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("unable to adopt locks from '%s'"),
> +                       NULLSTR(lockRecordFilePath));

If @lockRecordFilePath cannot be passed as NULL, then just use
ATTRIBUTE_NONNULL ... If though it's some variable that may be NULL,
then <sigh>.

> +        return -1;
> +    }
> +
> +    /* purgeLockspace flag means purging orphan locks belong to any process
> +     * in this lockspace.
> +     */
> +    if (purgeLockspace && !virLockManagerDlmGetLocalNodeId(&nodeId)) {
> +        if (dlm_ls_purge(lockspace, nodeId, 0)) {
> +            VIR_WARN("node=%u purge DLM locks failed in lockspace=%s",
> +                     nodeId, NULLSTR(driver->lockspaceName));
> +        }
> +        else
> +            VIR_DEBUG("node=%u purge DLM locks success in lockspace=%s",
> +                      nodeId, NULLSTR(driver->lockspaceName));

Both the if {} and the else need {}

> +    }
> +
> +    /* initialize the lock record file */
> +    if (virLockManagerDlmDumpLockList(lockRecordFilePath)) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("unable to initialize the lock record file '%s'"),
> +                       lockRecordFilePath);
> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +
> +static int virLockManagerDlmSetup(void)
> +{
> +    bool newLockspace = false;
> +
> +    virListHeadInit(&(lockListWait.list));
> +    if ((virMutexInit(&(lockListWait.listMutex)) < 0) ||
> +        (virMutexInit(&(lockListWait.fileMutex)) < 0)) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("unable to initialize mutex"));
> +        return -1;
> +    }

We have have an "internal" list of locks and use some sort of backing
file for those, true?

> +
> +
> +    /* check whether dlm is running or not */
> +    if (access(DLM_CLUSTER_NAME_PATH, F_OK)) {

IOW: !virFileExists()

> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("check dlm_controld, ensure it has setuped"));
> +        return -1;
> +    }
> +
> +    /* open lockspace, create it if it doesn't exist */
> +    lockspace = dlm_open_lockspace(driver->lockspaceName);
> +    if (!lockspace) {
> +        lockspace = dlm_create_lockspace(driver->lockspaceName, DLM_LOCKSPACE_MODE);
> +        if (!lockspace) {
> +            virReportSystemError(errno, "%s",
> +                                 _("unable to open and create DLM lockspace"));
> +            return -1;
> +        }
> +        newLockspace = true;
> +    }

@lockspace should be stored in driver-> I would think.

> +
> +    /* create thread to receive notification from kernel */

notification for what? what gets called?

> +    if (dlm_ls_pthread_init(lockspace)) {
> +        if (errno != EEXIST) {
> +            virReportSystemError(errno, "%s",
> +                                 _("unable to initialize lockspace"));
> +            return -1;
> +        }
> +    }
> +
> +    /* we need file to record lock information used by rebooted libvirtd */
> +    if (virLockManagerDlmSetupLockRecordFile(driver->lockRecordFilePath,
> +                                             newLockspace,
> +                                             driver->purgeLockspace)) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("unable to initialize DLM lock file"));

Well not able to initialize the recording file technically.

> +        return -1;
> +    }
> +
> +    return 0;
> +}
> +


Above here seems to be able to be split from below here - probably even
possible to create a something that doesn't do much in the first pass,
then adds more functionality as we go along.  More patches, but easier
to review and read.

IOW, multple patches please.

> +static int virLockManagerDlmDeinit(void);

Why a forward decl?  Why not just define the Deinit first?


> +
> +static int virLockManagerDlmInit(unsigned int version,
> +                                 const char *configFile,
> +                                 unsigned int flags)
> +{
> +    VIR_DEBUG("version=%u configFile=%s flags=0x%x", version, NULLSTR(configFile), flags);
> +
> +    virCheckFlags(0, -1);
> +
> +    if (driver)
> +        return 0;
> +
> +    if (geteuid() != 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("dlm lock requires root privileges"));
> +        return -1;
> +    }

hmmm... system only and no support for session mode, something that'll
need to be documented.

> +
> +    if (VIR_ALLOC(driver) < 0)
> +        return -1;
> +
> +    driver->autoDiskLease = true;
> +    driver->requireLeaseForDisks = !driver->autoDiskLease;
> +    driver->purgeLockspace = true;
> +
> +    if (virAsprintf(&driver->lockspaceName,
> +                  "%s", DLM_LOCKSPACE_NAME) < 0)
> +        goto error;

This is nothing more than a VIR_STRDUP(driver->lockspaceName,
DLM_LOCKSPACE_NAME)...
'
> +
> +    if (virAsprintf(&driver->lockRecordFilePath,
> +                  "%s", LOCK_RECORD_FILE_PATH) < 0)
> +        goto error;

Likewise on building the default name.

Eww... You're recording locks "forever"?  How do we distinguish when
locks are removed?

> +
> +    if (virLockManagerDlmLoadConfig(configFile) < 0)
> +        goto error;
> +
> +    if (virLockManagerDlmSetup() < 0)
> +        goto error;
> +
> +    return 0;
> +
> + error:
> +    virLockManagerDlmDeinit();
> +    return -1;
> +}
> +
> +static int virLockManagerDlmDeinit(void)
> +{
> +    virLockInformationPtr theLock = NULL;
> +
> +    if (!driver)
> +        return 0;
> +
> +    if(lockspace)

if (lockspace), although I would think @driver would be able to contain
this rather than another global.

> +        dlm_close_lockspace(lockspace);
> +
> +    /* not care about whether adopting lock or not,
> +     * just release those to prevent memory leak
> +     */
> +    virListForEachEntry(theLock, &(lockListWait.list), entry) {
> +        virListDelete(&(theLock->entry));
> +        VIR_FREE(theLock->name);
> +        VIR_FREE(theLock);

This is of course where a virLockManagerDlmLockInfoFree(theLock) would
come in handy.

> +    }'
> +
> +    VIR_FREE(driver->lockspaceName);
> +    VIR_FREE(driver->lockRecordFilePath);

Will a botched Record file cause an unending loop of Init/Deinit?

> +    VIR_FREE(driver);
> +
> +    return 0;
> +}
> +
> +static int virLockManagerDlmNew(virLockManagerPtr lock,
> +                                unsigned int type,
> +                                size_t nparams,
> +                                virLockManagerParamPtr params,
> +                                unsigned int flags)
> +{
> +    virLockManagerDlmPrivatePtr priv = NULL;
> +    size_t i;
> +
> +    virCheckFlags(VIR_LOCK_MANAGER_NEW_STARTED, -1);
> +
> +    if (!driver) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("dlm plugin is not initialized"));
> +        return -1;
> +    }
> +
> +    if (type != VIR_LOCK_MANAGER_OBJECT_TYPE_DOMAIN) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                       _("unsupported object type %d"), type);
> +        return -1;
> +    }

See I prefer the _lockd implementation which uses a case over the
_sanlock implementation which does this same

> +
> +    if (VIR_ALLOC(priv) < 0)
> +        return -1;

@priv is leaked if return -1 in subsequent code.
> +
> +    for (i = 0; i< nparams; i++) {
> +        if (STREQ(params[i].key, "uuid")) {
> +            memcpy(priv->vm_uuid, params[i].value.uuid, VIR_UUID_BUFLEN);
> +        } else if (STREQ(params[i].key, "name")) {
> +            if (VIR_STRDUP(priv->vm_name, params[i].value.str) < 0)
> +                return -1;
> +        } else if (STREQ(params[i].key, "id")) {
> +            priv->vm_id = params[i].value.ui;
> +        } else if (STREQ(params[i].key, "pid")) {
> +            priv->vm_pid = params[i].value.iv;
> +        } else if (STREQ(params[i].key, "uri")) {
> +            /* there would be a warning in some case according to the history patch,
> +             * so ignored
> +             */
> +        } else {
> +            virReportError(VIR_ERR_INTERNAL_ERROR,
> +                           _("unexpected parameter %s for object"),
> +                           params[i].key);

but let's continue anyway <sigh> - similar to _lockd code.

> +        }
> +    }

It's too bad the _sanlock code didn't make the decision to somehow unify
this code.  Nonetheless, perhaps that's a good single "pre" patch for
this series in order to reduce the cut-copy-paste.  Yes, I know "uri"
causes issues. It's not that difficult to figure out.

> +
> +    /* check the following to prevent some unexpexted state in some case */
> +    if (priv->vm_pid == 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("missing PID parameter for domain object"));
> +        return -1;
> +    }
> +    if (!priv->vm_name) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("missing name parameter for domain object"));
> +        return -1;
> +    }
> +
> +    if (priv->vm_id == 0) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("missing ID parameter for domain object"));
> +        return -1;
> +    }
> +    if (!virUUIDIsValid(priv->vm_uuid)) {
> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("missing UUID parameter for domain object"));
> +        return -1;
> +    }

Likewise, the error checking would seem to be able to be combined. Scary
that the _sanlock code doesn't do much.. err.. any error checking on
whether these parameters we found.

> +
> +    lock->privateData = priv;
> +
> +    return 0;
> +}
> +
> +static void virLockManagerDlmFree(virLockManagerPtr lock)

This would seem to be more of a PrivateFree type call...

> +{
> +    virLockManagerDlmPrivatePtr priv = lock->privateData;
> +    size_t i;
> +
> +    if (!priv)
> +        return;
> +
> +    for (i = 0; i < priv->nresources; i++)
> +        VIR_FREE(priv->resources[i].name);
> +
> +    VIR_FREE(priv->resources);
> +    VIR_FREE(priv->vm_name);
> +    VIR_FREE(priv);
> +    lock->privateData = NULL;
> +
> +    return;
> +}
> +
> +static int virLockManagerDlmAddResource(virLockManagerPtr lock,
> +                                        unsigned int type, const char *name,

Each param on it's own line

> +                                        size_t nparams,
> +                                        virLockManagerParamPtr params,
> +                                        unsigned int flags)
> +{
> +    virLockManagerDlmPrivatePtr priv = lock->privateData;
> +    char *newName = NULL;
> +
> +    virCheckFlags(VIR_LOCK_MANAGER_RESOURCE_READONLY |
> +                  VIR_LOCK_MANAGER_RESOURCE_SHARED, -1);
> +
> +    /* Treat read only resources as a no-op lock request */

Why couldn't these be CR or NL locks? That way you can track all locks
for everything...  I assume no chance they convert to write at some time.

> +    if (flags & VIR_LOCK_MANAGER_RESOURCE_READONLY)
> +        return 0;
> +

BTW: I think the _sanlock implementation is a bit nicer with the helper
functions to add the disk or lease resource.

> +    switch (type) {
> +    case VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK:
> +            if (params || nparams) {
> +                virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                               _("unexpected parameters for disk resource"));
> +                return -1;
> +            }
> +
> +            if (!driver->autoDiskLease) {
> +                if (!(flags & (VIR_LOCK_MANAGER_RESOURCE_SHARED |
> +                               VIR_LOCK_MANAGER_RESOURCE_READONLY))) {

So we earlier bailed because READONLY was set, so why are we checking
this again?

> +                    priv->hasRWDisks = true;
> +                    /* ignore disk resource without error */
> +                    return 0;
> +                }
> +            }
> +
> +            if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, &newName) < 0)

Interesting _lockd uses SHA256 and _sanlock uses MD5...

> +                goto cleanup;
> +
> +        break;
> +
> +    case VIR_LOCK_MANAGER_RESOURCE_TYPE_LEASE:

hmm... considering the @params and @nparams check in the other case,
would this case need to check if they exist or even use them?  Seems the
_lockd and _sanlock code does something with them. Although I like the
conf variables earlier, I think some "common code" could be generated.

> +        /* we need format the lock information, so the lock name must be the constant length */
> +        if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, &newName) < 0)
> +            goto cleanup;

Neither _lockd nor _sandisk do any sort of Hash on this name.

> +
> +        break;
> +
> +    default:
> +        virReportError(VIR_ERR_INTERNAL_ERROR,
> +                _("unknown lock manager object type %d"),
> +                type);

Incorrect indents

> +        return -1;
> +    }
> +
> +    if (VIR_EXPAND_N(priv->resources, priv->nresources, 1) < 0)
> +        goto cleanup;
> +
> +    priv->resources[priv->nresources-1].name = newName;

Personally not a fan of possibly long linked lists. I like the hash
table model.  Create a small initial hash table and then add/remove
elements.

So there's no check if there's the same name in the list - we just
happily add one?  And this would be a VIR_STEAL_PTR

> +
> +    if (!!(flags & VIR_LOCK_MANAGER_RESOURCE_SHARED))

Ew !!

> +        priv->resources[priv->nresources-1].mode = LKM_PRMODE;
> +    else
> +        priv->resources[priv->nresources-1].mode = LKM_EXMODE;
> +

Understanding how dlm lock transitions work - I'm having a hard time
with just having 2 modes here. Seems to me to remove the best parts of
the dlm code.  You have a resource... at times you care to EX lock it
for write, while others it's OK to be shared for read. It's an ebb and
flow.  Of course whether the callers really ascribe to the same model is
something I'm less aware of as I haven't gone down the path of where/how
the virLockManager* API's are used in the code recently.

> +    return 0;

If you initialized an 'int ret = -1;' at the top and only set it to 0
here then you could fall through to cleanup too as long as you've used
the VIR_STEAL_PTR above on newName.

> +
> + cleanup:
> +    VIR_FREE(newName);
> +
> +    return -1;
> +}
> +
> +static int virLockManagerDlmAcquire(virLockManagerPtr lock,
> +                                    const char *state ATTRIBUTE_UNUSED,
> +                                    unsigned int flags,
> +                                    virDomainLockFailureAction action ATTRIBUTE_UNUSED,
> +                                    int *fd)
> +{
> +    virLockManagerDlmPrivatePtr priv = lock->privateData;
> +    virLockInformationPtr theLock = NULL;
> +    struct dlm_lksb lksb = {0};
> +    int rv = -1, theFd = -1;
> +    size_t i;
> +
> +    virCheckFlags(VIR_LOCK_MANAGER_ACQUIRE_REGISTER_ONLY |
> +                  VIR_LOCK_MANAGER_ACQUIRE_RESTRICT, -1);
> +
> +    /* allowed to start a guest which has read/write disks, but without any leases */
> +    if (priv->nresources == 0 &&
> +        priv->hasRWDisks &&
> +        driver->requireLeaseForDisks) {
> +        virReportError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
> +                       _("read/write, exclusive access, disks were present, but no leases specified"));
> +        return -1;
> +    }
> +
> +    /* accorting to git patch history, add `fd` parameter in order to

according

> +     * 'ensure sanlock socket is labelled with the VM process label',
> +     * however, fixing sanlock socket security labelling remove related
> +     * code. Now, `fd` parameter is useless.
> +     */
> +    if (fd)
> +        *fd = -1;

But you just set it to -1??!

> +
> +    if(!lockspace) {

    if (!lockspace)

or driver->lockspace.

> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("lockspace is not opened"));
> +        return -1;
> +    }
> +
> +    if (!(flags & VIR_LOCK_MANAGER_ACQUIRE_REGISTER_ONLY)) {
> +        VIR_DEBUG("Acquiring object %zu", priv->nresources);
> +
> +        theFd = open(driver->lockRecordFilePath, O_RDWR);
> +        if (theFd < 0) {
> +            virReportSystemError(errno,
> +                                 _("unable to open '%s'"), driver->lockRecordFilePath);
> +            return -1;
> +        }
> +
> +        for (i = 0; i < priv->nresources; i++) {
> +            VIR_DEBUG("Acquiring object %zu", priv->nresources);
> +
> +            memset(&lksb, 0, sizeof(lksb));
> +            rv = dlm_ls_lock_wait(lockspace, priv->resources[i].mode,
> +                                  &lksb, LKF_NOQUEUE|LKF_PERSISTENT,
> +                                  priv->resources[i].name, strlen(priv->resources[i].name),

This is where I wonder why we cannot pass the path to the disk to dlm
instead of the hash

> +                                  0, NULL, NULL, NULL);

So this function allows the NULL parameters but the other one required
(void *)1 - something doesn't seem right.

> +            /* both `rv` and `lksb.sb_status` equal 0 means lock sucessfully */
> +            if (rv || lksb.sb_status) {
> +                if (lksb.sb_status == EAGAIN)
> +                    virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                                   _("failed to acquire lock: the lock could not be granted"));
> +                else {
> +                    virReportSystemError(errno,
> +                                         _("failed to acquire lock: rv=%d lockStatus=%d"),
> +                                         rv, lksb.sb_status);
> +                }

If one side of the if else uses {}, then both must

> +                /* rv would be 0 although acquiring lock failed */
> +                rv = -1;
> +                goto cleanup;
> +            }
> +
> +            theLock = virLockManagerDlmRecordLock(priv->resources[i].name,
> +                                                  priv->resources[i].mode,
> +                                                  lksb.sb_lkid,
> +                                                  priv->vm_pid);
> +            if (!theLock) {
> +			virReportSystemError(errno,
> +                                     _("unable to record lock information, "
> +                                        "lockName=%s lockMode=%s lockId=%d vm_pid=%jd"),
> +                                     NULLSTR(priv->resources[i].name),

Really it can be NULL?

> +                                     NULLSTR(virLockManagerDlmToModeText(priv->resources[i].mode)),
> +                                     lksb.sb_lkid, (intmax_t)priv->vm_pid);

add a blank line to help readability

> +                /* record lock failed, we can't save lock information in memory, so release it */
> +                rv = dlm_ls_unlock_wait(lockspace, lksb.sb_lkid, 0, &lksb);
> +                if (!rv)

Any dlm_ls* call should check "rv < 0" instead of "!rv" for failure as 0
is returned on success from how I'm reading the man pages...

> +                    virReportSystemError(errno,
> +                                         _("failed to release lock: rv=%d lockStatue=%d"),

s/lockStatus/lksbStatus/

> +                                         rv, lksb.sb_status);

Well this certainly isn't good!
> +                rv = -1;
> +                goto cleanup;
> +            }
> +
> +            virMutexLock(&(lockListWait.fileMutex));
> +            virLockManagerDlmWriteLock(theLock, theFd, 1);
> +            virMutexUnlock(&(lockListWait.fileMutex));
> +        }
> +
> +        if(VIR_CLOSE(theFd) < 0) {

    if (VIR...

> +            virReportSystemError(errno,
> +                                 _("unable to save file '%s'"),
> +                                driver->lockRecordFilePath);
> +            goto cleanup;
> +        }
> +    }
> +
> +    if (flags & VIR_LOCK_MANAGER_ACQUIRE_RESTRICT) {
> +        /* no daemon watches this fd, do nothing here, just close lockspace before `execv`
> +         * `dlm_close_lockspace` always return 0, so ignore return value
> +         */
> +        ignore_value(dlm_close_lockspace(lockspace));
> +        lockspace = NULL;
> +    }
> +
> +    rv = 0;
> +
> + cleanup:
> +    if (rv)
> +        VIR_FORCE_CLOSE(theFd);
> +    return rv;
> +}
> +
> +static void virLockManagerDlmDeleteLock(const virLockInformationPtr lock,
> +                                        const char *lockRecordFilePath)
> +{
> +    int fd = -1;
> +
> +    if (!lock)
> +        return;
> +
> +    virMutexLock(&(lockListWait.listMutex));
> +    virListDelete(&(lock->entry));
> +    virMutexUnlock(&(lockListWait.listMutex));
> +
> +    fd = open(lockRecordFilePath, O_RDWR);
> +    if (fd < 0) {
> +        virReportSystemError(errno,
> +                             _("unable to open '%s'"), lockRecordFilePath);
> +        goto cleanup;
> +    }
> +
> +    virMutexLock(&(lockListWait.fileMutex));
> +    virLockManagerDlmWriteLock(lock, fd, 0);

Oh - the last parameter should be false here (and true in some other
places)... Still not clear on the mgmt of that file.

> +    virMutexUnlock(&(lockListWait.fileMutex));
> +
> +    if (VIR_CLOSE(fd) < 0) {
> +        virReportSystemError(errno,
> +                             _("unable to save file '%s'"),
> +                             lockRecordFilePath);
> +        VIR_FORCE_CLOSE(fd);
> +    }
> +
> + cleanup:
> +    VIR_FREE(lock->name);
> +    VIR_FREE(lock);
> +}
> +
> +static int virLockManagerDlmRelease(virLockManagerPtr lock,
> +                                    char **state,
> +                                    unsigned int flags)
> +{
> +    virLockManagerDlmPrivatePtr priv = lock->privateData;
> +    virLockManagerDlmResourcePtr resource = NULL;
> +    virLockInformationPtr theLock = NULL;
> +    struct dlm_lksb lksb = {0};
> +    int rv = -1;
> +    size_t i;
> +
> +    virCheckFlags(0, -1);
> +
> +    if(state)

if (state)

> +        *state = NULL;
> +
> +    if(!lockspace) {

if (!lockspace)

> +        virReportError(VIR_ERR_INTERNAL_ERROR, "%s",
> +                       _("lockspace is not opened"));
> +        return -1;
> +    }
> +
> +    for (i = 0; i < priv->nresources; i++) {
> +        resource = priv->resources + i;
> +
> +        virListForEachEntry (theLock, &(lockListWait.list), entry) {
> +            if((theLock->vm_pid == priv->vm_pid)    &&
> +               STREQ(theLock->name, resource->name) &&
> +               (theLock->mode == resource->mode)) {
> +
> +                /*
> +                 * there are some locks from adopting, the existence of `(void *)1`
> +                 * when adopting makes 'terminated by signal SIGSEGV (Address
> +                 * boundary error)' error appear.

This is here for your purposes?

> +                 *
> +                 * The following code reference to lvm2 project's implement.
> +                 */
> +                lksb.sb_lkid = theLock->lkid;
> +                rv = dlm_ls_lock_wait(lockspace, LKM_NLMODE,

So release is more like move to NL mode ...  kind of what I was saying
earlier I think for Acquire... You get a NL model lock and then you
convert to PR/PW/EX whatever...

> +                                      &lksb, LKF_CONVERT,
> +                                      resource->name,
> +                                      strlen(resource->name),
> +                                      0, NULL, NULL, NULL);
> +
> +                if (rv < 0) {
> +                    virReportSystemError(errno,
> +                                         _("failed to convert lock: rv=%d lockStatus=%d"),

lksbStatus

> +                                         rv, lksb.sb_status);
> +                    goto cleanup;
> +                }
> +
> +                /* don't care whether the lock is released or not,
> +                 * it will be automatically released after the libvirtd dead
> +                 */
> +                virLockManagerDlmDeleteLock(theLock, driver->lockRecordFilePath);

I think we should care... Zero fault tolerance problably not good.

> +
> +                rv = dlm_ls_unlock_wait(lockspace, lksb.sb_lkid, 0, &lksb);

0 for flags seems odd considering what you're trying to do (I think)

> +                if (rv < 0) {
> +                    virReportSystemError(errno,
> +                                         _("failed to release lock: rv=%d lockStatus=%d"),

lksbStatus

> +                                         rv, lksb.sb_status);
> +                    goto cleanup;
> +                }
> +
> +                break;
> +            }
> +        }
> +    }
> +
> +    rv = 0;
> +
> + cleanup:
> +    return rv;
> +}
> +
> +static int virLockManagerDlmInquire(virLockManagerPtr lock ATTRIBUTE_UNUSED,
> +                                    char **state,
> +                                    unsigned int flags)
> +{
> +    /* not support mannual lock, so this function almost does nothing */

'manual'

> +    virCheckFlags(0, -1);
> +
> +    if (state)
> +        *state = NULL;

Why would this return NULL?  My recollection is there is some sort of
inquiry API for dlm.

> +
> +    return 0;
> +}
> +
> +virLockDriver virLockDriverImpl =
> +{
> +    .version = VIR_LOCK_MANAGER_VERSION,
> +
> +    .flags = VIR_LOCK_MANAGER_USES_STATE, // currently not used

Using // as a comment is something else that should be flagged by make
syntax-check

> +
> +    .drvInit = virLockManagerDlmInit,
> +    .drvDeinit = virLockManagerDlmDeinit,
> +
> +    .drvNew = virLockManagerDlmNew,
> +    .drvFree = virLockManagerDlmFree,
> +
> +    .drvAddResource = virLockManagerDlmAddResource,
> +
> +    .drvAcquire = virLockManagerDlmAcquire,
> +    .drvRelease = virLockManagerDlmRelease,
> +    .drvInquire = virLockManagerDlmInquire,
> +};


The following in particular is *easily* extractible into its own patch
to be considered on its own for inclusion. If it were, there's lots of
existing code that I would think could use something like this -
although that's probably more than you want to be doing.

None of the following are documented...

> diff --git a/src/util/virlist.h b/src/util/virlist.h
> new file mode 100644
> index 000000000..4ac3626c7
> --- /dev/null
> +++ b/src/util/virlist.h
> @@ -0,0 +1,110 @@
> +/*
> + * virlist.h: methods for managing list, logic is copied from Linux Kernel
> + *
> + * Copyright (C) 2018 SUSE LINUX Products, Beijing, China.
> + *
> + * This library is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU Lesser General Public
> + * License as published by the Free Software Foundation; either
> + * version 2.1 of the License, or (at your option) any later version.
> + *
> + * This library is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> + * Lesser General Public License for more details.
> + *
> + * You should have received a copy of the GNU Lesser General Public
> + * License along with this library.  If not, see
> + * <http://www.gnu.org/licenses/>.
> + *
> + */
> +
> +#ifndef __VIR_LIST_H
> +#define __VIR_LIST_H
> +
> +#include <stdbool.h>
> +
> +typedef struct _virListHead virListHead;
> +typedef virListHead *virListHeadPtr;
> +
> +struct _virListHead {
> +    struct _virListHead *next, *prev;
> +};
> +
> +static inline void
> +virListHeadInit(virListHeadPtr name)
> +{
> +    name->next = name;
> +    name->prev = name;
> +}
> +
> +static inline void
> +__virListAdd(virListHeadPtr entry,
> +             virListHeadPtr prev, virListHeadPtr next)
> +{
> +    next->prev = entry;
> +    entry->next = next;
> +    entry->prev = prev;
> +    prev->next = entry;
> +}
> +
> +static inline void
> +virListAdd(virListHeadPtr entry, virListHeadPtr head)
> +{
> +    __virListAdd(entry, head, head->next);
> +}
> +
> +static inline void
> +virListAddTail(virListHeadPtr entry, virListHeadPtr head)
> +{
> +    __virListAdd(entry, head->prev, head);
> +}
> +
> +static inline void
> +__virListDelete(virListHeadPtr prev, virListHeadPtr next)
> +{
> +    next->prev = prev;
> +    prev->next = next;
> +}
> +
> +static inline void
> +virListDelete(virListHeadPtr entry)

More like Remove than Delete...  Delete to me implies @entry could be
deleted too

> +{
> +    __virListDelete(entry->prev, entry->next);
> +}
> +
> +static inline bool
> +virListEmpty(virListHeadPtr head)
> +{
> +    return head->next == head;
> +}
> +
> +#ifndef virContainerOf
> +#define virContainerOf(ptr, type, member) \
> +    (type *)((char *)(ptr) - (char *) &((type *)0)->member)
> +#endif
> +
> +#define virListEntry(ptr, type, member) \
> +    virContainerOf(ptr, type, member)
> +
> +#define virListFirstEntry(ptr, type, member) \
> +    virListEntry((ptr)->next, type, member)
> +
> +#define virListLastEntry(ptr, type, member) \
> +    virListEntry((ptr)->prev, type, member)
> +
> +#define __virContainerOf(ptr, sample, member)             \

Without a doubt the extra spaces between ) and \ should raise the ire of
the check-syntax test...

> +    (void *)virContainerOf((ptr), typeof(*(sample)), member)
> +
> +#define virListForEachEntry(pos, head, member)              \
> +    for (pos = __virContainerOf((head)->next, pos, member);       \
> +     &pos->member != (head);                    \
> +     pos = __virContainerOf(pos->member.next, pos, member))
> +
> +#define virListForEachEntrySafe(pos, tmp, head, member)         \

Safe from what?


John

> +    for (pos = __virContainerOf((head)->next, pos, member),       \
> +     tmp = __virContainerOf(pos->member.next, pos, member);       \
> +     &pos->member != (head);                    \
> +     pos = tmp, tmp = __virContainerOf(pos->member.next, tmp, member))
> +
> +#endif
> 

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/3] Add a plugin dlm for lock manager
Posted by Lin Fu 7 years, 3 months ago
I could feel that you want to express "Oh, shit! How terrible the code it
is." from the reply, although across the screen. Anyway, I'm really glad
to thanks for your review and suggestions. :-)

On 2018年02月08日 07:55, John Ferlan wrote:

 > That's where I'd expect to find the details of why someone would want to
 > choose dlm over lockd or sanlock.  What goodies are provided? What
 > advantage is there?

For those, assuming this scene: one customer has used cluster software
with DLM in their environment, why not choose DLM plugin instead of
sanlock/lockd ? Beside that, compared with sanlock/lockd, DLM don't need
the share storage, and sanlock would bring a little heavyheavily on disk
IO according to lockd RFC(2011-July/msg00337.html), but DLM only requires
  network.

I have sent a RFC in 12/2017 to ack how the opinion is. Daniel said that
"it can be considered in scope for a libvirt lock manager plugin if you
want to try writing one"(2017-December/msg00706.html). So I draft this
patch set which could only make DLM run in libvirt(with configure flag
`--with-dlm` to compile 'dlm.so' module by force). Hope some suggestion
bringed from libvirt community to help me improve, if accepted DLM lock
manager plugin RFC, so I don't update 'docs/locking.html.in'.

-------------

The following is the reply contents, aim to answer why I do that
for most comments:

1) sorry for indent in my code. I coded them in two notebooks,
both using vim. However one is newer, I have not configured '.vimrc' in
that one.

2)
 >> > +AC_DEFUN([LIBVIRT_CHECK_CPG],[
 >> > +  dnl in some distribution, Version is `UNKNOW` in libcpg.pc
 >
 > UNKNOWN
 > doesn't make it very useful does it?

In some distribution, for example, openSUSE Tumbleweed:
     $ cat /usr/lib64/pkgconfig/libcpg.pc

     prefix=/usr
     exec_prefix=${prefix}
     libdir=/usr/lib64
     includedir=${prefix}/include

     Name: cpg
     Version: UNKNOWN
     Description: cpg
     Requires:
     Libs: -L${libdir} -lcpg
     Cflags: -I${includedir}

using `LIBVIRT_CHECK_PKG([CPG], [libcpg], [xxx])` directly would make
build error. So I decide to ignore 'version'. I also don't know why
that is.

3)
 >> > +#define LOCK_RECORD_FILE_PATH "/tmp/libvirtd-dlm-file"
 >
 > Is somewhere in /tmp the best/only place to put this? What not similar
 > to lockd/sanlock using /var/lib/libvirt/dlm... (which would seemingly
 > need to be created on installation).

In my opinion, if one node reboots, DLM cluster would drop locks belong
that node, most of Linux distribution would clean '/tmp' directory...

4)
 >> > +
 >> > +#define STATUS             "STATUS"
 >> > +#define RESOURCE_NAME      "RESOURCE_NAME"
 >> > +#define LOCK_ID            "LOCK_ID"
 >> > +#define LOCK_MODE          "LOCK_MODE"
 >> > +#define VM_PID             "VM_PID"
 >> > +
 >> > +#define BUFFERLEN          128
 >
 > Yuck - for a stack variable...

 >> > +    for (i = 0, str = raw, status = 0; ; str = NULL, i++) {
 >> > +        subtoken = strtok_r(str, " \n", &saveptr);
 >> > +        if (subtoken == NULL)
 >> > +            break;
 >> > +
 >> > +        switch(i) {
 >> > +        case 0:
 >
 > These are magic numbers?


Badly code wants to provided a readable file, like that:

STATUS RESOURCE_NAME LOCK_MODE     VM_PID
          0 
1501e418dedd122b050c91ec61650ae30d3c217806125d3c2f365989143aa28d 
EXMODE      21857

The first line is the form header, the following is the lock
information.

5)
 >> > +    char    *name;
 >> > +    uint32_t mode;
 >> > +    uint32_t lkid;
 >
 > I would think uint32_t would be limiting...  Why not just unsigned int?

view in '/usr/include/libdlm.h', it use `uint32_t` instead of
`unsigned int`.

6)
 >> > +    virMutexLock(&(lockListWait.listMutex));
 >> > +    virListAddTail(&lock->entry, &(lockListWait.list));
 >> > + virMutexUnlock(&(lockListWait.listMutex));
 >
 > What is this list for?  I think this also is completely separable.  I
 > wouldn't use a list either... 10s of domains probably works fine.  100s
 > of domains perhaps not as well 1000s of domains starts being a
 > bottleneck for sure.

Hashtable is a good idea, and I wanted to use it but I had never read
it's code... Chosing list because that there won't be too much vms in
one host, is it offten common that there is more than 50 vms in normal
usage?

7)
 >> > +    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
 >> > +        virReportSystemError(errno,
 >> > +                             _("unable to write lock information 
'%s' to file '%s'"),
 >> > +                             buffer, 
NULLSTR(driver->lockRecordFilePath));
 >> > +        return;
 >> > +    }
 >
 > Not sure I understand the reason of the file

Compared with sanlock and lockd, this implement of DLM has no daemon,
some area is needed to record lock information in order to resume lock
after libvirtd reboot. I chose file.

8)
 >> > +     *   found with another mode, and -ENOENT if no orphan.
 >> > +     *
 >> > +     *   cast/bast/param are (void *)1 because the kernel
 >> > +     *   returns errors if some are null.
 >> > +     */
 >> > +
 >> > +    status = dlm_ls_lockx(lockspace, mode, &lksb, 
LKF_PERSISTENT|LKF_ORPHAN,
 >> > +                          name, strlen(name), 0,
 >> > +                          (void *)1, (void *)1, (void *)1,
 >
 > ^^ These would seem to relate to the same parametes used for
 > dlm_lock[_wait].  They are actually quite useful when it comes to lock
 > conversions.  You may want to considering investigating that...
 >
 >
 > Still seems like a "bug" to me to need to pass 1 as a callback function
 > or a blocking callback function.
 >
 >> > +                          NULL, NULL);
 >
 > Why not dlm_is_lock() instead since you're not using xid and timeout

The code is from lvm2, I try to use `dlm_lock_wait`, however it failed,
I don't know why. `(void *)1` is the same reason. Maybe a bug from
libdlm or 'dlm.ko'.

9)
 >> > +
 >> > +    /* create thread to receive notification from kernel */
 >
 > notification for what? what gets called?

This is implemented by libdlm, the result is writted in `lksb`.

        int dlm_lock_wait(uint32_t mode,
                  struct dlm_lksb *lksb,
                  uint32_t flags,
                  const void *name,
                  unsigned int namelen,
                  uint32_t parent,         /* unused */
                  void *bastarg,
                  void (*bastaddr) (void *bastarg),
                  void *range);            /* unused */

10)
 >> > +                    priv->hasRWDisks = true;
 >> > +                    /* ignore disk resource without error */
 >> > +                    return 0;
 >> > +                }
 >> > +            }
 >> > +
 >> > +            if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, 
&newName) < 0)
 >
 >> Interesting _lockd uses SHA256 and _sanlock uses MD5...
 >
 >
 >> > +        /* we need format the lock information, so the lock name 
must be the constant length */
 >> > +        if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name, 
&newName) < 0)
 >> > +            goto cleanup;
 >
 > Neither _lockd nor _sandisk do any sort of Hash on this name.
 >
 >
 >> > +            memset(&lksb, 0, sizeof(lksb));
 >> > +            rv = dlm_ls_lock_wait(lockspace, priv->resources[i].mode,
 >> > +                                  &lksb, LKF_NOQUEUE|LKF_PERSISTENT,
 >> > + priv->resources[i].name, strlen(priv->resources[i].name),
 >
 > This is where I wonder why we cannot pass the path to the disk to dlm
 > instead of the hash

It's interesting here. Originally I use MD5 instead of SHA256, but I 
found that:

     typedef enum {
         VIR_CRYPTO_HASH_MD5, /* Don't use this except for historic 
compat */
         VIR_CRYPTO_HASH_SHA256,

         VIR_CRYPTO_HASH_LAST
     } virCryptoHash;

Futhermore, I find the bug patch related sanlock name: sanlock's name
length is limited 48, there is always someone use more than 48 characters
(see https://bugzilla.redhat.com/show_bug.cgi?id=735443). The same
situation, why not use the constant length? DLM limits name is 64 bytes
to avoid some potential bugs. Besides, it's convenient to format write
it to file.

11)
 >> > +     * 'ensure sanlock socket is labelled with the VM process label',
 >> > +     * however, fixing sanlock socket security labelling remove 
related
 >> > +     * code. Now, `fd` parameter is useless.
 >> > +     */
 >> > +    if (fd)
 >> > +        *fd = -1;
 >
 > But you just set it to -1??!

Maybe I can submit a patch to cut those code. Afterall it's useless now.

12)
 >> > +    virCheckFlags(0, -1);
 >> > +
 >> > +    if (state)
 >> > +        *state = NULL;
 >
 > Why would this return NULL?  My recollection is there is some sort of
 > inquiry API for dlm.

Inquiry API is disappeared now... Only lock/unlock refered to libdlm
source code and reference book.
  http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
  Chapter 4.3. Lock Query Operations

-- 
Regards
Lin Fu(river)

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH 1/3] Add a plugin dlm for lock manager
Posted by John Ferlan 7 years, 3 months ago

On 02/08/2018 03:30 AM, Lin Fu wrote:
> I could feel that you want to express "Oh, shit! How terrible the code it
> is." from the reply, although across the screen. Anyway, I'm really glad
> to thanks for your review and suggestions. :-)
> 

No - just trying to be thorough with respect to expectations that you're
probably not aware of as someone who doesn't regularly follow or
contribute to libvirt. If it came off as oh this is crap, then sorry -
it wasn't intended that way. I'd rather have someone be explicit when
they don't like something than make me guess what I did wrong or that
wasn't following some project guidelines.

> On 2018年02月08日 07:55, John Ferlan wrote:
> 
>> That's where I'd expect to find the details of why someone would want to
>> choose dlm over lockd or sanlock.  What goodies are provided? What
>> advantage is there?

Be careful how you snip - loss of context of placement...  This comment
was pointing out you didn't update the docs.

> 
> For those, assuming this scene: one customer has used cluster software
> with DLM in their environment, why not choose DLM plugin instead of
> sanlock/lockd ? Beside that, compared with sanlock/lockd, DLM don't need
> the share storage, and sanlock would bring a little heavyheavily on disk
> IO according to lockd RFC(2011-July/msg00337.html), but DLM only requires
>  network.

I'd choose dlm over other options too, but that's because I have a
pretty good idea of how it works and it's features. The on/off nature of
pthread/mutex locks has always felt like a step backwards for sharing
resources from what I cut my teeth on with the OpenVMS Lock Manager.

Still you need to "sell" why using dlm is better than the existing
sanlock or lockd.

My "vision" of dlm is that resource you're adding that's available
clusterwide in some manner should be able to have something that each
member of the cluster would name the resource in the same manner. What I
don't have is the mental picture of how say /dev/sdm on one host
"appears" on another host.  If it was /dev/sdm, then it'd be really
simple to use that as a resource name and know it's the same resource
between hosts.  When I see the device name being hashed and added as a
resource, then I wonder how different nodes in the cluster will know
it's the same thing.  Although I do recall some code used the Resource
Name field rather judiciously to do some fairly interesting things. When
you see system space (e.g. kernel) address in a resource name, you know
something entertaining is occurring.

> 
> I have sent a RFC in 12/2017 to ack how the opinion is. Daniel said that
> "it can be considered in scope for a libvirt lock manager plugin if you
> want to try writing one"(2017-December/msg00706.html). So I draft this
> patch set which could only make DLM run in libvirt(with configure flag
> `--with-dlm` to compile 'dlm.so' module by force). Hope some suggestion
> bringed from libvirt community to help me improve, if accepted DLM lock
> manager plugin RFC, so I don't update 'docs/locking.html.in'.

So this series was really more of an RFC than a V1...  It's ok, but
indicating RFC sets the expectations of the reviewer as opposed to
posting a V1.

> 
> -------------
> 
> The following is the reply contents, aim to answer why I do that
> for most comments:
> 
> 1) sorry for indent in my code. I coded them in two notebooks,
> both using vim. However one is newer, I have not configured '.vimrc' in
> that one.
> 

Following posting guidelines of running make check syntax-check would
have avoided that. There are some that won't even *look* at code that
fails those basic tests.

> 2)
>>> > +AC_DEFUN([LIBVIRT_CHECK_CPG],[
>>> > +  dnl in some distribution, Version is `UNKNOW` in libcpg.pc
>>
>> UNKNOWN
>> doesn't make it very useful does it?
> 
> In some distribution, for example, openSUSE Tumbleweed:
>     $ cat /usr/lib64/pkgconfig/libcpg.pc
> 
>     prefix=/usr
>     exec_prefix=${prefix}
>     libdir=/usr/lib64
>     includedir=${prefix}/include
> 
>     Name: cpg
>     Version: UNKNOWN
>     Description: cpg
>     Requires:
>     Libs: -L${libdir} -lcpg
>     Cflags: -I${includedir}
> 
> using `LIBVIRT_CHECK_PKG([CPG], [libcpg], [xxx])` directly would make
> build error. So I decide to ignore 'version'. I also don't know why
> that is.
> 

Seems like Tumbleweed needs to have a bug report filed in order to get
the Version properly filled in. If you live with UNKNOWN, then the
problem is how do you really know if the version someone has installed
will have what you want without peeking into the library and looking for
specific routines. I think for some reason we do that with some of the
crypto m4 files, but then again I try to avoid these files.

> 3)
>>> > +#define LOCK_RECORD_FILE_PATH       "/tmp/libvirtd-dlm-file"
>>
>> Is somewhere in /tmp the best/only place to put this? What not similar
>> to lockd/sanlock using /var/lib/libvirt/dlm... (which would seemingly
>> need to be created on installation).
> 
> In my opinion, if one node reboots, DLM cluster would drop locks belong
> that node, most of Linux distribution would clean '/tmp' directory...
> 

And if /tmp is very small by some admin's choice (done for various
reasons), then what happens to your code?  There are mechanisms to store
in XML data that needs to be saved between libvirtd restarts and even
host reboots. It's really not clear where this fits. Perhaps a
/var/run/libvirt type file that would be cleaned up differently.

> 4)
>>> > +
>>> > +#define STATUS             "STATUS"
>>> > +#define RESOURCE_NAME      "RESOURCE_NAME"
>>> > +#define LOCK_ID            "LOCK_ID"
>>> > +#define LOCK_MODE          "LOCK_MODE"
>>> > +#define VM_PID             "VM_PID"
>>> > +
>>> > +#define BUFFERLEN          128
>>
>> Yuck - for a stack variable...
> 
>>> > +    for (i = 0, str = raw, status = 0; ; str = NULL, i++) {
>>> > +        subtoken = strtok_r(str, " \n", &saveptr);
>>> > +        if (subtoken == NULL)
>>> > +            break;
>>> > +
>>> > +        switch(i) {
>>> > +        case 0:
>>
>> These are magic numbers?
> 
> 
> Badly code wants to provided a readable file, like that:
> 
>     STATUS                                                   
> RESOURCE_NAME LOCK_MODE     VM_PID
>          0
> 1501e418dedd122b050c91ec61650ae30d3c217806125d3c2f365989143aa28d   
> EXMODE      21857
> 
> The first line is the form header, the following is the lock
> information.
> 

You could consider XML or JSON format...

> 5)
>>> > +    char    *name;
>>> > +    uint32_t mode;
>>> > +    uint32_t lkid;
>>
>> I would think uint32_t would be limiting...  Why not just unsigned int?
> 
> view in '/usr/include/libdlm.h', it use `uint32_t` instead of
> `unsigned int`.
> 

It's more of a I'm used to seeing unsigned int rather than uint32_t
which is more limiting.

> 6)
>>> > +    virMutexLock(&(lockListWait.listMutex));
>>> > +    virListAddTail(&lock->entry, &(lockListWait.list));
>>> > +    virMutexUnlock(&(lockListWait.listMutex));
>>
>> What is this list for?  I think this also is completely separable.  I
>> wouldn't use a list either... 10s of domains probably works fine.  100s
>> of domains perhaps not as well 1000s of domains starts being a
>> bottleneck for sure.
> 
> Hashtable is a good idea, and I wanted to use it but I had never read
> it's code... Chosing list because that there won't be too much vms in
> one host, is it offten common that there is more than 50 vms in normal
> usage?
> 

Lots of recent examples using virHashCreate, virHashAddEntry,
virHashRemoveEntry, etc. A hash table can/will resize as necessary.

Assuming usage is a dangerous thing. You're not just talking domains,
you're factoring in shared storage between domains and leases into your
resource counts.

Recent examples have been assuming no one would ever have more than
(say) 200 disks defined for a domain for which we want to obtain domain
statistics. Well it happened and we chased a way to handle that given
the limitation of size of RPC messages. More recently some block device
with more than 200 backing stores generated a very large pile of data
where there was lots of duplication.

> 7)
>>> > +    if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
>>> > +        virReportSystemError(errno,
>>> > +                             _("unable to write lock information
> '%s' to file '%s'"),
>>> > +                             buffer,
> NULLSTR(driver->lockRecordFilePath));
>>> > +        return;
>>> > +    }
>>
>> Not sure I understand the reason of the file
> 
> Compared with sanlock and lockd, this implement of DLM has no daemon,
> some area is needed to record lock information in order to resume lock
> after libvirtd reboot. I chose file.
> 

Hmm. you're using virtlockd and creating lock_driver_dlm, but then
virtlockd isn't running?  I really didn't dig too deep and think that
much about how this integrates at all.

> 8)
>>> > +     *   found with another mode, and -ENOENT if no orphan.
>>> > +     *
>>> > +     *   cast/bast/param are (void *)1 because the kernel
>>> > +     *   returns errors if some are null.
>>> > +     */
>>> > +
>>> > +    status = dlm_ls_lockx(lockspace, mode, &lksb,
> LKF_PERSISTENT|LKF_ORPHAN,
>>> > +                          name, strlen(name), 0,
>>> > +                          (void *)1, (void *)1, (void *)1,
>>
>> ^^ These would seem to relate to the same parametes used for
>> dlm_lock[_wait].  They are actually quite useful when it comes to lock
>> conversions.  You may want to considering investigating that...
>>
>>
>> Still seems like a "bug" to me to need to pass 1 as a callback function
>> or a blocking callback function.
>>
>>> > +                          NULL, NULL);
>>
>> Why not dlm_is_lock() instead since you're not using xid and timeout
> 
> The code is from lvm2, I try to use `dlm_lock_wait`, however it failed,
> I don't know why. `(void *)1` is the same reason. Maybe a bug from
> libdlm or 'dlm.ko'.
> 

Perhaps more time spent understanding dlm interactions will help. Just
copying an existing implementation without understanding what the
interactions are may not fit into the model you're trying to create. I
honestly don't have the cycles to learn all that dlm has to offer right
now. But I do think you do need to spend more time understanding how
libvirt works today and how dlm will make things better and then how
exactly to accomplish that in an understandable manner.  My biggest
concern is someone would add a feature like this and then not continue
working with/on libvirt. The dlm concepts are a departure from existing
locking models. If you want to get known in the libvirt community then
pick some of the simple tasks from the bite sized tasks page:

https://wiki.libvirt.org/page/BiteSizedTasks

that still need addressing...

> 9)
>>> > +
>>> > +    /* create thread to receive notification from kernel */
>>
>> notification for what? what gets called?
> 
> This is implemented by libdlm, the result is writted in `lksb`.
> 
>        int dlm_lock_wait(uint32_t mode,
>                  struct dlm_lksb *lksb,
>                  uint32_t flags,
>                  const void *name,
>                  unsigned int namelen,
>                  uint32_t parent,         /* unused */
>                  void *bastarg,
>                  void (*bastaddr) (void *bastarg),
>                  void *range);            /* unused */
> 

Again - a bit of context... This was a dlm_ls_pthread_init call made in
virLockManagerDlmSetup. So something that's called once creates a thread
to do what? I didn't research it, but it's something you have to know
and understand so that if someone asks you can have the answer.

An lksb is a "lock status block", but that's on a lock by lock basis.
IIRC, those are written when lock conversions occur. If you used the
"dlm_lock" API, then used the completion routines, you can then check
the lksb for the status of the attempted operation. I'm very rusty in my
recollection and my description is more my understanding of it.


> 10)
>>> > +                    priv->hasRWDisks = true;
>>> > +                    /* ignore disk resource without error */
>>> > +                    return 0;
>>> > +                }
>>> > +            }
>>> > +
>>> > +            if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name,
> &newName) < 0)
>>
>>> Interesting _lockd uses SHA256 and _sanlock uses MD5...
>>
>>
>>> > +        /* we need format the lock information, so the lock name
> must be the constant length */
>>> > +        if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name,
> &newName) < 0)
>>> > +            goto cleanup;
>>
>> Neither _lockd nor _sandisk do any sort of Hash on this name.
>>
>>
>>> > +            memset(&lksb, 0, sizeof(lksb));
>>> > +            rv = dlm_ls_lock_wait(lockspace, priv->resources[i].mode,
>>> > +                                  &lksb, LKF_NOQUEUE|LKF_PERSISTENT,
>>> > +                                  priv->resources[i].name,
> strlen(priv->resources[i].name),
>>
>> This is where I wonder why we cannot pass the path to the disk to dlm
>> instead of the hash
> 
> It's interesting here. Originally I use MD5 instead of SHA256, but I
> found that:
> 
>     typedef enum {
>         VIR_CRYPTO_HASH_MD5, /* Don't use this except for historic compat */
>         VIR_CRYPTO_HASH_SHA256,
> 
>         VIR_CRYPTO_HASH_LAST
>     } virCryptoHash;
> 
> Futhermore, I find the bug patch related sanlock name: sanlock's name
> length is limited 48, there is always someone use more than 48 characters
> (see https://bugzilla.redhat.com/show_bug.cgi?id=735443). The same
> situation, why not use the constant length? DLM limits name is 64 bytes
> to avoid some potential bugs. Besides, it's convenient to format write
> it to file.
> 

I'm still wondering why a hash is being used.  That's an implementation
detail which I haven't yet come to grips with given what I know from my
history with OpenVMS (think about that /dev/sdm example above).

In OpenVMS "disks" that were resources in a cluster shared with some SAN
within the cluster connectivity fabric.  Each member of the cluster had
an ID (I think you have that with the CPG above) and shared resources
would be named with that ID - it's been over 15 years, so too long to
remember the details.

Still point is, for a dlm environment to be effective device naming
between cluster members is going to be important.

> 11)
>>> > +     * 'ensure sanlock socket is labelled with the VM process label',
>>> > +     * however, fixing sanlock socket security labelling remove
> related
>>> > +     * code. Now, `fd` parameter is useless.
>>> > +     */
>>> > +    if (fd)
>>> > +        *fd = -1;
>>
>> But you just set it to -1??!
> 
> Maybe I can submit a patch to cut those code. Afterall it's useless now.
> 

Nothing's been accepted you'd just remove this from future patches.

> 12)
>>> > +    virCheckFlags(0, -1);
>>> > +
>>> > +    if (state)
>>> > +        *state = NULL;
>>
>> Why would this return NULL?  My recollection is there is some sort of
>> inquiry API for dlm.
> 
> Inquiry API is disappeared now... Only lock/unlock refered to libdlm
> source code and reference book.
>  http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
>  Chapter 4.3. Lock Query Operations
> 

You need to think in terms of what libvirt consumers could use or need
and perhaps not specifically something that dlm provides.
> -- 
> Regards
> Lin Fu(river)
> 

Don't forget to update your local libvirt.git repo in order to add the
pertinent information so that the next series you post will have it.

Sorry if there's an incomplete thoughts above - I've been distracted
today and this evening - it's dinner time now.

John

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list