From: Daniel P. Berrangé
To: libvir-list@redhat.com
Date: Thu, 12 Apr 2018 14:28:22 +0100
Message-Id: <20180412132822.23214-6-berrange@redhat.com>
In-Reply-To: <20180412132822.23214-1-berrange@redhat.com>
References: <20180412132822.23214-1-berrange@redhat.com>
Subject: [libvirt] [PATCH 5/5] po: minimize & canonicalize translations stored in git

Similar to the libvirt.pot, .po files contain line numbers and file
names identifying where in the source a translatable string comes from.
The source locations in the .po files are thrown away and replaced with
content from the libvirt.pot whenever msgmerge is run, so this is not
precious information that needs to be stored in git.

When msgmerge processes a .po file, it will add in any msgids from the
libvirt.pot that were not already present. Thus, if a particular msgid
currently has no translation, it can be considered redundant and again
does not need storing in git.

When msgmerge processes a .po file and can't find an exact existing
translation match, it will try to do fuzzy matching instead, marking
such entries with a "#, fuzzy" comment to alert the translator to take a
look and either discard, edit or accept the match. Looking at the
existing fuzzy matches in .po files shows that the quality is awful,
with many having a completely different set of printf format specifiers
between the msgid and fuzzy msgstr entry.
Fortunately when msgfmt generates the .gmo, the fuzzy entries are all
ignored anyway. The fuzzy entries could be useful to translators if they
were working on the .po files directly from git, but Libvirt outsourced
translation to the Fedora Zanata system, so keeping fuzzy matches in git
is not much help.

Finally, by default msgids are sorted based on source location. Thus, if
a bit of code with translatable text is moved from one file to another,
it may shift around in the .po file, despite the msgid not itself
changing. If the msgids were sorted alphabetically, the .po files would
have stable ordering when code is refactored.

This patch takes advantage of the above observations to canonicalize and
minimize the content stored for .po files in git. Instead of storing the
real .po files, we now store .mini.po files. The .mini.po files are the
same file format as .po files, but have no source location comments, are
sorted alphabetically, and all fuzzy msgstrs and msgids with no
translation are discarded. This cuts the size of content in the po
directory from 109MB to 19MB.

Users working from a libvirt git checkout who need the full .po files
can run "make update-po", which merges the libvirt.pot and .mini.po file
to create a .po file containing all the content previously stored in
git.

Conversely if a full .po file has been modified, for example, by
downloading new content from Zanata, the .mini.po files can be updated
by running "make update-mini-po". The resulting diffs of the .mini.po
file will clearly show the changed translations without any of the noise
that previously obscured content. Being able to see content changes
clearly actually identified a bug in the zanata python client where it
was adding bogus "fuzzy" annotations to many messages:

  https://bugzilla.redhat.com/show_bug.cgi?id=1564497

Users working from libvirt releases should not see any difference in
behaviour, since the tarballs only contain the full .po files, not the
.mini.po files.
As an added benefit, generating tarballs with "make dist" will no longer
cause creation of dirty files in git, since it won't touch the .mini.po
files, only the .po files which are no longer kept in git.

To avoid creating a single commit 100+MB in size, each language is
minimized separately in a following commit.

Signed-off-by: Daniel P. Berrangé
Reviewed-by: Ján Tomko
---
 .gitignore               |  3 +++
 build-aux/minimize-po.pl | 37 +++++++++++++++++++++++++++++++++
 po/Makefile.am           | 30 ++++++++++++++-------------
 po/README.md             | 53 +++++++++++++++++++++++++++++++++++++++++-------
 4 files changed, 102 insertions(+), 21 deletions(-)
 create mode 100755 build-aux/minimize-po.pl

diff --git a/.gitignore b/.gitignore
index 121c2caed1..df0ac8e3d4 100644
--- a/.gitignore
+++ b/.gitignore
@@ -101,6 +101,9 @@
 /mingw-libvirt.spec
 /mkinstalldirs
 /po/*gmo
+/po/*po
+!/po/*.mini.po
+/po/*pot
 /proxy/
 /python/
 /run
diff --git a/build-aux/minimize-po.pl b/build-aux/minimize-po.pl
new file mode 100755
index 0000000000..3099178970
--- /dev/null
+++ b/build-aux/minimize-po.pl
@@ -0,0 +1,37 @@
+#!/usr/bin/perl
+
+my @block;
+my $msgstr = 0;
+my $empty = 0;
+my $unused = 0;
+my $fuzzy = 0;
+while (<>) {
+    if (/^$/) {
+        if (!$empty && !$unused && !$fuzzy) {
+            print @block;
+            print;
+        }
+        @block = ();
+        $msgstr = 0;
+        $fuzzy = 0;
+    } else {
+        if (/^msgstr/) {
+            $msgstr = 1;
+            $empty = 1;
+        }
+        if (/^#.*fuzzy/) {
+            $fuzzy = 1;
+        }
+        if (/^#~ msgstr/) {
+            $unused = 1;
+        }
+        if ($msgstr && /".+"/) {
+            $empty = 0;
+        }
+        push @block, $_;
+    }
+}
+
+if (@block && !$empty && !$unused) {
+    print @block;
+}
diff --git a/po/Makefile.am b/po/Makefile.am
index 973ecb42e5..ee1175a524 100644
--- a/po/Makefile.am
+++ b/po/Makefile.am
@@ -20,6 +20,8 @@ POTFILE := $(DOMAIN).pot
 POFILES := $(LANGS:%=%.po)
 GMOFILES := $(LANGS:%=%.gmo)
 
+MAINTAINERCLEANFILES = $(POTFILE) $(POFILES) $(GMOFILES)
+
 EXTRA_DIST = \
 	POTFILES \
 	$(POTFILE) \
@@ -46,26 +48,26 @@ SED_PO_FIXUP_ARGS = \
 	-e "s|Copyright (C) YEAR|Copyright (C) $$(date +'%Y')|" \
 	$(NULL)
 
-
-# Although they're in EXTRA_DIST, we still need to
-# copy these again, because update-gmo will change
-# their content, and dist-hook runs after the
-# things in EXTRA_DIST are copied.
-dist-hook: $(GMOFILES)
-	cp -f $(POTFILE:%=$(srcdir)/%) $(distdir)/
-	cp -f $(POFILES:%=$(srcdir)/%) $(distdir)/
-	cp -f $(GMOFILES:%=$(srcdir)/%) $(distdir)/
-
 update-po: $(POFILES)
 
 update-gmo: $(GMOFILES)
 
+update-mini-po: $(POTFILE)
+	for lang in $(LANGS); do \
+	  echo "Minimizing $$lang content" && \
+	  $(MSGMERGE) --no-location --no-fuzzy-matching --sort-output \
+		$$lang.po $(POTFILE) | \
+	  $(SED) $(SED_PO_FIXUP_ARGS) | \
+	  $(PERL) $(top_srcdir)/build-aux/minimize-po.pl > \
+	  $(srcdir)/$$lang.mini.po ; \
+	done
+
 push-pot: $(POTFILE)
 	zanata push --push-type=source
 
 pull-po: $(POTFILE)
 	zanata pull --create-skeletons
-	$(MAKE) update-po
+	$(MAKE) update-mini-po
 	$(MAKE) update-gmo
 
 $(POTFILE): POTFILES $(POTFILE_DEPS)
@@ -74,9 +76,9 @@ $(POTFILE): POTFILES $(POTFILE_DEPS)
 	$(SED) $(SED_PO_FIXUP_ARGS) < $@-t > $@
 	rm -f $@-t
 
-%.po: $(POTFILE)
-	cd $(srcdir) && \
-	  $(MSGMERGE) --backup=off --no-fuzzy-matching --update $@ $(POTFILE)
+%.po: %.mini.po $(POTFILE)
+	$(MSGMERGE) --no-fuzzy-matching $< $(POTFILE) | \
+	  $(SED) $(SED_PO_FIXUP_ARGS) > $@
 
 %.gmo: %.po
 	rm -f $(srcdir)/$@ $@-t
diff --git a/po/README.md b/po/README.md
index e46455e0c0..5c275cf240 100644
--- a/po/README.md
+++ b/po/README.md
@@ -7,17 +7,39 @@ file formats, in combination with the Zanata web service.
 Source repository
 =================
 
-The libvirt GIT repository stores the master "libvirt.pot" file and full "po"
-files for translations. The master "libvirt.pot" file can be re-generated using
+The libvirt GIT repository does NOT store the master "libvirt.pot" file, nor
+does it store full "po" files for translations. The master "libvirt.pot" file
+can be generated at any time using
 
     make libvirt.pot
 
-The full po files can have their source locations and msgids updated using
+The translations are kept in minimized files that are the same file format
+as normal po files but with all redundant information stripped and messages
+re-ordered. The key differences between the ".mini.po" files in GIT and the
+full ".po" files are
+
+  - msgids with no current translation are omitted
+  - msgids are sorted in alphabetical order not source file order
+  - msgids with a msgstr marked "fuzzy" are discarded
+  - source file locations are omitted
+
+The full po files can be created at any time using
 
     make update-po
 
-Normally these updates are only done when either refreshing translations from
-Zanata, or when creating a new release.
+This merges the "libvirt.pot" with the "$LANG.mini.po" for each language, to
+create the "$LANG.po" files. These are included in the release archives created
+by "make dist".
+
+When a full po file is updated, changes can be propagated back into the
+minimized po files using
+
+    make update-mini-po
+
+Note, however, that this is generally not something that should be run by
+developers normally, as it is triggered by 'make pull-po' when refreshing
+content from Zanata.
+
 
 Zanata web service
 ==================
@@ -32,5 +54,22 @@ directly to libvirt GIT. Any changes made to "$LANG.mini.po" files in libvirt
 GIT will be overwritten & lost the next time content is imported from Zanata.
 
 The master "libvirt.pot" file is periodically pushed to Zanata to provide the
-translation team with content changes. New translated text is then periodically
-pulled down from Zanata to update the po files.
+translation team with content changes, using
+
+    make push-pot
+
+New translated text is then periodically pulled down from Zanata to update the
+minimized po files, using
+
+    make pull-po
+
+Sometimes the translators make mistakes, most commonly with handling printf
+format specifiers. The "pull-po" command re-generates the .gmo files to try to
+identify such mistakes. If a mistake is made, the broken msgstr should be
+deleted in the local "$LANG.mini.po" file, and the Zanata web interface used
+to reject the translation so that the broken msgstr isn't pulled down next time.
+
+After pulling down new content the diff should be examined to look for any
+obvious mistakes that are not caught automatically. There have been bugs in
+Zanata tools which caused messages to go missing, so pay particular attention to
+diffs showing deletions where the msgid still exists in libvirt.pot
-- 
2.14.3
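
The entry filtering done by build-aux/minimize-po.pl can be illustrated with a
short standalone sketch. The Python below is not part of this patch. It is an
illustrative port that assumes its input has already been through msgmerge
with --no-location and --sort-output. The names minimize_po and SAMPLE are
hypothetical. Unlike the Perl script, the sketch evaluates each
blank-line-separated entry independently rather than carrying state across
entries.

```python
import re


def minimize_po(text: str) -> str:
    """Drop untranslated, fuzzy and obsolete entries from .po content."""
    kept = []
    # Entries in a .po file are separated by one or more blank lines.
    for block in re.split(r"\n{2,}", text.strip("\n")):
        lines = block.split("\n")
        # "#, fuzzy" flag comments mark unreviewed fuzzy matches.
        fuzzy = any(ln.startswith("#") and "fuzzy" in ln for ln in lines)
        # "#~" lines mark obsolete entries no longer present in the .pot.
        obsolete = any(ln.startswith("#~ msgstr") for ln in lines)
        in_msgstr = False
        translated = False
        for ln in lines:
            if ln.startswith("msgstr"):
                in_msgstr = True
            # Any non-empty quoted string at or after msgstr counts as text.
            if in_msgstr and re.search(r'".+"', ln):
                translated = True
        if translated and not fuzzy and not obsolete:
            kept.append(block)
    return "\n\n".join(kept) + "\n" if kept else ""


# Hypothetical sample: a header, one good entry, one untranslated entry,
# one fuzzy entry and one obsolete entry.
SAMPLE = r'''msgid ""
msgstr ""
"Language: fr\n"

msgid "hello"
msgstr "bonjour"

msgid "bye"
msgstr ""

#, fuzzy, c-format
msgid "count %d"
msgstr "nombre %s"

#~ msgid "old"
#~ msgstr "ancien"
'''

if __name__ == "__main__":
    print(minimize_po(SAMPLE), end="")
```

With this sample only the header entry and the "hello" entry survive. The
header is kept because its msgstr continuation line is non-empty, which is
also how the Perl script treats it.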