From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 02B72138336
	for ; Sat, 21 Dec 2019 05:20:02 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id D0D33E08FF;
	Sat, 21 Dec 2019 05:20:01 +0000 (UTC)
Received: from smtp.gentoo.org (mail.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id B6AE2E08FF
	for ; Sat, 21 Dec 2019 05:20:01 +0000 (UTC)
Received: from oystercatcher.gentoo.org (unknown [IPv6:2a01:4f8:202:4333:225:90ff:fed9:fc84])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.gentoo.org (Postfix) with ESMTPS id 6CD5834D4D8
	for ; Sat, 21 Dec 2019 05:20:00 +0000 (UTC)
Received: from localhost.localdomain (localhost [IPv6:::1])
	by oystercatcher.gentoo.org (Postfix) with ESMTP id 22AE1973
	for ; Sat, 21 Dec 2019 05:19:58 +0000 (UTC)
From: "Ulrich Müller"
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Ulrich Müller"
Message-ID: <1576905496.1bfa9bda3f10627a9798edfc65472d59bc9ffeba.ulm@gentoo>
Subject: [gentoo-commits] proj/devmanual:master commit in: bin/
X-VCS-Repository: proj/devmanual
X-VCS-Files: bin/build_search_documents.py
X-VCS-Directories: bin/
X-VCS-Committer: ulm
X-VCS-Committer-Name: Ulrich Müller
X-VCS-Revision: 1bfa9bda3f10627a9798edfc65472d59bc9ffeba
X-VCS-Branch: master
Date: Sat, 21 Dec 2019 05:19:58 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Auto-Response-Suppress: DR, RN, NRN, OOF, AutoReply
X-Archives-Salt: 12784e61-3ab1-4c96-b9ba-ec58188a4778
X-Archives-Hash: 1bd8bbea2cceb7446aa8f3a25cbbbf4f

commit:     1bfa9bda3f10627a9798edfc65472d59bc9ffeba
Author:     Göktürk Yüksek gentoo org>
AuthorDate: Sat Dec 21 04:36:03 2019 +0000
Commit:     Ulrich Müller gentoo org>
CommitDate: Sat Dec 21 05:18:16 2019 +0000
URL:        https://gitweb.gentoo.org/proj/devmanual.git/commit/?id=1bfa9bda

bin/build_search_documents.py: fix aggressive whitespace stripping

In stringify_node(), we aggressively strip the whitespaces around
children nodes. This results in something like "SLOT, :SLOT" being
parsed as "SLOT,:SLOT", removing the white space between ',' and ':'.

Signed-off-by: Göktürk Yüksek gentoo.org>
Signed-off-by: Ulrich Müller gentoo.org>

 bin/build_search_documents.py | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/bin/build_search_documents.py b/bin/build_search_documents.py
index 3816fdb..e19dce6 100755
--- a/bin/build_search_documents.py
+++ b/bin/build_search_documents.py
@@ -18,22 +18,32 @@ def stringify_node(parent: ET.Element) -> str:
     parent -- the node to convert to a string
 
     """
+    # We usually have something like:
+    #   <p>\nText
+    # Left strip the whitespace.
     if parent.text:
         text = parent.text.lstrip()
     else:
         text = str()
 
+    # For each child, strip the tags and append to text
+    # along with the tail text following it.
+    # The tail may include '\n' if it spans multiple lines.
+    # We will worry about those on return, not now.
     for child in parent.getchildren():
         # The '<d/>' tag is simply a fancier '-' character
         if child.tag == 'd':
             text += '-'
         if child.text:
-            text += child.text.lstrip()
+            text += child.text
         if child.tail:
-            text += child.tail.rstrip()
+            text += child.tail
 
-    text += parent.tail.rstrip()
-    return text.replace('\n', ' ')
+    # A paragraph typically ends with:
+    #   Text\n</p>
+    # Right strip any spurious whitespace.
+    # Finally, get rid of any intermediate newlines.
+    return text.rstrip().replace('\n', ' ')
 
 
 def process_node(documents: list, node: ET.Element, name: str, url: str) -> None:
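
The effect described in the commit message can be reproduced with a small standalone sketch. It is not part of the commit: the names stringify_old, stringify_new and SAMPLE are hypothetical, the sample paragraph is invented for illustration, and the children are iterated directly rather than through getchildren(), which was removed from ElementTree in Python 3.9. The old variant also guards against a missing parent tail, which the original code did not do.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample paragraph; the tag names only mimic the manual's markup.
SAMPLE = "<p>\nSee <c>SLOT</c>, <c>:SLOT</c> deps\n</p>"


def stringify_old(parent: ET.Element) -> str:
    """Approximation of the pre-commit logic: strips around every child."""
    text = parent.text.lstrip() if parent.text else ''
    for child in parent:
        if child.tag == 'd':
            text += '-'
        if child.text:
            text += child.text.lstrip()
        if child.tail:
            # Stripping the tail here is what swallows the space after ','.
            text += child.tail.rstrip()
    # Assumption: guard against a missing tail, as on a root element.
    text += (parent.tail or '').rstrip()
    return text.replace('\n', ' ')


def stringify_new(parent: ET.Element) -> str:
    """Approximation of the post-commit logic: strip only at both ends."""
    text = parent.text.lstrip() if parent.text else ''
    for child in parent:
        if child.tag == 'd':
            text += '-'
        if child.text:
            text += child.text
        if child.tail:
            text += child.tail
    return text.rstrip().replace('\n', ' ')


node = ET.fromstring(SAMPLE)
print(repr(stringify_old(node)))
print(repr(stringify_new(node)))
```

With this sample, the old variant joins the two children as "SLOT,:SLOT", while the new variant keeps the space between ',' and ':'.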