From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id 12D1D139084
	for ; Sat, 25 Nov 2017 20:49:53 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 2278DE0E8B;
	Sat, 25 Nov 2017 20:49:41 +0000 (UTC)
Received: from smtp.gentoo.org (woodpecker.gentoo.org [IPv6:2001:470:ea4a:1:5054:ff:fec7:86e4])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id E435CE0E8B
	for ; Sat, 25 Nov 2017 20:49:40 +0000 (UTC)
Received: from oystercatcher.gentoo.org (oystercatcher.gentoo.org [148.251.78.52])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.gentoo.org (Postfix) with ESMTPS id DD24B341130
	for ; Sat, 25 Nov 2017 20:49:39 +0000 (UTC)
Received: from localhost.localdomain (localhost [IPv6:::1])
	by oystercatcher.gentoo.org (Postfix) with ESMTP id 31AC6A78B
	for ; Sat, 25 Nov 2017 20:49:36 +0000 (UTC)
From: "Michał Górny"
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Michał Górny"
Message-ID: <1511642957.7f4a0c4c7b45dfbb3ff064cd821380e8dade7534.mgorny@gentoo>
Subject: [gentoo-commits] data/glep:master commit in: /
X-VCS-Repository: data/glep
X-VCS-Files: glep-0074.rst
X-VCS-Directories: /
X-VCS-Committer: mgorny
X-VCS-Committer-Name: Michał Górny
X-VCS-Revision: 7f4a0c4c7b45dfbb3ff064cd821380e8dade7534
X-VCS-Branch: master
Date: Sat, 25 Nov 2017 20:49:36 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Archives-Salt: 3c90c12a-d011-493d-a454-6dd35884e369
X-Archives-Hash: 2560812b2b5a1f825935b451e74ba553

commit:     7f4a0c4c7b45dfbb3ff064cd821380e8dade7534
Author:     Michał Górny gentoo org>
AuthorDate: Thu Nov 23 18:37:39 2017 +0000
Commit:     Michał Górny gentoo org>
CommitDate: Sat Nov 25 20:49:17 2017 +0000
URL:        https://gitweb.gentoo.org/data/glep.git/commit/?id=7f4a0c4c

glep-0074: Always exclude control characters

 glep-0074.rst | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/glep-0074.rst b/glep-0074.rst
index 8687969..6db6caa 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -138,10 +138,9 @@ Path and filename encoding
 --------------------------
 
 The path fields in the Manifest file must consist of characters
-corresponding to valid UTF-8 code points excluding the NULL character
-(``U+0000``), the backwards slash (``\``) and characters classified
-as whitespace in the current version of the Unicode standard
-[#UNICODE]_.
+corresponding to valid UTF-8 code points excluding the backwards slash
+(``\``) and characters classified as control characters and whitespace
+in the current version of the Unicode standard [#UNICODE]_.
 
 Any of the excluded characters that are present in path must be encoded
 using one of the following escape sequences:
@@ -164,8 +163,7 @@ slash used as path component separator should be replaced by forward
 slash instead.
 
 The encoding can be used for other characters as well. In particular,
-escaping control characters is recommended to ensure that the file
-works correctly in text editors.
+escaping non-printable characters might be desirable.
 
 
 File verification
@@ -593,16 +591,18 @@ This specification aims to avoid arbitrary restrictions. For this reason,
 filename characters are only restricted by excluding three technically
 problematic groups:
 
-1. The NULL character (``U+0000``) is normally used to indicate the end
-   of a null-terminated string. Its use could therefore break programs
-   written using C. Furthermore, it is not allowed in any known
-   filesystem.
-
-2. The backwards slash character (``\``) is used as path separator
+1. The backwards slash character (``\``) is used as path separator
    on Windows systems, so it's extremely unlikely to be used in real
    filenames. For this reason it is used to implement character encoding
    with minimal risk of breaking backwards compatibility.
 
+2. The control characters can trigger special behavior in various
+   programs and confuse them from recognizing text files. In particular,
+   the NULL character (``U+0000``) is normally used to indicate the end
+   of a null-terminated string. Its use could therefore break
+   implementations written in the C language. Other control characters
+   could trigger various formatting routines, garbling text output.
+
 3. Whitespace characters are used to separate Manifest fields and
    entries. While technically it would be enough to restrict space
    (``U+0020``) character that is normally used as the separator
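The amended path rule can be illustrated with a minimal Python sketch. It is not taken from GLEP 74 or from any Gentoo tool, and it rests on several assumptions. First, it assumes the `\xHH`, `\uHHHH` and `\UHHHHHHHH` escape forms for the "following escape sequences" mentioned in the diff, whose list lies outside this hunk. Second, it treats Unicode general category `Cc` as "control characters". Third, it uses Python's `str.isspace()` as an approximation of the Unicode `White_Space` property. The function names `must_escape`, `encode_path` and `decode_path` are hypothetical.

```python
import re
import unicodedata


def must_escape(ch: str) -> bool:
    """Hypothetical check: must this code point be escaped in a path field?"""
    if ch == "\\":
        # The backslash is reserved as the escape introducer.
        return True
    if unicodedata.category(ch) == "Cc":
        # General category Cc covers C0/C1 controls, including NULL (U+0000).
        return True
    # Approximation of the Unicode White_Space property (an assumption).
    return ch.isspace()


def encode_path(path: str) -> str:
    """Escape excluded characters, assuming \\xHH / \\uHHHH / \\UHHHHHHHH forms."""
    out = []
    for ch in path:
        if not must_escape(ch):
            out.append(ch)
            continue
        cp = ord(ch)
        if cp <= 0x7F:
            out.append(f"\\x{cp:02x}")
        elif cp <= 0xFFFF:
            out.append(f"\\u{cp:04x}")
        else:
            out.append(f"\\U{cp:08x}")
    return "".join(out)


# Matches one escape sequence of the three assumed forms.
_ESCAPE_RE = re.compile(
    r"\\(?:x([0-9a-fA-F]{2})|u([0-9a-fA-F]{4})|U([0-9a-fA-F]{8}))"
)


def decode_path(field: str) -> str:
    """Reverse encode_path(); a bare backslash is rejected as invalid."""
    out = []
    i = 0
    while i < len(field):
        m = _ESCAPE_RE.match(field, i)
        if m:
            digits = m.group(1) or m.group(2) or m.group(3)
            out.append(chr(int(digits, 16)))
            i = m.end()
        elif field[i] == "\\":
            raise ValueError(f"invalid escape at offset {i}: {field!r}")
        else:
            out.append(field[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Example: a patch file name containing a space.
    encoded = encode_path("files/foo bar.patch")
    print(encoded)  # files/foo\x20bar.patch
    assert decode_path(encoded) == "files/foo bar.patch"
```

Because whitespace and control characters never appear unescaped in an encoded field, a reader can split a Manifest line on whitespace first and decode each path field afterwards. The forward slash is left alone because it is the path component separator.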