From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by finch.gentoo.org (Postfix) with ESMTPS id E6E96139083
	for ; Thu, 23 Nov 2017 18:45:55 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
	by pigeon.gentoo.org (Postfix) with SMTP id 558A2E0E2D;
	Thu, 23 Nov 2017 18:45:27 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by pigeon.gentoo.org (Postfix) with ESMTPS id 1F94EE0E3C
	for ; Thu, 23 Nov 2017 18:45:27 +0000 (UTC)
Received: from oystercatcher.gentoo.org (unknown [IPv6:2a01:4f8:202:4333:225:90ff:fed9:fc84])
	(using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
	(No client certificate requested)
	by smtp.gentoo.org (Postfix) with ESMTPS id 1F3983402FE
	for ; Thu, 23 Nov 2017 18:45:26 +0000 (UTC)
Received: from localhost.localdomain (localhost [IPv6:::1])
	by oystercatcher.gentoo.org (Postfix) with ESMTP id D7643A514
	for ; Thu, 23 Nov 2017 18:45:24 +0000 (UTC)
From: "Michał Górny"
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Michał Górny"
Message-ID: <1511462259.ed111f85c3e7ab98678ee0379589281a2c92380c.mgorny@gentoo>
Subject: [gentoo-commits] data/glep:glep-manifest commit in: /
X-VCS-Repository: data/glep
X-VCS-Files: glep-0074.rst
X-VCS-Directories: /
X-VCS-Committer: mgorny
X-VCS-Committer-Name: Michał Górny
X-VCS-Revision: ed111f85c3e7ab98678ee0379589281a2c92380c
X-VCS-Branch: glep-manifest
Date: Thu, 23 Nov 2017 18:45:24 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Archives-Salt: 783304b1-efa9-45c6-8156-50c7f3f2b74d
X-Archives-Hash: d74a7e8d8120b63def15dad56c14d950

commit:     ed111f85c3e7ab98678ee0379589281a2c92380c
Author:     Michał Górny gentoo org>
AuthorDate: Thu Nov 23 18:37:39 2017 +0000
Commit:     Michał Górny gentoo org>
CommitDate: Thu Nov 23 18:37:39 2017 +0000
URL:        https://gitweb.gentoo.org/data/glep.git/commit/?id=ed111f85

glep-0074: Always exclude control characters

 glep-0074.rst | 24 ++++++++++++------------
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/glep-0074.rst b/glep-0074.rst
index 8687969..6db6caa 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -138,10 +138,9 @@ Path and filename encoding
 --------------------------
 
 The path fields in the Manifest file must consist of characters
-corresponding to valid UTF-8 code points excluding the NULL character
-(``U+0000``), the backwards slash (``\``) and characters classified
-as whitespace in the current version of the Unicode standard
-[#UNICODE]_.
+corresponding to valid UTF-8 code points excluding the backwards slash
+(``\``) and characters classified as control characters and whitespace
+in the current version of the Unicode standard [#UNICODE]_.
 
 Any of the excluded characters that are present in path must be encoded
 using one of the following escape sequences:
@@ -164,8 +163,7 @@ slash used as path component separator should be replaced by forward
 slash instead.
 
 The encoding can be used for other characters as well. In particular,
-escaping control characters is recommended to ensure that the file
-works correctly in text editors.
+escaping non-printable characters might be desirable.
 
 
 File verification
@@ -593,16 +591,18 @@ This specification aims to avoid arbitrary restrictions. For this reason,
 filename characters are only restricted by excluding three technically
 problematic groups:
 
-1. The NULL character (``U+0000``) is normally used to indicate the end
-   of a null-terminated string. Its use could therefore break programs
-   written using C. Furthermore, it is not allowed in any known
-   filesystem.
-
-2. The backwards slash character (``\``) is used as path separator
+1. The backwards slash character (``\``) is used as path separator
    on Windows systems, so it's extremely unlikely to be used in real
    filenames. For this reason it is used to implement character encoding
    with minimal risk of breaking backwards compatibility.
 
+2. The control characters can trigger special behavior in various
+   programs and confuse them from recognizing text files. In particular,
+   the NULL character (``U+0000``) is normally used to indicate the end
+   of a null-terminated string. Its use could therefore break
+   implementations written in the C language. Other control characters
+   could trigger various formatting routines, garbling text output.
+
 3. Whitespace characters are used to separate Manifest fields and
    entries. While technically it would be enough to restrict space
    (``U+0020``) character that is normally used as the separator
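
For illustration only, and not part of the commit: the three excluded groups named in the revised text (the backslash, control characters, and whitespace) could be detected roughly as sketched below. The function names are hypothetical, and the sketch assumes that "control characters" maps to Unicode general category Cc and "whitespace" to the separator categories Zs, Zl and Zp. The escape sequences themselves are not shown in this excerpt, so the sketch only reports which characters would need encoding.

```python
import unicodedata

# Assumption: whitespace outside the control range is covered by the
# Unicode separator categories (space, line and paragraph separators).
_SEPARATOR_CATEGORIES = {"Zs", "Zl", "Zp"}


def is_excluded(ch: str) -> bool:
    """Return True if ch belongs to one of the three excluded groups."""
    if ch == "\\":
        return True  # backslash, reserved for the escape mechanism
    category = unicodedata.category(ch)
    if category == "Cc":
        return True  # control characters, including U+0000 and tab
    return category in _SEPARATOR_CATEGORIES  # remaining whitespace


def find_excluded(path: str) -> list[tuple[int, str]]:
    """List (index, character) pairs in path that would need encoding."""
    return [(i, ch) for i, ch in enumerate(path) if is_excluded(ch)]


if __name__ == "__main__":
    for sample in ("foo bar", "a\\b\x00", "Michał.txt"):
        print(repr(sample), find_excluded(sample))
```

A path with an empty result can be written to the Manifest unchanged; any reported character would have to be replaced by one of the escape sequences defined in the specification.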