From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from lists.gentoo.org (pigeon.gentoo.org [208.92.234.80])
    (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
    (No client certificate requested)
    by finch.gentoo.org (Postfix) with ESMTPS id C0AA1139084
    for ; Sat, 25 Nov 2017 20:49:44 +0000 (UTC)
Received: from pigeon.gentoo.org (localhost [127.0.0.1])
    by pigeon.gentoo.org (Postfix) with SMTP id A75B3E0E62;
    Sat, 25 Nov 2017 20:49:39 +0000 (UTC)
Received: from smtp.gentoo.org (smtp.gentoo.org [140.211.166.183])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by pigeon.gentoo.org (Postfix) with ESMTPS id 786B3E0E62
    for ; Sat, 25 Nov 2017 20:49:39 +0000 (UTC)
Received: from oystercatcher.gentoo.org (oystercatcher.gentoo.org [148.251.78.52])
    (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
    (No client certificate requested)
    by smtp.gentoo.org (Postfix) with ESMTPS id 4EA38340DF9
    for ; Sat, 25 Nov 2017 20:49:38 +0000 (UTC)
Received: from localhost.localdomain (localhost [IPv6:::1])
    by oystercatcher.gentoo.org (Postfix) with ESMTP id 9F629A784
    for ; Sat, 25 Nov 2017 20:49:35 +0000 (UTC)
From: "Michał Górny"
To: gentoo-commits@lists.gentoo.org
Content-Transfer-Encoding: 8bit
Content-type: text/plain; charset=UTF-8
Reply-To: gentoo-dev@lists.gentoo.org, "Michał Górny"
Message-ID: <1511642956.e1881788598f23191d79f15a0ecf09fbda668a75.mgorny@gentoo>
Subject: [gentoo-commits] data/glep:master commit in: /
X-VCS-Repository: data/glep
X-VCS-Files: glep-0074.rst
X-VCS-Directories: /
X-VCS-Committer: mgorny
X-VCS-Committer-Name: Michał Górny
X-VCS-Revision: e1881788598f23191d79f15a0ecf09fbda668a75
X-VCS-Branch: master
Date: Sat, 25 Nov 2017 20:49:35 +0000 (UTC)
Precedence: bulk
List-Post:
List-Help:
List-Unsubscribe:
List-Subscribe:
List-Id: Gentoo Linux mail
X-BeenThere: gentoo-commits@lists.gentoo.org
X-Archives-Salt: 89a3b488-e9e6-4415-b74f-b3a1b62d406e
X-Archives-Hash: 6682f3cffa9f72f10fbb3b93fb376364

commit:     e1881788598f23191d79f15a0ecf09fbda668a75
Author:     Michał Górny gentoo org>
AuthorDate: Mon Nov 20 18:40:41 2017 +0000
Commit:     Michał Górny gentoo org>
CommitDate: Sat Nov 25 20:49:16 2017 +0000
URL:        https://gitweb.gentoo.org/data/glep.git/commit/?id=e1881788

glep-0074: Disallow filenames containing whitespace

 glep-0074.rst | 52 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 52 insertions(+)

diff --git a/glep-0074.rst b/glep-0074.rst
index f96a58e..46ad9fe 100644
--- a/glep-0074.rst
+++ b/glep-0074.rst
@@ -132,6 +132,13 @@ are not otherwise ignored reside on a different filesystem, or symbolic
 links point to targets on a different filesystem, they must be explicitly
 excluded via ``IGNORE``.
 
+All paths specified in the Manifest file must consist of characters
+corresponding to valid UTF-8 code points excluding the NULL character
+(``U+0000``) and characters classified as whitespace in the current
+version of the Unicode standard [#UNICODE]_. It is an error to use
+Manifest files in directories containing files whose names contain
+the disallowed characters.
+
 
 File verification
 -----------------
@@ -542,6 +549,45 @@ In particular, tools might then claim that a file does not exist when it
 clearly does because it was skipped due to filesystem boundaries.
 
 
+Filename character set restriction
+----------------------------------
+
+The valid set of filename characters for the Gentoo repository
+is restricted by the devmanual 'File Naming Rules' section
+[#FILE-NAMING-RULES]_, and enforced via a git hook. The valid distfile
+names are not restricted explicitly -- however, the PMS dependency
+specification syntax [#PMS-FETCH]_ implicitly makes it impossible to use
+filenames containing whitespace.
+
+This specification aims to avoid arbitrary restrictions. For this
+reason, the filename characters are only restricted by excluding two
+technically problematic groups:
+
+1. The NULL character (``U+0000``) is normally used to indicate the end
+   of a null-terminated string. Its use could therefore break programs
+   written using C. Furthermore, it is not allowed in any known
+   filesystem.
+
+2. The whitespace characters are used to separate Manifest fields. While
+   technically it would be enough to restrict space (``U+0020``)
+   character that is normally used as the separator, all whitespace
+   characters are forbidden to avoid confusion and implementation
+   errors.
+
+While the specification could be extended to allow such filenames
+by using some form of escaping, there is currently no apparent need
+for such a feature.
+
+Historically, Portage attempted to overcome the whitespace limitation
+by attempting to locate the size field and take everything before it
+as filename. This was terribly fragile and even if it worked, it would
+solve the problem only partially.
+
+Since the same restrictions apply to ``IGNORE`` rules, it is currently
+not possible to either list or ignore the file using whitespace
+characters. Therefore, the presence of such files is forbidden entirely.
+
+
 File verification model
 -----------------------
 
@@ -880,10 +926,16 @@ References
 .. [#GLEP61] GLEP 61: Manifest2 compression
    (https://www.gentoo.org/glep/glep-0061.html)
 
+.. [#UNICODE] The Unicode standard
+   (https://unicode.org/versions/latest/)
+
 .. [#PMS-FETCH] Package Manager Specification: Dependency Specification
    Format - SRC_URI
    (https://projects.gentoo.org/pms/6/pms.html#x1-940008.2.10)
 
+.. [#FILE-NAMING-RULES] Ebuild File Format -- Gentoo Development Guide
+   (https://devmanual.gentoo.org/ebuild-writing/file-format/#file-naming-rules)
+
 .. [#MD5] RFC1321: The MD5 Message-Digest Algorithm
    (https://www.ietf.org/rfc/rfc1321.txt)
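
The path rule added by this commit could be checked along the following lines. This is only a rough sketch and not code from any Gentoo tool: the names ``WHITE_SPACE``, ``disallowed_characters`` and ``is_valid_manifest_path`` are hypothetical, and it assumes that "characters classified as whitespace" means the code points carrying the Unicode ``White_Space`` property. The hard-coded list reflects recent Unicode versions and should be checked against the current standard. Python's ``str.isspace()`` is deliberately not used, since it also accepts a few control characters (``U+001C`` to ``U+001F``) that do not have that property.

```python
# Code points with the Unicode White_Space property.
# Assumption: this is the intended meaning of "classified as whitespace";
# verify the list against the current Unicode release before relying on it.
WHITE_SPACE = frozenset(
    [
        *range(0x0009, 0x000E),  # TAB, LF, VT, FF, CR
        0x0020,                  # SPACE, the usual Manifest field separator
        0x0085,                  # NEXT LINE
        0x00A0,                  # NO-BREAK SPACE
        0x1680,                  # OGHAM SPACE MARK
        *range(0x2000, 0x200B),  # EN QUAD through HAIR SPACE
        0x2028,                  # LINE SEPARATOR
        0x2029,                  # PARAGRAPH SEPARATOR
        0x202F,                  # NARROW NO-BREAK SPACE
        0x205F,                  # MEDIUM MATHEMATICAL SPACE
        0x3000,                  # IDEOGRAPHIC SPACE
    ]
)


def disallowed_characters(path: str) -> list[tuple[int, str]]:
    """Return (index, 'U+XXXX') for each NULL or whitespace character in path."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(path)
        if ord(char) == 0 or ord(char) in WHITE_SPACE
    ]


def is_valid_manifest_path(path: str) -> bool:
    """True if path contains neither U+0000 nor any White_Space character."""
    return not disallowed_characters(path)


if __name__ == "__main__":
    for candidate in ("files/foo-1.0.patch", "my file.txt", "bad\x00name"):
        print(repr(candidate), is_valid_manifest_path(candidate),
              disallowed_characters(candidate))
```

Returning the offending positions rather than a bare boolean makes it easier to report which character caused the rejection. The sketch does not check UTF-8 validity, which would apply to the raw bytes before decoding.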