* [gentoo-portage-dev] [PATCH] Portage to write a compressed copy of 'Packages' index file
From: W-Mark Kubacki @ 2012-08-08 18:35 UTC
To: gentoo-portage-dev
Hi Portage devs,
Can I send patches by `git send-email` or do you prefer attachments?
The patch applies to master/HEAD and can be backported to 2.1*. Its
description is as follows:
Portage writes a compressed copy of 'Packages' index file.
This behaviour is enabled by FEATURES="compress-index". The
resulting file is 'Packages.gz' and its modification time will
match that of 'Packages'.
Web-servers use that copy to avoid repeated on-the-fly compression.
To re-use 'atomic_ofstream', using 'codecs.zlib_codec' was considered
and discarded, because 'GzipFile' yields smaller files (62% smaller
according to Mark's tests).
Example usage, Nginx:
location =/Packages {
gzip_static on;
default_type text/plain;
}
Apache httpd (use with caution):
RewriteRule ^(.*)/Packages$ $1/Packages.gz [T=text/plain,E=GZIP:gzip,L]
<FilesMatch "Packages\.gz$">
Header set Content-Encoding gzip
</FilesMatch>
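
Because a server configured this way hands out 'Packages.gz' in place of
'Packages', a binhost operator may want to confirm that the two files stay
in sync. The following Python 3 sketch is not part of the patch. The helper
names write_index_pair and index_pair_is_consistent are hypothetical, and
plain open() and gzip.open() stand in for Portage's own file handling.

```python
import gzip
import os


def write_index_pair(pkgdir, contents, timestamp):
    """Write Packages and Packages.gz, then give both the same mtime."""
    plain = os.path.join(pkgdir, "Packages")
    for fname, opener in ((plain, open), (plain + ".gz", gzip.open)):
        with opener(fname, "wb") as f:
            f.write(contents)
        # Set the times only after closing, so the write cannot bump them.
        os.utime(fname, (timestamp, timestamp))


def index_pair_is_consistent(pkgdir):
    """Return True if Packages.gz matches Packages in content and mtime."""
    plain = os.path.join(pkgdir, "Packages")
    compressed = plain + ".gz"
    with open(plain, "rb") as f:
        expected = f.read()
    with gzip.open(compressed, "rb") as f:
        actual = f.read()
    # Compare whole seconds; TIMESTAMP in the index header is an integer.
    same_mtime = (int(os.stat(plain).st_mtime)
                  == int(os.stat(compressed).st_mtime))
    return expected == actual and same_mtime
```

Calling index_pair_is_consistent() on a package directory after each index
update would flag a stale compressed copy before clients fetch it.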
[-- Attachment #2: 0001-Portage-writes-a-compressed-copy-of-Packages-index-f.patch --]
[-- Type: text/plain, Size: 4285 bytes --]
From e4339fe07d60c466d250fe547b8e82ee95da6eea Mon Sep 17 00:00:00 2001
From: W-Mark Kubacki <wmark@hurrikane.de>
Date: Wed, 8 Aug 2012 18:49:36 +0200
Subject: [PATCH] Portage writes a compressed copy of 'Packages' index file.
This behaviour is enabled by FEATURES="compress-index". The
resulting file is 'Packages.gz' and its modification time will
match that of 'Packages'.
Web-servers use that copy to avoid repeated on-the-fly compression.
To re-use 'atomic_ofstream', using 'codecs.zlib_codec' was considered
and discarded, because 'GzipFile' yields smaller files (62% smaller
according to Mark's tests).
Example usage, Nginx:
location =/Packages {
gzip_static on;
default_type text/plain;
}
Apache httpd (use with caution):
RewriteRule ^(.*)/Packages$ $1/Packages.gz [T=text/plain,E=GZIP:gzip,L]
<FilesMatch "Packages\.gz$">
Header set Content-Encoding gzip
</FilesMatch>
---
man/make.conf.5 | 7 +++++++
pym/portage/const.py | 2 +-
pym/portage/dbapi/bintree.py | 21 +++++++++++++++------
3 files changed, 23 insertions(+), 7 deletions(-)
diff --git a/man/make.conf.5 b/man/make.conf.5
index 876a8a3..d23b0e1 100644
--- a/man/make.conf.5
+++ b/man/make.conf.5
@@ -268,6 +268,13 @@ space. Make sure you have built both binutils and gdb with USE=zlib
support for this to work. See \fBsplitdebug\fR for general split debug
information (upon which this feature depends).
.TP
+.B compress-index
+If set then a compressed copy of 'Packages' index file will be written.
+This feature is intended for Gentoo binhosts using certain webservers
+(such as, but not limited to, Nginx with gzip_static module) to avoid
+redundant on-the-fly compression. The resulting file will be called
+'Packages.gz' and its modification time will match that of 'Packages'.
+.TP
.B config\-protect\-if\-modified
This causes the \fBCONFIG_PROTECT\fR behavior to be skipped for files
that have not been modified since they were installed. This feature is
diff --git a/pym/portage/const.py b/pym/portage/const.py
index ceef5c5..c2049f8 100644
--- a/pym/portage/const.py
+++ b/pym/portage/const.py
@@ -89,7 +89,7 @@ SUPPORTED_FEATURES = frozenset([
"assume-digests", "binpkg-logs", "buildpkg", "buildsyspkg", "candy",
"ccache", "chflags", "clean-logs",
"collision-protect", "compress-build-logs", "compressdebug",
- "config-protect-if-modified",
+ "compress-index", "config-protect-if-modified",
"digest", "distcc", "distcc-pump", "distlocks",
"downgrade-backup", "ebuild-locks", "fakeroot",
"fail-clean", "force-mirror", "force-prefix", "getbinpkg",
diff --git a/pym/portage/dbapi/bintree.py b/pym/portage/dbapi/bintree.py
index 0367503..204bd44 100644
--- a/pym/portage/dbapi/bintree.py
+++ b/pym/portage/dbapi/bintree.py
@@ -41,6 +41,7 @@ import sys
import tempfile
import textwrap
import warnings
+from gzip import GzipFile
from itertools import chain
try:
from urllib.parse import urlparse
@@ -1186,13 +1187,21 @@ class binarytree(object):
pkgindex.packages.append(d)
self._update_pkgindex_header(pkgindex.header)
- pkgindex_filename = os.path.join(self.pkgdir, "Packages")
- f = atomic_ofstream(pkgindex_filename)
- pkgindex.write(f)
- f.close()
- # some seconds might have elapsed since TIMESTAMP
atime = mtime = long(pkgindex.header["TIMESTAMP"])
- os.utime(pkgindex_filename, (atime, mtime))
+ contents = codecs.getwriter("utf-8")(io.BytesIO())
+ pkgindex.write(contents)
+ contents = contents.getvalue()
+
+ pkgindex_filename = os.path.join(self.pkgdir, "Packages")
+ output_files = [(atomic_ofstream(pkgindex_filename), pkgindex_filename)]
+ if "compress-index" in self.settings.features:
+ gz_fname = pkgindex_filename + ".gz"
+ output_files.append((GzipFile(gz_fname, mode="wb"), gz_fname))
+ for f, fname in output_files:
+ f.write(contents)
+ f.close()
+ # some seconds might have elapsed since TIMESTAMP
+ os.utime(fname, (atime, mtime))
finally:
if pkgindex_lock:
unlockfile(pkgindex_lock)
--
1.7.8.6
* Re: [gentoo-portage-dev] [PATCH] Portage to write a compressed copy of 'Packages' index file
From: Zac Medico @ 2012-08-08 20:22 UTC
To: gentoo-portage-dev
On 08/08/2012 11:35 AM, W-Mark Kubacki wrote:
> Hi Portage devs,
>
> Can I send patches by `git send-email` or do you prefer attachments?
Either way is fine.
> The patch applies to master/HEAD and can be backported to 2.1*. Its
> description is as follows:
>
> Portage writes a compressed copy of 'Packages' index file.
>
> This behaviour is enabled by FEATURES="compress-index". The
> resulting file is 'Packages.gz' and its modification time will
> match that of 'Packages'.
>
> Web-servers use that copy to avoid repeated on-the-fly compression.
>
> In order to re-use 'atomic_ofstream' usage of 'codecs.zlib_codec'
> has been considered and discarded, because 'GzipFile' yields
> smaller files. (According to Mark's tests 62% smaller.)
>
> Example usage, Nginx:
>
> location =/Packages {
> gzip_static on;
> default_type text/plain;
> }
>
> Apache httpd (use with caution):
>
> RewriteRule ^(.*)/Packages$ $1/Packages.gz
> [T=text/plain,E=GZIP:gzip,L]
> <FilesMatch "Packages\.gz$">
> Header set Content-Encoding gzip
> </FilesMatch>
>
Thanks, I've applied your patch:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=11c0619c63b54346ee5c67cd67ab1ccb24f5f947
--
Thanks,
Zac
* Re: [gentoo-portage-dev] [PATCH] Portage to write a compressed copy of 'Packages' index file
From: Zac Medico @ 2012-08-08 21:00 UTC
To: gentoo-portage-dev
On 08/08/2012 11:35 AM, W-Mark Kubacki wrote:
> In order to re-use 'atomic_ofstream' usage of 'codecs.zlib_codec'
> has been considered and discarded, because 'GzipFile' yields
> smaller files. (According to Mark's tests 62% smaller.)
I've fixed it to use an atomic_ofstream as GzipFile's fileobj argument:
http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=e95a07267c7f642fdca2aca346ab4c12f46748bb
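
As a rough illustration of the fileobj approach (not the code from the
commit linked above), the Python 3 sketch below compresses into an
already-open file object. The helper name gzip_into is hypothetical, and an
io.BytesIO buffer stands in for Portage's atomic_ofstream, whose real
behaviour is not reproduced here. The point to note is that closing the
GzipFile leaves the underlying object open, so the caller still has to
close it.

```python
import gzip
import io


def gzip_into(fileobj, contents):
    """Compress `contents` into an already-open binary file object."""
    # The filename is only recorded in the gzip header; the data itself
    # goes to `fileobj`.
    gz = gzip.GzipFile(filename="Packages", mode="wb", fileobj=fileobj)
    try:
        gz.write(contents)
    finally:
        # Writes the gzip trailer but does NOT close `fileobj`.
        gz.close()


buf = io.BytesIO()  # stand-in for an atomic output stream
gzip_into(buf, b"PACKAGES: 0\n")
still_open = not buf.closed  # the caller remains responsible for closing
```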
--
Thanks,
Zac
* Re: [gentoo-portage-dev] [PATCH] Portage to write a compressed copy of 'Packages' index file
From: W-Mark Kubacki @ 2012-08-08 21:56 UTC
To: gentoo-portage-dev
On Wed, Aug 08, 2012 at 02:00:23PM -0700, Zac Medico wrote:
> On 08/08/2012 11:35 AM, W-Mark Kubacki wrote:
> > In order to re-use 'atomic_ofstream' usage of 'codecs.zlib_codec'
> > has been considered and discarded, because 'GzipFile' yields
> > smaller files. (According to Mark's tests 62% smaller.)
>
> I've fixed it to use an atomic_ofstream as GzipFile's fileobj argument:
>
> http://git.overlays.gentoo.org/gitweb/?p=proj/portage.git;a=commit;h=e95a07267c7f642fdca2aca346ab4c12f46748bb
I've noticed the differences between my initial patch and your commit.
Thank you!
Indeed, GzipFile not closing the underlying 'fileobj' makes things ugly.
Your previous and now detached commit (where 'os.pid' has been appended
to the file name and, after having been closed, the file was renamed)
made me wonder whether modification times would be preserved. Well,
that's an obsolete thought now.
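
For anyone with the same question about modification times: on POSIX
filesystems a rename does not alter the file's mtime, and the Python 3
sketch below checks that on the local system. The function name
rename_keeps_mtime is hypothetical, and the temporary-name scheme (process
ID appended) only imitates the approach described above.

```python
import os


def rename_keeps_mtime(directory, timestamp):
    """Write a temp file, set its mtime, rename it, and re-check the mtime."""
    final_path = os.path.join(directory, "Packages.gz")
    tmp_path = "%s.%d" % (final_path, os.getpid())
    with open(tmp_path, "wb") as f:
        f.write(b"placeholder")
    # Set the times after closing, then move the file into place.
    os.utime(tmp_path, (timestamp, timestamp))
    os.replace(tmp_path, final_path)
    return int(os.stat(final_path).st_mtime) == timestamp
```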
The example usage with Apache httpd is not complete. Some conditions are
missing. I have no Apache httpd at hand, but that's what I gather from
the documentation:
RewriteCond %{HTTP:Accept-encoding} gzip
RewriteCond %{REQUEST_FILENAME}\.gz -s
RewriteRule ^(.*)/Packages$ $1/Packages.gz [QSA,T=text/plain,E=no-gzip:1,L]
<FilesMatch "Packages\.gz$">
Header set Content-Encoding gzip
</FilesMatch>
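
The intent of the two conditions can be restated in a short Python 3
sketch: serve the compressed copy only if the client advertises gzip and a
non-empty 'Packages.gz' exists. This is an illustration of the decision,
not of how Apache httpd evaluates it. The name pick_variant is
hypothetical, and the substring test ignores quality values such as
"gzip;q=0".

```python
import os


def pick_variant(path, accept_encoding):
    """Return (file to serve, response headers) for a requested index."""
    gz_path = path + ".gz"
    # Like the unanchored pattern "gzip", this is a plain substring test.
    client_accepts_gzip = "gzip" in accept_encoding
    try:
        # Mirrors the "-s" test: the file exists and is larger than 0 bytes.
        gz_usable = os.path.getsize(gz_path) > 0
    except OSError:
        gz_usable = False
    if client_accepts_gzip and gz_usable:
        return gz_path, {"Content-Type": "text/plain",
                         "Content-Encoding": "gzip"}
    return path, {"Content-Type": "text/plain"}
```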
Serving 'Packages' as 'text/plain' keeps browsers from offering the file
as a download the way they would, say, a zip file. And one of my next
patches will introduce the 'Accept' HTTP header, which could come in handy
for future extensions and for content negotiation on the server side;
'text/plain+diff' anyone? ;-)
--
Regards, Mark