From: ross b girshick
Date: Tue, 24 Jun 2003 11:20:02 -0400 (EDT)
To: gentoo-dev@gentoo.org
In-Reply-To: <20030610104631.57e74b7d.citizen428@cargal.org>
Subject: [gentoo-dev] "Updating Portage Cache" optimizations

Hi,

Lately I've been bothered by how long it takes to update the portage
cache after doing an emerge [r]sync. So I decided to dive into the
portage code for the first time to do something about this. What I found
seems a little confusing and inefficient. So I'm asking people to clear
up any misconceptions I might have, and I'd like some feedback on a
_simple_ optimization.

The main time sink during the cache updating process is the function
portage.aux_get, which is called inside a doubly nested for loop.
aux_get either copies the metadata file out of
/usr/portage/metadata/cache/ into /var/cache/edb/dep/ or regenerates it
from the ebuild if the cached version is old. My laptop's hard drive is
pretty slow (4200 RPM, etc.), so this process of copying ~36MB of small
files takes about 4.5 minutes on average.

In most cases the metadata files are copied directly. I did a diff on
some categories in the /dep/ cache vs. the /metadata/ cache and found
that only a few files were regenerated.
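For anyone who wants to repeat that comparison, here is a minimal sketch
of the check. It is not portage code: the function name compare_category
is made up for illustration, it is written for a current Python 3 rather
than the Python that portage.py targets, and it assumes a category
directory has the same file names in both caches.

```python
import filecmp
import os


def compare_category(md_dir, dep_dir):
    """Compare one category directory of the metadata cache with the dep cache.

    Returns three lists of entry names:
      identical - same bytes in both directories (plain copies)
      different - present in both but not identical (presumably regenerated)
      missing   - present in md_dir but absent from dep_dir
    """
    # Only regular files in the metadata directory are considered.
    md_names = sorted(
        name for name in os.listdir(md_dir)
        if os.path.isfile(os.path.join(md_dir, name))
    )
    dep_names = set(os.listdir(dep_dir)) if os.path.isdir(dep_dir) else set()

    common = [name for name in md_names if name in dep_names]
    missing = [name for name in md_names if name not in dep_names]

    # shallow=False compares file contents rather than os.stat() signatures.
    identical, different, errors = filecmp.cmpfiles(
        md_dir, dep_dir, common, shallow=False
    )
    # Entries that could not be compared are counted as different.
    return identical, different + errors, missing
```

Calling it with a category directory under /usr/portage/metadata/cache/
and its counterpart under /var/cache/edb/dep/ gives the number of plain
copies as the length of the first list.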

So my first optimization, a whopping one-liner that symlinks each
metadata file instead of copying it, reduces the cache update time from
4.5 minutes to 2.25 minutes on my system (and saves about 35MB of disk
space). Based on the code, I think a lot of other optimizations could be
added too (such as symlinking whole category directories when nothing in
them is regenerated).

So far I've had no problems after making this change. Can anyone think
of how this would introduce a bug?

Thanks,
Ross Girshick

p.s. I've been using gentoo for quite a while now, but I've just started
getting into the dev side of it. What's the proper channel for
submitting patches?

Here's the patch:

--- portage.py.orig	2003-06-24 10:13:49.000000000 -0400
+++ portage.py	2003-06-24 10:15:04.000000000 -0400
@@ -3400,7 +3400,8 @@
 					if not os.path.exists(mydir):
 						os.makedirs(mydir, 2775)
 						os.chown(mydir,uid,portage_gid)
-					shutil.copy2(mymdkey, mydbkey)
+					#shutil.copy2(mymdkey, mydbkey)
+					os.symlink(mymdkey, mydbkey)
 					usingmdcache=1
 				except Exception,e:
 					print "!!! Unable to copy '"+mymdkey+"' to '"+mydbkey+"'"

--
gentoo-dev@gentoo.org mailing list