public inbox for gentoo-scm@lists.gentoo.org
From: Rich Freeman <rich0@gentoo.org>
To: gentoo-scm@lists.gentoo.org
Subject: [gentoo-scm] Git Conversion Validation
Date: Sun, 7 Oct 2012 18:23:16 -0400
Message-ID: <CAGfcS_=hPO_AcY8pAwC70x-C0AbSUUFxKi7PpyRsQk1iGsLiGg@mail.gmail.com>

FYI - I started a repository of my git validation work at:
git://github.com/rich0/gitvalidate.git

I'm starting on the git side first.  I'm taking all my data directly
from the git executables and plan to do the same for cvs; if both
produce the same content, we should be OK.  I did some testing, and I
think my code should handle Unicode output if git generates it.
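A minimal sketch of reading git output as bytes and decoding it
explicitly, assuming UTF-8 is the expected encoding.  The helper names
run_git and decode_git_output are hypothetical, not taken from the
gitvalidate repository.  'surrogateescape' keeps any non-UTF-8 bytes
(e.g. a legacy Latin-1 author name) recoverable instead of silently
replacing them:

```python
import subprocess


def decode_git_output(raw: bytes) -> str:
    """Decode raw bytes captured from a git subprocess.

    UTF-8 is assumed.  Invalid bytes become lone surrogates via
    'surrogateescape', so encoding the result with the same error
    handler gives back exactly the original bytes.
    """
    return raw.decode("utf-8", errors="surrogateescape")


def run_git(args, repo="."):
    """Hypothetical helper: run git in `repo` and return decoded stdout.

    stdout is captured as bytes (no text=True) so decoding stays explicit.
    """
    raw = subprocess.run(
        ["git", *args],
        cwd=repo,
        check=True,
        stdout=subprocess.PIPE,
    ).stdout
    return decode_git_output(raw)
```

Comparing git and cvs output then reduces to comparing strings, and
odd bytes in either side survive the round trip unchanged.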

The git repository has 1259922 commits, and it takes 50.5 seconds to
walk them and produce a list of trees and their commit info.
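One way to get every commit with its tree in a single git process is a
NUL-separated log format, parsed in Python.  This is a sketch under
assumptions: the LOG_FORMAT string and the CommitInfo fields are
hypothetical choices, not necessarily what gitvalidate does:

```python
from typing import List, NamedTuple

# Hypothetical format for `git log --all --format=...`: one line per
# commit, fields separated by NUL (%x00) so spaces in names are safe.
LOG_FORMAT = "%H%x00%T%x00%P%x00%at%x00%an"


class CommitInfo(NamedTuple):
    commit: str         # commit SHA-1
    tree: str           # root tree SHA-1
    parents: List[str]  # empty for a root commit, 2+ for merges
    author_time: int    # author timestamp, seconds since the epoch
    author: str


def parse_commit_log(text: str) -> List[CommitInfo]:
    """Parse decoded output of `git log --all --format=<LOG_FORMAT>`."""
    commits = []
    # Split on "\n" only; str.splitlines() would also split on Unicode
    # line separators that could appear inside an author name.
    for line in text.split("\n"):
        if not line:
            continue
        sha, tree, parents, when, author = line.split("\x00")
        commits.append(CommitInfo(sha, tree, parents.split(), int(when), author))
    return commits
```

The input would come from something like
`git log --all --format=%H%x00%T%x00%P%x00%at%x00%an`.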

The next step is to iteratively run the map/reduce algorithm I
outlined earlier to get a per-file history similar to what cvs
records.
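The earlier outline is not repeated in this message, so the following
is only a hypothetical illustration of the general shape: the map step
emits (path, (time, commit, blob)) for every file in every commit's
tree, and the reduce step sorts each path's entries and keeps only the
commits where the blob changed, which resembles a cvs revision list.
It assumes ordering by commit time is good enough and does not detect
file deletions:

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Entry = Tuple[int, str, str]                # (commit time, commit, blob)
Snapshot = Tuple[int, str, Dict[str, str]]  # (time, commit, {path: blob})


def map_snapshot(snapshot: Snapshot) -> Iterable[Tuple[str, Entry]]:
    """Map step: emit one keyed entry per file present in a commit's tree."""
    when, commit, files = snapshot
    for path, blob in files.items():
        yield path, (when, commit, blob)


def reduce_path(entries: List[Entry]) -> List[Entry]:
    """Reduce step: keep only the commits where this file's content changed."""
    history: List[Entry] = []
    last_blob = None
    for when, commit, blob in sorted(entries):
        if blob != last_blob:
            history.append((when, commit, blob))
            last_blob = blob
    return history


def per_file_history(snapshots: Iterable[Snapshot]) -> Dict[str, List[Entry]]:
    """Group mapped entries by path, then reduce each group."""
    grouped: Dict[str, List[Entry]] = defaultdict(list)
    for snap in snapshots:
        for path, entry in map_snapshot(snap):
            grouped[path].append(entry)
    return {path: reduce_path(entries) for path, entries in grouped.items()}
```

Because each path is reduced independently, the groups could be
processed in parallel or in separate passes.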

Contributions are welcome.  The main issue I'm finding is cutting down
the overhead of spawning git processes to do the work.  Although it
means more work in theory, I might just have git-ls-tree recurse the
trees to reduce the subprocess overhead, and then do the extra
sorting/de-duplication in Python.  I'm trying to avoid Python
implementations of git, since they might expose us to their bugs.
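A sketch of the "let git-ls-tree recurse, de-duplicate in Python"
approach: one `git ls-tree -r -z` call per tree, then parsing the
NUL-terminated records ('<mode> SP <type> SP <object> TAB <path>').
The function names are hypothetical; paths stay as bytes because -z
emits them unquoted and they may not be valid UTF-8:

```python
import subprocess
from typing import Iterable, Iterator, Set, Tuple


def parse_ls_tree_z(raw: bytes) -> Iterator[Tuple[str, str, str, bytes]]:
    """Yield (mode, type, sha, path) from `git ls-tree -r -z` output."""
    for record in raw.split(b"\0"):
        if not record:
            continue
        # The metadata part never contains a tab; the path may.
        meta, path = record.split(b"\t", 1)
        mode, obj_type, sha = meta.decode("ascii").split(" ")
        yield mode, obj_type, sha, path


def list_tree(tree_sha: str, repo: str = ".") -> bytes:
    """One subprocess per tree; -r makes git itself recurse into subtrees."""
    return subprocess.run(
        ["git", "ls-tree", "-r", "-z", tree_sha],
        cwd=repo,
        check=True,
        stdout=subprocess.PIPE,
    ).stdout


def unique_blobs(listings: Iterable[bytes]) -> Set[Tuple[bytes, str]]:
    """De-duplicate (path, blob) pairs that repeat across many commits."""
    seen: Set[Tuple[bytes, str]] = set()
    for raw in listings:
        for _mode, obj_type, sha, path in parse_ls_tree_z(raw):
            if obj_type == "blob":  # skips submodule ("commit") entries
                seen.add((path, sha))
    return seen
```

Since the listing of a given tree SHA never changes, a further
(assumed, untested) saving would be to skip list_tree for any tree SHA
already seen.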

Rich

