* [gentoo-scm] Git Conversion Validation
@ 2012-10-07 22:23 Rich Freeman
2012-10-07 22:37 ` Peter Stuge
0 siblings, 1 reply; 3+ messages in thread
From: Rich Freeman @ 2012-10-07 22:23 UTC
To: gentoo-scm
FYI - I started a repository of my git validation work at:
git://github.com/rich0/gitvalidate.git
I'm starting on the git side first. I'm taking all my data directly
from the git executables and plan to do the same for cvs - if they
output the same content we should be OK. I did some testing and I
think that my code should handle unicode output if git generates it.
The git repository has 1259922 commits, and it takes 50.5 seconds to
walk the list of commits to produce a list of trees and their commit info.
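A minimal sketch of that kind of commit walk, assuming a single `git log` call with a custom format string. This is not the code in the gitvalidate repository; the names `CommitInfo`, `parse_commit_lines` and `walk_commits` are made up for illustration. Only the commit hash, tree hash and author timestamp are requested, since none of them can contain whitespace and the output can be split safely.

```python
import subprocess
from typing import List, NamedTuple


class CommitInfo(NamedTuple):
    """Hypothetical record type: one row per commit."""
    commit: str
    tree: str
    author_time: int


def parse_commit_lines(raw: bytes) -> List[CommitInfo]:
    """Parse lines of the form '<commit sha> <tree sha> <unix time>'."""
    commits = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        # Hashes and timestamps are plain ASCII, so decoding is safe here.
        commit, tree, timestamp = line.decode("ascii").split()
        commits.append(CommitInfo(commit, tree, int(timestamp)))
    return commits


def walk_commits(repo_path: str) -> List[CommitInfo]:
    """Spawn git once and list every commit reachable from any ref."""
    raw = subprocess.check_output(
        ["git", "-C", repo_path, "log", "--all", "--format=%H %T %at"]
    )
    return parse_commit_lines(raw)
```

Because the whole history comes from one process, the per-commit cost is just parsing a line of text.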
Next step is to iteratively perform the map / reduce algorithm I
outlined earlier to get a per-file history similar to what cvs
captures.
Contributions welcome. I'm finding the main issue is cutting down the
overhead of spawning git processes to do the work. While it will make
for more work in theory I might just have git-ls-tree recurse the
trees to reduce the subprocess overhead and then just do the extra
sorting/de-duplication in python. I'm trying to avoid using git
implementations in python since that might expose us to bugs.
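A rough sketch of the recursive approach, assuming `git ls-tree -r -z` is run once per tree and the results are de-duplicated in Python afterwards. The function names are hypothetical. The `-z` flag makes git end each record with a NUL byte and skip path quoting, so paths are kept as raw bytes and never decoded.

```python
import subprocess
from typing import Iterable, List, Tuple

Entry = Tuple[bytes, str]  # (path as raw bytes, blob sha)


def parse_ls_tree(raw: bytes) -> List[Entry]:
    """Parse 'git ls-tree -r -z' output: '<mode> <type> <sha>\\t<path>\\0'."""
    entries = []
    for record in raw.split(b"\0"):
        if not record:
            continue
        meta, path = record.split(b"\t", 1)
        _mode, _objtype, sha = meta.split(b" ")
        entries.append((path, sha.decode("ascii")))
    return entries


def ls_tree_recursive(repo_path: str, tree_sha: str) -> List[Entry]:
    """One git process per tree instead of one per directory level."""
    raw = subprocess.check_output(
        ["git", "-C", repo_path, "ls-tree", "-r", "-z", tree_sha]
    )
    return parse_ls_tree(raw)


def unique_file_versions(trees: Iterable[List[Entry]]) -> List[Entry]:
    """Collapse the listings of many trees into distinct (path, sha) pairs."""
    seen = set()
    for entries in trees:
        seen.update(entries)
    return sorted(seen)
```

The trade-off is the one described above: every tree is listed in full, so most entries are repeats that the set has to discard.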
Rich
* Re: [gentoo-scm] Git Conversion Validation
2012-10-07 22:23 [gentoo-scm] Git Conversion Validation Rich Freeman
@ 2012-10-07 22:37 ` Peter Stuge
2012-10-08 2:11 ` Rich Freeman
0 siblings, 1 reply; 3+ messages in thread
From: Peter Stuge @ 2012-10-07 22:37 UTC
To: gentoo-scm
Rich Freeman wrote:
> I'm trying to avoid using git implementations in python since that
> might expose us to bugs.
Take a look at libgit2+pygit2.
//Peter
* Re: [gentoo-scm] Git Conversion Validation
2012-10-07 22:37 ` Peter Stuge
@ 2012-10-08 2:11 ` Rich Freeman
0 siblings, 0 replies; 3+ messages in thread
From: Rich Freeman @ 2012-10-08 2:11 UTC
To: gentoo-scm
On Sun, Oct 7, 2012 at 6:37 PM, Peter Stuge <peter@stuge.se> wrote:
> Rich Freeman wrote:
>> I'm trying to avoid using git implementations in python since that
>> might expose us to bugs.
>
> Take a look at libgit2+pygit2.
Well, my goal was to try to stick to the output of the official
commands, figuring that this is essentially the standard to go by. My
understanding is that subtle problems with character encodings and
such were found in past conversion efforts. If unusual characters
are being modified by the conversion program I want to avoid the
verification program making the same mistake and therefore obscuring
the problem.
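To illustrate the concern with a small made-up example (the byte strings and function names below are hypothetical, not taken from the Gentoo tree): if the verifier decodes text leniently before comparing, two different byte sequences can look identical, so a comparison on raw bytes is the safer check.

```python
def same_bytes(a: bytes, b: bytes) -> bool:
    """Strict check: the content matches only if every byte matches."""
    return a == b


def lossy_equal(a: bytes, b: bytes) -> bool:
    """Lenient check: decode as UTF-8, replacing anything invalid.

    Each invalid sequence becomes U+FFFD, so distinct inputs can collide.
    """
    return a.decode("utf-8", errors="replace") == b.decode(
        "utf-8", errors="replace"
    )


# Two different (invalid UTF-8) byte strings, as might result from
# a Latin-1 name that was damaged in transit.
original = b"caf\xe9"
mangled = b"caf\xff"

print(same_bytes(original, mangled))   # the strict check sees the change
print(lossy_equal(original, mangled))  # the lenient check does not
```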
That said, spawning git several million times is looking to be REALLY
slow, so I think I might bite the bullet and use a library. It seems
like pygit2 is designed to use unicode for everything.
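For comparison, one way to cut the spawn count while staying on the official executables is git's batch mode, where a single `git cat-file --batch-check` process answers many object queries on stdin. This is an alternative technique, not the pygit2 route discussed here, and the function names are made up for the sketch.

```python
import subprocess
from typing import Dict, Iterable, Optional, Tuple


def parse_batch_check(raw: bytes) -> Dict[str, Optional[Tuple[str, int]]]:
    """Parse '<sha> <type> <size>' lines; '<name> missing' maps to None."""
    result: Dict[str, Optional[Tuple[str, int]]] = {}
    for line in raw.decode("ascii").splitlines():
        fields = line.split()
        if len(fields) == 3:
            sha, objtype, size = fields
            result[sha] = (objtype, int(size))
        elif len(fields) == 2 and fields[1] == "missing":
            result[fields[0]] = None
    return result


def batch_check(
    repo_path: str, shas: Iterable[str]
) -> Dict[str, Optional[Tuple[str, int]]]:
    """Look up many objects with one git process instead of one each."""
    stdin = "".join(sha + "\n" for sha in shas).encode("ascii")
    proc = subprocess.run(
        ["git", "-C", repo_path, "cat-file", "--batch-check"],
        input=stdin,
        stdout=subprocess.PIPE,
        check=True,
    )
    return parse_batch_check(proc.stdout)
```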
And of course the risk that pygit2/etc has bugs really isn't
necessarily greater than the risk that my own stuff has bugs (though
knowing my intended use I can probably minimize the ones that count -
the logic really is simple).
The repository currently contains what should be a working
implementation (though it doesn't write the final list out to disk).
It is just WAY too slow to run (hence the command line parameter to
limit the number of commits examined).
Pretty busy for a few days, but I'll convert the git spawning and
output parsing to pygit2 calls. As an added bonus I don't have to
deal with the fact that git just LOVES to mangle its output to be
pleasing to eyes and less so to robots.
Rich