public inbox for gentoo-user@lists.gentoo.org
From: Alan McKinnon <alan.mckinnon@gmail.com>
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Mirroring part of tinderbox.dev.gentoo.org
Date: Thu, 24 Oct 2013 08:06:44 +0200
Message-ID: <5268B8F4.2090101@gmail.com>
In-Reply-To: <CAKXfNZG5q+OOQ6RRAD=RP5KzdwbXa=97pyJj9eLQzudnDubZTw@mail.gmail.com>

On 23/10/2013 20:40, Noah McNallie wrote:
> I'm trying to mirror
> 'http://tinderbox.dev.gentoo.org/default-linux/sparc' because I run a
> sparcv9 machine that is quite old and not good for compiling. This is
> the only gentoo sparc binary repo that I know of and I'd like to have a
> local copy that I would make available on a webserver.
> 
> I'd like to know the best way to mirror it. I have been trying with the
> following command:
> 
> 'wget -mkN -np http://tinderbox.dev.gentoo.org/default-linux/sparc/'
> 
> -m to mirror -N to check timestamps and only update and -np not to
> access the parent directory.
> 
> This will get the files but the second time around it does not update to
> only changed files but tries to grab everything again.
> 
> Could someone point me in the direction for mirroring this lovely repo?
> 
> Noah McNallie


You are downloading tbz files, not html files, so every file downloaded
has this in the output to the console:


Reusing existing connection to tinderbox.dev.gentoo.org:80.
HTTP request sent, awaiting response... 200 OK
Length: 104635 (102K) [application/octet-stream]
Last-modified header missing -- time-stamps turned off.
                                ^^^^^^^^^^^^^^^^^^^^^^

This leaves wget only one option: it cannot confirm that the file is
unchanged, so it has to download it again just to be sure. I don't know
of any wget option that assumes existing local files with the _same
size_ as the remote files are identical and skips them. That would
indeed be very dodgy and unsafe.
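
To make the decision concrete, here is a rough Python sketch of the kind
of check that timestamping depends on. It is not wget's actual code; the
function name needs_download and its parameters are made up for
illustration. The point is the early return: without a Last-Modified
header there is nothing to compare against, so the only safe answer is
"fetch it again".

```python
from email.utils import parsedate_to_datetime


def needs_download(headers, local_size, local_mtime):
    """Return True if the remote file should be fetched again.

    headers     -- dict of HTTP response headers for the remote file
    local_size  -- size of the local copy in bytes, or None if absent
    local_mtime -- local modification time (POSIX timestamp), or None

    Hypothetical helper for illustration only, not wget's real logic.
    """
    if local_size is None or local_mtime is None:
        return True  # no local copy yet

    # HTTP header names are case-insensitive, so normalise them first.
    h = {k.lower(): v for k, v in headers.items()}

    last_modified = h.get("last-modified")
    if last_modified is None:
        # The case from the log above: no timestamp, nothing to compare.
        return True

    try:
        remote_mtime = parsedate_to_datetime(last_modified).timestamp()
    except (TypeError, ValueError):
        return True  # unparseable date, treat as unknown

    if remote_mtime > local_mtime:
        return True  # remote copy is newer

    length = h.get("content-length")
    if length is not None and int(length) != local_size:
        return True  # same or older timestamp but different size

    return False
```

Fed the headers from the log above (a Content-Length but no
Last-Modified), this returns True for every file, which is exactly the
behaviour you are seeing.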

rsync was developed to, amongst other things, work around this kind of
problem: the protocol transmits the information needed to make this
decision instead of relying on HTTP headers. Unfortunately,
tinderbox.dev is not running rsyncd :-(

Re-downloading everything every time seems to be your only option. I
don't imagine that repo changes all that much over time though, so why
not just re-sync infrequently, like once a week maybe?

-- 
Alan McKinnon
alan.mckinnon@gmail.com


