public inbox for gentoo-user@lists.gentoo.org
* [gentoo-user] Traffic volumes for distfiles mirror
@ 2007-01-29  8:39 Alan McKinnon
  2007-01-29  8:47 ` kashani
  0 siblings, 1 reply; 11+ messages in thread
From: Alan McKinnon @ 2007-01-29  8:39 UTC
  To: gentoo-user

Hi,

I'm considering setting up a local distfiles and portage mirror here in 
the office. Bandwidth volumes from updates during the day are starting 
to make the network admin nervous. But first I need some current 
numbers, does anyone know approximate answers to these questions:

1. How big is a complete distfiles mirror currently?
2. On average, how much daily bandwidth to keep it up to date?
3. How much daily bandwidth does updating the portage tree use?

Thanks,

alan

-- 
gentoo-user@gentoo.org mailing list




* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29  8:39 [gentoo-user] Traffic volumes for distfiles mirror Alan McKinnon
@ 2007-01-29  8:47 ` kashani
  2007-01-29  9:15   ` Neil Bothwick
  0 siblings, 1 reply; 11+ messages in thread
From: kashani @ 2007-01-29  8:47 UTC
  To: gentoo-user

Alan McKinnon wrote:
> Hi,
> 
> I'm considering setting up a local distfiles and portage mirror here in 
> the office. Bandwidth volumes from updates during the day are starting 
> to make the network admin nervous. But first I need some current 
> numbers, does anyone know approximate answers to these questions:
> 
> 1. How big is a complete distfiles mirror currently?
> 2. On average, how much daily bandwidth to keep it up to date?
> 3. How much daily bandwidth does updating the portage tree use?
> 

I wouldn't bother with a full mirror. Set a local rsync server that 
updates once a day and use http-replicator. That would be far less 
bandwidth than trying to keep a local dist server current.

kashani

* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29  8:47 ` kashani
@ 2007-01-29  9:15   ` Neil Bothwick
  2007-01-29  9:50     ` Alan McKinnon
  0 siblings, 1 reply; 11+ messages in thread
From: Neil Bothwick @ 2007-01-29  9:15 UTC
  To: gentoo-user

On Mon, 29 Jan 2007 00:47:47 -0800, kashani wrote:

> I wouldn't bother with a full mirror. Set a local rsync server that 
> updates once a day and use http-replicator. That would be far less 
> bandwidth than trying to keep a local dist server current.

If daytime bandwidth is a particular issue, you can set up a cron task on
one or more machines (depending on the variety of packages in use) to do

emerge --sync && emerge -uDNf world

to prime the cache during the night. That should reduce your daytime
downloads to almost zero.
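
As a rough sketch of that cron task, assuming a hypothetical script path
/usr/local/sbin/prefetch-distfiles.sh and a 02:00 schedule (both are
placeholders, not anything from this thread), the two emerge calls could
be wrapped like this. The EMERGE variable is only there so the script can
be tried with a stand-in command first.

```shell
#!/bin/sh
# Sketch of a nightly prefetch job.
# Hypothetical crontab entry (02:00 every night):
#   0 2 * * * /usr/local/sbin/prefetch-distfiles.sh --run

# EMERGE can be overridden for a dry run; it defaults to the real emerge.
EMERGE="${EMERGE:-emerge}"

nightly_prefetch() {
    # Refresh the tree first; only fetch if the sync succeeded.
    # -u update, -D deep, -N newuse, -f fetch only (nothing is built).
    "$EMERGE" --sync && "$EMERGE" -uDNf world
}

# Do the work only when asked to, so the file can also be sourced safely.
if [ "${1:-}" = "--run" ]; then
    nightly_prefetch
fi
```

Because of -f, the job only downloads sources; the actual updates can
still be run by hand during the day from the files already fetched.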


-- 
Neil Bothwick

Death is proven to be 99.9% fatal to all laboratory rats.


* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29  9:15   ` Neil Bothwick
@ 2007-01-29  9:50     ` Alan McKinnon
  2007-01-29 12:11       ` Neil Bothwick
  2007-01-29 17:45       ` Daniel da Veiga
  0 siblings, 2 replies; 11+ messages in thread
From: Alan McKinnon @ 2007-01-29  9:50 UTC
  To: gentoo-user

On Monday 29 January 2007 11:15, Neil Bothwick wrote:
> On Mon, 29 Jan 2007 00:47:47 -0800, kashani wrote:
> > I wouldn't bother with a full mirror. Set a local rsync server that
> > updates once a day and use http-replicator. That would be far less
> > bandwidth than trying to keep a local dist server current.
>
> If daytime bandwidth is a particular issue, you can set up a cron
> task on one or more machines (depending on the variety of packages in
> use) to do
>
> emerge --sync && emerge -uDNf world
>
> to prime the cache during the night. That should reduce your daytime
> downloads to almost zero.

The daytime bandwidth is indeed the issue. This is South Africa, where 
technologically everything is top-notch first-world. Except for 
bandwidth. By local standards our pipe is quite big - a whopping 512k. 
Shared amongst two offices and 140 users. At least I get to do whatever 
I want with the bandwidth after hours - no real users to compete with, 
just their torrents :-)

I already use a fairly complicated solution with emerge -pvf and wget in
a cron on one of the fileservers, but it's getting cumbersome. And I'd 
rather not maintain an entire gentoo install on a server simply to act 
as a proxy. Would I be right in saying that I'd have to keep 
the "proxy" machine up to date to avoid the inevitable blockers that 
will happen in short order if I don't?

I've been looking into kashani's suggestion of http-replicator, this 
might be a good interim solution till I can come up with something 
better suited to our needs.

Thanks

alan


* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29  9:50     ` Alan McKinnon
@ 2007-01-29 12:11       ` Neil Bothwick
  2007-01-29 16:38         ` Harm Geerts
  2007-01-29 17:59         ` Alan McKinnon
  2007-01-29 17:45       ` Daniel da Veiga
  1 sibling, 2 replies; 11+ messages in thread
From: Neil Bothwick @ 2007-01-29 12:11 UTC
  To: gentoo-user

On Mon, 29 Jan 2007 11:50:34 +0200, Alan McKinnon wrote:

> I already use a fairly complicated solution with emerge -pvf and wget in
> a cron on one of the fileservers, but it's getting cumbersome. And I'd 
> rather not maintain an entire gentoo install on a server simply to act 
> as a proxy. Would I be right in saying that I'd have to keep 
> the "proxy" machine up to date to avoid the inevitable blockers that 
> will happen in short order if I don't?
> 
> I've been looking into kashani's suggestion of http-replicator, this 
> might be a good interim solution till I can come up with something 
> better suited to our needs.

I was suggesting the emerge -uDNf world in combination with
http-replicator. The first request forces http-replicator to download the
files; all other requests for those files are then handled locally. So if
you run this on a suitable cross-section of machines overnight,
http-replicator's cache will be primed by the time you stumble
bleary-eyed into the office.

If all your machines run a similar mix of software, say KDE desktops, you
only need to run the cron task on one of them.

I use a slightly different approach here, with an NFS mounted $DISTDIR
for all machines and one of them doing emerge -f world each morning. It's
simpler to set up than http-replicator but is less scalable since you'll
get problems if one machine tries to download a file while another is
partway through downloading it.
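
A minimal sketch of priming the cache through the proxy, assuming
http-replicator is reachable at the hypothetical address
replicator.example.lan on port 8080 (an assumed port; check your own
configuration) and that the fetcher honours the http_proxy variable, as
wget does. The same setting would more usually be made permanent in
/etc/make.conf on each client.

```shell
#!/bin/sh
# Hypothetical proxy address; replace with the real http-replicator host.
PROXY_URL="${PROXY_URL:-http://replicator.example.lan:8080}"

# EMERGE can be overridden for a dry run; it defaults to the real emerge.
EMERGE="${EMERGE:-emerge}"

prime_cache() {
    # Fetch-only run routed through the proxy, so every file requested
    # here ends up in the proxy's cache for the other machines.
    http_proxy="$PROXY_URL" "$EMERGE" -uDNf world
}
```

Calling prime_cache from a nightly cron job on each representative
machine gives the overnight priming described above.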


-- 
Neil Bothwick

Most software is about as user-friendly as a cornered rat!


* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29 12:11       ` Neil Bothwick
@ 2007-01-29 16:38         ` Harm Geerts
  2007-01-29 17:28           ` Neil Bothwick
  2007-01-29 17:59         ` Alan McKinnon
  1 sibling, 1 reply; 11+ messages in thread
From: Harm Geerts @ 2007-01-29 16:38 UTC
  To: gentoo-user

On Mon, January 29, 2007 13:11, Neil Bothwick wrote:
> On Mon, 29 Jan 2007 11:50:34 +0200, Alan McKinnon wrote:
>
>> I already use a fairly complicated solution with emerge -pvf and wget in
>> a cron on one of the fileservers, but it's getting cumbersome. And I'd
>> rather not maintain an entire gentoo install on a server simply to act
>> as a proxy. Would I be right in saying that I'd have to keep
>> the "proxy" machine up to date to avoid the inevitable blockers that
>> will happen in short order if I don't?
>>
>> I've been looking into kashani's suggestion of http-replicator, this
>> might be a good interim solution till I can come up with something
>> better suited to our needs.
>
> I was suggesting the emerge -uDNf world in combination with
> http-replicator. The first request forces http-replicator to download the
> files; all other requests for those files are then handled locally. So if
> you run this on a suitable cross-section of machines overnight,
> http-replicator's cache will be primed by the time you stumble
> bleary-eyed into the office.
>
> If all your machines run a similar mix of software, say KDE desktops, you
> only need to run the cron task on one of them.
>
> I use a slightly different approach here, with an NFS mounted $DISTDIR
> for all machines and one of them doing emerge -f world each morning. It's
> simpler to set up than http-replicator but is less scalable since you'll
> get problems if one machine tries to download a file while another is
> partway through downloading it.

portage uses locking for distfiles so if your share is writeable you
wouldn't have any need for http-replicator. The locks are kept in
$DISTDIR/.locks/

I'm sharing my distfiles over nfs myself and I haven't had any problems.
portage also takes care of stale lockfiles, the masterclient truncates the
lockfile and the other clients fill the lockfile with data. If a threshold
is met the lock is discarded.
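
To illustrate the general idea of per-file locking in a shared directory
(this is NOT how portage implements it; it is a generic mkdir-based
sketch with a made-up lock naming scheme and no stale-lock handling):

```shell
#!/bin/sh
# Run a command while holding a per-file lock under DISTDIR/.locks.
# Usage: with_distfile_lock DISTDIR FILENAME COMMAND [ARGS...]
with_distfile_lock() {
    lock_root="$1/.locks"
    lock_path="$lock_root/$2.lockdir"   # hypothetical naming scheme
    shift 2
    mkdir -p "$lock_root" || return 1
    # mkdir either creates the directory or fails, so only one caller
    # at a time gets past this loop; the others poll once a second.
    until mkdir "$lock_path" 2>/dev/null; do
        sleep 1
    done
    "$@"
    status=$?
    # Release the lock and pass the command's exit status through.
    rmdir "$lock_path"
    return "$status"
}
```

A second machine asking for the same file simply waits until the first
one has finished, instead of writing to a half-downloaded file. How well
any locking scheme behaves over NFS depends on the server and client, so
treat this purely as an illustration.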

* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29 16:38         ` Harm Geerts
@ 2007-01-29 17:28           ` Neil Bothwick
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Bothwick @ 2007-01-29 17:28 UTC
  To: gentoo-user

On Mon, 29 Jan 2007 17:38:07 +0100, Harm Geerts wrote:

> > I use a slightly different approach here, with an NFS mounted $DISTDIR
> > for all machines and one of them doing emerge -f world each morning.
> > It's simpler to set up than http-replicator but is less scalable
> > since you'll get problems if one machine tries to download a file
> > while another is partway through downloading it.  
> 
> portage uses locking for distfiles so if your share is writeable you
> wouldn't have any need for http-replicator. The locks are kept in
> $DISTDIR/.locks/
> 
> I'm sharing my distfiles over nfs myself and I haven't had any problems.
> portage also takes care of stale lockfiles, the masterclient truncates
> the lockfile and the other clients fill the lockfile with data. If a
> threshold is met the lock is discarded.

You're absolutely right. I set things up like this a long time ago, when
portage's lockfiles didn't work over NFS. I've been avoiding the
"problem" for so long I'd forgotten it was fixed :(


-- 
Neil Bothwick

Computer apathy error: don't bother striking any key.


* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29  9:50     ` Alan McKinnon
  2007-01-29 12:11       ` Neil Bothwick
@ 2007-01-29 17:45       ` Daniel da Veiga
  2007-01-29 22:22         ` Mick
  1 sibling, 1 reply; 11+ messages in thread
From: Daniel da Veiga @ 2007-01-29 17:45 UTC
  To: gentoo-user

On 1/29/07, Alan McKinnon <alan@linuxholdings.co.za> wrote:
> On Monday 29 January 2007 11:15, Neil Bothwick wrote:
> > On Mon, 29 Jan 2007 00:47:47 -0800, kashani wrote:
> > > I wouldn't bother with a full mirror. Set a local rsync server that
> > > updates once a day and use http-replicator. That would be far less
> > > bandwidth than trying to keep a local dist server current.
> >
> > If daytime bandwidth is a particular issue, you can set up a cron
> > task on one or more machines (depending on the variety of packages in
> > use) to do
> >
> > emerge --sync && emerge -uDNf world
> >
> > to prime the cache during the night. That should reduce your daytime
> > downloads to almost zero.
>
> The daytime bandwidth is indeed the issue. This is South Africa, where
> technologically everything is top-notch first-world. Except for
> bandwidth. By local standards our pipe is quite big - a whopping 512k.
> Shared amongst two offices and 140 users. At least I get to do whatever
> I want with the bandwidth after hours - no real users to compete with,
> just their torrents :-)
>
> I already use a fairly complicated solution with emerge -pvf and wget in
> a cron on one of the fileservers, but it's getting cumbersome. And I'd
> rather not maintain an entire gentoo install on a server simply to act
> as a proxy. Would I be right in saying that I'd have to keep
> the "proxy" machine up to date to avoid the inevitable blockers that
> will happen in short order if I don't?
>
> I've been looking into kashani's suggestion of http-replicator, this
> might be a good interim solution till I can come up with something
> better suited to our needs.
>

I'm using a different setup, of course it's a small number of machines
(like 5 or 6), but it works great. I use NFS to mount
/usr/portage/distfiles on a server sharing this dir. Each time someone
requests a file, it goes directly to the shared dir, being available
for all machines. This way, it's only 1 request per new file, and only
files that are needed for update of the particular software most
machines have in common.

-- 
Daniel da Veiga
Computer Operator - RS - Brazil
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GCM/IT/P/O d-? s:- a? C++$ UBLA++ P+ L++ E--- W+++$ N o+ K- w O M- V-
PS PE Y PGP- t+ 5 X+++ R+* tv b+ DI+++ D+ G+ e h+ r+ y++
------END GEEK CODE BLOCK------

* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29 12:11       ` Neil Bothwick
  2007-01-29 16:38         ` Harm Geerts
@ 2007-01-29 17:59         ` Alan McKinnon
  2007-01-30  9:23           ` Neil Bothwick
  1 sibling, 1 reply; 11+ messages in thread
From: Alan McKinnon @ 2007-01-29 17:59 UTC
  To: gentoo-user

On Monday 29 January 2007 14:11, Neil Bothwick wrote:
> On Mon, 29 Jan 2007 11:50:34 +0200, Alan McKinnon wrote:
> > I already use a fairly complicated solution with emerge -pvf and
> > wget in a cron on one of the fileservers, but it's getting
> > cumbersome. And I'd rather not maintain an entire gentoo install on
> > a server simply to act as a proxy. Would I be right in saying that
> > I'd have to keep the "proxy" machine up to date to avoid the
> > inevitable blockers that will happen in short order if I don't?
> >
> > I've been looking into kashani's suggestion of http-replicator,
> > this might be a good interim solution till I can come up with
> > something better suited to our needs.
>
> I was suggesting the emerge -uDNf world in combination with
> http-replicator. The first request forces http-replicator to download
> the files; all other requests for those files are then handled
> locally. 

OK, that does make more sense. It's what I first thought you meant but 
then I (stupidly) thought I'd assumed wrongly...

> So if you run this on a suitable cross-section of machines 
> overnight, http-replicator's cache will be primed by the time you
> stumble bleary-eyed into the office.

That has to be the most accurate description of my typical mornings I've 
ever read anywhere... :-)

> If all your machines run a similar mix of software, say KDE desktops,
> you only need to run the cron task on one of them.

Um, that's the hard part. Here's KDE, Gnome, Fluxbox, e17 - just for 
WMs. All machines are ~x86 but that's where the similarities end. I 
suppose I could set up a master machine whose world is a combination of 
all the clients. But whatever I choose, the solution does not appear to
be simple :-(


alan

* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29 17:45       ` Daniel da Veiga
@ 2007-01-29 22:22         ` Mick
  0 siblings, 0 replies; 11+ messages in thread
From: Mick @ 2007-01-29 22:22 UTC
  To: gentoo-user

On Monday 29 January 2007 17:45, Daniel da Veiga wrote:
> On 1/29/07, Alan McKinnon <alan@linuxholdings.co.za> wrote:
> > On Monday 29 January 2007 11:15, Neil Bothwick wrote:
> > > On Mon, 29 Jan 2007 00:47:47 -0800, kashani wrote:
> > > > I wouldn't bother with a full mirror. Set a local rsync server that
> > > > updates once a day and use http-replicator. That would be far less
> > > > bandwidth than trying to keep a local dist server current.
> > >
> > > If daytime bandwidth is a particular issue, you can set up a cron
> > > task on one or more machines (depending on the variety of packages in
> > > use) to do
> > >
> > > emerge --sync && emerge -uDNf world
> > >
> > > to prime the cache during the night. That should reduce your daytime
> > > downloads to almost zero.
> >
> > The daytime bandwidth is indeed the issue. This is South Africa, where
> > technologically everything is top-notch first-world. Except for
> > bandwidth. By local standards our pipe is quite big - a whopping 512k.
> > Shared amongst two offices and 140 users. At least I get to do whatever
> > I want with the bandwidth after hours - no real users to compete with,
> > just their torrents :-)
> >
> > I already use a fairly complicated solution with emerge -pvf and wget in
> > a cron on one of the fileservers, but it's getting cumbersome. And I'd
> > rather not maintain an entire gentoo install on a server simply to act
> > as a proxy. Would I be right in saying that I'd have to keep
> > the "proxy" machine up to date to avoid the inevitable blockers that
> > will happen in short order if I don't?
> >
> > I've been looking into kashani's suggestion of http-replicator, this
> > might be a good interim solution till I can come up with something
> > better suited to our needs.
>
> I'm using a different setup, of course it's a small number of machines
> (like 5 or 6), but it works great. I use NFS to mount
> /usr/portage/distfiles on a server sharing this dir. Each time someone
> requests a file, it goes directly to the shared dir, being available
> for all machines. This way, it's only 1 request per new file, and only
> files that are needed for update of the particular software most
> machines have in common.

I've set up rsyncd and Boa on the server machine (laptop) which has its 
portage and distfiles updated daily at the office.  Then once a week or so I 
rsync the portage of the home machines with the laptop and they fetch any 
needed distfiles from the Boa server.  For details regarding the set up of 
Boa there was a thread a year or so ago on this list.

Of course there's the odd package that only exists on the LAN machines - they 
pull this off the Internet.  They also insist on downloading afresh certain
binaries (e.g. Opera browser) and some CVS packages.  I guess this is ebuild 
related, was thinking of looking into it with the thought of modifying it one 
day so that all available distfiles are pulled in from the Boa server.
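
For reference, a sketch of what the client side of such a setup might
look like in /etc/make.conf on a home machine. The hostname
laptop.example.lan and the rsync module name gentoo-portage are
assumptions for illustration, not details of my actual configuration.
Portage appends /distfiles/<filename> to each mirror URL, so the web
server needs to expose the files under that path.

```shell
# /etc/make.conf fragment on a home machine (sketch only)

# Hypothetical rsync module exported by rsyncd on the laptop.
SYNC="rsync://laptop.example.lan/gentoo-portage"

# Mirrors are tried in order: the LAN web server first, then an
# official mirror for anything the laptop does not have.
GENTOO_MIRRORS="http://laptop.example.lan http://distfiles.gentoo.org"
```

Ebuilds that set RESTRICT="mirror" skip GENTOO_MIRRORS and fetch straight
from their upstream URL, which may be part of the reason some packages
are always downloaded afresh.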

HTH.
-- 
Regards,
Mick


* Re: [gentoo-user] Traffic volumes for distfiles mirror
  2007-01-29 17:59         ` Alan McKinnon
@ 2007-01-30  9:23           ` Neil Bothwick
  0 siblings, 0 replies; 11+ messages in thread
From: Neil Bothwick @ 2007-01-30  9:23 UTC
  To: gentoo-user

On Mon, 29 Jan 2007 19:59:53 +0200, Alan McKinnon wrote:

> > So if you run this on a suitable cross-section of machines 
> > overnight, http-replicator's cache will be primed by the time you
> > stumble bleary-eyed into the office.  
> 
> That has to be the most accurate description of my typical mornings
> I've ever read anywhere... :-)

If we were meant to turn up at work wide awake, $DEITY wouldn't have
given us coffee machines :)

> > If all your machines run a similar mix of software, say KDE desktops,
> > you only need to run the cron task on one of them.  
> 
> Um, that's the hard part. Here's KDE, Gnome, Fluxbox, e17 - just for 
> WMs. All machines are ~x86 but that's where the similarities end. I 
> suppose I could set up a master machine whose world is a combination of 
> all the clients. But whatever I choose, the solution does not appear to
> be simple :-(

There's nothing to stop you installing all the DE/WMs on one box, it
doesn't have to use them all, or run emerge -uf world on more than one.

I guess you could also join all your world files into one, remove dupes
and do something like "emerge -uf system; xargs emerge -uDf <masterfile".


-- 
Neil Bothwick

Don't be humble, you're not that great.


end of thread

Thread overview: 11+ messages
2007-01-29  8:39 [gentoo-user] Traffic volumes for distfiles mirror Alan McKinnon
2007-01-29  8:47 ` kashani
2007-01-29  9:15   ` Neil Bothwick
2007-01-29  9:50     ` Alan McKinnon
2007-01-29 12:11       ` Neil Bothwick
2007-01-29 16:38         ` Harm Geerts
2007-01-29 17:28           ` Neil Bothwick
2007-01-29 17:59         ` Alan McKinnon
2007-01-30  9:23           ` Neil Bothwick
2007-01-29 17:45       ` Daniel da Veiga
2007-01-29 22:22         ` Mick
