public inbox for gentoo-project@lists.gentoo.org
* [gentoo-project] dipper.gentoo.org outage post-mortem
@ 2016-05-19 18:44 Robin H. Johnson
  2016-05-19 19:24 ` Kristian Fiskerstrand
  2016-05-19 23:47 ` [gentoo-project] Replacement hardware planning for Gentoo VM hosting Robin H. Johnson
From: Robin H. Johnson @ 2016-05-19 18:44 UTC
  To: gentoo-core, gentoo-project


Summary
-------
- dipper.gentoo.org suffered a major motherboard failure on Friday, May 13th.
- The outage started around 2016/05/13 08h56 UTC, and was mostly resolved
  at 2016/05/14 20h53 UTC, approximately 36 hours in duration.
- During this time, no rsync updates were issued, nor were distfiles,
  releases or snapshots updated.
- New hardware purchasing is planned to recover capacity & mitigate
  hardware old-age.

Timeline (UTC):
---------------
2016/05/13:
08h56: iDRAC/OOB notification to Infra [1]
09h07: Icinga notifications to Infra
13h14: Infra human (jmbsvicetto) notices the problem [2].
15h14: Hosting sponsor requested to investigate hardware 
       (sponsor localtime 08h14, nobody onsite yet)
15h24: Initial infra discussions about where enough disk space
       is available if we have to move the data.
15h42: Sponsor's initial investigation suggests dead hardware [3].
16h00: (approx) Data consolidation/backup to other hosts begins.
19h46: Sponsor pulls host, tests, seems dead [4]
21h36: Email to -core/-project & status page update
23h05: Migration/Recovery plan outlined on IRC

2016/05/14:
09h00: (approx) Data consolidation/backup completed.
17h54: Sponsor contact onsite (10h54 localtime)
20h15: "New" host booted
20h53: All-clear notification

2016/05/16:
A lurking bug with snapshots was resolved.

Root cause and timeline notes:
------------------------------
This was a hardware failure, on hardware years out of warranty. The
timing meant we didn't notice it immediately, then had to move a large
amount of data, and were limited by sponsor staff availability to be
hands-on with the hardware for workarounds.

[1] The initial iDRAC reports said:
Event: CPU 1 has a thermal trip (over-temperature) event.
[2] IPMI serial-over-LAN gives no useful output, IPMI logs and sensors
report no additional info, and a power cycle does nothing.
[3] The front panel reports CPU1 overheating; the power button and
unplugging/replugging have no effect.
[4] LEDs light up, but the fans don't spin and nothing else happens
when the power button is pushed; reseating the CPUs has no effect
either.
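
For reference, the out-of-band checks in [2] amounted to something like
the following (illustrative ipmitool invocations with placeholder
host/user; the iDRAC can also be driven via racadm or its web UI):
  ipmitool -I lanplus -H <idrac-host> -U <user> sol activate    # serial-over-LAN console
  ipmitool -I lanplus -H <idrac-host> -U <user> sel elist       # system event log
  ipmitool -I lanplus -H <idrac-host> -U <user> sensor          # sensor readings
  ipmitool -I lanplus -H <idrac-host> -U <user> chassis power cycle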

Corrective & Preventative Measures:
-----------------------------------
A similar system had all of its data evacuated (archived or simply
moved), including multiple VMs; the disks from the dead system were
then transferred over and booted with minimal tweaks (udev,
networking). The VMs are still offline, pending more VM capacity (they
have large disk needs).

The failed hardware was some of the newest hardware in this sponsor
location: new as of Nov 2011, with a 3-year warranty. Other Infra
servers present at the same location:
- 2x Dell systems, new as of Dec 2011 as VM hosts 
  [one of these is the new home of dipper, running natively]
- 2x Dell systems, new as of May 2007;
- 4x Supermicro Atom systems, new as of May 2010 [6x originally, 2x failed]
- (various $arch development systems).

Based on these ages, Infra is preparing hardware specifications for a
new VM hosting environment to be purchased by the trustees and hosted at
the same location. This would host the temporarily offline VMs, as well
as absorb at least the Atom & 2007 Dell systems.

Future actions to improve outcome:
- Move rsync & snapshot generation to a dedicated redundant VM
- Improve distfiles/release tarball process to have more redundancy,
  perhaps push-based.
- Encourage cleanups of roverlay/tinderbox/devbox VMs to shrink size.

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* Re: [gentoo-project] dipper.gentoo.org outage post-mortem
  2016-05-19 18:44 [gentoo-project] dipper.gentoo.org outage post-mortem Robin H. Johnson
@ 2016-05-19 19:24 ` Kristian Fiskerstrand
  2016-05-20  0:35   ` [gentoo-project] Re: [gentoo-core] " Benda Xu
  2016-05-19 23:47 ` [gentoo-project] Replacement hardware planning for Gentoo VM hosting Robin H. Johnson
From: Kristian Fiskerstrand @ 2016-05-19 19:24 UTC
  To: gentoo-project, gentoo-core



On 05/19/2016 08:44 PM, Robin H. Johnson wrote:
> Summary
> -------
> - dipper.gentoo.org suffered a major motherboard failure on Friday, May 13th.
> - The outage started around 2016/05/13 08h56 UTC, and was mostly resolved
>   at 2016/05/14 20h53 UTC, approximately 36 hours in duration.
> - During this time, no rsync updates were issued, nor were distfiles,
>   releases or snapshots updated.
> - New hardware purchasing is planned to recover capacity & mitigate
>   hardware old-age.

...

> Future actions to improve outcome:
> - Move rsync & snapshot generation to a dedicated redundant VM
> - Improve distfiles/release tarball process to have more redundancy,
>   perhaps push-based.
> - Encourage cleanups of roverlay/tinderbox/devbox VMs to shrink size.
> 

Thank you for the post-mortem on the failure, Robin. I've said it
before and I'll say it again: I think this outage was well handled!

-- 
Kristian Fiskerstrand
OpenPGP certificate reachable at hkp://pool.sks-keyservers.net
fpr:94CB AFDD 3034 5109 5618 35AA 0B7F 8B60 E3ED FAE3




* [gentoo-project] Replacement hardware planning for Gentoo VM hosting
  2016-05-19 18:44 [gentoo-project] dipper.gentoo.org outage post-mortem Robin H. Johnson
  2016-05-19 19:24 ` Kristian Fiskerstrand
@ 2016-05-19 23:47 ` Robin H. Johnson
  2016-05-20  0:42   ` Matthew Thode
From: Robin H. Johnson @ 2016-05-19 23:47 UTC
  To: gentoo-project


Infra has already discussed most of this hardware planning in
#gentoo-infra, but I thought it might be useful to see any other
comments on the hardware plan. If you wish to make private comments to
this thread, please send directly to infra@gentoo.org or
gentoo-core@lists.gentoo.org instead of the gentoo-project list.

Remarks like 'you should use ZFS instead of this' aren't directly
helpful to this discussion. What is more useful is pointing out any
potential problems you might see with the plan, or gotchas in the
hardware.

We've previously run Ganeti [0] with general success, and we'd like to
continue doing so (vs libvirt or openstack). It offers VM storage
redundancy via DRBD (amongst other options), which we're going to take
best advantage of by using a cross-over 10Gbit link between two nodes
(as we have no 10GBit switching in the environment). Some of the VMs
will run on spinning disk, others on SSD, others maybe w/ dm-cache.
libvirt IS an easy fallback from Ganeti, but lacks some of the automated
failover and DRBD handling options.
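
For the curious, creating a DRBD-mirrored instance across the two nodes
would look roughly like this (hostnames, OS definition and sizes below
are placeholders, not a final layout):
  gnt-instance add -t drbd -n node1.gentoo.org:node2.gentoo.org \
      -o gentoo+default --disk 0:size=200G example-vm.gentoo.org
  gnt-instance failover example-vm.gentoo.org   # if the primary node dies
The 10Gbit cross-over would carry the DRBD replication traffic as the
cluster's secondary (replication) network.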

This will house at least the following existing VMs, all of which have
large storage needs:
- woodpecker.gentoo.org
- roverlay.dev.g.o
- tinderbox.amd64.dev.g.o
- devbox.amd64.dev.g.o

And virtualize the following older systems:
[2007 Dells]
- finch.g.o (puppet)
- vulture.g.o (GSoC host)
[2010 Atoms]
- bellbird.g.o (infra services)
- bittern.g.o (blogs webhost)
- bobolink.g.o (rsync.g.o node, dns slave)
- brambling.g.o (bouncer, devmanual, infra-status)
[Other]
- meadowlark.g.o (infra services)

And new VMs/services:
- split the git-to-rsync & snapshot generation off from dipper?
- split blogs (and other) database hosting off from dipper?

We'd probably keep the two other 2011 Dell systems in operation for the
moment, to distribute load better, but have enough capacity to run their
VMs as and when they fail.

The general best prices we've seen are from a vendor that's new to us,
WiredZone, and we're willing to give them a try unless somebody has even
better pricing to offer us.

Hardware (all in $USD):
Supermicro SYS-2028TP-DECTR [1][2]
- $2,732.42/ea, quantity 1
- two half-width 2U nodes in a single chassis w/ shared redundant PSU.
- each node has:
- 2x 10GbE ports (there are no SFP options)
- 12x 2.5" SAS3 bays, controller in JBOD/IT mode
Per node:
Intel Xeon E5-2620v4 [3]
- $421.56/ea, quantity 2
32GB DDR4 PC4-19200 (2400MHz) 288-pin RDIMM ECC Registered [4]
- $162.89/ea, quantity 4
- requires a minimum of two DIMMs per CPU
- the price jump to 64GB DIMMs is very high.
- buy more RAM later?
Seagate 2TB SAS 12Gb/s 7200RPM 2.5in, ST2000NX0273 [5]
- $315.18/ea, quantity 4
- 4-disk RAID5 (mdadm)
Samsung 850 EVO 1TB, MZ-75E1T0B/AM [6]
- $345.00/ea, quantity 2
- RAID1 (mdadm)
= $3,445.40/node
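
(For clarity, that per-node figure is 2x $421.56 + 4x $162.89 +
4x $315.18 + 2x $345.00 = $843.12 + $651.56 + $1,260.72 + $690.00
= $3,445.40.)

A rough sketch of the mdadm layout (device names illustrative only):
  mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[a-d]  # 4x 2TB SAS HDD
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sd[ef]   # 2x 1TB SSD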

Overall cost:
$2,732.42 - chassis
$3,445.40 - left node components
$3,445.40 - right node components
$  315.18 - 1x spare ST2000NX0273 HDD
$   25.00 - 3ft CAT6a patch cable (estimated)

Parts sub-total: $9,963.40
Labour sub-total: $300 (estimate)
Taxes: $0.00 (Oregon has no sales taxes)
S&H: $200 (estimate)

Grand total: $10,463.40 (USD)
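(That is: parts $9,963.40 + labour $300.00 + S&H $200.00 = $10,463.40.)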

Future hardware improvement options:
- Add more RAM
- Add up to 6x more disks per node.

[0] http://www.ganeti.org/
[1] http://www.supermicro.com/products/system/2U/2028/SYS-2028TP-DECTR.cfm
[2] http://www.wiredzone.com/supermicro-multi-node-servers-twin-barebone-dual-cpu-2-node-sys-2028tp-dectr-10024389
[3] https://www.wiredzone.com/intel-components-cpu-processors-server-bx80660e52620v4-10025960
[4] https://www.wiredzone.com/supermicro-components-memory-ddr4-mem-dr432l-sl01-er24-10025993
[5] https://www.wiredzone.com/seagate-components-hard-drives-enterprise-st2000nx0273-10024175
[6] https://www.wiredzone.com/samsung-components-hard-drives-enterprise-mz-75e1t0b-am-10024043

-- 
Robin Hugh Johnson
Gentoo Linux: Dev, Infra Lead, Foundation Trustee & Treasurer
E-Mail   : robbat2@gentoo.org
GnuPG FP : 11ACBA4F 4778E3F6 E4EDF38E B27B944E 34884E85
GnuPG FP : 7D0B3CEB E9B85B1F 825BCECF EE05E6F6 A48F6136



* [gentoo-project] Re: [gentoo-core] dipper.gentoo.org outage post-mortem
  2016-05-19 19:24 ` Kristian Fiskerstrand
@ 2016-05-20  0:35   ` Benda Xu
From: Benda Xu @ 2016-05-20  0:35 UTC
  To: Kristian Fiskerstrand; +Cc: gentoo-project, gentoo-core


Kristian Fiskerstrand <k_f@gentoo.org> writes:

> On 05/19/2016 08:44 PM, Robin H. Johnson wrote:
>> Summary
>> -------
>> - dipper.gentoo.org suffered a major motherboard failure on Friday, May 13th.
>> - The outage started around 2016/05/13 08h56 UTC, and was mostly resolved
>>   at 2016/05/14 20h53 UTC, approximately 36 hours in duration.
>> - During this time, no rsync updates were issued, nor were distfiles,
>>   releases or snapshots updated.
>> - New hardware purchasing is planned to recover capacity & mitigate
>>   hardware old-age.
>
> ...
>
>> Future actions to improve outcome:
>> - Move rsync & snapshot generation to a dedicated redundant VM
>> - Improve distfiles/release tarball process to have more redundancy,
>>   perhaps push-based.
>> - Encourage cleanups of roverlay/tinderbox/devbox VMs to shrink size.
>> 
>
> Thank you for the post mortem on the failure Robin. I've said it before
> and I'll say it again; I think this outage was well handled!

Yeah, I would like to thank the infra team for their hard work.

I think this outage was well handled! +1

And thank you Robin for the well-summed-up post-mortem.

Benda



* Re: [gentoo-project] Replacement hardware planning for Gentoo VM hosting
  2016-05-19 23:47 ` [gentoo-project] Replacement hardware planning for Gentoo VM hosting Robin H. Johnson
@ 2016-05-20  0:42   ` Matthew Thode
  2016-05-20  0:45     ` Matthew Thode
From: Matthew Thode @ 2016-05-20  0:42 UTC
  To: gentoo-project



On 05/19/2016 06:47 PM, Robin H. Johnson wrote:
> [...]
> Hardware (all in $USD):
> Supermicro SYS-2028TP-DECTR [1][2]
> - $2,732.42/ea, quantity 1
> - two half-width 2U nodes in a single chassis w/ shared redundant PSU.
> [...]

+1 to this generally. The one question I have is whether we want to
spend ~$1k more on one of the 4-node chassis; it'd allow us to expand
more easily in the future.  The reasoning against this that I can think
of is that we want a higher disk/node ratio (the 4-node option is just
6 disks per node instead of 12).

-- 
-- Matthew Thode (prometheanfire)




* Re: [gentoo-project] Replacement hardware planning for Gentoo VM hosting
  2016-05-20  0:42   ` Matthew Thode
@ 2016-05-20  0:45     ` Matthew Thode
  2016-05-20  0:48       ` Matthew Thode
From: Matthew Thode @ 2016-05-20  0:45 UTC
  To: gentoo-project



On 05/19/2016 07:42 PM, Matthew Thode wrote:
> On 05/19/2016 06:47 PM, Robin H. Johnson wrote:
>> [...]
> 
> +1 to this generally. The one question I have is whether we want to
> spend ~$1k more on one of the 4-node chassis; it'd allow us to expand
> more easily in the future.  The reasoning against this that I can think
> of is that we want a higher disk/node ratio (the 4-node option is just
> 6 disks per node instead of 12).
> 
Another reason I just thought of: we may only have access to 15A
outlets.  If so, it'd limit us to one of the following (from our list),
as they are 1600W.

http://www.supermicro.com/products/system/2U/2028/SYS-2028TR-HTR.cfm
http://www.supermicro.com/products/system/2U/2028/SYS-2028TR-H72R.cfm
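
(Rough numbers: a 120V/15A circuit supplies at most 15 x 120 = 1,800W,
which is why ~1600W is about the largest single supply that fits on
one; this assumes the feeds there really are 120V/15A.)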


-- 
-- Matthew Thode (prometheanfire)




* Re: [gentoo-project] Replacement hardware planning for Gentoo VM hosting
  2016-05-20  0:45     ` Matthew Thode
@ 2016-05-20  0:48       ` Matthew Thode
From: Matthew Thode @ 2016-05-20  0:48 UTC
  To: gentoo-project

On 05/19/2016 07:45 PM, Matthew Thode wrote:
> On 05/19/2016 07:42 PM, Matthew Thode wrote:
>> On 05/19/2016 06:47 PM, Robin H. Johnson wrote:
>>> [...]
>>
>> +1 to this generally. The one question I have is whether we want to
>> spend ~$1k more on one of the 4-node chassis; it'd allow us to expand
>> more easily in the future.  The reasoning against this that I can think
>> of is that we want a higher disk/node ratio (the 4-node option is just
>> 6 disks per node instead of 12).
>>
> Another reason I just thought of: we may only have access to 15A
> outlets.  If so, it'd limit us to one of the following (from our list),
> as they are 1600W.
> 

http://www.supermicro.com/products/system/2U/2028/SYS-2028TR-HTR.cfm
http://www.supermicro.com/products/system/2U/2028/SYS-2028TR-H72R.cfm

fixed...

-- 
-- Matthew Thode (prometheanfire)


