From: kashani
Date: Wed, 17 Aug 2011 15:08:09 -0700
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Running HTTP and DNS on same machine
Message-ID: <4E4C3BC9.7060105@badapple.net>
In-Reply-To: <2005305.NAJv4TkKfY@nazgul>

On 8/17/2011 2:43 PM, Alan McKinnon wrote:
>
> I'm just itching to type up the long list of horror stories I've
> stored from people doing their own DNS thinking it was real easy.
>
> But there's this little thing called an NDA and it says I can't :-(

heh, I think I can dredge one up for you that no one will care about
these days. This was at a large ISP in '99 known for their free Internet.
Bind 8 was fresh on the scene and somehow Network Engineering was in
charge of DNS rather than Systems. My intern and I came up with a plan
to have ns00.int as the internal master and make the rest of the name
servers slave off of it. All ns00 did was supply the production name
servers with zones.

ns00 --> ns01(vip) --> ns01-[01-03]
    \--> ns02(vip) --> ns02-[01-03]
     \-> ns03(vip) --> ns03-[01-03]

Three virtual IPs and three name servers behind each vip. This way we
could have tools deal with updating zones on ns00 on the internal
network and not have to push to a number of name servers. This worked
well for a few months and we generally forgot about it.

Almost a month after a reorganization in the local datacenter, DNS went
down. Well, not down down, but our zones weren't working. After a hectic
hour of freaking out, troubleshooting random things, and bouncing from
machine to machine by IP address because none of DNS worked, we realized
our mistake.

The TTL of the zone itself (strictly speaking, the expire value in the
SOA record) was set to three weeks. In the move, Bind had silently died
on ns00, which we didn't monitor because it was inside the corp network.
The slaves dutifully stayed up and working till they hit the expire time
of the zones and demanded to speak to the master again. Restarting Bind
on the prod servers did nothing other than remove the already expired
cache. Once we restarted Bind on ns00 (and made it part of the runlevel),
the prod servers checked in and all was well.

The lessons:

  Monitor *all* of your DNS infrastructure.
  DNS can break even with a large distributed system, and it is never
  pretty.

kashani
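
For anyone who wants to turn the first lesson into an actual check, here is a rough sketch in Python of the kind of thing that might have caught the dead master before the slaves gave up. It is not what we ran back then. It assumes you can get two inputs from somewhere: the SOA rdata of the zone as text, and the number of seconds since a slave last refreshed successfully from its master. The names (SoaTimers, parse_soa_rdata, slave_status), the hostmaster.example. mailbox, the serial, and the warning thresholds are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SoaTimers:
    """The numeric fields of an SOA record, all in seconds except serial."""
    serial: int
    refresh: int
    retry: int
    expire: int
    minimum: int


def parse_soa_rdata(rdata: str) -> SoaTimers:
    """Parse SOA rdata of the form: mname rname serial refresh retry expire minimum.

    Only plain integer seconds are handled; zone-file shorthand such as
    "3w" is deliberately out of scope for this sketch.
    """
    fields = rdata.split()
    if len(fields) != 7:
        raise ValueError(f"expected 7 SOA fields, got {len(fields)}")
    serial, refresh, retry, expire, minimum = (int(f) for f in fields[2:])
    return SoaTimers(serial, refresh, retry, expire, minimum)


def slave_status(soa: SoaTimers, seconds_since_refresh: int,
                 critical_fraction: float = 0.5) -> str:
    """Classify a slave by how long ago it last reached its master.

    The thresholds are assumptions chosen for illustration:
      EXPIRED   the expire timer has run out, the slave stops answering
      CRITICAL  more than critical_fraction of the expire window is used up
      WARNING   at least one refresh plus one retry has been missed
      OK        the slave refreshed recently
    """
    if seconds_since_refresh >= soa.expire:
        return "EXPIRED"
    if seconds_since_refresh >= soa.expire * critical_fraction:
        return "CRITICAL"
    if seconds_since_refresh > soa.refresh + soa.retry:
        return "WARNING"
    return "OK"


if __name__ == "__main__":
    # Hypothetical SOA with a three week (1814400 second) expire value.
    soa = parse_soa_rdata(
        "ns00.int. hostmaster.example. 1999081701 10800 3600 1814400 86400"
    )
    day = 86400
    for days in (0, 2, 14, 22):
        print(days, "days since refresh:", slave_status(soa, days * day))
```

With a three week expire value, a check like this would have been complaining within hours of ns00 going quiet, which leaves plenty of time to notice. How you obtain the time since the last successful refresh depends on your name server and is left out here.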