From: Mick
To: gentoo-user@lists.gentoo.org
Subject: Re: [gentoo-user] Dell Precision Workstation Overheating
Date: Fri, 20 Apr 2018 15:40:32 +0100
Message-ID: <17524638.sDcRh40y47@dell_xps>

On Friday, 20 April 2018 15:11:43 BST R0b0t1 wrote:
> On Fri, Apr 20, 2018 at 7:21 AM, Mick wrote:
> > On Friday, 20 April 2018 12:55:13 BST Corbin Bird wrote:
> >> Oak Ridge National Laboratory uses these processors ( Rhea Cluster ) and
> >> has numerous heat failures.
> >>
> >> Due to poor cooling ... surprised?
> >>
> >> The cooling is not working right. Something is still wrong.
> >>
> >> On 04/19/2018 09:33 PM, R0b0t1 wrote:
> >> > Dell Precision T7600, two 16 thread Xeons, 192GB of RAM, two Quadro
> >> > cards and a Tesla card.
> >> >
> >> > The system is a few years old at this point. Old enough that the
> >> > thermal compound could have hardened, which is why I replaced it.
> >
> > If the problem started suddenly, rather than getting progressively worse
> > over time, it may have something to do with kernel drivers, or some
> > change in firmware.
>
> As far as I know it has always been like this. It may be why it was
> hardly used before it came into my care. Looking at the server I could
> blame poor design; the inside is rather cramped, despite the care
> taken with the internal baffles. They may not have run a good flow
> simulation.
>
> Mr. Bird's observation seems to support this.
>
> > If the cause is mechanical, I'd also suggest checking the heat sink
> > contact surface. Some heat sinks are poorly manufactured and require
> > flattening with wet 'n dry sandpaper to get a flat enough surface and
> > improve their contact with the CPU. I've seen 15°C improvement in a
> > Zalman CPU cooler after excess metal was removed from copper pipes,
> > which were manufactured proud. Hardcore O/C's flatten the CPU too, but
> > I'd avoid anything as radical because it can go badly wrong if you
> > remove more than the surface varnish from the chip.
> >
> > In the interim, opening the side panel may also help in hot weather.
>
> The internals are custom made to fit the motherboard, cards, and drive
> slots. It may work better if I move it to another tower but it will be
> a while before I can find one. I will look at the interface between
> the heatsink and processor again, but it looked fine.
>
> How concerned should I be about overheating machine check errors? I
> used to think that it was best to avoid them, as the threshold was
> high enough that very small parts of the die could overshoot and fail,
> but I was informed that is not the case.
> Besides the throttling (which is fairly bad) I am not sure if there are
> any drawbacks to the overheating.

Semiconductors eventually fail when overheated, so it is not a good idea to
continue trying to fry your CPU.

You can confirm the cause of these exceptions by installing and running
'app-admin/mcelog'. If the tower design is poor and air circulation within
the case lets heated air recirculate, your choices would typically be:

1. Install more effective aftermarket CPU coolers. This means spending
money, which may be better spent on a new tower/PC. There may also not be
enough space in the case to fit them, although low-profile/compact CPU
coolers exist and you may have better luck with them.

2. Install bigger or additional case fans to help get the heated air out of
the case, minimising hot spots and hot air recirculation. You could try
forcing some more air through the case with a small desktop fan to see if
this option has any legs.

3. Modify the case by drilling/cutting holes to improve air flow, e.g. at
the top of the case.

4. Migrate the components to a different case/MoBo, which you have already
considered.

> I am wondering what the point of 32 threads is if you can't use them at
> 100%.
>
> Cheers,
> R0b0t1

Quite, but the box may not have been designed for the pressure of running
Gentoo to compile software on a regular basis. I've found that many cheaper
laptops in particular are so poorly designed from a cooling perspective that
they struggle to run a lengthy Gentoo emerge. I've also had desktops which
struggled, although nothing as critical as yours. The permanent solutions I
came up with involved aftermarket cooling fans.
With boxen I was not keen to spend money on for cooling improvements, I would
just open the side panel during an emerge, which allowed the CPU temperature
to drop sufficiently to avoid further thermal throttling.

-- 
Regards,
Mick
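On the throttling itself: on many Intel systems the kernel also keeps per-CPU thermal throttle event counters in sysfs, which can be read alongside mcelog's output. The Python below is a minimal sketch, assuming an x86 Intel CPU and a kernel that exposes /sys/devices/system/cpu/cpuN/thermal_throttle/ with core_throttle_count and package_throttle_count. The function name read_throttle_counts is hypothetical, and counters that are not present are simply skipped.

```python
"""Minimal sketch: read Intel thermal-throttle event counters from sysfs.

Assumes an x86 Intel CPU and a kernel exposing
/sys/devices/system/cpu/cpuN/thermal_throttle/. The function name is
illustrative (hypothetical), not part of any existing tool.
"""
import re
from pathlib import Path

# Counter files looked for in each cpuN/thermal_throttle/ directory.
COUNTER_FILES = {
    "core": "core_throttle_count",
    "package": "package_throttle_count",
}


def read_throttle_counts(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_number: {"core": n, "package": n}}, sorted by CPU number.

    CPUs without a thermal_throttle directory are skipped, and counter
    files that do not exist are left out of that CPU's entry.
    """
    root = Path(cpu_root)
    if not root.is_dir():
        return {}
    counts = {}
    for cpu_dir in root.glob("cpu[0-9]*"):
        # Only accept plain cpuN directories (e.g. cpu0, cpu31).
        match = re.fullmatch(r"cpu(\d+)", cpu_dir.name)
        throttle_dir = cpu_dir / "thermal_throttle"
        if match is None or not throttle_dir.is_dir():
            continue
        entry = {}
        for key, filename in COUNTER_FILES.items():
            path = throttle_dir / filename
            if path.is_file():
                entry[key] = int(path.read_text().strip())
        counts[int(match.group(1))] = entry
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    for cpu, entry in read_throttle_counts().items():
        core = entry.get("core", "n/a")
        package = entry.get("package", "n/a")
        print(f"cpu{cpu}: core_throttle_count={core} "
              f"package_throttle_count={package}")
```

Running it before and after an emerge, or before and after opening the side panel, shows whether the counts keep rising under load, i.e. whether the CPUs are still being thermally throttled.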