* [gentoo-user] machine check exception errors
From: Grant @ 2010-09-14 16:45 UTC
To: Gentoo mailing list

I'm getting a lot of machine check exception errors in dmesg on my
hosted server. Running mcelog I get:

# mcelog
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 0
CPU 0 4 northbridge TSC 5ab2d0c67592a
MISC c008001901000000 ADDR a2d6e1f0
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 7b58
       bit40 = error found by scrub
       bit46 = corrected ecc error
       bit59 = misc error valid
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9c2c41007b080a13 MCGSTATUS 0
MCGCAP c008001a01000000 SOCKETID 7b080a13
HARDWARE ERROR. This is *NOT* a software problem!
Please contact your hardware vendor
MCE 1
CPU 0 4 northbridge TSC 5aee3f082740a
MISC c008001a01000000 ADDR a2d6e1f0
  Northbridge RAM Chipkill ECC error
  Chipkill ECC syndrome = 7b58
       bit46 = corrected ecc error
       bit59 = misc error valid
  bus error 'local node response, request didn't time out
      generic read mem transaction
      memory access, level generic'
STATUS 9c2c40007b080a13 MCGSTATUS 0
SOCKETID 0

Should I just contact the hosting company? Can anyone give me more
info on what this means? Bad memory?

- Grant

* Re: [gentoo-user] machine check exception errors
From: Albert Hopkins @ 2010-09-14 18:16 UTC
To: gentoo-user

On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
> I'm getting a lot of machine check exception errors in dmesg on my
> hosted server. Running mcelog I get:
>
> # mcelog
> HARDWARE ERROR. This is *NOT* a software problem!
[...]
> Should I just contact the hosting company? Can anyone give me more
> info on what this means? Bad memory?

They are likely better able to help you if it's a hardware problem.

* Re: [gentoo-user] machine check exception errors
From: Mick @ 2010-09-15 20:43 UTC
To: gentoo-user

On Tuesday 14 September 2010 19:16:52 Albert Hopkins wrote:
> On Tue, 2010-09-14 at 09:45 -0700, Grant wrote:
> > I'm getting a lot of machine check exception errors in dmesg on my
> > hosted server. Running mcelog I get:
> >
> > # mcelog
> > HARDWARE ERROR. This is *NOT* a software problem!
>
> [...]
>
> > Should I just contact the hosting company? Can anyone give me more
> > info on what this means? Bad memory?
>
> They are likely better able to help you if it's a hardware problem.

It reads as if the error correction in one of the RAM modules is kicking in.
Ask them to reseat or replace the bad module - which they will have to find by
trial and error. They could hot-swap them and see when the errors stop.
--
Regards,
Mick

* Re: [gentoo-user] machine check exception errors
From: Grant @ 2010-09-21 17:37 UTC
To: gentoo-user

>> > I'm getting a lot of machine check exception errors in dmesg on my
>> > hosted server. Running mcelog I get:
>> >
>> > # mcelog
>> > HARDWARE ERROR. This is *NOT* a software problem!
>>
>> [...]
>>
>> > Should I just contact the hosting company? Can anyone give me more
>> > info on what this means? Bad memory?
>>
>> They are likely better able to help you if it's a hardware problem.
>
> It reads as if the error correction in one of the RAM modules is kicking in.
> Ask them to reseat or replace the bad module - which they will have to find by
> trial and error. They could hot-swap them and see when the errors stop.
> --
> Regards,
> Mick

They offered to take my machine down and do a memory test which they
said would take a number of hours. Is a memory test likely to help?
Did you suggest reseating or replacing RAM modules as opposed to a
memory test because it will result in less downtime?

- Grant

* Re: [gentoo-user] machine check exception errors
From: Stroller @ 2010-09-21 19:15 UTC
To: gentoo-user

On 21 Sep 2010, at 18:37, Grant wrote:
>>>> I'm getting a lot of machine check exception errors in dmesg on my
>>>> hosted server. Running mcelog I get:
>>>> ...
>
> They offered to take my machine down and do a memory test which they
> said would take a number of hours. Is a memory test likely to help?
> Did you suggest reseating or replacing RAM modules as opposed to a
> memory test because it will result in less downtime?

I suspect that your hosting provider are offering you this memory test
because they don't want to go swapping out memory modules willy-nilly.

How do they know that the problem is really memory, and not your operating
system? If they take all this RAM out and put new RAM in, what do they do
with the old RAM? They don't know if it's good or bad, so are they
expected to just slap it in a server belonging to another customer, and
stitch him up?

A memory test is likely to identify bad RAM, if it is bad, so you should
proceed with this. This is likely the best route to solving the problem.

I think that ideally, for you, they would move the system image onto a
different known-good server with the same configuration. Then you cannot
complain if the same problems start occurring again. If the problem is
genuinely hardware then they won't. And the hosting provider is free to
run diagnostics on your old machine.

But realistically, the memory test is likely to show up a bad RAM module,
you'll get it replaced and be up and running within a few hours. Why would
you refuse? If your system needed a guaranteed uptime you'd perhaps have
to pay for a higher level of service than the fees you're paying at
present.

Stroller.

* Re: [gentoo-user] machine check exception errors
From: Mick @ 2010-09-21 21:32 UTC
To: gentoo-user

On Tuesday 21 September 2010 20:15:05 Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>
> I think that ideally, for you, they would move the system image onto a
> different known-good server with the same configuration. Then you cannot
> complain if the same problems start occurring again. If the problem is
> genuinely hardware then they won't. And the hosting provider is free to
> run diagnostics on your old machine.
>
> But realistically, the memory test is likely to show up a bad RAM module,
> you'll get it replaced and be up and running within a few hours. Why would
> you refuse? If your system needed a guaranteed uptime you'd perhaps have
> to pay for a higher level of service than the fees you're paying at
> present.

I run memory tests overnight. If a module is seriously borked then it will
fail earlier. Reseating/replacing takes a few minutes, instead of hours.

If they have spare machines (for dev't or testing) they can fit the memory
module(s) there and test them exhaustively, before they put the good ones back
into a customer's machine.
--
Regards,
Mick

* Re: [gentoo-user] machine check exception errors
From: Grant @ 2010-09-22 1:24 UTC
To: gentoo-user

>> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >>>> hosted server. Running mcelog I get:
>> >>>> ...
>> >
>> > They offered to take my machine down and do a memory test which they
>> > said would take a number of hours. Is a memory test likely to help?
>> > Did you suggest reseating or replacing RAM modules as opposed to a
>> > memory test because it will result in less downtime?
>>
>> I suspect that your hosting provider are offering you this memory test
>> because they don't want to go swapping out memory modules willy-nilly.
>>
>> How do they know that the problem is really memory, and not your operating
>> system? If they take all this RAM out and put new RAM in, what do they do
>> with the old RAM? They don't know if it's good or bad, so are they
>> expected to just slap it in a server belonging to another customer, and
>> stitch him up?
>>
>> A memory test is likely to identify bad RAM, if it is bad, so you should
>> proceed with this. This is likely the best route to solving the problem.
>>
>> I think that ideally, for you, they would move the system image onto a
>> different known-good server with the same configuration. Then you cannot
>> complain if the same problems start occurring again. If the problem is
>> genuinely hardware then they won't. And the hosting provider is free to
>> run diagnostics on your old machine.
>>
>> But realistically, the memory test is likely to show up a bad RAM module,
>> you'll get it replaced and be up and running within a few hours. Why would
>> you refuse? If your system needed a guaranteed uptime you'd perhaps have
>> to pay for a higher level of service than the fees you're paying at
>> present.
>
> I run memory tests overnight. If a module is seriously borked then it will
> fail earlier. Reseating/replacing takes a few minutes, instead of hours.
>
> If they have spare machines (for dev't or testing) they can fit the memory
> module(s) there and test them exhaustively, before they put the good ones back
> into a customer's machine.

Thanks Mick and Stroller. I'll see if they'll go for this.

- Grant

* Re: [gentoo-user] machine check exception errors
From: Mick @ 2010-09-22 9:19 UTC
To: gentoo-user

On Wednesday 22 September 2010 02:24:39 Grant wrote:
> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >> >>>> hosted server. Running mcelog I get:
> >> >>>> ...
> >> >
> >> > They offered to take my machine down and do a memory test which they
> >> > said would take a number of hours. Is a memory test likely to help?
> >> > Did you suggest reseating or replacing RAM modules as opposed to a
> >> > memory test because it will result in less downtime?
> >>
> >> I suspect that your hosting provider are offering you this memory test
> >> because they don't want to go swapping out memory modules willy-nilly.
> >>
> >> How do they know that the problem is really memory, and not your
> >> operating system? If they take all this RAM out and put new RAM in,
> >> what do they do with the old RAM? They don't know if it's good or bad,
> >> so are they expected to just slap it in a server belonging to another
> >> customer, and stitch him up?
> >>
> >> A memory test is likely to identify bad RAM, if it is bad, so you should
> >> proceed with this. This is likely the best route to solving the problem.
> >>
> >> I think that ideally, for you, they would move the system image onto a
> >> different known-good server with the same configuration. Then you cannot
> >> complain if the same problems start occurring again. If the problem is
> >> genuinely hardware then they won't. And the hosting provider is free to
> >> run diagnostics on your old machine.
> >>
> >> But realistically, the memory test is likely to show up a bad RAM
> >> module, you'll get it replaced and be up and running within a few
> >> hours. Why would you refuse? If your system needed a guaranteed uptime
> >> you'd perhaps have to pay for a higher level of service than the fees
> >> you're paying at present.
> >
> > I run memory tests overnight. If a module is seriously borked then it
> > will fail earlier. Reseating/replacing takes a few minutes, instead of
> > hours.
> >
> > If they have spare machines (for dev't or testing) they can fit the
> > memory module(s) there and test them exhaustively, before they put the
> > good ones back into a customer's machine.
>
> Thanks Mick and Stroller. I'll see if they'll go for this.

You're welcome. Bear in mind though that a lot of hosters are just glorified
resellers with an account in a bigger data centre. In many cases they do not
even have physical access to the machines. Only the data centre techies do
and they may be less willing to oblige and break procedure or routine, just
because one end user out of hundreds/thousands complained about some memory
errors.

YMMV
--
Regards,
Mick

* Re: [gentoo-user] machine check exception errors
From: Grant @ 2010-09-22 16:42 UTC
To: gentoo-user

>> >> >>>> I'm getting a lot of machine check exception errors in dmesg on my
>> >> >>>> hosted server. Running mcelog I get:
>> >> >>>> ...
>> >> >
>> >> > They offered to take my machine down and do a memory test which they
>> >> > said would take a number of hours. Is a memory test likely to help?
>> >> > Did you suggest reseating or replacing RAM modules as opposed to a
>> >> > memory test because it will result in less downtime?
>> >>
>> >> I suspect that your hosting provider are offering you this memory test
>> >> because they don't want to go swapping out memory modules willy-nilly.
>> >>
>> >> How do they know that the problem is really memory, and not your
>> >> operating system? If they take all this RAM out and put new RAM in,
>> >> what do they do with the old RAM? They don't know if it's good or bad,
>> >> so are they expected to just slap it in a server belonging to another
>> >> customer, and stitch him up?
>> >>
>> >> A memory test is likely to identify bad RAM, if it is bad, so you should
>> >> proceed with this. This is likely the best route to solving the problem.
>> >>
>> >> I think that ideally, for you, they would move the system image onto a
>> >> different known-good server with the same configuration. Then you cannot
>> >> complain if the same problems start occurring again. If the problem is
>> >> genuinely hardware then they won't. And the hosting provider is free to
>> >> run diagnostics on your old machine.
>> >>
>> >> But realistically, the memory test is likely to show up a bad RAM
>> >> module, you'll get it replaced and be up and running within a few
>> >> hours. Why would you refuse? If your system needed a guaranteed uptime
>> >> you'd perhaps have to pay for a higher level of service than the fees
>> >> you're paying at present.
>> >
>> > I run memory tests overnight. If a module is seriously borked then it
>> > will fail earlier. Reseating/replacing takes a few minutes, instead of
>> > hours.
>> >
>> > If they have spare machines (for dev't or testing) they can fit the
>> > memory module(s) there and test them exhaustively, before they put the
>> > good ones back into a customer's machine.
>>
>> Thanks Mick and Stroller. I'll see if they'll go for this.
>
> You're welcome. Bear in mind though that a lot of hosters are just glorified
> resellers with an account in a bigger data centre. In many cases they do not
> even have physical access to the machines. Only the data centre techies do
> and they may be less willing to oblige and break procedure or routine, just
> because one end user out of hundreds/thousands complained about some memory
> errors.

Thanks Mick. My host is big with multiple data centers of their own.
They did exactly as I asked and I'm running on new RAM. There was a
problem bringing my system back online and the cause was purported to
be an unseated ethernet cable. I handed over my root password as I
was requested to do, and then started to get paranoid. I suppose I
shouldn't though because with physical access to my machine they
pretty much have full access anyway, right?

- Grant

* Re: [gentoo-user] machine check exception errors
From: Dale @ 2010-09-23 4:26 UTC
To: gentoo-user

Grant wrote:
>
> Thanks Mick. My host is big with multiple data centers of their own.
> They did exactly as I asked and I'm running on new RAM. There was a
> problem bringing my system back online and the cause was purported to
> be an unseated ethernet cable. I handed over my root password as I
> was requested to do, and then started to get paranoid. I suppose I
> shouldn't though because with physical access to my machine they
> pretty much have full access anyway, right?
>
> - Grant
>

Usually, physical access means they either have it or can get it pretty
quick. Boot a CD/DVD, mount the partitions, chroot in, change password
and reboot. Then, you don't have the password but they do.

My conspiracy hat on, if you can't trust them with the password, why do
they have your data? Just thinking. ;-)

This leaves out the encryption thing tho. That would change things.

Dale

:-) :-)

* Re: [gentoo-user] machine check exception errors
From: Neil Bothwick @ 2010-09-23 9:02 UTC
To: gentoo-user

On Wed, 22 Sep 2010 23:26:09 -0500, Dale wrote:

> > Thanks Mick. My host is big with multiple data centers of their own.
> > They did exactly as I asked and I'm running on new RAM. There was a
> > problem bringing my system back online and the cause was purported to
> > be an unseated ethernet cable. I handed over my root password as I
> > was requested to do, and then started to get paranoid. I suppose I
> > shouldn't though because with physical access to my machine they
> > pretty much have full access anyway, right?

> Usually, physical access means they either have it or can get it pretty
> quick. Boot a CD/DVD, mount the partitions, chroot in, change password
> and reboot. Then, you don't have the password but they do.

That's pretty obvious though. Physical access allows them to change your
password but not read it, so you'd know pretty soon if they'd been up to
anything.

If they really do need the root password, you have to give it to them,
but that doesn't stop you changing it, and running a rootkit scan, as
soon as they've finished with it.

--
Neil Bothwick

God said, "div D = rho, div B = 0, curl E = - @B/@t, curl H = J + @D/@t,"
and there was light.

* Re: [gentoo-user] machine check exception errors
From: Grant @ 2010-09-25 16:38 UTC
To: gentoo-user

>> > Thanks Mick. My host is big with multiple data centers of their own.
>> > They did exactly as I asked and I'm running on new RAM. There was a
>> > problem bringing my system back online and the cause was purported to
>> > be an unseated ethernet cable. I handed over my root password as I
>> > was requested to do, and then started to get paranoid. I suppose I
>> > shouldn't though because with physical access to my machine they
>> > pretty much have full access anyway, right?
>
>> Usually, physical access means they either have it or can get it pretty
>> quick. Boot a CD/DVD, mount the partitions, chroot in, change password
>> and reboot. Then, you don't have the password but they do.
>
> That's pretty obvious though. Physical access allows them to change your
> password but not read it, so you'd know pretty soon if they'd been up to
> anything.
>
> If they really do need the root password, you have to give it to them,
> but that doesn't stop you changing it, and running a rootkit scan, as
> soon as they've finished with it.

I've run chkrootkit, but I noticed:

The file of stored file properties (rkhunter.dat) does not exist, and
so must be created. To do this type in 'rkhunter --propupd'.

I thought the best practice with a rootkit checker like chkrootkit was
to not leave it installed on the system so you can run it as a clean
install when the time comes?

Do any of these warnings sound an alarm for anyone? I think the SSH
warnings are OK because I have a normal user specified with AllowUsers
and the config file says:

# The default requires explicit activation of protocol 1
#Protocol 2

Here are the warnings:

Warning: The command '/usr/bin/ldd' has been replaced by a script:
/usr/bin/ldd: Bourne-Again shell script text executable

Warning: The command '/usr/bin/whatis' has been replaced by a script:
/usr/bin/whatis: POSIX shell script text executable

Warning: The command '/usr/bin/lwp-request' has been replaced by a
script: /usr/bin/lwp-request: a /usr/bin/perl -w script text
executable

Warning: No output found from the lsmod command or the /proc/modules file:
/proc/modules output:
lsmod output:

Warning: The SSH configuration option 'PermitRootLogin' has not been
set. The default value may be 'yes', to allow root access.

Warning: The SSH configuration option 'Protocol' has not been set. The
default value may be '2,1', to allow the use of protocol version 1.

Warning: Hidden directory found: /dev/.udev

- Grant

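For anyone wanting to double-check the two SSH warnings above, here is a minimal Python sketch. It is not part of rkhunter or OpenSSH; it only reports which options are explicitly set on uncommented lines of an sshd_config-style text. The function name `explicit_sshd_options` and the user name `someuser` are hypothetical, made up for the illustration. It ignores `Match` blocks and assumes that the first occurrence of a keyword is the one that counts.

```python
def explicit_sshd_options(config_text, names):
    """Return {name: value} for options set on uncommented lines.

    Names are matched case-insensitively. The first occurrence wins,
    which is an assumption about how repeated keywords are treated.
    """
    wanted = {n.lower(): n for n in names}
    found = {}
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments such as "#Protocol 2".
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0].lower()
        if key in wanted and wanted[key] not in found:
            found[wanted[key]] = parts[1].strip() if len(parts) > 1 else ""
    return found


# Excerpt modelled on the config lines quoted above; "someuser" is a placeholder.
sample = """\
# The default requires explicit activation of protocol 1
#Protocol 2
AllowUsers someuser
"""

print(explicit_sshd_options(sample, ["Protocol", "PermitRootLogin", "AllowUsers"]))
```

With this sample only AllowUsers comes back as explicitly set: the commented-out `#Protocol 2` line does not count, which is consistent with the scanner reporting 'Protocol' and 'PermitRootLogin' as "not set". Uncommenting or adding those lines would be one way to make the settings explicit instead of relying on defaults.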
* [gentoo-user] Re: machine check exception errors
From: walt @ 2010-09-25 18:09 UTC
To: gentoo-user

On 09/25/2010 09:38 AM, Grant wrote:

> Do any of these warnings sound an alarm for anyone?
>
> Warning: The command '/usr/bin/ldd' has been replaced by a script:
> /usr/bin/ldd: Bourne-Again shell script text executable
>
> Warning: The command '/usr/bin/whatis' has been replaced by a script:
> /usr/bin/whatis: POSIX shell script text executable
>
> Warning: The command '/usr/bin/lwp-request' has been replaced by a
> script: /usr/bin/lwp-request: a /usr/bin/perl -w script text
> executable
>
> Warning: No output found from the lsmod command or the /proc/modules file:
> /proc/modules output:
> lsmod output:
>
> Warning: The SSH configuration option 'PermitRootLogin' has not been
> set. The default value may be 'yes', to allow root access.
>
> Warning: The SSH configuration option 'Protocol' has not been set. The
> default value may be '2,1', to allow the use of protocol version 1.
>
> Warning: Hidden directory found: /dev/.udev

I have the same on my machines except for the lsmod output. Did you configure
your kernel without any modules?

* Re: [gentoo-user] Re: machine check exception errors
From: Grant @ 2010-09-25 20:47 UTC
To: gentoo-user

>> Do any of these warnings sound an alarm for anyone?
>>
>> Warning: The command '/usr/bin/ldd' has been replaced by a script:
>> /usr/bin/ldd: Bourne-Again shell script text executable
>>
>> Warning: The command '/usr/bin/whatis' has been replaced by a script:
>> /usr/bin/whatis: POSIX shell script text executable
>>
>> Warning: The command '/usr/bin/lwp-request' has been replaced by a
>> script: /usr/bin/lwp-request: a /usr/bin/perl -w script text
>> executable
>>
>> Warning: No output found from the lsmod command or the /proc/modules file:
>> /proc/modules output:
>> lsmod output:
>>
>> Warning: The SSH configuration option 'PermitRootLogin' has not been
>> set. The default value may be 'yes', to allow root access.
>>
>> Warning: The SSH configuration option 'Protocol' has not been set. The
>> default value may be '2,1', to allow the use of protocol version 1.
>>
>> Warning: Hidden directory found: /dev/.udev
>
> I have the same on my machines except for the lsmod output. Did you
> configure
> your kernel without any modules?

Yes, no modules except that one that seems to be required called
something along the lines of scsi-wait-scan.

- Grant

* Re: [gentoo-user] Re: machine check exception errors
From: Dale @ 2010-09-25 21:53 UTC
To: gentoo-user

Grant wrote:
>>> Do any of these warnings sound an alarm for anyone?
>>>
>>> Warning: The command '/usr/bin/ldd' has been replaced by a script:
>>> /usr/bin/ldd: Bourne-Again shell script text executable
>>>
>>> Warning: The command '/usr/bin/whatis' has been replaced by a script:
>>> /usr/bin/whatis: POSIX shell script text executable
>>>
>>> Warning: The command '/usr/bin/lwp-request' has been replaced by a
>>> script: /usr/bin/lwp-request: a /usr/bin/perl -w script text
>>> executable
>>>
>>> Warning: No output found from the lsmod command or the /proc/modules file:
>>> /proc/modules output:
>>> lsmod output:
>>>
>>> Warning: The SSH configuration option 'PermitRootLogin' has not been
>>> set. The default value may be 'yes', to allow root access.
>>>
>>> Warning: The SSH configuration option 'Protocol' has not been set. The
>>> default value may be '2,1', to allow the use of protocol version 1.
>>>
>>> Warning: Hidden directory found: /dev/.udev
>>>
>> I have the same on my machines except for the lsmod output. Did you
>> configure
>> your kernel without any modules?
>>
> Yes, no modules except that one that seems to be required called
> something along the lines of scsi-wait-scan.
>
> - Grant
>

I tried getting rid of that, I don't like modules much, but I had no
luck. It just has to be a module and it seems it just has to be there.
Still not sure why it is the way it is.

Dale

:-) :-)

* [gentoo-user] Re: machine check exception errors
From: walt @ 2010-09-25 23:11 UTC
To: gentoo-user

On 09/25/2010 02:53 PM, Dale wrote:
> Grant wrote:

>>> Did you
>>> configure
>>> your kernel without any modules?

>> Yes, no modules except that one that seems to be required called
>> something along the lines of scsi-wait-scan.

> I tried getting rid of that, I don't like modules much, but I had no luck. It just has to be a module and it seems it just has to be there. Still not sure why it is the way it is.

Hm. I see that I have that driver in /lib/modules/ but the module is
not loaded. Anyone know what that module is for?

* Re: [gentoo-user] Re: machine check exception errors
From: Dale @ 2010-09-25 23:17 UTC
To: gentoo-user

walt wrote:
> On 09/25/2010 02:53 PM, Dale wrote:
>> Grant wrote:
>
>>>> Did you
>>>> configure
>>>> your kernel without any modules?
>
>>> Yes, no modules except that one that seems to be required called
>>> something along the lines of scsi-wait-scan.
>
>> I tried getting rid of that, I don't like modules much, but I had no
>> luck. It just has to be a module and it seems it just has to be
>> there. Still not sure why it is the way it is.
>
> Hm. I see that I have that driver in /lib/modules/ but the module is
> not loaded. Anyone know what that module is for?
>

I think it was Alan that tried to explain that thing to me. I don't use
it that I know of but you can't config it out. I even tried to edit the
config file directly and it just got really mad during the build. No
idea what it is for but some kernel dev thinks it is really important to
have no matter what.

Dale

:-) :-)

* Re: [gentoo-user] machine check exception errors
From: Volker Armin Hemmann @ 2010-09-25 23:20 UTC
To: gentoo-user

On Tuesday 21 September 2010, Stroller wrote:
> On 21 Sep 2010, at 18:37, Grant wrote:
> >>>> I'm getting a lot of machine check exception errors in dmesg on my
> >>>> hosted server. Running mcelog I get:
> >>>> ...
> >
> > They offered to take my machine down and do a memory test which they
> > said would take a number of hours. Is a memory test likely to help?
> > Did you suggest reseating or replacing RAM modules as opposed to a
> > memory test because it will result in less downtime?
>
> I suspect that your hosting provider are offering you this memory test
> because they don't want to go swapping out memory modules willy-nilly.
>
> How do they know that the problem is really memory, and not your operating
> system? If they take all this RAM out and put new RAM in, what do they do
> with the old RAM? They don't know if it's good or bad, so are they
> expected to just slap it in a server belonging to another customer, and
> stitch him up?
>
> A memory test is likely to identify bad RAM, if it is bad, so you should
> proceed with this. This is likely the best route to solving the problem.
>

Sure? This is ECC RAM - does memtest report ECC-corrected errors? I don't
think so.

The MCE errors say: we detected an error. Error was corrected. Applications
will not see error. Everything marches on.

The RAM is borked and must be replaced.

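The "error was corrected" reading can be cross-checked against the two STATUS values in the mcelog output from the first message of this thread. What follows is a minimal Python sketch, not a replacement for mcelog. The labels for bits 40, 46 and 59 are taken from that mcelog output; bit 61 as the "uncorrected error" flag is an assumption drawn from the generic x86 machine-check status layout, and the names `STATUS_BITS` and `decode_status` are made up for the illustration.

```python
# Bit labels as printed by mcelog in this thread, plus bit 61, which is
# assumed here to be the generic "uncorrected error" (UC) flag.
STATUS_BITS = {
    40: "error found by scrub",
    46: "corrected ecc error",
    59: "misc error valid",
    61: "uncorrected error (assumed UC flag)",
}


def decode_status(status_hex):
    """Map each bit of interest to True/False for a hex STATUS string."""
    value = int(status_hex, 16)
    return {bit: bool((value >> bit) & 1) for bit in STATUS_BITS}


# The two STATUS values reported for MCE 0 and MCE 1.
for status in ("9c2c41007b080a13", "9c2c40007b080a13"):
    flags = decode_status(status)
    set_bits = [f"bit{b} = {STATUS_BITS[b]}" for b in sorted(flags) if flags[b]]
    print(status, "->", "; ".join(set_bits))
```

Under these assumptions both records have bit 46 set and bit 61 clear, and only the first has bit 40 set, which matches the lines mcelog printed for each record. That fits the point above: the errors were corrected by ECC before any application could see them, so a memory tester that only looks for wrong data may well report nothing.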