* [gentoo-user] strange behaviour in quite special case
From: Francisco Ares @ 2017-08-24 21:27 UTC
To: gentoo-user

Hi, All.

This is a rather special case, so I don't expect much, but who knows?

I've built a Gentoo x86-64 system for an embedded application.

Just after a lot of updates, which I am unable to track, it stopped working as usual.

There is the development system, fully loaded with a lot of packages used for development, and the production system, which doesn't need all of those.

There is a line in /etc/inittab on both systems responsible for auto-login of the production system user and for starting the programs we need running (from its ".bash_profile" and ".xinitrc"):

    c6:2345:respawn:/sbin/agetty -a production-user 38400 tty6 linux

The development system starts a WindowMaker session, and the production system starts a program that controls the rest of the hardware of this embedded system, with an X11 graphical interface. That runs normally when simulated on the development system.

The development system runs smoothly. The production system, after removing the files from undesirable packages and creating a squashfs image of the stripped-down root partition, behaves strangely at boot:

It shows the initialization messages as expected, but when the auto-login and the start of the controller program should take place, it completely stalls until I plug in a USB keyboard and press, a few times, some of the key combinations that switch to a text console and back to X11 (Ctrl-Alt-F1 and Ctrl-Alt-F6); only then do things resume as expected.

As you might suspect, there is no keyboard for the production system ;-) .

As a matter of fact, I don't know where the stall takes place, because when I try to switch to a text console to see the logs, it switches back to X11 and starts our program. By the way, the logs just show that the events occurred at later times than expected.

Although the squashfs is read-only, some main directories are arranged so that a tmpfs mount plus a unionfs mount, layering the read-write tmpfs directory over the read-only directory onto that main directory, provide a way of creating temporary files. This has been working for a few years now.

For instance, in "/etc/fstab":

    tmpfs  /.etc.rw  tmpfs    defaults,mode=755  0 0
    union  /etc      unionfs  default_permissions,allow_other,use_ino,nonempty,suid,cow,dirs=/.etc.rw=rw:/.etc.ro=ro  0 0

And there is a "/.etc.ro" with a copy of all files present in the regular "/etc", a "/.etc.rw" directory to be mounted as tmpfs, and the original "/etc" directory, which needs to be there at boot, even before all this is mounted.

Does anyone have a clue?

Thanks!
Francisco
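For readers who want to apply the same layout to other directories, the two fstab entries above can be generated from a single directory name. This is only a sketch: the function name `union_fstab_lines` is hypothetical, the mount options are copied verbatim from the /etc example, and the message itself shows only the /etc case.

```shell
#!/bin/sh
# Print the two fstab entries for one overlaid directory, following the
# /etc example in the message. Hypothetical helper, not part of the
# original setup; the option string is copied from the message as-is.
union_fstab_lines() {
    dir=$1    # directory name without the leading slash, e.g. "etc"
    opts="default_permissions,allow_other,use_ino,nonempty,suid,cow"
    # Read-write layer: a tmpfs mounted on /.<dir>.rw
    printf 'tmpfs /.%s.rw tmpfs defaults,mode=755 0 0\n' "$dir"
    # Union of the rw layer over the read-only copy, mounted on /<dir>
    printf 'union /%s unionfs %s,dirs=/.%s.rw=rw:/.%s.ro=ro 0 0\n' \
        "$dir" "$opts" "$dir" "$dir"
}

# Reproduces the pair of entries shown for /etc.
union_fstab_lines etc
```

The function only prints text, so its output can be reviewed before anything is appended to a real fstab.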
* Re: [gentoo-user] strange behaviour in quite special case
From: Andrew Savchenko @ 2017-08-31 7:47 UTC
To: gentoo-user

Hi,

On Thu, 24 Aug 2017 18:27:22 -0300 Francisco Ares wrote:
> [...]
>
> Does anyone have a clue?

Try to dissect your problem. Start by removing squashfs and all the tmpfs/unionfs manipulations. Create the same image, but on a "normal" writable file system, and see how it goes. It may be an fs-related bug, or maybe you removed too many files and some of the "undesired" packages are actually mandatory.

If you have some form of snapshots of your changes, you can try to bisect them in a git bisect way.

Another approach is to run the X server (or any other app suspected as a troublemaker) under strace (or attach strace to a running process) and see what is going on. You will get a lot of low-level information and extensive filtering will be required; strace is capable of that, but you will need to dig into its documentation.

Best regards,
Andrew Savchenko
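As a rough illustration of the strace suggestion, the helper below only builds the command line for attaching to a running process, so it can be checked before being run as root. The function name `strace_attach_cmd`, the PID 1234 and the default log path are all hypothetical; the strace options used (`-f`, `-tt`, `-o`, `-p`) are standard ones.

```shell
#!/bin/sh
# Build an strace command line for attaching to a running process.
# Hypothetical helper; only the strace options themselves are real:
#   -f   also trace child processes
#   -tt  prefix every line with the time of day, with microseconds
#   -o   write the trace to a file instead of stderr
#   -p   attach to an already running PID
strace_attach_cmd() {
    pid=$1
    out=${2:-/tmp/strace.$pid.log}    # assumed default location
    printf 'strace -f -tt -o %s -p %s\n' "$out" "$pid"
}

# Print the command for an example PID; nothing is traced here.
strace_attach_cmd 1234
```

With `-tt` timestamps in the log, a long gap between two consecutive lines is a reasonable first place to look for a stall like the one described.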
* Re: [gentoo-user] strange behaviour in quite special case
From: Francisco Ares @ 2017-09-18 13:13 UTC
To: gentoo-user

2017-08-31 4:47 GMT-03:00 Andrew Savchenko <bircoph@gentoo.org>:
> Try to dissect your problem. Start with removing squashfs and all
> tmpfs/unionfs manipulations.
> [...]

Hi All,

After days and days of struggling, I finally upgraded to the newest stable kernel and updated every package with an "emerge -e", just in case, twice! Then I rebuilt the kernel again.

So, like a charm, everything got back to working as before.

Unfortunately, I will never know which piece of code caused that issue.

Thank you, and Best Regards,
Francisco
* Re: [gentoo-user] strange behaviour in quite special case
From: Peter Humphrey @ 2017-09-18 22:56 UTC
To: gentoo-user

On Monday, 18 September 2017 14:13:44 BST Francisco Ares wrote:

> After days and days struggling,

I know what you mean. I've spent weeks wrestling with KMail. That included losing e-mails, falling behind in conversations and so on.

> I finally upgraded to the newest stable kernel and updated every package
> with a "emerge -e", just in case, twice! Then, rebuilt the kernel again.
>
> So, like a charm, everything got back to work as before.

My technique in such cases is to emerge @system, then recompile the kernel, reboot on it and emerge -e world --exclude="gcc gentoo-sources". Seems to have worked out all right so far. You could omit the exclusion if you're even more paranoid than KMail has made me.

--
Regards,
Peter.
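The order Peter describes can be written down as a two-phase script. This is a sketch under assumptions: the `run`, `before_reboot` and `after_reboot` names are hypothetical, the kernel build commands and the /usr/src/linux path are assumed (the thread does not say how the kernel is compiled), and DRY_RUN defaults to 1 so that nothing is executed unless it is explicitly set to 0.

```shell
#!/bin/sh
# DRY_RUN defaults to 1: commands are printed with a "+ " prefix
# instead of being executed.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Phase 1, before the reboot: base system first, then the kernel.
before_reboot() {
    run emerge @system
    # Assumed build steps; the thread does not say how the kernel is built.
    run make -C /usr/src/linux
    run make -C /usr/src/linux modules_install install
}

# Phase 2, after booting the new kernel: rebuild everything else,
# leaving out the compiler and the kernel sources as Peter does.
after_reboot() {
    run emerge -e world --exclude="gcc gentoo-sources"
}

before_reboot
after_reboot
```

The reboot between the two phases is left to the operator, which is why the steps are split into two functions rather than run as a single sequence.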
* Re: [gentoo-user] strange behaviour in quite special case
From: Francisco Ares @ 2017-09-19 14:15 UTC
To: gentoo-user

2017-09-18 19:56 GMT-03:00 Peter Humphrey <peter@prh.myzen.co.uk>:
> My technique in such cases is to emerge @system, then recompile the kernel,
> reboot on it and emerge -e world --exclude="gcc gentoo-sources".
> [...]

Hi, Peter.

Thank you for sharing your experience.

In fact, since an "emerge -e" is quite automatic and I could leave the system alone for a whole weekend, I didn't bother (nor did I have the time) to do it in parts to try to figure out which one would succeed, especially because on the following Monday it just _should_ be working, or the launch of the new program version would be delayed (for who knows how long) and would put my neck at risk ;-).

Best Regards,
Francisco