* [gentoo-soc] Week 6 Report for Big Data Infrastructure and H2O ebuilds Project
From: Yuan Liao (Leo) @ 2021-07-19 5:52 UTC
To: gentoo-soc
Hi folks,

This week, I have been busy with an eclectic mix of tasks. First of
all, I continued to improve ebuild-commander so that it runs well both
in CI environments and on developers' personal workstations. A
related accomplishment was a new ebuild installation test case, to be
run in GitHub Actions, that covers all ebuilds in the Spark overlay.
Last but not least, I further expanded the documentation for the
Kotlin packages I had created a few weeks ago by adding a
maintainer-oriented wiki page.

The bulk of the improvements to ebuild-commander are new features
designed to improve the user experience in an interactive
environment. While I was developing ebuild-commander and creating new
test cases for ebuilds in the Spark overlay, I would often run the
test cases from an interactive shell on my own computer before
creating a Git commit. Sometimes I realized something was wrong
immediately after the tests were launched, so I would interrupt
ebuild-commander at once, fix the problem, and re-run the tests. The
interruption would almost always be followed by deleting the Docker
container created by ebuild-commander, which was tedious to do
manually. So, I added SIGINT handling to ebuild-commander to let it
automatically clean up the container it created upon interruption.
Conversely, there are times when ebuild-commander would clean up
automatically but the user wants the container to be retained. For
example, if the test fails, the user probably wishes to open a shell
in the container to inspect it, in which case the container should not
be cleaned up. To deal with this situation, I added a new command-line
option to ebuild-commander for controlling the automatic clean-up
behavior. By default, the clean-up is skipped if the test fails, but
the user can also choose to keep the container even if the test
succeeds, or to always remove it even upon test failure.
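
As a rough illustration of the clean-up behavior described above, here
is a minimal Python sketch. It is not ebuild-commander's actual code:
the Cleanup names, should_remove(), run_tests(), and the choice to
always remove the container after an interruption are assumptions made
for the example. It relies on Python turning SIGINT into
KeyboardInterrupt in the main thread, and on the standard "docker exec"
and "docker rm --force" commands.

```python
import enum
import subprocess


class Cleanup(enum.Enum):
    """Hypothetical clean-up policies; the names are illustrative only."""
    ON_SUCCESS = "on-success"  # default: keep the container if the test failed
    NEVER = "never"            # keep the container even if the test succeeded
    ALWAYS = "always"          # remove the container even if the test failed


def should_remove(policy, test_passed, interrupted=False):
    """Decide whether the container should be removed after a run."""
    if interrupted:
        # Assumption: an interrupted run always cleans up after itself.
        return True
    if policy is Cleanup.ALWAYS:
        return True
    if policy is Cleanup.NEVER:
        return False
    return test_passed


def run_tests(container, test_cmd, policy=Cleanup.ON_SUCCESS,
              runner=subprocess.run):
    """Run test_cmd inside an existing container, then apply the policy.

    `runner` is injectable so the logic can be exercised without Docker.
    """
    passed = False
    interrupted = False
    try:
        result = runner(["docker", "exec", container, *test_cmd])
        passed = result.returncode == 0
    except KeyboardInterrupt:
        # Python raises KeyboardInterrupt in the main thread on SIGINT.
        interrupted = True
    finally:
        if should_remove(policy, passed, interrupted):
            runner(["docker", "rm", "--force", container])
    return passed
```

With the default policy, a failed run leaves the container in place so
that a shell can be opened in it, while pressing Ctrl-C during the run
removes it without any manual step.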

As per my original project proposal, I am also adding a test case to
the ebuild installation tests which will ensure that every package in
the Spark overlay can be installed at least once. Adding every package
to the emerge command theoretically works, but the command would be
too long. Invoking emerge separately for each package would resolve
this problem, but the overhead of emerge's dependency calculation
would seriously impact the test runtime. I came up with a solution
that addresses both issues: write a script to compute a list of leaf
packages in the Spark overlay (packages that no other package in the
overlay depends on) and pass only those packages to emerge. Every
other package is pulled in as a dependency of some leaf package, so
every package in the overlay would still be installed, and the emerge
command becomes much shorter. The script can also serve as a helpful
tool for maintainers of any ebuild repository to find all leaf
packages in the repository for maintenance tasks like last rites and
package clean-up. After some initial optimization and tuning, the
script (written in Python) can compute the list of leaf packages among
the roughly 500 packages in the Spark overlay within only a few
minutes. The optimization and tuning are also the topic of my blog
post for this week [1]. The post covers some topics from computer
science, including graph theory, graph algorithms, data structures,
and time complexity. If you are interested in any of those subjects,
make sure you don't miss it!
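
To make the leaf-package idea concrete, here is a minimal Python
sketch. It is not the script from the blog post: it assumes the
dependency graph is already available as a dictionary mapping each
package in the repository to the set of packages it depends on, and it
assumes the graph has no dependency cycles. Extracting that graph from
real ebuilds is not shown. The function names and the package names in
the example are made up for illustration.

```python
def find_leaf_packages(deps):
    """Return the packages that no other package in `deps` depends on.

    `deps` maps each package in the repository to the set of packages
    it depends on. Dependencies outside the repository may appear in
    the sets; they are simply never reported as leaves.
    """
    depended_on = set()
    for pkg, pkg_deps in deps.items():
        # Ignore self-dependencies so they cannot hide a leaf.
        depended_on.update(dep for dep in pkg_deps if dep != pkg)
    return sorted(pkg for pkg in deps if pkg not in depended_on)


def installed_closure(roots, deps):
    """Return every in-repository package pulled in by installing `roots`."""
    seen = set()
    stack = list(roots)
    while stack:
        pkg = stack.pop()
        if pkg in seen or pkg not in deps:
            continue  # already visited, or lives outside the repository
        seen.add(pkg)
        stack.extend(deps[pkg])
    return seen


if __name__ == "__main__":
    # Hypothetical repository contents, for illustration only.
    example = {
        "dev-java/app": {"dev-java/lib-a", "dev-java/lib-b"},
        "dev-java/lib-a": {"dev-java/core"},
        "dev-java/lib-b": {"dev-java/core"},
        "dev-java/core": set(),
        "dev-java/tool": {"dev-java/core"},
    }
    leaves = find_leaf_packages(example)
    print("leaf packages:", " ".join(leaves))
    # Installing only the leaves still covers the whole repository.
    print("covers everything:",
          installed_closure(leaves, example) == set(example))
```

Both functions visit each package and each dependency edge once, so on
a precomputed graph their running time is linear in the number of
packages plus the number of dependencies.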

Meanwhile, I created a new page on the Gentoo Wiki for the Kotlin
Package Maintainer Guide [2]. It explains how a Kotlin package is
built by upstream, gives instructions for finding the commands that
should be used to compile the Kotlin packages, and includes a tutorial
on using the eclasses I authored to support the Kotlin ebuilds. I hope
it will be useful both to future developers who would like to help
maintain those Kotlin packages and to myself in a few months, in case
I forget the details of my own work.

This concludes my work during the past week and this report. Thank
you for reading it (and my blog post in case you are checking it out)!

Best regards,
Leo

[1]: https://leo3418.github.io/2021/07/18/find-leaf-packages.html
[2]: https://wiki.gentoo.org/wiki/User:Leo3418/Kotlin/Package_Maintainer_Guide
* Re: [gentoo-soc] Week 6 Report for Big Data Infrastructure and H2O ebuilds Project
From: Benda Xu @ 2021-07-19 10:29 UTC
To: gentoo-soc
Thank you! Keep going with the good work.
Yours,
Benda
* Re: [gentoo-soc] Week 6 Report for Big Data Infrastructure and H2O ebuilds Project
From: A Schenck @ 2021-07-20 17:39 UTC
To: gentoo-soc
On 7/18/21 10:52 PM, Yuan Liao (Leo) wrote:
> Hi folks,
>
> <snip/>
>
> As per my original project proposal, I am also adding a test case for
> the ebuild installation tests which will ensure every package in the
> Spark overlay can be installed at least once. Adding every package to
> the emerge command theoretically works, but the command would be too
> long. Invoking emerge separately for each package would resolve this
> problem, but the overhead of emerge's dependency calculation would
> seriously impact the test runtime. I came up with a solution that
> could address both issues: write a script to compute a list of leaf
> packages in the Spark overlay and pass the packages in the list to
> emerge, so every package in the overlay would be installed, and the
> emerge command can be simplified to have a shorter length too. The
> script can also act as a helpful tool for any ebuild repository's
> maintainers to find out all leaf packages in the repository for
> maintenance tasks like last-rite and package clean-up. After some
> initial optimization and tuning, the script (written in Python) can
> compute a list of leaf packages among about 500 packages in the Spark
> overlay within only a few minutes. The optimization and tuning is
> also the topic for this week's blog post of mine [1]. This post
> covers some knowledge and topics from computer science, including
> graph theory, graph algorithms, data structure, and time complexity.
> If you are interested in any of those subjects, make sure you don't
> miss it!
Thanks! We actually don't really care much about Java (haven't used it
seriously since College), and haven't even been involved in Gentoo GSoC
in a decade, but we're glad we stay on this list for things like this.
It's really nice seeing someone who still has that spark of interest in
computer things. We do happen to like graph theory and network analysis
and time complexity and such, but haven't really been able to apply it
in "the real world" of tech companies. Every time we try to do things
"the right way" with real computer science, coworkers and bosses just
say "just hack something together".
Oh well, thanks for what you're doing,
-A
>
> <snip/>
>
> This concludes my work during the past week and this report. Thank
> you for reading it (and my blog post in case you are checking it out)!
>
> Best regards,
> Leo
>
> [1]: https://leo3418.github.io/2021/07/18/find-leaf-packages.html
> [2]: https://wiki.gentoo.org/wiki/User:Leo3418/Kotlin/Package_Maintainer_Guide
>
* Re: [gentoo-soc] Week 6 Report for Big Data Infrastructure and H2O ebuilds Project
From: Yuan Liao (Leo) @ 2021-07-21 6:23 UTC
To: gentoo-soc
> Thanks! We actually don't really care much about Java (haven't used it
> seriously since College), and haven't even been involved in Gentoo GSoC
> in a decade, but we're glad we stay on this list for things like this.
> It's really nice seeing someone who still has that spark of interest in
> computer things. We do happen to like graph theory and network analysis
> and time complexity and such, but haven't really been able to apply it
> in "the real world" of tech companies. Every time we try to do things
> "the right way" with real computer science, coworkers and bosses just
> say "just hack something together".
>
>
> Oh well, thanks for what you're doing,
>
> -A

It is wonderful to hear from people who have been involved in Gentoo
GSoC before! Thanks for sharing this with us. I guess it would be
reasonable to say that little of the low-level knowledge taught in the
classroom gets a chance to be applied to a real-world problem. I was
wrecked by a self-balancing tree implementation problem in the midterm
exam for my data structures class, but that does not affect my ability
to write a working program which uses self-balancing trees, because I
can just borrow an existing implementation of the data structure built
by others. That said, knowing how things happen under the hood is
still useful in programming and optimizing for efficiency.

Thanks,
Leo