On 07/01/2013 03:23 PM, Greg KH wrote:
> On Mon, Jul 01, 2013 at 08:45:16PM +0200, Tom Wijsman wrote:
>>>> Q: What about my stable server? I really don't want to run this
>>>> stuff!
>>>>
>>>> A: These options would depend on !CONFIG_VANILLA or
>>>> CONFIG_EXPERIMENTAL
>>>
>>> What is CONFIG_VANILLA? I don't see that in the upstream kernel
>>> tree at all.
>>>
>>> CONFIG_EXPERIMENTAL is now gone from upstream, so you are going to
>>> have a problem with this.
>>
>> Earlier I mentioned "2) These feature should depend on a non-vanilla /
>> experimental option." which is an option we would introduce under the
>> Gentoo distribution menu section.
>
> Distro-specific config options, great :(
>
>>>> which would be disabled by default, therefore if you keep this
>>>> option the way it is on your stable server; it won't affect you.
>>>
>>> Not always true. Look at aufs as an example. It patches the core
>>> kernel code in ways that are _not_ accepted upstream yet. Now you
>>> all are running that modified code, even if you don't want aufs.
>>
>> Earlier I mentioned "3) The patch should not affect the build by
>> default."; if it does, we have to adjust it to not do that, this is
>> something that can be easily scripted. It's just a matter of embedding
>> each + block in the diff with a config check and updating the counts.
>
> Look at aufs as a specific example of why you can't do that, otherwise,
> don't you think that the aufs developer(s) wouldn't have done so?

I am acquainted with the developer of a stackable filesystem. According
to what he has told me in person, the developers on the LKML cannot
agree on how a stackable filesystem should be implemented. I was told of
three variations on the design, each of which some people liked and
others did not, which ultimately kept the upstream kernel from adopting
anything. I specifically recall two of the variations: implementing it
as part of the VFS and implementing it as part of ext4.
If you want to criticize stackable filesystems, would you lay out
groundwork for getting one implemented upon which people will agree?

> The goal of "don't touch any other kernel code" is a very good one, but
> not always true for these huge out-of-tree kernel patches. Usually that
> is the main reason why these patches aren't merged upstream, because
> those changes are not acceptable.

I was under the impression that there are several reasons why patches
are not merged upstream:

1. Lack of a sign-off
2. Code drops that no one will maintain
3. Subsystem maintainers saying no simply because they do not like it
4. Risk of patent trolls
5. Actual technical reasons

> So be very careful here, you are messing with things that are rejected
> by upstream.
>
> greg k-h
>

Only some of the patches were rejected. Others were never submitted. The
PaX/grsecurity developers prefer that their code stay out-of-tree. As
one of the people hacking on ZFSOnLinux, I prefer that our code be
out-of-tree as well, because fixes for in-tree filesystems are either
held back until users update their system kernels or held hostage by
regressions in newer kernels on certain hardware.

With that said, being in Linus' tree does not make code fall under some
golden standard of quality. There are many significant issues in code
committed to Linus' tree, some of which have been problems for years.
Just to name a few:

1. Doing `rm -r /dir` on a directory tree containing millions of inodes
(e.g. ccache) on an ext4 filesystem mounted with discard under the CFQ
IO elevator will hang a system for hours on pre-SATA 3.1 hardware. This
is because TRIM is a non-queued command and is interleaved with writes
for "fairness". Incidentally, using noop turns a multiple-hour hang into
a laggy experience of a few minutes.

2. aio_sync() is unimplemented, which means there is no sane way for
userland software like QEMU and TGT to be both fast and guarantee data
integrity.
A single crash and your guest is corrupted. It would have been better
had AIO never been implemented.

3. dm-crypt will reorder write requests across flushes. Upon seeing a
write, it sends it to a work queue to be processed asynchronously; upon
seeing a flush, it processes it immediately. A single kernel panic or
sudden power loss can therefore damage filesystems stored on it.

4. Under low memory conditions with hundreds of concurrent threads
(e.g. package builds), every thread will enter direct reclaim and there
will be a remarkable drop in system throughput, assuming the system does
not lock up. Even after one thread finishes direct reclaim, a
substantial amount of time is wasted because the other threads are still
performing direct reclaim.

5. The Linux 3.7 nouveau rewrite broke kexec support. The graphics
hardware will not reinitialize properly.

6. A throttle mechanism introduced for memory cgroups can deadlock the
system whenever it holds a lock needed for swap and enters direct
reclaim with a significant number of dirty pages.

7. Code has been accepted on multiple occasions that does not compile,
and the build failures persist for weeks, if not months, after Linus
tags a release. I sent a patch to fix one such failure. It was rejected
because I had fixed the code to compile with -Werror; people thought
that -Werror should be removed (and that there was therefore no reason
to fix the warnings), and we went 2 months until someone wrote a fix
that people liked. For a current example of accepted code failing to
build, look here:

https://bugzilla.kernel.org/show_bug.cgi?id=38052

Note that I have not checked Linus' tree to see whether that failure is
still current, but the bug itself appears to be open as of this writing.

There are plenty more technical issues, but these are just my pet
peeves.
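To make the dm-crypt point (item 3) concrete, here is a toy Python model
of the hazard it describes. This is a sketch under my reading of that
description, not dm-crypt's actual code, and every name in it is made up
for illustration: writes are deferred to a work queue while flushes are
handled inline, so a flush can reach the device before a write that was
submitted ahead of it.

```python
# Toy model of the write/flush ordering hazard described for dm-crypt.
# All names are hypothetical illustrations, not dm-crypt's real code:
# writes are deferred to a work queue, while flushes are handled inline.

class DeferredWriteDevice:
    def __init__(self):
        self.workqueue = []   # writes wait here for an async worker
        self.device_log = []  # order in which requests reach the disk

    def submit(self, req):
        if req == "FLUSH":
            self.device_log.append(req)  # flush is processed immediately
        else:
            self.workqueue.append(req)   # write is queued for later

    def run_worker(self):
        # The async worker eventually drains the queued writes.
        self.device_log.extend(self.workqueue)
        self.workqueue.clear()

dev = DeferredWriteDevice()
dev.submit("write journal block")  # should be durable before the flush
dev.submit("FLUSH")                # e.g. a journal commit barrier
dev.run_worker()

# The flush reached the disk before the write it was meant to order:
print(dev.device_log)  # ['FLUSH', 'write journal block']
```

A filesystem journal relies on the flush completing only after the
preceding writes are stable; if the order inverts as above, a panic or
power loss between the flush and the worker run leaves the journal
inconsistent.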
If you want more examples, you could look at the patches people send you
each day and ask yourself how many of them could have been caught had
people been more careful during review. For instance, look at the
barrier patches that were done around Linux 2.6.30. What prevented those
problems from being caught by review years earlier?

Being outside Linus' tree is not synonymous with being bad, and being
bad is not synonymous with being rejected. It is perfectly reasonable to
think that there are examples of good code outside Linus' tree.
Furthermore, should the kernel developers choose to engage that
out-of-tree code, my expectation is that its quality will improve as
they test it and write patches.