[oe] Kernel Headers Quality Issue

Fri May 14 12:38:20 UTC 2010

On Fri, 2010-05-14 at 13:40 +0200, Thilo Fromm wrote:
> This would mean that checking for kernel features (e.g. inotify_init1()) 
> at compile time via an application's configure.ac is pointless, right?

Not entirely pointless: the feature set that you determine at configure
time is the maximum set that you can expect to be available.  That is,
if you determine at configure time that __NR_inotify_init1 is not
defined, you clearly should not even attempt to use this syscall.

> Code like this
> 
> #ifdef HAVE_INOTIFY_INIT1
> 	fd = inotify_init1(O_NONBLOCK);
> #else
> 	fd = inotify_init();
> 	flags = fcntl(fd, F_GETFL, 0);
> 	fcntl(fd, F_SETFL, flags | O_NONBLOCK);
> #endif
> 
> will break at runtime even though everything was OK at compile time, and 
> you are proposing that I should expect it to break.

Correct.  In this particular case you would probably be better off just
using the "else" block at all times.
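
A minimal sketch of that unconditional variant, with the error checks
filled in, might look like the following.  The helper name
open_inotify_nonblocking() is made up for the example; only
inotify_init(), fcntl() and close() are real interfaces.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Hypothetical helper: returns a non-blocking inotify descriptor, or -1
 * with errno set.  It never touches inotify_init1(), so it behaves the
 * same whatever the headers or the running kernel provide. */
int open_inotify_nonblocking(void)
{
	int fd, flags;

	fd = inotify_init();
	if (fd < 0)
		return -1;

	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		int saved = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The cost is two extra fcntl() calls at startup, which is unlikely to
matter next to the runtime check that would otherwise be needed.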

> Well, what to do _if_ a feature is not present is a whole different 
> story. But for figuring out _whether_ it is present or not you propose 
> (blind) runtime probing while I would like to rely on configure output, 

I'm not quite sure why you describe this as "blind" probing.  There is
nothing blind about it; the semantics of unimplemented syscalls are
quite well-defined: the call fails with ENOSYS.
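
As a rough illustration of such a probe, the sketch below attempts the
newer syscall only if the headers define __NR_inotify_init1, and treats
ENOSYS from the running kernel as the signal to fall back.  The function
name inotify_open_probe() is hypothetical, and going through syscall()
directly is an assumption made to keep the example independent of the C
library's wrapper.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <sys/inotify.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: returns a non-blocking inotify descriptor, or -1
 * with errno set. */
int inotify_open_probe(void)
{
	int fd, flags;

#ifdef __NR_inotify_init1
	/* The headers know the syscall; ask the running kernel. */
	fd = (int)syscall(__NR_inotify_init1, O_NONBLOCK);
	if (fd >= 0)
		return fd;
	if (errno != ENOSYS)
		return -1;	/* a genuine failure, not a missing syscall */
#endif

	/* Either the headers or the kernel lack inotify_init1. */
	fd = inotify_init();
	if (fd < 0)
		return -1;
	flags = fcntl(fd, F_GETFL, 0);
	if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
		int saved = errno;	/* close() may overwrite errno */
		close(fd);
		errno = saved;
		return -1;
	}
	return fd;
}
```

The compile-time test bounds what may be attempted; the errno check
decides what is actually used on the machine at hand.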

In the general case, there simply isn't any way to determine statically
at compile time what features will be available at run time.  Some
features might require modules that aren't loaded, or might be disabled
by kernel commandline options, or might not be supported on the
particular CPU that the kernel finds itself running on, or any number of
other things.  As I have mentioned before, probing at run time is
exactly what glibc itself does internally in order to cope with multiple
different kernel versions.

If you are going to rely on nonstandard and volatile Linux-specific
programming interfaces then the onus is definitely on you as the
application developer to ensure that your application behaves sensibly
under all conditions that it is likely to encounter.  You simply have to
decide which features are mandatory for your program and which are
optional.  If a mandatory feature is not available then the program will
simply not run: you might choose to produce a "kernel too old" kind of
diagnostic, as glibc does if you configure it with --enable-kernel=X
and then try to run on an older version, or you might just let it die
when the call fails with ENOSYS or some such.  If an optional feature is
unavailable then you have to provide some suitable fallback action.
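
One way to keep that decision in a single place is sketched below.  All
of the names (probe_action, classify_probe and the PROBE_* values) are
invented for the example and do not come from glibc or any other
library; the caller is assumed to pass in the return value and errno of
the syscall it just tried.

```c
#include <errno.h>

/* Hypothetical names throughout. */
enum probe_action {
	PROBE_USE,		/* the call worked: use the feature */
	PROBE_FALLBACK,		/* optional feature missing: take the fallback */
	PROBE_TOO_OLD,		/* mandatory feature missing: diagnose and exit */
	PROBE_ERROR		/* the call exists but failed for another reason */
};

/* rc and err are the result and errno of the attempted syscall;
 * mandatory is nonzero if the program cannot run without the feature. */
enum probe_action classify_probe(long rc, int err, int mandatory)
{
	if (rc >= 0)
		return PROBE_USE;
	if (err != ENOSYS)
		return PROBE_ERROR;
	return mandatory ? PROBE_TOO_OLD : PROBE_FALLBACK;
}
```

Only ENOSYS is treated as "missing" here.  Some features report their
absence differently (for example EINVAL for an unknown flag), which this
sketch does not attempt to cover.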

> At least some of the 
> applications integrated by openembedded do not share your views, so 
> they'll do weird things at runtime. This is a severe quality issue. And 
> we have only a small chance of noticing any application runtime 
> misbehaviour. The syscall in question might only rarely be used, it 
> might be some corner-case of an application that is executed e.g. only 
> once in a month.

If you can't tolerate the risk of applications breaking in this way
then, as a distro maintainer, you can simply set and document an
appropriate policy that avoids the situation arising (e.g. compile all
applications against the lowest common denominator headers, or lock the
versions together so that the runtime kernel version always precisely
matches the compile time header version).  That would be a perfectly
acceptable way to proceed.  

p.