[OE-core] pseudo: host user contamination

Mon Mar 26 18:49:30 UTC 2018

On Sun, Mar 25, 2018 at 9:05 AM, Andre McCurdy <armccurdy at gmail.com> wrote:
> On Sat, Mar 24, 2018 at 10:37 PM, Victor Kamensky <kamensky at cisco.com> wrote:
>> On Sat, 24 Mar 2018, Andre McCurdy wrote:
>>> On Sat, Mar 24, 2018 at 5:09 PM, Victor Kamensky <kamensky at cisco.com>
>>> wrote:
>>>> On Sat, 24 Mar 2018, Burton, Ross wrote:
>>>>> On 24 March 2018 at 20:12, Victor Kamensky <kamensky at cisco.com> wrote:
>>>>>> Here is another crazy idea of how to deal with it, just
>>>>>> brainstorming what options are on the table: disable
>>>>>> renameat2 with the help of seccomp and force coreutils to
>>>>>> use other calls. Something along the lines of what was
>>>>>> suggested with intercepting the syscall() function call, but
>>>>>> letting the kernel do the interception work.
>>>>>
>>>>> Wow, that's impressively magic.  Does this depend on kernel options or
>>>>> specific recent versions?
>>>
>>> Yeah, it's impressive but perhaps overkill for this situation.
>>>
>>> Having the kernel run a BPF script on every syscall is going to have a
>>> much bigger performance impact than intercepting one specific libc
>>> function in user space.
>>
>> I don't think we should worry about overhead in pseudo case.
>>
>>> Also, AFAIK, seccomp can't be nested - so building within an
>>> environment which has already been secured with seccomp (e.g. recent
>>> versions of docker?) might be a problem if pseudo starts to rely on
>>> seccomp too.
>>
>> The above is true. It was on my mind.
>>
>> Note I have no problem whatsoever if you can intercept the syscall
>> function correctly. The function-intercepting approach is definitely more
>> aligned with what pseudo does. I was just listing other
>> possible options.
>>
>> But please note that the syscall function takes a
>> variable number of arguments, and passing them on to another
>> variable-argument function (the real syscall implementation)
>> cannot, in general, be done. One would need a complementary
>> vsyscall function taking a va_list, i.e. like printf and vprintf.
>>
>> Please see http://c-faq.com/varargs/handoff.html
>>
>> But maybe something special can be done for syscall case.
>> Disclaimer: I did not read full thread, maybe you already
>> discussed this.
>
> Yes, I think it's already been covered in the thread. Although the
> libc syscall() function takes a variable number of arguments, it's
> known that there are a maximum of 6 of them and they are all of a data
> type which fits into the register size of the target architecture (i.e.
> "long" for most 32bit and 64bit targets, "long long" for x32 etc).
> Therefore it's possible to extract them from the va_args created by
> the caller into 6 temporary variables and then pass those variables on
> when calling the real libc syscall() function. I.e. we don't actually
> need to pass the original caller's va_args on to the real syscall()
> function - we just need to pass on all the arguments.
>
> There's some concern that unconditionally extracting 6 arguments when
> the caller may have supplied fewer than that could be problematic.
> However, there's code in both glibc and musl which does exactly that,
> so I'm inclined to think it's OK in practice. The worst that can
> happen would seem to be passing some extra junk values to a syscall in
> the kernel which is going to ignore them.
FWIW: All my build machines are affected by this issue. As a temporary
workaround I downgraded coreutils-8.27-20.fc27 with

dnf install coreutils-8.27-16.fc27

Now images seem to build again without floods of host contamination.
I have no idea how long the downgrade will remain possible...

Interesting background: the mv/renameat2 change seemed so important to
Fedora that they backported it into 8.27.

Andreas