[OE-core] pseudo: host user contamination

Tue Mar 27 19:11:22 UTC 2018

On Mon, Mar 26, 2018 at 9:41 PM, Seebs <seebs at seebs.net> wrote:
>
>> The syscall manpage is from the kernel manpages, not glibc.
>
>>   http://man7.org/linux/man-pages/man2/syscall.2.html
>
> And yet! glibc is setting those registers in its code. Why? If that's a
> kernel thing and libc doesn't need to do it, why is libc doing it?

Of course libc syscall is setting those registers WITHIN its code.
The job of the syscall() function is to translate from a C callable
API into a kernel syscall - so it must read arguments passed in from
the C caller (via the normal C function call rules, e.g. the first few
arguments passed via registers, the rest on the stack, etc) and use
them to set up a kernel syscall (via the kernel's syscall interface, ie
maximum of 6 arguments, all passed via registers). After the kernel
syscall has returned, the implementation of libc syscall() needs to
collect the result from whichever register the kernel leaves it in and
return it via the normal C function call rules (plus take care of some
extra housekeeping, ie setting errno).

Whatever happens within syscall() is not important. The key point is
that it's a C callable function and follows standard C function call
rules.
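
To make that concrete, here is a minimal sketch of what the caller
sees. It assumes Linux and a libc which provides syscall() and the
SYS_* constants via <sys/syscall.h>; the helper names (raw_getpid,
raw_close_bad_fd, check_syscall_is_plain_c) are made up for
illustration only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: getpid() via the generic syscall() entry point. */
long raw_getpid(void)
{
    return syscall(SYS_getpid);
}

/* Hypothetical helper: try to close an invalid fd and return the errno
 * that syscall() left behind (0 if the call unexpectedly succeeded). */
int raw_close_bad_fd(void)
{
    errno = 0;
    long ret = syscall(SYS_close, -1);
    return (ret == -1) ? errno : 0;
}

/* From the caller's side syscall() is a plain C function: arguments in,
 * a long result out, errno set on failure. */
void check_syscall_is_plain_c(void)
{
    assert(raw_getpid() == (long)getpid());
    assert(raw_close_bad_fd() == EBADF);
}
```

Nothing in the calling code knows or cares which registers the kernel
uses; that detail is entirely hidden inside syscall().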

> Okay, you've read the code in glibc and understand it. So, why does the
> glibc code have that register-setting assembly, if that register-setting
> assembly doesn't matter?

If you are asking why does glibc implement syscall() in assembler when
it could be implemented in completely generic C code (as musl does)
then the answer is I don't know. Historical I guess.

Looking at the glibc 32bit ARM syscall() assembler, after stripping
away the cfi_XXX annotations (ie stuff related to debug, not actual
opcodes), the assembler is:

ENTRY (syscall)
    mov    ip, sp
    push    {r4, r5, r6, r7}
    mov    r7, r0
    mov    r0, r1
    mov    r1, r2
    mov    r2, r3
    ldmfd    ip, {r3, r4, r5, r6}
    swi    0x0
    pop    {r4, r5, r6, r7}
    cmn    r0, #4096
    it    cc
    RETINSTR(cc, lr)
    b    PLTJMP(syscall_error)
PSEUDO_END (syscall)

ie it's pushing the original contents of r4, r5, r6 and r7 to the
stack, shuffling the first 4 arguments from C into the kernel's
syscall registers (the syscall number in r0 -> r7, the first argument
in r1 -> r0, etc), then loading the next 4 arguments from C directly
from the stack into r3, r4, r5 and r6 (cunningly, a single ldmfd
fills all four remaining kernel syscall argument registers).
Interestingly, it's taking a total of 8 arguments from the C caller -
the first is the syscall number, then 7 additional arguments (one more
than required if the maximum is 6). It then invokes the syscall,
restores the caller's original r4, r5, r6 and r7 values from the stack
and, if the result from the kernel indicates an error, returns via a
helper which sets errno.
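
The shuffle can be modelled in plain C. This is only a rough model of
the data movement, not glibc code: registers are represented as array
slots, and the struct and function names (c_call, kernel_call,
shuffle, check_shuffle) are invented for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model: what the C caller hands to syscall() on 32bit ARM.
 * The first four words arrive in r0-r3, the rest on the stack. */
struct c_call {
    long r[4];      /* r0 = syscall number, r1-r3 = first three arguments */
    long stack[4];  /* the next four arguments, as found at the caller's sp */
};

/* Hypothetical model: what the kernel expects at the swi instruction. */
struct kernel_call {
    long nr;        /* r7: syscall number */
    long arg[7];    /* r0-r6: syscall arguments */
};

struct kernel_call shuffle(const struct c_call *in)
{
    struct kernel_call out;

    out.nr     = in->r[0];   /* mov r7, r0 */
    out.arg[0] = in->r[1];   /* mov r0, r1 */
    out.arg[1] = in->r[2];   /* mov r1, r2 */
    out.arg[2] = in->r[3];   /* mov r2, r3 */
    /* ldmfd ip, {r3, r4, r5, r6} */
    memcpy(&out.arg[3], in->stack, sizeof in->stack);
    return out;
}

/* Usage: syscall(11, 21, 22, 23, 24, 25, 26, 27) as seen by the kernel. */
void check_shuffle(void)
{
    struct c_call in = { { 11, 21, 22, 23 }, { 24, 25, 26, 27 } };
    struct kernel_call out = shuffle(&in);

    assert(out.nr == 11);
    assert(out.arg[0] == 21 && out.arg[2] == 23);
    assert(out.arg[3] == 24 && out.arg[6] == 27);
}
```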

Now, looking at the C code implementation of syscall() in musl:

long syscall(long n, ...)
{
    va_list ap;
    syscall_arg_t a,b,c,d,e,f;
    va_start(ap, n);
    a=va_arg(ap, syscall_arg_t);
    b=va_arg(ap, syscall_arg_t);
    c=va_arg(ap, syscall_arg_t);
    d=va_arg(ap, syscall_arg_t);
    e=va_arg(ap, syscall_arg_t);
    f=va_arg(ap, syscall_arg_t);
    va_end(ap);
    return __syscall_ret(__syscall(n,a,b,c,d,e,f));
}

It fetches 6 va_args arguments from the caller, using standard C
function calling rules, and passes them on to the architecture
specific __syscall() macro, which will put the arguments in the
registers used for the kernel syscall and then invoke the syscall.
Note that since this is pure generic C code, you can insert debug,
call other functions etc wherever you like (the only thing that
needs special attention is that __syscall_ret() sets errno).
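
For reference, the errno housekeeping amounts to very little. The
following is a sketch of the convention rather than a copy of any
libc's source, and the function names are hypothetical: the kernel
reports failure as a negative errno value in the range -4095..-1,
which is the same test the "cmn r0, #4096" in the glibc assembler
performs.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical sketch of the return-value convention: a raw kernel
 * result in [-4095, -1] means "error, value is -errno". */
long sketch_syscall_ret(unsigned long r)
{
    if (r > -4096UL) {          /* read as signed: -4095 <= r <= -1 */
        errno = (int)-r;
        return -1;
    }
    return (long)r;             /* anything else is a valid result */
}

void check_sketch_syscall_ret(void)
{
    errno = 0;
    assert(sketch_syscall_ret(42UL) == 42);
    assert(errno == 0);

    assert(sketch_syscall_ret((unsigned long)-(long)EBADF) == -1);
    assert(errno == EBADF);

    /* -4096 is outside the error range and is passed through untouched. */
    assert(sketch_syscall_ret((unsigned long)-4096L) == -4096L);
}
```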

Compiling the musl C code for 32bit ARM gives the following assembler:

00000000 <syscall>:
   0:    e92d000f     push    {r0, r1, r2, r3}
   4:    e92d48b0     push    {r4, r5, r7, fp, lr}
   8:    e28db010     add    fp, sp, #16
   c:    e28b0008     add    r0, fp, #8
  10:    e24dd00c     sub    sp, sp, #12
  14:    e28bc008     add    ip, fp, #8
  18:    e59b7004     ldr    r7, [fp, #4]
  1c:    e50bc018     str    ip, [fp, #-24]    ; 0xffffffe8
  20:    e890000f     ldm    r0, {r0, r1, r2, r3}
  24:    e59b4018     ldr    r4, [fp, #24]
  28:    e59b501c     ldr    r5, [fp, #28]
  2c:    ef000000     svc    0x00000000
  30:    ebfffffe     bl    0 <__syscall_ret>
  34:    e24bd010     sub    sp, fp, #16
  38:    e8bd48b0     pop    {r4, r5, r7, fp, lr}
  3c:    e28dd010     add    sp, sp, #16
  40:    e12fff1e     bx    lr

Although this is a bit of a mess (gcc obviously isn't good at
optimising va_args as it needlessly saves the first 4 arguments to the
stack and then loads them back again...) the basic shuffling of
arguments from a C function call into the registers used for the
kernel syscall is the same as the glibc assembler! (Apart from the
fact it only handles 6 syscall arguments, not 7 as the glibc assembler
does, so nothing is set up in r6).

ie the glibc assembler isn't some mysterious function with a
non-standard calling convention - it's just an optimised implementation of
a standard C function.

> Okay, you say you understand why ARM EABI "sometimes" needs an argument
> to offset things. What are the circumstances?

The background to this is that in ARM 32bit EABI, 64bit values in
registers need to be kept in an even/odd register pair, which then
allows "double word" load and store instructions (ie single
instructions, first added in ARMv5, which can load or store 64bit
values from an even/odd register pair) to be used to read and write
them to/from memory. Since the ARM 32bit EABI kernel syscall interface
uses registers r0,r1,r2,r3, etc to pass the syscall arguments, a
padding argument is required if the first word of a 64bit value passed
to the kernel would not naturally be placed into an even numbered
register. In the readahead example, the first syscall argument is the
32bit file descriptor (which will be passed to the kernel in r0),
therefore a padding argument is required to fill r1 and ensure that
the first word of the 64bit offset gets passed in r2.
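
The rule is mechanical enough to model in a few lines of C. This is
an illustrative model only (assign_regs and check_readahead_layout are
made-up names, and it ignores everything except the even/odd pairing
rule described above): walk the argument list, and whenever a 64bit
value would start in an odd numbered register, skip that register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of ARM 32bit EABI syscall register assignment.
 * words[i] is 1 for a 32bit argument and 2 for a 64bit argument.
 * first_reg[i] receives the number of the register which holds the
 * first word of argument i. Returns the number of registers consumed,
 * padding included. */
int assign_regs(const int *words, size_t n, int *first_reg)
{
    int next = 0;

    for (size_t i = 0; i < n; i++) {
        assert(words[i] == 1 || words[i] == 2);
        if (words[i] == 2 && (next % 2) != 0)
            next++;             /* skipped register = the padding argument */
        first_reg[i] = next;
        next += words[i];
    }
    return next;
}

/* readahead(int fd, off64_t offset, size_t count) */
void check_readahead_layout(void)
{
    const int words[] = { 1, 2, 1 };
    int reg[3];
    int used = assign_regs(words, 3, reg);

    assert(reg[0] == 0);        /* fd in r0 */
    assert(reg[1] == 2);        /* offset in r2/r3, r1 is padding */
    assert(reg[2] == 4);        /* count in r4 */
    assert(used == 5);
}
```

If the 64bit value had come first (or after an even number of 32bit
arguments) it would already start in an even register and no padding
would be needed.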

> Is it specific to 32-bit
> targets?

The above is completely specific to ARM 32bit EABI. I guess *similar*
issues may apply to some other 32bit architectures (as suggested in
the manpage). It's certainly not an issue which is generic to all 32bit
targets though.

> On a target with 64-bit pointers, would it apply also to
> 64-bit pointers, or is it exclusively for 64-bit integers?

Since 64bit architectures can, by definition, read and write 64bit
values to memory using single load and store instructions, no 64bit
architecture would have an ABI which places a restriction that 64bit
values need to be held in any particular register - so no padding
arguments would ever be required to accommodate that.

> Because it seems to me that on a 64-bit target, renameat2() would in
> fact be passing a 64-bit object as the second argument. And if there's
> a reason that this doesn't count as a 64-bit argument passed after an
> odd number of 32-bit arguments, I'd like to know specifically what that
> reason is before I go relying on it to stay true forever.

For a 64bit architecture, the distinction between a 32bit argument and
a 64bit argument is only in how you interpret that data. In all cases
the data is passed as a 64bit value.

The code calling libc syscall() and the code within the kernel which
interprets the syscall arguments must agree on the format of the data,
but for a libc syscall() implementation which just passes the
arguments along it can treat everything as 64bit values. It doesn't
matter if an argument is actually int, long, or pointer. See the musl
syscall() implementation - all va_args values are extracted from the
caller as syscall_arg_t (which is long on most architectures).
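
A small sketch of that division of labour, assuming a Linux style ABI
where long is as wide as a pointer. All of the names here
(fake_kernel_write, passthrough, last_decoded) are hypothetical
stand-ins, and the caller casts each argument to long explicitly so
that the va_arg(ap, long) reads are well defined in portable C.

```c
#include <assert.h>
#include <stdarg.h>

/* What the "kernel" side believes the three words mean. */
struct decoded {
    int fd;
    const char *buf;
    unsigned long len;
};

struct decoded last_decoded;

/* Hypothetical stand-in for the kernel: only here are the raw words
 * given their real types. */
long fake_kernel_write(long a, long b, long c)
{
    last_decoded.fd  = (int)a;
    last_decoded.buf = (const char *)b;
    last_decoded.len = (unsigned long)c;
    return c;
}

/* Pass-through in the style of syscall(): every argument is just a
 * long, no matter what the caller and the kernel think it is. */
long passthrough(long n, ...)
{
    va_list ap;
    long a, b, c;

    (void)n;                    /* a real syscall() would select on n */
    va_start(ap, n);
    a = va_arg(ap, long);
    b = va_arg(ap, long);
    c = va_arg(ap, long);
    va_end(ap);
    return fake_kernel_write(a, b, c);
}

void check_passthrough(void)
{
    static const char msg[] = "hi";

    assert(passthrough(0L, (long)3, (long)msg, (long)2) == 2);
    assert(last_decoded.fd == 3);
    assert(last_decoded.buf == msg);
    assert(last_decoded.len == 2);
}
```

Only the two ends (the caller and fake_kernel_write) need to agree on
what each word means; passthrough() never looks.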

If syscall(), or a wrapper for it, *does* need to interpret the
arguments for a particular syscall then the syscall() implementation
would have to also agree with the interpretation of the data defined
by the kernel.