[OE-core] pseudo: host user contamination
Andre McCurdy
armccurdy at gmail.com
Tue Mar 27 19:11:22 UTC 2018
On Mon, Mar 26, 2018 at 9:41 PM, Seebs <seebs at seebs.net> wrote:
>
>> The syscall manpage is from the kernel manpages, not glibc.
>
>> http://man7.org/linux/man-pages/man2/syscall.2.html
>
> And yet! glibc is setting those registers in its code. Why? If that's a
> kernel thing and libc doesn't need to do it, why is libc doing it?
Of course libc syscall() is setting those registers WITHIN its code.
The job of the syscall() function is to translate from a C callable
API into a kernel syscall - so it must read arguments passed in from
the C caller (via the normal C function call rules, e.g. the first few
arguments passed via registers, the rest on the stack, etc) and use
them to set up a kernel syscall (via the kernel's syscall interface, ie
a maximum of 6 arguments, all passed via registers). After the kernel
syscall has returned, the implementation of libc syscall() needs to
collect the result from whichever register the kernel leaves it in and
return it via the normal C function call rules (plus take care of some
extra housekeeping, ie setting errno).
Whatever happens within syscall() is not important. The key point is
that it's a C callable function and follows standard C function call
rules.
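As a minimal sketch of that point (assuming Linux with glibc or musl; the
helper name pid_via_syscall is made up for illustration), syscall() can be
called like any other variadic C function and the compiler applies the
normal calling convention:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical helper: fetch the pid through the generic syscall()
 * entry point instead of the dedicated getpid() wrapper. */
long pid_via_syscall(void)
{
	/* An ordinary C call: the syscall number is simply the first
	 * C argument, nothing is placed in registers by hand. */
	long pid = syscall(SYS_getpid);

	assert(pid > 0);
	return pid;
}
```

The result should match getpid(), since both end up making the same kernel
syscall.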
> Okay, you've read the code in glibc and understand it. So, why does the
> glibc code have that register-setting assembly, if that register-setting
> assembly doesn't matter?
If you are asking why does glibc implement syscall() in assembler when
it could be implemented in completely generic C code (as musl does)
then the answer is I don't know. Historical I guess.
Looking at the glibc 32bit ARM syscall() assembler, after stripping
away the cfi_XXX annotations (ie stuff related to debug, not actual
opcodes), the assembler is:
ENTRY (syscall)
    mov ip, sp
    push {r4, r5, r6, r7}
    mov r7, r0
    mov r0, r1
    mov r1, r2
    mov r2, r3
    ldmfd ip, {r3, r4, r5, r6}
    swi 0x0
    pop {r4, r5, r6, r7}
    cmn r0, #4096
    it cc
    RETINSTR(cc, lr)
    b PLTJMP(syscall_error)
PSEUDO_END (syscall)
ie it's pushing the original contents of r4, r5, r6 and r7 to the
stack, shuffling the first 4 arguments from C into the kernel's
syscall registers (the syscall number in r0 -> r7, the first argument
in r1 -> r0, etc), and loading the next 4 arguments from C (cunningly,
with a single ldmfd, directly from the stack into r3, r4, r5 and r6,
the registers used for the next 4 arguments of the kernel syscall).
Interestingly, it's taking a total of 8 arguments from the C caller -
the first is the syscall number, then 7 additional arguments (one more
than required if the maximum is 6). It then invokes the syscall,
restores the caller's original r4, r5, r6 and r7 values from the stack
and returns, going via a helper to set errno if the result from the
kernel indicated an error.
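The error check at the end can be sketched in C. This is only an
illustration of the usual Linux convention (raw kernel results in the
range -4095..-1 are negated errno values), written in the style of musl's
__syscall_ret(); the function name raw_result_to_c is hypothetical and
this is not the glibc code:

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical helper: convert a raw kernel return value into the
 * C convention of "return -1 and set errno" on failure. */
long raw_result_to_c(unsigned long raw)
{
	/* Values just below zero (as unsigned: above -4096UL) are
	 * negated error codes. */
	if (raw > -4096UL) {
		errno = (int)-raw;
		return -1;
	}
	return (long)raw;
}
```

So a raw result of -EBADF becomes a return value of -1 with errno set to
EBADF, while any other value is passed back to the caller unchanged.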
Now, looking at the C code implementation of syscall() in musl:
long syscall(long n, ...)
{
    va_list ap;
    syscall_arg_t a,b,c,d,e,f;
    va_start(ap, n);
    a=va_arg(ap, syscall_arg_t);
    b=va_arg(ap, syscall_arg_t);
    c=va_arg(ap, syscall_arg_t);
    d=va_arg(ap, syscall_arg_t);
    e=va_arg(ap, syscall_arg_t);
    f=va_arg(ap, syscall_arg_t);
    va_end(ap);
    return __syscall_ret(__syscall(n,a,b,c,d,e,f));
}
It fetches 6 va_args arguments from the caller, using standard C
function calling rules, and passes them on to the architecture
specific __syscall() macro, which will put the arguments in the
registers used for the kernel syscall and then invoke the syscall.
Note that since this is pure generic C code, you can insert debug,
call other functions etc wherever you like (the only thing that
needs special attention is that __syscall_ret() sets errno).
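To show what "insert debug wherever you like" could look like, here is a
rough sketch of a tracing wrapper written in the same style. The name
traced_syscall is hypothetical (it is not part of musl or glibc), and to
keep the va_arg usage well defined it assumes the caller always passes six
long arguments after the syscall number:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

/* Hypothetical tracing wrapper. Callers must always pass six long
 * arguments after n (pad unused ones with 0L). */
long traced_syscall(long n, ...)
{
	va_list ap;
	long a, b, c, d, e, f;

	/* Fetch the arguments using the standard C varargs rules. */
	va_start(ap, n);
	a = va_arg(ap, long);
	b = va_arg(ap, long);
	c = va_arg(ap, long);
	d = va_arg(ap, long);
	e = va_arg(ap, long);
	f = va_arg(ap, long);
	va_end(ap);

	/* Plain C, so debug output can go here. */
	fprintf(stderr, "syscall %ld(%ld, %ld, %ld, %ld, %ld, %ld)\n",
		n, a, b, c, d, e, f);

	/* Hand everything on to libc's syscall(), which also sets errno. */
	return syscall(n, a, b, c, d, e, f);
}
```

For example, traced_syscall(SYS_getpid, 0L, 0L, 0L, 0L, 0L, 0L) should log
the call to stderr and return the same value as getpid().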
Compiling the musl C code for 32bit ARM gives the following assembler:
00000000 <syscall>:
0: e92d000f push {r0, r1, r2, r3}
4: e92d48b0 push {r4, r5, r7, fp, lr}
8: e28db010 add fp, sp, #16
c: e28b0008 add r0, fp, #8
10: e24dd00c sub sp, sp, #12
14: e28bc008 add ip, fp, #8
18: e59b7004 ldr r7, [fp, #4]
1c: e50bc018 str ip, [fp, #-24] ; 0xffffffe8
20: e890000f ldm r0, {r0, r1, r2, r3}
24: e59b4018 ldr r4, [fp, #24]
28: e59b501c ldr r5, [fp, #28]
2c: ef000000 svc 0x00000000
30: ebfffffe bl 0 <__syscall_ret>
34: e24bd010 sub sp, fp, #16
38: e8bd48b0 pop {r4, r5, r7, fp, lr}
3c: e28dd010 add sp, sp, #16
40: e12fff1e bx lr
Although this is a bit of a mess (gcc obviously isn't good at
optimising va_args as it needlessly saves the first 4 arguments to the
stack and then loads them back again...) the basic shuffling of
arguments from a C function call into the registers used for the
kernel syscall is the same as the glibc assembler! (Apart from the
fact it only handles 6 syscall arguments, not 7 as the glibc assembler
does, so nothing is set up in r6).
ie the glibc assembler isn't some mysterious function with a
non-standard calling convention - it's just an optimised implementation
of a standard C function.
> Okay, you say you understand why ARM EABI "sometimes" needs an argument
> to offset things. What are the circumstances?
The background to this is that in ARM 32bit EABI, 64bit values in
registers need to be kept in an even/odd register pair, which then
allows "double word" load and store instructions (ie single
instructions, first added in ARMv5, which can load or store 64bit
values from an even/odd register pair) to be used to read and write
them to/from memory. Since the ARM 32bit EABI kernel syscall interface
uses registers r0,r1,r2,r3, etc to pass the syscall arguments, a
padding argument is required if the first word of a 64bit value passed
to the kernel would not naturally be placed into an even numbered
register. In the readahead example, the first syscall argument is the
32bit file descriptor (which will be passed to the kernel in r0),
therefore a padding argument is required to fill r1 and ensure that
the first word of the 64bit offset gets passed in r2.
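The even/odd rule can be modelled with a few lines of C. This is only a
toy model of the register assignment described above, not code from any
libc or from the kernel; the function name assign_arg_regs and its
interface are made up for illustration, and the syscall number (which
goes in r7) is not counted:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of ARM 32bit EABI syscall argument placement.
 * sizes[i] is the size in bytes (4 or 8) of argument i, and
 * first_reg[i] receives the number of the first register holding it
 * (0 for r0, 1 for r1, ...). Returns the number of registers
 * consumed, including any padding. */
int assign_arg_regs(const int *sizes, size_t n, int *first_reg)
{
	int next = 0;

	for (size_t i = 0; i < n; i++) {
		assert(sizes[i] == 4 || sizes[i] == 8);

		/* A 64bit value must start in an even numbered
		 * register, so skip an odd one (the padding). */
		if (sizes[i] == 8 && (next & 1))
			next++;

		first_reg[i] = next;
		next += sizes[i] / 4;
	}
	return next;
}
```

For readahead (32bit fd, 64bit offset, 32bit count), ie sizes {4, 8, 4},
the model puts fd in r0, skips r1, puts the offset in r2/r3 and count in
r4, using 5 registers in total. The skipped r1 is the slot which the
caller of syscall() has to fill with an explicit padding argument.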
> Is it specific to 32-bit
> targets?
The above is completely specific to ARM 32bit EABI. I guess *similar*
issues may apply to some other 32bit architectures (as suggested in
the manpage). It's certainly not an issue which is generic to all 32bit
targets though.
> On a target with 64-bit pointers, would it apply also to
> 64-bit pointers, or is it exclusively for 64-bit integers?
Since 64bit architectures can, by definition, read and write 64bit
values to memory using single load and store instructions, no 64bit
architecture would have an ABI which places a restriction that 64bit
values need to be held in any particular register - so no padding
arguments would ever be required to accommodate that.
> Because it seems to me that on a 64-bit target, renameat2() would in
> fact be passing a 64-bit object as the second argument. And if there's
> a reason that this doesn't count as a 64-bit argument passed after an
> odd number of 32-bit arguments, I'd like to know specifically what that
> reason is before I go relying on it to stay true forever.
For a 64bit architecture, the distinction between a 32bit argument and
a 64bit argument is only in how you interpret that data. In all cases
the data is passed as a 64bit value.
The code calling libc syscall() and the code within the kernel which
interprets the syscall arguments must agree on the format of the data,
but for a libc syscall() implementation which just passes the
arguments along it can treat everything as 64bit values. It doesn't
matter if an argument is actually int, long, or pointer. See the musl
syscall() implementation - all va_args values are extracted from the
caller as long.
If syscall(), or a wrapper for it, *does* need to interpret the
arguments for a particular syscall then the syscall() implementation
would have to also agree with the interpretation of the data defined
by the kernel.