[oe] arm kernel configurations

Phil Blundell pb at reciva.com
Wed Oct 29 15:26:55 UTC 2008


On the subject of gta01 performance I thought it might be worth pointing
out a couple of common kernel inefficiencies:

- out of the box, the arm kernel always compiles itself with APCS-style
frame pointers enabled.  This costs you a general purpose
register and makes the function prologue and epilogue sequences more
heavyweight.  For production use I would always recommend turning off
frame pointers (which involves patching arch/arm/Kconfig.debug) in order
to get a smaller and faster kernel.

- if your userspace is pure EABI, make sure that you have
CONFIG_OABI_COMPAT turned off.  With this option enabled, the kernel
needs to inspect all SWI instructions to see if they might be OABI
syscalls: on machines with Harvard caches, this will almost certainly
involve hauling a cacheline in from main memory, which is slow and leads
to poor dcache utilisation (since you are loading eight words of which
only one is likely to be any use).

- if your userspace doesn't depend on load-rotate (most doesn't
nowadays) then you can hard-wire the alignment trap behaviour to be
SIGBUS and avoid reloading cp15 on every kernel entry.  This saves
another two lines in the dcache.  I don't have a pretty patch for this
yet but contributions would be welcome :-)
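
For reference, a .config that follows the first two suggestions would
contain something like the lines below.  This is only a sketch: it
assumes a pure EABI userspace, and that arch/arm/Kconfig.debug has
already been patched so that FRAME_POINTER can be switched off.  The
alignment trap change has no config option and is not shown.

```
# Assumes pure EABI userspace and a patched arch/arm/Kconfig.debug
CONFIG_AEABI=y
# CONFIG_OABI_COMPAT is not set
# CONFIG_FRAME_POINTER is not set
```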

These three changes make a measurable difference to the kernel benchmark
results.  For example, syscall entry/exit drops from about 490ns to
about 390ns on my gta01, an improvement of more than 20%.  The cost of
stat() drops from 36.8us to 33.2us, an improvement of about 10%.
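
The relative improvements can be recomputed from the quoted timings.
A minimal Python sketch (the function name is purely illustrative and
not part of any benchmark suite):

```python
def improvement_pct(before, after):
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100.0


# Timings quoted above: syscall entry/exit in ns, stat() in us.
syscall = improvement_pct(490.0, 390.0)
stat = improvement_pct(36.8, 33.2)

print(f"syscall entry/exit: {syscall:.1f}%")  # prints 20.4%
print(f"stat():             {stat:.1f}%")     # prints 9.8%
```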

share and enjoy

p.





