[oe] arm kernel configurations
Phil Blundell
pb at reciva.com
Wed Oct 29 15:26:55 UTC 2008
On the subject of gta01 performance I thought it might be worth pointing
out a couple of common kernel inefficiencies:
- out of the box, the arm kernel always compiles itself with APCS-style
frame pointers enabled. This costs you a general purpose register and
makes the function prologue and epilogue sequences more heavyweight.
For production use I would always recommend turning off frame pointers
(which involves patching arch/arm/Kconfig.debug) in order to get a
smaller and faster kernel.
- if your userspace is pure EABI, make sure that you have
CONFIG_OABI_COMPAT turned off. With this option enabled, the kernel
needs to inspect every SWI instruction to see whether it is an OABI
syscall. The instruction has to be read as data, so on machines with
Harvard caches this will almost certainly involve hauling a cacheline
in from main memory, which is slow and leads to poor dcache utilisation
(since you are loading eight words of which only one is likely to be
any use).
- if your userspace doesn't depend on load-rotate (most doesn't
nowadays) then you can hard-wire the alignment trap behaviour to be
SIGBUS and avoid reloading cp15 on every kernel entry. This saves
another two lines in the dcache. I don't have a pretty patch for this
yet but contributions would be welcome :-)
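
On the second point, one way to check whether a given binary is EABI
before turning CONFIG_OABI_COMPAT off is to look at the ELF header: on
ARM the EABI version sits in the top byte of e_flags, and old-ABI
binaries leave it at zero. The following is only a rough sketch of that
check. The function names are made up for illustration, and it assumes
a 32-bit ELF file read on a machine with the same byte order as the
file.

```c
#include <elf.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* The EABI version occupies the top byte of e_flags on ARM. */
#define ARM_EABI_MASK 0xFF000000u

/* Hypothetical helper: extract the EABI version from e_flags.
 * A result of 0 means the binary uses the old ABI (OABI). */
static unsigned arm_eabi_version(uint32_t e_flags)
{
    return (e_flags & ARM_EABI_MASK) >> 24;
}

/* Hypothetical helper: return the EABI version of a 32-bit ARM ELF
 * file (0 = OABI), or -1 if the file cannot be read or is not a
 * 32-bit ARM ELF object. Assumes host and file byte order match. */
static int elf_arm_eabi_version(const char *path)
{
    Elf32_Ehdr eh;
    FILE *f = fopen(path, "rb");
    size_t n;

    if (!f)
        return -1;
    n = fread(&eh, 1, sizeof eh, f);
    fclose(f);

    if (n != sizeof eh)
        return -1;
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;
    if (eh.e_ident[EI_CLASS] != ELFCLASS32)
        return -1;
    if (eh.e_machine != EM_ARM)
        return -1;

    return (int)arm_eabi_version(eh.e_flags);
}
```

Calling elf_arm_eabi_version() on every executable and shared library
in the image and looking for any that return 0 gives a quick
indication of whether OABI binaries are still present.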
These three changes make a measurable difference to the kernel benchmark
results. For example, syscall entry/exit drops from about 490ns to
about 390ns on my gta01, an improvement of more than 20%. The cost of
stat() drops from 36.8us to 33.2us, an improvement of about 10%.
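
For anyone wanting to take comparable before/after measurements, here
is a minimal sketch of a timing loop for a trivial syscall and for
stat(). It is not the benchmark used for the figures above (the message
does not say which one that was), and the function names and the choice
of getppid() as the "null" syscall are assumptions made for
illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Current monotonic time in nanoseconds. */
static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Mean cost in ns of one getppid() call, which does almost no work
 * in the kernel and so approximates syscall entry/exit overhead. */
static double time_null_syscall_ns(long iters)
{
    double start;
    long i;

    assert(iters > 0);
    start = now_ns();
    for (i = 0; i < iters; i++)
        (void)getppid();
    return (now_ns() - start) / (double)iters;
}

/* Mean cost in ns of one stat() call on the given path. */
static double time_stat_ns(const char *path, long iters)
{
    struct stat st;
    double start;
    long i;

    assert(iters > 0);
    start = now_ns();
    for (i = 0; i < iters; i++)
        (void)stat(path, &st);
    return (now_ns() - start) / (double)iters;
}
```

Running each function with a large iteration count (say a million) on
the same board with the old and new kernels, and repeating a few times,
should be enough to see whether a difference of this size shows up.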
share and enjoy
p.