[OE-core] Tune files and knobs to turn

Koen Kooi koen at dominion.thruhere.net
Thu Jun 30 17:41:49 UTC 2011


On 30 Jun 2011, at 18:02, Tom Rini wrote:

> On 06/28/2011 10:36 AM, Darren Hart wrote:
>> 
>> 
>> On 06/24/2011 04:54 AM, Koen Kooi wrote:
>>> Hi,
>>> 
>>> We discussed tune files a bit during last night's TSC meeting and Khem had
>>> expressed the need before, so I'd like to get this discussion started by using
>>> armv7a as an example.
>>> 
>>> For armv7a capable cores we have the following hardware features:
>>> 
>>> * armv7a instruction set
>>> * thumb1 instruction set
>>> * thumb2 instruction set
>>> * VFP coprocessor
>>> * optional NEON coprocessor
>>> 
>>> For the ABI we can choose the following:
>>> 
>>> * softfp without hw support (i.e. no VFP instructions emitted, slow)
>>> * softfp with hw support (e.g. VFP and/or NEON instructions emitted, fast)
>>> * hardfp, emits VFP and/or NEON instructions, slightly faster than softfp/hw,
>>>  incompatible with everything else
>>> 
>>> And the extra knobs:
>>> 
>>> * pure thumb1, no arm instructions (limited use)
>>> * thumb1/arm interworking
>>> * pure thumb2,  no arm instructions
>>> * thumb2 interworking (not sure if that's actually useful, thumb2 has complete coverage)
>>> 
>>> In OE .dev we have the following vars:
>>> 
>>> TARGET_FPU: switches between hw float and sw float, no reflection in package arch
>>> ARM_FP_ABI: switches between softfp and hardfp, will create 'armv7a' or
>>>            'armv7a-hardfp' as package arch
>>> ARM_INSTRUCTION_SET: switches between arm and thumb1, no reflection in package arch
>>> THUMB_INTERWORK: turns on interworking, no reflection in package arch
>>> 
>>> (side note, oe-core/distroless and meta-yocto/poky don't set TARGET_FPU
>>> for armv7a and will generate slow code, angstrom does turn it on)
>> 
>> 
>> oe-core tune-cortexa8.inc doesn't make use of these variables (unlike
>> meta-texasinstruments) and does make use of the neon coprocessor, but
>> still uses the softfp float ABI:
>> 
>> TARGET_CC_ARCH = "-march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=softfp -fno-tree-vectorize"
> 
> What's with the -fno-tree-vectorize?  I had someone point out to me that
> the TI wiki recommends turning that on, even outside of -O3 (which
> enables it by default).

Real-world experience with GCC 4.3 and 4.5 has shown that GCC is shockingly bad at vectorizing for NEON, so you need this unbreak-me option to avoid slowdowns. It might be less bad on Cortex-A9 or A15, but for the A8 not vectorizing is a net win.
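As a rough illustration only (this is not OE .dev or OE-core tune code), here is a minimal Python sketch of how the armv7a knobs discussed in this thread could map to GCC flags and to the package arch names mentioned above. The `ArmV7aTune` record and its field names are hypothetical. Choosing `vfpv3` for the non-NEON FPU and treating ARM mode as the toolchain default are assumptions. With its defaults it reproduces the tune-cortexa8.inc TARGET_CC_ARCH line quoted earlier.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical model of the armv7a knobs from this thread; the field names
# are illustrative and are NOT OE .dev or OE-core variable names.
@dataclass(frozen=True)
class ArmV7aTune:
    float_abi: str = "softfp"  # "soft", "softfp" or "hard"
    neon: bool = True          # use the optional NEON unit instead of plain VFP
    thumb: bool = False        # Thumb code; with -march=armv7-a GCC emits Thumb-2
    interwork: bool = False    # allow calls between ARM and Thumb code
    vectorize: bool = False    # GCC auto-vectorization, off per the Cortex-A8 advice


def gcc_flags(tune: ArmV7aTune, mtune: str = "cortex-a8") -> List[str]:
    """Build a TARGET_CC_ARCH-style flag list for the given tune."""
    if tune.float_abi not in ("soft", "softfp", "hard"):
        raise ValueError(f"unknown float ABI: {tune.float_abi!r}")
    flags = ["-march=armv7-a", f"-mtune={mtune}"]
    if tune.float_abi != "soft":
        # FP instructions are allowed, so tell GCC which unit is present.
        # Using vfpv3 as the non-NEON choice is an assumption.
        flags.append("-mfpu=neon" if tune.neon else "-mfpu=vfpv3")
    flags.append(f"-mfloat-abi={tune.float_abi}")
    if tune.thumb:
        # ARM mode is assumed to be the toolchain default, so only Thumb is explicit.
        flags.append("-mthumb")
    if tune.interwork:
        flags.append("-mthumb-interwork")
    if not tune.vectorize:
        flags.append("-fno-tree-vectorize")
    return flags


def package_arch(tune: ArmV7aTune) -> str:
    """Per the thread, only the hard-float ABI is reflected in the package arch."""
    return "armv7a-hardfp" if tune.float_abi == "hard" else "armv7a"


if __name__ == "__main__":
    a8 = ArmV7aTune()
    print(" ".join(gcc_flags(a8)), package_arch(a8))
```

Run as a script, the defaults print the Cortex-A8 flag string followed by `armv7a`; passing `ArmV7aTune(float_abi="hard")` switches the package arch to `armv7a-hardfp`, while the FPU, instruction set and interworking knobs leave it unchanged, matching the behaviour described for OE .dev.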


