[OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors

Mark Hatle mark.hatle at windriver.com
Tue Jun 12 14:32:17 UTC 2018


On 6/12/18 4:30 AM, Herve Jourdain wrote:
> Hi,
> 
> I believe I'm the "original author" of some patch attempt at tackling this problem, more than a year ago, as referenced in this series.
> And I understand why everyone, Khem being the first and not the only one, would like some "simpler" things for ARM.
> But the problem is that ARM-based SoCs are very diverse, and ARM does have a number of optional IP blocks (such as crypto, but neon is another one, and there are others), defined for each architecture. Then ARM defines some "standard" SoCs (like cortex-A53, cortex-A57, ...) which may set some of those optional IPs as required for that SoC, and the rest still as optional.
> And SoC vendors decide what optional IPs they will implement or not...

Simplification is a goal in this, but as you said, not always reasonable with a
processor designed to be customized.

Typically true customization (vendor specific) doesn't belong in the oe-core
tune files, but stuff that is architecturally defined may.

> So when we're talking "cortex-A53", it's not necessarily the same cortex-A53 for all SoC vendors.
> 
> GCC does support all that complexity. So the main question is, do we want to be able to generate code that could take advantage of the optional IPs present on a SoC? Or do we prefer to settle for the least common denominator?

I think this is the key.  Which combinations does GCC support (that is, actually
generate code for)?  If GCC can't generate code for a combination, then I don't
believe it belongs as a tune in OE-Core, unless there is a compelling argument
that assembly level functions will be common enough to justify it.
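
One rough first-pass check of whether a flag combination changes anything the
compiler sees is to compare the predefined macros that `gcc -dM -E` prints for
two sets of flags.  The sketch below is illustrative only: the helper names
(parse_macros, macro_diff, dump_macros) are hypothetical, the compiler name in
the usage note is an assumption about the local toolchain, and a macro
difference does not by itself prove that the generated code differs.

```python
import subprocess


def parse_macros(dump: str) -> dict:
    """Parse `#define NAME VALUE` lines as printed by `gcc -dM -E`."""
    macros = {}
    for line in dump.splitlines():
        parts = line.split(None, 2)
        if len(parts) >= 2 and parts[0] == "#define":
            macros[parts[1]] = parts[2] if len(parts) == 3 else ""
    return macros


def macro_diff(base: str, candidate: str) -> dict:
    """Return macros that are new or changed in `candidate` relative to `base`."""
    old, new = parse_macros(base), parse_macros(candidate)
    return {name: value for name, value in new.items() if old.get(name) != value}


def dump_macros(compiler: str, flags: list) -> str:
    """Ask the compiler's preprocessor for its predefined macros (needs the toolchain installed)."""
    result = subprocess.run(
        [compiler, *flags, "-dM", "-E", "-x", "c", "/dev/null"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

For example, calling macro_diff() on dump_macros("aarch64-linux-gnu-gcc",
["-march=armv8-a"]) and dump_macros("aarch64-linux-gnu-gcc",
["-march=armv8-a+crc"]) would list whatever feature macros that toolchain
defines only for the second combination.  An empty result is a hint, not proof,
that the extra tune would not change compiler behavior.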

> As someone who is close to the SoC, I definitely would prefer to be able to take advantage of the optional IPs present on an ARM SoC, and I'd rather have a system that can at least support that even if it's slightly more complex. This said, once it's done, most people won't look under the hood but just use it, so the complexity would end up being hidden - much like now with armv7.

And this is why I'm making the GCC statement.  Most developers will define a
tune, but will never go into the assembly realm.  They simply don't have the
knowledge, or don't care to devote a bunch of time to a 0.5% performance
improvement.  If GCC can add specific optimizations, then we've hit the
'trivial optimization' phase, and a tune may be justified.  We just need to be
careful with the variant names: once set, they will last a VERY long time.

> I've personally followed up on my patches from last year, and I now have a slightly modified/simplified version of them, which I've used to build some production-ready environments using cortex-a53/armv8 tunes, that trigger the optimization for cortex-a53 + neon. And if the SoC I'm working with had the crypto extension, I would be very happy to build for it, by just switching the tune I use for my cortex-a53 to the armv8 tune supporting crypto.
> 
> So I believe now may be a good time to talk this over again, because we're basically building for cortex-a53 with cortexa7/armv7ve, and that is not the most optimal thing to do in my opinion (like, some instructions that were native in armv7ve are simulated in armv8).

I don't think anyone objects to armv8, but I was under the impression that
things like neon were now 'required' (i.e. were not supposed to be removed from
the instruction set).  So anything that is now standard would be part of the
definition of armv8, and if there are rare, customized versions without neon or
something else, then I think a silicon vendor specific tune is what is needed.

In the end it comes down to what ARM has specified, what GCC supports, and
what is ACTUALLY being broadly implemented.

> One thing that I did come up with as a simplification was the handling of thumb. I don't think it needs to be an option anymore, since its support is mandatory in armv8 (but I think that was also the case in armv7). That simplifies things a bit, but nothing fundamental; you still need to carry the support for the optional IPs around...

The only reason to continue with the existing 32-bit naming conventions (t,
neon, vfp, etc.) is to show the compatibility matrix.  I don't know if this
actually justifies the extensions though.  (I do know I have customers who
either never want to use thumb or want to use thumb as much as possible, based
on their own performance requirements and designs, so thumb being switchable is
still a desired attribute, at least in the armv7 designs I know of.)

> And in addition to what I proposed to support last year, we indeed now have to add armv8.1a, armv8.2a, armv8.3a, armv8.4a (so far...), which each have their own specificities/differences that make it unlikely to be supported within a single file.

If the instruction scheduling, generated instructions, optimizations, etc. are
truly different, then we should call them armv81a, etc.  (I don't believe we
can use a '.' for various reasons.)  But if there is no difference in the
compiler behavior or the generated code, and it's just assembly level
instruction additions, then I'm reluctant to add these tunes as they can give
a false impression.
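
To make the naming point concrete, here is a minimal sketch of how a GCC
`-march` value could be turned into a tune name without a '.'.  The tune_name
helper is hypothetical and not part of OE-Core; the `-march` strings are the
architecture revisions mentioned above, written the way GCC spells them.

```python
def tune_name(march: str) -> str:
    """Drop '.' and '-' from a GCC -march value so it can serve as a tune name."""
    return march.replace(".", "").replace("-", "")


# Show the proposed name for each architecture revision discussed in the thread.
for march in ("armv8-a", "armv8.1-a", "armv8.2-a", "armv8.3-a", "armv8.4-a"):
    print(f"{march} -> {tune_name(march)}")
```

Whether each of these names deserves its own tune still depends on GCC actually
generating different code for it.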

> Thoughts? Can we talk this over, so we have a chance at good support for armv8-32 in oe, instead of everyone doing their own?
> 
> Cheers,
> Herve
> 
> -----Original Message-----
> From: openembedded-core-bounces at lists.openembedded.org [mailto:openembedded-core-bounces at lists.openembedded.org] On Behalf Of Koen Kooi
> Sent: Tuesday, 12 June 2018 11:01
> To: Randy Li <ayaka at soulik.info>
> Cc: OE-core <openembedded-core at lists.openembedded.org>
> Subject: Re: [OE-core] [PATCH v2 0/4] Add tune for ARMv8 and some cortex processors
> 
> 
> 
>> On 9 Jun 2018, at 08:26, Randy Li <ayaka at soulik.info> wrote:
>>
>> I read the ARMv8 manual again; it looks like hardware float is 
>> mandatory in Linux distributions and toolchain libraries. Even though some 
>> Cortex processors can be configured without FPU/NEON hardware, I 
>> don't think they would be used in OpenEmbedded core.
>>
>> So I can assume that NEON (SIMD) is always present, leaving only 
>> the crc and crypto instructions as optional here.
>>
>>
>> Randy Li (4):
>>  arch-armv8a.inc: add tune include for armv8
>>  tune-cortexa35: add tunes for ARM Cortex-A35
>>  tune-cortexa32: add tunes for ARM Cortex-A32
>>  tune-cortexa72: add tunes for ARM Cortex-A72
> 
> Having been forced to deal with the mess that’s 32-bit arm tunes: Let’s only add implementation-specific tunes *after* having seen conclusive, repeatable benchmark results. 90% of the 32-bit tune files are placebo effect and just explode the number of package archs in your distro feed. The goal of aarch64 was to stop being different for the sake of being different, let’s not make a mess because we are used to messes.
> 
> regards,
> 
> Koen



