[OE-core] [PATCH] python3: enable profile optimized builds

Thu Aug 23 01:44:03 UTC 2018

On Thu, Aug 16, 2018 at 9:48 PM, Anuj Mittal <anuj.mittal at intel.com> wrote:
> On 08/17/2018 03:31 AM, Andre McCurdy wrote:
>> On Wed, Aug 15, 2018 at 11:26 PM, Anuj Mittal <anuj.mittal at intel.com> wrote:
>>> Enable profile-guided optimization (pgo) for python3. Enabling pgo in
>>> python is generally as simple as invoking the profile-opt make target, which:
>>>
>>> - builds python binaries with profile instrumentation enabled,
>>> - runs a specific profile task using that python to get the profile
>>> data, and
>>> - feeds the compiler with this profile data and rebuilds python.
>>>
>>> This change invokes qemu-user for the second step, running the profile
>>> task with the target python. Depending on how long the profile task takes
>>> to run, this might add significant time to compilation (which would be
>>> true for native builds too). Users can change the default profile task
>>> to whatever makes sense for their use case (or leave it empty). If
>>> qemu-user isn't supported, the profile task won't be run.
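The three steps above can be sketched as a list of commands. This is a minimal illustration built around a hypothetical toy C file (`toy.c`), not the actual CPython `profile-opt` recipe; the compiler name, the `--bench` profile arguments and the `qemu-i386 -L <sysroot>` runner are assumptions. Falling back to a plain optimized build when no runner is available is a choice of this sketch, loosely mirroring "the profile task won't be run".

```python
from typing import List, Optional


def pgo_steps(cc: str, src: str, exe: str, profile_args: List[str],
              runner: Optional[List[str]] = None,
              cross: bool = True) -> List[List[str]]:
    """Return the PGO build steps as argv lists; nothing is executed here.

    runner is an optional emulator prefix (e.g. a qemu-user command line)
    used to run the instrumented binary when cross-compiling.
    """
    obj = src.rsplit(".", 1)[0] + ".o"
    if cross and runner is None:
        # No way to run target code: plain optimized build, no profile step.
        return [[cc, "-O2", "-c", src, "-o", obj], [cc, obj, "-o", exe]]
    prefix = list(runner) if cross and runner is not None else []
    return [
        # 1. Instrumented build; running the result writes .gcda files.
        [cc, "-O2", "-fprofile-generate", "-c", src, "-o", obj],
        [cc, "-fprofile-generate", obj, "-o", exe],
        # 2. Run the profile task (under the emulator when cross-compiling).
        prefix + ["./" + exe] + profile_args,
        # 3. Rebuild, letting the compiler read the collected profile.
        [cc, "-O2", "-fprofile-use", "-c", src, "-o", obj],
        [cc, obj, "-o", exe],
    ]


if __name__ == "__main__":
    # Hypothetical cross compiler and sysroot path, for illustration only.
    for argv in pgo_steps("i686-linux-gnu-gcc", "toy.c", "toy", ["--bench"],
                          runner=["qemu-i386", "-L", "/path/to/sysroot"]):
        print(" ".join(argv))
```

Compiling and linking separately keeps the .gcda file named after the object file, so the generate and use phases look for the same profile.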
>>
>> Is it important to re-create the profile data during every build or
>> would we get most of the same benefits from using reference data which
>> is generated offline?
>
> We should get the same benefit using data generated offline, as long
> as the source code, configure options and flags are the same, I believe.
> I have only tried data generated offline with the same build
> configuration, though.

As an additional data point, python do_compile now takes approx 18
minutes on my laptop, most of it spent in a single qemu-i386 thread
loading one CPU core at 100%, with nothing else scheduled in parallel.

If we can get most of the benefits of pgo with a pre-generated data
file, then that might still be something to explore.
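Since reference data is only expected to be valid while the source code, configure options and flags stay the same, one way to explore a pre-generated data file is to store a fingerprint of those inputs next to it and run the profile task only when the fingerprint differs. A minimal sketch; the function names and the exact set of inputs are assumptions, and the target architecture is included only conservatively because the thread leaves open whether the data is architecture specific.

```python
import hashlib
import json
from typing import Dict, List


def profile_fingerprint(source_version: str, configure_opts: List[str],
                        cflags: List[str], target_arch: str) -> str:
    """Hash the build inputs that reference profile data depends on."""
    payload = json.dumps(
        {"src": source_version, "configure": configure_opts,
         "cflags": cflags, "arch": target_arch},
        sort_keys=True,  # stable key order, so equal inputs hash equally
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def can_reuse_profile(stored: Dict[str, str], current: str) -> bool:
    """True if the stored reference data was generated for the same inputs."""
    return stored.get("fingerprint") == current
```

When `can_reuse_profile` returns False, the build would fall back to regenerating the data (or to a build without pgo) instead of risking a profile mismatch error.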

> It would, however, need the Makefiles to be tweaked to pass
> -fprofile-dir=<path> when using the profile data, among other things.
> Please see this if you'd like an example that works:
>
> https://git.yoctoproject.org/clean/cgit.cgi/poky-contrib/commit/?h=anujm/9338&id=e57654cb51b121e9dfa76e66432c4d37fd339d42
>
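The essential point of `-fprofile-dir=<path>` is that the instrumented build and the optimized rebuild agree on where the .gcda files live. A hedged sketch of the two flag sets; the directory is a placeholder and this is not the change made in the linked commit.

```python
from typing import List


def pgo_cflags(phase: str, profile_dir: str) -> List[str]:
    """Compiler flags for one PGO phase, with profiles kept in profile_dir.

    phase is "generate" (instrumented build) or "use" (optimized rebuild).
    """
    if phase == "generate":
        return ["-fprofile-generate", f"-fprofile-dir={profile_dir}"]
    if phase == "use":
        return ["-fprofile-use", f"-fprofile-dir={profile_dir}"]
    raise ValueError(f"unknown PGO phase: {phase!r}")
```

GCC also accepts the directory directly as `-fprofile-generate=path` and `-fprofile-use=path`. Its documentation notes that, with a profile directory, .gcda file names are derived from the object file paths, so those paths presumably need to stay stable between the two phases.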
>> How big is the data file? Is it binary or text?
>
> gcc -fprofile-generate produces .gcda files, which are only read by
> -fprofile-use, can be deleted afterwards and aren't installed. For more
> information:
>
> https://gcc.gnu.org/onlinedocs/gcc/Gcov-Data-Files.html
>
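To answer the size question empirically for a particular build, one could total the .gcda files left in the build tree after the profile task has run (they are in gcov's binary format, not text), and remove them once the `-fprofile-use` rebuild is done. A small sketch; `build_dir` stands for whatever tree the instrumented build ran in.

```python
from pathlib import Path
from typing import List, Tuple


def gcda_files(build_dir: str) -> Tuple[List[Path], int]:
    """Return all .gcda profile files under build_dir and their total size."""
    files = sorted(Path(build_dir).rglob("*.gcda"))
    return files, sum(f.stat().st_size for f in files)


def remove_gcda(build_dir: str) -> int:
    """Delete the profile files; they are not needed after the rebuild."""
    files, _ = gcda_files(build_dir)
    for f in files:
        f.unlink()
    return len(files)
```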
>> Is the data expected to be target architecture specific?
>> If reference data were used, what are the consequences of the data not
>> corresponding exactly to the current build configuration? A build
>> failure? Or just a decrease in the effectiveness of the optimisation?
>
> From the gcc man page:
>
> "By default, GCC emits an error message if the feedback profiles do not
> match the source code. This error can be turned into a warning by using
> -Wcoverage-mismatch. Note this may result in poorly optimized code."
>
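For reference data, this makes a mismatch a policy choice: keep the default hard error, or demote it and accept possibly poorly optimized code. A sketch with hypothetical policy names; it uses `-Wno-error=coverage-mismatch`, which GCC's warning-option documentation describes as keeping the diagnostic while dropping the error, and `-Wno-coverage-mismatch`, which disables it.

```python
from typing import List


def mismatch_flags(policy: str) -> List[str]:
    """Extra flags controlling how stale profile data is handled.

    "strict": keep GCC's default, failing the build on a profile mismatch.
    "warn":   report the mismatch but continue (code may be poorly optimized).
    "silent": suppress the diagnostic entirely.
    """
    flags = {
        "strict": [],
        "warn": ["-Wno-error=coverage-mismatch"],
        "silent": ["-Wno-coverage-mismatch"],
    }
    if policy not in flags:
        raise ValueError(f"unknown mismatch policy: {policy!r}")
    return flags[policy]
```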
>>
>> Does the profiling instrumentation measure execution timing? Or only
>> the frequency / order in which functions are called? ie is there any
>> concern that data generated from running under qemu may not be optimal
>> for running on the target?
>
> It tries to identify code hot spots: how many times each branch and call
> is executed, how many times it is taken or returns, etc., so I don't
> think it should matter. I did try generating the data both on target
> hardware and under qemu, and at least performance-wise I didn't see any
> difference. I didn't perform any exhaustive analysis though.
>
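A toy illustration of why emulation speed shouldn't change the data: a profile that records how often each branch is taken comes out the same whether the code runs fast or slowly, provided the workload is deterministic. This is a hypothetical Python stand-in, not gcc's instrumentation; the `delay` parameter only simulates slower execution such as qemu-user.

```python
import time
from collections import Counter
from typing import Iterable


def classify(n: int, counts: Counter, delay: float) -> str:
    time.sleep(delay)  # simulate slower execution, e.g. under emulation
    if n % 3 == 0:
        counts["taken"] += 1      # branch taken
        return "multiple of 3"
    counts["not_taken"] += 1      # branch not taken
    return "other"


def branch_profile(inputs: Iterable[int], delay: float = 0.0) -> Counter:
    """Run the workload and return per-branch execution counts."""
    counts: Counter = Counter()
    for n in inputs:
        classify(n, counts, delay)
    return counts


if __name__ == "__main__":
    fast = branch_profile(range(10))
    slow = branch_profile(range(10), delay=0.001)
    print(fast == slow)  # counts do not depend on how long execution took
```

Timing-dependent code paths (timeouts, thread scheduling) could still behave differently under qemu than on hardware; the sketch does not model that.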