[OE-core] [PATCH 1/2 v5] resultstool: enable merge, store, report and regression analysis

Mon Jan 28 02:12:13 UTC 2019

Hi RP,

Thanks for the valuable input.
I agree that the current patch, which enables file-based regression, is not enough for other use cases.

Based on the information you shared, I have two more regression use cases in mind:
Use case #1: directory-based regression
Given that the Autobuilder stores result files inside /testresults directories, users should be able to perform directory-based regression directly on the Autobuilder output, such as the directories below:
https://autobuilder.yocto.io/pub/releases/yocto-2.6.1.rc1/testresults/qemux86/testresults.json
https://autobuilder.yocto.io/pub/releases/yocto-2.7_M1.rc1/testresults/qemux86/testresults.json
https://autobuilder.yocto.io/pub/releases/yocto-2.7_M2.rc1/testresults/qemux86/testresults.json

Assume there are two directories, each storing a set of result files. The user provides these two directories, and the regression script first parses all the available files inside each directory, then uses the configuration data to determine the regression pairs (e.g. select result_set_1 from directory #1 and result_set_x from directory #2 if both have matching configurations).
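A minimal Python sketch of this pairing step, assuming each testresults.json maps a result ID to a dict holding "configuration" and "result" entries. MATCH_KEYS and the function names are hypothetical, chosen only for illustration; the actual set of configuration keys to match on would still need to be agreed.

```python
import json
import os

# Hypothetical: configuration keys that must be equal for two result sets to
# form a regression pair. Run-specific keys such as STARTTIME are left out.
MATCH_KEYS = ("TEST_TYPE", "MACHINE", "HOST_DISTRO")


def load_results_from_dir(directory):
    """Collect every result set from all testresults.json files under directory."""
    results = {}
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name == "testresults.json":
                with open(os.path.join(root, name)) as f:
                    # Assumed format: {result_id: {"configuration": {...}, "result": {...}}}
                    results.update(json.load(f))
    return results


def config_key(result_set):
    """Reduce a result set's configuration to the tuple used for matching."""
    config = result_set.get("configuration", {})
    return tuple(config.get(key) for key in MATCH_KEYS)


def pair_by_configuration(base_results, target_results):
    """Yield (base_id, target_id) for every pair with matching configurations."""
    targets = {}
    for result_id, result_set in target_results.items():
        targets.setdefault(config_key(result_set), []).append(result_id)
    for base_id, result_set in sorted(base_results.items()):
        for target_id in sorted(targets.get(config_key(result_set), [])):
            yield base_id, target_id
```

Usage would be along the lines of `list(pair_by_configuration(load_results_from_dir(dir1), load_results_from_dir(dir2)))`, after which each pair goes through the existing per-result regression.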

Use case #2: git-branch-based regression
Given that the Autobuilder stores result files inside /testresults directories, users would first store these directories and their result files in the corresponding git branches using the existing store plugin. After that, they can use git-branch-based regression to analyze the information.
For example, store the following directories in the yocto-2.6.1.rc1, yocto-2.7_M1.rc1 and yocto-2.7_M2.rc1 git branches respectively:
https://autobuilder.yocto.io/pub/releases/yocto-2.6.1.rc1/testresults/
https://autobuilder.yocto.io/pub/releases/yocto-2.7_M1.rc1/testresults/
https://autobuilder.yocto.io/pub/releases/yocto-2.7_M2.rc1/testresults/

Assume the result files are stored inside a git repository, with each git branch storing the result files for a single commit. The user provides the two git branches to compare, and the regression script first parses all the available files inside each git branch, then uses the configuration data to determine the regression pairs (e.g. select result_set_1 from git_branch_1 and result_set_x from git_branch_2 if both have matching configurations).
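One way to read the result files from a branch without checking it out is sketched below in Python. It relies only on the standard git commands "git ls-tree" and "git show <branch>:<path>"; the function names are hypothetical, and each testresults.json is assumed to map a result ID to a result set. The returned dictionaries could then be paired by configuration in the same way as in use case #1.

```python
import json
import posixpath
import subprocess


def git(repo, *args):
    """Run a git command inside repo and return its standard output as text."""
    result = subprocess.run(["git", "-C", repo, *args],
                            check=True, capture_output=True, text=True)
    return result.stdout


def load_results_from_branch(repo, branch, filename="testresults.json"):
    """Collect result sets from every matching file on a branch, without a checkout."""
    results = {}
    # -r recurses into sub-directories; -z gives NUL-separated, unquoted paths.
    listing = git(repo, "ls-tree", "-r", "-z", "--name-only", branch)
    for path in filter(None, listing.split("\0")):
        if posixpath.basename(path) == filename:
            # "<branch>:<path>" names the file's contents at the tip of the branch.
            results.update(json.loads(git(repo, "show", f"{branch}:{path}")))
    return results
```

Because nothing is checked out, the same repository can be queried for both branches in one run, e.g. `load_results_from_branch(repo, "yocto-2.7_M1.rc1")` and `load_results_from_branch(repo, "yocto-2.7_M2.rc1")`.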

The current codebase can easily be extended to support both use cases. Please let me know whether both use cases are important, and share any further input.

Thanks,
Ee Peng 

-----Original Message-----
From: Richard Purdie [mailto:richard.purdie at linuxfoundation.org] 
Sent: Friday, January 25, 2019 11:44 PM
To: Yeoh, Ee Peng <ee.peng.yeoh at intel.com>; openembedded-core at lists.openembedded.org
Cc: Eggleton, Paul <paul.eggleton at intel.com>; Burton, Ross <ross.burton at intel.com>
Subject: Re: [OE-core] [PATCH 1/2 v5] resultstool: enable merge, store, report and regression analysis

On Tue, 2019-01-22 at 17:42 +0800, Yeoh Ee Peng wrote:
> OEQA outputs test results into json files and these files were
> archived by Autobuilder during QA releases. Example: each oe-selftest
> run by Autobuilder for a different host distro generates a
> testresults.json file.
> 
> These scripts were developed as test result tools to manage these
> testresults.json files.
> 
> Using the "store" operation, the user can store multiple testresults.json
> files as well as the pre-configured directories used to hold those
> files.
> 
> Using the "merge" operation, the user can merge multiple testresults.json
> files into a target file.
> 
> Using the "report" operation, the user can view the test result summary
> for all available testresults.json files inside an ordinary directory
> or a git repository.
> 
> Using the "regression" operation, the user can perform regression analysis
> on the specified testresults.json files.
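To make the "merge" semantics concrete, here is a minimal Python sketch, assuming each testresults.json is a JSON object keyed by result ID. merge_results is a hypothetical name, not the tool's actual function; in this sketch a result set from a later source replaces an earlier one with the same ID.

```python
import json


def merge_results(target_path, *source_paths):
    """Merge the result sets of several testresults.json files into target_path."""
    merged = {}
    # Load the existing target first so its result sets are kept.
    for path in (target_path, *source_paths):
        try:
            with open(path) as f:
                merged.update(json.load(f))
        except FileNotFoundError:
            # A missing target is fine on the first merge; a missing source is not.
            if path != target_path:
                raise
    with open(target_path, "w") as f:
        json.dump(merged, f, indent=4, sort_keys=True)
    return merged
```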

Thanks Ee Peng, this version is much improved!

As an experiment, I took a local test results file and was able to run:

$ resultstool regression /tmp/repo/testresults.json /tmp/repo/testresults.json -b sdk_core-image-sato_x86_64_qemumips_20181219111311 -t sdk_core-image-sato_x86_64_qemumips_20181219200052
Successfully loaded base test results from: /tmp/repo/testresults.json
Successfully loaded target test results from: /tmp/repo/testresults.json
Getting base test result with result_id=sdk_core-image-sato_x86_64_qemumips_20181219111311
Getting target test result with result_id=sdk_core-image-sato_x86_64_qemumips_20181219200052
============================Start Regression============================
Only print regression if base status not equal target
<test case> : <base status> -> <target status>
========================================================================
assimp.BuildAssimp.test_assimp : ERROR -> PASSED
==============================End Regression==============================

I was able to clearly see that my failing test case went from ERROR to PASSED which is good. The interface and the way the information is being presented and stored are now the things we need to work on.
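The rule printed in the regression output above ("Only print regression if base status not equal target") can be sketched in Python as follows, assuming each result set's "result" entry maps a test case name to a dict with a "status" field. The function name is hypothetical, and reporting a test that is missing from the target as None is an assumption of this sketch; tests present only in the target are ignored here.

```python
def regression(base_result, target_result):
    """Return '<test case> : <base status> -> <target status>' for changed tests."""
    lines = []
    for test in sorted(base_result):
        base_status = base_result[test].get("status")
        # Hypothetical choice: a test absent from the target shows up as None.
        target_status = target_result.get(test, {}).get("status")
        if base_status != target_status:
            lines.append(f"{test} : {base_status} -> {target_status}")
    return lines
```

Given base = {"assimp.BuildAssimp.test_assimp": {"status": "ERROR"}} and the same test marked "PASSED" in the target, it returns ["assimp.BuildAssimp.test_assimp : ERROR -> PASSED"].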

What is odd about the current tool/behaviour is that it sometimes expects files that are stored in a repository and sometimes expects standalone files. It doesn't feel like the workflow and the way the user would interact with this are quite right.

You can see from my test case above that the results I wanted to compare were both in the same repo file, which I'd already merged/stored. The tool can do it if you pass the file twice and specify the IDs, but it's awkward and not obvious.

I'm still a little unsure how we expect to use the tool and whether the layout of the files in the git repository is how we're going to ultimately want to do this.

For example, combining all results into a single json file means we can't really show useful comparison information directly from "git diff". Would we be better off with a file per result ID, for example, with a layout that makes "git diff" useful? A directory structure like:

selftest/<HOST_DISTRO>/
runtime/<DISTRO>/<MACHINE>
sdk/<MACHINE>/<SDKMACHINE>
esdk/<MACHINE>/<SDKMACHINE>
build-perf/<HOST_DISTRO>/

for example? I don't have the "right" answer here as I've not experimented with this, but it's something we need to think about.
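As a rough illustration, the directory structure suggested above could be derived from each result set's configuration, as in this Python sketch. The LAYOUT table, result_path and the fixed "testresults.json" file name are all hypothetical. Keeping a stable file name per configuration directory means successive commits touch the same path, so "git diff" would show status changes in place rather than as added and deleted files.

```python
import posixpath

# Hypothetical mapping from a test type to the configuration keys used as
# sub-directories, following the layout proposed above.
LAYOUT = {
    "selftest": ("HOST_DISTRO",),
    "runtime": ("DISTRO", "MACHINE"),
    "sdk": ("MACHINE", "SDKMACHINE"),
    "esdk": ("MACHINE", "SDKMACHINE"),
    "build-perf": ("HOST_DISTRO",),
}


def result_path(test_type, configuration, filename="testresults.json"):
    """Return the repository path where a result set of this type would live."""
    parts = [test_type] + [configuration[key] for key in LAYOUT[test_type]]
    return posixpath.join(*parts, filename)
```

For example, result_path("runtime", {"DISTRO": "poky", "MACHINE": "qemux86"}) gives "runtime/poky/qemux86/testresults.json".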

As another example, the regression command can't easily compare the file from two different git commits; you have to copy one file out of the git repo and then check out the other version to compare them. This isn't easy for the user.

In summary, I think the base underlying tools here are heading the right way but the layout of the git repo and the way people are expected to use and interact with it needs a little bit more work...

Cheers,

Richard