[OE-core] [PATCH 1/2] systemd: test target boot time against unit timeout
Benjamin Esquivel
benjamin.esquivel at linux.intel.com
Fri Oct 23 15:57:21 UTC 2015
This reminds me I haven't been able to parse your logs to check your
data. If what you say is true, that the failure is presented when the
system (kernel + userspace) booted in less than 90 secs, then I can
make 2 observations:
1. your data and Randy's diverge, Randy's data has every systemd
failure tied to a timeout message and a boot time greater than the
unit's timeout. I would happily investigate further, if you send me
your logs.
2. my test would not be really as effective as I think to catch the
condition.
Things to take into account:
A. I'm using Randy's data to compare against yours because you both
used the same method. I have my data that I obtained with a different
method that supports my point too.
B. (Randy/Brian)'s method and Benjamin's method threw data that they 3
assume is the condition that made the autobuilder's fail, while I don't
say they are wrong, this is not data coming out of the actual failures
of the autobuilder. Having that said, I would not disprove your
nice/ionice change nor adding this test to check if it can be a single
trap for the systemd errors which right now have too many faces.
Benjamin
On Thu, 2015-10-22 at 23:10 +0000, Avery, Brian wrote:
> The problem with this is one of the runs I did had a lot of systemd
> timeout failures but the stats were like:
>
> Fails:
> min startup time: 54 sec
> avg startup time: 114 sec
> max startup time: 260 sec
>
> Successes:
> min startup time: 46 sec
> avg startup time: 110 sec
> max startup time: 266 sec
>
> so I¹m not sure that the time is a good indicator.
>
> These numbers are pulled from the jounalctl line in the
> ssh_target_log
> file that looks like:
>
> Oct 21 11:19:40 qemux86 systemd[1]: Startup finished in 42.255s
> (kernel) +
> 3min 10.075s (userspace) = 3min 52.331s.
>
>
> -b
>
>
>
> On 10/22/15, 3:45 PM, "Benjamin Esquivel"
> <benjamin.esquivel at linux.intel.com> wrote:
>
> > the new test:
> > oeqa.runtime.systemd.SystemdJournalTests.test_systemd_boot_time
> > will query the target boot time from journactl and will
> > compare it against the default systemd's unit start timeout:
> > (TimeoutStartSec)
> >
> > if the boot time is greater than the timeout, it will fail
> >
> > the reason is because once the boot time exceeds the unit start
> > timeout then it is uncertain that the systemd units are able to
> > properly come up and function
> >
> > when this test fails, the rest of the systemd units (services)
> > should be
> > skipped
> >
> > this test also prints the startup time in the test log like:
> > ...
> > test_systemd_boot_time (oeqa.runtime.systemd.SystemdJournalTests)
> > ...
> > Startup finished in 6.922s (kernel) + 52.089s (userspace) =
> > 59.011s.
> > ...
> >
> > [YOCTO#8141]
> >
> > Signed-off-by: Benjamin Esquivel <benjamin.esquivel at linux.intel.com
> > >
> > ---
> > meta/lib/oeqa/runtime/systemd.py | 73
> > ++++++++++++++++++++++++++++++++++++++++
> > 1 file changed, 73 insertions(+)
> >
> > diff --git a/meta/lib/oeqa/runtime/systemd.py
> > b/meta/lib/oeqa/runtime/systemd.py
> > index c74394c..251d06e 100644
> > --- a/meta/lib/oeqa/runtime/systemd.py
> > +++ b/meta/lib/oeqa/runtime/systemd.py
> > @@ -21,6 +21,33 @@ class SystemdTest(oeRuntimeTest):
> > self.assertEqual(status, expected, message)
> > return output
> >
> > + def journalctl(self, args='',l_match_units=[]):
> > + """
> > + Request for the journalctl output to the current target
> > system
> > +
> > + Arguments:
> > + -args, an optional argument pass through argument
> > + -l_match_units, an optional list of units to filter the
> > output
> > + Returns:
> > + -string output of the journalctl command
> > + Raises:
> > + -AssertionError, on remote commands that fail
> > + -ValueError, on a journalctl call with filtering by
> > l_match_units that
> > + returned no entries
> > + """
> > + query_units=""
> > + if len(l_match_units):
> > + query_units = ['_SYSTEMD_UNIT='+unit for unit in
> > l_match_units]
> > + query_units = " ".join(query_units)
> > + command = 'journalctl %s %s' %(args, query_units)
> > + status, output = self.target.run(command)
> > + if status:
> > + raise AssertionError("Command '%s' returned non-zero
> > exit \
> > + code %d:\n%s" % (command, status, output))
> > + if len(output) == 1 and "-- No entries --" in output:
> > + raise ValueError("List of units to match: %s, returned
> > no
> > entries"
> > + % l_match_units)
> > + return output
> >
> > class SystemdBasicTests(SystemdTest):
> >
> > @@ -99,3 +126,49 @@ class SystemdJournalTests(SystemdTest):
> > def test_systemd_journal(self):
> > (status, output) = self.target.run('journalctl')
> > self.assertEqual(status, 0, output)
> > +
> > + @skipUnlessPassed('test_systemd_basic')
> > + def test_systemd_boot_time(self, systemd_TimeoutStartSec=90):
> > + """
> > + Compare the target boot time from journalctl against
> > TimeoutStartSec
> > +
> > + Arguments:
> > + -systemd_TimeoutStartSec, an optional argument containing
> > systemd's
> > + unit start timeout
> > + """
> > +
> > + expr_items=["Startup finished","kernel",
> > "userspace","\.$"]
> > + try:
> > + output = self.journalctl(args="-o cat --reverse")
> > + except AssertionError:
> > + self.fail("Error occurred while calling journalctl")
> > + if not len(output):
> > + self.fail("Error: unable to obtain the startup time
> > from\
> > + systemd journal")
> > +
> > + # check for the regular expression items that match the
> > startup
> > time
> > + for line in output.split('\n'):
> > + check_match =
> > "".join(re.findall(".*".join(expr_items),
> > line))
> > + if check_match: break
> > + # put the startup time in the test log
> > + if check_match:
> > + print "\n%s" % check_match
> > + else:
> > + self.fail("Error while obtaining the boot time from
> > journalctl")
> > + boot_time_sec = 0
> > +
> > + # get the numeric values from the string and convert them
> > to
> > seconds
> > + # same data will be placed in list and string for
> > manipulation
> > + l_boot_time = check_match.split(" ")[-2:]
> > + s_boot_time = " ".join(l_boot_time)
> > + # Obtain the minutes it took to boot
> > + if l_boot_time[0].endswith('min') and
> > l_boot_time[0][0].isdigit():
> > + boot_time_min = s_boot_time.split("min")[0]
> > + # convert to seconds and accumulate it
> > + boot_time_sec += int(boot_time_min) * 60
> > + # Obtain the seconds it took to boot and accumulate
> > + boot_time_sec += float(l_boot_time[1].split("s")[0])
> > + #Assert the target boot time against systemd's unit start
> > timeout
> > + msg = "Target boot time %s exceeds systemd's
> > TimeoutStartSec %s"\
> > + %(boot_time_sec, systemd_TimeoutStartSec)
> > + self.assertTrue(boot_time_sec < systemd_TimeoutStartSec,
> > msg =
> > msg)
More information about the Openembedded-core
mailing list