[OE-core] [PATCH 1/1] useradd_base.bbclass: sleep more and more seconds (up to 10)

Robert Yang liezhi.yang at windriver.com
Thu Apr 3 09:59:36 UTC 2014


Currently, we sleep 1 second when adding the user fails. This may not be
enough when we use the sstate cache: as my test below shows, nearly all
the useradd actions run within the same minute when populating from the
sstate cache, and they fail when the load is high. I got these times by
adding strace in front of the useradd command for debugging:

2014-03-31 14:48:22.978079781 +0800 /tmp/log/pulseaudio.4.c
2014-03-31 14:48:22.028079813 +0800 /tmp/log/pulseaudio.1.c
2014-03-31 14:48:21.949079816 +0800 /tmp/log/pulseaudio.3.c
2014-03-31 14:48:20.903079852 +0800 /tmp/log/pulseaudio.2.c
2014-03-31 14:48:20.006079883 +0800 /tmp/log/nfs-utils.9.c
2014-03-31 14:48:18.876079923 +0800 /tmp/log/xuser-account.9.c
2014-03-31 14:48:18.824079924 +0800 /tmp/log/pulseaudio.0.c
2014-03-31 14:48:17.826079959 +0800 /tmp/log/xuser-account.8.c
2014-03-31 14:48:17.766079961 +0800 /tmp/log/nfs-utils.8.c
2014-03-31 14:48:16.794079995 +0800 /tmp/log/xuser-account.7.c
2014-03-31 14:48:16.735079997 +0800 /tmp/log/nfs-utils.7.c
2014-03-31 14:48:14.719080066 +0800 /tmp/log/xuser-account.5.c
2014-03-31 14:48:14.677080068 +0800 /tmp/log/nfs-utils.5.c
2014-03-31 14:48:12.621080139 +0800 /tmp/log/nfs-utils.3.c
2014-03-31 14:48:11.589080175 +0800 /tmp/log/nfs-utils.2.c
2014-03-31 14:48:10.242080221 +0800 /tmp/log/builder.0.c
2014-03-31 14:48:09.523080246 +0800 /tmp/log/nfs-utils.0.c
2014-03-31 14:48:09.488080248 +0800 /tmp/log/openssh.0.c
2014-03-31 14:48:09.485080248 +0800 /tmp/log/rpcbind.1.c
2014-03-31 14:48:07.590080313 +0800 /tmp/log/rpcbind.0.c
2014-03-31 14:28:15.437121590 +0800 /tmp/log/avahi.0.c
2014-03-31 14:18:19.067142238 +0800 /tmp/log/dbus.0.c

nfs-utils and xuser-account failed to add the user.

The useradd command needs two locks, passwd.lock and group.lock. Looking
into these .c files, it may get one lock but fail to get the other.
Sleeping 1 second is not enough; it needs to wait longer. The reasoning
is: if the command succeeds, the longer sleep has no side effects; if it
fails, we need to wait more seconds rather than make the contention
worse.

I tried "sleep 5", but it did not help much, since the tasks would sleep
and wake up at nearly the same time. I also tried
"sleep <RANDOM seconds between 1 and 10>", which did not help much
either.

I think a better way is to sleep for more and more seconds (up to 10
seconds) after each failure. This can't fix the problem that the tasks
may run at the same time, but the logic is: if there is little
contention, a short sleep should be enough; otherwise, sleep longer and
longer.
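
The idea can be sketched outside the class as a small shell helper. This
is only an illustration, not the code in useradd_base.bbclass: the
function name retry_with_backoff and its max_retries argument are
hypothetical, and the upper bound on the sleep comes from the retry
limit passed in.

```shell
#!/bin/sh
# Hypothetical helper (not part of useradd_base.bbclass):
# run a command up to $1 times, sleeping 1, 2, 3, ... seconds
# between failed attempts.
retry_with_backoff () {
	max_retries="$1"
	shift
	count=0
	while [ "$count" -lt "$max_retries" ]; do
		# Stop as soon as the command succeeds.
		if "$@"; then
			return 0
		fi
		count=`expr $count + 1`
		# Sleep longer after each failure, but not after the last one.
		if [ "$count" -lt "$max_retries" ]; then
			sleep "$count"
		fi
	done
	return 1
}
```

For example, "retry_with_backoff 10 some_command" would wait 1 second
after the first failure, 2 after the second, and so on, so tasks that
keep colliding spread out over time instead of retrying every second.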

Here is the test result, which looks much better:
2014-04-03 14:09:56.605185284 +0800 dbus.0.c
2014-04-03 14:09:39.899185862 +0800 rpcbind.5.c
2014-04-03 14:09:38.400185914 +0800 distcc.4.c
2014-04-03 14:09:35.206186025 +0800 pulseaudio.1.c
2014-04-03 14:09:33.979186067 +0800 rpcbind.4.c
2014-04-03 14:09:33.364186089 +0800 pulseaudio.0.c
2014-04-03 14:09:33.360186089 +0800 distcc.3.c
2014-04-03 14:09:30.996186171 +0800 avahi-ui.0.c
2014-04-03 14:09:30.298186195 +0800 distcc.2.c
2014-04-03 14:09:29.905186208 +0800 rpcbind.3.c
2014-04-03 14:09:29.410186226 +0800 avahi-ui.2.c
2014-04-03 14:09:28.239186266 +0800 distcc.1.c
2014-04-03 14:09:27.298186299 +0800 xuser-account.0.c
2014-04-03 14:09:27.032186308 +0800 distcc.0.c
2014-04-03 14:09:26.836186315 +0800 rpcbind.2.c
2014-04-03 14:09:25.846186349 +0800 nfs-utils.1.c
2014-04-03 14:09:25.752186352 +0800 avahi-ui.1.c
2014-04-03 14:09:24.779186386 +0800 builder.0.c
2014-04-03 14:09:24.746186387 +0800 rpcbind.1.c
2014-04-03 14:09:23.916186416 +0800 openssh.1.c
2014-04-03 14:09:23.848186418 +0800 nfs-utils.0.c
2014-04-03 14:09:23.594186427 +0800 rpcbind.0.c
2014-04-03 14:09:22.609186461 +0800 ppp-dialin.0.c
2014-04-03 14:09:21.817186488 +0800 openssh.0.c

[YOCTO #6085]

Signed-off-by: Robert Yang <liezhi.yang at windriver.com>
---
 meta/classes/useradd_base.bbclass | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/meta/classes/useradd_base.bbclass b/meta/classes/useradd_base.bbclass
index 7aafe29..01d2e99 100644
--- a/meta/classes/useradd_base.bbclass
+++ b/meta/classes/useradd_base.bbclass
@@ -24,7 +24,7 @@ perform_groupadd () {
 			group_exists="`grep "^$groupname:" $rootdir/etc/group || true`"
 			if test "x$group_exists" = "x"; then
 				bbwarn "groupadd command did not succeed. Retrying..."
-				sleep 1
+				sleep `expr $count + 1`
 			else
 				break
 			fi
@@ -52,7 +52,7 @@ perform_useradd () {
 		       user_exists="`grep "^$username:" $rootdir/etc/passwd || true`"
 		       if test "x$user_exists" = "x"; then
 			       bbwarn "useradd command did not succeed. Retrying..."
-			       sleep 1
+			       sleep `expr $count + 1`
 		       else
 			       break
 		       fi
@@ -90,7 +90,7 @@ perform_groupmems () {
 			mem_exists="`grep "^$groupname:[^:]*:[^:]*:\([^,]*,\)*$username\(,[^,]*\)*" $rootdir/etc/group || true`"
 			if test "x$mem_exists" = "x"; then
 				bbwarn "groupmems command did not succeed. Retrying..."
-				sleep 1
+				sleep `expr $count + 1`
 			else
 				break
 			fi
@@ -126,7 +126,7 @@ perform_groupdel () {
 			group_exists="`grep "^$groupname:" $rootdir/etc/group || true`"
 			if test "x$group_exists" != "x"; then
 				bbwarn "groupdel command did not succeed. Retrying..."
-				sleep 1
+				sleep `expr $count + 1`
 			else
 				break
 			fi
@@ -154,7 +154,7 @@ perform_userdel () {
 		       user_exists="`grep "^$username:" $rootdir/etc/passwd || true`"
 		       if test "x$user_exists" != "x"; then
 			       bbwarn "userdel command did not succeed. Retrying..."
-			       sleep 1
+			       sleep `expr $count + 1`
 		       else
 			       break
 		       fi
@@ -184,7 +184,7 @@ perform_groupmod () {
 			eval $PSEUDO groupmod $opts
 			if test $? != 0; then
 				bbwarn "groupmod command did not succeed. Retrying..."
-				sleep 1
+				sleep `expr $count + 1`
 			else
 				break
 			fi
@@ -214,7 +214,7 @@ perform_usermod () {
 		       eval $PSEUDO usermod $opts
 		       if test $? != 0; then
 			       bbwarn "usermod command did not succeed. Retrying..."
-			       sleep 1
+			       sleep `expr $count + 1`
 		       else
 			       break
 		       fi
-- 
1.8.3.1



