[OE-core] [PATCH 1/1] glibc: Allow 64 bit atomics for x86

Fri Oct 30 16:02:22 UTC 2015

So I put my insomnia to a good use and I think I can explain everything.
The offending code is in nptl/sem_wait.c:

#if SHLIB_COMPAT (libpthread, GLIBC_2_0, GLIBC_2_1)
int
attribute_compat_text_section
__old_sem_wait (sem_t *sem)
{
  int *futex = (int *) sem;
  int err;

  do
    {
      if (atomic_decrement_if_positive (futex) > 0)
	return 0;

      /* Enable asynchronous cancellation.  Required by the standard.  */
      int oldtype = __pthread_enable_asynccancel ();

      /* Always assume the semaphore is shared.  */
      err = lll_futex_wait (futex, 0, LLL_SHARED);

      /* Disable asynchronous cancellation.  */
      __pthread_disable_asynccancel (oldtype);
    }
  while (err == 0 || err == -EWOULDBLOCK);

  __set_errno (-err);
  return -1;
}

This is where we were looping forever, since the value in futex was -2.
So "atomic_decrement_if_positive" was not executed, never reaching "return 0".
The reason why the value was -2 is because struct_new_sem has changed from glibc 2.20 to 2.221:

/* Semaphore variable structure.  */
struct new_sem
{
#if __HAVE_64B_ATOMICS
  /* The data field holds both value (in the least-significant 32 bytes) and
     nwaiters.  */
# if __BYTE_ORDER == __LITTLE_ENDIAN
#  define SEM_VALUE_OFFSET 0
# elif __BYTE_ORDER == __BIG_ENDIAN
#  define SEM_VALUE_OFFSET 1
# else
# error Unsupported byte order.
# endif
# define SEM_NWAITERS_SHIFT 32
# define SEM_VALUE_MASK (~(unsigned int)0)
  uint64_t data;
  int private;
  int pad;
#else
# define SEM_VALUE_SHIFT 1
# define SEM_NWAITERS_MASK ((unsigned int)1)
  unsigned int value;
  unsigned int nwaiters;
  int private;
  int pad;
#endif
};

In particular for __HAVE_64B_ATOMICS == 0 the field "value" has a different meaning now:
top 31 bits correspond to the value (shifted left by 1) and the bit 0 indicates if there are any nwaiters.
(I guess the reasoning was semaphore value cannot be negative and so it only needs 31 bits
and we can use the extra bit we saved for something else) 
So when we initialize the semaphore value with SEM_VALUE_MAX   (2147483647) we actually store 
0x7FFFFFFF << 1 which is 0xFFFFFFFE which is  (-2). 
(bitbake/pyhon/multiprocessing initialize some semaphores to SEM_VALUE_MAX)    
So the reason forcing  __HAVE_64B_ATOMICS to 1 works for x32 has not much to do with the 
64bit atomics as such, but because the "value" filed is defined differently, as a 32bit value.
Long story short,  the reason this is broken is that the "old code" uses new structures. 
(to fix this   "atomic_decrement_if_positive"  needs to be replaced by a call that understands
the new layout, similar to the "new code")

I will report  this to glibc developers and wait for their reply.

Juro

> -----Original Message-----
> From: Richard Purdie [mailto:richard.purdie at linuxfoundation.org]
> Sent: Friday, October 30, 2015 12:13 AM
> To: Randy Witt
> Cc: Bystricky, Juro; jurobystricky at hotmail.com; openembedded-
> core at lists.openembedded.org
> Subject: Re: [OE-core] [PATCH 1/1] glibc: Allow 64 bit atomics for x86
> 
> On Thu, 2015-10-29 at 15:29 -0700, Randy Witt wrote:
> >
> >
> > On Thu, Oct 29, 2015 at 3:20 PM, Richard Purdie
> > <richard.purdie at linuxfoundation.org> wrote:
> >         On Thu, 2015-10-29 at 13:43 -0700, Juro Bystricky wrote:
> >         > This patch fixes [YOCTO#8140].
> >         >
> >         > The fix consist of allowing 64bit atomic ops for x86.
> >         > This should be safe for i586 and newer CPUs.
> >         > It also makes the synchronization more efficient.
> >         >
> >         > Signed-off-by: Juro Bystricky <juro.bystricky at intel.com>
> >         > ---
> >         >  .../glibc/glibc/use_64bit_atomics.patch            | 24
> >         ++++++++++++++++++++++
> >         >  meta/recipes-core/glibc/glibc_2.22.bb              |  1 +
> >         >  2 files changed, 25 insertions(+)
> >         >  create mode 100644
> >         meta/recipes-core/glibc/glibc/use_64bit_atomics.patch
> >
> >         Since the patch is only changing nativesdk 32 bit x86 and we
> >         know that
> >         the 32 bit SDK is pretty broken at the moment I've merged this
> >         on the
> >         basis that it can't really make it any worse. There is
> >         pressure to move
> >         to the next rc candidate for 2.0.
> >
> >         I would like to get to the bottom of the real issue here and I
> >         still
> >         suspect 32 bit x86 images are likely broken too :/.
> >
> >
> > With Juro's change I built core-image-minimal using a 32-bit
> > buildtools on a 32-bit host(vm) and that appears to boot fine using
> > runqemu.
> 
> Right, but can we run bitbake in a qemux86 image (on target) or does it show
> the same issue as bug 8140? If it shows the same issue, futexes are likely
> bust on 32 bit x86 targets too.
> 
> Cheers,
> 
> Richard
> 
> 
> 
>