TL;DL

  • Having a solid library foundation actually matters.
  • We have a long way to go, but there is hope!
  • This is an exciting area to get involved in and make things better.

Motivations

  • Improving the quality of "Linux on the Desktop".
  • Laying the groundwork for a native mobile ecosystem.
  • Becoming competitive in "hard" realtime/embedded.

Concept of Library-Safe Code

  • Analogous to concepts like thread-safe and async-signal-safe.
  • Formal definition of library-safe is hard.
  • The basic idea: We want library-safe code to be usable by:
    • an arbitrary application, OR
    • another library
    without placing unreasonable restrictions on the application in which the library code is being used.
  • Code which is not thread-safe cannot be library-safe, but library-safe is a much stronger requirement.

Common issues that make code non-Library-Safe

The offenses and offenders

Common Issues that make code non-LS

Lack of thread-safety

  • Race conditions
  • Use of non-thread-safe libc functions
  • Unsynchronized access to global data
    • Special-case: library initialization functions

Common Issues that make code non-LS

Library initialization functions

The general pattern:

void mylib_init()
{
    static int initialized = 0;
    if (initialized) return;
    initialized = 1;
    /* do stuff */
}

Major offenders:

  • FFmpeg/libav libraries (libavcodec, etc.)
  • libxml2
    • Almost 300 other libs on Debian depend on it!
    • It tries to be thread-safe but botched init.
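
The general pattern shown above is racy: two threads can both see initialized == 0 and run the setup twice, or one can return early while the other is still halfway through it. A minimal sketch of one race-free alternative, assuming POSIX threads are available, uses pthread_once. The names mylib_once, mylib_do_init and mylib_times_initialized are hypothetical and exist only for illustration.

```c
#include <pthread.h>

/* Hypothetical library state, for illustration only. */
static pthread_once_t mylib_once = PTHREAD_ONCE_INIT;
static int mylib_init_count = 0;

/* Runs exactly once, no matter how many threads call mylib_init(). */
static void mylib_do_init(void)
{
    mylib_init_count++;   /* do stuff */
}

void mylib_init(void)
{
    /* pthread_once makes concurrent callers wait until mylib_do_init
       has finished, so nobody sees a half-initialized library. */
    pthread_once(&mylib_once, mylib_do_init);
}

/* Exposed only so the sketch can be checked. */
int mylib_times_initialized(void)
{
    return mylib_init_count;
}
```

Calling mylib_init() repeatedly, from any number of threads, runs the setup once. A design that needs no global initialization at all avoids the problem entirely, where that is possible.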

Common Issues that make code non-LS

Lack of per-caller context

  • The basic issue:
    • Even in a single-threaded program, there may be more than one program component (or more than one other library) using your library.
  • Typical offenses:
    • Arguments or state in global variables
    • Library configuration in global variables, especially hooks:
      • Error handler functions
      • Logging functions
  • Major offenders:
    • FFmpeg/libav libraries: av_log handlers
    • glib: g_mem_set_vtable, which happens to be the only way to inhibit abort on malloc failure in glib.
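
One way to avoid global hooks is to carry them in a context object that every caller creates for itself. The following is a minimal sketch under that assumption; struct mylib_ctx, mylib_ctx_init, mylib_parse and count_messages are hypothetical names, not the API of any library mentioned here.

```c
#include <stddef.h>

/* Hypothetical API: all names here are illustrative. */
typedef void (*mylib_log_fn)(void *user, const char *msg);

struct mylib_ctx {
    mylib_log_fn log;   /* per-context hook, not a global */
    void *user;         /* opaque pointer handed back to the hook */
};

void mylib_ctx_init(struct mylib_ctx *ctx, mylib_log_fn log, void *user)
{
    ctx->log = log;
    ctx->user = user;
}

/* Every entry point takes the context it should report through. */
int mylib_parse(struct mylib_ctx *ctx, const char *input)
{
    if (input == NULL || *input == '\0') {
        if (ctx->log)
            ctx->log(ctx->user, "empty input");
        return -1;
    }
    return 0;
}

/* Example hook: counts messages in an int supplied by the caller. */
void count_messages(void *user, const char *msg)
{
    (void)msg;
    ++*(int *)user;
}
```

Two components that each own a context can install different hooks without affecting one another, which a single global handler cannot offer.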

Common Issues that make code non-LS

Continued

  • Leaking file descriptors on exec:
    • Failure to set close-on-exec flag
    • Or, setting it non-atomically
    • Major offender: libgcrypt (from GnuPG)
  • Clobbering of application's global state, for example:
    • Locale
      • Major offender: GTK+
    • Signal handlers
    • PRNG state
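
On the file descriptor point: setting the flag non-atomically means calling open() and then fcntl(fd, F_SETFD, FD_CLOEXEC). Between the two calls another thread can fork and exec, and the child inherits the descriptor. A minimal sketch of the atomic form, assuming a system that provides O_CLOEXEC (POSIX.1-2008); the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L   /* request O_CLOEXEC under strict -std modes */
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open a file for the library's private use.
   O_CLOEXEC sets close-on-exec as part of the open itself, so there
   is no window in which the descriptor can leak into a child. */
int mylib_open_private(const char *path)
{
    return open(path, O_RDONLY | O_CLOEXEC);
}

/* Returns 1 if fd has close-on-exec set, 0 if not, -1 on error. */
int mylib_fd_is_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);
    if (flags == -1)
        return -1;
    return (flags & FD_CLOEXEC) != 0;
}
```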

Common Issues that make code non-LS

Aborting or crashing the caller

  • Typical causes:
    • Breaking interface contracts
    • Unbounded stack usage due to alloca or variable-length arrays.
    • Failure to check return value of functions that can fail.
    • Intentionally aborting when another function fails, rather than reporting failure to the caller.
  • Major offenders:
    • GNU libc (glibc)
    • glib
    • GNU multi-precision library (gmp)

Prevailing attitudes towards OOM

  • The Myth: malloc never returns NULL.
  • Based on a naive understanding of overcommit
  • Common library behaviors:
    • Not checking the return value of malloc at all.
    • Calling abort() when malloc fails.

But malloc can fail

  • On a 32-bit system, when the process has already consumed 3 GB of address space.
  • When resource-usage policies (like ulimit) are in effect.
  • When you pass a huge argument to malloc.
  • When overcommit is turned off and physical resources (RAM+swap) are exhausted.
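
A minimal sketch of the alternative to both common behaviors: check the result of malloc and hand the failure back to the caller. The struct mylib_buf type and its functions are hypothetical, written only to illustrate the point.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical buffer type, for illustration only. */
struct mylib_buf {
    unsigned char *data;
    size_t len;
};

/* Returns 0 on success, -1 if the memory could not be obtained.
   The caller decides what to do about the failure; the library
   neither dereferences NULL nor calls abort(). */
int mylib_buf_init(struct mylib_buf *b, size_t len)
{
    b->data = malloc(len ? len : 1);   /* avoid the malloc(0) ambiguity */
    if (b->data == NULL) {
        b->len = 0;
        return -1;
    }
    b->len = len;
    return 0;
}

void mylib_buf_free(struct mylib_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

A request such as mylib_buf_init(&b, SIZE_MAX) is an instance of the "huge argument" case: it cannot be satisfied, and the function reports that instead of crashing.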

Why library OOM handling matters

  • Software which must survive OOM
    • Core system components
    • Hard embedded/realtime
  • Software whose data must survive OOM
  • Software that deals with abusively large inputs
  • Software that runs under constrained resource limits
    • Traditional ulimit
    • Linux Containers
    • Laying the groundwork for a native mobile ecosystem: resource limits, virtualization, and sandboxing are key.

Obstacles to Good Library Code

  • Getting Stuff Done™
  • Cargo-culting
  • Weakest-link concept
  • Hopelessness

glibc

A weak foundation for the library stack

glibc: Dynamic thread-local storage OOM

elf/dl-tls.c:

static void
__attribute__ ((__noreturn__))
oom (void)
{
  _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
}
                  if (dtv == GL(dl_initial_dtv))
                    {
                      /* Comment snipped */
                      newp = malloc ((2 + newsize) * sizeof (dtv_t));
                      if (newp == NULL)
                        oom ();
                      memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
                    }
                  else
                    {
                      newp = realloc (&dtv[-1],
                                      (2 + newsize) * sizeof (dtv_t));
                      if (newp == NULL)
                        oom ();
                    }

glibc: abort in pthread_cancel

nptl/sysdeps/pthread/unwind-forcedunwind.c:

  handle = __libc_dlopen (LIBGCC_S_SO);

  if (handle == NULL
      || (resume = __libc_dlsym (handle, "_Unwind_Resume")) == NULL
      || (personality = __libc_dlsym (handle, "__gcc_personality_v0")) == NULL
      || (forcedunwind = __libc_dlsym (handle, "_Unwind_ForcedUnwind"))
         == NULL
      || (getcfa = __libc_dlsym (handle, "_Unwind_GetCFA")) == NULL
#ifdef ARCH_CANCEL_INIT
      || ARCH_CANCEL_INIT (handle)
#endif
      )
    __libc_fatal (LIBGCC_S_SO " must be installed for pthread_cancel to work\n");

glibc: Thread cancellability race conditions

Open bug #12683

Example: sysdeps/unix/sysv/linux/readv.c:

ssize_t
__libc_readv (fd, vector, count)
     int fd;
     const struct iovec *vector;
     int count;
{
  ssize_t result;

  if (SINGLE_THREAD_P)
    result = INLINE_SYSCALL (readv, 3, fd, vector, count);
  else
    {
      int oldtype = LIBC_CANCEL_ASYNC ();

      result = INLINE_SYSCALL (readv, 3, fd, vector, count);

      LIBC_CANCEL_RESET (oldtype);
    }

glibc: Overflows in wprintf

Open bug #14286

Examples:

  • wprintf(L"%d\n", wprintf(L"%s", hello_1gb_plus_1)); // gb = 1<<30
    • Output: h1
  • wprintf(L"%d\n", wprintf(L"%s", hello_1gb_plus_5));
    • Output: hello5

The cause?

stdio-common/vfprintf.c:

            /* Allocate dynamically an array which definitely is long         \
               enough for the wide character version.  Each byte in the       \
               multi-byte string can produce at most one wide character.  */  \
            if (__libc_use_alloca (len * sizeof (wchar_t)))                   \
              string = (CHAR_T *) alloca (len * sizeof (wchar_t));            \
            else if ((string = (CHAR_T *) malloc (len * sizeof (wchar_t)))    \
                     == NULL)                                                 \
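
The outputs h1 and hello5 are consistent with the size computation len * sizeof (wchar_t) wrapping around. Assuming a 32-bit size_t and a 4-byte wchar_t (an assumption, not stated on the slide), (2^30 + 1) * 4 wraps to 4 bytes, room for one wide character, and (2^30 + 5) * 4 wraps to 20 bytes, room for five. The sketch below models that arithmetic with uint32_t and shows an overflow check; the function names are hypothetical and this is not glibc code.

```c
#include <stdint.h>

/* Models `len * sizeof (wchar_t)` on a target with a 32-bit size_t
   and a 4-byte wchar_t (assumed here for illustration). */
uint32_t wide_buf_bytes_32(uint32_t len)
{
    return len * UINT32_C(4);   /* unsigned arithmetic wraps modulo 2^32 */
}

/* Overflow-checked variant: returns 0 and stores the byte count,
   or returns -1 if the multiplication would wrap. */
int wide_buf_bytes_checked(uint32_t len, uint32_t *bytes)
{
    if (len > UINT32_MAX / 4)
        return -1;
    *bytes = len * UINT32_C(4);
    return 0;
}
```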

glibc: Unchecked alloca

Some currently-open bugs:

  • Bug 15670 - Unchecked alloca in __tzfile_read
  • Bug 14752 - Unsafe use of alloca in shm_open
  • Bug 14547 - strcoll integer / buffer overflow, Comment 4
  • Bug 14806 - stack overflow in getaddrinfo() when host has many addresses
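
One common way to avoid unchecked alloca is a small fixed-size stack buffer with a checked malloc fallback for larger inputs. The following is a minimal sketch of that technique, not a description of how any of the bugs above were or will be fixed; MYLIB_STACK_ELEMS and mylib_median are hypothetical.

```c
#include <stdlib.h>
#include <string.h>

#define MYLIB_STACK_ELEMS 64   /* hypothetical bound on stack scratch space */

static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Stores the median of v[0..n) in *out. Returns 0 on success, -1 on
   empty input or allocation failure. The scratch space this function
   puts on the stack is bounded no matter how large n is. */
int mylib_median(const int *v, size_t n, int *out)
{
    int small[MYLIB_STACK_ELEMS];
    int *tmp = small;

    if (n == 0)
        return -1;
    if (n > MYLIB_STACK_ELEMS) {
        if (n > (size_t)-1 / sizeof *tmp)
            return -1;                  /* n * sizeof *tmp would overflow */
        tmp = malloc(n * sizeof *tmp);
        if (tmp == NULL)
            return -1;                  /* report, do not abort */
    }
    memcpy(tmp, v, n * sizeof *tmp);
    qsort(tmp, n, sizeof *tmp, cmp_int);
    *out = tmp[n / 2];                  /* upper median for even n */
    if (tmp != small)
        free(tmp);
    return 0;
}
```

Small inputs never touch the heap, and large inputs fail cleanly instead of overflowing the stack.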

glibc: On the bright side...

  • Proposals for fixing all alloca usage in glibc are in the works.
  • New glibc maintainership is much more:
    • Friendly
    • Easy to work with
    • Willing to fix long-standing bugs

musl

Modern Userspace Standard Library

What is musl?

  • A new general-purpose implementation of the standard library (libc)
    • Math library (libm)
    • POSIX threads (libpthread)
    • Dynamic linker (ldso)
  • MIT licensed
  • Supports static and dynamic linking
  • Consistent quality and behavior from tiny embedded systems to full-fledged servers
    • Less than 300 lines of mandatory arch-specific asm
  • Can be used in place of:
    • GNU libc (glibc) on servers and desktop systems
    • Bionic (Android) on mobile devices
    • uClibc in embedded systems

musl as a foundation for a robust library stack

  • Low-memory / resource-exhaustion conditions never fatal
  • No dynamic allocation in functions which otherwise could not fail
  • No unrecoverable late failures
    • No lazy bindings
    • No lazy TLS allocation
  • Designed for realtime-quality robustness
  • New, race-free thread cancellation design
  • Atomic in-place shared libc upgrades
  • Compatible with most existing libraries and applications
    • Even many ugly non-portable ones written for glibc

What musl can't do

  • Fix the rest of the library stack.

But...

  • Maybe our community can!

What musl can do

  • Provide a solid foundation
    • High-reliability embedded
    • New mobile platforms
  • Provide an incentive to write solid libraries
    • Now there's at least one Linux-based platform where third-party libraries are the weakest link in the stack.
  • Provide examples of library-safe code and techniques

A long way to go

Getting involved

  • With the musl community
  • Fixing issues in glibc
    • Current maintainers and community are eager to fix bugs, but lack resources.
  • Reporting and fixing bugs in other libraries.

Diplomacy matters!

  • Nobody wants to be told their software is "broken", "sucks", etc.
  • Nobody likes having their language skills policed.
    • "Library-safe" should be about building a social contract for better software, not policing people's coding.
  • Try to build allies rather than just wearing down people's resistance.
    • To "win", we need allies anyway.

Further practical tips for diplomacy

  • A bug report with a patch is always better.
  • Go out of your way to make fixing the bug a win-win situation.
  • Have an example of code you've written or plan to write that's affected by the library issue you're reporting.

Thank you

www.musl-libc.org