UpStare is a dynamic software updating system for multi-threaded applications that applies updates atomically and with bounded delay. It discovered a new updating model called whole-program updating — and a new programming abstraction. UpStare applies live updates no other system can.

Live updates are the only way to keep some software continually running. The browser you are using to read this page and the server that prepared it will always need a live update. The alternative of waiting for software to restart happens often enough to cost tens of thousands of dollars per hour of downtime.

What makes UpStare different is that it updates live all the code and data of a program, not just parts of it. In particular, UpStare updates functions on the stack. You need this ability to update common multi-threaded server applications that can't be updated otherwise. If you were to update only parts of a program, without updating the stack, the update could take forever to finish safely and that practically means downtime. [1]

Although the approach is general for use in other programming languages, UpStare updates applications written in C. The updates are applied in three steps. First UpStare changes an application's source code automatically to make it updateable. Then it compiles the changed source code with an existing compiler like gcc to create the executable file the user runs. Finally, when a new version of the source code becomes available, UpStare generates automatically an update patch to the next version. This update patch can be changed by the user where necessary, and is applied easily on the live application at any time.

Whole-program updating in UpStare is implemented with a new approach called stack reconstruction. [2] This approach tears apart and restitches an application into a new version piece by piece. While this happens the application doesn't exit, and network connections stay open. All functions on the stack are updated at the same time, which guarantees all active functions are of the same version after an update.

It wasn't as obvious in the beginning but stack reconstruction is a new programming abstraction. It's a way to do easily something useful and hard in a programming language, similar to recursion or garbage collection.

Specifically, stack reconstruction is a way to modify continuations: representations of the runtime state of a program. Modifying continuations is useful because you need to update the stack to apply some kinds of updates and the stack is part of the continuation. The stack also proved to be the hardest part to update without stopping a program that's running. It's more obvious now continuation updates help programs continue to run. [3]

Stack reconstruction is missing from programming languages. Existing languages don't modify continuations before they restore them. They don't make continuation internals accessible to be modified at such a fine grain. More generally, today's programming languages don't update whole programs live.

Though powerful, it's hard to tell how important stack reconstruction will turn out to be. Powerful programming abstractions like macros and continuations are rare. Most of them were discovered more than forty years ago. They also tend to stretch the way average programmers think to such a degree that the most powerful abstractions are used scarcely. It takes good programmers to use them. If an abstraction surfaced now that was used by good programmers it wouldn't just be powerful; it'd be a big deal.

Dynamic updates UpStare applied successfully include:

    Recursion. Updating an application computing recursively Fibonacci numbers, while nested deep in the stack, to report additional information when the recursion unrolls.

    Network sockets. Updating a server application while serving multiple clients without closing the network socket.

    Multi-threaded applications. Updating the main function body executed by multiple threads of an application. Also, updating in a producer/consumer multi-threaded application only the consumer threads while the producer threads remained unmodified.

    Multi-nested long-lived loops. Updating in the middle of executing Bubblesort, a multi-nested long-lived loop, to continue executing from the middle of a different multi-nested long-lived loop implementing Selectionsort while reusing the existing program state. Additionally, updating from Bubblesort to Heapsort, which is a drastically different sorting algorithm executing over different loop iterators.

    vsFTPd. Applying 13 updates spanning 5.5 years of the multi-process (forked processes do not communicate with each other) vsFTPd server (about 12,000 lines of code).

    PostgreSQL. Updating the multi-process (forked processes communicate with each other) PostgreSQL 7.4.16 database server (more than 200,000 lines of code).

The software is available here and there is a user manual [HTML single] [HTML multiple] [PDF]. There are also papers:

    Whole-Program Dynamic Software Updating, Kristis Makris, Ph.D Dissertation, December 2009. [PDF] [BibTex] 

    Dynamic Software Updates: The State Mapping Problem. Rida A. Bazzi, Kristis Makris, Peyman Nayeri, Jun Shen, The 2nd ACM Workshop on Hot Topics in Software Upgrades (HotSWUp '09), October 2009. [PDF] [BibTex] [presentation slides] 

    Immediate Multi-Threaded Dynamic Software Updates Using Stack Reconstruction. Kristis Makris, Rida A. Bazzi, USENIX 2009, June 2009. [PDF] [BibTex] [presentation slides]

This work led in-part to an NSF grant through DUOSS at ASU, and was supported in-part by the grant. [4]


[1]  With a partial update both the new and the old code must offer the guarantee of safety, and this is harder to do. It's not enough if new code just updates a program's current state to the new version. Old code has to be careful too: old code shouldn't process the new state incorrectly. So an update has to wait until no old code is waiting to run on the stack that could break the new state.

Waiting works in many cases, but the wait duration is unbound. Old code could wait a lot until a long-lived computational loop finishes. Or wait a lot until a blocking system call returns; if old code blocks on an accept() or read() it may never return. In both cases this practically means downtime. The chances of a safe, partial program update are reduced further at high concurrency with many threads.

UpStare on the other hand doesn't need to wait. It breaks out of loops and blocking system calls at any time. And it updates functions and data active on the stack at any time. This lets UpStare apply updates not just immediately but also as an atomic transaction: before an update only old code is running, after the update only new code is running, and at no time do old and new code both run at the same time.

So updates with UpStare are atomic and with bounded delay.

[2]  The earliest form of stack reconstruction was applied for efficient large-scale process-oriented parallel simulations by Kalyan S. Perumalla and Richard M. Fujimoto in 1998.

[3]  If this were a Lisp, and without having access to Scheme's call/cc control operator, we could get the same advantages as stack reconstruction by writing a code walker that transforms programs not written in continuation-passing style to capture and modify their continuations. Which is what UpStare does.

[4]  This material is based upon work supported by the National Science Foundation under Grant Number CSR-0849980. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.