back




UpStare is a dynamic software updating system for multi-threaded applications. It discovered a new updating model called whole-program updating — and a new programming abstraction — that can apply live updates no other model can.

Live updates are the only way to keep some software continually running. The browser you are using to read this page and the server that prepared it will always need a live update. The alternative of waiting for software to restart happens often enough to cost tens of thousands of dollars per hour of downtime.

What makes UpStare different is that it updates live all the code and data of a program, not just parts of it. In particular, UpStare updates functions on the stack. You need this ability to update common multi-threaded server applications that can't be updated otherwise. If you were to update only parts of a program, without updating the stack, the update may take forever to finish safely and that practically means downtime. [1]

Although the approach is general for use in other programming languages, UpStare updates applications written in C. The updates are applied in three steps. First UpStare changes an application's source code automatically to make it updateable. Then it compiles the changed source code with an existing compiler like gcc to produce the executable file the user runs. Finally, when a new version of the source code becomes available, UpStare automatically generates an update patch to the next version. This update patch can be complemented by the user where necessary, and is applied easily on the running application at any time.

Whole-program updating in UpStare is implemented with a new mechanism called stack reconstruction. [2] This mechanism tears apart and restitches an application into a new version piece by piece. During this process the application does not exit, and network connections remain open. All functions on the stack are updated at the same time, which guarantees all active functions are of the same version after an update.

It wasn't as obvious in the beginning but stack reconstruction isn't just a mechanism. It's a programming abstraction. It is a means of doing easily something useful and hard in a programming language, similar to recursion or garbage collection.

Specifically, stack reconstruction is a means of modifying continuations: representations of the runtime state of a program. Modifying continuations is useful because you need to update the stack to apply some kinds of updates and the stack is part of the continuation. The stack also proved to be the hardest part to update without stopping a program that's running. It's more obvious now continuation updates help programs continue to run. [3]

Stack reconstruction is missing from programming languages. Existing languages don't modify continuations before they restore them. They don't make continuation internals accessible to be modified at such a fine grain. More generally, today's programming languages don't update whole programs live.

Though powerful, it's hard to tell how important stack reconstruction will turn out to be. Powerful programming abstractions like macros and continuations are rare. Most of them were discovered more than forty years ago. They also tend to stretch the way average programmers think to such a degree that the most powerful abstractions are used scarcely. It takes good programmers to use them. If an abstraction surfaced now that was used by good programmers it wouldn't just be powerful; it'd be a big deal.

Dynamic updates UpStare applied successfully include:

    Recursion. Updating an application computing recursively Fibonacci numbers, while nested deep in the stack, to report additional information when the recursion unrolls.

    Network sockets. Updating a server application while serving multiple clients without closing the network socket.

    Multi-threaded applications. Updating the main function body executed by multiple threads of an application. Also, updating in a producer/consumer multi-threaded application only the consumer threads while the producer threads remained unmodified.

    Multi-nested long-lived loops. Updating in the middle of executing Bubblesort, a multi-nested long-lived loop, to continue executing from the middle of a different multi-nested long-lived loop implementing Selectionsort while reusing the existing program state. Additionally, updating from Bubblesort to Heapsort, which is a drastically different sorting algorithm executing over different loop iterators.

    vsFTPd. Applying 13 updates spanning 5.5 years of the multi-process (forked processes do not communicate with each other) vsFTPd server (about 12,000 lines of code).

    PostgreSQL. Updating the multi-process (forked processes communicate with each other) PostgreSQL 7.4.16 database server (more than 200,000 lines of code).


The software is available here and there is a user manual [HTML single] [HTML multiple] [PDF]. There are also papers:

    Whole-Program Dynamic Software Updating, Kristis Makris, Ph.D Dissertation, December 2009. [PDF] [BibTex] 

    Dynamic Software Updates: The State Mapping Problem. Rida A. Bazzi, Kristis Makris, Peyman Nayeri, Jun Shen, The 2nd ACM Workshop on Hot Topics in Software Upgrades (HotSWUp '09), October 2009. [PDF] [BibTex] [presentation slides] 

    Immediate Multi-Threaded Dynamic Software Updates Using Stack Reconstruction. Kristis Makris, Rida A. Bazzi, USENIX 2009, June 2009. [PDF] [BibTex] [presentation slides]

This work led in-part to an NSF grant through DUOSS at ASU, and was supported in-part by the grant. [4]







Notes:

[1]  Safe, partial program updates may take forever to finish because it's not enough for new code to update a program's current state to the new version. The old code must also be careful not to process the new state incorrectly. This means an update may have to wait until no old code is waiting to run on the stack.

Waiting often works, but the problem is the wait duration is uncertain. Old code could wait a lot until long-lived computational loops finish execution, or wait indefinitely until a blocking system call returns. If blocked on an accept() or read(), old code may never return. At high concurrency with many threads, the chances of a safe, partial program update can drop to zero.

UpStare on the other hand doesn't need to wait. It can break out of blocking system calls and loops at any time. And it can update functions and data active on the stack at any time. This lets UpStare apply updates not only immediately but also as an atomic transaction: before an update only old code is running, after the update only new code is running, and at no time do old and new code both run at the same time.

Updates with UpStare are atomic and with bounded delay.

[2]  The earliest form of stack reconstruction was applied for efficient large-scale process-oriented parallel simulations by Kalyan S. Perumalla and Richard M. Fujimoto in 1998.

[3]  If this were a Lisp, and without having access to Scheme's call/cc control operator, we could get the same advantages as stack reconstruction if we wrote macros or a code walker. This would transform programs not written in continuation-passing style to capture and modify their continuations, which is what UpStare does.

[4]  This material is based upon work supported by the National Science Foundation under Grant Number CSR-0849980. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.