A Taste of Computer Security
Sandboxing
Sandboxing is a popular technique for creating confined execution environments, which could be be used for running untrusted programs. A sandbox limits, or reduces, the level of access its applications have — it is a container.
If a process P runs a child process Q in a sandbox, then Q's privileges would typically be restricted to a subset of P's. For example, if P is running on the "real" system (or the "global" sandbox), then P may be able to look at all processes on the system. Q, however, will only be able to look at processes that are in the same sandbox as Q. Barring any vulnerabilities in the sandbox mechanism itself, the scope of potential damage caused by a misbehaving Q is reduced.
Sandboxes have been of interest to systems researchers for a long time. In his 1971 paper titled Protection, Butler Lampson provided an abstract model highlighting properties of several existing protection and access-control enforcement mechanisms, many of which were semantic equivalents (or at least semantic relatives) of sandboxing. Lampson used the word "domain" to refer to the general idea of a protection environment.
There were several existing variants of what Lampson referred to as domains: protection context, environment, ring, sphere, state, capability list, etc.
Historical Example: Hydra
Hydra [1975] was a rather flexible capability-based protection system. One of its fundamental principles was the separation of policy and mechanism. A protection kernel provided mechanisms to implement policies encoded in user-level software that communicated with the kernel. Hydra's philosophy could be described as follows:
- It is easier to protect information if it is divided into distinct objects.
- Assign a particular type to each object. Some types (such as Procedure and Process) are provided natively by Hydra, but mechanisms exist for creating new types of objects. Hydra provides generic (type-independent) operations to manipulate each object, in addition to mechanisms for defining type-specific operations.
- Capabilities are used to determine if access to an object is allowed. Objects do not have "owners".
- Each program should have no more access rights than that are absolutely necessary.
- Hydra abstracted the representation and implementation of operations for each object type in modules called subsystems. A user could only access an object through the appropriate subsystem procedures. Moreover, Hydra provided rights amplification — wherein a called procedure could have greater rights in the new domain (created for the invocation) than it had in the original (caller's) domain.
Hydra was relevant from several (related) perspectives: protection, sandboxing, security, virtualization, and even quality of service. The Hydra kernel's component for handling process scheduling was called the Kernel Multiprocessing System (KMPS). While parametrized process schedulers existed at that time, Hydra provided more flexibility through user-level schedulers. In fact, multiple schedulers could run concurrently as user-level processes called Policy Modules (PMs). The short-term scheduling decisions made in the kernel (by KMPS) were based on parameters set by the PMs. There was also provision for having a "guarantee algorithm" for allocating a rate guarantee (a fixed percentage of process cycles over a given period) to each PM.
Varieties
Today sandboxes are used in numerous contexts. Sandbox environments range from those that look like a complete operating environment to applications within, to those that provide a minimum level of isolation (to a few system calls, say). Some common sandbox implementation categories include:
- Virtual machines (VMs) can be used to provide runtime sandboxes, which are helpful in enhancing users' level of confidence in the associated computing platform. The Java virtual machine is an example, although it can only be used to run Java programs (and is therefore not general purpose).
Virtualization
A more detailed discussion on virtualization, including several example of sandboxing mechanisms, is available in An Introduction to Virtualization.
- Applications could contain a sandbox mechanism within themselves. Proof-carrying code (PCC) is a technique used for safe execution of untrusted code. In the PCC system, the recipient of such code has a set of rules guaranteeing safe execution. Proof-carrying code must be in accordance with these safety rules (or safety policies), and carries a formal proof of this compliance. Note that the producer of such code is untrusted, so the proof must be validated by the recipient before code execution. The overall system is such that the programs are tamperproof — any modification will either render the proof invalid, or in case the proof still remains valid, it may not be a safety proof any more. In the case where the proof remains valid and is a safety proof, the code may be executed with no harmful effects.
- A layer could be implemented in user-space, perhaps as a set of libraries that intercept library calls (in particular, invocations of system call stubs) and enforce sandboxing. A benefit of this approach is that you do not need to modify the operating system kernel (which might be operationally, technically, or politically disruptive to deployment). However, this approach might not yield a secure, powerful, or flexible enough sandboxing mechanism, and is often frowned upon by "the kernel people".
- The sandboxing layer could be implemented within the operating system kernel.
Ideally you would design a system with explicit support for sandboxing, but it is often more practical to retrofit sandboxing into existing systems. Intercepting system calls is a common approach to creating sandboxes.
Intercepting System Calls
A common technique used for retrofitting sandboxing mechanisms is the "taking over", or interception of system calls. While such implementation is a powerful tool and is trivial to implement on most systems (usually without any kernel modifications, such as through loadable modules), it could, depending upon the situation, be extremely difficult to use effectively.
After you intercept a system call, you could do various things, such as:
- Deny the system call
- Audit the system call's invocation
- Pre-process the arguments
- Post-process the result
- Replace the system call's implementation
There are several problems, however. If you are implementing a sandboxing mechanism (or mechanisms for access control, auditing, capabilities, etc.), you may not have all the context you need at the interception point. In many cases, even the original system call does not have this context, because such context may be a new requirement, introduced by your mechanism. Moreover, you may have to replicate the original system call's implementation because pre- and post-processing do not suffice. In closed source systems, this is a bigger problem. Even with an open source system, a system call could be off-loading most of its work to an internal kernel function, and you may not wish to make kernel source modifications. Chaining of calls (such as calling the original implementation from within the new one) will often not be atomic, so race conditions are possible. You also might need to ensure that semantics are fully preserved.
On many systems, the system call vector (usually an array of structures, each containing one or more function pointers — implementation(s) of that particular system call) is accessible and writable from within a kernel module. If not, there are workarounds. Even internal functions could be overtaken, say, by instruction patching. However, such approaches might turn out to be maintenance nightmares in case you are shipping a product that does this on an operating system that is not your own.
Examples
Let us briefly discuss a few examples that involve sandboxing.
TRON - ULTRIX (1995)
TRON was an implementation of a process-level discretionary access control system for ULTRIX. Using TRON, a user could specify capabilities for a process to access filesystem objects (individual files, directories, and directory trees). Moreover, TRON provided protected domains — restrictive execution environments with a specified set of filesystem access rights.
Rights on files included read, write, execute, delete, and modify (permissions), while rights on directories included creation of new files or links to existing files. TRON enforced capabilities via system call wrappers that were compiled into the kernel. Moreover, the implementation modified the process structure, key system calls such as fork and exit, and the system call vector itself.
LaudIt - Linux (1997)
I was a system administrator during my days at IIT Delhi as an undergraduate student. I had considerable success with a Bastard Operator From Hell (BOFH) facade, but certain mischievous users were quite relentless in pursuing their agenda. It was to foil them that I initially came up with the idea of LaudIt. First implemented in 1997, LaudIt was perhaps one of the earliest user-configurable and programmable system call interception mechanisms for the Linux kernel.
I implemented LaudIt as a loadable kernel module that dynamically modified the system call vector to provide different security policies. A user-interface and an API allowed a privileged user to mark system calls as having alternate implementations. Such a re-routed system call could have a chain of actions associated with the call's invocation. An action could be to log the system call, or consult a set of rules to allow or disallow the invocation to proceed. If no existing rule involved a re-routed system call, its entry in the system call vector was restored to its original value.
LaudIt required no modification (or recompilation) of the kernel itself, and could be toggled on or off.
Thus, using LaudIt, system calls could be audited, denied, or re-routed on a per-process (or per-user) basis. It was also possible to associate user-space programs with certain system call events. For example, you could implement per-file per-user passwords on files.
ECLIPSE/BSD (1997)
A description can be found here.
Ensim Private Servers (1999)
Ensim Corporation did pioneering work in the area of virtualizing operating systems on commodity hardware. Ensim's Virtual Private Server (VPS) technology allows you to securely partition an operating system in software, with quality of service, complete isolation, and manageability. There exist versions for Solaris, Linux, and Windows. Although all solutions are kernel-based, none of these implementations require source code changes to the kernel.
Solaris Zones (2004)
Sun introduced static partitioning in 1996 on its E10K family of servers. The partitions, or domains, were defined by a physical subset of resources - such as a system board with some processors, memory, and I/O buses. A domain could span multiple boards, but could not be smaller than a board. Each domain ran its own copy of Solaris. In 1999, Sun made this partitioning "dynamic" (known as Dynamic System Domains) in the sense that resources could be moved from one domain to another.
By the year 2002, Sun had also introduced Solaris Containers: execution environments with limits on resource consumption, existing within a single copy of Solaris. Sun has been improving and adding functionality to its Resource Manager (SRM) product, which was integrated with the operating system beginning with Solaris 9. SRM is used to do intra-domain management of resources such as CPU usage, virtual memory, maximum number of processes, maximum logins, connect time, disk space, etc.
The newest Sun reincarnation of these concepts is called "Zones": a feature in the upcoming Solaris 10. According to Sun, the concept is derived from the BSD "jail" concept: a Zone (also known as a "trusted container") is an isolated and secure execution environment that appears as a "real machine" to applications. There is only one copy of the Solaris kernel.
An example of using Zones is provided in the section on Solaris security.