kernelthread.com

A Taste of Computer Security

Sandboxing

Sandboxing is a popular technique for creating confined execution environments, which could be be used for running untrusted programs. A sandbox limits, or reduces, the level of access its applications have — it is a container.

If a process P runs a child process Q in a sandbox, then Q's privileges would typically be restricted to a subset of P's. For example, if P is running on the "real" system (or the "global" sandbox), then P may be able to look at all processes on the system. Q, however, will only be able to look at processes that are in the same sandbox as Q. Barring any vulnerabilities in the sandbox mechanism itself, the scope of potential damage caused by a misbehaving Q is reduced.

Sandboxes have been of interest to systems researchers for a long time. In his 1971 paper titled Protection, Butler Lampson provided an abstract model highlighting properties of several existing protection and access-control enforcement mechanisms, many of which were semantic equivalents (or at least semantic relatives) of sandboxing. Lampson used the word "domain" to refer to the general idea of a protection environment.

There were several existing variants of what Lampson referred to as domains: protection context, environment, ring, sphere, state, capability list, etc.

Historical Example: Hydra

Hydra [1975] was a rather flexible capability-based protection system. One of its fundamental principles was the separation of policy and mechanism. A protection kernel provided mechanisms to implement policies encoded in user-level software that communicated with the kernel. Hydra's philosophy could be described as follows:

Hydra was relevant from several (related) perspectives: protection, sandboxing, security, virtualization, and even quality of service. The Hydra kernel's component for handling process scheduling was called the Kernel Multiprocessing System (KMPS). While parametrized process schedulers existed at that time, Hydra provided more flexibility through user-level schedulers. In fact, multiple schedulers could run concurrently as user-level processes called Policy Modules (PMs). The short-term scheduling decisions made in the kernel (by KMPS) were based on parameters set by the PMs. There was also provision for having a "guarantee algorithm" for allocating a rate guarantee (a fixed percentage of process cycles over a given period) to each PM.

Varieties

Today sandboxes are used in numerous contexts. Sandbox environments range from those that look like a complete operating environment to applications within, to those that provide a minimum level of isolation (to a few system calls, say). Some common sandbox implementation categories include:

Virtualization

A more detailed discussion on virtualization, including several example of sandboxing mechanisms, is available in An Introduction to Virtualization.

Ideally you would design a system with explicit support for sandboxing, but it is often more practical to retrofit sandboxing into existing systems. Intercepting system calls is a common approach to creating sandboxes.

Intercepting System Calls

A common technique used for retrofitting sandboxing mechanisms is the "taking over", or interception of system calls. While such implementation is a powerful tool and is trivial to implement on most systems (usually without any kernel modifications, such as through loadable modules), it could, depending upon the situation, be extremely difficult to use effectively.

After you intercept a system call, you could do various things, such as:

There are several problems, however. If you are implementing a sandboxing mechanism (or mechanisms for access control, auditing, capabilities, etc.), you may not have all the context you need at the interception point. In many cases, even the original system call does not have this context, because such context may be a new requirement, introduced by your mechanism. Moreover, you may have to replicate the original system call's implementation because pre- and post-processing do not suffice. In closed source systems, this is a bigger problem. Even with an open source system, a system call could be off-loading most of its work to an internal kernel function, and you may not wish to make kernel source modifications. Chaining of calls (such as calling the original implementation from within the new one) will often not be atomic, so race conditions are possible. You also might need to ensure that semantics are fully preserved.

On many systems, the system call vector (usually an array of structures, each containing one or more function pointers — implementation(s) of that particular system call) is accessible and writable from within a kernel module. If not, there are workarounds. Even internal functions could be overtaken, say, by instruction patching. However, such approaches might turn out to be maintenance nightmares in case you are shipping a product that does this on an operating system that is not your own.

Examples

Let us briefly discuss a few examples that involve sandboxing.

TRON - ULTRIX (1995)

TRON was an implementation of a process-level discretionary access control system for ULTRIX. Using TRON, a user could specify capabilities for a process to access filesystem objects (individual files, directories, and directory trees). Moreover, TRON provided protected domains — restrictive execution environments with a specified set of filesystem access rights.

Rights on files included read, write, execute, delete, and modify (permissions), while rights on directories included creation of new files or links to existing files. TRON enforced capabilities via system call wrappers that were compiled into the kernel. Moreover, the implementation modified the process structure, key system calls such as fork and exit, and the system call vector itself.

LaudIt - Linux (1997)

I was a system administrator during my days at IIT Delhi as an undergraduate student. I had considerable success with a Bastard Operator From Hell (BOFH) facade, but certain mischievous users were quite relentless in pursuing their agenda. It was to foil them that I initially came up with the idea of LaudIt. First implemented in 1997, LaudIt was perhaps one of the earliest user-configurable and programmable system call interception mechanisms for the Linux kernel.

I implemented LaudIt as a loadable kernel module that dynamically modified the system call vector to provide different security policies. A user-interface and an API allowed a privileged user to mark system calls as having alternate implementations. Such a re-routed system call could have a chain of actions associated with the call's invocation. An action could be to log the system call, or consult a set of rules to allow or disallow the invocation to proceed. If no existing rule involved a re-routed system call, its entry in the system call vector was restored to its original value.

LaudIt required no modification (or recompilation) of the kernel itself, and could be toggled on or off.

Thus, using LaudIt, system calls could be audited, denied, or re-routed on a per-process (or per-user) basis. It was also possible to associate user-space programs with certain system call events. For example, you could implement per-file per-user passwords on files.

ECLIPSE/BSD (1997)

A description can be found here.

Ensim Private Servers (1999)

Ensim Corporation did pioneering work in the area of virtualizing operating systems on commodity hardware. Ensim's Virtual Private Server (VPS) technology allows you to securely partition an operating system in software, with quality of service, complete isolation, and manageability. There exist versions for Solaris, Linux, and Windows. Although all solutions are kernel-based, none of these implementations require source code changes to the kernel.

Solaris Zones (2004)

Sun introduced static partitioning in 1996 on its E10K family of servers. The partitions, or domains, were defined by a physical subset of resources - such as a system board with some processors, memory, and I/O buses. A domain could span multiple boards, but could not be smaller than a board. Each domain ran its own copy of Solaris. In 1999, Sun made this partitioning "dynamic" (known as Dynamic System Domains) in the sense that resources could be moved from one domain to another.

By the year 2002, Sun had also introduced Solaris Containers: execution environments with limits on resource consumption, existing within a single copy of Solaris. Sun has been improving and adding functionality to its Resource Manager (SRM) product, which was integrated with the operating system beginning with Solaris 9. SRM is used to do intra-domain management of resources such as CPU usage, virtual memory, maximum number of processes, maximum logins, connect time, disk space, etc.

The newest Sun reincarnation of these concepts is called "Zones": a feature in the upcoming Solaris 10. According to Sun, the concept is derived from the BSD "jail" concept: a Zone (also known as a "trusted container") is an isolated and secure execution environment that appears as a "real machine" to applications. There is only one copy of the Solaris kernel.

An example of using Zones is provided in the section on Solaris security.

<<< Detecting Intrusion main An Example: Solaris Security >>>