CSCI 555 Advanced OS and MIT 6.828 OS Engineering Lab

2019-01-01

This post goes through the general content of USC CSCI 555 Advanced OS and talks a little about MIT 6.828 (an old version rather than the up-to-date one; you should be able to find much more information about this lab, such as code and blog posts, elsewhere).

CSCI 555 Advanced OS

Since this is a master-level course, the professor skipped basic OS ideas and instead used papers from different OS areas as threads to introduce detailed OS implementations.

In this post, I list all the papers discussed in the course, in order, and provide a brief summary of each.

The UNIX Time-Sharing System talks about the general design of UNIX.

It talks about:

what a time-sharing system is;
file system design/implementation and special file handling;
protection of data;
processes and images;
the shell;
the influence of, and experiments with, UNIX.

The paper introduces the origin of UNIX and how it later evolved. It is a more history-oriented paper.

Exokernel

Exokernel: An Operating System Architecture for Application-level Resource Management, Proceedings of the Fifteenth ACM Symposium on Operating Systems Principles, is a paper about a new kernel design, the exokernel.

It addresses the problem that a general-purpose OS is not optimized for any particular application. By separating the traditional kernel into an exokernel and library operating systems (LibOS), an application can use a specially designed LibOS to achieve better performance.

Xen and Virtualization

Xen and the Art of Virtualization, Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles, talks about the idea of virtual machines and how they are virtualized. It covers the details of managing memory and CPU resources for different VMs, and the role of the hypervisor. The advantages and disadvantages of full virtualization and paravirtualization are compared in the paper.

The Singularity System

Singularity is an experimental project from Microsoft that uses a safe programming language (Sing#, developed from C#) to enhance the safety of the OS. It argues that conventional OSes do a lot of extra work to solve problems caused by unsafe languages, and proposes the Singularity system in response.

processes in Singularity are software-isolated processes, or SiPs, that rely on language safety, not hardware mechanisms, to isolate system software components from one another.

Scheduler Activations

The problem that motivates Scheduler Activations: Effective Kernel Support for the User-Level Management of Parallelism is that the kernel has no idea how user-level threads work. Consequently, a programmer who employs kernel threads gets poor performance, while a programmer who employs user-level threads gets deficient functionality. The solution provides each application with a virtual multiprocessor. A scheduler activation is a kernel abstraction for a virtual processor.

Lottery Scheduling

Lottery Scheduling: Flexible Proportional-Share Resource Management talks about the famous scheduling algorithm and the benchmark results of this “‘novel’ randomized resource allocation mechanism”.

The initial motivation of the new mechanism is the need for “responsive control over the relative execution rates of computations.”

This mechanism can also be applied to I/O bandwidth, memory, and access to locks.

The details of how tickets are assigned, transferred, and inflated are presented in the paper.
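The core idea can be sketched in a few lines of Python. This is only a minimal illustration, not the paper's implementation: the function names `hold_lottery` and `simulate` and the client names are made up for this post, and ticket transfer and inflation are left out. Each client holds some number of tickets, a winning ticket is drawn at random, and the client holding it gets the next time slice, so over many draws each client's share approaches its share of the tickets.

```python
import random
from collections import Counter


def hold_lottery(tickets, rng):
    """Pick one client; its chance of winning is proportional to its tickets."""
    total = sum(tickets.values())
    winning = rng.randrange(total)  # winning ticket number in [0, total)
    for client, count in tickets.items():
        if winning < count:
            return client
        winning -= count  # skip past this client's tickets
    raise RuntimeError("unreachable when ticket counts are non-negative ints")


def simulate(tickets, rounds, seed=0):
    """Hold `rounds` lotteries and count how often each client wins."""
    rng = random.Random(seed)
    return Counter(hold_lottery(tickets, rng) for _ in range(rounds))


if __name__ == "__main__":
    # Hypothetical clients: A holds 75 tickets, B holds 25.
    print(simulate({"A": 75, "B": 25}, rounds=10_000))
```

With 75 and 25 tickets, client A should win roughly three quarters of the draws, which is the proportional-share behavior the paper is after.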

Eraser

Eraser: A Dynamic Data Race Detector for Multithreaded Programs talks about how to detect data races in an application or OS. The proposed analyzer can detect potential data races between concurrent threads in a program. Compared with other approaches (lock, annotation, etc.), it improves performance and lowers the false alarm rate.
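As a rough sketch of the idea behind this kind of detector, the basic lockset refinement can be written as follows. The class and method names are my own, and this version deliberately omits the extra per-variable states the real tool uses to avoid false alarms during initialization and read sharing. For each shared variable it keeps the set of locks that were held on every access so far; if that set becomes empty, no single lock consistently protects the variable.

```python
class LocksetChecker:
    """Basic lockset refinement: intersect the candidate locks of a variable
    with the locks held at each access; an empty set signals a possible race."""

    def __init__(self):
        self.held = {}        # thread id -> set of locks currently held
        self.candidates = {}  # variable name -> candidate lock set
        self.warnings = []    # variables whose candidate set became empty

    def acquire(self, thread, lock):
        self.held.setdefault(thread, set()).add(lock)

    def release(self, thread, lock):
        self.held.get(thread, set()).discard(lock)

    def access(self, thread, var):
        locks = self.held.get(thread, set())
        if var not in self.candidates:
            # First access: every lock held right now is still a candidate.
            self.candidates[var] = set(locks)
        else:
            # Later accesses: keep only locks held on every access so far.
            self.candidates[var] &= locks
        if not self.candidates[var] and var not in self.warnings:
            self.warnings.append(var)


if __name__ == "__main__":
    checker = LocksetChecker()
    checker.acquire("t1", "mu")
    checker.access("t1", "x")
    checker.release("t1", "mu")
    checker.access("t2", "x")   # no lock held: "x" is reported
    print(checker.warnings)
```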

Non-Scalable Locks

Non-scalable locks are dangerous demonstrates the danger of non-scalable locks, which was well known but had not been rigorously shown. The experiments show that on multicore processors, non-scalable locks significantly degrade performance because many cores contend for the lock. Performance collapses as more cores contend: the serial section grows, which makes it more likely that yet another core starts to contend. Scalable locks such as MCS solve this problem.

Memory Resource Management in VMware ESX Server

The paper introduces how VMware ESX Server manages memory so that VMs on the same machine can share it. Ballooning is introduced as the trick that gives each VM its own view of currently available memory, consistent with the physical memory assigned to it. The memory sharing is transparent to the VMs on ESX. Pages with identical content are shared copy-on-write, a common method to avoid redundant copies of the same content.
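To make the sharing part concrete, here is a toy Python model of content-based page sharing with copy-on-write. Everything in it is an assumption for illustration: the class `SharedPageStore`, its fields, and the choice to merge pages at write time (a real hypervisor would find identical pages in the background and break sharing on a write fault). It only shows the bookkeeping: identical contents map to one frame, and a write never modifies a frame that another VM may still be using.

```python
import hashlib


class SharedPageStore:
    """Toy content-based page sharing with copy-on-write semantics."""

    def __init__(self):
        self.frames = {}    # frame id -> [page contents, reference count]
        self.by_hash = {}   # content hash -> frame id
        self.mapping = {}   # (vm, guest page number) -> frame id
        self.next_frame = 0

    def _unmap(self, key):
        frame = self.mapping.pop(key, None)
        if frame is None:
            return
        entry = self.frames[frame]
        entry[1] -= 1
        if entry[1] == 0:  # last user gone: free the frame
            digest = hashlib.sha256(entry[0]).hexdigest()
            if self.by_hash.get(digest) == frame:
                del self.by_hash[digest]
            del self.frames[frame]

    def write(self, vm, page, data):
        key = (vm, page)
        self._unmap(key)  # copy-on-write: drop our reference, never edit in place
        digest = hashlib.sha256(data).hexdigest()
        frame = self.by_hash.get(digest)
        # Compare full contents too, so a hash match alone never merges pages.
        if frame is None or self.frames[frame][0] != data:
            frame = self.next_frame
            self.next_frame += 1
            self.frames[frame] = [data, 0]
            self.by_hash[digest] = frame
        self.frames[frame][1] += 1
        self.mapping[key] = frame

    def read(self, vm, page):
        return self.frames[self.mapping[(vm, page)]][0]
```

Two VMs that both write an all-zero page end up mapped to a single frame; as soon as one of them writes different contents, it gets its own frame and the other VM's page is untouched.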

Superpages

Practical, transparent operating system support for superpages talks about superpages: their motivation, design, implementation, and benchmark results. The motivation comes from developments in CPU, memory, and disk. The paper presents how transparency is achieved and how superpages improve performance.

RadixVM

RadixVM: Scalable address spaces for multithreaded applications addresses the following problem:

Because operating systems serialize mmap and munmap calls, even for non-overlapping memory regions, these applications can easily be bottlenecked by contention in the OS kernel.

RadixVM uses a radix tree to store information about mapped memory regions, uses scalable reference counting to tell when physical pages are free and radix tree nodes are no longer used, and avoids TLB shootdowns to cores that do not have the page mapping cached.

RadixVM allows multiple threads of the same process to perform mmap, munmap, and pagefault operations for non-overlapping memory regions in parallel.

RAID

A Case for Redundant Arrays of Inexpensive Disks (RAID) tries to solve the problem of disk failures. The intuitive idea is to use more disks with redundancy, and this is RAID. The paper talks about five different arrangements of disk arrays and advises on the best choice considering cost and performance. Different RAID levels mean different ways to store the redundant information.
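For the parity-based levels, the redundancy boils down to XOR. The sketch below is a minimal illustration with made-up helper names (`xor_blocks`, `parity`, `reconstruct`) and ignores striping layout entirely: the parity block is the byte-wise XOR of the data blocks, and any single lost block can be rebuilt by XORing the surviving blocks with the parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def parity(data_blocks):
    """Parity block for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity_block):
    """Rebuild the single missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity_block])


if __name__ == "__main__":
    stripe = [b"disk", b"zero", b"fail"]   # three hypothetical data disks
    p = parity(stripe)
    # Pretend the middle disk died and rebuild its block.
    print(reconstruct([stripe[0], stripe[2]], p))
```

This works because XORing a value with itself cancels out, so everything except the missing block disappears from the result.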

Log-Structured File System

The Design and Implementation of a Log-Structured File System presents a technique for disk storage management. The file system writes all modifications to disk sequentially in a log-like structure, which speeds up both file writing and crash recovery. The log is divided into segments, and a segment cleaner compresses the live information from heavily fragmented segments.
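A toy version of the idea, assuming an in-memory list stands in for the disk and using names invented for this post (`LogFS`, `clean`): every write is appended to the end of the log, an index remembers where the latest copy of each file lives, and cleaning copies only the live records forward. Unlike the real design, this sketch cleans the whole log at once instead of working segment by segment.

```python
class LogFS:
    """Toy log-structured store: writes only append; an index finds live data."""

    def __init__(self):
        self.log = []     # append-only list of (name, data) records
        self.index = {}   # name -> position of the live record in the log

    def write(self, name, data):
        # Never overwrite in place: append and point the index at the new copy.
        self.index[name] = len(self.log)
        self.log.append((name, data))

    def read(self, name):
        return self.log[self.index[name]][1]

    def clean(self):
        """Copy live records into a fresh log, dropping overwritten ones."""
        new_log, new_index = [], {}
        for pos, (name, data) in enumerate(self.log):
            if self.index.get(name) == pos:  # still the latest copy
                new_index[name] = len(new_log)
                new_log.append((name, data))
        self.log, self.index = new_log, new_index


if __name__ == "__main__":
    fs = LogFS()
    fs.write("a", b"1")
    fs.write("b", b"2")
    fs.write("a", b"3")      # the first record for "a" is now dead
    fs.clean()
    print(len(fs.log), fs.read("a"))
```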

Journaling

The goals of Journaling the Linux ext2fs Filesystem are that file system performance does not suffer, compatibility with existing applications is preserved, and reliability is not compromised. The prototype extends the Linux ext2fs filesystem.

It records the new contents of filesystem metadata blocks while we are in the process of committing transactions. The only other requirement of the log is that we must be able to atomically commit the transactions it contains.

After each transaction, the journal is committed and checkpoints are made, so that if the system crashes the file system can be recovered starting from that point.
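The commit-then-replay idea can be sketched like this. It is a simplified model with hypothetical names (`JournaledStore`, `log_write`, `recover`), not the ext2fs journaling code: block updates go to the journal first, a commit record marks a transaction as complete, and recovery replays only committed transactions so that a half-written transaction never reaches the main file system.

```python
class JournaledStore:
    """Toy metadata journal: log writes, commit atomically, replay on recovery."""

    def __init__(self):
        self.disk = {}     # block number -> contents (the home locations)
        self.journal = []  # ("write", txid, block, data) or ("commit", txid)
        self.next_tx = 0

    def begin(self):
        txid = self.next_tx
        self.next_tx += 1
        return txid

    def log_write(self, txid, block, data):
        # New block contents go to the journal, not to the home location.
        self.journal.append(("write", txid, block, data))

    def commit(self, txid):
        # A single commit record makes the whole transaction durable.
        self.journal.append(("commit", txid))

    def recover(self):
        """Replay committed transactions in log order, then empty the journal."""
        committed = {rec[1] for rec in self.journal if rec[0] == "commit"}
        for rec in self.journal:
            if rec[0] == "write" and rec[1] in committed:
                _, _, block, data = rec
                self.disk[block] = data
        self.journal.clear()  # acts as a checkpoint in this sketch
```

If a crash happens after `log_write` but before `commit`, the transaction has no commit record and `recover` simply ignores it.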

GFS

The Google File System is a networked, distributed file system. It trades off flexibility against scale. Early implementation details are presented in the paper.

Commutativity Rule

The Scalable Commutativity Rule: Designing Scalable Software for Multicore Processors presents ways to improve concurrency by exploiting the commutativity of system calls. A tool named COMMUTER is developed to check whether calls commute.
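To get a feel for what "commutative" means here, the following toy check runs two operations in both orders on copies of the same state and compares the return values and the final state. It is a brute-force illustration on a made-up state dictionary, not how the tool itself works, and the operations `open_lowest` and `stat_size` are simplified stand-ins for real system calls.

```python
import copy


def commutes(state, op_a, op_b):
    """True if op_a and op_b give the same results and final state in either order."""
    s1 = copy.deepcopy(state)
    a_first = (op_a(s1), op_b(s1))       # run a, then b
    s2 = copy.deepcopy(state)
    b_result = op_b(s2)                  # run b, then a
    a_result = op_a(s2)
    return a_first == (a_result, b_result) and s1 == s2


def open_lowest(state):
    """Allocate the lowest unused file descriptor."""
    fd = 0
    while fd in state["fds"]:
        fd += 1
    state["fds"].add(fd)
    return fd


def stat_size(state):
    """Read-only query of a file size."""
    return state["size"]


if __name__ == "__main__":
    start = {"fds": set(), "size": 10}
    print(commutes(start, open_lowest, open_lowest))
    print(commutes(start, stat_size, stat_size))
```

Two calls that must each return the lowest free descriptor do not commute, because which caller gets descriptor 0 depends on the order, while two read-only queries do.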

Dune

Dune: Safe User-level Access to Privileged CPU Features aims to provide safe access to hardware features (TLB, page table, exceptions). A normal process turns into a Dune process through an ioctl on /dev/dune, moving from root-mode ring 3 to non-root-mode ring 0. A Dune process has direct access to the page table and can still use the system call API.

IX

IX: A Protected Dataplane Operating System for High Throughput and Low Latency presents a design for achieving high throughput and low latency. Each process in IX is a Dune process.

The techniques it uses: protection: Dune; low latency: polling, avoiding TLB/cache misses, run to completion, zero-copy I/O; high throughput: batching; efficiency: core allocation.

~~Writing this post turned out to be far more painful and much longer than I had expected =-=~~

CSCI 555 Lab a.k.a. An Ancient-Version Lab for MIT 6.828

To Be Continued…