Dissertation Defense
Reducing Redundant Computation and Data in Non-deterministic Systems by Exploiting the Computation-data Duality
Add to Google Calendar
Repeated computations and redundant data are pervasively invoked or generated in modern systems and software. These computations and data are mostly, but not exactly the same. For example, web servers invoke similar computations when displaying a common web page for different users, applications perform similar initialization during startups, and cloud storage services maintain past versions of user files with redundant data chunks shared among these versions. Consequently, many prior systems detect and deduplicate similar computations or data to reduce computation, network and storage costs. However, the non-determinism inherited in modern systems limits the effectiveness of existing approaches.
Therefore, the thesis exploits the principle of equivalence between computation and data to reduce redundancy for non-deterministic systems via two novel systematic supports. On the one hand, we can substitute computation for data to reduce storage and network costs for cloud-based storage services. In lieu of the actual file data, a file can be selectively represented as a description of the non-deterministic computation that is sufficient to re-create the same file data. Moreover, when the description of the computation is less costly than the file data, storage and communication costs are reduced.
On the other hand, we can substitute data for computation to accelerate similar but non-deterministic computations, such as web server request processing and application startups. The computation is first memoized to disk, and then a predicated slice, which includes all instructions affected by non-deterministic inputs, is generated. For a subsequent execution of the same code region, the memoized outputs are applied to the program state and only these instructions in the predicated slice are re-evaluated to incorporate new non-deterministic inputs. As a result, the code region is significantly accelerated since most instructions are skipped due to memoization while only instructions in the predicated slice are executed.