6.S974, Decentralized Applications, Fall 2018

Introduction

The goal of 6.S974 is to understand recent efforts in decentralized applications, to learn what the main design trade-offs are, and to identify areas for new research.

Users often entrust their data to web sites (e.g. e-mail, photo-sharing, blogging, and social networking sites) in return for the ability to share their data and to interact with other users. However, such web sites typically give users little direct control over their own data: the web site sets the rules for who can see the user's data, the web site's software and employees can look at or reveal a user's data, only the web site's own software can be used with the user's data, and it is rarely easy for the user to move their data from one web site to a competitor. The goal of decentralized applications is to give users more control over their own data, while supporting flexible sharing and interaction with other users over the Internet. While there exist some promising technologies for decentralized application infrastructure, change is rapid and there are likely many areas where advances are possible.

Class meetings will consist of discussions of recent papers and deployed systems. Each student will lead discussion of one or more papers; do a handful of small labs; and design and build a substantial project, including a written report (roughly 6 pages) and presentation to the class. There will be no exams.

Signing Up

If you'd like to take this course, please send e-mail to rtm@mit.edu with the two or three papers for which you'd most like to lead discussion, and a few sentences about your relevant background and interests.

Questions

We'll examine existing systems in order to understand the design space. Here are some questions we'll gnaw on:

In what ways can we reasonably expect decentralized applications to be better than current web sites? More private? More censorship-resistant? More open choice of software to use with user data?
For what types of applications does decentralization seem most compelling?
Can we hope for a fixed "universal" storage service API that can be used by a wide array of in-browser JavaScript applications, or is application-specific server-side support likely to be needed?
Where should a user's data be stored? On the user's own computer? In cooperative peer-to-peer storage? In commercial cloud storage services such as Amazon S3?
How critical is it for data to be stored encrypted?
What are the options for access control using cryptography? How to provide ACLs and groups? How to support changes to ACLs and groups?
How to support global non-primary-key queries such as "all comments on this blog post" or "e-mail addressed to me"?
If users store different kinds of data in one storage system, they will likely want to ensure that e.g. their photo editor cannot read their e-mail; how to provide this kind of least privilege execution?
How to support applications such as news aggregators that involve data which isn't naturally owned by a single user (e.g. the front page)?
Abusive participants are likely to spam, lie about contributing p2p resources, manipulate votes and karma, hog resources, forge data, insert objectionable content, and launch denial-of-service attacks; are there decentralized defenses?
Untrusted storage services might try to serve stale data, or claim that data doesn't exist, or "equivocate" by showing different state to different users. What defenses are possible?
Do distributed hash tables (DHTs) have a place here? They allow keyed retrieval from peer-to-peer collections of computers, eliminating the need for cloud servers; but they are pretty vulnerable to attack (Sybil, eclipse, etc.) and have trouble guaranteeing freshness and absence of forks.
Do block chains have a place here? Existing block chains (e.g. Bitcoin) seem pretty good at guaranteeing freshness and absence of forks, which is impressive given that the participants can't be trusted. But they have trouble with scale (transaction rate and volume), they seem tied to novel currencies in an awkward way, and they require many participants in order to resist attack.
Some applications involve information that's notionally updated by multiple users. Should that be implemented by directly having different users modify the same storage? Or simulated by e.g. selecting the most recent of multiple versions maintained by different users?
Can one compose independent mechanisms to achieve the overall desired properties, e.g. storage, lookup, mutability, access control, payment, authentication, notification? Or does only a unified design make sense?
Do we need to worry about how storage and compute time (for queries) are paid for? Who pays when A modifies B's file, or when A performs an expensive query over B's data? Do we need ways to check that providers are really storing what they promise to store?

I don't know the answers, and I'm not even sure what the right questions are; I hope to learn as much as anyone from this course.
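
To make the DHT question above a bit more concrete, here is a minimal sketch of keyed retrieval from a collection of peers, assuming a consistent-hashing ring in which each key is stored on the first node at or after the key's position. The class name ToyDHT, the node names, and the in-memory stores are all hypothetical and stand in for real networked peers. The sketch ignores routing, replication, and churn, and it does nothing about Sybil or eclipse attacks, freshness, or forks.

```python
import hashlib
from bisect import bisect_left


def ring_position(name: str) -> int:
    """Map a node name or a data key to a point on a 2**256 ring."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


class ToyDHT:
    """Hypothetical single-process stand-in for a peer-to-peer DHT."""

    def __init__(self, node_names):
        if not node_names:
            raise ValueError("need at least one node")
        # Nodes sorted by their position on the ring.
        self.ring = sorted((ring_position(n), n) for n in node_names)
        self.positions = [pos for pos, _ in self.ring]
        # Each node's local key/value store (a dict here, a remote peer in reality).
        self.stores = {n: {} for n in node_names}

    def responsible_node(self, key: str) -> str:
        """Return the first node at or after the key's position, wrapping around."""
        i = bisect_left(self.positions, ring_position(key))
        return self.ring[i % len(self.ring)][1]

    def put(self, key: str, value: str) -> None:
        self.stores[self.responsible_node(key)][key] = value

    def get(self, key: str):
        # Returns None if the responsible node has no value for the key.
        return self.stores[self.responsible_node(key)].get(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("alice/profile", "hello")
print(dht.responsible_node("alice/profile"), dht.get("alice/profile"))
```

Note that the reader has to trust whichever node the key maps to: that node could return an old value or claim the key doesn't exist, which is exactly the weakness the question raises.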

Paper Discussion

Most class meetings will consist of paper discussions. Everyone should read each paper and be prepared to argue about it, both about technical details and the extent to which the paper's design and ideas seem promising. Each paper discussion will be led by a student. For the paper(s) you're assigned to lead, you should come to class prepared with:

An introduction (should be quick, since everyone has read the paper).
Where the paper fits into an overall picture of decentralized applications.
A summary of what we should learn from the paper: interesting ideas and lessons.
Opinions about whether the system is likely to be useful and successful.
Experience using and/or programming the system, if that is possible (download the software and run it, play with their web site, etc.). Feel free to project a demo from your laptop.
Explanations of important techniques that may be hard to understand.
Background or related information you find by following links, searching, or reading cited work.
Areas in which the system seems weak, particularly areas in which new research and ideas seem to be needed.
Questions for the class to consider.

Everyone should feel free to post questions about the papers on Piazza.

Projects

Everyone should do a project of their choice. Group projects are encouraged. The goal is to explore an idea, evaluate whether it makes sense, and write up the results. Milestones include a proposal, project conferences, a write-up in the form of a research paper, submitted code, and an in-class presentation and demo (see the calendar for dates). Project topics should be along similar lines to the papers we read. If you're not sure, please ask me. Here are some thoughts to serve as starting points for your own project ideas:

Build a decentralized storage system that provides fork consistency but scales well, or a library that helps applications build their own fork consistency on top of untrusted cloud storage.
Implement general-purpose access control (perhaps ACLs and groups) using cryptography on top of untrusted storage.
Build a storage system that supports decentralized applications which display information derived from large numbers of users, such as the Hacker News front page. Perhaps this can be done with mutable data items, or append-only or set data items, or with queries that examine many users' data.
Build a storage system that supports queries on multi-user shared data that's encrypted.
Build a system to improve the trustworthiness of decentralized application code, and code updates, downloaded into browsers.
If users store different kinds of data in the same storage system, they will want applications to run with limited privileges (so my photo editor can't steal my e-mail). Build a system that supports least-privilege execution of applications.
Build an ambitious multi-user decentralized application on top of some existing infrastructure, or on your own new infrastructure.
Build good scalable defenses against bad storage server behavior, particularly against equivocation. Fork consistency and block chains may be useful, but there seems to be a large gap between their performance and the request rate a large-scale storage system would need to support.
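
As one possible starting point for the equivocation ideas above, here is a minimal sketch of how two clients might detect that an untrusted server has shown them different histories. It assumes the server's state is an append-only log in which each entry's hash covers the previous hash, and that clients have some way to compare notes out of band. The function names and the string payloads are hypothetical. This is only a toy in the spirit of fork consistency, not an implementation of any particular system, and it has no signatures and no defense against stale reads.

```python
import hashlib

# Hash that precedes the first log entry.
GENESIS = "0" * 64


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash of a log entry; it commits to the entire history before it."""
    return hashlib.sha256(f"{prev_hash}|{payload}".encode()).hexdigest()


def chain_heads(payloads):
    """Return the head hash after each prefix of the log, starting with GENESIS."""
    heads = [GENESIS]
    for payload in payloads:
        heads.append(entry_hash(heads[-1], payload))
    return heads


def consistent(view_a, view_b) -> bool:
    """True if one client's log is a prefix of the other's.

    Because each head commits to everything before it, comparing the single
    head at the shorter log's length is enough.
    """
    heads_a, heads_b = chain_heads(view_a), chain_heads(view_b)
    n = min(len(heads_a), len(heads_b))
    return heads_a[n - 1] == heads_b[n - 1]


# The server shows Alice and Bob histories that differ in the second entry.
alice_view = ["post: hello", "comment: nice", "post: bye"]
bob_view = ["post: hello", "comment: rude", "post: bye"]

print(consistent(alice_view[:2], alice_view))  # Alice is just behind herself
print(consistent(alice_view, bob_view))        # the two views have forked
```

A client that is merely behind looks consistent here, so this check catches forks but not stale data. In a real design clients would exchange only a sequence number and head hash rather than whole logs, and how often they can afford to compare is part of the performance gap mentioned above.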

Grades

If it's clear you put significant effort into class discussion, the paper(s) for which you were the lead, the labs, and your project, you'll get an A.

Prerequisites

You'll need a 6.033-level understanding of the web, SSL, security, public key cryptography, and Bitcoin. You'll need to program in order to play with the systems we'll look at and to do a project; JavaScript will likely be the most useful language.