6.S974, Decentralized Applications, Fall 2018
Introduction
The goal of 6.S974 is to understand recent efforts in
decentralized applications, to learn what the main design trade-offs
are, and to identify areas for new research.
Users often entrust their data to web sites (e.g. e-mail,
photo-sharing, blogging, and social networking sites) in return for the
ability to share their data and to interact with other users. However,
such web sites typically give users little direct control over their
own data: the web site sets the rules for who can see the user's data,
the web site's software and employees can look at or reveal a user's
data, only the web site's own software can be used with the user's
data, and it is rarely easy for the user to move their data from one
web site to a competitor. The goal of decentralized applications is to
give users more control over their own data, while supporting flexible
sharing and interaction with other users over the Internet. While
there exist some promising technologies for decentralized application
infrastructure, change is rapid and there are likely many areas where
advances are possible.
Class meetings will consist of discussions of recent papers and
deployed systems. Each student will lead discussion of one or more
papers; do a handful of small labs; and design and build a substantial
project, including a written report (roughly 6 pages) and presentation to the class.
There will be no exams.
Signing Up
If you'd like to take this course, please send e-mail to rtm@mit.edu
with the two or three papers for which you'd most like to lead discussion,
and a few sentences about your relevant background and interests.
Questions
We'll examine existing systems in order to understand the design
space. Here are some questions we'll gnaw on:
- In what ways can we reasonably expect decentralized applications
to be better than current web sites? More private? More
censorship-resistant? More open choice of software to use with user data?
- For what types of applications does decentralization seem most
compelling?
- Can we hope for a fixed "universal" storage service API that can
be used by a wide array of in-browser JavaScript applications, or is
application-specific server-side support likely to be needed?
- Where should a user's data be stored? On the user's own computer?
In cooperative peer-to-peer storage? In commercial cloud storage
services such as Amazon S3?
- How critical is it for data to be stored encrypted?
- What are the options for access control using cryptography? How to
provide ACLs and groups? How to support changes to ACLs and groups?
- How to support global non-primary-key queries such as "all
comments on this blog post" or "e-mail addressed to me"?
- If users store different kinds of data in one storage system, they
will likely want to ensure that e.g. their photo editor cannot read
their e-mail; how to provide this kind of least privilege execution?
- How to support applications such as news aggregators that involve
data which isn't naturally owned by a single user (e.g. the front page)?
- Abusive participants are likely to spam, lie about contributing
p2p resources, manipulate votes and karma, hog resources, forge data,
insert objectionable content,
and launch denial-of-service attacks; are there decentralized
defenses?
- Untrusted storage services might try to serve stale data, or claim
that data doesn't exist, or "equivocate" by showing different state to
different users. What defenses are possible?
- Do distributed hash tables (DHTs) have a place here? They allow
keyed retrieval from peer-to-peer collections of computers,
eliminating the need for cloud servers; but they are pretty vulnerable
to attack (Sybil, eclipse, etc.) and have trouble guaranteeing
freshness and absence of forks.
- Do block chains have a place here? Existing block chains (e.g.
Bitcoin) seem pretty good at guaranteeing freshness and absence of
forks, which is impressive given that the participants can't be
trusted. But they have trouble with scale (transaction rate and
volume), they seem tied to novel currencies in an awkward way, and
they require many participants in order to resist attack.
- Some applications involve information that's notionally updated by
multiple users. Should that be implemented by directly having
different users modify the same storage? Or simulated by e.g.
selecting the most recent of multiple versions maintained by different
users?
- Can one compose independent mechanisms to achieve the overall
desired properties, e.g. storage, lookup, mutability, access control,
payment, authentication, notification? Or does only a unified design
make sense?
- Do we need to worry about how storage and compute time (for
queries) are paid for? Who pays when A modifies B's file, or when A
performs an expensive query over B's data? Do we need ways to check
that providers are really storing what they promise to store?
I don't know the answers, and I'm not even sure what the right
questions are; I hope to learn as much as anyone from this course.
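To make the DHT question above concrete, here is a minimal Python sketch of keyed placement: each key is assigned to the nodes whose hashed IDs are closest to the key's hash under XOR distance, the metric Kademlia uses. The node names, the `ident` and `closest_nodes` helpers, and the replication factor `k` are hypothetical, chosen for illustration only. The sketch does no networking and includes none of the defenses against Sybil or eclipse attacks that a real DHT would need.

```python
import hashlib
from typing import List


def ident(name: str) -> int:
    """Map a node name or data key to a 256-bit integer ID."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")


def closest_nodes(key: str, nodes: List[str], k: int = 2) -> List[str]:
    """Return the k nodes whose IDs are nearest the key's ID by XOR distance."""
    target = ident(key)
    return sorted(nodes, key=lambda n: ident(n) ^ target)[:k]


if __name__ == "__main__":
    # Hypothetical participants; any client computes the same placement.
    nodes = ["alice-laptop", "bob-desktop", "carol-phone", "dave-server"]
    print(closest_nodes("photo:1234", nodes))
```

Because every participant can compute the same placement, no central directory is needed. In this sketch the same property would let an attacker search for node names whose IDs land near a chosen key, which hints at why the attacks mentioned above are a concern.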
Paper Discussion
Most class meetings will consist of paper discussions. Everyone should
read each paper and be prepared to argue about it, both about
technical details and the extent to which the paper's design and ideas
seem promising. Each paper discussion will be led by a student. For
the paper(s) you're assigned to lead, you should come to class
prepared with:
- An introduction (should be quick, since everyone has read the paper).
- Where the paper fits into an overall picture of decentralized
applications.
- A summary of what we should learn from the paper: interesting
ideas and lessons.
- Opinions about whether the system is likely to be useful
and successful.
- Experience using and/or programming the system, if that is
possible (download the software and run it, play with their
web site, etc.). Feel free to project a demo from your laptop.
- Explanations of important techniques that may be
hard to understand.
- Background or related information you find by following links,
searching, or reading cited work.
- Areas in which the system seems weak, particularly areas
in which new research and ideas seem to be needed.
- Questions for the class to consider.
Everyone should feel free to post questions about the papers on
Piazza.
Projects
Everyone should do a project of their choice. Group projects are encouraged.
The goal is to explore an idea, evaluate whether it makes sense,
and write up the results. Milestones include a proposal, project
conferences, a write-up in the form of a research paper, submitted
code, and an in-class presentation and demo (see the calendar for
dates). Project topics should be along similar lines to the papers we
read. If you're not sure, please ask me.
Here are some thoughts to serve as starting points for your own
project ideas:
- Build a decentralized storage system that provides fork
consistency but scales well, or a library that helps applications
build their own fork consistency on top of untrusted cloud
storage.
- Implement general-purpose access control (perhaps ACLs and groups)
using cryptography on top of untrusted storage.
- Build a storage system that supports decentralized applications
which display information derived from large numbers of users, such as
the Hacker News front page. Perhaps this can be done with mutable data
items, or append-only or set data items, or with queries that examine
many users' data.
- Build a storage system that supports queries on multi-user shared data
that's encrypted.
- Build a system to improve the trustworthiness of decentralized
application code, and code updates, downloaded into browsers.
- If users store different kinds of data in the same storage
system, they will want applications to run with limited privileges (so
my photo editor can't steal my e-mail). Build a system that supports
least-privilege execution of applications.
- Build an ambitious multi-user decentralized application on top of
some existing infrastructure, or on your own new infrastructure.
- Build good scalable defenses against bad storage server behavior,
particularly against equivocation. Fork consistency and block chains
may be useful, but there seems to be a large gap between their
performance and the request rate a large-scale storage system would
need to support.
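As a starting point for the fork-consistency and equivocation ideas above, here is a minimal Python sketch of how clients might notice a server showing them divergent histories. Each client folds the operations it is shown into a hash chain and remembers the chain head at every length; two clients that exchange a (length, head) pair can then tell whether their views diverge. The `ClientView` class and the `extend` and `contradicts` functions are hypothetical names, not taken from any system the course covers, and the sketch omits signatures, networking, and the performance concerns raised above.

```python
import hashlib

GENESIS = "0" * 64  # head of the empty log


def extend(head: str, entry: str) -> str:
    """Fold one log entry into the running hash-chain head."""
    return hashlib.sha256((head + entry).encode("utf-8")).hexdigest()


class ClientView:
    """One client's record of the log the server has shown it."""

    def __init__(self) -> None:
        self.heads = [GENESIS]  # heads[i] is the head after i entries

    def observe(self, entry: str) -> None:
        """Record one more entry served by the (untrusted) storage server."""
        self.heads.append(extend(self.heads[-1], entry))

    def latest(self):
        """Return (length, head) to send to other clients out of band."""
        return len(self.heads) - 1, self.heads[-1]


def contradicts(view: ClientView, length: int, head: str) -> bool:
    """True if another client's (length, head) conflicts with our history.

    If the other client is ahead of us we cannot check yet, so return False.
    """
    if length >= len(view.heads):
        return False
    return view.heads[length] != head


if __name__ == "__main__":
    alice, bob, carol = ClientView(), ClientView(), ClientView()
    for entry in ["a", "b", "c"]:
        alice.observe(entry)
    for entry in ["a", "b"]:
        bob.observe(entry)      # same history as Alice, just shorter
    for entry in ["a", "x"]:
        carol.observe(entry)    # the server showed Carol a different entry
    print(contradicts(alice, *bob.latest()))
    print(contradicts(alice, *carol.latest()))
```

In the usage above, Bob's view is a prefix of Alice's, so no conflict is reported, while Carol's differs at the second entry and is flagged. Detection depends on clients having some channel for comparing heads that the server does not control.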
Grades
If it's clear you put significant effort into class discussion, the
paper(s) for which you were the lead, the labs, and your project, you'll get an A.
Pre-requisites
You'll need a 6.033-level understanding of the web, SSL,
security, public key cryptography, and Bitcoin. You'll need to
program in order to play with the systems we'll look at and to do a
project; JavaScript will likely be the most useful language.