6.S974 Introduction

what do I mean by decentralized apps?
  apps built in a way that moves app code, and control over
  data, away from centrally-controlled web sites and into users' hands.

there is a lot of energy in this area, spurred by the success of
Bitcoin, new technical ideas, and growing dissatisfaction with the way
power has become centralized on the Internet.

old: a typical (centralized) web site
  [user browsers, net, site's web servers w/ app code, site's DB]
  users' data hidden behind proprietary app code
  e.g. blog posts, gmail, piazza, reddit comments, photo sharing,
    calendar, medical records, &c
  this arrangement has been very successful!
  why is this not ideal?
    users have to use this web site's UI if they want to see their data
    web site sets (and changes!) the rules for who gets access
    web site may snoop, sell information to advertisers
    web site's employees may snoop for personal reasons
    disappointing since it's often the user's own data!
  a design view of the problem:
    the big interface division is between users and app+data
    app+data integration is convenient for web site owner
    but this interface is UI-oriented (HTML, or web API),
    and is usually not good about giving users control and access to data

new: decentralized apps
  [user apps, net, general-purpose shared distributed storage]
  this architecture separates app code from user data
    the big interface division is between user+app and data
    the interface grants access to data -- so programs can use it
  I'm imagining an open general-purpose storage API
    users store different kinds of data (mail, calendar, blog posts, &c)
    modulo permissions, apps can see each others' data
    modulo permissions, users can share data, for multi-user apps
  I'm imagining the storage system lets users control access permissions

what's the point?
  easier for users to switch apps, since data is open
  easier to have apps that look at multiple kinds of data
    calendar/email, or backup, or file browser
  easier for users to switch storage provider
    data and apps don't have to change
  privacy vs snooping (assuming end-to-end encryption)
  harder to censor/block some users

how might decentralized applications work?
  here's one simple possibility.
  app: a to-do list shared by two users
    [UI x2, check-box list, "add" button]
  both contribute items to be done
  both can mark an item as finished
  we'll use a DHT (distributed hash table) for storage
    this is a peer-to-peer key/value database, which spreads keys over
      participants' computers in order to spread the load
    put(key, value)
    get(key) -> value
    the put() and get() client-side library routines figure out
      which computer holds any given key, and send a network request there
      e.g. with hash(key) mod NServers
    lots of apps are likely using the DHT, not just our to-do list
      I'm imagining a single global DHT service
  users U1 and U2 run apps on their computers
    maybe as JavaScript in browsers
    the apps call put() and get()
  the app doesn't have any associated server, it just uses the DHT
  how to represent to-do list items in the DHT?
    key = U1-U2-item-3, value = "get milk"
  "finished" marks?
    key = U1-U2-done-3, value = nil
  app get()s higher and higher item numbers until get() fails
  the point:
    the service is storage, independent of any application.
    so users can switch apps, write their own, add encryption to
    prevent snooping, delete their to-do lists, back them up,
    integrate with e-mail app, &c
  for a lab next week you'll use a more realistic system along these lines
    Blockstack
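
a minimal sketch of the to-do example above, in Python rather than
browser JavaScript. ToyDHT is a hypothetical in-memory stand-in for the
DHT (each "server" is just a dict), and the helper names add_item,
mark_done, and load_list are made up for illustration. two assumptions
that differ from the notes: item numbers start at 0, and a "done" mark
stores an empty string rather than nil, since get() here returns None
to mean "no such key".

```python
import hashlib


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT: each 'server' is a dict."""

    def __init__(self, n_servers):
        self.servers = [{} for _ in range(n_servers)]

    def _server_for(self, key):
        # hash(key) mod NServers; SHA-256 so every client picks the same server
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest, "big") % len(self.servers)

    def put(self, key, value):
        self.servers[self._server_for(key)][key] = value

    def get(self, key):
        # None plays the role of "get() fails"
        return self.servers[self._server_for(key)].get(key)


def add_item(dht, list_id, text):
    # probe higher and higher item numbers until one is unused
    n = 0
    while dht.get(f"{list_id}-item-{n}") is not None:
        n += 1
    dht.put(f"{list_id}-item-{n}", text)
    return n


def mark_done(dht, list_id, n):
    # the presence of the key is the mark; the value carries no information
    dht.put(f"{list_id}-done-{n}", "")


def load_list(dht, list_id):
    # get() higher and higher item numbers until get() fails
    items = []
    n = 0
    while True:
        text = dht.get(f"{list_id}-item-{n}")
        if text is None:
            break
        done = dht.get(f"{list_id}-done-{n}") is not None
        items.append((n, text, done))
        n += 1
    return items
```

two things the sketch makes visible: if both users add an item at the
same moment they can pick the same item number, and one put() silently
overwrites the other; and mod-NServers placement moves most keys
whenever NServers changes, which is why real DHTs typically use
consistent hashing instead.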

what could go wrong?
  decentralization is painful:
    simple put/get storage much less flexible than dedicated SQL DB
    no trusted server to e.g. look at auction bids w/o revealing them
    cryptographic privacy/authentication makes everything else harder
    awkward for users as well as programmers
  current web site architecture works very well
    easy to program
    central control over software+data makes changes (and debugging) easy
    good solutions for performance, reliability
    successful revenue model (ads)
  worries about current situation are not overwhelming
    could be fixed with laws about privacy or user rights
    or with modest technology evolution, like APIs and secure e-mail
    or ignored

course structure
  http://nil.lcs.mit.edu
  class meetings will be paper discussions (not lectures)
    either a research paper, or a real-life project
  each discussion will have a leader
    extract lessons, track down hard details, raise questions
    use or program the system, if that makes sense
    please e-mail me with your top three choices
  everyone should read and think about each paper
    be prepared to ask/answer questions, criticize, support, &c
  two Blockstack labs; see writeups on course calendar
    the point is to get hands-on experience
    first lab due on Tuesday (when we read Blockstack)
      the goal is to get a 10- or 20-line program to work
      feel free to ask questions on Piazza
  projects
    pick an idea, design it, build it, evaluate whether the idea was good
    doesn't have to be research
    proposal, conferences, report, short presentation
    groups of 2 or 3 are OK

Some questions we'll wrestle with:

* What's the expected benefit? What ultimate goals can the technology
reasonably achieve? Maybe we're angry that big Internet companies have
a monopoly hold on our data, or seem to be snooping on us, or sell us
as a product to advertisers. And intuitively it seems like it ought to
help to move data out of web sites, into neutral storage, separate
from applications. Or maybe we just think the technology is neat. But
it would be good to have a solid argument for what will be improved
and why (why more private, less snooping, less censoring, more
flexible, &c).

* What's the killer app? Or, for what class of applications is
decentralization compelling? My own private data? Small group
communication / closed forums? Sensitive interactions like medical
records and online dating? Big social networks? Open aggregators like
Reddit?

* where to store user data? on each user's own machine; or cooperative
p2p e.g. DHT; or paid cloud providers like Amazon. do you still get
the benefits if you store your data on a commercial third-party cloud
server, e.g. can you still control who can see your data? will
anything other than commercial cloud be reliable enough?

* how to pay storage providers? credit cards, to commercial outfits,
like Amazon AWS? most people aren't used to paying for services on the
web! do we need to pay for queries too, and perhaps other people
fetching my data? or maybe ordinary people put servers online and
charge users via Bitcoin? perhaps smart contracts that don't pay
unless the service can prove it stored the data?

* how much do we trust storage servers? do we trust them to keep our
data alive and accessible? if we trust them that far, would it
also make sense to trust them to enforce access control?

* if we don't trust storage servers, we probably need to encrypt data.
do we also need to hide who is accessing data, and which data they are
accessing? do we need strong verification of read consistency?
these questions matter because cryptography often makes everything
else more awkward and complex.

* how to provide flexible access control, e.g. groups, particularly if
data is encrypted? encrypting for each user is ok for small sets of
users, but may be expensive for lots of users, especially when
deleting users from groups.
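
a rough sketch of why removing a group member is the expensive case,
assuming the common "wrap one group key for each member" design. all
names here (Group, toy_encrypt, wrap_count) are hypothetical. toy_encrypt
is an insecure XOR stand-in so the example runs with only the standard
library, and the per-member symmetric keys stand in for what would
normally be per-member public keys.

```python
import hashlib
import os


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Illustration only, NOT secure."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))


toy_decrypt = toy_encrypt  # XOR is its own inverse


class Group:
    def __init__(self, member_keys):
        self.member_keys = dict(member_keys)  # name -> that member's key
        self.wrap_count = 0                   # counts per-member encryptions
        self._rekey()

    def _rekey(self):
        # pick a fresh group key and wrap it once for every current member
        self.group_key = os.urandom(32)
        self.wrapped = {}
        for name, key in self.member_keys.items():
            self.wrapped[name] = toy_encrypt(key, self.group_key)
            self.wrap_count += 1

    def remove(self, name):
        # the removed member knows the old group key, so everyone else
        # needs a new one: one wrap per remaining member
        del self.member_keys[name]
        self._rekey()

    def key_for(self, name, member_key):
        return toy_decrypt(member_key, self.wrapped[name])
```

adding a member costs one wrap, but removing one costs a wrap per
remaining member, and any data the group should keep private from the
removed member must also be re-encrypted under the new group key.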

* least privilege: if users store lots of different kinds of
information in one storage infrastructure, e.g. both photos and
e-mail, how can we ensure that the photo editor doesn't snoop on the
user's e-mail? after all, users will likely run lots of random
not-very-trustworthy apps.

* users need help keeping track of their own cryptographic keys,
recovering forgotten keys, revoking keys when personal devices are
stolen, finding the latest public keys of other users, and learning
about other users' revocations. these are old problems, but it's not
clear good solutions exist.

* how can we get well-behaved storage despite untrusted (perhaps
malicious) storage servers? clients can encrypt and sign to rule out
simple theft and forgery. what about stale versions of data, or "data
doesn't exist", or "equivocation" by showing different versions or
subsets of data to different users? the dangers are clearest when
managing money, since they might allow double-spending, but they come
up for other kinds of data too (e.g. hiding the deletion of a name
from a file that lists people who hold security clearances).
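
a small sketch of what signing plus version numbers can and cannot
catch. the Reader class and sign() helper are hypothetical, and HMAC
with a shared key stands in for a real public-key signature so the
example needs only the standard library.

```python
import hashlib
import hmac


def sign(key: bytes, name: str, version: int, value: bytes) -> bytes:
    # the tag covers the object's name and version as well as its value,
    # so a server cannot relabel an old value as a new version
    msg = name.encode("utf-8") + b"|" + str(version).encode("ascii") + b"|" + value
    return hmac.new(key, msg, hashlib.sha256).digest()


class Reader:
    def __init__(self, key: bytes):
        self.key = key
        self.latest = {}  # name -> highest version this reader has accepted

    def accept(self, name, version, value, tag):
        if not hmac.compare_digest(tag, sign(self.key, name, version, value)):
            return "forged"
        if version < self.latest.get(name, 0):
            return "stale"  # validly signed, but older than something already seen
        self.latest[name] = version
        return "ok"
```

the check only catches a stale version if this reader has already seen
a newer one. a reader with no history, or two readers who never compare
notes, can still be shown old or differing versions, which is the
equivocation problem.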

* the block-chain aspect of Bitcoin has attractive consistency
properties, e.g. a story for dealing with equivocation. can block-chains be made
fast enough to form the basis for storage systems? or, if slow, can
they be used to help verify data retrieved from a faster storage
system? do block-chains make sense if not associated with a
crypto-currency?

* do we need automated audits to check that storage services are
actually storing what they have promised to store? or other technical
means to motivate them? what can we do if they are seen to misbehave?
this problem may be particularly acute for cooperative p2p storage
schemes.
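
one simple way such an audit might look, as a sketch: before uploading,
the client precomputes a few (nonce, expected hash) pairs, keeps only
those, and later spends one pair per audit. the function names are
hypothetical, and this is a basic challenge-response check rather than
any particular published proof-of-storage scheme.

```python
import hashlib
import os


def make_challenges(data: bytes, count: int):
    """Run by the client before upload; only these small pairs are kept locally."""
    challenges = []
    for _ in range(count):
        nonce = os.urandom(16)
        expected = hashlib.sha256(nonce + data).digest()
        challenges.append((nonce, expected))
    return challenges


def server_answer(stored: bytes, nonce: bytes) -> bytes:
    # an honest server needs the complete data to compute this hash
    return hashlib.sha256(nonce + stored).digest()


def audit(challenge, answer: bytes) -> bool:
    # each challenge should be used only once, since the server could
    # remember the answer to a nonce it has already seen
    nonce, expected = challenge
    return answer == expected
```

the client can only audit as many times as it has precomputed
challenges, and the server has to read the whole object for each audit,
so this is cheap for the client but not for the server.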

* how to retrieve all comments made on a given post, all messages
addressed to me, recent links submitted to Reddit, &c? the problem is
that the querier doesn't know a unique key for the data, but most
large-scale storage schemes want simple keys. does the app need to scan
all storage providers, all DHT nodes, &c? or should clients store and
update a big index (and pay for it)? who owns the index; who can write
it; who pays for it? how can we verify that it's correct, given that
we may not trust the servers, and the index is often the only way we
can find data? or maybe we can have objects that are writeable or
appendable by many people (to store lists)? though the same worries
apply.

* web sites are routinely attacked with denial of service, spam, fraud
(e.g. spurious voting), fake users, and warez. it's hard for any web
site to defend itself; are there good decentralized defenses?

* is the app / general-purpose storage split reasonable? is put/get
enough? do we need application-specific servers? e.g. for content
search in Reddit, or to enforce complex access control rules, to
detect spam and vote fraud, to enforce rules and agreement for
eBay bidding, to do online dating matching? will dependence on
application-specific servers undermine the charm of decentralization?

* what to do about data that's not naturally owned by a specific user
(who can sign it, pay for it, update it, &c)? e.g. information about
the Reddit front page, or vote counts?

* users will have to trust app code, e.g. JavaScript apps running in
the browser. how will users know they are running the right code? how
to update code securely? are client-side app environments isolated
enough to run sensitive apps?

* suppose all data can be used (modulo permissions) by an open-ended
set of applications. will that require lots of standardization to be
useful? will proprietary formats creep in? if there are complex
multi-user apps, e.g. decentralized Piazza, with multiple
implementations, how to keep the implementations in sync w.r.t. data
formats? experiences like Bitcoin show that it can be done, but that
it can also be painful.

for next tuesday:
  read the Blockstack paper
  do lab 1