6.S974 Introduction

what do I mean by decentralized apps?
  apps built in a way that moves app code, and control over data,
    away from centrally-controlled web sites and into users' hands.
  there is a lot of energy in this area, spurred by the success of
    Bitcoin, new technical ideas, and growing dissatisfaction with
    the way power has become centralized on the Internet.

old: a typical (centralized) web site
  [user browsers, net, site's web servers w/ app code, site's DB]
  users' data hidden behind proprietary app code
  e.g. blog posts, gmail, piazza, reddit comments, photo sharing,
    calendar, medical records, &c
  this arrangement has been very successful!

why is this not ideal?
  users have to use this web site's UI if they want to see their data
  web site sets (and changes!) the rules for who gets access
  web site may snoop, sell information to advertisers
  web site's employees may snoop for personal reasons
  disappointing since it's often the user's own data!

a design view of the problem:
  the big interface division is between users and app+data
  app+data integration is convenient for web site owner
  but this interface is UI-oriented (HTML, or web API),
    and is usually not good about giving users control and access to data

new: decentralized apps
  [user apps, net, general-purpose shared distributed storage]
  this architecture separates app code from user data
  the big interface division is between user+app and data
    the interface grants access to data -- so programs can use it
  I'm imagining an open general-purpose storage API
    users store different kinds of data (mail, calendar, blog posts, &c)
    modulo permissions, apps can see each others' data
    modulo permissions, users can share data, for multi-user apps
  I'm imagining the storage system lets users control access permissions

what's the point?
  easier for users to switch apps, since data is open
  easier to have apps that look at multiple kinds of data
    calendar/email, or backup, or file browser
  easier for users to switch storage provider
    data and apps don't have to change
  privacy vs snooping (assuming end-to-end encryption)
  harder to censor/block some users

how might decentralized applications work?
  here's one simple possibility.
  app: a to-do list shared by two users
    [UI x2, check-box list, "add" button]
    both contribute items to be done
    both can mark an item as finished
  we'll use a DHT (distributed hash table) for storage
    this is a peer-to-peer key/value database, which spreads keys
      over participants' computers in order to spread the load
    put(key, value)
    get(key) -> value
    the put() and get() client-side library routines can figure out
      what computer holds any given key, sends network request there
      e.g. with hash(key) mod NServers
    lots of apps are likely using the DHT, not just our to-do list
      I'm imagining a single global DHT service
  users U1 and U2 run apps on their computers
    maybe as JavaScript in browsers
    the apps call put() and get()
    the app doesn't have any associated server, it just uses the DHT
  how to represent to-do list items in the DHT?
    key = U1-U2-item-3, value = "get milk"
  "finished" marks?
    key = U1-U2-done-3, value = nil
  app get()s higher and higher item numbers until get() fails
  the point: the service is storage, independent of any application.
    so users can switch apps, write their own, add encryption to
    prevent snooping, delete their to-do lists, back them up,
    integrate with e-mail app, &c
  for a lab next week you'll use a more realistic system along these lines
    Blockstack

what could go wrong?
  decentralization is painful:
    simple put/get storage much less flexible than dedicated SQL DB
    no trusted server to e.g.
      look at auction bids w/o revealing
    cryptographic privacy/authentication makes everything else harder
    awkward for users as well as programmers
  current web site architecture works very well
    easy to program
    central control over software+data makes changes (and debugging) easy
    good solutions for performance, reliability
    successful revenue model (ads)
  worries about current situation are not overwhelming
    could be fixed with laws about privacy or user rights
    or with modest technology evolution, like APIs and secure e-mail
    or ignored

course structure
  http://nil.lcs.mit.edu
  class meetings will be paper discussions (not lectures)
    either a research paper, or a real-life project
  each discussion will have a leader
    extract lessons, track down hard details, raise questions
    use or program the system, if that makes sense
    please e-mail me with your top three choices
  everyone should read and think about each paper
    be prepared to ask/answer questions, criticize, support, &c
  two Blockstack labs; see writeups on course calendar
    the point is to get hands-on experience
    first lab due on Tuesday (when we read Blockstack)
    the goal is to get a 10- or 20-line program to work
    feel free to ask questions on Piazza
  projects
    pick an idea, design, build, evaluate if the idea was good
    doesn't have to be research
    proposal, conferences, report, short presentation
    groups of 2 or 3 are OK

Some questions we'll wrestle with:

* What's the expected benefit? What ultimate goals can the technology
  reasonably achieve? Maybe we're angry that big Internet companies
  have a monopoly hold on our data, or seem to be snooping on us, or
  sell us as a product to advertisers. And intuitively it seems like
  it ought to help to move data out of web sites, into neutral
  storage, separate from applications. Or maybe we just think the
  technology is neat. But it would be good to have a solid argument
  for what will be improved and why (why more private, less snooping,
  less censoring, more flexible, &c).

* What's the killer app?
  Or, for what class of applications is decentralization compelling?
  My own private data? Small group communication / closed forums?
  Sensitive interactions like medical records and online dating?
  Big social networks? Open aggregators like Reddit?

* where to store user data? on each user's own machine; or
  cooperative p2p e.g. DHT; or paid cloud providers like Amazon. do
  you still get the benefits if you store your data on a commercial
  third-party cloud server, e.g. can you still control who can see
  your data? will anything other than commercial cloud be reliable
  enough?

* how to pay storage providers? credit cards, to commercial outfits,
  like Amazon AWS? most people aren't used to paying for services on
  the web! do we need to pay for queries too, and perhaps other
  people fetching my data? or maybe ordinary people put servers
  online and charge users via Bitcoin? perhaps smart contracts that
  don't pay unless the service can prove it stored the data?

* how much do we trust storage servers? do we trust them to keep our
  data alive and accessible? if we trust them that far, would it also
  make sense to trust them to enforce access control?

* if we don't trust storage servers, we probably need to encrypt
  data. do we also need to hide who is accessing data, and which data
  they are accessing? do we need strong verification of read
  consistency? important because cryptography often makes everything
  else more awkward and complex.

* how to provide flexible access control, e.g. groups. particularly
  if data is encrypted. ok for small sets of users (encrypt for
  each), may be expensive for lots of users, especially when deleting
  users from groups.

* least privilege: if users store lots of different kinds of
  information in one storage infrastructure, e.g. both photos and
  e-mail, how can we ensure that the photo editor doesn't snoop on
  the user's e-mail? after all, users will likely run lots of random
  not-very-trustworthy apps.

* users need help keeping track of their own cryptographic keys,
  recovering forgotten keys, revoking keys when personal devices are
  stolen, finding the latest public keys of other users, and learning
  about other users' revocations. these are old problems, but it's
  not clear good solutions exist.

* how can we get well-behaved storage despite untrusted (perhaps
  malicious) storage servers? clients can encrypt and sign to rule
  out simple theft and forgery. what about stale versions of data, or
  "data doesn't exist", or "equivocation" by showing different
  versions or subsets of data to different users? the dangers are
  clearest when managing money, since they might allow
  double-spending, but they come up for other kinds of data too (e.g.
  hiding deletion from a file listing people who hold security
  clearances).

* the block-chain aspect of Bitcoin has attractive consistency
  properties, e.g. a story for equivocation. can block-chains be made
  fast enough to form the basis for storage systems? or, if slow, can
  they be used to help verify data retrieved from a faster storage
  system? do block-chains make sense if not associated with a
  crypto-currency?

* do we need automated audits to check that storage services are
  actually storing what they have promised to store? or other
  technical means to motivate them? what can we do if they are seen
  to misbehave? this problem may be particularly acute for
  cooperative p2p storage schemes.

* how to retrieve all comments made on a given post, all messages
  addressed to me, recent links submitted to Reddit, &c? the problem
  is that the querier doesn't know a unique key for the data, but
  most large-scale storage schemes want simple keys. does app need to
  scan all storage providers, all DHT nodes, &c? or should clients
  store and update a big index (and pay for it)? who owns the index;
  who can write it; who pays for it?
  how can we verify that it's correct, given that we may not trust
  the servers, and the index is often the only way we can find data?
  or maybe we can have objects that are writeable or appendable by
  many people (to store lists)? though the same worries apply.

* web sites are routinely attacked with denial of service, spam,
  fraud (e.g. spurious voting), fake users, and warez. it's hard for
  any web site to defend itself; are there good decentralized
  defenses?

* is the app / general-purpose storage split reasonable? is put/get
  enough? do we need application-specific servers? e.g. for content
  search in reddit, or to enforce complex access control rules, to
  detect spam and vote fraud, to enforce rules and enforce agreement
  for ebay bidding, to do online dating matching? will dependence on
  application-specific servers undermine the charm of
  decentralization?

* what to do about data that's not naturally owned by a specific user
  (who can sign it, pay for it, update it, &c)? e.g. information
  about the Reddit front page, or vote counts?

* users will have to trust app code, e.g. JavaScript apps running in
  the browser. how will users know they are running the right code?
  how to update code securely? are client-side app environments
  isolated enough to run sensitive apps?

* suppose all data can be used (modulo permissions) by an open-ended
  set of applications. will that require lots of standardization to
  be useful? will proprietary formats creep in? if there are complex
  multi-user apps, e.g. decentralized Piazza, with multiple
  implementations, how to keep the implementations in sync w.r.t.
  data formats? experiences like Bitcoin show that it can be done,
  but that it can also be painful.

for next tuesday:
  read the Blockstack paper
  do lab 1
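
appendix: a sketch of the shared to-do list on a DHT
  this is a minimal illustration of the put/get design from the
  "how might decentralized applications work?" part of these notes,
  not the lab code and not Blockstack. assumptions that are mine:
  the names ToyDHT, add_item, mark_done, and list_items are
  hypothetical; the "DHT" is just a few in-process dicts standing in
  for participants' computers; SHA-256 stands in for hash(); item
  numbers start at 1; get() "fails" by raising KeyError.

```python
import hashlib


class ToyDHT:
    """Hypothetical stand-in for the global DHT: one dict per 'server'."""

    def __init__(self, nservers=4):
        self.servers = [dict() for _ in range(nservers)]

    def _server_for(self, key):
        # hash(key) mod NServers; SHA-256 is an assumption, chosen so that
        # placement is the same on every client and every run.
        h = int.from_bytes(hashlib.sha256(key.encode()).digest(), "big")
        return self.servers[h % len(self.servers)]

    def put(self, key, value):
        self._server_for(key)[key] = value

    def get(self, key):
        # Raises KeyError when the key is absent ("get() fails").
        return self._server_for(key)[key]


PREFIX = "U1-U2"  # key prefix for the list shared by users U1 and U2


def exists(dht, key):
    try:
        dht.get(key)
        return True
    except KeyError:
        return False


def add_item(dht, text):
    # Probe higher and higher item numbers until get() fails, then use
    # that number. Two users adding at the same moment could pick the
    # same number; this sketch does nothing about that.
    n = 1
    while exists(dht, f"{PREFIX}-item-{n}"):
        n += 1
    dht.put(f"{PREFIX}-item-{n}", text)
    return n


def mark_done(dht, n):
    # A "finished" mark is a key whose presence matters; the value is nil.
    dht.put(f"{PREFIX}-done-{n}", None)


def list_items(dht):
    items = []
    n = 1
    while exists(dht, f"{PREFIX}-item-{n}"):
        text = dht.get(f"{PREFIX}-item-{n}")
        items.append((n, text, exists(dht, f"{PREFIX}-done-{n}")))
        n += 1
    return items


dht = ToyDHT()
add_item(dht, "get milk")
add_item(dht, "pay rent")
mark_done(dht, 1)
print(list_items(dht))  # [(1, 'get milk', True), (2, 'pay rent', False)]
```

  note that nothing here is specific to the to-do app except the key
  naming convention: any other program that knows the U1-U2-item-N
  and U1-U2-done-N keys could read, back up, or re-display the list.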