Amr El Abbadi
Computer Science, UC Santa Barbara
[email protected]
Collaborators: Divy Agrawal, Vaibhav Arora, Sudipto Das, Aaron Elmore, Hatem Mahmoud, Faisal Nawab, and Cetin Sahin.

Facebook:
◦ 1.4 billion users
◦ 140.3 billion friendships
Twitter in a day:
◦ 500 million tweets sent
YouTube in a day:
◦ 3 billion videos viewed
Stats from facebook.com, twitter.com and youtube.com

In 60 seconds on the Internet:
◦ 104+ hours of video uploaded to YouTube
◦ 42,408+ app downloads
◦ 153,804+ new photos uploaded on Facebook
◦ $263,947+ spent on web shopping
◦ 298,013+ new tweets
◦ 1,881,737+ YouTube video views
◦ 2,521,244+ search queries on Google
◦ 2,692,323+ new Facebook likes
◦ 20,234,009+ Flickr photo views
◦ 204,709,030+ emails sent over the Internet
Source: http://whathappensontheinternetin60seconds.com/

The classical web architecture: client sites connect through a load balancer (proxy) to a tier of app servers backed by a single database. The database becomes the scalability bottleneck and cannot leverage elasticity.

Replacing the database tier with key-value stores makes the back end scalable and elastic, but with limited consistency and operational flexibility.

Key-value data model
◦ The key is the unique identifier
◦ The value can be structured or unstructured
Every read or write of a single row is atomic.
Objective: make all operations single-sited.

Scale-up
◦ Classical enterprise setting (RDBMS)
◦ Flexible ACID transactions
◦ Transactions execute in a single node
Scale-out
◦ Cloud friendly (key-value stores)
◦ Execution at a single server
◦ Limited functionality and guarantees: no multi-row or multi-step transactions

Online social media need to be consistent!
◦ New, unanticipated applications
The activist's dilemma:
◦ Remove untrustworthy friend X as a friend
◦ Post "Demonstration Next Friday at YYY"
(from the Google Spanner presentation at OSDI 2012)

The design space spans key-value stores at one end and transactions with SQL at the other.

Application developers need higher-level abstractions:
◦ The MapReduce paradigm for Big Data analysis
◦ Transaction management in DBMSs
◦ The shared-log paradigm for developing applications

First-generation consistent systems: systems questioning the wisdom of abandoning proven data management principles, and a gradual realization of the value of the concept of a transaction. The common idea: avoid distributed transactions by co-locating data items that are accessed together.

One approach: start with the relational model and break it up.
Major challenge
◦ Data partitioning
Three example systems
◦ ElasTraS (UCSB)
◦ SQL Azure (MSR)
◦ Relational Cloud (MIT)

Pre-defined partitioning scheme
◦ e.g., tree schema: ElasTraS, SQL Azure
Workload-driven partitioning scheme
◦ e.g., Schism in Relational Cloud

The other approach: start with a key-value store and combine individual key-value pairs into larger granules of transactional access.
Megastore (Google)
◦ Groups statically defined by the apps
G-Store (UCSB)
◦ Groups dynamically defined by the apps
In many applications, access patterns evolve, often rapidly, so groups must be formed on demand; a minimal sketch of this idea follows.
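The following toy sketch illustrates the on-demand grouping idea in Python, assuming a simplified single-node key-value store. The names (KeyValueStore, create_group, transact) are illustrative only, not G-Store's actual interface, and the real system also handles ownership transfer, concurrency control, and recovery.

    # Toy illustration: form a key group on demand so that a multi-key transaction
    # can execute against a single site. Names are hypothetical, not G-Store's API.

    class KeyValueStore:
        def __init__(self):
            self.data = {}      # key -> value; each single-key read/write is atomic
            self.groups = {}    # group_id -> set of keys currently grouped together

        def create_group(self, group_id, keys):
            # In G-Store, group creation transfers ownership of the keys to one node.
            self.groups[group_id] = set(keys)

        def delete_group(self, group_id):
            # Groups are transient: they are dissolved once the application is done.
            self.groups.pop(group_id, None)

        def transact(self, group_id, updates):
            # Multi-key updates are allowed only within an existing group,
            # so the whole transaction stays single-sited.
            group = self.groups.get(group_id, set())
            if not set(updates) <= group:
                raise ValueError("all keys must belong to the group")
            for key, value in updates.items():
                self.data[key] = value

    # Usage: group the participants of one game instance, then update them together.
    store = KeyValueStore()
    store.create_group("game:42", ["player:1", "player:7"])
    store.transact("game:42", {"player:1": 10, "player:7": 3})
    store.delete_group("game:42")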
G-Store provides transactional access to a group of data items formed on demand. The group's leader runs the transaction manager, cache manager, and log; updates are propagated asynchronously to the followers.

Cloud outages are real:
"As a result they had no access to email, calendars, or - most importantly - their documents and Office Online applications"
"most of digital communication - email, Lync, Sharepoint - was out"
"Most of the other high-profile companies, including the likes of Amazon, have had substantial outages … cloud services are still in their infancy, and glitches like this are going to happen"

Need to tolerate catastrophic failures
◦ Geographic replication
Clients can access data in any datacenter; the data appears as a single copy with atomic access
◦ Distributed transactions!
Major challenges:
◦ Latency bottleneck (cross-datacenter communication)
◦ Throughput

Google's Spanner (OSDI 2012): a global-scale data infrastructure with TrueTime.
◦ Data is partitioned within a data center
◦ Replication across data centers uses Paxos
◦ Transactions use two-phase commit (2PC) layered on top of Paxos
(Figure: wide-area message flow among the client, the 2PC coordinator, and the Paxos replicas, counting the wide-area messages per commit.)
(Figure: wide-area latencies between data centers, in milliseconds.)

UCSB's Replicated Commit (VLDB 2013): execute the communication-expensive protocol within a data center.
◦ Paxos is used for fault tolerance across data centers
◦ The expensive 2PC runs within each data center
◦ Number of wide-area messages: 12

Transaction latency is still high
◦ At least ONE round-trip time is needed to detect conflicts
Is it possible to commit transactions faster?
◦ A lower bound on transaction latency
◦ How close can we get to the lower bound?

Consider two replicas, one in datacenter A and one in datacenter B, connected by a wide-area link with round-trip times of up to hundreds of milliseconds: a transaction must cross that link to detect conflicts.

The lower bound: while T1 requests to commit at A, a concurrent transaction T2 at B may both affect the outcome of T1 and be affected by T1. Therefore the commit latency of T1 plus the commit latency of T2 must be greater than or equal to the round-trip time between A and B.

Helios builds on this lower bound. A causally ordered log [Wuu & Bernstein 1984] is used to communicate transactions:
◦ Send transaction information at request time
◦ Send the transaction decision at commit time
◦ Transaction latency is the time between the request and the commit decision
Transactions are timestamped.

Assign each datacenter a target transaction latency (TL)
◦ The sum of any two TLs is greater than the RTT between the corresponding datacenters
At request time: send the transaction info to the other datacenters and calculate the commit time
◦ commit time = request time + TL
At commit time: check the transactions received from the other datacenters
◦ If there are no conflicts => commit
◦ Otherwise => abort

Example: RTT(A,B) = 8, Latency(A) = 5, Latency(B) = 3. Transaction T1 requests to commit at A at time 4, its log record is sent to B, and T1 waits until time 4 + 5 = 9; transactions requested before time 5 at B will be known to T1. Transaction T2 requests to commit at B at time 6 and waits until time 6 + 3 = 9; by then it already knows about T1. The paper gives a complete description of the protocol; time synchronization is not necessary for correctness.

Evaluation: five datacenters - CA, OR, VA, Ireland, Singapore. Comparing
◦ Helios: augmented with a fault-tolerance layer to tolerate 2 datacenter failures
◦ Replicated Commit: tolerates 2 datacenter failures
◦ 2PC/Paxos: tolerates 2 datacenter failures (VA is the leader)
Workload: 5 operations per transaction, 20% reads and 80% updates.
(Figure: read and commit latencies, in milliseconds, for Helios, Replicated Commit, and 2PC/Paxos.)
A sketch of the Helios commit rule appears below.
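The following sketch captures the commit-time rule described above, assuming simplified in-memory objects. The names (Datacenter, Txn, conflicts, broadcast) are hypothetical, and the causally ordered log, fault tolerance, and read handling of the real protocol are omitted.

    # Sketch of the Helios-style rule: commit time = request time + TL, with a
    # conflict check against transactions learned from the other datacenters.
    # All class and method names are hypothetical.

    from dataclasses import dataclass, field

    @dataclass
    class Txn:
        txn_id: str
        request_time: float                      # local time when commit was requested
        read_set: set = field(default_factory=set)
        write_set: set = field(default_factory=set)

    class Datacenter:
        def __init__(self, name, target_latency):
            self.name = name
            self.TL = target_latency             # target transaction latency (TL)
            self.remote_txns = []                # txns received from other datacenters

        def on_request(self, txn):
            # At request time: ship the transaction info and fix its commit time.
            self.broadcast(txn)                  # stands in for the causally ordered log
            return txn.request_time + self.TL    # commit time

        def on_commit_time(self, txn):
            # At commit time: commit only if no conflicting remote transaction arrived.
            for other in self.remote_txns:
                if conflicts(txn, other):
                    return "abort"
            return "commit"

        def broadcast(self, txn):
            pass  # the real protocol appends to a replicated, causally ordered log

    def conflicts(t1, t2):
        # Read-write or write-write overlap on any item counts as a conflict.
        return bool((t1.write_set & t2.write_set) or
                    (t1.write_set & t2.read_set) or
                    (t1.read_set & t2.write_set))

Choosing the TLs so that any two sum to more than the corresponding RTT is what lets each datacenter learn about the other's conflicting transactions before it reaches its own commit time, as in the example above, which is also why clock synchronization is not needed for correctness.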
Data confidentiality and access privacy
◦ Attacks: unauthorized accesses, side-channel attacks, etc.
◦ Inferences on access patterns or query results
Users send queries and data to the cloud servers and receive answers.

Current state-of-the-art solutions outsource an encrypted database to the cloud and query directly over the encrypted data.
Multiple levels of encryption
• CryptDB [SOSP'11] and MONOMI [VLDB'13]
• In the long term, the confidentiality level degrades to the weakest encryption level
Trusted/secure hardware
• TrustedDB [SIGMOD'11] and Cipherbase [CIDR'13]
• Relies on costly secure co-processors, which suffer from limited resources
How about access privacy? These systems do not secure access patterns. Is access privacy really necessary?

Access patterns leak information:
• They leak up to 80% of the search queries made to an encrypted email repository [Islam et al., NDSS'12]
• They reveal sensitive information in online behavioral advertising and web search [Backes et al., S&P'12]
• A search for a certain drug reveals a person's medical condition
• A search for a restaurant in a certain place reveals a person's location
Although encryption can easily be deployed to protect data confidentiality, it does not solve all privacy challenges posed by outsourcing to public clouds. To eliminate the inference of useful information, access patterns should be made oblivious.

Private Information Retrieval (PIR): retrieving an item from a database while completely hiding the identity of the retrieved record. The client's query q = "give me the ith record" is sent as encrypted(q); the server computes encrypted-result = f(X, encrypted(q)) over the database X = X1, X2, …, Xn and returns it, from which the client recovers Xi.
• Traditional PIR requires computational indistinguishability and is not more efficient than downloading the entire database [Sion, NDSS'07]
• Various optimizations require heavy server computation and do not provide sender security
Is there a better solution for access privacy?

Oblivious RAM (ORAM)
• ORAM has been used for outsourcing data storage
• It protects access patterns by shuffling and re-encrypting data on each access
Single-client ORAM constructions, where one client issues read or write queries against a database in the public cloud:
Classical constructions [Goldreich et al., STOC'87; JACM'96]
• Small client memory
• High amortized overhead
Recent practical constructions: SSS-ORAM [Stefanov et al., NDSS'12] and Path ORAM [Stefanov et al., CCS'13]
• More client memory
• Less amortized overhead

An Dương, Cổ Loa and the legend of the thousand-arrow magical crossbow.

Multi-client ORAM over a public cloud database:
Goodrich et al. [SIAM'12]
• Stores the ORAM state encrypted at the server
• Clients take turns to access the database
PrivateFS [Williams et al., CCS'12]
• Supports parallel access from different clients
• Clients communicate through a log in the server
• The level of parallelism is limited
ObliviStore [Stefanov et al., S&P'13]
• Multiple clients share a trusted proxy
• The proxy coordinates requests and runs the ORAM scheme
• Uses more client memory but is more practical

Setting: outsourcing data storage to the public cloud while benefiting from trusted infrastructure/hardware in private institute networks
• Universities, corporations, government agencies, and hospitals
• Secure communication between the clients and a trusted proxy

TaoStore: Tree-based Asynchronous Oblivious Storage
• Supports multiple client accesses for private institute networks
• Concurrent and non-blocking processing of requests
• Supports realistic (skewed) workloads
• Hides access patterns

TaoStore inherits its tree-based cloud storage design from Path ORAM [Stefanov et al., CCS'13]
• Each tree node is a bucket storing a fixed number of blocks
• A read operation is performed over a root-to-leaf path
• All buckets on the path are fetched to the trusted local infrastructure
A minimal sketch of this path-read step follows.
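The following sketch shows the path-read step in Python, assuming an in-memory binary tree. The class and field names are illustrative, and encryption, eviction, and write-back of the fetched path are omitted, so this is only a reading aid, not a complete or secure ORAM.

    # Sketch of a Path ORAM-style read: every block is mapped to a random leaf, and a
    # read fetches all buckets on the root-to-leaf path into trusted local storage.
    # Illustrative only: no encryption, eviction, or write-back.

    import random

    class TreeORAM:
        def __init__(self, height, bucket_size=4):
            self.height = height               # leaves are numbered 0 .. 2**height - 1
            self.bucket_size = bucket_size     # fixed number of blocks per bucket
            self.buckets = {}                  # (level, index) -> list of (block_id, value)
            self.position_map = {}             # block_id -> leaf, kept on the trusted side
            self.stash = {}                    # blocks currently held locally

        def path(self, leaf):
            # Bucket coordinates on the path from the root (level 0) down to the leaf.
            return [(level, leaf >> (self.height - level))
                    for level in range(self.height + 1)]

        def read(self, block_id):
            leaf = self.position_map.setdefault(
                block_id, random.randrange(2 ** self.height))
            # Fetch every bucket on the path into the trusted stash.
            for node in self.path(leaf):
                for bid, value in self.buckets.get(node, []):
                    self.stash[bid] = value
            # Remap the block to a fresh random leaf so the next access looks unrelated.
            self.position_map[block_id] = random.randrange(2 ** self.height)
            return self.stash.get(block_id)

In TaoStore, the stash together with a cached subtree forms the proxy's local cache, and the write-back happens through the batched flush described next.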
The TaoStore proxy sits on the trusted side, between the clients and the untrusted cloud storage:
• It maintains a dynamic local cache for fetched blocks, consisting of a stash and a subtree
• It performs incremental, non-blocking batch flush operations to write local blocks back to untrusted storage
• Its control unit comprises the local cache (stash and subtree), a position map, a thread pool, a flush controller, and the ORAM request handler that serves the clients

(Figure: response time in milliseconds and throughput in operations per second for Path ORAM, ObliviStore, and TaoStore under uniform and skewed workloads, with annotated improvements of roughly 46X and 94X.)

The Cloud is the inevitable choice for hosting Big Data ecosystems.
Big Data challenges:
◦ Volume, Velocity and Variety
◦ and PriVacy
Cloud challenges:
◦ Distribution and Consistency
◦ Fault-tolerance and efficiency