Not Only SQL

Table of Contents
 Background and history
 Applications
 What is Cassandra? – Overview
 Replication & Consistency
 Writing, Reading, Querying and Sorting
 APIs & Installation
 World Database in Cassandra
 Using Hector API
 Administration tools
Background
 Influential Technologies:
 Dynamo – Fully distributed design - infrastructure
 BigTable – Sparse data model
Other NoSQL Databases
 MongoDB
 Hypertable
 Neo4J
 Cassandra
 HyperGraphDB
 Riak
 Memcached
 Voldemort
 Tokyo Cabinet
 HBase
 Redis
 CouchDB
Bigtable / Dynamo
 Bigtable-based:
 HBase
 Hypertable
 Dynamo-based:
 Riak
 Voldemort
 Cassandra – a combination of both
CAP Theorem
 Consistency
 Availability
 Partition Tolerance
Applications
 Facebook
 Google Code
 Apache
 Digg
 Twitter
 Rackspace
 Others…
What Is Cassandra?
 O(1) node lookup
 Key – Value Store
 Column based data store
 Highly Distributed – decentralized (no master/slave)
 Elasticity
 Durable, Fault-tolerant - Replications
 Sparse
 ACID NoSQL!
Overview – Data Model
 Keyspace
 Uppermost namespace
 Typically one per application
 Column
 Basic unit of storage – Name, Value and timestamp
 ColumnFamily
 Associates records of a similar kind
 Record-level Atomicity
 Indexed
 SuperColumn
 Columns whose values are columns
 Array of columns
 SuperColumnFamily
 ColumnFamily whose values are only SuperColumns
Examples
 Column - City:
ORANJESTAD {"id": 1,
            "name": "ORANJESTAD",
            "population": 33000,
            "capital": true}
 SuperColumns – Country:
Aruba {"id": "aa",
       "name": "Aruba",
       "fullName": "Aruba",
       "location": "Caribbean, island in the Caribbean Sea, north of Venezuela",
       "coordinates": {
           "latitudeType": "N",
           "latitude": 12.5,
           "longitudeType": "W",
           "longitude": 69.96667},
       ….
Replication & Consistency
 Consistency Level is based on the Replication Factor (N), not the number of nodes in the system.
 There are a few options for how many replicas must respond to declare success
 Query all replicas on every read
 Every Column has a value and a timestamp – the latest timestamp wins
 Read repair – read one replica and verify the others against its checksum/timestamp
 R (number of replicas to read from) + W (number of replicas to write to) > N (replication factor) guarantees a consistent read
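A quick worked example of the R + W > N rule may help. This is a plain Java sketch of the arithmetic (not a Cassandra or Hector API): with a replication factor of N = 3, reading and writing at QUORUM (R = W = 2) means every read overlaps at least one replica that saw the latest write, while ONE/ONE does not.
public class ConsistencyCheck {
    // Strong consistency holds when the read set and write set must overlap.
    static boolean isStronglyConsistent(int n, int r, int w) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3;              // replication factor
        int quorum = n / 2 + 1; // 2 out of 3 replicas
        System.out.println(isStronglyConsistent(n, quorum, quorum)); // true:  2 + 2 > 3
        System.out.println(isStronglyConsistent(n, 1, 1));           // false: ONE/ONE is only eventually consistent
    }
}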
The Ring - Partitioning
 Each NODE has a single, unique TOKEN
 Each NODE claims the RANGE between its neighbor's TOKEN and its own
 Partitioning – Map from Key Space to Token – Can be
random or Order Preserving
 Snitching – Map from Nodes to Physical Location
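As an illustration, here is a small standalone Java sketch of random partitioning (the node tokens are made up for the example; this is not Cassandra's internal code): the row key is hashed to a token, and the row is owned by the first node whose token is greater than or equal to it, wrapping around the ring.
import java.math.BigInteger;
import java.security.MessageDigest;
import java.util.Map;
import java.util.TreeMap;

public class RingSketch {
    public static void main(String[] args) throws Exception {
        // Each node owns the range of tokens ending at its own token (example tokens for 3 nodes).
        TreeMap<BigInteger, String> ring = new TreeMap<BigInteger, String>();
        ring.put(BigInteger.valueOf(2).pow(126), "node1");
        ring.put(BigInteger.valueOf(2).pow(127), "node2");
        ring.put(BigInteger.valueOf(2).pow(128).subtract(BigInteger.ONE), "node3");

        String key = "ORANJESTAD";
        BigInteger token = new BigInteger(1, MessageDigest.getInstance("MD5").digest(key.getBytes("UTF-8")));
        // First node clockwise from the token; wrap to the first node if we fall off the end.
        Map.Entry<BigInteger, String> owner = ring.ceilingEntry(token);
        if (owner == null) owner = ring.firstEntry();
        System.out.println(key + " -> token " + token + " -> " + owner.getValue());
    }
}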
Writing
 No Locks
 Append support without read ahead
 Atomicity guarantee for a key (in a ColumnFamily)
 Always Writable!!!
 SSTables – Key/data – SSTable file for each column
family
 Fast
Reading
 Wait for R responses
 Wait for N – R responses in the background and
perform read repair
 Read multiple SSTables
 Slower than writes (but still fast)
Compare with MySQL (RDBMS)
 Compare a 50GB Database:
 MySQL
 ~300ms write
 ~350ms read
 Cassandra
 ~0.12ms write
 ~15ms read
Queries
 Single column
 Slice
 Set of names / range of names
 Simple slice -> columns
 Super slice -> supercolumns
 Key range
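For example, a simple slice on a single row can be expressed with Hector roughly as follows. This is a sketch: it assumes an already-connected Keyspace object named keyspace and the City column family used later in this deck.
StringSerializer se = StringSerializer.get();
SliceQuery<String, String, String> query = HFactory.createSliceQuery(keyspace, se, se, se);
query.setColumnFamily("City");
query.setKey("1");                  // the row key
query.setRange("", "", false, 10);  // all column names, at most 10 columns, not reversed
QueryResult<ColumnSlice<String, String>> result = query.execute();
for (HColumn<String, String> column : result.get().getColumns()) {
    System.out.println(column.getName() + " = " + column.getValue());
}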
Sorting
 Sorting is determined at write time
 Sorting is set by the type of the Column/SuperColumn keys
 Sorting / key types:
 Bytes
 UTF8
 Ascii
 LexicalUUID
 TimeUUID
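In Hector, this sort order is fixed by the comparator chosen when the column family is defined, not at query time. A minimal sketch, reusing the WORLD_KEYSPACE / CITY_CF names from the later slides (the exact definition calls are an assumption based on BasicColumnFamilyDefinition):
BasicColumnFamilyDefinition cityDef = new BasicColumnFamilyDefinition();
cityDef.setKeyspaceName(WORLD_KEYSPACE);
cityDef.setName(CITY_CF);
cityDef.setComparatorType(ComparatorType.UTF8TYPE); // column names sorted as UTF8 strings
// For a SuperColumnFamily, setSubComparatorType(...) would control the sort order
// of the columns inside each SuperColumn.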
Drawbacks
 No joins (for speed)
 Not able to sort at query time
 Does not really support SQL (although some APIs support a very small subset)
APIs
Many APIs are available for a large number of languages, including C++,
Java, Python, PHP, Ruby, Erlang, Haskell, C#,
JavaScript and more…
 Thrift interface – driver-level interface – hard to use.
 Hector – a Java Cassandra client – a simple column-based
client – does what Cassandra is intended to do.
 Kundera – a JPA-based Java client – tries to translate
JPA classes and attributes to Cassandra – good for
inserts, still hard and problematic with queries.
Cassandra Installation
 Install the prerequisite – basically the latest Java SE release
 Extract the Cassandra zip file to your requested path
 Run bin/cassandra.bat -f (or bin/cassandra -f on Linux)
 The Cassandra node is up and running
World Database in Cassandra
 World - Keyspace
 Countries – SuperColumn Family
 CountryDetails – SuperColumn
 Border – SuperColumns
 Coordinates – SuperColumn
 GDP – SuperColumn
 Language – SuperColumns
 Cities – Column Family
Using Hector API - definitions
 Creating a Cassandra Cluster :
Cluster cluster = HFactory.getOrCreateCluster("WorldCluster", "localhost:9160");
 Setting the keyspace of a column family definition:
columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
 Adding a Column definition to a ColumnFamily:
BasicColumnFamilyDefinition columnFamilyDefinition = new BasicColumnFamilyDefinition();
columnFamilyDefinition.setKeyspaceName(WORLD_KEYSPACE);
columnFamilyDefinition.setName(CITY_CF); // ColumnFamily name
// columnDefinition is assumed to be a previously built column definition (name, validation class)
columnFamilyDefinition.addColumnDefinition(columnDefinition);
Using Hector API - definitions
 Adding a SuperColumn:
BasicColumnFamilyDefinition superCfDefinition = new BasicColumnFamilyDefinition();
superCfDefinition.setKeyspaceName(WORLD_KEYSPACE);
superCfDefinition.setName(COUNTRY_SUPER);
superCfDefinition.setColumnType(ColumnType.SUPER);
 Adding all definitions to the cluster:
ColumnFamilyDefinition cfDefStandard = new ThriftCfDef(columnFamilyDefinition);
ColumnFamilyDefinition cfDefSuper = new ThriftCfDef(superCfDefinition);
KeyspaceDefinition keyspaceDefinition = HFactory.createKeyspaceDefinition(
        WORLD_KEYSPACE,
        "org.apache.cassandra.locator.SimpleStrategy",
        1, // replication factor
        Arrays.asList(cfDefStandard, cfDefSuper));
cluster.addKeyspace(keyspaceDefinition);
Using Hector API - inserting
 Creating a Column Template
ColumnFamilyTemplate<String, String> template =
new ThriftColumnFamilyTemplate<String, String>(keyspaceOperator,
columnFamilyName,
stringSerializer,
stringSerializer);
 Adding a Row into a Column Family
ColumnFamilyUpdater<String, String> updater = template.createUpdater("a key");
updater.setString("key", "value");
try { template.update(updater); }
catch (HectorException e) { /* handle the exception */ }
Using Hector API - inserting
 Creating a Super Column Template
SuperCfTemplate<String,String, String> template =
new ThriftSuperCfTemplate<String, String, String>(keyspaceOperator,
columnFamilyName,
stringSerializer,
stringSerializer,
stringSerializer);
 Adding a Row into a SuperColumn Family
SuperCfUpdater<String, String, String> updater = template.createUpdater("a key");
HSuperColumn<String, String, ByteBuffer> superColumn = updater.addSuperColumn("sc name");
superColumn.setString("column name", value);
superColumn.update();
try { template.update(updater); }
catch (HectorException e) { /* handle the exception */ }
Using Hector API - reading
 Reading all Rows and their columns from a Column
Family (using CQL)
CqlQuery<String, String, String> cqlQuery = new CqlQuery<String, String, String>(
        factory.getKeyspaceOperator(), stringSerializer, stringSerializer, stringSerializer);
cqlQuery.setQuery("select * from City");
QueryResult<CqlRows<String,String,String>> result = cqlQuery.execute();
 Reading all columns from a Row in a SuperColumn
Family
SuperCfTemplate<String, String, String> superColumn =
        HectorFactory.getFactory().getSuperColumnFamilyTemplate("SuperColumnFamily");
SuperCfResult<String, String, String> superRes = superColumn.querySuperColumns("key");
Collection<String> columnNames = superRes.getSuperColumns();
Using Hector API - reading
 Reading a SuperColumn from a Row in a SuperColumn
Family
SuperColumnQuery<String, String, String, String> query = HFactory.createSuperColumnQuery(
        keyspaceOperator, stringSerializer, stringSerializer, stringSerializer, stringSerializer);
query.setColumnFamily("SuperColumnFamily");
query.setKey("key");
query.setSuperName("SuperColumnName");
QueryResult<HSuperColumn<String, String, String>> result = query.execute();
for (HColumn<String, String> col : result.get().getColumns()) {
String name = col.getName();
String value = col.getValue();
}
 Every query has options to fetch only part of the rows – by
setting a start value and an end value (the rows are sorted
on insert) – and only part of the columns, by setting the
column names explicitly (see the sketch below)
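A sketch of such a bounded query using Hector's RangeSlicesQuery, again assuming the keyspaceOperator and stringSerializer objects and the City column family from the earlier slides:
RangeSlicesQuery<String, String, String> rangeQuery =
        HFactory.createRangeSlicesQuery(keyspaceOperator, stringSerializer, stringSerializer, stringSerializer);
rangeQuery.setColumnFamily("City");
rangeQuery.setKeys("1", "100");                   // start and end row keys
rangeQuery.setColumnNames("name", "population");  // only these columns
rangeQuery.setRowCount(50);                       // cap the number of rows returned
QueryResult<OrderedRows<String, String, String>> rows = rangeQuery.execute();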
Administration tools
 cassandra – starts (activates) a node
 nodetool – bootstrapping and monitoring
 cassandra-cli – interactive application console
 sstable2json – export SSTables to JSON
 json2sstable – import JSON into SSTables