How to test NoSQL & HBase by YCSB

Written by Haejoon Lee in KAIST
How to test NoSQL & HBase by YCSB
Step1. Installing
JDK Version : java version “1.8.0_20
Hadoop Version : hadoop 1.2.1””
Cassandra : Cassandra 1.2.18
-
wget http://apache.mirror.cdnetworks.com/cassandra/1.2.18/apache-cassandra-1.2.18-src.tar.gz
YCSB : YCSB 0.1.4
-
wget https://github.com/downloads/brianfrankcooper/YCSB/ycsb-0.1.4.tar.gz
Installed location : for YCSB /usr/lib/ycsb for Cassandra /usr/lib/Cassandra
OS : Ubuntu 14.04 amd64
Machine: a single Node (localhost)
After installing Cassandra and YCSB in the folders, you should set up the path in
environment setting file. (~/.profile or /etc/profile: $CASSANDRA_HOME or $YCSB_HOME).
If you can command „ycsb‟ and „cassandra‟ in home directory,
/home/jjoon>>ycsb
/home/jjoon>>cassandra
Everything is okay, so far/
Step2. Configuration
Before testing, you should make properties folders in your cassandra directory.
/usr/lib/cassandra>>mkdir {commitlog,log,saved_caches,data}
Change configuration files (/cassandra/conf/log4j-server.properties, /cassandra /conf/Cassandra.yaml)
In log4j-server.properties
Log4j.rootLogger = INFO -> DEBUG
Log4j.appender.R.File = ~/log/system.log
In Cassandra.yaml
Data_file_directories: ~/data
Commitlog_directory: ~/commitlog
Saved_caches_directory: ~/saved_caches
Written by Haejoon Lee in KAIST
Step3. Testing
Before testing it by YCSB, you should start Cassandra Server, format tables in Cassandra.
>>canssandra –f
// Starting server
>>canssandra-cli –host localhost // Starting Client to connect with Server
: create keyspace usertable;
: use usertable;
: create column family with comparator=UTF8Type and default_validation_class=UTF8Type;
While Cassandra runs,
YCSB load (Execute the load phase) & run (Execute the transaction phase)
>>./bin/ycsb load cassandra-10 -P workloads/workloada –p hosts=localhost >> load.log
>>./bin/ycsb run cassandra-10 -P workloads/workloada –p hosts=localhost >>run.log
YCSB includes 6 workload (https://github.com/brianfrankcooper/YCSB/wiki/Core-Workloads)
-
workloada
-
workloadb
-
workloadc
-
workloadd
-
workloade
-
workloadf
You can set up each property by using ‘-p’ optional value
-p recordcount=10000 \
-p readmodifywriteproportion=0 \
-p operationcount=10000 \
-p requestdistribution=zipfian \
-p workload=com.yahoo.ycsb.workloads.CoreWorkload \
-p hosts=172.21.81.139,172.21.81.127 \
-p readallfields=true \
-p cassandra.connectionretries=1 \
-p readproportion=0.5 \
-p cassandra.operationretries=1 \
-p updateproportion=0 \
-p cassandra.readconsistencylevel=ALL \
-p scanproportion=0 \
-p cassandra.writeconsistencylevel=ALL \
-p insertproportion=0.5 \
-p cassandra.deleteconsistencylevel=ALL \
-threads 10
Written by Haejoon Lee in KAIST
Step4. How to test HBase by YCSB
HBase ?
-
HBase is an open source, non-relational, distributed database modeled after Google's
BigTable and written in Java. It is developed as part of Apache Software Foundation's
Apache Hadoop project and runs on top of HDFS (Hadoop Distributed Filesystem),
providing BigTable-like capabilities for Hadoop
HBase : 0.98.5
- wget http://apache.mirror.cdnetworks.com/hbase/hbase-0.98.5/hbase-0.98.5-hadoop1-bin.tar.gz
After installing HBase in the folders, you should set up the path in environment setting file.
(~/.profile or /etc/profile: $HBASE_HOME)
>>hbase
Go inside this newly created YCSB directory and move inside the hbase directory. You will
find an xml file here named as pom.xml. Open this pom.xml file and edit it.
<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase</artifactId>
<!--<version>${hbase.version}</version>-->
<version>0.98.5</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-core</artifactId>
<!--<version>1.2.1</version>-->
<version>1.2.1</version>
</dependency>
<dependency>
<groupId>com.yahoo.ycsb</groupId>
<artifactId>core</artifactId>
<version>${project.version}</version>
</dependency>
</dependencies>
Written by Haejoon Lee in KAIST
YCSB Information
YCSB에서는 성능(Performance), 확장성(Scalability), 가용성(Availability), 복제(Replication)등 4
가지 측면에서 벤치마크를 수행할 수 있도록 해준다. 1차적으로 성능 중심의 테스트를 수행하려
고 한다. 성능 벤치마크의 경우 아래와 같이 이미 만들어진 다양한 유형의 작업 부하를 사용할
수 있으며, 필요한 경우 새로운 유형의 작업 부하를 만들 수 있다.
- 성능 작업 부하 유형
■Workload A: 업데이트 중심의 작업 (읽기 50%, 업데이트 50%)
사례) 최근의 액션을 저장하는 세션 정보 어플리케이션
■Workload B: 읽기 중심의 작업 (읽기 95%, 업데이트 5%)
사례) 포토 태그 ? 태그는 한번만 작성하고 주로 읽기 작업이 실행된다.
■Workload C: 읽기 전용 작업 (읽기 100%)
사례) 사용자 프로파일 캐시 ? 외부 저장소에 저장되어 있는 사용자 정보를 조회
■Workload D: 최근 레코드 중심의 읽기 작업 (읽기 95%, 쓰기 5%)
최근 저장된 레코드를 중심으로 읽기 수행
사례) SNS 사용자 상태 업데이트 ? 가장 최근의 상태와 뉴스만이 중요함
■Workload E: 영역 스캔 (읽기 95%, 쓰기 5%)
각 작업마다 100개 내외의 레코드 영역을 한번에 쿼리 한다
사례) 쓰레드 형식의 게시판
■Workload F: 읽기-쓰기-수정
읽기, 수정, 쓰기 작업을 순서대로 수행
사례) 사용자 데이터베이스 ? 각 사용자가 액션 수행 시 레코드를 읽고 업데이트
Reference) http://damul21c.tistory.com/7
Why we moved Cassandra to HBase?
http://ria101.wordpress.com/2010/02/24/hbase-vs-cassandra-why-we-moved/