Pivotal GemFire XD DISTRIBUTED IN-MEMORY AND HADOOP-INTEGRATED SQL DATABASE

DATA SHEET
Pivotal GemFire XD
DISTRIBUTED IN-MEMORY AND HADOOP-INTEGRATED SQL DATABASE
FOR MISSION CRITICAL APPLICATIONS
OVERVIEW
AT-A-GLANCE
For developers who need to meet the highest service-level
requirements for structured big data applications,
Pivotal™ GemFire® XD is a distributed in-memory
database that is designed to provide:
• Scale-out performance
• Consistent database operations across globally distributed applications
• High availability, resilience, and global scale
• Standards-based developer features and interfaces
• Easy administration of distributed nodes
KEY FEATURES & BENEFITS
Scale-out performance
• In-memory storage: all operational data available in-memory to avoid disk I/O penalty
• High-memory nodes: supports systems with memory capacity larger than JVM heap size limits
• Elastic, linear scalability: easily scale capacity up or down to meet changes in demand
• Optimized data distribution & processing: configure data distribution across grid to optimize speed of data access & processing
Consistent database operations for Hadoop clusters
and across globally distributed applications
• Flexible persistence: store data in performance-optimized disk persistence, or within Pivotal HD
• Configurable consistency: choose a consistency model supporting distributed OLTP applications to balance performance and data availability
• SQL query support: supports SQL queries of data over distributed nodes that can be optimized with indexes on key values
• Advanced analytics access: analyze archived on-disk and in-memory data with Pivotal HAWQ via PXF
pivotal.io
SCALING OUT STRATEGIC DATA-DRIVEN SQL APPLICATIONS
Many applications are built with a relational data model to
meet requirements for reporting and analytics on current
and historical data. Other times, starting with an RDBMS as the
data management system is simply the default choice.
When companies choose to scale-out such applications in
high concurrency deployments with thousands to hundreds
of thousands of concurrent operations, traditional relational
databases develop unacceptable performance problems.
Such high usage applications typically generate significant
historical information. Only with inexpensive and flexible storage
solutions such as Hadoop does it make sense to keep large,
detailed data sets. This includes not only transactional data but also
history, application logging, and data from external sources used to
analyze user behavior and application performance.
Pivotal GemFire XD is a distributed in-memory SQL database
for high scale custom applications. GemFire XD provides low
latency data access to applications at massive scale with many
concurrent transactions involving terabytes of operational
data. Designed for maintaining consistency of concurrent
operations across its distributed data nodes, Pivotal GemFire XD
can support ACID transactions for massively scaled applications
such as data stream analysis and processing, financial payments,
and ticket sales in proven customer deployments of more than
10 million user transactions a day. With optional persistence
and archival in HDFS, GemFire XD will store an extremely large,
consistent database in Hadoop nodes which can be accessed
for analysis by Pivotal HAWQ. Through support of standards
such as JDBC, GemFire XD works with common development
frameworks and reporting tools for relational data.
DATA SHEET PIVOTAL GEMFIRE XD
SCALE-OUT PERFORMANCE
IN-MEMORY STORAGE
GemFire XD stores all required data in RAM across
distributed nodes to provide the fastest access to data while
eliminating the performance penalty of reading from disk.
HIGH-MEMORY NODES
GemFire XD allocates in-memory storage off heap to take
advantage of hardware systems with memory capacity larger
than JVM heap size limits, and to provide faster performance
by avoiding the Java garbage collection cycle governing memory
deallocation.
KEY FEATURES & BENEFITS (CONTINUED)
High availability, resilience, and global scale
• Node failover: application and data access ensured in the event of network split or node failure
• Resilient self-healing: fast node startup on reconnect; self-healing of clusters automates restoration after node failure
• Cluster-to-cluster WAN connectivity: enables global scale of data access and multi-site capability
Standards-based Developer Features and Interfaces
• APIs and standards support: develop in any programming language that supports JDBC, Spring Data JDBC, ADO.NET, ODBC, or MapReduce
• Data type support: ANSI SQL-92 data types, table definitions, foreign key relationships, and JSON documents
• Powerful application functions: data-aware stored procedures, SQL-compliant queries and DML statements, publish & subscribe event framework with reliable asynchronous queues for delivering events
• Use familiar tools: Hibernate, NHibernate, Roo, SQuirreL, IntelliJ, and other JDBC-compliant tools
ELASTIC, LINEAR SCALABILITY
GemFire XD provides linear scalability that allows you to
predictably increase capacity, both operations per
second and data storage, simply by adding nodes
to a cluster. Data distribution and system resource usage are
automatically adjusted as nodes are added or removed, making
it easy to scale up or down to quickly meet expected or
unexpected spikes in demand.
OPTIMIZED DATA DISTRIBUTION ACROSS NODES
GemFire XD automatically distributes data across nodes to
reduce latency and balance usage of system resources.
You can also configure partitioning and replication of data to
further improve application response time. GemFire XD
directs processing operations to the specific
nodes where the data resides in order to reduce latency and network
traffic, according to the cluster configuration you set up for data
distribution and replication between nodes.
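The idea of routing an operation to the node that holds the data can be illustrated with a minimal Python sketch. It is not GemFire XD code and does not reflect the product's actual placement algorithm. The bucket count, the node names, the CRC32 hash, and the round-robin assignment are all assumptions chosen for illustration.

```python
import zlib

NUM_BUCKETS = 8  # hypothetical value, kept small for readability


def bucket_for(key):
    """Map a partitioning-key value to a bucket with a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS


def assign_buckets(nodes):
    """Spread buckets round-robin over the current cluster members."""
    return {b: nodes[b % len(nodes)] for b in range(NUM_BUCKETS)}


def route(key, owners):
    """Return the node that should execute an operation on this key."""
    return owners[bucket_for(key)]


# Two-node cluster: every key resolves to exactly one owning node.
owners = assign_buckets(["node-a", "node-b"])
target = route("customer-42", owners)

# Adding a node changes bucket ownership, not the key-to-bucket mapping.
owners_after = assign_buckets(["node-a", "node-b", "node-c"])
```

Because keys map to buckets rather than directly to nodes, only bucket ownership has to change when the cluster grows or shrinks. That is one common way to keep redistribution predictable.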
CONSISTENT DATABASE OPERATIONS FOR
HADOOP CLUSTERS AND ACROSS GLOBALLY
DISTRIBUTED APPLICATIONS
FLEXIBLE PERSISTENCE
To ensure durability of data in the event of node failure, GemFire
XD writes to disk a log of all creates, updates, and deletes of data
managed by a node. This log can then be read to reconstruct the
last consistent state of the in-memory database on that node
when a node comes back online. When persisted or archived in
Hadoop, this data can be used in analytics processing with tools
such as Pivotal HAWQ, and support even larger data volumes.
Using the event framework, you can modify persistence behavior
for purposes such as archiving historical data.
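The log-based recovery described under FLEXIBLE PERSISTENCE can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not GemFire XD's on-disk format. The JSON line encoding, the `append` and `replay` helpers, and the in-memory list standing in for the log file are all hypothetical.

```python
import json


def append(log, op, key, value=None):
    """Record one create, update, or delete as a JSON line in the log."""
    log.append(json.dumps({"op": op, "key": key, "value": value}))


def replay(log):
    """Rebuild the in-memory table by applying log entries in order."""
    table = {}
    for line in log:
        entry = json.loads(line)
        if entry["op"] == "delete":
            table.pop(entry["key"], None)
        else:  # "create" and "update" both set the latest value
            table[entry["key"]] = entry["value"]
    return table


log = []  # stands in for the on-disk log of a single node
append(log, "create", "order-1", {"status": "new"})
append(log, "update", "order-1", {"status": "paid"})
append(log, "create", "order-2", {"status": "new"})
append(log, "delete", "order-2")

# After a restart, the node replays the log to recover its last state.
recovered = replay(log)
```

Replaying the entries in write order yields the last consistent state: `order-1` ends up as paid and `order-2` is absent.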
Easy administration of distributed clusters
• Auto tuning and simplified cluster configuration: automatic distribution of data to optimize usage of system resources on nodes for best cluster performance
• Simplified cluster configuration: configure all nodes in cluster from single fault-tolerant service
• Cluster monitoring & data query: dashboard showing cluster & node status; view and query data in nodes
• Performance statistics analysis: offline tool for viewing historical logs and statistics to diagnose bottlenecks
• Command line tools: easy automation and scripting of administrative tasks via command line interface
CONFIGURABLE CONSISTENCY
GemFire XD is capable of providing ACID consistency across
distributed nodes to support high capacity transactional
applications. You can also configure consistency models for
higher performance, such as allowing the entire grid to cache
and operate on data, or turn consistency off when your
use case calls for speed rather than consistency.
Figure 1. Example topologies of Pivotal GemFire XD deployments supporting different service level requirements of data-driven applications.
SQL QUERY SUPPORT
Pivotal GemFire XD supports the ANSI SQL-92 standard for authoring
queries. Queries are sent to the appropriate nodes that serve
relevant partitions of data. Query results are then merged and
sent back to the client application. Developers can define indexes
on key values to improve performance. You can define key values
that control distribution of data across nodes. When functions
that operate on partitions of data are invoked, processing is
routed to the appropriate nodes responsible for serving partitions of
targeted data.
CLUSTER-TO-CLUSTER WAN CONNECTIVITY
GemFire XD allows multiple clusters to be connected via WAN
gateways. This allows application data access to span the
globe, and allows companies to meet local data requirements,
such as country-specific privacy regulations. WAN-connected
clusters also enable multi-site failover capability, ensuring
ongoing availability and built-in disaster recovery in the case of
catastrophic failure.
ADVANCED ANALYTICS ACCESS
Data persisted in Pivotal HD by GemFire XD can be accessed for
advanced analytic processing by Pivotal HAWQ by way of Pivotal
Extension Framework (PXF). This includes archived data as well
as latest state active data in-memory.
HIGH AVAILABILITY, RESILIENCE, AND
GLOBAL SCALE
NODE FAILOVER
GemFire XD provides continuous uptime with built-in high
availability and disaster recovery. Multiple failure detection models
detect and react to failures quickly, ensuring that the cluster is
always available, and that the data set is always complete.
RESILIENT SELF-HEALING
GemFire XD has self-healing automation that allows a node to
quickly rejoin a cluster once it becomes operational again, with
fast startup, reconnect, and incremental updates of changed data,
all handled without administrator intervention.
STANDARDS-BASED DEVELOPER FEATURES
AND INTERFACES
APIs AND STANDARDS SUPPORT
Pivotal GemFire XD manages data for applications in any
programming language that supports JDBC, ADO.NET, or ODBC.
For Java developers, GemFire XD provides support for Spring
Data JDBC. GemFire XD also extends the Hadoop MapReduce
API, allowing MapReduce jobs to access GemFire XD data without
needing to start or access a GemFire XD distributed system.
DATA TYPE SUPPORT
GemFire XD supports structured data in relational data
models with declared tables and foreign key relationships.
Data types supported include those defined in the ANSI SQL-92
standard. GemFire XD also supports JSON documents
and custom Java types.
POWERFUL APPLICATION FEATURES
GemFire XD provides advanced application
features to developers who want to leverage its distributed
database capabilities. As with many database platforms, developers
can embed and generate queries using SQL. GemFire XD
provides a sophisticated event-handling mechanism with
durable asynchronous queues suitable for mission-critical
application requirements.
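The asynchronous queue pattern behind such an event mechanism can be sketched with Python's standard library. This is an illustration only and does not use any GemFire XD API. The `put_row` helper, the `listener` function, and the event tuple format are hypothetical. Unlike the durable queues described above, this in-process queue is lost if the process stops.

```python
import queue
import threading

events = queue.Queue()
delivered = []


def listener():
    """Drain events in order until the None sentinel arrives."""
    while True:
        event = events.get()
        if event is None:
            break
        delivered.append(event)  # e.g. forward to an external system


def put_row(table, key, value):
    """Apply the change locally, then enqueue an event without waiting."""
    table[key] = value
    events.put(("put", key, value))


table = {}
worker = threading.Thread(target=listener)
worker.start()
put_row(table, "ticket-1", "sold")
put_row(table, "ticket-2", "reserved")
events.put(None)  # signal the listener to stop
worker.join()
```

The writer returns as soon as the event is enqueued, and the listener receives events in the order they were produced. This is how an asynchronous queue keeps delivery work off the critical path of the data operation.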
USE FAMILIAR TOOLS
GemFire XD, through support of JDBC and ANSI SQL, allows
usage of familiar integrated development environments,
app-development frameworks, business intelligence and
visualization tools.
EASY ADMINISTRATION OF
DISTRIBUTED NODES
AUTOMATED TUNING
GemFire XD is built to automate administrative tasks as much as
possible. This includes automating tuning of system resources
between nodes in a cluster by intelligently managing the
placement of data while reducing network round trips. Data
gets distributed and replicated according to the cluster
configuration, and requests for access are routed intelligently
using the most direct path available. This data placement and
resource allocation is adjusted automatically if nodes are added
to, or removed from the cluster.
COMPREHENSIVE MONITORING & ADMINISTRATION TOOLS
GemFire XD provides a comprehensive set of online and offline
tools for monitoring and administering clusters. The online
dashboard allows drill down into cluster and node status, and
querying of stored data. The offline analytics tool allows
diagnosis of system bottlenecks through analysis of historical
statistics logging. A command line tool allows administrators to
take action on clusters and nodes such as starting, stopping and
configuring settings.
FLEXIBLE DEPLOYMENT OPTIONS
GemFire XD runs in Java Virtual Machines in 32- and 64-bit modes
on Linux and Windows operating systems. GemFire XD grids
can be set up with active/active multi-site bi-directional WAN
replication to enable disaster recovery, business continuity, and
geographical proximity for lowest possible latency world-wide.
SIMPLIFIED CLUSTER CONFIGURATION
Node configuration is handled centrally with automatic
redundancy for high-availability. New nodes can get their
configuration from the centralized configuration manager
upon startup to quickly join a cluster with no additional system
administration tasks.
LEARN MORE
To learn more about Pivotal’s products and services, please visit
us at pivotal.io. For more information about Pivotal Big Data Suite
for application developers, please visit pivotal.io/big-data.
Pivotal offers a modern approach to technology that organizations need to thrive in a new era of business innovation. Our solutions intersect cloud, big data and agile
development, creating a framework that increases data leverage, accelerates application delivery, and decreases costs, while providing enterprises the speed and scale
they need to compete.
Pivotal 3495 Deer Creek Road Palo Alto, CA 94304 pivotal.io
Pivotal, Pivotal CF, and Cloud Foundry are trademarks and/or registered trademarks of Pivotal Software, Inc. in the United States and/or other Countries. All other trademarks used herein are the property of their respective owners.
© Copyright 2014 Pivotal Software, Inc. All rights reserved. Published in the USA. PVTL-DS-10/14