DATASHEET
PHEMI Central Big Data Warehouse
Take Advantage of Enterprise-Grade Big Data to Unlock the Value in Your Data
Collect
Unlock data silos, consolidate your data
Curate
Curate data for sub-second lookups with
automatic indexing and cataloging
Consume
Automatically enforce governance policies
PHEMI Central™ is a big data warehouse that takes advantage of the power, scalability,
and flexibility of Hadoop while providing fully integrated privacy, security, governance,
and data management—all built right in.
For the first time, organizations can take advantage of big data while retaining the
governance and data management of a traditional enterprise data warehouse to unlock
the value in their data, driving discovery and fuelling innovation with big data economics,
while meeting compliance and governance objectives:
• The ability to collect, curate, and consume any volume and variety of data
• High-speed data ingestion and processing that supports real-time operations
and business intelligence applications
• Full data control and lifecycle management
• Built-in Privacy by Design
Privacy, Security, and Governance
Automatically enforce data sharing, consent, and privacy rules
Data Management
Gain full control with field-level, enterprise-grade data management
The PHEMI Central Big Data Warehouse includes the following innovations:
DPF framework: Custom and standard Data Processing Functions (code libraries)
can parse, recognize, extract, cleanse, standardize, encrypt, mask, or redact
selected fields.
Metadata framework: Extensible, descriptive end-to-end metadata enables
access control and data management at the field level.
Privacy by Design: PHEMI Central is designed from the ground up to implement
Privacy by Design (PbD) principles.
[Figure: PHEMI Central Big Data Warehouse. Diagram labels:
Data sources: database, text, spreadsheet, images, sensors, genomics, systems.
COLLECT: Ingest any raw data type and tag with metadata.
CURATE: Use powerful data processing functions to transform and catalog data into analytics-ready assets.
CONSUME: Generate datasets on demand; use any third-party apps.
Applications and users: reports and analytics, business intelligence, custom applications, third-party applications.
PRIVACY, SECURITY, GOVERNANCE: Protect information at the field level and ensure rightful access at scale.
DATA MANAGEMENT: Manage data down to the field level.
SYSTEM MANAGEMENT: Enterprise-grade reliability, availability, and scalability with cluster economics.]
Collect
Ingest and describe all types and any size of data
PHEMI Central ingests data from multiple and disparate sources. Data can range from small kilobyte files to large
terabyte files. Schemaless ingestion is fast. You can:
• Stream data from machine-to-machine data sources through the PHEMI REST API
• Push data directly from data sources and ETL tools using either JDBC or the PHEMI REST API
• Deploy a custom connector based on the PHEMI REST API to allow PHEMI Central to fetch data from
data sources
• Upload data manually using a standard web browser window
Data is tagged on ingest with descriptive metadata that immediately enforces privacy policies and controls the
data lifecycle. Data is indexed and cataloged as it is stored, making it immediately findable and retrievable.
Curate
Extract the greatest possible value from your data with processing, indexing, cataloging, linking,
and metadata
PHEMI Central uses a flexible, distributed key-value store, automatic indexing and cataloging, and sophisticated
metadata tagging to manage, describe, and govern the data that it stores.
Data Linking
After cataloging and indexing, data can be linked based on keywords, graph relationships, and geospatial attributes.
Data linking expands the kinds of connections you can make between data items, promotes discovery, and gives
you a more complete picture of your data.
Data Processing Function Framework
PHEMI Central lets you develop customized pieces of executable code, called Data Processing Functions (DPFs),
that provide unprecedented power and flexibility.
• Parse ingested data, extract or cleanse data, encrypt, redact, or anonymize selected information
• Provide enhanced or deeper indexing and cataloging
• Restructure data
• Transform data into standardized ontologies
• Analyze streams of machine data to find patterns and exceptions, calculate aggregates, or convert streaming data
into an analytics-ready state for trending and predictive analysis
As the organization’s needs evolve and knowledge advances, you can simply develop new DPFs and re-execute
them on your data. DPFs can be developed in modern programming languages such as Java, Python, and C++.
No specialized expertise in MapReduce or YARN is required. DPFs can be developed by PHEMI, by your
in-house programmers, or by a third party.
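As a rough illustration of the kind of logic a DPF might contain, the Python sketch below parses a delimited record and then redacts or masks selected fields. The function name, record layout, and field names are hypothetical. This datasheet does not describe the actual DPF programming interface, so none of this should be read as the PHEMI API.

```python
import hashlib

# Hypothetical field-handling rules, chosen only for this example.
REDACT_FIELDS = {"name"}          # replaced by a fixed placeholder
MASK_FIELDS = {"health_number"}   # replaced by a truncated one-way digest


def process_record(line, columns, delimiter="|"):
    """Parse one delimited line, then redact or mask sensitive fields."""
    values = [v.strip() for v in line.split(delimiter)]
    if len(values) != len(columns):
        raise ValueError(
            "expected %d fields, got %d" % (len(columns), len(values))
        )
    record = dict(zip(columns, values))
    for field in record:
        if field in REDACT_FIELDS:
            record[field] = "[REDACTED]"
        elif field in MASK_FIELDS:
            # An unsalted digest keeps the sketch short; a production
            # masking scheme would more likely use a keyed hash.
            digest = hashlib.sha256(record[field].encode("utf-8")).hexdigest()
            record[field] = digest[:12]
    return record


result = process_record(
    "Jane Doe | 9876543210 | 2015-04-01",
    ["name", "health_number", "visit_date"],
)
```

Because the original record is left untouched and only the derived dictionary is altered, a function like this could be rerun with different rules as requirements change.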
Data Dictionary
Conventional big data systems store big data, but struggle to catalog or track diverse data types. With PHEMI
Central, you can use DPFs that act as data dictionaries, identifying and saving a common interpretation
for fields that occur frequently but are named differently or use different format conventions (such as “M/F” vs.
“Male/Female”, or Imperial vs. metric measurement schemes). Data dictionaries greatly simplify
queries and analysis.
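The data dictionary idea can be sketched in a few lines of Python. The alias table, code table, and field names below are assumptions made for illustration, not PHEMI definitions. The sketch maps differently named fields to one canonical name, normalizes "M/F" style codes, and converts inches to centimetres.

```python
# Hypothetical mappings used only for illustration.
FIELD_ALIASES = {
    "sex": "sex",
    "gender": "sex",
    "height_in": "height_cm",
    "height_cm": "height_cm",
}
SEX_CODES = {"m": "Male", "male": "Male", "f": "Female", "female": "Female"}
CM_PER_INCH = 2.54


def standardize(record):
    """Return a copy of the record using canonical field names and values."""
    out = {}
    for name, value in record.items():
        key = name.lower()
        canonical = FIELD_ALIASES.get(key, key)
        if canonical == "sex":
            # Unknown codes are passed through unchanged.
            value = SEX_CODES.get(str(value).strip().lower(), value)
        elif key == "height_in":
            value = round(float(value) * CM_PER_INCH, 1)
        out[canonical] = value
    return out


a = standardize({"Gender": "F", "height_in": "65"})
b = standardize({"sex": "Female", "height_cm": 165.1})
```

Both input records end up with the same field names and value conventions, so a single query can cover them.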
Consume
Access your datasets on demand at sub-second speeds,
even with petabytes of data
Describing information with metadata means that users and applications can
query data based on the data’s properties, instead of navigating complex
directories or schemas. Multiple users can interact with the system, accessing
datasets via SQL, data exports, and custom applications.
Above all, information in PHEMI Central is findable and searchable, for users
and applications.
• Break down costly data silos by constructing datasets across multiple and
disparate data sources
• Reduce data sprawl by creating virtual datasets that are not instantiated
until export
• Improve consumption speeds with digital assets that are cataloged and
indexed in advance
• Ensure rightful access at all times, with every data request automatically
mediated by a policy enforcement engine
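One way to picture a virtual dataset that is not instantiated until export is a lazy generator, sketched below in Python. This is an analogy under stated assumptions, not a description of how PHEMI Central implements datasets. The source names, the fields, and the tracked helper are all hypothetical.

```python
def virtual_dataset(sources, predicate, fields):
    """Describe a dataset lazily; no rows are read until it is iterated."""
    for source in sources:
        for row in source:
            if predicate(row):
                yield {f: row.get(f) for f in fields}


reads = []  # records every row actually pulled from a source


def tracked(name, rows):
    """Wrap a source so that each row read is logged."""
    for row in rows:
        reads.append(name)
        yield row


clinic = tracked("clinic", [{"id": 1, "age": 42, "site": "A"},
                            {"id": 2, "age": 17, "site": "A"}])
lab = tracked("lab", [{"id": 3, "age": 65, "result": "neg"}])

# Defining the dataset touches no source data.
adults = virtual_dataset([clinic, lab], lambda r: r["age"] >= 18, ["id", "age"])
reads_before_export = len(reads)

# "Export" materializes the rows across both sources.
exported = list(adults)
```

Until the export step runs, the dataset is only a definition spanning two sources, which is the property that limits data sprawl.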
Privacy, Security, and Governance
Automatically de-identify, encrypt, or mask personal information
PHEMI Central provides an industry-pioneering set of capabilities to manage
the governance of sensitive data, enforced from end to end and throughout
the lifecycle of data. PHEMI Central uses one coordinated framework based
on Privacy by Design principles to define, manage, and enforce data sharing
agreements and privacy policies across an entire organization or set
of organizations.
Data is tagged with attributes that describe its level of sensitivity. Users
are tagged with attributes that describe their level of authorization. Simple,
powerful access rules describe the relationships between data visibility and
user authorization. Datasets can be associated with access policies that are
independent of the policies attached to the source data collections,
but rightful access to data is always enforced.
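The relationship between data sensitivity tags and user authorization can be sketched as a simple attribute comparison. The level names, their ordering, and the sample record below are assumptions for illustration only. The datasheet does not specify PHEMI Central's actual policy model.

```python
# Illustrative sensitivity levels, ordered from least to most sensitive.
LEVELS = {"public": 0, "de-identified": 1, "identified": 2}


def visible_fields(record, field_tags, user_authorization):
    """Return only the fields whose sensitivity tag the user may see."""
    ceiling = LEVELS[user_authorization]
    return {
        name: value
        for name, value in record.items()
        if LEVELS[field_tags[name]] <= ceiling
    }


record = {"postal_prefix": "V6B", "diagnosis": "J45", "name": "Jane Doe"}
field_tags = {
    "postal_prefix": "public",
    "diagnosis": "de-identified",
    "name": "identified",
}

analyst_view = visible_fields(record, field_tags, "de-identified")
clinician_view = visible_fields(record, field_tags, "identified")
```

The same stored record yields different views depending on the requester's attributes, and the filtering happens at the field level rather than in each application.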
PHEMI Central keeps your data secure:
• User roles determine what operations a user can perform
• The system maintains a complete, tamperproof audit log of operations
and data access
• Communication links from data sources or to consuming systems can
be encrypted using Secure Sockets Layer (SSL) or Transport Layer
Security (TLS)
• Data fields can be individually selected for encryption at rest
• Because privacy and security are performed at the data level, it’s easier
and faster to prototype, test, and deploy new applications
Privacy by Design
A Privacy by Design (PbD) approach requires you
to take into account seven foundational principles
throughout your system. But how do you know
whether your system implements PbD principles?
Here’s a checklist:
1. Metadata. All data should be tagged on ingest
with enough descriptive information to allow
adequate privacy, sharing, consent, and lifecycle
management, plus compliance with any other
governance requirements.
2. Role-based access control. User and
application access to functionality and operations
is adequately restricted by system roles.
3. Policy-based data access. Access to and
visibility of data is restricted by permissions and
authorizations, and controlled by access policies.
4. Automatic policy enforcement. The system
automatically enforces policies and governance;
manual intervention is not required.
5. Transparency. Data stewards and privacy
officers can directly view and verify the system
implementation of governance policies.
6. Auditability. The system automatically
tracks system activity, and maintains a detailed,
tamperproof audit log of data access and
system operations.
7. Data immutability. Data in the repository
remains available in its original form, regardless of
what digital assets are derived from the original
through transformation.
8. Ability to anonymize. The system should be
able to automatically de-identify, encrypt, mask,
obfuscate, or redact personal information, and
allow the data steward or privacy officer to choose
which version of data appears to which users.
Specifications
Data Management
Use a powerful metadata framework to
manage digital assets at the field level
Element-level metadata embeds the rules and
policies governing the data at the field level. Data
retention policies and data sharing agreements
are automatically enforced. Data in the system is
immutable: the original data cannot be modified
and data is only purged from the system based
on the configured retention policy. Robust
version control and rollback capabilities mean
that data is never lost, corrupted, or overwritten.
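The immutability and rollback behaviour described above resembles an append-only, versioned store. The Python sketch below illustrates that general pattern. The class and method names are hypothetical and do not reflect PHEMI Central's internal design.

```python
class VersionedStore:
    """Append-only store: writes add versions and never overwrite."""

    def __init__(self):
        self._versions = {}  # key -> list of values, oldest first

    def put(self, key, value):
        """Append a new version and return its 1-based version number."""
        history = self._versions.setdefault(key, [])
        history.append(value)
        return len(history)

    def get(self, key, version=None):
        """Return the latest version, or a specific earlier one."""
        history = self._versions[key]
        return history[-1] if version is None else history[version - 1]


store = VersionedStore()
v1 = store.put("patient-42", "raw record")
v2 = store.put("patient-42", "cleansed record")
latest = store.get("patient-42")
original = store.get("patient-42", version=1)
```

Because earlier versions stay in place, rolling back amounts to reading an older version rather than restoring overwritten data.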
On-Premise Deployment
4 Cluster Nodes. Each:
• 8 x Core (2.2 GHz)
• 64 GB RAM
• 12 TB Direct Attached Storage
2 Management Nodes. Each:
• 4 x Core (2.2 GHz)
• 64 GB RAM
• 2 TB RAID1 Storage
10 Gigabit Ethernet Network
Cloud Deployment
• Subscribe to PHEMI Central as a managed service using Amazon Web Services.
• Cloud service grows from 1 TB storage capacity.
System Management
Get cluster reliability and economics at scale
PHEMI Central can be deployed on the customer's
premises, as a managed service, or as a cloud-based service. The system uses low-cost
commodity hardware components and direct-attached
disk drives to lower the cost of
ownership compared to traditional enterprise
data warehouse systems. Storage and
compute resources scale linearly from terabytes
to petabytes.
All data in the system is replicated three times to
ensure availability and resiliency. Direct-attached
drives can be hot-swapped without impacting
performance or data availability. Larger or faster
Direct Attached drives and nodes are absorbed
into the system and load-balanced automatically.
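Three-way replication has a direct effect on capacity planning. The back-of-the-envelope Python calculation below applies it to the four cluster nodes with 12 TB each listed in the on-premise specification. It assumes all raw disk is available for data and ignores file system, operating system, and temporary-space overhead, so the real usable figure would be lower.

```python
def capacity_tb(nodes, tb_per_node, replication_factor=3):
    """Return (raw, usable) capacity in TB for a replicated cluster."""
    raw = nodes * tb_per_node
    usable = raw / replication_factor
    return raw, usable


# Figures taken from the on-premise specification in this datasheet.
raw_tb, usable_tb = capacity_tb(nodes=4, tb_per_node=12)
```

Under these assumptions the cluster offers 48 TB of raw storage and about 16 TB of unique data.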
The system provides clear visibility into system
health, diagnostics, troubleshooting, capacity,
and digital assets under management.
Using Apache Ambari, system management
capabilities can be integrated with existing tools.
Data Ingest Protocols
• SFTP File Transfer
• HTTP/HTTPS Manual Upload
• REST Web Services API
• ODBC/JDBC SQL Interface
• CCDA HL7 Interface
Data Export Protocols
• Excel/CSV/TSV Download
• REST Web Services API
• ODBC/JDBC SQL Interface
Analytics Tools
• R
• SAP
• SAS
• SPSS
• Stata
• Tableau
Data Processing Functions
• Excel Reader
• Variant Call Format (VCF) Reader
• JSON Reader
• XML Reader
Ease Your Entry into Big Data
PHEMI Central makes it easy to break into big data. The software is fully integrated and enterprise-ready, so you don’t need to hire a
team of Hadoop engineers to build and maintain your system. You can also start small and expand incrementally. Use PHEMI Central
to offload your existing data warehouse, or to capture new data types or sources. Keep your existing systems and tools and let PHEMI
Central feed data into them. You can move into big data as you become ready, at your own speed.
Visit www.phemi.com for more information.
twitter.com/PHEMIsystems
linkedin.com/company/phemi
Copyright © 2015, PHEMI and/or its affiliates. All rights reserved. Affiliate names may be trademarks of their respective owners. April 2015