Storage Trends: File and Object Based Storage, and How NFS-Ganesha Can Play

IBM Linux Technology Center
Venkateswararao Jujjuri (JV)
File systems and Storage Architect
IBM Linux Technology Center
© 2014 IBM Corporation
2014
Outline
* Data is Exploding
* Storage Trends
* Unstructured Data
* Need for new solution – Object Store
* File vs Object and Object details
* Big question and answer
* FOBS – File and Object Based Storage
* Object Storage details and variations
* NFS Evolution and pNFS and future
* NFS-Ganesha
* Conclusions
LinuxCon 2014
Data is Exploding
* Growth: IDC says data will grow from 4.4 ZB today to 44 ZB by 2020
Source: http://www-01.ibm.com/software/data/bigdata/what-is-big-data.html
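As a rough check on the IDC figure, the implied compound annual growth rate can be computed directly. This is only a sketch: it assumes "today" means 2014, so the span is six years, which the slide does not state explicitly.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that turns `start` into `end` over `years`."""
    return (end / start) ** (1.0 / years) - 1.0

# Assumption: "today" is 2014, so 4.4 ZB -> 44 ZB spans 6 years.
implied = cagr(4.4, 44.0, 6)
print(f"Implied growth: {implied:.1%} per year")
```

Under that assumption, a tenfold increase in six years works out to a little under 47% per year.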
Storage Trends
* Data is growing at around 70% per year, and most of it is unstructured.
* Scale-out rather than scale-up.
* Object is gaining a lot of traction, but file is not going away; NAS will remain a significant player.
  - Analysts predict NAS will grow at a CAGR of 25.44% over 2013-2018. (http://cti.tmcnet.com/news/2014/04/04/7762020.htm)
* Unified storage: NAS, SAN, and Object
* Growth mantra: FOBS

IDC Projections
* Structured data will grow at a 21.8% CAGR
* Unstructured data will grow at a 61.7% CAGR

Market Needs and Adoption
* 2000 – Direct Attached Storage – SAN
* 2010 – Network Attached Storage – NFS, CIFS
* 2020 – File and Object Based Storage
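To make the two IDC growth rates easier to compare, the sketch below converts a CAGR into a doubling time. It assumes the rate stays constant, which is a simplification and not part of the projection.

```python
import math

def doubling_time_years(cagr: float) -> float:
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2.0) / math.log(1.0 + cagr)

for label, rate in [("structured", 0.218), ("unstructured", 0.617)]:
    print(f"{label}: doubles in about {doubling_time_years(rate):.1f} years")
```

At these rates, structured data doubles in roughly three and a half years, while unstructured data doubles in under a year and a half.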
Unstructured Data
* Basically non-database data.
* Usually generated on an event (Cheese... Click).
  - Typically no access or read-only access (photos, X-rays, dental records).
  - Tough to interpret the content (a JPEG can be a silly picture or a blueprint).
* Emails, instant messages, documents, spreadsheets, graphics, images, videos, social media, medical records, wearables, and on and on.
* Explosive growth drives the search for cost effectiveness and manageability.
* Why not continue with the file/NAS model?
  - Simple access model: no need for a heavy POSIX interface.
  - Scale-out: the hierarchical model is more of an overhead.
  - Context: it is difficult to build the context of an individual file (the entire path is needed).
  - Metadata is distributed, hence policies are complex and inefficient.
  - Loose/eventual consistency is often good enough.
Need for a new Solution – Requirements
* Simple interface
* Easy access, no need to traverse through dirs/subdirs
* Context of the contents
* Scale-out capabilities
* Massive and cheap
* Easy policies for ILM
File vs Object

File: Penguin.jpg
* FileName: Penguin.jpg
* Times: atime, mtime, ctime, etc.
* Owner:Group
* Permissions: Unix style, ACLs, etc.

Object: Penguin.jpg
* ObjectId, FileType, Times
* Camera Info, Resolution, Owner Name, Location, Copyright
* Orientation, YCbCr positioning, Compression, Exposure Time, X-Resolution, Y-Resolution
* Focal, Aperture, Flash, Focal Length, Color Space, Angle, Orientation
* Preferred Display, Category, Importance, Tags, Version, Notes, Voice/comment

* Object: simply an abstract container where data and metadata are co-located
Objects
* Rich metadata that co-exists with the data; easy policies
* Addressed by a 128-bit ID – flat access
* Checksum is part of the metadata
* Multiple file types can be in one object (a WAV and a JPEG)
* Cost effective because of eventual consistency and the lack of POSIX complexity.
* Scales well with off-the-shelf hardware
* Simple access protocol, RESTful API.
* Suited to the unstructured data generated by the digital world.
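The properties listed here (flat 128-bit addressing, metadata stored with the data, checksum kept in the metadata, policies driven by metadata) can be illustrated with a toy in-memory store. This is a minimal sketch, not any real product's design; the class name, method names, and the choice of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import uuid

class FlatObjectStore:
    """Toy in-memory object store: flat namespace, data and metadata together."""

    def __init__(self):
        self._objects = {}  # object id (128-bit int) -> (data, metadata)

    def put(self, data: bytes, **metadata) -> int:
        oid = uuid.uuid4().int  # random 128-bit identifier, no path involved
        meta = dict(metadata)
        meta["checksum"] = hashlib.sha256(data).hexdigest()  # checksum lives in metadata
        self._objects[oid] = (data, meta)
        return oid

    def get(self, oid: int):
        data, meta = self._objects[oid]
        if hashlib.sha256(data).hexdigest() != meta["checksum"]:
            raise ValueError("checksum mismatch")
        return data, meta

    def find(self, **wanted):
        """Policy-style query on metadata alone, e.g. all objects in a category."""
        return [oid for oid, (_, meta) in self._objects.items()
                if all(meta.get(k) == v for k, v in wanted.items())]

store = FlatObjectStore()
oid = store.put(b"...jpeg bytes...", file_type="jpeg", category="photo")
data, meta = store.get(oid)
print(sorted(meta))
```

Because every object carries its own metadata, a policy such as "find everything in category photo" needs no directory walk, which is the point of the flat model.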
Big Question
So...
File and NAS are DEAD?
...and the answer is:
NO
File and NAS will continue to grow
File and Object join hands
to keep the party going!
FOBS – File and Object Based Storage
FOBS – File and Object Based Storage
* Object storage works best for WORM workloads, and not all data fits that profile.
* Object is meant for low-cost mass storage which is not actively shared.
* Traditional applications continue to use file systems.
* File fills the part of the spectrum that needs a rich set of security and consistency guarantees.
* Object storage fills the space where file/NAS is weak.
* After all, most object stores and structured data stores (databases) are built on file systems.
FOBS – File and Object Based Storage
[Figure: volume-based market share versus access/update frequency, with FOBS spanning File and Object Store]
• File: Secure, Consistency, General Purpose, Performance, Legacy
• Object Store: WORM/Cold, Cost Effective, High Volume, Scalability, Manageability
Object Storage
Objects are broadly divided into two categories.

Storage Devices
* Move smarts into the device layer
* NASD, OSD-1, OSD-2, OSD layer on FS, etc.
* Access command set. Ex: SCSI model command set for OSD.
* Custom OSD model: Lustre, Ceph
* T10 OSD model: EXOFS, PanFS
* pNFS support.
* PBs of storage on 1000s of disks, 1000s of clients

Web Services
* Objects created on filesystems and accessed through the web.
* REST model – HTTP protocol. Operations: Get, Put, Post, Delete
* Highly available
* Loosely consistent.
* SWIFT, S3, Azure, etc.
* Gaining tremendous popularity.
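The REST model mentioned for web-services object stores maps each object operation onto an HTTP verb and a URL. The sketch below only builds the requests and never sends them; the endpoint and the helper name are hypothetical, and what each verb means exactly (for example, what Post updates) depends on the particular store.

```python
from typing import Optional
from urllib.request import Request

BASE = "https://objects.example.com/v1/acc/cont"  # hypothetical endpoint

def object_request(verb: str, name: str, body: Optional[bytes] = None) -> Request:
    """Build (but do not send) the HTTP request for one object operation."""
    verb = verb.upper()
    if verb not in {"GET", "PUT", "POST", "DELETE"}:
        raise ValueError(f"unsupported verb: {verb}")
    return Request(f"{BASE}/{name}", data=body, method=verb)

put = object_request("put", "penguin.jpg", b"...jpeg bytes...")
get = object_request("get", "penguin.jpg")
print(put.get_method(), put.full_url)
print(get.get_method(), get.full_url)
```

The whole access protocol is the verb plus the object's URL, with no open/seek/close state to track, which is what keeps the interface simple.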
Object Based Filesystem (Ex: Ceph)
* Provides a POSIX-compliant FS on top of the object-based Ceph Storage Cluster
* Files get mapped to objects and MDS below librados
* MDS stores all filesystem metadata (directories, owners, access info, etc.)
* Data is stored directly on OSDs
* Out-of-band IO: metadata provides the data location, and IO goes directly to OSDs
* Offers a kernel mount or FUSE interface
Source: http://ceph.com/docs/master/architecture/
Web Services Object Store - SWIFT
* Storage nodes hold objects, stored as binary files on the filesystem with metadata stored in the file's extended attributes (xattrs).
* Proxy nodes receive and process incoming requests and determine the correct storage server for each request.
* All objects stored in Swift have a URL.
* All stored objects are replicated 3x in as-unique-as-possible zones.
* Object data can be located anywhere in the cluster.
* Nodes/disks can be added without downtime.
Source: http://docs.openstack.org/training-guides/content/module003-ch007-cluster-architecture.html
NFS Evolution
* NFS is extremely popular and widely used.
* Stateless NFSv3 is very successful and is the de facto meaning of 'NFS'.
* Stateful NFSv4 came out in 2003.
  - Adoption is slow but has been gaining momentum since NFSv4.1 came out.
  - Became a stepping stone towards NFSv4.1.
* NFSv4.1, introduced in 2010, added enhancements and addressed NFSv4 deficiencies.
  - Improved performance: pNFS, directory delegations, trunking
  - Robustness: exactly-once semantics
  - Security: Windows native ACL support, Kerberized back channel
  - For time-to-market reasons, a few players are skipping NFSv4.0 and moving directly to NFSv4.1. Ex: VMware, Microsoft.
Parallel NFS - pNFS
* Allows direct client access to the storage devices
* Clients can do parallel IO across storage
* Layouts can be leased, recalled, and revoked.
* Removes IO bottlenecks and improves large file performance.
* Load balancing
* Scale-out model
* Control protocol is not standardized; it is a vendor value-add.
[Figure labels: POSIX Interface, File/Object Layout Driver, Control, ExoFS]
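The idea of parallel IO across storage devices can be sketched as a striping calculation: a layout tells the client which device holds which part of the file, and the client then talks to the devices directly. The code below is a simplified round-robin striping model under assumed parameters (stripe unit, device count); it is not the exact layout algorithm of RFC 5661, and the function name is hypothetical.

```python
def stripe_map(offset: int, length: int, stripe_unit: int, num_devices: int):
    """Split a byte range into (device, device_offset, length) chunks.

    Stripe units are laid out round-robin: unit 0 on device 0, unit 1 on
    device 1, and so on, wrapping around after num_devices units.
    """
    chunks = []
    end = offset + length
    while offset < end:
        unit = offset // stripe_unit      # global stripe unit index
        within = offset % stripe_unit     # position inside that unit
        device = unit % num_devices       # round-robin placement
        dev_offset = (unit // num_devices) * stripe_unit + within
        n = min(stripe_unit - within, end - offset)
        chunks.append((device, dev_offset, n))
        offset += n
    return chunks

# A 256 KiB read starting at 96 KiB, with a 64 KiB stripe unit and 4 devices.
KiB = 1024
chunks = stripe_map(96 * KiB, 256 * KiB, 64 * KiB, 4)
for chunk in chunks:
    print(chunk)
```

Each chunk can be issued to its device independently, so one large read turns into several smaller requests that proceed in parallel instead of queuing behind a single server.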
Parallel NFS - pNFS
* Supported layout types are open-ended.
* Supports three types of layouts
  - File (RFC 5661)
  - Block (RFC 5663)
  - Object (RFC 5664)
  - Future: Flexible File Layout (proposal) and others
* File Layout
  - Files, NFS protocol
  - Default layout and many implementations
* Block Layout
  - SCSI blocks, iSCSI, FCP, etc.
* Object Layout
  - OSD SCSI object protocol, OSD2
  - Few implementations: PanFS, Exofs (OSDFS)
[Figure labels: User Interface, NFSv4.1, pNFS Layouts (File, Obj, Block, Future), Network / IO stack]
NAS and Object to become FOBS
* Many traditional applications are written for POSIX access.
* Object storage is different and foreign to traditional applications.
* One solution is to create a filesystem layer on top of the object store.
  - Ex: the Maldivica storage connector creates a filesystem interface on top of a SWIFT object store, which can be exported via NFS/CIFS (NAS).
* Another is to provide an object interface on NAS.
  - Ex: Calsoft integrates NAS with a modified OpenStack SWIFT and provides a SWIFT interface on NAS.
Source: http://www.calsoftinc.com/OpenStack-Object-Storage-Swift.aspx# , http://maldivica.com/
Swift-on-File
* Ability to access the back end using both the object interface and the file interface.
* Swift-on-File stores objects following the same path hierarchy as the object's URL.
  - Object URL: https://swift.example.com/v1/acc/cont/obj
  - Swift: /mnt/sdb1/2/node/sdb2/objects/981/f79/f566bd022b9285b05e665fd7b843bf79/1401254393.89313.data
  - SoF: /mnt/gluster-vol/acc/cont/obj
* Enables objects created using the Swift API to be accessed as files on a POSIX filesystem.
* This opens up enormous possibilities, including using both NAS and RESTful interfaces to create and access the same data.
* Use case: create video files using SWIFT, use file access to transcode them, and let SWIFT clients access them in a different codec.
Source: https://github.com/swiftonfile/swiftonfile/blob/master/README.md
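The two on-disk layouts in the example can be reproduced with a few lines. This is a sketch under assumptions: the function names are hypothetical, the name hash is taken as given rather than computed (real Swift mixes in cluster-specific secrets), and the partition power of 10 is inferred from the example path rather than stated on the slide.

```python
from urllib.parse import urlparse

def swift_on_file_path(url: str, mount: str) -> str:
    """Map /v1/<account>/<container>/<object> onto the same hierarchy under mount."""
    parts = urlparse(url).path.strip("/").split("/", 3)
    if len(parts) != 4 or parts[0] != "v1":
        raise ValueError("expected /v1/<account>/<container>/<object>")
    _, account, container, obj = parts
    return f"{mount.rstrip('/')}/{account}/{container}/{obj}"

def swift_hash_dir(name_hash: str, part_power: int = 10) -> str:
    """Rebuild objects/<partition>/<suffix>/<hash> from an object's name hash.

    Assumption: the partition is the top part_power bits of the first
    32 bits of the hash, and the suffix is the last three hex digits.
    """
    partition = int(name_hash[:8], 16) >> (32 - part_power)
    return f"objects/{partition}/{name_hash[-3:]}/{name_hash}"

sof = swift_on_file_path("https://swift.example.com/v1/acc/cont/obj", "/mnt/gluster-vol")
plain = swift_hash_dir("f566bd022b9285b05e665fd7b843bf79")
print(sof)
print(plain)
```

The contrast is the point: the plain Swift path is derived from a hash and means nothing to a file client, while the Swift-on-File path is readable and can be exported as is over NAS.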
NFS for future
* NFS pathless objects: filesystem objects which can be created, queried for, and destroyed without being associated with a pathname. (http://tools.ietf.org/html/draft-dipankar-nfsv4-pathless-objects-02)
* Metastripe: IETF drafts are being proposed to stripe/scale metadata servers. (http://tools.ietf.org/html/draft-mbenjamin-nfsv4-pnfs-metastripe-01)
* Ceph provides access to the back-end RADOS object store through the LIBRADOS API, an S3/Swift-compatible API, block, and CephFS, which can be NFS exported, including pNFS. (http://ceph.com/docs/master/architecture/)
* pNFS over Ceph: CohortFS with metastripe
* pNFS over Lustre: CEA, French defense organization.
* Possible to offer selectable consistency with an NFS-backed object store versus a web-based one.
* OpenStack Manila project (https://wiki.openstack.org/wiki/Manila/)
NFSv4.2
* Advanced features take NFS into the advanced file sharing category.
* Performance:
  - Server-side copy: removes one leg of the copy operation.
  - IO_ADVISE: the client advises the server on the application's access pattern.
  - Application Data Blocks (ADB): ex: VM image file type.
  - Sparse file support.
* Security:
  - Labeled NFS: mandatory access control based on system-wide policy.
* Scalability and QoS:
  - Space reservation: reserve storage; useful in thin provisioning.
  - Hole punching: return unused parts of the file back to the pool.
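Sparse files, space reservation, and hole punching all come down to which blocks of a file are actually backed by storage. The toy model below tracks that per block; it is an illustration only, not an NFS client, and the class, its methods, and the 4 KiB block size are assumptions.

```python
class SparseFileModel:
    """Toy block-allocation model of one file; not an NFS client."""

    def __init__(self, block_size: int = 4096):
        self.block_size = block_size
        self.size = 0            # logical file size in bytes
        self.allocated = set()   # indices of blocks backed by real storage

    def _allocate(self, offset: int, length: int):
        if length <= 0:
            return
        first = offset // self.block_size
        last = (offset + length - 1) // self.block_size
        self.allocated.update(range(first, last + 1))
        self.size = max(self.size, offset + length)

    def write(self, offset: int, length: int):
        """Writing allocates only the blocks touched; gaps stay as holes."""
        self._allocate(offset, length)

    def reserve(self, offset: int, length: int):
        """Space reservation: back a range with storage before data arrives."""
        self._allocate(offset, length)

    def punch_hole(self, offset: int, length: int):
        """Free blocks lying entirely inside the range; file size is unchanged."""
        first = -(-offset // self.block_size)        # round up to a block boundary
        last = (offset + length) // self.block_size  # round down, exclusive
        self.allocated.difference_update(range(first, last))

    def allocated_bytes(self) -> int:
        return len(self.allocated) * self.block_size

f = SparseFileModel()
f.write(0, 4096)                   # one block at the start
f.write(1_000_000 * 4096, 4096)    # one block far out; the gap is a hole
print(f.size, f.allocated_bytes())
f.reserve(4096, 8 * 4096)          # guarantee space for eight more blocks
f.punch_hole(4096, 8 * 4096)       # hand those blocks back to the pool
print(f.size, f.allocated_bytes())
```

The logical size stays at about 4 GB throughout while the allocated space moves between two and ten blocks, which is the behavior thin provisioning relies on.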
NFS-Ganesha
* One of the mainstream NFS servers.
* User-level NFS server suitable for enterprise applications
  - Manageability and debuggability
  - Can manage huge metadata and data caches
  - Provision to exploit FS-specific features.
  - Can serve multiple types of file systems at the same time.
  - Can serve multiple protocols at the same time.
  - Can act as a proxy server and export a remote NFSv4 server.
  - Cluster friendly and cluster manager agnostic.
  - Easy recovery, failover, and failback implementation.
  - Multi-protocol support with common DLM (planned)
* Small but growing community.
* Active participants
  - IBM, Panasas, Red Hat, CohortFS (LinuxBox), CEA, Bull, and a few more.
http://tinyurl.com/kka8czz
NFS-Ganesha
* Supports many filesystems through the FSAL (File System Abstraction Layer)
  - VFS, GPFS, PanFS, Gluster, CEPH, Lustre, XFS, FUSE, Proxy, PT, etc.
* NFSv3, NFSv4.0, NFSv4.1, and pNFS support. Minimal NFSv4.2
* IBM, Red Hat, LinuxBox, and Panasas have released or are releasing products based on NFS-Ganesha.
* Releases 1.5, 2.0, and 2.1 are out; 2.2 is set to be GA'd by the end of October 2014.
  - Delegation, statistics, dynamic exports, LTTng support.
  - Supports file and object layouts of pNFS
  - Cluster Manager Abstraction Layer (CMAL): clustered DRC, DLM, multi-protocol support.
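The abstraction-layer idea, one server front end dispatching to interchangeable back ends, can be sketched in a few classes. This is purely illustrative: Ganesha's real FSAL interface is written in C and is far larger, and every class and method name below is hypothetical.

```python
from abc import ABC, abstractmethod

class Fsal(ABC):
    """Hypothetical back-end interface, standing in for the real FSAL API."""

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...

class MemoryFsal(Fsal):
    """Stand-in for a local filesystem back end."""
    def __init__(self, files):
        self.files = dict(files)

    def read(self, path: str) -> bytes:
        return self.files[path]

class ProxyFsal(Fsal):
    """Stand-in for a proxy back end that forwards to another server."""
    def __init__(self, remote: Fsal):
        self.remote = remote

    def read(self, path: str) -> bytes:
        return self.remote.read(path)

class Server:
    """Routes each request to the back end owning the longest matching export."""
    def __init__(self):
        self.exports = {}  # export prefix -> Fsal

    def add_export(self, prefix: str, fsal: Fsal):
        self.exports[prefix.rstrip("/")] = fsal

    def read(self, path: str) -> bytes:
        matches = [p for p in self.exports
                   if path == p or path.startswith(p + "/")]
        if not matches:
            raise FileNotFoundError(path)
        prefix = max(matches, key=len)
        return self.exports[prefix].read(path[len(prefix):] or "/")

server = Server()
server.add_export("/vfs", MemoryFsal({"/a.txt": b"local"}))
server.add_export("/proxy", ProxyFsal(MemoryFsal({"/b.txt": b"remote"})))
print(server.read("/vfs/a.txt"), server.read("/proxy/b.txt"))
```

Because the front end only sees the common interface, several back-end types can be exported from one server at the same time, and a new back end does not change the protocol code.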
Conclusions
* Object storage is expanding, file remains a very important part of the equation, and the two are expected to play together: FOBS is the future.
* Unified storage: NAS, Object, SAN
* NFS is progressing as a protocol; NFSv4.1 and pNFS support is a must to be competitive in the market.
* pNFS has major advantages: scale-out metadata and data, parallel IO, and performance improvement.
* NFSv4.2 is positioning NFS as a preferred filesystem/access protocol for future storage needs.
NFS-Ganesha links
* NFS-Ganesha is available under the terms of the LGPLv3 license.
* NFS-Ganesha project homepage on GitHub
  - https://github.com/nfs-ganesha/nfs-ganesha/wiki
* GitHub:
  - https://github.com/nfs-ganesha/nfs-ganesh
* Download page
  - http://sourceforge.net/projects/nfs-ganesha/files
Legal Statements
* This work represents the view of the author and does not necessarily represent the view of IBM.
* IBM is a registered trademark of International Business Machines Corporation in the United States and/or other countries.
* UNIX is a registered trademark of The Open Group in the United States and other countries.
* Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both.
* Other company, product, and service names may be trademarks or service marks of others.
* CONTENTS are "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. Author/IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.
Q & A
THANK YOU!