How to Prepare Your Organization for the Metadata Era

The Metadata Era
CONTENTS OF THIS WHITE PAPER
Introduction
The Dawn of the Metadata Era
Metadata Collection
Metadata Analysis
The Varonis Metadata Framework
Summary
INTRODUCTION
Over the past two decades, the widespread interconnectivity
and availability of computing resources have precipitated rapid
growth in digital collaboration and an exponential increase in
the amount of data that is created, shared, streamed, and
stored.
We are now entering a new era in which organizations have more
digital data than ever, and that data must be continuously
managed and protected in order to remain safe and retain its
value. To do so, organizations need continuous, up-to-date
information about the data. To comprehensively manage data, you
need metadata. Use and analysis of metadata is already more
common than we realize; automated collection, storage,
analysis, and presentation of metadata will become a necessity
in the new metadata era.
The digital revolution shares similarities with the transportation
revolution. When there were fewer than 10 automobiles in the
world, traffic lights weren’t necessary—it took about a hundred
years after cars showed up for the first documented electric
traffic light and motor vehicle speed limits to appear in the US.
Wilbur Wright didn’t file a flight plan when he took The Flyer out
for a spin in 1903—it took more than 30 years for the first
Airway Traffic Control Center to be built in 1935—a year that
saw over 30 airplane crashes (Source: planecrashinfo.com,
centennialofflight.gov).
The amount of data that IT organizations must manage daily
has reached that same watershed. IT teams are working at
capacity to manage and protect data manually as best they can:
responding to authorization requests, migrating data, and
cleaning up excessive access. Despite this effort, they have
been falling further and further behind for the past 15 years.
There is simply too much data being created too quickly to
manage, protect, and realize its full value without automated
collection, analysis, storage, and presentation of metadata; the
Metadata Era has arrived.
Varonis Systems, Inc.
THE DAWN OF THE METADATA ERA
When objects have value and they start to multiply, we must begin to cope with their increasing numbers. We must
observe them, manage them, and create and enforce rules in order to fully realize their benefits and potential. Not only do the
rules provide safety, but they provide a framework that enhances the value of the objects—we can drive faster with
designated lanes and fly more planes with air traffic control. Prior to the digital revolution, the number of information items
was relatively small, and distribution was slow. Organizations grew to the capacities of their human networks; hierarchies
of humans and file cabinets created a pool of tribal knowledge to draw from and analyze. Rules about who had access to
what information pertained to verbal and physical (paper) distribution.
As individuals and organizations started to use computers and create files, they naturally began to organize, manage,
and protect them using the available controls—they put them in folders on servers and protected them with access
control lists—the same methods that they used with their physical files, like paper folders in a shared file cabinet with
locks on some of the drawers. This worked reasonably well while the set of files and number of users was still relatively
small, though even small workgroups could lose track of their data assets before long.
Fast forward 15 years, and the amount of available digital information has increased by several orders of magnitude,
the criticality of the data has grown, and the need for collaboration through digital information has never been
greater. Gartner estimates that 80% of all data is unstructured, and that it will grow by 650% in the next 5 years, or
roughly 50% year over year. Not only are personnel within organizations digitally collaborating on a daily basis, but
interdepartmental digital collaboration at scale is vital. Data is more valuable when it is organized, managed, and
protected; it is more available and less at risk of loss, theft, or tampering.
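
The two growth figures quoted above agree with each other, as a quick compound-growth calculation shows. The sketch below assumes that "grow by 650%" means the final volume is 7.5 times the starting volume; the function name is illustrative and not part of any cited source.

```python
def annual_growth_rate(total_growth_pct: float, years: int) -> float:
    """Return the constant yearly growth rate (as a fraction) that compounds
    to the given total percentage growth over the given number of years."""
    # Assumption: 650% growth means the final volume is 7.5x the starting volume.
    final_multiple = 1 + total_growth_pct / 100
    return final_multiple ** (1 / years) - 1


rate = annual_growth_rate(650, 5)
print(f"Equivalent yearly growth: {rate:.1%}")
```

Under that reading, the yearly rate works out to just under 50%, which matches the "roughly 50% year over year" figure.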
Organizations now store more and more information about their customers and partners, and have a responsibility to
safeguard it. Failure to protect this data can be damaging to organizations and individuals beyond the organization
storing the data; partners and customers now expect assurance that their information is being consistently protected in
order to conduct business. Managing and protecting millions of files manually is unrealistic, so more containers (shared
folders, sites, etc.) are needed to share the files among changing, cross-functional teams. More containers mean more
access decisions and reviews under more intense pressure. It can be almost impossible to mentally calculate the many
complex functional relationships between users, groups, and data—wrong decisions mean lost productivity and
increased risk.
Currently, an average terabyte of data contains roughly 50,000 folders. Of those, about 2,500 usually have
unique permissions applied to them. These folders’ permissions usually refer to several groups, each containing anywhere
from a few to dozens of users; an organization of 1,000 users often has 1,000 or more groups stored in its directory
service (e.g., Active Directory). All of these folder permissions and groups need to be maintained and updated as
employees change roles, yet 91% of organizations can’t identify business owners for their folders (Source: Ponemon
Institute Study, June 2008), nor can they determine which folders their groups grant access to.
IT simply cannot keep up by manually creating, updating and reviewing spreadsheets. Manual techniques are acceptable
when the numbers of objects and the functional relationships between them are relatively small, but as the numbers of
objects and relationships grow, a person’s ability to effectively observe and analyze them diminishes very quickly. Luckily,
computers are excellent tools to analyze large amounts of information, as long as we can program them to collect,
process, and analyze relevant metadata.

WORLDWIDE HEADQUARTERS
499 7th Ave., 23rd Floor, South Tower
New York, NY 10018
Phone: 877-292-8767
[email protected]

EUROPE, MIDDLE EAST AND AFRICA
1 Northumberland Ave., Trafalgar Square
London, United Kingdom WC2N 5BW
Phone: +44-0-800-756-9784
[email protected]

METADATA COLLECTION
In order to manage data, we need metadata that will help us determine, for example, who it belongs to, who has access
to it, who uses it, and what kind of content it contains. Metadata comes in many forms: files and folders have names, sizes,
access timestamps, and permissions. Personnel don’t usually annotate the files they create with useful information, so
many organizations are automating content analysis to discover files that contain interesting data, like those that concern
special projects or contain regulated content like credit card numbers or other private customer information. This
metadata is commonly called “file classification.”
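
As a rough illustration of the two kinds of metadata described above, the sketch below walks a directory tree, records each file's size, timestamps, and permission bits, and attaches a simple classification flag when the content contains a plausible credit card number. It is a minimal sketch under stated assumptions, not a description of any product: the function names, the record layout, and the "regulated" label are all hypothetical, POSIX mode bits stand in for full access control lists, and the Luhn checksum is only one common way to filter out digit runs that cannot be card numbers.

```python
import os
import re
import stat
import tempfile
from datetime import datetime, timezone

# Candidate card numbers: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right and sum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def contains_card_number(text: str) -> bool:
    """True if any digit run in the text passes the Luhn check."""
    for match in CARD_PATTERN.finditer(text):
        if luhn_valid(re.sub(r"\D", "", match.group())):
            return True
    return False


def collect_metadata(root: str) -> list:
    """Return one metadata record (a dict) per file found under root."""
    records = []
    for folder, _subfolders, files in os.walk(root):
        for name in files:
            path = os.path.join(folder, name)
            info = os.stat(path)
            # Simplification: read the whole file as text, skipping undecodable bytes.
            with open(path, "r", errors="ignore") as handle:
                text = handle.read()
            records.append({
                "path": path,
                "size_bytes": info.st_size,
                "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
                "accessed": datetime.fromtimestamp(info.st_atime, tz=timezone.utc),
                "mode": stat.filemode(info.st_mode),
                "classification": "regulated" if contains_card_number(text) else "unclassified",
            })
    return records


if __name__ == "__main__":
    # Demonstration on a throwaway directory holding one made-up file.
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "order.txt"), "w") as handle:
            handle.write("Card on file: 4111 1111 1111 1111")
        for record in collect_metadata(root):
            print(record["path"], record["size_bytes"], record["mode"], record["classification"])
```

A single pass like this yields only a snapshot; it records nothing about who opened a file, which is the gap an audit trail fills.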
Another useful metadata element is a record of who is using each file, or audit trail. Unfortunately, most organizations
currently have no audit trail of data usage because native operating system auditing is too taxing on disk and CPU
resources. Imagine trying to manage your finances without a record of expenses!
Each file and folder has many metadata elements associated with it at any given point in time. If we track changes and
access activity, the associated metadata grows very quickly. The constantly changing files and folders generate streams
of metadata, and the combined metadata streams become a torrent. To capture, analyze, store and understand so much
metadata requires technology specifically designed for this purpose.
Adding to the complexity, these files and folders have interesting functional relationships between them—folders contain
many files, some users access the same folders and files, some files contain the same types of content, and some
folders are accessible by the same people. The number of functional relationships between metadata elements is
another order of magnitude greater than the elements themselves.
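
To make the idea of functional relationships concrete, the sketch below derives one kind of relationship, pairs of users who access the same folder, from a handful of access records. The sample data and the function name are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical access records: (user, folder) pairs.
ACCESS = [
    ("alice", "/finance"), ("bob", "/finance"), ("carol", "/finance"),
    ("alice", "/legal"), ("bob", "/legal"),
    ("carol", "/hr"), ("dave", "/hr"),
]


def shared_folder_pairs(access):
    """Map each pair of users to the set of folders that both have accessed."""
    users_by_folder = defaultdict(set)
    for user, folder in access:
        users_by_folder[folder].add(user)

    pairs = defaultdict(set)
    for folder, users in users_by_folder.items():
        # Every unordered pair of users in the same folder is one relationship.
        for first, second in combinations(sorted(users), 2):
            pairs[(first, second)].add(folder)
    return dict(pairs)


relationships = shared_folder_pairs(ACCESS)
for pair, folders in sorted(relationships.items()):
    print(pair, sorted(folders))
```

Even this toy set of four users and three folders yields four user-to-user relationships. Because n users can form up to n(n-1)/2 pairs, the number of relationships climbs much faster than the number of users or folders.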
METADATA ANALYSIS
Simply collecting the metadata will not be enough to help us visualize and understand the complex functional
relationships which surround our data; the metadata must be synthesized and analyzed to help us determine where
sensitive data is exposed, who it belongs to, who has excessive permissions to it, and identify other data management
and protection concerns. The torrent of metadata elements and the functional relationships between them are far too
numerous and complex for humans to analyze effectively, so we must turn to automated analysis.
Automated analysis already plays a large part in how we interact with the world. For example:
• You probably used an Internet search engine today, and more than once.
• Amazon.com now makes recommendations about books I might like based on what I’ve previously ordered.
• Credit card companies analyze transactions to spot possible fraudulent activity.
• I met my wife on Match.com: my profile popped up on her monitor with the words, “If you liked him, you might
  also like this guy,” after she had contacted some other fellow (tough luck, pal).
iTunes and other online shopping engines have similar functionality.
Automated analysis transforms an overwhelming set of objects into a digestible one, picking out items of high interest so
we don’t have to ferret through them manually. There are simply too many websites, books, songs, people using credit
cards, and potential mates for any human to go through them all, much less analyze them. Automating the analysis of
metadata will help us find the data and access rights that require our attention. One technology is built for this purpose:
the Varonis Metadata Framework.
THE VARONIS METADATA FRAMEWORK
The Varonis Metadata Framework non-intrusively collects critical metadata, generates metadata where existing metadata
is lacking (e.g., through its file system filters and content inspection technologies), pre-processes it, normalizes it,
analyzes it, stores it, and presents it to IT administrators and data owners in an interactive, dynamic interface.
Four distinct metadata streams are currently collected:
• User and Group Information – From Active Directory, LDAP, NIS, SharePoint, etc.
• Permissions Information – Knowing who can access what data in which containers
• Access Activity – Knowing which users do access what data, when and what they’ve done
• Sensitive Content Indicators – Knowing which files contain items of sensitivity and importance, and where they
  reside
With these metadata streams collected, synthesized, processed, and presented intelligently by the Varonis framework,
organizations can regularly answer the numerous pressing questions that arise in data governance:
• Who has access to a data set?
• Who should have access to a data set?
• Who has been accessing it?
• What other data have they been accessing?
• Who is the likely data owner?
• Which data is sensitive?
• Where is my sensitive data overexposed and how do I fix it?
• What data is unused?
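
Several of these questions reduce to simple set operations once the four streams are held side by side. The sketch below is a toy, in-memory model under stated assumptions: the sample groups, folders, and events are invented, the function names are hypothetical, and treating an "everyone" group as overly broad is an illustrative choice. It does not describe how the Varonis framework stores or queries its data.

```python
# Stream 1: user and group information (hypothetical sample data).
GROUPS = {
    "finance-team": {"alice", "bob"},
    "everyone": {"alice", "bob", "carol", "dave"},
}
# Stream 2: permissions, as folder -> groups granted access.
PERMISSIONS = {
    "/finance/payroll": {"finance-team", "everyone"},
    "/public/menu": {"everyone"},
    "/archive/2009": {"finance-team"},
}
# Stream 3: access activity, as (user, folder) events.
ACCESS_EVENTS = [
    ("alice", "/finance/payroll"),
    ("alice", "/finance/payroll"),
    ("bob", "/finance/payroll"),
    ("carol", "/public/menu"),
]
# Stream 4: sensitive content indicators.
SENSITIVE = {"/finance/payroll"}


def who_has_access(folder):
    """Expand the folder's groups into the set of users who can reach it."""
    users = set()
    for group in PERMISSIONS.get(folder, set()):
        users |= GROUPS.get(group, set())
    return users


def who_has_been_accessing(folder):
    """Users with at least one recorded access event on the folder."""
    return {user for user, accessed in ACCESS_EVENTS if accessed == folder}


def unused_access(folder):
    """Users who can reach the folder but have no recorded activity on it."""
    return who_has_access(folder) - who_has_been_accessing(folder)


def unused_data():
    """Folders with permissions defined but no recorded activity at all."""
    return set(PERMISSIONS) - {folder for _user, folder in ACCESS_EVENTS}


def overexposed_sensitive(broad_groups=("everyone",)):
    """Sensitive folders that are open to a group assumed to be too broad."""
    return {f for f in SENSITIVE if PERMISSIONS.get(f, set()) & set(broad_groups)}


print("Can access payroll:", sorted(who_has_access("/finance/payroll")))
print("Have accessed payroll:", sorted(who_has_been_accessing("/finance/payroll")))
print("Access never used:", sorted(unused_access("/finance/payroll")))
print("Unused data:", sorted(unused_data()))
print("Sensitive and overexposed:", sorted(overexposed_sensitive()))
```

The question "Who should have access?" has no purely mechanical answer. The difference between who can access a folder and who actually does is only a starting point for a data owner's review.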
Like search engines and online stores, Varonis uses sophisticated analytics to identify objects of interest, such as users
whose access activity indicates that they have changed roles yet still have access to data sets that are no longer
relevant to them, or users who suddenly access a statistically significant number of files. Varonis also uses automation to
help identify data owners: the most active users of a high-level container to which the business has write access are very
likely candidates. Once data owners are identified, they can make informed authorization and permissions maintenance
decisions through a web-based interface, and those decisions are then executed with no IT overhead or manual backend
processes.
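
Two of the heuristics mentioned above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the analytics Varonis actually uses: the likely owner is taken to be the single most active user of a folder, and a "statistically significant" jump in activity is approximated as a daily count more than three standard deviations above the user's own history. The function names and the threshold are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev


def likely_owner(events, folder):
    """Return the most active user of a folder, or None if it has no activity."""
    counts = Counter(user for user, accessed in events if accessed == folder)
    return counts.most_common(1)[0][0] if counts else None


def is_activity_spike(history, today, threshold=3.0):
    """Flag today's file-access count if it sits more than `threshold`
    sample standard deviations above the mean of the user's history."""
    if len(history) < 2:
        return False  # Too little history to estimate a spread.
    average = mean(history)
    spread = stdev(history)
    if spread == 0:
        return today > average
    return (today - average) / spread > threshold


# Hypothetical sample data.
events = [
    ("alice", "/finance"), ("bob", "/finance"), ("alice", "/finance"),
    ("carol", "/hr"),
]
daily_counts = [10, 12, 9, 11, 10, 13, 8]

print("Likely owner of /finance:", likely_owner(events, "/finance"))
print("400 files today is a spike:", is_activity_spike(daily_counts, 400))
print("12 files today is a spike:", is_activity_spike(daily_counts, 12))
```

A candidate identified this way would still need to be confirmed by the business before being treated as the data owner.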
By collecting, processing, analyzing and presenting metadata to IT and the business, Varonis completes a full value
cycle for IT and the organization: comprehensive visibility into access rights, auditing of all access and authorization
activities, automated recommendations for where access should be restricted and activity scrutinized, and clear, robust
interfaces and reports. Effective access control, comprehensive auditing and data ownership are the foundations of data
management and protection—not only will they address most current data governance issues, they will also enable
successful execution of future data management and protection initiatives.
The Varonis Metadata Framework will scale to present and future requirements using standard computing infrastructure,
even as the number of functional relationships between metadata entities grows exponentially. As new platforms and
metadata streams emerge, they will be seamlessly absorbed into the Varonis framework and the productive
methodologies it enables for data management and protection.
SUMMARY
To fully realize the benefits of the digital information revolution, organizations will need governance, automation, and
analysis; data is simply more valuable when it is organized, managed, and protected. Managing and protecting data
without automation will be as inefficient and ineffective as trying to find information on the Internet without a decent
search engine. With exponential growth of mutually critical data shared by organizations, customers, partners, and
employees, organizations that do not protect and manage their data with automation will struggle to remain competitive
and survive. Those that do will have significant advantages: the right data will be more promptly available to the right
people, and only the right people. Intellectual property will be secure, and secrets will stay secret. Customers and
partners will have confidence that shared information is protected.
To keep up with their already overwhelming data-related tasks, like permissions management, data auditing, data
ownership, data classification, data migrations, and archiving, IT will inevitably need metadata and automated
analysis. Together they will provide actionable intelligence and workflows that augment and accelerate existing business
processes. Automation will accelerate tasks that IT labors over on a daily basis, like creating permissions reports,
finding lost and deleted files, securing sensitive data, and remediating folders and SharePoint sites that are incorrectly
permissioned. IT will also be able to perform tasks that it cannot do today, such as identifying data owners and
providing them with actionable information about who has access to their data, who accesses their data, and who has
access that probably shouldn’t. Organizations that adopt and embrace metadata technology will have a distinct advantage
over those that do not: they will be more efficient, secure, and cost effective. Organizations that can
harness the power of metadata will be leaders in the era following the digital information revolution, the era
of metadata.
FOR MORE INFORMATION
Phone: 877-292-8767
www.varonis.com/product