Data Analysis and Visualization Guide - Documentation

Data Analysis and Visualization Guide
Version 4.5
Copyright Platfora 2015
Last Updated: 10:15 p.m. June 28, 2015
Contents
Document Conventions
Contact Platfora Support
Copyright Notices
Chapter 1: About Data Analysis in Platfora
FAQs - Data Analysis and Analytics in Platfora
Quantitative Analysis Concepts
Event Series Analysis Concepts
Chapter 2: Get Going with Vizboards
FAQs—Vizboard Basics
The Vizboard Workspace
How Interactive Viz Queries Work
Turn Live Updates Off and On
Chapter 3: Get Going in Viz Builder
About Drag and Drop Viz Building
About the Different Viz Types
About the Builder Drop Zones
About the Viz Toolbar and Menus
FAQs—Viz Builder Basics
Chapter 4: Get Data to Analyze
About Lenses and Viz Types
Choose a Lens
Find Fields in a Lens
Understand Lens Field Types and Roles
Understand the Data in a Lens Field
View the Data Lineage of a Field
View the Definition of a Field
View the Definition of a Segment Field
Chapter 5: Control the Marks on a Viz
Encode Data Using Mark Drop Zones
About Mark Types
Use Bar Marks
Use Point Marks
Use Line Marks
Use Area Marks
Use Path Marks
Data Analysis and Visualization Guide - Contents
Use Polygon Marks
Use Text Marks
Adjust Mark Appearance
Show Mark Outlines
Adjust Mark Color
Adjust Mark Size
Adjust Mark Opacity
Adjust Mark Shape
Adjust Mark Labels
Chapter 6: Control Field Labels in a Viz
Truncate Field Labels in a Viz
Change Field Labels Width in a Viz
Change Field Display Names on an Axis
Hide Field Name and Values on a Chart Axis
Change the Number Formatting of a Measure
Chapter 7: Sort Viz Data
Default Sort Behavior
Change the Sort Order of a Dimension Axis
Chapter 8: Filter Viz Data
FAQs—Viz Filters
Add a Filter on a Field
Filter on Dimension Fields
Filter on Date Fields
Filter on Measure Fields
Create a Page Filter
Toggle Filter Include/Exclude Mode
Filter by Selection
Filter by Limit
Chapter 9: Build a Chart Viz
FAQs—Chart Visualizations
About the Chart Viz Workspace
Chapter 10: About Chart Viz Axes
Use Multiple Fields on an Axis
Transpose the X and Y Axis
Change Axis Options for Measure Data
Change the Value Range of a Measure Axis
Change the Scale of a Measure Axis
Change Axis Options for Dimension Data
Change the Type of a Dimension Axis
Chapter 11: Build a Cross-Tab Viz
FAQs—Cross-Tab Visualizations
Enable Cross-Tab Totals
Chapter 12: Build a Polar Chart Viz
FAQs—Polar Chart Visualizations
About the Polar Chart Viz Workspace
Chapter 13: Build a Geo Map Viz
FAQs—Geo Map Visualizations
About the Geo Map Viz Workspace
Chapter 14: Build a Funnel Viz
FAQs—Funnel Analysis Visualizations
About Event Series Analysis
About the Funnel Analysis Viz Workspace
Define and Analyze Funnel Stages
Analyze Funnel Stages Across Dimensions
Chapter 15: Explore Marks in a Viz
Select and Highlight Marks on a Viz
Understand Data Values Not Displayed in Viz
View the Data Values for a Mark
Zoom and Pan in a Viz
Drill Down Through Dimension Fields
About Drilling Down
Drill Down FAQ
Drill Down on a Field Value in a Chart Axis
Drill Down on a Viz Mark
Drill Down on a Cross-Tab Cell
View a Drill Path in a Viz
Drill Up
Chapter 16: Prepare Pages and Dashboards
FAQs—Vizboard Pages
Resize a Page to Fit the Browser Window
Show and Hide Tool Panels
Manage Viz Layout
Edit a Visualization
Arrange Visualizations on a Vizboard Page
Preview a Vizboard with View Only Permission
Chapter 17: Share and Collaborate
Set Vizboard Permissions
View Vizboard with View Only Permission
Manage Vizboard Comments
FAQs—Vizboard Comments
Create a Comment on a Viz
Share a Link to a Vizboard
Export Viz Data
Export a Viz Image
Email a Single Viz as an Image
Share Vizboard as a PDF
FAQs—Vizboard PDFs
Export a Vizboard as a PDF Manually
Email a Vizboard as a PDF Manually
Email a Vizboard as a PDF on a Schedule
How Platfora Renders a Vizboard as a PDF
Export a Viz as a New Dataset
Chapter 18: Request or Derive Additional Lens Fields
Vizboard Computed Fields
FAQs—Vizboard Computed Fields
Add a Vizboard Computed Field
Combined Fields
About Combined Fields
Create Combined Field
Request Additional Lens Data
Create a New Lens From Viz
Segments
FAQs—Segments
Create Segments
Chapter 19: Save Your Work in a Vizboard
Manage Vizboard Versions
Restore a Vizboard to a Previous Version
Exit a Vizboard without Saving
Using Undo and Redo in a Vizboard
Duplicate a Vizboard
Chapter 20: Trace the Data Lineage of Viz Fields
Export Viz Data Lineage
What Data Lineage Includes
Interpret Data Lineage Levels
Chapter 21: Viz Example Gallery
Axis Chart Viz Examples
Chart Type: Simple Bar
Chart Type: Bars with Different Color Values
Chart Type: Stacked Bar
Chart Type: Split Bar with Values
Chart Type: Bar with Variable Widths
Chart Type: Point Plot
Chart Type: Scatter Plot
Chart Type: Color Encoded Scatter Plot
Chart Type: Bubble Chart
Chart Type: Color Encoded Bubble Chart
Chart Type: Gradient Grouped Scatter Plot
Chart Type: Shape Encoded Scatter Plot
Chart Type: Heatmap
Chart Type: Size Encoded Heatmap
Chart Type: Size Encoded Matrix
Chart Type: Line Chart
Chart Type: Multi-Series Line Chart
Chart Type: Color Encoded Multi-Series Line Chart
Chart Type: Variable Color Line Chart
Chart Type: Variable Thickness Line Chart
Non-Axis Chart Viz Examples
Chart Type: Packed Bubbles
Chart Type: Packed Bubbles with Different Colors
Chart Type: Text Gauge
Chart Type: Word Cloud
Polar Chart Viz Examples
Polar Chart Type: Donut
Polar Chart Type: Size Encoded Donut
Polar Chart Type: Pie
GeoMap Viz Examples
Chart Type: Simple Geo Map
Chart Type: Color-Encoded Geo Map
Chart Type: Size-Encoded Geo Map
Cross-Tab Viz Examples
Cross-Tab Type: Simple
Cross-Tab Type: With Dimensional Groupings
Cross-Tab Type: Show Totals
Chapter 22: Platfora Expressions
Expression Building Blocks
Functions in an Expression
Operators in an Expression
Fields in an Expression
Literal Values in an Expression
PARTITION Expressions and Event Series Processing (ESP)
How Event Series Processing Works
Best Practices for Event Series Processing (ESP)
ROLLUP Measures and Window Expressions
Understand ROLLUP Measures
Understand ROLLUP Window Expressions
Computed Field Examples
Troubleshoot Computed Field Errors
Write a Lens Query
FAQs - Expression Basics
Expression Language Reference
Expression Quick Reference
Comparison Operators
Logical Operators
Arithmetic Operators
Conditional and NULL Processing
Event Series Processing
String Functions
URL Functions
IP Address Functions
Date and Time Functions
Math Functions
Data Type Conversion Functions
Aggregate Functions
ROLLUP and Window Functions
User Defined Functions (UDFs)
Regular Expression Reference
Preface
This guide provides information and instructions for analyzing data in Platfora®. This guide is intended
for data analysts and business analysts who are responsible for exploring data, finding insights, and
building dashboards and reports. Knowledge of business intelligence and data analysis is recommended.
Document Conventions
This documentation uses certain text conventions for language syntax and code examples.
Convention: $
Usage: A command-line prompt precedes a command to be entered in a command-line terminal session.
Example: $ ls

Convention: $ sudo
Usage: A command-line prompt for a command that requires root permissions (commands are prefixed with sudo).
Example: $ sudo yum install open-jdk-1.7

Convention: UPPERCASE
Usage: Function names and keywords are shown in all uppercase for readability, but keywords are case-insensitive (they can be written in upper or lower case).
Example: SUM(page_views)

Convention: italics
Usage: Italics indicate a user-supplied argument or variable.
Example: SUM(field_name)

Convention: [ ] (square brackets)
Usage: Square brackets denote optional syntax items.
Example: CONCAT(string_expression[,...])

Convention: ... (ellipsis)
Usage: An ellipsis denotes a syntax item that can be repeated any number of times.
Example: CONCAT(string_expression[,...])
Data Analysis and Visualization Guide - Introduction
Contact Platfora Support
For technical support, you can send an email to:
[email protected]
Or visit the Platfora support site for the most up-to-date product news, knowledge base articles, and
product tips.
http://support.platfora.com
To access the support portal, you must have a valid support agreement with Platfora. Please contact
your Platfora sales representative for details about obtaining a valid support agreement or with questions
about your account.
Copyright Notices
Copyright © 2012-15 Platfora Corporation. All rights reserved.
Platfora believes the information in this publication is accurate as of its publication date. The
information is subject to change without notice.
THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” PLATFORA
CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH
RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS
IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR
PURPOSE.
Use, copying, and distribution of any Platfora software described in this publication requires an
applicable software license. Platfora®, You Should Know™, Interest Driven Pipeline™, Fractal Cache™,
and Adaptive Job Synthesis™ are trademarks of the Platfora Corporation. Apache Hadoop™ and Apache
Hive™ are trademarks of the Apache Software Foundation. All other trademarks used herein are the
property of their respective owners.
Embedded Software Copyrights and License Agreements
Platfora contains the following open source and third-party proprietary software subject to their
respective copyrights and license agreements:
• Apache Hive PDK
• dom4j
• freemarker
• GeoNames
• Google Maps API
• javassist
• javax.servlet
• Mortbay Jetty 6.1.26
• OWASP CSRFGuard 3
• PostgreSQL JDBC 9.1-901
• Scala
• sjsxp 1.0.1
• Unboundid
Chapter 1: About Data Analysis in Platfora
Got questions about the types of data analysis and analytics you can do in Platfora? Want to know how to
analyze data once you have it in a lens? This section explains how data analysts work with data in
Platfora and introduces the major concepts of data analysis in Platfora.
Topics:
• FAQs - Data Analysis and Analytics in Platfora
• Quantitative Analysis Concepts
• Event Series Analysis Concepts
FAQs - Data Analysis and Analytics in Platfora
This topic answers the most frequently asked questions (FAQs) about data analysis and analytics in
Platfora.
What is the difference between 'data analysis' and 'data analytics'?
The terms analysis and analytics are often used interchangeably, and indeed the difference between
the two terms is subtle. Analysis refers to the process of analyzing data, whereas analytics refers to the
technology and methodologies involved in analyzing data. Basically, analysis and analytics serve
the same function, but analytics refers to a specific application of statistical methodology or computer
technology to data analysis.
What is 'big data analytics'?
Big Data Analytics refers to the collection of technologies and methodologies for processing and
analyzing large amounts of data of many different types. The primary goal of big data analytics is to
uncover hidden patterns, unknown correlations, and other useful information in order to make better
business decisions.
Big data analytics allows business analysts and data scientists to analyze large volumes of transactional
data, as well as other data sources that traditional data warehouses and business intelligence (BI)
reporting programs cannot handle. These other data sources may include server logs, web
clickstream data, social media reports, mobile device records, and machine-generated sensor data.
Data Analysis and Visualization Guide - About Data Analysis in Platfora
Platfora is an end-to-end big data analytics solution for high-volume, multi-structured data stored
natively in Hadoop.
How does Platfora prepare data for analysis?
Platfora prepares data for analysis by building a Lens, Platfora's proprietary data storage object. Lenses
contain pre-aggregated, compressed, columnar data that is optimized for interactive analysis. Lenses
are loaded into memory on the Platfora servers as they are used. This makes the experience of building
visualizations (lens queries) fast and responsive. Analysts can build their own lenses by choosing fields
of interest from a Dataset in the Platfora data catalog. Platfora datasets point to raw data in Hadoop.
There are two kinds of lenses you can build in Platfora: the Aggregate Lens and the Event Series Lens (ESL).
Aggregate lenses are the most common lens type and the most flexible in terms of the types of analysis
you can do.
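To make pre-aggregation concrete, here is a minimal Python sketch. It is not Platfora's implementation. It rolls raw records up to one row per combination of the chosen dimension fields and keeps a record count and a summed measure for each row. The `build_aggregate_lens` helper and the field names (`country`, `device`, `page_views`) are hypothetical.

```python
from collections import defaultdict

def build_aggregate_lens(records, dimensions, measure):
    """Hypothetical sketch: pre-aggregate raw records by the chosen dimension fields.

    Returns a dict mapping each tuple of dimension values to the row count
    and the SUM of the measure field for that group.
    """
    lens = defaultdict(lambda: {"count": 0, "sum": 0})
    for record in records:
        key = tuple(record[d] for d in dimensions)
        lens[key]["count"] += 1
        lens[key]["sum"] += record[measure]
    return dict(lens)

# Hypothetical raw records, standing in for data in Hadoop.
raw = [
    {"country": "US", "device": "mobile", "page_views": 3},
    {"country": "US", "device": "mobile", "page_views": 5},
    {"country": "US", "device": "desktop", "page_views": 2},
    {"country": "FR", "device": "mobile", "page_views": 4},
]

lens = build_aggregate_lens(raw, ["country", "device"], "page_views")
for key, agg in sorted(lens.items()):
    print(key, agg)
```

Here the four raw records collapse to three lens rows. A real lens also compresses the result and stores it in columnar form, which this sketch does not attempt.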
How can I tell where the data came from and how it was prepared?
Platfora can show the data lineage of any field in a lens, all the way back to the raw data file(s) in
Hadoop. Analysts can see every transformation that happened to the data along the way.
You might want to view data lineage to address any of the following questions about the data:
• How can I reproduce this result?
• Where did this data come from?
• How recent is the data?
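You can think of lineage as a chain of steps that leads from a lens field back to its source file. The minimal Python sketch below models that chain with a hypothetical `LineageStep` record and walks it from the lens field back to the raw file. The step names, the computed-field formula, and the HDFS path are illustrative only and do not reflect Platfora's actual lineage format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class LineageStep:
    """One hypothetical step in a field's lineage."""
    name: str                                # what this step produced
    operation: str                           # how it was produced
    parent: Optional["LineageStep"] = None   # the step it was derived from (None for raw data)

def trace(step):
    """Walk from a lens field back to the raw source, returning one line per step."""
    path = []
    while step is not None:
        path.append(f"{step.name} <- {step.operation}")
        step = step.parent
    return path

# Illustrative chain: raw file -> dataset field -> computed field -> lens field.
raw_file = LineageStep("hdfs:///data/weblogs/2015-06/part-00000", "raw file")
dataset_field = LineageStep("weblogs.duration_ms", "parsed from raw file", raw_file)
computed = LineageStep("weblogs.duration_sec", "computed field: duration_ms / 1000", dataset_field)
lens_field = LineageStep("lens.duration_sec (SUM)", "aggregated in lens build", computed)

result = trace(lens_field)
for line in result:
    print(line)
```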
What kinds of analysis can I do in Platfora?
The Platfora documentation groups analysis into two broad categories that correspond to the two ways
Platfora prepares data for analysis (the two lens types). Within these categories, there are many different
kinds of analysis you can do and many different analytics methodologies you can apply. These are just
some of the general types of analysis possible:
• Quantitative Analysis
When you build an Aggregate Lens in Platfora, you are preparing data for quantitative analysis, and the
data is pre-aggregated. This is a broad category that encompasses many different analysis techniques and
methodologies, such as:
  • Descriptive Analysis - Describe the main features of a large collection of data.
  • Confirmatory Analysis - Confirm or negate a hypothesis.
  • Exploratory Analysis - Find previously unknown relationships in the data.
  • Inferential Analysis - Use a smaller sample of data to learn something about a bigger population.
  • Causal Analysis - Find out what happens to one variable when you change another.
• Event Series Analysis
When you build an Event Series Lens in Platfora, you are preparing data for event series analysis
(also known as Time Series Analysis or Behavioral Analysis). The data is not pre-aggregated; it is
grouped by a common entity (such as a user ID) and ordered by time to facilitate searching for patterns
across multiple event records and datasets.
What tools does Platfora provide for analyzing data?
Platfora has several tools for querying the prepared data in a lens, but the primary way to analyze data
in Platfora is to use the Viz Builder tools in a Vizboard. By dragging fields to the various drop zones
in the viz builder panel, you can dynamically generate lens queries that are rendered as Visualizations
(charts, graphs, geo maps, summary tables, etc.).
In addition to the viz builder tools in the vizboard, Platfora has other, non-visual ways to access the data
in a lens:
• Platfora Expression Language - Platfora has an extensive library of built-in functions that you
can use to further manipulate data in a lens to achieve the results you want. You can use Platfora
expressions to define the computed fields needed for your analysis.
• Programmatic Query Access - Using Platfora's SQL-like query language, you can submit queries
to a Platfora lens using the REST API. This allows you to access data stored in Platfora from other
analytics tools, such as R.
• Lens Data Export - You can also export the data in a lens back to HDFS or download portions of a
lens to your desktop. This allows you to use data prepared by Platfora in other applications or data
workflows.
What kinds of charts can I create in a Platfora vizboard?
Platfora's viz builder is very flexible and allows you to dynamically build many different kinds of charts
and graphs (which are called a Visualization or Viz in Platfora). The types of charts available depend
on how the data was prepared (the Platfora lens type) and on the combination of data fields used in the viz.
Some of the charts you can build with an aggregate lens are:
• Bar / Column Charts
• Histograms
• Line Charts
• Area Charts
• Heatmaps
• X/Y (Scatterplot) Charts
• Bubble Charts
• Pie or Doughnut Charts
• Text Gauges / Word Clouds
• Geo Maps
• Cross-Tabs (Pivot Tables)
With an event series lens, there is currently only one available chart type:
• Funnel Chart
Can Platfora do advanced analytics, such as predictive analytics?
Predictive analytics encompasses a variety of statistical techniques to analyze historical data and model
patterns that can identify potential future risks or opportunities. When building models for predictive
analysis, a lot of the work involves analyzing the historical data to look for relationships and patterns.
Platfora allows data scientists to build aggregate data samples (lenses) for developing and validating
their models in an iterative, self-service fashion.
Platfora provides an R connector to allow data scientists to build lenses in Platfora to sample data
directly from Hadoop. They can then use RStudio to develop and test their predictive models against a
Platfora lens. Scoring a model involves running a job directly on the Hadoop cluster using a tool such as
Revolution R or Hive. The output of the scored model is then written back to HDFS, where Platfora can
analyze and visualize the final results.
Quantitative Analysis Concepts
This topic explains the format of data in an aggregate lens, the most common lens type in Platfora.
Aggregate lenses are built to support quantitative data analysis. Data fields in an aggregate lens serve
one of two key roles in an analysis: measure or dimension. The concepts of measures and dimensions are
important to understand, as they impact how data is prepared and analyzed in Platfora.
Measures (The Quantitative Data)
The measure fields of a lens hold the data to be quantified. Measure fields are always listed first in an
aggregate lens, with a special icon to denote that they are measures.
Measures provide the basis for quantitative analysis in a visualization or lens query. A measure is a
numeric value representing an aggregation of values from multiple rows. For example, measures contain
data such as total dollar amounts, average number of users, count distinct of users, and so on.
Measure values always result from an aggregate function. Examples of aggregate functions include
COUNT, DISTINCT, AVG, SUM, MIN, MAX, VARIANCE, and so on.
In some data analysis tools, measures (or metrics as they are sometimes called) can be aggregated at the
time of analysis because the amount of data to aggregate is relatively small. In Platfora, however, the
data in a lens is pre-aggregated to optimize performance of big data queries. Therefore, you must decide
how to aggregate the metrics of your dataset up front. You do this by defining measures either in the
dataset or at lens build time. When you go to analyze the data in a vizboard, you can only do quantitative
analysis on the measures you have available in the lens.
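The idea of deciding measures up front can be sketched in a few lines of Python. This is a minimal illustration under simplified assumptions, not how Platfora stores lens data: the hypothetical `build_lens` function groups raw rows by the chosen dimension fields and computes only the measures requested at build time, and the hypothetical `query` function can roll those measures up to fewer dimensions but cannot produce a measure that was never defined.

```python
from collections import defaultdict

# Hypothetical raw rows; the field names are illustrative only.
RAW_ROWS = [
    {"region": "US", "product": "A", "amount": 10.0},
    {"region": "US", "product": "B", "amount": 5.0},
    {"region": "Asia", "product": "A", "amount": 7.5},
    {"region": "US", "product": "A", "amount": 2.5},
]

def build_lens(rows, dimensions, measures):
    """Pre-aggregate rows into one entry per unique combination of dimension values.

    `measures` maps a measure name to (source field, aggregate function).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[d] for d in dimensions)].append(row)
    return {
        key: {name: func([r[src] for r in group]) for name, (src, func) in measures.items()}
        for key, group in groups.items()
    }

def query(lens, dimensions, group_by, measure):
    """Roll a pre-aggregated (additive) measure up to a coarser set of dimensions."""
    result = defaultdict(float)
    for key, values in lens.items():
        if measure not in values:
            raise KeyError(f"measure {measure!r} was not defined when the lens was built")
        coarse_key = tuple(key[dimensions.index(d)] for d in group_by)
        result[coarse_key] += values[measure]
    return dict(result)

dims = ["region", "product"]
lens = build_lens(RAW_ROWS, dims, {"total_sales": ("amount", sum), "row_count": ("amount", len)})
print(query(lens, dims, ["region"], "total_sales"))  # {('US',): 17.5, ('Asia',): 7.5}
```

The sketch deliberately uses only additive measures (sum and count), because those can be rolled up by simple addition; a distinct count or an average cannot be combined that way, which is one reason the choice of aggregation matters at build time.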
Measures answer 'how' questions about data, such as 'how many' or 'how long'.
Measure Type - Examples of Measure Fields
• How much or how many - sum of all sales, count of distinct users, maximum unit price, minimum feedback score
• How long - total minutes on a call, number of days between sales
• To what degree - maximum number of users, minimum page hits
Measures are always numeric data that can be represented on a continuous scale. However, not all
numbers in a dataset are continuous and measurable. For example, a zip code is a number, but not
something that makes sense to add or average. It might make sense, however, to count how many
distinct zip codes you have. Choosing the right fields in a dataset to serve as measures, and the right
aggregations to apply to those fields, depends on how you want to analyze the data.
Dimensions (The Categorical Data)
The dimension fields hold the categorical values by which you analyze the measure data. Dimensions
are used to summarize, filter, and group quantitative data in order to analyze it from different
perspectives. For example, a Product dimension can help you understand which products generate the
most sales for your business. A Date dimension can show you the breakdown of sales by year, quarter,
month, or day.
A dimension field is data that answers 'who', 'what', 'where', and 'when' questions.
Dimension Type - Examples of Dimension Fields
• Who - customer name, gender, title, cookie, social security number
• What - product type, call type, account type, product ID
• Where - geo location, zip code, sales region, country, state
• When - date, timestamp, month
Dimensions can be numeric, text, or date/time-based data. If you are familiar with SQL,
dimensions are the values you would GROUP BY.
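To make the SQL comparison concrete, the sketch below uses Python's built-in sqlite3 module on a hypothetical sales table. It is only an analogy (Platfora does not run this SQL, and the table and column names are assumptions): the dimension field (region) appears in GROUP BY, and the measures are aggregate expressions such as SUM and COUNT(DISTINCT ...). Zip code is stored as text and only counted, never summed, mirroring the zip code example above.

```python
import sqlite3

# Hypothetical sales table; column names are illustrative, not a Platfora schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, zip_code TEXT, user_id TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("US", "94105", "u1", 10.0),
        ("US", "94105", "u2", 5.0),
        ("US", "10001", "u1", 2.5),
        ("Europe", "75001", "u3", 7.5),
    ],
)

# Dimensions go in GROUP BY; measures are aggregate expressions.
rows = conn.execute(
    """
    SELECT region,
           SUM(amount)              AS total_sales,
           COUNT(DISTINCT user_id)  AS distinct_users,
           COUNT(DISTINCT zip_code) AS distinct_zip_codes
    FROM sales
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for row in rows:
    print(row)
# ('Europe', 7.5, 1, 1)
# ('US', 17.5, 2, 2)
```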
Event Series Analysis Concepts
This topic explains the different methods for doing event series analysis in Platfora. Event series analysis
in Platfora involves partitioning events by some entity (such as a user), ordering event records by
their timestamp, and then looking for interesting patterns of behavior. There are two ways to do this
in Platfora: add special event series processing (ESP) computed fields to a single dataset, or build a
special event series lens (ESL) that contains event records from one or more datasets.
Event Series Processing (ESP) Computed Fields
Time-stamped event data is very common in big data. With machine-generated data, a single application
can generate millions of events in a single day. For example, when users visit a website, they generate
events by clicking links or viewing pages. Each action they take is written to a log file along with the
timestamp.
Event series processing (ESP) computed fields iterate over multiple rows in a single dataset to find
interesting patterns. For example, suppose you wanted to know which visitors to your website viewed
a product page, then added the product to their cart, then left the site without making a purchase. You
could create an ESP computed field that looks at each visitor session, then outputs an 'abandoned cart'
flag for each session that met those criteria.
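The abandoned-cart example can be sketched in plain Python. This is not Platfora's PARTITION expression syntax, only a simplified illustration of the same idea with assumed event names (`view_product`, `add_to_cart`, `purchase`): partition events by session, order each partition by timestamp, then flag sessions where a product view is followed by an add-to-cart with no later purchase.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical event records: (session_id, timestamp, action).
Event = Tuple[str, int, str]

def abandoned_cart_flags(events: List[Event]) -> Dict[str, bool]:
    """Return one flag per session: True if the session viewed a product,
    then added to the cart, and never purchased afterwards."""
    # Partition events by session.
    sessions: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        sessions[event[0]].append(event)

    flags = {}
    for session_id, session_events in sessions.items():
        # Order each partition by timestamp before looking for the pattern.
        actions = [a for _, _, a in sorted(session_events, key=lambda e: e[1])]
        flagged = False
        if "view_product" in actions:
            first_view = actions.index("view_product")
            after_view = actions[first_view + 1:]
            if "add_to_cart" in after_view:
                cart_pos = first_view + 1 + after_view.index("add_to_cart")
                flagged = "purchase" not in actions[cart_pos + 1:]
        flags[session_id] = flagged
    return flags

events = [
    ("s1", 1, "view_product"), ("s1", 2, "add_to_cart"), ("s1", 3, "exit"),
    ("s2", 1, "view_product"), ("s2", 2, "add_to_cart"), ("s2", 3, "purchase"),
    ("s3", 5, "add_to_cart"), ("s3", 4, "view_product"),
]
print(abandoned_cart_flags(events))  # {'s1': True, 's2': False, 's3': True}
```

Session s3 shows why ordering matters: its records arrive out of order, but after sorting by timestamp the view still precedes the add-to-cart.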
ESP computed fields can only be created in a dataset using a special PARTITION expression.
PARTITION expressions are very powerful and flexible, but can also be complicated to write. ESP
computed fields are an advanced feature intended for data administrators. To analysts, ESP computed
fields behave just like any other field in a dataset.
You should use ESP computed fields when:
• You want to analyze event series data in a single dataset only
• Your dataset has time-stamped event records
• You want to do visual analysis on the results (include the ESP field in a lens)
• You have complex patterns you want to evaluate
• You want to use the output of an ESP field in another calculation (for example, create a 'bounce rate'
measure from the results of other ESP fields)
Entity-Centric Data Modeling and Event Series Lenses (ESL)
In some cases, a company may be tracking user events across several applications. For example, a web
application tracks the pages the user visits, an ad server tracks which advertisements a user is shown,
and a transactional system tracks the user's purchases. As long as all events have a common entity (the
user), they can be combined into a single event series lens and analyzed together in Platfora.
In order to create an ESL, data administrators first have to model the event datasets around a common
entity dataset.
Once the data is modeled in this way, then analysts can create event series lenses that include event
records from multiple datasets. In a vizboard, analysts can then use an event series lens to do funnel
analysis. Funnels are currently the only viz type available on event series lens data.
You should use event series lenses if:
• You want to analyze events across multiple datasets
• All of your event datasets have time-stamped event records
• All of your event datasets have a common entity field (such as a user ID)
• The common entity data can be modeled into a separate dataset with a primary key
• You are only interested in doing funnel analysis on your event data (Funnel is the only viz type
currently supported for ESLs)
• You want to define segments of users based on different event criteria (for example, create a segment
of users that visited your website and then called customer service)
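The following Python sketch illustrates what an event series lens makes possible: events from several hypothetical sources (web, ad server, purchases) share a common entity (`user_id`), are merged and ordered by time per user, and are then counted through funnel stages. The function name, event names, and stage definitions are assumptions for illustration; in Platfora, funnels are configured in the vizboard, not in code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Each source emits (user_id, timestamp, event_name); user_id is the common entity.
Event = Tuple[str, int, str]

def funnel_counts(sources: List[List[Event]], stages: List[str]) -> List[int]:
    """Count how many users reach each stage, in order, across all sources."""
    per_user: Dict[str, List[Event]] = defaultdict(list)
    for source in sources:
        for event in source:
            per_user[event[0]].append(event)

    counts = [0] * len(stages)
    for user_events in per_user.values():
        stage = 0
        # Walk the user's events in time order, advancing one stage at a time.
        for _, _, name in sorted(user_events, key=lambda e: e[1]):
            if stage < len(stages) and name == stages[stage]:
                counts[stage] += 1
                stage += 1
    return counts

web = [("u1", 1, "visit_site"), ("u2", 1, "visit_site"), ("u3", 2, "visit_site")]
ads = [("u1", 0, "ad_shown"), ("u2", 3, "ad_shown")]
purchases = [("u1", 5, "purchase"), ("u3", 1, "purchase")]

print(funnel_counts([web, ads, purchases], ["visit_site", "purchase"]))  # [3, 1]
```

User u3 purchased before visiting the site, so that purchase does not count toward the second stage; each stage can only shrink relative to the one before it.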
Chapter 2: Get Going with Vizboards
In Platfora, the vizboard is where you create and manage visualizations around a particular project or subject
area. Typically, you would create one vizboard for each data analysis project that you are working on. A
vizboard can contain one or more pages and pages can contain one or more visualizations.
Topics:
• FAQs—Vizboard Basics
• The Vizboard Workspace
• How Interactive Viz Queries Work
• Turn Live Updates Off and On
FAQs—Vizboard Basics
A vizboard is the starting point for data analysis. This topic answers some frequently asked questions
about vizboards.
What is a vizboard?
A vizboard is the starting point for data analysis, and can be thought of as a dashboard or project
workspace. The vizboard is the canvas for discovering and sharing data insights. Vizboards have one
or more pages, and each page can have one or more visualizations. The individual visualizations on
a vizboard page can be related (use the same underlying data), or unrelated (use completely different
data).
What is a visualization (viz)?
A visualization (or viz for short) is a graphical representation of certain data fields chosen from the
perspective of a Platfora lens. For more information, see FAQs—Viz Builder Basics.
Who can create and view a vizboard?
To create a vizboard, a user must have the Analyst (Limited) role or higher and also have data access
permission on a dataset in order to create a visualization.
Any user can view a vizboard, but they can only view the viz data if they have data access permission on
the data sources used in the viz.
Data Analysis and Visualization Guide - Get Going with Vizboards
How do I create a vizboard?
You can create a new, empty vizboard in several different ways. Each new vizboard contains one page
with one empty visualization to help you get started.
• Click the Add Vizboard button from the Vizboards page.
• Click the Create Vizboard button from the data catalog of a particular dataset.
How do I rename a vizboard?
You can change the name of a vizboard at any time by opening it and changing its name from the
vizboard title bar. All vizboards are given a default name of Untitled Vizboard when they are first
created. You must have edit permissions on a vizboard in order to rename it.
How do I close a vizboard?
To close a vizboard, navigate off the page by clicking any other page in the top navigation header. If you
don't want to lose your work, Save the vizboard before you exit.
There are so many vizboards. Can I organize the ones I'm interested in?
Yes! You can apply labels to all objects, including vizboards, to more easily find them from the
Vizboards page. For more information, see Organize Datasets, Lenses and Vizboards with Labels.
The Vizboard Workspace
The vizboard is where you create and manage visualizations around a particular business question,
project, or subject area. A vizboard contains pages, and pages contain visualizations.
Visualizations only exist in the context of a vizboard document. From the vizboard you can easily
navigate between pages to add or edit visualizations, and arrange the layout of visualizations on each
page. You can also comment on visualizations within the vizboard and share your insights with other
Platfora users.
Vizboard-level and page-level controls are located in the menus at the top of the page. Using the panel
controls on the left and right side of the page, you can show and hide vizboard panels to toggle between
exploratory mode and presentation mode.
1. Vizboard name
2. Vizboard-level and page-level controls
3. Edit lens button
4. Add menu for supplementing the lens
5. Choose type of viz, either Chart or Cross-Tab.
6. Show/hide pages panel
7. Show/hide builder panels
8. Pages panel
9. Lens panel
10. Builder panel
11. Selected viz
12. Filters and legends panel
13. Show/hide filter panel
14. Show/hide comments
How Interactive Viz Queries Work
When you build a visualization in Platfora, you are actually creating queries. The act of dragging a lens
field to a Builder drop zone creates a query that is sent to the Platfora server, fetches the requested lens
data, and then visually renders the results as a Chart or graph. You can also see the results in a tabular,
spreadsheet format by looking at the Cross-Tab of a viz.
When many people think of queries, they think of SQL (the structured query language used to access
data stored in a relational database). Platfora queries are not written in SQL; they are constructed by
a user's actions in the vizboard. However, it may help to understand how lens data is requested and
returned if we make a comparison with SQL.
If a lens query were expressed in SQL, it would look something like this (the clauses are listed in the
order that they are processed by the Platfora query engine):
SELECT <dimensions in builder>, <measures in builder>
FROM <lens>
WHERE <dimensions in filters>
GROUP BY <dimensions in builder>
ROLLUP <rollup measures in builder>
HAVING <measure filters>
ORDER BY <dimension sort options>
LIMIT <dimension limit options>
Note that the query is constructed based on the kind of field (measure or dimension), and where the
fields are placed in the Builder drop zones.
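To make the clause order concrete, here is a rough Python sketch of the same pipeline over a list of hypothetical lens rows: filter on dimensions (WHERE), group by the builder dimensions and aggregate a measure, filter on the aggregated measure (HAVING), sort by dimension values (ORDER BY), and truncate (LIMIT). It models only the processing order described above; the function, its parameters, and the use of SUM for every measure are illustrative assumptions, not a Platfora API, and ROLLUP is omitted.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]

def run_viz_query(
    lens_rows: List[Row],
    group_by: List[str],
    measure_field: str,
    where: Callable[[Row], bool] = lambda r: True,
    having: Callable[[float], bool] = lambda v: True,
    limit: int = 10,
) -> List[Tuple[tuple, float]]:
    # WHERE: dimension filters are applied to rows before grouping.
    filtered = [r for r in lens_rows if where(r)]
    # GROUP BY + aggregate: sum the measure for each combination of builder dimensions.
    groups: Dict[tuple, float] = defaultdict(float)
    for r in filtered:
        groups[tuple(r[d] for d in group_by)] += float(r[measure_field])
    # HAVING: measure filters are applied to the aggregated values.
    kept = [(k, v) for k, v in groups.items() if having(v)]
    # ORDER BY dimension values, then LIMIT.
    kept.sort(key=lambda kv: kv[0])
    return kept[:limit]

rows = [
    {"region": "US", "product": "A", "sales": 10},
    {"region": "US", "product": "B", "sales": 5},
    {"region": "Asia", "product": "A", "sales": 7},
    {"region": "Europe", "product": "A", "sales": 1},
]
result = run_viz_query(
    rows,
    group_by=["region"],
    measure_field="sales",
    where=lambda r: r["product"] == "A",  # dimension filter
    having=lambda total: total > 5,       # measure filter
)
print(result)  # [(('Asia',), 7.0), (('US',), 10.0)]
```

Note that the dimension filter removes the product B row before any totals are computed, while the measure filter drops Europe only after its total is known.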
Turn Live Updates Off and On
When working in a vizboard, you can choose to pause live updates for the vizboard. This allows you to
conserve resources on the Platfora cluster by reducing the number of times the web application queries
and renders data.
By default, when you make a change to a visualization (viz) in a vizboard, the Platfora web application
updates the viz in real-time by querying the Platfora server and rendering the change in the viz.
Modifying a viz uses resources on the Platfora cluster, such as memory, network bandwidth, and CPU
cycles. Typically, lenses with more data use more resources during viz updates.
When to Turn Live Updates Off
You might want to pause live updates when you are working with a very large lens, high cardinality
dimension fields, or a large number of unique data points. By default, each change to the builder or
filter drop zones issues a new query. When working with large data, you may not want to wait for each
change to finish rendering before you can make another change.
For example, you might want to pause live updates while arranging multiple fields in the builder drop
zones, then issue the query once you have all the fields and filters in place. This way, the query is only
executed one time and the application only renders the filtered data, not all of the data.
How to Turn Live Updates Off and On
To control live updates, use the Live Updates button in the vizboard page control menu. Turning
off live updates pauses automatic updates for all visualizations in the vizboard. When live updates are
turned off, the application does not query and render the data as you modify a viz. To update a viz after
modifying it, you can either click Update Now in the viz, or turn on live updates again.
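The effect of pausing live updates can be modeled with a small Python class. This is a hypothetical sketch, not Platfora's implementation: each edit either triggers a query immediately (live updates on) or is only recorded until `update_now()` is called or live updates are turned back on, so arranging five fields with live updates off costs one query instead of five.

```python
class VizSession:
    """Hypothetical model of a viz that re-queries on each edit only when live updates are on."""

    def __init__(self, live_updates: bool = True) -> None:
        self.live_updates = live_updates
        self.fields = []          # fields currently placed in the builder drop zones
        self.queries_issued = 0   # how many times the server would be queried

    def add_field(self, name: str) -> None:
        self.fields.append(name)
        if self.live_updates:
            self._run_query()

    def update_now(self) -> None:
        # Models clicking Update Now: one query covering all pending changes.
        self._run_query()

    def set_live_updates(self, enabled: bool) -> None:
        self.live_updates = enabled
        if enabled:
            # Turning live updates back on also refreshes the viz.
            self._run_query()

    def _run_query(self) -> None:
        self.queries_issued += 1

FIELDS = ["Region", "Year", "Product", "Total Sales", "Avg Price"]

live = VizSession(live_updates=True)
for name in FIELDS:
    live.add_field(name)
print(live.queries_issued)    # 5

paused = VizSession(live_updates=False)
for name in FIELDS:
    paused.add_field(name)
paused.update_now()
print(paused.queries_issued)  # 1
```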
Chapter 3: Get Going in Viz Builder
This section describes some basic concepts about visualizations and the viz builder tools.
Topics:
• About Drag and Drop Viz Building
• About the Different Viz Types
• About the Builder Drop Zones
• About the Viz Toolbar and Menus
• FAQs—Viz Builder Basics
About Drag and Drop Viz Building
To begin building a visualization, drag fields from the data panel into one of the Builder drop zones. A
good way to get started is to think of your business question and drag the associated fields to the X-axis
and Y-axis drop zones.
For example, if your question was "How much did we sell from each product line?" you could start by
dragging the Product Line dimension to X-axis and the Total Sales measure to Y-axis. The X-axis
drop zone in charts equates to the Columns drop zone and the Y-axis drop zone equates to the Rows
drop zone in cross-tab visualizations.
When you select a field and start to drag it over, the appearance of the drop zones changes to help
guide your placement of the field in the Builder panel. Some drop zones allow both measures and
dimensions, some only allow measures, and some only allow dimensions. A grayed out drop zone means
Data Analysis and Visualization Guide - Get Going in Viz Builder
the drop zone does not apply to the selected field. A highlighted drop zone means that the drop zone is
available for the selected field.
The X-axis, Y-axis, Details, and Filter drop zones all allow multiple fields. Once these drop zones
are populated with at least one field, the drop zone for adding additional fields appears as a thin blue line
either above or below the currently populated field.
In the X-axis and Y-axis drop zones, measures must always be placed below dimensions.
The position of dimension fields in the X-axis and Y-axis drop zones determines the grouping order
on the axes.
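Why field order matters can be shown with a short Python sketch: the same two hypothetical dimensions (`Region`, `Year`) produce different nested groupings depending on which one is placed first in the drop zone. The first field forms the outer groups and the next field is nested inside each of them. The sketch assumes every combination of values has data and only illustrates the nesting idea, not how Platfora draws axes.

```python
from itertools import product

def axis_groups(field_order, values):
    """Return axis categories in display order, with the first field as the outermost group."""
    return list(product(*(values[field] for field in field_order)))

values = {"Region": ["Asia", "US"], "Year": [2014, 2015]}
print(axis_groups(["Region", "Year"], values))
# [('Asia', 2014), ('Asia', 2015), ('US', 2014), ('US', 2015)]
print(axis_groups(["Year", "Region"], values))
# [(2014, 'Asia'), (2014, 'US'), (2015, 'Asia'), (2015, 'US')]
```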
You can remove a field from a drop zone by clicking the orange X next to the field name. You can also
drag a new field directly on top of an existing field to replace it.
About the Different Viz Types
Platfora offers a diverse range of tools to interactively explore and analyze your data in visualizations.
The different types of visualizations allow analysts to perform different kinds of data analysis.
Table 1: Visualization Types
• Chart (analysis type: quantitative; lens type: aggregate lens) - Allows you to do exploratory analysis
of data that you graphically represent in chart form. Chart visualizations support several types of
marks, allowing you to create different types of charts, such as bars, lines, plots, text, and more.
Typically, most charts are displayed on an X-Y axis, but some chart types, such as a word cloud, use
no axis.
• Cross-Tab (analysis type: quantitative; lens type: aggregate lens) - Allows you to view the data in the
lens in a tabular, spreadsheet format.
• Geo Map (analysis type: geographic, quantitative; lens type: aggregate lens) - Allows you to perform
geographic analysis on an aggregate lens that contains geo-encoded location data. A geo map viz is
similar to a scatterplot chart viz, except that the marks are displayed on a map background.
• Polar Chart (analysis type: quantitative; lens type: aggregate lens) - Allows you to do exploratory
analysis of data that you graphically represent in chart form using polar coordinates. Currently, polar
charts only support the Bar mark type, allowing you to create pie charts and donut charts.
• Funnel (analysis type: event series; lens type: event series lens) - Allows you to track users' behavior
across a sequence of events. Each step in the sequence is defined as a stage. Each funnel stage shows
progressively decreasing proportions of the original set of users.
About the Builder Drop Zones
Dragging fields to the Builder drop zones determines the placement and visual appearance of the marks
on your visualization.
Some drop zones are available only for certain field roles (measure or dimension), and some drop zones
may not be useful for the mark type selected. By default, Platfora chooses the best visual representation
for the combinations of measure and dimension fields you add to the drop zones.
• X-axis (Chart) / Columns (Crosstab) - Allowed field roles: Any (except datetime measure). Allows
multiple fields: Yes. Sets the data points shown on the horizontal axis of the visualization. When
multiple fields are added to this drop zone, the horizontal axis will be grouped (for dimensions) or
trellised (for measures) according to the order of the fields in the drop zone. Measures are always
placed below dimensions.
• Y-axis (Chart) / Rows (Crosstab) - Allowed field roles: Any (except datetime measure). Allows multiple
fields: Yes. Sets the data points shown on the vertical axis of the visualization. When multiple fields
are added to this drop zone, the vertical axis will be grouped (for dimensions) or trellised (for
measures) according to the order of the fields in the drop zone. Measures are always placed below
dimensions. In Cross-tab view, measures are always shown as columns even if they are placed in the
Rows drop zone.
• Angle (Polar Chart) - Allowed field roles: Any (except datetime measure). Allows multiple fields: Yes.
Sets the relative angle from the Y-axis of a Polar Chart visualization using the polar coordinate
system. When multiple fields are added to this drop zone, the viz is trellised horizontally according
to the order of the fields in the drop zone, showing a polar chart for each unique angle value.
Measures are always placed below dimensions.
• Geography Location (Geo Map) - Allows multiple fields: No. Sets the data point positions
(geographical coordinates) on a Geo Map visualization.
• Details - Allowed field roles: Any. Allows multiple fields: Yes. In Chart view, shows additional
measure values in the tooltips only. For dimensions, it adds additional marks (or groups) to the
visualization without any visual encoding. In Cross-tab view, it adds additional columns for measures,
and sub-columns for dimensions.
• Color - Allowed field roles: Any (except datetime measure). Allows multiple fields: No. Color-encodes
marks on the viz. Dimensions use a categorical color palette. Measures use a continuous range of
colors. A color legend is added to help you decode the colors assigned to each value or range of
values.
• Size - Allowed field roles: Measure (numeric only). Allows multiple fields: No. Size-encodes marks
using a continuous range, with the smallest value in the range having the smallest mark and the
largest value having the biggest mark. A size legend is added to show the range of values.
• Shape - Allowed field roles: Dimension. Allows multiple fields: No. Shape-encodes point marks. When a
dimension field is dropped in the Shape zone, a unique shape is given to each value in the dimension.
A shape legend is added to help you decode the shapes assigned to each value.
• Opacity - Allowed field roles: Measure (numeric only). Allows multiple fields: No.
Transparency-encodes marks using a continuous range, with the smallest value in the range having the
lightest mark and the largest value having the darkest mark. A transparency legend is added to show
the range of values.
• Labels - Allowed field roles: Any. Allows multiple fields: No. Adds a text label to marks. For
measures, text labels show the aggregated value. For dimensions, it shows the dimension member value.
If value names are long or if the selected field has a lot of values, using this drop zone can result
in overlapping, unreadable text.
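The role rules in the drop zone table can be encoded as a small lookup. The Python sketch below is an illustration, not a Platfora API: it covers only the drop zones whose allowed roles are spelled out in the table (X-axis, Y-axis, Color, Size, Shape, Opacity, Labels), uses hypothetical role names, and models adding a field, not replacing one by dropping it on top of an existing field.

```python
from enum import Enum

class Role(Enum):
    NUMERIC_MEASURE = "numeric measure"
    DATETIME_MEASURE = "datetime measure"
    DIMENSION = "dimension"

ANY = set(Role)
ANY_EXCEPT_DATETIME = ANY - {Role.DATETIME_MEASURE}

# (allowed roles, allows multiple fields) per drop zone, taken from the table.
DROP_ZONES = {
    "X-axis": (ANY_EXCEPT_DATETIME, True),
    "Y-axis": (ANY_EXCEPT_DATETIME, True),
    "Color": (ANY_EXCEPT_DATETIME, False),
    "Size": ({Role.NUMERIC_MEASURE}, False),
    "Shape": ({Role.DIMENSION}, False),
    "Opacity": ({Role.NUMERIC_MEASURE}, False),
    "Labels": (ANY, False),
}

def can_drop(zone: str, role: Role, fields_already_in_zone: int) -> bool:
    """Return True if a field with this role can be added alongside the zone's current fields."""
    allowed, multiple = DROP_ZONES[zone]
    if role not in allowed:
        return False  # the zone would appear grayed out for this field
    return multiple or fields_already_in_zone == 0

print(can_drop("Size", Role.DIMENSION, 0))    # False
print(can_drop("X-axis", Role.DIMENSION, 2))  # True
```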
About the Viz Toolbar and Menus
Every visualization has a collection of buttons in its top right corner. These viz controls include options
for duplicating, resizing, and exporting the visualization. There are also menu controls for creating and
viewing viz comments, and for managing viz selections.
FAQs—Viz Builder Basics
A visualization (viz) is a graphical representation of certain data fields chosen from the perspective of a
Platfora lens. This topic answers some frequently asked questions about working with visualizations.
What is a visualization (viz)?
A visualization (or viz for short) is a graphical representation of certain data fields chosen from the
perspective of a Platfora lens. It is a query of aggregated lens data that is visually rendered based on the
types of fields chosen (measure or dimension), their order and placement in the Builder drop zones,
and the various appearance encodings applied to the data (color, size, shape, and so on).
A viz shows aggregated measure data grouped and filtered by the chosen dimensions. A chart in Platfora
can best be described as a recipe of dimension and measure fields, plus axis placement (X-axis and
Y-axis), plus appearance encodings (Color, Size, Shape, Opacity, Labels), plus mark type (Point, Line,
Bar, Area, and so on).
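The "recipe" description maps naturally onto a small data structure. The Python dataclasses below are a hypothetical model for illustration only (Platfora does not expose a viz in this form): they record the fields on each axis, the appearance encodings, and the mark type, and check two rules stated in this guide: every visualization needs at least one measure, and measures are placed below dimensions on an axis.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Field:
    name: str
    role: str  # "measure" or "dimension"

@dataclass
class ChartRecipe:
    """Hypothetical model of a chart: fields + axis placement + encodings + mark type."""
    x_axis: List[Field] = field(default_factory=list)
    y_axis: List[Field] = field(default_factory=list)
    encodings: Dict[str, Field] = field(default_factory=dict)  # e.g. "Color", "Size"
    mark: str = "Bar"  # e.g. "Point", "Line", "Bar", "Area"

    def problems(self) -> List[str]:
        issues = []
        all_fields = self.x_axis + self.y_axis + list(self.encodings.values())
        if not any(f.role == "measure" for f in all_fields):
            issues.append("a visualization needs at least one measure")
        for axis_name, axis in (("X-axis", self.x_axis), ("Y-axis", self.y_axis)):
            roles = [f.role for f in axis]
            # A dimension appearing after the first measure breaks the ordering rule.
            if "measure" in roles and "dimension" in roles[roles.index("measure"):]:
                issues.append(f"{axis_name}: measures must be placed below dimensions")
        return issues

recipe = ChartRecipe(
    x_axis=[Field("Product Line", "dimension")],
    y_axis=[Field("Total Sales", "measure")],
    encodings={"Color": Field("Region", "dimension")},
)
print(recipe.problems())  # []
```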
Who can create a viz?
To create a viz, a user must have the Analyst (Limited) role or higher and also have data access
permission on the dataset they want to analyze.
What types of visualizations can I create?
Platfora has several different viz types including Chart, Cross-Tab, Geo Map, Polar, and Funnel.
The type of viz you can create depends on the type of lens you want to analyze. For more information,
see About the Different Viz Types.
How do I create a viz?
When you create a new vizboard, Platfora inserts a new viz automatically. You can add another viz by
clicking the Add Viz button in a vizboard.
To analyze data in a viz, first you choose the type of viz to create because that determines the datasets
available to you. Then you choose the dataset to analyze and an associated lens built on that dataset. For
more information, see Get Data to Analyze.
You can choose where visualizations appear on the page when you add them. From the vizboard View
menu, choose either Add to center of page or Grow page downward.
How do I rename a viz?
You can change the name of a viz at any time by selecting it and changing its name from the viz title
bar. All visualizations are given a default name of Visualization X (where X is a number) when first
created. You must have edit permissions on the vizboard to rename a viz.
How do I delete a viz?
Click the delete icon
Menus.
in the viz toolbar. To see the viz toolbar, go to see About the Viz Toolbar and
I found some great insights in my current viz and I want to do more analysis
on it, but I don't want to lose the work I have. How can I best do that?
You can duplicate the current viz and edit the copy. Click the duplicate button in the viz menu.
Platfora creates a new viz on the same page with the same lens, filters, and fields in the drop zones. You
can move the viz elsewhere on the page or move it to another page by dragging the viz by its toolbar.
Chapter 4: Get Data to Analyze
The first step in creating a visualization is choosing the data you want to explore and analyze, which is done by
selecting a lens. Once you pick a lens to work with, the measure and dimension fields in that lens are loaded into
the lens field list in the data panel.
Topics:
• About Lenses and Viz Types
• Choose a Lens
• Find Fields in a Lens
• Understand Lens Field Types and Roles
• Understand the Data in a Lens Field
About Lenses and Viz Types
Every visualization displays data using a single viz type, such as chart or cross-tab. The viz type
determines the type of lenses available for the viz.
Data Analysis and Visualization Guide - Get Data to Analyze
Each viz type uses a particular lens type, such as aggregate or event series lenses. Therefore, when you
create a viz, you first choose the viz type in the data panel before choosing a lens to analyze.
Once you've chosen the viz type, Platfora lists all datasets that have built lenses of the appropriate type.
If you do not see the dataset you want, that means that no lenses of the appropriate type have been built
for that dataset.
When you create a viz of a particular type, you can change to another type later
as long as both viz types use the same lens type. For example, if you create a
Cross-Tab viz, you can change it to Chart at any time. Be careful when changing
viz types later because not all drop zones in one viz type correlate to drop zones
in the other viz type. You may lose some viz configurations when you change viz
types.
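The compatibility rule can be written as a lookup from viz type to lens type, using the pairs listed in Table 1. In this minimal Python sketch, the function names and the `(dataset, lens name, lens type)` tuple shape are hypothetical; it also shows the dataset list the data panel would offer for a given viz type (datasets with at least one built lens of the needed type, alphabetically).

```python
VIZ_LENS_TYPE = {
    "Chart": "aggregate",
    "Cross-Tab": "aggregate",
    "Geo Map": "aggregate",
    "Polar Chart": "aggregate",
    "Funnel": "event series",
}

def can_change_viz_type(current: str, target: str) -> bool:
    """A viz can switch types only if both types use the same lens type."""
    return VIZ_LENS_TYPE[current] == VIZ_LENS_TYPE[target]

def datasets_for_viz_type(built_lenses, viz_type):
    """Alphabetical list of datasets that have at least one built lens of the needed type.

    `built_lenses` is a list of hypothetical (dataset, lens_name, lens_type) tuples.
    """
    needed = VIZ_LENS_TYPE[viz_type]
    return sorted({ds for ds, _, lens_type in built_lenses if lens_type == needed})

print(can_change_viz_type("Cross-Tab", "Chart"))  # True
print(can_change_viz_type("Chart", "Funnel"))     # False
```

A True result says nothing about which drop zone settings survive the switch; as the note warns, some viz configuration may still be lost.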
Choose a Lens
Every visualization must be associated with a lens. To find a lens, first select the dataset you are
interested in. Only datasets that have successfully built lenses will be shown. Choosing a dataset will
show any lenses that have been built from the focal point of that dataset.
1. With a viz selected, find the name of the dataset you want in the data panel. Datasets that have lenses
available are listed in alphabetical order. If you do not see the dataset you want, that means that no
lenses have been built for that dataset.
2. Choose a lens that was built from the selected dataset.
3. Picking a lens will load the available fields of that lens into the data panel. Measure fields are at the
top of the field list, and dimension fields are listed below the measures in alphabetical order.
Once you pick a lens, you cannot go back and change your lens selection. If the
lens was not the one you wanted, delete the viz and start again with a new viz.
Find Fields in a Lens
When you pick a lens to work with, the measure and dimension fields included in the lens are shown in
the lens data panel. The data panel does not show all of the dataset fields, only those that were requested
when the lens was built.
Measure fields are always listed at the top, followed by dimensions and referenced dimensions in
alphabetical order. Computed fields that were defined in the vizboard are highlighted blue.
If you hover over a field name, you can see a tooltip with the field details and the field option menu.
In a long list, you can use the search to filter fields by name. Clear the search criteria to remove the
search filter.
Understand Lens Field Types and Roles
Lens fields are categorized into two basic roles: measures and dimensions. Measures are always
aggregated numeric type data. Dimensions can be numeric, text, datetime, or location type data. As you
browse through the fields in the data panel, you will notice that each field has an icon to denote its role
and type.
• Measure (numeric). A measure provides the basis for quantitative data analysis in a viz. Measure
fields produce aggregated data values for each dimension grouping in the viz. Measures are always
quantitative (continuous) numeric values, and every visualization must have at least one measure.
Measures are always listed at the top of the field list.
• Datetime measure. A datetime measure is a special kind of measure field that shows the maximum or
minimum date value for a dimension category. Datetime measures always have the DATETIME data type
and use the MAX or MIN aggregate function.
• Categorical dimension. A categorical dimension allows you to group measure values into discrete
categories. For example, a Region dimension with values such as US, Asia, and Europe could be used
to group a measure such as Total Sales. Categorical dimensions are typically textual data (strings)
that are used to filter and group measure data.
• Numeric dimension. A numeric dimension allows you to show dimension data as a continuous range of
values in addition to discrete categories. For example, a Customer Rating dimension with values 1-10
could be viewed as a range of values or as discrete categories. Numeric dimensions are always numeric
data types (integer, fixed, double, long). Note that numeric dimensions are not the same as measures:
they are not aggregated and can only be used for grouping, not for quantitative analysis.
• Date. A date is a special kind of dimension field for datetime type data values. It renders each date
as a discrete category.
• Date (timeseries). A date (timeseries) is a special kind of dimension field for datetime type data
values. It contains the same data values as a regular date field, but renders the data as a continuous
range rather than as discrete values.
• Location. A location is a special kind of dimension field for geo-encoded data. It uses a complex data
type that includes geo coordinate information (latitude and longitude) and can optionally include a
label that associates a place name with the coordinates. This field is typically used in geo map
visualizations to place positions on a map.
• Reference. Lenses can contain fields from multiple datasets, as long as the datasets are related by a
reference. A reference is shown as a toggle arrow. You can expand a reference to see additional
dimension fields that you can use in your visualization.
• Segment. A segment is a special type of dimension field that you can create to group together members
of a population that meet some defined common criteria. A segment is based on members of a dimension
dataset (such as customers) that have some behavior in common (such as purchasing a particular
product). A segment is always based on a referenced dimension dataset, and must include at least one
condition from a fact or event dataset.
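A segment's membership rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how Platfora evaluates segments: the `customers` and `purchases` data, the `build_segment` helper, and the "In"/"Not In" labels are all hypothetical. A customer belongs to the segment if at least one of its event rows meets the condition.

```python
# Hypothetical data: a referenced dimension dataset (customers) and an
# event dataset (purchases). Names and values are illustrative only.
customers = ["c1", "c2", "c3", "c4"]
purchases = [
    {"customer_id": "c1", "product": "Widget"},
    {"customer_id": "c2", "product": "Gadget"},
    {"customer_id": "c3", "product": "Widget"},
    {"customer_id": "c3", "product": "Gadget"},
]


def build_segment(members, events, condition, key="customer_id"):
    """Map each dimension member to "In" or "Not In" (hypothetical labels).

    A member is in the segment when at least one of its event rows
    satisfies `condition`; members with no matching events are "Not In".
    """
    matching = {row[key] for row in events if condition(row)}
    return {m: ("In" if m in matching else "Not In") for m in members}


bought_widget = build_segment(customers, purchases, lambda row: row["product"] == "Widget")
print(bought_widget)
# {'c1': 'In', 'c2': 'Not In', 'c3': 'In', 'c4': 'Not In'}
```

Because the segment is keyed on dimension members, customer c4 (who has no purchases at all) still gets a value; it simply falls outside the segment.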
Understand the Data in a Lens Field
Analysts typically explore data prepared by someone else. This section describes some tools analysts can
use to better understand the history and meaning behind the fields in a lens.
View the Data Lineage of a Field
Preparing data for analysis often involves some form of cleansing and manipulation in order to make
sense of the data. Raw source data is seldom consumed in its original format, and usually goes through
various processing and transformation steps before it is used in a visualization. Viewing the data lineage
of a field allows you to see where the data came from and what processing functions were applied.
Data lineage allows you to answer questions such as:
• Where did this data come from?
• How current is this data?
• How was this result calculated?
• How can I reproduce this analysis on other data?
For any field in a lens, Platfora can show how that field was derived, all the way back to the
source data files in Hadoop. This includes the data sources, datasets, lenses, and computed field
expressions that the data went through to create the data values you see in your visualization
or cross-tab. The data lineage report lists the lens field and all its parent objects up to the configured
number of levels. It also includes other information, such as filter expressions and timestamps.
1. From the lens data panel or viz builder panel, select Show Data Lineage from the field menu.
2. This opens the data lineage report for that field. Expand the arrows to see each level in the data
lineage tree.
3. (Optional) Click Export as JSON to download the data lineage report in a
platforaData.json file. This file is saved in the default downloads directory configured for
your browser.
View the Definition of a Field
When working in a viz, lens field names aren't always enough to tell analysts what the data values
represent. This is especially true for fields that are computed from the raw source data. To better
understand the data, you can view the definition of a field to see how the data values were computed.
You can view the definition of a field in the lens panel or the builder panel. Click the field's contextual
menu and view the definition below the field name.
View the Definition of a Segment Field
A segment is a special type of dimension field that you can create to group together members of a
population that meet some defined common criteria. A segment field may have been defined by another
analyst who has access to the lens. To understand the logic behind a segment field, you can view its
definition to see how the data values were computed.
1. Click a viz that uses a lens that has one or more segments defined.
2. Click the Add button and choose Select Segment.
3. In the Select Segments dialog, click the segment whose definition you want to view.
You can also view the definition of a segment field in the lens panel as long as
the field is not currently in a drop zone. Click the segment field's contextual menu
and choose Edit Field.
Chapter 5: Control the Marks on a Viz
A mark is a graphical symbol (point, area, line, and so on) used to encode data in a visualization. In Platfora,
a mark is the visual representation of a measure value calculated for a group of input records or rows. A group
consists of records that share the same value for the dimension(s) used in the visualization. For example, if
you picked fields called Region (a dimension) and Units Sold (a measure) in your viz, then there would be a
mark on your viz for each region depicting the total number of units sold in that region (North America=5000,
Europe=3000, Asia=7000, etc.).
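The relationship between marks and groups is essentially a group-by aggregation. The following Python sketch is illustrative only: the rows are made up to reproduce the totals in the example above, the `marks_for` helper is hypothetical, and it assumes the measure is a SUM (measures can use other aggregate functions). Each key in the result corresponds to one mark.

```python
from collections import defaultdict

# Hypothetical input rows; only Region (dimension) and Units Sold (measure) matter.
rows = [
    {"region": "North America", "units_sold": 3000},
    {"region": "Europe", "units_sold": 3000},
    {"region": "North America", "units_sold": 2000},
    {"region": "Asia", "units_sold": 7000},
]


def marks_for(rows, dimensions, measure):
    """Group rows by the given dimension fields and SUM the measure per group.

    Each key of the returned dict stands for one mark on the viz.
    """
    totals = defaultdict(int)
    for row in rows:
        group = tuple(row[d] for d in dimensions)
        totals[group] += row[measure]
    return dict(totals)


print(marks_for(rows, ["region"], "units_sold"))
# {('North America',): 5000, ('Europe',): 3000, ('Asia',): 7000}
```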
Topics:
• Encode Data Using Mark Drop Zones
• About Mark Types
• Adjust Mark Appearance
Encode Data Using Mark Drop Zones
The initial marks on a visualization are determined by the dimension and measure combinations used on
the axes (X-axis and Y-axis). You can add additional marks (or groups) to the visualization by adding
additional dimensions to any of the Marks drop zones, such as the Details or Color encoding drop
zones.
Add Marks without Encoding
When a measure is used in a visualization, the group of input records used to calculate the measure value
is determined by the dimensions selected in the viz. For example, in a dataset about diamonds, if cut
was a dimension in the viz, there would be five input groups for which the measure would be calculated
(Fair, Good, Ideal, Premium, and Very Good). Each input group is a mark on the viz.
An unencoded mark adds additional groups to the viz, but the groups are not visually differentiated. You
can add unencoded marks by putting a dimension in the Details drop zone. For example, if you put
color in Details, a mark is added to the viz for each combination of cut and color, but the individual
color marks are not visible as separate marks unless you select them.
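Adding a dimension to Details multiplies the number of groups, because each mark now represents a distinct combination of dimension values. This hypothetical sketch (made-up diamond rows and a helper named `group_keys`) lists the groups with and without color in Details. It does not model how Platfora draws the marks.

```python
# Hypothetical diamond rows with two dimensions (cut, color) and a price measure.
diamonds = [
    {"cut": "Ideal", "color": "E", "price": 500},
    {"cut": "Ideal", "color": "E", "price": 700},
    {"cut": "Ideal", "color": "G", "price": 900},
    {"cut": "Premium", "color": "E", "price": 400},
]


def group_keys(rows, dimensions):
    """Return the distinct combinations of dimension values, in first-seen order."""
    seen = {}
    for row in rows:
        seen.setdefault(tuple(row[d] for d in dimensions), None)
    return list(seen)


print(group_keys(diamonds, ["cut"]))           # cut only: 2 groups
print(group_keys(diamonds, ["cut", "color"]))  # cut plus color in Details: 3 groups
```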
Add Marks with Encoding
Adding dimension fields to the Color, Shape, or Labels drop zones both adds groups to the viz and
encodes the marks so the groups are visually differentiated. For example, if you put the color dimension
in the Color drop zone, the marks within each cut group are color-encoded by diamond color.
About Mark Types
The mark type controls the shape of the data on the visualization and how data points are visually
represented. You can select one of several mark types from the Marks drop-down menu. The default type
is Auto, which means that Platfora automatically chooses the best visual representation of your data
based on the fields you select. The rendering of a particular chart depends on the mark type in
combination with the placement of measure and dimension fields in the Builder drop zones.
Use Bar Marks
Bars are useful for comparing quantitative data by different categorical groupings. Bar is the default
mark choice when you are analyzing a single measure across one or more dimensions, such as
comparing sales totals by region. Bars are most often used to mark data with categorical values,
although they can also be used for quantitative data.
Simple Bar Charts
A bar chart displays quantitative (measure) data points as rectangular bars in which the length of a bar
is proportional to the value. A bar chart is often used to compare values across one or more categories,
such as showing the number of items sold by product or by region. The bars can be vertical (if the
measure is placed in Y-axis) or horizontal (if the measure is placed in X-axis). To create a simple bar
chart, drag a measure field to the Y-axis zone and drag a dimension field to the X-axis zone.
Stacked Bar Charts
You can color-encode a bar chart by dragging an additional dimension field to the Color zone. By
default, the bars are stacked, meaning multiple dimension values are shown cumulatively within a single
bar.
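Stacking can be thought of as drawing each color-encoded value starting where the previous one ended. This is a minimal sketch that assumes non-negative values listed in bottom-to-top stack order; the `stack_segments` helper and sample values are hypothetical, and Platfora's handling of negative values is not modeled.

```python
def stack_segments(values):
    """Return (start, end) positions for each segment of one stacked bar.

    `values` are in stack order, bottom to top; each segment begins where
    the previous one ended, so the top of the bar is the cumulative total.
    """
    segments = []
    base = 0
    for v in values:
        segments.append((base, base + v))
        base += v
    return segments


# Hypothetical sales for one region, split by a dimension in the Color drop zone.
print(stack_segments([40, 25, 35]))
# [(0, 40), (40, 65), (65, 100)]
```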
Unstacked bars are drawn one on top of another (overlapping rather than side by side), so unstacked bar
charts are not always visually useful. To show the color-encoded values as individual side-by-side bars
(grouped), add the same dimension field to both the X-axis and Color drop zones.
100 Percent of Total Stacked Bar Charts
100 percent of total stacked bar charts show the percentage that each color-encoded dimension member
contributed to the total value rather than showing the actual cumulative values. You can change a
stacked bar chart to a 100% of total stacked bar chart by changing the scale of the measure axis. Note
that if your values include negative numbers, the percent of total scale may extend above 100% or below 0%.
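The arithmetic behind the 100 percent of total scale, and the reason negative values can push it past 100% or below 0%, fits in a short Python function. This is a sketch of the calculation only; the `percent_of_total` name and the values are hypothetical.

```python
def percent_of_total(values):
    """Express each stacked value as a percentage of the bar's total (its sum)."""
    total = sum(values)
    if total == 0:
        # A percent-of-total scale is undefined when the values sum to zero.
        raise ValueError("values sum to zero; percent of total is undefined")
    return [100.0 * v / total for v in values]


print(percent_of_total([30, 50, 20]))  # [30.0, 50.0, 20.0]
print(percent_of_total([150, -50]))    # [150.0, -50.0]: above 100% and below 0%
```

With a negative member the total shrinks, so the positive member's share exceeds 100% while the negative member's share drops below 0%.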
Create a Histogram
Marks on a Bar chart are separated by empty space by default. You can increase the size of the bar
marks to make a histogram (no space between bars).
When the viz uses Bar marks, use the Size pull-out menu in the Builder panel and maximize the size by
changing it to 1.00x.
Mark outlines are on by default, so if you want to turn off mark outlines, clear the Show mark
outlines option from the Marks options menu.
Use Point Marks
The point mark type is best used to show the relationship between two independent variables, and is
most often used for building scatterplot charts or bubble charts. Point is the default mark type when you
select a measure for both the X-axis and Y-axis, and place a dimension in Color, Opacity, or Size.
Scatterplots
A scatterplot shows the relationship between two quantitative (measure) values for each data point.
Scatterplots are used to show how strongly one variable is related to another. The relationship between
the two variables is called their correlation. For example, you might use a scatterplot to show the
relationship between two variables such as population size and income. To create a simple scatterplot,
drag a measure field to both the Y-axis and X-axis zones, and drag a dimension to the Color, Size, or
Opacity zone.
Bubble Charts
A bubble chart is a variation of a scatterplot that adds a third quantitative variable, shown as the size
of each point. The data points on a bubble chart are compared in terms of their size as well as their
relative positions on the horizontal and vertical axes. To create a simple bubble chart, drag a measure
field to both the Y-axis and X-axis zones, drag a dimension to Color, and then drag an additional
measure to Size.
Packed Bubbles Charts
A packed bubbles chart displays quantitative (measure) data as solid circular points (bubbles) in which
the area of each bubble is proportional to the value of the measure. The bubbles are not located on
either the X-axis or Y-axis, so their locations have no significance. Instead, they are placed closely
together to use the space more efficiently.
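Because a bubble's area (not its radius) is proportional to the measure value, the radius grows with the square root of the value. This is a minimal sketch that assumes the largest bubble is given a fixed maximum radius; the function and parameter names are hypothetical, and this is not Platfora's layout code.

```python
import math


def bubble_radius(value, max_value, max_radius):
    """Radius for a bubble whose AREA is proportional to `value`.

    Area = pi * r**2, so r scales with the square root of the value.
    The bubble for `max_value` gets `max_radius`.
    """
    return max_radius * math.sqrt(value / max_value)


print(bubble_radius(100, 100, 40))  # 40.0
print(bubble_radius(25, 100, 40))   # 20.0
```

A value one quarter of the maximum gets half the radius, which is exactly one quarter of the area.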
Packed bubbles charts are different from bubble charts on an axis because they show fewer measure
fields (e.g. one measure as opposed to three measures).
A packed bubbles chart is similar to a word cloud chart, but it uses point marks instead of text marks.
By default, a packed bubbles viz displays a maximum of 40,000
marks. However, your system administrator might change this value using the
platfora.viz.bubble.limit configuration property.
To create a packed bubbles chart, place a measure field in the Size drop zone, and place a dimension
field in the Labels drop zone. Then choose Point from the Marks menu.
Optionally, you can color encode the bubbles in a packed bubbles chart by placing a dimension field in
both the Color and Labels drop zones. Then choose Point from the Marks menu.
Use Line Marks
A line chart displays a series of individual quantitative (measure) data points connected by line
segments. A line chart is often used to visualize a trend in measure data over intervals of time (time
series data) with the line drawn chronologically. Line is the default mark choice when you are analyzing
a measure across a date or a numeric (quantitative) dimension.
Simple Line Charts
A line chart displays a series of quantitative (measure) data points over a numeric dimension, such as
year or age. A line chart is often used to show trends over a continuum, such as sales performance over
time. To create a simple line chart, drag a measure field to the Y-axis zone and drag a date or numeric
dimension field to the X-axis zone.
Stacked and Unstacked Lines
You can color-encode a line chart by dragging a dimension field to the Color zone. Depending on the
field selections you have made for the X-axis and Y-axis, the lines may be shown unstacked or stacked.
Stacked is the default choice when analyzing a measure by a date dimension, such as year.
When line marks are stacked, the lines are not drawn independently of each other and are not compared
along the horizontal axis. Instead, the lines are drawn cumulatively along the vertical axis: each line's
height is its own value plus the values of all the lines below it. The stack order reflects the order of
dimension values, from bottom to top. Rather than reading the lines themselves, you interpret the space
between them, which is why stacked lines are often better visualized as shaded areas.
When line marks are unstacked, the lines are drawn independently of each other. You interpret a data
point on the line by reading values along the horizontal and vertical axes. Unstacked is the default choice
when analyzing a measure by a numeric dimension (such as age or height).
Use Area Marks
Similar to a line chart, an area chart displays quantitative (measure) data over a continuum (such as
time), and is typically used to compare two or more quantities. However, unlike line charts, area charts
typically represent cumulative totals rather than individual totals. An area chart fills the space
between marks with color, which is helpful for showing how each member of a dimension contributes
to an overall trend.
Simple Area Charts
An area chart shows the space between the lines filled in with color. To create a simple area chart, drag
a numeric date field (such as year) to the X-axis zone, drag a measure to Y-axis zone, and drag a
categorical dimension to Color.
Heatmaps
Heatmaps are used to compare a measure across multiple dimensions. Heatmaps use color to show
variations in the data. To create a simple heatmap, drag a dimension field to both the Y-axis and
X-axis zones, and drag a measure to Color.
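A heatmap is an aggregation over two dimensions whose totals are mapped to color. This hypothetical Python sketch sums the measure for each cell and min-max scales the totals to 0.0-1.0. The scaling is an assumption about how a color palette might be applied, not a description of Platfora's palette logic, and the `heatmap_cells` helper and flight data are illustrative.

```python
from collections import defaultdict


def heatmap_cells(rows, x_dim, y_dim, measure):
    """SUM `measure` for each (x, y) pair of dimension values, then min-max
    scale the totals to 0.0-1.0 so they can index into a color palette."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row[x_dim], row[y_dim])] += row[measure]
    if not totals:
        return {}
    low, high = min(totals.values()), max(totals.values())
    span = (high - low) or 1.0  # all totals equal: avoid dividing by zero
    return {cell: (total - low) / span for cell, total in totals.items()}


# Hypothetical flight counts by day (X-axis) and hour (Y-axis).
flights = [
    {"day": "Mon", "hour": 8, "flights": 10},
    {"day": "Mon", "hour": 9, "flights": 20},
    {"day": "Tue", "hour": 8, "flights": 30},
]
print(heatmap_cells(flights, "day", "hour", "flights"))
# {('Mon', 8): 0.0, ('Mon', 9): 0.5, ('Tue', 8): 1.0}
```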
Use Path Marks
A path connects data points with lines. However, unlike a line chart, which connects points in order
along the horizontal axis, a path connects the data points in a specified order. Paths are useful for
showing data that follows a natural order, such as ordered stops on a road trip or ordered web page
visits over time.
A path chart displays a series of quantitative (measure) data points over an ordinal dimension, such as
time. The drawing order of the line is determined by the sort order of the selected dimension field. For
example, this path shows the average departure delays of flights over the course of a day.
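The difference between line and path marks comes down to the order in which points are connected. This is a minimal, hypothetical sketch (the stop data and both helper names are illustrative, and the line-chart rule is a simplification): a line chart sorts points by their horizontal position, while a path follows the sort order of a dimension field, so a path can double back across the X-axis.

```python
# Hypothetical road-trip stops: "stop" is the ordinal field that defines the
# visiting order; x and y are the stop's position on the chart.
stops = [
    {"stop": 1, "x": 5.0, "y": 2.0},
    {"stop": 2, "x": 1.0, "y": 4.0},
    {"stop": 3, "x": 9.0, "y": 3.0},
]


def line_order(points):
    """Line marks (simplified): connect points left to right by X-axis value."""
    return [p["stop"] for p in sorted(points, key=lambda p: p["x"])]


def path_order(points, sort_field):
    """Path marks: connect points in the sort order of a chosen dimension field."""
    return [p["stop"] for p in sorted(points, key=lambda p: p[sort_field])]


print(line_order(stops))          # [2, 1, 3]
print(path_order(stops, "stop"))  # [1, 2, 3]
```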
Note: Platfora currently does not support geographical encoding of data, such as plotting a path of data
points on a map image, although this feature is planned for a future release.
Use Polygon Marks
A polygon mark is similar to a path mark, except that the connected lines are filled in as shaded areas.
The polygon chart type is useful when you have areas of data points on your visualization that you want
to shade in to see distinct areas of data. Polygon charts typically require special datasets to effectively
convey useful insights, and are not often used in the current product. This mark type will be more
useful in the future as Platfora increases support.
Use Text Marks
Text marks are useful for displaying text data at a glance. Text marks are most often used on
visualizations without axes (non-axis visualizations). Text is the default mark choice when you are
analyzing a single measure in the Labels drop zone.
Text Gauge Viz
A text gauge viz displays a single numeric value as text. You might want to create a text gauge viz to
display a key performance indicator (KPI) as part of a dashboard of several visualizations. Optionally,
you can color-code the text value by placing a different single-value field in the Color drop zone. You
might use a measure or a single-value numeric dimension in a text gauge viz.
To create a text gauge viz, place a measure field or a single value dimension field in the Labels drop
zone.
Word Cloud Viz
A word cloud viz displays text data as a weighted list. The text data values are distinguished from each
other by size. Word cloud visualizations are useful for quickly perceiving the most prominent terms.
By default, a word cloud viz displays a maximum of 1,500 words.
However, your system administrator might change this value using the
platfora.viz.word.limit configuration property.
To create a word cloud viz, place a dimension field in the Labels drop zone, and a measure field in the
Size drop zone.
Adjust Mark Appearance
Dragging a field into one of the Marks drop zones of the Builder panel is one way to add marks to a
visualization and visually encode their appearance, but you can also use the pull-out menus to change
the visual appearance of marks already on the visualization without adding additional groupings.
The Builder drop zones show the current settings that apply to the viz marks.
Show Mark Outlines
Viz marks for some mark types are outlined in a darker color by default. You can choose whether or not
to display mark outlines in a viz. You might want to display mark outlines to more easily differentiate
among individual marks, especially when they overlap each other.
When Platfora is configured to show mark outlines, they are available for the Bar, Point (solid shapes
only), Area, and Polygon mark types. The outline color is a slightly darker version of the fill color for
an individual mark.
Use the pull-out menu for the Marks drop zone to show or clear mark outlines.
To remove mark outlines, clear the Show mark outlines option from the Marks pull-out menu.
Adjust Mark Color
Viz marks can be colored in a single color or a collection (palette) of colors, depending on how the data
is encoded in the viz.
The appearance controls in the Color pull-out menu vary depending on whether or not a field is in that
drop zone.
Field is in the drop zone
The Color menu controls a range of values (a color palette) that apply to the marks in the viz created by
the field in the Color drop zone.
Field is not in the drop zone
The Color menu controls a single color that applies to all marks in the viz.
Adjust Mark Size
Viz marks can appear as a single size or a range of sizes, depending on how the data is encoded in the viz.
All mark types are assigned a default size, a minimum size, and a maximum size. For example, the
maximum size for bar marks is such that each bar mark touches the bars on either side without
overlapping them (a histogram). The minimum size for bar marks is one pixel. When adjusting mark
size, you make the marks bigger or smaller than the default size, which corresponds to 0x. Any mark
size between 0x and 1.00x makes the mark bigger than the default (1.00x is the maximum size), and any
mark size between 0x and -1.00x makes the mark smaller than the default.
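One way to picture the relative size scale is as interpolation around the default size. In this sketch, 0x maps to the default and 1.00x to the maximum (as in the histogram example). The linear interpolation, the mapping of -1.00x to the minimum, the `mark_size` helper, and the 20 px and 30 px sizes are assumptions for illustration; only the 1-pixel bar minimum comes from this guide.

```python
def mark_size(setting, minimum, default, maximum):
    """Map a size setting in [-1.0, 1.0] to an absolute mark size.

    Assumed mapping: 0.0 -> default, 1.0 -> maximum, -1.0 -> minimum,
    with linear interpolation in between.
    """
    if not -1.0 <= setting <= 1.0:
        raise ValueError("size setting must be between -1.00x and 1.00x")
    if setting >= 0:
        return default + setting * (maximum - default)
    return default + setting * (default - minimum)


# Hypothetical bar widths in pixels: 1 px minimum, 20 px default, 30 px maximum.
print(mark_size(0.0, 1, 20, 30))   # 20.0
print(mark_size(1.0, 1, 20, 30))   # 30.0
print(mark_size(-1.0, 1, 20, 30))  # 1.0
print(mark_size(0.5, 1, 20, 30))   # 25.0
```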
The appearance controls in the Size pull-out menu vary depending on whether or not a field is in that
drop zone.
Field is in the drop zone
The Size menu controls a range of sizes that apply to the marks in the viz created by the field in the
Size drop zone.
Field is not in the drop zone
The Size menu controls a single size that applies to all marks in the viz.
Adjust Mark Opacity
Viz marks can appear as a single opacity or range of opacities, depending on how the data is encoded in
the viz.
The appearance controls in the Opacity pull-out menu vary depending on whether or not a field is in
that drop zone.
Field is in the drop zone
The Opacity menu controls a range of opacities that apply to the marks in the viz created by the field in
the Opacity drop zone.
Field is not in the drop zone
The Opacity menu controls a single opacity that applies to all marks in the viz.
Adjust Mark Shape
Point viz marks can be drawn as a single shape or a collection of shapes, depending on how the data is
encoded in the viz.
The appearance controls in the Shape pull-out menu vary depending on whether or not a field is in
that drop zone. When choosing a shape or collection of shapes, you can choose between solid (Fill) or
hollow (Outline) shapes.
The shape controls only affect point mark types. If a field is placed in the Shape drop zone and a non-Point
mark type is displayed, Platfora treats that field as if it were in the Details drop zone.
Field is in the drop zone
The Shape menu controls a collection of shapes (a shape palette) that apply to the marks in the viz
created by the field in the Shape drop zone.
Currently, Platfora supports only one shape palette.
Field is not in the drop zone
The Shape menu controls a single shape that applies to all marks in the viz.
About Solid and Hollow Shapes
Point marks on a chart appear as solid shapes by default. You can configure Point marks to be either
solid (Fill) or hollow (Outline).
When the viz uses Point marks, use the Shape pull-out menu in the Builder panel to choose solid or
hollow shapes.
To create a solid shape, click Fill and choose the shape you want.
To create a hollow shape, click Outline and choose the shape you want.
Adjust Mark Labels
Viz marks can be displayed with or without a label that shows one of the values associated with
each mark.
The Labels pull-out menu controls the label text for all marks in the viz. The label options available
in the menu depend on whether or not a field is in the Labels drop zone.
Field is in the drop zone
When a field is in the Labels drop zone, the value of the drop zone field is displayed for each mark.
Field is not in the drop zone
You can choose from the following label options:
• None - This option turns off the text labels.
• X-Axis Value - This option always displays the mark's value along the horizontal axis. You might
want to use this option to quickly and easily read the mark's exact value on the horizontal axis
without zooming in.
• Y-Axis Value - This option always displays the mark's value along the vertical axis. You might
want to use this option to quickly and easily read the mark's exact value on the vertical axis without
zooming in.
Chapter 6: Control Field Labels in a Viz
Platfora allows you to control how field names, labels, and values appear on the axis of a viz.
Topics:
• Truncate Field Labels in a Viz
• Change Field Labels Width in a Viz
• Change Field Display Names on an Axis
• Hide Field Name and Values on a Chart Axis
• Change the Number Formatting of a Measure
Truncate Field Labels in a Viz
When dimension field values are very long, the field name and value labels in a visualization are
truncated on the right side by default. However, you can configure where to truncate the text for fields
used in a drop zone.
You can truncate long text in field labels on the left side, right side, or center of the text. You might want
to choose a different truncation location when you have multiple field values that start with identical
characters and are only distinguished from each other at the end of the string. Platfora displays an ellipsis
(...) when it truncates a label.
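The three truncation positions can be sketched with plain string slicing. This is a character-count approximation for illustration; Platfora fits labels to a display width, and the `truncate_label` function and sample labels are hypothetical. With right truncation the two labels become indistinguishable, which is the situation described above.

```python
ELLIPSIS = "..."


def truncate_label(text, max_chars, where="right"):
    """Shorten `text` to at most `max_chars` characters, placing an ellipsis
    at the left, center, or right of the label."""
    if len(text) <= max_chars:
        return text
    keep = max_chars - len(ELLIPSIS)  # characters of the original text to keep
    if keep <= 0:
        return ELLIPSIS[:max_chars]
    if where == "right":
        return text[:keep] + ELLIPSIS
    if where == "left":
        return ELLIPSIS + text[-keep:]
    if where == "center":
        head = (keep + 1) // 2
        tail = keep - head
        return text[:head] + ELLIPSIS + (text[-tail:] if tail else "")
    raise ValueError("where must be 'left', 'center', or 'right'")


labels = ["Regional Sales Office 01", "Regional Sales Office 02"]
for where in ("right", "center", "left"):
    print(where, [truncate_label(s, 12, where) for s in labels])
# right ['Regional ...', 'Regional ...']
# center ['Regio...e 01', 'Regio...e 02']
# left ['...Office 01', '...Office 02']
```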
Truncation applies to the field in both Chart and Cross-Tab view.
Date dimension fields cannot be truncated. Platfora always displays the full date.
1. Click a visualization to select it. Field labels are controlled on a per-viz basis.
2. Find the dimension field in the Builder panel that you want to edit, and select Options and Sort
from its field menu.
(Optional) Click the field name in the viz to open the Options and Sort dialog.
3. Click the Labels tab.
4. Choose where to truncate the field label in the viz, either Left, Center, or Right.
5. Click Apply.
The labels for the field name and field values are updated in the currently selected viz.
The truncated labels apply to the current viz only. The truncated labels do not
affect other visualizations in the same vizboard. When the field is removed from
a drop zone and added again, the truncation setting is reverted to the default.
Change Field Labels Width in a Viz
When you add a dimension field to your visualization, the field name and its values are assigned a width
by default. Platfora calculates a best guess width based on several factors, including the length of the
current values in the field. However, you can define the width assigned to the labels of fields used in a
drop zone.
You can choose one of the following options when configuring field label width for a dimension field:
• Fit to best guess. Platfora calculates a label width based on several factors including the size of
the viz and length of values in the field. This is the default.
• Fit to axis values. The label width is large enough to accommodate the longest value along the
axis.
• Fit to field name. The label width is as wide as the field name.
• Max pixel width. You can choose the number of pixels to use for the label width.
The label width applies to the field in both Chart and Cross-Tab view.
1. Click a visualization to select it. Field label widths are controlled on a per-viz basis.
2. Find the dimension field in the Builder panel that you want to edit, and select Options and Sort
from its field menu.
(Optional) Click the field name in the viz to open the Options and Sort dialog.
3. Click the Labels tab.
4. Choose how to calculate the field label width.
5. Click Apply.
6. The field label width is updated in the currently selected viz.
The field label width applies to the current viz only. The width does not affect
other visualizations in the same vizboard. When the field is removed from a
drop zone and added again, the width setting is reverted to the default.
Change Field Display Names on an Axis
When you add a field to your visualization, the field name is displayed in the viz axis headers. By
default, the field name shown in a viz is the same as it appears in the lens. You can change the display
name of any field used in a visualization on a per-viz basis.
1. Click a visualization to select it. Field display names are controlled on a per-viz basis.
2. Find the field in the Builder panel that you want to rename, and select Options (for measures) or
Options and Sort (for dimensions) from its field menu.
3. Enter the new Display Name and click Apply.
4. The display name is updated in the headers of the currently selected viz.
The new display name applies to the current viz only; it does not affect the
field name in the dataset, lens, or other visualizations in the same vizboard.
The display name currently applies only to the viz headers, not to the Filters
and Legends panels.
Hide Field Name and Values on a Chart Axis
When you add a field to your visualization, the field name and its values are displayed in the viz axis
headers. To save space in a viz, you can hide the field name or its values on the axis of a chart viz on a
per-viz basis.
You might want to hide field names or values for any of the following reasons:
• To save space in the viz.
• If the information is duplicated through color, labels, or other attributes.
• If the chart is explained in the title and doesn’t need to show axes.
Field names and values are only hidden in Chart view. They always appear in Cross-Tab view.
1. Click a visualization to select it. Hiding field names is controlled on a per-viz basis.
2. Find the field in the Builder panel that you want to edit, and select Options (for measures) or
Options and Sort (for dimensions) from its field menu.
3. Select the Hide field name (chart only) or Hide axis values (chart only) options as
desired.
4. Click Apply.
The field name or axis values are hidden in the currently selected viz.
This setting applies to the field in the current viz only. It does not affect other
visualizations in the same vizboard. When the field is removed from a drop
zone and added again, the setting is reverted to the default.
Change the Number Formatting of a Measure
You can specify the number format for the measure values shown in the viz and cross-tab text. You can
select from a set of standard formats, such as normal, currency, scientific, and percentage.
1. Select Options in the measure field drop-down menu.
You can only change the number formatting for measure values, not numeric
dimension values.
2. Choose the number format you want to use, and select the desired formatting options.
The following number formats and options are available:
• Auto. The data values in the field are used to determine the best display format.
Options: None.
• Normal. Values are displayed as regular numbers.
Options: Negative Values (how to display negative values) and Decimal Places (how many decimal
places to display).
• Currency. Values are displayed as monetary units.
Options: Negative Values (how to display negative values), Decimal Places (how many decimal
places to display), and Symbol (the monetary symbol to use: dollar, euro, yen, etc.).
• Percent. Values are displayed as a percentage, where a value of 1 is interpreted as 100%, 0.8 as
80%, 0 as 0%, and so on.
Options: Negative Values (how to display negative values) and Decimal Places (how many decimal
places to display).
• Scientific. Values are shown in scientific (exponential, or e) notation. The notation e+n represents
"times ten raised to the nth power." Note that in this usage the character e is not related to the
mathematical constant e or the exponential function e^x.
Options: Negative Values (how to display negative values) and Decimal Places (how many decimal
places to display).
3. Click Confirm.
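To make the formats above concrete, here is a minimal Python sketch that renders a single measure value in the Normal, Currency, Percent, and Scientific styles using Python's built-in format specifiers. It is an illustration only, not Platfora's formatting code: the function name `format_measure`, the format keys, and the `negative` option values (`"minus"` for a leading minus sign, `"parens"` for accounting-style parentheses) are hypothetical stand-ins for the Negative Values, Decimal Places, and Symbol options.

```python
def format_measure(value: float, fmt: str, decimals: int = 2,
                   symbol: str = "$", negative: str = "minus") -> str:
    """Render a measure value in one of the illustrative formats.

    fmt: "normal", "currency", "percent", or "scientific" (hypothetical keys).
    negative: "minus" -> -1.00, "parens" -> (1.00) (hypothetical option names).
    """
    magnitude = abs(value)
    if fmt == "normal":
        body = f"{magnitude:,.{decimals}f}"           # thousands separator
    elif fmt == "currency":
        body = f"{symbol}{magnitude:,.{decimals}f}"   # symbol before the number
    elif fmt == "percent":
        body = f"{magnitude:.{decimals}%}"            # 1 -> 100%, 0.8 -> 80%
    elif fmt == "scientific":
        body = f"{magnitude:.{decimals}e}"            # e+n = times ten to the nth power
    else:
        raise ValueError(f"unknown format: {fmt}")
    if value < 0:
        return f"({body})" if negative == "parens" else f"-{body}"
    return body
```

For example, `format_measure(0.8, "percent", decimals=0)` returns `"80%"`, matching the Percent rule that 0.8 is displayed as 80%.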
Chapter 7: Sort Viz Data
You can sort data in a viz to arrange the data in a meaningful order for analysis.
Topics:
• Default Sort Behavior
• Change the Sort Order of a Dimension Axis
Default Sort Behavior
By default, dimension values are sorted alphabetically (chronologically for dates and numerically for
numbers) in ascending (A-Z) order. You can change the sort order to sort the marks in descending order
(Z-A) or to sort the marks according to a measure field instead.
Change the Sort Order of a Dimension Axis
Changing the sort order of a dimension axis rearranges the order of the marks on the viz. You can
change the default sort order of a dimension, sort the dimension marks on the viz according to the value
of a measure, or limit the number of dimension values shown in the viz.
Sort options are only available for dimension fields placed in a Builder drop-zone. Select Options
and Sort in the field drop-down menu to access the sort options for that field.
Changing the Default Sort Order
By default, dimension axes show values as categorical data. Each dimension field has a default sort
order depending on its data type:
• String data is shown in alphabetical order (A-Z).
• Numeric data is shown from low to high values (1-10).
• Datetime data is shown in chronological order (Jan 1 - Dec 31).
The default sort order for a categorical dimension axis is Ascending. Values are displayed in natural
reading order: left to right on the X-axis, and top to bottom on the Y-axis.
To change the default sort order to Descending (high to low):
1. Select Options and Sort in the field drop-down menu.
2. Change the Sort Direction from Ascending to Descending.
3. Click Apply.
Sorting a Dimension by a Measure Field
Sorting a dimension axis by a measure is a good way to show the highest (or lowest) performing
categories of a dimension. To sort by a measure:
1. Select Options and Sort in the dimension field drop-down menu.
2. (Optional) Choose the Sort Direction - Ascending if you want to explore the bottom of the
range, Descending if you want to explore the top of the range.
3. In the Sort by field, choose the measure you want to use.
4. (Optional) To limit the number of marks shown on the viz, enter a Limit number.
5. (Optional) Select Include 'Others' member if you want a mark to represent the members that
were excluded by the limit.
For example, to show the top 10 busiest airports, you could do a descending sort on airports by the
number of flights and then limit the results to 10.
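The sort and limit steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not Platfora's query logic: `sort_and_limit` is a hypothetical helper that receives one pre-aggregated measure value per dimension member, and the Others mark is computed by summing the excluded values, which only makes sense for additive measures such as counts or sums.

```python
from typing import Dict, List, Optional, Tuple


def sort_and_limit(
    totals: Dict[str, float],
    by_measure: bool = False,
    descending: bool = False,
    limit: Optional[int] = None,
    include_others: bool = False,
) -> List[Tuple[str, float]]:
    """Order dimension members and optionally keep only the first `limit`.

    totals maps each dimension member to its aggregated measure value,
    for example airport -> number of flights.
    """
    if by_measure:
        # Sort by the measure value; the member name breaks ties.
        ordered = sorted(totals.items(), key=lambda kv: (kv[1], kv[0]),
                         reverse=descending)
    else:
        # Default sort: by the member value itself (A-Z, low to high).
        ordered = sorted(totals.items(), key=lambda kv: kv[0],
                         reverse=descending)
    if limit is None or limit >= len(ordered):
        return ordered
    kept, dropped = ordered[:limit], ordered[limit:]
    if include_others:
        # Combine every member excluded by the limit into one "Others" mark.
        kept.append(("Others", sum(v for _, v in dropped)))
    return kept
```

Calling `sort_and_limit(flights, by_measure=True, descending=True, limit=10)` on a hypothetical mapping of airport to flight count would return the 10 busiest airports, as in the example above.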
Chapter 8: Filter Viz Data
Adding a filter to a visualization allows you to constrain the data that is shown in the visualization. You can add
a filter using a particular field, or by selecting and isolating a set of marks on a viz.
Topics:
• FAQs—Viz Filters
• Add a Filter on a Field
• Create a Page Filter
• Toggle Filter Include/Exclude Mode
• Filter by Selection
• Filter by Limit
FAQs—Viz Filters
This topic answers some frequently asked questions about filtering data in a visualization.
What is a viz filter?
A viz filter is a condition that specifies which data values to display in a visualization.
Filtering a viz affects the query Platfora creates and runs against the lens. For more information on how
filtering affects the viz query, see How Interactive Viz Queries Work.
Who can create a viz filter?
To create a viz filter, a user must have the Analyst (Limited) role or higher and also have object
access permission to edit the vizboard.
What types of viz filters can I create?
You can create the following types of viz filters:
• Local filter. Local filters apply to a single viz. Local filters appear in the Filters panel under the
Filters section.
• Page filter. Page filters apply to all visualizations on a page that use the same lens. Page filters
appear in the Filters panel under the Page Filters section.
When viewing a vizboard with no viz currently selected, the Page Filters section
includes the lens name for each page filter.
How can I create a viz filter?
You can create a viz filter in the following ways:
• From a field. Filtering on a field allows you to include or exclude records from the visualization
based on the values of the selected field. Filtering from a field can apply to a single viz (local filter)
or all visualizations on the page using the same lens (page filter). For more information on working
with field filters, see Add a Filter on a Field.
When you drill down on a field in a viz, Platfora applies a filter to that field. For
more information on drilling down, see Drill Down FAQ.
• From a selection of viz marks. Using the cursor, you can select marks in a viz and filter on the
selection. Filtering from a selection can apply to a single viz (local filter) or all visualizations on the
page using the same lens (page filter). For more information on working with selection filters, see
Filter by Selection.
• From a sort limit. You can limit the number of dimension members in a viz based on a measure
calculation. For example, you could limit a viz to the top 10 sellers. A sort-limit filter always
applies to a single viz (local filter). For more information on working with limit filters, see
Filter by Limit.
Why would I want to create a page filter?
You might want to create a page filter to look at different variables in multiple visualizations across a
common dimension.
How do I create a page filter?
You can create a page filter by dragging a field into the Page Filters drop zone, or by designating
(promoting) a local filter to apply to the entire page.
For more information on creating page filters, see Create a Page Filter.
Are viz field filters inclusive or exclusive?
By default, field filters are inclusive, meaning the values selected in the filter are included in the
visualization. You can change a field filter to be exclusive, meaning everything except the selected
values are included in the visualization. For more information on how to do this, see Toggle Filter
Include/Exclude Mode.
Can I create multiple filters on a viz?
Yes, but note the following:
• You can create multiple filters on different fields, as either local or page filters.
• You can create multiple selection filters, as either local or page filters.
• You can create only one page filter on a particular lens field.
• You can create only one local filter on a particular lens field.
• You can create a local filter and a page filter on the same lens field.
• Each viz only allows one limit filter.
Does the order of the viz filters in the Filters panel matter?
No. Platfora uses an AND condition in the viz query when applying all applicable filters.
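A minimal Python sketch of how independent filters combine: each record must pass every filter (a logical AND), so reordering the filters cannot change the result. The `CategoryFilter` class and `apply_filters` function are hypothetical; the `exclude` flag mirrors the include/exclude mode described in Toggle Filter Include/Exclude Mode.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Record = Dict[str, str]


@dataclass(frozen=True)
class CategoryFilter:
    field: str
    values: FrozenSet[str]
    exclude: bool = False  # False: inclusive (default); True: exclusive mode

    def keep(self, record: Record) -> bool:
        selected = record[self.field] in self.values
        return not selected if self.exclude else selected


def apply_filters(records: List[Record], filters: List[CategoryFilter]) -> List[Record]:
    # A record survives only if every filter keeps it (logical AND), so the
    # order of the filters does not change the result.
    return [r for r in records if all(f.keep(r) for f in filters)]
```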
Can I change a page filter to a local filter for a particular viz?
Yes, you can demote a page filter to a local filter. You can demote most page filters to any viz on the
page. However, you can only demote a page filter that contains a drill filter to the original viz.
To demote a page filter, you can drag it from the Page Filters section to the Filters section of the
Filters panel. Or, you can select the viz to apply the filter to, and from the filter contextual menu
choose Apply to only this Viz.
Can I create a filter on a geography field?
Yes. Filtering on a location field filters on the location name (label) assigned to each value as a
categorical dimension. For example, if you have a location field for zip code data and the location name
for each value is the 5-digit zip code, you can filter on values such as 94402 and 94111, but not on the
latitude and longitude coordinates.
What happens when I copy a viz that has a page filter applied to it to another
vizboard page?
When you move a viz that has a page filter applied to it to another page (either in the same or a different
vizboard), the page filter is converted to a local filter and is applied to that viz only.
Also, if the page filter contains a drill filter, it is converted to a local drill filter if the moved viz
originally created the page drill filter. If the moved viz didn't create the page drill filter, it is
converted to a regular local filter.
How do I remove a filter?
To remove a filter on a field, find the field in the Filters panel and click the X next to the field name.
Or, from the filter contextual menu choose Remove from Viz.
Note that limit filters do not show in the Filters panel. To remove a limit filter on a dimension, you
need to open the Options and Sort dialog on the particular dimension field to remove any limits that
have been applied.
Can I promote a local filter created by drilling down on a viz to a page filter?
Yes, but note the following:
• When you promote a local drill filter to a page drill filter, the page drill filter shows the viz name to
make it clear which viz created it.
• If you promote a local drill filter to a page drill filter and later delete the viz in which you created the
drill filter, the page drill filter changes in appearance to a regular page filter. That means the viz name
disappears from the page filter.
• When you duplicate a viz that has a page drill filter, the duplicated viz has a local drill filter applied
as well as the page filter. The local drill filter determines which fields appear in each viz drop zone,
and the page filter determines which data members are filtered out.
Add a Filter on a Field
Filtering on a field allows you to include or exclude records from the visualization based on the values
of the selected field. The filter controls differ depending on the type of field you are filtering on
(dimension, measure, or date).
You can add a filter on any field by dragging it to the Filters panel. If the visualization has multiple
filters applied, drag new fields to the desired position in the list (above or below an existing filter field).
When the drop zone indicator appears, drop the field into position. Filters are applied independently of
each other, so their position in the list does not change how the filter is applied to the data.
You can also add a filter from the field's drop-down menu in the lens panel or the Builder panel. This
adds a filter control for that field to the bottom of the Filters panel.
Filter on Dimension Fields
Dimension fields can contain textual or numeric data, so filtering on a dimension depends on the data
type of the selected field. Category filters allow you to select specific dimension members (values) to
include or exclude in your visualization. Range filters allow you to choose a range of values to include
or exclude, and are only applicable to dimensions that contain numeric data.
When you add a dimension field to the Filters panel, you get the dimension filter control. Depending
on the data type of the field selected, the control defaults to either Category (for textual data) or
Range (for numeric data). For numeric data, you can change the filter control to Category if you
want to choose specific numeric values to include or exclude.
Basic Category Filters
A category filter shows a list of the distinct values (or members) in the selected dimension field. As long
as the dimension field has 100 or fewer values, you can scroll through the entire list of values and check
the values you want.
By default, the category filter control is inclusive, meaning you select the values
you want to include in your visualization (unselected values are filtered out). To
change to exclusive, see Toggle Filter Include/Exclude Mode.
Category Filters with Long Member Lists
The category list only displays the field values when the field contains 100 or fewer values (or
members). For dimension fields that have a lot of distinct values, click Edit List to choose values that
are not visible in the list.
In the Edit Filter dialog, choose from the data source member values displayed in the dialog and add
one or more values to the filter criteria. To create more room in the Edit Filter dialog, click the
collapse icon to collapse the left side pane.
Optionally, you can then do one of the following to add member values to the filter criteria:
• Search for a specific value to select. Platfora displays the first 10,000 values. To view values not
listed, you can enter a search string to list a subset of values and then choose from the displayed
member values.
• Add a match pattern to select values. Define a custom member if the member is not currently in the
data. You can also use wildcard characters to match multiple members.
• Import a list of values. Import a list of values contained in a text file.
To search for a specific value to select:
1. Choose how to match the search criteria. By default, the search pattern searches the entire string for
a match (Contains). You can also choose to search for a pattern at the beginning (Starts with) or
end (Ends with) of the string.
2. Type the search criteria in the Search box to find possible matches.
You can use the wildcard characters of ? (question mark) to represent a single character or *
(asterisk) to represent any number of characters, and \ (backslash) as the escape character. Note
that the search pattern you enter is not case-sensitive. For example, entering a search pattern of
Contains A will return any value that contains the letter A or a, not just values that begin with A.
3. Select matching members to add to filter criteria.
4. Click Apply.
To select values based on a match pattern:
1. Click Create Custom to create an empty search pattern.
2. Type the search pattern in the empty field in the Selected Members box. Note that the search
pattern you enter is not case-sensitive. You can use the following wildcard characters in the search
pattern:
• * matches zero or more characters.
• ? matches any single character.
• \* matches the asterisk (*) character.
• \? matches the question mark (?) character.
• \\ matches the backslash (\) character.
Alternatively, you can enter the search pattern in the Search Member field and then click Create
Custom. When text is entered in the search box, Platfora creates a selected member based on the
search pattern you entered. The asterisk (*) character is added as a wildcard to the beginning or end
of the search. For example, if you chose Starts with san, then Platfora creates the selected member
san*.
3. (Optional) When the edit icon appears next to a selected member, you can edit the selected member
string.
4. Click Apply.
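The wildcard rules above can be approximated with Python's `re` module. This sketch is an assumption about the matching semantics, not Platfora's implementation: it treats a pattern as matching the whole member value, ignores case, and builds Contains, Starts with, and Ends with patterns by adding `*` at both ends, at the end, or at the start of the search text. The names `pattern_to_regex`, `search_to_pattern`, and `matches`, and the mode strings, are hypothetical.

```python
import re


def pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a member pattern that uses *, ?, and backslash escapes."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))  # \*, \?, \\ match literally
            i += 2
            continue
        if ch == "*":
            parts.append(".*")  # zero or more characters
        elif ch == "?":
            parts.append(".")  # any single character
        else:
            parts.append(re.escape(ch))  # ordinary character
        i += 1
    # Patterns are not case-sensitive and must match the whole member value.
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def search_to_pattern(text: str, mode: str) -> str:
    """Build a member pattern from search text (hypothetical mode names)."""
    if mode == "contains":
        return f"*{text}*"
    if mode == "starts_with":
        return f"{text}*"
    if mode == "ends_with":
        return f"*{text}"
    raise ValueError(f"unknown mode: {mode}")


def matches(value: str, pattern: str) -> bool:
    return pattern_to_regex(pattern).fullmatch(value) is not None
```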
To upload a list of values:
1. Click Import List and navigate to a text file containing the members you want to select. The file
must be smaller than 1 MB and contain fewer than 10,000 values.
Platfora adds a member value to the Selected Members list for each line in the file; values are
delimited by newline characters.
Note that when the imported file contains a very large number of values, the import may take a while
to complete.
2. Click Apply.
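The import rules above (one member per line, a file smaller than 1 MB, fewer than 10,000 values) can be sketched as a small validation function. The helper name is hypothetical, and treating 1 MB as 1,048,576 bytes, decoding the file as UTF-8, stripping Windows `\r` line endings, and skipping blank lines are all assumptions rather than documented Platfora behavior.

```python
from typing import List

MAX_BYTES = 1024 * 1024  # "smaller than 1 MB", assuming 1 MB = 1,048,576 bytes
MAX_VALUES = 10_000      # "fewer than 10,000 values"


def parse_member_list(data: bytes) -> List[str]:
    """Split an uploaded text file into one member value per line."""
    if len(data) >= MAX_BYTES:
        raise ValueError("file must be smaller than 1 MB")
    lines = data.decode("utf-8").split("\n")  # the newline delimits values
    # Assumptions: drop a trailing \r (Windows line endings) and skip blank lines.
    values = [line.rstrip("\r") for line in lines]
    values = [v for v in values if v]
    if len(values) >= MAX_VALUES:
        raise ValueError("file must contain fewer than 10,000 values")
    return values
```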
Range Filters
When you add a numeric dimension to the Filters panel, the default filter control is a range filter. You
can type the Start and End range values (hit ENTER to apply a value after typing it in).
If your data is categorical in nature (for example, SKU numbers or product codes), you can switch it to
be a category filter instead.
Filter on Date Fields
Date fields are a special kind of dimension that allow for either range or relative filters. A date range
filter allows you to select a range of days to include or exclude. A relative date filter allows you to pick a
particular date and then choose the day, week, month, quarter, or year prior to, following, containing, or
up to that date. Note that date filters only apply to Date or Date (Time Series) fields. Other date-related
fields (Month, Week, Year, and so on) are treated just like any other dimension field.
When you add a date field to the Filters panel, you get the date filter control. The range filter control is
the default. You can also choose the relative date filter control.
Date Range Filters
A range filter is the default filter control for dates. It allows you to choose a specific Start and End
range (the range is inclusive) of date values. You can type a specific date value and press ENTER, or
use the calendar control to pick dates.
Why does my date range include January 1, 1970?
If you have records in your source data that contain a null value for a dimension field (the value is
empty), Platfora assigns them to a default value when it builds the lens. The default value assigned to
null date values is January 1, 1970 (01/01/1970).
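A minimal sketch of why the January 1, 1970 member appears: null dates are replaced with a default when the lens is built, and because a date range filter is inclusive, any range that spans that day keeps those records. The function names below are hypothetical; only the default date and the inclusive range come from this guide.

```python
from datetime import date
from typing import List, Optional

NULL_DATE = date(1970, 1, 1)  # default this guide says is assigned to null dates


def fill_null_dates(raw: List[Optional[date]]) -> List[date]:
    """Replace missing dates with the default, as happens when the lens is built."""
    return [d if d is not None else NULL_DATE for d in raw]


def in_date_range(d: date, start: date, end: date) -> bool:
    """Date range filters are inclusive at both ends."""
    return start <= d <= end
```

Under these assumptions, starting the range after 01/01/1970 keeps the records that had null dates out of the viz.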
Relative Date Filters
To switch the filter control of a date field from a range to a relative filter, select Relative from the
filter Options menu.
To create a relative date filter, you must first select a date to use as the basis of the filter rule, either
Relative to today or Relative to a specific date.
There are four types of relative date filter rules you can create:
• Previous - Shows a number of time periods (day, week, month, quarter, or year) prior to and
including the period of the selected date. For example, Previous Year includes last year and this year.
In particular, choosing Previous Year on March 15, 2015 includes 01/01/2014 through 12/31/2015.
• Next - Shows a number of time periods (day, week, month, quarter, or year) following and including
the period of the selected date. For example, Next Month includes this month and next month. In
particular, choosing Next Month on March 15, 2015 includes 03/01/2015 through 04/30/2015.
• This - Shows the time period (day, week, month, quarter, or year) that the selected date is a member
of. For example, This Quarter for 10/15/2015 includes 10/01/2015 through 12/31/2015.
• To-Date - Shows the time period (day, week, month, quarter, or year) that the selected date
is a member of, up to and including the selected date. For example, Quarter-to-Date for
10/15/2015 includes 10/01/2015 through 10/15/2015.
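The four rule types can be sketched in Python with the standard `datetime` module. This is an illustration only: `relative_range` and its rule and unit strings are hypothetical, `n` stands for the number of time periods, and weeks are left out because this guide does not say which day a week starts on.

```python
from datetime import date, timedelta

# Months per period for the month-based units (hypothetical unit names).
MONTHS_PER_UNIT = {"month": 1, "quarter": 3, "year": 12}


def period_start(d: date, unit: str) -> date:
    """Return the first day of the period (day, month, quarter, year) containing d."""
    if unit == "day":
        return d
    size = MONTHS_PER_UNIT[unit]
    first_month = (d.month - 1) // size * size + 1  # 1, 4, 7, 10 for quarters
    return date(d.year, first_month, 1)


def shift(start: date, unit: str, n: int) -> date:
    """Move a period start forward (n > 0) or backward (n < 0) by n periods."""
    if unit == "day":
        return start + timedelta(days=n)
    months = start.year * 12 + (start.month - 1) + n * MONTHS_PER_UNIT[unit]
    return date(months // 12, months % 12 + 1, 1)


def relative_range(rule: str, unit: str, anchor: date, n: int = 1) -> tuple:
    """Return the inclusive (first, last) dates covered by a relative date rule."""
    start = period_start(anchor, unit)
    end_of_current = shift(start, unit, 1) - timedelta(days=1)
    if rule == "previous":  # n periods before, plus the current period
        return shift(start, unit, -n), end_of_current
    if rule == "next":      # the current period, plus n periods after
        return start, shift(start, unit, n + 1) - timedelta(days=1)
    if rule == "this":      # only the period containing the anchor
        return start, end_of_current
    if rule == "to-date":   # from the period start up to the anchor
        return start, anchor
    raise ValueError(f"unknown rule: {rule}")
```

For example, `relative_range("previous", "year", date(2015, 3, 15))` returns `(date(2014, 1, 1), date(2015, 12, 31))`, and `relative_range("to-date", "quarter", date(2015, 10, 15))` returns `(date(2015, 10, 1), date(2015, 10, 15))`, matching the worked examples in the list above.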
For example, the following relative date filter rule includes 02/01/2015 through 03/31/2015:
Filter on Measure Fields
Measures always contain quantitative data, so filtering on a measure involves choosing a range of
numeric values to include or exclude in your visualization. In Platfora, measures are always the result
of an aggregate calculation on a group of dimension values, so the visual rendering of a measure filter
changes depending on the dimension fields and filters used in the visualization.
When you add a measure field to the Filters panel, you get the measure filter control. It shows a range
of numeric values based on the dimensions currently selected in the visualization. Note that the possible
range of values can change as you add and remove dimensions and dimension filters in the visualization.
You can type values in the Start and End fields (hit the refresh icon to apply the range filter to the
visualization).
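Because a measure filter is evaluated after aggregation, the same range can keep or drop marks depending on which dimensions group the data. The sketch below shows this with a small hypothetical flights dataset; `aggregate` and `measure_filter` are illustrative helpers, summing is used as the aggregate, and treating the Start and End values as inclusive is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate(records: List[dict], dims: List[str], measure: str) -> Dict[Tuple, float]:
    """Sum the measure for each group of records sharing the same dimension values."""
    totals: Dict[Tuple, float] = defaultdict(float)
    for r in records:
        totals[tuple(r[d] for d in dims)] += r[measure]
    return dict(totals)


def measure_filter(totals: Dict[Tuple, float], start: float, end: float) -> Dict[Tuple, float]:
    """Keep only groups whose aggregated value lies in [start, end]."""
    return {k: v for k, v in totals.items() if start <= v <= end}


rows = [
    {"carrier": "AA", "origin": "SFO", "flights": 40},
    {"carrier": "AA", "origin": "JFK", "flights": 70},
    {"carrier": "UA", "origin": "SFO", "flights": 90},
]
by_carrier = measure_filter(aggregate(rows, ["carrier"], "flights"), 100, 200)
by_pair = measure_filter(aggregate(rows, ["carrier", "origin"], "flights"), 100, 200)
print(by_carrier)  # {('AA',): 110.0}
print(by_pair)     # {}
```

Grouped by carrier, AA totals 110 flights and passes the 100 to 200 range; grouped by carrier and origin, no single pair reaches 100, so the same filter removes every mark.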
Create a Page Filter
Creating a page filter allows you to filter multiple visualizations at once (provided they all use the same
lens).
Promote a Local Filter to a Page Filter
You might want to promote a local filter to a page filter when you have a filter created from a selection
of viz marks.
To make a local filter apply to the page, you can drag it from the Filters section to the Page Filters
section of the Filters panel, or you can use the following steps:
1. Select the viz that contains the filter.
2. Find the filter in the Filters panel.
3. Open the filter contextual menu.
4. Select Apply to entire page.
If any other visualizations on the same vizboard page also use the same lens, the filter will be applied to
those visualizations as well.
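A minimal sketch of which filters reach a given viz: its own local filters plus any page filters defined on the same lens. The `Viz` and `PageFilter` classes, the `effective_filters` function, and the plain-string conditions are hypothetical simplifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Viz:
    name: str
    lens: str
    local_filters: List[str] = field(default_factory=list)


@dataclass
class PageFilter:
    lens: str
    condition: str  # e.g. "carrier = AA" (free-form text for illustration)


def effective_filters(viz: Viz, page_filters: List[PageFilter]) -> List[str]:
    """Local filters of the viz plus every page filter defined on the same lens."""
    shared = [pf.condition for pf in page_filters if pf.lens == viz.lens]
    return viz.local_filters + shared
```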
Create a Page Filter from a Lens Field
To make a page filter from a particular lens field, drag a field from the lens builder to the Page Filters
drop zone in the Filters panel.
Toggle Filter Include/Exclude Mode
By default, filters are inclusive, meaning the values selected in the filter are included in the visualization.
You can change a filter to be exclusive, meaning everything except the selected values are included in
the visualization. Exclusive mode is useful if you have a long list of values, and there are only a couple
of values you want to filter out.
To make a filter exclusive instead of inclusive, select Exclude Mode from the field's drop-down menu
in the Filters panel. This changes the filter so that selected values are excluded from the visualization.
Filter by Selection
While working in a visualization, you can isolate particular marks on the visualization by selecting them.
You can then save your selection as an inclusive or exclusive filter. Saved selection filters are saved in
the Filters panel of the visualization workspace.
Selecting and Unselecting Marks
To select marks on a visualization, you can:
• Click individual marks in the visualization.
• Click individual dimension members in the Legends panel.
• Click above a mark in a visualization and drag across and/or down to select multiple marks at once.
• Press the CTRL key (Windows) or Command key (Mac) and click individual marks to add them to
the selection.
To unselect marks on a visualization, you can:
• Press the ESC key.
• Click any whitespace in the visualization.
• Press the CTRL key (Windows) or Command key (Mac) and click individual marks to remove
them from the selection.
Saving a Selection as a Filter
To save a selection as a filter:
1. Select the marks on the visualization you want to include or exclude.
2. From the marks selected drop-down menu, choose Isolate Selection or Exclude Selection.
3. The selection is then added as a filter to the Filters panel.
Filter by Limit
Another way to filter a dimension field is to apply a limit on the number of dimension members
appearing in the visualization. You can also use limit in combination with a sort to limit a dimension
based on a measure calculation. For example, if you wanted to filter products to only show the top 10
sellers, you could sort the products dimension by the total sales measure, and then limit the results to
10.
For dimension fields that are already rendered in the visualization, you can add a limit filter by selecting
Options and Sort from the field's drop-down option menu. This opens the Options and Sort
dialog.
By default, dimension values are sorted alphabetically (or chronologically for dates) in ascending (A-Z) order. You can change the sort order to reflect that of a measure field instead. For example, in this
visualization we are sorting departure airports by the number of flights in descending order (most flights
to least flights), and limiting the results to 15, thereby showing the top 15 busiest departure airports in
our data.
Selecting Include "Others" Member will include a mark on the viz labeled Others. This group
includes all other dimension values filtered out by the limit criteria as one combined group.
Fields that have a sort or limit applied will show an icon in the field drop zone.
Chapter 9: Build a Chart Viz
Chart visualizations allow you to do exploratory analysis of aggregate data in chart form. Visualizations appear
inside panels (or boxes) in the workspace area of a vizboard page. A new vizboard page will always have one
empty placeholder visualization to get you started.
Topics:
• FAQs—Chart Visualizations
• About the Chart Viz Workspace
FAQs—Chart Visualizations
This topic answers some frequently asked questions about chart visualizations.
Information forthcoming.
About the Chart Viz Workspace
When you edit a chart viz within a vizboard, the Builder panel contains the tools you need to build
the visualization. The control panels on the left-hand side of the workspace are used to select lens
data and design the visual representation of the data. The control panels on the right-hand side of the
viz workspace show the data filters and appearance-encoding legends that are active in the selected
visualization.
Chart visualizations use a grid layout of columns and rows. Fields added to X-axis show on the
horizontal axis (equivalent to columns), and fields added to Y-axis show on the vertical axis
(equivalent to rows). The axes have headers that display the field names, and labels to display individual
field values (or ranges of values).
A mark is the visual representation of a measure value calculated for a group of input records or rows.
A group consists of records that share the same value for the dimension(s) used in the visualization.
Hovering over an individual mark will show the data details in a tooltip.
1. Lens name
2. Edit lens button
3. Add menu for supplementing the lens
4. Pan and zoom controls
5. Viz controls
6. Show/hide pages and builder panels
7. Filter lens fields
8. Lens fields
9. Choose viz type
10. Axis controls
11. Mark
12. Tooltip
13. Mark type controls
14. Mark appearance options
15. Mark appearance controls
16. Field header/name
17. Field label
18. Filter controls
19. Encoding legends
Chapter 10: About Chart Viz Axes
Visualizations use a grid layout of columns and rows. Fields added to the X-axis drop-zone show on the
horizontal axis (equivalent to columns), and fields added to the Y-axis drop-zone show on the vertical axis
(equivalent to rows). The axes have headers that display the field names, and labels denoting the field values (or
members).
Topics:
• Use Multiple Fields on an Axis
• Transpose the X and Y Axis
• Change Axis Options for Measure Data
• Change Axis Options for Dimension Data
Use Multiple Fields on an Axis
The X-axis and Y-axis drop zones can contain multiple fields. Adding additional measures to an axis
produces a dual or trellised axis. Adding additional dimensions to an axis produces a grouped axis.
Grouped Axes
Placing additional dimensions in an axis drop zone produces a grouped axis. The ordering of the
dimensions in the drop zone determines the grouping order. Typically, you should group an axis on
dimensions with fewer members for the best results (place lower cardinality dimensions above higher
cardinality dimensions). When measures and dimensions are placed on the same axis, the dimension
must always be on top.
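How drop-zone order controls the grouping can be sketched in Python. This is a simplified illustration, not Platfora's layout code: it assumes every combination of members appears (a real viz only shows combinations present in the data), and the `region` and `month` members are hypothetical. The top dimension varies slowest, so it forms the outer group.

```python
from itertools import product


def grouped_axis_labels(members_by_dimension):
    """List the label tuples of a grouped axis in display order.

    `members_by_dimension` follows drop-zone order (top field first).
    The first dimension varies slowest, so it becomes the outer group.
    Simplification: every combination is listed, even if no record has it.
    """
    return list(product(*members_by_dimension))


region = ["East", "West"]        # 2 members (lower cardinality)
month = ["Jan", "Feb", "Mar"]    # 3 members

# Region on top: each region is an outer group containing all months.
print(grouped_axis_labels([region, month])[:4])
# [('East', 'Jan'), ('East', 'Feb'), ('East', 'Mar'), ('West', 'Jan')]

# Swapping the order makes month the outer group instead.
print(grouped_axis_labels([month, region])[:2])
# [('Jan', 'East'), ('Jan', 'West')]
```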
Dual and Trellised Axes
Placing two measures in the same axis drop zone produces a dual axis. A dual axis allows you to
compare two measures side by side over a common dimension.
Placing multiple measures in both of the axis drop zones produces a trellised axis. A trellised axis allows
you to compare multiple measures over a common dimension.
Transpose the X and Y Axis
You can transpose the X and Y axis of a visualization by clicking Swap underneath the X-axis drop
zone in the Builder panel. This swaps the fields in the X-axis drop zone with those in the Y-axis
drop zone, thereby flipping the orientation of your viz.
Change Axis Options for Measure Data
Measure fields represent the quantitative data in your viz, and always contain aggregated numeric type
data. When you place a measure field in the X-axis or Y-axis drop-zone, you get a quantitative (or
continuous) axis. You can change various display options of a measure axis such as the display name
formatting, value range, the scale, or the number formatting.
Measure axis options can be accessed from the field menu of any measure field placed in the X-axis or
Y-axis drop zones of the Builder panel.
Measure axis options are available on measure fields placed in the appearance
encoding drop zones as well, but they do not have any effect on the viz axes in
that context.
Change the Value Range of a Measure Axis
By default, the range of values shown on a measure axis always includes 0 (zero) in the range. For data
that has a range skewed to high values, you may not want the measure axis to start at zero. For example,
if your data values start at 1,000,000 and go to 10,000,000 then you may want the measure axis to reflect
the actual range of values rather than starting at 0.
When you have a measure field in the axis drop zones (X-axis or Y-axis), the default is to always
include 0 in the range of values on the axis. To have the axis reflect the actual range of values, you can
choose to not always include 0 in the range.
Note that this will only affect measure fields whose range does not include 0 already and where the
range is skewed. If the range naturally includes 0 (or values close to 0), then the measure axis must
include 0.
1. Select Options in the measure field drop-down menu.
2. Deselect Always include zero under the Range options.
3. Click Confirm.
For some data, it is easier to see the variations in the data when the range starts at a higher value rather
than at zero.
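The effect of the Always include zero option on the axis bounds can be sketched as follows. This is an assumption-laden simplification, not Platfora's code: real chart axes usually also round bounds out to "nice" tick values, which is not modeled here. Note that when the data already spans zero, both settings give the same range, consistent with the note above.

```python
def measure_axis_range(values, always_include_zero=True):
    """Return (low, high) bounds for a measure axis.

    Simplified sketch: rounding bounds to "nice" tick values is not modeled.
    """
    low, high = min(values), max(values)
    if always_include_zero:
        # Extend the range so that 0 is always visible on the axis.
        low, high = min(low, 0), max(high, 0)
    return low, high


values = [1_000_000, 4_500_000, 10_000_000]  # hypothetical measure values
print(measure_axis_range(values))                             # (0, 10000000)
print(measure_axis_range(values, always_include_zero=False))  # (1000000, 10000000)
```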
Change the Scale of a Measure Axis
When you have a measure field in the axis drop zones (X-axis or Y-axis), the corresponding axis
shows a numeric scale depicting the range of values for the selected measure. By default, measure axes
show a Linear scale of numeric values from lowest to highest. You can change a measure axis to
display values on a Logarithmic scale or a Percent of Total scale.
Logarithmic Scale Axis
There are two main reasons to use a logarithmic scale for a measure axis in a viz. The first is to respond
to skewness towards large values (cases in which a few values are significantly larger than the bulk of
the data). The second is to show change of magnitude for data values that grow exponentially, such as
Richter scale values that measure the strength of an earthquake. Also, when the marks plotted on the viz
cover a very large range, changing the measure axis to a logarithmic scale can make it easier to see the
growth curve of the data.
A linear scale is plotted with equal distance between the tick marks, so each unit of change is represented
by the same distance on the scale. By contrast, a logarithmic scale is plotted so that the tick marks are
not positioned equidistantly; instead, equal ratios of change (for example, a tenfold increase) are plotted
with the same distance on the scale. For example, here is stock data that shows the average spread
(difference between the high and low price) for certain stocks. When shown linearly, the axis shows the
average price difference in 2 dollar increments. When shown logarithmically, the axis shows the
magnitude of change between the average price difference values.
The values on a log scale axis are based on the natural logarithm, which uses Euler's number e (a
mathematical constant approximately equal to 2.718) as its base. The natural logarithm of a value is the
power to which e must be raised to equal that value; for example, e raised to the power 2 is about 7.389,
so the natural logarithm of 7.389 is about 2. Switching from a linear to a log axis does not change the
underlying measure values; it only changes how the values are rendered on the axis.
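The difference between linear and logarithmic placement can be shown with a short Python sketch. It is an illustration of the general technique, not Platfora's renderer, and the `length` of 100 axis units is an arbitrary assumption. On the log axis, 1 to 10 and 10 to 100 take the same distance because each is a tenfold change. Because the positions are a ratio of logarithms, the base cancels out: using `math.log10` instead of the natural log would give identical positions.

```python
import math


def linear_position(value, low, high, length=100.0):
    """Distance along the axis (0..length) on a linear scale."""
    return (value - low) / (high - low) * length


def log_position(value, low, high, length=100.0):
    """Distance along the axis (0..length) on a logarithmic scale."""
    if min(value, low, high) <= 0:
        raise ValueError("a log scale needs strictly positive values")
    return (math.log(value) - math.log(low)) / (math.log(high) - math.log(low)) * length


low, high = 1, 1000
for v in (1, 10, 100, 1000):
    print(v, round(linear_position(v, low, high), 1), round(log_position(v, low, high), 1))
# 1 0.0 0.0
# 10 0.9 33.3
# 100 9.9 66.7
# 1000 100.0 100.0
```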
Log axes are not compatible with the Bar or Edge mark types. They only make
sense for mark types that can show curves in the data, such as Line, Point, or
Area. Zero and negative values cannot be plotted on a logarithmic scale, because
the logarithm is undefined for them. Also, log axes are not compatible with the
Polar Chart viz type and are disabled for polar chart visualizations.
Percent of Total Scale Axis
A Percent of Total scale axis only makes sense when showing stacked charts, such as stacked bar
charts, stacked line charts, or area charts. When you enable Percent of Total on a measure axis,
the axis scale is rendered as the percentage that each dimension group contributed to the total for each
column (or row). The measure values are also represented as percentages rather than the actual values.
For example, in the first column of the stacked bar chart below, the Percent of Total axis shows that
morning departure flights contributed about 30% to the total number of flights served on October 1,
2012, rather than showing the count of about 5,000 flights. Date is the column in this case for which
the percent of total is being calculated.
If your measure values contain negative numbers, the axis scale may show
percentages above 100% or below 0%.
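The Percent of Total calculation for one column, and why negative values can push shares past 100% or below 0%, can be sketched in Python. This is an illustration with hypothetical group names and values, not Platfora's implementation.

```python
def percent_of_total(column):
    """Convert one column's stacked values into percentages of the column total.

    `column` maps each dimension group (stack segment) to its measure value.
    """
    total = sum(column.values())
    if total == 0:
        raise ValueError("percent of total is undefined when the column sums to 0")
    return {group: 100.0 * value / total for group, value in column.items()}


# Hypothetical flight counts for one date column of a stacked bar chart.
print(percent_of_total({"Morning": 30, "Afternoon": 50, "Evening": 20}))
# {'Morning': 30.0, 'Afternoon': 50.0, 'Evening': 20.0}

# With a negative value the total shrinks, so shares can exceed 100% or drop below 0%.
print(percent_of_total({"Gains": 150, "Losses": -50}))
# {'Gains': 150.0, 'Losses': -50.0}
```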
Change Axis Options for Dimension Data
When you have a dimension field in the X-axis or Y-axis drop zone, the corresponding axis shows the
labels of the dimension values. You can change the sort order of a dimension axis, limit the number of
values shown on the axis, or toggle the axis type to display values on a categorical or quantitative scale.
Change the Type of a Dimension Axis
When you have a dimension field in an X-axis or Y-axis drop zone, the default axis type is categorical
(discrete). This means that the values are shown as individual members or categories. For dimension
fields that contain numeric or datetime type data, you can change the axis type to a quantitative scale
(continuous) axis instead.
Changing a Numeric Dimension Axis from Categorical to Quantitative
Dimension fields that are of a numeric data type can be switched from a categorical axis to a
quantitative scale (continuous) axis.
To change the axis type of a numeric dimension:
1. Select Options and Sort in the field drop-down menu.
2. Select Quantitative on the Sorting tab.
3. Click Apply.
Changing the dimension axis to Quantitative may also change the default mark type of the viz if you
are using the Auto mark type.
For example, the viz below is comparing the average trade volume (a measure) to the spread percentage
(a numeric dimension) for certain technology stocks. When spread percentage is shown on a categorical
axis, member values are compared side-by-side with a representative sampling of value labels along
the axis. Switching the axis to quantitative shows a quantitative scale with tick marks at evenly spaced
intervals, similar to how an axis is rendered for a measure. Also, since the viz is now comparing two
quantitative values, the default mark type changes from a bar to a point.
Note that a dimension axis displayed on a quantitative scale cannot be sorted like a categorical
dimension axis can. It will always show the values as a range from low to high with tick marks at evenly
spaced intervals.
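The contrast between a categorical and a quantitative dimension axis can be sketched in Python. This is a simplified model, not Platfora's layout code: it assumes each categorical member gets an equal-width slot centered in that slot, and the `spread_pct` members are hypothetical. A categorical axis spaces members evenly in whatever sort order you give it, while a quantitative axis positions them by value, always from low to high.

```python
def categorical_positions(members, length=100.0):
    """Evenly spaced slots, one per member, in the axis sort order."""
    slot = length / len(members)
    return {m: slot * (i + 0.5) for i, m in enumerate(members)}


def quantitative_positions(members, length=100.0):
    """Positions proportional to the numeric value, always low to high."""
    low, high = min(members), max(members)
    return {m: (m - low) / (high - low) * length for m in members}


spread_pct = [0.5, 1.0, 4.0]  # hypothetical numeric dimension members

# Categorical: equal gaps between members, and the sort order is respected.
print(categorical_positions(sorted(spread_pct, reverse=True)))
# Quantitative: 1.0 sits much closer to 0.5 than to 4.0.
print(quantitative_positions(spread_pct))
```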
Changing a Date Dimension Axis from Categorical to Quantitative (Continuous)
Changing the axis type for datetime type data is controlled by the field you choose from the lens, not by
the field menu options. For each datetime type field in your lens, there will be two fields that contain the
same data values:
• Date - Dragging a Date field to the X-axis or Y-axis drop zone gives you a discrete date axis (dates
are displayed as individual categories).
• Date (Time Series) - Dragging a Date (Time Series) field to the X-axis or Y-axis drop zone gives
you a quantitative (continuous) date axis (dates are displayed as a range of values).
Chapter 11: Build a Cross-Tab Viz
Cross-tab visualizations allow you to view the data in the lens in a tabular format. Each cross-tab viz appears
inside a panel (or box) in the workspace area of a vizboard page.
Topics:
• FAQs—Cross-Tab Visualizations
• Enable Cross-Tab Totals
FAQs—Cross-Tab Visualizations
This topic answers some frequently asked questions about cross-tab visualizations.
What is a cross-tab viz?
A Cross-Tab viz is a viz type that displays the data in the builder drop zones in a tabular, spreadsheet
format.
Why would I want to create a cross-tab viz instead of a chart viz?
You might want to create a cross-tab viz if you want to see the exact values behind the marks in a chart viz.
How is a cross-tab viz similar to and different than a chart viz?
A Cross-Tab viz is very similar to a Chart viz, with a couple of exceptions. The workspace is the same,
except that the drop zones are called Columns and Rows instead of X-Axis and Y-Axis, respectively. All
other functionality is the same, such as filtering, sorting, sharing, and label controls.
How are the fields laid out in the table?
In a cross-tab viz, measure values are always shown in columns, regardless of where the measure fields
are placed in the Builder panel drop zones. Dimension fields placed in the Rows drop zone will show
as rows. However, dimensions placed in other drop zones such as Details, Color or Shape will subdivide the measure columns.
Can I switch from Cross-Tab to Chart?
Yes. All fields remain in their original drop zones.
I can't see all my data, where is it?
The size of a cross-tab viz doesn't change when more fields are added to a drop zone. If the viz has
more columns or rows than fit in its panel, you can either make the viz bigger or scroll within the viz to
see more data.
Enable Cross-Tab Totals
A Cross-Tab viz can be enabled to show totals for the data in columns and rows. For example, if each
row shows the sum of orders, then turning on totals for rows creates a column showing the sum of all
orders in each row.
Totals can be enabled separately for rows and columns. When totals are enabled for both rows and
columns, the intersection of these totals shows the grand total.
When totals are enabled for columns, Platfora adds totals for each column in the cross-tab, including all
fields in the Marks drop zones and measures in any drop zone.
The value of each total is calculated from the original values in the dataset records (rows) that contribute
to the values displayed in the row or column. That is, the total value is not an aggregate of the aggregate
values. For some aggregate functions (sum, count, maximum, and minimum), you can easily verify the
total using the values displayed in the row or column. For example, the total of sums is the addition of
the values displayed in the row or column. And for maximums, the total is the highest value displayed in
the row or column.
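Why totals are computed from the original records, rather than by aggregating the displayed aggregates, is easiest to see with an average. The Python sketch below uses hypothetical order records and is only an illustration of the arithmetic, not Platfora's code. For sums the two approaches agree, but averaging the row averages gives a different (wrong) total.

```python
from statistics import mean

# Hypothetical dataset records: (region, order amount).
orders = [("East", 10), ("East", 20), ("East", 30), ("West", 100)]

by_region = {}
for region, amount in orders:
    by_region.setdefault(region, []).append(amount)

row_sums = {region: sum(a) for region, a in by_region.items()}   # East 60, West 100
row_avgs = {region: mean(a) for region, a in by_region.items()}  # East 20, West 100

# Totals computed from the original records.
total_sum = sum(amount for _, amount in orders)   # 160
total_avg = mean(amount for _, amount in orders)  # 40

# For SUM, adding the displayed values gives the same answer...
assert total_sum == sum(row_sums.values())
# ...but for AVG, averaging the displayed averages does not.
print(total_avg, mean(row_avgs.values()))  # 40 60
```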
When the viz contains multiple date or time dimensions, Platfora only shows totals for the most granular
date or time. For example, if the viz contains month and quarter data, the totals are shown for months.
Note that Platfora cannot show totals for a column or row under the following circumstances:
• The column or row contains a measure that is configured to show percent of total.
• The column or row contains a field that uses an expression containing the ROLLUP function.
To show totals for column data, choose Show the total per column from the Columns pull-out
menu. To show totals for row data, choose Show the total per row from the Rows pull-out menu.
Chapter 12: Build a Polar Chart Viz
Polar chart visualizations allow you to do exploratory analysis of aggregate data on a chart using polar
coordinates.
Topics:
• FAQs—Polar Chart Visualizations
• About the Polar Chart Viz Workspace
FAQs—Polar Chart Visualizations
A polar chart is a circular chart that shows information as polar coordinates. This topic answers some
frequently asked questions about polar chart visualizations.
What is a polar chart?
A polar chart is a circular chart that uses values and angles to show information as polar coordinates.
You can perform the same functions on a Polar Chart viz as a Chart viz, such as sort, filter, pan and
zoom, and export.
In the polar coordinate system, a point on a plane is determined by a distance (the radius) from a fixed
point (the pole) and an angle from a fixed direction (polar angle).
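The relationship between a polar coordinate and an ordinary (x, y) position can be sketched in Python. This uses the standard mathematical convention (angle measured counterclockwise from the positive x direction, pole at the origin), which is an assumption of the sketch; the polar chart's own sort order, described in the FAQ on sorting, instead runs clockwise from the top, which corresponds to a math angle of 90 degrees minus the clockwise angle.

```python
import math


def polar_to_cartesian(radius, angle_degrees):
    """Convert a polar coordinate (radius, polar angle) to (x, y).

    Uses the mathematical convention: the angle is measured counterclockwise
    from the positive x direction, with the pole at the origin (0, 0).
    """
    theta = math.radians(angle_degrees)
    return radius * math.cos(theta), radius * math.sin(theta)


x, y = polar_to_cartesian(2.0, 90.0)
print(round(x, 6), round(y, 6))  # 0.0 2.0
```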
In Platfora, a polar chart uses the Y-axis drop zone for the radius and the Angle drop zone for the
polar angle. Additionally, Platfora uses the Size drop zone to determine the size of each mark starting
from the perimeter going toward the center.
How does a polar chart display data?
Like a (rectangular) Chart viz, a Polar Chart viz displays data points as marks (but using polar
coordinates). Currently, Platfora supports one mark type for polar charts, Bar. This allows you to create
two different kinds of polar chart visualizations: pie charts and donut charts. (Yum!)
What is a pie chart?
A pie chart is a type of polar chart that is a circle divided into sectors, with each sector illustrating a
percentage of the total. The angle of each sector is proportional to the percentage it represents. For
example, an angle of 90 degrees represents 25%.
Pie charts are similar to bar charts that are configured to show Percent of Total on a measure axis.
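The sector arithmetic can be sketched in Python: each sector's sweep is its share of the total multiplied by 360 degrees, so 25% of the total sweeps 90 degrees. This is an illustration with hypothetical labels and values, not Platfora's rendering code. It assumes angles are measured clockwise from the top of the circle, following the sort-order behavior described in the polar chart FAQ, and it skips zero or negative values (which a polar chart cannot display) while counting them.

```python
def pie_sectors(values):
    """Compute (label, start, end) angles in degrees for a pie chart.

    `values` is a list of (label, measure) pairs already in sort order.
    Angles are measured clockwise from the top of the circle. Zero or
    negative values cannot be drawn, so they are skipped and counted.
    """
    drawable = [(label, v) for label, v in values if v > 0]
    skipped = len(values) - len(drawable)
    total = sum(v for _, v in drawable)
    sectors, start = [], 0.0
    for label, v in drawable:
        sweep = 360.0 * v / total  # angle is proportional to the share of the total
        sectors.append((label, start, start + sweep))
        start += sweep
    return sectors, skipped


sectors, skipped = pie_sectors([("Web", 25), ("Phone", 75), ("Mail", 0)])
print(sectors)  # [('Web', 0.0, 90.0), ('Phone', 90.0, 360.0)]
print(skipped)  # 1
```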
What is a donut chart?
A donut chart is similar to a pie chart, except it has a blank center (hole). The blank center is due to the
Size of the mark not reaching all the way to the center of the circle. The size of a mark is its length
measured from the perimeter going toward the center of the chart.
By default, the size of each mark is the same. However, donut charts can show additional measure data
by changing the size of each mark to reflect the measure value.
Why would I want to represent data as a pie or donut chart?
Pie and donut charts are useful for showing relative sizes at a glance. However, the human eye cannot
easily distinguish between angles that have similar sizes, especially if there are many very small sectors
(marks). Platfora recommends using a pie or donut chart under the following circumstances:
• You only have one (pie chart) or two (donut chart) measures to display in the viz.
• All values to display in the viz are positive (no zero or negative values).
• The dimension values to display represent part of a whole.
• The number of dimension values to display is low, such as fewer than 10. You can sort and filter a
dimension field to reduce the values displayed.
How do I create a donut chart?
Create a new viz, choosing Polar Chart as the viz type. Then drag a dimension field into the Color
drop zone. When no measure is placed in the Angle drop zone, Platfora uses the default measure for the
Angle drop zone. Optionally, you can place a different measure field into the Angle drop zone.
How do I create a pie chart?
Create a donut chart using the instructions above, and then edit the Size drop zone setting to maximize
the size. The default size is 0.00x, so change it to 1.00x.
Can I convert between a Chart and Polar Chart viz?
Yes, you can convert between Chart and Polar Chart viz types. When going from chart to polar
chart, any field in the Y-Axis drop zone is moved to the Details drop zone. However, when going
from polar chart to chart, all fields remain in their original drop zones. This is also true if you change
from chart to polar chart and then click Undo to return to chart.
What happens when the measure field in the Angle drop zone contains
negative or zero (0) values?
Polar chart visualizations cannot display negative or zero values in the Angle drop zone, so when a
measure field contains those values, Platfora displays a warning icon in the upper right corner of the viz.
Click the warning icon to open a dialog explaining how many values do and do not appear in the viz.
When I sort data in a polar chart viz, where does Platfora start the ordered list
of marks?
The first mark in a sorted list starts at the Y-axis, which is the top half of a vertical line going through
the center of the circle. The next mark in the sorted list is placed next to the first in a clockwise rotation.
About the Polar Chart Viz Workspace
The Polar Chart workspace is similar to the Chart workspace, but with slightly different drop zones
in the Builder panel. When you edit a polar chart viz within a vizboard, the Builder panel contains
the tools you need to build the visualization. The control panels on the left-hand side of the workspace
are used to select lens data and design the visual representation of the data. The control panels on the
right-hand side of the viz workspace show the data filters and appearance-encoding legends that are
active in the selected visualization.
Polar Chart visualizations use a circular layout of length and angle. Fields added to Angle divide the
circle into sectors. Currently, Platfora does not support the Y-axis drop zone (equivalent to length) in
polar charts. Polar chart visualizations have no axes and therefore have no headers that display the field
names. But they do have labels to display individual field values.
A mark is the visual representation of a measure value calculated for a group of input records or rows.
A group consists of records that share the same value for the dimension(s) used in the visualization.
Hovering over an individual mark will show the data details in a tooltip.
1. Choose viz type
2. Angle drop zone
3. Y-axis drop zone (currently unavailable)
4. Mark type (currently Bar only)
5. Mark appearance options
6. Mark appearance controls
7. Mark
8. Tooltip
Chapter 13: Build a Geo Map Viz
Geo map visualizations allow you to do geographical analysis using an aggregate lens. A geo map viz is similar
to a scatterplot viz on a map background.
Topics:
• FAQs—Geo Map Visualizations
• About the Geo Map Viz Workspace
FAQs—Geo Map Visualizations
This topic answers some frequently asked questions about geo map visualizations.
What is a geo map viz?
A Geo Map is a viz type that allows analysts to perform geographic analysis on a lens that contains
location data. It includes the Geography drop zone that places marks representing positions (using a
location field) on a map background.
A geo map viz is similar to a scatterplot chart viz, except that the marks are displayed on a map
background and both coordinates (latitude and longitude) come from the same location field. You can
configure the mark appearance drop zones like a Chart viz.
Who can create a geo map viz?
To create a geo map viz, a user must have the Analyst (Limited) role or higher and also have data
access permission on a dataset that contains geo-encoded data.
How can I create a geo map viz?
When creating a viz, choose Geo Map as the viz type. Platfora lists all datasets that have a location
field and a built lens. In the viz, place a location field in the Geography drop zone. Configure all
other viz settings as desired, as you would for a Chart viz.
What is a location field?
A location field is a dataset field encoded with a complex datatype that includes geo coordinate
information (latitude and longitude) and a label that associates a location name with the coordinates.
Depending on how the location field was defined in the dataset, the label value may come from another
field (visible or hidden) in the dataset or it may be a unique string that Platfora generates from the
coordinate values (for example @(122.33063°W, 37.541886°N)).
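A generated label of the kind shown above can be sketched in Python. The function name `coordinate_label` is hypothetical, and the format (longitude first, then latitude, each with a hemisphere letter) is only inferred from the single example in this guide; it is not a documented Platfora function.

```python
def coordinate_label(latitude, longitude):
    """Build a hypothetical '@(lon°W, lat°N)' label from raw coordinates.

    The format is inferred from one example in this guide; it is not a
    documented Platfora function.
    """
    lon_hemisphere = "W" if longitude < 0 else "E"
    lat_hemisphere = "S" if latitude < 0 else "N"
    return f"@({abs(longitude)}°{lon_hemisphere}, {abs(latitude)}°{lat_hemisphere})"


print(coordinate_label(37.541886, -122.33063))  # @(122.33063°W, 37.541886°N)
```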
Only location fields can be placed in the Geography drop zone. When placed there, the coordinates
of each data value are placed on a map using the Point mark type. When you hover over a mark, the
tooltip displays all data associated with that mark value: the label name, latitude, and longitude.
Can I filter on a location field?
Yes. Filtering on a location field filters on the location name (label) assigned to each value as a
categorical dimension. For example, if you have a location field for zip code data and the location name
for each value is the 5-digit zip code, you can filter on values such as 94402 and 94111, but not on the
latitude and longitude coordinates.
Can I use a location field in other drop zones, even if it's in a different type of
viz?
Yes. When a location field is in a different drop zone in any type of viz, Platfora uses the location name
value in that drop zone, not the coordinate values. For example, if you place the State Location field into
the Labels drop zone, Platfora displays California as the mark label for a location inside the state of
California.
Why does my map have points off the western coast of Africa at latitude 0 and
longitude 0?
This is the default geo coordinate position in Platfora. If you see this value in your location field, it
usually means the latitude and longitude coordinates were missing (or NULL) for a record. NULL
values can result from missing data values in the raw data, or can result from a dataset row that was
not able to join to its corresponding referenced dataset row. In either case, Platfora substitutes the 0,0
coordinates for the missing location data.
Alternatively, it might mean someone was sailing off the coast of Africa at this exact location when this
event record was generated.
How does Platfora render maps?
Platfora uses Google Maps to render geo map visualizations. Your system administrator needs to
configure Platfora to use the Google Maps service in order to create a geo map viz.
When I export my map viz to a PNG or PDF file, the map background looks
different, why is that?
Platfora has modified the visual style of Google Maps in the viz as shown in the web browser. This is to
make it easier to view and distinguish marks on the map background. For example, Platfora has reduced
the text information on the map to reduce interference with mark points and mark labels. However, due to
how Google Maps works, some aspects of this modified visual style are lost when exporting to a PNG
or PDF file. Platfora has tried to reduce the difference in map style between what you see in the web
browser and what you see in an exported file as much as possible.
How do I zoom in and out on the map?
Platfora uses its own pan and zoom functionality to zoom in and out of the map and to move the map in
the viz window. The Google Maps zoom functionality has been disabled.
I can't create a geo map viz because I get an error saying "Google Maps
service unavailable. Make sure Platfora is configured with a valid Google Maps
Client ID." What do I do?
To create a geo map viz, Platfora must be configured to use the Google Maps service. If you see this
message in a vizboard, contact your system administrator.
Can I convert between a Chart and Geo Map viz?
Yes, you can convert between Chart and Geo Map viz types. When going from chart to geo map, any
field in the Y-Axis drop zone is moved to the Details drop zone, and the topmost field in the X-Axis
drop zone is moved to the Geography drop zone. If there are multiple fields in the X-Axis drop zone,
all fields below the top field are moved to Details.
When going from geo map to chart, the field in Geography is moved to X-Axis, and all other fields
remain in their original drop zones. This is also true if you change from chart to geo map and then click
Undo to return to chart.
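The chart-to-geo-map field moves described above can be modeled as a small Python function. This is a sketch, not Platfora's code: drop zones are represented as a dictionary of field lists (top field first), the field names are hypothetical, and the order in which moved fields are appended to Details is an assumption, since the guide does not specify it.

```python
def chart_to_geo_map(zones):
    """Rearrange drop zones when a Chart viz is converted to a Geo Map viz.

    `zones` maps a drop-zone name to its list of fields, top field first.
    The input is not modified.
    """
    result = {name: list(fields) for name, fields in zones.items()}
    x_fields = result.pop("X-Axis", [])
    y_fields = result.pop("Y-Axis", [])
    details = result.setdefault("Details", [])
    details.extend(y_fields)                # every Y-Axis field goes to Details
    if x_fields:
        result["Geography"] = x_fields[:1]  # topmost X-Axis field
        details.extend(x_fields[1:])        # remaining X-Axis fields (order assumed)
    return result


chart = {"X-Axis": ["Store Location", "Region"], "Y-Axis": ["Revenue"], "Color": ["Product Line"]}
print(chart_to_geo_map(chart))
# {'Color': ['Product Line'], 'Details': ['Revenue', 'Region'], 'Geography': ['Store Location']}
```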
What happens when the location field in the Geography drop zone contains
invalid geo-coordinate values?
Geo map visualizations can only display valid latitude and longitude values, so when a location field
contains invalid values, Platfora displays a warning icon in the upper right corner of the viz. Click the
warning icon to open a dialog explaining how many values do and do not appear in the viz.
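A check like the one behind this warning can be sketched in Python. The guide does not define "invalid"; this sketch assumes it means a missing value or a value outside the standard ranges (latitude from -90 to 90, longitude from -180 to 180), and the sample points are hypothetical. Note that (0, 0) passes this check, which is why records whose missing coordinates were replaced with 0,0 still appear on the map.

```python
def split_by_validity(points):
    """Separate (latitude, longitude) pairs into plottable and invalid ones.

    Assumption: "valid" means both values are present, latitude is within
    [-90, 90], and longitude is within [-180, 180].
    """
    valid, invalid = [], []
    for lat, lon in points:
        ok = (lat is not None and lon is not None
              and -90 <= lat <= 90 and -180 <= lon <= 180)
        (valid if ok else invalid).append((lat, lon))
    return valid, invalid


points = [(37.54, -122.33), (91.0, 10.0), (None, 5.0), (0.0, 0.0)]
valid, invalid = split_by_validity(points)
print(f"{len(valid)} shown, {len(invalid)} not shown")  # 2 shown, 2 not shown
```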
Why can't I see my data points near the poles?
Some values are outside of the mappable area in Google Maps, either too far south or north. Google
Maps uses a variant of the Mercator projection and therefore cannot accurately display points near the
poles. Instead, it limits the mappable area to latitude values from approximately -85 degrees to +85
degrees. Unfortunately, this limits some of the polar bears, penguins, and earthquakes that can be placed
on a map.
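The "approximately 85 degrees" limit can be derived. Assuming Google Maps uses the standard square Web Mercator world map, latitude is projected to y = ln(tan(π/4 + φ/2)) and the map is cut off at y = ±π; inverting that gives the maximum latitude atan(sinh(π)). The Python sketch below computes it and is an illustration of the projection math, not a Google Maps or Platfora API.

```python
import math

# Web Mercator maps latitude phi (radians) to y = ln(tan(pi/4 + phi/2)).
# The world map is cut off at y = +/- pi so that it forms a square;
# inverting the projection at y = pi gives the latitude limit.
MAX_LATITUDE = math.degrees(math.atan(math.sinh(math.pi)))


def is_mappable(latitude):
    """True if the latitude falls inside the square Web Mercator world map."""
    return -MAX_LATITUDE <= latitude <= MAX_LATITUDE


print(round(MAX_LATITUDE, 4))  # 85.0511
print(is_mappable(-77.85))     # True
print(is_mappable(89.9))       # False
```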
About the Geo Map Viz Workspace
The Geo Map workspace is similar to the Chart workspace, but with slightly different drop zones in
the Builder panel. When you edit a geo map visualization, the Builder panels contain the tools you
need to build the visualization. The control panels on the left-hand side of the workspace are used to
select lens data and design the visual representation of the data. The control panels on the right-hand
side of the viz workspace show the data filters and appearance-encoding legends that are active in the
selected visualization.
Geo map visualizations display location data on a map background. Geo-encoded fields added to
Geography appear as points on the map. On a geo map viz, a mark is the visual representation of a
position with geographical coordinates (latitude and longitude). All marks on a geo map viz are Point
marks. Hovering over an individual mark shows the data details in a tooltip.
1. Choose viz type
2. Geography drop zone
3. Mark type (Points only)
4. Mark appearance options
5. Mark appearance controls
6. Mark
7. Tooltip
8. Location field
Chapter 14: Build a Funnel Viz
Funnel analysis visualizations allow you to track user behavior across one or more fact datasets defined in an
event series lens. The analysis is performed on individual events instead of aggregated data in order to find
sequential patterns in behavior.
Topics:
• FAQs—Funnel Analysis Visualizations
• About Event Series Analysis
• About the Funnel Analysis Viz Workspace
• Define and Analyze Funnel Stages
• Analyze Funnel Stages Across Dimensions
FAQs—Funnel Analysis Visualizations
A funnel is a visualization type that tracks users' behavior across a sequence of events. This topic
answers some frequently asked questions about funnel visualizations.
What is a funnel?
A funnel is a visual analysis type that tracks users' behavior across a sequence of events. Each step in
the sequence is defined as a stage. Each funnel stage shows progressively decreasing proportions of the
original set of users. The first stage has 100% of the original group of users by definition.
For example, a funnel can be used to track the pathways users take through a website, such as visiting a
page, then viewing a video, and then registering. The first stage would be defined as all users who visit
that page, the second stage would be defined as the users from the first stage who then viewed the video,
and the third stage would be defined as the users from the second stage who then registered with the
website.
The Platfora documentation uses the term users in a generic sense. Users can
consist of any type of dimension, such as customers, sessions, devices, players,
etc.
Why would I want to create a funnel?
Use funnels to look for patterns in behaviors among users of a particular group. For example, you
might define a funnel for a particular sequence of events, and see how users in different segments
compare to the total and to each other.
You can also use funnels to understand at which stages the most drop-off occurs in a multi-step
conversion process (conversion rate).
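Given the number of users who reach each stage, the conversion and drop-off figures can be computed as in the Python sketch below. The stage counts are hypothetical and the function is an illustration, not a Platfora API. Each row reports the share of the original group (the first stage is 100% by definition) and the share of the previous stage; the lowest step conversion marks the stage with the largest relative drop-off.

```python
def funnel_summary(stage_counts):
    """Summarize a funnel from the number of users reaching each stage.

    Returns (stage, users, % of first stage, % of previous stage) rows.
    Assumes the first stage has at least one user.
    """
    first = stage_counts[0]
    rows, previous = [], first
    for stage, users in enumerate(stage_counts, start=1):
        pct_of_first = 100.0 * users / first
        pct_of_previous = 100.0 * users / previous if previous else 0.0
        rows.append((stage, users, pct_of_first, pct_of_previous))
        previous = users
    return rows


rows = funnel_summary([1000, 400, 100])  # hypothetical stage counts
for row in rows:
    print(row)
# (1, 1000, 100.0, 100.0)
# (2, 400, 40.0, 40.0)
# (3, 100, 10.0, 25.0)

worst = min(rows[1:], key=lambda r: r[3])
print("largest relative drop-off at stage", worst[0])  # largest relative drop-off at stage 3
```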
Who can create a funnel viz?
Funnels can be created by Platfora users who have the Analyst system role (or above), provided they
also have access to the underlying source data in Hadoop.
How are funnel visualizations created?
Funnels are based on event series lenses. When adding a new viz, you first choose Funnel as the viz
type. When funnel is selected, Platfora lists the datasets that have an event series lens defined. For more
context about event series lens, see About Event Series Analysis.
In a funnel viz, you create stages and then define filters for each stage. The order of stages is important.
By default, the funnel viz counts all users that meet the criteria for each stage. However, you can analyze
the flow through each stage for different sub-groups of users. You do this after defining the funnel stages
by dragging one or more dimension fields into the Rows drop zone.
How is each stage defined?
Each stage is based on an event dataset used in the lens with one or more stage conditions applied. Each
stage can use a different event dataset, allowing analysts to define flows from different sources of event
data related to the same users. For example, you can define one stage based on website clicks and the
next stage based on customer service phone calls.
A member of a stage is a user that has a record in the event dataset, meets all conditions defined in
the stage, and whose timestamp is greater than the timestamp of its record in the previous stage. For
example, if stage one is defined as users who clicked home.html, and stage two is defined as users who
clicked checkout.html, then any user who clicked checkout.html after clicking home.html is a member of
both stage one and two, regardless of any website clicks in between those events.
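The stage membership rule can be sketched in Python. This is an illustration of the rule as described above, not Platfora's event series engine: events are hypothetical (user, timestamp, attributes) tuples, and each stage condition is a plain predicate. Events are partitioned by user and ordered by timestamp; a user enters each stage with the earliest matching event that is later than the event that placed the user in the previous stage. Taking the earliest match keeps the most room for later stages to match.

```python
from collections import defaultdict


def funnel_counts(events, stages):
    """Count the users who reach each funnel stage, in order.

    events: iterable of (user, timestamp, attributes) tuples, possibly drawn
            from several event datasets.
    stages: list of predicates, one per stage, applied to `attributes`.
    """
    by_user = defaultdict(list)
    for user, timestamp, attributes in events:
        by_user[user].append((timestamp, attributes))

    counts = [0] * len(stages)
    for user_events in by_user.values():
        user_events.sort(key=lambda event: event[0])  # order by timestamp only
        last_time = None
        for stage_index, matches in enumerate(stages):
            hit = next((t for t, attrs in user_events
                        if matches(attrs) and (last_time is None or t > last_time)), None)
            if hit is None:
                break  # the user drops out of the funnel here
            counts[stage_index] += 1
            last_time = hit
    return counts


events = [
    ("u1", 1, {"page": "home.html"}),
    ("u1", 2, {"page": "about.html"}),     # clicks in between do not matter
    ("u1", 3, {"page": "checkout.html"}),
    ("u2", 5, {"page": "checkout.html"}),  # before home.html, so it does not count
    ("u2", 6, {"page": "home.html"}),
]
stages = [
    lambda attrs: attrs.get("page") == "home.html",
    lambda attrs: attrs.get("page") == "checkout.html",
]
print(funnel_counts(events, stages))  # [2, 1]
```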
About Event Series Analysis
There are two ways to perform event series analysis in Platfora. This topic explains how to use an event
series lens in a funnel visualization.
Event series analysis in Platfora involves partitioning events by some entity (such as a user), ordering
event records by their timestamp, and then looking for interesting patterns of behavior (events). Platfora
uses the Funnel viz type to perform event series analysis on a lens type that was created for this type of
analysis.
An event series lens (ESL) is a special type of lens created from datasets that were modeled specifically
for event series analysis. Data administrators model one or more event datasets around a common entity
dataset.
Suppose a company has data sources for web page visits by users, email campaigns sent to users, and
calls made by users to the company call center. As long as all events have a common entity (the user),
they can be combined into a single event series lens and analyzed together in Platfora. This might result
in the following datasets:
Once the data is modeled in this way, then data administrators can create event series lenses that include
event records from multiple datasets. In a vizboard, analysts can then use an event series lens to do
funnel analysis. Funnels are currently the only viz type available on event series lens data.
The purpose of a funnel viz is to track users' behavior across a sequence of events, with each step in
the sequence defined as a stage. The behaviors for the users come from the various event datasets. For
example, using the dataset model above, you could define one stage in the funnel from the Click Event
dataset, and the next stage from the Email Event dataset.
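The following minimal sketch shows how records from several event datasets can be combined into one time-ordered series per user. It assumes each dataset is a list of (user, timestamp, detail) tuples. The Call Event dataset name and the build_event_series function are hypothetical illustrations, not Platfora's internal model.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical record shape shared by every event dataset: (user_id, timestamp, detail).
Record = Tuple[str, datetime, str]
SeriesEntry = Tuple[datetime, str, str]  # (timestamp, dataset name, detail)


def build_event_series(datasets: Dict[str, List[Record]]) -> Dict[str, List[SeriesEntry]]:
    """Partition events from several datasets by the common entity, then order by time."""
    series: Dict[str, List[SeriesEntry]] = defaultdict(list)
    for name, records in datasets.items():
        for user, ts, detail in records:
            series[user].append((ts, name, detail))
    for entries in series.values():
        entries.sort()  # tuples compare by timestamp first
    return dict(series)


datasets = {
    "Click Event": [("u1", datetime(2015, 6, 1, 10, 0), "home.html"),
                    ("u2", datetime(2015, 6, 1, 9, 0), "home.html")],
    "Email Event": [("u1", datetime(2015, 6, 1, 8, 0), "summer campaign")],
    "Call Event": [("u1", datetime(2015, 6, 1, 12, 0), "billing question")],
}
series = build_event_series(datasets)
print([name for _, name, _ in series["u1"]])
# ['Email Event', 'Click Event', 'Call Event']
```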
About the Funnel Analysis Viz Workspace
When you create and edit a funnel analysis visualization, the builder panels contain tools specifically
needed to build a funnel viz. Use the panels on the left-hand side to select an event series lens, define
the funnel stages, and analyze the funnels for different users. The panels on the right-hand side of the viz
workspace show the data filters and comments applied to the selected viz.
Funnel visualizations display the number of users in each stage in order. A stage is a single phase (step)
in the process that makes up the funnel.
The builder panels on the left-hand side of the workspace are organized into two tabs.
Stages Tab
Define the stages and their conditions on the Stages tab. The stage builder includes the controls to
define stages.
1. Toggle builder panels on/off
2. Stages Builder Panel
3. Stage Definition
4. Stage Event
5. Stage Condition
6. Edit Lens Button
7. Add Menu Button
8. Viz Menu Toolbar
9. Funnel for one distinct user
10. Funnel for all users
11. Filter Controls
Analysis Tab
Configure the viz to display funnels for different users on the Analysis tab. The analysis builder panel
has a Rows drop zone into which you drag dimension fields from the dimension dataset.
1. Analysis Tab
2. Lens Panel, listing dimension fields
3. Analysis Builder Panel
4. Field Controls
5. Count Column, listing distinct user count
6. % of Total Column, listing conversion rate from the first stage
7. % of Previous Column, listing the conversion rate from the previous stage
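The two percentage columns are simple ratios of the stage counts. The sketch below assumes that % of Total divides each stage's count by the first stage's count and that % of Previous divides it by the preceding stage's count, so the first stage shows 100% in both columns. The function name conversion_columns is hypothetical.

```python
from typing import List, Tuple


def conversion_columns(counts: List[int]) -> List[Tuple[int, float, float]]:
    """Return (Count, % of Total, % of Previous) for each stage, in stage order."""
    rows = []
    first = counts[0] if counts else 0
    for i, count in enumerate(counts):
        prev = counts[i - 1] if i > 0 else count  # the first stage is compared with itself
        pct_total = 100.0 * count / first if first else 0.0
        pct_prev = 100.0 * count / prev if prev else 0.0
        rows.append((count, pct_total, pct_prev))
    return rows


for row in conversion_columns([1000, 400, 100]):
    print(row)
# (1000, 100.0, 100.0)
# (400, 40.0, 40.0)
# (100, 10.0, 25.0)
```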
Define and Analyze Funnel Stages
To create a funnel analysis visualization, you first choose an event series lens and then define the stages
in the funnel. Define funnel stages in the stage builder of a funnel viz.
1. Enter a name for the stage.
2. Choose an event defined in the lens.
3. (Optional) Define a condition for the stage by first choosing a field and then the condition that the
users must meet to reach this stage.
Platfora lists all datetime and dimension fields in the event and dimension dataset. The event or
reference name appears after the field name. This is useful when different datasets have fields with
the same name.
4. (Optional) Click the add condition icon to add another stage condition.
5. (Optional) Click Create New Stage to add another stage to the funnel. Or, click the duplicate icon to
duplicate a stage and then edit the copy.
You can change the order of a stage in the funnel using the move up and move down icons to move a stage up or
down.
Analyze Funnel Stages Across Dimensions
Funnel visualizations show the total number of users who reach each stage. You can also analyze
funnels by comparing funnels across dimension fields.
1. Compare funnels across dimensions on the Analysis tab. Users are grouped and filtered by the
dimensions in the Rows drop zone. Platfora lists all dimension fields in the dimension dataset.
2. You can compare each dimension against the values of the entire population using the Enable
Baseline option.
3. The vertical red lines are the baseline indicators.
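One way to read the baseline comparison is to compute each group's stage rates and subtract the rates for the entire population. The sketch below makes several assumptions. Each user is represented as a (dimension value, number of stages reached) pair, the rates are relative to the first stage, and the comparison is a difference in percentage points. Platfora's exact baseline calculation is not described here, and compare_to_baseline is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def stage_rates(stages_reached: List[int], n_stages: int) -> List[float]:
    """% of first-stage users who reach each stage, given how many stages each user reached."""
    counts = [sum(1 for r in stages_reached if r > i) for i in range(n_stages)]
    first = counts[0]
    return [100.0 * c / first if first else 0.0 for c in counts]


def compare_to_baseline(users: List[Tuple[str, int]], n_stages: int):
    """users: (dimension value, stages reached) per user. Returns the whole-population
    baseline and, per dimension value, the difference from it in percentage points."""
    baseline = stage_rates([reached for _, reached in users], n_stages)
    groups: Dict[str, List[int]] = defaultdict(list)
    for value, reached in users:
        groups[value].append(reached)
    diffs = {}
    for value, reached in sorted(groups.items()):
        rates = stage_rates(reached, n_stages)
        diffs[value] = [r - b for r, b in zip(rates, baseline)]
    return baseline, diffs


users = [("US", 2), ("US", 1), ("CA", 2), ("CA", 2)]
baseline, diffs = compare_to_baseline(users, n_stages=2)
print(baseline)  # [100.0, 75.0]
print(diffs)     # {'CA': [0.0, 25.0], 'US': [0.0, -25.0]}
```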
Chapter 15: Explore Marks in a Viz
The Platfora vizboard has a number of tools for exploring individual marks (data points) or sets of marks in a
viz. You can select or highlight marks, hover over a mark to see its data details, or pan and zoom to focus on a
particular area of marks in a viz.
Topics:
• Select and Highlight Marks on a Viz
• Understand Data Values Not Displayed in Viz
• View the Data Values for a Mark
• Zoom and Pan in a Viz
• Drill Down Through Dimension Fields
Select and Highlight Marks on a Viz
While working in a visualization, you can isolate particular marks on the visualization by selecting or
highlighting them. The selection applies to all marks on a vizboard page that share the same dimension
group. For example, if you have two visualizations on a page that were created using the same lens data,
highlighting a mark in one will also highlight related marks in other visualizations on the same page.
To select marks on a visualization, you can:
1. Use the selection tool to select an individual mark in the visualization.
Press the CTRL key (on Windows) or Command key (on Mac) and click an individual mark to add
it to the selection.
2. Click a dimension member in the Legends panel.
3. Click above a mark in a visualization and drag across and/or down to select an area of multiple
marks.
To deselect marks on a visualization:
• Press the ESC key.
• Click any whitespace in the visualization.
• Press the CTRL key (Windows) or Command key (Mac) and click an individual mark to remove it
from the selection.
Understand Data Values Not Displayed in Viz
Some visualizations are unable to display all data values as marks. When this occurs, Platfora displays a
warning informing you that some data values are not shown.
Platfora does not display a value as a viz mark under the following circumstances:
• Word Cloud Viz — The number of values exceeds the configured maximum to display. By default,
a word cloud viz displays a maximum number of 1500 words. However, your system administrator
might change this value using the platfora.viz.word.limit configuration property. Also, the
viz size could be too small for the currently configured mark sizes. In this case, you can increase the
viz size, or edit the Size drop zone and decrease the maximum size to a value less than 1.00px.
• Packed Bubbles Viz — The number of values exceeds the configured maximum to display. By
default, a packed bubbles viz displays a maximum number of 40,000 marks. However, your system
administrator might change this value using the platfora.viz.bubble.limit configuration
property.
• Geo Map Viz — Some values are outside of the mappable area in Google Maps, either too
far south or north. Google Maps uses a variant of the Mercator projection and therefore cannot
accurately display points near the poles. Instead, it limits the mappable area to latitude values from
approximately -85 degrees to +85 degrees.
• Polar Chart Viz — Polar chart visualizations cannot display negative or zero values in the Angle
drop zone.
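The rules above can be summarized as a count of shown and hidden values per viz type, similar to what the warning dialog reports. This is a sketch under stated assumptions. The limits are the defaults given above (administrators can change them with platfora.viz.word.limit and platfora.viz.bubble.limit). The word cloud case ignores hiding caused by a viz that is too small. The exact Web Mercator latitude cutoff, atan(sinh(pi)) or about 85.0511 degrees, is a general property of that projection and is not a Platfora setting. The function split_shown_hidden is hypothetical.

```python
import math
from typing import Tuple

# Default limits described in this guide; an administrator may override them.
WORD_CLOUD_LIMIT = 1500
PACKED_BUBBLES_LIMIT = 40_000

# Web Mercator stops at the latitude where the projected map becomes square:
# atan(sinh(pi)), about 85.0511 degrees north or south.
MERCATOR_MAX_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def split_shown_hidden(viz_type: str, values: list) -> Tuple[int, int]:
    """Return (shown, hidden) value counts for a viz type."""
    if viz_type == "word_cloud":
        # Only the count limit is modeled; words hidden because the viz is too small are not.
        shown = min(len(values), WORD_CLOUD_LIMIT)
    elif viz_type == "packed_bubbles":
        shown = min(len(values), PACKED_BUBBLES_LIMIT)
    elif viz_type == "geo_map":
        # values are (latitude, longitude) pairs in degrees
        shown = sum(1 for lat, _lon in values if abs(lat) <= MERCATOR_MAX_LAT)
    elif viz_type == "polar":
        # values are Angle drop zone values; zero and negative values cannot be drawn
        shown = sum(1 for v in values if v > 0)
    else:
        shown = len(values)
    return shown, len(values) - shown


print(round(MERCATOR_MAX_LAT, 4))  # 85.0511
print(split_shown_hidden("geo_map", [(37.77, -122.42), (89.9, 0.0), (-86.0, 10.0)]))  # (1, 2)
print(split_shown_hidden("polar", [3, 0, -2, 5]))  # (2, 2)
```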
When it doesn't display all values, Platfora displays a warning icon in the upper right corner of the viz.
Click the warning icon to open a dialog explaining how many values do and do not appear in the
viz.
View the Data Values for a Mark
You can hover your mouse over any mark in a viz to see the data values that comprise that mark. The
measure value(s) for the dimension group represented by the mark are shown in a tooltip.
Zoom and Pan in a Viz
A visualization (viz) fits into a fixed-size panel on a vizboard page, and is always rendered at 100%
scale. To explore different areas of a viz in more detail, you can use the viz pan and zoom controls. The
viz view size always re-adjusts to 100% when you change the viz definition, exit the vizboard, or switch
between pages.
Zooming In
To zoom in to a particular area of a viz, select the zoom control and click the area of the viz you
want to explore. Each click enlarges the viz by 100%.
To enlarge a viz by 100%, click the plus control.
Zooming Out
To zoom out:
• Select the zoom control, press the CTRL key (on Windows) or Command key (on Mac), and click
within the viz.
• Click the minus control.
• Click the reset control to reset the viz to 100% size.
Panning
While a viz is zoomed in to larger than 100%, you can use the pan control to drag a particular area
of the viz into the view.
Drill Down Through Dimension Fields
Platfora provides access to all of your data, allowing analysts to explore it interactively. Using
Platfora’s drill down capability, analysts can explore data in more detail by double-clicking
a dimension field value in a visualization.
About Drilling Down
When a lens has a dataset with a drill path defined in it, you can view measure data in more detail by
navigating (drilling down) through the hierarchy of fields defined in the drill path. Drilling down in a viz
allows analysts to easily explore the data and view it more granularly.
When a dimension field in a drill path is placed in a drop zone, you can drill down on a particular field
value to view the measure data with a more granular dimension. For example, when viewing Sales by
Quarter, you could drill down on Q1 to view Q1 Sales by Month.
You can continue to drill down further through the hierarchy defined in the drill path until you’re
viewing the most detailed field defined in the path.
Drilling down on a field is effectively the same as viewing a different field in the builder drop zone and
applying a filter to the viz.
For example, the lens in the viz below uses the built-in drill path named Time, which includes the
following fields: AM/PM, Hour by 6, Hour by 3, and Hour. AM/PM is placed in the X-axis drop zone.
When you drill down on PM along the x-axis of the viz, the following occurs:
• Hour by 6 replaces AM/PM in the X-axis drop zone.
• A drill filter is applied to the viz that filters on records that occur in the PM (between noon and
midnight).
1. A gray drill path icon in a Builder drop zone indicates you can drill down on this field.
2. Tooltip shows which field(s) you can drill down to.
3. A blue drill path icon in a Builder drop zone indicates this field was placed in the drop zone
because another field was drilled down on to reveal this field.
4. A drill filter is created when you drill down on a field. As you drill down further in the hierarchy,
additional field filters are added to the drill filter.
When a lens doesn't include a field in a drill path, that field is skipped when drilling
down. In this example, if the lens in the viz did not contain the Hour by 6 field,
then drilling down on PM would instead cause the Hour by 3 field to replace the AM/PM
field.
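The drill-down behavior described above, including skipping fields that are missing from the lens, can be modeled as follows. This is a minimal sketch that assumes a drill path is an ordered list of field names and a drill filter is a list of (field, value) pairs. DrillState, next_field, and drill_down are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class DrillState:
    current: str  # field shown in the drop zone
    drill_filter: List[Tuple[str, str]] = field(default_factory=list)  # (field, value) pairs


def next_field(path: List[str], lens_fields: Set[str], current: str) -> Optional[str]:
    """Next field after `current` in the drill path that the lens actually contains."""
    for candidate in path[path.index(current) + 1:]:
        if candidate in lens_fields:  # fields missing from the lens are skipped
            return candidate
    return None  # no downstream field in the lens: not drillable


def drill_down(state: DrillState, value: str, path: List[str],
               lens_fields: Set[str]) -> DrillState:
    nxt = next_field(path, lens_fields, state.current)
    if nxt is None:
        raise ValueError(f"cannot drill down from {state.current}")
    # Replace the field in the drop zone and add a field filter for the drilled value.
    return DrillState(nxt, state.drill_filter + [(state.current, value)])


TIME = ["AM/PM", "Hour by 6", "Hour by 3", "Hour"]
full_lens = set(TIME)
print(drill_down(DrillState("AM/PM"), "PM", TIME, full_lens))
# DrillState(current='Hour by 6', drill_filter=[('AM/PM', 'PM')])
print(drill_down(DrillState("AM/PM"), "PM", TIME, full_lens - {"Hour by 6"}).current)
# Hour by 3
```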
Drill Down FAQ
This topic answers some frequently asked questions about drilling down in a Platfora visualization.
What does it mean to drill down?
Drill down is a data analysis technique for navigating from the most summarized to the most detailed
categorization of a particular dimension field.
How can I drill down in a viz?
You can drill down in a viz by double-clicking a particular field value on an axis, a viz mark, or a crosstab cell.
What happens when I drill down on a field in a viz?
When you drill down on a field, Platfora places the next field in the drill path into that field's drop zone,
and it applies a filter to the viz. The drill filter restricts the viz to the field value you drilled down on.
For example, when you drill down on Year 2012, Platfora replaces the Year field in the drop zone with
the Quarter field, and it applies a filter to include only data from the year 2012.
When can I drill down on a field?
To drill down, two or more fields from a drill path must be included in the lens. To drill down on a
particular dimension, the lens must have at least one downstream field in the drill path. For example, if
a lens has fields A, B, and C from a drill path, and you place field A in a drop zone, then you can drill
down to field B and then to C. However, if you place field C in a drop zone, you cannot drill down.
How do I know if a field is drillable?
When a drillable field is placed in a drop zone, a gray drill path icon is displayed in the drop zone.
After you drill down on the field, the dimension that replaces it in the drop zone has a blue drill
path icon.
How do I know what field will be navigated to when I drill down?
When a field is in a Builder drop zone, you can choose View drill path from the field menu. Platfora
shows all drill paths that apply to this field. The current field is highlighted in bold. When multiple drill
paths are available, Platfora follows the drill path that comes first alphabetically.
Additionally, you can hover the cursor over a viz mark or cross-tab cell. The tooltip lists the fields that
will be navigated to.
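The drillability rule from "When can I drill down on a field?" and the alphabetical tie-break can be combined in one sketch. It assumes that only drill paths with at least one downstream field in the lens count as available, and that "alphabetically" means a case-insensitive comparison of path names. The path names Calendar and Fiscal, and the function names, are hypothetical.

```python
from typing import Dict, List, Optional, Set


def is_drillable(path: List[str], current: str, lens_fields: Set[str]) -> bool:
    """Drillable if the lens has at least one field downstream of `current` in this path."""
    if current not in path:
        return False
    downstream = path[path.index(current) + 1:]
    return any(f in lens_fields for f in downstream)


def choose_drill_path(paths: Dict[str, List[str]], current: str,
                      lens_fields: Set[str]) -> Optional[str]:
    """Name of the drill path to follow: the alphabetically first drillable one."""
    candidates = [name for name, path in paths.items()
                  if is_drillable(path, current, lens_fields)]
    # Case-insensitive ordering is an assumption about what "alphabetically" means here.
    return min(candidates, key=str.casefold) if candidates else None


paths = {
    "Fiscal": ["Year", "Fiscal Quarter"],      # hypothetical drill paths
    "Calendar": ["Year", "Quarter", "Month"],
}
lens = {"Year", "Quarter", "Month", "Fiscal Quarter"}
print(choose_drill_path(paths, "Year", lens))                        # Calendar
print(choose_drill_path(paths, "Year", {"Year", "Fiscal Quarter"}))  # Fiscal
print(choose_drill_path(paths, "Month", lens))                       # None
```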
Can I drill down on multiple fields concurrently?
Yes. You can drill down on one dimension at a time, or on all dimensions with a defined drill path
depending on how you drill down in the viz:
• One dimension – In a chart viz, double-click the value for that dimension on the
horizontal or vertical axis.
• All dimensions – Double-click an individual mark in a chart viz or a cell in a cross-tab viz.
Can I "drill up?"
Yes. You can remove a drill path filter from the Filters panel to “drill up.” When you continue to drill
down in a viz, a new field filter is added to the drill filter. You can remove the most recent drill filter to
move up in the drill path hierarchy one field at a time, or you can remove the entire drill filter to revert
to the highest level of the hierarchy.
Drill Down on a Field Value in a Chart Axis
Double-clicking on a field value for a drillable field on the axis of a Chart visualization drills down on
that value.
To drill down on a field value in a Chart axis, hover over the label of the value you want more detail
on, and then double-click. The tooltip displays the field that will be navigated to.
In this viz, the user drills down on the year 2012 on the X-axis and then the viz shows Quarter.
Drill Down on a Viz Mark
Double-clicking a single mark in a Chart visualization drills down on all drillable fields in the
Builder panel.
To drill down on a viz mark, hover over the viz mark you want more detail on, and then double-click.
The tooltip displays the fields that will be navigated to.
In this viz, two drillable fields, Year and State, are in the Builder panel. When the user drills down on the
mark shown, Year and State are drilled down to Quarter and Neighborhood, respectively.
Drill Down on a Cross-Tab Cell
Double-clicking a single cell in a Cross-Tab visualization drills down on all drillable fields in the
Builder panel.
To drill down on a cell, hover over the cell you want more detail on, and then double-click. The
tooltip displays the fields that will be navigated to.
In this viz, two drillable fields, Year and State, are in the Builder panel. When the user drills down on the
cell shown, Year and State are drilled down to Quarter and Neighborhood, respectively.
View a Drill Path in a Viz
When a drillable field is in a drop zone, you can view all fields in each drill path the field is a member
of. The current field is highlighted in bold so you can easily see where it is located in the path.
1. From the drillable field's menu in the Builder panel, select View drill path.
2. View the drill path(s) in the dialog that displays.
3. Click OK.
Drill Up
When you drill down on a field, filters are created on that visualization. You can "drill up" in a viz by
removing these filters.
When you drill down in a viz, a drill filter is created with a single field filter. When you continue to drill
down the hierarchy defined in the drill path, a new field filter is added to the drill filter. You can move
up in the drill path hierarchy (drill up) by removing either of these filters. The type of filter you remove
determines how far up the hierarchy you navigate.
Drill Up to the Highest Level in the Drill Path
Remove the entire drill filter to navigate to the highest level in the drill path hierarchy. Removing the
drill filter removes all field filters it contains.
In the Filters panel, hover over the gray drill path icon for the drill filter to remove. The drill path
icon changes to a blue delete icon. Click the delete icon to remove the entire drill filter.
Drill Up One Level in the Drill Path
Remove the most recent field filter in a drill filter to drill up one level on that drill path.
Click the gray delete icon at the bottom of a drill filter to remove the most recent field filter.
When the drill filter contains one field filter, removing the field filter is the same as
removing the entire drill filter.
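The two ways to drill up can be modeled by treating the drill filter as a stack of (field, value) pairs. This sketch assumes that removing a field filter returns the drop zone to the field that filter was created on. The function names are hypothetical. When the drill filter holds a single field filter, both functions return the same result, as the note above describes.

```python
from typing import List, Tuple

FieldFilter = Tuple[str, str]  # (field that was drilled down on, value drilled on)


def drill_up_one(drill_filter: List[FieldFilter]) -> Tuple[str, List[FieldFilter]]:
    """Remove the most recent field filter. Returns the field the drop zone reverts to
    and the remaining drill filter."""
    if not drill_filter:
        raise ValueError("no drill filter to remove")
    *rest, (field_name, _value) = drill_filter
    return field_name, rest


def drill_up_all(drill_filter: List[FieldFilter]) -> Tuple[str, List[FieldFilter]]:
    """Remove the entire drill filter: back to the first field that was drilled on."""
    if not drill_filter:
        raise ValueError("no drill filter to remove")
    return drill_filter[0][0], []


# Drilled on Year=2012, then Quarter=Q1, so the drop zone currently shows Month.
drill_filter = [("Year", "2012"), ("Quarter", "Q1")]
print(drill_up_one(drill_filter))  # ('Quarter', [('Year', '2012')])
print(drill_up_all(drill_filter))  # ('Year', [])
```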
Chapter 16: Prepare Pages and Dashboards
A vizboard is made up of one or more pages. A page can contain multiple visualizations, and those
visualizations may or may not use the same underlying lens data. Within a page, you can add multiple
visualizations, edit them, arrange them, or delete them. While working in a vizboard, you work on one
page at a time, but you can easily move visualizations between pages.
A visualization can be thought of as an individual insight in the overall data story of the vizboard, and pages are
a way to group visualizations together around a particular theme.
Topics:
• FAQs—Vizboard Pages
• Resize a Page to Fit the Browser Window
• Show and Hide Tool Panels
• Manage Viz Layout
• Edit a Visualization
• Arrange Visualizations on a Vizboard Page
• Preview a Vizboard with View Only Permission
FAQs—Vizboard Pages
A vizboard page is a separate canvas that contains one or more visualizations. This topic answers some
frequently asked questions about vizboard pages.
How do I create a vizboard page?
By default, a new vizboard has one page. You can add additional pages by clicking Add Page at the
top of the Pages panel. The new page is added at the bottom of the list in the Pages panel. Click the
page icon to select it.
How do I rename a page?
You can change the name of a page at any time by clicking its name in the Pages panel and editing it
directly. All pages are given a default name of Page X (where X is a number) when they are first created.
You must have edit permissions on a vizboard in order to rename a page.
How do I resize the canvas of a page?
You can resize a page canvas using either of the following methods:
• Use the View > Resize Page menu. Platfora changes (either increases or decreases) the canvas
size to take up the space currently shown in the web browser. Platfora also changes the size of each
viz proportionately. For more information, see Resize a Page to Fit the Browser Window.
• Click and drag the lower right corner of the canvas to change the canvas to the desired length and
width.
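The proportional resizing done by View > Resize Page can be approximated by scaling each viz panel by the same horizontal and vertical factors as the canvas. This is only a sketch. It assumes independent x and y scale factors, which also shows why a shorter window compresses visualizations vertically. Rect and resize_page are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Rect:
    """Position and size of a viz panel on the page canvas, in pixels."""
    x: float
    y: float
    w: float
    h: float


def resize_page(vizzes: List[Rect], old_canvas: Tuple[float, float],
                new_canvas: Tuple[float, float]) -> List[Rect]:
    """Scale every viz by the same horizontal and vertical factors as the canvas."""
    sx = new_canvas[0] / old_canvas[0]
    sy = new_canvas[1] / old_canvas[1]
    return [Rect(v.x * sx, v.y * sy, v.w * sx, v.h * sy) for v in vizzes]


# A wider but shorter browser window: vizzes grow horizontally and shrink vertically.
print(resize_page([Rect(100, 400, 400, 400)], (1000, 800), (1500, 600)))
# [Rect(x=150.0, y=300.0, w=600.0, h=300.0)]
```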
How do I make a copy of a page and all visualizations on the page?
You can make a copy of a page in the current vizboard (duplicate) or in another vizboard. The page is added
as the last page in the vizboard and is titled "Copy of pagename."
1. Click the copy icon to make a copy of the page in another vizboard. Platfora prompts you to choose the
vizboard name.
2. Click the duplicate icon to make a duplicate copy of the page in the current vizboard.
How do I delete a page?
In the Pages panel, hover over the desired page and click the delete icon.
Can I change the order of the pages in a vizboard?
Yes. Click and drag a page in the Pages panel and place it in the desired order.
Can I move a viz from one page to another?
Yes. Click the viz in the middle of its toolbar and drag it to the desired page in the Pages panel. Then
go to the new page to position the viz as desired.
Resize a Page to Fit the Browser Window
While working in a vizboard, you can resize the visualizations to fit the available canvas area on the
page. This allows you to easily adjust the page canvas size as you resize your browser window or show
and hide the Builder panels.
To resize visualizations to fit the available workspace on the page, select View > Resize Page.
Note that this correctly resizes the visualizations to fit the available horizontal space, but can compress
visualizations vertically if you have multiple visualizations on the page below the visible area.
Show and Hide Tool Panels
You can show and hide the Pages, Builder, and Filters tool panels as you work on a vizboard page.
Hiding the panels allows you to have more workspace for viewing and arranging visualizations. When
you need to edit a visualization or page, you can toggle the tool panels on again.
Use the Pages, Builder, and Filters buttons on the sides of the vizboard page to show and hide the
tool panels. You can also close a panel by clicking the X in the right corner of the panel.
Manage Viz Layout
Vizboard pages can contain one or more visualizations. You can arrange visualizations on a page and
edit, rename, and delete them as necessary.
Edit a Visualization
To edit a visualization in a vizboard, simply click anywhere inside the visualization. Use the tools in the
Builder panels to work on the visualization.
Arrange Visualizations on a Vizboard Page
By default, new visualizations are added to the middle of a vizboard page in a fixed-size panel. You can
move and resize visualizations on a vizboard page by dragging their panel borders.
Control Where New Visualizations are Added on a Page
By default, new visualizations are added to the middle of a vizboard page (Add to center of
page), sometimes overlapping the visualizations that are already on the page. You can choose to have
new visualizations added to the bottom of the page instead. To change the default position for new
visualizations, select View > Grow Page Downward. This option is set individually for each page
in the vizboard.
Resize Visualizations
To resize a visualization on the page, click a corner or border handle on the visualization, and drag it to
the desired size.
Move Visualizations
To move a visualization to another location on the page, click a border on the visualization (between the
border handles) or the viz header, and drag to the desired position.
Preview a Vizboard with View Only Permission
Save and Preview allows you to save a vizboard and preview it in presentation mode. This allows
you to view and test the vizboard as users with view only vizboard permissions will see it if you publish
or share it. While in preview mode, the viz builder tools and vizboard edit controls are hidden.
Users with View only vizboard object permissions can view all panels in the vizboard except the
Builder panel. That means they can view all pages in the vizboard, all local and page filters, and the
legends for each viz.
However, since they can't view the Builder panel, they cannot view the fields used in each drop zone
or the settings applied to them, such as sorts and limits. Therefore it's very important to make sure that
each viz clearly indicates what information is displayed. Consider editing viz titles and labels to ensure
they are informative. For example, if you limit the number of results in a viz, you could edit the viz title
to reflect that, such as "Top 10 Results."
For more information on view only vizboard permissions, see View Vizboard with View Only
Permission.
1. Select Save and Preview from the vizboard Save menu.
2. This opens the vizboard in preview or presentation mode. This allows you to test vizboard
functionality as view-only users will see it.
3. Click Edit to exit preview mode and return to edit mode.
Chapter 17: Share and Collaborate
As you prepare visualizations and vizboards, you can share your insights with other Platfora users, collaborate
with other analysts on your findings, or export the viz or underlying data for use in an outside presentation or
application.
Topics:
• Set Vizboard Permissions
• Manage Vizboard Comments
• Share a Link to a Vizboard
• Export Viz Data
• Export a Viz Image
• Email a Single Viz as an Image
• Share Vizboard as a PDF
• Export a Viz as a New Dataset
Set Vizboard Permissions
The user who creates a vizboard is the default vizboard Owner. The vizboard owner can view, edit, or
delete the vizboard. Vizboard owners can also change the vizboard access permissions to grant or revoke
access permissions for other Platfora users.
The default vizboard permissions grant View Only access to the Everyone group (all Platfora
users), but only the vizboard owner can edit the vizboard. For more information on view only vizboard
permissions, see View Vizboard with View Only Permission.
Vizboard access permissions are not the same as data access permissions.
Granting a user access to a vizboard does not necessarily mean they can see
the underlying data that comprises the visualizations in the vizboard. Users will
also need to have data access to the source data and datasets in order to see the
visualizations in a vizboard.
1. Select Permission Settings from the vizboard Share menu.
2. In the Sharing and Permissions dialog, select Add Collaborators.
3. In the Find User or Group dialog, select the users or groups to add to the vizboard. You can use
the search to quickly find users or groups by name.
4. Click Add after you have selected all of the users and groups you want to grant vizboard permissions
to.
5. Make sure the user or group is in the correct permission category (Own, Edit, or View Only).
If they are not, use the drop-down menu to the right of their name to move them.
To remove the user's vizboard permissions entirely, click the X to the right of their name.
6. Click OK.
View Vizboard with View Only Permission
Vizboard owners can view, edit, or delete the vizboard and grant or revoke access permissions for other
Platfora users. Users with View Only object permissions on the vizboard can perform a limited set of
tasks on the vizboard.
By default, all Platfora users have view only access to vizboards (the default vizboard permissions
grant View Only access to the Everyone group). However, in order to see a visualization (viz) in a
vizboard, users also need to have data access to the source data and datasets used in the viz.
Users with view only permissions can perform a limited set of tasks on a vizboard depending on the user
role. All user roles with view only vizboard permissions can perform the following tasks:
• Change the filter properties for existing local and page filters. Note that filter changes are never saved
to the vizboard.
• Create a new filter by selecting a group of viz marks and choosing Isolate Selection or Exclude
Selection from the marks selected viz menu. Note that filter changes are never saved to the
vizboard.
• Turn live updates off and on. You might want to do this if you change several filter properties.
• Write and view comments on the vizboard. Any comments added by a view only user are
automatically saved to the vizboard.
• Export the vizboard as a PDF file.
• Send an email with the vizboard as a PDF file.
• Share a link to this vizboard.
• Send an email with a single viz as an image.
• Download the data used in a viz as a CSV file.
• Download the image of a viz as a PNG file.
• Download the data lineage for a viz as a JSON file.
• Resize the page by choosing Resize Page from the View menu. Note that the new page size is not
saved to the vizboard.
In addition to the tasks above, users with the Analyst (Limited) role and view only vizboard
permissions can perform the following tasks:
• Edit the vizboard without saving it, or save it as a new vizboard. To do this, click Edit. You can
change the visualizations on the page to see how the data is displayed differently. If you want to save
your changes, you must save the vizboard under a new name.
• Export the data used in a viz as a CSV file to a remote file system.
In addition to the tasks above, users with the Analyst role (or higher) and view only vizboard permissions can
perform the following tasks:
• Create a new lens from the fields used in the viz.
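The view only rules above are cumulative by role. Separately, seeing a viz requires both vizboard access and data access. The sketch below models both points. The capability labels, the placeholder role name "Viewer" (standing in for any role below Analyst (Limited)), and the rank values are all hypothetical. Roles above Analyst would need a higher rank in ROLE_RANK.

```python
from typing import Dict, Set

# Hypothetical labels for the tasks every role can perform with View Only permission.
ALL_VIEW_ONLY: Set[str] = {
    "change_filters_unsaved", "isolate_or_exclude_selection", "toggle_live_updates",
    "comment", "export_pdf", "email_pdf", "share_link", "email_viz_image",
    "download_csv", "download_png", "download_lineage_json", "resize_page_unsaved",
}
# Extra tasks, keyed by the minimum role that gets them.
EXTRA_BY_MIN_ROLE: Dict[str, Set[str]] = {
    "Analyst (Limited)": {"edit_without_saving", "save_as_new_vizboard", "export_csv_remote"},
    "Analyst": {"create_lens_from_viz"},
}
# Hypothetical privilege ranking, lowest first.
ROLE_RANK: Dict[str, int] = {"Viewer": 0, "Analyst (Limited)": 1, "Analyst": 2}


def view_only_capabilities(role: str) -> Set[str]:
    caps = set(ALL_VIEW_ONLY)
    for min_role, extra in EXTRA_BY_MIN_ROLE.items():
        if ROLE_RANK[role] >= ROLE_RANK[min_role]:
            caps |= extra
    return caps


def can_see_viz(has_vizboard_access: bool, has_data_access: bool) -> bool:
    # Vizboard permission alone is not enough; the user also needs data access.
    return has_vizboard_access and has_data_access


print(sorted(view_only_capabilities("Analyst") - view_only_capabilities("Analyst (Limited)")))
# ['create_lens_from_viz']
print(can_see_viz(True, False))  # False
```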
Manage Vizboard Comments
Analysts can share insights found in a vizboard with other Platfora users within the Platfora application.
FAQs—Vizboard Comments
You can collaborate with other analysts by sharing your insights using comments on a vizboard. This
topic answers some frequently asked questions about using this feature.
What are vizboard comments?
Vizboard comments are a way for analysts to share insights found in a vizboard with other Platfora
users. You can include text and even a snapshot of a particular viz for reference. Each vizboard
maintains a single, collective history of all comments on each viz in the vizboard.
Who can add comments to a vizboard?
Any user who is allowed to view a vizboard can add a comment to that vizboard.
How can I view vizboard comments?
When a vizboard already has comments on it, the Comments panel lists the number of comments.
Click the Comments panel to open the comments dialog.
How do I create a viz comment?
You can create a viz comment by clicking the Comments panel for the vizboard or by clicking the
comments icon in a viz. For more information, see Create a Comment on a Viz.
Can I make changes to an existing comment?
Yes. You can edit a comment that you wrote previously. You can also delete a comment you wrote. To
edit or delete a comment, hover the cursor over the comment and click either the edit or the delete button.
What's the difference between a comment and a reply?
A reply is a comment that is threaded with another comment. New comments are added to the top of the
comment history, and replies appear below the original comment, indented one level.
Vizboards change all the time. How can I ensure that someone else views a
particular version?
You can share a link to the current vizboard version by creating a permalink. Click the Share menu and
select Direct link to version in the Permalink section. Copy the URL provided and paste it into
the comment.
Create a Comment on a Viz
You can add a comment to a vizboard and optionally include a snapshot of a particular viz.
1. In the viz you want to comment on, click the comments button in the viz toolbar.
2. Enter text for the insight you found.
3. (Optional) Click Snapshot to view a snapshot of the viz as it is configured at this date and time.
4. (Optional) Click Use Snapshot to include the snapshot in the comment.
5. Click Post.
Share a Link to a Vizboard
Platfora provides a direct, permanent link (a permalink) to a vizboard that you can give to other Platfora
users. Copy the permalink provided in the vizboard and share it with others by pasting it into, for example, a
vizboard comment or an email message.
Platfora includes different permalink types that direct you to the following vizboard versions:
• Latest version. Use this permalink when you want others to always view the most up-to-date
information shown in the vizboard.
• Current version. Use this permalink when you want others to view the specific data presented in the
current version of the vizboard, even if more versions are made later. This URL is the same as the
latest-version permalink but includes a "version" parameter.
1. Click the vizboard Share menu.
2. In the Permalink section, select the permalink type, either Link to vizboard (latest version) or
Direct link to version (current version).
3. Select the URL and copy the text.
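The two permalink types differ only by the "version" parameter. The following Python sketch shows how you might convert one into the other, assuming (this guide does not confirm it) that "version" is carried as an ordinary query-string parameter. The host name and path are hypothetical placeholders, not Platfora's real URL layout.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def _without_version(url: str):
    """Split the URL and return (parts, query pairs) with any 'version' parameter removed."""
    parts = urlsplit(url)
    pairs = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True) if k != "version"]
    return parts, pairs


def latest_version_link(url: str) -> str:
    """'Link to vizboard' style: drop the version parameter so the latest version opens."""
    parts, pairs = _without_version(url)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


def pinned_version_link(url: str, version: str) -> str:
    """'Direct link to version' style: add (or replace) the version parameter."""
    parts, pairs = _without_version(url)
    pairs.append(("version", version))
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical vizboard URL, for illustration only.
base = "https://platfora.example.com/vizboards/1234"
pinned = pinned_version_link(base, "17")
print(pinned)                       # https://platfora.example.com/vizboards/1234?version=17
print(latest_version_link(pinned))  # https://platfora.example.com/vizboards/1234
```

Copying the link Platfora gives you is still the reliable way to share a version; the sketch only illustrates why the two links look almost identical.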
Export Viz Data
You can export the data comprising a visualization as a CSV-formatted file, and download it to your
local computer or to a remote file system such as HDFS or S3.
Users in view-only mode can only download viz data as a CSV file to their local
computer.
1. From the viz toolbar, click the export menu.
2. Select either Download Data as CSV or Export Data as CSV.
When you choose Download Data as CSV, a single gzip-compressed comma-separated values
(CSV) file is created on your Desktop (Windows) or in your Downloads folder (Mac). The file naming
convention is:
dataset-name_lens-name_epoch-timestamp.csv.gz
When you choose Export Data as CSV, you must enter a URL to a remote file system in the
format of:
protocol://hostname:port/path-to-export-location
For example:
hdfs://10.80.231.123:8020/platfora/exports
3. Depending on the size of the data requested, the export can take a while to complete. If downloading
data, stay on the page until the download starts or else the export will be cancelled.
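The download file name and the export URL in step 2 both follow fixed patterns. The Python sketch below builds a name in the documented dataset-name_lens-name_epoch-timestamp.csv.gz form, reads such a file back, and checks that an export URL has the protocol://hostname:port/path-to-export-location shape. The dataset and lens names are hypothetical; whether the timestamp is in seconds or milliseconds is not stated in this guide, and the URL check enforces the documented shape literally (some remote file systems may accept URLs without a port).

```python
import csv
import gzip
from urllib.parse import urlsplit


def download_file_name(dataset: str, lens: str, epoch_ts: int) -> str:
    """Build a name in the form dataset-name_lens-name_epoch-timestamp.csv.gz."""
    return f"{dataset}_{lens}_{epoch_ts}.csv.gz"


def read_downloaded_csv(path: str) -> list:
    """Decompress a downloaded .csv.gz file and return its rows as lists of strings."""
    with gzip.open(path, "rt", newline="") as f:
        return list(csv.reader(f))


def parse_export_url(url: str) -> tuple:
    """Return (protocol, hostname, port, path) or raise ValueError if a part is missing."""
    parts = urlsplit(url)
    # parts.port itself raises ValueError if the port is not a valid number.
    if not parts.scheme or not parts.hostname or parts.port is None or parts.path in ("", "/"):
        raise ValueError(f"expected protocol://hostname:port/path, got {url!r}")
    return parts.scheme, parts.hostname, parts.port, parts.path


print(parse_export_url("hdfs://10.80.231.123:8020/platfora/exports"))
# ('hdfs', '10.80.231.123', 8020, '/platfora/exports')
```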
Export a Viz Image
You can export a visualization as a PNG image file, and download it to your local computer.
1. From the viz toolbar, click the export menu and select Download Chart as PNG.
2. This downloads a PNG image file to your local computer (to the Downloads folder on Mac or the
Desktop on Windows). The file name will be the same as the viz title.
Email a Single Viz as an Image
You can share a visualization with other users outside of Platfora by sending it in an email. The viz
will be sent as a PNG image embedded in an email message. To share a viz via email, Platfora must be
configured to connect to an email server.
1. From the viz toolbar, click the email button.
2. In the Send email to field, enter a comma-separated list of email addresses.
3. Edit the Subject field to give the email recipients more context about the email's contents. By
default, the email subject line is the same as the viz name.
4. In the Additional Comments field, enter any additional information that you want to include in
the email message body. This text will appear above the viz image in the email message.
Tip: The email is sent by whatever email account was configured by your Platfora system
administrator. You may want to include your name in the email message body so that the recipients
know the email is from you.
5. Click Send Email.
6. If Platfora is able to connect to the email server and send the email, you will see a confirmation at the
top of the page.
Tip: Platfora only reports whether the email was sent successfully; it does not report failed deliveries.
Any failed delivery notifications are sent to the email account that was configured by your Platfora system
administrator.
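The Send email to field takes a comma-separated list, and the subject falls back to a name when you leave it alone or empty. As a rough Python sketch of preparing those values before you paste them in, the function names below are hypothetical, and dropping blanks and repeated addresses is a cleanup choice made here, not documented Platfora behavior.

```python
def parse_recipients(field: str) -> list:
    """Split a comma-separated address list, trimming spaces and dropping blanks and repeats."""
    recipients = []
    for part in field.split(","):
        address = part.strip()
        if address and address not in recipients:
            recipients.append(address)
    return recipients


def email_subject(subject_field: str, default_name: str) -> str:
    """Use the typed subject, or fall back to the viz (or vizboard) name if it is empty."""
    return subject_field.strip() or default_name


print(parse_recipients("ana@example.com, raj@example.com,,ana@example.com "))
# ['ana@example.com', 'raj@example.com']
print(email_subject("  ", "Revenue by Region"))  # Revenue by Region
```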
Share Vizboard as a PDF
You can share a vizboard as a PDF file. You might want to do this to share insights discovered in
Platfora with a larger group of people. You can share rendered PDFs with other Platfora users or users
without access to Platfora. Users do not need access to a vizboard to view a rendered PDF.
FAQs—Vizboard PDFs
You can share a vizboard as a PDF file. This topic answers some frequently asked questions about using
this feature.
How can I create a vizboard PDF to share with others?
You can export (download) it manually or send it in an email message, either manually or on a
scheduled basis. Note that Platfora must be configured to send email in order to send a vizboard PDF in
an email message.
How does Platfora render the vizboard PDF?
PDF rendering happens in the background. Depending on the size and number of visualizations, this
might take some time. You can continue working in the vizboard while the PDF is rendered. You can
view the progress of PDF rendering jobs on the System > Activities page.
Each vizboard page becomes a page in the PDF. For more information, see How Platfora Renders a
Vizboard as a PDF.
How do I know when Platfora has finished creating the PDF file?
When the rendering is complete, a notification is displayed in the Platfora UI. Also, a notification
appears in your profile notification list.
Where are the PDF files stored?
Platfora stores generated PDF files on the server. Platfora retains PDF files for one day by default (your
administrator might change this value using the platfora.rendering.job.artifact.maxAge
configuration property).
How do I export a vizboard PDF manually?
In the vizboard, choose Prepare PDF for Download from the Share menu. For more information,
see Export a Vizboard as a PDF Manually.
How do I email a vizboard PDF manually?
In the vizboard, choose Email PDF from the Share menu. For more information, see Email a
Vizboard as a PDF Manually.
How do I create a schedule to automatically generate and email vizboard PDFs?
In the vizboard, choose Create Schedule from the Share menu. For more information, see Email a
Vizboard as a PDF on a Schedule.
How many schedules can I define for a vizboard?
Each vizboard can have one schedule, but each schedule can have multiple schedule rules that determine
when Platfora creates a PDF file of the vizboard and sends it out in an email message.
I sent an email to another person, but not to myself. Can I download the PDF file
locally?
Yes. You can manually download a PDF while it still exists on the server. To do this, click the link to
the file from your user profile notification list. Once it's been deleted, you must render the PDF again.
How can I temporarily pause the vizboard PDF email schedule?
Open the vizboard, and choose Pause Schedule from the Share menu. Platfora won't render
the vizboard PDF or send any email message for this vizboard until someone chooses Resume
Schedule from the Share menu.
How can I delete the vizboard PDF email schedule?
Open the vizboard, and choose Delete Schedule from the Share menu. All rules defined in the
schedule are removed. Platfora won't send any email message for the vizboard unless someone creates an
email schedule for it.
I can't share a vizboard as a PDF. Why are all the PDF menu items grayed out?
By default, Platfora allows vizboard users to export to PDF. However, your administrator may disable
this functionality.
Who can create a vizboard PDF?
Any user who can view a vizboard can download it as a PDF or can send it as a PDF file manually.
However, to create, edit, pause, resume, or delete a vizboard PDF email schedule, the user must be able
to edit the vizboard.
Export a Vizboard as a PDF Manually
You can export a vizboard as a PDF file manually. When the rendering is complete, the file is
downloaded to the default downloads directory for the browser and a notification is displayed in the
Platfora UI.
Any user who can view a vizboard can export a PDF file manually.
1. Click the vizboard Share menu.
2. Choose Prepare PDF for Download.
You can also choose Prepare PDF for Download from the vizboard contextual
menu on the Vizboards page.
3. In the dialog, click Save Vizboard and Prepare PDF or Prepare PDF, depending on
whether the vizboard was already saved.
Platfora saves the vizboard, if applicable, and begins to render the vizboard as a PDF file. When the
PDF is ready, it's downloaded to your local machine according to how the web browser is configured.
Large vizboards may take some time. Platfora informs you when it starts
rendering a PDF and when the file is created.
Email a Vizboard as a PDF Manually
You can send a vizboard as a PDF file in an email message. Platfora renders the PDF, sends it to the
addresses you specify, and displays a notification in the Platfora UI.
Any user who can view a vizboard can send it as a PDF file manually.
1. Click the vizboard Share menu.
2. Choose Email PDF.
3. If the vizboard hasn't been saved yet, click Save Vizboard and Email in the dialog.
4. Enter the email recipient(s), subject, and body.
Separate multiple recipient addresses with a comma (,). If the Subject is left empty, Platfora uses
the vizboard name as the subject of the email.
5. Click Send Email.
Platfora begins to render the vizboard as a PDF file. When the PDF is ready, it's sent as an attachment
in an email to each recipient you entered.
Large vizboards may take some time. Platfora informs you when it starts
rendering a PDF and when the file is created.
Email a Vizboard as a PDF on a Schedule
You can configure Platfora to send a vizboard as a PDF file in an email message on a regular basis.
Any user who can edit a vizboard can create, edit, pause, resume, or delete a vizboard PDF email schedule.
1. From the vizboard Share menu, click Create Schedule.
2. In the Create Email Schedule dialog, enter one or more email addresses to send the vizboard
PDF to. Separate multiple addresses by commas.
3. Enter a subject for the message. By default, Platfora uses the vizboard name as the subject. You can
change this.
4. Enter text to include in the body of the message to give context.
5. In the Schedule Rules section, choose the type of rule to define, either by days of the week at
certain times, days of the week at an hourly interval, or day of the month at a certain time.
6. Choose the day(s) and time(s) to send the vizboard PDF email.
7. (Optional) Click Add another rule to define an additional rule for the schedule.
Vizboard PDF emails are only sent once if you define multiple overlapping rules
for the same time and day.
8. Click Create.
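The note in step 7 says overlapping rules still produce only one email. A minimal Python model of the three rule types makes that concrete: each rule yields the send times for a given day, and taking the set union collapses duplicates. The class names are hypothetical, and the hourly rule is assumed to start at midnight, which this guide does not specify.

```python
from dataclasses import dataclass
from datetime import date, datetime, time


@dataclass(frozen=True)
class WeekdaysAtTimes:
    """Days of the week at certain times. Weekdays use date.weekday(): 0 = Monday."""
    weekdays: frozenset
    times: tuple

    def fire_times(self, day: date) -> list:
        if day.weekday() not in self.weekdays:
            return []
        return [datetime.combine(day, t) for t in self.times]


@dataclass(frozen=True)
class WeekdaysHourly:
    """Days of the week at an hourly interval (assumed here to start at midnight)."""
    weekdays: frozenset
    every_hours: int

    def fire_times(self, day: date) -> list:
        if day.weekday() not in self.weekdays:
            return []
        return [datetime.combine(day, time(hour)) for hour in range(0, 24, self.every_hours)]


@dataclass(frozen=True)
class DayOfMonthAtTime:
    """Day of the month at a certain time."""
    day_of_month: int
    at: time

    def fire_times(self, day: date) -> list:
        return [datetime.combine(day, self.at)] if day.day == self.day_of_month else []


def send_times(day: date, rules) -> list:
    """Union of all rule firings for one day; the set keeps one send per distinct time."""
    return sorted({t for rule in rules for t in rule.fire_times(day)})


rules = [
    WeekdaysAtTimes(frozenset({0}), (time(9, 0),)),  # Mondays at 9:00
    WeekdaysHourly(frozenset({0}), 3),               # Mondays every 3 hours
    DayOfMonthAtTime(1, time(9, 0)),                 # 1st of the month at 9:00
]
monday = date(2015, 6, 1)  # a Monday that is also the 1st of the month
print(len(send_times(monday, rules)))  # 8: all three rules fire at 9:00, but it counts once
```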
How Platfora Renders a Vizboard as a PDF
This section lists the guidelines Platfora follows when rendering vizboards as PDF files.
• The file name consists of the vizboard name and a date stamp, using the following format:
vizboardname_yyyy-mm-dd.pdf.
• If the vizboard is not already saved, Platfora prompts you to save it before rendering it as a PDF.
• Each vizboard page is rendered as a single page in the PDF file, showing all visualizations on the
canvas and all applicable legends.
• Legends display a maximum of 20 items for the Color drop zone, and nine items for the Shape
drop zone.
• Cross-tab and funnel visualizations in the PDF only show the visible area displayed in Platfora.
• The size of each PDF page is the same as the size of the vizboard page canvas plus room for the
legends.
• The minimum size of a PDF page is 6.34 x 4.54 inches.
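Several of these guidelines are simple rules that can be written down directly. The Python sketch below encodes the file name format, the page-size rule (canvas plus legend room, never below the minimum), and the legend item limits. It is only an illustration: the guide does not say how much room legends take or on which side, so the legend dimensions are caller-supplied assumptions, the minimum size is read as width x height, and whether Platfora alters spaces or special characters in vizboard names is unknown.

```python
from datetime import date

MIN_PAGE_INCHES = (6.34, 4.54)             # minimum PDF page size (read here as width x height)
LEGEND_LIMITS = {"Color": 20, "Shape": 9}  # maximum legend items per drop zone


def pdf_file_name(vizboard_name: str, rendered_on: date) -> str:
    """Format: vizboardname_yyyy-mm-dd.pdf"""
    return f"{vizboard_name}_{rendered_on.isoformat()}.pdf"


def pdf_page_size(canvas_w: float, canvas_h: float,
                  legend_w: float = 0.0, legend_h: float = 0.0) -> tuple:
    """Canvas size plus (assumed) legend room, clamped to the minimum page size."""
    return (max(canvas_w + legend_w, MIN_PAGE_INCHES[0]),
            max(canvas_h + legend_h, MIN_PAGE_INCHES[1]))


def legend_entries(drop_zone: str, values: list) -> list:
    """Truncate a legend to the drop zone's limit; zones without a listed limit are unchanged."""
    limit = LEGEND_LIMITS.get(drop_zone)
    return values if limit is None else values[:limit]


print(pdf_file_name("Sales Overview", date(2015, 6, 28)))  # Sales Overview_2015-06-28.pdf
print(pdf_page_size(4.0, 3.0))                             # (6.34, 4.54)
```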
Export a Viz as a New Dataset
You can choose to save the data comprising a visualization as a new dataset. This is called a derived
dataset in Platfora. A derived dataset allows you to save the query results from a lens as a new dataset
in the Platfora Data Catalog. Once a derived dataset is saved, you can use it as you would any other
dataset in Platfora: you can edit it, add additional computed fields, and join it by reference to other
datasets in the Platfora data catalog.
1. Create a visualization (or viz query) that you want to use as the basis of the derived dataset.
Tip: Working in Cross-Tab view allows you to see the data rows and columns that will comprise
the new dataset.
Tip: Save the vizboard if you want to keep the query after you create the derived dataset.
2. From the viz toolbar, click the export menu and select New Dataset from Viz.
3. In the New Dataset from Viz dialog, enter a New dataset name and choose one of the
derived dataset Types.
There are two types of derived datasets you can create:
• Static saves the data comprising a viz as a file in the Hadoop distributed file system (DFS),
essentially taking a snapshot of the viz data at that point in time. A static derived dataset does not
change if the source dataset changes or if its parent lens is rebuilt. You can use a static derived
dataset as a historical snapshot, and then use it to compare past data to recent data.
• Dynamic does not save the actual data in the viz, but instead saves the query used to produce the
data from its parent lens. You can think of it as a dynamically updated view whose data changes
as the source data changes. When a dynamic derived dataset is used to build a lens, the saved
query is run against the parent lens to obtain the latest data values, and the results are stored
temporarily while the final lens build results are processed. Dynamic derived datasets are typically
used to aggregate records in one dataset so that they can be joined to records of another dataset.
4. Click Create Dataset.
5. Once the dataset is created, click Go to Dataset. This will exit the vizboard, and discard any
unsaved changes in the vizboard.
6. In the new dataset, you can define a key, edit fields, add computed fields, and add references just as
you would in a regular dataset.
7. One difference between derived datasets and regular datasets is their data source. Derived datasets
always use a Platfora lens as their data source.
8. Click Save or Save and Exit to save your changes to the dataset.
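The difference between the two derived dataset types comes down to storing results versus storing a query. The toy Python model below is not how Platfora stores data (static datasets live as files in the DFS); it only contrasts a snapshot taken once with a saved query that is re-run against whatever the parent lens currently contains. The per-customer aggregation mirrors the typical dynamic use case described above; all names and data are hypothetical.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]
Query = Callable[[List[Row]], List[Row]]


def total_per_customer(rows: List[Row]) -> List[Row]:
    """Example viz query: aggregate purchase amounts to one row per customer."""
    totals: Dict[str, int] = {}
    for row in rows:
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return [{"customer": c, "total": t} for c, t in sorted(totals.items())]


def make_static(parent_lens_rows: List[Row], query: Query) -> List[Row]:
    """Static: run the query once and keep a copy of the result (a snapshot)."""
    return [dict(row) for row in query(parent_lens_rows)]


class DynamicDerived:
    """Dynamic: keep only the query and re-run it against the parent lens when needed."""

    def __init__(self, query: Query) -> None:
        self.query = query

    def rows(self, parent_lens_rows: List[Row]) -> List[Row]:
        return self.query(parent_lens_rows)


lens_v1 = [{"customer": "c1", "amount": 10}, {"customer": "c1", "amount": 5},
           {"customer": "c2", "amount": 7}]
static = make_static(lens_v1, total_per_customer)
dynamic = DynamicDerived(total_per_customer)

lens_v2 = lens_v1 + [{"customer": "c2", "amount": 3}]  # parent lens rebuilt with new data
print(static)                 # unchanged snapshot: c2 total is still 7
print(dynamic.rows(lens_v2))  # re-queried: c2 total is now 10
```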
Chapter 18: Request or Derive Additional Lens Fields
The data available for a viz is determined by the fields made available in the lens. If the lens currently doesn't
have the data you need, you can modify the lens to add more fields if you have the proper permissions. Data
analysts without the permissions to edit a lens still have other options to enhance and supplement existing lens
data.
Topics:
• Vizboard Computed Fields
• Combined Fields
• Request Additional Lens Data
• Segments
Vizboard Computed Fields
You add a computed field in a vizboard by writing an expression that transforms existing fields.
Computed fields help you refine your data analysis.
FAQs—Vizboard Computed Fields
This topic answers some frequently asked questions about vizboard computed fields.
What is a vizboard computed field?
A vizboard computed field is a user-defined field created in a vizboard that transforms existing lens
fields using the Platfora expression language. Vizboard computed fields can be used in a viz like fields
that come from the dataset. For example, you can filter on them, sort them, and include them in builder
drop zones.
Vizboard computed fields are defined in a visualization and are local to the existing vizboard. You can
use the vizboard computed field in any viz in the current vizboard that uses the same lens as the viz
where the vizboard computed field was defined.
Can I use a vizboard computed field in a different vizboard?
No. The vizboard computed field is stored in the current vizboard only.
Who can create a vizboard computed field?
To create a vizboard computed field, a user must have the Analyst (Limited) role or higher and also
have object access permission to edit the vizboard.
Why would I want to create a vizboard computed field?
You might want to create a vizboard computed field for any of the following purposes:
• The lens doesn't have data in the form you need and you want to quickly supplement the lens data
without rebuilding the lens.
• The lens doesn't have the data in the form you need and you don't have edit permission on the dataset
to create a dataset computed field.
• You want to test a computed field expression to see how it renders in a visualization before defining
the computed field in the dataset.
How do I create a vizboard computed field?
In a visualization, select Add Computed Field from the Add menu. Enter the expression and click
Save.
How do vizboard computed fields appear in the lens panel?
Vizboard computed fields appear in the lens panel like lens fields, but their names are shown in blue.
How is a vizboard computed field different from a dataset computed field?
In a vizboard, a computed field is calculated using data from a lens. Because the lens contains preprocessed data, it is immediately available for use in visualizations. There is no need to rebuild the lens.
In a dataset, a computed field is calculated using the raw source data. The data in the computed field is
calculated when the lens is built, which may take some time.
Platfora reads all source data during lens build time only. As a result, functions that require Platfora to
process all source data can only be used in expressions in a dataset computed field. For example, you
cannot include the PARTITION function in a vizboard computed field. You can include all expressions,
including user-defined functions, in a dataset computed field.
Why are some fields and functions not available in the vizboard expression builder?
Vizboards query the prepared data in a Platfora lens. A lens contains data that has been pre-processed
and optimized during the lens build process. Some computed field expressions can only be executed
during the lens build process, not later at lens query time. Therefore, vizboard computed fields have
some limitations that dataset computed fields don't have. These limitations are as follows:
• Vizboard computed fields can only operate on the fields that exist in the currently selected lens.
Dataset fields that were not selected at lens build time cannot be used.
• A vizboard computed field can break if the fields it relies on are later removed from the lens
definition.
• A vizboard computed field is only available within the vizboard where it was defined. It is not
available from the dataset, other lenses, or programmatic lens queries.
• You cannot create event series processing computed fields (PARTITION expressions).
• You cannot use custom user-defined functions (UDFs) in vizboard computed field expressions.
• Referenced fields cannot be used directly in an aggregate function. To work around this, create
another vizboard computed field that includes the referenced field, and then use that vizboard
computed field in the aggregate function.
What kinds of expressions can I write in a vizboard computed field?
You can define expressions that operate on all types of fields that exist in the current lens using
functions included in the Platfora expression language except for PARTITION.
Note that Platfora recommends aggregating data in the lens build instead of in a vizboard computed field
whenever possible. When data is only aggregated in a vizboard computed field instead of the lens, the
lens size is typically much larger.
Can I use a user-defined function (UDF) in a vizboard computed field?
No. Platfora only calls user-defined functions during lens build time.
How do I edit a vizboard computed field?
From the lens data panel or viz builder panel, select Edit Field from the field menu. Change the
expression or field name, and click Save.
What happens when I edit a vizboard computed field that's currently used in a viz?
Platfora updates all visualizations that use the vizboard computed field.
However, if the new expression results in a different field role, any viz that uses the field in a drop zone
that doesn't support the new role shows an error. For example, if a field expression is originally a
measure and is placed in the Size drop zone, and then changes to become a dimension, that viz shows an
error saying there is an incompatible field. This is because only measure fields are allowed in
the Size drop zone, not dimensions.
When this happens, you can click Undo to revert the change, edit the expression to correct the field
role, or remove the field from all invalid drop zones.
For more information about which field roles are allowed in each drop zone, see About the Builder Drop
Zones.
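The check Platfora performs after an edit can be pictured as a lookup of each drop zone's allowed roles. In the Python sketch below, only the Size rule (measures only) comes from this guide; zones without an entry are treated as accepting any role, which is an assumption made to keep the example small. A real table would come from About the Builder Drop Zones. The field and zone names in the usage are hypothetical.

```python
# Only the Size rule below comes from this guide; other zones are left out on purpose.
ALLOWED_ROLES = {"Size": {"measure"}}


def incompatible_zones(placements: list, field: str, new_role: str) -> list:
    """Return the drop zones where `field` is placed but `new_role` is not allowed.

    placements: (drop_zone, field_name) pairs for one viz.
    Zones with no entry in ALLOWED_ROLES accept any role (an assumption of this sketch).
    """
    return [zone for zone, name in placements
            if name == field and new_role not in ALLOWED_ROLES.get(zone, {new_role})]


placements = [("Size", "Avg Order Value"), ("Columns", "Region")]
print(incompatible_zones(placements, "Avg Order Value", "dimension"))  # ['Size']
```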
How do I delete a vizboard computed field?
From the lens data panel or viz builder panel, select Delete from the field menu.
When you delete a vizboard computed field, Platfora removes it everywhere it's used in all visualizations
in the vizboard, including drop zones and filters.
What happens when I delete a vizboard computed field that's currently used in a
viz?
Platfora informs you which visualizations currently use the field you are trying to delete. You can cancel
the operation or delete the field anyway.
Where can I find examples of useful computed field expressions?
Platfora's expression reference documentation has many examples of useful expressions. See the
Expression Language Reference.
Add a Vizboard Computed Field
You add a computed field in a vizboard by writing an expression that transforms existing fields. Once
you define the computed field, it is listed in the vizboard lens panel. You can then use the computed
field by dragging it to a drop zone.
1. Select Add Computed Field from the Add menu.
This opens the Add Field expression builder window.
2. Enter a field name and a description.
The description is optional but very useful for others who might use your analysis later.
3. Choose a function from the Functions list.
Use the drop-down to restrict the type of functions you see.
4. Double-click a function from the list to add it to the Expression area.
The Expression panel updates with the function's template. Also, the Fields list refreshes to show
the fields you can use with the function. For example, TO_DATE works on STRING fields.
5. Double-click a field to add it into the Expression area.
6. Continue adding functions and fields until your expression is complete.
7. Make sure your expression is correct.
The system checks your syntax as you build the expression. The yellow text box below the
Expression area displays error messages. Platfora only allows you to save expressions that
evaluate successfully. If it cannot resolve an expression, the Save button is not available.
8. Click Save to add the new computed field to the vizboard.
Computed field names are blue to distinguish them from regular lens fields.
Writing expressions for computed fields is an advanced topic. For information on working with
expression syntax, see Platfora Expressions.
Combined Fields
Platfora allows you to view multiple, orthogonal dimension values side by side in the same chart.
About Combined Fields
A combined field is a special type of vizboard computed field that lets you merge the values across
different dimension fields into a single field. This allows you to compare the values of different
dimension fields on the same axis in a viz. Combined fields are immediately available for use in
visualizations without having to rebuild the lens.
You might want to use a combined field to compare different segments of the same population that may
or may not be mutually exclusive. For example, if a viz contains a segment of users who bought iOS
devices and another segment of users who bought any mobile devices, you could place both segments in
a combined field to see the patterns of both types of buyers in a single chart.
Combined fields have the same restrictions as vizboard computed fields as well as the following:
• Combined fields can only be composed of fields from the lens or segments. They cannot include
vizboard computed fields.
• The fields that comprise a combined field must be of the same data type. Once the first dimension is
selected to add to the combined field, Platfora only allows users to add a field of the same datatype.
For example, if a FIXED field is selected, only other FIXED fields are available to add.
• Segment fields, STRING fields, and location fields are considered to be of the same data type and
can comprise a single combined field.
• Combined fields are not available to be used inside a vizboard computed field.
Figure: (1) the segments included in the combined field; (2) a combined field composed of two segments; (3) the viz shows only the values configured to be included in the combined field.
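The data type rules above amount to a compatibility check that runs once the first field is chosen. The Python sketch below mimics them: segment, STRING, and location fields fall into one group, every other type must match exactly, and vizboard computed fields are rejected. The function names are hypothetical and the type names are plain labels; this is an illustration of the rules, not Platfora's implementation.

```python
# Segment, STRING, and location fields count as one type group for combined fields.
STRING_LIKE = {"SEGMENT", "STRING", "LOCATION"}


def type_group(data_type: str) -> str:
    t = data_type.upper()
    return "STRING_LIKE" if t in STRING_LIKE else t


def can_add_to_combined(current_types: list, candidate_type: str,
                        candidate_is_vizboard_computed: bool = False) -> bool:
    """Apply the combined-field rules to one candidate field."""
    if candidate_is_vizboard_computed:
        return False              # combined fields cannot contain vizboard computed fields
    if not current_types:
        return True               # the first field chosen sets the type group
    return type_group(candidate_type) == type_group(current_types[0])


print(can_add_to_combined(["SEGMENT"], "STRING"))  # True
print(can_add_to_combined(["FIXED"], "INTEGER"))   # False
```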
Create Combined Field
Create combined fields in a visualization using the Add menu. Once you define the combined field and
save it, the field is added to the lens panel of each viz in the vizboard. You can then use the combined
field by dragging it to a builder drop zone.
1. Select Add Combined Field from the Add menu.
2. Enter a name for the combined field.
3. Click the icon next to each field you want to add to the combined field.
After you click the first field, Platfora only allows you to select other fields of the same datatype.
4. (Optional) Click the filter icon to choose which field values to include in the combined field for
analysis. In the Edit Filter dialog, select the members to include and click Apply.
By default, adding a dimension to a combined field includes every value
from that dimension. You might want to filter a dimension to focus on the
populations of interest.
5. Choose whether or not to show the overall total values for the focal dataset in the combined field.
When you click Show the Total, Platfora includes an additional value for the combined field that
represents the total for the focal dataset. Since the combined field can contain members that overlap,
showing the total is useful to benchmark each member against the total without any overlap in values.
You might want to enable this when the combined field is comprised of segment fields and you want
to compare each segment against the total population.
6. Click OK.
The combined field is added to the lens panel. Combined field names are blue so you can distinguish
them from regular lens fields.
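Because combined field members can overlap, their values do not add up to the population, which is why Show the Total is useful as a benchmark. The small Python sketch below reuses the iOS and mobile buyer example from About Combined Fields with made-up user IDs; the "Total" label and the function name are hypothetical, and counting distinct users stands in for whatever measure your viz uses.

```python
def combined_field_counts(members: dict, focal_population: set, show_total: bool) -> dict:
    """Count each combined-field member, optionally adding a total for the focal dataset."""
    counts = {name: len(ids) for name, ids in members.items()}
    if show_total:
        counts["Total"] = len(focal_population)
    return counts


ios_buyers = {"u1", "u2"}
mobile_buyers = {"u1", "u2", "u3"}  # every iOS buyer is also a mobile buyer
all_users = {"u1", "u2", "u3", "u4"}

counts = combined_field_counts({"iOS buyers": ios_buyers, "Mobile buyers": mobile_buyers},
                               all_users, show_total=True)
print(counts)                           # {'iOS buyers': 2, 'Mobile buyers': 3, 'Total': 4}
print(len(ios_buyers | mobile_buyers))  # 3 distinct buyers, although the members sum to 5
```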
Request Additional Lens Data
When working in a viz, sometimes the lens used in the viz doesn't contain all data you need. If you have
the appropriate permissions, you can add new dataset fields to the current lens or create a new lens from
the current dataset and include additional fields.
Create a New Lens From Viz
You can create a new lens from a visualization as well as from the data catalog. The newly created lens
uses the same datasets as the viz's lens, and it includes the fields used in the viz by default. However,
you can add or remove fields from the new lens before saving it.
You might want to derive a new lens based on the fields currently used in a viz to drill further into the
current data at a more granular level. By getting more granular on only the data you're interested in, you
create and build a lens that is only as large as necessary. For example, if the current lens only includes
data down to the day level, but you want to view the current viz at the hour level, you could create a new
lens based on the fields included in the viz and also add the Hour field from the Time reference.
To create a new lens, you must have the Analyst role or higher. You must have data access
permissions on the source data and at least Define Lens from Dataset object permissions on the
focus dataset, as well as any datasets that are included in the lens by reference.
1. Choose Derive New Lens from the marks menu in a viz.
2. Choose whether or not to include all fields from the existing lens in the new lens.
The default is to only include fields currently used in the viz. You can add and remove fields from
the lens, and add, modify, and remove lens filters in the lens builder panel.
3. Click Define Lens.
Segments
Members of a population can be grouped together so you can analyze events while looking for patterns
among the group. Identify members that share similar behaviors by creating a segment, which is a
collection of members of a population grouped together based on common behaviors and attributes.
Segments can be created by Platfora users who have the Analyst system role (or above), provided they
also have access to the underlying source data in Hadoop and define lens permission on all datasets used
in the segment definition.
FAQs—Segments
A segment is a collection of members of a population grouped together based on common behaviors and
attributes. This topic answers some frequently asked questions about segments.
What is a segment?
A segment is a special type of dimension field that you can create to group together members of a
population that meet some defined common criteria. A segment is based on members of a dimension
dataset (such as customers) that have some behavior in common (such as purchasing a particular
product). Segments may also be based on common attributes as well as behaviors.
For example, users older than 30 years (attribute) who returned to your website (behavior) is a segment,
but users older than 30 years (attribute) is not.
Segments allow you to analyze behaviors among a subset of the population.
Segments in Platfora are more than just saved filters. Segment members are always rows in a dimension
dataset, and they have a condition (behavior) in common from a fact dataset. Rows in the dimension
dataset are either members of the segment or not members.
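The following Python sketch is not how Platfora computes segments; it only illustrates the idea with made-up data. Every row of a dimension dataset gets exactly one of two values, based on an attribute condition from the dimension dataset and a behavior condition evaluated against a fact dataset. The `Customer` class, the field names, and the thresholds are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """One row of a hypothetical dimension dataset."""
    customer_id: str
    age: int


def segment_membership(customers, visit_customer_ids, min_age=30, min_visits=2):
    """Label every dimension row as IN or NOT IN the segment.

    Attribute condition: age > min_age (from the dimension dataset).
    Behavior condition: at least min_visits rows in the fact dataset,
    that is, the customer returned to the website.
    """
    visits = Counter(visit_customer_ids)  # number of fact rows per customer
    return {
        c.customer_id: (
            "IN" if c.age > min_age and visits[c.customer_id] >= min_visits else "NOT IN"
        )
        for c in customers
    }


customers = [Customer("c1", 42), Customer("c2", 25), Customer("c3", 35)]
visit_log = ["c1", "c1", "c2", "c2", "c3"]  # one entry per visit (fact rows)
membership = segment_membership(customers, visit_log)
```

Customer c3 meets the attribute condition but visited only once, so it is NOT IN; an attribute alone is not enough to define a segment.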
How are segments created?
Create segments in a visualization. To create a segment, a viz must use a lens that allows segments to be
created for at least one referenced dataset. Choose Add Segment from the Add menu.
When a segment is created, Platfora creates and builds a special type of lens that determines the
members in the segment, and then it adds the segment as a field in the viz lens.
For more details, see Create Segments.
Who can create a segment?
Segments can be created by Platfora users who have the Analyst system role (or above), provided they
also have access to the underlying source data in Hadoop and define lens permission on all datasets used
in the segment definition.
How can I use a segment?
Use a segment field in a viz like any other dimension field. For example, you can use it in any drop
zone, filter on it, or use it in a combined field. Use it in the viz it was created in, or in any viz whose lens
references the same dimension dataset the segment is based on.
What values are included in a segment?
A segment field has two possible values: members that are IN the segment and members that are NOT
IN the segment.
When you use a segment field in a viz drop zone, Platfora displays both the IN and NOT IN values.
However, when you use a segment field in a viz filter, page filter, or combined field, Platfora uses only
the IN values by default and filters out the NOT IN values.
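The following sketch models the difference between the two cases with plain Python data. The membership map, the function names, and the `keep` parameter are hypothetical; they illustrate the default behavior described above, not Platfora internals.

```python
def drop_zone_values(membership):
    """Values shown when the segment field is in a drop zone: both IN and NOT IN."""
    return sorted(set(membership.values()))


def apply_segment_filter(rows, membership, keep=("IN",)):
    """Default filter behavior: keep only rows whose member is IN the segment."""
    return [row for row in rows if membership.get(row["customer_id"]) in keep]


membership = {"c1": "IN", "c2": "NOT IN", "c3": "IN"}
orders = [
    {"customer_id": "c1", "amount": 30},
    {"customer_id": "c2", "amount": 50},
    {"customer_id": "c3", "amount": 20},
]
filtered = apply_segment_filter(orders, membership)
```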
Can I use a segment field in a combined field?
Yes. When you put multiple segment fields into a combined field, you can use that in a viz to perform
side by side comparisons.
Why would I want to use a segment?
You can easily compare behaviors among different segments in your population. You can create
multiple segments and then use those segment fields in a single viz to compare the results.
You can use segments to compare behaviors of a group of individuals across multiple fact or event
datasets. Once a segment is created, it can be used in other lenses that use a dataset that is based on or
references the dimension dataset used in the segment. This allows you to join data between fact datasets
that otherwise can't be joined.
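As an illustration of why this works, the Python sketch below groups two unrelated fact datasets by the same segment. Neither fact dataset needs to know about the other; both only carry the dimension key, and the segment membership supplies the common grouping. The dataset names, keys, and measures are hypothetical, and rows whose key is missing from the membership map are treated as NOT IN in this sketch.

```python
from collections import defaultdict


def totals_by_segment(fact_rows, membership, measure):
    """Sum one measure from a fact dataset, grouped by segment value."""
    totals = defaultdict(int)
    for row in fact_rows:
        # Unknown keys fall into NOT IN (an assumption of this sketch).
        totals[membership.get(row["customer_id"], "NOT IN")] += row[measure]
    return dict(totals)


membership = {"c1": "IN", "c2": "NOT IN", "c3": "IN"}
purchases = [
    {"customer_id": "c1", "amount": 120},
    {"customer_id": "c2", "amount": 80},
    {"customer_id": "c3", "amount": 40},
]
support_tickets = [
    {"customer_id": "c1", "tickets": 1},
    {"customer_id": "c2", "tickets": 4},
]

revenue = totals_by_segment(purchases, membership, "amount")
tickets = totals_by_segment(support_tickets, membership, "tickets")
```

Placing both results side by side by segment value gives the kind of cross-dataset comparison described above.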
Where are segments located in the vizboard lens panel?
Segment fields are grouped together under the Segments group of a referenced dataset in the vizboard
lens panel. The Segments group only appears when the data administrator allows segment creation for
that reference in the lens.
I see segment fields in my viz field list that I didn't create. Where did they come from?
Any segment that someone else created for the referenced dataset is available to you if you have
been granted permission on the segment. These segments could have been created in a viz using the
same lens, or a different lens that references the same dataset as your lens.
What is the difference between the different types of segments I see in the vizboard lens panel?
When a segment is first created in a viz, it is an ad-hoc segment. Ad-hoc segments are created in a viz
and can be edited in a viz or from the Data Catalog page.
Someone editing a lens can also choose to include an ad-hoc segment natively as a field in the lens.
When a segment field is included in a lens, Platfora builds the segment values into the lens. Vizboard
users can still use the segment field in a viz as they do ad-hoc segments, but performance is better.
Ad-hoc segments appear in the lens panel as blue fields, and segments included in the lens appear as
black fields in the vizboard lens panel. (This works similarly to other fields. Vizboard computed fields
appear as blue fields, and fields included in the lens appear as black.)
Why can't I create a segment for a referenced dataset?
Data administrators who edit and build lenses can choose whether or not to allow vizboard users to
create and use ad-hoc segments. They can make this choice per reference in a lens. For example,
they might allow ad-hoc segments for the Arrival Airport reference, but not the Date or Time
references.
What happens if I edit a segment included in a lens?
When you edit and save a segment included in the lens, Platfora updates the segment members and then
changes the segment to an ad-hoc segment. It remains an ad-hoc segment for all users and all lenses that
have the segment. This allows vizboard users to work with the most recent members of the segment.
However, as soon as you rebuild a lens that includes the segment, the segment changes back from an
ad-hoc segment to a native segment field in that lens only. If you create a viz based on a lens that
included the segment but has not been rebuilt since the edit, the segment appears as an ad-hoc segment
(blue field in the vizboard lens panel).
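The rules above amount to a small state model per lens. The following Python class is a hypothetical sketch that encodes only the behavior described in this answer; the class, method, and lens names are illustrative and do not correspond to anything in Platfora.

```python
class SegmentStatus:
    """Hypothetical model of whether a segment is native or ad hoc in each lens.

    Only lenses that include the segment as a lens field are tracked.
    """

    def __init__(self, lenses_including_segment):
        # True means the segment values are built into that lens (native).
        self.native = {lens: True for lens in lenses_including_segment}

    def edit_and_save(self):
        # Editing the segment makes it ad hoc for all users and all lenses.
        for lens in self.native:
            self.native[lens] = False

    def rebuild(self, lens):
        # Rebuilding a lens makes the segment native again in that lens only.
        if lens in self.native:
            self.native[lens] = True

    def panel_color(self, lens):
        # Native segment fields are black in the lens panel; ad hoc ones are blue.
        return "black" if self.native.get(lens, False) else "blue"


status = SegmentStatus(["Flights", "Bookings"])
status.edit_and_save()
status.rebuild("Flights")
```

After these steps the segment is native (black) in the rebuilt Flights lens and still ad hoc (blue) in Bookings.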
How is the data in the segment updated?
The data in the segment is updated every time the segment lens is rebuilt. The segment lens is rebuilt
when someone changes the segment definition conditions, or when someone rebuilds the segment lens
from the Data Catalog > Segments page.
What happens if the viz or lens I created the segment from is deleted?
Once a segment is created, Platfora creates its own special type of lens behind the scenes to create
and populate the members of the segment. The segment does not rely on the original lens or viz it was
created from. However, if the original lens is deleted, you can no longer edit any segment made from it.
Why is the Segments group empty in the vizboard lens panel?
The Segments group in the lens panel always appears when ad-hoc segments are enabled for a
reference. However, if no segments have been created for the reference in any lens, then no segment
fields appear in the group. Platfora displays a warning icon explaining why no segments are listed.
There are a lot of segments in the vizboard lens panel. Is there an easier way to find what I need?
Yes! Choose Only Show Segments from the lens panel menu. This hides all other fields in the lens
panel until you decide to show them again.
How do I edit a segment definition?
You can edit a segment from a viz or the Data Catalog > Segments page. In a viz, click the
segment's contextual menu and choose Edit Field. You can edit a segment from the lens panel or when
it's being used in a viz drop zone.
How do I let another Platfora user use the segment I created?
When you create a segment, only you have permission on it by default. You can edit the segment
definition and grant other users permission. They can use it in a viz as long as they have permission on
the segment, and data access to the datasets used in the segment lens.
When editing the segment definition, click Permission Settings to grant permission to other users.
Create Segments
Create segments from a viz that uses a lens with a referenced dataset.
Segments created for a particular referenced dataset are available for any other viz that uses a lens with
that dataset.
You can create segments in any viz either from scratch or by selecting a single mark or funnel stage.
Create Segment from Chart Viz Lens
You can define a segment from scratch in any viz that uses a dataset referenced by another dataset.
1. Choose Add Segment from the lens Add menu.
2. Enter a name for the segment.
The name must be unique among segment names and dataset names. Because segments have no
description field, Platfora recommends using a very descriptive name that helps other users understand
the criteria for segment membership. This name appears in the list of available fields for a viz when
the segment is added to a viz.
Segment names cannot be changed later. Instead, you can create a copy of the
segment, use a different name for the copy, and delete the original segment.
3. In the Segment of field, choose a dimension dataset used in the lens that this segment is based on.
Platfora lists datasets that are referenced by another dataset.
Platfora displays the fact dataset used in this lens in the Occurring in Dataset field and the lens
used in this viz in the Origin Lens field.
4. In the Segment Conditions section, define a condition for membership in the segment. Choose a
field from the lens and the required values.
Platfora lists fields defined in the lens from the fact dataset and each referenced dataset. Vizboard
computed fields and segment fields are not listed.
5. (Optional) Click the icon to define additional conditions.
6. (Optional) In the Segment Value Label section, define the value labels for records that are
members and non-members of the segment. If no value labels are specified, the segment name is used
by default.
Platfora uses the text you enter here as the labels for the segment values when the segment is used in a
viz. For example, when you use the segment in the X-Axis drop zone, this text is used as the two values
displayed along the x-axis of the viz.
7. Click Save Segment.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an
available field in the Segments section under the reference it was created in. Segment field names are
blue so you can distinguish them from regular lens fields and segment fields that are included in the
lens. The spinning icon on the right side of the segment in the lens panel indicates Platfora is building
the special lens for the segment.
If the segment definition does not explicitly include a condition based on
the fact dataset, Platfora displays a message informing you of the implied
condition on the fact dataset. The implied condition means that the segment
only includes members that also appear in the fact dataset. You can save the
segment with the implied condition, or edit the segment to create your own
condition on the fact dataset.
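The effect of the implied condition can be sketched in a few lines of Python. This is an illustrative model only, not Platfora's implementation: the function name, the `key` parameter, and the sample data are hypothetical. When no fact-dataset condition is supplied, any fact row counts, so a dimension row qualifies only if its key appears at least once in the fact dataset.

```python
def segment_members(dimension_rows, fact_rows, dimension_condition,
                    fact_condition=None, key="customer_id"):
    """Return the keys of dimension rows that are IN the segment."""
    if fact_condition is None:
        # Implied condition: the member must appear at least once in the fact dataset.
        fact_condition = lambda row: True
    keys_with_matching_facts = {row[key] for row in fact_rows if fact_condition(row)}
    return {
        row[key]
        for row in dimension_rows
        if dimension_condition(row) and row[key] in keys_with_matching_facts
    }


customers = [{"customer_id": "c1", "age": 42}, {"customer_id": "c2", "age": 51}]
orders = [{"customer_id": "c1", "product": "Premium"}]

# c2 is over 30 but has no orders, so the implied condition excludes it.
older = segment_members(customers, orders, lambda r: r["age"] > 30)
premium = segment_members(customers, orders, lambda r: r["age"] > 30,
                          lambda r: r["product"] == "Premium")
```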
Create Segment from Mark Selection
Segments can be created from a single selected mark in a viz. Platfora determines the conditions for
the mark and configures the segment conditions for you. You can accept the conditions as is, or modify
them further.
You can't create a segment from a mark if a vizboard computed field or another
segment field is in any drop zone in the viz.
1. Select a single mark in a viz.
2. Choose Add Segment from the viz selection menu.
3. Enter a name for the segment.
The name must be unique among segment names and dataset names. Because segments have no
description field, Platfora recommends using a very descriptive name that helps other users understand
the criteria for segment membership. This name appears in the list of available fields for a viz when
the segment is added to a viz.
Segment names cannot be changed later. Instead, you can create a copy of the
segment, use a different name for the copy, and delete the original segment.
4. Verify the segment is being created in the dimension dataset you want. Change if necessary.
5. (Optional) Edit the pre-configured segment definition attributes, such as creating new conditions.
6. (Optional) In the Segment Value Label section, define the value labels for records that are
members and non-members of the segment. If no value labels are specified, the segment name is used
by default.
Platfora uses the text you enter here as the labels for the segment values when the segment is used in a
viz. For example, when you use the segment in the X-Axis drop zone, this text is used as the two values
displayed along the x-axis of the viz.
7. Click Save Segment.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an
available field. Segment field names are blue so you can distinguish them from regular lens fields.
If the segment definition does not explicitly include a condition based on
the fact dataset, Platfora displays a message informing you of the implied
condition on the fact dataset. The implied condition means that the segment
only includes members that also appear in the fact dataset. You can save the
segment with the implied condition, or edit the segment to create your own
condition on the fact dataset.
Create Segment from Funnel Stage
Segments can be created from a selected stage in a funnel viz. Platfora configures the segment attributes
for you, and they cannot be edited. When creating a segment from a stage, choose whether to include
records that meet the stage criteria, or records that meet the previous stage criteria, but not the current
stage's criteria (the records that dropped out at the current stage).
For example, if the previous stage contained 50% of the original records, and the current stage contains
20% of the previous stage records, then choosing Reached at least this stage results in 10% of
the original records (0.5 x 0.2) and choosing Reached the previous stage, but not this stage
results in 40% of the original records (0.5 x (1.0 - 0.2)).
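The arithmetic in this example can be checked with a short Python sketch. It uses exact fractions so that 50% and 20% give exactly 10% and 40% of the original records. The function name is hypothetical; only the two option labels come from the funnel segment dialog described here.

```python
from fractions import Fraction


def funnel_segment_share(previous_share, stage_conversion):
    """Share of the original records for each funnel-stage segment option.

    previous_share: fraction of the original records that reached the previous stage.
    stage_conversion: fraction of those records that also reached the current stage.
    """
    reached = previous_share * stage_conversion
    dropped = previous_share * (1 - stage_conversion)
    return {
        "Reached at least this stage": reached,
        "Reached the previous stage, but not this stage": dropped,
    }


shares = funnel_segment_share(Fraction(1, 2), Fraction(1, 5))
```

The two options always add up to the share that reached the previous stage, because every record at the previous stage either continued or dropped out.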
1. Select a single stage in a funnel viz.
2. Choose Add Segment from the viz selection menu.
3. Enter a name for the segment.
The name must be unique among segment names and dataset names. Because segments have no
description field, Platfora recommends using a very descriptive name that helps other users understand
the criteria for segment membership. This name appears in the list of available fields for a viz when
the segment is added to a viz.
Segment names cannot be changed later. Instead, you can create a copy of the
segment, use a different name for the copy, and delete the original segment.
4. Choose the segment membership criteria, either records that reached the selected stage or records that
reached the previous stage, but not the selected stage.
5. Click Create.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an
available field. Segment field names are blue so you can distinguish them from regular lens fields.
Chapter 19: Save Your Work in a Vizboard
It is best practice to save your work each time you add or change a visualization or page in a vizboard. Vizboards
are not auto-saved as you work. If you leave or reload the vizboard without saving your changes first, or if the
browser closes unexpectedly, any unsaved changes will be lost.
Topics:
• Manage Vizboard Versions
• Restore a Vizboard to a Previous Version
• Exit a Vizboard without Saving
• Using Undo and Redo in a Vizboard
• Duplicate a Vizboard
Whenever a vizboard has unsaved changes, the Save button is highlighted with an orange border. If
you try to navigate away from the vizboard without saving, you will be prompted to stay and save your
changes first. Leaving the vizboard without saving will discard any unsaved changes.
Each time you click Save, a version is added to the vizboard Restore menu.
Data Analysis and Visualization Guide - Save Your Work in a Vizboard
Manage Vizboard Versions
The Restore menu has a history of each time a vizboard was saved, ordered from most recent to oldest.
Versions are named by the date and time they were created, along with the name of the user who saved
the version. You cannot rename versions. You can only delete or restore versions.
About Vizboard Versions
A version is a snapshot of the vizboard and its contained pages and visualizations at a given point
in time. Note that the underlying lens data is not saved when you save a version, only the page and
visualization definitions.
The Restore menu keeps a history of each vizboard Save event. You can always go back to any point
in the timeline, and then work forward from that point.
A vizboard will always be opened to its last saved version. If multiple users are working on a vizboard at
the same time, the latest saved version may not necessarily be the last version that you saved. However,
you can always go back in the version timeline to recover your work.
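The version behavior can be modeled as an append-only list of snapshots. The following Python sketch is a hypothetical model, not Platfora code: the class and method names and the dictionary used as a "definition" are assumptions. It shows that each save appends a snapshot, restoring replaces the working copy, and opening always returns the latest saved version, whoever saved it.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class VizboardHistory:
    """Hypothetical model of vizboard versions: each Save appends a snapshot."""

    working_copy: dict
    versions: list = field(default_factory=list)  # oldest first

    def save(self, user, timestamp):
        # A version stores page and viz definitions only, not lens data.
        self.versions.append({"saved_by": user, "saved_at": timestamp,
                              "definition": copy.deepcopy(self.working_copy)})

    def restore(self, index):
        # Unsaved changes are discarded; work continues from the restored state.
        self.working_copy = copy.deepcopy(self.versions[index]["definition"])

    def delete(self, index):
        del self.versions[index]

    def open_latest(self):
        # A vizboard always opens to its last saved version.
        return copy.deepcopy(self.versions[-1]["definition"])

    def restore_menu(self):
        # Listed from most recent to oldest, named by time and user.
        return [(v["saved_at"], v["saved_by"]) for v in reversed(self.versions)]


board = VizboardHistory({"pages": ["Overview"]})
board.save("ana", "2015-06-01 09:00")
board.working_copy["pages"].append("Delays")
board.save("ben", "2015-06-01 10:30")
board.restore(0)
```

Restoring does not create a version by itself, so until you save again the vizboard still opens to the most recent saved version.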
Delete a Saved Version
The list of versions in the Restore menu will continue to grow each time you click Save. You should
periodically delete older versions from the list that you no longer need.
To delete a version:
1. Open the Restore menu.
2. Click the icon to the right of the version you want to delete.
3. Click Confirm.
Restore a Vizboard to a Previous Version
Restoring a vizboard to a previous version allows you to roll back the vizboard to a previously saved
state, then continue working from that point. If you want to keep the current state of the vizboard as
well, be sure to Save before restoring. Any unsaved changes will be discarded when you restore to a
previous version.
1. Select the version you want to restore from the Restore menu.
2. Click Confirm to roll back the vizboard to the selected version.
3. If the vizboard has unsaved changes, you will be prompted before reverting to the previous version.
Click Reload this Page to discard unsaved changes.
If you want to keep the current state of the vizboard, click Stay on Page. This allows you to Save
a new version before you restore.
Exit a Vizboard without Saving
When you close a vizboard without saving it first, Platfora asks whether or not you want to save the
vizboard before exiting.
To close a vizboard, just navigate off the page by clicking any other page in the top navigation header. If
you don't want to lose your work, Save the vizboard before you exit.
If you try to navigate away from a vizboard page without saving your work first, you will be prompted
to either Stay on Page (do not exit without saving first) or Leave Page (exit without saving). If
you want to save your work, click Stay on Page. Then click Save before leaving the page again.
Using Undo and Redo in a Vizboard
While working in an ad hoc analysis vizboard, you can undo and redo your changes to pages and
visualizations. A history of actions is kept for the current session only. Leaving the vizboard page clears
the action history.
Hovering your mouse over the Undo or Redo button displays a tooltip of the action to be undone or
redone.
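A session-scoped undo and redo history is commonly built from two stacks. The following Python sketch assumes that design; the class name, the tooltip wording, and the rule that a new action clears the redo history are assumptions for illustration, not documented Platfora behavior.

```python
class ActionHistory:
    """Hypothetical undo/redo history for one vizboard session."""

    def __init__(self):
        self.undo_stack = []
        self.redo_stack = []

    def do(self, description):
        self.undo_stack.append(description)
        self.redo_stack.clear()  # assumed: a new action invalidates redo

    def undo(self):
        action = self.undo_stack.pop()
        self.redo_stack.append(action)
        return action

    def redo(self):
        action = self.redo_stack.pop()
        self.undo_stack.append(action)
        return action

    def undo_tooltip(self):
        return f"Undo {self.undo_stack[-1]}" if self.undo_stack else None

    def redo_tooltip(self):
        return f"Redo {self.redo_stack[-1]}" if self.redo_stack else None

    def leave_vizboard(self):
        # The history is kept for the current session only.
        self.undo_stack.clear()
        self.redo_stack.clear()


history = ActionHistory()
history.do("Add viz")
history.do("Change mark type")
undone = history.undo()
```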
Duplicate a Vizboard
Save As makes a copy of the current vizboard and saves it as a new vizboard. Note that the underlying
lens data is not saved when you duplicate a vizboard, only the page and visualization definitions. Any
unsaved changes in the current vizboard will be saved in the duplicate copy, but not in the original
vizboard. The version history in the Restore menu is not carried forward to the new vizboard.
1. Select Save As from the vizboard Save menu.
2. Enter a name for the new vizboard and click Confirm.
Chapter 20: Trace the Data Lineage of Viz Fields
All fields in a viz originate from raw data in Hadoop files. However, the data may be processed and manipulated
by multiple datasets and computed fields before it appears in a viz. Analysts can trace data lineage through
Platfora lenses, datasets, all field types (computed, base, and measure), and data sources to the source files in
Hadoop.
Topics:
• Export Viz Data Lineage
• What Data Lineage Includes
• Interpret Data Lineage Levels
Lineage tells the analyst where data used in critical decisions came from and what was done to the data before it
was used in a viz. You might want to view data lineage to address any of the following questions:
• How can I reproduce this result? Sometimes data analysts need to port the data to a different system for
further analysis. Data lineage shows the order in which the actions are performed. Analysts can reproduce
these actions in the new system.
• Where did this data come from? Visualizations can be based on derived datasets that are built from other
datasets, that is, from the results of someone else's analysis. By viewing the data lineage, analysts can
prevent false positives.
• When was this data retrieved from Hadoop? Data lineage shows the timestamp of all objects in the field's
history, including the timestamp of the Hadoop source files.
Data Analysis and Visualization Guide - Trace the Data Lineage of Viz Fields
Export Viz Data Lineage
Exporting data lineage for all fields in a visualization to a JSON file allows data analysts to see where
the data in the fields ultimately came from and how it was manipulated in the process. The data
lineage report includes filters applied to fields used in the viz.
1. From the viz toolbar, click the export menu and select Download Lineage as JSON.
2. The JSON file (platforaData.json by default) is saved in the default downloads directory configured
for your browser.
What Data Lineage Includes
All fields in a visualization originate in the underlying lens. The lens, in turn, comes from a dataset
(or another lens). These are the field's parent objects. Data lineage includes more than a field's parent
objects; it also includes details about filters and expressions from those objects.
When applicable, the data lineage report shows the following types of information:
• Lens field names
• Reference field names
• Filter expressions
• Field expressions
• Lens names
• Dataset names
• Data source names
• Data source locations
• Lens build specific source file names including their paths
• Timestamps
When viewing data lineage for a lens field, Platfora lists that lens field and all parent objects (up to the
configured number of levels).
Interpret Data Lineage Levels
System administrators can configure how much of a data lineage to report. This configuration applies to
all data in Platfora's catalog. Administrators cannot configure lineage on individual catalog items.
You can view data lineage for a single field or all fields in a viz. To view a field's lineage, choose
Show Data Lineage from the field's menu. To view lineage for an entire viz, choose Export
Lineage as JSON from a visualization's menu.
When interpreting lineage, think of a field as the root of a tree. Each object that feeds into this field is a
branch. The lens itself is always the initial "branch" of a field. The simplest tree is always that of a base
field. For example, the Device ID tree has one branch (LENS: VOD Device) with two ancestors, the
VOD Device lens itself and its parent dataset, VOD Mobile. On any Platfora lineage, the last parent is
always the dataset.
A field can have multiple branches. For example, fields formed from aggregate functions or other
computation have multiple branches. Consider a Session Event Count field that results from the
MAX(Session Event Number) function. The Session Event Count lineage contains two
branches, the lens and the Session Event Number.
The Session Event Number branch is itself a computed field. As such, the branch has four ancestors:
Event Name, Device-Asset, Client Time, and the VOD Device lens.
A Platfora system administrator can configure the number of levels from the root field to the end of
an ancestor branch. When reading a report, each ancestor appears as a branch with subsequent levels
indented after that. For example, the Session Event Count field shows three levels to reach through
the Session Event Number to the dataset:
• Session Event Number
  • LENS: VOD Device
    • DATASET: VOD Mobile
By default, Platfora displays five levels of lineage on the graphical report and 10 levels in an exported
JSON file. System administrators can configure reporting levels on the System > Global Settings
page. Increasing the levels is only necessary if your data contains multiple derived datasets or computed
fields.
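The level counting described above can be sketched as a small tree walk. In this Python sketch the field itself is level 0 and each ancestor is one level deeper than the object it feeds; objects deeper than the configured limit are left out. The `LineageNode` class and `render` function are hypothetical, the tree mirrors the Session Event Count example, and this is not the structure of Platfora's exported JSON.

```python
from dataclasses import dataclass, field

DEFAULT_GRAPH_LEVELS = 5   # default for the graphical report
DEFAULT_JSON_LEVELS = 10   # default for an exported JSON file


@dataclass
class LineageNode:
    name: str
    ancestors: list = field(default_factory=list)


def render(node, max_levels, level=0):
    """Return one indented line per object, stopping after max_levels levels."""
    lines = ["    " * level + node.name]
    if level < max_levels:
        for ancestor in node.ancestors:
            lines.extend(render(ancestor, max_levels, level + 1))
    return lines


dataset = LineageNode("DATASET: VOD Mobile")
lens = LineageNode("LENS: VOD Device", [dataset])
session_event_number = LineageNode("Session Event Number", [
    LineageNode("Event Name"), LineageNode("Device-Asset"),
    LineageNode("Client Time"), lens])
session_event_count = LineageNode("Session Event Count", [lens, session_event_number])

print("\n".join(render(session_event_count, DEFAULT_GRAPH_LEVELS)))
```

Reaching the dataset through Session Event Number takes three levels, so a limit of 2 would cut that branch off at the lens.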
The following table lists the possible ancestors for each object type.
Object | Possible Ancestors
Lens or dataset field that is a base field | lens
Lens or dataset field that is a computed dimension field | lens, lens or dataset field
Lens or dataset field that is a measure (aggregate) field | lens, lens or dataset field
Lens or dataset field that is a referenced field | lens, lens or dataset field that is a referenced field, lens or dataset field that is the foreign key
Lens | dataset
Dataset that is a derived dataset | lens
Dataset that is not a derived dataset | data source
Data source | no parent
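The ancestor rules in the table above can also be written as a lookup, which makes it easy to check whether a chain of objects is a plausible path from a viz field back to its data source. This is a minimal Python sketch under stated assumptions: the type labels (such as "foreign key field") and the function name are hypothetical, and the lineage report itself is produced by Platfora, not by code like this.

```python
FIELD_TYPES = {"base field", "computed dimension field", "measure field", "referenced field"}

# Possible ancestors per object type, transcribed from the table.
# A plain "dataset" here means a dataset that is not a derived dataset.
POSSIBLE_ANCESTORS = {
    "base field": {"lens"},
    "computed dimension field": {"lens"} | FIELD_TYPES,
    "measure field": {"lens"} | FIELD_TYPES,
    "referenced field": {"lens", "referenced field", "foreign key field"},
    "lens": {"dataset", "derived dataset"},
    "derived dataset": {"lens"},
    "dataset": {"data source"},
    "data source": set(),  # no parent
}


def is_complete_lineage(path):
    """Check that each parent is allowed and the chain ends at a data source."""
    steps_ok = all(parent in POSSIBLE_ANCESTORS.get(child, set())
                   for child, parent in zip(path, path[1:]))
    return steps_ok and path[-1] == "data source"
```

A measure built on a computed field in a lens over a derived dataset, for example, passes through two lenses before reaching a data source.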
Chapter 21: Viz Example Gallery
You can create dozens of different types of charts depending on the types of fields placed in the viz drop zones.
This section gives samples of different chart types, including the types of fields required in the different drop
zones (the chart type recipe). If the default mark type doesn't create the desired chart type, change the mark type
from the Mark Type menu.
Topics:
• Axis Chart Viz Examples
• Non-Axis Chart Viz Examples
• Polar Chart Viz Examples
• GeoMap Viz Examples
• Cross-Tab Viz Examples
Axis Chart Viz Examples
The examples in this section demonstrate how to create different kinds of Chart visualizations that
display both an X and Y axis.
Data Analysis and Visualization Guide - Viz Example Gallery
Chart Type: Simple Bar
A bar chart is useful for comparing discrete categories of data. Bar charts can also be used for more
complex comparisons of data with grouped bar charts and stacked bar charts.
Table 2: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   |
Color     |
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Bars with Different Color Values
Table 3: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   |
Color     | Measure
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Stacked Bar
Table 4: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   |
Color     | Dimension
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Split Bar with Values
Table 5: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   | Dimension
Color     | Measure
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Bar with Variable Widths
Table 6: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   |
Color     |
Size      | Measure
Opacity   |
Shape     |
Labels    |
Chart Type: Point Plot
Table 7: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Measure
Details   |
Color     |
Size      |
Opacity   |
Shape     | Dimension
Labels    |
Chart Type: Scatter Plot
Table 8: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   | Dimension
Color     |
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Color Encoded Scatter Plot
Table 9: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   |
Color     | Dimension
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Bubble Chart
Table 10: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   | Dimension
Color     |
Size      | Measure
Opacity   |
Shape     |
Labels    |
Chart Type: Color Encoded Bubble Chart
Table 11: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   |
Color     | Dimension
Size      | Measure
Opacity   |
Shape     |
Labels    |
Chart Type: Gradient Grouped Scatter Plot
Table 12: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   | Dimension
Color     | Measure
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Shape Encoded Scatter Plot
Table 13: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Measure
Y-axis    | Measure
Details   |
Color     |
Size      |
Opacity   |
Shape     | Dimension
Labels    |
Chart Type: Heatmap
Table 14: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Dimension
Details   |
Color     | Measure
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Size Encoded Heatmap
Mark Type: Bar (not Auto)
Table 15: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Dimension
Details   |
Color     | Dimension
Size      | Measure
Opacity   |
Shape     |
Labels    |
Chart Type: Size Encoded Matrix
Mark Type: Bar (not Auto)
Table 16: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Dimension
Y-axis    | Dimension
Details   |
Color     |
Size      | Measure
Opacity   |
Shape     |
Labels    |
Chart Type: Line Chart
Table 17: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Date (Time-Series)
Y-axis    | Measure
Details   |
Color     |
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Multi-Series Line Chart
Table 18: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Date (Time-Series)
Y-axis    | Measure
Details   | Dimension
Color     |
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Color Encoded Multi-Series Line Chart
Table 19: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Date (Time-Series)
Y-axis    | Measure
Details   |
Color     | Dimension
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Variable Color Line Chart
Table 20: Chart Type Recipe
Drop Zone | Field Type
X-axis    | Date (Time-Series)
Y-axis    | Measure
Details   |
Color     | Measure
Size      |
Opacity   |
Shape     |
Labels    |
Chart Type: Variable Thickness Line Chart
Table 21: Chart Type Recipe
Drop Zone | Field Type
X-axis | Date (Time-Series)
Y-axis | Measure
Details | -
Color | -
Size | Measure
Opacity | -
Shape | -
Labels | -
Non-Axis Chart Viz Examples
The examples in this section demonstrate how to create different kinds of Chart visualizations that do
not have an X and Y axis.
Chart Type: Packed Bubbles
Mark Type: Point (not Auto)
Table 22: Chart Type Recipe
Drop Zone | Field Type
X-axis | -
Y-axis | -
Details | -
Color | -
Size | Measure
Opacity | -
Shape | -
Labels | Dimension
Chart Type: Packed Bubbles with Different Colors
Mark Type: Point (not Auto)
Table 23: Chart Type Recipe
Drop Zone | Field Type
X-axis | -
Y-axis | -
Details | -
Color | Dimension
Size | Measure
Opacity | -
Shape | -
Labels | Dimension
Chart Type: Text Gauge
Table 24: Chart Type Recipe
Drop Zone | Field Type
X-axis | -
Y-axis | -
Details | -
Color | -
Size | -
Opacity | -
Shape | -
Labels | Measure
Chart Type: Word Cloud
Table 25: Chart Type Recipe
Drop Zone | Field Type
X-axis | -
Y-axis | -
Details | -
Color | -
Size | Measure
Opacity | -
Shape | -
Labels | Dimension
Polar Chart Viz Examples
The examples in this section demonstrate how to create different kinds of Polar Chart visualizations.
Polar Chart Type: Donut
Table 26: Polar Chart Type Recipe
Drop Zone | Field Type
Angle | Measure
Details | -
Color | Dimension
Size | -
Opacity | -
Labels | -
Polar Chart Type: Size Encoded Donut
Table 27: Polar Chart Type Recipe
Drop Zone | Field Type
Angle | Measure
Details | -
Color | Dimension
Size | Measure
Opacity | -
Labels | -
Polar Chart Type: Pie
Table 28: Polar Chart Type Recipe
Drop Zone | Field Type
Angle | Measure
Details | -
Color | Dimension
Size | Maximize the size.
Opacity | -
Labels | -
GeoMap Viz Examples
The examples in this section demonstrate how to create different kinds of Geomap visualizations.
Chart Type: Simple Geo Map
Table 29: Chart Type Recipe
Drop Zone | Field Type
Geography | Location
Details | -
Color | -
Size | -
Opacity | -
Shape | -
Labels | -
Chart Type: Color-Encoded Geo Map
Table 30: Chart Type Recipe
Drop Zone | Field Type
Geography | Location
Details | -
Color | Measure
Size | -
Opacity | -
Shape | -
Labels | -
Chart Type: Size-Encoded Geo Map
Table 31: Chart Type Recipe
Drop Zone | Field Type
Geography | Location
Details | -
Color | -
Size | Measure
Opacity | -
Shape | -
Labels | -
Cross-Tab Viz Examples
The examples in this section demonstrate how to create different kinds of Cross-Tab visualizations.
Cross-Tab Type: Simple
Table 32: Chart Type Recipe
Drop Zone | Field Type
Columns | Measure, Measure
Rows | Dimension
Details | -
Color | -
Size | -
Opacity | -
Shape | -
Labels | -
Cross-Tab Type: With Dimensional Groupings
Table 33: Chart Type Recipe
Drop Zone | Field Type
Columns | Measure, Measure
Rows | Dimension, Dimension
Details | Dimension
Color | -
Size | -
Opacity | -
Shape | -
Labels | -
Cross-Tab Type: Show Totals
Columns setting: Show the total per column
Rows setting: Show the total per row
Table 34: Chart Type Recipe
Drop Zone | Field Type
Columns | Dimension, Measure
Rows | Dimension
Details | -
Color | -
Size | -
Opacity | -
Shape | -
Labels | -
Chapter 22: Platfora Expressions
Platfora comes with a powerful, flexible built-in expression language that you can use to transform, manipulate,
and query data. This section describes Platfora's expression language and how to use it to define
dataset computed fields, vizboard computed fields, measures, lens filters, and lens query statements.
Topics:
• Expression Building Blocks
• PARTITION Expressions and Event Series Processing (ESP)
• ROLLUP Measures and Window Expressions
• Computed Field Examples
• Troubleshoot Computed Field Errors
• Write a Lens Query
• FAQs - Expression Basics
• Platfora Expression Language Reference
Expression Building Blocks
This section explains the building blocks of an expression, and the general rules for constructing a valid
expression.
Functions in an Expression
Functions perform common data processing tasks. While not all expressions contain functions, most do.
This section describes basic concepts you need to know to use functions.
Function Inputs and Outputs
Functions take one or more input values and return an output value. Input values can be a literal value
or the name of a field that contains a value. In both cases, the function expects the input value to be a
particular data type such as STRING or INTEGER. For example, the CONCAT() function combines
STRING inputs and outputs a new STRING.
This example shows how to use the CONCAT() function to concatenate the values in the month, day,
and year fields separated by the literal forward slash character:
CONCAT(month,"/",day,"/",year)
A function's return value may be the same as its input type or it may be an entirely new data type. For
example, the TO_DATE() function takes a STRING as input, but outputs a DATETIME value. If a
function expects a STRING, but is passed another data type as input, the function returns an error.
Typically, functions are classified by what data type they take or what purpose they serve. For example,
CONCAT() is a string function and TO_DATE() is a data type conversion function. You'll find a
complete list of functions by type in Platfora's Expression Language Reference.
Nesting Functions
Functions can take other functions as arguments. For example, you can use the CONCAT() function as an
argument to the TO_DATE() function. The final result is a DATETIME value in the format 10/31/2014.
TO_DATE(CONCAT(month,"/",day,"/",year),"MM/dd/yyyy")
The nested function must return the correct data type. Because TO_DATE() expects STRING input and
CONCAT() returns a STRING, the nesting succeeds.
Only row functions allow nesting. Aggregate functions do not allow nested expressions as input.
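As a rough analogy only, nesting works like ordinary function composition: the inner call's return type must match what the outer call expects. The Python sketch below is not Platfora code. `concat` and `to_date` are hypothetical stand-ins for CONCAT() and TO_DATE(), the month, day, and year values are assumed to be integers, and `datetime.strptime` uses Python format directives (`%m/%d/%Y`) instead of Platfora's `MM/dd/yyyy`.

```python
from datetime import datetime


def concat(*parts: object) -> str:
    """Hypothetical stand-in for CONCAT(): joins every input as a string."""
    return "".join(str(p) for p in parts)


def to_date(text: str, fmt: str) -> datetime:
    """Hypothetical stand-in for TO_DATE(): takes a string, returns a datetime.

    Uses Python strptime directives, not Platfora's MM/dd/yyyy syntax.
    """
    if not isinstance(text, str):
        # Mirrors the rule that passing the wrong input type is an error.
        raise TypeError("to_date expects a string")
    return datetime.strptime(text, fmt)


# The inner call returns a string, which is exactly what the outer call expects.
month, day, year = 10, 31, 2014
result = to_date(concat(month, "/", day, "/", year), "%m/%d/%Y")
print(result.date())  # 2014-10-31
```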
Aggregate Functions versus Row Functions
Most functions process one value from one row at a time. These are called row functions. Aggregate
functions are a special class of functions. Unlike row functions, aggregate functions process the values
from multiple rows together into a single return value. Some examples of aggregate functions are:
• SUM()
• MIN()
• VARIANCE()
Aggregate functions are also special because you use them to define measures. Measures always return
numeric values that serve as the quantitative data in an analysis. Aggregate expressions are often referred
to as measure expressions in Platfora.
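The difference between the two classes can be sketched in Python: a row function produces one output per input row, while an aggregate function collapses many rows into one value. The `sales` field and its values are hypothetical, and the sketch uses population variance because this section does not say whether VARIANCE() computes a population or a sample variance.

```python
from statistics import pvariance

# Hypothetical input rows, each with a numeric "sales" field.
rows = [{"sales": 10.0}, {"sales": 20.0}, {"sales": 30.0}]

# Row-level transformation: one output value per input row.
doubled = [row["sales"] * 2 for row in rows]

# Aggregations: many rows in, one value out (compare SUM(), MIN(), VARIANCE()).
total = sum(row["sales"] for row in rows)
smallest = min(row["sales"] for row in rows)
variance = pvariance([row["sales"] for row in rows])  # population variance

print(doubled, total, smallest, variance)
```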
Limitations of Aggregation Functions
Unlike row functions, aggregate functions can only take simple expressions as input (such as field names
or literal values). Aggregate functions cannot take row functions as arguments. You also cannot use an
aggregate function as input into a row function. You cannot mix aggregate functions and row functions
together in one expression.
Finally, while you can build expressions in either the dataset or the vizboard, only the following aggregate
functions are allowed in vizboard computed field expressions:
• DISTINCT()
• MIN()
• MAX()
• ROLLUP
Operators in an Expression
Platfora has a number of built-in operators for doing arithmetic, logical, and comparison operations.
Often, you'll use operators to combine or compare values. The values can be literal values, field values,
or even other expressions.
Arithmetic Operators
Arithmetic operators perform basic math operations on two values of the same data type. For example,
you could calculate the gross profit margin percentage using the values of the total_revenue and
total_cost fields as follows:
((total_revenue - total_cost) / total_revenue) * 100
Or you can use the plus (+) operator to combine STRING values:
"Firstname" + " " + "Lastname"
You can use the plus (+) and minus (-) operators to add or subtract DATETIME values. The following
table lists the math operators:
Operator | Description | Example
+ | Addition | amount + 10 (add 10 to the value of the amount field)
- | Subtraction | amount - 10 (subtract 10 from the value of the amount field)
* | Multiplication | amount * 100 (multiply the value of the amount field by 100)
/ | Division | bytes / 1024 (divide the value of the bytes field by 1024 and return the quotient)
Comparison Operators
Comparison operators are used to define Boolean (true / false) expressions. They test how two values
compare, for example whether they are equal or whether one is greater than the other. Comparisons
return 1 for true and 0 for false. If the comparison is invalid, for example comparing a STRING to an
INTEGER, the comparison operator returns NULL.
For example, you could use comparison operators within a CASE expression:
CASE WHEN age <= 25 THEN "0-25"
WHEN age <= 50 THEN "26-50"
ELSE "over 50" END
This expression compares the value in the age field to a literal number value. If true, it returns the
appropriate STRING value.
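The 1/0/NULL rule and the CASE bucketing above can be sketched in Python. This illustrates the rules as described here, not Platfora's implementation: `None` stands in for NULL, `less_or_equal` and `age_bucket` are hypothetical helpers, and treating only numbers-with-numbers and strings-with-strings as valid comparisons is an assumption of the sketch.

```python
from typing import Optional


def less_or_equal(a: object, b: object) -> Optional[int]:
    """Return 1 for true, 0 for false, None (NULL) for an invalid comparison."""
    numeric = (int, float)
    same_kind = (isinstance(a, numeric) and isinstance(b, numeric)) or (
        isinstance(a, str) and isinstance(b, str)
    )
    if not same_kind:
        return None  # e.g. comparing a STRING to an INTEGER
    return 1 if a <= b else 0


def age_bucket(age: int) -> str:
    """Mirror of the CASE expression: each WHEN branch is taken only on 1 (true)."""
    if less_or_equal(age, 25) == 1:
        return "0-25"
    if less_or_equal(age, 50) == 1:
        return "26-50"
    return "over 50"  # ELSE branch; a NULL comparison also falls through here


print(age_bucket(18), age_bucket(40), age_bucket(65))
```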
You cannot use comparison operators to test for equality between DATETIME values. The following
table lists the comparison operators:
Operator | Meaning | Example Expression
= or == | Equal to | order_date = "12/22/2011"
> | Greater than | age > 18
!> | Not greater than | age !> 8
< | Less than | age < 30
!< | Not less than | age !< 12
>= | Greater than or equal to | age >= 20
<= | Less than or equal to | age <= 29
<> or != or ^= | Not equal to | age <> 30
Logical Operators
Logical operators are used in expressions to test for a condition. Logical operators are often used in
lens filters, CASE expressions, and PARTITION expressions. Filters test if a field or value meets some
condition. For example, this tests if a date falls between two other dates.
BETWEEN 2013-06-01 AND 2013-07-31
Logical operators are also used to construct WHERE clauses in Platfora's query language. The following
table lists the logical operators:
Operator | Meaning | Example Expression
AND | Test whether two conditions are true. | -
OR | Test whether either of two conditions is true. | -
value BETWEEN min_value AND max_value | Test whether a date or numeric value is within the min and max values (inclusive). | year BETWEEN 2000 AND 2012
IN(list) | Test whether a value is within a set. | product_type IN("tablet","phone","laptop")
LIKE("pattern") | Simple inclusive, case-insensitive character pattern matching. The * character matches any number of characters. The ? character matches exactly one character. | last_name LIKE("?utch*") matches Kutcher and hutch but not Krutcher or crutch; company_name LIKE("platfora") matches Platfora or platfora
value IS NULL | Check whether a field value or expression is null (empty). | ship_date IS NULL evaluates to true when the ship_date field is empty
NOT | Reverses the value of other operators. | year NOT BETWEEN 2000 AND 2012; first_name NOT LIKE("Jo?n*") excludes John, jonny, and Joann but not Jon; Date.Weekday NOT IN("Saturday","Sunday"); purchase_date IS NOT NULL evaluates to true when the purchase_date field is not empty
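To check the wildcard examples in the table, here is a small Python sketch that translates a LIKE pattern into a regular expression. It assumes the semantics stated in the table: `*` matches any number of characters, `?` matches exactly one, matching ignores case, and (as an interpretation of "inclusive") the whole value must match. `like_to_regex` and `like` are hypothetical helpers, not Platfora functions.

```python
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern into a compiled, case-insensitive regex."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any number of characters
        elif ch == "?":
            parts.append(".")  # exactly one character
        else:
            parts.append(re.escape(ch))  # everything else is literal
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def like(value: str, pattern: str) -> bool:
    """True when the entire value matches the pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


for name in ["Kutcher", "hutch", "Krutcher", "crutch"]:
    print(name, like(name, "?utch*"))
```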
Fields in an Expression
Expressions often operate on the values of a field. This section explains how to use field names in
expressions.
Referring to Fields in the Current Dataset
When you specify a field name in an expression, if the field name does not contain spaces or special
characters, you can simply refer to the field by its name. For example, the following expression sums the
values of the sales field:
SUM(sales)
Enclose field names with square brackets ([]) if they contain spaces, special characters, reserved
keywords (such as function names), or start with numeric characters. For example:
SUM([Sale Amount])
SUM([2013_data])
SUM([count])
If a field name contains a ] (closing square bracket), you must escape the closing square bracket by
doubling it ]]. So if the field name is:
Min([crs_flight_duration])
You enclose the entire field name in square brackets and escape the closing bracket that is part of the
actual field name:
[Min([crs_flight_duration]])]
If you are using the expression builder, it provides the correct escapes for you.
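The escaping rule is mechanical, so it can be expressed as a short function if you generate expressions outside the expression builder. This Python sketch covers only the bracket wrapping and `]` doubling described above; `quote_field_name` is a hypothetical helper, and deciding when brackets are required (spaces, special characters, keywords, leading digits) is left to the caller.

```python
def quote_field_name(name: str) -> str:
    """Wrap a field name in square brackets, doubling any literal ']'."""
    return "[" + name.replace("]", "]]") + "]"


print(quote_field_name("Sale Amount"))                 # [Sale Amount]
print(quote_field_name("Min([crs_flight_duration])"))  # [Min([crs_flight_duration]])]
```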
Field is a synonym for dataset column. The documentation uses the word field
because that is the terminology used in Platfora's user interface.
Use Dot Notation for Fields in a Referenced Dataset
Your expression might refer to a field in the focus dataset. (Focus dataset is simply the current dataset
you are working with.) You also might include a field in a referenced dataset. When including fields
in a referenced dataset, you must qualify the field name with the proper notation. The convention is
reference_name.field_name.
Don't confuse a reference name with the dataset name; they are not the same. When you create a
reference link in a dataset, you give that reference its own name. Use . (dot) notation to separate the two
components.
For example, consider the Airports dataset, which is referenced under the reference name Departure
Airport. To refer to the City field of the Departure Airport reference to the Airports dataset, you would
use the notation:
[Departure Airport].City
Just as with field names, you must escape reference names if they contain spaces, special characters,
reserved keywords (such as function names), or start with numeric characters.
Aggregate Functions and Fields in a Referenced Dataset
Aggregate functions can only operate on fields in the current focus dataset. You cannot directly calculate
a measure on a field belonging to a referenced dataset. For example, the following expression is not
allowed:
DISTINCT([Departure Airport].City)
Instead, use a two-step process to 'pull up' a referenced field into the current dataset. First, define a
Departure Airport City computed field whose expression is just the path to the referenced dataset field:
[Departure Airport].City
Then, you can use the interim Departure Airport City computed field as an argument to the aggregate
expression. For example:
DISTINCT([Departure Airport City])
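The two-step workaround can be pictured as first copying the referenced value onto each focus row, then aggregating that now-local column. The Python sketch below uses hypothetical flight and airport rows, and it assumes DISTINCT() returns a count of distinct values, which this section does not state explicitly.

```python
# Hypothetical reference dataset: Airports, keyed by airport code.
airports = {
    "SFO": {"City": "San Francisco"},
    "OAK": {"City": "Oakland"},
    "SJC": {"City": "San Jose"},
}

# Hypothetical focus dataset rows, each pointing at a departure airport.
flights = [
    {"flight": 1, "departure_airport": "SFO"},
    {"flight": 2, "departure_airport": "SFO"},
    {"flight": 3, "departure_airport": "OAK"},
]

# Step 1: a computed field that only follows the reference,
# analogous to [Departure Airport].City.
for row in flights:
    row["Departure Airport City"] = airports[row["departure_airport"]]["City"]

# Step 2: aggregate over the local field,
# analogous to DISTINCT([Departure Airport City]) (assumed to count distinct values).
distinct_cities = len({row["Departure Airport City"] for row in flights})
print(distinct_cities)  # 2
```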
Literal Values in an Expression
Sometimes you need to use a literal value in an expression, as opposed to a field value. How you specify
a literal value depends on its data type (text, numeric, or date). This section explains how to use literals
in expressions.
Literal STRING Values
To specify a literal or actual STRING value, enclose the value in double quotes ("). For example, this
expression converts the values of a gender field to the literal values of male, female, or unknown:
CASE WHEN gender="M" THEN "male" WHEN gender="F" THEN "female"
ELSE "unknown" END
To escape a literal quote within a literal value itself, double the literal quote character. For example:
CASE WHEN height="60""" THEN "5 feet" WHEN height="72""" THEN "6 feet"
ELSE "other" END
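If you generate expressions from code, the quote-doubling rule is easy to apply automatically. This Python sketch shows it; `string_literal` is a hypothetical helper, and it does not address the REGEX() escaping discussed below.

```python
def string_literal(value: str) -> str:
    """Quote a value as an expression STRING literal, doubling embedded quotes."""
    return '"' + value.replace('"', '""') + '"'


print(string_literal('60"'))   # "60"""
print(string_literal("male"))  # "male"
```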
The REGEX() function is a special case. In the REGEX() function, string expressions are also enclosed
in quotes. When a string expression contains literal quotes, double the literal quote character. For
example:
REGEX(height, "\d\'(\d)+""")
Literal DATE and DATETIME Values
To refer to a DATETIME value in a lens filter expression, the date format must be yyyy-MM-dd without
any enclosing quotation marks or other punctuation.
order_date BETWEEN 2012-12-01 AND 2012-12-31
To refer to a literal date value in a computed field expression, you must specify the format of the date
and time components using TO_DATE, which takes a string literal argument and a format string. For
example:
CASE WHEN order_date=TO_DATE("2013-01-01 00:00:59 PST","yyyy-MM-dd HH:mm:ss z")
THEN "free shipping" ELSE "standard shipping" END
Literal Numeric Values
For literal numeric values, you can just specify the number itself without any special escaping or
formatting. For example:
CASE WHEN is_married=1 THEN "married" WHEN is_married=0 THEN "not_married"
ELSE NULL END
PARTITION Expressions and Event Series Processing (ESP)
Computed fields that contain a PARTITION expression are considered event series processing (ESP)
computed fields. You can add ESP computed fields to Platfora datasets only (not vizboards).
Event series processing is also referred to as pattern matching or event correlation. Use event series
processing (ESP) to partition the rows of a dataset, order the rows sequentially (typically by a
timestamp), and search for matching patterns among the rows.
ESP fields evaluate multiple rows in the dataset, and output one value (or column) per row. You can use
the results of an ESP computed field in other expressions or (after lens build processing) in a viz.
How Event Series Processing Works
This section explains how event series processing works by walking you through a simple use of the
PARTITION expression.
This example uses some weblog page view data. Each row represents a page view at a given point in
time within a user session. Each session is unique and belongs to only one user. Users can have multiple
sessions. Within any session a user can visit any page one or more times.
SessionID | UserID | Timestamp | Page
2A | 2 | 3/4/13 2:02 AM | products.html
1A | 1 | 12/1/13 9:00 AM | home.html
1A | 1 | 12/1/13 9:10 AM | products.html
1A | 1 | 12/1/13 9:05 AM | company.html
1B | 1 | 3/1/13 9:40 PM | home.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
2A | 2 | 3/4/13 2:56 AM | checkout.html
1B | 1 | 3/1/13 9:45 PM | products.html
1A | 1 | 12/1/13 9:20 AM | checkout.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
1A | 1 | 12/1/13 9:15 AM | blogs.html
Consider the following partial PARTITION expression:
PARTITION BY SessionID
ORDER BY Timestamp
...
This partitions the rows by SessionID. Within each partition, the function orders the rows by
Timestamp in ascending order (the default order).
Suppose you wanted to find sessions where users traversed the pages in order from home.html to
products.html and then to the checkout.html page. To look for this page view pattern, you
complete the expression like this.
PARTITION BY SessionID
ORDER BY Timestamp
PATTERN (A,B,C)
DEFINE A AS Page = "home.html",
B AS Page = "products.html",
C AS Page = "checkout.html"
OUTPUT "TRUE"
The PATTERN clause describes the sequence, and the DEFINE clause assigns a condition to each PATTERN
element. This pattern says that there is a match whenever there are 3 consecutive rows that meet
criteria A, then B, then C. If the computed field containing this PARTITION expression was called
Path=home,product,checkout, you would get output that looks like this:
SessionID | UserID | Timestamp | Page | Path=home,product,checkout
1A | 1 | 12/1/13 9:00 AM | home.html | NULL
1A | 1 | 12/1/13 9:05 AM | company.html | NULL
1A | 1 | 12/1/13 9:10 AM | products.html | NULL
1A | 1 | 12/1/13 9:15 AM | blogs.html | NULL
1A | 1 | 12/1/13 9:20 AM | checkout.html | NULL
1B | 1 | 3/1/13 9:40 PM | home.html | NULL
1B | 1 | 3/1/13 9:45 PM | products.html | NULL
1B | 1 | 3/1/13 9:46 PM | checkout.html | TRUE
2A | 2 | 3/4/13 2:02 AM | products.html | NULL
2A | 2 | 3/4/13 2:20 AM | home.html | NULL
2A | 2 | 3/4/13 2:33 AM | blogs.html | NULL
2A | 2 | 3/4/13 2:56 AM | checkout.html | NULL
The lens build processing that happens to produce these results is as follows:
1. Partition (or group) the rows of the dataset by session.
2. Order the rows in each partition by time (in ascending order by default).
3. Evaluate the rows against each DEFINE clause and bind the row to the symbol where there is a
match.
4. Check if the PATTERN clause conditions are met in the specified order and frequency.
5. If the PATTERN criteria are met, output TRUE as the result value for the last row that caused the
pattern to be true. Write the output results to a new computed field: Path=home,product,checkout. If
a row does not cause the pattern to be true, output nothing (NULL).
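The five steps can be approximated in Python for this one pattern. This sketch assumes that `PATTERN (A,B,C)` means three consecutive rows in the ordered partition, as described above, and it handles only this fixed three-symbol pattern; it is not a general implementation of Platfora's pattern matcher. The rows are the ones shown in the result table above, and `path_flags` is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime

FMT = "%m/%d/%y %I:%M %p"  # parses timestamps such as "3/1/13 9:40 PM"

# (SessionID, UserID, Timestamp, Page) rows from the result table above.
rows = [
    ("2A", 2, "3/4/13 2:02 AM", "products.html"),
    ("1A", 1, "12/1/13 9:00 AM", "home.html"),
    ("1A", 1, "12/1/13 9:10 AM", "products.html"),
    ("1A", 1, "12/1/13 9:05 AM", "company.html"),
    ("1B", 1, "3/1/13 9:40 PM", "home.html"),
    ("1B", 1, "3/1/13 9:46 PM", "checkout.html"),
    ("2A", 2, "3/4/13 2:56 AM", "checkout.html"),
    ("1B", 1, "3/1/13 9:45 PM", "products.html"),
    ("1A", 1, "12/1/13 9:20 AM", "checkout.html"),
    ("2A", 2, "3/4/13 2:20 AM", "home.html"),
    ("2A", 2, "3/4/13 2:33 AM", "blogs.html"),
    ("1A", 1, "12/1/13 9:15 AM", "blogs.html"),
]

# DEFINE: one condition per pattern symbol.
DEFINE = {
    "A": lambda row: row[3] == "home.html",
    "B": lambda row: row[3] == "products.html",
    "C": lambda row: row[3] == "checkout.html",
}
PATTERN = ["A", "B", "C"]  # three consecutive rows, in this order


def path_flags(rows):
    """Map each row to "TRUE" if it completes the pattern, else None (NULL)."""
    # Step 1: partition the rows by SessionID.
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)

    flags = {}
    for partition in partitions.values():
        # Step 2: order each partition by Timestamp, ascending.
        partition.sort(key=lambda row: datetime.strptime(row[2], FMT))
        for i, row in enumerate(partition):
            # Steps 3-4: look back only; the current row must be the last
            # of len(PATTERN) consecutive rows matching A, then B, then C.
            start = i - len(PATTERN) + 1
            window = partition[start:i + 1] if start >= 0 else []
            matched = len(window) == len(PATTERN) and all(
                DEFINE[symbol](r) for symbol, r in zip(PATTERN, window)
            )
            # Step 5: output TRUE on the completing row, NULL elsewhere.
            flags[row] = "TRUE" if matched else None
    return flags


for row, flag in path_flags(rows).items():
    print(row[0], row[2], row[3], flag)
```

Only the 1B checkout row is flagged, because session 1A visits other pages between home.html and checkout.html, and session 2A never visits home.html, products.html, and checkout.html consecutively.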
Understand Pattern Match Processing Order
During lens processing, the build evaluates patterns row by row, starting from the partition's top row and
moving downward. A pattern match is evaluated based on the current row and any rows that come before it
(in terms of their position in the partition). The pattern match only looks back from the current row; it
does not look ahead to the next row in the partition.
Order processing is important to consider when you want to look for events that happened later
or next (chronologically speaking). With the default sort order (ascending), the build sorts rows
within a partition from oldest to most recent. This means that you can only pattern match backwards
chronologically (or look for events that happened previously in time).
For example, to answer a question such as "what page did a user visit before they visited the product
page?", the following expression would return the previous (chronologically) viewed page before the
product page:
PARTITION BY SessionID
ORDER BY Timestamp ASC
PATTERN (^product_page?,A)
DEFINE product_page AS Page = "products.html",
A AS TRUE
OUTPUT A.Page
If you want to pattern match forwards chronologically (or look for events that happened later in time),
you would specify DESC sort order in the ORDER BY clause of your PARTITION expression. For
example, to answer a question such as "what page did a user visit after they visited the product page?",
the following expression would return the next (chronologically) viewed page after the product page:
PARTITION BY SessionID
ORDER BY Timestamp DESC
PATTERN (^product_page?,A)
DEFINE product_page AS Page = "products.html",
A AS TRUE
OUTPUT A.Page
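The effect of sort direction on a look-back-only matcher can be shown without the PATTERN syntax. In this Python sketch, the hypothetical `previous_in_order` helper only ever reads the row before the current one. Sorting ascending yields the page viewed before the products page, and sorting descending turns the same look-back into the page viewed after it. It is a conceptual model of the ordering rule, not an implementation of `PATTERN (^product_page?,A)`, and it uses session 1A from the sample data.

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M %p"

# Session 1A from the sample weblog data: (Timestamp, Page).
session = [
    ("12/1/13 9:00 AM", "home.html"),
    ("12/1/13 9:05 AM", "company.html"),
    ("12/1/13 9:10 AM", "products.html"),
    ("12/1/13 9:15 AM", "blogs.html"),
    ("12/1/13 9:20 AM", "checkout.html"),
]


def previous_in_order(rows, target_page, descending=False):
    """Return the page on the row just before target_page in the chosen order."""
    ordered = sorted(
        rows, key=lambda r: datetime.strptime(r[0], FMT), reverse=descending
    )
    for i, (_, page) in enumerate(ordered):
        if page == target_page and i > 0:
            return ordered[i - 1][1]  # look back one row only
    return None


print(previous_in_order(session, "products.html"))                   # company.html
print(previous_in_order(session, "products.html", descending=True))  # blogs.html
```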
Understand Pattern Match Precedence
By default, pattern expressions are matched from left to right. The innermost parenthetical expressions
are evaluated first, and evaluation then moves outward.
For example, the pattern:
PATTERN (((A,B)|(C,D)),E)
Would evaluate differently than:
PATTERN (A,B|C,D,E)
Understand Regex-Style Quantifiers (Greedy and Reluctant)
The PATTERN clause can use regex-style quantifiers to denote the frequency of a match.
By default, quantifiers are greedy. This means that they match as many rows as possible. For example:
PATTERN (A*,B?)
Causes symbol A to match zero or more rows. Symbol B can match zero or one row.
Adding an additional question mark ? to a quantifier makes it reluctant. This means that the PATTERN
only matches to a row when the row cannot match to any other subsequent match criteria in the pattern.
For example:
PATTERN (A*?,B)
Causes symbol A to match zero or more rows, but only when symbol B does not produce a match. You
can use reluctant quantifiers to break ties when there is more than one possible match to the pattern.
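Python's `re` module follows the same greedy and reluctant quantifier conventions at the character level, so it can serve as an analogy. Here each character stands for one row that could satisfy either symbol A or symbol B. This only illustrates the quantifier semantics; Platfora matches rows against DEFINE conditions, not characters.

```python
import re

rows = "aaa"  # three rows, each of which could satisfy A or B

# Greedy: A* takes every row it can, so the optional B? is left with none.
greedy = re.match(r"(a*)(a?)", rows)
print(greedy.groups())  # ('aaa', '')

# Reluctant: A*? takes as few rows as possible, so B gets the first row.
reluctant = re.match(r"(a*?)(a)", rows)
print(reluctant.groups())  # ('', 'a')
```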
A quantifier applies to a single match criteria symbol only. You cannot apply quantifiers to parenthetical
expressions. For example, you cannot write ((A,B,C)*, D) to indicate that the asterisk quantifier
applies to the whole (A,B,C) expression.
Best Practices for Event Series Processing (ESP)
Event series processing (ESP) computed fields, unlike other computed fields, require advanced
processing during lens builds. This means they require more compute resources on your Hadoop cluster.
This section discusses what to consider when adding event series computed fields to your dataset
definitions, and the best practices when using this feature.
Use Helpful Field Names and Descriptions
In the Data Catalog and Vizboards areas of the Platfora application, event series computed fields
look just like any other dataset field. When defining event series computed fields, give them names and
descriptions that help users understand the field's purpose. This cues users on how to use a field in an
analysis.
For example, if describing an event series computed field that computes Next Page Viewed, it may be
helpful for users to know that this field is best used in conjunction with the Page field. Whatever the
current value is for the Page field, the Next Page Viewed field has the value of Page for the next click
record immediately following the current page.
Increase Partition Limit for Larger Event Series Processing Jobs
The global configuration property platfora.max.pattern.events sets the maximum number of
rows in a partition to evaluate for a pattern match. The default is one million rows.
If a partition exceeds this number of rows, the result of the PARTITION function is NULL for all the
rows that exceed the limit. For example, if you had an event series computed field that partitioned by
UserID and ordered by Timestamp, the build processes only the first million rows and ignores any rows
beyond that so the event series computed field is NULL for those rows.
If you are noticing a lot of default values in your lens data (for example: ‘January 1, 1970’ for dates or
‘NULL’ for strings), you may want to increase platfora.max.pattern.events so that all of the
rows are processed. Keep in mind that increasing this limit will consume more memory resources on the
Hadoop cluster during lens processing.
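The cutoff behavior can be sketched as follows. This Python sketch assumes that only the first `limit` rows of each ordered partition are evaluated and that every later row receives NULL (`None`), as described above. The limit of 3 is purely illustrative and stands in for the one-million-row default of platfora.max.pattern.events; `apply_event_limit` and `previous_page` are hypothetical helpers.

```python
from typing import Callable, List, Optional


def apply_event_limit(
    ordered_partition: List[str],
    evaluate: Callable[[List[str], int], Optional[str]],
    limit: int,
) -> List[Optional[str]]:
    """Evaluate rows up to `limit`; rows beyond it get None (NULL)."""
    evaluated = ordered_partition[:limit]  # rows past the limit are ignored
    results = [evaluate(evaluated, i) for i in range(len(evaluated))]
    results += [None] * (len(ordered_partition) - len(evaluated))
    return results


def previous_page(rows: List[str], i: int) -> Optional[str]:
    """Hypothetical ESP-style output: the page on the previous row."""
    return rows[i - 1] if i > 0 else None


pages = ["home.html", "products.html", "blogs.html", "checkout.html", "home.html"]
print(apply_event_limit(pages, previous_page, limit=3))
# [None, 'home.html', 'products.html', None, None]
```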
Filter Partitioning Fields to Restrict Lens Build Scope
Platfora cannot incrementally build lenses that include event series processing fields. Due to the nature
of pattern matching logic, lenses with ESP fields require full lens builds that scan all of a dataset's input
data. You can limit the scope of these lens builds and improve processing time by adding a lens filter on
a dataset partitioning field.
A dataset partitioning field is different from the partition criteria of the ESP field. For Hive data sources,
partitioning fields are defined on the data source by the Hive administrator. For HDFS or S3 data
sources, partitioning fields are defined in a Platfora dataset. If there are partitioning fields available in a
lens, the lens builder displays a special icon next to them.
Consider How Lens Filters Impact Event Series Processing Results
Lens builds always apply lens filters on dataset partitioning fields as the first step of a lens build. This
means a build excludes some source data before processing any computed field expressions. If your lens
includes both lens filters on partitioning fields and ESP computed fields, you should take this behavior
into consideration as it can change the results of PARTITION expressions, and ultimately, your analysis
conclusions.
For example, suppose you are analyzing web page visits by user on data from 2012 and 2013:
SessionID | UserID | Timestamp (partition field) | Page
1A | 1 | 12/1/12 9:00 AM | home.html
1A | 1 | 12/1/12 9:05 AM | company.html
1A | 1 | 12/1/12 9:10 AM | products.html
1A | 1 | 12/1/12 9:15 AM | blogs.html
1B | 1 | 3/1/13 9:40 PM | home.html
1B | 1 | 3/1/13 9:45 PM | products.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
2A | 2 | 3/4/13 2:02 AM | products.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
2A | 2 | 3/4/13 2:56 AM | checkout.html
Timestamp is a partitioning field and it has a filter that excludes 2012 sessions. Then, you create a
computed field with an event series PARTITION function that returns a user's first visit date. When the
lens builds, the PARTITION expression would process this filtered data:
SessionID | UserID | Timestamp | Page
1B | 1 | 3/1/13 9:40 PM | home.html
1B | 1 | 3/1/13 9:45 PM | products.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
2A | 2 | 3/4/13 2:02 AM | products.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
2A | 2 | 3/4/13 2:56 AM | checkout.html
As a result, the PARTITION expression would report that UserID 1 had a first visit date of 3/1/13, even
though the user's first visit was actually 12/1/12. This discrepancy results from the build processing the
lens filter on the partitioning field (Timestamp) before the event series processing field.
Lens filters on other, non-partitioning dataset fields are applied after event series
processing.
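The ordering effect can be reproduced with a small Python sketch that computes each user's earliest timestamp on all rows and then on only the rows that survive a Timestamp filter. The `year >= 2013` filter and the hypothetical `first_visit` helper are illustrative assumptions; the rows are user 1's page views from the example above.

```python
from datetime import datetime

FMT = "%m/%d/%y %I:%M %p"

# (UserID, Timestamp) rows for user 1 from the example above.
rows = [
    (1, "12/1/12 9:00 AM"), (1, "12/1/12 9:05 AM"),
    (1, "12/1/12 9:10 AM"), (1, "12/1/12 9:15 AM"),
    (1, "3/1/13 9:40 PM"), (1, "3/1/13 9:45 PM"), (1, "3/1/13 9:46 PM"),
]


def first_visit(rows):
    """Earliest timestamp per user, like a first-visit-date ESP field."""
    firsts = {}
    for user, ts in rows:
        when = datetime.strptime(ts, FMT)
        if user not in firsts or when < firsts[user]:
            firsts[user] = when
    return firsts


# The partitioning-field filter is applied first, as the lens build does.
filtered = [r for r in rows if datetime.strptime(r[1], FMT).year >= 2013]

print(first_visit(rows)[1].date())      # 2012-12-01
print(first_visit(filtered)[1].date())  # 2013-03-01
```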
ROLLUP Measures and Window Expressions
This section explains how to write ROLLUP and window expressions to calculate complex measures,
such as running totals, benchmark comparisons, rank ordering, percentiles, and so on.
Understand ROLLUP Measures
ROLLUP is a modifier to a measure (or aggregate) expression that allows you to operate on a subset of
rows within the overall result set of a query. Using ROLLUP you can build a frame around one or more
rows in a dataset or query result, and then compute an aggregate result in relation to that frame only.
The result of a ROLLUP expression is always a measure. However, instead of just doing a simple
aggregation, it does more complex aggregate processing over a specified set of rows (or marks in a viz).
If you are familiar with SQL, a ROLLUP expression in Platfora is equivalent to the OVER clause in SQL.
For example, this SQL statement:
SELECT SUM(distance) OVER (PARTITION BY departure_date)
would be equivalent to this ROLLUP expression in Platfora:
ROLLUP SUM(Distance) TO [Departure Date]
What is the difference between a measure and a ROLLUP measure?
A measure is the result of an aggregate function (such as SUM) applied to a group of input data rows. For
example, using the Flights tutorial data that comes with your Platfora installation, suppose you wanted
to calculate the total distance flown by an airline. You could create a measure called Distance(Sum) with
an aggregate expression such as this:
SUM(Distance)
The group of input records passed into this aggregate calculation is then determined by the dimension(s)
used in a visualization or lens query. Records that have the same dimension members are grouped
together in a single row, which then gets represented as a mark in a viz. For example, in this viz there is
one group or mark for each Carrier/Week combination in the input data.
A ROLLUP clause modifies another aggregate function to define additional partitioning, ordering, and
window frame criteria. Like a regular aggregate function, ROLLUP also computes aggregate values over
groups of input rows. However, a ROLLUP measure then partitions the overall rows returned by the
viz query into subsets or buckets, and then computes the aggregate expression separately within each
individual bucket.
A ROLLUP is useful when you want to compute an aggregation over a subset of rows (or marks)
independently of the overall result of the viz query. The ROLLUP function specifies how to partition the
subset of rows and how to compute the aggregation within that subset.
For example, suppose you wanted to calculate the percentage of all miles that were flown in a given
week. You could write a ROLLUP expression that calculates the percent of total distance within the
partition of a week (total distance for the week is 100%). The ROLLUP expression to define such a
calculation would look something like this:
100 * [Distance(Sum)] / ROLLUP [Distance(Sum)] TO ([Departure Date].Week)
When this ROLLUP expression is used in a viz, the group of input records passed into the aggregate
calculation is determined by the dimension(s) used in the viz (such as Carrier in this case); however, the
aggregation is calculated independently within each week. In this case, you can see the percentage that
each carrier contributed to the total distance flown in a given week.
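The following Python sketch illustrates the arithmetic behind this percent-of-total expression; it is not Platfora's implementation, and the flight records are hypothetical. Each Week/Carrier combination is first summed into one mark, then the ROLLUP denominator is recomputed separately inside each week partition, so each week's percentages add up to 100.

```python
from collections import defaultdict

# Hypothetical flight records: (week, carrier, distance)
flights = [
    ("2012-W40", "AA", 1200),
    ("2012-W40", "UA", 800),
    ("2012-W40", "AA", 500),
    ("2012-W41", "AA", 600),
    ("2012-W41", "UA", 1800),
]

def percent_of_weekly_total(rows):
    """Return {(week, carrier): percent of that week's total distance}."""
    # Regular aggregate: one group (mark) per Week/Carrier combination.
    per_mark = defaultdict(int)
    for week, carrier, distance in rows:
        per_mark[(week, carrier)] += distance
    # ROLLUP ... TO (week): re-aggregate the marks within each week partition.
    per_week = defaultdict(int)
    for (week, _carrier), total in per_mark.items():
        per_week[week] += total
    return {key: 100 * total / per_week[key[0]] for key, total in per_mark.items()}

result = percent_of_weekly_total(flights)
for key in sorted(result):
    print(key, round(result[key], 1))
```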
How to calculate a ROLLUP over an 'adaptive' partition
A ROLLUP expression can have fixed or adaptive partitioning criteria. When you define the ROLLUP
measure expression, the TO clause of the expression specifies how to partition the data. You can either
specify an exact field name (fixed), a reference field name (adaptive), or no field name at all (adaptive).
In the previous example, the ROLLUP expression used a fixed partition of [Departure Date].Week.
If we changed the partition criteria to use just [Departure Date] (a reference), the partition criteria
becomes adaptive to any field of that reference that is used in a viz. The expression to define an adaptive
date partition might look something like this:
100 * [Distance(Sum)] / ROLLUP [Distance(Sum)] TO ([Departure Date])
Since Departure Date is a reference that points to the Date dimension, the calculation dynamically
changes if you drill down from week to day in the viz. This expression can then be used to partition
by any granularity of Departure Date without having to rewrite the ROLLUP expression. The ROLLUP
expression adapts to any granularity of Departure Date used in a viz.
Understand ROLLUP Window Expressions
Adding an ORDER BY plus an optional RANGE or ROWS clause to a ROLLUP expression turns it into a
window expression. These clauses are used to specify an order inside of each partition, and a window
frame around all, one, or several rows over which to compute the aggregate calculation. The window
frame defines how to crop, shift, or fix the row set in relation to the position of the current row.
For example, suppose you wanted to calculate a cumulative total on a day-to-day basis. You could do
this by adding a window frame to your ROLLUP expression that orders the rows in each partition by
date (using the ORDER BY clause), and then sums up the current row and all the days that came
before it (using a ROWS UNBOUNDED PRECEDING clause). In the Flights tutorial data, an expression
that calculates a cumulative total of flights per day would look something like this:
ROLLUP [Total Records] TO () ORDER BY ([Departure Date].Date) ROWS UNBOUNDED PRECEDING
When this ROLLUP expression is used in a viz, the Total Records measure is computed cumulatively
by day for each partition group (the Date and Cancel Status dimensions in this case). The result shows
the progression of cancelled flights in the month of October 2012 and reveals unusual growth
patterns in the data, such as the dramatic spike in cancellations at the end of the month.
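As a rough illustration of ROWS UNBOUNDED PRECEDING, this Python sketch sorts each partition by date and replaces every row's count with the sum of that row and all earlier rows in the same partition. The daily counts are made up, and partitioning by cancel status alone is an assumption chosen to keep the example small.

```python
from collections import defaultdict
from itertools import accumulate

# Hypothetical daily counts: (date, cancel_status, total_records)
daily = [
    ("2012-10-02", "Cancelled", 3),
    ("2012-10-01", "Cancelled", 2),
    ("2012-10-01", "Not Cancelled", 40),
    ("2012-10-03", "Cancelled", 10),
    ("2012-10-02", "Not Cancelled", 38),
]

def running_totals(rows):
    """Return {(status, date): cumulative total through that date}."""
    partitions = defaultdict(list)
    for date, status, count in rows:
        partitions[status].append((date, count))
    result = {}
    for status, items in partitions.items():
        items.sort()  # ORDER BY date; ISO date strings sort chronologically
        running = accumulate(count for _date, count in items)
        # ROWS UNBOUNDED PRECEDING: current row plus every earlier row
        for (date, _count), total in zip(items, running):
            result[(status, date)] = total
    return result

totals = running_totals(daily)
print(totals)
```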
The RANK, DENSE_RANK, and NTILE functions are window-only functions: they can only be used in a
ROLLUP expression, and they always require an ordered set of rows (or window) over which to compute
their result.
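To show how these three window functions differ, here is a Python sketch over a single partition ordered by descending value. RANK and DENSE_RANK follow their usual SQL meanings; the NTILE bucket sizing (earlier buckets receive the extra rows) is an assumption based on standard SQL NTILE, not something this guide specifies. The sales values are hypothetical.

```python
def rank(values):
    """RANK over descending values: ties share a rank, then numbers are skipped."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]

def dense_rank(values):
    """DENSE_RANK over descending values: ties share a rank, no numbers skipped."""
    distinct = sorted(set(values), reverse=True)
    return [distinct.index(v) + 1 for v in values]

def ntile(values, n):
    """NTILE(n) over descending values: n buckets whose sizes differ by at most 1."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    size, extra = divmod(len(values), n)
    buckets = [0] * len(values)
    start = 0
    for bucket in range(1, n + 1):
        count = size + (1 if bucket <= extra else 0)  # earlier buckets take the extra rows
        for i in order[start:start + count]:
            buckets[i] = bucket
        start += count
    return buckets

sales = [500, 300, 500, 100, 200]  # hypothetical [Sales(Sum)] values in one partition
print(rank(sales))        # [1, 3, 1, 5, 4]
print(dense_rank(sales))  # [1, 2, 1, 4, 3]
print(ntile(sales, 2))    # [1, 1, 1, 2, 2]
```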
Computed Field Examples
This section contains examples of some common data processing tasks you can accomplish using
Platfora computed fields.
The Expression Language Reference has examples for all of the built-in functions that Platfora provides.
Finding and Replacing Values
You may have particular values in your data that you want to find and change to something else, or
reformat so that they are all consistent. For example, find values in a name field
where name values are formatted as firstname lastname and replace them with name values
formatted as lastname, firstname:
REGEX_REPLACE(name,"(.*) (.*)","$2, $1")
Or you may have field values that are not formatted exactly the same, and want to change them so that
like values can be grouped and sorted together. For example, change all profession_title field values that
contain the word "Retired" anywhere in the string to just be a value of "Retired":
REGEX_REPLACE(profession_title,".*(Retired).*","Retired")
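If you want to try these patterns outside Platfora, Python's re.sub behaves similarly for these two cases. One difference worth knowing: Python writes replacement backreferences as \1 and \2, whereas the REGEX_REPLACE examples use $1 and $2. The function names below are hypothetical.

```python
import re

def swap_name(name):
    """Like REGEX_REPLACE(name,"(.*) (.*)","$2, $1")."""
    return re.sub(r"(.*) (.*)", r"\2, \1", name)

def normalize_title(profession_title):
    """Like REGEX_REPLACE(profession_title,".*(Retired).*","Retired")."""
    return re.sub(r".*(Retired).*", "Retired", profession_title)

print(swap_name("Jane Doe"))                         # Doe, Jane
print(swap_name("Mary Ann Smith"))                   # Smith, Mary Ann (greedy first group)
print(normalize_title("Senior Engineer (Retired)"))  # Retired
print(normalize_title("Teacher"))                    # Teacher (no match, unchanged)
```

Note that the greedy first group means a middle name stays with the first name.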
Extracting Information from File Names and Directories
You may have a dataset where the information you need is not inside the source files, but in the Hadoop
file name or directory path, such as dates or server names.
Suppose your dataset is based on daily log files that are organized into directories by date, and each file
is named after the IP address of the server that produced it.
For example, the URI path to a log file produced by server 172.12.131.118 on July 4, 2012 is:
hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log
The following expression uses FILE_PATH() in combination with REGEX() and TO_DATE() to
create a date field from the date directory name:
TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")
And the following expression uses FILE_NAME() and REGEX() to extract the server IP address from
the file name:
REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")
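A Python sketch of the same two extractions, useful for checking the patterns against a sample path before saving the computed field. It uses re.fullmatch because the reference describes REGEX as a whole-string match, and it escapes the dots in the host name. The rsplit call only stands in for FILE_NAME(); it is not how Platfora obtains the name.

```python
import re
from datetime import datetime

uri = "hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log"

# Whole-string match, like REGEX(); dots in the host name are escaped here.
path_match = re.fullmatch(
    r"hdfs://myhdfs-server\.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log", uri
)
# The TO_DATE() format "yyyyMMdd" corresponds to "%Y%m%d" in strptime.
log_date = datetime.strptime(path_match.group(1), "%Y%m%d")

file_name = uri.rsplit("/", 1)[-1]  # stand-in for FILE_NAME()
ip_match = re.fullmatch(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log", file_name)
server_ip = ip_match.group(1)

print(log_date.date(), server_ip)  # 2012-07-04 172.12.131.118
```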
Extracting a Portion of Field Values
You may have field values where only part of the value contains useful information. You can pull out a
portion of a field value to define a new field. For example, suppose you had an email_address field with
values in the format of username@provider.com, and you wanted to extract just the provider portion
of the email address:
REGEX(email_address,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")
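The same pattern can be tested in Python to see what the capturing group returns. Returning None when nothing matches is this sketch's stand-in for NULL; it is an assumption, not documented Platfora behavior. The sample addresses are made up.

```python
import re

# Same pattern as the REGEX() example; group 1 is the provider portion.
EMAIL_PROVIDER = re.compile(r"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")

def email_provider(email_address):
    """Return the provider portion, or None if the address does not match."""
    match = EMAIL_PROVIDER.match(email_address)
    return match.group(1) if match else None

print(email_provider("jdoe@gmail.com"))        # gmail
print(email_provider("ops@mail.example.org"))  # mail.example
print(email_provider("not-an-email"))          # None
```

As the second call shows, subdomains stay in the captured provider because only the final dot-suffix is excluded.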
Renaming Field Values
Sometimes field values are not very user-friendly. For example, a Boolean field may have values of 0
and 1 that you want to change to more human-readable values.
CASE WHEN cancelled=0 THEN "Not Cancelled" WHEN cancelled=1 THEN
"Cancelled" ELSE NULL END
Deriving a New Field from Other Fields
You may want to combine the values of other fields to create a new field. For example, you could
combine a month, day, and year field into a single date field. This would then allow you to reference
Platfora's built-in Date dimension dataset.
TO_DATE(CONCAT(month,"/",day,"/",year),"MM/dd/yyyy")
You can also use the values of other fields to calculate a new value. For example, you could calculate a
gross profit margin percentage using the values of a revenue and cost field as follows:
((revenue - cost) / revenue) * 100
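Here is a small Python sketch of both derivations, using made-up values. Gross profit margin is conventionally measured against revenue; dividing the profit by cost instead gives markup, which is shown for comparison. The strptime format %m/%d/%Y plays the role of the "MM/dd/yyyy" pattern passed to TO_DATE.

```python
from datetime import datetime

def date_from_parts(month, day, year):
    """Like TO_DATE(CONCAT(month,"/",day,"/",year),"MM/dd/yyyy")."""
    return datetime.strptime(f"{month}/{day}/{year}", "%m/%d/%Y")

def gross_margin_pct(revenue, cost):
    """Gross profit margin: profit as a percentage of revenue."""
    return (revenue - cost) / revenue * 100

def markup_pct(revenue, cost):
    """Markup: profit as a percentage of cost, for comparison."""
    return (revenue - cost) / cost * 100

print(date_from_parts("07", "04", "2012"))  # 2012-07-04 00:00:00
print(gross_margin_pct(200, 150))           # 25.0
print(round(markup_pct(200, 150), 2))       # 33.33
```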
Cleansing and Casting Field Values
Sometimes the data values in a column need to be transformed and cast to another data type in order
to allow for further calculations on the data. For example, you might have some numeric data that you
want to use as a measure, but it has string values of "NA" to represent what should really be NULL
values. You could transform the "NA" values to NULL and then cast the column to a numeric data type.
TO_INT(CASE WHEN delay_minutes="NA" THEN NULL ELSE delay_minutes END)
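The cleanse-then-cast order matters: casting first would fail on "NA". This Python sketch mirrors the expression above, with None standing in for NULL. It simply raises on any other non-numeric string; how Platfora's TO_INT treats such values is not covered here. The sample values are hypothetical.

```python
def to_int_or_null(delay_minutes):
    """Like TO_INT(CASE WHEN delay_minutes="NA" THEN NULL ELSE delay_minutes END)."""
    if delay_minutes == "NA":
        return None  # None stands in for NULL
    return int(delay_minutes)  # other non-numeric strings raise ValueError here

raw = ["12", "NA", "0", "45"]
cleaned = [to_int_or_null(v) for v in raw]
print(cleaned)                                   # [12, None, 0, 45]
print(sum(v for v in cleaned if v is not None))  # 57
```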
Troubleshoot Computed Field Errors
When you create a computed field, Platfora catches any syntax errors in your expression when you try to
save the field. This section describes the most common causes of expression syntax errors.
Function Arguments Don't Match the Expected Data Type
Functions expect input arguments to be of a certain data type. When a function uses another field as its
input argument, and that field is not of the expected data type, you might see an error such as:
Function REGEX takes 2 arguments with types STRING, STRING, but one
argument of type INTEGER was provided.
Look at the function's arguments that appear in the error message and verify they are the proper data
types. If the argument is a field, you might need to change the data type of the base field or use a data
type conversion function to convert the argument to the expected data type within the expression itself.
See also: Functions in an Expression
Not Escaping Field or Dataset Names
Field and dataset names used in an expression must be enclosed in square brackets ([ ]) if they contain
spaces, special characters, reserved keywords, or start with numeric characters. When an expression
contains a field or dataset name that meets one of these criteria and is not enclosed in square brackets,
you might see an error such as:
Platfora expected the string `)', but instead received `F'.
TO_LONG(New Field)
Look at the bolded character in the expression to find the location of the error. Note the text that comes
after this position. If it is part of a field or dataset name, you need to enclose the name with square
brackets. To correct the expression in this example, use: TO_LONG([New Field])
See also: Escaping Spaces or Special Characters in Field and Dataset Names
Not Specifying the Full Path to Fields of a Referenced Dataset
Functions can use a field that is in a dataset referenced from the focus dataset. You must specify the field's
full path by including the referenced dataset's reference name. If you forget to use the full path, you might
see an error like:
Field not found: carrier_name
When you see the Field not found error, make sure the field is qualified with the reference name.
In this example, carrier_name is a field in a referenced dataset. The reference name in this example is
carriers. To correct this expression, use: carriers.carrier_name for the field name.
See also: Referring to Fields in a Referenced Dataset
Unenclosed Literal Strings
You can include a literal string value as a function argument, but it must be enclosed in double quotes
("). When an expression uses a literal string that isn't enclosed in double quotes, you might see an error
such as:
Field not found: Platfora
When you see the Field not found error, one possible cause is that the supposed field is meant to be a literal
string and needs to be enclosed in double quotes. To correct this expression, use "Platfora" for the
string.
See also: Literal Values in an Expression
Unescaped Special Characters
Field and dataset names may contain a right square bracket (]), but it must be preceded by another right
square bracket (]]). Literal strings may contain a double quote ("), but it must be preceded by another
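double quote (""). Suppose you want to concatenate two strings, one consisting of a double quote
followed by Hello, and one consisting of a space, world., and a closing double quote, to produce the
string "Hello world." with its double quotes included. The double quotes in each string are special
characters and must be escaped in the expression. If not, you might see an error like: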
Platfora expected the string `)', but instead received `H'.
CONCAT(""Hello", " world."")
Look at the bolded character in the expression to find the location of the error. To correct this error,
escape the double quotes with another double quote:
CONCAT("""Hello", " world.""")
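If you generate expressions from code, the two escaping rules above can be applied mechanically. The helpers below are a hypothetical Python sketch; they assume brackets are accepted even around names that do not strictly need them, as in the bracketed names used in the lens query example later in this section.

```python
def quote_field(name):
    """Enclose a field or dataset name in [ ], doubling any ] it contains."""
    return "[" + name.replace("]", "]]") + "]"

def quote_string(text):
    """Enclose a literal string in double quotes, doubling any " it contains."""
    return '"' + text.replace('"', '""') + '"'

first = '"Hello'     # a double quote followed by Hello
second = ' world."'  # a space, world., and a double quote
expression = "CONCAT(" + quote_string(first) + ", " + quote_string(second) + ")"
print(expression)                                   # CONCAT("""Hello", " world.""")
print("TO_LONG(" + quote_field("New Field") + ")")  # TO_LONG([New Field])
```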
Invalid Syntax
Functions have specific requirements, including required arguments and keywords. When an expression
is missing a keyword, you might see an error such as:
Platfora expected a string matching the regular expression
`(?i)\Qend\E', but instead received end of source.
CASE WHEN cancel_code=0 THEN "Not Cancelled" WHEN cancel_code=1 THEN
"Cancelled" ELSE NULL
Look at the bolded character in the expression to find the location of the error. In this example, it
expected the string END (indicated by (?i)\Qend\E), but instead it reached the end of the expression.
The CASE function requires the END keyword at the end of its syntax string. To correct this error, add
END to the end of the expression:
CASE WHEN cancel_code=0 THEN "Not Cancelled" WHEN cancel_code=1 THEN
"Cancelled" ELSE NULL END
See also: Expression Language Reference
Using Row and Aggregate Functions Together in the Same Expression
Aggregate functions (functions used to define measures) cannot use nested expressions as their input
arguments. Aggregate functions can only accept field names as input. You also cannot use an aggregate
expression as input to a row function expression. Aggregate functions and row functions cannot be
mixed together in one expression.
Write a Lens Query
Platfora includes a programmatic query access feature you can use to query a lens. This section
describes support for querying lenses using Platfora's lens query language and the REST API.
Platfora allows you to make a query against an aggregate lens in your Platfora instance. This feature is
not meant as an end-user feature. Rather, it is intended to allow you to write programs that issue SQL-like
queries to a Platfora lens. For example, you could write a simple command-line client for querying
a lens. Since programmatic query access is meant for use by programs rather than people, a caller makes
the queries through REST API calls.
A query consists of a SELECT statement with one or more optional clauses. The statement and its
clauses use the same expression language elements you encounter when building a computed field
expression and/or a lens filter expression.
[ DEFINE alias-name AS expression [ DEFINE ... ] ]
SELECT measure-field [ AS alias-name ] | measure-expression AS alias-name [ , {
dimension-field [ AS alias-name ] | row-expression AS alias-name } [ , ...] ]
FROM lens-name
[ WHERE filter-expression [ AND filter-expression ] ]
[ GROUP BY dimension-field [ , group-ordering ] ]
[ HAVING measure-filter-expression ]
For example, you could make a query like the following:
SELECT [device].[manufacturer], [user].[gender], [Num Users]
FROM bo_view2G_PSM
WHERE video.genre %3D "Action/Comedy"
AND user.gender !%3D "male"
GROUP BY [device].[manufacturer], [user].[gender]
Once you know the query structure, you make a REST call to the query endpoint. You can pass the
query as a parameter to a GET or as a JSON body to a POST.
https://hostname:port/api/v1/query?query="URL-encoded SELECT statement ..."
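A minimal Python sketch of building the GET URL, assuming the statement is percent-encoded (which turns = into %3D, as in the example query). The host and port are hypothetical placeholders. The sketch does not settle whether the endpoint expects the surrounding quotes shown in the URL template, and it does not cover authentication.

```python
from urllib.parse import quote, unquote

def lens_query_url(host, port, statement):
    """Build a GET URL for the query endpoint with the statement percent-encoded."""
    return f"https://{host}:{port}/api/v1/query?query={quote(statement, safe='')}"

statement = (
    'SELECT [device].[manufacturer], [user].[gender], [Num Users] '
    'FROM bo_view2G_PSM '
    'WHERE video.genre = "Action/Comedy" AND user.gender != "male" '
    'GROUP BY [device].[manufacturer], [user].[gender]'
)
url = lens_query_url("platfora.example.com", 8001, statement)  # hypothetical host/port
decoded = unquote(url.split("query=", 1)[1])  # round-trip check
print(url)
```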
Considerations for Using Programmatic Query Access
Here are some considerations to keep in mind when constructing lens queries:
• You can only query aggregate lenses. You cannot query event series lenses.
• Queries run against the currently built version of the lens.
• Queries that once worked can later fail because the underlying dataset or lens changed.
• You cannot do a SELECT * on a lens.
FAQs - Expression Basics
This section covers the basic concepts and common questions about the Platfora expression language.
What is an expression?
An expression computes or produces a value by combining fields (or columns), constant values,
operators, and functions. An expression outputs a value of a particular data type, such as numeric, string,
datetime, or Boolean (true/false) values. Simple expressions can be a single constant value, the values of
a given column or field, or a function call. You can use operators to join two or more simple expressions
into a complex expression.
How are expressions used in the Platfora application?
Platfora expressions allow you to select, process, transform, and manipulate data. Expressions are used
in several ways in the Platfora application:
• In Datasets, they are used to define computed fields and measures that operate on the raw source
data.
• In Lenses, they are used to define lens filters that limit the scope of raw data requested from Hadoop.
• In Vizboards, they are used to define computed fields that further manipulate the prepared data in a
lens.
• In the Lens Query Language via the REST API, they are used to programmatically access and
manipulate the prepared data in a lens from external applications or plugins.
What is the expression builder?
The expression builder helps you create computed field expressions in the Platfora application. It
shows the available fields in the dataset or lens you are working with, plus the list of Platfora's built-in
functions and statements. It validates your expressions for correct syntax, input data types, and so on.
You can also access the help to view correct syntax and examples for all of the built-in functions and
statements.
What is a computed field expression?
A computed field expression generates its values based on a calculation or condition, and returns a value
for each input row. Computed field expressions can contain values from other fields, constants,
mathematical operators, comparison operators, or built-in row functions.
What is a measure expression?
A measure expression generates its values as the result of an aggregate function. It takes input values
from multiple rows and returns a single aggregated value.
How are expressions used in programmatic lens queries?
Platfora's lens query language does not have a graphical user interface like the expression builder.
Instead, you can use the cURL command line, Chrome's Postman extension, or write your own plugin
extension to submit a SQL-like SELECT query statement through Platfora's REST API.
The lens query language makes use of expressions in its SELECT statement, DEFINE clause, WHERE
clause and HAVING clause.
Programmatic lens queries are subject to some of the same expression limitations as vizboard computed
fields, since they also operate on the pre-processed data in a lens.
Platfora Expression Language Reference
An expression computes or produces a value by combining field or column values, constant values,
operators, and functions. Platfora has a built-in expression language. You use the language's functions
and operators in dataset computed fields, vizboard computed fields, lens filters, and programmatic lens
queries.
Expression Quick Reference
An expression is a combination of columns (or fields), constant values, operators, and functions used
to evaluate, transform, or produce a value. Simple expressions can be combined to make more complex
expressions. This quick reference describes the functions and operators that can be used to write
expressions.
Platfora's built-in statements, functions and operators are divided into the following categories:
• Conditional and NULL Processing
• Event Series Processing
• String Processing
• Date and Time Processing
• URL Processing
• IP Address Processing
• Mathematical Processing
• Data Type Conversion
• Aggregation and Measure Processing
• ROLLUP and Window Calculations
• User Defined Functions
• Comparison Operators
• Logical Operators
• Arithmetic Operators
Conditional and NULL Processing
Conditional and NULL processing allows you to transform or manipulate data values based on certain
defined conditions. Conditional processing (CASE) can be done at either the dataset or vizboard level.
NULL processing (COALESCE and IS_VALID) is only applicable at the dataset level. During a lens
build, any NULL values in the source data are converted to default values, so lenses and vizboards have
no concept of NULL values.
CASE: evaluates each row in the dataset according to one or more input conditions, and outputs the
specified result when the input conditions are met.
Example: CASE WHEN gender = "M" THEN "Male" WHEN gender = "F" THEN "Female" ELSE "Unknown" END
COALESCE: returns the first valid value (NOT NULL value) from a comma-separated list of expressions.
Example: COALESCE(hourly_wage * 40 * 52, salary)
IS_VALID: returns 0 if the returned value is NULL, and 1 if the returned value is NOT NULL.
Example: IS_VALID(sale_amount)
Event Series Processing
Event series processing allows you to partition rows of input data, order the rows sequentially (typically
by a timestamp), and search for matching patterns in a set of rows. Computed fields that are defined
in a dataset using a PARTITION expression are considered event series processing computed fields.
Event series processing computed fields are processed differently than regular computed fields. Instead
of computing values from the input of a single row, they compute values from inputs of multiple rows
in the dataset. Event series processing computed fields can only be defined in the dataset, not in the
vizboard.
PACK_VALUES: returns multiple output values packed into a single string of key/value pairs separated
by the Platfora default key and pair separators; useful when the OUTPUT clause of a PARTITION
expression returns multiple output values.
Example: PACK_VALUES("ID",custid,"Age",age)
PARTITION: partitions the rows of a dataset, orders the rows sequentially (typically by a timestamp), and
searches for matching patterns in a set of rows.
Example: PARTITION BY SessionID ORDER BY Timestamp PATTERN (A,B,C) DEFINE A AS Page = "home.html",
B AS Page = "product.html", C AS Page = "checkout.html" OUTPUT "TRUE"
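The following Python sketch approximates what the PARTITION example does, under stated assumptions: each pattern symbol must match one row, the symbols must match consecutive rows in timestamp order, and the OUTPUT value is attached to the row that completes the match. The real PATTERN syntax may support more than this, and the session data is hypothetical.

```python
from collections import defaultdict

def match_pattern(rows, pattern):
    """rows: (session_id, timestamp, page) tuples; pattern: one predicate per symbol.

    Returns {(session_id, timestamp): "TRUE"} for each row that completes a match.
    """
    sessions = defaultdict(list)
    for session_id, ts, page in rows:  # PARTITION BY SessionID
        sessions[session_id].append((ts, page))
    output = {}
    n = len(pattern)
    for session_id, events in sessions.items():
        events.sort()  # ORDER BY Timestamp
        for start in range(len(events) - n + 1):
            window = events[start:start + n]
            if all(pred(page) for pred, (_ts, page) in zip(pattern, window)):
                output[(session_id, window[-1][0])] = "TRUE"  # OUTPUT "TRUE"
    return output

pattern = [
    lambda page: page == "home.html",      # A
    lambda page: page == "product.html",   # B
    lambda page: page == "checkout.html",  # C
]
rows = [
    ("s1", 1, "home.html"), ("s1", 2, "product.html"), ("s1", 3, "checkout.html"),
    ("s2", 1, "home.html"), ("s2", 2, "about.html"), ("s2", 3, "checkout.html"),
]
matches = match_pattern(rows, pattern)
print(matches)  # {('s1', 3): 'TRUE'}
```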
String Functions
String functions allow you to manipulate and transform textual data, such as combining string values or
extracting a portion of a string value.
ARRAY_CONTAINS: performs a whole string match against a string containing delimited values and returns
a 1 or 0 depending on whether or not the string contains the search value.
Example: ARRAY_CONTAINS(device,",","iPad")
CONCAT: concatenates (combines together) the results of multiple string expressions.
Example: CONCAT(month,"/",day,"/",year)
FILE_NAME: returns the original file name from the source file system.
Example: TO_DATE(SUBSTRING(FILE_NAME(),0,8),"yyyyMMdd")
FILE_PATH: returns the full URI path from the source file system.
Example: TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")
EXTRACT_COOKIE: extracts the value of the given cookie identifier from a semicolon-delimited list of
cookie key=value pairs.
Example: EXTRACT_COOKIE("SSID=ABC; vID=44","vID") returns 44
EXTRACT_VALUE: extracts the value for the given key from a string containing delimited key/value pairs.
Example: EXTRACT_VALUE("firstname;daria|lastname;hutch","lastname",";","|") returns hutch
INSTR: returns an integer indicating the position of a character within a string that is the first character
of the occurrence of a substring.
Example: INSTR(url,"http://",-1,1)
JAVA_STRING: returns the unescaped version of a Java unicode character escape sequence as a string
value.
Example: CASE WHEN currency == JAVA_STRING("\u00a5") THEN "yes" ELSE "no" END
JOIN_STRINGS: concatenates (combines together) the results of multiple string expressions with the
separator in between each non-null value.
Example: JOIN_STRINGS("/",month,day,year)
JSON_ARRAY_CONTAINS: performs a whole string match against a string formatted as a JSON array and
returns a 1 or 0 depending on whether or not the string contains the search value.
Example: JSON_ARRAY_CONTAINS(software,"platfora")
JSON_DOUBLE: extracts a DOUBLE value from a field in a JSON object.
Example: JSON_DOUBLE(top_scores,"test_scores.2")
JSON_FIXED: extracts a FIXED value from a field in a JSON object.
Example: JSON_FIXED(top_scores,"test_scores.2")
JSON_INTEGER: extracts an INTEGER value from a field in a JSON object.
Example: JSON_INTEGER(top_scores,"test_scores.2")
JSON_LONG: extracts a LONG value from a field in a JSON object.
Example: JSON_LONG(top_scores,"test_scores.2")
JSON_STRING: extracts a STRING value from a field in a JSON object.
Example: JSON_STRING(misc,"hobbies.0")
LENGTH: returns the count of characters in a string value.
Example: LENGTH(name)
REGEX: performs a whole string match against a string value with a regular expression and returns the
portion of the string matching the first capturing group of the regular expression.
Example: REGEX(weblog.request_line,"GET\s/([a-zA-Z0-9._%-]+\.html)\sHTTP/[0-9.]+")
REGEX_REPLACE: evaluates a string value against a regular expression to determine if there is a match,
and replaces matched strings with the specified replacement value.
Example: REGEX_REPLACE(phone_number,"([0-9]{3})\.([[0-9]]{3})\.([[0-9]]{4})","\($1\) $2-$3")
SPLIT: breaks down a delimited input string into sections and returns the specified section of the string.
Example: SPLIT("Restaurants>Location>San Francisco",">", -1) returns San Francisco
SUBSTRING: returns the specified characters of a string value based on the given start and end position.
Example: SUBSTRING(name,0,1)
TO_LOWER: converts all alphabetic characters in a string to lower case.
Example: TO_LOWER("123 Main Street") returns 123 main street
TO_UPPER: converts all alphabetic characters in a string to upper case.
Example: TO_UPPER("123 Main Street") returns 123 MAIN STREET
TRIM: removes leading and trailing spaces from a string value.
Example: TRIM(area_code)
XPATH_STRING: takes an XML-formatted string and returns the first string matching the given XPath
expression.
Example: XPATH_STRING(address,"//address[@type='home']/zipcode")
XPATH_STRINGS: takes an XML-formatted string and returns a newline-separated array of strings matching
the given XPath expression.
Example: XPATH_STRINGS(address,"/list/address[1]/street")
XPATH_XML: takes an XML-formatted string and returns an XML-formatted string matching the given
XPath expression.
Example: XPATH_XML(address,"//address[last()]")
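To make the key/value lookups concrete, here is a Python sketch that reproduces the EXTRACT_VALUE and EXTRACT_COOKIE examples. The function names are hypothetical; returning None for a missing key and trimming spaces around cookie pairs are assumptions of this sketch, not documented behavior.

```python
def extract_value(text, key, kv_sep, pair_sep):
    """Sketch of EXTRACT_VALUE: return the value for key, or None if absent."""
    for pair in text.split(pair_sep):
        k, sep, v = pair.partition(kv_sep)
        if sep and k == key:
            return v
    return None

def extract_cookie(cookie_header, name):
    """Sketch of EXTRACT_COOKIE: look up name in semicolon-delimited key=value pairs."""
    for part in cookie_header.split(";"):
        k, sep, v = part.strip().partition("=")
        if sep and k == name:
            return v
    return None

print(extract_value("firstname;daria|lastname;hutch", "lastname", ";", "|"))  # hutch
print(extract_cookie("SSID=ABC; vID=44", "vID"))                              # 44
```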
Date and Time Functions
Date and time functions allow you to manipulate and transform datetime values, such as calculating time
differences between two datetime values, or extracting a portion of a datetime value.
DAYS_BETWEEN: calculates the whole number of days (ignoring time) between two DATETIME values.
Example: DAYS_BETWEEN(ship_date,order_date)
DATE_ADD: adds the specified time interval to a DATETIME value.
Example: DATE_ADD(invoice_date,45,"day")
HOURS_BETWEEN: calculates the whole number of hours (ignoring minutes, seconds, and milliseconds)
between two DATETIME values.
Example: HOURS_BETWEEN(NOW(),impressions.adview_timestam
EXTRACT: returns the specified portion of a DATETIME value.
Example: EXTRACT("hour",order_date)
MILLISECONDS_BETWEEN: calculates the whole number of milliseconds between two DATETIME values.
Example: MILLISECONDS_BETWEEN(request_timestamp,response_
MINUTES_BETWEEN: calculates the whole number of minutes (ignoring seconds and milliseconds) between
two DATETIME values.
Example: MINUTES_BETWEEN(impression_timestamp,conversion_t
NOW: returns the current system date and time as a DATETIME value.
Example: YEAR_DIFF(NOW(),users.birthdate)
SECONDS_BETWEEN: calculates the whole number of seconds (ignoring milliseconds) between two
DATETIME values.
Example: SECONDS_BETWEEN(impression_timestamp,conversion_
TRUNC: truncates a DATETIME value to the specified format.
Example: TRUNC(TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"),"day")
YEAR_DIFF: calculates the fractional number of years between two DATETIME values.
Example: YEAR_DIFF(NOW(),users.birthdate)
URL Functions
URL functions allow you to extract different portions of a URL string, and decode text that is URL-encoded.
URL_AUTHORITY: returns the authority portion of a URL string.
Example: URL_AUTHORITY("http://user:password@mycompany.com:8012/mypage.html") returns
user:password@mycompany.com:8012
URL_FRAGMENT: returns the fragment portion of a URL string.
Example: URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News") returns
Platfora%20News
URL_HOST: returns the host, domain, or IP address portion of a URL string.
Example: URL_HOST("http://user:password@mycompany.com:8012/mypage.html") returns mycompany.com
URL_PATH: returns the path portion of a URL string.
Example: URL_PATH("http://platfora.com/company/contact.html") returns /company/contact.html
URL_PORT: returns the port portion of a URL string.
Example: URL_PORT("http://user:password@mycompany.com:8012/mypage.html") returns 8012
URL_PROTOCOL: returns the protocol (or URI scheme name) portion of a URL string.
Example: URL_PROTOCOL("http://www.platfora.com") returns http
URL_QUERY: returns the query portion of a URL string.
Example: URL_QUERY("http://platfora.com/news.php?topic=press&timeframe=today") returns
topic=press&timeframe=today
URLDECODE: decodes a string that has been encoded with the application/x-www-form-urlencoded media
type.
Example: URLDECODE("N%2FA%20or%20%22not%20applicable%22") returns N/A or "not applicable"
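For a quick cross-check of these URL portions, Python's urllib.parse splits URLs along the same lines. This is an analog, not Platfora's implementation, so edge cases may differ. The user:password credential is a placeholder.

```python
from urllib.parse import urlsplit, unquote_plus

parts = urlsplit("http://user:password@mycompany.com:8012/mypage.html")
print(parts.netloc)    # user:password@mycompany.com:8012  (compare URL_AUTHORITY)
print(parts.hostname)  # mycompany.com                     (compare URL_HOST)
print(parts.port)      # 8012                              (compare URL_PORT)
print(parts.scheme)    # http                              (compare URL_PROTOCOL)
print(parts.path)      # /mypage.html                      (compare URL_PATH)

news = urlsplit("http://platfora.com/news.php?topic=press#Platfora%20News")
print(news.query)      # topic=press                       (compare URL_QUERY)
print(news.fragment)   # Platfora%20News                   (compare URL_FRAGMENT)

# URLDECODE analog for application/x-www-form-urlencoded text
decoded = unquote_plus("N%2FA%20or%20%22not%20applicable%22")
print(decoded)         # N/A or "not applicable"
```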
IP Address Functions
IP address functions allow you to manipulate and transform STRING data consisting of IP address
values.
CIDR_MATCH: compares two STRING arguments representing a CIDR mask and an IP address, and returns 1 if
the IP address falls within the specified subnet mask or 0 if it does not.
Example: CIDR_MATCH("60.145.56.0/24","60.145.56.246") returns 1
HEX_TO_IP: converts a hexadecimal-encoded STRING to a text representation of an IP address.
Example: HEX_TO_IP("AB20FE01") returns 171.32.254.1
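Both IP examples can be checked with Python's standard ipaddress module. The sketch below is an analog with hypothetical function names; strict=False is a choice made here so that a mask with host bits set is still accepted. The hex conversion reads each pair of hex digits as one octet (AB = 171, 20 = 32, FE = 254, 01 = 1).

```python
import ipaddress

def cidr_match(cidr, ip):
    """Return 1 if ip falls inside the cidr block, else 0 (like CIDR_MATCH)."""
    network = ipaddress.ip_network(cidr, strict=False)
    return 1 if ipaddress.ip_address(ip) in network else 0

def hex_to_ip(hex_string):
    """Convert an 8-digit hex string to dotted-quad IPv4 text (like HEX_TO_IP)."""
    return str(ipaddress.IPv4Address(int(hex_string, 16)))

print(cidr_match("60.145.56.0/24", "60.145.56.246"))  # 1
print(cidr_match("60.145.56.0/24", "60.145.57.1"))    # 0
print(hex_to_ip("AB20FE01"))                          # 171.32.254.1
```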
Math Functions
Math functions allow you to perform basic math calculations on numeric values. You can also use the
arithmetic operators to perform simple math calculations, such as addition, subtraction, division and
multiplication.
DIV: divides two LONG values and returns a quotient value of type LONG.
Example: DIV(TO_LONG(file_size),1024)
EXP: raises the mathematical constant e to the power (exponent) of a numeric value and returns a value
of type DOUBLE.
Example: EXP(Value)
FLOOR: returns the largest integer that is less than or equal to the input argument.
Example: FLOOR(32.6789) returns 32.0
HASH: evenly partitions data values into the specified number of buckets.
Example: HASH(username,20)
LN: returns the natural logarithm of a number.
Example: LN(2.718281828) returns 1
MOD: divides two LONG values and returns the remainder value of type LONG.
Example: MOD(TO_LONG(file_size),1024)
POW: raises a numeric value to the power (exponent) of another numeric value and returns a value of
type DOUBLE.
Example: 100 * (POW(end_value/start_value, 0.2) - 1)
ROUND: rounds a DOUBLE value to the specified number of decimal places.
Example: ROUND(32.4678954,2) returns 32.47
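A growth-rate calculation with POW is easy to get wrong by one set of parentheses, so here is a worked Python check with made-up values: a value growing 10% per period for 5 periods goes from 100 to 161.051. The 1 must be subtracted before multiplying by 100. The divmod line is a DIV/MOD analog that holds for non-negative sizes; Python floors negative quotients, which may differ from Platfora.

```python
def cagr_pct(start_value, end_value, periods=5):
    """Compound growth rate per period, in percent: 100 * ((end/start) ** (1/periods) - 1)."""
    return 100 * ((end_value / start_value) ** (1 / periods) - 1)

growth = cagr_pct(100, 161.051)  # 100 growing 10% per period for 5 periods
print(round(growth, 6))          # 10.0

wrong = 100 * (161.051 / 100) ** 0.2 - 1  # subtracting 1 after scaling by 100
print(round(wrong, 6))                    # 109.0

# DIV and MOD analogs for a non-negative file size
print(divmod(5000, 1024))        # (4, 904)
```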
Page 331
Data Analysis and Visualization Guide - Platfora Expressions
Data Type Conversion Functions
Data type conversion functions allow you to cast data values from one data type to another. These
functions are used implicitly whenever you set the data type of a field or column in the Platfora user
interface. The supported data types are: INTEGER, LONG, DOUBLE, FIXED, DATETIME, and STRING.
EPOCH_MS_TO_DATE: converts LONG values to DATETIME values, where the input number represents the
number of milliseconds since the epoch.
Example: EPOCH_MS_TO_DATE(1360260240000) returns 2013-02-07T18:04:00:000Z
TO_FIXED: converts STRING, INTEGER, LONG, or DOUBLE values to fixed-decimal values.
Example: TO_FIXED(opening_price)
TO_DATE: converts STRING values to DATETIME values, and specifies the format of the date and time
elements in the string.
Example: TO_DATE(order_date,"yyyy.MM.dd 'at' HH:mm:ss z")
TO_DOUBLE: converts STRING, INTEGER, LONG, or DOUBLE values to DOUBLE (decimal) values.
Example: TO_DOUBLE(average_rating)
TO_INT: converts STRING, INTEGER, LONG, or DOUBLE values to INTEGER (whole number) values.
Example: TO_INT(average_rating)
TO_LONG: converts STRING, INTEGER, LONG, or DOUBLE values to LONG (whole number) values.
Example: TO_LONG(average_rating)
TO_STRING: converts values of other data types to STRING (character) values.
Example: TO_STRING(sku_number)
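To sanity-check an EPOCH_MS_TO_DATE result, this Python sketch adds the millisecond offset to the Unix epoch in UTC. Using timedelta rather than float division keeps the arithmetic exact. It reproduces the reference example; Python prints the time zone as +00:00 rather than Z.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_ms_to_date(ms):
    """Milliseconds since 1970-01-01T00:00:00Z to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)

converted = epoch_ms_to_date(1360260240000)
print(converted.isoformat())  # 2013-02-07T18:04:00+00:00
```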
Aggregate Functions
An aggregate function groups the values of multiple rows together based on some defined input
expression. Aggregate functions return one value for a group of rows, and are only valid for defining
measures in Platfora. In the dataset, measures can be defined using any of the aggregate functions. In the
vizboard, only the DISTINCT, MAX, or MIN aggregate functions are allowed.
Function | Description | Example
AVG | returns the average of all valid numeric values | AVG(sale_amount)
COUNT | returns the number of rows in a dataset | COUNT(sales.customers)
COUNT_VALID | returns the number of rows for which the given expression is valid | COUNT_VALID(page_views)
DISTINCT | returns the number of distinct values for the given expression | DISTINCT(user_id)
MAX | returns the biggest value from the given input expression | MAX(sale_amount)
MIN | returns the smallest value from the given input expression | MIN(sale_amount)
SUM | returns the total of all values from the given input expression | SUM(sale_amount)
STDDEV | calculates the population standard deviation for a group of numeric values | STDDEV(sale_amount)
VARIANCE | calculates the population variance for a group of numeric values | VARIANCE(sale_amount)
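STDDEV and VARIANCE are described as population statistics, meaning the squared deviations are divided by the number of values N rather than N - 1. The Python sketch below shows that distinction with the standard statistics module. The sale_amount values are made up, and skipping None (standing in for NULL) mirrors the "valid numeric values" wording for AVG; treating STDDEV and VARIANCE the same way is an assumption.

```python
import statistics

# Hypothetical sale_amount values; None stands in for a NULL value.
sale_amount = [2, 4, 4, 4, 5, 5, 7, 9, None]

valid = [v for v in sale_amount if v is not None]  # keep only valid numeric values

avg = sum(valid) / len(valid)                 # AVG: 40 / 8 = 5.0
pop_variance = statistics.pvariance(valid)    # VARIANCE (population): divide by N
pop_stddev = statistics.pstdev(valid)         # STDDEV (population): sqrt of the above
sample_variance = statistics.variance(valid)  # for contrast only: divides by N - 1

print(avg, pop_variance, pop_stddev, sample_variance)
```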
ROLLUP and Window Functions
ROLLUP is a modifier to an aggregate expression that turns an aggregate into a windowed aggregate.
Window functions (RANK, DENSE_RANK, NTILE, and ROW_NUMBER) can only be used within a ROLLUP statement.
The ROLLUP statement defines the partitioning and ordering of a rowset before the associated aggregate
function or window function is applied.
ROLLUP defines a window or user-specified set of rows within a query result set. A window function
then computes a value for each row in the window. You can use window functions to compute
aggregated values such as moving averages, cumulative aggregates, running totals, or top-N-per-group
results.
ROLLUP statements can be specified in either the dataset or the vizboard. When using a ROLLUP in a
vizboard, the measure for which you are calculating the ROLLUP must already exist in the lens you are
using in the vizboard.
Function | Description | Example
DENSE_RANK | assigns the rank (position) of each row in a group (partition) of rows and does not skip rank numbers in the event of a tie | ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
NTILE | divides a partitioned group of rows into the specified number of buckets, and returns the bucket number to which the current row belongs | ROLLUP NTILE(100) TO () ORDER BY ([Total Records] DESC) ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
RANK | assigns the rank (position) of each row in a group (partition) of rows and skips rank numbers in the event of a tie | ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
ROLLUP | a modifier to an aggregate function that turns a regular aggregate function into a windowed, partitioned, or adaptive aggregate function | 100 * COUNT(Flights) / ROLLUP COUNT(Flights) TO ([Departure Date])
ROW_NUMBER | assigns a unique, sequential number to each row in a group (partition) of rows, starting with 1 for the first row in the specified order | ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING
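The difference between ROW_NUMBER, RANK, DENSE_RANK, and NTILE is easiest to see on a small example. This Python sketch applies each one to a single partition (the equivalent of TO ()) ordered by made-up sales totals in descending order. The helpers window_ranks and ntile are illustrative, and they follow the usual SQL definitions, including SQL's rule that NTILE makes the earlier buckets one row larger when rows do not divide evenly; whether Platfora distributes NTILE remainders the same way is an assumption.

```python
def window_ranks(values):
    """Return (value, row_number, rank, dense_rank) tuples, ordered DESC."""
    ordered = sorted(values, reverse=True)  # ORDER BY (... DESC)
    rows = []
    rank = dense_rank = 0
    previous = None
    for row_number, value in enumerate(ordered, start=1):
        if row_number == 1 or value != previous:
            rank = row_number  # RANK: ties share a rank, later numbers are skipped
            dense_rank += 1    # DENSE_RANK: ties share a rank, no numbers skipped
            previous = value
        rows.append((value, row_number, rank, dense_rank))
    return rows

def ntile(row_count, buckets):
    """Bucket number (1-based) for each of row_count ordered rows."""
    base, extra = divmod(row_count, buckets)
    result = []
    for bucket in range(1, buckets + 1):
        size = base + (1 if bucket <= extra else 0)  # earlier buckets absorb the remainder
        result.extend([bucket] * size)
    return result

for row in window_ranks([300, 500, 300, 100]):
    print(row)
print(ntile(5, 2))
```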
User Defined Functions
User defined functions (UDFs) allow you to define your own per-row processing logic, and then expose
that functionality to users in the Platfora application expression builder. See User Defined Functions
(UDFs) for more information.
Comparison Operators
Comparison operators are used to compare the equivalency of two expressions of the same data type.
The result of a comparison expression is a Boolean value (returns 1 for true, 0 for false, or NULL for
invalid). Boolean expressions are most often used to specify data processing conditions or filter criteria.
Operator Definitions
Operator | Meaning | Example Expression
= or == | Equal to | order_date = "12/22/2011"
> | Greater than | age > 18
!> | Not greater than | age !> 8
< | Less than | age < 30
!< | Not less than | age !< 12
>= | Greater than or equal to | age >= 20
<= | Less than or equal to | age <= 29
<> or != or ^= | Not equal to | age <> 30
If you are writing queries with REST and the query string includes an = (equal) character, you must
URL-encode it as %3D. Failure to encode the character can result in this error:
string matching regex `(?i)\Qnot\E\b' expected but end of source found.
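As a sketch of the encoding step, the Python standard library's urllib.parse.quote percent-encodes a query string before it is placed in a REST URL. The expression is only an example; how the encoded string is attached to a Platfora REST request (endpoint and parameter name) is not covered here.

```python
from urllib.parse import quote

expression = "age = 30"
# safe="" means no character is left unencoded, so '=' becomes %3D and spaces %20.
encoded = quote(expression, safe="")
print(encoded)  # age%20%3D%2030
```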
Logical Operators
Logical operators are used to define Boolean (true / false) expressions. Logical operators are used
in expressions to test for a condition, and return 1 if the condition is true or 0 if it is false. Logical
operators are often used in lens filters, CASE expressions, PARTITION expressions, and WHERE clauses
of queries.
Operator | Meaning | Example Expression
AND | Test whether two conditions are true. |
OR | Test whether either of two conditions is true. |
BETWEEN min_value AND max_value | Test whether a date or numeric value is within the min and max values (inclusive). | year BETWEEN 2000 AND 2012
IN(list) | Test whether a value is within a set. | product_type IN("tablet","phone","laptop")
LIKE("pattern") | Simple inclusive, case-insensitive character pattern matching. The * character matches any number of characters. The ? character matches exactly one character. | last_name LIKE("?utch*") matches Kutcher, hutch but not Krutcher or crutch; company_name LIKE("platfora") matches Platfora or platfora
value IS NULL | Check whether a field value or expression is null (empty). | ship_date IS NULL evaluates to true when the ship_date field is empty
NOT | Reverses the value of other operators. | year NOT BETWEEN 2000 AND 2012; first_name NOT LIKE("Jo?n*") excludes John, jonny, and Joann, but not Jon; Date.Weekday NOT IN("Saturday","Sunday"); purchase_date IS NOT NULL evaluates to true when the purchase_date field is not empty
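The LIKE wildcard rules can be approximated with a regular expression: * becomes .*, ? becomes a single-character wildcard, every other character is matched literally, and the whole value must match, ignoring case. This Python sketch (the like helper is illustrative) is an approximation for checking patterns, not Platfora's implementation; for example, it assumes there is no escape character for a literal * or ?.

```python
import re

def like(value: str, pattern: str) -> bool:
    """Approximate LIKE("pattern"): whole-value, case-insensitive match."""
    regex = "".join(
        ".*" if ch == "*" else "." if ch == "?" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.IGNORECASE | re.DOTALL) is not None

for name in ["Kutcher", "hutch", "Krutcher", "crutch"]:
    print(name, like(name, "?utch*"))
```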
Arithmetic Operators
Arithmetic operators perform basic math operations on two expressions of the same data type resulting
in a numeric value. The plus (+) and minus (-) operators can also be used to perform arithmetic
operations on DATETIME values.
Operator | Description | Example
+ | Addition | amount + 10 (add 10 to the value of the amount field)
- | Subtraction | amount - 10 (subtract 10 from the value of the amount field)
* | Multiplication | amount * 100 (multiply the value of the amount field by 100)
/ | Division | bytes / 1024 (divide the value of the bytes field by 1024 and return the quotient)
Conditional and NULL Processing
Conditional and NULL processing allows you to transform or manipulate data values based on certain
defined conditions. Conditional processing (CASE) can be done at either the dataset or vizboard level.
NULL processing (COALESCE and IS_VALID) is only applicable at the dataset level. During a lens
build, any NULL values in the source data are converted to default values, so lenses and vizboards have
no concept of NULL values.
CASE
CASE is a row function that evaluates each row in the dataset according to one or more input conditions,
and outputs the specified result when the input conditions are met.
CASE WHEN input_condition [AND|OR input_condition] THEN
output_expression [...] [ELSE other_output_expression] END
Returns one value per row of the same type as the output expression. All output expressions must return
the same data type.
If there are multiple output expressions that return different data types, then you will need to enclose
your entire CASE expression in one of the data type conversion functions to explicitly cast all output
values to a particular data type.
WHEN input_condition
Required. The WHEN keyword is used to specify one or more Boolean expressions (see Platfora's
supported conditional operators). If an input value meets the condition, then the output expression
is applied. Input conditions can include other row functions in their expression, but cannot contain
aggregate functions or measure expressions. You can use the AND or OR keywords to combine multiple
input conditions.
THEN output_expression
Required. The THEN keyword is used to specify an output expression when the specified conditions
are met. Output expressions can include other row functions in their expression, but cannot contain
aggregate functions or measure expressions.
ELSE other_output_expression
Optional. The ELSE keyword can be used to specify an alternate output expression to use when the
specified conditions are not met. If an ELSE expression is not supplied, ELSE NULL is the default.
END
Required. Denotes the end of CASE function processing.
Convert values in the age column into range-based groupings (binning):
CASE WHEN age <= 25 THEN "0-25" WHEN age <= 50 THEN "26-50" ELSE "over 50" END
Transform values in the gender column from one string to another:
CASE WHEN gender = "M" THEN "Male" WHEN gender = "F" THEN "Female" ELSE "Unknown" END
The vehicle column contains the following values: truck, bus, car, scooter, wagon, bike, tricycle, and
motorcycle. The following example converts multiple values in the vehicle column into a single value:
CASE WHEN vehicle IN("bike","scooter","motorcycle") THEN "two-wheelers" ELSE "other" END
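CASE evaluates its WHEN conditions in order and returns the output of the first one that is true, which is why each condition in the age example needs only an upper bound. The following Python sketch mirrors the age binning expression for one row at a time. The age_group helper is illustrative, None stands in for NULL, and the sketch assumes a NULL age fails every comparison and therefore falls through to ELSE.

```python
def age_group(age):
    """Mirror of the age-binning CASE expression for a single row."""
    if age is not None and age <= 25:  # WHEN age <= 25 THEN "0-25"
        return "0-25"
    if age is not None and age <= 50:  # WHEN age <= 50 THEN "26-50" (only reached if age > 25)
        return "26-50"
    return "over 50"                   # ELSE "over 50"

print([age_group(a) for a in [18, 25, 26, 50, 51]])
```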
COALESCE
COALESCE is a row function that returns the first valid value (NOT NULL value) from a comma-separated list of expressions.
COALESCE(expression[,expression][,...])
Returns one value per row of the same type as the first valid input expression.
expression
At least one required. A field name or expression.
The following example shows an expression to calculate employee yearly income for exempt employees
that have a salary and non-exempt employees that have an hourly_wage. This expression checks the
values of both fields for each row, and returns the value of the first expression that is valid (NOT NULL).
COALESCE(hourly_wage * 40 * 52, salary)
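The yearly income example can be traced row by row with a short Python sketch. The coalesce and yearly_income helpers are illustrative, None stands in for NULL, and the sketch assumes that arithmetic with a NULL operand (hourly_wage * 40 * 52 when hourly_wage is NULL) produces NULL, as in standard SQL.

```python
def coalesce(*values):
    """Return the first value that is not None (NULL), or None if all are."""
    for value in values:
        if value is not None:
            return value
    return None

def yearly_income(hourly_wage, salary):
    # Assumed NULL propagation: the product is NULL whenever hourly_wage is NULL.
    from_wage = None if hourly_wage is None else hourly_wage * 40 * 52
    return coalesce(from_wage, salary)

print(yearly_income(25, None))     # non-exempt: 25 * 40 * 52 = 52000
print(yearly_income(None, 85000))  # exempt: falls through to salary
```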
IS_VALID
IS_VALID is a row function that returns 0 if the returned value is NULL, and 1 if the returned value is
NOT NULL. This is useful for computing other calculations where you want to exclude NULL values
(such as when computing averages).
IS_VALID(expression)
Returns 0 if the returned value is NULL, and 1 if the returned value is NOT NULL.
expression
Required. A field name or expression.
Define a computed field using IS_VALID. It returns 1 for each row where the field value is NOT NULL and
0 for each row where it is NULL, so summing it counts only the rows that have a value. In this example,
we create a computed field (sale_amount_not_null) using the sale_amount field as the basis.
IS_VALID(sale_amount)
Then you can use the sale_amount_not_null computed field to calculate an accurate average for
sale_amount that excludes NULL values:
SUM(sale_amount)/SUM(sale_amount_not_null)
This is what happens automatically when you use the AVG function.
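The arithmetic behind this average can be checked with a few made-up values in Python. None stands in for NULL, and the sketch assumes SUM skips NULL values; the variable names mirror the computed field above but are otherwise illustrative.

```python
# Hypothetical sale_amount values; None stands in for NULL.
sale_amount = [120.0, None, 80.0, None, 100.0]

# IS_VALID(sale_amount) for each row: 1 if not NULL, else 0.
sale_amount_not_null = [0 if v is None else 1 for v in sale_amount]

total = sum(v for v in sale_amount if v is not None)  # SUM(sale_amount), NULLs skipped
valid_rows = sum(sale_amount_not_null)                # SUM(sale_amount_not_null)

print(total / valid_rows)        # 100.0: average over rows that have a value
print(total / len(sale_amount))  # 60.0: what dividing by every row would give
```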
Event Series Processing
Event series processing allows you to partition rows of input data, order the rows sequentially (typically
by a timestamp), and search for matching patterns in a set of rows. Computed fields that are defined
in a dataset using a PARTITION expression are considered event series processing computed fields.
Event series processing computed fields are processed differently than regular computed fields. Instead
of computing values from the input of a single row, they compute values from inputs of multiple rows
in the dataset. Event series processing computed fields can only be defined in the dataset - not in the
vizboard or a lens query.
PARTITION
PARTITION is an event series processing language that partitions the rows of a dataset, orders the
rows sequentially (typically by a timestamp), and searches for matching patterns in a set of rows.
Computed fields that are defined in a dataset using a PARTITION expression are considered event
series processing computed fields. Event series processing computed fields are processed differently
than regular computed fields. Instead of computing values from the input of a single row, they compute
values from inputs of multiple rows in the dataset.
The PARTITION function can only be used to define a computed field in the
dataset definition (pre-lens build). PARTITION cannot be used to define a
vizboard computed field. Unlike other expressions, PARTITION expressions
cannot be embedded within other functions or expressions - it must be a top-level
expression.
PARTITION BY field_name
ORDER BY field_name [ASC|DESC]
PATTERN (pattern_expression)
DEFINE symbol_1 AS filter_expression
[,symbol_n AS filter_expression ]
[, ...]
OUTPUT output_expression
To understand how event series processing works, we'll walk through a simple example of a
PARTITION expression.
This is a simple example of some weblog page view data. Each row represents a page view by a user at
a given point in time. Session IDs are used to group together page views that happened in the same user
session:
Suppose you wanted to know how many sessions included the path of page visits to ‘home.html’ then
‘product.html’ then ‘checkout.html’. You could define a PARTITION expression that groups the rows
by session, orders by time, and then iterates through the rows from top to bottom to find sessions that
match the pattern:
PARTITION BY SessionID
ORDER BY Timestamp
PATTERN (A,B,C)
DEFINE A AS Page = "home.html",
B AS Page = "product.html",
C AS Page = "checkout.html"
OUTPUT "TRUE"
1. The PARTITION BY clause partitions (or groups) the rows of the dataset by session.
2. Within each partition, the ORDER BY clause sorts the rows by time (in ascending order by default).
3. Each DEFINE clause specifies a condition used to evaluate a row, and binds that condition to a
symbol that is then used in the PATTERN clause.
4. The PATTERN clause checks if the conditions are met in the specified order and frequency. This
pattern says that there is a match whenever there are 3 consecutive rows that meet criteria A then B
then C.
5. For a row that satisfies all of the PATTERN criteria, the value of the OUTPUT clause is applied.
Otherwise the output is NULL for rows that don’t meet all of the PATTERN criteria.
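To make the five steps concrete, here is a Python sketch of the same logic over a few made-up page views. It assumes that a match is evaluated so that it ends at the current row and that the OUTPUT value ("TRUE") is assigned to that last row, with every other row receiving NULL (None); these are assumptions about Platfora's semantics. The find_abc helper is illustrative and handles only this fixed three-symbol concatenation, not general PATTERN syntax.

```python
from collections import defaultdict

# Hypothetical page views: (SessionID, Timestamp, Page)
page_views = [
    ("s1", 3, "checkout.html"),
    ("s1", 1, "home.html"),
    ("s1", 2, "product.html"),
    ("s2", 1, "home.html"),
    ("s2", 2, "checkout.html"),
]

# DEFINE A, B, C: one condition per symbol, in PATTERN (A,B,C) order.
conditions = [
    lambda page: page == "home.html",      # A
    lambda page: page == "product.html",   # B
    lambda page: page == "checkout.html",  # C
]

def find_abc(rows):
    partitions = defaultdict(list)
    for session_id, timestamp, page in rows:  # 1. PARTITION BY SessionID
        partitions[session_id].append((timestamp, page))
    output = {}
    for session_id, partition in partitions.items():
        partition.sort()  # 2. ORDER BY Timestamp (ascending)
        for i, (timestamp, _) in enumerate(partition):
            start = i - len(conditions) + 1  # 4. consecutive rows ending at this row
            window = partition[start:i + 1] if start >= 0 else []
            matched = len(window) == len(conditions) and all(
                cond(page) for cond, (_, page) in zip(conditions, window)  # 3. DEFINE
            )
            output[(session_id, timestamp)] = "TRUE" if matched else None  # 5. OUTPUT
    return output

result = find_abc(page_views)
print(result)
print(sum(1 for value in result.values() if value == "TRUE"))  # matching rows: 1
```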
Returns one value per row of the same type as the output_expression for rows that match the
defined match pattern, otherwise returns NULL for rows that do not match the pattern.
Output values are calculated during the lens build process using a special
event series MapReduce job. Therefore, sample output values for a PARTITION
computed field cannot be shown in the dataset workspace.
PARTITION BY field_name
Required. The PARTITION BY clause is used to specify a field in the current dataset by
which to partition the rows. Rows that share the same value for this field will be grouped
together, and each group will then be processed independently according to the matching
pattern criteria.
The partition field cannot be a field of a referenced dataset; it must be a field in
the current focus dataset.
ORDER BY field_name
Optional. The ORDER BY clause specifies a field by which to sort the rows within each
partition before applying the match pattern criteria. For event series processing, records are
typically ordered by a DATETIME type field, such as a date or a timestamp. The default
sort order is ascending (first to last or low to high).
The ordering field cannot be a field of a referenced dataset; it must be a field in
the current focus dataset.
PATTERN (pattern_expression)
Required. The PATTERN clause specifies the matching pattern to search for within a
partition of rows. The pattern_expression is expressed in a format similar to a regular
expression. The pattern_expression can include:
• A symbol that represents some match criteria (as declared in the DEFINE clause).
• A symbol followed by one of the following regex quantifiers:
? (matches once or not at all - greedy construct)
?? (matches once or not at all - reluctant construct)
* (matches zero or more times - greedy construct)
*? (matches zero or more times - reluctant construct)
+ (matches one or more times - greedy construct)
+? (matches one or more times - reluctant construct)
** (matches the empty sequence, or one or more of the quantified symbol, with gaps
allowed in between. The match need not begin or end with the quantified symbol)
*+ (matches the empty sequence, or one or more of the quantified symbol, with gaps
allowed in between. The match must end with the quantified symbol)
++ (matches the quantified symbol, followed by zero or more of the quantified symbol,
with gaps allowed in between. The match must end with the quantified symbol)
+* (matches the quantified symbol, followed by zero or more of the quantified symbol,
with gaps allowed in between. The match need not end with the quantified symbol)
• A symbol or pattern of symbols anchored by the regex special character for the beginning of a
string:
^ (marks the beginning of the set of rows that match to the pattern)
• patternA|patternB - The alternation operator (pipe symbol) between two symbols or
patterns signifies an OR match.
• patternA,patternB - The concatenation operator (comma) between two symbols or
patterns signifies a match when pattern B immediately follows pattern A.
• patternA->patternB - The follows operator (minus and greater-than sign) between
two symbols or patterns signifies a match when pattern B eventually follows pattern A.
• (pattern_expression) - By default, pattern expressions are matched from left to
right. If parentheses are used to group sub-expressions, the sub-expression within the
parentheses is evaluated first.
You cannot use quantifiers outside of parentheses. For example, you cannot write
((A,B,C)*) to indicate that the asterisk quantifier applies to the whole (A,B,C)
expression.
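One way to reason about the PATTERN operators is to compare them with ordinary regular expressions. The Python sketch below assumes, for simplicity, that each row in a partition has been labeled with the single symbol it satisfies, so a partition becomes a string such as "AXB". Under that assumption, concatenation (,) maps to adjacency, the follows operator (->) maps to .* (any gap), and alternation (|), parentheses, ^, and the standard greedy and reluctant quantifiers carry over. This is an analogy only: the helpers to_regex and matches are illustrative, and the sketch does not cover the gap-allowing quantifiers (**, *+, ++, +*), symbols whose conditions overlap, or Platfora's actual matching engine.

```python
import re

def to_regex(pattern: str) -> str:
    """Translate a tiny, single-letter-symbol subset of PATTERN syntax."""
    compact = pattern.replace(" ", "")
    # Replace "->" before "," so the follows operator is not mangled.
    return compact.replace("->", ".*").replace(",", "")

def matches(pattern: str, labeled_rows: str) -> bool:
    """True if the pattern matches somewhere in the labeled partition."""
    return re.search(to_regex(pattern), labeled_rows) is not None

print(matches("A,B", "AXB"))      # False: B must immediately follow A
print(matches("A->B", "AXB"))     # True: B eventually follows A
print(matches("^B", "AB"))        # False: ^ anchors the match to the first row
print(matches("(A|C),B", "CB"))   # True: alternation
```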
DEFINE symbol AS filter_expression
Required. The DEFINE clause is used to enumerate symbols used in the PATTERN clause
(or in the filter_expression of a subsequent symbol definition).
A symbol is a name used to refer to some pattern matching criteria. This can be any name
or token that follows Platfora's object naming rules. For example, if the name contains
spaces, special characters, keywords, or starts with a number, you must enclose the name
in brackets [] to escape it. Otherwise, this can be any logical name that helps you identify a
piece of pattern matching logic in your expression.
The filter_expression is a Boolean (true or false) expression that operates on each row of
the partition.
A filter_expression can contain:
• The special expression TRUE or 1, meaning allow the match to occur for any row in the
partition.
• Any field_name in the current dataset.
• symbol.field_name - A field from the dataset qualified by the name of a symbol
that (1) appears only once in the PATTERN clause, (2) precedes this symbol in the
PATTERN clause, and (3) is not followed by a repetition quantifier in the PATTERN
clause.
For example:
PATTERN (A, B) DEFINE A AS TRUE, B AS product = A.product
This means that the expression for symbol B will match to a row if the product field
for that row is also equal to the product field for the row that is bound to symbol A.
• Any of the comparison operators, such as greater than, less than, equals, and so on.
• The keywords AND or OR (for combining multiple criteria in a single filter expression)
• FIRST|LAST(symbol.field_name) - A field from the dataset, qualified by the name
of a symbol that (1) only appears once in the PATTERN clause, (2) precedes this symbol
in the PATTERN clause, and (3) is followed by a repetition quantifier in the PATTERN
clause (*,*?,+, or +?). This returns the field value for the first or last row when the
pattern matches to a set of rows.
For example:
PATTERN (A+) DEFINE A AS product = FIRST(A.product) OR COUNT(A)=0
The pattern A+ will match to a series of consecutive rows that all have the same value
for the product field as the first row in the sequence. If the current row happens to be
the first row in the sequence, then it will also be included in the match.
A FIRST or LAST expression evaluates to NULL if it refers to a
symbol that ends up matching an empty sequence. Make sure
your expression handles the row at the beginning or end of a
sequence if you want that row to match as well.
• Any computed expression that operates on the fields or expressions listed above and/or
on literal values.
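The FIRST(A.product) example can be traced by hand. Assuming that each maximal A+ match starts where the previous one ended (the COUNT(A)=0 branch lets a new match begin on any row), the pattern splits the ordered rows into runs of consecutive rows that share the product value of the run's first row. The Python sketch below shows that segmentation on made-up product values; the helper runs_matching_first is illustrative and is not Platfora's matcher.

```python
def runs_matching_first(values):
    """Split ordered values into runs whose rows equal the run's first value."""
    runs = []
    for value in values:
        if runs and value == runs[-1][0]:  # product = FIRST(A.product)
            runs[-1].append(value)         # row extends the current A+ match
        else:                              # COUNT(A)=0: a new match starts here
            runs.append([value])
    return runs

print(runs_matching_first(["shoes", "shoes", "hats", "hats", "hats", "shoes"]))
```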
OUTPUT output_expression
Required. An expression that specifies what the output value should be. The output
expression can refer to:
• The field declared in the PARTITION BY clause.
• symbol.field_name - A field from the dataset, qualified by the name of a symbol that
(1) appears only once in the PATTERN clause, and (2) is not followed by a repetition
quantifier in the PATTERN clause. This will output the matching field value.
• COUNT(symbol) where symbol (1) appears only once in the PATTERN clause, and
(2) is followed by a repetition quantifier in the PATTERN clause. This will output the
sequence number of the row that matched the symbol pattern.
• FIRST | LAST | SUM | COUNT | AVG(symbol.field_name) where symbol (1)
appears only once in the PATTERN clause, and (2) is followed by a repetition quantifier
in the PATTERN clause. This will output an aggregated value for a set of rows that
matched the symbol pattern.
• Since you can only output a single column value, you can use the PACK_VALUES
function to output multiple results in a single column as key/value pairs.
'Session Start Time' Expression
Calculate a user session by partitioning by user and ordering by time. The matching logic represented
by symbol A checks if the time of the current row is less than 30 minutes from the preceding row. If
it is, then it is considered part of the same session as the previous row. Otherwise, the current row is
considered the start of a new session. The PATTERN (A+) means that the matching logic represented
by symbol A must be true for one or more consecutive rows. The output then returns the time of the first
row in a session.
PARTITION BY UserID
ORDER BY Timestamp
PATTERN (A+)
DEFINE A AS COUNT(A)=0
OR MINUTES_BETWEEN(Timestamp,LAST(A.Timestamp)) < 30
OUTPUT FIRST(A.Timestamp)
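The same session logic can be sketched in Python as a single pass over each user's sorted rows. The 30-minute threshold comes from the expression above; the helper session_start_times and the sample events are illustrative. The sketch compares exact time differences, whereas MINUTES_BETWEEN may count whole minutes, so rows near the 30-minute boundary could be grouped differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_GAP = timedelta(minutes=30)

def session_start_times(events):
    """events: (UserID, Timestamp) pairs. Returns {(user, ts): session start}."""
    by_user = defaultdict(list)
    for user_id, timestamp in events:  # PARTITION BY UserID
        by_user[user_id].append(timestamp)
    starts = {}
    for user_id, stamps in by_user.items():
        stamps.sort()  # ORDER BY Timestamp
        session_start = previous = None
        for timestamp in stamps:
            # COUNT(A)=0 (first row), or the gap from LAST(A.Timestamp) is 30+ minutes
            if previous is None or timestamp - previous >= SESSION_GAP:
                session_start = timestamp  # a new session starts on this row
            starts[(user_id, timestamp)] = session_start  # OUTPUT FIRST(A.Timestamp)
            previous = timestamp
    return starts

t0 = datetime(2015, 6, 1, 9, 0)
events = [("u1", t0), ("u1", t0 + timedelta(minutes=10)), ("u1", t0 + timedelta(minutes=50))]
for key, start in sorted(session_start_times(events).items()):
    print(key[1].time(), "->", start.time())
```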
'Click Number in Session' Expression
Calculate where a click happened in a session by partitioning by session and ordering by time. The
matching logic represented by symbol A simply matches to any row in the session. The PATTERN (A+)
means that the matching logic represented by symbol A must be true for one or more consecutive
rows. The output then returns the count of the row within the partition (based on its order or position in
the partition).
PARTITION BY [Session ID]
ORDER BY Timestamp
PATTERN (A+)
DEFINE A AS TRUE
OUTPUT COUNT(A)
'Path to Page' Expression
This is a complicated expression that looks back from the current row's position to determine the
previous 4 pages viewed in a session. Since a PARTITION expression can only output one column value
as its result, the OUTPUT clause uses the PACK_VALUES function to return the previous page positions
1,2,3, and 4 in one output value. You can then use a series of EXTRACT_VALUE expressions to create
individual columns for each prior page view in the path.
PARTITION BY SessionID
ORDER BY Timestamp
PATTERN (^OtherPreviousPages*?, Page4Back??, Page3Back??, Page2Back??,
Page1Back??, CurrentPage)
DEFINE OtherPreviousPages AS TRUE,
Page4Back AS TRUE,
Page3Back AS TRUE,
Page2Back AS TRUE,
Page1Back AS TRUE,
CurrentPage AS TRUE
OUTPUT PACK_VALUES("Back4",Page4Back.Page, "Back3",Page3Back.Page,
"Back2",Page2Back.Page, "Back1",Page1Back.Page)
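Assuming the reluctant quantifiers resolve as intended (each PageNBack symbol takes the Nth previous row when one exists, and OtherPreviousPages absorbs everything earlier), the expression amounts to looking back up to four rows within the ordered session. The Python sketch below computes the same Back1 to Back4 values as a dictionary for each row and leaves out any position that does not exist, much as PACK_VALUES omits pairs whose value is NULL. The helper path_to_page and the session data are illustrative.

```python
from collections import defaultdict

def path_to_page(rows, depth=4):
    """rows: (SessionID, Timestamp, Page). Returns {(session, ts): {"Back1": ...}}."""
    sessions = defaultdict(list)
    for session_id, timestamp, page in rows:  # PARTITION BY SessionID
        sessions[session_id].append((timestamp, page))
    result = {}
    for session_id, views in sessions.items():
        views.sort()  # ORDER BY Timestamp
        for i, (timestamp, _) in enumerate(views):
            back = {}
            for n in range(1, depth + 1):  # Page1Back .. Page4Back
                if i - n >= 0:             # only if that earlier row exists
                    back[f"Back{n}"] = views[i - n][1]
            result[(session_id, timestamp)] = back
    return result

pages = ["home", "search", "product", "cart", "checkout", "done"]
views = [("s1", t, p) for t, p in enumerate(pages)]
print(path_to_page(views)[("s1", 5)])
```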
‘Page -1 Back’ Expression
Use the output from the Path to Page expression and extract the last page viewed before the current
page.
EXTRACT_VALUE([Path to Page],"Back1")
PACK_VALUES
PACK_VALUES is a row function that returns multiple output values packed into a single string of key/
value pairs separated by the Platfora default key and pair separators. This is useful when the OUTPUT
clause of a PARTITION expression returns multiple output values. The string returned is in a format that
can be read by the EXTRACT_VALUE function. PACK_VALUES uses the same key and pair separator
values that EXTRACT_VALUE uses (the Unicode characters \u0003 and \u0002, respectively).
PACK_VALUES(key_string,value_expression[,key_string,value_expression]
[,...])
Returns one value per row of type STRING. If the value for either key_string or
value_expression of a pair is null or contains either of the two separators, the full key/value pair is
omitted from the return value.
key_string
At least one required. A field name of any type, a literal string or number, or an expression that returns
any value.
value_expression
At least one required. A field name of any type, a literal string or number, or an expression that returns
any value. The expression must include one value_expression instance for each key_string instance.
Combine the values of the custid and age fields into a single string field.
PACK_VALUES("ID",custid,"Age",age)
The following expression returns ID\u00035555\u0002Age\u000329 when the value of the custid field is
5555 and the value of the age field is 29:
PACK_VALUES("ID",custid,"Age",age)
The following expression returns Age\u000329 when the value of the age field is 29:
PACK_VALUES("ID",NULL,"Age",age)
The following expression returns 29 as a STRING value when the age field is an INTEGER and its value
is 29:
EXTRACT_VALUE(PACK_VALUES("ID",custid,"Age",age),"Age")
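To show the packed format and how EXTRACT_VALUE reads it back, the Python sketch below reproduces the documented behavior: \u0003 between a key and its value, \u0002 between pairs, and a pair dropped when its key or value is NULL (None) or contains a separator. The helpers pack_values and extract_value are illustrative, and converting non-string values with str() is an assumption; Platfora's exact number formatting may differ.

```python
KEY_SEPARATOR = "\u0003"   # between a key and its value
PAIR_SEPARATOR = "\u0002"  # between key/value pairs

def pack_values(*args):
    """Sketch of PACK_VALUES(key, value[, key, value ...])."""
    if len(args) == 0 or len(args) % 2 != 0:
        raise ValueError("expected one or more key/value pairs")
    pairs = []
    for key, value in zip(args[0::2], args[1::2]):
        if key is None or value is None:
            continue  # NULL key or value: omit the pair
        key_text, value_text = str(key), str(value)
        if any(sep in text for text in (key_text, value_text)
               for sep in (KEY_SEPARATOR, PAIR_SEPARATOR)):
            continue  # contains a separator: omit the pair
        pairs.append(key_text + KEY_SEPARATOR + value_text)
    return PAIR_SEPARATOR.join(pairs)

def extract_value(packed, key):
    """Sketch of EXTRACT_VALUE: return the value stored under key, or None."""
    for pair in packed.split(PAIR_SEPARATOR):
        pair_key, found, pair_value = pair.partition(KEY_SEPARATOR)
        if found and pair_key == key:
            return pair_value
    return None

print(repr(pack_values("ID", 5555, "Age", 29)))
print(repr(pack_values("ID", None, "Age", 29)))
print(extract_value(pack_values("ID", 5555, "Age", 29), "Age"))  # 29
```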
You might want to use the PACK_VALUES function to combine multiple field values into a single value
in the OUTPUT clause of the PARTITION (event series processing) function. Then you can use the
EXTRACT_VALUE function in a different computed field in the dataset to get one of the values returned
by the PARTITION function. In the following example, the PARTITION function creates a set
of rows that defines the previous five web pages accessed in a particular user session:
PARTITION BY Session
ORDER BY Time DESC
PATTERN (A?, B?, C?, D?, E)
DEFINE A AS true, B AS true, C AS true, D AS true, E AS true
OUTPUT PACK_VALUES("A", A.Page, "B", B.Page, "C", C.Page, "D", D.Page)
String Functions
String functions allow you to manipulate and transform textual data, such as combining string values or
extracting a portion of a string value.
CONCAT
CONCAT is a row function that returns a string by concatenating (combining together) the results of
multiple string expressions.
CONCAT(value_expression[,value_expression][,...])
Returns one value per row of type STRING.
value_expression
At least one required. A field name of any type, a literal string or number, or an expression that returns
any value.
Combine the values of the month, day, and year fields into a single date field formatted as MM/DD/YYYY.
CONCAT(month,"/",day,"/",year)
ARRAY_CONTAINS
ARRAY_CONTAINS is a row function that performs a whole string match against a string containing
delimited values and returns a 1 or 0 depending on whether or not the string contains the search value.
ARRAY_CONTAINS(array_string,"delimiter","search_string")
Returns one value per row of type INTEGER. A return value of 1 indicates a positive match, and a return
value of 0 indicates no match.
array_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
array.
delimiter
Required. The delimiter used between values in the array string. This can be a name of a field or
expression of type STRING.
search_string
Required. The string that you want to search for. This can be a literal string, or the name of a field or
expression of type STRING.
If you had a device field that contained a comma-delimited list formatted like this:
Safari,iPad
You could determine whether or not the device used was an iPad using the following expression:
ARRAY_CONTAINS(device,",","iPad")
The following expressions return 1:
ARRAY_CONTAINS("platfora","|","platfora")
ARRAY_CONTAINS("platfora|hadoop|2.3","|","hadoop")
The following expressions return 0:
ARRAY_CONTAINS("platfora","|","plat")
ARRAY_CONTAINS("platfora,hadoop","|","platfora")
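ARRAY_CONTAINS compares whole elements, not substrings. A Python approximation splits the string on the literal delimiter and checks for an exact element match; the helper array_contains is illustrative, and case-sensitive comparison is an assumption because the description does not say.

```python
def array_contains(array_string: str, delimiter: str, search_string: str) -> int:
    """Return 1 if search_string equals one of the delimited elements, else 0."""
    return 1 if search_string in array_string.split(delimiter) else 0

print(array_contains("Safari,iPad", ",", "iPad"))           # 1
print(array_contains("platfora|hadoop|2.3", "|", "hadoop"))  # 1
print(array_contains("platfora", "|", "plat"))               # 0: substring only
print(array_contains("platfora,hadoop", "|", "platfora"))    # 0: wrong delimiter
```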
FILE_NAME
FILE_NAME is a row function that returns the original file name from the source file system. This is
useful when the source data that comprises a dataset comes from multiple files, and there is useful
information in the file names themselves (such as dates or server names). You can use FILE_NAME in
combination with other string processing functions to extract useful information from the file name.
FILE_NAME()
Returns one value per row of type STRING.
Your dataset is based on daily log files that use an 8-character date as part of the file name. For example,
20120704.log is the file name used for the log file created on July 4, 2012. The following expression
uses FILE_NAME in combination with SUBSTRING and TO_DATE to create a date field from the first 8
characters of the file name.
TO_DATE(SUBSTRING(FILE_NAME(),0,8),"yyyyMMdd")
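The following Python sketch does the same thing as the expression above, outside Platfora. It assumes that SUBSTRING(...,0,8) yields the first 8 characters, and it maps the Java-style pattern "yyyyMMdd" to strptime's "%Y%m%d". The function name date_from_file_name is hypothetical.

```python
from datetime import date, datetime


def date_from_file_name(file_name: str) -> date:
    """Hypothetical equivalent of TO_DATE(SUBSTRING(FILE_NAME(),0,8),"yyyyMMdd")."""
    # First 8 characters of the file name, e.g. "20120704".
    date_part = file_name[0:8]
    # "yyyyMMdd" = 4-digit year, 2-digit month, 2-digit day.
    return datetime.strptime(date_part, "%Y%m%d").date()


print(date_from_file_name("20120704.log"))  # 2012-07-04
```

File names that do not start with a valid date make strptime raise ValueError. How Platfora's TO_DATE handles such values is not described here.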
Your dataset is based on log files that use the server IP address as part of the file name. For example,
172.12.131.118.log is the log file name for server 172.12.131.118. The following expression uses
FILE_NAME in combination with REGEX to extract the IP address from the file name.
REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")
FILE_PATH
FILE_PATH is a row function that returns the full URI path from the source file system. This is
useful when the source data that comprises a dataset comes from multiple files, and there is useful
information in the directory names or file names themselves (such as dates or server names). You can
use FILE_PATH in combination with other string processing functions to extract useful information
from the file path.
FILE_PATH()
Returns one value per row of type STRING.
Your dataset is based on daily log files that are organized into directories by date on the source file
system, and the file names are the server IP address of the server that produced the log file. For
example, the URI path to a log file produced by server 172.12.131.118 on July 4, 2012 is hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log.
The following expression uses FILE_PATH in combination with REGEX and TO_DATE to create a date
field from the date directory name.
TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")
And the following expression uses FILE_NAME and REGEX to extract the server IP address from the file
name:
REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")
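You can check both FILE_PATH and FILE_NAME patterns outside Platfora with Python's re module. In this sketch, re.fullmatch approximates REGEX's requirement that the pattern match the entire string. The literal dots in the host name are escaped, and PATH.rsplit stands in for FILE_NAME(). The sample path and the variable names are illustrative only.

```python
import re

PATH = "hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log"

# Group 1 is the 8-digit date directory; (?:...) groups the IP octets
# without capturing them.
DATE_DIR = re.compile(r"hdfs://myhdfs-server\.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log")

# Group 1 is the IP address portion of the file name.
IP_FILE = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")

date_dir = DATE_DIR.fullmatch(PATH).group(1)
file_name = PATH.rsplit("/", 1)[1]  # stand-in for FILE_NAME()
ip = IP_FILE.fullmatch(file_name).group(1)

print(date_dir)  # 20120704
print(ip)        # 172.12.131.118
```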
EXTRACT_COOKIE
EXTRACT_COOKIE is a row function that extracts the value of the given cookie identifier from a semicolon-delimited list of cookie key=value pairs. This function can be used to extract a particular cookie
value from a combined web access log Cookie column.
EXTRACT_COOKIE(cookie_list_string,"cookie_key_string")
Returns the value of the specified cookie key as type STRING.
cookie_list_string
Required. A field or literal string that has a semicolon-delimited list of cookie key=value pairs.
cookie_key_string
Required. The cookie key name for which to extract the cookie value.
Extract the value of the vID cookie from a literal cookie string:
EXTRACT_COOKIE("SSID=ABC; vID=44", "vID") returns 44
Extract the value of the vID cookie from a field named Cookie:
EXTRACT_COOKIE(Cookie,"vID")
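Here is a minimal Python sketch of cookie extraction. It assumes that pairs are separated by ";" with optional surrounding whitespace, and that the first "=" in each pair separates the key from the value. Returning None for a missing key is an assumption standing in for NULL, and extract_cookie is a hypothetical name.

```python
from typing import Optional


def extract_cookie(cookie_list: str, cookie_key: str) -> Optional[str]:
    """Hypothetical model of EXTRACT_COOKIE for strings like "k1=v1; k2=v2"."""
    for pair in cookie_list.split(";"):
        # Cookie pairs are usually separated by "; ", so trim whitespace.
        key, sep, value = pair.strip().partition("=")
        if sep and key == cookie_key:
            # partition splits at the first "=", so values may contain "=".
            return value
    return None  # assumption: a missing key yields NULL


print(extract_cookie("SSID=ABC; vID=44", "vID"))  # 44
```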
EXTRACT_VALUE
EXTRACT_VALUE is a row function that extracts the value for the given key from a string containing
delimited key/value pairs.
EXTRACT_VALUE(string,key_name [,delimiter] [,pair_delimiter])
Returns the value of the specified key as type STRING.
string
Required. A field or literal string that contains a delimited list of key/value pairs.
key_name
Required. The key name for which to extract the value.
delimiter
Optional. The delimiter used between the key and the value. If not specified, the value \u0003 is used.
This is the Unicode escape sequence for the end of text character (which is the default map key delimiter
used by Hive).
pair_delimiter
Optional. The delimiter used between key/value pairs when the input string contains more than one
key/value pair. If not specified, the value \u0002 is used. This is the Unicode escape sequence for the start
of text character (which is the default collection item delimiter used by Hive).
Extract the value of the lastname key from a literal string of key/value pairs:
EXTRACT_VALUE("firstname;daria|lastname;hutch","lastname",";","|")
returns hutch
Extract the value of the email key from a string field named contact_info that contains strings in the
format of key:value,key:value:
EXTRACT_VALUE(contact_info,"email",":",",")
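The lookup can be sketched in Python. The sketch first splits on the pair delimiter, then splits each pair at the first key/value delimiter, and uses the U+0003 and U+0002 defaults described above when no delimiters are passed. The function and constant names are hypothetical. Returning None for a missing key stands in for NULL and is an assumption.

```python
from typing import Optional

DEFAULT_KEY_DELIMITER = "\u0003"   # between a key and its value
DEFAULT_PAIR_DELIMITER = "\u0002"  # between key/value pairs


def extract_value(string: str, key_name: str,
                  delimiter: str = DEFAULT_KEY_DELIMITER,
                  pair_delimiter: str = DEFAULT_PAIR_DELIMITER) -> Optional[str]:
    """Hypothetical model of EXTRACT_VALUE."""
    for pair in string.split(pair_delimiter):
        key, sep, value = pair.partition(delimiter)
        if sep and key == key_name:
            return value
    return None  # assumption: a missing key yields NULL


print(extract_value("firstname;daria|lastname;hutch", "lastname", ";", "|"))  # hutch
```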
INSTR
INSTR is a row function that returns an integer indicating the position, within a string, of the first
character of an occurrence of a substring. Platfora's INSTR function is similar to the FIND function in
Excel, except that the first letter is position 0 and the order of the arguments is reversed.
INSTR(string,substring,position,occurrence)
Returns one value per row of type INTEGER. The first position is indicated with the value of zero (0).
string
Required. The name of a field or expression of type STRING (or a literal string).
substring
Required. A literal string or name of a field that specifies the substring to search for in string.
position
Optional. An integer that specifies at which character in string to start searching for substring. A value
of 0 (zero) starts the search at the beginning of string. Use a positive integer to start searching from
the beginning of string, and use a negative integer to start searching from the end of string. When no
position is specified, INSTR starts searching at the beginning of the string (0).
occurrence
Optional. A positive integer that specifies which occurrence of substring to search for. When no
occurrence is specified, INSTR searches for the first occurrence of the substring (1).
Return the position of the first occurrence of the substring "http://" starting at the end of the url field:
INSTR(url,"http://",-1,1)
The following expression searches for the second occurrence of the substring "st" starting at the
beginning of the string "bestteststring". INSTR finds that the substring starts at the seventh character in
the string, so it returns 6:
INSTR("bestteststring","st",0,2)
The following expression searches backward for the second occurrence of the substring "st", starting at
the seventh character from the end of the string "bestteststring". INSTR finds that the substring starts at
the third character in the string, so it returns 2:
INSTR("bestteststring","st",-7,2)
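Here is a Python sketch of the position rules, inferred from the two worked examples above. A non-negative position searches forward from that index. A negative position counts back from the end, with -1 as the last character, and searches backward, much like Oracle's INSTR. Several details are assumptions rather than documented behavior: the -1 return value when there is no match, and counting overlapping occurrences. The function name instr is hypothetical.

```python
def instr(string: str, substring: str, position: int = 0, occurrence: int = 1) -> int:
    """Hypothetical model of INSTR with zero-based positions.

    Returns -1 when the requested occurrence is not found (an assumption).
    """
    if occurrence < 1:
        raise ValueError("occurrence must be a positive integer")
    found = 0
    if position >= 0:
        # Forward search starting at index `position`.
        start = position
        while True:
            idx = string.find(substring, start)
            if idx == -1:
                return -1
            found += 1
            if found == occurrence:
                return idx
            start = idx + 1  # allow overlapping occurrences (assumption)
    # Backward search: a match may start at or before index `last`.
    last = len(string) + position
    while last >= 0:
        idx = string.rfind(substring, 0, last + len(substring))
        if idx == -1:
            return -1
        found += 1
        if found == occurrence:
            return idx
        last = idx - 1
    return -1


print(instr("bestteststring", "st", 0, 2))   # 6
print(instr("bestteststring", "st", -7, 2))  # 2
```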
JAVA_STRING
JAVA_STRING is a row function that returns the unescaped version of a Java unicode character escape
sequence as a string value. This is useful when you want to specify unicode characters in an expression.
For example, you can use JAVA_STRING to specify the unicode value representing a control character.
JAVA_STRING(unicode_escape_sequence)
Returns the unescaped version of the specified unicode character, one value per row of type STRING.
unicode_escape_sequence
Required. A STRING value containing a unicode character expressed as a Java unicode escape
sequence. Unicode escape sequences consist of a backslash '\' (ASCII character 92, hex 0x5c), a
'u' (ASCII 117, hex 0x75), optionally one or more additional 'u' characters, and four hexadecimal digits
(the characters '0' through '9' or 'a' through 'f' or 'A' through 'F'). Such sequences represent the UTF-16
encoding of a Unicode character. For example, the letter 'a' is equivalent to '\u0061'.
Evaluates whether the currency field is equal to the yen symbol.
CASE WHEN currency == JAVA_STRING("\u00a5") THEN "yes" ELSE "no" END
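Here is a Python sketch of the unescaping described above. It assumes the escape syntax stated in the text: a backslash, one or more 'u' characters, and four hex digits. Each escape is a UTF-16 code unit, so the sketch recombines surrogate pairs into single characters; an unpaired surrogate makes it raise UnicodeDecodeError. The function name java_string is hypothetical.

```python
import re

# A backslash, one or more 'u' characters, then exactly four hex digits.
_JAVA_ESCAPE = re.compile(r"\\u+([0-9a-fA-F]{4})")


def java_string(escaped: str) -> str:
    """Hypothetical model of JAVA_STRING: unescape Java Unicode escapes."""
    # Replace each escape with the UTF-16 code unit it names.
    units = _JAVA_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)
    # Recombine surrogate pairs (for example \ud83d\ude00) into one code point.
    return units.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


print(java_string("\\u0061"))  # a
print(java_string("\\u00a5") == "\u00a5")  # True (the yen sign)
```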
JOIN_STRINGS
JOIN_STRINGS is a row function that returns a string by concatenating (combining together) the results
of multiple values with the separator in between each non-null value.
JOIN_STRINGS(separator,value_expression[,value_expression][,...])
Returns one value per row of type STRING.
separator
Required. A field name of type STRING, a literal string, or an expression that returns a string.
value_expression
At least one required. A field name of any type, a literal string or number, or an expression that returns
any value.
Combine the values of the month, day, and year fields into a single string formatted as MM/DD/YYYY:
JOIN_STRINGS("/",month,day,year)
The following expression returns NULL:
JOIN_STRINGS("+",NULL,NULL,NULL)
The following expression returns a+b:
JOIN_STRINGS("+","a","b",NULL)
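The NULL handling shown above can be modeled in Python, with None standing in for NULL. Non-string values are converted with str(), which is an assumption; Platfora's own number formatting is not described here. The function name join_strings is hypothetical.

```python
from typing import Any, Optional


def join_strings(separator: str, *values: Any) -> Optional[str]:
    """Hypothetical model of JOIN_STRINGS: NULL (None) values are skipped."""
    present = [str(v) for v in values if v is not None]
    # If every value is NULL, the result is NULL rather than an empty string.
    if not present:
        return None
    return separator.join(present)


print(join_strings("+", "a", "b", None))  # a+b
print(join_strings("/", 7, 4, 2012))      # 7/4/2012
```

With numeric month and day fields, this sketch produces 7/4/2012 rather than 07/04/2012. A strict MM/DD/YYYY result would need the values padded before joining.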
JSON_ARRAY_CONTAINS
JSON_ARRAY_CONTAINS is a row function that performs a whole string match against a string
formatted as a JSON array and returns a 1 or 0 depending on whether or not the string contains the
search value.
JSON_ARRAY_CONTAINS(json_array_string,"search_string")
Returns one value per row of type INTEGER. A return value of 1 indicates a positive match, and a return
value of 0 indicates no match.
json_array_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON array. A JSON array is an ordered sequence of values separated by commas and enclosed in
square brackets.
search_string
Required. The literal string that you want to search for. This can be a name of a field or expression of
type STRING.
If you have a software field that contains a JSON array formatted like this:
["hadoop","platfora"]
The following expression returns 1:
JSON_ARRAY_CONTAINS(software,"platfora")
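Here is a Python sketch of the whole-value match, using the standard json module. Treating malformed input, or input that is not a JSON array, as "no match" (0) is an assumption. The function name json_array_contains is hypothetical.

```python
import json


def json_array_contains(json_array_string: str, search_string: str) -> int:
    """Hypothetical model of JSON_ARRAY_CONTAINS (whole-value match)."""
    try:
        values = json.loads(json_array_string)
    except json.JSONDecodeError:
        return 0  # assumption: malformed JSON counts as no match
    if not isinstance(values, list):
        return 0  # assumption: a non-array value never matches
    # Exact element comparison, so "plat" does not match "platfora".
    return 1 if search_string in values else 0


print(json_array_contains('["hadoop","platfora"]', "platfora"))  # 1
print(json_array_contains('["hadoop","platfora"]', "plat"))      # 0
```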
JSON_DOUBLE
JSON_DOUBLE is a row function that extracts a DOUBLE value from a field in a JSON object.
JSON_DOUBLE(json_string,"json_field")
Returns one value per row of type DOUBLE.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example
top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position
starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by
enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
If you had a top_scores field that contained a JSON object formatted like this (with the values contained
in an array):
{"practice_scores":["538.67","674.99","1021.52"], "test_scores":
["753.21","957.88","1032.87"]}
You could extract the third value of the test_scores array using the expression:
JSON_DOUBLE(top_scores,"test_scores.2")
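The following Python sketch shows one way to resolve the json_field path rules above. Plain names are separated by dots, array positions start at 0, and bracketed names may contain dots, with [] as the empty name. The helper names are hypothetical. Returning None for a path that cannot be resolved stands in for NULL and is an assumption, and Python's float stands in for DOUBLE. The other JSON_ functions below would use the same path handling with a different final conversion.

```python
import json
from typing import Any, List, Optional


def split_json_path(json_field: str) -> List[str]:
    """Split a path such as "test_scores.2" or "[a.b].[c]" into its keys."""
    segments = []
    i = 0
    while i < len(json_field):
        if json_field[i] == "[":
            # Bracketed name: dots inside belong to the key; "[]" is the empty key.
            end = json_field.index("]", i)
            segments.append(json_field[i + 1:end])
            i = end + 1
        else:
            end = json_field.find(".", i)
            end = len(json_field) if end == -1 else end
            segments.append(json_field[i:end])
            i = end
        if i < len(json_field):
            if json_field[i] != ".":
                raise ValueError(f"expected '.' at position {i} in {json_field!r}")
            i += 1  # skip the separating dot
    return segments


def json_lookup(json_string: str, json_field: str) -> Optional[Any]:
    """Walk objects by key and arrays by 0-based position."""
    node = json.loads(json_string)
    try:
        for key in split_json_path(json_field):
            if isinstance(node, list):
                index = int(key)
                if index < 0:
                    raise IndexError(index)  # positions start at 0
                node = node[index]
            else:
                node = node[key]
    except (KeyError, IndexError, TypeError, ValueError):
        return None  # assumption: an unresolvable path yields NULL
    return node


def json_double(json_string: str, json_field: str) -> Optional[float]:
    """Resolve the path, then convert the value (number or numeric string)."""
    value = json_lookup(json_string, json_field)
    return None if value is None else float(value)


scores = ('{"practice_scores":["538.67","674.99","1021.52"], '
          '"test_scores":["753.21","957.88","1032.87"]}')
print(json_double(scores, "test_scores.2"))  # 1032.87
```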
JSON_FIXED
JSON_FIXED is a row function that extracts a FIXED value from a field in a JSON object.
JSON_FIXED(json_string,"json_field")
Returns one value per row of type FIXED.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example
top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position
starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by
enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
If you had a top_scores field that contained a JSON object formatted like this (with the values contained
in an array):
{"practice_scores":["538.67","674.99","1021.52"], "test_scores":
["753.21","957.88","1032.87"]}
You could extract the third value of the test_scores array using the expression:
JSON_FIXED(top_scores,"test_scores.2")
JSON_INTEGER
JSON_INTEGER is a row function that extracts an INTEGER value from a field in a JSON object.
JSON_INTEGER(json_string,"json_field")
Returns one value per row of type INTEGER.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example
top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position
starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by
enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
If you had an address field that contained a JSON object formatted like this:
{"street_address":"123 B Street", "city":"San Mateo", "state":"CA",
"zip_code":"94403"}
You could extract the zip_code value using the expression:
JSON_INTEGER(address,"zip_code")
If you had a top_scores field that contained a JSON object formatted like this (with the values contained
in an array):
{"practice_scores":["538","674","1021"], "test_scores":
["753","957","1032"]}
You could extract the third value of the test_scores array using the expression:
JSON_INTEGER(top_scores,"test_scores.2")
JSON_LONG
JSON_LONG is a row function that extracts a LONG value from a field in a JSON object.
JSON_LONG(json_string,"json_field")
Returns one value per row of type LONG.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example
top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position
starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by
enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
If you had a top_scores field that contained a JSON object formatted like this (with the values contained
in an array):
{"practice_scores":["538","674","1021"], "test_scores":
["753","957","1032"]}
You could extract the third value of the test_scores array using the expression:
JSON_LONG(top_scores,"test_scores.2")
JSON_STRING
JSON_STRING is a row function that extracts a STRING value from a field in a JSON object.
JSON_STRING(json_string,"json_field")
Returns one value per row of type STRING.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid
JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example
top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position
starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by
enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
If you had an address field that contained a JSON object formatted like this:
{"street_address":"123 B Street", "city":"San Mateo", "state":"CA",
"zip":"94403"}
You could extract the state value using the expression:
JSON_STRING(address,"state")
If you had a misc field that contained a JSON object formatted like this (with the values contained in an
array):
{"hobbies":["sailing","hiking","cooking"], "interests":
["art","music","travel"]}
You could extract the first value of the hobbies array using the expression:
JSON_STRING(misc,"hobbies.0")
LENGTH
LENGTH is a row function that returns the count of characters in a string value.
LENGTH(string)
Returns one value per row of type INTEGER.
string
Required. The name of a field or expression of type STRING (or a literal string).
Return the count of characters in each value of the name field. For example, the value Bob returns a
length of 3, Julie returns a length of 5, and so on:
LENGTH(name)
REGEX
REGEX is a row function that performs a whole string match against a string value with a regular
expression and returns the portion of the string matching the first capturing group of the regular
expression.
REGEX(string_expression,"regex_matching_pattern")
Returns the matched STRING value of the first capturing group of the regular expression. If there is no
match, returns NULL.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
regex_matching_pattern
Required. A regular expression pattern based on the regular expression pattern matching syntax of the
Java programming language. To return a non-NULL value, the regular expression pattern must match
the entire string value.
This section lists a summary of the most commonly used constructs for defining a regular expression
matching pattern. See the Regular Expression Reference for more information about regular expression
support in Platfora.
Literal and Special Characters
The most basic form of pattern matching is the match of literal characters. For example, if the regular
expression is foo and the input string is foo, the match will succeed because the strings are identical.
Certain characters are reserved for special use in regular expressions. These special characters are often
called metacharacters. If you want to use special characters as literal characters, they must be escaped.
You can escape a single character using a \ (backslash), or escape a character sequence by enclosing it
in \Q ... \E.
To escape literal double-quotes, double the double-quotes ("").
Character Name       Character  Reserved For
opening bracket      [          start of a character class
closing bracket      ]          end of a character class
hyphen               -          character ranges within a character class
backslash            \          general escape character
caret                ^          beginning of string, negation of a character class
dollar sign          $          end of string
period               .          matching any single character
pipe                 |          alternation (OR) operator
question mark        ?          optional quantifier, quantifier minimizer
asterisk             *          zero or more quantifier
plus sign            +          one or more quantifier
opening parenthesis  (          start of a subexpression group
closing parenthesis  )          end of a subexpression group
opening brace        {          start of min/max quantifier
closing brace        }          end of min/max quantifier
Character Class Constructs
A character class allows you to specify a set of characters, enclosed in square brackets, that can produce
a single character match. There are also a number of special predefined character classes (backslash
character sequences that are shorthand for the most common character sets).
Construct     Type          Description
[abc]         simple        matches a, b, or c
[^abc]        negation      matches any character except a, b, or c
[a-zA-Z]      range         matches a through z, or A through Z (inclusive)
[a-d[m-p]]    union         matches a through d, or m through p
[a-z&&[def]]  intersection  matches d, e, or f
[a-z&&[^xq]]  subtraction   matches a through z, except for x and q
Predefined Character Classes
Predefined character classes offer convenient shorthands for commonly used regular expressions.
Construct  Description
.          matches any single character (except newline). Example: .at matches "cat", "hat", and also "bat" in the phrase "batch files".
\d         matches any digit character (equivalent to [0-9]). Example: \d matches "3" in "C3PO" and "2" in "file_2.txt".
\D         matches any non-digit character (equivalent to [^0-9]). Example: \D matches "S" in "900S" and "Q" in "Q45".
\s         matches any single white-space character (equivalent to [ \t\n\x0B\f\r]). Example: \sbook matches " book" in "blue book" but nothing in "notebook".
\S         matches any single non-white-space character. Example: \Sbook matches "ebook" in "notebook" but nothing in "blue book".
\w         matches any alphanumeric character, including underscore (equivalent to [A-Za-z0-9_]). Example: r\w* matches "rm" and "root".
\W         matches any non-alphanumeric character (equivalent to [^A-Za-z0-9_]). Example: \W matches "&" in "stmd &", "%" in "100%", and "$" in "$HOME".
Line and Word Boundaries
Boundary matching constructs are used to specify where in a string to apply a matching pattern. For
example, you can search for a particular pattern within a word boundary, or search for a pattern at the
beginning or end of a line.
Construct  Description
^          matches at the beginning of a line (multi-line matches are currently not supported). Example: ^172 matches the "172" in IP address "172.18.1.11" but not in "192.172.2.33".
$          matches at the end of a line (multi-line matches are currently not supported). Example: d$ matches the "d" in "maid" but not in "made".
\b         matches at a word boundary. Example: \bis\b matches the word "is" in "this is my island", but not the "is" part of "this" or "island"; \bis matches both "is" and the "is" in "island", but not the "is" in "this".
\B         matches at a non-word boundary. Example: \Bb matches "b" in "sbin" but not in "bash".
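You can try several of the predefined-class and boundary examples in the tables above with Python's re module. The sketch passes re.ASCII so that \d, \s, \w, and \b follow the ASCII definitions listed in the tables; without that flag, Python's str patterns are Unicode-aware. These calls use re.search and re.findall, which find a match anywhere in the string. REGEX, by contrast, requires a whole-string match.

```python
import re

A = re.ASCII  # keep \d, \s, \w and \b to their ASCII definitions

# Predefined character classes
print(re.findall(r"\d", "C3PO file_2.txt", A))  # ['3', '2']
print(re.findall(r"\Sbook", "notebook", A))     # ['ebook']

# Word boundaries
print(re.findall(r"\bis\b", "this is my island", A))  # ['is']
print(re.findall(r"\bis", "this is my island", A))    # ['is', 'is']
print(re.search(r"\Bb", "sbin", A) is not None)       # True
print(re.search(r"\Bb", "bash", A) is not None)       # False

# Line anchors
print(re.search(r"^172", "192.172.2.33") is not None)  # False
print(re.search(r"d$", "maid") is not None)            # True
```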
Quantifiers
Quantifiers specify how often the preceding regular expression construct should match. There are three
classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and
possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the
initial attempt does not produce a match.
Greedy  Reluctant  Possessive  Description
?       ??         ?+          matches the previous character or construct once or not at all. Example: st?on matches "son" in "johnson" and "ston" in "johnston" but nothing in "clinton" or "version".
*       *?         *+          matches the previous character or construct zero or more times. Example: if* matches "if", "iff" in "diff", or "i" in "print".
+       +?         ++          matches the previous character or construct one or more times. Example: if+ matches "if", "iff" in "diff", but nothing in "print".
{n}     {n}?       {n}+        matches the previous character or construct exactly n times. Example: o{2} matches "oo" in "lookup" and the first two o's in "fooooo" but nothing in "mount".
{n,}    {n,}?      {n,}+       matches the previous character or construct at least n times. Example: o{2,} matches "oo" in "lookup", all five o's in "fooooo", but nothing in "mount".
{n,m}   {n,m}?     {n,m}+      matches the previous character or construct at least n times, but no more than m times. Example: F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF".
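The following Python sketch shows how greedy and reluctant quantifiers differ, and runs some of the table examples. Possessive quantifiers (?+, *+, ++) are available in Python's re module only from Python 3.11, so this sketch shows only greedy and reluctant forms. The sample strings are illustrative.

```python
import re

text = "<a><b>"
# Greedy: .+ first consumes as much as possible, then backs off to find ">".
print(re.search(r"<.+>", text).group())   # <a><b>
# Reluctant: .+? first consumes as little as possible, then extends.
print(re.search(r"<.+?>", text).group())  # <a>

# Table examples, run against several words at once.
print(re.findall(r"st?on", "johnson johnston clinton version"))  # ['son', 'ston']
print(re.findall(r"o{2,}", "lookup fooooo mount"))               # ['oo', 'ooooo']
print(re.findall(r"F{2,4}", "#FF0000 #FFFFFF"))                  # ['FF', 'FFFF', 'FF']
```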
Groups are specified by a pair of parentheses around a subpattern in the regular expression. A pattern can
have more than one group and the groups can be nested. The groups are numbered 1-n from left to right,
starting with the first opening parenthesis. There is always an implicit group 0, which contains the entire
match. For example, the pattern:
(a(b*))+(c)
contains three groups:
group 1: (a(b*))
group 2: (b*)
group 3: (c)
Capturing Groups
By default, a group captures the text that produces a match, and only the most recent match is captured.
The REGEX function returns the string that matches the first capturing group in the regular expression.
For example, if the input string to the expression above was abc, the entire REGEX function would
match to abc, but only return the result of group 1, which is ab.
Non-Capturing Groups
In some cases, you may want to use parentheses to group subpatterns, but not capture text. A non-capturing
group starts with (?: (a question mark and colon following the opening parenthesis). For
example, h(?:a|i|o)t matches hat or hit or hot, but does not capture the a, i, or o from the
subexpression.
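You can check the group numbering and capture rules above with Python's re module, which numbers groups the same way for these patterns. The sketch also shows the "most recent match" rule: a repeated group such as (\d)+ keeps only its last iteration, while (\d+) captures every digit.

```python
import re

m = re.fullmatch(r"(a(b*))+(c)", "abc")
print(m.group(0))  # abc  (implicit group 0: the entire match)
print(m.groups())  # ('ab', 'b', 'c')

# A repeated group keeps only its most recent iteration.
print(re.fullmatch(r"(\d)+", "123").group(1))  # 3
print(re.fullmatch(r"(\d+)", "123").group(1))  # 123

# (?:...) groups without capturing, so the pattern has no numbered groups.
n = re.fullmatch(r"h(?:a|i|o)t", "hit")
print(n.group(0), n.re.groups)  # hit 0
```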
Match all possible email address strings, but only return the provider portion of the email address
from the email field:
REGEX(email,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")
Match the request line of a web log, where the value is in the format of:
GET /some_page.html HTTP/1.1
and return just the requested HTML page names:
REGEX(weblog.request_line,"GET\s/([a-zA-Z0-9._%-]+\.html)\sHTTP/[0-9.]+")
Extract the inches portion from a height field where example values are 6'2", 5'11" (notice the
escaping of the literal quote with a double double-quote):
REGEX(height, "\d\'(\d+)""")
Extract all of the contents of the device field when the value is either iPod, iPad, or iPhone:
REGEX(device,"(iP[ao]d|iPhone)")
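Here is a minimal Python model of REGEX. It assumes that re.fullmatch captures the whole-string match requirement and that None can stand in for NULL. Python's re and Java's regex engine agree on the patterns used here, but they differ in other areas, for example character class union and intersection. The function name regex and the sample values are hypothetical.

```python
import re
from typing import Optional


def regex(string_expression: str, pattern: str) -> Optional[str]:
    """Hypothetical model of REGEX: the pattern must match the whole string,
    and the text captured by group 1 is returned (None stands in for NULL)."""
    m = re.fullmatch(pattern, string_expression)
    return m.group(1) if m else None


DEVICE = r"(iP[ao]d|iPhone)"
print(regex("iPad", DEVICE))        # iPad
print(regex("Apple iPad", DEVICE))  # None: not a whole-string match

EMAIL = r"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$"
print(regex("jane@example.com", EMAIL))  # example

# \.html matches the literal extension; a character class such as [html]
# would match only a single character.
REQUEST = r"GET\s/([a-zA-Z0-9._%-]+\.html)\sHTTP/[0-9.]+"
print(regex("GET /some_page.html HTTP/1.1", REQUEST))  # some_page.html
```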
REGEX_REPLACE
REGEX_REPLACE is a row function that evaluates a string value against a regular expression to
determine if there is a match, and replaces matched strings with the specified replacement value.
REGEX_REPLACE(string_expression,"regex_match_pattern","regex_replace_pattern")
Returns the value of string_expression as a STRING, with every substring that matches regex_match_pattern
replaced by regex_replace_pattern. If there is no match, returns the value of string_expression unchanged.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
regex_match_pattern
Required. A string literal or regular expression pattern based on the regular expression pattern matching
syntax of the Java programming language. You can use capturing groups to create backreferences that
can be used in the regex_replace_pattern. You can also use a plain string literal as the match value;
string literals are matched case-sensitively. For example, when you enter jane as the match value, the
function matches jane but not Jane. The function matches all occurrences of a string literal in the string
expression.
regex_replace_pattern
Required. A string literal or regular expression pattern based on the regular expression pattern
matching syntax of the Java programming language. You can refer to backreferences from the
regex_match_pattern using the syntax $n (where n is the group number).
This section lists a summary of the most commonly used constructs for defining a regular expression
matching pattern. See the Regular Expression Reference for more information.
Literal and Special Characters
The most basic form of pattern matching is the match of literal characters. For example, if the regular
expression is foo and the input string is foo, the match will succeed because the strings are identical.
Certain characters are reserved for special use in regular expressions. These special characters are often
called metacharacters. If you want to use special characters as literal characters, they must be escaped.
Page 365
Data Analysis and Visualization Guide - Platfora Expressions
You can escape a single character using a \ (backslash), or escape a character sequence by enclosing it
in \Q ... \E.
Character Name
Character
Reserved For
opening bracket
[
start of a character class
closing bracket
]
end of a character class
hyphen
-
character ranges within a character class
backslash
\
general escape character
caret
^
beginning of string, negating of a character class
dollar sign
$
end of string
period
.
matching any single character
pipe
|
alternation (OR) operator
question mark
?
optional quantifier, quantifier minimizer
asterisk
*
zero or more quantifier
plus sign
+
once or more quantifier
opening parenthesis
(
start of a subexpression group
closing parenthesis
)
end of a subexpression group
opening brace
{
start of min/max quantifier
closing brace
}
end of min/max quantifier
Character Class Constructs
Page 366
Data Analysis and Visualization Guide - Platfora Expressions
A character class allows you to specify a set of characters, enclosed in square brackets, that can produce
a single character match. There are also a number of special predefined character classes (backslash
character sequences that are shorthand for the most common character sets).
Construct
Type
Description
[abc]
simple
matches
a
or
b
or
c
[^abc]
negation
matches any character except
a
or
b
or
c
[a-zA-Z]
range
matches
a
through
z
, or
A
through
Z
(inclusive)
[a-d[m-p]]
union
matches
a
through
d
, or
m
through
p
[a-z&&[def]]
intersection matches
d
,
e
, or
f
Page 367
Data Analysis and Visualization Guide - Platfora Expressions
Construct
Type
Description
[a-z&&[^xq]]
subtraction matches
a
through
z
, except for
x
and
q
Predefined Character Classes
Predefined character classes offer convenient shorthands for commonly used regular expressions.
Construct
Description
Example
.
matches any single character (except newline)
.at
matches "cat", "hat", and also"bat" in the
phrase "batch files"
\d
\D
matches any digit character (equivalent to
\d
[0-9]
)
matches "3" in "C3PO" and "2" in
"file_2.txt"
matches any non-digit character (equivalent to
\D
[^0-9]
matches "S" in "900S" and "Q" in "Q45"
)
\s
matches any single white-space character
(equivalent to
[ \t\n\x0B\f\r]
\sbook
matches "book" in "blue book" but
nothing in "notebook"
)
\S
matches any single non-white-space character
\Sbook
matches "book" in "notebook" but
nothing in "blue book"
\w
matches any alphanumeric character, including r\w*
underscore (equivalent to
matches "rm" and "root"
[A-Za-z0-9_]
)
Page 368
Data Analysis and Visualization Guide - Platfora Expressions
Construct
Description
Example
\W
matches any non-alphanumeric character
(equivalent to
\W
[^A-Za-z0-9_]
matches "&" in "stmd &" , "%" in
"100%", and "$" in "$HOME"
)
Line and Word Boundaries
Boundary matching constructs are used to specify where in a string to apply a matching pattern. For
example, you can search for a particular pattern within a word boundary, or search for a pattern at the
beginning or end of a line.
Construct
Description
Example
^
matches from the beginning of a line (multiline matches are currently not supported)
^172
matches from the end of a line (multi-line
matches are currently not supported)
d$
matches within a word boundary
\bis\b
$
\b
will match the "172" in IP address
"172.18.1.11" but not in "192.172.2.33"
will match the "d" in "maid" but not in
"made"
matches the word "is" in "this is my
island", but not the "is" part of "this" or
"island".
\bis
matches both "is" and the "is" in "island",
but not in "this".
\B
matches within a non-word boundary
\Bb
matches "b" in "sbin" but not in "bash"
Quantifiers
Quantifiers specify how often the preceding regular expression construct should match. There are three
classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and
Page 369
Data Analysis and Visualization Guide - Platfora Expressions
possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the
initial attempt does not produce a match.
In the following rows, each quantifier is listed in its greedy, reluctant, and possessive forms, followed
by its description and an example that uses the greedy form.
Greedy: ?; reluctant: ??; possessive: ?+
matches the previous character or construct once or not at all
Example: st?on matches "son" in "johnson" and "ston" in "johnston" but nothing in "clinton" or "version"
Greedy: *; reluctant: *?; possessive: *+
matches the previous character or construct zero or more times
Example: if* matches "if", "iff" in "diff", or "i" in "print"
Greedy: +; reluctant: +?; possessive: ++
matches the previous character or construct one or more times
Example: if+ matches "if", "iff" in "diff", but nothing in "print"
Greedy: {n}; reluctant: {n}?; possessive: {n}+
matches the previous character or construct exactly n times
Example: o{2} matches "oo" in "lookup" and the first two o's in "fooooo" but nothing in "mount"
Greedy: {n,}; reluctant: {n,}?; possessive: {n,}+
matches the previous character or construct at least n times
Example: o{2,} matches "oo" in "lookup" and all five o's in "fooooo" but nothing in "mount"
Greedy: {n,m}; reluctant: {n,m}?; possessive: {n,m}+
matches the previous character or construct at least n times, but no more than m times
Example: F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"
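The difference between greedy and reluctant matching is easiest to see on a string where both a short and a long match exist. The sketch below uses Python's re module as a stand-in for Platfora's engine (an assumption about equivalence for these simple patterns). Possessive quantifiers are left out because Python's re module only supports them from version 3.11.

```python
import re

HTML = "<b>bold</b>"

# Greedy: .+ first consumes as much as it can, then backs off until ">" can match.
greedy = re.search(r"<.+>", HTML).group()
# Reluctant: .+? first consumes as little as it can, then expands until ">" can match.
reluctant = re.search(r"<.+?>", HTML).group()

# Two of the examples from the rows above, using greedy quantifiers.
f_runs = re.findall(r"F{2,4}", "#FFFFFF")
st_on = re.findall(r"st?on", "johnson johnston clinton version")

if __name__ == "__main__":
    print(greedy)     # <b>bold</b>
    print(reluctant)  # <b>
    print(f_runs)     # ['FFFF', 'FF']
    print(st_on)      # ['son', 'ston']
```

Note that findall keeps scanning after the first match, so F{2,4} also picks up the two F's left over at the end of "#FFFFFF".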
Match the values in a phone_number field where phone number values are formatted as
xxx.xxx.xxxx and replace them with phone number values formatted as (xxx) xxx-xxxx:
REGEX_REPLACE(phone_number,"([0-9]{3})\.([0-9]{3})\.([0-9]{4})","\($1\) $2-$3")
Match the values in a name field where name values are formatted as firstname lastname and
replace them with name values formatted as lastname, firstname:
REGEX_REPLACE(name,"(.*) (.*)","$2, $1")
Match the string literal mrs in a title field and replace it with the string literal Mrs.
REGEX_REPLACE(title,"mrs","Mrs")
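The first two rewrites above can be tried with Python's re.sub as an analogue of REGEX_REPLACE. This is a sketch, not Platfora's implementation: Python refers to capture groups in the replacement as \1, \2 rather than $1, $2, and the helper names and sample values are hypothetical.

```python
import re

# Python writes group references in the replacement as \1, \2, ... where the
# REGEX_REPLACE examples above use $1, $2, ...
PHONE_PATTERN = r"([0-9]{3})\.([0-9]{3})\.([0-9]{4})"
NAME_PATTERN = r"(.*) (.*)"


def reformat_phone(phone_number):
    """xxx.xxx.xxxx -> (xxx) xxx-xxxx; values that do not match are returned unchanged."""
    return re.sub(PHONE_PATTERN, r"(\1) \2-\3", phone_number)


def last_name_first(name):
    """'firstname lastname' -> 'lastname, firstname'."""
    return re.sub(NAME_PATTERN, r"\2, \1", name)


if __name__ == "__main__":
    print(reformat_phone("650.555.0123"))
    print(last_name_first("Ada Lovelace"))
```

Because the first (.*) group is greedy, a three-word value such as "Mary Ann Smith" is split at its last space and becomes "Smith, Mary Ann".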
SPLIT
SPLIT is a row function that breaks down a delimited input string into sections and returns the specified
section of the string. A section is any substring between occurrences of the specified delimiter.
SPLIT(input_string_expression,"delimiter_string",position_integer)
Returns one value per row of type STRING.
input_string_expression
Required. The name of a field or expression of type STRING (or a literal string).
delimiter_string
Required. A literal string representing the delimiter used to separate values in the input string. The
delimiter can be a single character or multiple characters.
position_integer
Required. An integer representing the position of the section in the input string that you want to extract.
Positive integers count the position from the beginning of the string, and negative integers count the
position from the end of the string. A value of 0 returns NULL.
Return the last (third) section of the literal delimited string Restaurants>Location>San Francisco,
counting from the end of the string:
SPLIT("Restaurants>Location>San Francisco",">", -1) returns San Francisco
Return the first section of a phone_number field where phone number values are in the format of
123-456-7890:
SPLIT(phone_number,"-",1)
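The position rules for SPLIT can be summarized in a short Python sketch. It is a hypothetical emulation, not Platfora's implementation, and it assumes that a position beyond the number of sections returns NULL (represented as None), which the reference above does not state.

```python
def split_section(value, delimiter, position):
    """Hypothetical emulation of SPLIT(value, delimiter, position).

    Positive positions count from 1 at the start, negative positions count
    from -1 at the end, and 0 returns None (NULL). Returning None for an
    out-of-range position is an assumption.
    """
    if value is None or position == 0:
        return None
    sections = value.split(delimiter)
    # Convert the 1-based (or negative, end-relative) position to a list index.
    index = position - 1 if position > 0 else len(sections) + position
    if 0 <= index < len(sections):
        return sections[index]
    return None


if __name__ == "__main__":
    print(split_section("Restaurants>Location>San Francisco", ">", -1))
    print(split_section("123-456-7890", "-", 1))
```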
SUBSTRING
SUBSTRING is a row function that returns the specified characters of a string value based on the given
start and end positions.
SUBSTRING(string,start,end)
Returns one value per row of type STRING.
string
Required. The name of a field or expression of type STRING (or a literal string).
start
Required. An integer that specifies where the returned characters start (inclusive), with 0 being the first
character of the string. If start is greater than the number of characters, then an empty string is returned.
If start is greater than end, then an empty string is returned.
end
Required. A positive integer that specifies where the returned characters end (exclusive), with the end
character not being part of the return value. If end is greater than the number of characters, the whole
string value (from start) is returned.
Return the first letter of the name field:
SUBSTRING(name,0,1)
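For non-negative arguments, the start and end rules above behave like Python's slice notation, as this hypothetical sketch illustrates. It rejects negative arguments because the reference does not describe how SUBSTRING treats them.

```python
def substring(value, start, end):
    """Hypothetical emulation of SUBSTRING(value, start, end).

    start is 0-based and inclusive, end is exclusive. Python slicing already
    returns "" when start is past the end of the string or greater than end,
    and stops at the end of the string when end is too large.
    """
    if start < 0 or end < 0:
        raise ValueError("start and end must be non-negative")
    return value[start:end]


if __name__ == "__main__":
    print(substring("Platfora", 0, 1))  # first letter, like SUBSTRING(name,0,1)
```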
TO_LOWER
TO_LOWER is a row function that converts all alphabetic characters in a string to lower case.
TO_LOWER(string_expression)
Returns one value per row of type STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
Return the literal input string 123 Main Street in all lower case letters:
TO_LOWER("123 Main Street") returns 123 main street
TO_UPPER
TO_UPPER is a row function that converts all alphabetic characters in a string to upper case.
TO_UPPER(string_expression)
Returns one value per row of type STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
Return the literal input string 123 Main Street in all upper case letters:
TO_UPPER("123 Main Street") returns 123 MAIN STREET
TRIM
TRIM is a row function that removes leading and trailing spaces from a string value.
TRIM(string_expression)
Returns one value per row of type STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
Return the value of the area_code field without any leading or trailing spaces. For example, if the input
string is " 650 ", then the return value would be "650":
TRIM(area_code)
Return the value of the phone_number field without any leading or trailing spaces. For example, if the
input string is " 650 123-4567 ", then the return value would be "650 123-4567" (note that the extra
spaces in the middle of the string are not removed, only the spaces at the beginning and end of the
string):
TRIM(phone_number)
XPATH_STRING
XPATH_STRING is a row function that takes an XML-formatted string and returns the first string
matching the given XPath expression.
XPATH_STRING(xml_formatted_string,"xpath_expression")
Returns one value per row of type STRING.
If the XPath expression matches more than one string in the given XML node, this function will return
the first match only. To return all matches, use XPATH_STRINGS instead.
xml_formatted_string
Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML
consisting of a parent element and one or more child nodes).
xpath_expression
Required. An XPath expression that refers to a node, element, or attribute within the XML string passed
to this expression. Any XPath expression that complies with the XML Path Language (XPath) Version 1.0
specification is valid.
These example XPATH_STRING expressions assume you have a field in your dataset named address
that contains XML-formatted strings such as this:
<list>
<address type="work">
<street>1300 So. El Camino Real</street>
<street>Suite 600</street>
<city>San Mateo</city>
<state>CA</state>
<zipcode>94403</zipcode>
</address>
<address type="home">
<street>123 Oakdale Street</street>
<street/>
<city>San Francisco</city>
<state>CA</state>
<zipcode>94123</zipcode>
</address>
</list>
Get the zipcode value from any address element where the type attribute equals home:
XPATH_STRING(address,"//address[@type='home']/zipcode")
returns: 94123
Get the city value from the second address element:
XPATH_STRING(address,"/list/address[2]/city")
returns: San Francisco
Get the values from all child elements of the first address element (as one string):
XPATH_STRING(address,"/list/address")
returns: 1300 So. El Camino RealSuite 600 San MateoCA94403
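To experiment with these paths outside Platfora, the sketch below uses Python's xml.etree.ElementTree as an approximation. ElementTree implements only a subset of XPath 1.0 and evaluates paths relative to the root element, so /list/address[2]/city is written as address[2]/city. The helper name first_string is hypothetical, and the sample XML is a compact copy of the address value with no whitespace between elements.

```python
import xml.etree.ElementTree as ET

# Compact copy of the sample address value (no whitespace between elements).
ADDRESS = (
    "<list>"
    '<address type="work"><street>1300 So. El Camino Real</street><street>Suite 600</street>'
    "<city>San Mateo</city><state>CA</state><zipcode>94403</zipcode></address>"
    '<address type="home"><street>123 Oakdale Street</street><street/>'
    "<city>San Francisco</city><state>CA</state><zipcode>94123</zipcode></address>"
    "</list>"
)


def first_string(xml_text, path):
    """Return the text of the first element matching path (relative to the root), or None.

    Like an XPath string value, the result concatenates all descendant text.
    """
    match = ET.fromstring(xml_text).find(path)
    if match is None:
        return None
    return "".join(match.itertext())


if __name__ == "__main__":
    print(first_string(ADDRESS, ".//address[@type='home']/zipcode"))
    print(first_string(ADDRESS, "address[2]/city"))
    print(first_string(ADDRESS, "address[1]"))
```

Because the compact sample has no whitespace between elements, the concatenated value for address[1] has no separators; whitespace present in a real field value would carry through into the result.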
XPATH_STRINGS
XPATH_STRINGS is a row function that takes an XML-formatted string and returns all strings matching
the given XPath expression as a single newline-separated string.
XPATH_STRINGS(xml_formatted_string,"xpath_expression")
Returns one value per row of type STRING.
If the XPath expression matches more than one string in the given XML node, this function will return
all matches separated by a newline (you cannot specify a different delimiter).
xml_formatted_string
Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML
consisting of a parent element and one or more child nodes).
xpath_expression
Required. An XPath expression that refers to a node, element, or attribute within the XML string passed
to this expression. Any XPath expression that complies with the XML Path Language (XPath) Version 1.0
specification is valid.
These example XPATH_STRINGS expressions assume you have a field in your dataset named address
that contains XML-formatted strings such as this:
<list>
<address type="work">
<street>1300 So. El Camino Real</street>
<street>Suite 600</street>
<city>San Mateo</city>
<state>CA</state>
<zipcode>94403</zipcode>
</address>
<address type="home">
<street>123 Oakdale Street</street>
<street/>
<city>San Francisco</city>
<state>CA</state>
<zipcode>94123</zipcode>
</address>
</list>
Get all zipcode values from all address elements:
XPATH_STRINGS(address,"//address/zipcode")
returns:
94123
94403
Get all street values from the first address element:
XPATH_STRINGS(address,"/list/address[1]/street")
returns:
1300 So. El Camino Real
Suite 600
Get the values from all child elements of all address elements (as one string per line):
XPATH_STRINGS(address,"/list/address")
returns:
123 Oakdale StreetSan FranciscoCA94123
1300 So. El Camino RealSuite 600 San MateoCA94403
XPATH_XML
XPATH_XML is a row function that takes an XML-formatted string and returns an XML-formatted string
matching the given XPath expression.
XPATH_XML(xml_formatted_string,"xpath_expression")
Returns one value per row of type STRING in XML format.
xml_formatted_string
Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML
consisting of a parent element and one or more child nodes).
xpath_expression
Required. An XPath expression that refers to a node, element, or attribute within the XML string passed
to this expression. Any XPath expression that complies with the XML Path Language (XPath) Version 1.0
specification is valid.
These example XPATH_XML expressions assume you have a field in your dataset named address
that contains XML-formatted strings such as this:
<list>
<address type="work">
<street>1300 So. El Camino Real</street>
<street>Suite 600</street>
<city>San Mateo</city>
<state>CA</state>
<zipcode>94403</zipcode>
</address>
<address type="home">
<street>123 Oakdale Street</street>
<street/>
<city>San Francisco</city>
<state>CA</state>
<zipcode>94123</zipcode>
</address>
</list>
Get the last address node and its child nodes in XML format:
XPATH_XML(address,"//address[last()]")
returns:
<address type="home">
<street>123 Oakdale Street</street>
<street/>
<city>San Francisco</city>
<state>CA</state>
<zipcode>94123</zipcode>
</address>
Get the city value from the second address node in XML format:
XPATH_XML(address,"/list/address[2]/city")
returns: <city>San Francisco</city>
Get the first address node and its child nodes in XML format:
XPATH_XML(address,"/list/address[1]")
returns:
<address type="work">
<street>1300 So. El Camino Real</street>
<street>Suite 600</street>
<city>San Mateo</city>
<state>CA</state>
<zipcode>94403</zipcode>
</address>
URL Functions
URL functions allow you to extract different portions of a URL string, and decode text that is URL-encoded.
URL_AUTHORITY
URL_AUTHORITY is a row function that returns the authority portion of a URL string. The authority
portion of a URL is the part that has the information on how to locate and connect to the server.
URL_AUTHORITY(string)
Returns the authority portion of a URL as a STRING value, or NULL if the input string is not a valid
URL.
For example, in the string http://www.platfora.com/company/contact.html, the authority
portion is www.platfora.com.
In the string http://user:[email protected]:8012/mypage.html, the authority
portion is user:[email protected]:8012.
In the string mailto:[email protected]?subject=Topic, the authority portion is
NULL.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The authority portion of the URL contains the host information, which can be specified as a domain
name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1). The host
information can be preceded by optional user information terminated with @ (for example,
username:[email protected]), and followed by an optional port number preceded by a colon
(for example, localhost:8001).
Return the authority portion of URL string values in the referrer field:
URL_AUTHORITY(referrer)
Return the authority portion of a literal URL string:
URL_AUTHORITY("http://user:[email protected]:8012/mypage.html")
returns user:[email protected]:8012
URL_FRAGMENT
URL_FRAGMENT is a row function that returns the fragment portion of a URL string.
URL_FRAGMENT(string)
Returns the fragment portion of a URL as a STRING value, NULL if the URL does not contain a
fragment, or NULL if the input string is not a valid URL.
For example, in the string http://www.platfora.com/contact.html#phone, the fragment
portion is phone.
In the string http://www.platfora.com/contact.html, the fragment portion is NULL.
In the string http://platfora.com/news.php?topic=press#Platfora%20News, the
fragment portion is Platfora%20News.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The optional fragment portion of the URL is separated by a hash mark (#) and provides direction to a
secondary resource, such as a heading or anchor identifier.
Return the fragment portion of URL string values in the request field:
URL_FRAGMENT(request)
Return the fragment portion of a literal URL string:
URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News")
returns Platfora%20News
Return and decode the fragment portion of a literal URL string:
URLDECODE(URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News")) returns Platfora News
URL_HOST
URL_HOST is a row function that returns the host, domain, or IP address portion of a URL string.
URL_HOST(string)
Returns the host portion of a URL as a STRING value, or NULL if the input string is not a valid URL.
For example, in the string http://www.platfora.com/company/contact.html, the host
portion is www.platfora.com.
In the string http://admin:[email protected]:8001/index.html, the host portion is
127.0.0.1.
In the string mailto:[email protected]?subject=Topic, the host portion is NULL.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The authority portion of the URL contains the host information, which can be specified as a domain
name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1).
Return the host portion of URL string values in the referrer field:
URL_HOST(referrer)
Return the host portion of a literal URL string:
URL_HOST("http://user:[email protected]:8012/mypage.html") returns
mycompany.com
URL_PATH
URL_PATH is a row function that returns the path portion of a URL string.
URL_PATH(string)
Returns the path portion of a URL as a STRING value, NULL if the URL does not contain a path, or
NULL if the input string is not a valid URL.
For example, in the string http://www.platfora.com/company/contact.html, the path
portion is /company/contact.html.
In the string http://admin:[email protected]:8001/index.html, the path portion is /index.html.
In the string mailto:[email protected]?subject=Topic, the path portion is
[email protected].
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The optional path portion of the URL is a sequence of resource location segments separated by a
forward slash (/), conceptually similar to a directory path.
Return the path portion of URL string values in the request field:
URL_PATH(request)
Return the path portion of a literal URL string:
URL_PATH("http://platfora.com/company/contact.html") returns /company/contact.html
URL_PORT
URL_PORT is a row function that returns the port portion of a URL string.
URL_PORT(string)
Returns the port portion of a URL as an INTEGER value. If the URL does not specify a port, then returns
-1. If the input string is not a valid URL, returns NULL.
For example, in the string http://localhost:8001, the port portion is 8001.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The authority portion of the URL contains the host information, which can be specified as a
domain name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1). The
host information can be followed by an optional port number preceded by a colon (for example,
localhost:8001).
Return the port portion of URL string values in the referrer field:
URL_PORT(referrer)
Return the port portion of a literal URL string:
URL_PORT("http://user:[email protected]:8012/mypage.html") returns
8012
URL_PROTOCOL
URL_PROTOCOL is a row function that returns the protocol (or URI scheme name) portion of a URL
string.
URL_PROTOCOL(string)
Returns the protocol portion of a URL as a STRING value, or NULL if the input string is not a valid
URL.
For example, in the string http://www.platfora.com, the protocol portion is http.
In the string ftp://ftp.platfora.com/articles/platfora.pdf, the protocol portion is
ftp.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The protocol portion of a URL consists of a sequence of characters beginning with a letter and followed
by any combination of letters, numbers, plus (+), period (.), or hyphen (-) characters, followed by a colon
(:). For example: http:, ftp:, mailto:
Return the protocol portion of URL string values in the referrer field:
URL_PROTOCOL(referrer)
Return the protocol portion of the literal URL string:
URL_PROTOCOL("http://www.platfora.com") returns http
URL_QUERY
URL_QUERY is a row function that returns the query portion of a URL string.
URL_QUERY(string)
Returns the query portion of a URL as a STRING value, NULL if the URL does not contain a query, or
NULL if the input string is not a valid URL.
For example, in the string http://www.platfora.com/contact.html, the query portion is
NULL.
In the string http://platfora.com/news.php?topic=press&timeframe=today#Platfora%20News, the query
portion is topic=press&timeframe=today.
In the string mailto:[email protected]?subject=Topic, the query portion is
subject=Topic.
string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format
of: protocol:authority[/path][?query][#fragment].
The optional query portion of the URL is separated by a question mark (?) and typically contains an
unordered list of key=value pairs separated by an ampersand (&) or semicolon (;).
Return the query portion of URL string values in the request field:
URL_QUERY(request)
Return the query portion of a literal URL string:
URL_QUERY("http://platfora.com/news.php?topic=press&timeframe=today")
returns topic=press&timeframe=today
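For experimenting outside Platfora, Python's urllib.parse.urlsplit splits a URL into the same protocol, authority, path, query, and fragment components described in this section. The sketch below is an approximation under that assumption, not Platfora's implementation: url_parts is a hypothetical helper, it maps empty components to None (NULL) and a missing port to -1 as URL_PORT does, urlsplit lower-cases the host, and urlsplit is more lenient about malformed input than the URL_* functions are documented to be.

```python
from urllib.parse import urlsplit


def url_parts(url):
    """Approximate the URL_* functions with urllib.parse.urlsplit.

    Empty components become None (NULL); a missing port becomes -1.
    """
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme or None,
        "authority": parts.netloc or None,
        "host": parts.hostname,
        "port": parts.port if parts.port is not None else -1,
        "path": parts.path or None,
        "query": parts.query or None,
        "fragment": parts.fragment or None,
    }


if __name__ == "__main__":
    print(url_parts("http://platfora.com/news.php?topic=press&timeframe=today#Platfora%20News"))
    print(url_parts("http://localhost:8001"))
```

A mailto: URL has no // after the scheme, so urlsplit reports an empty authority and treats the address as the path, which mirrors the NULL authority and host described above.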
URLDECODE
URLDECODE is a row function that decodes a string that has been encoded with the
application/x-www-form-urlencoded media type. URL encoding, also known as percent-encoding, is a
mechanism for encoding information in a Uniform Resource Identifier (URI). When sent in an HTTP
GET request, application/x-www-form-urlencoded data is included in the query component
of the request URI. When sent in an HTTP POST request, the data is placed in the body of the message,
and the name of the media type is included in the message Content-Type header.
URLDECODE(string)
Returns a value of type STRING with characters decoded as follows:
• Alphanumeric characters (a-z, A-Z, 0-9) remain unchanged.
• The special characters hyphen (-), comma (,), underscore (_), period (.), and asterisk (*) remain
unchanged.
• The plus sign (+) character is converted to a space character.
• The percent character (%) is interpreted as the start of a special escaped sequence, where in the
sequence %HH, HH represents the hexadecimal value of the byte. For example, some common escape
sequences are:
%20: space
%0A, %0D, or %0D%0A: newline
%22: double quote (")
%25: percent (%)
%2D: hyphen (-)
%2E: period (.)
%3C: less than (<)
%3E: greater than (>)
%5C: backslash (\)
%7C: pipe (|)
string
Required. A field or expression that returns a STRING value. It is assumed that all characters in the
input string are one of the following: lower-case letters (a-z), upper-case letters (A-Z), numeric digits
(0-9), or the hyphen (-), comma (,), underscore (_), period (.) or asterisk (*) character. The percent
character (%) is allowed, but is interpreted as the start of a special escaped sequence. The plus character
(+) is allowed, but is interpreted as a space character.
Decode the values of the url_query field:
URLDECODE(url_query)
Convert a literal URL-encoded string (N%2FA%20or%20%22not%20applicable%22) to a human-readable value (N/A or "not applicable"):
URLDECODE("N%2FA%20or%20%22not%20applicable%22") returns N/A or "not applicable"
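Python's urllib.parse.unquote_plus applies the same two rules listed above (a plus sign becomes a space, and each %HH sequence becomes the byte with that hexadecimal value), so it is a convenient way to preview decoded values. This is an analogue, not Platfora's implementation; url_decode is a hypothetical wrapper, and it assumes the decoded bytes are UTF-8 (unquote_plus's default).

```python
from urllib.parse import unquote_plus


def url_decode(value):
    """Decode application/x-www-form-urlencoded text: "+" becomes a space and
    each %HH sequence becomes the byte with hex value HH (decoded as UTF-8)."""
    return unquote_plus(value)


if __name__ == "__main__":
    print(url_decode("N%2FA%20or%20%22not%20applicable%22"))
    print(url_decode("Platfora%20News"))
```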
IP Address Functions
IP address functions allow you to manipulate and transform STRING data consisting of IP address
values.
CIDR_MATCH
CIDR_MATCH is a row function that compares two STRING arguments representing a CIDR mask and
an IP address, and returns 1 if the IP address falls within the specified subnet mask or 0 if it does not.
CIDR_MATCH(CIDR_string, IP_string)
Returns an INTEGER value of 1 if the IP address falls within the subnet indicated by the CIDR mask
and 0 if it does not.
CIDR_string
Required. A field or expression that returns a STRING value containing either an IPv4 or IPv6 CIDR
mask (Classless Inter-Domain Routing subnet notation). An IPv4 CIDR mask can only successfully
match IPv4 addresses, and an IPv6 CIDR mask can only successfully match IPv6 addresses.
IP_string
Required. A field or expression that returns a STRING value containing either an IPv4 or IPv6 internet
protocol (IP) address.
Compare an IPv4 CIDR subnet mask to an IPv4 IP address:
CIDR_MATCH("60.145.56.0/24","60.145.56.246") returns 1
CIDR_MATCH("60.145.56.0/30","60.145.56.246") returns 0
Compare an IPv6 CIDR subnet mask to an IPv6 IP address:
CIDR_MATCH("fe80::/70","FE80::0202:B3FF:FE1E:8329") returns 1
CIDR_MATCH("fe80::/72","FE80::0202:B3FF:FE1E:8329") returns 0
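Python's standard ipaddress module performs the same subnet membership test, which makes it handy for checking CIDR_MATCH results. In this sketch cidr_match is a hypothetical helper; strict=False (accepting masks whose host bits are set) and returning None for unparsable input are assumptions rather than documented CIDR_MATCH behavior.

```python
import ipaddress


def cidr_match(cidr, ip):
    """Return 1 if ip lies inside the cidr subnet, else 0.

    An IPv4 mask never matches an IPv6 address (and vice versa), because
    ipaddress membership tests are always False across IP versions.
    """
    try:
        network = ipaddress.ip_network(cidr, strict=False)
        address = ipaddress.ip_address(ip)
    except ValueError:
        return None  # assumption: unparsable input yields NULL
    return 1 if address in network else 0


if __name__ == "__main__":
    print(cidr_match("60.145.56.0/24", "60.145.56.246"))
    print(cidr_match("fe80::/72", "FE80::0202:B3FF:FE1E:8329"))
```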
HEX_TO_IP
HEX_TO_IP is a row function that converts a hexadecimal-encoded STRING to a text representation of
an IP address.
HEX_TO_IP(string)
Returns a value of type STRING representing either an IPv4 or IPv6 address. The type of IP address
returned depends on the input string. An 8-character hexadecimal string returns an IPv4 address. A
32-character hexadecimal string returns an IPv6 address. IPv6 addresses are represented in full
length, without removing any leading zeros and without using the compressed :: notation.
For example, 2001:0db8:0000:0000:0000:ff00:0042:8329 rather than
2001:db8::ff00:42:8329. Input strings that do not contain either 8 or 32 valid hexadecimal
characters return NULL.
string
Required. A field or expression that returns a hexadecimal-encoded STRING value. The hexadecimal
string must be either 8 characters long (in which case it is converted to an IPv4 address) or 32 characters
long (in which case it is converted to an IPv6 address).
Return a plain text IP address for each hexadecimal-encoded string value in the byte_encoded_ips
column:
HEX_TO_IP(byte_encoded_ips)
Convert an 8-character hexadecimal-encoded string to a plain text IPv4 address:
HEX_TO_IP("AB20FE01") returns 171.32.254.1
Convert a 32-character hexadecimal-encoded string to a plain text IPv6 address:
HEX_TO_IP("FE800000000000000202B3FFFE1E8329") returns
fe80:0000:0000:0000:0202:b3ff:fe1e:8329
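The conversion rules can be expressed as a short Python sketch: every pair of hex digits is one byte, IPv4 prints each byte in decimal, and IPv6 prints each 2-byte group as four lower-case hex digits without :: compression. The hex_to_ip function is a hypothetical emulation of the documented behavior, not Platfora's implementation.

```python
import string


def hex_to_ip(hex_value):
    """Convert 8 hex digits to dotted IPv4 or 32 hex digits to full-length IPv6.

    IPv6 groups keep their leading zeros and the :: shorthand is not used.
    Anything else returns None (NULL).
    """
    if len(hex_value) not in (8, 32) or any(c not in string.hexdigits for c in hex_value):
        return None
    raw = bytes.fromhex(hex_value)
    if len(raw) == 4:
        # IPv4: one decimal number per byte.
        return ".".join(str(octet) for octet in raw)
    # IPv6: eight groups of two bytes, each rendered as four hex digits.
    return ":".join(raw[i:i + 2].hex() for i in range(0, 16, 2))


if __name__ == "__main__":
    print(hex_to_ip("AB20FE01"))
    print(hex_to_ip("FE800000000000000202B3FFFE1E8329"))
```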
Date and Time Functions
Date and time functions allow you to manipulate and transform datetime values, such as calculating time
differences between two datetime values, or extracting a portion of a datetime value.
DAYS_BETWEEN
DAYS_BETWEEN is a row function that calculates the whole number of days (ignoring time) between
two DATETIME values (datetime_1 minus datetime_2).
DAYS_BETWEEN(datetime_1,datetime_2)
Returns one value per row of type INTEGER.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of days to ship a product by subtracting the value of the order_date field from the
ship_date field:
DAYS_BETWEEN(ship_date,order_date)
Calculate the number of days since a product's release by subtracting the value of the release_date field
in the product dataset from the current date (the result of the NOW() expression):
DAYS_BETWEEN(NOW(),product.release_date)
DATE_ADD
DATE_ADD is a row function that adds the specified time interval to a DATETIME value.
DATE_ADD(datetime,quantity,"interval")
Returns a value of type DATETIME.
datetime
Required. A field name or expression that returns a DATETIME value.
quantity
Required. An integer value. To add time intervals, use a positive integer. To subtract time intervals, use
a negative integer.
interval
Required. One of the following time intervals:
• millisecond - Adds the specified number of milliseconds to a datetime value.
• second - Adds the specified number of seconds to a datetime value.
• minute - Adds the specified number of minutes to a datetime value.
• hour - Adds the specified number of hours to a datetime value.
• day - Adds the specified number of days to a datetime value.
• week - Adds the specified number of weeks to a datetime value.
• month - Adds the specified number of months to a datetime value.
• quarter - Adds the specified number of quarters to a datetime value.
• year - Adds the specified number of years to a datetime value.
• weekyear - Adds the specified number of weekyears to a datetime value.
Add 45 days to the value of the invoice_date field to calculate the date a payment is due:
DATE_ADD(invoice_date,45,"day")
HOURS_BETWEEN
HOURS_BETWEEN is a row function that calculates the whole number of hours (ignoring minutes,
seconds, and milliseconds) between two DATETIME values (datetime_1 minus datetime_2).
HOURS_BETWEEN(datetime_1,datetime_2)
Returns one value per row of type INTEGER.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of hours to ship a product by subtracting the value of the order_date field from the
ship_date field:
HOURS_BETWEEN(ship_date,order_date)
Calculate the number of hours since an advertisement was viewed by subtracting the value of the
adview_timestamp field in the impressions dataset from the current date and time (the result of the
NOW() expression):
HOURS_BETWEEN(NOW(),impressions.adview_timestamp)
EXTRACT
EXTRACT is a row function that returns the specified portion of a DATETIME value.
EXTRACT("extract_value",datetime)
Returns the specified extracted value as type INTEGER. EXTRACT removes leading zeros. For example,
the month of April returns a value of 4, not 04.
extract_value
Required. One of the following extract values:
• millisecond - Returns the millisecond portion of a datetime value. For example, an input datetime
value of 2012-08-15 20:38:40.213 would return an integer value of 213.
• second - Returns the second portion of a datetime value. For example, an input datetime value of
2012-08-15 20:38:40.213 would return an integer value of 40.
• minute - Returns the minute portion of a datetime value. For example, an input datetime value of
2012-08-15 20:38:40.213 would return an integer value of 38.
• hour - Returns the hour portion of a datetime value. For example, an input datetime value of
2012-08-15 20:38:40.213 would return an integer value of 20.
• day - Returns the day portion of a datetime value. For example, an input datetime value of
2012-08-15 would return an integer value of 15.
• week - Returns the ISO week number for the input datetime value. For example, an input datetime
value of 2012-01-02 would return an integer value of 1 (the first ISO week of 2012 starts on
Monday January 2). An input datetime value of 2012-01-01 would return an integer value of 52
(January 1, 2012 is part of the last ISO week of 2011).
• month - Returns the month portion of a datetime value. For example, an input datetime value of
2012-08-15 would return an integer value of 8.
• quarter - Returns the quarter number for the input datetime value, where quarters start on January 1,
April 1, July 1, or October 1. For example, an input datetime value of 2012-08-15 would return an
integer value of 3.
• year - Returns the year portion of a datetime value. For example, an input datetime value of
2012-01-01 would return an integer value of 2012.
• weekyear - Returns the year value that corresponds to the ISO week number of the input datetime
value. For example, an input datetime value of 2012-01-02 would return an integer value of 2012
(the first ISO week of 2012 starts on Monday January 2). An input datetime value of 2012-01-01
would return an integer value of 2011 (January 1, 2012 is part of the last ISO week of 2011).
datetime
Required. A field name or expression that returns a DATETIME value.
Extract the hour portion from the order_date datetime field:
EXTRACT("hour",order_date)
Cast the value of the order_date string field to a datetime value using TO_DATE, and extract the ISO
week year:
EXTRACT("weekyear",TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"))
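Python's datetime module follows the same ISO week rules through isocalendar(), so the documented EXTRACT values can be reproduced with a short sketch. The extract helper is hypothetical, assumes a naive datetime with no time zone handling, derives milliseconds from microseconds, and derives the quarter from the month.

```python
from datetime import datetime


def extract(part, value):
    """Hypothetical EXTRACT(part, value) covering the documented extract values."""
    iso_year, iso_week, _ = value.isocalendar()
    parts = {
        "millisecond": value.microsecond // 1000,
        "second": value.second,
        "minute": value.minute,
        "hour": value.hour,
        "day": value.day,
        "week": iso_week,
        "month": value.month,
        "quarter": (value.month - 1) // 3 + 1,
        "year": value.year,
        "weekyear": iso_year,
    }
    return parts[part]  # raises KeyError for an unknown extract value


if __name__ == "__main__":
    ts = datetime(2012, 8, 15, 20, 38, 40, 213000)
    print(extract("hour", ts), extract("quarter", ts))
    print(extract("week", datetime(2012, 1, 1)), extract("weekyear", datetime(2012, 1, 1)))
```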
MILLISECONDS_BETWEEN
MILLISECONDS_BETWEEN is a row function that calculates the whole number of milliseconds between
two DATETIME values (datetime_1 minus datetime_2).
MILLISECONDS_BETWEEN(datetime_1,datetime_2)
Returns one value per row of type INTEGER.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of milliseconds it took to serve a web page by subtracting the value of the
request_timestamp field from the response_timestamp field:
MILLISECONDS_BETWEEN(response_timestamp,request_timestamp)
MINUTES_BETWEEN
MINUTES_BETWEEN is a row function that calculates the whole number of minutes (ignoring seconds
and milliseconds) between two DATETIME values (datetime_1 minus datetime_2).
MINUTES_BETWEEN(datetime_1,datetime_2)
Returns one value per row of type INTEGER.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of minutes it took for a user to click on an advertisement by subtracting the value
of the impression_timestamp field from the conversion_timestamp field:
MINUTES_BETWEEN(conversion_timestamp,impression_timestamp)
Calculate the number of minutes since a user last logged in by subtracting the login_timestamp field in
the weblogs dataset from the current date and time (the result of the NOW() expression):
MINUTES_BETWEEN(NOW(),weblogs.login_timestamp)
NOW
NOW is a scalar function that returns the current system date and time as a DATETIME value. It can be
used in other expressions involving DATETIME type fields. Note that the value of NOW is
only evaluated at the time a lens is built (it is not re-evaluated with each query).
NOW()
Returns the current system date and time as a DATETIME value.
Calculate a user's age using YEAR_DIFF to subtract the value of the birthdate field in the users dataset
from the current date:
YEAR_DIFF(NOW(),users.birthdate)
Calculate the number of days since a product's release using DAYS_BETWEEN to subtract the value of the
release_date field from the current date:
DAYS_BETWEEN(NOW(),release_date)
SECONDS_BETWEEN
SECONDS_BETWEEN is a row function that calculates the whole number of seconds (ignoring
milliseconds) between two DATETIME values (datetime_1 - datetime_2).
SECONDS_BETWEEN(datetime_1,datetime_2)
Returns one value per row of type INTEGER.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of seconds it took for a user to click on an advertisement by subtracting the value
of the impression_timestamp field from the conversion_timestamp field:
SECONDS_BETWEEN(conversion_timestamp,impression_timestamp)
Calculate the number of seconds since a user last logged in by subtracting the login_timestamp field in
the weblogs dataset from the current date and time (the result of the NOW expression):
SECONDS_BETWEEN(NOW(),weblogs.login_timestamp)
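MILLISECONDS_BETWEEN, SECONDS_BETWEEN, and MINUTES_BETWEEN all subtract the second argument from the first, so the later timestamp must come first to get a positive result. The following Python sketch illustrates that behavior; it is not Platfora code. The helper name units_between is hypothetical, and it assumes that the whole number is obtained by truncating the elapsed time toward zero. The guide does not say how negative results are rounded, or whether MINUTES_BETWEEN truncates the difference or each input value, so the sample timestamps are chosen so that both readings agree.

```python
from datetime import datetime

# Length of one unit in microseconds, the finest resolution of a timedelta.
UNIT_MICROSECONDS = {
    "millisecond": 1_000,
    "second": 1_000_000,
    "minute": 60_000_000,
}


def units_between(datetime_1: datetime, datetime_2: datetime, unit: str) -> int:
    """Hypothetical helper: whole units in (datetime_1 - datetime_2).

    Assumption: any partial unit is dropped by truncating toward zero.
    """
    delta = datetime_1 - datetime_2
    # Exact integer microseconds, so no floating-point rounding is involved.
    total_us = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    whole = abs(total_us) // UNIT_MICROSECONDS[unit]
    return whole if total_us >= 0 else -whole


request = datetime(2015, 6, 28, 22, 15, 0, 120_000)
response = datetime(2015, 6, 28, 22, 15, 1, 470_000)
impression = datetime(2015, 6, 28, 9, 0, 10)
conversion = datetime(2015, 6, 28, 9, 3, 40)

print(units_between(response, request, "millisecond"))  # 1350
print(units_between(response, request, "second"))       # 1
print(units_between(conversion, impression, "minute"))  # 3
# Reversing the arguments flips the sign.
print(units_between(request, response, "millisecond"))  # -1350
```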
TRUNC
TRUNC is a row function that truncates a DATETIME value to the specified format.
TRUNC(datetime,"format")
Returns a value of type DATETIME truncated to the specified format.
datetime
Required. A field or expression that returns a DATETIME value.
format
Required. One of the following format values:
• millisecond - Returns a datetime value truncated to millisecond granularity. Has no effect since
millisecond is already the most granular format for datetime values. For example, an input datetime
value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:40.213.
• second - Returns a datetime value truncated to second granularity. For example, an input datetime
value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:40.000.
• minute - Returns a datetime value truncated to minute granularity. For example, an input datetime
value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:00.000.
• hour - Returns a datetime value truncated to hour granularity. For example, an input datetime value
of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:00:00.000.
• day - Returns a datetime value truncated to day granularity. For example, an input datetime value of
2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 00:00:00.000.
• week - Returns a datetime value truncated to the first day of the week (starting on a Monday). For
example, an input datetime value of 2012-08-15 (a Wednesday) would return a datetime value of
2012-08-13 (the Monday prior).
• month - Returns a datetime value truncated to the first day of the month. For example, an input
datetime value of 2012-08-15 would return a datetime value of 2012-08-01.
• quarter - Returns a datetime value truncated to the first day of the quarter (January 1, April 1, July 1,
or October 1). For example, an input datetime value of 2012-08-15 20:38:40.213 would return a
datetime value of 2012-07-01.
• year - Returns a datetime value truncated to the first day of the year (January 1). For example, an
input datetime value of 2012-08-15 would return a datetime value of 2012-01-01.
• weekyear - Returns a datetime value truncated to the first day of the ISO weekyear (the ISO week
starting with the Monday that is nearest in time to January 1). For example, an input datetime value
of 2008-08-15 would return a datetime value of 2007-12-31, because the first day of the ISO weekyear
for 2008 is Monday, December 31, 2007 (the Monday closest to January 1, 2008).
Truncate the order_date datetime field to day granularity:
TRUNC(order_date,"day")
Cast the value of the order_date string field to a datetime value using TO_DATE, and truncate it to day
granularity:
TRUNC(TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"),"day")
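For illustration, the following Python sketch reproduces the TRUNC formats listed above using the standard datetime module. It is an approximation under stated assumptions, not Platfora's implementation: values are treated as naive datetimes that are already in UTC, and the name trunc is hypothetical. The sample calls use the example values from the format list.

```python
from datetime import date, datetime, timedelta


def trunc(value: datetime, fmt: str) -> datetime:
    """Hypothetical re-implementation of the TRUNC formats (naive UTC datetimes)."""
    if fmt == "millisecond":
        # Drop anything finer than a millisecond (a no-op for millisecond data).
        return value.replace(microsecond=value.microsecond // 1000 * 1000)
    if fmt == "second":
        return value.replace(microsecond=0)
    if fmt == "minute":
        return value.replace(second=0, microsecond=0)
    if fmt == "hour":
        return value.replace(minute=0, second=0, microsecond=0)
    midnight = value.replace(hour=0, minute=0, second=0, microsecond=0)
    if fmt == "day":
        return midnight
    if fmt == "week":
        # weekday() is 0 for Monday, so this steps back to the prior Monday.
        return midnight - timedelta(days=midnight.weekday())
    if fmt == "month":
        return midnight.replace(day=1)
    if fmt == "quarter":
        first_month = 3 * ((value.month - 1) // 3) + 1
        return midnight.replace(month=first_month, day=1)
    if fmt == "year":
        return midnight.replace(month=1, day=1)
    if fmt == "weekyear":
        # Monday of ISO week 1 of this value's ISO week year (Python 3.8+).
        iso_year = value.isocalendar()[0]
        start = date.fromisocalendar(iso_year, 1, 1)
        return datetime(start.year, start.month, start.day)
    raise ValueError(f"unknown format: {fmt}")


sample = datetime(2012, 8, 15, 20, 38, 40, 213_000)
print(trunc(sample, "minute"))   # 2012-08-15 20:38:00
print(trunc(sample, "week"))     # 2012-08-13 00:00:00
print(trunc(sample, "quarter"))  # 2012-07-01 00:00:00
print(trunc(datetime(2008, 8, 15), "weekyear"))  # 2007-12-31 00:00:00
```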
YEAR_DIFF
YEAR_DIFF is a row function that calculates the fractional number of years between two DATETIME
values (datetime_1 - datetime_2).
YEAR_DIFF(datetime_1,datetime_2)
Returns one value per row of type DOUBLE.
datetime_1
Required. A field or expression of type DATETIME.
datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of years a user has been a customer by subtracting the value of the
registration_date field from the current date (the result of the NOW expression):
YEAR_DIFF(NOW(),registration_date)
Calculate a user's age by subtracting the value of the birthdate field in the users dataset from the current
date (the result of the NOW expression):
YEAR_DIFF(NOW(),users.birthdate)
Math Functions
Math functions allow you to perform basic math calculations on numeric values. You can also use
arithmetic operators to perform simple math calculations.
DIV
DIV is a row function that divides two LONG values and returns a quotient value of type LONG (the result
is truncated to 0 decimal places).
DIV(dividend,divisor)
Returns one value per row of type LONG.
dividend
Required. A field or expression of type LONG.
divisor
Required. A field or expression of type LONG.
Cast the value of the file_size field to LONG and divide by 1024:
DIV(TO_LONG(file_size),1024)
EXP
EXP is a row function that raises the mathematical constant e to the power (exponent) of a numeric value
and returns a value of type DOUBLE.
EXP(power)
Returns one value per row of type DOUBLE.
power
Required. A field or expression of a numeric type.
Raise e to the power in the Value field.
EXP(Value)
When the Value field value is 2.0, the result is equal to 7.3890 when truncated to four decimal places.
FLOOR
FLOOR is a row function that returns the largest integer that is less than or equal to the input argument.
FLOOR(double)
Returns one value per row of type DOUBLE.
double
Required. A field or expression of type DOUBLE.
Return the floor value of 32.6789:
FLOOR(32.6789) returns 32.0
HASH
HASH is a row function that evenly partitions data values into the specified number of buckets. It creates
a hash of the input value and assigns that value a bucket number. Equal values will always hash to the
same bucket number.
HASH(field_name,integer)
Returns one value per row of type INTEGER corresponding to the bucket number that the input value
hashes to.
field_name
Required. The name of the field whose values you want to partition.
integer
Required. The desired number of buckets. This parameter can be a numeric value of any data type, but
when it is a non-integer value, Platfora truncates the value to an integer. When the value is zero, the
function returns NULL. When the value is negative, the function uses its absolute value.
Partition the values of the username field into 20 buckets:
HASH(username,20)
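The guide does not say which hash algorithm HASH uses or whether buckets are numbered from 0 or 1. The Python sketch below is therefore only an analogy: it assumes CRC-32 (zlib.crc32) and buckets numbered 0 to n-1, and the name hash_bucket is hypothetical. It does follow the bucket-count rules described above: non-integer counts are truncated, negative counts use their absolute value, and a count of zero returns NULL (None).

```python
import zlib
from typing import Optional


def hash_bucket(value: str, buckets: float) -> Optional[int]:
    """Hypothetical analogue of HASH(field_name, integer).

    Assumptions (not from the guide): CRC-32 as the hash function and
    bucket numbers 0..n-1.
    """
    n = abs(int(buckets))  # truncate non-integers, use the absolute value of negatives
    if n == 0:
        return None  # a bucket count of zero yields NULL
    # Equal inputs always produce the same CRC-32, so they land in the same bucket.
    return zlib.crc32(value.encode("utf-8")) % n


usernames = ["alice", "bob", "carol", "alice"]
print([hash_bucket(name, 20) for name in usernames])
```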
LN
LN is a row function that returns the natural logarithm of a number. The natural logarithm is the
logarithm to the base e, where e (Euler's number) is a mathematical constant approximately equal to
2.718281828. The natural logarithm of a number x is the power to which the constant e must be raised in
order to equal x.
LN(positive_number)
Returns the exponent to which base e must be raised to obtain the input value, where e denotes the
constant number 2.718281828. The return value is the same data type as the input value.
For example, LN(7.389) is 2, because e to the power of 2 is approximately 7.389.
positive_number
Required. A field or expression that returns a number greater than 0. Inputs can be of type INTEGER,
LONG, DOUBLE, or FIXED.
Return the natural logarithm of base number e, which is approximately 2.718281828:
LN(2.718281828) returns 1
LN(3.0000) returns 1.098612
LN(300.0000) returns 5.703782
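To check the EXP, FLOOR, and LN example values above, the following Python snippet recomputes them with the standard math module (double precision). The trunc_places helper is hypothetical and truncates rather than rounds, matching the wording of the EXP example. Note that Python's math.log always returns a float, whereas the guide says LN returns the input's data type.

```python
import math


def trunc_places(x: float, places: int) -> float:
    """Hypothetical helper: truncate (not round) x to the given decimal places."""
    factor = 10 ** places
    return math.trunc(x * factor) / factor


# EXP(2.0): e raised to the power 2, truncated to four decimal places.
print(trunc_places(math.exp(2.0), 4))  # 7.389

# FLOOR returns a DOUBLE; Python's math.floor returns an int, so convert back.
print(float(math.floor(32.6789)))  # 32.0

# LN is the inverse of EXP (results rounded to six places for display).
print(round(math.log(2.718281828), 6))  # 1.0
print(round(math.log(3.0), 6))          # 1.098612
print(round(math.log(300.0), 6))        # 5.703782
```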
MOD
MOD is a row function that divides two LONG values and returns the remainder value of type LONG (the
result is truncated to 0 decimal places).
MOD(dividend,divisor)
Returns one value per row of type LONG.
dividend
Required. A field or expression of type LONG.
divisor
Required. A field or expression of type LONG.
Cast the value of the file_size field to LONG and return the remainder after dividing it by 1024:
MOD(TO_LONG(file_size),1024)
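DIV and MOD truncate the quotient toward zero. Python's // and % operators floor toward negative infinity instead, so the following Python sketch (hypothetical div and mod helpers) implements truncation explicitly. The behavior for negative operands is an assumption drawn from "truncated to 0 decimal places" and should be checked against your Platfora version; the remainder is defined so that the dividend equals divisor * DIV + MOD.

```python
def div(dividend: int, divisor: int) -> int:
    """Quotient truncated toward zero (assumed semantics for DIV)."""
    quotient = abs(dividend) // abs(divisor)
    # Negative result only when exactly one operand is negative.
    return quotient if (dividend >= 0) == (divisor > 0) else -quotient


def mod(dividend: int, divisor: int) -> int:
    """Remainder consistent with div(): dividend == divisor * div(...) + mod(...)."""
    return dividend - divisor * div(dividend, divisor)


file_size = 5000
print(div(file_size, 1024))    # 4
print(mod(file_size, 1024))    # 904
print(div(-7, 2), mod(-7, 2))  # -3 -1
```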
POW
POW is a row function that raises a numeric value to the power (exponent) of another numeric value
and returns a value of type DOUBLE.
POW(index,power)
Returns one value per row of type DOUBLE.
index
Required. A field or expression of a numeric type.
power
Required. A field or expression of a numeric type.
Calculate the compound annual growth rate (CAGR) percentage for a given investment over a five-year
span.
100 * (POW(end_value/start_value, 0.2) - 1)
Calculate the square of the Value field.
POW(Value,2)
Calculate the square root of the Value field.
POW(Value,0.5)
The following expression returns 1.
POW(0,0)
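The CAGR expression above raises the growth ratio to 1/5 (0.2) for a five-year span and then converts the rate to a percentage. A Python sketch of the same arithmetic follows; cagr_percent is a hypothetical name, and Python's ** operator stands in for POW.

```python
def cagr_percent(start_value: float, end_value: float, years: float) -> float:
    """Hypothetical helper: compound annual growth rate as a percentage."""
    # Same shape as 100 * (POW(end_value/start_value, 1/years) - 1).
    return 100 * ((end_value / start_value) ** (1 / years) - 1)


# An investment that doubles over five years (exponent 1/5 = 0.2).
print(round(cagr_percent(100.0, 200.0, 5), 4))  # 14.8698

# POW(Value,2), POW(Value,0.5), and POW(0,0) with Python's ** operator.
print(3.0 ** 2, 9.0 ** 0.5, 0.0 ** 0)  # 9.0 3.0 1.0
```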
ROUND
ROUND is a row function that rounds a DOUBLE value to the specified number of decimal places.
ROUND(double,number_decimal_places)
Returns one value per row of type DOUBLE.
double
Required. A field or expression of type DOUBLE.
number_decimal_places
Required. An integer that specifies the number of decimal places to round to.
Round the number 32.4678954 to two decimal places:
ROUND(32.4678954,2) returns 32.47
Data Type Conversion Functions
Data type conversion functions allow you to cast data values from one data type to another. These
functions are used implicitly whenever you set the data type of a field or column in the Platfora user
interface. The supported data types are: INTEGER, LONG, DOUBLE, FIXED, DATETIME, and STRING.
EPOCH_MS_TO_DATE
EPOCH_MS_TO_DATE is a row function that converts LONG values to DATETIME values, where the
input number represents the number of milliseconds since the epoch.
EPOCH_MS_TO_DATE(long_expression)
Returns one value per row of type DATETIME in UTC format yyyy-MM-dd HH:mm:ss:SSS Z.
long_expression
Required. A field or expression of type LONG representing the number of milliseconds since the epoch
datetime (January 1, 1970 00:00:00:000 GMT).
Convert a number representing the number of milliseconds from the epoch to a human-readable date and
time:
EPOCH_MS_TO_DATE(1360260240000) returns 2013-02-07T18:04:00:000Z or February 7,
2013 18:04:00:000 GMT
Or if your data is in seconds instead of milliseconds:
EPOCH_MS_TO_DATE(1360260240 * 1000) returns 2013-02-07T18:04:00:000Z or February
7, 2013 18:04:00:000 GMT
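EPOCH_MS_TO_DATE adds a number of milliseconds to the epoch. A minimal Python equivalent, assuming a timezone-aware UTC result, uses integer timedelta arithmetic so that large LONG values are not rounded through floating point; epoch_ms_to_date is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

# January 1, 1970 00:00:00.000 UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def epoch_ms_to_date(ms: int) -> datetime:
    """Hypothetical helper: milliseconds since the epoch -> aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=ms)


print(epoch_ms_to_date(1360260240000).isoformat())  # 2013-02-07T18:04:00+00:00
# Seconds must be scaled to milliseconds first, as in the second example above.
print(epoch_ms_to_date(1360260240 * 1000) == epoch_ms_to_date(1360260240000))  # True
```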
TO_CURRENCY
This function is deprecated. Use the TO_FIXED function instead.
TO_DATE
TO_DATE is a row function that converts STRING values to DATETIME values, and specifies the format
of the date and time elements in the string.
TO_DATE(string_expression,"date_format")
Returns one value per row of type DATETIME (which by definition is in UTC).
string_expression
Required. A field or expression of type STRING.
date_format
Required. A pattern that describes how the date is formatted.
Use the following pattern symbols to define your date format. The count and ordering of the pattern
letters determine the datetime format. Any characters in the pattern that are not in the ranges of a-z and
A-Z are treated as quoted delimiter text. For instance, characters such as slash (/) or colon (:) will appear
in the resulting output even if they are not escaped with single quotes.
Table 35: Date Pattern Symbols
Symbol | Meaning | Presentation | Examples | Notes
G | era | text | AD |
C | century of era (0 or greater) | number | 20 |
Y | year of era (0 or greater) | year | 1996 | Numeric presentation for year and week year fields is handled specially. For example, if the count of 'y' is 2, the year will be displayed as the zero-based year of the century, which is two digits.
x | week year | year | 1996 |
w | week number of week year | number | 27 |
e | day of week (number) | number | 2 |
E | day of week (name) | text | Tuesday; Tue | If the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used.
y | year | year | 1996 |
D | day of year | number | 189 |
M | month of year | month | July; Jul; 07 | If the number of pattern letters is 3 or more, the text form is used; otherwise the number is used.
d | day of month | number | 10 |
a | half day of day | text | PM |
K | hour of half day (0-11) | number | 0 |
h | clock hour of half day (1-12) | number | 12 |
H | hour of day (0-23) | number | 0 |
k | clock hour of day (1-24) | number | 24 |
m | minute of hour | number | 30 |
s | second of minute | number | 55 |
S | fraction of second | number | 978 |
z | time zone | text | Pacific Standard Time; PST | If the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used.
Z | time zone offset/id | zone | -0800; -08:00; America/Los_Angeles | 'Z' outputs the offset without a colon, 'ZZ' outputs the offset with a colon, and 'ZZZ' or more outputs the zone id.
' | escape character for text-based delimiters | delimiter | |
'' | literal representation of a single quote | literal | ' |
Define a new DATETIME computed field based on the order_date base field, which contains timestamps
in the format of: 2014.07.10 at 15:08:56 PDT:
TO_DATE(order_date,"yyyy.MM.dd 'at' HH:mm:ss z")
Define a new DATETIME computed field by first combining the individual month, day, year, and
depart_time fields (using CONCAT), and transforming depart_time so that three-digit times are converted
to four-digit times (using REGEX_REPLACE):
TO_DATE(CONCAT(month,"/",day,"/",year,":",REGEX_REPLACE(depart_time,"\b(\d{3})\b","0$1")),"MM/dd/yyyy:HHmm")
Define a new DATETIME computed field based on the created_at base field, which contains timestamps
in the format of: Sat Jan 25 16:35:23 +0800 2014 (this is the timestamp format returned by Twitter's
API):
TO_DATE(created_at,"EEE MMM dd HH:mm:ss Z yyyy")
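Python's strptime directives are not the same as the pattern symbols in Table 35, but for simple patterns the two map closely. The sketch below hand-translates two patterns used in this chapter (it is not a general converter) and normalizes the result to UTC, since a DATETIME value is UTC by definition. The %a and %b directives assume an English locale, and the constant name TWITTER_FORMAT is hypothetical.

```python
from datetime import datetime, timezone

# Hand translation of "EEE MMM dd HH:mm:ss Z yyyy" into strptime directives.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"

parsed = datetime.strptime("Sat Jan 25 16:35:23 +0800 2014", TWITTER_FORMAT)
# Convert the +08:00 local time to UTC.
as_utc = parsed.astimezone(timezone.utc)
print(as_utc.isoformat())  # 2014-01-25T08:35:23+00:00

# Hand translation of "MM/dd/yyyy HH:mm:ss" (no zone, so the result is naive).
order_date = datetime.strptime("07/10/2014 15:08:56", "%m/%d/%Y %H:%M:%S")
print(order_date)  # 2014-07-10 15:08:56
```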
TO_DOUBLE
TO_DOUBLE is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to DOUBLE
(decimal) values.
TO_DOUBLE(expression)
Returns one value per row of type DOUBLE.
expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or
DOUBLE.
Convert the values of the average_rating field to a double data type:
TO_DOUBLE(average_rating)
Convert the average_rating field to a double data type, but first transform any N/A values to NULL
values using a CASE expression:
TO_DOUBLE(CASE WHEN average_rating="N/A" then NULL ELSE average_rating END)
TO_FIXED
TO_FIXED is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to fixed-decimal values. Using a FIXED data type to represent monetary values allows you to calculate and
aggregate monetary values with accuracy to a ten-thousandth of a monetary unit.
TO_FIXED(expression)
Returns one value per row of type FIXED (a fixed-decimal value accurate to one ten-thousandth).
expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or
DOUBLE.
Convert the opening_price field to a fixed decimal data type:
TO_FIXED(opening_price)
Convert the sale_price field to a fixed decimal data type, but first transform the occurrence of any N/A
string values to NULL values using a CASE expression:
TO_FIXED(CASE WHEN sale_price="N/A" then NULL ELSE sale_price END)
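The reason to prefer FIXED over DOUBLE for money shows up with ordinary binary floating point, which DOUBLE values use. In the following Python sketch, the decimal module serves only as an analogy for a fixed-decimal type quantized to four places; it is not how Platfora stores FIXED values, and the sample prices are made up.

```python
from decimal import Decimal

prices = ["0.10", "0.20", "0.30"]

# DOUBLE-style arithmetic: binary floating point cannot represent 0.1 exactly.
as_double = sum(float(p) for p in prices)
print(as_double == 0.6)  # False

# Fixed-decimal arithmetic at four places (one ten-thousandth of a unit).
FOUR_PLACES = Decimal("0.0001")
as_fixed = sum(Decimal(p).quantize(FOUR_PLACES) for p in prices)
print(as_fixed)  # 0.6000
```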
TO_INT
TO_INT is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to INTEGER
(whole number) values. When converting DOUBLE values, everything after the decimal will be truncated
(not rounded up or down).
TO_INT(expression)
Returns one value per row of type INTEGER.
expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or
DOUBLE.
Convert the values of the average_rating field to an integer data type:
TO_INT(average_rating)
Convert the flight_duration field to an integer data type, but first transform any N/A values to NULL
values using a CASE expression:
TO_INT(CASE WHEN flight_duration="N/A" then NULL ELSE flight_duration END)
TO_LONG
TO_LONG is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to LONG (whole
number) values. When converting DOUBLE values, everything after the decimal will be truncated (not
rounded up or down).
TO_LONG(expression)
Returns one value per row of type LONG.
expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or
DOUBLE.
Convert the values of the average_rating field to a long data type:
TO_LONG(average_rating)
Convert the average_rating field to a long data type, but first transform any N/A values to NULL values
using a CASE expression:
TO_LONG(CASE WHEN average_rating="N/A" then NULL ELSE average_rating END)
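The N/A-to-NULL pattern and the truncation rule for TO_INT and TO_LONG can be sketched in Python as follows. This is an illustration, not Platfora's parser: the names to_long and null_if_na are hypothetical, and the handling of numeric strings (parse a decimal first, then truncate) is an assumption. Python's int() truncates toward zero, which matches "not rounded up or down".

```python
from typing import Optional, Union


def null_if_na(value: Optional[str]) -> Optional[str]:
    """Equivalent of CASE WHEN value="N/A" THEN NULL ELSE value END."""
    return None if value == "N/A" else value


def to_long(value: Union[str, int, float, None]) -> Optional[int]:
    """Hypothetical TO_LONG sketch: NULL stays NULL, decimals are truncated."""
    if value is None:
        return None
    if isinstance(value, str):
        # Assumption: a numeric string containing "." is parsed as a decimal first.
        value = float(value) if "." in value else int(value)
    return int(value)  # truncates toward zero: 4.7 -> 4, -2.5 -> -2


ratings = ["4.7", "N/A", "3", "-2.5"]
print([to_long(null_if_na(r)) for r in ratings])  # [4, None, 3, -2]
```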
TO_STRING
TO_STRING is a row function that converts values of other data types to STRING (character) values.
TO_STRING(expression)
TO_STRING(datetime_expression,date_format)
Returns one value per row of type STRING.
expression
A field or expression of type FIXED, STRING, INTEGER, LONG, or DOUBLE.
datetime_expression
A field or expression of type DATETIME.
date_format
If converting a DATETIME to a string, a pattern that describes how the date is formatted. See TO_DATE
for the date format patterns.
Convert the values of the sku_number field to a string data type:
TO_STRING(sku_number)
Convert values in the age column into range-based groupings (binning), and cast the output values to
STRING:
TO_STRING(CASE WHEN age <= 25 THEN "0-25" WHEN age <= 50 THEN "26-50" ELSE "over 50" END)
Convert the values of a timestamp datetime field to a string, where the timestamp values are in the
format of: 2002.07.10 at 15:08:56 PDT:
TO_STRING(timestamp,"yyyy.MM.dd 'at' HH:mm:ss z")
Aggregate Functions
An aggregate function groups the values of multiple rows together based on some defined input
expression. Aggregate functions return one value for a group of rows, and are only valid for defining
measures in Platfora. Aggregate functions cannot be combined with row functions.
AVG
AVG is an aggregate function that returns the average of all valid numeric values. It sums all values in
the provided expression and divides by the number of valid (NOT NULL) rows. If you want to compute
an average that includes all values in the row count (including NULL values), you can use a SUM/COUNT
expression instead.
AVG(numeric_field)
Returns a value of type DOUBLE.
numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate
functions can only take field names as input.
Get the average of the valid sale_amount field values:
AVG(sale_amount)
Get the average of the valid net_worth field values in the billionaires data set, which resides in the
samples namespace:
AVG([(samples) billionaires].net_worth)
Get the average of all page_views field values in the web_logs dataset (including NULL values):
SUM(page_views)/COUNT(web_logs)
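The difference between AVG and SUM/COUNT comes down to the denominator. The Python sketch below uses a hypothetical page_views column with two NULL (None) values and assumes, as the SUM/COUNT example implies, that SUM skips NULLs.

```python
from typing import List, Optional

page_views: List[Optional[int]] = [10, None, 20, None]

valid = [v for v in page_views if v is not None]

# AVG: divide by the number of valid (non-NULL) rows only.
avg = sum(valid) / len(valid)
# SUM/COUNT: SUM skips NULLs, but COUNT counts every row in the dataset.
sum_over_count = sum(valid) / len(page_views)

print(avg, sum_over_count)  # 15.0 7.5
```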
COUNT
COUNT is an aggregate function that returns the number of rows in a dataset.
COUNT([namespace_name]dataset_name)
Returns a value of type INTEGER.
namespace_name
Optional. The name of the namespace in which the dataset resides. If not specified, uses the default
namespace.
dataset_name
Required. The name of the dataset for which to obtain a count of rows. If you want to count rows of a
down-stream dataset that is related to the current dataset, you can specify the hierarchy of dataset names
in the format of:
parent_dataset_name.child_dataset_name.[...]
Count the rows in the sales dataset:
COUNT(sales)
Count the rows in the billionaires dataset, which resides in the samples namespace:
COUNT([(samples) billionaires])
Count the rows in the customers dataset, which is a related dataset down-stream of sales:
COUNT(sales.customers)
COUNT_VALID
COUNT_VALID is an aggregate function that returns the number of rows for which the given expression
is valid (excludes NULL values).
COUNT_VALID(field)
Returns a numeric value of type INTEGER.
field
Required. A field name. Unlike row functions, aggregate functions can only take field names as input.
Count the valid values in the page_views field:
COUNT_VALID(page_views)
DISTINCT
DISTINCT is an aggregate function that returns the number of distinct values for the given expression.
DISTINCT(field)
Returns a numeric value of type INTEGER.
field
Required. A field name. Unlike row functions, aggregate functions can only take field names as input.
Count the unique values of the user_id field in the currently selected dataset:
DISTINCT(user_id)
Count the unique values of the name field in the billionaires dataset, which resides in the samples
namespace:
DISTINCT([(samples) billionaires].name)
Count the unique values of the customer_id field in the customers dataset, which is a related dataset
down-stream of web sales:
DISTINCT([web sales].customers.customer_id)
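COUNT, COUNT_VALID, and DISTINCT differ in which rows they consider. A small Python illustration over a hypothetical user_id column follows; it assumes DISTINCT does not count NULL as a value, which the guide does not state.

```python
from typing import List, Optional

user_ids: List[Optional[str]] = ["u1", "u2", None, "u1", "u3", None]

count_rows = len(user_ids)                               # like COUNT(dataset)
count_valid = sum(1 for u in user_ids if u is not None)  # like COUNT_VALID(user_id)
# Assumption: NULL is not counted as a distinct value.
distinct = len({u for u in user_ids if u is not None})   # like DISTINCT(user_id)

print(count_rows, count_valid, distinct)  # 6 4 3
```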
MAX
MAX is an aggregate function that returns the biggest value from the given input expression.
MAX(numeric_or_datetime_field)
Returns a numeric or datetime value of the same type as the input expression.
numeric_or_datetime_field
Required. A field of type INTEGER, LONG, DOUBLE, FIXED, or DATETIME. Unlike row
functions, aggregate functions can only take field names as input.
Get the highest value from the sale_amount field:
MAX(sale_amount)
Get the latest date from the Session Timestamp datetime field:
MAX([Session Timestamp])
MIN
MIN is an aggregate function that returns the smallest value from the given input expression.
MIN(numeric_or_datetime_field)
Returns a numeric or datetime value of the same type as the input expression.
numeric_or_datetime_field
Required. A field of type INTEGER, LONG, DOUBLE, FIXED, or DATETIME. Unlike row
functions, aggregate functions can only take field names as input.
Get the lowest value from the sale_amount field:
MIN(sale_amount)
Get the earliest date from the Session Timestamp datetime field:
MIN([Session Timestamp])
SUM
SUM is an aggregate function that returns the total of all values from the given input expression.
SUM(numeric_field)
Returns a numeric value of the same type as the input expression.
numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate
functions can only take field names as input.
Add the values of the sale_amount field:
SUM(sale_amount)
Add values of the session count field in the users dataset, which is a related dataset down-stream of
clicks:
SUM(clicks.users.[session count])
STDDEV
STDDEV is an aggregate function that calculates the population standard deviation for a group of
numeric values. Standard deviation is the square root of the variance.
STDDEV(numeric_field)
Returns a value of type DOUBLE. If there are fewer than two values in the input group, returns NULL.
numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate
functions can only take field names as input.
Calculate the standard deviation of the values contained in the sale_amount field:
STDDEV(sale_amount)
VARIANCE
VARIANCE is an aggregate function that calculates the population variance for a group of numeric
values. Variance measures the amount by which all values in a group vary from the average value of
the group. Data with low variance contains values that are identical or similar. Data with high variance
contains values that are not similar. Variance is calculated as the average of the squares of the deviations
from the mean. Squaring the deviations ensures that negative and positive deviations do not cancel each
other out.
VARIANCE(numeric_field)
Returns a value of type DOUBLE. If there are fewer than two values in the input group, returns NULL.
numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate
functions can only take field names as input.
Get the population variance of the values contained in the sale_amount field:
VARIANCE(sale_amount)
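STDDEV and VARIANCE compute population (not sample) statistics and return NULL for fewer than two values. The Python sketch below wraps the standard statistics.pvariance function accordingly; the helper names are hypothetical, skipping NULL inputs is an assumption, and the sample values are made up for the illustration.

```python
import statistics
from typing import List, Optional


def population_variance(values: List[Optional[float]]) -> Optional[float]:
    """Like VARIANCE: population variance, or None (NULL) for fewer than two values."""
    present = [v for v in values if v is not None]  # assumption: NULL inputs are skipped
    if len(present) < 2:
        return None
    return statistics.pvariance(present)


def population_stddev(values: List[Optional[float]]) -> Optional[float]:
    """Like STDDEV: the square root of the population variance."""
    variance = population_variance(values)
    return None if variance is None else variance ** 0.5


sale_amount = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_variance(sale_amount))  # 4.0
print(population_stddev(sale_amount))    # 2.0
print(population_variance([42.0]))       # None
```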
ROLLUP and Window Functions
Window functions can only be used in conjunction with ROLLUP. ROLLUP is a modifier to an aggregate
expression that determines the partitioning and ordering of a rowset before the associated aggregate
function or window function is applied. ROLLUP defines a window or user-specified set of rows within
a query result set. A window function then computes a value for each row in the window. You can
use window functions to compute aggregated values such as moving averages, cumulative aggregates,
running totals, or top-N-per-group results.
ROLLUP
ROLLUP is a modifier to an aggregate function that turns a regular aggregate function into a windowed,
partitioned, or adaptive aggregate function. This is useful when you want to compute an aggregation
over a subset of rows within the overall result of a viz query.
ROLLUP aggregate_expression [ WHERE input_group_condition [...] ]
[ TO ([partitioning_columns])
[ ORDER BY (ordering_column [ASC | DESC])
ROWS|RANGE window_boundary [window_boundary]
| BETWEEN window_boundary AND window_boundary ]
]
where window_boundary can be one of:
UNBOUNDED PRECEDING
value PRECEDING
value FOLLOWING
UNBOUNDED FOLLOWING
A regular measure is the result of an aggregation (such as SUM or AVG) applied to some fact or metric
column of a dataset. For example, suppose we had a dataset with the following rows and columns:
Date | Sale Amount | Product | Region
05/01/2013 | 100 | gadget | west
05/01/2013 | 200 | widget | east
06/01/2013 | 100 | gadget | east
06/01/2013 | 400 | widget | west
07/01/2013 | 300 | widget | west
07/01/2013 | 200 | gadget | east
To define a regular measure called Total Sales, we would use the expression:
SUM([Sale Amount])
When this measure is used in a visualization, the group of input records passed into the aggregate
calculation is determined by the dimensions selected by the user when they create the viz. For example,
if the user chose Region as a dimension in the viz, there would be two input groups for which the
measure would be calculated:
Total Sales / Region
east | west
500 | 800
If an aggregate expression includes a ROLLUP clause, the column(s) specified in the TO clause of the
ROLLUP expression determine the additional partitions over which to compute the aggregate expression.
It divides the overall rows returned by the viz query into subsets or buckets, and then computes the
aggregate expression within each bucket. Every ROLLUP expression has implicit partitioning defined: an
absent TO clause treats the entire result set as one partition; an empty TO clause partitions by whatever
dimension columns are present in the viz query.
The WHERE clause filters the input rows that flow into each partition. Input rows that meet the
WHERE clause criteria are partitioned; rows that don't are excluded from the calculation.
The ORDER BY clause, together with a ROWS or RANGE clause, defines a window frame within each
partition over which to compute the aggregate expression.
When a ROLLUP measure is used in a visualization, the aggregate calculation is computed across a
set of input rows that are related to, but separate from, the other dimension(s) used in the viz. This is
similar to the type of calculation that is done with a regular measure. However, unlike a regular measure,
a ROLLUP measure does not cause the input rows to be grouped into a single result set; the input rows
still retain their separate identities. The ROLLUP clause determines how the input rows are split up for
processing by the ROLLUP's aggregate function.
ROLLUP expressions can be written to make the partitioning adaptive to whatever dimension columns
are selected in the visualization. This is done by using a reference name as the partitioning column, as
opposed to a regular column. For example, suppose we wanted to be able to calculate the total sales for
any granularity of date. We could create an adaptive measure called Rollup Sales to Date that partitions
total sales by date as follows:
ROLLUP SUM([Sale Amount]) TO (Date)
When this measure is used in a visualization, the group of input records passed into the aggregate
calculation is determined by the dimension fields selected by the user in the viz, but partitioned by the
granularity of Date selected by the user. For example, if the user chose the dimensions Date.Month and
Region in the viz, then total sales would be grouped by month and region, but the ROLLUP measure
expression would aggregate the sales by month only.
Notice that the results for the east and west regions are the same. This is because the aggregation
expression considers only rows that share the same month when calculating the sum of sales.
Month / (Measures) / Region
Month | Region | Rollup Sales to Date
May 2013 | east | 300
May 2013 | west | 300
June 2013 | east | 500
June 2013 | west | 500
July 2013 | east | 500
July 2013 | west | 500
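To make the difference between the regular Total Sales measure and the Rollup Sales to Date measure concrete, the following Python sketch recomputes both from the six sample rows in this section, for a viz with Date.Month and Region as dimensions. It only mimics the results; it is not how Platfora evaluates viz queries, and months are keyed as YYYY-MM strings for simplicity.

```python
from collections import defaultdict
from datetime import date

# The sample dataset from this section: (Date, Sale Amount, Product, Region).
SALES = [
    (date(2013, 5, 1), 100, "gadget", "west"),
    (date(2013, 5, 1), 200, "widget", "east"),
    (date(2013, 6, 1), 100, "gadget", "east"),
    (date(2013, 6, 1), 400, "widget", "west"),
    (date(2013, 7, 1), 300, "widget", "west"),
    (date(2013, 7, 1), 200, "gadget", "east"),
]


def month_of(d: date) -> str:
    return d.strftime("%Y-%m")


# Regular measure SUM([Sale Amount]): one total per viz mark (month, region).
total_sales = defaultdict(int)
for d, amount, _product, region in SALES:
    total_sales[(month_of(d), region)] += amount

# ROLLUP SUM([Sale Amount]) TO (Date) with Date.Month in the viz:
# aggregate per month only, then report that value on every mark in the month.
month_totals = defaultdict(int)
for d, amount, _product, _region in SALES:
    month_totals[month_of(d)] += amount

rollup_sales_to_date = {mark: month_totals[mark[0]] for mark in total_sales}

for mark in sorted(rollup_sales_to_date):
    print(mark, total_sales[mark], rollup_sales_to_date[mark])
```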
Suppose within the date partition, we wanted to calculate the cumulative total day to day. We could
define a window measure called Running Total to Date that looks at each day and all preceding days as
follows:
ROLLUP SUM([Sale Amount]) TO (Date) ORDER BY (Date.Date) ROWS UNBOUNDED PRECEDING
When this measure is used in a visualization, the group of input records passed into the aggregate
calculation is determined by the dimension fields selected by the user in the viz, and partitioned by the
granularity of Date selected by the user. Within each partition the rows are ordered chronologically (by
Date.Date), and the sum amount is then calculated per date partition by looking at the current row (or
mark), and all rows that come before it within the partition. For example, if the user chose the dimension
Date.Month in the viz, then the ROLLUP measure expression would cumulatively aggregate the sales
within each month.
Month / (Measures) / Date.Date
Month | Date.Date | Running Total to Date
May 2013 | 2013-05-01 | 300
June 2013 | 2013-06-01 | 500
July 2013 | 2013-07-01 | 500
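The running total can be sketched in Python the same way. The marks below are the per-date sums of the sample rows; running_total is a hypothetical helper that orders marks by date and adds each mark to the sum of the earlier marks in its partition (ROWS UNBOUNDED PRECEDING). The second call, with a single partition, applies the rule stated earlier that an absent TO clause treats the whole result set as one partition; it is not an example from the guide. Because each month in the sample has only one date, the per-month running total equals the monthly total.

```python
from collections import defaultdict
from datetime import date
from typing import Callable, Dict, Hashable, List, Tuple

# Viz marks for Date.Month and Date.Date: (date, SUM of Sale Amount on that date),
# taken from the six sample rows in this section.
MARKS: List[Tuple[date, int]] = [
    (date(2013, 5, 1), 300),
    (date(2013, 6, 1), 500),
    (date(2013, 7, 1), 500),
]


def running_total(
    marks: List[Tuple[date, int]], partition_key: Callable[[date], Hashable]
) -> Dict[date, int]:
    """Sum each mark with all earlier marks in its partition (ROWS UNBOUNDED PRECEDING)."""
    totals: Dict[Hashable, int] = defaultdict(int)
    result: Dict[date, int] = {}
    for day, value in sorted(marks):  # ORDER BY (Date.Date)
        key = partition_key(day)
        totals[key] += value
        result[day] = totals[key]
    return result


# TO (Date) with Date.Month in the viz: each month is a separate partition.
by_month = running_total(MARKS, lambda day: (day.year, day.month))
# No TO clause: the entire result set is a single partition.
whole_result = running_total(MARKS, lambda day: None)

for day in sorted(by_month):
    print(day, by_month[day], whole_result[day])
# 2013-05-01 300 300
# 2013-06-01 500 800
# 2013-07-01 500 1300
```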
Returns a numeric value per partition based on the output type of the aggregate_expression.
aggregate_expression
Required. An expression containing an aggregate or window function. Simple aggregate
functions such as COUNT, AVG, SUM, MIN, and MAX are supported. Window functions
such as RANK, DENSE_RANK, and NTILE are supported and can only be used in
conjunction with ROLLUP.
Complex aggregate functions such as STDDEV and VARIANCE are not supported.
WHERE input_group_condition
The WHERE clause limits the group of input rows over which to compute the aggregate
expression. The input group condition is a Boolean (true or false) condition defined
using a comparison operator expression. Any row that does not satisfy the condition will
be excluded from the input group used to calculate the aggregated measure value. For
example (note that datetime values must be specified in yyyy-MM-dd format):
WHERE Date.Date BETWEEN 2012-06-01 AND 2012-07-31
WHERE Date.Year BETWEEN 2009 AND 2013
WHERE Company LIKE("Plat*")
WHERE Code IN("a","b","c")
WHERE Sales < 50.00
WHERE Age >= 21
You can specify multiple WHERE clauses in a ROLLUP expression.
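The effect of a WHERE clause can be pictured as a filter applied to the input group before the aggregate runs. This hypothetical Java sketch (the Row type, its fields, and sumWhere are illustrative names, not Platfora APIs) mimics ROLLUP SUM(Sales) WHERE Age >= 21 on a few made-up rows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

/**
 * Hypothetical sketch (not Platfora code) of ROLLUP SUM(Sales) WHERE condition:
 * rows that fail the condition are dropped from the input group before aggregating.
 */
final class WhereFilterSketch {

    private WhereFilterSketch() {}

    /** Hypothetical input row with only the two fields this example needs. */
    static final class Row {
        final int age;
        final double sales;

        Row(int age, double sales) {
            this.age = age;
            this.sales = sales;
        }
    }

    /** Sums sales over the rows that satisfy the condition; other rows are excluded. */
    static double sumWhere(List<Row> rows, Predicate<Row> condition) {
        double total = 0.0;
        for (Row row : rows) {
            if (condition.test(row)) {
                total += row.sales;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(new Row(18, 40.0), new Row(25, 60.0), new Row(30, 10.0));
        // Analogue of WHERE Age >= 21: the 18-year-old's row is not counted
        System.out.println(sumWhere(rows, r -> r.age >= 21)); // prints 70.0
    }
}
```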
TO ([partitioning_columns])
The TO clause is used to specify the dimension column(s) used to partition a group of
input rows. This allows you to calculate a measure value for a specific dimension group
(a subset of input rows) in relation to the other dimension groups used in a
visualization (all input rows). You can define an empty group (meaning all rows) by
using empty parentheses.
When used in a visualization, measure values are computed for groups of input rows that
return the same value for the columns specified in the partitioning list. For example, if the
Date.Month column is used as a partitioning column, then all records that have the same
value for Date.Month will be grouped together in order to calculate the measure value.
The aggregate expression is applied to the group specified in the TO clause independently
of the other dimension groupings used in the visualization. Note that the partitioning
column(s) specified in the TO clause of an adaptive measure expression must also be
included as dimensions (or grouping columns) in the visualization.
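As a rough model of the TO clause, the sketch below groups hypothetical rows by a partition value (here a month string), sums each group, and then reports that group total for every mark in the partition, regardless of the other viz dimensions. The names PartitionSketch and sumByPartition and the data are assumptions for illustration; Platfora computes this inside its own query engine.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical sketch (not Platfora code) of ROLLUP SUM(Sales) TO (Month):
 * rows sharing the same partition value are summed together, and every row
 * (mark) in that partition then reports the partition total.
 */
final class PartitionSketch {

    private PartitionSketch() {}

    /** Returns one total per distinct partition key, in first-seen order. */
    static Map<String, Double> sumByPartition(String[] partitionKeys, double[] sales) {
        Map<String, Double> totals = new LinkedHashMap<>();
        for (int i = 0; i < sales.length; i++) {
            totals.merge(partitionKeys[i], sales[i], Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        String[] month = {"May", "May", "June"};        // partition column (hypothetical data)
        String[] region = {"North", "South", "North"};  // another viz dimension
        double[] sales = {100, 200, 50};
        Map<String, Double> totals = sumByPartition(month, sales);
        for (int i = 0; i < sales.length; i++) {
            // The rolled-up value ignores Region and depends only on Month
            System.out.println(month[i] + "/" + region[i] + " -> " + totals.get(month[i]));
        }
        // prints May/North -> 300.0, May/South -> 300.0, June/North -> 50.0 (one per line)
    }
}
```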
A partitioning column can also be the name of a reference field. Using a reference field allows the
partition criteria to dynamically adapt based on any field of the referenced dataset that is used in a viz.
For example, if the partition column is a reference field pointing to the Date dimension, then any subfield of Date (Date.Year, Date.Month, etc.) can be used as the partitioning column by selecting it in a
viz.
A TO clause with an empty partitioning list treats each mark in the result set as an input
group. For example, if the viz includes the Month and Region columns, then TO() would
be equivalent to TO(Month,Region).
ORDER BY (ordering_column)
The optional ORDER BY clause orders the input rows using the values in the specified
column within each partition identified in the TO clause. Use the ORDER BY clause
along with the ROWS or RANGE clauses to define windows over which to compute the
aggregate function. This is useful for computing moving averages, cumulative aggregates,
running totals, or a top value per group of input rows. The ordering column specified in the
ORDER BY clause can be a dimension, measure, or an aggregate expression (for example
ORDER BY (SUM(Sales))). If the ordering column is a dimension, it must be included in
the viz.
By default, rows are sorted in ascending order (low to high values). You can use the DESC
keyword to sort in descending order (high to low values).
ROWS | RANGE
Required when using ORDER BY. Further limits the rows within the partition by
specifying start and end points within the partition. This is done by specifying a range of
rows with respect to the current row either by logical association (RANGE) or physical
association (ROWS). Use either a ROWS or RANGE clause to express the window
boundary (the set of input rows in each partition, relative to the current row, over which to
compute the aggregate expression). The window boundary can include one, several, or all
rows of the partition.
When using the RANGE clause, the ordering column used in the ORDER BY clause must
be a sub-column of a reference to Platfora's built-in Date dimension dataset.
window_boundary
A window boundary is required when using either ROWS or RANGE. This defines the set
of rows, relative to the current row, over which to compute the aggregate expression. The
row order is based on the ordering specified in the ORDER BY clause.
A PRECEDING clause defines a lower window boundary (the number of rows to include
before the current row). The FOLLOWING clause defines an upper window boundary
(the number of rows to include after the current row). The window boundary expression
must include either a PRECEDING or FOLLOWING clause, or both. If PRECEDING
is omitted, the current row is considered the first row in the window. Similarly, if
FOLLOWING is omitted, the current row is considered the last row in the window. The
UNBOUNDED keyword includes all rows in the direction specified. When you need to
specify both a start and end of a window, use the BETWEEN and AND keywords.
For example:
ROWS 2 PRECEDING means that the window is three rows in size, starting with two
rows preceding until and including the current row.
ROWS BETWEEN 2 PRECEDING AND 5 FOLLOWING means that the window is eight
rows in size, starting with two rows preceding, the current row, and five rows following
the current row. The current row is included in the set of rows by default.
You can exclude the current row from the window by specifying a window start and end
point before or after the current row. For example:
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING starts the window
with all rows that come before the current row, and ends the window one row before the
current row, thereby excluding the current row from the window.
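The window boundary rules above can be modeled with relative offsets. In this hypothetical Java sketch, negative offsets stand for n PRECEDING, positive offsets for n FOLLOWING, 0 for the current row, and two sentinel constants stand for UNBOUNDED. It assumes the partition's values are already sorted by the ORDER BY column, models only ROWS (physical) frames, not RANGE, and returns null for an empty window by analogy with SQL; that last choice is an assumption, not documented Platfora behavior.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of a ROWS window frame.
 * Offsets are relative to the current row: -2 stands for "2 PRECEDING",
 * 5 for "5 FOLLOWING", and 0 for the current row.
 */
final class WindowSketch {

    static final int UNBOUNDED_PRECEDING = Integer.MIN_VALUE;
    static final int UNBOUNDED_FOLLOWING = Integer.MAX_VALUE;

    private WindowSketch() {}

    /**
     * For each row of one ordered partition, sums the values whose positions fall
     * between startOffset and endOffset (inclusive) relative to that row.
     * An empty window yields null (assumed, by analogy with SQL).
     */
    static Double[] windowSum(double[] values, int startOffset, int endOffset) {
        int n = values.length;
        Double[] result = new Double[n];
        for (int i = 0; i < n; i++) {
            // long arithmetic keeps i + UNBOUNDED_* from overflowing;
            // the window is clamped to the edges of the partition
            long lo = Math.max(0L, (long) i + startOffset);
            long hi = Math.min(n - 1L, (long) i + endOffset);
            if (lo > hi) {
                continue;                 // empty window: leave null
            }
            double sum = 0.0;
            for (long j = lo; j <= hi; j++) {
                sum += values[(int) j];
            }
            result[i] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] yearly = {10, 20, 30, 40};   // hypothetical yearly sales, in year order
        // ROWS UNBOUNDED PRECEDING (running total including the current row)
        System.out.println(Arrays.toString(windowSum(yearly, UNBOUNDED_PRECEDING, 0)));
        // prints [10.0, 30.0, 60.0, 100.0]
        // ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING (current row excluded)
        System.out.println(Arrays.toString(windowSum(yearly, UNBOUNDED_PRECEDING, -1)));
        // prints [null, 10.0, 30.0, 60.0]
        // ROWS 2 PRECEDING (three-row window)
        System.out.println(Arrays.toString(windowSum(yearly, -2, 0)));
        // prints [10.0, 30.0, 60.0, 90.0]
    }
}
```

In this sketch, a window that would extend past the start or end of the partition is simply truncated, which is why the first rows of the ROWS 2 PRECEDING case sum fewer than three values.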
Calculate the percentage of flight records in the same departure date period. Note that the
departure_date field is a reference to the Date dataset, meaning that the group to which the
measure is applied can adapt to any downstream field of departure_date (departure_date.Year,
departure_date.Month, and so on). When used in a viz, this will calculate the percentage of flights for
each dimension group in the viz that share the same value for departure_date:
100 * COUNT(Flights) / ROLLUP COUNT(Flights) TO ([Departure Date])
Normalize the number of flights using the carrier American Airlines (AA) as the benchmark. This will
allow you to compare the number of flights for other carriers against the fixed baseline number of flights
for AA (if AA = 100 percent, then all other carriers will fall either above or below that percentage):
100 * COUNT(Flights) / ROLLUP COUNT(Flights) WHERE [Carrier Code]="AA"
Calculate a generic percentage of total sales. When this measure is used in a visualization, it will show
the percentage of total sales that a mark in the viz is contributing to the total for all marks in the viz. The
input rows depend on the dimensions selected in the viz.
100 * SUM(sales) / ROLLUP SUM(sales) TO ()
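The two ratio examples above reduce to simple arithmetic once the per-mark aggregates are known. This hypothetical Java sketch takes those per-mark values as input. For the benchmark case it assumes, for simplicity, that the AA count is one of the marks in the viz; in Platfora, the WHERE clause computes the AA count from the input rows.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of the two ratio patterns:
 *   100 * SUM(sales) / ROLLUP SUM(sales) TO ()        -> percent of total
 *   100 * COUNT(x) / ROLLUP COUNT(x) WHERE key = "AA" -> percent of a baseline
 * Inputs are per-mark aggregate values already computed for a viz.
 */
final class PercentSketch {

    private PercentSketch() {}

    /** Each mark's share of the sum over all marks, as a percentage. */
    static double[] percentOfTotal(double[] markValues) {
        double total = 0.0;
        for (double v : markValues) {
            total += v;
        }
        return scale(markValues, total);
    }

    /** Each mark relative to the mark at baselineIndex (baseline = 100). */
    static double[] percentOfBaseline(double[] markValues, int baselineIndex) {
        return scale(markValues, markValues[baselineIndex]);
    }

    private static double[] scale(double[] values, double denominator) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // Guard against division by zero; NaN marks an undefined ratio
            out[i] = denominator == 0.0 ? Double.NaN : 100.0 * values[i] / denominator;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(percentOfTotal(new double[] {50, 150, 300})));
        // prints [10.0, 30.0, 60.0]
        double[] flights = {200, 150, 250};  // hypothetical counts; index 0 plays the role of AA
        System.out.println(Arrays.toString(percentOfBaseline(flights, 0)));
        // prints [100.0, 75.0, 125.0]
    }
}
```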
Calculate the cumulative total of sales for a given year on a month-to-month basis (year-to-month sales
totals):
ROLLUP SUM(sales) TO (Date.Year) ORDER BY (Date.Month) ROWS UNBOUNDED PRECEDING
Calculate the cumulative total of sales (for all input rows) for all previous years, but exclude the current
year from the total.
ROLLUP SUM(sales) TO () ORDER BY (Date.Year) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
DENSE_RANK
DENSE_RANK is a windowing aggregate function that orders rows by a measure value and assigns a
rank number to each row in the given partition. Rank positions are not skipped in the event of a tie.
DENSE_RANK must be used within a ROLLUP expression.
ROLLUP DENSE_RANK()
TO ([partitioning_column])
ORDER BY (measure_expression [ASC | DESC])
ROWS|RANGE { window_boundary | BETWEEN window_boundary AND window_boundary }
where window_boundary can be one of:
UNBOUNDED PRECEDING
value PRECEDING
value FOLLOWING
UNBOUNDED FOLLOWING
DENSE_RANK is a window aggregate function used to assign a ranking number to each row in a group.
If multiple rows have the same ranking value (there is a tie), then the tied rows are given the same rank
value and subsequent rank positions are not skipped.
The TO clause of the ROLLUP is used to specify the dimension column(s) used to partition a group of
input rows. To define a global ranking that can adapt to any dimension groupings used in a viz, specify
an empty TO clause.
The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are
ranked. The ORDER BY clause should specify the measure field for which you want to calculate the
ranks. The ranked rows in the partition are numbered starting at one.
For example, suppose you have a dataset with the following rows and columns, and you want to rank the
Quarters and Regions according to the values in the Sales column.
Quarter   Region   Sales
2010 Q1   North    100
2010 Q1   South    200
2010 Q1   East     300
2010 Q1   West     400
2010 Q2   North    400
2010 Q2   South    250
2010 Q2   East     150
2010 Q2   West     250
Assuming the lens has an existing measure field called Sales(Sum), you could then define a measure
called Sales_Dense_Rank using the following expression:
ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
When you include the Quarter, Region, and Sales_Dense_Rank columns in the viz, you get the
following data points. Notice that tied values are given the same rank number and no rank positions are
skipped:
Quarter   Region   Sales_Dense_Rank
2010 Q1   North    6
2010 Q1   South    4
2010 Q1   East     2
2010 Q1   West     1
2010 Q2   North    1
2010 Q2   South    3
2010 Q2   East     5
2010 Q2   West     3
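The ranking in the table can be checked with a short Java sketch of dense ranking: collect the distinct values, sort them, and give each row the position of its value counted from the largest. DenseRankSketch is a hypothetical name, the sketch covers only the global TO () case, and it is not Platfora's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of
 * ROLLUP DENSE_RANK() TO () ORDER BY (measure DESC): ties share a rank
 * and no rank positions are skipped. Returns longs, mirroring the LONG type.
 */
final class DenseRankSketch {

    private DenseRankSketch() {}

    static long[] denseRankDesc(double[] values) {
        // Distinct values in ascending order, e.g. [100, 150, 200, 250, 300, 400]
        double[] distinct = Arrays.stream(values).distinct().sorted().toArray();
        long[] ranks = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            int ascendingIndex = Arrays.binarySearch(distinct, values[i]);
            // Largest value -> rank 1, next distinct value -> rank 2, and so on
            ranks[i] = distinct.length - ascendingIndex;
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Sales(Sum) for Q1 North, South, East, West, then Q2 North, South, East, West
        double[] sales = {100, 200, 300, 400, 400, 250, 150, 250};
        System.out.println(Arrays.toString(denseRankDesc(sales)));
        // prints [6, 4, 2, 1, 1, 3, 5, 3]
    }
}
```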
Returns a value of type LONG.
ROLLUP
Required. DENSE_RANK must be used within a ROLLUP expression in place of the
aggregate_expression of the ROLLUP.
The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the
window function. An empty TO calculates the window function over all rows in the query as one group.
The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.
Rank the sum of all sales in descending order, so the highest sales value is given a rank of 1.
ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
Rank the sum of all sales within a given quarter in descending order, so the highest sales value in each quarter
is given a rank of 1.
ROLLUP DENSE_RANK() TO (Quarter) ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
NTILE
NTILE is a windowing aggregate function that divides a partitioned group of rows into the specified
number of buckets, and returns the bucket number to which the current row belongs. NTILE must be
used within a ROLLUP expression.
ROLLUP NTILE(integer)
TO ([partitioning_column])
ORDER BY (measure_expression [ASC | DESC])
ROWS|RANGE { window_boundary | BETWEEN window_boundary AND window_boundary }
where window_boundary can be one of:
UNBOUNDED PRECEDING
value PRECEDING
value FOLLOWING
UNBOUNDED FOLLOWING
NTILE is a window aggregate function typically used to calculate percentiles. A percentile (or centile)
is a measure used in statistics indicating the value below which a given percentage of records in a group
falls. For example, the 20th percentile is the value (or score) below which 20 percent of the records may
be found. The term percentile is often used in the reporting of test scores. For example, if a score is in
the 86th percentile, it is higher than 86% of the other scores. The 25th percentile is also known as the
first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as
the third quartile (Q3). In general, percentiles, deciles and quartiles are specific types of ntiles.
NTILE must be used within a ROLLUP expression in place of the aggregate_expression
of the ROLLUP.
The TO clause of the ROLLUP is used to specify a fixed dimension column used to partition a group of
input rows. To define a global NTILE ranking that can adapt to any dimension groupings used in a viz,
specify an empty TO clause.
The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are
divided into buckets. The ORDER BY clause should specify the measure field for which you want to
calculate NTILE bucket values. A centile would be 100 buckets, a decile would be 10 buckets, a quartile
4 buckets, and so on. The buckets in the partition are numbered starting at one.
For example, suppose you have a dataset with the following rows and columns, and you want to divide
the year-to-date sales into four buckets (quartiles), with the highest quartile ranked as 1 and the
lowest ranked as 4. Assuming a measure field called Sum_YTD_Sales has been defined as
SUM([Sales YTD]), you could then define a measure called YTD_Sales_Quartile using the following
expression:
ROLLUP NTILE(4) TO () ORDER BY (Sum_YTD_Sales DESC) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
Name      Gender   Sales YTD    YTD_Sales_Quartile
Chen      F        3,500,000    1
John      M        3,100,000    1
Pete      M        2,900,000    1
Daria     F        2,500,000    2
Jennie    F        2,200,000    2
Mary      F        2,100,000    2
Mike      M        1,900,000    3
Brian     M        1,700,000    3
Molly     F        1,500,000    3
Theresa   F        1,200,000    4
Hans      M        900,000      4
Ben       M        500,000      4
Because the TO clause of the ROLLUP expression is empty, the quartile partitioning adapts to whatever
dimensions are used in the viz. For example, if you include the Gender dimension field in the viz, the
quartiles are then computed per gender. In the following example, each gender has 6 year-to-date sales
values. The two extra values (the remainder of 6 / 4) are allocated to buckets 1 and 2, which therefore
have one more value than buckets 3 and 4.
Name      Gender   Sales YTD    YTD_Sales_Quartile (partitioned by Gender)
Chen      F        3,500,000    1
Daria     F        2,500,000    1
Jennie    F        2,200,000    2
Mary      F        2,100,000    2
Molly     F        1,500,000    3
Theresa   F        1,200,000    4
John      M        3,100,000    1
Pete      M        2,900,000    1
Mike      M        1,900,000    2
Brian     M        1,700,000    2
Hans      M        900,000      3
Ben       M        500,000      4
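The bucket sizes described above follow the usual NTILE rule: with n rows and k buckets, each bucket gets n / k rows and the first n % k buckets get one extra row. The hypothetical Java sketch below applies that rule to one partition, ordering rows by value from highest to lowest. It reproduces the per-gender quartiles in the table, but it illustrates the rule rather than Platfora's implementation, and how Platfora orders tied values is not specified here (the sketch keeps ties in input order).

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of NTILE(k) ... ORDER BY (measure DESC)
 * over one partition. The first (n % k) buckets receive one extra row.
 */
final class NtileSketch {

    private NtileSketch() {}

    static long[] ntileDesc(final double[] values, int buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        int n = values.length;
        Integer[] order = new Integer[n];
        for (int i = 0; i < n; i++) {
            order[i] = i;
        }
        // Stable sort of row indices, highest value first (ties keep input order)
        Arrays.sort(order, (a, b) -> Double.compare(values[b], values[a]));

        int base = n / buckets;            // minimum rows per bucket
        int remainder = n % buckets;       // number of buckets that get one extra row
        int rowsInLargerBuckets = remainder * (base + 1);

        long[] result = new long[n];
        for (int pos = 0; pos < n; pos++) {
            int bucket = pos < rowsInLargerBuckets
                    ? pos / (base + 1)
                    : remainder + (pos - rowsInLargerBuckets) / base;
            result[order[pos]] = bucket + 1; // buckets are numbered from 1
        }
        return result;
    }

    public static void main(String[] args) {
        // Female sales YTD in millions: Chen, Daria, Jennie, Mary, Molly, Theresa
        double[] female = {3.5, 2.5, 2.2, 2.1, 1.5, 1.2};
        System.out.println(Arrays.toString(ntileDesc(female, 4))); // prints [1, 1, 2, 2, 3, 4]
    }
}
```

When there are fewer rows than buckets (base is 0), every row falls in the "larger bucket" branch, so each row simply gets its own bucket and no division by zero occurs.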
Returns a value of type LONG.
ROLLUP
Required. NTILE must be used within a ROLLUP expression in place of the
aggregate_expression of the ROLLUP.
The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the
window function. An empty TO calculates the window function over all rows in the query as one group.
The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.
integer
Required. An integer that specifies the number of buckets to divide the partitioned rows into.
Perhaps the most common use case for NTILE is to get a global ranking of result rows. For example,
if you wanted the percentile of Total Records per City, you might think the expression to use is:
ROLLUP NTILE(100) TO (City) ORDER BY ([Total Records] DESC) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
However, by leaving the TO clause empty, the percentile buckets can adapt to whatever dimension(s)
you use in the viz. To calculate the Total Records percentiles by City, you could define a global
Total_Records_Percentiles measure and then use this measure together with the City dimension in
the viz (or with any other dimension).
ROLLUP NTILE(100) TO () ORDER BY ([Total Records] DESC) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
RANK
RANK is a windowing aggregate function that orders rows by a measure value and assigns a rank number
to each row in the given partition. Rank positions are skipped in the event of a tie. RANK must be used
within a ROLLUP expression.
ROLLUP RANK()
TO ([partitioning_column])
ORDER BY (measure_expression [ASC | DESC])
ROWS|RANGE { window_boundary | BETWEEN window_boundary AND window_boundary }
where window_boundary can be one of:
UNBOUNDED PRECEDING
value PRECEDING
value FOLLOWING
UNBOUNDED FOLLOWING
RANK is a window aggregate function used to assign a ranking number to each row in a group. If
multiple rows have the same ranking value (there is a tie), then the tied rows are given the same rank
value and the subsequent rank position is skipped.
The TO clause of the ROLLUP is used to specify the dimension column(s) used to partition a group of
input rows. To define a global ranking that can adapt to any dimension groupings used in a viz, specify
an empty TO clause.
The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are
ranked. The ORDER BY clause should specify the measure field for which you want to calculate the
ranks. The ranked rows in the partition are numbered starting at one.
For example, suppose you have a dataset with the following rows and columns, and you want to rank the
Quarters and Regions according to the values in the Sales column.
Quarter   Region   Sales
2010 Q1   North    100
2010 Q1   South    200
2010 Q1   East     300
2010 Q1   West     400
2010 Q2   North    400
2010 Q2   South    250
2010 Q2   East     150
2010 Q2   West     250
Assuming the lens has an existing measure field called Sales(Sum), you could then define a measure
called Sales_Rank using the following expression:
ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
When you include the Quarter, Region, and Sales_Rank columns in the viz, you get the following
data points. Notice that tied values are given the same rank number and the rank positions 2 and 5 are
skipped:
Quarter   Region   Sales_Rank
2010 Q1   North    8
2010 Q1   South    6
2010 Q1   East     3
2010 Q1   West     1
2010 Q2   North    1
2010 Q2   South    4
2010 Q2   East     7
2010 Q2   West     4
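Standard ranking can be expressed as one plus the number of rows with a strictly greater value, which automatically gives tied rows the same rank and skips the positions that follow. This hypothetical, quadratic-time Java sketch checks the table above for the global TO () case; RankSketch is an illustrative name and the code is not Platfora's implementation.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of
 * ROLLUP RANK() TO () ORDER BY (measure DESC): ties share a rank and
 * the following rank positions are skipped.
 */
final class RankSketch {

    private RankSketch() {}

    static long[] rankDesc(double[] values) {
        long[] ranks = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            long strictlyGreater = 0;
            for (double other : values) {
                if (other > values[i]) {
                    strictlyGreater++;
                }
            }
            ranks[i] = strictlyGreater + 1;  // tied rows get the same count, hence the same rank
        }
        return ranks;
    }

    public static void main(String[] args) {
        // Sales(Sum) for Q1 North, South, East, West, then Q2 North, South, East, West
        double[] sales = {100, 200, 300, 400, 400, 250, 150, 250};
        System.out.println(Arrays.toString(rankDesc(sales))); // prints [8, 6, 3, 1, 1, 4, 7, 4]
    }
}
```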
Returns a value of type LONG.
ROLLUP
Required. RANK must be used within a ROLLUP expression in place of the
aggregate_expression of the ROLLUP.
The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the
window function. An empty TO calculates the window function over all rows in the query as one group.
The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.
Rank the sum of all sales in descending order, so the highest sales value is given a rank of 1.
ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
Rank the sum of all sales within a given quarter in descending order, so the highest sales value in each quarter
is given a rank of 1.
ROLLUP RANK() TO (Quarter) ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING
ROW_NUMBER
ROW_NUMBER is a windowing aggregate function that assigns a unique, sequential number to each row
in a group (partition) of rows, starting at 1 for the first row in each partition. ROW_NUMBER must be used
within a ROLLUP expression, which acts as a modifier for ROW_NUMBER. Use the ORDER BY clause of
the ROLLUP to specify the column that determines the order in which rows are numbered.
ROLLUP ROW_NUMBER()
TO ([partitioning_column])
ORDER BY (ordering_column [ASC | DESC])
ROWS|RANGE { window_boundary | BETWEEN window_boundary AND window_boundary }
where window_boundary can be one of:
UNBOUNDED PRECEDING
value PRECEDING
value FOLLOWING
UNBOUNDED FOLLOWING
For example, suppose you have a dataset with the following rows and columns:
Quarter   Region   Sales
2010 Q1   North    100
2010 Q1   South    200
2010 Q1   East     300
2010 Q1   West     400
2010 Q2   North    400
2010 Q2   South    250
2010 Q2   East     150
2010 Q2   West     250
Suppose you want to assign a unique ID to the sales of each region by quarter, in descending order of sales. In
this example, a measure field called Sum_Sales is defined with the expression SUM(Sales). You could
then define a measure called SalesNumber using the following expression:
ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING
When you include the Quarter, Region, and SalesNumber columns in the viz, you get the following data
points:
Quarter   Region   SalesNumber
2010 Q1   North    4
2010 Q1   South    3
2010 Q1   East     2
2010 Q1   West     1
2010 Q2   North    1
2010 Q2   South    2
2010 Q2   East     4
2010 Q2   West     3
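ROW_NUMBER differs from RANK in that tied rows still receive distinct numbers. In this hypothetical Java sketch, each row's number is one plus the count of rows in the same partition that sort ahead of it. Ties are broken by input order, which is an assumption: the tie order Platfora uses (for example, South before West in 2010 Q2) is not documented here.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch (not Platfora code) of
 * ROLLUP ROW_NUMBER() TO (partition) ORDER BY (measure DESC).
 * Tied values are numbered in input order (an assumed tie-break).
 */
final class RowNumberSketch {

    private RowNumberSketch() {}

    static long[] rowNumberDesc(String[] partitions, double[] values) {
        long[] numbers = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            long ahead = 0; // rows in the same partition that are numbered before row i
            for (int j = 0; j < values.length; j++) {
                if (j == i || !partitions[j].equals(partitions[i])) {
                    continue;
                }
                boolean higher = values[j] > values[i];
                boolean tiedButEarlier = values[j] == values[i] && j < i; // assumed tie-break
                if (higher || tiedButEarlier) {
                    ahead++;
                }
            }
            numbers[i] = ahead + 1;
        }
        return numbers;
    }

    public static void main(String[] args) {
        String[] quarter = {"2010 Q1", "2010 Q1", "2010 Q1", "2010 Q1",
                            "2010 Q2", "2010 Q2", "2010 Q2", "2010 Q2"};
        double[] sumSales = {100, 200, 300, 400, 400, 250, 150, 250}; // N, S, E, W per quarter
        System.out.println(Arrays.toString(rowNumberDesc(quarter, sumSales)));
        // prints [4, 3, 2, 1, 1, 2, 4, 3]
    }
}
```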
Returns a value of type LONG.
None. ROW_NUMBER() takes no arguments.
Assign a unique ID to the sales of each region by quarter in descending order, so the highest sales value in each quarter is
given row number 1.
ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING
User Defined Functions (UDFs)
User defined functions (UDFs) allow you to define your own per-row processing logic, and then expose
that functionality to users in the Platfora application expression builder.
User defined functions can only be used to implement new row functions, not
aggregate functions. If a computed field that uses a UDF is included in a lens, the
UDF is executed once for each row during the lens build process. Keep this in mind
when writing UDF Java programs so that your code does not negatively impact lens
build resources or execution times.
Writing a Platfora UDF Java Program
User defined functions (UDFs) are written in the Java programming language and implement the
Platfora-provided Java interface, com.platfora.udf.UserDefinedFunction.
Verify that any JAR file that the UDF will use is compatible with the existing libraries Platfora uses.
You can find those libraries in $PLATFORA_HOME/lib.
To define a user defined function for Platfora, you must have the Java Development Kit (JDK) version 6
or 7 installed on the machine where you plan to do your development.
You will also need the com.platfora.udf.UserDefinedFunction interface Java code from
your Platfora master server installation. If you go to the $PLATFORA_HOME/tools/udf directory of
your Platfora master server installation, you will find two files:
• platfora-udf.jar – This is the compiled code for the
com.platfora.udf.UserDefinedFunction interface. You must link to this jar file (place it
in the CLASSPATH) when you compile your UDF Java program.
• /com/platfora/udf/UserDefinedFunction.java – This is the source code for the
Java interface that your UDF classes need to implement. The source code is provided as reference
documentation of the Platfora UserDefinedFunction interface. You can refer to this file when
writing your UDF Java programs.
1. Copy the file $PLATFORA_HOME/tools/udf/platfora-udf.jar to a directory on the
machine where you plan to develop and compile your UDF program.
2. Write a Java program that implements com.platfora.udf.UserDefinedFunction interface.
For example, here is a sample Java program that defines a REPEAT_STRING user defined function.
This simple function repeats an input string a specified number of times.
import java.util.List;

/**
 * Sample user-defined function implementation that demonstrates
 * how to create a REPEAT_STRING function.
 */
public class RepeatString implements com.platfora.udf.UserDefinedFunction {

    /**
     * Returns the name of the user-defined function.
     * The first character in the name must be a letter,
     * and subsequent characters must be either letters,
     * digits, or underscores. You cannot name your function
     * the same name as an existing Platfora
     * built-in function. Names are case-insensitive.
     */
    @Override
    public String getFunctionName() {
        return "REPEAT_STRING";
    }

    /**
     * Returns one of the following values, reflecting the
     * return type of the user-defined function:
     * DATETIME, DOUBLE, FIXED, INTEGER, LONG, or STRING.
     */
    @Override
    public String getReturnType() {
        return "STRING";
    }

    /**
     * Returns an array of Strings, one for each of the
     * input arguments to the user-defined function,
     * specifying the required data type for each argument.
     * The Strings should be of the following values:
     * DATETIME, DOUBLE, FIXED, INTEGER, LONG, STRING.
     */
    @Override
    public String[] getArgumentTypes() {
        return new String[] { "STRING", "INTEGER" };
    }

    /**
     * Returns a human-readable description of what the function
     * does, to be displayed to Platfora users in the
     * Expression Builder. May return null.
     */
    @Override
    public String getDescription() {
        return "The REPEAT_STRING function returns an input string repeated "
            + "a specified number of times.";
    }

    /**
     * Returns a human-readable description explaining the
     * value that the function returns, to be displayed to
     * Platfora users in the Expression Builder. May return null.
     */
    @Override
    public String getReturnValueDescription() {
        return "Returns one value per row of type STRING";
    }

    /**
     * Returns a human-readable example of the function syntax,
     * to be displayed to Platfora users in the Expression
     * Builder. May return null.
     */
    @Override
    public String getExampleUsage() {
        return "CONCAT(\"It's a \", REPEAT_STRING(\"Mad \",4), \" World\")";
    }

    /**
     * The compute method performs the actual work of evaluating
     * the user-defined function. The method should operate on the
     * argument values provided to calculate the function return value
     * and return a Java object of the appropriate type to represent
     * the return value. The following mapping describes the Java
     * object type that is used to represent each Platfora data type:
     * DATETIME -> java.util.Date
     * DOUBLE -> java.lang.Double
     * FIXED -> java.lang.Long
     * INTEGER -> java.lang.Integer
     * LONG -> java.lang.Long
     * STRING -> java.lang.String
     * Note on FIXED type: fixed-precision numbers in Platfora
     * are represented as Longs that have been scaled by a
     * factor of 10,000.
     *
     * In the event that the user-defined function encounters
     * invalid inputs, or the function return value is not defined
     * given the inputs provided, the compute method should return
     * null rather than throwing an exception. The compute method
     * should avoid throwing any exceptions.
     *
     * @param arguments The values of the function inputs.
     *
     * The entries in this list will match the specification
     * provided by the getArgumentTypes method in type, number, and order:
     * for example, if getArgumentTypes returned an array of
     * length 3 with the values STRING, DOUBLE, STRING, then
     * the arguments parameter will be a list of 3 Java
     * objects: a java.lang.String, a java.lang.Double, and a
     * java.lang.String. Any of the values within the
     * arguments List may be null.
     */
    @Override
    public String compute(List arguments) {
        // cast the inputs to the correct types
        final String toRepeat = (String) arguments.get(0);
        final Integer numberOfRepeats = (Integer) arguments.get(1);
        // check for invalid inputs
        if (toRepeat == null || numberOfRepeats == null || numberOfRepeats < 0) {
            return null;
        }
        // repeat the input string the specified number of times
        final StringBuilder builder = new StringBuilder();
        for (int i = 0; i < numberOfRepeats; i++) {
            builder.append(toRepeat);
        }
        return builder.toString();
    }
}
3. Compile your .java UDF program file into a .class file (make sure to link to the platfora-udf.jar file or place it in your Java CLASSPATH).
The target Java version must be Java 1.6. Compiling with a target of Java 1.7 will result in an error
when the UDF is used.
For example, to compile the RepeatString.java program using Java 1.6:
javac -source 1.6 -target 1.6 -cp platfora-udf.jar RepeatString.java
4. Create a Java archive file (.jar) containing your .class file.
For example:
jar cf repeat-string-udf.jar RepeatString.class
After you have written and compiled your UDF Java program, you must then install and enable it on the
Platfora master server. See Adding a UDF to the Platfora Expression Builder.
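The compute method comments note that FIXED values are passed in, and must be returned, as java.lang.Long values scaled by 10,000. If your UDF works with FIXED arguments or returns FIXED values, helpers like the hypothetical ones below (they are not part of the Platfora UDF interface) can convert between that representation and java.math.BigDecimal. The half-up rounding policy is an assumption, and the code deliberately uses only Java 6 APIs so it can be compiled with -target 1.6 alongside a UDF class.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.math.RoundingMode;

/**
 * Hypothetical helpers (not part of the Platfora UDF API) for the FIXED type,
 * which the UserDefinedFunction comments describe as a java.lang.Long scaled
 * by 10,000. Uses only Java 6 APIs to match the required compile target.
 */
final class FixedTypeSketch {

    /** 10,000 = 10^4, so a FIXED value carries four implied decimal places. */
    private static final int FIXED_SCALE = 4;

    private FixedTypeSketch() {}

    /** Scaled Long -> exact decimal (123456L -> 12.3456). Null stays null. */
    static BigDecimal fromFixed(Long fixed) {
        return fixed == null ? null : BigDecimal.valueOf(fixed.longValue(), FIXED_SCALE);
    }

    /**
     * Decimal -> scaled Long, rounding half-up to four places (an assumed policy).
     * Returns null instead of throwing when the input is null or the result does
     * not fit in a long, following the guidance that compute should avoid exceptions.
     */
    static Long toFixed(BigDecimal value) {
        if (value == null) {
            return null;
        }
        BigInteger unscaled = value.setScale(FIXED_SCALE, RoundingMode.HALF_UP).unscaledValue();
        return unscaled.bitLength() < 64 ? Long.valueOf(unscaled.longValue()) : null;
    }

    public static void main(String[] args) {
        System.out.println(fromFixed(Long.valueOf(123456L)));    // prints 12.3456
        System.out.println(toFixed(new BigDecimal("1.5")));      // prints 15000
        System.out.println(toFixed(new BigDecimal("0.00005")));  // prints 1
    }
}
```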
Adding a UDF to the Platfora Expression Builder
After you have written and compiled a user defined function (UDF) Java class, you must install your
class on the Platfora master server and enable it so that it can be seen and used in the Platfora expression
builder.
This task is performed on the Platfora master server.
Before you begin, you must have written and compiled a Java class for your user defined function. See
Writing a Platfora UDF Java Program.
1. Create a directory named extlib in the Platfora data directory on the Platfora master server.
For example:
$ mkdir $PLATFORA_DATA_DIR/extlib
2. Copy the Java archive (.jar) file containing your UDF class to the $PLATFORA_DATA_DIR/extlib directory on the Platfora master server.
For example:
$ cp repeat-string-udf.jar $PLATFORA_DATA_DIR/extlib/
3. Set the Platfora server configuration property, platfora.udf.class.names, so it contains
the name of your UDF Java class. If you have more than one class, separate the class names with a
comma.
For example, to set this property using the platfora-config command-line utility:
$ $PLATFORA_HOME/bin/platfora-config set --key platfora.udf.class.names --value RepeatString
4. Restart the Platfora server:
$ platfora-services restart
The user defined function will then be available for defining computed field expressions in the Add
Field dialog of the Platfora application.
Due to the way some web browsers cache JavaScript files, the newly added
function may not appear in the Functions list for up to 24 hours. However, the
function is immediately available for use and is recognized by the Expression autocomplete feature.
Regular Expression Reference
Regular expressions combine basic constructs to describe a string matching pattern, and they can vary
widely in complexity. This reference describes the most common regular expression matching patterns,
but is not a comprehensive list.
Regular expressions, also referred to as regex or regexp, are a standardized collection of special
characters and constructs used for matching strings of text. They provide a flexible and precise language
for matching particular characters, words, or patterns of characters.
Platfora regular expressions are based on the pattern matching syntax of the Java programming
language. For more in-depth information on writing valid regular expressions, refer to the Java regular
expression pattern documentation.
Platfora makes use of regular expressions in the following contexts:
• In computed field expressions that use the REGEX or REGEX_REPLACE functions.
• In PARTITION expression statements for event series processing computed fields.
• In the Regex file parser in data ingest.
• In the data source location path descriptor in data ingest.
• In lens filter expressions.
Regex Literal and Special Characters
The most basic form of regular expression pattern matching is the match of a literal character or string.
Regular expressions also have a number of special characters that affect the way a pattern is matched.
This section describes the regular expression syntax for referring to literal characters, special characters,
non-printable characters (such as a tab or a newline), and special character escaping.
The most basic form of pattern matching is the match of literal characters. For example, if the regular
expression is foo and the input string is foo, the match will succeed because the strings are identical.
Certain characters are reserved for special use in regular expressions. These special characters are often
called metacharacters. If you want to use special characters as literal characters, they must be escaped.
Character Name        Character   Reserved For
opening bracket       [           start of a character class
closing bracket       ]           end of a character class
hyphen                -           character ranges within a character class
backslash             \           general escape character
caret                 ^           beginning of string, negation of a character class
dollar sign           $           end of string
period                .           matching any single character
pipe                  |           alternation (OR) operator
question mark         ?           optional quantifier, quantifier minimizer
asterisk              *           zero or more quantifier
plus sign             +           one or more quantifier
opening parenthesis   (           start of a subexpression group
closing parenthesis   )           end of a subexpression group
opening brace         {           start of min/max quantifier
closing brace         }           end of min/max quantifier
There are two ways to force a special character to be treated as an ordinary character:
• Precede the special character with a \ (backslash character). For example, to specify an asterisk as a
literal character instead of a quantifier, use \*.
• Enclose the special character(s) within \Q (starting quote) and \E (ending quote). Everything
between \Q and \E is then treated as literal characters.
To escape literal double quotes in a REGEX() expression, double the double quotes (""). For example,
to extract the inches portion from a height field where example values are 6'2" and 5'11":
REGEX(height, "\'(\d+)""$")
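The following minimal sketch shows both escaping styles, plus the height example, using Java's java.util.regex package. It assumes that Platfora's regex syntax follows java.util.regex.Pattern, which the constructs in this section mirror; this guide does not confirm that, and the class and method names are hypothetical. Java string literals add their own layer of escaping, so the regex \* is written "\\*" in Java source, and the doubled quote ("") of a REGEX() expression becomes \" there.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of metacharacter escaping (assumes java.util.regex semantics). */
final class EscapingDemo {

    /** Returns true if the entire input matches the regex. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    /** Hypothetical helper: capturing group 1 of the first match, or null if nothing matches. */
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // \* is a literal asterisk; an unescaped * would be a quantifier.
        System.out.println(fullMatch("2\\*3", "2*3"));               // true
        // \Q...\E quotes everything in between, including . and *.
        System.out.println(fullMatch("\\Q*.*\\E", "*.*"));           // true
        System.out.println(fullMatch("\\Q*.*\\E", "abc"));           // false
        // Pattern.quote builds an equivalent \Q...\E-quoted pattern.
        System.out.println(fullMatch(Pattern.quote("a+b"), "a+b"));  // true
        // Height example: (\d+) captures all inch digits.
        System.out.println(firstGroup("\\'(\\d+)\"$", "5'11\""));    // 11
        // A repeated group such as (\d)+ keeps only its last repetition.
        System.out.println(firstGroup("\\'(\\d)+\"$", "5'11\""));    // 1
    }
}
```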
You can use special character sequence constructs to specify non-printable characters in a regular
expression. Some of the most commonly used constructs are:
Construct   Matches
\n          newline character
\r          carriage return character
\t          tab character
\f          form feed character
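As a small illustration of these constructs, the sketch below splits tab-separated fields and mixed Unix/Windows line endings in Java. It assumes java.util.regex semantics for Platfora's regex syntax; the class and method names are hypothetical. In Java source the regex \t is written "\\t".

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Sketch: non-printable characters written as regex escapes (assumes java.util.regex). */
final class NonPrintableDemo {
    // The regex \t matches a tab character.
    private static final Pattern TAB = Pattern.compile("\\t");
    // \r?\n matches a Unix (\n) or Windows (\r\n) line ending.
    private static final Pattern LINE_END = Pattern.compile("\\r?\\n");

    static List<String> splitFields(String line) {
        return Arrays.asList(TAB.split(line));
    }

    static List<String> splitLines(String text) {
        return Arrays.asList(LINE_END.split(text));
    }

    public static void main(String[] args) {
        System.out.println(splitFields("id\tname\tage"));          // [id, name, age]
        System.out.println(splitLines("first\r\nsecond\nthird"));  // [first, second, third]
    }
}
```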
Regex Character Classes
A character class allows you to specify a set of characters, enclosed in square brackets, that can produce
a single character match. There are also a number of special predefined character classes (backslash
character sequences that are shorthand for the most common character sets).
A character class matches only a single character. For example, gr[ae]y matches gray or
grey, but not graay or graey. The order of the characters inside the brackets does not matter.
You can use a hyphen inside a character class to specify a range of characters. For example, [a-z]
matches a single lower-case letter between a and z. You can also use more than one range, or a
combination of ranges and single characters. For example, [0-9X] matches a numeric digit or the letter
X. Again, the order of the characters and the ranges does not matter.
A caret following an opening bracket specifies characters to exclude from a match. For example,
[^abc] will match any character except a, b, or c.
Construct       Type           Description
[abc]           simple         matches a, b, or c
[^abc]          negation       matches any character except a, b, or c
[a-zA-Z]        range          matches a through z, or A through Z (inclusive)
[a-d[m-p]]      union          matches a through d, or m through p
[a-z&&[def]]    intersection   matches d, e, or f
[a-z&&[^xq]]    subtraction    matches a through z, except for x and q
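The sketch below runs each construct from the table against a short input in Java, collecting every match. It assumes Platfora's syntax follows java.util.regex.Pattern (the union, intersection, and subtraction forms above are Java-style); the helper findAll is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: character class constructs (assumes java.util.regex semantics). */
final class CharClassDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("gr[ae]y", "gray grey graay"));  // [gray, grey]
        System.out.println(findAll("[0-9X]", "A1X9z"));            // [1, X, 9]
        System.out.println(findAll("[^abc]", "abcd"));             // [d]
        System.out.println(findAll("[a-d[m-p]]", "aemz"));         // [a, m]    union
        System.out.println(findAll("[a-z&&[def]]", "abcdefg"));    // [d, e, f] intersection
        System.out.println(findAll("[a-z&&[^xq]]", "xyqz"));       // [y, z]    subtraction
    }
}
```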
Predefined character classes offer convenient shorthands for commonly used regular expressions.
Construct   Description                                                               Example
.           matches any single character (except newline)                             .at matches "cat", "hat", and also "bat" in the phrase "batch files"
\d          matches any digit character (equivalent to [0-9])                         \d matches "3" in "C3PO" and "2" in "file_2.txt"
\D          matches any non-digit character (equivalent to [^0-9])                    \D matches "S" in "900S" and "Q" in "Q45"
\s          matches any single white-space character (equivalent to [ \t\n\x0B\f\r])  \sbook matches " book" (the space plus "book") in "blue book" but nothing in "notebook"
\S          matches any single non-white-space character                              \Sbook matches "ebook" in "notebook" but nothing in "blue book"
\w          matches any word character: a letter, digit, or underscore (equivalent to [A-Za-z0-9_])   r\w* matches "rm" and "root"
\W          matches any non-word character (equivalent to [^A-Za-z0-9_])              \W matches "&" in "stmd &", "%" in "100%", and "$" in "$HOME"
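The following Java sketch checks several of the table's examples, and shows that the character matched by \s or \S is part of the match. It assumes Platfora's syntax follows java.util.regex.Pattern; the helper findAll is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: predefined character classes (assumes java.util.regex semantics). */
final class PredefinedClassDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll(".at", "cat hat batch files"));  // [cat, hat, bat]
        System.out.println(findAll("\\d", "C3PO file_2.txt"));      // [3, 2]
        System.out.println(findAll("\\D", "900S"));                 // [S]
        // The white-space (or non-white-space) character is included in the match.
        System.out.println(findAll("\\sbook", "blue book"));       // [ book]
        System.out.println(findAll("\\sbook", "notebook"));        // []
        System.out.println(findAll("\\Sbook", "notebook"));        // [ebook]
        System.out.println(findAll("r\\w*", "rm root"));           // [rm, root]
        System.out.println(findAll("\\W", "100%"));                // [%]
    }
}
```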
POSIX has a set of character classes that denote certain common ranges. They are similar to bracket and
predefined character classes, except they take into account the locale (the local language/coding system).
Construct     Description                            Equivalent
\p{Lower}     a lower-case alphabetic character      [a-z]
\p{Upper}     an upper-case alphabetic character     [A-Z]
\p{ASCII}     an ASCII character                     [\x00-\x7F]
\p{Alpha}     an alphabetic character                [a-zA-Z]
\p{Digit}     a decimal digit                        [0-9]
\p{Alnum}     an alphanumeric character              [a-zA-Z0-9]
\p{Punct}     a punctuation character, one of        !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}     a visible character                    [\p{Alnum}\p{Punct}]
\p{Print}     a printable character                  [\p{Graph}\x20]
\p{Blank}     a space or tab                         [ \t]
\p{Cntrl}     a control character                    [\x00-\x1F\x7F]
\p{XDigit}    a hexadecimal digit                    [0-9a-fA-F]
\p{Space}     a white-space character                [ \t\n\x0B\f\r]
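A short Java sketch of a few POSIX classes follows, assuming Platfora's syntax follows java.util.regex.Pattern; the helper names are hypothetical. One Java-specific detail worth knowing: in java.util.regex these classes are US-ASCII only by default, and the Pattern.UNICODE_CHARACTER_CLASS flag (Java 7 and later) widens them. Whether Platfora sets that flag is not stated in this guide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: POSIX character classes (assumes java.util.regex semantics). */
final class PosixClassDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** True if input is exactly one \p{Lower} character under the given Pattern flags. */
    static boolean isLower(String input, int flags) {
        return Pattern.compile("\\p{Lower}", flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(findAll("\\p{Upper}", "Hello World"));  // [H, W]
        System.out.println(findAll("\\p{Punct}", "(a+b)!"));       // [(, +, ), !]
        System.out.println(findAll("\\p{XDigit}+", "#FF00zz"));    // [FF00]
        // Lower-case e with an acute accent: ASCII-only by default in Java.
        System.out.println(isLower("\u00e9", 0));                                // false
        System.out.println(isLower("\u00e9", Pattern.UNICODE_CHARACTER_CLASS));  // true
    }
}
```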
Regex Line and Word Boundaries
Boundary matching constructs are used to specify where in a string to apply a matching pattern. For
example, you can search for a particular pattern within a word boundary, or search for a pattern at the
beginning or end of a line.
Construct   Description                                                                       Example
^           matches at the beginning of a line (multi-line matches are currently not supported)   ^172 matches the "172" in IP address "172.18.1.11" but not in "192.172.2.33"
$           matches at the end of a line (multi-line matches are currently not supported)         d$ matches the "d" in "maid" but not in "made"
\b          matches at a word boundary                                                        \bis\b matches the word "is" in "this is my island", but not the "is" part of "this" or "island"; \bis matches both the word "is" and the "is" in "island", but not the "is" in "this"
\B          matches at a non-word boundary                                                    \Bb matches "b" in "sbin" but not in "bash"
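The sketch below reproduces the boundary examples in Java, reporting match start positions where several matches of the same text are possible. It assumes java.util.regex semantics without the MULTILINE flag, consistent with the note that multi-line matches are not supported; the helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: line and word boundaries (assumes java.util.regex, no MULTILINE flag). */
final class BoundaryDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    /** Hypothetical helper: start index of every match. */
    static List<Integer> starts(String regex, String input) {
        List<Integer> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.start());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("^172", "172.18.1.11"));   // [172]
        System.out.println(findAll("^172", "192.172.2.33"));  // []
        System.out.println(findAll("d$", "maid"));            // [d]
        System.out.println(findAll("d$", "made"));            // []
        String s = "this is my island";
        System.out.println(starts("\\bis\\b", s));            // [5]      the word "is" only
        System.out.println(starts("\\bis", s));               // [5, 11]  "is" and "island"
        System.out.println(findAll("\\Bb", "sbin"));          // [b]
        System.out.println(findAll("\\Bb", "bash"));          // []
    }
}
```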
Regex Quantifiers
Quantifiers specify how often the preceding regular expression construct should match. There are three
classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and
possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the
initial attempt does not produce a match.
By default, quantifiers are greedy. A greedy quantifier will first try for a match with the entire input
string. If that produces a match, then the match is considered a success, and the engine can move on to
the next construct in the regular expression. If the first try does not produce a match, the engine backs off one character at a time until a match is found. So a greedy quantifier checks for possible matches in
order from the longest possible input string to the shortest possible input string, recursively trying from
right to left.
Adding a ? (question mark) to a greedy quantifier makes it reluctant. A reluctant quantifier will first try
for a match from the beginning of the input string, starting with the shortest possible piece of the string
that matches the regex construct. If that produces a match, then the match is considered a success, and
the engine can move on to the next construct in the regular expression. If the first try does not produce
a match, the engine adds one character at a time until a match is found. So a reluctant quantifier checks
for possible matches in order from the shortest possible input string to the longest possible input string,
recursively trying from left to right.
Adding a + (plus sign) to a greedy quantifier makes it possessive. A possessive quantifier is like a greedy
quantifier on the first attempt (it tries for a match with the entire input string). The difference is that
unlike a greedy quantifier, a possessive quantifier does not retry a shorter string if a match is not found.
If the initial match fails, the possessive quantifier reports a failed match. It does not make any more
attempts.
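The difference between the three kinds of quantifier shows up clearly when the same pattern is run with each one. This Java sketch uses the input "xfooxxxxxxfoo" and assumes java.util.regex semantics; the helper findAll is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: greedy vs. reluctant vs. possessive (assumes java.util.regex semantics). */
final class QuantifierKindsDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* first takes the whole string, then backs off until "foo" fits.
        System.out.println(findAll(".*foo", input));   // [xfooxxxxxxfoo]
        // Reluctant: .*? starts empty and grows one character at a time.
        System.out.println(findAll(".*?foo", input));  // [xfoo, xxxxxxfoo]
        // Possessive: .*+ takes everything and never gives any back, so "foo" never matches.
        System.out.println(findAll(".*+foo", input));  // []
    }
}
```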
Greedy   Reluctant   Possessive   Description                                                                               Example
?        ??          ?+           matches the previous character or construct once or not at all                            st?on matches "son" in "johnson" and "ston" in "johnston", but nothing in "clinton" or "version"
*        *?          *+           matches the previous character or construct zero or more times                            if* matches "if", "iff" in "diff", or "i" in "print"
+        +?          ++           matches the previous character or construct one or more times                             if+ matches "if", "iff" in "diff", but nothing in "print"
{n}      {n}?        {n}+         matches the previous character or construct exactly n times                               o{2} matches "oo" in "lookup" and the first two o's in "fooooo", but nothing in "mount"
{n,}     {n,}?       {n,}+        matches the previous character or construct at least n times                              o{2,} matches "oo" in "lookup" and all five o's in "fooooo", but nothing in "mount"
{n,m}    {n,m}?      {n,m}+       matches the previous character or construct at least n times, but no more than m times   F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"
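The greedy examples from the quantifier table can be checked with a few lines of Java. The sketch assumes java.util.regex semantics and lists every non-overlapping match, which also shows what a second match picks up (for example, the two F's left over after F{2,4} takes the first four). The helper findAll is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: quantifier table examples (assumes java.util.regex semantics). */
final class QuantifierTableDemo {
    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("st?on", "johnson johnston clinton version"));  // [son, ston]
        System.out.println(findAll("if*", "diff print"));  // [iff, i]
        System.out.println(findAll("if+", "print"));       // []
        System.out.println(findAll("o{2}", "fooooo"));     // [oo, oo]
        System.out.println(findAll("o{2,}", "fooooo"));    // [ooooo]
        System.out.println(findAll("F{2,4}", "#FFFFFF"));  // [FFFF, FF]
        // The reluctant form stops at the minimum count each time.
        System.out.println(findAll("F{2,4}?", "#FFFFFF")); // [FF, FF, FF]
    }
}
```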
Regex Capturing Groups
Groups are specified by a pair of parentheses around a subpattern in the regular expression. By placing
part of a regular expression inside parentheses, you group that part of the regular expression together.
This allows you to apply regex operators and quantifiers to the entire group at once. Besides grouping
part of a regular expression together, parentheses also create a capturing group. Capturing groups are
used to determine which matching values to save or return from your regular expression.
A regular expression can have more than one group and the groups can be nested. The groups are
numbered 1-n from left to right, starting with the first opening parenthesis. There is always an implicit
group 0, which contains the entire match. For example, the pattern:
(a(b*))+(c)
contains three groups:
group 1: (a(b*))
group 2: (b*)
group 3: (c)
By default, a group captures the text that it matches: the portion of the string matched by the grouped
subexpression is stored in memory for later retrieval or use, for example as a backreference.
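To see the numbering in practice, the Java sketch below matches (a(b*))+(c) against "abbabc" and prints groups 0 through 3. It assumes java.util.regex semantics; the class and helper names are hypothetical. Note that group 1 sits inside a repeated construct, so it holds only the text from the last repetition.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: capturing group numbering (assumes java.util.regex semantics). */
final class GroupNumberingDemo {
    /** Hypothetical helper: groups 0..groupCount() of a full match, or null if there is no match. */
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return null;
        }
        String[] result = new String[m.groupCount() + 1];
        for (int g = 0; g <= m.groupCount(); g++) {
            result[g] = m.group(g);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("(a(b*))+(c)", "abbabc");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group " + i + ": " + g[i]);
        }
        // group 0: abbabc   (the implicit whole-match group)
        // group 1: ab       (last repetition of (a(b*)))
        // group 2: b
        // group 3: c
    }
}
```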
Capturing Groups and the Regex Line Parser
When you choose the Regex line parser during the Parse Data phase of the data ingest process,
Platfora uses capturing groups to determine what parts of the regular expression to return as columns.
The Regex line parser applies the user-supplied regular expression against each line in the source file,
and returns each capturing group in the regular expression as a column value.
For example, suppose you had user records in a file, and the lines were formatted like this:
Name: John Smith Address: 123 Main St. Age: 25 Comment: Active
Name: Sally R. Jones Address: 2 E. El Camino Real Age: 32
Name: Rod Rogers Address: 55 Elm Street Comment: Suspended
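You could use the following regular expression to extract the Full Name, Last Name, Address, Age, and
Comment column values. Because not every line contains every field, the Age and Comment portions are
optional non-capturing groups, and the address group is reluctant (.*?) so that it stops before them:
Name: (.*\s(\p{Alpha}+)) Address:\s+(.*?)(?:\s+Age:\s+([0-9]+))?(?:\s+Comment:\s+(.*))?$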
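The sketch below approximates the Regex line parser in Java: it applies one pattern to each sample line and returns one column per capturing group. It is not Platfora's implementation. It assumes java.util.regex semantics, that unmatched optional groups become empty (null) columns, and that a non-matching line yields no row; parseLine is a hypothetical name. The pattern makes Age optional because the third sample line has no Age field, and a pattern that requires Age would not match that line.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of a regex line parser (assumes java.util.regex; not Platfora's implementation). */
final class LineParserDemo {
    // Groups: 1 Full Name, 2 Last Name, 3 Address, 4 Age (optional), 5 Comment (optional).
    static final Pattern LINE = Pattern.compile(
        "Name: (.*\\s(\\p{Alpha}+)) Address:\\s+(.*?)"
        + "(?:\\s+Age:\\s+([0-9]+))?(?:\\s+Comment:\\s+(.*))?$");

    /** Hypothetical stand-in for the parser: one column per capturing group, or null if no match. */
    static String[] parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] columns = new String[m.groupCount()];
        for (int g = 1; g <= m.groupCount(); g++) {
            columns[g - 1] = m.group(g); // null when an optional group did not participate
        }
        return columns;
    }

    public static void main(String[] args) {
        String[] lines = {
            "Name: John Smith Address: 123 Main St. Age: 25 Comment: Active",
            "Name: Sally R. Jones Address: 2 E. El Camino Real Age: 32",
            "Name: Rod Rogers Address: 55 Elm Street Comment: Suspended"
        };
        for (String line : lines) {
            System.out.println(Arrays.toString(parseLine(line)));
        }
        // [John Smith, Smith, 123 Main St., 25, Active]
        // [Sally R. Jones, Jones, 2 E. El Camino Real, 32, null]
        // [Rod Rogers, Rogers, 55 Elm Street, null, Suspended]
    }
}
```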
Capturing Groups and the REGEX Function
The REGEX function can be used to extract a portion of a string value. For the REGEX function, only the
value of the first capturing group is returned. For example, suppose you want to match email address
strings of the form username@provider.domain, but return only the provider portion of the
address from the email field:
REGEX(email,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")
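The "first capturing group only" behavior described above can be emulated in Java as follows. This is a sketch assuming java.util.regex semantics; regexExtract is a hypothetical stand-in for REGEX(), and returning null when nothing matches is an assumption, not documented Platfora behavior. Because the provider group is greedy and includes dots, an address with a multi-part domain returns everything before the last dot.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch emulating REGEX(value, regex) (assumes java.util.regex semantics). */
final class RegexFunctionDemo {
    static final String EMAIL =
        "^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\\.[a-zA-Z]{2,4}$";

    /** Hypothetical helper: group 1 of the first match; null if no match (assumed). */
    static String regexExtract(String value, String regex) {
        Matcher m = Pattern.compile(regex).matcher(value);
        if (!m.find() || m.groupCount() < 1) {
            return null;
        }
        return m.group(1);
    }

    public static void main(String[] args) {
        System.out.println(regexExtract("jsmith@gmail.com", EMAIL));      // gmail
        System.out.println(regexExtract("joe@mail.example.org", EMAIL));  // mail.example
        System.out.println(regexExtract("not an address", EMAIL));        // null
    }
}
```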
Capturing Groups and the REGEX_REPLACE Function
The REGEX_REPLACE function is used to match a string value, and replace matched strings with
another value. The REGEX_REPLACE function takes three arguments: an input string, a matching regex,
and a replacement regex. Capturing groups can be used to capture backreferences (see Backreferences),
but do not control what portions of the match are returned (the entire match is always returned).
Backreferences allow you to capture and reuse a subexpression match inside the same regular
expression. You can reuse a capturing group as a backreference by referring to its group number
preceded by a backslash (for example, \1 refers to capturing group 1, \2 refers to capturing group 2,
and so on).
For example, if you wanted to match a pair of HTML tags and their enclosed text, you could capture the
opening tag into a backreference, and then reuse it to match the corresponding closing tag:
(<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\2>)
This regular expression contains two capturing groups: the outermost capturing group, which captures
the entire match, and one which captures the string matched by [A-Z][A-Z0-9]* into backreference
number two. This backreference can then be reused with \2 (backslash two) to match the corresponding
closing HTML tag. As written, the character classes match only upper-case tag names such as <B> or <H1>.
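The following Java sketch applies the tag-pair pattern and collects the tag name captured in group 2, showing that \2 forces the closing tag to repeat the opening tag's name. It assumes java.util.regex semantics; tagNames is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: backreference \2 pairing HTML tags (assumes java.util.regex semantics). */
final class BackreferenceDemo {
    static final Pattern TAG_PAIR =
        Pattern.compile("(<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\2>)");

    /** Hypothetical helper: the tag name (group 2) of every matched tag pair. */
    static List<String> tagNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = TAG_PAIR.matcher(html);
        while (m.find()) {
            names.add(m.group(2));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(tagNames("<B>bold</B> and <I>italic</I>"));  // [B, I]
        System.out.println(tagNames("<H1>Title</H1>"));                 // [H1]
        // The closing tag must repeat the captured name, so this pair is rejected.
        System.out.println(tagNames("<B>mismatched</I>"));              // []
        // Upper-case only as written; lower-case tags are not matched.
        System.out.println(tagNames("<b>lower</b>"));                   // []
    }
}
```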
When a replacement string refers to the capturing groups of a matching regular expression, the
backreference syntax is slightly different: the group number is preceded by a dollar sign instead of a
backslash (for example, $1 refers to capturing group 1 of the matching expression). The REGEX_REPLACE
function uses this syntax, because it takes two regular expressions: one for the matching string, and
one for the replacement string.
The following example matches the values in a phone_number field where phone number values are
formatted as xxx.xxx.xxxx, and replaces them with phone number values formatted as (xxx) xxx-xxxx.
Notice the backreferences in the replacement expression; they refer to the capturing groups of the
previous matching expression:
REGEX_REPLACE(phone_number,"([0-9]{3})\.([0-9]{3})\.([0-9]{4})","\($1\) $2-$3")
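Java's String.replaceAll uses the same $n convention for group references in the replacement, and treats a backslash in the replacement as escaping the next character, so the phone example carries over almost directly. This sketch assumes REGEX_REPLACE behaves like replaceAll (every match replaced, non-matching values returned unchanged), which this guide does not confirm; formatPhone is a hypothetical name.

```java
/** Sketch emulating the REGEX_REPLACE phone example with String.replaceAll (assumed equivalent). */
final class RegexReplaceDemo {
    static String formatPhone(String phone) {
        // Groups 1-3 capture the digit blocks; $1..$3 reinsert them.
        // \\( and \\) in Java source are the escaped parentheses \( and \) of the replacement.
        return phone.replaceAll("([0-9]{3})\\.([0-9]{3})\\.([0-9]{4})", "\\($1\\) $2-$3");
    }

    public static void main(String[] args) {
        System.out.println(formatPhone("555.123.4567"));  // (555) 123-4567
        System.out.println(formatPhone("555-123-4567"));  // 555-123-4567 (no match, unchanged)
    }
}
```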
In some cases, you may want to use parentheses to group subpatterns, but not capture text. A non-capturing
group starts with (?: (a question mark and colon following the opening parenthesis). For
example, h(?:a|i|o)t matches hat or hit or hot, but does not capture the a, i, or o from the
subexpression.
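A quick way to confirm that (?: does not capture is to compare group counts. This Java sketch assumes java.util.regex semantics; the helper names are hypothetical. Matcher.groupCount() reports the number of capturing groups in the pattern, excluding the implicit group 0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch: non-capturing vs. capturing groups (assumes java.util.regex semantics). */
final class NonCapturingDemo {
    /** Hypothetical helper: number of capturing groups in a pattern. */
    static int groupCount(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    /** Hypothetical helper: all non-overlapping matches, left to right. */
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(findAll("h(?:a|i|o)t", "hat hit hot hut"));  // [hat, hit, hot]
        System.out.println(groupCount("h(?:a|i|o)t"));  // 0  (nothing captured)
        System.out.println(groupCount("h(a|i|o)t"));    // 1  (the vowel is captured)
    }
}
```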