Data Analysis and Visualization Guide
Version 4.5
Copyright Platfora 2015
Last Updated: 10:15 p.m. June 28, 2015

Contents

Document Conventions
Contact Platfora Support
Copyright Notices
Chapter 1: About Data Analysis in Platfora
    FAQs - Data Analysis and Analytics in Platfora
    Quantitative Analysis Concepts
    Event Series Analysis Concepts
Chapter 2: Get Going with Vizboards
    FAQs - Vizboard Basics
    The Vizboard Workspace
    How Interactive Viz Queries Work
    Turn Live Updates Off and On
Chapter 3: Get Going in Viz Builder
    About Drag and Drop Viz Building
    About the Different Viz Types
    About the Builder Drop Zones
    About the Viz Toolbar and Menus
    FAQs - Viz Builder Basics
Chapter 4: Get Data to Analyze
    About Lenses and Viz Types
    Choose a Lens
    Find Fields in a Lens
    Understand Lens Field Types and Roles
    Understand the Data in a Lens Field
        View the Data Lineage of a Field
        View the Definition of a Field
        View the Definition of a Segment Field
Chapter 5: Control the Marks on a Viz
    Encode Data Using Mark Drop Zones
    About Mark Types
        Use Bar Marks
        Use Point Marks
        Use Line Marks
        Use Area Marks
        Use Path Marks
        Use Polygon Marks
        Use Text Marks
    Adjust Mark Appearance
        Show Mark Outlines
        Adjust Mark Color
        Adjust Mark Size
        Adjust Mark Opacity
        Adjust Mark Shape
        Adjust Mark Labels
Chapter 6: Control Field Labels in a Viz
    Truncate Field Labels in a Viz
    Change Field Labels Width in a Viz
    Change Field Display Names on an Axis
    Hide Field Name and Values on a Chart Axis
    Change the Number Formatting of a Measure
Chapter 7: Sort Viz Data
    Default Sort Behavior
    Change the Sort Order of a Dimension Axis
Chapter 8: Filter Viz Data
    FAQs - Viz Filters
    Add a Filter on a Field
        Filter on Dimension Fields
        Filter on Date Fields
        Filter on Measure Fields
    Create a Page Filter
    Toggle Filter Include/Exclude Mode
    Filter by Selection
    Filter by Limit
Chapter 9: Build a Chart Viz
    FAQs - Chart Visualizations
    About the Chart Viz Workspace
Chapter 10: About Chart Viz Axes
    Use Multiple Fields on an Axis
    Transpose the X and Y Axis
    Change Axis Options for Measure Data
        Change the Value Range of a Measure Axis
        Change the Scale of a Measure Axis
    Change Axis Options for Dimension Data
        Change the Type of a Dimension Axis
Chapter 11: Build a Cross-Tab Viz
    FAQs - Cross-Tab Visualizations
    Enable Cross-Tab Totals
Chapter 12: Build a Polar Chart Viz
    FAQs - Polar Chart Visualizations
    About the Polar Chart Viz Workspace
Chapter 13: Build a Geo Map Viz
    FAQs - Geo Map Visualizations
    About the Geo Map Viz Workspace
Chapter 14: Build a Funnel Viz
    FAQs - Funnel Analysis Visualizations
    About Event Series Analysis
    About the Funnel Analysis Viz Workspace
    Define and Analyze Funnel Stages
    Analyze Funnel Stages Across Dimensions
Chapter 15: Explore Marks in a Viz
    Select and Highlight Marks on a Viz
    Understand Data Values Not Displayed in Viz
    View the Data Values for a Mark
    Zoom and Pan in a Viz
    Drill Down Through Dimension Fields
        About Drilling Down
        Drill Down FAQ
        Drill Down on a Field Value in a Chart Axis
        Drill Down on a Viz Mark
        Drill Down on a Cross-Tab Cell
        View a Drill Path in a Viz
        Drill Up
Chapter 16: Prepare Pages and Dashboards
    FAQs - Vizboard Pages
    Resize a Page to Fit the Browser Window
    Show and Hide Tool Panels
    Manage Viz Layout
    Edit a Visualization
    Arrange Visualizations on a Vizboard Page
    Preview a Vizboard with View Only Permission
Chapter 17: Share and Collaborate
    Set Vizboard Permissions
        View Vizboard with View Only Permission
    Manage Vizboard Comments
        FAQs - Vizboard Comments
        Create a Comment on a Viz
    Share a Link to a Vizboard
    Export Viz Data
    Export a Viz Image
    Email a Single Viz as an Image
    Share Vizboard as a PDF
        FAQs - Vizboard PDFs
        Export a Vizboard as a PDF Manually
        Email a Vizboard as a PDF Manually
        Email a Vizboard as a PDF on a Schedule
        How Platfora Renders a Vizboard as a PDF
    Export a Viz as a New Dataset
Chapter 18: Request or Derive Additional Lens Fields
    Vizboard Computed Fields
        FAQs - Vizboard Computed Fields
        Add a Vizboard Computed Field
    Combined Fields
        About Combined Fields
        Create Combined Field
    Request Additional Lens Data
        Create a New Lens From Viz
    Segments
        FAQs - Segments
        Create Segments
Chapter 19: Save Your Work in a Vizboard
    Manage Vizboard Versions
    Restore a Vizboard to a Previous Version
    Exit a Vizboard without Saving
    Using Undo and Redo in a Vizboard
    Duplicate a Vizboard
Chapter 20: Trace the Data Lineage of Viz Fields
    Export Viz Data Lineage
    What Data Lineage Includes
    Interpret Data Lineage Levels
Chapter 21: Viz Example Gallery
    Axis Chart Viz Examples
        Chart Type: Simple Bar
        Chart Type: Bars with Different Color Values
        Chart Type: Stacked Bar
        Chart Type: Split Bar with Values
        Chart Type: Bar with Variable Widths
        Chart Type: Point Plot
        Chart Type: Scatter Plot
        Chart Type: Color Encoded Scatter Plot
        Chart Type: Bubble Chart
        Chart Type: Color Encoded Bubble Chart
        Chart Type: Gradient Grouped Scatter Plot
        Chart Type: Shape Encoded Scatter Plot
        Chart Type: Heatmap
        Chart Type: Size Encoded Heatmap
        Chart Type: Size Encoded Matrix
        Chart Type: Line Chart
        Chart Type: Multi-Series Line Chart
        Chart Type: Color Encoded Multi-Series Line Chart
        Chart Type: Variable Color Line Chart
        Chart Type: Variable Thickness Line Chart
    Non-Axis Chart Viz Examples
        Chart Type: Packed Bubbles
        Chart Type: Packed Bubbles with Different Colors
        Chart Type: Text Gauge
        Chart Type: Word Cloud
    Polar Chart Viz Examples
        Polar Chart Type: Donut
        Polar Chart Type: Size Encoded Donut
        Polar Chart Type: Pie
    GeoMap Viz Examples
        Chart Type: Simple Geo Map
        Chart Type: Color-Encoded Geo Map
        Chart Type: Size-Encoded Geo Map
    Cross-Tab Viz Examples
        Cross-Tab Type: Simple
        Cross-Tab Type: With Dimensional Groupings
        Cross-Tab Type: Show Totals
Chapter 22: Platfora Expressions
    Expression Building Blocks
        Functions in an Expression
        Operators in an Expression
        Fields in an Expression
        Literal Values in an Expression
    PARTITION Expressions and Event Series Processing (ESP)
        How Event Series Processing Works
        Best Practices for Event Series Processing (ESP)
    ROLLUP Measures and Window Expressions
        Understand ROLLUP Measures
        Understand ROLLUP Window Expressions
    Computed Field Examples
    Troubleshoot Computed Field Errors
    Write a Lens Query
    FAQs - Expression Basics
    Expression Language Reference
        Expression Quick Reference
        Comparison Operators
        Logical Operators
        Arithmetic Operators
        Conditional and NULL Processing
        Event Series Processing
        String Functions
        URL Functions
        IP Address Functions
        Date and Time Functions
        Math Functions
        Data Type Conversion Functions
        Aggregate Functions
        ROLLUP and Window Functions
        User Defined Functions (UDFs)
        Regular Expression Reference

Preface

This guide provides information and instructions for analyzing data in Platfora®. This guide is intended for data analysts and business analysts who are responsible for exploring data, finding insights, and building dashboards and reports. Knowledge of business intelligence and data analysis is recommended.
Document Conventions

This documentation uses certain text conventions for language syntax and code examples.

Convention: $
Usage: Command-line prompt. Precedes a command to be entered in a command-line terminal session.
Example: $ ls

Convention: $ sudo
Usage: Command-line prompt for a command that requires root permissions (commands will be prefixed with sudo).
Example: $ sudo yum install open-jdk-1.7

Convention: UPPERCASE
Usage: Function names and keywords are shown in all uppercase for readability, but keywords are case-insensitive (can be written in upper or lower case).
Example: SUM(page_views)

Convention: italics
Usage: Italics indicate a user-supplied argument or variable.
Example: SUM(field_name)

Convention: [ ] (square brackets)
Usage: Square brackets denote optional syntax items.
Example: CONCAT(string_expression[,...])

Convention: ... (ellipsis)
Usage: An ellipsis denotes a syntax item that can be repeated any number of times.
Example: CONCAT(string_expression[,...])

Contact Platfora Support

For technical support, you can send an email to: [email protected]

Or visit the Platfora support site for the most up-to-date product news, knowledge base articles, and product tips.

http://support.platfora.com

To access the support portal, you must have a valid support agreement with Platfora. Please contact your Platfora sales representative for details about obtaining a valid support agreement or with questions about your account.

Copyright Notices

Copyright © 2012-15 Platfora Corporation. All rights reserved.

Platfora believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” PLATFORA CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.
Use, copying, and distribution of any Platfora software described in this publication requires an applicable software license.

Platfora®, You Should Know™, Interest Driven Pipeline™, Fractal Cache™, and Adaptive Job Synthesis™ are trademarks of the Platfora Corporation. Apache Hadoop™ and Apache Hive™ are trademarks of the Apache Software Foundation. All other trademarks used herein are the property of their respective owners.

Embedded Software Copyrights and License Agreements

Platfora contains the following open source and third-party proprietary software subject to their respective copyrights and license agreements:
• Apache Hive PDK
• dom4j
• freemarker
• GeoNames
• Google Maps API
• javassist
• javax.servlet
• Mortbay Jetty 6.1.26
• OWASP CSRFGuard 3
• PostgreSQL JDBC 9.1-901
• Scala
• sjsxp : 1.0.1
• Unboundid

Chapter 1: About Data Analysis in Platfora

Got questions about the types of data analysis and analytics you can do in Platfora? Want to know how to go about analyzing data once you have it in a lens? This section explains how data analysts work with data in Platfora, and the major concepts of data analysis in Platfora.

Topics:
• FAQs - Data Analysis and Analytics in Platfora
• Quantitative Analysis Concepts
• Event Series Analysis Concepts

FAQs - Data Analysis and Analytics in Platfora

This topic answers the most frequently asked questions (FAQs) about data analysis and analytics in Platfora.

What is the difference between 'data analysis' and 'data analytics'?

The terms analysis and analytics are often used interchangeably, and indeed the difference between the two terms is subtle. Analysis refers to the process of analyzing data, whereas analytics refers to the technology and methodologies involved in analyzing data.
Basically, analysis and analytics perform the same function, but analytics refers to a specific application of statistical methodology or computer technology applied to data analysis.

What is 'big data analytics'?

Big data analytics refers to the collection of technologies and methodologies for processing and analyzing large amounts of data of many different types. The primary goal of big data analytics is to uncover hidden patterns, unknown correlations, and other useful information in order to make better business decisions.

Big data analytics allows business analysts and data scientists to analyze large volumes of transactional data, as well as other data sources that traditional data warehouses and business intelligence (BI) reporting programs cannot handle. These other data sources may include things like server logs, web clickstream data, social media reports, mobile device records, and machine-generated sensor data.

Platfora is an end-to-end big data analytics solution for high-volume, multi-structured data stored natively in Hadoop.

How does Platfora prepare data for analysis?

Platfora prepares data for analysis by building a Lens, Platfora's proprietary data storage object. Lenses contain pre-aggregated, compressed, columnar data that is optimized for interactive analysis. Lenses are loaded into memory on the Platfora servers as they are used. This makes the experience of building visualizations (lens queries) fast and responsive.

Analysts can build their own lenses by choosing fields of interest from a Dataset in the Platfora data catalog. Platfora datasets point to raw data in Hadoop. There are two kinds of lenses you can build in Platfora: an Aggregate Lens or an Event Series Lens (ESL). Aggregate lenses are the most common lens type and the most flexible in terms of the types of analysis you can do.

How can I tell where the data came from and how it was prepared?
Platfora can show the data lineage of any field in a lens, all the way back to the raw data file(s) in Hadoop. Analysts can see every transformation that happened to the data along the way. You might want to view data lineage to address any of the following questions about the data:
• How can I reproduce this result?
• Where did this data come from?
• How recent is the data?

What kinds of analysis can I do in Platfora?

The Platfora documentation uses two broad categories that correspond to how Platfora prepares the data for analysis. Using the two lens types offered by Platfora, there are many different kinds of analysis you can do, and many different analytics methodologies you can apply. These are just some of the general types of analysis possible:

• Quantitative Analysis
  When you build an Aggregate Lens in Platfora, you are preparing data for quantitative analysis. Data is pre-aggregated. This is a broad category that encompasses many different analysis techniques and methodologies, such as:
  • Descriptive Analysis - Describe the main features of a large collection of data.
  • Confirmatory Analysis - Confirm or negate a hypothesis.
  • Exploratory Analysis - Find previously unknown relationships in the data.
  • Inferential Analysis - Use a smaller sample of data to learn something about a bigger population.
  • Causal Analysis - Find out what happens to one variable when you change another.

• Event Series Analysis
  When you build an Event Series Lens in Platfora, you are preparing data for event series analysis (also known as Time Series Analysis or Behavioral Analysis). Data is not pre-aggregated; it is grouped by a common entity (such as user id) and ordered by time to facilitate searching for patterns across multiple event records and datasets.

What tools does Platfora provide for analyzing data?
Platfora has several tools for querying the prepared data in a lens, but the primary way to analyze data in Platfora is to use the Viz Builder tools in a Vizboard. By dragging fields to the various drop zones in the viz builder panel, you can dynamically generate lens queries that are rendered as Visualizations (charts, graphs, geo maps, summary tables, etc.).

In addition to the viz builder tools in the vizboard, Platfora has other, non-visual ways to access the data in a lens:

• Platfora Expression Language - Platfora has an extensive library of built-in functions that you can use to further manipulate data in a lens to achieve the results you want. You can use Platfora expressions to define the computed fields needed for your analysis.
• Programmatic Query Access - Using Platfora's SQL-like query language, you can submit queries to a Platfora lens using the REST API. This allows you to access data stored in Platfora from other analytics tools, such as R.
• Lens Data Export - You can also export the data in a lens back to HDFS or download portions of a lens to your desktop. This allows you to use data prepared by Platfora in other applications or data workflows.

What kinds of charts can I create in a Platfora vizboard?

Platfora's viz builder is very flexible and allows you to dynamically build many different kinds of charts and graphs (each of which is called a Visualization or Viz in Platfora). The types of charts available depend on how the data was prepared (the Platfora lens type) and on the combination of data fields used in the viz.
Some of the charts you can build with an aggregate lens are:

• Bar / Column Charts
• Histograms
• Line Charts
• Area Charts
• Heatmaps
• X/Y (Scatterplot) Charts
• Bubble Charts
• Pie or Doughnut Charts
• Text Gauges / Word Clouds
• Geo Maps
• Cross-Tabs (Pivot Tables)

With an event series lens, there is currently only one available chart type:

• Funnel Chart

Can Platfora do advanced analytics, such as predictive analytics?

Predictive analytics encompasses a variety of statistical techniques to analyze historical data and model patterns that can identify potential future risks or opportunities. When building models for predictive analysis, a lot of the work involves analyzing the historical data to look for relationships and patterns. Platfora allows data scientists to build aggregate data samples (lenses) for developing and validating their models in an iterative, self-service fashion.

Platfora provides an R connector to allow data scientists to build lenses in Platfora to sample data directly from Hadoop. They can then use R Studio to develop and test their predictive models against a Platfora lens.

Scoring a model involves running a job directly on the Hadoop cluster using a tool such as Revolution R or Hive. The output of the scored model is then written back to HDFS, where Platfora can analyze and visualize the final results.

Quantitative Analysis Concepts

This topic explains the format of data in an aggregate lens, the most common lens type in Platfora. Aggregate lenses are built to support quantitative data analysis. Data fields in an aggregate lens serve one of two key roles in an analysis: measure or dimension. The concepts of measures and dimensions are important to understand, as they impact how data is prepared and analyzed in Platfora.

Measures (The Quantitative Data)

The measure fields of a lens hold the data to be quantified.
Measure fields are always listed first in an aggregate lens, with a special icon to denote that they are measures. Measures provide the basis for quantitative analysis in a visualization or lens query.

A measure is a numeric value representing an aggregation of values from multiple rows. For example, measures contain data such as total dollar amounts, average number of users, count distinct of users, and so on. Measure values always result from an aggregate function. Examples of aggregate functions include COUNT, DISTINCT, AVG, SUM, MIN, MAX, VARIANCE, and so on.

In some data analysis tools, measures (or metrics as they are sometimes called) can be aggregated at the time of analysis because the amount of data to aggregate is relatively small. In Platfora, however, the data in a lens is pre-aggregated to optimize performance of big data queries. Therefore, you must decide how to aggregate the metrics of your dataset up front. You do this by defining measures either in the dataset or at lens build time. When you go to analyze the data in a vizboard, you can only do quantitative analysis on the measures you have available in the lens.

Measures answer 'how' questions about data, such as 'how many' or 'how long'.

Measure Type: Examples of Measure Fields
How much or how many: sum of all sales, count of distinct users, maximum unit price, minimum feedback score
How long: total minutes on a call, number of days between sales
To what degree: maximum number of users, minimum page hits

Measures are always numeric data that can be represented on a continuous scale. However, not all numbers in a dataset are continuous and measurable. For example, a zip code is a number, but not something that makes sense to add or average. It might make sense, however, to count how many distinct zip codes you have. Choosing the right fields in a dataset to serve as measures, and the right aggregations to apply to those fields, depends on how you want to analyze the data.
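To make the idea of pre-aggregation concrete, the following is a minimal Python sketch, not Platfora code. The sample rows, the field names (region, zip, user, amount), and the build_aggregates helper are all hypothetical. The sketch only illustrates how choosing measures up front determines what can be analyzed later.

```python
from collections import defaultdict

# Hypothetical raw rows, as they might appear in a source dataset.
RAW_SALES = [
    {"region": "US", "zip": "94105", "user": "u1", "amount": 20.0},
    {"region": "US", "zip": "94105", "user": "u2", "amount": 35.0},
    {"region": "US", "zip": "10001", "user": "u1", "amount": 5.0},
    {"region": "Europe", "zip": "75001", "user": "u3", "amount": 40.0},
]


def build_aggregates(rows, dimension):
    """Pre-aggregate rows into one record of measures per dimension value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row)

    aggregated = {}
    for key, members in groups.items():
        amounts = [m["amount"] for m in members]
        aggregated[key] = {
            # Measures chosen up front; only these survive aggregation.
            "total_sales": sum(amounts),
            "avg_sale": sum(amounts) / len(amounts),
            "max_sale": max(amounts),
            "distinct_users": len({m["user"] for m in members}),
            # A zip code is numeric but not summable; counting distinct
            # values is the aggregation that makes sense for it.
            "distinct_zips": len({m["zip"] for m in members}),
        }
    return aggregated


lens = build_aggregates(RAW_SALES, "region")
print(lens["US"])
```

In this sketch, questions about the five defined measures can be answered from the aggregated records alone. A measure that was not defined, such as the minimum sale amount, cannot be recovered without going back to the raw rows, which is why the choice of measures has to be made before analysis.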
Dimensions (The Categorical Data)

The dimension fields hold the categorical values by which you analyze the measure data. Dimensions are used to summarize, filter, and group quantitative data in order to analyze it from different perspectives. For example, a Product dimension can help you understand which products generate the most sales for your business. A Date dimension can show you the breakdown of sales by year, quarter, month, or day.

A dimension field is data that answers 'who', 'what', 'where', and 'when' questions.

Dimension Type: Examples of Dimension Fields
Who: customer name, gender, title, cookie, social security number
What: product type, call type, account type, product ID
Where: geo location, zip code, sales region, country, state
When: date, timestamp, month

Dimensions can be numeric, text, or date/time-based data. If you are familiar with SQL, dimensions are the values you would GROUP BY.

Event Series Analysis Concepts

This topic explains the different methods for doing event series analysis in Platfora. Event series analysis in Platfora involves partitioning events by some entity (such as a user), ordering event records by their timestamp, and then looking for interesting patterns of behavior. There are two ways to do this in Platfora: add special event series processing (ESP) computed fields to a single dataset, or build a special event series lens (ESL) that contains event records from one or more datasets.

Event Series Processing (ESP) Computed Fields

Time-stamped event data is very common in big data. With machine-generated data, a single application can generate millions of events in a single day. For example, when users visit a website, they generate events by clicking links or viewing pages. Each action they take is written to a log file along with the timestamp.
Event series processing (ESP) computed fields iterate over multiple rows in a single dataset to find interesting patterns. For example, suppose you wanted to know which visitors to your website viewed a product page, then added the product to their cart, then left the site without making a purchase. You could create an ESP computed field that looks at each visitor session, then outputs an 'abandoned cart' flag for each session that met those criteria.

ESP computed fields can only be created in a dataset using a special PARTITION expression. PARTITION expressions are very powerful and flexible, but can also be complicated to write. ESP computed fields are an advanced feature intended for data administrators. To analysts, ESP computed fields behave just like any other field in a dataset.

You should use ESP computed fields when:

• You want to analyze event series data in a single dataset only
• Your dataset has time-stamped event records
• You want to do visual analysis on the results (include the ESP field in a lens)
• You have complex patterns you want to evaluate
• You want to use the output of an ESP field in another calculation (for example, create a 'bounce rate' measure from the results of other ESP fields)

Entity-Centric Data Modeling and Event Series Lenses (ESL)

In some cases, a company may be tracking user events across several applications. For example, a web application tracks the pages the user visits, an ad server tracks which advertisements a user is shown, and a transactional system tracks the user's purchases. As long as all events have a common entity (the user), they can be combined into a single event series lens and analyzed together in Platfora.

In order to create an ESL, data administrators first have to model the event datasets around a common entity dataset.
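As a rough illustration of the ideas in this topic, the following Python sketch combines event records from three hypothetical sources around a common entity (user_id), orders each user's events by timestamp, and then flags the abandoned-cart pattern described above. It is not Platfora's PARTITION expression syntax or its lens build process. The record layout, event names, and helper functions are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical event records from three separate systems.
WEB_EVENTS = [
    {"user_id": "u1", "ts": 1, "event": "view_product"},
    {"user_id": "u2", "ts": 2, "event": "view_product"},
    {"user_id": "u1", "ts": 4, "event": "add_to_cart"},
    {"user_id": "u2", "ts": 5, "event": "add_to_cart"},
]
AD_EVENTS = [{"user_id": "u1", "ts": 0, "event": "ad_shown"}]
PURCHASES = [{"user_id": "u2", "ts": 9, "event": "purchase"}]


def build_event_series(*sources):
    """Partition events by user_id and order each partition by timestamp."""
    series = defaultdict(list)
    for source in sources:
        for record in source:
            series[record["user_id"]].append(record)
    for events in series.values():
        events.sort(key=lambda r: r["ts"])
    return dict(series)


def abandoned_cart(events):
    """Return True if a product view is followed by an add-to-cart
    and no purchase occurs after the add-to-cart."""
    names = [e["event"] for e in events]
    if "view_product" not in names:
        return False
    after_view = names[names.index("view_product") + 1:]
    if "add_to_cart" not in after_view:
        return False
    after_cart = after_view[after_view.index("add_to_cart") + 1:]
    return "purchase" not in after_cart


series = build_event_series(WEB_EVENTS, AD_EVENTS, PURCHASES)
flags = {user: abandoned_cart(events) for user, events in series.items()}
print(flags)
```

The sketch works only because every source shares the user_id field, which mirrors the requirement that event datasets be modeled around a common entity before they can be analyzed together.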
Once the data is modeled in this way, analysts can create event series lenses that include event records from multiple datasets. In a vizboard, analysts can then use an event series lens to do funnel analysis. Funnels are currently the only viz type available on event series lens data.

You should use event series lenses if:

• You want to analyze events across multiple datasets
• All of your event datasets have time-stamped event records
• All of your event datasets have a common entity field (such as a user ID)
• The common entity data can be modeled into a separate dataset with a primary key
• You are only interested in doing funnel analysis on your event data (Funnel is the only viz type currently supported for ESLs)
• You want to define segments of users based on different event criteria (for example, create a segment of users that visited your website and then called customer service)

Chapter 2
Get Going with Vizboards

In Platfora, the vizboard is where you create and manage visualizations around a particular project or subject area. Typically, you would create one vizboard for each data analysis project that you are working on. A vizboard can contain one or more pages and pages can contain one or more visualizations.

Topics:

• FAQs—Vizboard Basics
• The Vizboard Workspace
• How Interactive Viz Queries Work
• Turn Live Updates Off and On

FAQs—Vizboard Basics

A vizboard is the starting point for data analysis. This topic answers some frequently asked questions about vizboards.

What is a vizboard?

A vizboard is the starting point for data analysis, and can be thought of as a dashboard or project workspace. The vizboard is the canvas for discovering and sharing data insights. Vizboards have one or more pages, and each page can have one or more visualizations.
The individual visualizations on a vizboard page can be related (use the same underlying data), or unrelated (use completely different data).

What is a visualization (viz)?

A visualization (or viz for short) is a graphical representation of certain data fields chosen from the perspective of a Platfora lens. For more information, see FAQs—Viz Builder Basics.

Who can create and view a vizboard?

To create a vizboard a user must have the Analyst (Limited) role or higher and also have data access permission on a dataset in order to create a visualization. Any user can view a vizboard, but they can only view the viz data if they have data access permission on the data sources used in the viz.

How do I create a vizboard?

You can create a new, empty vizboard in several different ways. Each new vizboard contains one page with one empty visualization to help you get started.

• Click the Add Vizboard button from the Vizboards page.
• Click the Create Vizboard button from the data catalog of a particular dataset.

How do I rename a vizboard?

You can change the name of a vizboard at any time by opening it and changing its name from the vizboard title bar. All vizboards are given a default name of Untitled Vizboard when they are first created. You must have edit permissions on a vizboard in order to rename it.

How do I close a vizboard?

To close a vizboard, navigate off the page by clicking any other page in the top navigation header. If you don't want to lose your work, Save the vizboard before you exit.

There are so many vizboards, can I organize the vizboards I'm interested in?

Yes! You can apply labels to all objects, including vizboards, to more easily find them from the Vizboards page. For more information, see Organize Datasets, Lenses and Vizboards with Labels.
The Vizboard Workspace

The vizboard is where you create and manage visualizations around a particular business question, project, or subject area. A vizboard contains pages, and pages contain visualizations. Visualizations only exist in the context of a vizboard document.

From the vizboard you can easily navigate between pages to add or edit visualizations, and arrange the layout of visualizations on each page. You can also comment on visualizations within the vizboard and share your insights with other Platfora users.

Vizboard-level and page-level controls are located in the menus at the top of the page. Using the panel controls on the left and right side of the page, you can show and hide vizboard panels to toggle between exploratory mode and presentation mode.

1. Vizboard name
2. Vizboard-level and page-level controls
3. Edit lens button
4. Add menu for supplementing the lens
5. Choose type of viz, either Chart or Cross-Tab
6. Show/hide pages panel
7. Show/hide builder panels
8. Pages panel
9. Lens panel
10. Builder panel
11. Selected viz
12. Filters and legends panel
13. Show/hide filter panel
14. Show/hide comments

How Interactive Viz Queries Work

When you build a visualization in Platfora, you are actually creating queries. Dragging a lens field to a Builder drop zone creates a query that is sent to the Platfora server. The server fetches the requested lens data, and the results are then visually rendered as a Chart or graph. You can also see the results in a tabular, spreadsheet format by looking at the Cross-Tab of a viz.

When many people think of queries, they think of SQL (the structured query language used to access data stored in a relational database). Platfora queries are not written in SQL; they are constructed by a user's actions in the vizboard.
However, it may help to understand how lens data is requested and returned if we make a comparison with SQL. If a lens query were expressed in SQL, it would look something like this (the clauses are listed in the order that they are processed by the Platfora query engine):

    SELECT <dimensions in builder>, <measures in builder>
    FROM <lens>
    WHERE <dimensions in filters>
    GROUP BY <dimensions in builder>
    ROLLUP <rollup measures in builder>
    HAVING <measure filters>
    ORDER BY <dimension sort options>
    LIMIT <dimension limit options>

Note that the query is constructed based on the kind of field (measure or dimension), and where the fields are placed in the Builder drop zones.

Turn Live Updates Off and On

When working in a vizboard, you can choose to pause live updates for the vizboard. This allows you to conserve resources on the Platfora cluster by reducing the number of times the web application queries and renders data.

By default, when you make a change to a visualization (viz) in a vizboard, the Platfora web application updates the viz in real-time by querying the Platfora server and rendering the change in the viz. Modifying a viz uses resources on the Platfora cluster, such as memory, network bandwidth, and CPU cycles. Typically, lenses with more data use more resources during viz updates.

When to Turn Live Updates Off

You might want to pause live updates when you are working with a very large lens, high cardinality dimension fields, or a large number of unique data points. By default, each change to the builder or filter drop zones issues a new query. When working with large data, you may not want to wait for each change to finish rendering before you can make another change. For example, you might want to pause live updates while arranging multiple fields in the builder drop zones, then issue the query once you have all the fields and filters in place.
This way, the query is only executed one time and the application only renders the filtered data, not all of the data.

How to Turn Live Updates Off and On

To control live updates, use the Live Updates button in the vizboard page control menu. Turning off live updates pauses automatic updates for all visualizations in the vizboard. When live updates are turned off, the application does not query and render the data as you modify a viz. To update a viz after modifying it, you can either click Update Now in the viz, or turn on live updates again.

Chapter 3
Get Going in Viz Builder

This section describes some basic concepts about visualizations and the viz builder tools.

Topics:

• About Drag and Drop Viz Building
• About the Different Viz Types
• About the Builder Drop Zones
• About the Viz Toolbar and Menus
• FAQs—Viz Builder Basics

About Drag and Drop Viz Building

To begin building a visualization, drag fields from the data panel into one of the Builder drop zones. A good way to get started is to think of your business question and drag the associated fields to the X-axis and Y-axis drop zones. For example, if your question was "How much did we sell from each product line?" you could start by dragging the Product Line dimension to X-axis and the Total Sales measure to Y-axis. The X-axis drop zone in charts equates to the Columns drop zone and the Y-axis drop zone equates to the Rows drop zone in cross-tab visualizations.

When you select a field and start to drag it over, the appearance of the drop zones changes to help guide your placement of the field in the Builder panel. Some drop zones allow both measures and dimensions, some only allow measures, and some only allow dimensions. A grayed out drop zone means the drop zone does not apply to the selected field.
A highlighted drop zone means that the drop zone is available for the selected field.

The X-axis, Y-axis, Details, and Filter drop zones all allow multiple fields. Once these drop zones are populated with at least one field, the drop zone for adding additional fields appears as a thin blue line either above or below the currently populated field. In the X-axis and Y-axis drop zones, measures must always be placed below dimensions. The position of dimension fields in the X-axis and Y-axis drop zones determines the grouping order on the axes.

You can remove a field from a drop zone by clicking the orange X next to the field name. You can also drag a new field directly on top of an existing field to replace it.

About the Different Viz Types

Platfora offers a diverse range of tools to interactively explore and analyze your data in visualizations. The different types of visualizations allow analysts to perform different kinds of data analysis.

Table 1: Visualization Types

Viz Type: Chart
Analysis Type: quantitative
Lens Type: aggregate lens
Description: Allows you to do exploratory analysis of data that you graphically represent in chart form. Chart visualizations support several types of marks allowing you to create different types of charts, such as bars, lines, plots, text, and more. Typically most charts are displayed on an X-Y axis, but some chart types use no axis, such as a word cloud.

Viz Type: Cross-Tab
Analysis Type: quantitative
Lens Type: aggregate lens
Description: Allows you to view the data in the lens in a tabular, spreadsheet format.

Viz Type: Geo Map
Analysis Type: geographic, quantitative
Lens Type: aggregate lens
Description: Allows you to perform geographic analysis on an aggregate lens that contains geo-encoded location data. A geo map viz is similar to a scatterplot chart viz except the marks are displayed on a map background.

Viz Type: Polar Chart
Analysis Type: quantitative
Lens Type: aggregate lens
Description: Allows you to do exploratory analysis of data that you graphically represent in chart form using polar coordinates. Currently, polar charts only support the Bar mark type, allowing you to create pie charts and donut charts.

Viz Type: Funnel
Analysis Type: event series
Lens Type: event series lens
Description: Allows you to track users' behavior across a sequence of events. Each step in the sequence is defined as a stage. Each funnel stage shows progressively decreasing proportions of the original set of users.

About the Builder Drop Zones

Dragging fields to the Builder drop zones determines the placement and visual appearance of the marks on your visualization.

Some drop zones are available only for certain field roles (measure or dimension), and some drop zones may not be useful for the mark type selected. By default, Platfora chooses the best visual representation for the combinations of measure and dimension fields you add to the drop zones.

Drop Zone: X-axis (Chart), Columns (Cross-Tab)
Allowed Field Roles: Any (except datetime measure)
Allows Multiple Fields? Yes
Description: Sets the data points shown on the horizontal axis of the visualization. When multiple fields are added to this drop zone, the horizontal axis will be grouped (for dimensions) or trellised (for measures) according to the order of the fields in the drop zone. Measures are always placed below dimensions.

Drop Zone: Y-axis (Chart), Rows (Cross-Tab)
Allowed Field Roles: Any (except datetime measure)
Allows Multiple Fields? Yes
Description: Sets the data points shown on the vertical axis of the visualization. When multiple fields are added to this drop zone, the vertical axis will be grouped (for dimensions) or trellised (for measures) according to the order of the fields in the drop zone. Measures are always placed below dimensions.
In Cross-tab view, measures are always shown as columns even if they are placed in the rows drop zone.

Drop Zone: Angle (Polar Chart)
Allowed Field Roles: Any (except datetime measure)
Allows Multiple Fields? Yes
Description: Sets the relative angle from the Y-axis of a Polar Chart visualization using the polar coordinate system. When multiple fields are added to this drop zone, the viz is trellised horizontally according to the order of the fields in the drop zone, showing a polar chart for each unique angle value. Measures are always placed below dimensions.

Drop Zone: Geography (Geo Map)
Allowed Field Roles: Location
Allows Multiple Fields? No
Description: Sets the data point positions (geographical coordinates) on a Geo Map visualization.

Drop Zone: Details
Allowed Field Roles: Any
Allows Multiple Fields? Yes
Description: In Chart view, shows additional measure values in the tooltips only. For dimensions, it adds additional marks (or groups) to the visualization without any visual encoding. In Cross-tab view, it adds additional columns for measures, and sub-columns for dimensions.

Drop Zone: Color
Allowed Field Roles: Any (except datetime measure)
Allows Multiple Fields? No
Description: Color-encodes marks on the viz. Dimensions use a categorical color palette. Measures use a continuous range of colors. A color legend is added to help you decode the colors assigned to each value or range of values.

Drop Zone: Size
Allowed Field Roles: Measure (numeric only)
Allows Multiple Fields? No
Description: Size-encodes marks using a continuous range with the smallest value in the range having the smallest size mark, and the largest value having the biggest mark. A size legend is added to show the range of values.

Drop Zone: Shape
Allowed Field Roles: Dimension
Allows Multiple Fields? No
Description: Shape-encodes point marks. When a dimension field is dropped in the Shape zone, a unique shape is given to each value in the dimension. A shape legend is added to help you decode the shapes assigned to each value.

Drop Zone: Opacity
Allowed Field Roles: Measure (numeric only)
Allows Multiple Fields? No
Description: Transparency-encodes marks using a continuous range with the smallest value in the range having the lightest mark, and the largest value having the darkest mark.
A transparency legend is added to show the range of values.

Drop Zone: Labels
Allowed Field Roles: Any
Allows Multiple Fields? No
Description: Adds a text label to marks. For measures, text labels show the aggregated value. For dimensions, it shows the dimension member value. If value names are long or if the selected field has a lot of values, using this drop zone can result in overlapping, unreadable text.

About the Viz Toolbar and Menus

Every visualization has a collection of buttons in its top right corner. These viz controls include options for duplicating, resizing, and exporting the visualization. There are also menu controls for creating and viewing viz comments, and for managing viz selections.

FAQs—Viz Builder Basics

A visualization (viz) is a graphical representation of certain data fields chosen from the perspective of a Platfora lens. This topic answers some frequently asked questions about working with visualizations.

What is a visualization (viz)?

A visualization (or viz for short) is a graphical representation of certain data fields chosen from the perspective of a Platfora lens. It is a query of aggregated lens data that is visually rendered based on the types of fields chosen (measure or dimension), their order and placement in the Builder drop zones, and the various appearance encodings applied to the data (color, size, shape, and so on).

A viz shows aggregated measure data grouped and filtered by the chosen dimensions. A chart in Platfora can best be described as a recipe of dimension and measure fields, plus axis placement (X-axis and Y-axis), plus appearance encodings (Color, Size, Shape, Opacity, Labels), plus mark type (Point, Line, Bar, Area, and so on).

Who can create a viz?

To create a viz, a user must have the Analyst (Limited) role or higher and also have data access permission on the dataset they want to analyze.

What types of visualizations can I create?
Platfora has several different viz types including Chart, Cross-Tab, Geo Map, Polar, and Funnel. The type of viz you can create depends on the type of lens you want to analyze. For more information, see About the Different Viz Types.

How do I create a viz?

When you create a new vizboard, Platfora inserts a new viz automatically. You can add another viz by clicking the Add Viz button in a vizboard.

To analyze data in a viz, first you choose the type of viz to create because that determines the datasets available to you. Then you choose the dataset to analyze and an associated lens built on that dataset. For more information, see Get Data to Analyze.

You can choose where visualizations appear on the page when you add them. From the vizboard View menu, choose either Add to center of page or Grow page downward.

How do I rename a viz?

You can change the name of a viz at any time by selecting it and changing its name from the viz title bar. All visualizations are given a default name of Visualization X (where X is a number) when first created. You must have edit permissions on the vizboard to rename a viz.

How do I delete a viz?

Click the delete icon in the viz toolbar. To find the viz toolbar, see About the Viz Toolbar and Menus.

I found some great insights in my current viz and I want to do more analysis on it, but I don't want to lose the work I have. How can I best do that?

You can duplicate the current viz and edit the copy. Click the duplicate button in the viz menu. Platfora creates a new viz on the same page with the same lens, filters, and fields in the drop zones. You can move the viz elsewhere on the page or move it to another page by dragging the viz by its toolbar.
Chapter 4
Get Data to Analyze

The first step in creating a visualization is choosing the data you want to explore and analyze, which is done by selecting a lens. Once you pick a lens to work with, the measure and dimension fields in that lens are loaded into the lens field list in the data panel.

Topics:

• About Lenses and Viz Types
• Choose a Lens
• Find Fields in a Lens
• Understand Lens Field Types and Roles
• Understand the Data in a Lens Field

About Lenses and Viz Types

Every visualization displays data using a single viz type, such as chart or cross-tab. The viz type determines the type of lenses available for the viz.

Each viz type uses a particular lens type, such as aggregate or event series lenses. Therefore, when you create a viz, you first choose the viz type in the data panel before choosing a lens to analyze.

Once you've chosen the viz type, Platfora lists all datasets that have built lenses of the appropriate type. If you do not see the dataset you want, that means that no lenses of the appropriate type have been built for that dataset.

When you create a viz of a particular type, you can change to another type later as long as both viz types use the same lens type. For example, if you create a Cross-Tab viz, you can change it to Chart at any time.

Be careful when changing viz types later because not all drop zones in one viz type correlate to drop zones in the other viz type. You may lose some viz configurations when you change viz types.

Choose a Lens

Every visualization must be associated with a lens. To find a lens, first select the dataset you are interested in. Only datasets that have successfully built lenses will be shown. Choosing a dataset will show any lenses that have been built from the focal point of that dataset.

1. With a viz selected, find the name of the dataset you want in the data panel.
   Datasets that have lenses available are listed in alphabetical order. If you do not see the dataset you want, that means that no lenses have been built for that dataset.
2. Choose a lens that was built from the selected dataset.
3. Picking a lens will load the available fields of that lens into the data panel.
   Measure fields are at the top of the field list, and dimension fields are listed below the measures in alphabetical order.

Once you pick a lens, you cannot go back and change your lens selection. If the lens was not the one you wanted, delete the viz and start again with a new viz.

Find Fields in a Lens

When you pick a lens to work with, the measure and dimension fields included in the lens are shown in the lens data panel. The data panel does not show all of the dataset fields, only those that were requested when the lens was built.

Measure fields are always listed at the top, followed by dimensions and referenced dimensions in alphabetical order. Computed fields that were defined in the vizboard are highlighted blue.

If you hover over a field name, you can see a tooltip with the field details and the field option menu.

In a long list, you can use the search to filter fields by name. Clear the search criteria to remove the search filter.

Understand Lens Field Types and Roles

Lens fields are categorized into two basic roles: measures and dimensions. Measures are always aggregated numeric type data. Dimensions can be numeric, text, datetime, or location type data. As you browse through the fields in the data panel, you will notice that each field has an icon to denote its role and type.

Field Role: Measure (numeric)
Description: A measure provides the basis for quantitative data analysis in a viz. Measure fields produce aggregated data values for each dimension grouping in the viz.
Measures are always quantitative (continuous) numeric values, and every visualization must have at least one measure. Measures are always listed at the top of the field list.
Datetime Measure | A datetime measure is a special variety of measure field that shows the maximum or minimum date value for a dimension category. Datetime measures are always the DATETIME data type and use the MAX or MIN aggregate function.
Categorical Dimension | A categorical dimension allows you to group measure values into discrete categories. For example, a Region dimension with values such as US, Asia, and Europe could be used to group a measure such as Total Sales. Categorical dimensions are typically textual data (strings) that are used to filter and group measure data.
Numeric Dimension | A numeric dimension allows you to show dimension data as a continuous range of values in addition to discrete categories. For example, a Customer Rating dimension with values 1-10 could be viewed as a range of values or as discrete categories. Numeric dimensions are always numeric data types (integer, fixed, double, long). Note that numeric dimensions are not the same as measures. They are not aggregated data and can only be used for grouping, not for quantitative analysis.
Date | A date is a special kind of dimension field for datetime type data values. It renders each date as a discrete category.
Date (Timeseries) | A date (timeseries) is a special kind of dimension field for datetime type data values. It contains the same data values as a regular date field, but renders the data as a continuous range rather than as discrete values.
Location | A location is a special kind of dimension field for geo-encoded data. It uses a complex datatype that includes geo coordinate information (latitude and longitude) and can optionally include a label that associates a place name with the coordinates. This field is typically used in geo map visualizations to place positions on a map.
Reference | Lenses can contain fields from multiple datasets, as long as the datasets are related by a reference. A reference is shown as a toggle arrow. You can expand a reference to see additional dimension fields that you can use in your visualization.
Segment | A segment is a special type of dimension field that you can create to group together members of a population that meet some defined common criteria. A segment is based on members of a dimension dataset (such as customers) that have some behavior in common (such as purchasing a particular product). A segment is always based on a referenced dimension dataset, and must include at least one condition from a fact or event dataset.

Understand the Data in a Lens Field

Analysts typically explore data prepared by someone else. This section describes some tools analysts can use to better understand the history and meaning behind the fields in a lens.

View the Data Lineage of a Field

Preparing data for analysis often involves some form of cleansing and manipulation in order to make sense of the data. Raw source data is seldom consumed in its original format, and usually goes through various processing and transformation steps before it is used in a visualization. Viewing the data lineage of a field allows you to see where the data came from and what processing functions were applied.

Data lineage allows you to answer questions such as:
• Where did this data come from?
• How current is this data?
• How was this result calculated?
• How can I reproduce this analysis on other data?

For any field in a lens, Platfora is able to show how that field was derived - all the way back to the source data files in Hadoop.
This includes the data sources, datasets, lenses, and computed field expressions that the data went through to create the data values you are seeing in your visualization or cross-tab. The data lineage report lists the lens field and all its parent objects up to the configured number of levels. It also includes other information, such as filter expressions and time stamps.

1. From the lens data panel or viz builder panel, select Show Data Lineage from the field menu.
2. This opens the data lineage report for that field. Expand the arrows to see each level in the data lineage tree.
3. (Optional) Click Export as JSON to download the data lineage report in a platforaData.json file. This file is saved in the default downloads directory configured for your browser.

View the Definition of a Field

When working in a viz, lens field names aren't always enough to tell analysts what the data values represent. This is especially true for fields that are computed from the raw source data. To help understand the data better, you can view the definition of a field to see how the data values were computed.

You can view the definition of a field in the lens panel or the builder panel. Click the field's contextual menu and view the definition below the field name.

View the Definition of a Segment Field

A segment is a special type of dimension field that you can create to group together members of a population that meet some defined common criteria. Segment fields can be described by another analyst who has access to the lens. To understand the logic behind the segment field, you can view its definition to see how the data values were computed.

1. Click a viz that uses a lens that has one or more segments defined.
2. Click the Add button and choose Select Segment.
3. In the Select Segments dialog, click the segment whose definition you want to view.

You can also view the definition of a segment field in the lens panel as long as the field is not currently in a drop zone. Click the segment field's contextual menu and choose Edit Field.

Chapter 5
Control the Marks on a Viz

A mark is a graphical symbol (point, area, line, and so on) used to encode data in a visualization. In Platfora, a mark is the visual representation of a measure value calculated for a group of input records or rows. A group consists of records that share the same value for the dimension(s) used in the visualization. For example, if you picked fields called Region (a dimension) and Units Sold (a measure) in your viz, then there would be a mark on your viz for each region depicting the total number of units sold in that region (North America=5000, Europe=3000, Asia=7000, etc.).

Topics:
• Encode Data Using Mark Drop Zones
• About Mark Types
• Adjust Mark Appearance

Encode Data Using Mark Drop Zones

The initial marks on a visualization are determined by the dimension and measure combinations used on the axes (X-axis and Y-axis). You can add more marks (or groups) to the visualization by adding dimensions to any of the Marks drop zones, such as the Details or Color encoding drop zones.

Add Marks without Encoding

When a measure is used in a visualization, the group of input records used to calculate the measure value is determined by the dimensions selected in the viz. For example, in a dataset about diamonds, if cut was a dimension in the viz, there would be five input groups for which the measure would be calculated (Fair, Good, Ideal, Premium, and Very Good). Each input group is a mark on the viz.

An unencoded mark adds additional groups to the viz, but the groups are not visually differentiated. You can add unencoded marks by putting a dimension in the Details drop zone.
For example, if you put color in Details, a mark is added to the viz for each combination of cut and color, but the color marks are not visible unless you select them.

Add Marks with Encoding

Adding dimension fields to the Color, Shape, or Labels drop zones both adds groups to the viz and encodes the marks so the groups are visually differentiated. For example, if you put color in Color, then within each cut group, the marks are color-encoded by color.

About Mark Types

The mark type controls the shape of the data on the visualization and how data points are visually represented. You can select one of several mark types from the Marks drop-down menu. The default type is Auto, which means that Platfora chooses the visual representation that best fits the fields you have selected.

The rendering of a particular chart depends on the mark type in combination with the placements of measure and dimension fields in the Builder drop zones.

Use Bar Marks

Bars are useful for comparing quantitative data by different categorical groupings. Bar is the default mark choice when you are analyzing a single measure across one or more dimensions, such as comparing sales totals by region. Bars are most often used to mark data with categorical values, although they can also be used for quantitative data.

Simple Bar Charts

A bar chart displays quantitative (measure) data points as rectangular bars in which the length of a bar is proportional to the value. A bar chart is often used to compare values across one or more categories, such as showing the number of items sold by product or by region.
The bars can be vertical (if the measure is placed in Y-axis) or horizontal (if the measure is placed in X-axis). To create a simple bar chart, drag a measure field to the Y-axis zone and drag a dimension field to the X-axis zone.

Stacked Bar Charts

You can color-encode a bar chart by dragging an additional dimension field to the Color zone. By default, the bars are stacked, meaning multiple dimension values are shown cumulatively within a single bar.

Unstacked bars overlap one another (they are not placed side-by-side), so unstacked bar charts are not always visually useful. To show the color-encoded values as individual side-by-side bars (grouped), add the same dimension field to both X-axis and Color.

100 Percent of Total Stacked Bar Charts

100 percent of total stacked bar charts show the percentage that each color-encoded dimension member contributed to the total value rather than showing the actual cumulative values. You can change a stacked bar chart to a 100% of total stacked bar chart by changing the scale of the measure axis. Note that if your values include negative numbers, the percent of total scale may be above 100% or below 0%.

Create a Histogram

Marks on a Bar chart are separated by empty space by default. You can increase the size of the bar marks to make a histogram (no space between bars).

When the viz uses Bar marks, use the Size pull-out menu in the Builder panel and maximize the size by changing it to 1.00x.

Mark outlines are on by default. To turn off mark outlines, clear the Show mark outlines option from the Marks options menu.
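The note about negative values in the 100 percent of total scale is easier to see with numbers. The following is a minimal sketch, not Platfora's implementation: the function name percent_of_total and the sample segment values are hypothetical, and it assumes each segment's share is simply its value divided by the sum of all segments in the bar.

```python
def percent_of_total(segments):
    """Return each segment's share of the bar total as a percentage.

    `segments` maps a color-encoded dimension member to its measure value.
    Assumption: share = value / sum of all values in the same bar.
    """
    total = sum(segments.values())
    if total == 0:
        raise ValueError("percent of total is undefined when the bar total is 0")
    return {name: 100.0 * value / total for name, value in segments.items()}


# A bar whose segments are all positive: shares stay between 0% and 100%.
positive_bar = {"Fair": 20, "Good": 30, "Ideal": 50}
positive_shares = percent_of_total(positive_bar)

# A bar with one negative segment (hypothetical data): one share drops
# below 0% and another rises above 100%, although they still sum to 100%.
mixed_bar = {"Returns": -50, "Online": 120, "Retail": 30}
mixed_shares = percent_of_total(mixed_bar)

for name, share in mixed_shares.items():
    print(f"{name}: {share:.1f}%")
```

With the mixed bar, the total is 100, so Returns contributes -50%, Online 120%, and Retail 30%, which is why the axis can extend past the 0% to 100% range.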
Use Point Marks

The point mark type is best used to show the relationship between two independent variables, and is most often used for building scatterplot charts or bubble charts. Point is the default mark type when you select a measure for both the X-axis and Y-axis, and place a dimension in Color, Opacity, or Size.

Scatterplots

A scatterplot chart plots each data point according to two quantitative (measure) values. Scatterplots are used to show how much one variable is affected by another. The relationship between the two variables is called their correlation. For example, you might use a scatterplot to show the relationship of two variables such as population size and income. To create a simple scatterplot, drag a measure field to both the Y-axis and X-axis zones, and drag a dimension to Color, Size, or Opacity.

Bubble Charts

A bubble chart is a variation of a scatterplot in which an additional measure is encoded as the size of each data point. The data points on a bubble chart are compared in terms of their size, as well as their relative positions on the horizontal and vertical axes. To create a simple bubble chart, drag a measure field to both the Y-axis and X-axis zones, drag a dimension to Color, and then drag an additional measure to Size.

Packed Bubbles Charts

A packed bubbles chart displays quantitative (measure) data as solid circular points (bubbles) in which the area of each bubble is proportionate to the value of the measure. The bubbles are not located on either the X-axis or Y-axis, so their locations have no significance. Instead, they are placed closely together to use the space more efficiently. Packed bubbles charts are different from bubble charts on an axis because they show fewer measure fields (e.g. one measure as opposed to three measures).
A packed bubbles chart is similar to a word cloud chart, but it uses point marks instead of text marks.

By default, a packed bubbles viz displays a maximum of 40,000 marks. However, your system administrator might change this value using the platfora.viz.bubble.limit configuration property.

To create a packed bubbles chart, place a measure field in the Size drop zone, and place a dimension field in the Labels drop zone. Then choose Point from the Marks menu.

Optionally, you can color encode the bubbles in a packed bubbles chart by placing a dimension field in both the Color and Labels drop zones. Then choose Point from the Marks menu.

Use Line Marks

A line chart displays a series of individual quantitative (measure) data points connected by line segments. A line chart is often used to visualize a trend in measure data over intervals of time (time series data) with the line drawn chronologically. Line is the default mark choice when you are analyzing a measure across a date or a numeric (quantitative) dimension.

Simple Line Charts

A line chart displays a series of quantitative (measure) data points over a numeric dimension, such as year or age. A line chart is often used to show trends over a continuum, such as sales performance over time. To create a simple line chart, drag a measure field to the Y-axis zone and drag a date or numeric dimension field to the X-axis zone.

Stacked and Unstacked Lines

You can color-encode a line chart by dragging a dimension field to the Color zone. Depending on the field selections you have made for the X-axis and Y-axis, the lines may be shown unstacked or stacked. Stacked is the default choice when analyzing a measure by a date dimension, such as year.
When line marks are stacked, the lines are not drawn independently of each other, and are not compared along the horizontal axis. Instead the lines are drawn cumulatively along the vertical axis. The stack order reflects the order of dimension values, from bottom to top. Instead of interpreting the lines themselves, you interpret the space between the lines, which is why stacked lines are often better visualized as shaded areas.

When line marks are unstacked, the lines are drawn independently of each other. You interpret a data point on the line by reading values along the horizontal and vertical axes. Unstacked is the default choice when analyzing a measure by a numeric dimension (such as age or height).

Use Area Marks

Similar to a line graph, an area graph displays quantitative (measure) data over a continuum (such as time), and is typically used to compare two or more quantities. However, unlike lines, area charts are typically used to represent cumulative totals rather than individual totals. An area chart shows the space between marks filled in with color, which is helpful for showing how each member of a dimension contributes to an overall trend.

Simple Area Charts

An area chart shows the space between the lines filled in with color. To create a simple area chart, drag a numeric date field (such as year) to the X-axis zone, drag a measure to the Y-axis zone, and drag a categorical dimension to Color.

Heatmaps

Heatmaps are used to compare a measure across multiple dimensions. Heatmaps allow you to use color to see variations in the data. To create a simple heatmap, drag a dimension field to both the Y-axis and X-axis zones, and drag a measure to Color.
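To make the heatmap description concrete, the sketch below shows one way to think about what a heatmap encodes: one mark (cell) per combination of the two dimensions, colored by the aggregated measure. This is an illustration only and not how Platfora computes a viz. The function names, the sample records, the use of SUM as the aggregate, and the linear 0-to-1 color intensity are all assumptions made for the example.

```python
from collections import defaultdict


def heatmap_cells(records, x_dim, y_dim, measure):
    """Aggregate a measure (assumed SUM) for each (x, y) dimension pair."""
    cells = defaultdict(float)
    for record in records:
        cells[(record[x_dim], record[y_dim])] += record[measure]
    return dict(cells)


def color_intensity(cells):
    """Scale each cell value linearly to 0.0 (lowest) through 1.0 (highest)."""
    low, high = min(cells.values()), max(cells.values())
    if high == low:
        # All cells hold the same value, so there is no variation to show.
        return {key: 0.0 for key in cells}
    return {key: (value - low) / (high - low) for key, value in cells.items()}


# Hypothetical input rows from a diamonds dataset.
records = [
    {"cut": "Ideal", "color": "D", "price": 500},
    {"cut": "Ideal", "color": "D", "price": 700},
    {"cut": "Ideal", "color": "E", "price": 400},
    {"cut": "Fair", "color": "D", "price": 300},
]

cells = heatmap_cells(records, x_dim="cut", y_dim="color", measure="price")
shades = color_intensity(cells)

for (cut, color), value in sorted(cells.items()):
    print(f"{cut} / {color}: total={value:.0f}, intensity={shades[(cut, color)]:.2f}")
```

The four input rows collapse into three cells because two rows share the same cut and color, which mirrors how each group of input records becomes a single mark.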
Use Path Marks

A path connects data points with lines. However, unlike the line chart type, the data points are connected in order. Paths are useful for showing data that follows a natural order, such as ordered stops on a road trip, or ordered web page visits over time.

A path chart displays a series of quantitative (measure) data points over an ordinal dimension, such as time. The drawing order of the line is determined by the sort order of the selected dimension field. For example, this path shows the average departure delays of flights over the course of a day.

Note: Platfora currently does not support geographical encoding of data, such as plotting a path of data points on a map image, although this feature is planned for a future release.

Use Polygon Marks

A polygon mark is similar to a path mark, except that the connected lines are filled in as shaded areas. The polygon chart type is useful when you have areas of data points on your visualization that you want to shade in to see distinct areas of data. Polygon charts typically require special datasets to effectively convey useful insights, and are not used that often in the current product. This mark type will be more useful in the future when Platfora increases support.

Use Text Marks

Text marks are useful for displaying text data at a glance. Text marks are most often used on visualizations without axes (non-axis visualizations). Text is the default mark choice when you are analyzing a single measure in the Labels drop zone.

Text Gauge Viz

A text gauge viz displays a single numeric value as text. You might want to create a text gauge viz to display a key performance indicator (KPI) as part of a dashboard of several visualizations. Optionally, you can color code the text value by placing a different single value field into the Color drop zone.
You might use a measure or a single value numeric dimension in a text gauge viz.

To create a text gauge viz, place a measure field or a single value dimension field in the Labels drop zone.

Word Cloud Viz

A word cloud viz displays text data as a weighted list. The text data values are distinguished from each other by size. Word cloud visualizations are useful for quickly perceiving the most prominent terms.

By default, a word cloud viz displays a maximum of 1500 words. However, your system administrator might change this value using the platfora.viz.word.limit configuration property.

To create a word cloud viz, place a dimension field in the Labels drop zone, and a measure field in the Size drop zone.

Adjust Mark Appearance

Dragging a field into one of the Marks drop zones of the Builder panel is one way to add marks to a visualization and visually encode their appearance, but you can also use the pull-out menus to change the visual appearance of marks already on the visualization without adding additional groupings. The Builder drop zones show the current settings that apply to the viz marks.

Show Mark Outlines

Viz marks for some mark types are outlined in a darker color by default. You can choose whether or not to display mark outlines in a viz. You might want to display mark outlines to more easily differentiate among individual marks, especially when they overlap each other.

When Platfora is configured to show mark outlines, they are available for the Bar, Point (solid shapes only), Area, and Polygon mark types. The outline color is a slightly darker version of the fill color for an individual mark.
Use the pull-out menu for the Marks drop zone to show or clear mark outlines.

To remove mark outlines, clear the Show mark outlines option from the Marks pull-out menu.

Adjust Mark Color

Viz marks can be colored in a single color or a collection (palette) of colors, depending on how the data is encoded in the viz. The appearance controls in the Color pull-out menu vary depending on whether or not a field is in that drop zone.

Field is in the drop zone
The Color menu controls a range of values (a color palette) that apply to the marks in the viz created by the field in the Color drop zone.

Field is not in the drop zone
The Color menu controls a single color that applies to all marks in the viz.

Adjust Mark Size

Viz marks can appear as a single size or range of sizes, depending on how the data is encoded in the viz. All mark types are assigned a default size, a minimum size, and a maximum size. For example, the maximum size for bar marks is such that each bar mark touches the bars on either side without overlapping them (a histogram). The minimum size for bar marks is one pixel.

When adjusting mark size, you make the marks bigger or smaller than the default size. Any mark size between 0x and 1.00x makes the mark bigger than default, and mark sizes between 0x and -1.00x make the mark smaller than default.

The appearance controls in the Size pull-out menu vary depending on whether or not a field is in that drop zone.

Field is in the drop zone
The Size menu controls a range of sizes that apply to the marks in the viz created by the field in the Size drop zone.
Field is not in the drop zone
The Size menu controls a single size that applies to all marks in the viz.

Adjust Mark Opacity

Viz marks can appear as a single opacity or range of opacities, depending on how the data is encoded in the viz. The appearance controls in the Opacity pull-out menu vary depending on whether or not a field is in that drop zone.

Field is in the drop zone
The Opacity menu controls a range of opacities that apply to the marks in the viz created by the field in the Opacity drop zone.

Field is not in the drop zone
The Opacity menu controls a single opacity that applies to all marks in the viz.

Adjust Mark Shape

Point viz marks can be displayed in a single shape or a collection of shapes, depending on how the data is encoded in the viz. The appearance controls in the Shape pull-out menu vary depending on whether or not a field is in that drop zone. When choosing a shape or collection of shapes, you can choose between solid (Fill) or hollow (Outline) shapes.

The shape controls only affect Point mark types. If a field is placed in the Shape drop zone and a non-Point mark type is displayed, Platfora treats that field as if it were in the Details drop zone.

Field is in the drop zone
The Shape menu controls a collection of shapes (a shape palette) that apply to the marks in the viz created by the field in the Shape drop zone. Currently, Platfora supports only one shape palette.

Field is not in the drop zone
The Shape menu controls a single shape that applies to all marks in the viz.

About Solid and Hollow Shapes

Point marks on a chart appear as solid shapes by default.
You can configure Point marks to be either solid (Fill) or hollow (Outline).

When the viz uses Point marks, use the Shape pull-out menu in the Builder panel to choose solid or hollow shapes.

Click Fill and choose the shape you want to create a solid shape.

Click Outline and choose the shape you want to create a hollow shape.

Adjust Mark Labels

Viz marks can be displayed with or without a label representing one of several values associated with each mark. The appearance controls in the Labels pull-out menu vary depending on whether or not a field is in that drop zone.

The Labels menu controls the text that applies to all marks in the viz. Whether or not you can choose a particular label depends on whether a field is in that drop zone.

Field is in the drop zone
When a field is in the Labels drop zone, the value of the drop zone field is displayed for each mark.

Field is not in the drop zone
You can choose from the following label options:
• None - This option turns off the text labels.
• X-Axis Value - This option always displays the mark's value along the horizontal axis. You might want to use this option to quickly and easily read the mark's exact value on the horizontal axis without zooming in.
• Y-Axis Value - This option always displays the mark's value along the vertical axis. You might want to use this option to quickly and easily read the mark's exact value on the vertical axis without zooming in.

Chapter 6
Control Field Labels in a Viz

Platfora allows you to control how field names, labels, and values appear on the axis of a viz.
Topics:
• Truncate Field Labels in a Viz
• Change Field Labels Width in a Viz
• Change Field Display Names on an Axis
• Hide Field Name and Values on a Chart Axis
• Change the Number Formatting of a Measure

Truncate Field Labels in a Viz

When dimension field values are very long, the field name and value labels in a visualization are truncated on the right side by default. However, you can configure where to truncate text for fields used in a drop zone.

You can truncate long text in field labels on the left side, right side, or center of the text. You might want to choose a different truncation location when you have multiple field values that start with identical characters and are only distinguished from each other at the end of the string. Platfora displays an ellipsis (...) when it truncates a label.

Truncation applies to the field in both Chart and Cross-Tab view. Date dimension fields cannot be truncated. Platfora always displays the full date.

1. Click a visualization to select it. Field labels are controlled on a per-viz basis.
2. Find the dimension field in the Builder panel that you want to edit, and select Options and Sort from its field menu. (Optional) Click the field name in the viz to open the Options and Sort dialog.
3. Click the Labels tab.
4. Choose where to truncate the field label in the viz, either Left, Center, or Right.
5. Click Apply.

The labels for the field name and field values are updated in the currently selected viz. The truncated labels apply to the current viz only. The truncated labels do not affect other visualizations in the same vizboard. When the field is removed from a drop zone and added again, the truncation setting is reverted to the default.
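The difference between the Left, Center, and Right truncation choices can be sketched in a few lines of Python. This is an illustration of the idea only: the function truncate_label, its character-based max_chars limit, and the sample field value are hypothetical, and Platfora's actual truncation rules (which are presumably based on pixel width) may differ.

```python
ELLIPSIS = "..."


def truncate_label(text, max_chars, side="right"):
    """Shorten `text` to at most `max_chars` characters, ellipsis included.

    side="right"  removes characters from the end (keeps the start).
    side="left"   removes characters from the start (keeps the end).
    side="center" removes characters from the middle (keeps both ends).
    """
    if max_chars <= len(ELLIPSIS):
        raise ValueError("max_chars must leave room for text and the ellipsis")
    if len(text) <= max_chars:
        return text  # Short labels are shown in full.

    keep = max_chars - len(ELLIPSIS)
    if side == "right":
        return text[:keep] + ELLIPSIS
    if side == "left":
        return ELLIPSIS + text[-keep:]
    if side == "center":
        head = (keep + 1) // 2
        tail = keep - head
        return text[:head] + ELLIPSIS + (text[-tail:] if tail else "")
    raise ValueError(f"unknown truncation side: {side!r}")


label = "customer_region_north"
print(truncate_label(label, 12, "right"))   # customer_...
print(truncate_label(label, 12, "left"))    # ...ion_north
print(truncate_label(label, 12, "center"))  # custo...orth
```

When many values share a prefix such as customer_region_, truncating on the right makes them look identical, while truncating on the left keeps the distinguishing ending visible.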
Change Field Labels Width in a Viz

When you add a dimension field to your visualization, the field name and its values are assigned a width by default. Platfora calculates a best guess width based on several factors including the length of the current values in the field. However, you can define the width assigned to the labels of fields used in a drop zone.

You can choose one of the following options when configuring field label width for a dimension field:
• Fit to best guess. Platfora calculates a label width based on several factors including the size of the viz and length of values in the field. This is the default.
• Fit to axis values. The label width is large enough to accommodate the longest value along the axis.
• Fit to field name. The label width is as wide as the field name.
• Max pixel width. You can choose the number of pixels to use for the label width.

The label width applies to the field in both Chart and Cross-Tab view.

1. Click a visualization to select it. Field label widths are controlled on a per-viz basis.
2. Find the dimension field in the Builder panel that you want to edit, and select Options and Sort from its field menu. (Optional) Click the field name in the viz to open the Options and Sort dialog.
3. Click the Labels tab.
4. Choose how to calculate the field label width.
5. Click Apply.
6. The field label width is updated in the currently selected viz.

The field label width applies to the current viz only. The width does not affect other visualizations in the same vizboard. When the field is removed from a drop zone and added again, the width setting is reverted to the default.
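As a rough way to compare the label width options listed in this section, the sketch below estimates the width each one would produce. Everything here is an assumption for illustration: the function label_width, the option names, and the fixed 7 pixels per character are hypothetical and do not reflect how Platfora measures text. The "Fit to best guess" option is left out because the guide does not describe its formula.

```python
def label_width(option, field_name, axis_values, max_pixels=None, px_per_char=7):
    """Estimate a label width in pixels for one of three width options.

    Assumes every character is `px_per_char` pixels wide, which is a
    simplification made only for this example.
    """
    if option == "fit_to_axis_values":
        # Wide enough for the longest value along the axis.
        longest = max((len(str(value)) for value in axis_values), default=0)
        return longest * px_per_char
    if option == "fit_to_field_name":
        # As wide as the field name itself.
        return len(field_name) * px_per_char
    if option == "max_pixel_width":
        # The width is whatever number of pixels the user chose.
        if max_pixels is None:
            raise ValueError("max_pixel_width requires max_pixels")
        return max_pixels
    raise ValueError(f"unknown option: {option!r}")


values = ["US", "Asia", "North America"]
by_values = label_width("fit_to_axis_values", "Region", values)
by_name = label_width("fit_to_field_name", "Region", values)
fixed = label_width("max_pixel_width", "Region", values, max_pixels=60)

print(by_values, by_name, fixed)
```

In this example, fitting to the axis values is wider than fitting to the field name because "North America" is longer than "Region", so the choice mainly matters when value lengths and the field name length differ a lot.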
Change Field Display Names on an Axis

When you add a field to your visualization, the field name is displayed in the viz axis headers. By default, the field name shown in a viz is the same as it appears in the lens. You can change the display name of any field used in a visualization on a per-viz basis.

1. Click a visualization to select it. Field display names are controlled on a per-viz basis.
2. Find the field in the Builder panel that you want to rename, and select Options (for measures) or Options and Sort (for dimensions) from its field menu.
3. Enter the new Display Name and click Apply.
4. The display name is updated in the headers of the currently selected viz.

The new display name applies to the current viz only - the display name does not affect the field name in the dataset, lens, or other visualizations in the same vizboard. The display name currently only applies to the viz headers, not the viz filters and legends panels.

Hide Field Name and Values on a Chart Axis

When you add a field to your visualization, the field name and its values are displayed in the viz axis headers. To save space in a viz, you can hide the field name or its values on the axis of a chart viz on a per-viz basis.

You might want to hide field names or values for any of the following reasons:
• To save space in the viz.
• If the information is duplicated through color, labels, or other attributes.
• If the chart is explained in the title and doesn't need to show axes.

Field names and values are only hidden in Chart view. They always appear in Cross-Tab view.
1. Click a visualization to select it. Hiding field names is controlled on a per-viz basis.
2. Find the field in the Builder panel that you want to edit, and select Options (for measures) or Options and Sort (for dimensions) from its field menu.
3. Select the Hide field name (chart only) or Hide axis values (chart only) options as desired.
4. Click Apply.

The field name or axis labels are removed from the currently selected viz. This setting applies to the field in the current viz only. It does not affect other visualizations in the same vizboard. When the field is removed from a drop zone and added again, the setting reverts to the default.

Change the Number Formatting of a Measure

You can specify the number format for the measure values shown in the viz and cross-tab text. You can select from a set of standard formats, such as normal, currency, scientific, and percentage.

1. Select Options in the measure field drop-down menu. You can only change the number formatting for measure values, not numeric dimension values.
2. Choose the number format you want to use, and select the desired formatting options. The following number formats and options are available:

| Format | Description | Options |
| --- | --- | --- |
| Auto | The data values in the field are used to determine the best display format. | None. |
| Normal | Values are displayed as regular numbers. | Negative Values: how to display negative values. Decimal Places: how many decimal places to display. |
| Currency | Values are displayed as monetary units. | Negative Values: how to display negative values. Decimal Places: how many decimal places to display. Symbol: the monetary symbol to use (dollar, euro, yen, and so on). |
| Percent | Values are displayed as a percentage, where a value of 1 is interpreted as 100%, 0.8 as 80%, 0 as 0%, and so on. | Negative Values: how to display negative values. Decimal Places: how many decimal places to display. |
| Scientific | Values are shown in scientific exponential or e notation. The notation e+n represents times ten raised to the nth power. Note that in this usage the character e is not related to the mathematical constant e or the exponential function e^x. | Negative Values: how to display negative values. Decimal Places: how many decimal places to display. |

3. Click Confirm.

Chapter 7
Sort Viz Data

You can sort data in a viz to arrange the data in a meaningful order for analysis.

Topics:
• Default Sort Behavior
• Change the Sort Order of a Dimension Axis

Default Sort Behavior

By default, dimension values are sorted alphabetically (chronologically for dates and numerically for numbers) in ascending (A-Z) order. You can change the sort order to sort the marks in descending order (Z-A) or to sort the marks according to a measure field instead.

Change the Sort Order of a Dimension Axis

Changing the sort order of a dimension axis rearranges the order of the marks on the viz. You can change the default sort order of a dimension, sort the dimension marks on the viz according to the value of a measure, or limit the number of dimension values shown in the viz.

Sort options are only available for dimension fields placed in a Builder drop zone. Select Options and Sort in the field drop-down menu to access the sort options for that field.

Changing the Default Sort Order

By default, dimension axes show values as categorical data. Each dimension field has a default sort order depending on its data type:

• String data is shown in alphabetical order (A-Z).
• Numeric data is shown from low to high values (1-10).
• Datetime data is shown in chronological order (Jan 1 - Dec 31).

The default sort order for a categorical dimension axis is Ascending. Values are displayed in natural reading order: left to right for the X-axis or top to bottom for the Y-axis.

To change the default sort order to Descending (high to low):

1. Select Options and Sort in the field drop-down menu.
2. Change the Sort Direction from Ascending to Descending.
3. Click Apply.

Sorting a Dimension by a Measure Field

Sorting a dimension axis by a measure is a good way to show the highest (or lowest) performing categories of a dimension.

To sort by a measure:

1. Select Options and Sort in the dimension field drop-down menu.
2. (Optional) Choose the Sort Direction: Ascending if you want to explore the bottom of the range, Descending if you want to explore the top of the range.
3. In the Sort by field, choose the measure you want to use.
4. (Optional) To limit the number of marks shown on the viz, enter a Limit number.
5. (Optional) Select Include 'Others' member if you want a mark to represent the members that were excluded by the limit.

For example, to show the top 10 busiest airports, you could do a descending sort on airports by the number of flights and then limit the results to 10.

Chapter 8
Filter Viz Data

Adding a filter to a visualization allows you to constrain the data that is shown in the visualization. You can add a filter using a particular field, or by selecting and isolating a set of marks on a viz.

Topics:
• FAQs—Viz Filters
• Add a Filter on a Field
• Create a Page Filter
• Toggle Filter Include/Exclude Mode
• Filter by Selection
• Filter by Limit

FAQs—Viz Filters

This topic answers some frequently asked questions about filtering data in a visualization.

What is a viz filter?

A viz filter is a condition that specifies which data values to display in a visualization. Filtering a viz affects the query Platfora creates and runs against the lens.
For more information on how filtering affects the viz query, see How Interactive Viz Queries Work.

Who can create a viz filter?

To create a viz filter, a user must have the Analyst (Limited) role or higher and also have object access permission to edit the vizboard.

What types of viz filters can I create?

You can create the following types of viz filters:

• Local filter. Local filters apply to a single viz. Local filters appear in the Filters panel under the Filters section.
• Page filter. Page filters apply to all visualizations on a page that use the same lens. Page filters appear in the Filters panel under the Page Filters section. When viewing a vizboard with no viz currently selected, the Page Filters section includes the lens name for each page filter.

How can I create a viz filter?

You can create a viz filter in the following ways:

• From a field. Filtering on a field allows you to include or exclude records from the visualization based on the values of the selected field. Filtering from a field can apply to a single viz (local filter) or all visualizations on the page using the same lens (page filter). For more information on working with field filters, see Add a Filter on a Field. When you drill down on a field in a viz, Platfora applies a filter to that field. For more information on drilling down, see Drill Down FAQ.
• From a selection of viz marks. Using the cursor, you can select marks in a viz and filter on the selection. Filtering from a selection can apply to a single viz (local filter) or all visualizations on the page using the same lens (page filter). For more information on working with selection filters, see Filter by Selection.
• From a sort limit. You can limit the number of dimension members in a viz based on a measure calculation. For example, you could limit a viz to the top 10 sellers.
Filtering data using a sort limit is only ever applied to a single viz (local filter). For more information on working with limit filters, see Filter by Limit.

Why would I want to create a page filter?

You might want to create a page filter to look at different variables in multiple visualizations across a common dimension.

How do I create a page filter?

You can create a page filter by dragging a field into the Page Filters drop zone, or by designating (promoting) a local filter to apply to the entire page. For more information on creating page filters, see Create a Page Filter.

Are viz field filters inclusive or exclusive?

By default, field filters are inclusive, meaning the values selected in the filter are included in the visualization. You can change a field filter to be exclusive, meaning everything except the selected values is included in the visualization. For more information on how to do this, see Toggle Filter Include/Exclude Mode.

Can I create multiple filters on a viz?

Yes, but note the following:

• You can create multiple filters on different fields, as either local or page filters.
• You can create multiple selection filters, as either local or page filters.
• You can create only one page filter on a particular lens field.
• You can create only one local filter on a particular lens field.
• You can create a local filter and a page filter on the same lens field.
• Each viz only allows one limit filter.

Does the order of the viz filters in the Filters panel matter?

No. Platfora uses an AND condition in the viz query when applying all applicable filters.

Can I change a page filter to a local filter for a particular viz?

Yes, you can demote a page filter to a local filter. You can demote most page filters to any viz on the page. However, you can only demote a page filter that contains a drill filter to the original viz.
To demote a page filter, you can drag it from the Page Filters section to the Filters section of the Filters panel. Or, you can select the viz to apply the filter to, and from the filter contextual menu choose Apply to only this Viz.

Can I create a filter on a geography field?

Yes. Filtering on a location field filters on the location name (label) assigned to each value as a categorical dimension. For example, if you have a location field for zip code data and the location name for each value is the 5-digit zip code, you can filter on values such as 94402 and 94111, but not on the latitude and longitude coordinates.

What happens when I copy a viz that has a page filter applied to it to another vizboard page?

When you move a viz that has a page filter applied to it to another page (either in the same or a different vizboard), the page filter is converted to a local filter and is applied to that viz only. Also, if the page filter contains a drill filter, the page drill filter is converted to a local drill filter if the moved viz originally created the page drill filter. However, if the moved viz didn't create the page drill filter, it is converted to a regular local filter.

How do I remove a filter?

To remove a filter on a field, find the field in the Filters panel and click the X next to the field name. Or, from the filter contextual menu choose Remove from Viz. Note that limit filters do not show in the Filters panel. To remove a limit filter on a dimension, open the Options and Sort dialog on the particular dimension field and remove any limits that have been applied.

Can I promote a local filter created by drilling down on a viz to a page filter?

Yes, but note the following:

• When you promote a local drill filter to a page drill filter, the page drill filter shows the viz name to make it clear which viz created it.
• If you promote a local drill filter to a page drill filter and later delete the viz in which you created the drill filter, the page drill filter changes in appearance to a regular page filter. That means the viz name disappears from the page filter.
• When you duplicate a viz that has a page drill filter, the duplicated viz has a local drill filter applied as well as the page filter. The local drill filter determines which fields appear in each viz drop zone, and the page filter determines which data members are filtered out.

Add a Filter on a Field

Filtering on a field allows you to include or exclude records from the visualization based on the values of the selected field. The filter controls differ depending on the type of field you are filtering on (dimension, measure, or date).

You can add a filter on any field by dragging it to the Filters panel. If the visualization has multiple filters applied, drag new fields to the desired position in the list (above or below an existing filter field). When the drop zone indicator appears, drop the field into position. Filters are applied independently of each other, so their position in the list does not change how the filter is applied to the data.

You can also add a filter from the field's drop-down menu in the Lens panel or Builder panel. This adds a filter control for that field to the bottom of the Filters panel.

Filter on Dimension Fields

Dimension fields can contain textual or numeric data, so filtering on a dimension depends on the data type of the selected field. Category filters allow you to select specific dimension members (values) to include or exclude in your visualization. Range filters allow you to choose a range of values to include or exclude, and are only applicable to dimensions that contain numeric data.

When you add a dimension field to the Filters panel, you get the dimension filter control.
Depending on the data type of the field selected, the control defaults to either Category (for textual data) or Range (for numeric data). For numeric data, you can change the filter control to Category if you want to choose specific numeric values to include or exclude.

Basic Category Filters

A category filter shows a list of the distinct values (or members) in the selected dimension field. As long as the dimension field has 100 or fewer values, you can scroll through the entire list of values and check the values you want.

By default, the category filter control is inclusive, meaning you select the values you want to include in your visualization (unselected values are filtered out). To change to exclusive, see Toggle Filter Include/Exclude Mode.

Category Filters with Long Member Lists

The category list only displays the field values when the field contains 100 or fewer values (or members). For dimension fields that have more distinct values than that, click Edit List to choose values that are not visible in the list.

In the Edit Filter dialog, choose from the data source member values displayed in the dialog and add one or more values to the filter criteria. To create more room in the Edit Filter dialog, click the icon to collapse the left side pane.

Optionally, you can then do one of the following to add member values to the filter criteria:

• Search for a specific value to select. Platfora displays the first 10,000 values. To view values not listed, you can enter a search string to list a subset of values and then choose from the displayed member values.
• Add a match pattern to select values. Define a custom member if the member is not currently in the data. You can also use wildcard characters to match multiple members.
• Import a list of values.
Import a list of values contained in a text file.

To search for a specific value to select:

1. Choose how to match the search criteria. By default, the search pattern searches the entire string for a match (Contains). You can also choose to search for a pattern at the beginning (Starts with) or end (Ends with) of the string.
2. Type the search criteria in the Search box to find possible matches. You can use the wildcard characters ? (question mark) to represent a single character and * (asterisk) to represent any number of characters, with \ (backslash) as the escape character. Note that the search pattern you enter is not case-sensitive. For example, entering a search pattern of Contains A returns any value that contains the letter A or a, not just values that begin with A.
3. Select matching members to add to the filter criteria.
4. Click Apply.

To select values based on a match pattern:

1. Click Create Custom to create an empty search pattern.
2. Type the search pattern in the empty field in the Selected Members box. Note that the search pattern you enter is not case-sensitive. You can use the following wildcard characters in the search pattern:

| Wildcard Character | Matches |
| --- | --- |
| * | Zero or more characters |
| ? | Any single character |
| \* | The asterisk (*) character |
| \? | The question mark (?) character |
| \\ | The backslash (\) character |

Alternatively, you can enter the search pattern in the Search Member field and then click Create Custom. When text is entered in the search box, Platfora creates a selected member based on the search pattern you entered. The asterisk (*) character is added as a wildcard to the beginning or end of the search. For example, if you chose Starts with san, then Platfora creates a filter section member san*.
3. (Optional) When the edit icon appears next to a selected member, you can edit the filter section member string.
4. Click Apply.

To upload a list of values:

1. Click Import List and navigate to a text file containing members you want to select. The file must be smaller than 1 MB and contain fewer than 10,000 values. Platfora uses the newline character to delimit values in the file, and adds a member value to the Selected Members list for each line. Note that when the imported file contains a very large number of values, the import may take a while to load.
2. Click Apply.

Range Filters

When you add a numeric dimension to the Filters panel, the default filter control is a range filter. You can type the Start and End range values (press ENTER to apply a value after typing it in). If your data is categorical in nature (for example, SKU numbers or product codes), you can switch it to be a category filter instead.

Filter on Date Fields

Date fields are a special kind of dimension that allows for either range or relative filters. A date range filter allows you to select a range of days to include or exclude. A relative date filter allows you to pick a particular date and then choose the day, week, month, quarter, or year prior to, following, containing, or up to that date.

Note that date filters only apply to Date or Date (Time Series) fields. Other date-related fields (Month, Week, Year, and so on) are treated just like any other dimension field.

When you add a date field to the Filters panel, you get the date filter control. The range filter control is the default. You can also choose the relative date filter control.

Date Range Filters

A range filter is the default filter control for dates. It allows you to choose a specific Start and End range (the range is inclusive) of date values.
You can type a specific date value and press ENTER, or use the calendar control to pick dates.

Why does my date range include January 1, 1970?

If you have records in your source data that contain a null value for a dimension field (the value is empty), Platfora assigns them a default value when it builds the lens. The default value assigned to null date values is January 1, 1970 (01/01/1970).

Relative Date Filters

To switch the filter control of a date field from a range to a relative filter, select Relative from the filter Options menu.

To create a relative date filter, you must first select a date to use as the basis of the filter rule, either Relative to today or Relative to a specific date. There are four types of relative date filter rules you can create:

• Previous - Shows a number of time periods (day, week, month, quarter, or year) prior to and including the period of the selected date. For example, Previous Year includes last year and this year. In particular, choosing Previous Year on March 15, 2015 includes 01/01/2014 through 12/31/2015.
• Next - Shows a number of time periods (day, week, month, quarter, or year) following and including the period of the selected date. For example, Next Month includes this month and next month. In particular, choosing Next Month on March 15, 2015 includes 03/01/2015 through 04/30/2015.
• This - Shows the time period (day, week, month, quarter, or year) that the selected date is a member of. For example, This Quarter for 10/15/2015 includes 10/01/2015 through 12/31/2015.
• To-Date - Shows the time period (day, week, month, quarter, or year) that the selected date is a member of, up to and including the selected date. For example, Quarter-to-Date for 10/15/2015 includes 10/01/2015 through 10/15/2015.
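The four rule types can be sketched as date arithmetic. The following Python is only an illustration of the ranges described in the bullets; it is not Platfora code, and the function and parameter names (`relative_range`, `rule`, `unit`, `anchor`, `count`) are hypothetical. Week periods are left out because this guide does not state which day starts a week.

```python
from datetime import date, timedelta

MONTHS_PER_UNIT = {"month": 1, "quarter": 3, "year": 12}

def period_start(d, unit):
    """First day of the day/month/quarter/year that contains d."""
    if unit == "day":
        return d
    if unit == "month":
        return d.replace(day=1)
    if unit == "quarter":
        return date(d.year, 3 * ((d.month - 1) // 3) + 1, 1)
    if unit == "year":
        return date(d.year, 1, 1)
    raise ValueError("unsupported unit: " + unit)

def shift(start, unit, n):
    """Start of the period that lies n periods after (or before) a period start."""
    if unit == "day":
        return start + timedelta(days=n)
    index = start.year * 12 + (start.month - 1) + MONTHS_PER_UNIT[unit] * n
    return date(index // 12, index % 12 + 1, 1)

def relative_range(rule, unit, anchor, count=1):
    """Inclusive (start, end) dates for a relative date rule (hypothetical helper)."""
    start = period_start(anchor, unit)
    end = shift(start, unit, 1) - timedelta(days=1)
    if rule == "previous":   # count periods before, plus the anchor's own period
        return shift(start, unit, -count), end
    if rule == "next":       # the anchor's own period, plus count periods after
        return start, shift(start, unit, count + 1) - timedelta(days=1)
    if rule == "this":       # only the period containing the anchor
        return start, end
    if rule == "to-date":    # from the period start up to the anchor itself
        return start, anchor
    raise ValueError("unsupported rule: " + rule)

# The guide's own examples:
print(relative_range("previous", "year", date(2015, 3, 15)))    # 2014-01-01 .. 2015-12-31
print(relative_range("next", "month", date(2015, 3, 15)))       # 2015-03-01 .. 2015-04-30
print(relative_range("this", "quarter", date(2015, 10, 15)))    # 2015-10-01 .. 2015-12-31
print(relative_range("to-date", "quarter", date(2015, 10, 15))) # 2015-10-01 .. 2015-10-15
```

In this sketch, Previous and Next both keep the period that contains the selected date, which is why Previous Year spans two calendar years.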
For example, the following relative date filter rule includes 02/01/2015 through 03/31/2015:

Filter on Measure Fields

Measures always contain quantitative data, so filtering on a measure involves choosing a range of numeric values to include or exclude in your visualization. In Platfora, measures are always the result of an aggregate calculation on a group of dimension values, so the visual rendering of a measure filter changes depending on the dimension fields and filters used in the visualization.

When you add a measure field to the Filters panel, you get the measure filter control. It shows a range of numeric values based on the dimensions currently selected in the visualization. Note that the possible range of values can change as you add and remove dimensions and dimension filters in the visualization.

You can type values in the Start and End fields (click the refresh icon to apply the range filter to the visualization).

Create a Page Filter

Creating a page filter allows you to filter multiple visualizations at once (provided they all use the same lens).

Promote a Local Filter to a Page Filter

You might want to promote a local filter to a page filter when you have a filter created from a selection of viz marks. To make a local filter apply to the page, you can drag it from the Filters section to the Page Filters section of the Filters panel, or you can use the following steps:

1. Select the viz that contains the filter.
2. Find the filter in the Filters panel.
3. Open the filter contextual menu.
4. Select Apply to entire page.

If any other visualizations on the same vizboard page also use the same lens, the filter will be applied to those visualizations as well.
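As a rough mental model of filter scope, the sketch below treats each filter as a predicate on a row. A viz receives its own local filters plus every page filter defined on the lens it uses, and all of them are combined with AND, as described in the FAQs. This is an illustrative Python sketch only; the `Viz` class, `effective_filters`, `filter_rows`, and the sample lens and field names are hypothetical and do not represent Platfora's implementation or API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Predicate = Callable[[Row], bool]

@dataclass
class Viz:
    name: str
    lens: str
    local_filters: List[Predicate] = field(default_factory=list)

def effective_filters(viz: Viz, page_filters: List[Tuple[str, Predicate]]) -> List[Predicate]:
    """Local filters plus the page filters whose lens matches the viz's lens."""
    shared = [pred for lens, pred in page_filters if lens == viz.lens]
    return viz.local_filters + shared

def filter_rows(rows: List[Row], predicates: List[Predicate]) -> List[Row]:
    """Keep a row only if every filter accepts it (AND condition)."""
    return [row for row in rows if all(pred(row) for pred in predicates)]

# Hypothetical page with three visualizations on two lenses.
flights_by_carrier = Viz("Flights by Carrier", lens="Flights")
delays_by_airport = Viz("Delays by Airport", lens="Flights",
                        local_filters=[lambda r: r["delay"] > 15])
sales_by_region = Viz("Sales by Region", lens="Sales")

# One page filter, defined on the Flights lens.
page_filters = [("Flights", lambda r: r["origin"] == "SFO")]

rows = [
    {"origin": "SFO", "delay": 30},
    {"origin": "SFO", "delay": 5},
    {"origin": "JFK", "delay": 45},
]

# Only rows that pass both the local filter and the page filter remain.
print(filter_rows(rows, effective_filters(delays_by_airport, page_filters)))
```

In this model the Sales viz is untouched by the page filter because it uses a different lens, and the order of the predicates has no effect on the result.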
Create a Page Filter from a Lens Field

To make a page filter from a particular lens field, drag a field from the lens builder to the Page Filters drop zone in the Filters panel.

Toggle Filter Include/Exclude Mode

By default, filters are inclusive, meaning the values selected in the filter are included in the visualization. You can change a filter to be exclusive, meaning everything except the selected values is included in the visualization. Exclusive mode is useful if you have a long list of values, and there are only a couple of values you want to filter out.

To make a filter exclusive instead of inclusive, select Exclude Mode from the field's drop-down menu in the Filters panel. This changes the filter so that selected values are excluded from the visualization.

Filter by Selection

While working in a visualization, you can isolate particular marks on the visualization by selecting them. You can then save your selection as an inclusive or exclusive filter. Saved selection filters appear in the Filters panel of the visualization workspace.

Selecting and Unselecting Marks

To select marks on a visualization, you can:

• Click individual marks in the visualization.
• Click individual dimension members in the Legends panel.
• Click above a mark in a visualization and drag across and/or down to select multiple marks at once.
• Press the CTRL key (Windows) or Command key (Mac) and click individual marks to add them to the selection.

To unselect marks on a visualization, you can:

• Press the ESC key.
• Click any whitespace in the visualization.
• Press the CTRL key (Windows) or Command key (Mac) and click individual marks to remove them from the selection.

Saving a Selection as a Filter

To save a selection as a filter:

1. Select the marks on the visualization you want to include or exclude.
2. From the marks selected drop-down menu, choose Isolate Selection or Exclude Selection.

The selection is then added as a filter to the Filters panel.

Filter by Limit

Another way to filter a dimension field is to apply a limit on the number of dimension members appearing in the visualization. You can also use a limit in combination with a sort to limit a dimension based on a measure calculation. For example, if you wanted to filter products to only show the top 10 sellers, you could sort the products dimension by the total sales measure, and then limit the results to 10.

For dimension fields that are already rendered in the visualization, you can add a limit filter by selecting Options and Sort from the field's drop-down menu. This opens the Options and Sort dialog.

By default, dimension values are sorted alphabetically (or chronologically for dates) in ascending (A-Z) order. You can change the sort order to reflect that of a measure field instead. For example, in this visualization we are sorting departure airports by the number of flights in descending order (most flights to least flights), and limiting the results to 15, thereby showing the top 15 busiest departure airports in our data.

Selecting Include "Others" Member adds a mark on the viz labeled Others. This mark combines all other dimension values filtered out by the limit criteria into one group.

Fields that have a sort or limit applied show an icon in the field drop zone.

Chapter 9
Build a Chart Viz

Chart visualizations allow you to do exploratory analysis of aggregate data in chart form. Visualizations appear inside panels (or boxes) in the workspace area of a vizboard page. A new vizboard page will always have one empty placeholder visualization to get you started.
Topics:
• FAQs—Chart Visualizations
• About the Chart Viz Workspace

FAQs—Chart Visualizations

This topic answers some frequently asked questions about chart visualizations. Information forthcoming.

About the Chart Viz Workspace

When you edit a chart viz within a vizboard, the Builder panel contains the tools you need to build the visualization. The control panels on the left-hand side of the workspace are used to select lens data and design the visual representation of the data. The control panels on the right-hand side of the viz workspace show the data filters and appearance-encoding legends that are active in the selected visualization.

Chart visualizations use a grid layout of columns and rows. Fields added to the X-axis show on the horizontal axis (equivalent to columns), and fields added to the Y-axis show on the vertical axis (equivalent to rows). The axes have headers that display the field names, and labels that display individual field values (or ranges of values).

A mark is the visual representation of a measure value calculated for a group of input records or rows. A group consists of records that share the same value for the dimension(s) used in the visualization. Hovering over an individual mark shows the data details in a tooltip.

1. Lens name
2. Edit lens button
3. Add menu for supplementing the lens
4. Pan and zoom controls
5. Viz controls
6. Show/hide pages and builder panels
7. Filter lens fields
8. Lens fields
9. Choose viz type
10. Axis controls
11. Mark
12. Tooltip
13. Mark type controls
14. Mark appearance options
15. Mark appearance controls
16. Field header/name
17. Field label
18. Filters controls
19. Encoding legends

Chapter 10
About Chart Viz Axes

Visualizations use a grid layout of columns and rows.
Fields added to the X-axis drop zone show on the horizontal axis (equivalent to columns), and fields added to the Y-axis drop zone show on the vertical axis (equivalent to rows). The axes have headers that display the field names, and labels denoting the field values (or members).

Topics:
• Use Multiple Fields on an Axis
• Transpose the X and Y Axis
• Change Axis Options for Measure Data
• Change Axis Options for Dimension Data

Use Multiple Fields on an Axis

The X-axis and Y-axis drop zones can contain multiple fields. Adding additional measures to an axis produces a dual or trellised axis. Adding additional dimensions to an axis produces a grouped axis.

Grouped Axes

Placing additional dimensions in an axis drop zone produces a grouped axis. The ordering of the dimensions in the drop zone determines the grouping order. Typically, you should group an axis on dimensions with fewer members for the best results (place lower cardinality dimensions above higher cardinality dimensions). When measures and dimensions are placed on the same axis, the dimension must always be on top.

Dual and Trellised Axes

Placing two measures in the same axis drop zone produces a dual axis. A dual axis allows you to compare two measures side by side over a common dimension.

Placing multiple measures in both of the axis drop zones produces a trellised axis. A trellised axis allows you to compare multiple measures over a common dimension.

Transpose the X and Y Axis

You can transpose the X and Y axis of a visualization by clicking Swap underneath the X-axis drop zone in the Builder panel. This swaps the fields in the X-axis drop zone with those in the Y-axis drop zone, thereby flipping the orientation of your viz.
Change Axis Options for Measure Data

Measure fields represent the quantitative data in your viz, and always contain aggregated numeric data. When you place a measure field in the X-axis or Y-axis drop zone, you get a quantitative (or continuous) axis. You can change various display options of a measure axis, such as the display name, value range, scale, or number formatting.

Measure axis options can be accessed from the field menu of any measure field placed in the X-axis or Y-axis drop zones of the Builder panel. Measure axis options are available on measure fields placed in the appearance encoding drop zones as well, but they do not have any effect on the viz axes in that context.

Change the Value Range of a Measure Axis

When you have a measure field in the axis drop zones (X-axis or Y-axis), the default is to always include 0 (zero) in the range of values on the axis. For data that has a range skewed to high values, you may not want the measure axis to start at zero. For example, if your data values start at 1,000,000 and go to 10,000,000, then you may want the measure axis to reflect the actual range of values rather than starting at 0. To have the axis reflect the actual range of values, you can choose to not always include 0 in the range.

Note that this only affects measure fields whose range does not already include 0 and where the range is skewed. If the range naturally includes 0 (or values close to 0), then the measure axis must include 0.

1. Select Options in the measure field drop-down menu.
2. Deselect Always include zero under the Range options.
3. Click Confirm.

For some data, it is easier to see the variations in the data when the range starts at a higher value rather than at zero.
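The effect of the Always include zero option can be approximated with a few lines of Python. This is a simplified sketch, not Platfora's axis logic: the function name `axis_range` and its parameters are hypothetical, and it ignores details such as tick rounding and the "values close to 0" case mentioned above.

```python
def axis_range(values, always_include_zero=True):
    """Return the (low, high) bounds a measure axis would need to cover.

    Hypothetical helper: with always_include_zero set, the bounds are
    stretched so that 0 falls inside them; otherwise they follow the data.
    """
    low, high = min(values), max(values)
    if always_include_zero:
        low, high = min(low, 0), max(high, 0)
    return low, high

skewed = [1_000_000, 4_500_000, 10_000_000]
print(axis_range(skewed))                             # (0, 10000000)
print(axis_range(skewed, always_include_zero=False))  # (1000000, 10000000)

# When the data already spans 0, the option makes no difference.
mixed = [-20, 35, 80]
print(axis_range(mixed) == axis_range(mixed, always_include_zero=False))  # True
```

With the skewed data, dropping zero narrows the axis to the 1,000,000 to 10,000,000 span from the example above, so differences between marks take up more of the chart.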
Change the Scale of a Measure Axis

When you have a measure field in the axis drop zones (X-axis or Y-axis), the corresponding axis shows a numeric scale depicting the range of values for the selected measure. By default, measure axes show a Linear scale of numeric values from lowest to highest. You can change a measure axis to display values on a Logarithmic scale or a Percent of Total scale.

Logarithmic Scale Axis

There are two main reasons to use a logarithmic scale for a measure axis in a viz. The first is to respond to skewness towards large values (cases in which a few values are significantly larger than the bulk of the data). The second is to show change of magnitude for data values that grow exponentially, such as Richter scale values that measure the strength of an earthquake. Also, when the marks plotted on the viz cover a very large range, changing the measure axis to a logarithmic scale can make it easier to see the growth curve of the data.

A linear scale is plotted with equal distance between the tick marks; each unit of change is represented by the same distance on the scale. By contrast, a logarithmic scale is plotted so that the tick marks are not positioned equidistantly; instead, two changes of equal magnitude (equal ratios, such as 1 to 10 and 10 to 100) are plotted with the same distance on the scale.

For example, here is stock data that shows the average spread (difference between the high and low price) for certain stocks. When shown linearly, the axis shows the average price difference in 2 dollar increments. When shown logarithmically, the axis shows the magnitude of change between the average price difference values.

The values on a log scale axis show the natural logarithm (the logarithm to base e), where e (Euler's number) is a mathematical constant approximately equal to 2.718.
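A small numeric sketch makes the contrast between the two scales concrete. It uses Python's standard math.log function (the natural logarithm); the sample values are made up for illustration and are not taken from the stock example.

```python
import math

values = [1, 10, 100, 1000]  # each value is 10 times the previous one

# On a linear scale, the gaps between consecutive values keep growing.
linear_gaps = [b - a for a, b in zip(values, values[1:])]

# On a natural-log scale, each tenfold change covers the same distance.
log_positions = [math.log(v) for v in values]
log_gaps = [b - a for a, b in zip(log_positions, log_positions[1:])]

print(linear_gaps)                      # [9, 90, 900]
print([round(g, 3) for g in log_gaps])  # [2.303, 2.303, 2.303]
```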
The natural logarithm of a value is the power to which the constant e must be raised in order to equal that value. Switching from a linear to a log axis does not change the underlying measure values; it only changes how the values are rendered on the axis.

Log axes are not compatible with the Bar or Edge mark types. They only make sense for mark types that can show curves in the data, such as Line, Point, or Area. Negative values cannot be plotted on a logarithmic scale. Also, log axes are not compatible with the Polar Chart viz type and are disabled for polar chart visualizations.

Percent of Total Scale Axis

A Percent of Total scale axis only makes sense when showing stacked charts, such as stacked bar charts, stacked line charts, or area charts. When you enable Percent of Total on a measure axis, the axis scale is rendered as the percentage that each dimension group contributed to the total for each column (or row). The measure values are also represented as percentages rather than the actual values.

For example, in the first column of the stacked bar chart below, the Percent of Total axis shows that morning departure flights contributed about 30% to the total number of flights served on October 1, 2012. This is in contrast to showing the actual count of 5,000 flights. In this case, Date is the column for which the percent of total is calculated.

If your measure values contain negative numbers, the axis scale may show percentages above 100% or below 0%.

Change Axis Options for Dimension Data

When you have a dimension field in the X-axis or Y-axis drop zone, the corresponding axis shows the labels of the dimension values.
You can change the sort order of a dimension axis, limit the number of values shown on the axis, or toggle the axis type to display values on a categorical or quantitative scale.

Change the Type of a Dimension Axis

When you have a dimension field in an X-axis or Y-axis drop zone, the default axis type is categorical (discrete). This means that the values are shown as individual members or categories. For dimension fields that contain numeric or datetime data, you can change the axis type to a quantitative scale (continuous) axis instead.

Changing a Numeric Dimension Axis from Categorical to Quantitative

Dimension fields that are of a numeric data type can be switched from a categorical axis to a quantitative scale (continuous) axis.

To change the axis type of a numeric dimension:

1. Select Options and Sort in the field drop-down menu.
2. Select Quantitative on the Sorting tab.
3. Click Apply.

Changing the dimension axis to Quantitative may also change the default mark type of the viz if you are using the Auto mark type.

For example, the viz below is comparing the average trade volume (a measure) to the spread percentage (a numeric dimension) for certain technology stocks. When spread percentage is shown on a categorical axis, member values are compared side-by-side with a representative sampling of value labels along the axis. Switching the axis to quantitative shows a quantitative scale with tick marks at evenly spaced intervals, similar to how an axis is rendered for a measure. Also, since the viz is now comparing two quantitative values, the default mark type changes from a bar to a point.

Note that a dimension axis displayed on a quantitative scale cannot be sorted like a categorical dimension axis can. It will always show the values as a range from low to high with tick marks at evenly spaced intervals.
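The difference between the two axis types for a numeric dimension can be sketched as follows. Both function names are hypothetical and the member values are made up; the sketch only illustrates the idea that a categorical axis gives each member its own slot, while a quantitative axis places members according to their numeric value.

```python
def categorical_positions(members):
    """Give every member its own equally sized slot, in list order."""
    return {member: slot for slot, member in enumerate(members)}


def quantitative_positions(members, axis_min, axis_max):
    """Place each member by its numeric value, as a fraction of the axis length."""
    span = axis_max - axis_min
    return {member: (member - axis_min) / span for member in members}


spread_pct = [1, 2, 10]  # made-up numeric dimension members

print(categorical_positions(spread_pct))          # {1: 0, 2: 1, 10: 2}
print(quantitative_positions(spread_pct, 0, 10))  # {1: 0.1, 2: 0.2, 10: 1.0}
```

Reordering the input list changes the categorical slots but not the quantitative positions, which is consistent with the note that a quantitative dimension axis cannot be sorted.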
Changing a Date Dimension Axis from Categorical to Quantitative (Continuous)

The axis type for datetime data is controlled by the field you choose from the lens, not by the field menu options. For each datetime field in your lens, there are two fields that contain the same data values:

• Date - Dragging a Date field to X-axis or Y-axis gives you a discrete date axis (dates are displayed as individual categories).
• Date (Time Series) - Dragging a Date (Time Series) field to X-axis or Y-axis gives you a quantitative (continuous) date axis (dates are displayed as a range of values).

Chapter 11: Build a Cross-Tab Viz

Cross-tab visualizations allow you to view the data in the lens in a tabular format. Each cross-tab viz appears inside a panel (or box) in the workspace area of a vizboard page.

Topics:
• FAQs—Cross-Tab Visualizations
• Enable Cross-Tab Totals

FAQs—Cross-Tab Visualizations

This topic answers some frequently asked questions about cross-tab visualizations.

What is a cross-tab viz?

A Cross-Tab viz is a viz type that displays the data in the builder drop zones in a tabular, spreadsheet format.

Why would I want to create a cross-tab viz instead of a chart viz?

You might want to create a cross-tab viz if you want to see the raw data comprising the points on a viz.

How is a cross-tab viz similar to and different from a chart viz?

A Cross-Tab viz is very similar to a Chart viz, with a couple of exceptions. The workspace is the same, except the drop zones are called Columns and Rows instead of X-Axis and Y-Axis respectively. All other functionality, such as filtering, sorting, sharing, and label controls, is the same.

How are the fields laid out in the table?

In a cross-tab viz, measure values are always shown in columns, regardless of where the measure fields are placed in the Builder panel drop zones.
Dimension fields placed in the Rows drop zone will show as rows. However, dimensions placed in other drop zones such as Details, Color or Shape will subdivide the measure columns.

Can I switch from Cross-Tab to Chart?

Yes. All fields remain in their original drop zones.

I can't see all my data, where is it?

The size of a cross-tab viz doesn't change when more fields are added to a drop zone. If more columns or rows are displayed in the viz, you can either make the viz size bigger or scroll within the viz to see more data.

Enable Cross-Tab Totals

A Cross-Tab viz can be enabled to show totals for the data in columns and rows. For example, if each row shows the sum of orders, then turning on totals for rows creates a column showing the sum of all orders in each row.

Totals can be enabled separately for rows and columns. When totals are enabled for both rows and columns, the intersection of these totals shows the grand total. When totals are enabled for columns, Platfora adds totals for each column in the cross-tab, including all fields in the Marks drop zones and measures in any drop zone.

The value of each total is calculated from the original values in the dataset records (rows) that contribute to the values displayed in the row or column. That is, the total value is not an aggregate of the aggregate values. For some aggregate functions (sum, count, maximum, and minimum), you can easily verify the total using the values displayed in the row or column. For example, the total of sums is the addition of the values displayed in the row or column. And for maximums, the total is the highest value displayed in the row or column.

When the viz contains multiple date or time dimensions, Platfora only shows totals for the most granular date or time. For example, if the viz contains month and quarter data, the totals are shown for months.
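The statement that a total is calculated from the original records, and is not an aggregate of the displayed aggregates, can be illustrated with made-up order data. The sketch assumes an average measure for the contrast case, since averages are not among the functions the text lists as directly verifiable from the displayed values.

```python
from statistics import mean

# Made-up order records: (region, order_amount)
records = [("East", 10), ("East", 20), ("East", 30), ("West", 100)]

by_region = {}
for region, amount in records:
    by_region.setdefault(region, []).append(amount)

# Values as they would be displayed per region.
displayed_sums = {r: sum(a) for r, a in by_region.items()}
displayed_avgs = {r: mean(a) for r, a in by_region.items()}

# Totals computed from the original records, as the text describes.
total_sum = sum(amount for _, amount in records)
total_avg = mean([amount for _, amount in records])

# Aggregating the displayed averages gives a different number.
avg_of_avgs = mean(list(displayed_avgs.values()))

print(displayed_sums, total_sum)               # total equals the sum of displayed sums
print(displayed_avgs, total_avg, avg_of_avgs)  # total differs from the average of averages
```

For sums the two approaches agree, which is why the total can be checked by adding up the displayed values. For averages they diverge because the groups contain different numbers of records.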
Note that Platfora cannot show totals for a column or row under the following circumstances:

• The column or row contains a measure that is configured to show percent of total.
• The column or row contains a field that uses an expression containing the ROLLUP function.

To show totals for column data, choose Show the total per column from the Columns pull-out menu. To show totals for row data, choose Show the total per row from the Rows pull-out menu.

Chapter 12: Build a Polar Chart Viz

Polar chart visualizations allow you to do exploratory analysis of aggregate data on a chart using polar coordinates.

Topics:
• FAQs—Polar Chart Visualizations
• About the Polar Chart Viz Workspace

FAQs—Polar Chart Visualizations

A polar chart is a circular chart that shows information as polar coordinates. This topic answers some frequently asked questions about polar chart visualizations.

What is a polar chart?

A polar chart is a circular chart that uses values and angles to show information as polar coordinates. You can perform the same functions on a Polar Chart viz as on a Chart viz, such as sort, filter, pan and zoom, and export.

In the polar coordinate system, a point on a plane is determined by a distance (the radius) from a fixed point (the pole) and an angle from a fixed direction (the polar angle).

In Platfora, a polar chart uses the Y-axis drop zone for the radius and the Angle drop zone for the polar angle. Additionally, Platfora uses the Size drop zone to determine the size of each mark starting from the perimeter going toward the center.

How does a polar chart display data?

Like a (rectangular) Chart viz, a Polar Chart viz displays data points as marks (but using polar coordinates). Currently, Platfora supports one mark type for polar charts, Bar. This allows you to create two different kinds of polar chart visualizations: pie charts and donut charts. (Yum!)

What is a pie chart?
A pie chart is a type of polar chart that is a circle divided into sectors, with each sector illustrating a percentage of the total. The angle of each sector is proportionate to the percentage it represents. For example, an angle of 90 degrees represents 25%. Pie charts are similar to bar charts that are configured to show Percent of Total on a measure axis.

What is a donut chart?

A donut chart is similar to a pie chart, except it has a blank center (hole). The blank center is due to the Size of the mark not reaching all the way to the center of the circle. The size of a mark is its length measured from the perimeter going toward the center of the chart. By default, the size of each mark is the same. However, donut charts can show additional measure data by changing the size of each mark to reflect the measure value.

Why would I want to represent data as a pie or donut chart?

Pie and donut charts are useful for showing relative sizes at a glance. However, the human eye cannot easily distinguish between angles that have similar sizes, especially if there are many very small sectors (marks). Platfora recommends using a pie or donut chart under the following circumstances:

• You only have one (pie chart) or two (donut chart) measures to display in the viz.
• All values to display in the viz are positive (no zero or negative values).
• The dimension values to display represent part of a whole.
• The number of dimension values to display is low, such as fewer than 10. You can sort and filter a dimension field to reduce the values displayed.

How do I create a donut chart?

Create a new viz, choosing Polar Chart as the viz type. Then drag a dimension field into the Color drop zone. When no measure is placed in the Angle drop zone, Platfora uses the default measure for the Angle drop zone. Optionally, you can place a different measure field into the Angle drop zone.
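The relationship between a sector's angle and the percentage it represents, described in the pie chart answer above, can be sketched as follows. The function name sector_angles and the flight counts are hypothetical; the sketch simply skips zero and negative values, in line with the recommendation that all displayed values be positive.

```python
def sector_angles(measure_by_member):
    """Map each dimension member to (percent_of_total, angle_in_degrees).

    Hypothetical helper for illustration only. Zero and negative values
    are skipped rather than drawn.
    """
    positive = {m: v for m, v in measure_by_member.items() if v > 0}
    total = sum(positive.values())
    return {m: (100 * v / total, 360 * v / total) for m, v in positive.items()}


flights = {"Morning": 25, "Afternoon": 50, "Evening": 25, "Cancelled": 0}
for member, (pct, angle) in sector_angles(flights).items():
    print(f"{member}: {pct:.0f}% -> {angle:.0f} degrees")
```

With these values, the Morning sector covers 25% of the total and therefore spans 90 degrees, matching the example in the text.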
How do I create a pie chart?

Create a donut chart using the instructions above, and then edit the Size drop zone setting to maximize the size. The default size is 0.00x, so change it to 1.00x.

Can I convert between a Chart and Polar Chart viz?

Yes, you can convert between Chart and Polar Chart viz types. When going from chart to polar chart, any field in the Y-Axis drop zone is moved to the Details drop zone. However, when going from polar chart to chart, all fields remain in their original drop zones. This is also true if you change from chart to polar chart and then click Undo to return to chart.

What happens when the measure field in the Angle drop zone contains negative or zero (0) values?

Polar chart visualizations cannot display negative or zero values in the Angle drop zone, so when a measure field contains those values, Platfora displays a warning icon in the upper right corner of the viz. Click the warning icon to open a dialog explaining how many values do and do not appear in the viz.

When I sort data in a polar chart viz, where does Platfora start the ordered list of marks?

The first mark in a sorted list starts at the Y-axis, which is the top half of a vertical line going through the circle. The next mark in the sorted list is placed next to the first in a clockwise rotation.

About the Polar Chart Viz Workspace

The Polar Chart workspace is similar to the Chart workspace, but with slightly different drop zones in the Builder panel. When you edit a polar chart viz within a vizboard, the Builder panel contains the tools you need to build the visualization.

The control panels on the left-hand side of the workspace are used to select lens data and design the visual representation of the data.
The control panels on the right-hand side of the viz workspace show the data filters and appearance-encoding legends that are active in the selected visualization.

Polar Chart visualizations use a circular layout of length and angle. Fields added to Angle divide the circle into sectors. Currently, Platfora does not support the Y-axis drop zone (equivalent to length) in polar charts. Polar chart visualizations have no axes and therefore have no headers that display the field names. But they do have labels to display individual field values.

A mark is the visual representation of a measure value calculated for a group of input records or rows. A group consists of records that share the same value for the dimension(s) used in the visualization. Hovering over an individual mark will show the data details in a tooltip.

1. Choose viz type
2. Angle drop zone
3. Y-axis drop zone (currently unavailable)
4. Mark type (currently Bar only)
5. Mark appearance options
6. Mark appearance controls
7. Mark
8. Tooltip

Chapter 13: Build a Geo Map Viz

Geo map visualizations allow you to do geographical analysis using an aggregate lens. A geo map viz is similar to a scatterplot viz on a map background.

Topics:
• FAQs—Geo Map Visualizations
• About the Geo Map Viz Workspace

FAQs—Geo Map Visualizations

This topic answers some frequently asked questions about geo map visualizations.

What is a geo map viz?

A Geo Map is a viz type that allows analysts to perform geographic analysis on a lens that contains location data. It includes the Geography drop zone that places marks representing positions (using a location field) on a map background. A geo map viz is similar to a scatterplot chart viz, except that the marks are displayed on a map background and both coordinates are represented by the same field. You can configure the mark appearance drop zones as you would for a Chart viz.

Who can create a geo map viz?
To create a geo map viz, a user must have the Analyst (Limited) role or higher and also have data access permission on a dataset that contains geo-encoded data.

How can I create a geo map viz?

When creating a viz, choose Geo Map as the viz type. Platfora lists all datasets that have a location field and a built lens. In the viz, place a location field in the Geography drop zone. Configure all other viz settings as desired, as you would for a Chart viz.

What is a location field?

A location field is a dataset field encoded with a complex datatype that includes geo coordinate information (latitude and longitude) and a label that associates a location name with the coordinates.

Depending on how the location field was defined in the dataset, the label value may come from another field (visible or hidden) in the dataset or it may be a unique string that Platfora generates from the coordinate values (for example @(122.33063°W, 37.541886°N)).

Only location fields can be placed in the Geography drop zone. When placed there, the coordinates of each data value are placed on a map using the Point mark type. When you hover over a mark, the tooltip displays all data associated with that mark value: the label name, latitude, and longitude.

Can I filter on a location field?

Yes. Filtering on a location field filters on the location name (label) assigned to each value as a categorical dimension. For example, if you have a location field for zip code data and the location name for each value is the 5-digit zip code, you can filter on values such as 94402 and 94111, but not on the latitude and longitude coordinates.

Can I use a location field in other drop zones, even if it's in a different type of viz?

Yes.
When a location field is in a different drop zone in any type of viz, Platfora uses the location name value in that drop zone, not the coordinate values. For example, if you place the State Location field into the Labels drop zone, Platfora displays California as the mark label for a location inside the state of California.

Why does my map have points off the western coast of Africa at latitude 0 and longitude 0?

This is the default geo coordinate position in Platfora. If you see this value in your location field, it usually means the latitude and longitude coordinates were missing (or NULL) for a record. NULL values can result from missing data values in the raw data, or can result from a dataset row that was not able to join to its corresponding referenced dataset row. In either case, Platfora substitutes the 0,0 coordinates for the missing location data.

Alternatively, it might mean that someone was sailing off the coast of Africa at this exact location when this event record was generated.

How does Platfora render maps?

Platfora uses Google Maps to render geo map visualizations. Your system administrator needs to configure Platfora to use the Google Maps service in order to create a geo map viz.

When I export my map viz to a PNG or PDF file, the map background looks different, why is that?

Platfora has modified the visual style of Google Maps in the viz as shown in the web browser. This is to make it easier to view and distinguish marks on the map background. For example, Platfora has reduced the text information on the map to reduce interference with mark points and mark labels. However, due to how Google Maps works, some aspects of this modified visual style are lost when exporting to a PNG or PDF file. Platfora has tried to reduce the difference in map style between what you see in the web browser and what you see in an exported file as much as possible.

How do I zoom in and out on the map?
Platfora uses its own pan and zoom functionality to zoom in and out of the map and to move the map in the viz window. The Google Maps zoom functionality has been disabled.

I can't create a geo map viz because I get an error saying "Google Maps service unavailable. Make sure Platfora is configured with a valid Google Maps Client ID." What do I do?

To create a geo map viz, Platfora must be configured to use the Google Maps service. If you see this message in a vizboard, contact your system administrator.

Can I convert between a Chart and Geo Map viz?

Yes, you can convert between Chart and Geo Map viz types. When going from chart to geo map, any field in the Y-Axis drop zone is moved to the Details drop zone, and the topmost field in the X-Axis drop zone is moved to the Geography drop zone. If there are multiple fields in the X-Axis drop zone, all fields below the top field are moved to Details. When going from geo map to chart, the field in Geography is moved to X-Axis, and all other fields remain in their original drop zones. This is also true if you change from chart to geo map and then click Undo to return to chart.

What happens when the location field in the Geography drop zone contains invalid geo-coordinate values?

Geo map visualizations can only display valid latitude and longitude values, so when a location field contains invalid values, Platfora displays a warning icon in the upper right corner of the viz. Click the warning icon to open a dialog explaining how many values do and do not appear in the viz.

Why can't I see my data points near the poles?

Some values are outside of the mappable area in Google Maps, either too far south or north. Google Maps uses a variant of the Mercator projection and therefore cannot accurately display points near the poles.
Instead, it limits the mappable area to latitude values from approximately -85 degrees to +85 degrees. Unfortunately, this limits some of the polar bears, penguins, and earthquakes that can be placed on a map.

About the Geo Map Viz Workspace

The Geo Map workspace is similar to the Chart workspace, but with slightly different drop zones in the Builder panel. When you edit a geo map visualization, the Builder panels contain the tools you need to build the visualization.

The control panels on the left-hand side of the workspace are used to select lens data and design the visual representation of the data. The control panels on the right-hand side of the viz workspace show the data filters and appearance-encoding legends that are active in the selected visualization.

Geo map visualizations display location data on a map background. Geo-encoded fields added to Geography appear as points on the map.

On a geo map viz, a mark is the visual representation of a position with geographical coordinates (latitude and longitude). All marks on a geo map viz are Point marks. Hovering over an individual mark shows the data details in a tooltip.

1. Choose viz type
2. Geography drop zone
3. Mark type (Points only)
4. Mark appearance options
5. Mark appearance controls
6. Mark
7. Tooltip
8. Location field

Chapter 14: Build a Funnel Viz

Funnel analysis visualizations allow you to track user behavior across one or more fact datasets defined in an event series lens. The analysis is performed on individual events instead of aggregated data in order to find sequential patterns in behavior.

Topics:
• FAQs—Funnel Analysis Visualizations
• About Event Series Analysis
• About the Funnel Analysis Viz Workspace
• Define and Analyze Funnel Stages
• Analyze Funnel Stages Across Dimensions

FAQs—Funnel Analysis Visualizations

A funnel is a visualization type that tracks users' behavior across a sequence of events.
This topic answers some frequently asked questions about funnel visualizations.

What is a funnel?

A funnel is a visual analysis type that tracks users' behavior across a sequence of events. Each step in the sequence is defined as a stage. Each funnel stage shows progressively decreasing proportions of the original set of users. The first stage has 100% of the original group of users by definition.

For example, a funnel can be used to track the pathways users take through a website, such as visiting a page, then viewing a video, and then registering. The first stage would be defined as all users who click that page, the second stage would be defined as the users from the first stage who then viewed the video, and the third stage would be defined as the users from the second stage who then registered with the website.

The Platfora documentation uses the term users in a generic sense. Users can consist of any type of dimension, such as customers, sessions, devices, players, etc.

Why would I want to create a funnel?

Use funnels to look for patterns in behaviors among users of a particular group. For example, you might define a funnel for a particular sequence of events, and see how users in different segments compare to the total and to each other. You can also use funnels to understand at which stages the most drop-off occurs in a multi-step conversion process (conversion rate).

Who can create a funnel viz?

Funnels can be created by Platfora users who have the Analyst system role (or above), provided they also have access to the underlying source data in Hadoop.

How are funnel visualizations created?

Funnels are based on event series lenses. When adding a new viz, you first choose Funnel as the viz type. When Funnel is selected, Platfora lists the datasets that have an event series lens defined. For more context about event series lenses, see About Event Series Analysis.
In a funnel viz, you create stages and then define filters for each stage. The order of stages is important. By default, the funnel viz counts all users that meet the criteria for each stage. However, you can analyze the flow through each stage for different sub-groups of users. You do this after defining the funnel stages by dragging one or more dimension fields into the Rows drop zone.

How is each stage defined?

Each stage is based on an event dataset used in the lens with one or more stage conditions applied. Each stage can use a different event dataset, allowing analysts to define flows from different sources of event data related to the same users. For example, you can define one stage based on website clicks and the next stage based on customer service phone calls.

A member of a stage is a user that has a record in the event dataset, meets all conditions defined in the stage, and whose timestamp is greater than the timestamp of its record in the previous stage. For example, if stage one is defined as users who clicked home.html, and stage two is defined as users who clicked checkout.html, then any user who clicked checkout.html after clicking home.html is a member of both stage one and two, regardless of any website clicks in between those events.

About Event Series Analysis

There are two ways to perform event series analysis in Platfora. This topic explains how to use an event series lens in a funnel visualization.

Event series analysis in Platfora involves partitioning events by some entity (such as a user), ordering event records by their timestamp, and then looking for interesting patterns of behavior (events). Platfora uses the Funnel viz type to perform event series analysis on a lens type that was created for this type of analysis.

An event series lens (ESL) is a special type of lens created from datasets that were modeled specifically for event series analysis.
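The stage-membership rule and the partition-then-order approach described above can be sketched in a few lines. This is an illustration under stated assumptions, not how Platfora evaluates funnels: the function name funnel_counts, the tuple layout of the events, and the use of simple predicates as stage conditions are all hypothetical.

```python
from collections import defaultdict


def funnel_counts(events, stages):
    """Count distinct users reaching each stage, in order.

    events: iterable of (user, timestamp, action) tuples.
    stages: list of predicates, each taking an action and returning bool.
    Hypothetical sketch of the membership rule in the text.
    """
    # Partition events by user.
    by_user = defaultdict(list)
    for user, ts, action in events:
        by_user[user].append((ts, action))

    counts = [0] * len(stages)
    for user_events in by_user.values():
        user_events.sort()  # order each user's events by timestamp
        stage, last_ts = 0, None
        for ts, action in user_events:
            if stage == len(stages):
                break
            # A later stage requires a strictly greater timestamp than
            # the record that satisfied the previous stage.
            if stages[stage](action) and (last_ts is None or ts > last_ts):
                counts[stage] += 1
                stage, last_ts = stage + 1, ts
    return counts


events = [
    ("u1", 1, "home.html"), ("u1", 2, "about.html"), ("u1", 3, "checkout.html"),
    ("u2", 1, "checkout.html"), ("u2", 2, "home.html"),
    ("u3", 5, "home.html"),
]
stages = [lambda a: a == "home.html", lambda a: a == "checkout.html"]

counts = funnel_counts(events, stages)
for i, count in enumerate(counts):
    pct_total = 100 * count / counts[0]
    pct_prev = 100 * count / counts[i - 1] if i else 100.0
    print(f"Stage {i + 1}: {count} users, {pct_total:.1f}% of total, {pct_prev:.1f}% of previous")
```

In this made-up data, user u1 reaches both stages even though another click falls in between, while user u2 does not reach stage two because the checkout click came before the home page click.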
Data administrators model one or more event datasets around a common entity dataset. Suppose a company has data sources for web page visits by users, email campaigns sent to users, and calls made by users to the company call center. As long as all events have a common entity (the user), they can be combined into a single event series lens and analyzed together in Platfora. This might result in the following datasets:

Once the data is modeled in this way, data administrators can create event series lenses that include event records from multiple datasets.

In a vizboard, analysts can then use an event series lens to do funnel analysis. Funnels are currently the only viz type available on event series lens data. The purpose of a funnel viz is to track users' behavior across a sequence of events, with each step in the sequence defined as a stage. The behaviors for the users come from the various event datasets. For example, using the dataset model above, you could define one stage in the funnel from the Click Event dataset, and the next stage from the Email Event dataset.

About the Funnel Analysis Viz Workspace

When you create and edit a funnel analysis visualization, the builder panels contain tools specifically needed to build a funnel viz.

Use the panels on the left-hand side to select an event series lens, define the funnel stages, and analyze the funnels for different users. The panels on the right-hand side of the viz workspace show the data filters and comments applied to the selected viz.

Funnel visualizations display the number of users in each stage in order. A stage is a single phase (step) in the process that makes up the funnel.

The builder panels on the left-hand side of the workspace display on two tabs.

Stages Tab

Define the stages and their conditions on the Stages tab. The stage builder includes the controls to define stages.

1. Toggle builder panels on/off
2. Stages Builder Panel
3. Stage Definition
4. Stage Event
5. Stage Condition
6. Edit Lens Button
7. Add Menu Button
8. Viz Menu Toolbar
9. Funnel for one distinct user
10. Funnel for all users
11. Filter Controls

Analysis Tab

Configure the viz to display funnels for different users on the Analysis tab. The analysis builder panel has a Rows drop zone into which you drag dimension fields from the dimension dataset.

1. Analysis Tab
2. Lens Panel, listing dimension fields
3. Analysis Builder Panel
4. Field Controls
5. Count Column, listing distinct user count
6. % of Total Column, listing conversion rate from the first stage
7. % of Previous Column, listing the conversion rate from the previous stage

Define and Analyze Funnel Stages

To create a funnel analysis visualization, you first choose an event series lens and then define the stages in the funnel.

Define funnel stages in the stage builder of a funnel viz.

1. Enter a name for the stage.
2. Choose an event defined in the lens.
3. (Optional) Define a condition for the stage by first choosing a field and then the condition that the users must meet to reach this stage. Platfora lists all datetime and dimension fields in the event and dimension dataset. The event or reference name appears after the field name. This is useful when different datasets have fields with the same name.
4. (Optional) Click the icon to add another stage condition.
5. (Optional) Click Create New Stage to add another stage to the funnel. Or, click the icon to duplicate a stage and then edit the copy. You can change the order of a stage in the funnel using the icons to move a stage up or down.

Analyze Funnel Stages Across Dimensions

Funnel visualizations show the total number of users who reach each stage.
You can also compare funnels across dimension fields.

1. Compare funnels across dimensions on the Analysis tab. Users are grouped and filtered by the dimensions in the Rows drop zone. Platfora lists all dimension fields in the dimension dataset.
2. You can compare each dimension against the values of the entire population using the Enable Baseline option.
3. The vertical red lines are the baseline indicators.

Chapter 15
Explore Marks in a Viz

The Platfora vizboard has a number of tools for exploring individual marks (data points) or sets of marks in a viz. You can select or highlight marks, hover over a mark to see its data details, or pan and zoom to focus on a particular area of marks in a viz.

Topics:
• Select and Highlight Marks on a Viz
• Understand Data Values Not Displayed in Viz
• View the Data Values for a Mark
• Zoom and Pan in a Viz
• Drill Down Through Dimension Fields

Select and Highlight Marks on a Viz

While working in a visualization, you can isolate particular marks on the visualization by selecting or highlighting them. The selection applies to all marks on a vizboard page that share the same dimension group. For example, if you have two visualizations on a page that were created using the same lens data, highlighting a mark in one will also highlight related marks in other visualizations on the same page.

To select marks on a visualization, you can:

1. Use the selection tool to select an individual mark in the visualization. Press the CTRL key (on Windows) or Command key (on Mac) and click an individual mark to add it to the selection.
2. Click a dimension member in the Legends panel.
3. Click above a mark in a visualization and drag across and/or down to select an area of multiple marks.

To deselect marks on a visualization:
• Press the ESC key.
• Click any whitespace in the visualization.
• Press the CTRL key (Windows) or Command key (Mac) and click an individual mark to remove it from the selection.

Understand Data Values Not Displayed in Viz

Some visualizations are unable to display all data values as marks. When this occurs, Platfora displays a warning informing you that some data values are not shown.

Platfora does not display a value as a viz mark under the following circumstances:

• Word Cloud Viz: The number of values exceeds the configured maximum to display. By default, a word cloud viz displays a maximum of 1500 words. However, your system administrator might change this value using the platfora.viz.word.limit configuration property. Also, the viz size could be too small for the currently configured mark sizes. In this case, you can increase the viz size, or edit the Size drop zone and decrease the maximum size to a value less than 1.00px.
• Packed Bubbles Viz: The number of values exceeds the configured maximum to display. By default, a packed bubbles viz displays a maximum of 40,000 marks. However, your system administrator might change this value using the platfora.viz.bubble.limit configuration property.
• Geo Map Viz: Some values are outside of the mappable area in Google Maps, either too far south or too far north. Google Maps uses a variant of the Mercator projection and therefore cannot accurately display points near the poles. Instead, it limits the mappable area to latitude values from approximately -85 degrees to +85 degrees.
• Polar Chart Viz: Polar chart visualizations cannot display negative or zero values in the Angle drop zone.

Platfora displays a warning icon in the upper right corner of the viz when it doesn't display all values. Click the warning icon to open a dialog explaining how many values do and do not appear in the viz.
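The Geo Map latitude limit described above follows from the projection itself: a square Web Mercator world map is cut off at a latitude of atan(sinh(π)), which is roughly 85.05 degrees. The Python sketch below shows one way to check in advance which points would fall outside that band. It is an illustration only, the function name and sample coordinates are hypothetical, and the exact cutoff that Platfora applies is assumed rather than taken from this guide.

```python
import math

# Latitude at which a square Web Mercator map is cut off: atan(sinh(pi)).
MAX_MERCATOR_LAT = math.degrees(math.atan(math.sinh(math.pi)))


def split_mappable(points):
    """Split (latitude, longitude) pairs into mappable and unmappable lists."""
    shown, hidden = [], []
    for lat, lon in points:
        if abs(lat) <= MAX_MERCATOR_LAT:
            shown.append((lat, lon))
        else:
            hidden.append((lat, lon))
    return shown, hidden


# Hypothetical sample: one mid-latitude point and two points near the poles.
sample = [(37.77, -122.42), (89.9, 0.0), (-86.0, 10.0)]
shown, hidden = split_mappable(sample)
print(len(shown), "mappable;", len(hidden), "outside the mappable area")
```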
View the Data Values for a Mark

You can hover your mouse over any mark in a viz to see the data values that comprise that mark. The measure value(s) for the dimension group represented by the mark are shown in a tooltip.

Zoom and Pan in a Viz

A visualization (viz) fits into a fixed-size panel on a vizboard page, and is always rendered at 100% scale. To explore different areas of a viz in more detail, you can use the viz pan and zoom controls. The viz view size always re-adjusts to 100% when you change the viz definition, exit the vizboard, or switch between pages.

Zooming In

To zoom in to a particular area of a viz, select the zoom control and click the area of the viz you want to explore. Each click enlarges the viz by 100%.

To enlarge a viz by 100%, click the plus control.

Zooming Out

To zoom out:
• Select the zoom control, press the CTRL key (on Windows) or Command key (on Mac), and click within the viz.
• Click the minus control.
• Click the reset control to reset the viz to 100% size.

Panning

While a viz is zoomed in to larger than 100%, you can use the pan control to drag a particular area of the viz into the view.

Drill Down Through Dimension Fields

Platfora provides access to all of your data, allowing analysts to explore it interactively. Using Platfora's drill down capability, analysts can explore data in more detail by double-clicking on a dimension field in a visualization.
About Drilling Down

When a lens has a dataset with a drill path defined in it, you can view measure data in more detail by navigating (drilling down) through the hierarchy of fields defined in the drill path. Drilling down in a viz allows analysts to easily explore the data and view it more granularly.

When a dimension field in a drill path is placed in a drop zone, you can drill down on a particular field value to view the measure data with a more granular dimension. For example, when viewing Sales by Quarter, you could drill down on Q1 to view Q1 Sales by Month. You can continue to drill down further through the hierarchy defined in the drill path until you're viewing the most detailed field defined in the path.

Drilling down on a field is effectively the same as viewing a different field in the builder drop zone and applying a filter to the viz. For example, the lens in the viz below uses the built-in drill path named Time and includes the following fields: AM/PM, Hour by 6, Hour by 3, and Hour. AM/PM is placed in the X-axis drop zone. When you drill down on PM along the x-axis of the viz, the following occurs:

• Hour by 6 replaces AM/PM in the X-axis drop zone.
• A drill filter is applied to the viz that filters on records that occur in the PM (between noon and midnight).

1. Gray drill path icon in a Builder drop zone indicates you can drill down on this field.
2. Tooltip shows which field(s) you can drill down to.
3. Blue drill path icon in a Builder drop zone indicates this field was placed in the drop zone because another field was drilled down on to reveal this field.
4. A drill filter is created when you drill down on a field. As you drill down further in the hierarchy, additional field filters are added to the drill filter.
When a lens doesn't include a field in a drill path, that field is skipped when drilling down. In this example, if the lens in the viz did not contain the Hour by 6 field, then drilling down on PM would instead cause the Hour by 3 field to replace the AM/PM field.

Drill Down FAQ

This topic answers some frequently asked questions about drilling down in a Platfora visualization.

What does it mean to drill down?
Drill down is a data analysis technique for navigating from the most summarized to the most detailed categorization of a particular dimension field.

How can I drill down in a viz?
You can drill down in a viz by double-clicking a particular field value on an axis, a viz mark, or a crosstab cell.

What happens when I drill down on a field in a viz?
When you drill down on a field, Platfora places the next field in the drill path into that field's drop zone, and it applies a filter to the viz. The drill filter keeps only the records that match the field value you drilled on. For example, when you drill down on Year 2012, Platfora replaces the Year field in the drop zone with the Quarter field, and it applies a filter to include only data from the year 2012.

When can I drill down on a field?
To drill down, two or more fields from a drill path must be included in the lens. To drill down on a particular dimension, the lens must have at least one downstream field in the drill path. For example, if a lens has fields A, B, and C from a drill path, and you place field A in a drop zone, then you can drill down to field B and then to C. However, if you place field C in a drop zone, you cannot drill down.

How do I know if a field is drillable?
When a drillable field is placed in a drop zone, a gray drill path icon is displayed in the drop zone. After the field has been drilled upon, the dimension that is placed in the drop zone will have a blue drill path icon.
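The drill-down rules in the answers above (move to the next field in the drill path, skip fields that the lens doesn't include, and add a field filter for the value drilled on) can be modeled in a few lines. The Python sketch below is a simplified illustration, not how Platfora implements drilling. The function name and the data structures are hypothetical, and the field names come from the Time drill path example.

```python
def drill_down(drill_path, lens_fields, current_field, value, drill_filter):
    """Return (next_field, new_drill_filter) after drilling down on value.

    drill_path   -- ordered field names, most summarized first
    lens_fields  -- set of field names that the lens includes
    drill_filter -- list of (field, value) field filters applied so far
    """
    position = drill_path.index(current_field)
    # Fields in the drill path that the lens doesn't include are skipped.
    downstream = [f for f in drill_path[position + 1:] if f in lens_fields]
    if not downstream:
        raise ValueError("%s has no downstream field in the lens" % current_field)
    # Drilling replaces the field and adds one field filter on the chosen value.
    return downstream[0], drill_filter + [(current_field, value)]


time_path = ["AM/PM", "Hour by 6", "Hour by 3", "Hour"]
lens = {"AM/PM", "Hour by 3", "Hour"}  # this lens omits Hour by 6

field, filters = drill_down(time_path, lens, "AM/PM", "PM", [])
print(field, filters)
```

Because the lens in this sketch leaves out Hour by 6, drilling down on PM moves straight to Hour by 3, and calling the function on Hour raises an error because no downstream field remains.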
How do I know what field will be navigated to when I drill down?
When a field is in a Builder drop zone, you can choose View drill path from the field menu. Platfora shows all drill paths that apply to this field. The current field is highlighted in bold. When multiple drill paths are available, Platfora follows the drill path that comes first alphabetically. Additionally, you can hover the cursor over a viz mark or cross-tab cell. The tooltip lists the fields that will be navigated to.

Can I drill down on multiple fields concurrently?
Yes. You can drill down on one dimension at a time, or on all dimensions with a defined drill path, depending on how you drill down in the viz:
• One dimension: In a chart viz, double-click the field value for that dimension on the horizontal or vertical axis.
• All dimensions: Double-click an individual mark in a chart viz or a cell in a cross-tab viz.

Can I "drill up?"
Yes. You can remove a drill path filter from the Filters panel to "drill up." When you continue to drill down in a viz, a new field filter is added to the drill filter. You can remove the most recent field filter to move up in the drill path hierarchy one field at a time, or you can remove the entire drill filter to revert to the highest level of the hierarchy.

Drill Down on a Field Value in a Chart Axis

Double-clicking on a field value for a drillable field on the axis of a Chart visualization drills down on that value.

To drill down on a field value in a Chart axis, hover over the label of the value you want more detail on, and then double-click. The tooltip displays the field that will be navigated to.

In this viz, the user drills down on the year 2012 on the X-axis and then the viz shows Quarter.
Drill Down on a Viz Mark

Double-clicking on a single mark in a Chart visualization drills down on all drillable fields in the Builder panel.

To drill down on a viz mark, hover over the viz mark you want more detail on, and then double-click. The tooltip displays the fields that will be navigated to.

In this viz, two drillable fields are in the Builder panel, Year and State. When the user drills down on the mark shown, both Year and State are drilled down on to Quarter and Neighborhood respectively.

Drill Down on a Cross-Tab Cell

Double-clicking on a single cell in a Cross-Tab visualization drills down on all drillable fields in the Builder panel.

To drill down on a cell, hover over the cell you want more detail on, and then double-click. The tooltip displays the fields that will be navigated to.

In this viz, two drillable fields are in the Builder panel, Year and State. When the user drills down on the cell shown, both Year and State are drilled down on to Quarter and Neighborhood respectively.

View a Drill Path in a Viz

When a drillable field is in a drop zone, you can view all fields in each drill path the field is a member of. The current field is highlighted in bold to show where in the path it's located.

1. From the drillable field in the Builder panel, select View drill path.
2. View the drill path(s) in the dialog that displays.
3. Click OK.

Drill Up

When you drill down on a field, filters are created on that visualization. You can "drill up" in a viz by removing these filters.
When you drill down in a viz, a drill filter is created with a single field filter. When you continue to drill down the hierarchy defined in the drill path, a new field filter is added to the drill filter. You can move up in the drill path hierarchy (drill up) by removing either of these filters. The type of filter you remove determines how far up the hierarchy you navigate.

Drill Up to the Highest Level in the Drill Path

Remove the entire drill filter to navigate to the highest level in the drill path hierarchy. Removing the drill filter removes all field filters it contains.

In the Filters panel, hover over the gray drill path icon for the drill filter to remove. The drill path icon changes to a blue delete icon. Click the icon to remove the entire drill filter.

Drill Up One Level in the Drill Path

Remove the most recent field filter in a drill filter to drill up one level on that drill path.

Click the gray delete icon at the bottom of a drill filter to remove the most recent field filter. When the drill filter contains one field filter, removing the field filter is the same as removing the entire drill filter.

Chapter 16
Prepare Pages and Dashboards

A vizboard is made up of one or more pages. A page can contain multiple visualizations, and those visualizations may or may not use the same underlying lens data. Within a page, you can add multiple visualizations, edit them, arrange them, or delete them. While working in a vizboard, you work on one page at a time, but you can easily move visualizations between pages. A visualization can be thought of as an individual insight in the overall data story of the vizboard, and pages are a way to group visualizations together around a particular theme.
Topics:
• FAQs—Vizboard Pages
• Resize a Page to Fit the Browser Window
• Show and Hide Tool Panels
• Manage Viz Layout
• Edit a Visualization
• Arrange Visualizations on a Vizboard Page
• Preview a Vizboard with View Only Permission

FAQs—Vizboard Pages

A vizboard page is a separate canvas that contains one or more visualizations. This topic answers some frequently asked questions about vizboard pages.

How do I create a vizboard page?
By default, a new vizboard has one page. You can add additional pages by clicking Add Page at the top of the Pages panel. The new page is added at the bottom of the list in the Pages panel. Click the page icon to select it.

How do I rename a page?
You can change the name of a page at any time by clicking its name in the Pages panel and editing it directly. All pages are given a default name of Page X (where X is a number) when they are first created. You must have edit permissions on a vizboard in order to rename a page.

How do I resize the canvas of a page?
You can resize a page canvas using either of the following methods:
• Use the View > Resize Page menu. Platfora increases or decreases the canvas size to take up the space currently shown in the web browser. Platfora also changes the size of each viz proportionately. For more information, see Resize a Page to Fit the Browser Window.
• Click and drag the lower right corner of the canvas to change the canvas to the desired length and width.

How do I make a copy of a page and all visualizations on the page?
You can make a copy of a page on the current vizboard (duplicate) or another vizboard.
The page is added as the last page in the vizboard and is titled "Copy of pagename."

1. Click the icon to make a copy of the page in another vizboard. Platfora prompts you to choose the vizboard name.
2. Click the icon to make a duplicate copy of the page in the current vizboard.

How do I delete a page?
In the Pages panel, hover over the desired page and click the delete icon.

Can I change the order of the pages in a vizboard?
Yes. Click and drag a page in the Pages panel and place it in the desired order.

Can I move a viz from one page to another?
Yes. Click the viz in the middle of its toolbar and drag it to the desired page in the Pages panel. Then go to the new page to position the viz as desired.

Resize a Page to Fit the Browser Window

While working in a vizboard, you can resize the visualizations to fit the available canvas area on the page. This allows you to easily adjust the page canvas size as you resize your browser window or show and hide the Builder panels.

To resize visualizations to fit the available workspace on the page, select View > Resize Page. Note that this correctly resizes the visualizations to fit the available horizontal space, but can compress visualizations vertically if you have multiple visualizations on the page below the visible area.

Show and Hide Tool Panels

You can show and hide the Pages, Builder, and Filters tool panels as you work on a vizboard page. Hiding the panels allows you to have more workspace for viewing and arranging visualizations. When you need to edit a visualization or page, you can toggle the tool panels on again.

Use the Pages, Builder, and Filters buttons on the sides of the vizboard page to show and hide the tool panels.
You can also close a panel by clicking the X in the right corner of the panel.

Manage Viz Layout

Vizboard pages can contain one or more visualizations. You can arrange visualizations on a page and edit, rename, and delete them as necessary.

Edit a Visualization

To edit a visualization in a vizboard, simply click anywhere inside the visualization. Use the tools in the Builder panels to work on the visualization.

Arrange Visualizations on a Vizboard Page

By default, new visualizations are added to the middle of a vizboard page in a fixed-size panel. You can move and resize visualizations on a vizboard page by dragging them by the panel borders.

Control Where New Visualizations are Added on a Page

By default, new visualizations are added to the middle of a vizboard page (Add to center of page), sometimes overlapping the visualizations that are already on the page. You can choose to have new visualizations added to the bottom of the page instead. To change the default position for new visualizations, select View > Grow Page Downward. This option is set individually for each page in the vizboard.

Resize Visualizations

To resize a visualization on the page, click a corner or border handle on the visualization, and drag it to the desired size.

Move Visualizations

To move a visualization to another location on the page, click a border on the visualization (between the border handles) or the viz header, and drag to the desired position.

Preview a Vizboard with View Only Permission

Save and Preview allows you to save a vizboard and preview it in presentation mode.
This allows you to view and test the vizboard as users with view only vizboard permissions will see it if you publish or share it. While in preview mode, the viz builder tools and vizboard edit controls are hidden.

Users with View only vizboard object permissions can view all panels in the vizboard except the Builder panel. That means they can view all pages in the vizboard, all local and page filters, and the legends for each viz. However, since they can't view the Builder panel, they cannot view the fields used in each drop zone or the settings applied to them, such as sorts and limits. Therefore, it's very important to make sure that each viz clearly indicates what information is displayed. Consider editing viz titles and labels to ensure they are informative. For example, if you limit the number of results in a viz, you could edit the viz title to reflect that, such as "Top 10 Results."

For more information on view only vizboard permissions, see View Vizboard with View Only Permission.

1. Select Save and Preview from the vizboard Save menu.
2. This opens the vizboard in preview or presentation mode. This allows you to test vizboard functionality as view-only users will see it.
3. Click Edit to exit preview mode and return to edit mode.

Chapter 17
Share and Collaborate

As you prepare visualizations and vizboards, you can share your insights with other Platfora users, collaborate with other analysts on your findings, or export the viz or underlying data for use in an outside presentation or application.

Topics:
• Set Vizboard Permissions
• Manage Vizboard Comments
• Share a Link to a Vizboard
• Export Viz Data
• Export a Viz Image
• Email a Single Viz as an Image
• Share Vizboard as a PDF
• Export a Viz as a New Dataset

Set Vizboard Permissions

The user that creates a vizboard is the default vizboard Owner.
The vizboard owner can view, edit, or delete the vizboard. Vizboard owners can also change the vizboard access permissions to grant or revoke access permissions for other Platfora users.

The default vizboard permissions grant View Only access to the Everyone group (all Platfora users), but only the vizboard owner can edit the vizboard. For more information on view only vizboard permissions, see View Vizboard with View Only Permission.

Vizboard access permissions are not the same as data access permissions. Granting a user access to a vizboard does not necessarily mean they can see the underlying data that comprises the visualizations in the vizboard. Users will also need to have data access to the source data and datasets in order to see the visualizations in a vizboard.

1. Select Permission Settings from the vizboard Share menu.
2. In the Sharing and Permissions dialog, select Add Collaborators.
3. In the Find User or Group dialog, select the users or groups to add to the vizboard. You can use the search to quickly find users or groups by name.
4. Click Add after you have selected all of the users and groups you want to grant vizboard permissions to.
5. Make sure the user or group is in the correct permission category (Own, Edit, or View Only). If they are not, use the drop-down menu to the right of their name to move them. To remove the user's vizboard permissions entirely, click the X to the right of their name.
6. Click OK.

View Vizboard with View Only Permission

Vizboard owners can view, edit, or delete the vizboard and grant or revoke access permissions for other Platfora users. Users with View Only object permissions on the vizboard can perform a limited set of tasks on the vizboard.

By default, all Platfora users have view only access to vizboards (the default vizboard permissions grant View Only access to the Everyone group).
However, in order to see a visualization (viz) in a vizboard, users also need to have data access to the source data and datasets used in the viz.

Users with view only permissions can perform a limited set of tasks on a vizboard depending on the user role.

All user roles with view only vizboard permissions can perform the following tasks:
• Change the filter properties for existing local and page filters. Note that filter changes are never saved to the vizboard.
• Create a new filter by selecting a group of viz marks and choosing Isolate Selection or Exclude Selection from the marks selected viz menu. Note that filter changes are never saved to the vizboard.
• Turn live updates off and on. You might want to do this if you change several filter properties.
• Write and view comments on the vizboard. Any comments added by a view only user are automatically saved to the vizboard.
• Export the vizboard as a PDF file.
• Send an email with the vizboard as a PDF file.
• Share a link to this vizboard.
• Send an email with a single viz as an image.
• Download the data used in a viz as a CSV file.
• Download the image of a viz as a PNG file.
• Download the data lineage for a viz as a JSON file.
• Resize the page by choosing Resize Page from the View menu. Note that the new page size is not saved to the vizboard.

In addition to the tasks above, users with the Analyst (Limited) role and view only vizboard permissions can perform the following tasks:
• Edit the vizboard without saving it, or save it as a new vizboard. To do this, click Edit. You can change the visualizations on the page to see how the data is displayed differently. If you want to save your changes, you must save the vizboard under a new name.
• Export the data used in a viz as a CSV file to a remote file system.
In addition to the tasks above, users with the Analyst role (or higher) and view only vizboard permissions can perform the following tasks:
• Create a new lens from the fields used in the viz.

Manage Vizboard Comments

Analysts can share insights found in a vizboard with other Platfora users within the Platfora application.

FAQs—Vizboard Comments

You can collaborate with other analysts by sharing your insights using comments on a vizboard. This topic answers some frequently asked questions about using this feature.

What are vizboard comments?
Vizboard comments are a way for analysts to share insights found in a vizboard with other Platfora users. You can include text and even a snapshot of a particular viz for reference. Each vizboard maintains a single, collective history of all comments on each viz in the vizboard.

Who can add comments to a vizboard?
Any user who is allowed to view a vizboard can add a comment to that vizboard.

How can I view vizboard comments?
When a vizboard already has comments on it, the Comments panel lists the number of comments. Click the Comments panel to open the comments dialog.

How do I create a viz comment?
You can create a viz comment by clicking the Comments panel for the vizboard or by clicking the comments icon in a viz. For more information, see Create a Comment on a Viz.

Can I make changes to an existing comment?
Yes. You can edit a comment that you wrote previously. You can also delete a comment you wrote. To edit or delete a comment, hover the cursor over the comment and click either the edit or delete button.

What's the difference between a comment and a reply?
A reply is a comment that is threaded with another comment. New comments are added to the top of the comment history, and replies appear below the original comment, indented one level.
Vizboards change all the time. How can I ensure that someone else views a particular version?
You can share a link to the current vizboard version by creating a permalink. Click the Share menu and select Direct link to version in the Permalink section. Copy the URL provided and paste it into the comment.

Create a Comment on a Viz

You can add a comment to a vizboard and optionally include a snapshot of a particular viz.

1. In the viz to comment on, click the comments button in the viz toolbar.
2. Enter text for the insight you found.
3. (Optional) Click Snapshot to view a snapshot of the viz configured at this particular date and time.
4. (Optional) Click Use Snapshot to include the snapshot in the comment.
5. Click Post.

Share a Link to a Vizboard

Platfora provides a direct, permanent link (a permalink) to a vizboard that you can give to other Platfora users. Copy the permalink provided in the vizboard and share it with others by pasting it, for example, into a vizboard comment or email.

Platfora includes different permalink types that direct you to the following vizboard versions:
• Latest version. Use this permalink when you want others to always view the most up to date information shown in the vizboard.
• Current version. Use this permalink when you want others to view the specific data presented in the current version of the vizboard, even if more versions are made later. The URL in the permalink is the same as the other permalink but includes a "version" parameter.

1. Click the vizboard Share menu.
2. In the Permalink section, select the permalink type, either Link to vizboard (latest version) or Direct link to version (current version).
3. Select the URL and copy the text.
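Because the two permalink types differ only by the "version" parameter, you can tell them apart, or reason about how one relates to the other, with ordinary URL handling. The Python sketch below illustrates the idea using the standard urllib.parse module. The host name, path, and version number are hypothetical, and treating "version" as a query-string parameter is an assumption, since the exact URL layout isn't described here. In practice, copy the permalink from the Share menu rather than constructing it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_version(latest_url, version):
    """Return latest_url with a "version" query parameter added or replaced."""
    parts = urlsplit(latest_url)
    query = [(key, val)
             for key, val in parse_qsl(parts.query, keep_blank_values=True)
             if key != "version"]
    query.append(("version", str(version)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def is_version_link(url):
    """Return True if the URL carries a "version" query parameter."""
    return "version" in dict(parse_qsl(urlsplit(url).query))


# Hypothetical permalink; a real one comes from the vizboard Share menu.
latest = "https://platfora.example.com/vizboards/42"
pinned = with_version(latest, 7)
print(pinned, is_version_link(latest), is_version_link(pinned))
```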
Export Viz Data

You can export the data comprising a visualization as a CSV-formatted file, and download it to your local computer or to a remote file system such as HDFS or S3. Users in view-only mode can only download viz data as a CSV file to their local computer.

1. From the viz toolbar, click the export menu.
2. Select either Download Data as CSV or Export Data as CSV.

   When you choose Download Data as CSV, a single gzip-compressed comma-separated values (csv) file is created on your Desktop (for Windows) or in Downloads (for Mac). The file naming convention is:

   dataset-name_lens-name_epoch-timestamp.csv.gz

   When you choose Export Data as CSV, you must enter a URL to a remote file system in the format of:

   protocol://hostname:port/path-to-export-location

   For example:

   hdfs://10.80.231.123:8020/platfora/exports

3. Depending on the size of the data requested, the export can take a while to complete. If downloading data, stay on the page until the download starts or else the export will be cancelled.

Export a Viz Image

You can export a visualization as a PNG image file, and download it to your local computer.

1. From the viz toolbar, click the export menu and select Download Chart as PNG.
2. This downloads a PNG image file to your local computer (to the Downloads folder on Mac or the Desktop on Windows). The file name will be the same as the viz title.

Email a Single Viz as an Image

You can share a visualization with other users outside of Platfora by sending it in an email. The viz will be sent as a PNG image embedded in an email message.

To share a viz via email, Platfora must be configured to connect to an email server.

1. From the viz toolbar, click the email button.
2. In the Send email to field, enter a comma-separated list of email addresses.
3. Edit the Subject field so the email recipients have more context about the email's contents. By default, the email subject line is the same as the viz name.
4. In the Additional Comments field, enter any additional information that you want to include in the email message body. This text appears above the viz image in the email message.
Tip: The email is sent by whatever email account was configured by your Platfora system administrator. You may want to include your name in the email message body so that the recipients know the email is from you.
5. Click Send Email.
6. If Platfora is able to connect to the email server and send the email, you will see a confirmation at the top of the page.
Tip: Platfora only reports whether the email was sent successfully; it does not report failed deliveries. Any failed delivery notifications are sent to the email account that was configured by your Platfora system administrator.

Share Vizboard as a PDF
You can share a vizboard as a PDF file. You might want to do this to share insights discovered in Platfora with a larger group of people. You can share rendered PDFs with other Platfora users or with users who do not have access to Platfora. Users do not need access to a vizboard to view a rendered PDF.

FAQs - Vizboard PDFs
You can share a vizboard as a PDF file. This topic answers some frequently asked questions about using this feature.

How can I create a vizboard PDF to share with others?
You can export (download) it manually or send it in an email message, either manually or on a scheduled basis. Note that Platfora must be configured to send email in order to send a vizboard PDF in an email message.

How does Platfora render the vizboard PDF?
PDF rendering happens in the background. Depending on the size and number of visualizations, this might take some time.
You can continue working in the vizboard while the PDF is rendered. You can view the progress of PDF rendering jobs on the System > Activities page. Each vizboard page becomes a page in the PDF. For more information, see How Platfora Renders a Vizboard as a PDF.

How do I know when Platfora has finished creating the PDF file?
When the rendering is complete, a notification is displayed in the Platfora UI. Also, a notification appears in your profile notification list.

Where are the PDF files stored?
Platfora stores generated PDF files on the server. Platfora retains PDF files for one day by default (your administrator might change this value using the platfora.rendering.job.artifact.maxAge configuration property).

How do I export a vizboard PDF manually?
In the vizboard, choose Prepare PDF for Download from the Share menu. For more information, see Export a Vizboard as a PDF Manually.

How do I email a vizboard PDF manually?
In the vizboard, choose Email PDF from the Share menu. For more information, see Email a Vizboard as a PDF Manually.

How do I create a schedule to automatically generate and email vizboard PDFs?
In the vizboard, choose Create Schedule from the Share menu. For more information, see Email a Vizboard as a PDF on a Schedule.

How many schedules can I define for a vizboard?
Each vizboard can have one schedule, but each schedule can have multiple schedule rules that determine when Platfora creates a PDF file of the vizboard and sends it out in an email message.

I sent an email to another person, but not to myself. Can I download the PDF file locally?
Yes. You can manually download a PDF while it still exists on the server. To do this, click the link to the file from your user profile notification list. Once it's been deleted, you must render the PDF again.

How can I temporarily pause the vizboard PDF email schedule?
Open the vizboard, and choose Pause Schedule from the Share menu.
Platfora won't render the vizboard PDF or send any email message for this vizboard until someone chooses Resume Schedule from the Share menu.

How can I delete the vizboard PDF email schedule?
Open the vizboard, and choose Delete Schedule from the Share menu. All rules defined in the schedule are removed. Platfora won't send any email message for the vizboard unless someone creates a new email schedule for it.

I can't share a vizboard as a PDF. Why are all the PDF menu items grayed out?
By default, Platfora allows vizboard users to export to PDF. However, your administrator may disable this functionality.

Who can create a vizboard PDF?
Any user who can view a vizboard can download it as a PDF or send it as a PDF file manually. However, to create, edit, pause, resume, or delete a vizboard PDF email schedule, the user must be able to edit the vizboard.

Export a Vizboard as a PDF Manually
You can export a vizboard as a PDF file manually. When the rendering is complete, the file is downloaded to the default downloads directory for the browser and a notification is displayed in the Platfora UI.
Any user who can view a vizboard can export a PDF file manually.
1. Click the vizboard Share menu.
2. Choose Prepare PDF for Download. You can also choose Prepare PDF for Download from the vizboard contextual menu on the Vizboards page.
3. In the dialog, click Save Vizboard and Prepare PDF or Prepare PDF, depending on whether the vizboard was already saved.
Platfora saves the vizboard, if applicable, and begins to render the vizboard as a PDF file. When the PDF is ready, it's downloaded to your local machine according to how the web browser is configured. Large vizboards may take some time. Platfora informs you when it starts rendering a PDF and when the file is created.
Email a Vizboard as a PDF Manually
You can send a vizboard as a PDF file in an email message. Platfora renders the PDF, sends it to the addresses you specify, and displays a notification in the Platfora UI.
Any user who can view a vizboard can send it as a PDF file manually.
1. Click the vizboard Share menu.
2. Choose Email PDF.
3. If the vizboard hasn't been saved yet, click Save Vizboard and Email in the dialog.
4. Enter the email recipient(s), subject, and body. Separate multiple recipient addresses with a comma (,). If the Subject is left empty, Platfora uses the vizboard name as the subject of the email.
5. Click Send Email.
Platfora begins to render the vizboard as a PDF file. When the PDF is ready, it's sent as an attachment in an email to each recipient you entered. Large vizboards may take some time. Platfora informs you when it starts rendering a PDF and when the file is created.

Email a Vizboard as a PDF on a Schedule
You can configure Platfora to send a vizboard as a PDF file in an email message on a regular basis.
Any user who can edit a vizboard can create, edit, pause, or delete a vizboard PDF email schedule.
1. From the vizboard Share menu, click Create Schedule.
2. In the Create Email Schedule dialog, enter one or more email addresses to send the vizboard PDF to. Separate multiple addresses with commas.
3. Enter a subject for the message. By default, Platfora uses the vizboard name as the subject. You can change this.
4. Enter text to include in the body of the message to give context.
5. In the Schedule Rules section, choose the type of rule to define: days of the week at certain times, days of the week at an hourly interval, or day of the month at a certain time.
6. Choose the day(s) and time(s) to send the vizboard PDF email.
7. (Optional) Click Add another rule to define an additional rule for the schedule. If multiple rules overlap on the same day and time, the vizboard PDF email is sent only once for that day and time.
8. Click Create.

How Platfora Renders a Vizboard as a PDF
This section lists the guidelines Platfora follows when rendering vizboards as PDF files.
• The file name consists of the vizboard name and a date stamp using the following format: vizboardname_yyyy-mm-dd.pdf.
• Platfora prompts you to save the vizboard if it's not saved already before rendering it as a PDF.
• Each vizboard page is rendered as a single page in the PDF file, showing all visualizations on the canvas and all applicable legends.
• Legends display a maximum of 20 items for the Color drop zone, and nine items for the Shape drop zone.
• Cross-tab and funnel visualizations in the PDF only show the visible area displayed in Platfora.
• The size of each PDF page is the same as the size of the vizboard page canvas plus room for the legends.
• The minimum size of a PDF page is 6.34 x 4.54 inches.

Export a Viz as a New Dataset
You can choose to save the data comprising a visualization as a new dataset. This is called a derived dataset in Platfora. A derived dataset allows you to save the query results from a lens as a new dataset in the Platfora Data Catalog. Once a derived dataset is saved, you can use it as you would any other dataset in Platfora: you can edit it, add additional computed fields, and join it by reference to other datasets in the Platfora Data Catalog.
1. Create a visualization (or viz query) that you want to use as the basis of the derived dataset.
Tip: Working in Cross-Tab view allows you to see the data rows and columns that will comprise the new dataset.
Tip: Save the vizboard if you want to keep the query after you create the derived dataset.
2. From the viz toolbar, click the export menu and select New Dataset from Viz.
3. In the New Dataset from Viz dialog, enter a New dataset name and choose one of the derived dataset Types. There are two types of derived datasets you can create:
• Static saves the data comprising a viz as a file in the Hadoop distributed file system (DFS), essentially taking a snapshot of the viz data at that point in time. A static derived dataset does not change if the source dataset changes or if its parent lens is rebuilt. You can use a static derived dataset as a historical snapshot, and then use it to compare past data to recent data.
• Dynamic does not save the actual data in the viz, but instead saves the query used to produce the data from its parent lens. You can think of it as a dynamically updated view whose data changes as the source data changes. When a dynamic derived dataset is used to build a lens, the saved query is run against the parent lens to obtain the latest data values, and the results are stored temporarily while processing the final lens build results. Dynamic derived datasets are typically used to aggregate records in one dataset to make it possible to join to records of another dataset.
4. Click Create Dataset.
5. Once the dataset is created, click Go to Dataset. This exits the vizboard and discards any unsaved changes in the vizboard.
6. In the new dataset, you can define a key, edit fields, add computed fields, and add references just as you would in a regular dataset.
7. One difference between derived datasets and regular datasets is their data source. Derived datasets always use a Platfora lens as their data source.
8. Click Save or Save and Exit to save your changes to the dataset.

Chapter 18 Request or Derive Additional Lens Fields
The data available for a viz is determined by the fields made available in the lens.
If the lens currently doesn't have the data you need, you can modify the lens to add more fields if you have the proper permissions. Data analysts without the permissions to edit a lens still have other options to enhance and supplement existing lens data.
Topics:
• Vizboard Computed Fields
• Combined Fields
• Request Additional Lens Data
• Segments

Vizboard Computed Fields
You add a computed field in a vizboard by writing an expression that transforms existing fields. Computed fields help you refine your data analysis.

FAQs - Vizboard Computed Fields
This topic answers some frequently asked questions about vizboard computed fields.

What is a vizboard computed field?
A vizboard computed field is a user-defined field created in a vizboard that transforms existing lens fields using the Platfora expression language. Vizboard computed fields can be used in a viz like fields that come from the dataset. For example, you can filter on them, sort them, and include them in builder drop zones.
Vizboard computed fields are defined in a visualization and are local to the existing vizboard. You can use the vizboard computed field in any viz in the current vizboard that uses the same lens as the viz where the vizboard computed field was defined.

Can I use a vizboard computed field in a different vizboard?
No. The vizboard computed field is stored in the current vizboard only.

Who can create a vizboard computed field?
To create a vizboard computed field, a user must have the Analyst (Limited) role or higher and also have object access permission to edit the vizboard.

Why would I want to create a vizboard computed field?
You might want to create a vizboard computed field for any of the following purposes:
• The lens doesn't have data in the form you need and you want to quickly supplement the lens data without rebuilding the lens.
• The lens doesn't have the data in the form you need and you don't have edit permission on the dataset to create a dataset computed field.
• You want to test a computed field expression to see how it renders in a visualization before defining the computed field in the dataset.

How do I create a vizboard computed field?
In a visualization, select Add Computed Field from the Add menu. Enter the expression and click Save.

How do vizboard computed fields appear in the lens panel?
Vizboard computed fields appear in the lens panel like lens fields, but are shown with blue text.

How is a vizboard computed field different from a dataset computed field?
In a vizboard, a computed field is calculated using data from a lens. Because the lens contains preprocessed data, the computed field is immediately available for use in visualizations. There is no need to rebuild the lens.
In a dataset, a computed field is calculated using the raw source data. The data in the computed field is calculated when the lens is built, which may take some time.
Platfora reads all source data during lens build time only. As a result, functions that require Platfora to process all source data can only be used in expressions in a dataset computed field. For example, you cannot include the PARTITION function in a vizboard computed field. You can include all expressions, including user-defined functions, in a dataset computed field.

Why are some fields and functions not available in the vizboard expression builder?
Vizboards query the prepared data in a Platfora lens. A lens contains data that has been pre-processed and optimized during the lens build process. Some computed field expressions can only be executed during the lens build process, not later at lens query time. Therefore, vizboard computed fields have some limitations that dataset computed fields don't have.
These limitations are as follows:
• Vizboard computed fields can only operate on the fields that exist in the currently selected lens. Dataset fields that were not selected at lens build time cannot be used.
• A vizboard computed field can break if the fields it relies on are later removed from the lens definition.
• A vizboard computed field is only available within the vizboard where it was defined. It is not available from the dataset, other lenses, or programmatic lens queries.
• You cannot create event series processing computed fields (PARTITION expressions).
• You cannot use custom user-defined functions (UDFs) in vizboard computed field expressions.
• Referenced fields cannot be used directly in an aggregate function. To work around this, create another vizboard computed field that includes the referenced field, and then use that vizboard computed field in the aggregate function.

What kinds of expressions can I write in a vizboard computed field?
You can define expressions that operate on all types of fields that exist in the current lens using functions included in the Platfora expression language except for PARTITION.
Note that Platfora recommends aggregating data in the lens build instead of in a vizboard computed field whenever possible. When data is only aggregated in a vizboard computed field instead of the lens, the lens size is typically much larger.

Can I use a user-defined function (UDF) in a vizboard computed field?
No. Platfora only calls user-defined functions during lens build time.

How do I edit a vizboard computed field?
From the lens data panel or viz builder panel, select Edit Field from the field menu. Change the expression or field name, and click Save.

What happens when I edit a vizboard computed field that's currently used in a viz?
Platfora updates all visualizations that use the vizboard computed field.
However, if the new expression results in a different field role, a visualization shows an error if the new field role is unsupported in a drop zone where the field is used. For example, if a field expression is originally a measure and is placed in the Size drop zone, and then changes to become a dimension, that viz shows an error saying there is an incompatible field. This is because only measure fields are allowed in the Size drop zone, not dimensions.
When this happens, you can click Undo to revert the change, edit the expression to correct the field role, or remove the field from all invalid drop zones. For more information about which field roles are allowed in each drop zone, see About the Builder Drop Zones.

How do I delete a vizboard computed field?
From the lens data panel or viz builder panel, select Delete from the field menu.
When you delete a vizboard computed field, Platfora removes it everywhere it's used in all visualizations in the vizboard, including drop zones and filters.

What happens when I delete a vizboard computed field that's currently used in a viz?
Platfora informs you which visualizations currently use the field you are trying to delete. You can cancel the operation or delete the field anyway.

Where can I find examples of useful computed field expressions?
Platfora's expression reference documentation has many examples of useful expressions. See the Expression Language Reference.

Add a Vizboard Computed Field
You add a computed field in a vizboard by writing an expression that transforms existing fields. Once you define the computed field, it is listed in the vizboard lens panel. You can then use the computed field by dragging it to a drop zone.
1. Select Add Computed Field from the Add menu. This opens the Add Field expression builder window.
2. Enter a field name and a description. The description is optional but very useful for others who might use your analysis later.
3. Choose a function from the Functions list. Use the drop-down to restrict the type of functions you see.
4. Double-click a function from the list to add it to the Expression area. The Expression panel updates with the function's template. Also, the Fields list refreshes with those fields you can use with the function. For example, TO_DATE works on STRING data types.
5. Double-click a field to add it into the Expression area.
6. Continue adding functions and fields until your expression is complete.
7. Make sure your expression is correct. The system checks your syntax as you build the expression. The yellow text box below the Expression area displays error messages. Platfora only allows you to save expressions that evaluate successfully. If it cannot resolve an expression, the Save button is not available.
8. Click Save to add the new computed field to the vizboard.
Computed field names are blue to distinguish them from regular lens fields.
Writing expressions for computed fields is an advanced topic. For information on working with expression syntax, see Platfora Expressions.

Combined Fields
Platfora allows you to view multiple, orthogonal dimension values side by side in the same chart.

About Combined Fields
A combined field is a special type of vizboard computed field that lets you merge the values across different dimension fields into a single field. This allows you to compare the values of different dimension fields on the same axis in a viz. Combined fields are immediately available for use in visualizations without having to rebuild the lens.
You might want to use a combined field to compare different segments of the same population that may or may not be mutually exclusive. For example, if a viz contains a segment of users who bought iOS devices and another segment of users who bought any mobile devices, you could place both segments in a combined field to see the patterns of both types of buyers in a single chart.
Combined fields have the same restrictions as vizboard computed fields as well as the following:
• Combined fields can only be comprised of fields from the lens or segments. They cannot be comprised of vizboard computed fields.
• The fields that comprise a combined field must be of the same data type. Once the first dimension is selected to add to the combined field, Platfora only allows users to add a field of the same data type. For example, if a FIXED field is selected, only other FIXED fields are available to add.
• Segment fields, STRING fields, and location fields are considered to be of the same data type and can comprise a single combined field.
• Combined fields are not available to be used inside a vizboard computed field.
1. Segments included in the combined field.
2. Combined field comprised of two segments.
3. Viz only shows values configured to include in the combined field.

Create Combined Field
Create combined fields in a visualization using the Add menu. Once you define the combined field and save it, the field is added to the lens panel of each viz in the vizboard. You can then use the combined field by dragging it to a builder drop zone.
1. Select Add Combined Field from the Add menu.
2. Enter a name for the combined field.
3. Click the icon for all fields to add to the combined field. After you click the first field, Platfora only allows you to select other fields of the same data type.
4. (Optional) Click the filter icon to choose which field values to include in the combined field for analysis. In the Edit Filter dialog, choose the selected members and click Apply. By default, adding a dimension to a combined field includes every value from that dimension. You might want to filter a dimension to focus on the populations of interest.
5. Choose whether or not to show the overall total values for the focal dataset in the combined field. When you click Show the Total, Platfora includes an additional value for the combined field that represents the total for the focal dataset. Since the combined field can contain members that overlap, showing the total is useful to benchmark each member against the total without any overlap in values. You might want to enable this when the combined field is comprised of segment fields and you want to compare each segment against the total population.
6. Click OK.
The combined field is added to the lens panel. Combined field names are blue so you can distinguish them from regular lens fields.

Request Additional Lens Data
When working in a viz, sometimes the lens used in the viz doesn't contain all the data you need. If you have the appropriate permissions, you can add new dataset fields to the current lens or create a new lens from the current dataset and include additional fields.

Create a New Lens From Viz
You can create a new lens from a visualization as well as from the data catalog. The newly created lens uses the same datasets as the viz's lens, and it includes the fields used in the viz by default. However, you can add or remove fields from the new lens before saving it.
You might want to derive a new lens based on the fields currently used in a viz to drill further into the current data at a more granular level.
By getting more granular on only the data you're interested in, you create and build a lens that is only as large as necessary. For example, if the current lens only includes data down to the day level, but you want to view the current viz at the hour level, you could create a new lens based on the fields included in the viz and also add the Hour field from the Time reference.
To create a new lens, you must have the Analyst role or above. You must have data access permissions on the source data and at least Define Lens from Dataset object permissions on the focus dataset, as well as on any datasets that are included in the lens by reference.
1. Choose Derive New Lens from the marks menu in a viz.
2. Choose whether or not to include all fields from the existing lens in the new lens. The default is to only include fields currently used in the viz. You can add and remove fields from the lens, and add, modify, and remove lens filters in the lens builder panel.
3. Click Define Lens.

Segments
Members of a population can be grouped together so you can analyze events while looking for patterns among the group. Identify members that share similar behaviors by creating a segment, which is a collection of members of a population grouped together based on common behaviors and attributes.
Segments can be created by Platfora users who have the Analyst system role (or above), provided they also have access to the underlying source data in Hadoop and define lens permission on all datasets used in the segment definition.

FAQs - Segments
A segment is a collection of members of a population grouped together based on common behaviors and attributes. This topic answers some frequently asked questions about segments.

What is a segment?
A segment is a special type of dimension field that you can create to group together members of a population that meet some defined common criteria. A segment is based on members of a dimension dataset (such as customers) that have some behavior in common (such as purchasing a particular product). Segments may also be based on common attributes as well as behaviors. For example, users older than 30 years (attribute) who returned to your website (behavior) is a segment, but users older than 30 years (attribute) is not. Segments allow you to analyze behaviors among a subset of the population.
Segments in Platfora are more than just saved filters. Members in a segment are always based on users in a dimension dataset and have a condition (behavior) in common from a fact dataset. Rows in the dimension dataset are either members of the segment or not members.

How are segments created?
Create segments in a visualization. To create a segment, a viz must use a lens that allows segments to be created for at least one referenced dataset. Choose Add Segment from the Add menu.
When a segment is created, Platfora creates and builds a special type of lens that determines the members in the segment, and then it adds the segment as a field in the viz lens. For more details, see Create Segments.

Who can create a segment?
Segments can be created by Platfora users who have the Analyst system role (or above), provided they also have access to the underlying source data in Hadoop and define lens permission on all datasets used in the segment definition.

How can I use a segment?
Use a segment field in a viz like any other dimension field. For example, you can use it in any drop zone, filter on it, or use it in a combined field. Use it in the viz it was created in, or in another lens that references the same dimension dataset the segment is based on.

What values are included in a segment?
A segment field has two possible values: members that are IN the population and members that are NOT IN the population.
When you use a segment field in a viz drop zone, Platfora displays both the IN and NOT IN values. However, when you use a segment field in a viz filter, page filter, or combined field, Platfora filters out the NOT IN values and uses only the IN values by default.

Can I use a segment field in a combined field?
Yes. When you put multiple segment fields into a combined field, you can use that combined field in a viz to perform side-by-side comparisons.

Why would I want to use a segment?
You can easily compare behaviors among different segments in your population. You can create multiple segments and then use those segment fields in a single viz to compare the results.
You can use segments to compare behaviors of a group of individuals across multiple fact or event datasets. Once a segment is created, it can be used in other lenses that use a dataset that is based on or references the dimension dataset used in the segment. This allows you to join data between fact datasets that otherwise can't be joined.

Where are segments located in the vizboard lens panel?
Segment fields are grouped together under the Segments group of a referenced dataset in the vizboard lens panel. The Segments group only appears when the data administrator allows segment creation for that reference in the lens.

I see segment fields in my viz field list that I didn't create. Where did they come from?
Any segment that someone else created for the referenced dataset is available for you to use if you have been granted permission on the segment. These segments could have been created in a viz using the same lens, or a different lens that references the same referenced dataset in your lens.
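As a rough illustration of the IN and NOT IN values described in these FAQs, the following Python sketch models a segment as a label on each row of a dimension dataset, determined by a behavior found in a fact dataset. It is a conceptual model only, not how Platfora builds segment lenses. The sample users, purchases, and the `build_segment` and `segment_values` helpers are all hypothetical.

```python
from collections import Counter

# Hypothetical dimension rows (users) and fact rows (user, purchased item).
users = ["ann", "bob", "cho", "dee"]
purchases = [("ann", "iOS phone"), ("bob", "Android phone"), ("ann", "case")]


def build_segment(members, facts, condition):
    """Label every dimension member IN or NOT IN the segment.

    A member is IN when at least one of its fact rows meets the condition.
    """
    matched = {member for member, item in facts if condition(item)}
    return {m: ("IN" if m in matched else "NOT IN") for m in members}


def segment_values(segment, as_filter=False):
    """Count members per segment value.

    Mirrors the default behavior described above: a drop zone shows both
    values, while a filter keeps only the IN members.
    """
    labels = segment.values()
    if as_filter:
        labels = [label for label in labels if label == "IN"]
    return dict(Counter(labels))


ios_buyers = build_segment(users, purchases, lambda item: item.startswith("iOS"))
print(segment_values(ios_buyers))                  # both IN and NOT IN
print(segment_values(ios_buyers, as_filter=True))  # IN only
```

Because every dimension row carries exactly one of the two labels, the same labels can be attached to any fact dataset that references the dimension, which is the idea behind reusing a segment across lenses.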
What is the difference between the different types of segments I see in the vizboard lens panel?
When a segment is first created in a viz, it is an ad-hoc segment. Ad-hoc segments are created in a viz and can be edited in a viz or from the Data Catalog page.
However, someone editing a lens can choose to include an ad-hoc segment field as a native field in the lens. When a segment field is included in a lens, Platfora builds the results of the segment values into the lens. Vizboard users can still use the segment field in a viz as they can for ad-hoc segments, but the performance will be faster.
Ad-hoc segments appear in the lens panel as blue fields, and segments included in the lens appear as black fields in the vizboard lens panel. (This works similarly to other fields. Vizboard computed fields appear as blue fields, and fields included in the lens appear as black.)

Why can't I create a segment for a referenced dataset?
Data administrators who edit and build lenses can choose whether or not to allow vizboard users to create and use ad-hoc segments. They can make this choice per reference in a lens. For example, they might allow ad-hoc segments for the Arrival Airport reference, but not the Date or Time references.

What happens if I edit a segment included in a lens?
When you edit and save a segment included in the lens, Platfora updates the segment members and then changes the segment to an ad-hoc segment. It remains an ad-hoc segment for all users and all lenses that have the segment. This allows vizboard users to work with the most recent members of the segment.
However, as soon as you rebuild a lens that includes the segment, the segment changes back from being an ad-hoc segment to a native segment field in that lens only.
When you create a viz based on a lens that already included the segment, the segment appears as an ad-hoc segment (blue field in the vizboard lens panel). How is the data in the segment updated? The data in the segment is updated every time the segment lens is rebuilt. The segment lens is rebuilt when someone changes the segment definition conditions, or when someone rebuilds the segment lens from the Data Catalog > Segments page. What happens if the viz or lens I created the segment from is deleted? Once a segment is created, Platfora creates its own special type of lens behind the scenes to create and populate the members of the segment. The segment does not rely on the original lens or viz it was created from. However, if the original lens is deleted, you can no longer edit any segment made from it. Page 243 Data Analysis and Visualization Guide - Request or Derive Additional Lens Fields Why is the Segments group empty in the vizboard lens panel? The Segments group in the lens panel always appears when ad-hoc segments is enabled for a reference. However, if no segments have been created for the reference in any lens, then no segment fields appear in the group. Platfora displays a warning icon explaining why no segments are listed. Page 244 Data Analysis and Visualization Guide - Request or Derive Additional Lens Fields There are a lot of segments in the vizboard lens panel. Is there any easier way to find what I need? Yes! Choose Only Show Segments from the lens panel menu. This hides all other fields in the lens panel until you decide to show them again. Page 245 Data Analysis and Visualization Guide - Request or Derive Additional Lens Fields How do I edit a segment definition? You can edit a segment from a viz or the Data Catalog > Segments page. In a viz, click the segment's contextual menu and choose Edit Field. You can edit a segment from the lens panel or when it's being used in a viz drop zone. How do I let another Platfora user use the segment I created? 
When you create a segment, only you have permission on it by default. You can edit the segment definition and grant other users permission. They can use it in a viz as long as they have permission on the segment, and data access to the datasets used in the segment lens. When editing the segment definition, click Permission Settings to grant permission to other users. Create Segments Create segments from a viz that uses a lens with a referenced dataset. Segments created for a particular referenced dataset are available for any other viz that uses a lens with that dataset. You can create segments in any viz either from scratch or by selecting a single mark or funnel stage. Page 246 Data Analysis and Visualization Guide - Request or Derive Additional Lens Fields Create Segment from Chart Viz Lens You can define a segment from scratch in any viz that uses a dataset referenced by another dataset. 1. Choose Add Segment from the lens Add menu. 2. Enter a name for the segment. The name must be unique among segment names and dataset names. Platfora recommends using a very descriptive name. There is no description field for segments to help other users understand the criteria for segment membership. This name will appear in the list of available fields for a viz when the segment is added to a viz. Segment names cannot be changed later. Instead, you can create a copy of the segment, use a different name for the copy, and delete the original segment. Page 247 Data Analysis and Visualization Guide - Request or Derive Additional Lens Fields 3. In the Segment of field, choose a dimension dataset used in the lens that this segment is based on. Platfora lists datasets that are referenced by another dataset. Platfora displays the fact dataset used in this lens in the Occurring in Dataset field and the lens used in this viz in the Origin Lens field. 4. In the Segment Conditions section, define a condition for membership in the segment. 
Choose a field from the lens and the required values. Platfora lists fields defined in the lens from the fact dataset and each referenced dataset. Vizboard computed fields and segment fields are not listed.
5. (Optional) Click the icon to define additional conditions.
6. (Optional) In the Segment Value Label section, define the value labels for records that are members and non-members of the segment. If no value labels are specified, the segment name is used by default. Platfora uses the text you enter here as the labels for the segment values when the segment is used in a viz. For example, when you use the segment in the X-Axis drop zone, this text is used as the two values displayed along the x-axis of the viz.
7. Click Save Segment.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an available field in the Segments section under the reference it was created in. Segment field names are blue so you can distinguish them from regular lens fields and segment fields that are included in the lens. The spinning icon on the right side of the segment in the lens panel indicates that Platfora is building the special lens for the segment.
If the segment definition does not explicitly include a condition based on the fact dataset, Platfora displays a message informing you of the implied condition on the fact dataset. The implied condition means that the segment only includes members that also appear in the fact dataset. You can save the segment with the implied condition, or edit the segment to create your own condition on the fact dataset.
Create Segment from Mark Selection
Segments can be created from a single selected mark in a viz. Platfora determines the conditions for the mark and configures the segment conditions for you. You can accept the conditions as is, or modify them further.
You can't create a segment from a mark if a vizboard computed field or another segment field is in any drop zone in the viz.
1. Select a single mark in a viz.
2. Choose Add Segment from the viz selection menu.
3. Enter a name for the segment. The name must be unique among segment names and dataset names. Platfora recommends using a very descriptive name, because there is no description field for segments to help other users understand the criteria for segment membership. This name will appear in the list of available fields for a viz when the segment is added to a viz. Segment names cannot be changed later. Instead, you can create a copy of the segment, use a different name for the copy, and delete the original segment.
4. Verify that the segment is being created in the dimension dataset you want. Change it if necessary.
5. (Optional) Edit the pre-configured segment definition attributes, for example by adding new conditions.
6. (Optional) In the Segment Value Label section, define the value labels for records that are members and non-members of the segment. If no value labels are specified, the segment name is used by default. Platfora uses the text you enter here as the labels for the segment values when the segment is used in a viz. For example, when you use the segment in the X-Axis drop zone, this text is used as the two values displayed along the x-axis of the viz.
7. Click Save Segment.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an available field. Segment field names are blue so you can distinguish them from regular lens fields.
If the segment definition does not explicitly include a condition based on the fact dataset, Platfora displays a message informing you of the implied condition on the fact dataset. The implied condition means that the segment only includes members that also appear in the fact dataset.
You can save the segment with the implied condition, or edit the segment to create your own condition on the fact dataset.
Create Segment from Funnel Stage
Segments can be created from a selected stage in a funnel viz. Platfora configures the segment attributes for you, and they cannot be edited. When creating a segment from a stage, choose whether to include records that meet the stage criteria, or records that meet the previous stage criteria but not the current stage's criteria (the records that dropped out at the current stage). For example, if the previous stage contained 50% of the original records, and the current stage contains 20% of the previous stage records, then choosing Reached at least this stage results in 10% of the original records (0.5 x 0.2) and choosing Reached the previous stage, but not this stage results in 40% of the original records (0.5 x (1.0 - 0.2)).
1. Select a single stage in a funnel viz.
2. Choose Add Segment from the viz selection menu.
3. Enter a name for the segment. The name must be unique among segment names and dataset names. Platfora recommends using a very descriptive name, because there is no description field for segments to help other users understand the criteria for segment membership. This name will appear in the list of available fields for a viz when the segment is added to a viz. Segment names cannot be changed later. Instead, you can create a copy of the segment, use a different name for the copy, and delete the original segment.
4. Choose the segment membership criteria, either records that reached the selected stage or records that reached the previous stage, but not the selected stage.
5. Click Create.
Platfora creates the segment members lens and adds the segment to the current viz lens panel as an available field.
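The percentages in the funnel stage example in this section can be checked with a short calculation. The following Python sketch is only an illustration of the arithmetic, not something Platfora runs; the function name funnel_segment_shares and its parameters are hypothetical.

```python
def funnel_segment_shares(previous_share, stage_conversion):
    """Return the share of original records for each funnel segment choice.

    previous_share   -- fraction of the original records in the previous stage
    stage_conversion -- fraction of the previous stage that reached this stage
    (Both names are hypothetical and used only for this illustration.)
    """
    # "Reached at least this stage"
    reached = previous_share * stage_conversion
    # "Reached the previous stage, but not this stage" (the drop-outs)
    dropped = previous_share * (1.0 - stage_conversion)
    return reached, dropped


# Values from the example: previous stage holds 50% of the original records,
# and 20% of those records reach the current stage.
reached, dropped = funnel_segment_shares(0.5, 0.2)
print(f"Reached at least this stage: {reached:.0%}")
print(f"Reached the previous stage, but not this stage: {dropped:.0%}")
```

The two shares always add up to the share of the previous stage, because every record in the previous stage either reached the current stage or dropped out.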
Segment field names are blue so you can distinguish them from regular lens fields. Page 252 Chapter 19 Save Your Work in a Vizboard It is best practice to save your work each time you add or change a visualization or page in a vizboard. Vizboards are not auto-saved as you work. If you leave or reload the vizboard without saving your changes first, or if the browser closes unexpectedly, any unsaved changes will be lost. Topics: • Manage Vizboard Versions • Restore a Vizboard to a Previous Version • Exit a Vizboard without Saving • Using Undo and Redo in a Vizboard • Duplicate a Vizboard Whenever a vizboard has unsaved changes, the Save button is highlighted with an orange border. If you try to navigate away from the vizboard without saving, you will be prompted to stay and save your changes first. Leaving the vizboard without saving will discard any unsaved changes. Each time you click Save, a version is added to the vizboard Restore menu. Page 253 Data Analysis and Visualization Guide - Save Your Work in a Vizboard Manage Vizboard Versions The Restore menu has a history of each time a vizboard was saved, ordered from most recent to oldest. Versions are named by the date and time they were created, along with the name of the user who saved the version. You cannot rename versions. You can only delete or restore versions. About Vizboard Versions A version is a snapshot of the vizboard and its contained pages and visualizations at a given point in time. Note that the underlying lens data is not saved when you save a version, only the page and visualization definitions. The Restore menu keeps a history of each vizboard Save event. You can always go back to any point in the timeline, and then work forward from that point. A vizboard will always be opened to its last saved version. If multiple users are working on a vizboard at the same time, the latest saved version may not necessarily be the last version that you saved. 
However, you can always go back in the version timeline to recover your work. Delete a Saved Version The list of versions in the Restore menu will continue to grow each time you click Save. You should periodically delete older versions from the list that you no longer need. To delete a version: 1. Open the Restore menu. 2. Click the icon to the right of the version you want to delete. Page 254 Data Analysis and Visualization Guide - Save Your Work in a Vizboard 3. Click Confirm. Restore a Vizboard to a Previous Version Restoring a vizboard to a previous version allows you to rollback the vizboard to a previously saved state, then continue working from that point. If you want to keep the current state of the vizboard as well, be sure to Save before restoring. Any unsaved changes will be discarded when you restore to a previous version. 1. Select the version you want to restore from the Restore menu. Page 255 Data Analysis and Visualization Guide - Save Your Work in a Vizboard 2. Click Confirm to rollback the vizboard to the selected version. 3. If the vizboard has unsaved changes, you will be prompted before reverting to the previous version. Click Reload this Page to discard unsaved changes. If you want to keep the current state of the vizboard, click Stay on Page. This allows you to Save a new version before you restore. Exit a Vizboard without Saving When you close a vizboard without saving it first, Platfora asks whether or not you want to save the vizboard before exiting. To close a vizboard, just navigate off the page by clicking any other page in the top navigation header. If you don't want to lose your work, Save the vizboard before you exit. If you try to navigate away from a vizboard page without saving your work first, you will be prompted to either Stay on Page (do not exit without saving first) or Leave Page (exit without saving). If you want to save your work, click Stay on Page. Then click Save before leaving the page again. 
Using Undo and Redo in a Vizboard While working in an ad hoc analysis vizboard, you can undo and redo your changes to pages and visualizations. A history of actions is kept for the current session only. Leaving the vizboard page clears the action history. Page 256 Data Analysis and Visualization Guide - Save Your Work in a Vizboard Hovering your mouse over the Undo or Redo button displays a tooltip of the action to be undone or redone. Duplicate a Vizboard Save As makes a copy of the current vizboard and saves it as a new vizboard. Note that the underlying lens data is not saved when you duplicate a vizboard, only the page and visualization definitions. Any Page 257 Data Analysis and Visualization Guide - Save Your Work in a Vizboard unsaved changes in the current vizboard will be saved in the duplicate copy, but not in the original vizboard. The version history in the Restore menu is not carried forward to the new vizboard. 1. Select Save As from the vizboard Save menu. 2. Enter a name for the new vizboard and click Confirm. Page 258 Chapter 20 Trace the Data Lineage of Viz Fields All fields in a viz originate from raw data in Hadoop files. However, the data may be processed and manipulated by multiple datasets and computed fields before it appears in a viz. Analysts can trace data lineage through Platfora lenses, datasets, all field types (computed, base, and measure), and data sources to the source files in Hadoop. Topics: • Export Viz Data Lineage • What Data Lineage Includes • Interpret Data Lineage Levels Lineage tells the analyst where data used in critical decisions came from and what was done to the data before it was used in a viz. You might want to view data lineage to address any of the following questions: • How can I reproduce this result? Sometimes data analysts need to port the data to a different system for further analysis. Data lineage shows the order in which the actions are performed. Analysts can reproduce these actions in the new system. 
• Where did this data come from? Visualizations can be based on derived datasets that are themselves based on other datasets, that is, on the results of someone else's analysis. By viewing the data lineage, analysts can prevent false positives.
• When was this data retrieved from Hadoop? Data lineage shows the timestamp of all objects in the field's history, including the timestamp of the Hadoop source files.
Export Viz Data Lineage
Exporting data lineage to a JSON file for all fields in a visualization allows data analysts to see where the data in the fields ultimately came from and how it was manipulated in the process. The data lineage report includes filters applied to fields used in the viz.
1. From the viz toolbar, click the export menu and select Download Lineage as JSON.
2. The JSON file (platforaData.json by default) is saved in the default downloads directory configured for your browser.
What Data Lineage Includes
All fields in a visualization originate in the underlying lens. The lens, in turn, comes from a dataset (or another lens). These are the field's parent objects. Data lineage includes more than a field's parent objects; it also includes details about filters and expressions from those objects. When applicable, the data lineage report shows the following types of information:
• Lens field names
• Reference field names
• Filter expressions
• Field expressions
• Lens names
• Dataset names
• Data source names
• Data source locations
• Lens build specific source file names including their paths
• Timestamps
When viewing data lineage for a lens field, Platfora lists that lens field and all parent objects (up to the configured number of levels).
Interpret Data Lineage Levels
System administrators can configure how much of a data lineage to report.
This configuration applies to all data in Platfora's catalog. Administrators cannot configure lineage on individual catalog items. You can view data lineage for a single field or all fields in a viz. To view a field's lineage, choose Show Data Lineage from the field's menu. To view lineage for an entire viz, choose Export Lineage as JSON from a visualization's menu.
When interpreting lineage, think of a field as the root of a tree. Each object that feeds into this field is a branch. The lens itself is always the initial "branch" of a field. The simplest tree is always a base field. The Device ID tree has one branch (LENS: VOD Device) with two ancestors: the VOD Device lens itself and its parent dataset, VOD Mobile. On any Platfora lineage, the last parent is always the dataset.
A field can have multiple branches. For example, fields formed from aggregate functions or other computations have multiple branches. Consider a Session Event Count field that results from the MAX(Session Event Number) function. The Session Event Count lineage contains two branches: the lens and the Session Event Number. The Session Event Number branch is itself a computed field. As such, the branch has four ancestors: Event Name, Device-Asset, Client Time, and the VOD Device lens.
A Platfora system administrator can configure the number of levels from the root field to the end of an ancestor branch. When reading a report, each ancestor appears as a branch with subsequent levels indented after that.
For example, the Session Event Count field shows three levels to reach through the Session Event Number to the dataset:
• Session Event Number
• LENS: VOD Device
• DATASET: VOD Mobile
By default, Platfora displays five levels of lineage on the graphical report and 10 levels in an exported JSON file. System administrators can configure reporting levels on the System > Global Settings page. Increasing the levels is only necessary if your data contains multiple derived datasets or computed fields.
The following table lists the possible ancestors for each object type.
Object | Possible Ancestors
Lens or dataset field that is a base field | lens
Lens or dataset field that is a computed dimension field | lens, lens or dataset field
Lens or dataset field that is a measure (aggregate) field | lens, lens or dataset field
Lens or dataset field that is a referenced field | lens, lens or dataset field that is a referenced field, lens or dataset field that is the foreign key
Lens | dataset
Dataset that is a derived dataset | lens
Dataset that is not a derived dataset | data source
Data source | no parent
Chapter 21 Viz Example Gallery
You can create dozens of different types of charts depending on the types of fields placed in the viz drop zones. This section gives samples of different chart types, including the types of fields required in the different drop zones (the chart type recipe). If the default mark type doesn't create the desired chart type, change the mark type from the Mark Type menu.
Topics:
• Axis Chart Viz Examples
• Non-Axis Chart Viz Examples
• Polar Chart Viz Examples
• GeoMap Viz Examples
• Cross-Tab Viz Examples
Axis Chart Viz Examples
The examples in this section demonstrate how to create different kinds of Chart visualizations that display both an X and Y axis.
Page 265 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Simple Bar A bar chart is useful for recording discrete categories of data. Bar graphs can also be used for more complex comparisons of data with grouped bar charts and stacked bar charts. Table 2: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Measure Details Color Size Opacity Shape Labels Page 266 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Bars with Different Color Values Table 3: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Measure Details Color Measure Size Opacity Shape Labels Page 267 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Stacked Bar Table 4: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Measure Details Color Dimension Size Opacity Shape Labels Page 268 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Split Bar with Values Table 5: Chart Type Recipe Drop Zone X-axis Y-axis Details Color Field Type Dimension Measure Dimension Measure Size Opacity Shape Labels Page 269 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Bar with Variable Widths Table 6: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Measure Details Color Size Measure Opacity Shape Labels Page 270 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Point Plot Table 7: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Measure Details Color Size Opacity Shape Dimension Labels Page 271 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Scatter Plot Table 8: Chart Type Recipe Drop Zone X-axis Y-axis Details Field Type Measure Measure Dimension Color Size Opacity Shape Labels Page 272 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Color Encoded Scatter Plot Table 9: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Measure Measure Details Color Dimension Size Opacity 
Shape Labels Page 273 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Bubble Chart Table 10: Chart Type Recipe Drop Zone X-axis Y-axis Details Field Type Measure Measure Dimension Color Size Measure Opacity Shape Labels Page 274 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Color Encoded Bubble Chart Table 11: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Measure Measure Details Color Size Dimension Measure Opacity Shape Labels Page 275 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Gradient Grouped Scatter Plot Table 12: Chart Type Recipe Drop Zone X-axis Y-axis Details Color Field Type Measure Measure Dimension Measure Size Opacity Shape Labels Page 276 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Shape Encoded Scatter Plot Table 13: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Measure Measure Details Color Size Opacity Shape Dimension Labels Page 277 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Heatmap Table 14: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Dimension Details Color Measure Size Opacity Shape Labels Page 278 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Size Encoded Heatmap Mark Type: Bar (not Auto) Table 15: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Dimension Details Color Size Dimension Measure Opacity Shape Labels Page 279 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Size Encoded Matrix Mark Type: Bar (not Auto) Table 16: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Dimension Dimension Details Color Size Measure Opacity Shape Labels Page 280 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Line Chart Table 17: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Date (Time-Series) Measure Details Color Size Opacity Shape Labels Page 281 Data Analysis and Visualization Guide - Viz 
Example Gallery Chart Type: Multi-Series Line Chart Table 18: Chart Type Recipe Drop Zone X-axis Y-axis Details Field Type Date (Time-Series) Measure Dimension Color Size Opacity Shape Labels Page 282 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Color Encoded Multi-Series Line Chart Table 19: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Date (Time-Series) Measure Details Color Dimension Size Opacity Shape Labels Page 283 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Variable Color Line Chart Table 20: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Date (Time-Series) Measure Details Color Measure Size Opacity Shape Labels Page 284 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Variable Thickness Line Chart Table 21: Chart Type Recipe Drop Zone X-axis Y-axis Field Type Date (Time-Series) Measure Details Color Size Measure Opacity Shape Labels Page 285 Data Analysis and Visualization Guide - Viz Example Gallery Non-Axis Chart Viz Examples The examples in this section demonstrate how to create different kinds of Chart visualizations that do not have an X and Y axis. 
Chart Type: Packed Bubbles Mark Type: Point (not Auto) Table 22: Chart Type Recipe Drop Zone Field Type X-axis Y-axis Details Color Size Measure Opacity Shape Page 286 Data Analysis and Visualization Guide - Viz Example Gallery Drop Zone Labels Field Type Dimension Chart Type: Packed Bubbles with Different Colors Mark Type: Point (not Auto) Table 23: Chart Type Recipe Drop Zone Field Type X-axis Y-axis Details Color Size Dimension Measure Opacity Page 287 Data Analysis and Visualization Guide - Viz Example Gallery Drop Zone Field Type Shape Labels Dimension Chart Type: Text Gauge Table 24: Chart Type Recipe Drop Zone Field Type X-axis Y-axis Details Color Size Opacity Shape Labels Measure Page 288 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Word Cloud Table 25: Chart Type Recipe Drop Zone Field Type X-axis Y-axis Details Color Size Measure Opacity Shape Labels Dimension Page 289 Data Analysis and Visualization Guide - Viz Example Gallery Polar Chart Viz Examples The examples in this section demonstrate how to create different kinds of Polar Chart visualizations. Polar Chart Type: Donut Table 26: Polar Chart Type Recipe Drop Zone Angle Field Type Measure Details Color Dimension Size Opacity Labels Page 290 Data Analysis and Visualization Guide - Viz Example Gallery Polar Chart Type: Size Encoded Donut Table 27: Polar Chart Type Recipe Drop Zone Angle Field Type Measure Details Color Size Dimension Measure Opacity Labels Page 291 Data Analysis and Visualization Guide - Viz Example Gallery Polar Chart Type: Pie Table 28: Polar Chart Type Recipe Drop Zone Angle Field Type Measure Details Color Size Dimension Maximize the size. Opacity Labels Page 292 Data Analysis and Visualization Guide - Viz Example Gallery GeoMap Viz Examples The examples in this section demonstrate how to create different kinds of Geomap visualizations. 
Chart Type: Simple Geo Map Table 29: Chart Type Recipe Drop Zone Geography Field Type Location Details Color Size Opacity Shape Labels Page 293 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Color-Encoded Geo Map Table 30: Chart Type Recipe Drop Zone Geography Field Type Location Details Color Measure Size Opacity Shape Labels Page 294 Data Analysis and Visualization Guide - Viz Example Gallery Chart Type: Size-Encoded Geo Map Table 31: Chart Type Recipe Drop Zone Geography Field Type Location Details Color Size Measure Opacity Shape Labels Cross-Tab Viz Examples The examples in this section demonstrate how to create different kinds of Cross-Tab visualizations. Page 295 Data Analysis and Visualization Guide - Viz Example Gallery Cross-Tab Type: Simple Table 32: Chart Type Recipe Drop Zone Columns Field Type Measure Measure Rows Dimension Details Color Size Opacity Shape Labels Page 296 Data Analysis and Visualization Guide - Viz Example Gallery Cross-Tab Type: With Dimensional Groupings Table 33: Chart Type Recipe Drop Zone Columns Field Type Measure Measure Rows Dimension Dimension Details Dimension Color Size Opacity Shape Labels Page 297 Data Analysis and Visualization Guide - Viz Example Gallery Cross-Tab Type: Show Totals Columns setting: Show the total per column Rows setting: Show the total per row Table 34: Chart Type Recipe Drop Zone Columns Field Type Dimension Measure Rows Dimension Details Color Size Opacity Shape Labels Page 298 Chapter 22 Platfora Expressions Platfora comes with a powerful, flexible built-in expression language that you can use to transform, manipulate, and query data. This section describes Platfora's expression language, and describes how to use it to define dataset computed fields, vizboard computed fields, measures, lens filters, and lens query statements. 
Topics: • Expression Building Blocks • PARTITION Expressions and Event Series Processing (ESP) • ROLLUP Measures and Window Expressions • Computed Field Examples • Troubleshoot Computed Field Errors • Write a Lens Query • FAQs - Expression Basics • Platfora Expression Language Reference Expression Building Blocks This section explains the building blocks of an expression, and the general rules for constructing a valid expression. Functions in an Expression Functions perform common data processing tasks. While not all expressions contain functions, most do. This section describes basic concepts you need to know to use functions. Function Inputs and Outputs Functions take one or more input values and return an output value. Input values can be a literal value or the name of a field that contains a value. In both cases, the function expects the input value to be a particular data type such as STRING or INTEGER. For example, the CONCAT() function combines STRING inputs and outputs a new STRING. Page 299 Data Analysis and Visualization Guide - Platfora Expressions This example shows how to use the CONCAT() function to concatenate the values in the month, day, and year fields separated by the literal forward slash character: CONCAT(month,"/",day,"/",year) A function's return value may be the same as its input type or it may be an entirely new data type. For example, the TO_DATE() function takes a STRING as input, but outputs a DATETIME value. If a function expects a STRING, but is passed another data type as input, the function returns an error. Typically, functions are classified by what data type they take or what purpose they serve. For example, CONCAT() is a string function and TO_DATE() is a data type conversion function. You'll find a complete list of functions by type in Platfora's Expression Language Reference. Nesting Functions Functions can take other functions as arguments. For example, you can use the CONCAT function as an argument to the TO_DATE() function. 
The final result is a DATETIME value in the format 10/31/2014.

    TO_DATE(CONCAT(month,"/",day,"/",year),"MM/dd/yyyy")

The nested function must return the correct data type. So, because TO_DATE() expects string input and CONCAT() returns a string, the nesting succeeds. Only row functions allow nesting. Aggregate functions do not allow nested expressions as input.

Aggregate Functions versus Row Functions

Most functions are row functions: they process one value from a single row at a time. Aggregate functions are a special class of functions. Unlike row functions, aggregate functions process the values from multiple rows together into a single return value. Some examples of aggregate functions are:
• SUM()
• MIN()
• VARIANCE()

Aggregate functions are also special because you use them to define measures. Measures always return numeric values that serve as the quantitative data in an analysis. Aggregate expressions are often referred to as measure expressions in Platfora.

Limitations of Aggregate Functions

Unlike row functions, aggregate functions can only take simple expressions as input (such as field names or literal values). Aggregate functions cannot take row functions as arguments. You also cannot use an aggregate function as input into a row function. You cannot mix aggregate functions and row functions together in one expression.

Finally, while you can build expressions in either the dataset or the vizboard, only the following aggregate functions are allowed in vizboard computed field expressions:
• DISTINCT()
• MIN()
• MAX()
• ROLLUP

Operators in an Expression

Platfora has a number of built-in operators for doing arithmetic, logical, and comparison operations. Often, you'll use operators to combine or compare values. The values can be literal values, field values, or even other expressions.
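As a rough illustration of the distinction drawn above between row functions and aggregate functions, the following Python sketch mimics the behavior outside of Platfora. The helper names concat and to_date, the sample rows, and the sales field are assumptions made for this example; Python's strptime format codes stand in for Platfora's "MM/dd/yyyy" style format strings.

```python
from datetime import datetime

# Hypothetical sample rows; field names mirror the month/day/year example.
rows = [
    {"month": "10", "day": "31", "year": "2014", "sales": 120},
    {"month": "11", "day": "01", "year": "2014", "sales": 80},
]


def concat(*parts):
    """Row-function stand-in: join STRING inputs into one STRING."""
    return "".join(parts)


def to_date(text, fmt):
    """Row-function stand-in: parse a STRING into a datetime value.

    Uses Python strptime codes, not Platfora format strings.
    """
    return datetime.strptime(text, fmt)


# Row functions run once per row, and one can be nested inside another.
order_dates = [
    to_date(concat(r["month"], "/", r["day"], "/", r["year"]), "%m/%d/%Y")
    for r in rows
]

# An aggregate function collapses the values of many rows into one value.
total_sales = sum(r["sales"] for r in rows)
```

The nested call produces one date per input row, while the aggregate produces a single value for the whole group of rows.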
Arithmetic Operators

Arithmetic operators perform basic math operations on two values of the same data type. For example, you could calculate the gross profit margin percentage using the values of a total_revenue and total_cost field as follows:

    ((total_revenue - total_cost) / total_cost) * 100

Or you can use the plus (+) operator to combine STRING values:

    "Firstname" + " " + "Lastname"

You can use the plus (+) and minus (-) operators to add or subtract DATETIME values. The following table lists the math operators:

Operator | Description | Example
+ | Addition | amount + 10 (add 10 to the value of the amount field)
- | Subtraction | amount - 10 (subtract 10 from the value of the amount field)
* | Multiplication | amount * 100 (multiply the value of the amount field by 100)
/ | Division | bytes / 1024 (divide the value of the bytes field by 1024 and return the quotient)

Comparison Operators

Comparison operators are used to define Boolean (true / false) expressions. They test how two values relate to each other, for example whether they are equal or whether one is greater than the other. Comparisons return 1 for true, 0 for false. If the comparison is invalid, for example comparing a STRING to an INTEGER, the comparison operator returns NULL.

For example, you could use comparison operators within a CASE expression:

    CASE WHEN age <= 25 THEN "0-25" WHEN age <= 50 THEN "26-50" ELSE "over 50" END

This expression compares the value in the age field to a literal number value. If true, it returns the appropriate STRING value.

You cannot use comparison operators to test for equality between DATETIME values.
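The comparison behavior described above (1 for true, 0 for false, NULL for an invalid comparison) and the age-bucketing CASE expression can be approximated in Python. This is only a sketch: compare_le and age_bucket are hypothetical helper names, None stands in for NULL, and the type check is a simplification of whatever rules Platfora actually applies.

```python
def compare_le(left, right):
    """Mimic `left <= right`: 1 if true, 0 if false, None (NULL) if invalid."""
    if left is None or right is None:
        return None
    # Assumption: mixing a string with a non-string is an invalid comparison.
    if isinstance(left, str) != isinstance(right, str):
        return None
    return 1 if left <= right else 0


def age_bucket(age):
    """Mimic CASE WHEN age <= 25 ... WHEN age <= 50 ... ELSE ... END."""
    if compare_le(age, 25) == 1:
        return "0-25"
    if compare_le(age, 50) == 1:
        return "26-50"
    # Reached when no WHEN condition evaluated to true (including NULL).
    return "over 50"
```

As in the CASE expression, the conditions are checked in order, so an age of 25 lands in the first bucket and never reaches the second test.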
The following table lists the comparison operators:

Operator | Meaning | Example Expression
= or == | Equal to | order_date = "12/22/2011"
> | Greater than | age > 18
!> | Not greater than | age !> 8
< | Less than | age < 30
!< | Not less than | age !< 12
>= | Greater than or equal to | age >= 20
<= | Less than or equal to | age <= 29
<> or != or ^= | Not equal to | age <> 30

Logical Operators

Logical operators are used in expressions to test for a condition. Logical operators are often used in lens filters, CASE expressions, and PARTITION expressions. Filters test if a field or value meets some condition. For example, this tests if a date falls between two other dates.

    BETWEEN 2013-06-01 AND 2013-07-31

Logical operators are also used to construct WHERE clauses in Platfora's query language. The following table lists the logical operators:

Operator | Meaning | Example Expression
AND | Test whether two conditions are true. |
OR | Test if either of two conditions is true. |
BETWEEN min_value AND max_value | Test whether a date or numeric value is within the min and max values (inclusive). | year BETWEEN 2000 AND 2012
IN(list) | Test whether a value is within a set. | product_type IN("tablet","phone","laptop")
LIKE("pattern") | Simple inclusive case-insensitive character pattern matching. The * character matches any number of characters. The ? character matches exactly one character. | last_name LIKE("?utch*") matches Kutcher, hutch but not Krutcher or crutch; company_name LIKE("platfora") matches Platfora or platfora
value IS NULL | Check whether a field value or expression is null (empty). | ship_date IS NULL evaluates to true when the ship_date field is empty
NOT | Reverses the value of other operators. | year NOT BETWEEN 2000 AND 2012; first_name NOT LIKE("Jo?n*") excludes John, jonny but not Jon or Joann; Date.Weekday NOT IN("Saturday","Sunday"); purchase_date IS NOT NULL evaluates to true when the purchase_date field is not empty

Fields in an Expression

Expressions often operate on the values of a field. This section explains how to use field names in expressions.

Referring to Fields in the Current Dataset

When you specify a field name in an expression, if the field name does not contain spaces or special characters, you can simply refer to the field by its name. For example, the following expression sums the values of the sales field:

    SUM(sales)

Enclose field names with square brackets ([]) if they contain spaces, special characters, reserved keywords (such as function names), or start with numeric characters. For example:

    SUM([Sale Amount])
    SUM([2013_data])
    SUM([count])

If a field name contains a ] (closing square bracket), you must escape the closing square bracket by doubling it ]]. So if the field name is:

    Min([crs_flight_duration])

You enclose the entire field name in square brackets and escape the closing bracket that is part of the actual field name:

    [Min([crs_flight_duration]])]

If you are using the expression builder, it provides the correct escapes for you.

Field is a synonym for dataset column. The documentation uses the word field because that is the terminology used in Platfora's user interface.

Use Dot Notation for Fields in a Referenced Dataset

Your expression might refer to a field in the focus dataset. (Focus dataset is simply the current dataset you are working with.) You also might include a field in a referenced dataset. When including fields in a referenced dataset, you must qualify the field name with the proper notation.
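The bracket-escaping rules for field names described above can be captured in a small helper, which may be handy if you generate expressions from a script. This is a sketch under stated assumptions: quote_name is a hypothetical function, the test for a "plain" name is a guess at what counts as free of special characters, and RESERVED is a deliberately incomplete stand-in for Platfora's real list of reserved keywords.

```python
import re

# Assumption: a plain name uses only letters, digits, and underscores,
# and does not start with a digit.
_PLAIN_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")

# Hypothetical, incomplete set of reserved keywords (function names).
RESERVED = {"count", "sum", "min", "max", "distinct"}


def quote_name(name):
    """Return the name as it would be written in an expression."""
    if _PLAIN_NAME.fullmatch(name) and name.lower() not in RESERVED:
        return name  # no escaping needed
    # Double every closing bracket, then wrap the whole name in brackets.
    return "[" + name.replace("]", "]]") + "]"
```

For instance, quote_name("Sale Amount") gives [Sale Amount], and a field literally named Min([crs_flight_duration]) comes back with its inner closing bracket doubled.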
The convention is reference_name.field_name. Don't confuse a reference name with the dataset name; they are not the same. When you create a reference link in a dataset, you give that reference its own name. Use . (dot) notation to separate the two components. For example, consider the Airports dataset, which goes by the Departure Airport reference name. To refer to the City field of the Departure Airport reference to the Airports dataset, you would use the notation:

    [Departure Airport].City

Just as with field names, you must escape reference names if they contain spaces, special characters, reserved keywords (such as function names), or start with numeric characters.

Aggregate Functions and Fields in a Referenced Dataset

Aggregate functions can only operate on fields in the current focus dataset. You cannot directly calculate a measure on a field belonging to a referenced dataset. For example, the following expression is not allowed:

    DISTINCT([Departure Airport].City)

Instead, use a two-step process to 'pull up' a referenced field into the current dataset. First, define a Departure Airport City computed field whose expression is just the path to the referenced dataset field:

    [Departure Airport].City

Then, you can use the interim Departure Airport City computed field as an argument to the aggregate expression. For example:

    DISTINCT([Departure Airport City])

Literal Values in an Expression

Sometimes you need to use a literal value in an expression, as opposed to a field value. How you specify a literal value depends on its data type (text, numeric, or date). This section explains how to use literals in expressions.

Literal STRING Values

To specify a literal or actual STRING value, enclose the value in double quotes (").
For example, this expression converts the values of a gender field to the literal values of male, female, or unknown:

    CASE WHEN gender="M" THEN "male" WHEN gender="F" THEN "female" ELSE "unknown" END

To escape a literal quote within a literal value itself, double the literal quote character. For example:

    CASE WHEN height="60""" THEN "5 feet" WHEN height="72""" THEN "6 feet" ELSE "other" END

The REGEX() function is a special case. In the REGEX() function, string expressions are also enclosed in quotes. When a string expression contains literal quotes, double the literal quote character. For example:

    REGEX(height, "\d\'(\d)+""")

Literal DATE and DATETIME Values

To refer to a DATETIME value in a lens filter expression, the date format must be yyyy-MM-dd without any enclosing quotation marks or other punctuation.

    order_date BETWEEN 2012-12-01 AND 2012-12-31

To refer to a literal date value in a computed field expression, you must specify the format of the date and time components using TO_DATE, which takes a string literal argument and a format string. For example:

    CASE WHEN order_date=TO_DATE("2013-01-01 00:00:59 PST","yyyy-MM-dd HH:mm:ss z") THEN "free shipping" ELSE "standard shipping" END

Literal Numeric Values

For literal numeric values, you can just specify the number itself without any special escaping or formatting. For example:

    CASE WHEN is_married=1 THEN "married" WHEN is_married=0 THEN "not_married" ELSE NULL END

PARTITION Expressions and Event Series Processing (ESP)

Computed fields that contain a PARTITION expression are considered event series processing (ESP) computed fields. You can add ESP computed fields to Platfora datasets only (not vizboards).

Event series processing is also referred to as pattern matching or event correlation.
Use event series processing (ESP) to partition the rows of a dataset, order the rows sequentially (typically by a timestamp), and search for matching patterns among the rows.

ESP fields evaluate multiple rows in the dataset, and output one value (or column) per row. You can use the results of an ESP computed field in other expressions or (after lens build processing) in a viz.

How Event Series Processing Works

This section explains how event series processing works by walking you through a simple use of the PARTITION expression.

This example uses some weblog page view data. Each row represents a page view at a given point in time within a user session. Each session is unique and belongs to only one user. Users can have multiple sessions. Within any session a user can visit any page one or more times.

SessionID | UserID | Timestamp | Page
2A | 2 | 3/4/13 2:02 AM | products.html
1A | 1 | 12/1/13 9:00 AM | home.html
1A | 1 | 12/1/13 9:10 AM | products.html
1A | 1 | 12/1/13 9:05 AM | company.html
1B | 1 | 3/1/13 9:45 PM | products.html
1B | 1 | 3/1/13 9:40 PM | home.html
2A | 2 | 3/4/13 2:56 AM | checkout.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
1A | 1 | 12/1/13 9:20 AM | checkout.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
1A | 1 | 12/1/13 9:15 AM | blogs.html

Consider the following partial PARTITION expression:

    PARTITION BY SessionID ORDER BY Timestamp ...

This partitions the rows by the SessionID. Within each partition, the function orders the rows by Timestamp in ascending order (the default order).

Suppose you wanted to find sessions where users traversed the pages in order from home.html to products.html and then to the checkout.html page. To look for this page view pattern, you complete the expression like this.
    PARTITION BY SessionID ORDER BY Timestamp PATTERN (A,B,C) DEFINE A AS Page = "home.html", B AS Page = "products.html", C AS Page = "checkout.html" OUTPUT "TRUE"

The PATTERN clause describes the sequence and the DEFINE clause assigns values to the PATTERN elements. This pattern says that there is a match whenever there are 3 consecutive rows that meet criteria A then B then C. If the computed field containing this PARTITION expression was called Path=home,product,checkout, you would get output that looks like this:

SessionID | UserID | Timestamp | Page | Path=home,product,checkout
1A | 1 | 12/1/13 9:00 AM | home.html | NULL
1A | 1 | 12/1/13 9:05 AM | company.html | NULL
1A | 1 | 12/1/13 9:10 AM | products.html | NULL
1A | 1 | 12/1/13 9:15 AM | blogs.html | NULL
1A | 1 | 12/1/13 9:20 AM | checkout.html | NULL
1B | 1 | 3/1/13 9:40 PM | home.html | NULL
1B | 1 | 3/1/13 9:45 PM | products.html | NULL
1B | 1 | 3/1/13 9:46 PM | checkout.html | TRUE
2A | 2 | 3/4/13 2:02 AM | products.html | NULL
2A | 2 | 3/4/13 2:20 AM | home.html | NULL
2A | 2 | 3/4/13 2:33 AM | blogs.html | NULL
2A | 2 | 3/4/13 2:56 AM | checkout.html | NULL

The lens build processing that happens to produce these results is as follows:
1. Partition (or group) the rows of the dataset by session.
2. Order the rows in each partition by time (in ascending order by default).
3. Evaluate the rows against each DEFINE clause and bind the row to the symbol where there is a match.
4. Check if the PATTERN clause conditions are met in the specified order and frequency.
5. If the PATTERN criteria are met, output TRUE as the result value for the last row that caused the pattern to be true. Write the output results to a new computed field: Path=home,product,checkout. If a row does not cause the pattern to be true, output nothing (NULL).

Understand Pattern Match Processing Order

During lens processing, the build evaluates patterns row by row, starting from the partition's top row and proceeding downward.
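The numbered processing steps above, together with the top-down, row-by-row evaluation, can be imitated in a few lines of Python. This is a simplified sketch, not Platfora's implementation: it handles only a fixed sequence of consecutive pages (no quantifiers or alternation), match_path and parse_ts are hypothetical names, and the rows are copied from the result table above.

```python
from collections import defaultdict
from datetime import datetime

# (SessionID, UserID, Timestamp, Page), deliberately unsorted.
rows = [
    ("2A", 2, "3/4/13 2:02 AM", "products.html"),
    ("1A", 1, "12/1/13 9:00 AM", "home.html"),
    ("1A", 1, "12/1/13 9:10 AM", "products.html"),
    ("1A", 1, "12/1/13 9:05 AM", "company.html"),
    ("1B", 1, "3/1/13 9:45 PM", "products.html"),
    ("1B", 1, "3/1/13 9:40 PM", "home.html"),
    ("2A", 2, "3/4/13 2:56 AM", "checkout.html"),
    ("1B", 1, "3/1/13 9:46 PM", "checkout.html"),
    ("1A", 1, "12/1/13 9:20 AM", "checkout.html"),
    ("2A", 2, "3/4/13 2:20 AM", "home.html"),
    ("2A", 2, "3/4/13 2:33 AM", "blogs.html"),
    ("1A", 1, "12/1/13 9:15 AM", "blogs.html"),
]

# Symbols A, B, C from the DEFINE clause, in PATTERN order.
PATTERN = ["home.html", "products.html", "checkout.html"]


def parse_ts(text):
    return datetime.strptime(text, "%m/%d/%y %I:%M %p")


def match_path(rows, pattern=PATTERN):
    """Return {(SessionID, Timestamp): "TRUE" or None} for every row."""
    partitions = defaultdict(list)
    for session, _user, ts, page in rows:        # step 1: partition
        partitions[session].append((parse_ts(ts), ts, page))

    result = {}
    for session, events in partitions.items():
        events.sort(key=lambda e: e[0])          # step 2: order ascending
        pages = [e[2] for e in events]
        for i, (_, ts, _) in enumerate(events):  # top row downward
            # Steps 3-5: look back only, never ahead of the current row.
            window = pages[max(0, i - len(pattern) + 1): i + 1]
            result[(session, ts)] = "TRUE" if window == pattern else None
    return result


matches = match_path(rows)
```

Only the last row of session 1B completes the three-page sequence, so it is the only key in matches whose value is "TRUE"; every other row maps to None, the stand-in for NULL.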
A pattern match is evaluated based on the current row, and any rows that come before (in terms of their position in the partition). The pattern match only looks back from the current row; it does not look ahead to the next row in the partition.

Processing order is important to consider when you want to look for events that happened later or next (chronologically speaking). With the default sort order (ascending), the build sorts rows within a partition from oldest to most recent. This means that you can only pattern match backwards chronologically (or look for events that happened previously in time).

For example, to answer a question such as "what page did a user visit before they visited the product page?", the following expression would return the previous (chronologically) viewed page before the product page:

    PARTITION BY SessionID ORDER BY Timestamp ASC PATTERN (^product_page?,A) DEFINE product_page AS Page = "product.html", A AS TRUE OUTPUT A.Page

If you want to pattern match forwards chronologically (or look for events that happened later in time), you would specify DESC sort order in the ORDER BY clause of your PARTITION expression. For example, to answer a question such as "what page did a user visit after they visited the product page?", the following expression would return the next (chronologically) viewed page after the product page:

    PARTITION BY SessionID ORDER BY Timestamp DESC PATTERN (^product_page?,A) DEFINE product_page AS Page = "product.html", A AS TRUE OUTPUT A.Page

Understand Pattern Match Precedence

By default, pattern expressions are matched from left to right. The innermost parenthetical expressions are evaluated first, and evaluation then moves outward from there.
For example, the pattern:

    PATTERN (((A,B)|(C,D)),E)

would evaluate differently than:

    PATTERN (A,B|C,D,E)

Understand Regex-Style Quantifiers (Greedy and Reluctant)

The PATTERN clause can use regex-style quantifiers to denote the frequency of a match. By default, quantifiers are greedy. This means that a quantified symbol matches as many rows as possible. For example:

    PATTERN (A*,B?)

causes symbol A to match zero or more rows. Symbol B can match zero or one row.

Adding an additional question mark ? to a quantifier makes it reluctant. This means that the PATTERN only matches to a row when the row cannot match to any other subsequent match criteria in the pattern. For example:

    PATTERN (A*?,B)

causes symbol A to match zero or more rows, but only when symbol B does not produce a match. You can use reluctant quantifiers to break ties when there is more than one possible match to the pattern.

A quantifier applies to a single match criteria symbol only. You cannot apply quantifiers to parenthetical expressions. For example, you cannot write ((A,B,C)*, D) to indicate that the asterisk quantifier applies to the whole (A,B,C) expression.

Best Practices for Event Series Processing (ESP)

Event series processing (ESP) computed fields, unlike other computed fields, require advanced processing during lens builds. This means they require more compute resources on your Hadoop cluster. This section discusses what to consider when adding event series computed fields to your dataset definitions, and the best practices when using this feature.

Use Helpful Field Names and Descriptions

In the Data Catalog and Vizboards areas of the Platfora application, event series computed fields look just like any other dataset field. When defining event series computed fields, give them names and descriptions that help users understand the field's purpose. This cues users on how to use a field in an analysis.
For example, if describing an event series computed field that computes Next Page Viewed, it may be helpful for users to know that this field is best used in conjunction with the Page field. Whatever the current value is for the Page field, the Next Page Viewed field has the value of Page for the next click record immediately following the current page.

Increase Partition Limit for Larger Event Series Processing Jobs

The global configuration property platfora.max.pattern.events sets the maximum number of rows in a partition to evaluate for a pattern match. The default is one million rows. If a partition exceeds this number of rows, the result of the PARTITION function is NULL for all the rows that exceed the limit. For example, if you had an event series computed field that partitioned by UserID and ordered by Timestamp, the build processes only the first million rows of each partition and ignores any rows beyond that, so the event series computed field is NULL for those rows.

If you are noticing a lot of default values in your lens data (for example: ‘January 1, 1970’ for dates or ‘NULL’ for strings), you may want to increase platfora.max.pattern.events so that all of the rows are processed. Keep in mind that increasing this limit will consume more memory resources on the Hadoop cluster during lens processing.

Filter Partitioning Fields to Restrict Lens Build Scope

Platfora cannot incrementally build lenses that include event series processing fields. Due to the nature of pattern matching logic, lenses with ESP fields require full lens builds that scan all of a dataset's input data. You can limit the scope of these lens builds and improve processing time by adding a lens filter on a dataset partitioning field.

A dataset partitioning field is different from the partition criteria of the ESP field. For Hive data sources, partitioning fields are defined on the data source by the Hive administrator.
For HDFS or S3 data sources, partitioning fields are defined in a Platfora dataset. If there are partitioning fields available in a lens, the lens builder displays a special icon next to them.

Consider How Lens Filters Impact Event Series Processing Results

Lens builds always apply lens filters on dataset partitioning fields as the first step of a lens build. This means a build excludes some source data before processing any computed field expressions. If your lens includes both lens filters on partitioning fields and ESP computed fields, you should take this behavior into consideration as it can change the results of PARTITION expressions, and ultimately, your analysis conclusions.

For example, suppose you are analyzing web page visits by user on data from 2012 and 2013:

SessionID | UserID | Timestamp (partition field) | Page
1A | 1 | 12/1/12 9:00 AM | home.html
1A | 1 | 12/1/12 9:05 AM | company.html
1A | 1 | 12/1/12 9:10 AM | products.html
1A | 1 | 12/1/12 9:15 AM | blogs.html
1B | 1 | 3/1/13 9:40 PM | home.html
1B | 1 | 3/1/13 9:45 PM | products.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
2A | 2 | 3/4/13 2:02 AM | products.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
2A | 2 | 3/4/13 2:56 AM | checkout.html

Timestamp is a partitioning field and it has a filter that excludes 2012 sessions. Then, you create a computed field with an event series PARTITION function that returns a user's first visit date.
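To see why the order of operations matters in this scenario, the following Python sketch computes each user's first visit twice: once over all rows, and once after dropping the 2012 rows the way a lens filter on the partitioning field would. The first_visit helper is hypothetical and stands in for the event series computed field; the timestamps come from the table above.

```python
from datetime import datetime

# (UserID, Timestamp) pairs taken from the example table.
visits = [
    (1, "12/1/12 9:00 AM"), (1, "12/1/12 9:05 AM"),
    (1, "12/1/12 9:10 AM"), (1, "12/1/12 9:15 AM"),
    (1, "3/1/13 9:40 PM"), (1, "3/1/13 9:45 PM"), (1, "3/1/13 9:46 PM"),
    (2, "3/4/13 2:02 AM"), (2, "3/4/13 2:20 AM"),
    (2, "3/4/13 2:33 AM"), (2, "3/4/13 2:56 AM"),
]


def parse_ts(text):
    return datetime.strptime(text, "%m/%d/%y %I:%M %p")


def first_visit(rows):
    """Earliest timestamp per user among the rows that were passed in."""
    first = {}
    for user, ts in rows:
        when = parse_ts(ts)
        if user not in first or when < first[user]:
            first[user] = when
    return first


# No filter: every row reaches the computation.
unfiltered = first_visit(visits)

# Filter applied first: 2012 rows never reach the computation.
kept = [v for v in visits if parse_ts(v[1]).year != 2012]
filtered = first_visit(kept)
```

For UserID 1 the two results differ (December 1, 2012 versus March 1, 2013), while UserID 2 is unaffected because that user has no 2012 rows.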
When the lens builds, the PARTITION expression would process this filtered data:

SessionID | UserID | Timestamp | Page
1B | 1 | 3/1/13 9:40 PM | home.html
1B | 1 | 3/1/13 9:45 PM | products.html
1B | 1 | 3/1/13 9:46 PM | checkout.html
2A | 2 | 3/4/13 2:02 AM | products.html
2A | 2 | 3/4/13 2:20 AM | home.html
2A | 2 | 3/4/13 2:33 AM | blogs.html
2A | 2 | 3/4/13 2:56 AM | checkout.html

As a result, the computed field would report that UserID 1 had a first visit date of 3/1/13 even though the user's first visit was actually 12/1/12. This discrepancy results from the build processing the lens filter on the partitioning field (Timestamp) before the event series processing field.

Lens filters on other, non-partitioning dataset fields are applied after event series processing.

ROLLUP Measures and Window Expressions

This section explains how to write ROLLUP and window expressions to calculate complex measures, such as running totals, benchmark comparisons, rank ordering, percentiles, and so on.

Understand ROLLUP Measures

ROLLUP is a modifier to a measure (or aggregate) expression that allows you to operate on a subset of rows within the overall result set of a query. Using ROLLUP you can build a frame around one or more rows in a dataset or query result, and then compute an aggregate result in relation to that frame only.

The result of a ROLLUP expression is always a measure. However, instead of just doing a simple aggregation, it does more complex aggregate processing over a specified set of rows (or marks in a viz).

If you are familiar with SQL, a ROLLUP expression in Platfora is equivalent to the OVER clause in SQL. For example, this SQL statement:

    SELECT SUM(distance) OVER (PARTITION BY departure_date)

would be equivalent to this ROLLUP expression in Platfora:

    ROLLUP SUM(Distance) TO [Departure Date]

What is the difference between a measure and a ROLLUP measure?
A measure is the result of an aggregate function (such as SUM) applied to a group of input data rows. For example, using the Flights tutorial data that comes with your Platfora installation, suppose you wanted to calculate the total distance flown by an airline. You could create a measure called Distance(Sum) with an aggregate expression such as this:

    SUM(Distance)

The group of input records passed into this aggregate calculation is then determined by the dimension(s) used in a visualization or lens query. Records that have the same dimension members are grouped together in a single row, which then gets represented as a mark in a viz. For example, in this viz there is one group or mark for each Carrier/Week combination in the input data.

A ROLLUP clause modifies another aggregate function to define additional partitioning, ordering, and window frame criteria. Like a regular aggregate function, ROLLUP also computes aggregate values over groups of input rows. However, a ROLLUP measure then partitions the overall rows returned by the viz query into subsets or buckets, and then computes the aggregate expression separately within each individual bucket.

A ROLLUP is useful when you want to compute an aggregation over a subset of rows (or marks) independently of the overall result of the viz query. The ROLLUP function specifies how to partition the subset of rows and how to compute the aggregation within that subset.

For example, suppose you wanted to calculate the percentage of all miles that were flown in a given week. You could write a ROLLUP expression that calculates the percent of total distance within the partition of a week (total distance for the week is 100%).
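The percent-of-week calculation just described can be mimicked in Python to show what partitioning the marks by week means in practice. This is a sketch with invented numbers: the carriers, weeks, and distances are hypothetical, and percent_of_week is an illustrative helper rather than anything Platfora provides.

```python
from collections import defaultdict

# Hypothetical marks: (Carrier, Week, summed Distance for that mark).
marks = [
    ("AA", 1, 300.0),
    ("UA", 1, 100.0),
    ("AA", 2, 50.0),
    ("UA", 2, 150.0),
]


def percent_of_week(marks):
    """Each mark's share of the total distance in its own week."""
    week_total = defaultdict(float)
    for _carrier, week, distance in marks:
        # Aggregate within the week partition only.
        week_total[week] += distance
    return {
        (carrier, week): 100 * distance / week_total[week]
        for carrier, week, distance in marks
    }


shares = percent_of_week(marks)
```

Each week's shares add up to 100, regardless of how many carriers appear in that week or how the totals compare across weeks.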
The ROLLUP expression to define such a calculation would look something like this:

    100 * [Distance(Sum)] / ROLLUP [Distance(Sum)] TO ([Departure Date].Week)

Then when this ROLLUP expression is used in a viz, the group of input records passed into the aggregate calculation is determined by the dimension(s) used in the viz (such as Carrier in this case); however, the aggregation is calculated independently within each week. In this case, you can see the percentage that each carrier contributed to the total distance flown in a given week.

How to calculate a ROLLUP over an 'adaptive' partition

A ROLLUP expression can have fixed or adaptive partitioning criteria. When you define the ROLLUP measure expression, the TO clause of the expression specifies how to partition the data. You can specify an exact field name (fixed), a reference field name (adaptive), or no field name at all (adaptive).

In the previous example, the ROLLUP expression used a fixed partition of [Departure Date].Week. If we changed the partition criteria to use just [Departure Date] (a reference), the partition criteria becomes adaptive to any field of that reference that is used in a viz. The expression to define an adaptive date partition might look something like this:

    100 * [Distance(Sum)] / ROLLUP [Distance(Sum)] TO ([Departure Date])

Since Departure Date is a reference that points to the Date dimension, the calculation dynamically changes if you drill down from week to day in the viz. This expression can then be used to partition by any granularity of Departure Date without having to rewrite the ROLLUP expression. The ROLLUP expression adapts to any granularity of Departure Date used in a viz.

Understand ROLLUP Window Expressions

Adding an ORDER BY plus an optional RANGE or ROWS clause to a ROLLUP expression turns it into a window expression.
These clauses are used to specify an order inside of each partition, and a window frame around all, one, or several rows over which to compute the aggregate calculation. The window frame defines how to crop, shift, or fix the row set in relation to the position of the current row.

For example, suppose you wanted to calculate a cumulative total on a day to day basis. You could do this by adding a window frame to your ROLLUP expression that ordered the rows in each partition by date (using the ORDER BY clause), and then summed up the current row and all the days that came before it (using a ROWS UNBOUNDED PRECEDING clause). In the Flights tutorial data, an expression that calculated a cumulative total of flights per day would look something like this:

    ROLLUP [Total Records] TO () ORDER BY ([Departure Date].Date) ROWS UNBOUNDED PRECEDING

When this ROLLUP expression is used in a viz, the Total Records measure is computed cumulatively by day for each partition group (the Date and Cancel Status dimensions in this case), allowing us to see the progression of cancelled flights in the month of October 2012. This allows us to see unusual growth patterns in the data, such as the dramatic spike in cancellations at the end of the month.

The RANK, DENSE_RANK, and NTILE functions are considered exclusively window functions because they can only be used in a ROLLUP expression, and they always require an ordered set of rows (or window) over which to compute their result.

Computed Field Examples

This section contains examples of some common data processing tasks you can accomplish using Platfora computed fields. The Expression Language Reference has examples for all of the built-in functions that Platfora provides.

Finding and Replacing Values

You may have particular values in your data that you want to find and change to something else, or reformat so that they are all consistent.
For example, find and replace values in a name field where name values are formatted as firstname lastname and replace them with name values formatted as lastname, firstname:

    REGEX_REPLACE(name,"(.*) (.*)","$2, $1")

Or you may have field values that are not formatted exactly the same, and want to change them so that like values can be grouped and sorted together. For example, change all profession_title field values that contain the word "Retired" anywhere in the string to just be a value of "Retired":

    REGEX_REPLACE(profession_title,".*(Retired).*","Retired")

Extracting Information from File Names and Directories

You may have a dataset where the information you need is not inside the source files, but in the Hadoop file name or directory path, such as dates or server names.

Suppose your dataset is based on daily log files that are organized into directories by date, and the file names are the IP address of the server that produced the log file. For example, the URI path to a log file produced by server 172.12.131.118 on July 4, 2012 is:

    hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log

The following expression uses FILE_PATH() in combination with REGEX() and TO_DATE() to create a date field from the date directory name:

    TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")

And the following expression uses FILE_NAME() and REGEX() to extract the server IP address from the file name:

    REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")

Extracting a Portion of Field Values

You may have field values where only part of the value contains useful information. You can pull out a portion of a field value to define a new field.
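Before saving regular expressions like the ones used earlier in this section, it can help to try them on sample values. The sketch below does so with Python's re module; it is only an approximation, since Python's regex syntax may differ in places from the engine Platfora uses (for example, Python writes replacement groups as \2 rather than $2), and the sample name is invented.

```python
import re
from datetime import datetime

path = "hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log"

# Date directory: capture the eight digits between /logs/ and the file name.
date_match = re.search(r"/logs/(\d{8})/", path)
log_date = datetime.strptime(date_match.group(1), "%Y%m%d")

# File name: everything after the last slash, then capture the IP address.
file_name = path.rsplit("/", 1)[-1]
ip_match = re.fullmatch(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log", file_name)
server_ip = ip_match.group(1)

# Find and replace: "firstname lastname" becomes "lastname, firstname".
swapped = re.sub(r"(.*) (.*)", r"\2, \1", "Jane Doe")

# Collapse any title containing "Retired" to just "Retired".
title = re.sub(r".*(Retired).*", "Retired", "Teacher (Retired)")
```

Each capture group corresponds to the parenthesized portion of the pattern, which is the part the REGEX() expressions return.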
For example, suppose you had an email_address field with values in the format of username@provider.domain, and you wanted to extract just the provider portion of the email address:

    REGEX(email_address,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")

Renaming Field Values

Sometimes field values are not very user-friendly. For example, a Boolean field may have values of 0 and 1 that you want to change to more human-readable values.

    CASE WHEN cancelled=0 THEN "Not Cancelled" WHEN cancelled=1 THEN "Cancelled" ELSE NULL END

Deriving a New Field from Other Fields

You may want to combine the values of other fields to create a new field. For example, you could combine a month, day, and year field into a single date field. This would then allow you to reference Platfora's built-in Date dimension dataset.

    TO_DATE(CONCAT(month,"/",day,"/",year),"MM/dd/yyyy")

You can also use the values of other fields to calculate a new value. For example, you could calculate a gross profit margin percentage using the values of a revenue and cost field as follows:

    ((revenue - cost) / cost) * 100

Cleansing and Casting Field Values

Sometimes the data values in a column need to be transformed and cast to another data type in order to allow for further calculations on the data. For example, you might have some numeric data that you want to use as a measure; however, it has string values of "NA" to represent what should really be NULL values. You could transform the "NA" values to NULL and then cast the column to a numeric data type.

    TO_INT(CASE WHEN delay_minutes="NA" THEN NULL ELSE delay_minutes END)

Troubleshoot Computed Field Errors

When you create a computed field, Platfora catches any syntax error in your expression when you try to save the field. This section describes the most common causes of expression syntax errors.
Function Arguments Don't Match the Expected Data Type

Functions expect input arguments to be of a certain data type. When a function uses another field as its input argument, and that field is not of the expected data type, you might see an error such as:

Function REGEX takes 2 arguments with types STRING, STRING, but one argument of type INTEGER was provided.

Look at the function's arguments that appear in the error message and verify they are the proper data types. If the argument is a field, you might need to change the data type of the base field or use a data type conversion function to convert the argument to the expected data type within the expression itself.

See also: Functions in an Expression

Not Escaping Field or Dataset Names

Field and dataset names used in an expression must be enclosed in square brackets ([ ]) if they contain spaces, special characters, reserved keywords, or start with numeric characters. When an expression contains a field or dataset name that meets one of these criteria and is not enclosed in square brackets, you might see an error such as:

Platfora expected the string `)', but instead received `F'.
TO_LONG(New Field)

Look at the bolded character in the expression to find the location of the error. Note the text that comes after this position. If it is part of a field or dataset name, you need to enclose the name with square brackets. To correct the expression in this example, use:

TO_LONG([New Field])

See also: Escaping Spaces or Special Characters in Field and Dataset Names

Not Specifying the Full Path to Fields of a Referenced Dataset

Functions can use a field that is in a dataset referenced from the focus dataset. You must specify the field's full path by including the reference dataset's reference name.
If you forget to use the full path, you might see an error like:

Field not found: carrier_name

When you see the Field not found error, make sure the field is qualified with the reference name. In this example, carrier_name is a field in a referenced dataset. The reference name in this example is carriers. To correct this expression, use carriers.carrier_name for the field name.

See also: Referring to Fields in a Referenced Dataset

Unenclosed Literal Strings

You can include a literal string value as a function argument, but it must be enclosed in double quotes ("). When an expression uses a literal string that isn't enclosed in double quotes, you might see an error such as:

Field not found: Platfora

When you see the Field not found error, one possibility is that the name reported as a missing field is meant to be a literal string and needs to be enclosed in double quotes. To correct this expression, use "Platfora" for the string.

See also: Literal Values in an Expression

Unescaped Special Characters

Field and dataset names may contain a right square bracket (]), but it must be preceded by another right square bracket (]]). Literal strings may contain a double quote ("), but it must be preceded by another double quote ("").

Suppose you want to concatenate the strings "Hello and world." to make the string "Hello world.". The double quotes in each string are special characters and must be escaped in the expression. If not, you might see an error like:

Platfora expected the string `)', but instead received `H'.
CONCAT(""Hello", " world."")

Look at the bolded character in the expression to find the location of the error. To correct this error, escape the double quotes with another double quote:

CONCAT("""Hello", " world.""")

Invalid Syntax

Functions have specific requirements, including required arguments and keywords.
When an expression is missing a keyword, you might see an error such as:

Platfora expected a string matching the regular expression `(?i)\Qend\E', but instead received end of source.
CASE WHEN cancel_code=0 THEN "Not Cancelled" WHEN cancel_code=1 THEN "Cancelled" ELSE NULL

Look at the bolded character in the expression to find the location of the error. In this example, it expected the string END (indicated by (?i)\Qend\E), but instead it reached the end of the expression. The CASE function requires the END keyword at the end of its syntax string. To correct this error, add END to the end of the expression:

CASE WHEN cancel_code=0 THEN "Not Cancelled" WHEN cancel_code=1 THEN "Cancelled" ELSE NULL END

See also: Expression Language Reference

Using Row and Aggregate Functions Together in the Same Expression

Aggregate functions (functions used to define measures) cannot use nested expressions as their input arguments. Aggregate functions can only accept field names as input. You also cannot use an aggregate expression as input to a row function expression. Aggregate functions and row functions cannot be mixed together in one expression.

Write a Lens Query

Platfora includes a programmatic query access feature you can use to query a lens. This section describes support for querying lenses using Platfora's lens query language and the REST API.

Platfora allows you to make a query against an aggregate lens in your Platfora instance. This feature is not meant as an end-user feature. Rather, it is intended to allow you to write programs that issue SQL-like queries to a Platfora lens. For example, you could write a simple command-line client for querying a lens.

Since programmatic query access is meant for use by programs rather than people, a caller makes the queries through REST API calls. A query consists of a SELECT statement with one or more optional clauses.
The statement and its clauses use the same expression language elements you encounter when building a computed field expression or a lens filter expression.

[ DEFINE alias-name AS expression [ DEFINE ... ] ]
SELECT measure-field [ AS alias-name ] | measure-expression AS alias-name
  [ , { dimension-field [ AS alias-name ] | row-expression AS alias-name } [ , ...] ]
FROM lens-name
[ WHERE filter-expression [ AND filter-expression ] ]
[ GROUP BY dimension-field [ [, group-ordering ] ]
[ HAVING measure-filter-expression ]

For example, you make a query like the following:

SELECT [device].[manufacturer], [user].[gender], [Num Users]
FROM bo_view2G_PSM
WHERE video.genre %3D "Action/Comedy" AND user.gender !%3D "male"
GROUP BY [device].[manufacturer], [user].[gender]

Once you know the query structure, you make a REST call to the query endpoint. You can pass the query as a parameter to a GET or as a JSON body to a POST.

https://hostname:port/api/v1/query?query="URL-encoded SELECT statement ..."

Considerations for Using Programmatic Query Access

Here are some considerations to keep in mind when constructing lens queries:

• You can only query aggregate lenses. You cannot query event series lenses.
• Queries run against the currently built version of the lens.
• Queries that once worked can later fail because the underlying dataset or lens changed.
• You cannot do a SELECT * on a lens.

FAQs - Expression Basics

This section covers the basic concepts and common questions about the Platfora expression language.

What is an expression?

An expression computes or produces a value by combining fields (or columns), constant values, operators, and functions. An expression outputs a value of a particular data type, such as numeric, string, datetime, or Boolean (true/false) values. Simple expressions can be a single constant value, the values of a given column or field, or a function call.
You can use operators to join two or more simple expressions into a complex expression.

How are expressions used in the Platfora application?

Platfora expressions allow you to select, process, transform, and manipulate data. Expressions are used in several ways in the Platfora application:

• In Datasets, they are used to define computed fields and measures that operate on the raw source data.
• In Lenses, they are used to define lens filters that limit the scope of raw data requested from Hadoop.
• In Vizboards, they are used to define computed fields that further manipulate the prepared data in a lens.
• In the Lens Query Language via the REST API, they are used to programmatically access and manipulate the prepared data in a lens from external applications or plugins.

What is the expression builder?

The expression builder helps you create computed field expressions in the Platfora application. It shows the available fields in the dataset or lens you are working with, plus the list of Platfora's built-in functions and statements. It validates your expressions for correct syntax, input data types, and so on. You can also access the help to view correct syntax and examples for all of the built-in functions and statements.

What is a computed field expression?

A computed field expression generates its values based on a calculation or condition, and returns a value for each input row. Computed field expressions can contain values from other fields, constants, mathematical operators, comparison operators, or built-in row functions.

What is a measure expression?

A measure expression generates its values as the result of an aggregate function. It takes input values from multiple rows and returns a single aggregated value.

How are expressions used in programmatic lens queries?

Platfora's lens query language does not have a graphical user interface like the expression builder.
Instead, you can use the cURL command line, Chrome's Postman extension, or write your own plugin extension to submit a SQL-like SELECT query statement through Platfora's REST API. The lens query language makes use of expressions in its SELECT statement, DEFINE clause, WHERE clause, and HAVING clause. Programmatic lens queries are subject to some of the same expression limitations as vizboard computed fields, since they also operate on the pre-processed data in a lens.

Platfora Expression Language Reference

An expression computes or produces a value by combining field or column values, constant values, operators, and functions. Platfora has a built-in expression language. You use the language's functions and operators in dataset computed fields, vizboard computed fields, lens filters, and programmatic lens queries.

Expression Quick Reference

An expression is a combination of columns (or fields), constant values, operators, and functions used to evaluate, transform, or produce a value. Simple expressions can be combined to make more complex expressions. This quick reference describes the functions and operators that can be used to write expressions.

Platfora's built-in statements, functions, and operators are divided into the following categories:

• Conditional and NULL Processing
• Event Series Processing
• String Processing
• Date and Time Processing
• URL Processing
• IP Address Processing
• Mathematical Processing
• Data Type Conversion
• Aggregation and Measure Processing
• ROLLUP and Window Calculations
• User Defined Functions
• Comparison Operators
• Logical Operators
• Arithmetic Operators

Conditional and NULL Processing

Conditional and NULL processing allows you to transform or manipulate data values based on certain defined conditions. Conditional processing (CASE) can be done at either the dataset or vizboard level.
NULL processing (COALESCE and IS_VALID) is only applicable at the dataset level. During a lens build, any NULL values in the source data are converted to default values, so lenses and vizboards have no concept of NULL values.

Function: CASE
Description: evaluates each row in the dataset according to one or more input conditions, and outputs the specified result when the input conditions are met
Example: CASE WHEN gender = "M" THEN "Male" WHEN gender = "F" THEN "Female" ELSE "Unknown" END

Function: COALESCE
Description: returns the first valid value (NOT NULL value) from a comma-separated list of expressions
Example: COALESCE(hourly_wage * 40 * 52, salary)

Function: IS_VALID
Description: returns 0 if the returned value is NULL, and 1 if the returned value is NOT NULL
Example: IS_VALID(sale_amount)

Event Series Processing

Event series processing allows you to partition rows of input data, order the rows sequentially (typically by a timestamp), and search for matching patterns in a set of rows. Computed fields that are defined in a dataset using a PARTITION expression are considered event series processing computed fields. Event series processing computed fields are processed differently than regular computed fields. Instead of computing values from the input of a single row, they compute values from inputs of multiple rows in the dataset. Event series processing computed fields can only be defined in the dataset - not in the vizboard.
Function: PACK_VALUES
Description: returns multiple output values packed into a single string of key/value pairs separated by the Platfora default key and pair separators - useful when the OUTPUT clause of a PARTITION expression returns multiple output values
Example: PACK_VALUES("ID",custid,"Age",age)

Function: PARTITION
Description: partitions the rows of a dataset, orders the rows sequentially (typically by a timestamp), and searches for matching patterns in a set of rows
Example: PARTITION BY SessionID ORDER BY Timestamp PATTERN (A,B,C) DEFINE A AS Page = "home.html", B AS Page = "product.html", C AS Page = "checkout.html" OUTPUT "TRUE"

String Functions

String functions allow you to manipulate and transform textual data, such as combining string values or extracting a portion of a string value.

Function: ARRAY_CONTAINS
Description: performs a whole string match against a string containing delimited values and returns a 1 or 0 depending on whether or not the string contains the search value
Example: ARRAY_CONTAINS(device,",","iPad")

Function: CONCAT
Description: concatenates (combines together) the results of multiple string expressions
Example: CONCAT(month,"/",day,"/",year)

Function: FILE_NAME
Description: returns the original file name from the source file system
Example: TO_DATE(SUBSTRING(FILE_NAME(),0,8),"yyyyMMdd")

Function: FILE_PATH
Description: returns the full URI path from the source file system
Example: TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")

Function: EXTRACT_COOKIE
Description: extracts the value of the given cookie identifier from a semicolon-delimited list of cookie key=value pairs
Example: EXTRACT_COOKIE("SSID=ABC; vID=44", "vID") returns 44

Function: EXTRACT_VALUE
Description: extracts the value for the given key from a string containing delimited key/value pairs
Example: EXTRACT_VALUE("firstname;daria|lastname;hutch","lastname",";","|") returns hutch

Function: INSTR
Description: returns an integer indicating the position of a character within a string that is the first character of the occurrence of a substring
Example: INSTR(url,"http://",-1,1)

Function: JAVA_STRING
Description: returns the unescaped version of a Java Unicode character escape sequence as a string value
Example: CASE WHEN currency == JAVA_STRING("\u00a5") THEN "yes" ELSE "no" END

Function: JOIN_STRINGS
Description: concatenates (combines together) the results of multiple string expressions with the separator in between each non-null value
Example: JOIN_STRINGS("/",month,day,year)

Function: JSON_ARRAY_CONTAINS
Description: performs a whole string match against a string formatted as a JSON array and returns a 1 or 0 depending on whether or not the string contains the search value
Example: JSON_ARRAY_CONTAINS(software,"platfora")

Function: JSON_DOUBLE
Description: extracts a DOUBLE value from a field in a JSON object
Example: JSON_DOUBLE(top_scores,"test_scores.2")

Function: JSON_FIXED
Description: extracts a FIXED value from a field in a JSON object
Example: JSON_FIXED(top_scores,"test_scores.2")

Function: JSON_INTEGER
Description: extracts an INTEGER value from a field in a JSON object
Example: JSON_INTEGER(top_scores,"test_scores.2")

Function: JSON_LONG
Description: extracts a LONG value from a field in a JSON object
Example: JSON_LONG(top_scores,"test_scores.2")

Function: JSON_STRING
Description: extracts a STRING value from a field in a JSON object
Example: JSON_STRING(misc,"hobbies.0")

Function: LENGTH
Description: returns the count of characters in a string value
Example: LENGTH(name)

Function: REGEX
Description: performs a whole string match against a string value with a regular expression and returns the portion of the string matching the first capturing group of the regular expression
Example: REGEX(weblog.request_line,"GET\s/([a-zA-Z0-9._%-]+\.[html])\sHTTP/[0-9.]+")

Function: REGEX_REPLACE
Description: evaluates a string value against a regular expression to determine if there is a match, and replaces matched strings with the specified replacement value
Example: REGEX_REPLACE(phone_number,"([0-9]{3})\.([[0-9]]{3})\.([[0-9]]{4})","\($1\) $2-$3")

Function: SPLIT
Description: breaks down a delimited input string into sections and returns the specified section of the string
Example: SPLIT("Restaurants>Location>San Francisco",">", -1) returns San Francisco

Function: SUBSTRING
Description: returns the specified characters of a string value based on the given start and end position
Example: SUBSTRING(name,0,1)

Function: TO_LOWER
Description: converts all alphabetic characters in a string to lower case
Example: TO_LOWER("123 Main Street") returns 123 main street

Function: TO_UPPER
Description: converts all alphabetic characters in a string to upper case
Example: TO_UPPER("123 Main Street") returns 123 MAIN STREET

Function: TRIM
Description: removes leading and trailing spaces from a string value
Example: TRIM(area_code)

Function: XPATH_STRING
Description: takes an XML-formatted string and returns the first string matching the given XPath expression
Example: XPATH_STRING(address,"//address[@type='home']/zipcode")

Function: XPATH_STRINGS
Description: takes an XML-formatted string and returns a newline-separated array of strings matching the given XPath expression
Example: XPATH_STRINGS(address,"/list/address[1]/street")

Function: XPATH_XML
Description: takes an XML-formatted string and returns an XML-formatted string matching the given XPath expression
Example: XPATH_XML(address,"//address[last()]")

Date and Time Functions

Date and time functions allow you to manipulate and transform datetime values, such as calculating time differences between two datetime values, or extracting a portion of a datetime value.
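As a rough analogy for the two kinds of operation just mentioned (time differences and extracting a portion of a datetime value), the following Python sketch uses only the standard datetime module. It is not Platfora code and does not claim to reproduce Platfora's exact rounding or argument order; the variable names and timestamps are hypothetical.

```python
from datetime import datetime

# Hypothetical order and ship timestamps, for illustration only.
order_date = datetime(2015, 6, 1, 23, 30)
ship_date = datetime(2015, 6, 3, 0, 15)

# Difference in whole calendar days, ignoring the time of day.
days_apart = (ship_date.date() - order_date.date()).days

# Extract one portion (the hour) of a datetime value.
order_hour = order_date.hour

# Truncate a datetime value to the start of its day.
order_day = order_date.replace(hour=0, minute=0, second=0, microsecond=0)

print(days_apart, order_hour, order_day.isoformat())
```

Comparing calendar dates rather than elapsed seconds is what makes the day count ignore the time of day: the two timestamps here are only about 24 hours apart, yet they fall two calendar days apart.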
Function: DAYS_BETWEEN
Description: calculates the whole number of days (ignoring time) between two DATETIME values
Example: DAYS_BETWEEN(ship_date,order_date)

Function: DATE_ADD
Description: adds the specified time interval to a DATETIME value
Example: DATE_ADD(invoice_date,45,"day")

Function: HOURS_BETWEEN
Description: calculates the whole number of hours (ignoring minutes, seconds, and milliseconds) between two DATETIME values
Example: HOURS_BETWEEN(NOW(),impressions.adview_timestam

Function: EXTRACT
Description: returns the specified portion of a DATETIME value
Example: EXTRACT("hour",order_date)

Function: MILLISECONDS_BETWEEN
Description: calculates the whole number of milliseconds between two DATETIME values
Example: MILLISECONDS_BETWEEN(request_timestamp,response_

Function: MINUTES_BETWEEN
Description: calculates the whole number of minutes (ignoring seconds and milliseconds) between two DATETIME values
Example: MINUTES_BETWEEN(impression_timestamp,conversion_t

Function: NOW
Description: returns the current system date and time as a DATETIME value
Example: YEAR_DIFF(NOW(),users.birthdate)

Function: SECONDS_BETWEEN
Description: calculates the whole number of seconds (ignoring milliseconds) between two DATETIME values
Example: SECONDS_BETWEEN(impression_timestamp,conversion_

Function: TRUNC
Description: truncates a DATETIME value to the specified format
Example: TRUNC(TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"),"day")

Function: YEAR_DIFF
Description: calculates the fractional number of years between two DATETIME values
Example: YEAR_DIFF(NOW(),users.birthdate)

URL Functions

URL functions allow you to extract different portions of a URL string, and decode text that is URL-encoded.
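To make "portions of a URL string" concrete, here is a minimal Python sketch built on the standard urllib.parse module. It is an analogy rather than Platfora code: the URL, the url_parts helper, and the dictionary keys are assumptions chosen for illustration, and Platfora's own functions may treat edge cases differently.

```python
from urllib.parse import urlsplit, unquote_plus

# Hypothetical URL chosen for illustration only; it does not come from the guide.
url = "http://user:secret@example.com:8012/news.php?topic=press#Platfora%20News"

def url_parts(u):
    """Split a URL into its commonly named portions (hypothetical helper)."""
    p = urlsplit(u)
    return {
        "protocol": p.scheme,     # URI scheme name, e.g. http
        "authority": p.netloc,    # user info, host, and port together
        "host": p.hostname,       # host, domain, or IP address
        "port": p.port,           # port number as an integer
        "path": p.path,
        "query": p.query,         # text after ? and before #
        "fragment": p.fragment,   # text after #, left URL-encoded
    }

parts = url_parts(url)

# Decode text encoded with the application/x-www-form-urlencoded media type.
decoded = unquote_plus("N%2FA%20or%20%22not%20applicable%22")

print(parts)
print(decoded)
```

Note that splitting a URL does not decode it: the fragment stays as Platfora%20News until it is passed through a separate decoding step.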
Function: URL_AUTHORITY
Description: returns the authority portion of a URL string
Example: URL_AUTHORITY("http://user:[email protected]:8012/mypage.html") returns user:[email protected]:8012

Function: URL_FRAGMENT
Description: returns the fragment portion of a URL string
Example: URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News") returns Platfora%20News

Function: URL_HOST
Description: returns the host, domain, or IP address portion of a URL string
Example: URL_HOST("http://user:[email protected]:8012/mypage.html") returns mycompany.com

Function: URL_PATH
Description: returns the path portion of a URL string
Example: URL_PATH("http://platfora.com/company/contact.html") returns /company/contact.html

Function: URL_PORT
Description: returns the port portion of a URL string
Example: URL_PORT("http://user:[email protected]:8012/mypage.html") returns 8012

Function: URL_PROTOCOL
Description: returns the protocol (or URI scheme name) portion of a URL string
Example: URL_PROTOCOL("http://www.platfora.com") returns http

Function: URL_QUERY
Description: returns the query portion of a URL string
Example: URL_QUERY("http://platfora.com/news.php?topic=press&timeframe=today") returns topic=press&timeframe=today

Function: URLDECODE
Description: decodes a string that has been encoded with the application/x-www-form-urlencoded media type
Example: URLDECODE("N%2FA%20or%20%22not%20applicable%22")

IP Address Functions

IP address functions allow you to manipulate and transform STRING data consisting of IP address values.

Function: CIDR_MATCH
Description: compares two STRING arguments representing a CIDR mask and an IP address, and returns 1 if the IP address falls within the specified subnet mask or 0 if it does not
Example: CIDR_MATCH("60.145.56.0/24","60.145.56.246") returns 1

Function: HEX_TO_IP
Description: converts a hexadecimal-encoded STRING to a text representation of an IP address
Example: HEX_TO_IP(AB20FE01) returns 171.32.254.1

Math Functions

Math functions allow you to perform basic math calculations on numeric values.
You can also use the arithmetic operators to perform simple math calculations, such as addition, subtraction, division, and multiplication.

Function: DIV
Description: divides two LONG values and returns a quotient value of type LONG
Example: DIV(TO_LONG(file_size),1024)

Function: EXP
Description: raises the mathematical constant e to the power (exponent) of a numeric value and returns a value of type DOUBLE
Example: EXP(Value)

Function: FLOOR
Description: returns the largest integer that is less than or equal to the input argument
Example: FLOOR(32.6789) returns 32.0

Function: HASH
Description: evenly partitions data values into the specified number of buckets
Example: HASH(username,20)

Function: LN
Description: returns the natural logarithm of a number
Example: LN(2.718281828) returns 1

Function: MOD
Description: divides two LONG values and returns the remainder value of type LONG
Example: MOD(TO_LONG(file_size),1024)

Function: POW
Description: raises a numeric value to the power (exponent) of another numeric value and returns a value of type DOUBLE
Example: 100 * POW(end_value/start_value, 0.2) - 1

Function: ROUND
Description: rounds a DOUBLE value to the specified number of decimal places
Example: ROUND(32.4678954,2) returns 32.47

Data Type Conversion Functions

Data type conversion functions allow you to cast data values from one data type to another. These functions are used implicitly whenever you set the data type of a field or column in the Platfora user interface.
The supported data types are: INTEGER, LONG, DOUBLE, FIXED, DATETIME, and STRING.

Function: EPOCH_MS_TO_DATE
Description: converts LONG values to DATETIME values, where the input number represents the number of milliseconds since the epoch
Example: EPOCH_MS_TO_DATE(1360260240000) returns 2013-02-07T18:04:00:000Z

Function: TO_FIXED
Description: converts STRING, INTEGER, LONG, or DOUBLE values to fixed-decimal values
Example: TO_FIXED(opening_price)

Function: TO_DATE
Description: converts STRING values to DATETIME values, and specifies the format of the date and time elements in the string
Example: TO_DATE(order_date,"yyyy.MM.dd 'at' HH:mm:ss z")

Function: TO_DOUBLE
Description: converts STRING, INTEGER, LONG, or DOUBLE values to DOUBLE (decimal) values
Example: TO_DOUBLE(average_rating)

Function: TO_INT
Description: converts STRING, INTEGER, LONG, or DOUBLE values to INTEGER (whole number) values
Example: TO_INT(average_rating)

Function: TO_LONG
Description: converts STRING, INTEGER, LONG, or DOUBLE values to LONG (whole number) values
Example: TO_LONG(average_rating)

Function: TO_STRING
Description: converts values of other data types to STRING (character) values
Example: TO_STRING(sku_number)

Aggregate Functions

An aggregate function groups the values of multiple rows together based on some defined input expression. Aggregate functions return one value for a group of rows, and are only valid for defining measures in Platfora. In the dataset, measures can be defined using any of the aggregate functions. In the vizboard, only the DISTINCT, MAX, or MIN aggregate functions are allowed.
Function: AVG
Description: returns the average of all valid numeric values
Example: AVG(sale_amount)

Function: COUNT
Description: returns the number of rows in a dataset
Example: COUNT(sales.customers)

Function: COUNT_VALID
Description: returns the number of rows for which the given expression is valid
Example: COUNT_VALID(page_views)

Function: DISTINCT
Description: returns the number of distinct values for the given expression
Example: DISTINCT(user_id)

Function: MAX
Description: returns the biggest value from the given input expression
Example: MAX(sale_amount)

Function: MIN
Description: returns the smallest value from the given input expression
Example: MIN(sale_amount)

Function: SUM
Description: returns the total of all values from the given input expression
Example: SUM(sale_amount)

Function: STDDEV
Description: calculates the population standard deviation for a group of numeric values
Example: STDDEV(sale_amount)

Function: VARIANCE
Description: calculates the population variance for a group of numeric values
Example: VARIANCE(sale_amount)

ROLLUP and Window Functions

ROLLUP is a modifier to an aggregate expression that turns an aggregate into a windowed aggregate. Window functions (RANK, DENSE_RANK, and NTILE) can only be used within a ROLLUP statement. The ROLLUP statement defines the partitioning and ordering of a rowset before the associated aggregate function or window function is applied.

ROLLUP defines a window, or user-specified set of rows, within a query result set. A window function then computes a value for each row in the window. You can use window functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or top-N-per-group results.

ROLLUP statements can be specified in either the dataset or the vizboard. When using a ROLLUP in a vizboard, the measure for which you are calculating the ROLLUP must already exist in the lens you are using in the vizboard.
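The idea of computing a value for each row from a window of rows can be sketched in plain Python. This is an illustration of the general windowing concepts (a running total, and ranking with and without gaps after ties), not a description of how Platfora evaluates ROLLUP; the sales data and the rank helper are hypothetical.

```python
from itertools import accumulate

# Hypothetical per-quarter sales totals, already ordered by quarter.
sales = [("Q1", 100), ("Q2", 250), ("Q3", 250), ("Q4", 80)]
amounts = [amount for _, amount in sales]

# Running total: each row's value is the sum of all rows up to and including it.
running_total = list(accumulate(amounts))

def rank(values, dense=False):
    """Rank values in descending order; tied values share a rank.

    With dense=False a tie leaves a gap in the ranks that follow;
    with dense=True the ranks stay consecutive.
    """
    ordered = sorted(set(values), reverse=True)
    if dense:
        position = {v: i + 1 for i, v in enumerate(ordered)}
    else:
        # Rank is one more than the number of strictly larger values.
        position = {v: 1 + sum(1 for other in values if other > v) for v in ordered}
    return [position[v] for v in values]

ranks = rank(amounts)
dense_ranks = rank(amounts, dense=True)

print(running_total)
print(ranks)
print(dense_ranks)
```

The two tied quarters share rank 1 in both schemes; the difference shows only in the rows after the tie, where the gapped ranking jumps to 3 and the dense ranking continues at 2.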
Function: DENSE_RANK
Description: assigns the rank (position) of each row in a group (partition) of rows and does not skip rank numbers in the event of a tie
Example: ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

Function: NTILE
Description: divides a partitioned group of rows into the specified number of buckets, and returns the bucket number to which the current row belongs
Example: ROLLUP NTILE(100) TO () ORDER BY ([Total Records] DESC) ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Function: RANK
Description: assigns the rank (position) of each row in a group (partition) of rows and skips rank numbers in the event of a tie
Example: ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

Function: ROLLUP
Description: a modifier to an aggregate function that turns a regular aggregate function into a windowed, partitioned, or adaptive aggregate function
Example: 100 * COUNT(Flights) / ROLLUP COUNT(Flights) TO ([Departure Date])

Function: ROW_NUMBER
Description: assigns a unique, sequential number to each row in a group (partition) of rows, according to the specified ordering
Example: ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING

User Defined Functions

User defined functions (UDFs) allow you to define your own per-row processing logic, and then expose that functionality to users in the Platfora application expression builder. See User Defined Functions (UDFs) for more information.

Comparison Operators

Comparison operators are used to compare the equivalency of two expressions of the same data type. The result of a comparison expression is a Boolean value (returns 1 for true, 0 for false, or NULL for invalid). Boolean expressions are most often used to specify data processing conditions or filters.
Operator          Meaning                     Example Expression
= or ==           Equal to                    order_date = "12/22/2011"
>                 Greater than                age > 18
!>                Not greater than            age !> 8
<                 Less than                   age < 30
!<                Not less than               age !< 12
>=                Greater than or equal to    age >= 20
<=                Less than or equal to       age <= 29
<> or != or ^=    Not equal to                age <> 30

Logical Operators

Logical operators are used to define Boolean (true / false) expressions. Logical operators are used in expressions to test for a condition, and return 1 if the condition is true or 0 if it is false. Logical operators are often used in lens filters, CASE expressions, PARTITION expressions, and WHERE clauses of queries.

Operator: AND
Meaning: Test whether two conditions are true.

Operator: OR
Meaning: Test whether either of two conditions is true.

Operator: BETWEEN min_value AND max_value
Meaning: Test whether a date or numeric value is within the min and max values (inclusive).
Example: year BETWEEN 2000 AND 2012

Operator: IN(list)
Meaning: Test whether a value is within a set.
Example: product_type IN("tablet","phone","laptop")

Operator: LIKE("pattern")
Meaning: Simple inclusive case-insensitive character pattern matching. The * character matches any number of characters. The ? character matches exactly one character.
Examples:
last_name LIKE("?utch*") matches Kutcher, hutch but not Krutcher or crutch
company_name LIKE("platfora") matches Platfora or platfora

Operator: value IS NULL
Meaning: Check whether a field value or expression is null (empty).
Example: ship_date IS NULL evaluates to true when the ship_date field is empty

Operator: NOT
Meaning: Reverses the value of other operators.
Examples:
• year NOT BETWEEN 2000 AND 2012
• first_name NOT LIKE("Jo?n*") excludes John, jonny but not Jon or Joann
• Date.Weekday NOT IN("Saturday","Sunday")
• purchase_date IS NOT NULL evaluates to true when the purchase_date field is not empty

Arithmetic Operators

Arithmetic operators perform basic math operations on two expressions of the same data type resulting in a numeric value.
The plus (+) and minus (-) operators can also be used to perform arithmetic operations on DATETIME values.

Operator    Description       Example
+           Addition          amount + 10 (add 10 to the value of the amount field)
-           Subtraction       amount - 10 (subtract 10 from the value of the amount field)
*           Multiplication    amount * 100 (multiply the value of the amount field by 100)
/           Division          bytes / 1024 (divide the value of the bytes field by 1024 and return the quotient)

Comparison Operators

Comparison operators are used to compare the equivalency of two expressions of the same data type. The result of a comparison expression is a Boolean value (returns 1 for true, 0 for false, or NULL for invalid). Boolean expressions are most often used to specify data processing conditions or filter criteria.

Operator Definitions

Operator          Meaning                     Example Expression
= or ==           Equal to                    order_date = "12/22/2011"
>                 Greater than                age > 18
!>                Not greater than            age !> 8
<                 Less than                   age < 30
!<                Not less than               age !< 12
>=                Greater than or equal to    age >= 20
<=                Less than or equal to       age <= 29
<> or != or ^=    Not equal to                age <> 30

If you are writing queries with REST and the query string includes an = (equal) character, you must URL encode it as %3D. Failure to encode the character can result in this error:

string matching regex `(?i)\Qnot\E\b' expected but end of source found.

Logical Operators

Logical operators are used to define Boolean (true / false) expressions. Logical operators are used in expressions to test for a condition, and return 1 if the condition is true or 0 if it is false. Logical operators are often used in lens filters, CASE expressions, PARTITION expressions, and WHERE clauses of queries.

Operator: AND
Meaning: Test whether two conditions are true.

Operator: OR
Meaning: Test whether either of two conditions is true.
BETWEEN min_value AND max_value
    Test whether a date or numeric value is within the min and max values (inclusive).
    Example: year BETWEEN 2000 AND 2012

IN(list)
    Test whether a value is within a set.
    Example: product_type IN("tablet","phone","laptop")

LIKE("pattern")
    Simple inclusive case-insensitive character pattern matching. The * character matches any number of characters. The ? character matches exactly one character.
    Examples:
    • last_name LIKE("?utch*") matches Kutcher, hutch but not Krutcher or crutch
    • company_name LIKE("platfora") matches Platfora or platfora

value IS NULL
    Check whether a field value or expression is null (empty).
    Example: ship_date IS NULL evaluates to true when the ship_date field is empty

NOT
    Reverses the value of other operators.
    Examples:
    • year NOT BETWEEN 2000 AND 2012
    • first_name NOT LIKE("Jo?n*") excludes John, jonny but not Jon or Joann
    • Date.Weekday NOT IN("Saturday","Sunday")
    • purchase_date IS NOT NULL evaluates to true when the purchase_date field is not empty

Arithmetic Operators

Arithmetic operators perform basic math operations on two expressions of the same data type, resulting in a numeric value. The plus (+) and minus (-) operators can also be used to perform arithmetic operations on DATETIME values.

Operator    Description       Example
+           Addition          amount + 10 (add 10 to the value of the amount field)
-           Subtraction       amount - 10 (subtract 10 from the value of the amount field)
*           Multiplication    amount * 100 (multiply the value of the amount field by 100)
/           Division          bytes / 1024 (divide the value of the bytes field by 1024 and return the quotient)

Conditional and NULL Processing

Conditional and NULL processing allows you to transform or manipulate data values based on certain defined conditions. Conditional processing (CASE) can be done at either the dataset or vizboard level. NULL processing (COALESCE and IS_VALID) is only applicable at the dataset level.
During a lens build, any NULL values in the source data are converted to default values, so lenses and vizboards have no concept of NULL values.

CASE

CASE is a row function that evaluates each row in the dataset according to one or more input conditions, and outputs the specified result when the input conditions are met.

    CASE WHEN input_condition [AND|OR input_condition] THEN output_expression
    [...]
    [ELSE other_output_expression]
    END

Returns one value per row of the same type as the output expression. All output expressions must return the same data type. If there are multiple output expressions that return different data types, then you will need to enclose your entire CASE expression in one of the data type conversion functions to explicitly cast all output values to a particular data type.

WHEN input_condition
    Required. The WHEN keyword is used to specify one or more Boolean expressions (see Platfora's supported conditional operators). If an input value meets the condition, then the output expression is applied. Input conditions can include other row functions in their expression, but cannot contain aggregate functions or measure expressions. You can use the AND or OR keywords to combine multiple input conditions.

THEN output_expression
    Required. The THEN keyword is used to specify an output expression when the specified conditions are met. Output expressions can include other row functions in their expression, but cannot contain aggregate functions or measure expressions.

ELSE other_output_expression
    Optional. The ELSE keyword can be used to specify an alternate output expression to use when the specified conditions are not met. If an ELSE expression is not supplied, ELSE NULL is the default.

END
    Required. Denotes the end of CASE function processing.
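The evaluation order of a CASE expression can be illustrated with a small Python sketch. This is not Platfora code: the helper names case and age_group are hypothetical, and the sketch assumes that WHEN conditions are tested in the order written and that the first true condition wins, as in standard SQL. A missing ELSE is modeled as None, standing in for ELSE NULL.

```python
def case(branches, default=None):
    """Emulate CASE: branches is a list of (condition, output) pairs in WHEN order.

    Assumption: the first true condition determines the result.
    default plays the role of the ELSE expression (None stands in for NULL).
    """
    for condition, output in branches:
        if condition:
            return output
    return default


def age_group(age):
    # Equivalent in spirit to:
    # CASE WHEN age <= 25 THEN "0-25" WHEN age <= 50 THEN "26-50" ELSE "over 50" END
    return case([(age <= 25, "0-25"), (age <= 50, "26-50")], default="over 50")
```

An age of 20 satisfies both conditions, so the order of the WHEN clauses decides which label is returned. This is why range-based groupings list the narrowest range first.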
Convert values in the age column into range-based groupings (binning):

    CASE WHEN age <= 25 THEN "0-25"
         WHEN age <= 50 THEN "26-50"
         ELSE "over 50" END

Transform values in the gender column from one string to another:

    CASE WHEN gender = "M" THEN "Male"
         WHEN gender = "F" THEN "Female"
         ELSE "Unknown" END

The vehicle column contains the following values: truck, bus, car, scooter, wagon, bike, tricycle, and motorcycle. The following example converts multiple values in the vehicle column into a single value:

    CASE WHEN vehicle in ("bike","scooter","motorcycle") THEN "two-wheelers"
         ELSE "other" END

COALESCE

COALESCE is a row function that returns the first valid value (NOT NULL value) from a comma-separated list of expressions.

    COALESCE(expression[,expression][,...])

Returns one value per row of the same type as the first valid input expression.

expression
    At least one required. A field name or expression.

The following example shows an expression to calculate employee yearly income for exempt employees that have a salary and non-exempt employees that have an hourly_wage. This expression checks the values of both fields for each row, and returns the value of the first expression that is valid (NOT NULL).

    COALESCE(hourly_wage * 40 * 52, salary)

IS_VALID

IS_VALID is a row function that returns 0 if the returned value is NULL, and 1 if the returned value is NOT NULL. This is useful for computing other calculations where you want to exclude NULL values (such as when computing averages).

    IS_VALID(expression)

Returns 0 if the returned value is NULL, and 1 if the returned value is NOT NULL.

expression
    Required. A field name or expression.

Define a computed field using IS_VALID. This returns 1 for each row where the field value is NOT NULL and 0 for each row where it is NULL, so summing the computed field counts only the rows that have a value. In this example, we create a computed field (sale_amount_not_null) using the sale_amount field as the basis.
    IS_VALID(sale_amount)

Then you can use the sale_amount_not_null computed field to calculate an accurate average for sale_amount that excludes NULL values:

    SUM(sale_amount)/SUM(sale_amount_not_null)

This is what happens automatically when you use the AVG function.

Event Series Processing

Event series processing allows you to partition rows of input data, order the rows sequentially (typically by a timestamp), and search for matching patterns in a set of rows. Computed fields that are defined in a dataset using a PARTITION expression are considered event series processing computed fields.

Event series processing computed fields are processed differently than regular computed fields. Instead of computing values from the input of a single row, they compute values from inputs of multiple rows in the dataset. Event series processing computed fields can only be defined in the dataset, not in the vizboard or a lens query.

PARTITION

PARTITION is an event series processing language that partitions the rows of a dataset, orders the rows sequentially (typically by a timestamp), and searches for matching patterns in a set of rows. Computed fields that are defined in a dataset using a PARTITION expression are considered event series processing computed fields. Event series processing computed fields are processed differently than regular computed fields. Instead of computing values from the input of a single row, they compute values from inputs of multiple rows in the dataset.

The PARTITION function can only be used to define a computed field in the dataset definition (pre-lens build). PARTITION cannot be used to define a vizboard computed field. Unlike other expressions, a PARTITION expression cannot be embedded within other functions or expressions; it must be a top-level expression.
    PARTITION BY field_name
    ORDER BY field_name [ASC|DESC]
    PATTERN (pattern_expression)
    DEFINE symbol_1 AS filter_expression
        [,symbol_n AS filter_expression ]
        [, ...]
    OUTPUT output_expression

To understand how event series processing works, we'll walk through a simple example of a PARTITION expression.

Consider some weblog page view data. Each row represents a page view by a user at a given point in time. Session IDs are used to group together page views that happened in the same user session.

Suppose you wanted to know how many sessions included the path of page visits to 'home.html' then 'product.html' then 'checkout.html'. You could define a PARTITION expression that groups the rows by session, orders by time, and then iterates through the rows from top to bottom to find sessions that match the pattern:

    PARTITION BY SessionID
    ORDER BY Timestamp
    PATTERN (A,B,C)
    DEFINE A AS Page = "home.html",
        B AS Page = "product.html",
        C AS Page = "checkout.html"
    OUTPUT "TRUE"

1. The PARTITION BY clause partitions (or groups) the rows of the dataset by session.
2. Within each partition, the ORDER BY clause sorts the rows by time (in ascending order by default).
3. Each DEFINE clause specifies a condition used to evaluate a row, and binds that condition to a symbol that is then used in the PATTERN clause.
4. The PATTERN clause checks if the conditions are met in the specified order and frequency. This pattern says that there is a match whenever there are 3 consecutive rows that meet criteria A then B then C.
5. For a row that satisfies all of the PATTERN criteria, the value of the OUTPUT clause is applied. Otherwise the output is NULL for rows that don't meet all of the PATTERN criteria.

Returns one value per row of the same type as the output_expression for rows that match the defined match pattern, otherwise returns NULL for rows that do not match the pattern.
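The five steps of the walk-through can be approximated in plain Python to make the data flow concrete. This is only a sketch under stated assumptions, not how Platfora evaluates PARTITION expressions: the function name match_path and the sample rows are hypothetical, only the fixed pattern (A,B,C) is handled, and the output value is assumed to be attached to the row that completes the match (the row bound to symbol C). None stands in for NULL.

```python
from itertools import groupby

# Symbols A, B, C from the DEFINE clause, in PATTERN order.
PATH = ("home.html", "product.html", "checkout.html")

# Hypothetical sample rows; the field names follow the walk-through.
SAMPLE_ROWS = [
    {"SessionID": 1, "Timestamp": 1, "Page": "home.html"},
    {"SessionID": 2, "Timestamp": 1, "Page": "home.html"},
    {"SessionID": 1, "Timestamp": 2, "Page": "product.html"},
    {"SessionID": 2, "Timestamp": 2, "Page": "checkout.html"},
    {"SessionID": 1, "Timestamp": 3, "Page": "checkout.html"},
]


def match_path(rows, path=PATH):
    """Return one output value per input row ("TRUE" or None)."""
    output = [None] * len(rows)
    # Steps 1 and 2: sort by (SessionID, Timestamp) so that groupby yields
    # one time-ordered partition per session. Original row indexes are kept.
    ordered = sorted(
        enumerate(rows),
        key=lambda item: (item[1]["SessionID"], item[1]["Timestamp"]),
    )
    for _, group in groupby(ordered, key=lambda item: item[1]["SessionID"]):
        partition = list(group)
        # Steps 3 and 4: slide a window over consecutive rows and test A, B, C.
        for start in range(len(partition) - len(path) + 1):
            window = partition[start:start + len(path)]
            if all(row["Page"] == page for (_, row), page in zip(window, path)):
                # Step 5 (assumption): apply OUTPUT to the row completing the match.
                output[window[-1][0]] = "TRUE"
    return output
```

With the sample rows, only session 1 visits the three pages consecutively, so only its checkout row receives "TRUE". Session 2 skips the product page and produces None for every row.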
Output values are calculated during the lens build process using a special event series MapReduce job. Therefore, sample output values for a PARTITION computed field cannot be shown in the dataset workspace.

PARTITION BY field_name
    Required. The PARTITION BY clause is used to specify a field in the current dataset by which to partition the rows. Rows that share the same value for this field will be grouped together, and each group will then be processed independently according to the matching pattern criteria. The partition field cannot be a field of a referenced dataset; it must be a field in the current focus dataset.

ORDER BY field_name
    Optional. The ORDER BY clause specifies a field by which to sort the rows within each partition before applying the match pattern criteria. For event series processing, records are typically ordered by a DATETIME type field, such as a date or a timestamp. The default sort order is ascending (first to last or low to high). The ordering field cannot be a field of a referenced dataset; it must be a field in the current focus dataset.

PATTERN (pattern_expression)
    Required. The PATTERN clause specifies the matching pattern to search for within a partition of rows. The pattern_expression is expressed in a format similar to a regular expression. The pattern_expression can include:
    • A symbol that represents some match criteria (as declared in the DEFINE clause).
    • A symbol followed by one of the following regex quantifiers:
        ? (matches once or not at all - greedy construct)
        ?? (matches once or not at all - reluctant construct)
        * (matches zero or more times - greedy construct)
        *? (matches zero or more times - reluctant construct)
        + (matches one or more times - greedy construct)
        +? (matches one or more times - reluctant construct)
        ** (matches the empty sequence, or one or more of the quantified symbol, with gaps allowed in between.
        The match need not begin or end with the quantified symbol)
        *+ (matches the empty sequence, or one or more of the quantified symbol, with gaps allowed in between. The match must end with the quantified symbol)
        ++ (matches the quantified symbol, followed by zero or more of the quantified symbol, with gaps allowed in between. The match must end with the quantified symbol)
        +* (matches the quantified symbol, followed by zero or more of the quantified symbol, with gaps allowed in between. The match need not end with the quantified symbol)
    • A symbol or pattern of symbols anchored by the regex special character for the beginning of string:
        ^ (marks the beginning of the set of rows that match to the pattern)
    • patternA|patternB - The alternation operator (pipe symbol) between two symbols or patterns signifies an OR match.
    • patternA,patternB - The concatenation operator (comma) between two symbols or patterns signifies a match when pattern B immediately follows pattern A.
    • patternA->patternB - The follows operator (minus and greater-than sign) between two symbols or patterns signifies a match when pattern B eventually follows pattern A.
    • (pattern_expression) - By default, pattern expressions are matched from left to right. If parentheses are used to group sub-expressions, the sub-expression within the parentheses is evaluated first.
    You cannot use quantifiers outside of parentheses. For example, you cannot write ((A,B,C)*) to indicate that the asterisk quantifier applies to the whole (A,B,C) expression.

DEFINE symbol AS filter_expression
    Required. The DEFINE clause is used to enumerate symbols used in the PATTERN clause (or in the filter_expression of a subsequent symbol definition). A symbol is a name used to refer to some pattern matching criteria. This can be any name or token that follows Platfora's object naming rules.
    For example, if the name contains spaces, special characters, keywords, or starts with a number, you must enclose the name in brackets [] to escape it. Otherwise, this can be any logical name that helps you identify a piece of pattern matching logic in your expression.

    The filter_expression is a Boolean (true or false) expression that operates on each row of the partition. A filter_expression can contain:
    • The special expression TRUE or 1, meaning allow the match to occur for any row in the partition.
    • Any field_name in the current dataset.
    • symbol.field_name - A field from the dataset qualified by the name of a symbol that (1) appears only once in the PATTERN clause, (2) precedes this symbol in the PATTERN clause, and (3) is not followed by a repetition quantifier in the PATTERN clause. For example:

        PATTERN (A, B) DEFINE A AS TRUE, B AS product = A.product

      This means that the expression for symbol B will match a row if the product field for that row is equal to the product field for the row that is bound to symbol A.
    • Any of the comparison operators, such as greater than, less than, equals, and so on.
    • The keywords AND or OR (for combining multiple criteria in a single filter expression).
    • FIRST|LAST(symbol.field_name) - A field from the dataset, qualified by the name of a symbol that (1) appears only once in the PATTERN clause, (2) precedes this symbol in the PATTERN clause, and (3) is followed by a repetition quantifier in the PATTERN clause (*, *?, +, or +?). This returns the field value for the first or last row when the pattern matches a set of rows. For example:

        PATTERN (A+) DEFINE A AS product = FIRST(A.product) OR COUNT(A)=0

      The pattern A+ will match a series of consecutive rows that all have the same value for the product field as the first row in the sequence. If the current row happens to be the first row in the sequence, then it will also be included in the match.
      A FIRST or LAST expression evaluates to NULL if it refers to a symbol that ends up matching an empty sequence. Make sure your expression handles the row at the beginning or end of a sequence if you want that row to match as well.
    • Any computed expression that operates on the fields or expressions listed above and/or on literal values.

OUTPUT output_expression
    Required. An expression that specifies what the output value should be. The output expression can refer to:
    • The field declared in the PARTITION BY clause.
    • symbol.field_name - A field from the dataset, qualified by the name of a symbol that (1) appears only once in the PATTERN clause, and (2) is not followed by a repetition quantifier in the PATTERN clause. This will output the matching field value.
    • COUNT(symbol) where symbol (1) appears only once in the PATTERN clause, and (2) is followed by a repetition quantifier in the PATTERN clause. This will output the sequence number of the row that matched the symbol pattern.
    • FIRST | LAST | SUM | COUNT | AVG(symbol.field_name) where symbol (1) appears only once in the PATTERN clause, and (2) is followed by a repetition quantifier in the PATTERN clause. This will output an aggregated value for a set of rows that matched the symbol pattern.
    • Since you can only output a single column value, you can use the PACK_VALUES function to output multiple results in a single column as key/value pairs.

'Session Start Time' Expression

Calculate a user session by partitioning by user and ordering by time. The matching logic represented by symbol A checks if the time of the current row is less than 30 minutes from the preceding row. If it is, then it is considered part of the same session as the previous row. Otherwise, the current row is considered the start of a new session. The PATTERN (A+) means that the matching logic represented by symbol A must be true for one or more consecutive rows.
The output then returns the time of the first row in a session.

    PARTITION BY UserID
    ORDER BY Timestamp
    PATTERN (A+)
    DEFINE A AS COUNT(A)=0 OR MINUTES_BETWEEN(Timestamp,LAST(A.Timestamp)) < 30
    OUTPUT FIRST(A.Timestamp)

'Click Number in Session' Expression

Calculate where a click happened in a session by partitioning by session and ordering by time. The matching logic represented by symbol A simply matches any row in the session. The PATTERN (A+) means that the matching logic represented by symbol A must be true for one or more consecutive rows. The output then returns the count of the row within the partition (based on its order or position in the partition).

    PARTITION BY [Session ID]
    ORDER BY Timestamp
    PATTERN (A+)
    DEFINE A AS TRUE
    OUTPUT COUNT(A)

'Path to Page' Expression

This is a complicated expression that looks back from the current row's position to determine the previous 4 pages viewed in a session. Since a PARTITION expression can only output one column value as its result, the OUTPUT clause uses the PACK_VALUES function to return the previous page positions 1, 2, 3, and 4 in one output value. You can then use a series of EXTRACT_VALUE expressions to create individual columns for each prior page view in the path.

    PARTITION BY SessionID
    ORDER BY Timestamp
    PATTERN (^OtherPreviousPages*?, Page4Back??, Page3Back??, Page2Back??, Page1Back??, CurrentPage)
    DEFINE OtherPreviousPages AS TRUE,
        Page4Back AS TRUE,
        Page3Back AS TRUE,
        Page2Back AS TRUE,
        Page1Back AS TRUE,
        CurrentPage AS TRUE
    OUTPUT PACK_VALUES("Back4",Page4Back.Page, "Back3",Page3Back.Page, "Back2",Page2Back.Page, "Back1",Page1Back.Page)

'Page -1 Back' Expression

Use the output from the Path to Page expression and extract the last page viewed before the current page.
    EXTRACT_VALUE([Path to Page],"Back1")

PACK_VALUES

PACK_VALUES is a row function that returns multiple output values packed into a single string of key/value pairs separated by the Platfora default key and pair separators. This is useful when the OUTPUT clause of a PARTITION expression returns multiple output values. The string returned is in a format that can be read by the EXTRACT_VALUE function. PACK_VALUES uses the same key and pair separator values that EXTRACT_VALUE uses (the Unicode escape sequences \u0003 and \u0002, respectively).

    PACK_VALUES(key_string,value_expression[,key_string,value_expression][,...])

Returns one value per row of type STRING. If the value for either key_string or value_expression of a pair is null or contains either of the two separators, the full key/value pair is omitted from the return value.

key_string
    At least one required. A field name of any type, a literal string or number, or an expression that returns any value.

value_expression
    At least one required. A field name of any type, a literal string or number, or an expression that returns any value. The expression must include one value_expression instance for each key_string instance.

Combine the values of the custid and age fields into a single string field.
    PACK_VALUES("ID",custid,"Age",age)

The following expression returns ID\u00035555\u0002Age\u000329 when the value of the custid field is 5555 and the value of the age field is 29:

    PACK_VALUES("ID",custid,"Age",age)

The following expression returns Age\u000329 when the value of the age field is 29:

    PACK_VALUES("ID",NULL,"Age",age)

The following expression returns 29 as a STRING value when the age field is an INTEGER and its value is 29:

    EXTRACT_VALUE(PACK_VALUES("ID",custid,"Age",age),"Age")

You might want to use the PACK_VALUES function to combine multiple field values into a single value in the OUTPUT clause of the PARTITION (event series processing) function. Then you can use the EXTRACT_VALUE function in a different computed field in the dataset to get one of the values returned by the PARTITION function. In the following example, the PARTITION function creates a set of rows that defines the previous five web pages accessed in a particular user session:

    PARTITION BY Session
    ORDER BY Time DESC
    PATTERN (A?, B?, C?, D?, E)
    DEFINE A AS true, B AS true, C AS true, D AS true, E AS true
    OUTPUT PACK_VALUES("A", A.Page, "B", B.Page, "C", C.Page, "D", D.Page)

String Functions

String functions allow you to manipulate and transform textual data, such as combining string values or extracting a portion of a string value.

CONCAT

CONCAT is a row function that returns a string by concatenating (combining together) the results of multiple string expressions.

    CONCAT(value_expression[,value_expression][,...])

Returns one value per row of type STRING.

value_expression
    At least one required. A field name of any type, a literal string or number, or an expression that returns any value.

Combine the values of the month, day, and year fields into a single date field formatted as MM/DD/YYYY.
    CONCAT(month,"/",day,"/",year)

ARRAY_CONTAINS

ARRAY_CONTAINS is a row function that performs a whole string match against a string containing delimited values and returns a 1 or 0 depending on whether or not the string contains the search value.

    ARRAY_CONTAINS(array_string,"delimiter","search_string")

Returns one value per row of type INTEGER. A return value of 1 indicates a positive match, and a return value of 0 indicates no match.

array_string
    Required. The name of a field or expression of type STRING (or a literal string) that contains a valid array.

delimiter
    Required. The delimiter used between values in the array string. This can be a name of a field or expression of type STRING.

search_string
    Required. The literal string that you want to search for. This can be a name of a field or expression of type STRING.

If you had a device field that contained a comma delimited list formatted like this:

    Safari,iPad

You could determine whether or not the device used was an iPad using the following expression:

    ARRAY_CONTAINS(device,",","iPad")

The following expressions return 1:

    ARRAY_CONTAINS("platfora","|","platfora")
    ARRAY_CONTAINS("platfora|hadoop|2.3","|","hadoop")

The following expressions return 0:

    ARRAY_CONTAINS("platfora","|","plat")
    ARRAY_CONTAINS("platfora,hadoop","|","platfora")

FILE_NAME

FILE_NAME is a row function that returns the original file name from the source file system. This is useful when the source data that comprises a dataset comes from multiple files, and there is useful information in the file names themselves (such as dates or server names). You can use FILE_NAME in combination with other string processing functions to extract useful information from the file name.

    FILE_NAME()

Returns one value per row of type STRING.

Your dataset is based on daily log files that use an 8 character date as part of the file name.
For example, 20120704.log is the file name used for the log file created on July 4, 2012. The following expression uses FILE_NAME in combination with SUBSTRING and TO_DATE to create a date field from the first 8 characters of the file name.

    TO_DATE(SUBSTRING(FILE_NAME(),0,8),"yyyyMMdd")

Your dataset is based on log files that use the server IP address as part of the file name. For example, 172.12.131.118.log is the log file name for server 172.12.131.118. The following expression uses FILE_NAME in combination with REGEX to extract the IP address from the file name.

    REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")

FILE_PATH

FILE_PATH is a row function that returns the full URI path from the source file system. This is useful when the source data that comprises a dataset comes from multiple files, and there is useful information in the directory names or file names themselves (such as dates or server names). You can use FILE_PATH in combination with other string processing functions to extract useful information from the file path.

    FILE_PATH()

Returns one value per row of type STRING.

Your dataset is based on daily log files that are organized into directories by date on the source file system, and the file names are the IP address of the server that produced the log file. For example, the URI path to a log file produced by server 172.12.131.118 on July 4, 2012 is hdfs://myhdfs-server.com/data/logs/20120704/172.12.131.118.log. The following expression uses FILE_PATH in combination with REGEX and TO_DATE to create a date field from the date directory name.
    TO_DATE(REGEX(FILE_PATH(),"hdfs://myhdfs-server.com/data/logs/(\d{8})/(?:\d{1,3}\.*)+\.log"),"yyyyMMdd")

And the following expression uses FILE_NAME and REGEX to extract the server IP address from the file name:

    REGEX(FILE_NAME(),"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\.log")

EXTRACT_COOKIE

EXTRACT_COOKIE is a row function that extracts the value of the given cookie identifier from a semicolon delimited list of cookie key=value pairs. This function can be used to extract a particular cookie value from a combined web access log Cookie column.

    EXTRACT_COOKIE("cookie_list_string",cookie_key_string)

Returns the value of the specified cookie key as type STRING.

cookie_list_string
    Required. A field or literal string that has a semicolon delimited list of cookie key=value pairs.

cookie_key_string
    Required. The cookie key name for which to extract the cookie value.

Extract the value of the vID cookie from a literal cookie string:

    EXTRACT_COOKIE("SSID=ABC; vID=44", "vID") returns 44

Extract the value of the vID cookie from a field named Cookie:

    EXTRACT_COOKIE(Cookie,"vID")

EXTRACT_VALUE

EXTRACT_VALUE is a row function that extracts the value for the given key from a string containing delimited key/value pairs.

    EXTRACT_VALUE(string,key_name [,delimiter] [,pair_delimiter])

Returns the value of the specified key as type STRING.

string
    Required. A field or literal string that contains a delimited list of key/value pairs.

key_name
    Required. The key name for which to extract the value.

delimiter
    Optional. The delimiter used between the key and the value. If not specified, the value \u0003 is used. This is the Unicode escape sequence for the end of text character (which is the default delimiter used by Hive).

pair_delimiter
    Optional. The delimiter used between key/value pairs when the input string contains more than one key/value pair. If not specified, the value \u0002 is used.
    This is the Unicode escape sequence for the start of text character (which is the default delimiter used by Hive).

Extract the value of the lastname key from a literal string of key/value pairs:

    EXTRACT_VALUE("firstname;daria|lastname;hutch","lastname",";","|") returns hutch

Extract the value of the email key from a string field named contact_info that contains strings in the format of key:value,key:value:

    EXTRACT_VALUE(contact_info,"email",":",",")

INSTR

INSTR is a row function that returns an integer indicating the position, within a string, of the first character of an occurrence of a substring. Platfora's INSTR function is similar to the FIND function in Excel, except that the first letter is position 0 and the order of the arguments is reversed.

    INSTR(string,substring,position,occurrence)

Returns one value per row of type INTEGER. The first position is indicated with the value of zero (0).

string
    Required. The name of a field or expression of type STRING (or a literal string).

substring
    Required. A literal string or name of a field that specifies the substring to search for in string.

position
    Optional. An integer that specifies at which character in string to start searching for substring. A value of 0 (zero) starts the search at the beginning of string. Use a positive integer to start searching from the beginning of string, and use a negative integer to start searching from the end of string. When no position is specified, INSTR searches at the beginning of the string (0).

occurrence
    Optional. A positive integer that specifies which occurrence of substring to search for. When no occurrence is specified, INSTR searches for the first occurrence of the substring (1).
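The interaction between position and occurrence is easier to follow in code. The Python function below is a hypothetical emulation written for illustration, not Platfora's implementation. It assumes that a negative position counts back from the end of the string and makes the search run backward, that occurrences may overlap, and that None is returned when the substring is not found (this guide does not state what INSTR returns in that case).

```python
def instr(string, substring, position=0, occurrence=1):
    """Emulate INSTR with zero-based positions (illustrative sketch only)."""
    if occurrence < 1:
        raise ValueError("occurrence must be a positive integer")
    found = None
    if position >= 0:
        # Forward search: start at position and step past each hit.
        start = position
        for _ in range(occurrence):
            found = string.find(substring, start)
            if found == -1:
                return None  # assumption: no documented "not found" value
            start = found + 1
        return found
    # Backward search: a match may start no later than len(string) + position.
    start = len(string) + position
    for _ in range(occurrence):
        if start < 0:
            return None
        found = string.rfind(substring, 0, start + len(substring))
        if found == -1:
            return None
        start = found - 1
    return found
```

Under these assumptions, instr("bestteststring", "st", 0, 2) evaluates to 6 and instr("bestteststring", "st", -7, 2) evaluates to 2, which agrees with the values this guide gives for the equivalent INSTR expressions.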
Return the position of the first occurrence of the substring "http://" starting at the end of the url field:

    INSTR(url,"http://",-1,1)

The following expression searches for the second occurrence of the substring "st" starting at the beginning of the string "bestteststring". INSTR finds that the substring starts at the seventh character in the string, so it returns 6:

    INSTR("bestteststring","st",0,2)

The following expression searches backward for the second occurrence of the substring "st" starting at 7 characters before the end of the string "bestteststring". INSTR finds that the substring starts at the third character in the string, so it returns 2:

    INSTR("bestteststring","st",-7,2)

JAVA_STRING

JAVA_STRING is a row function that returns the unescaped version of a Java Unicode character escape sequence as a string value. This is useful when you want to specify Unicode characters in an expression. For example, you can use JAVA_STRING to specify the Unicode value representing a control character.

    JAVA_STRING(unicode_escape_sequence)

Returns the unescaped version of the specified Unicode character, one value per row of type STRING.

unicode_escape_sequence
    Required. A STRING value containing a Unicode character expressed as a Java Unicode escape sequence. Unicode escape sequences consist of a backslash '\' (ASCII character 92, hex 0x5c), a 'u' (ASCII 117, hex 0x75), optionally one or more additional 'u' characters, and four hexadecimal digits (the characters '0' through '9' or 'a' through 'f' or 'A' through 'F'). Such sequences represent the UTF-16 encoding of a Unicode character. For example, the letter 'a' is equivalent to '\u0061'.

Evaluates whether the currency field is equal to the yen symbol.
    CASE WHEN currency == JAVA_STRING("\u00a5") THEN "yes" ELSE "no" END

JOIN_STRINGS

JOIN_STRINGS is a row function that returns a string by concatenating (combining together) the results of multiple values with the separator in between each non-null value.

    JOIN_STRINGS(separator,value_expression[,value_expression][,...])

Returns one value per row of type STRING.

separator
    Required. A field name of type STRING, a literal string, or an expression that returns a string.

value_expression
    At least one required. A field name of any type, a literal string or number, or an expression that returns any value.

Combine the values of the month, day, and year fields into a single date field formatted as MM/DD/YYYY.

    JOIN_STRINGS("/",month,day,year)

The following expression returns NULL:

    JOIN_STRINGS("+",NULL,NULL,NULL)

The following expression returns a+b:

    JOIN_STRINGS("+","a","b",NULL)

JSON_ARRAY_CONTAINS

JSON_ARRAY_CONTAINS is a row function that performs a whole string match against a string formatted as a JSON array and returns a 1 or 0 depending on whether or not the string contains the search value.

    JSON_ARRAY_CONTAINS(json_array_string,"search_string")

Returns one value per row of type INTEGER. A return value of 1 indicates a positive match, and a return value of 0 indicates no match.

json_array_string
    Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON array. A JSON array is an ordered sequence of values separated by commas and enclosed in square brackets.

search_string
    Required. The literal string that you want to search for. This can be a name of a field or expression of type STRING.
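The phrase "whole string match" means that the search value must equal an entire array element, not just part of one. The short Python sketch below illustrates that behavior with the standard json module. It is a hypothetical emulation, not Platfora's implementation, and it assumes that input which is not a valid JSON array simply yields 0.

```python
import json


def json_array_contains(json_array_string, search_string):
    """Return 1 if search_string equals a whole element of the JSON array, else 0."""
    try:
        values = json.loads(json_array_string)
    except (TypeError, ValueError):
        return 0  # assumption: invalid JSON counts as no match
    if not isinstance(values, list):
        return 0  # assumption: a JSON object or scalar is not an array
    return 1 if search_string in values else 0
```

For a value such as ["hadoop","platfora"], searching for "platfora" gives 1, while searching for the partial string "plat" gives 0.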
If you have a software field that contains a JSON array formatted like this:
["hadoop","platfora"]
The following expression returns 1:
JSON_ARRAY_CONTAINS(software,"platfora")

JSON_DOUBLE
JSON_DOUBLE is a row function that extracts a DOUBLE value from a field in a JSON object.
JSON_DOUBLE(json_string,"json_field")
Returns one value per row of type DOUBLE.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].

If you had a top_scores field that contained a JSON object formatted like this (with the values contained in an array):
{"practice_scores":["538.67","674.99","1021.52"], "test_scores":["753.21","957.88","1032.87"]}
You could extract the third value of the test_scores array using the expression:
JSON_DOUBLE(top_scores,"test_scores.2")

JSON_FIXED
JSON_FIXED is a row function that extracts a FIXED value from a field in a JSON object.
JSON_FIXED(json_string,"json_field")
Returns one value per row of type FIXED.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].

If you had a top_scores field that contained a JSON object formatted like this (with the values contained in an array):
{"practice_scores":["538.67","674.99","1021.52"], "test_scores":["753.21","957.88","1032.87"]}
You could extract the third value of the test_scores array using the expression:
JSON_FIXED(top_scores,"test_scores.2")

JSON_INTEGER
JSON_INTEGER is a row function that extracts an INTEGER value from a field in a JSON object.
JSON_INTEGER(json_string,"json_field")
Returns one value per row of type INTEGER.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].

If you had an address field that contained a JSON object formatted like this:
{"street_address":"123 B Street", "city":"San Mateo", "state":"CA", "zip_code":"94403"}
You could extract the zip_code value using the expression:
JSON_INTEGER(address,"zip_code")

If you had a top_scores field that contained a JSON object formatted like this (with the values contained in an array):
{"practice_scores":["538","674","1021"], "test_scores":["753","957","1032"]}
You could extract the third value of the test_scores array using the expression:
JSON_INTEGER(top_scores,"test_scores.2")

JSON_LONG
JSON_LONG is a row function that extracts a LONG value from a field in a JSON object.
JSON_LONG(json_string,"json_field")
Returns one value per row of type LONG.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].

If you had a top_scores field that contained a JSON object formatted like this (with the values contained in an array):
{"practice_scores":["538","674","1021"], "test_scores":["753","957","1032"]}
You could extract the third value of the test_scores array using the expression:
JSON_LONG(top_scores,"test_scores.2")

JSON_STRING
JSON_STRING is a row function that extracts a STRING value from a field in a JSON object.
JSON_STRING(json_string,"json_field")
Returns one value per row of type STRING.
json_string
Required. The name of a field or expression of type STRING (or a literal string) that contains a valid JSON object.
json_field
Required. The key or name of the field value you want to extract.
For top-level fields, specify the name identifier (key) of the field.
To access fields within a nested object, specify a dot-separated path of field names (for example top_level_field_name.nested_field_name).
To extract a value from an array, specify the dot-separated path of field names and the array position starting at 0 for the first value in an array, 1 for the second value, and so on (for example, field_name.0).
If the name identifier contains dot or period characters within the name itself, escape the name by enclosing it in brackets (for example, [field.name.with.dot].[another.dot.field.name]).
If the field name is null (empty), use brackets with nothing in between as the identifier, for example [].
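The json_field path rules shared by the JSON functions (dot-separated names, zero-based array positions, bracket escaping, and [] for an empty name) can be sketched in Python. This is an illustration under stated assumptions, not Platfora's implementation: the helper names split_path and json_extract are hypothetical, and returning None for a path that does not resolve is an assumption of the sketch.

```python
import json
import re

# One path segment: either a [bracketed name] (may contain dots, may be empty)
# or a plain run of characters containing no dot or bracket.
_SEGMENT = re.compile(r"\[([^\]]*)\]|([^.\[\]]+)")

def split_path(path):
    """Split a json_field path into segments (hypothetical helper)."""
    segments, pos = [], 0
    while pos < len(path):
        match = _SEGMENT.match(path, pos)
        if match is None:
            raise ValueError("cannot parse path at position %d" % pos)
        bracketed, plain = match.group(1), match.group(2)
        segments.append(bracketed if bracketed is not None else plain)
        pos = match.end()
        if pos < len(path):
            if path[pos] != ".":
                raise ValueError("expected '.' at position %d" % pos)
            pos += 1  # skip the dot separator
    return segments

def json_extract(json_string, json_field):
    """Walk the parsed JSON along the path; None if it does not resolve."""
    value = json.loads(json_string)
    for segment in split_path(json_field):
        if isinstance(value, list):
            # Array positions start at 0 for the first value.
            if not segment.isdigit() or int(segment) >= len(value):
                return None
            value = value[int(segment)]
        elif isinstance(value, dict) and segment in value:
            value = value[segment]
        else:
            return None
    return value

top_scores = '{"practice_scores":["538","674","1021"], "test_scores":["753","957","1032"]}'
print(json_extract(top_scores, "test_scores.2"))
```

In this sketch the extracted value is still the raw JSON value; converting it to DOUBLE, FIXED, INTEGER, LONG, or STRING would be a separate step for each function.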
If you had an address field that contained a JSON object formatted like this:
{"street_address":"123 B Street", "city":"San Mateo", "state":"CA", "zip":"94403"}
You could extract the state value using the expression:
JSON_STRING(address,"state")

If you had a misc field that contained a JSON object formatted like this (with the values contained in an array):
{"hobbies":["sailing","hiking","cooking"], "interests":["art","music","travel"]}
You could extract the first value of the hobbies array using the expression:
JSON_STRING(misc,"hobbies.0")

LENGTH
LENGTH is a row function that returns the count of characters in a string value.
LENGTH(string)
Returns one value per row of type INTEGER.
string
Required. The name of a field or expression of type STRING (or a literal string).

Return the count of characters from values in the name field. For example, the value Bob would return a length of 3, Julie would return a length of 5, and so on:
LENGTH(name)

REGEX
REGEX is a row function that performs a whole string match against a string value with a regular expression and returns the portion of the string matching the first capturing group of the regular expression.
REGEX(string_expression,"regex_matching_pattern")
Returns the matched STRING value of the first capturing group of the regular expression. If there is no match, returns NULL.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
regex_matching_pattern
Required. A regular expression pattern based on the regular expression pattern matching syntax of the Java programming language. To return a non-NULL value, the regular expression pattern must match the entire string value.

This section lists a summary of the most commonly used constructs for defining a regular expression matching pattern. See the Regular Expression Reference for more information about regular expression support in Platfora.
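The two rules stated for REGEX (the pattern must match the entire string, and only the first capturing group is returned) can be sketched in Python. This is a rough analogue, not Platfora's implementation: Python's re.fullmatch stands in for Java's whole-string matching, the function name regex_first_group is hypothetical, and the sketch assumes the pattern contains at least one capturing group.

```python
import re

def regex_first_group(value, pattern):
    """Return capturing group 1 when the pattern matches the whole value.

    Returns None (standing in for NULL) when the value is None or when the
    pattern does not match the entire string.
    """
    if value is None:
        return None
    match = re.fullmatch(pattern, value)  # whole-string match, not a search
    if match is None:
        return None
    return match.group(1)

print(regex_first_group("iPad", r"(iP[ao]d|iPhone)"))
print(regex_first_group("My iPad", r"(iP[ao]d|iPhone)"))
```

The second call yields None because the pattern covers only part of the input. Python and Java regular expression syntax differ in some constructs (for example, character class unions and intersections), so patterns written for Platfora may not all run unchanged in this sketch.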
Literal and Special Characters
The most basic form of pattern matching is the match of literal characters. For example, if the regular expression is foo and the input string is foo, the match will succeed because the strings are identical.
Certain characters are reserved for special use in regular expressions. These special characters are often called metacharacters. If you want to use special characters as literal characters, they must be escaped. You can escape a single character using a \ (backslash), or escape a character sequence by enclosing it in \Q ... \E.
To escape literal double-quotes, double the double-quotes ("").

Character Name        Character   Reserved For
opening bracket       [           start of a character class
closing bracket       ]           end of a character class
hyphen                -           character ranges within a character class
backslash             \           general escape character
caret                 ^           beginning of string, negating of a character class
dollar sign           $           end of string
period                .           matching any single character
pipe                  |           alternation (OR) operator
question mark         ?           optional quantifier, quantifier minimizer
asterisk              *           zero or more quantifier
plus sign             +           once or more quantifier
opening parenthesis   (           start of a subexpression group
closing parenthesis   )           end of a subexpression group
opening brace         {           start of min/max quantifier
closing brace         }           end of min/max quantifier

Character Class Constructs
A character class allows you to specify a set of characters, enclosed in square brackets, that can produce a single character match. There are also a number of special predefined character classes (backslash character sequences that are shorthand for the most common character sets).
Construct      Type           Description
[abc]          simple         matches a or b or c
[^abc]         negation       matches any character except a or b or c
[a-zA-Z]       range          matches a through z, or A through Z (inclusive)
[a-d[m-p]]     union          matches a through d, or m through p
[a-z&&[def]]   intersection   matches d, e, or f
[a-z&&[^xq]]   subtraction    matches a through z, except for x and q

Predefined Character Classes
Predefined character classes offer convenient shorthands for commonly used regular expressions.

Construct   Description (with example)
.           matches any single character (except newline)
            Example: .at matches "cat", "hat", and also "bat" in the phrase "batch files"
\d          matches any digit character (equivalent to [0-9])
            Example: \d matches "3" in "C3PO" and "2" in "file_2.txt"
\D          matches any non-digit character (equivalent to [^0-9])
            Example: \D matches "S" in "900S" and "Q" in "Q45"
\s          matches any single white-space character (equivalent to [ \t\n\x0B\f\r])
            Example: \sbook matches "book" in "blue book" but nothing in "notebook"
\S          matches any single non-white-space character
            Example: \Sbook matches "book" in "notebook" but nothing in "blue book"
\w          matches any alphanumeric character, including underscore (equivalent to [A-Za-z0-9_])
            Example: r\w* matches "rm" and "root"
\W          matches any non-alphanumeric character (equivalent to [^A-Za-z0-9_])
            Example: \W matches "&" in "stmd &", "%" in "100%", and "$" in "$HOME"

Line and Word Boundaries
Boundary matching constructs are used to specify where in a string to apply a matching pattern. For example, you can search for a particular pattern within a word boundary, or search for a pattern at the beginning or end of a line.
Construct   Description (with example)
^           matches from the beginning of a line (multi-line matches are currently not supported)
            Example: ^172 will match the "172" in IP address "172.18.1.11" but not in "192.172.2.33"
$           matches from the end of a line (multi-line matches are currently not supported)
            Example: d$ will match the "d" in "maid" but not in "made"
\b          matches within a word boundary
            Example: \bis\b matches the word "is" in "this is my island", but not the "is" part of "this" or "island". \bis matches both "is" and the "is" in "island", but not in "this".
\B          matches within a non-word boundary
            Example: \Bb matches "b" in "sbin" but not in "bash"

Quantifiers
Quantifiers specify how often the preceding regular expression construct should match. There are three classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the initial attempt does not produce a match.

Greedy   Reluctant   Possessive   Description (with example)
?        ??          ?+           matches the previous character or construct once or not at all
                                  Example: st?on matches "son" in "johnson" and "ston" in "johnston" but nothing in "clinton" or "version"
*        *?          *+           matches the previous character or construct zero or more times
                                  Example: if* matches "if", "iff" in "diff", or "i" in "print"
+        +?          ++           matches the previous character or construct one or more times
                                  Example: if+ matches "if", "iff" in "diff", but nothing in "print"
{n}      {n}?        {n}+         matches the previous character or construct exactly n times
                                  Example: o{2} matches "oo" in "lookup" and the first two o's in "fooooo" but nothing in "mount"
{n,}     {n,}?       {n,}+        matches the previous character or construct at least n times
                                  Example: o{2,} matches "oo" in "lookup" and all five o's in "fooooo" but nothing in "mount"
{n,m}    {n,m}?      {n,m}+       matches the previous character or construct at least n times, but no more than m times
                                  Example: F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"

Groups are specified by a pair of parentheses around a subpattern in the regular expression. A pattern can have more than one group and the groups can be nested. The groups are numbered 1-n from left to right, starting with the first opening parenthesis. There is always an implicit group 0, which contains the entire match. For example, the pattern:
(a(b*))+(c)
contains three groups:
group 1: (a(b*))
group 2: (b*)
group 3: (c)

Capturing Groups
By default, a group captures the text that produces a match, and only the most recent match is captured. The REGEX function returns the string that matches the first capturing group in the regular expression. For example, if the input string to the expression above was abc, the entire REGEX function would match to abc, but only return the result of group 1, which is ab.

Non-Capturing Groups
In some cases, you may want to use parentheses to group subpatterns, but not capture text. A non-capturing group starts with (?: (a question mark and colon following the opening parenthesis). For example, h(?:a|i|o)t matches hat or hit or hot, but does not capture the a, i, or o from the subexpression.
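The group numbering and capture rules above can be checked with a short Python sketch. Python's re module is used here only as a stand-in for Java's regular expression engine; the two agree on group numbering, on keeping only the most recent capture of a repeated group, and on the (?: ... ) non-capturing syntax.

```python
import re

# Groups are numbered by the position of their opening parenthesis.
match = re.fullmatch(r"(a(b*))+(c)", "abc")
numbered = [match.group(i) for i in range(4)]
print(numbered)  # group 0 is the entire match, then groups 1 to 3

# A repeated group keeps only its most recent capture: on "abbac" the
# outer group matches "abb" first and then "a", so group 1 ends up as "a".
repeated = re.fullmatch(r"(a(b*))+(c)", "abbac")
print(repeated.group(1), repr(repeated.group(2)))

# A non-capturing group still groups the alternation but adds no capture.
non_capturing = re.compile(r"h(?:a|i|o)t")
print(non_capturing.groups, bool(non_capturing.fullmatch("hot")))
```

Because REGEX returns only group 1, the placement of the first opening parenthesis decides what the function returns; wrapping earlier subpatterns in non-capturing groups keeps them from taking that slot.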
Match email address strings that consist of a user name, an @ sign, a provider name, and a domain suffix, but only return the provider portion of the email address from the email field:
REGEX(email,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")

Match the request line of a web log, where the value is in the format of:
GET /some_page.html HTTP/1.1
and return just the requested HTML page names:
REGEX(weblog.request_line,"GET\s/([a-zA-Z0-9._%-]+\.html)\sHTTP/[0-9.]+")

Extract the inches portion from a height field where example values are 6'2", 5'11" (notice the escaping of the literal quote with a double double-quote):
REGEX(height,"\d\'(\d+)""")

Extract all of the contents of the device field when the value is either iPod, iPad, or iPhone:
REGEX(device,"(iP[ao]d|iPhone)")

REGEX_REPLACE
REGEX_REPLACE is a row function that evaluates a string value against a regular expression to determine if there is a match, and replaces matched strings with the specified replacement value.
REGEX_REPLACE(string_expression,"regex_match_pattern","regex_replace_pattern")
Returns the regex_replace_pattern as a STRING value when regex_match_pattern produces a match. If there is no match, returns the value of string_expression as a STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
regex_match_pattern
Required. A string literal or regular expression pattern based on the regular expression pattern matching syntax of the Java programming language. You can use capturing groups to create backreferences that can be used in the regex_replace_pattern. You might want to use a string literal to make a case-sensitive match. For example, when you enter jane as the match value, the function matches jane but not Jane. The function matches all occurrences of a string literal in the string expression.
regex_replace_pattern
Required.
A string literal or regular expression pattern based on the regular expression pattern matching syntax of the Java programming language. You can refer to backreferences from the regex_match_pattern using the syntax $n (where n is the group number).

This section lists a summary of the most commonly used constructs for defining a regular expression matching pattern. See the Regular Expression Reference for more information.

Literal and Special Characters
The most basic form of pattern matching is the match of literal characters. For example, if the regular expression is foo and the input string is foo, the match will succeed because the strings are identical.
Certain characters are reserved for special use in regular expressions. These special characters are often called metacharacters. If you want to use special characters as literal characters, they must be escaped. You can escape a single character using a \ (backslash), or escape a character sequence by enclosing it in \Q ... \E.

Character Name        Character   Reserved For
opening bracket       [           start of a character class
closing bracket       ]           end of a character class
hyphen                -           character ranges within a character class
backslash             \           general escape character
caret                 ^           beginning of string, negating of a character class
dollar sign           $           end of string
period                .           matching any single character
pipe                  |           alternation (OR) operator
question mark         ?           optional quantifier, quantifier minimizer
asterisk              *           zero or more quantifier
plus sign             +           once or more quantifier
opening parenthesis   (           start of a subexpression group
closing parenthesis   )           end of a subexpression group
opening brace         {           start of min/max quantifier
closing brace         }           end of min/max quantifier

Character Class Constructs
A character class allows you to specify a set of characters, enclosed in square brackets, that can produce a single character match.
There are also a number of special predefined character classes (backslash character sequences that are shorthand for the most common character sets).

Construct      Type           Description
[abc]          simple         matches a or b or c
[^abc]         negation       matches any character except a or b or c
[a-zA-Z]       range          matches a through z, or A through Z (inclusive)
[a-d[m-p]]     union          matches a through d, or m through p
[a-z&&[def]]   intersection   matches d, e, or f
[a-z&&[^xq]]   subtraction    matches a through z, except for x and q

Predefined Character Classes
Predefined character classes offer convenient shorthands for commonly used regular expressions.

Construct   Description (with example)
.           matches any single character (except newline)
            Example: .at matches "cat", "hat", and also "bat" in the phrase "batch files"
\d          matches any digit character (equivalent to [0-9])
            Example: \d matches "3" in "C3PO" and "2" in "file_2.txt"
\D          matches any non-digit character (equivalent to [^0-9])
            Example: \D matches "S" in "900S" and "Q" in "Q45"
\s          matches any single white-space character (equivalent to [ \t\n\x0B\f\r])
            Example: \sbook matches "book" in "blue book" but nothing in "notebook"
\S          matches any single non-white-space character
            Example: \Sbook matches "book" in "notebook" but nothing in "blue book"
\w          matches any alphanumeric character, including underscore (equivalent to [A-Za-z0-9_])
            Example: r\w* matches "rm" and "root"
\W          matches any non-alphanumeric character (equivalent to [^A-Za-z0-9_])
            Example: \W matches "&" in "stmd &", "%" in "100%", and "$" in "$HOME"

Line and Word Boundaries
Boundary matching constructs are used to specify where in a string to apply a matching pattern. For example, you can search for a particular pattern within a word boundary, or search for a pattern at the beginning or end of a line.
Construct   Description (with example)
^           matches from the beginning of a line (multi-line matches are currently not supported)
            Example: ^172 will match the "172" in IP address "172.18.1.11" but not in "192.172.2.33"
$           matches from the end of a line (multi-line matches are currently not supported)
            Example: d$ will match the "d" in "maid" but not in "made"
\b          matches within a word boundary
            Example: \bis\b matches the word "is" in "this is my island", but not the "is" part of "this" or "island". \bis matches both "is" and the "is" in "island", but not in "this".
\B          matches within a non-word boundary
            Example: \Bb matches "b" in "sbin" but not in "bash"

Quantifiers
Quantifiers specify how often the preceding regular expression construct should match. There are three classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the initial attempt does not produce a match.

Greedy   Reluctant   Possessive   Description (with example)
?        ??          ?+           matches the previous character or construct once or not at all
                                  Example: st?on matches "son" in "johnson" and "ston" in "johnston" but nothing in "clinton" or "version"
*        *?          *+           matches the previous character or construct zero or more times
                                  Example: if* matches "if", "iff" in "diff", or "i" in "print"
+        +?          ++           matches the previous character or construct one or more times
                                  Example: if+ matches "if", "iff" in "diff", but nothing in "print"
{n}      {n}?        {n}+         matches the previous character or construct exactly n times
                                  Example: o{2} matches "oo" in "lookup" and the first two o's in "fooooo" but nothing in "mount"
{n,}     {n,}?       {n,}+        matches the previous character or construct at least n times
                                  Example: o{2,} matches "oo" in "lookup" and all five o's in "fooooo" but nothing in "mount"
{n,m}    {n,m}?      {n,m}+       matches the previous character or construct at least n times, but no more than m times
                                  Example: F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"

Match the values in a phone_number field where phone number values are formatted as xxx.xxx.xxxx and replace them with phone number values formatted as (xxx) xxx-xxxx:
REGEX_REPLACE(phone_number,"([0-9]{3})\.([0-9]{3})\.([0-9]{4})","\($1\) $2-$3")

Match the values in a name field where name values are formatted as firstname lastname and replace them with name values formatted as lastname, firstname:
REGEX_REPLACE(name,"(.*) (.*)","$2, $1")

Match the string literal mrs in a title field and replace it with the string literal Mrs:
REGEX_REPLACE(title,"mrs","Mrs")

SPLIT
SPLIT is a row function that breaks down a delimited input string into sections and returns the specified section of the string. A section is any substring between occurrences of the specified delimiter.
SPLIT(input_string_expression,"delimiter_string",position_integer)
Returns one value per row of type STRING.
input_string_expression
Required. The name of a field or expression of type STRING (or a literal string).
delimiter_string
Required. A literal string representing the delimiter used to separate values in the input string. The delimiter can be a single character or multiple characters.
position_integer
Required. An integer representing the position of the section in the input string that you want to extract. Positive integers count the position from the beginning of the string, and negative integers count the position from the end of the string. A value of 0 returns NULL.
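The position rules for SPLIT (1-based from the start, negative from the end, 0 returns NULL) can be sketched in Python. This is an illustration, not Platfora's implementation: the function name split_section is hypothetical, None stands in for NULL, and returning None for a position beyond the available sections is an assumption, since the behavior for that case is not stated here.

```python
def split_section(input_string, delimiter, position):
    """Return the section at the given position of a delimited string.

    Positive positions count from the beginning (1 is the first section),
    negative positions count from the end (-1 is the last section).
    """
    if input_string is None or position == 0:
        return None  # a position of 0 returns NULL
    sections = input_string.split(delimiter)
    # Convert the 1-based positive position to a 0-based index;
    # negative positions already index from the end in Python.
    index = position - 1 if position > 0 else position
    if index >= len(sections) or index < -len(sections):
        return None  # assumption: out-of-range positions yield NULL
    return sections[index]

print(split_section("Restaurants>Location>San Francisco", ">", -1))
print(split_section("123-456-7890", "-", 1))
```

With three sections in the first string, positions 3 and -1 select the same section in this sketch.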
Return the third section of the literal delimited string Restaurants>Location>San Francisco (here selected by counting from the end with -1):
SPLIT("Restaurants>Location>San Francisco",">", -1) returns San Francisco

Return the first section of a phone_number field where phone number values are in the format of 123-456-7890:
SPLIT(phone_number,"-",1)

SUBSTRING
SUBSTRING is a row function that returns the specified characters of a string value based on the given start and end position.
SUBSTRING(string,start,end)
Returns one value per row of type STRING.
string
Required. The name of a field or expression of type STRING (or a literal string).
start
Required. An integer that specifies where the returned characters start (inclusive), with 0 being the first character of the string. If start is greater than the number of characters, then an empty string is returned. If start is greater than end, then an empty string is returned.
end
Required. A positive integer that specifies where the returned characters end (exclusive), with the end character not being part of the return value. If end is greater than the number of characters, the whole string value (from start) is returned.

Return the first letter of the name field:
SUBSTRING(name,0,1)

TO_LOWER
TO_LOWER is a row function that converts all alphabetic characters in a string to lower case.
TO_LOWER(string_expression)
Returns one value per row of type STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).

Return the literal input string 123 Main Street in all lower case letters:
TO_LOWER("123 Main Street") returns 123 main street

TO_UPPER
TO_UPPER is a row function that converts all alphabetic characters in a string to upper case.
TO_UPPER(string_expression)
Returns one value per row of type STRING.
string_expression
Required. The name of a field or expression of type STRING (or a literal string).
Return the literal input string 123 Main Street in all upper case letters: TO_UPPER("123 Main Street") returns 123 MAIN STREET TRIM TRIM is a row function that removes leading and trailing spaces from a string value. TRIM(string_expression) Returns one value per row of type STRING. string_expression Page 372 Data Analysis and Visualization Guide - Platfora Expressions Required. The name of a field or expression of type STRING (or a literal string). Return the value of the area_code field without any leading or trailing spaces. For example, if the input string is " 650 ", then the return value would be "650": TRIM(area_code) Return the value of the phone_number field without any leading or trailing spaces. For example, if the input string is " 650 123-4567 ", then the return value would be "650 123-4567" (note that the extra spaces in the middle of the string are not removed, only the spaces at the beginning and end of the string): TRIM(phone_number) XPATH_STRING XPATH_STRING is a row function that takes an XML-formatted string and returns the first string matching the given XPath expression. XPATH_STRING(xml_formatted_string,"xpath_expression") Returns one value per row of type STRING. If the XPath expression matches more than one string in the given XML node, this function will return the first match only. To return all matches, use XPATH_STRINGS instead. xml_formatted_string Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML consisting of a parent element and one or more child nodes). xpath_expression Required. An XPath expression that refers to a node, element, or attribute within the XML string passed to this expression. Any XPath expression that complies to the XML Path Language (XPath) Version 1.0 specification is valid. These example XPATH_STRING expressions assume you have a field in your dataset named address that contains XML-formatted strings such as this: <list> <address type="work"> <street>1300 So. 
El Camino Real</street1> <street>Suite 600</street2> <city>San Mateo</city> <state>CA</state> <zipcode>94403</zipcode> </address> <address type="home"> <street>123 Oakdale Street</street1> <street/> <city>San Francisco</city> <state>CA</state> Page 373 Data Analysis and Visualization Guide - Platfora Expressions <zipcode>94123</zipcode> </address> </list> Get the zipcode value from any address element where the type attribute equals home: XPATH_STRING(address,"//address[@type='home']/zipcode") returns: 94123 Get the city value from the second address element: XPATH_STRING(address,"/list/address[2]/city") returns: San Francisco Get the values from all child elements of the first address element (as one string): XPATH_STRING(address,"/list/address") returns: 1300 So. El Camino RealSuite 600 San MateoCA94403 XPATH_STRINGS XPATH_STRINGS is a row function that takes an XML-formatted string and returns a newline-separated array of strings matching the given XPath expression. XPATH_STRINGS(xml_formatted_string,"xpath_expression") Returns one value per row of type STRING. If the XPath expression matches more than one string in the given XML node, this function will return all matches separated by a newline (you cannot specify a different delimiter). xml_formatted_string Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML consisting of a parent element and one or more child nodes). xpath_expression Required. An XPath expression that refers to a node, element, or attribute within the XML string passed to this expression. Any XPath expression that complies to the XML Path Language (XPath) Version 1.0 specification is valid. These example XPATH_STRINGS expressions assume you have a field in your dataset named address that contains XML-formatted strings such as this: <list> <address type="work"> <street>1300 So. 
El Camino Real</street1> <street>Suite 600</street2> <city>San Mateo</city> <state>CA</state> <zipcode>94403</zipcode> Page 374 Data Analysis and Visualization Guide - Platfora Expressions </address> <address type="home"> <street>123 Oakdale Street</street1> <street/> <city>San Francisco</city> <state>CA</state> <zipcode>94123</zipcode> </address> </list> Get all zipcode values from all address elements: XPATH_STRINGS(address,"//address/zipcode") returns: 94123 94403 Get all street values from the first address element: XPATH_STRINGS(address,"/list/address[1]/street") returns: 1300 So. El Camino Real Suite 600 Get the values from all child elements of all address elements (as one string per line): XPATH_STRINGS(address,"/list/address") returns: 123 Oakdale StreetSan FranciscoCA94123 1300 So. El Camino RealSuite 600 San MateoCA94403 XPATH_XML XPATH_XML is a row function that takes an XML-formatted string and returns an XML-formatted string matching the given XPath expression. XPATH_XML(xml_formatted_string,"xpath_expression") Returns one value per row of type STRING in XML format. xml_formatted_string Required. The name of a field or a literal string that contains a valid XML node (a snippet of XML consisting of a parent element and one or more child nodes). xpath_expression Required. An XPath expression that refers to a node, element, or attribute within the XML string passed to this expression. Any XPath expression that complies to the XML Path Language (XPath) Version 1.0 specification is valid. Page 375 Data Analysis and Visualization Guide - Platfora Expressions These example XPATH_STRING expressions assume you have a field in your dataset named address that contains XML-formatted strings such as this: <list> <address type="work"> <street>1300 So. 
El Camino Real</street>
    <street>Suite 600</street>
    <city>San Mateo</city>
    <state>CA</state>
    <zipcode>94403</zipcode>
  </address>
  <address type="home">
    <street>123 Oakdale Street</street>
    <street/>
    <city>San Francisco</city>
    <state>CA</state>
    <zipcode>94123</zipcode>
  </address>
</list>

Get the last address node and its child nodes in XML format:
XPATH_XML(address,"//address[last()]")
returns:
<address type="home">
  <street>123 Oakdale Street</street>
  <street/>
  <city>San Francisco</city>
  <state>CA</state>
  <zipcode>94123</zipcode>
</address>

Get the city value from the second address node in XML format:
XPATH_XML(address,"/list/address[2]/city")
returns: <city>San Francisco</city>

Get the first address node and its child nodes in XML format:
XPATH_XML(address,"/list/address[1]")
returns:
<address type="work">
  <street>1300 So. El Camino Real</street>
  <street>Suite 600</street>
  <city>San Mateo</city>
  <state>CA</state>
  <zipcode>94403</zipcode>
</address>

URL Functions

URL functions allow you to extract different portions of a URL string, and decode text that is URL-encoded.

URL_AUTHORITY

URL_AUTHORITY is a row function that returns the authority portion of a URL string. The authority portion of a URL is the part that has the information on how to locate and connect to the server.

URL_AUTHORITY(string)

Returns the authority portion of a URL as a STRING value, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com/company/contact.html, the authority portion is www.platfora.com. In the string http://user:[email protected]:8012/mypage.html, the authority portion is user:[email protected]:8012. In the string mailto:[email protected]?subject=Topic, the authority portion is NULL.

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment].
The authority portion of the URL contains the host information, which can be specified as a domain name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1). The host information can be preceded by optional user information terminated with @ (for example, username:[email protected]), and followed by an optional port number preceded by a colon (for example, localhost:8001).

Return the authority portion of URL string values in the referrer field:
URL_AUTHORITY(referrer)

Return the authority portion of a literal URL string:
URL_AUTHORITY("http://user:[email protected]:8012/mypage.html")
returns user:[email protected]:8012

URL_FRAGMENT

URL_FRAGMENT is a row function that returns the fragment portion of a URL string.

URL_FRAGMENT(string)

Returns the fragment portion of a URL as a STRING value, NULL if the URL does not contain a fragment, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com/contact.html#phone, the fragment portion is phone. In the string http://www.platfora.com/contact.html, the fragment portion is NULL. In the string http://platfora.com/news.php?topic=press#Platfora%20News, the fragment portion is Platfora%20News.

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment]. The optional fragment portion of the URL is separated by a hash mark (#) and provides direction to a secondary resource, such as a heading or anchor identifier.

Return the fragment portion of URL string values in the request field:
URL_FRAGMENT(request)

Return the fragment portion of a literal URL string:
URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News")
returns Platfora%20News

Return and decode the fragment portion of a literal URL string:
URLDECODE(URL_FRAGMENT("http://platfora.com/news.php?topic=press#Platfora%20News"))
returns Platfora News

URL_HOST

URL_HOST is a row function that returns the host, domain, or IP address portion of a URL string.

URL_HOST(string)

Returns the host portion of a URL as a STRING value, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com/company/contact.html, the host portion is www.platfora.com. In the string http://admin:[email protected]:8001/index.html, the host portion is 127.0.0.1. In the string mailto:[email protected]?subject=Topic, the host portion is NULL.

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment]. The authority portion of the URL contains the host information, which can be specified as a domain name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1).

Return the host portion of URL string values in the referrer field:
URL_HOST(referrer)

Return the host portion of a literal URL string:
URL_HOST("http://user:[email protected]:8012/mypage.html")
returns mycompany.com

URL_PATH

URL_PATH is a row function that returns the path portion of a URL string.

URL_PATH(string)

Returns the path portion of a URL as a STRING value, NULL if the URL does not contain a path, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com/company/contact.html, the path portion is /company/contact.html. In the string http://admin:[email protected]:8001/index.html, the path portion is /index.html. In the string mailto:[email protected]?subject=Topic, the path portion is [email protected].

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment].
The optional path portion of the URL is a sequence of resource location segments separated by a forward slash (/), conceptually similar to a directory path.

Return the path portion of URL string values in the request field:
URL_PATH(request)

Return the path portion of a literal URL string:
URL_PATH("http://platfora.com/company/contact.html")
returns /company/contact.html

URL_PORT

URL_PORT is a row function that returns the port portion of a URL string.

URL_PORT(string)

Returns the port portion of a URL as an INTEGER value. If the URL does not specify a port, then returns -1. If the input string is not a valid URL, returns NULL.

For example, in the string http://localhost:8001, the port portion is 8001.

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment]. The authority portion of the URL contains the host information, which can be specified as a domain name (www.platfora.com), a host name (localhost), or an IP address (127.0.0.1). The host information can be followed by an optional port number preceded by a colon (for example, localhost:8001).

Return the port portion of URL string values in the referrer field:
URL_PORT(referrer)

Return the port portion of a literal URL string:
URL_PORT("http://user:[email protected]:8012/mypage.html")
returns 8012

URL_PROTOCOL

URL_PROTOCOL is a row function that returns the protocol (or URI scheme name) portion of a URL string.

URL_PROTOCOL(string)

Returns the protocol portion of a URL as a STRING value, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com, the protocol portion is http. In the string ftp://ftp.platfora.com/articles/platfora.pdf, the protocol portion is ftp.

string
Required.
A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment]. The protocol portion of a URL consists of a sequence of characters beginning with a letter and followed by any combination of letter, number, plus (+), period (.), or hyphen (-) characters, followed by a colon (:). For example: http:, ftp:, mailto:

Return the protocol portion of URL string values in the referrer field:
URL_PROTOCOL(referrer)

Return the protocol portion of the literal URL string:
URL_PROTOCOL("http://www.platfora.com")
returns http

URL_QUERY

URL_QUERY is a row function that returns the query portion of a URL string.

URL_QUERY(string)

Returns the query portion of a URL as a STRING value, NULL if the URL does not contain a query, or NULL if the input string is not a valid URL.

For example, in the string http://www.platfora.com/contact.html, the query portion is NULL. In the string http://platfora.com/news.php?topic=press&timeframe=today#Platfora%20News, the query portion is topic=press&timeframe=today. In the string mailto:[email protected]?subject=Topic, the query portion is subject=Topic.

string
Required. A field or expression that returns a STRING value in URI (uniform resource identifier) format of: protocol:authority[/path][?query][#fragment]. The optional query portion of the URL is separated by a question mark (?) and typically contains an unordered list of key=value pairs separated by an ampersand (&) or semicolon (;).

Return the query portion of URL string values in the request field:
URL_QUERY(request)

Return the query portion of a literal URL string:
URL_QUERY("http://platfora.com/news.php?topic=press&timeframe=today")
returns topic=press&timeframe=today

URLDECODE

URLDECODE is a row function that decodes a string that has been encoded with the application/x-www-form-urlencoded media type.
URL encoding, also known as percent-encoding, is a mechanism for encoding information in a Uniform Resource Identifier (URI). When sent in an HTTP GET request, application/x-www-form-urlencoded data is included in the query component of the request URI. When sent in an HTTP POST request, the data is placed in the body of the message, and the name of the media type is included in the message Content-Type header.

URLDECODE(string)

Returns a value of type STRING with characters decoded as follows:
• Alphanumeric characters (a-z, A-Z, 0-9) remain unchanged.
• The special characters hyphen (-), comma (,), underscore (_), period (.), and asterisk (*) remain unchanged.
• The plus sign (+) character is converted to a space character.
• The percent character (%) is interpreted as the start of a special escaped sequence, where in the sequence %HH, HH represents the hexadecimal value of the byte.

For example, some common escape sequences are:

percent encoding sequence    value
%20                          space
%0A or %0D or %0D%0A         newline
%22                          double quote (")
%25                          percent (%)
%2D                          hyphen (-)
%2E                          period (.)
%3C                          less than (<)
%3E                          greater than (>)
%5C                          backslash (\)
%7C                          pipe (|)

string
Required. A field or expression that returns a STRING value. It is assumed that all characters in the input string are one of the following: lower-case letters (a-z), upper-case letters (A-Z), numeric digits (0-9), or the hyphen (-), comma (,), underscore (_), period (.) or asterisk (*) character. The percent character (%) is allowed, but is interpreted as the start of a special escaped sequence. The plus character (+) is allowed, but is interpreted as a space character.
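The decoding rules above can be approximated outside Platfora when you want to check what a value should decode to. The following is a minimal Python sketch, assuming that the standard library function urllib.parse.unquote_plus is an acceptable stand-in for the rules listed above. The helper name url_decode is hypothetical and is not part of the Platfora expression language.

```python
from urllib.parse import unquote_plus


def url_decode(value):
    """Approximate URLDECODE: '+' becomes a space, %HH becomes the byte 0xHH.

    Assumption: this mirrors the decoding rules described above; it is not
    Platfora's own implementation.
    """
    if value is None:
        return None  # assumed: a missing input stays missing
    return unquote_plus(value)


# Percent sequences and plus signs are both decoded.
print(url_decode("N%2FA%20or%20%22not%20applicable%22"))
print(url_decode("topic=press+release"))
```

Alphanumeric characters and the unreserved punctuation pass through unchanged, so the sketch can be applied to values that are only partly encoded.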
Decode the values of the url_query field:
URLDECODE(url_query)

Convert a literal URL encoded string (N%2FA%20or%20%22not%20applicable%22) to a human-readable value (N/A or "not applicable"):
URLDECODE("N%2FA%20or%20%22not%20applicable%22")
returns N/A or "not applicable"

IP Address Functions

IP address functions allow you to manipulate and transform STRING data consisting of IP address values.

CIDR_MATCH

CIDR_MATCH is a row function that compares two STRING arguments representing a CIDR mask and an IP address, and returns 1 if the IP address falls within the specified subnet mask or 0 if it does not.

CIDR_MATCH(CIDR_string, IP_string)

Returns an INTEGER value of 1 if the IP address falls within the subnet indicated by the CIDR mask and 0 if it does not.

CIDR_string
Required. A field or expression that returns a STRING value containing either an IPv4 or IPv6 CIDR mask (Classless InterDomain Routing subnet notation). An IPv4 CIDR mask can only successfully match IPv4 addresses, and an IPv6 CIDR mask can only successfully match IPv6 addresses.

IP_string
Required. A field or expression that returns a STRING value containing either an IPv4 or IPv6 internet protocol (IP) address.

Compare an IPv4 CIDR subnet mask to an IPv4 IP address:
CIDR_MATCH("60.145.56.0/24","60.145.56.246") returns 1
CIDR_MATCH("60.145.56.0/30","60.145.56.246") returns 0

Compare an IPv6 CIDR subnet mask to an IPv6 IP address:
CIDR_MATCH("fe80::/70","FE80::0202:B3FF:FE1E:8329") returns 1
CIDR_MATCH("fe80::/72","FE80::0202:B3FF:FE1E:8329") returns 0

HEX_TO_IP

HEX_TO_IP is a row function that converts a hexadecimal-encoded STRING to a text representation of an IP address.

HEX_TO_IP(string)

Returns a value of type STRING representing either an IPv4 or IPv6 address. The type of IP address returned depends on the input string. An 8 character hexadecimal string will return an IPv4 address.
A 32 character long hexadecimal string will return an IPv6 address. IPv6 addresses are represented in full length, without removing any leading zeros and without using the compressed :: notation. For example, 2001:0db8:0000:0000:0000:ff00:0042:8329 rather than 2001:db8::ff00:42:8329. Input strings that do not contain either 8 or 32 valid hexadecimal characters will return NULL.

string
Required. A field or expression that returns a hexadecimal-encoded STRING value. The hexadecimal string must be either 8 characters long (in which case it is converted to an IPv4 address) or 32 characters long (in which case it is converted to an IPv6 address).

Return a plain text IP address for each hexadecimal-encoded string value in the byte_encoded_ips column:
HEX_TO_IP(byte_encoded_ips)

Convert an 8 character hexadecimal-encoded string to a plain text IPv4 address:
HEX_TO_IP("AB20FE01")
returns 171.32.254.1

Convert a 32 character hexadecimal-encoded string to a plain text IPv6 address:
HEX_TO_IP("FE800000000000000202B3FFFE1E8329")
returns fe80:0000:0000:0000:0202:b3ff:fe1e:8329

Date and Time Functions

Date and time functions allow you to manipulate and transform datetime values, such as calculating time differences between two datetime values, or extracting a portion of a datetime value.

DAYS_BETWEEN

DAYS_BETWEEN is a row function that calculates the whole number of days (ignoring time) between two DATETIME values (value1-value2).

DAYS_BETWEEN(datetime_1,datetime_2)

Returns one value per row of type INTEGER.

datetime_1
Required. A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.
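To make the argument order concrete, here is a minimal Python sketch of the value1-value2 calculation. It assumes that "ignoring time" means the time of day is dropped from both values before subtracting; the source does not spell this out, so treat it as one plausible reading. The function name days_between is hypothetical.

```python
from datetime import datetime


def days_between(datetime_1, datetime_2):
    """Whole days in datetime_1 - datetime_2, with time of day dropped.

    Assumption: 'ignoring time' is read as comparing calendar dates only.
    """
    return (datetime_1.date() - datetime_2.date()).days


ship_date = datetime(2015, 6, 28, 22, 15)
order_date = datetime(2015, 6, 25, 23, 59)
# A later first argument gives a positive result; swapping gives a negative one.
print(days_between(ship_date, order_date))
print(days_between(order_date, ship_date))
```

Under this reading, the later value goes first when a positive count is wanted.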
Calculate the number of days to ship a product by subtracting the value of the order_date field from the ship_date field:
DAYS_BETWEEN(ship_date,order_date)

Calculate the number of days since a product's release by subtracting the value of the release_date field in the product dataset from the current date (the result of the NOW() expression):
DAYS_BETWEEN(NOW(),product.release_date)

DATE_ADD

DATE_ADD is a row function that adds the specified time interval to a DATETIME value.

DATE_ADD(datetime,quantity,"interval")

Returns a value of type DATETIME.

datetime
Required. A field name or expression that returns a DATETIME value.

quantity
Required. An integer value. To add time intervals, use a positive integer. To subtract time intervals, use a negative integer.

interval
Required. One of the following time intervals:
• millisecond - Adds the specified number of milliseconds to a datetime value.
• second - Adds the specified number of seconds to a datetime value.
• minute - Adds the specified number of minutes to a datetime value.
• hour - Adds the specified number of hours to a datetime value.
• day - Adds the specified number of days to a datetime value.
• week - Adds the specified number of weeks to a datetime value.
• month - Adds the specified number of months to a datetime value.
• quarter - Adds the specified number of quarters to a datetime value.
• year - Adds the specified number of years to a datetime value.
• weekyear - Adds the specified number of weekyears to a datetime value.

Add 45 days to the value of the invoice_date field to calculate the date a payment is due:
DATE_ADD(invoice_date,45,"day")

HOURS_BETWEEN

HOURS_BETWEEN is a row function that calculates the whole number of hours (ignoring minutes, seconds, and milliseconds) between two DATETIME values (value1-value2).

HOURS_BETWEEN(datetime_1,datetime_2)

Returns one value per row of type INTEGER.

datetime_1
Required.
A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.

Calculate the number of hours to ship a product by subtracting the value of the order_date field from the ship_date field:
HOURS_BETWEEN(ship_date,order_date)

Calculate the number of hours since an advertisement was viewed by subtracting the value of the adview_timestamp field in the impressions dataset from the current date and time (the result of the NOW() expression):
HOURS_BETWEEN(NOW(),impressions.adview_timestamp)

EXTRACT

EXTRACT is a row function that returns the specified portion of a DATETIME value.

EXTRACT("extract_value",datetime)

Returns the specified extracted value as type INTEGER. EXTRACT removes leading zeros. For example, the month of April returns a value of 4, not 04.

extract_value
Required. One of the following extract values:
• millisecond - Returns the millisecond portion of a datetime value. For example, an input datetime value of 2012-08-15 20:38:40.213 would return an integer value of 213.
• second - Returns the second portion of a datetime value. For example, an input datetime value of 2012-08-15 20:38:40.213 would return an integer value of 40.
• minute - Returns the minute portion of a datetime value. For example, an input datetime value of 2012-08-15 20:38:40.213 would return an integer value of 38.
• hour - Returns the hour portion of a datetime value. For example, an input datetime value of 2012-08-15 20:38:40.213 would return an integer value of 20.
• day - Returns the day portion of a datetime value. For example, an input datetime value of 2012-08-15 would return an integer value of 15.
• week - Returns the ISO week number for the input datetime value. For example, an input datetime value of 2012-01-02 would return an integer value of 1 (the first ISO week of 2012 starts on Monday January 2).
An input datetime value of 2012-01-01 would return an integer value of 52 (January 1, 2012 is part of the last ISO week of 2011).
• month - Returns the month portion of a datetime value. For example, an input datetime value of 2012-08-15 would return an integer value of 8.
• quarter - Returns the quarter number for the input datetime value, where quarters start on January 1, April 1, July 1, or October 1. For example, an input datetime value of 2012-08-15 would return an integer value of 3.
• year - Returns the year portion of a datetime value. For example, an input datetime value of 2012-01-01 would return an integer value of 2012.
• weekyear - Returns the year value that corresponds to the ISO week number of the input datetime value. For example, an input datetime value of 2012-01-02 would return an integer value of 2012 (the first ISO week of 2012 starts on Monday January 2). An input datetime value of 2012-01-01 would return an integer value of 2011 (January 1, 2012 is part of the last ISO week of 2011).

datetime
Required. A field name or expression that returns a DATETIME value.

Extract the hour portion from the order_date datetime field:
EXTRACT("hour",order_date)

Cast the value of the order_date string field to a datetime value using TO_DATE, and extract the ISO week year:
EXTRACT("weekyear",TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"))

MILLISECONDS_BETWEEN

MILLISECONDS_BETWEEN is a row function that calculates the whole number of milliseconds between two DATETIME values (value1-value2).

MILLISECONDS_BETWEEN(datetime_1,datetime_2)

Returns one value per row of type INTEGER.

datetime_1
Required. A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of milliseconds it took to serve a web page by subtracting the value of the request_timestamp field from the response_timestamp field:
MILLISECONDS_BETWEEN(response_timestamp,request_timestamp)

MINUTES_BETWEEN

MINUTES_BETWEEN is a row function that calculates the whole number of minutes (ignoring seconds and milliseconds) between two DATETIME values (value1-value2).

MINUTES_BETWEEN(datetime_1,datetime_2)

Returns one value per row of type INTEGER.

datetime_1
Required. A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.

Calculate the number of minutes it took for a user to click on an advertisement by subtracting the value of the impression_timestamp field from the conversion_timestamp field:
MINUTES_BETWEEN(conversion_timestamp,impression_timestamp)

Calculate the number of minutes since a user last logged in by subtracting the login_timestamp field in the weblogs dataset from the current date and time (the result of the NOW() expression):
MINUTES_BETWEEN(NOW(),weblogs.login_timestamp)

NOW

NOW is a scalar function that returns the current system date and time as a DATETIME value. It can be used in other expressions involving DATETIME type fields, such as YEAR_DIFF or DAYS_BETWEEN. Note that the value of NOW is only evaluated at the time a lens is built (it is not re-evaluated with each query).

NOW()

Returns the current system date and time as a DATETIME value.

Calculate a user's age using YEAR_DIFF to subtract the value of the birthdate field in the users dataset from the current date:
YEAR_DIFF(NOW(),users.birthdate)

Calculate the number of days since a product's release using DAYS_BETWEEN to subtract the value of the release_date field from the current date:
DAYS_BETWEEN(NOW(),release_date)

SECONDS_BETWEEN

SECONDS_BETWEEN is a row function that calculates the whole number of seconds (ignoring milliseconds) between two DATETIME values (value1-value2).
SECONDS_BETWEEN(datetime_1,datetime_2)

Returns one value per row of type INTEGER.

datetime_1
Required. A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.

Calculate the number of seconds it took for a user to click on an advertisement by subtracting the value of the impression_timestamp field from the conversion_timestamp field:
SECONDS_BETWEEN(conversion_timestamp,impression_timestamp)

Calculate the number of seconds since a user last logged in by subtracting the login_timestamp field in the weblogs dataset from the current date and time (the result of the NOW() expression):
SECONDS_BETWEEN(NOW(),weblogs.login_timestamp)

TRUNC

TRUNC is a row function that truncates a DATETIME value to the specified format.

TRUNC(datetime,"format")

Returns a value of type DATETIME truncated to the specified format.

datetime
Required. A field or expression that returns a DATETIME value.

format
Required. One of the following format values:
• millisecond - Returns a datetime value truncated to millisecond granularity. Has no effect since millisecond is already the most granular format for datetime values. For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:40.213.
• second - Returns a datetime value truncated to second granularity. For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:40.000.
• minute - Returns a datetime value truncated to minute granularity. For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:38:00.000.
• hour - Returns a datetime value truncated to hour granularity. For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 20:00:00.000.
• day - Returns a datetime value truncated to day granularity.
For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-08-15 00:00:00.000.
• week - Returns a datetime value truncated to the first day of the week (starting on a Monday). For example, an input datetime value of 2012-08-15 (a Wednesday) would return a datetime value of 2012-08-13 (the Monday prior).
• month - Returns a datetime value truncated to the first day of the month. For example, an input datetime value of 2012-08-15 would return a datetime value of 2012-08-01.
• quarter - Returns a datetime value truncated to the first day of the quarter (January 1, April 1, July 1, or October 1). For example, an input datetime value of 2012-08-15 20:38:40.213 would return a datetime value of 2012-07-01.
• year - Returns a datetime value truncated to the first day of the year (January 1). For example, an input datetime value of 2012-08-15 would return a datetime value of 2012-01-01.
• weekyear - Returns a datetime value truncated to the first day of the ISO weekyear (the ISO week starting with the Monday which is nearest in time to January 1). For example, an input datetime value of 2008-08-15 would return a datetime value of 2007-12-31. The first day of the ISO weekyear for 2008 is December 31, 2007 (the prior Monday closest to January 1).

Truncate the order_date datetime field to day granularity:
TRUNC(order_date,"day")

Cast the value of the order_date string field to a datetime value using TO_DATE, and truncate it to day granularity:
TRUNC(TO_DATE(order_date,"MM/dd/yyyy HH:mm:ss"),"day")

YEAR_DIFF

YEAR_DIFF is a row function that calculates the fractional number of years between two DATETIME values (value1-value2).

YEAR_DIFF(datetime_1,datetime_2)

Returns one value per row of type DOUBLE.

datetime_1
Required. A field or expression of type DATETIME.

datetime_2
Required. A field or expression of type DATETIME.
Calculate the number of years a user has been a customer by subtracting the value of the registration_date field from the current date (the result of the NOW() expression):
YEAR_DIFF(NOW(),registration_date)

Calculate a user's age by subtracting the value of the birthdate field in the users dataset from the current date (the result of the NOW() expression):
YEAR_DIFF(NOW(),users.birthdate)

Math Functions

Math functions allow you to perform basic math calculations on numeric values. You can also use arithmetic operators to perform simple math calculations.

DIV

DIV is a row function that divides two LONG values and returns a quotient value of type LONG (the result is truncated to 0 decimal places).

DIV(dividend,divisor)

Returns one value per row of type LONG.

dividend
Required. A field or expression of type LONG.

divisor
Required. A field or expression of type LONG.

Cast the value of the file_size field to LONG and divide by 1024:
DIV(TO_LONG(file_size),1024)

EXP

EXP is a row function that raises the mathematical constant e to the power (exponent) of a numeric value and returns a value of type DOUBLE.

EXP(power)

Returns one value per row of type DOUBLE.

power
Required. A field or expression of a numeric type.

Raise e to the power in the Value field:
EXP(Value)
When the Value field value is 2.0, the result is equal to 7.3890 when truncated to four decimal places.

FLOOR

FLOOR is a row function that returns the largest integer that is less than or equal to the input argument.

FLOOR(double)

Returns one value per row of type DOUBLE.

double
Required. A field or expression of type DOUBLE.

Return the floor value of 32.6789:
FLOOR(32.6789)
returns 32.0

HASH

HASH is a row function that evenly partitions data values into the specified number of buckets. It creates a hash of the input value and assigns that value a bucket number. Equal values will always hash to the same bucket number.
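The bucketing idea can be sketched in Python as follows. This is an illustration only: the source does not say which hash algorithm Platfora uses or whether bucket numbers start at 0 or 1, so the CRC-32 checksum and 0-based numbering here are assumptions, and the function name hash_bucket is hypothetical. The handling of the bucket count follows the rules given for the integer parameter of HASH (non-integer values are truncated, zero returns NULL, negative values use the absolute value).

```python
import zlib


def hash_bucket(value, buckets):
    """Assign value to one of `buckets` buckets using a stable hash.

    Assumptions: CRC-32 as the hash and 0-based bucket numbers; neither is
    confirmed for Platfora's HASH function.
    """
    count = abs(int(buckets))  # int() truncates toward zero, abs() drops the sign
    if count == 0:
        return None  # stands in for NULL
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % count


# Equal inputs always land in the same bucket.
print(hash_bucket("jsmith", 20) == hash_bucket("jsmith", 20))
```

A stable checksum is used instead of Python's built-in hash() because the built-in is randomized per process for strings, which would break the "equal values hash to the same bucket" property across runs.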
HASH(field_name,integer)

Returns one value per row of type INTEGER corresponding to the bucket number that the input value hashes to.

field_name
Required. The name of the field whose values you want to partition.

integer
Required. The desired number of buckets. This parameter can be a numeric value of any data type, but when it is a non-integer value, Platfora truncates the value to an integer. When the value is zero, the function returns NULL. When the value is negative, the function uses its absolute value.

Partition the values of the username field into 20 buckets:
HASH(username,20)

LN

LN is a row function that returns the natural logarithm of a number. The natural logarithm is the logarithm to the base e, where e (Euler's number) is a mathematical constant approximately equal to 2.718281828. The natural logarithm of a number x is the power to which the constant e must be raised in order to equal x.

LN(positive_number)

Returns the exponent to which base e must be raised to obtain the input value, where e denotes the constant number 2.718281828. The return value is the same data type as the input value.

For example, LN(7.389) is 2, because e to the power of 2 is approximately 7.389.

positive_number
Required. A field or expression that returns a number greater than 0. Inputs can be of type INTEGER, LONG, DOUBLE, or FIXED.

Return the natural logarithm of base number e, which is approximately 2.718281828:
LN(2.718281828) returns 1
LN(3.0000) returns 1.098612
LN(300.0000) returns 5.703782

MOD

MOD is a row function that divides two LONG values and returns the remainder value of type LONG (the result is truncated to 0 decimal places).

MOD(dividend,divisor)

Returns one value per row of type LONG.

dividend
Required. A field or expression of type LONG.

divisor
Required. A field or expression of type LONG.
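DIV and MOD are complementary: the quotient from DIV and the remainder from MOD together reconstruct the dividend. The Python sketch below illustrates that relationship. It assumes the quotient is truncated toward zero, which is one reading of "truncated to 0 decimal places"; the behavior for negative operands is not stated in the source. The function names div and mod are hypothetical.

```python
def div(dividend, divisor):
    """Integer quotient, truncated toward zero (assumed rounding direction)."""
    quotient = abs(dividend) // abs(divisor)
    same_sign = (dividend < 0) == (divisor < 0)
    return quotient if same_sign else -quotient


def mod(dividend, divisor):
    """Remainder that satisfies dividend == divisor * div(...) + mod(...)."""
    return dividend - divisor * div(dividend, divisor)


file_size = 5000  # hypothetical value in bytes
# 5000 bytes is 4 whole kilobytes with 904 bytes left over.
print(div(file_size, 1024), mod(file_size, 1024))
```

For positive operands, the result does not depend on the rounding assumption.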
Cast the value of the file_size field to LONG and return the remainder after dividing by 1024:
MOD(TO_LONG(file_size),1024)

POW

POW is a row function that raises a numeric value to the power (exponent) of another numeric value and returns a value of type DOUBLE.

POW(index,power)

Returns one value per row of type DOUBLE.

index
Required. A field or expression of a numeric type.

power
Required. A field or expression of a numeric type.

Calculate the compound annual growth rate (CAGR) percentage for a given investment over a five year span:
100 * (POW(end_value/start_value, 0.2) - 1)

Calculate the square of the Value field:
POW(Value,2)

Calculate the square root of the Value field:
POW(Value,0.5)

The following expression returns 1:
POW(0,0)

ROUND

ROUND is a row function that rounds a DOUBLE value to the specified number of decimal places.

ROUND(double,number_decimal_places)

Returns one value per row of type DOUBLE.

double
Required. A field or expression of type DOUBLE.

number_decimal_places
Required. An integer that specifies the number of decimal places to round to.

Round the number 32.4678954 to two decimal places:
ROUND(32.4678954,2)
returns 32.47

Data Type Conversion Functions

Data type conversion functions allow you to cast data values from one data type to another. These functions are used implicitly whenever you set the data type of a field or column in the Platfora user interface. The supported data types are: INTEGER, LONG, DOUBLE, FIXED, DATETIME, and STRING.

EPOCH_MS_TO_DATE

EPOCH_MS_TO_DATE is a row function that converts LONG values to DATETIME values, where the input number represents the number of milliseconds since the epoch.

EPOCH_MS_TO_DATE(long_expression)

Returns one value per row of type DATETIME in UTC format yyyy-MM-dd HH:mm:ss:SSS Z.

long_expression
Required.
A field or expression of type LONG representing the number of milliseconds since the epoch datetime (January 1, 1970 00:00:00:000 GMT).

Convert a number representing the number of milliseconds from the epoch to a human-readable date and time:

    EPOCH_MS_TO_DATE(1360260240000) returns 2013-02-07T18:04:00:000Z or February 7, 2013 18:04:00:000 GMT

Or if your data is in seconds instead of milliseconds:

    EPOCH_MS_TO_DATE(1360260240 * 1000) returns 2013-02-07T18:04:00:000Z or February 7, 2013 18:04:00:000 GMT

TO_CURRENCY

This function is deprecated. Use the TO_FIXED function instead.

TO_DATE

TO_DATE is a row function that converts STRING values to DATETIME values, and specifies the format of the date and time elements in the string.

    TO_DATE(string_expression,"date_format")

Returns one value per row of type DATETIME (which by definition is in UTC).

string_expression
Required. A field or expression of type STRING.

date_format
Required. A pattern that describes how the date is formatted. Use the following pattern symbols to define your date format. The count and ordering of the pattern letters determines the datetime format. Any characters in the pattern that are not in the ranges of a-z and A-Z are treated as quoted delimiter text. For instance, characters such as slash (/) or colon (:) will appear in the resulting output even if they are not escaped with single quotes.

Table 35: Date Pattern Symbols

Symbol | Meaning | Presentation | Examples
G | era | text | AD
C | century of era (0 or greater) | number | 20
Y | year of era (0 or greater) | year | 1996
x | week year | year | 1996
w | week number of week year | number | 27
e | day of week (number) | number | 2
E | day of week (name) | text | Tuesday; Tue
y | year | year | 1996
D | day of year | number | 189
M | month of year | month | July; Jul; 07
d | day of month | number | 10
a | half day of day | text | PM
K | hour of half day (0-11) | number | 0
h | clock hour of half day (1-12) | number | 12
H | hour of day (0-23) | number | 0
k | clock hour of day (1-24) | number | 24
m | minute of hour | number | 30
s | second of minute | number | 55
S | fraction of second | number | 978
z | time zone | text | Pacific Standard Time; PST
Z | time zone offset/id | zone | -0800; -08:00; America/Los_Angeles
' | escape character for text-based delimiters | delimiter |
'' | literal representation of a single quote | literal | '

Notes:

Y, x, y: Numeric presentation for year and week year fields is handled specially. For example, if the count of 'y' is 2, the year will be displayed as the zero-based year of the century, which is two digits.

E: If the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used.

M: If the number of pattern letters is 3 or more, the text form is used; otherwise the number is used.

z: If the number of pattern letters is 4 or more, the full form is used; otherwise a short or abbreviated form is used.

Z: 'Z' outputs the offset without a colon, 'ZZ' outputs the offset with a colon, 'ZZZ' or more outputs the zone id.

Define a new DATETIME computed field based on the order_date base field, which contains timestamps in the format of 2014.07.10 at 15:08:56 PDT:

    TO_DATE(order_date,"yyyy.MM.dd 'at' HH:mm:ss z")

Define a new DATETIME computed field by first combining individual month, day, year, and depart_time fields (using CONCAT), and performing a transformation on depart_time to make sure three-digit times are converted to four-digit times (using REGEX_REPLACE):

    TO_DATE(CONCAT(month,"/",day,"/",year,":",REGEX_REPLACE(depart_time,"\b(\d{3})\b", dd/yyyy:HHmm")

Define a new DATETIME computed field based on the created_at base field, which contains timestamps in the format of Sat Jan 25 16:35:23 +0800 2014 (this is the timestamp format returned by Twitter's API):

    TO_DATE(created_at,"EEE MMM dd HH:mm:ss Z yyyy")

TO_DOUBLE

TO_DOUBLE is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to DOUBLE (decimal) values.

    TO_DOUBLE(expression)

Returns one value per row of type DOUBLE.

expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or DOUBLE.

Convert the values of the average_rating field to a double data type:

    TO_DOUBLE(average_rating)

Convert the average_rating field to a double data type, but first transform the occurrence of any N/A string values to NULL values using a CASE expression:

    TO_DOUBLE(CASE WHEN average_rating="N/A" THEN NULL ELSE average_rating END)

TO_FIXED

TO_FIXED is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to fixed-decimal values. Using a FIXED data type to represent monetary values allows you to calculate and aggregate monetary values with accuracy to a ten-thousandth of a monetary unit.

    TO_FIXED(expression)

Returns one value per row of type FIXED (fixed-decimal value to 10000th accuracy).
expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or DOUBLE.

Convert the opening_price field to a fixed decimal data type:

    TO_FIXED(opening_price)

Convert the sale_price field to a fixed decimal data type, but first transform the occurrence of any N/A string values to NULL values using a CASE expression:

    TO_FIXED(CASE WHEN sale_price="N/A" THEN NULL ELSE sale_price END)

TO_INT

TO_INT is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to INTEGER (whole number) values. When converting DOUBLE values, everything after the decimal will be truncated (not rounded up or down).

    TO_INT(expression)

Returns one value per row of type INTEGER.

expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or DOUBLE.

Convert the values of the average_rating field to an integer data type:

    TO_INT(average_rating)

Convert the flight_duration field to an integer data type, but first transform the occurrence of any N/A string values to NULL values using a CASE expression:

    TO_INT(CASE WHEN flight_duration="N/A" THEN NULL ELSE flight_duration END)

TO_LONG

TO_LONG is a row function that converts STRING, INTEGER, LONG, or DOUBLE values to LONG (whole number) values. When converting DOUBLE values, everything after the decimal will be truncated (not rounded up or down).

    TO_LONG(expression)

Returns one value per row of type LONG.

expression
Required. A field or expression of type STRING (must be numeric characters), INTEGER, LONG, or DOUBLE.
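The truncation rule shared by TO_INT and TO_LONG, and the CASE pattern used to turn N/A strings into NULL before converting, can be illustrated with a small Python sketch. This is an emulation for illustration only: `na_to_null` and `to_whole_number` are hypothetical names, `None` stands in for NULL, and the sketch assumes that a NULL input yields NULL and that truncation means dropping the fractional part toward zero.

```python
import math


def na_to_null(value):
    """Mirror CASE WHEN x="N/A" THEN NULL ELSE x END (None stands in for NULL)."""
    return None if value == "N/A" else value


def to_whole_number(value):
    """Sketch of the documented TO_INT / TO_LONG conversion rule."""
    if value is None:
        # Assumption: a NULL input produces a NULL output.
        return None
    if isinstance(value, int):
        # INTEGER and LONG inputs are already whole numbers.
        return value
    # STRING (numeric characters) and DOUBLE inputs: drop the fractional
    # part instead of rounding. Going through float() is fine for a sketch
    # but would lose precision for very large numeric strings.
    return math.trunc(float(value))


print(to_whole_number(4.7))                 # truncated, not rounded up
print(to_whole_number(na_to_null("N/A")))   # N/A becomes None (NULL)
```

Truncating differs from rounding: 4.7 converts to 4, whereas rounding would give 5.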
Convert the values of the average_rating field to a long data type:

    TO_LONG(average_rating)

Convert the average_rating field to a long data type, but first transform the occurrence of any N/A string values to NULL values using a CASE expression:

    TO_LONG(CASE WHEN average_rating="N/A" THEN NULL ELSE average_rating END)

TO_STRING

TO_STRING is a row function that converts values of other data types to STRING (character) values.

    TO_STRING(expression)
    TO_STRING(datetime_expression,date_format)

Returns one value per row of type STRING.

expression
A field or expression of type FIXED, STRING, INTEGER, LONG, or DOUBLE.

datetime_expression
A field or expression of type DATETIME.

date_format
If converting a DATETIME to a string, a pattern that describes how the date is formatted. See TO_DATE for the date format patterns.

Convert the values of the sku_number field to a string data type:

    TO_STRING(sku_number)

Convert values in the age column into range-based groupings (binning), and cast the output values to a STRING:

    TO_STRING(CASE WHEN age <= 25 THEN "0-25" WHEN age <= 50 THEN "26-50" ELSE "over 50" END)

Convert the values of a timestamp datetime field to a string, where the timestamp values are in the format of 2002.07.10 at 15:08:56 PDT:

    TO_STRING(timestamp,"yyyy.MM.dd 'at' HH:mm:ss z")

Aggregate Functions

An aggregate function groups the values of multiple rows together based on some defined input expression. Aggregate functions return one value for a group of rows, and are only valid for defining measures in Platfora. Aggregate functions cannot be combined with row functions.

AVG

AVG is an aggregate function that returns the average of all valid numeric values. It sums all values in the provided expression and divides by the number of valid (NOT NULL) rows. If you want to compute an average that includes all values in the row count (including NULL values), you can use a SUM/COUNT expression instead.
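The difference between AVG and a SUM/COUNT expression comes down to which row count is used as the divisor. The Python sketch below illustrates this with made-up sample values. The function names are hypothetical, `None` stands in for NULL, and the sketch assumes that SUM skips NULL values while COUNT counts every row.

```python
def avg_valid(values):
    """AVG-style average: divide by the number of valid (non-NULL) rows."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(valid) if valid else None


def avg_all_rows(values):
    """SUM/COUNT-style average: divide by the total number of rows."""
    valid = [v for v in values if v is not None]
    return sum(valid) / len(values) if values else None


# Hypothetical page_views column with two NULL rows out of four.
page_views = [10, None, 20, None]

print(avg_valid(page_views))     # 30 / 2 valid rows
print(avg_all_rows(page_views))  # 30 / 4 total rows
```

With two of four rows NULL, the AVG-style result is 15.0 while the SUM/COUNT-style result is 7.5, so the choice matters whenever NULL rows should pull the average down.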
    AVG(numeric_field)

Returns a value of type DOUBLE.

numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate functions can only take field names as input.

Get the average of the valid sale_amount field values:

    AVG(sale_amount)

Get the average of the valid net_worth field values in the billionaires dataset, which resides in the samples namespace:

    AVG([(samples) billionaires].net_worth)

Get the average of all page_views field values in the web_logs dataset (including NULL values):

    SUM(page_views)/COUNT(web_logs)

COUNT

COUNT is an aggregate function that returns the number of rows in a dataset.

    COUNT([namespace_name]dataset_name)

Returns a value of type INTEGER.

namespace_name
Optional. The name of the namespace in which the dataset resides. If not specified, uses the default namespace.

dataset_name
Required. The name of the dataset for which to obtain a count of rows. If you want to count rows of a down-stream dataset that is related to the current dataset, you can specify the hierarchy of dataset names in the format of: parent_dataset_name.child_dataset_name.[...]

Count the rows in the sales dataset:

    COUNT(sales)

Count the rows in the billionaires dataset, which resides in the samples namespace:

    COUNT([(samples) billionaires])

Count the rows in the customers dataset, which is a related dataset down-stream of sales:

    COUNT(sales.customers)

COUNT_VALID

COUNT_VALID is an aggregate function that returns the number of rows for which the given expression is valid (excludes NULL values).

    COUNT_VALID(field)

Returns a numeric value of type INTEGER.

field
Required. A field name. Unlike row functions, aggregate functions can only take field names as input.

Count the valid values in the page_views field:

    COUNT_VALID(page_views)

DISTINCT

DISTINCT is an aggregate function that returns the number of distinct values for the given expression.
    DISTINCT(field)

Returns a numeric value of type INTEGER.

field
Required. A field name. Unlike row functions, aggregate functions can only take field names as input.

Count the unique values of the user_id field in the currently selected dataset:

    DISTINCT(user_id)

Count the unique values of the name field in the billionaires dataset, which resides in the samples namespace:

    DISTINCT([(samples) billionaires].name)

Count the unique values of the customer_id field in the customers dataset, which is a related dataset down-stream of web sales:

    DISTINCT([web sales].customers.customer_id)

MAX

MAX is an aggregate function that returns the largest value from the given input expression.

    MAX(numeric_or_datetime_field)

Returns a numeric or datetime value of the same type as the input expression.

numeric_or_datetime_field
Required. A field of type INTEGER, LONG, DOUBLE, FIXED, or DATETIME. Unlike row functions, aggregate functions can only take field names as input.

Get the highest value from the sale_amount field:

    MAX(sale_amount)

Get the latest date from the Session Timestamp datetime field:

    MAX([Session Timestamp])

MIN

MIN is an aggregate function that returns the smallest value from the given input expression.

    MIN(numeric_or_datetime_field)

Returns a numeric or datetime value of the same type as the input expression.

numeric_or_datetime_field
Required. A field of type INTEGER, LONG, DOUBLE, FIXED, or DATETIME. Unlike row functions, aggregate functions can only take field names as input.

Get the lowest value from the sale_amount field:

    MIN(sale_amount)

Get the earliest date from the Session Timestamp datetime field:

    MIN([Session Timestamp])

SUM

SUM is an aggregate function that returns the total of all values from the given input expression.

    SUM(numeric_field)

Returns a numeric value of the same type as the input expression.
numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate functions can only take field names as input.

Add the values of the sale_amount field:

    SUM(sale_amount)

Add the values of the session count field in the users dataset, which is a related dataset down-stream of clicks:

    SUM(clicks.users.[session count])

STDDEV

STDDEV is an aggregate function that calculates the population standard deviation for a group of numeric values. Standard deviation is the square root of the variance.

    STDDEV(numeric_field)

Returns a value of type DOUBLE. If there are fewer than two values in the input group, returns NULL.

numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate functions can only take field names as input.

Calculate the standard deviation of the values contained in the sale_amount field:

    STDDEV(sale_amount)

VARIANCE

VARIANCE is an aggregate function that calculates the population variance for a group of numeric values. Variance measures the amount by which all values in a group vary from the average value of the group. Data with low variance contains values that are identical or similar. Data with high variance contains values that are not similar. Variance is calculated as the average of the squares of the deviations from the mean. Squaring the deviations ensures that negative and positive deviations do not cancel each other out.

    VARIANCE(numeric_field)

Returns a value of type DOUBLE. If there are fewer than two values in the input group, returns NULL.

numeric_field
Required. A field of type INTEGER, LONG, DOUBLE, or FIXED. Unlike row functions, aggregate functions can only take field names as input.

Get the population variance of the values contained in the sale_amount field:

    VARIANCE(sale_amount)

ROLLUP and Window Functions

Window functions can only be used in conjunction with ROLLUP.
ROLLUP is a modifier to an aggregate expression that determines the partitioning and ordering of a rowset before the associated aggregate function or window function is applied. ROLLUP defines a window or user-specified set of rows within a query result set. A window function then computes a value for each row in the window. You can use window functions to compute aggregated values such as moving averages, cumulative aggregates, running totals, or top-N-per-group results.

ROLLUP

ROLLUP is a modifier to an aggregate function that turns a regular aggregate function into a windowed, partitioned, or adaptive aggregate function. This is useful when you want to compute an aggregation over a subset of rows within the overall result of a viz query.

    ROLLUP aggregate_expression
      [ WHERE input_group_condition [...] ]
      [ TO ([partitioning_columns])
        [ ORDER BY (ordering_column [ASC | DESC])
          ROWS|RANGE window_boundary [window_boundary]
          | BETWEEN window_boundary AND window_boundary ] ]

where window_boundary can be one of:

    UNBOUNDED PRECEDING
    value PRECEDING
    value FOLLOWING
    UNBOUNDED FOLLOWING

A regular measure is the result of an aggregation (such as SUM or AVG) applied to some fact or metric column of a dataset. For example, suppose we had a dataset with the following rows and columns:

Date | Sale Amount | Product | Region
05/01/2013 | 100 | gadget | west
05/01/2013 | 200 | widget | east
06/01/2013 | 100 | gadget | east
06/01/2013 | 400 | widget | west
07/01/2013 | 300 | widget | west
07/01/2013 | 200 | gadget | east

To define a regular measure called Total Sales, we would use the expression:

    SUM([Sale Amount])

When this measure is used in a visualization, the group of input records passed into the aggregate calculation is determined by the dimensions selected by the user when they create the viz.
For example, if the user chose Region as a dimension in the viz, there would be two input groups for which the measure would be calculated:

Region | Total Sales
east | 500
west | 800

If an aggregate expression includes a ROLLUP clause, the column(s) specified in the TO clause of the ROLLUP expression determine the additional partitions over which to compute the aggregate expression. It divides the overall rows returned by the viz query into subsets or buckets, and then computes the aggregate expression within each bucket. Every ROLLUP expression has implicit partitioning defined: an absent TO clause treats the entire result set as one partition; an empty TO clause partitions by whatever dimension columns are present in the viz query.

The WHERE clause is used to filter the input rows that flow into each partition. Input rows that meet the WHERE clause criteria will be partitioned, and rows that don't will not be partitioned.

The ORDER BY with a RANGE or ROWS clause is used to define a window frame within each partition over which to compute the aggregate expression.

When a ROLLUP measure is used in a visualization, the aggregate calculation is computed across a set of input rows that are related to, but separate from, the other dimension(s) used in the viz. This is similar to the type of calculation that is done with a regular measure. However, unlike a regular measure, a ROLLUP measure does not cause the input rows to be grouped into a single result set; the input rows still retain their separate identities. The ROLLUP clause determines how the input rows are split up for processing by the ROLLUP's aggregate function.

ROLLUP expressions can be written to make the partitioning adaptive to whatever dimension columns are selected in the visualization. This is done by using a reference name as the partitioning column, as opposed to a regular column. For example, suppose we wanted to be able to calculate the total sales for any granularity of date.
We could create an adaptive measure called Rollup Sales to Date that partitions total sales by date as follows:

    ROLLUP SUM([Sale Amount]) TO (Date)

When this measure is used in a visualization, the group of input records passed into the aggregate calculation is determined by the dimension fields selected by the user in the viz, but partitioned by the granularity of Date selected by the user. For example, if the user chose the dimensions Date.Month and Region in the viz, then total sales would be grouped by month and region, but the ROLLUP measure expression would aggregate the sales by month only. Notice that the results for the east and west regions are the same. This is because the aggregation expression is only considering rows that share the same month when calculating the sum of sales.

Month | Region | Rollup Sales to Date
May 2013 | east | 300
May 2013 | west | 300
June 2013 | east | 500
June 2013 | west | 500
July 2013 | east | 500
July 2013 | west | 500

Suppose within the date partition, we wanted to calculate the cumulative total day to day. We could define a window measure called Running Total to Date that looks at each day and all preceding days as follows:

    ROLLUP SUM([Sale Amount]) TO (Date) ORDER BY (Date.Date) ROWS UNBOUNDED PRECEDING

When this measure is used in a visualization, the group of input records passed into the aggregate calculation is determined by the dimension fields selected by the user in the viz, and partitioned by the granularity of Date selected by the user. Within each partition the rows are ordered chronologically (by Date.Date), and the sum amount is then calculated per date partition by looking at the current row (or mark), and all rows that come before it within the partition.

For example, if the user chose the dimension Date.Month in the viz, then the ROLLUP measure expression would cumulatively aggregate the sales within each month.
Month | Date.Date | Running Total to Date
May 2013 | 2013-05-01 | 300
June 2013 | 2013-06-01 | 500
July 2013 | 2013-07-01 | 500

Returns a numeric value per partition based on the output type of the aggregate_expression.

aggregate_expression
Required. An expression containing an aggregate or window function. Simple aggregate functions such as COUNT, AVG, SUM, MIN, and MAX are supported. Window functions such as RANK, DENSE_RANK, and NTILE are supported and can only be used in conjunction with ROLLUP. Complex aggregate functions such as STDDEV and VARIANCE are not supported.

WHERE input_group_condition
The WHERE clause limits the group of input rows over which to compute the aggregate expression. The input group condition is a Boolean (true or false) condition defined using a comparison operator expression. Any row that does not satisfy the condition will be excluded from the input group used to calculate the aggregated measure value. For example (note that datetime values must be specified in yyyy-MM-dd format):

    WHERE Date.Date BETWEEN 2012-06-01 AND 2012-07-31
    WHERE Date.Year BETWEEN 2009 AND 2013
    WHERE Company LIKE("Plat*")
    WHERE Code IN("a","b","c")
    WHERE Sales < 50.00
    WHERE Age >= 21

You can specify multiple WHERE clauses in a ROLLUP expression.

TO ([partitioning_columns])
The TO clause is used to specify the dimension column(s) used to partition a group of input rows. This allows you to calculate a measure value for a specific dimension group (a subset of input rows) that is somehow related to the other dimension groups used in a visualization (all input rows). It is possible to define an empty group (meaning all rows) by using empty parentheses.

When used in a visualization, measure values are computed for groups of input rows that return the same value for the columns specified in the partitioning list.
For example, if the Date.Month column is used as a partitioning column, then all records that have the same value for Date.Month will be grouped together in order to calculate the measure value. The aggregate expression is applied to the group specified in the TO clause independently of the other dimension groupings used in the visualization. Note that the partitioning column(s) specified in the TO clause of an adaptive measure expression must also be included as dimensions (or grouping columns) in the visualization.

A partitioning column can also be the name of a reference field. Using a reference field allows the partition criteria to dynamically adapt based on any field of the referenced dataset that is used in a viz. For example, if the partition column is a reference field pointing to the Date dimension, then any subfield of Date (Date.Year, Date.Month, etc.) can be used as the partitioning column by selecting it in a viz.

A TO clause with an empty partitioning list treats each mark in the result set as an input group. For example, if the viz includes the Month and Region columns, then TO() would be equivalent to TO(Month,Region).

ORDER BY (ordering_column)
The optional ORDER BY clause orders the input rows using the values in the specified column within each partition identified in the TO clause. Use the ORDER BY clause along with the ROWS or RANGE clauses to define windows over which to compute the aggregate function. This is useful for computing moving averages, cumulative aggregates, running totals, or a top value per group of input rows. The ordering column specified in the ORDER BY clause can be a dimension, measure, or an aggregate expression (for example ORDER BY (SUM(Sales))). If the ordering column is a dimension, it must be included in the viz.

By default, rows are sorted in ascending order (low to high values).
You can use the DESC keyword to sort in descending order (high to low values).

ROWS | RANGE
Required when using ORDER BY. Further limits the rows within the partition by specifying start and end points within the partition. This is done by specifying a range of rows with respect to the current row either by logical association (RANGE) or physical association (ROWS). Use either a ROWS or RANGE clause to express the window boundary (the set of input rows in each partition, relative to the current row, over which to compute the aggregate expression). The window boundary can include one, several, or all rows of the partition.

When using the RANGE clause, the ordering column used in the ORDER BY clause must be a sub-column of a reference to Platfora's built-in Date dimension dataset.

window_boundary
A window boundary is required when using either ROWS or RANGE. This defines the set of rows, relative to the current row, over which to compute the aggregate expression. The row order is based on the ordering specified in the ORDER BY clause.

A PRECEDING clause defines a lower window boundary (the number of rows to include before the current row). The FOLLOWING clause defines an upper window boundary (the number of rows to include after the current row). The window boundary expression must include either a PRECEDING or FOLLOWING clause, or both. If PRECEDING is omitted, the current row is considered the first row in the window. Similarly, if FOLLOWING is omitted, the current row is considered the last row in the window. The UNBOUNDED keyword includes all rows in the direction specified. When you need to specify both a start and end of a window, use the BETWEEN and AND keywords.

For example:

ROWS 2 PRECEDING means that the window is three rows in size, starting two rows before the current row and ending with the current row.
ROWS BETWEEN 2 PRECEDING AND 5 FOLLOWING means that the window is eight rows in size: the two rows preceding the current row, the current row, and the five rows following the current row.

The current row is included in the set of rows by default. You can exclude the current row from the window by specifying a window start and end point before or after the current row. For example:

ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING starts the window with all rows that come before the current row, and ends the window one row before the current row, thereby excluding the current row from the window.

Calculate the percentage of flight records in the same departure date period. Note that the departure_date field is a reference to the Date dataset, meaning that the group to which the measure is applied can adapt to any downstream field of departure_date (departure_date.Year, departure_date.Month, and so on). When used in a viz, this will calculate the percentage of flights for each dimension group in the viz that share the same value for departure_date:

    100 * COUNT(Flights) / ROLLUP COUNT(Flights) TO ([Departure Date])

Normalize the number of flights using the carrier American Airlines (AA) as the benchmark. This will allow you to compare the number of flights for other carriers against the fixed baseline number of flights for AA (if AA = 100 percent, then all other carriers will fall either above or below that percentage):

    100 * COUNT(Flights) / ROLLUP COUNT(Flights) WHERE [Carrier Code]="AA"

Calculate a generic percentage of total sales. When this measure is used in a visualization, it will show the percentage of total sales that a mark in the viz is contributing to the total for all marks in the viz. The input rows depend on the dimensions selected in the viz.
    100 * SUM(sales) / ROLLUP SUM(sales) TO ()

Calculate the cumulative total of sales for a given year on a month-to-month basis (year-to-month sales totals):

    ROLLUP SUM(sales) TO (Date.Year) ORDER BY (Date.Month) ROWS UNBOUNDED PRECEDING

Calculate the cumulative total of sales (for all input rows) for all previous years, but exclude the current year from the total:

    ROLLUP SUM(sales) TO () ORDER BY (Date.Year) ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING

DENSE_RANK

DENSE_RANK is a windowing aggregate function that orders rows by a measure value and assigns a rank number to each row in the given partition. Rank positions are not skipped in the event of a tie. DENSE_RANK must be used within a ROLLUP expression.

    ROLLUP DENSE_RANK() TO ([partitioning_column])
      ORDER BY (measure_expression [ASC | DESC])
      ROWS|RANGE window_boundary [window_boundary]
      | BETWEEN window_boundary AND window_boundary

where window_boundary can be one of:

    UNBOUNDED PRECEDING
    value PRECEDING
    value FOLLOWING
    UNBOUNDED FOLLOWING

DENSE_RANK is a window aggregate function used to assign a ranking number to each row in a group. If multiple rows have the same ranking value (there is a tie), then the tied rows are given the same rank value and subsequent rank positions are not skipped.

The TO clause of the ROLLUP is used to specify the dimension column(s) used to partition a group of input rows. To define a global ranking that can adapt to any dimension groupings used in a viz, specify an empty TO clause.

The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are ranked. The ORDER BY clause should specify the measure field for which you want to calculate the ranks. The ranked rows in the partition are numbered starting at one.

For example, suppose we had a dataset with the following rows and columns and you want to rank the Quarters and Regions according to the values in the Sales column.
Quarter | Region | Sales
2010 Q1 | North | 100
2010 Q1 | South | 200
2010 Q1 | East | 300
2010 Q1 | West | 400
2010 Q2 | North | 400
2010 Q2 | South | 250
2010 Q2 | East | 150
2010 Q2 | West | 250

Supposing the lens has an existing measure field called Sales(Sum), you could then define a measure called Sales_Dense_Rank using the following expression:

    ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

When you include the Quarter, Region, and Sales_Dense_Rank columns in the viz, you get the following data points. Notice that tied values are given the same rank number and no rank positions are skipped:

Quarter | Region | Sales_Dense_Rank
2010 Q1 | North | 6
2010 Q1 | South | 4
2010 Q1 | East | 2
2010 Q1 | West | 1
2010 Q2 | North | 1
2010 Q2 | South | 3
2010 Q2 | East | 5
2010 Q2 | West | 3

Returns a value of type LONG.

ROLLUP
Required. DENSE_RANK must be used within a ROLLUP expression in place of the aggregate_expression of the ROLLUP. The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the window function. An empty TO calculates the window function over all rows in the query as one group. The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.

Rank the sum of all sales in descending order, so the highest sales is given the ranking of 1:

    ROLLUP DENSE_RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

Rank the sum of all sales within a given quarter in descending order, so the highest sales in each quarter is given the ranking of 1:

    ROLLUP DENSE_RANK() TO (Quarter) ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

NTILE

NTILE is a windowing aggregate function that divides a partitioned group of rows into the specified number of buckets, and returns the bucket number to which the current row belongs. NTILE must be used within a ROLLUP expression.
    ROLLUP NTILE(integer) TO ([partitioning_column])
      ORDER BY (measure_expression [ASC | DESC])
      ROWS|RANGE window_boundary [window_boundary]
      | BETWEEN window_boundary AND window_boundary

where window_boundary can be one of:

    UNBOUNDED PRECEDING
    value PRECEDING
    value FOLLOWING
    UNBOUNDED FOLLOWING

NTILE is a window aggregate function typically used to calculate percentiles. A percentile (or centile) is a measure used in statistics indicating the value below which a given percentage of records in a group falls. For example, the 20th percentile is the value (or score) below which 20 percent of the records may be found. The term percentile is often used in the reporting of test scores. For example, if a score is in the 86th percentile, it is higher than 86% of the other scores. The 25th percentile is also known as the first quartile (Q1), the 50th percentile as the median or second quartile (Q2), and the 75th percentile as the third quartile (Q3). In general, percentiles, deciles and quartiles are specific types of ntiles.

NTILE must be used within a ROLLUP expression in place of the aggregate_expression of the ROLLUP.

The TO clause of the ROLLUP is used to specify a fixed dimension column used to partition a group of input rows. To define a global NTILE ranking that can adapt to any dimension groupings used in a viz, specify an empty TO clause.

The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are divided into buckets. The ORDER BY clause should specify the measure field for which you want to calculate NTILE bucket values. A centile would be 100 buckets, a decile would be 10 buckets, a quartile 4 buckets, and so on. The buckets in the partition are numbered starting at one.

For example, suppose we had a dataset with the following rows and columns and you want to divide the year-to-date sales into four buckets (quartiles) with the highest quartile ranked as 1 and the lowest ranked as 4.
Supposing a measure field called Sum_YTD_Sales has been defined as SUM([Sales YTD]), you could then define a measure called YTD_Sales_Quartile using the following expression:

    ROLLUP NTILE(4) TO() ORDER BY(Sum_YTD_Sales DESC) ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

Name      Gender   Sales YTD   YTD_Sales_Quartile
Chen      F        3,500,000   1
John      M        3,100,000   1
Pete      M        2,900,000   1
Daria     F        2,500,000   2
Jennie    F        2,200,000   2
Mary      F        2,100,000   2
Mike      M        1,900,000   3
Brian     M        1,700,000   3
Molly     F        1,500,000   3
Theresa   F        1,200,000   4
Hans      M          900,000   4
Ben       M          500,000   4

Because the TO clause of the ROLLUP expression is empty, the quartile partitioning adapts to whatever dimensions are used in the viz. For example, if you include the Gender dimension field in the viz, the quartiles would then be computed per gender. The following example divides each gender into buckets, with each gender having 6 year-to-date sales values. The two extra values (the remainder of 6 / 4) are allocated to buckets 1 and 2, which therefore have one more value than buckets 3 and 4.

Name      Gender   Sales YTD   YTD_Sales_Quartile (partitioned by Gender)
Chen      F        3,500,000   1
Daria     F        2,500,000   1
Jennie    F        2,200,000   2
Mary      F        2,100,000   2
Molly     F        1,500,000   3
Theresa   F        1,200,000   4
John      M        3,100,000   1
Pete      M        2,900,000   1
Mike      M        1,900,000   2
Brian     M        1,700,000   2
Hans      M          900,000   3
Ben       M          500,000   4

Returns a value of type LONG.

ROLLUP
Required. NTILE must be used within a ROLLUP expression in place of the aggregate_expression of the ROLLUP. The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the window function. An empty TO calculates the window function over all rows in the query as one group. The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.
integer
Required. An integer that specifies the number of buckets to divide the partitioned rows into.

Perhaps the most common use case for NTILE is to get a global ranking of result rows. For example, if you wanted to get the percentile of Total Records per City, you may think the expression to use is:

    ROLLUP NTILE(100) TO (City) ORDER BY ([Total Records] DESC) ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

However, by leaving the TO clause blank, the percentile buckets can adapt to whatever dimension(s) you use in the viz. To calculate the Total Records percentiles by City, you could define a global Total_Records_Percentiles measure and then use this measure in conjunction with the City dimension in the viz (or any other dimension for that matter).

    ROLLUP NTILE(100) TO () ORDER BY ([Total Records] DESC) ROWS UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING

RANK

RANK is a windowing aggregate function that orders rows by a measure value and assigns a rank number to each row in the given partition. Rank positions are skipped in the event of a tie. RANK must be used within a ROLLUP expression.

    ROLLUP RANK()
    TO ([partitioning_column])
    ORDER BY (measure_expression [ASC | DESC])
    ROWS|RANGE window_boundary [window_boundary] | BETWEEN window_boundary AND window_boundary ]

where window_boundary can be one of:

    UNBOUNDED PRECEDING
    value PRECEDING
    value FOLLOWING
    UNBOUNDED FOLLOWING

RANK is a window aggregate function used to assign a ranking number to each row in a group. If multiple rows have the same ranking value (there is a tie), then the tied rows are given the same rank value and the subsequent rank position is skipped.

The TO clause of the ROLLUP is used to specify the dimension column(s) used to partition a group of input rows. To define a global ranking that can adapt to any dimension groupings used in a viz, specify an empty TO clause.
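The difference between RANK tie handling (rank positions are skipped) and DENSE_RANK tie handling (no positions are skipped) can be checked with a few lines of Java. This is an illustrative sketch only, not Platfora code: the class RankSketch and its methods are hypothetical names, and the input is assumed to be already sorted in the ORDER BY direction.

```java
import java.util.Arrays;

final class RankSketch {

    /** RANK-style numbering: tied rows share a rank and the following positions are skipped. */
    static int[] rank(long[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            // A tied row reuses the previous rank; otherwise the rank is the 1-based row position.
            out[i] = (i > 0 && sorted[i] == sorted[i - 1]) ? out[i - 1] : i + 1;
        }
        return out;
    }

    /** DENSE_RANK-style numbering: tied rows share a rank and no positions are skipped. */
    static int[] denseRank(long[] sorted) {
        int[] out = new int[sorted.length];
        for (int i = 0; i < sorted.length; i++) {
            if (i == 0) {
                out[i] = 1;
            } else {
                // A tied row reuses the previous rank; otherwise the rank increases by exactly one.
                out[i] = (sorted[i] == sorted[i - 1]) ? out[i - 1] : out[i - 1] + 1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        long[] salesDescending = {400, 400, 300, 250, 250, 200, 150, 100};
        System.out.println(Arrays.toString(rank(salesDescending)));
        System.out.println(Arrays.toString(denseRank(salesDescending)));
    }
}
```

For sales values of 400, 400, 300, 250, 250, 200, 150, and 100 sorted in descending order, rank yields 1, 1, 3, 4, 4, 6, 7, 8 (positions 2 and 5 are skipped), while denseRank yields 1, 1, 2, 3, 3, 4, 5, 6.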
The ORDER BY clause of the ROLLUP expression determines how to order the rows before they are ranked. The ORDER BY clause should specify the measure field for which you want to calculate the ranks. The ranked rows in the partition are numbered starting at one.

For example, suppose you have a dataset with the following rows and columns, and you want to rank the Quarters and Regions according to the values in the Sales column.

Quarter   Region   Sales
2010 Q1   North    100
2010 Q1   South    200
2010 Q1   East     300
2010 Q1   West     400
2010 Q2   North    400
2010 Q2   South    250
2010 Q2   East     150
2010 Q2   West     250

Supposing the lens has an existing measure field called Sales(Sum), you could then define a measure called Sales_Rank using the following expression:

    ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

When you include the Quarter, Region, and Sales_Rank columns in the viz, you get the following data points. Notice that tied values are given the same rank number and the rank positions 2 and 5 are skipped:

Quarter   Region   Sales_Rank
2010 Q1   North    8
2010 Q1   South    6
2010 Q1   East     3
2010 Q1   West     1
2010 Q2   North    1
2010 Q2   South    4
2010 Q2   East     7
2010 Q2   West     4

Returns a value of type LONG.

ROLLUP
Required. RANK must be used within a ROLLUP expression in place of the aggregate_expression of the ROLLUP. The TO clause of the ROLLUP expression specifies the dimension group(s) over which to calculate the window function. An empty TO calculates the window function over all rows in the query as one group. The ORDER BY clause of the ROLLUP expression specifies a measure field or aggregate expression.

Rank the sum of all sales in descending order, so the highest sales is given the ranking of 1.

    ROLLUP RANK() TO () ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

Rank the sum of all sales within a given quarter in descending order, so the highest sales in each quarter is given the ranking of 1.
    ROLLUP RANK() TO (Quarter) ORDER BY ([Sales(Sum)] DESC) ROWS UNBOUNDED PRECEDING

ROW_NUMBER

ROW_NUMBER is a windowing aggregate function that assigns a unique, sequential number to each row in a group (partition) of rows, starting at 1 for the first row in each partition. ROW_NUMBER must be used within a ROLLUP expression, which acts as a modifier for ROW_NUMBER. Use the ORDER BY clause of the ROLLUP to specify the column that determines the row numbering.

    ROLLUP ROW_NUMBER()
    TO ([partitioning_column])
    ORDER BY (ordering_column [ASC | DESC])
    ROWS|RANGE window_boundary [window_boundary] | BETWEEN window_boundary AND window_boundary ]

where window_boundary can be one of:

    UNBOUNDED PRECEDING
    value PRECEDING
    value FOLLOWING
    UNBOUNDED FOLLOWING

For example, suppose you have a dataset with the following rows and columns:

Quarter   Region   Sales
2010 Q1   North    100
2010 Q1   South    200
2010 Q1   East     300
2010 Q1   West     400
2010 Q2   North    400
2010 Q2   South    250
2010 Q2   East     150
2010 Q2   West     250

Suppose you want to assign a unique ID to the sales of each region by quarter in descending order. In this example, a measure field is defined called Sum_Sales with the expression SUM(Sales). You could then define a measure called SalesNumber using the following expression:

    ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING

When you include the Quarter, Region, and SalesNumber columns in the viz, you get the following data points:

Quarter   Region   SalesNumber
2010 Q1   North    4
2010 Q1   South    3
2010 Q1   East     2
2010 Q1   West     1
2010 Q2   North    1
2010 Q2   South    2
2010 Q2   East     4
2010 Q2   West     3

Returns a value of type LONG.

ROW_NUMBER takes no input parameters.

Assign a unique ID to the sales of each region by quarter in descending order, so the highest sales is given the number of 1.
    ROLLUP ROW_NUMBER() TO (Quarter) ORDER BY (Sum_Sales DESC) ROWS UNBOUNDED PRECEDING

User Defined Functions (UDFs)

User defined functions (UDFs) allow you to define your own per-row processing logic, and then expose that functionality to users in the Platfora application expression builder.

User defined functions can only be used to implement new row functions, not aggregate functions. If a computed field that uses a UDF is included in a lens, the UDF will be executed once for each row during the lens build process. Keep this in mind when writing UDF Java programs, so that they do not negatively impact lens build resources or execution times.

Writing a Platfora UDF Java Program

User defined functions (UDFs) are written in the Java programming language and implement the Platfora-provided Java interface, com.platfora.udf.UserDefinedFunction.

Verify that any JAR file that the UDF will use is compatible with the existing libraries Platfora uses. You can find those libraries in $PLATFORA_HOME/lib.

To define a user defined function for Platfora, you must have the Java Development Kit (JDK) version 6 or 7 installed on the machine where you plan to do your development.

You will also need the com.platfora.udf.UserDefinedFunction interface Java code from your Platfora master server installation. If you go to the $PLATFORA_HOME/tools/udf directory of your Platfora master server installation, you will find two files:

• platfora-udf.jar – This is the compiled code for the com.platfora.udf.UserDefinedFunction interface. You must link to this jar file (place it in the CLASSPATH) when you compile your UDF Java program.
• /com/platfora/udf/UserDefinedFunction.java – This is the source code for the Java interface that your UDF classes need to implement. The source code is provided as reference documentation of the Platfora UserDefinedFunction interface. You can refer to this file when writing your UDF Java programs.

1. Copy the file $PLATFORA_HOME/tools/udf/platfora-udf.jar to a directory on the machine where you plan to develop and compile your UDF program.

2. Write a Java program that implements the com.platfora.udf.UserDefinedFunction interface. For example, here is a sample Java program that defines a REPEAT_STRING user defined function. This simple function repeats an input string a specified number of times.

    import java.util.List;

    /**
     * Sample user-defined function implementation that demonstrates
     * how to create a REPEAT_STRING function.
     */
    public class RepeatString implements com.platfora.udf.UserDefinedFunction {

        /**
         * Returns the name of the user-defined function.
         * The first character in the name must be a letter,
         * and subsequent characters must be either letters,
         * digits, or underscores. You cannot name your function
         * the same name as an existing Platfora
         * built-in function. Names are case-insensitive.
         */
        @Override
        public String getFunctionName() {
            return "REPEAT_STRING";
        }

        /**
         * Returns one of the following values, reflecting the
         * return type of the user-defined function:
         * DATETIME, DOUBLE, FIXED, INTEGER, LONG, or STRING.
         */
        @Override
        public String getReturnType() {
            return "STRING";
        }

        /**
         * Returns an array of Strings, one for each of the
         * input arguments to the user-defined function,
         * specifying the required data type for each argument.
         * The Strings should be of the following values:
         * DATETIME, DOUBLE, FIXED, INTEGER, LONG, STRING.
         */
        @Override
        public String[] getArgumentTypes() {
            return new String[] { "STRING", "INTEGER" };
        }

        /**
         * Returns a human-readable description of what the function
         * does, to be displayed to Platfora users in the
         * Expression Builder. May return null.
         */
        @Override
        public String getDescription() {
            return "The REPEAT_STRING function returns an input string repeated " +
                "a specified number of times.";
        }

        /**
         * Returns a human-readable description explaining the
         * value that the function returns, to be displayed to
         * Platfora users in the Expression Builder. May return null.
         */
        @Override
        public String getReturnValueDescription() {
            return "Returns one value per row of type STRING";
        }

        /**
         * Returns a human-readable example of the function syntax,
         * to be displayed to Platfora users in the Expression
         * Builder. May return null.
         */
        @Override
        public String getExampleUsage() {
            return "CONCAT(\"It's a \", REPEAT_STRING(\"Mad \",4), \" World\")";
        }

        /**
         * The compute method performs the actual work of evaluating
         * the user-defined function. The method should operate on the
         * argument values provided to calculate the function return value
         * and return a Java object of the appropriate type to represent
         * the return value. The following mapping describes the Java
         * object type that is used to represent each Platfora data type:
         *   DATETIME -> java.util.Date
         *   DOUBLE   -> java.lang.Double
         *   FIXED    -> java.lang.Long
         *   INTEGER  -> java.lang.Integer
         *   LONG     -> java.lang.Long
         *   STRING   -> java.lang.String
         * Note on FIXED type: fixed-precision numbers in Platfora
         * are represented as Longs that have been scaled by a
         * factor of 10,000.
         *
         * In the event that the user-defined function
         * encounters invalid inputs, or the function return value is not
         * defined given the inputs provided, the compute method should return
         * null rather than throwing an exception. The compute method should
         * avoid throwing any exceptions.
         *
         * @param arguments The values of the function inputs.
         *
         * The entries in this list will match the specification
         * provided by getArgumentTypes method in type, number, and order:
         * for example, if getArgumentTypes returned an array of
         * length 3 with the values STRING, DOUBLE, STRING, then
         * the arguments parameter will be a list of 3 Java
         * objects: a java.lang.String, a java.lang.Double, and a
         * java.lang.String. Any of the values within the
         * arguments List may be null.
         */
        @Override
        public String compute(List arguments) {
            // cast the inputs to the correct types
            final String toRepeat = (String) arguments.get(0);
            final Integer numberOfRepeats = (Integer) arguments.get(1);

            // check for invalid inputs
            if (toRepeat == null || numberOfRepeats == null || numberOfRepeats < 0)
                return null;

            // repeat the input string the specified number of times
            final StringBuilder builder = new StringBuilder();
            for (int i = 0; i < numberOfRepeats; i++) {
                builder.append(toRepeat);
            }
            return builder.toString();
        }
    }

3. Compile your .java UDF program file into a .class file (make sure to link to the platfora-udf.jar file or place it in your Java CLASSPATH). The target Java version must be Java 1.6. Compiling with a target of Java 1.7 will result in an error when the UDF is used. For example, to compile the RepeatString.java program using Java 1.6:

    javac -source 1.6 -target 1.6 -cp platfora-udf.jar RepeatString.java

4. Create a Java archive file (.jar) containing your .class file. For example:

    jar cf repeat-string-udf.jar RepeatString.class

After you have written and compiled your UDF Java program, you must then install and enable it on the Platfora master server. See Adding a UDF to the Platfora Expression Builder.
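Because a UDF runs once per row during the lens build and should return null rather than throw, it can be useful to exercise the compute logic locally before packaging the JAR. The following is a minimal sketch under stated assumptions: RepeatStringCheck is a hypothetical stand-alone class that copies the logic of the sample compute method into a static method, so it compiles without platfora-udf.jar. It is not part of the Platfora API and does not replace testing the installed function.

```java
import java.util.Arrays;
import java.util.List;

final class RepeatStringCheck {

    /** Copy of the sample REPEAT_STRING logic, kept free of Platfora dependencies. */
    static String compute(List<Object> arguments) {
        final String toRepeat = (String) arguments.get(0);
        final Integer numberOfRepeats = (Integer) arguments.get(1);

        // Invalid or missing inputs produce null instead of an exception.
        if (toRepeat == null || numberOfRepeats == null || numberOfRepeats < 0) {
            return null;
        }

        final StringBuilder builder = new StringBuilder();
        for (int i = 0; i < numberOfRepeats; i++) {
            builder.append(toRepeat);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        // Normal input, a null argument, and a negative repeat count.
        System.out.println(compute(Arrays.<Object>asList("Mad ", 4)));
        System.out.println(compute(Arrays.<Object>asList(null, 3)));
        System.out.println(compute(Arrays.<Object>asList("Mad ", -1)));
    }
}
```

The null and negative-count cases are the ones most worth checking, since those are the inputs for which the interface documentation asks compute to return null.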
Adding a UDF to the Platfora Expression Builder

After you have written and compiled a user defined function (UDF) Java class, you must install your class on the Platfora master server and enable it so that it can be seen and used in the Platfora expression builder.

This task is performed on the Platfora master server.

Before you begin, you must have written and compiled a Java class for your user defined function. See Writing a Platfora UDF Java Program.

1. Create a directory named extlib in the Platfora data directory on the Platfora master server. For example:

    $ mkdir $PLATFORA_DATA_DIR/extlib

2. Copy the Java archive (.jar) file containing your UDF class to the $PLATFORA_DATA_DIR/extlib directory on the Platfora master server. For example:

    $ cp repeat-string-udf.jar $PLATFORA_DATA_DIR/extlib/

3. Set the Platfora server configuration property, platfora.udf.class.names, so it contains the name of your UDF Java class. If you have more than one class, separate the class names with a comma. For example, to set this property using the platfora-config command-line utility:

    $ $PLATFORA_HOME/bin/platfora-config set --key platfora.udf.class.names --value RepeatString

4. Restart the Platfora server:

    $ platfora-services restart

The user defined function will then be available for defining computed field expressions in the Add Field dialog of the Platfora application.

Due to the way some web browsers cache JavaScript files, the newly added function may not appear in the Functions list for up to 24 hours. However, the function is immediately available for use and recognized by the Expression autocomplete feature.

Regular Expression Reference

Regular expressions vary in complexity, combining basic constructs to describe a string matching pattern. This reference describes the most common regular expression matching patterns, but is not a comprehensive list.
Regular expressions, also referred to as regex or regexp, are a standardized collection of special characters and constructs used for matching strings of text. They provide a flexible and precise language for matching particular characters, words, or patterns of characters.

Platfora regular expressions are based on the pattern matching syntax of the Java programming language. For more in-depth information on writing valid regular expressions, refer to the Java regular expression pattern documentation.

Platfora makes use of regular expressions in the following contexts:

• In computed field expressions that use the REGEX or REGEX_REPLACE functions.
• In PARTITION expression statements for event series processing computed fields.
• In the Regex file parser in data ingest.
• In the data source location path descriptor in data ingest.
• In lens filter expressions.

Regex Literal and Special Characters

The most basic form of regular expression pattern matching is the match of a literal character or string. Regular expressions also have a number of special characters that affect the way a pattern is matched. This section describes the regular expression syntax for referring to literal characters, special characters, non-printable characters (such as a tab or a newline), and special character escaping.

For example, if the regular expression is foo and the input string is foo, the match will succeed because the strings are identical.

Certain characters are reserved for special use in regular expressions. These special characters are often called metacharacters. If you want to use special characters as literal characters, they must be escaped.
Character Name        Character   Reserved For
opening bracket       [           start of a character class
closing bracket       ]           end of a character class
hyphen                -           character ranges within a character class
backslash             \           general escape character
caret                 ^           beginning of string, negating of a character class
dollar sign           $           end of string
period                .           matching any single character
pipe                  |           alternation (OR) operator
question mark         ?           optional quantifier, quantifier minimizer
asterisk              *           zero or more quantifier
plus sign             +           once or more quantifier
opening parenthesis   (           start of a subexpression group
closing parenthesis   )           end of a subexpression group
opening brace         {           start of min/max quantifier
closing brace         }           end of min/max quantifier

There are two ways to force a special character to be treated as an ordinary character:

• Precede the special character with a \ (backslash character). For example, to specify an asterisk as a literal character instead of a quantifier, use \*.
• Enclose the special character(s) within \Q (starting quote) and \E (ending quote). Everything between \Q and \E is then treated as literal characters.

In addition, to escape literal double-quotes in a REGEX() expression, double the double-quotes (""). For example, to extract the inches portion from a height field where example values are 6'2", 5'11":

    REGEX(height, "\'(\d+)""$")

You can use special character sequence constructs to specify non-printable characters in a regular expression. Some of the most commonly used constructs are:

Construct   Matches
\n          newline character
\r          carriage return character
\t          tab character
\f          form feed character

Regex Character Classes

A character class allows you to specify a set of characters, enclosed in square brackets, that can produce a single character match. There are also a number of special predefined character classes (backslash character sequences that are shorthand for the most common character sets).
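Because Platfora regular expressions follow the Java pattern syntax, a character class can be tried out with the standard java.util.regex package before it is used in an expression. The following is a minimal sketch: CharClassSketch and fullMatch are hypothetical names, and Pattern.matches requires the whole input to match, which may differ from how a particular Platfora function applies a pattern.

```java
import java.util.regex.Pattern;

final class CharClassSketch {

    /** Returns true only if the entire input matches the regular expression. */
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // A character class matches exactly one character from the set.
        System.out.println(fullMatch("gr[ae]y", "grey"));
        System.out.println(fullMatch("gr[ae]y", "graey"));

        // A range combined with a single character.
        System.out.println(fullMatch("[0-9X]", "X"));

        // A caret after the opening bracket negates the class.
        System.out.println(fullMatch("[^abc]", "a"));

        // Subtraction: a through z, except for x and q.
        System.out.println(fullMatch("[a-z&&[^xq]]", "m"));
        System.out.println(fullMatch("[a-z&&[^xq]]", "x"));
    }
}
```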
A character class matches only a single character. For example, gr[ae]y will match gray or grey, but not graay or graey. The order of the characters inside the brackets does not matter.

You can use a hyphen inside a character class to specify a range of characters. For example, [a-z] matches a single lower-case letter between a and z. You can also use more than one range, or a combination of ranges and single characters. For example, [0-9X] matches a numeric digit or the letter X. Again, the order of the characters and the ranges does not matter.

A caret following an opening bracket specifies characters to exclude from a match. For example, [^abc] will match any character except a, b, or c.

Construct       Type           Description
[abc]           simple         matches a or b or c
[^abc]          negation       matches any character except a or b or c
[a-zA-Z]        range          matches a through z, or A through Z (inclusive)
[a-d[m-p]]      union          matches a through d, or m through p
[a-z&&[def]]    intersection   matches d, e, or f
[a-z&&[^xq]]    subtraction    matches a through z, except for x and q

Predefined character classes offer convenient shorthands for commonly used regular expressions.

Construct   Description                                                                                Example
.           matches any single character (except newline)                                              .at matches "cat", "hat", and also "bat" in the phrase "batch files"
\d          matches any digit character (equivalent to [0-9])                                          \d matches "3" in "C3PO" and "2" in "file_2.txt"
\D          matches any non-digit character (equivalent to [^0-9])                                     \D matches "S" in "900S" and "Q" in "Q45"
\s          matches any single white-space character (equivalent to [ \t\n\x0B\f\r])                   \sbook matches " book" in "blue book" but nothing in "notebook"
\S          matches any single non-white-space character                                               \Sbook matches "ebook" in "notebook" but nothing in "blue book"
\w          matches any alphanumeric character, including underscore (equivalent to [A-Za-z0-9_])      r\w* matches "rm" and "root"
\W          matches any non-alphanumeric character (equivalent to [^A-Za-z0-9_])                       \W matches "&" in "stmd &", "%" in "100%", and "$" in "$HOME"

POSIX has a set of character classes that denote certain common ranges. They are similar to bracket and predefined character classes, except they take into account the locale (the local language/coding system).

Construct    Description
\p{Lower}    a lower-case alphabetic character, [a-z]
\p{Upper}    an upper-case alphabetic character, [A-Z]
\p{ASCII}    an ASCII character, [\x00-\x7F]
\p{Alpha}    an alphabetic character, [a-zA-Z]
\p{Digit}    a decimal digit, [0-9]
\p{Alnum}    an alphanumeric character, [a-zA-Z0-9]
\p{Punct}    a punctuation character, one of !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
\p{Graph}    a visible character, [\p{Alnum}\p{Punct}]
\p{Print}    a printable character, [\p{Graph}\x20]
\p{Blank}    a space or tab, [ \t]
\p{Cntrl}    a control character, [\x00-\x1F\x7F]
\p{XDigit}   a hexadecimal digit, [0-9a-fA-F]
\p{Space}    a whitespace character, [ \t\n\x0B\f\r]

Regex Line and Word Boundaries

Boundary matching constructs are used to specify where in a string to apply a matching pattern.
For example, you can search for a particular pattern within a word boundary, or search for a pattern at the beginning or end of a line.

Construct   Description                                                                              Example
^           matches from the beginning of a line (multi-line matches are currently not supported)    ^172 will match the "172" in IP address "172.18.1.11" but not in "192.172.2.33"
$           matches from the end of a line (multi-line matches are currently not supported)          d$ will match the "d" in "maid" but not in "made"
\b          matches within a word boundary                                                           \bis\b matches the word "is" in "this is my island", but not the "is" part of "this" or "island". \bis matches both "is" and the "is" in "island", but not in "this".
\B          matches within a non-word boundary                                                       \Bb matches "b" in "sbin" but not in "bash"

Regex Quantifiers

Quantifiers specify how often the preceding regular expression construct should match. There are three classes of quantifiers: greedy, reluctant, and possessive. The difference between greedy, reluctant, and possessive quantifiers involves what part of the string to try for the initial match, and how to retry if the initial attempt does not produce a match.

By default, quantifiers are greedy. A greedy quantifier will first try for a match with the entire input string. If that produces a match, then the match is considered a success, and the engine can move on to the next construct in the regular expression. If the first try does not produce a match, the engine backs off one character at a time until a match is found. So a greedy quantifier checks for possible matches in order from the longest possible input string to the shortest possible input string, recursively trying from right to left.

Adding a ? (question mark) to a greedy quantifier makes it reluctant. A reluctant quantifier will first try for a match from the beginning of the input string, starting with the shortest possible piece of the string that matches the regex construct.
If that produces a match, then the match is considered a success, and the engine can move on to the next construct in the regular expression. If the first try does not produce a match, the engine adds one character at a time until a match is found. So a reluctant quantifier checks for possible matches in order from the shortest possible input string to the longest possible input string, recursively trying from left to right.

Adding a + (plus sign) to a greedy quantifier makes it possessive. A possessive quantifier is like a greedy quantifier on the first attempt (it tries for a match with the entire input string). The difference is that unlike a greedy quantifier, a possessive quantifier does not retry a shorter string if a match is not found. If the initial match fails, the possessive quantifier reports a failed match. It does not make any more attempts.

Greedy   Reluctant   Possessive   Description                                                                                   Example
?        ??          ?+           matches the previous character or construct once or not at all                                st?on matches "son" in "johnson" and "ston" in "johnston" but nothing in "clinton" or "version"
*        *?          *+           matches the previous character or construct zero or more times                                if* matches "if", "iff" in "diff", or "i" in "print"
+        +?          ++           matches the previous character or construct one or more times                                 if+ matches "if", "iff" in "diff", but nothing in "print"
{n}      {n}?        {n}+         matches the previous character or construct exactly n times                                   o{2} matches "oo" in "lookup" and the first two o's in "fooooo" but nothing in "mount"
{n,}     {n,}?       {n,}+        matches the previous character or construct at least n times                                  o{2,} matches "oo" in "lookup" and all five o's in "fooooo" but nothing in "mount"
{n,m}    {n,m}?      {n,m}+       matches the previous character or construct at least n times, but no more than m times        F{2,4} matches "FF" in "#FF0000" and the first four F's in "#FFFFFF"

Regex Capturing Groups

Groups are specified by a pair of parentheses around a subpattern in the regular expression. By placing part of a regular expression inside parentheses, you group that part of the regular expression together. This allows you to apply regex operators and quantifiers to the entire group at once. Besides grouping part of a regular expression together, parentheses also create a capturing group. Capturing groups are used to determine which matching values to save or return from your regular expression.

A regular expression can have more than one group and the groups can be nested. The groups are numbered 1-n from left to right, starting with the first opening parenthesis. There is always an implicit group 0, which contains the entire match. For example, the pattern:

    (a(b*))+(c)

contains three groups:

    group 1: (a(b*))
    group 2: (b*)
    group 3: (c)

By default, a group captures the text that produces a match. The portion of the string matched by the grouped subexpression is captured in memory for later retrieval or use, for example as a backreference.

Capturing Groups and the Regex Line Parser

When you choose the Regex line parser during the Parse Data phase of the data ingest process, Platfora uses capturing groups to determine what parts of the regular expression to return as columns. The Regex line parser applies the user-supplied regular expression against each line in the source file, and returns each capturing group in the regular expression as a column value.

For example, suppose you had user records in a file, and the lines were formatted like this:

    Name: John Smith Address: 123 Main St. Age: 25 Comment: Active
    Name: Sally R. Jones Address: 2 E. El Camino Real Age: 32
    Name: Rod Rogers Address: 55 Elm Street Comment: Suspended

You could use the following regular expression to extract the Full Name, Last Name, Address, Age, and Comment column values:

    Name: (.*\s(\p{Alpha}+)) Address:\s+(.*) Age:\s+([0-9]+)(?:\s+Comment:\s+(.*))?

Capturing Groups and the REGEX Function

The REGEX function can be used to extract a portion of a string value. For the REGEX function, only the value of the first capturing group is returned. For example, if you wanted to match all possible email address strings, but only return the provider portion of the email address from the email field:

    REGEX(email,"^[a-zA-Z0-9._%+-]+@([a-zA-Z0-9._-]+)\.[a-zA-Z]{2,4}$")

Capturing Groups and the REGEX_REPLACE Function

The REGEX_REPLACE function is used to match a string value, and replace matched strings with another value. The REGEX_REPLACE function takes three arguments: an input string, a matching regex, and a replacement regex. Capturing groups can be used to capture backreferences (see Backreferences), but do not control what portions of the match are returned (the entire match is always returned).

Backreferences allow you to capture and reuse a subexpression match inside the same regular expression. You can reuse a capturing group as a backreference by referring to its group number preceded by a backslash (for example, \1 refers to capturing group 1, \2 refers to capturing group 2, and so on).
For example, if you wanted to match a pair of HTML tags and their enclosed text, you could capture the opening tag into a backreference, and then reuse it to match the corresponding closing tag:

    (<([A-Z][A-Z0-9]*)\b[^>]*>.*?</\2>)

This regular expression contains two capturing groups: the outermost capturing group (which captures the entire string), and one which captures the string matched by [A-Z][A-Z0-9]* into backreference number two. This backreference can then be reused with \2 (backslash two) to match the corresponding closing HTML tag.

When you refer to the capturing groups of a previous regular expression, rather than of the same expression, the backreference syntax is slightly different. The backreference group number is preceded by a dollar sign instead of a backslash (for example, $1 refers to capturing group 1 of the previous expression). An example of this would be the REGEX_REPLACE function, which takes two regular expressions: one for the matching string, and one for the replacement string.

The following example matches the values in a phone_number field where phone number values are formatted as xxx.xxx.xxxx, and replaces them with phone number values formatted as (xxx) xxx-xxxx. Notice the backreferences in the replacement expression; they refer to the capturing groups of the previous matching expression:

    REGEX_REPLACE(phone_number,"([0-9]{3})\.([0-9]{3})\.([0-9]{4})","\($1\) $2-$3")

In some cases, you may want to use parentheses to group subpatterns, but not capture text. A non-capturing group starts with (?: (a question mark and colon following the opening parenthesis). For example, h(?:a|i|o)t matches hat or hit or hot, but does not capture the a, i, or o from the subexpression.
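The capturing group, backreference, and non-capturing group behavior described above can be reproduced with the standard java.util.regex classes, on whose syntax Platfora regular expressions are based. The following is a minimal sketch, not Platfora code: GroupSketch and its method names are hypothetical, and the Java calls (Matcher.find, String.replaceAll) are assumed stand-ins for how the REGEX and REGEX_REPLACE functions apply a pattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GroupSketch {

    /** Reformats xxx.xxx.xxxx as (xxx) xxx-xxxx using $n backreferences in the replacement. */
    static String reformatPhone(String phone) {
        return phone.replaceAll("([0-9]{3})\\.([0-9]{3})\\.([0-9]{4})", "\\($1\\) $2-$3");
    }

    /** Returns the first pair of matching HTML tags with enclosed text, or null if there is none. */
    static String firstTagPair(String html) {
        // \2 inside the pattern must match the same text that capturing group 2 matched.
        Matcher m = Pattern.compile("(<([A-Z][A-Z0-9]*)\\b[^>]*>.*?</\\2>)").matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** Number of capturing groups in a pattern; non-capturing (?:...) groups are not counted. */
    static int capturingGroups(String regex) {
        return Pattern.compile(regex).matcher("").groupCount();
    }

    public static void main(String[] args) {
        System.out.println(reformatPhone("555.123.4567"));
        System.out.println(firstTagPair("<B>bold</B> and <I>italic</I>"));
        System.out.println(firstTagPair("<B>bold</I>"));
        System.out.println(capturingGroups("h(?:a|i|o)t"));
        System.out.println(capturingGroups("(a(b*))+(c)"));
    }
}
```

Note that in Java source code each regex backslash is written twice because the backslash is also the escape character of Java string literals; in a Platfora expression the pattern is written with single backslashes, as in the examples in this section.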