Data Integration Console User Manual

Document publish date: 06/04/15
Copyright © GoodData Corporation 2007 - 2015
All Rights Reserved.
This document may not be reproduced, modified or distributed without the
prior written permission of GoodData Corporation.
GOODDATA CORPORATION PROVIDES THIS DOCUMENTATION AS-IS
AND WITHOUT WARRANTY, AND TO THE MAXIMUM EXTENT
PERMITTED, GOODDATA CORPORATION DISCLAIMS ALL IMPLIED
WARRANTIES, INCLUDING WITHOUT LIMITATION THE IMPLIED
WARRANTIES OF MERCHANTABILITY, NON-INFRINGEMENT AND
FITNESS FOR A PARTICULAR PURPOSE.
Table of Contents
Introduction to Data Integration Console
Users of Data Integration Console
Before You Begin
Recommended Practices on Managing Data Loads
Interactions between Data Integration Console and CloudConnect Designer
Accessing Data Integration Console
CloudConnect Resources
CloudConnect Training
CloudConnect Documentation
Managing Data Loading Processes
Data Integration Console Overview Screen
Data Integration Console Projects Screen
Data Integration Console Project Details Screen
Deploying a Process
Scheduling a Process
Schedule a Process on the Data Integration Console
Configuring Schedule Parameters
Defining Project Parameters
Testing Parameter Execution
Parameter Usage Tips
Referencing the Project ID
Configuring Automatic Retry of Failed Processes
Troubleshooting Failed Schedules
Configuring Schedule Sequences
Timing the Schedule
Custom Schedules
Schedule Details
Schedule Execution History
Running Schedules On-Demand
Batch Loading of Data through Data Integration Console
Notification Rules
Configuring Notification Rules
Example Notification Message
Modifying Legacy Recipients through the Gray Pages
Data Integration Console Process Logging
Deleting Graphs and Processes
Introduction to Data Integration Console
The Data Integration Console enables project administrators to manage and track
the data loading processes that are supplying data to their GoodData projects.
Through Data Integration Console, you can perform the following tasks:
• Monitor successful or failed data loads through a single dashboard.
• Manage many data loads within multiple projects at the same time.
• Automate and monitor data loading processes applied to your GoodData
  projects.
• Postpone or disable a scheduled execution.
• Manually trigger ad-hoc executions of any data loading process.
• Set up and receive alerts and notifications related to data loading process
  execution.
• Review historical performance of the data loads.
• Review logs generated during execution.
NOTE: In some situations, you may find it easier to monitor and
schedule data loads from within your own application. For more
information on the GoodData APIs, see GoodData
API Documentation.
Users of Data Integration Console
The Data Integration Console can be used to manage many processes and
schedules by the following types of users:
• ETL Developer. During the implementation of ETL projects, developers
  may find the console useful for troubleshooting issues with process
  execution and monitoring performance of execution runs. Additionally,
  developers can set up notifications for other members of the implementation
  team, so that they are informed of the status of data in the projects under
  development.
• API Developer. Through GoodData APIs, developers can access Data
  Integration Console functionality to manage data population and
  maintenance of one or more projects.
• System administrator. After the project has been migrated to maintenance
  and support, the console provides ongoing information about scheduled
  executions of ETL, either through the console or configured notifications. As
  the volume and number of ETL processes change, administrators can
  adjust schedules so that ETL is processed smoothly. As needed, scheduled
  processes can be stopped, disabled, deleted, or executed on an ad-hoc
  basis.
Before You Begin
Before you begin using Data Integration Console, please verify the following:
1. You are an administrator to at least one GoodData project.
NOTE: All projects to which you have access are displayed in
the console, so all project users can monitor the project's data
loading processes. However, you may modify a project only if
1) it contains at least one process deployed to the platform and
2) you are an Administrator in the project.
2. You are familiar with the ETL graphs of that project, as defined in the
CloudConnect project supplying data to your GoodData project. See
CloudConnect Designer User Manual (PDF).
Recommended Practices on Managing Data Loads
When you are using the Data Integration Console, CloudConnect, or another
method to execute ETL processes, please keep in mind the following important
considerations:
• When deploying a new schedule, you should set up a notification to inform
  you if the data load process has failed. See New Process Schedule.
• After an execution begins, data is being loaded into the system. If the
  execution is stopped, the data that has been loaded remains in the system.
• The GoodData Platform does not prevent you from uploading duplicate
  data. Unless your ETL process has been designed to prevent loading
  duplicates, you should be careful using ad-hoc executions, which may
  create duplicate rows of data.
• Scheduling ETL processes to execute during business hours may impact
  the performance of the projects into which data is being loaded. Where
  possible, schedule regular data loads during off-peak hours.
• All schedule timing is based on UTC. Manual scheduling entries use the
  cron format.
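Because schedule timing is based on UTC, a local off-peak time has to be
converted before it is entered as a cron expression. The following is a minimal
sketch of that conversion, not a GoodData tool. It assumes the common five-field
cron layout (minute, hour, day of month, month, day of week) and a fixed UTC
offset that ignores daylight saving time. The function name daily_cron_utc is
hypothetical.

```python
from datetime import datetime, timedelta, timezone


def daily_cron_utc(hour: int, minute: int, utc_offset_hours: float) -> str:
    """Return a five-field cron entry, in UTC, for a daily local-time run."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    # The date is arbitrary; only the time of day matters for a daily schedule.
    local_run = datetime(2015, 1, 1, hour, minute, tzinfo=local_zone)
    utc_run = local_run.astimezone(timezone.utc)
    return f"{utc_run.minute} {utc_run.hour} * * *"


# A load meant to run at 22:30 in an office at UTC-8 runs at 06:30 UTC.
print(daily_cron_utc(22, 30, -8))  # prints: 30 6 * * *
```

If your region observes daylight saving time, the offset changes during the
year, so the UTC entry may need to be adjusted when the clocks change.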
Interactions between Data Integration Console and CloudConnect Designer
The CloudConnect Designer desktop application is used to develop logical data
models and ETL graphs. All graphs that you have deployed are available in Data
Integration Console.
• A graph is the graphical representation of the set of transformations
  required to extract, transform, and load source data into your GoodData
  project. A graph is the minimum unit of processing that can be executed at
  one time and is specified in a single file. Graphs are defined in
  CloudConnect Designer, from which they are published to one or more
  GoodData projects in the platform.
• A process is one or more graphs of a CloudConnect project deployed into
  the GoodData Platform.
• A schedule is the automated execution of the graphs in a deployed
  process. Schedules can be created only after the process has been
  deployed into the platform; they cannot be created in CloudConnect
  Designer.
In the diagram below, you can review the relationships between graphs,
processes, and the schedules defined in the Data Integration Console to manage
them.
Figure: ETL Graphs, Processes, and Schedules
In the above image, the Data Integration Console manages the definition of the
following items:
• Timing. Associated with each schedule is the interval between executions of
  the ETL graph. Graphs can be scheduled to execute as frequently as every
  fifteen minutes. See Scheduling a Process.
• Schedule parameters. As part of any schedule, you can define specific
  parameters to apply to the graph when it executes. For example, you can
  define separate schedules to pull from specific user accounts in a source
  system by defining schedule parameters for the graph's execution. See
  Configuring Schedule Parameters.
  NOTE: These parameter settings override any settings defined
  for the project parameters, which are defined within
  CloudConnect Designer and are included as part of the
  CloudConnect project definition. See
  https://developer.gooddata.com/article/cloudconnect-using-parameters.
• Graph. Each schedule is associated with an individual graph. Depending
  on how the ETL is designed, however, this graph can be an orchestrator
  graph, which calls a number of other ETL graphs as part of its normal
  operation.
• Notifications (not pictured). For each process, you can define notifications
  to alert stakeholders on the status of process executions. These
  notifications are specific to the selected process only. See Notification
  Rules.
  NOTE: A notification applies to all schedules of the entire
  process. If the process contains multiple schedules, the
  notification should be designed to support all schedules of the
  process.
Process Deployments:
When a process is deployed to the platform, it becomes an unscheduled
process in the Data Integration Console.
NOTE: Unscheduled processes must have schedules
associated with them before they can be executed in the
Console.
• A scheduled process is a process in the Data Integration Console for
  which a recurring schedule has been created. When the process is
  scheduled to execute, it is queued for processing by the GoodData Platform.
• Scheduled processes may also be executed on-demand, although there are
  some considerations to review before executing processes off of their
  normal schedule. See Running Schedules On-Demand.
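The relationships between graphs, processes, and schedules described in this
section can be pictured as a small data model. The sketch below is illustrative
only. The class and field names are hypothetical and do not correspond to any
GoodData API. It encodes two rules from the text: a process is unscheduled
until it has at least one schedule, and each schedule is associated with one
graph of its process.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Schedule:
    graph: str  # each schedule runs one graph (possibly an orchestrator graph)
    cron: str   # timing of the schedule, expressed in UTC


@dataclass
class Process:
    name: str
    graphs: List[str]
    schedules: List[Schedule] = field(default_factory=list)

    @property
    def is_scheduled(self) -> bool:
        """A deployed process stays unscheduled until a schedule is added."""
        return bool(self.schedules)

    def add_schedule(self, graph: str, cron: str) -> Schedule:
        """Attach a schedule to one of the graphs deployed in this process."""
        if graph not in self.graphs:
            raise ValueError(f"{graph!r} is not part of process {self.name!r}")
        schedule = Schedule(graph=graph, cron=cron)
        self.schedules.append(schedule)
        return schedule
```

For example, Process("orders", ["load_orders.grf"]) starts out unscheduled, and
calling add_schedule("load_orders.grf", "0 6 * * *") turns it into a scheduled
process.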
Accessing Data Integration Console
NOTE: You must be an Administrator of any project that you
wish to manage through the Data Integration Console. See
Before You Begin.
You may access the Data Integration Console using one of the following
methods:
• If you are a project administrator of the project currently loaded in the
  GoodData Portal, click the menu that displays your name. Select Data
  Integration Console.
• Project administrators may also click the Go to Administration link in the
  Manage page.
• You may access it via the following URL:
  https://secure.gooddata.com/admin/disc/
  NOTE: Please be sure to include the final slash.
Exiting the Console:
To return to the GoodData Portal displaying the project, select the menu that
displays your name. Then, select Dashboards.
• If you are working with a specific project in Data Integration Console, you
  may click Go to Dashboards at the top of the Project Details screen to
  open the selected project in the GoodData Portal.
CloudConnect Resources
The following resources are available to help you get up and running
with CloudConnect.
CloudConnect Training
GoodData University offers a range of instructor-led online training classes,
including multiple offerings on the CloudConnect Designer.
• Please visit GoodData University.
CloudConnect Documentation
The following documentation resources are available for CloudConnect
Designer, ETL, and the CloudConnect projects fed by the data.
Resource: CloudConnect User Manual
Description: General documentation on the CloudConnect Designer and related
platform components.
Link: CloudConnect Designer User Manual (PDF)

Resource: CloudConnect LDM Modeler Guide
Description: Documentation on the LDM Modeler component of the CloudConnect
Designer.
Link: CloudConnect LDM Modeler Guide (PDF)

Resource: Data Integration Console User Manual
Description: Documentation on Data Integration Console and process management.
Link: Data Integration Console User Manual (PDF)

Resource: MAQL DDL documentation
Description: Developer articles on the Data Definition Language version of
MAQL, which is used for defining logical data models.
Link: http://developer.gooddata.com/reference/maql

Resource: CL Tool documentation
Description: Developer articles on the command-line tool.
Link: http://developer.gooddata.com/reference/cltool

Resource: Developer Portal
Description: Documentation on implementing schedules and processes.
Link: http://developer.gooddata.com/article/scheduling-and-notifications
Managing Data Loading Processes
Through Data Integration Console, you can manage the loading of data into your
projects, as defined by your processes, including tracking their execution,
reviewing logs, or scheduling them for regular execution.
When you log in, you are placed in the Overview screen, where you can review
the current counts for failed, running, scheduled, and successful executions of
your processes.
• See DISC Overview Screen.
• See DISC Projects Screen.
NOTE: A process corresponds to one or more graphs and the
schedules associated with them. These graphs are created and
tested in CloudConnect Designer before they are deployed to
your GoodData projects, after which they appear in Data
Integration Console. See Interactions with CloudConnect
Designer.
• To log out of GoodData, select the menu displaying your name. Then, select
  Logout.
Data Integration Console Overview Screen
In the Overview screen of the Data Integration Console, you can review the
counts for execution outcomes for processes in your projects.
• The listed projects are the ones to which you have access.
• You may make modifications only to the listed projects for which you are an
  Administrator and that have loaded processes.
Figure: Overview Screen
Click any of the counts to review the individual projects that contribute to it:
• Failed: Count of projects where one or more processes were started yet
  failed to complete.
  Executions that have failed may have no data updates or
  incomplete data updates applied to the target project. Failed
  executions should be explored and resolved as soon as
  possible to prevent project users from working with
  inaccurate data. Try to keep the count of failed executions
  at 0.
• Running: Count of projects where one or more processes are currently
  being executed by the GoodData Platform.
• Scheduled: Count of projects where one or more processes have been
  scheduled for execution. These processes have been placed in the queue
  and are run as soon as possible.
• Successful: Count of projects where one or more processes have
  successfully executed.
  Tip: All of your processes should be listed in this category. Fix
  or disable those that are not.
For more information on statuses, see Schedule Execution History.
For each project for which you are an Administrator, you can review the
processes and their related schedules in a hierarchy. For each schedule, you can
review details on the execution.
• Click a schedule name to review its details. See Schedule Details.
• To execute a schedule immediately, click the checkbox next to its name.
  Then, click Run or Restart.
  Depending on the state of the data in the project, restarting
  a partially completed schedule may introduce duplicate
  data in the project. See Running Schedules On-Demand.
• To disable a schedule, click its checkbox. Then, click Disable.
• To review the logging information for an executed schedule, click the Log
  icon.
To explore additional information on the project, process, or schedule, click the
corresponding name in the detail table.
• See Project Details Screen.
• See Scheduling a Process.
General Bulk Operations:
In the Overview screen, you can apply the same operation to multiple projects
and processes at the same time.
• For more information on applying bulk operations to projects, see DISC
  Projects Screen.
• To select all processes of all projects listed on the screen for which you are
  an administrator, click the checkbox next to the Run button.
• You may also select one or more individual processes or projects. Selecting
  a project applies the operation to all processes in that project.
• After you make your selections, click one of the buttons above the list of
  projects.
Restarting and Redeploying Operations:
• Before stopping multiple projects, please verify that no processes are
  currently running for those projects. Any partially loaded data remains in the
  project, and a restart of the process may create duplicate data.
• Before you restart multiple data loads, please verify that you have
  addressed any issues with the process. You may need to review the log and
  download and fix the process before redeploying.
• If you disable multiple projects, you may use the status filter on the Projects
  page to review these projects. See DISC Projects Screen.
• Before running a schedule, you should verify that you aren't loading
  duplicate data. Check the status of the last few scheduled executions.
Data Integration Console Projects Screen
In the Projects screen of the Data Integration Console, you can review all of the
projects to which you have access.
• Click Projects in the menu bar.
NOTE: You may only make modifications to projects for which
you are a project Administrator.
Figure: Projects Screen
• To search your available projects, enter a project identifier or a search string
  for the project name in the text box.
• To filter the list of projects based on the result of the last execution of the
  process, make a selection from the drop-down above the table.
• To the left of the name of each project, you can review the results of the
  most recently scheduled process. A green checkmark indicates a
  successful execution.
• To review the details of a project's processes, click the name of the project.
  See Project Details Screen.
Bulk Operations:
In the Projects screen, you can apply the same operation to multiple projects at
the same time.
• For more information on applying bulk operations to processes, see DISC
  Overview Screen.
NOTE: The order of execution of bulk operations cannot be
guaranteed based on the selections in this screen. If there are
dependencies on execution order, you should avoid using bulk
operations.
• To select all projects for which you are an administrator, click the checkbox
  next to the Deploy Process button.
• You may also select one or more specific projects.
• After you make your selections, click the button above the list of projects to
  apply the operation.
Data Integration Console Project Details Screen
In the Project Details screen, you can review the individual processes and
schedules that are associated with the selected project. By default, the current
schedules are displayed, along with the schedule execution history for the
preceding seven days.
Figure: Project Details Screen
• To create a new schedule for a process in the displayed project, click New
  schedule.
  • You may also create a schedule for an individual graph. Click the
    graphs tab in the Project Details screen. Then, click the Schedule link.
    See Scheduling a Process.
• Select the schedule link to review details of the process schedules. See
  Schedule Details.
• To deploy a process from your local desktop to the selected project, click
  Deploy Process. See Deploying a Process.
• To open the project in the GoodData Portal, click Go to dashboards.
Tip: Under the project's metadata, you can review and copy the
internal project identifier, which may be useful in locating and
accessing the project through other interfaces.
For each process, you can review the execution history over the preceding seven
days or access the schedule, graphs of the process, and metadata associated
with the process. The current schedule is listed, followed by indicators of the
executions attempted over the previous seven days.
• A green vertical bar indicates a successful execution. A red bar indicates a
  failed execution.
• To download the process for use in CloudConnect Designer, click
  Download.
  Tip: Download the entire process for review, testing, and
  debugging in CloudConnect Designer.
• To delete the process from the project in the platform, click Delete. A new
  version of the process can be deployed from CloudConnect Designer or
  Data Integration Console. For more information on uploading through the
  Console, see Deploying a Process.
• To configure notification rules for the scheduled execution, click the link
  indicating the count of notification rules. See Configuring Notification Rules.
Deploying a Process
Through the Project Details screen, you can deploy a process to the currently
selected project. Click Deploy Process.
Figure: Deploying a process
Steps:
1. To select a package, click Browse. Navigate your local environment to
   select the ZIP file to upload. The file must contain all resources required for
   the process to execute.
   NOTE: Deployed process packages must be less than 1MB in
   size.
   • CloudConnect processes can be extracted through the application.
     The local CloudConnect project must be saved into a ZIP file, which
     can be uploaded through this interface. See CloudConnect Designer
     User Manual (PDF).
   • Ruby scripts need to include all components of the process,
     including scripts for ETL, logical data model, and any parameter files.
     These packages need to be bundled in a single ZIP file.
     The Ruby option is for internal use only. It will be enabled
     for external users in a later release.
     Tip: You may also deploy Ruby packages using scripts that
     reference commands in the GoodData Ruby SDK. See
     http://sdk.gooddata.com/gooddata-ruby/.
2. For the Process name, enter a descriptive value, which appears in the Data
   Integration Console interface.
3. To upload the process to the target GoodData project, click Deploy.
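Before you browse to the package in step 1, it can help to build the ZIP file
and check its size locally. The following is a minimal sketch using only the
Python standard library. It is not part of CloudConnect or the console. The
function name build_package is hypothetical, and the sketch assumes that "1MB"
means 1,048,576 bytes. Write the ZIP file outside the project directory so that
the archive does not include itself.

```python
import zipfile
from pathlib import Path

MAX_PACKAGE_BYTES = 1024 * 1024  # assumption: "1MB" means 1,048,576 bytes


def build_package(project_dir: str, zip_path: str) -> int:
    """Zip every file under project_dir and return the archive size in bytes."""
    root = Path(project_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Store paths relative to the project root.
                archive.write(path, path.relative_to(root).as_posix())
    size = Path(zip_path).stat().st_size
    if size >= MAX_PACKAGE_BYTES:
        raise ValueError(
            f"{zip_path} is {size} bytes; packages must be smaller than "
            f"{MAX_PACKAGE_BYTES} bytes"
        )
    return size
```

A call such as build_package("my_project", "my_project.zip") either returns the
archive size or raises an error before you attempt the upload.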
Scheduling a Process
After you have deployed your processes to your GoodData project, you can
schedule execution of them through Data Integration Console.
NOTE: You do not need to specify login credentials or be
logged into the GoodData Platform at the time of execution for a
scheduled process to be initiated.
• To schedule execution of a process, click New Schedule in the Project
  Details screen. See Project Details Screen.
• You may also execute schedules on-demand. Depending on the process
  and the current state of the data in the target project, you may be inserting
  duplicate data in the project. See Running Schedules On-Demand.
Schedule a Process on the Data Integration Console
Administrators
Create a new data loading schedule to automatically execute an existing data
loading process at a specified time. Only one data loading process can be
executed at a time. You can schedule only data loading processes that already
exist. See Preparing a Data Loading Process.
NOTE: Data loading during business hours may negatively
impact system performance. Frequent updates may also impact
the performance of your projects. See Timing the Schedule.
Steps:
1. Click your user name > Data Integration Console.
2. Click Projects to open the Projects page. Click the name of the project
where you want to create the schedule and click New schedule.
3. Select the process to execute and the frequency of execution. The following
   types of processes exist:
   • Ruby scripts
   • CloudConnect project -- Select a .grf file. For more information about
     .grf files, see Preparing a Data Loading Process.
   • Data loading process -- Loads data from the ADS to a data mart. For
     data loading processes, you must also select the datasets to load data
     to under Upload Data To.
   Tip: Use the "after" selection to configure schedules to
   execute sequentially. See Configuring Schedule
   Sequences.
4. For .grf files and Ruby scripts, optionally add parameters to your
   schedule.
   A project parameter is a name-value pair that can be passed to the graph
   before execution begins. If the graph is designed to consume it, the project
   parameter can be used to define variables specific to the execution. For
   example, you can define project parameters for customer-specific login
   credentials for an external data source. See Configuring Schedule
   Parameters.
5. Optionally specify a new schedule name.
6. Click Schedule.
7. The schedule is saved and the GoodData Platform executes the process as
scheduled.
Tip: You can add a retry delay to a schedule after you create it.
See Configuring Automatic Retry of Failed Processes.
Configuring Schedule Parameters
In a schedule for a process, you may reference parameters from your
CloudConnect project. Schedule parameters are inputs to be applied to the
execution of the scheduled process. Using parameter values, the process can be
configured to behave differently depending on the circumstances. For example,
you can specify parameters to load multiple projects using the same process.
• In CloudConnect Designer, a parameter is a name-value pair that is stored
  internally in a graph or externally at the project level. Parameters that may
  be modified by other CloudConnect users must be stored as external
  parameters.
• In CloudConnect Designer, parameters and their values are stored in the
  *.prm files.
NOTE: Schedule parameters override any parameter settings
defined within CloudConnect Designer and are applied only
when the data loading process executes. So, you can use your
schedule parameters to manage specific configuration of
multiple schedules, such as changing the process to run for
each customer on different schedules. For more information, see
Testing Parameter Execution.
• For more information on the uses of parameters in CloudConnect
  Designer, see
  http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.
Figure: Configuring schedule parameters
You may specify secure and unsecure parameters to include in your schedule.
• Secure parameters are useful for passing in sensitive data, such as
  usernames and passwords, as part of the transformation. These parameter
  values are encrypted and do not appear in clear-text form in any GUI or log
  entries.
• Before saving the schedule, use the Show Value checkbox to display the
  value of a secure parameter for review purposes. When the schedule is
  saved, secure parameter values are hidden.
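If your own tooling prints or logs the parameters that you plan to enter in a
schedule, it is worth applying the same habit of hiding secure values. The
sketch below illustrates that idea only. It is not how the platform encrypts or
hides secure parameters, and the function name display_parameters and the
parameter names are hypothetical.

```python
from typing import Dict, Set


def display_parameters(params: Dict[str, str], secure_names: Set[str]) -> Dict[str, str]:
    """Return a copy of params that is safe to print or log."""
    return {
        name: "********" if name in secure_names else value
        for name, value in params.items()
    }


params = {"SOURCE_USER": "etl_user", "SOURCE_PASSWORD": "s3cret"}
# Only the masked copy is printed; the original dictionary is unchanged.
print(display_parameters(params, secure_names={"SOURCE_PASSWORD"}))
```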
Defining Project Parameters
Through CloudConnect Designer, you can define parameters for your projects,
which makes them available for use in Data Integration Console.
NOTE: Parameters must be defined within one of the
CloudConnect Designer graphs used in your process to be
available for inclusion in your process schedule. The values
specified in the schedule take precedence over any values
specified in the graph definition and are applied to all graphs in
the process.
A CloudConnect project's parameters are defined in its workspace.prm file. See
CloudConnect Designer User Manual (PDF).
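The precedence rule described above, where values specified in the schedule
take precedence over values defined in the project, can be sketched in a few
lines. This is an illustration and not the platform's implementation. It
assumes that a .prm file consists of simple NAME=value lines and that lines
starting with # are comments. The function names and sample values are
hypothetical.

```python
from typing import Dict


def parse_prm(text: str) -> Dict[str, str]:
    """Parse NAME=value lines, skipping blank lines and '#' comments."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        params[name.strip()] = value.strip()
    return params


def effective_parameters(project_defaults: Dict[str, str],
                         schedule_params: Dict[str, str]) -> Dict[str, str]:
    """Schedule values win over project defaults that share the same name."""
    return {**project_defaults, **schedule_params}


defaults = parse_prm("ENVIRONMENT=development\nBATCH_SIZE=500\n")
print(effective_parameters(defaults, {"ENVIRONMENT": "production"}))
```

In this example, ENVIRONMENT takes the schedule's value while BATCH_SIZE keeps
the project default.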
Testing Parameter Execution
In CloudConnect Designer, you can create and test your parameters before you
apply them to your production data loading processes in the GoodData Platform.
Parameters can be added by file or by manual entry.
• For more information, see
  http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.
Parameter Usage Tips
The following are some tips on how to use parameters effectively in defining your
schedules.
• When defining a process, you can use parameters to switch between your
  development, testing, and production environments.
• You can also use parameters for deployment of a process across multiple
  customer projects.
• Define and use parameters for credentials and other data that can easily
  change.
• Use secure parameters for sensitive data such as passwords.
• Define default parameters in the workspace.prm file. As needed, you can
  override them during execution using the schedule parameters.
• For more information, see
  http://developer.gooddata.com/advanced-guides/cloud-connect-best-practices/cloudconnect-using-parameters.
Referencing the Project ID
This section provides an example of how to use project parameters in your
schedules.
For reference purposes, you may wish to create a parameter that corresponds to
the GoodData project identifier. For example, you may define the PROJECT_ID
parameter in the external parameters as the following:
PROJECT_ID=(project_identifier)
where
• (project_identifier) is the ID for your GoodData project. This identifier never
  changes. You can retrieve the GoodData project identifier through
  CloudConnect Designer. See CloudConnect Designer User Manual (PDF).
Configuring Automatic Retry of Failed Processes
Occasionally, scheduled executions of ETL processes in the platform may fail.
These failures may be due to configuration issues, network interruptions,
scheduled maintenance, or similar issues.
NOTE: By default, a process that fails is not restarted
automatically. To enable auto-restart, you must add a retry
delay.
When a user-defined delay is specified, the platform automatically re-runs the
ETL process if it fails, after the period of time specified in the delay has elapsed. If
it fails again, execution is attempted again after the same period of time.
• The minimum permitted delay is 15 minutes.
• When a schedule fails 5 times in a row, a notification email is delivered to
  you.
NOTE: If a process fails 30 times in a row, it is automatically
disabled and cannot be re-run until it is manually enabled again.
See Troubleshooting Failed Schedules.
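The thresholds above can be summarized as a small state machine. The sketch
below is an illustration of the documented rules and not the platform's code.
The class name ScheduleState and the action strings are hypothetical, and the
sketch assumes that the notification is sent once, on the fifth consecutive
failure.

```python
from typing import List, Optional

MIN_RETRY_DELAY_MINUTES = 15   # minimum permitted retry delay
NOTIFY_AFTER_FAILURES = 5      # consecutive failures before an email is sent
DISABLE_AFTER_FAILURES = 30    # consecutive failures before auto-disable


class ScheduleState:
    def __init__(self, retry_delay_minutes: Optional[int] = None):
        if retry_delay_minutes is not None and retry_delay_minutes < MIN_RETRY_DELAY_MINUTES:
            raise ValueError("the retry delay must be at least 15 minutes")
        self.retry_delay_minutes = retry_delay_minutes
        self.consecutive_failures = 0
        self.enabled = True

    def record(self, succeeded: bool) -> List[str]:
        """Update the counters after one execution and return the follow-up actions."""
        if succeeded:
            self.consecutive_failures = 0
            return []
        self.consecutive_failures += 1
        actions: List[str] = []
        if self.consecutive_failures == NOTIFY_AFTER_FAILURES:
            actions.append("notify")
        if self.consecutive_failures >= DISABLE_AFTER_FAILURES:
            self.enabled = False
            actions.append("disable")
        elif self.retry_delay_minutes is not None:
            # Without a retry delay, a failed process is not restarted.
            actions.append(f"retry in {self.retry_delay_minutes} min")
        return actions
```

With a 15-minute delay, the first failure produces only a retry, the fifth adds
a notification, and the thirtieth disables the schedule instead of retrying.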
In the schedule definition area of the Data Integration Console, you may define
the retry delay.
• To change the retry delay, click Add retry delay. Enter the value in minutes
  that you would like for the platform to wait before retrying the ETL process.
Figure: Add retry delay
If your ETL processes need to occur in a specific sequence or if your data loads
may push the maximum limits permitted for your organization, you should specify
your retry delays for each process with some care.
• If the retry delay overlaps the next execution of the scheduled process, then
  the failed scheduled execution is dropped, and the latest scheduled
  execution is processed.
• ETL processes that are retried are inserted into a processing queue, so they
  may not be processed at the exact interval.
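The overlap rule can be checked before you choose a retry delay. The following
sketch compares the retry time with the next scheduled execution. It is an
approximation under stated assumptions: "overlaps" is taken to mean that the
retry would start at or after the next scheduled run, and queueing delays are
ignored. The function name next_attempt is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple


def next_attempt(failed_at: datetime, retry_delay_minutes: int,
                 next_scheduled: datetime) -> Tuple[str, datetime]:
    """Return which execution happens next after a failure, and when."""
    retry_at = failed_at + timedelta(minutes=retry_delay_minutes)
    if retry_at >= next_scheduled:
        # The retry is dropped in favor of the regular scheduled execution.
        return ("scheduled", next_scheduled)
    return ("retry", retry_at)


failed = datetime(2015, 6, 4, 6, 10)
hourly = datetime(2015, 6, 4, 7, 0)
print(next_attempt(failed, 30, hourly))  # the retry fits before 07:00
print(next_attempt(failed, 60, hourly))  # the retry would overlap the 07:00 run
```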
Troubleshooting Failed Schedules
If an ETL schedule fails 5 consecutive times, a notification email is sent. Unless
the underlying issues are corrected, the process is automatically disabled after 30
failed executions in a row. You must re-enable it manually after all issues are
resolved.
If you have been notified that your schedule has failed repeatedly, the root
cause of the error may vary. Please check the following:
• Your credentials are valid.
• Your connections are set up properly.
• The changes you’ve made since the last successful execution haven’t
  broken the graph supplying data.
• All data sources are accessible.
• All GD Writer components in your graphs have been properly configured
  with the appropriate property settings.
Check the last text log in the schedule to assist in identifying issues. Failing
schedules should remain disabled until the issues are addressed.
Configuring Schedule Sequences
As needed, you can configure schedules to be executed in sequence, creating a
chain of updates to your data. For example, suppose your project receives order
updates through an ETL process. Before you update your order ETL, you might
wish to provide updates from your enterprise master data ETL. In this manner, any
new customers or products referenced in the order stream are available in the
project.
When configuring a schedule, you can specify that the schedule should be
executed after the successful execution of another schedule:
Figure: Configuring a schedule to occur after another
The triggering schedule must execute successfully before the dependent
schedule is executed.
• Schedules cannot be sequenced in a loop. A schedule can be used only
  once in a scheduling sequence.
• If a schedule is deleted, all schedules that are supposed to run after it must
  be reconfigured. The reported error message for these schedules is "Trigger
  schedule missing!"
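When planning a longer chain, it can help to check it for loops and missing triggers before configuring it in the Console. The Python sketch below is one way to do that offline. The function and the dictionary layout are hypothetical and are not part of the platform: each key is a schedule name, and each value is the name of the schedule that triggers it, or None for a time-based schedule.

```python
def find_sequence_problem(triggered_by):
    """Return a description of the first problem found, or None if the plan is valid."""
    for start in triggered_by:
        seen = {start}
        current = triggered_by[start]
        while current is not None:
            if current in seen:
                return "loop detected at " + current
            if current not in triggered_by:
                return "trigger schedule missing: " + current
            seen.add(current)
            current = triggered_by[current]
    return None


plan = {"master_data": None, "orders": "master_data"}
print(find_sequence_problem(plan))
```

In this example, the orders schedule runs after master_data, which is time-based, so no problem is reported.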
You can navigate a sequence of schedules. Links to connected schedules are
displayed next to the schedule name:
Figure: Click links to navigate the sequence of schedules
Timing the Schedule
Processes can be scheduled to execute according to calendar intervals or based
on cron timings you specify.
NOTE: If the execution time of a run is greater than the interval
between scheduled runs, then the next scheduled run is
dropped, and the third scheduled run is later executed
according to the schedule. For example, if a daily run takes 25
hours to execute, the run is executed every two days. You may
need to tune your timings based on the average length of your
runs.
• See Custom Schedules.
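The effect of long-running executions described in the note above can be estimated with simple arithmetic. The following Python sketch is an approximation with a hypothetical function name. It assumes that every scheduled start that falls while a run is still executing is dropped, which extends the documented 25-hour example to longer runs.

```python
import math


def effective_interval_hours(interval_hours, run_duration_hours):
    """Estimate the real time between runs when overlapping starts are dropped."""
    slots = max(1, math.ceil(run_duration_hours / interval_hours))
    return slots * interval_hours


print(effective_interval_hours(24, 25))   # daily schedule, 25-hour run
print(effective_interval_hours(24, 2))    # daily schedule, 2-hour run
```

For the daily schedule with a 25-hour run, the estimate is 48 hours, which matches the "every two days" behavior described in the note.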
Steps:
1. Select the interval from the drop-down. The minimum supported frequency
is every 15 minutes.
If you select a short interval for a large dataset, you may
experience performance impacts on the system. GoodData
recommends that you schedule your processes to execute
and complete during off-peak hours and at intervals that do
not impact system performance.
2. As needed, make selections from the provided drop-downs for the selected
interval to configure the specific time within that interval for the process to
run.
NOTE: All timing is based on UTC (Coordinated
Universal Time), which corresponds to Greenwich Mean
Time. Please adjust your timing accordingly.
3. To save your process schedule changes, click Save Changes.
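Because schedule times are entered in UTC, you may need to convert from your local time first. The Python sketch below shows the conversion using only the standard library. The function name is hypothetical, and it uses a fixed UTC offset, so it does not account for daylight saving time changes.

```python
from datetime import datetime, timedelta, timezone


def local_to_utc(year, month, day, hour, minute, utc_offset_hours):
    """Convert a local wall-clock time at a fixed UTC offset to UTC."""
    local_zone = timezone(timedelta(hours=utc_offset_hours))
    local_time = datetime(year, month, day, hour, minute, tzinfo=local_zone)
    return local_time.astimezone(timezone.utc)


# 23:00 local time at UTC-7 is 06:00 UTC on the following day.
print(local_to_utc(2015, 6, 4, 23, 0, -7))
```

Note that the converted time can fall on a different calendar day, which matters for weekly or monthly intervals.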
Custom Schedules
If none of the available scheduling options is appropriate for your process, you
can configure a custom schedule, as needed. The Data Integration Console
enables the configuration of schedules using the cron format.
cron is a Unix-based job scheduling mechanism that enables users to trigger the
execution of scripts or other processes at predefined intervals. The cron timing
format enables more options for configuring the process execution timing.
NOTE: GoodData does not support the use of seconds in cron
expressions. Please enter five-field cron expressions.
NOTE: You should understand the formatting requirements of
cron before you specify custom schedules. This information is
publicly available.
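As an illustration of the five-field format, the Python sketch below checks the field count and lists a few expressions in standard cron syntax (minute, hour, day of month, month, day of week). The helper name is hypothetical. The expressions follow general cron conventions and are assumed to be interpreted in UTC like other schedule timings; verify in the Console that a given expression behaves as expected.

```python
def has_five_fields(expression):
    """Return True when the expression has exactly five whitespace-separated fields."""
    return len(expression.split()) == 5


# Standard cron syntax; descriptions assume UTC.
EXAMPLES = {
    "*/15 * * * *": "every 15 minutes",
    "0 2 * * *": "every day at 02:00",
    "30 6 * * 1": "every Monday at 06:30",
}

for expression, meaning in EXAMPLES.items():
    print(expression, "->", meaning, "| five fields:", has_five_fields(expression))
```

An expression with a leading seconds field, such as "0 0 2 * * *", has six fields and fails this check.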
Schedule Details
In the Schedule Details pane, you can review the history of runs of the selected
schedule over time and make modifications as necessary to the schedule.
Modifications to the schedule include the graph to execute, the parameters to
apply, and the retry delay.
• The listed username identifies the user under which the schedule executes.
  This user currently owns the schedule.
• The owner of the schedule may differ from the owner of the process, since
  processes can be downloaded and redeployed at any time. For example, if
  a process created by User A is redeployed by User B, all schedules
  associated with the process are now owned by User B, who will be the user
  under which all schedules for the process are henceforth executed.
Figure: Process History
Commands:
• By default, a schedule's name is set to the graph's name. To change the
  name of the schedule, click the graph name. Enter the new name and click
  Save.
• To switch the graph used in this schedule, select a different graph from the
  drop-down at the top of the pane.
• To queue the process for execution, click Run. The process is executed as
  soon as possible.
NOTE: You may queue a schedule for execution at any time.
However, executing the process may create issues with the data
that is loaded in the project. Please be aware of the possibilities
before running schedules on-demand. See Running Schedules
On-Demand.
• To stop the process that has been queued for execution or is being
  executed, click Stop.
• To delete a schedule, click Delete. The schedule is deleted, while the
  process and any associated notifications remain in the platform.
• As needed, you may disable a schedule, which prevents scheduled
  executions until you re-enable the schedule. Click Disable.
• You may click Run to execute disabled or enabled schedules.
NOTE: If a schedule repeatedly fails, it may be automatically
disabled, and your project data is no longer refreshed until you
fix the issue causing failure and re-enable the schedule. For
more information on debugging failing schedules, see
Troubleshooting Failed Schedules.
• You can review the history of schedule execution at the bottom of the
  screen. See Schedule Execution History.
• To change the timing of the schedule, select a new interval from the
  drop-down. Update other properties as needed. Then, click Save Changes. See
  Timing the Schedule.
• Parameters can be added to the process to customize it for individual
  executions. See Configuring Schedule Parameters.
Changing the graph:
To modify the graph to use in the schedule, select the graph from the drop-down
at the top of the window. The next time the process executes, the new graph is
used to run the process.
NOTE: When the new graph runs, it uses any previously
defined parameters as part of the execution. You may wish to
modify these parameters before execution of the new graph.
See Configuring Schedule Parameters.
Schedule Execution History
At the bottom of the Schedule Details screen, you can review the history of the
schedule executions.
Figure: Process History Details
In the above figure, the last seven days of executions of the graph are displayed.
For each execution, you can review:
• In the dated history bar, you can see the instances in which the process has
  been executed over the past seven days.
• The icon on the left side of the screen indicates whether the process
  executed successfully or not.
• A hand icon indicates an execution that was triggered manually.
• Red text indicates that the process encountered an error and failed.
• You may access the log generated for the process run. To open the log,
  click the log icon. See Process Logging.
• You may also review the runtime duration of the process execution, as well
  as start and end timestamps.
Process Execution Result States
The table below identifies the possible states of schedules in the list.
State            Description
Successful       Tip: All of your processes should be listed in this category.
                 Fix or disable those that are not.
Failed           Executions that failed to complete or that were manually
                 stopped have been marked in red. The displayed ERROR
                 message provides information on what caused the process to
                 fail. To review the log for further details, click the log
                 icon.
                 Executions that have failed may have no data updates or
                 incomplete data updates applied to the target project. Failed
                 executions should be explored and resolved as soon as
                 possible to prevent project users from working with
                 inaccurate data. Try to keep the count of failed executions
                 at 0.
                 NOTE: Stopped processes are categorized as errors, since the
                 data load is incomplete. All incomplete loads are treated as
                 errors.
Running          Scheduled graph execution has begun in the platform. A
                 timestamp indicates when the execution began and the
                 current duration of the execution.
Scheduled        Graph has been scheduled for execution at the appropriate
                 time.
Disabled         The schedule has been disabled. It will not automatically run
                 until it has been re-enabled.
                 • Disabled schedules can be manually re-run, although
                   there are some risks with doing so. See Schedule
                   Details.
Broken Schedule  Schedules whose graph no longer exists are marked as
                 broken schedules. Typically, schedules are broken if a
                 process is redeployed under a new name. The schedule can
                 be fixed by selecting the appropriate graph to run in the
                 schedule definition.
Unscheduled      These processes do not have a schedule associated with
                 them.
Running Schedules On-Demand
As needed, you can run a schedule at any time. The processes of the schedule
are queued in the platform and are executed as soon as resources become
available.
• In the Data Integration Console, you must create a schedule for a process
  before you may execute the process.
GoodData does not prevent the loading of duplicate data
through a process. Particularly if you run a process at an
ad-hoc interval, it is possible to load duplicate versions of
data. Please use this feature carefully.
Depending on the volume and complexity of the process,
executing a process during peak hours can impact
performance of the GoodData project that it is updating.
Where possible, execute processes during off-peak hours.
Tip: As a best practice, you should create one or more data
validation reports to identify how your processes are working.
To see the effects of your processes in your GoodData projects,
you can open a different browser tab and navigate to
https://secure.gooddata.com. Open a report that is populated by
the process.
Steps:
1. In the Project Details screen, select the scheduled graph you wish to run.
2. Then, click Run.
3. The schedule is queued for execution and is run as soon as possible.
NOTE: The schedule is executed as soon as platform resources
are available.
To stop a schedule in the middle of execution, click the Stop button.
Figure: Stop button
NOTE: When an upload is stopped in the middle of execution,
the data that has already been uploaded to the project remains
in the project. If you are unsure whether your ETL process can
safely resume loading data, you can manually delete the
uploaded data from the Manage page of your project.
Through the following API endpoint, you may trigger on-demand executions of
processes without scheduling:
/gdc/projects/[project_id]/dataload/processes/[process_id]
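The Python sketch below only shows how the resource path above can be filled in for a specific project and process. The function name and the example identifiers are hypothetical, and the host is assumed to be https://secure.gooddata.com. This manual does not specify the HTTP method, authentication, or request body, so consult the GoodData API documentation before sending a request.

```python
from urllib.parse import quote


def process_endpoint(project_id, process_id, host="https://secure.gooddata.com"):
    """Build the URL of the process resource from the documented path template."""
    path = "/gdc/projects/{}/dataload/processes/{}".format(
        quote(project_id, safe=""),   # escape any characters that are unsafe in a URL path
        quote(process_id, safe=""),
    )
    return host + path


# Placeholder identifiers for illustration only.
print(process_endpoint("abc123", "proc-456"))
```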
Batch Loading of Data through Data Integration Console
Through the Data Integration Console, you can enable execution of multiple files
in a single CloudConnect process.
To enable batch loading of multiple files that you have posted to your
project-specific storage, please create the following parameter in your schedule:
GDC_USE_BATCH_SLI_UPLOAD=TRUE
When enabled in your schedule, the process attempts to load all files in
project-specific storage, based on the JSON manifest file that you have created.
• Batch loading of files requires additional configuration in your ETL graphs in
  CloudConnect Designer. For more information, see
  https://developer.gooddata.com/article/multiload-of-csv-data.
Notification Rules
If desired, you can configure notifications to be delivered via email for process
execution events. These notification rules can be used to update stakeholders
when data is refreshed or to alert project administrators when problems occur
during process execution.
• To configure notification rules, click the notification rules link at the top of the
  Project Details screen. See Project Details Screen.
NOTE: A notification rule is associated with a process, not a
schedule. If the schedule is removed, the notification remains. If
the process is removed, any associated notifications are
removed from the Console. However, these notifications still
exist in the project and may be accessed through the APIs. See
GoodData API Documentation.
Tip: A notification applies to all schedules for the process and
should support the corresponding event for each schedule of the
process. You can use the variable identifying the executable to
assist in identifying the scheduled graph or script that was run.
Tip: REST-based notifications may also be configured for
delivery using the GoodData APIs. See
http://developer.gooddata.com/article/setting-up-the-notifications-using-api.
The list of notification rules for the process is displayed.
Figure: List of notification rules for this process
NOTE: Notification rules created in the Data Integration
Console can be modified by any administrator of the process for
which the notification is configured. For notification rules that
were created through the legacy method in the gray pages, only
the original owner of the notification can make changes.
Through the gray pages, you can locate the original resource
where the owner of the notification is identified.
• To edit any notification rule, select it and make your edits as needed. See
  Configuring Notification Rules.
• To create a new notification rule for the process, click Add notification rule.
The Notification Rules window is displayed, where you can specify new rules. If
you have previously configured notification rules, you can review them in this
window. See Configuring Notification Rules.
• To delete a notification rule, click the Trash icon.
• To close the Notification Rules window, click Close dialog. If notification
  rules have been added or deleted, the number of notification rules is
  updated at the top of the Project Details screen.
Configuring Notification Rules
Figure: Notification Rules window
To configure a new notification rule, please complete the following steps.
Tip: When you are beginning to use a new process, you may
wish to create notification rules for all possible events. As the
process stabilizes over a number of successful executions, you
may choose to remove some of the notifications. For stable
processes, you should retain at least the failed notification.
Steps:
1. Enter a valid email address or alias in the Send Email To textbox.
NOTE: You may configure only one recipient address per
notification rule. Some legacy notifications may include
references to multiple recipients. These notifications must
be modified through the gray pages. For more information,
see Modifying Legacy Recipients through the Gray Pages.
2. From the drop-down, select the event that triggers the notification:
1. success: Notification is sent upon successful completion of the
process.
2. failure: Notification is sent if the process fails to complete.
3. process scheduled: Notification is sent if the process has been
added to the queue for execution. Typically, the time between this
event and the process started event is very short.
4. process started: Notification is sent when the process begins
execution.
5. custom event: You may specify events that are custom to the
specific project. For more information on defining custom events, see
https://developer.gooddata.com/article/creating-custom-notification-events.
3. Enter a meaningful text message in the Subject. This message should
indicate that the event occurred.
4. You can insert variables into the Subject or the body of the message.
1. Below is an example variable that you can insert:
${params.USER_EMAIL}
NOTE: The list of available variables varies
depending on the selected event.
2. When the email is generated, these strings are replaced with the
corresponding values from the process.
3. The figure above shows one example of a success message. For an example
of using these variables in an error notification, see Example
Notification Message.
5. For the message body, provide sufficient descriptive information so that the
recipient knows the name and type of event that occurred, as well as the
project in which it occurred. Adding a start and end time is helpful, too.
6. To create the notification rule, click Save Changes.
7. To cancel the rule, click Close dialog.
Example Notification Message
Below, you can review an example notification message, which could be used for
configuring a notification when a process failed to execute:
Please note that the load of the GoodData project (id
=${params.PROJECT}) using process "${params.PROCESS_NAME}",
graph ${params.GRAPH} that started at ${params.START_TIME}
failed at ${params.FINISH_TIME} with following ERROR:
${params.ERROR_MESSAGE}
Please inspect the ${params.LOG} for more details.
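To preview how a message like the one above reads once the variables are filled in, you can emulate the substitution locally. The Python sketch below is only an approximation of what the platform does when it generates the email. The render function and the sample values are hypothetical, and placeholders without a supplied value are left unchanged.

```python
import re

# Matches placeholders of the form ${params.NAME}, as used in the example message.
PLACEHOLDER = re.compile(r"\$\{params\.([A-Z_]+)\}")


def render(template, params):
    """Replace each ${params.NAME} with params[NAME]; leave unknown names as-is."""
    def substitute(match):
        name = match.group(1)
        return str(params.get(name, match.group(0)))
    return PLACEHOLDER.sub(substitute, template)


template = 'Process "${params.PROCESS_NAME}" failed at ${params.FINISH_TIME}.'
sample = {"PROCESS_NAME": "orders", "FINISH_TIME": "02:15"}   # made-up sample values
print(render(template, sample))
```

Previewing the text this way can help confirm that the Subject and body remain readable for each event type before you save the rule.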
Modifying Legacy Recipients through the Gray Pages
For more information, see
http://developer.gooddata.com/article/modifying-multiple-recipients-of-notifications-through-gray-pages.
Data Integration Console Process Logging
For each execution of a process, a log is generated containing the status
messages of the steps of the process run. All logs are accessible via the Data
Integration Console.
Tip: If you are using the Chrome browser, GoodData provides a
useful extension to assist in monitoring and debugging process
execution. For more information on the GoodData Extension
Tool for Chrome, see https://developer.gooddata.com/tools.
• Click the log icon to open the log for a specific execution of the schedule.
• When selected, the log is displayed as a text file in your browser.
• To locate errors, search the text file for ERROR. All error messages need to
  be addressed in your CloudConnect project before the process can
  successfully execute.
You can identify the source of the error by examining the filename where the error
occurred. The following are general areas where problems may occur in the
execution of a process:
• Connectivity issues
• Transformation processing errors
• Problems with the data source
Please review these issues and attempt to fix them through CloudConnect
Designer. The name of the component where the error occurs is also displayed
with the error message. This information is useful in debugging and fixing your
transformation issues.
• See CloudConnect Designer User Manual (PDF).
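If you save a log to a local file, the search for ERROR can be scripted. The Python sketch below is a minimal illustration: the function name and the sample log lines are made up, and no particular log layout is assumed beyond the presence of the word ERROR on the relevant lines.

```python
def error_lines(log_text):
    """Return (line number, line) pairs for every line that contains ERROR."""
    return [
        (number, line)
        for number, line in enumerate(log_text.splitlines(), start=1)
        if "ERROR" in line
    ]


# Made-up log content for illustration only.
sample_log = "INFO start\nERROR reader failed\nINFO end"
for number, line in error_lines(sample_log):
    print(number, line)
```

The line numbers make it easier to find the surrounding context, such as the filename and component name, in the full log.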
Deleting Graphs and Processes
Processes may exist in the GoodData Platform, in CloudConnect Designer
projects in your local environment, or in references in any scripts that you use to
deploy them.
• To remove an entire process from your GoodData project in the platform,
  select the project. In the Project Details screen, click Delete next to the
  name of the process.
NOTE: The above step removes the process from the selected
project only. Processes may be deployed to multiple projects.
If you are using CloudConnect to deploy processes, you should verify that the
CloudConnect project containing the process in your local environment is not
configured to use the GoodData project as its working project.
• See CloudConnect Designer User Manual (PDF).
• If you are deploying the process via API, you should verify that your
  deployment scripts are no longer referencing the GoodData project.
Schedules and notifications are defined within the GoodData Platform only; if you
remove them from the CloudConnect Designer, they are removed from the
system.
• You may delete schedules through the Data Integration Console. See
  Schedule Details.