
CSMR, Vol. 1, No. 1 (2011)
Generation and Evaluation of Scheduling DAGs: How to
provide similar evaluation conditions.
Alexandra Olteanu, Andreea Marin
University POLITEHNICA of Bucharest
Faculty of Automatic Control and Computers, Computer Science Department
Emails: {alexandra.olteanu, andreea.marin}@cti.pub.ro
Abstract
When the efficiency and robustness of scheduling an application's tasks in a large-scale distributed system are evaluated through simulation, Directed Acyclic Graphs (DAGs) are usually used. Since the majority of evaluation studies rely on random DAGs that do not reflect the structure of real applications, a more realistic approach should be taken when generating these synthetic DAGs: they should have structures similar to those arising from real application workflows, so that scheduling algorithms can be evaluated more accurately. Furthermore, metrics used for evaluating DAGs, such as task granularity, communication to computation ratio, parallelism degree and constraints degree, are presented.
Keywords: DAG Generation, DAG Evaluation, Task Graphs, Scheduling DAGs
1. Introduction
The abstract DAG has been used extensively for workflow modeling in distributed systems. If a computational problem can be divided into a number of subtasks, the data dependencies between these subtasks are usually described by means of a directed acyclic graph (DAG), also called a task graph. In general, synthetic DAGs are used to evaluate and classify existing scheduling heuristics, or to propose a new scheduling approach and compare it with existing heuristics. Unfortunately, using different parameters and methods to generate these DAGs leads to unfair performance comparisons of scheduling algorithms, because the algorithms are tested under different conditions. This reveals the need for a tool that allows researchers to evaluate their scheduling algorithms under similar conditions, by generating various types of DAGs based both on realistic parallel-algorithm patterns, which can be found in a significant number of applications, and on synthetic patterns.
To address this need, various techniques have been proposed and used to allow the modeling of a large variety of applications. We question the applicability of the resulting models, as they should embody the characteristics and complexity of real applications. We consider it necessary to correctly identify some of the most common patterns that can be found within an application workflow. Thus, this paper aims to introduce a DAG generator and to explain why we chose to generate certain patterns as classical testbeds. The main goal is to propose a way of building an appropriate tool for DAG generation, so that scheduling algorithms can be evaluated under nearly the same conditions.
The paper is organized as follows: Section 2 presents related work in the field of generation and evaluation of scheduling DAGs. Section 3 describes the DAG model used for this analysis. Section 4 describes the patterns considered for DAG generation, and Section 5 presents a series of metrics for DAG evaluation. Finally, Section 6 presents the main conclusions.
2. Related Work
Since a large number of scheduling algorithms rely on mapping an application workflow onto a DAG structure, these algorithms must be evaluated using various DAG structures that are able to simulate real application scenarios. Thus, an efficient and adequate evaluation scheme for characterizing the workflow distribution is needed for the design of scheduling heuristics. To provide a consistent analysis of scheduling algorithms, various methods for DAG generation, along with several evaluation metrics, have been proposed in the literature.
In [3] Canon and Jeannot propose a series of robustness metrics for scheduling DAGs on heterogeneous systems and use them in an experimental study that shows how these metrics are correlated with each other in the case of task scheduling with dependencies between tasks. In addition, a method for the precise evaluation of the efficiency and the robustness of stochastic DAG schedules is presented in [4]. The authors use a dynamic programming method with a bottom-up approach: since the main difficulty resides in characterizing the correlations between the random variables, and the formulation exhibits overlapping subproblems, a classic top-down recursion would perform redundant computations when determining the correlation coefficient between two random variables.
Tobita and Kasahara designed a standard task graph set for fair evaluation of multiprocessor scheduling algorithms, with the purpose of making it possible to evaluate scheduling algorithms under the same conditions [11]. In addition, the importance of comparing algorithm performance fairly is highlighted. Furthermore, other authors considered the need for a heterogeneous computing system model that can simulate different heterogeneous computing environments, so that the relative performance of different scheduling heuristics can be studied under different situations. To address this issue, Ali et al. [10] proposed characterizing a simulated heterogeneous computing environment by the expected execution times, on the different machines present in the system, of the tasks that arrive in the system. This information is arranged in an "expected time to compute" matrix that models the given heterogeneous computing system, where entry (i, j) is the expected execution time of task i on machine j.
3. DAG model
DAGs are often used to model different types of structures in computer science and mathematics. In a DAG, the reachability relation forms a partial order, and any finite partial
order may be represented as a DAG by utilizing reachability [1]. DAGs are often used to
model processes in which information flows in a consistent direction through a network of
processors. The DAG model definition that we use within this analysis is presented below:
A DAG G = (V, E, w, c) represents the application to be scheduled, where:
• V = {vi : i = 1, . . . , N } is the set of tasks;
• E = {eij } is the set of edges, modeling the data dependencies between node ni and node nj ;
• w(ni ) is the computation cost of node ni ;
• c(eij ) is the communication cost between node ni and node nj .
The relationships between the DAG tasks can be expressed as precedence constraints. Precedence constraints impose the execution order of the nodes according to their dependencies (mapped onto communication edges labeled with the data transfer cost c(eij )). A node can only start execution after all its predecessors have completed their execution.
Our approach consists of generating DAG patterns taking into consideration a series of
DAG characteristics such as number of nodes, communication to computation ratio (CCR),
number of tasks per level and different application workflow patterns.
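To make this model concrete, a minimal Python sketch of the G = (V, E, w, c) structure is given below. The class and method names are our own illustrative choices and are not part of the model definition.

class TaskDAG:
    def __init__(self):
        self.nodes = {}   # node id -> computation cost w(n_i)
        self.edges = {}   # (i, j)  -> communication cost c(e_ij)

    def add_task(self, i, w):
        self.nodes[i] = w

    def add_dependency(self, i, j, c):
        # precedence constraint: task j may start only after task i has completed
        self.edges[(i, j)] = c

    def predecessors(self, j):
        # all tasks that must finish before task j can start
        return [i for (i, k) in self.edges if k == j]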
4. DAG Generation
When generating DAGs, one must consider both the generic constraints presented above
and the ones specific to the problem at hand. These specific constraints will be presented
in the current section.
We have chosen to generate a series of specific DAGs that are commonly used for application task scheduling, as reported in the literature. First, we selected task graphs representing various types of parallel algorithms. The selected algorithms used for graph generation are: LU decomposition, the Laplace equation solver, the Stencil algorithm and the Fast Fourier Transform. Second, we also considered other types of DAGs, such as random DAGs, leveled DAGs with a preselected maximum link complexity and balanced DAGs, in order to obtain synthetic or more general DAGs. For each of these approaches, the number of generated nodes depends on the pattern used, as many of the generated DAG types must satisfy strict structural patterns. The desired task granularity is achieved by varying the communication to computation ratio (CCR). The target CCR is obtained using a "best effort" approach: random values are generated from a restricted (defined or computed) interval for the costs of links and nodes, while a balanced distribution of these costs is ensured. Next, we give a brief description and motivation for each of the DAG patterns that our generator supports. A detailed description of the properties of the DAG patterns for each class of parallel applications is, however, out of the scope of this paper and can easily be found in many papers that evaluate scheduling algorithms [2] or discuss numerical methods [9].
4.1 Random DAGs
A large number of papers about scheduling algorithms use randomly generated DAGs for evaluation. These graphs are generated using a number of parameters that are also mentioned in the literature (a generation sketch follows the list below):
Figure 1: (a) Balanced graph, (b) Laplace graph, (c) Leveled graph with a preselected
maximum link complexity 3
Figure 2: (a) FFT graph, (b) LU graph, (c) Stencil graph
• the number of tasks in the graph
• the communication to computation ratio (CCR)
• the interval from which the costs for communication and processing are randomly selected
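As an illustration, a minimal Python sketch of one possible random-DAG generator driven by these three parameters is given below. The edge probability p and the final rescaling step are assumptions of ours; they approximate the "best effort" CCR adjustment described above rather than reproduce the exact method.

import random

def random_dag(n_tasks, ccr, cost_range=(1.0, 10.0), p=0.3, seed=None):
    rng = random.Random(seed)
    nodes = {i: rng.uniform(*cost_range) for i in range(n_tasks)}   # w(n_i)
    edges = {}                                                      # c(e_ij)
    # only edges i -> j with i < j are created, so the graph is acyclic by construction
    for i in range(n_tasks):
        for j in range(i + 1, n_tasks):
            if rng.random() < p:
                edges[(i, j)] = rng.uniform(*cost_range)
    # "best effort" CCR: rescale edge costs so that the average edge cost
    # divided by the average node cost matches the requested value
    if edges:
        avg_w = sum(nodes.values()) / len(nodes)
        avg_c = sum(edges.values()) / len(edges)
        scale = ccr * avg_w / avg_c
        edges = {e: c * scale for e, c in edges.items()}
    return nodes, edges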
However, the performance obtained by scheduling algorithms on synthetic, randomly generated DAGs is given less significance, because of the DAG shape and the type of tasks in the DAG. In general, a scientific workflow application has a DAG of a distinctive shape determined by its design, which accomplishes a complex task by means of job parallelism. The DAGs of many real-world workflow applications are well balanced and highly parallel [12]. Moreover, in these applications only a small number of unique operations can be distinguished.
4.2 Balanced DAGs
Given the observations made for randomly generated DAGs, the generation of fully balanced DAGs was considered. Two new parameters are used for their generation: the number of parallel sections, for which the number of concurrent tasks is randomly selected from the remaining number of tasks, and the minimum number of levels. The general structure of these types of graphs is presented in Figure 1(a).
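A hedged sketch of how such a fully balanced DAG might be generated from these two parameters is given below. The way the task budget is split across sections and the all-to-all wiring between consecutive sections are assumptions of ours; the exact structure follows Figure 1(a).

import random

def balanced_dag(n_tasks, n_sections, min_levels, seed=None):
    # assumes n_tasks >= n_sections >= 1
    rng = random.Random(seed)
    n_sections = max(n_sections, min_levels)   # honour the minimum number of levels
    widths, remaining = [], n_tasks
    for s in range(n_sections):
        left = n_sections - s - 1              # sections still to be filled
        if left == 0:
            w = remaining                      # the last section takes whatever is left
        else:
            w = rng.randint(1, max(1, remaining - left))
        widths.append(w)
        remaining -= w
    levels, next_id = [], 0
    for w in widths:
        levels.append(list(range(next_id, next_id + w)))
        next_id += w
    # illustrative wiring: every task in a section depends on every task in the previous one
    edges = [(u, v) for a, b in zip(levels, levels[1:]) for u in a for v in b]
    return levels, edges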
4.3 Leveled DAGs with a preselected maximum link complexity
We considered leveled DAGs with a preselected maximum link complexity, as this structure is used in evaluating scheduling heuristics [8]. The technique constructs the task graph level by level. For each level, excluding the first one, it selects for every node a number of parents in the level above that is lower than a predefined threshold for link complexity. Graph construction takes into account the following parameters: number of nodes, number of task levels, processing power, link complexity and communication costs. With small modifications of this method, such as restricting groups of nodes in one level to have a single common child in the next level, or groups of nodes in a lower level to have a single common parent in the previous level, patterns for in-tree and out-tree graphs can also be generated. A small visual example of this kind of graph can be seen in Figure 1(c).
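The level-by-level construction with a bounded number of parents per node can be sketched as follows. The even split of nodes across levels and the uniform cost ranges are illustrative assumptions, and processing power is omitted for brevity.

import random

def leveled_dag(n_nodes, n_levels, max_parents, cost_range=(1.0, 10.0), seed=None):
    # assumes n_nodes >= n_levels and max_parents >= 1
    rng = random.Random(seed)
    per_level = [n_nodes // n_levels] * n_levels
    for i in range(n_nodes % n_levels):        # distribute the remainder
        per_level[i] += 1
    levels, next_id = [], 0
    for w in per_level:
        levels.append(list(range(next_id, next_id + w)))
        next_id += w
    nodes = {i: rng.uniform(*cost_range) for i in range(n_nodes)}
    edges = {}
    for prev, cur in zip(levels, levels[1:]):
        for v in cur:                          # every node below the first level
            k = rng.randint(1, min(max_parents, len(prev)))   # bounded link complexity
            for u in rng.sample(prev, k):      # parents are chosen only from the level above
                edges[(u, v)] = rng.uniform(*cost_range)
    return nodes, edges, levels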
4.4 Parallel Algorithms Mapping DAGs
Furthermore, the generation of task graphs representing various types of parallel algorithms should also be considered. These types were chosen based on the number of real application graphs in which such patterns can be found. We consider the parallel algorithms most commonly used in the literature, such as in [2]: the LU decomposition algorithm, the Laplace algorithm, the FFT (Fast Fourier Transform) algorithm and the Stencil algorithm. Miniature samples of each of these types are presented in Figure 2. In all these cases, DAGs of different sizes are generated by varying the number of nodes and the CCR values. Although the original algorithms require, for some applications, the same costs for all nodes and edges or for a part of them, the costs are randomly selected from a given range and only the graph structure is restricted to the given pattern at generation.
4.4.1 LU algorithm graph
The LU algorithm can be characterized by the size of the input data matrix, because the number of nodes and edges in the task graph depends on the size of this matrix. The LU algorithm used for parallelization in the majority of cases is simply a reorganization of the classic Gaussian elimination algorithm. The major operations consist of the detailed algorithm steps presented in [7]: factorization, updates of U and matrix updates. Note that these operations are carried out in a parallel environment: they can execute concurrently but are subject to a series of constraints, highlighted in Figure 2(b). The motivation for choosing this algorithm pattern is the fact that LU decomposition is used in solving systems of linear equations and in matrix inversion, but it can also be used in fields such as FPGA programming.
4.4.2 Laplace algorithm graph
In the general form of the Laplace algorithm task graph, a node is constrained by its upper-left and upper-right neighbors. The algorithm combines groups of consecutive mesh points and their corresponding values into coarse-grain tasks, yielding p tasks, each holding n/p of the values. In this way communication is greatly reduced, but the values within each coarse-grain task must be updated sequentially, so the number of tasks p can have an important effect on performance [2]. A miniature task graph for the Laplace algorithm can be seen in Figure 1(b).
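A small sketch of the "upper-left and upper-right neighbors" constraint is given below, assuming a simple rectangular layout of width tasks per level; the actual Laplace pattern follows Figure 1(b).

def laplace_edges(n_levels, width):
    edges = []
    for i in range(1, n_levels):
        for j in range(width):
            for pj in (j - 1, j + 1):          # upper-left and upper-right parents
                if 0 <= pj < width:            # drop parents outside the level boundary
                    edges.append(((i - 1, pj), (i, j)))
    return edges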
The main usage of this algorithm is in solving differential equations, and it can be extended for use in engineering, for example in solving circuit analysis problems.
4.4.3 FFT algorithm graph
For the generation of the FFT task graphs we considered the Cooley–Tukey FFT algorithm as the basis for task decomposition. In our generator, the butterfly diagram is used to derive the task dependencies. Using the FFT pattern when generating DAGs requires that, on the level where the computation starts, the number of parallel tasks be a power of 2. See Figure 2(a).
The method used to generate a DAG following the FFT algorithm is an adaptation of the Cooley–Tukey FFT algorithm. After the DAG's root node is generated, the nodes on the first level of the graph are generated. The number of nodes on the first level must be a power of two, so that they can be grouped into butterfly pairs. The total number of generated levels is equal to log2 n, where n is the number of nodes on the first level of the graph. After each new level of nodes is generated, the parent-child relationships between nodes on consecutive levels are set using the relationship between the positions of the nodes. A child node can have only two parents: one with the same position on the previous level and one with the position computed from the following formula:
parent_index = child_index + current_level, if child_index is divisible by 2 and child_index + current_level < n
parent_index = child_index − current_level, otherwise
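For illustration, the butterfly pairing can also be expressed with standard Cooley-Tukey index arithmetic, as sketched below. The XOR-based formulation is our own rendering and may differ in detail from the piecewise rule above; stage denotes the level index counted from the first computation level.

def fft_parents(child_index, stage, n):
    # n is the (power-of-two) number of tasks on the first level
    span = 1 << stage                    # distance between butterfly partners at this stage
    same = child_index                   # parent with the same position on the previous level
    partner = child_index ^ span         # butterfly partner at +/- span, depending on the bit
    return (same, partner) if partner < n else (same,)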
In general, the Fast Fourier Transform is used for signal and image processing and can
be considered as the most ubiquitous algorithm for analyzing and manipulating digital and
discrete data. This algorithm has a wide range of applications: electro-acoustic music and
audio signal processing, medical imaging, image processing, pattern recognition, computational chemistry and error correcting codes. One of the most notable FFT-related tools is the genfft compiler [5].
4.4.4 Stencil algorithm graph
The motivation for choosing the Stencil parallel algorithm is the usage of this pattern in a
significant number of applications, but most notably in image processing and simulation.
Furthermore, a large variety of linear and non-linear image processing operations are specified using stencils. Among these operations we can find linear convolution and non-linear
noise reduction. Explicit solutions to partial differential equations use iterative applications
of stencil operations. These are used, for example, in image processing and in simulation
and seismic reconstruction.
The graph is generated by first selecting the number of levels and the number of tasks on each level. Then, for each node ni,j we select three consecutive children in the lower level: the node on the next level with the same index, ni−1,j , and its left and right neighbors, ni−1,j−1 and ni−1,j+1 . Figure 2(c) highlights the graph pattern on a small example.
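A minimal sketch of this wiring is given below, assuming width tasks on every level and simply dropping out-of-range children at the level boundaries.

def stencil_edges(n_levels, width):
    edges = []
    for i in range(n_levels - 1):
        for j in range(width):
            for cj in (j - 1, j, j + 1):       # three consecutive children on the level below
                if 0 <= cj < width:            # boundary handling: skip missing neighbors
                    edges.append(((i, j), (i + 1, cj)))
    return edges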
5. DAG evaluation
The aim of this analysis is to provide a DAG generator whose implementation is based on a series of techniques, and to provide a strong motivation for each of these techniques by identifying a number of concrete applications in which the resulting DAG patterns can be found. The generator therefore aggregates different methods used for generating DAGs in scientific studies on scheduling in distributed environments, and offers a robust and versatile tool. In real-life scenarios, a scheduling algorithm needs to address a significant number of constraints and performance requirements. It should exploit parallelism by identifying the task graph structure, and take into consideration task granularity (the amount of computation with respect to communication) as well as arbitrary computation and communication costs. Consequently, a set of metrics for DAG evaluation [6] should also be defined, in order to provide a way of classifying scheduling heuristics by different application characteristics.
5.1 Granularity
Depending on its granularity, which is a measure of the communication to computation
ratio, a DAG can be coarse grained (the computation dominates the communication) or
fine grained (the communication dominates the computation). Granularity of a DAG is
defined as:
g(G) = min over x of ( cn(x) / max over j of c(x, j) )    (1)
where cn(x) is the computation cost of node x, c(x, j) is the communication cost from node x to node j, and x and j index the nodes, taking values from 1 to the number of nodes.
We conclude that the I/O-bound case corresponds to fine-grained graphs and the CPU-bound case corresponds to coarse-grained graphs. Intuitively, a graph is coarse-grained if the amount of computation is relatively large with respect to communication.
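A direct sketch of formula (1) is given below, with the DAG represented as a dictionary of node costs and a dictionary of edge costs keyed by (source, destination); skipping nodes without outgoing edges is an assumption of ours.

def granularity(nodes, edges):
    # nodes: {x: cn(x)}, edges: {(x, j): c(x, j)}
    ratios = []
    for x, w in nodes.items():
        out = [c for (u, v), c in edges.items() if u == x]   # outgoing communication costs of x
        if out:
            ratios.append(w / max(out))
    return min(ratios) if ratios else float("inf")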
5.2 CCR (Communication to Computation Ratio)
Definitions found in the literature usually assume CCR defined as the average edge weight
divided by the average node weight. With the help of CCR, one can judge the importance of
communication in a task graph, which strongly determines the scheduling behavior. Based
on CCR we classify task graphs in:
• CCR < 1 - coarse grained graph
• CCR = 1 - mixed
• CCR > 1 - fine grained graph
CCR = ( Σ over x,j of c(x, j) × number of nodes ) / ( Σ over x of cn(x) × number of edges )    (2)
where cn(x) - is the computation cost of node x, c(x, j) - the communication costs from
node x to node j, and x, j - represents the node’s number and can take values from 1 to
the number of nodes.
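Formula (2) reduces to the ratio of the average edge (communication) cost to the average node (computation) cost, as the short sketch below shows; it assumes the graph has at least one node and one edge.

def ccr(nodes, edges):
    # nodes: {x: cn(x)}, edges: {(x, j): c(x, j)}
    avg_comm = sum(edges.values()) / len(edges)   # average edge weight
    avg_comp = sum(nodes.values()) / len(nodes)   # average node weight
    return avg_comm / avg_comp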
5.3 Constraints degree
The constraints degree (CD) of a DAG is defined as the number of connections between nodes. The degree of constraints can be low, medium or high; the values that separate these three classes should be chosen considering the structure of the resource network. A variation of the constraints degree metric is the density of a DAG, which counts the number of dependencies between nodes from two consecutive DAG levels. Of course, this metric can be applied only to leveled DAGs.
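A short sketch of both variants is given below: the constraints degree as a plain edge count and, for leveled DAGs, a per-level-pair density obtained by dividing the number of dependencies between two consecutive levels by the maximum possible number of such dependencies. The normalization is our own assumption.

def constraints_degree(edges):
    return len(edges)                              # total number of connections between nodes

def level_density(edges, levels):
    # levels: list of lists of node ids, one list per DAG level (top to bottom)
    densities = []
    for a, b in zip(levels, levels[1:]):
        a, b = set(a), set(b)
        between = sum(1 for (u, v) in edges if u in a and v in b)
        densities.append(between / (len(a) * len(b)))
    return densities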
5.4 Parallelism degree
Another important metric that should be considered is the DAG parallelism degree (PD), which is the maximum number of tasks that can be executed concurrently. Using this metric, DAGs can be classified as follows: a small value of PD indicates low task parallelism, while a large value indicates a high degree of parallelism among the DAG tasks. The PD is computed taking into account the constraints among the tasks. As for CD, the value that separates the two classes needs to be chosen according to the number of processors on which we want to schedule the current application.
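A hedged sketch of how PD can be estimated is given below. It approximates the parallelism degree by the width of the largest "as soon as possible" level of the DAG; the exact maximum number of concurrent tasks is the largest antichain, which is more expensive to compute, so the ASAP-level approximation is our simplification.

from collections import defaultdict

def parallelism_degree(nodes, edges):
    # nodes: iterable of node ids, edges: iterable of (u, v) precedence pairs
    preds = defaultdict(list)
    for (u, v) in edges:
        preds[v].append(u)
    level = {}
    def asap(v):                       # longest path (in hops) from any entry node to v
        if v not in level:
            level[v] = 1 + max((asap(u) for u in preds[v]), default=-1)
        return level[v]
    widths = defaultdict(int)
    for v in nodes:
        widths[asap(v)] += 1           # count tasks sharing the same ASAP level
    return max(widths.values())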
6. Conclusions
This paper aims to indicate a sound way of constructing a DAG generator. Its implementation should be based on a series of techniques, with a suitable motivation for each of these techniques obtained by identifying concrete applications in which the resulting DAG patterns can be found. In addition, the applications behind this paper represent a useful tool for testing and comparing scheduling algorithms under similar conditions by means of simulation. Furthermore, a set of evaluation metrics is given for evaluating DAGs in the context of scheduling in large-scale distributed systems.
References
[1] Ahmed Bouajjani, Javier Esparza, and Oded Maler. Reachability analysis of pushdown automata: application to model checking. http://www-verimag.imag.fr/~maler/Papers/pda.pdf, accessed 06/06/2010.
[2] Vincent Boudet. Heterogeneous task scheduling: a survey. Research Report RR-6895, INRIA, 2001.
[3] Louis-Claude Canon and Emmanuel Jeannot. A comparison of robustness metrics for scheduling DAGs on heterogeneous systems. In Sixth International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Networks (HeteroPar'07), held in conjunction with the 2007 IEEE International Conference on Cluster Computing, Austin, TX, United States, 2007. IEEE Computer Society.
[4] Louis-Claude Canon and Emmanuel Jeannot. Precise Evaluation of the Efficiency and
the Robustness of Stochastic DAG Schedules. Research Report RR-6895, INRIA, 2009.
[5] Matteo Frigo. A fast Fourier transform compiler. Proceedings of the 1999 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI),
3:195–207, May 1999.
[6] Udo Honig and Wolfram Schiffmann. A comprehensive test bench for the evaluation
of scheduling heuristics. Proceedings of the 16th International Conference on Parallel
and Distributed Computing and Systems (PDCS), 2004.
[7] Parry Husbands and Katherine Yelick. Multi-threading and one-sided communication
in parallel LU factorization. SC Conference, 0:1–10, 2007.
[8] Florin Pop, Ciprian Dobre, Gavril Godza, and Valentin Cristea. A simulation model
for grid scheduling analysis and optimization. In Proceedings of the international symposium on Parallel Computing in Electrical Engineering, pages 133–138, Washington,
DC, USA, 2006. IEEE Computer Society.
[9] Singiresu S. Rao. Applied Numerical Methods for Engineers and Scientists. Prentice
Hall Professional Technical Reference, 1st edition, 2001.
[10] Shoukat Ali, Howard Jay Siegel, Muthucumaru Maheswaran, Debra Hensgen, and Sahra Ali. Representing task and machine heterogeneities for heterogeneous computing systems. Journal of Science and Engineering, Special 50th Anniversary Issue, 3:195–207, 2000.
[11] Takao Tobita and Hironori Kasahara. A standard task graph set for fair evaluation of
multiprocessor scheduling algorithms. Journal of Scheduling, 5(5):379–394, 2002.
[12] Zhifeng Yu and Weisong Shi. An adaptive rescheduling strategy for grid workflow
applications. Parallel and Distributed Processing Symposium, International, 0:115,
2007.