Document 284597

Faculty of Mathematics and Computing
The Open University
A PROTOCOL TO GUARANTEE THE ORDER OF
MESSAGE PROCESSING WHEN INTEGRATING
ENTERPRISE APPLICATIONS
A thesis submitted in partial fulfilment of the
requirements for the Open University's
Master of Science Degree in Software
Development
Duncan Millard
U1796407
17 September 2004
Word Count: 14914
PREFACE
My thanks to my supervisor Dr. Rob Walker for his guidance and suggestions, every
one of which helped to improve the academic quality of this work.
My thanks also go to my friends and family who have supported me throughout the
last 3½ years of “spare time” study, particularly over the past few months as I
neglected them to immerse myself in this final piece of my MSc. Now you can see
what I have been up to while you were out enjoying the summer! My proofreaders
also deserve a special mention for helping so selflessly with an arduous job.
This thesis is dedicated to Liz. Without your unconditional and limitless support,
understanding and cups of coffee, it would have been a very different piece of work.
Thank you.
TABLE OF CONTENTS
Preface
Table of Contents
List of Figures
List of Tables
Abstract
Glossary
Chapter 1: Introduction
  1.1. Overview
  1.2. Thesis Structure
Chapter 2: Enterprise Application Integration
  2.1. Introduction
    2.1.1. Integration: Invisible Glue
    2.1.2. B2B and B2C
  2.2. The Evolution of Software
    2.2.1. Before Integration: The Mainframe
    2.2.2. Integration through Data Sharing
    2.2.3. Stovepipe Systems
  2.3. Automating Integration
    2.3.1. Integration through Middleware
    2.3.2. Enterprise Applications
    2.3.3. Point-to-Point Integration
  2.4. A New Approach to Integration
    2.4.1. Hub and Spoke Architecture
    2.4.2. Integration Engines
    2.4.3. Automating Business Processes
    2.4.4. EAI in the Real World
    2.4.5. Asynchronous, Long-Running Processes
  2.5. The Message Ordering Problem for EAI
    2.5.1. Cost of Integration
    2.5.2. Extensibility
    2.5.3. Asynchronicity
    2.5.4. Hub and Spoke Architecture
    2.5.5. Resilience
    2.5.6. Flexibility and Efficiency
  2.6. Summary
Chapter 3: An Investigation of Message Ordering
  3.1. Introduction
  3.2. The Problem of Message Ordering
    3.2.1. An Informal Language for Discussion
    3.2.2. Illustrative Scenarios
  3.3. Message Ordering Approaches
    3.3.1. Total Ordering
    3.3.2. Causal Ordering
    3.3.3. Actual Causal Ordering
    3.3.4. Application Ordering
  3.4. Summary
  3.5. Conclusion
Chapter 4: Inferred Causal Ordering: A Protocol for EAI
  4.1. Introduction
  4.2. Guaranteeing Causal Ordering
  4.3. Obtaining Causality from Application Programs
    4.3.1. Basic Ordering Information
    4.3.2. XML
  4.4. Inferred Causal Ordering
    4.4.1. Causal Message Groups
    4.4.2. Cross-Group Dependencies
  4.5. Maintaining and Conveying Causality
    4.5.1. Causal Log
    4.5.2. Message Annotation
  4.6. Preserving Causality
    4.6.1. Remote Message Stores
    4.6.2. Dynamic Message Tag
  4.7. Removing Causality Information
  4.8. Evaluation
    4.8.1. Evaluation against EAI Features
    4.8.2. Theoretical Efficiency of the Protocol
  4.9. Summary
Chapter 5: Protocol Evaluation
  5.1. Introduction
  5.2. Evaluation Approach
  5.3. Performance Measures
    5.3.1. Test Measures: Theoretical Efficiency
    5.3.2. Test Measures: EAI Features
    5.3.3. Test Measures: Non-Quantitative Testing
  5.4. Simulation System
    5.4.1. System Variables
  5.5. Test Cases
  5.6. Summary
Chapter 6: Test Results
  6.1. Introduction
  6.2. Methodology
  6.3. Message Ordering
  6.4. Message Overhead
  6.5. Message Log Overhead
    6.5.1. Causal Information
    6.5.2. Dynamic Message Log Size
  6.6. Efficiency
  6.7. Resilience
  6.8. Limitations of the Simulation System
  6.9. Test Conclusions
    6.9.1. Performance Gains of the Protocol
    6.9.2. Comparison of Theoretical Efficiency with Optimal Example
  6.10. Summary
Chapter 7: Conclusions
Appendix B: Simulation System Architecture
  B.1 Overview
  B.2 Sending System Simulation
  B.3 EAI Hub
    B.3.1 Rules Engine
    B.3.2 Integration Engine: BizTalk 2004
    B.3.3 Message Logs
  B.4 Destination System Simulation
References
Index
LIST OF FIGURES
Figure 1: Stovepipe applications
Figure 2: Point-to-point integration between multiple systems
Figure 3: Hub and spoke integration
Figure 4: An informal language for message ordering
Figure 5: Sequential send, non-sequential receive
Figure 6: Sequential send, non-sequential receive
Figure 7: Dependent concurrent send
Figure 8: Total ordering
Figure 9: Causal ordering
Figure 10: A simple XML document
Figure 11: XML messages
Figure 12: A cross-group dependency
Figure 13: Message enveloping
Figure 14: An algorithm exhibiting O(n) efficiency
Figure 15: An algorithm exhibiting O(n²) efficiency
Figure 16: An algorithm exhibiting O(1) efficiency
Figure 17: Message overhead independent of number of destinations
Figure 18: Message overhead independent of number of causal message groups
Figure 19: How message overhead varies with cross-group dependencies
Figure 20: How causal log size varies with the number of destinations
Figure 21: How causal log size varies with the number of causal groups
Figure 22: How causal log size varies with the number of destinations and causal groups
Figure 23: How dynamic log size varies for different message quantities
Figure 24: How the sending to delivery ratio affects dynamic log size
Figure 25: How variable latency affects dynamic log size
Figure 26: How run time reduces as the number of causal groups increases
Figure 27: Baseline dynamic log size for resilience testing
Figure 28: The impact of an unavailable destination on dynamic log size
Figure 29: Comparing theoretical efficiencies
Figure 30: Simulation system architecture
Figure 31: Test message format
Figure 32: Simulation system causal group identifier
LIST OF TABLES
Table 1: Suitability of A[…] et al.’s ordering protocol for EAI
Table 2: Suitability of Kshemkalyani and Singhal’s ordering protocol for EAI
Table 3: Suitability of Cheng et al.’s ordering protocol for EAI
Table 4: Suitability of Singh and Badarpura’s ordering protocol for EAI
Table 5: Suitability of message ordering approaches for EAI
Table 6: Summary of an inferred causal ordering protocol for EAI
Table 7: Test measures
Table 8: In-order message delivery
Table 9: Running times, showing system overload above 50 messages
Table 10: Impact of latency on run times
Table 11: Run times with unavailable destination
ABSTRACT
One of the significant emerging trends in modern computing is that of Enterprise
Application Integration (EAI) - the connecting of two or more individual
applications via custom, automated business processes. Within this still maturing
discipline, there are a number of significant technical problems that are yet to be
overcome.
Integrated applications typically communicate by asynchronous message passing,
resulting in the delivery of messages to remote systems in an order potentially
different to their creation. This can have undesirable side effects such as a loss of
data integrity.
Distributed systems research describes a number of different approaches to solving
the message ordering problem. In order to assess these approaches, criteria are
developed that must be met for a protocol to be suitable for use in EAI. Measured
against these criteria, existing protocols are found to be lacking.
This thesis presents a novel message ordering protocol designed specifically for EAI,
in which ordering information is inferred from the contents of the messages
themselves. The overhead of the protocol and the efficiency gains it offers compare
well with other implementations of traditional causal ordering.
GLOSSARY
A2A         Application to Application Integration
API         Application Programming Interface
B2B         Business to Business Integration
B2C         Business to Consumer Integration
EAI         Enterprise Application Integration
Enterprise  An organization that uses computers. In practice, the term is
            applied much more often to larger organizations than smaller
            ones (NIH, n.d.)
ERP         Enterprise Resource Planning
XML         Extensible Markup Language
Chapter 1
INTRODUCTION
1.1. Overview
Businesses are increasingly realising the financial and competitive benefits of
Enterprise Application Integration (EAI). Many new technologies and tools are
emerging that greatly reduce the cost and complexity of implementing an integration
strategy (Medjahed et al., 2003; Linthicum, 2004). As with any relatively new
technology, there are still a number of issues to resolve.
Integrated applications typically communicate via asynchronous message passing
(Bussler, 2002b) and are therefore comparable in nature to traditional distributed
asynchronous systems. A problem common to both domains is the need to control
the order of message processing between the different components of the system
(Lamport, 1978) in order to ensure data integrity and consistency across each process
or application.
Consider for example two messages from a Human Resources (HR) system destined
for a payroll system - one notifying that a new employee has joined a company, and
one to set the employee’s salary level. If the payroll system receives the message to
set the employee’s salary before it receives the “new employee” message, the salary
message may be rejected, causing inconsistency of data between the two systems.
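The HR and payroll scenario can be illustrated with a minimal sketch. The class name `PayrollSystem`, the message fields and the employee identifier `E42` are all hypothetical, chosen only for illustration; they are not taken from any real payroll product or from the protocol presented later.

```python
from dataclasses import dataclass, field


@dataclass
class PayrollSystem:
    """Hypothetical payroll system that only knows employees it has been told about."""
    salaries: dict = field(default_factory=dict)
    rejected: list = field(default_factory=list)

    def process(self, message: dict) -> bool:
        """Apply one message; return False if it had to be rejected."""
        employee = message["employee_id"]
        if message["type"] == "new_employee":
            self.salaries.setdefault(employee, None)
            return True
        if message["type"] == "set_salary":
            if employee not in self.salaries:
                # The salary update arrived before the employee exists.
                self.rejected.append(message)
                return False
            self.salaries[employee] = message["salary"]
            return True
        raise ValueError(f"unknown message type: {message['type']}")


new_employee = {"type": "new_employee", "employee_id": "E42"}
set_salary = {"type": "set_salary", "employee_id": "E42", "salary": 30000}

# Messages processed in the order they were created.
in_order = PayrollSystem()
for msg in (new_employee, set_salary):
    in_order.process(msg)

# The same messages delivered in the opposite order.
out_of_order = PayrollSystem()
for msg in (set_salary, new_employee):
    out_of_order.process(msg)

print(in_order.salaries)
print(out_of_order.salaries)
print(out_of_order.rejected)
```

In the out-of-order run the employee record exists but has no salary, so the payroll system no longer agrees with the HR system that sent both messages.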
This thesis investigates existing solutions to the message ordering problem, evaluates
their suitability for EAI, and then presents and evaluates a message ordering protocol
that is specifically designed to address the unique features of EAI.
1.2. Thesis Structure
This thesis has the following structure:
Chapter 2 presents a description of Enterprise Application Integration based on a
literature search, showing how EAI is the latest of many attempts to connect
computers together. It discusses the typical architecture of modern EAI systems, and
explains why a message ordering problem exists.
Chapter 3 describes the message ordering problem in a wider context and presents a
literature review of existing message ordering approaches. Protocols implementing these
approaches are then assessed in the context of their suitability for EAI.
Chapter 4 presents a proposal for an EAI ordering protocol based on the problems
found in Chapter 3. The approach to testing the protocol appears in Chapter 5 with
the results and conclusions appearing in Chapters 6 and 7 respectively.
Appendices, a list of references and an index conclude the thesis.
Chapter 2
ENTERPRISE APPLICATION INTEGRATION
2.1. Introduction
Linthicum (2000) describes Enterprise Application Integration as “the unrestricted
sharing of data and business processes amongst any connected applications and data
sources in the enterprise”. Whilst informative, this description describes a perfect
enterprise - something that is unlikely to exist in reality. More realistically, EAI is the
connection of two or more applications in a way that combines their data and
processing capabilities, controlled by a central business process.
2.1.1. Integration: Invisible Glue
We have all used integrated systems, probably without realising it. When making a
purchase on a credit card, the retailer’s systems talk to the credit card provider to
authorise the sale. The credit card’s payment processing system connects to the
account management system, which in turn connects to a billing system. At the end
of the month, the account management system integrates with a “supermarket-style”
loyalty card and awards points based on the amount spent.
Without the ability to integrate these separate applications programmatically, humans
would have to be involved at every step, increasing the cost and time needed for
processing.
2.1.2. B2B and B2C
Enterprise Application Integration (sometimes referred to as Application-to-Application
Integration, or A2A) is one part of the wider field of integration. Other
categories are Business-to-Business integration (B2B) and Business-to-Consumer
Integration (B2C). EAI can be thought of as the view from within the enterprise,
whereas the other approaches move outside of the enterprise.
Each of these types of integration has its own features and problems, but each shares
a common core: connecting two or more entities together, whether those entities are
applications, businesses or people.
2.2. The Evolution of Software
It is worthwhile looking at integration - or its closest equivalent at the time - in the
context of each major development in computing architecture. This gives an
understanding of how today’s problems have evolved and the increased need for
EAI.
2.2.1. Before Integration: The Mainframe
Early computer platforms used in industry were mainframes responsible for all
processing in the business. With all data and processing centrally located, there was
no real concept of integration; if another process needed data it was already present
on the machine and accessible, albeit with programming sometimes required to
access it (Linthicum, 2000).
2.2.2. Integration through Data Sharing
The arrival of desktop-based computing led to a rapid growth in computing in the
enterprise (Sinha, 1992). Data was no longer stored on the central mainframes; it was
instead spread across many desktop machines in an uncontrolled manner.
Departments were able to store data locally in ways that had not previously been
possible. This created a vast number of “data islands”, with data - some of it
business-critical in nature - stored and managed in an ad-hoc fashion on desktop
computers (Hackney, 1996).
As the desire grew to use information held by others, technologies such as file
servers enabled the sharing of data between different machines and software
packages. In a similar manner to data reuse on the mainframe, file conversion utilities
facilitated data sharing.
2.2.3. Stovepipe Systems
Recognising that many people within a department were relying on data sharing to
achieve their goals, department-level applications began to emerge. These
applications addressed a particular department’s processes, such as general ledger or
sales management (Linthicum, 2000).
These applications typically operated in isolation, with very little attention given to
interoperability with other systems. This type of system was known as a stovepipe
application, for reasons that are apparent from Figure 1.
Figure 1: Stovepipe applications
Data held in a stovepipe application is typically referred to as being in a “data silo”:
a large data store that is only accessible through the application itself.
2.3. Automating Integration
2.3.1. Integration through Middleware
With no way to interrogate a data silo directly from outside of the particular
stovepipe application, there was a need to pass data from one stovepipe to another
in a more controlled, but automated, manner.
The umbrella name given to the group of technologies used to connect applications
in this manner is middleware. There are many definitions of middleware. One of the
best I have seen is:
Software that provides a link between separate software applications. Middleware
[...] connects two applications and passes data between them. (Fehai Student
This definition is specific enough to describe the role of middleware, without tying it
to any particular technology.
Applications began to support interoperability with different proprietary middleware
software. With these interfaces in place, some of the first truly automated integration
began to emerge. Using applications together no longer involved custom or manual
steps to extract data from an application; data was instead made available directly by
each one. As these applications grew in size, a new category of system developed:
the enterprise application.
2.3.2. Enterprise Applications
Enterprise applications are large, complex inter-departmental applications. They
typically feature complex business logic and provide functionality for many groups of
users in an organisation. Enterprise applications could replace the individual data
islands and departmental-level applications, or provide new, enterprise-wide
functionality.
Enterprise applications are more likely than other categories of system to feature
support for advanced middleware. The last decade has seen the emergence of
sophisticated middleware technologies, including CORBA, DCOM, Screen
Wrappers, and JavaBeans (Themistocleous et al., 2001; Vinoski, 2002; Reyes et al.,
2003).
Enterprise applications are also more likely to expose application-programming
interfaces (APIs) for accessing their functionality, thereby facilitating their integration
with other systems.
2.3.3. Point-to-Point Integration
The use of middleware to connect applications led to point-to-point integration. This
is the general term given to a direct connection between two computer products or
systems.
Due to the reliance on a direct connection, the addition of a third system requires
two new links. Similarly, the addition of a fourth requires three more links.
Figure 2 shows a number of systems connected via point-to-point integration.
Figure 2: Point-to-point integration between multiple systems
Point-to-point integration is attractive as a “quick fix” technology, but as a strategic
integration architecture it is constrained by a number of limitations.
Truman (2001) describes the point-to-point approach as “a spaghetti of interfaces
closely binding systems together”. He points out that systems of this type “normally
exhibit the traits of high levels of complexity, risk and escalating cost”.
A survey conducted in 2001 by Themistocleous et al. (2001) examined the integration
of Enterprise Resource Planning (ERP) systems with other systems. All respondents
said that point-to-point integration was not the best approach for integrating their
systems due to the maintenance problems it introduces. Further, Themistocleous et
al. estimate that to integrate x applications using a point-to-point approach requires
the development of x*(x-1)/2 interfaces.
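To make the growth concrete, the following short sketch evaluates the x*(x-1)/2 estimate and the number of new links needed each time a system is added. The function names are hypothetical and are used only for this illustration.

```python
def point_to_point_interfaces(x: int) -> int:
    """Interfaces needed to connect x applications pairwise: x*(x-1)/2."""
    if x < 0:
        raise ValueError("number of applications must be non-negative")
    return x * (x - 1) // 2


def links_added_by_next_system(x: int) -> int:
    """Extra links needed when one more system joins x existing systems."""
    return point_to_point_interfaces(x + 1) - point_to_point_interfaces(x)


# Print the interface count for 2 to 10 applications.
for x in range(2, 11):
    print(x, point_to_point_interfaces(x))
```

Adding a third system to two existing ones needs two new links and adding a fourth needs three more, in line with the description of point-to-point integration above; the total grows quadratically with the number of systems.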
Espinosa and Pulido (2002) agree. Two additional drawbacks identified are that
point-to-point integration tends to be invasive to the applications involved, and that
business processes are difficult to change without changing the code of the
applications. Custom interfaces need to be developed to cope with messages from
other systems, the development of which may require a high degree of
understanding of those systems.
It is clear from these studies that point-to-point integration will lead to unsustainable,
rapidly escalating costs as the number of systems increases.
2.4. A New Approach to Integration
2.4.1. Hub and Spoke Architecture
By the late 1990s, a number of factors began to converge that fuelled an increased
demand for more complex integration. The extensible Markup Language,
commonly known as XML (W3C, 2000), emerged as a de facto, transparent way of
representing application data. XML received substantial backing from companies
including IBM and Microsoft, but its adoption as a recommendation in 1998 by the
W3C served to cement the place of XML in modern software (Eyal and Milo, 2001;
Gosain et al., 2003). The industry also agreed on web services as the XML-based
standard for inter-application communication (Levitt, 2001), offering significant
benefits for application integration (Stal, 2002; Kreger, 2003).
This new cross-platform cross-vendor approach, coupled with an increasing desire to
connect applications, and the high cost of point-to-point integration led to a new
way of thinking. Hub and spoke integration emerged, offering a centralised approach
similar to that seen with client-server computing (Linthicum, 2004) and offering a
way to control integration complexity. Figure 3 illustrates why this architecture is
known as hub and spoke.
Figure 3: Hub and spoke integration
Each spoke represents a path of communication between the integration hub and
the application at the end of the spoke, effectively a single point-to-point
integration between the hub and each application (Hohpe and Woolf, 2004). As with
point-to-point integration, applications connect to the hub using middleware, for
example message queuing or Web Services. The central control of
connections means that it is cost effective to offer support for a wide range of
connection methods.
2.4.2. Integration Engines
At the centre of the hub is typically a specialised integration engine, specifically
designed to cope with the needs of application integration. Companies such as
Tibco, IBM, SeeBeyond and Microsoft have produced integration engines (Bussler,
2002a,b; Medjahed et al., 2003), and this is still a growing area of the industry.
Integration hubs typically provide the following features (Linthicum, 2000):
• Message transformation from one application's format to another
• Intelligent routing to decide which message to send to which application
• A rules engine to automate business processes
Intuitively, all three points require that the format of incoming messages is well
known by the integration engine. Further, the automation of business processes
requires that the integration engine has an implicit understanding of the data
contained within those messages, as described in the next sub-section.
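The first two features, transformation and routing, can be sketched in a few lines. This is a toy illustration only: the class `IntegrationHub`, its method names and the message fields are assumptions made for the example and do not correspond to the API of any commercial integration engine.

```python
from typing import Callable, Dict, List

Message = dict


class IntegrationHub:
    """Toy hub: routes each message type to destinations and transforms per destination."""

    def __init__(self) -> None:
        self.routes: Dict[str, List[str]] = {}
        self.transforms: Dict[str, Callable[[Message], Message]] = {}
        self.outboxes: Dict[str, List[Message]] = {}

    def connect(self, app: str, transform: Callable[[Message], Message]) -> None:
        """Add one spoke: an application and the transform into its format."""
        self.transforms[app] = transform
        self.outboxes[app] = []

    def route(self, message_type: str, app: str) -> None:
        """Declare that messages of this type should be sent to the application."""
        self.routes.setdefault(message_type, []).append(app)

    def receive(self, message: Message) -> None:
        # The hub must know the incoming format: here, a "type" field.
        for app in self.routes.get(message["type"], []):
            self.outboxes[app].append(self.transforms[app](message))


hub = IntegrationHub()
hub.connect("payroll", lambda m: {"EMP_ID": m["employee_id"]})
hub.connect("directory", lambda m: {"id": m["employee_id"], "name": m["name"]})
hub.route("new_employee", "payroll")
hub.route("new_employee", "directory")

hub.receive({"type": "new_employee", "employee_id": "E42", "name": "A. Smith"})
print(hub.outboxes)
```

Each application needs only one connection to the hub, and the sending application never sees the destination formats. The sketch also shows why the hub depends on knowing the incoming message format: routing reads the `type` field and each transform reads named fields.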
2.4.3. Automating Business Processes
Automated business processes, commonly known as orchestrations, are responsible
for managing all interactions between the various systems and for making decisions
based on the data that flows through them.
A typical example is an orchestration for processing purchasing requests from
employees. The orchestration logic uses its knowledge of the message format to
extract the amount of the purchase request. It then uses its implicit understanding of
that data to decide whether the request is within the employee’s budgetary limit. If
so, the request is routed directly to the purchasing system in a format recognised by
that system. If not, the request could be converted to an email containing the
employee’s name and purchase request, which is then sent to a purchasing
supervisor.
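The purchase request orchestration can be sketched in a few lines of Python. This is an illustration only: the message layout (a `PurchaseRequest` element with `Employee` and `Amount` children), the budget table and the two destination names are assumptions made for the example, and do not describe any particular integration engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical budgetary limits per employee (assumed for illustration).
BUDGET_LIMITS = {"Millard": 500.0}

def route_purchase_request(xml_text):
    """Return (destination, payload) for a purchase request message."""
    request = ET.fromstring(xml_text)
    # Knowledge of the message format: where the fields live.
    employee = request.findtext("Employee")
    amount = float(request.findtext("Amount"))
    # Implicit understanding of the data: compare the amount to the limit.
    if amount <= BUDGET_LIMITS.get(employee, 0.0):
        # Within budget: pass on in a form the purchasing system recognises.
        return "purchasing-system", {"employee": employee, "amount": amount}
    # Over budget: convert to an email body for the purchasing supervisor.
    body = f"{employee} requests a purchase of {amount:.2f}"
    return "supervisor-email", body
```

The same incoming message therefore leaves the hub in one of two different formats, depending on a decision made from its content.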
Orchestrated business processes are typically long running in nature (Kuo et al.,
2003), in contrast to those carried out using a database, which tend to be short in
duration.
2.4.4. EAI in the Real World
In section 2.4.2 I mentioned that many commercial vendors have produced
integration engines. To help put this into a real-world context, I will briefly present
three EAI case studies and scenarios drawn from industry publications and
promotional literature. It is important to remember that these examples come from
commercial magazine editorials and vendor-published case studies. As such, no
academic conclusions should be drawn from what are essentially marketing and
opinion pieces. Despite this caveat, they do accurately represent some of the current
uses of EAI in industry, based on my own experiences working in this field.
Magazine Subscriptions
A publishing company receives magazine subscription information from two
sources: either once a day in bulk, from a third party data collection company who
process postal applications, or on an individual ad-hoc basis from agents taking
telephone bookings. In both cases, a common business process runs to administer
the subscription records, executing tasks such as updating the subscriber database
and issuing a receipt to the subscriber (Altman and Altman, 2004).
Content Brokering
General Motors uses EAI technology as a "broker", coordinating the
communication between a number of different applications. The broker converts
data between the many formats in use by the applications, and allows additional
legacy systems to be added to the integrated whole (Vail, 2000).
Field Force Enablement
Accenture offer a solution known as Field Force Enablement, which allows field
workers to access data held in enterprise systems (Accenture, 2003). A work
management system is used to assign tasks to employees, who then use handheld
computers to report on the status of these tasks. Updates are automatically sent back
to an integration hub, which updates all of the affected enterprise systems.
2.4.5. Asynchronous, Long-Running Processes
As illustrated in the previous section, and as described in the technical
documentation of commercial integration engines (e.g. Microsoft, 2004), integration
engines are designed for high-volume, enterprise-scale processing. As such, they
typically rely on an asynchronous, message-passing approach to communication
(Bussler, 2002b), meaning that no single application is tightly coupled to any other.
The combination of asynchronous message passing and long-running transactions
means that the processing of a particular message will take an unpredictable length of
time. As a result, it is not possible to know the order in which incoming messages
will emerge from the integration hub ready for delivery to other applications.
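The effect can be shown with a small deterministic simulation. It is a sketch under simplifying assumptions: each message is taken to be processed independently of the others, and the arrival and processing times are invented figures chosen only to make the point.

```python
import heapq

def emerge_order(messages):
    """messages: list of (name, arrival_time, processing_time).
    Returns the message names in the order they leave the hub."""
    finished = []
    for name, arrival, duration in messages:
        # Asynchronous processing: a message completes at
        # arrival + duration, regardless of the other messages.
        heapq.heappush(finished, (arrival + duration, name))
    return [heapq.heappop(finished)[1] for _ in range(len(finished))]

# m1 arrives first but triggers a long-running business process;
# m2 arrives later and is processed quickly.
order = emerge_order([("m1", 0, 10), ("m2", 1, 2)])
```

Here `order` is `["m2", "m1"]`: the later message overtakes the earlier one inside the hub.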
2.5. The Message Ordering Problem for EAI
As illustrated in section 1.1, a variable order of message processing can lead to
problems with data integrity. At the human level, the HR example could have led to
an employee not receiving their first salary cheque. This shows that the message
ordering problem in hub and spoke integration can have a real and visible impact.
I will now examine some of the features of EAI that affect message ordering. This
will allow the evaluation of existing ordering approaches and protocols to assess their
suitability for use in hub and spoke EAI.
2.5.1. Cost of Integration
According to Attachmate Corporation (Attachmate Corporation, 2004), a major cost
of integrating an application occurs when the application needs modification in order
to work in the integrated environment. This conclusion is supported by the work
of Espinosa and Pulido (2002). The ideal EAI-centric message ordering protocol
should therefore be non-invasive to the integrated application - that is, the
application should not need modification in order either to supply ordering
information with its messages, or to understand ordering information on messages
that it receives.
2.5.2. Extensibility
The Content Brokering example in section 2.4.4 (Vail, 2000) demonstrates that, over
time, the applications participating in an integrated system will vary. Similarly, the
Field Force Enablement solution makes no assumptions about which work
management or resource scheduling package is in use, or even that such an
application exists in the integrated system.
An EAI protocol therefore must not require explicit or implicit knowledge of
other applications in the system to function.
This also allows an incremental approach to implementation, such as that described
in Emmerich et al. (2001), rather than requiring a high-risk "all-in-one"
implementation.
2.5.3. Asynchronicity
As discussed in section 2.4.5, one of the main features of an integration engine is that
it uses asynchronous communication. Hence, it is essential that any protocol includes
a level of decoupling between the sending of a message and its receiving and
delivery.
2.5.4. Hub and Spoke Architecture
The hub and spoke architecture described in section 2.4.1 is a major architecture in
modern integration solutions. A full consideration of architecture is vital in ensuring
that a system can meet its required goals (Clements and Northrop, 1996), and so any
protocol must be compatible with the hub and spoke approach. This implies that the
protocol must not rely on point-to-point communication between any of the
integrated applications, placing further importance on the need for asynchronicity.
2.5.5. Resilience
Integrated systems are, by definition, situated on more than one physical machine.
They are therefore susceptible to network failures, communication delays, and other
similar occurrences - there is no guarantee that a complex, multi-application
integrated system will be highly available. Any protocol must therefore be resilient
and capable of dealing with the unavailability of, or delays in communicating with, a
remote application.
2.5.6. Flexibility and Efficiency
It is widely accepted that a system processing a single message at a time will show
lower throughput than a system capable of concurrent processing of multiple
messages. Given the enterprise-scale nature of EAI, it is important that an ordering
protocol does not place unnecessary constraints on the processing of messages,
particularly in light of the asynchronous processing and unreliable communication
issues already discussed.
In most cases, the optimal level of efficiency is constrained by the specific data
integrity requirements of a system. A real time stock trading system would require
very strict ordering of messages to ensure fair trading, thereby limiting its
throughput. In other applications, applying weaker ordering rules may be
appropriate, increasing throughput at the potential cost of data and transactional
integrity. An ordering protocol should therefore allow the efficient throughput of
messages by being flexible enough to adapt to an integrated system's particular
needs.
2.6. Summary
EAI is a growth area of computing, with the technology far from mature. There are
still technical hurdles to overcome, one of which is to find a solution to the message
ordering problem inherent in an asynchronous hub and spoke architecture.
This chapter identified a number of features that can be used to assess a particular
ordering protocol’s suitability for EAI, and these will be used in the next chapter to
shape a literature review of existing ordering protocols.
Chapter 3
AN INVESTIGATION OF MESSAGE ORDERING
3.1. Introduction
The previous chapter illustrated the problem of message ordering as it relates to
integrating enterprise applications. Message ordering is a relatively mature discipline
within distributed systems research, and this chapter presents a literature review of
current practice in this area. The message ordering approaches and protocols
identified are assessed with respect to their suitability for EAI, according to the
features described in section 2.5.
3.2. The Problem of Message Ordering
Message ordering is relatively simple to comprehend: messages sent to a remote
system, where communication is subject to unpredictable delays, should arrive in the
correct order.
There are problems and subtleties in determining the correct order (or, more
precisely, one of the correct orders) for message delivery in traditional distributed
systems. These problems have warranted extensive examination in literature (for
example Lamport, 1978; Cheng et al., 1995; Murty and Garg, 1997; Fritzke, Jr. et al.,
1998).
3.2.1. An Informal Language for Discussion
Before illustrating the problem of message ordering, it is necessary to define an
informal language for presenting the example scenarios. Figure 4 shows the key
terms used.
m1: A message, m1
Send(m1): The sending of message m1 by a system
Receive(m1): The receipt of message m1 by a system
$\xrightarrow{hb}$: "Happens before" - e.g. Send(m1) $\xrightarrow{hb}$ Receive(m1) means that
a message is sent before it is received
Figure 4: An informal language for message ordering
An immediate point to note is the decoupling of the sending of a message by a
system, and receipt by its destination. In essence, this is where the crux of the
message ordering problem lies: there is a variable delay between the sending of a
message and its receipt. A message can therefore be received by a remote system
before it is its "turn" to be processed - in other words when an earlier message on
which it depends has not yet been received (Murty and Garg, 1997).
3.2.2. Illustrative Scenarios
I have created a small number of scenarios to highlight potential ordering problems,
and illustrated these with a simple timeline for asynchronous systems.
Scenarios with one Sender and one Receiver:
Figure 5: Sequential send, non-sequential receive
Figure 5 shows the most basic form of message ordering problem: system A sends
two messages, m1 and m2, intending them to be processed in that order. Delays
cause them to be received, and hence processed, in the order m2 then m1. Stated in
the informal language, the intended sequence of events on process B is
process(m1) $\xrightarrow{hb}$ process(m2), but process(m2) $\xrightarrow{hb}$ process(m1) is the sequence
that occurs.
In some cases the impact may be negligible, but it is not hard to imagine where this
could cause problems. With the Field Force Enablement example in section 2.4.4,
consider a field engineer logging an additional piece of previously unplanned work
on their mobile device (causing one message to be sent to the integration engine).
When the engineer records that work as complete (causing another message to be
sent to the integration engine), the messages must arrive in order, otherwise the
integrated system would be told of the completion of a piece of work that it did not
know about.
Scenarios with Multiple Senders and one Receiver
Figure 6: Sequential send, non-sequential receive
The scenario represented in Figure 6 shows two separate processes, each sending
messages to the same receiver. This is essentially the same as the previous scenario,
with the added complication that the messages originate from different processes. As
in the previous section, the likely intended sequence of events at process C is
process(m1) $\xrightarrow{hb}$ process(m2), but instead process(m2) $\xrightarrow{hb}$ process(m1) is the
sequence that occurs.
In some circumstances it will not be necessary to coordinate the delivery of these
messages - for example where the data sets for A and B are orthogonal. However,
where one or both of the senders are issuing time-based instructions, order of
delivery could be critical. As an example, consider the subscription processing system
from section 2.4.4. Message m1 could contain a batch set of records from third party
data processors, and m2 could contain a time-based instruction to generate the day's
invoices. In this case, if m2 is processed before m1, the batch records submitted on
the day in question will not be included in the day's invoices.
Scenarios with Multiple Senders and Multiple Receivers:
Figure 7: Dependent concurrent send
Figure 7, taken from Yoshida (2001), shows one of the more difficult problems to
address. Here, the sending of two messages is concurrent, with no apparent means to
control the order in which they are processed. It may be that the system needs to
constrain the order to preserve data integrity: for example ensuring that
process(m2) $\xrightarrow{hb}$ process(m3) because m3 needs the result of m2 to continue.
The Field Force Enablement system faces this problem when sending messages to a
mobile device. Message m1 could represent a field worker notifying the work
management system of a newly identified piece of work required in the field.
Message m2 would originate from the work management system to notify the
scheduling system of the new work, and m3 would be a message from the field
device asking for a new schedule. In this particular case, it would be preferable for
the request for a new schedule to be processed after the message recording the new
piece of work.
3.3. Message Ordering Approaches
With the message ordering problem now illustrated, and the criteria for message
ordering protocols in EAI established in section 2.5, it is possible to evaluate existing
approaches and protocols for their suitability for use in EAI.
In the 1980s and 1990s, academics produced a number of approaches to message
ordering designed to address the problems inherent with asynchronous distributed
systems. Early approaches such as causal ordering and total ordering have been
followed by more recent work, including application ordering (Singh and Badarpura,
2001) and actual causal ordering (Cheng et al., 1995). These more recent approaches
attempt to enhance traditional causal ordering.
The following subsections describe each approach to ordering and evaluate a protocol
that implements that approach.
3.3.1. Total Ordering
Total ordering is perhaps the most straightforward message ordering concept to
understand. It requires that all messages sent to multiple destinations are processed
at each destination in the same order - the order in which they were sent.
Furthermore, two messages sent concurrently by two different senders will always be
processed in the same order at every remote site that receives them, as shown in
Figure 8. Total ordering ensures that, if process(m1) $\xrightarrow{hb}$ process(m2) at site C,
then process(m1) $\xrightarrow{hb}$ process(m2) also holds at site D.
Figure 8: Total ordering
Protocol Evaluation
Agarwal et al. (1998) describe a protocol for total ordering, Totem. Totem is designed
for a single-broadcast environment such as a local network with a number of
listening processes, together forming a "token ring". The protocol uses the concept
of a logical token, which is transmitted point-to-point between processes. Possession
of the token grants the holder the right to broadcast messages to the other members
of the ring. Each outgoing message receives a sequence number, which identifies the
order of sending of messages across the whole system.
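The token-passing idea can be sketched as follows. This is a deliberately simplified illustration and not Totem's actual implementation: the class and function names are invented for the example, and failure detection, retransmission and membership changes are all ignored.

```python
class Token:
    """Logical token carrying the next system-wide sequence number."""
    def __init__(self):
        self.next_seq = 1

class RingProcess:
    def __init__(self, name):
        self.name = name
        self.outbox = []      # messages waiting for the token
        self.delivered = []   # (sequence number, payload) in delivery order

def circulate(ring, token, rounds=1):
    """Pass the token around the ring; only the holder may broadcast."""
    for _ in range(rounds):
        for holder in ring:
            while holder.outbox:
                payload = holder.outbox.pop(0)
                seq = token.next_seq
                token.next_seq += 1
                # Broadcast: every member delivers the message under
                # the same sequence number, giving one total order.
                for member in ring:
                    member.delivered.append((seq, payload))
```

Because a process can only send while it holds the token, the order of sending is fixed by the path of the token around the ring rather than by the needs of the application.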
In addition to the main application messages, the system supports configuration
messages, informing the ring of changes to its membership - for example a process
joining or leaving.
The requirement for synchronous point-to-point communication between processes
means that the token passing approach is not suitable for EAI as it is incompatible
with an asynchronous hub and spoke architecture. The processes also need to
explicitly handle the token and control messages, requiring an invasive approach.
Extensibility is supported, as any application can be added simply by including it in
the token ring, and the protocol has built-in resilience through detection of the
failure of a participating process. The protocol is neither efficient nor flexible,
because it constrains the order of message transmission to the process holding the
token, with no way of overriding this.
Summary
Table 1 summarises the suitability of Agarwal et al.'s protocol for use in EAI.
A tick (✓) denotes that the protocol meets a criterion, and a cross (✗) denotes that it
does not.
Criterion                   Met
Non-Invasive                ✗
Extensible                  ✓
Asynchronous                ✗
Hub and Spoke Compatible    ✗
Resilient                   ✓
Efficient                   ✗
Flexible                    ✗
3.3.2. Causal Ordering
Causal ordering is based on Lamport's well-known "happens before" relation
(Lamport, 1978) and is a relaxation of strict total ordering. There are a number of
formal and informal definitions of causal ordering given in literature (for example
Tyler, 1994; Mostefaoui et al., 2001), but all are essentially variations of the informal
definition:
• Event A happens before event B if the two events are from the same
process, and event A occurs in time before event B
• Event A happens before event B if event A is the sending of a message to a
process and event B is the receipt of that message by that process. Further, a
message cannot be received that has not previously been sent
• The happens before relation is transitive - if event A happens before event
B, and event B happens before event C, then event A also happens before
event C.
Further, if neither event A happens before event B nor event B happens before event A
then the two events are concurrent. In contrast to total ordering, their order of
processing at a remote site is not defined and may vary between remote sites as
illustrated in Figure 9.
Figure 9: Causal ordering
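The three rules of the informal definition, together with the test for concurrency, can be checked mechanically. The sketch below is a naive illustration rather than a practical algorithm: it assumes that the complete event history of every process is known in one place, and the event names used with it are invented for the example.

```python
from itertools import product

def happens_before(process_events, messages):
    """process_events: {process: [event, ...]} in local time order.
    messages: [(send_event, receive_event), ...].
    Returns the set of (a, b) pairs such that a happens before b."""
    hb = set()
    # Rule 1: events on the same process are ordered by local time.
    for events in process_events.values():
        for i, earlier in enumerate(events):
            for later in events[i + 1:]:
                hb.add((earlier, later))
    # Rule 2: the sending of a message happens before its receipt.
    hb.update(messages)
    # Rule 3: chain the relation (A before B and B before C
    # gives A before C) until no new pairs appear.
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(hb), repeat=2):
            if b == c and (a, d) not in hb:
                hb.add((a, d))
                changed = True
    return hb

def concurrent(hb, a, b):
    """Two events are concurrent if neither happens before the other."""
    return (a, b) not in hb and (b, a) not in hb
```

For example, if process A sends m1 to B, and B then sends m2 to C, the sending of m1 happens before the receipt of m2 at C, whereas an unrelated send by C before that receipt is concurrent with the sending of m1.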
It is possible that no actual relationship exists between two messages even though
causal ordering prescribes one. Causal ordering therefore represents only the potential
causal order of a series of messages. As such, no causal ordering implementation can
be considered truly efficient, as it will introduce delays where none are warranted
(Cheng et al., 1995).
Protocol Evaluation
The protocol proposed by Kshemkalyani and Singhal (1995; 1996) for implementing
(potential) causal ordering is far closer to the requirements for EAI than the total
ordering protocol of Agarwal et al. It uses asynchronous message passing to an
arbitrary and dynamic number of destinations, with no assumptions made about
communication patterns (abridged from the list in Kshemkalyani and Singhal, 1996).
Causal data is held in two places: metadata added to the message itself, and message
logs at the receiving processes. Kshemkalyani and Singhal prove
that their implementation is optimal when compared to other causal ordering
protocols such as Birman and Joseph, 1987; Raynal et al., 1991; Skawratananond et
al., 1998. This is because their dual storage requires only the minimum amount of
data to be added to each message (as defined in Kshemkalyani and Singhal, 1995),
and allows clear and optimal requirements for persisting causal history in message
logs. Whilst this represents an efficient implementation of causal ordering, causal
ordering itself is not efficient per se, as it creates sometimes unnecessary delays in
processing, as will be shown in section 3.3.3. It is also inflexible, as the ordering is
based purely on the time each message is sent.
It is the responsibility of the receiving process to consume messages in the correct
causal order using this data, hence making this an invasive protocol which is
unsuitable for EAI.
The approach of adding data to the transmitted message fits well with the EAI
feature of not relying on point-to-point communications between applications. All
causal information is contained within the message itself, without reference to any
specific application in the system. The concept of a message log for persisting causal
data also fits well with a hub and spoke architecture, whereby such a message log
could be centralised within the hub to maintain consistent causal information for all
integrated systems.
Unfortunately, the protocol does not address resilience, instead assuming that a
reliable communication method exists by which to transmit messages.
Summary
Table 2 summarises the suitability of Kshemkalyani and Singhal's protocol for use in
EAI. A tick (✓) denotes that the protocol meets a criterion, and a cross (✗) denotes
that it does not.
Criterion                   Met
Non-Invasive                ✗
Extensible                  ✓
Asynchronous                ✓
Hub and Spoke Compatible    ✓
Resilient                   ✗
Efficient                   ✗
Flexible                    ✗
Table 2: Suitability of Kshemkalyani and Singhal's ordering protocol for EAI
3.3.3. Actual Causal Ordering
Cheng et al. (1995) developed the idea of causal ordering further, by noting that
causal ordering based on Lamport's happens before relation describes only the potential
causal ordering. Their observation is that although the happens before relation causally
relates two messages, there may be no actual relationship between the two messages.
This could lead to an unnecessary delay in processing the second message while the
system waits for its potentially, but not actually, causally related predecessor.
They propose an ordering approach called actual causal ordering, whereby an
application explicitly and programmatically specifies the causal relationships between
the messages that it sends. This approach ensures that messages are not delayed
unnecessarily when there is no causal relationship between the delivered and the
undelivered messages.
Protocol Evaluation
The protocol for actual causal ordering described by Cheng et al. (1995) is very
similar to the approach to potential causal ordering described in the previous section
- the protocol adds data to an outgoing message to identify the causally preceding
messages. The difference is in the definition of causally preceding.
In Cheng et al.'s approach, the application developer specifies the causal order of sent
messages using programming constructs, making it an invasive protocol. This
ensures the optimum throughput - or more accurately the minimum latency - by
ensuring that messages do not incur unnecessary delays. The protocol is therefore
flexible and adapts to the particular application, leading to efficient delivery of
messages by reducing unnecessary delays.
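The receiving side of such a scheme can be sketched as a hold-back buffer. This is an illustration of the general idea only, not Cheng et al.'s protocol: the class name, the use of message identifiers and the way predecessors are passed in are all assumptions made for the example.

```python
class ActualCausalReceiver:
    """Delivers a message only after every message it names as a
    causal predecessor has been delivered."""
    def __init__(self):
        self.delivered = []   # message ids in delivery order
        self.pending = []     # (message id, predecessors) held back

    def receive(self, msg_id, predecessors=()):
        self.pending.append((msg_id, frozenset(predecessors)))
        self._deliver_ready()

    def _deliver_ready(self):
        # Keep scanning until a full pass releases nothing new.
        progress = True
        while progress:
            progress = False
            for entry in list(self.pending):
                msg_id, preds = entry
                if preds <= set(self.delivered):
                    self.pending.remove(entry)
                    self.delivered.append(msg_id)
                    progress = True
```

A message that names no predecessors is delivered immediately, even if messages sent earlier are still in transit; only messages with an explicitly declared dependency are made to wait.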
Maintaining the full causal history on each message allows the protocol to deliver
causally related messages to different receiving processes, ensuring that the protocol
itself is independent of the destination processes and hence extensible.
This protocol also provides a degree of architectural detail not found in previously
described work. It uses protocol servers to implement the actual causal ordering,
accessed via a programmatic interface. Each protocol server has two parts: a sender
and a receiver, which as the names suggest are responsible for transmitting a message
to a destination, and delivering a received message respectively.
The decoupling of the sender and the receiver is a good abstraction for supporting
asynchronicity, which is critical for EAI. By positioning senders and receivers within
the integration hub, the protocol could be adapted to a hub and spoke approach.
Summary
Table 3 summarises the suitability of Cheng et al.'s protocol for use in EAI. A tick (✓)
denotes that the protocol meets a criterion, and a cross (✗) denotes that it does not.
Criterion                   Met
Non-Invasive                ✗
Extensible                  ✓
Asynchronous                ✓
Hub and Spoke Compatible    ✓
Resilient                   ✓
Efficient                   ✓
Flexible                    ✓
Table 3: Suitability of Cheng et al.'s ordering protocol for EAI
3.3.4. Application Ordering
Noting that causal ordering was effectively a weakening of constraints between
messages compared to total ordering, Singh and Badarpura (2001) propose an
approach whereby message ordering requirements are instead strengthened, based
upon application-specific constraints.
An example of where this is useful, paraphrased from their paper, considers a
distributed teaching/student application in which students send questions to an
instructor, who then replies. The application may wish to enforce that an instructor's
reply must be delivered to all students before the next student question is displayed
to the students, hence ensuring a correct pairing of question and answer for all
present. Causal and total ordering would not allow this, because the second student
question may well precede the sending of the answer.
In the example above, the communication layer of the distributed application knows
in advance that a question will generate an answer, allowing it to predetermine a
sequence number for the reply. The authors show that an "ordering specification"
can be used to encode application-specific knowledge, causing a reduction in both
the time taken to allocate and determine ordering information, and the complexity of
the synchronisation logic needed by the application.
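The question and answer example can be sketched as a sequencer that reserves a slot for the reply at the moment the question is sequenced. This is a much simplified illustration and not Singh and Badarpura's ordering specification: the class and method names are invented, and every question is assumed to generate exactly one answer.

```python
class PreallocatingSequencer:
    """Assigns sequence numbers, reserving the next slot for the
    answer that a question is known to generate."""
    def __init__(self):
        self.next_seq = 1
        self.reserved = {}   # question id -> sequence number held for its answer

    def sequence_question(self, question_id):
        seq = self.next_seq
        self.reserved[question_id] = seq + 1   # slot kept for the answer
        self.next_seq = seq + 2
        return seq

    def sequence_answer(self, question_id):
        # The number was decided in advance; no negotiation is needed.
        return self.reserved.pop(question_id)
```

If a second question is sequenced before the first answer has been sent, the answer still receives the earlier number, so delivery in sequence order shows each answer directly after its question.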
Protocol Evaluation
Singh and Badarpura's application ordering protocol "pre-allocates" spaces in a
sequence for future messages that the application knows will be generated - for
example an answer message in response to a question message. In doing so, they
claim efficiency savings when compared to (potential) causal ordering. The actual
causal ordering approach, which adopts a similar principle, is not considered in their
paper. This invasive protocol supports asynchronicity and a hub and spoke
architecture as it does not rely on point-to-point communication between sending
and receiving applications.
Although the sequencing produced is efficient for the particular application, the
sequence is tied to the message flow of a specific set of applications, making it
inflexible. The participating applications could readily change in an integrated
environment, as new systems are added, meaning that it would not meet the
extensibility criterion I have identified.
However, the concept of interpreting application data and deducing causal ordering
from this is one that merits further investigation, as the nature of XML means that
application data can be readily accessed and understood. This is examined further in
sections 4.3.2 and 4.4.
Summary
Table 4 summarises the suitability of Singh and Badarpura's protocol for use in EAI.
A tick (✓) denotes that the protocol meets a criterion, and a cross (✗) denotes that it
does not.
Criterion                   Met
Non-Invasive                ✗
Extensible                  ✗
Asynchronous                ✓
Hub and Spoke Compatible    ✓
Resilient                   Not specified in paper
Efficient                   ✓
Flexible                    ✗
3.4. Summary
Table 5 summarises how each of the protocols compares against the EAI-suitability
criteria identified in section 2.5.
Criterion                   Total      Causal     Actual causal   Application
Non-Invasive                ✗          ✗          ✗               ✗
Extensible                  ✓          ✓          ✓               ✗
Asynchronous                ✗          ✓          ✓               ✓
Hub and Spoke Compatible    ✗          ✓          ✓               ✓
Resilient                   ✓          ✗          ✓               Not specified in paper
Efficient                   ✗          ✗          ✓               ✓
Flexible                    ✗          ✗          ✓               ✗
Table 5: Suitability of message ordering approaches for EAI
Key:
✗ = Feature not met
✓ = Feature met
3.5. Conclusion
It is clear from the above that no single protocol or approach meets the criteria for
EAI.
The major problem area in all protocols is the use of an application-invasive
approach and the lack of flexibility that this implies. As already discussed, these are
highly undesirable features in EAI.
In the next Chapter, the best aspects of the above protocols will be combined to
define an EAI-specific message ordering protocol.
Chapter 4
INFERRED CAUSAL ORDERING: A PROTOCOL FOR EAI
4.1. Introduction
The previous chapter summarised the strengths and weaknesses of a number of
existing message ordering protocols against the features of EAI that affect message
ordering.
This chapter proposes an EAI-centric protocol for message ordering, and then
assesses it against the evaluation criteria used in Chapter 3.
4.2. Guaranteeing Causal Ordering
Cheng et al. (1995) identify three aspects which are necessary to guarantee causal
ordering:
• Obtaining causality from application programs
• Representing and conveying causality
• Preserving causality (that is, ensuring that the causality requirements are
obeyed)
To these, I will add one more point that is required to ensure non-invasiveness to the
receiving application:
• Removal of causality information from delivered messages
In other words, a message that has been modified to "represent and convey
causality" must have this information removed before it is delivered to the target
application. If this does not happen, the message could be rejected as invalid for
containing unexpected data.
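A minimal sketch of this add-then-remove step is given below. How causality is actually represented on a message is not fixed at this point in the text, so the `Causality` element and its `sequence` attribute are purely hypothetical names chosen for illustration.

```python
import xml.etree.ElementTree as ET

CAUSALITY_TAG = "Causality"   # hypothetical element name, assumed here

def add_causality(xml_text, sequence):
    """Attach ordering information to a message inside the hub."""
    root = ET.fromstring(xml_text)
    info = ET.SubElement(root, CAUSALITY_TAG)
    info.set("sequence", str(sequence))
    return ET.tostring(root, encoding="unicode")

def strip_causality(xml_text):
    """Remove the ordering information before delivery, so that the
    target application sees only the data it expects."""
    root = ET.fromstring(xml_text)
    for info in root.findall(CAUSALITY_TAG):
        root.remove(info)
    return ET.tostring(root, encoding="unicode")
```

The target application receives exactly the message that the sending application produced, and needs no knowledge of the ordering protocol.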
4.3. Obtaining Causality from Application Programs
4.3.1. Basic Ordering Information
One of the key issues for any protocol is how to obtain the correct ordering
semantics. As shown in the previous chapter, traditional protocols require that the
sending application explicitly specifies ordering information. To avoid this
application-invasive approach, responsibility for adding these semantics must be
deferred to the integration hub.
A simple sequence number could be applied to each message as it arrives at the
integration hub. Provided that this order is enforced when delivering messages, the
system would be operating with total ordering. Although this is a valid approach, a
more sophisticated approach is made possible by considering the nature of XML
messages.
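The simple sequence number approach can be sketched as follows. The class and method names are assumptions made for the example; the sketch shows only the stamping on arrival and the enforcement of arrival order on delivery, and ignores persistence and failure handling.

```python
import itertools

class SequencingHub:
    def __init__(self):
        self._counter = itertools.count(1)
        self._next_to_deliver = 1
        self._finished = {}    # sequence number -> processed message awaiting its turn
        self.delivered = []

    def arrive(self, message):
        """Stamp the message with its arrival sequence number."""
        return next(self._counter), message

    def finish(self, seq, message):
        """Called when processing of a message completes, in any order."""
        self._finished[seq] = message
        # Release messages strictly in arrival order.
        while self._next_to_deliver in self._finished:
            self.delivered.append(self._finished.pop(self._next_to_deliver))
            self._next_to_deliver += 1
```

A message whose processing finishes early is held until every earlier arrival has been delivered, which is the cost of total ordering that a more selective approach tries to avoid.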
4.3.2. XML
An XML document contains both data and structure. Tags delimit data items in a
form that follows a number of simple rules. A basic XML document showing details
about a person is shown in Figure 10.
<Person>
  <Forename>Duncan</Forename>
  <Surname>Millard</Surname>
  <Title>Mr</Title>
</Person>
Figure 10: A simple XML document
As Figure 10 shows, XML is a very easy format to interpret. The rules governing its
syntax are relatively straightforward, making it simple to implement parsers to create
and consume XML data.
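As an illustration of how readily such data can be consumed, the document in Figure 10 can be read with the XML parser in the Python standard library. The choice of Python and of `xml.etree.ElementTree` is made for this example only.

```python
import xml.etree.ElementTree as ET

document = """<Person>
  <Forename>Duncan</Forename>
  <Surname>Millard</Surname>
  <Title>Mr</Title>
</Person>"""

# Parse the text into a tree and read data items by tag name.
person = ET.fromstring(document)
title = person.findtext("Title")
forename = person.findtext("Forename")
surname = person.findtext("Surname")
full_name = f"{title} {forename} {surname}"
```

Both the structure (the element names) and the data (the text they enclose) are available to any program that receives the message.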
Sections 2.4.2 and 2.4.3 described how two of the main tasks of an integration engine
are to transform the format of messages, and to execute business processes based on
that data. This implies that the format of messages passed to the hub must be well
known, and that the integration hub is empowered to make decisions based on the
implicit meaning in the data that it receives.
Section 3.3.4 described the use of application data to predict future message ordering
in the approach called application ordering. I propose taking this idea and combining
it with the transparent nature of XML to infer the causal relationships between
received messages.
4.4. Inferred Causal Ordering
Consider Figure 11, which shows three messages sent in succession by an HR
application. Traditional potential causal ordering mandates that these messages must
be processed in the order message 1, then 2, then 3. By inspecting the data, however,
it can be assumed that message 2 is independent of messages 1 and 3.
Message 1:
<CreateEmployee>
<Number>12345</Number>
<Name>Millard</Name>
</CreateEmployee>
Message 2:
<CreateEmployee>
<Number>12346</Number>
<Name>Adams</Name>
</CreateEmployee>
Message 3:
<SetPayLevel>
<EmployeeID>12345</EmployeeID>
<PayBand>C</PayBand>
</SetPayLevel>
Figure 11: XML messages
This assumption has been deduced from the “Number” element of the
“CreateEmployee” message, and the “EmployeeID” element of the “SetPayLevel”
message, as well as from the order in which messages were sent from the HR
application. Making decisions based on the content and structure of an incoming
message is an integral part of the operation of the message hub (as discussed in
section 2.4.3); this approach to obtaining causal information therefore fits well with a
hub and spoke integration architecture.
In summary, there are two sources of ordering information that can be combined:
• Data - to determine the grouping of a message with other messages
• Sending order - to determine the order of messages within a group
Obtaining data from these sources requires no special interaction with the integrated
application - they are simply extracted when a message arrives at the integration hub.
I call this approach inferred causal ordering.
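As an illustration only, the inference step for the messages in Figure 11 might be
sketched in Python as follows. The mapping GROUP_KEY_ELEMENT and the function
infer_group are hypothetical names introduced for this sketch; a real integration
hub would hold equivalent rules in its own configuration.

```python
import xml.etree.ElementTree as ET

# Hypothetical configuration: for each message type (root element name),
# the child element whose value identifies the causal message group.
GROUP_KEY_ELEMENT = {
    "CreateEmployee": "Number",
    "SetPayLevel": "EmployeeID",
}

def infer_group(xml_text):
    """Return the causal group identifier inferred from a message's data."""
    root = ET.fromstring(xml_text)
    key_element = GROUP_KEY_ELEMENT[root.tag]
    return "Employee" + root.findtext(key_element).strip()

# The three messages of Figure 11, in sending order.
messages = [
    "<CreateEmployee><Number>12345</Number><Name>Millard</Name></CreateEmployee>",
    "<CreateEmployee><Number>12346</Number><Name>Adams</Name></CreateEmployee>",
    "<SetPayLevel><EmployeeID>12345</EmployeeID><PayBand>C</PayBand></SetPayLevel>",
]
groups = [infer_group(m) for m in messages]
```

Messages 1 and 3 fall into the same group and keep their sending order, while
message 2 falls into a group of its own and can be processed independently.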
4.4.1. Causal Message Groups
Inferring relationships in this way creates what I term causal message groups. Each
message within a causal message group is causally dependent on the earlier messages
in that group as defined by the sequence numbers within that group.
Each causal message group is assigned an appropriate identifier constructed from the
content of the message - for example from Figure 11 two causal groups could be
created with group identifiers of Employee12345 and Employee12346.
The identification of causal message groups is entirely dependent upon the data
received, hence allowing the creation of different groups depending on the specific
ordering needs of the application being integrated.
4.4.2. Cross-Group Dependencies
So far I have assumed that each causal group is fully independent - for example one
causal group relating to employee 12345 and one relating to employee 98765. Given
this causal grouping, consider a message that relates to both employee 12345 and
employee 98765, such as that shown in Figure 12. It is not immediately clear to
which causal group the message should belong - effectively it could belong to either.
Interpreting the message, the inferred information is that the message depends on
the “CreateEmployee” messages for both employee 12345 and employee 98765.
Message 4:
<SetManager>
<ManagedEmployeeID>12345</ManagedEmployeeID>
<Manager>98765</Manager>
</SetManager>
Figure 12: A cross-group dependency
In order to model this, the message is assigned to one of the causal groups, and a
cross-group dependency referencing the CreateEmployee message in the other group is
added.
In this particular example, the business rules and protocol configuration allow the
deduction that the CreateEmployee message is always the first message in a causal
group. The message can therefore be added to the causal group Employee12345 and
a dependency added to message number 0 in the group Employee98765.
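A minimal sketch of this placement rule, assuming (as in the example above) that
CreateEmployee is always message number 0 of its group, could look as follows. The
function name place_set_manager and the tuple representation of a dependency are
illustrative choices, not part of the protocol definition.

```python
def place_set_manager(managed_id, manager_id):
    """Choose a home group for a SetManager message and list its
    cross-group dependencies as (group identifier, sequence number) pairs."""
    home_group = f"Employee{managed_id}"
    # Assumption taken from the example: the CreateEmployee message is
    # always the first message (number 0) in its causal group.
    dependencies = [(f"Employee{manager_id}", 0)]
    return home_group, dependencies

placement = place_set_manager("12345", "98765")
```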
4.5. Representing and Conveying Causality
The previous section identified two pieces of causal information that must be
persisted:
• The causal group that a message belongs to
• The sequence number of a message within that group
Causal ordering protocols typically have two places that ordering information is
stored (Chandra et al., 2004):
• A dynamic causal message log
• Control data added to each message
4.5.1. Causal Log
In order to apply causal information to incoming messages, the protocol needs a
centralised “causal log”, with an entry for each causal message group. Each entry will
store the identifier of the group and the next sequence number to apply to a new
message belonging to that group. As messages are received, the sequence number is
increased.
The use of a message log in this way is similar to that of Kshemkalyani and Singhal
(1996), as described in section 3.3.2, but is centrally located rather than
distributed at each remote process.
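The behaviour described above can be sketched as a small in-memory structure. This
is a simplified illustration: the class name CausalLog and its assign method are
assumptions made for the sketch, and numbering from 0 follows the example in
section 4.4.2. Persistence is not shown.

```python
class CausalLog:
    """Central log with one entry per causal message group."""

    def __init__(self):
        # group identifier -> next sequence number to hand out
        self._next_sequence = {}

    def assign(self, group_id):
        """Return the sequence number for a new message in the group and
        advance the stored value for the next message."""
        sequence = self._next_sequence.get(group_id, 0)
        self._next_sequence[group_id] = sequence + 1
        return sequence

log = CausalLog()
assigned = [
    log.assign("Employee12345"),  # CreateEmployee for 12345
    log.assign("Employee12346"),  # CreateEmployee for 12346
    log.assign("Employee12345"),  # SetPayLevel for 12345
]
```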
4.5.2. Message Annotation
In addition to maintaining the causal log, it is necessary to append causal information
to each message so that the causal identity of every message is known. This
annotation will introduce an overhead on every message that flows through the
system.
The information that must be annotated to every message is:
• Causal message group identifier
• Causal message group sequence number
• Dependencies on messages from other causal groups
In order to ensure that the protocol is non-invasive, it must also be possible to
remove the causal information prior to delivery to the destination system. There is a
general technique of temporarily annotating an XML message, known as enveloping
(Hohpe and Woolf, 2004), illustrated in Figure 13 below.
An un-enveloped message:
<Message>
<Fieldl>Data</Fieldl>
</Message>
The same message, enveloped:
<Envelope>
<AddedData>Values</AddedData>
<Message>
<Fieldl>Data</Fieldl>
</Message>
</Envelope>
Figure 13: Message enveloping
An implementation of the protocol can make use of this technique within the
boundaries of the integration hub to ensure that from the point of view of the
sending and receiving applications the messages remain unchanged.
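As a hedged illustration of enveloping, the following Python sketch wraps a message
with causal information and later recovers the original. The element names
CausalInfo, Group and Sequence are assumptions made for this sketch; the source
does not prescribe an envelope layout beyond that of Figure 13.

```python
import xml.etree.ElementTree as ET

def wrap(message_xml, group_id, sequence):
    """Return the message enclosed in an envelope carrying causal data."""
    root = ET.Element("Envelope")
    causal = ET.SubElement(root, "CausalInfo")  # hypothetical element name
    ET.SubElement(causal, "Group").text = group_id
    ET.SubElement(causal, "Sequence").text = str(sequence)
    root.append(ET.fromstring(message_xml))
    return ET.tostring(root, encoding="unicode")

def unwrap(envelope_xml):
    """Return the original message with the causal annotation removed."""
    root = ET.fromstring(envelope_xml)
    # The payload is the only child that is not causal metadata.
    payload = [child for child in root if child.tag != "CausalInfo"][0]
    return ET.tostring(payload, encoding="unicode")

original = "<Message><Field1>Data</Field1></Message>"
wrapped = wrap(original, "Employee12345", 0)
restored = unwrap(wrapped)
```

Because the envelope is added and removed inside the hub, neither the sending nor
the receiving application sees the annotation.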
4.6. Preserving Causality
“Preserving causality” means ensuring messages are processed by their destination in
the correct order. Rephrased, a way is required of delaying message delivery to a
destination until such time as all its causally preceding messages have been
processed.
4.6.1. Remote Message Stores
Kshemkalyani and Singhal (1996) used a message store at each remote process. The
centralised nature of a hub and spoke architecture allows this mechanism to be
centrally implemented, removing the need for multiple remote stores.
4.6.2. Dynamic Message Log
The protocol will use a dynamic message log to store messages that have been
processed by the integration engine, but whose causal predecessors have not yet
been delivered to the destination application. In a similar approach to the causal log
described in section 4.5.1, the dynamic log will need to track the next sequence
number to be delivered for each causal group to each destination. Once the next
message for a particular causal message group and destination appears in the
dynamic log as identified by the causal information annotated to that message, the
message can be delivered to the destination. In this way, causality is successfully
preserved.
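The hold-and-release behaviour can be sketched as below. This is a simplified
illustration under stated assumptions: the class DynamicLog and its deliver
callback are hypothetical names, sequence numbers start at 0, and cross-group
dependencies and unavailable destinations are deliberately left out.

```python
from collections import defaultdict

class DynamicLog:
    """Holds messages until all earlier messages of the same causal
    group have been delivered to the same destination."""

    def __init__(self, deliver):
        self._deliver = deliver          # callback(destination, payload)
        self._held = defaultdict(dict)   # (group, destination) -> {seq: payload}
        self._next = defaultdict(int)    # (group, destination) -> next seq due

    def add(self, group, sequence, destination, payload):
        key = (group, destination)
        self._held[key][sequence] = payload
        # Release every message that is now next in line for this key.
        while self._next[key] in self._held[key]:
            self._deliver(destination, self._held[key].pop(self._next[key]))
            self._next[key] += 1

delivered = []
dynamic_log = DynamicLog(lambda dest, payload: delivered.append((dest, payload)))
dynamic_log.add("Employee12345", 1, "Payroll", "SetPayLevel")      # arrives early
held_back = list(delivered)                                         # nothing yet
dynamic_log.add("Employee12345", 0, "Payroll", "CreateEmployee")   # unblocks both
```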
4.7. Removing Causality Information
As discussed in section 4.5.2, an envelope can be easily added to and removed from
an XML message. This message annotation is only required for placing the message
in the dynamic message log. Once the message is ready to be delivered to a
destination application, the causality annotation can be removed.
4.8. Evaluation
4.8.1. Evaluation against EAI Features
The proposed protocol offers a non-invasive way of obtaining ordering information
from received messages. The use of enveloping ensures that this causal information
can be appended to messages without affecting the source or destination systems.
By centralising the message logs, hub and spoke compatibility is ensured and
resilience is increased. If a remote application is unavailable, for example due to
network problems, the message can be held in the log until such time as delivery is
possible.
One negative impact on resilience is that the dynamic message log represents a single
point of failure for the system, a problem common to any centralised system. It is
worth noting that an integration hub is, by definition, a centralised system. As such,
the integration engine and the hub as a whole are subject to the same problem. One
way to mitigate this is to ensure that any system implementing the protocol shares
the same server(s) as the integration engine. This minimises the chance that the
protocol system would be unavailable whilst the integration engine was still running
and available: any external system failure would affect the integration hub as a whole
and not just the protocol's message logs.
The dynamic message log also represents a clear decoupling of the sending of a
message to the hub, and its receipt and processing by a remote system, ensuring that
asynchronous processing is possible.
Finally, the protocol offers a flexible and efficient approach by inferring ordering
information from the data that it receives, rather than mandating an order purely on
the basis of the timing of the sending of messages.
Table 6 below summarises how each of the EAI features is met by the proposed
protocol.
EAI criteria | Met | How this is achieved
Non-Invasive | Yes | Ordering information is obtained from message data, not from the sending application.
Extensible | Yes | Protocol makes no implicit or explicit assumptions about the source or destinations for a message. It purely operates on the data received.
Asynchronous | Yes | A dynamic message log, centrally located, decouples the sending and receiving of messages and allows asynchronous operation.
Hub and Spoke Compatible | Yes | The protocol is designed to operate in a centralised environment, matching the hub and spoke architecture.
Resilient | Yes | The dynamic message log offers a way to delay delivery of messages in the event of a communication failure. The tracking of delivery information separately for each destination means that no one destination depends on any other.
Efficient | Yes | Causal message groups allow causally unrelated messages to be processed independently, instead of having their order constrained unnecessarily.
Flexible | Yes | The protocol creates causal message groups based on the data in a message, and as such allows the implementation of different ordering restrictions depending on the needs of the applications being integrated.
Table 6: Summary of an inferred causal ordering protocol for EAI
4.8.2. Theoretical Efficiency of the Protocol
Introduction
The theoretical efficiency of an algorithm consists of determining mathematically the
quantity of resources (execution time, memory space, etc.) needed by an algorithm as
a function of the size of the input instances (Brassard and Bratley, 1988). For
example, if the quantity of resources scales linearly as the input size increases, the
algorithm is said to be “in the order of O(n)”, where n is the input size. An important
point to note is that the theoretical efficiency does not measure the actual
performance of an algorithm; it instead measures its ability to scale for larger inputs.
As such, constant offsets are ignored - an algorithm requiring 20+n units of
resource is still in the order of O(n), hence explaining the lack of units on the Y axis
of Figure 14.
[Plot omitted; x-axis: Input Variable Size]
Figure 14: An algorithm exhibiting O(n) efficiency
Similarly, if an input of size 2 requires four times as much resource as an input of
size 1, and an input of size 4 requires 16 times as much resource as an input of size 1,
the algorithm is said to be in the order of O(n²), as illustrated in Figure 15.
[Plot omitted; x-axis: Input Variable Size]
Figure 15: An algorithm exhibiting O(n²) efficiency
Finally, if the resource required is independent of the variable size, the algorithm is
said to be “in the order of O(1)”, as illustrated in Figure 16.
[Plot omitted; x-axis: Input Variable Size]
Figure 16: An algorithm exhibiting O(1) efficiency
The theoretical efficiency of the protocol can be predicted by examining the
following forms of space overhead:
• Individual Message Overhead
• Causal Log Size
• Dynamic Log Size
Message Overhead
The metadata added to a message is as described in section 4.5.2. A message is
guaranteed to be assigned:
• A causal message group identifier
• A causal message group sequence number
This gives a constant overhead per message, and eliminates the need to attach a full
causal history to every message. Hence, with respect to other applications in the
system, the algorithm is in the order of O(1).
The only other data added to a message occurs if a message depends on one or more
messages from one or more other causal groups. In this case, only a dependency to
the most recent message from each group is required. If there is no dependency, no
additional data is added to the message.
Hence, the best case overhead is in the order of O(1) and the worst case overhead is
in the order of O(n), where n is the number of groups containing a message on
which a message depends.
Causal Log Overhead
The causal log needs to maintain a permanent record of the current sequence
number of each causal group for each destination. Hence the size of the causal log
is proportional to the product of the number of causal message groups and the
number of destinations - the overhead is in the order of O(n.m), where n is the
number of message groups and m is the number of destinations.
Dynamic Log Overhead
In addition, in order to support both resilience and ordering, the messages
themselves must be stored until they have been processed by the remote application.
The dynamic log size is therefore dependent upon the rate of receipt of messages
destined for an application and the rate at which messages are consumed by that
application. Once processed, a message can be discarded, meaning that the dynamic
log overhead will vary with time.
4.9. Summary
This chapter proposed a new message ordering protocol suitable for use in
Enterprise Application Integration. It has the unique feature of inferring ordering
semantics from the messages themselves rather than relying on the participating
applications to support message ordering.
The use of a dynamic message log and a causal log will allow central control of
ordering, and messages will have ordering information added and removed as they
enter and leave the integration hub.
An evaluation of the theoretical efficiency of the protocol gave a prediction of the
performance of the protocol that is suitable for experimental validation.
Chapter 5
PROTOCOL EVALUATION
5.1. Introduction
The previous chapter presented a protocol for message ordering in EAI and
performed a theoretical evaluation of its efficiency. This chapter defines the
approach for measuring the performance characteristics of the protocol when
implemented in a simulation environment.
5.2. Evaluation Approach
Theoretical analysis of the protocol in section 4.8.2 allowed the prediction of a
number of performance characteristics of the algorithm. In order to test these
hypotheses, and to prove that the protocol does ensure ordered message delivery, it
is necessary to run the protocol in a simulation environment.
To ensure that meaningful results are obtained, I will first identify measures that will
allow the system’s performance to be modelled, and then identify the parameters
required by the simulation to exercise these measures.
5.3. Performance Measures
In order to measure the actual performance and the suitability of the protocol to
EAI, it is necessary to identify measures to assess the protocol in the context of the
EAI features identified in section 2.5. It is also necessary to measure the message
overhead, the causal log overhead, and the dynamic log overhead.
5.3.1. Test Measures: Theoretical Efficiency
Message Overhead
The message overhead is the amount of metadata added to each message, measured
in bytes or kilobytes. This will show the variance of the overhead in different
scenarios, and should confirm that the overhead is in the order O(1) for causal
information and O(n) for causal dependencies.
Causal and Dynamic Log Overheads
The number of message groups recorded and the number of messages held in the
dynamic log will be measured in each of the test cases. This will show the variance of
these overheads in different scenarios, and should confirm that they are in the order
O(n.m) for causal information, and variable for the dynamic log depending on
factors such as application availability and backlog.
5.3.2. Test Measures: EAI Features
Resilience
The resilience of the protocol is its ability to cope with the unavailability of a remote
system. This is measurable by running a series of messages destined for two
applications with both applications available, and then repeating this test when one
of the applications is unavailable. The impact on the dynamic log overhead and the
time taken to process all messages for the application will be measured.
5.3.3. Test Measures: Non-Quantitative Testing
Flexibility, Efficiency
It is not possible to measure the flexibility of the protocol quantitatively. Instead, the
data used in the test cases will be set up to contain a value representing a “logical
stream number”. The efficiency and flexibility of the protocol will be implicitly
shown by the creation of multiple causal message groups based on the value
contained in this field. Test cases will be run to show the impact of splitting
messages into different causal groups in this way to quantify the benefit this
approach brings.
Extensibility
Similarly, it is not possible to explicitly measure the extensibility of the system.
Instead, the use of an arbitrary message format and different numbers of destination
systems will show that the protocol operates successfully independently of the
applications operating in the integrated system.
Non-Invasive
The sending and receiving test applications will not have any knowledge of causal
ordering; the sending application will simply transmit test messages to the integration
hub. The messages stored in the causal log ready for delivery will be checked to
verify that they do not contain any causal ordering information. These two factors
will prove that the protocol is non-invasive.
Asynchronicity and Hub and Spoke Compatibility
The simulation system’s test cases will use an asynchronous hub and spoke
architecture for processing. The success of the test cases will show that the protocol
is hub and spoke compatible and operates correctly under asynchronous conditions.
There are no widely accepted or standard benchmarking programs for causal
ordering (Chandra et al., 2004). The best alternative is to run a simulation EAI system
with a known and variable range of characteristics, and use this to model the
performance of the protocol.
Intuitively, there are three components to a message ordering EAI simulation
system:
• The sending application(s)
• The integration engine
• The receiving application(s)
In order to simulate real-world situations and assess the impact of different types of
system on the protocol's performance, it is necessary to introduce a number of
variables to the simulation.
5.4.1. System Variables
The Sending Application
• The Number of Sending Applications: As the number of sending
applications increases, there is an expected impact on message log size.
• Time Between Sends: The discussion in section 4.8.2 hypothesises that the
more frequently an application sends messages, the greater the load on the
message log over time as messages form a backlog, given a constant receiver
rate. This variable allows the testing of that hypothesis.
• Potential Message Concurrency: Any message sent by an application may
or may not be causally related to a predecessor. This variable models the
likelihood that a message is causally independent of other messages. A value
of 100 means that every message will be independent (that is, causally
unrelated to any other message), and a value of 0 means that no message
(other than the first message sent) is independent.
• Potential Cross-Group Dependency: Application-specific semantics may
mean that a message in a causal group depends on a message from another
causal group. This models the percentage likelihood of a cross-group
dependency being added to a message.
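To make the meaning of the Potential Message Concurrency variable concrete, the
following sketch shows one way a simulated sender could choose a logical stream
for each message. It is an assumption about how such a variable might be applied,
not a description of the simulation system in Appendix B; the function name
assign_streams and the seed parameter are hypothetical.

```python
import random

def assign_streams(message_count, concurrency_percent, seed=0):
    """Return a logical stream number for each message sent.

    A message is treated as causally independent (given a new stream)
    with probability concurrency_percent / 100; otherwise it joins the
    stream of its predecessor."""
    rng = random.Random(seed)
    streams = [0] if message_count else []
    next_new_stream = 1
    for _ in range(1, message_count):
        if rng.random() * 100 < concurrency_percent:
            streams.append(next_new_stream)   # independent message
            next_new_stream += 1
        else:
            streams.append(streams[-1])       # related to predecessor
    return streams

all_related = assign_streams(5, 0)
all_independent = assign_streams(5, 100)
```

At the two extremes the outcome does not depend on the random seed: a value of 0
places every message in one stream, and a value of 100 gives each message its own.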
The EAI Hub
• Duration of a Business Process: One of the main reasons for message
ordering problems in EAI is the differing duration of the business processes
triggered by each incoming message. This variable represents the probability
that a random delay will be added to an incoming message. A zero value
implies that no messages will be delayed - in other words all follow the same
processing rule - whilst a value of 100 implies that every message is subject to
different delays. This allows the determination of the protocol’s sensitivity to
variable processing times.
• Transmission Delays: In addition to business process duration, a
transmission delay can occur in the system. The impact of this delay on
protocol performance can be measured by introducing a fixed delay on all
communications and measuring the impact on message log and system
throughput.
• Number of Destinations for a Message: The EAI hub is responsible for
processing incoming messages and deciding to which of the potential
destinations to send the message. The upper bound of this variable is the
total number of applications in the system and the lower bound is zero. This
allows assessment of the “multicast sensitivity” (Chandra et al., 2004) of the
algorithm.
The Receiving Application
• The Number of Receiving Applications: This variable is required as a
counterpart to the “Number of destinations” variable of the EAI hub.
• Enabled: In order to test resilience it must be possible to “switch off” one
or more destinations so that they do not respond to message delivery
attempts.
• Message Receipt Frequency: Intuitively, the frequency with which
receiving applications consume messages has a direct impact on the
performance of the algorithm. This variable allows the testing of that impact.
5.5. Test Cases
Based on the variables in section 5.4 and the performance measures in section 5.3, I
created a number of test cases to exercise each set of variables. The full list of test
cases and steps is shown in Appendix A, and summarised in Table 7 below.
Criteria | Measurements | Measured by Test Case
Efficiency | Creation of message groups based on a “logical stream” value | Potential efficiency and impact of concurrent groups
Resilient | Time taken for end to end processing of non-failed application; Message Log Overhead | Resilience
Flexible, Non-Invasive | Success at processing for an arbitrary message format and number of senders and destinations, without affecting applications | Impact of number of senders on logs, impact of number of destinations on logs (implicit tests of suitability)
Extensible, Asynchronous | Repeated test cases run without failures with different system configurations | All, particularly resilience, impact of number of senders, and number of destinations
Message Overhead | Variance of actual size of the message over time during each test scenario | All, particularly impact of cross-group dependency
Message Log Overhead | Number of causal message groups tracked over time; number of messages held in the buffer pending delivery over time | All test cases
Table 7: Test measures
5.6. Summary
Chapter 4 proposed a data-driven, flexible protocol for message ordering in EAI,
and made predictions about the performance of that protocol under particular
circumstances. This chapter has identified the aspects of the protocol that must be
tested in order to verify and understand its performance in response to varying input
conditions.
Chapter 6
TEST RESULTS
6.1. Introduction
The previous chapter identified the variables that are necessary to test the protocol.
This chapter briefly describes the testing methodology used to exercise those
variables, before presenting the results of the tests.
The chapter closes with a summary of the performance characteristics of the
protocol.
6.2. Methodology
In order to give coverage of every variable, nine test cases were identified. Each test
case consisted of a number of steps; in each step the value of a variable was changed.
The full list of test cases and steps can be found in Appendix A.
Each simulated sending application generated an XML message containing the test
name, a unique sender number, a test iteration number, and a “logical stream”
number to simulate causally independent messages. The protocol was configured to
construct causal message group names by combining these elements to give causal
message group identifiers similar to:
“Test-NumberOfMessages100_Iteration_2_From_0_Stream_0”.
The simulation system is described in detail in Appendix B.
For each step, the system was cleared of data from the previous run, a “warm up”
test case was executed to negate any initialisation delays, and then a fixed number of
messages were passed through the system and the results recorded. Each step was
executed three times to ensure accurate results. Some variance was observed, which
is discussed in section 6.5.2.
In total approximately 150 step executions were carried out, generating in excess of
25,000 rows of data.
6.3. Message Ordering
The first, and most important, aspect of the protocol is that it must ensure all
messages are delivered in order to all destinations. Although the asynchronous
nature of the system meant that all test cases implicitly tested the in-order delivery
aspects of the protocol, the variable business process speed tests in particular caused
messages to pass through the system at different rates.
Each test had a variable likelihood of causing a delay of up to 10 seconds to the
processing. Table 8 shows the number of messages that passed through the
integration hub out of order, and confirms that all of those messages were then
delivered in order. The quantities were calculated based on the arrival time of the
message in the message log, and its delivery time to the remote destination, both of
which were recorded by the simulation system.
Test case | Average number of messages arriving out of order in dynamic log | Total number of messages delivered out of order to destination
25% chance of variable business process speed | 36 | 0
50% chance of variable business process speed | 36 | 0
75% chance of variable business process speed | 39 | 0
100% chance of varying business process speed | [illegible] | [illegible]
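One way such counts could be derived from the recorded timestamps is sketched
below. This is an assumption about the calculation, not the simulation system's
own code: a message is counted as out of order if, within its causal group, it
appears after a message with a higher sequence number. The function name
count_out_of_order is hypothetical.

```python
def count_out_of_order(sequence_numbers_in_time_order):
    """Count messages that appear after a later-numbered message.

    The argument lists the sequence numbers of one causal group, sorted
    by the timestamp of interest (arrival in the log, or delivery)."""
    highest_seen = -1
    out_of_order = 0
    for sequence in sequence_numbers_in_time_order:
        if sequence < highest_seen:
            out_of_order += 1
        else:
            highest_seen = sequence
    return out_of_order

# Illustrative data only: messages 1 and 4 arrive late, delivery is in order.
arrivals = count_out_of_order([0, 2, 1, 3, 5, 4])
deliveries = count_out_of_order([0, 1, 2, 3, 4, 5])
```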
6.4. Message Overhead
The message overhead was as predicted, being in the order of O(1) for all variables
other than cross-group dependency. Figure 17 shows that there is a constant
message overhead, irrespective of the number of destinations.
[Plot omitted; x-axis: Number of Destinations]
Figure 17: Message overhead independent of number of destinations
Similarly, Figure 18 shows a constant message overhead irrespective of the number
of causal message groups.
[Plot omitted; x-axis: Number of Causal Message Groups]
Figure 18: Message overhead independent of number of causal message groups
For cross-group dependencies, the overhead was in the order of O(n) as shown in
Figure 19 below, where n is the number of dependencies expressed.
[Plot omitted; x-axis: Number of Dependencies]
Figure 19: How message overhead varies with cross-group dependencies
The precise size of the overhead is specific to the simulation system implementation.
For a real world implementation, a more efficient XML representation could easily
be adopted.
6.5. Message Log Overhead
The message log overhead consists of the overhead of both the causal log and the
dynamic log.
6.5.1. Causal Information
The causal log size behaved as predicted. The following series of graphs shows the
behaviour of the causal log under different conditions.
Number of Destinations
Figure 20 shows that, when tested using a constant number of causal message
groups, the log size is in the order of O(n) with respect to number of destinations.
[Plot omitted; x-axis: Number of Destinations]
Figure 20: How causal log size varies with the number of destinations
Number of Causal Message Groups
Similarly, Figure 21 shows that for a constant number of destinations, the log size is
in the order of O(n) with respect to the number of causal message groups.
[Plot omitted; x-axis: Number of Causal Message Groups]
Figure 21: How causal log size varies with the number of causal groups
Number of Destinations and Message Groups
Figure 22 combines these measurements to show that the causal message log size
grows proportionally to the number of message groups and destinations.
[Plot omitted; axes: Causal Log Size (entries), Number of Destinations, Number of Causal Message Groups]
Figure 22: How causal log size varies with the number of destinations and causal groups
Number of Sending Applications
The number of causal message groups created depends entirely on the configuration
of the protocol. The number of sending applications therefore had no direct bearing
on the message log size. As discussed above, the protocol was configured to
construct separate causal message groups for each sending application, resulting in
the message log overhead being in the order of O(n) with respect to the number of
sending applications. Different rules for constructing causal groups - for example
one which ignored the sending application's identity - would result in a different
relationship.
6.5.2. Dynamic Message Log Size
The dynamic message log holds messages until they are ready for delivery to a
destination. The following series of graphs shows the impact of a number of
variables on the message log size.
Number of Messages Sent
The first test was to understand how the number of messages sent affects the log
size, for a single sender and single receiver both operating with the same latency (1
message sent per second, 1 message received per second).
Figure 23 shows the dynamic log overhead plotted against the run time of the test
scenario, for input message quantities of 50, 100, 200 and 400. The x-axis has been
scaled to a percentage rather than an absolute run time to allow comparisons
between the log sizes for each test case.
Figure 23: How dynamic log size varies for different message quantities (x-axis: Percentage of Run Time)
On first inspection it appears that the greater the number of messages passed into
the system, the greater the dynamic log size for a constant send and delivery rate.
This is counterintuitive, as theoretically messages are being stored into the log as
quickly as they are delivered, i.e. at a rate of one per second, as shown by the log size
trace for the “50 messages” test case.
During testing of larger volumes of messages, I observed that the test machine
became extremely unresponsive, with the hard disk light permanently on indicating
heavy disk activity. Simultaneously, the CPU usage of the machine was low,
indicating that a slow hard disk could be the cause of the problem.
I investigated this by considering the average length of time taken to process 50, 100,
200 and 400 messages. Intuitively, with messages being processed at the rate of 1 per
second, processing times should be close to 50, 100, 200, and 400 seconds
respectively, albeit with some constant degree of latency introduced by the testing
system itself.
However, the actual average run times for those test cases were as shown in
Table 9.
Number of Messages    Average Run Length (seconds)    Variance between maximum and minimum run times
50                    53                              0%
100                   125                             17%
200                   248                             12%
400                   476                             10%
These results seem to confirm that variable hardware factors become significant
somewhere between 50 and 100 messages and may explain the counter-intuitive test
results for message quantities greater than 50. For all further test cases, I therefore
limited the sample size to 50 messages.
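The size of the effect can be made explicit by comparing each average run length in Table 9 with the ideal run time at one message per second. The following Python sketch is illustrative only. The function name `overhead_percent` is hypothetical, and the ideal run time ignores the constant latency of the test system.

```python
# (messages sent, average run length in seconds), copied from Table 9
TABLE_9 = [(50, 53), (100, 125), (200, 248), (400, 476)]


def overhead_percent(messages, run_seconds, rate_per_second=1.0):
    """Percentage by which the measured run exceeds the ideal run time."""
    ideal_seconds = messages / rate_per_second
    return 100.0 * (run_seconds - ideal_seconds) / ideal_seconds


if __name__ == "__main__":
    for messages, run_seconds in TABLE_9:
        print(f"{messages:>3} messages: {overhead_percent(messages, run_seconds):.1f}% over ideal")
```

On these figures the 50-message case runs about 6% over the ideal time, while the larger cases run between 19% and 25% over, which is consistent with hardware factors becoming significant somewhere above 50 messages.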
Sending Frequency to Delivery Frequency Ratios
The next test measured the impact of varying the ratio of send speed to delivery
speed. Tests were run in which the speed of the sending system and the receiving
system were changed. All messages were sent in the same causal message group to
ensure delivery of only a single message at a time.
Figure 24: How the sending to delivery ratio affects dynamic log size (x-axis: Percentage of Run Time)
Figure 24 shows the impact of varying the sender to receiver frequency ratio on the
dynamic log size - for example the 4:1 line shows the log size over time for a system
that is sending messages in a single message group four times faster than the receiver
can process them.
These results show that, as predicted in section 4.8.2, the greater the ratio of send
speed to delivery speed, the greater the dynamic log size. Note that in all cases when
a sending application stops sending messages, the log size returns towards zero.
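The relationship between the send to delivery ratio and the dynamic log size can be reproduced with a simple discrete-time model. The Python sketch below is not the simulation system used for the tests. It assumes a single causal group, one clock tick per time unit, and a hypothetical function `simulate_dynamic_log` in which the sender emits one message every `send_interval` ticks and the receiver takes one message every `deliver_interval` ticks.

```python
def simulate_dynamic_log(total_messages, send_interval, deliver_interval):
    """Return the dynamic log size after each tick until all messages are delivered.

    A ratio of 4:1 (sending four times faster than delivery) corresponds to
    send_interval=1 and deliver_interval=4.
    """
    sent = delivered = 0
    tick = 0
    sizes = []
    while delivered < total_messages:
        tick += 1
        # The sender stores a message in the log at its own rate.
        if sent < total_messages and tick % send_interval == 0:
            sent += 1
        # The receiver removes one message at its own rate, if any is waiting.
        if delivered < sent and tick % deliver_interval == 0:
            delivered += 1
        sizes.append(sent - delivered)
    return sizes


if __name__ == "__main__":
    for deliver_interval in (1, 2, 4):
        sizes = simulate_dynamic_log(50, 1, deliver_interval)
        print(f"{deliver_interval}:1 ratio, peak log size {max(sizes)}, final size {sizes[-1]}")
```

In this model the peak log size grows with the ratio and the log drains to zero once the sender stops, which matches the qualitative behaviour described for Figure 24. The absolute values are a property of the model and should not be read as the measured results.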
Constant Latency
Having established the performance of the protocol with different ratios of rate of
send to rate of delivery, I tested the impact of introducing a fixed latency in the
integration hub for a send to deliver ratio of 1:1. The equal send rate to delivery rate
ratio caused messages to be removed from the log at the same rate as they were
added, meaning that the latency had no impact on message log size. The log size
performance in all cases followed the behaviour shown in Figure 24 for the 1:1 ratio.
The latency did increase the overall run times for the tests, as shown in
Table 10. This behaviour is expected, as the latency produces a constant delay.
Latency (seconds)    Average Run time (seconds)
0                    53
1                    56
2                    57
4                    59
16                   71
Table 10: Impact of latency on run times
Given a zero-latency run time of 53 seconds, it would be expected that a latency of 1
second would result in a run time of 54 seconds, rather than the 56 seconds
measured. I believe that this discrepancy is due to the use of a commercial
integration engine (BizTalk 2004) in the simulation system. In order to model
latency, the orchestration includes a "wait for x seconds" step, executed when the
desired latency was non-zero. This type of instruction tends to be designed for
longer pauses, such as a day or more, and therefore it is likely that there is some
additional overhead caused by internal BizTalk processes after issuing this
instruction.
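The discrepancy can be quantified directly from Table 10 by subtracting the expected run time (the zero-latency baseline plus the configured latency) from each measured run time. The Python sketch below is purely arithmetic on the published figures. The name `unexplained_overhead` is hypothetical.

```python
BASELINE_SECONDS = 53  # zero-latency run time from Table 10

# configured latency (seconds) -> average run time (seconds), from Table 10
TABLE_10 = {0: 53, 1: 56, 2: 57, 4: 59, 16: 71}


def unexplained_overhead(latency, observed, baseline=BASELINE_SECONDS):
    """Seconds of run time not accounted for by baseline plus latency."""
    return observed - (baseline + latency)


if __name__ == "__main__":
    for latency, observed in TABLE_10.items():
        print(f"latency {latency:>2} s: {unexplained_overhead(latency, observed)} s unexplained")
```

For every non-zero latency in Table 10 the residual is the same 2 seconds, which is consistent with a fixed cost for invoking the wait step rather than a cost that grows with the length of the pause. This is an observation on five data points and not a measured property of BizTalk.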
Variable Latency
In addition to a constant latency, a variable latency was introduced, to simulate
business processes with different and random speeds of execution. Each incoming
message has a random likelihood of being delayed by between 1 and 10 seconds.
With so many random factors, the data gathered from these tests was intended only
to show the type of impact that variable speed processing would have on dynamic
message log size, without making any predictions about future behaviours.
Figure 25: How variable latency affects dynamic log size (x-axis: Percentage of Run Time)
Figure 25 shows the log size over time for a delay likelihood of 25%, 50%, 75% or
100%. A variable delay is applied to messages to ensure that messages pass through
the integration hub in a random order and do not arrive in sequence. The initial
increase in log size is therefore expected, as incomplete sequences build, ready for
delivery. For both the 25% and 100% test cases, the message log decreases in size at
around 50% of the run time before increasing again. This pattern indicates that a
large sequence of messages was available for delivery. I believe that if the experiment
was repeated with much larger message quantities (for example 1000 or 2000
messages) then this pattern would be repeated many times for all values of the delay
likelihood, with the dynamic log size varying around some roughly constant value as
sequences are assembled for delivery.
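The way out-of-order arrivals build up in the dynamic log until the next message in sequence appears can be sketched in a few lines of Python. This is a simplified model and not the test harness: it assumes one causal group, one message sent per tick, a delay of 1 to 10 ticks applied with the given probability, and a destination that accepts at most one message per tick. The function name and the `seed` parameter are hypothetical.

```python
import random


def simulate_variable_latency(total, delay_probability, seed=0):
    """Return (delivery order, log size after each tick) for one causal group."""
    rng = random.Random(seed)
    arrivals = {}  # tick -> sequence numbers reaching the dynamic log on that tick
    for seq in range(total):
        delay = rng.randint(1, 10) if rng.random() < delay_probability else 0
        arrivals.setdefault(seq + delay, []).append(seq)

    held = set()      # messages in the dynamic log awaiting delivery
    delivered = []
    next_seq = 0      # next sequence number the destination may receive
    sizes = []
    tick = 0
    while next_seq < total:
        held.update(arrivals.get(tick, []))
        if next_seq in held:
            # Deliver strictly in sequence, one message per tick.
            held.remove(next_seq)
            delivered.append(next_seq)
            next_seq += 1
        sizes.append(len(held))
        tick += 1
    return delivered, sizes


if __name__ == "__main__":
    for probability in (0.25, 0.5, 0.75, 1.0):
        order, sizes = simulate_variable_latency(50, probability, seed=1)
        print(f"delay likelihood {probability:.0%}: peak log size {max(sizes)}")
```

Whatever the random delays, delivery order is preserved and the log empties once the final message has been delivered. The peak sizes depend on the random seed, so they illustrate the type of behaviour only.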
6.6. Efficiency
One of the primary goals of the protocol was to relax the restrictions inherent in
causal ordering according to the relationships inferred from the incoming data. As
described in the previous section, the protocol was configured to create causal
message groups based on a “logical stream” identifier contained in the incoming
message. A logical stream is analogous to a real-world entity, for example messages
relating to a particular employee.
In order to see the benefit from concurrent processing, it is necessary to send
messages into the system more quickly than they are delivered. If this is not done,
there will only ever be one message available for delivery at a given time and
therefore concurrent delivery is not possible. I therefore changed the send to deliver
ratio to 1:6.
Figure 26 shows the run times obtained as the number of causal message groups
generated by the protocol varied. The run times have been scaled by a factor of 6
(due to the 1:6 ratio) so that they are comparable with the other test results presented
in this chapter.
Figure 26: How run time reduces as the number of causal groups increases (x-axis: Number of Causal Message Groups Created)
It is clear that a rapid reduction in run time and increase in efficiency is obtained by
the creation of causal message groups. The precise timings are not the critical
measure, as they depend on factors such as the order in which messages arrive for
each group. However, the clear trend is that the protocol is able to deliver significant
efficiency gains when compared to traditional causal ordering.
From the experimental values, it initially seems that there is no efficiency benefit
obtained above 13 groups, with any greater number of groups requiring between 9
and 10 seconds for the end-to-end run time. However, the fastest possible time to
deliver 50 messages at the (scaled) rate of 6 per second is 8.33 seconds, ignoring any
unavoidable latency from the integration hub. Therefore it is likely that with an even
greater send to deliver ratio, greater efficiency gains would be realised for the higher
number of message groups.
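The lower bound quoted above follows from dividing the number of messages by the delivery rate. The short Python sketch below reproduces that arithmetic and compares it with the 9 to 10 second range reported for 13 or more groups. The function name `minimum_run_time` is hypothetical.

```python
def minimum_run_time(messages, deliveries_per_second):
    """Lower bound on run time when the delivery rate is the only limit."""
    return messages / deliveries_per_second


if __name__ == "__main__":
    bound = minimum_run_time(50, 6)
    print(f"lower bound: {bound:.2f} s")
    for observed in (9, 10):
        print(f"observed {observed} s is {observed - bound:.2f} s above the bound")
```

The measured run times for the larger group counts are therefore within about 1.7 seconds of the bound, so little further gain is available at this send to deliver ratio.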
6.7. Resilience
To test the resilience of the protocol to a destination system being unavailable, I ran
50 messages through the simulation system, configured for delivery to two
destinations. I then repeated the test with the second receiver disabled - that is, not
acknowledging delivery and not processing any messages. The send to deliver ratio
was set to 1:6, as in previous tests; the run times shown in Table 11 have therefore
been scaled so that they are comparable with previous test results.
Table 11 shows that the time taken to deliver messages to the enabled destination
was not affected by the unavailability of a destination system.
Configuration                          Average Run time (seconds)
Both destination systems available     53
One destination system unavailable     52
Baseline Measurement
In order to model the impact on the dynamic message log of an unavailable
destination, I first ran a baseline test with both destinations enabled and a 0%, 25%,
and 50% likelihood of concurrent processing, giving the results shown in Figure 27.
This figure shows that the dynamic log size is largely as could be predicted by
examining the results in Figure 24.
Figure 27: Baseline dynamic log size for resilience testing (x-axis: Percentage of Run Time)
Measurement with One Destination Disabled
Figure 28 shows the profile of the dynamic log size with the second destination
disabled. Note that in this test, all steps were run for the same length of time. The
messages destined for the unavailable receiver are simply held in the dynamic log,
awaiting availability of the remote application in the same way that out-of-order
messages are held waiting for the next causal message. Therefore the protocol is able
to cope with unavailable destinations without impacting delivery to destinations that
are still available.
Figure 28: The impact of an unavailable destination on dynamic log size (x-axis: Percentage of Measured Time; series: 0%, 25% and 50%)
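The resilience behaviour described in this section, where messages for an unavailable destination wait in the dynamic log without blocking delivery elsewhere, can be sketched as one queue per destination. The Python class below is a hypothetical illustration. `DynamicLog`, its methods and the `is_available` callback are assumptions made for the example and do not describe the SQL Server implementation.

```python
from collections import deque


class DynamicLog:
    """Hold messages per destination until that destination can accept them."""

    def __init__(self, destinations):
        self.queues = {dest: deque() for dest in destinations}

    def store(self, message):
        # Every destination receives its own copy, kept in arrival order.
        for queue in self.queues.values():
            queue.append(message)

    def deliver_next(self, destination, is_available):
        """Return the next message for a destination, or None if it must wait."""
        queue = self.queues[destination]
        if not queue or not is_available(destination):
            return None
        return queue.popleft()

    def size(self):
        return sum(len(queue) for queue in self.queues.values())


if __name__ == "__main__":
    log = DynamicLog(["A", "B"])
    for seq in range(5):
        log.store(seq)
    only_a_up = lambda dest: dest == "A"
    print([log.deliver_next("A", only_a_up) for _ in range(5)])  # A is served in order
    print(log.deliver_next("B", only_a_up), log.size())          # B's copies stay held
```

Because each destination has its own queue, the available destination is drained at its normal rate while the log retains the copies for the disabled one, which is the behaviour shown in Figure 28.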
6.8. Limitations of the Simulation System
As discussed in section 6.5.2, when executing an individual test step multiple times,
some variance was seen in the time taken on each run of a step. Below I discuss the
factors that may have influenced the test results. In total there were four factors
influencing the running time of tests:
• Limitations of the hardware
• Use of a real integration hub in the simulation system
• Number of threads of execution
• Use of random values in the test cases
Limitations of the Hardware
Tests were carried out on a high-powered laptop with a 2.66GHz processor and 1 GB
of RAM. Despite this, during testing it was observed that the hard disk was
constantly being accessed. I have observed in normal day-to-day working that the
laptop’s hard disk is particularly slow and believe that this had a definite impact on
the consistency of test results.
In order to reduce the impact of the latency of the hard disk, I reduced the frequency
of message sending from the original design of 10 per second to one per second and
scaled back the “receiver frequency” by the same amount, hence maintaining the
send/deliver ratios and the validity of the testing.
This limitation was mitigated by taking averages of the length of time taken, and
where appropriate expressing results in terms of the percentage of total run time,
rather than in absolute terms.
Use of a Real Integration Hub
The simulation system was built around BizTalk 2004. As has been discussed,
integration hubs are asynchronous systems subject to variable processing speeds.
This meant that sending messages into the hub at the rate of 1 per second was not a
guarantee of receiving them at one per second. Repeating a test case did not
therefore guarantee a precisely identical result, but the results were in line with
expectations.
Another factor is that if BizTalk detects that a machine is overloaded, it throttles its
processing in order to reduce that load. I therefore had to find a suitable message
quantity that would overcome the variable latency of BizTalk without overloading
the system - this was determined to be 50 messages when sending at 1 per second.
Since this protocol is designed for use with EAI hubs this variability in no way
devalues the testing results - it instead highlights the importance of the protocol.
Threads of Execution
As can be seen from the test system architecture in Appendix B, there are a number
of threads simulating destination systems. Threads are not a “free” resource, and
there is a natural limit to how many can be used in a particular context. By
experimentation, I determined that, for this hardware, the limit was 8 sending
threads, meaning that if more than 8 message groups are available for delivery at any
one time, only 8 of those can be serviced. This imposes a theoretical limit on the end
to end speed gains observed with multiple parallel message groups, which was
confirmed by the experimental results.
Random Values in Test Cases
Some of the test steps used a random factor when deciding how to sequence
messages - for example “a 25% likelihood of one message being causally unrelated
to another”. This naturally means that different results were obtained for each
execution of the test step. Again, this limitation was mitigated by taking averages of
the length of time taken, and by expressing results in terms of the percentage of total
run time.
6.9. Test Conclusions
The tests presented in this chapter were conducted using a real-world hub and
spoke integration engine, confirming the suitability of the protocol for this domain.
The tests confirmed the theoretical efficiency of the protocol predicted in section
4.8.2. The message overhead was found to be in the order of O(1) in the general
case, and in the order of O(n) with respect to cross group dependencies.
The causal log size was found to be in the order of O(n.m) with respect to the
number of destinations and number of causal groups in the system.
The behaviour of the dynamic log size was found to be harder to predict, but was
largely dependent on the difference between the frequency with which messages
were sent to the application hub and the frequency with which they were processed
by the destination application.
The automatic creation of causal message groups confirmed the flexible nature of
the protocol, proving that inferring causal information from incoming messages is a
successful technique representing a novel approach to obtaining ordering
information.
6.9.1. Performance Gains of the Protocol
Traditional causal ordering is equivalent to all messages being processed in a single
causal message group. Figure 26 showed the significant performance gains that can
be realised by splitting messages into different causal message groups, leading to a
clear benefit of using this protocol for message ordering. The only overhead of using
multiple causal groups is on the causal log size which, as shown in Figure 21, scales
linearly with respect to the number of groups and destinations created.
6.9.2. Comparison of Theoretical Efficiency with Optimal Example
The canonical causal message ordering algorithm of Raynal et al. (1991) exhibits a
performance of O(n^2) for both message and log overheads, where n is the number of
processes in a distributed system. The optimal implementation of this algorithm
(Kshemkalyani and Singhal 1995) performs significantly better than this in the
general case, but is still worse than O(n).
It is worth noting that in a non-EAI implementation each process maintains its own
message store, hence repeating the overhead at multiple sites. Since O(n) describes
the message log size at a single site, the message log space overhead for the whole
system is closer to O(n^2).
The theoretical efficiency of an algorithm was discussed in depth in section 4.8.2,
where it was clearly demonstrated that an algorithm exhibiting O(n) efficiency is
preferable to one exhibiting O(n^2).
Figure 29 clearly shows that my protocol, which is in the order of O(n), will scale
significantly better than Kshemkalyani and Singhal's optimal algorithm for causal
ordering which varies between O(n^2) and O(n^3).
Figure 29: Comparing theoretical efficiencies (x-axis: Input Size; series: O(n^3), O(n^2) and O(n))
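The three growth rates plotted in Figure 29 can be tabulated for the same range of input sizes. The Python sketch below evaluates n, n^2 and n^3 directly and ignores constant factors, so it shows relative growth only and not measured overheads. The function name `growth_table` is hypothetical.

```python
def growth_table(input_sizes):
    """Return (n, n, n^2, n^3) rows for each input size."""
    return [(n, n, n ** 2, n ** 3) for n in input_sizes]


if __name__ == "__main__":
    print(f"{'n':>4} {'O(n)':>6} {'O(n^2)':>8} {'O(n^3)':>8}")
    for n, linear, quadratic, cubic in growth_table(range(2, 13, 2)):
        print(f"{n:>4} {linear:>6} {quadratic:>8} {cubic:>8}")
```

At an input size of 12 the linear term is 12, against 144 and 1728 for the quadratic and cubic terms, which is the gap that Figure 29 illustrates.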
6.10. Summary
This chapter described the results of testing the protocol through an extensive suite
of test cases. The test results showed that the protocol offers a flexible and efficient
approach to message ordering for Enterprise Application Integration and that the
protocol compares well to other message ordering protocols.
Chapter 7
CONCLUSIONS
7.1. The Need for an EAI Protocol
This thesis began by describing the evolution of Enterprise Application Integration,
and identifying the particular problem of message ordering in this domain. The
current state of message ordering research was evaluated in the context of its
suitability for solving the EAI ordering problem, and it was concluded that although
the field was a rich and mature one, no one approach or protocol was suitable for
EAI, as summarised by the results in Table 5.
By considering the best features of existing protocols and combining these with the
aspects of EAI relevant to message ordering, I designed and evaluated a new
message ordering approach and protocol, called inferred causal ordering. Table 6
presented the results of this evaluation against the EAI criteria.
The protocol took the novel step of inferring ordering information from the data
contained within the messages, an approach made possible by the nature of
Enterprise Application Integration, whereby implicit understanding of a message’s
contents is essential for orchestrating business processes.
7.2. Benefits of the Protocol
Chapter 6 tested the protocol using a simulation system built with a commercial
integration engine. The results showed that the protocol successfully guarantees the
order of delivery of messages to multiple remote systems for a typical EAI hub and
spoke architecture.
The tests also showed that the protocol introduces a minimal and predictable
overhead on each message, and similarly that the causal log size can be accurately
predicted by understanding the profile of the causal relationships between the
sending applications. These conclusions were detailed in Chapter 6.
Perhaps the biggest advantage of the protocol was the novel way in which causal
information was inferred from the messages themselves, rather than explicitly
requiring applications to be modified to add causal information. Significant
efficiency benefits were also obtained with the protocol when compared to normal
causal ordering by splitting messages into causally unrelated groups.
7.3. Limitations of the Protocol
The dynamic log size is much harder to predict than the causal message log or the
message overhead as it is susceptible to a number of factors. However, the
main contributing influence is the ratio of the rate at which messages are sent into
the system against the rate at which they can be delivered to a remote application.
On the assumption that all messages will eventually be delivered, the dynamic log
size will reduce to zero, but sufficient log space must be available in the dynamic log
for the protocol to operate.
Section 4.8.1 also detailed a problem inherent in the centralised nature of the
protocol, which represents a single point of failure for the system. Co-locating the
protocol implementation on the integration engine’s hardware will help to tie the
availability of the protocol with that of the hub itself, but a more sophisticated
algorithm-level solution is desirable.
7.4. Future Work
The protocol addresses Enterprise Application Integration, providing message
ordering within a localised environment. The concepts behind it could be combined
with those of asynchronous message groups as described by Fritzke, Jr. et al. (1998)
to potentially create message ordering in a Business to Business (B2B) environment,
for example by implementing timestamps as described by Mostefaoui et al. (2001).
This work could also be extended to investigate how to reduce the “single point of
failure” problem inherent in the centralised design of the protocol, for example by
spanning the protocol’s message logs across multiple sites. From the hardware
perspective, techniques such as clustering and redundant systems exist to reduce the
impact of the failure of an individual system but a protocol-level solution is worth
investigating further.
It would also be interesting to undertake a formal proof of this protocol in a similar
way to Skawratananond et al. (1998), and to use this to identify any areas of the
protocol that could be improved.
Finally, I would like to model the dynamic log behaviour for far larger message
quantities and more complex interconnected systems by implementing it in a
real-world integration project.
Appendix B
SIMULATION SYSTEM ARCHITECTURE
B.l Overview
Figure 30 shows the simulation system used to test the protocol.
It has three main components: a sending system simulation, the EAI hub, and a
destination system simulation.
Figure 30: Simulation system architecture
B.2 Sending System Simulation
A sending simulation component initialised one thread per simulated sending
application. Each thread generated the configured number of messages at the
configured rate and passed them to a simple “rules engine” forming a part of the
integration hub.
Each message contained additional information relevant to the test case, such as how
likely the message was to experience a delay, how many destinations the message was
for, or the static latency for that message. In addition, depending on the setting for
concurrency, a message would be assigned to a logical stream. Finally, timestamp and
test iteration information were used to collate results in a database. Figure 31 shows
the full input message format that was used.
<Message>
<SendingSystem>0</SendingSystem>
<LogicalStream>0</LogicalStream>
<LikelyhoodDelay>0</LikelyhoodDelay>
<FixedDelay>0</FixedDelay>
<NumberDestinations>1</NumberDestinations>
<TestName>NumberOfMessages50</TestName>
<TimeStamp>632295456317187472</TimeStamp>
<Iteration>1</Iteration>
</Message>
Figure 31: Test message format
B.3 EAI Hub
B.3.1 Rules Engine
The rules engine examined the message, and used the data within it to assign a causal
message group. The causal groups were given a group identifier constructed as
follows:
Figure 32: Simulation system causal group identifier
For example "Test_Efficiency50_Iteration_1_From_0_Stream_4".
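A rules engine of this kind can be sketched in Python as a function that reads the fields of the test message and joins them into a group identifier. The sketch below is an assumption-laden illustration: the identifier pattern is inferred only from the single example above (test name, iteration, sending system and logical stream, separated by underscores), and the function name `causal_group_id` is hypothetical. It is not the rules engine used in the simulation system.

```python
import xml.etree.ElementTree as ET


def causal_group_id(message_xml):
    """Derive a causal group identifier from a test message.

    Assumed pattern (inferred from the one example in the text):
    Test_<TestName>_Iteration_<Iteration>_From_<SendingSystem>_Stream_<LogicalStream>
    """
    root = ET.fromstring(message_xml)

    def field(name):
        # Missing elements become empty strings rather than raising an error.
        return root.findtext(name, default="").strip()

    return "Test_{}_Iteration_{}_From_{}_Stream_{}".format(
        field("TestName"),
        field("Iteration"),
        field("SendingSystem"),
        field("LogicalStream"),
    )


if __name__ == "__main__":
    sample = """
    <Message>
      <SendingSystem>0</SendingSystem>
      <LogicalStream>4</LogicalStream>
      <TestName>Efficiency50</TestName>
      <Iteration>1</Iteration>
    </Message>
    """
    print(causal_group_id(sample))
```

Two messages that share the same test name, iteration, sending system and logical stream therefore fall into the same causal group, while a different logical stream yields a separate group that can be delivered concurrently.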
An envelope was added to hold the causal information, and the message was then
passed into a BizTalk 2004 orchestration to simulate a business process.
B.3.2 Integration Engine: BizTalk 2004
The BizTalk orchestration simply simulated any necessary delays, then wrote the
message to the dynamic message log ready for delivery to the simulated systems.
B.3.3 Message Logs
The message logs were implemented as SQL Server 2000 tables. One table held the
causal log, and another group of tables stored the dynamic message log.
B.4 Destination System Simulation
A Windows Service was written that ran eight threads to monitor the dynamic
message log for causal message groups that were ready for delivery. When a thread
identified an available message group it consumed the messages from that group one
at a time, pausing for the configured delay between each message. If a receiving
system was configured to act as being disabled, it returned a failure code to the
thread controller, which flagged the delivery as having failed, requiring the
message to be retried at an appropriate point.
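The delivery loop described above can be approximated in Python with a fixed pool of eight worker threads. This is a sketch under stated assumptions and not the Windows Service itself: the names `run_destination`, `deliver_group` and the `deliver` callback are hypothetical, the configured pause between messages is omitted, and a failed delivery simply stops work on that group so that the remaining messages can be retried later.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_THREADS = 8  # thread limit reported for the test hardware


def deliver_group(group_id, messages, deliver, results, lock):
    """Deliver one group's messages strictly in order, stopping on failure."""
    for message in messages:
        succeeded = deliver(group_id, message)
        with lock:
            results.setdefault(group_id, []).append((message, succeeded))
        if not succeeded:
            break  # leave the rest of the group in the log for a later retry


def run_destination(groups, deliver):
    """Service ready groups concurrently, at most MAX_THREADS at a time."""
    results, lock = {}, threading.Lock()
    with ThreadPoolExecutor(max_workers=MAX_THREADS) as pool:
        for group_id, messages in groups.items():
            pool.submit(deliver_group, group_id, messages, deliver, results, lock)
    # Leaving the with-block waits for every submitted group to finish.
    return results


if __name__ == "__main__":
    ready = {f"group-{i}": [0, 1, 2] for i in range(12)}
    outcome = run_destination(ready, lambda group_id, message: True)
    print(len(outcome), "groups delivered")
```

With twelve ready groups and eight workers, four groups wait until a thread becomes free, which reflects the cap on concurrent delivery discussed in section 6.8. Ordering is only guaranteed within a group, not across groups.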
References
Accenture (2003) Field Force Enablement,
    http://www.accenture.com/xdoc/en/services/microsoft/field_force.pdf
Agarwal, D. A., Moser, L. E., Melliar-Smith, P. M. and Budhia, R. K. (1998)
    "The Totem multiple-ring ordering and topology maintenance protocol",
    ACM Trans. Comput. Syst., 16 (2), pp. 93-132.
Altman, R. and Altman, G. (2004) "An Integration Primer Part II", Business
    Integration Journal, March, pp. 45-47.
Attachmate Corporation (2004) Approaches to EAI Involving Legacy Host
    Applications: The Five R's,
    http://www.attachmate.com/article/0,1012,3163-1 3 85 8,OO.html
Birman, K. P. and Joseph, T. A. (1987) "Reliable communication in the
    presence of failures", ACM Trans. Comput. Syst., 5 (1), pp. 47-76.
Brassard, G. and Bratley, P. (1988) Algorithmics: Theory and Practice,
    Prentice-Hall, Inc., Englewood Cliffs, NJ.
Bussler, C. (2002a) "B2B integration technology architecture", In: Proceedings
    of the Fourth IEEE International Workshop on Advanced Issues of
    E-Commerce and Web-Based Information Systems, pp. 147-152.
Bussler, C. (2002b) "P2P in B2BI", In: Proceedings of the 35th Annual Hawaii
    International Conference on System Sciences, pp. 3915-3924.
Chandra, P., Gambhire, P. and Kshemkalyani, A. D. (2004) "Performance of the
    Optimal Causal Multicast Algorithm: A Statistical Analysis", IEEE
    Trans. Parallel Distrib. Syst., 15 (1), pp. 40-52.
Cheng, W., Jia, X. and Werner, M. (1995) "A multicast mechanism for actual
    causal ordering", In: IEEE First International Conference on Algorithms
    and Architectures for Parallel Processing, pp. 303-314.
Clements, P. C. and Nothrop, L. N. (1996) "Software Architecture: an Executive
    Overview", In: Brown, A. W. (ed.). Component-Based Software
    Engineering: Selected Papers from the Software Engineering Institute,
    IEEE Computer Society Press, pp. 55-68.
Emmerich, W., Ellmer, E. and Fieglein, H. (2001) "TIGRA - an architectural
    style for enterprise application integration", In: Proceedings of the 23rd
    international conference on Software engineering, pp. 567-576.
Espinosa, J. and Pulido, A. (2002) "IB (integrated business): a workflow based
    integration approach", In: Proceedings of the 35th Annual Hawaii
    International Conference on System Sciences, pp. 2566-2571.
Eyal, A. and Milo, T. (2001) "Integrating and customizing heterogeneous
    e-commerce applications", The VLDB Journal, 10 (1), pp. 16-38.
Federal Student Aid University (n.d.), Definitions,
    http://extranet.sfa.ed.gov/sfa-university/training/Q botw/dictionary-m.html
Fritzke, U., Jr., Ingels, P., Mostefaoui, A. and Raynal, M. (1998) "Fault-tolerant
    Total Order Multicast to asynchronous groups", In: Proceedings of the
    Seventeenth IEEE Symposium on Reliable Distributed Systems, pp. 228-234.
Gosain, S., Malhotra, A., Sawy, O. A. and Chehade, F. (2003) "The impact of
    common e-business interfaces", Commun. ACM, 46 (12), pp. 186-195.
Hackney, D. (1996) "Treasure in the Data Islands", DM Review, 6 (10).
Hohpe, G. and Woolf, B. (2004) Enterprise Integration Patterns: Designing,
    Building, and Deploying Messaging Solutions, Pearson Education,
    Boston, MA.
Kreger, H. (2003) "Fulfilling the Web services promise", Commun. ACM, 46
    (6), pp. 29-ff.
Kshemkalyani, A. D. and Singhal, M. (1995) Necessary and Sufficient
    Conditions on Information for Causal Message Ordering and Their
    Optimal Implementation, Technical Report 29.2040, IBM Research
    Triangle Park.
Kshemkalyani, A. D. and Singhal, M. (1996) "An Optimal Algorithm for
    Generalized Causal Message Ordering", In: Proceedings of the fifteenth
    annual ACM symposium on Principles of distributed computing, p. 87.
Kuo, D., Fekete, A., Greenfield, P., Jang, J. and Palmer, D. (2003) "Just what
    could possibly go wrong in B2B integration?", In: Proceedings of the
    27th Annual International Computer Software and Applications
    Conference, pp. 544-549.
Lamport, L. (1978) "Time, clocks, and the ordering of events in a distributed
    system", Commun. ACM, 21 (7), pp. 558-565.
Levin, J. (2001) "From EDI To XML And UDDI: A Brief History Of Web
    Services", Information Week, CMP MEDIA LLC.
Linthicum, D. S. (2000) Enterprise Application Integration, Addison-Wesley,
    Boston, MA.
Linthicum, D. S. (2004) Next Generation Application Integration: From Simple
    Information to Web Services, Pearson Education, Inc., Boston, MA.
Medjahed, B., Benatallah, B., Bouguettaya, A., Ngu, A. H. H. and Elmagarmid,
    A. K. (2003) "Business-to-business interactions: issues and enabling
    technologies", VLDB Journal: Very Large Data Bases, 12 (1), pp. 59-85.
Microsoft (2004) Microsoft BizTalk Server 2004 Product Documentation,
    http://www.msdn.microsoft.com/librarv/default.asp?url=~ibrarv/enus/def%tm/ebiz def portal page.asp
Mostefaoui, A., Raynal, M. and Verissimo, P. (2001) "The Logically
    Instantaneous Communication Mode: a Communication Abstraction",
    Future Generation Computer Systems, pp. 669-678.
Murty, V. V. and Garg, V. K. (1997) "Characterization of Message Ordering
    Specifications and Protocols", In: 17th International Conference on
    Distributed Computing Systems (17th ICDCS'97), pp. 492-499.
National Institute of Health (n.d.), NIHnet Handbook Glossary,
    http://www.cit.nih.gov/dnst/handbook/Main/glossary.htm
Raynal, M., Schiper, A. and Toueg, S. (1991) "The causal ordering abstraction
    and a simple way to implement it", Inf. Process. Lett., 39 (6), pp. 343-350.
Reyes, A., Espino, J., Mohan, V. and Nadkar, M. (2003) "Ad hoc software
    interfacing: enterprise application integration (eai) when middleware is
    overkill", In: Proceedings of the 27th Annual International Computer
    Software and Applications Conference, pp. 570-580.
Singh, G. and Badarpura, S. (2001) "Application ordering in group
    communication", In: 21st International Conference on Distributed
    Computing Systems Workshop, pp. 11-16.
Sinha, A. (1992) "Client-server computing", Commun. ACM, 35 (7), pp. 77-98.
Skawratananond, C., Mittal, N. and Garg, V. (1998) "A Lightweight Algorithm
    for Causal Message Ordering in Mobile Computing Systems".
Stal, M. (2002) "Web services: beyond component-based Computing", Commun.
    ACM, 45 (10), pp. 71-76.
flemistocleous, M., Irani, Z., O'Keefe, R. and Paul, R. (2001) "EFW problems
and application integration issues: an empirical survey", In: Proceedings
of the 34th Annual Hawaii International Conference on System Sciences,
pp. 1-10
Truman, P. (2001) Integration Framework White Paper, Cap Gemini Ernst &
Young.
Tyler, P. (1994) "Causal group multicast: a formal description", In: Proceedings
of IEEE Region 10's Ninth Annual International Conference. Theme:
'Frontiers of Computer Technology', pp. 692-696
/ah, S. (2000) "Reality bites", Computer Business Review (Online Edition),
Computerwire, June 2000.
Vinoski, S. (2002) "Middleware "Dark Matter"", IEEE Internet Computing, 6
(5), pp. 92-95.
W3C (2000) Simple Object Access Protocol (SOAP) 1.1,
http://www.w3c.org/TR/SOAP
Yoshida, T. (2001) "Message ordering based on the strength of a causal
relation", In: Proceedings of the 15th International Conference on
Information Networking, pp. 915-920
Index
A2A .................. See Application-to-Application Integration
Actual Causal Ordering ..................................... 37
Application Ordering ....................................... 39
Application-to-Application Integration
Asynchronous Communication ............................. 21, 23
B2B .................... See Business-to-Business Integration
B2C .................... See Business-to-Consumer Integration
Business Processes
  Automating ............................................... 19
Business-to-Business Integration ........................... 12
Business-to-Consumer Integration ........................... 12
Causal Message Groups ...................................... 48
Causal Ordering ............................................ 33
Cross-Group Dependencies ................................... 48
Data Islands ............................................... 13
Data Sharing ............................................... 13
Data Silo .................................................. 14
Enterprise Application Integration ......................... 11
  Definition ............................................... 11
  Real-World Uses .......................................... 20
  The Message Ordering Problem ............................. 22
Enterprise Applications .................................... 15
Enveloping ................................. See XML Enveloping
Hub and Spoke Integration .............................. 17, 18
  Architecture ............................................. 17
Inferred Causal Ordering ................................... 44
Integration Engine ......................................... 19
Lamport, Leslie
  "Happens Before" relation ................................ 33
Metadata ................................................... 35
Middleware ..................................... 14, 15, 16, 18
Orchestration .............................................. 19
Point to Point Integration ................................. 16
Stovepipe Systems .......................................... 13
Total Ordering ............................................. 31
Totem ...................................................... 32
Web Services ........................................... 18, 19
XML ................................. 17, 41, 45, 46, 47, 49, 51
  Enveloping ............................................... 51