Unit Testing and Performance
Using Entity Framework 4.0
Tommy Hörnlund
January 24, 2013
Master’s Thesis in Computing Science, 30 credits
Supervisor at CS-UmU: Jan-Erik Moström
Examiner: Fredrik Georgsson
Umeå University
Department of Computing Science
SE-901 87 UMEÅ
SWEDEN
Abstract
POÄNGEN is a web application for rent management. The core of the application is a
module that performs rent calculations. In the past the application relied heavily on business
logic in stored procedures that made the program hard to test and maintain.
The purpose of this thesis was to find a new method for combining unit testing and data
access. A new implementation of the rent calculation had to be created that was easier to
test and maintain while still performing well.
This thesis shows how to combine data access and unit tests using Entity Framework
4.0, an object relational mapping framework from Microsoft. The new module uses the
Repository and Specification design patterns to create a data abstraction that is suitable for
unit testing.
The performance of Entity Framework 4.0 is also evaluated and compared to traditional
data loading. The evaluation shows that Entity Framework 4.0 severely lacks in performance when
loading or saving large amounts of data. However, the use of POCO entities makes it possible
to create optimized functionality for time-critical data access.
Contents
1 Introduction
    1.1 Overview
    1.2 Problem Statement
    1.3 Goals & Purpose
2 POÄNGEN
    2.1 Utility principle
    2.2 Properties
    2.3 Residential unit
    2.4 Apartment
    2.5 Model
    2.6 Score
    2.7 Rent
    2.8 Example
3 Entity Framework
    3.1 Entity Data Model
    3.2 Mapping
    3.3 LINQ
    3.4 Loading Related Entities
    3.5 Change Tracking
4 Testability
    4.1 Repository
    4.2 Unit of Work
    4.3 POCO
    4.4 Mocking
    4.5 Inversion of Control
    4.6 Unit Testing
5 Result
    5.1 Overview
    5.2 Entity Data Model
    5.3 POCO
    5.4 Specification
    5.5 FetchStrategy
    5.6 Repository
    5.7 Calculation
    5.8 Dependencies
    5.9 Data Access
    5.10 Data Persistence
    5.11 Unit Tests
        5.11.1 Testing Data Access
        5.11.2 Test Data
        5.11.3 Mocking
        5.11.4 Example
6 Performance
    6.1 Test Data
    6.2 Test Application
    6.3 Execution
    6.4 Result
        6.4.1 Calculation time
        6.4.2 Memory Usage
        6.4.3 Persistence
        6.4.4 Legacy Calculator
7 Conclusions
    7.1 Limitations
    7.2 Future Work
8 Acknowledgements
References
List of Figures
3.1 An example database diagram
3.2 The entities that are mapped to the database tables in figure 3.1
5.1 Conceptual overview of the system
5.2 The real and the mock context implement the same interface
5.3 Calculation module dependencies
6.1 Data loading comparison
6.2 Memory usage comparison
6.3 Entity Framework persistence performance
6.4 Entity Framework persistence memory usage
6.5 Comparison with legacy rent calculator
List of Tables
2.1 Common apartment properties
2.2 An example model
2.3 Two apartments with different property values
2.4 Property values converted to score
2.5 Formula calculated score
2.6 Example
Listings
3.1 Using LINQ to query a list of integers
3.2 Using LINQ to query an entity collection
3.3 Loading related entities by including them in the query
3.4 Explicitly loading related entities after the query
3.5 Loading related entities with lazy loading enabled
3.6 Loading related entities before the query is executed
3.7 Saving changes to the context
4.1 Regular dependency
4.2 Inversion of Control
5.1 The calculation module interface
5.2 The specification interface
5.3 An excerpt from the generic repository interface
5.4 Ordinary object instantiation
5.5 Dependency injection using a factory lambda expression
5.6 Specification for an active apartment
5.7 Unit testing the Specification for an active apartment
5.8 Mock example
5.9 IFormulaCalculator interface
5.10 IModelLayoutCalculator interface excerpt
5.11 Example unit test
Chapter 1
Introduction
TRIMMA is an IT company focusing on software for business management and decision
making. One of its major products is POÄNGEN (translates to "Score"), a complete
solution for rent management according to the utility principle. An explanation of the utility
principle can be found in section 2.1.
POÄNGEN started out as a small and simple application, but it has quickly grown in
both functionality and number of customers. It is becoming harder and harder to maintain
and develop the application, and new development methods have to be found in order to ensure
the future of POÄNGEN.
A core part of the application is the rent calculation module. It is a data-heavy process
where large amounts of data are brought together to calculate the rent for each apartment.
The current implementation suffers from several issues.
The first problem is that it is hard to test the correctness of the module. The only
evidence that it works correctly is that it has worked in the past. This means that the
module cannot easily be extended, because the correctness cannot be verified after a change is
made.
Another issue is performance. As customers grow larger, the system cannot handle
the tens of thousands of apartments these companies manage. If a new system design is
proposed, it has to be highly efficient.
1.1 Overview
Chapter 1 contains the background and problem statement.
Chapter 2 is a brief introduction to the problem domain.
Chapter 3 is an overview of Entity Framework, the object relational mapping framework
used in this project.
Chapter 4 contains a theoretical study about testability and Entity Framework.
Chapter 5 describes the final application that was created.
Chapter 6 contains a performance study on the resulting application.
Chapter 7 contains the discussion of the results and conclusions.
1.2 Problem Statement
The task is to create a new rent calculation module with automated testing. The possibility
to add new features should be taken into consideration as well as the performance of the
calculation.
A large part of the problem is that the existing calculation logic resides in the database in
the form of stored procedures. It is not possible with the current tools to automatically test
this functionality. If a change is made there is no safeguard that the previous functionality
is not altered. The problem is to move this logic to the application code without losing
performance.
Presently POÄNGEN is using an object-relational mapping (ORM) framework known as
dOOdads[16]. This framework however is obsolete and no longer maintained. Therefore
another ORM framework will be used, Microsoft's Entity Framework 4.0[14]. Part of the
project will be to evaluate the testability and performance of Entity Framework.
1.3 Goals & Purpose
The purpose is to develop a method to incorporate automated testing in the development
process. The method should be evaluated to determine if it is suitable for the existing
application as well as in future development. The goal is to develop software that is easier to
maintain and modify, without introducing new bugs in the existing functionality.
Chapter 2
POÄNGEN
This chapter is a brief introduction to the utility principle. The utility principle is the
underlying principle of POÄNGEN, a complete solution for rent management.
2.1 Utility principle
The idea behind the utility principle is to provide an alternative to market prices. The
difference in price between apartments should be easily explained by differences in the
standard of the apartments[3]. In Sweden landlords are required by law to use the utility
principle[7].
The method used in POÄNGEN is based on work by the Swedish Association of Public
Housing Companies (SABO)[3]. The basic idea is that a score is calculated for each apartment
and then the current rent is redistributed relative to the score to calculate the new rent.
This means that the total income remains constant while some apartments get increased
rent and others decreased.
2.2 Properties
To describe the different aspects of the apartments a number of properties are defined.
Properties can be of different data types: string, numeric, boolean and predefined values.
The most common properties have been defined by SABO[3] while others have to be chosen
based on experience. Some of the common properties can be seen in table 2.1. Properties
can also be undefined, represented by a null value.
Table 2.1: Common apartment properties.
Name            Data type   Possible values
Area            Numeric     Real numbers
Apartment type  Predefined  1ROK, 2ROK, 3ROK
Balcony         Boolean     yes or no
Address         String      Any string
Location        Predefined  A, B, C
2.3 Residential unit
Apartments are contained in groups called residential units. A unit usually consists of a single
building or several very similar buildings. Properties such as building year, surroundings
and distance to different services can be associated with residential units.
2.4 Apartment
The apartment score is calculated from the number of rooms and the type of kitchen. For
example, one room and kitchenette receives 24 points, while two rooms and a regular kitchen
receives 40 points. This score is added to the area of the apartment and this becomes the
total score for the apartment.
The apartment score assumes that the apartment has all the standard equipment of a
regular apartment. If some part of the apartment differs from the standard, an adjustment
score has to be added. For example, if the balcony is extra large the apartment should gain
extra points, while if the balcony is extra small or missing the apartment should lose points.
The different types of scores are described in section 2.6.
2.5 Model
A model is a mapping from properties to points, where different property values can be assigned
different points. An example model can be seen in table 2.2. Each property is also assigned
a formula alias. All points with the same formula alias are summed and substituted into the
corresponding variable in the formula, described in the next section.
Table 2.2: An example model. The apartment type
has a fixed score for each possible value. The area
property is a numerical property so it can be directly
converted to points.
Formula alias  Property        Value  Points
A              Apartment type  1ROK   34
A              Apartment type  2ROK   40
A              Apartment type  3ROK   44
A              Area            X      X
B              Balcony         Yes    0
B              Balcony         No     -5
C              Location        A      35
C              Location        B      33
C              Location        C      21
2.6 Score
There are three different types of points in the model. The A score is known as the apartment
score. It is a measure of the size and number of rooms in the apartment. Each apartment is
assumed to satisfy the minimum standard requirements. For example, rooms should have at
least one window and heating should be included in the rent.
The second type is the B score, called the adjustment score. If the apartment differs
from a standard apartment an adjustment has to be made. For example if an apartment has
no balcony a negative score will be added to the B score.
The final type is the C score, called the residential unit score. This is the score for all
properties that are shared by all apartments in a residential unit. The desirability of the
building location can be one such property.
The total score is calculated by taking the residential unit score C and adding 100.
This score is then multiplied by the apartment score A. Finally the adjustment score B is
multiplied by 100 and added to the product. The final formula can be seen in equation 2.1.
$$\text{Total score} = (100 + C) \times A + (100 \times B) \tag{2.1}$$
2.7 Rent
To convert from points to rent the total income from all apartments is divided by the sum
of the scores for all apartments, see equation 2.2.
$$\text{Factor} = \frac{\text{Total income}}{\text{Sum of all scores}} \tag{2.2}$$
The resulting factor is then multiplied by the score of the apartment to calculate the
rent, equation 2.3.
$$\text{Apartment rent} = \text{Score} \times \text{Factor} \tag{2.3}$$
2.8 Example
As an example, consider two apartments with the same rent of 3000 SEK but different standards.
With the utility principle the rent should be redistributed to reflect the differences between the
apartments.
Table 2.3: Two apartments with different property values.
Value          Area  Type  Balcony
Apartment one  32    2ROK  Yes
Apartment two  30    1ROK  No
The first apartment in table 2.3 has an area of 32 m², two rooms and a kitchen (2ROK)
and a balcony. The second apartment has an area of 30 m², one room and a kitchen (1ROK)
but no balcony. Using the model in table 2.2 the properties can be converted to points in
table 2.4.
Table 2.4: The score for each property in table
2.3.
Points         Area  Type  Balcony
Apartment one  32    40    0
Apartment two  30    34    -5
Again using the model, the A, B and C scores can be calculated, as shown in table 2.5. The final
score is calculated by using equation 2.1.
Table 2.5: The score for the two apartments.
The total score is calculated using the formula
(100 + C) × A + (100 × B).
Points         A   B   C  Total
Apartment one  72  0   0  7200
Apartment two  64  -5  0  5900
Apartment one's type is worth 40 points. Adding the area of 32 gives an A score of 72.
The total score becomes (100 + 0) × 72 + (100 × 0) = 7200. The second apartment receives
(100 + 0) × 64 + (100 × −5) = 5900.
The factor becomes $\frac{3000 + 3000}{7200 + 5900} = \frac{60}{131}$. Now the factor can be multiplied with the new
score to calculate the new rent.
Table 2.6: Example
               Points  Old rent  New rent
Apartment one  7200    3000 SEK  3298 SEK
Apartment two  5900    3000 SEK  2702 SEK
Note that the sum of the new rents is the same as the sum of the old rents. The total has been
redistributed to better reflect the utility of the apartments.
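The arithmetic in this example can be checked with a few lines of C#. The following is only a minimal sketch of equations 2.1 to 2.3, not the calculation module described later in the thesis; the class name RentExample and its method names are hypothetical, and rounding each rent to whole SEK is an assumption made to match table 2.6.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; not the thesis implementation.
public static class RentExample
{
    // Equation 2.1: total score = (100 + C) * A + (100 * B).
    public static decimal TotalScore(decimal a, decimal b, decimal c)
    {
        return (100m + c) * a + 100m * b;
    }

    // Equations 2.2 and 2.3: the factor is the total income divided by the
    // sum of all scores, and each rent is the score times that factor.
    // Rounding to whole SEK is an assumption made to match table 2.6.
    public static List<decimal> Redistribute(IList<decimal> scores, decimal totalIncome)
    {
        decimal factor = totalIncome / scores.Sum();
        return scores.Select(s => Math.Round(s * factor)).ToList();
    }
}
```

Called with the A, B and C scores from table 2.5 and a total income of 6000 SEK, the sketch reproduces the scores 7200 and 5900 and the rents in table 2.6. Note that rounding each rent separately does not in general guarantee that the rounded rents sum exactly to the total income.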
Chapter 3
Entity Framework
According to the requirements Entity Framework had to be used, because the
framework was already in use in other projects at the company. Entity Framework is an
object-relational mapping framework from Microsoft that is included in the .NET Framework[14].
The version used for this project is Entity Framework 4.0.
The basic idea of Entity Framework is to eliminate the impedance mismatch between
business logic and data representation. This is done using the Entity Data Model (EDM).
3.1 Entity Data Model
The Entity Data Model has two basic components.
– Entities are strictly typed data structures that contain the record data and an identifier
key.
– Relationships are associations between entities.
More advanced features of the EDM are inheritance and complex types[14], but these are
not used in this project.
The Entity Data Model should be created to reflect the structure of the business objects
used in the application. It may be necessary to create different data models for different
parts of the application, while still using the same database.
3.2 Mapping
To populate the data model with data from an actual relational database a mapping has to
be created. Entities can be mapped to database tables, but several tables can also map to a
single entity or a table can be split up into several entities. In figure 3.1 the Employee and
ContactInfo tables are combined into a single entity. The mapped entities can be seen in
figure 3.2. The Company entity can be accessed from a property on the Employee entity, and
the Company entity contains a list of all employees associated with the company.
When accessing the entities in the application code Entity Framework will automatically
fetch data from the database and populate the in-memory data structure.
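[Figure 3.1 (database diagram): table Employee (PK EmployeeID, Salary, FK1 ContactInfoID, FK2 CompanyID); table Company (PK CompanyID, Name); table ContactInfo (PK ContactInfoID, Name, Adress, Phone). The dashed boxes group Employee and ContactInfo into the Employee entity, and Company into the Company entity.]
Figure 3.1: An example database diagram. The dashed boxes show the entities that the
database tables are mapped to.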
[Figure 3.2 (entity diagram): Employee (+Salary : decimal, +Name : string, +Adress : string, +Phone : string, +Company : Company) 0..* to 1 Company (+Name : string, +Employees : List<Employee>).]
Figure 3.2: The entities that are mapped to the database tables in figure 3.1.
3.3 LINQ
Instead of using SQL query strings to query the entity model, the C# language has introduced
a feature called Language Integrated Query (LINQ). LINQ can be used to query a
number of different data sources like databases, collections, XML documents and entity
models using the same syntax. In listing 3.1 a LINQ query is made against a list of numbers.
All numbers less than or equal to five are selected and the numbers are sorted in ascending order.
Listing 3.1: Using LINQ to query a list of integers.
    int[] numbers = new int[] { 5, 7, 1, 4, 9, 3, 2, 6, 8 };
    var smallNumbers = from n in numbers
                       where n <= 5
                       orderby n
                       select n;
    foreach (var n in smallNumbers)
    {
        Console.Write(n);
    }
    // Output: 12345
The same syntax is used to load entities from the database, and when used with entities
it is usually referred to as LINQ to Entities. Each entity type is represented as a collection
and all collections are contained in an ObjectContext class. The object context acts as a
repository and a unit of work, concepts described in chapter 4. For now the important thing
is that entities are accessed through a collection, in the same way as the number example in
listing 3.1.
Listing 3.2 shows an example where LINQ is used to query the company context for all
employees named "Bob".
Listing 3.2: Using LINQ to query an entity collection.
    CompanyContext companyContext = new CompanyContext();
    var bobs = from e in companyContext.Employees
               where e.Name == "Bob"
               select e;
3.4 Loading Related Entities
When loading an entity, related entities can be loaded as well, as defined by the relationships
in the EDM. There are several ways related entities can be loaded.
1. Specified in the query
2. Explicit loading
3. Lazy loading
4. Eager loading
The first method is used in listing 3.3 and it references the related fields in the query and
selects them.
Listing 3.3: Loading related entities by including them in the query.
    var result = from e in companyContext.Employees
                 select new { Name = e.Name, Company = e.Company.Name };
In listing 3.4 the employee entity is loaded first, then the company navigational property
of the employee is explicitly loaded. The First() method returns only the first
employee in the result set. This method requires two round trips to the database to retrieve
the data.
Listing 3.4: Explicitly loading related entities after the query.
    var employee = (from e in companyContext.Employees
                    select e).First();
    employee.Company.Load();
If the lazy loading option is enabled in Entity Framework there is no need to explicitly
load the related entity. It is automatically loaded when it is accessed, like in listing 3.5. This
too requires two round trips to the database, and care must be taken when accessing a
navigational property. If, for example, lazy loading happens inside a loop iterating over a
list of employees, an SQL query will be executed for every iteration.
Listing 3.5: Loading related entities with lazy loading enabled.
    var employee = (from e in companyContext.Employees
                    select e).First();
    Company company = employee.Company;
The final method in listing 3.6 is eager loading. Here the Company related entity is
included just after the LINQ query. This only creates a single SQL query joining the tables
together.
Listing 3.6: Loading related entities before the query is executed.
    var result = (from e in companyContext.Employees
                  select e).Include("Company");
3.5 Change Tracking
When a change is made to an entity the change is automatically tracked by Entity Framework.
To persist the changes to the database the SaveChanges method is called on the context
object, as in listing 3.7.
Listing 3.7: Persisting the entity changes made to the object context.
    var employee = (from e in companyContext.Employees
                    select e).First();
    employee.Salary += 1000;
    companyContext.SaveChanges();
Chapter 4
Testability
The focus of the study has been on the subject of testability, specifically how to unit test
data access code.
Because one of the requirements was to use Microsoft Entity Framework 4.0 (EF4), a lot of
effort was put into finding information about testability when using EF4. In an article
published on MSDN, Scott Allen demonstrates some common unit testing techniques that
can be applied to EF4 [1]. Allen argues that extensive unit testing is a valuable tool for
development teams. However, the effort of creating these unit tests depends on the testability
of the code, which is why Entity Framework 4.0 was designed with testability in mind.
Allen presents two metrics that will always be exhibited by highly testable code. The
first one is observability. If a method is observable, it is easy to visually observe the output
of the method, with a given input. Methods with many side effects are hard to observe.
The other metric is isolation. When you unit test a method you only want to test the
logic inside the method. But if the method depends on some external resource, for example
a network socket or database, the unit test might fail if the resource is offline. The resource
might also take a very long time to respond, making the automated tests take a very long
time to run. To achieve testable code a separation of concerns should be maintained. This
concept was termed the Single responsibility principle by Robert C. Martin [11]. It is based
on the concept of cohesion and can be summarized as: “There should never be more than
one reason for a class to change”. In this case the logic should reside in one class or module
and the external resource access should reside in another. They can then be unit tested in
isolation.
These metrics presented by Allen are very basic metrics but they can easily be applied
to any newly developed code. The concept will also be repeated when other patterns are
discussed, so these metrics will be used to evaluate the resulting code of the project.
4.1 Repository
Allen goes on to explain some common abstractions that are useful for abstracting data
persistence. One very common abstraction is the Repository pattern. This design pattern
has been documented by Martin Fowler in his book Patterns of Enterprise Application
Architecture[5] and a short overview of the pattern can also be found on his website[6]. The
repository pattern is very commonly used, both for unit testing and other uses.
According to Fowler a repository “mediates between the domain and data mapping layers
using a collection-like interface for accessing domain objects”. Allen says that this isolates
the details of the data persistence, and that it fulfills the isolation principle required for
testability. However, he also adds that the interface for a repository shouldn’t contain any
operations for actually persisting the objects back to the data source. In the spirit of the
single responsibility principle, a separate structure should be used, and he presents the Unit
of Work pattern.
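The separation described above can be sketched as two small interfaces. This is only an illustration of the idea under stated assumptions, not the repository interface used later in the thesis; the names IRepository, IUnitOfWork, InMemoryRepository and the Apartment entity are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A plain entity with no persistence logic (hypothetical).
public class Apartment
{
    public int Id { get; set; }
    public decimal Area { get; set; }
}

// Collection-like access to domain objects. Note that the interface
// deliberately has no method for saving changes back to the data source.
public interface IRepository<T>
{
    IQueryable<T> All();
    void Add(T entity);
    void Remove(T entity);
}

// Persisting changes is a separate responsibility.
public interface IUnitOfWork
{
    void Commit();
}

// In-memory stand-in that a unit test can use instead of a database.
public class InMemoryRepository<T> : IRepository<T>
{
    private readonly List<T> items = new List<T>();

    public IQueryable<T> All() { return items.AsQueryable(); }
    public void Add(T entity) { items.Add(entity); }
    public void Remove(T entity) { items.Remove(entity); }
}
```

Because All() returns an IQueryable, code written against the interface can use the same LINQ syntax against the in-memory list in a test as it would against a database-backed implementation.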
4.2 Unit of Work
The Unit of Work pattern is also described by Martin Fowler in his book Patterns of
Enterprise Application Architecture[5] and on his website[6]. Allen mentions that the unit of
work pattern should be familiar to .NET developers, because it is used in
the ADO.NET DataSet class, which can handle update, delete and insert operations
on database table rows. The DataSet is however tightly coupled to database logic, while the goal is to isolate
the specifics of data persistence. This is why Allen argues that the unit of work pattern is
required.
The default behaviour in Entity Framework 4.0 is to create a class extending the ObjectContext class. The object context serves both as a repository for generated entities
and as a unit of work. There is no interface defined for the object context, which is a
problem when you want to achieve isolation and testability. Fortunately there exist several
extensions for generating entities which can be used instead of the default code generator.
Allen uses a template that generates POCO (Plain Old CLR Object) entities.
4.3 POCO
The POCO concept originates from the Java POJO classes (Plain Old Java Objects). A
POCO object is independent of the data source and contains only data and business logic.
This is known as Persistence Ignorance. According to Allen, code using POCO classes is
easier to test than code using entities that include information about persistence.
Julie Lerman describes two flavours of POCO entities on her blog[9]; the same
information can be found in her book Programming Entity Framework[10]. The first type is Data
Transfer Objects (DTO), which are unable to notify the object context of any changes made
to the entities. Changes are only detected just before a commit is made on the
context. Lazy loading is not possible with this POCO type.
If all the properties and associations of the POCO class are declared virtual, a second type
of POCO entity is possible. When the context creates the POCO entity it actually creates
a proxy class that overrides the members of the POCO class and provides feedback to the
context when the POCO is manipulated. This makes it possible for the context to intercept
access to an association that is not yet loaded. It can then be lazily loaded on demand.
One interesting detail is that Lerman puts the generated POCO entities in a separate
class library, which makes it possible to create different applications that are only connected by the POCO
entities.
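The proxy mechanism can be illustrated without Entity Framework. In the sketch below the proxy is written by hand, whereas the framework would generate it at run time; the EmployeeProxy class, its loadCompany delegate and the LoadCount counter are hypothetical and only serve to show why the POCO members need to be virtual.

```csharp
using System;

public class Company
{
    public virtual string Name { get; set; }
}

// POCO: data only, with no reference to any persistence framework.
public class Employee
{
    public virtual string Name { get; set; }
    public virtual Company Company { get; set; }
}

// Hand-written stand-in for a generated proxy (hypothetical). It overrides
// the virtual association and loads it the first time it is accessed.
public class EmployeeProxy : Employee
{
    private readonly Func<Company> loadCompany;
    private bool companyLoaded;

    // Counts how many times the association was loaded.
    public int LoadCount { get; private set; }

    public EmployeeProxy(Func<Company> loadCompany)
    {
        this.loadCompany = loadCompany;
    }

    public override Company Company
    {
        get
        {
            if (!companyLoaded)
            {
                base.Company = loadCompany();
                companyLoaded = true;
                LoadCount++;
            }
            return base.Company;
        }
        set
        {
            base.Company = value;
            companyLoaded = true;
        }
    }
}
```

Code that only knows the Employee type can read the Company property as usual; the load happens on first access and is not repeated afterwards.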
4.4 Mocking
One of the biggest obstacles when starting with unit testing is how to isolate a unit before testing
it. Tim Mackinnon, Steve Freeman and Philip Craig used mock objects to isolate units in
their paper[20]. According to them, what makes unit testing hard is that the units are tested
from the outside.
Using mock objects it is possible to test code in isolation. Mock objects replace the
application code with dummy classes that emulate the real objects, but provide a much
simpler implementation that can be set up with data relevant to the unit tests. If the mock
objects become too complex this is an indication that the application code itself is too
complex and requires refactoring.
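As a minimal sketch of this idea, the hand-written mock below replaces a data source with canned values and records how it was called. The interface IScoreSource and the classes ScoreSummer and MockScoreSource are hypothetical names chosen for this illustration; they do not come from the thesis or from any mocking library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The dependency that the unit under test talks to (hypothetical).
public interface IScoreSource
{
    IList<decimal> ScoresFor(int residentialUnitId);
}

// The unit under test: it contains logic only and no data access.
public class ScoreSummer
{
    private readonly IScoreSource source;

    public ScoreSummer(IScoreSource source)
    {
        this.source = source;
    }

    public decimal SumFor(int residentialUnitId)
    {
        return source.ScoresFor(residentialUnitId).Sum();
    }
}

// Hand-written mock: returns canned data and records the calls it receives.
public class MockScoreSource : IScoreSource
{
    public readonly List<int> RequestedIds = new List<int>();
    public IList<decimal> CannedScores = new List<decimal>();

    public IList<decimal> ScoresFor(int residentialUnitId)
    {
        RequestedIds.Add(residentialUnitId);
        return CannedScores;
    }
}
```

A test can then set CannedScores, call SumFor and check both the returned sum and the recorded ids, without any database being involved.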
4.5 Inversion of Control
In his examples Allen creates a class that depends on a unit of work. In listing 4.1 the
class creates the concrete unit of work itself. If the class instead depends on an interface,
another implementation of the unit of work can be created that has no database connection
and only uses hard-coded in-memory data, known as a fake class. To be able to switch
implementation, the creation of the unit of work is moved out of the constructor. The unit of
work is instead passed to the constructor and stored in a member variable, like in listing 4.2.
Listing 4.1: The Controller class depends on the UnitOfWork class.
    class Controller
    {
        UnitOfWork unitOfWork;

        Controller()
        {
            this.unitOfWork = new UnitOfWork();
        }
    }
Listing 4.2: Inversion of control is used to break the dependency.
class Controller
{
    IUnitOfWork unitOfWork;
    Controller(IUnitOfWork uow)
    {
        this.unitOfWork = uow;
    }
}
This is a very simple implementation of a pattern known as Dependency Injection. As
Allen mentions, this is only a simple example; a real project would use a more elaborate
method to automate the process of dependency injection.
When creating the data needed in the unit tests Allen creates a class that initializes test
data intended to be used across multiple test suites. This is a design pattern known as Object
Mother, described by Schuh and Punke[18]. They show that it can be a very useful pattern
for unit tests that require data that closely resembles real data. However, as mentioned by
Martin Fowler[4], it creates a strong coupling between tests that use the same test data.
Changes to a test that require the test data to change might affect other tests. The pattern
still seems very useful, but it is slightly outside the scope of this thesis.
4.6 Unit Testing
For unit testing Lerman uses mock contexts that implement the context interface[9]. Instead
of accessing a database the mock context returns mock object sets that read their data from
an internal list of POCO entities. Several mock contexts are created, for example one with
valid data and one with invalid data. This approach is similar to using the Object Mother
pattern mentioned in section 4.5, and it suffers from the same drawback: the tests become
strongly coupled through the shared test data.
The practical use of testability is the ability to unit test the code. In a paper[19],
R. Venkat Rajendran discusses the impact of testing in general and the benefits and drawbacks
of unit testing. One of the benefits is the ability to test one part of the code without having
to rely on other parts being available. This makes it possible for several programmers to
work on and create unit tests simultaneously. Unit testing also makes it possible to debug a
very confined piece of code, and to test special cases with state that is very
hard to set up for the whole program. The overall structure of the code is also improved
when unit testing is enforced. Unit testing is the most cost-effective type of testing, because
it occurs in the early stages of development.
One of the drawbacks of unit testing according to Rajendran is that it is
boring. The solution to this is to provide better tools to automate repetitive tasks. Another
problem is that documentation of test cases is rarely done in practice. This makes it hard to
modify existing test cases. Because many stubs have to be created in order for a unit test
to function, the test code is in many cases larger than the production code. Stub code can
have bugs as well.
Some of these drawbacks can be resolved, for example by enforcing code conventions that create
self-documenting code. Also, if the code has high testability the unit tests will be less
complex, reducing the number of bugs in the test code. The effort of writing full coverage
unit tests will always be great, and a careful decision has to be made whether the program is
important enough to justify such an effort.
Chapter 5
Result
This chapter describes the final implementation of the application. The module has a
service-oriented interface, shown in listing 5.1. There is one method for calculating the rent
for a set of apartments, given the id of a model, and one for calculating the rent for all
apartments. There are also events for receiving feedback about the calculation progress.
Listing 5.1: The calculation module interface.
public interface IObjectCalculationService
{
    event PumaCalculationService.ObjectService.ObjectCalculationService.CalculationProgressHandler CalculationProgress;
    event PumaCalculationService.ObjectService.ObjectCalculationService.CalculationEventHandler CalculationEvent;
    void CalculateObjects(IEnumerable<int> objectIDs, int modelID, string calculationName);
    void CalculateAllObjects(int modelID, string calculationName);
}
5.1 Overview
A conceptual overview of the architecture can be seen in figure 5.1. The business logic is
separated from the data access layer and only depends on the POCO entities. The generic
query repository uses specifications and fetch strategies to fetch entities from the context.
The resulting entities can then be used in the business logic module. The data access and
business logic are wrapped in a service layer that sits between the calculation
module as a whole and the service consumer, in this case a web application.
Figure 5.1: Conceptual overview of the system. (Diagram components: Service,
GenericRepository, Logic, Context, Specification, FetchStrategy, POCO.)
5.2 Entity Data Model
The core of the data access layer is the Entity Data Model. It is semi-automatically generated
from the current development database and each table is directly mapped to an entity object.
The foreign key relations are also included as associations between entities. Because of legacy
artefacts in the database some minor adjustments have to be made to the data model; for
example, relations without foreign key constraints have to be added manually.
A T4 template[12] is used to generate the context. T4 templates are a combination of
program code and a scripting language that is used to output program code. A mock context
and a mock object set are also generated, to allow mocking of dependencies in the unit tests.
Figure 5.2 shows how the real context and the mock context implement the same interface.
This allows for unit tests that replace the real data access with mock data access.
Figure 5.2: The real and the mock context implement the same interface. (Class diagram
elements: the interfaces IPumaModelContext and IObjectSet; PumaModelContext, whose
Entities() returns IObjectSet<Entity>; PumaModelContextMock, whose Entities() returns
MockObjectSet<Entity>; MockObjectSet; ObjectContext.)
5.3 POCO
The POCO entities were generated with the same template as the context. They are placed
in a separate project, having no references to any other project. This makes it possible to
write business logic that is not dependent on the data source. Although the POCO entities
are generated from a database, this is a one-time operation. When the classes are in place,
instances can be created at any time, without requiring a database connection.
5.4 Specification
A specification checks if an entity satisfies a certain condition. The condition is specified as a
LINQ expression. The same expression is used both to check if an in-memory entity satisfies
the specification and, in LINQ to Entities (section 3.3), to retrieve entities from the
database. This eliminates any duplicate code between accessing the in-memory model and
accessing the database, and it isolates the query expression so that it can be unit tested.
The most important method in listing 5.2 is IsSatisfiedBy. It determines if an entity
satisfies the LINQ expression in the specification. The Predicate property simply returns the
internal expression. There are also methods to combine specifications using boolean logic.
Listing 5.2: The specification interface
public interface ISpecification<T>
{
    Expression<Func<T, bool>> Predicate { get; }
    bool IsSatisfiedBy(T entity);
    ISpecification<T> And(ISpecification<T> other);
    ISpecification<T> Or(ISpecification<T> other);
}
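One way such an interface could be implemented is sketched below. The class name ExpressionSpecification and the helper ParameterReplacer are hypothetical and are not taken from the thesis code; the sketch assumes .NET 4.0 or later, where ExpressionVisitor is public. The predicate is compiled once for in-memory checks, while And and Or rebuild the expression tree over a single shared parameter, so that the combined Predicate stays a plain lambda that a LINQ provider may be able to translate.

```csharp
using System;
using System.Linq.Expressions;

public interface ISpecification<T>
{
    Expression<Func<T, bool>> Predicate { get; }
    bool IsSatisfiedBy(T entity);
    ISpecification<T> And(ISpecification<T> other);
    ISpecification<T> Or(ISpecification<T> other);
}

// Hypothetical implementation, for illustration only.
public class ExpressionSpecification<T> : ISpecification<T>
{
    private readonly Expression<Func<T, bool>> predicate;
    private readonly Func<T, bool> compiled;

    public ExpressionSpecification(Expression<Func<T, bool>> predicate)
    {
        this.predicate = predicate;
        this.compiled = predicate.Compile(); // used for in-memory checks
    }

    public Expression<Func<T, bool>> Predicate { get { return predicate; } }

    public bool IsSatisfiedBy(T entity) { return compiled(entity); }

    public ISpecification<T> And(ISpecification<T> other)
    {
        return Combine(other, Expression.AndAlso);
    }

    public ISpecification<T> Or(ISpecification<T> other)
    {
        return Combine(other, Expression.OrElse);
    }

    // Rewrites the other predicate to use this predicate's parameter,
    // then joins the two bodies with the given boolean operator.
    private ISpecification<T> Combine(ISpecification<T> other,
        Func<Expression, Expression, BinaryExpression> merge)
    {
        ParameterExpression parameter = predicate.Parameters[0];
        Expression otherBody =
            new ParameterReplacer(other.Predicate.Parameters[0], parameter)
                .Visit(other.Predicate.Body);
        return new ExpressionSpecification<T>(
            Expression.Lambda<Func<T, bool>>(merge(predicate.Body, otherBody), parameter));
    }

    private sealed class ParameterReplacer : ExpressionVisitor
    {
        private readonly ParameterExpression source;
        private readonly ParameterExpression target;

        public ParameterReplacer(ParameterExpression source, ParameterExpression target)
        {
            this.source = source;
            this.target = target;
        }

        protected override Expression VisitParameter(ParameterExpression node)
        {
            return node == source ? target : base.VisitParameter(node);
        }
    }
}
```

Rebinding the parameter, rather than wrapping the second lambda in an invocation, is a design choice made here because invocation expressions are not always supported by query providers.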
5.5 FetchStrategy
FetchStrategy is a very simple class that lists the associated entities that should be
loaded when the root entity is loaded. This is the same feature as Include in Entity
Framework (section 3.4), but wrapped in its own class. The fetch strategy is used together with
the specification when loading entities from the repository, as can be seen in listing 5.3.
5.6 Repository
The repository is based on a generic repository created by Will Beattie[2]. The basic idea
when loading an entity is to provide a specification of the same entity type. Only entities
satisfying the specification will be loaded. Part of the interface of the repository can be
found in listing 5.3. It contains methods to load a single entity matching a specification,
to load all entities matching a specification and to check if any entity exists that matches a
specification. In addition a FetchStrategy can be supplied. It determines which associated
entities should be loaded as well. This allows Entity Framework to load the associated
entities joined in a single query, which decreases the number of queries required and thus
increases performance.
Listing 5.3: An excerpt from the generic repository interface
public interface IGenericQueryRepository
{
    T Load<T>(ISpecification<T> spec) where T : class;
    IEnumerable<T> LoadAll<T>(ISpecification<T> spec) where T : class;
    bool Matches<T>(ISpecification<T> spec) where T : class;
    T Load<T>(ISpecification<T> spec, IFetchStrategy<T> fetchStrategy)
        where T : class;
    ...
}
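To illustrate how a repository can apply a specification, the following minimal sketch runs the three specification-only methods against in-memory lists instead of an Entity Framework context. Everything in it is an assumption made for illustration: the class InMemoryQueryRepository, its Register method, the Spec helper and the Apartment entity are hypothetical, the specification interface is reduced to its Predicate member, fetch strategies are left out, and Load is assumed to return the first match or null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Reduced to the single member the repository needs.
public interface ISpecification<T>
{
    Expression<Func<T, bool>> Predicate { get; }
}

// Hypothetical helper that wraps a LINQ expression.
public sealed class Spec<T> : ISpecification<T>
{
    private readonly Expression<Func<T, bool>> predicate;
    public Spec(Expression<Func<T, bool>> predicate) { this.predicate = predicate; }
    public Expression<Func<T, bool>> Predicate { get { return predicate; } }
}

// Hypothetical entity used only in this example.
public class Apartment
{
    public int Id { get; set; }
    public bool Active { get; set; }
}

// Hypothetical repository backed by in-memory collections.
public class InMemoryQueryRepository
{
    private readonly Dictionary<Type, object> sets = new Dictionary<Type, object>();

    public void Register<T>(IEnumerable<T> entities) where T : class
    {
        sets[typeof(T)] = entities.AsQueryable();
    }

    private IQueryable<T> Set<T>() where T : class
    {
        return (IQueryable<T>)sets[typeof(T)];
    }

    // Assumption: returns the first match, or null if there is none.
    public T Load<T>(ISpecification<T> spec) where T : class
    {
        return Set<T>().Where(spec.Predicate).FirstOrDefault();
    }

    public IEnumerable<T> LoadAll<T>(ISpecification<T> spec) where T : class
    {
        return Set<T>().Where(spec.Predicate).ToList();
    }

    public bool Matches<T>(ISpecification<T> spec) where T : class
    {
        return Set<T>().Any(spec.Predicate);
    }
}
```

Because the predicate is passed to Queryable.Where as an expression tree, the same calls would work unchanged against any other IQueryable source that can translate the expression.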
5.7 Calculation
The logic module handles all aspects of the rent calculation. The class dependency diagram can
be seen in figure 5.3. It is important to note that all dependencies are actually dependencies
on interfaces.
Another important point is that the module is not aware of any part of the data access
modules. It uses the POCO entities as if they were in-memory object graphs.
Figure 5.3: The dependency between classes in the calculation module. (Classes shown:
Calculator, ObjectCalculator, FormulaCalculator, RentCalculator, ModelLayoutCalculator,
AdjustmentValueCalculator, CalcValueCalculator, DependencyCalculator, OPICalculator.)
5.8 Dependencies
Each dependency between two classes is implemented using dependency injection. Instead of
instantiating an object the usual way, as in listing 5.4, a factory method is used in listing 5.5
to instantiate the dependency.
Listing 5.4: Ordinary object instantiation.
public class FormulaCalculator : IFormulaCalculator
{
    public decimal MyMethod()
    {
        IModelLayoutCalculator modelLayoutCalculator =
            new ModelLayoutCalculator();
        ...
    }
}
Listing 5.5: Dependency injection using a factory lambda expression.
public class FormulaCalculator : IFormulaCalculator
{
    public Func<IModelLayoutCalculator> ModelLayoutCalculatorFactory =
        () => new ModelLayoutCalculator();
    public decimal MyMethod()
    {
        IModelLayoutCalculator modelLayoutCalculator =
            ModelLayoutCalculatorFactory();
        ...
    }
}
The factory method in listing 5.5 may look complex if you are unfamiliar with the syntax,
but it is simply a first-class function stored in the member variable
ModelLayoutCalculatorFactory. The function is assigned a default value that uses the C#
language feature of lambda expressions to create a method that takes no input (the empty
parentheses) and returns a new instance of the ModelLayoutCalculator class. To invoke the
function the variable name is used, followed by parentheses.
This implementation differs from the one found in the study in section 4.5. In this case
the only reason for using dependency injection is to replace the real dependency with a mock
object. In the actual application the dependencies are hard-coded. The factory
methods therefore allow a default dependency to be implemented, and this makes the classes
easier to use, because the dependencies do not have to be sent to the constructors. If
dependency injection is used to allow different implementations in the actual production code
this method is likely insufficient.
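The following self-contained sketch shows the factory approach end to end, including how a test can swap the default dependency for a hand-written fake. The members GetPoints and DoublePoints, the class FakeModelLayoutCalculator and the returned values are hypothetical simplifications and do not come from the thesis code.

```csharp
using System;

public interface IModelLayoutCalculator
{
    decimal GetPoints(); // hypothetical, simplified member
}

public class ModelLayoutCalculator : IModelLayoutCalculator
{
    // Stands in for the real calculation.
    public decimal GetPoints() { return 10m; }
}

public class FormulaCalculator
{
    // Default factory: production code gets the real dependency
    // without passing anything to the constructor.
    public Func<IModelLayoutCalculator> ModelLayoutCalculatorFactory =
        () => new ModelLayoutCalculator();

    public decimal DoublePoints()
    {
        IModelLayoutCalculator calculator = ModelLayoutCalculatorFactory();
        return 2 * calculator.GetPoints();
    }
}

// Hand-written fake that returns a fixed value.
public class FakeModelLayoutCalculator : IModelLayoutCalculator
{
    public decimal GetPoints() { return 1.5m; }
}

public static class FactoryExample
{
    public static decimal WithDefaultDependency()
    {
        return new FormulaCalculator().DoublePoints();
    }

    public static decimal WithFakeDependency()
    {
        FormulaCalculator target = new FormulaCalculator();
        // Replace the factory so the class under test receives the fake.
        target.ModelLayoutCalculatorFactory = () => new FakeModelLayoutCalculator();
        return target.DoublePoints();
    }
}
```

The public, mutable factory field keeps production code simple, at the cost of letting any caller replace the dependency at any time.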
5.9 Data Access
The generic repository and specification are only one way to load the data. To measure the
performance of this approach four other loading methods have been implemented to be used
as a reference.
– Using LINQ to query directly against the object sets, as described in chapter 3.
– Using a SQL query string with ADO.NET.
– Using a stored procedure and calling it using ADO.NET.
– Using Entity Framework function import to create a strongly typed result object from
the stored procedure.
5.10 Data Persistence
To save the result of the calculation to the database, instances of the POCO classes that are
to be saved are created and added to the object context. The result is then
persisted to the database by Entity Framework.
The test in section 6.4.3 showed that this method was highly inefficient and had to be
abandoned. Instead the SqlBulkCopy[13] class was used, which can efficiently copy data from
any data source to a database.
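A minimal sketch of how results might be written with SqlBulkCopy is shown below. It assumes the System.Data.SqlClient namespace of the .NET Framework; the CalculationResult class, the ResultWriter helper, the table name dbo.CalculationResult and the column names are hypothetical and are not taken from the thesis. The results are first copied into a DataTable, which is then sent to the server in one bulk operation.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.SqlClient;

// Hypothetical result row, for illustration only.
public class CalculationResult
{
    public int ObjectId { get; set; }
    public decimal Points { get; set; }
}

public static class ResultWriter
{
    // Copies the results into a DataTable whose columns match the target table.
    public static DataTable ToDataTable(IEnumerable<CalculationResult> results)
    {
        DataTable table = new DataTable();
        table.Columns.Add("ObjectId", typeof(int));
        table.Columns.Add("Points", typeof(decimal));
        foreach (CalculationResult result in results)
        {
            table.Rows.Add(result.ObjectId, result.Points);
        }
        return table;
    }

    // Writes all rows to the (assumed) destination table in one bulk operation.
    public static void Save(string connectionString, IEnumerable<CalculationResult> results)
    {
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connectionString))
        {
            bulkCopy.DestinationTableName = "dbo.CalculationResult"; // assumed table name
            bulkCopy.ColumnMappings.Add("ObjectId", "ObjectId");
            bulkCopy.ColumnMappings.Add("Points", "Points");
            bulkCopy.WriteToServer(ToDataTable(results));
        }
    }
}
```

Unlike saving through the object context, this approach does not track the written rows, so any entities already loaded in memory are left unchanged.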
5.11 Unit Tests
All these techniques come together in the unit tests. Because the classes are decoupled from
both the data access layer and from each other the unit tests become very simple to write.
The framework used for unit testing is Visual Studio Unit Testing Framework [15]. This
framework is built into Microsoft Visual Studio.
5.11.1 Testing Data Access
The generic repository only has to be tested once, because it does not have to change when new
entities are added. What remains is to test the specifications. The specification in listing 5.6
is only satisfied by apartments (objects) that are active. In this case an apartment is active
if its isInActive attribute is null or false. There are three possible states for apartments:
1. isInActive = null should satisfy the specification.
2. isInActive = false should satisfy the specification.
3. isInActive = true should not satisfy the specification.
Each state can now be tested in a unit test. The only dependency that the specification has is
on POCO entities. Recall from section 5.3 that POCO entities have no dependencies at all.
Listing 5.6: Specification for an active apartment.
public class ObjectIsActiveSpecification : SpecificationBase<PumaPOCO.Object>
{
    public ObjectIsActiveSpecification()
    {
        predicate = obj => !obj.isInActive.HasValue || obj.isInActive.Value == false;
    }
}
Listing 5.7: Unit testing the Specification for an active apartment.
[TestMethod()]
public void ObjectIsActiveSpecification_should_match_object_with_isInActive_null()
{
    PumaPOCO.Object obj = new PumaPOCO.Object()
    {
        isInActive = null
    };
    ObjectIsActiveSpecification target = new ObjectIsActiveSpecification();
    bool expected = true;
    bool actual = target.IsSatisfiedBy(obj);
    Assert.AreEqual(expected, actual, "Object should satisfy specification");
}
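The remaining two states can be checked in the same way. The sketch below covers all three states with the predicate from listing 5.6, without any test framework; the Apartment class is a hypothetical stand-in for PumaPOCO.Object, and the ActiveApartment helper is likewise an assumption made to keep the example self-contained.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical stand-in for the generated PumaPOCO.Object entity.
public class Apartment
{
    public bool? isInActive { get; set; }
}

public static class ActiveApartment
{
    // Same condition as in listing 5.6: active when isInActive is null or false.
    public static readonly Expression<Func<Apartment, bool>> Predicate =
        obj => !obj.isInActive.HasValue || obj.isInActive.Value == false;

    private static readonly Func<Apartment, bool> Compiled = Predicate.Compile();

    public static bool IsSatisfiedBy(Apartment apartment)
    {
        return Compiled(apartment);
    }

    // Returns true when all three states behave as listed above.
    public static bool AllStatesBehaveAsExpected()
    {
        bool nullIsActive = IsSatisfiedBy(new Apartment { isInActive = null });
        bool falseIsActive = IsSatisfiedBy(new Apartment { isInActive = false });
        bool trueIsActive = IsSatisfiedBy(new Apartment { isInActive = true });
        return nullIsActive && falseIsActive && !trueIsActive;
    }
}
```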
5.11.2 Test Data
Because the test cases are so isolated, the amount of test data required for each
unit test is in most cases very small. Instances of POCO entities are created on the fly in the
test method and sent as parameters to the method under test.
5.11.3 Mocking
Using the technique in section 5.8 it is possible to replace the real implementation of a
dependency with a fake, or mock object. One way of doing this is to implement the same
interface and replace the factory so that it returns the mock object. In this case an external
library called Moq[8] is used. It is a library that makes it possible to easily implement an
interface on the fly. By default each method will return the default value of the return type,
for example null for all reference types. Specific methods can then be overridden to return any
value. A short example can be seen in listing 5.8.
Listing 5.8: A mock implementation of ICalcValueCalculator is created and the
IsObjectMatchingCalcValue method is overridden to always return true for any input.
[TestMethod()]
public void MyTest()
{
    Mock<ICalcValueCalculator> calcValueCalculatorMock =
        new Mock<ICalcValueCalculator>();
    calcValueCalculatorMock.Setup(x => x.IsObjectMatchingCalcValue(
        It.IsAny<ICalculationObject>(),
        It.IsAny<CalcValue>())
    ).Returns(true);
}
5.11.4 Example
An example of a test method from the project can be seen in listing 5.11. The test
tests the FormulaCalculator method GetFormulaCalculatedPointsForObject, whose interface
can be seen in listing 5.9. The purpose of this class is to take an apartment
(ICalculationObject), a model and a formula and calculate the score for the apartment.
Because each class is supposed to have only a single responsibility (chapter 4) this class
only takes the score for each formula alias (A, B, C) and substitutes them into the formula to
calculate the final score. The rest of the calculation is performed by another class, through an
interface called IModelLayoutCalculator. The method that calculates the points for each alias
is called GetPointsForRootModelLayouts. The interface can be seen in listing 5.10.
The first thing to do in the unit test in listing 5.11 is to set up the test data. Because
POCO entities are used they can simply be created on the fly, and only the relevant data
has to be initialized. For example the model and object will never be read, so no fields have
to be initialized. The formula is initialized to a + 2 × b.
The next step is to hard-code a return value for the GetPointsForRootModelLayouts
method, because the purpose of this test is to test the FormulaCalculator, not
any other classes. Using Moq[8] a mock object is created and the method is set up to return
the hard-coded values a = 1 and b = 2.
Using the inversion of control factory method the FormulaCalculator is set up to use
the mock object instead of the real implementation.
Now that everything is set up the actual method that is to be tested can be called, and
the returned value should be 1 + 2 × 2 = 5.
Listing 5.9: The interface IFormulaCalculator implemented by FormulaCalculator.
public interface IFormulaCalculator
{
    decimal GetFormulaCalculatedPointsForObject(ICalculationObject obj, ICalculationModel model, Formula formula);
}
Listing 5.10: An excerpt from the interface IModelLayoutCalculator implemented by
ModelLayoutCalculator.
public interface IModelLayoutCalculator
{
    Dictionary<string, decimal> GetPointsForRootModelLayouts(ICalculationObject obj, ICalculationModel model);
    ...
}
Listing 5.11: A test method for the formula calculator.
[TestMethod()]
public void Should_calculate_the_points_bases_on_the_formula()
{
    // Setup test data.
    ICalculationModel model = new CalculationModel();
    ICalculationObject obj = new CalculationObject();
    Formula formula = new Formula()
    {
        Formula1 = "a + 2 * b"
    };
    // Instead of the model layout calculator calculating the points the result is hard coded.
    Dictionary<string, decimal> points = new Dictionary<string, decimal>();
    points.Add("a", 1);
    points.Add("b", 2);
    // Create a mock of the model layout calculator that returns the hard coded points.
    Mock<IModelLayoutCalculator> mockModelLayoutCalculator = new Mock<IModelLayoutCalculator>();
    mockModelLayoutCalculator.Setup(m => m.GetPointsForRootModelLayouts(
        It.IsAny<ICalculationObject>(),
        It.IsAny<ICalculationModel>())
    ).Returns(points);
    // Use the inversion of control factory to make the formula calculator use the mock
    // object instead of the real object.
    FormulaCalculator target = new FormulaCalculator();
    target.ModelLayoutCalculatorFactory = () => mockModelLayoutCalculator.Object;
    decimal expected = 5;
    // Make the call to the method under testing.
    decimal actual = target.GetFormulaCalculatedPointsForObject(obj, model, formula);
    // Assert that the returned value is the expected one.
    Assert.AreEqual(expected, actual, "The formula calculated points are incorrect.");
}
Almost all unit tests are based on the layout of the test in listing 5.11. Sometimes not all
steps are necessary, for example if a class has no dependencies. The following are the steps
used:
1. Create test data.
2. Create mock objects that return the test data.
3. Replace the real dependencies with mock objects.
4. Call the method that is to be tested.
5. Assert that the return value is the expected value.
Chapter 6
Performance
The purpose of the performance measurement is to determine how well the application
performs when the amount of data is scaled up. The different loading methods in section 5.9
are compared.
6.1 Test Data
The test data sets are based on a customer database with 8723 apartments. The apartments
were duplicated or removed to create databases of different sizes, containing
100, 200, 500, 1000, 2000, 5000, 10000, 20000, 50000, 100000 and 200000 apartments.
The numbers were chosen based on the size of existing customer databases and the expected
size of future customers.
6.2 Test Application
A test application was created to test the performance. The time to perform the calculation,
including loading all data and storing the result in memory, was measured separately from the
time to save the result to the database. The data persistence is the same for each test and
therefore not interesting in itself. However, to be able to compare the new calculator to the
legacy one the save time has to be measured as well.
The test application measures the number of apartments that were calculated each second.
This is used to see if the calculator's performance changes over time.
The memory usage was also measured. A separate thread was used that wakes up every
second and polls the garbage collector for the amount of memory currently allocated.
Before each poll the garbage collector releases all unreferenced memory.
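A sampler of this kind could look roughly like the sketch below. The class name MemorySampler and its members are hypothetical; the thesis does not show the actual implementation. The sketch relies on GC.GetTotalMemory(true), which forces a collection before reporting the number of allocated bytes, and keeps track of the largest value seen.

```csharp
using System;
using System.Threading;

// Hypothetical sampler: polls the managed heap size on a background thread.
public sealed class MemorySampler : IDisposable
{
    private readonly ManualResetEvent stop = new ManualResetEvent(false);
    private readonly TimeSpan interval;
    private readonly Thread thread;
    private long peakBytes;

    public MemorySampler(TimeSpan interval)
    {
        this.interval = interval;
        thread = new Thread(Run) { IsBackground = true };
        thread.Start();
    }

    // Largest number of allocated bytes observed so far.
    public long PeakBytes
    {
        get { return Interlocked.Read(ref peakBytes); }
    }

    private void Run()
    {
        do
        {
            // true = collect unreferenced memory before measuring.
            long current = GC.GetTotalMemory(true);
            if (current > Interlocked.Read(ref peakBytes))
            {
                Interlocked.Exchange(ref peakBytes, current);
            }
        }
        while (!stop.WaitOne(interval)); // sleep until the next poll or until stopped
    }

    public void Dispose()
    {
        stop.Set();
        thread.Join();
        stop.Dispose();
    }
}
```

A measurement run would create the sampler with an interval of one second before starting the calculation and read PeakBytes after disposing it.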
6.3 Execution
Each implementation is run ten times for each database. The first execution takes longer
and is discarded, because Entity Framework performs some initialization the first
time it is invoked. The mean of the remaining nine values is used as the result.
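The procedure can be sketched as a small timing harness. The Benchmark class and its method names are hypothetical and only illustrate the idea of timing repeated runs and averaging all but the first.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical timing harness, for illustration only.
public static class Benchmark
{
    // Runs the action the given number of times and returns each duration in seconds.
    public static List<double> TimeRuns(Action action, int runs)
    {
        List<double> seconds = new List<double>();
        for (int i = 0; i < runs; i++)
        {
            Stopwatch watch = Stopwatch.StartNew();
            action();
            watch.Stop();
            seconds.Add(watch.Elapsed.TotalSeconds);
        }
        return seconds;
    }

    // Discards the first (warm-up) run and averages the rest.
    // Throws InvalidOperationException if fewer than two timings are supplied.
    public static double MeanExcludingFirst(IEnumerable<double> timings)
    {
        return timings.Skip(1).Average();
    }
}
```

With ten runs, MeanExcludingFirst(TimeRuns(calculation, 10)) gives the mean of the last nine, where calculation stands for whichever loading method is being measured.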
6.4 Result
6.4.1 Calculation time
The average number of calculated apartments per second is shown in figure 6.1. The function
import, inline query and stored procedure methods perform equally, with a peak performance at
about 10000 apartments. This means that the calculation is not completely linear, and for
larger numbers of apartments the performance rapidly decreases. The reason for this is that
Entity Framework performs some automatic linking of related entities that is not executed
in constant time.
The LINQ and Specification methods also perform equally, but compared to the other
methods their performance is far worse. They were also unable to calculate more than 50000
apartments, so no result is recorded for bigger data sets.
Figure 6.1: Comparison between the different data loading methods. (Plot of calculations per
second (n/s) against the number of apartments (n) for the Function Import, Inline Query, LINQ,
Specification and Stored Procedure methods.)
6.4.2 Memory Usage
The memory usage of each method for different numbers of calculated apartments can
be seen in figure 6.2. The specification method is the most memory intensive, with a peak
memory usage of 361 MiB when calculating 50000 apartments. The LINQ method uses almost
200 MiB for the same calculation and the rest use only about 20 MiB. Even for 200000
apartments their memory usage is only 50 MiB.
Figure 6.2: Comparison between the memory usage of the different data loading methods. (Plot
of peak memory (MiB) against the number of apartments (n) for the Function Import, Inline
Query, LINQ, Specification and Stored Procedure methods.)
6.4.3 Persistence
Early versions used the data persistence feature of Entity Framework to save the result of
the calculation to the database. An early test run showed that for large amounts of data
the time to save the result was greater than the time for the actual calculation. The result of
the test run can be seen in figure 6.3. Saving the result of 200000 apartments took 35 minutes.
The memory usage was also extremely high; figure 6.4 shows a peak memory usage of 963 MiB.
The persistence step starts at about 1000 seconds and continues until the end of execution.
Figure 6.3: Performance of using the function import method for calculation and the Entity
Framework persistence feature to save the result. (Plot of calculation time and save time in
seconds against the number of apartments (n).)
Figure 6.4: The memory usage when saving using Entity Framework. (Plot of memory usage (MiB)
against elapsed time (s) for the Function Import method.)
The final program uses the SqlBulkCopy[13] class, and saving the same amount of
data takes only 14 seconds with a memory usage of only a few kibibytes. This method is not
as flexible, however, and the records persisted to the database are not automatically updated
on the client side, but have to be loaded again manually.
6.4.4 Legacy Calculator
A comparison was made between the old calculator and one of the most efficient methods,
function import. This comparison includes the time to persist the result to the database.
The result can be seen in figure 6.5. The old calculator calculates only five apartments per
second, while the new one, using function import and the SqlBulkCopy class, has a peak
of 762 apartments per second.
Figure 6.5: Comparison between the function import method and the old calculator. (Plot of
calculations per second (n/s) against the number of apartments (n).)
Chapter 7
Conclusions
Owing to lack of time, the external supervisor was unable to write a formal
specification, so it took some time to work out the exact scope of the thesis.
Despite the slow start the work went smoothly and the project was finished only
one week behind schedule.
The resulting application is fully functional and it performs the same task as the old
program but over a hundred times faster (a peak of 762 apartments per second compared to
five, see section 6.4.4). A big improvement in performance was expected,
but the result still exceeded the expectations. The module has over 200 unit tests and only
time can tell if it is easily maintained, but it has a better chance than the legacy
application.
The main goal of the thesis was to find a method to incorporate unit testing into the
development cycle. It turned out that the main problem was not to write test cases; it was
to write code that is easy to test. By abstracting away the database access and adhering to
the principles of observability, isolation and single responsibility, writing unit tests will be
a lot more feasible in the future.
Because of the unit tests some of the bugs introduced when adding new features to the
program will be avoided. Smaller and less coupled classes will also make it possible to reuse
tried and tested classes, avoiding the need to modify classes and the risk of introducing new
bugs. What is missing is integration tests that make sure that the module as a
whole is still working after modifications have been carried out.
Another main topic was how to test data access code. This turned out to be the hardest
part where several approaches had to be completely abandoned. It was either too much
effort to write the tests or the tests were useless. The final solution of using specifications is
a good compromise and, at least in theory, the whole concept has a lot of potential.
The evaluation of Entity Framework 4.0 showed that almost all code can be automatically
generated from the database, minimizing the effort needed to bring the database into
object-oriented code. The performance, however, is poor for large sets of data. Thankfully it is
possible to optimize the bottlenecks by replacing them with stored procedures. It would have
been interesting to compare Entity Framework with more mature ORM frameworks, most
notably NHibernate[17].
The unit tests created are very useful, but there is also a need for integration tests to test
the interactions of units. A big challenge here is to maintain test data that can be updated
together with the application. This is another topic that would be interesting to explore.
7.1 Limitations
The main limitation of the module is that it is not yet integrated into the graphical user
interface of the rest of the application. More issues will probably have to be considered when
the module is integrated with user input. There is a feature to select only a subset of the
apartments to be used in the calculation, but its performance does not compare to loading
all apartments at once. A better solution has to be found for creating this subset.
7.2 Future Work
Many things, such as maintainability, cannot be evaluated before the module starts to expand.
It is also not known how much effort is required to maintain the code and keep all
test cases up to date.
Because unit tests only test each unit in isolation, the test suite will not detect errors
that occur when units are interacting. It would be possible to create a suite of integration
tests that test the service layer, because it has a well-defined interface. These tests require
another database with test data that has to be maintained when the application changes.
Chapter 8
Acknowledgements
I would like to thank TRIMMA for the opportunity of doing this project and my external
supervisor Mattias Blom and the other employees at TRIMMA for their feedback. A thanks
also to my internal supervisor at Umeå University, Jan-Erik Moström.
References
[1] Scott Allen. Testability and Entity Framework 4.0. http://msdn.microsoft.com/en-us/library/ff714955.aspx (visited 2012-05-21).
[2] Will Beattie. Specification Pattern, Entity Framework & LINQ. http://blog.willbeattie.net/2011/02/specification-pattern-entity-framework.html (visited 2012-06-01).
[3] SABO Sveriges Allmännyttiga Bostadsföretag. Sätt rätt hyra, handledning i systematisk hyressättning, 2010.
[4] Martin J. Fowler. ObjectMother. http://martinfowler.com/bliki/ObjectMother.html (visited 2012-05-22).
[5] Martin J. Fowler. Patterns of Enterprise Application Architecture. Addison-Wesley Professional, 2002.
[6] Edward Hieatt and Rob Mee. Repository. http://martinfowler.com/eaaCatalog/repository.html (visited 2012-05-22).
[7] Hyressättningsutredningen. SOU 2004:91 Reformerad hyressättning. Socialdepartementet, September 2004.
[8] Clarius Consulting Labs. Moq. http://code.google.com/p/moq/ (visited 2012-06-05).
[9] Julia Lerman. Agile Entity Framework 4 Repository. http://thedatafarm.com/blog/data-access/agile-entity-framework-4-repository-part-1-model-and-poco-classes/ (visited 2012-05-23).
[10] Julia Lerman. Programming Entity Framework. O’Reilly Media, 2009.
[11] Robert C. Martin. The single responsibility principle. Principles of Object Oriented Design, 2002.
[12] Microsoft. Code Generation and T4 Text Templates. http://msdn.microsoft.com/en-us/library/bb126445.aspx (visited 2012-07-30).
[13] Microsoft. SqlBulkCopy Class. http://msdn.microsoft.com/en-us/library/system.data.sqlclient.sqlbulkcopy.aspx (visited 2012-06-25).
[14] Microsoft. The ADO.NET Entity Framework Overview. http://msdn.microsoft.com/en-us/library/aa697427(v=vs.80).aspx (visited 2012-06-07).
[15] Microsoft. Unit testing framework. http://msdn.microsoft.com/en-us/library/ms243147(v=vs.80).aspx (visited 2012-08-01).
[16] MyGeneration. The dOOdads .NET Architecture. http://www.mygenerationsoftware.com/portal/dOOdads/Overview/tabid/63/Default.aspx (visited 2012-06-07).
[17] NHibernate. NHibernate. http://nhforge.org/Default.aspx (visited 2012-08-01).
[18] Peter Schuh and Stephanie Punke. ObjectMother - easing test object creation in XP. XP Universe, 2003.
[19] R. Venkat Rajendran. White paper on unit testing. Deccanet Designs Ltd., 2002.
[20] Tim Mackinnon, Steve Freeman and Philip Craig. Endo-testing: Unit testing with mock objects. XP eXamined, 2000.