Batch Processing How-To. Stefan Rufer, Netcetera; Matthias Markwalder, SIX Card Solutions

Batch Processing How-To
Or “The Single-Threaded Batch Processing Paradigm”
Stefan Rufer, Netcetera
Matthias Markwalder, SIX Card Solutions
6840
2
Speakers
> Stefan Rufer
– Studied business IT at the University of Applied Sciences in Bern
– Senior Software Engineer at Netcetera
– Main interest: Server-side application development using JEE
> Matthias Markwalder
– Graduated from ETH Zurich
– Senior Developer + Framework Responsible at SIX Card Solutions
– Main interest: High performance and quality batch processing
3
Why are we here?
> Let's learn how to bake an omelet.
4
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
5
What do we do
> Credit / debit card transaction processing
> Backoffice batch processing application, 24x7x365
> 1.7 million card transactions a day
> Volume will double by end of 2010 → be ready…
> Migrated from Forté UDS to JEE
> More agile code base now
6
How do we do it
> Transactional integrity at any time
> Custom batch processing framework (not Spring Batch)
> 1 controller → builds the jobs
35 workers → process the steps of jobs
(or as many as you want and your system can take)
> 1 application server (12 cores)
> 1 database server (12 cores, 1.5 TB SAN)
7
Batch Processing Basics
> It's simple, but parallel:
– Read file(s)
– Process a bit
– Write file(s)
> Terminology from Spring Batch
8
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
9
Bake an omelet
> 200 g flour, 3 eggs, 2 dl milk, 2 dl water, ½ tablespoon salt
> Stir well, wait 30 min
> Stir again
> Put a little butter in a heated pan
> Add 1 dl dough
> Bake until slightly brown, flip over, bake again half as long
> Put cheese / marmalade / applesauce / ... on top, fold
> Enjoy
10
Jobs run in parallel
Motivation
> Load balancing
Example
> Complete yesterday's reports while doing today's business
How to achieve
> Use a batch scheduling application that controls your entire processing.
> Read/modify categorization of jobs
12
Load limitations
Motivation
> Load balancing
Example
> Generate 70 reports, but max 20 in parallel
How to achieve
> Number of workers one job can use
> Priorities of the steps of a job
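The two knobs above (a worker cap per job, priorities per step) can be sketched as a small controller-side queue. This is only an illustration under assumptions: `CappedStepQueue`, `PendingStep` and all method names are hypothetical and not taken from the framework described in this deck.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical controller-side queue: hands out steps by priority, capped per job. */
class CappedStepQueue {

    /** A step waiting for a worker; a higher priority value is served first. */
    static final class PendingStep {
        final String jobId;
        final String name;
        final int priority;

        PendingStep(String jobId, String name, int priority) {
            this.jobId = jobId;
            this.name = name;
            this.priority = priority;
        }
    }

    private final List<PendingStep> pending = new ArrayList<>();
    private final Map<String, Integer> maxWorkersPerJob = new HashMap<>();
    private final Map<String, Integer> runningPerJob = new HashMap<>();

    void registerJob(String jobId, int maxWorkers) {
        maxWorkersPerJob.put(jobId, maxWorkers);
    }

    void submit(PendingStep step) {
        pending.add(step);
    }

    /** Called for an idle worker: best pending step whose job is still below its cap. */
    Optional<PendingStep> poll() {
        Optional<PendingStep> best = pending.stream()
                .filter(s -> runningPerJob.getOrDefault(s.jobId, 0)
                        < maxWorkersPerJob.getOrDefault(s.jobId, Integer.MAX_VALUE))
                .max(Comparator.comparingInt((PendingStep s) -> s.priority));
        best.ifPresent(s -> {
            pending.remove(s);
            runningPerJob.merge(s.jobId, 1, Integer::sum);
        });
        return best;
    }

    /** Called when a worker reports a step as done; frees one slot of the job. */
    void finished(PendingStep step) {
        runningPerJob.merge(step.jobId, -1, Integer::sum);
    }
}
```

With a job registered at a cap of 20 and 70 report steps submitted, `poll()` hands out 20 steps and then returns an empty `Optional` until `finished(...)` frees a slot.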
13
Decouple controller + workers
Motivation
> Scalability
Example
> SETI@home
14
Step trees, Sequential, Fail on Exception
Motivation
> Avoid structuring steps in code
Example
> Collect data, afterwards write a file.
How to achieve
> Sequential execution
> Fail on exception (rollback entire step)
Step trees, Parallel, Continue on Exception
Motivation
> Minimize work left
Example
> Process 30'000 transactions in 3 steps.
How to achieve
> Parallel execution
> Continue on exception (still rollback entire step)
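The two step-tree flavours can be contrasted in a few lines. This is a minimal sketch, assuming hypothetical names (`StepNode`, `SequentialSteps`, `ParallelSteps`); the per-step rollback is left out, and the "parallel" node simply loops over its children here, whereas a real framework would hand them to separate workers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical node of a step tree. */
interface StepNode {
    void run() throws Exception;
}

/** Sequential execution, fail on exception: the first failure aborts the remaining children. */
class SequentialSteps implements StepNode {
    private final List<StepNode> children;

    SequentialSteps(StepNode... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public void run() throws Exception {
        for (StepNode child : children) {
            child.run(); // an exception propagates and skips all later children
        }
    }
}

/** Independent children, continue on exception: failures are collected, siblings still run. */
class ParallelSteps implements StepNode {
    private final List<StepNode> children;
    private final List<Exception> failures = new ArrayList<>();

    ParallelSteps(StepNode... children) {
        this.children = Arrays.asList(children);
    }

    @Override
    public void run() {
        // Assumption: run in a loop for illustration; real workers would run these concurrently.
        for (StepNode child : children) {
            try {
                child.run();
            } catch (Exception e) {
                failures.add(e); // only this child failed; the others minimize the work left
            }
        }
    }

    List<Exception> failures() {
        return failures;
    }
}
```

For the 30'000-transaction example, a failure in one of the three steps leaves only that step's share to be rerun.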
15
16
Parallelize reading
Motivation
> Speedup
Example
> A file of 200'000 credit card authorisations and transactions has to be read into the database.
How to achieve
> Cut the input file into pieces of 10'000 lines each.
– btw: perl, sort are unbeaten for this...
> Process each piece in a parallel step.
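The slide recommends perl or sort for the cutting itself. If it has to happen in Java, a minimal sketch could look like the following; `InputFileCutter`, the `.partN` naming and the UTF-8 encoding are assumptions made for this illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: cuts an input file into pieces of at most linesPerPiece lines. */
class InputFileCutter {

    static List<Path> cut(Path input, Path outputDir, int linesPerPiece) throws IOException {
        if (linesPerPiece <= 0) {
            throw new IllegalArgumentException("linesPerPiece must be positive");
        }
        List<Path> pieces = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            BufferedWriter writer = null;
            try {
                int linesInPiece = 0;
                String line;
                while ((line = reader.readLine()) != null) {
                    // Start a new piece for the first line and whenever the current piece is full.
                    if (writer == null || linesInPiece == linesPerPiece) {
                        if (writer != null) {
                            writer.close();
                        }
                        Path piece = outputDir.resolve(input.getFileName() + ".part" + pieces.size());
                        pieces.add(piece);
                        writer = Files.newBufferedWriter(piece, StandardCharsets.UTF_8);
                        linesInPiece = 0;
                    }
                    writer.write(line);
                    writer.newLine();
                    linesInPiece++;
                }
            } finally {
                if (writer != null) {
                    writer.close();
                }
            }
        }
        return pieces; // each piece would become the input of one parallel step
    }
}
```

A 200'000-line file cut with `linesPerPiece = 10000` yields 20 pieces, one per parallel step.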
17
Parallelize processing
Motivation
> Speedup
Example
> Summarize accounting data and
store result in database again.
How to achieve
> Group data in chunks of 10'000 and process each chunk in a parallel step.
> Choose grouping criteria carefully:
– No overlapping data areas
– Pass along data that you had to read for the grouping process
18
Parallelize processing – how to group
Motivation
> Structuring your data in parallelizable chunks
> Load balancing
Example
> Parallelize processing by client as data is distinct by design.
How to achieve
> Group by client
> Group by keys: Ranges or ids
– Ranges (1..5) can grow very large
– Keys (1, 2, 3, 4, 5) can become very many
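One possible middle ground between ranges and key lists is to derive the ranges from the ids that actually exist, so each range covers a bounded number of rows. The sketch below assumes sorted, distinct numeric ids; `KeyRange` and `ChunkPlanner` are hypothetical names, not part of the framework in this deck.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical inclusive id range handed to one parallel step. */
class KeyRange {
    final long fromInclusive;
    final long toInclusive;

    KeyRange(long fromInclusive, long toInclusive) {
        this.fromInclusive = fromInclusive;
        this.toInclusive = toInclusive;
    }
}

class ChunkPlanner {

    /**
     * Groups sorted, distinct ids into non-overlapping ranges that each
     * cover at most chunkSize existing ids.
     */
    static List<KeyRange> toRanges(List<Long> sortedIds, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<KeyRange> ranges = new ArrayList<>();
        for (int first = 0; first < sortedIds.size(); first += chunkSize) {
            int last = Math.min(first + chunkSize, sortedIds.size()) - 1;
            ranges.add(new KeyRange(sortedIds.get(first), sortedIds.get(last)));
        }
        return ranges;
    }
}
```

Each step then only needs two numbers instead of a long key list, and no range holds more than `chunkSize` of the ids known at planning time. Rows inserted into a range afterwards would still make it grow, so this is a trade-off, not a guarantee.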
19
Parallelize writing
Motivation
> Transactional integrity while writing files.
> Easy recovery while writing files.
Example
> Collect data for the payment file.
How to achieve
> Collect data in parallel and write to a staging table.
> Staging table content very close to target file format.
> In a last step, dump the entire content of the staging table to file.
20
Different processes write in parallel
Motivation
> Don't lock out each other
Example
> Account information changes while account balance grows.
How to achieve
> No optimistic locking
> Modify deltas on sums and counters
> Keep distinct fields for different parallel jobs
> Be aware of deadlock potential
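"Modify deltas on sums and counters" can be read as: let the database add the delta instead of writing back a value computed in Java. A minimal JDBC sketch follows; the table `account` and its columns `balance`, `booking_count` and `account_id` are invented for this example.

```java
import java.math.BigDecimal;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical DAO; table and column names are assumptions for illustration. */
class AccountBalanceDao {

    // The database applies the delta to the current value, so a job that changes
    // other columns of the same row in the meantime is not overwritten.
    static final String ADD_TO_BALANCE =
            "UPDATE account SET balance = balance + ?, booking_count = booking_count + 1"
            + " WHERE account_id = ?";

    /** Returns the number of updated rows (expected: 1). */
    static int addToBalance(Connection connection, long accountId, BigDecimal delta)
            throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(ADD_TO_BALANCE)) {
            statement.setBigDecimal(1, delta);
            statement.setLong(2, accountId);
            return statement.executeUpdate();
        }
    }
}
```

The row is still locked for the duration of the writing transaction, which is why the deadlock warning above remains relevant.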
21
Avoid insert and update in same table and step
Motivation
> Speedup
> Avoid DB locks
Example
> Summary rows in same table as the raw data.
How to achieve
> Normalize your database.
22
Let the database work for you
Motivation
> Simple code
> Speedup
Example
> Sorting or joining arrays in memory.
How to achieve
> Code review.
> Book SQL course.
23
Read long, write short
Motivation
> Keep lock contention on database minimal
> Keep transactional DB overhead minimal
Example
> Fully process the whole batch of 1'000 records before starting to write to DB.
How to achieve
> 1 (one) "writing" database transaction per step.
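interface IModifyingStepRunner {
    // Read and compute everything the step needs; no database writes here.
    void prepareData();
    // Write the prepared results within the step's single "writing" transaction.
    void writeData();
}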
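How a framework might drive such a step runner can be sketched as follows. The interface is repeated so that the example is self-contained; `WriteTransaction` and `ModifyingStepExecutor` are hypothetical names introduced only for this illustration.

```java
/** Repeats the slide's interface so that the sketch is self-contained. */
interface IModifyingStepRunner {
    void prepareData();
    void writeData();
}

/** Hypothetical handle for the step's single writing transaction. */
interface WriteTransaction {
    void begin();
    void commit();
    void rollback();
}

class ModifyingStepExecutor {

    static void execute(IModifyingStepRunner runner, WriteTransaction tx) {
        runner.prepareData();   // "read long": no writing transaction is open yet
        tx.begin();
        try {
            runner.writeData(); // "write short": the only part that holds write locks
        } catch (RuntimeException | Error failure) {
            tx.rollback();      // Errors are rolled back as well, not only exceptions
            throw failure;
        }
        tx.commit();
    }
}
```

The writing transaction only spans `writeData()`, so locks are held for as short a time as the step allows.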
24
This omelet did not taste like grandma's!
> Despite following the recipe, there are the hidden corners
> Let's have a look at some pitfalls
25
Don't forget to catch Error
Motivation
> Application integrity delegated to DB
Example
> OutOfMemoryError caused half of a batch to be committed. Fatal, as a rerun cannot fix the inconsistency.
How to fix
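try {
    result = action.doInTransaction(status);
} catch (Throwable err) {
    // Catch Throwable, not just Exception, so that an Error such as
    // OutOfMemoryError also triggers the rollback.
    transactionManager.rollback(status);
    throw err;
}
transactionManager.commit(status);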
26
Use BufferedReader / BufferedWriter
Motivation
> Speedup (file reading time cut in half)
Example
> Forgot to use BufferedReader in file reading framework.
How to fix
> Code review.
> Profile if performance "feels not right".
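For reference, the buffered variant costs only one extra wrapper. A minimal sketch, assuming a UTF-8 text file; `LineCounter` is a made-up class name.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical example: counts lines using a buffered reader. */
class LineCounter {

    static long countLines(String fileName) throws IOException {
        // BufferedReader fetches large blocks from the underlying stream
        // instead of hitting it for every single read call.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), StandardCharsets.UTF_8))) {
            long count = 0;
            while (reader.readLine() != null) {
                count++;
            }
            return count;
        }
    }
}
```

The same applies to writing: wrap the `Writer` in a `BufferedWriter`.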
27
Use 1 thread only
Motivation
> Simplicity for the programmer
> Safety (no concurrent access)
Example
> Singleton, synchronized blocks, static variables, stateful step runners – we had it all...
How to achieve
> Configure framework to use one JVM per worker.
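How the framework in this deck starts its workers is not shown. As a rough sketch of "one JVM per worker", a controller could spawn each worker as its own process; `WorkerLauncher` and the worker main class passed in are assumptions.

```java
import java.io.IOException;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Hypothetical launcher: every worker gets its own single-threaded JVM. */
class WorkerLauncher {

    /** Builds the command line that reuses the JVM and class path of the current process. */
    static List<String> command(String workerMainClass, int workerId) {
        String javaBinary = Paths.get(System.getProperty("java.home"), "bin", "java").toString();
        return Arrays.asList(
                javaBinary,
                "-cp", System.getProperty("java.class.path"),
                workerMainClass,
                String.valueOf(workerId));
    }

    /** Starts the worker process; its output goes to the launcher's console. */
    static Process launch(String workerMainClass, int workerId) throws IOException {
        return new ProcessBuilder(command(workerMainClass, workerId)).inheritIO().start();
    }
}
```

Because each worker lives in its own process, singletons and static variables are private to that worker and need no synchronization.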
28
Cache wisely
Motivation
> Speedup
> Limit memory use
Example
> Tax rates do not change during a processing day – cache them long.
> Customer data will be reused if processing transactions of the same customer – cache it short.
How to achieve
> Cache per worker
> Cache lifetimes: Worker / step / on demand
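A per-worker cache with two of the lifetimes named above can be sketched like this. `WorkerCache`, `Lifetime` and `stepFinished()` are hypothetical names; the class is deliberately not thread-safe, on the assumption that each worker runs single-threaded in its own JVM.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical per-worker cache; not thread-safe by design (one thread per worker). */
class WorkerCache<K, V> {

    enum Lifetime { WORKER, STEP }

    private final Map<K, V> workerScoped = new HashMap<>();
    private final Map<K, V> stepScoped = new HashMap<>();

    /** Returns the cached value or loads and caches it for the given lifetime. */
    V get(K key, Lifetime lifetime, Function<K, V> loader) {
        Map<K, V> cache = lifetime == Lifetime.WORKER ? workerScoped : stepScoped;
        return cache.computeIfAbsent(key, loader);
    }

    /** Called by the framework after each step: drops the short-lived entries. */
    void stepFinished() {
        stepScoped.clear();
    }
}
```

Tax rates would be fetched with `Lifetime.WORKER`, customer data with `Lifetime.STEP`, so memory use stays bounded by what a single step touches.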
29
Support JDBC batch operations
Motivation
> Speedup
Example
List<Booking> bookings = new ArrayList<Booking>();
...
bookingDao.update(bookings);
How to achieve
> Enhance your database layer with a built-in JDBC batch facility.
> Execute the batch after 1000 items have been added.
> Automatically re-run a failed batch using single JDBC statements
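The flush-and-fallback logic can be sketched independently of JDBC. `BatchSink` and `BatchingWriter` are hypothetical; a real sink would wrap `PreparedStatement.addBatch()` and `executeBatch()`. The sketch assumes a failed batch leaves nothing written (for example because it was rolled back), so the items can safely be retried one by one.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical abstraction of the database layer. */
interface BatchSink<T> {
    /** Assumption: all-or-nothing; a failure leaves none of the items written. */
    void writeBatch(List<T> items) throws Exception;

    void writeSingle(T item) throws Exception;
}

class BatchingWriter<T> {
    private final BatchSink<T> sink;
    private final int batchSize;
    private final List<T> buffer = new ArrayList<>();
    private final List<T> rejected = new ArrayList<>();

    BatchingWriter(BatchSink<T> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush(); // e.g. after 1000 items
        }
    }

    /** Writes the buffered items; call once more at the end of the step. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        try {
            sink.writeBatch(new ArrayList<>(buffer));
        } catch (Exception batchFailure) {
            // Re-run item by item to isolate the offending rows.
            for (T item : buffer) {
                try {
                    sink.writeSingle(item);
                } catch (Exception itemFailure) {
                    rejected.add(item);
                }
            }
        }
        buffer.clear();
    }

    List<T> rejected() {
        return rejected;
    }
}
```

The fast path stays a single batch call; the slow single-statement path is only paid for batches that actually contain a bad row.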
30
Structured patching
Motivation
> Risk management
> Stay agile in production
Example
> Bug found, fixed and unit tested. Deploy to production asap.
How to achieve
> Eclipse wizard to create patch (all files involved to fix a bug)
> Patch script that applies .class file / SQL script / whatever...
31
Never, ever, update primary keys
Motivation
> Good database design
> Speedup
Example
> Homemade library always wrote entire row to database.
How to fix
> Only write changed fields (dirty flags).
> Make primary keys immutable on your objects.
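Both fixes fit into one small persistent object. This is a sketch with invented names (`BookingRow`, table `booking`, columns `status` and `amount_cents`), not the homemade library mentioned above.

```java
import java.util.LinkedHashSet;
import java.util.Objects;
import java.util.Optional;
import java.util.Set;

/** Hypothetical persistent object with an immutable key and dirty flags. */
class BookingRow {
    private final long id; // primary key: final and without a setter
    private String status;
    private long amountCents;
    private final Set<String> dirtyColumns = new LinkedHashSet<>();

    BookingRow(long id, String status, long amountCents) {
        this.id = id;
        this.status = status;
        this.amountCents = amountCents;
    }

    long getId() {
        return id;
    }

    void setStatus(String newStatus) {
        if (!Objects.equals(status, newStatus)) {
            status = newStatus;
            dirtyColumns.add("status");
        }
    }

    void setAmountCents(long newAmountCents) {
        if (amountCents != newAmountCents) {
            amountCents = newAmountCents;
            dirtyColumns.add("amount_cents");
        }
    }

    /** Builds an UPDATE that touches only the changed columns; empty if nothing changed. */
    Optional<String> buildUpdateSql() {
        if (dirtyColumns.isEmpty()) {
            return Optional.empty();
        }
        StringBuilder sql = new StringBuilder("UPDATE booking SET ");
        String separator = "";
        for (String column : dirtyColumns) {
            sql.append(separator).append(column).append(" = ?");
            separator = ", ";
        }
        return Optional.of(sql.append(" WHERE id = ?").toString());
    }
}
```

The key column only ever appears in the `WHERE` clause, and an unchanged object produces no statement at all.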
32
AGENDA
> What do we do
> Sharing our experience
> Wrap up + Q&A
33
Future
> Scalability is an issue with a single database server.
– Partitioning options used, but not to the end.
– Will Moore's law save us again?
> Processing double the volume still to be proven...
34
If you remember just three things...
Java batch processing works and is cool :-)
Trade-offs:
> Do not stock the work, start.
> Single threaded, many JVMs.
> Designing for scalability, stability needs experts.
http://www.google.ch/search?q=how+to+flip+an+omelet
Stefan Rufer
stefan.rufer@netcetera.ch
Netcetera AG
www.netcetera.ch
Matthias Markwalder
matthias.markwalder@six-group.com
SIX Card Solutions
www.six-group.com
36
Links / References
> http://en.wikipedia.org/wiki/Batch_processing
> http://static.springframework.org/spring-batch/
> http://www.bmc.com/products/offering/control-m.html
> http://www.javaspecialists.eu/
And to really learn how to bake fine omelets, buy a book:
> http://de.wikipedia.org/wiki/Marianne_Kaltenbach
> http://www.oreilly.de/catalog/geeksckbkger/
37
Other batch processing frameworks (public only)
> http://www.bmap4j.org/
> http://freshmeat.net/projects/jppf
> http://hadoop.apache.org/