Agile Data Practices - Database Management

Agile Data Practices
Database Management Techniques
to Support Software Delivery Teams
Tim Andersen
Agile Coach / Software Developer
Lean TECHniques, Inc.
@timander
Abstract
As software development teams introduce agile
practices, improving how they work with databases is
often ignored. Software developers tend to accept the
fact that databases must be shared, are hard to
change, and they must wait on someone else to make
those changes. It doesn't have to be this way! In this
session, I will identify the pain points and anti-patterns
associated with traditional database management and I
will show specific techniques and tools to improve the
workflow for embracing database changes.
What’s different?
What are the problems with
traditional database practices?
What changes in the data world
need to occur in order to help
software delivery teams succeed?
We will demystify agile database
techniques and preview tools
Obsolete data
management
practices are
impediments to
agile teams
This session will focus on improving how
teams interact with relational databases
A common
problem...
What is the current state of the database?
Does QA match Production?
Does DEV match QA?
Has the quick-fix that was applied in production been
applied to test and dev environments?
How can you tell?
A common
situation...
Keeping the dev environment in-sync with the test and
production environments is usually a crude and errorprone process that involves coordinating changes and
the timing of applying those changes with the database
administrators. Many times, there are differences but
because the database is “close enough” it happens to
work until someone encounters a problem.
Compatibility
Another pesky problem we encounter is the dependency
between the application and the database. The
application expects a specific structure in place for the
database, and if it doesn’t match the application will not
work. Database changes are sensitive to timing and must
be applied when the application is deployed. This often
requires coordinating with a database administrator to
apply those changes in the evening or on a weekend.
Culture
Database have typically been treated as large, heavy
objects that are hard to change. We treat them as a
scarce resource that we have to share.
Questions to ponder:
Why is this a common practice?
How does your database vendor charge you?
Knowledge
Silos
Database skill-sets are considered specialized and often
silo-ed into departments. This culture is deeply ingrained
in many organizations. Because of this, data
professionals are typically treated as a “shared service”
and usually spread out to support multiple teams. This
causes teams to wait on database-related questions or
changes.
Blocked!
Developers often lack the skill-sets and permission
necessary to make database changes and encounter
blocked work because they must wait on a data
professional for making database changes.
The developer then has two choices:
pick less important work
workaround the problem
Privileges
Databases are often locked down to prevent
unauthorized changes and there are processes and
policies in place to prevent changes from unauthorized
individuals. This gives all of the power to the database
administrators and puts the developers at their mercy.
Lockdowns and
Workarounds
If you have ever seen an anti-pattern such as a multipurposed column, or a multi-valued column, have you
wondered why someone decided that was a good idea?
This usually happens because working around the
controls in place to avoid database changes. At the time
these decisions are introduced, the additional complexity
is the path of least resistance compared to changing the
database.
Ugh, meetings
We have a strong urge to get the database design right
the first time and we spend a lot of time in design and
analysis. These meetings are intended to prevent the
wrong decision from being made because of the belief
that the database is hard to change and we cannot afford
to get it wrong. Meanwhile, these meetings are blocking
development work and taking time away from other
valuable activities. This is another reason why developers
prefer to avoid database changes if at all possible.
The NoSQL
Movement
Why do you think the NoSQL movement has gained such
popularity? My theory is that it is not always about the
technology. Sometimes NoSQL solutions are introduced
as a way to bypass the database administrators. Whether
or not it is a good technical decision, it often appears to
be more successful than a relational database because
the team take on the skill-sets for managing NoSQL
databases and can deliver faster because they are not
blocked by database design decisions or waiting on
someone to apply database changes.
Manual
Changes
ALTER TABLE ADDRESS MODIFY COLUMN ZIP_CODE VARCHAR(10) NOT NULL;
Manual changes are not recorded, which proliferates
the need for comparing schemas between
environments. Manual changes also require duplicate
effort to apply changes in other environments, and are
susceptible to human error.
Schema
Comparisons
Comparing the database schema from one environment
to another is often the only mechanism we have in
order to determine whether or not a set of database
changes has been applied or not. This activity is
required for each environment when the application is
deployed and requires database changes.
Multiple
Environments
DEV
QA
PROD
When are database changes applied? Does someone
have apply changes in the evenings or on weekends? How
do you guarantee that the same changes are repeated in
each environment? Are the changes scripted out or is the
effort repeated by using GUI tools used to “right click, add
column?” Are all database changes scripted out? Are
those change scripts checked into version control? How
do you manage the structure vs required application data?
What differs in each environment?
Single
Shared
Database
If developers all code against a single shared database,
they must be careful to not destroy or pollute the test data
for others. Automated testing is difficult or unreliable at
best, and if one developer “messes up” the database it will
impact others.
Managing
Test Data
A typical practice for testers is to query to find data they
need to test a particular scenario, and then perform
manual tests. Once that data is “used”, it may no longer be
valid and cannot be used again until the QA environment is
refreshed. Refreshing data typically only happens when the
data becomes stale or unusable. Tests are not repeatable,
and automated testing efforts are difficult because each
time they run it requires different data. This is caused by
non-deterministic test data.
Big Design
Up Front
Doesn't Work
Agile development teams are working on features
iteratively and incrementally, and the notion that the
database design needs to be complete before
development can start is incompatible with agile software
development. Requirements are going to change. The best
design evolves from just-in-time analysis through contextspecific conversations.
Creating
New Databases
SQL> CREATE DATABASE CUSTOMER_DEV;
How long does it take to setup a new development
database? Is it simple? Is it fast? Who performs this work?
Is it done manually? Do developers have to wait?
* If a database administrator had fifty developers that each
needed their own copy of the database, how much
additional work would it require of the DBA?
*
denotes a trick question
Database as
Integration
Point :(
One application to one database is ideal, but many
applications integrate with other applications by leaving
data in a database that another application with pickup
and process. This practice amplifies the impact of
changing the database because in order to roll-out a
database change both applications must be tested. This
leads to the avoidance of changing the database unless
absolutely necessary, and encourages lots of time spent
carefully planning and designing for changes.
...What? Rename a table?
Fear
A situation where a database change will result in a highimpact on testing and affects multiple applications leads to
fear. This is a self-proliferating problem that gets worse
over time because rather than fixing the database design,
people live with workarounds and workarounds for the
workarounds. This prevents innovation and reduces the
effectiveness of everyone that interacts with the database.
Agile Data Practices
Everyone gets their own Database
Iterative and Incremental Database Design
Database Configuration Management
Database Versioning
Continuous Integration
Continuous Integration is an important agile software
development practice that checks code out from
version control and runs all automated tests after each
commit. The build drops and creates the database
required for automated tests. This enables fast
feedback, but requires high discipline of configuration
management and change control.
SQL
> C
RE
Mig
S
D
QL>
ROP
A
DAT
B
CU
ASE
M
STO
ER_
;
DEV
Nuke
&
Pave
Mig
Mig
Mig
Mig
Mig
rat
rat
rat
rat
rat
rat
ATE
DAT
ing
ing
ing
ing
ABA
sch
sch
sch
sch
SE
ema
ema
ema
ema
CUS
`cu
`cu
`cu
TOM
sto
sto
sto
ER_
mer
mer
mer
DEV
_de
_de
_de
;
v`
v`
to
to
ver
ver
sio
n 0
02
- c
rea
te
v
sch
e
_
r
ing
dev
`cu
sio
em
dom
` t
sto
n
sch
a
i
0
o
n
m
0
ema
er_
3
v
d
.
ers
ata
1 dev
`cu
ion
` t
ins
sto
004
o v
ert
mer
ers
_de
- a
cus
ion
v`
dd
to
to
ema
0
0
5 ver
i
l
sio
mul
n 0
tip
05.
le
2 pho
nes
ins
ert
ema
il
te
ing
sch
ema
`cu
sto
mer
v`
to
sio
n 0
03
A database configuration management practice where all
database objects are scripted out to automatically
reproduce a copy of the database. This includes tables,
views, stored procedures, functions, required application
data, users, roles, permissions, etc. The scripts are kept in
version control, and developers have permission to drop
and create their development databases at any time. This is
only applicable to non-persistent data stores (you would
never do this on your production database).
Incremental
Design
Small changes introduce the right-sized context-appropriate
solution when we design the database with an iterative and
incremental approach. Communication changes in SQL
leaves very little room for misinterpretation, and when all
changes are checked into version control the database
changes are transparent to everyone. This also enables
executing the same upgrade path for multiple environments.
Migrations
A migration is a script that upgrades a database schema
from one version to the next while preserving existing data.
Versioning metadata is tracked. When a database starts
using migrations, all future changes must be implemented
as a migration. Migrations cannot be edited, so if they fail
they should be rejected instead of “fixed”. It’s a good idea to
backup the database before executing migrations.
Schema Versioning
Metadata table to
track which changes
have been applied in
each database
All database changes
are migrations recorded
in schema_version when
applied
No manual changes
Everyone gets their
own database
Developers each have their own database
so they don’t collide with each other
Different versions of the application may
require different versions of the database
Local installation or centralized server
Weekly job that will
destroy all developer
databases. Ha!
Existing
Databases
Reverse engineer the existing production database schema
into scripts. Extract the required application data into insert
scripts. Check the scripts into version control. Execute the
scripts to recreate the database as part of the automated
build. Use migrations for all future changes to the database.
Existing
Data
You don’t need a huge amount of data to test the
functionality of your application. Separate functionality
testing from load-testing. Usually a few rows of data to start
with will suffice for demos and test automation. Data can be
created programmatically and reloaded automatically.
What about
test data?
Each test is responsible for creating the data necessary for
that test. Each test is responsible for cleaning up its data
afterwards (whether or not the test passed or failed). Data
can be reset automatically and is deterministic.
Determinism
Deterministic data is required for
test automation. Every time the test
runs, the state of the database is
exactly the same. Outputs are
assertable and test is repeatable.
Modeling
Tools
Modeling tools like ERwin and PowerDesigner can still be
leveraged for data modeling and the conceptual, logical,
and physical data models. This may require a higher degree
of collaboration and possibly changes in workflow with agile
development teams.
Benefits
Productivity
Communication
Predictability
Repeatability
Happiness
Poor Man's Migrations
(no additional tooling)
Workshop
Signup
Sheet
eval(this.session)
When you get back to your desk:
DROP your old DATABASE practices!