Agile Data Practices Database Management Techniques to Support Software Delivery Teams Tim Andersen Agile Coach / Software Developer Lean TECHniques, Inc. @timander Abstract As software development teams introduce agile practices, improving how they work with databases is often ignored. Software developers tend to accept the fact that databases must be shared, are hard to change, and they must wait on someone else to make those changes. It doesn't have to be this way! In this session, I will identify the pain points and anti-patterns associated with traditional database management and I will show specific techniques and tools to improve the workflow for embracing database changes. What’s different? What are the problems with traditional database practices? What changes in the data world need to occur in order to help software delivery teams succeed? We will demystify agile database techniques and preview tools Obsolete data management practices are impediments to agile teams This session will focus on improving how teams interact with relational databases A common problem... What is the current state of the database? Does QA match Production? Does DEV match QA? Has the quick-fix that was applied in production been applied to test and dev environments? How can you tell? A common situation... Keeping the dev environment in-sync with the test and production environments is usually a crude and errorprone process that involves coordinating changes and the timing of applying those changes with the database administrators. Many times, there are differences but because the database is “close enough” it happens to work until someone encounters a problem. Compatibility Another pesky problem we encounter is the dependency between the application and the database. The application expects a specific structure in place for the database, and if it doesn’t match the application will not work. Database changes are sensitive to timing and must be applied when the application is deployed. This often requires coordinating with a database administrator to apply those changes in the evening or on a weekend. Culture Database have typically been treated as large, heavy objects that are hard to change. We treat them as a scarce resource that we have to share. Questions to ponder: Why is this a common practice? How does your database vendor charge you? Knowledge Silos Database skill-sets are considered specialized and often silo-ed into departments. This culture is deeply ingrained in many organizations. Because of this, data professionals are typically treated as a “shared service” and usually spread out to support multiple teams. This causes teams to wait on database-related questions or changes. Blocked! Developers often lack the skill-sets and permission necessary to make database changes and encounter blocked work because they must wait on a data professional for making database changes. The developer then has two choices: pick less important work workaround the problem Privileges Databases are often locked down to prevent unauthorized changes and there are processes and policies in place to prevent changes from unauthorized individuals. This gives all of the power to the database administrators and puts the developers at their mercy. Lockdowns and Workarounds If you have ever seen an anti-pattern such as a multipurposed column, or a multi-valued column, have you wondered why someone decided that was a good idea? This usually happens because working around the controls in place to avoid database changes. At the time these decisions are introduced, the additional complexity is the path of least resistance compared to changing the database. Ugh, meetings We have a strong urge to get the database design right the first time and we spend a lot of time in design and analysis. These meetings are intended to prevent the wrong decision from being made because of the belief that the database is hard to change and we cannot afford to get it wrong. Meanwhile, these meetings are blocking development work and taking time away from other valuable activities. This is another reason why developers prefer to avoid database changes if at all possible. The NoSQL Movement Why do you think the NoSQL movement has gained such popularity? My theory is that it is not always about the technology. Sometimes NoSQL solutions are introduced as a way to bypass the database administrators. Whether or not it is a good technical decision, it often appears to be more successful than a relational database because the team take on the skill-sets for managing NoSQL databases and can deliver faster because they are not blocked by database design decisions or waiting on someone to apply database changes. Manual Changes ALTER TABLE ADDRESS MODIFY COLUMN ZIP_CODE VARCHAR(10) NOT NULL; Manual changes are not recorded, which proliferates the need for comparing schemas between environments. Manual changes also require duplicate effort to apply changes in other environments, and are susceptible to human error. Schema Comparisons Comparing the database schema from one environment to another is often the only mechanism we have in order to determine whether or not a set of database changes has been applied or not. This activity is required for each environment when the application is deployed and requires database changes. Multiple Environments DEV QA PROD When are database changes applied? Does someone have apply changes in the evenings or on weekends? How do you guarantee that the same changes are repeated in each environment? Are the changes scripted out or is the effort repeated by using GUI tools used to “right click, add column?” Are all database changes scripted out? Are those change scripts checked into version control? How do you manage the structure vs required application data? What differs in each environment? Single Shared Database If developers all code against a single shared database, they must be careful to not destroy or pollute the test data for others. Automated testing is difficult or unreliable at best, and if one developer “messes up” the database it will impact others. Managing Test Data A typical practice for testers is to query to find data they need to test a particular scenario, and then perform manual tests. Once that data is “used”, it may no longer be valid and cannot be used again until the QA environment is refreshed. Refreshing data typically only happens when the data becomes stale or unusable. Tests are not repeatable, and automated testing efforts are difficult because each time they run it requires different data. This is caused by non-deterministic test data. Big Design Up Front Doesn't Work Agile development teams are working on features iteratively and incrementally, and the notion that the database design needs to be complete before development can start is incompatible with agile software development. Requirements are going to change. The best design evolves from just-in-time analysis through contextspecific conversations. Creating New Databases SQL> CREATE DATABASE CUSTOMER_DEV; How long does it take to setup a new development database? Is it simple? Is it fast? Who performs this work? Is it done manually? Do developers have to wait? * If a database administrator had fifty developers that each needed their own copy of the database, how much additional work would it require of the DBA? * denotes a trick question Database as Integration Point :( One application to one database is ideal, but many applications integrate with other applications by leaving data in a database that another application with pickup and process. This practice amplifies the impact of changing the database because in order to roll-out a database change both applications must be tested. This leads to the avoidance of changing the database unless absolutely necessary, and encourages lots of time spent carefully planning and designing for changes. ...What? Rename a table? Fear A situation where a database change will result in a highimpact on testing and affects multiple applications leads to fear. This is a self-proliferating problem that gets worse over time because rather than fixing the database design, people live with workarounds and workarounds for the workarounds. This prevents innovation and reduces the effectiveness of everyone that interacts with the database. Agile Data Practices Everyone gets their own Database Iterative and Incremental Database Design Database Configuration Management Database Versioning Continuous Integration Continuous Integration is an important agile software development practice that checks code out from version control and runs all automated tests after each commit. The build drops and creates the database required for automated tests. This enables fast feedback, but requires high discipline of configuration management and change control. SQL > C RE Mig S D QL> ROP A DAT B CU ASE M STO ER_ ; DEV Nuke & Pave Mig Mig Mig Mig Mig rat rat rat rat rat rat ATE DAT ing ing ing ing ABA sch sch sch sch SE ema ema ema ema CUS `cu `cu `cu TOM sto sto sto ER_ mer mer mer DEV _de _de _de ; v` v` to to ver ver sio n 0 02 - c rea te v sch e _ r ing dev `cu sio em dom ` t sto n sch a i 0 o n m 0 ema er_ 3 v d . ers ata 1 dev `cu ion ` t ins sto 004 o v ert mer ers _de - a cus ion v` dd to to ema 0 0 5 ver i l sio mul n 0 tip 05. le 2 pho nes ins ert ema il te ing sch ema `cu sto mer v` to sio n 0 03 A database configuration management practice where all database objects are scripted out to automatically reproduce a copy of the database. This includes tables, views, stored procedures, functions, required application data, users, roles, permissions, etc. The scripts are kept in version control, and developers have permission to drop and create their development databases at any time. This is only applicable to non-persistent data stores (you would never do this on your production database). Incremental Design Small changes introduce the right-sized context-appropriate solution when we design the database with an iterative and incremental approach. Communication changes in SQL leaves very little room for misinterpretation, and when all changes are checked into version control the database changes are transparent to everyone. This also enables executing the same upgrade path for multiple environments. Migrations A migration is a script that upgrades a database schema from one version to the next while preserving existing data. Versioning metadata is tracked. When a database starts using migrations, all future changes must be implemented as a migration. Migrations cannot be edited, so if they fail they should be rejected instead of “fixed”. It’s a good idea to backup the database before executing migrations. Schema Versioning Metadata table to track which changes have been applied in each database All database changes are migrations recorded in schema_version when applied No manual changes Everyone gets their own database Developers each have their own database so they don’t collide with each other Different versions of the application may require different versions of the database Local installation or centralized server Weekly job that will destroy all developer databases. Ha! Existing Databases Reverse engineer the existing production database schema into scripts. Extract the required application data into insert scripts. Check the scripts into version control. Execute the scripts to recreate the database as part of the automated build. Use migrations for all future changes to the database. Existing Data You don’t need a huge amount of data to test the functionality of your application. Separate functionality testing from load-testing. Usually a few rows of data to start with will suffice for demos and test automation. Data can be created programmatically and reloaded automatically. What about test data? Each test is responsible for creating the data necessary for that test. Each test is responsible for cleaning up its data afterwards (whether or not the test passed or failed). Data can be reset automatically and is deterministic. Determinism Deterministic data is required for test automation. Every time the test runs, the state of the database is exactly the same. Outputs are assertable and test is repeatable. Modeling Tools Modeling tools like ERwin and PowerDesigner can still be leveraged for data modeling and the conceptual, logical, and physical data models. This may require a higher degree of collaboration and possibly changes in workflow with agile development teams. Benefits Productivity Communication Predictability Repeatability Happiness Poor Man's Migrations (no additional tooling) Workshop Signup Sheet eval(this.session) When you get back to your desk: DROP your old DATABASE practices!
© Copyright 2024