Thursday, March 15, 2007

Database Refactoring

Refactoring Databases: Evolutionary Database DesignAt Agile 2006, Scott Ambler gave a talk on Database refactoring. Scott stresses on the need to be able to evolve our database schema and mentions that it is technically possible but it might not be culturally. Scott defines database refactoring as a simple change to the design of the database schema that does not change behavioral semantics or information semantics. Then Scott gives an overview of database refactoring and discusses the steps involved in the refactoring. These include: Verifying that refactoring is needed, choosing the right refactoring method, deprecating the original schema, writing and running tests, changing the schema, migrating the data, updating external apps, re-running tests, announcing the refactoring, and versioning your work.

When refactoring, there needs to be a deprecation period or transition window. Not all apps will be able to apply the required changes all at once so the old schema and new schema need to run in parallel. Both the refactoring and scaffolding need to be deployed into production. The scaffolding ensures that the old schema and new schema are kept in sync. After a transition window, the original schema and scaffolding are removed and the refactoring is complete.

Scott then gives examples of refactoring techniques. These include: renaming a column, replacing a column, merging columns, removing a view, introducing a default value, replacing 1 to many relationship with an associative table, and introducing a surrogate key. In general, the techniques involve keeping the old schema, adding the new schema along with a trigger to keep the two schemas in sync. At a later date (after the transition period), the old schema and trigger are dropped.

Next Scott covers 6 ways to enable database refactoring.

  1. Accept the situation: Scott mentions that we need to accept the fact that the data is an important asset and that we should be able to change the data schema. The database should not be an anchor around our neck. Trivial things should be trivial to do. We have to be able to do what needs to be done and no tools will magically fix the schema for us.
  2. Teamwork: Developers need to learn data skills. They should understand the fundamentals of database modeling. They should know how to normalize. Also, DBAs need to understand design patterns. Operations must be flexible.
  3. Sandboxes: Developers need their own database instance or table space to work with. There needs to be development sandbox, project integration sandbox, demo sandbox, qa sandbox, and production.
  4. Configuration management: All system artifacts need versions( DDL scripts, test data, test scripts, source code, models, documents).
  5. Database Regression Testing: Testing should be at the interface to database (input/output), and also inside the database (stored procedures, views, triggers, constraints). Example, testing cascading deletes or null values.
  6. Encapsulate database access: not all apps will access the db the same way. Some will use Frameworks, some might use direct SQL, etc.

Then Scott covers the steps to deploy database refactoring:
  1. Sandboxes: Each team works in its own set of sandboxes.
  2. Deployment Process: Each team run regression tests, deploy code, deploy database  refactoring, re-run test, back out if it does not work.
  3. Versioning issues: There will be different version of the database for different team sandboxes. Some are behind, some are ahead. Each database refactoring needs to be a script with incremental version numbers. There should be table in the database that keeps track of its current version.
  4. Deployment Windows: Operations can use windows to merge database refactoring scripts from different teams into 1 before rolling into production

Finally Scott leaves us with some parting thoughts. He mentions that if we can refactor application code we should be able to refactor database schemas. Refactoring a database is harder, but the tools are getting better. Don’t underestimate how difficult database refactoring is.
This presentation is available on InfoQ at