Wednesday, October 04, 2006

Database Version Control In Depth

I've always struggled with database version control in the environment of a database application and the dependency between application and database structure. The database I'm most familiar with, MS SQL Server, doesn't have built-in version control for a development environment (something I think Microsoft should have addressed in SQL 2000).

I'm presently using a solution provided by a company called Innovartis (http://www.innovatris.com/) called DBGhost that builds a database much as you would build an application: from source code under version control. These folks have quite an outline for their proposed solution on their web site, and a trial version that helps really grasp it. The key is all the SQL that defines your schema and static data resides in your version control system and is used to build an online database. Thus, it captures changed checked-in and only those changes as a part of the 'final' database for your application.

This particular tool has some nice features including a script generator to get the scripts out of your online database as a start. It also has a comparison piece that helps generate the update script and actually merge changes between different versions. All in all, it allows you to treat your schema more like code than ever before and tag it, branch it and control it.

For now, it's the most control I've had on a database schema in a database application but there are some drawbacks:
  • The command line version is way more than the interactive model, which means the automated build of the database is most likely to be reserved for 'special' people or build boxes. I want all my developers to be able to 'build' a database from the command line, but the tool is way to expensive for that.
  • The tool is a little clunky on the UI in sort of a wizard format. For us advanced users (and really, if you're building a database doesn't that give you automatic advanced user status?) we should be able to configure it without having to go 'Next' through all the screens.
  • The command line version leaves lots to be desired. It's all driven by a single XML configuration file with no opportunity to override even the simplest things (like target server or database) on the command line.
  • It doesn't return a normal error code when run from the command line so build tools (like NAnt) don't automatically know when the build has failed. For that matter, how about an NAnt add-in task so we can just pull it all together.

But it works for now, and while we're finding glitches now and then it's great to automatically generate a database create script and update script every night in our nightly build with no manual intervention directly from version control. Feel the warm and fuzzies...

Automated Unit Tests in a Database Application

I've gotten to the point where I really see the value in automated unit tests, and although it's a tired analogy it really is a safety net over which refactoring is less dangerous. But what about database objects? Is there any meshing of the database and test driven development?

I think the traditional approach of testing database applications using an automated framework like xUnit is to mock any dependency on the database in a way the rids the test of dependence on the live (or test) database. But let's be honest: you need certain database objects and data to exist or your application won't run.

So, I've come up with an interesting solution. In my database dependent application I build unit tests to verify the code independent of the database using mock objects (we use .NET, so NMock is very helpful here). But I also build unit tests that verify that the key elements of the database schema exist including tables, columns, stored procedures and views. We've built a mechanism of creating a hash from the objects that are complex, like stored procedures and views, and verify that the object on the other end of the database matches what the application expects.

So, in a normal day on my project I may have to fix or add a stored procedure. I can add a unit test that checks for the object and it's hash that fails until I add the stored procedure itself. This way at build time I can verify that the database I'm connected to is valid as far as the application is concerned.

For those of us still running installed software (I know, I wish it was an ASP too) there's an additional benefit. Using NUnit's ability to categorize unit tests creates a subset of the unit tests that, when run in a production environment, can validate that the customer database meets the standards of the application installed. I have a built in mechanism to pinpoint differences in a customer database (maybe they accidentally dropped a table) quickly.

Tuesday, October 03, 2006

Software Development

I find myself introspective about software development more now than ever, and the idea of getting better and better at it is very appealing. I already feel that I'm a very good developer, and I find that many of the lessons I've learned have come back to me in the form of best-practices documented by others. I wonder at times if I would have been able to articulate the best-practices I identified in myself as well as others I see writing books. I think so, but I also think it would take more continuous self (or team) focus on what, exactly, works and how to capture it and pass it on.

Some of my favorite other references include The Pragmatic Programmer (http://www.pragmaticprogrammer.com/), truly a ground-breaking work for me in articulating these key things I've found so precious. Interestingly, I've found that passing this book along to others has often not had the desired effect of changing the behavior that conflict with the book. Which makes me wonder if you have to figure it out for yourself before you'll really 'get it.'

I also find that I enjoy some of the web sites written by others in software development. My favorites here are:

These are folks like me who are good at software development and who I relate to on what 'good' means. I think this blog could be an opportunity to post introspective thoughts on what works and what doesn't work in software development, some probably already well documented practices but maybe some aren't.

Database Version Control

For example, I've always struggled with version control for a relational database structure in a database centric application. It's always a key component (sometimes using way too many stored procedures so it's even more key), the database is often kept in a state of 'works with the current code' but not version controlled.

How often have you, the developer, wanted to look at the history of a table structure like you can look at the history of a class source file? Only to know that you only have the current instance of the table stored in a database to go from as the 'working copy' of the schema. In previous organizations, I've written scripts to pull out the schema and check it into version control to provide, at a minimum, evidence of change over time. But it still lacked control.

In my present team we've implemented what is the most robust fashion of database version control I've been successful at implementing. We store the entire set of scripts used to create a database (broken up by tables, functions, stored procedures, static data, etc) in our version control system. These files must be checked out to be modified and then checked in. To enforce the need for developers to check-out/check-in we use an application the 'compiles' the individual scripts into a database each night as a part of our continuous integration.

So, for us, we get a nightly database build along with our application. We also use the tool to create an 'upgrade' script from the previous version to the current version of our database. With this additional feature, we get the build-from-scratch database as well as the upgrade-from-previous script. These two things are checked into version control each night and migrated to an integration server.

If a developer wants the database changes to be included in the build, he or she must check into source control the database script. From here, we can branch and tag releases and versions and establish a clear package for our application from data schema to application code.

Monday, October 02, 2006

Know It All

I read an interesting book called The Know It All by A.J. Jacobs while taking a rare vacation from the kids and getting solid time to read. The book had an interesting premise: the story of a man my age trying to read the entire Encyclopedia Britannica. I'm not sure what was more interesting to me, the story about the guy or the way it provided a glimpse into the interesting parts of the encyclopedia.

Honestly, I think I enjoyed the little facts and tidbits more than the story of his life. He's a pretty normal guy, and although I think I could relate to him (and that helped the reading go easier) it struck me about two-thirds through that I really didn't care about him very much. I mean, it's written from the first person so you know his eventual outcome is he finishes the book.

The little tidbits are interesting, although I have to say I think there was even some literary license taken in making them sound more interesting (or twisted) than they really were. Like making a statement that berries, as you know them (say strawberries), aren't really berries. Well, that was interesting until I realized it's no good to know that when everyone's definition of berry and fruit is different. And saying "hey, I bet you didn't know strawberries aren't berries" just makes you sound stupid.

Well, it was interesting. And I must say, I finished it just to say I finished a book that wasn't work-related in 2006. I think it may well be the only one. But now I must go back to Ruby on Rails, or the Pragmatic Programmer, or some other more interesting read that makes me come off as a super geek. Because that's really what I am, and I don't need to read a book about other one of myself.