13 March 2007

Harder to Type, Easier to Maintain

One of my favorite software books, Steve McConnell's Code Complete, rails against hard-coded strings because they make an application difficult to maintain. At "typing time," it's the easiest and quickest way to do things. Thereafter, though, you've created a debt that you will pay down for the life of a project.

When we work with DataSets, DataTables and other means of packaging data, it is hard to avoid littering code with literal strings that represent column names. This is not helped by our tools -- how hard would it have been for Microsoft to expose an enum for column names from a DataTable object in a generated typed DataSet? (And why is "strongly-typed DataSet" preferred over "typed DataSet"?)

Several years ago, when I first started using .NET, I created a database column enum and constants generator. Point it at a database and let it generate enums or constants (in C# or VB.NET) for every table and view, or pick the ones you want. It's part of the suite of tools I use to create the data layer and business layer in my applications.

It's a simple thing. So instead of writing code like this:

dRow["LastName"] = "Smith";
dRow["FirstName"] = "Bob";


I can write it like this:

dRow[CustomerColumn.LastName] = "Smith";
dRow[CustomerColumn.FirstName] = "Bob";
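
For illustration, here is a minimal sketch of what a generated constants file could look like and how it plugs into a DataRow. The CustomerColumn name comes from the snippet above; the Customer table, the BuildCustomerTable helper and the file layout are assumptions made for this example, not the actual output of the generator. String constants are used here because the DataRow indexer accepts a column name, an ordinal or a DataColumn; an enum member would need a .ToString() call or an (int) cast first.

```csharp
using System;
using System.Data;

// Hypothetical generated file: one constant per column in the Customer table.
public static class CustomerColumn
{
    public const string LastName = "LastName";
    public const string FirstName = "FirstName";
}

public static class Program
{
    // Illustrative helper: builds an in-memory table using the same constants.
    public static DataTable BuildCustomerTable()
    {
        DataTable table = new DataTable("Customer");
        table.Columns.Add(CustomerColumn.LastName, typeof(string));
        table.Columns.Add(CustomerColumn.FirstName, typeof(string));
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildCustomerTable();
        DataRow dRow = table.NewRow();

        // A renamed or removed column now fails at compile time
        // once the constants file is regenerated.
        dRow[CustomerColumn.LastName] = "Smith";
        dRow[CustomerColumn.FirstName] = "Bob";
        table.Rows.Add(dRow);

        Console.WriteLine(dRow[CustomerColumn.LastName]);
    }
}
```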


A typical application is going to have references to columns in many places:
  • Assigning values to columns
  • Persisting data
  • Filtering (using the DataTable's .Select method or a DataView's .RowFilter property)
  • Sorting
  • Ad-hoc querying
  • Data binding and other tasks above in the designer for a form or web control
Aside from the visual designer code, giving up hard-coded strings for enums or constants is a very small change to make. When your database schema changes, just regenerate the source file for the enums/constants and fix up your code. Most changes will now break the application at compile time, not run time, which makes these issues trivial to identify and fix. A large application with thousands of references and many developers just cannot be maintained with hard-coded strings for column names. Catching these errors at run time requires 100% code coverage, and happens way too far downstream to NOT lose you money.
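
The same idea carries over to filtering and sorting. The sketch below builds the filter and sort expressions for DataTable.Select from column constants instead of literals. The CustomerColumn class, the FilterDemo class and the FindByLastName helper are hypothetical names used only for this example.

```csharp
using System;
using System.Data;

// Hypothetical generated constants for the Customer table.
public static class CustomerColumn
{
    public const string LastName = "LastName";
    public const string FirstName = "FirstName";
}

public static class FilterDemo
{
    // Returns rows whose last name matches, ordered by first name.
    public static DataRow[] FindByLastName(DataTable table, string lastName)
    {
        // Double any single quotes so the value is a valid string literal
        // inside the filter expression.
        string filter = string.Format("{0} = '{1}'",
            CustomerColumn.LastName, lastName.Replace("'", "''"));
        string sort = CustomerColumn.FirstName + " ASC";
        return table.Select(filter, sort);
    }

    public static void Main()
    {
        DataTable table = new DataTable("Customer");
        table.Columns.Add(CustomerColumn.LastName, typeof(string));
        table.Columns.Add(CustomerColumn.FirstName, typeof(string));
        table.Rows.Add("Smith", "Bob");
        table.Rows.Add("Jones", "Carl");
        table.Rows.Add("Smith", "Alice");

        foreach (DataRow row in FindByLastName(table, "Smith"))
        {
            Console.WriteLine(row[CustomerColumn.FirstName]);
        }
    }
}
```

The column names still end up as strings inside the expression, so the compiler cannot check the expression syntax itself, but a renamed or dropped column will surface as a build error rather than a run-time exception.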

GridViews and other designer-based code may still be a chore, but you're not making things worse because that's how they are whether or not the rest of your source code has hard-coded values.

Does "LastName" look nicer than CustomerColumn.LastName? Probably. Is "LastName" easier to type? I'd say so. But it creates a mess, one that too many developers are keen to accept, unfortunately. Keep in mind, what I'm talking about here is for developers who haven't adopted object-relational mapping or other means of encapsulating and abstracting object data. If you're still using DataSets and DataTables, you need to consider the friction you are accepting by hard-coding string column names, primary keys, default values, filters and so forth.
