Biml From the Trenches - Convention over Configuration
Biml from what I have observed is still in it’s infancy. Don’t get me wrong, what can actually be done with Biml is impressive, however it seems to me how we are using it and how we are talking about it still needs to mature before it can be fully and effectively embraced by the community at large.
There is a growing number of people in the SQL Server community that are contributing excellent guidance and driving forward its maturity, as well as BimlScript.com which has an ever growing set of examples to learn from. I firmly believe that the future of Biml is bright, if you don’t believe me check out this interview with the creator Scott Currie.
What I would like to see more of, is people talking about implementing design patterns in Biml, let’s have less examples and more theory.
An area that there is still little guidance on is convention based Biml.
Talk to any experienced software developer and they will tell you all about “Convention over Configuration” and how it makes their life easier, but if you’re a BI developer this might be brand new concept to you.
Wikipedia has a great description:
Convention over configuration is a software design paradigm which seeks to decrease the number of decisions that developers need to make, gaining simplicity, but not necessarily losing flexibility.
The phrase essentially means a developer only needs to specify unconventional aspects of the application.
In my mind there are a few different types of conventions:
- Those that require strict adhered
- Those that can be overridden
- On/Off by default
In this post I’m going to give two examples of the first type, those I class as “strict adherence required” from a recent ETL project I was working on. I’ll cover the other types in subsequent posts.
NB: I’m deliberately not including any Biml or screenshots of SSIS packages in this post. The reason for this is that there are numerous ways to implement design patterns, so rather than showing you how I do it, I want to introduce the concepts and get you thinking about it.
Use a convention to Extract data
What I want to achieve: I want to move data from a source system into my target database, but only a subset of tables in the source system.
My convention: For each table in the Import schema of my Target database, there is a table with the same name and structure in the source system. Then, for each table in the Import schema generate Data Flow task which will read from the source table and copy the data into the target table.
The advantage: When I need to start importing more data from the source system all I need to do is create a new table in my import schema that matches the source and then regenerate the extract package. No configuration is needed, I don’t need to maintain a list of tables which needed to be imported, the presence of the table in the Import schema is enough. Simple.
Use convention to Transform data
The ETL project that I working on already had a number of queries written which would transform the source system data to match our table structure. We had a number of tables in the Stage schema which the transformed data needed to go before being loaded into the live tables.
What I want to achieve: Transform data from the Import schema tables and load into tables in the Stage schema
My convention: For each table in the Stage schema, there is a corresponding stored procedure with a name like: [Stage table name]_Transform
The advantage: Once again, when I need to start transforming more data, only two things need be done, create new table in Stage schema, create transform stored procedure. Then I just regenerate my SSIS package, no changes required.
What’s the point?
Hopefully you’ve figured it out, but the point of “convention over configuration”, or “coding to a convention” is to remove the number of decisions that you need to make during the development effort. This allows you to spend more time on the hard stuff and let the computer take care of the rest.