Sunday, August 03, 2014

SSIS Best Practices 2 - Single Package vs Multiple Packages

When developing an ETL package the first thing that comes to the mind is whether to build single package doing the end-to-end flow or multiple packages doing the same.

Being an Architect, I have been challenged with this question many times from my development team especially around single package model helps in maintaining less number of packages whereas multiple packages creates an overhead with the number of packages.

One Package = One Unit of Work

It is always best practice to have one package doing only one thing. The following benefits can be achieved doing so.

  • Less time spent waiting on design-time validation. SQL Server Data Tools has a rich interface that provides, among other things, a near real-time evaluation of potential metadata problems in the SSDT designer. If, for example, a table that is accessed by the SSIS package is changed, the developer will be presented with a warning (or an error, if applicable) in SSDT indicating that metadata accessed by the package has changed. This constant metadata validation is beneficial in that it can help to identify potential problems before they are pushed out for testing. There's also a performance cost associated with this. The length of time required for validation increases as the size of the package increases, so naturally keeping the packages as small as practical will cut down on the amount of time you're drumming your fingers on your desk waiting for validation to complete.
  • Easier testing and deployment. A single package that loads, say, 10 dimensions has a lot of moving parts. When developing each of the dimensions within the package, there is no easy way to test just one element of the package (apart from manually running it in the SSDT designer, which isn't a completely realistic test for a package that will eventually be deployed to the server). The only realistic test for such a package would be to test the entire package as a server-based execution, which may be overkill if you're only interested in one or two changed properties. Further, it's not uncommon for organizations with formal software testing and promotion procedures to require that the entire thing be retested, not just the new or changed elements. By breaking up operations into smaller units, testing and deployment are usually less of a burden because you are only operating on one component at a time.
  • Distributed development. If you work in an environment where you are the only person developing SSIS packages, this is less of a concern. However, if your shop has multiple ETL developers, those do-everything packages are quite inconvenient. Although it's much easier in SQL Server 2012 than in previous releases to compare differences between versions of the same package file, it's still a mostly manual process. By segmenting the workload into multiple packages, it's much easier to farm out development tasks to multiple people without having to reconcile multiple versions of the same package.
  • Reusability. It's not uncommon for the same logic to be used more than once in the same ETL execution, or for that same logic to be shared among multiple ETL processes. By encapsulating these logical units of work into their own packages, it's much easier to share that logic and avoid duplicate development.

Also we follow the following design patterns

  • Each dimension has a separate package.
  • Each fact table has a separate package.
  • Staging operations (if used) each have their own package.
  • Dropping or disabling indexes, Truncating tables can be grouped together and can typically reside in the same package.

No comments:

Post a Comment