Sunday, August 31, 2014
SSRS Preview in VS 2013: There was no endpoint listening at net.pipe://localhost/PreviewProcessingService4440/ReportProcessing that could accept the message.
Saturday, August 30, 2014
SPARSE Columns and ColumnSets
I had a scenario where I had to load a CSV file with around 2,000 columns. This led me to the wide table feature introduced in SQL Server 2008: a wide table can have up to 30,000 columns, as opposed to the 1,024-column limit of a non-wide table.
Having 30,000, or even 1,024, columns in a table is a lot, and a design review should be done for these types of scenarios. In my case another system generates the file, and the business wants the ability to query it easily via a single table, more or less just for analysis purposes.
Sparse columns are designed to allow a relational database to store and process relatively unstructured data, where any individual entity may have a modest selection from a very wide set of attributes. This sort of data can be found frequently in scientific, pharmaceutical, and medical applications. EAV (Entity/Attribute/Value) storage techniques were engineered for such data; however, EAV data storage has proved to be somewhat inefficient in such areas as query speed and storage cost.
Sparse columns work well where a high proportion of the data in a column is NULL, because negligible storage space is used for the NULL representation. A table that is wide in theory can be made narrow in practice by implementing sparse columns, and the addition of filtered indexes provides effective searches on the sparse data. Sparse columns support the use of check constraints, as long as the constraints allow NULL values (a sparse column requirement).
Column sets provide a solution to the difficulties of efficient imports and exports of sparse data. The untyped XML representation that is generated for each row interacts directly with each sparse field. The XML field accepts inserts, and its data can be extracted and modified easily. This feature effectively supports certain types of serialized object data from various applications.
CREATE TABLE SPARSETABLE
(
    ID INT NOT NULL IDENTITY(1,1)
    ,FIRSTNAME VARCHAR(100) NOT NULL
    ,MIDDLENAME VARCHAR(100) SPARSE NULL
    ,LASTNAME VARCHAR(100) NOT NULL
    ,SPARSEDATASET XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO

INSERT SPARSETABLE (FIRSTNAME, MIDDLENAME, LASTNAME)
VALUES ('John', 'Daniel', 'Smith');
GO

INSERT SPARSETABLE (FIRSTNAME, LASTNAME)
VALUES ('Peter', 'Wilson');
GO

SELECT [ID]
    ,[FIRSTNAME]
    ,[LASTNAME]
    ,[SPARSEDATASET]
FROM [SPARSETABLE]
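Since MIDDLENAME is mostly NULL, a filtered index keeps searches on the sparse column efficient, as mentioned above. A minimal sketch (the index name is illustrative):

-- Index only the non-NULL rows, keeping the index small
CREATE NONCLUSTERED INDEX IX_SPARSETABLE_MIDDLENAME
ON SPARSETABLE (MIDDLENAME)
WHERE MIDDLENAME IS NOT NULL;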
Capture Metadata for Data Warehouses using SQL Server Extended Properties
SQL Server extended properties allow developers to attach metadata, such as a description or the data source, to a table or to a column within a table. This documents the information inside SQL Server itself and makes it queryable.
--Add Extended Property for a table
EXEC sp_addextendedproperty @name = N'Description',
    @value = N'Dimension with the list of all products.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS'

--Update Extended Property for the table
EXEC sp_updateextendedproperty @name = N'Description',
    @value = N'Dimension with the list of all available products.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS'
GO

--Add Extended Property for a column
EXEC sp_addextendedproperty @name = N'Description',
    @value = N'Name of the Product.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS',
    @level2type = N'COLUMN', @level2name = N'PRODUCT_NAME'

--Update Extended Property for a column
EXEC sp_updateextendedproperty @name = N'Description',
    @value = N'Name of the Product',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS',
    @level2type = N'COLUMN', @level2name = N'PRODUCT_NAME'
GO

--Query Extended Properties
SELECT objname, name, value
FROM fn_listextendedproperty(NULL, 'schema', 'dbo', 'table', default, NULL, NULL)

SELECT objtype, objname, name, value
FROM fn_listextendedproperty(NULL, 'schema', 'dbo', 'table', 'DIM_PRODUCTS', 'column', default);
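The same metadata can also be read from the sys.extended_properties catalog view; a sketch joining back to the object and column names:

SELECT t.name   AS TableName,
       c.name   AS ColumnName,   -- NULL for table-level properties
       ep.name  AS PropertyName,
       ep.value AS PropertyValue
FROM sys.extended_properties ep
JOIN sys.tables t
    ON ep.major_id = t.object_id
LEFT JOIN sys.columns c
    ON ep.major_id = c.object_id
   AND ep.minor_id = c.column_id
WHERE ep.class = 1; -- 1 = object or column properties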
Saturday, August 16, 2014
Package Configurations missing in SSIS 2012
I couldn't find the Package Configurations tab in SSIS 2012. This is because package configurations in SSIS 2012 are available only for the Package Deployment Model, not for the Project Deployment Model.
To get them back, convert the project to the Package Deployment Model by right-clicking the project and clicking Convert to Package Deployment Model.
Note that the conversion fails if a package's ProtectionLevel does not match the project's, with an error like: "Project consistency check failed. The following inconsistencies were detected: Test BIML [PackageName].dtsx has a different ProtectionLevel than the project." Align the ProtectionLevel settings before converting.
Monday, August 04, 2014
SSIS Best Practices 3 - Handling and Logging Data Load Errors
Normally you don't want an ETL load to fail because of data errors. For instance, if a file has 1,000 rows and 10 of them have issues, you don't want the load to fail: the package should successfully load the 990 good rows and redirect the 10 error rows to a separate table so the issues can be identified and fixed. For a flat file it is worthwhile to generate and include a row number, which helps locate the offending row in the file.
The error table should declare every source column as varchar(max), with additional columns such as ErrorCode, ErrorColumn, and ErrorDescription.
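A minimal sketch of such an error table (table and column names are illustrative):

CREATE TABLE dbo.LOAD_ERRORS
(
    ROW_NUMBER        VARCHAR(MAX) NULL, -- row number generated for flat files
    COLUMN_1          VARCHAR(MAX) NULL, -- one varchar(max) column per source column
    COLUMN_2          VARCHAR(MAX) NULL,
    ERROR_CODE        INT NULL,          -- populated from the error output
    ERROR_COLUMN      INT NULL,          -- lineage ID of the failing column
    ERROR_DESCRIPTION VARCHAR(MAX) NULL  -- derived in a Script Component (see below)
);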
Error rows are handled in the Data Flow through the use of the Error Row Configuration properties. These properties tell the Data Flow Components what to do when a row fails an operation, such as data conversion or truncation.
The Redirect Row setting means that when a row fails for any column marked as redirected, the entire row is sent down the red error path output, which normally connects to the error table destination. If the red error path output is not connected, the row is simply dropped from the Data Flow. The error output supplies only a numeric ErrorCode; a Script Component on the error path can translate it into a readable description:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // ErrDesc is an output column added to the Script Component;
    // GetErrorDescription translates the numeric error code into text
    Row.ErrDesc = ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}
Sunday, August 03, 2014
SSIS Best Practices 2 - Single Package vs Multiple Packages
When developing an ETL solution, the first question that comes to mind is whether to build a single package doing the end-to-end flow or multiple packages that together do the same.
As an architect, I have been challenged on this question many times by my development team, usually with the argument that the single-package model means fewer packages to maintain, whereas multiple packages create overhead simply through their number.
One Package = One Unit of Work
It is always best practice to have one package do only one thing. Doing so provides the following benefits.
- Less time spent waiting on design-time validation. SQL Server Data Tools has a rich interface that provides, among other things, a near real-time evaluation of potential metadata problems in the SSDT designer. If, for example, a table that is accessed by the SSIS package is changed, the developer will be presented with a warning (or an error, if applicable) in SSDT indicating that metadata accessed by the package has changed. This constant metadata validation is beneficial in that it can help to identify potential problems before they are pushed out for testing. There's also a performance cost associated with this. The length of time required for validation increases as the size of the package increases, so naturally keeping the packages as small as practical will cut down on the amount of time you're drumming your fingers on your desk waiting for validation to complete.
- Easier testing and deployment. A single package that loads, say, 10 dimensions has a lot of moving parts. When developing each of the dimensions within the package, there is no easy way to test just one element of the package (apart from manually running it in the SSDT designer, which isn't a completely realistic test for a package that will eventually be deployed to the server). The only realistic test for such a package would be to test the entire package as a server-based execution, which may be overkill if you're only interested in one or two changed properties. Further, it's not uncommon for organizations with formal software testing and promotion procedures to require that the entire thing be retested, not just the new or changed elements. By breaking up operations into smaller units, testing and deployment are usually less of a burden because you are only operating on one component at a time.
- Distributed development. If you work in an environment where you are the only person developing SSIS packages, this is less of a concern. However, if your shop has multiple ETL developers, those do-everything packages are quite inconvenient. Although it's much easier in SQL Server 2012 than in previous releases to compare differences between versions of the same package file, it's still a mostly manual process. By segmenting the workload into multiple packages, it's much easier to farm out development tasks to multiple people without having to reconcile multiple versions of the same package.
- Reusability. It's not uncommon for the same logic to be used more than once in the same ETL execution, or for that same logic to be shared among multiple ETL processes. By encapsulating these logical units of work into their own packages, it's much easier to share that logic and avoid duplicate development.
We also follow these design patterns:
- Each dimension has a separate package.
- Each fact table has a separate package.
- Staging operations (if used) each have their own package.
- Operations such as dropping or disabling indexes and truncating tables can be grouped together and typically reside in the same package.
Saturday, August 02, 2014
SSIS Best Practices 1 - Naming Conventions
Always name tasks and components with the following acronyms at the beginning of the name. Following these naming conventions facilitates ongoing maintenance and makes global changes easier, since package updates can be performed programmatically.
Task | Prefix |
For Loop Container | FLC |
Foreach Loop Container | FELC |
Sequence Container | SEQC |
ActiveX Script | AXS |
Analysis Services Execute DDL | ASE |
Analysis Services Processing | ASP |
Bulk Insert | BLK |
Data Flow | DFT |
Data Mining Query | DMQ |
Execute DTS 2000 Package | EDPT |
Execute Package | EPT |
Execute Process | EPR |
Execute SQL | SQL |
File System | FSYS |
FTP | FTP |
Message Queue | MSMQ |
Script | SCR |
Send Mail | SMT |
Transfer Database | TDB |
Transfer Error Messages | TEM |
Transfer Jobs | TJT |
Transfer Logins | TLT |
Transfer Master Stored Procedures | TSP |
Transfer SQL Server Objects | TSO |
Web Service | WST |
WMI Data Reader | WMID |
WMI Event Watcher | WMIE |
XML | XML |
Expression | EXPR |
Component | Prefix |
DataReader Source | DR_SRC |
Excel Source | EX_SRC |
Flat File Source | FF_SRC |
OLE DB Source | OLE_SRC |
Raw File Source | RF_SRC |
XML Source | XML_SRC |
Aggregate | AGG |
Audit | AUD |
Character Map | CHM |
Conditional Split | CSPL |
Copy Column | CPYC |
Data Conversion | DCNV |
Data Mining Query | DMQ |
Derived Column | DER |
Export Column | EXPC |
Fuzzy Grouping | FZG |
Fuzzy Lookup | FZL |
Import Column | IMPC |
Lookup | LKP |
Merge | MRG |
Merge Join | MRGJ |
Multicast | MLT |
OLE DB Command | CMD |
Percentage Sampling | PSMP |
Pivot | PVT |
Row Count | CNT |
Row Sampling | RSMP |
Script Component | SCR |
Slowly Changing Dimension | SCD |
Sort | SRT |
Term Extraction | TEX |
Term Lookup | TEL |
Union All | ALL |
Unpivot | UPVT |
Data Mining Model Training | DMMT_DST |
Data Reader Destination | DR_DST |
Dimension Processing | DP_DST |
Excel Destination | EX_DST |
Flat File Destination | FF_DST |
OLE DB Destination | OLE_DST |
Partition Processing | PP_DST |
Raw File Destination | RF_DST |
Recordset Destination | RS_DST |
SQL Server Destination | SS_DST |
SQL Server Mobile Destination | SSM_DST |
Note: Thanks to Jamie Thomson for the naming conventions.
Entity Framework 6 Features
Async Query and Save
Entity Framework 6 provides the ability to execute queries and commands asynchronously using DbContext.
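A minimal sketch, assuming the PersonEntities context used later in this post; FirstOrDefaultAsync and SaveChangesAsync come from EF6's System.Data.Entity namespace:

using System.Data.Entity;
using System.Threading.Tasks;

public static async Task UpdateLastNameAsync()
{
    using (var context = new PersonEntities())
    {
        // The query is sent to the database without blocking the caller
        var person = await context.People.FirstOrDefaultAsync(p => p.FirstName == "John");
        if (person != null)
        {
            person.LastName = "Smith";
            // The UPDATE is likewise issued asynchronously
            await context.SaveChangesAsync();
        }
    }
}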
Code based Configuration
Entity Framework configuration that is usually set in the app.config section can now be set programmatically by deriving from the System.Data.Entity.DbConfiguration class.
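A minimal sketch of such a class (the class name is illustrative; EF discovers DbConfiguration subclasses placed in the same assembly as the DbContext):

using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Data.Entity.SqlServer;

public class MyDbConfiguration : DbConfiguration
{
    public MyDbConfiguration()
    {
        // Code equivalent of the <defaultConnectionFactory> app.config element
        SetDefaultConnectionFactory(new LocalDbConnectionFactory("mssqllocaldb"));
        // Code equivalent of registering a provider in the <providers> element
        SetProviderServices("System.Data.SqlClient", SqlProviderServices.Instance);
    }
}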
Db Command Logging
EF6 provides the ability to log all database activity by setting context.Database.Log:
using (var context = new PersonEntities())
{
    context.Database.Log = Console.Write;
    var person = context.People.FirstOrDefault(p => p.FirstName == "John");
    if (person != null)
        person.LastName = "Smith";
    context.SaveChanges();
}
Command Interception
Starting with EF 6.1, interceptors can be registered in the config file. Interceptors allow you to run additional logic when EF performs certain operations, such as executing database queries or opening connections. For example, the built-in logging interceptor can append to a log file if the file already exists.
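A sketch of that config-file registration, using the built-in DatabaseLogger interceptor (the file path is illustrative; the second parameter requests appending when the file already exists):

<interceptors>
  <interceptor type="System.Data.Entity.Infrastructure.Interception.DatabaseLogger, EntityFramework">
    <parameters>
      <parameter value="C:\Temp\LogOutput.txt"/>
      <parameter value="true" type="System.Boolean"/>
    </parameters>
  </interceptor>
</interceptors>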
Transactions
EF 6 introduced Database.BeginTransaction and Database.UseTransaction to provide more control over transactions. In previous versions of Entity Framework, whenever you executed SaveChanges() for create, update, or delete operations, the work was wrapped in a transaction that lasted only until the operation completed; a new transaction started once another operation was initiated.
Database.BeginTransaction() – an easy way for a user to start and complete a transaction themselves on an existing DbContext, allowing several operations to be combined within the same transaction and hence either all committed or all rolled back as one. It also lets the user specify the isolation level for the transaction.
Database.UseTransaction() – allows the DbContext to use a transaction that was started outside of Entity Framework.
using (System.Data.Entity.DbContextTransaction dbTran = context.Database.BeginTransaction())
{
    try
    {
        Person person = new Person() { FirstName = "John" };
        context.Persons.Add(person);
        context.Database.ExecuteSqlCommand(
            @"UPDATE Person SET FirstName = 'James' WHERE PersonID = 1");
        context.Persons.Remove(person);

        // saves all above operations within one transaction
        context.SaveChanges();

        // commit transaction
        dbTran.Commit();
    }
    catch (Exception ex)
    {
        // rollback transaction if an exception occurs
        dbTran.Rollback();
    }
}
DbSet.AddRange and DbSet.RemoveRange
DbSet in EF6 introduces the new methods AddRange and RemoveRange: DbSet.AddRange adds a collection of entities to the DbContext, and DbSet.RemoveRange removes a collection of entities from the DbSet.
IList<Person> persons = new List<Person>();
persons.Add(new Person() { FirstName = "Person 1" });
persons.Add(new Person() { FirstName = "Person 2" });
persons.Add(new Person() { FirstName = "Person 3" });

using (var context = new PersonEntities())
{
    // Adds the whole collection in one call
    context.People.AddRange(persons);
    context.SaveChanges();
}

using (var context = new PersonEntities())
{
    IList<Person> personList = context.People.Where(p => p.FirstName == "Person 1").ToList();
    // Removes the whole collection in one call
    context.People.RemoveRange(personList);
    context.SaveChanges();
}
Custom Conventions
EF6 provides the ability to define your own custom conventions, which become the default behavior for your models. There are two types of conventions: configuration conventions and model conventions.
Configuration Convention
Configuration conventions are a way to configure a set of objects without overriding the configuration provided by the Fluent API. We can define such a convention inside the OnModelCreating method, or in a class that inherits from Convention.
Suppose you want every DateTime property across all tables in the model to map to the datetime2 data type. This is easily achieved with a configuration convention, either by overriding the OnModelCreating method or with a separate class that inherits from Convention.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Applies to every DateTime property in the model
    modelBuilder.Properties<DateTime>()
        .Configure(c => c.HasColumnType("datetime2"));
}
Using Convention class
class CustomDateTimeConvention : Convention
{
    public CustomDateTimeConvention()
    {
        // Applies to every DateTime property in the model
        Properties<DateTime>()
            .Configure(c => c.HasColumnType("datetime2"));
    }
}
The convention class can then be registered in the OnModelCreating method of the DbContext using modelBuilder.Conventions.Add:
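protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Register the custom convention so it applies to the entire model
    modelBuilder.Conventions.Add(new CustomDateTimeConvention());
}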
Model Convention
Model conventions are based on the underlying model metadata. There are two types of model conventions, Conceptual (C-Space) and Store (S-Space); this distinction indicates where in the pipeline a convention is executed. A C-Space convention is applied to the model that the application builds, whereas an S-Space convention is applied to the version of the model that represents the database.
Creating a model convention requires a class that implements either IConceptualModelConvention<T> or IStoreModelConvention<T>. Both interfaces accept a generic type parameter deriving from MetadataItem, which filters the type of metadata the convention applies to.
public class ModelBasedConvention : IStoreModelConvention<EdmProperty>
{
    public void Apply(EdmProperty property, System.Data.Entity.Infrastructure.DbModel model)
    {
        // Give decimal properties a default scale of 4 when none is specified
        if (property.PrimitiveType.PrimitiveTypeKind == PrimitiveTypeKind.Decimal
            && property.Scale == null)
        {
            property.Scale = 4;
        }
    }
}
Stored Procedure Mapping
Entity Framework 6 Code First provides the ability to create and use stored procedures for add, update, and delete operations. To create the stored procedures for the Person entity automatically:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<Person>()
        .MapToStoredProcedures();
}
This creates three procedures: Person_Insert, Person_Update, and Person_Delete. Person_Insert and Person_Update have parameters named after the entity's properties; Person_Delete takes the primary key property, PersonID, as its parameter.
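If the default names don't suit, the Fluent API can override them; a sketch (the procedure names here are illustrative):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Override the default Person_Insert/Person_Update/Person_Delete names
    modelBuilder.Entity<Person>()
        .MapToStoredProcedures(s => s
            .Insert(i => i.HasName("InsertPerson"))
            .Update(u => u.HasName("UpdatePerson"))
            .Delete(d => d.HasName("DeletePerson")));
}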
Limitations:
- Only the Fluent API can be used to map stored procedures; you cannot use data annotation attributes in EF 6 for stored procedure mapping.
- You cannot mix stored procedures and queries for add, update, and delete operations on the same entity; you must use either stored procedures or SQL queries for all three operations on that entity.
Index Attribute
Entity Framework 6 provides the Index attribute to create an index on a particular column in the database:
class Person
{
    public Person() { }

    public int PersonID { get; set; }
    public string FirstName { get; set; }

    [Index]
    public DateTime DateOfBirth { get; set; }
}
By default, the index is named IX_{property name}, but this can be changed, along with the index options:
[Index( "INDEX_DOB", IsClustered=true, IsUnique=true )] public datetime DateOfBirth { get; set; }