Sunday, August 31, 2014
SSRS Preview in VS 2013: There was no endpoint listening at net.pipe://localhost/PreviewProcessingService4440/ReportProcessing that could accept the message.
Saturday, August 30, 2014
SPARSE Columns and ColumnSets
I had a scenario where I had to load a CSV file with around 2,000 columns. This led me to the wide table feature introduced in SQL Server 2008: a wide table can have up to 30,000 columns, as opposed to the 1,024-column limit of a non-wide table.
Having 30,000, or even 1,024, columns in a table is a lot, and a design review should be done for these types of scenarios. In my case another system generates the file, and the business wants the ability to query it easily via a single table, more or less just for analysis purposes.
Sparse columns are designed to allow a relational database to store and process relatively unstructured data, where any individual entity may have a modest selection from a very wide set of attributes. This sort of data can be found frequently in scientific, pharmaceutical, and medical applications. EAV (Entity/Attribute/Value) storage techniques were engineered for such data; however, EAV data storage has proved to be somewhat inefficient in such areas as query speed and storage cost.
Sparse columns work well where a high proportion of the data in a column is NULL, because negligible storage space is used for the NULL representation. A table that is wide in theory can be made narrow in practice by implementing sparse columns, and the addition of filtered indexes provides effective searches on the sparse data. Sparse columns support the use of check constraints, as long as the constraints allow NULL values (a sparse column requirement).
Column sets provide a solution to the difficulties of efficient imports and exports of sparse data. The untyped XML representation that is generated for each row interacts directly with each sparse field. The XML field accepts inserts, and its data can be extracted and modified easily. This feature effectively supports certain types of serialized object data from various applications.
CREATE TABLE SPARSETABLE
(
    ID INT NOT NULL IDENTITY(1,1)
    ,FIRSTNAME VARCHAR(100) NOT NULL
    ,MIDDLENAME VARCHAR(100) SPARSE NULL
    ,LASTNAME VARCHAR(100) NOT NULL
    ,SPARSEDATASET XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)
GO

INSERT SPARSETABLE (FIRSTNAME, MIDDLENAME, LASTNAME)
VALUES ('John', 'Daniel', 'Smith');
GO

INSERT SPARSETABLE (FIRSTNAME, LASTNAME)
VALUES ('Peter', 'Wilson');
GO

SELECT [ID]
    ,[FIRSTNAME]
    ,[LASTNAME]
    ,[SPARSEDATASET]
FROM [SPARSETABLE]
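Since MIDDLENAME is mostly NULL, a filtered index keeps searches on the sparse column efficient, as mentioned above. A minimal sketch (the index name is illustrative):

-- Index only the non-NULL rows, keeping the index small
CREATE NONCLUSTERED INDEX IX_SPARSETABLE_MIDDLENAME
ON SPARSETABLE (MIDDLENAME)
WHERE MIDDLENAME IS NOT NULL;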
Capture Metadata for Data Warehouses using SQL Server Extended Properties
SQL Server extended properties allow developers to attach metadata, such as a description or the data source, to a table or to a column within a table. This documents the information inside SQL Server itself and makes it queryable.
--Add Extended Property for a table
EXEC sp_addextendedproperty @name = N'Description',
    @value = N'Dimension with the list of all products.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS'

--Update Extended Property for the table
EXEC sp_updateextendedproperty @name = N'Description',
    @value = N'Dimension with the list of all available products.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS'
GO

--Add Extended Property for a column
EXEC sp_addextendedproperty @name = N'Description',
    @value = N'Name of the Product.',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS',
    @level2type = N'COLUMN', @level2name = N'PRODUCT_NAME'

--Update Extended Property for a column
EXEC sp_updateextendedproperty @name = N'Description',
    @value = N'Name of the Product',
    @level0type = N'SCHEMA', @level0name = N'dbo',
    @level1type = N'TABLE', @level1name = N'DIM_PRODUCTS',
    @level2type = N'COLUMN', @level2name = N'PRODUCT_NAME'
GO

--Query Extended Properties
SELECT objname, name, value
FROM fn_listextendedproperty(NULL, 'schema', 'dbo', 'table', default, NULL, NULL)

SELECT objtype, objname, name, value
FROM fn_listextendedproperty(NULL, 'schema', 'dbo', 'table', 'DIM_PRODUCTS', 'column', default);
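The same metadata can also be read from the sys.extended_properties catalog view; a sketch joining back to the object and column names:

SELECT t.name   AS TableName,
       c.name   AS ColumnName,   -- NULL for table-level properties
       ep.name  AS PropertyName,
       ep.value AS PropertyValue
FROM sys.extended_properties ep
JOIN sys.tables t
    ON ep.major_id = t.object_id
LEFT JOIN sys.columns c
    ON ep.major_id = c.object_id
   AND ep.minor_id = c.column_id
WHERE ep.class = 1; -- 1 = object or column properties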
Saturday, August 16, 2014
Package Configurations missing in SSIS 2012
I couldn't find the Package Configurations tab in SSIS 2012. This is because package configurations in SSIS 2012 are available only for the Package Deployment Model, not for the Project Deployment Model.
To get them back, convert the project to the Package Deployment Model by right-clicking the project and clicking Convert to Package Deployment Model.
Note that the conversion fails if a package's ProtectionLevel does not match the project's, with an error like: "Project consistency check failed. The following inconsistencies were detected: Test BIML [PackageName].dtsx has a different ProtectionLevel than the project." Align the ProtectionLevel settings before converting.
Monday, August 04, 2014
SSIS Best Practices 3 - Handling and Logging Data Load Errors
Normally you don't want an ETL load to fail because of data errors. For instance, if a file has 1,000 rows and 10 of them have issues, you don't want the load to fail: the package should successfully load the 990 good rows and redirect the 10 error rows to a separate table so the issues can be identified and fixed. For a flat file it is worthwhile to generate and include a row number, which helps locate the offending row in the file.
The error table should declare every source column as varchar(max), with additional columns such as ErrorCode, ErrorColumn, and ErrorDescription.
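A minimal sketch of such an error table (table and column names are illustrative):

CREATE TABLE dbo.LOAD_ERRORS
(
    ROW_NUMBER        VARCHAR(MAX) NULL, -- row number generated for flat files
    COLUMN_1          VARCHAR(MAX) NULL, -- one varchar(max) column per source column
    COLUMN_2          VARCHAR(MAX) NULL,
    ERROR_CODE        INT NULL,          -- populated from the error output
    ERROR_COLUMN      INT NULL,          -- lineage ID of the failing column
    ERROR_DESCRIPTION VARCHAR(MAX) NULL  -- derived in a Script Component (see below)
);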
Error rows are handled in the Data Flow through the use of the Error Row Configuration properties. These properties tell the Data Flow Components what to do when a row fails an operation, such as data conversion or truncation.
The Redirect Row setting means that when a row fails for any column marked as redirected, the entire row is sent down the red error path output, which normally connects to the error table destination. If the red error path output is not connected, the row is simply dropped from the Data Flow. The error output supplies only a numeric ErrorCode; a Script Component on the error path can translate it into a readable description:
public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // ErrDesc is an output column added to the Script Component;
    // GetErrorDescription translates the numeric error code into text
    Row.ErrDesc = ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}
Sunday, August 03, 2014
SSIS Best Practices 2 - Single Package vs Multiple Packages
When developing an ETL solution, the first question that comes to mind is whether to build a single package doing the end-to-end flow or multiple packages that together do the same.
As an architect, I have been challenged on this question many times by my development team, usually with the argument that the single-package model means fewer packages to maintain, whereas multiple packages create overhead simply through their number.
One Package = One Unit of Work
It is always best practice to have one package do only one thing. Doing so provides the following benefits.
- Less time spent waiting on design-time validation. SQL Server Data Tools has a rich interface that provides, among other things, a near real-time evaluation of potential metadata problems in the SSDT designer. If, for example, a table that is accessed by the SSIS package is changed, the developer will be presented with a warning (or an error, if applicable) in SSDT indicating that metadata accessed by the package has changed. This constant metadata validation is beneficial in that it can help to identify potential problems before they are pushed out for testing. There's also a performance cost associated with this. The length of time required for validation increases as the size of the package increases, so naturally keeping the packages as small as practical will cut down on the amount of time you're drumming your fingers on your desk waiting for validation to complete.
- Easier testing and deployment. A single package that loads, say, 10 dimensions has a lot of moving parts. When developing each of the dimensions within the package, there is no easy way to test just one element of the package (apart from manually running it in the SSDT designer, which isn't a completely realistic test for a package that will eventually be deployed to the server). The only realistic test for such a package would be to test the entire package as a server-based execution, which may be overkill if you're only interested in one or two changed properties. Further, it's not uncommon for organizations with formal software testing and promotion procedures to require that the entire thing be retested, not just the new or changed elements. By breaking up operations into smaller units, testing and deployment are usually less of a burden because you are only operating on one component at a time.
- Distributed development. If you work in an environment where you are the only person developing SSIS packages, this is less of a concern. However, if your shop has multiple ETL developers, those do-everything packages are quite inconvenient. Although it's much easier in SQL Server 2012 than in previous releases to compare differences between versions of the same package file, it's still a mostly manual process. By segmenting the workload into multiple packages, it's much easier to farm out development tasks to multiple people without having to reconcile multiple versions of the same package.
- Reusability. It's not uncommon for the same logic to be used more than once in the same ETL execution, or for that same logic to be shared among multiple ETL processes. By encapsulating these logical units of work into their own packages, it's much easier to share that logic and avoid duplicate development.
We also follow these design patterns:
- Each dimension has a separate package.
- Each fact table has a separate package.
- Staging operations (if used) each have their own package.
- Operations such as dropping or disabling indexes and truncating tables can be grouped together and typically reside in the same package.
Saturday, August 02, 2014
SSIS Best Practices 1 - Naming Conventions
Always name tasks and components with the following acronyms at the beginning of the name. Following these naming conventions facilitates ongoing maintenance and makes global changes easier, since package updates can be performed programmatically.
Task | Prefix |
For Loop Container | FLC |
Foreach Loop Container | FELC |
Sequence Container | SEQC |
ActiveX Script | AXS |
Analysis Services Execute DDL | ASE |
Analysis Services Processing | ASP |
Bulk Insert | BLK |
Data Flow | DFT |
Data Mining Query | DMQ |
Execute DTS 2000 Package | EDPT |
Execute Package | EPT |
Execute Process | EPR |
Execute SQL | SQL |
File System | FSYS |
FTP | FTP |
Message Queue | MSMQ |
Script | SCR |
Send Mail | SMT |
Transfer Database | TDB |
Transfer Error Messages | TEM |
Transfer Jobs | TJT |
Transfer Logins | TLT |
Transfer Master Stored Procedures | TSP |
Transfer SQL Server Objects | TSO |
Web Service | WST |
WMI Data Reader | WMID |
WMI Event Watcher | WMIE |
XML | XML |
Expression | EXPR |
Component | Prefix |
DataReader Source | DR_SRC |
Excel Source | EX_SRC |
Flat File Source | FF_SRC |
OLE DB Source | OLE_SRC |
Raw File Source | RF_SRC |
XML Source | XML_SRC |
Aggregate | AGG |
Audit | AUD |
Character Map | CHM |
Conditional Split | CSPL |
Copy Column | CPYC |
Data Conversion | DCNV |
Data Mining Query | DMQ |
Derived Column | DER |
Export Column | EXPC |
Fuzzy Grouping | FZG |
Fuzzy Lookup | FZL |
Import Column | IMPC |
Lookup | LKP |
Merge | MRG |
Merge Join | MRGJ |
Multicast | MLT |
OLE DB Command | CMD |
Percentage Sampling | PSMP |
Pivot | PVT |
Row Count | CNT |
Row Sampling | RSMP |
Script Component | SCR |
Slowly Changing Dimension | SCD |
Sort | SRT |
Term Extraction | TEX |
Term Lookup | TEL |
Union All | ALL |
Unpivot | UPVT |
Data Mining Model Training | DMMT_DST |
Data Reader Destination | DR_DST |
Dimension Processing | DP_DST |
Excel Destination | EX_DST |
Flat File Destination | FF_DST |
OLE DB Destination | OLE_DST |
Partition Processing | PP_DST |
Raw File Destination | RF_DST |
Recordset Destination | RS_DST |
SQL Server Destination | SS_DST |
SQL Server Mobile Destination | SSM_DST |
Note: Thanks to Jamie Thomson for the naming conventions.
Entity Framework 6 Features
Async Query and Save
Entity Framework 6 provides the ability to execute queries and commands asynchronously using DbContext.
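A minimal sketch, assuming the PersonEntities context used later in this post; FirstOrDefaultAsync and SaveChangesAsync come from EF6's System.Data.Entity namespace:

using System.Data.Entity;
using System.Threading.Tasks;

public static async Task UpdateLastNameAsync()
{
    using (var context = new PersonEntities())
    {
        // The query is sent to the database without blocking the caller
        var person = await context.People.FirstOrDefaultAsync(p => p.FirstName == "John");
        if (person != null)
        {
            person.LastName = "Smith";
            // The UPDATE is likewise issued asynchronously
            await context.SaveChangesAsync();
        }
    }
}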
Code based Configuration
Entity Framework configuration that is usually set in the app.config section can now be set programmatically by deriving from the System.Data.Entity.DbConfiguration class.
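A minimal sketch of such a class (the class name is illustrative; EF discovers DbConfiguration subclasses placed in the same assembly as the DbContext):

using System.Data.Entity;
using System.Data.Entity.Infrastructure;
using System.Data.Entity.SqlServer;

public class MyDbConfiguration : DbConfiguration
{
    public MyDbConfiguration()
    {
        // Code equivalent of the <defaultConnectionFactory> app.config element
        SetDefaultConnectionFactory(new LocalDbConnectionFactory("mssqllocaldb"));
        // Code equivalent of registering a provider in the <providers> element
        SetProviderServices("System.Data.SqlClient", SqlProviderServices.Instance);
    }
}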
Db Command Logging
EF6 provides the ability to log all database activity by setting context.Database.Log:
using (var context = new PersonEntities())
{
    context.Database.Log = Console.Write;
    var person = context.People.FirstOrDefault(p => p.FirstName == "John");
    if (person != null)
        person.LastName = "Smith";
    context.SaveChanges();
}
Command Interception
Starting with EF 6.1, interceptors can be registered in the config file. Interceptors allow you to run additional logic when EF performs certain operations, such as executing database queries or opening connections. For example, the built-in logging interceptor can append to a log file if the file already exists.
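A sketch of that config-file registration, using the built-in DatabaseLogger interceptor (the file path is illustrative; the second parameter requests appending when the file already exists):

<interceptors>
  <interceptor type="System.Data.Entity.Infrastructure.Interception.DatabaseLogger, EntityFramework">
    <parameters>
      <parameter value="C:\Temp\LogOutput.txt"/>
      <parameter value="true" type="System.Boolean"/>
    </parameters>
  </interceptor>
</interceptors>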
Transactions
EF 6 introduced Database.BeginTransaction and Database.UseTransaction to provide more control over transactions. In previous versions of Entity Framework, whenever you executed SaveChanges() for create, update, or delete operations, the work was wrapped in a transaction that lasted only until the operation completed; a new transaction started once another operation was initiated.
Database.BeginTransaction() – an easy way for a user to start and complete a transaction themselves on an existing DbContext, allowing several operations to be combined within the same transaction and hence either all committed or all rolled back as one. It also lets the user specify the isolation level for the transaction.
Database.UseTransaction() – allows the DbContext to use a transaction that was started outside of Entity Framework.
using (System.Data.Entity.DbContextTransaction dbTran = context.Database.BeginTransaction())
{
    try
    {
        Person person = new Person() { FirstName = "John" };
        context.Persons.Add(person);
        context.Database.ExecuteSqlCommand(
            @"UPDATE Person SET FirstName = 'James' WHERE PersonID = 1");
        context.Persons.Remove(person);

        // saves all above operations within one transaction
        context.SaveChanges();

        // commit transaction
        dbTran.Commit();
    }
    catch (Exception ex)
    {
        // rollback transaction if an exception occurs
        dbTran.Rollback();
    }
}
DbSet.AddRange and DbSet.RemoveRange
DbSet in EF6 introduces the new methods AddRange and RemoveRange: DbSet.AddRange adds a collection of entities to the DbContext, and DbSet.RemoveRange removes a collection of entities from the DbSet.
IList<Person> persons = new List<Person>();
persons.Add(new Person() { FirstName = "Person 1" });
persons.Add(new Person() { FirstName = "Person 2" });
persons.Add(new Person() { FirstName = "Person 3" });

using (var context = new PersonEntities())
{
    // Adds the whole collection in one call
    context.People.AddRange(persons);
    context.SaveChanges();
}

using (var context = new PersonEntities())
{
    IList<Person> personList = context.People.Where(p => p.FirstName == "Person 1").ToList();
    // Removes the whole collection in one call
    context.People.RemoveRange(personList);
    context.SaveChanges();
}
Custom Conventions
EF6 provides the ability to define your own custom conventions, which become the default behavior for your models. There are two types of conventions: configuration conventions and model conventions.
Configuration Convention
Configuration conventions are a way to configure a set of objects without overriding the configuration provided by the Fluent API. We can define such a convention inside the OnModelCreating method, or in a class that inherits from Convention.
Suppose you want every DateTime property across all tables in the model to map to the datetime2 data type. This is easily achieved with a configuration convention, either by overriding the OnModelCreating method or with a separate class that inherits from Convention.
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Applies to every DateTime property in the model
    modelBuilder.Properties<DateTime>()
        .Configure(c => c.HasColumnType("datetime2"));
}
Using Convention class
class CustomDateTimeConvention : Convention
{
    public CustomDateTimeConvention()
    {
        // Applies to every DateTime property in the model
        Properties<DateTime>()
            .Configure(c => c.HasColumnType("datetime2"));
    }
}
The convention class can then be registered in the OnModelCreating method of the DbContext using modelBuilder.Conventions.Add:
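protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Register the custom convention so it applies to the entire model
    modelBuilder.Conventions.Add(new CustomDateTimeConvention());
}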
Model Convention
Model conventions are based on the underlying model metadata. There are two types of model conventions, Conceptual (C-Space) and Store (S-Space); this distinction indicates where in the pipeline a convention is executed. A C-Space convention is applied to the model that the application builds, whereas an S-Space convention is applied to the version of the model that represents the database.
Creating a model convention requires a class that implements either IConceptualModelConvention<T> or IStoreModelConvention<T>. Both interfaces accept a generic type parameter deriving from MetadataItem, which filters the type of metadata the convention applies to.
public class ModelBasedConvention : IStoreModelConvention<EdmProperty>
{
    public void Apply(EdmProperty property, System.Data.Entity.Infrastructure.DbModel model)
    {
        // Give decimal properties a default scale of 4 when none is specified
        if (property.PrimitiveType.PrimitiveTypeKind == PrimitiveTypeKind.Decimal
            && property.Scale == null)
        {
            property.Scale = 4;
        }
    }
}
Stored Procedure Mapping
Entity Framework 6 Code First provides the ability to create and use stored procedures for add, update, and delete operations. To create the stored procedures for the Person entity automatically:
protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    modelBuilder.Entity<Person>()
        .MapToStoredProcedures();
}
This creates three procedures: Person_Insert, Person_Update, and Person_Delete. Person_Insert and Person_Update have parameters named after the entity's properties; Person_Delete takes the primary key property, PersonID, as its parameter.
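If the default names don't suit, the Fluent API can override them; a sketch (the procedure names here are illustrative):

protected override void OnModelCreating(DbModelBuilder modelBuilder)
{
    // Override the default Person_Insert/Person_Update/Person_Delete names
    modelBuilder.Entity<Person>()
        .MapToStoredProcedures(s => s
            .Insert(i => i.HasName("InsertPerson"))
            .Update(u => u.HasName("UpdatePerson"))
            .Delete(d => d.HasName("DeletePerson")));
}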
Limitations:
- Only the Fluent API can be used to map stored procedures; you cannot use data annotation attributes in EF 6 for stored procedure mapping.
- You cannot mix stored procedures and queries for add, update, and delete operations on the same entity; you must use either stored procedures or SQL queries for all three operations on that entity.
Index Attribute
Entity Framework 6 provides the Index attribute to create an index on a particular column in the database:
class Person
{
    public Person() { }

    public int PersonID { get; set; }
    public string FirstName { get; set; }

    [Index]
    public DateTime DateOfBirth { get; set; }
}
By default, the index is named IX_{property name}, but this can be changed, along with the index options:
[Index( "INDEX_DOB", IsClustered=true, IsUnique=true )] public datetime DateOfBirth { get; set; }