Saturday, August 30, 2014

SPARSE Columns and ColumnSets

I had a scenario where I had to load a csv file which had around 2000 columns. Then I started looking at the wide table feature which came as part of SQL Server 2008. A wide table can have 30,000 columns as opposed to a non-wide table which can have 1024 columns.

Having 30,000 or even 1,024 columns in a table is a lot of columns and design review needs to be done for these type of scenarios. In my scenario it is another system which generates this file and the business wants an ability to query it easily via a single table. More or less just for analysis purposes.

Sparse columns are designed to allow a relational database to store and process relatively unstructured data, where any individual entity may have a modest selection from a very wide set of attributes. This sort of data can be found frequently in scientific, pharmaceutical, and medical applications. EAV (Entity/Attribute/Value) storage techniques were engineered for such data; however, EAV data storage has proved to be somewhat inefficient in such areas as query speed and storage cost.

Sparse columns work well where a high proportion of the data in a column is NULL, because negligible storage space is used for the NULL representation. A table that is wide in theory can be made narrow in practice by implementing sparse columns, and the addition of filtered indexes provides effective searches on the sparse data. Sparse columns support the use of check constraints, as long as the constraints allow NULL values (a sparse column requirement).

Column sets provide a solution to the difficulties of efficient imports and exports of sparse data. The untyped XML representation that is generated for each row interacts directly with each sparse field. The XML field accepts inserts, and its data can be extracted and modified easily. This feature effectively supports certain types of serialized object data from various applications.

CREATE TABLE SPARSETABLE
(
 ID INT NOT NULL IDENTITY(1,1)
 ,FIRSTNAME VARCHAR(100) NOT NULL
 ,MIDDLENAME VARCHAR(100) SPARSE NULL
 ,LASTNAME VARCHAR(100) NOT NULL
 ,SPARSEDATASET XML COLUMN_SET FOR ALL_SPARSE_COLUMNS
)

INSERT SPARSETABLE(FIRSTNAME, MIDDLENAME, LASTNAME)
VALUES ('John','Daniel','Smith');
GO

INSERT SPARSETABLE(FIRSTNAME, LASTNAME)
VALUES ('Peter','Wilson');
GO


SELECT [ID]
      ,[FIRSTNAME]
      ,[LASTNAME]
      ,[SPARSEDATASET]
  FROM [SPARSETABLE]

No comments:

Post a Comment