Wednesday, December 25, 2019

Data Platform Tips 28 - Azure Data Lake Storage Gen 2

Azure Data Lake Storage is a repository that lets organisations store structured, semi-structured and unstructured data and run high-performance big data analytics on it.

Azure Data Lake Storage is available as both Gen 1 and Gen 2. Gen 2 is a superset of Gen 1 and combines the best of Azure Blob storage and Azure Data Lake Storage Gen 1 functionality. Azure Data Lake Storage Gen 2 is a "no-compromises" Data Lake: secure, performant and massively scalable storage that brings the cost and scale profile of object storage together with the capabilities of Data Lake Storage Gen 1.

Gen 2 uses the same low-cost storage model as Blob Storage. It integrates with other Azure services such as Azure Data Factory, Azure Synapse Analytics, Azure Databricks and Power BI. It provides a highly performant, HDFS-compatible file system that is capable of 1 TB/s throughput.

Azure Data Lake Gen 2 Architecture

The primary way of accessing data in Data Lake Storage Gen 2 is through the Hadoop file system interface. Gen 2 exposes this through a new driver, the "Azure Blob File System" (ABFS) driver, which is designed to support file system semantics over Azure Blob Storage. The ABFS driver is part of Apache Hadoop and is included in many commercial distributions of Hadoop.
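
Hadoop-based tools address data behind the ABFS driver with a URI of the form abfs[s]://<file_system>@<account_name>.dfs.core.windows.net/<path>. The short Python sketch below only assembles such a URI. The helper name abfs_uri and the account, file system and path values are made up for illustration.

```python
from urllib.parse import quote


def abfs_uri(file_system: str, account: str, path: str = "", secure: bool = True) -> str:
    """Build an ABFS URI for a path inside a Gen 2 file system (container).

    'abfss' is the TLS-secured variant of the scheme, 'abfs' the plain one.
    """
    scheme = "abfss" if secure else "abfs"
    # Drop a leading slash and percent-encode unsafe characters, keeping '/'.
    clean_path = quote(path.lstrip("/"))
    return f"{scheme}://{file_system}@{account}.dfs.core.windows.net/{clean_path}"


# Hypothetical names, for illustration only.
print(abfs_uri("rawdata", "mydatalake", "sales/2019/12/orders.csv"))
```

A URI built this way is what you would typically pass to a Hadoop or Spark job as its input or output location.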

Gen 2 achieves better file system performance through the hierarchical namespace. It allows the collection of objects to be organised into a hierarchy of directories and sub directories, while retaining the scalability and cost-effectiveness of object storage.
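
To see why a real directory hierarchy helps, consider renaming a folder. In a flat object store a "folder" is only a shared name prefix, so every object under it has to be rewritten, whereas with a hierarchical namespace the directory is a single entry that can be re-linked in one metadata operation. The following is a toy in-memory model of that difference, not how Azure implements it. Both functions and the sample data are invented for illustration.

```python
def rename_flat(objects: dict, old_prefix: str, new_prefix: str) -> int:
    """Flat namespace: rewrite the key of every object that shares the prefix.

    Returns the number of objects that had to be touched.
    """
    matching = [key for key in objects if key.startswith(old_prefix)]
    for key in matching:
        objects[new_prefix + key[len(old_prefix):]] = objects.pop(key)
    return len(matching)


def rename_hierarchical(parent: dict, old_name: str, new_name: str) -> int:
    """Hierarchical namespace: the directory is one node, so re-link one entry."""
    parent[new_name] = parent.pop(old_name)
    return 1


# The same three files, stored flat and stored as a directory tree.
flat = {"raw/a.csv": b"1", "raw/b.csv": b"2", "raw/2019/c.csv": b"3"}
tree = {"raw": {"a.csv": b"1", "b.csv": b"2", "2019": {"c.csv": b"3"}}}

flat_ops = rename_flat(flat, "raw/", "staged/")
tree_ops = rename_hierarchical(tree, "raw", "staged")
print(flat_ops, tree_ops)
```

The flat rename grows with the number of objects under the prefix, while the hierarchical rename stays a single step regardless of how much data the directory holds.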

Provisioning the ADLS Gen 2 service on the Azure Portal

a) Log on to the Azure Portal

b) Create a new resource group named "ADLSGen2"

c) Search for "Storage Account" in the Azure marketplace and select the Storage Account service.

d) Provide the details for the storage account.

e) Make sure to enable "Data Lake Storage Gen 2" in the "Advanced" tab.

f) Once provisioned, you can see the "Containers" section, under which you can create your file system to store the Data Lake Storage files.

g) You can now go ahead and create your file system, including folders and sub folders, to store your Data Lake Storage objects.
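
The last two steps can also be scripted instead of clicked through in the portal. Below is a minimal sketch that assumes the azure-storage-file-datalake Python package and authentication with a storage account key. The helper names, the file system name "datalake" and the folder layout are hypothetical.

```python
def connect(account_name: str, account_key: str):
    """Return a service client for the storage account's Data Lake (dfs) endpoint."""
    # Imported lazily so the layout helper below has no hard dependency on the SDK.
    from azure.storage.filedatalake import DataLakeServiceClient

    return DataLakeServiceClient(
        account_url=f"https://{account_name}.dfs.core.windows.net",
        credential=account_key,
    )


def create_lake_layout(service_client, file_system_name: str, layout: dict) -> list:
    """Create a file system, then the folders and sub folders described by `layout`.

    `layout` maps each top-level folder name to a list of sub folder names.
    Returns the folder paths that were created, in creation order.
    """
    file_system = service_client.create_file_system(file_system=file_system_name)
    created = []
    for folder, sub_folders in layout.items():
        directory = file_system.create_directory(folder)
        created.append(folder)
        for sub_folder in sub_folders:
            directory.create_sub_directory(sub_folder)
            created.append(f"{folder}/{sub_folder}")
    return created


# Example usage (placeholders must be replaced with real values):
# client = connect("<storage-account-name>", "<storage-account-key>")
# create_lake_layout(client, "datalake", {"raw": ["sales"], "curated": []})
```

Creating a file system that already exists raises an error, so a script like this is best suited to a first-time setup.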

