Anupama Natarajan: Data Platform Tips 31 - Azure Data Lake Storage Gen2 and Azure Data Factory v2

Saturday, December 28, 2019

Data Platform Tips 31 - Azure Data Lake Storage Gen2 and Azure Data Factory v2

Azure Data Factory (ADF) v2 is a cloud-based data integration service that lets you populate Azure Data Lake Storage Gen2 with data from on-premises, cloud, or SaaS data stores. In this post, let us look at how to copy data from an API (a public OData sample service) into Azure Data Lake Storage Gen2.
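The walkthrough below points the Copy Data task at the public OData sample service. Before building the pipeline, it can help to look at what that source returns. The following is a minimal Python sketch for that, not part of the ADF setup itself. It assumes the sample service accepts the standard OData query options $format and $top and exposes an entity set named "Products"; the function names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Service URL used later in the walkthrough.
SERVICE_URL = "https://services.odata.org/OData/OData.svc"


def build_query_url(service_url, entity_set="", top=None):
    """Build an OData URL that asks for JSON, optionally limited to `top` rows."""
    params = {"$format": "json"}
    if top is not None:
        params["$top"] = str(top)
    base = service_url.rstrip("/")
    path = f"{base}/{entity_set}" if entity_set else f"{base}/"
    # Keep "$" unescaped so the query options stay readable.
    return f"{path}?{urlencode(params, safe='$')}"


def fetch_json(url, timeout=30):
    """Download a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # "Products" is an assumed entity set name; adjust to what the service lists.
    url = build_query_url(SERVICE_URL, "Products", top=2)
    print(url)
    print(json.dumps(fetch_json(url), indent=2))
```

Running the script prints the request URL and the first two records, which gives a rough idea of the columns that will end up in the output files.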

a) Log on to the Azure Portal.

b) Provision an Azure Data Lake Storage Gen2 service.

c) Search for "Azure Data Factory" in the Marketplace and select "Data Factory".

d) Provision the "Azure Data Factory v2" service.

e) Once the service is provisioned, you can author your pipelines within Azure Data Factory by clicking "Author & Monitor".

f) Select "Copy Data" to copy data from an API to Azure Data Lake Storage Gen2.

g) Provide a name for the "Copy Data" task.

h) For the Source data store, select the "Generic Protocol" category, choose "OData", and create a new Linked Service.

i) Set the Service URL to "https://services.odata.org/OData/OData.svc" and the Authentication type to "Anonymous".

j) Select the source data (the OData entity set) to copy from the API.

k) Create an "Azure Data Lake Storage Gen2" Destination and select the folder path under which the files need to be uploaded.

l) Select the file format as "Parquet" and compression type as "snappy".

m) Now complete the settings and finish the configuration.

n) Once the pipeline has run successfully, the API output is written as Parquet files to Azure Data Lake Storage Gen2. You can verify this with Azure Storage Explorer.
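The source, destination, and format choices made in the steps above end up as JSON definitions inside the data factory. The sketch below builds roughly equivalent definitions as Python dictionaries so they can be inspected or printed. It is an approximation, not an export from the wizard: all resource names ("ODataSampleService", "AdlsGen2Storage", "ODataSource1", "ParquetOutput", "CopyODataToParquet"), the entity set "Products", the file system "raw", and the folder "odata/products" are hypothetical, and the Data Lake linked service is only referenced by name.

```python
import json

# Service URL from the walkthrough.
ODATA_URL = "https://services.odata.org/OData/OData.svc"


def reference(name, ref_type):
    """Small helper for ADF-style references to other resources."""
    return {"referenceName": name, "type": ref_type}


def build_copy_definitions(entity_path, file_system, folder_path):
    """Return ADF-style definitions (as dicts) for an OData-to-Parquet copy."""
    linked_service = {
        "name": "ODataSampleService",  # hypothetical name
        "properties": {
            "type": "OData",
            "typeProperties": {
                "url": ODATA_URL,
                "authenticationType": "Anonymous",
            },
        },
    }
    source_dataset = {
        "name": "ODataSource1",  # hypothetical name
        "properties": {
            "type": "ODataResource",
            "linkedServiceName": reference("ODataSampleService", "LinkedServiceReference"),
            "typeProperties": {"path": entity_path},
        },
    }
    sink_dataset = {
        "name": "ParquetOutput",  # hypothetical name
        "properties": {
            "type": "Parquet",
            # "AdlsGen2Storage" stands in for the Data Lake linked service.
            "linkedServiceName": reference("AdlsGen2Storage", "LinkedServiceReference"),
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": file_system,
                    "folderPath": folder_path,
                },
                "compressionCodec": "snappy",
            },
        },
    }
    copy_activity = {
        "name": "CopyODataToParquet",  # hypothetical name
        "type": "Copy",
        "inputs": [reference("ODataSource1", "DatasetReference")],
        "outputs": [reference("ParquetOutput", "DatasetReference")],
        "typeProperties": {
            "source": {"type": "ODataSource"},
            "sink": {"type": "ParquetSink"},
        },
    }
    return {
        "linkedService": linked_service,
        "sourceDataset": source_dataset,
        "sinkDataset": sink_dataset,
        "copyActivity": copy_activity,
    }


if __name__ == "__main__":
    definitions = build_copy_definitions("Products", "raw", "odata/products")
    print(json.dumps(definitions, indent=2))
```

Comparing this output with the JSON shown in the "Author" view of the data factory is a quick way to confirm that the Parquet format and snappy compression were picked up as intended.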
