Monday, January 20, 2020

Data Platform Tips 54 - Azure HDInsight

Azure HDInsight is a managed cloud service that offers a broad range of open-source analytics frameworks, including Apache Hadoop, Apache Spark, Apache Kafka, Apache Storm, Apache Hive and R.

Components and versions available on Azure HDInsight - https://docs.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning

Azure HDInsight supports a range of big data processing and analytics scenarios, such as batch processing, real-time stream processing, IoT, data science and other advanced analytics workloads.

[Figure: Azure HDInsight Ecosystem]

Cluster types supported in Azure HDInsight

  • Apache Hadoop - Framework that combines HDFS, YARN and the MapReduce programming model
  • Apache Spark - Open-source parallel, in-memory processing framework
  • Apache HBase - NoSQL database built on Hadoop
  • Apache Storm - Framework for processing large volumes of streaming data
  • Interactive Query - In-memory caching for interactive Hive queries (also known as Apache Hive LLAP)
  • Apache Kafka - Open-source platform for building streaming pipelines and applications
  • ML Services - Platform for hosting distributed R processes

Storage for Azure HDInsight clusters

Any of the following storage services can be used with Azure HDInsight clusters:

  • Azure Storage
  • Azure Data Lake Storage Gen2
  • Azure Data Lake Storage Gen1
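
Each of these storage services is addressed from a cluster through its own URI scheme. The sketch below is only an illustration of how those URIs are typically laid out; the helper name storage_uri and the account, container and path values are hypothetical and do not come from this post. It assumes the commonly documented wasbs://, abfss:// and adl:// schemes.

```python
def storage_uri(kind, account, container=None, path=""):
    """Build a storage URI for an HDInsight cluster (illustrative sketch).

    kind      -- "blob", "adls_gen2" or "adls_gen1" (labels invented for this example)
    account   -- storage account name (hypothetical value in the usage below)
    container -- blob container or Gen2 file system; not used for Gen1
    path      -- path inside the container or account
    """
    path = path.lstrip("/")
    if kind == "blob":
        # Azure Storage (Blob), accessed over TLS
        if not container:
            raise ValueError("Azure Storage requires a container name")
        return f"wasbs://{container}@{account}.blob.core.windows.net/{path}"
    if kind == "adls_gen2":
        # Azure Data Lake Storage Gen2, accessed over TLS
        if not container:
            raise ValueError("Data Lake Storage Gen2 requires a file system name")
        return f"abfss://{container}@{account}.dfs.core.windows.net/{path}"
    if kind == "adls_gen1":
        # Azure Data Lake Storage Gen1 has no container component
        return f"adl://{account}.azuredatalakestore.net/{path}"
    raise ValueError(f"unknown storage kind: {kind}")


# Hypothetical account and container names, for illustration only
print(storage_uri("blob", "mystorageacct", "data", "/logs/2020/01"))
print(storage_uri("adls_gen2", "mystorageacct", "data", "logs/2020/01"))
print(storage_uri("adls_gen1", "mydatalake", path="logs/2020/01"))
```

A URI of this form can then be used wherever the cluster expects a file location, for example as the input path of a Spark or Hive job.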
