IoT Analytics and Lambda Architecture with Hadoop, Kafka and Druid [Sponsored Talk]
This talk is dedicated to the back-end architecture of IoT analytics solutions. We present a Lambda Architecture built entirely on an open-source technology stack. In this architecture, real-time data from IoT devices is ingested into an analytics platform, combined with historical data, and made available for fast querying and analysis. We use Kafka, Hadoop and column-oriented data stores (such as Druid and Kudu). We show what a solid, production-ready implementation of the Lambda Architecture looks like in the real world in 2018.
In short: we will introduce the Lambda Architecture, its use cases and the motivation behind it, and its realization with the Hadoop and Kafka ecosystems. In our architecture, Spark is the technology of choice for batch processing (batch layer), Hive and Impala provide SQL access (serving layer), and Kafka Streams handles real-time processing (speed layer). Druid and Kudu will be introduced, and we will demonstrate a simple business intelligence (OLAP) use case on Druid with event stream data.
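To make the split between the batch, speed and serving layers concrete, here is a minimal Python sketch that is not taken from the talk: plain in-memory counters stand in for Spark, Kafka Streams and Hive/Impala. The event format, the function names and the fixed `cutoff` timestamp (marking how far the last batch run got) are illustrative assumptions.

```python
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical IoT event record: (device_id, timestamp in seconds).
Event = Tuple[str, int]


def batch_view(events: Sequence[Event], cutoff: int) -> Counter:
    """Batch layer: recompute per-device counts over all events up to `cutoff`."""
    return Counter(device for device, ts in events if ts <= cutoff)


def speed_view(events: Sequence[Event], cutoff: int) -> Counter:
    """Speed layer: count only the events that arrived after the last batch run."""
    return Counter(device for device, ts in events if ts > cutoff)


def query(device: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    # Counter returns 0 for missing keys, so devices seen in only one view still work.
    return batch[device] + realtime[device]


if __name__ == "__main__":
    events = [("sensor-1", 10), ("sensor-2", 20), ("sensor-1", 90),
              ("sensor-1", 110), ("sensor-3", 120)]
    cutoff = 100  # timestamp up to which the last batch job has processed data
    b, r = batch_view(events, cutoff), speed_view(events, cutoff)
    print(query("sensor-1", b, r))  # 3: two from the batch view, one from the speed view
```

The batch view can be recomputed from scratch at any time, while the speed view only covers the gap since the last batch run. The serving layer hides that split from whoever issues the query.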
Join this session to get an overview of a full-blown analytics platform built on the Lambda Architecture, and to get inspired to design and implement your own IoT analytics solutions!
Prerequisites
* Basic knowledge of big data technologies and architectures, e.g. Hadoop and Kafka, is helpful.
* Some experience with the above-mentioned technology stack would be helpful.
* Knowledge of and experience with SQL and data warehousing will help in understanding the architecture.
Learning Objectives
* Lambda Architecture is a data-processing architecture designed to handle massive volumes of data by using both batch- and stream-processing methods.
* The Apache Hadoop ecosystem is a collection of open-source projects that make it possible to use clusters of computers to store and process massive amounts of data in a distributed fashion.
* Apache Kafka is a distributed streaming platform, built around a distributed log, for storing, processing and forwarding large volumes of real-time data.
* Druid is a column-oriented, distributed data store designed for ingesting real-time event data and for low-latency, interactive business intelligence (OLAP) use cases.
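Druid can optionally roll up events at ingestion time. Timestamps are truncated to a query granularity, and rows that share the same truncated timestamp and dimension values are pre-aggregated. The following Python sketch imitates that idea together with a column-oriented layout in memory. It is not Druid's API. The event schema, the 60-second granularity, the function names and the column names (apart from Druid's `__time` convention) are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# Hypothetical raw IoT event: (timestamp in seconds, device_id, temperature).
RawEvent = Tuple[int, str, float]


def rollup(events: List[RawEvent], granularity: int) -> Dict[str, list]:
    """Pre-aggregate events per (time bucket, device) and return one list per column."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    sums: Dict[Tuple[int, str], float] = defaultdict(float)
    for ts, device, temp in events:
        key = (ts - ts % granularity, device)  # truncate timestamp to its bucket
        counts[key] += 1
        sums[key] += temp
    columns: Dict[str, list] = {"__time": [], "device": [], "count": [], "temp_sum": []}
    for bucket, device in sorted(counts):
        columns["__time"].append(bucket)
        columns["device"].append(device)
        columns["count"].append(counts[(bucket, device)])
        columns["temp_sum"].append(sums[(bucket, device)])
    return columns


def avg_temperature(columns: Dict[str, list], device: str) -> Optional[float]:
    """OLAP-style query that reads only the device, count and temp_sum columns."""
    pairs = [(c, s) for d, c, s in zip(columns["device"], columns["count"], columns["temp_sum"])
             if d == device]
    total_count = sum(c for c, _ in pairs)
    return sum(s for _, s in pairs) / total_count if total_count else None


if __name__ == "__main__":
    raw = [(0, "s1", 20.0), (30, "s1", 22.0), (65, "s1", 24.0), (10, "s2", 18.0)]
    cols = rollup(raw, granularity=60)
    print(cols["count"])                 # [2, 1, 1]: four raw rows became three
    print(avg_temperature(cols, "s1"))   # 22.0
```

The sketch stores a count and a sum rather than a pre-computed average. That way averages stay correct when a query combines several time buckets.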