The software developer conference
on Internet of Things and Industry 4.0
Cologne, KOMED, June 4-6, 2018

building IoT 2018

// IoT Analytics and Lambda Architecture with Hadoop, Kafka and Druid [Sponsored Talk]

This talk is dedicated to the back-end architecture of IoT analytics solutions. We present a Lambda Architecture based on an entirely open-source technology stack. In this architecture, real-time data from IoT devices is ingested into an analytics platform, combined with historical data, and made available for fast querying and analysis. We use Kafka, Hadoop and column-oriented data stores (such as Druid and Kudu). We discuss what a solid, production-ready implementation of the Lambda Architecture looks like in the real world in 2018.

In short: We will introduce the Lambda Architecture, its use cases and the motivation behind it, and its realization using the Hadoop and Kafka ecosystems. In our architecture, Spark is the technology of choice for batch processing (batch layer), SQL via Hive and Impala for querying (serving layer), and Kafka Streams for real-time processing (speed layer). Druid and Kudu will be introduced, and we will demonstrate a simple business intelligence (OLAP) use case on Druid with event stream data.
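To make the division of labor between the layers concrete, here is a minimal sketch in plain Python. It is not the speakers' implementation and does not use Spark, Hive, Impala or Kafka Streams; the `Event` type, the function names and the sample data are all hypothetical stand-ins. It assumes the batch layer periodically recomputes a view over all events before a cutoff, the speed layer incrementally aggregates everything from the cutoff onward, and queries merge both views.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    device_id: str   # hypothetical IoT device identifier
    timestamp: int   # seconds since epoch
    value: float     # sensor reading


def build_batch_view(master_dataset, cutoff):
    """Batch layer: recompute per-device totals from all events before `cutoff`."""
    view = defaultdict(float)
    for e in master_dataset:
        if e.timestamp < cutoff:
            view[e.device_id] += e.value
    return dict(view)


class SpeedLayer:
    """Speed layer: incrementally aggregate events the batch view does not cover."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.view = defaultdict(float)

    def on_event(self, e):
        if e.timestamp >= self.cutoff:
            self.view[e.device_id] += e.value


def query_total(device_id, batch_view, speed_layer):
    """Query time: merge the batch view with the real-time view."""
    return batch_view.get(device_id, 0.0) + speed_layer.view.get(device_id, 0.0)


events = [
    Event("sensor-1", 100, 2.0),
    Event("sensor-1", 200, 3.0),
    Event("sensor-2", 250, 1.5),
    Event("sensor-1", 300, 4.0),  # arrives after the last batch run
]
cutoff = 300  # assumed timestamp of the last completed batch run

batch_view = build_batch_view(events, cutoff)
speed = SpeedLayer(cutoff)
for e in events:
    speed.on_event(e)

total_sensor_1 = query_total("sensor-1", batch_view, speed)
```

In this toy run the batch view holds the first two readings of `sensor-1` and the speed layer holds the last one, so the merged total covers all three. In a real deployment the cutoff would advance with every batch run and the real-time view would be discarded up to that point.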

Join this session to get an overview of a full-blown analytics platform in accordance with the Lambda Architecture, and to get inspired to deploy and implement your own IoT analytics solutions!

* Basic knowledge of big data technologies and architectures, e.g. Hadoop and Kafka, is helpful.
* Some experience with the above-mentioned technology stack would be helpful.
* Knowledge of and experience with SQL and data warehousing help in understanding the architecture.

* Lambda Architecture is a data-processing architecture for handling massive volumes of data by using both batch- and stream-processing methods.
* The Apache Hadoop ecosystem is a collection of open-source projects that facilitate using clusters of computers as a distributed system to store and process massive amounts of data.
* Apache Kafka is a streaming platform built on a distributed log, used for storing, processing and forwarding large volumes of real-time data.
* Druid is a column-oriented, distributed data store designed for ingesting real-time event data and for low-latency, interactive business intelligence/OLAP use cases.
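
As an illustration of the kind of OLAP aggregation over event data mentioned above, the following plain-Python sketch rolls events up into time buckets per dimension value. It does not use Druid or its API; the `roll_up` function, the field names and the sample events are hypothetical, and the approach is only a simplified analogue of pre-aggregating events by truncated timestamp and dimensions.

```python
from collections import defaultdict


def roll_up(events, granularity_s, dimensions, metric):
    """Group events by truncated timestamp and dimension values, then aggregate.

    Returns {(bucket_start, dim_value, ...): {"count": n, "sum": total}}.
    """
    table = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for e in events:
        # Truncate the timestamp to the start of its time bucket.
        bucket = e["timestamp"] - e["timestamp"] % granularity_s
        key = (bucket,) + tuple(e[d] for d in dimensions)
        row = table[key]
        row["count"] += 1
        row["sum"] += e[metric]
    return dict(table)


# Hypothetical sensor events; timestamps are seconds since an arbitrary origin.
events = [
    {"timestamp": 3600, "site": "cologne", "device": "a", "temperature": 20.0},
    {"timestamp": 3660, "site": "cologne", "device": "b", "temperature": 22.0},
    {"timestamp": 7205, "site": "cologne", "device": "a", "temperature": 21.0},
    {"timestamp": 7300, "site": "berlin", "device": "c", "temperature": 18.0},
]

hourly = roll_up(events, granularity_s=3600, dimensions=["site"], metric="temperature")
first_hour = hourly[(3600, "cologne")]
avg_cologne_first_hour = first_hour["sum"] / first_hour["count"]
```

Storing a count and a sum per bucket, rather than every raw event, keeps the table small while still allowing averages to be derived at query time.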

// Saba Fallah

Saba Fallah is the founder and CEO of Qimia GmbH. He has over twelve years of experience as a Java enterprise architect and five years as a big data engineer and architect. He is a passionate Java and Scala coder who loves massive data problems and highly scalable system architectures. He has completed over 20 big data projects for DAX blue-chip companies in many industries. He founded Qimia GmbH in 2015 and manages and leads a team of 40+ big data consultants and machine learning engineers.