Streaming Analytics with Analytic Models (R, Spark MLlib, H20, PMML)

In March 2016, I had a talk at Voxxed Zurich about “How to Apply Machine Learning and Big Data Analytics to Real Time Processing”.

Finding Insights with R, H20, Apache Spark MLlib, PMML and TIBCO Spotfire

Big Data” is currently a big hype. Large amounts of historical data are stored in Hadoop or other platforms. Business Intelligence tools and statistical computing are used to draw new knowledge and to find patterns from this data, for example for promotions, cross-selling or fraud detection. The key challenge is how these findings can be integrated from historical data into new transactions in real time to make customers happy, increase revenue or prevent fraud.

Putting Analytic Models into Action via Event Processing and Streaming Analytics

Fast Data” via stream processing is the solution to embed patterns – which were obtained from analyzing historical data – into future transactions in real-time. The following slide deck uses several real world success stories to explain the concepts behind stream processing and its relation to Apache Hadoop and other big data platforms. I discuss how patterns and statistical models of R, Apache Spark MLlib, H20, and other technologies can be integrated into real-time processing using open source stream processing frameworks (such as Apache Storm, Spark Streaming or Flink) or products (such as IBM InfoSphere Streams or TIBCO StreamBase). A live demo showed the complete development lifecycle combining analytics with TIBCO Spotfire, machine learning via R and stream processing via TIBCO StreamBase and TIBCO Live Datamart.

Slide Deck from Voxxed Zurich 2016

Here is the slide deck:

http://www.slideshare.net/KaiWaehner/how-to-apply-machine-learning-with-r-h20-apache-spark-mllib-or-pmml-to-real-time-streaming-analytics

Kai Waehner

bridging the gap between technical innovation and business value for real-time data streaming and applied AI.

Recent Posts

dbt Meets Apache Flink: One Workflow for Data Engineers on Snowflake, BigQuery, Databricks, and Confluent

Two toolchains, two skill sets, two CI/CD pipelines — that has been the reality for…

3 months ago

The Shift Left Architecture 2.0: Operational, Analytical and AI Interfaces for Real-Time Data Products

The Shift Left Architecture moves data integration logic into an event-driven architecture where governed data…

3 months ago

UFC VIP Experience Worth the Price? Fan Review. Business Perspective. Tech Vision.

The Ultimate Fighting Championship (UFC) held Fight Night London on March 21, 2026, at The…

3 months ago

Dashboards and Queries for Apache Kafka: Operational, Explorative, and the Role of the Context Engine

Dashboards are a popular way to make streaming data visible and useful, but they are…

3 months ago

Data Streaming at MWC 2026: How Apache Kafka, Flink and Agentic AI Power Telecom Trends

Mobile World Congress (MWC) 2026 highlights the shift from batch systems to real time data…

4 months ago

From Takeoff to Touchdown: Real-Time Aviation with Data Streaming at Qantas

This blog post explores how data streaming transforms airline operations by enabling real-time visibility, faster…

4 months ago