As promised last time, we talk about the first of two blog posts by Tyler Akidau, in which he explains the concepts behind ‘streaming’ and why you should call it something different.

Shownotes

The links to the two blog posts we are talking about. The second is quite nice, as it has some video visualisations embedded.

Sum-up

As we go through the blog posts, you could just read them. :) For those who want to follow along while listening, we provide a bullet-point-ish outline here.

Background

First we make clear that we have to be precise about the terminology and the capabilities, as well as the time-domains.

It’s not about how you do it, it’s how you design it: a streaming system is a processing engine that is designed for infinite data sets.

Streaming? WTF

Streaming is a mouthful, and Tyler points out the terminology should be way clearer:

Unbounded Data: an ever-growing, essentially infinite data set.
Unbounded Data Processing: the way to continuously deal with the aforementioned unbounded data…
Low-latency, approximate, and/or speculative results: Historically, streaming was sold as low-latency, with the drawback that it won’t give you correct results. That is no longer true.

Limit of Streaming

We discuss how unbounded data and batch processing can work together in the Lambda Architecture and, in the same breath, why this is a sub-optimal idea.
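To make the trade-off a bit more tangible, here is a minimal sketch of the Lambda idea in Python. All names (`batch_view`, `speed_view`, `query`, the cutoff) are made up for illustration and are not taken from the blog posts: a batch layer recomputes exact counts over older data, a speed layer covers the recent events, and a query merges both.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, key); hypothetical event shape


def batch_view(events: Iterable[Event], up_to: int) -> Counter:
    """Batch layer: recompute counts over all events before `up_to`."""
    return Counter(key for ts, key in events if ts < up_to)


def speed_view(events: Iterable[Event], since: int) -> Counter:
    """Speed layer: count the recent events the batch layer has not covered yet."""
    return Counter(key for ts, key in events if ts >= since)


def query(batch: Counter, speed: Counter) -> Counter:
    """Serving side: merge both views into one answer."""
    return batch + speed


events = [(1, "a"), (2, "b"), (3, "a"), (8, "a"), (9, "b")]
cutoff = 5  # assumed boundary between "already batch-processed" and "recent"
merged = query(batch_view(events, cutoff), speed_view(events, cutoff))
```

Even in this toy version the counting logic exists twice, once per layer. Keeping two pipelines in sync is the maintenance burden that makes the approach sub-optimal.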

Segueing somehow into why tools for reasoning about time and correct results are much cooler, and how to put this together using something like Kafka…

Correctness

You need to be able to store persistently and replay the stream if necessary.
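A rough sketch of what "store and replay" means, assuming an in-memory, append-only log as a stand-in for a durable system such as Kafka. The `ReplayableLog` class and its methods are hypothetical and only illustrate the idea of re-reading from an offset; this is not the Kafka API.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator, List, Tuple


@dataclass
class ReplayableLog:
    """Append-only log kept in memory (a real system would persist it)."""
    _records: List[Any] = field(default_factory=list)

    def append(self, record: Any) -> int:
        """Store a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset: int = 0) -> Iterator[Tuple[int, Any]]:
        """Yield (offset, record) pairs starting at `offset`."""
        for i in range(offset, len(self._records)):
            yield i, self._records[i]


log = ReplayableLog()
for value in [3, 1, 4, 1, 5]:
    log.append(value)

# First pass: a consumer sums everything it sees.
first_total = sum(v for _, v in log.read_from(0))
# Replay: after a crash or a bug fix, re-read from offset 0 and see the same input.
replayed_total = sum(v for _, v in log.read_from(0))
# Partial replay from a checkpointed offset.
tail = [v for _, v in log.read_from(3)]
```

Because reading never removes anything from the log, a consumer can recompute its results from any earlier offset, which is what makes correct results after a failure possible.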

Papers:
- MillWheel
- Spark Streaming

Event vs. Processing Time

Yeah, this topic was touched on in our very first (and very German) podcast, and it is talked about here as well.
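For those who prefer code over talk, a small sketch of the two time domains. The `Event` type and the timestamps are invented for illustration: event time is when something happened, processing time is when the pipeline observed it, and the difference between the two is the skew.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    key: str
    event_time: int       # when it happened (seconds), assumed to be set at the source
    processing_time: int  # when the pipeline observed it (seconds)


def skew(event: Event) -> int:
    """Delay between occurrence and observation."""
    return event.processing_time - event.event_time


events = [
    Event("a", event_time=10, processing_time=11),
    Event("b", event_time=12, processing_time=13),
    Event("c", event_time=11, processing_time=30),  # delayed, e.g. a device that was offline
]

# The order in which the pipeline sees the events ...
by_processing_time = [e.key for e in sorted(events, key=lambda e: e.processing_time)]
# ... differs from the order in which they actually happened.
by_event_time = [e.key for e in sorted(events, key=lambda e: e.event_time)]
skews = {e.key: skew(e) for e in events}
```

The late event "c" is the interesting one: any logic that groups by arrival order puts it in the wrong place, while grouping by event time has to wait for it.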

Data Processing Patterns

- Bounded Data
- Unbounded Data - batch
  - Fixed Windows
  - Sessions
- Unbounded Data - streaming
  - Time agnostic
    - Filtering
    - Inner-Joins
  - Approximation Algorithms
  - Windowing
    - Fixed Windows
    - Sliding Windows
    - Sessions
  - Windowing by Processing Time
  - Windowing by Event Time
    - Buffering / Completeness
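The three windowing flavours from the outline can be sketched in a few lines of Python. This is an illustrative toy that assigns records by event time and assumes all data is already at hand, so it sidesteps the buffering and completeness questions entirely. Function names, the record shape, and the session rule (a new session starts once the gap between two events reaches `gap`) are assumptions, not taken from the blog posts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Record = Tuple[int, int]  # (event_time_seconds, value); hypothetical shape


def fixed_windows(records: Iterable[Record], size: int) -> Dict[int, int]:
    """Sum values per window [start, start + size), keyed by window start."""
    sums: Dict[int, int] = defaultdict(int)
    for ts, value in records:
        sums[(ts // size) * size] += value
    return dict(sums)


def sliding_windows(records: Iterable[Record], size: int, period: int) -> Dict[int, int]:
    """Sum values per overlapping window; a new window starts every `period`."""
    sums: Dict[int, int] = defaultdict(int)
    for ts, value in records:
        start = (ts // period) * period  # latest window start at or before ts
        while start > ts - size:         # every window that still contains ts
            sums[start] += value
            start -= period
    return dict(sums)


def session_windows(timestamps: Iterable[int], gap: int) -> List[List[int]]:
    """Group timestamps into sessions; a pause of `gap` or more starts a new one."""
    sessions: List[List[int]] = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] < gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


# Records arrive out of order; the event timestamp decides the window.
records = [(4, 20), (1, 10), (12, 1), (7, 5), (13, 2)]
fixed = fixed_windows(records, size=5)
sliding = sliding_windows(records, size=10, period=5)
sessions = session_windows([ts for ts, _ in records], gap=4)
```

Fixed windows are just the special case of sliding windows where `size` equals `period`, while sessions have no fixed boundaries at all and depend purely on the data.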

Download .mp3 (67.3M)

IB21 Newsflash No6 13 October 2018

IB20 Newsflash No5 29 May 2018

IB19 Newsflash No4 25 April 2018

IB18 Stateful Stream Processing 05 November 2017

IB17 DockerConEU Newsflash 17 October 2017

IB16 Newscast No3 15 October 2017

IB15 Newscast No2 09 October 2017

IB14 Newscast No1 01 October 2017

IB13 HPCAC Student Cluster Competition w/ Dan 29 September 2017

IB12 High-Performance Commoditization 28 September 2017

IB11 IT Operations w/ Albert 15 September 2017

IB10 WrapUp 2016 w/ Patrick 20 December 2016

IB9 Beyond Batch 31 August 2016

IB8 Apache Stream Processing 12 August 2016

IB7 Data Intensive Pipelines 27 July 2016

IB6 Docker Bundle 12 July 2016

IB5 Boxes 15 June 2016

IB4 Buzzword Inception 10 June 2016

IB3 Boeuf Strugeon [DE] 27 May 2016

IB2 Ruby Mett 24 May 2016

IB1 Piped Metrics [DE] 11 May 2016

Insert Buzzword
Fireside chat between Ops and Dev.
