Akka Streams 101

1. Stream Processing

Before we dive into Akka Streams, let’s start with the basic problems of stream processing.

These days, many companies are starting to process more and more data. But processing Big Data is not enough: we need to process this data fast. Fast Data is the next big thing. Systems have to process a lot of data, and they have to process it as fast as they can, reliably and without making other systems wait. Moving data from producers to consumers may seem simple at first glance, but there is a fundamental problem that every stream processing system faces.

The most important problem is that producers and consumers work at different rates. Producers may produce data faster than consumers are able to consume it. This means that consumers have to buffer more and more data as time goes on. This may lead to very serious problems, such as running out of memory.
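
To make the buffering problem concrete, here is a minimal sketch in plain Scala (no Akka involved). It only does the arithmetic: the producer and consumer rates are made-up numbers, and the name RateMismatch is hypothetical.

```scala
object RateMismatch {
  /** Buffer size after each tick, assuming the producer adds `produced`
    * elements per tick and the consumer removes at most `consumed`
    * elements per tick. Both rates are illustrative assumptions. */
  def bufferSizes(produced: Int, consumed: Int, ticks: Int): Seq[Int] =
    (1 to ticks)
      .scanLeft(0) { (buffered, _) =>
        // The buffer can never hold fewer than zero elements.
        math.max(0, buffered + produced - consumed)
      }
      .tail // drop the initial empty buffer

  def main(args: Array[String]): Unit = {
    // Producer is faster: the buffer grows by 6 elements on every tick.
    println(bufferSizes(produced = 10, consumed = 4, ticks = 5))
    // Consumer is faster: the buffer stays empty.
    println(bufferSizes(produced = 4, consumed = 10, ticks = 5))
  }
}
```

With a faster producer the buffer grows without bound, so given enough ticks any fixed amount of memory is eventually exhausted.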

2. Reactive Streams

Reactive Streams is an initiative to provide a standard for asynchronous stream processing with non-blocking back pressure. This means defining a set of fundamental concepts (interfaces, methods and protocols) aimed at solving stream processing problems.

Backpressure, in simple words, is a way for a consumer to tell a producer, without blocking, how much data it can handle. Providing this mechanism is the goal of the Reactive Streams project.

At a minimum, this requires the streams to be asynchronous and non-blocking.
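
The demand signalling described above can be sketched with the java.util.concurrent.Flow interfaces that ship with JDK 9 and later, which mirror the Reactive Streams interfaces. This is only an illustration under simplifying assumptions: RangePublisher and BatchingSubscriber are hypothetical names, everything runs synchronously on one thread (a real implementation would be asynchronous), and the rule that a non-positive request must be reported as an error is left out.

```scala
import java.util.concurrent.Flow
import scala.collection.mutable.ArrayBuffer

object DemandSketch {
  /** Hypothetical publisher: emits 1..count, but never more than was requested. */
  final class RangePublisher(count: Int) extends Flow.Publisher[Int] {
    override def subscribe(subscriber: Flow.Subscriber[_ >: Int]): Unit =
      subscriber.onSubscribe(new Flow.Subscription {
        private var next = 1
        private var demand = 0L
        private var emitting = false
        private var done = false

        override def request(n: Long): Unit = {
          demand += n
          if (!emitting) { // onNext may call request again; avoid re-entering the loop
            emitting = true
            while (demand > 0 && next <= count && !done) {
              demand -= 1
              val value = next
              next += 1
              subscriber.onNext(value)
            }
            if (next > count && !done) {
              done = true
              subscriber.onComplete()
            }
            emitting = false
          }
        }

        override def cancel(): Unit = { done = true }
      })
  }

  /** Hypothetical subscriber: asks for `batch` elements at a time. */
  final class BatchingSubscriber(batch: Int) extends Flow.Subscriber[Int] {
    val received = ArrayBuffer.empty[Int]
    val requests = ArrayBuffer.empty[Long]
    var completed = false
    private var subscription: Flow.Subscription = null
    private var outstanding = 0

    private def requestMore(): Unit = {
      outstanding = batch
      requests += batch.toLong
      subscription.request(batch.toLong) // this call is the backpressure signal
    }

    override def onSubscribe(s: Flow.Subscription): Unit = {
      subscription = s
      requestMore()
    }
    override def onNext(item: Int): Unit = {
      received += item
      outstanding -= 1
      if (outstanding == 0) requestMore() // ask again only once the batch is handled
    }
    override def onError(t: Throwable): Unit = throw t
    override def onComplete(): Unit = { completed = true }
  }

  def run(count: Int, batch: Int): BatchingSubscriber = {
    val subscriber = new BatchingSubscriber(batch)
    new RangePublisher(count).subscribe(subscriber)
    subscriber
  }
}
```

The point of the sketch is that the publisher never pushes more elements than the subscriber has asked for, so the subscriber, not the publisher, decides how much data is in flight.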

3. Akka Streams

Akka Streams is a reactive stream processing system that is built on top of Akka Actors.

4. Graph DSL

In this article, we will focus mostly on Akka Graphs. Using Graphs, we will be able to build very complex data pipelines suited to any processing task.

If you find yourself lost, please refer to the official Akka Streams documentation.

Before we continue with graphs, let’s go over a few more simple Akka Streams concepts:

Source - A processing stage with exactly one output.

Sink - A processing stage with exactly one input.

Flow - A processing stage with exactly one input and exactly one output.

ActorMaterializer - Materializes stream blueprints as running streams using Akka Actors.

Streams defined in Akka are just blueprints. They are immutable and can be easily passed around in your code. To actually run a stream, you have to use an ActorMaterializer, which in turn requires an ActorSystem. This makes sense, because Akka Streams is built on top of the Akka actor system.
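
As a rough sketch of what "blueprint" means in practice, the snippet below builds a Source, a Flow and a Sink, connects them into a runnable graph, and materializes that same graph twice. It assumes the akka-stream library is on the classpath and an Akka version that still provides ActorMaterializer (in Akka 2.6 and later it is deprecated, and an implicit ActorSystem is enough). The object and value names are hypothetical.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Flow, Keep, Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object BlueprintSketch {
  // Blueprints only: nothing runs while these values are defined.
  val numbers: Source[Int, NotUsed] = Source(1 to 5)
  val doubled: Flow[Int, Int, NotUsed] = Flow[Int].map(_ * 2)

  // Sink.seq collects all elements; Keep.right keeps the sink's Future as the result.
  val graph = numbers.via(doubled).toMat(Sink.seq[Int])(Keep.right)

  /** Materializes the same immutable blueprint twice and returns both results. */
  def runTwice(): (Seq[Int], Seq[Int]) = {
    implicit val system: ActorSystem = ActorSystem("blueprint-sketch")
    implicit val materializer: ActorMaterializer = ActorMaterializer()
    try {
      val first = Await.result(graph.run(), 5.seconds)
      val second = Await.result(graph.run(), 5.seconds)
      (first, second)
    } finally {
      system.terminate()
    }
  }
}
```

Because the blueprint is immutable, each call to run() starts a fresh, independent stream, and both runs produce the same elements.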

Let’s take a look at the simplest Source example.
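
A minimal sketch of such a Source might look like the following. It is an assumption about what a simplest example could be, not a reproduction of any particular code: it emits the integers 1 to 10 and prints each one. It assumes akka-stream is on the classpath and an Akka version that still provides ActorMaterializer; the names SimplestSource and numbers are hypothetical.

```scala
import akka.NotUsed
import akka.actor.ActorSystem
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.{Sink, Source}

import scala.concurrent.Await
import scala.concurrent.duration._

object SimplestSource {
  // A Source with exactly one output, emitting the integers 1 to 10.
  val numbers: Source[Int, NotUsed] = Source(1 to 10)

  def main(args: Array[String]): Unit = {
    implicit val system: ActorSystem = ActorSystem("simplest-source")
    implicit val materializer: ActorMaterializer = ActorMaterializer()

    // Attaching a Sink and running the stream is what actually starts it.
    val done = numbers.runWith(Sink.foreach[Int](println))

    // Block only so this small program does not exit before the stream finishes.
    Await.result(done, 5.seconds)
    system.terminate()
  }
}
```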

5. Metrics

To better understand how Akka Streams works, we will visualize metrics that show how it behaves in different scenarios. First, let’s look at how it works in the simplest scenario, and then move on to more complex ones.

5.1. Fast Producer, Fast Consumer