Flink

Apache Flink is an open-source framework for stateful, large-scale, distributed, and fault-tolerant stream processing. It lets you ingest massive streaming data (up to several terabytes) from different sources and process it in a distributed fashion across multiple nodes, before pushing the derived streams to other services or applications such as Apache Kafka, databases, and Elasticsearch. Flink enables real-time analytics on streaming data and is a good fit for continuous extract-transform-load (ETL) pipelines and for event-driven applications. It offers many operations on data streams and data sets, such as mapping, filtering, grouping, updating state, joining, defining windows, and aggregating.

Features:

Robust Stateful Stream Processing: handles business logic that requires contextual state while processing data streams, using the DataStream API, at any scale
Fault Tolerance: recovers state after faults using periodic, asynchronous checkpointing
Exactly-Once Consistency Semantics: each event in the stream is reflected in the application state exactly once, even in case of failures
Scalability
In-Memory Performance: performs all computations by accessing local, often in-memory, state, yielding very low processing latencies
Seamless Connectivity: connects to external systems including Apache Kafka, Elasticsearch, Apache Cassandra, and Kinesis
Flexible Deployment: can be deployed on various cluster environments (YARN, Apache Mesos, and Kubernetes)
Complex Event Processing (CEP) library to detect patterns (i.e., event sequences) in data streams
Fluent APIs in Java and Scala
Flink is a true streaming engine, in contrast to the micro-batch processing model of Spark Streaming
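
The fault-tolerance and exactly-once items in the list above can be illustrated with a toy model. This is a plain-Python sketch, not Flink code: the names `run`, `checkpoint_every`, and `fail_at` are hypothetical, and the snapshot here is taken synchronously for simplicity, whereas Flink checkpoints asynchronously. The idea it tries to show is that a checkpoint stores the operator state together with the source position, so after a failure the job can roll back to the last checkpoint and replay from that position, leaving each event counted exactly once.

```python
import copy

def run(events, checkpoint_every, fail_at=None):
    """Count events per key, snapshotting (state, offset) periodically.

    If fail_at is set, simulate one crash just before processing that
    offset, then recover from the latest snapshot and replay.
    """
    state = {}              # keyed state: key -> count
    snapshot = ({}, 0)      # last checkpoint: (state copy, source offset)
    offset = 0
    failed = False
    while offset < len(events):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Crash: discard in-flight state and roll back to the checkpoint.
            state = copy.deepcopy(snapshot[0])
            offset = snapshot[1]
            continue
        key = events[offset]
        state[key] = state.get(key, 0) + 1
        offset += 1
        if offset % checkpoint_every == 0:
            # The snapshot pairs the state with the source position.
            snapshot = (copy.deepcopy(state), offset)
    return state

events = ["a", "b", "a", "c", "a", "b", "c"]
clean = run(events, checkpoint_every=3)
recovered = run(events, checkpoint_every=3, fail_at=5)
print(clean)
print(recovered)
```

Both runs should end with the same counts. Events 3 and 4 are processed twice in the failing run, but the work done after the last checkpoint is thrown away on recovery, so they are not double counted.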

Data Abstraction

DataStream
DataSet
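
In Flink, DataStream represents streams of records that may be unbounded, while DataSet represents bounded data sets. As a rough illustration of the stream operations mentioned earlier (filtering, mapping, grouping by key, windowing, and aggregating), here is a plain-Python sketch that does not use the Flink API. The function `tumbling_window_sums`, the generator `sensor_events`, and the sample records are all hypothetical. Unlike a real streaming engine, which emits each window result as time progresses, this toy version returns all window sums once its finite input ends.

```python
from collections import defaultdict

def tumbling_window_sums(events, window_size):
    """Sum values per (window, key) for (timestamp, key, value) events.

    Window w covers timestamps [w * window_size, (w + 1) * window_size).
    """
    sums = defaultdict(int)
    for timestamp, key, value in events:
        if value <= 0:                      # filter: drop non-positive readings
            continue
        value = value * 2                   # map: transform each record
        window = timestamp // window_size   # assign the record to a tumbling window
        sums[(window, key)] += value        # keyed, windowed aggregation
    return dict(sums)

def sensor_events():
    """A generator stands in for a stream: records arrive one at a time."""
    yield (1, "s1", 5)
    yield (3, "s2", 0)
    yield (4, "s1", 1)
    yield (11, "s1", 2)
    yield (12, "s2", 7)

result = tumbling_window_sums(sensor_events(), window_size=10)
print(result)
```

The same function also accepts a plain list, which loosely mirrors running the same logic over a bounded data set.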

Backend Storage

Memory State Backend
File System Backend
RocksDB Backend
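
Roughly speaking, these backends differ in where working state and checkpoints are kept: the memory backend keeps both in memory, the file system backend keeps working state in memory and writes checkpoints to a file system, and the RocksDB backend keeps working state in an embedded RocksDB instance on local disk. The plain-Python sketch below is not Flink code and models only the first two. The classes `MemoryBackend` and `FileBackend` and the function `count_words` are hypothetical names chosen for illustration. The point is that the operator logic stays the same while the storage behind it is swapped.

```python
import json
import os
import tempfile

class MemoryBackend:
    """Working state and snapshots both live in process memory."""
    def __init__(self):
        self.state = {}
        self._snapshot = {}

    def checkpoint(self):
        self._snapshot = dict(self.state)

    def restore(self):
        self.state = dict(self._snapshot)

class FileBackend:
    """Working state lives in memory; snapshots are written to a file."""
    def __init__(self, path):
        self.state = {}
        self.path = path

    def checkpoint(self):
        with open(self.path, "w") as f:
            json.dump(self.state, f)

    def restore(self):
        with open(self.path) as f:
            self.state = json.load(f)

def count_words(backend, words):
    """Operator logic is identical regardless of the backend chosen."""
    for w in words:
        backend.state[w] = backend.state.get(w, 0) + 1

path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
results = []
for backend in (MemoryBackend(), FileBackend(path)):
    count_words(backend, ["x", "y", "x"])
    backend.checkpoint()
    count_words(backend, ["z"])   # progress made after the checkpoint
    backend.restore()             # roll back to the checkpoint
    results.append(backend.state)
print(results)
```

After the restore, both backends should hold the counts as they were at the checkpoint, without the later "z" record.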
