ClickHouse

ClickHouse is an open-source column-oriented DBMS for online analytical processing.
ClickHouse was developed by the Russian IT company Yandex for the Yandex.Metrica web analytics service. ClickHouse allows analysis of data that is updated in real time. The system is marketed for high performance.
The project was released as open-source software under the Apache License 2.0 in June 2016.
ClickHouse is used by the Yandex.Tank load testing tool. Yandex.Market uses ClickHouse to monitor site accessibility and KPIs. ClickHouse was also implemented at CERN’s LHCb experiment to store and process metadata on 10 billion events with over 1000 attributes per event, and Tinkoff Bank uses ClickHouse as a data store for a project.

History

Yandex.Metrica previously used a classical approach in which data was stored in aggregated form rather than as raw events. This approach can help reduce the amount of stored data. However, it has several limitations and disadvantages:

The list of available reports must be pre-determined, and there is no way to make a custom report.
The volume of data can increase after aggregation. This happens when data is aggregated by a large number of keys or using keys with high cardinality.
It's difficult to support logical consistency around reports with different aggregations.
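
The second point can be illustrated with a small sketch. It assumes, purely for illustration, that three reports are each pre-aggregated over their own combination of high-cardinality keys; the column names and sizes are hypothetical and do not describe the actual Yandex.Metrica schema.

```python
import random

random.seed(0)

# Hypothetical raw hit log: one row per page view (user_id, url_id, hour).
raw_rows = [
    (random.randrange(50_000), random.randrange(20_000), random.randrange(24))
    for _ in range(10_000)
]

# Hypothetical pre-built reports, each aggregated by a different key set.
reports = {
    "by_user_url": lambda r: (r[0], r[1]),
    "by_user_hour": lambda r: (r[0], r[2]),
    "by_url_hour": lambda r: (r[1], r[2]),
}

# An aggregated table keeps one row per distinct key combination.
aggregated_rows = {
    name: len({key(row) for row in raw_rows}) for name, key in reports.items()
}
total_aggregated = sum(aggregated_rows.values())

print("raw rows:", len(raw_rows))
for name, count in aggregated_rows.items():
    print(name, "rows:", count)
print("total aggregated rows:", total_aggregated)
```

With high-cardinality keys almost every raw row forms its own group, so each report is nearly as large as the raw log, and the reports together hold more rows than the raw data.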

A different approach is to store unaggregated data. Processing raw data requires a high-performance system, since all calculations are made in real time. To solve this problem, a column-oriented DBMS is needed that can handle analytical data on the scale of the entire Internet. Yandex began developing its own.
The first ClickHouse prototype appeared in 2009. By the end of 2014, Yandex.Metrica version 2.0 was released. The new version has an interface for creating custom reports and uses ClickHouse for storing and processing data.

Features

The main features of the ClickHouse DBMS are:

True column-oriented DBMS. No extra data is stored alongside the values. For example, fixed-length values are supported so that a length field does not have to be stored next to each value.
Linear scalability. It's possible to extend a cluster by adding servers.
Fault tolerance. The system is a cluster of shards, where each shard is a group of replicas. ClickHouse uses asynchronous multimaster replication. Data is written to any available replica, then distributed to all the remaining replicas. ZooKeeper is used for coordinating processes, but it's not involved in query processing and execution.
Capability to store and process petabytes of data.
SQL support. ClickHouse supports an extended SQL-like language that includes arrays and nested data structures, approximate and URI functions, and the ability to connect an external key-value store.
High performance.
* Vector calculations are used. Data is not only stored by columns, but is also processed in vectors (blocks of column values). This approach makes efficient use of the CPU.
* Sampling and approximate calculations are supported.
* Parallel and distributed query processing is available.
Data compression.
Hard disk drive optimization. The system can process data that doesn't fit in random-access memory.
Clients for database connectivity. Database connection options include the console client, the HTTP API, or one of the wrappers. A JDBC driver is also available for ClickHouse.
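
The column-oriented layout and block-wise processing listed above can be sketched in a few lines. This is an illustration of the general idea only, not ClickHouse's implementation; the column names, values, and block size are hypothetical, and pure Python does not provide the CPU-level benefits of real vectorized execution.

```python
from array import array

# Hypothetical table stored column by column. Each column is one contiguous
# buffer of fixed-width values; no length or tag is stored next to a value.
durations_ms = array("I", [120, 45, 300, 15, 80, 220])  # unsigned int column
is_error = array("B", [0, 1, 0, 0, 1, 1])               # one-byte flag column

# The stored size is exactly (number of values) * (width of one value).
stored_bytes = len(durations_ms.tobytes())


def sum_where(values, flags, block_size=4):
    """Sum the values whose flag is set, one block of the columns at a time."""
    total = 0
    for start in range(0, len(values), block_size):
        value_block = values[start:start + block_size]
        flag_block = flags[start:start + block_size]
        total += sum(v for v, f in zip(value_block, flag_block) if f)
    return total


# Only the two columns needed by the "query" are read.
error_time_ms = sum_where(durations_ms, is_error)
print(stored_bytes, error_time_ms)
```

A query that needs two columns touches only those two buffers, however many other columns the table has.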

Limitations

ClickHouse has some features that can be considered disadvantages:

There is no support for transactions.
By default, the intermediate states of an aggregation query must fit in the RAM of a single server. However, ClickHouse can be configured to spill these states to disk when they do not fit.
There is no full-fledged UPDATE/DELETE implementation.
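
Spilling aggregation state to disk can be sketched generically as follows. This is a textbook external aggregation written for illustration, not ClickHouse's actual algorithm; the function names and the `max_groups_in_memory` parameter are hypothetical. Partial counts are written to temporary files as sorted runs, and the runs are merged at the end.

```python
import heapq
import json
import tempfile
from collections import Counter
from itertools import groupby
from operator import itemgetter


def _spill(partial):
    """Write one partial aggregate to a temporary file, sorted by key."""
    f = tempfile.TemporaryFile(mode="w+")
    for key in sorted(partial):
        f.write(json.dumps([key, partial[key]]) + "\n")
    f.seek(0)
    return f


def external_count(keys, max_groups_in_memory):
    """Count occurrences per string key with a bounded in-memory state."""
    partial, spills = Counter(), []
    for key in keys:
        partial[key] += 1
        if len(partial) >= max_groups_in_memory:
            spills.append(_spill(partial))  # state is full: move it to disk
            partial = Counter()
    spills.append(_spill(partial))

    # Merge the sorted runs and sum the partial counts of equal keys.
    runs = [(tuple(json.loads(line)) for line in f) for f in spills]
    merged = heapq.merge(*runs, key=itemgetter(0))
    result = {
        key: sum(count for _, count in group)
        for key, group in groupby(merged, key=itemgetter(0))
    }
    for f in spills:
        f.close()
    return result


counts = external_count(["a", "b", "a", "c", "b", "a"], max_groups_in_memory=2)
print(counts)
```

The in-memory state never holds more than `max_groups_in_memory` groups, at the cost of extra disk I/O and a final merge pass.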

Use cases

ClickHouse was designed for OLAP queries, in scenarios with the following properties:

It works with a small number of tables that contain a large number of columns.
Queries can use a large number of rows extracted from the DB, but only a small subset of columns.
Queries are relatively rare.
For simple queries, latencies of about 50 ms are allowed.
Column values are fairly small, usually consisting of numbers and short strings.
High throughput is required when processing a single query.
A query result is mostly filtered or aggregated.
Data update uses a simple scenario.

One of the common use cases for ClickHouse is server log analysis. After setting up regular data uploads to ClickHouse, it's possible to analyze incidents with instant queries or monitor a service's metrics, such as error rates, response times, and so on.
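
As a rough sketch of the log analysis case, the following code builds a per-minute error-rate query and sends it over the HTTP API. It assumes a server reachable on localhost at the default HTTP port 8123 and a hypothetical table `access_log` with `event_time` and `status` columns; both are illustrative assumptions, not part of any real deployment.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local server with the HTTP interface on its default port.
BASE_URL = "http://localhost:8123/"

# Assumption: hypothetical table access_log(event_time DateTime, status UInt16).
ERROR_RATE_QUERY = """
SELECT
    toStartOfMinute(event_time) AS minute,
    countIf(status >= 500) / count() AS error_rate
FROM access_log
WHERE event_time >= now() - INTERVAL 1 HOUR
GROUP BY minute
ORDER BY minute
FORMAT TabSeparated
"""


def build_url(query, base_url=BASE_URL):
    """Return a GET URL that passes the SQL text in the `query` parameter."""
    return base_url + "?" + urlencode({"query": query.strip()})


def run_query(query, base_url=BASE_URL):
    """Send the query and return the result rows as lists of strings."""
    with urlopen(build_url(query, base_url)) as response:
        text = response.read().decode("utf-8")
    return [line.split("\t") for line in text.splitlines()]
```

Calling `run_query(ERROR_RATE_QUERY)` requires a running server that has such a table; `build_url` alone can be used to inspect the request that would be sent.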
ClickHouse can also be used as an internal data warehouse for in-house analysts. ClickHouse can store data from different systems and analysts can build internal dashboards with the data or perform real-time analysis for business purposes.

Benchmark results

According to benchmark tests conducted by its developers, ClickHouse is more than 100 times faster than Hive or MySQL for OLAP queries.
