Data lake

A data lake is a system or repository of data stored in its natural/raw format, usually object blobs or files. A data lake is usually a single store of all enterprise data including raw copies of source system data and transformed data used for tasks such as reporting, visualization, advanced analytics and machine learning. A data lake can include structured data from relational databases, semi-structured data, unstructured data and binary data. A data lake can be established "on premises" or "in the cloud".
A data swamp is a deteriorated and unmanaged data lake that is either inaccessible to its intended users or is providing little value.

Background

James Dixon, then chief technology officer at Pentaho, coined the term to contrast it with data mart, which is a smaller repository of interesting attributes derived from raw data. In promoting data lakes, he argued that data marts have several inherent problems, such as information siloing. PricewaterhouseCoopers said that data lakes could "put an end to data silos." In their study on data lakes they noted that enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository."
Hortonworks, Google, Oracle, Microsoft, Zaloni, Teradata, Impetus Technologies, Cloudera, and Amazon now all have data lake offerings.

Examples

Many companies use cloud storage services such as Google Cloud Storage and Amazon S3 or a distributed file system such as Apache Hadoop. There is a gradual academic interest in the concept of data lakes. For example, Personal DataLake at Cardiff University is a new type of data lake which aims at managing big data of individual users by providing a single point of collecting, organizing, and sharing personal data.
An earlier data lake had limited capabilities with its batch-oriented processing and was the only processing paradigm associated with it. Interacting with the data lake meant one had to have expertise in Java with map reduce and higher level tools like Apache Pig, Apache Spark and Apache Hive.

Criticism

In June 2015, David Needle characterized "so-called data lakes" as "one of the more controversial ways to manage big data". PricewaterhouseCoopers was also careful to note in their research that not all data lake initiatives are successful. They quote Sean Martin, CTO of Cambridge Semantics,
They describe companies that build successful data lakes as gradually maturing their lake as they figure out which data and metadata are important to the organization. Another criticism is that the concept is fuzzy and arbitrary. It refers to any tool or data management practice that does not fit into the traditional data warehouse architecture. The data lake has been referred to as a particular technology. The data lake has been labeled as a raw data reservoir or a hub for ETL offload. The data lake has been defined as a central hub for self-service analytics. The concept of the data lake has been overloaded with meanings, which puts the usefulness of the term into question..
While critiques of data lakes are warranted, in many cases they are overly broad and could be applied to any technology endeavor generally and data projects specifically. For example, the term “data warehouse” currently suffers from the same opaque and changing definition as a data lake. It can also be said that not all data warehouse efforts have been successful either. In response to various critiques, McKinsey noted that the data lake should be viewed as a service model for delivering business value within the enterprise, not a technology outcome.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...