Distributed R

Distributed R is an open source, high-performance platform for the R language. It splits tasks between multiple processing nodes to reduce execution time and analyze large data sets. Distributed R enhances R by adding distributed data structures, parallelism primitives to run functions on distributed data, a task scheduler, and multiple data loaders. It is mostly used to implement distributed versions of machine learning tasks. Distributed R is written in C++ and R, and retains the familiar look and feel of R., Hewlett-Packard provides enterprise support for Distributed R with proprietary additions such as a fast data loader from the Vertica database.

History

Distributed R was begun in 2011 by Indrajit Roy, Shivaram Venkataraman, Alvin AuYoung, and Robert S. Schreiber as a research project at HP Labs. It was open sourced in 2014 under the GPLv2 license and is available at GitHub.
In February 2015, Distributed R reached its first stable version 1.0, along with enterprise support from HP.

Components

Distributed R is a platform to implement and execute distributed applications in R. The goal is to extend R for distributed computing, while retaining the simplicity and look-and-feel of R. Distributed R consists of the following components:

Distributed data structures: Distributed R extends R's common data structures such as array, data.frame, and list to store data across multiple nodes. The corresponding Distributed R data structures are darray, dframe, and dlist. Many of the common data structure operations in R, such as colSums, rowSums, nrow and others, are also available on distributed data structures.
Parallel loop: Programmers can use the parallel loop, called foreach, to manipulate distributed data structures and execute tasks in parallel. Programmers only specify the data structure and function to express applications, while the runtime schedules tasks and, if required, moves around data.
Distributed algorithms: Distributed versions of common machine learning and graph algorithms, such as clustering, classification, and regression.
Data loaders: Users can leverage Distributed R constructs to implement parallel connectors that load data from different sources. Distributed R already provides implementations to load data from files and databases to distributed data structures.
Integration with databases

HP Vertica provides tight integration with their database and the open source Distributed R platform. HP Vertica 7.1 includes features that enable fast, parallel loading from the Vertica database to Distribute R. This parallel Vertica loader can be more than five times faster than using traditional ODBC based connectors. The Vertica database also supports deployment of machine learning models in the database. Distributed R users can call the distributed algorithms to create machine learning models, deploy them in the Vertica database, and use the model for in-database scoring and predictions. Architectural details of the Vertica database and Distributed R integration are described in the Sigmod 2015 paper.

Popular movies

The Hunger Games (film) - 2012 American dystopian action thriller science fiction-adventure film directed by Gary Ross and based on Suzanne Collins’s 2008 novel of the same name. It is the first insta...
untitled Captain Marvel sequel - part of Marvel Cinematic Universe....
Killers of the Flower Moon (film project) - Killers of the Flower Moon - film project in United States of America. It was presented as drama, detective fiction, thriller. The film project starred Leonardo Dicaprio, Robert De Niro. Director of...
Five Nights at Freddy's (film) - Five Nights at Freddy's - film published in 2017 in United States of America. Scenarist of the film - Scott Cawthon....

Popular books

Book of Revelation - The Book of Revelation is the final book of the New Testament, and consequently is also the final book of the Christian Bible. Its title is derived from the first word of the Koine Greek text: apok...
Book of Genesis - account of the creation of the world, the early history of humanity, Israel's ancestors and the origins...
Gospel of Matthew - The Gospel According to Matthew is the first book of the New Testament and one of the three synoptic gospels. It tells how Israel's Messiah, rejected and executed in Israel, pronounces judgement on ...
Michelin Guide - Michelin Guides are a series of guide books published by the French tyre company Michelin for more than a century. The term normally refers to the annually published Michelin Red Guide , the oldest...
Psalms - The Book of Psalms , commonly referred to simply as Psalms , the Psalter or "the Psalms", is the first book of the Ketuvim , the third section of the Hebrew Bible, and thus a book of th...
Ecclesiastes - Ecclesiastes is one of 24 books of the Tanakh , where it is classified as one of the Ketuvim . Originally written c. 450–200 BCE, it is also among the canonical Wisdom literature of the Old Tes...
The 48 Laws of Power - non-fiction book by American author Robert Greene. The book...

Popular television series

The Crown (TV series) - historical drama web television series about the reign of Queen Elizabeth II, created and principally written by Peter Morgan, and produced by Left Bank Pictures and Sony Pictures Tel...
Friends - American sitcom television series, created by David Crane and Marta Kauffman, which aired on NBC from September 22, 1994, to May 6, 2004, lasting ten seasons. With an ensemble cast sta...
Young Sheldon - spin-off prequel to The Big Bang Theory and begins with the character Sheldon...
Modern Family - American television mockumentary family sitcom created by Christopher Lloyd and Steven Levitan for the American Broadcasting Company. It ran for eleven seasons, from September 23...
Loki (TV series) - upcoming American web television miniseries created for Disney+ by Michael Waldron, based on the Marvel Comics character of the same name. It is set in the Marvel Cinematic Universe, shar...
Game of Thrones - American fantasy drama television series created by David Benioff and D. B. Weiss for HBO. It...
Shameless (American TV series) - American comedy-drama television series developed by John Wells which debuted on Showtime on January 9, 2011. It...

Distributed R

History

Components

Integration with databases