Articulated body pose estimation

Articulated body pose estimation in computer vision is the study of algorithms and systems that recover the pose of an articulated body, which consists of joints and rigid parts, using image-based observations. It is one of the longest-standing problems in computer vision because of the complexity of the models that relate observation with pose, and because of the variety of situations in which it would be useful.

Description

Perception of human beings in their surrounding environment is an important capability that robots must possess. If a person uses gestures to point to a particular object, then the interacting machine should be able to understand the situation in a real-world context. Pose estimation is therefore an important and challenging problem in computer vision, and many algorithms have been developed to solve it over the last two decades. Many solutions involve training complex models with large data sets.
Pose estimation is a difficult problem and an active subject of research because the human body has 244 degrees of freedom with 230 joints. Although not all movements between joints are evident, the human body is composed of 10 large parts with 20 degrees of freedom. Algorithms must account for large variability introduced by differences in appearance due to clothing, body shape, size, and hairstyles. Additionally, the results may be ambiguous due to partial occlusions from self-articulation, such as a person's hand covering their face, or occlusions from external objects. Other issues include varying lighting and camera configurations. Finally, most algorithms estimate pose from monocular images taken with a normal camera. These images lack the three-dimensional information of an actual body pose, leading to further ambiguities. The difficulties are compounded if there are additional performance requirements, such as real-time processing. Recent work in this area uses RGB-D cameras, which provide both color and depth information.
There is a need to develop accurate, tether-less, vision-based articulated body pose estimation systems to recover the pose of bodies, such as the human body, a hand, or non-human creatures. Such a system has several foreseeable applications, including the following:

Markerless motion capture for human-computer interfaces,
Physiotherapy,
Human image synthesis,
Ergonomics studies,
Robot control, and
Visual surveillance.

The typical articulated body pose estimation system involves a model-based approach, in which the pose is estimated by maximizing a similarity (or, equivalently, minimizing a dissimilarity) between an observation and a template model. Different kinds of sensors have been explored for making the observation, including the following:

Visible wavelength imagery,
Long-wave thermal infrared imagery,
Time-of-flight imagery, and
Laser range scanner imagery.
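As a minimal sketch of the model-based approach, assume the observation has already been reduced to the 2D positions of an elbow and a wrist, and that the template is a hypothetical planar two-segment arm with known segment lengths. The pose (shoulder angle and relative elbow angle) is chosen by exhaustive search to minimize the sum of squared distances between template joints and observed joints. The segment lengths, the grid resolution, the elbow limit of [0, π], and all function names are illustrative assumptions.

```python
import math
from itertools import product

# Hypothetical template: a planar two-segment arm with the shoulder at the origin.
UPPER_ARM = 0.30  # assumed upper-arm length (metres)
FOREARM = 0.25    # assumed forearm length (metres)


def template_joints(theta1, theta2):
    """Elbow and wrist positions for shoulder angle theta1 and relative elbow angle theta2 (radians)."""
    elbow = (UPPER_ARM * math.cos(theta1), UPPER_ARM * math.sin(theta1))
    wrist = (elbow[0] + FOREARM * math.cos(theta1 + theta2),
             elbow[1] + FOREARM * math.sin(theta1 + theta2))
    return elbow, wrist


def dissimilarity(pose, observed):
    """Sum of squared distances between the template's joints and the observed joints."""
    return sum((px - ox) ** 2 + (py - oy) ** 2
               for (px, py), (ox, oy) in zip(template_joints(*pose), observed))


def fit_pose(observed, steps=180):
    """Exhaustive search over a discretized pose space; the elbow is limited to [0, pi]."""
    shoulder_angles = [2 * math.pi * k / steps for k in range(steps)]
    elbow_angles = [math.pi * k / steps for k in range(steps + 1)]
    return min(product(shoulder_angles, elbow_angles),
               key=lambda pose: dissimilarity(pose, observed))
```

Real systems typically replace the grid search with gradient-based or sampling-based optimizers and compare far richer observations than two joint positions; the brute-force search here only makes the "minimize a dissimilarity to a template" idea concrete.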

These sensors produce intermediate representations that are directly used by the model. The representations include the following:

Image appearance,
Voxel reconstruction,
3D point clouds,
Sums of Gaussian kernels, and
3D surface meshes.

Part models

The basic idea of the part-based model can be attributed to the human skeleton. Any object that can articulate can be broken down into smaller parts, each of which can take different orientations, resulting in different articulations of the same object. Different scales and orientations of the main object thus correspond to scales and orientations of its parts. To formulate the model mathematically, the parts are connected to each other by springs; for this reason, the model is also known as a spring model. The degree of closeness between parts is accounted for by the compression and expansion of the springs. There are also geometric constraints on the orientation of the springs. For example, the joints of the leg cannot rotate through a full 360 degrees, so the parts cannot take such extreme orientations. This reduces the number of possible configurations.
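The spring model forms a graph $G = (V, E)$, where the vertices $V$ correspond to the parts and the edges $E$ represent springs connecting two neighboring parts. Each location in the image can be addressed by the $x$ and $y$ coordinates of the pixel location. Let $\mathbf{p}_i = (x_i, y_i)$ be the location of the $i$-th part. Then the cost of the spring joining the $i$-th and $j$-th parts can be written as $s_{ij}(\mathbf{p}_i, \mathbf{p}_j)$, which depends only on their relative displacement, $s_{ij}(\mathbf{p}_i, \mathbf{p}_j) = s_{ij}(\mathbf{p}_i - \mathbf{p}_j)$. Hence the total cost of placing the $l$ components at locations $P = (\mathbf{p}_1, \ldots, \mathbf{p}_l)$ is given by

$$S(P) = \sum_{(v_i, v_j) \in E} s_{ij}(\mathbf{p}_i, \mathbf{p}_j).$$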
The above equation represents the spring model used to describe body pose. To estimate pose from images, a cost or energy function must be minimized. This energy function consists of two terms: the first measures how well each component matches the image data, and the second measures how well the relative placement of the oriented parts agrees with the spring model, thus accounting for articulation along with object detection.
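For a tree-structured part graph, the two-term energy described above can be minimized exactly by dynamic programming. The following is a minimal sketch, not the method of any particular paper: it assumes a small shared set of candidate (x, y) locations, a caller-supplied match cost $m_i(\mathbf{p}_i)$ for the image term, and hypothetical quadratic springs for the deformation term, so that the energy is $E(P) = \sum_i m_i(\mathbf{p}_i) + \sum_{(v_i, v_j) \in E} s_{ij}(\mathbf{p}_i, \mathbf{p}_j)$. The names `quadratic_spring` and `minimize_energy`, and all part names, are illustrative.

```python
def quadratic_spring(rest, stiffness=1.0):
    """Hypothetical spring cost s_ij: penalizes deviation of p_j - p_i from a rest offset."""
    def cost(p_i, p_j):
        dx = (p_j[0] - p_i[0]) - rest[0]
        dy = (p_j[1] - p_i[1]) - rest[1]
        return stiffness * (dx * dx + dy * dy)
    return cost


def minimize_energy(tree, root, locations, match_cost, springs):
    """Exact minimizer of E(P) = sum_i m_i(p_i) + sum over springs s_ij(p_i, p_j) on a tree.

    tree:       dict part -> list of child parts
    locations:  candidate (x, y) locations, shared by all parts
    match_cost: function (part, loc) -> float, low when the image supports the part at loc
    springs:    dict (parent, child) -> spring cost function (p_parent, p_child) -> float
    """
    n = len(locations)
    best_child = {}  # (child, parent location index) -> best child location index

    def subtree_cost(part):
        # cost[a]: lowest energy of the subtree rooted at `part` when it sits at locations[a]
        cost = [match_cost(part, loc) for loc in locations]
        for child in tree.get(part, []):
            child_cost = subtree_cost(child)
            spring = springs[(part, child)]
            for a in range(n):
                totals = [child_cost[b] + spring(locations[a], locations[b]) for b in range(n)]
                b_best = min(range(n), key=totals.__getitem__)
                best_child[(child, a)] = b_best
                cost[a] += totals[b_best]
        return cost

    root_cost = subtree_cost(root)
    a_root = min(range(n), key=root_cost.__getitem__)

    # Backtrack from the root, reading off each child's stored best location
    placement = {root: locations[a_root]}
    stack = [(root, a_root)]
    while stack:
        part, a = stack.pop()
        for child in tree.get(part, []):
            b = best_child[(child, a)]
            placement[child] = locations[b]
            stack.append((child, b))
    return placement, root_cost[a_root]
```

For $n$ parts and $L$ candidate locations the sketch runs in $O(nL^2)$ time. Published pictorial-structure methods typically speed up the inner minimization over child locations (for example, with generalized distance transforms when the springs are quadratic), which is omitted here for clarity. When the image evidence for a part is ambiguous, the spring term is what selects the placement consistent with its neighbors.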
The part models, also known as pictorial structures, are one of the basic models on which other, more efficient models are built by slight modification. One such example is the flexible mixture model, which reduces the database of hundreds or thousands of deformed parts by exploiting the notion of local rigidity.

Articulated model with quaternion

The kinematic skeleton is constructed as a tree-structured chain, as illustrated in the Figure. Each rigid body segment has its own local coordinate system, which can be transformed to the world coordinate system via a 4×4 transformation matrix $T_l$,

$$T_l = T_{\operatorname{par}(l)} R_l,$$

where $R_l$ denotes the local transformation from body segment $S_l$ to its parent $\operatorname{par}(S_l)$. Each joint in the body has three rotational degrees of freedom. Given a transformation matrix $T_l$, the joint position at the T-pose can be transferred to its corresponding position in the world coordinate system. In many works, the 3D joint rotation is expressed as a normalized quaternion because its continuity facilitates gradient-based optimization during parameter estimation.
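The chain of transformations described above, in which each segment's world transform is its parent's world transform composed with the segment's local transform, can be sketched in plain Python. This is an illustration under stated assumptions, not any specific system's implementation: each local transform is assumed to consist of a joint rotation given as a quaternion (w, x, y, z) plus a fixed translation of the joint from its parent in the T-pose, and the segment names and offsets are hypothetical.

```python
import math


def quat_to_matrix(q):
    """3x3 rotation matrix from quaternion q = (w, x, y, z); q is normalized first."""
    w, x, y, z = q
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def local_transform(q, offset):
    """4x4 local transform: joint rotation q plus the joint's assumed offset from its parent."""
    r = quat_to_matrix(q)
    return [r[0] + [offset[0]], r[1] + [offset[1]], r[2] + [offset[2]], [0.0, 0.0, 0.0, 1.0]]


def matmul4(a, b):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def world_transforms(parent, rotations, offsets):
    """World transform of every segment; `parent` maps segment -> parent (None for the root)."""
    world = {}

    def resolve(seg):
        if seg not in world:
            local = local_transform(rotations[seg], offsets[seg])
            p = parent[seg]
            world[seg] = local if p is None else matmul4(resolve(p), local)
        return world[seg]

    for seg in parent:
        resolve(seg)
    return world


def joint_position(transform):
    """The segment's joint (its local origin) expressed in world coordinates."""
    return (transform[0][3], transform[1][3], transform[2][3])
```

For example, with a hypothetical chain shoulder, elbow, wrist (offsets of 0.30 and 0.25 along x) and a 90° shoulder rotation about z, the wrist lands at approximately (0, 0.55, 0). Normalizing inside `quat_to_matrix` means an optimizer can update the four quaternion components freely while the result is always a valid rotation.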

Applications

Assisted living

Personal care robots may be deployed in future assisted living homes. For these robots, high-accuracy human detection and pose estimation is necessary to perform a variety of tasks, such as fall detection. Additionally, this application has a number of performance constraints.

Character animation

Traditionally, character animation has been a manual process. However, poses can be synced directly to a real-life actor through specialized pose estimation systems. Older systems relied on markers or specialized suits. Recent advances in pose estimation and motion capture have enabled markerless applications, sometimes in real time.

Intelligent driver assisting system

Car accidents account for about two percent of deaths globally each year. As such, an intelligent system tracking driver pose may be useful for emergency alerts. Along the same lines, pedestrian detection algorithms have been used successfully in autonomous cars, enabling the car to make smarter decisions.

Video games

Commercially, pose estimation has been used in video games, popularized by the Microsoft Kinect sensor. These systems track the user to render their avatar in-game, in addition to performing tasks like gesture recognition to let the user interact with the game. As such, this application has a strict real-time requirement.

Medical applications

In medicine, pose estimation has been used to detect postural issues such as scoliosis by analyzing abnormalities in a patient's posture, to support physical therapy, and to study the cognitive brain development of young children by monitoring motor functionality.

Other applications

Other applications include video surveillance, animal tracking and behavior understanding, sign language detection, advanced human–computer interaction, and markerless motion capture.

Related technology

A commercially successful but specialized computer vision-based articulated body pose estimation technique is optical motion capture. This approach involves placing markers on the individual at strategic locations to capture the six degrees of freedom of each body part.
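When at least three non-collinear markers are rigidly attached to a body part, its six degrees of freedom (three for rotation, three for translation) can be recovered directly. The sketch below builds a right-handed orthonormal frame from three marker positions using cross products; it assumes noise-free markers, and the choice of which marker defines the origin and which defines the x-axis is a hypothetical convention, not a standard marker set.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    norm = math.sqrt(sum(ai * ai for ai in a))
    return tuple(ai / norm for ai in a)


def segment_pose(origin_marker, axis_marker, plane_marker):
    """6-DoF pose (R, t) of a rigid segment from three non-collinear markers.

    R is a 3x3 rotation whose columns are the segment's x, y, z axes in world
    coordinates; t is the segment origin, here assumed to be at origin_marker.
    """
    x_axis = _unit(_sub(axis_marker, origin_marker))
    # z is normal to the plane containing all three markers
    z_axis = _unit(_cross(x_axis, _sub(plane_marker, origin_marker)))
    # y completes a right-handed orthonormal frame (already unit length)
    y_axis = _cross(z_axis, x_axis)
    R = [[x_axis[i], y_axis[i], z_axis[i]] for i in range(3)]
    return R, tuple(origin_marker)
```

With noisy measurements or more than three markers per segment, a least-squares rigid fit such as the Kabsch algorithm is the more common choice; the cross-product construction is shown only because it makes the six recovered quantities explicit.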

Research groups

A number of groups and companies are researching pose estimation, including groups at Brown University, Carnegie Mellon University, MPI Saarbruecken, Stanford University, the University of California, San Diego, the University of Toronto, the École Centrale Paris, ETH Zurich, National University of Sciences and Technology, and the University of California, Irvine.

Companies

At present, several companies are working on articulated body pose estimation.

Bodylabs: Bodylabs is a Manhattan-based software provider of human-aware artificial intelligence.
