Gesture recognition
Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. Gestures can originate from any bodily motion or state but commonly originate from the face or hands. Current focuses in the field include emotion recognition from the face and hand gesture recognition. Users can use simple gestures to control or interact with devices without physically touching them. Many approaches have been made using cameras and computer vision algorithms to interpret sign language. The identification and recognition of posture, gait, proxemics, and human behaviors is also a subject of gesture recognition techniques.
Gesture recognition can be seen as a way for computers to begin to understand human body language, thus building a richer bridge between machines and humans than primitive text user interfaces or even GUIs, which still limit the majority of input to keyboard and mouse. It allows humans to communicate with a machine and interact naturally without any mechanical devices. Using the concept of gesture recognition, it is possible to point a finger at the computer screen so that the cursor will move accordingly. This could make conventional input devices such as the mouse and keyboard, and even touch-screens, redundant.
Overview
Gesture recognition features:
- Higher accuracy
- High stability
- Time saved unlocking a device
Major application areas of gesture recognition include:
- Automotive sector
- Consumer electronics sector
- Transit sector
- Gaming sector
- Unlocking smartphones
- Defence
- Home automation
- Automated sign language translation
The literature includes ongoing work in the computer vision field on capturing gestures or more general human pose and movements by cameras connected to a computer.
Gesture recognition and pen computing
Pen computing reduces the hardware impact of a system and also increases the range of physical-world objects usable for control beyond traditional digital objects like keyboards and mice. Such implementations could enable a new range of hardware that does not require monitors, an idea that may lead to the creation of holographic displays. The term gesture recognition has also been used to refer more narrowly to non-text-input handwriting symbols, such as inking on a graphics tablet, multi-touch gestures, and mouse gesture recognition. The latter is computer interaction through the drawing of symbols with a pointing-device cursor.
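As an illustration of mouse gesture recognition, the sketch below quantizes a pointer trail into compass directions and matches the resulting string against a small template table. The gesture-to-command table and the jitter threshold are hypothetical, not taken from any particular library.

```python
# Minimal sketch of one common mouse-gesture technique: quantize the pointer
# trail into compass directions and match the string against known templates.

def encode_directions(points, min_step=10):
    """Turn a list of (x, y) pointer samples into a string like 'DR'."""
    dirs = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) < min_step and abs(dy) < min_step:
            continue  # ignore jitter below the movement threshold
        d = ("R" if dx > 0 else "L") if abs(dx) >= abs(dy) else ("D" if dy > 0 else "U")
        if not dirs or dirs[-1] != d:
            dirs.append(d)  # collapse repeated directions
    return "".join(dirs)

GESTURES = {"DR": "close window", "UL": "back", "RL": "refresh"}  # hypothetical table

trail = [(0, 0), (0, 30), (0, 60), (30, 60), (60, 60)]  # down, then right
print(GESTURES.get(encode_directions(trail), "unrecognized"))  # -> close window
```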
Gesture types
In computer interfaces, two types of gestures are distinguished:
- Offline gestures: gestures that are processed after the user's interaction with the object is finished; for example, a circle is drawn to activate a context menu.
- Online gestures: direct-manipulation gestures, used to scale or rotate a tangible object while the interaction is in progress.
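As a minimal sketch of an online gesture, the following computes the running scale factor of a pinch-to-zoom manipulation from two touch points; the coordinates and helper names are illustrative.

```python
import math

# Minimal sketch of an online gesture: pinch-to-scale. The object's scale is
# updated continuously while the gesture runs, rather than after it finishes.

def distance(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pinch_scale(initial_touches, current_touches):
    """Return the scale factor implied by two moving touch points."""
    return distance(*current_touches) / distance(*initial_touches)

# Two fingers move apart: the manipulated object should grow by 2x.
print(pinch_scale([(100, 100), (200, 100)], [(50, 100), (250, 100)]))  # 2.0
```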
Touchless interface
Types of touchless technology
There are a number of devices utilizing this type of interface, such as smartphones, laptops, game consoles, televisions, and music equipment. One type of touchless interface uses the Bluetooth connectivity of a smartphone to activate a company's visitor management system. This avoids the need to touch an interface, a benefit highlighted during the COVID-19 pandemic.
Input devices
The ability to track a person's movements and determine what gestures they may be performing can be achieved through various tools. Kinetic user interfaces (KUIs) are an emerging type of user interface that allows users to interact with computing devices through the motion of objects and bodies. Examples of KUIs include tangible user interfaces and motion-aware games such as the Wii and Microsoft's Kinect, as well as other interactive projects. Although a large amount of research has been done in image- and video-based gesture recognition, there is some variation in the tools and environments used between implementations.
- Wired gloves. These can provide input to the computer about the position and rotation of the hands using magnetic or inertial tracking devices. Furthermore, some gloves can detect finger bending with a high degree of accuracy, or even provide haptic feedback to the user, a simulation of the sense of touch. The first commercially available hand-tracking glove-type device was the DataGlove, which could detect hand position, movement, and finger bending. It uses fiber-optic cables running down the back of the hand: light pulses are created, and when the fingers are bent, light leaks through small cracks and the loss is registered, giving an approximation of the hand pose.
- Depth-aware cameras. Using specialized cameras such as structured-light or time-of-flight cameras, one can generate a depth map of what is being seen through the camera at short range and use this data to approximate a 3D representation of what is being seen. These can be effective for detecting hand gestures due to their short-range capabilities (see the depth-thresholding sketch after this list).
- Stereo cameras. Using two cameras whose relation to one another is known, a 3D representation can be approximated from the cameras' output. To establish the cameras' relation, one can use a positioning reference such as a lexian-stripe or infrared emitters. In combination with direct motion measurement, gestures can be detected directly.
- Gesture-based controllers. These controllers act as an extension of the body so that when gestures are performed, some of their motion can be conveniently captured by software. An example of emerging gesture-based motion capture is skeletal hand tracking, which is being developed for virtual reality and augmented reality applications. This technology is demonstrated by the tracking companies uSens and Gestigon, which allow users to interact with their surroundings without controllers.
- Single camera. A standard 2D camera can be used for gesture recognition where the resources or environment would not be convenient for other forms of image-based recognition. It was earlier thought that a single camera might not be as effective as stereo or depth-aware cameras, but some companies are challenging this assumption, offering software-based gesture recognition technology that uses a standard 2D camera to detect robust hand gestures.
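As a minimal illustration of why short-range depth data simplifies hand detection, the sketch below segments a hand from a depth map by assuming the hand is the closest surface to the camera; the depth values and the depth band are illustrative.

```python
import numpy as np

# Minimal sketch of hand segmentation from a depth map, assuming the hand is
# the closest object to the camera (a common setup for short-range sensing).

def segment_hand(depth_mm, band_mm=120):
    """Return a boolean mask of pixels within band_mm of the nearest surface."""
    valid = depth_mm > 0                      # zeros often mean "no reading"
    nearest = depth_mm[valid].min()
    return valid & (depth_mm <= nearest + band_mm)

depth = np.full((4, 4), 1500)                 # background ~1.5 m away
depth[1:3, 1:3] = 400                         # hand ~0.4 m away
print(segment_hand(depth).astype(int))        # 1s mark the hand region
```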
Algorithms
In order to interpret movements of the body, one has to classify them according to common properties and the message the movements may express. For example, in sign language each gesture represents a word or phrase.
Some literature differentiates two approaches to gesture recognition: 3D-model-based and appearance-based. The former makes use of 3D information about key elements of the body parts in order to obtain several important parameters, like palm position or joint angles. Appearance-based systems, on the other hand, use images or videos for direct interpretation.
3D model-based algorithms
The 3D model approach can use volumetric or skeletal models, or even a combination of the two. Volumetric approaches have been heavily used in the computer animation industry and for computer vision purposes. The models are generally created from complicated 3D surfaces, like NURBS or polygon meshes. The drawback of this method is that it is very computationally intensive, and systems for real-time analysis are still to be developed. For the moment, a more practical approach is to map simple primitive objects to the person's most important body parts and analyse the way these interact with each other. Furthermore, some abstract structures like superquadrics and generalised cylinders may be even more suitable for approximating body parts.
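As an illustration of the superquadric primitives mentioned above, the sketch below evaluates the standard superquadric inside-outside function to test whether a 3D point falls within a limb-sized primitive; the semi-axes and shape exponents are illustrative.

```python
# Minimal sketch of testing whether a 3D point lies inside a superquadric,
# the kind of primitive used to approximate body parts. Uses the standard
# inside-outside function F, where F < 1 means "inside" the primitive.

def superquadric_f(x, y, z, a=(0.05, 0.05, 0.30), e1=0.9, e2=0.9):
    """Inside-outside function of a superquadric with semi-axes a=(a1,a2,a3)."""
    a1, a2, a3 = a
    xy = (abs(x / a1) ** (2 / e2) + abs(y / a2) ** (2 / e2)) ** (e2 / e1)
    return xy + abs(z / a3) ** (2 / e1)

# A point near the axis of a forearm-sized primitive is inside (F < 1);
# a point well off to the side is outside (F > 1).
print(superquadric_f(0.01, 0.0, 0.10) < 1)   # True
print(superquadric_f(0.20, 0.0, 0.10) > 1)   # True
```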
Skeletal-based algorithms
Instead of using intensive processing of the 3D models and dealing with a lot of parameters, one can just use a simplified version of joint-angle parameters along with segment lengths. This is known as a skeletal representation of the body: a virtual skeleton of the person is computed and parts of the body are mapped to certain segments. The analysis here is done using the position and orientation of these segments and the relations between them (a minimal joint-angle sketch follows the list below). Advantages of using skeletal models:
- Algorithms are faster because only key parameters are analyzed.
- Pattern matching against a template database is possible.
- Using key points allows the detection program to focus on the significant parts of the body.
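The following is a minimal joint-angle sketch: it computes the angle at one joint from three skeletal keypoints, the kind of parameter a skeletal algorithm analyzes. The coordinates are illustrative; a real system would read them from a pose estimator or motion-capture device.

```python
import math

# Minimal sketch of a skeletal-model parameter: the angle at a joint,
# computed from three keypoints (here shoulder, elbow, wrist).

def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return math.degrees(math.acos(dot / norm))

shoulder, elbow, wrist = (0, 0), (1, 0), (1, 1)
print(round(joint_angle(shoulder, elbow, wrist)))  # 90: a right-angle elbow bend
```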
Appearance-based models
These models no longer use a spatial representation of the body; instead, they derive the parameters directly from images or videos using a template database. Some are based on deformable 2D templates of parts of the human body, particularly the hands. Deformable templates are sets of points on the outline of an object, used as interpolation nodes for approximating the object's outline. One of the simplest interpolation functions is linear, which produces an average shape from point sets, point-variability parameters, and external deformators. These template-based models are mostly used for hand tracking, but could also be of use for simple gesture classification. A second approach to gesture detection with appearance-based models uses image sequences as gesture templates. The parameters for this method are either the images themselves or certain features derived from them. Most of the time, only one or two views are used.
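The linear deformable-template idea can be sketched as follows: a new outline is the average shape plus a weighted sum of variability ("deformation") vectors. The contour points and the single deformation mode below are illustrative.

```python
import numpy as np

# Minimal sketch of a linear deformable template: a new outline is formed as
# the average shape of a point set plus weighted deformation vectors.

mean_shape = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], float)  # average outline
spread = np.array([[-1, 0], [1, 0], [1, 0], [-1, 0]], float)    # one variability mode

def deform(weights, modes=(spread,)):
    """Linear template: mean shape plus a weighted sum of deformation modes."""
    shape = mean_shape.copy()
    for w, mode in zip(weights, modes):
        shape += w * mode
    return shape

print(deform([0.2]))  # the square's outline, widened by the 'spread' mode
```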
Electromyography-based models
Electromyography (EMG) concerns the study of electrical signals produced by muscles in the body. Through classification of data received from the arm muscles, it is possible to classify the action and thus pass the gesture as input to external software. Consumer EMG devices allow for non-invasive approaches such as an arm or leg band and connect via Bluetooth. Because of this, EMG has an advantage over visual methods: the user does not need to face a camera to give input, enabling more freedom of movement.
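A minimal sketch of the classification step, assuming a three-channel armband and hypothetical per-gesture templates: each signal window is summarized by its per-channel root-mean-square amplitude and matched to the nearest template. Real systems train classifiers on data from a specific device.

```python
import numpy as np

# Minimal sketch of classifying an EMG signal window. The channel count,
# templates, and synthetic signal below are illustrative.

def rms_features(window):
    """Root-mean-square amplitude per EMG channel (rows = channels)."""
    return np.sqrt((window ** 2).mean(axis=1))

TEMPLATES = {                      # hypothetical per-gesture feature templates
    "fist":      np.array([0.8, 0.7, 0.2]),
    "open_hand": np.array([0.2, 0.3, 0.9]),
}

def classify(window):
    feats = rms_features(window)
    return min(TEMPLATES, key=lambda g: np.linalg.norm(feats - TEMPLATES[g]))

rng = np.random.default_rng(0)
window = rng.normal(0, 1, (3, 200)) * np.array([[0.8], [0.7], [0.2]])
print(classify(window))  # -> fist
```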
Challenges
There are many challenges associated with the accuracy and usefulness of gesture recognition software. For image-based gesture recognition there are limitations on the equipment used and on image noise. Images or video may not be under consistent lighting or in the same location. Items in the background or distinct features of the users may make recognition more difficult. The variety of implementations for image-based gesture recognition may also cause issues for the viability of the technology for general usage. For example, an algorithm calibrated for one camera may not work for a different camera. The amount of background noise also causes tracking and recognition difficulties, especially when occlusions occur. Furthermore, the distance from the camera, and the camera's resolution and quality, also cause variations in recognition accuracy.
In order to capture human gestures with visual sensors, robust computer vision methods are also required, for example for hand tracking and hand posture recognition, or for capturing movements of the head, facial expressions, or gaze direction.
Social acceptability
One significant challenge to the adoption of gesture interfaces on consumer mobile devices such as smartphones and smartwatches stems from the social-acceptability implications of gestural input. While gestures can facilitate fast and accurate input on many novel form-factor computers, their adoption and usefulness are often limited by social factors rather than technical ones. To this end, designers of gesture input methods may seek to balance both technical considerations and users' willingness to perform gestures in different social contexts. In addition, different device hardware and sensing mechanisms support different kinds of recognizable gestures.
Mobile devices
Gesture interfaces on mobile and small form-factor devices are often supported by the presence of motion sensors such as inertial measurement units (IMUs). On these devices, gesture sensing relies on users performing movement-based gestures capable of being recognized by these motion sensors. This can make capturing signals from subtle or low-motion gestures challenging, as they may become difficult to distinguish from natural movements or noise. Through a survey and study of gesture usability, researchers found that gestures that incorporate subtle movement, that appear similar to existing technology, that look or feel similar to everyday actions, and that are enjoyable were more likely to be accepted by users, while gestures that look strange, are uncomfortable to perform, interfere with communication, or involve uncommon movement were more likely to be rejected. The social acceptability of mobile-device gestures relies heavily on the naturalness of the gesture and the social context.
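A minimal sketch of motion-sensor gesture detection, assuming raw accelerometer samples: a "shake" is flagged when the acceleration magnitude departs strongly from gravity. The threshold is illustrative; the sketch also suggests why subtle gestures are hard, since low-motion input stays near the noise floor and never crosses the threshold.

```python
import math

# Minimal sketch of IMU-based gesture detection: flag a "shake" when the
# accelerometer magnitude deviates strongly from 1 g. Threshold is illustrative.

GRAVITY = 9.81

def is_shake(samples, threshold=4.0):
    """True if any (x, y, z) sample deviates from 1 g by more than threshold m/s^2."""
    return any(abs(math.sqrt(x*x + y*y + z*z) - GRAVITY) > threshold
               for x, y, z in samples)

still  = [(0.1, 0.0, 9.8), (0.0, 0.2, 9.7)]         # resting: near 1 g
shaken = [(12.0, 1.0, 9.8), (-7.0, 0.5, 10.2)]      # vigorous movement
print(is_shake(still), is_shake(shaken))             # False True
```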
On-body and wearable computers
Wearable computers typically differ from traditional mobile devices in that their usage and interaction take place on the user's body. In these contexts, gesture interfaces may become preferred over traditional input methods, as their small size renders touch-screens or keyboards less appealing. Nevertheless, they share many of the same social-acceptability obstacles as mobile devices when it comes to gestural interaction. However, the possibility of hiding wearable computers from sight or integrating them in other everyday objects, such as clothing, allows gesture input to mimic common clothing interactions, such as adjusting a shirt collar or rubbing one's front pant pocket. A major consideration for wearable computer interaction is the location of device placement and interaction. A study exploring third-party attitudes towards wearable device interaction, conducted across the United States and South Korea, found differences in the perception of wearable computing use by males and females, in part due to different areas of the body being considered socially sensitive. Another study investigating the social acceptability of on-body projected interfaces found similar results, with both studies labelling areas around the waist, groin, and upper body as least acceptable, and areas around the forearm and wrist as most acceptable.
Public installations
Public installations, such as interactive public displays, allow access to information and the display of interactive media in public settings such as museums, galleries, and theaters. While touch screens are a frequent form of input for public displays, gesture interfaces provide additional benefits such as improved hygiene, interaction from a distance, and improved discoverability, and may favor performative interaction. An important consideration for gestural interaction with public displays is the high probability or expectation of a spectator audience.
"Gorilla arm" was a side-effect of vertically oriented touch-screen or light-pen use. In periods of prolonged use, users' arms began to feel fatigue and/or discomfort. This effect contributed to the decline of touch-screen input despite initial popularity in the 1980s.In order to measure arm fatigue and the gorilla arm side effect, researchers developed a technique called Consumed Endurance.