Definite assignment analysis

In computer science, definite assignment analysis is a data-flow analysis used by compilers to conservatively ensure that a variable or location is always assigned before it is used.

Motivation

In C and C++ programs, a source of particularly difficult-to-diagnose errors is the nondeterministic behavior that results from reading uninitialized variables; this behavior can vary between platforms, builds, and even from run to run.
There are two common ways to solve this problem. One is to ensure that all locations are written before they are read. Rice's theorem establishes that this problem cannot be solved in general for all programs; however, it is possible to create a conservative analysis that will accept only programs that satisfy this constraint, while rejecting some correct programs, and definite assignment analysis is such an analysis. The Java and C# programming language specifications require that the compiler report a compile-time error if the analysis fails. Both languages require a specific form of the analysis that is spelled out in meticulous detail. In Java, this analysis was formalized by Stärk et al., and some correct programs are rejected and must be altered to introduce explicit unnecessary assignments. In C#, this analysis was formalized by Fruja, and is precise as well as sound, in the sense that all variables assigned along all control flow paths will be considered definitely assigned. The Cyclone language also requires programs to pass a definite assignment analysis, but only on variables with pointer types, to ease porting of C programs.
The second way to solve the problem is to automatically initialize all locations to some fixed, predictable value at the point at which they are defined, but this introduces new assignments that may impede performance. In this case, definite assignment analysis enables a compiler optimization in which redundant assignments (assignments followed only by other assignments, with no possible intervening reads) can be eliminated. No programs are rejected, but programs for which the analysis fails to recognize definite assignment may contain redundant initialization. The Common Language Infrastructure relies on this approach.

Terminology

A variable or location can be said to be in one of three states at any given point in the program:

Definitely assigned: The variable is known with certainty to be assigned.
Definitely unassigned: The variable is known with certainty to be unassigned.
Unknown: The variable may be assigned or unassigned; the analysis is not precise enough to determine which.
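The three states can be modelled as a small enumeration. The following is a minimal sketch only; the names State and merge, and the rule used to combine states where two control-flow paths meet, are illustrative assumptions rather than part of any language specification.

```python
from enum import Enum


class State(Enum):
    ASSIGNED = "definitely assigned"
    UNASSIGNED = "definitely unassigned"
    UNKNOWN = "unknown"


def merge(a: State, b: State) -> State:
    """Combine the states arriving at a join point from two paths.

    Assumed rule: the state survives only if both paths agree;
    any disagreement degrades to UNKNOWN.
    """
    return a if a == b else State.UNKNOWN
```

Under this assumed rule, a variable assigned on only one branch of a conditional ends up in the unknown state after the branches rejoin.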

The analysis

The following is based on Fruja's formalization of the C# intraprocedural definite assignment analysis, which is responsible for ensuring that all local variables are assigned before they are used. It simultaneously does definite assignment analysis and constant propagation of boolean values. We define five static functions:

Name	Domain	Description
before	All statements and expressions	Variables definitely assigned before the evaluation of the given statement or expression.
after	All statements and expressions	Variables definitely assigned after the evaluation of the given statement or expression, assuming it completes normally.
vars	All statements and expressions	All variables available in the scope of the given statement or expression.
true	All boolean expressions	Variables definitely assigned after the evaluation of the given expression, assuming the expression evaluates to true.
false	All boolean expressions	Variables definitely assigned after the evaluation of the given expression, assuming the expression evaluates to false.

We supply data-flow equations that define the values of these functions on various expressions and statements, in terms of the values of the functions on their syntactic subexpressions. Assume for the moment that there are no goto, break, continue, return, or exception handling statements. Following are a few examples of these equations:

Any expression or statement e that does not affect the set of variables definitely assigned: after(e) = before(e).
Let e be the assignment expression loc = v. Then before(v) = before(e), and after(e) = after(v) ∪ {loc}.
Let e be the expression true. Then true(e) = before(e) and false(e) = vars(e). In other words, if e evaluates to false, all variables are (vacuously) definitely assigned, because e does not evaluate to false.
Since method arguments are evaluated left to right, before(argᵢ₊₁) = after(argᵢ). After a method completes, out parameters are definitely assigned.
Let s be the conditional statement if (e) s₁ else s₂. Then before(e) = before(s), before(s₁) = true(e), before(s₂) = false(e), and after(s) = after(s₁) ∩ after(s₂).
Let s be the while loop statement while (e) s₁. Then before(e) = before(s), before(s₁) = true(e), and after(s) = false(e).
And so on.
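A few of these equations can be sketched as functions over a toy syntax tree. This is an illustrative sketch, not Fruja's formalization: the node classes (Use, TrueLit, Assign, If, While) and the functions after and true_false are hypothetical names, sets are represented as frozensets of variable names, and every boolean expression other than the literal true is assumed to give true(e) = false(e) = after(e).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Use:            # a read of a local variable
    name: str


@dataclass(frozen=True)
class TrueLit:        # the literal expression `true`
    pass


@dataclass(frozen=True)
class Assign:         # loc = value
    loc: str
    value: object


@dataclass(frozen=True)
class If:             # if (cond) then else orelse
    cond: object
    then: object
    orelse: object


@dataclass(frozen=True)
class While:          # while (cond) body
    cond: object
    body: object


def true_false(e, before, all_vars):
    """Return the pair (true(e), false(e)) for a boolean expression e."""
    if isinstance(e, TrueLit):
        # true(e) = before(e); false(e) = vars(e), vacuously.
        return before, all_vars
    # Assumption of this sketch: no finer tracking for other conditions.
    out = after(e, before, all_vars)
    return out, out


def after(node, before, all_vars):
    """Return after(node), given before(node) and vars(node)."""
    if isinstance(node, Assign):
        # before(v) = before(e); after(e) = after(v) ∪ {loc}
        return after(node.value, before, all_vars) | {node.loc}
    if isinstance(node, If):
        t, f = true_false(node.cond, before, all_vars)
        # after(s) = after(s1) ∩ after(s2)
        return after(node.then, t, all_vars) & after(node.orelse, f, all_vars)
    if isinstance(node, While):
        # after(s) = false(e); the body does not contribute to after(s).
        _, f = true_false(node.cond, before, all_vars)
        return f
    # Nodes that assign nothing: after(e) = before(e)
    return before
```

In this sketch, a variable assigned in only one branch of an If is dropped by the intersection, while the statement after a while (true) loop sees every variable as assigned, because control cannot reach it by the loop completing normally.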

At the beginning of the method, no local variables are definitely assigned. The verifier repeatedly iterates over the abstract syntax tree and uses the data-flow equations to migrate information between the sets until a fixed point can be reached. Then, the verifier examines the before set of every expression that uses a local variable to ensure that it contains that variable.
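For straight-line code, where no iteration is needed, the final check can be sketched as follows. The representation of a statement as a pair (reads, writes) of variable-name lists and the function name find_unassigned_uses are assumptions made for this illustration.

```python
def find_unassigned_uses(statements):
    """Report (statement index, variable) for each read that is not
    covered by the set of variables definitely assigned before it.

    Each statement is a pair (reads, writes) of lists of variable names.
    """
    assigned = set()          # no locals are assigned at method entry
    errors = []
    for index, (reads, writes) in enumerate(statements):
        for name in reads:
            if name not in assigned:      # name is not in before(statement)
                errors.append((index, name))
        assigned |= set(writes)           # after = before ∪ written locations
    return errors


# Models: x = 1; z = x + y; print(z);
program = [([], ["x"]), (["x", "y"], ["z"]), (["z"], [])]
errors = find_unassigned_uses(program)
```

Here the read of y in the second statement is the only use reported, since y is never written before it.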
The algorithm is complicated by the introduction of control-flow jumps like goto, break, continue, return, and exception handling. Any statement that can be the target of one of these jumps must intersect its before set with the set of definitely assigned variables at the jump source. When these are introduced, the resulting data flow may have multiple fixed points, as in this example:

int i = 1;
L:
goto L;

Number the three lines of this example 1 (the declaration), 2 (the label L), and 3 (the goto). Since the label L can be reached from two locations, the control-flow equation for goto dictates that before(2) = after(1) ∩ before(3). But before(3) = before(2), so before(2) = after(1) ∩ before(2). This equation has two fixed points for before(2): {i} and the empty set. However, it can be shown that because of the monotonic form of the data-flow equations, there is a unique maximal fixed point that provides the most possible information about the definitely assigned variables. Such a maximal fixed point may be computed by standard techniques; see data-flow analysis.
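One standard technique is to start every unknown set at the largest possible value and apply the equations until nothing changes. The sketch below does this for the goto example, writing after_1 for the set after the declaration, before_2 for the set before the label L, and before_3 for the set before the goto; the function name solve and its parameters are hypothetical.

```python
def solve(start, after_1=frozenset({"i"})):
    """Iterate before_2 = after_1 ∩ before_3 and before_3 = before_2,
    beginning with both unknowns equal to `start`, until they stabilise."""
    before_2 = before_3 = start
    while True:
        new_2 = after_1 & before_3
        new_3 = new_2
        if (new_2, new_3) == (before_2, before_3):
            return before_2
        before_2, before_3 = new_2, new_3


all_vars = frozenset({"i"})
maximal = solve(all_vars)       # start from the top: every variable assigned
minimal = solve(frozenset())    # start from the bottom: nothing assigned
```

Both starting points reach a fixed point immediately, but only the iteration that starts from the full set of variables yields the maximal one, in which i is definitely assigned at the label.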
An additional issue is that a control-flow jump may render certain control flows infeasible; for example, in this code fragment the variable i is definitely assigned before it is used:

int i;
if (j < 0) return; else i = j;
print(i);

Let s be the if statement in this fragment. The data-flow equation for if says that after(s) = after(return) ∩ after(i = j). To make this work out correctly, we define after(e) = vars(e) for every control-flow jump e; this is vacuously valid in the same sense that the equation false(true) = vars(true) is valid, because it is not possible for control to reach a point immediately after a control-flow jump.
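The effect of this definition can be checked on the fragment above with a few set operations. This is a sketch under stated assumptions: the variable j is assumed to be assigned before the fragment (for example, as a method parameter), and the helper names after_jump and after_assign are hypothetical.

```python
VARS = frozenset({"i", "j"})          # vars(s) for the fragment


def after_jump(before):
    """after(e) = vars(e) for a control-flow jump such as return."""
    return VARS


def after_assign(before, loc):
    """after(e) = before(e) ∪ {loc} for an assignment of a simple value."""
    return before | {loc}


before_if = frozenset({"j"})          # assumption: j is already assigned

after_then = after_jump(before_if)            # branch: return;
after_else = after_assign(before_if, "i")     # branch: i = j;
after_if = after_then & after_else            # after(s1) ∩ after(s2)

# For contrast: treating return as an ordinary statement loses i.
naive_after_if = before_if & after_else
```

With after(return) = vars, the intersection keeps i, so the use of i after the if statement is accepted; under the naive treatment i would be missing and the program would be rejected.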
