First normal form
First normal form is a property of a relation in a relational database. A relation is in first normal form if and only if the domain of each attribute contains only [|atomic] values, and the value of each attribute contains only a single value from that domain. The first definition of the term, in a 1971 conference paper by Edgar Codd, defined a relation to be in first normal form when none of its domains have any sets as elements.
First normal form is an essential property of a relation in a relational database. Database normalization is the process of representing a database in terms of relations in standard normal forms, where first normal is a minimal requirement.
First normal form enforces these criteria:
- Eliminate repeating groups in individual tables
- Create a separate table for each set of related data
- Identify each set of related data with a primary key
Examples
Designs that violate 1NF
Below is a table that stores the names and telephone numbers of customers. One requirement though is to retain telephone numbers for some customers. The simplest way of satisfying this requirement is to allow the "Telephone Number" column in any given row to contain more than one value:Customer ID | First Name | Surname | Telephone Number |
123 | Pooja | Singh | 555-861-2025, 192-122-1111 |
456 | San | Zhang | 403-1659 Ext. 53; 182-929-2929 |
789 | John | Doe | 555-808-9633 |
The telephone number column contains multiple phone numbers in a single value. For example, the first row has two telephone numbers separated by a comma. The column values are not atomic: it can be subdivided into two numbers. This violates first normal form.
An apparent solution is to introduce more columns:
Customer ID | First Name | Surname | Telephone Number1 | Telephone Number2 |
123 | Pooja | Singh | 555-861-2025 | 192-122-1111 |
456 | San | Zhang | 403-1659 Ext. 53 | 182-929-2929 |
789 | John | Doe | 555-808-9633 |
Technically, this table does not violate the requirement for values to be atomic. However, informally, the two telephone number columns still form a "repeating group": they repeat what is conceptually the same attribute, namely a telephone number. An arbitrary and hence meaningless ordering has been introduced: why is 555-861-2025 put into the Telephone Number1 column rather than the Telephone Number2 column? There's no reason why customers could not have more than two telephone numbers, so how many Telephone NumberN columns should there be? It is not possible to search for a telephone number without searching an arbitrary number of columns. Adding an extra telephone number may require the table to be reorganized by the addition of a new column rather than just having a new row added.
Designs that comply with 1NF
To bring the model into the first normal form, we split the strings we used to hold our telephone number information into "atomic" entities: single phone numbers. And we ensure no row contains more than one phone number.Customer ID | First Name | Surname | Telephone Number |
123 | Pooja | Singh | 555-861-2025 |
123 | Pooja | Singh | 192-122-1111 |
456 | San | Zhang | 182-929-2929 |
456 | San | Zhang | 403-1659 Ext. 53 |
789 | John | Doe | 555-808-9633 |
Note that the "ID" is no longer unique in this solution with duplicated customers. To uniquely identify a row, we need to use a combination of. The value of the combination is unique although each column separately contains repeated values. Being able to uniquely identify a row is a requirement of 1NF.
An alternative design uses two tables:
Columns do not contain more than one telephone number in this design. Instead, each Customer-to-Telephone Number link appears on its own row. Using Customer ID as key, a one-to-many relationship exists between the name and the number tables. A row in the "parent" table, Customer Name, can be associated with many telephone number rows in the "child" table, Customer Telephone Number, but each telephone number belongs to one, and only one customer. It is worth noting that this design meets the additional requirements for second and third normal form. Atomicity's definition of 1NF makes reference to the concept of 'atomicity'. Codd states that the "values in the domains on which each relation is defined are required to be atomic with respect to the DBMS." Codd defines an atomic value as one that "cannot be decomposed into smaller pieces by the DBMS " meaning a column should not be divided into parts with more than one kind of data in it such that what one part means to the DBMS depends on another part of the same column.Hugh Darwen and Chris Date have suggested that Codd's concept of an "atomic value" is ambiguous, and that this ambiguity has led to widespread confusion about how 1NF should be understood. In particular, the notion of a "value that cannot be decomposed" is problematic, as it would seem to imply that few, if any, data types are atomic:
1NF tables as representations of relationsAccording to Date's definition, a table is in first normal form if and only if it is "isomorphic to some relation", which means, specifically, that it satisfies the following five conditions:Violation of any of these conditions would mean that the table is not strictly relational, and therefore that it is not in first normal form. Examples of tables that would not meet this definition of first normal form are:
|