Homoglyph

In orthography and typography, a homoglyph is one of two or more graphemes, characters, or glyphs with shapes that appear identical or very similar. The designation is also applied to sequences of characters sharing these properties.
Synoglyphs are glyphs that look different but mean the same thing. Synoglyphs are also known informally as display variants. The term homograph is sometimes used synonymously with homoglyph, but in the usual linguistic sense, homographs are words that are spelled the same but have different meanings, a property of words, not characters.
In 2008, the Unicode Consortium published Technical Report #36 on a range of issues arising from the visual similarity of characters, both within a single script and between different scripts.
A historical example of homoglyphic confusion results from the use of 'y' to represent 'þ' when setting older English texts in typefaces that lack the latter character. In modern times this has led to such phenomena as Ye olde shoppe, incorrectly implying that the word the was formerly written ye. For further discussion, see thorn.
Examples of homoglyphic symbols are the diaeresis and umlaut, and the hyphen and minus sign. Among digits and letters, the digit 1 and lowercase l, and likewise the digit 0 and capital O, are always encoded separately but are given very similar glyphs in many fonts. Virtually every homoglyphic pair of characters can be differentiated graphically with clearly distinguishable glyphs and separate code points, but this is not always done. Typefaces that do not emphatically distinguish the one/el and zero/oh homoglyphs are considered unsuitable for writing formulas, URLs, source code, IDs and other text where characters cannot always be differentiated without context. Fonts that distinguish these glyphs, for example by means of a slashed zero, are preferred for those uses.

Umlaut and diaeresis

In the days of mechanical typewriters these were typed with the same key, which was also used for a double inverted comma. However, the umlaut originated specifically as a pair of short vertical lines. Incidentally, the two dots above the letter E in Albanian are described as a diaeresis but do not fulfil the function of a diaeresis.

0 and O; 1, l and I

Two common and important sets of homoglyphs in use today are the digit zero and the capital letter O; and the digit one, the lowercase letter L and the uppercase i. In the early days of mechanical typewriters there was little or no visual difference between these glyphs, and typists treated them interchangeably as keyboarding shortcuts. In fact, most keyboards did not even have a key for the digit "1", requiring users to type the letter "l" instead, and some also omitted 0. As these same typists transitioned in the 1970s and 1980s to being computer keyboard operators, their old keyboarding habits continued with them and were an occasional source of confusion.
Most current type designs carefully distinguish between these homoglyphs, usually by drawing the digit zero narrower and drawing the digit one with prominent serifs. Early computer print-outs went even further and marked the zero with a slash or dot, which led to a new conflict involving the Scandinavian letter "Ø" and the Greek letter Φ. The redesigning of character types to differentiate these characters has meant less confusion. The degree to which two different characters appear the same to a given observer is called the "visual similarity".

Multi-letter homoglyphs

Some other combinations of letters look similar, for instance rn looks similar to m, cl looks similar to d, and vv looks similar to w.
In certain narrow-spaced fonts such as Tahoma, placing the letter c next to a letter such as j, l or i will create a homoglyph, as in cj, cl and ci.
When some characters are placed next to each other, seen together at a glance they give the visual impression of another, unrelated character. A more precise way of saying this is that some typographic ligatures can look similar to standalone glyphs. For example, the fi ligature can look similar to A in some typefaces or fonts. This potential for confusion is sometimes an argument made against the use of ligatures.
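As a rough illustration of multi-letter homoglyphs, the sketch below folds the letter sequences mentioned above (rn, cl, vv) into the single letters they resemble, so that two visually similar words compare as equal. The table `SEQUENCE_CANONS` and the function `fold_sequences` are hypothetical names introduced for this example; which sequences actually look alike depends on the font, so the table is an assumption rather than a definitive list.

```python
# Hypothetical table: letter sequences and the single letter each resembles.
# Taken from the examples in the text; real similarity depends on the font.
SEQUENCE_CANONS = {"rn": "m", "cl": "d", "vv": "w"}


def fold_sequences(text):
    """Replace each look-alike sequence with the letter it resembles."""
    for sequence, canon in SEQUENCE_CANONS.items():
        text = text.replace(sequence, canon)
    return text


# Two different words that fold to the same string are visually confusable
# under this (assumed) table.
looks_alike = fold_sequences("modern") == fold_sequences("modem")
```

A fold like this is deliberately lossy: it cannot tell whether the original text contained m or rn, only that the two spellings may be mistaken for each other.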

Unicode homoglyphs

The Unicode character set contains many strongly homoglyphic characters, known as "confusables". These present security risks in a variety of situations and have been called to particular attention in regard to internationalized domain names. One might deliberately spoof a domain name by replacing one character with its homoglyph, thus creating a second domain name, not readily distinguishable from the first, that can be exploited in phishing. In many fonts the Greek letter 'Α', the Cyrillic letter 'А' and the Latin letter 'A' are visually identical, as are the Latin letter 'a' and the Cyrillic letter 'а'. A domain name can be spoofed simply by substituting one of these forms for another in a separately registered name. There are also many examples of near-homoglyphs within the same script, such as 'í' and 'i', É and Ė and È, Í and ĺ. When discussing this specific security issue, any two sequences of similar characters may be assessed in terms of their potential to be taken as a 'homoglyph pair', or, if the sequences clearly appear to be words, as 'pseudo-homographs'. In the Chinese language, many simplified Chinese characters are homoglyphs of the corresponding traditional Chinese characters.
Efforts by TLD registries and Web browser designers are under way to minimize the risks of homoglyphic confusion. Commonly, this is achieved by prohibiting names which mix character sets from multiple languages; Canada's .ca registry goes one step further by requiring names which differ only in diacritics to have the same owner and same registrar. The handling of Chinese characters varies: in .org and .info, registration of one variant renders the other unavailable to anyone, while in .biz the traditional and simplified versions of the same name are delivered as a two-domain bundle in which both point to the same domain name server.
Relevant documentation will be found both on the developers' Web sites, and on an IDN Forum provided by ICANN.
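The idea of rejecting names that mix character sets can be sketched in a few lines. Python's standard `unicodedata` module does not expose the Unicode script property, so this example approximates a character's script by the first word of its Unicode character name (for example "CYRILLIC" in "CYRILLIC SMALL LETTER A"). That heuristic, and the function names `scripts_used` and `is_mixed_script`, are assumptions made for illustration; this is not how any particular registry or browser implements its checks.

```python
import unicodedata


def scripts_used(label):
    """Approximate the set of scripts in a label.

    Heuristic: the first word of a letter's Unicode name
    (e.g. "LATIN", "GREEK", "CYRILLIC") stands in for its script.
    """
    scripts = set()
    for ch in label:
        if ch.isalpha():
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts


def is_mixed_script(label):
    """Return True if the label appears to mix more than one script."""
    return len(scripts_used(label)) > 1


genuine = "paypal"            # all Latin letters
spoof = "p\u0430yp\u0430l"    # Cyrillic а (U+0430) in place of Latin a
```

Here `is_mixed_script(genuine)` is false and `is_mixed_script(spoof)` is true, even though the two strings look the same in many fonts. A check of this kind does not catch a name written entirely in one script that happens to resemble a name in another.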

Canonicalization

Homoglyphs of all kinds can be detected through a process called 'dual canonicalization'. The first step in this process is to identify homoglyph sets, namely characters appearing the same to a given observer. From here, a single token is specified to represent the homoglyph set. This token is called a canon. The next step is to convert each character in the text to the corresponding canon in a process called canonicalization. If the canons of two runs of text are the same but the original text is different, then a homoglyph exists in the text.
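The steps above can be sketched as follows. The homoglyph sets here are a small hand-picked sample drawn from examples elsewhere in this article, and the names `HOMOGLYPH_SETS`, `canonicalize` and `is_homoglyph_pair` are hypothetical; a real system would need far larger sets and a considered notion of visual similarity.

```python
# Step 1: identify homoglyph sets (a small, assumed sample) and pick one
# token, the canon, to represent each set.
HOMOGLYPH_SETS = {
    "A": ["A", "\u0391", "\u0410"],  # Latin A, Greek Alpha, Cyrillic A
    "a": ["a", "\u0430"],            # Latin a, Cyrillic a
    "O": ["O", "0"],                 # capital O, digit zero
    "l": ["l", "1", "I"],            # lowercase L, digit one, capital i
}

# Reverse lookup: every member of a set maps to that set's canon.
CANON_OF = {
    member: canon
    for canon, members in HOMOGLYPH_SETS.items()
    for member in members
}


def canonicalize(text):
    """Step 2: convert each character to its canon (or leave it unchanged)."""
    return "".join(CANON_OF.get(ch, ch) for ch in text)


def is_homoglyph_pair(first, second):
    """Step 3: same canons but different originals means a homoglyph exists."""
    return first != second and canonicalize(first) == canonicalize(second)
```

For instance, `is_homoglyph_pair("PAYPAL", "P\u0410YP\u0410L")` is true because the second string uses Cyrillic А, whereas comparing a string with itself, or two strings that differ in non-confusable characters, gives false.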
