Idiolect is an individual's distinctive use of language, including speech. This usage encompasses vocabulary, grammar, and pronunciation. An idiolect differs from a dialect, which is a set of linguistic characteristics shared among a group of people. The term combines the Greek prefix idio- ("own, personal") with -lect, abstracted from dialect, which derives ultimately from Ancient Greek légō ("I speak").
Language
Language consists of sentence constructs, word choice, and expression of style, and an idiolect is an individual's personal use of all of these facets. Every person has a unique idiolect, shaped by their language, socioeconomic status, and geographical location. Forensic linguists can analyze these individual patterns of language use to draw conclusions in legal cases.
The notion of language is used as an abstract description of language use and of the abilities of individual speakers and listeners. According to this view, a language is an "ensemble of idiolects... rather than an entity per se". Linguists study particular languages, such as English or Xhosa, by examining the utterances produced by the people who speak the language. This contrasts with a view among non-linguists, at least in the United States, that languages exist as ideal systems outside the actual practice of language users. Nancy Niedzielski and Dennis Preston describe a language ideology that appears to be common among American English speakers: many of their subjects believe that there is one "correct" pattern of grammar and vocabulary that underlies Standard English, and that individual usage comes from this external system.
Linguists who understand particular languages as a composite of unique, individual idiolects must nonetheless account for the fact that members of large speech communities, and even speakers of different dialects of the same language, can understand one another. All human beings seem to produce language in essentially the same way. This has led to searches for universal grammar, as well as attempts to further define the nature of particular languages.
Forensic linguistics
Forensic linguistics includes attempts to identify whether a person produced a given text by comparing the style of the text with the idiolect of the individual in question. The forensic linguist may conclude that the text is consistent with the individual, rule out the individual as the author, or deem the comparison inconclusive.
In 1995 Max Appedole relied in part on an analysis of Rafael Sebastián Guillén Vicente's writing style to identify him as Subcomandante Marcos, a leader of the Zapatista movement. Although the Mexican government regarded Subcomandante Marcos as a dangerous guerrilla, Appedole convinced the government that Guillén was a pacifist. Appedole's analysis is considered an early success in the application of forensic linguistics to criminal profiling in law enforcement.
In 1996 Ted Kaczynski was identified as the "Unabomber" by means of forensic linguistics. The FBI and Attorney General Janet Reno pushed for the publication of an essay of Kaczynski's, which led to a tip-off from Kaczynski's brother, who recognized the writing style, that is, his idiolect.
In 1978 four men were accused and convicted of murdering Carl Bridgewater. No forensic linguistic analysis was involved in their case at the time. Today, forensic linguistic analysis suggests that the idiolect used in the interview of one of the men was very similar to that man's reported statement. Because an idiolect is unique to an individual, it is very unlikely that the two documents were produced independently; one was most likely created using the other.
Detecting idiolect with corpora
Idiolect analysis differs depending on whether the corpus consists of written texts or audio recordings, since writing tends to be more carefully planned and precisely worded than speech, in which informal language and conversational fillers are common. Corpora with large amounts of input data allow word frequency and synonym lists to be generated, normally using the top ten bigrams found in the corpus. Whether a word or phrase is treated as part of an idiolect depends on where it falls relative to the window's head word and to the edge of the window. The window is kept to 7 to 10 words, and a sample under consideration as a feature of the idiolect may be up to five words before or after (+5/-5) the head word. Corpus data pertaining to idiolect are sorted into three categories: irrelevant, personal discourse marker, and informal vocabulary. Samples at the end of the frame, far from the head word, are often ruled superfluous. Superfluous data must then be run through different functions than non-superfluous data to determine whether the word or phrase is part of an individual's idiolect.
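
The bigram counting and windowing steps described above can be illustrated with a minimal Python sketch. It is not a reconstruction of any particular forensic tool: the function names, the sample sentence, the choice of "reckon" as head word, and the edge_distance threshold of 4 are all assumptions made for illustration. The sketch only separates candidates near the head word from those at the window edge; it does not attempt the three-way categorization, for which the text gives no criteria.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_bigrams(tokens, n=10):
    """Return up to n of the most frequent adjacent word pairs with counts."""
    return Counter(zip(tokens, tokens[1:])).most_common(n)


def window_offsets(tokens, head, max_distance=5):
    """Collect (word, offset) pairs within max_distance of each head occurrence.

    A negative offset means the word precedes the head word.
    """
    pairs = []
    for i, tok in enumerate(tokens):
        if tok != head:
            continue
        lo = max(0, i - max_distance)
        hi = min(len(tokens), i + max_distance + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((tokens[j], j - i))
    return pairs


def split_by_distance(pairs, edge_distance=4):
    """Split candidates into those near the head word and those at the edge.

    edge_distance is a hypothetical cutoff, not a value from the literature.
    """
    near = [p for p in pairs if abs(p[1]) < edge_distance]
    edge = [p for p in pairs if abs(p[1]) >= edge_distance]
    return near, edge


# Hypothetical sample of one speaker's transcribed speech.
sample = "well you know I reckon it is fine you know I reckon so"
tokens = tokenize(sample)
bigrams = top_bigrams(tokens)
pairs = window_offsets(tokens, "reckon")
near, edge = split_by_distance(pairs)
```

Here bigrams holds the most frequent word pairs, near holds candidate words close to the head word, and edge holds the candidates that would be treated as possibly superfluous and passed to separate checks.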