Subject (documents)


In library and information science, documents are classified and searched by subject, as well as by other attributes such as author, genre and document type. This makes "subject" a fundamental term in this field. Library and information specialists assign subject labels to documents to make them findable. There are many ways to do this, and there is not always consensus about which subject should be assigned to a given document. To optimize subject indexing and searching, we need a deeper understanding of what a subject is. The question "what is to be understood by the statement 'document A belongs to subject category X'?" has been debated in the field for more than 100 years.

Definition

Hjørland defined subjects as the epistemological potentials of documents. This definition is in line with the request oriented understanding of indexing discussed below. The idea is that a document is assigned a subject in order to ease retrieval and findability. The criteria for what should be found, that is, for what constitutes knowledge, are in the end an epistemological matter.

Theoretical view

Charles Ammi Cutter (1837–1903)

For Cutter, the stability of subjects depends on a social process in which their meaning is stabilized in a name or a designation. A subject "referred to those intellections that had received a name that itself represented a distinct consensus in usage", and the "systematic structure of established subjects" is "resident in the public realm"; "[s]ubjects are by their very nature locations in a classificatory structure of publicly accumulated knowledge". Bernd Frohmann adds:
"The stability of the public realm in turn relies upon natural and objective mental structures which, with proper education, govern a natural progression from particular to general concepts.
Since for Cutter, mind, society, and SKO stand one behind the other, each supporting each, all manifesting the same structure, his discursive construction of subjects invites connections with discourses of mind, education, and society. The Dewey Decimal Classification, by contrast, severs those connections. Melvil Dewey emphasized more than once that his system maps no structure beyond its own; there is neither a "transcendental deduction" of its categories nor any reference to Cutter's objective structure of social consensus. It is content-free: Dewey disdained any philosophical excogitation of the meaning of his class symbols, leaving the job of finding verbal equivalents to others. His innovation and the essence of the system lay in the notation. The DDC is a purely semiotic system of expanding nests of ten digits, lacking any referent beyond itself. In it, a subject is wholly constituted in terms of its position in the system. The essential characteristic of a subject is a class symbol which refers only to other symbols. Its verbal equivalent is accidental, a merely pragmatic characteristic...
....
The conflict of interpretations over "subjects" became explicit in the battles between "bibliography" and Dewey's "close classification". William Fletcher spoke for the scholarly bibliographer.... Fletcher's "subjects", like Cutter's, referred to the categories of a fantasized, stable social order, whereas Dewey's subjects were elements of a semiological system of standardized, techno-bureaucratic administrative software for the library in its corporate, rather than high culture, incarnation."
Cutter's early view of what a subject is may well be wiser than most of the understandings that dominated the 20th century, including the one reflected in the ISO standard quoted below. The early statements quoted by Frohmann indicate that subjects are somehow shaped in social processes. That said, these statements are not particularly detailed or clear: we only get a vague idea of the social nature of subjects.

S. R. Ranganathan (1892–1972)

One system that has an explicit theoretical foundation is Ranganathan's Colon Classification. Ranganathan provided an explicit definition of the concept of "subject":
A related definition is given by one of Ranganathan's students:
Ranganathan's definition of "subject" is strongly influenced by his Colon Classification system. The colon system is based on combining single elements from facets into subject designations. This is why the combined nature of subjects is emphasized so strongly. It leads, however, to absurdities such as the claim that gold cannot be a subject. This aspect of the theory has been criticized by Metcalfe, whose skepticism regarding Ranganathan's theory is formulated in harsh words: "This pseudo-science imposed itself on British disciples from about 1950 on...".
It seems unacceptable that Ranganathan defines the word subject in a way that favors his own system. A scientific concept like "subject" should make it possible to compare different ways of establishing access to information. Whether subjects are combined should be examined once their definition has been given; it should not be determined a priori, in the definition.
Besides its emphasis on the combined, organizing and systematizing nature of subjects, Ranganathan's definition of subject contains the pragmatic demand that a subject should be determined in a way that suits a normal person's competency or specialization. Again we see a strange kind of wishful thinking that mixes a general understanding of a concept with demands imposed by his own specific system. What the word subject means is one thing; how to provide subject descriptions that meet particular demands, such as the specificity of a given information retrieval language or the precision and recall required of a system, is quite another. If researchers too often define terms in ways that favor specific kinds of systems, such definitions are not useful for building more general theories about subjects, subject analysis and IR. Among other things, comparative studies of different kinds of systems are made difficult.
Based on these arguments, we may conclude that Ranganathan's definition of the concept "subject" is not suited for scientific use. Like the definition of "subject" given by the ISO standard for topic maps, Ranganathan's definition may be useful within his own closed system. The purpose of a scientific and scholarly field is, however, to examine the relative fruitfulness of systems such as topic maps and Colon Classification. For that purpose, another understanding of "subject" is necessary.
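The idea of building subject designations by combining single elements from facets can be illustrated with a toy sketch. The facet names, their order and the separator below are hypothetical simplifications chosen for illustration; they are not the actual notation or facet scheme of the Colon Classification.

```python
from typing import Dict, List

# Hypothetical facets in a fixed citation order (an assumption for this sketch,
# not Ranganathan's actual facet scheme).
FACET_ORDER: List[str] = ["material", "process", "place"]


def compose_subject(isolates: Dict[str, str], separator: str = ":") -> str:
    """Combine one element per facet into a single subject designation."""
    unknown = set(isolates) - set(FACET_ORDER)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    # Elements are joined in the fixed facet order; absent facets are skipped.
    return separator.join(isolates[f] for f in FACET_ORDER if f in isolates)


compound = compose_subject(
    {"place": "South Africa", "material": "gold", "process": "mining"}
)
single = compose_subject({"material": "gold"})
```

Here `compound` is a designation built from three elements, while `single` consists of one element only. Mechanically both are possible; whether a single element such as "gold" counts as a subject is exactly the definitional question discussed above.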

Patrick Wilson (1927–2003)

In his book, Wilson examined, in particular by thought experiments, the suitability of different methods of determining the subject of a document. The methods were:
Patrick Wilson shows convincingly that each of these methods is insufficient to determine the subject of a document and is led to conclude: "The notion of the subject of a writing is indeterminate..." or, on p. 92: "For nothing definite can be expected of the things found at any given position". In connection with the last quote, Wilson has an interesting footnote in which he writes that authors of documents often use terms in ambiguous ways. Even if the librarian could personally develop a very precise understanding of a concept, he would be unable to use it in his classification, because none of the documents use the term in the same precise way. On the basis of this argument, Wilson concludes: "If people write on what are for them ill-defined phenomena, a correct description of their subjects must reflect the ill-definedness".
Wilson's concept of subject was discussed by Hjørland, who found it problematic to give up a precise understanding of such a basic term in LIS. Wilson's arguments led him to an agnostic position which Hjørland found unacceptable and unnecessary. Concerning authors' use of ambiguous terms, the role of subject analysis is to determine which documents it would be fruitful for users to identify, regardless of whether the documents use one term or another, or whether a given term in a document is used in one meaning or another. Clear and relevant concepts and distinctions in classification systems and controlled vocabularies may be fruitful even if they are applied to documents with ambiguous terminology.

"Content oriented" versus "request oriented" views

Request oriented indexing is indexing in which the anticipated requests from users influence how documents are indexed. The indexer asks himself: "Under which descriptors should this entity be found?" and "think of all the possible queries and decide for which ones the entity at hand is relevant".
Request oriented indexing may be indexing that is targeted towards a particular audience or user group. For example, a library or a database for feminist studies may index documents differently from a historical library. It is probably better, however, to understand request oriented indexing as policy based indexing: the indexing is done according to some ideals and reflects the purpose of the library or database doing the indexing. In this way it is not necessarily a kind of indexing based on user studies. Only if empirical data about use or users are applied should request oriented indexing be regarded as a user-based approach.
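The contrast between deriving index terms from a document's content and assigning them from anticipated requests can be sketched in a few lines of Python. This is only an illustrative sketch: the `Document` class, the function names, the sample document, and the two "policies" (one for a hypothetical feminist studies database, one for a hypothetical historical library) are all assumptions made for the example, and real indexing relies on human judgment rather than string matching.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Set


@dataclass(frozen=True)
class Document:
    title: str
    text: str


# A policy maps each descriptor to the indexer's answer to the question
# "should this document be found under this descriptor?"
Policy = Dict[str, Callable[[Document], bool]]


def content_oriented_index(doc: Document, vocabulary: Set[str]) -> Set[str]:
    """Assign only those vocabulary terms that literally occur in the text."""
    text = doc.text.lower()
    return {term for term in vocabulary if term.lower() in text}


def request_oriented_index(doc: Document, policy: Policy) -> Set[str]:
    """Go through the anticipated descriptors and keep the relevant ones."""
    return {d for d, should_find in policy.items() if should_find(doc)}


doc = Document(
    title="Pay in the mills",
    text="Wages of men and women in nineteenth-century textile mills.",
)

vocabulary = {"wages", "textile industry", "gender inequality"}

# Two hypothetical indexing policies reflecting different institutional purposes.
feminist_policy: Policy = {
    "gender inequality": lambda d: "women" in d.text.lower()
    and "wages" in d.text.lower(),
}
history_policy: Policy = {
    "industrial history": lambda d: "mills" in d.text.lower(),
}

by_content = content_oriented_index(doc, vocabulary)
for_feminist_db = request_oriented_index(doc, feminist_policy)
for_history_lib = request_oriented_index(doc, history_policy)
```

In this sketch the same document receives different descriptors depending on the policy applied, and neither policy draws on empirical data about users, which mirrors the point that request oriented indexing is not necessarily a user-based approach.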

The subject knowledge view

Rowley & Hartley wrote: "In order to achieve good consistent indexing, the indexer must have a thorough appreciation of the structure of the subject and the nature of the contribution that the document is making to the advancement of knowledge within a particular discipline". This is in accordance with Hjørland's definition given above.

Other views and definitions

In the ISO-standard for topic maps the concept of subject is defined this way:
"Subject
Anything whatsoever, regardless of whether it exists or has any other specific characteristics, about which anything whatsoever may be asserted by any means whatsoever." (ISO 13250-1, here cited from draft: http://www1.y12.doe.gov/capabilities/sgml/sc34/document/0446.htm#overview)
This definition may work well within the closed system of concepts provided by the topic maps standard. In broader contexts, however, it is not fruitful, because it does not specify what to identify in a document or in a discourse when ascribing subject identification terms or symbols to it. If different methods of subject analysis yield different results, which of these results can then be said to reflect the subject? Different persons may have different opinions about what the subject of a specific document is. How can a theoretical understanding of the term "subject" help in deciding on principles of subject analysis?

Related concepts

Indexing words versus concepts versus subjects

A proposal for differentiating between concept indexing and subject indexing was given by Bernier. In his opinion, subject indexes are different from, and can be contrasted with, indexes to concepts, topics and words. Subjects are what authors are working and reporting on. A document can have Chromatography as its subject if this is what the author wishes to inform readers about. Papers using Chromatography as a research method or discussing it in a subsection do not have Chromatography as their subject. Indexers can easily drift into indexing concepts and words rather than subjects, but this is not good indexing. Bernier does not, however, differentiate the author's subjects from those of the information seeker. A user may want a document about a subject that is different from the one intended by its author. From the point of view of information systems, the subject of a document is related to the questions that the document can answer for its users.
Hjørland & Nicolaisen investigated the concept of subject in relation to Bradford's law of scattering and made a distinction between three kinds of scattering:
"The FRSAR Working Group is aware that some controlled vocabularies provide terminology to express other aspects of works in addition to subject. While very important and the focus of many user queries, these aspects describe isness or what class the work belongs to based on form or genre rather than what the work is about."

Ofness

"Those LIS authors who have focused on the subjects of visual resources, such as artworks and photographs, have often been concerned with how to distinguish between the 'aboutness' and the 'ofness' of such works. In this sense, 'aboutness' has a narrower meaning than that used above. A painting of a sunset over San Francisco, for instance, might be analyzed as being 'of' sunsets and 'of' San Francisco, but also 'about' the passage of time."
See also: Baca & Harpring and Shatford.