Maria da Graça Simões, Daniel Martínez-Ávila, Blanca Rodríguez-Bravo, Patricia de Almeida, Isadora Victorino Evangelista, Approaches to the concepts of exhaustivity and specificity in ISKO International meeting proceedings: 2000-2017 in:

Fernanda Ribeiro, Maria Elisa Cerveira (Ed.)

Challenges and Opportunities for Knowledge Organization in the Digital Age, page 58 - 65

Proceedings of the Fifteenth International ISKO Conference 9-11 July 2018 Porto, Portugal

1st edition 2018, ISBN print: 978-3-95650-420-4, ISBN online: 978-3-95650-421-1, https://doi.org/10.5771/9783956504211-58

Series: Advances in Knowledge Organization, vol. 16

Bibliographic information
Maria da Graça Simões, Daniel Martínez-Ávila, Blanca Rodríguez-Bravo, Patricia de Almeida, Isadora Victorino Evangelista

Approaches to the concepts of exhaustivity and specificity in ISKO International meeting proceedings: 2000-2017

Abstract

We study how the concepts of exhaustivity and specificity are addressed in the publications of the ISKO international meeting proceedings for the period 2000-2017. In particular, we analyze the aspects related to thematic proximity and the methodological approaches in these publications. For the selection and analysis of the corpus we used an ad-hoc combination of techniques and methodological procedures, including content analysis and analysis of keywords in context. The results show that studies on exhaustivity and specificity are scarce and not very central in ISKO meeting proceedings, while most of the publications follow empirical approaches.

Introduction

The origins of subject indexing, as a technique for document analysis, date back to ancient times and civilizations such as Mesopotamia (Witty 1973). Subject catalogs, whose origins are also rooted in these ancient civilizations, were established and systematized with the work of Charles Ammi Cutter, whose “Rules for a printed dictionary catalog” (1904) provided guidelines for this practice (Witty 1973; Pettee 1945; Borko and Bernier 1978). In this vein, since indexing in a catalog is based on the analysis and representation of contents using indexing terms (Rowley 1982; Chaumier 1982; ISO 5963:1985), its ultimate goal would be information retrieval.

In the operation of indexing there are two key principles that affect the retrieval of documents: exhaustivity and specificity. The concept of exhaustivity is related to the number of subjects (factors) that are translated to concrete representative terms for a document.
According to Jones (2004), the exhaustivity of a document description is the coverage of its various topics given by the terms assigned to it. For Khosh-Khui (1986), exhaustivity refers to the extent to which a document is analyzed to completely identify its contents. Anderson (2002) sees exhaustivity as the number of single concept terms that will be used to describe the topics, content or meaning of a documentary unit. For Olson and Given (2003), exhaustivity would be the breadth of representation, the number of factors indexed and concerned with the different aspects included, which leads to comprehensiveness. In this sense, exhaustivity is related to the level of indexable matter and how much of a given topic must be covered. According to Ogilvie and Lalmas (2006), exhaustivity measures how exhaustively an element discusses the topic.

Specificity, on the other hand, is the level of detail in which a particular concept is represented (ISO 5963:1985). According to Jones (2004), the specificity of an individual term would be the level of detail at which a given concept is represented. For Khosh-Khui (1986), specificity is the extent to which a system allows precision in explicitly stating the subject contents of a document. According to Anderson (2002), specificity refers to the tightness of fit between the meaning of a term and the topic of a discussion or illustration in a text, which tends to offset or modify the impact of exhaustivity. According to Olson and Given (2003), specificity is the relative detail within the vocabulary, the number of hierarchical levels defined, which increases with each level as the hierarchy becomes deeper. In this way, the level of specificity should vary according to the subject itself and the users’ needs. Mai (2004) also talks about exclusivity, stating that classes on the same level should be distinct, such that documents placed in one class could not also be placed in another class.
According to Ogilvie and Lalmas (2006), specificity measures how focused an element is on the topic.

According to some authors (e.g., Foskett 1977; Chaumier 1982; Langridge 1989), these two principles would be political/administrative decisions as they are based on the indexing policies of their respective information services. The different degrees of exhaustivity and specificity in the indexing of documents affect the relevance of retrieved documents, as a greater number of appropriate index terms increases the density of documents judged as relevant (Wolfram and Zhang 2002; Kim 2006). On the other hand, the concept of relevance depends on external factors, such as the user’s background knowledge (Abdulahhad et al. 2013). According to Pehcevski and Larsen (2007), relevance is the extent to which some information is pertinent, connected, or applicable to the matter at hand, representing a key concept in the fields of documentation, information science, and information retrieval.

Given and Olson (2003), in a more critical way, also cite these principles as important strategies for effective information organization and retrieval in research, also relating them to the concepts of recall and precision (an aspect introduced by Richter 1984). For these authors, recall would be the amount of relevant information that is retrieved, while a maximum level of recall would mean “retrieving every last instance of a theme or variable. […] If exhaustivity is high more codes are used, which will allow more data to be retrieved and analysed” (Olson and Given 2003, 131). Specificity, on the other hand, would be related to precision.
Olson and Given (2003, 130-131) point out that “precision is enhanced by high specificity, [as] narrower categories will produce fewer data in each category.” Thus, “if precision is high, then all the information retrieved is relevant and little or no irrelevant information is retrieved.” Consequently, greater exhaustivity in indexing entails a high level of recall and a low level of precision, while greater specificity entails a high level of precision and a low level of recall. This means that recall and precision tend to vary inversely, which will have “an impact on the construction of data categories and codes for analysis” (Olson and Given 2003, 131). According to Khosh-Khui (1986), exhaustivity controls recall potentialities, and specificity controls precision capabilities. Thus, it is said that exhaustivity “is recall-oriented,” while specificity is “precision-oriented” (Abdulahhad et al. 2013).

Abdulahhad, Chevallet, and Berrut (2013) state that exhaustivity and specificity are “still theoretical notions without a clear idea of how to be implemented” and, in a logical framework of practical application, conclude that “the explicit integration of Exhaustivity and Specificity into IR models will improve the retrieval performance.” In the words of Anderson (2002), “the detail of indexing is a very important policy consideration for any index, with strong implications for search precision and recall.” Thus, the principles of exhaustivity and specificity are essential in the process of indexing as they are closely related to information retrieval. Evangelista, Simões, and Guimarães (2016) conducted a study on these principles as ethical values in knowledge organization. The authors identified how these principles affect information retrieval in relation to the diversity and expressiveness of indexing terms.
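The relationship between indexing exhaustivity, recall, and precision described above can be illustrated with a toy example. The following is only a minimal sketch under stated assumptions: the four documents, their index terms, the relevance judgments, and the helper names `search` and `precision_recall` are all hypothetical and do not come from any of the cited studies. It uses the standard definitions of precision (relevant retrieved / retrieved) and recall (relevant retrieved / relevant).

```python
# Toy illustration (hypothetical data): the same four documents indexed
# with high exhaustivity (every topic, even minor ones, gets a term)
# and with low exhaustivity (only the main topic gets a term).

def search(index, term):
    """Return the ids of all documents that carry the given index term."""
    return {doc_id for doc_id, terms in index.items() if term in terms}

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for sets of retrieved and relevant ids."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Assumed relevance judgments for a query on "indexing":
# d4 only mentions indexing in passing, so it is judged not relevant.
relevant = {"d1", "d2", "d3"}

exhaustive_index = {
    "d1": {"indexing", "retrieval"},
    "d2": {"classification", "indexing"},
    "d3": {"thesauri", "indexing"},
    "d4": {"cataloguing", "indexing"},
}

selective_index = {
    "d1": {"indexing"},
    "d2": {"classification"},
    "d3": {"thesauri"},
    "d4": {"cataloguing"},
}

for label, index in [("exhaustive", exhaustive_index),
                     ("selective", selective_index)]:
    p, r = precision_recall(search(index, "indexing"), relevant)
    print(f"{label}: precision={p:.2f} recall={r:.2f}")
# exhaustive: precision=0.75 recall=1.00
# selective: precision=1.00 recall=0.33
```

In this constructed case the exhaustive index retrieves every relevant document at the cost of one irrelevant hit, while the selective index retrieves only relevant material but misses two of the three relevant documents. Real collections will not behave this neatly; the example only makes the direction of the trade-off concrete.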
Regardless of the nature of these two principles, it is well accepted that they influence the entire dynamics of the indexing process, especially in relation to the analysis, representation, and retrieval of information. As a consequence, it is also well accepted that these two principles should be considered in an “objective” way in order to increase precision and recall.

Considering that exhaustivity and specificity are two of the main concepts of subject indexing, this study aims to analyze how these two concepts are addressed in the publications of the ISKO international meeting proceedings for the period 2000-2017. In particular, we aim to analyze the aspects related to their thematic proximity and methodological approach. Our objectives are: (i) to identify and systematize the elements that characterize the conceptual construction of the principles of exhaustivity and specificity; (ii) to identify the works on the topic in the ISKO international meeting proceedings (2000-2017); (iii) to identify and describe the thematic proximity and methodological approaches to these two concepts in the papers.

Methodology

For the selection and analysis of the corpus we used an ad-hoc combination of techniques and methodological procedures, including content analysis and analysis of keywords in context (Bardin 2011; Bernard and Ryan 2010; Coutinho 2013). Regarding the first objective, we conducted a literature review of the topic. In relation to the second and third objectives, we searched the terms “exhaustivity,” “specificity,” and other related concepts such as relevance, precision, recall, and consistency in the full texts of the ISKO international meeting proceedings (2000-2017) and analyzed the meaningful parts of the papers (mainly the title, abstract and introduction).

For the analysis of the methodological approach, we adopted two nominal variables (see Table 1), based on Bernard and Ryan (2010, 151).
Table 1: Variables used in the analysis category “methodological approach”

| Methodological approach of the paper (description) | Variables of the approach |
|---|---|
| The authors analyze the theoretical-methodological foundations and discuss the relevance of exhaustivity and specificity in indexing (especially for the case of information retrieval) exclusively based on the analysis and information of texts. | Epistemological analysis |
| The authors present results on the relevance of exhaustivity and specificity in indexing (especially for the case of information retrieval) based on experience. | Empirical study |

In order to study the thematic proximity of the papers to the concepts of exhaustivity and specificity, we worked with four ordinal variables that were expressed in a negative scale of intensity (see Table 2), based on Bardin (2011, 84).

Table 2: Variables used in the analysis category “thematic proximity”

| Thematic proximity of the paper to the concepts of exhaustivity and specificity | Degree of proximity | Value |
|---|---|---|
| The concept is a core part of the study. | Central | 0 |
| The concept is addressed because of its intrinsic relationship with the object of study. | Inherent | -0.5 |
| The concept is addressed because of a secondary relation with the object of study. | Peripheral | -1.5 |
| The concept is not addressed, but we infer a thematic connection. | Inferred | -3 |

Results

Out of the 578 papers that were published in the period 2000-2017, we selected and retrieved 29 papers (5% of the total) that were relevant for analysis. Figure 1 shows the frequency of these publications per year, noting that most papers were published in 2000 and 2002. Since then, there was a significant decrease in publications on the topic until 2012 (the year with the lowest frequency), followed by a mild increase in 2014 and 2016.

Figure 1: Frequency of publications per year

Out of the 29 papers, only seven papers (24%) worked with the concepts of exhaustivity and specificity (see Table 3).
The other 22 papers (76%) address related concepts such as relevance, precision, recall, and consistency.

Table 3: Papers working with the concepts of exhaustivity and specificity

| Author/Year | Title |
|---|---|
| Kwasnik, B.; Liu, X. (2000) | Classification structures in the changing environment of active commercial websites: the case of E-bay.com |
| Frâncu, V. (2000) | Harmonizing a universal classification system with an interdisciplinary multilingual thesaurus: advantages and limitations |
| Broughton, V. (2002) | Facet analytical theory as a basis for knowledge organization tool in a subject portal |
| Kwasnik, B. (2002) | Commercial websites and the use of classification schemes: the case of Amazon.com |
| Kwasnik, B.; Chun Y-L. (2004) | Translation of classifications: issues and solutions as exemplified in the Korean Decimal Classification |
| Chen, X. (2008) | The influence of existing consistency measures of the relationship between indexing consistency and exhaustivity |
| Guimarães, J.A.C.; Fernández-Molina, J.C.; Pinho, F.A.; Milani, S.O. (2008) | Ethics in the Knowledge Organization environment: an overview of values and problems in the LIS literature |

Regarding the methodological approaches, the majority of papers (26, 89%) are empirical studies, while only 3 papers (11%) followed an epistemological approach. Most of these empirical studies focused on the use of classification systems or information retrieval systems such as OPACs, and also mentioned the concepts of recall, relevance, and precision.

As for thematic proximity, only one paper (3%), Chen 2008, was central in dealing with the concepts of exhaustivity and specificity. As for the other papers, the degree of proximity was inferred in 22 papers (76%), and peripheral in 6 papers (21%). We did not identify any paper with an inherent degree of proximity.

Conclusion

Based on the results, we conclude that studies on exhaustivity and specificity are scarce and not very central in ISKO meeting proceedings.
Most articles do not deal with indexing as a whole and very few specifically address these two concepts. Given that specificity and exhaustivity are present not only in the process of indexing but also in a wider range of areas related to the representation and retrieval of information, we believe that more research is needed. The clarification and study of these two concepts is important as they are also related to other concepts that contribute to semantic ambiguity, such as consistency, precision, recall, and relevance.

The prevalence of empirical approaches in the methodologies, such as case studies and applications of theoretical models, might be due to the fact that the studies of concepts such as precision, recall, and relevance, generally associated with the concepts of exhaustivity and specificity in information retrieval, mainly follow empirical methodological approaches rather than theoretical-methodological and epistemological approaches. However, the small number of studies on exhaustivity and specificity in the proceedings, and the inferred or peripheral centrality of most of them, might indicate that the relationship between knowledge organization and information retrieval is an under-researched area that deserves more attention from the ISKO community.

Acknowledgment

Isadora Victorino Evangelista thanks Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP) 2017/02327-8 for its support.

References

Abdulahhad, Karam, Chevallet, Jean-Pierre, & Berrut, Catherine (2013). Revisiting Exhaustivity and Specificity Using Propositional Logic and Lattice Theory. In Proceedings of the 2013 Conference on the Theory of Information Retrieval, Copenhagen, Denmark, September 29 - October 02, 2013.
Anderson, James D. (2002). Indexing, teaching of See: Information retrieval design. The Indexer 23(1): 2-7.
Bardin, Laurence (2011). Análise de conteúdo. São Paulo: Almedina.
Bernard, H Russell, & Ryan, Gery Wayne (2010). Analyzing qualitative data: Systematic Approaches. Los Angeles: SAGE.
Borko, Harold, & Bernier, Charles L. (1978). Indexing concepts and methods. New York: Academic Press.
Chaumier, Jacques (1982). Analyse et langages documentaires: le traitement linguistique de l'information documentaire. Paris: Entreprise Moderne d'Édition.
Coutinho, Clara Pereira (2013). Metodologia de investigação em ciências sociais e humanas: Teoria e prática. 2ª ed. Coimbra: Almedina.
Cutter, Charles A. (1904). Rules for a dictionary catalog, 4th ed. Washington, DC: Government Printing Office.
Evangelista, Isadora Victorino, Simões, Maria da Graça Melo, & Guimarães, José Augusto Chaves (2016). A exaustividade e a especificidade como valores éticos no processo de indexação: uma análise baseada na literatura disponibilizada em Portugal. Páginas a&b: arquivos e bibliotecas 3(5): 58-75.
Foskett, A.C. (1977). The subject approach to information. London: Clive Bingley.
Given, Lisa M., & Olson, Hope A. (2003). Knowledge organization in research: a conceptual model for organizing data. Library & Information Science Research 25(2): 157-176.
ISO 5963 (1985). Documentation – Méthodes pour l’analyse des documents, la détermination de leur contenu et la sélection des termes d’indexation. In Documentation et information: recueil de normes ISO I. Genève: ISO, 1988, p. 575-579.
Jones, Karen Spärck (2004). A Statistical interpretation of term specificity and its application in retrieval. Journal of Documentation 60(5): 493-502.
Khosh-Khui, A. (1986). Effects of Subject Specificity. Part I: Specificity of LC Subject Headings and Depth of Subject Analysis in Monographic Records. Technical Services Quarterly 4(2): 59-67.
Kim, Giyeong (2006). Relationship between index term specificity and relevance judgment. Information Processing & Management 42(5): 1218-1229.
Langridge, Derek Wilton (1989). Subject analysis: principles and procedures. London: Bowker-Saur.
Mai, Jens-Erik (2004). Classification in Context: Relativity, Reality, and Representation. Knowledge Organization 31(1): 39-48.
Ogilvie, Paul, & Lalmas, Mounina (2006). Investigating the Exhaustivity Dimension in Content-Oriented XML Element Retrieval Evaluation. In CIKM’06, November 5-11, 2006, Arlington, Virginia, USA.
Olson, Hope A., & Given, Lisa M. (2003). Indexing and the 'organized' researcher. The Indexer 23(3): 129-133.
Pehcevski, Jovan, & Larsen, Birger (2009). Relevance. In Encyclopedia of Database Systems, ed. Ling Liu & M. Tamer Özsu. Springer, p. 2377-2378.
Pettee, Julia (1945). Subject headings: the history and theory of the alphabetical subject approach to books. New York: The H. W. Wilson Company.
Richter, Noë (1984). Grammaire de l'indexation alphabétique. Le Mans: Université du Maine.
Rowley, Jennifer (1982). Abstracting and indexing. London: Clive Bingley.
Witty, Francis J. (1973). The beginnings of indexing and abstracting: some notes towards a history of indexing and abstracting in antiquity and the middle ages. The Indexer 8(4): 193-198.
Wolfram, Dietmar, & Zhang, Jin (2002). An investigation of the influence of indexing exhaustivity and term distributions on a document space. Journal of the American Society for Information Science and Technology 53(11): 943-952.

Keywords

Information Organization, Information access, Societal challenges, Interoperability, Digital age, Information Representation

Abstract

The 15th International ISKO Conference was held in Porto (Portugal) under the topic Challenges and opportunities for KO in the digital age. ISKO has been organizing biennial international conferences since 1990, in order to promote a space for debate among Knowledge Organization (KO) scholars and practitioners all over the world.

The topics under discussion in the 15th International ISKO Conference are intended to cover a wide range of issues that, in a very incisive way, constitute challenges, obstacles and questions in the field of KO, but also highlight ways and open innovative perspectives for this area in a world undergoing constant change, due to the digital revolution that unavoidably moulds our society. Accordingly, the three aggregating themes, chosen to fit the proposals for papers and posters to be submitted, are as follows: 1 – Foundations and methods for KO; 2 – Interoperability towards information access; 3 – Societal challenges in KO. In addition to these themes, the inaugural session includes a keynote speech by Prof. David Bawden of City University London, entitled Supporting truth and promoting understanding: knowledge organization and the curation of the infosphere.
