International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (Eds.)

Knowledge Organization at the Interface

Proceedings of the Sixteenth International ISKO Conference, 2020 Aalborg, Denmark

1st edition 2020. ISBN print: 978-3-95650-775-5; ISBN online: 978-3-95650-776-2

Series: Advances in Knowledge Organization, vol. 17

Bibliographic information
Advances in Knowledge Organization, Vol. 17 (2020)
Organized by International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov and Daniel Martínez-Ávila
Edited by Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila
Ergon Verlag

The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at

© Ergon – ein Verlag in der Nomos Verlagsgesellschaft, Baden-Baden 2020

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in other ways and storage in databanks. Duplication of this publication or parts thereof is only permitted under the provisions of the German Copyright Law; a copyright fee must always be paid. Overall responsibility for manufacturing (printing and production) lies with Nomos Verlagsgesellschaft mbH & Co. KG.
Cover Design: Jan von Hugo
ISBN 978-3-95650-775-5 (Print)
ISBN 978-3-95650-776-2 (ePDF)
ISSN 0938-5495

Table of Contents

Introduction
Giovanna Aracri, Assunta Caruso, Antonietta Folino: An Ontological Model for Semantic Interoperability Within an Earth Observation Knowledge Base
Webert Júnior Araújo, Gercina Ângela de Lima: A Methodological Proposal Towards Domain Ontology Enrichment
Mario Barité, Mirtha Rauch: Cultural Warrant: Old and New Sights from Knowledge Organization
Maria Teresa Biagetti: Bibliographical Relationships in Knowledge Organization Systems: A Historical-Theoretical Perspective
Ceri Binding, Claudio Gnoli, Gabriele Merli, Marcin Trzmielewski, Paul-Valéry, Douglas Tudhope: Integrative Levels Classification as a Networked KOS: A SKOS Representation of ILC2
Pino Buizza: Thesaurus and Heading Lists: Equivalences and Divergences
D. Grant Campbell, Alex Mayhew: Inheritance and Lamination in the Representation of Bibliographic Relationships
Josir Cardoso Gomes, Marco André Feldman Schneider: Ethical Perspective on Classifications of Religions: The Protestant Rise in Brazil
Yi-Yun Cheng, Khanh Linh Hoang, Bertram Ludäscher: Cacao, Cocao, or Cocoa?: Reconciliation of Taxonomic Names in Biodiversity Heritage Library
Stephanie Colombo: Representation and Misrepresentation in Knowledge Organization: The Cases of Bias
Giulia Crippa, Andre Vieira de Freitas Araujo: Order of Knowledge, Selection and Bibliographical Tension in the 16th Century: Between Gesnerian Universality and Possevinian Anti-Heretism
Amelie Dorn, Renato Rocha Souza, Enric Senabre, Thomas Palfinger, Eveline Wandl-Vogt, Barbara Piringer: Crafting a System for Knowledge Discovery and Organisation: A Case-Study on KOS for a Non-Standard German Legacy Dataset
Sharon Farnel, Ali Shiri: Indigenous Community Driven Knowledge Organization at the Interface: The Case of the Inuvialuit Digital Library
Amel Fraisse, Samantha Blickhan, Victoria Van Hyning: Towards an Open, Inclusive and Sustainable Knowledge Organization Models
Jonathan Furner: New Formats, Shifting Fortunes: Late-Twentieth-Century KO in the Wild
Francisco-Javier García-Marco, Fernando Galindo, Pilar Lasala, Joaquín López del Ramo: Advancing the Interoperability of the GLAM+ and Cultural Tourism Sectors through KOS: Perspectives and Challenges
Ann M. Graf: Domain Analysis of Graffiti Art Documentation: A Methodological Approach
David Haynes: Understanding Personal Online Risk to Individuals Via Ontology Development
Antoine Henry, Widad Mustafa El Hadi: The Use of Community to Organize Knowledge: The Case of an Energy Company
Philip Hider: Fiction Genres in Library Catalogues and Social Cataloguing Sites
Maximilian Hindermann, Andreas Ledl: BARTOC FAST: A Federated Asynchronous Search Tool for Remote Vocabulary Access
Chris Holstrom, Joseph T. Tennis: Visibility, Identity, and Personal Expression: Qualitative Case Studies of Social Tagging on MetaFilter
Gregory H. Leazer, Robert Montoya, Jonathan Furner: Numerical Classification and Complexity: Developing a Classification of Classifications
Deborah Lee, Lyn Robinson, David Bawden: Operatic Knowledge Organisation: An Exploration of the Domain and Bibliographic Interface in the Classification of Opera Subgenres
Daniel Libonati Gomes, Thiago Henrique Bragato Barros: The Bias in Ontologies: An Analysis of the FOAF Ontology
Lucinéia Souza Maia, Gercina Ângela de Lima: A System for Specifying Semantic Relations for Knowledge Representation
Carlos Henrique Marcondes, Célia da Consolação Dias: Representing Faceted Classification in SKOS
Daniel Martínez-Ávila, Fidelia Ibekwe, Fernanda Bochi: The Epistemic Communities and Evolution of Knowledge Domains: A Domain Analysis of the Journal Education for Information
Paul Matthews: Knowledge Organisation Systems for Chatbots and Conversational Agents: A Review of Approaches and an Evaluation of Relative Value-Added for the User
Claire McDonald: Call Us by Our Name(s): Shifting Representations of the Transgender Community in Classificatory Practice
Ådne Meling: A Critique of the Use and Abuse of Typologies in Cultural Policy Analysis
Juan Bernardo Montoya-Mogollón, Sonia Troitiño: Digital Forensics Science and Knowledge Organization: An Interdisciplinary Approach to Addressing the Conceptual Challenges of Born-Digital Records
Katherine Morrison: Committed to a Narrative: Expressions of Knowledge Organization at The Henry Ford Museum of American Innovation
Catalina Naumis-Peña, Hugo Alberto Guadarrama-Sánchez, Luis Enrique Sánchez-Rodríguez, Rosa de Guadalupe Hernández-Villeda: Terminological Relations of a Thesaurus for University Cultural Infrastructure Terms
Inger Beate Nylund: Using the Concept of Warrant in Designing Metadata for Enterprise Search
Lucia Maria Velloso de Oliveira, Bianca Therezinha C. Panisset, José Antonio da Silva: Types of Documents: Representations of Who We Are and How the Government Works
Ziyoung Park, Hosin Lee, Seungchon Kim, Sungjae Park, Dasom Jung, Seunghee Son, Yoonwhan Kim, Hyewon Lee: Organizing Performing Arts Records of Korean Traditional Music as Linked Open Data
Ziyoung Park, So Young Yoon, Seunghee Son, Yoonwhan Kim: Constructing Semantic Periodical Index Database Focusing on the Visegrad Group’s Transition Process (writing problems)
Brigita Perchutkaite, Marianne Lykke: Facilitating University–Industry Interaction by Visually Showcasing Researcher Profiles Via Metadata
Günter Reiner, Philipp Adämmer: Similarities Between Human Structured Subject Indexing and Probabilistic Topic Models
Athena Salaba: Knowledge Organization Requirements in LIS Graduate Programs
Gustavo Saldanha, Giulia Crippa: Concept Theory and Conceit Theory Ontology and Logology Between Conceptuality and Non-Conceptuality in Knowledge Organization
Ali Shiri, Elizabeth Joan Kelly, Ayla Stein Kenfield, Kinza Masood, Caroline Muglia, Santi Thompson, Liz Woolcott: A Faceted Conceptualization of Digital Object Reuse in Digital Repositories
Carlos Guardado da Silva, Luís Corujo, Jorge Revez: The Classification Plan for Local Administration: Portuguese Archives and the Knowledge Organization in Practice
Richard P. Smiraglia, Rick Szostak: Identifying and Classifying the Phenomena of Music
Linda C. Smith: Interdisciplinary Searching as a Use Case for Vocabulary Mapping
Rick Szostak, Richard P. Smiraglia, Andrea Scharnhorst, Ronald Siebes, Aida Slavic, Daniel Martínez-Ávila, Tobias Renwick: Classifications as Linked Open Data: Challenges and Opportunities
Natália Tognoli, Suellen Oliveira Milani, José Augusto Chaves Guimarães, João Batista Ernesto de Moraes: The Subject Dimension of Authorship: A New Perspective of Provenance in KO
Uma Balakrishnan, Dagobert Soergel, Olivia Helfer: Representing Concepts through Description Logic Expressions for Knowledge Organization System (KOS) Mapping
Mario Barité, Mirtha Rauch: Classification System for Knowledge Organization Literature (CSKOL): Its Update, a Pending Task?
Thiago Henrique Bragato Barros: Touching from a Distance: Concept Theory and Archival Hierarchical Classification
Amelie Dorn, Yalemisew Abgaz, Gerda Koch, José Luis Preza Díaz: Harvesting Knowledge from Cultural Images with Assorted Technologies: The Example of the ChIA Project
Francisco-Javier García-Marco: Knowledge Organization in Historical Information Systems Revisited: Changes in Society, Technology and Expectations 25 Years Later
Negin Shokrzadeh Hashtroudi, Mohsen Haji Zeinolabedini: Representing Entities and Characteristics of Iranian Performing Arts Based on IFLA Library Reference Model (IFLA-LRM)
Christopher S.G. Khoo, Rebecca Y.P. Kan: An Ontology for Conceptual Analysis of Signature Pedagogies
Michael Kleineberg: Classifying Perspectives: Expressing Levels of Knowing in the Integrative Levels Classification
Wan-Chen Lee: Linking, Mapping, Matching, and Change: Contemporary Use of Ranganathan’s Three Planes of Work in Classification Activity
Xiaoyue Ma, Pengzhen Xue, Nada Matta: Reconstruction of Crisis Knowledge Ontology by Integrating Temporal-Spatial Analysis
Luís Machado, Graça Simões, Claudio Gnoli, Renato Souza: Can an Ontologically-Oriented KO Do Without Concepts?
Victor Odumuyiwa, Yetunde Zaid, Olatunde Barber: Enhancing Knowledge Organization Through Implicit Collaboration in Crowdsourcing Process
Lucia Maria Velloso de Oliveira, Bianca Therezinha C. Panisset, José Antonio da Silva: Mediation in Archives: Organization, Classification and Transparency
Olívia Pestana, Rui Sousa-Silva: Knowledge Organization in the New Era Using DIY Corpora as Writing Assistants
Marcos Gonçalves Ramos, Priscila Ramos Carvalho, Rosali Fernandez de Souza: Amazônia and Amazon: Domain Analysis with Iramuteq in Scopus and LISA Databases
Tobias Renwick, Rick Szostak: A Thesaural Interface for the Basic Concepts Classification
Ana Lúcia Terra, Maria Del Carmen Agustín-Lacruz, Mariângela Spotti Lopes Fujita: The Role of Knowledge Organization in Scientific Communication: An Overview on JCR's Psychology Journals Guidelines about Title, Abstract and Keywords
Julietti de Andrade, Marilda Lopes Ginez de Lara: The social role of knowledge organization in Evidence Based Health
Radia Bernaoui, Dagobert Soergel: Social Network Communication and Effects on Innovation: The Case of the Agrifood Sector in Algeria
Djadeu Nguemedyam Colette: Organization and Sharing of Knowledge on Selective Household Waste Collection for Hygiene and Sanitation in the City of Yaoundé, Cameroon
Rodrigo Aldeia Duarte, Rosali Fernandez de Souza, Gustavo Saldanha: Devising a Concept of User for Archival Science: An Analysis of the Brazilian Scientific Literature
Isadora Victorino Evangelista, Thiago Henrique Bragato Barros: Ethical Aspects in Knowledge Organization: A Discourse Analysis at ISKO International Events
Negin Shokrzadeh Hashtroudi, Mohsen Haji Zeinolabedini: Educational Practices of Knowledge Organization in Iran: A Historical Review
María Leticia Pereyra Lanterna, María José López-Huertas Pérez, Francisco José Morales Calatayud: Approach to Domain Community Health and its Implications for Information Management
Bruno Henrique Machado, Rafael Semidão, Telma Campanha De Carvalho Madio, Daniel Martínez-Ávila: Provenance as an Ethical Measure for the Archival Knowledge Organization of Photographs
Alex Mayhew: Phylomemetic Cataloguing: Expanding Bibliographic Relationships Beyond FRBR
Marcos Luiz Cavalcanti de Miranda, Maria Luiza de Almeida Campos: Knowledge Organization Cultural Studies and Their Influence on Knowledge Organization Systems from the Douglas John Foskett’s Perspectives
Katerina Lynn Stanton, Rachel Ivy Clarke: The Design Domain is Divided: Issues in Interdisciplinary Library Classification
Marc Tanti: Analysis on Twitter of the Actors and Rumors Around the Ebola Epidemic 2018-2019 in the Democratic Republic of Congo.
Natália Tognoli, Lucas Correa: Knowledge Organization Systems as Accountability Tools in Archival Science
Fernanda Valle, Gustavo Saldanha: Autism Disorder in KO: Classification, Representation and Social Impact
Subject Index
Author Index
International Scientific Committee

PROCEEDINGS
Knowledge Organization at the Interface
16th International Conference of the International Society for Knowledge Organization

Introduction

The 16th International ISKO Conference, under the theme "Knowledge Organization at the Interface", was planned to take place from 6 to 8 July 2020 in Aalborg, Denmark, at Aalborg University, Department of Communication and Psychology. The conference theme addressed knowledge organization systems and their role in knowledge organization, knowledge sharing, and information searching. The conference invited contributions presenting practical solutions as well as the theory behind the design, development and implementation of knowledge organizing systems, ranging from controlled vocabularies, classification systems and metadata schemas through to ontologies and taxonomies.
The conference topics included:
• Knowledge organization across domains, media and technologies
• Knowledge organization as understanding and communication
• Knowledge organization as driver for development and change

Proposals for full papers, short papers, posters, round table discussions, and workshops were welcomed. The call invited KO academics, practitioners, developers and students to submit abstracts presenting innovative ideas and solutions. Criteria for acceptance included originality, clarity of expression, and relevance to the conference theme. Proposals were required to have a sound basis in KO theory, to present previously unpublished research, and not to be under review for another conference or journal. Anonymized full paper, short paper and poster submissions were double-blind reviewed. Submissions for round table discussions and workshops were single-blind reviewed and were not to be anonymized. The review process had two steps: 1) abstracts were reviewed and accepted for further development, and 2) once a full, developed version had been submitted, it was accepted as either a full paper, a short paper, or a poster. All accepted papers and posters are published in the conference proceedings.

Accepted papers

During the two-step review process, a total of 48 full papers, 17 short papers, and 14 posters were accepted for publication and presentation at the conference. The papers covered a wide range of topics within the conference themes, e.g., knowledge transfer, concepts and conceptualization, fiction genres, ethical aspects, classificatory structures, representation, probabilistic models, social tagging, domain analysis, music classification, legacy data, document types, semantic networks, bibliographic relationships, faceted classification, KOS mapping, warrants, KO education, museum organization, and archival organization.
The papers discussed theoretical issues related to knowledge organization and to the design, development and implementation of knowledge organizing systems, as well as practical considerations and solutions in the application of knowledge organization theory. They covered knowledge organization systems ranging from classification systems, thesauri and metadata schemas through to ontologies and taxonomies. Scholars from 28 countries contributed to the conference proceedings.

As the conference was cancelled due to COVID-19, no conference program was developed. The papers are therefore grouped into full papers, short papers, and posters, and ordered alphabetically by first author within each group.

We would like to thank those who helped make this publication possible. In particular, we are very grateful to the scholars who submitted abstracts and contributed excellent papers, and to the reviewers who reviewed the submissions and suggested improvements.

Aalborg, July 14, 2020
Marianne Lykke, Tanja Svarre, Mette Skov, and Daniel Martínez-Ávila

Giovanna Aracri – IIT-CNR, Italy
Assunta Caruso – University of Calabria, Italy
Antonietta Folino – University of Calabria, Italy

An Ontological Model for Semantic Interoperability Within an Earth Observation Knowledge Base[1]

Abstract: This paper presents an ontological model which will be included in an Earth Observation (EO) Knowledge Base (KB) covering four thematic strands. As a multitude of heterogeneous data will be made available through the KB, it is essential to ensure high standards of discoverability, accessibility, and interoperability. The overall aim, therefore, is to align and integrate a set of existing semantic resources and ad hoc vocabularies into a single ontological conceptual model, which defines the specific domains and which will facilitate information and knowledge generation from EO data.
This will guarantee semantic interoperability within the KB platform and ensure harmonised access to, and retrieval of, the vast volume of data produced, turning it into usable information and knowledge.

1.0 Aim and scope of the study

Integrating complex, dynamic data from heterogeneous resources that lack broadly applied standards is a real challenge for users trying to make sense of the increasing amount of information made publicly available in any domain (Bodenreider et al. 2002). The volume of Earth Observation (EO) data, in particular, has increased considerably over the last decades, and end-users face many challenges in accessing and analysing it. To organise and homogenise the huge volume of information that EO research now produces, scientists and other users need support from lexical and semantic tools, such as terminologies, vocabularies, nomenclatures, code and synonym sets, lexicons, thesauri, ontologies, taxonomies and classifications (De la Iglesia et al. 2013). Information Science typically defines information in terms of data, knowledge in terms of information, and wisdom in terms of knowledge (Rowley 2007). Generating information and knowledge from data is therefore a matter of understanding and connecting data. Initiatives such as Copernicus, the Group on Earth Observations (GEO), INSPIRE and others[2], focusing on air quality monitoring, atmospheric conditions and pollution emissions, have generated a large volume of data. These data have been collected simultaneously with different devices and in dissimilar modalities, so access to them remains difficult. Building on these initiatives, the ERA-PLANET Programme[3], which brings together the most active experts in the EO field, aims to develop, among other objectives, a Knowledge Platform to ensure harmonised access to the vast amount of data produced.
[1] Although the authors have cooperated in the research work and in writing the paper, they have individually devoted specific attention to the following sections: Aracri: 3.3, 3.4 and 4.0; Caruso: 1.0 and 3.2; Folino: 2.0 and 3.1.
[3] ERA-PLANET - The European network for observing our changing planet - Call: H2020-SC5-2015-onestage; Topic: SC5-15-2015; Type of action: ERA-NET-Cofund, Grant Agreement no. 689443.

The purpose is to gather distributed data coming from in-situ sensors and satellite-based remote sensing technologies, by means of practices and infrastructures able to organise, interpret and summarise them. To meet this challenge, the Programme is made up of four strands[4], each corresponding to a project[5]. All the projects aim to provide policy makers with more reliable information on the status of the Earth, so that common strategies can be discussed and identified to limit climate change, which endangers both environmental and human health, by promoting sustainable development. Such data and information need to be made available through an interoperable data-sharing and data-management system that ensures data quality and interprets the meaning of data, turning them into usable information and knowledge. Existing EO frameworks support a variety of geographic dataset types as well as tools for data management, analysis and visualisation, but they often provide no mechanism to tackle semantic heterogeneity (Fugazza et al. 2010). Even when EO platforms do include a number of vocabularies, retrieval of all the information on a given topic cannot be guaranteed unless those vocabularies are aligned. Relating terms from distinct vocabularies creates richer structural information that can be used to improve search and query expansion (Craglia 2011). To fulfil this requirement, an ontology is under development, and a set of semantic resources is being integrated and aligned through a corpus-based approach.
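To make the query-expansion argument concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that mappings between vocabularies are available as simple (label, label, relation) tuples; the labels, the relation strings (borrowed from SKOS mapping property names) and the functions `build_index` and `expand_query` are hypothetical.

```python
from collections import defaultdict

# Hypothetical cross-vocabulary mappings: (label, label, relation).
# The labels are illustrative only and are NOT taken from GEMET, EARTh or AGROVOC.
MAPPINGS = [
    ("air pollution", "atmospheric pollution", "exactMatch"),
    ("air pollution", "air quality", "closeMatch"),
    ("particulate matter", "PM10", "narrowMatch"),
]

# exactMatch and closeMatch are symmetric in SKOS, so they can be indexed
# in both directions; hierarchical relations such as narrowMatch are skipped.
SYMMETRIC = {"exactMatch", "closeMatch"}


def build_index(mappings):
    """Map each lower-cased label to the labels it is symmetrically aligned with."""
    index = defaultdict(set)
    for left, right, relation in mappings:
        if relation in SYMMETRIC:
            index[left.lower()].add(right)
            index[right.lower()].add(left)
    return index


def expand_query(term, index):
    """Return the query term followed by every label one mapping step away."""
    return [term] + sorted(index.get(term.lower(), set()))


if __name__ == "__main__":
    idx = build_index(MAPPINGS)
    print(expand_query("air pollution", idx))
```

A query for "air pollution" would thus also match items indexed with "air quality" or "atmospheric pollution". Hierarchical mappings are deliberately excluded in this sketch, since expanding along them changes the scope of the query rather than adding equivalent labels.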
The overarching goal is to organise, integrate and give access to complex, dynamic data from heterogeneous resources lacking broadly applied standards, in order to improve end-users' ability to explore and exploit EO data. These ambitious goals, together with the need to integrate our semantic resource into a comprehensive platform, led us to choose an ontology, since it provides a conceptual framework that is more structured, adaptable and reusable.

2.0 Literature review

This section briefly reviews methods and projects aimed at aligning and integrating sets of existing semantic resources and ad hoc vocabularies, regardless of the domain in which they have been applied. Regarding alignment, the establishment of a variable degree of correspondence between concepts belonging to different controlled vocabularies is referred to as ontology mapping[6]. The growing number of ontology matching methods and tools, and the need to reach a consensus on their evaluation, led to the creation of the Ontology Alignment Evaluation Initiative (OAEI)[7].

[4] 1. Smart Cities and Resilient Societies; 2. Resource Efficiency and Environmental Management; 3. Global change and Environmental treaties; 4. Polar Areas and Natural.
[5] 1. SMURBS - SMart URBan Solutions for air quality, disasters and city growth; 2. GEOEssential - Essential Variables workflows for resource efficiency and environmental management; 3. iGOSP - Integrated Global Observing Systems for Persistent Pollutants; 4. iCUPE - Integrative and Comprehensive Understanding on Polar Environments. Our research laboratory is a partner in the first three projects.

In general, several existing controlled vocabularies were first converted to SKOS format[8] in order to be aligned and subsequently published as linked open data on the Semantic Web.
Some of them have been mapped without the systematic application of automatic approaches or the direct involvement of domain experts. The indexing languages RAMEAU, Library of Congress Subject Headings (LCSH) and Subject Headings Authority File (German: Schlagwortnormdatei SWD) (Landry 2009), as well as the Dewey Decimal Classification (DDC), the Library of Congress Classification (LCC) and the Medical Subject Headings (MeSH) (Vizine-Goetz et al. 2004), the Thesaurus for Economics (STW) and the Thesaurus for the Social Sciences (TSS) (Mayr and Petras 2008) and the Thésaurus du Tourisme et des Loisirs (Caruso and Folino 2015) have mainly been aligned without exploiting automatic approaches. Other studies propose the semi-automatic detection of exact matches between concepts, and several matching systems rely on string similarity measures rather than on semantic criteria. However, in almost all studies, the evaluation phase is performed manually. As stated in Morshed et al. (2011), some tools, such as S-Match, use external resources such as WordNet as background knowledge for recognising semantic relationships, but this approach seems less suitable for domain-specific terminologies. The approach described in that study concerns the alignment between AGROVOC and six other KOSs more or less related to agriculture. For each pair of concepts, several string similarity measures are computed, and their average is used in the subsequent manual validation. In this validation phase, experts considered the status of the term (preferred or non-preferred), the hierarchy, the equivalent labels in other languages and the notes associated with concepts. Hierarchy has also been used as a disambiguation technique for one-to-many alignments by Tordai et al. (2009): for each concept to be mapped, both broader and narrower alignments are taken into consideration.
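The averaging step reported for the AGROVOC alignment (Morshed et al. 2011) can be illustrated with a small Python sketch. The specific measures used in that study are not listed here, so the three below (the standard-library `difflib.SequenceMatcher.ratio`, a normalised Levenshtein similarity and a character-trigram Jaccard overlap), the 0.8 threshold and the function names are assumptions; the output is only a shortlist for the manual validation the text describes.

```python
from difflib import SequenceMatcher


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(previous[j] + 1,               # delete ca
                               current[j - 1] + 1,            # insert cb
                               previous[j - 1] + (ca != cb)))  # substitute
        previous = current
    return previous[-1]


def trigram_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of padded character trigrams."""
    def grams(s):
        padded = f"  {s} "
        return {padded[i:i + 3] for i in range(len(padded) - 2)}
    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb)


def average_similarity(a: str, b: str) -> float:
    """Mean of three string similarity measures, each in [0, 1]."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b)) or 1
    scores = [
        SequenceMatcher(None, a, b).ratio(),
        1.0 - levenshtein(a, b) / longest,
        trigram_jaccard(a, b),
    ]
    return sum(scores) / len(scores)


def candidates_for_review(source_labels, target_labels, threshold=0.8):
    """Label pairs whose averaged score reaches the (hypothetical) threshold,
    best first, to be handed to domain experts for manual validation."""
    pairs = [(s, t, round(average_similarity(s, t), 3))
             for s in source_labels for t in target_labels]
    return sorted((p for p in pairs if p[2] >= threshold), key=lambda p: -p[2])
```

Averaging several measures dampens the quirks of any single one; for example, "air pollution" and "air pollutants" score high on sequence overlap but lower on trigrams, so the pair only surfaces if the threshold is relaxed. As the literature above stresses, string scores alone cannot tell homographs apart, which is why the shortlist still goes to experts.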
Tordai et al. (2009) combined several techniques to establish mappings between concepts from cultural heritage thesauri: syntactic exact matching, linguistic analysis and a technique exploiting the ontology structure. Manual evaluation is needed because automatic approaches generate incorrect matches for several reasons. Some of these are listed by Kempf et al. (2014): terms share the same lexical value but have different broader and narrower terms or scope notes; terms in different domains seem similar but differ in meaning; and matching a synonym with a preferred term generates an incorrect equivalence. To sum up, we refer to one of the most complete classifications of matching techniques (Euzenat and Shvaiko 2013), shown in Figure 1. Furthermore, if a domain of interest can be represented by a corpus of text documents, then the knowledge to be encoded in an ontology can be represented by a domain corpus, and an evaluation of the ontology should output measures expressing its coverage and adequacy with respect to that domain (Rospocher et al. 2012). The integration of ad hoc vocabularies should therefore begin with the acquisition of domain-specific terminology (e.g. Liddle et al. 2003; Navigli and Velardi 2004; Gillam et al. 2005; Wong et al. 2007). Methodologies similar to ours are proposed by Brewster et al. (2004), who evaluate an ontology by comparing it with a domain-specific corpus, and by Cui (2010), who compares the coverage, semantic consistency and agreement of four thematic ontologies against a corpus of domain literature.

Figure 1. Classification of matching techniques

3.0 Method

3.1 Corpus-based terminology extraction

New domains with specific conventions and new terminology are continuously appearing. The development of terminology lists is a preliminary step in KOS building, one that lends itself to automatic or semi-automatic techniques that can ease the work (Peñas et al. 2001). Terminology extraction identifies the terms frequently used to refer to concepts in a specific domain, which are therefore most likely representative of it. Here, the main aim of term extraction was to carry out a corpus-based terminological evaluation of the existing semantic resources in the EO domain, in order to assess whether they adequately cover the terminology used in the domain corpus. Terminology extraction was carried out with the T2K² (text-to-knowledge) tool (Dell’Orletta et al. 2014), which is specifically designed to identify and extract simple and compound terms from unstructured texts. Like most terminology extraction software, T2K² assumes that the relevant concepts of a text are conveyed by its most frequent terms. The tool performs a linguistic analysis of the texts and outputs a terminological vocabulary enriched with semantic and conceptual information about the terms. The list of candidate terms was sorted by frequency and then revised manually. Understanding whether a given KOS adequately covers the domain of interest is a common and important issue when evaluating a semantic resource. After matching the existing resources against the corpus and evaluating their semantic and terminological coverage (detailed in the following section), the terms appropriate to the sub-domains and objectives involved were used to build and enrich the ontology.
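T2K² performs a full linguistic analysis, which is beyond a short example. The Python sketch below is a deliberately simplified stand-in that only illustrates the frequency assumption described in Section 3.1: it counts one- to three-word n-grams within each sentence, discards those that start or end with a function word, and ranks the rest by frequency. The stopword list, `candidate_terms` and its parameters are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption); real extractors such as
# T2K2 rely on a linguistic analysis of the text rather than a list like this.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "from", "in", "is",
             "of", "on", "the", "to", "with"}


def candidate_terms(text, max_len=3, min_freq=2):
    """Rank single- and multi-word candidate terms by raw frequency."""
    counts = Counter()
    # Split into sentences first so n-grams never cross a sentence boundary.
    for sentence in re.split(r"[.!?;]", text.lower()):
        tokens = re.findall(r"[a-z][a-z0-9-]*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Discard n-grams that start or end with a function word.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    ranked = [(term, freq) for term, freq in counts.items() if freq >= min_freq]
    # Highest frequency first; ties broken alphabetically for stable output.
    return sorted(ranked, key=lambda tf: (-tf[1], tf[0]))
```

As in the workflow described above, a ranked list like this is only a starting point: it still has to be revised manually before being compared with existing vocabularies.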
Both the terms shared by the corpus and the existing vocabularies and the terminology unique to the domain corpus (i.e. not present in any of the examined vocabularies) were used to enrich the ontology to be incorporated in the EO Knowledge Base platform.

3.2 Evaluation of EO vocabularies' semantic coverage

Alongside the construction of the corpus, relevant existing terminological resources related to the environment were collected as references. To date, thesauri such as the General Multilingual Environmental Thesaurus (GEMET)[9], the EARTh Thesaurus[10] and the AGROVOC Thesaurus[11], along with the INSPIRE Feature Concept Dictionary and Glossary[12], have been taken into consideration. These terminologies were downloaded in an easily processable format, as flat lists, so that all their terms could be compared with those extracted in the previous phase.

Figure 2. Vocabulary term lists

Over 5,550 terms were extracted from the GEMET Thesaurus and around 14,000 from the EARTh thesaurus. Meta-terms and macro areas were not taken into consideration. The AGROVOC thesaurus includes approximately 45,500 terms. Almost 200 terms were extracted from the INSPIRE glossary and 360 from the Feature Concept Dictionary. Because these vocabularies partially overlap semantically, some of them are already mapped to each other to allow federated access to information. An initial comparison was carried out between the four term lists and the terms extracted from our sub-corpora, in order to identify exact matches and to determine whether the concepts relevant to our purposes are present in the existing resources. The comparison was performed with WordSmith Tools[13], a software package that can compare the wordlists generated from the existing resources and from the corpus.
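WordSmith Tools is an interactive application, so the Python sketch below does not reproduce it; it only illustrates the underlying exact-match comparison between a corpus term list and a vocabulary term list. The normalisation rules, the coverage ratio (the share of corpus terms found in the vocabulary) and the function names are assumptions made for the example.

```python
import unicodedata


def normalise(term: str) -> str:
    """Case-fold, strip diacritics, turn underscores into spaces and collapse
    whitespace, so that trivial variants still count as exact matches."""
    decomposed = unicodedata.normalize("NFKD", term)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(plain.casefold().replace("_", " ").split())


def compare_term_lists(corpus_terms, vocabulary_terms):
    """Return (exact matches, corpus-only terms, coverage ratio)."""
    corpus = {normalise(t) for t in corpus_terms}
    vocabulary = {normalise(t) for t in vocabulary_terms}
    matches = corpus & vocabulary
    coverage = len(matches) / len(corpus) if corpus else 0.0
    return sorted(matches), sorted(corpus - vocabulary), coverage


if __name__ == "__main__":
    print(compare_term_lists(["Air quality", "land cover", "Ozone", "in-situ sensor"],
                             ["air quality", "ozone", "Land  Cover", "pollutant"]))
```

The corpus-only terms returned in the second position correspond to the terminology "unique to the domain corpus" mentioned above, i.e. the candidates for enriching the ontology. Exact string matches are only candidates: the manual analysis described next is what turns some of them into close, broader or narrower matches.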
For instance, 450 candidate exact matches were found when comparing GEMET with the sub-corpus terminology, 26 candidate exact matches with INSPIRE, 886 exact matches between AGROVOC and the specialized term list, and 763 candidate exact matches were automatically identified between the terminology in EARTh and the terms extracted from one of the sub-corpora. A further manual analysis, however, also identified close, broader and narrower matches.

Figure 3. Term list comparison

3.3 Definition of an ontological conceptual model

In this section we present the methodological choices and the steps involved in developing the ontological model covering the thematic strands, which are closely related but not necessarily overlapping. In short, an ontology is a complex formal vocabulary[14] that requires background knowledge of the domain in order to make an efficient conceptual framework explicit. Within the ontology, the main issues are how concepts are logically connected and what kind of information or knowledge we want to derive from concepts and relationships, even when there are no a priori links between them. EO data and information are extremely heterogeneous and dynamic in terms of formats, meanings, languages, etc., and are captured using different technologies, applications and systems[15]. For the ERA-PLANET Programme, the major challenge is to provide decision support tools that guarantee access to relevant information whenever needed and in a comprehensible format, so that data and information can be reused and managed for various purposes, depending on the challenge. The ontology to be included in the ERA-PLANET Knowledge Platform has been developed with Protégé, an open-source tool for designing and integrating ontologies.
The aim is to support decision making in accordance with the Sustainable Development Goals (SDGs) and to monitor activities when evaluating measures that support environmental policies. To provide a complete and detailed representation of the concepts addressed in the projects, a top-down approach was adopted. Given the inclusive nature of the ontology, it implements a high-level conceptualization. Hence, the top level contains the main classes. These are key concepts across the domains and serve as access points for exploring and browsing the entire conceptual structure without evident boundaries between the sub-domains. Classes are in turn organized into further sub-classes at different hierarchical levels16. Each class is therefore at the same time a member of its superclass and the root of further subclasses. The result is a taxonomy in which concepts are organized hierarchically through generic (genus-species) relationships in a universally acceptable manner.

All concepts in the ontology relate to the sub-domains, but they differ in depth. Some are strictly domain-based and therefore highly specialized (e.g. Ecosystem - Terrestrial_ecosystem - Anthropogenic_terrestrial_ecosystem - Cropland). Others, whose meaning is less domain-specific, have few hierarchical levels (e.g. Dataset - Copernicus scenes). At the moment, therefore, the same granularity cannot be guaranteed across all classes. Classes are not populated with individuals. Instead, they are related to each other by direct and inverse properties (ObjectProperties), defined according to the type of relationship to be made explicit. Combining classes and properties yields statements in subject-predicate-object form (e.g. “Indicator 15.3.1 measures Target 15.1” and, conversely, “Target 15.1 isMeasuredBy Indicator 15.3.1”).
15 Big Data in Earth Observation. Earth%20Observation%20v1.pdf.
16 For a description of ontology structure and construction see Noy and McGuinness (2001) and Capuano (2005).
Figure 4. Ontological model
3.4 Ontology and vocabulary alignment
The development of the ontology benefits from the terminology extraction and comparison phases described above. Some concepts were added to increase semantic granularity. To better contextualize each concept, additional information was included, such as definitions from other vocabularies or links to multiple sources. A semi-automatic approach was adopted to define the mappings, which are unidirectional, from our ontology to the controlled vocabularies mentioned above. Our main reference is GEMET, because of its established use in the scientific community and its existing alignments with other vocabularies. Automatic procedures are used to discover and establish exact and sometimes close matches across concepts from different vocabularies. Human mediation was needed to validate their output and, where no valid equivalence match could be provided, to identify hierarchical and associative mappings. This ensures that every concept in the ontology has an external reference in at least one of the vocabularies involved in the mapping.

With regard to equivalence, a hybrid method makes it possible to control and manage inexact matches (e.g. homographs, synonyms). In cross-vocabulary mapping, as within a single vocabulary, equivalence can be exact, inexact or partial (ISO 25964-2:2013). The different degrees of equivalence were expressed with the SKOS data model, which was imported into Protégé. Its properties skos:exactMatch and skos:closeMatch were used as AnnotationProperties.
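The mappings themselves were recorded in Protégé. The following Python sketch shows what such unidirectional, ontology-to-vocabulary statements look like outside that tool. It serializes them as N-Triples lines using the five SKOS mapping properties. It also checks that every concept has at least one external reference. All example.org URIs are placeholders, not real ontology, GEMET or EARTh identifiers.

```python
# SKOS namespace and mapping properties as defined in the W3C SKOS Reference.
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPERTIES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def mapping_triple(source: str, prop: str, target: str) -> str:
    """Serialize one ontology-to-vocabulary mapping as an N-Triples line."""
    if prop not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {prop}")
    # Direction is always ontology concept -> external vocabulary concept.
    return f"<{source}> <{SKOS}{prop}> <{target}> ."


def unmapped(concepts, mappings):
    """Return concepts with no external reference (the goal is an empty list)."""
    mapped = {source for source, _, _ in mappings}
    return sorted(set(concepts) - mapped)


# Placeholder URIs: example.org stands in for the ontology and for
# GEMET/EARTh concept identifiers, which are not reproduced here.
EO = "http://example.org/eo#"
concepts = [EO + "Cropland", EO + "Land_degradation", EO + "Copernicus_scene"]
mappings = [
    (EO + "Cropland", "exactMatch", "http://example.org/gemet/cropland"),
    (EO + "Land_degradation", "closeMatch", "http://example.org/earth/soil_degradation"),
]

for mapping in mappings:
    print(mapping_triple(*mapping))
print(unmapped(concepts, mappings))  # ['http://example.org/eo#Copernicus_scene']
```

Rejecting any property outside the five SKOS mapping properties keeps the output within the mapping vocabulary the text describes. A non-empty result from unmapped() flags concepts that still need manual attention.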
For the sake of completeness, some skos:broadMatch, skos:narrowMatch and skos:relatedMatch mappings have also been included in our ontology, as shown in Figure 5.
Figure 5. Examples of mapping
To overcome differences in vocabulary structures, ISO 25964-2:2013 provides recommendations on the suitable model, or combination of models, to use. SKOS and the ISO 25964 data model are aligned, so it is possible to create a mapping compliant with both. For our mapping process, we opted for a combination of two models. We used the hub model because our ontology is the core and the external vocabularies act as satellites. We used selective mapping because links are established in one direction only and solely for concepts used in the ontology. The vocabularies involved in the mapping only partially overlap with the ERA-PLANET thematic strands, as illustrated in Figure 6.
Figure 6. Mapping percentages
Aligned vocabulary | % Total equivalences | % Exact equivalences | % Close equivalences
GEMET | 44% | 34% | 10%
EARTh | 46% | 38% | 8%
ENVO | 33% | 27% | 6%
These preliminary results concern only exact and close matches. They show the need to extend the mapping to other vocabularies.
4.0 Conclusion
Ontology mapping is a challenge in strategic domains where large volumes of data need to be managed. Ongoing initiatives and actions highlight the importance of the interoperability, discoverability, accessibility and reusability of data. These issues are central to our project because all data must be made available to policy decision-makers. These decision-makers have the hard task of designing the strategies that support sustainable development. In this sense, the ontology and its matching with external vocabularies provide valid semantic support, because they make it possible to expand and specialize an information request.
The ontology evolves continuously, so periodic updates are needed to revise mappings and add links to further vocabularies. To improve semantic interoperability in the KP, we expect to define links to further existing vocabularies (e.g. AGROVOC) and ontologies (e.g. the Sustainable Development Goals Interface Ontology (SDGIO) and Chemical Entities of Biological Interest (ChEBI)). These cover other areas of interest closely related to the environmental domain, such as health, agriculture and chemistry.
Acknowledgments
This activity was funded by the European Commission in the framework of the program “The European network for observing our changing planet (ERA-PLANET)”, Grant Agreement 689443. The authors would like to thank colleagues from IIA-CNR Florence for their collaboration in defining the ontological model.
References
Bodenreider, Olivier, Joyce A. Mitchell, and Alexa T. McCray. 2002. “Evaluation of the UMLS as a Terminology and Knowledge Resource for Biomedical Informatics.” In Proceedings of the AMIA Symposium 2002, 61-5.
Brewster, Christopher, Harith Alani, Srinandan Dasmahapatra, and Yorick Wilks. 2004. “Data Driven Ontology Evaluation.” In Proceedings of the International Conference on Language Resources and Evaluation (LREC), May 2004, Lisbon, Portugal. European Language Resources Association (ELRA).
Capuano, Nicola. 2005. “Ontologie OWL: Teoria e Pratica. Prima Puntata.” Computer Programming 148: 59-64.
Caruso, Assunta and Antonietta Folino. 2015. “Mapping Tourism Thesauri for Semantic Interoperability.” In Systèmes D’organisation des Connaissances et Humanités Numériques: Actes du 10ème Colloque ISKO-France 2015, 5-6 Novembre 2015, Strasbourg, edited by Chevry Pébayle. ISTE éditions, 98-113.
Craglia, Massimo, Stefano Nativi, Mattia Santoro, Lorenzino Vaccari, and Cristiano Fugazza. 2011. “Inter-Disciplinary Interoperability for Global Sustainability Research.” In GeoSpatial Semantics: International Conference on GeoSpatial Semantics (GeoS 2011), edited by C. Claramunt, S. Levashkin, and M. Bertolotto. Lecture Notes in Computer Science 6631. Berlin, Heidelberg: Springer, 1-15.
Cui, Hong. 2010. “Competency Evaluation of Plant Character Ontologies Against Domain Literature.” Journal of the American Society for Information Science and Technology 61, no. 6: 1144-65.
De la Iglesia, Diana, Raul E. Cachau, Miguel García-Remesal, and Victor Maojo. 2013. “Nanoinformatics Knowledge Infrastructures: Bringing Efficient Information Management to Nanomedical Research.” Computational Science & Discovery 6, no. 1: 014011.
Dell’Orletta, Felice, Giulia Venturi, Andrea Cimino, and Simonetta Montemagni. 2014. “T2K²: A System for Automatically Extracting and Organizing Knowledge from Texts.” In Proceedings of the 9th Edition of the International Conference on Language Resources and Evaluation (LREC 2014), 26-31 May 2014, Reykjavik, Iceland.
Euzenat, Jérôme and Pavel Shvaiko. 2013. Ontology Matching. 2nd ed. Heidelberg: Springer-Verlag.
Fugazza, Cristiano, Sören Dupke, and Lorenzino Vaccari. 2010. “Matching SKOS Thesauri for Spatial Data Infrastructures.” In Metadata and Semantic Research: Research Conference on Metadata and Semantic Research (MTSR 2010), edited by S. Sánchez-Alonso and I.N. Athanasiadis. Communications in Computer and Information Science 108. Berlin, Heidelberg: Springer, 211-21.
Gillam, Lee, Mariam Tariq, and Khurshid Ahmad. 2005. “Terminology and the Construction of Ontology.” Terminology 11, no. 1: 55-81.
ISO 25964-2:2013. Information and Documentation - Thesauri and Interoperability with Other Vocabularies - Part 2: Interoperability with Other Vocabularies. Geneva: ISO.
Kempf, Andreas Oskar, Dominique Ritze, Kai Eckert, and Benjamin Zapilko. 2014. “New Ways of Mapping Knowledge Organization Systems: Using a Semi-Automatic Matching Procedure for Building up Vocabulary Crosswalks.” Knowledge Organization 41: 66-75.
Landry, Patrice. 2009. “Multilingualism and Subject Heading Languages: How the MACS Project is Providing Multilingual Subject Access in Europe.” Catalogue & Index: Periodical of CILIP Cataloguing & Indexing Group 157: 9-11.
Liddle, Stephen, Kimball A. Hewett, and David W. Embley. 2003. “An Integrated Ontology Development Environment for Data Extraction.” In Proceedings of Information Systems Technology and its Applications, International Conference (ISTA), 19-21 June 2003, Kharkiv, Ukraine, edited by M. Godlevsky, S.W. Liddle, and H.C. Mayr. Bonn: Gesellschaft für Informatik, 21-33.
Mayr, Philipp and Vivien Petras. 2008. “Building a Terminology Network for Search: The KoMoHe Project.” In Metadata for Semantic and Social Applications: Proceedings of the International Conference on Dublin Core and Metadata Applications, 22-26 September 2008, Berlin, edited by J. Greenberg and W. Klas. Göttingen: Universitätsverlag Göttingen, 177-82.
Morshed, Ahsan, Caterina Caracciolo, Gudrun Johannsen, and Johannes Keizer. 2011. “Thesaurus Alignment for Linked Data Publishing.” In Proceedings of the International Conference on Dublin Core and Metadata Applications 2011, 21-23 September 2011, The Hague, NL, 37-46.
Navigli, Roberto and Paola Velardi. 2004. “Learning Domain Ontologies from Document Warehouses and Dedicated Websites.” Computational Linguistics 30: 151-79.
Noy, Natalya F. and Deborah L. McGuinness. 2001. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford, CA: Stanford University.
Peñas, Anselmo, M. Felisa Verdejo, and Julio Gonzalo. 2001. “Corpus-Based Terminology Extraction Applied to Information Access.” In Proceedings of Corpus Linguistics, 30 March-2 April 2001, Lancaster University, UK.
Rospocher, Marco, Sara Tonelli, Luciano Serafini, and Emanuele Pianta. 2012. “Corpus-Based Terminological Evaluation of Ontologies.” Applied Ontology 7, no. 4: 429-48.
Rowley, Jennifer. 2007. “The Wisdom Hierarchy: Representations of the DIKW Hierarchy.” Journal of Information Science 33: 163-80.
Tordai, Anna, Jacco van Ossenbruggen, and Guus Schreiber. 2009. “Combining Vocabulary Alignment Techniques.” In Proceedings of the Fifth International Conference on Knowledge Capture (K-CAP '09), September 2009, Marina Del Rey, CA, USA, 25-32.
Vizine-Goetz, Diane, Carol Hickey, Andrew Houghton, and Roger Thompsen. 2004. “Vocabulary Mapping for Terminology Services.” Journal of Digital Information 4, no. 4.
Wong, Wilson, Wei Liu, and Mohammed Bennamoun. 2007. “Determining Termhood for Learning Domain Ontologies Using Domain Prevalence and Tendency.” In Proceedings of the Sixth Australasian Conference on Data Mining and Analytics (AusDM ’07), 3-4 December 2007, Darlinghurst, Australia. Australian Computer Society, 47-54.
Webert Júnior Araújo – Federal University of Minas Gerais, Brazil
Gercina Ângela de Lima – Federal University of Minas Gerais, Brazil
A Methodological Proposal Towards Domain Ontology Enrichment
Abstract: Current methods for domain ontology enrichment present gaps in coping with the dynamic nature of knowledge. This investigation therefore aims to develop a methodology for domain ontology enrichment that overcomes these gaps. To address this goal, the research methodology rests on four steps: 1) an exploratory study of Knowledge Organization Systems maintenance and updating; 2) mapping and analysis of the methods for enriching ontologies, based on a literature review; 3) qualitative content analysis of the documents selected in steps 1 and 2; 4) development of the methodology for domain ontology enrichment. The result is a novel methodology for domain ontology enrichment, called METHODOE.
1.0 Introduction
Ontologies are a type of Knowledge Organization System (KOS), and knowledge is dynamic, so ontologies must be updated periodically. Unfortunately, most ontology developers neglect ontology maintenance and updating. They focus only on developing these KOS and ignore the fact that knowledge changes: terms used to represent concepts become obsolete, new terms emerge, and new scientific discoveries are made. Ontologies must therefore keep pace with the evolution of knowledge. One way to update ontologies is through ontology enrichment. Enrichment aims to expand an already developed ontology with new components (e.g., concepts, relationships, properties, and axioms), thereby increasing the ontology's potential to represent its domain. Hence, this study aims to develop a domain-independent methodology for the ontology enrichment process.
The literature presents several proposals for ontology enrichment, with the following limitations: 1) enrichment of only some ontology components (e.g., Faatz et al. 2001; Faatz and Steinmetz 2002; Valakaros et al. 2004); 2) enrichment based on a very particular data source (e.g., Navigli and Velardi 2006; Amar, Gargouri, and Hamadou 2013); 3) enrichment applied to a specific domain (e.g., Faatz and Steinmetz 2002; Valakaros et al. 2004; Navigli and Velardi 2006; Booshehri et al. 2013); 4) intuitive and non-systematic methods.
The question of this research is: how can a methodology for domain ontology enrichment be developed that overcomes the gaps in existing methods? We assume that the literature can indicate how to obtain the knowledge needed to develop such a methodology. This applies especially to the literature on Knowledge Organization Systems maintenance and to the empirical studies on ontology enrichment.
This theoretical investigation is motivated by the need to improve existing domain ontologies, since these instruments are important for communicating, interpreting, and reasoning over knowledge. Moreover, ontologies are useful in the Semantic Web context, favoring semantic integration between different systems and vocabularies. They also support the organization and retrieval of information. Studies on ontology maintenance are therefore needed, because ontology researchers still neglect this area.
2.0 The research methodology
The research methodology rests on the following four steps:
1) An exploratory study of Knowledge Organization Systems maintenance and updating;
2) Mapping and analysis of the methods for enriching ontologies, from the literature review;
3) Qualitative content analysis of documents selected in steps 1 and 2;
4) Development of the methodology for domain ontology enrichment.
1) An exploratory study of Knowledge Organization Systems maintenance and updating
In this research, we consider ontology enrichment as part of a broader area within ontological engineering, namely ontology maintenance. We therefore identified the main standards and methods for maintaining Knowledge Organization Systems. These served as a knowledge source from which to draw inputs for the methodology developed in this study. KOSs such as thesauri, classification systems, and taxonomies resemble ontologies in some respects, so some strategies for maintaining and updating them can also be reused for ontologies. After exploratory research on the topic, we selected the following works: Kim (1973), Soergel (1974), ANSI/NISO Z39.19 (2005), ISO 25964-1 (2011), and ISO 25964-2 (2013).
2) Mapping and analysis of the methods for enriching ontologies from the literature review
In this stage, a narrative literature review was carried out to map the works that address ontology enrichment. The search was run in Information Science and Computer Science databases using the following expressions (in English and Portuguese): Ontology Enrichment, Ontological Enrichment, Ontology Expansion, Ontology Extension, Ontology Specialization, Ontology Refinement, Ontology Enlarge, Ontology Completeness, Ontology Improvement. The search strategy used truncation, Boolean operators, advanced search, and database-specific filters. The period covered was 1990 to 2018. Duplicates and works outside the scope of this research (judged by title and abstract) were then removed, leaving 35 works in total. These 35 works were read and screened against the following exclusion criteria:
(1) works dealing with the enrichment of other Knowledge Organization Systems or of databases;
(2) works addressing a process other than enrichment (such as ontology learning or evolution);
(3) works focusing only on the knowledge acquisition technique without resulting in the enrichment of ontologies.
In the end, 15 studies were selected for analysis in this review: Faatz et al. 2001; Faatz and Steinmetz 2002; Valakaros et al. 2004; Navigli and Velardi 2006; Bendaoud, Toussaint, and Napoli 2008; Carvalho et al. 2010; Barbur, Blaga, and Groza 2011; Petasis et al. 2011; Hashimy and Kulathuramaiyer 2013; Booshehri et al. 2013; Amar, Gargouri, and Hamadou 2013; Booshehri and Luksch 2015; Al-Yahya, Al-Malak, and Aldhubayi 2016; Gómez-Moreno and Mestre-Mestre 2017; Guerram and Mellal 2018.
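The search strategy described in this step can be written out as query strings. The following Python sketch is a hypothetical illustration. The generic "*" truncation and AND/OR syntax are assumptions, because every database uses its own syntax and field codes. The stems were chosen for this sketch and are not taken from the review. The truncated form is broader than the exact phrases, since it does not require the two words to be adjacent.

```python
# Search expressions as listed in the review (English versions only).
EXPRESSIONS = [
    "Ontology Enrichment", "Ontological Enrichment", "Ontology Expansion",
    "Ontology Extension", "Ontology Specialization", "Ontology Refinement",
    "Ontology Enlarge", "Ontology Completeness", "Ontology Improvement",
]


def phrase_query(expressions):
    """OR together exact-phrase searches for every expression."""
    return " OR ".join(f'"{e}"' for e in expressions)


def truncated_query(head_stem, modifier_stems):
    """Combine a truncated head term with OR-ed truncated modifiers.

    The '*' wildcard and the AND/OR syntax are generic assumptions;
    each database has its own truncation and field syntax.
    """
    modifiers = " OR ".join(f"{stem}*" for stem in modifier_stems)
    return f"{head_stem}* AND ({modifiers})"


print(phrase_query(EXPRESSIONS))
# Hypothetical stems chosen to cover the expressions above.
print(truncated_query("ontolog", ["enrich", "expan", "exten", "speciali",
                                  "refine", "enlarg", "complete", "improv"]))
```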
3) Qualitative content analysis of documents selected in steps 1 and 2
A qualitative content analysis was performed on the documents retrieved and selected in steps 1 and 2. We selected the main points dealing with maintenance and updating in KOS and, on this basis, created categories to organize the extracted information.
The analysis of the documents selected in step 1 shows that there are a few points to consider when maintaining Knowledge Organization Systems:
(1) define who is responsible for maintenance;
(2) categorize the type of change (e.g. inclusion, replacement, or exclusion of a term);
(3) keep track of the changes made (e.g. the source of the extracted information and the inclusion date);
(4) identify the consequences of maintenance for other systems that use the KOS;
(5) define how often maintenance takes place.
These works focus primarily on updating thesauri. Even so, the process they describe is well founded and can be adapted to ontology enrichment.
The analysis of the 15 selected papers showed that their methods are characterized by the information extraction technique used, the knowledge source, and the type of enrichment performed. The information extraction techniques identified include:
- statistical analysis;
- similarity measures;
- machine learning algorithms;
- syntactic analysis (part-of-speech tagging, named entity recognition, parsing, stemming, tokenization, lemmatization);
- formal concept analysis;
- clustering techniques.
The knowledge sources identified are textual corpora, semantically annotated corpora, thesauri, ontologies, web page content, lexical databases (such as WordNet), machine-readable dictionaries, and Linked Data. As for the types of enrichment, the papers address lexical enrichment, conceptual enrichment, enrichment of taxonomic relations, enrichment of non-taxonomic relations, and enrichment of axioms.
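The five maintenance points identified from the KOS literature fit naturally into a simple change log. The following Python sketch is a hypothetical illustration, not part of the methodology or of the cited standards. Each record holds the person responsible, the type of change, its source and date, and the systems it affects. A helper computes the next review date, assuming a fixed periodicity in days. The element names and sources are invented.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import Enum


class ChangeType(Enum):
    """Point (2): the kinds of change distinguished for KOS maintenance."""
    INCLUSION = "inclusion"
    REPLACEMENT = "replacement"
    EXCLUSION = "exclusion"


@dataclass
class ChangeRecord:
    element: str                 # ontology element changed (hypothetical field)
    change_type: ChangeType      # point (2)
    responsible: str             # point (1): who is accountable for the change
    source: str                  # point (3): where the new knowledge came from
    changed_on: date             # point (3): inclusion date
    affected_systems: list[str] = field(default_factory=list)  # point (4)


def next_review(last_review: date, period_days: int) -> date:
    """Point (5): next maintenance date, assuming a fixed period in days."""
    return last_review + timedelta(days=period_days)


def summary(log):
    """Count changes by type, e.g. for a maintenance report."""
    return Counter(record.change_type.value for record in log)


# Hypothetical log entries for illustration only.
log = [
    ChangeRecord("Tree", ChangeType.INCLUSION, "ontologist", "domain glossary",
                 date(2020, 1, 10)),
    ChangeRecord("Shrub", ChangeType.REPLACEMENT, "ontologist", "expert interview",
                 date(2020, 1, 12), affected_systems=["search portal"]),
]
print(summary(log))
print(next_review(date(2020, 1, 1), 30))  # 2020-01-31
```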
We found that most works take a narrow perspective and propose methods specific to a single knowledge domain. Consequently, they share the following shortcomings:
(1) no planning of the enrichment;
(2) a lack of detail on how to perform some steps;
(3) methods that are empirical, intuitive, and not systematic.
Notwithstanding these shortcomings, the analysis of these articles and papers still provided valid inputs for developing the enrichment methodology. We created an action plan from the inputs generated in steps 1 and 2. These inputs showed what should be considered when maintaining a KOS and what the main components of the enrichment process are (knowledge extraction technique, knowledge source, and type of enrichment). This action plan guides the development of the enrichment methodology in the next stage of the research methodology.
4) Development of the methodology for domain ontology enrichment
The information gathered in the previous steps was used to build an action plan with key strategies for developing the ontology enrichment methodology. As Table 1 shows, the plan has seven functionalities and 11 strategies. These guide the development of the methodology and help keep it well managed.
Table 1. Action plan for the ontology enrichment methodology development
# | Functionality | Action / strategy
1 | Assessment of the need for enrichment | 1.1) Develop a step to analyze the objectives of the ontology, its users, and its competency questions, if any.
2 | Ontology diagnosis | 2.1) Describe the possible ways to make the diagnosis.
3 | Knowledge acquisition | 3.1) Describe the possible knowledge sources; 3.2) Describe the possible techniques for extracting knowledge; 3.3) Explain how to extract knowledge.
4 | Knowledge processing | 4.1) Explain how to handle the extracted information.
5 | Enrichment of ontology components from extracted content | 5.1) Analyze the content extracted in the previous step; 5.2) Enrich the ontology components.
6 | Evaluation and validation of the enrichment content | 6.1) Describe the possible forms of evaluation and validation; 6.2) Check the ontology after the content has been included.
7 | Methodology documentation | 7.1) Describe what was executed in each step of the methodology.
The methodology developed according to this action plan is described in the next section, "Results."
3.0 Results
We present the proposed methodology, which we call METHODOE (Methodology for Domain Ontology Enrichment). It has three phases, which unfold into seven steps: 1) Pre-enrichment; 2) Enrichment; 3) Post-enrichment. Figure 1 presents a general outline of METHODOE. The methodology maps the entire enrichment process and makes it structured and organized. It does not attempt to describe in detail each possible technique for extracting knowledge for enrichment. Instead, it is flexible: the ontologist chooses the knowledge source and extraction technique that best fit the domain represented by the ontology, since each domain has its own particularities.
Figure 1. METHODOE general outline
3.1 Pre-enrichment
Pre-enrichment is the first phase of METHODOE and has the following steps: (A) Assessment of the need for enrichment; (B) Ontology diagnosis.
Step A - Assessment of the need for enrichment analyzes whether the ontology meets its objectives. This can be done with various methods from the field of ontology evaluation. A very common way is through Competency Questions (CQs). If the ontology cannot answer the competency questions formulated in its development project, this indicates that it needs enrichment. It is also possible to propose new CQs for the ontology.
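Competency questions are usually checked by running queries (for example in SPARQL) against the ontology. The following Python sketch is a toy stand-in for that process. It represents the ontology as a set of subject-predicate-object triples and each CQ as a triple pattern with a wildcard. A CQ that matches no triple signals a possible need for enrichment. The triples, the CQs and the pattern encoding are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
# None acts as a wildcard: the part of the triple the question asks for.
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def matches(triple: Triple, pattern: Pattern) -> bool:
    """True if every non-wildcard position of the pattern equals the triple's."""
    return all(p is None or p == t for t, p in zip(triple, pattern))


def unanswered(cqs: dict[str, Pattern], ontology: set[Triple]) -> list[str]:
    """Return the competency questions that no statement in the ontology answers."""
    return [question for question, pattern in cqs.items()
            if not any(matches(triple, pattern) for triple in ontology)]


# Hypothetical toy ontology and CQs, for illustration only.
ontology = {
    ("Oak", "subClassOf", "Tree"),
    ("Tree", "subClassOf", "Plant"),
}
cqs = {
    "What is a tree a kind of?": ("Tree", "subClassOf", None),
    "Which pests affect oaks?": ("Oak", "affectedBy", None),
}
print(unanswered(cqs, ontology))  # ['Which pests affect oaks?']
```

A real assessment would also account for inferred statements (e.g. through a reasoner). This sketch looks only at asserted triples.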
How often the need for enrichment is assessed depends on the domain, that is, on how often its terminology and knowledge evolve. At this stage, a report should be generated stating whether or not the ontology needs to be enriched, with the necessary justifications.
Step B - Ontology diagnosis examines the ontology to determine which parts require enrichment. Again, there are multiple ways to do this. Under the guidance of a domain specialist, the ontologist can check the entire ontology (each concept and relationship) and look for points where improvements might be necessary. Another possibility is to use the 41 pitfalls for diagnosing ontologies described by Poveda-Villalón (2016). These pitfalls capture common mistakes made when building ontologies, and such errors indicate where enrichment may be needed. Not all pitfalls apply to the enrichment process, however. In this step, a detailed report should be generated from the diagnosis, mainly identifying the location of each problem in the ontology structure.
3.2 Enrichment
After the pre-enrichment phase, the ontology enrichment itself is carried out. This phase comprises the following steps: (A) Knowledge acquisition; (B) Processing of extracted knowledge; (C) Content insertion in the ontology.
Step A - Knowledge acquisition deals with access to the raw material (information) for enrichment. It can be repeated iteratively throughout the enrichment process. This step has three activities: (a) selection of knowledge sources; (b) indication of knowledge extraction techniques; (c) knowledge extraction. For Activity a - selection of knowledge sources, various sources are possible:
- domain experts;
- textual sources (such as articles, books, reports);
- other Knowledge Organization Systems (thesauri, taxonomies, glossaries, ontologies).
In short, any source judged qualitatively capable of providing information on the domain represented by the ontology can be considered a potential knowledge source. The participation of a domain expert in surveying these sources can be very useful and should be considered. The diagnostic report produced in the Pre-enrichment phase should also be taken into account, because it can help locate knowledge sources. Furthermore, we suggest exploratory searches in domain databases and KOS repositories. After surveying all potential knowledge sources, a table listing each source and the reason for choosing it should be produced.
Activity b - indication of knowledge extraction techniques covers techniques ranging from interviews with domain experts to the analysis of documents and KOS. These techniques can be manual, semi-automatic, or automatic. They can rely on linguistic techniques (mainly Natural Language Processing), statistics, or machine learning algorithms. This activity must produce a table of the selected techniques, each with the reason for its choice. The knowledge source and the technique are closely related, since the nature of the chosen source strongly influences which extraction technique can be used. Once the technique has been chosen, the information is extracted from the selected knowledge sources. METHODOE does not prescribe which source or technique to use, since this depends heavily on the domain in which the enrichment takes place. Being domain-independent is one of the methodology's characteristics. Thus, the sources available in each specific domain must be considered.
Activity c - knowledge extraction refers to the application of the chosen extraction techniques to the selected knowledge sources. Again, the ontology diagnosis report is important because it guides a targeted search for the knowledge needed to enrich the ontology.
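METHODOE leaves the choice of extraction technique open. One hypothetical instance of a semi-automatic, linguistics-based technique is sketched below in Python. It applies a simplified lexico-syntactic pattern, "X such as A, B and C" (after the well-known Hearst hyponym patterns), to a text source. It keeps only results for concepts that the diagnosis flagged as gaps. The pattern handles single-word terms only, and the sample sentence and gap list are invented.

```python
import re

# Simplified "X such as A, B and C" pattern (after Hearst's hyponym patterns).
# Assumption: single-word terms only; real use would need noun-phrase chunking.
SUCH_AS = re.compile(r"\b(\w+) such as (\w+(?:, \w+)*(?: (?:and|or) \w+)?)")


def hyponym_candidates(text: str, gap_concepts: set[str]) -> list[tuple[str, str]]:
    """Return (hyponym, hypernym) pairs, only for hypernyms flagged in the diagnosis."""
    found = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1).lower()
        if hypernym not in gap_concepts:
            continue  # targeted extraction: skip concepts the diagnosis did not flag
        for hyponym in re.split(r", | and | or ", match.group(2)):
            found.append((hyponym.lower(), hypernym))
    return found


# Hypothetical source text and diagnosis output.
text = ("Forest pests such as aphids, beetles and moths damage trees. "
        "Trees such as oaks are common.")
print(hyponym_candidates(text, {"pests"}))
# [('aphids', 'pests'), ('beetles', 'pests'), ('moths', 'pests')]
```

Filtering by the diagnosed gaps reflects the targeted nature of the extraction. The output consists of candidates that still need review by the ontologist or a domain specialist.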
The goal here is not to extract all knowledge about the domain represented by the ontology, but only the knowledge the ontology does not yet cover. To this end, specific questions are put to the knowledge sources. This can be done through interviews with domain experts or through manual, automatic, or semi-automatic analysis of domain texts. The way knowledge is extracted must be consistent with the chosen extraction technique and knowledge source.
Step B - Processing of extracted knowledge organizes the knowledge acquired in the previous step. It comprises two activities: (a) correlation between diagnosis and extracted knowledge; (b) classification of extracted knowledge. Activity a - correlation between diagnosis and extracted knowledge relates the gaps identified in Step 1.B (Ontology diagnosis) to the possible solutions found through Activity 2.A.c (Knowledge extraction). The objective is to make it easier to identify possible answers to the questions and gaps the ontology presents. Activity b - classification of extracted knowledge groups the acquired knowledge into types of enrichment.
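The following Python sketch combines the two activities of Step B. First, it keeps only the extracted items that address a diagnosed gap (correlation). Then it labels each item with one of the five enrichment types identified in the literature analysis (classification). The item kinds, the mapping rule and the example data are all hypothetical. Treating every unlisted relation kind as non-taxonomic is an assumption of this sketch.

```python
from collections import defaultdict

# Mapping from kinds of extracted items to METHODOE's enrichment types.
# The item kinds on the left are hypothetical labels, not part of METHODOE.
ENRICHMENT_TYPE = {
    "synonym": "lexical",
    "definition": "lexical",
    "new_concept": "conceptual",
    "is_a": "taxonomic relation",
    "part_of": "taxonomic relation",
    "axiom": "axiom",
}


def classify(kind: str) -> str:
    """Activity b: any other relation kind is treated as non-taxonomic (assumption)."""
    return ENRICHMENT_TYPE.get(kind, "non-taxonomic relation")


def correlate(gaps: set[str], extracted: list[tuple[str, str, str]]):
    """Activity a: keep (concept, kind, value) items that address a diagnosed gap.

    Returns the classified items per concept and the gaps still unresolved,
    which would send the ontologist back to knowledge acquisition.
    """
    plan = defaultdict(list)
    for concept, kind, value in extracted:
        if concept in gaps:
            plan[concept].append((classify(kind), value))
    unresolved = sorted(gaps - set(plan))
    return dict(plan), unresolved


# Hypothetical diagnosis gaps and extraction results.
gaps = {"Tree", "Pest"}
extracted = [
    ("Tree", "synonym", "arbor"),
    ("Tree", "is_a", "Plant"),
    ("Tree", "grows_in", "Forest"),
    ("River", "synonym", "stream"),
]
plan, unresolved = correlate(gaps, extracted)
print(plan)
print(unresolved)  # ['Pest']
```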
The types of enrichment are:
(1) lexical enrichment, which deals with the acquisition of terminological variants of a concept, synonyms, and natural-language definitions;
(2) conceptual enrichment, which refers to obtaining new concepts, whether specific or general;
(3) enrichment of taxonomic relations, which deals with the acquisition of 'genus-species' and 'part-of' relationships between the ontology's concepts;
(4) enrichment of non-taxonomic relations, which deals with all other types of association that can hold between the ontology's concepts;
(5) enrichment of axioms, which concerns the definition of rules for concepts and relationships, with the purpose of formalizing the represented knowledge and thereby restricting its possible interpretations.
The results of Activity a (correlation between diagnosis and extracted knowledge) are relevant for classifying the knowledge into types of enrichment.
Step C - Content insertion in the ontology expands the ontology with the acquired content. This step must be performed by the ontologist, alone or together with the domain specialist. The purpose is to enrich the ontology according to the results of Step B - Processing of extracted knowledge and to insert the content correctly. If something remains unclear, it is possible to return to the knowledge acquisition step. At the end of this step, a detailed report must be generated listing all the content inserted in the ontology, its knowledge source, and its date of inclusion.
3.3 Post-enrichment
The last phase of METHODOE consists of the following steps: (A) Evaluation of the enriched content; (B) Documentation of the methodology.
Step A - Evaluation of the enriched content verifies whether the content has been properly inserted in the ontology. It also checks whether any changes are needed to the structure or to ontology components, since the inserted content can affect the structure.
This analysis is performed with the help of the domain specialist. Finally, a document specifying the result of the evaluation is generated. Step B (Documentation of the methodology) refers to recording all procedures performed in each stage of the ontology enrichment. This step should take place throughout the entire process, not merely at the end. However, only at the end of the enrichment process will it be possible to generate the final methodology document.
4.0 Conclusions
We have presented a novel methodology for domain ontology enrichment, based on indications found in the literature on the maintenance and updating of Knowledge Organization Systems and on empirical studies of domain ontology enrichment. The fundamental goal of this methodology is to organize the entire domain ontology enrichment process. It does so by presenting the steps of a pre-enrichment phase, to be executed before the enrichment itself, and of a post-enrichment phase, to be carried out afterwards; this distinguishes it from all methods presented in the literature so far. Thus, it helps ontologists gain a systemic and holistic understanding of how to improve existing ontologies. METHODOE is still in its first version and must be applied to domain ontologies so that it can be tested and validated; this is one of the main limitations of this research. However, we believe the methodology offers a relevant and systematic view of the domain ontology enrichment process, surpassing previously developed methods in some respects. As future work, we intend to validate this methodology by enriching two ontologies from different domains, thus demonstrating METHODOE's domain-independent character. Furthermore, we intend to develop a practical manual, with examples, for each stage of the methodology.
References
Al-Yahya, Maha, Sawsan Al-Malak, and Luluh Aldhubayi. 2016. “Ontological Lexicon Enrichment: The Badea System for Semi-Automated Extraction of Antonymy Relations from Arabic Language Corpora.” Malaysian Journal of Computer Science 29, no. 1: 56–73.
Amar, Feten Baccar Ben, Bilel Gargouri, and Abdelmajid Ben Hamadou. 2013. “Domain Ontology Enrichment Based on the Semantic Component of LMF-Standardized Dictionaries.” In Knowledge Science, Engineering and Management. KSEM 2013, edited by M. Wang. Lecture Notes in Computer Science 8041. Berlin: Springer, 404–19.
ANSI/NISO. 2005. Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. Bethesda, MD: National Information Standards Organization.
Barbur, Gabriel, Bogdan Blaga, and Adrian Groza. 2011. “Ontorich: A Support Tool for Semi-Automatic Ontology Enrichment and Evaluation.” In 2011 IEEE 7th International Conference on Intelligent Computer Communication and Processing, 129–32. doi: 10.1109/ICCP.2011.6047855.
Bendaoud, Rokia, Yannick Toussaint, and Amedeo Napoli. 2008. “PACTOLE: A Methodology and a System for Semi-Automatically Enriching an Ontology from a Collection of Texts.” In Conceptual Structures: Knowledge Visualization and Reasoning. ICCS 2008, edited by P. Eklund and O. Haemmerlé. Lecture Notes in Computer Science 5113. Berlin: Springer, 203–16.
Booshehri, Meisam, Abbas Malekpour, and Peter Luksch. 2013. “Ontology Enrichment by Extracting Hidden Assertional Knowledge from Text.” International Journal of Computer Science and Information Security 11, no. 5: 64–72.
Booshehri, Meisam, and Peter Luksch. 2015. “An Ontology Enrichment Approach by Using DBpedia.” In WIMS '15: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics. New York: Association for Computing Machinery, 1–11.
Carvalho, Miguel G. P., Vanessa Braganholo, Maria L. M. Campos, and Maria L. A. Campos. 2010. “Enriquecimento de Ontologias: uma Abordagem para Extração de Conhecimento do Campo Definição.” Presented at Ontobrás, Florianópolis, Santa Catarina.
Faatz, Andreas, and Ralf Steinmetz. 2002. Ontology Enrichment with Texts from the WWW.
Faatz, Andreas, Stefan Hermann, Cornelia Seeberg, and Ralf Steinmetz. 2001. Conceptual Enrichment of Ontologies by Means of a Generic and Configurable Approach.
Gómez-Moreno, Pedro Ureña, and Eva M. Mestre-Mestre. 2017. “Automatic Domain-specific Learning: Towards a Methodology for Ontology Enrichment.” Revista de Lenguas para Fines Específicos 23, no. 2: 63–85.
Guerram, Tahar, and Nacima Mellal. 2018. “A Domain Independent Approach for Ontology Semantic Enrichment.” In Computer Science & Information Technology, edited by Natarajan Meghanathan et al., 13–19.
Hashimy, Amaal Saleh Hassan Al, and Narayanan Kulathuramaiyer. 2013. “Ontology Enrichment with Causation Relations.” In 2013 IEEE Conference on Systems, Process & Control (ICSPC), Kuala Lumpur, 186–92. doi: 10.1109/SPC.2013.6735129.
International Organization for Standardization (ISO). 2011. ISO 25964-1: Thesauri for Information Retrieval. Geneva: International Organization for Standardization.
International Organization for Standardization (ISO). 2013. ISO 25964-2: Interoperability with Other Vocabularies. Geneva: International Organization for Standardization.
Kim, Chai. 1973. “Theoretical Foundations of Thesaurus-Construction and Some Methodological Considerations for Thesaurus-Updating.” Journal of the American Society for Information Science 24, no. 2: 148–56.
Navigli, Roberto, and Paola Velardi. 2006. “Ontology Enrichment Through Automatic Semantic Annotation of On-Line Glossaries.” In Managing Knowledge in a World of Networks, edited by S. Staab and V. Svátek. Berlin: Springer, 126–40.
Petasis, Georgios, Vangelis Karkaletsis, Georgios Paliouras, Anastasia Krithara, and Elias Zavitsanos. 2011. “Ontology Population and Enrichment: State of the Art.” In Knowledge-Driven Multimedia Information Extraction and Ontology Evolution, edited by G. Paliouras, C. D. Spyropoulos, and G. Tsatsaronis. Lecture Notes in Computer Science 6050. Berlin: Springer, 134–66.
Poveda-Villalón, María. 2016. “Ontology Evaluation: A Pitfall-Based Approach to Ontology Diagnosis.” Ph.D. dissertation. Madrid: Universidad Politécnica de Madrid.
Soergel, Dagobert. 1974. Indexing Languages and Thesauri: Construction and Maintenance. U.S.A.: Melville Pub. Co.
Valarakos, Alexandros G., Georgios Paliouras, Vangelis Karkaletsis, and George Vouros. 2004. “A Name-Matching Algorithm for Supporting Ontology Enrichment.” In Methods and Applications of Artificial Intelligence. SETN 2004, edited by G. A. Vouros and T. Panayiotopoulos. Lecture Notes in Computer Science 3025. Berlin: Springer.
Mario Barité – University of the Republic, Uruguay
Mirtha Rauch – University of the Republic, Uruguay
Cultural Warrant: Old and New Sights from Knowledge Organization
Abstract: Culture is a controversial and multi-discursive term across disciplines. The cultural dimension of Knowledge Organization (KO) is expressed in the cultural warrant (CW), because KO systems, processes and products show “the values of the cultures involved” (Guimarães et al. 2019). We identified four theoretical elements necessary to understand the CW: the need to adjust the concept of culture to the KO field; the focus placed on local dimensions of knowledge; the intention to promote classifications biased in favour of minority and relegated social sectors; and the ethical issue expressed in respect for the integrity of a community's cultural thought. The techniques suited to applying the CW in the construction, evaluation, and revision of knowledge organization systems (KOS) are qualitative. Some methods are common to KO: content, terminological, discourse, and domain analysis techniques. We also propose a categorization of three forms of cultural hospitality.
It is concluded that the concepts of culture and CW are not neutral, since they favour forms of knowledge organization that replace the criteria of objectivity and neutrality with those of cultural pertinence and respect for the values of a community. We also suggest considering the extrapolation of some methodologies from the social sciences to study the linguistic behaviour of subcultures in order to improve the CW of KOS.
1.0 Culture, warrant, cultural warrant: consensuses and conflicts
1.1 Culture
Culture is a controversial and notoriously ambiguous term used in different disciplines (anthropology, sociology, politics, feminism, the humanities, and cultural studies). It is also a “multi-discursive” term, because “it can be mobilized in a number of different discourses” (Hartley 2004, 51). This multi-discursive character explains much of the difficulty in agreeing on a single concept of culture. Not surprisingly, more than 150 definitions were collected in the classic review by Kroeber and Kluckhohn (1952), and all of them are adequate. Culture is “a term that evolves in the historical period during which it develops,” and therefore “it has changing contents” (Rodríguez Pastoriza 2006, our translation). Its oldest antecedent is associated with agriculture: in Latin, ‘cultus’ means cultivation, cultivated, or treated with care. This semantic background has several positive connotations, since crops imply subsistence, regularity, and continuity (Di Tella et al. 2004), as well as the vigorous growth of new forms of life. The concepts of culture and civilization have been bound together for several centuries and, in the French Enlightenment tradition, were seen “as a progressive, cumulative, distinctively human achievement” (Kuper 1999, 5) to which everyone can aspire.
Hoggart (1957) and Williams (1958) were the first to propose overcoming the idea of culture as the preserve of an elite, and they planted the seed of so-called cultural studies, which transformed the concept, seen today as “a dynamic concept, always negotiable and in process of endorsement, contestation, and transformation” (Wright 1998, 10). From an individual perspective, culture can be considered the measure of a person’s education, manners, and knowledge of the world. But what matters for the objectives and purposes of Knowledge Organization (KO) is culture considered from a social perspective, because it gives an adequate measure of the information problems to be addressed and of their solutions, and because each individual (and therefore each information user) belongs to several cultural communities by virtue of ethnicity, religion, nationality, political thought, habits, and preferences. Baumann confirmed this when he studied five ethnic groups in a district of London: looking for the common elements of each culture, he found, in addition to those identity traits, what he called “communities within communities as well as cultures across communities” (Baumann 1996, 10). Educational background, profession, and discipline are also important personal cultural variables. Although culture is often defined as a system of shared meanings, Burke (2005) found this position difficult to hold when large groups, such as nations, were studied. Burke argued that this approach showed the strengths and weaknesses of the Durkheimian model of society, in which consensus prevails over conflict (two major topics in the study of culture). As an alternative, he proposed “the use of the concept of subculture, defined as a partially autonomous culture within a larger totality,” without intending to imply any inferiority (Burke 2005, 177).
Burke added that sociologists had dealt with the most visible subcultures (ethnic or religious minorities), with those deemed deviant according to the rules of a given society (criminals and heretics), and with young people. Historians have also studied groups such as the Jews in medieval Spain (Pérez 1993) or beggars in Elizabethan London, “but they did not always pay attention to the relationship between the culture of those minorities and those of the surrounding society” (Burke 2005, 178). Hebdige put an end to this perspective by analysing the process through which dominant mentalities in a society become hegemonic and come to treat subcultures in a somewhat pejorative way, which gives rise to tension between powerful groups and minorities (Hebdige 1979). Briefly, the following approaches can be taken into account in KO:
i) On the one hand, those that promote consensus values, such as cultural integration and tolerance between cultures. Unesco follows that trend when it states that culture is conceived “as the set of distinctive spiritual, material, intellectual and emotional features of society or a social group, and [...] it encompasses, in addition to art and literature, lifestyles, ways of living together, value systems, traditions and belief” (Unesco 2001).
ii) On the other hand, those that focus on analysing the conflict between different cultural expressions, arguing that a dominant culture usually shields itself in order to resist change and innovation; this gives rise to various contradictions (higher or elite culture vs. popular culture; hegemonic culture vs. subcultures or minority cultures; accepted cultures vs. rejected cultures; urban culture vs. rural culture; literate culture vs. nonliterate cultures).
Beyond accepting that classificationists must take into account both consensus and conflict between cultures when they develop knowledge organization systems (KOS), it may be appropriate to give equal importance to the principles and methods that ensure reciprocal tolerance, respect for 'otherness', and the promotion of cultural integration values.
1.2 Warrant
As stated in a previous paper, “there are no substantial differences regarding the definition of the notion of warrant” (Barité 2019, 652). In 1986, Beghtol developed a definition of warrant so precise and detailed that it remains uncontested to this day: “the warrant of a classification system can be thought of as the authority a classificationist invokes first to justify and subsequently to verify decisions about what classes/concepts to include in the system” (Beghtol 1986, 110–111). More than 30 years after its formulation, only two points might be added to that definition: i) warrants are currently seen as an essential component of three KOS processes: construction, evaluation, and revision; ii) warrants can also be considered tools for terminology selection in contexts other than KOS: information and data systems, big data analytics, social classifications, terminological data banks, and specialized dictionaries, among others. The different existing warrants have recently been compiled and explained (Barité 2018), and a discussion has begun on the usefulness of a general understanding of the problems associated with the characterization and application of warrants (Bullard 2017; Barité 2019). The unequivocal understanding, in the KO literature, of what a warrant is, what it is useful for, and how it is applied provides a sound foundation for research.
1.3 Cultural Warrant
While the concept of culture proves to be multi-discursive and controversial, and that of warrant is uncontested in KO, what remains to be said about the concept of CW?
It has been written that the CW is “a critical activity [that] can be used to evaluate both classification schemes and knowledge fields” (Hjørland and Albrechtsen 1999, 135), because KO systems, processes and products reflect “the values of the cultures involved” (Guimarães et al. 2019, 29, our translation). It has repeatedly been said that, although classifications try to represent the map of disciplines in an objective and neutral way, their schemes are historically and culturally conditioned, since they reflect the social, political, and religious thought and the state of scientific development of their time, as well as the mentalities of their designers (Shera 1961; Lee 1976; Beghtol 1986; González Casanova 1996). KOS have often been criticized for unequal (Trivelato and Moura 2016), improper (Kua 2004), discriminatory (Furner and Dunbar 2004; Olson 2007), or colonialist (Pacey 1989) treatment, or for rendering invisible many topics (Olson and Ward 1998) in religion, the social sciences, literature, or gender studies. In all these cases, the classificationists’ cultural perspectives have been seriously questioned, and this has prompted the search for systematic solutions to these issues. It was Beghtol who refined and expanded Lee’s original, basic idea (1976), pointing out that CW “posits that every classification system is based on the assumptions and preoccupations of a certain culture, whether the culture is that of a country, or of some smaller or larger social unit (e.g. ethnic group, academic discipline, arts domain, political party, religion and/or language)” (Beghtol 2002a, 45). To aid understanding, the following sections break the concept of CW down into its theoretical and methodological aspects.
2.0 Theoretical approach to cultural warrant
In the literature review performed for this paper, we identified four theoretical elements necessary to understand the CW:
i) The relevance of adjusting the concept of culture to the KO field.
The literature of the field has seldom made explicit which concept of culture underlies the development of a given idea. A broad sense, which may not be acceptable to everyone, is usually taken for granted. Lee made a relevant contribution when she reviewed definitions of culture from both the KO and the anthropology literature and grouped them into four families (Lee 2015). As culture is a multi-discursive and controversial term, it seems logical to seek an operational definition to serve as a reference for each research project in KO, one that integrates the peculiarities of vocabulary control, classification, indexing, tagging, and information retrieval processes for culturally determined topics.
ii) The focus placed on local dimensions of knowledge as opposed to universal approaches.
Recognizing the local dimensions of knowledge means weighing the conception of science and technology as a set of ideas, solutions, and applications valid in any place and time against alternative thinking that addresses specific beliefs, values, and contexts in which knowledge acquires a more immediate legitimacy. Tensions between global culture and local cultures are expressed through language, which always operates as a reference to cultural identity. Although KO has traditionally favoured universal systems to ensure some uniformity in the international communication of publication data, two insurgent movements against universalistic perspectives have been identified in recent years. On the one hand, there is a series of critical studies on the serious limitations of universal systems for classifying and indexing issues of local impact. On the other hand, new KOS, especially thesauri, have emerged for the subject representation of local affairs (Brasil, Ministério de Cultura 2006).
Disciplines that rest on the cultural dimensions of a field (law, history, cultural anthropology, music, literature, and the arts), or on approaches built by various disciplinary cultures, were the first to raise these issues of subject representation. For example, the “mate” culture, widespread in Paraguay, Argentina, Uruguay, and southern Brazil, has its own body of literature and terminology. Its status requires the creation of classification systems or mini-thesauri to adequately represent the specific documentation on the “mate” infusion and the cultural elements (rituals, objects, procedures, codes of conduct) that surround it. Likewise, different cultures classify birds differently; this is studied in ethnobiology (Berlin 1992) and in biological systematics, where paradigms such as numerical taxonomy and cladistics are competing perspectives.
iii) The intention to promote biased classifications in favour of minority and relegated social sectors.
This approach implies accepting the distinction between dominant and subjugated cultures that coexist in the same society. Although this may be an oversimplification, some social conflicts may be related to the non-peaceful coexistence between different cultural communities. Minority cultures strongly protect their cultural characteristics (language, customs, and ways of understanding reality). They give rise to followers, theories, documentation, regulations, objects, liturgies, and procedures that require visibility in KOS. Members of minority cultures often develop strong religious, ethnic, ideological, and/or philosophical cohesion, especially when the dominant or hegemonic culture reacts with indifference or outright discrimination. In many cases, the DDC and UDC classification systems have validated anachronistic, ideologically biased, tendentious, and even offensive constructions of social and cultural areas of knowledge.
The CW seeks to compensate for these imbalances and to maintain the integrity and identity of minor or ‘minoritized’ cultures (Olson 2007).
iv) The ethical issue expressed in the respect for the integrity of the cultural thought of a community.
By bringing the visibility of particular groups into subject representation, and by showing their patterns of coexistence and communication, the CW introduces the ethical factor into KO (Beghtol 2002b; Guimarães et al. 2008). This is especially important when attention is focused on the demands of social movements or of people who promote ideas based on new ethical presuppositions, or the deconstruction of hegemonic cultural perspectives or forms (García Gutiérrez 2007). Identifying the CW with essential ethical precepts regarding the recording, availability, access to, and retrieval of information in the most open and free way allows us to place it within the pragmatic epistemological approaches to KO advocated by Hjørland (Hjørland 1999, 2013; Barité 2019). In this context, the concern to avoid reinforcing the hegemonic aspects of a society, and to consider alternative interpretations of reality, should be central theoretical orientations in applying the CW. This will imply the use of non-discriminatory, inclusive indexing terms and politically correct expressions that take into account all possible perspectives on how a culturally determined topic or issue can be formulated. In particular, terminology selection should be endorsed by the movements or social groups considered potential users of an information service or system. The different dimensions of culture studied by Boccato and Biscalchin (2014) (culturality, culturalism, multiculturality, multiculturalism, interculturality, and transculturality) will have to be taken into account, among other aspects, in developing the concept of CW.
A customized and comparative treatment of these dimensions allows a finer disaggregation of the concept of culture, although methodological precautions must be taken.
3.0 Methodological approach to cultural warrant
From a methodological perspective, the techniques suited to applying the CW are qualitative. Some methods are common to KO:
Content analysis: a classical tool used for indexing and tagging (Krippendorff 2004; Hsieh and Shannon 2005), which makes it possible to identify culturally marked expressions to be used as appropriate indexing terms for specific user communities.
Terminological analysis: used to create KOS biased towards certain aspects of the social and human sciences, or to understand and judge situations in which the parameters of two different cultures may be in conflict. The study by Benyaich (2014) on the consequences of the relative incompatibilities between family law in Spain and in Morocco is a good example.
Critical discourse analysis: the assumptions of this methodology listed by Kress, one of its pioneers, offer several points of support for the CW. Kress assumed that language is a social phenomenon; therefore, individuals, institutions, and social groups convey meanings and values through language in an organized way. In this context, texts are the fundamental units of language in communication. Readers are not passive recipients of texts; on the contrary, they interact with texts in various ways based on their cognitive structures, perceptions, and cultural patterns (Kress 1989). Van Dijk (1999) developed categories and procedures for the critical study of the discursive reproduction of domination and hegemonic thinking in societies. These categories and procedures can also be combined with CW methods.
Domain analysis: under this generic name, Hjørland gathered eleven techniques for mapping areas of knowledge from different perspectives and bases of analysis (Hjørland 2002).
Some domain analysis techniques, whether qualitative or quantitative, can also be fruitful for the CW, for example: the creation of literature guides; the construction of special classifications and thesauri; empirical user studies; historical studies; epistemological and critical studies; database semantics and discourse studies; and even bibliometric studies. The CW can be applied throughout a KOS specialized in cultural issues, such as the Art & Architecture Thesaurus (Getty Research Institute 2017) or the Brazilian Folklore and Popular Culture Thesaurus (Brasil, Ministério de Cultura 2006). Some culturally oriented terms can also be inserted into a pre-existing scheme in order to make a certain perspective visible. This implies intervening in the pre-existing system, adding specifications in precise areas of the scheme or inserting alternative tables of local value. Beghtol (2002a, 2002b) introduced the principle of hospitality, which promotes and values the insertion of new, alternative, or local specifications into KOS. Cultural hospitality implies the capacity to create procedures that make KOS permeable to different cultural perceptions and conceptions. In one of her papers, Beghtol pointed out that cultural hospitality “needs to be debated, assessed and tested further to assess its potential for effective implementation” (Beghtol 2002a, 48). In response to that call, we propose here an initial categorization of three forms of cultural hospitality:
i) The creation of paradoxical spaces (Rose 1993; Olson and Ward 1998), a technique used to insert gender terminology into the Dewey Decimal Classification (DDC) tables. The method could also be used in any KOS to insert other culturally determined terms (relating to race, religion, or subculture).
This technique subdivides a general concept in order to incorporate particular concepts that have been omitted. For example, Olson and Ward (1998) subdivided 'Economic basis of labor' to include the topic 'Unpaid labor', as follows:
331.116 Economic basis of labor (recorded in DDC)
331.116 2 Unpaid labor (incorporated by the creation of a paradoxical space)
331.116 3 Paid labor (recorded in DDC)
ii) The creation of schemes biased towards certain cultural orientations within pre-existing or autonomously built classifications.
In a recent paper on bias in KO, Colombo and Barité (2015) identified and categorized three forms of bias in KOS: positive bias, negative bias, and neutral bias. Of the three, positive bias, understood as the deliberate intention to orient the terminology of a KOS in a certain direction (Buddhist thought, a Marxist conception, an evangelical approach, a feminist stance in the choice of terms), opens a door to a cultural perspective with its own identity.
iii) The local adaptations of universal classification schemes to represent the specific characteristics of a country or region.
Local adaptations of universal classification systems are a traditional way of expressing cultural hospitality, insofar as they seek to address the peculiarities of a country or region: its geography, administrative divisions, literature, or history. These adaptations can be designed by those officially responsible for the KOS (Beall 2003; Choi 2018), or they can be local in nature, generated internally within an information system or a library system or network, with the advantages and disadvantages this entails.
4.0 Conclusions
The term 'culture' has been used in many different contexts, by various disciplines, and with different scopes, thus becoming an essentially ambiguous, controversial, and changing concept.
Neither anthropology, its field of origin, nor cultural studies has been able to overcome this ambiguity or reach definitive agreement on its meaning, since each discipline loads the concept with its own content according to its specific needs. The notion of warrant, by contrast, enjoys wide and uncontested consensus in KO. The CW can be said to lie at a midpoint: while some specific theoretical, methodological, and application guidelines have been drawn up, they have appeared in a somewhat dispersed and sporadic way in the specialized literature. Moreover, the reasoned transfer to KO of concepts related to culture and cultural aspects from the disciplines that have discussed these issues most still seems insufficient. More comprehensive work and more case studies are required, as is the promotion of theses that combine KO, in an orderly way, with methodological aspects of anthropology, cultural studies, history, and sociology. It is concluded that the concepts of culture and CW are not neutral. They favour forms of knowledge organization that replace the criteria of objectivity and neutrality with those of cultural pertinence and respect for community values. It can be stated that the CW is most directly related to pragmatic epistemological approaches to KO and, therefore, that the ethical factor is naturally included in the CW. The need for KO to have its own definition (or definitions) of culture is reaffirmed. We propose using the concept of subculture as an operational approach, because the CW may be most useful in promoting the terminology and expressions of minority or relegated cultures. In other situations, a broader conceptualization of culture may be necessary, especially for the subject representation of the elements of two or more cultures coexisting in the same territory, where linguistic mechanisms for social, political, or religious integration must be found.
The best methodological solutions will surely arise in each particular case. Perhaps it 38 will be possible to extrapolate some methodologies used to study the linguistic behaviour of subcultures. It definitely seems that qualitative methods are the most suitable tools to work with the CW. In this paper, it has been possible to systematize a set of useful methodologies for the CW and to propose a first categorization with three forms of cultural hospitality. The CW contributes to reaffirm the identity of local cultures and neutralize the acculturation effects associated with globalization and political-economic processes of social exclusion from KOS. It is essential to privilege the analysis of users’ relationship with indexing terms and documents organization involving their cultural baggage, their way of understanding reality and assimilating established knowledge. It is important to point out the integrative and democratic role that the CW can give to our area of knowledge, as it postulates tolerance between different cultures and respect for the cultural integrity of subcultures inserted in our societies. New questions arise about the CW and its purposes, and they require new answers: should it be a tool to prevent or correct deviations from the culturally appropriate, acceptable or tolerable issues? Should we be satisfied if we obtain politically correct tags? Or should we be involved with the cultural, political or ideological conceptions of social movements that claim a different reality in terms of language? The emphasis on consensus (appeal to cultural integration) or conflict (preferential attention to subjugated cultures or social movements) will give the answer in each case. Acknowledgement The authors thank the reviewers for their important suggestions to improve this paper. References Barité, Mario. 2018. “Literary Warrant.” Knowledge Organization 45: 517-36. Barité, Mario. 2019. 
“Toward a General Conception of Warrants: First Notes.” Knowledge Organization 46: 647-55. Baumann, Gerd. 1996. Contesting Culture: Discourses of Identity in Multi-ethnic. London; Cambridge: Cambridge University. Beall, Julianne. 2003. “Approaches to Expansions: Case Studies from the German and Vietnamese Translations.” In World Library and Information Congress: 69th IFLA General Conference and Council 1-9 August 2003, Berlin. Beghtol, Clare. 1986. “Semantic Validity: Concepts of Warrant in Bibliographic Classification Systems.” Library Resources & Technical Services 30, no. 2: 109-23. Beghtol, Clare. 2002a. “Universal Concepts, Cultural Warrant and Cultural Hospitality.” In Challenges in Knowledge Representation and Organization for the 21st Century: Integration of Knowledge Across Boundaries: Proceedings of the Seventh International ISKO Conference 10-13 July, 2002 Granada, Spain, edited by María José López-Huertas. Advances in knowledge organization 8. Würzburg: Ergon Verlag, 45-9. Beghtol, Clare. 2002b. “A Proposed Ethical Warrant for Global Knowledge Representation and Organization Systems.” Journal of Documentation 58: 507-32. Benyaich, Sokaina Benyaich. 2014. Estudio Terminológico de la Mudawana: Código de Familia Marroquí: Árabe-Español. Master’s thesis. Alcalá de Henares: Universidad de Alcalá. 39 Berlin, Brent. 1992. Ethnobiological Classification: Principles of Categorization of Plants and Animals in Traditional Societies. Princeton. University Press. Boccato, Vera Regina Casari and Ricardo Biscalchin. 2014. “As Dimensões Culturais no Contexto da Construção de Vocabulários Controlados Multilíngues.” Revista Interamericana de Bibliotecología 37, no. 3: 237-50. Brasil. Ministério de Cultura. 2006. Tesauro de Folclore e Cultura Popular Brasileira. 2a. edição ampliada. Bullard, Julia. 2017. “Warrant as a Means to Study Classification System Design.” Journal of Documentation 73: 75-90. Burke, Peter. 2005. History and Social Theory. 2nd edition. 
Ithaca, NY: Cornell University Press.
Choi, Inkyung. 2018. Toward a Model of Intercultural Warrant: A Case of the Korean Decimal Classification's Cross-cultural Adaptation of the Dewey Decimal Classification. Ph.D. dissertation. Milwaukee: University of Wisconsin.
Colombo, Stephanie and Mario Barité. 2015. “Tres Enfoques de Bias en Organización del Conocimiento: Bias Neutro, Bias Negativo y Bias Positivo.” Brazilian Journal of Information Studies 10, no. 2: 9-13.
Di Tella, Torcuato S., Hugo Chumbita, Paz Gajardo, and Susana Gamba. 2004. Diccionario de Ciencias Sociales y Políticas. Buenos Aires: Ariel.
Furner, Jonathan and Anthony W. Dunbar. 2004. “The Treatment of Topics Relating to People of Mixed Race in Bibliographic Classification Schemes: A Critical Race-theoretic Approach.” In Knowledge Organization and the Global Information Society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK, edited by Ia C. McIlwaine. Advances in knowledge organization 9. Würzburg: Ergon Verlag, 115-20.
García Gutiérrez, Antonio. 2007. Desclasificado: Pluralismo Lógico y Violencia de la Clasificación. Barcelona: Anthropos.
Getty Research Institute. 2017. Art & Architecture Thesaurus.
González Casanova, Pablo. 1996. “Clasificaciones y Definiciones: Nota para un Bibliotecario.” Investigación Bibliotecológica 10, no. 20: 3-8.
Guimarães, Jose Augusto Chaves, Isadora Victorino Evangelista, Gabriele de Araújo Medeiros Luz, and Henrique Fiamengue Osawa. 2019. “A Dimensão Cultural da Organização do Conhecimento: Uma Análise de Comunidades Epistêmicas no Contexto Internacional da Ciência da Informação.” Scire 25, no. 1: 25-36.
Guimarães, Jose Augusto Chaves, Juan Carlos Fernández-Molina, Fabio Assis Pinho, and Suellen Oliveira Milani. 2008.
“Ethics in the Knowledge Organization Environment: An Overview of Values and Problems in the LIS Literature.” In Culture and Identity in Knowledge Organization: Proceedings of the Tenth International ISKO Conference 5-8 August 2008 Montréal, Canada, edited by Clément Arsenault and Joseph T. Tennis. Advances in knowledge organization 11. Würzburg: Ergon Verlag, 340-46.
Hartley, John. 2004. Communication, Cultural and Media Studies: Key Concepts. 3rd ed. New York: Routledge.
Hebdige, Dick. 1979. Subculture: The Meaning of Style. London: Methuen and Co.
Hjørland, Birger. 2002. “Domain Analysis in Information Science: Eleven Approaches – Traditional as Well as Innovative.” Journal of Documentation 58: 422-62.
Hjørland, Birger. 2013. “Theories of Knowledge Organization: Theories of Knowledge.” Knowledge Organization 40: 169-81.
Hjørland, Birger and Hanne Albrechtsen. 1999. “An Analysis of Some Trends in Classification Research.” Knowledge Organization 26: 131-39.
Hoggart, Richard. 1957. The Uses of Literacy: Aspects of Working Class Life. London: Chatto & Windus.
Hsieh, Hsiu-Fang and Sarah Shannon. 2005. “Three Approaches to Qualitative Content Analysis.” Qualitative Health Research 15, no. 9: 1277-88.
Kress, Gunther. 1989. “History and Language: Towards a Social Account of Linguistic Change.” Journal of Pragmatics 13, no. 3: 445-66.
Krippendorff, Klaus. 2004. Content Analysis: An Introduction to its Methodology. 2nd ed. Thousand Oaks: Sage Publications.
Kroeber, A. L. and C. Kluckhohn. 1952. “Culture: A Critical Review of Concepts and Definitions.” Papers of the Peabody Museum 47, no. 1: 1-223.
Kua, Eunice. 2004. “Non-western Languages and Literatures in the Dewey Classification Scheme.” Libri 54, no. 4: 256-65.
Kuper, Adam. 1999. Culture: The Anthropologists' Account. Cambridge: Harvard University Press.
Lee, Wan-Chen. 2015.
“Culture and Classification: An Introduction to Thinking about Ethical Issues of Adopting Global Classification Standards to Local Environments.” Knowledge Organization 42: 302-07.
Lee, Joel M. E. 1976. “E. Wyndham Hulme: A Reconsideration.” In The Variety of Librarianship: Essays in Honour of John Wallace Metcalfe, edited by W. B. Rayward. Sydney: Library Association of Australia, 101-103.
Olson, Hope A. 2007. “How We Construct Subjects: A Feminist Analysis.” Library Trends 56, no. 2: 509-41.
Olson, Hope A. and Dennis B. Ward. 1998. “Charting a Journey across Knowledge Domains: Feminism in the Decimal Dewey Classification.” In Structures and Relations in Knowledge Organization: Proceedings of the 5th International ISKO Conference 25-29 August 1998 Lille, France, edited by Widad Mustafa El Hadi, Jacques Maniez, and Steven A. Pollit. Advances in knowledge organization 6. Würzburg: Ergon Verlag, 238-44.
Pacey, Philip. 1989. “The Classification of Literature in the Dewey Decimal Classification: The Primacy of Language and the Taint of Colonialism.” Cataloging & Classification Quarterly 9, no. 4: 101-07.
Pérez, Joseph. 1993. Historia de una Tragedia: La Expulsión de los Judíos de España. Barcelona: Crítica.
Rodríguez Pastoriza, Francisco. 2006. Periodismo Cultural. Madrid: Síntesis.
Rose, Gillian. 1993. Feminism & Geography: The Limits of Geographical Knowledge. Minneapolis: University of Minnesota Press.
Shera, Jesse H. 1961. “Social Epistemology, General Semantics, and Libraries.” Wilson Library Bulletin 35, no. 3: 767-70.
Trivelato, Rosana Matos da Silva and Maria Aparecida Moura. 2016. “Alterity, Tolerance and Heterotopia: Repercussions on the Religion Science Representation in Bibliographic Classification Systems.” In Knowledge Organization for a Sustainable World: Challenges and Perspectives for Cultural, Scientific, and Technological Sharing in a Connected Society.
Proceedings of the Fourteenth International ISKO Conference 27-29 September 2016, Rio de Janeiro, Brazil, edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, and Vera Dodebei. Advances in knowledge organization 15. Würzburg: Ergon, 538-45.
UNESCO. 2001. Universal Declaration on Cultural Diversity.
Van Dijk, Teun A. 1999. “El Análisis Crítico del Discurso.” Anthropos 186: 23-36.
Williams, Raymond. 1958. Culture and Society. London: Chatto and Windus.
Wright, Susan. 1998. “The Politicization of 'Culture'.” Anthropology Today 14, no. 1: 7-15.

Maria Teresa Biagetti – Sapienza Rome University, Italy
Bibliographical Relationships in Knowledge Organization Systems
A Historical-Theoretical Perspective
Abstract: The study analyses the bibliographical relationships provided by some prominent cataloguing codes of the 19th century, namely the codes by Panizzi, Jewett and Cutter, with the aim of highlighting the deep organization of their cross-references. The bibliographical relationships in ancient catalogues are compared with those provided by the FRBR model and analysed in relation to the functionalities offered by the linked open data sets built by libraries. The study remarks that the LOD exposition of catalographic data by Libris, Datos.bne and Data.bnf does not provide all the relationships offered by the ancient catalogues.
1.0 Introduction
Bibliographic relationships are one of the central issues of Knowledge Organization and of cataloguing theory. The derivative relationship gives rise to different editions and translations of a work and to a further class of derivative textual works, such as amplifications, extractions, commentaries and adaptations, and it creates families of works from a common progenitor. The frequency and distribution of derivative works have been analysed by Smiraglia and Leazer (1999).
In addition to the descriptive, sequential, whole-part and accompanying relationships, Tillett also drew attention to derivative relationships, such as different versions of a work, editions, revisions, summaries, adaptations, new works based on an earlier work, and changes of genre such as dramatizations (Tillett 1989; 2001; Green 2001). Bibliographic relationships are also crucial in the new models for organizing bibliographic data, such as FRBR (IFLA 1998) and LRM (IFLA 2017). The analysis by Noruzi (2012) reveals an important, albeit not complete, congruence between the categories defined by FRBR and Tillett’s categorization. Additionally, Arsenault and Noruzi (2012) focused on Work-to-Work bibliographic relationships among Canadian publications and highlighted the frequency and percentage of Work-to-Work relationships in categories such as supplement, successor, transformation, and adaptation. RDA (American Library Association 2010) provides guidelines for setting up bibliographic relationships between related Works, covering the derivative, whole/part, and sequential relations; moreover, it offers lists of relationship designators to be used. In building data sets in LOD format, bibliographic ontologies or data models such as Bibo (The Bibliographic Ontology) and Bibframe (The Library of Congress 2016) are used; in these, bibliographic relationships are essential and are expressed as object properties.
2.0 Methodology
The main issue of this study is to pinpoint bibliographical relationships between bibliographic entities more exactly and to highlight the importance of thoroughly analysing the structure of the catalogues of prominent libraries of past centuries, because they provide a valuable source for studying the bibliographical categories and relationships that are also involved in the realization of modern retrieval tools.
The aim is to provide a more thorough model of bibliographic relationships to be used, in particular, in the LOD built by libraries, with the purpose of emphasizing the relationships not offered by the modern models. Therefore, a sample of portals of data sets in LOD format is analysed to identify the relationships used and provided for searching. A historical-theoretical survey of the most significant catalographic tools of the 19th century is presented, with the purpose of verifying the structures used to underline bibliographic relationships.
3.0 Bibliographic relationships in the catalogues of the 19th century
Some catalogues of the 19th century provide cross-references to connect Works and Expressions, Works and Authors, Authors and Texts, and Author and Author. In this paper, I consider only a few of them. In the British Museum’s Rules for the compilation of the catalogue (1839), issued within the first volume of the Catalogue of printed books in the British Museum (1841) and largely the result of Antonio Panizzi's efforts, Rules LXI-LXII provide a set of cross-references (some of them already offered by the British Museum catalogues of 1787 and 1813-1819) that link Authors and Works: editors, co-authors and authors of continuations, translations, comments and biographies. An in-depth analysis of the ancient cataloguing rules is presented by Biagetti (2001). The Report of 21 February 1839 addressed to the Trustees of the British Museum (Appendix n. 10 1850), which provides a synthesis of the catalographic regulations established by Panizzi himself, presents the network of references that should be realized to link Works, Texts and Authors, and to connect an author’s work to all the authors of the different texts and of the illustrations, to the editors and commentators, or to the person who continues the work.
Table 1 shows a synthesis of Rules LXI-LXII for cross-references, some examples from Appendix n. 10 (1850), p. 192, and my explanation in parentheses:
Table 1 Relationships and examples from Panizzi’s Rules for the British Museum
The editor’s name and the work edited: “Garret (William). See Floddon field. The battle, etc. 1822, 8°” (Garret is the editor of the edition of the work about the battle of Flodden Field).
The author’s name of a biography issued within an edition of a work, and the work’s title: “Campegius (Symphorianus). Arnaldi vita. See Arnaldus or Arnoldus de Villanova. Opera etc. 1520. fol” (Symphorien Champier issued Arnaldi vita within the 1520 edition of the works of Arnaldo da Villanova).
The co-authors’ names and the work’s author and title: “Fletcher (John). See Beaumont (F.). Comedies and tragedies. 1674. fol” (Fletcher and Beaumont are co-authors).
The author’s name of a continuation of a work, and the work’s author and title: “Gaertner (Carolus Fridericus). Supplementum carpologiae. See Gaertner (J.). De fructibus et siminibus. 1788-1805. 4°” (C. F. Gaertner is the author of the supplement, published in 1805, to Joseph Gaertner’s work).
The name of a translator of a work, and the work’s author and title: “Moir (George). See Schiller (F) Wallenstein. 1827. 8°” (George Moir is the English translator of Schiller’s work).
The name of a commentator of a work, and the work’s author and title: “Sullivan (Arabella). See Ogle (B.) Lady Dacre, Recollections…1833. 8°” (Recollections of a chaperon is a collection of stories attributed to Barbarina Ogle (Wilmot, Brand), Lady Dacre, an English poet and the mother of Arabella Sullivan; Sullivan is considered here to be the commentator of the work, but is probably its real author).
It is important to notice that semantic links are also provided: Rule LXIII connects the name of a person who is the subject of a biography with the name of the biographer; Rule LXV links the name of an author whose work has been analysed in a work by another author with the name of the latter. Table 2 shows a synthetic explanation and examples from Appendix n. 10, with my comment.
Table 2 Relationships and examples from Panizzi’s Rules for the British Museum
The person who is the subject of a biography, and the biographer: “Rousseau (Jean Jacques). Vie. See Barruel de Beauvert (A. J.) Vie de J. J. Rousseau etc. 1789. 8°” (J. J. Rousseau is the subject of the biography by Barruel de Beauvert).
The author whose work has been analysed in a work by another author, and the latter’s name: “Martialis (Marcus Valerius). See Calderinus (D.) Commentarii in M. 1474. 4°” (Calderinus is the author of the commentary).
Earlier, Antonio Panizzi had offered a similar structure of cross-references in the classed catalogue of scientific works of the Royal Society¹, published in 1836. In this case, references are used to link editors and commentators to the works, and the discovered authors of anonymous works to those works. However, the most relevant feature is the use of cross-references to index works published within other publications or within miscellaneous works, reports or comments issued together with the publication to which they are linked, and additions and supplements. Moreover, as this is a classed catalogue, works concerning more than one discipline are linked to each relevant class by cross-references. Table 3 shows a synthetic explanation, examples from the Royal Society Catalogue, 1836, and my comment.
Table 3 Relationships and examples from Panizzi’s Royal Society Catalogue
The author of a work inserted within another work, and the work: “Cusa or Cusanus (Nicolaus de). De quadratura circuli deque recti ac curvi commensuratione. See Regiomonte (J. M. de).
De triangulis omnimodis. Fol. 1533” (Cusanus issued his work as an addition to Regiomontanus’s work).
The author of a report related to a work, and the work: “Ampère (A. M.) Rapport sur une mémoire de M. Bérard. See Bérard (J. B.). Méthodes nouvelles pour déterminer les Racines etc. 4° 1818” (Ampère is the author of the report).
Charles Coffin Jewett (1852) published On the construction of catalogues of libraries, and their publication by means of separate, stereotyped titles. Jewett devised a system using stereotype plates to print catalogues of American libraries with uniform headings, under the direction of the Smithsonian Institution. To this end, Jewett prepared a cataloguing code for the Smithsonian’s library, based on the Rules of the British Museum. The code provides cross-references for all the authorial responsibilities in a work (translators, commentators, editors, continuators) and also from the name of any author whose work is contained in a collection. Semantic references are provided linking the name of a person who is the subject of a biography, or whose work is the subject of a commentary (even without the text), to the work. Table 4 shows some explanations and examples from Jewett’s work (1852).
¹ The Royal Society Library, “Archive Room”, Catalogue 1836, without title page. On the spine of the book: “R. S. LIB. CAT. 1836”.
Table 4 Relationships and examples from Jewett’s work
The editor’s name and the work edited: “Tacitus (Caius Cornelius). Cornelii Taciti opera. Ad codices antiqvos exacta et emendata commentario critico et exegetico illvstrata edidit Franciscvs Ritter ... V. 1–4. [With a biogr. and crit. preface.] Cantabrigiae, 1848. 8º.”
The author’s name of a biography issued within an edition of a work, and the work’s title: “Bentick (Lord George). See Disraeli (Benj.). Biography of Lord Geo. Bentinck.”
The co-authors’ names and the work’s author and title: “Sievrac (Jean Henri). See Cobbett (Wm.). Roman hist.
in French and English; the Fr. by J. H. Sievrac.” (The reference is to “Cobbett (William). Elements of the Roman history, in English and French, from the foundation of Rome to the battle of Actium; selected from the best authors, ancient and modern, with a series of questions ... The English by William Cobbett; the French by J. H. Sievrac. London, 1828. 12º”.)
The author’s name of a continuation of a work, and the work’s author and title: “Marleborough (Henry). Ancient Irish histories.—The chronicle of Ireland. By Henry Marlebvrrovgh; continued from the collection of Doctor Meredith Hanmer, in the yeare 1571. Dublin, 1809. 8º”.
The name of a translator of a work, and the work’s author and title: “Taylor (William). See Oriental hist. mss. in the Tamil language; transl. with annotations by Wm. Taylor.” (The reference is to “Oriental historical manuscripts, in the tamil language: translated; with annotations. By William Taylor, missionary. […] Madras, 1835”.)
The name of a commentator of a work, and the work’s author and title: “Apollodorus, of Athens. See Heyne (C. G.). Ad Apollodori Ath. bibliothecam notæ, etc.” (The reference is to “Heyne (Christian Gottlob). Ad Apollodori Atheniensis bibliothecam notae avctore Chr. G. Heyne cvm commentatione de Apollodoro argvmento et consilio operis et cvm Apollodori fragmentis. […] Goettingae, 1783. 8º”.)
The author of a work published in a collection, and the collection: “Spartianus (Ælius). See Historiae Augustæ scriptores. Ælius Spartianus”. (The reference is to “Historiae Augustæ scriptores VI. Ælius Spartianus. Julius Capitolinus. Ælius Lampridius. Vulc. Gallicanus. Trebell. Pollio. Flavius Vopiscus. Cum integris notis Isaaci Casauboni, Cl. Salmasii & Jani Gruteri. […] Lugduni Batav[orum], 1671. 8º”.)
The person who is the subject of a biography, and the biographer: “Alexander, the Great. See Curtius Rufus (Quintus).
De rebus gestis Alexandri Magni.”
In the Rules for a printed dictionary catalogue (1876), based on the cataloguing practice followed in setting up the first volume of the catalogue of the Boston Athenaeum Library, Charles Ammi Cutter established cross-references for all the authorships of a work, such as editors and translators, including designers, painters, cartographers and, in particular cases, engravers, as well as for commentaries, continuations and indexes of a work. Cutter’s Rules are well known, so a complete explanation of the categories adopted is not essential here; moreover, they are partly similar to Panizzi’s.
4.0 FRBR, LRM, EDM and bibliographic relationships
FRBR provides bibliographic relationships between the Entities of the first group (Works, Expressions, Manifestations and Items) in order to increase information and help users find linked Entities. Logical relationships are used to connect the Entities of the second group, Persons and Corporate bodies, to the Entities of the first group. Semantic relationships are provided to link Works that concern the same topic, and a work of criticism to the literary work it discusses. All the Entities of the three groups may be the subject of a Work. Considering Work-to-Work relations, the most significant relationships in FRBR are sequel, supplement, concordance, summarization, transformation and imitation. The relationships between Expressions of the same Work are Abridgement, Revision, Translation and Arrangement (music); between Expressions of different Works, there are relationships such as Successor, Supplement, Complement, Summarization, Adaptation, Transformation and Imitation.
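Read as data, each cross-reference in Tables 1-4 is a typed link from a secondary heading (an editor, a translator, a co-author, the subject of a biography, and so on) to the main entry under which a work is catalogued. The short Python sketch below is only an illustration and does not reconstruct any of the codes. It stores a few of the examples from Tables 1-3 as such links, prints them in the "Heading. See Main entry." form, and lists every main entry to which the reader is sent from a given heading. The relation labels and the normalized spellings of the headings are assumptions made for the example.

```python
from typing import List, NamedTuple, Tuple


class CrossReference(NamedTuple):
    """A typed 'See' reference from a secondary heading to a main entry."""
    heading: str     # name under which the reader looks (editor, translator, ...)
    relation: str    # illustrative label for the kind of link (an assumption)
    main_entry: str  # heading and short title under which the work is catalogued


# Examples adapted from Tables 1-3; the relation labels are assumptions.
REFERENCES: List[CrossReference] = [
    CrossReference("Garret (William)", "editor", "Floddon field. The battle"),
    CrossReference("Fletcher (John)", "co-author", "Beaumont (F.). Comedies and tragedies"),
    CrossReference("Moir (George)", "translator", "Schiller (F.). Wallenstein"),
    CrossReference("Rousseau (Jean Jacques)", "subject of biography",
                   "Barruel de Beauvert (A. J.). Vie de J. J. Rousseau"),
    CrossReference("Cusanus (Nicolaus de)", "work inserted within",
                   "Regiomonte (J. M. de). De triangulis omnimodis"),
]


def see_line(ref: CrossReference) -> str:
    """Render a reference in the 'Heading. See Main entry.' form of the printed catalogues."""
    return f"{ref.heading}. See {ref.main_entry}."


def entries_for(refs: List[CrossReference], heading: str) -> List[Tuple[str, str]]:
    """Return (relation, main entry) for every reference made from the given heading."""
    return [(r.relation, r.main_entry) for r in refs if r.heading == heading]


if __name__ == "__main__":
    for ref in REFERENCES:
        print(see_line(ref))
    print(entries_for(REFERENCES, "Moir (George)"))
```

Each link is stored together with its type. A contributor's name, such as a translator's, can therefore lead the reader to the work, and the kind of link can still be distinguished.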
In FRBR, “relationships are examined in the context of the entities defined for the model, i.e., they are analysed specifically as relationships that operate between one work and another, between one expression and another, between a manifestation and an item, etc.” (IFLA 2009, 55). It is important to bear in mind, however, that the network of relationships mainly concerns transformations of a Work. As for the attributes, the statement of responsibility is highlighted only at the Manifestation level: authors, translators, editors, compilers. Knowing an attribute of responsibility related to an Expression of a Work, for instance the editor of a particular edition of a Work, it is not possible to find the Work by running through the web of relationships. In LRM (IFLA 2017, 63), the second-level relationships for Works mainly include relations of subject, whole-part (component part), priority (logical continuation), complement (or companion), inspiration for (source of ideas) and transformation (change of literary form); for Expressions, they include relations of whole-part, derivation and aggregation; and for Manifestations, relations of whole-part and reproduction. In the EDM, the Europeana Data Model for Europeana Collections (Europeana 2017), bibliographic relationships are presented as properties, including “Is Derivative Of”, “Is Similar To”, “Is Next in Sequence To”, “Is Related To”, “Is Representation Of”, “Is Successor Of”, “Contributor”, “Creator”, “Is Replaced By”.
5.0 Data sets in LOD format
In this work I considered a small sample of the data sets provided by the Swedish Libris, the Biblioteca Nacional de España (Datos.bne) and the Bibliothèque Nationale de France (Data.bnf). I present a mapping between the model proposed by the rules for cross-references of the British Museum, Jewett and Cutter, and the data sets in LOD format realized by these influential libraries.
The aim is to emphasize that the LOD realizations of libraries could offer a more detailed and deeper organization of information by using a larger number of bibliographic relationships, following the model of the ancient catalogues.
Libris: The search functionalities of the Swedish OPAC, based on LOD technologies, make it possible to find all the works of an author (for instance, August Strindberg), including monographs and some articles about him. Searching for a work (for instance, Fröken Julie), the user can find all the editions in the original language and the translations, grouped by language.
Datos.bne (the beta version): The functionalities of the Spanish portal make it possible to find a work (for instance, Don Quijote de la Mancha) and all its different editions, including translations into other languages; they also allow users to find works about the work considered. Searching for an author (for instance, Miguel de Cervantes Saavedra), the user can find all the author’s works and the works about him, including some biographies and, where applicable, the works attributed to the author.
Data.bnf: The portal offers a large number of functionalities based on semantic web technologies. Searching for an author (for instance, Victor Hugo), the user finds all the works in order of decreasing date, the musical and iconographic works, manuscripts and archive documents, and the theatrical works for which he wrote the text. The works about Hugo are also listed, split into categories such as video, films, images and archive documents. Finally, works and persons linked to Hugo are listed: co-authors, designers, engravers, librettists. Selecting a textual work (for instance, Les Misérables) presents all the editions in French together with the films, recordings, theatrical performances based on the work, monographs about the work, documentaries, and virtual shows.
Furthermore, Data.bnf makes it possible to find a large number of authors linked to the work: editors and commentators, translators, actors, scenographers, producers, and many others.
6.0 Discussion
The analysis shows that in Libris, Datos.bne and Data.bnf, derivative relationships, such as the categories concerning supplements, continuations and abridgements of a work, are not thoroughly provided. By contrast, these relationships were suggested by Panizzi (albeit restricted to the continuation of a work), were followed by Jewett, and have been considered by FRBR. Moreover, the possibilities offered by the set of cross-references in the prominent catalogues of the 19th century are lacking. For instance, these data sets do not connect the person who is the subject of a biography with the name of the biographer (except for Datos.bne). Nor do they link works or reports inserted within another work to the host work, as Panizzi’s Catalogue of the Royal Society suggested. Data.bnf provides the largest number of connections based on the FRBR model and the most thorough set of bibliographic information. However, it does not emphasize works published within other publications or in miscellaneous works, as suggested by Panizzi. Table 5 shows a mapping between a selection of the relationships offered by the ancient cataloguing codes of the 19th century, on one side, and modern data models (FRBR/LRM) and the LOD data exposition of three prominent libraries, on the other.
Table 5 Mapping between modern data models/LODs and ancient catalogues’ relationships
Columns: FRBR-LRM, LIBRIS, DATOS.BNE, DATA.BNF, EDM
*The Editor / the work edited: x x
*The Author of a biography issued within a work / the work’s title
*Co-authors / the work’s title: x x
*The Author of a continuation of a work / the work: x x
*The Author of a translation of a work / the work: x x x x
*The Author of a comment on a work / the work: x
*The subject of a biography / the biographer: x
*The author whose work has been analysed in a work / the work: x x x x
**The author of a work inserted within another work / the work
**The author of a report related to a work / the work
***The author of a work published in a collection / the collection
* Panizzi’s Rules for the British Museum, Jewett’s code and, in part, Cutter’s Rules. ** Panizzi’s Royal Society Catalogue. *** Jewett’s code.
7.0 Conclusion
The abundance of bibliographic relationships that the ancient catalogues provide, through the skilful use of cross-references and of links among responsibilities and Works or Expressions, persuades us that it is essential to analyse the structure of important ancient catalogues more thoroughly. In some cases, ancient catalogues show an advanced model of bibliographical relationships, and some of the connections they allow are not provided even by the FRBR model. One example is Panizzi’s rule for the British Museum catalogue that links the name of an author whose work is commented on by another author with the name of the latter. Another is the rule on the use of cross-references to index works published within other publications or within miscellaneous works. The prominent ancient catalogues are a relevant source for studying the bibliographic relationships also involved in setting up modern tools for searching resources, such as those provided by linked open data technologies.
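One possible way of giving host-work links the prominence argued for in this conclusion is to express them directly as RDF statements. The minimal Python sketch below writes N-Triples in which a work inserted within another work, or a text published in a collection, is linked to its host with the Dublin Core property dcterms:isPartOf. The choice of that property, the example.org identifiers and the helper names are assumptions made for the illustration. They are not the vocabulary or URIs used by FRBR/LRM, EDM, Libris, Datos.bne or Data.bnf.

```python
from typing import Iterable, Iterator, Tuple

DCTERMS = "http://purl.org/dc/terms/"
BASE = "http://example.org/work/"  # hypothetical namespace, not a real library's URIs


def iri(local_id: str) -> str:
    """Build an absolute IRI in the hypothetical namespace, in N-Triples angle brackets."""
    return f"<{BASE}{local_id}>"


def literal(text: str) -> str:
    """Quote a string as an N-Triples literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def host_links(pairs: Iterable[Tuple[str, str, str, str]]) -> Iterator[str]:
    """Yield triples giving each part and host a title and linking the part to its host."""
    for part_id, part_title, host_id, host_title in pairs:
        yield f"{iri(part_id)} <{DCTERMS}title> {literal(part_title)} ."
        yield f"{iri(host_id)} <{DCTERMS}title> {literal(host_title)} ."
        yield f"{iri(part_id)} <{DCTERMS}isPartOf> {iri(host_id)} ."


# Examples from Tables 3 and 4: a work inserted in another work, a text in a collection.
PAIRS = [
    ("cusanus-de-quadratura", "De quadratura circuli",
     "regiomontanus-de-triangulis", "De triangulis omnimodis"),
    ("spartianus", "Ælius Spartianus",
     "historiae-augustae-scriptores", "Historiae Augustæ scriptores VI"),
]

if __name__ == "__main__":
    for line in host_links(PAIRS):
        print(line)
```

A report related to a work, the third case in Table 5, would need a different property. It is left out here because choosing one would be a further assumption.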
Semantic technologies and bibliographic ontologies should make it possible to highlight a larger number of connections between bibliographic entities. In particular, works inserted within other works, reports published within another work and works issued in collections should be given greater prominence.
References
American Library Association, Canadian Library Association, Chartered Institute of Library and Information Professionals (Great Britain), Joint Steering Committee for Development of RDA. 2010. RDA: Resource Description & Access. Chicago: American Library Association.
Appendix to the Report of the Commissioners Appointed to Inquire into the Constitution and Management of the British Museum, N. 10. 1850. Attached to the Report of the Commissioners Appointed to Inquire into the Constitution and Government of British Museum; with Minutes of Evidences, […] London: William Clowes and Sons.
Arsenault, Clément and Alireza Noruzi. 2012. “Analysis of Work-to-Work Bibliographic Relationships through FRBR: A Canadian Perspective.” Cataloging & Classification Quarterly 50: 641-652.
Biagetti, Maria Teresa. 2001. Teoria e Prassi della Catalogazione Nominale. I Contributi di Panizzi, Jewett e Cutter. Roma: Bulzoni.
Cutter, Charles A. 1876. “Rules for a Printed Dictionary Catalogue by Charles A. Cutter Librarian of the Boston Athenaeum.” In Public Libraries in the United States of America… Their History, Condition, and Management. Washington: Government Printing Office.
Europeana. 2017. Definition of the Europeana Data Model v5.2.8.
Green, Rebecca. 2001. “Relationships in the Organization of Knowledge: An Overview.” In Relationships in the Organization of Knowledge, edited by Carol A. Bean and Rebecca Green. Dordrecht: Kluwer Academic Publishers, 3-18.
IFLA. 1998. Functional Requirements for Bibliographic Records: Final Report. München: K.G. Saur.
IFLA. 2009. Functional Requirements for Bibliographic Records: Final Report.
Approved by the Standing Committee … As amended and corrected through February 2009. München: K.G. Saur.
IFLA. 2017. Library Reference Model. Consolidation Editorial Group of the IFLA FRBR Review Group … Edited by Pat Riva, Patrick Le Boeuf, and Maja Žumer. Revised after worldwide review. Not yet endorsed by the IFLA Professional Committee or Governing Board.
Jewett, Charles C. 1852. Smithsonian Report. On the Construction of Catalogues of Libraries, and Their Publication by Means of Separate, Stereotyped Titles. With Rules and Examples. Second edition. Washington: The Smithsonian Institution.
The Library of Congress. 2016. Bibliographic Framework Initiative. Model and Vocabulary 2.0.
Noruzi, Alireza. 2012. “FRBR and Tillett’s Taxonomy of Bibliographic Relationships.” Knowledge Organization 39: 409-416.
Rules for the compilation of the catalogue. 1839. In Catalogue of Printed Books in the British Museum. Volume 1. London: printed by order of the Trustees, 1841, v-ix.
Smiraglia, Richard P. and Gregory H. Leazer. 1999. “Derivative Bibliographic Relationships: The Work Relationship in a Global Bibliographic Database.” Journal of the American Society for Information Science 50, no. 6: 493-504.
Tillett, Barbara B. 1989. “Bibliographic Structures: The Evolution of Catalog Entries, References, and Tracings.” In The Conceptual Foundations of Descriptive Cataloging, edited by Elaine Svenonius. San Diego: Academic Press Inc., 149-165.
Tillett, Barbara B. 2001. “Bibliographic Relationships.” In Relationships in the Organization of Knowledge, edited by Carol A. Bean and Rebecca Green. Dordrecht: Springer, 19-35.
Ceri Binding – University of South Wales, UK
Claudio Gnoli – University of Pavia, Italy
Gabriele Merli – University of Pavia, Italy
Marcin Trzmielewski – Paul-Valéry University of Montpellier 3, France
Douglas Tudhope – University of South Wales, UK
Integrative Levels Classification as a Networked KOS
A SKOS Representation of ILC2
Abstract: There is a growing need to move knowledge organization systems (KOS) to online applications using Semantic Web technologies, in order to optimize indexing and searching. The present paper reports the representation of the Integrative Levels Classification (ILC) as a networked KOS, through the conversion of its second edition into the W3C standard SKOS (Simple Knowledge Organization System) format.
1.0 Introduction
In recent years, there has been an increased need to move knowledge organization systems (KOS) to online applications, such as library catalogues or research data repositories, to optimize indexing and searching. This need leads to representing traditional organizations of concepts in the syntax of Semantic Web technologies (Binding and Tudhope 2016; Trzmielewski and Gnoli 2019). As Peponakis et al. (2019) highlight, enumerative disciplinary classifications and subject headings are harder to represent as machine-processable and expressive semantic networks, while thesauri are more suitable for this purpose. It is therefore interesting to observe which solutions may be adopted for freely faceted classifications. These have a richer structure, closer to that of thesauri, and greater expressive power. The present paper reports on the representation of the Integrative Levels Classification (ILC) as a networked KOS, through the conversion of its second edition into the W3C standard SKOS (Simple Knowledge Organization System) format. The SKOS work on standards for thesauri and other knowledge organization systems grew out of the EC FP5 SWAD-Europe project.
The aim was to facilitate the migration of KOSs to the Semantic Web, and the work was carried forward by the W3C Semantic Web Best Practices and Deployment Working Group. The SKOS standard is published as a W3C Recommendation (W3C 2009). While SKOS was designed primarily with thesauri in mind, the availability of a relatively simple and accessible standard, expressible in RDF, has undoubtedly contributed to a wide interest in KOSs generally for Semantic Web and linked data application development, and also in the mapping (and linking) of one KOS to another. Representation in SKOS (and RDF) exposes KOSs to a wide potential audience of developers and users. This was the rationale for investigating a SKOS representation of ILC and making a machine-readable version of it available.

2.0 The features of ILC

The Integrative Levels Classification is a general KOS that has been under development since 2004 by an international team of scholars, including the authors of this paper. It draws on the tradition of faceted bibliographic classifications as developed by Ranganathan and the Classification Research Group. However, it differs from these mainly in listing phenomena, such as iron, lakes, trade unions or orchestras, instead of disciplines, such as chemistry, geography, economics or musicology (Gnoli 2016). This has important consequences, both theoretical and applied. One of them is that the same classes can be applied to bibliographic records (e.g. an article on bagpipes), museum objects (e.g. a bagpipe specimen), products (a bagpipe model offered on a manufacturer’s website) and so on, possibly combined with additional dimensions (“bagpipes, in articles”; “bagpipes, in museums”...: see Gnoli, Park and Ledl 2019). Another original feature is that ILC facets are not only special facets limited to a specific main class (e.g. the processes of biology, or the materials of mining), but also free facets that can be used to connect any two classes from the whole spectrum of knowledge (e.g.
cervid populations affected by road traffic). This kind of KOS, described by Austin (1976) as a freely faceted classification, offers an expressive power very close to that of a full language; at the same time, it implies a degree of syntactic complexity that is more demanding to represent carefully as linked data (see next section).

The first stable edition of the system (ILC1) was published in 2011 and consisted of 7,052 classes and facets. In September 2019, the edition then in development was frozen to become the second stable edition (ILC2), consisting of 10,845 classes and facets (Gnoli 2020). Changes from ILC1 include some renamed or moved main classes, better development of many subclasses, rearrangement and new definitions of various facet categories, a distinction between facets by nature (“wheels” as parts of vehicles) and facets by function (“vehicles, with wheels”), and other details of notation. These changes are described by Park et al. (2020). Specific fields for mapping between different ILC editions, and between ILC and the Dewey Decimal Classification, are provided in the ILC MySQL database.

ILC has a rich semantic structure with many components: basic classes (a-y), common facets (0-9), special facets (90-99), expected foci, deictics (A-Z), etc. While not all these structural components are provided for in the standard SKOS format, good compromises and solutions can be found for many of them (Gnoli et al. 2011).

3.0 Procedures

It was necessary to transform the working representation of ILC2 used by the editorial team into SKOS and RDF. We were able to draw on the previous experience of the Hypermedia Research Group at the University of South Wales in publishing national UK heritage thesauri (Heritage Data n.d.) as SKOS-based linked data (Binding and Tudhope 2016).
In order to generate a SKOS representation of ILC2, it was necessary to transform the relational expression of the ILC2 classification system (the MySQL database, as exported into CSV). This was achieved using the STELETO transformation tool developed previously (Binding, Tudhope and Vlachidis 2018). STELETO is an open-source, cross-platform command-line application that performs bulk transformation of delimited tabular text data into any other textual output format via a user-defined template (Binding 2019). Because of the complexity of a faceted classification system such as ILC, bespoke rules were added to the process for ILC purposes. For example, it was necessary to derive the hierarchical structure of the classification from the notational codes. Database fields for synonyms and descriptions of classes, of facet indicators and of foci were treated in different ways in order to obtain meaningful labels. Mappings to DDC classes are available for all ILC main classes and for most 3-digit subdivisions of DDC (000-999); these have been linked to OCLC DDC URIs. The following solutions have been adopted:

• records having purely alphabetic notation values (basic classes) are modelled as skos:Concept;
• records having purely numeric notation values (common facets) are modelled as rdf:Property, using the notation to determine the subproperty/superproperty relationships. Single-number notations (i.e. the fundamental categories) are subproperties of skos:related. These properties are modelled with domain and range specified as skos:Concept;
• records having a combination of alphabetic and numeric notation (special facets) are also modelled as rdf:Property, with the domain being the alphabetic part of the notation and the range being the value from the ‘foci’ field (if present, otherwise skos:Concept). E.g. for m981 (“aged years”) the domain is m (“organisms”), the range is an (“quantities”), and the superproperty is m98 (“developmental stage”).

4.0 A metaphysical question: what is the top class of all phenomena?

A basic SKOS relationship is skos:broader, by which any class can be related to its parent class. For example, wi “pots” has a skos:broader relationship to w “artifacts”; conversely, w “artifacts” has a skos:narrower relationship to wi “pots” and other subclasses. We generated these relationships automatically by exploiting the expressivity of ILC positional notation, where every additional digit means an additional rank of specificity. Once we came to the main classes expressed by a single letter, such as w “artifacts” or h “celestial bodies”, we had to decide whether these in turn have any skos:broader relationship. ILC2 also has a class * meaning “absolute, apeiron, the undifferentiated whole” that could be seen as the primordial top class of which all phenomena are subdivisions. This would have implied that all single-letter classes have a skos:broader relationship to class *. Draft visualizations of this architecture, however, looked confusing for typical users, as they displayed a very abstract, philosophical notion much more prominently than classes of more common usage. We therefore opted not to record such a relationship in the SKOS version of ILC2. On the other hand, this has stimulated interesting considerations on how very general philosophical notions, such as “things in themselves” or “phenomena”, may be expressed in ILC.
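The notation-driven rules above (the typing of records in Section 3.0 and the automatic skos:broader links of Section 4.0) can be sketched in Python. This is a hypothetical stand-in, not STELETO or its template language: it reads CSV rows with assumed column names (notation, label, foci), derives a parent by dropping the last notational character (an assumption that matches the examples given here, such as jUxf to jUx and m981 to m98, but may not cover every ILC notational device), stops at single-letter main classes instead of linking them to *, and fills one `string.Template` per record. The @prefix declarations are omitted.

```python
import csv
import io
import re
from string import Template

# Hypothetical CSV rows built from examples quoted in this paper; the column
# names (notation, label, foci) are assumptions, not the real ILC2 export schema.
SAMPLE_CSV = """notation,label,foci
h,celestial bodies,
hl,stars,
hlU,the Sun,
m981,aged years,an
"""

STATEMENT = Template("ilc2:$notation a $rdf_type ;\n    $body .")


def notation_kind(notation):
    """Classify a notation following the three modelling rules of Section 3.0."""
    if re.fullmatch(r"[A-Za-z]+", notation):
        return "basic_class"    # modelled as skos:Concept
    if re.fullmatch(r"[0-9]+", notation):
        return "common_facet"   # modelled as rdf:Property
    if re.fullmatch(r"[A-Za-z]+[0-9]+", notation):
        return "special_facet"  # rdf:Property whose domain is the alphabetic part
    raise ValueError(f"unsupported notation: {notation!r}")


def parent_notation(notation):
    """Assumed positional rule: dropping the last character gives the parent.
    Single-character notations get no parent (no link to the class *)."""
    return notation[:-1] if len(notation) > 1 else None


def label_literal(text):
    """English-tagged Turtle literal with backslashes and quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"@en'


def record_to_turtle(row):
    notation, label, foci = row["notation"], row["label"], row.get("foci") or ""
    kind = notation_kind(notation)
    parent = parent_notation(notation)
    if kind == "basic_class":
        rdf_type = "skos:Concept"
        pairs = [("skos:notation", f'"{notation}"'),
                 ("skos:prefLabel", label_literal(label))]
        if parent:
            pairs.append(("skos:broader", f"ilc2:{parent}"))
    else:
        rdf_type = "rdf:Property"
        pairs = [("rdfs:label", label_literal(label))]
        if kind == "common_facet":
            # Single-number notations (fundamental categories) specialize skos:related.
            pairs.append(("rdfs:subPropertyOf",
                          f"ilc2:{parent}" if parent else "skos:related"))
            pairs += [("rdfs:domain", "skos:Concept"), ("rdfs:range", "skos:Concept")]
        else:
            # Assumption: a superproperty is emitted only if the parent is itself a special facet.
            if parent and notation_kind(parent) == "special_facet":
                pairs.append(("rdfs:subPropertyOf", f"ilc2:{parent}"))
            domain = re.match(r"[A-Za-z]+", notation).group(0)
            pairs.append(("rdfs:domain", f"ilc2:{domain}"))
            pairs.append(("rdfs:range", f"ilc2:{foci}" if foci else "skos:Concept"))
    body = " ;\n    ".join(f"{predicate} {obj}" for predicate, obj in pairs)
    return STATEMENT.substitute(notation=notation, rdf_type=rdf_type, body=body)


def csv_to_turtle(text):
    """Transform every CSV row into one Turtle statement."""
    rows = csv.DictReader(io.StringIO(text))
    return "\n\n".join(record_to_turtle(row) for row in rows)


print(csv_to_turtle(SAMPLE_CSV))
```

In this sketch h receives no skos:broader, mirroring the decision not to link single-letter classes to *, while m981 becomes a property with domain ilc2:m, range ilc2:an and superproperty ilc2:m98.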
A provisional view, which could be implemented in ILC3, is that a top class meaning “being” can include both “absolute”, that is, noumena or things in themselves in philosophical terminology, and “phenomena”, meant as classes of differentiated named entities, in turn including all common main classes; of these, some are “real”, that is, actually existent, and can be specified by the deictic Y already available in ILC.

5.0 Publication details

The conversion created a total of 82,534 triples describing 8,990 concepts (including 52 top concepts) and 943 properties (modelled as hierarchical specializations of skos:related), a total of 9,933 items. URIs for individual classes and facets refer to the online schedules previously available through a PHP interface. To allow online navigation, however, these have dynamic URLs of the form (for class jUxf “Bay of Fundy” taken as an example). In the SKOS data, the dynamic form has been changed to a static one: this required setting a mod_rewrite redirect instruction on the Apache server, so that referenced URIs are automatically converted to the dynamic form and the appropriate information is displayed.

SKOS data for ILC2 are available from in Turtle, NTriples or RDF syntax. They are also available at the BARTOC (Basel Register of Thesauri, Ontologies and Classifications) repository at as part of the long-term experimentation with the application of ILC to BARTOC indexing (Ledl and Gnoli 2017). The SKOS version is published using Skosmos, an open-source tool developed at the National Library of Finland. This produces various flavours of RDF output, including the commonly used NTriples format.

6.0 Visualizations

A text-based visualization is available from BARTOC. Graphical displays can be created by importing the generated ILC2 NTriples RDF data file into the AllegroGraph Gruff tool.
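Since the NTriples file is what gets imported into Gruff, it may help to recall how simple that format is: one complete subject-predicate-object statement per line, with full IRIs in angle brackets and quoted literals. The minimal Python sketch below serializes a single SKOS concept in that form. `BASE` is a hypothetical placeholder, because the actual ILC2 URI pattern is not reproduced here, and the sketch does not reflect how Skosmos itself generates output.

```python
from typing import List, Optional

# BASE is a hypothetical namespace standing in for the real ILC2 URIs.
BASE = "http://example.org/ilc2/"
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def iri(value: str) -> str:
    """Wrap an absolute IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text: str, lang: Optional[str] = None) -> str:
    """Quote a literal, escaping characters that may not appear raw."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def concept_triples(notation: str, pref_label: str,
                    broader: Optional[str] = None) -> List[str]:
    """Return one N-Triples line per statement about a skos:Concept."""
    subject = iri(BASE + notation)
    triples = [
        (subject, iri(RDF_TYPE), iri(SKOS + "Concept")),
        (subject, iri(SKOS + "notation"), literal(notation)),
        (subject, iri(SKOS + "prefLabel"), literal(pref_label, "en")),
    ]
    if broader is not None:
        triples.append((subject, iri(SKOS + "broader"), iri(BASE + broader)))
    # Each statement ends with " ." and occupies exactly one line.
    return [f"{s} {p} {o} ." for s, p, o in triples]


for line in concept_triples("jUxf", "Bay of Fundy", broader="jUx"):
    print(line)
```

Because every line is self-contained, a file in this form can be concatenated, split or streamed into a triple store without any prefix context, which is one reason it is a convenient interchange format for tools such as Gruff.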
Some illustrative examples of ILC2 concepts and properties follow, showing the SKOS output (in Turtle syntax), the corresponding graph-based visualisations in Gruff, and the equivalent view from BARTOC. In the SKOS output we can observe the main URI for the ILC2 SKOS scheme at, the concept being visualised with its preferred label (“Bay of Fundy”), its notation (jUxf), its broader concept (“Atlantic Ocean”) and the latter’s notation.

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix ilc2: <…> .

<…> rdfs:label "Integrative Levels Classification (ILC)"@en ;
    skos:prefLabel "Integrative Levels Classification (ILC)"@en ;
    a skos:ConceptScheme .

ilc2:jUxf rdfs:seeAlso <…> ;
    skos:broader ilc2:jUx ;
    skos:notation "jUxf" ;
    rdfs:label "Bay of Fundy"@en ;
    skos:prefLabel "Bay of Fundy"@en ;
    skos:inScheme <…> ;
    a skos:Concept .

ilc2:jUx skos:notation "jUx" ;
    rdfs:label "Atlantic Ocean"@en ;
    skos:prefLabel "Atlantic Ocean"@en ;
    a skos:Concept ;
    skos:narrower ilc2:jUxf .

Figure 1: Skosmos output and Gruff visualisations and corresponding BARTOC view for Bay of Fundy

A more elaborate example, with the concept of polypteriformes, shows more of the hierarchical tree. It also includes a descriptive note (skos:note), which appears in the Gruff visualisation.

@prefix ilc2: <…> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ilc2:mqvh skos:notation "mqvh" ;
    rdfs:label "ray-finned fish"@en ;
    skos:prefLabel "ray-finned fish"@en ;
    a skos:Concept ;
    skos:narrower ilc2:mqvhb .

ilc2:mqvhb rdfs:seeAlso <…> ;
    skos:note "including bichirs, reedfish"@en ;
    skos:broader ilc2:mqvh ;
    skos:notation "mqvhb" ;
    rdfs:label "polypteriformes"@en ;
    skos:prefLabel "polypteriformes"@en ;
    skos:inScheme <…> ;
    a skos:Concept .

<…> rdfs:label "Integrative Levels Classification (ILC)"@en ;
    skos:prefLabel "Integrative Levels Classification (ILC)"@en ;
    a skos:ConceptScheme .

Figure 2: Skosmos output and Gruff visualizations and corresponding BARTOC view for polypteriformes

An even more complex example shows the variety of relationships within ILC2 and the connections between concepts.
For example, there are associative relationships (skos:related) between “stars” and “star clusters” and between “stars” and “plasma”.

@prefix ilc2: <…> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ilc2:h skos:notation "h" ;
    rdfs:label "celestial bodies"@en ;
    skos:prefLabel "celestial bodies"@en ;
    a skos:Concept ;
    skos:narrower ilc2:hl .

ilc2:hu skos:notation "hu" ;
    rdfs:label "star clusters"@en ;
    skos:prefLabel "star clusters"@en ;
    a skos:Concept ;
    skos:related ilc2:hl .

ilc2:hlb skos:notation "hlb" ;
    rdfs:label "subdwarf stars"@en ;
    skos:prefLabel "subdwarf stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hlg skos:notation "hlg" ;
    rdfs:label "giant stars"@en ;
    skos:prefLabel "giant stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hlj skos:notation "hlj" ;
    rdfs:label "supergiant stars"@en ;
    skos:prefLabel "supergiant stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hl skos:narrower ilc2:hlU, ilc2:hlb, ilc2:hlg, ilc2:hlf, ilc2:hlj, ilc2:hlh, ilc2:hld, ilc2:hlk, ilc2:hla ;
    skos:scopeNote "celestial bodies where nuclear fusion occurs"@en ;
    skos:related ilc2:hu, ilc2:gf ;
    skos:inScheme <…> ;
    rdfs:label "stars"@en ;
    rdfs:seeAlso <…> ;
    skos:prefLabel "stars"@en ;
    skos:notation "hl" ;
    a skos:Concept ;
    skos:broader ilc2:h .

ilc2:gf skos:notation "gf" ;
    rdfs:label "plasma"@en ;
    skos:prefLabel "plasma"@en ;
    a skos:Concept ;
    skos:related ilc2:hl .

ilc2:hlh skos:notation "hlh" ;
    rdfs:label "bright giant stars"@en ;
    skos:prefLabel "bright giant stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

<…> rdfs:label "Integrative Levels Classification (ILC)"@en ;
    skos:prefLabel "Integrative Levels Classification (ILC)"@en ;
    a skos:ConceptScheme .

ilc2:hla skos:notation "hla" ;
    rdfs:label "attributes of #hla"@en ;
    skos:prefLabel "attributes of #hla"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hlf skos:notation "hlf" ;
    rdfs:label "subgiant stars"@en ;
    skos:prefLabel "subgiant stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .
ilc2:hlk skos:notation "hlk" ;
    rdfs:label "hypergiant stars"@en ;
    skos:prefLabel "hypergiant stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hlU skos:notation "hlU" ;
    rdfs:label "the Sun"@en ;
    skos:prefLabel "the Sun"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

ilc2:hld skos:notation "hld" ;
    rdfs:label "dwarf stars"@en ;
    skos:prefLabel "dwarf stars"@en ;
    a skos:Concept ;
    skos:broader ilc2:hl .

Figure 3: Skosmos output and Gruff visualizations and corresponding BARTOC view for stars

7.0 Conclusion

Previous experience at the University of South Wales with the STELETO transformation tool has made it possible to handle the complex syntactic structures of a freely faceted classification such as ILC, and to produce an appropriate SKOS representation of them. On the other hand, full management of concept combinations according to ILC syntax is limited by the expressiveness of the SKOS format itself, as already discussed by Gnoli et al. (2011). Representing a freely faceted classification in SKOS is especially useful for data exchange in a standard international format, making it available on the Web as linked data for new applications. Tools for visualizing semantic structures are another benefit of conversion to SKOS. Special applications, such as the PHP scripts for navigating the ILC schedules available on the website, can further exploit its expressive power.

Acknowledgments

We are grateful to Andreas Ledl for publishing the data in BARTOC Skosmos, and to Riccardo Ridi for discussions on philosophical aspects of noumena and phenomena.

References

Austin, Derek. 1976. “The CRG Research Into a Freely Faceted Scheme.” In Classification in the 1970s: A Second Look, edited by Arthur Maltby. London: Bingley, 158-194.
Binding, Ceri. 2019. “STELETO: Convert Input Data to Any Textual Output Format Via a Custom Template.” GitHub.
Binding, Ceri and Douglas Tudhope. 2016.
“Improving Interoperability Using Vocabulary Linked Data.” International Journal on Digital Libraries 17: 5-21.
Binding, Ceri, Douglas Tudhope and Andreas Vlachidis. 2018. “A Study of Semantic Integration Across Archaeological Data and Reports in Different Languages.” Journal of Information Science 45: 364-386.
Gnoli, Claudio. 2016. “Classifying Phenomena, Part 1: Dimensions.” Knowledge Organization 43: 403-415.
Gnoli, Claudio. 2020. “Integrative Levels Classification.” In ISKO Encyclopedia of Knowledge Organization, edited by Birger Hjørland and Claudio Gnoli.
Gnoli, Claudio, Tom Pullmann, Philippe Cousson, Gabriele Merli and Rick Szostak. 2011. “Representing the Structural Elements of a Freely Faceted Classification.” In Classification and Ontology: Formal Approaches and Access to Knowledge: Proceedings of the International UDC Seminar, 19-20 September 2011, The Hague, edited by Aida Slavic and Edgardo Civallero. Würzburg: Ergon, 193-205.
Gnoli, Claudio, Ziyoung Park and Andreas Ledl. 2019. “Dimensional Analysis of Subjects: Indexing KOSs in BARTOC by Phenomena, Perspectives, Documents and Collections.” In 1st Low Countries ISKO Conference, Brussels.
Heritage Data. n.d. “Linked Data Vocabularies for Cultural Heritage.” https://www. blog/.
Ledl, Andreas and Claudio Gnoli. 2017. “Indexing KOSs in BARTOC by a Disciplinary and a Phenomenon-Based Classification: Preliminary Considerations.” In Faceted Classification Today: Theory, Technology and End Users: Proceedings of the International UDC Seminar, 14-15 September 2017, London, edited by Aida Slavic and Claudio Gnoli. Würzburg: Ergon, 109-117.
Park, Ziyoung, Claudio Gnoli and Daniele P. Morelli. 2020. “The Second Edition of the Integrative Levels Classification.” In NKOS Workshop at DCMI 2019, 25 September 2019, Seoul. Journal of Data and Information Science 5: 39-50.
Peponakis, Manolis, Anna Mastora, Sarantos Kapisakis and Martin Doerr. 2019.
“Expressiveness and Machine Processability of Knowledge Organization Systems (KOS): An Analysis of Concepts and Relations.” International Journal on Digital Libraries 20: 433-452.
Trzmielewski, Marcin and Claudio Gnoli. 2019. “Une Classification Interdisciplinaire Pour L’échange et la Médiation des Données Ouvertes de la Recherche.” In 12ème Colloque International D’ISKO-France: Données et Mégadonnées Ouvertes en SHS: De Nouveaux Enjeux Pour l’État et l’Organisation des Connaissances? 9-11 October 2019, Montpellier. Archive Ouverte HAL.
W3C. 2009. SKOS: Simple Knowledge Organization System Reference, edited by Alistair Miles and Sean Bechhofer.

Pino Buizza – Università degli studi di Firenze (Florence), Italy

Thesaurus and Heading Lists: Equivalences and Divergences

Abstract: The variety of indexing systems requires interoperability to meet global information needs. In mapping projects and the related literature, the focus is predominantly on equivalence relationships between terms. Starting from examples of terms mapped from the Thesaurus of Nuovo soggettario to LCSH and Rameau, the semantic relationships of terms are explored to verify correspondences and divergences among related terms. The different structures of the vocabularies examined lead to semantic networks that are not parallel. Specific and general remarks follow, in light of ISO 25964:2011-2013 and recent revisions.

1.0 Aim and background

The great variety of indexing systems, together with the use of different languages, meets the specific needs of their patrons. However, access to information on a global scale today requires interoperability, a widely studied theme, and mapping is the way to reach data coming from different sources. Many mapping tests have been carried out and a rich literature is available on the matter.1 However, the focus is predominantly, if not exclusively, on equivalence relationships between concepts/terms. This approach creates precise maps among single nuclei.
However, it ignores the semantic relationships traditionally and widely represented in controlled vocabularies, namely hierarchical and associative relationships. Moreover, it loses sight of the overall correspondence between the systems involved. What happens if we explore their semantic networks? Do we find parallel or diverging nets? Do the correspondences continue step by step, or stop at the starting points? This paper addresses this issue starting from mappings of the Thesaurus of Nuovo soggettario2 (ThNS, source vocabulary) to Library of Congress Subject Headings3 (LCSH) and to Répertoire d’autorité-matière encyclopédique et alphabétique unifié4 (Rameau) as target vocabularies; all of them are general in scope and produced by national bibliographic agencies. Some typical examples of mapping are reported, showing the equivalences and the most frequent divergences between the semantic networks, and providing a starting point for explaining the divergences on the basis of the different features of the indexing languages or of their application criteria. The paper offers some overall remarks on mappings and interoperability between different indexing languages and suggests possible alternative solutions drawn from ISO 25964:2011-2013, which is a sound reference for recognizing and representing equivalences and their degree, for stating meaningful and useful links where there is no equivalence, and for managing non-parallel systems.

1 To mention only a few papers: Hudon 1997; Doerr 2001; Riesthuis 2003; Zeng and Chan 2004; Jacobs, Mengel, and Müller 2010; Binding and Tudhope 2016; Balakrishnan, Voß, and Soergel 2018; Kempf 2018; Zeng 2019.

Nuovo soggettario is an analytic-synthetic indexing language produced by the National Central Library in Florence and adopted by Italian libraries (Biblioteca nazionale centrale di Firenze 2006).
Here, we are interested in the Thesaurus component, which is devoted to the control of vocabulary and of semantic relationships between concepts. It is clearly distinct from the rules for concept analysis and string construction, and from the authority file of assigned strings. In this separation of syntax from semantics, ThNS conforms to the stipulations of ISO 25964-1:2011, and it can function autonomously in postcoordinated indexing. Terms are not divided into headings and subdivisions, and there are no complex subjects. Only topical concepts are included; terms that can also serve as genre/form terms are included without distinction. Geographic terms, proper names and titles are not included. The structure of ThNS is not based on disciplines but on four macrocategories (Agents, Actions, Things, Time), divided into thirteen semantic categories (Organisms, Organizations, Persons and groups; Activities, Disciplines, Processes; Matter, Objects, Space, Tools, Forms; Structures; Time). Each of these categories is organized into facets and sub-facets, also using node labels. Within these categories, hierarchical relationships between concepts (BT and NT) and equivalence relationships between terms (USE, UF) are arranged without deviations. Connections between concepts belonging to different categories are recorded as associative relationships (RT). The use of polyhierarchy is limited to a few controlled conditions. These features make ThNS different from other widespread indexing systems such as LCSH and Rameau, but they do not prevent the mapping of correspondences useful for searching across different systems. ThNS mapping has reached almost 14,000 equivalences with LCSH and 12,000 with Rameau, and it is still in progress. The equivalence mappings from ThNS are all recorded in SKOS as ‘closeMatch’, even when the equivalence could be an ‘exactMatch’.
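A minimal Python sketch of this recording convention may make it concrete. It assumes hypothetical prefixed names (thns:, lcsh:) in place of the real ThNS and LCSH identifiers, which are not given here, and uses pairs from the Smartphone example of Section 2.0. Every pair is written as skos:closeMatch whatever its actual degree of equivalence; in line with the restriction to simple one-to-one mappings described in the sentences that follow, any term appearing in more than one pair is rejected. It illustrates the convention only and is not the workflow used in Florence.

```python
from collections import Counter

# Hypothetical prefixed names standing in for real ThNS and LCSH identifiers.
THNS_TO_LCSH = [
    ("thns:Smartphone", "lcsh:Smartphones"),
    ("thns:Telefoni_cellulari", "lcsh:Cell_phones"),
    ("thns:Telefoni", "lcsh:Telephone"),
]


def check_one_to_one(pairs):
    """Reject any source or target term that appears in more than one mapping."""
    sources = Counter(source for source, _ in pairs)
    targets = Counter(target for _, target in pairs)
    repeated = ([term for term, n in sources.items() if n > 1]
                + [term for term, n in targets.items() if n > 1])
    if repeated:
        raise ValueError(f"terms mapped more than once: {repeated}")


def close_match_statements(pairs):
    """Emit every mapping as skos:closeMatch, even if it could be exactMatch."""
    check_one_to_one(pairs)
    return [f"{source} skos:closeMatch {target} ." for source, target in pairs]


for statement in close_match_statements(THNS_TO_LCSH):
    print(statement)
```

Recording everything as closeMatch is the safer choice in SKOS, since skos:exactMatch is transitive and would propagate equivalence across chained vocabularies, whereas skos:closeMatch is not.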
Compound equivalence, as foreseen in ISO 25964-2:2013, 8.3, is not adopted, neither intersecting (EQ+) nor cumulative (EQ|). Mappings at broader or narrower levels (BM or NM) are not adopted either, so each term is mapped to only one term of the same semantic category in the target vocabulary, and there are no double mappings from or to one term. The equivalences recorded in LCSH as ‘closely matching concepts’ with ThNS are the reciprocals of the mappings made in Florence; unfortunately they are incomplete, as they have not been updated recently. In Rameau, Italian terms do not yet appear, but in they are now included among the resources ‘sur le Web: notice correspondante dans Le Nuovo Soggettario’ and also among the variants (autres formes du thème) with the qualification ‘italien’. Regardless of the technical remarks that follow, future reciprocal agreements are essential to the effectiveness of this work, both for users’ searching and for managing updates across the vocabularies.

Mapping activity has shown cases of no correspondence (a concept represented in one vocabulary does not appear in the other) and of inexact correspondence (for instance, due to a different level of specificity). These mismatches can be due to language differences, different context and literary warrant, or different structure and application criteria of the indexing languages considered (for instance, the inclusion of complex subjects, requiring a double mapping from two terms of ThNS, or a different granularity). Some representative cases of difficult correspondence between single concepts have been reported previously (Buizza 2019).

2.0 Mapping concepts and exploring relationships

This study focuses on six concepts represented by equivalent terms in the three vocabularies. Their hierarchical chains and related terms have been explored in order to verify correspondences and differences in their semantic networks.
The sample tries to cover the most typical and interesting cases, without any claim to completeness. No quantitative survey was carried out to measure the incidence of unsatisfactory results.

The first example is a globally widespread product, the smartphone. ThNS records it as Smartphone (not translated), and its equivalents are Smartphones in LCSH and Smartphones in Rameau. Looking at the semantic relationships in the three vocabularies, we immediately see some differences, beginning with the hierarchical ones (see Table 1). In ThNS, because of the monohierarchical choice, we find only one BT, Telefoni cellulari, even though the object is both a telephone and a computer. The term representing the second hierarchy, Palmari, is linked by an associative relationship (RT). In LCSH polyhierarchy is applied, and we find two BTs: Cell phones and Pocket computers. In ThNS there is no NT, owing to the general choice of not recording proper names in the thesaurus, whether brand names or names of products. In LCSH some models from different brands are recorded as NTs (for instance Samsung Galaxy S (Smartphone)). Therefore, only one equivalence can be set at the upper level (Telefoni cellulari EQ Cell phones), and no equivalence can be set at the lower level.

Table 1. The case for Smartphone (simplified); the double hierarchy is split. The superscripts mark recorded closeMatches.
ThNS
··· BT Telefoni⁵
RT Radiotelefonia⁴
·· BT Radiotelefoni
· BT Telefoni cellulari²
Smartphone¹
--------------------------
BT Computer portatili⁸
RT Palmari⁷
Smartphone¹

LCSH
·· BT Telephone⁵ UF Telephone service, Telephones
· BT Cell phones²,³
Smartphones¹
· NT Samsung Galaxy S (Smartphone)⁶
--------------------------
·· BT Portable computers⁸
· BT Pocket computers⁷
Smartphones¹

Rameau
·· BT Téléphone⁵, Radiotéléphonie⁴, Radiocommunications mobiles, Systèmes de communication sans fil
· BT Téléphonie mobile³ EP Téléphones cellulaires
Smartphones¹
· NT Samsung Galaxy (Smartphones)
·· NT Samsung Galaxy S (Smartphone)⁶
--------------------------
·· BT Ordinateurs portatifs⁸
· BT Ordinateurs de poche⁷
Smartphones¹

The comparison with Rameau gives similar results. In Rameau, two BTs are recorded, Téléphonie mobile and Ordinateurs de poche, and some NTs for specific brands, with relationships parallel to those of LCSH. However, there are two differences. First, the BT Téléphonie mobile shifts the semantic category from objects to an activity; or rather, the concepts of activity and of object have been merged under one term, as shown by its non-preferred forms in Rameau (e.g. EP Téléphones cellulaires). Because of its category distinction, ThNS cannot set an equivalence with this term, as the equivalence would cover only the ‘object’ component. Parallelism with LCSH is also lost: Téléphonie mobile corresponds to both Cell phones and Cell phone systems, through two distinct relationships. The second difference is how the two heading lists link the proper names of the smartphones to the noun term. LCSH links them directly as NTs. Rameau links them in two steps, through an intermediate term for the brand: Smartphones NT Samsung Galaxy (Smartphones) NT Samsung Galaxy S (Smartphone). The last term has as its equivalent the homograph recorded in LCSH, while the intermediate term has no equivalent in LCSH.
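The step-by-step comparison of hierarchical chains can be made mechanical. The Python sketch below is a hypothetical hand-encoding of part of Table 1 (ThNS and LCSH only; Rameau and the second LCSH BT of Cell phones are omitted for brevity): it walks up the BT links in each vocabulary and, for every ThNS ancestor of Smartphone, looks up the recorded closeMatch and that term's distance in the LCSH chain. It is an illustration of the kind of check involved, not a tool used in this study.

```python
from collections import deque

# Hypothetical hand-encoding of part of Table 1 (ThNS and LCSH columns only).
THNS_BT = {
    "Smartphone": ["Telefoni cellulari"],
    "Telefoni cellulari": ["Radiotelefoni"],
    "Radiotelefoni": ["Telefoni"],
    "Palmari": ["Computer portatili"],
}
LCSH_BT = {
    "Smartphones": ["Cell phones", "Pocket computers"],
    "Cell phones": ["Telephone"],
    "Pocket computers": ["Portable computers"],
}
# Recorded closeMatches from ThNS to LCSH (superscripts 1, 2, 5, 7, 8 in Table 1).
CLOSE_MATCH = {
    "Smartphone": "Smartphones",
    "Telefoni cellulari": "Cell phones",
    "Telefoni": "Telephone",
    "Palmari": "Pocket computers",
    "Computer portatili": "Portable computers",
}


def ancestors(term, bt):
    """Breadth-first walk up the BT links; returns {ancestor: distance}."""
    depth = {}
    queue = deque([(term, 0)])
    while queue:
        current, d = queue.popleft()
        for parent in bt.get(current, []):
            if parent not in depth:
                depth[parent] = d + 1
                queue.append((parent, d + 1))
    return depth


def compare_chains(source_term):
    """For each ThNS BT ancestor, report its closeMatch and that term's LCSH
    distance (None if unmapped or not on the LCSH BT chain); also list LCSH
    ancestors that no ThNS BT ancestor maps onto."""
    src = ancestors(source_term, THNS_BT)
    tgt = ancestors(CLOSE_MATCH[source_term], LCSH_BT)
    rows = []
    for term, d in sorted(src.items(), key=lambda item: item[1]):
        match = CLOSE_MATCH.get(term)
        rows.append((term, d, match, tgt.get(match)))
    reached = {CLOSE_MATCH.get(term) for term in src}
    unmatched = [term for term in tgt if term not in reached]
    return rows, unmatched


rows, unmatched = compare_chains("Smartphone")
for row in rows:
    print(row)
print("LCSH ancestors without a ThNS BT counterpart:", unmatched)
```

With these data the result mirrors the discussion: Telefoni cellulari and Cell phones both sit one step above the smartphone terms; Radiotelefoni has no counterpart; Telefoni is three steps up in ThNS, while Telephone is two steps up in LCSH; and Pocket computers and Portable computers are LCSH ancestors that ThNS reaches only through RT Palmari.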
We now follow the hierarchical chains to the upper levels. In ThNS, Telefoni cellulari has BT Radiotelefoni, which has BT Telefoni. In LCSH, Cell phones has BT Telephone (in the singular, with the variants Telephone service and Telephones, and thus a double category value) and also the complex subject Radio—Transmitters-receivers. ThNS sets an equivalence between Telefoni and Telephone, even though the Italian word does not include the telephone service. There is no equivalent for Radiotelefoni, but the RT Radiotelefonia has the equivalent Radiotelephone, which covers both service and devices (its NT Walkie-talkies has a homograph in ThNS recorded as NT of Radiotelefoni). In Rameau, Téléphonie mobile has four BTs: Téléphone, Radiotéléphonie, Radiocommunications mobiles and Systèmes de communication sans fil; but the last term has the second and third as NTs, besides Téléphonie mobile, and the second term is also an NT of the first. This is probably an entry to be updated, as there are some in every vocabulary. The ThNS equivalences link Telefoni and Radiotelefonia to Téléphone and Radiotéléphonie, and the English terms are likewise linked to the French ones. At this level the correspondences between the three vocabularies are restored, but the paths from smartphones to this point differ, and the exact correspondence between the smartphone terms does not hold, since ThNS refers to a single category while LCSH and Rameau merge object and function in one meaning. Consistent with its monohierarchical choice, ThNS reaches the second hierarchy through an associative relationship: Smartphone RT Palmari. The equivalents of Palmari are Pocket computers and Ordinateurs de poche. Their respective BTs, Computer portatili, Portable computers and Ordinateurs portatifs, are equivalent. At the upper levels we find equivalence across the three languages.
But in LCSH and Rameau the path back down to smartphones stays within one hierarchical chain of NT relationships, while in ThNS it is necessary to pass through the associative relationship Palmari RT Smartphone. The complexity and divergence of these semantic networks are due, at least in part, to the double function of the object considered.

Another example is the simple concept of ‘date’, the fruit, seen from different points of view. In ThNS the term Datteri has BT Frutta, whose BT is Alimenti, and RT Frutti, a term whose BT is the node label [Organi e parti di piante], which has BT Piante. In accordance with the meaning of the Italian words, Frutti has a scope note assigning it the botanical meaning, while Frutta is used for works about fruit as food. In LCSH, the equivalent Dates (Fruit) has two BTs: Date palm products (with the hierarchy BT Palm products BT Plant products) for the economic side, and Fruit, for the food, agricultural and botanical meanings, as shown by its five BTs: Food, Food crops, Horticultural crops, Horticultural products, and Plants. In Rameau, the equivalent Dattes has BT Fruit, whose scope note states both botanical and food meanings. These are shown together with the economic meaning in its three BTs: Diaspores (botanique), Plantes comestibles, and Produits horticoles. In LCSH and Rameau we also find Cooking (Dates) and Cuisine (dattes), where the activity of cooking is qualified by the specific food, a method that collects a long sequence of terms, one for each type of food, under the same first word. These complex terms can have no correspondence in Nuovo soggettario, where strings are built syntactically outside the thesaurus. In this case the concept is represented by the term for the food combined with a form term expressing the purpose of the document: Datteri-Ricette.
This example confirms that adopting polyhierarchy or not is an important factor, that scope notes may be crucial for stating an equivalence, and that the presence of complex subjects creates a difference between the vocabularies that is difficult to manage. Even when a concept has neither different meanings nor different contexts, the equivalence of its terms does not guarantee that the equivalence persists through the other steps of the semantic networks.

Looking at concepts representing a set of individuals, for instance deities, and considering a particular historical expression, say Greek deities, we find the equivalent terms Divinità greche, Gods, Greek, and Dieux grecs. In ThNS the subordinate terms are two subsets, Muse and Ninfe, while in LCSH we find some subsets of a different kind (e.g. Gods, Minoan, referring to specific areas) and a number of proper names (e.g. Chaos (Greek deity)), but not the well-known gods of Olympus (recorded in the Library of Congress Name Authority File). In Rameau we find some subsets (e.g. Dieux minoens), the female collective term (Déesses grecques) and an exhaustive list of proper names of deities (e.g. Aphrodite (divinité grecque)). Nuovo soggettario’s choice to manage the proper names of individuals through the guidelines of the Manuale explains this isolated position of ThNS. However, the other vocabularies do not agree with each other either: the distribution of subordinate sets differs, and LCSH places some gods among names of persons. There are misalignments in superordinate relationships too. The Italian term has BT Divinità, the English one has no BT, and the French one has as BTs two fields of study, Mythologie grecque and Religion grecque. The equivalent terms for Divinità are Gods and Dieux. Divinità has NTs for the deities of various peoples and Divinità femminili, Divinità marine, Divinità salutari, etc. Gods has NTs for deities of nature (e.g.
Water gods) and of specific religions (e.g, Hindu gods, with NT for some proper names), except for classical religions, that are orphan, and has BT Mythology, Classical. Dieux has NTs for the deities of phenomena (e.g. Dieux des vents), that can have NTs for proper names, like Vayu (divinité hindoue), while the terms qualified by people or religion have BT for the appropriate mythology and/or religion (e.g. Dieux hindous BT Hindouisme, Mythologie hindoue). Other differences in semantic relationships can be found in the category of persons. Donatori di sangue has equivalents Blood donors and Donneurs de sang (see Table 2). The term in ThNS has BT the node label [Persone secondo il comportamento], that has BT Persone. In LCSH the immediate BT is Persons. This is not a small difference: Persone has three node labels (according to activity and to conditions, in addition to behaviour) and each of them has NTs for other node labels in a widely faceted articulation. Under Persons a long list of terms represents specific types of persons (e.g, Saints, Slaves, Travelers), arranged alphabetically. In a different way, in Rameau Donneurs de sang has BT Donneurs d’organes, that has no BT, the same as the equivalent terms of the above examples: Saints, Esclaves, Voyageurs. As a result, the terms for categories of persons are fragmented and neither systematized in a pyramid as in ThNS, nor collected under a comprehensive top term as in LCSH. There are some groupings, for instance under Catégories socio-professionnelles (with hierarchies like: NT Commerçants NT 64 Libraires), and also terms with a BT. But some BTs might have moved to another semantic category, for instance, Personnes remariées has BT Remariage. Table 2. The case for Blood donors and Persons. The superscripts mark recorded closeMatches. 
ThNS
Persone²
· NT [Secondo l’attività]
· NT [Secondo la condizione]
·· NT [Secondo la condizione sociale]
··· NT Schiavi³
· NT [Secondo il comportamento]
·· NT Donatori di organi⁶
·· NT Donatori di sangue¹
·· NT Viaggiatori⁴
·· NT [Secondo la fede e le convinzioni religiose]
··· NT Santi⁵

LCSH
Persons²
· NT Slaves³
· NT Organ donors⁶
· NT Blood donors¹
· NT Travelers⁴
· NT Saints⁵

Rameau
-- (no top term)
Esclaves³
Donneurs d’organes⁶
· NT Donneurs de sang¹
Voyageurs⁴
Saints⁵

A term for a discipline, Dermatologia, has the equivalents Dermatology and Dermatologie (see Table 3). In ThNS the broader term Medicina is reached through the interposed node label [Medicina applicata a specifici organi, apparati, sistemi, funzioni]. Two other node labels sit alongside it in the same array: [Medicina applicata a categorie di persone] and [Medicina applicata a specifiche attività]. All three are recorded as NTs under Medicina specialistica. The result is a faceted distribution of the branches of medicine. In LCSH and Rameau there is a direct relationship to the general discipline, BT Medicine and BT Médecine respectively. In the subordinate hierarchy, ThNS has no NTs, and the term Dermatologia veterinaria is recorded as associated (RT). Under a different interpretation, its equivalents Veterinary dermatology and Dermatologie vétérinaire are recorded as NTs. Among the other NTs for specialities there are also terms that are neither members nor parts of the superordinate concept. In LCSH these include agents (Dermatologists) and a technique (Radioisotopes in dermatology). In Rameau they include a vedette construite for an activity (Peau-Maladies-Soins infirmiers).
Table 3. The case for Dermatology. The superscripts mark recorded closeMatches.
ThNS
··· BT Medicina²
·· BT Medicina specialistica
· BT [Medicina applicata a organi…]
Dermatologia¹
RT Dermatologia veterinaria³

LCSH
· BT Medicine²
Dermatology¹
· NT Veterinary dermatology³
· NT Pediatric dermatology⁴
· NT Dermatologists
· NT Radioisotopes in dermatology

Rameau
· BT Médecine²
Dermatologie¹
· NT Dermatologie vétérinaire³
· NT Dermatologie pédiatrique⁴
· NT Peau-Maladies-Soins infirmiers

Another disciplinary term, Diritto internazionale, and its equivalents International law and Droit international, show the same kind of difference at the superordinate levels. LCSH and Rameau relate them directly to Law and Droit, while ThNS interposes a node label: BT [Diritto secondo la materia] BT Diritto. The subordinate Italian hierarchy has specific branches (e.g. Diritto comunitario, Diritto internazionale marittimo), and associative relationships link the objects, tools and activities of the discipline. In LCSH and Rameau there are many NTs, covering any kind of concept that can be considered to fall within the semantic area of the discipline. These include complex or syntactically constructed terms (e.g. Islands-Law and legislation or Iles-Droit) and parenthetical terms (e.g. Missing persons (International law) and Personnes disparues (droit international)). The sets in the two languages do not fully overlap: some terms have no equivalent in the other language (e.g. Women (International law) or Animaux (droit international)). In ThNS a parenthetical qualifier is allowed only to disambiguate homographs, and normally not by naming the discipline. ThNS treats concepts expressed with parenthetical terms in LCSH and Rameau as syntactic relationships between distinct concepts. These are handled according to the rules for string construction, unless they are pleonastic.
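The observation that an equivalence between two terms does not carry over to the rest of the semantic network can be checked mechanically for every pair of mapped terms. The sketch below is hypothetical. It encodes a subset of Table 3, treats the ThNS node label as an ordinary hierarchical level, and assumes that the closeMatches marked by superscripts 1 to 3 form a one-to-one map from ThNS to LCSH. For each pair of mapped ThNS terms, it compares their relationship (BT, NT or RT, and the number of hierarchical steps) with the relationship between their LCSH counterparts.

```python
from collections import deque
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Relation = Tuple[str, int]  # (kind of relationship, number of hierarchical steps)

# Table 3, ThNS side. The node label is treated as an ordinary hierarchy level.
THNS_BT: Dict[str, List[str]] = {
    "Dermatologia": ["[Medicina applicata a organi…]"],
    "[Medicina applicata a organi…]": ["Medicina specialistica"],
    "Medicina specialistica": ["Medicina"],
}
THNS_RT: Set[FrozenSet[str]] = {frozenset({"Dermatologia", "Dermatologia veterinaria"})}

# Table 3, LCSH side (subset).
LCSH_BT: Dict[str, List[str]] = {
    "Dermatology": ["Medicine"],
    "Veterinary dermatology": ["Dermatology"],
}
LCSH_RT: Set[FrozenSet[str]] = set()

# closeMatches marked by superscripts 1-3 in Table 3, assumed one-to-one.
CLOSE_MATCH: Dict[str, str] = {
    "Dermatologia": "Dermatology",
    "Medicina": "Medicine",
    "Dermatologia veterinaria": "Veterinary dermatology",
}


def ancestors(term: str, bt: Dict[str, List[str]]) -> Dict[str, int]:
    """Breadth-first walk up the BT links: ancestor -> fewest steps to reach it."""
    found: Dict[str, int] = {}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        for parent in bt.get(current, []):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found


def relation(x: str, y: str, bt: Dict[str, List[str]],
             rt: Set[FrozenSet[str]]) -> Relation:
    """How y relates to x within one vocabulary: BT, NT, RT or none."""
    up = ancestors(x, bt)
    if y in up:
        return ("BT", up[y])
    down = ancestors(y, bt)
    if x in down:
        return ("NT", down[x])
    if frozenset({x, y}) in rt:
        return ("RT", 1)
    return ("none", 0)


def misalignments(mapping, a_bt, a_rt, b_bt, b_rt):
    """Pairs of mapped terms whose relationship differs between the two vocabularies."""
    report = []
    for a1, a2 in combinations(sorted(mapping), 2):
        ra = relation(a1, a2, a_bt, a_rt)
        rb = relation(mapping[a1], mapping[a2], b_bt, b_rt)
        if ra != rb:
            report.append((a1, a2, ra, rb))
    return report


if __name__ == "__main__":
    for row in misalignments(CLOSE_MATCH, THNS_BT, THNS_RT, LCSH_BT, LCSH_RT):
        print(row)
```

For Table 3, all three pairs differ:
- for the veterinary term, RT in ThNS against NT in LCSH;
- between Dermatologia and Medicina, three steps against one;
- between Dermatologia veterinaria and Medicina, no relationship in ThNS against an indirect BT in LCSH.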
The term Intervento (Diritto internazionale) has the same meaning as Intervention (International law) and Intervention (droit international). It is a non-preferred term, to be expressed by a combination of terms: Intervento (Diritto internazionale) USE+ Diritto internazionale, Intervento militare. No equivalence is possible from the Italian terms to those of LCSH and Rameau. According to ISO 25964-2:2013, 8.3.2, an intersecting compound equivalence could be provided in the other direction: Intervention (International law) EQ Intervento militare + Diritto internazionale, and Intervention (droit international) EQ Intervento militare + Diritto internazionale. At present, however, this method is not adopted. In ThNS, Discipline is a semantic category and a top term. Its hierarchy comprises only concepts of disciplines. Other concepts closely related to a discipline, and not simply falling within it, are linked by an associative relationship. They remain in their own semantic category (agents, activities, tools, etc.). Thus, in ThNS we do not find the hierarchies typical of discipline-based classifications, where a discipline is the systematic container of everything it concerns. The free structure of LCSH and Rameau is quite different. Categories are not considered, and the relationships with narrower terms may resemble an alphabetical index. Such an index collects all the topics belonging or attributed to that discipline, including complex and parenthetical terms.
3.0 Overall remarks
There are several differences between the semantic networks of the three vocabularies. This holds even though we can often start from equivalent terms for the same concept and find other full equivalences along the paths. The differences occur mainly between ThNS and the two heading lists, owing to the different structure of ThNS explained above. To summarize, the reasons include, firstly, the absence from ThNS of subdivisions, complex subjects and proper names.
Secondly, ThNS has an architecture based on categories without disciplinary groupings, with strictly categorial equivalences and hierarchies, faceting and limited polyhierarchy. The heading lists, by contrast, use freer relationships that aim to link what belongs together in a discourse rather than to capture the precise identity of concepts. Téléphonie mobile is not the same entity as Téléphones cellulaires, but when talking about the former one also talks about the latter. There are important differences even between the closely connected LCSH and Rameau, due to different granularity or definition. These include skipped hierarchical levels, concepts merged under one term or distinguished under different terms, different criteria of subdivision, and non-identical choices in recording complex subjects. This means that some equivalences are missing, or that pairs of terms referring to the same concepts need different paths to be connected. The great extent of the vocabularies allows most concepts to be present in all three languages and to be mapped. The divergences of the networks may therefore not appear very important: surfing among terms always seems possible. What, then, are the drawbacks of these divergences, given the primary value of the equivalence between terms representing the same concept? Terms have no magic power. They can neither fully represent the content and informative potential of the works entered under them, nor fully and exactly retrieve the information sought. They live as closely connected elements of indexing and of natural language, controlled in indexing and free in users’ access. Terms literally say what their definitions or scope notes say, but they represent wider themes and thoughts, and multiple, articulated connections to other concepts and themes. Semantic relationships try to represent these articulated meanings in a systematic, understandable and workable order. In doing so, they guide indexers in choosing terms and users in searching and exploring resources.
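The remark that pairs of terms for the same concept may ‘need different paths to be connected’ can be illustrated with a breadth-first search. The search runs over a single graph in which each vocabulary’s BT/NT links and the cross-vocabulary closeMatch links are all treated as edges. The sketch is hypothetical: it uses a handful of the Table 2 terms, and which closeMatch pairs are actually recorded is an assumption made for illustration. The optional `within` argument restricts the search to one vocabulary.

```python
from collections import defaultdict, deque
from typing import Dict, List, Optional, Tuple

Node = Tuple[str, str]          # (vocabulary, term)
Step = Tuple[Node, str, Node]   # (from, relationship, to)

graph: Dict[Node, List[Tuple[str, Node]]] = defaultdict(list)


def link(a: Node, rel: str, b: Node, inverse: str) -> None:
    """Record a relationship together with its reciprocal."""
    graph[a].append((rel, b))
    graph[b].append((inverse, a))


T, L, R = "ThNS", "LCSH", "Rameau"
# Hierarchies from Table 2 (subset).
link((T, "Donatori di sangue"), "BT", (T, "[Persone secondo il comportamento]"), "NT")
link((T, "Viaggiatori"), "BT", (T, "[Persone secondo il comportamento]"), "NT")
link((T, "[Persone secondo il comportamento]"), "BT", (T, "Persone"), "NT")
link((L, "Blood donors"), "BT", (L, "Persons"), "NT")
link((L, "Travelers"), "BT", (L, "Persons"), "NT")
link((R, "Donneurs de sang"), "BT", (R, "Donneurs d’organes"), "NT")
# closeMatch links assumed for superscripts 1, 2 and 4 (illustrative choice).
for a, b in [((T, "Donatori di sangue"), (L, "Blood donors")),
             ((T, "Donatori di sangue"), (R, "Donneurs de sang")),
             ((T, "Persone"), (L, "Persons")),
             ((T, "Viaggiatori"), (L, "Travelers")),
             ((T, "Viaggiatori"), (R, "Voyageurs"))]:
    link(a, "closeMatch", b, "closeMatch")


def shortest_path(start: Node, goal: Node,
                  within: Optional[str] = None) -> Optional[List[Step]]:
    """Fewest links from start to goal; optionally stay inside one vocabulary."""
    previous: Dict[Node, Optional[Tuple[str, Node]]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in graph[node]:
            if within is not None and nxt[0] != within:
                continue
            if nxt not in previous:
                previous[nxt] = (rel, node)
                queue.append(nxt)
    if goal not in previous:
        return None
    steps: List[Step] = []
    node = goal
    while previous[node] is not None:
        rel, prev = previous[node]
        steps.append((prev, rel, node))
        node = prev
    return list(reversed(steps))
```

With these data, Donneurs de sang and Voyageurs are not connected at all inside Rameau. The combined graph links them in four steps through ThNS (closeMatch, BT, NT, closeMatch). Surfing can therefore succeed even where a single network offers no path.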
Different configurations of the semantic frames are quite legitimate and sometimes required. However, for interoperability and combined use, the greatest possible overlap between the systems is desirable. Failing that, users and designers should at least know the different paths and the missing links. Moving up and down hierarchies consistently makes broadening or narrowing a search easier, faster and more reliable, and avoids possible confusion. The same benefit applies to viewing sibling terms, reached through the broader term. In indexing work, only the adopted system is involved, and the problems of semantic networks are less relevant than deficiencies in mapping. Some observations about the cases mentioned reduce their seriousness. The lack of proper names in ThNS simply means that a name is assigned following the Manuale and is recorded and searchable among the strings, not in the thesaurus. In LCSH and Rameau complex subjects are ready to use. In ThNS the strings are combined when needed, according to syntactic rules, or drawn from strings already recorded. The equivalence is restored at this level. Missing terms for a concept, probably due to lack of literary warrant, could be added and validated (e.g. Divinità minoiche in ThNS, or Samsung Galaxy smartphones in LCSH). The exception is where vocabulary policy prefers an equivalence relationship with a term for a slightly different concept. Only when two concepts are represented by the same term do we find different solutions in indexing. Naturally, other differences in indexing results come from different policies and syntax (e.g. summarization vs depth indexing, or one coextensive string per work vs several complementary strings).
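Equivalence can be restored at the level of strings, and ISO 25964-2:2013, 8.3.2 describes an intersecting compound equivalence, given in the text for Intervention (International law). Both points can be sketched as a lookup from the terms of a ThNS string to target headings. This is a hypothetical illustration: the dictionaries hold only the examples cited in the text, and the function names are invented. As noted above, compound equivalences of this kind are not currently recorded in these vocabularies.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# ThNS non-preferred term resolved by a combination of preferred terms (USE+).
USE_PLUS: Dict[str, Tuple[str, ...]] = {
    "Intervento (Diritto internazionale)": ("Diritto internazionale", "Intervento militare"),
}

# Hypothetical intersecting compound equivalences (ISO 25964-2:2013, 8.3.2):
# the target heading corresponds to the intersection of the ThNS concepts.
COMPOUND_EQ: Dict[str, FrozenSet[str]] = {
    "Intervention (International law)":
        frozenset({"Intervento militare", "Diritto internazionale"}),
    "Intervention (droit international)":
        frozenset({"Intervento militare", "Diritto internazionale"}),
}

# Simple one-to-one equivalences for single terms, as cited in the text.
SIMPLE_EQ: Dict[str, List[str]] = {
    "Diritto internazionale": ["International law", "Droit international"],
}


def preferred_terms(string_terms: Sequence[str]) -> List[str]:
    """Replace USE+ entry terms by their component preferred terms, keeping order."""
    result: List[str] = []
    for term in string_terms:
        for part in USE_PLUS.get(term, (term,)):
            if part not in result:
                result.append(part)
    return result


def map_string(string_terms: Sequence[str]) -> List[str]:
    """Target headings for a ThNS string: compound matches first, then simple ones."""
    terms = preferred_terms(string_terms)
    present = set(terms)
    targets = [t for t, parts in COMPOUND_EQ.items() if parts <= present]
    for term in terms:
        targets.extend(SIMPLE_EQ.get(term, []))
    return targets


if __name__ == "__main__":
    print(map_string(["Intervento (Diritto internazionale)"]))
```

A compound equivalence fires only when all its components occur in the same string. A single term such as Intervento militare therefore maps to nothing on its own, which is the one-directional behaviour the text describes.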
The results of searching through diverging semantic networks vary greatly with how indexes are presented and how searches are carried out. Searches may go through the vocabulary or directly to the strings. Users may surf the vocabulary, the strings or the records. Searches may or may not be extended automatically to target vocabularies. ‘Exploding’ a search means using a technique that also retrieves the resources linked to the terms subordinate to the searched term. It gives different results in systems with or without polyhierarchies, and in systems whose hierarchies do or do not extend across semantic categories. The same happens when ‘expanding’ a search, that is, retrieving the resources linked to both subordinate and associated terms (ISO 25964-1:2011, 10.2.1). In general, simple searches are not problematic. When surfing, users must follow the different features of each indexing system. Automatic exploration gives results that vary with whether it finds equivalent, missing or overabundant relationships. The results may be inaccurate, with topics and resources that are not pertinent. They may be insufficient, when some relevant resources are not found. Or they may be redundant, when relevant resources are retrieved together with off-topic ones, which can be good for serendipitous diversions. Today the connections between the equivalent terms of the different vocabularies allow users to search the catalogues of the three libraries in a few steps, and any improvement in interoperability will benefit users in all three areas. Structural differences and the soundness of the systems seem unfavourable to future convergence. However, some suggestions follow obviously from this survey. Adding other forms and levels of mapping is advisable to clarify and enrich connections. It acts almost only at the level of single correspondences, however, and does not reduce the divergences between the semantic networks.
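The Dermatology case from Table 3 shows how ‘exploding’ and ‘expanding’ a search behave on diverging networks. The sketch below is a hypothetical illustration. The resource identifiers r1 to r3 and the index are invented. ‘Expanding’ is assumed to add, in one step only, the RTs of the searched term and of its subordinate terms, without following associative links further.

```python
from collections import deque
from typing import Dict, List, Set


def explode(term: str, nt: Dict[str, List[str]]) -> Set[str]:
    """The term plus every term reachable downward through NT links."""
    found = {term}
    queue = deque([term])
    while queue:
        for child in nt.get(queue.popleft(), []):
            if child not in found:
                found.add(child)
                queue.append(child)
    return found


def expand(term: str, nt: Dict[str, List[str]],
           rt: Dict[str, List[str]]) -> Set[str]:
    """Explode, then add the RTs of every term reached (one associative step only)."""
    terms = explode(term, nt)
    for t in list(terms):
        terms.update(rt.get(t, []))
    return terms


def retrieve(terms: Set[str], index: Dict[str, Set[str]]) -> Set[str]:
    """Union of the resources indexed under any of the terms."""
    return set().union(*(index.get(t, set()) for t in terms))


# Table 3 relationships (subset).
THNS_NT: Dict[str, List[str]] = {}  # Dermatologia has no NTs
THNS_RT = {"Dermatologia": ["Dermatologia veterinaria"]}
LCSH_NT = {"Dermatology": ["Veterinary dermatology", "Pediatric dermatology",
                           "Dermatologists", "Radioisotopes in dermatology"]}
LCSH_RT: Dict[str, List[str]] = {}

# Hypothetical records: r1 a dermatology manual, r2 a veterinary dermatology
# text, r3 a directory of practising dermatologists.
INDEX = {
    "Dermatologia": {"r1"}, "Dermatology": {"r1"},
    "Dermatologia veterinaria": {"r2"}, "Veterinary dermatology": {"r2"},
    "Dermatologists": {"r3"},
}
```

On these data, the results differ as follows:
- Exploding Dermatologia in ThNS finds only r1. This is insufficient, because the veterinary resource is missed.
- Expanding it adds r2.
- Exploding Dermatology in LCSH also brings in r3, the resource indexed under Dermatologists. From the point of view of a search on the discipline, this is redundant.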
Any convergence towards international standards, especially ISO 25964, would have positive effects, even though subject heading lists are structurally far from thesauri. At present, some elements offer conditions suitable for interoperability. Linked open data already support the three systems. They work on the data of the entities in a way that favours modular systems over systems that fix pre-coordinated entities in their vocabularies. Systems that treat each element per se, so that it can be combined with other elements, are more suitable for linked data. Such combinations may be permanent, following semantic paradigms, or occasional, following syntactic needs. From this perspective, ‘Réformer Rameau’ is a very encouraging venture launched in France. Over five years, among other things, it plans three changes: abolishing the distinction between tête de vedette (main heading) and subdivision, splitting up the vedettes construites (constructed headings), and deleting special instructions for disciplines. This treatment of terms is similar to thesaural treatment. It is neutral as regards pre-coordinated use, and it loosens the disciplinary bonds. Those bonds functionally collect every object of interest in a given field of research, but they confine concepts within limits narrower than those in which concepts normally move. As for ThNS, the second edition of the Guida to Nuovo soggettario, now in its final phase, confirms adherence to the ISO 25964 standard. Although completely rewritten, it does not change the features of the thesaurus or its criteria for mapping other vocabularies. A future opening to varied mappings cannot be excluded. It would be particularly helpful if accompanied by collaboration and agreed criteria among national libraries. This description of the coherence, limits and problems of network mapping aims above all to show that good mappings between single terms are not sufficient.
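The contrast between modular systems and systems that fix pre-coordinated entities can be sketched with plain subject-predicate-object tuples in the style of linked data. In this hypothetical sketch, each concept is described once, under an invented example.org namespace. A subject string is stored as an ordered list of concept identifiers and is rendered as a pre-coordinated heading only on demand. Splitting the vedette construite Peau-Maladies-Soins infirmiers into three components is an assumption made for illustration. `skos:prefLabel` is the SKOS labelling property; it is written here as a plain string rather than handled through an RDF library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
EX = "http://example.org/concept/"  # hypothetical namespace, not a Rameau URI

# Each concept is described once, on its own, as subject-predicate-object triples.
TRIPLES: List[Triple] = [
    (EX + "peau", "skos:prefLabel", "Peau"),
    (EX + "maladies", "skos:prefLabel", "Maladies"),
    (EX + "soins-infirmiers", "skos:prefLabel", "Soins infirmiers"),
]

# A subject string is an ordered combination of independent concepts,
# assembled for a document rather than stored as one fixed heading.
STRINGS: Dict[str, List[str]] = {
    "s1": [EX + "peau", EX + "maladies", EX + "soins-infirmiers"],
}


def label(uri: str) -> str:
    """Look up the preferred label of a concept."""
    for s, p, o in TRIPLES:
        if s == uri and p == "skos:prefLabel":
            return o
    raise KeyError(uri)


def render(string_id: str, separator: str = "--") -> str:
    """Display a string in pre-coordinated form, built on demand."""
    return separator.join(label(uri) for uri in STRINGS[string_id])


def strings_using(uri: str) -> List[str]:
    """Find every string that combines the given concept."""
    return [sid for sid, parts in STRINGS.items() if uri in parts]
```

Every string points to the same concept identifiers, so a search for Maladies finds all strings that use it, whatever their first element. Fixed pre-coordinated headings make this kind of reuse harder.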
This paper therefore recommends, particularly to people involved in designing interoperability, going beyond mappings between single terms towards clearly branched and connected maps, which would help users cross unknown or less well-known areas of knowledge.
References
Balakrishnan, Uma, Jakob Voß, and Dagobert Soergel. 2018. “Towards Integrated Systems for KOS Management, Mapping, and Access: Coli-Conc and its Collaborative Computer-Assisted KOS Mapping Tool Cocoda.” In Challenges and Opportunities for Knowledge Organization in the Digital Age: Proceedings of the Fifteenth International ISKO Conference 9-11 July 2018 Porto, Portugal, edited by Fernanda Ribeiro and Maria Elisa Cerveira. Advances in Knowledge Organization 16. Baden-Baden: Ergon, 693-701.
Biblioteca nazionale centrale di Firenze. 2006. Nuovo soggettario. Guida al sistema italiano di indicizzazione per soggetto. Prototipo del Thesaurus. Milano: Editrice bibliografica.
Binding, Ceri, and Douglas Tudhope. 2016. “Improving Interoperability Using Vocabulary Linked Data.” Journal of Digital Information 17: 5-21.
Buizza, Pino. 2019. “Indicare quasi la stessa cosa. Appunti di indicizzazione interlinguistica.” In Viaggi a bordo di una parola. Scritti di indicizzazione semantica in onore di Alberto Cheti, edited by Anna Lucarelli, Alberto Petrucciani and Elisabetta Viti. Roma: Associazione italiana biblioteche, 33-49.
Doerr, Martin. 2001. “Semantic Problems of Thesaurus Mapping.” Journal of Digital Information 1, no. 8.
Hudon, Michèle. 1997. “Multilingual Thesaurus Construction. Integrating the Views of Different Cultures in One Gateway to Knowledge and Concepts.” Knowledge Organization 24: 84-91.
ISO 25964-1:2011 and ISO 25964-2:2013. Information and Documentation. Thesauri and Interoperability with Other Vocabularies. Geneva: ISO.
Jacobs, Jan-Helge, Tina Mengel, and Katrin Müller. 2010. “Benefits of the Crisscross Project for Conceptual Interoperability and Retrieval.” In Paradigms and Conceptual Systems in Knowledge Organization: Proceedings of the Eleventh International ISKO Conference 23-26 February 2010, Rome, Italy, edited by Claudio Gnoli and Fulvio Mazzocchi. Advances in Knowledge Organization 12. Würzburg: Ergon Verlag, 236-241.
Kempf, Andreas Oskar. 2018. “The Need to Interoperate: Structural Comparison of and Methodological Guidance on Mapping Discipline-Specific Subject Authority Data to Wikidata.” In Challenges and Opportunities for Knowledge Organization in the Digital Age: Proceedings of the Fifteenth International ISKO Conference 9-11 July 2018 Porto, Portugal, edited by Fernanda Ribeiro and Maria Elisa Cerveira. Advances in Knowledge Organization 16. Baden-Baden: Ergon, 644-652.
Riesthuis, Gerhard J.A. 2003. “Information Languages and Multilingual Subject Access.” In Subject Retrieval in a Networked Environment, edited by I.C. McIlwaine. München: De Gruyter Saur, 11-18.
Zeng, Marcia Lei. 2019. “Interoperability.” Knowledge Organization 46: 122-146.
Zeng, Marcia Lei, and Lois Mai Chan. 2004. “Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems.” Journal of the American Society for Information Science and Technology 55, no. 5: 377-395.

D. Grant Campbell – University of Western Ontario, Canada
Alex Mayhew – University of Western Ontario, Canada
Inheritance and Lamination in the Representation of Bibliographic Relationships
Abstract: This paper uses the concept of lamination implied in the term “discovery layer” to explore how domain knowledge could be applied to large federated search environments. Using the publishing history of Daniel Defoe’s Robinson Crusoe as a case study, we use bibliographic scholarship in the English literary studies community to establish lines of inheritance on all four levels of the FRBR paradigm: work, expression, manifestation and item.
We created a small demonstration of a visualization generated from linked data extracted from the scholarly literature. It shows how literary scholarship, when encoded as linked data, can create lines of inheritance and influence that enable users to fulfil the fifth user task of the new Library Reference Model: exploration.
1.0 Introduction
Knowledge organization, as a field of both theory and practice, regularly grapples with the place of specialized domain knowledge within information systems that purport to be “universal” in breadth and use. The teams that maintain the Dewey Decimal Classification, the Universal Decimal Classification and the Library of Congress Classification rely on subject experts to help them maintain various sections of these tools. Even so, the relationship between domain knowledge on the one hand, and principles of knowledge organization that purport to be a priori assumptions on the other, is by no means clear (Hjørland 2015). In this paper, we address this conflict in a different way by focusing on bibliographic rather than thesaural relationships. In particular, we examine the patterns by which one bibliographic entity influences others over time. By considering how linked data might encode relationships across federated information resources, we explore new ways of integrating specialized domain knowledge into library catalogues.
2.0 Federated Searching and the Concept of Lamination
The concept of federated searching (simultaneously searching multiple information sources) has a long history in librarianship, dating back at least to the National Union Catalogue of the Library of Congress. Efforts to establish global standards of information description, under the name “Universal Bibliographic Control”, rest at least partly on the dream of searching diverse catalogues through the standardization of description methods.
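Federated searching involves querying several sources and reconciling records described in different local schemas. The following Python sketch is a hypothetical illustration. The two catalogues, their field names and the crosswalks are invented. The shared element names title, creator and date come from the Dublin Core element set. Duplicate detection is a naive exact match on normalized title, creator and date. A production system would query remote sources concurrently and use far more robust matching.

```python
from typing import Dict, List

# Two hypothetical catalogues, each with its own local field names.
SOURCES: Dict[str, List[Dict[str, str]]] = {
    "A": [{"ti": "Robinson Crusoe", "au": "Defoe, Daniel", "yr": "1719"}],
    "B": [{"name": "Robinson Crusoe", "author": "Defoe, Daniel", "issued": "1719"},
          {"name": "Lord of the flies", "author": "Golding, William", "issued": "1954"}],
}

# Crosswalks from each local schema to three Dublin Core element names.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "A": {"ti": "title", "au": "creator", "yr": "date"},
    "B": {"name": "title", "author": "creator", "issued": "date"},
}


def harmonize(record: Dict[str, str], crosswalk: Dict[str, str]) -> Dict[str, str]:
    """Rename local fields to the shared element names; unmapped fields are dropped."""
    return {dc: record[local] for local, dc in crosswalk.items() if local in record}


def federated_search(query: str) -> List[dict]:
    """Search every source, harmonize the hits and merge apparent duplicates."""
    needle = query.casefold()
    merged: Dict[tuple, dict] = {}
    # Sources are searched one after another here; a real system would
    # send the queries concurrently.
    for source, records in SOURCES.items():
        for record in records:
            dc = harmonize(record, CROSSWALKS[source])
            if needle not in dc.get("title", "").casefold():
                continue
            key = (dc.get("title", "").casefold(),
                   dc.get("creator", "").casefold(),
                   dc.get("date", ""))
            entry = merged.setdefault(key, {**dc, "sources": []})
            entry["sources"].append(source)
    return list(merged.values())
```

A search for “crusoe” returns a single merged record that lists both catalogues as sources. This is the kind of harmonization on which discovery layers depend.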
On the Web, metasearch engines and metadata harvesting initiatives have led to Tim Berners-Lee’s call for data that is freed from its local context and permitted to combine in productive and exciting ways (Berners-Lee 2009). The concept of the “discovery layer” has signalled a new environment in which federated searching in libraries has reached a new level of complexity and power. Discovery layers are search platforms that integrate online catalogue data with data from other sources. They enable libraries to provide access to a wide range of resources through a single interface or portal, and also to customize search results in specialized ways (Ramsay & Chamberlain 2012). Not only do these discovery layers permit searching across a range of different catalogues; they can also create interfaces that mimic popular search engine interfaces, permit faceted filtering of search results, and allow enhanced means of user feedback and participation in community search practices. Much of the challenge with discovery layers lies, naturally enough, with problems of data harmonization: how to enable metadata structured and encoded for one resource to combine with metadata from an entirely different resource. The promise of linked data, together with the emergence of the Dublin Core as a lingua franca into which multiple diverse systems can convert, has made such harmonization possible. One result is Western Libraries’ new federated search interface, OMNI, which allows users to search 14 Ontario university libraries for books, articles, videos, music and databases through one common interface. The term “discovery layer,” however, evokes another mental image as well. Proprietary platforms such as ProQuest’s Summon offer limited opportunities for customization. By contrast, the rise of open source platforms such as VuFind and Blacklight offers greater opportunities for innovative adaptation (Barber, Holden & Mayo 2016, 182).
Such discovery layers make it possible for users to add their own tags to records. The resulting folksonomies are effectively “laminated” over the library catalogue and establish bibliographic relationships of specific utility. The concept of lamination (the superimposition of multiple data layers upon a foundational piece of data) was invoked by Ranganathan to describe the process of adding isolates to a base number to create a compound subject (Ranganathan 1967, 354). Later, the rise of geographic information systems popularized the process of overlaying geographical areas with successive layers of data indicating different facets of geographical interest (Hawkins 1994, 94). This paper describes a case study that explores the feasibility of incorporating specialized domain knowledge into a linked data discovery layer. Such a layer could be laminated over bibliographic data, particularly union catalogues, in order to encode and visualize bibliographic relationships in fresh ways.
3.0 Bibliographic Relationships
Much of the development in bibliographic description since the heyday of the Anglo-American Cataloguing Rules has involved richer and more varied bibliographic relationships. The digital environment has enabled us to move beyond the relationships defined in the Paris Principles. Those relationships had to be implied primarily through the main and added entries that determined where a record appeared (or did not appear) in a library’s card catalogue. The paradigm of the Functional Requirements for Bibliographic Records (FRBR) serves as the basis for AACR2’s replacement, Resource Description and Access (RDA). It separates a resource into four distinct entities, Work, Expression, Manifestation and Item, each of which exists in a one-to-many relationship with its predecessor.
This paradigm is exploited in RDA to facilitate four primary objectives: finding resources, selecting them, identifying them and obtaining them (RDA). In addition, RDA provides a set of relationship designators for each level of the paradigm that can be used to encode relationships between different resources, such as abridgments, adaptations, digests, inspirations, remakes and variations (RDA Appendix J). With IFLA’s recently released Library Reference Model (LRM), library catalogue standards have admitted a new objective. In addition to the existing objectives of enabling the user to find, identify, select and obtain, the LRM has added “Explore” as a fifth important user task: “to discover resources using the relationships between them and thus place the resource in context” (Riva, Le Boeuf & Žumer 2017, 15, emphasis added). The question lurking behind this new functionality is, of course: who is going to do all of this extra encoding? One option would be the descriptive cataloguers themselves. The existing relationship designators would enable cataloguers to do a certain amount of it, embedding the relationships directly into the bibliographic records. In stark contrast, many discovery layers suggest another option by including user tagging affordances among their features. These enable library users to recognize and flag relationships that are meaningful to them. This paper investigates a middle approach between these two extremes. It uses professional domain knowledge, encoded to create bibliographic relationships between different resources that can exist at various levels of the FRBR paradigm.
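The FRBR levels and a separate layer of relationships can be combined in two data structures. The first is a set of base records, each linked to the entity one level up. The second is an overlay of (source, designator, target) triples that is consulted only when a record is displayed. This is a hypothetical sketch, not the authors’ implementation, and it makes several simplifications:
- the identifiers are invented;
- each entity is given a single parent, whereas FRBR also allows, for example, a manifestation to embody more than one expression;
- the designator wording ‘sequel to’ and ‘translation of’ is assumed rather than copied from RDA Appendix J.

The Italian translation is linked to the French one because the case study below notes that it was based on the French version.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Entity:
    """A base-layer record at one FRBR level."""
    id: str
    level: str                    # "Work", "Expression", "Manifestation" or "Item"
    label: str
    parent: Optional[str] = None  # entity one level up (single parent: a simplification)


BASE: Dict[str, Entity] = {e.id: e for e in [
    Entity("w1", "Work", "Defoe, Daniel, 1661?-1731. Robinson Crusoe"),
    Entity("w2", "Work", "Defoe, Daniel, 1661?-1731. Farther adventures of Robinson Crusoe"),
    Entity("e1", "Expression", "Robinson Crusoe (English text)", "w1"),
    Entity("e2", "Expression", "Robinson Crusoe (French translation, 1720)", "w1"),
    Entity("e3", "Expression", "Robinson Crusoe (Italian translation, 1731)", "w1"),
    Entity("m1", "Manifestation", "Robinson Crusoe, first edition, 1719", "e1"),
]}

# The laminated layer: domain-knowledge relationships kept apart from the
# base records. Designator wording is assumed for illustration.
OVERLAY: List[Tuple[str, str, str]] = [
    ("w2", "sequel to", "w1"),
    ("e3", "translation of", "e2"),
]


def lineage(entity_id: str, base: Dict[str, Entity]) -> List[str]:
    """Walk from an entity up through its parents to the Work."""
    chain = [entity_id]
    parent = base[entity_id].parent
    while parent is not None:
        chain.append(parent)
        parent = base[parent].parent
    return chain


def laminate(entity_id: str, base: Dict[str, Entity],
             overlay: List[Tuple[str, str, str]]) -> Dict[str, object]:
    """Combine a base record with the overlay relationships that mention it,
    without modifying either layer."""
    return {
        "record": base[entity_id],
        "outgoing": [(d, base[t].label) for s, d, t in overlay if s == entity_id],
        "incoming": [(d, base[s].label) for s, d, t in overlay if t == entity_id],
    }
```

Because the overlay is kept apart from the base records, the layer can be added, replaced or withdrawn without touching the catalogue data beneath it. This separation is what makes it ‘laminated’.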
In this particular case, we examine relationships defined in the field of literary studies. We draw on published literary and bibliographic scholarship to define relationships that could be encoded as linked data referring to existing bibliographic and authority records. This would enable users to follow temporal progressions of influence and development in literary history. Patterns of influence and inheritance would be effectively “laminated” over the bibliographic data when needed.
4.0 Temporal Progression
The use of temporal progression as a guiding principle in this case study requires some justification, since not all literary studies scholarship is rigidly defined by chronology. Nonetheless, a significant amount of literary scholarship analyzes three things: patterns of influence, both documented and implied; patterns of development of certain genres and themes; and the accuracy of certain statements. Time therefore figures prominently in many literary studies, particularly studies of literary bibliography. Classification theorists acknowledge the omnipresence and importance of time as a facet. Fairthorne anticipated a growing awareness, within and beyond information science, that important relationships of inheritance can only be understood through close attention to the temporal aspects of classification, whether natural or bibliographical (Fairthorne 1985, 363). More recently, scholars have acknowledged a shift in scientific thinking away from Linnaean taxonomies towards a cladistic paradigm that recognizes lines of descent across time. This paradigm redefines category membership in ways that Linnaean classification does not allow (Hjørland 2015).
This case study does not adopt a formally cladistic approach. We proceed, however, from the assumption that charting scholarship in literary studies and literary bibliography along a temporal dimension is a fruitful way to overlay domain expertise upon a large information store for purposes of bibliographic exploration.
5.0 The Case of Robinson Crusoe
Daniel Defoe (1661?-1731) is an author who cries out for scholarly assistance in bibliographic control. An extraordinarily prolific author, Defoe wrote voluminously on a dizzying range of subjects, at a time when authorial attribution was highly unreliable. As a result, the number of works attributed to Defoe grew over the years, only to be cut back ruthlessly in the 1990s when Furbank and Owens aggressively refuted many of the attributions (1998). Consequently, conventional bibliographic control can be perilous in the case of Defoe. One example is The Life of Mrs. Christian Davis, a sensational history of a cross-dressing Irish matriarch who distinguished herself as a foot soldier before succumbing to dropsy and scurvy. It was mistakenly attributed to Defoe for some time, and the bibliographic metadata of at least one published version reflects this mistake. Robinson Crusoe lends itself particularly to scholarly assistance. It was “a milestone in literary history” that was immediately embraced (and sometimes ridiculed) well beyond England (Backscheider 1989, 412). What is more, the novel exhibits complexities, both in its publishing and in its influence, that touch on all four levels of the FRBR paradigm. Defoe himself wrote two sequels: The Further Adventures of Robinson Crusoe later in 1719, and Serious Reflections During the Life and Surprising Adventures of Robinson Crusoe in 1720. There were many translations of the novel into other languages, as well as multiple abridgments, some unauthorized.
The novel’s premise of a man shipwrecked on an island and growing in wisdom and knowledge inspired a long tradition of adaptations known as “Robinsonades,” some for adults and many for children. In exploring these complexities, we envisioned a federated search environment that would include the catalogues of numerous academic libraries, together with their extensive authority records, and databases such as the English Short Title Catalogue. Using authoritative scholarly and bibliographic guides to the publishing history of Robinson Crusoe, we began tracing inheritances at all four levels of the FRBR paradigm: Work, Expression, Manifestation and Item.
5.1 Work Relationships
Robinson Crusoe was first published in 1719. The Library of Congress Name Authority File contains a name-title authority record for the work:
Defoe, Daniel, 1661?-1731. Robinson Crusoe
LC Authority:
Publication date: 1719
Using the Name Authority File together with scholarly data, we can isolate at least two lines of influence at the Work level: sequels and Robinsonades.
5.1.1 Sequels. Hutchins (1925), Lovett (1991) and Furbank and Owens (1998) all agree that Defoe published two sequels to Robinson Crusoe, one in 1719 and one in 1720. The LC Authority File contains name-title access points for both:
Defoe, Daniel, 1661?-1731. Farther adventures of Robinson Crusoe
LC Authority:
Publication date: 1719
Defoe, Daniel, 1661?-1731. Serious reflections during the life and surprising adventures of Robinson Crusoe.
LC Authority:
Publication date: 1720
5.1.2 Robinsonades. Searching the MLA International Bibliography for the term “Robinsonades” produces a number of scholarly articles that identify works drawing on Robinson Crusoe. Among those with LC Name-Title Authority Records are:
Dalayrac, N. (Nicolas), 1753-1809. Azémia
LC Authority:
First performed: 1786
A French comic opera
Ducray-Duminil, M. (François Guillaume), 1761-1819. Lolotte et Fanfan. English
LC Authority for English Translation:
First published: 1788
Novel, known in England as Ambrose and Eleanor
Campe, Joachim Heinrich, 1746-1818. Robinson der Jüngere
LC Authority:
First published: 1788
Instruction book for children
Conrad, Joseph, 1857-1924. Victory. Polish
LC Authority:
First published: 1915
Novel
Golding, William, 1911-1993. Lord of the flies
LC Authority:
First published: 1954
Novel
5.2 Expression Relationships
5.2.1 Translations
Robinson Crusoe was translated into many languages. A recent exhibition at Indiana University’s Lilly Library provides the following examples of early translations:
First French Translation, 1720: La Vie et les Avantures Surprenantes de Robinson Crusoe: Contenant entre autres Évenemens, le Séjour qu'il a Fait Pendant Vingt & Huit Ans dans une Isle Déserte, Située sur la Côte de l'Amerique, près de l'embouchure de la Grande Riviere Oroonoque. Le tout écrit par lui-même. Traduit de l’Anglois. A Amsterdam, Chez L’Honoré & Chatelain. MDCCXX.
Italian Translation, 1731: La Vita e le Avventure di Robinson Crusoe, Storia Galante, Che contiene tra gli altri avvenimenti il soggiorno ch' Egli fece per ventott' anni in un' Isola deserta situata sopra la Costa dell' America vicino all' imboccatura della gran Riviera Oroonoca. Il tutto scritto da Lui medesimo: Tomo Primo. Traduzione dal Francese. In Venezia, MDCCXXXI. Presso Domenico Occhi. In Merceria all’Unione. Con Licenza de’Superiori. Translation based on the French version above.
Second Dutch Translation, 1735: Het Leven en de wonderbare Lotgevallen van Robinson Crusoe, behelzende onder andere ongehoorde uitkomsten een verhaal van zijn acht-en-twintigjaarig verblijf op een onbewoond eiland, gelegen op de kust van America, bij de mond van de rivier Oronooque. Alles door hemzelf beschreven. Door G. Schreuders. Amsterdam, 1735-1736.
Arabic Translation, 1835: Qiṣṣah Rūbinṣun Krūzī. Malta, 1835.
Persian Translation, 1878: Rábinsan Krúso.
Translated from the Urdú into Persian by Sher Alí of Kábul and edited in the Roman Character by T. W. H. Tolbort, Esq., B.C.S., Barrister-at-Law. London: William H. Allen and Co., 13 Waterloo Place, Pall Mall, S.W. Publishers to the India Office. 1878.

5.2.2 Abridgments

At least three abridgments of Robinson Crusoe appear in the English Short Title Catalogue:

The Midwinter Abridgment, 1722: The life and most surprizing adventures of Robinson Crusoe, of York, mariner. Who lived eight and twenty years in an uninhabited island on the [co]ast of America, lying near the mouth [of] the great river of Oroonoque: ... The whole three volumes faithfully abridged, ...
English Short Title Catalogue (ESTC) Number: 006343293

An Abridgment of the Midwinter Abridgment, 1734: The wonderful life, and most surprizing adventures of Robinson Crusoe, of York, mariner ... Faithfully epitomized from the three volumes, and adorned with cutts suited to the most remarkable stories ... printed for A. Bettesworth and C. Hitch, at the Red-Lyon; and J. Osborn, at the Golden-Ball in Pater-Noster-Row; R. Ware in Amen-Corner, and J. Hodges at the Looking-Glass on London-bridge, 1734.
ESTC Number: 066477403

Abridgment, 1790: A concise abstract of the wonderful life, and surprising adventures of that renowned hero, Robinson Crusoe, who lived twenty-eight years on an unhabited island, and was afterwards released by pirates. Adorned with cuts. London : printed for, and sold by all the stationary and toy shops in town and country, [1790?]
ESTC Number: 006061120

5.3 Manifestation Relationships

Manifestation relationships have significant potential to assist scholars in eighteenth-century literary studies by encoding important textual decisions made by modern editors. Modern literary studies frequently establish “standard editions” of canonical authors: editions which scholars prefer to consult and cite, and which often provide the basis for subsequent editions.
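Tracing which modern editions rest on which earlier text amounts to following directed links between Manifestations. The Python below is a minimal sketch of that idea. It assumes each edition record carries a single hypothetical “based on” link naming the edition that supplied its base text. The labels follow the Robinson Crusoe lineage discussed in this section: the first edition, the Shakespeare Head Press reprint and the Norton Critical Edition. The `BASED_ON` mapping and both function names are illustrative. They are not FRBR, RDA or MARC elements.

```python
from collections import defaultdict
from typing import Dict, List, Optional

# Hypothetical "based on" links: each edition names the edition that supplied
# its base text; None marks an edition with no recorded source.
BASED_ON: Dict[str, Optional[str]] = {
    "First edition (1719)": None,
    "Shakespeare Head Press edition": "First edition (1719)",
    "Norton Critical Edition (1975)": "Shakespeare Head Press edition",
}


def lineage(edition: str, based_on: Dict[str, Optional[str]]) -> List[str]:
    """Follow "based on" links from an edition back to its earliest source.

    Assumes the links form no cycles.
    """
    chain = [edition]
    source = based_on.get(edition)
    while source is not None:
        chain.append(source)
        source = based_on.get(source)
    return chain


def derived_editions(base: str, based_on: Dict[str, Optional[str]]) -> List[str]:
    """List every edition that rests, directly or indirectly, on `base`."""
    # Invert the links so each edition knows which editions were built on it.
    children: Dict[str, List[str]] = defaultdict(list)
    for edition, source in based_on.items():
        if source is not None:
            children[source].append(edition)
    found: List[str] = []
    stack = [base]
    while stack:  # depth-first walk down the inverted links
        for child in children[stack.pop()]:
            found.append(child)
            stack.append(child)
    return found


if __name__ == "__main__":
    print(lineage("Norton Critical Edition (1975)", BASED_ON))
    print(derived_editions("First edition (1719)", BASED_ON))
```

`lineage` answers the editor’s question: what does this modern text ultimately rest on? `derived_editions` answers the instructor’s question: which modern editions descend from a given early text? A production version would also need to guard against cyclic or conflicting links.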
In the case of Robinson Crusoe, the de facto “standard edition” is the 2008 edition of the novel edited by W.R. Owens. It appears in the Pickering & Chatto complete edition, The Novels of Daniel Defoe. This edition was created by consulting Defoe’s first edition together with Defoe’s manuscript (Owens 326). But other lines of descent exist as well. The first edition of 1719 was reprinted in 1927 as the Shakespeare Head Press edition. That edition served as the base text for the Norton Critical Edition, published in 1975 (Shinagel 1994). The Broadview Press edition of Robinson Crusoe is based on a study of a handful of editions published in Defoe’s lifetime (Davis 2014, 37). In many instances in literary history, significant differences exist between early editions. Examples include the differences between the first and fourth editions of Samuel Richardson’s Clarissa, or among the 1799, 1805 and 1850 editions of Wordsworth’s The Prelude. In such cases, scholars and instructors selecting a text for research or teaching would very much want to trace the modern editions that are based on a particular earlier edition.

5.4 Item Relationships

In some cases, textual decisions are made not just on the basis of a particular edition, but on a particular copy of an edition. Such a copy might contain an author’s handwritten marginal notes, or a half-sheet imposition correcting a compositor’s error. Take the case of Robinson Crusoe: an editor of a modern edition, skeptical of the vagaries of later text reproduction, might prefer to bypass the 1927 Shakespeare Head reprint. Instead, the editor could go directly to the copy of the first edition held in the British Museum.

6.0 Visualization

As a preliminary experiment in how these relationships might be visualized, we created a sample visualization. While this visualization was created manually, the later steps of actually rendering the image could potentially be automated. The visualization and its creation thus serve as a proof of concept for a future discovery layer.
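The data behind such a visualization can be sketched in a few lines. The Python below is a minimal illustration, not the Semantic MediaWiki export or the Gephi import used in this study. It does the following:

- models the Work-level records listed in section 5.1.2 as dated nodes;
- links Robinson Crusoe to each Robinsonade with a directed `SourceOf` edge;
- orders the nodes chronologically;
- writes the result as N-Triples, a plain RDF serialization.

The record identifiers, the `http://example.org/catalogue-wiki/` namespace and the `year` property are hypothetical placeholders.

```python
from typing import Dict, List, NamedTuple, Tuple
from urllib.parse import quote

BASE = "http://example.org/catalogue-wiki/"  # hypothetical namespace
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
XSD_GYEAR = "http://www.w3.org/2001/XMLSchema#gYear"


class AuthorityRecord(NamedTuple):
    heading: str  # name-title heading from section 5.1 (language qualifiers dropped)
    year: int     # date of first publication or performance


# Record ids are hypothetical; headings and dates follow section 5.1.
RECORDS: Dict[str, AuthorityRecord] = {
    "robinson-crusoe": AuthorityRecord("Defoe, Daniel, 1661?-1731. Robinson Crusoe", 1719),
    "azemia": AuthorityRecord("Dalayrac, N. (Nicolas), 1753-1809. Azémia", 1786),
    "lolotte-et-fanfan": AuthorityRecord(
        "Ducray-Duminil, M. (François Guillaume), 1761-1819. Lolotte et Fanfan", 1788),
    "robinson-der-jungere": AuthorityRecord(
        "Campe, Joachim Heinrich, 1746-1818. Robinson der Jüngere", 1788),
    "victory": AuthorityRecord("Conrad, Joseph, 1857-1924. Victory", 1915),
    "lord-of-the-flies": AuthorityRecord("Golding, William, 1911-1993. Lord of the flies", 1954),
}

# Directed "SourceOf" edges from Robinson Crusoe to each Robinsonade.
EDGES: List[Tuple[str, str, str]] = [
    ("robinson-crusoe", "SourceOf", rid) for rid in RECORDS if rid != "robinson-crusoe"
]


def iri(local_name: str) -> str:
    """Mint an IRI in the placeholder namespace, percent-encoding the local name."""
    return f"<{BASE}{quote(local_name, safe='')}>"


def chronological(records: Dict[str, AuthorityRecord]) -> List[str]:
    """Order record ids by date, breaking ties alphabetically by heading."""
    return sorted(records, key=lambda rid: (records[rid].year, records[rid].heading))


def to_ntriples(records: Dict[str, AuthorityRecord],
                edges: List[Tuple[str, str, str]]) -> List[str]:
    """Serialize the records (in date order) and the edges as N-Triples lines."""
    lines: List[str] = []
    for rid in chronological(records):
        rec = records[rid]
        label = rec.heading.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{iri(rid)} <{RDFS_LABEL}> "{label}" .')
        lines.append(f'{iri(rid)} {iri("year")} "{rec.year}"^^<{XSD_GYEAR}> .')
    for source, relation, target in edges:
        lines.append(f"{iri(source)} {iri(relation)} {iri(target)} .")
    return lines


if __name__ == "__main__":
    print("\n".join(to_ntriples(RECORDS, EDGES)))
```

Sorting on the stored date reproduces the chronological arrangement of the records. The two 1788 records are ordered alphabetically by heading, which is an arbitrary tie-break chosen for this sketch.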
To create this visualization, we created a dedicated ‘catalogue wiki’ using the MediaWiki software along with the Semantic MediaWiki extension. Next, we gathered the access points for a sample of works referred to in scholarly articles as Robinsonades, using the name-title access points in the Library of Congress Name Authority File. This information was then added to the catalogue wiki, creating simplified authority records in a wiki environment. We then added to the catalogue wiki a ‘SourceOf’ relationship from Robinson Crusoe to the Robinsonades group. The Semantic MediaWiki extension allows such semantic web linkages to be encoded within the MediaWiki environment.

The following steps to create the visualization were done manually. We exported the authority records in a standardized RDF format to visualization software. In this case Gephi was used, as it has an easily enabled web RDF import extension. Once the RDF information was imported into Gephi, extraneous information was removed and labels were simplified for human readability. One of the data points preserved on all records was the date of the work’s creation, which allowed the records to be arranged chronologically. The final result is an image derived from the content of the authority records (see Fig. 1). With the addition of the authoritative relationships, the final image contains some elements reminiscent of a family tree and others reminiscent of a cladogram.

Figure 1: Robinsonade Descendants of Robinson Crusoe.

7.0 Conclusion

Most of these bibliographic relationships can be encoded by cataloguers in existing MARC records using the relationship designators provided by RDA. However, the task would be overwhelming. Instead, a laminated set of linked data relationships established by knowledgeable domain experts could conceivably be employed to draw certain paths through a large federated collection of bibliographic records and authority files.
Such paths would facilitate the new objective of “Exploration” advocated by the Library Reference Model, and would also support a variety of scholarly approaches to a specific knowledge domain.

References

Backscheider, Paula R. 1989. Daniel Defoe: His Life. Baltimore: Johns Hopkins University Press.
Barber, Marlena, Christopher Holden and Janet L. Mayo. 2016. “Notes on Operations: Customizing an Open Source Discovery Layer at East Carolina University Libraries.” Library Resources & Technical Services 60, no. 3: 182-190.
Berners-Lee, Tim. 2009. “The Next Web of Open, Linked Data.” TED Talks.
Davis, Evan R. 2014. “A Note on the Text.” In Daniel Defoe, Robinson Crusoe: Modernized Edition. Peterborough, Ont.: Broadview Editions.
Defoe, Daniel. 2008. The Life and Strange Surprizing Adventures of Robinson Crusoe (1719), edited by W.R. Owens. In The Novels of Daniel Defoe. London: Pickering & Chatto.
Fairthorne, Robert A. 1985. “Temporal Structure in Bibliographical Classification.” In Theory of Subject Analysis: A Sourcebook, edited by Lois Mai Chan, Phyllis Richmond, and Elaine Svenonius, 356-368. Littleton: Libraries Unlimited.
Furbank, P.N. and W.R. Owens. 1998. A Critical Bibliography of Daniel Defoe. London: Pickering & Chatto.
Hawkins, Andrew M. 1994. “Geographical Information Systems (GIS): Their Use as Decision Support Tools in Public Libraries and the Integration of GIS with Other Computer Technology.” New Library World 95, no. 1117: 94.
Hjørland, Birger. 2015. “Are Relations in Thesauri ‘Context-Free, Definitional, and True in All Possible Worlds?’” Journal of the Association for Information Science and Technology 66, no. 7: 1367-1373.
Hutchins, Henry Clinton. 1925. Robinson Crusoe and Its Printing, 1719-1731: A Bibliographical Study. New York: Columbia University Press.
Lovett, Robert W. 1991. Robinson Crusoe: A Bibliographical Checklist of English Language Editions (1719-1979). New York: Greenwood Press.
Ramsay, Malcolm and Edmund Chamberlain. 2012.
“Software Selection Methodology for Library Discovery Layer Systems.” FOSS4LIB.
Ranganathan, S.R. 1967. Prolegomena to Library Classification. Edition 3. New York: Asia Publishing House.
RDA Steering Committee. 2017. Resource Description and Access. April 2017 Update.
Riva, Pat, Patrick Le Boeuf, and Maja Žumer. 2017. IFLA Library Reference Model: A Conceptual Model for Bibliographic Information. Netherlands: International Federation of Library Associations.
Shinagel, Michael. 1994. “A Note on the Text.” In Daniel Defoe, Robinson Crusoe: An Authoritative Text, Contexts, Criticism. Second edition. New York: Norton.

Josir Cardoso Gomes – IBICT, Brazil
Marco André Feldman Schneider – IBICT, Brazil

Ethical Perspective on Classifications of Religions
The Protestant Rise in Brazil

Abstract

This paper presents a comparative review of the classification of religion in the 2010 Brazilian Census, with specific attention to Protestant groups and denominations. Classifying religions is an arduous task, and there is no consensus on the best way to classify them. In fact, there is not even consensus on what distinguishes a religion from a denomination, sect or cult. Clear ethical issues arise when a classification ranks religions hierarchically or presents one religion as more “developed” than another. They also arise when it simply conceals certain religions within the category “Others”. Our analysis starts from the usual bibliographic classification schemes (CS), such as the Universal Decimal Classification (UDC) and the Dewey Decimal Classification (DDC). It also investigates the schemes used in China, India and other populous nations not aligned with the Western cultures of the Global North.
We point out the need for ethical perspectives on classification. Classifiers must take care to respect cultural differences and to avoid any kind of prejudice. This matters especially for religion and for the self-determination of religious minorities who do not accept or understand certain categorizations of their beliefs. The case of the 2010 Census is then examined in detail. Its context is the continued growth of Protestantism in Brazil since 1970 and the steadily rising influence of Protestants on politics. The paper also shows how difficult it is to determine which groups make up the Protestants, given the methodological choices of the institution that conducts the Census (IBGE).

1.0 Introduction

This paper is part of an ongoing doctoral research project in Information Science on political campaign funding and the performance of Protestant politicians in Brazil. Who these political actors are and how they influence national politics is a matter of great relevance in the Political and Social Sciences today. Information Science, with its methodological and theoretical apparatus, can contribute to three questions: how the government manages election data, how political campaign funding data are made available, and how society appropriates these data. Recent years have been the longest period in which a democratic regime has been able to operate in a relatively stable manner in Brazil.1 Research that analyses the drivers that strengthen or threaten democracy is therefore welcome in the Social and Political Sciences. One of the main factors shaping the country’s political scene has been the growth of evangelical interest groups, notably the so-called “Bancada Evangélica” (Evangelical Parliamentary Front).
The Bancada has imposed a conservative agenda that seeks to contest the secular character of the state and to restrict minority rights and civil liberties (Carranza and Cunha 2018). In the last presidential campaign, its members actively supported the election of the current far-right president, Jair Bolsonaro. This influence is not a phenomenon confined to Brazil: evangelicals have been present in politics throughout Latin America for more than three decades (Freston 2008).

1 Since the beginning of the Republic in 1891, which was proclaimed and governed by the military in its first 8 years, until the Constitution of 1988, Brazil went through brief democratic periods interspersed with two dictatorial regimes. The first was the “Estado Novo” dictatorship, when Getúlio Vargas ruled from 1930 to 1946; the second was the Civil-Military Dictatorship from 1964 to 1985.

In fact, the generic term “evangelical” does not reflect the multiplicity of social, economic, religious and moral groups that fit within this religious segment. However, in Brazil, according to Mafra, “[…] given the public visibility this segment has gained in public opinion, a certain consensus has been forged by referendum on the term ‘evangelical’ as a comprehensive category” (Mafra 2001, 7). The generic term “evangelical” as used in Brazil corresponds to “Protestant” in the USA and Europe, while “Protestant” in Brazil is similar to “Mainline Protestant Church” in the USA (Mafra 2001). The Brazilian Census is one of the most widely used classification instruments in religious studies in Brazil. A comparative review of how it classifies religions can therefore contribute to studies of how knowledge organization systems for religion are constructed. It can also contribute to theoretical and empirical studies of religion and its relation to politics. Who, then, are the Christian groups that appear on the political scene?
Do they represent the part of the population that calls itself evangelical? Are they homogeneous, or is there a prominent Protestant denomination that leads this movement? To begin answering these questions, the first step is to identify how part of the recommended literature and the bibliographic schemes classify religion. The second step is to identify how Christian denominations are classified in relevant classification schemes worldwide. This paper describes how this research is beginning to address these questions, in an introductory exploratory overview.

2.0 The method

This study is exploratory, qualitative, theoretical and documentary research. It presents a comparative review of the classification of religion used in the 2010 Brazilian IBGE Census,2 with specific attention to Protestant groups and denominations. It compares this classification with three traditional bibliographic classification schemes: the Dewey Decimal Classification (DDC), the Universal Decimal Classification (UDC) and the Colon Classification (CC). The comparison shows the similarities and differences between these schemes and the classification constructed by the Brazilian Census. The bibliographic research explored three sources in particular: 1) the classification of religions in different influential bibliographic schemes; 2) the classification of Protestants in the specialized literature; and 3) the classification of Protestants in the Brazilian census.

3.0 Religion Classification

“Omnis determinatio est negatio” (“every determination is a negation”; Spinoza cited in Lenin 2011, 111). In purely logical terms, every determination excludes. In ethical and political terms, what does each consecrated set of determinations, such as the main bibliographic classification schemes, mean? Which contending worldviews operate in each case, for what reasons, and with what consequences? Classifying religions is an arduous task. There is no consensus on the best way to classify them. In fact, there is not even consensus on what distinguishes a religion from a denomination, sect or cult (Liebman, Sutton, and Wuthnow 1988).
2 IBGE. 2011. Censo Demográfico 2010: Características Gerais da População, Religião e pessoas com deficiência [2010 Demographic Census: General Characteristics of the Population, Religion and Persons with Disabilities].

The most widely used CS in the West stem from European culture and were created at the height of the Age of Imperialism, before the Second World War. Christianity was the main religion in the world in which Dewey and Otlet lived. Asia, Africa and Latin America (now called the Global South) were exclusively suppliers of raw materials to Europe and the USA. That hegemonic culture was reflected in both CS. DDC and UDC built the Religion class (numbered 200 in both schemes) with most of its subclasses devoted to Christianity. Both reserved “Others” (290) for all other religions, placing ancient religions (Greek and Roman mythology) on the same level as Islam, Hinduism or Buddhism. UDC was, admittedly, more inclusive than DDC, because it specified many more religions in its main schedule. Nevertheless, it kept the same structure, devoting most of the main classes to Christianity. It may not be fair to blame the librarians and professionals of that time who participated in making the classification schemes. Pragmatically, how many books on Hinduism, Islam or the religions of China or Japan reached Western countries? Would it be rational to classify religions according to their importance if the number of documents arriving in libraries (and reaching classifiers) was minimal? For instance, UDC had a whole section devoted to Judaism (with 49 subclasses), while other religions with many more believers at that time received no such attention. Was there a political or socio-economic reason for including so many subclasses for Judaism?
Or were there simply more documents about Judaism in Western libraries? The Jewish presence in the West was significant at the beginning of the 20th century. Moreover, Christianity itself came from Judaism and has maintained an ambiguous but intense relationship with it over the centuries. This Christian-centred pattern changed when Ranganathan built the Colon Classification (CC), which for the first time represented Eastern religions in detail. CC apparently lists religions in historical order, placing the oldest religions in the first positions of the scheme. However, we could not find a bibliographic reference that makes this methodological choice explicit. As can be seen in Table 1, CC is very concise in classifying Christian denominations. DDC and UDC gave little detail to Eastern religions, and in the same way CC gave little detail to Christianity. In the comparative table, we deliberately emphasize the terms sects, movements, denominations and churches in order to show how such terms appear without apparent criteria. We also analysed how the Chinese system classifies religions. Although its use is restricted to China, it is highly relevant, since China has one sixth of the world’s population. In the 20th century, just after the Communist Revolution in China, four different classification schemes were created, each for a different purpose. The four systems nevertheless shared a common basis quite distinct from the Western Aristotelian vision. Western CS place philosophy as their first hierarchical item. The Chinese classification, by contrast, places Marxism as its first class, and religion appears as a mere appendix to the class of philosophy.
In the 1970s, the University of Beijing led an effort to create a single CS, but this new system kept the same ideological bases as the four initial systems (Studwell, Wu, and Wang 1994). Unfortunately, this literature review could not examine in depth how the Chinese system classifies religions. Even so, it was notable how little weight the subject receives.

Several studies have addressed how biased Western classification schemes are with respect to religion. Vanda Broughton identified three main areas where bias can occur: “an illogical order, or distribution of notation, that causes one system to appear as dominant, use of vocabulary that has a strong flavour of one system or is special to that system and inadequate provision of detail other than for the 'favoured' religion” (Broughton 2000). The author notes, however, that bias can be minimized through facet analytical techniques. She adds that the most recent UDC revision of Class 2 allows notations to be compounded to achieve a better degree of specificity. Idrees and Khalid (2009) studied the classification of Islam and proposed amendments and expansions in order to give better guidance to LIS professionals. Finally, McIlwaine and Mitchell (2006) wrote an article first presented at an ISKO Conference and later reproduced in one of UDC’s own reports. It suggested reducing the impact of classification bias through an auxiliary table listing religions in chronological order, i.e., from the oldest to the newest. In the same article, they agreed with Broughton that the revision of UDC Class 2 would benefit the classification system.

Table 1: Christian Religions Equivalence between DDC, UDC and CC3

DDC | UDC | CC
281 Early church and Eastern churches | 281 Primitive Churches. Eastern Churches. | 61. Early Churches
| 281.93 Russian Orthodox Church | 618. Russian
| 281.94 Greek Orthodox Church | 611. Greek
| | 613. Armenian
282 Roman Catholic Church | 282 Roman Catholic Church | 62. Roman Catholic
283 Anglican churches | 283 Episcopal Churches – not Roman Catholic (Protestants) |
| 283(410.1) Anglican Church |
| 283(73) Episcopal Church in the USA |
| 283.5 Old Catholics (?) |
284 Protestant denominations of Continental origin | 284 Continental Protestant Sects | 63. Protestant
| 284.1 Lutherans |
| 284.2 Calvinists |
| 284.3 Utraquists. Taborites. |
| 284.4 Coterões. Gazaristas. |
| 284.5 Huguenots. French Movements |
| 284.6 Moravian Brethren. Herrnhuters. |
| 284.98 Old Lutherans. Free Lutheran Church |
285 Presbyterian churches, Reformed churches centered in America | 285 Puritanism | 65. Puritanism
| 285.1 Presbyterian Churches | 64. Presbyterian
| 285.8 Brownists. Barrowists. |
286 Baptist, Disciples of Christ, Puritanism | 286 Movements that emphasize the baptism of adults by immersion |
| 286.12 Anabaptists. Mennonites. |
| 286.15 Baptists. |
| 286.3 Adventists. |
| 286.4 Other movements. Campbellites. Disciples of Christ (USA) |
287 Methodist churches | 287 Methodist | 68L6. Methodist
289 Other denominations and sects | 289 Other Movements |
| 289.3 Mormons |
| 289.4 New Jerusalem (Swedenborg) |
| 289.6 Quakers | 66. Quakers
| 289.954 Jehovah’s Witnesses |
| 289.956 Liberal Catholic Church |

3 The UDC edition that we accessed was in Portuguese, and we translated its terms into English. We could not find an English equivalent for some of them (Coterões).

4.0 Protestant Classification Issues

People who profess the Christian faith and belong to religious groups or churches that emerged from the Protestant Reformation initiated by Luther in the 16th century are generally called Protestants or evangelicals. However, how should a sect or church created in the 21st century be categorized? Should it be categorized by the theological dogmas of the original groups, or simply by the self-designation of its representatives? According to the Pew report “Global Christianity” (2010), for example, there were 801 million Protestants in the world, representing 37% of the global Christian population.
The report grouped Christianity into four major categories: Orthodox, Catholic, Protestant and “Other Christians”. The last refers to all Christian groups that do not fit into the first three. Classifying Catholics and Orthodox seems straightforward, since they are churches whose theological, historical and geographical attributes have been well defined for almost 18 centuries. However, how can Protestants be distinguished from other Christians? The Mormons are an emblematic case: they insist on not being classified as Protestants. This difficulty in categorizing Protestants has been noted since the beginning of the study of Protestantism. One of the most respected works on the subject is The Protestant Ethic and the Spirit of Capitalism, by the German sociologist Max Weber. In this work, Weber tried to catalogue the various Protestant groups of his time and already pointed to the difficulty of such categorization: “we can only do this by presenting religious ideas with the logical consistency of an ‘ideal type’, which is only rarely found in historical reality. Precisely because of the impossibility of drawing clear boundaries in historical reality, our only hope in researching the most coherent of its forms is to tune in with its more specific effects.” (Weber 1999, 90). Weber categorized Protestants into five major groups (Lutherans, Calvinists, Pietists, Methodists and Anabaptists), based on the moral dogmas that each group followed. His categorization arose from the need to determine which groups best fit the so-called spirit of capitalism. This new ethic had emerged in the same historical period as the Protestant Reformation. It grew stronger precisely in the countries that embraced capitalism most intensely.
In other words, the categorization was designed to understand how the new capitalist ethos was strengthened in each group.4 Coming to the present, the Pew Research Center uses two kinds of categorization: by movement and by denomination. The first groups Christians into three major movements (Pentecostals, Charismatics and Evangelicals) according to the central dogmas of each religious group, independently of historical and geographical context. The second, by denomination, groups them by history and origin. This latter scheme is the only one that resembles bibliographic classification.

In Brazil, the term Protestant is usually used in academic and specialized contexts. As noted in the introduction, non-Catholic Christians are generally called “evangélicos” (evangelicals). Mainstream media and everyday speech usually refer to them as “crentes” (believers), with a pejorative connotation. This so-called evangelical population, however, is heterogeneous from a theological, economic and political point of view. Historically, the first evangelicals arrived in Brazil when the ports were opened after the arrival of Dom João. This coincided with permission for religions other than Catholicism to settle in the Portuguese colony. However, this first wave of Protestants aimed only to serve the foreigners who arrived with the Portuguese court, and evangelization efforts were very timid. It was only in the 1910s that the first missionaries arrived in Brazil with the intention of evangelizing the population. The Assembly of God and the Christian Congregation were founded at that time (Mariano 2004).
The second significant wave of Protestants began to take root in the 1950s and 1960s. It started with the founding of the “Igreja do Evangelho Quadrangular” (Foursquare Gospel Church), followed by the “Brasil para Cristo” (Brazil for Christ), “Deus é Amor” (God is Love) and “Casa da Bênção” (House of Blessing) churches. The Foursquare Gospel Church in particular became very active. As Mariano attests, “besides the emphasis on healing, this Pentecostal aspect was noted for the intense use of the radio and the itinerant preaching with the use of canvas tents” (2004, 123). From the end of the 1970s onwards, the number of evangelicals grew significantly with the emergence of the so-called Neo-Pentecostal churches. Originating from Methodism, this new strand of Protestantism brought a new philosophy, the so-called Prosperity Theology. It moved away from Methodist asceticism towards a form of belief that valued material goods. In general terms, it holds that the greater a person’s faith, the greater their material prosperity. The traditional churches had their origins in European countries and the USA. This new strand of Protestantism, by contrast, grew out of churches shaped by Brazilian pastors, and some of these churches became corporations with enormous financial, media and political capacity. They are most commonly called Neo-Pentecostals. Among these churches, we should highlight the following:

- the “Igreja Universal do Reino de Deus (IURD)” (Universal Church of the Kingdom of God);
- the “Graça de Deus Internacional” (Grace of God International);
- “Sara Nossa Terra” (Heal Our Earth);
- “Renascer em Cristo” (Reborn in Christ).

Also according to Mariano, the IURD grew by 2,600% in the 1980s alone, and by the end of the 1990s it was estimated to have more than 2 million followers (2004, 125). The organization controls the second-largest TV station in Brazil and hundreds of radio concessions, and it has since expanded internationally. By 2001 it was already present in 80 countries, with more than 1,000 temples (Oro 2004).

4 Keeping this Weberian methodological choice in mind, we also ask about the relation between today’s capitalist ethos and the many Protestant groups in Brazil. We do so through a critical study of the political performance of the politicians who are supposed to represent them. A complementary methodological approach, then, is to relate ethos and ideology, in the sense of false consciousness: that is, morality as a lure to obtain votes. We will return to this point later.

5.0 The IBGE Census

The Brazilian Institute of Geography and Statistics (IBGE) is a public institute of the Brazilian federal administration and the country’s main provider of demographic data. As stated on its website, “such information meets the demands of several types of segments of civil society, as well as the bodies at the federal, state and municipal level”. One of its main missions is to conduct the National Census, held every 10 years. As far as our research could determine, only six countries in South, Central and North America include a question on religion in their National Census: Brazil, Canada, Mexico, Peru, Jamaica and Haiti. All other countries on the continent rely on private institutes to estimate the number of practitioners of each religion. The research focused on these three sub-continents only, because covering every country in the world would have been very time-consuming. An interesting case is the United States, where a specific law prohibits the National Census from conducting any inquiry about religion. The law was promulgated in 1950, shortly after the horrors of Nazism were revealed at the end of the Second World War. Concern that the State might distinguish its citizens on the basis of their religion may have motivated it.
However, our research found no evidence that the law was in fact motivated by the Holocaust. The Brazilian census has included a question on religion since 1872. Until 1970, Roman Catholicism reigned absolute, with 91.8% of the population declaring themselves Catholic. The 1991 Census, however, showed significant changes in religious composition for the first time, with growth in the number of respondents who declared themselves evangelical. In the 1980s, only 6.6% of the population declared itself evangelical. This share rose to 9.0% in the following decade and has continued to grow steadily ever since: according to the 2010 census, evangelicals made up 22.2% of the total Brazilian population. There was also slight growth among those declaring themselves Spiritists or followers of other religions. The most notable fact, however, is the steady decline of the Catholic population. In response to this new scenario, IBGE sought a more appropriate categorization of existing religions. It relied on the advice of ISER (Institute of Religious Studies), a Brazilian NGO specializing in religion.5 Founded in the 1970s, ISER has a tradition not only in the area of religion but also in the defence of human rights, public security and environmental issues. It regularly publishes two renowned peer-reviewed journals in the field, as well as books on religion and human rights. It also promotes ecumenical meetings among representatives of various religions. The scheme ISER developed starts with three main classes:

- Traditional Protestants: those of European and North American origin;
- Pentecostals: those created by Brazilian pastors;
- “Others”: those that fit neither of the first two categories.
The application of the categorization was much criticized at the time. ISER’s methodological choice centred on naming the churches, but census enumerators could not always identify the church properly. Because the census form posed an open question (“What is your religion?”) without offering a list of options, critics doubted how interviewees had answered and how precisely the enumerators had reproduced what was said (Camurça 2014). Given the controversy, ISER asked IBGE for the database of individual responses (even if anonymized) so that the accuracy of the collection could be assessed. IBGE, however, denied the request. The final categorization and the figures collected by the Census are summarized in Table 2. The table shows the Pentecostal churches named separately from the religions categorized as Churches of Mission; the latter are the churches that appear in both the CDD and the CDU. It is worth noting the large size of the “Evangelical undetermined” category, which represents more than 20% of all non-Catholic Christians. If we add the “Evangelicals of Pentecostal origin – others” category, the percentage rises to 33% of all Evangelicals. This figure points to great pluralism among evangelicals. One possible explanation is that many adherents of stigmatized groups chose not to declare their attachment to a specific church. This is all the more likely because the census period coincided with many corruption scandals involving evangelical politicians and with conflicts in the evangelical milieu (Sottani 2010).
Table 2: Protestant churches according to IBGE Census 2010

Churches of Mission | Population | %
Lutheran Church | 999,498 | 2.00
Presbyterian Church | 921,209 | 1.84
Methodist Church | 340,938 | 0.68
Baptist Church | 3,723,853 | 7.45
Congregational Church | 109,591 | 0.22
Adventist Church | 1,561,071 | 3.12
Other Mission Evangelical | 30,666 | 0.06
Churches of Mission Subtotal | 7,686,826 | 15.39

Evangelical Pentecostal | Population | %
Igreja Assembléia de Deus (Assembly of God) | 12,314,410 | 24.65
Igreja Congregação Cristã do Brasil | 2,289,634 | 4.58
Igreja o Brasil para Cristo | 196,665 | 0.39
Igreja Evangelho Quadrangular | 1,808,389 | 3.62
Igreja Universal do Reino de Deus | 1,873,243 | 3.75
Igreja Casa da Benção | 125,550 | 0.25
Igreja Deus é Amor | 845,383 | 1.69
Igreja Maranata | 356,021 | 0.71
Igreja Nova Vida | 90,568 | 0.18
Evangélica renovada undetermined | 23,461 | 0.05
Evangelical Community | 180,130 | 0.36
Other Evangelical Pentecostals | 5,267,029 | 10.54
Undetermined Evangelical | 9,218,129 | 18.45

Not identified as Evangelical or Catholic by IBGE | Population | %
Other Christian religiosities | 1,461,495 | 2.93
Church of Jesus Christ of Latter-day Saints | 226,509 | 0.45
Jehovah's Witnesses | 1,393,208 | 2.79

Grand Total | 49,962,264 | 100

6.0 Conclusion

In Brazil, a popular proverb says that religion, politics and soccer should not be discussed at the dinner table. Over the last 35 years, however, two of those controversial topics have merged in the country’s young democracy. Evangelical actors took a direct part in the turmoil that led to a presidential impeachment on questionable allegations and to the imprisonment of former president Lula da Silva. It is vital to understand more precisely who these actors are, which groups finance them, and how democracy can protect itself from those groups, in order to protect civil liberties and rights. It is vital to avoid the persecution of minorities, or even majorities, “deviant” from the set of conservative values rhetorically or effectively defended by “evangelicals”.
In Brazil, such persecution targets:
- religious minorities, especially those of African origin;
- indigenous peoples;
- non-heteronormative people and feminists;
- the arts in general and mass culture;
- the critical social sciences;
- even modern science, insofar as it contradicts more literal readings of the Bible (creationism versus evolutionism, for example).

It takes concrete form in:
- bills;
- censorship of plays and art exhibitions;
- changes to public school curricula (removing the compulsory study of African history, philosophy and sociology);
- the channelling of public advertising funds to friendlier media corporations;
- preaching to millions in churches and on radio and television stations owned or rented by these churches, among other things (Almeida 2019).

Moreover, the conservative moralist discourse of Protestant politicians mobilizes the affections of “evangelicals” in support of political groups whose economic policies have a clear neoliberal bias. These discourses were often widely spread, with funding from outside Brazil, in the form of bizarre fake news against political opponents through believers’ social networks (illegally, during the electoral period).6 The main point here is that, behind the moralism, the loosening of labour legislation in favour of employers and the dismantling of public services (from health to education and even water supply) actually contradict the concrete interests of these same believers, most of whom are workers. In other words, moralism acts as a kind of Trojan Horse, camouflaging the interests that are effectively at stake. This paper maps the construction of Protestant classification in several knowledge organization systems in order to understand how social researchers can critically use those classifications in their scientific research.
By comparing those classifications with the IBGE census, we could observe how closely the census classification adheres to the classifications used in globally relevant bibliographic systems.

6 We must note that in Brazil, broadcasting operations, even private ones, are public concessions. By law, they should be committed to the promotion of accurate information, “serious” and popular culture, and tolerance.

As we have seen, there are many possibilities for scrutinizing the classification of evangelical groups, and research on this subject can better clarify how this population moves among the various denominations in the evangelical field. In our ongoing research, we could also examine how censuses in countries outside the American continent deal with questions on religion, and how knowledge organization systems adhere to each other. Studying different classification schemes (CSs) on a theme as controversial as religion, across many countries and cultures, can open space for new intercultural and multifaceted classification schemes that are less ethnocentrically biased, which is what Otlet and Ranganathan aimed for.

References

Almeida, Ronaldo D. 2019. “Bolsonaro Presidente: Conservadorismo, Evangelismo e a Crise Brasileira.” Novos Estudos CEBRAP 38: 185–213.
Berger, Peter L. 1985. O Dossel Sagrado: Elementos para uma Sociologia da Religião. São Paulo: Paulinas.
Broughton, Vanda. 2000. “A New Classification for the Literature of Religion.” International Cataloguing and Bibliographic Control 29, no. 4: 59–61.
Camurça, Marcelo. 2014. “A Religião e o Censo: Enfoques Metodológicos uma Reflexão a Partir das Consultorias do ISER ao IBGE Sobre o Dado Religioso nos Censos.” In Religiões em Conexão: Números, Direitos, Pessoas, edited by Christina Vital da Cunha and Renata de Castro Menezes. Rio de Janeiro: ISER, Instituto de Estudos da Religião.
Carranza, Brenda, and Christina Vital da Cunha. 2018. “Conservative Religious Activism in the Brazilian Congress: Sexual Agendas in Focus.” Social Compass 65, no. 4: 486–502.
Freston, Paul. 2008. Evangelical Christianity and Democracy in Latin America. Evangelical Christianity and Democracy in the Global South. Oxford: Oxford University Press.
Idrees, Haroon, and Khalid Mahmood. 2009. “Devising a Classification Scheme for Islam: Opinions of LIS and Islamic Studies Scholars.” Library Philosophy and Practice (e-Journal), November.
Lenin, Vladimir I. 2011. Cadernos Sobre a Dialética de Hegel. Rio de Janeiro: Editora UFRJ.
Liebman, Robert C., John R. Sutton, and Robert Wuthnow. 1988. “Exploring the Social Sources of Denominationalism: Schisms in American Protestant Denominations, 1890–1980.” American Sociological Review 53, no. 3: 343–52.
Mafra, Clara. 2001. Os Evangélicos. Rio de Janeiro: Zahar.
Mariano, Ricardo. 1996. “Os Neopentecostais e a Teologia da Prosperidade.” Novos Estudos 44: 24–44.
Mariano, Ricardo. 2004. “Expansão Pentecostal no Brasil: O Caso da Igreja Universal.” Estudos Avançados 18, no. 52: 121–38.
McIlwaine, Ia C., and Joan S. Mitchell. 2006. “The New Ecumenism: Exploration of a DDC/UDC View of Religion.” Extensions & Corrections to the UDC 28: 9–16.
Oro, Ari Pedro. 2004. “A Presença Religiosa Brasileira no Exterior: O Caso da Igreja Universal do Reino de Deus.” Estudos Avançados 18, no. 52: 139–155.
Pew Research Center. 2010. “A Brief History of Religion and the U.S. Census.” Pew Research Center’s Religion & Public Life Project (blog).
Sottani, Silvânia M.P. 2016. Biopolítica dos Afetos: Alteridades Ressentidas e a Circulação da Intolerância. Rio de Janeiro: UFRJ.
Studwell, William E., Hong Wu, and Rui Wang. 1994. “Ideological Influences on Book Classification Schemes in the People’s Republic of China.” Cataloging & Classification Quarterly 19, no. 1: 61–74.
Weber, Max. 1999. Ética Protestante e o Espírito do Capitalismo. Santana de Parnaíba: Pioneira.
Yi-Yun Cheng – iSchool, University of Illinois at Urbana-Champaign, USA
Khanh Linh Hoang – iSchool, University of Illinois at Urbana-Champaign, USA
Bertram Ludäscher – iSchool, University of Illinois at Urbana-Champaign, USA

Cacao, Cocao, or Cocoa? Reconciliation of Taxonomic Names in Biodiversity Heritage Library

Abstract: The Biodiversity Heritage Library (BHL) currently hosts more than 150 thousand titles and 57 million OCR-scanned pages of biodiversity literature dating back to the 16th century. While considerable research effort has gone into extracting taxonomic names from BHL’s literature, the issue of name reconciliation has yet to be studied. Through the use case of Theobroma cacao, commonly known as the chocolate plant, this research presents a framework for reconciling species names in BHL by merging external taxonomies. We demonstrate this with a logic-based taxonomy alignment approach that matches variations of species and subspecies names of Theobroma cacao from four major biodiversity sources: the Encyclopedia of Life (EoL), the Integrated Taxonomic Information System (ITIS), the Global Biodiversity Information Facility (GBIF), and the United States Department of Agriculture PLANTS Database (USDA Plants).

1.0 Introduction

Consider the following hypothetical scenario. Charlie is a rising researcher in plant biodiversity with an interest in economic plants. She wants to find descriptions of such species and their family trees in historical texts. Determined to use the canonical online resource for biodiversity texts, the Biodiversity Heritage Library, Charlie starts with a quick search for cocoa, the ingredient in chocolate bars. The search returns two seemingly unrelated species names. She then turns to GBIF, another popular online resource for biodiversity information, to perform further searches, and finds that GBIF lists two names: Theobroma cocao and Theobroma cacao.
Unsure which name is the approved scientific name, she tries both and finds 15 other variations of scientific names for Theobroma cacao in BHL. These 15 names are listed alphabetically, but without further information on how the terms relate to each other. As this example shows, the exponential growth of biodiversity information and data has made it increasingly difficult to retrieve species reliably by species name. The Biodiversity Heritage Library (BHL) houses OCR-scanned pages of legacy biodiversity literature from natural history museum collections and other partnering institutions. This literature dates back to the 16th century, but key taxonomic papers are often obscured by evolving variations of species names. Efforts have been made to leverage natural language processing (NLP) and text mining methods to extract scientific names from the texts in BHL. Examples include a series of text mining tools (Page 2011, 2013), studies of taxonomic name recognition in texts (Wei, Heidorn, and Freeland 2010), and automated extraction of names from texts (Batista-Navarro et al. 2015). However, little has been done to organize and reconcile these scientific names into a better representation of species names. Through the use case of the species Theobroma cacao, commonly known as the chocolate plant, our research aims to develop a framework for reconciling species names in BHL by merging external taxonomies. We propose to employ a logic-based taxonomy alignment approach (Franz et al. 2015, 2016; Cheng and Ludäscher 2019) to match variations of species and subspecies names of Theobroma cacao from four major biodiversity sources: the Encyclopedia of Life (EoL), the Integrated Taxonomic Information System (ITIS), the Global Biodiversity Information Facility (GBIF), and the United States Department of Agriculture PLANTS Database (USDA Plants).
2.0 Related Work

2.1 Quality issues in aggregated databases such as GBIF and BHL

Data quality issues in aggregated biodiversity databases are not uncommon. Numerous studies have discussed these issues, as well as some practical solutions for aggregated databases. Parr et al. (2012) discuss how the rapid growth of biodiversity information from various repositories complicates the discovery and integration of knowledge into the “Tree of Life”, and call for a linked data solution to connect the different aggregated databases. Franz and Sterner (2018) point out how these aggregators underplay a “taxonomic backbone” in their system design and shift responsibility for data quality issues to the data source providers. The authors conclude that rather than correcting datasets at the source providers, offering services that enhance the taxonomic concepts in these systems would increase trust and collaboration between systematists and the aggregator communities. While discussions of data quality mostly revolve around aggregators such as GBIF, extracting scientific articles and finding species names in BHL have also proven to be unresolved concerns.

2.2 Extracting names in BHL

BHL uses the Global Names Recognition and Discovery (GNRD) service to analyze the OCR-scanned texts and identify any string that could potentially be a scientific name. Many studies have proposed using natural language processing (NLP) and text mining methods to enhance the recognition and extraction of scientific names from BHL. For instance, NLP techniques have been developed to support the extraction of taxonomic names and morphological characters from taxonomic descriptions (Thessen, Cui, and Mozzherin 2012). Page (2011, 2013) has developed a series of text mining tools (BioNames, BioStor) that aim to enhance the extraction of names and the retrieval ability of BHL.
Recently, Page (2019) proposed an approach to link taxonomic names from other databases to papers in BHL, in an attempt to leverage existing resources. These prior works show great potential for enhancing the extraction and recognition of names from BHL beyond its native service built on the GNRD API. However, how best to utilize, organize, and represent the information extracted from BHL warrants further research.

Figure 1. BHL’s current search results only return a flat structure of species names

2.3 Reconciliation of names

Name recognition and extraction are of growing interest in the context of BHL as a way to provide semantics for unstructured data (Wei, Heidorn, and Freeland 2010). Nevertheless, extracting names is only the first step towards retrieving species information from text; reconciling species names is the next unsettled territory for many biodiversity experts. Franz et al. (2016) reconciled 11 different taxonomies of the Andropogon complex spanning 126 years and concluded that the meanings of scientific names can change significantly over time. Analogously, the Knowledge Organization literature also emphasizes that names are a diachronic concern for biologists and information scientists alike (Blake 2011). Our prior work discussed the reconciliation of names for evolving, disputed geo-entities (Cheng and Ludäscher 2019). To date, BHL remains a cornucopia of historical taxonomic names and species information. However, the current organization of species taxonomies in BHL makes it hard to relate species names and to determine the credibility of species hierarchies. This paper proposes to reconcile species names from external taxonomies and to link this value-added information with information from BHL, in order to obtain a more informative and user-friendly presentation of search results.

3.0 Use Case: Theobroma cacao

We started with a single species to investigate the current taxonomic backbone of BHL.
The species we chose to examine is commonly known as the cacao (or cocoa) tree and is scientifically named Theobroma cacao. For the query Theobroma cacao, BHL performed a full-text search and returned about 4,000 results covering the publications relating to the search term, together with their metadata (title, author, date, page number). Fifteen scientific names were listed alphabetically on the BHL interface. However, how each term relates to Theobroma cacao (hierarchically, as a synonym, or as a sibling) remained unknown to users. For instance, clicking one of the 15 names, such as Cacao theobroma, makes BHL return a list of bibliographic records that contain the keyword. The current BHL structure is shown in Figure 1. Information about where species and higher taxa occur in a taxonomic hierarchy is essential for taxonomists, biodiversity experts, and researchers. We are therefore particularly interested in how BHL presents scientific names, and whether a better approach to organizing these names can be provided.

Figure 2. Our proposed taxonomic structure framework.

3.1 Method

We propose a framework to gather species taxonomies from external databases. Specifically, we compare and merge the species taxonomies of four major sources: EoL, GBIF, ITIS, and USDA Plants (Figure 2). In this study, we specifically investigate the inclusion of subspecies into the taxonomic backbone of BHL.

3.2 Data Collection

Our data were collected in December 2019 and are described below. The taxonomies of the four external databases are not mutually exclusive and may overlap with one another.

3.2.1 Encyclopedia of Life (EoL)

EoL is an aggregated database that annotates preferred scientific names, common names, and synonyms of species. Notably, EoL maintains curated dynamic hierarchies for each species.
EoL staff manually curate and edit the taxonomic information as needed when changes are suggested by biodiversity experts and similar communities.1

3.2.2 Global Biodiversity Information Facility (GBIF)

GBIF is one of the most popular sources of biodiversity information, including species occurrences, publications, peer-reviewed data, etc. GBIF publishes a backbone taxonomy that draws on taxonomies from EoL, the IUCN Red List, published papers, and more. The taxonomic information in GBIF is updated via an automated process, and the Catalogue of Life (CoL) is the primary source against which GBIF compares taxonomies.

3.2.3 Integrated Taxonomic Information System (ITIS)

The goal of ITIS is mainly to provide species names and taxonomic information. It contains species taxonomies approved by biodiversity experts and its Taxonomy Working Group (TWG). A search on any species returns a list of accepted or not accepted species names, followed by species hierarchies, expert references, or other sources.

3.2.4 United States Department of Agriculture PLANTS Database (USDA Plants)

The scope of USDA Plants is slightly narrower than that of the previously mentioned counterparts. A search on a species name leads users to the species classification and to the subordinate taxa, which document the species’ parents (genus, family) and subspecies.

Table 1 shows the subspecies we collected from the four external sources. We considered all infraspecific epithets (names) as children of a species. Species names containing markers such as subsp., ssp., infrasubsp., f., fm., and var. were therefore all grouped as subspecies in our analysis. In particular, we treated subsp. as equivalent to ssp. (subspecies), and f. as equivalent to fm. (form). Thus, if a database contains both Theobroma cacao ssp. cacao and Theobroma cacao subsp. cacao, we keep only one name for the purpose of analysis.

1 EoL information page:

Table 1. A list of subspecies in the four sources examined.
*Data were collected in December 2019. The prefix Theobroma cacao is omitted, and periods (.) are replaced with underscores (_).

3.3 Logic-based taxonomy alignment approach

3.3.1 Taxonomy

We define a taxonomy T as a hierarchical tree structure of terms (or names). Each node in the taxonomy has exactly one parent, except the root node, which has none. Sibling nodes typically represent disjoint taxa, i.e., two nodes on the same level of the tree are considered mutually exclusive. When considering the children of a node, we can either assume that all children are known (i.e., we know or assume that no other child nodes exist), or that there may yet be unknown children. In the latter case, we introduce a placeholder node called “other” to design for possible future changes (e.g., Theobroma_cacao_other).

3.3.2 Taxonomy Alignment Problem (TAP)

To compare two taxonomies T1 and T2, we first identify a set of articulations (relations) that describe how a concept X in T1 relates to a concept Y in T2. The region connection calculus RCC-5 (Randell, Cui, and Cohn 1992; Cohn and Renz 2008) defines five such articulations: equals, overlaps, disjoint, includes, and is_included_in. We then input these constraints to Euler/X, a Python-based tool built on Answer Set Programming, which returns the possible merged solutions for each pairwise comparison. Euler/X reports one of three outcomes:
(1) an inconsistent outcome with zero Possible Worlds (PWs) (n=0);
(2) a single, uniquely merged PW T3 (n=1), usually the desired outcome;
(3) multiple merged PWs T3 (n≥2), where each world is a possible reconciliation of how the two taxonomies can be merged.
A simple TAP is shown in Figure 3, where a few relations are marked as equivalent (‘=’) and the ‘other’ nodes are left with open relations. The resulting PW shows that all the child nodes in both taxonomies are equivalent (in grey boxes).
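The following Python sketch makes the notion of a Possible World concrete for the simplest TAP shape described above. Two parents are assumed equal, and each is partitioned into disjoint children that jointly cover it, with an “other” node absorbing anything unlisted. It is a brute-force stand-in written for illustration, not the Euler/X encoding, which uses Answer Set Programming. The functions rcc5 and possible_worlds are hypothetical, and the toy ITIS and EoL child lists are invented for the example rather than taken from Table 1. A world is represented as the set of non-empty intersections between children of T1 and children of T2, and each RCC-5 relation is read off from that set.

```python
from itertools import combinations, product

# The five RCC-5 articulations named in the text.
EQUALS, INCLUDES, IS_INCLUDED_IN, OVERLAPS, DISJOINT = (
    "equals", "includes", "is_included_in", "overlaps", "disjoint")


def rcc5(a, b, cells):
    """Read off the RCC-5 relation of child a (in T1) to child b (in T2).

    `cells` is one world: the set of non-empty intersections (a_i, b_j).
    """
    if (a, b) not in cells:
        return DISJOINT
    a_spills = any(x == a and y != b for x, y in cells)  # part of a lies outside b
    b_spills = any(y == b and x != a for x, y in cells)  # part of b lies outside a
    if a_spills and b_spills:
        return OVERLAPS
    if a_spills:
        return INCLUDES
    if b_spills:
        return IS_INCLUDED_IN
    return EQUALS


def possible_worlds(children1, children2, articulations):
    """Enumerate every merge of two sibling partitions of the same parent.

    Assumes the two parents are equal, siblings are disjoint and jointly
    cover their parent, and every child is non-empty. Returns a list of
    worlds (frozensets of non-empty cells); [] means inconsistent (n = 0).
    """
    all_cells = list(product(children1, children2))
    worlds = []
    for k in range(1, len(all_cells) + 1):
        for chosen in combinations(all_cells, k):
            cells = frozenset(chosen)
            # Each non-empty child must meet at least one child on the other side.
            covers_t1 = all(any(x == a for x, _ in cells) for a in children1)
            covers_t2 = all(any(y == b for _, y in cells) for b in children2)
            if covers_t1 and covers_t2 and all(
                rcc5(a, b, cells) == rel for (a, b), rel in articulations.items()
            ):
                worlds.append(cells)
    return worlds


# Hypothetical toy input (not Table 1): two subspecies asserted equal,
# one extra EoL subspecies and both "other" nodes left open.
itis = ["ITIS.cacao", "ITIS.sphaerocarpum", "ITIS.other"]
eol = ["EoL.cacao", "EoL.sphaerocarpum", "EoL.leiocarpum", "EoL.other"]
asserted = {
    ("ITIS.cacao", "EoL.cacao"): EQUALS,
    ("ITIS.sphaerocarpum", "EoL.sphaerocarpum"): EQUALS,
}

worlds = possible_worlds(itis, eol, asserted)
print(len(worlds))                                      # 1
print(rcc5("ITIS.other", "EoL.leiocarpum", worlds[0]))  # includes

# Two two-child partitions with every relation left open.
print(len(possible_worlds(["a", "o1"], ["b", "o2"], {})))  # 7
```

Under these assumptions, the toy input yields a single PW in which the extra EoL child falls under ITIS.other, the same kind of outcome the Results report for TEoL-TITIS. Leaving all relations open between two two-child partitions instead yields seven worlds. The search is exponential in the number of child pairs, so it is only practical for toy sizes.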
Details of the Euler/X tool workflow and descriptions of its implementation are given in Cheng et al. (2017).

Figure 3. Example of a TAP. Input figure (left). One Possible World (right).

4.0 Results

Table 1 shows that some of the subspecies names are equivalent but differ in their notation. Some use ssp. and others subsp. for subspecies, some use f. and others fm. for forms, and authorship spellings vary (e.g., A. Chevalier vs. A. Chev, L. vs. Linnaeus). In such cases, we regard the two names as equivalent. Table 1 also shows that ITIS and USDA Plants provide fewer subspecies names. Therefore, we begin our alignment with these two taxonomies, which are less complex than the others. One limitation of our taxonomy alignment approach is that only two taxonomies can be compared at once. We therefore executed six pairwise alignments in total, comparing each taxonomy with the other three (TITIS-TUSDA, TEoL-TITIS, TEoL-TUSDA, TGBIF-TITIS, TGBIF-TUSDA, TEoL-TGBIF).

4.1 TITIS-TUSDA

We begin our alignment with these two taxonomies. As shown in the PW of Figure 4, every node is considered congruent (in grey rounded boxes), meaning that each pair of names from the two taxonomies is equivalent. Instead of six pairwise alignments of the four taxonomies, we can therefore reduce the number of alignments to four. The merged Possible Worlds (PWs) of TEoL-TITIS and TEoL-TUSDA will have exactly the same structure, and so will those of TGBIF-TITIS and TGBIF-TUSDA.

Figure 4. Input alignment (top) and output Possible World (bottom) for TITIS and TUSDA

4.2 TEoL-TITIS, TEoL-TUSDA

Since TITIS and TUSDA are equivalent, here we show only the result for TEoL-TITIS (Figure 5). TEoL provides more subspecies information than either TITIS or TUSDA. As non-experts in biodiversity, we cannot assert how these subspecies relate to each other; therefore, we left their relations open in the input alignments.
As a result, all the extra subspecies in TEoL are inferred to be merged under “TCOther” in TITIS (and TUSDA) (Figure 5, PW). This indicates that TEoL may be a bigger taxonomy than the other two, and that TEoL.Theobroma cacao includes both TITIS.Theobroma cacao and TUSDA.Theobroma cacao.

Figure 5. Input alignment (top) and output Possible World (bottom) for TEoL and TITIS.

4.3 TGBIF-TITIS, TGBIF-TUSDA

Similar to the result in Section 4.2, many of TGBIF’s subspecies had no direct counterparts, so we left these nodes’ relations open without linking them to anything in TITIS or TUSDA. The merged PW likewise indicates that TGBIF is more granular in terms of subspecies, and that these subspecies should be merged under TITIS’s “Other” category (see the Appendix for visualizations of the Section 4.3 results).

4.4 TEoL and TGBIF

Figure 6 shows the two merged PWs of TEoL and TGBIF. Notably, for this pair of taxonomies we ended up with more than one PW, potentially due to the influence of the “other” category. The two PWs differ in one of two respects. In one, the infraspecies of TGBIF are either within or equivalent to TEoL.Other. In the other, TEoL Theobroma cacao ssp. leiocarpum Bernoulli Cuatrec is either within or equivalent to TGBIF.Other. At this point, it is difficult to discern which PW yields a more reliable merged solution; expert opinions are needed to proceed further in this case.

Figure 6. Possible Worlds for TEoL and TGBIF

5.0 Discussion and Conclusion

In this paper, we have presented a framework to reconcile taxonomies from different sources in BHL. Specifically, we have conducted six pairwise taxonomy alignments on four different taxonomies (ITIS, USDA Plants, EoL, and GBIF). As also shown in prior research (Franz et al. 2015, 2016; Cheng et al. 2017; Cheng and Ludäscher 2019), a logic-based approach to taxonomy alignment can align and merge different taxonomic perspectives into a solution that makes hidden relationships explicit.
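Simple combinatorics accounts for the six alignments, and for their reduction to four once TITIS and TUSDA are found congruent (Section 4.1). There are C(4, 2) = 6 unordered pairs of sources, and treating USDA Plants as interchangeable with ITIS collapses two pairs of alignments into one each. The short Python sketch below checks this count. The mapping congruent_to and the helper canonical are hypothetical names introduced only for illustration.

```python
from itertools import combinations

SOURCES = ["ITIS", "USDA", "EoL", "GBIF"]

# Every unordered pair of sources: C(4, 2) = 6 pairwise alignments.
all_pairs = list(combinations(SOURCES, 2))

# Hypothetical mapping: Section 4.1 found TITIS and TUSDA congruent, so an
# alignment against USDA has the same structure as one against ITIS.
congruent_to = {"USDA": "ITIS"}


def canonical(pair):
    """Rewrite a pair using one representative per congruence class."""
    return tuple(sorted(congruent_to.get(source, source) for source in pair))


# ("ITIS", "ITIS") stands for the ITIS-USDA alignment itself.
distinct = sorted({canonical(pair) for pair in all_pairs})

print(len(all_pairs))  # 6
print(distinct)        # [('EoL', 'GBIF'), ('EoL', 'ITIS'), ('GBIF', 'ITIS'), ('ITIS', 'ITIS')]
```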
In this paper, the merged Possible Worlds of the alignments mainly serve as a subspecies grouping mechanism that allows users to identify which taxonomies are more granular than others, as illustrated in Sections 4.2 and 4.3. The result in Section 4.1 also partially suggests that the PW may serve as a name disambiguation mechanism. There, the grey boxes that group equivalent terms show that Theobroma cacao ssp. cacao L may be synonymous with Theobroma cacao subsp. cacao L. Reserving a residual “Other” category for “designing for change” (Tennis 2012) is usually considered good practice in classifications. However, our results in Sections 4.2, 4.3, and 4.4 show that the PWs are partially influenced by this residual category: child nodes may be classified into the “Other” category, creating ambiguities in the alignment results. Moreover, our framework tries to minimize information overload during the alignment process in these aggregated databases. Interoperability endeavors such as taxonomy alignment, crosswalking, or ontology mapping rely substantially on human decisions, especially when the topic of alignment is domain-specific. Domain experts assert what kind of relation a concept in taxonomy A has with a concept in taxonomy B. In this paper, we attempted to reduce expert involvement at the initial stage by using a semi-automatic alignment process and incorporating existing external taxonomies. This is not to say that these taxonomies are the ground truth of what Theobroma cacao’s taxonomy should look like. Nor are the aligned species names definitive equivalences; often they are not equivalent, owing to evolving semantic changes (Franz et al. 2016). Rather, these merged solutions serve as interim knowledge organization systems pending further scrutiny by biodiversity and taxonomy experts.
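A purely string-level normalization could approximate, before alignment, the rank-marker equivalences from Section 3.2.4 (subsp. = ssp., fm. = f.) and the ssp./subsp. synonymy noted above. The Python sketch below illustrates this under stated assumptions and is not the authors’ procedure. The helpers name_key, dedupe and subspecies are hypothetical. The sketch simply ignores authorship strings when comparing names, which is coarser than the manual judgement described in Section 4.0 and could wrongly merge names that only authorship distinguishes. The sample names are illustrative variants, not the contents of Table 1.

```python
# Rank-marker spellings treated as equivalent, per Section 3.2.4.
RANK_ALIASES = {"subsp.": "ssp.", "fm.": "f."}
# Markers (after aliasing) that make a name a child of the species.
INFRASPECIFIC_MARKERS = {"ssp.", "infrasubsp.", "f.", "var."}


def name_key(name: str) -> tuple:
    """Comparison key: genus, species and, if present, the rank marker and
    infraspecific epithet. Authorship is ignored (a simplifying assumption)."""
    tokens = [RANK_ALIASES.get(token, token) for token in name.split()]
    genus, species = tokens[0], tokens[1]
    for i in range(2, len(tokens) - 1):
        if tokens[i] in INFRASPECIFIC_MARKERS:
            return (genus, species, tokens[i], tokens[i + 1])
    return (genus, species)


def dedupe(names: list[str]) -> list[str]:
    """Keep the first spelling encountered for each comparison key."""
    first_seen: dict[tuple, str] = {}
    for name in names:
        first_seen.setdefault(name_key(name), name)
    return list(first_seen.values())


def subspecies(names: list[str]) -> list[str]:
    """Deduplicated names that carry an infraspecific epithet."""
    return [name for name in dedupe(names) if len(name_key(name)) == 4]


raw = [
    "Theobroma cacao ssp. cacao L",
    "Theobroma cacao subsp. cacao L",
    "Theobroma cacao ssp. leiocarpum Bernoulli Cuatrec",
    "Theobroma cacao subsp. leiocarpum (Bernoulli) Cuatrec.",
    "Theobroma cacao L",
    "Theobroma cacao Linnaeus",
]
print(dedupe(raw))       # three names: one per distinct key
print(subspecies(raw))   # the two ssp. names
```

Ignoring authorship also merges Theobroma cacao L with Theobroma cacao Linnaeus, a pair that BHL itself appears to treat as identical. A rule this coarse would still need expert review before being applied at scale.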
Given the large amount of data in aggregated databases such as the Biodiversity Heritage Library, we believe using this approach to establish a minimal viable knowledge product first can be helpful for further efforts. In future work, we plan to extend this study mainly by (1) employing this framework to generate result for more species; and (2) implementing the merged PWs as a new species information representation structure that can be used alongside BHL and assessing its retrieval effectiveness. In the opening scenario, Charlie's searches actually further reveal that similar looking names in BHL such as Theobroma cacao L and Theobroma cacao Linnaeus yield identical search results with 2,733 records. This may suggest that BHL has an internal infrastructure that recognizes and organizes keywords together. We also hope to explore this and to incorporate parents (genus, family, order, or class-level), siblings (other species within the same genus), synonyms, and common names into the taxonomies to form more comprehensive species hierarchies in our framework. Ultimately, we hope to continue conversations with BHL on improving the practices of organizing species names and name reconciliation services. Acknowledgement This project is supported by the Center for Informatics Research in Science and Scholarship (CIRSS) at iSchool, University of Illinois at Urbana-Champaign. The first author 96 would also like to acknowledge the LEADS-4-NDP fellowship program, Dr. Jane Greenberg, and Steven Dilliplane for their ongoing support of this work. References Blake, James. 2011. “Some Issues in the Classification Of Zoology.” Knowledge Organization 38: 463–472. Cheng, Yi-Yun., Nico Franz, Jodi Schneider, Shizhuo Yu, Thomas Rodenhausen, and Bertram Ludäsche. 2017. “Agreeing to Disagree: Reconciling Conflicting Taxonomic Views Using a Logic‐Based Approach.” Proceedings of the Association for Information Science and Technology 54: 46-56. Cheng, Yi-Yun and Bertam Ludäscher. 2019. 
“Exploring Geopolitical Realities through Taxonomies: The Case of Taiwan.” NASKO 7: 77-93. Cohn, Anthony G. and Jochen Renz. 2008. “Qualitative Spatial Representation and Reasoning.” In Handbook of Knowledge Representation, edited by Frank van Harmelen, Vladimir Lifschitz, and Bruce Porter. Amsterdam: Elsevier, 551-596. Franz, Nico M., Mingmin Chen, Shizhuo Yu, Parisa Kianmajd, Shawn Bowers, and Bertram Ludäscher. 2015. “Reasoning Over Taxonomic Change: Exploring Alignments for the Perelleschus Use Case.” PloS One, 10, no. 2: e0118247. Franz, Nico M., Mingmin Chen, Parisa Kianmajd, Shizhuo Yu, Shawn Bowers, Alan S. Weakley, and Bertram Ludäscher. 2016. “Names Are Not Good Enough: Reasoning Over Taxonomic Change in the Andropogon Complex.” Semantic Web 7, no.6: 645–667. Franz, Nico M. and Beckett W. Sterner. 2018. “To Increase Trust, Change the Social Design Behind Aggregated Biodiversity Data.” Database 2018. Page, Roderic D.M. 2011. “Extracting Scientific Articles from a Large Digital Archive: Biostor and the Biodiversity Heritage Library.” BMC Bioinformatics 12: 187 Page, Roderic D.M. 2013. “Bionames: Linking Taxonomy, Texts, and Trees.” PeerJ 1: e190. Page, Roderic. 2019. “Text-Mining BHL: Towards New Interfaces to the Biodiversity Literature.” Biodiversity Information Science and Standards 3: e35013. Parr, Cynthia S., Robert Guralnick, Nico Cellinese, and Roderic D.M. Page. 2012. “Evolutionary Informatics: Unifying Knowledge about the Diversity of Life.” Trends in Ecology & Evolution 27, no. 2: 94-103. Randell, D. A., Zhan Cui, and Anthony G. Cohn. 1992. “A Spatial Logic Based on Regions and Connection.” Knowledge Representation and Reason 92: 165-176. Tennis, Joseph T. 2012. “The Strange Case of Eugenics: A Subject’s Ontogeny in a Long-Lived Classification Scheme and the Question of Collocative Integrity.” Journal of the American Society for Information Science and Technology 63, no. 7: 1350–1359. Thessen, Anne E., Hong Cui and Dmitry Mozzherin, Dmitry, 2012. 
Appendix 4.3. Result for TGBIF and TITIS (Top: input taxonomies; Bottom: PW). Result for TGBIF-TUSDA (Top: input taxonomies; Bottom: PW).

Stephanie Colombo – Universidad de la República, Facultad de Información y Comunicación, Uruguay

Representation and Misrepresentation in Knowledge Organization: The Cases of Bias

Abstract: The concepts of representation and misrepresentation are recognized and used within Knowledge Organization. Works dealing with the term representation are easy to find in the literature, but few address the term misrepresentation from a conceptual perspective. Misrepresentation, as opposed to representation, has a negative connotation and is considered a problem in Knowledge Organization. Misrepresentation can appear in any kind of knowledge organization system, and some systems have historically been criticized for failing to represent certain sectors of users. Nevertheless, the term as such has gained strength in recent years, after the appearance of theories that support local rather than universal systems. In this work the concepts of representation and misrepresentation will be approached from a conceptual perspective and related to other terms within Knowledge Organization, such as bias. In particular, positive bias (or slant) will be related to representation, and negative bias will be treated as a form of misrepresentation. These relationships will be approached from the perspective of warrants, especially cultural warrant.
1.0 Introduction

The purpose of this paper is to provide an overview of the relationship between the terms representation and misrepresentation, and of their relationship with the term bias, specifically the terms positive bias and negative bias. These terms are recognized and used within the Knowledge Organization literature, but it is not easy to find a conceptual approach to them or precise definitions of them. Dubuc (1999, 33) claims that “the situation in which the terms are found condition the concept at the communication field to such an extent that [even] the same concept may receive different names or labels depending on the specialty in which it is used”. The reverse also occurs: the same denomination may correspond to more than one concept. “[A] concept exists within personal knowledge structures, in one or more minds. The concept can be represented by a symbol, such as a word or string of words, that may be uttered or recorded (e.g. written). The symbol thus indirectly represents the referent. The same referent may give rise to varied concepts in different minds. The same concept may be represented by different symbols.” (Vickery 1986, 146) As Gutiérrez Rodilla (2005, 10) states, “Language is, […], a constitutive part of science. Therefore, it is impossible to learn science without knowing the language in which that science is expressed and without knowing how to correctly interpret its speech.” It is therefore necessary to use the exact terms of a specific field to ensure clear and coherent communication and to avoid ambiguities and misinterpretations. The structure of this paper is as follows. In the following section, the concepts of representation and misrepresentation will be developed from a conceptual approach. In the section titled ‘Word vs. term’, a terminological perspective on these terms will be provided.
Pointing out the main differences between word and term will help clarify the differences between the conceptual approaches, principally the difference between the word ‘misrepresentation’ and the term ‘misrepresentation’. In the section after that, the terms representation and misrepresentation will be presented from the point of view of the term bias, in particular negative bias and positive bias. After that, the relationship among these terms will be discussed in relation to warrants and hospitality, more precisely cultural warrant and cultural hospitality.

2.0 The concept of representation and misrepresentation

The way in which the terms representation and misrepresentation appear in the professional literature on Knowledge Organization is quite distinctive. The term representation is commonly used in syntagmatic forms such as subject representation (Milani, Guimarães, and Olson 2014; Olson 2002) and knowledge representation (Milani and Guimarães 2011), among others. In the case of subject representation, the meaning refers to the representation of topics in catalogues through indexing and classification. In other words, it refers to how a work’s content is represented through topics or descriptors from indexing and through class numbers or notations from classification. According to Olson (2002, 3), they are “the key to subject access”. In the book The Power to Name (Olson 2002), the author develops the concept of representation through the action of assigning names to topics. She chooses the expression ‘naming’ because this action reflects a conscious will at the moment of representing concepts. She says “I choose the word naming because it connotes the power of controlling subject representation and, therefore, access.” (Olson 2002, 4) By preferring a name (a term or word) to represent a concept, one establishes an identity for that concept. This identity is biased: it embodies a particular way of recognizing and observing that reality.
Knowledge Representation can refer either to an autonomous discipline distinct from Knowledge Organization (Giunchiglia, Dutta, and Maltese 2014) or to a subordinate part of Knowledge Organization (Barité et al. 2015; Dahlberg 1993). Giunchiglia et al. (2014, 47) say that, through ontologies, Knowledge Representation “provides a more expressive representation and query language, able to codify and automatically query such knowledge”. On the other hand, Barité et al. (2015, 136) describe Knowledge Representation as “the group of processes of notational or conceptual symbolization of human knowledge in the field of any discipline. Knowledge Representation includes Classification, Indexing and the group of computer and linguistic aspects related to the symbolic translation of knowledge.” All the cases mentioned above share the semantic background of the concept of representation. Representation can be understood as “the description or portrayal of someone or something in a particular way.” (Representation 2019). Its meaning implies presenting something, for instance a concept, again or in a different way. In all these cases, the term representation has a positive connotation in Knowledge Organization, i.e., a given concept or phenomenon is correctly represented. In contrast, the term misrepresentation appears in isolation rather than in syntagmatic forms. It usually accompanies the term representation in any of its forms (syntagmatic or isolated) and is used as an antonym to reinforce the idea of representation. It always appears as a representation problem (Milani and Guimarães 2010). The use of the term misrepresentation is so closely linked to the use of representation that its very origin seems to derive from the latter. To understand this better, it is necessary to explain the differences between word and term.

3.0 Word vs. term

Although word and term may at first seem identical, they differ in their concepts and uses. “A word is a unit described by a set of systemic linguistic characteristics and endowed with the property of referring to an element of reality.” (Cabré 1999, 25, in translation). A term, by contrast, is “a unit of similar linguistic characteristics, used in a speciality domain. From this point of view, a word that is part of a specialized field would be a term.” (Cabré 1999, 25) Thus, the principal difference between words and terms is their field of use. It is speakers who assign the label ‘word’ or ‘term’, based on different contexts and through use. Pearson claims: “While we accept that there are indeed differences between words and terms, we find that, without human intervention, it is not possible to use any of the proposed definitions of term as a means of distinguishing between terms and words. This is because terms very often look the same as words and frequently not only look the same as words but can also function as words, albeit in different circumstances.” (Pearson 1998, 8) There are different definitions of words and terms and different ways of differentiating them. Cabré (1999), for example, mentions four criteria by which words can be differentiated from terms:
a) by their users: words are used by any speaker of the language, while terms are used by specialists or experts in a certain field of knowledge.
b) by the situation in which they are used: words are used in any form of communication, while terms appear in more formal channels of communication.
c) by the topic they represent: terms usually refer to concepts within a specific field, while words are used to refer to a wider variety of meanings.
d) by the kind of discourse in which they usually appear: words are used in any type of discourse, while terms usually appear in specialised discourse and its channels of diffusion.
In this way, expressions that belong to a specific field and are used to refer to a particular concept within a particular domain are considered terms. There are different ways of creating terms and words. One of them is when a word or term moves into the other’s category; in other words, words turn into terms or terms become words. These processes are called terminologization (Gutiérrez Rodilla 2005) and de-terminologization (Meyer and Mackintosh 2000). As Gutiérrez Rodilla states: “the terms enjoy great mobility, both horizontally - that is to say, they move from one area of knowledge to another, with the same or different meaning - and vertically - even the most highly specialized can become words used daily by all speakers.” (Gutiérrez Rodilla 2005, 29) It seems natural to regard terminologization as the way in which representation was coined in Knowledge Organization terminology. As explained in the previous section, the word and the term ‘representation’ share the same semantic background. However, this is not the case with misrepresentation. The word misrepresentation is defined as follows: “the act of deliberately giving false information to someone, especially in order to persuade them to enter into a contract, or a statement giving false information.” (Misrepresentation 2019) Although both the term and the word have negative connotations, the definition of the word ‘misrepresentation’ implies a deliberate and conscious action: misrepresentation is created in order to persuade somebody or to pursue something. This is not always the case in Knowledge Organization. A knowledge organization system can contain misrepresentation even if its author had no intention of misrepresenting anything.
The reasons for this may be a lack of knowledge of other realities or ways of presenting concepts, the way in which the author perceives reality or the view predominant for that author, or simply the fact that the system is representative of one community but not of another. Another indication is that, as mentioned before, the term misrepresentation usually appears together with the term representation. The English prefix ‘mis-’ has three senses: “1 bad or badly; 2 wrong or wrongly; 3 used to refer to an opposite or the lack of something” (Mis- 2019). In Knowledge Organization, the term misrepresentation covers all three: a knowledge organization system can present misrepresentation through a bad representation, a wrong representation, or a lack of representation. Taking all this into consideration, misrepresentation as a term does not seem to come from a terminologization process, as the term representation does. It could instead be inferred that misrepresentation was created in contrast to the term representation.

4.0 Bias as a form of representation and misrepresentation

Several authors recognize the existence of bias in knowledge organization systems (Higgins 2012; Mai 2010). Despite this, few of them identify the different aspects of the term and connect it with the concepts of representation and misrepresentation. As Broughton says: “bias is said to exist when a controlled vocabulary contains an unduly large number of terms reflecting the ideas, interests or positions of a particular sector or field, or when terms relevant to another sector or field fail to appear. This may occur because the language of a particular group is preferred.” (Broughton 2012, 256) Two situations can be distinguished on the basis of the concept of bias. In the first, a way of representing reality is relevant and beneficial for a certain community.
This bias is called positive bias (Colombo 2015; Colombo and Barité 2015) or slant (Guimarães 2017). The second situation occurs when a bias does not represent ideological and cultural particularities and fails to represent concepts, in some cases amounting to prejudice. This is called negative bias (Colombo 2015; Colombo and Barité 2015). Using the domain of Virtual Reality (VR) as an example of the concept of bias in relation to representation, Brey (1999, 12) says “when a VR application favours certain values or interests over others due to its choices in representation, it may be said that the model makes use of biased representations.” Consequently, a knowledge organization system that represents a specific group or way of thinking is a biased system from that point of view. On the other hand, a system that misrepresents, whether because of a gap in its representation or because concepts are not properly represented, has a negative bias. Following the example of Virtual Reality: “When a VR application fails to uphold accepted standards of accuracy by representing features as real that by such standards cannot justifiably be held to be present in reality or by failing to represent features that ought to be present in the application, we may say that the application misrepresents reality” (Brey 1999, 11). Whereas Brey identifies both cases, ‘biased representation’ and ‘misrepresentation’, as “two types of representational failures or shortcomings” (Brey 1999, 12), this is not the case in Knowledge Organization, where bias is not necessarily a failure. In recent years, Knowledge Organization has focused more on local systems than on pursuing universality. Accordingly, it may be helpful for knowledge organization systems to be positively biased.
In Mai’s words, “while modern classification aims at representing the universe of knowledge, postmodern classification aims at providing a pragmatic tool for specific domains.” (Mai 2004, 39) The issue is to decide for which sector of the community the bias is representative and for which it is not, and how to detect these particular characteristics. “Verifiable misrepresentation requires that there are unambiguous, shared standards of accuracy in place according to which judgments of misrepresentation can be made.” (Brey 1999, 11) One way of determining the cultural aspects of a sector is through cultural warrant.

5.0 Cultural warrant and cultural hospitality in relation to representation

The way in which a certain group is represented in a knowledge organization system can be determined through warrants. Depending on what needs to be represented, or on the bias sought, the warrant may be literary, academic or cultural, among others. Literary warrant is based on documents, academic warrant on the opinion of experts, and cultural warrant, in particular, “means that any kind of knowledge representation and/or organization system can be maximally appropriate and useful for the individuals in some culture only if it is based on the assumption, values, and predisposition of that same culture. Conversely, if a system is not based on those assumptions, it will be appropriate and useful to some lesser extent for the individuals in the culture” (Beghtol 2002, 511). It is important to bear in mind that these warrants can be combined and are not mutually exclusive. For more on warrants, see Barité (2018). Related to the concept of ‘cultural warrant’ is the concept of ‘cultural hospitality’, a derivation of the term ‘hospitality’.
The concept of hospitality implies that a knowledge organization system is capable of introducing a new concept or term into its structure. The system must provide tools not only for including an element, but also for establishing relationships between elements, resulting in more permeable and less rigid systems. Cultural hospitality in particular “means that a knowledge representation and organization system can ideally accommodate the various warrants of different cultures and reflect appropriately the assumption of any individual, group, or community.” (Beghtol 2005, 905) In any case, the system has to state clearly not only how new concepts are introduced, but also for whom it is designed, in other words, for which user community it is most representative.

6.0 Conclusion

All things considered, the terms ‘representation’ and ‘misrepresentation’ are closely related to the terms ‘positive bias’ and ‘negative bias’. In this context, bias is considered a form of representation. Moreover, it is not possible to think about representation without thinking about cultural warrant as a means of ensuring a correct and better representation in each situation.

References

Barité, Mario. 2018. “Literary Warrant.” Knowledge Organization 45: 517–536.
Barité, Mario, Stephanie Colombo, Amanda Duarte Blanco, Lucía Simón, Gabriela Cabrera Castromán, María Luisa Odella, and Mario Vergara. 2015. Diccionario de Organización del Conocimiento: Clasificación, Indización, Terminología (6th ed.). Montevideo: CSIC.
Beghtol, Clare. 2002. “A Proposed Ethical Warrant for Global Knowledge Representation and Organization Systems.” Journal of Documentation 58: 507–532.
Beghtol, Clare. 2005. “Ethical Decision-Making for Knowledge Representation and Organization Systems for Global Use.” Journal of the American Society for Information Science and Technology 56, no. 9: 903–912.
Brey, Philip. 1999. “The Ethics of Representation and Action in Virtual Reality.” Ethics and Information Technology 1, no. 1: 5–14.
Broughton, Vanda. 2012. Essential Library of Congress Subject Headings. London: Facet Publishing.
Cabré, Maria Teresa. 1999. La Terminología: Representación y Comunicación: Elementos Para una Teoría de Base Comunicativa y Otros Artículos. Barcelona: Institut Universitari de Lingüística Aplicada.
Colombo, Stephanie. 2015. “Sesgo y Universalidad: Un Enfoque Histórico-Conceptual.” In Organización del Conocimiento: Sistemas de Información Abiertos: II Congreso ISKO España-Portugal – XII Congreso ISKO España. Murcia: Universidad de Murcia, 598–602.
Colombo, Stephanie and Mario Barité. 2015. “Tres Enfoques de Bias en Organización del Conocimiento: Bias Neutro, Bias Negativo y Bias Positivo.” Brazilian Journal of Information Science: Research Trends 9, no. 2: 9–13.
Dahlberg, Ingetraut. 1993. “Knowledge Organization: Its Scope and Possibilities.” Knowledge Organization 20: 211–222.
Dubuc, Robert. 1999. Manual Práctico de Terminología (3rd ed.). Santiago de Chile: Unión Latina.
Giunchiglia, Fausto, Biswanath Dutta, and Vincenzo Maltese. 2014. “From Knowledge Organization to Knowledge Representation.” Knowledge Organization 41: 44–56.
Guimarães, José Augusto Chaves. 2017. “Slanted Knowledge Organization as a New Ethical Perspective.” In The Organization of Knowledge: Caught between Global Structures and Local Meaning, edited by Jack Andersen and Laura Skouvig. Emerald Publishing, 87–102.
Gutiérrez Rodilla, Bertha. 2005. El Lenguaje de las Ciencias. Madrid: Editorial Gredos.
Higgins, Colin. 2012. “Library of Congress Classification: Teddy Roosevelt’s World in Numbers?” Cataloging & Classification Quarterly 50, no. 4: 249–262.
Mai, Jens-Erik. 2004. “Classification in Context: Relativity, Reality, and Representation.” Knowledge Organization 31: 39–48.
Mai, Jens-Erik. 2010. “Classification in a Social World: Bias and Trust.” Journal of Documentation 66: 627–642.
Meyer, Ingrid and K. Mackintosh. 2000. “When Terms Move into Our Everyday Lives: An Overview of De-terminologization.” Terminology 6, no. 1: 111–138.
Milani, Suellen Oliveira and José Augusto Chaves Guimarães. 2010. “Bias in the Indexing Languages: Theoretical Approaches about Feminine Issues.” In Paradigms and Conceptual Systems in Knowledge Organization: Proceedings of the Eleventh International ISKO Conference 23-26 February 2010, Rome, Italy, edited by Claudio Gnoli and Fulvio Mazzocchi. Advances in Knowledge Organization 12. Würzburg: Ergon Verlag, 424–429.
Milani, Suellen Oliveira and José Augusto Chaves Guimarães. 2011. “Biases in Knowledge Representation: An Analysis of the Feminine Domain in Brazilian Indexing Languages.” NASKO 3: 94–104.
Milani, Suellen Oliveira, José Augusto Chaves Guimarães, and Hope A. Olson. 2014. “Bias in Subject Representation: Convergences and Divergences in the International Literature.” In Knowledge Organization in the 21st Century: Between Historical Patterns and Future Prospects: Proceedings of the Thirteenth International ISKO Conference 19-22 May 2014, Kraków, Poland, edited by Wieslaw Babik. Advances in Knowledge Organization 14. Würzburg: Ergon Verlag, 335–342.
Mis-. 2019. In Longman Dictionary.
Misrepresentation. 2019. In Longman Dictionary.
Olson, Hope A. 2002. The Power to Name: Locating the Limits of Subject Representation in Libraries. Canada: Springer.
Pearson, Jennifer. 1998. Terms in Context. Amsterdam: John Benjamins.
Representation. 2019. In Lexico.
Vickery, B. C. 1986. “Knowledge Representation: A Brief Review.” Journal of Documentation 42: 145–159.
Giulia Crippa – University of Bologna (Ravenna Campus), Italy
Andre Vieira de Freitas Araujo – Federal University of Rio de Janeiro (Rio de Janeiro), Brazil

Order of Knowledge, Selection and Bibliographical Tension in the 16th Century: Between Gesnerian Universality and Possevinian Anti-Heretism

Abstract: Our work is guided by the construction of the genealogy of knowledge organization (KO). Our discussion is framed by the constitutive elements of modern bibliographical principles. In the history of cultural circulation after the invention of the printed book, at least two interpretative models of the function of culture and of the representation of knowledge in society can be observed. On the one hand, a lay principle develops according to which man finds his dignity in the rational search for truth. On the other hand, the antagonistic side develops the dogmatic view of those who consider themselves representatives of unshakable certainties that, from the perspective of a superior good, impede the freedom of choice of individuals. Among the first great representatives of these two models, Conrad Gesner, in his secular-minded compilation, embodies the first, while Antonio Possevino proposes a bibliography rigidly configured to support the spirit of the Catholic Counter-Reformation (Serrai and Sabba 2005). From a historical-bibliographic approach intertwined with the ongoing debates in KO, the objective of this study is to offer a brief comparison between the Gesnerian Bibliotheca and the Possevinian Bibliotheca in terms of the selection and bibliographical tension, under the aegis of universality and anti-heretism, that delineate the order of knowledge in the 16th century. The study indicates that Possevino is one of the greatest representatives of a project of theological and moral control over access to knowledge, exercised through bibliographical control (Santoro and Orlandi 2006).
In this context, knowledge is evaluated through a filter that rejects authors who do not adhere to Catholic dogma, which is why the Index Librorum Prohibitorum was instituted, condemning authors such as Giordano Bruno, Copernicus, Galileo and Gesner himself. On the other side of the reflection on knowledge and information between the 16th and 17th centuries stands the view of authors whose roots lie in a more properly humanistic culture. Genealogically, in bibliographical terms, it is Gesner and his work that head this view, which tends to be delineated as lay knowledge. In comparative terms, as Serrai (1993) states, while Conrad Gesner builds a Bibliotheca that was Universalis, with Science, Nature and Theology as its main roads, Antonio Possevino draws up his own Bibliotheca, which was Selecta, as a map of knowledge protected, guaranteed and made free of danger by orthodoxy and morals. In Gesner there is absolute confidence in science, whereas in Possevino a recurring anthropological skepticism and suspicion about the value and innocence of science reappears. Indeed, the selection and the bibliographical tension between the two Bibliothecae, under the aegis of universality and anti-heretism, become the key not only to understanding but also to delineating the order of knowledge in the 16th century. In the long-term perspective, they are horizons that reveal the historical relations between knowledge, its control, its access and its organization.

1.0 Introduction

Our work is guided by the construction of a genealogy of knowledge organization (KO), an archeology in which each stratum reveals an epoch.
Since complex bibliographical repertoires do not seem relevant for medieval libraries, where relatively few manuscript materials were available, one has to turn to Trefler and, especially, Gesner to find a substantial difference in bibliographical principles. This difference can be traced through a genealogical study of bibliographical treatises. Such a study has hermeneutic potential, provided that one follows the criterion of evaluating the relationship between what had to be faced and the way of dealing with it, that is, the relationship between problems in search of a solution and the solutions offered. This makes it necessary to accept the phenomenological polymorphism of library reality throughout its history, because any imposed linearity and coherence turns out to be false. To avoid conceptual errors, the requirements of scientific explanation must be limited to the observation of relationships and nothing else, relying on a single theoretical core: the individualization and functions of the relations between index and catalog, which constitute the critical matter and interpretative conditions of Bibliography. Bibliography represents information, which makes it necessary to recognize the existence of logics and mediation procedures via indexes and catalogs. In this sense, our discussion is tied to the constitutive elements of modern bibliographical principles. With the printed production of information in books, Bibliography acquires an essential role in reformulating the structure of the library itself. In fact, the modern library reformulates itself because there is a change in the structures of knowledge and, therefore, in the logic of information organization, mainly through bibliographical and catalog production.
The bibliographical structure, destined to become the physical structure of libraries, is based, in our view, on a new settlement among the “parties” involved in the discussion about knowledge. We speak of “parties” because the univocal voice of the medieval Christian world is already fragmented from the perspective of Renaissance humanism. The Renaissance brought, as a result, the formation of Protestant churches in the religious field, as well as the empirical-experimental foundation of the first claims of modern science. The result of all this is the strong reaction of the Catholic Church, which reformulated itself at the Council of Trent (ended in 1563), from which the characteristics of the actors of a new dialectic of knowledge emerge. As Balsamo (2017) writes, there is in fact a genealogy underlying statements about the organization of and access to information, based on the same questions we ask ourselves today: for what and for whom does one select, order and allow access to knowledge? In the history of cultural circulation after the invention of the printed book, at least two interpretative models of the function of culture in society can be observed. On the one hand, a lay principle develops according to which man finds his dignity in the responsible and rational search for truth, in the attentive and inexhaustible search for an understanding of the reality around us, of which we are constitutive elements. On the other hand, the antagonistic side develops the dogmatic view of those who consider themselves representatives of unwavering certainties that, from the perspective of a superior good, impede individuals' freedom of choice, since orthodox interpretation is guaranteed by official institutions whose other task is to control its dissemination.
Among the first great representatives of these two models, Conrad Gesner, in his “universal” compilation of secular spirit (Bibliotheca Universalis), embodies the first model, while, on the opposite side, Antonio Possevino proposes a bibliography (Bibliotheca Selecta) rigidly configured to support the spirit of the Catholic Counter-Reformation (Serrai and Sabba 2005). From a contemporary perspective, we can engage in an imaginative exercise and situate this contrast as a kind of precursor to the tension between “universal” and “domain-oriented” KOSs, which has been the object of relevant discussions in the last decade, notably between Hjørland and Szostak. For Hjørland (2017), for example, KOSs are based on the social relations established by specific domains, which is in line with the historical experiences that support our study. Naturally, in the context of this work, we are positioned in the 16th-century timespace, whose informational, philosophical, social and cultural particularities must be rigorously considered and relativized when observing how knowledge was produced in specific “domains” and specific “discursive communities”. From this historical-bibliographic approach intertwined with the ongoing debates in KO, the objective of this study is to offer a brief comparison between the Gesnerian Bibliotheca and the Possevinian Bibliotheca in terms of the selection and bibliographical tension, under the aegis of universality and anti-heretism, that delineate the order of knowledge in the 16th century.

2.0 Conrad Gesner and Bibliotheca Universalis

As is well known, Conrad Gesner (1516–1565) was a Swiss scholar, scientist and bibliographer. He was educated in many different cities, such as Zurich, Bourges, Paris, Montpellier, Basel and Strasbourg. Ulrich Zwingli (1484–1531) was Gesner’s spiritual guide, intellectual reference, tutor and financial supporter.
Zwingli had brought together and harmonized the principles of Christian theology with the exercise of human reason and the intellectual heritage of classical pagan civilization. From this perspective, at the center of Zwingli’s thought was God conceived as truth and supreme good, who had given everyone the possibility of accessing truth and salvation from the moment of creation and not, as both Catholics and Protestants believed, thanks to the subsequent incarnation (Sabba 2012). Gesner adopted this “Zwinglian theological vision”, a particular current of the Protestant Reformation, distinct from Lutheranism and Calvinism, which had been shaped and guided precisely by the head of the Zurich Church (Sabba 2012). A quintessential Renaissance “polymath”, Gesner was able to articulate and discuss numerous areas of knowledge, publishing books on topics as varied as linguistics, medicine, theology, botany, zoology, paleontology, mineralogy and bibliography. Bay (1916, 54-55) summarizes Gesner's relationship with his time and with knowledge: “Gesner belonged to a period in the history of science distinguished for magnificent scholarship and elaborate method. His period of development and maturity was the ripening period of the Reformation. It was no rare occurrence that a man made himself master of the essentials of all knowledge thus far accumulated. […] Gesner had that peculiar ingenium which marshals both wisdom and knowledge”. Conrad Gesner's intellectual maturity and methodological rigor are consolidated in his Bibliotheca Universalis. A seminal work in the field of Bibliography, the Bibliotheca consists of an alphabetical-nominal part called Bibliotheca Universalis (1545) (Figure 1) and the Pandectae (1548, 1549), which is ordered systematically by the semantic content of the works. Bibliotheca Universalis (1545) is an alphabetical-nominal catalog that lists 5,031 authors and around 15,000 works in Latin, Greek and Hebrew.
It is organized in alphabetical order by the author's first name and presents a summary of and extracts from the documents listed. It is a record that represents, above all, the literary heritage of Western culture.

Figure 1 - Bibliotheca Universalis, 1545. Source: Gesner (1545).

The Bibliotheca's authorial scope includes both erudite and non-erudite authors. For Gesner, everyone deserved to be remembered; therein lies a relevant aspect of his project, namely the possibility of giving voice to unknown authors, which makes the Bibliotheca a device of wide dissemination. It is interesting to note Gesner’s refusal to discriminate, grounded in his humanist stance, which leaves the reader to evaluate and even judge the sources. As Gesner says: "I wanted to report so much, but I left the selection and judgment of the books to others" (Gesner 1545, 3v). His deliberate choice to treat all authors as worthy of memory points to a criterion he adopted: a supposed documentary impartiality. According to Serrai (1990, 82): “With this criterion of absolute documentary impartiality, Gesner planted another pillar that supports the techniques and ethics of bibliographic disciplines: the registration, organization and preservation of documentary memories cannot be subordinated to any ideological preference”. For Gesner, the bibliographical operation should not be subject to restrictions or censorship; but, considering that the Bibliotheca could also be used by inexperienced readers, Gesner offers advice, guidance and warnings regarding poor-quality works. From the point of view of knowledge organization, it is worth remembering that in the second part of the Bibliotheca, the Pandectae (1548), Gesner proposes a classification system of 21 classes or partitions, which extends the seven liberal arts of the Medieval tradition with categories of complementary subjects of interest to Renaissance scholars.
Gesner elaborates the Pandectae with the following classification structure for books: 1) Grammar (and Philology), 2) Dialectic, 3) Rhetoric (representing the trivium), 4) Poetics, 5) Arithmetic, 6) Geometry, 7) Music, 8) Astronomy (the last four classes representing the quadrivium), 9) Astrology, 10) Divination and Magic, 11) Geography, 12) History, 13) Mechanical Arts, 14) Natural Philosophy, 15) Metaphysics, 16) Moral Philosophy, 17) Economic Philosophy, 18) Politics and finally, 19) Law, 20) Medicine and 21) Theology. In his view of nature and in his taxonomic choices, Gesner reveals a scheme that seeks to encompass the totality of orders: natural and artificial, of things and of sciences.

3.0 Antonio Possevino and Bibliotheca Selecta

Antonio Possevino was born in 1533 in Mantua, a small city in northern Italy which was, at that time, an important court governed by the Gonzaga family. Possevino went to Rome in 1550 to study and, in 1554, became Cardinal Ercole Gonzaga’s secretary, working at the same time as tutor to the future cardinals Francesco and Scipione Gonzaga. In 1559 Possevino entered the Society of Jesus, a turning point in his life. From then on, he became a dedicated preacher against heresy and devoted much effort to resolving theological and political matters with northern and eastern European countries. He travelled to France, Sweden, Poland, Russia, Hungary, Romania and Moravia, seeking reconciliation wherever schisms had arisen. He died in Ferrara in 1611 (Serrai 1993). The textual structure of Possevino’s Bibliotheca (Figure 2) is based on the treatise form, accompanied (as we said) by tables and lists of authors. Its books are divided into Holy Scriptures, Positive Theology, Scholastic Theology, Catechetical Theology, Practical Theology, Clergy, Heresy, Philosophy, Law, Medicine, Mathematics, Music, Architecture, Cosmography, Geography, History, Poetry, Oratory and Miscellaneous (Serrai 1977).
The title of the work, Bibliotheca selecta qua agitur de ratione studiorum in historia, in disciplinis, in salute omnium procuranda, makes explicit its relationship with the Ratio Studiorum, the pedagogical system established by the Jesuits for their educational centers, which would be published in 1599. In contrast to the universality and impartiality of the information offered by Gesner’s Bibliotheca, a list of authors sorted alphabetically and, in the Pandectae, by topic, Possevino’s objective is to propagate Christian doctrine in order to eradicate heresies and end the schism. To this end, his curricular proposal aims to provide each individual, according to their condition and social status, with indications of the appropriate authors and readings, from the children of princes to civilians, ecclesiastics and diplomats, through the nobility, down to the lowest classes.

Figure 2 - Bibliotheca Selecta, 1593. Source: Possevino (1593).

Alongside the indications for the dispositio and good conservation of the volumes, the catalogs of works and the nomenclatures of authors, the Bibliotheca offered indications for the emendatio and expurgatio of works that would otherwise have been prohibited, in addition to undertaking to refute works and authors already listed. In his "model library", then, Possevino felt the need to renew the opposition to certain publications, positioning his Bibliotheca not only as a mirror of but also as a complement to the Index Librorum Prohibitorum (Balsamo 2017). On the basis of Possevino's work, the Italian monastic libraries were purged at the end of the sixteenth century, while in Rome the libraries of the great cardinals were subjected to Possevino's censorship program (Serrai 1993). It is also true that in the Bibliotheca Selecta the primary interlocutors were princes and nobles who possessed rich libraries.
All the rest of the work was addressed to the Jesuit Order, and the Bibliotheca itself was initially conceived by Possevino as a bibliographical and at the same time pedagogical work intended primarily for princes, considered on the one hand as users of the Jesuit institution and on the other as defenders of Christianity. In the libraries of princes, therefore, not only printed books but also manuscript codices had to be subjected to rigorous censorship. A prescriptive bibliographical canon such as the Bibliotheca necessarily has a closed and imposed character, based on Counterreformation Catholic doctrine. It is articulated according to a hierarchical scheme that begins with Divine History, followed by Positive Theology. Then comes Scholastic Theology, that is, the interpretation of the sacred writings according to the teaching of the Church. Next come Practical Theology, as the spiritual direction of consciences, and Catechetical Theology, oriented to pedagogical activity, with the establishment of a whole curriculum for schools. Eleven of the eighteen books are dedicated to this part. The autonomy of human science is questioned, because all sciences are subsumed under Divine History. The classification scheme of the Bibliotheca is radically opposed to Gesner's Pandectae, which began with the trivium and quadrivium and ended with Theology in the twenty-first book. The Bibliotheca represented a guide to safe, guaranteed knowledge, free of danger to Catholic orthodoxy. The work is organized in two volumes. The first, dedicated to Pope Clement VIII, is divided into eleven books: the first five lay the foundations of Christian education on the Scriptures and on Theology, while books VI-XI provide the cultural tools for the evangelization of the world, as far as the inhabitants of the Indies, by reformed Christians, in view of a Catholic "conquest" or "reconquest".
The second volume, dedicated to Sigismund III, King of Poland and Grand Duke of Lithuania, consists of seven books (XII-XVIII) in which the different disciplines (law, philosophy, medicine, mathematics, architecture, geography, history, poetry, painting and rhetoric) are presented in descending hierarchical order and as dependent on Theology, contesting their alleged autonomy. Possevino’s Bibliotheca can be used to take stock of early modern Catholic culture. Quotations and oversights, texts known first-hand and texts quoted second-hand, corrections and errors, convictions and censures, autobiographical references and (not always explicit) positions on current problems constitute a fertile ground that still awaits investigation. It is interesting to note that his suggestions for organizing a physical library differ from the bibliographical scheme, as Serrai (1977, 79) indicates.

4.0 Order of knowledge, selection and bibliographical tension between Bibliotheca Universalis and Bibliotheca Selecta

On the other side of the reflection on knowledge and information between the 16th and 17th centuries is the view of authors rooted in the more properly humanistic culture. Genealogically, in bibliographical terms, it is Gesner and his work that head this view, which tends to be delineated as “lay” knowledge. The idea of "universality" present in the Gesnerian work does not point to a generic totality, but to the possibility of access to and appropriation of books and manuscripts by the learned community. Although at the time this community was made up of scholars, there were no obstacles to its expansion. The ideological question involved, however, should not be reduced to the simplistic opposition of religion versus science, or past versus future: the two models of interpreting the world are far more complex than this.
As Serrai (1993, 717) points out, while Gesner’s bibliography was Universalis, Possevino’s is Selecta, which means that the sources of the former are broad while those of the latter are strictly chosen. While Gesner seeks the widest compilation of works covering all fields of knowledge (a compilation limited only by the three chosen languages), Possevino creates a bibliography aimed at education, study and reference within the domain of orthodox theology. Gesner, as Balsamo states, considers bibliography “an essential tool for achieving knowledge and ‘communicating’ it to others”, being “an invitation to share in further research” (2017, 30). The core of Possevino’s Bibliotheca consists of “mapping” all fields of knowledge through texts, tables and lists of the authors who wrote about each specific field, including those unacceptable to reformed Catholic doctrine, which were not recommended. Gesner builds a catalogue meant to be not only the state of the art of knowledge, but also an effort to offer traces of all previous culture. Possevino, on the other hand, serves the Church's purpose of reestablishing its primacy as knowledge “broker”, the interpreter of the right doctrine that ascends to the Divine. In this way, the main purpose of the Bibliotheca Selecta was to be a “prescriptive bibliographic canon which would serve as a tool for imposing ideologically correct works on all who engaged in studies or research” (Balsamo 2017, 46-47). This opposition, stated in the very title of Possevino's work, should nonetheless be tempered by the relevance of Gesner as a source for his bibliography: Gesner is cited by Possevino both in private letters (see Serrai 1993, 113) and in the Bibliotheca Selecta (also cited by Serrai 1993, 717 and 720, the former referring to the Preface to the Bibliotheca Selecta and the latter to the Apparatus Sacer).
One should consider that, after he entered the Jesuit order in 1559, Possevino had to contend with the relevance of the Bibliotheca Universalis throughout European intellectual circles, even though it had already been included in the first edition of the Index Librorum Prohibitorum, in 1564. The Catholic Church needed a modern structure to sustain its authority, which was undermined by such a complete catalog as the Bibliotheca Universalis, whose contents trespassed the boundaries established by the Counterreformation. Gesner’s Bibliotheca had become so relevant that the Church had to choose between accepting it and losing the competition to sustain its authority over knowledge. In his Bibliotheca, Possevino forged a strategic tool based on the impossibility of universality: universality would imply the acceptance of Protestant authors, whereas his interest lay only within the domain of Catholic knowledge. To better explain this dialectic, we offer an example taken from the disciplinary position of Catholicism in relation to artistic production. The post-conciliar Church clearly establishes rules for the production of religious images, in which the didactic function stands out. That art conceived in this way was short-lived is evident in its rapid evolution toward the emotional appeal of Baroque representations, which nonetheless maintain their theological rigor, expressed through an effective rhetoric. We have already discussed Possevino's role in the elaboration of Counterreformation bibliographical catalogs; it is worth mentioning that he also devoted himself to the bibliography of art, in his Tractatio de Poesia et ethnica, humano et fabulosa collata cum vera, honesta et sacra, from 1595 (Possevino 1971).
Just like his Bibliotheca Selecta, Poesia et Pintura offers the rigidly delimited model of Counterreformation doctrine, a model that becomes an instrument of close control applied to bibliographic information and the circulation of books, aimed at the construction, “on the documentary level, of a collective memory selected according to a specific pedagogical program” (Balsamo 2017, 55). What we want to highlight here is that Possevino, a religious scholar and bibliographer, selects a set of authors and books devoted not to technique, but rather to morality in painting and sculpture. Possevino is not an artist, so he expresses moral concerns about art, selecting those authors who “deal with this issue from a theoretical point of view, and not a practical one, as other art bibliographers were doing at the time, in order to structure the meaning of the object of art. This way, he offers titles that move away from the technical domain” (Crippa 2018, 76). Returning to the scheme of the actors in the dialectic of knowledge of the time, one can observe, on one side, the proposal of a theological and moral control of access to knowledge, accomplished through bibliographic control, which finds in Possevino one of its greatest representatives (Santoro and Orlandi 2006). On the other side, a libertarian, bourgeois current of thought develops, proposing “universal”, secular access to knowledge. If we rely on this dialectic between the two models, it is appropriate here to propose a characterization of each, focusing on the library as a public service that provides all the tools for study and information. We thus identify the current of thought linked to the post-conciliar vision, whose basic principle is the ecclesiastical institution's control of knowledge through its rigidly controlled administration and dissemination.
It is worth recalling, once again, the role played by the new religious order of the Society of Jesus, an order specifically created to support the decisions of the Counterreformation. In any case, knowledge passes through the filter of rejecting authors who do not adhere to Catholic dogmas, which is why the Index Librorum Prohibitorum was instituted, condemning authors such as Giordano Bruno, Copernicus, Galileo and, not surprisingly, Gesner.

5.0 Considerations

In comparative terms, as stated by Serrai (1993), while Conrad Gesner builds a Bibliotheca that was Universalis, with Science, Nature and Theology as its main roads, Antonio Possevino draws his own Bibliotheca, which was Selecta, as a map of knowledge protected, guaranteed and free of danger by virtue of orthodoxy and morals. In Gesner there is absolute confidence in science, whereas in Possevino there reappears a recurring anthropological skepticism and suspicion about the value and innocence of science. Gesner turns to scholars and elaborates for them the indices of the cultural heritage of all humanity, creating a mediating instrument for documents and monuments, without sectarianism or bias. Possevino, on the other hand, works in reverse: under the threat of Protestant advancement, he cannot be faithful to the principle of universality, and he prepares a guide for those who, as Catholics, must be safeguarded, tutored and protected (Serrai and Sabba 2005). Gesner and Possevino became emblems of two cultural worlds and created favorable conditions for the development of science and civilization. Gesner promoted a bibliographical selection based on criteria of intellectual, scientific and philological rigor and developed a rigorous bibliographical method.
But a substantial difference from Possevino lies above all in the greater conceptual breadth of Zwingli's ideological system, in which Gesnerian culture was rooted, compared with the narrow doctrinal armor that marked the ideology of the Catholic Counterreformation (Serrai and Sabba 2005). In early modern Europe, Gesner and Possevino promoted bibliographical ruptures, of a Protestant and a Catholic nature respectively, that significantly affected the forms of production, organization and mediation of knowledge. Indeed, the selection and the bibliographical tension between the two Bibliothecae, under the aegis of universality and anti-heresy, become keys not only to understanding but also to delineating the order of knowledge in the 16th century. In the long-term perspective, they are horizons that reveal the historical relations between knowledge, its control, its access and its organization.

References

Balsamo, Luigi. 2017. La Bibliografia: Storia di una Tradizione. Milano: Unicopli.
Bay, Jens Christian. 1916. “Conrad Gesner, the Father of Bibliography: An Appreciation.” The Papers of the Bibliographical Society of America 1: 53-86.
Crippa, Giulia. 2018. “A Invenção Bibliográfica da Arte na Modernidade: Notas Históricas sobre a Organização do Conhecimento Artístico no Século XVI.” Informação & Informação 23, no. 2: 58-77.
Gesner, Conrad. 1545. Bibliotheca Universalis, sive, Catalogus Omnium Scriptorum locupletissimus in Tribus Linguis Latina, Graeca & Hebraica: extantium & non extantium, veterum & recentiorum in hunc usque diem, doctorum & indoctorum, publicatorum & in bibliothecis latentium: opus novum & non Bibliothecis tantum publicis privatisue instituendis necessarium, sed studiosis omnibus cuiuscunque artis aut scientiae ad studia melius formanda utilissimum. Tiguri: apud Christophorum Froschouerum.
Hjørland, Birger. 2017. “Domain Analysis.” Knowledge Organization 44: 436-464.
Possevino, Antonio. 1593. Societatis Iesu Bibliotheca Selecta Qua Agitur de Ratione Studiorum in Historia, in Disciplinis, in Salute Omnium Procuranda. Romae: ex Typographia Apostolica Vaticana.
Possevino, Antonio. 1971. “Quinam Pingendi Praecepta Tradiderint Antiqui et Recentes.” In Scritti d’Arte del Cinquecento. Tomo I, edited by Paola Barocchi. Milano, Napoli: Ricciardi.
Sabba, Fiammetta. 2012. La ‘Bibliotheca Universalis’ di Conrad Gesner: Monumento della Cultura Europea. Roma: Bulzoni.
Santoro, Marco, and Antonella Orlandi. 2006. Avviamento alla Bibliografia: Materiali di Studio e di Lavoro. Milano: Editrice Bibliografica.
Serrai, Alfredo. 1977. Le Classificazioni: Idee e Materiali per una Teoria e per una Storia. Firenze: Leo S. Olschki Editore.
Serrai, Alfredo. 1990. Conrad Gesner. Edited by Maria Cochetti. Roma: Bulzoni.
Serrai, Alfredo. 1993. Storia della Bibliografia IV: Cataloghi a Stampa. Bibliografie Teologiche. Bibliografie Filosofiche. Antonio Possevino. Edited by Maria Grazia Ceccarelli. Roma: Bulzoni.
Serrai, Alfredo, and Fiammetta Sabba. 2005. Profilo di Storia della Bibliografia. Milano: Edizioni Sylvestre Bonnard.

Amelie Dorn – ACDH-CH, Austrian Academy of Sciences, Austria
Renato Rocha Souza – ACDH-CH, Austrian Academy of Sciences, Austria
Enric Senabre – ACDH-CH, Austrian Academy of Sciences, Austria
Thomas Palfinger – ACDH-CH, Austrian Academy of Sciences, Austria
Eveline Wandl-Vogt – ACDH-CH, Austrian Academy of Sciences, Austria
Barbara Piringer – ACDH-CH, Austrian Academy of Sciences, Austria

Crafting a System for Knowledge Discovery and Organisation: A Case-Study on KOS for a Non-Standard German Legacy Dataset

Abstract: This paper describes a case-study developing a knowledge organisation system (facet thesaurus) on the example of a non-standard German language legacy dataset, DBÖ [Datenbank der bairischen Mundarten in Österreich / Database of Bavarian Dialects in Austria].
A particular focus is placed on the 109 original data collection questionnaires contained in the collection, which are understood as an entry point to the entire collection. Here they serve as a case-study to demonstrate the process, which may be extended to the remainder of the collection. Ranganathan (1933, 1967) created the first faceted scheme, the Colon Classification, to classify books in libraries. Faceted classification has also been used to assist automated search and retrieval of information (Prieto-Diaz 1991). According to Mills (2004), facet analysis plays a vital role in information retrieval and in the design of classificatory structures through the application of logical division to all forms of the content of records, subject and imaginative. The natural product of such division is a faceted classification. Building on these previous endeavours, we here introduce a facet thesaurus for eliciting and promoting access to and navigability of the items in this collection, in order to make cross-cutting topics accessible.

1.0 The aim and scope of the study & introduction

The aim of this study is to introduce a first approach towards creating a knowledge organization system (facet thesaurus) for a non-standard German language legacy resource (DBÖ)1, enabling transversal knowledge discovery. Here we present the rationale for our approach, the applied methodologies, and a concrete example on technology-related terms in the form of a case-study that, in a next step, can readily be applied to other thematic areas within and beyond the collection. Our undertaking is realised within the project exploreAT!2 and the wider framework of exploration space3 at the Austrian Centre for Digital Humanities and Cultural Heritage (ACDH-OeAW).
The exploration space is a digital and physical space offering opportunities for experimentation and innovation in the networked Humanities, and since its establishment in 2017 it has also been listed as a best-practice example for Open Innovation in the Humanities4. The project exploreAT! is a multidisciplinary endeavour with international collaboration partners (semantic technologies: Adapt Centre, Dublin City University, IE; visualisation tools: VisUSAL, Universidad de Salamanca, ES), with the general aim of opening the DBÖ collection for thematic exploration and exploitation by means of semantic technologies and visual prototyping, as well as linking it to other resources.

1 [DBÖ] Österreichische Akademie der Wissenschaften. (1993–). Datenbank der bairischen Mundarten in Österreich [Database of Bavarian Dialects in Austria] (DBÖ). Wien. [Processing status: 2018.01.]

The DBÖ collection is large and rich (~3.5 million entries) and is composed of digitised data collection questionnaires and answers, as well as excerpts from vernacular dictionaries and folklore literature. The data, having undergone several stages of digitisation, is to date available in TEI/XML format and partly as a MySQL database. The questionnaire data (109 thematic questionnaires comprising around 17,000 individual questions) constitutes only a fraction of the DBÖ. The questionnaires originally pertained to a dictionary project aimed at capturing the German language spoken by the local population from the early 20th century onwards in the area of the former Austro-Hungarian Empire. Therefore, apart from being a rich linguistic, non-standard resource, the collection also captures a wealth of historical cultural information about everyday life, e.g. customs, religious festivities, food, traditional medicine, professions and songs, among others.
The answers to the questionnaires’ questions follow a lexicographic structure and are composed of headwords (lemmas), senses/meanings, sources, geographic location information (GIS), person information (authors, collectors, data typists), etc. In the context of the exploreAT! project, opening up the collection and questionnaires to new ways of exploration was initiated via lexical concepts, enabling linking to other resources, such as Linked Open Data (LOD) (cf. Abgaz et al. 2018; Dorn et al. 2019). This created the need for transversal knowledge searching across questionnaires and for access to knowledge that is otherwise inaccessible or hidden. The main topics of the questionnaires reflect more or less detailed aspects of everyday life, which is also reflected in the number of questionnaires dedicated to a given topic. Topics such as “movement” (Bewegung) (11 questionnaires), “wedding” (Hochzeit) (5 questionnaires), “tailoring” (Schneiderei) (4 questionnaires) or “baking bread” (Brotbacken) (3 questionnaires) have been queried extensively, while other topics (e.g. body parts, time, animals, school and education, plants, brewing) were covered by fewer questionnaires. Based on this content and the overall topics represented, we chose use-cases that would allow us to explore the data from perspectives relevant to cultural exploration as well as from the Digital Humanities perspective in general. Exploring the basic subject of “technology” across the collection would, on the one hand, allow us to provide a “historical” view of technology-related terms and concepts from the time when the questionnaires were conceptualised (1912)5, while at the same time providing a noticeable contrast to what we understand by technology today. The other selected use-case deals with the topic of “food”, which is covered by specific questionnaires, but also transversally, by specific food-related questions occurring across several questionnaires.
In addition, food is one of the key aspects of human culture and thus particularly relevant not only from a historical perspective but also today. Navigating through such a vast collection is hard with purely term-based search interfaces, which is why we chose the faceted approach to organize the main concepts of the collection in a thesaurus that serves as the entry point for navigating through the questions, each one linked to one or (many) more answers. Relevant terms were chosen by their raw frequency in the questions. However, word embedding models (Mikolov et al. 2013a) were also used to explore semantic vicinities and to find similar concepts across the whole collection. Ultimately, we achieved our goal of building a navigable interface providing access to the data fields in this cultural heritage collection.

2.0 Theoretical background

In the scope of this project, two main knowledge representation techniques were used: facet analysis and word embeddings. A brief overview of each is provided below. Faceted classification is a concept and a technique introduced by Ranganathan (1933, 1967) and later developed by the Classification Research Group (CRG) (Vickery 1960). A faceted scheme has several facets, and each facet may have several terms, or possible values. A faceted classification scheme for wine, to use Broughton's (2006) example, might include the facets (and terms) “grape varietal” (riesling, cabernet sauvignon, etc.), “region” (Napa Valley, Rhine, Bordeaux, etc.), and “year” (2001, 2002, etc.). According to Ranganathan, the process of choosing the facets is analytico-synthetic: we first analyze the subject domain and then shape its component facets to adequately describe its characteristics and provide room for its concepts. This makes facet analysis a very powerful resource for describing and organizing information.
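To make the wine example concrete, the following is a minimal Python sketch, not the project's code, of a faceted scheme in which each item receives at most one term per facet and can be retrieved by combining facet selections. The facets and terms come from the wine example above; the item names ("Wine A" and so on), the helpers `classify` and `facet_filter`, and the rule that selections in different facets are combined with AND are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Facets and their permitted terms, taken from the wine example in the text.
WINE_FACETS: dict[str, set[str]] = {
    "grape varietal": {"riesling", "cabernet sauvignon"},
    "region": {"Napa Valley", "Rhine", "Bordeaux"},
    "year": {"2001", "2002"},
}


@dataclass
class Item:
    name: str
    # At most one term per facet: facets are mutually exclusive and independent,
    # so any combination of terms can be synthesised into a compound class.
    facets: dict[str, str] = field(default_factory=dict)


def classify(name: str, terms: dict[str, str]) -> Item:
    """Build an item from one term per facet, rejecting unknown facets or terms."""
    for facet, term in terms.items():
        if facet not in WINE_FACETS:
            raise ValueError(f"unknown facet: {facet!r}")
        if term not in WINE_FACETS[facet]:
            raise ValueError(f"{term!r} is not a term of facet {facet!r}")
    return Item(name, dict(terms))


def facet_filter(items: list[Item], selection: dict[str, str]) -> list[str]:
    """Return the names of items that match every selected facet term."""
    return [
        item.name
        for item in items
        if all(item.facets.get(facet) == term for facet, term in selection.items())
    ]


# Hypothetical items, used only to illustrate facet combinations.
catalogue = [
    classify("Wine A", {"grape varietal": "riesling", "region": "Rhine", "year": "2001"}),
    classify("Wine B", {"grape varietal": "cabernet sauvignon", "region": "Napa Valley", "year": "2001"}),
    classify("Wine C", {"grape varietal": "riesling", "region": "Rhine", "year": "2002"}),
]

if __name__ == "__main__":
    print(facet_filter(catalogue, {"year": "2001"}))  # ['Wine A', 'Wine B']
    print(facet_filter(catalogue, {"grape varietal": "riesling", "year": "2002"}))  # ['Wine C']
```

Selecting terms from several facets narrows the result set step by step, which is the kind of navigation a faceted thesaurus is meant to support; adding a new facet only extends `WINE_FACETS` and leaves the existing ones untouched.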
The facets need not be ordered, nor be of the same type, although they should be clearly defined and mutually exclusive (Broughton 2006). Faceted classification was first devised for the classification of books in libraries (Ranganathan 1933), but it was subsequently adopted in information retrieval systems and search interfaces on the web (Prieto-Diaz 1991; Broughton and Lane 2000; Tudhope et al. 2006). Facet analysis has been used in the construction of information retrieval (IR) thesauri since the publication of the Information Retrieval Thesaurus of Education Terms in 1968 (Spiteri 2000; Barhydt, Schmidt, and Chang 1968). Spiteri (2000) states that there is no standardized model for the application of facet analysis to information retrieval thesauri and that national and international guidelines for thesaurus construction make minimal mention of facet analysis. In her study, she presents critical arguments on how to evaluate the choice of facets for thesauri and whether these choices are coherent with the principles stated by both Ranganathan and the CRG. Influenced by Brahman philosophy (Mazzocchi 2013), Ranganathan (1933) proposed the adoption of basic subjects and five characteristics of division used to derive facets: Personality (which we can interpret as “the things”), Matter (their characteristics), Energy (the processes), Space, and Time. His facet system has since been known by the acronym "PMEST". The CRG, by contrast, preferred an ad hoc approach, proposing that each subject area be divided into categories appropriate to its nature. This latter approach was more suitable for our collection. Word embeddings are one of the many powerful NLP techniques developed in recent years. To build semantic models out of large textual collections, we need to represent the semantic units as mathematical vectors. There are two main ways to construct this representation: using the simple bag-of-words model (Zhang et al.
2010), where each word is represented by a specific vector in a huge multidimensional space; and using word embedding models, such as Word2vec (Mikolov et al. 2013a; Mikolov et al. 2013b), where each word is represented by a linear combination of a much smaller set of dimensions or vectors. These “basic” vectors for word composition are obtained using specific neural network architectures (e.g. “continuous bag-of-words” or “skip-gram”) (Mikolov, Yih, and Zweig 2013). These distributed vector representations impressively capture syntactic and semantic aspects of concepts and their relationships. To generate these word representation models, the underlying neural networks must be fed large corpora of texts in a specific language, so that the transitive relations between concepts that co-occur in "windows" of contiguous words in a sentence are captured. Word embeddings were used in our project to explore associations between words and to choose possible non-preferred terms for the thesaurus, even when these terms were less frequent in the raw count.

3.0 The method

The methodologies applied in our study involve both automatic/machine-based and manual/intellectual processes, in a collaborative setting of technical and domain experts. As a first step, the ~17,000 questions across the 109 questionnaires were tokenized and lemmatized. Abbreviations were resolved and stop words (e.g. articles, prepositions, etc.) removed. The non-concepts, i.e., words related to syntax, morphology, semasiology and onomasiology used to build the questions, were identified and removed. The remaining "cultural heritage" words were extracted automatically using Python scripts6 and ranked according to their frequency across all questions (min.=1; max.=609). This yielded a total of 88,883 distinct terms, which we refer to as concepts.
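The term-harvesting step and the embedding-based search for non-preferred terms described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the project's scripts: the sample questions, the stop-word and non-concept lists, the lemma table and the three-dimensional vectors are all hypothetical (real vectors would come from a trained Word2vec model, and real lemmatization from an NLP tool). Terms are ranked by raw frequency, as in the text; candidate non-preferred terms are the nearest neighbours of a term by cosine similarity.

```python
from __future__ import annotations

import math
import re
from collections import Counter

# Hypothetical sample questions standing in for the ~17,000 DBÖ questions.
QUESTIONS = [
    "Wie heißt der Laib Brot?",
    "Bezeichnung für den Backofen zum Brot backen",
    "Was sagt man zum Pflug?",
    "Wann bäckt man die Brote?",
]

# Hypothetical lists: function words, and "non-concepts" used to build questions.
STOP_WORDS = {"der", "die", "den", "zum", "für", "wie", "was", "wann", "man"}
NON_CONCEPTS = {"heißt", "sagt", "bezeichnung"}
# Hypothetical lemma table standing in for a real lemmatizer.
LEMMAS = {"brote": "brot", "bäckt": "backen"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and keep runs of letters (including umlauts and ß)."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def rank_terms(questions: list[str]) -> list[tuple[str, int]]:
    """Lemmatize via the table, drop stop words and non-concepts, rank by frequency."""
    counts: Counter[str] = Counter()
    for question in questions:
        for token in tokenize(question):
            lemma = LEMMAS.get(token, token)
            if lemma not in STOP_WORDS and lemma not in NON_CONCEPTS:
                counts[lemma] += 1
    return counts.most_common()


# Toy 3-dimensional vectors; a trained Word2vec model would supply real ones.
VECTORS = {
    "brot": [0.9, 0.1, 0.0],
    "laib": [0.8, 0.2, 0.1],
    "backofen": [0.5, 0.6, 0.1],
    "pflug": [0.0, 0.2, 0.9],
}


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of the vector norms."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def candidate_non_preferred(term: str, vectors: dict[str, list[float]], k: int = 2) -> list[tuple[str, float]]:
    """Return the k terms closest to `term`, as candidates for expert review."""
    target = vectors[term]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != term]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    print(rank_terms(QUESTIONS))
    print([term for term, _ in candidate_non_preferred("brot", VECTORS)])
```

On this toy input "brot" is the most frequent concept (3 occurrences) and "laib" its nearest neighbour. Such neighbours are only suggestions; in the workflow described in the text, the decision on preferred and non-preferred terms rests with the domain experts.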
As mentioned above, from all the basic subjects pertinent to the collection we chose two for the proof of concept: technology (Technologie) and food (Essen), each involving a different group of domain experts from the team. The domain experts manually identified technology-related terms among the most frequent concepts, in two rounds of evaluation and agreement, yielding a total of 186 terms. Because we are dealing with a non-standard language collection, automatic identification of technology-related terms would have been only partially successful, despite the availability of linguistic Knowledge Organization Systems such as the German WordNet (GermaNet) and German thesauri. Next, the domain and technical experts together determined a suitable set of facets for the thesaurus, based on the harvested concepts and on the guidelines and examples provided by the CRG. The chosen facets were: trade/crafts (Gewerbe), artifacts (Artefakt), processes (Prozess), roles (Rollen), places of application (Anwendungsort), areas of application (Anwendungsbereich) and quality (Eigenschaften). The domain experts then assigned the 186 technology-related terms to these facets, again in two rounds of evaluation and agreement. Terms that could not be clearly assigned, or on which the domain experts could not agree, were temporarily excluded (n=11); these terms require further evaluation in future work. The remaining 175 technology-related terms were distributed as follows: trade/crafts (n=8), artifacts (n=100), processes (n=46), roles (n=4), areas of application (n=9) and places of application (n=8). The harvesting of food-related terms is still in progress.
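As described above, word embeddings were used to suggest possible non-preferred terms for the selected technology terms, even when these candidates were rare in the raw counts. The sketch below shows the underlying idea: ranking other words by cosine similarity to a given term. The three-dimensional vectors, the toy vocabulary and the similarity threshold are invented for illustration; with a Word2vec model trained in gensim, the same lookup would typically be done with `model.wv.most_similar`. The output is only a list of suggestions for the domain experts to evaluate.

```python
import math

# Toy vectors invented for illustration; real Word2vec vectors are trained on
# the corpus and usually have a few hundred dimensions.
EMBEDDINGS = {
    "schiff": [0.9, 0.1, 0.0],
    "boot": [0.8, 0.2, 0.1],
    "zille": [0.7, 0.3, 0.0],
    "mühle": [0.1, 0.9, 0.2],
    "brot": [0.0, 0.2, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def candidate_non_preferred(term, embeddings, k=5, threshold=0.9):
    """Up to k words closest to `term`, keeping only those above `threshold`.

    Raw frequency plays no role here, so rare spelling or dialect variants
    can surface as candidates too.
    """
    target = embeddings[term]
    scored = [
        (word, cosine(target, vec))
        for word, vec in embeddings.items()
        if word != term
    ]
    scored = [(w, s) for w, s in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


for word, score in candidate_non_preferred("schiff", EMBEDDINGS):
    print(f"{word}: {score:.3f}")
```

The threshold trades recall for precision: lowering it yields more candidates for the experts to review, raising it yields fewer but closer neighbours.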
The tool chosen for managing the vocabularies and data, and for displaying the thesaurus hierarchies, was the free and open-source Tematres, with the Visual Vocabulary add-in installed. The question database was kept as a Pandas DataFrame, served by a Pandas REST API that bridged access to the Django REST Framework. Once all the technology-related terms had been chosen, the preferred and non-preferred terms were added to the online thesaurus tool according to the facets of the hierarchies we had crafted. After all the terms were registered, the relationships among related terms were assigned. The final step was to link each preferred term to the questions and answers in the database that contained it. This was done via the Tematres Multilingual Vocabularies resource ("Relations between vocabularies: RelatedMatch"), which takes a URL and generates a web link in the thesaurus management tool to the DataFrame/database kept in the Pandas structure. The main advantage of this architecture is that it consists of a single autonomous Docker container that holds the LAMP stack (Linux, Apache, MySQL, PHP/Python) in addition to Tematres and its enhancements, so it can be deployed quickly in any new environment. Because the initial (and outdated) version of Tematres used as the base image has since been updated, we plan to make a Docker image, without data, available on Docker Hub. This will help accelerate the deployment of similar solutions in the future.

4.0 Results

In this section we show some illustrations of the current state of the solution. Figure 1a depicts the homepage with both the alphabetic and systematic displays of the thesaurus. Users can also query the thesaurus using the search box provided by the Tematres tool.
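The RelatedMatch linking step can be illustrated as follows: for each preferred term, find the questions that contain it and build the URL that the thesaurus stores as its link. This is a sketch under stated assumptions: the endpoint `https://example.org/api/questions` and its `term` query parameter are hypothetical, since the paper does not give the actual API routes, and a plain dictionary stands in for the Pandas DataFrame to keep the example dependency-free.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and query parameter; the paper does not give the real routes.
API_BASE = "https://example.org/api/questions"

# A plain dict stands in for the Pandas DataFrame of questions (id -> text).
QUESTION_TABLE = {
    "Q1": "Wie nennt man das Schiff?",
    "Q2": "Wer baut die Mühle?",
    "Q3": "Teile des Schiffes",
}


def questions_containing(term, questions):
    """IDs of questions whose text contains the term (case-insensitive substring match)."""
    needle = term.lower()
    return [qid for qid, text in questions.items() if needle in text.lower()]


def related_match_url(term):
    """URL stored as the RelatedMatch link for a preferred term (percent-encoded)."""
    return f"{API_BASE}?{urlencode({'term': term})}"


for term in ["Schiff", "Mühle"]:
    print(term, questions_containing(term, QUESTION_TABLE), related_match_url(term))
```

Because each link is only a URL built from a base address, pointing the thesaurus at another data source amounts to changing `API_BASE`. The substring match is deliberately naive: it also catches inflected forms such as "Schiffes", but may over-match longer compounds.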
Figure 1b shows a detail of the basic subject technology, with its broader and narrower terms, and Figure 2 presents the details of the term “Schiff” (ship) under the facet “Artefakt” (artifacts). We have used bibliographic notes to provide hyperlinks (related match) to the database sources. This coupling between the concept navigation tool and the data can be changed simply by substituting the URIs. In a future version, we plan to implement a smoother interface, similar to the Getty Art & Architecture Thesaurus Hierarchy Display. The main advantage of the current architecture is that it allows rapid deployment and uses free and open-source tools.

Figure 1: The main hierarchy of the exploreAT! thesaurus (panels a and b).

Figure 2: Details of the term “Schiff” with the bibliographic note pointing to the full questions in the dataframe.

Finally, we also offer a hyperbolic geometry navigation interface, provided as an optional Tematres add-in (Figure 3).

Figure 3: The visual navigational tool for the exploreAT! thesaurus.

5.0 Conclusions and future work

In this paper we have presented a case study on the development of a knowledge organisation system, a faceted thesaurus, for transversal knowledge discovery within a legacy dataset of non-standard language. We have demonstrated that a faceted approach, combined with concepts generated both manually (by humans) and automatically (by machine), is essential for eliciting cultural knowledge.
This setting, combined with the online thesaurus and the connection to the database with the original concepts, has provided an effective and intuitive means of navigating and retrieving the information contained in the DBÖ resource in a structured and accessible way. Our approach makes it possible to access, link and visualise thematic knowledge transversally as well, which has often proven challenging. As a next step, we plan to extend the thesaurus to further basic subjects. Food is close to completion, and customs, religious festivities, traditional medicine, professions and songs are to follow.

References

Abgaz, Yalemisew, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt, and Andy Way. 2018. “Semantic Modelling and Publishing of Traditional Data Collection Questionnaires and Answers.” Information 9, no. 12: 297.
Barhydt, Gordon C., Charles T. Schmidt, and Kee T. Chang. 1968. Information Retrieval Thesaurus of Education Terms. Cleveland, OH: Press of Case Western Reserve University.
Broughton, Vanda. 2006. “The Need for a Faceted Classification as the Basis of All Methods of Information Retrieval.” Aslib Proceedings 58, nos. 1/2: 49-72.
Broughton, Vanda, and Heather Lane. 2000. “Classification Schemes Revisited: Applications to Web Indexing and Searching.” Journal of Internet Cataloging 2, nos. 3-4: 143-155.
Dorn, Amelie, Barbara Piringer, Yalemisew Abgaz, Jose Luis Preza Diaz, and Eveline Wandl-Vogt. 2019. Enrichment of Legacy Language Data: Linking Lexical Concepts in Data Collection Questionnaires on the example of exploreAT!. Budapest: Centre for Digital Humanities, Eötvös Loránd University, 13-15.
Mazzocchi, Fulvio. 2013. “Ranganathan’s Universe of Knowledge and Categorical Thinking.” SRELS Journal of Information Management 50, no. 6: 763-778.
Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013a.
“Distributed Representations of Words and Phrases and Their Compositionality.” In NIPS'13: Proceedings of the 26th International Conference on Neural Information Processing Systems, volume 2, edited by Christopher J.C. Burges, Léon Bottou, M. Welling, Z. Ghahramani, and K.Q. Weinberger. Red Hook, NY: Curran Associates Inc., 3111-3119.
Mikolov, Tomas, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013b. “Efficient Estimation of Word Representations in Vector Space.” arXiv: 1301.3781.
Mikolov, Tomas, Wen-tau Yih, and Geoffrey Zweig. 2013. “Linguistic Regularities in Continuous Space Word Representations.” In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. Atlanta: Association for Computational Linguistics, 746-751.
Mills, Jack. 2004. “Faceted Classification and Logical Division in Information Retrieval.” Library Trends 52, no. 3: 541-570.
Prieto-Diaz, Ruben. 1991. “Implementing Faceted Classification for Software Reuse.” Communications of the ACM 34, no. 5: 88-97.
Ranganathan, S.R. 1933. Colon Classification. 1st ed. Madras: Madras Library Association.
Ranganathan, S.R. 1967. Prolegomena to Library Classification. 3rd ed. London: Asia Publishing House.
Spiteri, Louise F. 2000. “The Essential Elements of Faceted Thesauri.” Cataloging & Classification Quarterly 28, no. 4: 31-52.
Tudhope, Douglas, Ceri Binding, Dorothee Blocks, and Daniel Cunliffe. 2006. “Query Expansion Via Conceptual Distance in Thesaurus Indexed Collections.” Journal of Documentation 62: 509-533.
Vickery, Brian C. 1960a. Faceted Classification. A Guide to Construction and Use of Special Schemes. Prepared for the Classification Research Group. London: Association of Special Libraries and Information Bureaux.
Zhang, Yin, Rong Jin, and Zhi-Hua Zhou. 2010. “Understanding Bag-Of-Words Model: A Statistical Framework.” International Journal of Machine Learning and Cybernetics 1, nos. 1-4: 43-52.
Sharon Farnel – University of Alberta (Edmonton), Canada
Ali Shiri – University of Alberta (Edmonton), Canada

Indigenous Community Driven Knowledge Organization at the Interface
The Case of the Inuvialuit Digital Library

Abstract: The goal of knowledge organization is to address the ways in which information and knowledge are conveyed, communicated, and understood. Beghtol (2002a; 2002b) argues for the principles of ‘cultural warrant’ and ‘cultural hospitality’, which emphasize the importance of taking into account cultural contexts, dimensions, and differences in knowledge organization. This notion of cultural relevance is also critical when developing digital libraries (online platforms for organizing, sharing, and providing access to resources in digital form) by, with, and for communities. The challenge of cultural relevance in digital libraries is particularly acute when working with Indigenous communities, as most digital library platforms are based on western approaches to knowledge organization. The Inuvialuit Digital Library was developed as part of the Digital Library North project, a four-year collaboration between the Inuvialuit Cultural Centre Pitquhiit-Pitqusiit (ICC) and communities within the Inuvialuit Settlement Region (ISR) in northwestern Canada, and researchers at the University of Alberta (Edmonton, Alberta, Canada), to develop a digital library infrastructure to support access to cultural resources. Using culturally appropriate methods, the team followed an iterative development process to enact a culturally reflective and responsive metadata and knowledge organization framework for the Digital Library. This community driven framework allows the Inuvialuit to tell their own story in their own words, and enhances community engagement with their Digital Library.

1.0 Introduction

The goal of knowledge organization is to address the ways in which information and knowledge are conveyed, communicated, and understood.
Cultural aspects of knowledge organization and the principles of 'cultural warrant' and 'cultural hospitality' as argued by Beghtol (2002a; 2002b) make concrete reference to the importance of taking into account cultural contexts, dimensions, and differences in knowledge organization. The same argument holds true when knowledge organization is conceptualized as understanding and effective communication. Digital libraries are online environments for organizing, sharing, and providing access to resources in digital form (Borgman 1999). They are understood to be developed by, with, and for user communities. Indeed, Pang (2012) notes that “one cannot fathom a digital library without considering the social interactions driving its development, sustainability and use” (86). Ideally, then, their content and functionality, as well as their metadata and knowledge organization, should reflect the needs, interests, and contexts of the communities from which they originate. Baca (2003) emphasizes this when she notes that it is not enough to use just any metadata standard; a standard appropriate to the materials at hand and, in particular, to the intended end users must be selected. Hudon (1997) and Zeng and Chan (2004) remind us of the importance of cross-cultural and cross-lingual aspects in the development of knowledge organization systems and point to the importance of cultural relevance. Boast, Bravo, and Srinivasan (2007), Clarke (2002), Srinivasan (2002) and others argue for community-specific metadata and knowledge organization, based on the socially constructed and contextual nature of knowing and understanding the world. Srinivasan (2012; 2017) labels this approach “fluid ontologies”, a knowledge organization framework used to design a digital library in a locally appropriate manner, evolving and changing along with the community.
The development of digital libraries with Indigenous communities is particularly challenged by the fact that the technical platforms commonly used for developing them are, at their core, based on a western approach to knowledge organization (Christie 2004; 2005). Nakata (1997; 2002; 2007) explains that the digital environment is a space where Indigenous and non-Indigenous knowledge systems come into contact, and where there is often tension as interactions are negotiated. He argues, however, that if Indigenous peoples are effectively and actively involved in the development and definition of the knowledge organization underlying a given system, then the power and promise of digital platforms to meet the needs and interests of the community can be realized. While Indigenous community driven knowledge organization systems and practices have been used in several cultural heritage digital libraries in Australia (Bow, Christie, and Devlin 2015), New Zealand (Lilley 2015), and North America (Holland and Smith 2000), little research exists on the knowledge organization practices of the Inuit in western Canada and on how they can inform the development of knowledge organization systems and digital libraries (Farnel et al. 2017; Hennessy et al. 2013). The goal of this participatory, community based study is to collaboratively develop a culturally appropriate metadata and knowledge organization framework for the Inuvialuit Digital Library of cultural resources (Digital Library North 2017).

2.0 Background

The Inuvialuit Digital Library was initially developed as part of the Digital Library North (DLN) project, a four-year collaboration between the Inuvialuit Cultural Centre Pitquhiit-Pitqusiit (ICC) and communities within the Inuvialuit Settlement Region (ISR) in northwestern Canada, and researchers at the University of Alberta (Edmonton, Alberta, Canada), to develop a digital library infrastructure to support access to cultural resources.
The research was contextualized in six key areas, one of which was the development of a comprehensive, culturally aware and appropriate metadata and knowledge organization framework. To provide some context, the ISR (Figure 1) was designated in 1984 in the Final Agreement between the Inuvialuit and the Government of Canada.

Figure 1. Maps showing the Inuvialuit Settlement Region (ISR)

The region covers approximately 91,000 km² in the western Arctic region of what is now Canada. The six communities in the region are Aklavik, Inuvik, Paulatuk, Sachs Harbour, Tuktoyaktuk, and Ulukhaktok. The population is roughly 6,500, more than half of whom (3,400) live in Inuvik. The language of the Inuvialuit is collectively known as Inuvialuktun, which comprises three related languages: Sallirmiutun, Uummarmiutun, and Kangiryuarmiutun (Inuvialuit Regional Corporation 2017). While “the region has an immensely rich culture and history, its geographic remoteness poses challenges for enabling easy access to cultural heritage resources” (Farnel et al. 2016, 3). The overarching goal was to help alleviate this challenge through the development of a digital library.

3.0 Methods

Developing knowledge organization systems for Indigenous cultural heritage digital libraries requires multidisciplinary theoretical and methodological frameworks that take into account the cultural nuances of Indigenous knowledge creation, sharing, and dissemination. The definition and application of the metadata and knowledge organization framework for the Digital Library was an iterative process that incorporated culturally appropriate methods and made use of a number of information sources. A first source of information was the proposed content of the digital library itself. A second source of information was a series of interviews with the staff at the ICC who are the stewards of these resources.
A third source of information was a cross-cultural, cross-disciplinary, cross-national review of the academic and professional literature, to understand what had been tried and had been shown to be successful and applicable in related projects with Indigenous communities. A fourth source of information was a review of existing digital library platforms to understand their strengths and weaknesses in this context. The fifth and final source of information was the community itself. Information was gathered from a cross-section of the community, including elders and youth, language and culture instructors, and members of the community at large. Information was gathered through means appropriate to this community context, including formal interviews and surveys, demonstrations and open houses, informal and targeted conversations, and user testing and usability sessions. The information gathered through these different activities was analyzed to derive dominant categories and themes. Coding was both deductive and inductive: deductive coding reflected the main research areas of the project, while inductive coding reflected the themes emerging from the information itself. Categories and themes were incorporated into the metadata and knowledge organization of the Digital Library, and tested and assessed by the community. Revisions and changes were made based on community feedback, and then tested again. There was a continuous feedback loop to ensure the framework reflected community interests and needs as they change and evolve over time.

4.0 Examples

Cultural constructs such as language and dialect, the importance of place and the resources associated with it, as well as visual interfaces, have been identified as important to the ISR communities.
The community driven and culturally relevant metadata and knowledge organization framework that underlies the Inuvialuit Digital Library can be seen in specific examples of metadata use and display, interface design, and content organization. The following examples highlight some key aspects of the framework as it has evolved to date. A key message from the community has been the importance of having one or more means of browsing the Digital Library; in fact, a slight preference for browsing over searching has been indicated. The ways in which users can explore the collection have therefore been a topic of much discussion throughout the development process. Early discussions resulted in a home page that allowed browsing by type and collection, as well as by featured images or exhibits. Over time it became clearer that there were certain key pathways into the Library that the community would like to see privileged. Figure 2 shows the current version of the home page, in which the most important pathways, such as places and language resources, are more prominent, while the additional pathways of resource type and a featured image remain available.

Figure 2. Current version of the home page which emphasizes key pathways into the Library

Given that the ICC’s mandate is language and culture revitalization, language learning resources, many developed by the Centre itself, represent a substantial portion of the overall Library and are seen as critical resources to highlight. Initially, language resources were simply treated as one type of collection, accessible through browse by collection or by type, and the landing page had a simple structure of labelled images for each language (Figure 3).

Figure 3. Early version of the language resource collections landing page

However, as discussions continued, our community collaborators expressed a wish to do more with these collections.
Not only did they want them to be a more prominent pathway into the Library, they also wanted to add further contextual information to the landing page. The current version of the language resources landing page (Figure 4) is much richer in content: it includes information about each language and a map of the areas in the region where each is spoken, and clicking on any of the language names takes the user to a listing of all the resources dealing with that language.

Figure 4. New version of language resources landing page

The Inuvialuit, like all Indigenous peoples, have a strong connection to land and place. For this reason, there has been strong interest in the ability to browse the Digital Library content by place. An early version of this functionality allows the user to find a single item on a map (Figure 5), click through to view it, and from there use the metadata in the record to browse other items associated with that same place.

Figure 5. Browsing place by locating an item on the map

This functionality is well liked and is still available in the Library. However, it was not quite what the community had in mind. What the community described was the ability to start browsing from a map, narrow down to specific places, and find all items associated with them. Figure 6 shows an interim version of this, which is currently part of the Library. At the moment, this map includes only the six community names. If a user clicks on a community name (e.g., Ulukhaktok), they are taken to a set of items with that place in the metadata. The Inuvialuit Regional Corporation (IRC), the ICC’s parent body, is currently working on a traditional place names map which will be much, much richer than the one currently in use. The plan, once this map is complete, is to use it in place of the current one in the Digital Library.

Figure 6.
Current map-based browse and results from clicking on Ulukhaktok on the map

An important component of a knowledge organization framework is the choice of metadata elements used to describe the resources, as well as what those elements are called and what they contain. The Inuvialuit Digital Library is no exception, and developments in this area have been a large part of the community driven process. Because many of the resources in the Library have a linguistic aspect, the ability to capture Language and Dialect, as well as Original Dialect where there was a translation, was identified as critical from the earliest days of the project. The need to see this information immediately for any resource led to its prominent place on every item screen, and the desire to make it easy to find other resources with the same language or dialect prompted the metadata element to be made browsable. Given the cultural importance of place and land, family and community, the ability to capture in the metadata the places and people associated with a resource is critical. Early input from the community led to the renaming of elements to make them more relevant and usable: Creator and Contributor were combined into a single element and renamed People, and Spatial Coverage was renamed Places. The Inuvialuit, like all Indigenous peoples in Canada, have been subjected to ongoing efforts to erase their culture, heritage, and language, including traditional names for people and places. These names and the traditions around them are being reclaimed by the community, and so including them in the metadata for resources in the Digital Library is extremely important. However, we have also heard about the importance of retaining the colonial forms in some way, as they are still in use and represent an important part of the history of the Inuvialuit. A balance is therefore struck: the metadata includes both forms but privileges the traditional.
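The element renaming and the handling of traditional and colonial place names described above can be sketched as a small crosswalk plus two helpers. Everything here is an illustrative assumption rather than the Digital Library's actual schema or platform code: the element keys, the record layout and the `to_local`, `place_label` and `items_for_place` helpers are hypothetical, and the Ulukhaktok/Holman pair (Holman being the community's former official name) serves only as an example of a traditional name paired with a colonial form.

```python
# Hypothetical crosswalk from generic (Dublin Core-style) element names to the
# community-chosen labels: Creator and Contributor merge into People,
# Spatial Coverage becomes Places. Unmapped elements keep their names.
CROSSWALK = {
    "creator": "People",
    "contributor": "People",
    "spatial_coverage": "Places",
}


def to_local(record):
    """Rename (and merge) metadata elements according to CROSSWALK."""
    local = {}
    for element, values in record.items():
        local.setdefault(CROSSWALK.get(element, element), []).extend(values)
    return local


def place_label(place):
    """Display the traditional name first, keeping any colonial form after it."""
    colonial = place.get("colonial")
    return f"{place['traditional']} ({colonial})" if colonial else place["traditional"]


def items_for_place(items, name):
    """Browse by place: an item matches if either name form equals `name`."""
    return [
        item["id"]
        for item in items
        if any(name in (p.get("traditional"), p.get("colonial")) for p in item["Places"])
    ]


record = {"creator": ["Person A"], "contributor": ["Person B"], "spatial_coverage": ["Ulukhaktok"]}
print(to_local(record))  # {'People': ['Person A', 'Person B'], 'Places': ['Ulukhaktok']}

items = [
    {"id": "item-1", "Places": [{"traditional": "Ulukhaktok", "colonial": "Holman"}]},
    {"id": "item-2", "Places": [{"traditional": "Inuvik"}]},
]
print(place_label(items[0]["Places"][0]))  # Ulukhaktok (Holman)
print(items_for_place(items, "Holman"))  # ['item-1']
```

Storing both name forms in one place entry, rather than as unrelated values, is what lets the display privilege the traditional name while a search for the colonial form still finds the item.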
The framework is also flexible enough to account for alternative spellings and dialect variations. Figure 7 shows these various aspects of resource description.

Figure 7. Item description showing traditional names for people and places

A further aspect of the metadata to highlight is the way in which the subject matter of the resource is described. The community recognizes the value of using existing vocabularies and term lists to make the Library usable and sustainable. But there is also strong interest in being able to use the local language where and when it makes sense, and to allow for spelling and dialect variations as well. A growing local list of such terms is therefore in use in the Digital Library, as can be seen in the description in Figure 8, which includes the local English and Inuvialuktun terms for parka.

Figure 8. Localized subject terms for a resource in the Digital Library

5.0 Conclusion

The most powerful experiences with digital collections occur when the knowledge structure and architecture are harnessed to the interests and needs of the community. The metadata and knowledge organization framework for the Inuvialuit Digital Library, and the collaborative methods used to develop it, demonstrate that knowledge organization is communication, understanding, and development. Noted Māori scholar Linda Tuhiwai-Smith reminds us that “the collective memory of imperialism has been perpetuated through the ways in which knowledge about Indigenous peoples was collected, classified and then represented in various ways … through the eyes of the West, back to those who have been colonized” (2012, 31). A community driven metadata and knowledge organization framework enables the Inuvialuit to tell the story they want to tell, in the way they want to tell it, pushing back against the story being told by others, re-centering the community and putting control back where it belongs.

References

Baca, Murtha. 2003.
“Practical Issues in Applying Metadata Schemas and Controlled Vocabularies to Cultural Heritage Information.” Cataloging & Classification Quarterly 36, nos. 3/4: 47-55.
Beghtol, Clare. 2002a. “A Proposed Ethical Warrant for Global Knowledge Representation and Organization Systems.” Journal of Documentation 58: 507-532.
Beghtol, Clare. 2002b. “Universal Concepts, Cultural Warrant and Cultural Hospitality.” In Challenges in Knowledge Representation and Organization for the 21st Century: Integration of Knowledge Across Boundaries: Proceedings of the Seventh International ISKO Conference 10-13 July, 2002 Granada, Spain, edited by María José López-Huertas. Advances in Knowledge Organization 8. Würzburg: Ergon Verlag, 45-49.
Boast, Robin, Michael Bravo, and Ramesh Srinivasan. 2007. “Return to Babel: Emergent Diversity, Digital Resources, and Local Knowledge.” The Information Society 23, no. 5: 395-403.
Borgman, Christine L. 1999. “What Are Digital Libraries? Competing Visions.” Information Processing and Management 35: 227-243.
Bow, Catherine, Michael Christie, and Brian Devlin. 2015. “Shoehorning Complex Metadata in The Living Archive of Aboriginal Languages.” In Research, Records and Responsibility: Ten Years of PARADISEC, edited by Amanda Harris, Nick Thieberger, and Linda Barwick. Sydney: Sydney University Press, 115-131.
Christie, Michael. 2004. “Computer Databases and Aboriginal Knowledge.” Learning Communities: International Journal of Learning in Social Contexts: 4-12.
Christie, Michael. 2005. “Words, Ontologies and Aboriginal Databases.” Media International Australia, Incorporating Culture & Policy 116, no. 1: 52-63.
Clarke, Zoe. 2002. “Empowering Citizens to Tell Their Own Stories – The Cultural Objects in Networked Environments (COINE) Project.” VINE 32, no. 3: 31-36.
Digital Library North. 2017.
Farnel, Sharon, Ali Shiri, Sandra Campbell, Cathy Cockney, Dinesh Rathi, and Robyn Stobbs. 2017.
“A Community-Driven Metadata Framework for Describing Cultural Resources: The Digital Library North Project.” Cataloging & Classification Quarterly 55, no. 5: 289-306.
Farnel, Sharon, Ali Shiri, Dinesh Rathi, Cathy Cockney, Sandra Campbell, and Robyn Stobbs. 2016. “Of Places and Names: Working with Northern Canadian Communities to Enhance Subject Access to Digital Resources.” In World Library and Information Congress, 82nd IFLA General Conference and Assembly: 13-19 August 2016, Columbus Ohio, United States, Greater Columbus Convention Center: final announcement. The Hague, Netherlands: IFLA.
Hennessy, Kate, Natasha Lyons, Stephen Loring, Charles Arnold, Mervin Joe, Albert Elias, and James Pokiak. 2013. “The Inuvialuit Living History Project: Digital Return as the Forging of Relationships Between Institutions, People, and Data.” Museum Anthropology Review 7, nos. 1/2: 44-73.
Holland, Maurita Peterson, and Kari R. Smith. 2000. “Using Information Technology to Preserve and Sustain Cultural Heritage: The Digital Collective.” In UNESCO World Culture Report 2000: Cultural Diversity, Conflict and Pluralism. UNESCO, 186-196.
Hudon, Michèle. 1997. “Multilingual Thesaurus Construction–Integrating the Views of Different Cultures in One Gateway to Knowledge and Concepts.” Information Services & Use 17, nos. 2/3: 111-123.
Inuvialuit Regional Corporation. 2017.
Lilley, Spencer C. 2015. “Ka Pō, Ka Ao, Ka Awatea: The Interface Between Epistemology and Māori Subject Headings.” Cataloging & Classification Quarterly 53, nos. 5/6: 479-495.
Nakata, Martin. 1997. The Cultural Interface: An Exploration of the Intersection of Western Knowledge Systems and Torres Strait Islanders Positions and Experiences. Ph.D. dissertation. North Queensland, Australia: James Cook University.
Nakata, Martin. 2002. “Indigenous Knowledge and the Cultural Interface: Underlying Issues at the Intersection of Knowledge and Information Systems.” IFLA Journal 28, nos. 5/6: 281-291.
Nakata, Martin. 2007.
“The Cultural Interface.” Australian Journal of Indigenous Education 36, no. S1: 7-14.
Pang, Natalie. 2012. “The Social Element of Digital Libraries.” In Digital Libraries and Information Access: Research Perspectives, edited by G.G. Chowdhury and Schubert Foo. London: Facet Publishing, 83-96.
Srinivasan, Ramesh. 2002. Village Voice: Expressing Narrative Through Community-Designed Ontologies. Master's thesis. Cambridge, Massachusetts: Massachusetts Institute of Technology.
Srinivasan, Ramesh. 2012. “Re-Thinking the Cultural Codes of New Media: The Question Concerning Ontology.” New Media & Society 5, no. 2: 203-223.
Srinivasan, Ramesh. 2017. Whose Global Village? Rethinking How Technology Shapes Our World. New York: New York University Press.
Tuhiwai-Smith, Linda. 2012. Decolonizing Methodologies: Research and Indigenous Peoples. 2nd ed. London: Zed Books.
Zeng, Marcia Lei, and Lois Mai Chan. 2004. “Trends and Issues in Establishing Interoperability Among Knowledge Organization Systems.” Journal of the American Society for Information Science and Technology 55, no. 5: 377-395.

Amel Fraisse – Univ. Lille, EA 4073 - GERiiCO, France
Samantha Blickhan – The Adler Planetarium, Chicago, USA
Victoria Van Hyning – Library of Congress, Washington, USA

Towards an Open, Inclusive and Sustainable Knowledge Organization Models

Abstract: In an increasingly globalized context, multilingualism and multiculturalism have become major concerns for Knowledge Organization (KO), which has to be as fair as possible in order to ensure and sustain knowledge organization as a driver of development. Indeed, over time, the gap between the languages of dominant nations or civilizations and other languages has been growing. In this research, we describe, evaluate and present the first results of our sustainable and open access knowledge organization model.
The model is based on paradigms that allow different types of contributors, including volunteers as well as scientific and scholarly communities from across borders, languages, nations, continents, and disciplines, to take part in the knowledge organization process in an efficient and dynamic way. Recent experiments with this model have been conducted on transnational literary texts and in crowdsourcing projects for the enrichment of cultural heritage knowledge and collections.

1.0 Introduction

The impact of the digital revolution on the preservation, organization, and sharing of human knowledge encoded in languages is an extraordinarily rich phenomenon, bringing productive opportunities as well as obstacles and threats. On the one hand, digital technology has created tremendous opportunities for accessing knowledge. New technologies also constitute a step forward in terms of public inclusion and awareness: thanks to social networks and collaborative platforms, the general public can take part in the mass dissemination of human knowledge. On the other hand, numerous barriers prevent knowledge diversity from being sustained, as described by Hudon (1997), Beghtol (2002), Fraisse et al. (2019), and Barát (2008). Language is the most important barrier; as language diversity decreases, the preservation and transmission of the knowledge encoded in those languages is at risk.

The ever-growing scientific and political interest in making knowledge open, accessible, and sustainable has drawn attention across many parts of the scientific community. Some disciplines have been concerned with problems of knowledge dissemination for a long time, and Library and Information Science (LIS) is one of them. As a gateway to knowledge and culture, LIS has a long history of collecting, storing, organizing, and providing access to knowledge, as described by the pioneer of Documentation Studies, Paul Otlet (1934).
To this end, Knowledge Organization Systems (KOS), Information Retrieval Systems (IRS), and metadata exchange standards, among others, have been developed to seize the opportunities arising from the development of new technologies. The collections of the world's great libraries have been made available to the public through large-scale digitization. The Online Computer Library Center (OCLC), dedicated to the public purpose of furthering access to the world's information, produces and maintains WorldCat, the largest online public access catalog (OPAC) in the world. WorldCat itemizes the collections of 72,000 libraries in 170 countries and territories. Multilingual online digital libraries and archival projects collect documents and make them available to a wide audience.

2.0 The role of library and information science in building a global, shared knowledge community

More than a century ago, Paul Otlet, the pioneer of Documentation Studies, envisioned a universal compilation of knowledge and the technology to make it globally available. He wrote numerous essays on how to collect and organize the world's knowledge (Otlet 1934). The ever-growing number of digital documents, together with scientific and political interest in making them openly available worldwide, has led to the creation of new digital collections in a broad range of fields and languages. Several Registries of Open Access Repositories (ROARs), hosted by national and international organizations and universities, have been developed. For example, the Library of Congress has digitized approximately 164 million items in virtually all formats, languages, subjects, and periods. These collections are broad in scope, including research materials in more than 470 languages and multiple media.
The Europeana collection, launched in 2008 and funded by the European Commission, contains over fifteen million digitized paintings, drawings, maps, photos, books, newspapers, letters, diaries, and other items from fifteen hundred institutions. However, the language barrier is a key issue that KOS have to address, as described by Hudon (1997; 1998) and Barát (2008). Indeed, over time, the gap between the languages of dominant nations or civilizations and other languages has been growing. Although KOS include knowledge encoded in under-resourced languages, the use and exploration of that knowledge are still limited.

3.0 Current situation

3.1 Crowdsourcing as a means of decentering institutional authority and expanding the representation of different languages and cultures

Since 2000, online crowdsourcing projects have proliferated in the sciences, the humanities, and cultural heritage. Hundreds of cultural heritage institutions have embarked on such projects, many of which explicitly invite people from diverse walks of life to transcribe, annotate, or highlight text, speech, and typed or handwritten documents. A wide spectrum of languages, historical periods, materials, and geographic areas is represented by these projects and by the people who participate in them, as described by Van Hyning (2019) and Ridge (2014). As more transcriptions and tags become available in different languages, cultural heritage institutions can better represent the peoples and cultures in their collections, as well as the patrons and communities they serve. By inviting volunteers into the process of transcribing, translating, and tagging, institutions have the opportunity to co-create new knowledge and to open new discovery pathways through their collections.
These relationships can ultimately decenter traditional power dynamics and concepts of authority in knowledge systems, often for the better. However, as Eveleigh (2014) demonstrates, the breakdown of barriers between professional practice and the knowledge of external participants is not an inevitable outcome of every crowdsourcing project; rather, it requires careful project design and strategies of volunteer engagement. Examples of crowdsourcing projects involving under-represented languages include the City Archive of Leuven project to transcribe more than 950,000 Dutch-language register pages from the Leuven court of Aldermen from 1362 to 1795; the Ancient Lives project, launched on the Zooniverse platform in 2011 and described by Williams et al. (2014), which asked online volunteers to transcribe fragments of ancient Greek texts from papyri; and the Rediscovering Indigenous Languages project, which crowdsourced the transcription of historic word lists, records, and other documents relating to Indigenous Australian languages. Many of these communities are self-sustaining, self-organizing, productive of new knowledge, and decentralized: participants can translate and describe code and systems, but they can also contribute new functionality to them. This type of community co-creation might serve as a fruitful model for cultural heritage crowdsourcing, in which authority and the creation of descriptive records are still often overwhelmingly concentrated within institutions rather than shared with the communities that originate cultural artefacts, texts, music, dance, and other outputs.

3.2 Increasing demand and need for global knowledge sharing and access

According to the Sapient Globalization Report, there are over 6,700 living languages in the world; the fifteen most popular languages are spoken by 49.5% of the world's population, while the other 51.5% of the world's population speak 6,600 languages. Yet only about 6% of the world's population speak English.
Of the world's 6,000+ languages, only a small fraction, a dozen or so, currently enjoy the benefits of modern information technologies and knowledge organization systems. A larger but still modest number, close to a hundred, have a so-called Basic LAnguage Resource Kit (BLARK): monolingual and bilingual corpora, machine-readable dictionaries, terminologies, thesauri, ontologies, and the like, as described by Krauwer (2003) and Arppe et al. (2016). Preserving knowledge diversity and ensuring the right of all people to access knowledge in their mother tongue is the main goal of UNESCO's Information for All Programme (IFAP). Several researchers have called for cultural and linguistic diversity in knowledge organization, e.g. Adler et al. (2016), Beghtol (2005), Dahlberg (1992), López-Huertas (2016), and Mustafa El Hadi (2015). In earlier work, Beghtol (1986; 2001) introduced the concept of cultural warrant. Fisher Fishkin (2011) introduced a new model for data curation and sharing, inviting colleagues around the world to collaborate on Digital Palimpsest Mapping Projects (DPMPs), or “Deep Maps”. Deep Maps, curated collaboratively by scholars in multiple locations, would put multilingual digital archives around the globe in conversation with one another, using maps as the gateway.

4.0 The open, inclusive and sustainable knowledge organization models

4.1 Basic principles

4.1.1 From closed, discontinuous and out of context to open, continuous and in context knowledge organization models

Our solution aims to move from a closed, discontinuous, out-of-context knowledge organization model to an open, continuous, in-context one. The basic concept builds on the software localization paradigm proposed by Fraisse (2010) and Fraisse et al. (2009), which promotes the right of all people to use software in their mother tongue.
It consists of renouncing the idea of perfect and complete knowledge and instead publishing partial knowledge of variable quality, which is improved incrementally as the knowledge organization system is used. The knowledge organization process thus becomes ongoing and improves continuously, allowing both quality and quantity to grow incrementally. The best-known example of this approach is the Wikipedia community, in which contributors add and improve knowledge continuously.

4.1.2 From exclusive, unilateral and unsustainable to inclusive, collaborative and sustainable

Current knowledge organization models seem unfeasible for most languages, and even more so for endangered ones, both because of cost and because of a frequent scarcity, or even absence, of experts in these languages. Our solution aims to involve non-experts such as volunteer contributors and, especially, end-users. These groups are able to participate effectively, since they have a better knowledge of the target language (generally their native language) as well as of the context of the knowledge being processed.

4.2 The ROSETTA knowledge organization model

An early version of the open, inclusive and sustainable knowledge organization model described above was implemented and tested within the international research project ROSETTA, funded by the France-Stanford Center for Interdisciplinary Studies. The main goal of this project is to define a knowledge organization model for translated literary texts and for the scientific documents related to these translations. As described by Fraisse et al. (2019), the proposed model is open, inclusive and sustainable. It is based on contributions of end-users as well as scientific and scholarly communities from across borders, languages, nations, continents, and disciplines.
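To make the principle of publishing partial knowledge and completing it incrementally more concrete, the following minimal Python sketch models a ROSETTA-style knowledge map. It is a hypothetical illustration, not the ROSETTA implementation: the class names, the five property names in `REQUIRED_PROPERTIES`, and the rule that a contribution only fills properties that are still empty are all assumptions. A translation node is reported as “completed” once every required property is present and as “partially completed” otherwise.

```python
from dataclasses import dataclass, field

# Hypothetical key properties of one translation; the names are
# illustrative assumptions, not ROSETTA's actual schema.
REQUIRED_PROPERTIES = ("title", "translator", "publisher", "year", "language")


@dataclass
class TranslationNode:
    """Knowledge about one translation, shown as one node on the map."""
    node_id: str
    properties: dict = field(default_factory=dict)

    def contribute(self, **values: str) -> None:
        """Merge a contribution, filling only required properties still missing."""
        for key, value in values.items():
            if key in REQUIRED_PROPERTIES and not self.properties.get(key):
                self.properties[key] = value

    def missing(self) -> list:
        """Required properties not yet provided, in schema order."""
        return [p for p in REQUIRED_PROPERTIES if not self.properties.get(p)]

    def status(self) -> str:
        return "completed" if not self.missing() else "partially completed"


@dataclass
class GlobalKnowledge:
    """Global knowledge about one original work: its set of translation nodes."""
    original_work: str
    nodes: dict = field(default_factory=dict)

    def add(self, node: TranslationNode) -> None:
        self.nodes[node.node_id] = node

    def summary(self) -> dict:
        """Map each node id to its current completion status."""
        return {node_id: node.status() for node_id, node in self.nodes.items()}


if __name__ == "__main__":
    gk = GlobalKnowledge("Adventures of Huckleberry Finn")
    # Hypothetical node published with only one property known.
    node = TranslationNode("pt-BR-1", {"language": "Portuguese"})
    gk.add(node)
    print(gk.summary())  # {'pt-BR-1': 'partially completed'}
    # A later visitor fills in the remaining fields (placeholder values).
    node.contribute(title="T", translator="X", publisher="P", year="Y")
    print(gk.summary())  # {'pt-BR-1': 'completed'}
```

Filling only empty fields matches the idea of end-users adding missing knowledge to a node; correcting values that are already present would need an additional step, such as review by other contributors, which this sketch leaves out.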
The model consists of collecting knowledge about all worldwide translations of one original work and sharing that data through a digital, interactive global knowledge map. This sustainable model allows different types of volunteers and contributors to take part in the knowledge organization process in an efficient, dynamic and symbiotic way: while using the knowledge map, volunteers and contributors who know the local culture and language can add missing information about a given translation of an original work. Volunteers and contributors may be scholars or simply citizens interested in preserving knowledge diversity. We define the global knowledge $GK_{OW}$ about an original work $OW$ as the set of knowledge $K_{T_i}$ about the different translations $T_1, \ldots, T_n$ of $OW$:

$$GK_{OW} = \{K_{T_1}, K_{T_2}, \ldots, K_{T_n}\}$$

where the knowledge $K_{T_i}$ about a given translation $T_i$ is a set of key properties of that translation.

Related knowledge was organized collaboratively and in context by scholars through an interactive online global knowledge map. As shown in Figure 1, the map displays all knowledge about all existing documents related to a given literary work. Each document is represented by a node on the world map, which is considered “completed” when all required knowledge is provided and “partially completed” when some knowledge is missing. Nodes are updated incrementally through the map by transnational end-users and scholars: while exploring the map, end-users can edit any node to add missing knowledge.

Figure 1: The global knowledge map representing existing translations of Adventures of Huckleberry Finn.
The bubble over Brazil is highlighted, displaying the relevant information for the Portuguese translations published in Brazil.

4.3 Co-creating and crowdsourcing knowledge of folklife and music traditions through the Library of Congress

Traditions of collaborative knowledge creation in cultural heritage are perhaps rarer than they should be, but there are precedents in this sector as well. The twentieth-century folklorist Alan Lomax devoted his life to recording, celebrating, and promoting folk artists and tradition bearers in America, the Caribbean, and Europe. He conducted extensive fieldwork trips during which he produced audio recordings and detailed notes about the people he met and their traditional arts. His goal was to demonstrate the value of traditional arts and to challenge what he saw as a hegemonic media and cultural system in America and Europe, which failed to make room for cultural differences and killed off diversity. As Harvey et al. (2017) argue, Lomax was critical of “a centralized mediascape through which was broadcast an industrial American monoculture”. “Too few transmitters and too many receivers” was his central complaint. He was frustrated with the myopic unilateralism of corporate programming, which he saw operating through an “over-powerful, over-rich, over-reaching” communication system. His answer to this was what he termed “cultural equity”: the right of folk communities (what he called “little bubbles of song and delight and ways of life and cookery,” encompassing “hundreds of thousands of these little generators of the original”) to have their voices heard and their traditions represented. Lomax ultimately recorded over 1,000 cultural groups and hundreds of under-represented languages. He established the Association for Cultural Equity to advocate for folk artists, and donated his field notebooks, recordings, letters, and other papers to the Library of Congress, where he helped to establish the American Folklife Center (AFC).
In 2015, the AFC digitized Lomax’s papers and made them available online. In 2019, the AFC partnered with By the People, a new crowdsourcing effort at the Library of Congress, to crowdsource the transcription, review, and tagging of these papers. By the People’s goals are to engage a diverse volunteer base with the cultural heritage preserved at the Library of Congress; to generate transcriptions that improve online search at the document level; and to provide transcriptions that can be read by screen readers, in order to assist people with visual or cognitive impairments and those who cannot read the original handwriting. By the People launched in October 2018, and to date volunteers have transcribed over 100,000 pages from a variety of collections, including the papers of Rosa Parks, Walt Whitman, President Abraham Lincoln, and leading suffragists such as Susan B. Anthony and Mary Church Terrell. Through the site itself, emails, in-person events, and social media, volunteers are encouraged to explore the documents, ask questions, and speak with one another and with Library employees about their findings, struggles, joys, and what they are learning. Their knowledge is fed back into the Library website in the form of transcriptions and enhanced metadata. By the People is a natural extension of Alan Lomax’s efforts to build “‘two-way bridges’ and [. . .] ‘two-way inter-communication systems’ for traditions presented in any medium”, as described by Baron (2012). Documents in the By the People transcription campaign “The Man Who Recorded the World: On the Road with Alan Lomax” include materials in Haitian Creole and in dialects of Swedish, Polish, Danish, Hungarian, and other languages spoken by nineteenth- and twentieth-century migrants to the American Midwest, which volunteers transcribe in the original language.
In addition to reaching out to over 30,000 registered volunteers to encourage them to participate in the project, AFC folklorists contacted several descendants of the tradition bearers whom Lomax originally recorded, encouraging them to contribute to By the People and to bring their knowledge to bear on this next phase of folklife preservation and exploration.

4.4 Crowdsourcing multilingual knowledge: The Zooniverse platform

One current example of Knowledge Organization principles applied through crowdsourcing is Scribes of the Cairo Geniza, a collaboration between the Zooniverse, the University of Pennsylvania Libraries, and more than half a dozen research institutions that have provided digital Geniza images for the project. The Zooniverse is the largest platform in the world for online crowdsourced research, with more than 250 projects launched since its inception in 2009. At the time of writing, the Zooniverse has more than 1.9 million registered volunteers, who have collectively produced over 460 million classifications on crowdsourcing projects from a variety of disciplines, including astronomy, biology, ecology, climate science, history, and social science, as explained in Blickhan et al. (2019). In 2015, the Zooniverse launched the Project Builder, a tool that allows anyone to create and run their own crowdsourcing project, hosted on the Zooniverse, free of charge. The platform is maintained by teams based at the University of Oxford (Oxford, UK), the Adler Planetarium (Chicago, IL), and the University of Minnesota Twin Cities (Minneapolis, MN). While the majority of users thus far have been from English-speaking countries, the Zooniverse community is international. To reflect this global user base, the Zooniverse team created a translation interface, which allows projects to be translated either by project team members or by volunteers who want to make a project available to a specific community of speakers.
Along with the relatively recent option of a multilingual interface, the Zooniverse has featured multilingual project content since its early days. Ancient Lives, which launched in 2011, invited volunteers to transcribe fragments of the Oxyrhynchus papyri. To open participation to members of the public who were not fluent in Ancient Greek, the team created a clickable keyboard that volunteers could use to transcribe fragments through character matching. The Scribes of the Cairo Geniza project launched a transcription interface in 2019 with a similar assistive keyboard, as well as a multilingual user interface in Arabic, English, and Hebrew. In Scribes of the Cairo Geniza, volunteers are asked to help sort and transcribe fragments of the Cairo Geniza, a corpus of discarded fragments of pre-modern manuscripts discovered in the Ben Ezra synagogue in Fustat (now part of Cairo). The project is broken down into a series of workflows. The Sorting workflow asks volunteers to classify fragments as written in Arabic script, Hebrew script, or both. Based on the script type identified, volunteers are then asked to identify specific visual features, such as page layout or evidence of binding, which can be added to each fragment’s record and assist in the identification process. Once a fragment has been sorted, it is sent to one of four transcription workflows: Easy Arabic, Difficult Arabic, Easy Hebrew, or Difficult Hebrew. Separating workflows by task type allows volunteers to choose how they wish to contribute based on their comfort level with the different tasks. For example, volunteers who are unable to read Hebrew or Arabic script can participate in the Sorting workflow, which offers an introductory tutorial as well as resources on identifying the differences between Hebrew and Arabic script.
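The sort-then-route structure of Scribes of the Cairo Geniza can be sketched as a small routing function that sends each sorted fragment to exactly one of the four transcription workflows. This Python sketch is a hypothetical illustration, not Zooniverse code: the `Script` labels, the yes/no `is_difficult` judgment, and the handling of fragments sorted as containing both scripts (routed by a `predominant` script that the sorter is assumed to record) are assumptions, since the text does not say how difficulty or mixed-script fragments are decided.

```python
from enum import Enum
from typing import Optional


class Script(Enum):
    """Possible outcomes of the Sorting workflow (hypothetical labels)."""
    ARABIC = "Arabic"
    HEBREW = "Hebrew"
    BOTH = "both"


def route(script: Script, is_difficult: bool,
          predominant: Optional[Script] = None) -> str:
    """Return the single transcription workflow a sorted fragment is sent to.

    Assumptions (not specified in the text): difficulty is a yes/no
    judgment, and a fragment sorted as BOTH is routed by a predominant
    script that the sorter also records.
    """
    chosen = script
    if script is Script.BOTH:
        if predominant not in (Script.ARABIC, Script.HEBREW):
            raise ValueError("a mixed-script fragment needs a predominant script")
        chosen = predominant
    level = "Difficult" if is_difficult else "Easy"
    return f"{level} {chosen.value}"


if __name__ == "__main__":
    print(route(Script.HEBREW, is_difficult=False))  # Easy Hebrew
    print(route(Script.BOTH, is_difficult=True, predominant=Script.ARABIC))  # Difficult Arabic
```

Keeping routing separate from sorting reflects the design point in the text: volunteers who sort never need to read either script, while each transcription workflow gathers fragments of one script and one level of difficulty.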
Volunteers who are fluent in Hebrew and/or Arabic, or who can confidently read either script, may choose to participate in the various transcription workflows. The transcription workflows feature clickable keyboards, which act as an additional linguistic and paleographic resource for transcribers who may need visual cues to aid in the transcription process.

5.0 Conclusion

Knowledge Organization faces a range of highly challenging issues, given the diversity of knowledge encoded in different languages, in particular in vulnerable and under-resourced ones. In this paper, we described and explored an open, inclusive and sustainable knowledge organization model that allows different types of contributors, including volunteers as well as scientific and scholarly communities from across borders, languages, nations, continents, and disciplines, to take part in the knowledge organization process in an efficient and dynamic way. We explored examples of modern online crowdsourcing, as well as some of the historic attitudes within cultural heritage institutions that have led to, or stood in contrast to, ideas of co-production or collaboration between institutional gatekeepers and patrons of diverse cultural backgrounds. Crowdsourcing has huge potential to expand the representation of vulnerable languages and cultural practices within the cultural heritage record, and to radically broaden the base of people who contribute to the knowledge that is preserved and treated as authoritative by cultural heritage organizations, academia, and other domains.

References

Adler, Melissa A., Joseph T. Tennis, Daniel Martínez-Ávila, José Augusto Chaves Guimarães, Jens-Erik Mai, Ole Olesen-Bagneux, and Laura Skouvig. 2016. “Global/Local Knowledge Organization: Contexts and Questions.” Proceedings of the Association for Information Science and Technology 53, no. 1: 1-4.
Arppe, Antti, Jordan Lachler, Trond Trosterud, Lene Antonsen, and Sjur N.
Moshagen. 2016. “Basic Language Resource Kits for Endangered Languages: A Case Study of Plains Cree.” In CCURL 2016: Collaboration and Computing for Under-Resourced Languages: Towards an Alliance for Digital Language Diversity (LREC 2016 Workshop), edited by Claudia Soria, Laurette Pretorius, Thierry Declerck, Joseph Mariani, Kevin Scannell, and Eveline Wandl-Vogt. Portorož, Slovenia: European Language Resources Association, 1-8.
Barát, Ágnes Hajdu. 2008. “Knowledge Organization in the Cross-Cultural and Multicultural Society.” In Culture and Identity in Knowledge Organization: Proceedings of the Tenth International ISKO Conference 5-8 August 2008 Montréal, Canada, edited by Clément Arsenault and Joseph T. Tennis. Advances in Knowledge Organization 11. Würzburg: Ergon Verlag, 91-97.
Baron, Robert. 2012. “‘All Power to the Periphery’: The Public Folklore Thought of Alan Lomax.” Journal of Folklore Research 49, no. 3: 275-317.
Beghtol, Clare. 1986. “Semantic Validity: Concepts of Warrant in Bibliographic Classification Systems.” Library Resources and Technical Services 30, no. 2: 109-125.
Beghtol, Clare. 2001. “Relationships in Classificatory Structure and Meaning.” In Relationships in the Organization of Knowledge, edited by Carol A. Bean and Rebecca Green. Dordrecht: Kluwer, 99-113.
Beghtol, Clare. 2002. “Universal Concepts, Cultural Warrant and Cultural Hospitality.” In Challenges in Knowledge Representation and Organization for the 21st Century: Integration of Knowledge Across Boundaries: Proceedings of the Seventh International ISKO Conference 10-13 July, 2002 Granada, Spain, edited by María José López-Huertas. Advances in Knowledge Organization 8. Würzburg: Ergon Verlag, 45-49.
Beghtol, Clare. 2005. “Ethical Decision-Making for Knowledge Representation and Organization Systems for Global Use.” Journal of the American Society for Information Science and Technology 56, no. 9: 903-912.
Blickhan, Samantha, Coleman Krawczyk, Daniel Hanson, Amy Boyer, Andrea Simenstad, and Victoria Van Hyning. 2019. “Individual vs. Collaborative Methods of Crowdsourced Transcription.” Journal of Data Mining and Digital Humanities: hal-02280013v2.
Dahlberg, Ingetraut. 1992. “Ethics and Knowledge Organization: In Memory of Dr. S.R. Ranganathan in His Centenary Year.” International Classification 19, no. 1: 1-2.
Eveleigh, Alexandra. 2014. “Crowding Out the Archivist? Locating Crowdsourcing within the Broader Landscape of Participatory Archives.” In Crowdsourcing our Cultural Heritage, edited by Mia Ridge. London: Routledge, 211-229.
Fisher Fishkin, Shelley. 2011. “Deep Maps: A Brief for Digital Palimpsest Mapping Projects (DPMPs, or ‘Deep Maps’).” Journal of Transnational American Studies 3, no. 2.
Fraisse, Amel. 2010. Localisation Interne et en Contexte des Logiciels Commerciaux et Libres [Internal and In-Context Localization of Commercial and Free Software]. Ph.D. dissertation. Grenoble: Université de Grenoble. 00995093
Fraisse, Amel, Christian Boitet, Hervé Blanchon, and Valérie Bellynck. 2009. “A Solution for In Context and Collaborative Localization of Most Commercial and Free Software.” In Proceedings of the 4th Language and Technologies Conference, vol. 1. Poznań, Poland, 536-540.
Fraisse, Amel, Zheng Zhang, Alex Zhai, Ronald Jenn, Shelley Fisher Fishkin, Pierre Zweigenbaum, Laurence Favier, and Widad Mustafa El Hadi. 2019. “A Sustainable and Open Access Knowledge Organization Model to Preserve Cultural Heritage and Language Diversity.” Information 10, no. 10: 303.
Harvey, Todd, Andrew Peart, and Nathan Salsburg. 2017. “Alan Lomax and the ‘Grass Roots’ Idea.” Chicago Review 60/61, no. 4/1: 37-45.
Hudon, Michèle. 1997. “Multilingual Thesaurus Construction–Integrating the Views of Different Cultures in One Gateway to Knowledge and Concepts.” Information Services and Use 17: 111-123.
Hudon, Michèle. 1998. “Information Access in a Multilingual and Multicultural Environment.”
Presented at the Conference of the American Society of Indexers, Seattle, WA.
Krauwer, Steven. 2003. “The Basic Language Resource Kit (BLARK) as the First Milestone for the Language Resources Roadmap.” In Proceedings of the International Workshop Speech and Computer.
López-Huertas, María José. 2016. “The Integration of Culture in Knowledge Organization Systems.” In Knowledge Organization for a Sustainable World: Challenges and Perspectives for Cultural, Scientific, and Technological Sharing in a Connected Society. Proceedings of the Fourteenth International ISKO Conference 27-29 September 2016, Rio de Janeiro, Brazil, edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, and Vera Dodebei. Advances in Knowledge Organization 15. Würzburg: Ergon, 13-28.
Mustafa El Hadi, Widad. 2015. “Cultural Interoperability and Knowledge Organization Systems.” In Organização do Conhecimento e Diversidade Cultural [Knowledge Organization and Cultural Diversity], Proceedings of the 3rd Brazilian ISKO Conference, edited by José Augusto Chaves Guimarães and Vera Dodebei. Marília, São Paulo: Fundação para o Desenvolvimento do Ensino, Pesquisa e Extensão (FUNDEPE), 575-606.
Otlet, Paul. 1934. Traité de Documentation: Le Livre sur le Livre, Théorie et Pratique [Treatise on Documentation: The Book on the Book, Theory and Practice]. Brussels: Mundaneum.
Ridge, Mia, ed. 2014. Crowdsourcing our Cultural Heritage. Farnham: Ashgate.
Van Hyning, Victoria. 2019. “Harnessing Crowdsourcing for Scholarly and GLAM Purposes.” Literature Compass 16, nos. 3-4.
Williams, Alex C., John F. Wallin, Haoyu Yu, Marco Perale, Hyrum D. Carroll, Anne-Francoise Lamblin, Lucy Fortson, Dirk Obbink, Chris J. Lintott, and James H. Brusuelas. 2014. “A Computational Pipeline for Crowdsourced Transcriptions of Ancient Greek Papyrus Fragments.” In 2014 IEEE International Conference on Big Data (Big Data), Washington, DC, 100-105. doi: 10.1109/BigData.2014.7004460.
Jonathan Furner – University of California, Los Angeles, USA

New Formats, Shifting Fortunes: Late-Twentieth-Century KO in the Wild

Abstract: Three candidates for the knowledge organization (KO) systems that enjoyed the widest, most popular usage in the late twentieth century are (1) Encyclopædia Britannica’s Propædia or “Outline of Knowledge,” (2) the “Synopsis of Categories” at the heart of HarperCollins’ Roget’s International Thesaurus, and (3) OCLC’s Dewey Decimal Classification. Surprisingly, given the popularity of these systems (which continues in the latter two cases into the 2020s), only the last has received special attention in the ISKO community. The goal of this paper is to compare the function, form, and content of each of the three systems, in the context of a taxonomy of evaluation methods for KO applications that takes into account similarities and differences in formats and purposes.

1.0 Introduction

Classification involves the identification of “groups of things [that] can then be combined and arranged to make a . . . system” (Beghtol 2010, 1045). In the remainder of this paper, I compare and contrast three cases in which classification has been carried out in the production of a practical system of a different kind. Each system is distinguished by the particular kind of things that make up its groups.

In the first case, the system is a scheme for describing and arranging the subjects (i.e., topics) of entries in a general English-language encyclopedia (and thus for describing and arranging those entries themselves). An encyclopedia is “a literary work containing extensive information on all branches of knowledge, usually arranged in alphabetical order” (OED Online, December 2019).1 The encyclopedia in the case in question is the 15th edition of the Encyclopædia Britannica (EB; Hoiberg 2010);2 the scheme is the “Outline of Knowledge” (OoK), commonly known as the Propædia.
In the second case, the system is a scheme for describing and arranging the meanings of entries (i.e., concepts) in a general English-language thesaurus (and thus for describing and arranging those entries themselves). A thesaurus is “a collection of concepts or words arranged according to sense” (OED Online, December 2019).3 The thesaurus in the case in question is the 8th edition of Roget’s International Thesaurus (RIT; Kipfer 2019);4 the scheme is the “Synopsis of Categories” (SoC).

In the third case, the system is a scheme for describing and arranging the subjects (i.e., topics) of entries in any general English-language library catalog (and thus for describing and arranging those entries themselves, as well as the resources that are themselves described by those entries). A catalog is “usually distinguished from a mere list or enumeration, by systematic or methodical arrangement, alphabetical or other order, and often by the addition of brief particulars, descriptive, or aiding information” (OED Online, December 2019). The systematic arrangement adopted for the subjects of library resources is typically set down in a library classification scheme.5 The scheme in the case in question is the Dewey Decimal Classification (DDC; Mitchell 2011).6

1 For the history of encyclopedias, see Loveland (2019).
2 For the history of the Encyclopædia Britannica, see Whiteley (1992).
3 For the history of thesauri, see Hüllen (2009).
4 For the history of Roget’s Thesaurus, see Hüllen (2004).

2.0 Method of Construction

Each scheme was originally the work of one pioneer, collecting names of items (fields, concepts, subjects) and grouping items into classes and subclasses manually, on the basis of individual experience and self-proclaimed expertise. The first instance of the OoK, published in 1974, revised in 1985, and left to collect dust alongside the rest of the final print edition of EB in 2012, was compiled by Mortimer J. Adler (1902–2001).
The first instance of the SoC, published in 1852, was compiled by Peter Mark Roget (1779–1869); the version currently in use (Kipfer 2019) was developed by Robert L. Chapman (1920–2002) for the 1992 edition of RIT. The first instance of the DDC, published in 1876, was compiled by Melvil Dewey (1851–1931); much later, successive editions would be the work of small teams of editorial staff members, each led by a single editor-in-chief and supported by an international advisory board (the Editorial Policy Committee, EPC).

Each scheme is essentially subjective, in that the contents of classes, and the relationships among them, are not somehow “read off” an objective reality; classes are assigned to positions in a tree structure on the basis of an individual’s perceptions and judgments.7 These perceptions and judgments are bound to vary greatly in accordance with differences in personal attitudes, preferences, and goals, as well as with differences in the sociocultural contexts characteristic of different times and places. So it is to be expected that schemes developed by different people for different purposes, even if they are intended for general rather than special application, will vary in form at the macro-level, to say nothing of the micro-level. What is remarkable, given this expectation, is the degree to which the three schemes are in fact similar in certain aspects of their form, as well as in their function.

3.0 Function

The primary functions of the three schemes, threefold in each case, are similar, as demonstrated by the following summary.
The OoK is a scheme for describing, classifying, and arranging the entries in an encyclopedia according to the fields (i.e., the disciplines) to which those entries contribute; the SoC is a scheme for describing, classifying, and arranging the words and phrases in a lexicon according to the meanings of those words and phrases; the DDC is a scheme for describing, classifying, and arranging the resources in a collection according to the subjects (i.e., the topics) of the works that those resources instantiate. The results of applying any of the schemes to any given collection of items (entries, words, or resources) are (a) that items with similar characteristics are brought close together, as members of the same class, and (b) that classes with similar characteristics are brought close together, as proximate classes. In general, similarity among fields, meanings, or subjects is represented in the scheme by proximity among classes. Besides this classificatory function, the other two primary functions of the schemes are (a) indexical and (b) pedagogical. Each scheme provides a means for readers (a) to search for, locate, and access items of interest, and (b) to learn about the world, both by interacting with those items and by studying the scheme itself. In particular, each scheme provides a sense of the shape, size, and structure of the totality of general knowledge.

5 For the (pre-1930) history of library classification, see Richardson (1930).
6 For the history of the Dewey Decimal Classification, see Miksa (1998).
7 Note that we are not talking here about the assignment of resources, words, or entries to classes, but about the initial assignment of classes to positions that constitutes the act of original creation of a scheme.

4.0 Form

Each scheme takes the form of a tree whose first level of branches (the main classes) is relatively small in number: the OoK has ten main classes; the SoC has fifteen; the DDC has ten.
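The tree form, and the principle that similarity is represented by proximity among classes, can be illustrated with the DDC’s decimal notation, in which (in the simplest case) a class’s broader class is found by dropping its last significant digit: a section such as 942 falls under the division 940, which falls under the main class 900. The Python sketch below is a simplification under stated assumptions: it treats the notation as strictly hierarchical (the real DDC has exceptions), the function names are hypothetical, and counting tree edges is used here only as one possible measure of proximity, not as a measure any of the three schemes defines.

```python
from typing import List, Optional


def broader(notation: str) -> Optional[str]:
    """Return the next-broader class of a DDC-style number, or None for a main class.

    Simplifying assumption: the notation is strictly hierarchical, so the parent
    is found by dropping the last significant digit (942.13 -> 942.1 -> 942 -> 940 -> 900).
    """
    if "." in notation:
        base, decimals = notation.split(".", 1)
        decimals = decimals[:-1]  # drop the last decimal digit
        return f"{base}.{decimals}" if decimals else base
    d1, d2, d3 = notation  # expects a three-digit base number such as "942"
    if d3 != "0":
        return d1 + d2 + "0"  # section -> division, e.g. 942 -> 940
    if d2 != "0":
        return d1 + "00"  # division -> main class, e.g. 940 -> 900
    return None  # main class (e.g. 900): one of the ten top branches


def path_to_top(notation: str) -> List[str]:
    """List the class and all of its broader classes, ending at the main class."""
    path = [notation]
    parent = broader(notation)
    while parent is not None:
        path.append(parent)
        parent = broader(parent)
    return path


def tree_distance(a: str, b: str) -> int:
    """Count the edges between two classes via their nearest common broader class.

    Classes in different main classes are treated as joined through an
    implicit root above the ten main classes.
    """
    path_a, path_b = path_to_top(a), path_to_top(b)
    for steps_up_a, cls in enumerate(path_a):
        if cls in path_b:  # first shared class = nearest common ancestor
            return steps_up_a + path_b.index(cls)
    # No shared class: climb to the implicit root on both sides.
    return len(path_a) + len(path_b)


if __name__ == "__main__":
    print(path_to_top("942.1"))  # ['942.1', '942', '940', '900']
    print(tree_distance("942", "944"))  # 2: siblings under 940
    print(tree_distance("942", "530"))  # 5: joined only through the root
```

Under this measure, 942 and 944 (two sections of the same division) are closer than 942 and 530, which meet only above the main classes; this mirrors, in a much reduced form, the way similar subjects are placed in proximate classes.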
In each case, the main classes are divided into a small number of subclasses, which are in turn divided into sub-subclasses, and so on. The DDC in its current form is the odd one out, in the sense that it is several orders of magnitude larger and more complex than the other two; but its basic structuring principle, the hierarchy, is the same. The main classes of each scheme are presented in Tables 1, 2, and 3.

5.0 Content

The “Outline of Knowledge” that formed the bulk of the EB’s Propædia is a list of 15,000 subject headings, subheadings, sub-subheadings, etc., arranged systematically in a 7-level taxonomic structure. Each of its ten parts (Table 1) is divided into a number of divisions (42 in total), each of which is divided into a number of sections (189 in total), in which each topic covered is outlined; at the end of each section, a list is given of suggested readings in the Macropædia and Micropædia. This structure was Adler’s, in his capacity as director of planning for the new EB, and had been worked out between 1965 and 1968. The rearrangement reflected Adler’s “love for classification and bringing a unity to knowledge,” and more generally his “interest in self-education” (Whiteley 1992).

Adler was adamant that this structure should be viewed, not as a line or tree, but primarily as a circle. For Adler (1974, 6), the circle is a “powerful metaphor”: “with the circular arrangement of the parts, and with the rotation of the circle, the reader can begin anywhere in the circle of learning and go to adjacent parts around the circle; or, moving along interior transecting lines, the reader can go from any part across the circle to parts that are not adjacent on the circumference.” Moreover, the OoK’s part 10 might be placed in the center of the circle, reflecting a distinction between (a) “what we know about the world . . .
by means of the various branches of learning or departments of scholarship” (parts 1 through 9) and (b) “what we know about the branches of learning or departments of scholarship—the various academic disciplines themselves” (part 10). The latter is what Quinton (1974, 9) calls “knowledge about knowledge, or knowledge of the second order”: i.e., the fields of logic, mathematics, science (“conceived as a knowledge-seeking activity, not as a set of findings”), history and the humanities, and philosophy.

Table 1. Main classes of the OoK (2010).
#    Caption                       Pages   Share
1    Matter and Energy                39      8%
2    The Earth                        25      5%
3    Life on Earth                    44      9%
4    Human Life                       24      5%
5    Human Society                    46     10%
6    Art                              39      8%
7    Technology                       34      7%
8    Religion                         35      8%
9    The History of Mankind          132     28%
10   The Branches of Knowledge        45     10%
Pages = count of pages for each class; Share = Pages / total count of pages (464).

Table 2. Main classes of the SoC (2019).
#    Caption                            Categories   Share
1    The Body and the Senses                    92      9%
2    Feelings                                   65      6%
3    Place and Change of Place                  86      8%
4    Measure and Shape                          57      5%
5    Living Things                              12      1%
6    Natural Phenomena                           8      1%
7    Behavior and the Will                     196     18%
8    Language                                   42      4%
9    Human Society and Institutions             77      7%
10   Values and Ideals                          68      6%
11   Arts                                       20      2%
12   Occupations and Crafts                     20      2%
13   Sports and Recreation                      17      2%
14   The Mind and Ideas                        256     24%
15   Science and Technology                     59      5%
Categories = count of categories in each class; Share = Categories / total count of categories (1,075).

Table 3. Main classes of the DDC (2011).
#     Caption in 2011                                  Caption in 1876    Pages   Share
000   Computer science, information & general works   [no caption]          97      4%
100   Philosophy & psychology                          Philosophy            67      3%
200   Religion                                         Theology             158      7%
300   Social sciences                                  Sociology            602     25%
400   Language                                         Philology             58      2%
500   Science                                          Natural Science      307     13%
600   Technology                                       Useful Arts          527     22%
700   Arts & recreation                                Fine Arts            239     10%
800   Literature                                       Literature            91      4%
900   History & geography                              History              281     12%
Pages = count of pages for each class; Share = Pages / total count of pages (2,427).

The 15 main classes of the 8th International edition, into which its 1,075 categories of words and phrases are grouped, are outlined in a “Synopsis of Categories” (Kipfer 2019, xix–xxxix), just as Peter Mark Roget’s 6 main classes were in his 1st edition of 1852. The current structure was introduced by Robert L. Chapman (emeritus professor of English, Drew University, Madison, NJ) as editor of the 5th International edition of 1992. Prior editions had retained Roget’s original structure with remarkably little change, as have all U.K. editions to date. Chapman should be credited with the most far-reaching of all revisions made since 1852. In his work on the Thesaurus, Chapman acknowledges the help of the philosopher Charles Courtney (Drew University) and the cognitive psychologist George Miller (Princeton University), well known for his work on WordNet, the lexical database of English.8

Somewhat remarkably, the ten main classes in the DDC have survived into the twenty-first century in essentially the same form that they had in Melvil Dewey’s original plan of 1876. Precursors of Dewey’s scheme include Nathaniel B. Shurtleff’s Decimal System for the Arrangement and Administration of Libraries (1856) and William Torrey Harris’s scheme for the classification of books in the St. Louis Public School Library (1870), as well as the systems of Bacon (in The Advancement of Learning, 1605) and Hegel (in Enzyklopädie der philosophischen Wissenschaften, 1817) for the classification of the sciences, whose influences on Dewey have been much debated down the years.9 Wiegand (1998, 189), for example, concludes that Dewey chose Harris’s hierarchy for his own scheme “because it fit the Anglo-Saxon world into which he was born, a world further refined by the ... tradition, curriculum, and faculty” of the tiny Amherst College where Dewey had studied and worked.
Given its widespread use around the world, “it is probably also fair to say that for the past century [the DDC] has quietly—almost invisibly—occupied an influential position as one of the forces sustaining the discursive formations of a Eurocentric patriarchy” (Wiegand 1998, 190). Hope Olson and others have criticized the DDC hierarchy for its marginalization and exclusion of “groups and topics outside of canonical knowledge” (Olson 1996, 302). Meanwhile, calls to “ditch Dewey” on account of its perceived user-unfriendliness have multiplied in an age of instant keyword searching and browsing.10

8 See
9 For the history of classification of the sciences, see Flint (1904).

6.0 Evaluation

Quinton (1974, 10) asserts that there are “two kinds of need” which KO systems must serve. “The first is sternly practical. . . . Classification by subject-matter is essential to the reader with access to the [library] shelves to show him what there is on the subject he is interested in.” Quinton continues: “. . . [T]here also exists a theoretical interest in attempting to find some ideal, or at least proper, order for the various fields of knowledge.”

Taking the “theoretical interest” first: there are at least three different aspects of a KO system that we might consider when conducting our test of propriety. These are (a) the extent of the range covered by the entire set of main classes; (b) how and where the boundaries of individual classes are defined; and (c) the ways in which individual classes are related to one another. Cutting across these aspects, there are at least four different kinds of criteria that we might choose to use to test the propriety of a KO system. These are (a) correspondence with some objective reality or ground truth; (b) internal consistency or coherence; (c) utility, i.e., ease, efficiency, and effectiveness of use; and (d) morality, i.e., construction in accordance with some code of ethics.
In at least the case of utility, then, the theoretical question devolves to the practical one, on the pragmatic assumption that the ideal system is the one that works best.

How should a program of evaluation of the practical utility of these three KO systems proceed? It would be impractical to compare one scheme (or even one version of a scheme) with another, regardless of whether or not the comparison were for the same application context (encyclopedia entries, concepts, library resources). Yet a common evaluation program would be good, if only for the sake of efficiency. Information retrieval (IR) evaluation, based on measurements of query–document relevance, could potentially provide a model;11 but (a) limitations on the kinds of tasks and goals that are involved in IR tests, and (b) a lack of pools of relevance judgments in the contexts in question, mean that other methods should be explored. One simple plan might be to observe real users engaged in meaningful tasks and to ask them to rate their success: a test, in other words, of user satisfaction. Given the sense that “going digital” has too often involved throwing the baby out with the bathwater (e.g., the OoK’s demise with EB’s move to fully digital publication in 2012; the SoC’s absence from implementations of online thesauri), an effective design could be to compare “print with KO scheme” vs. “print without” vs. “digital with KO scheme” vs. “digital without.” If such a program were to provide warrant for the reinstatement of “digital with” as a vital part of users’ knowledge-seeking routines, so much the better.

10 See, for example, Chiavaroli (2019).
11 See, for example, Harman (2011).

Our fourth criterion, the ethics of scheme construction and usage, has come more and more to the fore in considerations of KO evaluation. Apparently attempting to deflect in advance the kind of criticism that has in recent years increasingly been leveled at the DDC, Adler (1974, 6) asks how the OoK can avoid “tendentiousness or arbitrariness.” Does it not “reflect, perhaps even conceal, a commitment to one set of organizing principles rather than another? Does it not embody biases or preconceptions that are not universally acceptable?” Adler provides two immediate responses: (a) that the OoK was constructed “in the light of detailed recommendations, directions, and analytical contributions from scholars and experts in all the fields of knowledge represented”; and (b) that it was conceived, not as a hierarchy, but as “a circle of learning.”

Dorothy Auchter (1999, 295) notes that many of the early reviewers of the 15th edition of the EB were “simply bewildered” by its tripartite structure. The decision to retain the alphabetical arrangement of Macropædia entries, rather than to follow the topical arrangement adopted in the Propædia, was widely interpreted as a failure of nerve, one that “seriously undermines” the ability to browse among entries of related interest (Auchter 1999, 295), and that leads to frustration and skepticism. Samuel McCracken (1976, 63) describes the “dismembering” of the EB into “mini- and maxipedias” as “devoid of benefits” with “nothing to recommend it.” “The Propædia, at least, is harmless,” reckons McCracken (1976, 63). Others are less kind. Suzanne Selinger (1976, 440) is concerned about “the problem of bias or subjectivity,” noting (441) that it is no coincidence that “[t]he Propædia, the circle of learning that can theoretically begin anywhere, chooses to begin in its printed appearance with science.” Moreover (442), “it is clear from the weighting of topics that the values of the Propædia . . .
are [preponderantly scientific].” This bias towards science is ironic given (442) that “[o]bjectivity and neutrality were among the great goals of the scientific method,” and (444) that “objectivity, absolutism, and the unity of truth are [Adler’s] ideals and beliefs.” In sum (445), the Propædia is “grounded in and inseparable from the values, approaches, and presuppositions of scientism.” As a result (445), “The two cultures are not reconciled; one has been opted for at the expense of the other.”

Anthony Quinton (1974, 9) is fairly suspicious of Adler’s credentials: “Adler’s long association with the movement in Chicago . . . which has sought to restore to learning unity of a kind exemplified in the work of Aristotle and with a pronounced neo-Thomist inflection” might reasonably raise questions about potential bias in the structuring of the OoK. Nevertheless, Quinton (10) admits, “One would have to be very suspicious to think that this [conception of learning as a circle or ‘old-fashioned pie’] concealed some deep, ideological design and that perfectly sound and straightforward reasons had not been given for it.”

Educator Robert McClintock (1976), meanwhile, is sharply critical of the 15th edition of the EB for its “inadequacy as an educative instrument.” Firstly, a “cult of authority, objectivity, and neutrality” is “embodied” in the EB, making it impossible for articles to be included “in which the author concretized and spoke directly to the curiosity and intelligence” of the layperson; secondly, most articles speak in an authoritative voice about fields of established knowledge, rather than in an educative voice about the questions that readers have; and thirdly (and most importantly, in the present context), the OoK, while “impressively complete in its range and detailed in its elaboration,” is arranged in an authoritative rather than a pedagogical order.
“In working out an authoritative order, one starts with a body of knowledge and asks what order do the authorities see in it . . . In working out a pedagogical order, one starts with a student and asks what order he should follow if he is best to apprehend the subject at hand . . .” Whoever takes the OoK at face value and follows it from its beginning, as a guide to self-study, “will be sent first to a long article on the ‘Nucleus, atomic’ . . . written entirely in the authoritative voice . . . simply not a feasible point at which to begin his study.”

Quinton concludes his 1974 review by calling the OoK “an immensely thorough and detailed piece of work.” For Quinton (11), “It embodies in its general form no striking innovations and does not conceal within it any principles likely to provoke controversy. Free from architectonic Procrusteanism it seems, for all its elaboration, a practical answer to a practical problem. Only the most exquisitely fastidious could think that they are somehow being got at.” Nevertheless, by the early 1990s, it was uncontroversial for Whiteley (1992, 84) to assert that “There is no index to the Propædia and it is difficult for the unsophisticated reader to use. This volume appears to be the least-used part of the encyclopedia.”

7.0 Conclusion

With the publication of the final print edition of EB in 2010, Adler’s OoK seems to have died a largely unlamented death. Whether or not the OoK, or anything like it, will ever be resurrected and pressed into service online is a matter for Encyclopædia Britannica, Inc.’s accountants to consider.
That there remains a market, however shrunken, for authoritative KO systems like the OoK to serve as guides for knowledge-seekers is demonstrated by the publication in 2019 of an 8th print edition of RIT (in the face of a plethora of online thesauri offering instant keyword searching), and by biannual print versions of the DDC (in addition to the continuously updated WebDewey service to which thousands of libraries around the world subscribe). Whatever the ultimate fate of the OoK itself, perhaps the simple concept of the “circle of learning” that distinguished the OoK from its predecessors may profitably be salvaged, and used in response to critiques of line- and tree-based systems that necessarily have tops and bottoms, firsts and lasts, beginnings and ends.

One additional function of KO schemes that has so far gone unremarked, and that goes beyond even the pedagogical function mentioned above, is one that we might call the generative function. What does one get out of reading the OoK, the SoC, or the DDC? A sense of one particular view (perhaps fundamentally mistaken) of the shape, size, and structure of the totality of knowledge, for sure; but also, potentially, ideas about new fields, new concepts, new subjects that are not currently part of that totality, and ideas about new ways of organizing that totality. The generative function is what continues to make the future of KO system design so exciting.

References

Adler, Mortimer J. 1974. “The Circle of Learning.” In The New Encyclopædia Britannica in 30 Volumes 30. Chicago: Encyclopædia Britannica, 5–7.
Auchter, Dorothy. 1999. “The Evolution of the Encyclopædia Britannica: From the Macropædia to Britannica Online.” Reference Services Review 27, no. 3: 291–299.
Beghtol, Clare. 2010. “Classification Theory.” In Encyclopedia of Library and Information Sciences, 3rd ed., edited by Marcia J. Bates and Mary Niles Maack. Boca Raton, FL: CRC Press, 1045–1060.
Chiavaroli, Melissa. 2019. “Ditching Dewey: Take Your Collections from Enraging to Engaging and Position Your Library for 21st Century Success.” Public Library Quarterly 38, no. 2: 124–146.
Flint, Robert. 1904. Philosophy as Scientia Scientiarum; and, A History of Classifications of the Sciences. New York: Charles Scribner’s.
Harman, Donna. 2011. Information Retrieval Evaluation. San Rafael, CA: Morgan & Claypool.
Hoiberg, Dale H., ed. 2010. The New Encyclopædia Britannica in 32 Volumes, 15th ed. Chicago: Encyclopædia Britannica.
Hüllen, Werner. 2004. A History of Roget’s Thesaurus: Origins, Development, and Design. Oxford: Oxford University Press.
Hüllen, Werner. 2009. “Dictionaries of Synonyms and Thesauri.” In The Oxford History of English Lexicography: Vol. II, Specialized Dictionaries, edited by A.P. Cowie. Oxford: Clarendon Press, 25–46.
Kipfer, Barbara Ann, ed. 2019. Roget’s International Thesaurus, 8th ed. New York: Collins Reference.
Loveland, Jeff. 2019. The European Encyclopedia: From 1650 to the Twenty-first Century. Cambridge: Cambridge University Press.
McClintock, Robert. 1976. “Enkyklios Paideia: The Fifteenth Edition of the Encyclopædia Britannica: A Review.” Proceedings of the National Academy of Education 3: 179–216.
McCracken, Samuel. 1976. “The Scandal of ‘Britannica 3.’” Commentary 61, no. 2: 63–68.
Miksa, Francis L. 1998. The DDC, the Universe of Knowledge, and the Post-Modern Library. Albany, NY: Forest Press.
Mitchell, Joan S., ed. 2011. Dewey Decimal Classification and Relative Index, 23rd ed. Dublin, OH: OCLC.
Olson, Hope A. 1996. “Dewey Thinks Therefore He Is: The Epistemic Stance of Dewey and the DDC.” In Knowledge Organization and Change: Proceedings of the Fourth International ISKO Conference, 15–18 July 1996, Washington, United States, edited by Rebecca Green. Advances in Knowledge Organization 5. Frankfurt/Main: Indeks, 302–312.
Quinton, Anthony. 1974. “The Organisation of Knowledge.” Times Literary Supplement (May 17): 9–11. Reprinted in: Quinton, Anthony. 1982. Thoughts and Thinkers. London: Duckworth, 57–64.
Richardson, Ernest Cushing. 1930. Classification: Theoretical and Practical, 3rd ed. New York: H.W. Wilson.
Selinger, Suzanne. 1976. “Encyclopedic Guides to the Study of Ideas: A Review Article.” Library Quarterly 46, no. 4: 440–447.
Whiteley, Sandy. 1992. “The Circle of Learning: Encyclopædia Britannica.” In Distinguished Classics of Reference Publishing, edited by James Rettig. Phoenix, AZ: Oryx Press, 77–88.
Wiegand, Wayne. 1998. “The ‘Amherst Method’: The Origins of the Dewey Decimal Classification Scheme.” Libraries & Culture 33, no. 2: 175–194.

Francisco-Javier García-Marco – Universidad de Zaragoza, Spain
Fernando Galindo – Universidad de Zaragoza, Spain
Pilar Lasala – Universidad de Zaragoza, Spain
Joaquín López del Ramo – Universidad Rey Juan Carlos, Spain

Advancing the Interoperability of the GLAM+ and Cultural Tourism Sectors through KOS: Perspectives and Challenges

Abstract: The possibilities and challenges of using knowledge organization systems (KOS) to support the interconnection between the cultural heritage sector (galleries, libraries, archives, museums, publishers…, hence GLAM+) and the increasingly important cultural tourism industry are explored, and a model for framing their interaction is proposed. Due to the diversity of KOS involved in GLAM+, the task has to be treated as an interoperability problem, though a strong user-oriented purpose is also needed, based on a careful assessment of how tourists are segmented and what they need. The model has five main components: the real phenomena that make up the potential world of interest, the universe of potential sources, a web taxonomy, a domain thesaurus and an interoperability hub. The St James’ Way is used as a source of examples.
It is concluded that thesauri based on ISO 25964 offer great potential for the simple, flexible, dynamic and distributed interconnection between the institutions of memory (GLAM+, research institutions in digital humanities and social sciences, transparency portals…) and the growing demand from the tourist sector for a more personalized and contextualized experience that can make a difference in an increasingly competitive international market.

1.0 Introduction: context and motivation

It is now commonplace that web information is changing tourism and travel, but a new phase has been developing in recent years. There is increasing evidence of “a growing ‘bifurcation’ between traditional online travellers, i.e., those who use the Internet for standard travel products, and those who are beginning to adopt alternative channels and products in search of deeper and more authentic experiences,” with the first market entering a relative stagnation and the second offering new opportunities for combining different products (Xiang et al. 2015). Some experts have even identified a ‘cultural turn’ in tourism (Dabbage 2018a, 55–56; 2018b).

In this context, relating intangible, material, artistic, bibliographic and archival heritage to well-designed tourism offerings and infrastructure, to be used by people of different nationalities, languages and cultures, is becoming a key strategic challenge for both sectors. On the one hand, the tourism sector can profit from more contextualized, personalized and interesting information resources. On the other, the humanities and GLAM+ (an acronym for galleries, libraries, archives and museums and, in general, heritage-preserving institutions) can improve their visibility and relevance through an indisputable, applied and practical contribution to social and economic development.
Rich references to cultural artefacts, before, during and after the visit (in webpages, VR applications, QR codes, augmented reality…), can help tourists to opt for a particular tour; decide on future activities beyond the standard ones, widening their choices and benefiting the local tourism market; improve their cultural, educational and life experience; promote word-of-mouth recommendation; and enhance their acquisition of cultural and historical lessons and knowledge. Such a vision seems a win-win one for tourists, destinations, and tourism agents and organizations.

Within this framework, this paper explores how experts from the field of knowledge organization and information architecture can collaborate on devising and proposing lines of action to improve the feedback between both sectors (tourism and cultural heritage institutions). In a first stage, they can facilitate the use of the huge amount of data and digital artefacts made available by information and communication professionals, humanists and social scientists, which allow the contextualization, enrichment and personalization of the tourist experience in its relation to relevant cultural objects. Second, and reciprocally, they can look for strategies to enhance the transfer of resources from the tourism industry to basic research in the humanities and social sciences, promoting its economic sustainability by connecting it with its potential market uses. This last point seems especially important at a time when the humanities have been neglected by funding agencies because of the long-standing economic constraints following the 2008 crisis and the increasingly acrimonious culture wars between global and identitarian political stakeholders.
2.0 Aims and research questions

The overall intention of this paper is to explore how KO research and development can contribute to the interconnection of the institutions of memory (libraries, archives, museums, documentation centres, research institutions in digital humanities and social sciences, transparency portals…) and the needs of cultural tourism, an increasingly important industry (Fang 2020); and to develop a model that can help frame and guide future research in the field. Specifically, six research questions were addressed:

(1) What roles can the GLAM sector perform within the digital information ecology of cultural tourism?
(2) What are the implications of those roles for KO?
(3) How can the relation between the GLAM and cultural tourism sectors be modelled to reveal the central role of KOS in such interactions?
(4) What are the relevant characteristics of the KOS used in the main GLAM subsectors?
(5) Are they interoperable with cultural tourism websites?
(6) On what terms?

To answer these questions, the following sections identify the main agents and factors involved in the interoperability between the GLAM and cultural tourism sectors and consider them from the point of view of knowledge organization, that is, of the role that KOS might play in their successful interaction; a model is then proposed. Finally, the problems and opportunities for KO in this field are identified.

4.0 Analysis and discussion

In the next sections, we consider the relation between cultural tourism and the GLAM+ sector in the digital age as a potential information ecology. In such an ecology, three main kinds of agents and artefacts can be identified: tourism information mediators and their websites; cultural heritage institutions (GLAM+ sector) with their information sources and databases; and end users (tourists and travellers) with their needs, which are only partially known and partly remain to be discovered.
In this emergent ecology, KO can help close the gaps among its agents in three ways: modelling the connection between GLAM+ resources and user needs; identifying knowledge representation and organization technologies and methodologies that can be useful to implement the model; and offering an operative model to start experimentation. Two contextual problems outside the model must also be stressed because of their current importance: communication issues and legal concerns.

4.1 The information ecology of cultural tourism

The concept of ‘information ecology’ allows the modelling of information systems that have not been purposely designed, in contrast to compact systems that are well differentiated from their environment, such as libraries, archives and information centres. An information ecology (Hubermann 2001; Shim and Lee 2006; Sebastiá 2008) can be defined as a network of information organisms interacting with one another and with their environment to form a complex system. This concept is very useful for thinking about situations where different information agents coexist, cooperate and compete to fulfil an information need. This is the typical situation on the Internet and, in our case, what users interested in cultural tourism will experience when trying to satisfy their information needs. For example, Internet information on the St James’ Way is provided by a distributed network of independent agents: the non-governmental sector (associations, religious institutions, informal groups and individuals), private for-profit companies (travel services firms, publishing houses, consumer cooperatives…), and public institutions (councils, national and regional governments…) (López del Ramo and García Marco 2018). Each of them has its own aims and provides specific information services to pilgrims and tourists, who, for their part, are also highly segmented in their specific interests.
4.2 The tourist information mediators (providers and associations)

Though GLAM+ institutions sometimes proactively address the information needs of tourists, relevant information is generally channelled through the tourism industry, and more precisely through those institutions and departments that specialize in connecting tourists and destinations. This is mainly a marketing activity, and its most specialized agents are the so-called destination marketing organizations (DMOs): institutional, council, regional and national tourist information offices, and the marketing departments that support them. DMOs can be the main mediators between GLAM+ institutions and tourists, though online travel agencies (OTAs) and social media have been gaining prominence (Xiang et al. 2015). Of course, it should not be forgotten that some big GLAM+ institutions, e.g., big museums and galleries, have large DMOs of their own. Associations of experts and enthusiasts devoted to the protection and dissemination of cultural heritage are another important group of mediators in the field of cultural tourism. The Internet has revolutionized marketing, and now even the smallest organizations can have a global Internet presence.

From the point of view of tourist mediators, knowledge organization may have two different uses, one internal and the other external. The internal objective is related to knowledge management: the representation of knowledge as information; information preservation, retrieval and sharing; and the transformation of information into knowledge. For this, a corporate KOS is needed. The external goal is to communicate the part of this information that is relevant to tourists, so that they become aware of it, transform it into knowledge and, hopefully, choose the proposed destinations for their travels. This is usually done through websites and social platforms, but also, increasingly, through ever more sophisticated mobile applications.
In this regard, tourist information systems must be seen as integrated, with a core of organized information and a set of well-established distribution procedures. The main KO tools for these purposes are web taxonomies in the case of cultural tourism websites, and folksonomies and ad hoc taxonomies in the case of social networks and blogs. Knowledge organization experts know that, for proper functioning, both aims must be connected within a single successful knowledge organization system, although each of them should be addressed effectively and kept clearly differentiated. In practice, the two functions are separated to varying degrees in many organizations. Sometimes one is absorbed or displaced by the other; at other times the website and the corporate information databases are poorly connected. Frequently, the corporate taxonomy becomes the structure of a tourist website. As a result, the website reflects the ontology of the organization more than that of tourists (López del Ramo and García Marco 2018), or is mainly product-oriented. There are, however, also many well-designed sites from a user-oriented perspective (Table I).
Table I. Main categories in the taxonomies of four St James’ Way websites
Galicia government tourist-oriented website | Castilla-León government website | Spanish federation of associations website | Travel agency
Discover St James’ Way Get ready Advices On feet On bicycle On horse Advice Advices Trekking On bicycle Groups Plan your trip Equipment ‘Credencial’ ‘Turismo España’ GPS Ways Ways (each one) On the way ‘Correos’ (Sp. mail co.) Health information Recycling Services ‘Our’ services Santiago and Galicia Scriptorium History Knowl. and research Links Xunta de Galicia Contact Federation News Firm, our team, Contact News
4.3 The side of the information sources: GLAM, research centres and publishing
The information sources side is quite complex.
Any approach to integrating GLAM resources into tourist information sites requires enhancing interoperability among very diverse systems. Libraries are by far the most standardized GLAM subsector, though the gap is closing very quickly because of the pressure for global and integrated access that the Internet both offers and demands. Their resource description and interchange standards are fully international and integrated (the MARC 21 family). To a lesser degree, so are their KOS, both systematic classifications (LCC, DDC, UDC…) and alphabetical subject heading lists (LCSH, RAMEAU, EMBNE…). There has even been substantial work on integrating systematic and alphabetical KOS in libraries, some of it finished and some still in progress (LCC-LCSH, UDC and national subject heading systems in Europe…). Relevant mappings among competing systems are also in progress (e.g., Slavic 2011). These are strong points for the interoperability of the GLAM and cultural tourism sectors.
There are also weaknesses, however, from the tourist’s or traveller’s perspective. Classifications frequently do not provide the level of specificity required to map sources to tourists’ needs, and topics are scattered among many classes, which makes mapping projects really complicated. For example, there is no class in the UDC for the St James’ Way. Works about the St James’ Way are usually classified in the Regional Geography class (913). They are less often classified in Routes, etc. (656.022) or in the many (and scattered) classes available for Travel (e.g., 338.48-12, 656.022.33), because transport and tourism are in different trees. To add specificity, the selected class is frequently faceted in Spain by the two main countries, (44+460), that is, France and Spain (to denote the “Camino Francés”), though the actual countries may change (e.g., Portugal).
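As a minimal illustration of the faceting just described, the following Python sketch composes a UDC number from a main class and coordinated place auxiliaries. The function name udc_with_places is hypothetical. The class 913 and the place codes (44) France and (460) Spain come from the text. (469) is the standard UDC place code for Portugal.

```python
from typing import List


def udc_with_places(main_class: str, places: List[str]) -> str:
    """Compose a UDC number from a main class and place auxiliaries.

    Place codes are coordinated with '+' and wrapped in the parentheses
    that mark UDC common auxiliaries of place.
    """
    if not places:
        return main_class
    return f"{main_class}({'+'.join(places)})"


# 913 Regional geography faceted by France (44) and Spain (460),
# the usual Spanish practice for the "Camino Francés".
print(udc_with_places("913", ["44", "460"]))   # 913(44+460)
# A route through Portugal (469) and Spain instead.
print(udc_with_places("913", ["469", "460"]))  # 913(469+460)
```

The composed number identifies a region, not the route. This is the lack of specificity that makes classifications hard to map to travellers' needs.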
The National Library of Spain has noted the lack of specificity of this expression, and its librarians have usually added “Camino de Santiago” inside the brackets, after the country codes. Subject headings are usually much more specific. Although the exact subject heading is absent from many international systems, it can easily be recognized in the heading strings:
EMBNE: Peregrinaciones cristianas -- Santiago de Compostela
LCSH: Christian pilgrims and pilgrimages -- Spain -- Santiago de Compostela
RAMEAU: Pèlerinages chrétiens -- Espagne -- Saint-Jacques-de-Compostelle (Espagne)
Therefore, there is strong potential for easy interconnection between libraries and cultural tourism websites. A serious problem, however, stands in the way. Practical travel decisions require data, not systematized knowledge. There is also no easy method of transforming books and other complex library materials into high-quality data without the cooperation of the research, publishing and media sectors, as discussed below.
Big museums, which rely on highly analytical databases to control their collections and operations, are at the forefront of providing semantic data that external sites can link to. These data provide factual, contextual and bibliographical information on works of art and craftsmanship, as well as reproductions, that can be inserted into websites and AR/VR applications. The CIDOC CRM ontology, which originated in the museum field, provides a complete framework for the interoperability of other controlled vocabularies and has become influential across the GLAM sector. Museums also have sound KOS: AAT, Iconclass, etc.
Tourists and their mediators rarely use archives, their fonds and documents directly. There are interesting projects linking historical documents to heritage sites, monuments and works of art, but this kind of information is usually secondary for tourists: they will use it only out of curiosity and not for practical decisions.
Archives are also very idiosyncratic in their intellectual organization. Their KO leitmotif is the principle of provenance and respect for original order. This is complemented by macro-level organization based on function, which allows users to navigate the many changes that organizations undergo during their lifetime. In any case, authority records have grown increasingly important for multilevel description in archives, and there is a strong movement to make library, museum and archival headings compatible.
Outside GLAM, but closely related to it, there is a huge industry that is also fully involved in knowledge preservation: the publishing sector. At first sight its role might seem to be covered by the library network. Publishers, however, face a huge task that is crucial for successful interoperability among the cultural and scientific sectors: adapting publishing practices to the promises and requirements of the semantic web and big data revolutions. Cultural tourists have two broad categories of information needs:
a) gaining broad perspectives for travel planning and developing background knowledge, for which books and journals are entirely suitable; and
b) acquiring and accessing data across different sources, which is where the digitally enhanced user runs into problems.
Most books are oriented to the first need. Because they are not usually automatically searchable and navigable through library systems, a big gap opens between the two worlds. The semantic web offers the technology to bridge this gap, but editions and library catalogues must be transformed to fulfil this promise. What has been said about the ‘traditional’ publishing sector also applies to the current multimedia environment, including the audiovisual and videogame industries. Other emerging fields, like augmented and virtual reality, use semantic technologies by default.
Strictly speaking, the GLAM acronym does not include the producers of information. Since a great part of the task of producing documents that support semantic technologies and data falls to them, we suggest considering an extended superset that includes publishers and other media producers, namely GLAM+.
4.4 The tourist side: end users first?
It has been observed that “huge discrepancies exist between the domain ontology derived from tourism Web sites and the one emerging from user queries” (Xiang, Gretzel, and Fesenmaier 2009). This is mainly because the tourism industry uses a very specific terminology and categorization that does not connect easily with the terms users actually search for. In addition, apart from a group of overrepresented categories, “the overall domain is extremely rich and largely idiosyncratic […] with numerous destination specifics” (Xiang, Gretzel, and Fesenmaier 2009). Consumer-generated content, e.g., blogs and reviews (Gretzel, Hwang, and Fesenmaier 2006), can be used to learn the language that travellers use and thus fill the terminological gap. Some problems, however, do not seem easy to solve. These are especially the problems derived from the intrinsic limitations of attention, processing memory and interface space, and from the difficulty of representing a really complex domain. The problems increase further because cultural tourism is a huge sector and tourists themselves are highly segmented. In the field of cultural tourism alone, McKercher and du Cros (2002) identified five types of tourists: purposeful, sightseeing, serendipitous, casual and incidental. Csapó (2012) classified seven different kinds of cultural tourism products for them: heritage tourism; cultural thematic routes; cultural city tourism, cultural tours; traditions, ethnic tourism; event and festival tourism; religious tourism, pilgrimage routes; and creative culture, creative tourism.
On the other hand, this gap between the overwhelming world of information and the tourists is a very interesting challenge that has attracted a lot of talent. In particular, modelling users through ‘ontological’ profiles based on types and/or personality traits has been under way for more than a decade. It now constitutes an established research topic in tourism recommendation systems (Grün, Neidhardt, and Werthner 2017). However, the GLAM+ connection requires specific typologies of both tourists and sources, based on their information needs. Broad needs include planning, and connecting and developing their mind maps. Specific needs include solving problems with relevant data, and navigating among sources in search of answers to open questions. In this situation, mediators and users can work either through general-purpose search systems like Google or with the source providers’ KOS. In the first case, users will obtain relevant selections. These selections are not necessarily very precise once users move beyond the first results, and they are neither exhaustive nor filtered by source. The second approach could serve these aims better, but it requires difficult knowledge organization engineering. The first lesson of any experience in KOS/GLAM+ interoperability for cultural tourism is that it is no easy task. This is especially true when moving beyond basic operations and data into the realm of learning, sense-making and culture.
4.5 Bridging the gap (1): towards an ontological and epistemological model
Can a model, then, be developed to integrate the different information needs, structures and, ultimately, perspectives of GLAM+, DMOs and tourists, so that a truly cultural tourism information ecosystem may be born?
In our opinion, any proposal has to take four different layers into account. The first is the cultural world (artefacts, persons and organizations, sites, abstract realities), drawing on the KOS already functioning in the GLAM sector. The second is cultural tourism ‘science’ and operation, as expressed in some KOS, like the important WTO thesaurus (World Tourism Organization 2001), in tourism website taxonomies and in some prospective ontologies that are being developed (Li, Buhalis, and Zhang 2013). The last one represents the user. For this last purpose, Maslow’s (1954) hierarchy of human needs seems especially well suited to framing the classification and interrelation of such different levels as basic needs (accommodation, eating and drinking, security…), social needs (identity, relationships…) and self-actualization (sense-making, high culture, wisdom-building…).
4.6 Bridging the gap (2): available standards and technologies
To model such a KO ecology, fully developed technologies and standards are available, both in the field of KOS and in the semantic web realm. Regarding networking knowledge organization systems (KOS), thesauri offer great potential for simple, flexible, dynamic and distributed interconnection between the institutions of memory. This is especially so after their relaunch with the new ISO 25964 standard (Aitchison and Dextre Clarke 2004; Dextre Clarke 2012). In particular, Part 2 of ISO 25964 deals with the interoperability of the different kinds of KOS involved in the information ecology of cultural tourism. It proposes specific mapping models and devices (=EQ, ~EQ, +, |, BM, NM, RM). In our project, we are experimenting with a hub architecture, but many system-to-system projects are under way, so the environment is changing quickly.
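To make the ISO 25964-2 mapping devices concrete, the Python sketch below represents a hub architecture of the kind the project is experimenting with. Each vocabulary maps its concepts to a shared hub concept, and translation between two vocabularies goes through the hub. It is a minimal sketch under several assumptions:
- The hub identifier st-james-way is hypothetical.
- Assigning ~EQ to each heading is a judgment made for illustration.
- Each mapping type is paired with the SKOS mapping property it is commonly aligned with, a pairing the text itself does not specify.
- The compound devices + and | are not modelled.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Tuple


class MappingType(Enum):
    """ISO 25964-2 mapping types, each paired with the SKOS property it is
    commonly aligned with (the pairing is an assumption of this sketch)."""
    EXACT_EQ = ("=EQ", "skos:exactMatch")
    INEXACT_EQ = ("~EQ", "skos:closeMatch")
    BROADER = ("BM", "skos:broadMatch")
    NARROWER = ("NM", "skos:narrowMatch")
    RELATED = ("RM", "skos:relatedMatch")

    @property
    def code(self) -> str:
        return self.value[0]

    @property
    def skos(self) -> str:
        return self.value[1]


@dataclass(frozen=True)
class Mapping:
    """One spoke: a concept in a source KOS mapped to a hub concept."""
    kos: str
    label: str
    hub_concept: str
    kind: MappingType


# Hypothetical hub concept "st-james-way". The headings are those quoted in
# Section 4.3. ~EQ is chosen because they describe pilgrimage to Santiago
# rather than the Way as such (an illustrative judgment).
SPOKES: List[Mapping] = [
    Mapping("EMBNE", "Peregrinaciones cristianas -- Santiago de Compostela",
            "st-james-way", MappingType.INEXACT_EQ),
    Mapping("LCSH", "Christian pilgrims and pilgrimages -- Spain -- Santiago de Compostela",
            "st-james-way", MappingType.INEXACT_EQ),
    Mapping("RAMEAU", "Pèlerinages chrétiens -- Espagne -- Saint-Jacques-de-Compostelle (Espagne)",
            "st-james-way", MappingType.INEXACT_EQ),
]


def via_hub(label: str, source: str, target: str,
            spokes: List[Mapping] = SPOKES) -> List[Tuple[str, str, str]]:
    """Translate a heading between two KOS through shared hub concepts.

    Returns (target label, source-leg code, target-leg code). Both codes
    are kept because chaining two mappings can only weaken equivalence.
    """
    results = []
    for inbound in spokes:
        if inbound.kos != source or inbound.label != label:
            continue
        for outbound in spokes:
            if outbound.kos == target and outbound.hub_concept == inbound.hub_concept:
                results.append((outbound.label, inbound.kind.code, outbound.kind.code))
    return results


print(via_hub("Christian pilgrims and pilgrimages -- Spain -- Santiago de Compostela",
              "LCSH", "RAMEAU"))
```

The usual argument for a hub follows from this structure. With n vocabularies, a hub needs n sets of spokes rather than up to n(n-1)/2 pairwise mappings.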
Dextre Clarke (2011) has provided a very clear diagnosis of the results to be expected from mapping between thesauri and the main kinds of KOS. Soergel (2010) has proposed a clear conceptual framework that also considers facet-based search. In the particular case of cultural tourism, more positive results can be expected from interoperability with authority and subject lists, because they accurately represent cultural artefacts (through titles), persons and places. Problems will persist, however, for higher-level concepts.
On the semantic web side, the W3C has by now largely completed the standards that deal with its lower and middle layers, and thesauri have been successfully expressed in RDF, OWL and SKOS. Tourism recommender systems based on the semantic web and ontologies have also become one of the main research fronts in information science and tourism: “Recommender systems based on semantic web and ontology technologies are an effective method and tool to improve the quality of internet service through personalization and customization” (Li, Buhalis, and Zhang 2013).
4.7 Bridging the gap (3): a methodology
KO interoperability is difficult even with well-established KOS. It becomes a mess when trying to connect ad hoc web taxonomies and folksonomies on one side (DMOs and other traveller-oriented websites) with GLAM+ KOS on the other. In a current project on the Aragonese section of the St James’ Way, we have divided our work into two lines of action. In the first line, the different websites are being classified into homogeneous groups and their taxonomies studied. The groups are council, regional and national DMOs, mainly from the public sector; pilgrim associations; and businesses providing information (publishing houses, consumer cooperatives, hotel chains, transport firms…). In this way, it is possible to obtain a sort of ‘least common multiple’ of all these empirical taxonomies, including all their concepts without repeating them.
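The ‘least common multiple’ step can be pictured as a union of category labels in which variant labels collapse onto one preferred term. The Python sketch below is a minimal illustration under stated assumptions. The site names and their category lists are hypothetical: the labels are borrowed from the vocabulary of Table I, but not from its column layout. The equivalence table EQUIVALENTS and the function least_common_multiple are also illustrative, not the project's actual data or code.

```python
from typing import Dict, List

# Hypothetical equivalence table: lower-cased variant -> preferred term.
EQUIVALENTS: Dict[str, str] = {
    "advice": "Advice",
    "advices": "Advice",
    "ways": "Ways",
    "ways (each one)": "Ways",
}


def preferred(label: str) -> str:
    """Return the preferred term for a label, or the cleaned label itself."""
    cleaned = label.strip()
    return EQUIVALENTS.get(cleaned.lower(), cleaned)


def least_common_multiple(taxonomies: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Union of all categories without repetition.

    Each preferred term is kept once and records which sites use it,
    in the order in which sites and categories are encountered.
    """
    merged: Dict[str, List[str]] = {}
    for site, categories in taxonomies.items():
        for label in categories:
            users = merged.setdefault(preferred(label), [])
            if site not in users:
                users.append(site)
    return merged


# Hypothetical top-level categories for three websites.
taxonomies_by_site = {
    "site_a": ["Get ready", "Advices", "Ways", "Contact"],
    "site_b": ["Advice", "On bicycle", "Ways (each one)", "News"],
    "site_c": ["Plan your trip", "Advice", "News", "Contact"],
}
merged = least_common_multiple(taxonomies_by_site)
for term, used_by in merged.items():
    print(f"{term}: {', '.join(used_by)}")
```

Here the twelve input labels reduce to seven concepts. The per-term site lists are a convenient by-product for judging which concepts are shared across groups of websites.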
Next, equivalent terms are controlled so that they can be used to resolve searches; such terms are extremely important in the “language of tourism” (Xiang, Gretzel, and Fesenmaier 2009). Polyhierarchical relations that are incompatible with a taxonomy are resolved into BT/NT and RT. The relations among the taxonomy categories (now thesaurus concepts) are expressed through RT. The result is a special kind of thesaurus, which could be called a Common Compact Taxonomical Thesaurus (CCTT). Each of its concepts is to be a node (page) in a trial CMS-supported website. With future interoperability in mind, a SPARQL endpoint should be developed, with an independent system behind it to manage prospective and external KOS links.
In the second line of action, the most widely used KOS and knowledge representation systems in the different GLAM+ sectors are being studied to explore their potential interoperability. Prospective analysis has shown that mapping each KOS to all the others would be a difficult, uncertain and huge task. It has also shown that taking one of them as a hub is unfeasible. The idea, therefore, is to connect to the nodes of the website those subsets of the KOS that offer the greatest potential because they are related to the concepts of the CCTT. The best solution for actual interoperability would be an independent KOS hub. For the moment, however, we intend to build a trial DMO website that will incorporate the pilot CCTT as an extended taxonomy. External resources will then be linked to each node of the CMS-based website, based on a case-by-case analysis of their correspondences with the controlled tags in the sources. In addition, relevant Wikidata nodes and their relations could be mined to produce a set of mappings.
All these steps rest on one presupposition: that the KOS of DMOs really express user needs. This assumption should be researched further, because actual evidence is limited. This gap can only be filled in the future by doing user studies, both subjective (satisfaction surveys, etc.)
and objective (eye tracking, log analysis, search analysis…). Certainly, the strong need to connect the two subdisciplines of information science concerned (KO and user studies) becomes even greater in multidisciplinary and multiplatform fields like cultural tourism and GLAM+ integration. Only in this way can the whole cycle of evidence-based KOS design and research be effectively closed.
4.8 Beyond KOS and linked open data: legal concerns and communication issues
Although this paper focuses on the analysis of networking knowledge organization systems, other aspects are essential for their operation in the real world. These are, in particular, communication issues (communication security, effectiveness of the informational message) and legal concerns (basically privacy, data protection and intellectual property). They matter especially because of the problems created by the massive data processing that is becoming inherent to the Internet. Such processing requires algorithms that respect human rights legislation and standards when using open data.
5.0 Conclusion
This paper has shown that KO, cultural tourism, GLAM+ and the humanities can develop a fruitful alliance as disciplines. In the digital realm, cultural tourism needs the humanities to support sound and personalized products, incorporating GLAM+ resources and data to enhance tourists’ experiences. To integrate all this information, it must close a networking KO gap. Cultural tourism is, in turn, a fascinating subject for KO. First, it is a truly transdisciplinary field, although far from compact. Second, it highlights the problem of end-user segmentation, with its different information needs and therefore diverse conceptual maps. In practice, a KO-enhanced operative model for networking GLAM and cultural tourism websites has been outlined, generalizing from an ongoing project on the St James’ Way.
In conclusion, networking KOS offer great potential for simple, flexible, dynamic and distributed interconnection between two parties. This is especially true of thesauri following their relaunch with the new ISO 25964 standard (Aitchison and Dextre Clarke 2004; Dextre Clarke 2012). The first party is the institutions of memory (libraries, archives, museums, documentation centres, research institutions in digital humanities and social sciences, transparency portals…). The second is the tourism sector, with its growing demand for an ever more personalized and contextualized experience that can make a difference in an increasingly competitive market (Abrahams and Dai 2005).
Acknowledgments
This research was carried out within the CSO2015-65448-R (MINECO/FEDER) project. We are also grateful to the anonymous referees for their suggestions.
References
Abrahams, Brooke, and Wei Dai. 2005. “Architecture for Automated Annotation and Ontology Based Querying of Semantic Web Resources.” In 2005 IEEE/WIC/ACM International Conference on Web Intelligence, Proceedings, 413-417.
Aitchison, Jean, and Stella Dextre Clarke. 2004. “The Thesaurus: A Historical Viewpoint, with a Look to the Future.” Cataloging & Classification Quarterly 37: 5-21.
Csapó, Janos. 2012. “The Role and Importance of Cultural Tourism in Modern Tourism Industry.” In Strategies for Tourism Industry - Micro and Macro Perspectives, edited by Murat Kasimoglu and Handan Aydin. London: IntechOpen, 201-232.
Debbage, Keith. 2018a. “Economic Geographies of Tourism: A Critical and Contested Discourse.” In The SAGE Handbook of Tourism Management. London: SAGE, 53-68.
Debbage, Keith. 2018b. “Economic Geographies of Tourism: One ‘Turn’ Leads to Another.” Tourism Geographies 22: 347-353.
Dextre Clarke, Stella G. 2011. “In Pursuit of Interoperability: Can We Standardize Mapping Types?” In Concepts in Context: Cologne Conference on Interoperability and Semantics in Knowledge Organization, held 19-20 July 2010, Cologne, Germany, edited by F. Boteram, W. Goedert, and J. Hubrich. Würzburg: Ergon Verlag.
Dextre Clarke, Stella G. 2012. “ISO 25964: A Standard in Support of KOS Interoperability.” In Facets of Knowledge Organization, 4-5 July 2011, London, edited by Alan Gilchrist and Judi Vernau. London: Emerald, 129-134.
Fang, Wei Tang. 2020. Tourism in Emerging Economies. Singapore: Springer.
Gretzel, Ulrike, Yeong-Hyeon Hwang, and Daniel R. Fesenmaier. 2006. “A Behavioural Framework for Destination Recommendation Systems Design.” In Destination Recommendation Systems: Behavioural Foundations and Applications, edited by D. R. Fesenmaier, K. Wöber, and H. Werthner. Wallingford, UK: CABI, 53-64.
Grün, Christoph, Julia Neidhardt, and Hannes Werthner. 2017. “Ontology-Based Matchmaking to Provide Personalized Recommendations for Tourists.” In Information and Communication Technologies in Tourism 2017, edited by R. Schegg and B. Stangl. Cham: Springer, 3-16.
Huberman, B. A. 2001. The Laws of the Web: Patterns in the Ecology of Information. Cambridge, Mass.: MIT Press.
ISO. 2011. ISO 25964-1:2011. Information and Documentation. Thesauri and Interoperability with Other Vocabularies. Part 1: Thesauri for Information Retrieval. Geneva: ISO.
ISO. 2013. ISO 25964-2:2013. Information and Documentation. Thesauri and Interoperability with Other Vocabularies. Part 2: Interoperability with Other Vocabularies. Geneva: ISO.
Li, Nao, Dimitrios Buhalis, and Lingyun Zhang. 2013. “Interdisciplinary Research on Information Science and Tourism.” In Information and Communication Technologies in Tourism 2013, edited by L. Cantoni and Z. Xiang. Berlin, Heidelberg: Springer, 302-313.
López del Ramo, Joaquín, and Francisco Javier García Marco. 2018. “El Camino de Santiago en los sitios web de las Comunidades Autónomas: análisis del contenido, orientación y encuadres temáticos predominantes.” Revista General de Información y Documentación 28: 703-26.
Maslow, Abraham H. 1954. Motivation and Personality. New York: Harper.
McKercher, B., and Hilary C. du Cros. 2002. Cultural Tourism: The Partnership Between Tourism and Cultural Heritage Management. New York: Haworth Hospitality Press.
Sebastiá, Montserrat. 2008. “La Ecología de la Información: Un Nuevo Paradigma de la Infoesfera.” Pliegos de Yuste 7-8: 24-36.
Shim, Seonyoung, and Byungtae Lee. 2006. “Evolution of Portals and Stability of Information Ecology on the Web.” In Proceedings of the 8th International Conference on Electronic Commerce. New York: ACM, 584-588.
Slavic, Aida. 2011. “Classification Revisited: A Web of Knowledge.” In Innovations in Information Retrieval: Perspectives for Theory and Practice, edited by Allen Foster and Pauline Rafferty. London: Facet, 23-48.
Soergel, Dagobert. 2010. “Conceptual Foundations for Semantic Mapping and Semantic Search.” In Cologne Conference on Interoperability and Semantics in Knowledge Organization.
World Tourism Organization. 2001. Thesaurus on Tourism & Leisure Activities (Trilingual: English, French, Spanish). Madrid: World Tourism Organization.
Xiang, Zheng, Dan Wang, Joseph T. O’Leary, and Daniel R. Fesenmaier. 2015. “Adapting to the Internet: Trends in Travelers’ Use of the Web for Trip Planning.” Journal of Travel Research 54: 511-27.
Xiang, Zheng, Ulrike Gretzel, and Daniel R. Fesenmaier. 2009. “Semantic Representation of Tourism on the Internet.” Journal of Travel Research 47: 440-53.
Ann M. Graf – Simmons University, United States
Domain Analysis of Graffiti Art Documentation: A Methodological Approach
Abstract: This paper presents details of a recent research project that set out to ascertain the documentary and descriptive practices associated with graffiti artwork within the graffiti art community, as evidenced by 241 graffiti websites.
Domain analytic methodologies are extremely useful for identifying the most important facets of information to capture for works that libraries, archives, and museums rarely document. This is especially so when they follow a pragmatic approach to knowledge organization and use evidence obtained from within an artistic community. This paper discusses various methods used to analyze community-driven collection, organization, and description of graffiti art in the online environment. The results form part of the basis on which a faceted KOS can be built.
1.0 Introduction
Knowledge organization systems (KOS) used for the documentation and description of artworks have a respectable, if shorter, history than those used in libraries (Urban 2014). This is often attributed to the fact that museums mostly collect, and create surrogate records for, unique objects (Taylor 1999). Unlike the library, the art museum represents objects that would not benefit from shared cataloging practice. This is changing, however, as images of artworks are increasingly available online and users want access to them regardless of where the works are physically located. Certain types of art, such as graffiti art, often fall outside the purview of formal institutions. For these, the graffiti art community itself, including graffiti artists and enthusiasts, largely carries out the documentation and organization of the resulting image records. This is due to the extra-institutional nature of the artworks, the legal complications that often surround their creation, and the impossibility of monetizing works found “on the street” (Schacter 2014). Interest in the artworks continues to rise despite these documentary challenges, as evidenced by the large number of websites around the world dedicated to preserving images of the works.
The KOS in popular use in libraries, archives, and museums do not include granular terminology for the many facets of graffiti art. At least one of them, however, the Getty Art and Architecture Thesaurus (AAT), has recently added a limited number of terms for describing graffiti art. This paper introduces research on the state of descriptive and organizational practice applied to collections of graffiti art images on websites from around the world. The data for analysis come from a set of 241 graffiti websites. This stage of the research reports on the categories, or facets, used to organize the image galleries themselves; it does not explore description as applied to individual images. Knowing how those who document the art form organize their online image galleries is foundational to understanding the implicit categories, vocabulary, and details associated with a particular artistic tradition. It may also inform further research on the documentation and organization of outsider art.
It is very hard to pin down a community, that is, to draw lines between those within and those without. Graffiti art has intersecting boundaries between what is legal and what is not. Some define it as true graffiti, others as vandalism, and yet others by the more sanitized term street art. In this situation, it can be impossible to find consensus. Should the artists themselves be the ones to say what they are doing, how they are doing it, and how it is described, documented, and organized? Are those who simply love the often bright, complex, and (sometimes) publicly placed works allowed to have a say? Do those who actively look for the works and photograph them, sharing them online in large galleries, belong to this community?
Online graffiti image galleries are very often cooperative endeavors, whether knowingly or not, willingly or not. They are made up of images submitted by the artists who created the works, photographers who stumbled upon them, webmasters or social media mavens who enjoy them, and any combination of these and more. For this reason, the easiest way to begin this research was to use the evidence of the collections themselves. It is acknowledged from the start that deciding who is acting, and what role each actor may play in the organization of graffiti art online, is a messy endeavor. Few authors have addressed the challenges of documenting graffiti art specifically (see Masilamani 2008 and Gottlieb 2008 for two of the more robust examples). Nevertheless, the process is actively taking place in a very broadly distributed fashion, with each participant seemingly acting independently of the others. Despite their autonomy, those doing the documentation in this research represent a community or a domain in that they share “an ontological base that reveals an underlying teleology, a set of common hypotheses, epistemological consensus of methodological approaches, and social semantics” (Smiraglia 2012, 114).
This work fills a gap in the research by illuminating the facets that people sharing large online collections of graffiti art use to organize the images. The value of domain analysis as a research tool is well documented in the knowledge organization literature (Hjørland and Albrechtsen 1995, Hjørland 2002, Smiraglia 2015, Albrechtsen 2015), as is that of facet analysis (Hjørland 2013, Cho et al. 2018, Campbell 2004). However, no domain analysis has yet examined modern graffiti art image documentation and the facets used as attributes to organize the images. Research reported by Graf (2016) revealed the lack of graffiti art-related terminology in the AAT: only three of the twenty most frequently used graffiti terms from her analysis of graffiti zines appeared in the AAT.
Interestingly, within two years, eleven more of the same twenty terms had been added to the AAT, raising coverage from 15% to 70%. This indicates that the graffiti art community and its practices, together with research reporting on those practices, influence widely used professional tools for the documentation and organization of artworks. The current research offers further granularity for those wishing to extend the terminology available for graffiti art documentation.
2.0 Methodology
As this is preliminary research in the area, the first step was to decide exactly what would be examined. There are many websites devoted to the documentation of graffiti and street art. One of the very first and best known of these is Art Crimes ( The About page on their website states that “Art Crimes was the first graffiti site on the net, and we're still one of the biggest …” (Art Crimes 2020). As an early and large graffiti art website, Art Crimes has gathered links to numerous other graffiti and street art websites around the world. At the time of the research, Art Crimes included a list of 709 links to other sites. This list served as the basis for the eventual set of 241 websites evaluated.
Each of the 709 links on Art Crimes was visited, and a judgement was made on whether to include the site in the study based on several criteria. 318 of the links were dead or empty, or presented a notice that the site had moved without providing forwarding information. 64 sites were entirely in languages other than English and were therefore eliminated from the study. Some retained sites did use other languages, but kept their navigation labels in English. 57 of the sites were professional artists’ sites, not specifically galleries of graffiti or street art images. 20 of the sites were not relevant because they focused on music, advertising, or other products or services.
Eight of the sites were links to social media galleries, such as Flickr or Instagram. These were not included because of the organizational constraints of social media platforms: each platform prescribes specific ways in which uploaded images can be labeled, organized, and grouped. There are thousands of graffiti and street art image galleries on social media platforms, and they are ripe for further investigation, but they were considered outside the purview of this research. One site among the 709 was not an independent website but a sub-page of the Art Crimes website itself, and was therefore eliminated, though Art Crimes itself remained in the study. After all of the sites were evaluated in this way, 241 live sites remained.

Each site was evaluated for structural elements of pages and sub-pages, indicated by navigation labels and hotlinked text. Examples of navigation labels can be seen across the top of the webpage banner in Figure 1 for the site 50mm Los Angeles. These navigation labels include: Gallery, Articles, Events, L.A. Legends, Blackbook, Links, Forum, About Us, and Submit an Event. There is also hotlinked text, “login | register”, that leads to other sub-pages of the website. All of these labels were entered into a QDA Miner database for each of the 241 websites.

Figure 1. 50mm Los Angeles website home page with navigation labels.

Every sub-page reached through the navigation labels and hotlinked text was visited and evaluated for further navigation labels, or sub-divisions of organization. Some websites had very shallow organization with only a couple of levels, while others had deeper structures with several levels.
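The exclusion counts reported for the Art Crimes link list can be checked with simple arithmetic: every excluded group is subtracted from the 709 links. The short Python sketch below does this; the reason labels are paraphrases of the criteria described in the text, not identifiers used in the study.

```python
# Links removed from the 709-link Art Crimes list, grouped by exclusion reason.
# The reason labels paraphrase the criteria described in the text.
TOTAL_LINKS = 709
EXCLUSIONS = {
    "dead, empty, or moved without forwarding information": 318,
    "entirely in languages other than English": 64,
    "professional artists' sites": 57,
    "music, advertising, or other products and services": 20,
    "social media galleries (e.g. Flickr, Instagram)": 8,
    "sub-page of Art Crimes itself": 1,
}


def retained(total: int, exclusions: dict) -> int:
    """Links left once every excluded group has been removed."""
    return total - sum(exclusions.values())


if __name__ == "__main__":
    excluded = sum(EXCLUSIONS.values())
    # Prints: 468 excluded, 241 retained
    print(f"{excluded} excluded, {retained(TOTAL_LINKS, EXCLUSIONS)} retained")
```

The result matches the 241 live sites carried forward into the analysis.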
An example of a website with deeper structure is Fatcap. The home page of this website includes a navigation label for “pictures.” Hovering the mouse over this label offers the user several sub-levels from which to choose: all pictures, worldwide graffiti, artists, crews, types, supports, and styles. Clicking on “worldwide graffiti” takes the user to a new sub-page that includes links to 117 deeper sub-pages, arranged by larger geographic regions: Africa, Asia, North America, South America, Europe, and Oceania. Clicking further, on “United States” for example, takes the user to a list of 40 states and the District of Columbia, each with from 1 to 69 individual cities linked as sub-pages that go even deeper. Other websites had only a couple of levels but divided a single level into hundreds of sub-pages. An example of a sub-page in a shallower organizational structure with numerous divisions, again from the website 50mm Los Angeles, is shown in Figure 2. Only part of the screen is visible in this image, but the image gallery is organized as an alphabetical, hotlinked list of navigation labels representing artist names (pseudonyms), locations, styles, and so on. Clicking on any of the hotlinked labels in this expansive list takes the user to a gallery of graffiti and street art images with works related to that label. Each of these sub-pages used to organize image galleries was also entered into the QDA Miner database for further evaluation, as explained in detail below.

Figure 2. Gallery page of 50mm Los Angeles website with links to individual galleries.

Once all 241 websites had been visited and all navigation labels at all levels of organization had been entered into QDA Miner, each individual label was coded to indicate the type of organization it represented. The coding scheme developed as the analysis proceeded.
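The nested navigation structures described above, such as Fatcap’s path from “pictures” through “worldwide graffiti”, a region, a country and a state down to a city, can be treated as a tree of labels from which the depth of a site’s organization and its end-point galleries can be read off. The Python sketch below assumes a simple nested-dictionary representation; the state and city entries are hypothetical placeholders, not data recorded in the study.

```python
from typing import Dict, List, Optional

# A navigation tree: each label maps to its sub-labels; an empty dict is a
# leaf, i.e. a page that shows a gallery rather than further labels.
NavTree = Dict[str, "NavTree"]

# Hypothetical fragment modelled on the Fatcap path described in the text;
# "State A", "City 1" and so on are placeholders, not recorded data.
FATCAP_FRAGMENT: NavTree = {
    "pictures": {
        "worldwide graffiti": {
            "North America": {
                "United States": {
                    "State A": {"City 1": {}, "City 2": {}},
                    "State B": {"City 3": {}},
                },
            },
            "Europe": {},
        },
        "artists": {},
        "styles": {},
    },
}


def depth(tree: NavTree) -> int:
    """Number of label levels on the longest path from the top of the tree."""
    if not tree:
        return 0
    return 1 + max(depth(child) for child in tree.values())


def leaf_paths(tree: NavTree, prefix: Optional[List[str]] = None) -> List[List[str]]:
    """Every top-to-leaf path: the sequence of labels a user clicks to reach a gallery."""
    prefix = prefix or []
    paths: List[List[str]] = []
    for label, child in tree.items():
        path = prefix + [label]
        if child:
            paths.extend(leaf_paths(child, path))
        else:
            paths.append(path)
    return paths


if __name__ == "__main__":
    print("levels:", depth(FATCAP_FRAGMENT))
    for path in leaf_paths(FATCAP_FRAGMENT):
        print(" > ".join(path))
```

A shallow site such as the 50mm Los Angeles gallery page would appear in this representation as a tree with few levels but one very wide level of leaves.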
Six broad categories of codes, or facets, evolved during the research: two focus on the websites themselves, and four focus on the artwork images on the websites. The two categories of codes that apply to the websites themselves are Sites and Other Media. The Sites category includes navigation labels that refer to aspects of the websites, to how users can interact with them, and to shopping, subscribing, and other graffiti and street art-related information accessed within the websites. These labels do not describe the graffiti and street art images in the websites’ image galleries. The Sites codes include: About, Contact, ContributeFlix, Disclaimer, FAQ, Forum, Glossary, Guestbook, History, HowTo, Interviews, Map, MyAccount, Poll, Shop, Subscribe, and Videos. This category relates to the structure, navigation, and use of the website in general. The second category not concerned with describing graffiti and street art images is Other Media. This code was applied to navigation labels that linked to a blog or social media account associated with a website, such as an Instagram, Facebook, or Flickr account, to a list of links to other graffiti or street art sites, or to other associated media located outside the websites studied. The remaining four categories of codes are the focus of the research reported here. They describe the graffiti and street art images themselves and comprise General, Types, Supports, and Locations. Each of these four work-based categories is described in greater detail below, providing insight into the documentation, description, and organization practices of the graffiti art community, which includes artists, photographers, and various enthusiasts as described earlier.
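The findings tables that follow share one layout: for each code, the number of times it was applied, the number of sites that used it at least once, and that number as a percentage of the 241 sites, with codes ordered by share of sites rather than by instance count. The Python sketch below reproduces the site percentages for the General codes of Table 1 and contrasts the two orderings. Handling ties with standard competition ranking (tied codes share the best rank, as in “1, 2, 2, 4”) is an assumption, since the text does not say how ties were ranked.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

TOTAL_SITES = 241  # websites retained after filtering


@dataclass(frozen=True)
class CodeUsage:
    code: str
    count: int  # instances of the code across all sites
    sites: int  # sites using the code at least once

    @property
    def pct_sites(self) -> float:
        """Sites using the code as a percentage of all 241, to one decimal."""
        return round(100 * self.sites / TOTAL_SITES, 1)


# (code, count, # of sites) for the General codes, copied from Table 1.
GENERAL_CODES: List[CodeUsage] = [
    CodeUsage(*row)
    for row in [
        ("Artist", 14439, 50), ("Event", 89, 31), ("Gallery", 49, 29),
        ("Year", 227, 27), ("New", 35, 26), ("Old", 35, 26),
        ("Featured", 27, 20), ("Inside", 11, 10), ("RIP", 75, 10),
        ("RatedHigh", 14, 8), ("Legal", 15, 7), ("Outside", 7, 7),
        ("Month", 35, 5), ("Color", 12, 4), ("Day", 5, 4),
        ("Decade", 8, 4), ("Illegal", 5, 2),
    ]
]


def competition_rank(rows: List[CodeUsage], key: Callable[[CodeUsage], int]) -> Dict[str, int]:
    """Rank codes highest-first; tied codes share the best available rank."""
    return {r.code: 1 + sum(key(other) > key(r) for other in rows) for r in rows}


if __name__ == "__main__":
    by_count = competition_rank(GENERAL_CODES, lambda r: r.count)
    by_sites = competition_rank(GENERAL_CODES, lambda r: r.sites)
    for r in GENERAL_CODES:
        print(f"{r.code:<10} {r.pct_sites:>5}%  "
              f"rank by sites {by_sites[r.code]:>2}, by count {by_count[r.code]:>2}")
```

Ranked by sites, Month comes 13th of the 17 General codes; ranked by instances it would sit much higher, level with New and Old, which is the skew the site-based ordering avoids.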
3.0 Findings

Each of the code categories is divided into sub-categories, which reflect aspects of description for graffiti works. The first of these is the General category, which is divided into 17 codes, or facets, as shown in Table 1. In each code table, the name of the code is given first, followed by the number of times that code was applied across all 241 websites. The third column gives that number as a percentage of all codes applied. The fourth column gives the number of the 241 sites that used the code at least once, and the last column expresses this as a percentage of all sites. Some websites used the same code in multiple places, which accounts for the sometimes very large number of individual code instances, as with Artist: whenever the name of an artist was used to organize a gallery of images, that label was coded as Artist, and, as is evident in Figure 2, some websites included hundreds of individual artists’ names. Each table therefore lists codes in order of the percentage of sites using the code at least once, which gives a ranking of the popularity of each aspect of organization across all sites. This avoids the skew that would result from using code instances as the popularity measure, since a few sites with many instances could then dominate the ranking. For example, the Month code was applied 35 times, placing it tied for sixth (with New and Old) by number of instances, yet it was seen on only 5 sites, or 2.1 percent of all sites, placing it 13th out of 17 in popularity. Some of the General codes reflect common aspects of traditional art documentation, such as the artist’s name or the year. Others reflect affordances of an online gallery, such as New, Color, Featured, RatedHigh, and Old. A particularly interesting aspect of many of the General codes is their specific applicability to graffiti art.
This is evident in codes such as Gallery (used to indicate when a work was in a gallery rather than in a traditional graffiti location), RIP (used for commemorative pieces in honor of a graffiti artist who has died), Legal, Outside, and Illegal. Most graffiti is assumed to be illegal, but there are also legal walls where graffiti is allowed. It makes sense to include organization for legal works, as they are created under very different circumstances from illegal ones. It makes less sense to offer organization specifically for illegal works, and this code was applied on only 2 sites, compared with 7 sites that used the Legal code.

Table 1. General codes and their usage across all sites.

General code  Count   % of codes  # of sites  % of sites
Artist        14439   71.2        50          20.7
Event         89      0.4         31          12.9
Gallery       49      0.2         29          12.0
Year          227     1.1         27          11.2
New           35      0.2         26          10.8
Old           35      0.2         26          10.8
Featured      27      0.1         20          8.3
Inside        11      0.1         10          4.1
RIP           75      0.4         10          4.1
RatedHigh     14      0.1         8           3.3
Legal         15      0.1         7           2.9
Outside       7       0.0         7           2.9
Month         35      0.2         5           2.1
Color         12      0.1         4           1.7
Day           5       0.0         4           1.7
Decade        8       0.0         4           1.7
Illegal       5       0.0         2           0.8

The next category of work-related codes comprises the Support codes. These codes were applied when images were organized by the surface upon which the artwork was created or placed. One distinction in this group of codes is the Canvas code, applied here to works produced in a studio. One-third of all sites used this code, reflecting the difference in perception between graffiti-style artworks made on canvas and those made on walls, trains, or other publicly accessible spaces. The use of the street is important to the notion of graffiti and street art (Austin 2010, Riggle 2010). Painting on canvas is often seen as a desire for profit, a safe way to make art in the comfort of a studio, or a kind of selling out of the art form (Jacobson 2017).
This conception is common enough that a relatively large number of sites used this type of organizational label to separate works made in a studio from those made on the streets.

Table 2. Support codes and their usage across all sites.

Support code    Count  % of codes  # of sites  % of sites
Canvas          109    0.6         77          32.0
Walls           107    0.5         65          27.0
Trains          253    1.2         51          21.2
Blackbook       28     0.1         20          8.3
Freights        27     0.1         16          6.6
CarsTrucksVans  28     0.1         12          5.0
Subways         81     0.4         11          4.6
Billboards      10     0.0         5           2.1
Body            5      0.0         4           1.7
Clothing        11     0.1         4           1.7
Rooftops        4      0.0         4           1.7
Tunnels         5      0.0         4           1.7
Subway Cars     60     0.3         3           1.2
Buses           3      0.0         2           0.8
Highways        2      0.0         2           0.8
Signs           2      0.0         2           0.8
Skate Deck      2      0.0         2           0.8
Trash Bins      2      0.0         2           0.8
Shutters        2      0.0         2           0.8

One of the most interesting categories of codes is that devoted to types of art. This category is rich with terminology, much of it specific to graffiti art. It is also the largest of the code categories, with 31 individual codes. Many of the codes are familiar terms that could be associated with more traditional art forms, such as Sketches, Murals, Stencils, Posters, and Political. Many others have specific meanings within the graffiti art community, such as Tags, Pieces, Bombs, Throwups, Productions, TrainWholecars, TrainEtoEs (end-to-ends), TrainTtoBs (top-to-bottoms), Wheatpaste, and Wildstyle.

Table 3. Type codes and their usage across all sites.
Type code         Count  % of codes  # of sites  % of sites
Sketches          74     0.4         56          23.2
Graffiti          75     0.4         50          20.7
Other             184    1.1         43          18.3
CommercialDesign  63     0.3         43          17.8
StreetArt         38     0.2         35          14.5
Murals            39     0.2         32          13.3
Tags              22     0.1         17          7.1
3D                18     0.1         16          6.6
Characters        59     0.3         15          6.2
Pieces            37     0.2         15          6.2
Stencils          20     0.1         13          5.4
Bombs             13     0.1         12          5.0
Throwups          16     0.1         12          5.0
Letters           19     0.1         10          4.1
Productions       12     0.1         10          4.1
Stickers          14     0.1         10          4.1
Digital           8      0.0         8           3.3
TrainWholecars    13     0.1         8           3.3
Action            6      0.0         6           2.5
Posters           9      0.0         5           2.1
SprayPaint        4      0.0         4           1.7
Wheatpaste        4      0.0         4           1.7
Political         3      0.0         3           1.2
Projections       3      0.0         3           1.2
TrainEtoEs        5      0.0         3           1.2
Collaborations    3      0.0         3           1.2
TrainPanels       3      0.0         3           1.2
Silvers           2      0.0         2           0.8
TrainTtoBs        2      0.0         2           0.8
Wildstyle         3      0.0         2           0.8
Handstyle         2      0.0         2           0.8

The last category of work-related codes comprises the Location codes. Location can be considered a common attribute to document for most traditional artworks, but it holds special significance in the graffiti art community. Graffiti art styles are passed on from older, more established writers to younger ones, and graffiti writers will often “bite”, or copy, work they admire by others. Styles can be associated with geographic locations around the world as well as with individual artists. Having as precise a location as possible for an individual work is desirable both for those wishing to see the work in person and for those who research the art style and its evolution across time and space. The value of location information is counterbalanced by the desire of artists acting illegally to remain anonymous and not leave a trail by which law enforcement can track them. This tension is evident in the consistent lack of precise location information across all 241 websites. The most commonly employed level of geographic location information was the city, followed closely by the country. Only one site went so far as to give street addresses, and only one other site referred to street intersections.
Thirteen sites referenced location via specific landmarks that might be recognizable to those familiar with the next level up in the location hierarchy, such as the city name. Parts of cities were also used by thirteen sites. These include mentions of a specific borough of New York City, or a cardinal direction combined with a city name, such as East L.A. Many sites employed numerous levels of geographic faceting, starting with continents or countries and working down through specific states and cities. A number of websites were geographically focused and placed works from countries outside their focus in a gallery for World graffiti, a type of “other” code.

Table 4. Location codes and their usage across all sites.

Location code      Count  % of codes  # of sites  % of sites
Cities             1637   8.6         43          17.8
Countries          543    2.8         37          15.8
SpecificLandmarks  73     0.4         13          5.8
CityParts          94     0.5         13          5.4
World              22     0.1         12          5.0
Continents         42     0.2         11          4.6
States             117    0.6         6           2.5
CountryParts       10     0.0         5           2.1
Address            2      0.0         1           0.4
Intersection       27     0.1         1           0.4
Undisclosed        2      0.0         1           0.4

4.0 Conclusion and Further Research

This research has analyzed the organizational facets employed across a large number of websites that share graffiti art images, and how those facets are further subdivided. The work is the first of its kind to describe the results of efforts by a distributed group of image collection managers to organize graffiti images online. It is valuable in that it provides insight into the facets of organization used around the world to categorize artworks not often collected or documented by traditional art or cultural heritage institutions. The methodology employed has been fruitful for describing current organizational practice. Several factors intrinsic to the art form complicate the effective use of traditional methods of art documentation, most notably the often illegal nature of creating graffiti and street art.
Legal issues often contribute to the obfuscation of common aspects of art documentation, such as where works are, the identities of those who created them, and dates for creation, change, and destruction of works. Commonalities in practice are easily seen from this research, but much more could be gained from further study. Knowing how these works are organized now is only half of the equation that could lead to the development of systems serving not only those who maintain these diverse collections but also the many users who approach them. Further research into the needs and desires of such collection users could fill in other missing facets that may be extremely useful.

While conducting this research, it was found that a number of the websites studied were very well developed. Nineteen of the 241 sites had very large collections, used facets in a notably consistent and very granular way, were active and adding new images, and included clear and in-depth information about the sites themselves. These 19 sites have been noted for further study. Further analysis of the websites was also carried out, and interviews were conducted with website curators, adding still more valuable information about why certain facets and terminology are used. These additional details, in combination with study from the user perspective, would add to the growing body of information that could be used to design systems for the documentation and organization of graffiti art and street art, as well as other types of found art and ephemera.

References

Albrechtsen, Hanne. 2015. “This is Not Domain Analysis.” Knowledge Organization 42: 557-61.
Art Crimes. 2020. About Art Crimes: What We’re Doing and Why.
Austin, Joe. 2010. “More to See than a Canvas in a White Cube: For an Art in the Streets.” City 14, nos. 1/2: 33-47.
Campbell, D. Grant. 2004. “A Queer Eye for the Faceted Guy: How a Universal Classification Principle Can Be Applied to a Distinct Subculture.” In Knowledge Organization and the Global Information Society: Proceedings of the Eighth International ISKO Conference 13-16 July 2004 London, UK, edited by Ia C. McIlwaine. Advances in knowledge organization 9. Würzburg: Ergon Verlag, 109-13.
Cho, Hyerim, Thomas Disher, Wan-Chen Lee, Stephen A. Keating, and Jin Ha Lee. 2018. “Facet Analysis of Anime Games: The Challenges of Defining Genre Information for Popular Cultural Objects.” Knowledge Organization 45: 484-99.
Gottlieb, Lisa. 2008. Graffiti Art Styles: A Classification System and Theoretical Analysis. Jefferson, NC: McFarland & Company, Inc.
Graf, Ann M. 2016. “Describing an Outsider Art Movement from Within: The AAT and Graffiti Art.” In Knowledge Organization for a Sustainable World: Challenges and Perspectives for Cultural, Scientific, and Technological Sharing in a Connected Society. Proceedings of the Fourteenth International ISKO Conference 27-29 September 2016, Rio de Janeiro, Brazil, edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, and Vera Dodebei. Advances in knowledge organization 15. Würzburg: Ergon Verlag, 125-32.
Hjørland, Birger. 2002. “Domain Analysis in Information Science: Eleven Approaches – Traditional as Well as Innovative.” Journal of Documentation 58: 422-62.
Hjørland, Birger. 2013. “Facet Analysis: The Logical Approach to Knowledge Organization.” Information Processing and Management 49: 545-57.
Hjørland, Birger and Hanne Albrechtsen. 1995. “Toward a New Horizon in Information Science: Domain-Analysis.” Journal of the American Society for Information Science 46: 400-25.
Jacobson, Malcolm. 2017. “Marketing with Graffiti: Crime as Symbolic Capital.” Street Art & Urban Creativity Scientific Journal 3, no. 2: 102-11.
Masilamani, Rachel. 2008. “Documenting Illegal Art: Collaborative Software, Online Environments and New York City’s 1970s and 1980s Graffiti Art Movement.” Art Documentation 27, no. 2: 4-14.
Riggle, Nicholas Alden. 2010. “Street Art: The Transfiguration of the Commonplaces.” Journal of Aesthetics & Art Criticism 68: 243-57.
Schacter, Rafael. 2014. “The Ugly Truth: Street Art, Graffiti and the Creative City.” Art & the Public Sphere 3, no. 2: 161-76.
Smiraglia, Richard P. 2012. “Epistemology of Domain Analysis.” In Cultural Frames of Knowledge, edited by Richard P. Smiraglia and Hur-Li Lee. Würzburg: Ergon Verlag, 111-24.
Smiraglia, Richard P. 2015. Domain Analysis for Knowledge Organization: Tools for Ontology Extraction. Waltham, MA: Chandos.
Taylor, Arlene G. 1999. “Museums and Art Galleries.” In The Organization of Information. Englewood, CO: Libraries Unlimited, 9-11.
Urban, Richard J. 2014. “Library Influence on Museum Information Work.” Library Trends 62: 596-612.

David Haynes – Edinburgh Napier University, United Kingdom

Understanding Personal Online Risk to Individuals via Ontology Development

Abstract: This paper describes the development of an ontology of risk as a way of better understanding the nature of the potential harms individuals are exposed to when they disclose personal data online. The ontology was designed to be compatible with BFO, the Basic Formal Ontology, which is intended to promote interoperability. Ontologies from domains such as genetics and medical research are in many instances designed to conform to BFO. An initial exercise to monitor the online activity of six participants from the library and information services community helped to identify the points at which personal data is disclosed during online activity. It also explored the motivations for these disclosures by questioning participants about their perceptions of risk.
The resulting analysis suggested that an ontology would represent the complex relationships between risk concepts better than a typology. Terms were also extracted from existing terminologies. Risk scenarios were developed and tested during a formative workshop and incorporated into the ontology. A potential application of the ontology is to identify clusters of risk and map the factors that contribute to specific risks.

1.0 Introduction

This research arose from an investigation into the nature of the risks associated with online disclosure of personal information. Many online systems and social media platforms rely on an economic model based on the sale of personal data (Enders et al. 2008). For instance, online behavioural advertising has been a remarkably effective model that has fuelled the growth of companies such as Facebook, which announced profits of $18.5 billion on revenue of $70.7 billion in 2019 (Facebook 2020). In return for disclosing personal data, individuals gain ‘free’ access to online services. When people are faced with risk, their feelings should be considered alongside rational decision-making (Loewenstein et al. 2001; Finucane and Holup 2006). Behaviour models tend to emphasise conscious, rational decision-making during online transactions involving personal data (Kehr et al. 2015). Many researchers have characterised this as the ‘privacy calculus’, in which the perceived benefits of disclosure are judged to outweigh its perceived risks. Individual risk and public safety are a focus of current UK government policy (DCMS 2019). In the European Union, privacy concerns have been reflected in the General Data Protection Regulation (GDPR) (European Parliament 2016). The purpose of this research is to understand the nature of the risks faced by individuals when they conduct online transactions. Describing and categorizing these risks may help with the delivery of more effective mechanisms for managing them.
Baldwin et al. (2010) argue that the purpose of regulation is to manage risk. Although legislation is the primary means of regulation adopted by government, it is not the full picture. Lessig (2006) encapsulated one aspect of internet regulation in the phrase “Code is Law”: the way in which systems are designed affects the way in which they operate. Cavoukian (2012) extended this idea with the concept of ‘privacy by design’. Haynes et al. (2016) go on to suggest that a number of regulatory mechanisms (coding, self-regulation, market response and law) work in concert to regulate access to personal data on social networks. Mapping risks and their relationships with causes and effects may produce better insights into effective responses to this public safety issue.

This research sets out to examine the nature of the risks faced by individuals when they engage in online activity. It considers the following questions:

• What is the nature of the risks that individuals face when using the internet?
• Is there an existing typology of online risk?
• Can an ontology of risk be developed to represent risk relationships more effectively than previous typologies of risk?

2.0 Literature review

2.1 Nature of online risk

Risk is an elusive concept, the definition of which depends on the context (Fischhoff, Watson, and Hope 1984). Aven and Renn (2009, 2) define risk in the following terms:

A. Risk is expressed by means of probabilities and expected values
B. Risk is expressed through events/consequences and uncertainties

Simply put, risk is the “effect of uncertainty on objectives” (ISO 2009, 1). Risk applies to individuals, organizations, governments and societies. When considering the risk to individuals, it is necessary to distinguish between risks to personal privacy and risks associated with disclosing personal data (e.g. via data breaches, as well as voluntary disclosure).
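Definition A of Aven and Renn can be illustrated with the familiar expected-value formulation. This is offered only as a sketch of that reading, not as the measure used in this research: if an incident can lead to mutually exclusive outcomes $i = 1, \dots, n$ with probabilities $p_i$ and losses (severities) $c_i$, risk is summarised by the expected loss

$$
R = \mathbb{E}[C] = \sum_{i=1}^{n} p_i \, c_i .
$$

For a hypothetical incident with a 1% chance of a loss of 500 units and no loss otherwise, $R = 0.01 \times 500 + 0.99 \times 0 = 5$ units.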
The privacy calculus captures the concept of perceived individual risk as well as the benefits associated with disclosure of personal data (Dinev and Hart 2006). Studies have found an inverse correlation between the severity of perceived risks and willingness to disclose personal data (Dinev and Hart 2006). Some studies have described the apparently paradoxical result in which individuals disclose personal data despite the perceived dangers of doing so (Gimpel, Kleindienst, and Waldmann 2018). Such privacy paradox studies tend to depend on interviews with individuals about what they would do in hypothetical situations (Gimpel, Kleindienst, and Waldmann 2018; Min and Kim 2015). Work by Acquisti and Grossklags (2005) suggested that there is a discrepancy between intention and actual behaviour.

2.2 Using an ontology to describe risk

This research initially set out to develop a taxonomy of risk based on harm to individuals, which would capture hierarchical relationships between concepts. Entities in a taxonomy can be grouped by common origin (phylogeny) or by similarity (morphology) (Gnoli 2017). Solove (2006) provides a classification of harms, which is a starting point for categorizing risks. This classification largely predates the advent of social media and needs to be updated to incorporate the spectrum of online harassment, which can range from bullying through to hate speech. Skinner, Song, and Chang (2006) developed a taxonomy of risk based on three dimensions or views: time, space and matter. This was developed specifically in the context of collaborative environments and needs validation with empirical data. Wright and Raab (2014, 290–91) identify examples of harms based on privacy principles. Both of these feed into an initial identification of online harms. Haynes and Robinson (2015) set these risks in a network of interconnected risks and consequences.
2.3 Complexity of relationships and ontologies

The decision to use an ontology was based on the ability to define classes of concept and to describe different types of relationship between those classes. Ontology development has been extensive in the biomedical area, and this provides a corpus of experience that can be applied elsewhere. Some attention has been paid to other domains such as project management, business processes and cyber security, using ontologies either as a tool for risk assessment (McKone and Feng 2015; Mohammad et al. 2015) or as a means of mapping the relationships between different elements of risk and specific instances of risk events. Perhaps the most directly relevant work is a review of ontologies covering cyber risk, which appeared to emphasise vulnerabilities and exploitation by an attacker. There was less emphasis on the concepts of likelihood and impact, which were included in only 3 of the 10 ontologies reviewed by Oltramari and Kott (2018). The authors highlight the problem of estimating probabilities and impact levels in a dynamic environment where the behaviour of a target affects the outcomes. For instance, if a targeted organization improves its security measures, a potential attacker will switch their attention to another, more vulnerable target. They also speculate that it is impossible to determine the outcomes without knowing more about the motivation of the attackers. An ontology of online risk needs to reflect the complex nature of risk and to incorporate concepts such as Vulnerability, Threat, Incident, Consequence, Harm and Response. Some of these classes also have properties that are defined in their schemas. For example, it might be useful to incorporate the impact of a Harm or the probability of an Incident into the description of a risk scenario.

3.0 Methods

3.1 Creation of the ontology

Early prototyping used the Graphite system provided via the Synaptica interface.
This was intuitive and allowed experimentation with different data formats and the development of schemas. This development environment allows export into an OWL-compatible system so that the ontology can plug into high-level ontologies such as the Basic Formal Ontology (BFO). The ontology development was based on the approach described by Arp et al. (2015), who set out four general principles of ontology design:

1. Realism – an ontology is a representation of reality, which is supported by evidence and observation
2. Perspectivalism – reality is too complex to be represented by a single approach. Ontologies should therefore aim to be relevant and accurate within a specified domain
3. Fallibilism – an ontology will change as our understanding and knowledge of a domain develops. It is therefore necessary to be able to keep track of different versions of an ontology and the changes made
4. Adequatism – room must be made for all the types of entity that exist within the domain of the ontology

Arp et al. (2015, 44) suggest that ontologies are representations of reality rather than models of reality based on mental concepts:

Realism in ontology is based further on the idea that with the aid of science we can come to know the general features of reality in the form of universals and the relations between them. This realist approach has a number of general consequences. First, it implies that ontologies are representations of reality, not of people’s concepts or mental representations or uses of language.

This presents some real challenges in dealing with human behaviour and motivations. When looking at privacy, this research is concerned with motivations to disclose personal data online and the harms (and benefits) that might result. The harms themselves may depend on the perceptions of the individual, so that similar events might be viewed very differently by different individuals. What is the ‘reality’ we are trying to represent with this ontology?
The fact of people’s perceptions is a reality that is captured in attitudinal surveys. Such surveys provide a snapshot of what people thought at a particular point in time, and of course those perceptions may change in light of experience, better understanding of online harms or education about privacy risks. Risk can be seen as part of the ontology of social reality rather than objective reality, because it depends on agency: “risk belongs to this subjective ontology [of social reality]. Thus, risks are real, but only insofar as there is a social reality in which subjects engage in risk taking.” (Merkelsen 2011, 894).

3.2 Choice of software

The ontology was designed to be hospitable to RDF data to allow for import from other ontologies and export of the resulting ontology to new environments. The Protégé system developed at Stanford was considered a suitable platform because it is widely used and has an active community of developers. It supports OWL, which is a W3C standard. The Synaptica Graphite system was also considered for this exercise and was eventually selected because of its terminology management features and the support available to the researcher.

3.3 Development of the ontology

The methodology for development of the ontology was described in a previous paper (Haynes 2019). Noy and McGuinness’s (2001) iterative approach was adopted and applied to the seven-step method for ontology development of Arp et al. (2015).

3.4 Testing and validation

The ontology design was tested in a workshop with 14 researchers and practitioners with backgrounds in knowledge organization, information governance, cybersecurity and information science. Participants worked in groups to examine the proposed representation of risk and to provide critiques to refine it. An initial set of risk incident types was incorporated into the ontology as a set of scenarios, based on standard definitions and on descriptions in the literature. A degree of normalisation was required for consistency.
Seminar participants were asked to explore risk scenarios to identify the consequences and harms that could result from each type of incident. They were also asked to consider the causes that contributed to the incident. The responses were consolidated and expressed as relationships, which were entered into the ontology. The relationship network was then explored and graphs generated to illustrate the connections between different entities in the ontology.

3.5 Visualization of graphs

The graphs representing the relationships were displayed using the visualization tool within the Synaptica Graphite system. This is an interactive system that allows exploration of the relationships between nodes and navigation through the landscape of risks, their causes and consequences.

4.0 Results

4.1 Scope of the ontology

The scope of the ontology was defined during the early stage of the project and was based on the overall objective of better understanding risk to individuals. The scope of the ontology is described more fully in Haynes (2019, 171–72) and can be summarised as follows:

The ontology covers online hazards faced by online users and the resulting consequences and harms to the individual. It shows the cause and effect relationships between threats, incidents and consequences of disclosing personal data online. The main purpose of the ontology is to map different types of hazards that individuals face and the possible mitigating actions that they could take. It will also identify similarities between different hazards and ways in which they might be addressed.

4.2 Evolution of the representation of risk in the light of feedback

During the workshop, the initial representation was endorsed with some modifications to align it more closely with the cybersecurity view of risk rather than the project management view. Figure 1 shows the revised representation of risk, which incorporates feedback from the formative workshop.
Risk is now defined in terms of threats that exploit vulnerabilities in systems. The threats could be malicious or accidental. Risk events are classed as Incidents. As well as mitigating actions to lessen the impact of an incident, there are also avoiding actions and defending actions to reduce the likelihood of an incident and to reduce or eliminate the threat and/or vulnerability of a system.

Figure 1: Modified Representation of Risk

There was some discussion about whether consequence and harm should be separated. Examination of instances of this representation suggests that it is useful to distinguish between the consequence of an incident and the harm to an individual. For example, during a data breach incident, personal bank account details might fall into the hands of criminals (the consequence) and the harm to the individual might be loss of money. The harm is not necessarily realised because the bank may take mitigating action, or the criminals might fail to exploit the data.

4.3 Scenarios

A set of scenarios was developed from reports in the literature, the case studies conducted with the volunteers and scenarios developed during the workshop. Table 1 lists the scenarios used to test different types of risk faced by individuals. They were used to explore the relationships between the causes of a risk and its consequences, and these relationships were captured in the ontology.
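The distinction between consequence and harm, and the way a response can stop a harm from being realised, can be sketched as a small directed graph. The Python below is a minimal illustration under stated assumptions, not the ontology as built in Synaptica Graphite. The node labels, including ‘weak password’ and ‘bank blocks the account’, are hypothetical and only loosely based on the bank-details example above. A mitigated concept is simply treated as a dead end.

```python
from collections import deque

# Hypothetical concepts, each assigned to one class of the revised
# representation of risk.
CLASS_OF = {
    "data theft": "Threat",
    "weak password": "Vulnerability",
    "data breach": "Incident",
    "bank details obtained by criminals": "Consequence",
    "loss of money": "Harm",
    "bank blocks the account": "Response",
}

# Hypothetical leadsTo edges: threat/vulnerability -> incident -> consequence -> harm.
LEADS_TO = {
    "data theft": ["data breach"],
    "weak password": ["data breach"],
    "data breach": ["bank details obtained by criminals"],
    "bank details obtained by criminals": ["loss of money"],
}

# Hypothetical mitigates edges: response -> concepts it mitigates.
MITIGATES = {
    "bank blocks the account": ["bank details obtained by criminals"],
}


def realised_harms(start, responses_taken=()):
    """Breadth-first search along leadsTo edges from `start`, returning the
    Harm concepts reached. Concepts mitigated by a response that was taken
    are treated as dead ends, so harms beyond them are not realised."""
    blocked = {c for r in responses_taken for c in MITIGATES.get(r, [])}
    harms, seen, queue = [], {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node in blocked:
            continue  # mitigated: stop propagating along this path
        if CLASS_OF[node] == "Harm":
            harms.append(node)
        for nxt in LEADS_TO.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return harms


print(realised_harms("data theft"))  # ['loss of money']
print(realised_harms("data theft", ["bank blocks the account"]))  # []
```

Treating mitigation as a complete block is a simplification. Mitigating actions lessen impact rather than remove it, so a fuller model might weight paths by likelihood and impact instead of cutting them outright.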
Table 1 - Scenarios used to develop the ontology

Risk | Incident scenario
CLICK-BAIT | Fall down a click-bait rabbit hole
CLOUD STORAGE | Data breach of cloud documents
DIGITAL ASSISTANTS | Digital assistant self-launches
ILLEGAL SITE | Visit an illegal site
LOCATION TRACKING | Location tracking made public
NON-SECURE SITE | Land on non-https site
ONLINE BANKING | Bank login details revealed
ONLINE PURCHASES | Data breach of online purchase transaction
OUT-OF-DATE SOFTWARE | Use out of date software
PHISHING | Respond to phishing email
PICTURES ON SOCIAL MEDIA | Hostile response to photo posted on social media
PROFESSIONAL NETWORKS | Employer discovers job-seeking activity
RE-USE OF PASSWORDS | Re-used password is detected

4.4 Exploring the network of relationships

The modified representation of risk is based on different relationships between the concept classes. Table 2 shows the classes and their relationships within the ontology. Many of these relationships have reciprocals. So, for instance, the top term ‘Psychological harm’ in the ontology scheme Harm has narrower terms: ‘Annoyance’, ‘Fear’ and ‘Worry’. Each of these has a reciprocal broader term relationship with ‘Psychological harm’.
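The relationship rules in Table 2 can be treated as a schema against which individual relationships are checked. The following Python sketch illustrates the idea under stated assumptions and does not show how Synaptica Graphite enforces its relationships. It encodes the rows of Table 2 that link two classes, leaving out the hasProperty rows. It then adds the reciprocal broader term for every narrower term, as in the ‘Psychological harm’ example above, and lists triples whose classes match no allowed pattern. Assigning ‘Loss of confidentiality’ to the Consequence class is an assumption made for the example.

```python
# Allowed (subject class, predicate, object class) patterns from Table 2,
# excluding the hierarchical (broader/narrower) and hasProperty rows.
ALLOWED = {
    ("Consequence", "leadsTo", "Consequence"),
    ("Consequence", "leadsTo", "Harm"),
    ("Incident", "leadsTo", "Incident"),
    ("Incident", "leadsTo", "Consequence"),
    ("Incident", "leadsTo", "Threat"),
    ("Response", "mitigates", "Impact"),
    ("Response", "mitigates", "Incident"),
    ("Response", "mitigates", "Harm"),
    ("Response", "mitigates", "Consequence"),
    ("Response", "mitigates", "Threat"),
    ("Response", "mitigates", "Vulnerability"),
    ("Threat", "exploits", "Vulnerability"),
    ("Threat", "leadsTo", "Incident"),
    ("Vulnerability", "leadsTo", "Incident"),
}
INVERSE = {"broader": "narrower", "narrower": "broader"}

# Example concepts and their classes (the class of 'Loss of confidentiality'
# is assumed for illustration).
CLASS_OF = {
    "Psychological harm": "Harm",
    "Annoyance": "Harm",
    "Fear": "Harm",
    "Worry": "Harm",
    "Loss of reputation": "Harm",
    "Breach of cloud storage": "Incident",
    "Loss of confidentiality": "Consequence",
}

TRIPLES = [
    ("Psychological harm", "narrower", "Annoyance"),
    ("Psychological harm", "narrower", "Fear"),
    ("Psychological harm", "narrower", "Worry"),
    ("Breach of cloud storage", "leadsTo", "Loss of confidentiality"),
    ("Loss of confidentiality", "leadsTo", "Loss of reputation"),
]


def with_reciprocals(triples):
    """Return the triples plus the inverse of every broader/narrower triple."""
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in INVERSE:
            result.add((obj, INVERSE[predicate], subject))
    return result


def violations(triples, class_of):
    """Return, in sorted order, the triples that no row of the schema allows."""
    bad = []
    for subject, predicate, obj in sorted(triples):
        s_class, o_class = class_of[subject], class_of[obj]
        if predicate in INVERSE:
            allowed = s_class == o_class  # hierarchies stay within one class
        else:
            allowed = (s_class, predicate, o_class) in ALLOWED
        if not allowed:
            bad.append((subject, predicate, obj))
    return bad


graph = with_reciprocals(TRIPLES)
print(("Annoyance", "broader", "Psychological harm") in graph)  # True
print(violations(graph, CLASS_OF))  # []
# Flagged, because Harm leadsTo Incident is not a pattern in Table 2:
print(violations([("Loss of reputation", "leadsTo", "Breach of cloud storage")], CLASS_OF))
```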
Table 2 - Relationships allowed between concepts in different classes

Subject (class) | Predicate(s) | Object (class or property)
Consequence | broader/narrower | Consequence
Consequence | leadsTo | Consequence
Consequence | leadsTo | Harm
Harm | broader/narrower | Harm
Harm | hasProperty | Impact
Incident | broader/narrower | Incident
Incident | hasProperty | probability
Incident | leadsTo | Incident
Incident | leadsTo | Consequence
Incident | leadsTo | Threat
Response | broader/narrower | Response
Response | mitigates | Impact
Response | mitigates | Incident
Response | mitigates | Harm
Response | mitigates | Consequence
Response | mitigates | Threat
Response | mitigates | Vulnerability
Threat | exploits | Vulnerability
Threat | leadsTo | Incident
Vulnerability | broader/narrower | Vulnerability
Vulnerability | leadsTo | Incident

The visualization of the ontology demonstrates the complex relationships between concepts (Figure 2). For instance, ‘breach of cloud storage’ is a scenario in the Incident scheme. It is a consequence of ‘use of cloud services’ (a prerequisite in event tree analysis) and/or ‘data theft’. It leads to ‘loss of confidentiality’ and ‘consequential loss’. From Figure 2 we can see that Use of Cloud Services is a vulnerability that has a number of subordinate relationships. Data theft, on the other hand, is classed as a threat because it implies intent on the part of an agent.

Figure 2 – Causes and consequences of a breach of cloud storage

Going the other way, a breach of cloud storage could lead to loss of confidentiality, which in turn could lead to loss of reputation (a harm). Loss of confidentiality could also result from the self-launch of a digital assistant. There are likely to be other incidents that could lead to this consequence. The breach could also lead to a consequential loss resulting in financial loss to an individual, another harm. Some relationships are two-way.
For instance, ‘Breach of cloud storage’ could be both a cause and a consequence of ‘Loss of confidentiality’. This illustrates the greater richness of description that is possible using an ontology rather than a taxonomy.

4.5 Inferences from the ontology

As well as providing a helpful visual display of the relationships between different aspects of risk, the ontology allows navigation and exploration of different aspects of risk. This could be valuable in tracking relationships and identifying connections that are not immediately obvious on initial inspection. A possible development of this research would be to consider the eigenvector centrality of each node to determine closeness and identify potential clusters of concepts (Hansen, Schneiderman, and Smith 2011). This may reveal deeper structure in the set of scenarios in the ontology.

5.0 Discussion and conclusion

5.1 Addressing the research questions

The project set out to explore the nature of risks that individuals face when using the internet. One way of doing this is to develop a taxonomy. Taxonomies are based on hierarchical relationships and do not allow for the complex relationships between risk concepts. For this reason, an ontology was developed instead. It was based on pre-existing work as well as industry definitions of vulnerability and threats (Haynes 2019). However, these definitions tended to be focused on technical issues and consequences to systems or organizations. This ontology shifts the focus onto people and the impact of online incidents on individuals. It explores by means of scenarios the relationships between Vulnerabilities, Threats and Incidents and then the outcomes of Incidents in terms of Consequences and Harms. The ontology also includes Responses that could mitigate these risks.

5.2 Limitations

The analysis of scenarios is based on one researcher’s interpretation of data gathered from a small group of experts.
To some extent this is subjective and needs a more rigorous evaluation – possibly by means of a Delphi study. This would allow a panel of experts to arrive at a consensus about the concepts and relationships associated with the scenarios.

5.3 Future Development

The next stage of development for this ontology is to populate it with instances from a variety of sources, including reports in the press, incident data from data protection regulators and case studies in the literature. This would test how well the scenarios describe the reality of online risks to individuals. It would also provide the groundwork for creation of linked data sets, which could be analysed to inform policy on online safety.

Acknowledgement

This research was supported by the Royal Academy of Engineering and the Office of the Chief Science Adviser for National Security under the UK Intelligence Community Postdoctoral Fellowship Programme (Grant No. ICRF1718\1\54). The Graphite system used to develop the ontology was provided by Synaptica Ltd. The research was conducted during Dr Haynes’ Fellowship at the Department of Library and Information Science at City, University of London. Thanks to colleagues at Napier and the anonymous reviewers for their valuable comments and suggestions.

References

Acquisti, Alessandro, and Jens Grossklags. 2005. “Privacy and Rationality in Individual Decision Making.” IEEE Security & Privacy 3, no. 1: 26–33.
Arp, R., B. Smith, and A.D. Spear. 2015. Building Ontologies with Basic Formal Ontology. Cambridge, MA: MIT Press eBooks Library.
Aven, Terje, and Ortwin Renn. 2009. “On Risk Defined as an Event Where the Outcome is Uncertain.” Journal of Risk Research 12, no. 1: 1–11.
Baldwin, Robert, Martin Cave, and Martin Lodge. 2010. The Oxford Handbook of Regulation. Oxford Handbooks in Business and Management. Oxford: Oxford University Press.
Cavoukian, Ann. 2012. “Privacy by Design [Leading Edge].” IEEE Technology and Society Magazine 31, no. 4: 18–19.
DCMS. 2019. Online Harms White Paper.
Dinev, Tamara, and Paul Hart. 2006. “An Extended Privacy Calculus Model for E-Commerce Transactions.” Information Systems Research 17, no. 1: 61–80.
Enders, Albrecht, Harald Hungenberg, Hans-Peter Denker, and Sebastian Mauch. 2008. “The Long Tail of Social Networking.” European Management Journal 26, no. 3: 199–211.
European Parliament. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation).
Facebook. 2020. Facebook Q4 2019 Results.
Finucane, Melissa L., and Joan L. Holup. 2006. “Risk as Value: Combining Affect and Analysis in Risk Judgments.” Journal of Risk Research 9, no. 2: 141–64.
Fischhoff, Baruch, Stephen R. Watson, and Chris Hope. 1984. “Defining Risk.” Policy Sciences 17, no. 2: 123–39.
Gimpel, Henner, Dominikus Kleindienst, and Daniela Waldmann. 2018. “The Disclosure of Private Data: Measuring the Privacy Paradox in Digital Services.” Electronic Markets 28, no. 4: 475–90.
Gnoli, Claudio. 2017. “Classifying Phenomena Part 2: Types and Levels.” Knowledge Organization 44: 37–54.
Hansen, Derek L., Ben Schneiderman, and Marc A. Smith. 2011. Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Morgan Kaufmann Publishers.
Haynes, David. 2019. “Creating an Ontology of Risk: A Human-Mediated Process.” In The Human Position in an Artificial World: Creativity, Ethics and AI in Knowledge Organization. ISKO UK Sixth Biennial Conference London 15-16th July 2019, edited by David Haynes and Judi Vernau, 167–80. Baden-Baden: Ergon Verlag GmbH.
Haynes, David, David Bawden, and Lyn Robinson. 2016. “A Regulatory Model for Personal Data on Social Networking Services in the UK.” International Journal of Information Management 36, no. 6: 872–82.
Haynes, David, and Lyn Robinson. 2015. “Defining User Risk in Social Networking Services.” Aslib Journal of Information Management 67, no. 1: 94–115.
ISO. 2009. ISO 31000:2009 Risk Management – Principles and Guidelines. Geneva: International Organization for Standardization.
ISO. 2011. ISO 25964-1:2011 Information and Documentation – Thesauri and Interoperability with Other Vocabularies. Part 1: Thesauri for Information Retrieval. Geneva: International Organization for Standardization.
Kehr, Flavius, Tobias Kowatsch, Daniel Wentzel, and Elgar Fleisch. 2015. “Blissfully Ignorant: The Effects of General Privacy Concerns, General Institutional Trust, and Affect in the Privacy Calculus.” Information Systems Journal 25, no. 6: 607–35.
Lessig, Lawrence. 2006. Code. 2nd ed. New York; London: BasicBooks.
Loewenstein, George F., Elke U. Weber, Christopher K. Hsee, and Ned Welch. 2001. “Risk as Feelings.” Psychological Bulletin 127, no. 2: 267–86.
McKone, Thomas E., and Lydia Feng. 2015. “Building a Human Health Risk Assessment Ontology (RsO): A Proposed Framework.” Risk Analysis 35, no. 11: 2087–2101.
Merkelsen, Henrik. 2011. “The Constitutive Element of Probabilistic Agency in Risk: A Semantic Analysis of Risk, Danger, Chance, and Hazard.” Journal of Risk Research 14, no. 7: 881–97.
Min, Jinyoung, and Byoungsoo Kim. 2015. “How Are People Enticed to Disclose Personal Information Despite Privacy Concerns in Social Network Sites? The Calculus between Benefit and Cost.” Journal of the Association for Information Science & Technology 66, no. 4: 839–57.
Mohammad, Mahmud Abdulla, Ioannis Kaloskampis, Yulia Hicks, and Rossitza Setchi. 2015. “Ontology-Based Framework for Risk Assessment in Road Scenes Using Videos.” Procedia Computer Science 60, no. C: 1532–41.
NIST. 2019. National Vulnerability Database.
Noy, Natalya F., and Deborah L. McGuinness. 2001. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford, CA: Stanford University.
Oltramari, Alessandro, and Alexander Kott. 2018. “Towards a Reconceptualisation of Cyber Risk: An Empirical and Ontological Study.” Journal of Information Warfare 17, no. 1: 49–73.
Skinner, Geoff, Song Han, and Elizabeth Chang. 2006. “An Information Privacy Taxonomy for Collaborative Environments.” Information Management & Computer Security 14, no. 4: 382–92.
Solove, Daniel J. 2006. “A Taxonomy of Privacy.” University of Pennsylvania Law Review 154, no. 3: 477–564.
Wright, David, and Charles Raab. 2014. “Privacy Principles, Risks and Harms.” International Review of Law, Computers & Technology 28, no. 3: 277–98.

Antoine Henry – Université de Lille, GERiiCO, France
Widad Mustafa El Hadi – Université de Lille, GERiiCO, France

The Use of Community to Organize Knowledge: The Case of an Energy Company

Abstract: This research, conducted within an energy sector company, brings together information systems and knowledge organization (KO). It is based on a case study that aims to analyze the ‘interface’ question comprehensively through the building of an information system dedicated to the organization of knowledge within a community of practice. Through this case, we develop an approach related to KO technologies that highlights the importance of the ‘interface’ not only as a user interface in software but also as a pathway between users, communities and organizations, which is necessary for promoting a common understanding and an efficient use of knowledge.

1.0 Introduction

In the political and economic conception of society known as the ‘Information Society’ or the ‘Knowledge Society’ (first used by Drucker (1969), then by UNESCO), the question of how to organize knowledge is essential. For decades, knowledge organization systems (KOS) (Mazzocchi 2018) such as classifications, documentary languages and, more recently through the use of information technology, topic maps and ontologies have been developed in order to optimize the way we use knowledge.
The organization of knowledge can be thought of through the prism of digital technology because the theoretical framework of knowledge organization (KO) allows us to ‘develop methods to guide effective practices to exploit our knowledge in a digital environment’ (Beau 2012, 1). In this way, the knowledge organization system ‘constitutes a common language, either for the design of an information system, or, more generally, for sharing knowledge concerned by different carriers. It can be used as a framework to express knowledge shaping in the field in as comprehensive and complete manner as possible’ (Mahé et al. 2010, 66). The objective of these KOSs is therefore to ‘define principles for describing a domain to facilitate the classification and search for more or less abstract items: documents, persons, places, products, opinions or activities’ (Zacklad and Giboin 2010, 8) and thus to facilitate knowledge dissemination. This is all the more important in organizations where dematerialized knowledge, or even knowledge produced natively in digital formats, has become prevalent (Martínez-Ávila 2015; Fujita and Pinheiro 2016). The concept of interface can be understood as a user interface through which knowledge is accessed, and authors in KO publications generally use the term ‘interface’ in this sense. However, an interface can also be seen as a pathway between groups of people or inside a community (for instance, between science and society (Puente-Rodríguez, Bos and Koerkamp 2019)). In this research, we explore the interface aspect not only as a computer interface but also to show how a KOS can serve as an interface at various levels of an organization. The paper is organized as follows. We first develop our literature review in order to define the research question and scope more precisely. We then describe our methodology and present the results and findings that we go on to discuss.
Finally, we present our conclusion and the limitations of the research.

2.0 Literature Review

According to Broughton et al. (2005, 133), the concept of KO is mostly about the use of knowledge organization systems (e.g., classification, thesauri, semantic networks) and the process of organizing this knowledge. While KO is mainly associated with ‘memory institutions’, many organizations are now working on the way they organize their knowledge. In this article, we focus on this KO dimension in a private organization to highlight an aspect related to KO: the interface dimension. The professional context in which the members of an organization operate is a specific context in terms of knowledge organization. This forces the organization to create its own organizational model and technical systems to best meet its objectives by empowering its employees to act. Knowledge produced within an organization can be considered as organizational knowledge (Bibikas et al. 2008; Coakes, Coakes, and Rosenberg 2008; Yang, Fang, and Lin 2010), since an organization, as an entity, is able to produce new knowledge in response to its needs and then disseminate it throughout the organization (including in services, products or systems). This production of organizational knowledge is related to the actions of members who are part of the whole (the organization) and who, in the context of their mission, create knowledge to achieve ‘some end’ (Nonaka and Takeuchi 1995, 58). Before becoming organizational knowledge, knowledge already exists at the individual level, or even communally within a group (Merali 2000; Allard 2004; Kaschig, Maier and Sandow 2016). The expectation that employees share knowledge is fairly recent (in relation to the development of knowledge management (Bell DeTienne et al. 2004)): ‘organizations have only recently begun to expect their employees to consistently share and exchange knowledge; in the past, organizations typically urged workers to pursue individual goals and rewarded them on the basis of individual performance and knowhow’ (Biron and Hanuka 2015, 655), and this expectation is driven by the aim of being competitive (Chen and Fong 2015; Martinez-Gil 2015).

The community plays an essential role in the production and management of organizational knowledge, since ‘real knowledge management is not possible without true community’ (Hassel 2007, 193). Indeed, ‘in a strict sense, knowledge is created only by individuals. An organization cannot create knowledge without individuals’ (Nonaka and Takeuchi 1995, 58). These statements resonate with the idea that the communicating organization enables the co-construction of meaning, given that meaning ‘emerges within communities and that its analysis should not be dissociated from the social, historical, cultural and political dimensions’ (Lemke 1995, 9, cited by Hachour 2011, 202). Thus, individual knowledge can become ‘organizational’ by being shared in communities of practice and more widely in the organization.

The professional situation in which the members of the organization operate is in fact a specific context in terms of knowledge organization. For instance, an action carried out by a member of the organization is a ‘situated action’ (Guyot 2000) and therefore leads to a situation where the actor is confronted with a need for established knowledge, or even for the production of new knowledge if the technical situation is new. The knowledge produced therefore comes from the professional context in which the need to solve concrete problems appears.
This idea is in line with the view that ‘the collective competence of actors is based on the existence of networks that ensure knowledge sharing’ (Alter 2000, 267). Castro Goncalves (2011) illustrates this situation, highlighting that learning in an organization is supported by interactions between individuals as they face their tasks. While these studies illustrate the role of community in creating knowledge, the question remains of how a community can organize knowledge in the way it needs. In many cases, KOS are controlled at the organizational level by experts who follow a ‘for use’ approach (Folcher 2015). These systems are set up, developed and mediated by experts who are sometimes quite far from the operational situation. The experts design these systems with a ‘for use’ logic before deploying them to the actors, instead of developing them with an ‘in use’ (ibid.) approach, in which end users’ usage patterns are incorporated into the developed software. The ‘in use’ approach is thus a source of legitimization for the system and a way for the community to share with the rest of the organization its view of how it organizes knowledge. This literature review leads to the following research question: how can the development of a hybrid KOS by its future users be seen as an interface at various levels in the organization?

3.0 Context of the research and methodology

In order to investigate this question, we chose a case study approach within an organization. The data collected allowed us to document the progressive development of a new KOS. They result from one year of participant observation (Soulé 2007). Other data were collected during workshops aimed at identifying the expectations of future users more precisely.
Participant observation (i.e., daily observations) and workshops (which make it possible to focus a group’s attention on a specific question, for instance on the kind of knowledge they use) are complementary ways of adopting an “insider” position close to the protagonists studied. This approach allows us to examine how knowledge is socialized, produced and organized and, in the context of the interface question, how situated knowledge is connected within a community and, more broadly, across the whole organization. In this way, with regard to our research question, we can apprehend the choices and views of the actors more comprehensively, for instance by taking into consideration political aspects or the relationship between individuals and management.

The company ‘Alpha’, which operates throughout France, is a major company in the energy sector according to the INSEE nomenclature. The population of our study (known as the ‘Territorials’, around 45 individuals) is located in the Île-de-France region (IDF) and works in public relations positions. We can describe the ‘Territorials’ as a community of practice (Wenger 1998) since they share codes and routines that are part of the collective’s social practices (Coulon 2002); they have thus internalized the values conveyed by this community and its system of representation. Quarterly meetings, gatherings and seminars organized during the year are opportunities for them to meet, exchange and, at a deeper level, strengthen their capacity to produce practices. Their use of systems is limited to tools such as e-mail, telephone calls or ‘business’ software, without any specific space for organizing the knowledge of this community or facilitating its sharing. The knowledge they produce in their work with local actors is kept individually by employees, or sometimes recorded in text documents or even in Excel spreadsheets.

4.0 Results

The ‘Territorials’ work with a logic of ‘actionable knowledge’, i.e. knowledge intended to produce action and effects (Argyris 2003), which the business software they used before (called @T) does not support. The ‘Territorials’ never approved the use of @T; it was imposed by the national management as a case tracking tool. Since its implementation in 2012, the ‘Territorials’ have been constantly developing other systems that better meet their needs for information and project monitoring. The ‘Territorials’ requested the construction of an information system that would allow them to create links between their data, actors, projects and territories in order to have actionable knowledge. Progressively, the participants in the workshops noted that the observations regarding the system could not be carried out independently of those concerning their practices. They even had an official mission letter from a member of their steering committee to pursue this aim. Alongside the technical part, the associated approach was reviewed to identify current practices. Far from a simplistic approach of technical or innovation determinism, the technical system is built through practice and use and goes beyond technical aspects: through its mediating action, it brings about a set of translations that build a sociotechnical ecosystem that has not yet stabilized (Hoareau 2014). It is on the basis of this observation that the ‘Territorials’ developed an entity-association or entity-relationship scheme, as proposed by Chen (1976). This is how the KO dimension emerged in the project. This work on the system is essential, since an information system is above all a symbolic system of representation (Bélisle 2002). This symbolic system is mobilized here, and we had to present and model it through the entity-relationship scheme and through the ‘Territorials’ own view of their profession, so that this view could be implemented in the final product.
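The request of the ‘Territorials’ to link their data, actors, projects and territories, modelled as an entity-relationship scheme, can be illustrated with a small typed graph. The Python sketch below is hypothetical throughout. The entity types, the relation names (involvedIn, activeIn, locatedIn) and the sample records are invented for illustration. The prototype described in this paper stores its semantic network in the Neo4j graph database, not in Python dictionaries. The query shows how such links could support faster identification of the right interlocutors for a project, one of the benefits the participants reported.

```python
from collections import defaultdict


class SemanticNetwork:
    """Typed entities linked by named, directed relations (hypothetical schema)."""

    def __init__(self):
        self.entity_type = {}               # entity name -> entity type
        self.out_edges = defaultdict(list)  # source -> [(relation, target)]
        self.in_edges = defaultdict(list)   # target -> [(relation, source)]

    def add_entity(self, name, entity_type):
        self.entity_type[name] = entity_type

    def relate(self, source, relation, target):
        """Record a relation; both entities must already exist."""
        for name in (source, target):
            if name not in self.entity_type:
                raise KeyError(f"unknown entity: {name}")
        self.out_edges[source].append((relation, target))
        self.in_edges[target].append((relation, source))

    def sources(self, target, relation, entity_type):
        """Entities of `entity_type` linked to `target` by `relation`."""
        return {s for r, s in self.in_edges.get(target, [])
                if r == relation and self.entity_type[s] == entity_type}

    def interlocutors_for(self, project):
        """Actors involved in the project or active in one of its territories."""
        actors = self.sources(project, "involvedIn", "Actor")
        for relation, territory in self.out_edges.get(project, []):
            if relation == "locatedIn":
                actors |= self.sources(territory, "activeIn", "Actor")
        return sorted(actors)


# Hypothetical sample data.
net = SemanticNetwork()
for name, kind in [("Mayor of Town A", "Actor"),
                   ("Local association B", "Actor"),
                   ("Regional agency C", "Actor"),
                   ("Town A", "Territory"),
                   ("Project P", "Project")]:
    net.add_entity(name, kind)
net.relate("Project P", "locatedIn", "Town A")
net.relate("Mayor of Town A", "activeIn", "Town A")
net.relate("Local association B", "involvedIn", "Project P")

print(net.interlocutors_for("Project P"))
# ['Local association B', 'Mayor of Town A']
```

Storing both outgoing and incoming edges keeps the two-step lookup (project to territory to actor) cheap. A graph database such as Neo4j would express the same traversal as a pattern query over stored relationships.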
The system is thus composed of representations designed and interpreted by the ‘Territorials’ (business classifications, integration of their processes, etc.) and links collective and/or individual actions through a technological base. Given their needs and the solution developed to meet them in practice, the KOS is a hybrid one that aggregates a terminology (in order to have a common vocabulary among them), a knowledge base (in which they can preserve their knowledge) and a semantic network. The semantic network is inspired by linked data but simplified through the use of the graph database Neo4j, which stores elements and the semantic relations created between them. In the user interface, the semantic network looks like a topic map, allowing users to ‘navigate’ through their knowledge and their concepts/entities (Figure 1).

Figure 1: Illustration of the use of the semantic relation between data (prototype of the information system)

All the elements aggregated in the final hybrid KOS are interoperable in order to facilitate communication between the components and with the other information systems in the organization. The results at the individual, community and organizational levels are summarized in Table 1.

Table 1: Comparison between individual, community and organizational level using the information system with hybrid KOS

Individual level | Community level | Organizational level
Save time with faster identification of the right interlocutors for a project. | Harmonization of practices and vocabularies between IDF ‘Territorials’. | Implementation of knowledge continuity.
Simplified information tracking and fewer disparate information flows. | Construction of a sociotechnical system so as to become a virtual community of practice. | Use of a new technology within the organization: graph-oriented databases.
Consideration of their requests and recognition of the specificities of their activities. | Work of reflection on their job and the way it is done. | Enhancement of the organization’s information assets in order to make it more efficient.
Deduction of new information or knowledge through graphic modeling, which supports the analysis of ‘Territorial’ data. | Possibility to do more collaborative work. By adding data and information via the online tool, they enrich their collective knowledge heritage, which can then be consulted and used by all. | Valuing the members of the organization by showing consideration for their needs or expectations.
Capturing weak signals and setting up inductive logic from the visualization. | Production of values, practices and benchmarks for community members. | Development of an interface between various actors with a common need for knowledge.

The system clearly plays an interface role at various levels, and the results of its actions depend on the level considered.

5.0 Discussion

Designing the information system around the users’ uses and practices, and questioning their profession, appears to be an approach that limits the risk of rejection of a hybrid KOS and brings it closer to operational reality. From an organizational point of view, the development of a specific information system shared by the ‘Territorials’ raises questions about the move from their extremely individualistic culture (for instance, the address book is a work tool that also serves their careers) to a more collaborative culture. Moreover, in this case, the creation of a shared vocabulary in the hybrid KOS is an opportunity to reinforce their common culture and to create a bridge between this community and the organization to spread knowledge. By building their KOS, the ‘Territorials’ also carry out important analytical work on their knowledge and their knowledge needs.
To illustrate this, during one workshop the question arose of how to preserve knowledge when someone leaves. When a member of the community leaves, the loss of his or her knowledge can be detrimental to the organization. A mechanism to ensure continuity and allow the community to fulfill its mission should therefore be considered as part of a ‘knowledge continuity’ (KC) strategy (Ermine 2010; Biron and Hanuka 2015). It is to alleviate this situation, but also to harmonize the members’ practices and strengthen the sense of belonging to this group (Ellison, Steinfield, and Lampe 2007), that an information system was developed on their initiative and with them, to transform this community of practice into a fully fledged virtual community of practice (Tessier, Bourdon, and Kimble 2014). Thanks to the comprehensive approach allowed by participant observation, we can consider this hybrid KOS to be an interface at various levels:
- It is an interface between the members of the community of practice that, first of all, enhances the possibility of sharing knowledge. The development of a common controlled terminology or of a semantic network is useful to reinforce the community of practice and harmonize practices. Moreover, it is also an information system that gives them a clear-cut idea of their work and of the knowledge they need in their practice.
- It is an interface between the community and the organization (and therefore other employees). In this specific case, as the community develops its own KOS, its members formalize the way they organize knowledge and how they view the environment in which they work. Furthermore, the use of a technological system is an opportunity to reinforce interoperability between their system and the rest of the organization’s information system, for instance through an API (application programming interface) or by sharing the terminology used.
- Finally, it can be considered as an interface between the community and future (or new) members, facilitating their integration into the community by giving them access to a shared language and shared knowledge. In this way, the KOS enhances knowledge continuity for as long as the community exists.

This situation underscores the importance of a dual evolution of the system and the user: the user must adapt within a framework of co-evolution, in which adjustments are made both by the user and to the technical system for optimal operation (Bourguin and Derycke 2005). By authorizing employees to develop their information system, the organization’s hierarchy delegates its ability to control and organize their knowledge. With the new system, the ‘Territorials’ found themselves in a situation where they had to produce new explicit knowledge and, furthermore, consider how to organize it in order to use it efficiently. The system was designed by its future users for collective use, in particular with regard to the knowledge related to their missions.

6.0 Conclusion

This article focused on the building, by its future users, of an information system that is a hybrid KOS (including a shared vocabulary, a knowledge base and a semantic network). The ‘Territorials’ carried out analytical work on their knowledge, their knowledge needs and the way they can organize knowledge in order to fulfill their missions. This KOS is thus an interface between all the ‘Territorials’ who are going to use it, and even with the future members of this community. By building the KOS themselves, they aim to ensure that it reflects their uses. In this configuration, the organization in fact allowed them to create an interface between this community of practice and the rest of the employees.
Throughout this article, we have highlighted the interface dimension of KO by examining the effect that developing an information system as a hybrid KOS has on a community of practice and, by extension, on an organization. We thus consider the interface dimension not only as the user interface (what users see when they use the software), but also as a way of connecting various kinds of users in an organization. It is crucial to highlight this dimension as an essential step in KOS design. Moreover, this example also gave us the opportunity to see how a community is able to design its own KOS and, in doing so, build its relations with the whole company.
7.0 Limitations
At this stage of experimentation, the project is operational on the scale of a single region, but the national management, which supervises the activity of the ‘Territorials’, envisages extending it to all regions if the results in the IDF are convincing. However, as we shall see in the second practical example, regional specificities are sometimes so marked that strong disparities arise, closely linked to the local context.
References
Allard, Suzie. 2004. “Knowledge Creation.” In Handbook on Knowledge Management 1. International Handbooks on Information Systems, vol. 1, edited by Clyde W. Holsapple. Berlin, Heidelberg: Springer, 367–379.
Alter, Norbert. 2000. L’Innovation Ordinaire. Paris: PUF.
Argyris, Chris. 2003. Savoir pour Agir. Paris: Dunod.
Beau, Francis. 2012. “L’Organisation des Connaissances au Cœur de la Démarche Scientifique. Organiser une Mémoire pour Comprendre et Savoir, Puis Agir et Décider avec Sagesse.” Études de Communication 39: 77–103.
Bélisle, Claire. 2002. “Médiatiser L’Apprentissage Aujourd’hui.” In Médiation, Médiatisation et Apprentissages, edited by Marie-José Barbot and Thierry Lancien. Lyon: EA 2534 Plurilinguisme et apprentissages, École normale supérieure lettres et sciences humaines, 21–33.
Bibikas, Dimitris, Dimitrios Kourtesis, Iraklis Paraskakis, Ansgar Bernardi, Leo Sauermann, Dimitris Apostolou, Gregoris Mentzas, and Ana Cristina Vasconcelos. 2008. “Organisational Knowledge Management Systems in the Era of Enterprise 2.0: The Case of Organik.” In CEUR Workshop Proceedings, edited by Dominik Flejter, Slawomir Grzonkowski, Tomasz Kaczmarek, Marek Kowalkiewicz, Tadhg Nagle, and Jonny Parkes. Innsbruck, 45–53.
Biron, Michal and Hagar Hanuka. 2015. “Comparing Normative Influences as Determinants of Knowledge Continuity.” International Journal of Information Management 35, no. 6: 655–61.
Bourguin, Grégory and Alain Derycke. 2005. “Systèmes Interactifs en Co-évolution.” Revue des Interactions Humaines Médiatisées 6, no. 1: 1–29.
Broughton, Vanda, Joacim Hansson, Birger Hjørland, and Maria J. López-Huertas. 2005. “Knowledge Organization.” In European Curriculum Reflections on Library and Information Science Education, edited by Leif Lörring and Leif Kajberg. Copenhagen: Royal School of Library and Information Science, 133–148.
Castro Goncalves, Luciana. 2011. “Construire L’action Collective dans L’interaction entre Projet et Communauté de Pratique dans des Contextes Complexes.” Humanisme et Entreprise 304: 37–56.
Chen, Le and Patrick S.W. Fong. 2015. “Evaluation of Knowledge Management Performance: An Organic Approach.” Information & Management 52, no. 4: 431–453.
Chen, Peter Pin-Shan. 1976. “The Entity-Relationship Model: Toward a Unified View of Data.” ACM Transactions on Database Systems 1, no. 1: 9–36.
Coakes, E.W., J.M. Coakes, and D. Rosenberg. 2008. “Co-Operative Work Practices and Knowledge Sharing Issues: A Comparison of Viewpoints.” International Journal of Information Management 28, no. 1: 12–25.
Coulon, Alain. 2002. L’ethnométhodologie. Paris: Presses Universitaires de France.
DeTienne, Kristen Bell, Gibb Dyer, Charlotte Hoopes, and Stephen Harris. 2004. “Toward a Model of Effective Knowledge Management and Directions for Future Research: Culture, Leadership, and CKOs.” Journal of Leadership & Organizational Studies 10, no. 4: 26–43.
Drucker, Peter F. 1969. The Age of Discontinuity. Guidelines to our Changing Society. New York: Routledge.
Ellison, Nicole B., Charles Steinfield, and Cliff Lampe. 2007. “The Benefits of Facebook ‘Friends:’ Social Capital and College Students’ Use of Online Social Network Sites.” Journal of Computer-Mediated Communication 12, no. 4: 1143–1168.
Ermine, Jean-Louis. 2010. “Introduction to Knowledge Management.” In Trends in Enterprise Knowledge Management, edited by Imed Boughzala and Jean-Louis Ermine. London: ISTE, 21–43.
Folcher, Viviane. 2015. “Conception pour et dans L’usage: La Maîtrise D’usage en Conduite de Projet.” Revue des Interactions Humaines Médiatisées 16, no. 1: 39–60.
Fujita, Mariângela and Lena Vania Ribeiro Pinheiro. 2016. “Epistemology as a Philosophical Basis for Knowledge Organization Conceptions.” In Knowledge Organization for a Sustainable World: Challenges and Perspectives for Cultural, Scientific, and Technological Sharing in a Connected Society. Proceedings of the Fourteenth International ISKO Conference 27–29 September 2016, Rio de Janeiro, Brazil, edited by José Augusto Chaves Guimarães, Suellen Oliveira Milani, and Vera Dodebei. Advances in Knowledge Organization 15. Würzburg: Ergon, 29–35.
Guyot, Brigitte. 2000. Les Dynamiques Informationnelles. Grenoble: Université Stendhal Grenoble.
Hachour, Hakim. 2011. “Epistémologies Socio-Sémiotiques et Communication Organisante: La Coproduction de Sens Comme Moteur de L’organisation.” Communication et Organisation 39: 195–209.
Hassel, Lewis. 2007. “A Continental Philosophy Perspective on Knowledge Management.” Information Systems Journal 17, no. 2: 185–195.
Hoareau, Émilie. 2014. Capital Sociotechnique et Innovation: Le Cas du Réseau Qualireg.
Saint-Denis: La Réunion.
Kaschig, Andreas, Ronald Maier, and Alexander Sandow. 2016. “The Effects of Collecting and Connecting Activities on Knowledge Creation in Organizations.” Journal of Strategic Information Systems 25, no. 4: 243–258.
Lemke, Jay L. 1995. Textual Politics: Discourse and Social Dynamics. London: Taylor & Francis.
Mahé, Sylvain, Benoît Ricard, Philippe Haik, Antonietta Folino, and Noémie Musnik. 2010. “Gestion des Connaissances et Systèmes D’organisation de Connaissances: Premier Modèle et Retours D’expérience Industriels.” Document Numérique 13, no. 2: 57–73.
Martínez-Ávila, Daniel. 2015. “Knowledge Organization in the Intersection with Information Technologies.” Knowledge Organization 42, no. 7: 486–498.
Martinez-Gil, Jorge. 2015. “Automated Knowledge Base Management: A Survey.” Computer Science Review 18: 1–9.
Mazzocchi, Fulvio. 2018. “Knowledge Organization System (KOS).” Knowledge Organization 45: 54–78. Also available in ISKO Encyclopedia of Knowledge Organization, edited by Birger Hjørland and Claudio Gnoli.
Merali, Y. 2000. “Individual and Collective Congruence in the Knowledge Management Process.” The Journal of Strategic Information Systems 9, nos. 2–3: 213–234.
Nonaka, Ikujiro and Hirotaka Takeuchi. 1995. The Knowledge-Creating Company: How Japanese Companies Create the Dynamics of Innovation. New York: Oxford University Press.
Puente-Rodríguez, Daniel, A.P. (Bram) Bos, and Peter W.G. Groot Koerkamp. 2019. “Rethinking Livestock Production Systems on the Galápagos Islands: Organizing Knowledge-Practice Interfaces Through Reflexive Interactive Design.” Environmental Science and Policy 101: 166–174.
Soulé, Bastien. 2007. “Observation Participante ou Participation Observante? Usages et Justifications de la Notion de Participation Observante en Sciences Sociales.” Recherches Qualitatives 27, no. 1: 127–140.
Tessier, Nathalie, Isabelle Bourdon, and Chris Kimble. 2014.
“Participer à une Communauté de Pratique Virtuelle: Retours D’expériences dans une Multinationale de L’ingénierie.” Recherches en Sciences de Gestion 100, no. 1: 121–140.
Wenger, Etienne. 1998. Communities of Practice. Cambridge: Cambridge University Press.
Yang, Chen-Wei, Shih-Chieh Fang, and Julia L. Lin. 2010. “Organisational Knowledge Creation Strategies: A Conceptual Framework.” International Journal of Information Management 30, no. 3: 231–238.
Zacklad, Manuel and Alain Giboin. 2010. “Systèmes d’organisation des Connaissances Hétérogènes pour les Applications Documentaires.” Document numérique 13, no. 2: 7–12.
Philip Hider – Charles Sturt University, Australia
Fiction Genres in Library Catalogues and Social Cataloguing Sites
Abstract
Samples of fiction genres both represented and not represented in the Library of Congress Genre/Form Terms (LCGFT) were compared with respect to their usage on the social cataloguing site LibraryThing. It was found that the non-LCGFT genres, mostly based on entries in Wikipedia, were markedly more used than the LCGFT genres. A particular feature of many of the non-LCGFT genres was an element of affect, relatively lacking in the LCGFT sample. It is suggested that library cataloguing may remain reluctant to fully embrace this aspect of genre, and of creative works such as fiction, and that this reluctance may be due in part to the traditional, modernist paradigm of the cataloguer as a gatekeeper to objects rather than as a facilitator of the experiences and feelings that those objects may provide.
1.0 Introduction
The study and description of ‘genre’ has been relatively neglected in knowledge organization (KO), compared with resource attributes such as ‘subject’ (Lee and Zhang 2013). Yet for a wide range of works, including creative works, the concept of genre is central to the way in which they are described, and thus conceptualized, by both creators and consumers.
Recently, the Library of Congress developed its Library of Congress Genre/Form Terms for Library and Archival Materials (LCGFT) to help address this gap in library cataloguing (Young and Mandelstam 2013). This paper assesses the extent to which there may still be gaps in LC’s coverage of genre by comparing the use of samples of LCGFT and non-LCGFT fiction genres on the social cataloguing site LibraryThing. The paper goes on to discuss why missing genres may not have been included in the library vocabulary, and the relationship between professional and commercial genre classifications and the genre folksonomies found on social cataloguing sites. This discussion draws, in particular, on the critique of the modernist paradigm in library cataloguing and classification by scholars such as Mai (2011), on the work of scholars such as Spiteri and Pecoskie (2017) in pointing out the importance of affect in fiction access, and on the theoretical framework of art classification originally developed by DiMaggio (1987).
2.0 Literature review
Zhang and Olson (2015) have explored the ways in which genres exhibit qualities of both ‘essences’ and ‘contexts’ in library cataloguing and beyond. On the one hand, genres involve essences that provide stability; on the other, they relate to contexts that are fluid. Noting that ‘genre has long been a source of uncertainty and unease in bibliographic control’, the authors consider genre to be ‘an integration of aboutness, of-ness and is-ness’ (Zhang and Olson 2015, 540, 550). The meaning of specific genres becomes even more complex when it is viewed as being in an ongoing state of negotiation between different protagonists, such as creators, audiences and intermediaries (Tudor 2012). The influence of different interests and groups on art classification has likewise been included in the theoretical framework originally proposed by DiMaggio (1987), in which ‘ritual’ classifications are shaped by commercial, professional and administrative inputs.
In the framework, these classifications vary across the dimensions of differentiation, hierarchy, universality and boundary strength. While the conceptualization of artistic genres has tended to be viewed as more heavily influenced by commercial interests than by, for example, the professional views of critics, Brown (2015) notes the exception of ‘feel good’ movies, a category, with sometimes positive and sometimes negative connotations, that has followed the lead of film critics rather than film distributors. Although genres have for many decades played an important role in the everyday description of creative works, their use in library cataloguing and other KO practice has been less prominent, with far more attention given to describing books and other materials in terms of ‘subject’ (Lee and Zhang 2013). Thus there has been a long debate around the concept of ‘aboutness’, but far less attention paid to what it means for a novel, for example, to ‘belong’ to a particular genre category. Nevertheless, formal controlled vocabularies for the description of genres do exist: over the past decade, the Library of Congress has developed a separate list of headings for genres and forms covering a wide range of materials, including fiction. The LCGFT is primarily based on the genre and form subdivisions previously established in the Library of Congress Subject Headings (LCSH) (Young and Mandelstam 2013). The LCGFT defines genres as ‘categories of works that are characterized by similar plots, themes, settings, situations, and characters’ (Library of Congress 2018, 3), and this definition has been adopted for the purposes of this paper. For inclusion in LCGFT, terms require ‘literary warrant’, that is, they should be based on items being catalogued, although the terms also need to be justified with reference to authoritative sources (Library of Congress 2018).
Outside of librarianship, another formal classification system that includes fiction genres is BISAC, developed for and by the book industry. The classification is not as deep as those typically used in library cataloguing, but it is influential in the book world. Because of its commercial roots, its adoption by (some) libraries has been criticized by Martínez-Ávila (2016). Meanwhile, fiction experts and enthusiasts have the opportunity to contribute to the description of genres on Wikipedia, which divides ‘genre fiction’ into the main genres of 1) crime, 2) fantasy, 3) romance, 4) science fiction, 5) Western, 6) inspirational and 7) horror. The general reading public are also able to ‘tag’ their personal collections with their own genre terms on social cataloguing sites such as LibraryThing and Goodreads. When aggregated, these terms form a folksonomy (Rafferty 2018). Social tagging environments such as LibraryThing have been studied extensively from a KO perspective (e.g. Johansson and Golub 2019; Vaidya and Harinarayana 2016; Voorbij 2012; Bates and Rowley 2011; Lu, Park, and Hu 2010; Adler 2009). Their ‘democratization’ of access provision and KO practice has been championed by scholars such as Mai (2011), who have characterised traditional library cataloguing and classification as modernist and objectivist, with the cataloguer describing and classifying materials according to a single, ‘authoritative’ perspective. Pando and Almeida (2016) have shown how this approach is now being challenged across the field of KO, with a postmodern viewpoint coming to the fore due to the pervasiveness of online technologies that support activities such as social tagging. While social cataloguing and social bookmarking generate many tags that describe personal relationships with resources (e.g. ‘to read’), many more describe aspects that are, or could be, relevant to other users (Heymann, Paepcke, and Garcia-Molina 2010).
Stover (2009) has noted the opportunity these sites provide for classifying creative works by affect as well as by subject. Spiteri and Pecoskie (2017) followed up by developing an ‘affect’ taxonomy to cover emotions, tones and association, for use in readers’ advisory services, in this case based on the literature and existing schemes. Social cataloguing sites with large user communities also provide a good opportunity to gauge ‘user warrant’, defined here as the justification for including terms and concepts in an indexing vocabulary on the basis of their likely use by prospective users of that vocabulary. User warrant is often contrasted with other major forms of warrant recognised in KO, including literary warrant (mentioned earlier), expert warrant and cultural warrant (Hider 2015). Svenonius (2000, 135) argues that ‘literary warrant is a necessary but not sufficient basis for admitting terms into the vocabulary of a subject language. This is because there is no guarantee that the vocabulary of those who create the literature of a discipline will match the vocabulary of those who search for it.’ As such, user warrant is a commonly accepted basis for the development of schemes and vocabularies (Hider 2015). Sometimes user warrant is also distinguished from ‘use warrant’, though at other times the terms are used interchangeably (Martínez-Ávila and Budd 2017). In this paper, they are defined and distinguished operationally, as described in the Method section. It should also be noted that the literature review did not identify any other study of the social tagging of genres specifically.
3.0 Research questions
This project aims to explore the following research questions: (1) Are there fiction genres with relatively high levels of general use and user warrant that are not covered by the LCGFT? (2) If so, what is the nature of these genres? (3) Broadly, why are libraries not describing fiction in terms of these genres?
4.0 Method
The study devised an index of use and user warrant that could be applied in the context of a readily accessible social cataloguing site, namely LibraryThing. A purposive sample of twelve genre terms not included in LCGFT (i.e. as neither preferred nor non-preferred terms) was derived from Wikipedia and, in one case (‘pulp’), from the Ebay search interface. The sample was selected on the basis that the terms would generally not be used as subjects when applied to fiction, as per the distinction made by the Library of Congress between subjects and genres (Young and Mandelstam 2013). All seven of Wikipedia’s divisions of genre fiction were represented in the sample, as were genres described as crossing over different divisions. The twelve genre terms, as well as works cited in Wikipedia and Ebay as examples of the genres, were searched in the Library of Congress catalog. Literary and collection warrant could be found in all cases. The sample genres were: biopunk, chick lit, dieselpunk, dying Earth, gaslamp, grimdark, hardboiled, inspirational, northwestern, pulp, splatterpunk, and weird. The sample did not allow for a general comparison between LCGFT and non-LCGFT genre tags in LibraryThing, but it did allow for an exploratory study of possible gaps in LCGFT. The sample terms were then searched in the LibraryThing interface, which hosts approximately 155M tags used by roughly 2.3M subscribers. This makes LibraryThing the world’s biggest social cataloguing site, and it was deemed representative of the way in which the public at large describe works of fiction (while acknowledging certain biases, including one toward the English language). For each sample term, the tags that included either the term exactly or a word-form variant of it (e.g. ‘biopunk’ and ‘bio-punk’), and that could have been used to represent a fiction genre, were identified.
For each of these tags, the number of works for which the tag was used, and the number of subscribers who used the tag, were recorded (LibraryThing collates different printings and editions of the same work, albeit imperfectly). This information was provided directly in the LibraryThing interface. Tags with variant terms were grouped together. As many tags have been used by thousands of different subscribers for thousands of different works, groups of tags used by fewer than ten subscribers altogether were discarded at this stage; many of these tags were personal in nature or contained typographical errors. The total numbers of works and users for the remaining tag groups were then adjusted in some cases for non-genre use, that is, where tags had been used in senses other than a fiction genre. For example, ‘biopunk’ was used as a tag for a book of poetry, while ‘inspirational’ had been used to describe non-literary works that were ‘inspirational’. These adjustments were based on samples of the ten works for which the tags had been most used. The final estimates of works and users were then summed for each of the twelve sample genres, providing an index of use warrant and user warrant respectively.
A sample of 20 LCGFT genres under the ‘Fiction’ heading was selected for comparison with the non-LCGFT sample. As the non-LCGFT genres are subgenres, the LCGFT terms representing the main genres listed on Wikipedia were excluded. Also excluded were LCGFT terms that did not include the word ‘fiction’, as these tended to be ‘forms’, like ‘short stories’, which fall outside the LC definition of genre given above, or in a few cases were non-English terms for non-English-language material. Further, LCGFT terms listed more than one level below the ‘Fiction’ heading were not used. The selection from the remaining LCGFT genres was made at random.
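The grouping, filtering and summing steps described above can be sketched in Python. This is a minimal illustration, not the author’s procedure: the tag records and their counts are invented, the normalization rule (lowercasing and removing hyphens and spaces) is our assumption about what counts as a word-form variant, and the per-group `genre_share` factor is a hypothetical stand-in for the manual adjustment for non-genre use.

```python
from collections import defaultdict

# Invented tag records for illustration: (tag as entered, works tagged, users).
# These are NOT LibraryThing figures.
RAW_TAGS = [
    ("biopunk", 120, 130),
    ("bio-punk", 12, 14),
    ("Biopunk", 3, 3),
    ("biopunk fiction", 8, 11),  # qualified term: forms its own group
    ("biopnk", 2, 2),            # typo: forms a tiny group that gets discarded
]


def normalize(tag: str) -> str:
    """Collapse word-form variants (case, hyphens, spaces) to one group key.

    Assumption: this simple rule stands in for the manual judgement of
    which tags are variants of the same term."""
    return tag.lower().replace("-", "").replace(" ", "")


def group_tags(records, min_users=10):
    """Sum works and users per variant group; drop groups below min_users."""
    works, users = defaultdict(int), defaultdict(int)
    for tag, n_works, n_users in records:
        key = normalize(tag)
        works[key] += n_works
        users[key] += n_users
    return {k: (works[k], users[k]) for k in works if users[k] >= min_users}


def warrant_index(groups, genre_share=None):
    """Add up the (optionally adjusted) group totals for one genre.

    genre_share maps a group key to the estimated fraction of its use that
    refers to the fiction genre (hypothetical; defaults to 1.0). Group totals
    are simply added, so overlaps between groups are not removed."""
    share = genre_share or {}
    use = sum(round(w * share.get(k, 1.0)) for k, (w, _) in groups.items())
    user = sum(round(u * share.get(k, 1.0)) for k, (_, u) in groups.items())
    return use, user


groups = group_tags(RAW_TAGS)
print(groups)
print(warrant_index(groups))
print(warrant_index(groups, {"biopunk": 0.8}))
```

In this toy run the typo variant ‘biopnk’ ends up in a group of only two users and is discarded, while ‘bio-punk’ and ‘Biopunk’ are merged into the ‘biopunk’ group before the ten-user threshold is applied.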
For each of the sample LCGFT genres, the process described above was repeated, except that in this case all of their non-preferred, as well as preferred, terms, as listed in LCGFT, were included as variants to be searched in the LibraryThing interface. The estimated numbers of works and users for the non-LCGFT and LCGFT genres were then compared. Finally, the most tagged works for each of the twelve non-LCGFT genres were searched in the OCLC WorldCat database, which comprises bibliographic records used in vast numbers of library catalogues around the world. For the top ten works found in WorldCat, the subject and genre headings used in the most prominent edition of each were recorded and analysed for semantic overlap with the sample genre. The headings were also analysed for their overlap across the ten works.
5.0 Findings
The twelve non-LCGFT genres were all represented with tags in LibraryThing. Six of the genres (biopunk, dieselpunk, dying Earth, grimdark, northwestern and splatterpunk) were represented by just one group of variant tags used by at least ten users, and the other six by several tag groups with ten or more users; the genre with the most tag groups was chick lit, with 15 groups. For some genres with multiple tag groups, however, one or two of the groups were far more used than the others. Generally, the most used terms were those consisting of just the genre term itself (e.g. ‘hardboiled’), but terms qualified with ‘fiction’ or the parent genre(s) (e.g. ‘hardboiled detective’ or ‘hardboiled fiction’) were in some cases also quite often used. Of the 71 tag groups with ten or more users, across the whole sample of non-LCGFT genres, a majority (42/62 = 68%) appeared to be used virtually exclusively to represent the fiction genre, but about a third (20/62 = 32%) were also used for other concepts to varying degrees.
Not surprisingly, the two genres whose tags exhibited the most mixed use were those with terms that had other common meanings, namely ‘inspirational’ and ‘weird’. The four other genres with polysemic terms were gaslamp, pulp, northwestern and splatterpunk. In some of these cases, the different meaning still related to the fiction genre, but covered the genre more broadly or another literary form (e.g. poetry). One might have expected a larger number of tag groups per LCGFT genre than per non-LCGFT genre, as the non-preferred LCGFT terms were also searched in LibraryThing. However, whereas the twelve non-LCGFT genres yielded 71 tag groups, the 20 LCGFT genres yielded just 69. Apart from the possibility of different distributions of degrees of ambiguity across these groups, a likely explanation is simply that the LCGFT genres were used less overall in LibraryThing than the non-LCGFT genres. As with the non-LCGFT genres, some of the LCGFT genres yielded just one tag group and others several; likewise, while a majority of the LCGFT tag groups were used virtually exclusively for the fiction genre, some (18/69 = 26%) were also used for other concepts, related or otherwise. Tables 1 and 2 list the estimated number of works in LibraryThing that were tagged with each of the non-LCGFT and LCGFT genres respectively at the time of the study. Overall, the non-LCGFT sample was clearly used for many more works than the LCGFT sample. Of the six genres with tags applied to more than 10,000 works, four were non-LCGFT, despite the smaller sample size. The top genre was chick lit, followed by magic realist fiction and pulp. At the other end of the scale, five LCGFT genres (25% of the sample) had fewer works than the lowest non-LCGFT genre.
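The counts and thresholds reported in this section can be recomputed from the figures in Tables 1 to 4. The Python sketch below does so with the standard library only; the dictionary and function names are ours, ties are given average ranks when computing Spearman’s coefficient, and ‘fewer works than the lowest non-LCGFT genre’ is read as a strict comparison.

```python
from statistics import median

# (works, users) per genre, transcribed from Tables 1-4.
NON_LCGFT = {
    "Chick Lit": (74781, 81056), "Pulp": (27310, 27639),
    "Inspirational": (21937, 21990), "Weird": (14414, 14676),
    "Hardboiled": (8179, 10038), "Gaslamp": (751, 776),
    "Grimdark": (652, 654), "Dying Earth": (646, 707),
    "Splatterpunk": (225, 230), "Biopunk": (200, 202),
    "Dieselpunk": (139, 138), "Northwestern": (52, 52),
}
LCGFT = {
    "Magic realist fiction": (35345, 37384),
    "Legal fiction (Literature)": (10157, 11054),
    "Campus fiction": (949, 1050), "Philosophical fiction": (876, 1003),
    "Utopian fiction": (829, 1335), "Didactic fiction": (488, 490),
    "Road fiction": (371, 211), "Mythological fiction": (352, 250),
    "Social problem fiction": (344, 445), "Picaresque fiction": (171, 264),
    "Martial arts fiction": (153, 22), "Easter fiction": (102, 108),
    "Prison fiction": (73, 31), "Pastoral fiction": (62, 102),
    "Bisexual fiction": (55, 18), "Fishing fiction": (34, 19),
    "Transgender fiction": (32, 11), "Hunting fiction": (24, 14),
    "Samurai fiction": (24, 49), "Nonsense fiction": (0, 0),
}
WORKS, USERS = 0, 1


def column(table, idx):
    """Extract the works (idx=0) or users (idx=1) column of a table."""
    return [row[idx] for row in table.values()]


def below_median(sample, reference, idx):
    """Use the median of `reference` as a threshold; list `sample` genres under it."""
    threshold = median(column(reference, idx))
    return threshold, [g for g, row in sample.items() if row[idx] < threshold]


def average_ranks(values):
    """Rank values from largest (rank 1) downwards, averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho computed as the Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


lowest = min(column(NON_LCGFT, WORKS))
n_below = sum(w < lowest for w in column(LCGFT, WORKS))
print(f"LCGFT genres below lowest non-LCGFT: {n_below}/{len(LCGFT)}")
print("Use-warrant threshold:", below_median(NON_LCGFT, LCGFT, WORKS))
print("User-warrant threshold:", below_median(NON_LCGFT, LCGFT, USERS))
for name, table in (("non-LCGFT", NON_LCGFT), ("LCGFT", LCGFT)):
    rho = spearman(column(table, WORKS), column(table, USERS))
    print(f"Spearman rho (works vs users), {name}: {rho:.3f}")
```

With 20 LCGFT genres the median is the mean of the 10th and 11th values, so the works threshold is (171 + 153) / 2 = 162 and the users threshold is (211 + 108) / 2 = 159.5, which the text rounds to 160.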
If the median number for the LCGFT sample (162) were deemed the threshold for ‘use warrant’, then all but two of the non-LCGFT genres (dieselpunk and northwestern) would have a strong case for inclusion in LCGFT. Tables 3 and 4 list the estimated number of users in LibraryThing who had tagged works with each of the non-LCGFT and LCGFT genres respectively at the time of the study. The distributions are similar to those of the numbers of works, and accordingly the non-LCGFT genres had clearly been used, overall, by many more LibraryThing members than the LCGFT genres. Again, chick lit comes out on top by a large margin. Of the seven genres with over 10,000 taggers, five are from the smaller non-LCGFT sample. If the median number of taggers for the LCGFT sample (160) were deemed the threshold for ‘user warrant’, then again all but two of the non-LCGFT genres would have a strong case for inclusion in LCGFT. One should also bear in mind that other taggers may well have used synonyms for the non-LCGFT terms, including for other works, which would increase the non-LCGFT numbers.
Table 1. Works with non-LCGFT fiction genre tags
Genre                        Works
Chick Lit                    74781
Pulp                         27310
Inspirational                21937
Weird                        14414
Hardboiled                    8179
Gaslamp                        751
Grimdark                       652
Dying Earth                    646
Splatterpunk                   225
Biopunk                        200
Dieselpunk                     139
Northwestern                    52
Table 2. Works with LCGFT fiction genre tags
Genre                        Works
Magic realist fiction        35345
Legal fiction (Literature)   10157
Campus fiction                 949
Philosophical fiction          876
Utopian fiction                829
Didactic fiction               488
Road fiction                   371
Mythological fiction           352
Social problem fiction         344
Picaresque fiction             171
Martial arts fiction           153
Easter fiction                 102
Prison fiction                  73
Pastoral fiction                62
Bisexual fiction                55
Fishing fiction                 34
Transgender fiction             32
Hunting fiction                 24
Samurai fiction                 24
Nonsense fiction                 0
Table 3. Users tagging for non-LCGFT fiction genres
Genre                        Users
Chick Lit                    81056
Pulp                         27639
Inspirational                21990
Weird                        14676
Hardboiled                   10038
Gaslamp                        776
Dying Earth                    707
Grimdark                       654
Splatterpunk                   230
Biopunk                        202
Dieselpunk                     138
Northwestern                    52
Table 4. Users tagging for LCGFT fiction genres
Genre                        Users
Magic realist fiction        37384
Legal fiction (Literature)   11054
Utopian fiction               1335
Campus fiction                1050
Philosophical fiction         1003
Didactic fiction               490
Social problem fiction         445
Picaresque fiction             264
Mythological fiction           250
Road fiction                   211
Easter fiction                 108
Pastoral fiction               102
Samurai fiction                 49
Prison fiction                  31
Martial arts fiction            22
Fishing fiction                 19
Bisexual fiction                18
Hunting fiction                 14
Transgender fiction             11
Nonsense fiction                 0
The distributions in these tables are notable for several other reasons. First, there appears to be hardly any use or user warrant for nonsense fiction, despite its inclusion in LCGFT (or, if there is, LibraryThing taggers use a term not covered by LCGFT). Second, chick lit appears to have more use and user warrant than the 20 LCGFT genres put together, demonstrating how a well-used professional vocabulary can nevertheless be hugely at odds with ‘folk’ perspectives on particular works. Third, the exponential-like distributions show the extent to which genres, or at least fiction genres, can receive relatively massive amounts of use in preference to subgenres. Fourth, the indexes of ‘use warrant’ and ‘user warrant’ as constructed for this study are very strongly correlated, with Spearman’s rank correlation coefficients, computed from the rankings in Tables 1 to 4, of approximately 0.99 for the non-LCGFT order and 0.94 for the LCGFT order. The thematic analysis of headings in the WorldCat records for works tagged with the non-LCGFT terms did not reveal any clear substitute terms that were consistently used instead of these terms.
For the most part, however, the headings indicated that the tags used in LibraryThing did indeed represent the genres described in Wikipedia and, in the case of pulp, as used on Ebay. There was one very noticeable exception, namely ‘northwestern’, which appeared to be used to describe fiction of a quite different ilk from the ‘western’ set further north on the American continent, as described in Wikipedia. In some cases, for instance, the library headings pointed to settings in Russia. The wide range of headings found in the records for works tagged ‘inspirational’ and ‘weird’ also pointed to a ‘slippage’ in meaning from that described in Wikipedia, although in these cases it is less clear to what extent the difference is absolute or one of degree. Many of the headings for ‘weird’ did relate to ‘the other’ and alienation, while many of the headings for ‘inspirational’ could be considered emotive. The headings for the other nine genres are summarized as follows: chick lit works attracted relatively large numbers of headings, representing many different subjects and various categories of women; pulp fiction also had headings covering a wide variety of subjects, including fictitious characters, and sex- and crime-related topics; hardboiled fiction mostly had specific headings pertaining to characters, plot and place; gaslamp fiction had a large number of headings for classes of persons and some for broader genres, including science fiction; grimdark fiction also had headings for broader genres, as well as for a number of subjects with a military focus; dying Earth fiction had fewer subject headings, but headings for both fantasy and science fiction; splatterpunk fiction also had fewer headings, tending to cover subjects related to violence; biopunk had a wide range of subject headings, including some with a scientific slant; and dieselpunk fiction had headings for a range of ‘dark’ subjects, but lacked any obvious common thread.
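The check for headings shared across a genre’s ten WorldCat records, which is where a consistent substitute term would show up, can be approximated by a simple frequency count. This is a hypothetical illustration rather than the thematic analysis actually performed: the records and headings below are invented, and the `min_share` cut-off (a heading present in at least half of the records) is an assumed criterion.

```python
from collections import Counter

# Invented headings for a genre's most-tagged works (not real WorldCat data).
RECORDS = {
    "Work A": ["Genetic engineering", "Dystopias", "Science fiction"],
    "Work B": ["Genetic engineering", "Biotechnology", "Science fiction"],
    "Work C": ["Women scientists", "Thrillers (Fiction)"],
    "Work D": ["Science fiction", "Cloning"],
}


def shared_headings(records, min_share=0.5):
    """Headings carried by at least `min_share` of the records, most frequent first.

    Each heading is counted once per record, so a heading repeated within one
    record does not inflate its share. Ties are ordered alphabetically."""
    counts = Counter(h for headings in records.values() for h in set(headings))
    cutoff = min_share * len(records)
    hits = [(h, n) for h, n in counts.items() if n >= cutoff]
    return sorted(hits, key=lambda hn: (-hn[1], hn[0]))


print(shared_headings(RECORDS))
print(shared_headings(RECORDS, min_share=0.75))
```

Under this criterion a broad heading such as ‘Science fiction’ surfaces as shared, which resembles the pattern of broader-genre headings noted above for some genres rather than a consistent substitute for the genre term itself.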
Sampling the headings, and the works they describe, it is clear that many of the non-LCGFT tags represent fiction that may be about various subjects but that also, critically, invokes certain feelings, and does so, overall, more than the fiction tagged with the LCGFT genres, many of which indicate particular settings (e.g. prison fiction) or particular themes (e.g. social problem fiction) rather than particular reader experiences. Even ‘chick lit’ and ‘pulp’, although their more literal meanings do not represent feelings, have widely known connotations that include affect. Indeed, Fenkel (2019, 183) has argued that chick lit forms part of ‘a type of umbrella genre’ that she labels as ‘pleasure’, ‘constituted as an ephemeral archive that is translated into popular fiction when it is read as a history of feeling in public cultures.’ It seems that pulp fiction might also form part of such an umbrella genre. In summary, the sampled non-LCGFT genres can be regarded as, on average, more affective than the sampled LCGFT genres. Although the method used in this study does not account for synonyms used by LibraryThing taggers for the terms under analysis, and so strictly compares the use and user warrant of terms rather than concepts, it is nevertheless able to highlight, at the very least, deficiencies in language that could be of interest to vocabulary builders, and it could be extended to compare such deficiencies across multiple schemes. It can also be argued that most terms in most schemes would be the most used, or the only commonly used, term for their concept, so the index provides, even at the conceptual level, a reasonable ‘rule of thumb’ for the purposes of broad comparison.
6.0 Conclusion
The findings reported above strongly suggest that although libraries are describing fiction using some of the genres heavily used by the general population of fiction readers, they are also missing a considerable number of others.
The greater amount of user and use warrant for the Wikipedia and eBay genres than for the LCGFT genres, on average, would also suggest that the impact of library classification on everyday genre classification (at least with respect to fiction) is weak relative to other classifications, such as those of experts via Wikipedia and commercial classifications. Therefore, while it may be that both ‘professional’ and commercial classifications strongly impact art classifications as a whole, some of these classifications are likely to be considerably more impactful than others. The more heavily used genres in this study tend to connote a strong affective element. The importance of affect for fiction searching has been noted by authors such as Stover (2009) and Mikkonen and Vakkari (2016). It is very possible that this element has, in fact, worked against the inclusion of some genres in library vocabularies. In some cases, the connotations may be seen as derogatory, but, beyond this, recognition of affect may not have sat well with the modernist, essentialist paradigm of which Anglo-American library cataloguing has long been a part, with cataloguers viewing themselves as providers of access to information objects rather than as postmodern facilitators of access to materials that are engaged with perhaps primarily for emotional and subjective reasons. The modernist paradigm would therefore seem especially imperfect for the provision of access to works of the imagination, such as fiction. Rightly, the Library of Congress has separated LCGFT from LCSH, as genres go well beyond subject. However, this study demonstrates that the genre list may need further work, underpinned by greater recognition of the importance of affect in the consumption of creative works, including the reading of fiction.

Acknowledgement

The author wishes to thank Anya Smeaton for her assistance in compiling the data for this paper.

References

Adler, Melissa. 2009.
“Transcending Library Catalogs: A Comparative Study of Controlled Terms in LCSH and User-generated Tags in LibraryThing for Transgender Books.” Journal of Web Librarianship 3, no. 4: 309-331.
Bates, Jo and Jennifer Rowley. 2011. “Social Reproduction and Exclusion in Subject Indexing: A Comparison of Public Library OPACs and LibraryThing Folksonomy.” Journal of Documentation 67: 431-448.
Brown, Noel. 2015. “The Feel-good Film: A Case Study in Contemporary Genre Classification.” Quarterly Review of Film & Video 32, no. 3: 269-286.
DiMaggio, Paul. 1987. “Classification in Art.” American Sociological Review 52, no. 8: 440-455.
Heymann, Paul, Andreas Paepcke, and Hector Garcia-Molina. 2010. “Tagging Human Knowledge.” In Third ACM International Conference on Web Search and Data Mining (WSDM2010), February 3-6, 2010. New York City. Preprint retrieved from
Hider, Philip. 2015. “A Survey of the Coverage and Methodologies of Schemas and Vocabularies Used to Describe Information Resources.” Knowledge Organization 42: 154-163.
Johansson, Sandra and Koraljka Golub. 2019. “LibraryThing for Libraries: How Tag Moderation and Size Limitations Affect Tag Clouds.” Knowledge Organization 46: 245-259.
Lee, Hur-Li and Lei Zhang. 2013. “Tracing the Conceptions and Treatment of Genre in Anglo-American Cataloging.” Cataloging & Classification Quarterly 51, no. 8: 891-912. doi:10.1080/01639374.2013.832457.
Library of Congress. 2018. Introduction to Library of Congress Genre/Form Terms.
Lu, Caimei, Jung-ran Park, and Xiaohua Hu. 2010. “User Tags Versus Expert-Assigned Subject Terms: A Comparison of LibraryThing Tags and Library of Congress Subject Headings.” Journal of Information Science 36, no. 6: 763-79. doi:10.1177/0165551510386173.
Mai, Jens-Erik. 2011. “Folksonomies and the New Order: Authority in the Digital Disorder.” Knowledge Organization 38: 114-122.
Martínez-Ávila, Daniel. 2016. “BISAC: Book Industry Standards and Communications.” Knowledge Organization 43: 655-662.
Martínez-Ávila, Daniel and John M. Budd. 2017. “Epistemic Warrant for Categorizational Activities and the Development of Controlled Vocabularies.” Journal of Documentation 73: 700-715.
Mikkonen, Anna and Pertti Vakkari. 2016. “Readers’ Interest Criteria in Fiction Book Search in Library Catalogs.” Journal of Documentation 72: 696-715.
Pando, Daniel Abraão and Carlos Cândido de Almeida. 2016. “Knowledge Organization in the Context of Postmodernity from the Theory of Classification Perspective.” Knowledge Organization 43: 113-117.
Rafferty, Pauline. 2018. “Tagging.” Knowledge Organization 45: 500-516. Also available in ISKO Encyclopedia of Knowledge Organization, ed. Birger Hjørland, coed. Claudio Gnoli.
Spiteri, Louise and Jen Pecoskie. 2018. “Expanding the Scope of Affect: Taxonomy Construction for Emotions, Tones, and Associations.” Journal of Documentation 74: 383-397.
Stover, Katie Mediatore. 2009. “Stalking the Wild Appeal Factor.” Reference & User Services Quarterly 56, no. 3: 243-246.
Svenonius, Elaine. 2000. The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press.
Tudor, Andrew. 2012. “Genre.” In Film Genre Reader IV, edited by Barry Keith Grant. Austin: University of Texas Press.
Vaidya, Praveenkumar and N. S. Harinarayana. 2016. “The Comparative and Analytical Study of LibraryThing Tags with Library of Congress Subject Headings.” Knowledge Organization 43: 35-43.
Voorbij, Henk. 2012. “The Value of LibraryThing Tags for Academic Libraries.” Online Information Review 36, no. 2: 196-217.
Young, Janis L. and Yael Mandelstam. 2013. “It Takes a Village: Developing Library of Congress Genre/Form Terms.” Cataloging & Classification Quarterly 51, nos. 1-3: 6-24.
Zhang, Lei and Hope A. Olson. 2015. “Distilling Abstractions: Genre Redefining Essence versus Context.” Library Trends 63, no. 3: 540-554.
Maximilian Hindermann – Basel University Library, Switzerland
Andreas Ledl – Basel University Library, Switzerland

BARTOC FAST: A Federated Asynchronous Search Tool for Remote Vocabulary Access

Abstract: In this paper we introduce BARTOC FAST, a federated asynchronous search tool for remote vocabulary access. We first motivate the need for BARTOC FAST by exposing the limitations of the local Skosmos instance. We then discuss the advantages of BARTOC FAST – vast search space, low footprint, and modularity – and provide an overview of its implementation and some design challenges. We close by considering some anticipated use cases and plans for future development.

1.0 Limitations of BARTOC

The Basic Register of Thesauri, Ontologies & Classifications (BARTOC) is a full terminology registry for knowledge organization systems (KOS). We aim to collect as many controlled vocabularies as possible in one place, to describe them uniformly and to make them accessible. So far this last aim has not been fully met, at least not with regard to the searchability of vocabulary content. BARTOC FAST attempts to supplement BARTOC with an application that allows users to search for concepts and terms within vocabularies. The whole project stems from our conviction that the KOS presented in BARTOC should comply with the FAIR principles for the use of controlled vocabularies. BARTOC consists of two main modules: one contains the metadata of currently about 3,200 KOS; the other provides the members of the KOS, i.e. concepts and terms, as long as they are available in SKOS format. For the SKOS vocabulary service we use Skosmos, an open source, web-based SKOS publishing tool developed by Suominen et al. (2015). Skosmos allows users to browse SKOS vocabularies alphabetically or hierarchically and offers structured concept display. For BARTOC, the most important feature was the global search, which enabled searching for concepts across all hosted vocabularies.
Consequently, vocabularies from other terminology registries had to be uploaded to our own RDF triple store. After some time, this caused serious problems:
• Skosmos could no longer process the large number of 1,436 vocabularies (3,314,740 concepts and 11,492,219 terms, not including the massive Getty vocabularies).
• Since our vocabularies were mostly clones of remotely hosted vocabularies, it took considerable effort to keep them up to date by periodically checking for updates or new versions. Comparable portals such as Linked Open Vocabularies face similar problems and solve them through manual annual reviews and comments in a separate metadata field.
• Not all vocabularies were available in full SKOS.
We had to accept that Skosmos is a great tool but was no longer sufficient for our special purpose: a powerful search instrument across a large number of KOS. Or, as Osma Suominen puts it in the Skosmos User Forum: “Skosmos works well up to around tens of thousands of concepts (...). With more than 100,000 concepts, it starts getting slow” (Suominen 2015). One could add that with millions of concepts the global search function collapses. However, smaller Skosmos deployments containing unique vocabularies, run by institutions like UNESCO, the Food and Agriculture Organization (FAO) of the United Nations, the National Library of Finland, the University of Oslo Library, Inist-CNRS, etc., provide REST-style APIs and Linked Data access to the underlying data. Our task was to replace the global search of Skosmos with a federated search method capable of querying multiple REST APIs and SPARQL endpoints simultaneously, so that millions of concepts from any number of terminology registries could be made accessible with one tool.

2.0 BARTOC FAST

We have overcome the aforementioned limitations of BARTOC by developing BARTOC FAST. BARTOC FAST is a remote retrieval aid for thesaurus, ontology and classification concepts.
The acronym FAST stands for Federated Asynchronous Search Tool and should not be confused with OCLC’s vocabulary of the same name, which resolves to Faceted Application of Subject Terminology. At present the FAST federation contains more than 20 resources, including Skosmos, the Getty Vocabularies, the Integrated Authority File (GND) via lobid-gnd, as introduced by Steeg et al. (2019), and Research Vocabularies Australia (see for a full list), thus comprising a vast search space. BARTOC FAST offers three decisive advantages over BARTOC. First, Skosmos can take a back seat and serve exclusively as an instance for SKOS vocabularies that are not hosted anywhere else. Second, the data in BARTOC FAST is always up to date, as it comes directly from the APIs of the terminology registries. Moreover, the footprint of BARTOC FAST compared to BARTOC is massively reduced, since the maintenance and support of the terminology registries is delegated to their providers. Third, BARTOC FAST is a single access point that allows users to search not only for vocabularies but also in vocabularies. The new search interface of BARTOC will combine both functionalities (see Figure 1).

Figure 1: Prototype of the new search interface for BARTOC including BARTOC FAST.

2.1 Implementation

BARTOC FAST is implemented in Python 3.7 and runs on the servers of the Basel University Library. The frontend and backend will be discussed in turn. The BARTOC FAST frontend uses the Django framework and comes with three distinct views: basic, advanced and API. Each view corresponds to the expectations of a specific type of user:
1. The basic view mirrors the user interface of a discovery tool such as Primo. All parameters (except, of course, the search input) are fixed and set to values that yield usable results in most cases. The user simply types in a search word and receives a list of results.
This list is both sortable and searchable (see Figure 2) and hence allows for additional refinement.
2. The advanced view is similar to the basic view but allows for customization of all parameters (see Figure 3). These include the maximum search time, the choice of queried resources, and the option of keeping duplicates. Note that some desirable facets, filters and modifiers are not yet implemented but are scheduled for the next development cycle.
3. The BARTOC FAST API is an HTTP-based RESTful API, soon to be compliant with the Reconciliation Service API, returning results as JSKOS, a data format for KOS by Voß (2019), or as generic JSON-LD. The view itself is identical to the advanced view.

Figure 2: BARTOC FAST basic view results list for the search term “knowledge organization”.

Turning now to the BARTOC FAST backend: since BARTOC FAST is a remote service, it needs to process many distinct APIs. For this reason, resource modelling and query resolution in BARTOC FAST are handled by a GraphQL schema that makes use of the Graphene module. Simply put, GraphQL provides a (meta-)query language over all the resources in the BARTOC FAST federation (see Figure 4). Resource modelling is modular and done on a per-API basis. This means that new instances of already modelled APIs (such as Skosmos and SPARQL) can easily be added to the BARTOC FAST federation.

Figure 3: The BARTOC FAST advanced view enables parameter customization.

Figure 4: A BARTOC FAST search request in the GraphQL query language.

BARTOC FAST resolves federated queries by translating the user’s search input into an API call for each resource; the list of aggregated results is then returned. More precisely, query resolution takes three steps for each resource: asynchronously fetching data by means of an API call to the resource, normalizing this data, and purging duplicates.
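The three resolution steps can be illustrated with a minimal Python sketch. It is not BARTOC FAST code, which resolves queries through a GraphQL schema built with Graphene; it assumes plain `asyncio` instead, and every name in it (`LABEL_MAP`, `Result`, `fetch`, `normalize`, `federated_search`) as well as every URL template is hypothetical. The fetch step is a stub that builds a resource-specific URL from the search input and returns canned records rather than issuing a real HTTP request. The parameters `max_time` and `keep_duplicates` loosely mirror the maximum search time and the option of keeping duplicates in the advanced view, and the label mapping anticipates the prefLabel equivalences (rdfs:label, gnd:preferredName) that the paper gives for normalization.

```python
import asyncio
from dataclasses import dataclass, field
from urllib.parse import quote

# Hypothetical mapping of resource-specific properties onto the four SKOS
# labels used as a base (the prefLabel equivalences follow the paper).
LABEL_MAP = {
    "skos:prefLabel": "skos:prefLabel",
    "rdfs:label": "skos:prefLabel",
    "gnd:preferredName": "skos:prefLabel",
    "skos:altLabel": "skos:altLabel",
    "skos:hiddenLabel": "skos:hiddenLabel",
    "skos:definition": "skos:definition",
}


@dataclass
class Result:
    uri: str
    source: str
    labels: dict = field(default_factory=dict)  # SKOS label -> set of strings


async def fetch(name, template, query, delay, records):
    """Step 1 (stub): build the API call for one resource and 'fetch' it.

    A real client would send an HTTP request to `url`; this stub only waits
    `delay` seconds and returns canned records.
    """
    url = template.format(q=quote(query))
    await asyncio.sleep(delay)
    return name, url, records


def normalize(name, records):
    """Step 2: map raw records onto SKOS labels, merging records by URI."""
    merged = {}
    for record in records:
        result = merged.setdefault(record["uri"], Result(record["uri"], name))
        for prop, value in record.items():
            target = LABEL_MAP.get(prop)
            if target is not None:
                result.labels.setdefault(target, set()).add(value)
    # A result must carry a nonempty set of labels.
    return [r for r in merged.values() if r.labels]


async def federated_search(query, resources, max_time=2.0, keep_duplicates=False):
    """Query all resources concurrently; return (results, API calls made)."""
    tasks = [
        asyncio.wait_for(fetch(name, tpl, query, delay, recs), timeout=max_time)
        for name, (tpl, delay, recs) in resources.items()
    ]
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results, calls, seen = [], [], set()
    for outcome in outcomes:
        if isinstance(outcome, Exception):  # e.g. a timeout: skip this resource
            continue
        name, url, records = outcome
        calls.append(url)
        for result in normalize(name, records):
            # Step 3: purge results whose URI an earlier resource already returned.
            if keep_duplicates or result.uri not in seen:
                seen.add(result.uri)
                results.append(result)
    return results, calls


# Usage with three hypothetical resources; "slow" exceeds max_time.
DEMO = {
    "gnd": ("https://gnd.example.org/search?q={q}", 0.01, [
        {"uri": "http://example.org/ko", "gnd:preferredName": "Wissensorganisation"},
    ]),
    "skosmos": ("https://skosmos.example.org/search?query={q}", 0.01, [
        {"uri": "http://example.org/ko", "skos:prefLabel": "knowledge organization"},
        {"uri": "http://example.org/kos", "skos:prefLabel": "knowledge organization systems"},
        {"uri": "http://example.org/kos", "skos:altLabel": "KOS"},
    ]),
    "slow": ("https://slow.example.org/api?term={q}", 5.0, [
        {"uri": "http://example.org/k", "rdfs:label": "knowledge"},
    ]),
}

results, calls = asyncio.run(federated_search("knowledge organization", DEMO, max_time=0.5))
for r in results:
    print(r.source, r.uri, sorted(r.labels))
```

Because duplicates are purged in the order in which resources are listed, the first resource to return a given URI wins in this sketch; the paper does not say which copy BARTOC FAST keeps, so this ordering is an assumption. The list of `calls` corresponds loosely to the whiteboxed view of API calls that users can toggle on the results page.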
Every result in the list of aggregated results has three mandatory data fields: a URI, a nonempty set of labels, and a source:
1. The URI is used to tell results apart within and across resources. Given the results of a single resource, all results with the same URI are merged into a single result, and the contents of their labels are aggregated. For the results of multiple resources, redundant results, as identified by URI, are purged.
2. BARTOC FAST results use four SKOS labels as a base, namely skos:prefLabel, skos:altLabel, skos:hiddenLabel and skos:definition. Since not all resources in BARTOC FAST employ SKOS, the semantic equivalents of these labels are provided via a mapping (e.g., skos:prefLabel is equivalent to rdfs:label, which is in turn equivalent to gnd:preferredName). Note that the descriptor takes no special priority.
3. The source is the queried resource.

2.2 Challenges

In this section we discuss two design challenges that we encountered in the development of BARTOC FAST. An initial challenge concerns the construction of federated queries. Generally speaking, a federated query processes the input and passes it to each resource in the federation in a format that the resource accepts. So when a BARTOC FAST search request is carried out, the user’s search word is transformed into an API call for each resource in the BARTOC FAST federation. Since each API is represented as a model, this task reduces to transforming the input string into a search request for each model. However, different models allow for different kinds of search requests. To give two examples: not all models support Boolean operators, and different labels are given varying importance in different models. Our solution to this challenge is twofold. First, instead of trying to construct formally equivalent search requests across all models, we opted for the pragmatic approach of constructing search requests with similar outputs and behaviors.
Second, users can toggle a view of the exact API calls triggered by their string input on the BARTOC FAST results page. This whiteboxing gives users more control, since unwanted API calls can simply be turned off. In the future we plan to include options for customizing federated queries (e.g., exact match over label X).

A second challenge concerns the varying sizes of the resources within the BARTOC FAST federation. Unsurprisingly, big registries such as the Integrated Authority File (GND) tend to crowd out smaller registries with respect to the number of results for a search request. However, the number of results is not an indicator of quality. Generally speaking, there are at least two solutions to this problem:
1. Expand or shrink the BARTOC FAST federation according to need. We have already implemented this solution by allowing the user to manually (de)select queried resources in the advanced view, as discussed above.
2. Rank the results. Some simple variants for ranking include giving preference to label X over label Y, or giving preference to results with fewer empty labels. In addition to reducing noise, this solution has a further advantage: if a resource is still dominant for ranked results, this is a marker of the resource’s quality rather than a problem. The downside of this solution is that the neutrality of the results is no longer guaranteed; the role of BARTOC FAST would shift from that of an aggregator to that of a curator. For this reason, if we decide to implement this solution, it will be strictly opt-in.

3.0 Outlook

At its current stage, BARTOC FAST is still a prototype with limited service, a kind of Google-style search tool for KOS content. In this preliminary stage of development, it does not yet take full advantage of all SKOS labels in the REST API queries, or of the RDF data in the SPARQL queries, which would allow different query types as well as filters, aggregates, modifiers, and operators.
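One possible extension of this kind is the opt-in ranking considered in Section 2.2. The following Python sketch combines the two simple ranking variants mentioned there: preferring a match in one label over another, and preferring results with fewer empty labels. It is an illustration under stated assumptions, not BARTOC FAST code: results are assumed to be plain dictionaries holding SKOS-style labels, and both the label preference order (`LABEL_ORDER`) and the function `rank` are hypothetical. Ranking is applied only when `opt_in` is true, so by default the neutral, aggregated order is preserved.

```python
# Hypothetical preference order: a query match in an earlier label ranks higher.
LABEL_ORDER = ("skos:prefLabel", "skos:altLabel", "skos:hiddenLabel", "skos:definition")


def rank(results, query, opt_in=False, order=LABEL_ORDER):
    """Rank results (dicts with a 'labels' mapping) only if the user opts in.

    Sort key, smaller is better:
      1. index in `order` of the first label whose values contain the query
         (len(order) if no label matches);
      2. number of labels in `order` that are empty or missing.
    sorted() is stable, so ties keep the original aggregated order.
    """
    if not opt_in:
        return list(results)  # neutral default: keep the aggregator's order
    q = query.casefold()

    def key(result):
        labels = result["labels"]
        first_match = next(
            (i for i, name in enumerate(order)
             if any(q in value.casefold() for value in labels.get(name, ()))),
            len(order),
        )
        empty = sum(1 for name in order if not labels.get(name))
        return (first_match, empty)

    return sorted(results, key=key)


# Usage with hypothetical results.
results = [
    {"uri": "a", "labels": {"skos:definition": {"Deals with knowledge organization"}}},
    {"uri": "b", "labels": {"skos:altLabel": {"Knowledge organization"}}},
    {"uri": "c", "labels": {"skos:prefLabel": {"Knowledge organization"},
                            "skos:definition": {"The study of KOS"}}},
    {"uri": "d", "labels": {"skos:prefLabel": {"Knowledge organization"}}},
]
print([r["uri"] for r in rank(results, "knowledge organization")])               # → ['a', 'b', 'c', 'd']
print([r["uri"] for r in rank(results, "knowledge organization", opt_in=True)])  # → ['c', 'd', 'b', 'a']
```

Sorting on a tuple key applies the label preference first and uses the count of empty labels only to break ties; weighting the two criteria differently would be an equally plausible design choice.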
Such extended functionalities will be part of future development steps. Now that BARTOC FAST is in production, however, we are paying particular attention to possible use cases. BARTOC FAST provides a valuable vocabulary resource for KOS mapping (e.g. via the Cocoda mapping tool by Balakrishnan et al. (2018)) and for automated subject indexing (e.g. via Annif by Suominen (2019)) of multidisciplinary digital repositories, such as Zenodo, that do not come with a controlled vocabulary natively. We also intend to further grow the BARTOC FAST federation by adding instances of already modelled APIs or by modelling APIs that promise to add value in scope or depth. If you know of a resource that should be added to BARTOC FAST, please do not hesitate to contact us! Finally, we plan to provide full access to the BARTOC FAST source code under a permissive license as soon as some security issues have been addressed.

References

Balakrishnan, Uma, Jakob Voß, and Dagobert Soergel. 2018. “Towards Integrated Systems for KOS Management, Mapping, and Access: Coli-Conc and its Collaborative Computer-Assisted KOS Mapping Tool Cocoda.” In Challenges and Opportunities for Knowledge Organization in the Digital Age: Proceedings of the Fifteenth International ISKO Conference 9-11 July 2018 Porto, Portugal, edited by Fernanda Ribeiro and Maria Elisa Cerveira. Advances in Knowledge Organization 16. Würzburg: Ergon Verlag, 693-701.
Steeg, Fabian, Adrian Pohl, and Pascal Christoph. 2019. “lobid-gnd – Eine Schnittstelle zur Gemeinsamen Normdatei für Mensch und Maschine.” Informationspraxis 5, no. 1: 1-25.
Suominen, Osma. 2015. “Parameters in Vocabularies.ttl.” Skosmos User Forum.!msg/skosmos-users/55gXfKHWfuU/V6MboBNCDgAJ.
Suominen, Osma. 2019. Annif: DIY Automated Subject Indexing Using Multiple Algorithms.
Suominen, Osma, Henri Ylikotila, Sini Pessala, Mikko Lappalainen, and Matias Frosterus. 2015. Publishing SKOS Vocabularies with Skosmos. Manuscript submitted for review, June 2015.
Voß, Jakob. 2019.
JSKOS Data Format for Knowledge Organization Systems.

Chris Holstrom – University of Washington Information School, USA
Joseph T. Tennis – University of Washington Information School, USA

Visibility, Identity, and Personal Expression: Qualitative Case Studies of Social Tagging on MetaFilter

Abstract: Social tagging is often studied quantitatively and through a lens of tag typologies and terminological representation. This qualitative study examines three cases of social tagging on MetaFilter, a community weblog with extensive use of, and discussion about, tagging practices and the significance of tags. The cases illuminate the intersection of social tagging with the cultural and social themes of visibility, identity, and personal expression. We close with a reflection on how community affects tagging practices.

1.0 Introduction

Many websites such as Stack Overflow, GitHub, Goodreads, and LinkedIn allow users to label and organize information by using social tagging. The resulting folksonomies have most often been studied quantitatively, beginning with early studies that measured the distribution and types of tags on Web 2.0 folksonomies such as and Flickr (Kipp and Campbell 2006; Munk and Mørk 2007). While these and many other quantitative studies describe tagging behaviors and outcomes, they do not deeply explore how social tagging intersects with social and cultural themes such as visibility, identity, and personal expression in online communities. We do, however, have a rich history of studying how other modes of subject indexing and classification intersect with these social and cultural themes. Bowker and Star (1999) describe how classification systems reflect and shape larger social systems and dynamics. Olson (2002) explores representation and identity in indexing and argues that the rigid vocabularies employed by early indexers like Dewey marginalize and exclude those who are different from the indexer.
Tennis (2002) introduces the concept of subject ontogeny and traces how social and cultural changes affect our understanding of knowledge and classification choices over time. Furner (2007) applies critical race theory to study how poorly indexing languages represent racial minorities. Duarte and Belarde-Lewis (2015) examine cataloging and classification through the lens of colonialism and discuss the potential of Indigenous community-based approaches to representation and identity in information systems. While these critical and social perspectives represent a major thread of knowledge organization research, only a small body of social tagging research has followed this path. The lack of attention to the social and community aspects of social tagging can be attributed, in part, to early social tagging platforms being weakly linked to online communities. For example, users with similar tagging histories could connect with each other, and existing communities could establish canonical tags to share information with each other on services like (Tonkin et al. 2008); however, these social tagging activities were not strongly linked to or integrated with other community activities. As social tagging has become a more integrated feature of online communities, a body of research has emerged that, instead of treating the existence of social tagging as a social phenomenon in and of itself, studies tagging as an activity within these online communities that informs, and is informed by, cultural and social factors. Much of this research has focused on terminological representation. For example, Adler (2009) compares the LibraryThing folksonomy with traditional indexes, focusing on representation and vocabulary in describing books with transgender themes.
Adler argues that “[T]he greatest power of folksonomies, especially when set against controlled vocabularies like the Library of Congress Subject Headings, lies in their capacity to empower user communities to name their own resources in their own terms.” Bates and Rowley (2011) also consider visibility, identity, and personal expression in the LibraryThing folksonomy and find that minority voices can be well represented and that their identities can be accurately portrayed through concerted community effort. For example, they find that LibraryThing's social tagging approach “offers benefits over LCSH... in the discoverability and representation of LGBTQ resources.” Cocq (2015) observes a community of Twitter users who tweet in Sami and use Sami hashtags specifically to raise the visibility of the endangered language. These hashtags function very much like social tags, and they allow these Twitter users to increase the visibility of an underrepresented culture and language. Cocq finds, however, that these tweets reach a limited audience outside the small community producing them. Bullard (2016) captures user discussions about choosing preferred terms for social tagging in a fan fiction community. These discussions balance term popularity, utility for information retrieval, and potential harm to non-dominant user groups. Bullard's approach allows us to see not just the terms that are ultimately used in the folksonomy, but also the community attitudes and values that inform these choices. This study, taking particular inspiration from Bullard, aims to build on the broad tradition of knowledge organization research that considers the intersection of subject indexing with social and cultural factors. We also aim to add to the small body of such research that is specific to social tagging.
The cases presented in this study consider, qualitatively, the themes of visibility, identity, and personal expression in social tagging as they manifest on the community blog website MetaFilter and its subdomain for discussions about the site, MetaTalk. By analyzing these cases, we aim to address the following research questions:
1) How do social taggers use tags to increase representation and visibility of underrepresented voices in an online community?
2) How do social taggers use identification tags that respect the personal, social, and cultural identities of authors and content creators?
3) How do social taggers balance the utility of descriptive tags with the personal and social value of using tags for humor, commentary, and personal expression?

2.0 Method

This study considers three cases of MetaFilter community members discussing how to use social tagging effectively and appropriately in their online community. MetaFilter is a community blog that has run continuously for over 20 years and is still active. MetaFilter has thousands of community members, paid staff and moderators, and numerous subdomains, including a question-and-answer site called Ask MetaFilter. MetaFilter has a global membership, but the language of the site is English, and the United States and the United Kingdom are disproportionately well represented in the community. Since 2005, authors of posts on MetaFilter and its subdomains have been able to tag or label their posts with as many tags as they like, using any vocabulary that they like. Although moderators and members who are closely connected with the original authors can modify the tags for a post, MetaFilter is a narrow folksonomy, meaning that the post author provides the tags for their own posts and other community members do not contribute additional tags.
The tags that post authors choose are therefore particularly important in this folksonomy, because the community cannot establish different descriptions or identifiers through their own tags, and because these tags are displayed prominently next to the content and are used for search and navigation throughout the site. The MetaFilter community has discussed social tagging practices extensively, with 370 separate discussion threads and approximately 13,000 comments about tagging posted to the MetaTalk subdomain. These discussions are publicly available and readily findable because they themselves are tagged with tags such as tags, tag, tagging, folksonomy, and labels. We collected all discussions with these tags, removed discussions about HTML tagging syntax, and began coding discussions by types of tags (Golder and Huberman 2006), motivations for tags (Gupta et al. 2011), types of initial post (feature requests, how-to questions, etc.), and topic of discussion (tagging syntax, culturally appropriate tagging, etc.). Key themes emerged during coding, and we identified particularly interesting discussions about visibility, identity, and personal expression, which we present in the following cases.

3.0 Cases

In this study, we consider three cases from the large set of tagging discussions on MetaTalk to illustrate and explore emergent themes from ongoing coding and analysis work. Two of the selected cases are contained completely or predominantly in single discussion threads. The other case spans multiple discussion threads that are thematically connected. These cases were chosen because they illuminate key questions and themes about social tagging and folksonomies, they explore questions that are difficult to answer with quantitative analysis, and they feature rich and thoughtful discussion about how tagging intersects with concepts of visibility, identity, and personal expression.
3.1 #JulyByWomen and diverse global voices

The MetaFilter community regularly reflects on itself as an online community, aiming to create a welcoming space with open discussion and diverse voices. The #JulyByWomen campaign arose from discussions about women being underrepresented in the MetaFilter community, particularly as authors of posts (viggorlijah 2014a). The campaign encouraged women to author more posts during the month of July 2014 and to use the tag JulyByWomen on those posts to raise the visibility of women on the site. The campaign diverged from MetaFilter's typical use of tags because the JulyByWomen tag reflected the identity of the author instead of describing the content of the post. Despite, or because of, this divergent use of tags, the campaign was considered a major success. #JulyByWomen increased the visibility of, and participation by, women (viggorlijah 2014b), and the use of a consistent tag also provided an indexing benefit, as all of these posts were automatically collected in a single place (“Posts tagged with julybywomen” n.d.).

The success of #JulyByWomen led to further discussion about using tags to increase visibility for underrepresented voices and topics on MetaFilter. The case that we consider in this study occurred in a MetaTalk thread titled “#GlobalVoices/#NonWestNov/#GlobalSouthSept,” which was posted on August 5, 2014, directly after the conclusion of the #JulyByWomen campaign and in consultation with viggorlijah, the MetaFilter member behind #JulyByWomen (divabat 2014). The discussion ran through August 11, 2014, with 240 total comments made by 91 different users.
The post author, who identified themselves in the discussion thread as an “Asian international student currently floating between countries,” aimed to introduce a new tag to promote posts by community members “that are outside the White Western norm, especially those outside the US” and “posts about people, places, and so on that take place outside the West” (divabat 2014). The discussants readily agreed with the premise that non-Western voices and topics were underrepresented on MetaFilter and with the goal of increasing their visibility. Community members also supported using a tag as an organizing principle for achieving these goals. However, the proposal received significant constructive criticism, and this criticism can inform our understanding of how tags and identity interact. One criticism concerned a key difference in focus between #JulyByWomen and the proposed campaign for global voices. #JulyByWomen focused specifically on increasing the visibility of women as authors of MetaFilter posts, while the proposed campaign aimed to increase visibility for both underrepresented post authors and underrepresented topics. This lack of focus caused confusion about the meaning of the proposed tag. Should Western community members use the tag to post about non-Western topics? Should non-Western community members use the tag when posting about general topics? The discussion did not reach consensus on how to support both proposed goals with a single tag. Additionally, the site founder, mathowie, and other community members expressed concern that encouraging specific subjects would make the campaign less successful than #JulyByWomen. The criticisms about focus suggest that successful tags, especially tags that communicate complex information like identity and culture, should not be overloaded with multiple meanings. Another critique of the proposal was the lack of clarity in the proposed tags.
Clarity is important because, while affording individuals freedom to choose their own vocabulary is considered a feature of social tagging in general, community members need a shared meaning for a tag to use it successfully in the context of a coordinated campaign. MetaFilter community members expressed confusion about the geographical and cultural boundaries of proposed tags like GlobalVoices, GlobalSouthSept, and NonWestNov. Some were unfamiliar with the term “Global South,” some were unsure of the boundaries of “The West,” some wondered why the United Kingdom might be lumped together with the United States, and some wondered whether non-white topics from Western countries fit the campaign. In contrast, #JulyByWomen had clearer boundaries. Although gender is complex and non-binary, the concept of “woman” created less confusion than the cultural and geographical constructions discussed in this case. Clarity is not the only rhetorical aspect of tag construction that matters for visibility. Positive framing is critical to making underrepresented groups visible, as negative framing can marginalize or “other” these groups. For example, JulyByWomen was a positively framed and inclusive tag, while JulyByNotMen would have been a negatively framed tag that emphasized who was excluded. MetaFilter community members proposed a variety of tags for the global voices campaign. Those that used negative framing, like NonWestNov, RestOfTheWorld, and BeyondUSA, were criticized for “othering” the very groups and topics that they aimed to promote. In contrast, positively or neutrally framed tags like PostMoarGlobal, SeptemberForTheWorld, and GlobalFilter were considered more inclusive and clearer about boundaries. Finally, this case shows that visibility has a temporal component. #JulyByWomen succeeded in part because it had clear start and end dates.
One proposal for the global voices project suggested an ongoing effort to raise the visibility of these voices and topics. The post author roundly rejected this idea as a form of segregation, and other community members did not revisit it: “Having it just be a 'use this tag when you're talking about stuff outside the US' kind of defeats the purpose of this project, which is to make a concentrated effort to highlight and showcase material from all around the world. It can continue past the month, but right now just having it as a general-purpose tag feels like it's siloing off those posts even more” (divabat 2014). Ultimately, the global voices campaign did not achieve the success of #JulyByWomen: no tag was officially adopted, and only 17 posts were tagged with the most popular proposed tag, PostMoarGlobal (“Posts tagged with postmoarglobal” n.d.).

3.2 Post tags and deadnames

In this case the tag is an author's name, Daniel M. Lavery (Wikipedia contributors 2020). This author has published work under the names Mallory Ortberg, Daniel Mallory Ortberg, and Daniel M. Lavery. The question put to the discussion thread is twofold: “I posted Daniel Mallory Ortberg's latest instalment of his ongoing serial fic. When tagging the post, I was conflicted about tagging it with his deadname (which is ‘Mallory Ortberg’). Two questions: 1) on a technical level, will tagging with his full name link up with older posts tagged with his deadname, since the latter is included in the former? He's the same author, and that continuity of work seems valuable. 2) on a trans-etiquette level, is this shitty and equivalent to deadnaming? (I'm more interested in hearing what trans folks have to say on this one.) I ended up tagging the post with his name and his deadname but I'm questioning that” (sixswitch 2019). The first question reveals a mental model of how tags work in MetaFilter.
The second question is framed as an etiquette question, but the responses in the thread show that it is not limited to etiquette. For this tagging community it becomes a contested conceptual move: the term deadname itself is called into question as inappropriate. Two levels of terminology work therefore surface: one in the tag and its content, and the other in what we call part of that content. Both rely on community input to reconcile. The first question is answered immediately. It is a technical question about the mechanics of the site's tagging system. Further, a moderator, Eyebrows McGee, takes action to add the tag danielmalloryortberg to all posts that had a combination or a subset of those names. The second question, while framed as an etiquette question, quickly turns into a discussion of the use of deadname in the post. The community rejects this term for a name a person used before transitioning to another gender. A decision is taken to flag tags like this using “flag with note” (LobsterMitten 2019). What follows are various discussions of the dynamics of interacting with trans people, both in the MetaTalk community and through tagging.

This etiquette question goes right to the heart of identity and its relationship to tagging. It required the community to engage with sensitive terminology and negotiate how identity changes and does not change in the context of gender transitioning. This engagement and negotiation required intellectual work to understand the role of tags in identity representation. The community did not demand change beyond that taken by the moderator. Further, the community adopted no best practices in this context. It remains to be seen whether best practices will emerge around this particular issue.
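The moderator's fix amounts to a retroactive set operation over the site's tag index. The Python sketch below is hypothetical throughout: the post ids, tag strings, and the names `VARIANT_PATTERNS` and `unify_name_tags` are invented for illustration, and it assumes exact-match tag lookup, which the thread summary does not confirm for MetaFilter. Under that assumption, a post tagged only with an earlier form of the name is not found under the full-name tag until the canonical tag is added.

```python
# A minimal sketch of retroactive tag unification. The index, post ids, and tag
# strings are hypothetical; exact-match lookup is an assumption for illustration,
# not a description of MetaFilter's implementation.

CANONICAL = "danielmalloryortberg"

# Each pattern is a set of tags that, when all present on a post, identify the author.
VARIANT_PATTERNS = [
    {"malloryortberg"},
    {"mallory", "ortberg"},
]


def posts_tagged(index, tag):
    """Exact-match lookup: return ids of posts that carry this exact tag."""
    return sorted(pid for pid, tags in index.items() if tag in tags)


def unify_name_tags(index, patterns, canonical):
    """Add the canonical tag to every post matching any variant pattern.
    The post author's original tags are left in place; nothing is removed."""
    for tags in index.values():
        if any(pattern <= tags for pattern in patterns):
            tags.add(canonical)


if __name__ == "__main__":
    index = {
        1: {"malloryortberg", "humor"},
        2: {"mallory", "ortberg", "fiction"},
        3: {CANONICAL, "fiction"},
        4: {"mallory", "everest"},  # a different Mallory: should not match
    }
    print(posts_tagged(index, CANONICAL))  # [3]
    unify_name_tags(index, VARIANT_PATTERNS, CANONICAL)
    print(posts_tagged(index, CANONICAL))  # [1, 2, 3]
```

Matching on whole patterns (every tag in a set must be present) rather than on any single name fragment is a design choice of this sketch: it keeps an unrelated post tagged only `mallory` out of the unified result.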
3.3 Personal expression in tagging

Tags on MetaFilter are valued primarily for their utility, as evidenced by recurring questions about tagging best practices and requests for search and browse features to increase tag utility. For example, MetaFilter community members ask how to ensure that tags accurately describe the subject matter, how to make tags sufficiently specific, and how to format tags syntactically and orthographically to support retrieval in search and browse modalities (Going to Maine 2016). MetaFilter community members, many of whom are academics and librarians, answer knowledgeably about indexing best practices and the technical details of MetaFilter's tagging and retrieval ecosystem. Despite the primary focus on utility, the MetaFilter community does not see tagging as exclusively utilitarian. Because tags are assigned by post authors, they are considered part of that post author's personal expression. Exercising the freedom that this perspective affords, some community members use tags to joke about and comment on the topics of their posts. These personal expression tags generally do not support retrieval and are rarely descriptive in a traditional indexing sense, but they do allow community members to express their personality and viewpoints and, when used appropriately, can build a sense of community. The question, then, becomes: When and how are personal expression tags beneficial, when and how are they inimical, and how does a community balance these potential benefits and harms? Multiple MetaTalk discussions recognize the benefit of the creative use of tags. For example, one post asked community members what their favorite tags are and received numerous spirited responses about obscure and humorous tags (Fizz 2017). The humorous use of tags is evident throughout MetaFilter, such as one post with 75 tags, all synonyms for nonsense (Just this guy, y'know 2017).
These tags entertain, provide commentary, and establish the personality of the community. However, some community members have asked whether humorous tags are harmful. For example, one member asked whether an extremely long joke tag should “be frowned upon” (i love cheese 2006). The consensus response was that such tags are not harmful to the folksonomy and have value outside of indexing: “I don't think it affects the usefulness of tags, as long as more accurate tags are included. Plus it makes me smile” (cali 2006). This sentiment is repeated in a discussion about a more controversial tag, batshitinsane (UKnowForKids 2005). “Silly tags are only noise when that's the only tag you use for your post. If there's four other decent tags included that people might actually use for a search, then why does it matter if there is one that no one will ever use as a search term?” (23skidoo 2005). However, the tag batshitinsane is problematic for reasons other than being noise in the folksonomy—it is derogatory, vulgar, and persistently popular as a form of humor. As such, it has been the topic of multiple discussion threads. One such post asks whether the term is offensive (CCBC 2010). MetaFilter moderators determined that the tag should not be banned outright, but encouraged community members to use it responsibly. They considered gratuitous use of the tag in posts about mental health, and its use to editorialize and stifle discussion, to be irresponsible, showing that in this case community norms, not utility, determined the appropriateness of a personal expression tag. Another form of personal expression through tagging is not about humor, positioning, or performance, but about identity and visibility. Shortly after MetaFilter implemented tagging, one prolific user tagged all of their posts with their username to increase their visibility.
This approach to tagging ran counter to the intent of tags being descriptive of content, but some community members considered it “harmless fun” (calwatch 2005). Site founder mathowie, however, considered this type of personal expression through tagging to be harmful. “Tags were added as a way to categorize everything on the site under descriptive keywords. The poster's name doesn't really impart any info, and since you can already find every post by a username, I repeat that it's in effect already built-in and pointless to essentially state the same information twice. There's no need to make usernames into explicit tags. I'm sure quonsar just wanted to see his name in lights on the popular page, which he did, and now is gone. Whoop-de-doo” (mathowie 2005). This case of tagging as personal expression was considered harmful for reasons of utility—it added confusing noise to the folksonomy—and for reasons of community norms—it did not build community through personal expression, instead benefiting just one user.

4.0 Findings and Conclusion

As is reflected in the literature, tagging is an activity that expands our notion of indexing (Golder and Huberman 2006; Tennis 2006; Munk and Mørk 2007). Tags are powerful symbols of community and are infused with the power to create inclusivity in the community. Therefore, care must be taken to understand how best to curate a tag collection to promote visibility, identity, and personal expression in the community. The power of tags, and the care that they require, are reflected in the extensive discussions on MetaTalk. We analyzed three cases in these discussions to study, qualitatively and deeply, how the MetaFilter community uses the power of social tagging to increase representation and visibility of underrepresented voices, to respect the identities of authors and cultural groups, and to balance the utility of descriptive tags with humor and personal expression.
We found that tags can increase visibility for underrepresented groups, provided that the tags have a clear focus and purpose, clear boundaries, and positive and inclusive framing, and are part of a time-bound, concerted campaign. Especially in folksonomies that support unlimited tags for each resource, these visibility tags do not harm retrieval utility and can have a significant benefit of inclusivity, both practical and symbolic. However, the community must agree on a canonical tag for such efforts to succeed. The effort to promote voices outside of the Western and white voices that dominate the community failed because the community, despite significant care and effort, could not establish a clearly defined canonical term to use as a tag. Similar to our first case, which showed that care and attention can increase the visibility of women's contributions on MetaFilter, our second case demonstrated that substantive and careful effort can ensure that a social tagging community respects a trans author's wishes in naming. Community moderators implemented a technical solution to unite works authored under different names, and the community discussed with care both the intersection of social tagging and personal identity, and the ethics, not just the etiquette, of deadnaming and of the term “deadname” itself. This discussion illuminates the symbolic and practical power of tags as signifiers of identity, and shows the importance of not tarnishing tags by implementing them without care and attention. Finally, in the third case, we saw the tension between utility and humor in tagging. The role of humor was both contested and celebrated in this context. The community viewed potentially superfluous tags as not harmful, as long as descriptive tags were also included, and as beneficial to personal expression, humor, and a sense of community. Specific instances of humor were called out as inimical, however.
Using humorous tags to trivialize sensitive subjects, to stifle balanced discussion, or to promote oneself was considered an inappropriate and harmful use of the power of tags. It seems, then, that using tags for personal expression and humor is generally supported, provided such use does not undermine the core utility of tags or the social values of the community. The deep, qualitative analysis of social tagging presented in this study represents a small step toward understanding social taggers in a more robust way, akin to how we understand the motivations and knowledge of professional indexers. Through the vibrant communication among members of the MetaFilter community, we can see the attitudes, motivations, knowledge, and values that shape social tagging in an online community. We recommend further study of modern social tagging sites to better understand how social tagging works as an integral feature of online communities.

References
23skidoo. 2005. “Bitshitinsane Tags.” MetaTalk.
Adler, Melissa. 2009. “Transcending Library Catalogs: A Comparative Study of Controlled Terms in Library of Congress Subject Headings and User-Generated Tags in LibraryThing for Transgender Books.” Journal of Web Librarianship 3, no. 4: 309-331.
Bates, Jo and Jennifer Rowley. 2011. “Social Reproduction and Exclusion in Subject Indexing.” Journal of Documentation 67: 431-448.
Bowker, Geoffrey C. and Susan Leigh Star. 1999. Sorting Things Out: Classification and its Consequences. Cambridge, Mass.: MIT Press.
Bullard, Julia. 2016. “Warrant as a Means to Study Classification System Design.” Journal of Documentation 73: 75-90.
cali. 2006. “Should Including Long Humorous Tags in AskMe Posts Be Frowned Upon?” MetaTalk.
calwatch. 2005. “Someone Likes the Sound of His Own Name.” MetaTalk.
CCBC. 2010. “Is the Term 'Batshit Insane' offensive?” MetaTalk.
Cocq, Coppélie. 2015. “Indigenous Voices on the Web: Folksonomies and Endangered Languages.” The Journal of American Folklore 128, no. 509: 273-285.
divabat. 2014. “#GlobalVoices / #NonWestNov / #GlobalSouthSept.” MetaTalk.
Duarte, Marisa Elena and Miranda Belarde-Lewis. 2015. “Imagining: Creating Spaces for Indigenous Ontologies.” Cataloging & Classification Quarterly 53, no. 5-6: 677-702.
Fizz. 2017. “#FavMetaTag.” MetaTalk.
Furner, Jonathan. 2007. “Dewey Deracialized: A Critical Race-Theoretic Perspective.” Knowledge Organization 34: 144-168.
Going to Maine. 2016. “How Should You Tag Posts?” MetaTalk.
Golder, Scott A. and Bernardo A. Huberman. 2006. “Usage Patterns of Collaborative Tagging Systems.” Journal of Information Science 32, no. 2: 198-208.
Gupta, Manish, Rui Li, Zhijun Yin, and Jiawei Han. 2011. “An Overview of Social Tagging and Applications.” In Social Network Data Analytics, edited by Charu C. Aggarwal, 447-497. Boston, MA: Springer.
i love cheese. 2006. “Should Including Long Humorous Tags in AskMe Posts Be Frowned Upon?” MetaTalk.
Just this guy, y'know. 2017. “How to Navigate the Bullshit-rich Modern Environment.” MetaFilter.
Kipp, Margaret E.I. and D. Grant Campbell. 2006. “Patterns and Inconsistencies in Collaborative Tagging Systems: An Examination of Tagging Practices.” Proceedings of the American Society for Information Science and Technology 43, no. 1: 1-18.
LobsterMitten. 2019. “Post Tags and Deadnames.” MetaTalk.
mathowie. 2005. “Someone Likes the Sound of His Own Name.” MetaTalk.
Munk, Timme Bisgaard and Kristian Mørk. 2007. “Folksonomy, the Power Law & the Significance of the Least Effort.” Knowledge Organization 34: 16-33.
Olson, Hope A. 2013. The Power to Name: Locating the Limits of Subject Representation in Libraries. Dordrecht: Kluwer Academic Publishers.
Posts tagged with julybywomen. n.d. MetaFilter.
Posts tagged with postmoarglobal. n.d. MetaFilter.
sixswitch. 2019. “Post Tags and Deadnames.” MetaTalk.
Tennis, Joseph T. 2002. “Subject Ontogeny: Subject Access through Time and the Dimensionality of Classification.” In Challenges in Knowledge Representation and Organization for the 21st Century: Integration of Knowledge Across Boundaries: Proceedings of the Seventh International ISKO Conference 10-13 July, 2002 Granada, Spain, edited by María José López-Huertas. Advances in Knowledge Organization 8. Würzburg: Ergon Verlag, 54-59.
Tennis, Joseph T. 2006. “Social Tagging and the Next Steps for Indexing: Fordist Reflexivity and Intertextuality.” In 17th ASIS SIG/CR Classification Research Workshop.
Tonkin, Emma, Edward M. Corrado, Heather Lea Moulaison, Margaret E.I. Kipp, Andrea Resmini, Heather D. Pfeiffer, and Qiping Zhang. 2008. “Collaborative and Social Tagging Networks.” Ariadne 54.
UKnowForKids. 2005. “Bitshitinsane Tags.” MetaTalk.
viggorlijah. 2014a. “#JulyByWomen project.” MetaTalk.
viggorlijah. 2014b. “#JulyByWomen slightly passes goal.” MetaTalk.
Wikipedia contributors. 2020. “Daniel M. Lavery.” In Wikipedia, The Free Encyclopedia.

Gregory H. Leazer – University of California, Los Angeles, USA
Robert Montoya – Indiana University, USA
Jonathan Furner – University of California, Los Angeles, USA

Numerical Classification and Complexity: Developing a Classification of Classifications

Abstract: The difference between monothetic and polythetic classification is well established in the literature (Sneath, 1962; Sokal & Sneath, 1963; Needham, 1975). Monothetic classification defines a group such that all members share specific common features, and that, at least in regard to their defining characteristics, any member of the group is substitutable for another. Polythetic groups, on the other hand, are “composed of organisms with the highest overall similarity, and this means that no single feature is either essential to group membership or is sufficient to make an organism a member of the group” (Sneath, 1962, p. 291).
Numerical classifications, per Sokal and Sneath, are defined as a type of polytheticism. We argue that polythetic and numerical classification are not coterminous, and that all three kinds of classification vary along an axis of complexity. Distinguishing characteristics of complexity include the number and nature of membership criteria, the internal structure of a classification, and the nature of consensus used in determination of a classification.

1.0 Introduction: Monotheticism, Polytheticism, and Complexity

The difference between monothetic and polythetic classification is well established in the literature (Sneath 1962; Sokal and Sneath 1963; Needham 1975). Monothetic classification defines a group such that all members share specific common features, and that, at least in regard to their defining characteristics, any member of the group is substitutable for another. Polythetic groups, on the other hand, are “composed of organisms with the highest overall similarity, and this means that no single feature is either essential to group membership or is sufficient to make an organism a member of the group” (Sneath 1962, 291). Monotheticism is commonly attributed to Aristotle (Topics; see, e.g., Hjørland 2017) and polytheticism to Wittgenstein (Philosophical Investigations). Lakoff also attributes polytheticism to Rosch's work in psychology, a body of work she called “prototype theory.”

In what ways can a classification be said to be complex? On the one hand, classification generally represents a reduction of the universe—a simplified model—by substituting a smaller number of kinds for a larger number of individuals.
The substitution of types for tokens also foregrounds important or essential features—definitional characteristics in the parlance of monotheticists—so that not merely is the number of things perceived reduced, but the act of perception itself is concentrated on meaning-bearing features, with other features relegated to the background or discarded as inessential or non-salient. Finally, the identification of essential or salient features allows for cognitive economy. Categories (or the prototypes that reside at the center of polythetic groups) support a series of quick mental operations, including recognition, memory, and making inferences regarding classified individuals or objects. For example, classifying something as metallic allows for expedited reasoning regarding the object’s look and feel, its hardness and weight, its durability, and the possibility that it could be magnetic. Exceptional metals, such as mercury or sodium–potassium alloy (NaK), which are liquid at room temperature, are handled as mental exceptions. Such exceptional categories, including flightless birds, still represent simplifications of the world, inasmuch as a general category and its exceptions is a reduction in complexity from mentally listing all the subordinate species of birds, much less from the extraordinary social and psychological burden of treating all birds as unspeciated individuals. While a classification is by its nature a simplification, it can be simple or complex in relation to another. Monothetic classification, as a generalization, can be said to be simpler than polythetic classification. Once we have consensus regarding membership criteria for a class, the process of assigning individual cases to the class can be relatively simple.
For example, osteoporosis is defined as “A value of [bone mass density] 2.5 standard deviations or more below the young adult mean (T-score ≤ –2.5)” (WHO Scientific Group on the Prevention and Management of Osteoporosis 2003, 57), and the same report describes standard diagnostic procedures and reference standards for generating T-scores. Clear definitional criteria provide simple binary membership rules. The internal organization of monothetic classes is also comparatively simple. Once an individual meets the membership criteria, they are in the class, and all members of the class are substitutable for each other with respect to those definitional characteristics. This relative simplicity of monothetic classifications isn’t to suggest that all such classifications are easy to construct. In fact, because the membership criteria are of high consequence (disallowing partial membership at the margins of the class, for example), it can be difficult to specify those criteria so that the resulting class includes what we want to include, and excludes what we have habitually or tacitly excluded. Bennett (1980) and Gould (1981) discuss the construction of the class “zebra,” which traditionally refers to several extant and extinct species of the genus Equus. Gould asks whether “they form a true evolutionary unit” (6) and finds that the class as popularly constructed may not meet the cladistic requirements derived from evolutionary theory. How do we proceed? Do we admit that we have irreconcilable differences on the construction of the class and arbitrarily choose one method for developing definitional characteristics, or do we gerrymander the class so that it includes what we want to include and rig the rules of definition so we get the right result? And just because we have definitional characteristics doesn’t mean they are easy to apply.
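The osteoporosis definition quoted above is a monothetic rule of exactly this crisp, binary kind. The Python sketch below expresses it under stated assumptions: it uses the usual T-score form, (measured density − young-adult mean) / young-adult standard deviation, and the reference mean and standard deviation in the example are hypothetical placeholders, not WHO reference data.

```python
# Monothetic membership: a single numeric criterion yields a crisp in/out decision.
# Reference values used below are hypothetical placeholders, not WHO reference data.

OSTEOPOROSIS_THRESHOLD = -2.5  # T-score cut-off from the quoted WHO definition


def t_score(bmd: float, young_adult_mean: float, young_adult_sd: float) -> float:
    """Standard deviations by which a bone density lies from the young-adult mean."""
    return (bmd - young_adult_mean) / young_adult_sd


def is_osteoporotic(bmd: float, young_adult_mean: float, young_adult_sd: float) -> bool:
    """Binary rule: a case is in the class iff its T-score is <= -2.5.
    There is no partial membership and no gradation among members."""
    return t_score(bmd, young_adult_mean, young_adult_sd) <= OSTEOPOROSIS_THRESHOLD


if __name__ == "__main__":
    mean, sd = 1.0, 0.125  # hypothetical young-adult reference values
    for bmd in (0.6875, 0.75, 0.5):
        print(bmd, t_score(bmd, mean, sd), is_osteoporotic(bmd, mean, sd))
    # 0.6875 -2.5 True   (exactly on the boundary: still a member)
    # 0.75 -2.0 False
    # 0.5 -4.0 True
```

Note that the case at T = −2.5 and the case at T = −4.0 are equally members: with respect to the definitional criterion, they are substitutable, which is the internal simplicity the paragraph describes.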
The Miller test (United States Supreme Court 1973) provides the following definitional criteria for the class “obscene materials” (24):

a. whether “the average person, applying contemporary community standards,” would find that the work, taken as a whole, appeals to the prurient interest …;
b. whether the work depicts or describes, in a patently offensive way, sexual [or excretory] conduct specifically defined by the applicable state law; and
c. whether the work, taken as a whole, lacks serious literary, artistic, political or scientific value.

To be considered obscene, a work must meet all three criteria. The third criterion, known colloquially as the SLAPS test, is, like many legal definitions, notoriously complex to interpret and apply, requiring (inter alia) an assessment of literary value. The U.S. Internal Revenue Code of 1986 (United States 2018) is primarily a classification that assigns people into tax brackets; at 3,842 pages, it can only be characterized as a highly complex monothetic classification. However, despite their potential complexities, monothetic classifications are generally simpler than polythetic classifications. Like a good theory, monothetic membership criteria are designed to be parsimonious and limited in number, and when well designed, they aim at clear and meaningful distinguishing markers of membership. Monothetic criteria are designed to yield clear outcomes (either in or out), and therefore the internal structure of a class is monotonic: with regard to membership criteria, each member of a class is substitutable for another. The standard description of polythetic classification is given in Wittgenstein's notion of family resemblances, which describes a nuanced yet relatively simple form of polytheticism based on a limited number of generally well-understood, though perhaps hard-to-measure, criteria.
“We see a complicated network of similarities, overlapping and criss-crossing: similarities in the large and in the small … I can think of no better expression to characterize these similarities than ‘family resemblances’: for the various resemblances between members of a family—build, features, color of the eyes, gait, temperament, and so on …” (Wittgenstein 1953/2009, §66–67). The notion of family resemblances is perhaps best understood as involving a small, discrete number of criteria whose similarities are intuited rather than measured. Compared to classical monothetic groupings, polythetic classes are more nuanced and complex. Definitional criteria are potentially more numerous, with an appeal to a gestalt of formal patterns of overlapping criteria. Because not all definitional criteria are universally present, and because they are best understood in relation to each other, the internal structure of a class generally includes membership gradation, with central and marginal members and fuzzy membership boundaries. The distinction between monothetic and polythetic classifications along an axis of complexity is fairly straightforward: discrete definitional criteria, all of which must be present, versus a shifting array of definitional characteristics, incorporating nuances of similarity and difference.

2.0 The Construction of Numerical Classifications

Sokal and Sneath define numerical classification as “the grouping by numerical methods of taxonomic units based on their character states” (1973, xii). For Sokal and Sneath, numerical classification is a type or subset of polytheticism: it is polytheticism with empirical observations, concrete measurements, and statistical assessments of similarity and correlation, as well as an appeal toward scientific justification or, alternatively, scientific pretense. Montoya (in press) states that numerical classification is the grouping of entities (organisms, documents, data, etc.)
using quantifiable measures of evaluation (such as traits, terms, values, etc.). Numerical taxonomy includes the processes of selecting representative entities, weighting entity values, clustering entities based on these values, relating entities into clusters, and incorporating them into a system based on uniform theoretical commitments. Strictly speaking, numerical classification entails that membership criteria be expressed in numerical form, and the monothetic classification defining osteoporosis is precisely of that nature. We argue that while monothetic, polythetic, and numerical classifications each include simple and complex examples, they are not coterminous, and they can be viewed as occupying different positions along an axis of complexity. The purpose of our paper is to work toward a classification of classifications. One dimension of this classification is to consider the complexity of the similarities or resemblances that form the basis of a classification. Eventually we will discuss additional modalities of classification, such as Hjørland's (1998; 2017) differentiation of classifications by their “methodology” of class formation, or the various kinds of theoretical commitments implicated by various forms of classification. Queer classification in particular is a useful concept for understanding the implications of those theoretical commitments, and emphasizes membership criteria that are semiotic (i.e., interpretive and socially constructed) in nature. However, in this paper we present an initial assessment of complexity in relation to monothetic, polythetic, and numerical classification, to determine whether complexity is a useful criterion for comparing classifications generally. Finally, we believe that numerical classification is perhaps the least understood and least theoretically integrated form in the knowledge organization literature, so the bulk of our remaining comments will address it primarily.
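To make the proposed axis of complexity more concrete, the Python sketch below applies three membership rules to the same data. Everything in it is hypothetical: the five taxa and six binary characters are invented, and the "at least k of n characters" rule is only one simple way to operationalize polythetic membership (no single character necessary, none sufficient on its own). The numerical step uses the simple matching coefficient with single-linkage clustering cut at a threshold; these are one common similarity coefficient and one of several clustering methods used in numerical taxonomy, not the specific procedure of Sokal and Sneath or of Montoya.

```python
from itertools import combinations

# Hypothetical binary character states (1 = present) for five taxa over six characters.
TAXA = {
    "A": (1, 1, 1, 0, 0, 0),
    "B": (0, 1, 1, 1, 0, 0),
    "C": (1, 0, 1, 1, 0, 0),
    "D": (1, 1, 0, 1, 0, 0),
    "E": (0, 0, 0, 0, 1, 1),
}


def monothetic_member(states, required):
    """Monothetic rule: every required character must be present."""
    return all(states[i] == 1 for i in required)


def polythetic_member(states, features, k):
    """One simple polythetic rule: at least k of the listed characters present,
    so no single character is necessary and (for k > 1) none is sufficient."""
    return sum(states[i] for i in features) >= k


def simple_matching(x, y, weights=None):
    """Simple matching coefficient: (weighted) share of characters in the same state."""
    if weights is None:
        weights = [1.0] * len(x)  # equal weighting of characters by default
    matched = sum(w for a, b, w in zip(x, y, weights) if a == b)
    return matched / sum(weights)


def single_linkage_groups(taxa, threshold):
    """Single-linkage clustering cut at `threshold`: taxa joined by any chain of
    pairwise similarities >= threshold end up in the same group (union-find)."""
    parent = {name: name for name in taxa}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for a, b in combinations(taxa, 2):
        if simple_matching(taxa[a], taxa[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in taxa:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


if __name__ == "__main__":
    core = (0, 1, 2, 3)
    print({n: monothetic_member(s, core) for n, s in TAXA.items()})     # all False
    print({n: polythetic_member(s, core, 3) for n, s in TAXA.items()})  # A-D True, E False
    print(single_linkage_groups(TAXA, threshold=0.6))  # [['A', 'B', 'C', 'D'], ['E']]
```

In this toy data the monothetic rule requiring all four core characters admits no taxa at all, whereas the polythetic rule admits A, B, C and D even though none of the four characters is shared by every member. The numerical grouping recovers the same cluster from overall similarity across all six characters, but only by computing every pairwise coefficient, which hints at how complexity grows with the number of features and taxa.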
2.1 Assessing the Complexity of Numerical Classification

While we have defined numerical classification above as strictly numerical in nature, in its canonical form as presented by Sokal and Sneath (1963) it has a number of additional features. The features used in numerical classification, which is aimed at the identification of species and their evolutionary development, are not merely numerical but also numerous. For example, Bennett (1980, 273–274) identified 21 features in her analysis of zebras, including:

1. Number of functional digits;
2. Degree of isolation of protocone and hypocone;
3. Presence and size of preorbital facial fossa;
4. Presence and degree of development of secondary infundibular fold on I2;
5. Size and position of inferior canines;
...
21. Presence of opisthotic dolichocephaly.

Common uses of numerical classification incorporate large numbers of observations, which form a basis of similarity that can only be assessed statistically via complex computational techniques. The feature sets can be quite large, and generally a feature is included if it is hypothetically relevant to class formation. Because features have an uncertain relationship to any class, some features may be non-salient, and even redundant, in the sense that they strongly co-vary with another feature, raising the possibility that both features are consequences of some unidentified latent feature. These all indicate complexity in a potential classification, and they contrast with polythetic classifications, where features may be difficult to explain but all have at least tacit relevance to the class in question.

2.2 Consensus

How consensus functions in numerical classifications depends, in part, upon when in the process consensus is implemented as a rubric for structuration and decision-making.
If we look to consensus classifications in the biological or biodiversity world, as we see in the Global Biodiversity Information Facility (GBIF), for example, consensus is often used as a mechanism to create “taxonomic backbones” to which data points for various species are appended (GBIF 2019). In this case, consensus provides a generally agreed-upon taxonomy that can then serve as an organizational and access mechanism for species data that, at their points of origin, may or may not have been contextualized within taxonomies with the same philosophical or methodological commitments. Consensus backbones essentially present a universal structure to avoid the inevitable conflicts between one taxonomic opinion and another. The same kind of universal approach is used in many bibliographic systems as well, though often not in automated ways: the Dewey Decimal Classification (DDC), for example, uses disciplinary subject partitions to organize documents. The result is that with the DDC, as with GBIF, some class decisions run counter to pockets of expert opinion. The reorganization of the rosid clade of angiosperms is a case in point (Green and Martin 2013). Owing to the prevailing popularity and use of phylogenetic approaches, the DDC found sufficient warrant to change the schedule to reflect these new scientific approaches. Yet although phylogenetic approaches are now preferred and accepted, the same schedule must be used to organize even those documents that do not subscribe to this schema. Consensus as an organizational approach defers to the majority while acknowledging that scientific opinion is plural.
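The backbone mechanism described above can be sketched as a lookup that attaches records to a consensus taxonomy. This is a minimal sketch under stated assumptions: the backbone, synonym table, taxon names, and record fields are all hypothetical, and it does not reproduce GBIF's actual matching process. The design choice worth noting is that each record keeps the name under which it was classified at its point of origin alongside its backbone placement, so the two can still be told apart.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple

# Hypothetical consensus backbone: accepted name -> (family, order)
BACKBONE = {
    "Taxon alpha": ("Family X", "Order P"),
    "Taxon beta": ("Family Y", "Order P"),
}
# Hypothetical synonym table: name used by a contributing source -> accepted backbone name
SYNONYMS = {"Taxon alpha-old": "Taxon alpha"}

@dataclass(frozen=True)
class Occurrence:
    source: str                          # contributing dataset
    verbatim_name: str                   # name as classified at the point of origin
    accepted_name: Optional[str] = None  # placement in the consensus backbone
    higher_taxa: Tuple[str, ...] = ()

def attach_to_backbone(record: Occurrence) -> Occurrence:
    """Return a copy of the record placed in the backbone; the verbatim name is never overwritten."""
    name = SYNONYMS.get(record.verbatim_name, record.verbatim_name)
    if name not in BACKBONE:
        return record  # unmatched: no consensus placement
    return replace(record, accepted_name=name, higher_taxa=BACKBONE[name])

for rec in [Occurrence("dataset-1", "Taxon alpha-old"), Occurrence("dataset-2", "Taxon gamma")]:
    placed = attach_to_backbone(rec)
    print(placed.verbatim_name, "->", placed.accepted_name, placed.higher_taxa)
```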
Given this reality, entities placed within consensus systems should be understood to have several distinct, and perhaps conflicting, identities: on the one hand, they have their position within the consensus taxonomy, juxtaposed with other entities within an environment that is ostensibly more global in nature; on the other, that same entity may or may not, at its point of origin, have been constructed or contextualized on the same ontological terms. Complexity increases if we cannot wrest these two identities apart from one another. So, while the DDC is certainly consensus-based, a body of editors, as well as particular cited evidence, or warrant, can be pinpointed as the source of this change, and thus also as a source of bias in the classification's construction. In the case of GBIF and other automated synthetic systems, however, consensus decisions are not so easily visible, nor are the arbiters of change identifiable. GBIF’s backbone taxonomy is “updated regularly through an automated process in which the Catalogue of Life acts as a starting point also providing the complete higher classification above families” (GBIF 2019). It is precisely this “black box” of automation that makes numerical approaches especially complex and difficult to understand. If we think of a more embedded automated system such as the Google search engine, we see this phenomenon clearly: there is no way to understand how algorithmic techniques intervene to propose certain top-level classes for a seemingly endless corpus of online documents. In these spaces we can ask: Why do these terms (or traits, or phenomena, or entities) mean more than any others? How are entities and traits valued in relation to one another? And what were the alternative possibilities by which these results could have been interpreted to an equally valid state? Google is also a prime example of why the question of when consensus intervenes becomes important.
For Google, or any other dynamic searching or retrieval mechanism, classifications are dynamic: the principles of construction, the quantities being examined, and the resulting classes may differ each time we query a classification. For example, searching for a complex phrase in Google on one day can return a different set of results than on another day, because both the commitments guiding classification construction and the body of possible entities are changing. Humans also intervene in these algorithmic structures in ways that we cannot totally understand. Safiya Noble made this readily apparent in her work on race and algorithmic power (Noble 2018). In 2012, Noble published an article in Bitch magazine noting the marginalization and racial classification of “black girls” and women on Google’s interface (Noble 2012). “By August 2012,” Noble states, “Panda (an update to Google’s search algorithm) had been released, and pornography was no longer the first series of results for ‘black girls’; but other girls and women of color, such as Latinas and Asians, were still pornified” (2018, 4). In cases like these, automated consensus mechanisms, and the constant rate at which they are applied, confound our ability to understand them: even if (and that is a big if) we manage to understand the logic of classification at one moment, the results may have very little bearing a week later. Returning to the notion of consensus, it is necessary to state an obvious fact: consensus is not, despite rhetoric to the contrary, equatable to universal agreement, at least not in the case of classification. Any one person can contest decisions made through automated means, and, in fact, a critical approach to this work would support and popularize such contestation. We can perhaps go so far as to say that consensus may be more or less equivalent to authority, insofar as we, the users, acquiesce in some way to the fact that a given system will be authoritative in one situation or another.
GBIF, for example, is an authoritative source for data, but it makes no claims about agreement within the scientific community regarding the taxonomic perspective it propagates. Problems arise, as Noble again shows, when systems are treated as authoritative without critical analysis. Google results should hold no authority over our racial, ethnic, or cultural identities or their formation, and yet this is precisely how they are being used, whether purposefully or not. This point is critical to understanding, and limiting, the impending wave of classificatory systems resulting from the application of artificial intelligence solutions to big data, for example in “smart city” projects. The complexity of data and the sophistication and apparent neutrality of algorithms result in decisions that bear the authority of an only partially understood classificatory regime, but whose actual heuristics and resulting classificatory decisions make little sense and fail to provide justification or alternative possible constructions. Unquestioned algorithmic complexity is a dangerous social reality, and, as such, it makes good sense to delineate what we do and do not understand about these systems of organization.

2.3 Issues of Complexity: What We Know and What We Know We Don’t Know

The long history of scholarship in classification and knowledge organization provides a good grounding for understanding some of the basic known factors of classification that are fluid, contested, and arbitrary. There is not space here to mention them all, but some basic issues can be identified as they relate to numerical classification. When thinking about the quantification of factors in numerical and algorithmic taxonomic methods, we know well that the values we apply to attributes or entities, however well intended, are nonetheless arbitrary.
Let us take the example of a phenetic classification of pine trees, that is, a classification based on formal physical characteristics. There are some quantities that we might find important: that they are evergreen, the texture and structure of their bark, the dimensions and characteristics of their cones, needle count and position, height, etc. That these qualities are used to classify a pine tree is arbitrary, in that we could have identified any number of what might be considered non-essential characteristics: flexibility in the wind, utility as firewood, etc. Likewise, when we think of the numerical classification of documents, a system may use terms, co-term prevalence, document source, authorship, keywords, etc. These features stand in for what a document is about, and in documentary analysis this aboutness is essential to description but methodologically elusive. And then there are factors for classification that may be difficult to identify and measure. It wasn’t until genetic testing could identify sequences of importance (the COI or COX1 “barcoding” gene, for example) that phylogenetic approaches opened the door to revolutionary taxonomic methods in the biodiversity sciences. We could finally “measure” organism classes in a way that was “universal” and replicable. In the bibliographical world, the notion of relevance (Wilson 1973) has always been identified as central to information retrieval and selection, and yet truly quantifying relevance in a way that meets the searching criteria for countless moments of need still evades us. Relevance is the primary goal of search engines, given that searches are explicitly intended to satisfy some situational need. And so we know that there are some qualities that are obvious and some quantities that are difficult to define, and in both cases what we choose to seek out is wholly arbitrary, based on our assumptions about the world, our ontological commitments, and our contextual purposes for organizing.
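The arbitrariness of character selection can be made concrete with a deliberately crude sketch. Four hypothetical trees (names, characters, and values are invented for illustration) are grouped by exact agreement on whichever characters are selected, a simplification rather than a full phenetic analysis. Selecting cone and needle characters yields one partition; selecting a "non-essential" character such as firewood utility yields a different partition of the same specimens.

```python
from collections import defaultdict

# Hypothetical specimens, each described by more characters than any single classification uses
TREES = {
    "tree1": {"needles_per_fascicle": 2, "cone_length": "short", "firewood": "good"},
    "tree2": {"needles_per_fascicle": 2, "cone_length": "short", "firewood": "poor"},
    "tree3": {"needles_per_fascicle": 5, "cone_length": "long", "firewood": "good"},
    "tree4": {"needles_per_fascicle": 5, "cone_length": "long", "firewood": "poor"},
}

def partition(specimens, characters):
    """Group specimens that share identical values on every selected character."""
    groups = defaultdict(list)
    for name, traits in specimens.items():
        key = tuple(traits[c] for c in characters)
        groups[key].append(name)
    return sorted(groups.values())

print(partition(TREES, ["needles_per_fascicle", "cone_length"]))  # [['tree1', 'tree2'], ['tree3', 'tree4']]
print(partition(TREES, ["firewood"]))                             # [['tree1', 'tree3'], ['tree2', 'tree4']]
```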
On top of the qualities we use to class entities, we must also bring our own hermeneutic skills to interpreting their meanings as they relate to one another. Relationships are neither given nor obvious, and will always depend on the context in which they are meant to function. “To specify a relationship, we may first designate all the parties bound by the relationship (hereafter referred to as the participants in the relationship) and then specify the nature of any relationship that binds them together” (Green 2008). This means that relationships made by a scientist using phylogenetic methods, for example, will be based on specific and arbitrary quantities and articulated through equally arbitrary thresholds for a given set of taxa (even if the decisions are evidence-based and properly “scientific”). But these constructed relationships are not natural relationships: they are imposed interpretive frames. In the end, any given decision may rest on clear guidelines or on tacit or unconscious factors. In phylogenetics, clear mechanisms for distinguishing one species from another can be identified, along with the thresholds used to assess taxa. When Francis Galton was using composite photography to classify types of criminals in the last quarter of the 19th century, however, racial factors were clearly taken into consideration. The history of the classification of race and humans is riddled with these conscious and unconscious biases (see, for example, Smith 2015). Other unconscious biases shape classifications in ways that are less easily identified. Ontological commitments, for example, are sometimes difficult to unearth archaeologically in certain biological taxonomies without the producer on hand to explain particular decisions.
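The dependence of the resulting groups on an arbitrary threshold can be sketched as follows. The example assumes hypothetical pairwise sequence divergences between four specimens and links any two specimens whose divergence falls at or below a chosen cut-off (single-linkage grouping via a union-find structure). Distance cut-offs are sometimes used in this way in DNA barcoding, but the values here are invented. Moving the threshold from 0.02 to 0.05 turns three groups into two, although the data have not changed.

```python
def threshold_groups(names, distance, threshold):
    """Single-linkage grouping: specimens are connected when their distance is <= threshold."""
    parent = {n: n for n in names}

    def find(n):
        # Follow parent pointers to the group's representative (union-find with path halving)
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for (a, b), d in distance.items():
        if d <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical pairwise sequence divergences between four specimens
NAMES = ["s1", "s2", "s3", "s4"]
DIST = {("s1", "s2"): 0.01, ("s1", "s3"): 0.04, ("s1", "s4"): 0.09,
        ("s2", "s3"): 0.03, ("s2", "s4"): 0.08, ("s3", "s4"): 0.07}

print(threshold_groups(NAMES, DIST, 0.02))  # [['s1', 's2'], ['s3'], ['s4']]
print(threshold_groups(NAMES, DIST, 0.05))  # [['s1', 's2', 's3'], ['s4']]
```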
Numerical classifications add a further, significantly more complicated layer of complexity: statistical models are both mathematically complex and difficult to reverse-engineer in order to understand class partitions at any given point in time. Looking once again at Noble’s work (2018), an essential problem with automated organization is that it becomes very difficult to identify the location(s) of error when assessing a given set of results. Error in this sense is multivalent: it is locational (there is ostensibly a code location and directive to find), temporal (when, in fact, did this decision occur in a long sequence of decision points?), and multivariable (what variables or quantities were being referenced at that particular point in time?).

3.0 Function

Our paper has attempted to appraise three kinds of classifications in terms of their complexity. Monothetic, polythetic, and numerical classifications are judged to be increasingly complex, by virtue of the number and nature of their definitional characteristics, their internal structure, the nature of consensus in their formation, and the comprehensibility of their resulting classes. However, as we have also noted, there are simple and complex examples within each type of classification. While we believe that monothetic classifications are generally simpler than their polythetic and numerical counterparts, such a conclusion may be shaped by the way we have generally understood each type of classification. By choosing Sokal and Sneath, for example, to represent the canonical form of numerical classification, we may have unwittingly opted into a more complex version of that classification, even though we would not typically treat the taxonomy of species as the primary, let alone exclusive, application of any one type of classification. Additionally, we are certain that we have not located all the various modalities of classificatory complexity.
Definitional characteristics and the nature of consensus have figured prominently in our previous work on classifications, but there are certainly other salient facets of complexity that we have not yet explored. For example, we know that the automated and numerical classifications that predominate in common web-based applications are also closed and proprietary (e.g., Pasquale 2015), making them inherently more complicated to assess. This paper should be viewed as an initial foray into assessing the complexity of classification. Additionally, we have been operating under the assumption that the three kinds of classifications presented here are substantially different in kind. There may yet be a theory that presents each of them, in sequence, as a generalization of the previous model; perhaps, for example, monothetic classification is a more specific kind of polythetic classification, with more precise definitional characteristics. That a given classification is more complex may not, strictly speaking, mean that it is actually different in kind. Finally, we need to consider that complexity may ultimately not be the right dimension for explaining differences among classifications. Another characterization, whether one that partially replicates the criteria used here or one that is entirely novel, might be more effective at differentiating and understanding classifications. Complexity is simply our first, best attempt at understanding classifications. Numerical classifications, in their automated and recent guises, are relatively new entries in the history of classification, and accommodating them into our understanding of classifications generally is an unfolding process. Over time, the novelty of numerical classification, and the significance of its differences from antecedent models, may fade, and numerical classification may eventually be viewed as no different from its polythetic cousins.
But right now, as automated approaches emerge in novel and not entirely welcome ways, with uncertain social and political consequences, we endeavor to understand what is new, and what is the same, in numerical and automated approaches to classification. Numerical classifications certainly feel more inventive, riskier, and more complex. Deployed as they are without users fully comprehending how they operate, they represent a new period in the history of classification, and their complexity masks uncertainty about the consequences of their use.

References

Bennett, Debra. 1980. “Stripes Do Not a Zebra Make, Part I: A Cladistic Analysis of Equus.” Systematic Biology 29, no. 3: 272–287.
Global Biodiversity Information Facility. 2019. GBIF Backbone Taxonomy. Copenhagen, DK: GBIF Secretariat.
Gould, Stephen Jay. 1981. “What, If Anything, Is a Zebra?” Natural History 90, no. 7: 6–12.
Green, Rebecca. 2008. “Relationships in Knowledge Organization.” Knowledge Organization 35, no. 2/3: 150–159.
Green, Rebecca, and Giles Martin. 2013. “A Rosid Is a Rosid Is a Rosid . . . or Not.” Advances in Classification Research Online 23, no. 1: 9–16.
Hjørland, Birger. 1998. “The Classification of Psychology: A Case Study in the Classification of a Knowledge Field.” Knowledge Organization 25: 162–201.
Hjørland, Birger. 2017. “Classification.” Knowledge Organization 44: 97–128.
Lakoff, George. 1987. Women, Fire, and Dangerous Things: What Categories Reveal About the Mind. Chicago: University of Chicago Press.
Montoya, Robert. In press. “Numerical Classification.” Knowledge Organization.
Needham, Rodney. 1975. “Polythetic Classification: Convergence and Consequences.” Man 10, no. 3: 349–369.
Noble, Safiya Umoja. 2012. “Missed Connections: What Search Engines Say about Women.” Bitch, no. 54 (Spring): 36–41.
Noble, Safiya Umoja. 2018. Algorithms of Oppression: How Search Engines Reinforce Racism. New York: New York University Press.
Pasquale, Frank. 2015. The Black Box Society: The Secret Algorithms That Control Money and Information. Cambridge, MA: Harvard University Press.
Rosch, Eleanor. 1978. “Principles of Categorization.” In Cognition and Categorization, edited by Eleanor Rosch and Barbara B. Lloyd. Hillsdale, NJ: Lawrence Erlbaum Associates, 27–48.
Smith, Justin E.H. 2015. Nature, Human Nature, and Human Difference: Race in Early Modern Philosophy. Princeton, NJ: Princeton University Press.
Sneath, Peter H.A. 1962. “The Construction of Taxonomic Groups.” In Microbial Classification: Twelfth Symposium of the Society for General Microbiology, edited by Geoffrey C. Ainsworth and Peter H.A. Sneath. Cambridge: Cambridge University Press, 289–332.
Sokal, Robert R., and Peter H.A. Sneath. 1963. Principles of Numerical Taxonomy. San Francisco, CA: W. H. Freeman.
United States. 2018. U.S. Code: Title 26, Internal Revenue Code. content/pkg/USCODE-2018-title26/pdf/USCODE-2018-title26.pdf.
United States Supreme Court. 1973. Miller v. California. 413 U.S. 15 (1973).
WHO Scientific Group on the Prevention and Management of Osteoporosis. 2003. Prevention and Management of Osteoporosis: Report of a WHO Scientific Group. Geneva, Switzerland: World Health Organization.
Wilson, Patrick. 1973. “Situational Relevance.” Information Storage and Retrieval 9, no. 8: 457–471.
Wittgenstein, Ludwig. 1953/2009. Philosophical Investigations, 4th ed., trans. G.E.M. Anscombe, P.M.S. Hacker, and Joachim Schulte. Malden, MA: Wiley-Blackwell.
Deborah Lee – Department of Library and Information Science, City, University of London, United Kingdom
Lyn Robinson – Department of Library and Information Science, City, University of London, United Kingdom
David Bawden – Department of Library and Information Science, City, University of London, United Kingdom

Operatic Knowledge Organisation
An Exploration of the Domain and Bibliographic Interface in the Classification of Opera Subgenres

Abstract: The classification of Western art music is notoriously complex, and the classification of opera subgenres presents distinct challenges. This paper therefore considers the classification of opera subgenres from a knowledge organisation perspective. It starts with a short examination of key ideas from genre theory and musicological writings on genre, as well as the idea of opera subgenres. The categorisation of opera subgenres in the music domain is then examined, utilising key music essays and sources. The large number of opera subgenres is identified, and the subgenres are explored using the framework of hierarchical, equivalence and associative relationships. The treatment of opera subgenres in eight bibliographic classifications is examined, where it is found both to reflect the disarray of the music domain and to display distinct discords with it. A model is proposed which considers the classification of opera subgenres on two planes, combining the web of relationships between subgenres (inter-subgenre plane) with the categorisation of each subgenre’s constituent attributes (categorisation plane).

1.0 Introduction

Opera is a significant part of the study and performance of Western art music. Yet opera does not appear to have a systematic classificatory framework for its subgenres, suffering from an unmanageable number of subgenres and the lack of a standardised set of them.
While the overall facets of music have been studied in knowledge organisation (Elliker 1994; Lee 2017a), and the medium facet has received particular attention (Lee 2017b; Lee and Robinson 2018), the form/genre facet within Western art music has not been deeply analysed. This paper therefore considers the classification of opera subgenres. It utilises knowledge organisation theories and concepts to explore the classification of opera within the music domain, and to compare this with the treatment of opera subgenres within bibliographic classification schemes. Knowledge organisation is thus employed to help understand and disentangle the so-called chaotic nature of opera subgenres. The paper starts with a short review of key ideas about music genres and opera subgenres, from the perspective of Western art music. Next, the classification of opera is explored from the perspective of the music domain. Some musicological sources are analysed to illuminate the number of opera subgenres and to consider the relationships between different subgenres. The treatment of opera subgenres in eight bibliographic classification schemes is then considered, demonstrating interesting discords with the music domain. Finally, a model for classifying opera subgenres is presented, which explores how the categorisation of the constituent parts of opera subgenres can interplay with relationships between subgenres.

2.0 Introducing genre-as-classification

Studies and analysis of the idea of genre have a long pedigree, and many different domains are interested in conceptualising and utilising the idea of genre. Frow (2006, 10), working in the realm of critical theory and literature, defines genre as “… a set of conventional and highly organizing constraints of the production and interpretation of meaning”, showing how genre is concerned with structure and rules for description.
Genre’s role as a way of distinguishing things, and its taxonomic function (Frow 2006), are discussed by genre theorists and by those within the music domain (see, for example, Holt (2007) writing about popular music). Although the role of genre goes far beyond that of a taxonomic device (Andersen 2015), this paper is concerned with the taxonomic idea of genre, sitting alongside other papers in knowledge organisation which discuss genre categorisation in artistic forms (for example, Rafferty 2010). What is meant by a musical genre requires consideration. First, although the idea of a musical genre can have a wide range of meanings, in this paper genres are considered to be individual groups of works within the Western art music tradition, rather than “Western art music” itself being the genre. Second, the position of genre in the faceted classification of music can become blurred with form (for further discussion see Elliker 1994 and Lee 2017a). Third, it is useful to consider what elements make up a genre. Frow (2006), writing from a general genre theory perspective, considers a genre to be constituted by a number of aspects, including formal features, thematic structure, physical setting, “situation” and more. Tereszkiewicz (2014) says that works of the same genre will have similarities in form, content and function. Noteworthy in these writings is the presence of form and function, themselves often considered facets of music. Frow (2006) also includes extrinsic qualities, showing how genre moves beyond the intrinsic qualities of a musical work. Music discourse also seeks to define a musical genre’s attributes. Dahlhaus (1987) is a useful source and suggests that genre consists of text, function, medium and form. (These aspects have been translated into standardised music classification terminology.)
The idea of medium (who is playing or singing the work) is particularly important: Samson (2015) positions instrumentation as a defining feature of genre, while Dahlhaus (1987) defines genre as the expected connection between form and medium. This paper is concerned in particular with subgenres. Subgenres are defined by the Oxford English Dictionary (“subgenre, n.” 2019) as “A subdivision of a genre of literature, music, film, etc.”. A brief perusal of music literature suggests that subgenre is a valid term for types of opera: for example, Carter (2014) and Senici (2014) use the term subgenre in essays about opera, and a key encyclopedia entry for opera (Brown et al. 2001) also uses the term when referring readers elsewhere. So, what does it mean to be a subgenre of opera? At its most literal, any specific type of opera will count. When and how a subgenre becomes a subgenre in its own right, and on whose authority, is an intriguing question. For example, opera seria is a significant subgenre of opera (McClymonds and Heartz 2001); however, these works were called dramma per musica at the time they were written, with the term “opera seria” being adopted by those writing from a historical viewpoint (McClymonds and Heartz 2001). This illuminates how subgenre creation and the categorisation of operas into subgenres can be enacted by those removed from the works’ creation, such as historians and theorists.

3.0 The classification of opera in the music domain

The first stage of this research involves analysing the music domain’s conception of the classification of opera subgenres; to do this, sources of information from the music domain that illustrate the classification of opera subgenres are needed. Searching the literature does not reveal any standard knowledge organisation systems for subgenres of opera. Instead, the alphabetical list of 69 “see also” links in the Grove Music Online article for opera (Brown et al. 2001) is a starting point; Grove Music Online (2020; henceforth abbreviated to its common name of Grove) is the seminal encyclopaedia and source for the study of music. Importantly, two musicologists writing generally about opera and genre, Campana (2012) and Carter (2014), utilise this list when reflecting upon opera subgenres, and indeed Campana (2012) refers to the list as a typology. The Grove typology (Brown et al. 2001) contains 69 terms, of which 66 are types of musical-dramatic works. However, the Grove typology presents some issues. First, Campana (2012, 221) refers to the typology as something produced “without any ambition of thoroughness”. This is confirmed by the inclusion of three non-genre terms (verismo, libretto, Jesuits) and the exclusion of confirmed subgenres of opera. Second, the typology raises some questions about what is included within the boundary of opera, such as whether works for dissemination through film are really subgenres of opera. Third, the typology is an alphabetical list of generic labels, so further sources are needed to contemplate the structure of opera subgenres. Supplementary sources will therefore also be used. Campana (2012) also refers to the Wikipedia table of subgenres of opera. This Wikipedia table (“List of opera genres” 2019) is a useful resource: some of its subgenres are not found in the Grove typology, and some entries include descriptions of their relationships with other terms in the table. It is also pertinent to supplement these list-like KOSs with ideas about classification not contained within an actual KOS. Two musicological essays which (briefly) discuss the classification of opera subgenres will be used: an essay on genre and poetics by Campana (2012) and an essay questioning the nature of opera by Carter (2014). Finally, Grove entries for specific subgenres may also include implicit information about classification, so a selection of these can also be harvested.
The musicological sources indicate that there are a large number of opera subgenres, and the wording used by musicologists suggests that this high number is not always helpful. For example, Campana (2012, 202) talks about the “copious and disparate typologies” found in music dictionaries (although she only explicitly mentions one music source) and later comments on the “sheer number of generic labels” (204) which exist. Similar language is used by Carter (2014, 17), who describes the contents of the Grove typology as a “bewilderingly large number”. Doubts about every subgenre’s usefulness and necessity can also be read into these discussions (Campana 2012). The ways that subgenres are distinguished and labelled also attract attention: Carter (2014, 17), for example, describes the Grove typology as a “terminological minefield”. So, in musicological thought it can be inferred not only that there are a large number of opera subgenres, but also that this is unusual or unexpected. Moreover, the subgenres of opera are, to musicological eyes, chaotic in number and type. It is interesting to consider how the music domain contemplates relationships between subgenres of opera. As neither the Grove typology nor the Wikipedia table contains formal manifestations of relationships between subgenres, implicit information will be utilised instead, such as comments found in Grove entries for specific subgenres. Ideas about subgenre relationships will be identified from these music sources and then reframed in knowledge organisation terms. Hierarchical relationships are reflected in opera subgenres. For example, the subgenre conte lyrique has a short entry in Grove, where it is described as a “term used in the late 19th century for a particular kind of opéra comique” (“Conte lyrique” 2002; italics in original). This demonstrates a genus-species hierarchy (Aitchison, Gilchrist and Bawden 2000).
The subgenres of opera also present more complex hierarchies, such as polyhierarchical relationships. For example, the film musical (Traubner, Gayda and Snelson 2001) has a parent subgenre of musical, but also a parent in the genre of films. The fait historique presents a different sort of polyhierarchy. Bartlet (2002) describes it as “a type of late 18th-century French opéra or opéra comique …” (italics in original); in other words, its parent could be one of two specific opera subgenres. Hierarchically, the subgenre of fait historique as a whole has two possible parents, but each exemplar of the subgenre would have only one parent (unlike the film musical). All these examples raise questions about the number of levels within opera: is the fait historique a subgenre or a sub-subgenre? This in turn calls into question the ontological nature of the idea of opera subgenres. Other types of relationships are also implied. For example, in the Grove entry for commedia per musica (“Commedia per musica” 2001), the term commedia in musica is given as an alternative, which depicts an equivalence relationship. Diminutives are another example of equivalence relationships found in opera subgenres. For example, the burla is described in Grove (“Burla” 2001) as one type of comic Italian opera, which can have the diminutive terms burletta and burlettina. However, the term burletta has two meanings, as it can also refer to a particular type of English opera (Temperley 2001). This example demonstrates the complexity of equivalence relationships in opera, and the importance of separating relationships based on labelling from relationships based on meaning. Associative relationships are also present. For example, Märchenoper and opéra féerie are both subgenres with plots drawn from fairy tales (Millington 2001; Bartlet 2001). These two subgenres could be considered to have an associative relationship of an undefined nature.
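The three kinds of relationship identified above can be encoded in a small thesaurus-like structure. The sketch below is an illustrative assumption, not a scheme proposed by the sources: the concept keys and field names (`pref`, `alt`, `broader`, `broader_one_of`, `related`) are hypothetical, and only relationships reported in the Grove entries cited above are included. It distinguishes the film musical, every instance of which has both parents, from the fait historique, each instance of which has one of two parents, and an index from labels to concepts exposes the homograph burletta.

```python
from collections import defaultdict

# Illustrative concept records; keys and field names are hypothetical
CONCEPTS = {
    "conte_lyrique": {"pref": "conte lyrique", "broader": ["opera_comique"]},
    # Every film musical has both parents
    "film_musical": {"pref": "film musical", "broader": ["musical", "film"]},
    # Each fait historique has exactly one of these parents
    "fait_historique": {"pref": "fait historique", "broader_one_of": ["opera_fr", "opera_comique"]},
    "commedia_per_musica": {"pref": "commedia per musica", "alt": ["commedia in musica"]},
    "burla": {"pref": "burla", "alt": ["burletta", "burlettina"]},
    "burletta_en": {"pref": "burletta"},  # the English subgenre of the same name
    "marchenoper": {"pref": "Märchenoper", "related": ["opera_feerie"]},
    "opera_feerie": {"pref": "opéra féerie", "related": ["marchenoper"]},
}

def label_index(concepts):
    """Map every preferred and alternative label to the concept keys that use it."""
    index = defaultdict(set)
    for key, concept in concepts.items():
        for label in [concept["pref"], *concept.get("alt", [])]:
            index[label].add(key)
    return index

def homographs(concepts):
    """Labels shared by more than one concept: equivalent in label but not in meaning."""
    return {label: sorted(keys) for label, keys in label_index(concepts).items() if len(keys) > 1}

print(homographs(CONCEPTS))  # {'burletta': ['burla', 'burletta_en']}
print(len(label_index(CONCEPTS)), "labels for", len(CONCEPTS), "concepts")  # 10 labels for 8 concepts
```

Counting concepts rather than labels (8 against 10 here) is one simple way of separating distinguishable subgenres from alternative appellations.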
Figure 1 depicts the combined hierarchical, equivalence and associative relationships, using the example subgenre of Märchenoper. Some of Märchenoper’s possible relationships are shown, including its associative relationship with opéra féerie (whose two possible parents are shown via dotted lines). Note that three of the four equivalence relationships for Märchenoper, taken from Millington’s (2001) description of variant terms, appear as entries in the Wikipedia table (“List of opera genres” 2019); this highlights how some of the discussion about the number of subgenres (for example, Campana 2012) could actually relate to alternative titles and the instability of generic labels. Separating the distinguishable subgenres from mere alternative appellations can therefore help to order the chaos. Ultimately, examining relationships highlights the complexity of opera subgenres, and shows how knowledge organisation can usefully distil and disentangle the lists of subgenres found in sources such as Wikipedia and Grove.

4.0 The classification of opera in bibliographic classifications

Considering how opera is classified in bibliographic classification schemes is critical, and it is fruitful to compare this with the music domain. This comparison is aided by utilising the idea of accords and discords, from the framework of relationships between scientific and bibliographic classifications developed in Lee, Robinson and Bawden (2019). Eight bibliographic classification schemes are utilised for this purpose: British Catalogue of Music Classification (Coates 1960), Dickinson Classification (Dickinson 1938), Flexible Classification (Pethes 1967), Universal Decimal Classification (British Standards Institution 2006), Subject Classification (Brown 1914), Dewey Decimal Classification (Dewey et al. 2003), and McColvin and Reeves (McColvin, Reeves and Dove 1965). There is not space to reproduce a summary of the results here, but key results are identified below.
Interestingly, only eight out of the 17 consulted music classification schemes include any terms for specific subgenres of opera and opera-like genres (Lee 2017a).1 The first important point to note is the low number of opera subgenres. Relatively few subgenres are listed in the eight schemes: only 27 classes for opera subgenres are represented in total (though three classes contain multiple subgenres, as discussed below). Looking at the schemes without opera subgenres is also fruitful: for example, Library of Congress Classification (Library of Congress 2019) is generally an extremely detailed scheme, yet it does not list categories of opera or separate opera from other musical-dramatic works. Notably, some schemes state their mistrust of opera categorisation explicitly: for example, the Expansive Classification (Cutter 1891-1904) and Olding’s (1954) classification both state that they do not consider dividing opera into subcategories to be useful. This is in direct contrast to the music domain, where the large number of subgenres was a focal point, and hence shows discord between bibliographic classification and the music domain. There are a number of possible explanations. First, the main rationale for bibliographic classification schemes is retrieval; so, while many opera subgenres may exist, there may not be warrant for their inclusion in a bibliographic classification scheme. Second, the complexities and bewilderment commented upon by musicologists might lead to a lack of standardisation in subgenres, which in turn lowers the probability that subgenre information is useful to users. Third, the discord could reflect the shallower level of detail in bibliographic classification schemes compared with domain-based classifications.
However, the lack of opera subgenres in bibliographic classification schemes which are notoriously detailed (for example, Flexible Classification and Library of Congress Classification) suggests this is not the only (or even the primary) explanation. The eight bibliographic schemes also reveal a distinct lack of coherence among themselves: out of 27 classes, 14 appear in only one of the eight schemes. While six of these “single appearances” could be explained away as they are from a notoriously detailed scheme (Flexible Classification), the other five cannot. The lack of coherence could be considered a realisation of classification chaos, perhaps reinforcing the views of the Olding and Expansive classifications about the foolishness of trying to categorise opera. Fourteen subgenres appear in the combined bibliographic schemes which do not appear in the Grove typology or Wikipedia table, showing further discord between the music domain and bibliographic classification. In some cases the bibliographic classification schemes are more detailed than the music domain: for example, great operetta and small operetta appear in a bibliographic classification scheme but not in the Grove typology or Wikipedia table. This weakens any argument that the lack of subgenres in the bibliographic schemes is due to a lack of detail. In other cases, the bibliographic classification schemes have broader categories which do not refer to specific subgenres but could be considered broad types of opera, for example, light opera and comic opera.
1 The Library of Congress Genre/Form Terms provides a potential additional source for the bibliographic classification of opera subgenres. However, a cursory glance through the variant terms attached to the entry for opera (Library of Congress 2020) shows equivalent or less detail than the bibliographic schemes, so it has not been considered further within the space limitations of this paper.
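The coherence check described above amounts to counting, for each class, how many schemes list it. A minimal Python sketch follows; the scheme names and their contents are invented for illustration and do not reproduce the data of Lee (2017a), so the resulting figures say nothing about the actual schemes.

```python
from collections import Counter


def single_appearances(schemes):
    """Return the classes that occur in exactly one scheme."""
    counts = Counter(cls for classes in schemes.values() for cls in set(classes))
    return {cls for cls, n in counts.items() if n == 1}


# Hypothetical data: which scheme lists which class is invented for illustration.
schemes = {
    "scheme A": {"operetta", "great operetta", "small operetta", "comic opera"},
    "scheme B": {"operetta", "comic opera", "light opera"},
    "scheme C": {"comic opera", "ballad opera"},
}

all_classes = set().union(*schemes.values())
single = single_appearances(schemes)
print(f"{len(single)} of {len(all_classes)} classes appear in only one scheme")
# 4 of 6 classes appear in only one scheme
```

A high proportion of single appearances, as in this toy case, is what the text reads as a lack of coherence between schemes.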
This suggests a domain/bibliographic discord in the idea of opera’s units. The bibliographic classification schemes demonstrate some explicit relationships between subgenres. For example, Flexible Classification (Pethes 1967) has a hierarchical relationship in operetta, where the subclasses great operetta and small operetta have the class operetta as their parent. A form of equivalence can be seen in the use of combined classes for subgenres, such as McColvin and Reeves’s (McColvin, Reeves and Dove 1965) shared class for light opera, musical comedies and revues. So, there is some accord between bibliographic schemes and the domain, in that hierarchical and equivalence relationships are present, albeit with different degrees of implicitness.
5.0 Towards a model of classifying opera subgenres
There is, however, another way of contemplating the classification of opera subgenres, apart from their inter-subgenre relationships: considering the categorisation of the attributes of each subgenre. Sources in the music domain comment on this as a categorisation method. For example, Campana (2012) and Carter (2014) remark upon the different ways in which subgenres are delineated in the Grove typology, though neither author intends to provide a complete list of the distinguishing features found in the typology, nor to do so from a theoretical perspective. Campana’s (2012) and Carter’s (2014) combined list of attributes includes formal qualities (including the interrelation between speech, music and dance), subject matter, medium (in this case meaning the media of performance, such as radio or television), function (for example, school operas), national operas and subgenres (relating to the idea of place), and a sense of historical period. (Note that as medium has another meaning in music classification (Lee and Robinson 2018), the term “dissemination” will be adopted instead for the category containing foci such as television or film.)
At this juncture it is useful to revisit Dahlhaus’ (1987) general list of a genre’s constituents; this shows some overlap (form and function), and also adds the ideas of medium (who is playing and performing) and text. If we were to argue that what distinguishes one subgenre from another is the same as what constitutes that subgenre (or genre), we could combine both sets of factors. Therefore, loosely speaking, opera has (at least) eight constituents, which, translated into standardised music classification terms, are as follows: form, subject, dissemination, function, place, time, medium, and text.
Note that ideas such as place are complex in opera categorisation. Place can represent the boundaries of a subgenre’s world, its germination or its association. Furthermore, attributes do not always work independently. For example, there is a nebulous boundary between nationality and place, as hinted at in Carter’s (2014) depiction of “national genres”, and place can be associated with text via language. Ultimately, this categorisation of attributes is useful, but does not always distinguish between single subgenres: for example, opera buffa and burla both describe Italian comic operas of the 18th century. Put simply, categories are invaluable for studying shared properties of opera subgenres; however, they cannot always delineate one subgenre from another and cannot explicitly track genre development. So, a model is proposed in Figure 2, which visualises opera subgenres both as a system of relationships between individual subgenres and as the categories of information which constitute individual subgenres. The model has two planes: the inter-subgenre plane and the categorisation plane.
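The limitation noted for opera buffa and burla can be illustrated by recording each subgenre as a profile over the eight constituents and asking on which constituents two profiles differ. In the Python sketch below, `profile` and `differing_constituents` are hypothetical helpers, filing “comic” under subject is an assumption made for illustration, and the radio/television pair is a hypothetical contrast built on the dissemination examples mentioned above.

```python
from typing import Dict, Optional

# The eight constituents proposed in the text.
CONSTITUENTS = ("form", "subject", "dissemination", "function",
                "place", "time", "medium", "text")


def profile(**values: str) -> Dict[str, Optional[str]]:
    """Build a constituent profile; constituents not given are left as None."""
    unknown = set(values) - set(CONSTITUENTS)
    if unknown:
        raise ValueError(f"not a constituent: {sorted(unknown)}")
    return {c: values.get(c) for c in CONSTITUENTS}


def differing_constituents(a, b):
    """List constituents recorded in both profiles whose values differ."""
    return [c for c in CONSTITUENTS
            if a[c] is not None and b[c] is not None and a[c] != b[c]]


# Hypothetical profiles: assigning "comic" to subject is an assumption.
opera_buffa = profile(subject="comic", place="Italy", time="18th century")
burla = profile(subject="comic", place="Italy", time="18th century")
radio_opera = profile(dissemination="radio")
television_opera = profile(dissemination="television")

print(differing_constituents(opera_buffa, burla))             # []
print(differing_constituents(radio_opera, television_opera))  # ['dissemination']
```

Unrecorded constituents (None) are ignored rather than counted as differences, so two identical partial profiles, such as those for opera buffa and burla, cannot be told apart by categories alone; this is the gap the inter-subgenre plane is meant to fill.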
The inter-subgenre plane allows for the complexity and quantity of subgenres by disentangling the web of relationships between subgenres, whereas the categorisation plane shows how each subgenre contains categories of information, which could be seen as working in tandem with the subgenre-to-(sub)genre relationships found in the inter-subgenre plane. Links could be perceived between the categorisations and music’s facets, particularly where categorisations also appear as meta-facets (Elliker 1994), such as function, place, time and medium. (Form is, of course, already part of the form/genre facet (Elliker 1994; Lee 2017a).) These constituent-facet connections are shown with freely drawn blue arrows, representing the informal nature of these relationships. Finally, the associative relationship in the inter-subgenre plane could concurrently be viewed as a connection between subgenres which share particular constituents, especially for operas sharing the same subject material. This is also depicted with a freely drawn blue arrow, representing the intangible nature of this connection. Therefore, this model demonstrates how knowledge organisation can be used to unpick and bring order to the music domain’s multitudinous collection of opera subgenres.
6.0 Concluding thoughts
Opera subgenres are viewed as being somewhat tumultuous from the perspective of the music domain. This paper has analysed the opera subgenre soup in order to unpick what is happening. Using a knowledge organisation framework, it can be seen that at least some of the superfluous number of subgenres might be explained by the same subgenres appearing multiple times in resources under different labels. Furthermore, the subgenres of opera can be better understood as a complex web of different relationships, rather than as a one-dimensional list.
Lastly, the two-plane model suggests that there are connections between categorising the inter-subgenre relationships and categorising the information which informs the delineation of those subgenres. This is a novel way of considering the classification of (Western art music) genres, and could be used to examine other knotty sets of genres. Exploring the bibliographic classification of subgenres introduced some interesting ideas. Tangible discords with the domain are shown: the quantity of subgenres differs, as does the level of hierarchy represented by classes in some cases (both broader and narrower). These cannot be explained solely by the retrieval-focussed nature of bibliographic classification schemes. However, the findings can also be read as accord between scientific and bibliographic classifications: the lack of coherence in the bibliographic schemes’ categorisation of opera subgenres could be seen as a reflection of the confused and unruly set of opera subgenres found in the music domain. This paper is a preliminary step in furthering understanding of Western art music genre classification, and future research could apply a similar analysis to other Western art music genres. Furthermore, it would be productive to see how the results from this paper fit into genre classification research pertaining to other music traditions. It would also be interesting to contemplate other sorts of artistic works using the two-plane model. Overall, this paper illustrates how knowledge organisation can bring order to operatic chaos, and in the process advance our understanding of music classification and knowledge organisation more generally.
Figure 1. The subgenre relationships for Märchenoper
Figure 2. Model of the classification of opera subgenres
References
Aitchison, Jean, Alan Gilchrist, and David Bawden. 2000. Thesaurus Construction and Use. 4th ed. London: ASLIB.
Andersen, Jack. 2015. “What Genre Theory Does.” In Genre Theory in Information Studies, edited by Jack Andersen. Bingley: Emerald, 1-12.
Bartlet, M. Elizabeth C. 2001. “Opéra Féerie.” Grove Music Online.
Bartlet, M. Elizabeth C. 2002. “Fait Historique.” Grove Music Online.
British Standards Institution. 2006. UDC, Universal Decimal Classification. 3rd ed. London: British Standards Institution.
Brown, Howard Mayer, Ellen Rosand, Reinhard Strohm, Michel Noiray, Roger Parker, Arnold Whittall, Roger Savage, and Barry Millington. 2001. “Opera (i).” Grove Music Online.
Brown, James Duff. 1914. Subject Classification. 2nd ed. London: Grafton & Co.
“Burla.” 2001. Grove Music Online.
Campana, Alessandra. 2012. “Genre and Poetics.” In The Cambridge Companion to Opera Studies, edited by Nicholas Till. Cambridge: Cambridge University Press, 202-224.
Carter, Tim. 2014. “What is Opera?” In The Oxford Handbook of Opera, edited by Helen M. Greenwald. New York: Oxford University Press, 15-32.
Coates, Eric. 1960. The British Catalogue of Music Classification. London: Council of the British National Bibliography.
“Commedia per musica.” 2001. Grove Music Online.
“Conte lyrique.” 2002. Grove Music Online.
Cutter, C. A. 1891-1904. Expansive Classification. Boston: Cutter.
Dahlhaus, Carl. 1987. Schoenberg and the New Music. Trans. Derrick Puffett and Alfred Clayton. Cambridge: Cambridge University Press.
Dewey, Melvil, Joan S. Mitchell, Julianne Beall, Giles Martin, Winton E. Matthews, and Gregory R. New. 2003. Dewey Decimal Classification and Relative Index. 22nd ed. Dublin, Ohio: OCLC.
Dickinson, George Sherman. 1938. Classification of Musical Compositions: A Decimal-Symbol System. Reprinted in: Bradley, C. J. 1968. The Dickinson Classification: A Cataloguing & Classification Manual for Music, Including a Reprint of the George Sherman Dickinson Classification of Musical Compositions. Carlisle, Penn.: Carlisle Books.
Elliker, Calvin. 1994. “Classification Schemes for Scores: Analysis of Structural Levels.” Notes 50, no. 4: 1269-320.
Frow, John. 2006. Genre. London: Routledge.
Grove Music Online. 2020.
Holt, Fabian. 2007. Genre in Popular Music. Chicago: University of Chicago Press.
Lee, Deborah. 2017a. Modelling Music: A Theoretical Approach to the Classification of Notated Western Art Music. PhD dissertation. London: City, University of London.
Lee, Deborah. 2017b. “Numbers, Instruments and Hands: The Impact of Faceted Analytical Theory on Classifying Music Ensembles.” Knowledge Organization 44: 405-15.
Lee, Deborah, and Lyn Robinson. 2018. “The Heart of Music Classification: Towards a Model of Classifying Musical Medium.” Journal of Documentation 74: 258-277. doi:10.1108/JD-08-2017-0120.
Lee, Deborah, Lyn Robinson, and David Bawden. 2019. “Modelling the Relationship Between Scientific and Bibliographic Classification for Music.” Journal of the Association for Information Science and Technology 70, no. 3: 230-241. doi:
Library of Congress. 2019. [Library of Congress Classification]. M: Music and Books on Music.
Library of Congress. 2020. Linked Data Service: Library of Congress Genre/Form Terms (LCGFT).
“List of Opera Genres.” 2019. Wikipedia.
McClymonds, Marita P., and Daniel Heartz. 2001. “Opera Seria.” Grove Music Online.
McColvin, Lionel R., Harold Reeves, and Jack Dove. 1965. Music Libraries: Including a Comprehensive Bibliography of Music Literature and a Select Bibliography of Music Scores Published Since 1957. New ed. London: Andre Deutsch.
Millington, Barry. 2001. “Märchenoper.” Grove Music Online.
Olding, R. K. 1954. “A System for Classification of Music and Related Materials.” Australian Library Journal 3: 13-18.
Pethes, Iván. 1967. A Flexible Classification System of Music and Literature on Music. Budapest: Centre of Library Science and Technology.
Rafferty, Pauline. 2010. “Genre Theory, Knowledge Organisation and Fiction.” In Paradigms and Conceptual Systems in Knowledge Organization: Proceedings of the Eleventh International ISKO Conference 23-26 February 2010, Rome, Italy, edited by Claudio Gnoli and Fulvio Mazzocchi. Advances in Knowledge Organization 12. Würzburg: Ergon Verlag, 254-61.
Samson, Jim. 2015. “Genre.” Grove Music Online.
Senici, Emanuele. 2014. “Genre.” In The Oxford Handbook of Opera, edited by Helen M. Greenwald. New York: Oxford University Press, 33-52.
“Subgenre, n.” 2019. OED Online.
Temperley, Nicholas. 2001. “Burletta.” Grove Music Online.
Tereszkiewicz, Anna. 2014. Genre Analysis of Online Encyclopedias: The Case of Wikipedia. Cambridge: Cambridge University Press.
Traubner, Richard, Thomas L. Gayda, and John Snelson. 2001. “Film Musical.” Grove Music Online.
Daniel Libonati Gomes – Federal University of Pará, Brazil
Thiago Henrique Bragato Barros – Federal University of Rio Grande do Sul, Brazil
The Bias in Ontologies: An Analysis of the FOAF Ontology
Abstract: Knowledge Organization Systems (KOS), such as thesauri, classification schemes, taxonomies, or ontologies, are essential tools for the organization and representation of information in various contexts and are often understood as neutral tools without any bias. However, we argue that in representing information, even unconsciously, the person who creates the system may introduce some form of prejudice, that is, a bias. This selection of the elements represented is required in any KOS, since every representation has a specific function that is related to a context. Ontologies are an excellent example of this because, as Guarino, Oberle, and Staab (2009) state, these KOS need to delimit their goal to enable reuse and avoid problems arising from an excess of ontological commitment. With that in mind, we seek to discuss the possible biases that a KOS may have, focusing on ontologies and taking as our object of analysis the Friend of a Friend (FOAF) ontology.
This research is descriptive, with a qualitative approach. Its objective is to understand the implications of bias in these KOS, and to discuss how Knowledge Organization, as a field of study, can act in the development of tools that recognize their own bias and are still able to perform their functions. For the analysis, the theoretical framework of Discursive Semiotics is used, which studies the formation of meaning through a model called the Generative Trajectory of Meaning (GTM). From this perspective, we can understand bias as a product of semiotic processes – figurativization, thematization, and discursivization (Greimas and Courtés 2013) – involving the socio-cultural contexts of the KOS developer (Gomes and Barros 2019a, 2019b). From this theoretical understanding, all the elements that constitute the FOAF ontology – classes and properties – are analyzed, as well as its documentation available online. We concluded that bias is an inherent feature of a KOS and that Knowledge Organization could focus on studying technologies that enable information retrieval while taking this aspect of its tools into account, in order to: (1) go beyond the KOS bias, using, for example, "see also" connections that act as hyperlinks to systems with other biases that better fit the user's needs; or (2) "learn" the various perspectives that exist on the same topic, represent them in a KOS, and direct users to those best suited to their needs – in which case Machine Learning and Artificial Intelligence should enter the discussion, making these tools semantically richer.
1.0 Introduction
This study aimed to discuss the consequences of the presence of one or more biases in Knowledge Organization Systems (KOS), taking ontologies as an example of the presence of this bias.
For this, we analyzed the elements that make up the Friend of a Friend (FOAF) ontology, which, as its name implies, aims to represent individuals and their relationships. We also seek, through the analysis, to demonstrate the importance of understanding and making explicit the bias of a KOS, since taking it into account can be fundamental for efficient information retrieval.
2.0 The method
For the analysis of the FOAF ontology, we used the theoretical tools of Discursive Semiotics, considering that, based on this theory, it is possible to understand the formation of the meaning of a discourse, which, in the present case, is an ontology. By discourse, we mean the concretization, in language, of a particular social, historical, ideological, and environmental context (Possenti 2009). Thus, Discursive Semiotics studies the mechanism by which a given discourse is shaped, and when applied to ontologies or KOS in general, it can reveal important aspects of their constitution, especially those related to the factors that shape them (Gomes and Barros 2019b). Discursive Semiotics, for didactic reasons (Greimas and Courtés 2013), adopts a model called the Generative Trajectory of Meaning (GTM), which organizes the formation of discourse in two levels of depth, going from the semantic level to the discursive. It is essential to highlight that there are several GTM models; in this work we adopted the one developed by Greimas and Courtés (2013). The GTM has two aspects: (1) semionarrative structures and (2) discursive structures. A semiotic analysis starts from the discursive structures – which organize the contextual elements that form the discourse and make it understandable – and goes into the semionarrative structures – formed by elements called actants, which can act upon and transform one another.
The actants, in turn, are formed by even smaller and completely abstract units, called semes, which gain meaning from their interaction with opposite, complementary and contradictory semes. The interaction between semes generates what we can understand as the "meaning" or “particular meaning” of a given word. As the focus of this work is the bias that possibly exists in an ontology, we chose to pay closer attention to the GTM's discursive level. At this level, there is a series of operations which take place from the semionarrative structures, covering them with the contextual component mentioned above – the social, historical, ideological, and circumstantial context. The operations that occur at this level are:
• Discursivization: it makes explicit those involved in the discourse, forming actors (actorialization), as well as the space (spatialization) and time (temporalization) in which they were enunciated;
• Figurativization: the actors gain a semantic investment, becoming figures; that is, something we can understand as real;
• Thematization: an abstract thematic covering on which the figures act.
Thus, all operations that occur from the semionarrative to the discursive structures come into contact with some linguistic system, thereby forming lexemes (in a way, words), through which we interact and put the discourse into practice, considering a given context. However, this explanation is still too general and was designed especially for discourses in action, which is not the case with ontologies. These types of KOS are generally built with reuse in mind (Gruber 1995) and to represent a domain, and because of that they end up being quite generalist. In previous work (Gomes and Barros 2019a), we highlighted that the concepts that constitute ontologies can be understood, like any word, as lexemes formed from the semiosis operations explained above.
Thus, the concepts can be considered figures within the represented discursive universe; that is, they have gone through the process of figurativization, so that they have a precise semantic coating. Thematization occurs based on the domain being represented in the ontology, considering that the understanding of the concepts is only possible from the abstract coating given by the themes. Finally, based on the existence of the ontologist (after all, it is they who construct the discourse, the ontology), it is possible to affirm that actorialization, temporalization, and spatialization also occur. These processes allow us to situate concepts in relation to the referents they seek to represent, together with the perspective of those who produce the discourse. Figure 1 shows how the elements that make up an ontology (including its developer) can be viewed in light of the GTM's discursive structures.
Figure 1 – The GTM’s discursive structures on ontologies
Therefore, the discussion carried out in this work is based on the idea that the lexemes that form an ontology gain meaning through a series of semiotic operations involving the formation of figures – which gain meaning because they are linked to some theme – and the formation of the ontologist as a semiotic actor, present in a specific time and space. Thus, the study of the FOAF ontology started from the semiotic approach. This research is characterized as descriptive and qualitative. The FOAF ontology was analyzed from its documentation,1 which explains its objectives and constitution (classes and properties), using the concepts presented in this section as a theoretical foundation. The analysis sought to clarify how the semiotic processes that occur in the GTM discursive structures end up generating a bias in the ontology, even against the will of the ontologist.
Initially, we studied the general information on FOAF, such as its objectives and uses, and then moved on to the study of the classes and properties that comprise it. To check for the presence of bias in the ontology, we observed which of the classes and properties, as well as their descriptions in the documentation, engage with a more specific context or ideology – for example, which classes or properties represent things that are present in a given region of the world. Therefore, a class like foaf:Person,2 based on the distinction between what is and is not a person, says much less about the bias of the ontology than a property like foaf:gender, which has more significant social implications given the discussions about gender; so we chose to pay more attention to foaf:gender. Thus, we selected some of these classes and properties to guide the discussion about the ontology's bias. Starting from the explanation of the bias present in FOAF, we propose a discussion about how this bias can affect an ontology and the actions that can be taken so that this phenomenon does not necessarily become a problem.
1 FOAF Vocabulary Specification 0.99. Available at:
3.0 The FOAF ontology and its bias
The FOAF ontology, as stated in its documentation, was developed to connect people and information through the Web, and this information can be anything from documents in any medium, to data, or even just ideas in someone's mind.
For this, FOAF integrates three types of networks: “social networks of human collaboration, friendship, and association; representational networks that describe a simplified view of a cartoon universe in factual terms, and information networks that use Web-based linking to share independently published descriptions of this inter-connected world” (Brickley and Miller 2014). FOAF's terms are divided into three broad categories: (1) Core, formed by terms that describe people and groups regardless of time and technology; (2) Social Web, formed by terms related to activities carried out on the Web; and (3) Linked Data utilities, formed by terms that can be useful for the Web community to connect data. Despite this division, the documentation explicitly distinguishes only the terms of categories 1 and 2. It is worth noting that this ontology can always be updated with the insertion of new terms, and that old terms, called "archaic" in the documentation, are always maintained so that older forms can come back into use (Brickley and Miller 2014). Figure 2, taken directly from FOAF's documentation, lists all its classes and properties.
Figure 2 – FOAF classes and properties
As we can see, some classes and properties are quite general, as is the case with foaf:Agent, defined as something capable of doing something. This class can be used to represent situations in which the being acting in a given situation is not exactly a person or group of people (it can be a software bot, for example). A subclass of foaf:Agent is foaf:Person, used to represent people, who may be alive, dead, or not even exist at all.
2 To reference the names of the classes and properties of the ontology, we have chosen to use the same form as the one present in the FOAF documentation. Thus, something like foaf:Agent (capitalized) is a class, whereas foaf:knows (lowercase) is a property.
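The relationship between foaf:Agent and foaf:Person can be sketched as RDFS-style subclass inference. The Python below is a minimal illustration using plain tuples rather than an RDF library; the resources `ex:alice` and `ex:helperBot` are hypothetical, and only the subclass link stated above is encoded.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Only the subclass link stated in the text is encoded here.
schema = {("foaf:Person", SUBCLASS_OF, "foaf:Agent")}

# Hypothetical instance data.
data = {
    ("ex:alice", RDF_TYPE, "foaf:Person"),     # a person (alive, dead or fictional)
    ("ex:helperBot", RDF_TYPE, "foaf:Agent"),  # an agent that is not a person
}


def superclasses(cls, schema):
    """Return every class reachable from cls via rdfs:subClassOf, transitively."""
    found, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in schema:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def types_of(resource, data, schema):
    """Return asserted rdf:type values plus every superclass they entail."""
    asserted = {o for s, p, o in data if s == resource and p == RDF_TYPE}
    inferred = set(asserted)
    for cls in asserted:
        inferred |= superclasses(cls, schema)
    return inferred


print(sorted(types_of("ex:alice", data, schema)))      # ['foaf:Agent', 'foaf:Person']
print(sorted(types_of("ex:helperBot", data, schema)))  # ['foaf:Agent']
```

Anything typed as foaf:Person is thereby also a foaf:Agent, while a bot can be an agent without being a person, which is the broad scope the documentation aims for.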
This broader scope is fundamental in the case of FOAF, an ontology that aims to be widely reused in the most diverse situations involving the connection between people and information on the Web. In the theoretical scope of Information Science, a concept present in FOAF that generates much discussion is represented by the class foaf:Document. There are several perspectives on the concept of a document, such as those presented by Suzanne Briet (1951), Michael Buckland (1997), and Bernd Frohmann (2009). The FOAF documentation states only the following about this class: “The Document class represents those things which are, broadly conceived, ‘documents’” (Brickley and Miller 2014). With this rather broad definition of the class, it is already possible to perceive more clearly the presence of bias in the ontology, although that specific bias does not generate any negative consequences. The definition of a document is not a concern common to all areas, but it is crucial for Information Science, for which this object (Buckland's "information as a thing" (1991)) is one of its focuses of study. Thus, the ontology, even if indirectly and unintentionally, ends up “taking sides”, even if this does not produce harmful effects for what it proposes. However, there are elements in FOAF that are less subtle and that make the ontology's bias even more explicit. The foaf:gender property, already mentioned, can cover several different perspectives. In a more conservative ideological perspective, there are only two genders, male and female; however, in a more progressive perspective, gender is understood in a much less fixed way. The FOAF documentation says the following about that property: “The gender property relates an Agent (typically a Person) to a string representing its gender. In most cases, the value will be the string 'female' or 'male' (in lowercase without surrounding quotes or spaces).
Like all FOAF properties, there is, in general, no requirement to use gender in any particular document or description. Values other than 'male' and 'female' may be used, but are not enumerated here. The gender mechanism is not intended to capture the full variety of biological, social, and sexual concepts associated with the word 'gender'” (Brickley and Miller 2014). In other words, foaf:gender recognizes the diversity of perspectives on gender; so, to avoid taking a stand and to maintain a high level of generality in the ontology, the developers chose to leave the term open, without mandatory use in any circumstances, while indicating that the most common way to fill this property is with the strings "male" and "female." The authors themselves make it clear, later in the document, that they are aware of the difficulty of working with the concept of gender: "We have tried to be respectful of diversity without attempting to catalog or enumerate that diversity" (Brickley and Miller 2014). As explained in the previous section, we can account for this phenomenon of bias in an ontology using Discursive Semiotics: if an ontology is a discourse, then those responsible for developing it (in the case of FOAF, its ontologists) have a bias. These subjects, being inserted in a given spatial and temporal reality, and being able to act on the things that exist in the world, end up transferring to the ontology a portion of all the aspects that shape them. To understand this, we could imagine how the description of foaf:gender might have been different 30 years ago; that is, the current description is the result of the temporalization process explained previously, responsible for inserting the actors in a particular temporal reality. The spatial aspect of FOAF's discourse can be seen when comparing the pairs of properties foaf:firstName versus foaf:givenName and foaf:lastName versus foaf:familyName.
In 241 a way, each pair refers to the same thing, however, as the developers themselves claim, the “concepts of ‘first’ and ‘last’ names do not work well across cultural and linguistic boundaries; however they are widely used in address books and databases” (Brickley and Miller 2014), that is, although the two ways of referring to someone's first and last names are valid in some cultures, in others someone's last name is not the same as their family name. For example, in some Eastern countries, the first name is the family name, and the last name is the given name. We can also point out, as another example of the presence of contextual aspects of ontologists at FOAF, the inclusion of the property, labeled as “archaic”, foaf:dnaChecksum (which could be used to verify the data integrity in the DNA transference from a person), created as a joke by the developers. The only objective with this inclusion was to demonstrate the great diversity of properties that could be created to identify someone, some of which, the developers add, “we might find disturbing” (Brickley and Miller 2014). 4.0 The consequences of the presence of bias in FOAF and KOS in general In order to understand how a specific bias can affect an ontology like FOAF, it is worth highlighting some requirements that it must meet. According to Gruber (1995), an ontology must have: • clarity: the concepts present in an ontology must be clear and objective so that the definitions do not depend on social contexts or computational requirements. That is why ontologies are generally developed from a formal language, using logical axioms. Also, in order to facilitate the understanding of ontology by a human being, it is highly recommended that the definitions be documented in natural language (as is the case with the documentation analyzed here); • coherence: the axioms that make up the ontology must be coherent so that those logical inferences can be easily made. 
There can be no contradictions between the definitions;
• extendibility: an ontology must be developed taking into account that its vocabulary may be reused in other situations. Thus, the elements that make up the ontology must be open enough for new terms to be added without changing those already present;
• minimal encoding bias: ontologies must be formed by the concepts they intend to represent, regardless of the computational language used in their development, since they are used by different representation systems and styles of representation;
• minimal ontological commitment: an ontology must make as small an ontological commitment as possible, so that knowledge can be shared and reused and systems can interoperate. The fewer statements made about the universe of discourse, the better, so it is often preferable to make explicit only the necessary (but not sufficient) characteristics of a given concept (however, for reasons of clarity, a complete definition with necessary and sufficient characteristics should be provided whenever possible).
An ontology that “takes sides” very explicitly risks failing to meet its objectives efficiently, as this may affect some of the above requirements, more specifically its clarity, extendibility, and ontological commitment. For example, if FOAF adopted only foaf:firstName and foaf:lastName to express the names of individuals, it would be committing itself to a more specific reality, unlike foaf:givenName and foaf:familyName, which serve the largest share of the world population. Likewise, in the case of foaf:gender, the decision not to make the property mandatory or to define default values for it was also meant to preserve the above requirements, which would not have happened had the developers been more assertive in their opinion.
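To make the naming contrast concrete, the Python sketch below (our illustration, not part of the FOAF specification) writes a FOAF person description as N-Triples using only the standard library. It records names by role with foaf:givenName and foaf:familyName, so nothing depends on which name is written first, and it emits foaf:gender only when a value is supplied, without restricting that value to a list. The person IRI and the names are hypothetical.

```python
from typing import List, Optional

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def literal(value: str) -> str:
    """Return an N-Triples plain literal, escaping backslashes, quotes and newlines."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def describe_person(iri: str, given: str, family: str,
                    gender: Optional[str] = None) -> List[str]:
    """Describe a person as N-Triples lines using foaf:givenName and foaf:familyName.

    Names are stored by role rather than by position. foaf:gender is emitted
    only when a value is supplied, and any free-text value is accepted.
    """
    subject = f"<{iri}>"
    triples = [
        f"{subject} <{RDF_TYPE}> <{FOAF}Person> .",
        f"{subject} <{FOAF}givenName> {literal(given)} .",
        f"{subject} <{FOAF}familyName> {literal(family)} .",
    ]
    if gender is not None:  # optional property: omitted entirely when absent
        triples.append(f"{subject} <{FOAF}gender> {literal(gender)} .")
    return triples


# Hypothetical person whose name is customarily written family name first.
for line in describe_person("http://example.org/people/yamada-taro", "Taro", "Yamada"):
    print(line)
```

A positional pair such as foaf:firstName/foaf:lastName would force a choice about which of "Yamada" and "Taro" comes "first"; the role-based pair avoids that decision.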
In the case of other KOS, such as thesauri, the requirements above differ, especially concerning clarity and ontological commitment. Many KOS must represent a given domain of knowledge as transparently as possible, with precise definitions (very different from the general characterization of ontologies). In such cases, the presence of bias can have a greater effect on information retrieval through the KOS: it can affect the way users interact with the information they seek (that is, the KOS influences how the user understands the information), or it can hinder retrieval outright, since the way the information was represented may not match what the user understands. An excellent example of this can be seen in the work of Miranda and Costa (2019), in which the authors analyze the bibliographic representation of texts on Umbanda and find cases in which books about this religion were placed in class 133.4 of the DDC system (Demonology and witchcraft). In this situation, the bias of those responsible for the classification is quite evident, since, for Umbandists, their belief has nothing to do with witchcraft. Based on these examples and on the semiotic process of meaning formation, not only of ontologies but of KOS in general, we understand that bias is an inherent element of information representation, even though we try to avoid it. Even a very generalist ontology such as FOAF ends up “taking sides” on some themes.
5.0 Conclusion
In this work, we started from the principle that a KOS can be understood as a discourse and can therefore be analyzed with the theoretical tools provided by Discursive Semiotics, which allow us to understand how the meaning of that discourse is constructed.
Thus, we analyzed the FOAF ontology as a discourse that carries with it contextual aspects of those who develop it (who they are, and where they live while developing the ontology), so that, even against their will, it ends up transmitting some of these aspects in its elements. In other words, FOAF, like any KOS, is biased, and this is natural. However, Knowledge Organization needs to consider how to deal with this bias, since a KOS that transmits a particular idea very explicitly can end up negatively affecting the retrieval of information through it. Given this inherent bias, we believe that dealing with the situation requires studies focused on technologies that put bias to work in favor of users. One idea can be found in FOAF itself: “If people publish information in the FOAF document format, machines will be able to make use of that information. If those files contain ‘see also’ references to other such documents in the Web, we will have a machine-friendly version of today's hypertext Web” (Brickley and Miller 2014). The use of “see also” references in KOS in general could be a way of connecting systems that deal with the same topic from different perspectives, or, even within a single system, of supporting the different perspectives that may exist. Another possible solution, considerably more complex, would be for the KOS itself, as part of a larger system, to have access to what the user tends to search for and to direct search results toward those known needs. In other words, the KOS would learn what biases might exist on any topic and compare what it knows with what its users know or usually search for. In this case, Knowledge Organization should bring topics such as Machine Learning and Artificial Intelligence into the discussion.
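As a minimal illustration of cross-system “see also” references, the Python sketch below links hypothetical concept IRIs from three KOS with rdfs:seeAlso, the RDF Schema property commonly used in FOAF documents for such references (SKOS mapping properties such as skos:relatedMatch could serve a similar purpose). The lookup follows each link in both directions so a user can browse from any perspective to the others; that is a design choice of this sketch, since rdfs:seeAlso itself is not declared symmetric.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
Link = Tuple[str, str]


def see_also_triples(links: Iterable[Link]) -> List[str]:
    """Serialize (source, target) concept IRIs as rdfs:seeAlso N-Triples."""
    return [f"<{s}> <{RDFS_SEE_ALSO}> <{t}> ." for s, t in links]


def perspectives(links: Iterable[Link], concept: str) -> List[str]:
    """List every concept linked to `concept`, following links in both directions."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for s, t in links:
        index[s].add(t)
        index[t].add(s)  # browsing choice: not implied by rdfs:seeAlso semantics
    return sorted(index.get(concept, set()))


# Hypothetical concepts from three systems that treat the same topic differently.
links = [
    ("http://example.org/kos-a/umbanda", "http://example.org/kos-b/afro-brazilian-religions"),
    ("http://example.org/kos-a/umbanda", "http://example.org/kos-c/religions-of-brazil"),
]
print(perspectives(links, "http://example.org/kos-a/umbanda"))
```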
Thus, the discussion proposed in this paper aimed to show how, even in generalist KOS such as ontologies, bias can be evident and can affect information retrieval. As this bias is natural and attempts to avoid it are never entirely successful (for reasons explained by semiotics), we argue that a paradigm shift toward accepting KOS bias as a way to benefit the user could be fruitful. Since we understand bias as inherent to KOS, this work proposes that the developers of these tools pay attention to their own ideas and recognize the wide variety of existing perspectives on the themes they seek to represent in their systems. With this in mind, they could look for ways to direct users to the tools that best fit their needs. This paradigm shift could come from simpler changes, such as “see also” references, or from in-depth studies of technologies that automate the process of detecting bias and directing the user.
References
Brickley, Dan and Libby Miller. 2014. FOAF Vocabulary Specification 0.99 (01-14-2014). [rdf, xml]
Briet, Suzanne. 1951. Qu’est-ce que la documentation? Paris: Éditions Documentaires, Industrielles et Techniques.
Buckland, Michael K. 1991. “Information as Thing.” Journal of the American Society for Information Science 42, no. 5: 351–60.
Buckland, Michael K. 1997. “What is a ‘Document’?” Journal of the American Society for Information Science 48, no. 9: 804–9.
Frohmann, Berndt. 2009. “Revisiting ‘What is a Document?’” Journal of Documentation 65: 291–303.
Gomes, Daniel Libonati and Thiago Henrique Bragato Barros. 2019a. “A Construção do Discurso em Ontologias: Um Estudo com Base na Semiótica Discursiva.” Informação e Informação 24, no. 3: 78–103.
Gomes, Daniel Libonati and Thiago Henrique Bragato Barros. 2019b.
“O Discurso em Ontologias: Uma Abordagem a Partir da Semiótica Discursiva.” In Organização do Conhecimento Responsável: Promovendo Sociedades Democráticas e Inclusivas, edited by Thiago Henrique Bragato Barros and Natalia Bolfarini Tognoli. Belém: Ed. da UFPA, 372–81.
Greimas, A. J. and J. Courtés. 2013. Dicionário de Semiótica. São Paulo: Contexto.
Gruber, Thomas R. 1995. “Toward Principles for the Design of Ontologies Used for Knowledge Sharing?” International Journal of Human-Computer Studies 43, nos. 5-6: 907–28.
Guarino, Nicola, Daniel Oberle, and Steffen Staab. 2009. “What Is an Ontology?” In Handbook on Ontologies, 2nd ed., edited by Steffen Staab and Rudi Studer, 1–17. Berlin: Springer.
Miranda, Marcos Luiz Cavalcanti de, and Deniz Costa. 2019. “A Organização do Conhecimento Sobre Umbanda e sua Representação Bibliográfica: Uma Análise Exploratória a Partir de Registros Biográficos.” In Organização do Conhecimento Responsável: Promovendo Sociedades Democráticas e Inclusivas, edited by Thiago Henrique Bragato Barros and Natalia Bolfarini Tognoli. Belém: Ed. da UFPA, 419–27.
Possenti, Sírio. 2009. Os Limites do Discurso. São Paulo: Parábola Editorial.

Lucinéia Souza Maia – Universidade Federal de Ouro Preto, Brazil
Gercina Ângela de Lima – Universidade Federal de Minas Gerais, Brazil
A System for Specifying Semantic Relations for Knowledge Representation
Abstract: Semantic relations are fundamental for understanding the nature of the connection between two concepts in a domain. This paper presents a model for extracting semantic relations for the representation of knowledge from academic documents in the context of the Portuguese language. A Web information system called Semantizar was developed to support the extraction of semantic relations from classificatory structures that represent specific academic documents. To evaluate the qualitative performance of Semantizar, a case study was carried out, which pointed to important contributions to research on semantic relation extraction.
According to the outcomes, when two concepts of a classificatory structure occur in the same sentence, a semantic relationship between them may actually exist. Finally, we conclude that this research is relevant because it brings important findings on the extraction of semantic relations for the knowledge representation of academic documents in the Brazilian scenario.
1.0 Introduction
Semantic relations are fundamental for understanding the nature of the connection between two concepts in a domain. According to Khoo and Na (2006) and Green, Bean and Myaeng (2011), concepts can be seen as building blocks of knowledge, and relationships are the links that connect and hold these blocks together within the structures of knowledge in people's minds. Some factors can influence the specification of semantic relations, including language and culture. According to Khoo and Na (2006), it is difficult to analyze the meaning of concepts and their relations apart from language, because each language has its own characteristics and is linked to cultural factors. In this respect, the literature review conducted by Maia (2018) identified a lack of research on the theme in Brazil. Subsequently, a Semantic Relations Extraction Model was formulated. This model is intended as a theoretical construction of a way of establishing relationships between concepts from academic documents in order to create semantic structures. The model was then implemented as a computational prototype called Semantizar.
2.0 Semantizar
The following procedures were observed in the development of Semantizar: (1) specification, (2) data modeling, (3) architectural design, and (4) prototype implementation in a Web system.
Of these, the specification and the prototype implementation are presented in this paper. In the first stage of the development of Semantizar, the specification, the algorithm for the Semantic Relations Extraction Model was described. It takes as inputs a classificatory structure and the academic document from which that structure originated. The classificatory structure is broken down into concepts and the academic document is decomposed into sentences. After this decomposition, Semantizar scans all the sentences searching for pairs of concepts. In the first scan, the pair formed by concepts 1 and 2 is selected, and its presence is checked in each sentence up to the last one, regardless of whether the pair has already been found in an earlier sentence. In this way, the search covers the whole document. If the concept pair is found in a sentence, that sentence is highlighted so that a manual check can confirm or deny the existence of a semantic relationship between these concepts. Then, Semantizar combines concept 1 with each of the other concepts of the classificatory structure, checking whether each combination occurs in a sentence. After finishing the combinations with concept 1, Semantizar makes the combinations with concept 2 and checks whether they occur in any sentence. In this way, Semantizar pairs the concepts of the classificatory structure with one another and checks each of these pairs against each sentence of the academic document, from the first to the last, as illustrated in Figure 1.
Figure 1: Search iterations of pairs of concepts in the sentences of the academic document.
The prototype implementation phase was divided into three activities: (1) data input, (2) reading and preparation, and (3) extraction of semantic relations. Figure 2 shows the initial interface of the Semantizar prototype.
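The pairwise search described above can be sketched in Python as follows. This is our illustration of the procedure, not the authors' PHP code, and the helper names are hypothetical. Sentences are split at every period (so abbreviations such as "et al." or numbers such as "133.4" would be split too), each unordered pair is assumed to be visited once in the order (1, 2), (1, 3), ..., (2, 3), ..., and the case-insensitive whole-word matching is an assumption we add so that, for example, "text" does not match inside "context".

```python
import re
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def split_sentences(text: str) -> List[str]:
    """Split a document at every period and drop empty fragments."""
    return [part.strip() for part in text.split(".") if part.strip()]


def contains_term(sentence: str, term: str) -> bool:
    """Case-insensitive whole-word match, so 'text' does not match 'context'."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, sentence, flags=re.IGNORECASE) is not None


def find_pair_candidates(concepts: Sequence[str],
                         sentences: Sequence[str]) -> Dict[Tuple[str, str], List[int]]:
    """Check every unordered concept pair against every sentence.

    Each pair is tested against all sentences, even after a first hit. The
    result maps each pair to the indices of the sentences that contain both
    concepts; these sentences are only candidates for manual validation.
    """
    candidates: Dict[Tuple[str, str], List[int]] = {}
    for first, second in combinations(concepts, 2):
        hits = [i for i, sentence in enumerate(sentences)
                if contains_term(sentence, first) and contains_term(sentence, second)]
        if hits:
            candidates[(first, second)] = hits
    return candidates


text = ("The indexer reads the text. The document has a subject. "
        "An indexer analyses the document and the text.")
print(find_pair_candidates(["indexer", "text", "document"], split_sentences(text)))
# {('indexer', 'text'): [0, 2], ('indexer', 'document'): [2], ('text', 'document'): [2]}
```

With n concepts the loop visits n(n-1)/2 pairs, each against every sentence, which is why automating this step matters for long documents; deciding whether a co-occurrence is a real relation is still left to a person.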
In the data input activity (1), the user enters the metadata of the academic document from which semantic relations are to be extracted, and then uploads the publication file and the classificatory structure file: the publication as a .pdf file and the classificatory structure as a plain text file with a .txt extension.
Figure 2: The initial interface of Semantizar.
In the subsequent activity, reading and preparation (2), Semantizar checks whether each term of the classificatory structure exists in the database. If a term does not exist, the system registers it automatically, as follows. The PHP programming language, chosen for the implementation of the model, allows text files to be converted into vectors (arrays). Accordingly, the classificatory structure file is automatically converted into a vector of terms, in which each line of the structure (corresponding to one term) becomes a position of the vector. The algorithm then goes through each position of the vector, checking whether its content, a term of the classificatory structure, exists in the database; if it does not, the term is automatically registered by the system as a noun. This rule was adopted because, grammatically, the terms of a classificatory structure denote nouns. The second task of the reading and preparation activity is preparing the publication file (an academic document: a thesis or a dissertation) for manipulation in the next activity of the prototype. Because of the programming language chosen for the implementation, it was necessary to convert the text in .pdf format into a temporary .txt file. The activity of extracting semantic relations (3) is considered the most important, being the core of the Semantic Relations Extraction Model.
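The term-vector and registration step can be sketched in Python under stated assumptions. Semantizar's database schema is not described, so the `term` table and its columns below are hypothetical, and an in-memory SQLite database (Python's standard `sqlite3` module) stands in for it. In PHP, the built-in `file()` function returns the lines of a file as an array, which is presumably the conversion meant in the text; `splitlines()` plays that role here. Stripping leading indentation is also our assumption.

```python
import sqlite3
from typing import List


def load_terms(structure_text: str) -> List[str]:
    """Turn a classificatory structure (one term per line) into a vector of terms.

    Leading indentation, which may encode hierarchy, is stripped here.
    """
    return [line.strip() for line in structure_text.splitlines() if line.strip()]


def register_missing_terms(conn: sqlite3.Connection, terms: List[str]) -> List[str]:
    """Register each unknown term with the word class 'noun'; return the new ones."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS term (name TEXT PRIMARY KEY, word_class TEXT NOT NULL)"
    )
    added = []
    for term in terms:
        found = conn.execute("SELECT 1 FROM term WHERE name = ?", (term,)).fetchone()
        if found is None:
            # Terms of a classificatory structure are treated as nouns.
            conn.execute("INSERT INTO term (name, word_class) VALUES (?, ?)", (term, "noun"))
            added.append(term)
    conn.commit()
    return added


conn = sqlite3.connect(":memory:")  # hypothetical stand-in for the application's database
print(register_missing_terms(conn, load_terms("Indexer\n    Librarian\n\nText\n")))
```

Running the registration a second time only adds terms that are not yet stored, so repeated uploads of overlapping structures do not create duplicates.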
It consists of two tasks. The first searches for pairs of terms from the classificatory structure in the sentences of the publication to which the structure refers. The temporary file of the academic document created in the previous step is read into a string (a variable that stores alphanumeric characters). This string is then split into smaller strings at each period (.), so that each position of the resulting vector is a sentence of the publication. Next, each position of the sentence vector is scanned for the terms of the term vector. If a term is found in a sentence, Semantizar goes through the other positions of the term vector, checking whether another term of the structure occurs in the same sentence. If so, the sentence is shown in the interface created for the user to validate the semantic relationship. The validation of the semantic relation is the second task of the relation extraction activity. If the user agrees that there is a semantic relation between the two concepts of the classificatory structure that Semantizar found in a sentence of the publication, (s)he is directed to the semantic relation register interface. In this interface, the user specifies the semantic relation according to his or her judgment of the sentence. In addition, the user determines the type of semantic relation and the inverse relation, if any, and indicates its properties: symmetry and reflexivity. The transitivity property was not considered because it is understood to apply to ternary relationships, which are not the case in this paper. The types of semantic relations available in Semantizar result from a literature review by Maia, Lima and Maculan (2017), which produced a taxonomy of 63 types of semantic relations classified as hierarchical, equivalence, and associative.
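The fields entered in the register interface map naturally onto a small record type. The Python sketch below is a hypothetical data structure, not Semantizar's actual schema: the field names are ours, only the three top-level categories of the taxonomy are checked (the 63 specific types are not reproduced), and, as in the text, transitivity is deliberately left out. The `expand` method shows how a stated relation yields its inverse, or itself reversed when it is symmetric.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Top-level categories of the taxonomy; the specific types are not modelled.
CATEGORIES = {"hierarchical", "equivalence", "associative"}


@dataclass(frozen=True)
class SemanticRelation:
    source: str
    target: str
    relation: str                  # wording chosen by the user, e.g. "is a type of"
    category: str                  # hierarchical, equivalence or associative
    inverse: Optional[str] = None  # e.g. "has type"; None when there is no inverse
    symmetric: bool = False
    reflexive: bool = False
    # Transitivity is intentionally not represented.

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

    def expand(self) -> List[Tuple[str, str, str]]:
        """Return the stated triple plus its symmetric or inverse counterpart."""
        triples = [(self.source, self.relation, self.target)]
        if self.symmetric:
            triples.append((self.target, self.relation, self.source))
        elif self.inverse:
            triples.append((self.target, self.inverse, self.source))
        return triples


r = SemanticRelation("librarian", "information professional",
                     "is a type of", "hierarchical", inverse="has type")
print(r.expand())
```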
3.0 Case study
To evaluate the efficiency of Semantizar in the extraction of semantic relations, a case study was carried out. It was organized according to the Experimentation Process methodology proposed by Wohlin et al. (2014), which comprises three stages: (1) definition of the case study, (2) planning, and (3) operation. In the first stage, definition of the case study (1), the following were determined: (a) the object of the case study, namely semantic relations; (b) the objective, to verify the efficiency of Semantizar; and (c) the context, namely theses and dissertations in the domain of Knowledge Organization and Representation. The definition is followed by the second stage, planning (2). According to Wohlin et al. (2014), this stage basically indicates “how” the case study will be conducted. Thus, planning established (a) the sample and (b) the analysis of the data to be collected. For the sample, the faceted structure of MHTX by Lima (2004) was chosen, together with the thesis Fatores Interferentes no Processo de Análise de Assunto: Estudo de Caso de Indexadores (Interfering factors in the process of subject analysis: indexer case study) by Naves (2000). MHTX is an in-context hypertextual navigation model for organizing theses and dissertations, aimed at supporting the reading and retrieval of these documents in Digital Libraries of Theses and Dissertations. Three navigation tools were created in the MHTX prototype: the expanded summary, the concept map, and the faceted structure. In its implementation, the aforementioned thesis by Naves (2000) was used to instantiate these tools. Among them, the faceted structure has the characteristics of the sample desired for Semantizar. As mentioned, in addition to defining the sample, the planning stage determined how the data would be analyzed.
In this case, it was decided to perform quantitative and qualitative analyses in order to identify: (I) the number of semantic relationships suggested by the application relative to the number of semantic relations that actually exist (a factor that is important for evaluating whether Semantizar has the potential to extract semantic relations automatically); (II) the concepts most likely to be semantically related (a parameter that can point to the key concepts of the analyzed publication); (III) the characteristics of the semantic relations found; and (IV) the parallel between concept relationships in the original faceted structure and the representation resulting from Semantizar. Following the case study process, the next step is operation (3). This phase consists of executing the previously defined and planned case study (Wohlin et al. 2014). Three procedures were necessary. The first was preparation, which involved clipping a sample from the classificatory structure and the publication, selecting subjects from the Personality facet, as can be seen in Figure 3. In this case, the terms idea, thought, and concept were broken down. Regarding the academic document, we decided to consider chapters 2, 3, and 4 of Naves’s thesis (2000), because these chapters contain the conceptual definitions of the thesis. Figure 4 shows the thesis’ summary, in which these chapters can be seen. We also decided to remove images from the chapters, since Semantizar does not support image analysis. The second procedure, execution, comprised processing the sample snippets in Semantizar. Finally, the last procedure was validation, in which the collected data were refined and compiled in order to avoid interference in data analysis and interpretation. Both in the validation of the data and in the results, the concepts were observed individually and in pairs.
Figure 3: Snippet of the MHTX faceted structure
Figure 4: Fragment of the expanded summary of Naves (2000)
4.0 Results
Figure 5 presents a conceptual map generated from the classificatory structure without Semantizar. In this figure, the explicit semantic relations appear to derive from the naming of the sub-facets used by Lima (2004) (these sub-facets are highlighted in Figure 3). It can also be observed that there is a semantic relation between information professional (Profissional da Informação) and librarian (Bibliotecário), created by indentation, which denotes a type of hierarchy. However, it was not possible to specify what this semantic relation actually is.
Figure 5: Clusters resulting from the conceptual map of the MHTX faceted structure without Semantizar.
In the organization of the faceted structure in the concept map, two clusters can be identified, as highlighted in Figure 5. The first consists of concepts related to text (texto), and the other of concepts related to indexer (indexador). In clusters, objects belonging to a group are related to each other but not to concepts outside their group. By contrast, in the conceptual map generated from the semantic relations found with Semantizar, the concepts are so cohesive that these clusters can no longer be obtained; that is, the concepts are all related to each other, as seen in Figure 6. With the use of Semantizar, it was possible to make 101 semantic relations explicit.
5.0 Contributions
With the support of Semantizar, relations between all the concepts became possible, which allowed the creation of a representation covering every concept of a semantically related classificatory structure. Thus, the user can view all possible relations between the concepts. Therefore, Semantizar achieved its goal of semantically enriching a classificatory structure.
The case study, within the scope presented, was possible thanks to the computational support of Semantizar. Performed manually, this task could demand considerable time and effort from the professional executing it. In this sense, Semantizar facilitated the extraction and explication of semantic relations, essential for knowledge representation, by semi-automating this task. Thus, Semantizar contributed to knowledge representation based on a classificatory structure, proving objective in detecting two concepts in a sentence of an extensive text. The identification of concept pairs within sentences is one of the most laborious steps in the context in which Semantizar was created to operate.
Figure 6: Conceptual map of the semantic relations established in the case study.
Through Semantizar, it was possible to establish 101 semantic relations, including inverse ones, in 53 different concept pairs, based on a classificatory structure containing 22 concepts. This improved the semantics of the sample. Another important contribution of Semantizar was that, in a way, it acted as a validating agent of the classificatory structure, as it indicated 199 signs of semantic relations in 131 sentences from the snippet of the academic document from which the classificatory structure originated. This amount of evidence is believed to be an important indicator that the structure represents the academic document relevantly. Another important factor in this regard concerns the sentences of the academic document, as some of them contained more than one clue of a semantic relation, which reinforced confidence in the classificatory structure. In addition, Semantizar made it possible to indicate the most important concept pairs for the academic document.
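For a rough sense of scale, the short Python calculation below relates the figures just reported. The derived ratios are our own arithmetic on those figures, not results stated by the authors, and they assume that any two of the 22 concepts could form an unordered candidate pair.

```python
from math import comb

concepts = 22                # concepts in the classificatory structure
related_pairs = 53           # distinct concept pairs that received at least one relation
signs, sentences = 199, 131  # signs of semantic relations and the sentences containing them

candidate_pairs = comb(concepts, 2)          # unordered pairs of distinct concepts
coverage = related_pairs / candidate_pairs   # share of possible pairs with a relation
signs_per_sentence = signs / sentences       # average signs per sentence

print(candidate_pairs, f"{coverage:.1%}", f"{signs_per_sentence:.2f}")
# 231 22.9% 1.52
```

An average above one sign per sentence is consistent with the observation that some sentences contained more than one clue of a semantic relation.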
During the data analysis, it was found that the concept pairs that occurred most frequently were the most relevant in the represented domain. In the case of the sample, two pairs stood out: indexer and document (indexador e documento) and indexer and text (indexador e texto). Considering that these pairs were extracted from the thesis Interfering factors in the process of subject analysis: indexer case study, by Naves (2000), it is reasonable to say, even without reading the entire academic document, that such pairs are the most important in the domain. Consequently, Semantizar also indicated the most important concepts for the academic document. As with the pairs of concepts, considering the concepts that occurred most frequently in the semantic relations, the most important ones in Naves’s thesis (2000) were indexer (indexador), text (texto), and document (documento); moreover, text and document were classified as near-synonyms, which also indicates coherence in determining this semantic relationship. It was also found during the analysis that inverse relations were often not possible because the relations were indirect, occurring through the concepts subject (assunto), content (conteúdo), and information (informação). Therefore, it was concluded that these concepts should be part of the classificatory structure, since they occurred repeatedly and are representative of the domain. Similarly, it was noticed that the concepts with the most false evidence in the context in which they were found were those routinely used in academic documents, such as concept (conceito), idea (ideia), and authors (autores). In this sense, a review of the classificatory structure was suggested to determine whether these concepts should remain in it to represent the academic document.
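Ranking pairs and concepts by frequency, as described above, amounts to simple counting. The Python sketch below is a hypothetical illustration: the relation list is invented, loosely modelled on the pairs named in the text, and is not the study's data. Direction is ignored when counting pairs, so a relation and its inverse add to the same pair, and each relation also counts once for each of its two concepts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def rank(relations: Iterable[Tuple[str, str]], top: int = 2):
    """Return the `top` most frequent unordered concept pairs and concepts.

    Ties keep first-seen order, following Counter.most_common.
    """
    relations = list(relations)
    pair_counts = Counter(frozenset(pair) for pair in relations)  # direction ignored
    concept_counts = Counter(c for pair in relations for c in pair)
    top_pairs = [tuple(sorted(p)) for p, _ in pair_counts.most_common(top)]
    top_concepts = [c for c, _ in concept_counts.most_common(top)]
    return top_pairs, top_concepts


# Hypothetical validated relations (source concept, target concept).
relations = [
    ("indexer", "document"), ("document", "indexer"),
    ("indexer", "text"), ("text", "indexer"),
    ("indexer", "document"), ("text", "document"), ("concept", "idea"),
]
print(rank(relations))
# ([('document', 'indexer'), ('indexer', 'text')], ['indexer', 'document'])
```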
Thus, by pointing out these suggestions, Semantizar can be said to help refine the classificatory structure. It was also noted that concepts can be extracted from plain lists of terms, as the inherent hierarchy of the classificatory structure was not decisive for Semantizar's indication of a semantic relation. Finally, the case study made important contributions to research on the extraction of semantic relations. These contributions are:
1) The presence of two concepts in a sentence is an indication of the existence of a semantic relation between these concepts;
2) A pair of concepts can have more than one semantic relation;
3) A pair of concepts can have the same semantic relation, even in different contexts;
4) The context, and knowledge about it, is fundamental for determining the type and/or subtype of the semantic relation when the same pair of concepts has different relations;
5) The determination of semantic relations, as treated by Semantizar, depends on human interpretation;
6) Verbs are the main grammatical class for defining a semantic relation; and
7) Not all semantic relations can be made explicit.
References
Green, Rebecca, Carol A. Bean, and Sung Hyon Myaeng. 2011. The Semantics of Relationships: An Interdisciplinary Perspective. Dordrecht: Springer.
Khoo, Christopher S.G. and Jin-Cheon Na. 2006. “Semantic Relations in Information Science.” Annual Review of Information Science and Technology 40, no. 1: 157–228.
Lima, Gercina Angela Borem de Oliveira. 2004. Mapa Hipertextual (MHTX): Um Modelo para Organização Hipertextual de Documento. Ph.D. dissertation. Belo Horizonte: Federal University of Minas Gerais.
Maia, Lucinéia Souza. 2018. Extração e Explicitação de Relações Semânticas para a Representação do Conhecimento de Documentos Acadêmicos: Um Estudo de Caso a Partir de uma Estrutura Classificatória. Ph.D. dissertation. Belo Horizonte: Federal University of Minas Gerais.
Maia, Lucinéia Souza, Gercina Ângela de Lima, and Benildes Coura Moreira dos Santos Maculan. 2017. “Taxonomia dos Tipos de Relações Semânticas para a Organização e Representação do Conhecimento: Uma Proposta a Partir da Literatura.” Tendências da Pesquisa Brasileira em Ciência da Informação 10, no. 2.
Naves, Madalena Martins Lopes. 2000. Fatores Interferentes no Processo de Análise de Assunto: Estudo de Caso de Indexadores. Ph.D. dissertation. Belo Horizonte: Federal University of Minas Gerais.
Wohlin, Claes, Per Runeson, Martin Höst, Magnus C. Ohlsson, Björn Regnell, and Anders Wesslén. 2014. Experimentation in Software Engineering. Berlin: Springer.

Carlos Henrique Marcondes – Federal University of Minas, Brazil
Célia da Consolação Dias – Federal University of Minas Gerais, Brazil
Representing Faceted Classification in SKOS
Abstract: Faceted classification is one of the most important contributions of knowledge organization to information resources management. Proposed in the twentieth century by pioneers such as Bliss and Ranganathan as an alternative to the rigidity of enumerative classification, faceted classification has gained importance, both academic and practical. Today it is widely used in web architecture, from scientific to commercial sites, and even as a methodology for the development of ontologies. The web is evolving towards the vision of the Semantic Web, in which resources are not only navigable and understandable by humans but also have content with precise meaning. This feature enables computer applications to perform sophisticated tasks. Vocabularies are a key issue in assigning meaning to the descriptions of web resources, and many such vocabularies are faceted. Can faceted classification play a role in the Semantic Web? How could SKOS be extended to represent faceted classification? What type of entity is a facet? What are its components?
Could faceted classification be formalized so as to be represented in Semantic Web standards? Could current KOS evolve to take advantage of the potential of Semantic Web technologies? The aim of this paper is to develop a conceptual model of a faceted classification and its components, to encode this model in SKOS, and to evaluate the encoding. Canonical definitions of “facet” and its components are used to develop a semantic model of a facet schema and its components. Based on this model, a proposed SKOS encoding is developed and evaluated.
1.0 Introduction
In her article “A Semantic (Faced) Web?” La Barre asks how to integrate faceted classification into the Semantic Web: “The chief focus is upon Semantic Web implementations that employ, adapt, or misconstrue the theory or practice of facet analysis and Faceted Classification. A secondary focus is upon suggestions for the creation of operational definitions and functional requirements for facet theory that may serve to enhance, amplify or extend current understandings and practices in Semantic Web implementations.” (La Barre 2010, 103). Faceted classification is one of the most important contributions of knowledge organization to information resources management. Proposed in the twentieth century by pioneers such as Bliss and Ranganathan as an alternative to the rigidity of enumerative classification, faceted classification has gained importance, both academic and practical. Today it is widely used in web architecture, from scientific to commercial sites (Vickery 2008; La Barre 2006, 50), as an information retrieval device (Broughton 2006), and even as a methodology for the development of ontologies (Prieto-Diaz 2002).
Hudon (2019) recalls that many authors have stressed the importance of faceted classification applications, such as serving as a navigational tool for websites, structuring systems of objects and information about them, and assisting in the understanding of complex relationships between objects. Broughton and Slavic (2007, 728) stress that “...the potential for faceted approaches to information retrieval in electronic environments had been perceived as early as the beginning of the 1980s.” The web is evolving towards the vision of the Semantic Web, in which resources are navigable and understandable not only by humans but also by machines. Semantic Web applications navigating between resources can process them to perform sophisticated tasks. Within this context, interoperability is key (Zeng 2019), so that generic Semantic Web applications can interact with web resources. Meaning is assigned to such web resources by describing them with different vocabularies. To enable Semantic Web applications to interact with web resources, the resources must be described formally, in standard languages whose constructs refer, through Internationalized Resource Identifiers (IRIs)1, to vocabularies whose terms have a precise meaning and global scope. Recently, several vocabularies developed for information retrieval, library systems, or databases have been adapted to Semantic Web technologies so that they can be referenced through IRIs; many of them incorporate facets. To provide a bridge between Knowledge Organization Systems (KOS) and the Semantic Web, a standard, the Simple Knowledge Organization System (SKOS), a data model for representing KOS in the Resource Description Framework (RDF)2, was developed; SKOS has been a W3C standard since 2009. However, the present version of the SKOS vocabulary does not support the representation of faceted KOS (La Barre 2006, 116; W3C 2009). How could SKOS be extended to represent faceted classification? What type of entity is a facet?
What are its components? Could faceted classification be formalized in Semantic Web standards (La Barre 2006, 111; Miles and Bechhofer 2008) to take advantage of the potential of such technologies? The aim of this paper is to represent a faceted schema in SKOS. To achieve this, a conceptual model of faceted classification and its components was developed and coded in SKOS, and this codification was evaluated.

2.0 The method

Canonical definitions of facets and their components found in the KOS literature are used as the basis for identifying the components of a facet. Sources discussing and defining metaphysical entities such as classes, subclasses, instances, properties, attributes, characteristics, and relationships are used to develop a semantic model of a faceted schema. Based on this model, a proposal for codification in SKOS is developed and evaluated. The TemaTres software is used to generate the codification in SKOS.

3.0 Results

This section analyses definitions of the concept “facet,” which serve as the basis for developing a conceptual model of a facet schema and its components. On the basis of this model, a codification in SKOS is proposed.

3.1 What is a facet? What is faceted classification?

La Barre (2003) observed that there is no scholarly consensus on the meaning of the term “facet.” To understand the meaning of “facet,” canonical definitions were selected from the KOS literature. While Ranganathan (1967a) emphasizes the aspects of a basic subject and its compound subjects, Mills and Broughton (1977) focus on subclasses and their principles of division, Soergel (1995) sees facets as entities, and, finally, Taylor (1992) discusses classes specifically in terms of their aspects, properties, and characteristics.

1 IRI: Internationalized Resource Identifier.
2 RDF, the Resource Description Framework, is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model.
Proposed in the twentieth century by Ranganathan and others, faceted classification is an alternative to the rigidity of enumerative bibliographic classifications, in which a book is assigned to a single class corresponding to its general subject. Such classifications are problematic when a book has more than one subject, “point of view,” or facet, as in the case of the book “Control of virus diseases of the stem of rice plant in the winter of 1967 in Madras” (Ranganathan 1967b, 13). In faceted classification, each subject within a compound subject is treated as a facet, and the facets are combined through a synthetic process to generate a notation used to locate the book on a specific shelf and retrieve it. Thus, faceted classifications are also information retrieval devices, used to retrieve representations of entities based on their properties, whether these are the different subjects of compound subjects, as suggested by Ranganathan, or a product’s characteristics on an e-commerce site. Ranganathan (1967b, 88) conceived of a facet as “a generic term used to denote any component - be it a basic subject or an isolate - of a compound Subject, and also its respective ranked forms, terms and numbers.” Soergel (1995, 258) says, “facets are aspects or viewpoints from which entities such as food products or subjects (topics, themes) in an area such as education - can be analysed.” Svenonius (2000, 139) sees facets as categories of generality, defining them as “grouping of terms obtained by the first division of a subject discipline into homogeneous or semantically cohesive categories.” Taylor (1992, 274) defines facets as “clearly defined, mutually exclusive, and collectively exhaustive aspects, properties or characteristics of a class or specific subject,” while Mills and Broughton (1977, 38) write, “A facet may be defined as the total set of subclasses produced when a class is divided by a single broad principle...” Both definitions introduce the concept of a class and its division based on its aspects, properties, or characteristics.
A classification schema is defined as “a list of classes arranged according to a set of pre-defined principles for the purpose of organizing items in a collection or entries in an index, bibliography or catalog into groups based on their similarities and differences to facilitate access and retrieval” (Fallucchi and De Luca 2018). Jacob (2004, 524) offers a concurring definition: “a classification scheme is a set of mutually exclusive and nonoverlapping classes arranged within a hierarchical structure and reflecting a predetermined ordering of reality.” A faceted schema is also a structure for representing entities and relationships. According to NISO (2005), facet analysis is a way of organizing knowledge. Facet analysis is particularly useful for:
“• new and emerging fields where there is incomplete domain knowledge or where relationships between the content objects are unknown or poorly defined;
• interdisciplinary areas where there is more than one perspective on how to look at a content object or where combinations of concepts are needed;
• vocabularies where multiple hierarchies are required but can be inadequate due to difficulty in defining their clear boundaries; or
• classifying electronic documents and content objects where location and collocation of materials is not an important issue.” (NISO 2005, 13)

Ranganathan conceived facets within the scope of the bibliographic classification of books. Contrary to enumerative bibliographic classifications, in which a book is assigned only one subject that defines its position within the classification schema, Ranganathan realized that books could be about several subjects simultaneously. As previously mentioned, the component subjects of a compound subject are its facets. A subject is a relationship between a book and what it is about; a component of a subject is a component of anything (in the ontological sense) that a book is about.
Conceptual models such as FRBR (IFLA 1997) and LRM (Riva et al. 2017) treat subject according to the same ontological view.

3.2 What are the components of a faceted schema?

According to De Grolier (1965, 102), “the term facet itself is just a new, fashionable, word for designating the series of subdivisions of a given subject according to one, and one only, of its characteristics.” Facet analysis aims to meet users’ information access needs by determining, throughout the process of conceptual analysis, “what entities [and] what aspects of those entities are of interest to the user group” (Vickery 1960, 11). This is a fundamental criterion in the development of faceted classification. In an example from the ISO 25964-1 (2011, 69) standard, milk is a subclass of dairy products, itself a subclass of the class agricultural industry products. Milk has the properties milk by fat content, milk by form, milk by source animal, and milk by treatment. From the property milk by fat content, the following facets, or subclasses, are derived: whole milk, low-fat milk, and skim milk. From the property milk by form, the following facets or subclasses are derived: dried milk and liquid milk. From the property milk by source animal, the following facets or subclasses are derived: buffalo milk, cow milk, goat milk, and sheep milk; and from the property milk by treatment, the following facets or subclasses are derived: condensed milk, evaporated milk, homogenized milk, pasteurized milk, and sterilized milk. To help develop our argument, we added another class to this schema, producer, with three subclasses, Nestle, Massey Ferguson, and Parmalat (two of them milk producers and one a tractor producer).
We can distinguish the following elements in this example: the class, milk; a criterion or facet applied to it, milk by treatment (a property of milk, namely the treatment applied to it); and the subclasses derived from applying this criterion: condensed milk, evaporated milk, homogenized milk, pasteurized milk, and sterilized milk. Vickery (2008, 156), discussing the structure of a faceted classification, distinguishes the following elements: D, a domain; S1, S2, S3, ..., subject fields within a domain; F1, F2, F3, ..., facets within each subject field; T1, T2, T3, ..., terms within each facet; and the order of terms within each facet. Among the elements listed by Vickery, subject fields may be associated with classes in Taylor’s (1992, 274) definition cited above. Three elements can be identified in these definitions: a class; the set of subclasses generated based on its aspects; and properties or characteristics. In the previous examples, and in many others, a facet is always defined relative to a class by applying a criterion to it; there is no facet that is not a facet of some class.

According to Ranganathan (1967b, 55), “Characteristic – an attribute or any attribute complex with reference to which the like or unlikeness of entities can be determined and at least two of them are unlike.” Facets might be those entities identified in metaphysics and ontological analysis as “properties.” Properties are existentially dependent entities, as their existence depends on the existence of the entities that bear them: a specific marriage cannot exist without the individuals forming the couple, and a specific blue colour cannot exist without being the colour of a specific blue object, such as my blue shoes. Marriage and colour are specific types of properties; the former is a relationship, the latter an attribute (Guarino 1997).
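Vickery's elements (a domain D, subject fields S, facets F, terms T, and the order of terms within each facet) can be pictured as a nested data structure. The Python sketch below is our own illustration, not Vickery's or the authors' notation: the class names `Domain`, `SubjectField`, and `Facet` are hypothetical, and the contents are taken from the ISO 25964-1 milk example, with milk treated as a subject field in line with the association of subject fields with classes made above.

```python
from dataclasses import dataclass, field


@dataclass
class Facet:
    """F: a facet within a subject field; its terms (T) keep their order."""
    name: str
    terms: list[str]


@dataclass
class SubjectField:
    """S: a subject field (here, a class) divided into facets."""
    name: str
    facets: list[Facet] = field(default_factory=list)


@dataclass
class Domain:
    """D: the domain covered by the faceted classification."""
    name: str
    subject_fields: list[SubjectField] = field(default_factory=list)


# Hypothetical instance built from the ISO 25964-1 milk example.
milk = SubjectField("milk", [
    Facet("milk by fat content", ["whole milk", "low-fat milk", "skim milk"]),
    Facet("milk by form", ["dried milk", "liquid milk"]),
    Facet("milk by source animal",
          ["buffalo milk", "cow milk", "goat milk", "sheep milk"]),
    Facet("milk by treatment",
          ["condensed milk", "evaporated milk", "homogenized milk",
           "pasteurized milk", "sterilized milk"]),
])
domain = Domain("agricultural industry products", [milk])

for facet in milk.facets:
    print(f"{facet.name}: {', '.join(facet.terms)}")
```

Terms are held in lists rather than sets so that the order of terms within each facet, the last of Vickery's elements, is preserved.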
Web languages such as the Web Ontology Language (OWL) and RDF Schema (RDFS) distinguish class-subclass properties, or unary properties, from binary properties. The first type comprises the taxonomic or paradigmatic structure of a domain. Binary properties correspond to the two types just described, called in OWL Object Properties (relationships) and Data Properties (attributes). An Object Property requires the existence of two entities, one being the domain and the other the range of a specific relationship; a marriage is a typical Object Property. A Data Property requires just one entity, its domain, while its range is a set of possible values. A colour is a typical Data Property, as it assigns a data value, such as “blue,” to the colour property of an entity. Sowa (2000, 32) makes the same distinction between attributes and characteristics. “Properties (also called ‘attributes,’ ‘qualities,’ ‘features,’ ‘characteristics,’ ‘types’) are those entities that can be predicated of things or, in other words, attributed to them” (Orilia and Swoyer 2011). For Aristotle (1991, 3), the notion of a property, or characteristic, rests on that of predicates: properties are predicated of subjects; hence they do not exist without being properties of something, as they depend on that of which they are predicated (“for all colour is in a body”). The same notion is found in Chen’s Entity-Relationship Model (1976): a domain can be modelled by identifying the entities, the relationships, and the attributes of entities and relationships. “A substance, that which is called a substance most strictly, primarily, and most of all, is that which is neither said of a subject nor in a subject, e.g. the individual man or the individual horse. The species in which the things primarily called substances are, are called secondary substances, as also are the genera of these species” (Aristotle 1991, 4).
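In the RDF serialization of OWL, the two kinds of binary property are declared with the classes `owl:ObjectProperty` and `owl:DatatypeProperty` (the OWL 2 structural specification calls the latter a data property). The Python sketch below simply prints such declarations as Turtle text; the `ex:` namespace and the names `isMarriedTo`, `hasColour`, `Person`, and `PhysicalObject` are hypothetical, chosen to mirror the marriage and colour examples.

```python
OWL_KINDS = {
    "object": "owl:ObjectProperty",   # relationship: range is a class of entities
    "data": "owl:DatatypeProperty",   # attribute: range is a set of data values
}

PREFIXES = """@prefix ex:   <http://example.org/> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> ."""


def declare_property(name: str, kind: str, domain: str, range_: str) -> str:
    """Return a Turtle snippet declaring an OWL object or data property."""
    if kind not in OWL_KINDS:
        raise ValueError(f"kind must be 'object' or 'data', not {kind!r}")
    return (f"ex:{name} a {OWL_KINDS[kind]} ;\n"
            f"    rdfs:domain {domain} ;\n"
            f"    rdfs:range {range_} .")


print(PREFIXES)
# A relationship: both domain and range are classes of entities.
print(declare_property("isMarriedTo", "object", "ex:Person", "ex:Person"))
# An attribute: the range is a datatype, i.e. a set of possible values.
print(declare_property("hasColour", "data", "ex:PhysicalObject", "xsd:string"))
```

The only structural difference between the two declarations is the range: a class for the relationship, a datatype for the attribute.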
In this passage, Aristotle relates (first) substance to secondary substances, the genera and the species, defining a class-subclass relationship. A (secondary) substance holds the essence of an entity. Substances are organized in class-subclass hierarchies in which each subclass is defined by its essence, as the genus of its parent class plus a differentia, in a sequence of increasing specificity. Aristotle (1991, 44) also distinguishes “an accident or property of a thing,” the categories that qualify a subject, such as quantity, quality, relation, location, and time: “Of things said without any combination, each signifies either substance or quantity or qualification or a relative or where or when or being-in-a-position or having or doing or being-affected” (Aristotle 1991, 3). Applying a property to a class generates subclasses (classes and subclasses are universals, secondary substances according to Aristotle); the individuals or instances (first substances according to Aristotle) make up the extension (Orilia and Swoyer 2011, section 1.1.4) of the class. The property “is a subclass of” (or “is a type of”) relates a class and a subclass (also a secondary substance). The property “is an instance of” relates a class and its instances. Frické (2010) calls this kind of instantiation “first-order instantiation,” involving first-order properties. He claims that there is a second kind of “is an instance of,” as in the passage from Aristotle: second-order instantiation, the application of second-order properties to first-order properties. This is the case of ‘being a species’ applied to the first-order property ‘being a tiger’: tiger is an instance of the second-order type species.
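The two kinds of instantiation can be made concrete with a toy encoding. This is our own Python sketch, not Frické's formalism: the dictionary `INSTANCE_OF`, the set `SECOND_ORDER_TYPES`, and the helper functions are hypothetical, and the individual Shere Khan is the tiger used in the paper's example. Only direct instantiation is recorded, which makes one consequence of the distinction visible: Shere Khan is an instance of tiger, and tiger an instance of species, but Shere Khan is not thereby a species.

```python
# Each entry maps an entity to the types it is directly an instance of.
INSTANCE_OF: dict[str, set[str]] = {
    "Shere Khan": {"tiger"},  # an individual instantiates a first-order property
    "tiger": {"species"},     # a first-order property instantiates a second-order one
}
# Types whose instances are themselves types (second-order types).
SECOND_ORDER_TYPES = {"species"}


def is_instance(entity: str, type_: str) -> bool:
    """Direct instantiation only: 'is an instance of' is not transitive."""
    return type_ in INSTANCE_OF.get(entity, set())


def instantiation_order(entity: str, type_: str) -> int:
    """Return 2 for second-order, 1 for first-order instantiation, 0 for none."""
    if not is_instance(entity, type_):
        return 0
    return 2 if type_ in SECOND_ORDER_TYPES else 1


print(instantiation_order("Shere Khan", "tiger"))    # prints 1
print(instantiation_order("tiger", "species"))       # prints 2
print(instantiation_order("Shere Khan", "species"))  # prints 0
```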
From this perspective, organizing books of poetry, prose, or theatre by literary genre means applying a second-order property (literary genre) to first-order properties (poetry, prose, theatre), which are the values of literary genre assigned to each book. The concept of a second-order property is similar to that of a meta-property (Guarino 1997). Applying Frické’s conceptualization to the tiger example and to the ISO 25964-1 example:
- Shere Khan (first-order instance) -> tiger (species) -> animal (genus);
- condensed milk, etc. (second-order instances, subclasses) -> milk by treatment (property) -> milk (class).

A property (or characteristic, or facet) divides a class, generating different subclasses, one for each different value of that property existing within the domain. Accordingly, a facet is a non-sortal or characterizing property of a class (Orilia and Swoyer 2011, section 7.8), generating subclasses but no individuals. Faceted classification is not concerned with the class-subclass hierarchy within a domain (a “classification ontology,” according to Giunchiglia, Dutta, and Maltese 2014, 52), but rather, given a class, with finding the properties of this class that are of interest to users (a “descriptive ontology,” Giunchiglia, Dutta, and Maltese 2014, 53) and the instances within the domain that have different values for each of these properties. The same distinction exists in knowledge organization between paradigmatic relationships (permanent, structural, or taxonomic relations within a domain) and syntagmatic relationships (ad hoc, a posteriori, or transient relationships) (Khoo and Na 2006, 164). First-order logic and languages such as OWL do not deal well with second-order properties, as their quantifiers range over individuals (Väänänen 2019). However, according to Frické (2010), second-order properties are appropriate for specifying faceted classification, as they encompass not only the individuals that are instances of a class but also the subclasses that are instances of a class.
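The statement that a property divides a class into one subclass per value existing within the domain amounts to grouping the instances of the class by their value for that property. The sketch below illustrates this under simple assumptions: instances are hypothetical milk products described as Python dictionaries, and `divide_by_facet` is a hypothetical helper, not part of the authors' proposal. Only values that actually occur in the data yield a subclass, and the same instance falls into one subclass for each facet.

```python
from collections import defaultdict


def divide_by_facet(instances: dict[str, dict[str, str]],
                    facet: str) -> dict[str, list[str]]:
    """Split a class into subclasses, one per value of `facet` found in the data.

    Instances without a value for the facet are left out of every subclass.
    """
    subclasses = defaultdict(list)
    for name, properties in instances.items():
        if facet in properties:
            subclasses[properties[facet]].append(name)
    return dict(subclasses)


# Hypothetical instances of the class milk, described by two of its facets.
milk_products = {
    "product A": {"treatment": "pasteurized milk", "source animal": "cow milk"},
    "product B": {"treatment": "sterilized milk", "source animal": "goat milk"},
    "product C": {"treatment": "pasteurized milk", "source animal": "goat milk"},
}
print(divide_by_facet(milk_products, "treatment"))
# {'pasteurized milk': ['product A', 'product C'], 'sterilized milk': ['product B']}
```

Dividing by "source animal" instead groups product B and product C under "goat milk": each facet yields a different, independent partition of the same instances, which is what lets a faceted schema retrieve them along several dimensions.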
3.3 How to map components into a conceptual model?

The components of a faceted classification are classes; their facets, i.e., their properties (unary properties, or subclasses, and binary properties, or relationships), which constitute the criteria for deriving instances; and the instances themselves (first- and second-order instances) of each specific facet. In some cases, the order of instances within each facet is also specified. There are also two types of facets. One type is derived from class-subclass relationships, in which both relata are subclasses of a single primary class within the domain; the other is derived from relationships in which the two relata belong to, or are subclasses of, different primary classes within the domain. According to Frické (2010, 44), first-order logic is adequate for discussing second-order properties. Applying the conceptualization proposed by Frické, together with the OWL notion of properties (Data and Object Properties) each having a domain and a range, we propose a logical theory to formalize the results of the previous analysis:
- Let $D_1$ be a domain formed by the primary classes (or unary properties) $C_1(x_1), C_2(x_2), C_3(x_3), \ldots$;
- Let $C_1(x_1)$ be one such class;
- Let $c_1 = \{x_1 : C_1(x_1)\}$ ($c_1$ is the concept of $C_1$);
- Let $P_1, P_2, P_3, \ldots$ be the properties of the class $C_1(x_1)$;
- Let $P_1 = \{a_1, a_2, a_3, \ldots, a_n\}$ ($a_1, a_2, a_3, \ldots, a_n$ are the extension of $P_1$, i.e., the instances of $P_1$);
- Let $p_1 = \{(x_1, y_1) : P_1(x_1, y_1)\}$ ($p_1$ is the concept of $P_1$);
- Let $P_1(c_1, p_1)$ be the binary properties with domain $c_1$;
- Definition: $F_1C_1(c_1, P_1) := P_1(c_1, p_1)$ (facet $F_1C_1$ is the class defined by the predicate, or criterion, $P_1$, having as domain the concept of class $c_1$ and as range the instances of $P_1$, i.e., the class defined by instances of the relationship $P_1$; for example, for $P_1$ = milk_by_treatment, the range is the set $s_1$ = {condensed milk, evaporated milk, homogenized milk, pasteurized milk, sterilized milk}).
In this example, as milk_by_treatment is one of the attributes of milk, $F_1C_1$ may be a Data Property facet in the OWL sense.
- Definition: $FC_1(c_1, p) := \forall R(c_1, p)$ (the facets of a class $C_1$ are all the binary properties that have the class $c_1$ as domain).
- As another example, let $P_2$ be the property milk_by_producer. $F_2C_2(c_1, P_2)$ is a facet of $C_1$, i.e., the class formed by the set $s_2$ = {Nestle, Parmalat}. In this example, as producer is a different class from milk, $F_2C_2$ may be an Object Property facet in the OWL sense.
- Let $F_1C_1 = \{a_1, a_2, a_3, \ldots, a_n\}$ ($a_1, a_2, a_3, \ldots, a_n$ are the instances of $P_1$).
- Let $F_2C_2 = \{b_1, b_2, b_3, \ldots, b_n\}$ ($b_1, b_2, b_3, \ldots, b_n$ are the instances of $P_2$).
- Definition: $F_1C_1 \rightarrow \Box(a_n \neq a_m)$ for $n \neq m$ (two second-order instances generated by the same facet cannot be equal).
- Definition: $F_1C_1, F_2C_2 \rightarrow \Diamond(a_i = b_j)$ (it is possible for two second-order instances generated by different facets to be equal).

3.4 How to map the conceptual model into SKOS?

This section presents the mapping of the conceptual model’s examples of facets into SKOS. Due to page limitations, not all the concepts in the ISO 25964-1 example are presented, only those needed to illustrate the proposal.

[Listings: SKOS codification of milk (source_animal), source_animal, buffalo_milk (source_animal), Nestle (producers_of_milk), and producers_of_milk (milk).]

4.0 Concluding remarks

The concept is the fundamental element of the SKOS vocabulary. In basic SKOS, conceptual resources (concepts) can be identified with URIs, labelled with lexical strings in one or more natural languages, documented with various types of note, semantically related to each other in informal hierarchies and association networks and aggregated into concept schemes. Finally, semantic relations play a crucial role in defining concepts, by assigning meaning and context (SKOS 2009).
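One possible way to carry a class-subclass facet of the model in section 3.3 into basic SKOS is sketched below in Python. It is an assumption on our part, not the authors' TemaTres codification: the class and each value become `skos:Concept`s linked by `skos:narrower`/`skos:broader`, and the facet itself (for instance milk by treatment) becomes a `skos:Collection` whose `skos:member`s are the values it generates, the device the SKOS Primer uses for groupings such as "milk by source animal". The `ex:` namespace and the helpers `local_name` and `facet_to_turtle` are hypothetical. Rejecting duplicate values mirrors the definition that two second-order instances of the same facet cannot be equal; since each call handles one facet, nothing prevents the same concept from appearing in two facets.

```python
SKOS_PREFIXES = (
    "@prefix ex: <http://example.org/milk#> .\n"  # hypothetical namespace
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
)


def local_name(label: str) -> str:
    """Derive a hypothetical local name, e.g. 'low-fat milk' -> 'low_fat_milk'."""
    return label.replace(" ", "_").replace("-", "_")


def facet_to_turtle(class_label: str, facet_label: str, values: list[str]) -> str:
    """Serialize one class-subclass facet of a class as SKOS Turtle."""
    # Within one facet the generated subclasses must be pairwise distinct.
    if len(set(values)) != len(values):
        raise ValueError(f"duplicate values in facet {facet_label!r}")
    cls = "ex:" + local_name(class_label)
    members = ["ex:" + local_name(v) for v in values]
    member_list = ", ".join(members)
    lines = [
        f"{cls} a skos:Concept ;",
        f'    skos:prefLabel "{class_label}"@en ;',
        f"    skos:narrower {member_list} .",
        "",
        # The facet is a labelled grouping of the values, not a concept itself.
        f"ex:{local_name(facet_label)} a skos:Collection ;",
        f'    skos:prefLabel "{facet_label}"@en ;',
        f"    skos:member {member_list} .",
    ]
    for value, member in zip(values, members):
        lines += [
            "",
            f"{member} a skos:Concept ;",
            f'    skos:prefLabel "{value}"@en ;',
            f"    skos:broader {cls} .",
        ]
    return "\n".join(lines)


print(SKOS_PREFIXES)
print(facet_to_turtle("milk", "milk by treatment",
                      ["condensed milk", "evaporated milk", "homogenized milk",
                       "pasteurized milk", "sterilized milk"]))
```

Facets whose values belong to a different primary class, such as milk_by_producer with {Nestle, Parmalat}, should not be encoded with `skos:broader`, because Nestle is not a kind of milk. Those would need `skos:related` or a dedicated property, and this sketch does not handle them.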
Faceted classification was conceived by Ranganathan to deal with the component subjects of compound subjects; it has since evolved towards handling different properties of documents (subjects, authors, publication date) and different properties of things, organizing them in descriptive ontologies. Faceted classification is an information retrieval device. Its effectiveness depends on the analysis and identification of the properties of the objects being described, according to relevance criteria. While taxonomic classification emphasizes the Aristotelian substances, faceted classification emphasizes the accidents, i.e., the different and relevant properties (according to the modelling proposed here) through which information objects may be retrieved. The aim of this paper was to develop a conceptual model of faceted classification and its components, to code this model in SKOS, and to evaluate the codification. We propose conceptualizing facets as properties (relationships and attributes, attributes and characteristics, or datatype and object properties, according to the different authors or sources identified). This modelling option makes explicit the components encompassed by a facet and their interrelations: a class with its properties, a facet (one of the properties of a class, a criterion), and the subclasses or instances created by applying that criterion to the class. Once facets are conceptualized in this way, their coding in SKOS follows from the model. The coding preserves and extends the constructs of the SKOS vocabulary, and it uses constructs similar to those of Semantic Web languages such as RDF, RDFS, and OWL, for example classes, subclasses, and properties. This helps to bring KOS closer to the mainstream of the Semantic Web.

Acknowledgments: We are grateful to Prof. Linair Campos for valuable comments on this paper.
This work was carried out with the support of the Brazilian agencies CAPES (Financing Code 001) and CNPq (grant number 305253/2017-4).

References
Aristotle. 1991. Complete Works. Ed. Jonathan Barnes. Princeton, N.J.: Princeton University Press.
La Barre, Kathryn. 2010. “A Semantic (Faced) Web?” Les Cahiers du Numérique 6, no. 3: 103-131.
Broughton, Vanda. 2006. “The Need for a Faceted Classification as the Basis of All Methods of Information Retrieval.” ASLIB Proceedings 58, no. 1: 49-72.
Broughton, Vanda and Aida Slavic. 2007. “Building a Faceted Classification for the Humanities: Principles and Procedures.” Journal of Documentation 63: 727-754.
Chen, Peter Pin-Shan. 1976. “The Entity-Relationship Model-Toward a Unified View of Data.” ACM Transactions on Database Systems 1, no. 1: 9-36.
Fallucchi, Francesca and Ernesto William De Luca. 2018. “Connecting and Mapping LOD and CMDI Through Knowledge Organization.” In Metadata and Semantic Research. MTSR 2018, edited by E. Garoufallou, F. Sartori, R. Siatri, and M. Zervas. Communications in Computer and Information Science 846. Cham: Springer, 291-301.
Frické, Martin. 2010. “Classification, Facets, and Metaproperties.” Journal of Information Architecture 2: 2.
Grolier, Éric de. 1965. On the Theoretical Basis of Information Retrieval Systems: Final Report. Washington, D.C.: Air Force Office of Scientific Research.
Giunchiglia, Fausto, Biswanath Dutta, and Vincenzo Maltese. 2014. “From Knowledge Organization to Knowledge Representation.” Knowledge Organization 41: 44-56.
Guarino, Nicola. 1997. “Some Organizing Principles for a Unified Top-Level Ontology.” In AAAI Spring Symposium on Ontological Engineering, 57-63.
Hudon, Michèle. 2019. “Facet.” In ISKO Encyclopedia of Knowledge Organization, edited by Birger Hjørland and Claudio Gnoli.
ISO/DIS 25964-1. 2010. Thesauri and Interoperability with other Vocabularies, Part 1: Draft for Comment: Thesauri for Information Retrieval. London: British Standards Institution.
Jacob, Elin K. 2004. “Classification and Categorization: A Difference That Makes a Difference.” Library Trends 52, no. 3: 515–540.
International Federation of Library Associations and Institutions (IFLA). 1997. Study Group on Functional Requirements for Bibliographic Records: Final Report. UBCIM Publications New Series. München: K. G. Saur.
Khoo, Christopher S.G. and Jin-Cheon Na. 2006. “Semantic Relations in Information Science.” Annual Review of Information Science and Technology 40: 157-228.
Miles, Alistair and Sean Bechhofer. 2008. “SKOS Simple Knowledge Organization System Reference.” W3C.
Mills, Jack and Vanda Broughton. 1977. Bliss Bibliographic Classification: Introduction and Auxiliary Schedules. 2nd ed. London: Butterworth.
National Information Standards Organization (NISO). 2005. Z39.19-2005: Guidelines for the Construction, Format, and Management of Monolingual Controlled Vocabularies. Baltimore, MD: NISO.
Orilia, Francesco and Chris Swoyer. 2011. “Properties.” In The Stanford Encyclopedia of Philosophy (Winter 2011 Edition), edited by Edward N. Zalta.
Prieto-Diaz, Rubén. 2002. “A Faceted Approach to Building Ontologies.” In The 21st International Conference on Conceptual Modeling, Tampere, Finland, 2002.
Ranganathan, Shiyali Ramamrita. 1967a. “Hidden Roots of Classification.” Information Storage and Retrieval 3, no. 4: 399-410.
Ranganathan, Shiyali Ramamrita. 1967b. Prolegomena to a Library Classification. Bombay: Asia Publishing House.
Riva, Pat, Patrick Le Boeuf, and Maja Žumer. 2017. IFLA Library Reference Model: A Conceptual Model for Bibliographic Information. Netherlands: IFLA.
Svenonius, Elaine. 2000. The Intellectual Foundation of Information Organization. Cambridge, MA: MIT Press.
Soergel, Dagobert. 1995. “The Art and Architecture Thesaurus (AAT): A Critical Appraisal.” Visual Resources 10, no. 4: 369-400.
Sowa, John F. 2000. Knowledge Representation: Logical, Philosophical, and Computational Foundations. Pacific Grove, CA: Brooks/Cole.
Taylor, Arlene G. 1992. Introduction to Cataloging and Classification. 8th ed. Englewood, Colorado: Libraries Unlimited.
Väänänen, Jouko. 2019. “Second-order and Higher-order Logic.” In The Stanford Encyclopedia of Philosophy (Fall 2019 Edition), edited by Edward N. Zalta.
Vickery, Brian. 2008. “Faceted Classification for The Web.” Axiomathes 18, no. 2: 145-160.
Zeng, Marcia Lei. 2019. “Interoperability.” Knowledge Organization 46: 122-146.
W3C. 2009. SKOS Simple Knowledge Organization System Primer.

Daniel Martínez-Ávila – Universidad Carlos III de Madrid (UC3M), Spain
Fidelia Ibekwe – Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France
Fernanda Bochi – São Paulo State University (UNESP), Brazil

The Epistemic Communities and Evolution of Knowledge Domains: A Domain Analysis of the Journal Education for Information

Abstract: Bibliometrics has been presented as one of the approaches to domain analysis. In this context, the relationship between domain analysis and journals has also been explored. There have been several bibliometric studies of specific journals using a domain-analytic approach, often published in ISKO venues, such as those carried out on the journal Knowledge Organization and on the Spanish KO-specific journal Scire. These domain analyses helped to determine the relevance and interest of these journals for the KO and other communities. We propose to study the epistemic communities around the journal Education for Information: An Interdisciplinary Journal of Information Studies. One of the assumptions of our research is that the domain of a given journal is delineated by its epistemic communities. We performed simple bibliometric counts of authors’ affiliations and numbers of co-authorships in order to identify research elites across the seven periods covering the existence of the journal (1983-2018). We then mapped the co-authorship networks of the journal in order to visualise how its epistemic communities emerged and changed during that time.
1.0 Introduction

The field of knowledge organization is institutionalized, among other things, by professorships at universities around the world, by teaching and research programs at institutions of higher education, by conferences such as the ISKO meetings, and by scholarly journals (Hjørland 2016). Scholarly journals are of utmost importance not only for the publication and socialisation of research but also for shaping the epistemic communities that constitute the domain. Scholars very often do not just consider themselves theoretically and socially integrated into a department or school of an institution of higher education, but feel more a part of the community of researchers that publish, review, and interact in one or several journals sharing the paradigms and theoretical assumptions that underlie their research domains. In this sense, we might say that the authors and the editorial committee of a journal contribute to defining the journal’s domain, perhaps more influentially than the journal’s scope. Smiraglia (2015, 9) explained the relationship between journals and domains as follows: “Journals are the formal venues for most scholarly communication, and studying them as whole works is also one means of identifying productive elements of a research front. Of course, few journals are devoted to topical areas that are as narrowly defined as most domains under study. For exampe [sic], even in the field of knowledge organization, the principle journal Knowledge Organization is devoted to the entire field. Thus, it would likely be the most cited journal in all domains within KO, but there are no journals devoted to specific narrow aspects of KO, such as ‘integrative levels,’ ‘multilingual thesauri,’ or ‘ethics in KO.’” Of course, this composition is complex and involves almost as many variables related to the definition of the domain as there are bibliometric indicators that can be used for domain analysis.
Bibliometrics was presented by Hjørland (2002; 2017) as one of the approaches to domain analysis. Several authors, including Chen, Ibekwe-SanJuan, and Hou (2010), Ibekwe-SanJuan (2008), Ibekwe-SanJuan and SanJuan (2010), and Smiraglia (2015), have explored, in practice, how bibliometrics can be used to map knowledge domains. In particular, Smiraglia has conducted some of the most prominent domain analyses for knowledge organization, such as the bookshelf studies of the ISKO international proceedings (Smiraglia 2008; 2011; 2013; 2014; 2017; 2018). Other bibliometric studies have focused on the journal Knowledge Organization (Smiraglia 2012; Guimarães, Martínez-Ávila, and Alves 2015; Alves, Dalessandro, and Bochi 2019) and on the journal Scire (Guimarães, Pinho, and Ferreira 2012; Oliveira et al. 2017). Studies of other journals that also use bibliometric techniques, especially in relation to authorship, and that could also be considered domain analyses in this vein include those carried out on the journal Scientometrics (Oliveira and Grácio 2012) and the Journal of Informetrics (Hilário and Grácio 2018). Our study aims to identify the epistemic communities formed around the journal “Education for Information: An Interdisciplinary Journal of Information Studies” (EFI) using simple bibliometric techniques that focus on bibliographic units such as authors’ affiliations, co-authorships, and research elites. A core assumption of our research is that the domain of a given journal can be delineated by the epistemic communities identified with it. This is especially important for journals that are not highly specialised and have a general scope that might correspond to broad categories such as JCR’s “Information Science & Library Science” or Scopus’s “Library and Information Sciences,” or for journals whose titles suggest an interdisciplinary scope.
For instance, the journal under study was called “Education for Information” (EFI) until early 2019, when the subtitle “An Interdisciplinary Journal of Information Studies” was added. The word “education” in its title continues to attract submissions from scholars in the field of education, despite the journal’s aspiration, stated in its scope, to be an interdisciplinary journal in the broad field of LIS.1 The journal welcomes papers on knowledge organization as part of the LIS field and has a good number of ISKO researchers on its editorial board. 2.0 Method The journal Education for Information was founded in 1983. To analyse the most productive affiliations, authors, and co-authorship networks, we split its 36 years of existence (1983-2018) into seven periods: six five-year periods and a final six-year period (2013-2018). According to Guimarães, Martínez-Ávila, and Alves (2015), “a five-year period is considered to be an adequate range to characterise scientific production.” Indeed, this was the time range these authors used to study the epistemic communities of the journal Knowledge Organization. We also believe that five-year periods are an adequate range for studying the evolution of the journal, as seven periods strike a good balance between manageability and level of detail. Using Price’s Elitism Law (Price 1963), which states that the elite of a domain (its most productive authors) corresponds to the square root of the total number of authors or publications in that domain, we identified the research elites and, using the authors’ affiliations as input, the most productive institutions. 1 See for its scope. Price’s Elitism Law has been used in bibliometric studies applied to knowledge organization (Guimarães and Tennis 2012). For the co-authorship networks, we used the software Ucinet version 6.6 and built seven author matrices, one per period (20×20, 28×28, 57×57, 53×53, 21×21, 49×49, and 49×49).
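A minimal sketch of how the square-root rule could be applied per period, assuming a flat list of (year, author) authorship records whose names have already been standardised. The helper names `period_of`, `price_elite`, and `elites_by_period`, the rounding of the square root upward, and the choice to keep every author tied at the cut-off are illustrative assumptions, not the authors’ documented procedure.

```python
import math
from collections import Counter, defaultdict

# Period labels as used in the study; the last one spans six years so that
# all 36 years (1983-2018) are covered.
PERIODS = [(1983, 1987), (1988, 1992), (1993, 1997), (1998, 2002),
           (2003, 2007), (2008, 2012), (2013, 2018)]


def period_of(year: int) -> str:
    """Return the label of the period that contains `year`."""
    for start, end in PERIODS:
        if start <= year <= end:
            return f"{start}-{end}"
    raise ValueError(f"{year} is outside 1983-2018")


def price_elite(authors: list[str]) -> list[tuple[str, int]]:
    """Return the elite of one period under Price's square-root rule.

    `authors` holds one entry per authorship (an author appears once per
    article). The elite size is sqrt(number of distinct authors), rounded up
    (an assumption); authors tied with the last elite member are kept too.
    """
    ranked = Counter(authors).most_common()
    if not ranked:
        return []
    k = min(math.ceil(math.sqrt(len(ranked))), len(ranked))
    cutoff = ranked[k - 1][1]  # article count of the k-th most productive author
    return [(name, n) for name, n in ranked if n >= cutoff]


def elites_by_period(records: list[tuple[int, str]]) -> dict[str, list[tuple[str, int]]]:
    """Group (year, author) records by period and compute each period's elite."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for year, author in records:
        grouped[period_of(year)].append(author)
    return {label: price_elite(names) for label, names in grouped.items()}
```

For example, with nine distinct authors in a period, the square root is 3, so the three most productive authors (plus anyone tied with the third) form that period’s elite.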
We standardised the names of these authors using the authority records of Scopus and the Web of Science, when available, to avoid redundancies and variant forms of the same author’s name. 3.0 Results The authors in the research elites come from a total of 53 institutions, based on their affiliations. Figure 1 shows the nine most productive institutions across the seven periods. The results show the leading role of Aberystwyth University, and more specifically of the College of Librarianship Wales (CLW), with 29.09% (16) of EFI publications in the first ten years (1983-1992). According to its biographical history,2 the CLW was established in 1964 and was the first library school in Wales. In 1989, the college merged with the University of Wales, Aberystwyth, becoming the Department of Information and Library Studies. The second most productive institution in EFI is McGill University in Canada, with 16.37% (9) of the publications, distributed across three periods (1983-1987, 1993-1997, 2013-2018). These publications come from three different units at McGill University: the School of Information Studies, the McLennan-Redpath Library, and the Department of Family Medicine. The University of Northumbria at Newcastle in the United Kingdom, the University of Ibadan in Nigeria, Robert Gordon University in the United Kingdom, and Charles Sturt University in Australia each account for 9.09% (5) of the publications across the seven periods. The contributions of these four institutions come from departments of Library and Information Science. The University of Sheffield in the United Kingdom was present in the first three periods with 7.28% (4) of the publications, while Queen’s University of Belfast and Loughborough University in the United Kingdom each accounted for 5.45% (3). Figure 1. Most representative affiliations of the contributions to the journal Education for Information 2 College of Librarianship Wales Archive.
Available at:
[Figure 1 (bar chart): affiliate count per period (1983-1987, 1988-1992, 1993-1997, 1998-2002, 2003-2007, 2008-2012, 2013-2018) by affiliation, for Aberystwyth University, Charles Sturt University, Loughborough University, McGill University, Queen’s University of Belfast, Robert Gordon University, University of Ibadan, University of Northumbria at Newcastle, and University of Sheffield.]

Table 1 below shows that in the first three periods (1983-1997), John Andrew Large (henceforth Large JA), who was the first Editor-in-Chief (EIC) of the journal from 1983 to 2013, was also among its most productive authors, and the single most productive in 1988-1992. This points to the prominent role played by the EIC in establishing the journal, expanding and consolidating its authorship networks, and thus shaping its initial epistemic community. This assumption is supported by the co-authorship networks we uncovered for the seven periods of the study (see Figure 2 below).

Table 1. Research elite of the journal Education for Information for the period 1983-2018

| Years | Authors | Articles published | Country |
|---|---|---|---|
| 1983-1987 | Christine J. Armstrong | 7 | United Kingdom |
| | Blaise Cronin | 5 | United Kingdom |
| | John Andrew Large | 5 | United Kingdom |
| | Harold Borko | 4 | United States |
| | Norman Roberts | 4 | United Kingdom |
| | Derryan Paul | 4 | United Kingdom |
| | Gayle Edward Evans | 4 | United States |
| | Kevin J. McGarry | 4 | United Kingdom |
| | Richard J. Hartley | 3 | United Kingdom |
| | Noragh Jones | 3 | United Kingdom |
| | Marianne Broadbent | 3 | Australia |
| | M. Wise | 3 | United Kingdom |
| | John Harris | 3 | United Kingdom |
| | Michael E.D. Koenig | 3 | United States |
| | John R. Turner | 3 | United Kingdom |
| | William Fisher | 3 | United States |
| | William J. Martin | 3 | United Kingdom |
| 1988-1992 | John Andrew Large | 13 | United Kingdom |
| | Norman Roberts | 5 | United Kingdom |
| | Ian M. Johnson | 5 | United Kingdom |
| | Alan J. Clark | 5 | United Kingdom |
| | Richard J. Hartley | 4 | United Kingdom |
| | Joan M. Day | 4 | United Kingdom |
| | Blaise Cronin | 4 | United Kingdom |
| | Yves Courrier | 4 | France |
| | Thomas D. Wilson | 3 | United Kingdom |
| | Ronald J. Edwards | 3 | United Kingdom |
| | Kevin J. McGarry | 3 | United Kingdom |
| | David P. Woodworth | 3 | United Kingdom |
| | Mary Nassimbeni | 3 | South Africa |
| | Helen Howard | 3 | Canada |
| | Robin Frederick Guy | 3 | United Kingdom |
| 1993-1997 | John Andrew Large | 8 | Canada |
| | Thomas A Schröder | 8 | Germany |
| | Robin Frederick Guy | 7 | United Kingdom |
| | Thomas D. Wilson | 5 | United Kingdom |
| | France Bouthillier | 3 | Canada |
| | Alan J. Clark | 3 | United Kingdom |
| | Anne Goulding | 3 | United Kingdom |
| | Marcos Silva | 3 | Canada |
| | Clive Cochrane | 3 | United Kingdom |
| | Peter G. Underwood | 3 | United Kingdom |
| | Douglas Anderson | 3 | United Kingdom |
| | Flora Smith | 3 | United Kingdom |
| 1998-2002 | Ian M. Johnson | 6 | United Kingdom |
| | Anne Goulding | 4 | United Kingdom |
| | Susan Hornby | 4 | United Kingdom |
| | Sajjad ur Rehman | 4 | Kuwait |
| | Charles Oppenheim | 3 | United Kingdom |
| | John Kennedy | 3 | Australia |
| | Rita Marcella | 3 | United Kingdom |
| | Irene Wormell | 3 | Denmark |
| | Graeme Baxter | 3 | United Kingdom |
| 2003-2007 | Andrew Kenneth Shenton | 5 | United Kingdom |
| | Mabel K. Minishi-Majanja | 3 | South Africa |
| | Jaya Raju | 3 | South Africa |
| | Ross Harvey | 3 | Australia |
| | Dennis N. Ocholla | 3 | South Africa |
| | Ayoku A. Ojedokun | 2 | Botswana |
| | Sajjad ur Rehman | 2 | Kuwait |
| | Pat Dixon | 2 | United Kingdom |
| | Clive Cochrane | 2 | United Kingdom |
| | Kgomotso H. Moahi | 2 | Botswana |
| | Ian M. Johnson | 2 | United Kingdom |
| | John Mills | 2 | Australia |
| 2008-2012 | Mutawakilu Tiamiyu | 2 | Nigeria |
| | Williams Ezinwa Nwagwu | 2 | Nigeria |
| | Valentini Moniarou-Papaconstantinou | 2 | Greece |
| | Adeola Opesade | 2 | Nigeria |
| | Mary Carroll | 2 | Australia |
| | Afsaneh Hazeri | 2 | Iran |
| | Paul T. Jaeger | 2 | United States |
| | Andrew Kenneth Shenton | 2 | United Kingdom |
| | Kemi Ogunsola | 2 | Nigeria |
| | Bernadette Welch | 2 | Australia |
| | Yun-Ke Chang | 2 | Singapore |
| | Daphne Kyriaki-Manessi | 2 | Greece |
| | Maryam Sarrafzadeh | 2 | Iran/Australia |
| | Emmanouel Garoufallou | 2 | Greece |
| | Naomi V. Hay-Gibson | 2 | United Kingdom |
| | Folake Longe | 2 | Nigeria |
| | Rania Siatri | 2 | Greece |
| | Isola Ajiferuke | 2 | Canada |
| | Georgios A. Giannakopoulos | 2 | Greece |
| | Antonio Muñoz-Cañavate | 2 | Spain |
| | Kanwal Ameen | 2 | Pakistan |
| | Wole Olatokun | 2 | Nigeria |
| 2013-2018 | Pierre Pluye | 5 | Canada |
| | Vera Granikov | 3 | Canada |
| | Quan Nha Hong | 3 | Canada |
| | Margaret Max Evans | 2 | Canada |
| | Ya-Fei Yang | 2 | Taiwan |
| | Chih-Kai Chang | 2 | Taiwan |
| | Fatih Oguz | 2 | United States |
| | Kiersten F. Latham | 2 | United States |
| | Bashar K. Hammad | 2 | Jordan |
| | Dennis N. Ocholla | 2 | South Africa |
| | Sharon Stoerger | 2 | United States |
| | Fuziah Mohamad Nadzar | 2 | Malaysia |
| | Isabelle Vedel | 2 | Canada |
| | Jennifer Branch-Mueller | 2 | Canada |
| Subtotal (research elites) | | 322 | |
| Others | | 1088 | |
| Total | | 1410 | |

In Figure 2, red squares represent the authors in the research elites of the journal and blue squares their co-authors, while the thickness of the lines (edges) represents the intensity of co-authorship, that is, the number of times two authors co-authored a paper. Scientific collaboration, as a social activity of science, originates from the relations between two or more authors, promoting their visibility, confronting the similarities and differences of their knowledge, and contributing to the emergence of new concepts and domains (Hilário and Grácio 2011). In the first period (1983-1987), the network was very sparse and there were few co-authorships; most were made up of no more than two authors, alongside a few isolated nodes (lone authors). Publications were concentrated among the research elite of the time: notably, Large JA, then editor-in-chief of the journal, and Armstrong CJ, who worked at the same institution, were co-authors in one subgraph. In this first period, the epistemic community of the journal was emerging and not yet structured. In the second period (1988-1992), a more interconnected network became visible, with four subgraphs of three or more co-authoring nodes each and a few lone authors. Although many members of the research elite still published as single authors, collaborations among three or more authors appeared, such as the one between “Edwards RJ, Wilson TD, Roberts N, and Cronin B”.
In this period, Large JA co-authored a paper with four other authors, forming a pentagonal network. This indicates increased collaboration between authors and a structuring of the epistemic communities around the journal. Price (1963) believed that, owing to the exponential growth in the number of co-authored publications, single authorship would cease to exist. Although single authorship was a significant practice during the first ten years of the journal’s existence, we observe that, especially after 1993, it declined in favour of co-authorship. In the third period (1993-1997), the network became even denser. A large interconnected subgraph shows Large JA (30) at its center, followed by Smith F (16), Anderson D (13), and Underwood PG (12) (see Table 2). These authors were also part of the research elite of this period. Two other people who held editorial roles in the journal were also centrally placed in the network: Hartley RJ (6) was a book reviewer for the journal from 2000 to 2001 and subsequently joint editor-in-chief; Guy RF (4) was editor-in-chief together with Large JA at the start of the journal. Two other smaller but dense networks in this period have authors such as Wilson TD and Goulding A as central nodes, all of them members of the research elite.
Figure 2: The seven co-authorship networks of Education for Information for the period 1983-2018 (panels: 1983-1987, 1988-1992, 1993-1997, 1998-2002, 2003-2007, 2008-2012, 2013-2018).
It appears, then, that in the first three periods of the journal’s existence (1983-1997), the editor-in-chief (Large JA) played a major role in structuring the co-authorship networks and in shaping the journal’s epistemic communities. Bourdieu (1983) stated that scientific capital can be accumulated or even transferred and that it is directly related to the scientific prestige and trajectory of the researcher.
This idea reinforces our assumption about the key role of a journal’s editor-in-chief in the emergence, growth, and development of a scientific journal. After 1998, Large JA no longer appeared in the journal’s research elite, suggesting that while the journal needed the scientific capital of its founding editor-in-chief to give it prestige during its formative years, once this had been achieved, the journal began to build on it and to consolidate its own scientific capital. This assumption seems to be borne out by the co-authorship networks of the subsequent periods (1998-2018). While the journal’s original research elites were predominantly from the United Kingdom (like its first editor-in-chief) and the United States, later research elites emerged from other parts of the world (South Africa, Nigeria, Kuwait, Australia, Greece, and Canada). In the fourth period (1998-2002), the networks continued to display a greater degree of connectivity, with most authors directly or indirectly connected to the others, except for three small subgroups. The author with the most co-authorships was Oppenheim C (18), who sits at the transition point between two interconnected networks. Kennedy J (15), Parker S (15), Hornby S (12), and Morgan S (15) appeared as hubs in the networks. In contrast to the two preceding periods, the co-authorship network became very sparse in the fifth period (2003-2007), with a few disparate subgraphs of two or three nodes each, all disconnected from one another. The journal’s editors, who had been present in the preceding periods and instrumental in structuring its epistemic communities, were noticeably absent from the network in this period. One wonders why the epistemic communities that had been coalescing around the journal over its first twenty years appear to have disintegrated.
This period seems to be a sort of “turning point” (point de rupture) for the journal and may signal the disengagement of its founding editors-in-chief and their network from the journal. A more qualitative analysis is required to understand what happened at this time in the journal’s life. In the sixth period (2008-2012), a new epistemic community seems to emerge, comprising two densely connected, moderately sized subgraphs. One completely interconnected subgraph was built around the following authors: Olatokun W, Opesade A, Longe F, Nwagwu WE, Ajiferuke I, Ogunsola K, and Tiamiyu M. Apart from Ajiferuke, who is based in Canada, these co-authors are all based at the Africa Regional Centre for Information Science, University of Ibadan, Nigeria. The rest of the network consisted of largely disconnected subgroups, most of which correspond to publications shared by their members. Thus, we observe a shift of the journal’s center of gravity away from the UK and North America. In the seventh and last period (2013-2018), a very dense network appears around Pluye P (20), a professor at McGill University (Canada). This dense subgraph is explained by the fact that Pluye P co-guest-edited a series of special issues of the journal on health information evaluation with a fellow team member, Granikov V. Hence, the densely connected subgraph is a direct result of the tradition of multiple co-authorship within this team from the Department of Family Medicine and the School of Information Studies at McGill University. The other, smaller subgroups have fewer nodes; the largest of them have no more than four nodes, as in the case of the research elite members Nadzar FM and Branch-Mueller J.
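The numbers in parentheses after author names (for example Large JA (30), Oppenheim C (18), Pluye P (20)) are not defined in this section. One common reading in network analyses of this kind is weighted degree, the sum of an author’s edge weights; the sketch below assumes that reading. It builds the kind of n×n author matrix described in the Method section (20×20, 28×28, and so on), with each cell counting co-authored papers, which corresponds to edge thickness in Figure 2. The input is hypothetical, and this plain-Python illustration is not a reproduction of the Ucinet workflow.

```python
from itertools import combinations


def coauthorship_matrix(papers: list[list[str]]) -> tuple[list[str], list[list[int]]]:
    """Build a symmetric co-authorship matrix for one period.

    `papers` holds one list of (standardised) author names per article.
    Cell [i][j] counts the papers authors i and j wrote together. Lone
    authors get a row of zeros, i.e. they are isolated nodes.
    """
    authors = sorted({name for paper in papers for name in paper})
    index = {name: i for i, name in enumerate(authors)}
    matrix = [[0] * len(authors) for _ in authors]
    for paper in papers:
        for a, b in combinations(sorted(set(paper)), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return authors, matrix


def weighted_degree(authors: list[str], matrix: list[list[int]]) -> dict[str, int]:
    """Sum of each author's edge weights (0 for lone authors)."""
    return {name: sum(row) for name, row in zip(authors, matrix)}


# Hypothetical papers, using the article's "Surname Initials" name style.
papers = [["Large JA", "Armstrong CJ"],
          ["Large JA", "Armstrong CJ", "Hartley RJ"],
          ["Cronin B"]]
authors, matrix = coauthorship_matrix(papers)
degrees = weighted_degree(authors, matrix)
```

In this toy input, Armstrong CJ and Large JA share an edge of weight 2 (two joint papers), and Cronin B is an isolated node with a weighted degree of 0.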
4.0 Conclusion Grácio (2018) pointed out that scientific collaboration manifests itself through collective intellectual work, promoting an association of skills and knowledge, uniting researchers with thematic proximity, and sometimes bringing together researchers from different areas. Our study showed the crucial role played by a scientific journal’s editor-in-chief in the emergence, development, and consolidation of its epistemic communities, which in turn structure research fields. In the case of Education for Information, its first editors-in-chief (Large JA, Hartley RJ) occupied prominent and central positions in the first three periods of the journal’s existence, when it needed to gain visibility and credibility as a channel for scholarly publication in the interdisciplinary field of LIS. Once the epistemic communities were consolidated, the influence of the founding editors-in-chief waned and eventually disappeared as they either shifted their attention elsewhere or left their editorial positions. With the change of editorship in 2014, new, more diverse and international epistemic communities began to coalesce around the journal. This indicates that while the journal benefitted from the scientific capital of its founding editors in its first two decades, a point of rupture (split) occurred afterwards in which the epistemic communities built by the founding editors were phased out. As the journal consolidated its reputation, it attracted authors from outside this historic epistemic community, giving rise to more diverse and international epistemic communities that are clearly different from those of the founding editors. References Alves, Bruno Henrique, Rafael Cacciolari Dalessandro, and Fernanda Bochi dos Santos. 2019.
“Colaboração Científica no Periódico Knowledge Organization: Elementos para Caracterização de um Domínio.” In Organização do Conhecimento Responsável: Promovendo Sociedades Democráticas e Inclusivas, edited by Thiago Henrique Bragato Barros and Natalia Bolfarini Tognoli. Belém: Ed. da UFPA, 137-144. Bourdieu, Pierre. 1983. “O Campo Científico.” In Pierre Bourdieu: Sociologia, edited by Renato Ortiz. São Paulo: Ática, 122-155. Chen, Chaomei, Fidelia Ibekwe-SanJuan, and Jianhua Hou. 2010. “The Structure and Dynamics of Co-Citation Clusters: A Multiple-Perspective Co-Citation Analysis.” Journal of the American Society for Information Science & Technology 61, no. 7: 1386-1409. Grácio, Maria Cláudia Cabrini. 2018. “Colaboração Científica: Indicadores Relacionais de Coautoria.” Brazilian Journal of Information Science 12, no. 2: 24-32. Guimarães, José Augusto, Fabio A. Pinho, and Gustavo M. Ferreira. 2012. “Relações Teóricas da Organização do Conhecimento com as Abordagens de Catalogação de Assunto, Indexação e Análise Documental.” Scire 18, no. 2: 31-41. Guimarães, José Augusto Chaves, Daniel Martínez-Ávila, and Bruno Henrique Alves. 2015. “Epistemic Communities in Knowledge Organization: An Analysis of Research Trends in the Knowledge Organization Journal.” Paper presented at the ISKO UK Biennial Conference, 13-14 July 2015, London. Guimarães, José Augusto Chaves, and Joseph T. Tennis. 2012. “Constant Pioneers: The Citation Frontiers of Indexing Theory in the ISKO International Proceedings.” In Categories, Contexts and Relations in Knowledge Organization: Proceedings of the Twelfth International ISKO Conference 6-9 August 2012, Mysore, India, edited by A. Neelameghan and K. S. Raghavan. Advances in Knowledge Organization 13. Würzburg: Ergon Verlag, 39-43. Hilário, Carla Mara, and Maria Cláudia Cabrini Grácio. 2011. “Colaboração Científica na Temática ‘Redes Sociais’: Uma Análise Bibliométrica do ENANCIB no Período 2009-2010.” Revista EDICIC 1: 363-375.
Hilário, Carla Mara, and Maria Cláudia Cabrini Grácio. 2018. “Análise de Citações Considerando a Contribuição dos Autores e Ordem da Autoria nos Artigos do Journal of Informetrics.” In XIX Encontro Nacional de Pesquisa em Ciência da Informação – ENANCIB 2018, 4327-4342. Hjørland, Birger. 2002. “Domain Analysis in Information Science: Eleven Approaches – Traditional as Well as Innovative.” Journal of Documentation 58: 422-462. Hjørland, Birger. 2016. “Knowledge Organization (KO).” Knowledge Organization 43: 475-484. Hjørland, Birger. 2017. “Domain Analysis.” Knowledge Organization 44: 436-464. Ibekwe-SanJuan, Fidelia. 2008. “The Impact of Geographic Location on the Development of a Specialty Field: A Case Study on Sloan Digital Sky Survey in Astronomy.” Knowledge Organization 35: 239-250. Ibekwe-SanJuan, Fidelia, and Eric SanJuan. 2010. “Knowledge Organization Research in the Last Two Decades: 1988-2008.” In Paradigms and Conceptual Systems in Knowledge Organization: Proceedings of the Eleventh International ISKO Conference 23-26 February 2010, Rome, Italy, edited by Claudio Gnoli and Fulvio Mazzocchi. Advances in Knowledge Organization 12. Würzburg: Ergon Verlag, 115-121. Oliveira, Ely Francina Tannuri de, Bruno Henrique Alves, Marcos Rodrigues do Prado, and Maria Aparecida Pavanelli. 2017. “Produção Científica e Inserção Internacional da Revista Scire no Período de 2006 a 2014.” Scire 23, no. 1: 47-56. Oliveira, Ely Francina Tannuri de, and Maria Cláudia Cabrini Grácio. 2012. “Visibilidade dos Pesquisadores no Periódico Scientometrics a Partir da Perspectiva Brasileira: Um Estudo de Cocitação.” Em Questão 18, no. 3: 99-113. Price, Derek J. de Solla. 1963. Little Science, Big Science. New York: Columbia University Press. Smiraglia, Richard P. 2008. “ISKO 10’s Bookshelf: An Editorial.” Knowledge Organization 35: 187-191. Smiraglia, Richard P. 2011. “ISKO 11’s Diverse Bookshelf: An Editorial.” Knowledge Organization 38: 179-186. Smiraglia, Richard P. 2012.
“Shifting Intension in Knowledge Organization: An Editorial.” Knowledge Organization 39: 405-408. Smiraglia, Richard P. 2013. “ISKO 12’s Bookshelf— Evolving Intension: An Editorial.” Knowledge Organization 40: 3-10. Smiraglia, Richard P. 2014. “ISKO 13’s Bookshelf: Knowledge Organization, the Science, Thrives—An Editorial.” Knowledge Organization 41: 343-356. Smiraglia, Richard P. 2015. Domain Analysis for Knowledge Organization: Tools for Ontology Extraction. Oxford: Chandos Publishing. Smiraglia, Richard P. 2017. “ISKO 14’s Bookshelf: Discourse and Nomenclature—An Editorial.” Knowledge Organization 44: 3-12. Smiraglia, Richard P. 2018. “ISKO 15’s Bookshelf: Dispersion in a Digital Age. An Editorial.” Knowledge Organization 45: 343-357.
Paul Matthews – University of the West of England (Bristol), England.
Knowledge Organisation Systems for Chatbots and Conversational Agents: A Review of Approaches and an Evaluation of Relative Value-Added for the User
Abstract: Chatbots and smart digital assistants usually rely on some form of knowledge base as a source of answers to user queries. These may be more or less structured, ranging from free text and keyword search to interconnected knowledge graphs in which answers are surfaced through structured queries and/or dynamic logics. This paper will review the state of the art and probe the added value to the end user of using more structured KOSs as the knowledge base component of agent architectures. In addition to the extant literature, the experimental development of a chatbot for student information based on a partial knowledge graph is described. A key conclusion is that ontology in a broad sense can be used to drive dialogue as much as to derive factually correct or organisationally approved responses. 1.0 Introduction Text- and speech-based dialogue agents are growing in popularity as a new or complementary interface to organisational systems and services.
They offer high availability and convenience, and the potential to identify, triage, and satisfy common queries and interactions. As a window into an organisation’s knowledge, it seems natural that knowledge organisation systems (KOSs) should play a major role in supporting and directing exchanges, though this is a more sophisticated role than they usually play. For the sake of interoperability and portability, organisations have often erred on the side of simplicity, using semantically shallow KOSs to support discovery (Isaac and Baker 2015). However, settling for this level of semantic power neglects a wealth of experience in Knowledge Representation and Knowledge Engineering from the computer science and AI disciplines (Giunchiglia, Dutta, and Maltese 2014). These somewhat deeper approaches allow the formal expression of conceptual structures and their interrelations, as well as the application of automated logical inference over a system’s knowledge base to arrive at evidence-based conclusions. These affordances become useful once again when it comes to conversational agents, so it is a good time to revisit the standards and tools developed for knowledge representation and apply them to chatbots (Cameron et al. 2018). Another driver has been developments in natural language interfaces, where some progress has been made in mapping unstructured queries to formal data relations. Indeed, there was a call to action at ISKO 2004 for better connection between KO and computational linguistics (Jones, Cunliffe, and Tudhope 2004). Jones et al.’s exploratory work showed how term disambiguation and the mapping of user queries to KO structures could be improved by using a general-purpose thesaurus and expanding matching to include thesaurus scope notes.
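To make the idea of mapping user queries to KO structures more concrete, here is a minimal sketch, assuming a tiny hypothetical thesaurus in which each concept has a preferred term, non-preferred (entry) terms, and a scope note. Queries are first matched against preferred and non-preferred terms; only when that fails is matching widened to word overlap with scope notes. The data, the stopword list, the `Concept` class, and the overlap scoring are illustrative assumptions, not Jones, Cunliffe, and Tudhope’s actual method.

```python
import re
from dataclasses import dataclass, field

STOPWORDS = {"a", "an", "the", "i", "my", "of", "or", "and", "to", "in", "is"}


@dataclass
class Concept:
    preferred: str
    non_preferred: list[str] = field(default_factory=list)
    scope_note: str = ""


# Hypothetical thesaurus fragment for a student-information bot.
THESAURUS = [
    Concept("Timetables", ["timetable", "schedule"],
            "Dates and times of lectures and exams"),
    Concept("Personal circumstances", ["extenuating circumstances"],
            "Illness, bereavement or other life events affecting coursework or exams"),
]


def content_words(text: str) -> set[str]:
    """Lower-cased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def map_query(query: str, thesaurus: list[Concept] = THESAURUS) -> list[str]:
    """Return the preferred terms of the concepts a query maps to."""
    q = query.lower()
    # Step 1: whole-word match on preferred or non-preferred terms.
    direct = [c.preferred for c in thesaurus
              if any(re.search(rf"\b{re.escape(t.lower())}\b", q)
                     for t in [c.preferred, *c.non_preferred])]
    if direct:
        return direct
    # Step 2: fall back to word overlap with scope notes; keep the best-scoring concepts.
    words = content_words(query)
    scores = [(len(words & content_words(c.scope_note)), c.preferred) for c in thesaurus]
    best = max((s for s, _ in scores), default=0)
    return [term for s, term in scores if best > 0 and s == best]
```

Here "Where can I find my schedule?" maps to Timetables through the entry term "schedule", while "I missed my exams after a bereavement" matches no term directly and falls back to the scope notes, where Personal circumstances shares two words with the query against one for Timetables.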
The application of KOS to conversational interfaces requires an active approach and a pragmatic view of semantics (as the concepts are action-oriented and application-specific), in addition to a unification of user and organisational warrant (Hjørland 2007), as the goal is to allow the user to interact successfully in real time with an organisation’s knowledge structures. This paper will look at ways in which KOS can both fuel and drive dialogue in conversational agents and chatbots. This will be based on an exploration of ways in which the user experience at the dialogue interface is affected by bot behaviours and capabilities. We will then look at examples of how KOSs have been used in domain-specific dialogue systems, before exploring in more detail a case study of a bot for student information. It is hoped that these sections will highlight challenges and opportunities in KOS-driven agents. First, however, we will look at a broad typology of bot technical architectures to help illustrate where KOS traditionally sits. 2.0 Dialogue System Typologies and Components A common distinction is drawn between goal-driven or rules-based agents and those that are more open-domain and able to tackle a wider variety of topics (Altinok 2018). The former are perhaps more likely to end in satisfactory completion, owing to a more predictable, constrained structure that can also be driven in part by menus and fixed choices. They are, however, restricted in requiring an explicit representation of the dialogue structure (Augello et al. 2013) and in being able to perform only a limited set of functions; also, if the pattern matching fails, the bot may flounder. In contrast, in some of the more open AI-enabled systems, the most suitable response is predicted by models trained on a large dataset of human dialogue. Here, some precision is often sacrificed in an attempt to provide the user with a more open-ended, flexible conversational experience.
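The trade-off between the two families can be illustrated with a deliberately small rules-based matcher. In this sketch the intents, regular expressions, and canned replies are hypothetical: each utterance is checked against hand-written patterns in order, and anything outside the script falls through to a generic fallback, which is where such a bot may flounder.

```python
import re

# Hypothetical intents: (name, pattern, canned reply). The dialogue structure
# is fully explicit, which is what makes rules-based bots predictable.
RULES = [
    ("timetable", re.compile(r"\b(timetable|schedule)\b", re.IGNORECASE),
     "You can view your timetable in the student portal."),
    ("change_course", re.compile(r"\bchang\w*\b.*\bcourse\b", re.IGNORECASE),
     "To change course, please speak to an information point advisor."),
]
FALLBACK = "Sorry, I didn't understand that. Could you rephrase?"


def respond(utterance: str) -> tuple[str, str]:
    """Return (intent, reply) for the first rule whose pattern matches."""
    for intent, pattern, reply in RULES:
        if pattern.search(utterance):
            return intent, reply
    # No pattern matched: the bot can only fall back to a generic reply.
    return "fallback", FALLBACK
```

A phrasing the rule author did not anticipate, such as "I'd like to switch programmes", also falls through to the fallback, even though it expresses the change_course intent.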
In most cases where a dialogue requires an informational response, a data or knowledge base is central to the architecture (Figure 1), whether the response is the endpoint of a retrieval-based dialogue fragment or the model-guided prediction of an AI-based agent. Increasingly, bot architectures include natural language processing to elucidate the “things” or entities being talked about and the goals or intents of the user in relation to them. These enriched structures are then matched (perhaps fuzzily) to entries in the knowledge base using similarity or information retrieval algorithms, and the best matches are used to derive the system response. In recent years we have also seen more hybrid architectures that include both rules-based and generative components, and this trend is likely to continue, as each approach can to some extent make up for the shortcomings of the other. Certainly, new dynamic architectures are needed to help lift chatbots out of the “trough of disillusionment” in which many still lie, as the user experience at the interface falls short of expectations. Figure 1: Generic Agent Architecture (Maroengsit et al. 2019) 3.0 User Interaction with Conversational Interfaces In terms of interaction patterns, Hill, Ford, and Farreras (2015) compared interaction logs from conversations with chatbots and with human partners, and found that chatbot interactions contained fewer words per message but a greater number of messages overall. This suggests that a system needs to be able to proactively seek clarification, in the same way that humans do in everyday interactions (Schlangen 2004). Such clarifications may serve to resolve semantic underspecification or to clarify intent. A related issue is dialogue initiative. Finite-state (rule-based) systems tend to dominate the initiative, whereas in real conversation, initiative alternates between participants (Jurafsky 2018).
One-sided initiative feels inflexible and less natural, and it cannot handle statements containing multiple pieces of information. Reasons for using a spoken agent include time saving and multi-tasking (Luger and Sellen 2016). Luger and Sellen noted that regular users invest time in understanding the agent’s strengths and capabilities, including through initial playful interactions. Interestingly, factors affecting interaction included having sufficient understanding of an agent’s capabilities and internal workings, including “whether or not its capabilities altered over time” and the extent to which the agent was learning from the user. Maintenance of context is a fundamental strength of human conversation, and an automated agent’s limited abilities in this area can be a source of frustration (Vtyurina et al. 2017). Anaphora resolution, that is, maintaining the implicit referent of the conversation, continues to be a technical challenge for agent developers. Vtyurina et al.’s study also highlighted the need to strengthen the credibility of information in an agent’s answers, perhaps by providing links where the information could be verified; similarly, where a variety of opinions exist on a topic, users asked for summaries of these. 4.0 Evaluation Criteria for Conversation-Oriented KOS Several dimensions have been proposed to organise KOSs themselves, one of the more common being a spectrum of semantic strength, which itself correlates with time and money (Souza, Tudhope, and Almeida 2012). At the stronger end of the semantic spectrum, formal theories and ontologies are heavier and provide richer expressivity, logics, and entailments or automated reasoning. Conversely, weaker semantics bring the opportunity for more interoperability at the syntactic level. Ontologies themselves can vary according to domain or application specificity, and in the area of chatbots this maps quite well onto generalist versus specialist conversational capabilities.
Soergel (2001) proposes a set of evaluation characteristics for KOS, including purpose, coverage, conceptual structure, extent of precombination, access and display, and degree of updating. In terms of coverage, Tennis (2013) notes the temporal limits of correctness and that dynamic theories are necessary given the inevitability of change and the need to cater for an ever-changing, evolving knowledge domain. A geological or archaeological view might best reflect how KOSs change over time in practice, though the degree to which change is incremental or immediate may vary between systems and stakeholders. In a more radical scenario, interaction with users may drive immediate change. In sum, many existing criteria for KOS evaluation focus purely on aspects of the KOS itself rather than on its wider position within an organisational information system. This neglects pragmatic aspects such as the ease and automation of updating, the degree of fit with legacy corporate systems, and the degree of redundancy. These latter considerations are surely essential if the KOS is to be sustainable and to fulfil a central role in corporate digital communications. 5.0 KOS Approaches for Agents Perhaps the simplest form of data store to support an agent is a set of question-answer pairs, where the user query is compared with the knowledge base to find the closest matching question. This kind of architecture was used as a core component, for example, by Carisi, Albarelli, and Luccio (2019) to build an airport information chatbot, though their bot was also able to connect to other airport databases to report flight information. Such a knowledge base can be bootstrapped using FAQ lists from web pages or from call centre records. More “capable” bots may be closely linked to an underlying ontology.
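A question-answer store of this kind can be sketched in a few lines, assuming a small hypothetical FAQ list and plain bag-of-words cosine similarity as the matching function. Real systems, including the one cited, may use different text representations, and the 0.3 threshold is an arbitrary illustrative choice. When no stored question is similar enough, the function returns None so that the bot can ask for clarification instead of guessing.

```python
import math
import re
from collections import Counter
from typing import Optional

# Hypothetical FAQ-style knowledge base of (question, answer) pairs.
QA_PAIRS = [
    ("Where is the baggage reclaim?", "Baggage reclaim is on the arrivals level."),
    ("How do I get to the city centre?", "Buses to the city centre leave from stop A."),
    ("Can I take liquids through security?", "Please check the liquids guidance at security."),
]
STOPWORDS = {"a", "an", "the", "is", "i", "do", "to", "can", "how", "where", "my", "of"}


def bag_of_words(text: str) -> Counter:
    """Term-frequency vector of lower-cased words, minus stopwords."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two sparse term-frequency vectors."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


def answer(query: str, threshold: float = 0.3) -> Optional[str]:
    """Return the answer whose stored question is most similar to the query,
    or None if no stored question reaches the threshold."""
    q = bag_of_words(query)
    score, best = max((cosine(q, bag_of_words(question)), ans) for question, ans in QA_PAIRS)
    return best if score >= threshold else None
```

A query such as "What time is it?" shares no content words with any stored question, so `answer` returns None.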
Athreya et al.’s DBpedia chatbot (2018) used third-party natural language parsing to match queries to DBpedia URIs and literals, with literals given precedence as more likely to correspond to factual answers. Altinok (2018) describes a system for banking services that uses an ontology not only to represent the bank’s services and their features but also to structure the dialogue. Noun phrases in user queries are scanned for known entities by matching them to ontology classes and individuals (with a synonym list to capture alternative phrasing). Matched entities then provide the context for further queries, somewhat mitigating the anaphora problem. Conversation memory is also used to keep track of which ontology nodes have been visited, in order to avoid repetition and to provide the next dialogue prompt. For their bot to discuss issues in climate change, Groza and Toniuc (2017) used an ontology but converted it into an API.AI pattern-matching model for use in bot interaction: classes were mapped to entities and roles to intents. However, when no intent was matched (i.e. queries were not easily matched to models), entailment rules derived from a debate corpus were used. Roca et al. (2019) propose a general microservice architecture for their prototype chatbot to support patients with chronic illness, arguing that such bots are easily extensible without disruption to service. For communication and storage, the bot used AIML to represent the dialogue logic, with FHIR (Fast Healthcare Interoperability Resources) as the data store for the patient and associated data entities (e.g.
Care Plan, Practitioner, Organization, Questionnaire), which could be used to generate monitoring questions (via dynamic AIML generation) and to personalise the conversation.

6.0 Case Study: Student Personal Circumstances Procedure

We have developed a prototype chatbot for students, based around common queries connected to timetabling, choice of optional modules, changing course and so on. In doing so, we have worked closely with our enquiry services to understand both the kinds of enquiry received and how students are subsequently advised in accordance with university regulations. One of the most common sources of enquiry concerns problems taking exams or submitting coursework due to illness, bereavement or other life events. These cases may be eligible for special consideration under the university’s marks removal or personal circumstances procedures, depending on whether or not the student has already submitted the work or attempted the exam (marks removal if they have, personal circumstances if not). To be eligible, students need some evidence to support the claim, although they may self-certify once per academic year without supporting evidence. Examples of the type of queries around these processes are shown in Table 1.

Table 1: Sample queries around personal circumstances supplied by information point advisors or taken from web search logs (in some cases these have been amended slightly to preserve anonymity). The appropriate system response is noted for each query.
| Source | Example | Appropriate response |
| --- | --- | --- |
| Information Point Advisors | “I want to self-certify but where is the form?” | From QA knowledge base |
| Information Point Advisors | “I don’t have any evidence, what should I do?” | Triage Self-Certification |
| Information Point Advisors | “When is the deadline to submit a PC form?” | Triage Personal Circumstances |
| Information Point Advisors | “Can I make a request after the deadline?” | Clarification of Intent |
| Information Point Advisors | “I sat an exam but I shouldn’t have, what can I do?” | Triage Marks Removal |
| Information Point Advisors | “I handed in the work but I shouldn’t have, what can I do?” | Triage Marks Removal |
| Information Point Advisors | “How do I submit a request?” | Clarification of Intent |
| Information Point Advisors | “I applied for missed assessment but I would like to cancel it now, what should I do?” | Triage Process Cancellation |
| Web Search Logs | “personal circumstances evidence submission” | Clarification of Intent |
| Web Search Logs | “personal circumstances but can’t attend campus” | Triage Personal Circumstances |
| Web Search Logs | “personal circumstances reasons” | From QA knowledge base |
| Web Search Logs | “I have backache and find it hard to sit in an exam” | From QA knowledge base (but suggests process outside of domain knowledge) |
| Web Search Logs | “missing exams” | Clarification of Intent |
| Web Search Logs | “ill on exam day” | Clarification of Intent |
| Web Search Logs | “late personal circumstance” | Triage Marks Removal |
| Web Search Logs | “student illness” | Clarification of Intent |

Our initial prototype used a rule-based decision tree for personal circumstances to guide the user through the process, once applicability had been established by identifying the relevant intent (using a set of intent patterns such as “sick exam” or “missed deadline”). The logical dialogue structure for this is shown in Figure 2. Additionally, a “one-off” question-answering dataset is used when there is a strong match to a more specific question relating to the topic (here the user “passes through” to a direct answer rather than entering a protracted dialogue).
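The first prototype’s routing (pass-through answer, intent-triggered decision tree, or clarification) can be sketched roughly as follows. This is a minimal sketch under stated assumptions: an intent pattern is assumed to match when all of its words occur in the query, only the two patterns quoted above are used, the pass-through entry is invented, and `triage` is an assumed tree built from the eligibility rules described earlier in this section rather than a transcription of Figure 2. All names are hypothetical.

```python
import re

# Hypothetical intent patterns: the two examples quoted in the text. A pattern
# is assumed to match when every one of its words occurs in the query.
PC_INTENT_PATTERNS = ["sick exam", "missed deadline"]

# Hypothetical "one-off" QA entry for a strong match to a specific question.
PASS_THROUGH = {
    "where is the self-certification form": "[link to self-certification form]",
}


def words(text: str) -> set:
    """Lower-case word set; hyphenated words are kept whole."""
    return set(re.findall(r"[a-z-]+", text.lower()))


def route(query: str) -> str:
    """Pass through, start the decision-tree dialogue, or ask for clarification."""
    q = words(query)
    for key, reply in PASS_THROUGH.items():
        if words(key) <= q:            # strong match: answer directly
            return reply
    if any(words(p) <= q for p in PC_INTENT_PATTERNS):
        return "START_PC_DIALOGUE"     # intent identified: enter the tree
    return "CLARIFY"                   # otherwise clarify the intent


def triage(submitted: bool, has_evidence: bool, self_certified_this_year: bool) -> str:
    """Assumed decision tree based on the prose, not on Figure 2."""
    process = "Marks Removal" if submitted else "Personal Circumstances"
    if has_evidence:
        return f"{process}: apply with evidence"
    if not self_certified_this_year:
        return f"{process}: self-certify (once per academic year)"
    return f"{process}: not eligible without evidence"
```

With these patterns, “I was sick on the day of my exam” enters the dialogue, while “student illness” and “ill on exam day” fall through to clarification, matching the responses listed in Table 1. Because the tree is hard-coded, every regulatory change means editing `triage` itself.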
Problems with the approach taken in the first prototype include that, while the structure correctly formalises the regulations, a) the structure is hard-coded, so any change to the process requires reworking at the code level; and b) it follows a set script or funnel down which all users are sent. It therefore does not adequately accommodate short-cut, clarification and process-related questions such as “I have submitted my work and don’t have any evidence, what should I do?”, “What counts as evidence?”, “What to do if I don’t have evidence?” or “How many times can I self-certify?”. While problem a) was addressed by allowing new dialogue structures to be loaded from external files at run-time, the essential inflexibility of the approach remained.

Figure 2: Dialogue flow for personal circumstances, first prototype

We have therefore begun work on a second prototype that uses a domain ontology to represent the enquirer (student), the processes we want to represent, and the requirements of these processes. The dialogue can then be built around the completeness of the entity (its object properties) as constructed during the conversation, a more “frame-based” approach. If required properties are missing, this triggers information-gathering questions. Once enough information is available, the enquirer can be channelled into the application procedure for the relevant process. To facilitate maintenance, we store the natural language options used to identify intents in the entities’ RDF labels, and the questions the agent should generate in the RDF comments of entities and properties. We also take advantage of SWRL rules, which capture the formal regulations (e.g. Listing 1). Here, the student is flagged as eligible for marks removal if they have a) already submitted and b) admissible evidence.
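The frame-based dialogue described above, in which missing properties trigger questions, might be sketched as follows. This is a minimal sketch only: a plain dictionary stands in for the ontology, the prompt strings play the role of the RDF comments, the documentation type `MedicalCertificate` is invented, and `eligible_for_marks_removal` hand-codes the marks removal condition (submitted, plus documentation admissible for marks removal) instead of running a SWRL reasoner. All names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical slots standing in for ontology object properties; each prompt
# plays the role of the property's RDF comment in the prototype described.
SLOT_PROMPTS: Dict[str, str] = {
    "hasSubmitted": "Have you already submitted the work or sat the exam?",
    "hasDocumentation": "What evidence do you have? (answer 'none' if you have none)",
}

# Hypothetical isAdmissibleForProcess facts: (documentation type, process).
ADMISSIBLE = {
    ("MedicalCertificate", "MarksRemoval"),
    ("MedicalCertificate", "PersonalCircumstances"),
}


@dataclass
class StudentFrame:
    """The enquirer entity, filled in slot by slot during the conversation."""
    slots: Dict[str, Optional[object]] = field(
        default_factory=lambda: {name: None for name in SLOT_PROMPTS})

    def next_prompt(self) -> Optional[str]:
        """Question for the first unfilled slot, or None once the frame is complete."""
        for name, value in self.slots.items():
            if value is None:
                return SLOT_PROMPTS[name]
        return None


def eligible_for_marks_removal(frame: StudentFrame) -> bool:
    """Hand-coded check: submitted, and documentation admissible for MarksRemoval.
    No SWRL reasoner is involved; this only mirrors the rule's logic."""
    submitted = frame.slots["hasSubmitted"]
    documentation = frame.slots["hasDocumentation"]
    return submitted is True and (documentation, "MarksRemoval") in ADMISSIBLE


# Usage: the agent keeps asking until the frame is complete, then applies the rule.
frame = StudentFrame()
print(frame.next_prompt())                 # asks about submission first
frame.slots["hasSubmitted"] = True
frame.slots["hasDocumentation"] = "MedicalCertificate"
print(frame.next_prompt())                 # None: no slots left to fill
print(eligible_for_marks_removal(frame))   # True
```

The design point is that the question order and wording come from data (the slot definitions) rather than from a hard-coded tree, so short-cut answers such as “I have submitted my work” can fill a slot early and the agent simply skips that question.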
Listing 1: Example SWRL rule for Marks Removal eligibility

Student(?s) ^ Assessment(?a) ^ Documentation(?d) ^
hasSubmitted(?s, ?a) ^ hasDocumentation(?s, ?d) ^
isAdmissibleForProcess(?d, MarksRemoval)
-> isEligibleForProcess(?s, MarksRemoval)

7.0 Discussion

Our case study is typical of many agent interaction settings in that a) the initial user input may be minimal and require clarification, and b) it is very important that the user is advised appropriately once the system is sufficiently apprised of their circumstances. The system therefore needs to match user status to organisational policies and processes, which is a strong argument for using a formal, rule-based ontology as a core architectural component. That said, to fulfil the requirements of a conversational interface, the ontology does need to be sufficiently detailed, which points toward the expensive, resource-intensive end of the KOS spectrum (Souza et al. 2012). Considering Soergel’s (2001) evaluation criteria for KOS in the light of agent applications, we see that the purpose requires relatively high specificity and that, in terms of sources and coverage, while preferred terms can be those defined and used by the organisation, they should be comprehensively mapped to the user’s natural language in order to be correctly identified, matched and prioritised. In terms of conceptual structure, while some depth of class hierarchy seems useful for organisational purposes, it seems less likely to help satisfy user needs, since queries that apply to ancestor classes are probably less common. Instead, the properties of a specific atomic entity seem more useful for guiding slot filling and frame completion. Finally, and perhaps most importantly, there is Soergel’s criterion of “updating”, which we would elaborate to include not only frequency but also ease of updating, along with the centrality of the KOS itself to the organisation’s overall information system.
It is here that ontology-based agents seem to be at a very early stage: very little is said about how they can be maintained or embedded within an organisation, though semantic wikis offer a useful opportunity to bridge the gap between knowledge engineering and domain expertise. On the ontology creation side, there is a relatively large body of work on deriving ontologies from textual sources, but further quality assurance steps will almost inevitably be needed, and these cannot be fully automated. Architecturally speaking, Roca et al.’s example (2019) seems to point the way, describing an approach that allows the knowledge base to be incrementally enhanced without loss of service. In their case, however, the underlying knowledge structures are relatively partitioned or siloed, which would seem to risk worsening the problem of maintainability. On the other hand, a single integrated ontology will require version control and continuous validation to ensure that extensions do not introduce errors into existing rules.

8.0 Conclusions

This paper has looked at KOSs within chatbot and agent architectures in order to understand what the KOS itself can contribute to the end-user experience. Clearly, there are demonstrable ways in which a KOS-driven service can not only represent the domain knowledge with which a user is interacting but also shape and facilitate the dialogue itself. Here there are potential advantages for goal-driven or regulation-oriented applications in maintaining formal correctness, where “generative” AI-based systems cannot necessarily guarantee such accuracy. While individual prototypes and technical approaches are relatively well documented, less has been published on how to reach a state where suitable ontologies are embedded within an organisation and maintained as a core system. While perhaps less superficially exciting, this work can establish a basis from which conversational interfaces can evolve productively.
References

Altinok, Duygu. 2018. “An Ontology-Based Dialogue Management System for Banking and Finance Dialogue Systems.” arXiv:1804.04838.

Athreya, Ram G., Axel-Cyrille Ngonga Ngomo, and Ricardo Usbeck. 2018. “Enhancing Community Interactions with Data-Driven Chatbots: The DBpedia Chatbot.” In WWW ’18: Companion Proceedings of the The Web Conference 2018, edited by Pierre-Antoine Champin, Fabien Gandon, Lionel Médini, Mounia Lalmas, and Panagiotis G. Ipeirotis. Republic and Canton of Geneva, Switzerland: International World Wide Web Conferences Steering Committee, 143–146.

Augello, Agnese, Giovanni Pilato, Giorgio Vassallo, and Salvatore Gaglio. 2013. “Chatbots as Interface to Ontologies.” In Advances onto the Internet of Things. Advances in Intelligent Systems and Computing 260, edited by S. Gaglio and G. Lo Re. Cham: Springer, 285–299.

Cameron, Gillian, David Cameron, Gavin Megaw, R. R. Bond, Maurice Mulvenna, Siobhan O’Neill, C. Armour, and Michael McTear. 2018. “Back to the Future: Lessons from Knowledge Engineering Methodologies for Chatbot Design and Development.” In Proceedings of the 32nd International BCS Human Computer Interaction Conference (HCI-2018), edited by Raymond Bond, Maurice Mulvenna, Jonathan Wallace, and Michaela Black. Swindon, UK: BCS Learning & Development Ltd, 1–5.

Carisi, Matteo, Andrea Albarelli, and Flaminia L. Luccio. 2019. “Design and Implementation of an Airport Chatbot.” In GoodTechs ’19: Proceedings of the 5th EAI International Conference on Smart Objects and Technologies for Social Good, edited by Armir Bujari, Pietro Manzoni, Anna Forster, Edjair Mota, and Ombretta Gaggi. New York: Association for Computing Machinery, 49–54.

Giunchiglia, Fausto, Biswanath Dutta, and Vincenzo Maltese. 2014. “From Knowledge Organization to Knowledge Representation.” Knowledge Organization 41: 44–56.

Groza, Adrian, and Daniel Toniuc. 2017. “Explaining Climate Change with a Chatbot Based on Textual Entailment and Ontologies.” In IEEE 13th International Conference on Intelligent Computer Communication and Processing. Cluj-Napoca: IEEE.

Hill, Jennifer, W. Randolph Ford, and Ingrid G. Farreras. 2015. “Real Conversations with Artificial Intelligence: A Comparison Between Human-Human Online Conversations and Human-Chatbot Conversations.” Computers in Human Behavior 49: 245–250.

Hjørland, Birger. 2007. “Semantics and Knowledge Organization.” Annual Review of Information Science and Technology 41: 367–405.

Isaac, Antoine, and Thomas Baker. 2015. “Linked Data Practice at Different Levels of Semantic Precision: The Perspective of Libraries, Archives and Museums.” Bulletin of the American Society for Information Science and Technology 41, no. 4: 34–39.

Jones, Iolo, Daniel Cunliffe, and Douglas Tudhope. 2004. “Natural Language Processing and Knowledge Organisation Systems as an Aid to Retrieval.” In Knowledge Organization and the Global Information Society: Proceedings of the Eighth International ISKO Conference, 13–16 July 2004, London, UK, edited by Ia C. McIlwaine. Advances in Knowledge Organization 9. Würzburg: Ergon Verlag, 315.

Jurafsky, Dan. 2018. Conversational Agents AKA Dialog Agents.

Luger, Ewa, and Abigail Sellen. 2016. “‘Like Having a Really Bad PA’: The Gulf Between User Expectation and Experience of Conversational Agents.” In CHI ’16: Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, edited by Jofish Kaye, Allison Druin, Cliff Lampe, Dan Morris, and Juan Pablo Hourcade. New York: Association for Computing Machinery, 5286–5297.

Maroengsit, Wari, Thanarath Piyakulpinyo, Korawat Phonyiam, Suporn Pongnumkul, Pimwadee Chaovalit, and Thanaruk Theeramunkong. 2019. “A Survey on Evaluation Methods for Chatbots.” In ICIET 2019: Proceedings of the 2019 7th International Conference on Information and Education Technology. New York: Association for Computing Machinery, 111–119.
Roca, Surya, Jorge Sancho, José García, and Álvaro Alesanco. 2019. “Microservice Chatbot Architecture for Chronic Patient Support.” Journal of Biomedical Informatics 102: 103305.

Schlangen, David. 2004. “Causes and Strategies for Requesting Clarification in Dialogue.” In Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004. Cambridge, Massachusetts: Association for Computational Linguistics, 136–143.

Soergel, Dagobert. 2001. “Evaluation of Knowledge Organization Systems (KOS): Characteristics for Describing and Evaluating KOS.” In JCDL 2001 NKOS Workshop, Roanoke, VA, 5.

Souza, Renato Rocha, Douglas Tudhope, and Mauricio B. Almeida. 2012. “Towards a Taxonomy of KOS: Dimensions for Classifying Knowledge Organization Systems.” Knowledge Organization 39: 179–192.

Tennis, Joseph. 2013. “Metaphors of Time and Installed Knowledge Organization Systems: Ouroboros, Architectonics, or Lachesis?” Information Research 18, no. 3: 1–8.

Vtyurina, Alexandra, Denis Savenkov, Eugene Agichtein, and Charles L. A. Clarke. 2017. “Exploring Conversational Search With Humans, Assistants, and Wizards.” In CHI EA ’17: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems, edited by Gloria Mark, Susan Fussell, Cliff Lampe, M.C. Schraefel, Juan Pablo Hourcade, Caroline Appert, and Daniel Wigdor. New York: Association for Computing Machinery, 2187–2193.

Claire McDonald, University of Washington, Seattle, WA, United States

Call Us by Our Name(s): Shifting Representations of the Transgender Community in Classificatory Practice

Abstract: The linguistic shifts in the terms used for self-representation and collective representation by the transgender community are a compelling illustration of the shortcomings of subject-based classification schemes.
These shortcomings become readily apparent in digital environments that enable user-generated tagging of content; on these platforms, members of the transgender community are able to maintain their preferred self-referential terms to establish the subject(s) of their work. Using creator-generated tags as a source of contemporary terminology for subject-based classification schemes, I will build upon previous work by surveying self-referential terminology generated by creators of original works about the transgender community. By comparing the terminology used in hashtags on Instagram posts centered on the transgender community with the terminology in the Library of Congress’ subclass heading for HQ77.7–HQ77.95 and in the Library of Congress’ subject headings relating to gender variance, I will examine the shifts in rhetorical representation of materials focused on the transgender community over time, using this as justification for adding documentation of global changes made to classification schemes and indexing languages to online catalog records.

1.0 Introduction

The Library of Congress Classification (LCC) is the bibliographic classification scheme developed by the Library of Congress. Its primary sites of implementation are libraries within the United States, though it is also used in numerous libraries outside the United States (Olson 2000, 54). The flaws built into LCC have been well documented throughout the 20th century, particularly as they relate to the scheme’s representation of marginalized communities (Adler 2017, 9–10, 44; Fox 2016, 375; Olson 2000, 54; Olson 2001, 541). The transgender community is one of many groups that have been represented within LCC by terminology that the community perceives as outdated and pejorative (Roberto 2011, 57).
LCC subclass HQ77.7–HQ77.95 is where materials about the transgender community are classified, under the term “transsexualism,” which many transgender individuals consider outdated and often offensive. The terms used to name LCC classes and subclasses are meant to reflect the language used by relevant discourse communities; however, despite terminological shifts within queer studies and other fields, catalyzed by the transgender community’s increased political and cultural visibility, “transsexualism” continues to be used within LCC. I argue that the HQ77.7–HQ77.95 subclass heading no longer reflects the terminology used by transgender individuals in acts of self-representation, using hashtags on Instagram as a way to measure the prevalence of particular terms within the LGBTQ community when creating and sharing content by and about transgender individuals. I also argue for the addition of version control to public-facing catalog records as a means of documenting large-scale edits made to knowledge organization infrastructures, so as to increase transparency and facilitate institutional accountability in classificatory and cataloguing practices.

2.0 Literature Review

When creating a bibliographic classification scheme, particularly one with a hierarchical structure, one must establish both the relations between concepts (Tennis 2012, 1351) and the terminology used to name concepts and subjects. Language is understood to be always imperfect and constantly evolving, and this is as true in classificatory practice as in other contexts (Ranganathan 1937, 62; Tennis 2016, 574). Despite this acknowledged fluidity, which is a natural result of knowledge production, authors of classification schemes are put in a position where they must choose stable terms according to common usage (Adler 2017, 34; Roberto 2016, 63).
In implementing a classification scheme, there must also be a way for terms to be systematically updated to reflect changes in common usage (Fox 2016, 380; Olson 2001, 660–661; Ranganathan 1937, 64). Changes to classification schemes and indexing languages are unavoidable as knowledge is constantly produced, and these changes may be based upon literary warrant and/or ethical reasons (Tennis 2013, 2). This process of addition and revision broadly prevents the catalog from becoming obsolete (Ranganathan 1937, 66). Further, periodic revisions to classification schemes and indexing languages can and should prevent the continued use of terminology that is harmful to marginalized communities (Roberto 2016, 63). Literary warrant is a key concept in considering how terms are chosen in classification schemes: it is the idea that relevant literature should be surveyed, with attention to the language used to describe phenomena within a given knowledge domain, so that the terminology reflects the established discourse within that domain (Johnson 2010, 662). Literary warrant may be a useful way of choosing a particular term, as it is intended to enforce the language used within a community of experts (Adler 2009, 313; 2017, 20; Olson 2000, 56); in practice, however, the way it is applied often undermines its conceptual usefulness, as inconsistent application may also justify the use of outdated or derogatory terminology (Olson 2000, 57–58, 65). The HQ section of the Library of Congress Classification has been subjected to scrutiny by a number of scholars and researchers within the field of knowledge organization (Adler 2015, 489; Johnson 2010, 663). Materials about the LGBTQ community, as well as about sexual deviations, sadism, masochism, and fetishism, are catalogued in the HQ70s, with materials about “transsexualism” residing in HQ77.7–77.95 (Library of Congress, n.d.).
According to the application of literary warrant within this subclass of LCC, “transsexualism” is a valid term based on early 20th-century medical and psychiatric literature, in which the term appears frequently in reference to gender-variant individuals (Adler 2009, 668; 2017, 36; Johnson 2010, 668). However, contemporary discourse communities use other terminology to describe individuals who do not identify and/or present as their birth-assigned gender; most notably, these include the transgender community itself, which is shaping the ways in which gender variance is named by creating and disseminating works in both analog and digital media. ‘Transgender’ as a broad term for individuals who transgress traditionally codified notions of gender may be attributed to Leslie Feinberg, a prominent transgender author and activist (Adler 2009, 318–319); the term was initially introduced by Virginia Prince in the 1980s to describe individuals who do not present as their birth-assigned gender yet do not seek to medically transition (Johnson 2010, 666). Today, “transgender” is preferred to “transsexual” by most members of the transgender community; this preference is validated in common speech as well as in textual works across many disciplines and by authoritative bodies. While “transgender people” was introduced into the Library of Congress Subject Headings (LCSH) in 2007 (Library of Congress 2019a), “transsexualism” remains the term used in LCC to describe this community.

3.0 Method

Past work focusing on the terminology used to describe materials about the transgender community has engaged with the terminological disparity between LCC/LCSH and discursive practices within the LGBTQ community.
Melissa Adler’s study of the disparities between LCC/LCSH and user-generated tags on book-recommendation platforms is a notable example of research on how the transgender community is named in different formal and informal environments; the users of these platforms are readers and reviewers of the books, which they tag to facilitate retrieval (Adler 2009, 320). Adler’s study shows the disparities between the terminology used in LCC/LCSH and that used by readers who engage with materials by and about the transgender community. To build upon Adler’s work, I have focused on the terminology used by the creators of materials about the transgender community on Instagram. As a social media platform, Instagram is used primarily for disseminating images or short-form text. To facilitate content retrieval, creators affix hashtags to their posts; no controlled vocabulary is imposed on hashtags, so creators may use whatever terms they wish, but they also have an interest in applying common-usage terms if they intend their posts to be seen by a wide range of people. Instagram was chosen for this work because of the importance of social media platforms and other digital environments to the LGBTQ community, which relies on them for sharing not only creative works but also information on legislation, politics, and medical care. Instagram enables a user to search for a term as a hashtag; such a search returns all of the posts bearing that hashtag, as well as the total number of posts on the platform that bear it. Beginning on October 8, 2019, eleven terms were entered as hashtag searches. These eleven terms were chosen based on the terminology used in LCC/LCSH, as well as on the derivations of Library of Congress terminology that are used on this platform.
“Transsexualism” is the term used to name HQ77.7–77.95, and “transgenderpeople” and “transsexuals” are the hashtag forms of terms used in LCSH (Library of Congress n.d.; 2017; 2019a). “Transgender” and “trans” were chosen based on common usage among the transgender community in published works as well as in spoken discourse; “transsexual” was chosen to maintain continuity in the forms of the studied terms. To account for terminological disparity within LCSH, which has subject headings for “female-to-male transsexuals,” “male-to-female transsexuals,” “transgender men,” and “transgender women” (Library of Congress 2018a; 2018b; 2019b; 2019c), qualifications regarding the gender binary were affixed to “trans,” “transgender,” and “transsexual” as search terms. The number of posts returned for each hashtag on October 8, 2019, was recorded as its initial quantity; the same eleven terms were searched as hashtags again on February 5, 2020, so as to assess the relative frequency of use of each term.

4.0 Findings

At the beginning of data collection, the number of posts bearing “trans” or “transgender” as hashtags was at least double the number bearing “transsexual” or some derivation thereof; this was the case regardless of whether or not these terms were qualified by some directional acknowledgement of the gender binary. Further, posts were tagged with “trans” or “transgender” at a higher rate than with “transsexual” over the study period. This held not only for these terms in their singular and plural forms, but also for derivations that included directional acknowledgement of the gender binary.

| Searched term | Number of posts, Oct. 8, 2019 | Number of posts, Feb. 5, 2020 |
| --- | --- | --- |
| femaletomaletrans | 5,442 | 5,788 |
| femaletomaletransgender | 18,159 | 19,043 |
| femaletomaletranssexual | 2 | 2 |
| maletofemaletrans | 3,332 | 3,565 |
| maletofemaletransgender | 6,810 | 7,849 |
| maletofemaletranssexual | 0 | 0 |
| trans | 7,053,350 | 7,441,758 |
| transgender | 8,965,528 | 9,315,772 |
| transgenderpeople | 1,975 | 1,999 |
| transsexual | 593,165 | 605,412 |
| transsexualism | 367 | 372 |
| transsexuals | 28,942 | 30,267 |

Figure 1: Eleven terms searched as hashtags on Instagram, accompanied by each term’s respective quantities of tagged posts.

It is particularly notable that, over the course of data collection, the only terms that showed no increase in use as hashtags were “femaletomaletranssexual” and “maletofemaletranssexual,” and that only five additional posts were tagged with “transsexualism” over the seventeen weeks (from 367 to 372). Since these terms were chosen because of their presence in LCSH as subject heading terms and in LCC as a subclass heading, this is particularly noteworthy.

5.0 Discussion

As the collected data show, “transsexualism” and its derivations are used far less frequently than “trans” or “transgender” to signify posts about gender variance on a popular social media platform for purposes of retrieval by platform users. Treating such self-representative discursive practices in digital environments as an extension of literary warrant calls into question the continued use of “transsexualism” to name HQ77.7–77.95. A bibliographic classification scheme’s implementation necessarily affects the ways in which patrons access and interpret materials (Adler 2017, 5; Tennis 2012, 1355). By maintaining “transsexualism” as a subclass heading, despite the ongoing work of the transgender community, the broader LGBTQ community, and activist librarians to shift LCC/LCSH terminology, these organizational infrastructures are enabling a particular interpretation of works about the transgender community (Tennis 2012, 1355).
This interpretation prioritizes early 20th-century medical and psychiatric discourse over the transgender community’s self-representational discourse, with the effect of marginalizing the latter (Adler 2017, 34; Olson 2000, 55, 68–69). Further, maintaining an outdated, non-preferred term as a subclass heading may be detrimental to patrons who use preferred terminology when attempting to access materials about this community (Roberto 2011, 63). Given the discursive practices of the transgender community, the Library of Congress must address its reliance on what the majority of the described community perceives to be an inaccurate term.

The relative speed of the terminological shift from ‘transsexual’ to ‘transgender’ as a preferred term for the transgender community is made manifest in LCC, in which a once-common term’s presence is now perceived as a problem. This is a particularly interesting example of term shift, as ‘transsexual’ and ‘transsexualism’ have a history of being used not only by doctors, psychiatrists, and researchers, but also self-referentially by many gender-variant individuals in printed works and spoken language alike. Consider the pace at which the transgender community began to adopt the term ‘transgender’ self-referentially; the fact that ‘transsexual’ is not universally perceived as pejorative by gender-variant individuals, since the collected data show that it is still in use (albeit less frequently than “trans” or “transgender”); and the increasing use within the community of “trans”, a truncation of both transsexual and transgender. Together, these suggest that the community may continue to engage in ongoing self-referential terminological shift. LCC’s inability to adequately express this shift can, and should, be understood as a vestige of antiquated notions of gender and gender variance (Roberto 2011, 57).
This case can also be understood as an example of the problems that may be caused by a universal classification scheme so firmly fixed that it cannot accommodate new knowledge or conceptual shifts (Olson 2000, 57; Tennis 2016, 574). The findings of this study are consistent with those of other knowledge organization researchers who have examined terminological stagnancy and the presence of contentious terms in LCC/LCSH (Drabinski 2013, 98; Fox 2016, 375; Olson 2000, 74; 2001, 647), particularly with regard to terminology used to describe the LGBTQ community (Adler 2015, 491; 2017, 324; Roberto 2011, 63). Research concerning the continued use of obsolete and demeaning terminology within LCC and other classification schemes must continue, as researchers’ findings often bolster the lived experiences of individuals outside academia and may act as justification for broader structural changes to LCC and other classification schemes. Such research must not, however, focus solely on justifying updates to specific terminology. While changing a given term may be a welcome short-term solution, updating terms on a one-to-one basis will not solve the host of broader concerns that this study illuminates.

6.0 Reducing Opacity in LCC/LCSH

At present, modifications made to catalog records that reflect global changes to LCC/LCSH are not made visible to patrons, who see only the finalized catalog record. A valuable addition to public-facing library catalog records would be documentation of global changes to LCC/LCSH as they affect these records. Rather than changing terminology and erasing all traces of former terms, it would be useful to maintain a visible history of these changes on the records themselves, so as to convey the history of modifications made to the classification scheme and indexing language.
The addition of version control would also be consistent with LCSH itself, which includes the edit history of subject headings in its entries. LCC and LCSH are information infrastructures fixed in the contexts in which they were created and modified; they also represent this institution’s perspective on concepts, topics, and subjects at a given point in time. This is unavoidable, as cataloguers can never be truly neutral (Drabinski 2013, 95; Olson 2000, 64; Roberto 2011, 62), nor can they predict the future directions of knowledge production, self-representational practices, and so on. Version-controlling public-facing catalog records would help patrons navigate the retrieval problems caused by terminological shift, as records would convey both outdated and updated terminology while prioritizing the latter (Tennis 2016, 574). Identities and the terms used to express them are multifaceted and constantly changing, at both individual and collective levels, and knowledge is constantly being produced; yet LCC/LCSH maintain their fixity without acknowledging the temporal nature of the materials over which they exert control (Tennis 2012; Olson 2000, 57; Roberto 2011, 60). If the ways in which these changes affect classification schemes and indexing languages were documented and made accessible, users could still rely upon familiar (if outdated) terminology while being encouraged to adopt more current terminology. When considering how to reduce the harm perpetuated by LCC/LCSH’s antiquated notions of gender identity and sexual orientation, Emily Drabinski advocates maintaining these infrastructures while increasing interactions in which librarians use them to teach patrons about institutional control over non-normative ways of being (Drabinski 2013, 108–109).
This suggestion assumes that patrons will notice the workings of LCC/LCSH and ask librarians about them, and that librarians will be equipped to answer their questions, neither of which can be guaranteed at scale; more importantly, it enables the continued use of infrastructures that have been shown to be harmful in numerous respects. If large-scale changes to LCC/LCSH were documented in item records, Drabinski’s idea of using LCC/LCSH as tools for studying institutional representations of identities and communities would still be possible, but in a way that also encourages large-scale modification of these infrastructures. These traces of oppressive and harmful practices could serve as a basis for instructional conversations about how institutions exert power in what are currently ‘invisible’ ways, without maintaining these infrastructures in ways that perpetuate harm. Numerous scholars have called for greater transparency in acknowledging terminological shifts and other changes made to classification schemes and indexing languages (Olson 2000, 68; Tennis 2013, 6; 2012, 1352; Roberto 2011, 63). Adding public-facing version control to item records in a library catalog would, quite literally, convey these global changes to users. The assumption here, though, is that there will be global changes to document; put another way, the usefulness of version control in catalog item records is contingent on the Library of Congress’ willingness to make global changes justified by literary warrant and/or demanded by the communities being named. At present, LCC/LCSH exert control over how users seek and retrieve materials, but in a way that is largely invisible to the average user (Drabinski 2013, 97; Olson 2000, 66).
Greater transparency about how these infrastructures exert that control, in ways both beneficial and detrimental, would help users better understand how these infrastructures shape their efforts to find information. Adding this documentation to public-facing catalog records would thus be a mechanism for institutional accountability, as version-controlled records would reveal reparative and harmful practices alike. Researchers and librarians have widely criticized LCC/LCSH over the course of the 20th century for their damaging treatment of the LGBTQ community and other already-marginalized communities; transparency about the continuation of harmful practices may encourage the institution to actually implement the reparative changes that have been repeatedly demanded. That said, some changes made to LCC/LCSH do reduce the harm caused by problematic terminology and other practices (Olson 2000, 60), and version-controlled item records would convey these positive changes to users as well. The well-intentioned goal of LCC/LCSH is to facilitate browsing and retrieval of relevant materials, but the frequency with which these infrastructures fall short of this goal necessitates a closer look at the tools used for this purpose (Tennis 2016, 578). These tools include not only the terms used to name classes and subjects but also the catalogs themselves. Future research may examine the logistics of exploiting the capabilities of digital environments to add version control to library catalog records on a global scale, so as to reduce the burden that additional documentation could place upon cataloguers, librarians, and other information professionals.
Adding version control to public-facing catalog item records would surely benefit the transgender community and other communities poorly represented by LCC/LCSH, but it would also benefit any subject represented within a universal classification scheme: knowledge is constantly changing within disciplines, and any work that increases a classification scheme’s flexibility and ability to accommodate these changes is likely to benefit users across disciplines. That said, version control in catalog records will not, by itself, solve problems such as terminological stagnancy and the continued use of derogatory language. This documentation will convey the institution’s shifts in perspective on the bibliographic universe and enable transparency and accountability at an institutional level; it will support users within and outside academia who use these catalogs, and will, it is hoped, facilitate future work that substantially reduces the perpetuation of oppressive practices in information infrastructures.
7.0 Conclusion
The evolution of the self-representative terminology used by the transgender community, visible not only in published works but also in discursive practices in digital environments, has not been adequately reflected in the Library of Congress Classification, which still uses the outdated term ‘transsexualism’ to name the subclass in which works about this community are classified; this community is only slightly better represented in Library of Congress Subject Headings. This disparity in terminology reflects previously voiced concerns about the continued use of outdated, pejorative terminology in LCC/LCSH; it also reveals the broader shortcomings of knowledge organization infrastructures so fixed that they cannot accommodate terminological shift arising from the development of new knowledge.
Adding to catalog records public-facing documentation of global changes to these infrastructures may ameliorate these concerns, as this documentation would convey changes in the representation of subjects; it would also act as a mechanism for the institution’s accountability to its patrons, particularly those who have called for changes to be made to these infrastructures.
References
Adler, Melissa. 2017. Cruising the Library: Perversities in the Organization of Knowledge. First edition. New York: Fordham University Press.
Adler, Melissa. 2015. “‘Let’s Not Homosexualize the Library Stacks’: Liberating Gays in the Library Catalog.” Journal of the History of Sexuality 24, no. 3: 478–507. doi:10.7560/JHS24306.
Adler, Melissa. 2009. “Transcending Library Catalogs: A Comparative Study of Controlled Terms in Library of Congress Subject Headings and User-Generated Tags in LibraryThing for Transgender Books.” Journal of Web Librarianship 3, no. 4: 309–331. doi:10.1080/19322900903341099.
Drabinski, Emily. 2013. “Queering the Catalog: Queer Theory and the Politics of Correction.” The Library Quarterly 83, no. 2: 94–111. doi:10.1086/669547.
Fox, Melodie J. 2016. “‘Priorities of Arrangement’ or a ‘Hierarchy of Oppressions?’: Perspectives on Intersectionality in Knowledge Organization.” Knowledge Organization 43: 373–383.
Johnson, Matt. 2010. “Transgender Subject Access: History and Current Practice.” Cataloging & Classification Quarterly 48, no. 8: 661–683. doi:10.1080/01639370903534398.
Library of Congress. n.d. “Class H—Social Sciences.” Library of Congress Classification Outline.
Library of Congress. 2007. “Transsexuals.” LC Subject Headings (LCSH). Revised 27 Jun. 2007.
Library of Congress. 2018a. “Female-to-male Transsexuals.” LC Subject Headings (LCSH). Revised 11 Dec. 2018.
Library of Congress. 2018b. “Transgender Men.” LC Subject Headings (LCSH). Revised 11 Dec. 2018.
Library of Congress. 2019a. “Transgender People.” LC Subject Headings (LCSH). Revised 24 Jan. 2019.
Library of Congress. 2019b. “Male-to-female Transsexuals.” LC Subject Headings (LCSH). Revised 24 Jan. 2019.
Library of Congress. 2019c. “Transgender Women.” LC Subject Headings (LCSH). Revised 24 Jan. 2019.
Olson, Hope A. 2000. “Difference, Culture and Change: The Untapped Potential of LCSH.” Cataloging & Classification Quarterly 29, nos. 1–2: 53–71. doi:10.1300/J104v29n01_04.
Olson, Hope A. 2001. “The Power to Name: Representation in Library Catalogs.” Signs 26, no. 3: 639–668. doi:10.1086/495624.
Ranganathan, S.R. 1937. Prolegomena to Library Classification. London: Edward Goldston, Ltd.
Roberto, K.R. 2011. “Inflexible Bodies: Metadata for Transgender Identities.” Journal of Information Ethics 20, no. 2: 56–64. doi:10.3172/JIE.20.2.56.
Tennis, Joseph T. 2013. “Metaphors of Time and Installed Knowledge Organization Systems: Ouroboros, Architectonics, or Lachesis?” Information Research 18, no. 3: C38.
Tennis, Joseph T. 2016. “Methodological Challenges in Scheme Versioning and Subject Ontogeny Research.” Knowledge Organization 43: 573–80. doi:10.5771/0943-7444-2016-8-573.
Tennis, Joseph T. 2012. “The Strange Case of Eugenics: A Subject’s Ontogeny in a Long-Lived Classification Scheme and the Question of Collocative Integrity.” Journal of the American Society for Information Science and Technology 63, no. 7: 1350–59. doi:10.1002/asi.22686.
Ådne Meling – Western Norway University of Applied Sciences, Department of Pedagogy, Religion and Social Studies, Norway
A Critique of the Use and Abuse of Typologies in Cultural Policy Analysis
Abstract: Typologies used explicitly or implicitly in cultural policy analysis (CPA) do not comport well with basic principles of typology. The research area of CPA therefore suffers from a lack of conceptual rigour.
In order to explain why CPA finds itself in this unfortunate state, this article examines three specific and relevant typologies, and concludes that CPA would benefit from applying principles of knowledge organisation more actively.
1.0 Introduction
The typologies examined in this article provide the framework for a significant proportion of the research conducted within the social science subdiscipline of cultural policy analysis (CPA).1 More precisely, these typologies are used in various analytical attempts to address fundamental CPA research questions such as: How does the public sector legitimise public funding of the arts? As a research area, CPA is predominantly empirical and critical, and to a lesser degree conceptual.2 There is thus potential for increasing the conceptual rigour of CPA by applying principles of knowledge organisation. As Hjørland (2017) has pointed out, one can make a long list of units that can be classified. Although objectives are not on Hjørland’s list of potential units of classification, we may add them to it. For example, researchers within education might find it valuable to classify educational objectives (Bloom 1956). Correspondingly, researchers within CPA classify cultural policy objectives and cultural policy rationales. Thus, CPA involves both the construction of typologies and the assignment of empirical observations to them. Researchers within CPA rarely develop their typologies themselves. More often, they apply existing typologies a priori, for example in addressing questions about the ways in which authorities legitimise public funding of the arts. A priori typologies are applied in analyses of empirical material such as policy documents and interview transcripts. However, the typologies applied a priori to answer these kinds of research questions are rarely scrutinised explicitly by the researchers who use them.
This lack of explicit scrutiny might be due to the tendency within the social sciences to take typologies for granted. As Bailey (1994, 83) has remarked, “because classification is so ubiquitous, it is relatively easy to overlook it”. The methodological apparatus of CPA is derived from several social science disciplines, most notably economics and sociology. These two disciplines differ considerably in their approaches to concepts. The typologies that they offer will be discussed here with regard to two basic requirements of typologies: that they should consist of collectively exhaustive and mutually exclusive types (Marradi 1990), and that they should be practically useful.
1 In the relevant literature, “cultural policy studies” is a more common term than “cultural policy analysis”. However, the term “studies” in “cultural policy studies” often signals that the publication is intended to be critical, as in critical theory (e.g. McGuigan 2004). “Cultural policy analysis” is a more general term and includes analyses of cultural policy informed both by critical theory and by other theoretical perspectives. In this article, which considers cultural policy analysis broadly understood, the term “cultural policy analysis” is therefore more accurate and useful than “cultural policy studies”. However, the abbreviation CPA (cultural policy analysis) is neither established nor commonly used; it is used in this article for the sake of brevity.
2 This can be observed in prominent CPA journals, such as the International Journal of Cultural Policy, which only rarely publishes articles with the stated aim of scrutinising analytical frameworks.
2.0 Typologies applied in CPA
We will consider three typologies that are frequently applied by researchers within CPA. Owing to their prominence within CPA, the conceptual rigour of these typologies is important for the general rigour of the field.
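The two formal requirements discussed above can be stated operationally: every observation should fit at least one type (collective exhaustiveness), and no observation should fit more than one (mutual exclusiveness). The Python sketch below is a hypothetical illustration, not a procedure taken from Marradi (1990) or from any CPA study; it models each type as a predicate over coded observations, and the labels and coding rules in the example are invented.

```python
from typing import Callable, Dict, List

# A typology maps each type label to a rule deciding whether an observation fits it.
Typology = Dict[str, Callable[[str], bool]]


def check_typology(typology: Typology, observations: List[str]) -> Dict[str, List[str]]:
    """Report observations that break exhaustiveness or exclusiveness."""
    not_covered: List[str] = []
    multiple_types: List[str] = []
    for obs in observations:
        fitting = [label for label, fits in typology.items() if fits(obs)]
        if not fitting:
            not_covered.append(obs)  # fits no type: typology is not exhaustive
        elif len(fitting) > 1:
            multiple_types.append(obs)  # fits several types: types overlap
    return {"not_covered": not_covered, "multiple_types": multiple_types}


if __name__ == "__main__":
    # Invented coding rules for illustration only.
    typology: Typology = {
        "egalitarian": lambda o: o == "access for low-income audiences",
        "public goods": lambda o: o in {"free outdoor concerts", "social cohesion"},
        "externalities": lambda o: o == "social cohesion",
    }
    observations = ["access for low-income audiences", "national pride", "social cohesion"]
    print(check_typology(typology, observations))
    # {'not_covered': ['national pride'], 'multiple_types': ['social cohesion']}
```

With these invented rules, the sketch flags one argument that fits no type and one that fits two. The difficult part in practice is writing the coding rules themselves, which is where the second requirement, practical usefulness of the labels, comes into play.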
2.1 Welfare economics
A seminal work within CPA appeared in 1966, when Baumol and Bowen (1966) provided an analysis of the economic challenges of the performing arts. The authors stated that the purpose of their study was to “be able to specify objectively the alternatives facing the arts and to describe their costs and the burdens they require society to shoulder” (Baumol and Bowen 1966, 4). This publication marked the founding of cultural economics as a research area. Accordingly, Baumol and Bowen’s book is a starting point for many researchers who wish to conduct CPA from a welfare economics perspective. Their goal was to “describe the logic on which such a decision, one way or the other [public funding or no public funding of performing arts], should be based if it is to satisfy the criterion of rationality” (Baumol and Bowen 1966, 378). The logic that they referred to was the logic of welfare economics. Based on this logic, they presented a typology of three possible arguments for public support of the arts:
1. Egalitarian grounds (“public funds devoted to the opening of opportunities to the impecunious”; see Baumol and Bowen 1966, 379).
2. The education of minors (“The arts must be made available early, while tastes are still being formed and behavior patterns developed”; see Baumol and Bowen 1966, 380).
3. The public goods argument of non-excludable and non-rivalrous goods (government subsidies of the production of a good are warranted if it is impossible to exclude consumers from using the good and additional consumers do not reduce the quality of consumption for other users; see Baumol and Bowen 1966, 381).
Now, how well does this typology serve as an a priori tool for analysing and comparing the ways in which governments legitimise public support of the arts? The short answer is: not very well.
An immediate problem with the “logic” of this typology is that one can easily suggest additional political arguments for public support of the arts that are plausible and yet not covered by the typology. In other words, the types in the typology are not collectively exhaustive. For example, several authors have pointed out that, historically, public support of the arts has been significantly legitimised through the argument that excellent arts performances increase national pride (e.g. Bennett 1995; Duelund 2009). Some Danes, for example, might feel increased national pride after witnessing an excellent performance by a Danish orchestra of a symphony written by a Danish composer. But how does that argument fit into Baumol and Bowen’s (1966) overview? The “national pride” argument cannot reasonably be claimed to be covered by the typology. Another argument that is not covered is the frequent claim that governments can increase the economic output of a nation or region by “investing” in the arts for the sake of economic impact (e.g. Myerscough 1988). Contemporary cultural economics has developed somewhat beyond the overview in Baumol and Bowen’s seminal book. For example, Fullerton (1991) lists rationales for public funding that relate to concepts such as redistribution, merit goods, public goods and externalities. Publications within contemporary welfare economics (e.g. Stiglitz and Rosengard 2015) frequently assume, albeit implicitly, that these kinds of legitimation are collectively exhaustive and mutually exclusive. In that regard, contemporary welfare economics offers a more developed a priori tool for CPA than Baumol and Bowen’s (1966) typology.
Figure 1: Baumol and Bowen's typology of public funding rationales for the performing arts
But, while the typology of public funding rationales provided by recent welfare economics might be satisfactory with regard to collective exhaustiveness and mutual exclusiveness, welfare economics is still not a practically useful tool in policy analysis. The typologies of legitimations for public funding that are provided by welfare economics are rarely applied as a priori analytical tools in policy analysis. The reason is probably a lack of phenomenological relevance. For example, considering Fullerton’s list referred to above, an analyst using welfare economics as an a priori tool might ask herself: Is the social cohesion that a hypothetical government claims to provide through public funding of the arts an example of a “public goods” argument or of an “externalities” argument? The difficulty in answering this is that the technical and abstract labels attached to the different types of welfare economics arguments for public funding of the arts significantly reduce their practical applicability within CPA. This lack of practical applicability arises because ministries of culture, along with other funding bodies, rarely express themselves in ways that can easily be interpreted through the lenses of economic concepts such as “public goods” and “externalities”. In conclusion, when it comes to providing CPA with an a priori tool, the major strength of economics as a discipline seems to be that over the decades it has developed Baumol and Bowen’s typology into one consisting of mutually exclusive and collectively exhaustive categories of legitimation (e.g. Stiglitz and Rosengard 2015). However, a major weakness of welfare economics, from the perspective of CPA, is that the labels it attaches to different types of legitimation of public funding of the arts are too technical and abstract.
The labels lack ‘thickness’ (Geertz 2017). Hence, it is difficult for CPA researchers to assign empirical observations to the types of legitimation that welfare economics provides.
2.2 The intrinsic–instrumental dichotomy
A second typology that has been extensively applied within CPA is the intrinsic–instrumental dichotomy (e.g. O-Kyung 2010). Although this is not always explicitly stated, the dichotomy derives from classical sociological theory, most prominently Weber’s typology of social action, which consists of instrumentally rational, value-rational, affectual and traditional action (Weber 1978). Researchers within CPA have frequently made use of the first two of these types, distinguishing between instrumental value and intrinsic value as motivations behind cultural political argumentation. The dichotomy is frequently used as an a priori analytical tool in CPA, but it is also contested within CPA, for two reasons. First, within CPA publications, the notion of instrumentality has de facto become inextricably associated with value-laden CPA, in that the term “instrumental policies” has become synonymous with “undesirable policies” (e.g. Belfiore 2015). This means that it has become difficult to use the term “instrumental” in a value-free manner (see Weber 1949). Second, the intrinsic–instrumental dichotomy is conceptually contested, in the sense that researchers within CPA disagree on whether it is at all plausible, in any context, for a political body to claim that a policy is motivated by the intrinsic value of art. Some researchers have concluded that it is logically fallacious to invoke the proclaimed intrinsic value of art as an argument for public funding (e.g. Bakhshi et al. 2009; Culyer 1973; Vestheim 2007).
Now, let us assume that a researcher would like to conduct a value-free form of CPA, and to ask a research question based on the intrinsic–instrumental dichotomy, such as: To what extent is contemporary cultural policy (for example, in Denmark) underpinned by the assumption that cultural policy is instrumental; and, by contrast, to what degree is contemporary cultural policy underpinned by the idea that the arts have intrinsic value? Going further, the CPA researcher will now have to answer a number of non-trivial questions: What does the researcher understand by the term “intrinsic value”? What does the researcher believe that the authorities mean when they convey their intentions to legitimise public funding of art through the intrinsic-value argument? Does the researcher assume that the term “intrinsic”, whether this is communicated literally or intentionally, refers to a value that is independent of the existence of human beings? Does the notion of the “intrinsic value” of art invoke religion or spirituality? In other words, the intrinsic-value concept elicits a plethora of follow-up questions. This is because the concept seems to be notoriously elusive, in that researchers who rely on its relevance rarely provide an intensional definition. What seems to be the case is that the concept is either defined in the negative, as the opposite of instrumental value, or through a form of “family resemblance” (Wittgenstein 1958) or extension (Marradi 2012). Researchers seem to assume that readers will know what intrinsic value is all about, although the readers themselves cannot be assumed to be able to define it by its intension. For example, one researcher concludes that the arts are “intrinsically valuable to society” (Belfiore 2004, 200, emphasis in original), without defining intrinsic value. 
Without imputing too much to this quote, the assumption seems to be that readers do not need a definition of intrinsic value because they will know from practical experience what it is. But if researchers wish to analyse the degree to which cultural policy is instrumental (and, by contrast, the degree to which it is underpinned by notions of the intrinsic value of art), it is reasonable to expect a conceptual elucidation of intrinsic value. Hence, applications of the intrinsic–instrumental dichotomy within CPA are likely to provoke more questions than they answer.
Figure 2: Intrinsic value versus instrumental value in the legitimation of public funding of the arts
What this tells us is that if a researcher within CPA states, explicitly or implicitly, that he or she will apply the intrinsic–instrumental dichotomy as an a priori analytical tool, this decision requires at least two follow-up questions: (1) Is the dichotomy intended to convey that the publication is value-laden, or is it used in an analytical, value-free manner? (2) If the researcher’s aim is to analyse the intentions in the empirical material, how does the researcher define “intrinsic value” in terms of its intension? The problem with publications within CPA is that these questions are rarely asked, and thus rarely answered. Hence, the intrinsic–instrumental dichotomy is a rather confusing a priori tool that lacks conceptual rigour.
2.3 The orders of worth
The final a priori CPA tool that deserves scrutiny here is the so-called orders of worth framework developed by Boltanski and Thévenot (2006). This framework is much more recent than the two previously discussed tools, but its prevalence within CPA over the last decade or so has been considerable.
Boltanski and Thévenot state: “we have been able to observe the operation of six higher common principles to which, in France today, people resort to most often in order to finalise an agreement or pursue a contention” (Boltanski and Thévenot 2006, 71, emphasis in original). Subsequently, they describe how their observed “common principles” extend into various “political forms of worth” (Boltanski and Thévenot 2006, 83–124). In other words, the common principles are considered to apply to the political realm, not just to the social world in general. It is this application of the common principles to the political realm that has paved the way for the entry of the “orders of worth” framework into CPA as an a priori tool. From the perspective of CPA, this system has problems that have not been sufficiently acknowledged by researchers. According to Boltanski and Thévenot themselves, “our list of principles is not exhaustive; we can discern the shape of other polities that might be constructed” (Boltanski and Thévenot 2006, 71). In other words, if we apply the “orders of worth” framework as an a priori tool in the analysis, we cannot simply assume that all possible justifications of public funding of the arts are covered by Boltanski and Thévenot’s (2006) system. The system is not intended to be an exhaustive list of the various forms of legitimation used in relation to public policies.
Figure 3: The orders of worth and government funding of art
In addition, it seems clear that the types of justification identified in their book are not mutually exclusive. The authors describe six different polities, or “worlds”: the “inspired world”, the “domestic world”, the “world of fame”, the “civic world”, the “market world” and the “industrial world”. Nowhere, however, do Boltanski and Thévenot claim that these worlds are supposed to be mutually exclusive; on the contrary, it seems reasonable to conclude that they are not.
For example, nothing in Boltanski and Thévenot’s definitions suggests that agents cannot be associated with both the “world of inspiration” and the “industrial world” simultaneously. This also means that the different ways of justifying an action, such as public art funding, cannot easily be assigned to the typology of worlds from which these justifications emanate, because one specific form of justification might emanate from more than one world. Let us look at some studies that use the “orders of worth” system in an a priori manner. One study concludes that “the world of inspiration is the most important value regime in the creative industries” (Nijzink et al. 2017, 609). Presumably, then, artists and creative workers believe that the “world of inspiration” should inform the public authorities that fund the arts. But this is trivial. One does not need an advanced education in the social sciences to predict that, of Boltanski and Thévenot’s various “worlds”, artists will prefer the “world of inspiration”. In addition, Nijzink et al. seem to misunderstand Boltanski and Thévenot’s conceptual system by implicitly treating it as a typology of collectively exhaustive and mutually exclusive forms of legitimation. They simply rank the different worlds by their importance as cultural political legitimations for different stakeholders (Nijzink et al. 2017, 609), without much regard for the openness of the system that Boltanski and Thévenot emphasise. In another study, Lemasson concludes that “the coexistence of the inspired logic with the civic or the industrial ones” has been “difficult to achieve” in cultural policy in Quebec (Lemasson 2017, 81). But this is merely a way of concluding, in language coloured by Boltanski and Thévenot’s framework, that the arts sometimes need public support to survive.
Yet another study concludes that, of Boltanski and Thévenot’s orders of worth, the “civic world” dominated the legitimating rhetoric of cultural funding in Norway and Sweden at the beginning of the 21st century (Larsen 2016, 129). But does this mean that “the inspired world” has become less important? Where are the borders between the “inspired world” and the “civic world”? Has artistic excellence become less important, while the inclusion of different demographic groups has become more important? If so, what deeper insight do we gain by embedding these observations in Boltanski and Thévenot’s conceptual system? This remains entirely unclear. In summary, Boltanski and Thévenot’s orders of worth appears at first glance to be a productive a priori tool for analysis, owing to the authors’ use of the phrase “on justification”, for example in the title of their book, which might suggest that the system is tailor-made for analysing how public bodies justify taxpayer funding of their services. However, as an a priori tool in CPA, the system has two major weaknesses. First, it is not intended to consist of collectively exhaustive and mutually exclusive types of legitimation. Second, it is made up of labels that tend to obscure empirical findings more than they illuminate them. Thus, from the perspective of the CPA publications that make use of the orders of worth system, its main benefit is not conceptual rigour. The system has managed to “conquer the academic scene” in CPA (Mangset 2010, 48), but it seems reasonable to ask whether guru effects were involved in the conquest (Sperber 2010; Elster 2011).
3.0 Conclusions and implications
This article has questioned the ways in which typologies are used in an a priori fashion within a specific area of the social sciences, namely cultural policy analysis (CPA).
The aim has not been to provide an exhaustive presentation of the typologies applied by contemporary researchers, nor has it been to provide an exhaustive examination of the methodological problems associated with the typological systems examined. Instead, the goal has been to show that Bailey’s (1994) warning, namely that we should not take our typologies for granted, needs to be repeated continuously in the social sciences. The problem addressed in this article might stem from an excessively deep divide between theoretical and empirical investigations. It is important to avoid a situation where theory within the field is used as “sacred texts to be worshipped as totems” (Turner 1998, 245). The construction of typologies is ongoing work, and better typologies of cultural policy rationales can be developed through a collaborative effort by the research community as a whole. Analysts should therefore be encouraged to reflect on the typologies that they use, and to do so explicitly in the methodological sections of their publications. It might be tempting, especially for younger researchers, to adopt a ready-made typology, particularly if they observe that this typology has already reached paradigmatic status. But such status should not be allowed to deflect researchers within CPA from explicitly scrutinising their typologies.

References

Bailey, Kenneth D. 1994. Typologies and Taxonomies: An Introduction to Classification Techniques. Thousand Oaks, CA & London: SAGE.
Bakhshi, Hasan, Alan Freeman, and Graham Hitchen. 2009. Measuring Intrinsic Value: How To Stop Worrying and Love Economics. Munich: Munich Personal RePEc Archive.
Baumol, William J., and William G. Bowen. 1966. Performing Arts – The Economic Dilemma: A Study of Problems Common to Theater, Opera, Music, and Dance. Millwood, NY: Kraus Reprint Co.
Belfiore, Eleonora. 2004. “Auditing Culture: The Subsidised Cultural Sector in the New Public Management.” International Journal of Cultural Policy 10, no. 2: 183–202.
Belfiore, Eleonora. 2015. “‘Impact’, ‘Value’ and ‘Bad Economics’: Making Sense of the Problem of Value in the Arts and Humanities.” Arts and Humanities in Higher Education 14, no. 1: 95–110.
Bennett, Oliver. 1995. “Cultural Policy in the United Kingdom: Collapsing Rationales and the End of a Tradition.” International Journal of Cultural Policy 1, no. 2: 199–216.
Bloom, Benjamin S. 1956. Taxonomy of Educational Objectives: The Classification of Educational Goals. London: David McKay Co.
Boltanski, Luc, and Laurent Thévenot. 2006. On Justification: Economies of Worth. Princeton, NJ & Oxford: Princeton University Press.
Culyer, Anthony J. 1973. The Economics of Social Policy. London: Robertson.
Duelund, Peter. 2009. “Our Kindred Nations: On Public Sphere and the Paradigms of Nationalism in Nordic Cultural Policy.” In What About Cultural Policy? Interdisciplinary Perspectives on Culture and Politics, eds. M. Pyykkönen, N. Simanainen, and S. Sokka. Helsinki: Minerva, 117–39.
Elster, Jon. 2011. “Hard and Soft Obscurantism in the Humanities and Social Sciences.” Diogenes 58, nos. 1–2: 159–70.
Fullerton, Don. 1991. “On Justifications for Public Support of the Arts.” Journal of Cultural Economics 15, no. 2: 67–82.
Geertz, Clifford. 2017. The Interpretation of Cultures. New York: Basic Books.
Hjørland, Birger. 2017. “Classification.” Knowledge Organization 44: 97–128.
Larsen, Håkon. 2016. Performing Legitimacy. Cham: Springer.
Lemasson, Gaëlle. 2017. “On the Legitimacy of Cultural Policies: Analysing Québec’s Cultural Policy with the Economies of Worth.” International Journal of Cultural Policy 23, no. 1: 68–88.
Mangset, Per. 2010. “Ernst Kris and Otto Kurz, Legend, Myth, and Magic in the Image of the Artist: A Historical Experiment.” International Journal of Cultural Policy 16, no. 1: 48–49.
Marradi, Alberto. 1990. “Classification, Typology, Taxonomy.” Quality and Quantity 24: 129–57.
Marradi, Alberto. 2012. “The Concept of Concept: Concepts and Terms.” Knowledge Organization 39: 29–54.
McGuigan, Jim. 2004. Rethinking Cultural Policy. Maidenhead: Open University.
Myerscough, John. 1988. The Economic Importance of the Arts in Britain. London: Policy Studies Institute.
Nijzink, Douwe, Quirijn L. van den Hoogen, and Pascal Gielen. 2017. “The Creative Industries: Conflict or Collaboration? An Analysis of the Perspectives from which Policymakers, Art Organizations and Creative Organizations in the Creative Industries Are Acting.” International Journal of Cultural Policy 23, no. 5: 597–617.
O-Kyung, Yoon. 2010. Intrinsic and Instrumental Rationales in Contemporary UK Cultural Policy: Negotiating Cultural Values in the Climate of Neoliberalism. Ph.D. dissertation. Loughborough: Loughborough University.
Sperber, Dan. 2010. “The Guru Effect.” Review of Philosophy and Psychology 1, no. 4: 583–92.
Stiglitz, Joseph E., and Jay K. Rosengard. 2015. Economics of the Public Sector. 4th edition. New York: W. W. Norton & Co.
Turner, Jonathan. 1998. “Must Sociological Theory and Sociological Practice Be So Far Apart? A Polemical Answer.” Sociological Perspectives 41, no. 2: 243–58.
Vestheim, Geir. 2007. “Theoretical Reflections.” International Journal of Cultural Policy 13, no. 2: 217–36.
Weber, Max. 1949. On the Methodology of the Social Sciences. New York: The Free Press.
Weber, Max. 1978. Economy and Society: An Outline of Interpretive Sociology. Berkeley: University of California Press.
Wittgenstein, Ludwig. 1958. Philosophical Investigations. Oxford: Blackwell.
Juan Bernardo Montoya-Mogollón – São Paulo State University – UNESP, Brazil
Sonia Troitiño – São Paulo State University – UNESP, Brazil

Digital Forensics Science and Knowledge Organization: An Interdisciplinary Approach to Addressing the Conceptual Challenges of Born-Digital Records

Abstract: The digital age has drastically changed the way people think and act using technology. The advances of the 20th and 21st centuries have ushered in a number of changes in the treatment of information, and private and public organizations have faced increasing pressure to shift to digital formats. However, the speed of these changes has raised several conceptual concerns about the trustworthiness of digital records and systems and the security afforded by their infrastructure. This paper identifies and addresses several of these concerns through an interdisciplinary theoretical framework drawing on Archival Science, Diplomatics, and Digital Forensics. This alternative approach is presented as a contribution to Knowledge Organization (KO) that can help reframe the way scholars understand the following problems posed by digital records: 1) the challenge of maintaining digital records as authentic, accurate, and reliable sources of information; 2) the challenge of establishing trustworthy repositories for information storage and access; and 3) the challenge of ensuring the long-term maintenance and preservation of records. We argue that the concepts and techniques of Archival Diplomatics and Digital Forensics, such as the principles of provenance and original order and the chain of custody, have much to contribute to the professional study and practice of KO in digital environments, particularly where issues of authenticity are concerned.

1.0 Introduction

In Archival Science, several terms, such as data, document, and information, are currently used as synonyms for ‘records’ in bureaucratic and archival settings.
Some of these terms were defined by the International Research on Permanent Authentic Records project (InterPARES).1 We have adopted them because they can be used interoperably across Archival-Diplomatics, Forensics Science, and Knowledge Organization (KO). This terminology is needed in this article because several studies use these words interchangeably, which can create ambiguity for our research problem. We will look at a series of disciplines that use these concepts in their own fields, sometimes with different meanings. Data is the smallest meaningful piece of information. A document is recorded information. A record is any document created (i.e., made or received and set aside – i.e. kept, saved – for action or reference) by a physical or juridical person in the course of practical activity as an instrument and by-product of said activity (in some countries, a record is called archival document or simply document). Information is a message intended for communication across space and time (Duranti and Thibodeau 2006, 15).

1 This project began in 1999; it “aims at developing the knowledge essential to the long-term preservation of authentic records created and/or maintained in digital form and providing the basis for the standards, policies, strategies, plans of action capable of ensuring the longevity of such material and the ability of its users to trust its authenticity.” (

Consequently, we will be using the concept of born-digital records. The academic literature also uses the terms ‘electronic records’ and ‘electronic documents’, but there is a subtle difference: ‘electronic’ usually refers to electrical signals and is applied when we talk of hardware or technological devices (Prayudi and Azhari 2015, 1).
‘Digital’, by contrast, refers to the interaction of various hardware and software components to give rise to binary digits: “Hardware and low-level software detects the physical properties and interprets them as binary digits – i.e., digits that can take only one of two possible values – which are called bits. By convention, we say that the two possible binary values are 1 and 0” (Lee 2012a, 511). We will therefore use the term ‘born-digital’ to designate records created or produced exclusively in digital environments, rather than as part of a digitization process. Even today, almost forty years after digital records were introduced in business systems, it remains problematic to regard born-digital records as legally reliable. Archival Science is concerned with studying and proposing measures for their maintenance and long-term preservation. Maintenance and preservation were already an issue with analog or paper records, but there the process did not need to be mediated by a machine, as it must be with digital records. Archival Science is therefore constantly renewing itself, drawing on knowledge from other areas to understand the nature and composition of born-digital records and to maintain their trustworthiness.

Figure 1. Interdisciplinarity of Archival Knowledge (Duranti 2010).

To this end, we intend to establish the relationship between digital Archival-Diplomatics, Digital Forensics Science, and KO. This contribution considers how Digital Forensics represents data storage in various technological systems and provides a knowledge framework for understanding the composition of born-digital records. With this knowledge, archivists will be better able to preserve records and provide access to them, and will also be able to draw on other tools from digital forensics.
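To make the notion of bits more concrete, the following Python sketch (an illustration only, not part of the authors' framework) encodes a short hypothetical record as bytes, writes those bytes out as the 1s and 0s Lee describes, and then regroups and decodes them. It assumes UTF-8 as the character encoding; the last step shows that the same bits interpreted under a different assumption (Latin-1) no longer reproduce the record, which is one way of seeing why born-digital records depend on intermediaries that interpret their bits correctly.

```python
# Hypothetical record content, used only for illustration.
record_text = "Procès-verbal approuvé"

# Encode the text as bytes, assuming UTF-8 as the character encoding.
encoded = record_text.encode("utf-8")

# Write each byte out as eight binary digits: the 1s and 0s of the bitstream.
bitstream = "".join(f"{byte:08b}" for byte in encoded)

# Reproducing the record means regrouping the bits into bytes and decoding them.
regrouped = bytes(int(bitstream[i:i + 8], 2) for i in range(0, len(bitstream), 8))
rendered = regrouped.decode("utf-8")

# The same bits decoded under a different assumption (Latin-1) yield different text.
misread = regrouped.decode("latin-1")

print(bitstream[:16])           # 0101000001110010 (the letters 'P' and 'r')
print(rendered == record_text)  # True
print(misread)                  # ProcÃ¨s-verbal approuvÃ©
```

The bits themselves are identical in both readings; only the interpretation differs.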
2.0 Archival-Diplomatics Science

Archival Science rests on three core concepts (provenance, original order, and chain of custody), which Rogers summarizes as follows: “Two principles key to archival theory are the principle of provenance, which respects the documentary facts, maintaining the records of one creator separately from another, and the principle of original order, which mandates keeping/describing the records in the order in which they were created and used. When these principles are respected and articulated through archival description, the authenticity of the record aggregations is protected [18-20]. The presumption of authenticity derives initially from the context of creation and chain of custody, and the documented processes of establishing intellectual, administrative, and usually, physical control – appraisal, accessioning and archival arrangement” (Rogers 2013, 8). As Guimarães and Tognoli (2015, 567) also note, provenance and original order, both concepts that belong to the principle of “respect des fonds”, are relevant to Archival Knowledge Organization (AKO). These concepts allow the identification of the records in “its production context for planning its creation/production and treatment of its accumulation in the archives led the area to think over the identification as an archival process and the discussions about the place it occupies in the context of archival methodologies” (Tognoli and Rodrigues 2018, 46-47). Thus, the identification process is key to the creation of knowledge, as it must consider the application of the principles of provenance and original order. Diplomatics Science, for its part, was founded in 1681 by Jean Mabillon (1632-1707), a French Benedictine monk who laid the foundations for analyzing the authenticity of the diplomas (hence the name Diplomatics for the records of this period) kept in the Abbey of Saint-Germain-des-Prés. He wrote his masterpiece, De re diplomatica libri VI, at a time of social upheaval caused by document forgeries throughout Europe.
His work aimed to examine and compare the extrinsic (support, ink, seal, signature, etc.) and intrinsic (the intellectual message of the author) elements of the form of diplomas or records to verify their authenticity or falsehood. In the 20th century, Diplomatics Science was renewed and adapted for Archival Science. The two scholars key to reviving the study of the relationship between Diplomatics and Archival Science were Hilary Jenkinson (1958) and Robert-Henri Bautier (1961). Then, in the early 1980s, with the arrival of the digital age, the analysis of extrinsic and intrinsic elements of form began to be applied to the question of authenticity in the digital world (Duranti 1989; Duranti 2009, 42). This involved analyzing the persons who participate in the creation of a record, the contexts in which the record exists, the act or transaction in which it participates, the procedures and documentary forms governing its creation, and the relationships that connect it to other records (Rogers 2015, 8). However, in the digital environment, the extrinsic and intrinsic elements of form are not easy to address, because digital records are represented through the abstract structures of conceptual, logical, and physical layers. The conceptual layer is the record as a person understands it on the screen or monitor. The logical layer is the object as recognized and processed by hardware and software. The physical layer is an inscription of signs on a physical medium. The conceptual layer is the closest to the traditional paper record. The logical layer is the most important for understanding how the record is represented, because it holds the stored data (content) and the metadata, which can be made visible in the conceptual layer or by using specialized tools (Rogers 2015). Archival-Diplomatics has been studied extensively in KO. We intend to continue this line of work by considering the application of forensics science as a contribution to KO.
The study of Digital Forensics Science has been significant for analyzing and preserving the authenticity of born-digital records over the long term.

3.0 Digital Forensics Science

Digital Forensics is a relatively new discipline (Pollitt 2010), which arose at the start of the 1980s, when computers were becoming widely accessible and, with them, illicit acts involving computers started to appear. Today, the discipline faces an increasing number of challenges because of the internet and the growing complexity of digital devices. The most widely used definition of Digital Forensics is the one provided by the Digital Forensic Research Conference (DFRWS): “The use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorized actions shown to be disruptive to planned operations” (Palmer 2001, 16). In its early days, the discipline was known as Computer Forensics, but as its scope broadened, its name was changed to Digital Forensics: digital because it analyzes information encoded in binary digits (as explained earlier), and forensics because it has a close relationship with the law. The etymology of forensics relates to the forum: “because the courts are forums where information may persuade us to restrict or remove individual liberties, they have proven to be a serious testing ground for scientific research” (Palmer 2001, 2). The relationship between Digital Forensics and Digital Archival Diplomatics, and the need for it, rests on the shared aim of maintaining and preserving the trustworthiness of born-digital records. Elizabeth Diamond (1994, 140) suggested that “if the historian is the lawyer in the court of history, then the archivist is the forensic scientist”.
The responsibility of archivists is to ensure the maintenance and preservation of records, and all the more so of born-digital records. This requires a series of skills, for example: 1) knowing the internal structure, or layers, of records; 2) understanding how they are represented in technological systems; 3) knowing how to maintain their trustworthiness over time; and 4) knowing how to provide access to users. According to Adam Jansen and Luciana Duranti (2011), professionals’ knowledge of digital records must become more precise, since the challenges of the digital environment are increasingly complex: “Custodians can only preserve records as trustworthy (i.e. reliable, accurate and authentic) as they are when first created. It is therefore the custodian’s responsibility to establish the identity of the records prior to acquiring them and to maintain that identity, together with their integrity, afterwards (MacNeil, 2004). In the digital environment, this is a tall order, because it is not possible to preserve digital records; it is only possible to preserve the ability to reproduce them (Duranti and Thibodeau, 2006). As it will always be necessary to retrieve the binary bits and process those bits through the use of intermediaries (i.e. hardware and software) in order to render the evidence into a human readable format, it falls upon the custodian to ensure that the necessary intermediaries will exist when needed. To render representations with an accuracy that is able to withstand a diplomatic analysis requires the custodian to store the binary content of the record, including indicators of all the elements of documentary form necessary to convey the essence of the record, in a manner that ensures the record will be rendered with the same presentation and in the same context that gave it meaning” (Duranti and Jansen 2011). This passage sets out the responsibility and accountability of archivists.
Archival-Diplomatics therefore provides a framework that supports the preservation of the integrity and identity of born-digital records. Its junction with digital forensics allows tools and practices to be used to deepen our understanding of digital devices and their layers of representation, keeping information accurate so that it can produce accurate knowledge.

4.0 Digital Forensics Science and Knowledge Organization

We can observe an evolution in how Archival Diplomatics has adapted to the digital environment. Efforts to work interdisciplinarily and to apply the knowledge of disciplines such as Diplomatics, Forensics, Law, History, and Computational Science have yielded fruitful answers to issues of the trustworthiness, maintenance, and preservation of digital records in specific social contexts. Applying Digital Forensics to Archival Diplomatics processes allows us to exchange concepts, techniques, and tools for understanding the structure of digital environments. Archivists and digital forensics practitioners thus pursue similar objectives, albeit for different purposes. Corinne Rogers (2013, 6) notes this similarity between the identification of (digital) records in archival study and of (digital) evidence in digital forensics. She writes: “Archivists and digital forensics practitioners share challenges involved in appraising and analyzing large volumes of digital material. The core archival functions have been identified as appraisal and acquisition, arrangement and description, retention and preservation, management and administration, reference and access [11]. Digital preservation has been demonstrated to encompass records creation and recordkeeping [12], thereby extending the archival functions over the entire life cycle of digital records.
The traditional archival functions may be compared with the functions of digital forensics practice: identification, preservation, collection, examination, analysis, presentation and decision [13]. At the root of each is investigative research into the material in question – namely the digital traces of activities, and the relationships of those traces to the actors and actions which gave rise to them” (Rogers 2013, 6). The functions listed above for Archival-Diplomatics and Digital Forensics are shared, and they are complemented by concepts such as provenance, original order, and chain of custody in the digital context. Their meanings, however, differ slightly. Provenance is defined as the identification, extraction, and saving of essential information about the context of creation; original order reflects original folder structures, file associations, related applications, and user accounts; and the chain of custody is the documentation of how records were acquired, whether or not they were transformed, and the hardware and software mechanisms used to ensure that the data has not been inadvertently changed. Another relevant concern shared by forensics and archival science is the identification of sensitive information, specifically personal and private information: “the same tools that are used to expose sensitive information can be used to identify, flag and redact or restrict access to it” (Lee 2012b, 5). Above all, the chain of custody is paramount in the digital environment. It is essential to maintaining the integrity of the bitstreams, and although ensuring such integrity poses some challenges, its use is increasingly common.
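One common way of documenting that bitstreams have not changed is to record a cryptographic checksum (a fixity value) when a record is acquired and to recompute it at every later custody event. The Python sketch below illustrates only that idea; it is not drawn from the authors or from any particular tool, it assumes SHA-256 as the checksum algorithm, and the `CustodyLog` class, its fields, and the sample record are hypothetical.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


def sha256_of(data: bytes) -> str:
    """Return the hexadecimal SHA-256 digest (fixity value) of a bitstream."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class CustodyLog:
    """Hypothetical chain-of-custody log: the digest recorded at acquisition,
    plus one entry per later custody event."""
    acquired_digest: str
    events: list = field(default_factory=list)

    def record(self, actor: str, action: str, data: bytes) -> bool:
        """Recompute the digest, log the event, and return True if the bits are unchanged."""
        digest = sha256_of(data)
        unchanged = digest == self.acquired_digest
        self.events.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "digest": digest,
            "unchanged": unchanged,
        })
        return unchanged


# Hypothetical born-digital record, fixed at the moment of acquisition.
original = b"Minutes of the board meeting"
log = CustodyLog(acquired_digest=sha256_of(original))

# A faithful copy keeps the same digest.
copy_ok = log.record("archivist A", "copied to preservation storage", bytes(original))

# Simulated bit rot: flip a single bit in the first byte.
damaged = bytearray(original)
damaged[0] ^= 0x01
copy_bad = log.record("archivist B", "migrated to new medium", bytes(damaged))

print(copy_ok, copy_bad)  # True False
```

A matching digest shows only that the bits are unchanged since acquisition; it says nothing about the record's authenticity before it entered custody, which is why the chain of custody also documents who acted on the record, when, and how.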
The challenges it faces relate to 1) the volatility of digital evidence in resources such as registers, memory, tables, the processor, temporary file systems, disks, remote logging and monitoring data, physical configuration, and network topology, as well as archived data (Prayudi and Azhari 2015, 3); 2) medium failure and bit rot; and 3) the obsolescence of both software and hardware (Lee et al. 2012). Digital Forensics practitioners build knowledge by examining the internal structure of born-digital records in depth in order to better understand their nature. To do so, it is necessary to understand the inner workings of the logical, physical, and conceptual layers described by Rogers (2015). In addition, we need to examine how these layers are represented and structured within technological systems. Christopher Lee (2012a) offers a useful overview of how these layers are structured. He defines nine levels of representation of digital resources (Figure 2): 0) bitstream on physical medium; 1) raw signal stream through I/O equipment; 2) bitstream through I/O equipment; 3) sub-file data structure; 4) file as “raw” bitstream; 5) file through filesystem; 6) in-application rendering; 7) object or package; and 8) aggregation of objects.

Figure 2: Digital Resources – Levels of Representation (Lee 2012c).

These levels go beyond Knowledge Organization Systems (KOS), which are “used to organize documents, document representations and concepts” (Hjørland 2008, 86). They lay the groundwork for safeguarding integrity in storage systems. This is a very important first step in maintaining the integrity of data, records, documents, and information until they are transformed into knowledge. In this regard, Digital Forensics provides the first step, identifying digital objects before they are integrated with Archival Diplomatics as a means to achieve knowledge.
This knowledge has two meaningful uses: in certain legal or juridical cases and/or in archival or stewardship processes.

5.0 Concluding remarks

Digital Forensics is currently being researched and explored, with interesting outcomes, in several research centers, law enforcement agencies, and universities. We hope that the knowledge brought by this science continues to grow for the benefit of the conservation of digital records. The concepts, practices, and tools of Digital Forensics are being applied in several fields of knowledge, and we can see the benefits this offers to disciplines such as Information Science, Archival-Diplomatics, and Knowledge Organization. Several software products are being developed to carry out this work. One such example is “BitCurator,”2 software produced in 2016 through a partnership between the School of Information and Library Science at the University of North Carolina at Chapel Hill and the Maryland Institute for Technology in the Humanities. This software uses natural language processing (NLP) and was developed to help collecting institutions extract, analyze, and produce reports on features of interest in text extracted from born-digital materials in their collections (BitCurator 2018). The knowledge of archivists is key to the arrangement and description of paper records. In the era of digital devices, however, the tools and techniques developed by Digital Forensics are fundamental alongside Archival-Diplomatics knowledge, because they deal directly with the integrity, reliability, and authenticity of digital records. Identifying, seizing, imaging, and analyzing material such as floppy disks, cassette tapes, compact discs, USB drives, and hard drives could be performed better through the application of technological tools and digital forensics processes that maintain data integrity.

References

Bautier, Robert-Henri. 1961. “Leçon d’ouverture du cours de diplomatique à l’École des chartes.” Bibliothèque de l’École des chartes 119: 194–225.
Diamond, Elizabeth. 1994. “The Archivist as Forensic Scientist – Seeing Ourselves in a Different Way.” Archivaria 38: 139–154.
Duranti, Luciana. 1989. “Diplomatics: New Uses for an Old Science.” Archivaria 28: 7–27.
Duranti, Luciana. 2009. “From Digital Diplomatics to Digital Records Forensics.” Archivaria 68: 39–66.
Duranti, Luciana. 2010. A Framework for Digital Heritage Forensics.
Duranti, Luciana, and Adam Jansen. 2011. “Authenticity of Digital Records: An Archival Diplomatics Framework for Digital Forensics.” In ECIME: 5th European Conference on Information Management and Evaluation, Como, Italy, 134–139.
Duranti, Luciana, and Kenneth Thibodeau. 2006. “The Concept of Record in Interactive, Experiential and Dynamic Environments: The View of InterPARES.” Archival Science 6: 13–68.
Guimarães, José Augusto Chaves, and Natália Bolfarini Tognoli. 2015. “Provenance as a Domain-Analysis Approach in Archival Knowledge Organization.” Knowledge Organization 42: 562–569.
Hjørland, Birger. 2008. “What is Knowledge Organization (KO)?” Knowledge Organization 35: 86–101.
Jenkinson, Hilary. 1958. “Archives and the Science and the Study of Diplomatic.” Journal of the Society of Archivists 8: 207–210.
Lee, Christopher A. 2012a. “Digital Curation as Communication Mediation.” In Handbook of Technical Communication, edited by Alexander Mehler and Laurent Romary. Berlin: De Gruyter Mouton, 507–530.
Lee, Christopher A. 2012b. “Archival Application of Digital Forensics Methods for Authenticity, Description and Access Provision.” Comma 2: 133–140.
Lee, Christopher A. 2012c. “Archival Application of Digital Forensics Methods for Authenticity, Description and Access Provision.” Presented at the International Council on Archives Congress 2012.
Lee, Christopher A., Matthew Kirschenbaum, Alexandra Chassanoff, Porter Olsen, and Kam Woods. 2012. “BitCurator: Tools and Techniques for Digital Forensics in Collecting Institutions.” D-Lib Magazine 18, nos. 5/6.
Palmer, Gary. 2001. “A Road Map for Digital Forensic Research.” In The Digital Forensic Research Conference (DFRWS) 2001, 1–42.
Pollitt, Mark. 2010. “A History of Digital Forensics.” In Advances in Digital Forensics VI. Digital Forensics 2010, edited by K.P. Chow and S. Shenoi. IFIP Advances in Information and Communication Technology 337. Berlin, Heidelberg: Springer, 3–15.
Prayudi, Yuri, and Azhari SN. 2015. “Digital Chain of Custody: State of the Art.” International Journal of Computer Applications 114, no. 5: 1–9.
Rogers, Corinne. 2013. “Digital Records Forensics: Integrating Archival Science into a General Model of the Digital Forensics Process.” In Proceedings of the Second International Workshop on Cyberpatterns: Unifying Design Patterns with Security, Attack and Forensic Patterns 2013, edited by Clive Blackwell. Oxford, UK: Oxford Brookes University, 4–21.
Rogers, Corinne. 2015. “Diplomatics of Born Digital Documents – Considering Documentary Form in a Digital Environment.” Records Management Journal 25, no. 1: 6–20.
Tognoli, Natalia Bolfarini, and Ana Célia Rodrigues. 2018. “An Analysis of the Theoretical and Practical Application of Diplomatics to Archival Description in Knowledge Organization.” In Challenges and Opportunities for Knowledge Organization in the Digital Age: Proceedings of the Fifteenth International ISKO Conference 9–11 July 2018 Porto, Portugal, edited by Fernanda Ribeiro and Maria Elisa Cerveira. Advances in Knowledge Organization 16. Baden-Baden: Ergon, 43–51.

Katherine Morrison – Indiana University Department of Information and Library Science, United States

Committed to a Narrative: Expressions of Knowledge Organization at The Henry Ford Museum of American Innovation

Abstract: Innovation is one of the most pervasive rhetorical tropes in information industries and technologies.
How can we deconstruct the meaning and social currency of innovation? Although critics in science and technology studies have approached this question in recent years, it deserves closer study from knowledge organization (KO) domains. The history of innovation is a history of classification. In the Library of Congress, the subject heading “innovation” is classed by industry: technological innovations, agricultural innovations, and organizational innovations, for example. This classificatory structure fulfills capitalist expectations of industrial economics and mystifies the material production of industries and technologies. Significantly, innovation also allows documentary institutions to periodize their materials and order objects of knowledge along a hagiographic trajectory. An important site of study for this issue is The Henry Ford Museum of American Innovation in Dearborn, Michigan. This project explores how all of The Henry Ford’s diverse collections are classed as “innovative.” My research question is: how do a priori commitments, particularly a narrative of innovation that intertwines technological and social progress, shape subsequent expressions of KO as found in the description and arrangement of museum artifacts? This question will guide my analysis of a museum that places the Rosa Parks bus, Civil Rights Movement memorabilia, and slave shackles in the same physical space as Thomas Edison’s electric pen, a Macintosh 512K personal computer, and early television sets. All these objects are classed along a single trajectory: they are all hagiographic exemplars of American history. Central here is the argument that The Henry Ford classifies innovation by privileging its collection objects’ modes of inscribing meaning in a “cluster of mutually defining” (Gitelman 2000) technological and social practices of change that are themselves already privileged in canons of American history.
In this way, we see that KO is not an idealized neutral space in which to judge and describe objects, but rather expresses a positionality from which regimes of KO are constructed.

1.0 Introduction

Innovation is one of the most pervasive rhetorical tropes in information industries and technologies. How can we deconstruct the meaning and social currency of innovation? Although science and technology studies critics have approached this question in recent years, it deserves closer study from knowledge organization (KO) domains. In the Library of Congress, the subject heading “innovation” is classed by industry: technological innovations, agricultural innovations, and organizational innovations, for example. This classificatory structure fulfills capitalist expectations of industrial economics and mystifies the material production of industries and technologies. Henry Ford is arguably the most canonically important industrial capitalist in American history. He is also celebrated as a legendary innovator, not only for his vision of industrial technologies (the assembly line) but also for his bold social vision. His legacy as an innovator is enshrined at The Henry Ford Museum of American Innovation and Greenfield Village in Dearborn, Michigan1. The museum and historical village date to 1926, when he purchased 260 acres near his sprawling River Rouge Factory complex with the purpose of building an equally sprawling museum complex filled with historical buildings and artifacts of domestic and industrial life (Swigger 2014, 17). Today, the space collectively known as The Henry Ford is the largest indoor-outdoor museum complex in the United States. This paper specifically addresses the meaning-making of collections at the museum, whose collection mission is to “illustrate the process and context of innovation.” Of critical importance, then, is an examination of how all of the museum’s diverse artifacts and documents are classed as innovative.

1 “The Henry Ford.” n.d.
This project examines how the groupings of collections at The Henry Ford express contingent meanings and relationships, and how these relationships constitute a particular ideological narrative of innovation. In doing so, this paper demonstrates the fundamental positionality of any knowledge organization system. A popular replica and exhibit space at The Henry Ford’s Greenfield Village, Thomas Edison’s Menlo Park, offers a glimpse into how the museum positions innovation as both a verb and a noun in order to make a coherent narrative. This narrative, in turn, relies on tropes of American resilience and novelty. What follows is a discussion of other information and communication technologies on display in the museum, specifically a collection of those meant to evoke feelings of personal nostalgia. The final section analyzes how The Henry Ford has evolved to communicate more directly with its local constituents in Dearborn and Detroit through the acquisition and display of American Civil Rights movement artifacts. Edison’s laboratory and twentieth-century information technologies may not seem to share a common meaning of “innovation” with the museum’s Civil Rights exhibits and the famous Rosa Parks bus, but, as we shall see, they work in concert to collapse the meaning of innovation in order to serve a whiggish trajectory of American history. Ultimately, The Henry Ford must maintain narrative control in order to represent and organize American history, which is a serious knowledge organization task.

2.0 The Method

A central goal of this project is to bring media history and KO literature into closer dialogue. If, as Lisa Gitelman and others have suggested, we can think of media as material objects and complicated structures of sociocultural communication (Gitelman 2008, 6), we should examine more closely how meaning-making takes place at the intersections of physical presence and ideas.
Lorraine Daston has further emphasized the indeterminate slippage between the “stolidly functional things…[and how they] radiate an aura of the symbolic” (Daston 2007, 19). Such an acknowledgment is particularly important for understanding the conceptual and physical authority of KO systems. This paper acknowledges the prodigious body of literature on Ford’s Greenfield Village, a strange and impressive village composed of preserved historic structures and of buildings constructed on site. Jessie Swigger’s foundational text “History Is Bunk”: Assembling the Past at Henry Ford’s Greenfield Village (2014) built on archival evidence and secondary literature (such as biographies by Steven Watts and David L. Lewis) to explain how Ford built a selective genealogy of industrial and domestic habits and materials. The particular assemblage of actual historical buildings (including the transplanted Wright Brothers’ cycle shop and an actual Cotswold cottage transported from England) and of altered spaces created by The Henry Ford’s staff over the second half of the twentieth century is a seductively idiosyncratic site of study, rich with anachronisms and impressive efforts at historical authenticity. The museum, which boasts 26 million artifacts spanning 300 years of history, is well recognized across the literatures of technology history, material culture, and public history. The Society for the History of Technology’s journal, Technology and Culture, has regularly featured articles by The Henry Ford’s staff and reviews of its exhibits and expansions. This project introduces this site of public and technology history to the domain of KO. Ultimately, it argues that the act of “doing” public history is an act of knowledge organization. In recent years the domain of knowledge organization has recognized museums as powerful sites for critical KO theorization and research.
In her illuminating conceptual review on this topic, Hannah Turner emphasized how critical KO analysis of museums “posits museums as key sites of knowledge production and circulation and sets the stage for an understanding of the ‘background’ work of museums as an important site for understanding knowledge organization more broadly” (Turner 2017, 473). Turner and numerous other KO researchers, including Rick Szostak (2017), Melissa Gill (2017), and Lala Hajibayova (2017), have enriched the KO community’s understanding of how institutional infrastructures and sociotechnical affordances position regimes of knowledge and organization. I offer a contribution to this literature and also expand its critical and methodological dimensions. By comparing the different ways The Henry Ford describes and models the world, this project asks KO researchers to confront the ideological and narrative powers of museum collections. It is specifically these powers that legitimize museums’ production and organization of knowledge. Central to this project’s contribution to KO is the emphasis that museum artifacts act as documents with powers of inscription. This argument owes much to the documentality work done by Michael Buckland and Ron Day. Buckland’s foundational 1997 essay explained how museum objects’ semiotic and evidentiary mechanisms of meaning control their existence as documents (Buckland 1997). Day began his most recent book with an important question about Suzanne Briet’s formative 1951 conception of a document, exemplified by an antelope as documented by scientists (as a stuffed type specimen, for example): “What is forgotten about particular beings when they are subject to (or subjects of) the representation of being, understood as essential universal types (i.e., as class members)?” (Day 2019, 3). What is lost when all the numerous artifacts on display at The Henry Ford are understood as exemplars of innovation?
What does “innovation” not say about the RCA-Victor Console Television Receiver, or the Rosa Parks transit bus? These questions have important ideological implications for the organization of knowledge, where classes of documents act “not just descriptively, but prescriptively” (Day 2019, 1). The intersecting domains of knowledge organization and museums must reckon with the full range of museums’ ideological powers. Kevin Coffee, Chief of Interpretation and Education at Lowell National Historical Park, outlined this range by remarking that “museums and similar cultural organizations have a fundamental function to define and control visual expressions of major social narratives” (Coffee 2006, 435). The resulting “concentrations of ideological symbols” legitimize and reinforce certain narratives about society (Coffee 2006, 435). The organization of museum objects produces prescriptive knowledge about culture, society, and time. Moreover, we can only comprehend the full extent of this ideological operation by unveiling the moral orders that dictated the museum’s collections from its start (Woodson-Boulton 2007, 48–49). By making explicit the complicated lineage of The Henry Ford’s technological collections, we can better understand why and how knowledge organizing commitments change in museum institutions.

3.0 Innovation and History

Physical exhibition spaces in museums act as a primary principle of division among artifacts. Greenfield Village contains the most popular exhibition spaces at The Henry Ford (visitors may purchase tickets for the museum, the village, or both). Menlo Park, an assemblage of replica structures and authentic artifacts of Thomas Edison’s experimental laboratory, was one of Ford’s first endeavors in creating Greenfield Village. Ford and his treasured Greenfield Village architect, Edward J.
Cutler, recreated the building in 1929 with some original elements. As a canonical American inventor, Thomas Edison offers a way to better understand how the institution of The Henry Ford defines innovation. The assemblage of replicas and authentic artifacts works together to materialize a convincing general class of innovative objects. The curious mixture of real and imitation at Menlo Park evinces Henry Ford’s presentist image of historicity. Ford’s collecting commitments, which began with the relocation and restoration of historic structures at his ambitious Greenfield Village, hinged on his ambivalent nostalgic attitude, which was a reaction to the mixing of races and cultural forms in American cities. It is cruelly ironic that his pre-urban, pre-industrial nostalgia was in large part due to the changing economic conditions of industrial cities that Fordist capitalism engendered. Ford’s ideology of innovation is best summarized in his infamous 1916 remark that “History is more or less bunk. It’s tradition. We don’t want tradition. We want to live in the present, and the only history that is worth a tinker’s damn is the history that we make today.” Ford’s establishment of Greenfield Village and the then-termed Edison Institute and Industrial Museum (Swigger 2014, 3) served two connected goals: to celebrate a hagiographic American history of industrial progress, and to root acts of progress “today” in these whiggish lessons of the past. This narrative control is ideology, specifically an ideology of American innovation. The ideological power of Greenfield Village and Menlo Park lies, in part, in how convincing the environment is. The combination of replica and original produces an effective model of historic reality. When Edison himself toured the site with Ford in 1929, he remarked that it was “99.9% perfect” (Swigger 2014, 78).
The Henry Ford even classifies the space itself as an artifact entity, with typical catalog attributes such as creation date (1929, the year Ford reconstructed the building), subject date (1876–1883, the years Edison actively used the laboratory), and materials (wood, glass, metal). In addition to environment-setting objects (furniture, laboratory equipment), the interior contains many of Edison’s original patent models, including his successive printing and duplex telegraphs, electric lights, phonographs, vote recorders, and the electric pen. The artifacts are not organized in a linear progression but are instead arranged in situ, placed in naturalistic settings around the interior. Despite this naturalistic physical framing, however, they are described (in both written collection descriptions and scripts enacted by historical reenactors) as a succession of increasingly efficacious and influential inventions. In reality, Edison’s electric pen was a commercial failure and evidence of the nuanced social negotiations surrounding Edison’s many inventions (Gitelman 2000, 5); however, The Henry Ford does not represent the artifact in this way. It is instead classified as one innovation within the overarching trajectory of Edison’s innovation. In this way, The Henry Ford defines innovation as both a verb and a noun. In treating innovation as a verb, the institution follows the normative conception of what Benoît Godin and others have defined as a “linear model of innovation” (Godin 2006; 2012; 2015). While there are numerous ways of thinking about models, it is most applicable here to conceive of models as narratives. The linear narrative of innovation consists of three discrete stages: a research phase, an applied research and development phase, and a final phase of production and dissemination. This narrative is a simplified representation of segments of reality, or potential realities, and does not (or cannot) account for the complexities of scientific practice, market forces, and supply and demand.
In this respect, the narrative of innovation is a concentrated set of symbols; it is ideological. Furthermore, this suggests how we can understand the narrative properties of classification and knowledge organization.

4.0 Communication and Media Technologies, Communicating and Mediating Innovation

It should not be surprising that the tens of thousands of objects exhibited at The Henry Ford communicate meanings relationally. As a whole, the museum’s objects are classed as innovative. Edison’s electric pen is defined as innovative in relation to the other objects in the reconstructed Menlo Park; the assemblage of objects and structures that makes up the Menlo Park site evokes existing grand narratives about Edison as the quintessential American inventor. Because The Henry Ford is the largest indoor-outdoor museum complex in the United States, it is helpful to examine how objects in the museum proper (as opposed to Greenfield Village) communicate meanings of innovation. Doing so demonstrates how The Henry Ford manifests an organized physical landscape of knowledge composed of mutually defining media. The permanent exhibit “Your Place In Time” situates the viewer differently than Menlo Park and Greenfield Village do. The exhibit space is structured as a chronological journey through popular twentieth-century American technologies. It contains objects of immediate personal nostalgia for viewers, from cassette and record players to an interactive MTV music video creation station. Neither the exhibit space nor its artifacts immediately communicate a message of American innovation the way Menlo Park does. Significantly, however, many of the Your Place In Time artifacts are included in the same collection class as Thomas Edison’s inventions. The Macintosh 512K Personal Computer, for example, is classed alongside the Edison electric pen in the museum’s “Information Technology & Communications” digital collection set.
In this example, The Henry Ford partially flattens the historical differences between these artifacts to allow their mutual inclusion in an object type category. Here, the definition of “innovation” as a noun takes precedence over the nuanced negotiations of technology history. After all, visitors to the physical “Your Place In Time” exhibit do not witness a reenactment of the innovation process, as they do in Menlo Park and Greenfield Village. They instead see and interpret inventions: the end-products of the linear model of innovation. The different viewer positions at Menlo Park and inside the museum create slippages of meaning for understanding innovation: the former invites viewers to witness reenactments and engage with physical spaces and artifacts, while the latter invites viewers to look at artifacts and identify with them based on their own personal experiences. As viewers visit the different exhibit spaces at The Henry Ford, they engage with narratives that classify innovation as both a thing and a process. In the next section, we will see how The Henry Ford additionally classifies innovation as a trait held by a person.

5.0 Rosa Parks, Innovator

The Henry Ford’s evolution should be contextualized within a larger trajectory of public history and technology museums. As technology historians have explained, the field began to question its pervasive and deterministic narratives as early as the 1980s (Staudenmaier 2002, 168–181). The Society for the History of Technology concretized more critical and reflexive methodologies during a 2007 workshop sponsored by the Society and the National Science Foundation. At this workshop, Colin Divall and David Edgerton both began work on publications that asked the field to mobilize a more critical, conceptual framework for the domain (Divall 2010; Edgerton 2010). Edgerton in particular critiqued existing scholarship for focusing too myopically on technological novelties, that is, on innovations.
Edgerton urged scholars to analyze materials that communicate something about popular understandings of technologies as they emerge in socially, politically, and geographically situated times (Edgerton 2010). Applying this more dynamic imperative to public history, Divall quoted Harold Skramstad, past president of The Henry Ford: “…the fundamental challenge is to design exhibitions that have a clear and coherent intellectual intent while at the same time providing engaging individual experiences” (Skramstad cited in Divall 2010, 957). In other words, technology history museum professionals must recognize their own agency in creating complex analyses of technologies, which then inform audience experiences. This co-productive turn toward dialogue between museum and audience has affected other public history sites in the United States, such as Lowell National Historical Park (Goldstein 2000, 129–137). It is no mere coincidence that The Henry Ford publicly expanded its civil rights artifact collections beginning around 2001, the same era in which these conversations engaged technology historians. Nor is it a coincidence that this focus on civil rights artifacts coincided with increased support for Greenfield Village’s African American Family Life and Culture program and a partnership with the emerging National Arab American Museum (in 2000, Dearborn’s population was more than 28% Arab American; Detroit’s population was more than 80% Black). As The Henry Ford changed its commitments to meet evolving community needs, its collections came to include an American Democracy and Civil Rights focus area. In 2001 The Henry Ford purchased and restored the “Rosa Parks Bus,” the Montgomery, Alabama, city bus on which Parks initiated the Montgomery Bus Boycott in 1955. The bus is the apex holding of The Henry Ford’s permanent “With Liberty and Justice for All” exhibit.
An oft-reproduced Pete Souza photograph of Barack Obama sitting alone in the bus in 2012 demonstrates the material and visual power of the artifact (Peralta 2012). Parks holds a distinguished position in the museum’s webpage list of “historic innovators” (as opposed to the museum’s contemporary list of innovators, which includes figures like Bill Gates, Elon Musk, and Steve Wozniak); besides Parks, the list features Henry Ford, George Washington Carver, Thomas Edison, and the Wright Brothers. The museum also now classes innovation by certain social behaviors or “habits”: collaboration, breaking rules, learning from failure, remixing, and being curious. All of these classes deploy a certain linear rhetoric of social progress: they imply new relationships (collaboration, remixing), a progression of standards (breaking rules, learning from failure), and new applied thought processes (being curious). This classification of innovation is socially prescriptive, but ethically and morally ambivalent. The Henry Ford describes Parks as a specific kind of innovator, one that distinguishes her immediately from the likes of Edison and the Wright Brothers: “…her simple, spontaneous act embodies the notion of social innovation—that a new idea or way of doing things can have such far-reaching impact, that it renders old ways obsolete and radically alters how people think about themselves, their social interactions, and their place in the larger world.” How is this description similar to or different from the museum’s classification of innovation by behavior type? The most appropriate class for this description of Parks is “breaking rules”: her act of protest “radically” changed others’ behaviors in a new fashion. What about collaboration?
This descriptive web page text notes Parks’ attendance at civil rights training at the Highlander Folk School during the summer of 1955, but on the whole it canonizes Parks as a unique visionary of the Civil Rights movement: “Many consider her singular act of protest to be the event that sparked the Civil Rights movement […] her flawless character, her quiet strength, and her moral fortitude caused her act to successfully ignite action in others.” The text’s uncited quotation of the iconic phrase “they had messed with the wrong one” additionally suggests a move away from the failed protests or tactics of the Civil Rights movement up to that point, an echo of The Henry Ford’s innovative behavior class “learning from failure.” Curiosity and remixing are similarly inferred, with the former attributed to her marriage to Raymond Parks and subsequent exposure to the Civil Rights movement, and the latter attributed to her combination of extraordinary moral characteristics. This example shows how The Henry Ford uses a flexible classification structure to situate the creators of its disparate collections appropriately.

6.0 Conclusion

The Henry Ford Museum of American Innovation has evolved significantly since Henry Ford’s death, in terms of both its mission and its collecting principles. The broad class of “innovation” has allowed the museum to expand its collections and exhibits to meet the evolving needs of its surrounding communities and the evolving expectations of technology history and museum studies. By comparing how the museum accounts for disparate definitions of “innovation,” we can see just how flexible the museum’s ideological classification needs to be. At the same time, this classification flattens the ideological differences between these figures and unites them on a singular trajectory of American history.
This paradoxical classing, in which innovation means something very temporally coherent yet encompasses so many characteristics that it means little at all, allows us to question the ideological power prescribed by museums. Meaning in museum spaces emerges from the negotiation between institutional and viewer agency. The settings for these interactions, whether assemblage spaces of authenticity and replica or more traditional museum exhibit spaces, play a significant role in what meanings are produced. At The Henry Ford, institutional agents must account for highly disparate settings (a historic village and an interior museum) and collections of artifacts. Under the broad, flexible, and at times contradictory class of “innovation,” The Henry Ford must bring together these settings and artifacts. To do this effectively, the institution projects a grand narrative of American history. Narrativity is key here. In the organization of historical moments, narrativity creates a necessary flow. As Ron Day recently explained, “The theoretical construct of the past as continuous, much less returnable, is an explanation that depends on narrative, historiographical, conventions…Time must be seen as continuous in order for component parts to be retrieved from its series” (Day 2019, 106). Sites of public history and technology, such as The Henry Ford Museum of American Innovation, offer complex opportunities to unveil these narrative, and ultimately ideological, mechanisms. As KO research continues to examine such sites, we should critically consider the fundamental positionality of any KO system.

References

Buckland, Michael. 1997. “What Is a Document?” Journal of the American Society for Information Science 48, no. 9: 804–9.
Coffee, Kevin. 2006. “Museums and the Agency of Ideology: Three Recent Examples.” Curator: The Museum Journal 49: 435–48.
Daston, Lorraine. 2007. Things That Talk: Object Lessons from Art and Science. New York, NY: Zone Books.
Day, Ronald E. 2019. Documentarity: Evidence, Ontology, and Inscription. Cambridge, MA: The MIT Press.
Divall, Colin. 2010. “Mobilizing the History of Technology.” Technology and Culture 51, no. 4: 938–60.
Edgerton, David. 2010. “Innovation, Technology, or History: What Is the Historiography of Technology About?” Technology and Culture 51, no. 3: 680–97.
Gill, Melissa. 2017. “Knowledge Organization within the Museum Domain: Introduction.” Knowledge Organization 44: 469–71.
Gitelman, Lisa. 2000. Scripts, Grooves, and Writing Machines: Representing Technology in the Edison Era. Stanford, CA: Stanford University Press.
Gitelman, Lisa. 2008. Always Already New: Media, History, and the Data of Culture. Cambridge, MA: The MIT Press.
Godin, Benoît. 2006. “The Linear Model of Innovation: The Historical Construction of an Analytical Framework.” Science, Technology, & Human Values 31, no. 6: 639–67.
Godin, Benoît. 2012. “‘Innovation Studies’: The Invention of a Specialty.” Minerva 50, no. 4: 397–421.
Godin, Benoît. 2015. “Models of Innovation: Why Models of Innovation Are Models, or What Work Is Being Done in Calling Them Models?” Social Studies of Science 45, no. 4: 570–96.
Goldstein, Carolyn M. 2000. “Many Voices, True Stories, and the Experiences We Are Creating in Industrial History Museums: Reinterpreting Lowell, Massachusetts.” The Public Historian 22, no. 3: 129–37.
Hajibayova, Lala, and Kiersten F. Latham. 2017. “Exploring Museum Crowdsourcing Projects Through Bourdieu’s Lens.” Knowledge Organization 44: 506–14.
Peralta, Eyder. 2012. “President Obama Sits In Rosa Parks Bus.” National Public Radio, April 19, 2012.
Staudenmaier, John M. 2002. “Rationality, Agency, Contingency: Recent Trends in the History of Technology.” Reviews in American History 30, no. 1: 168–81.
Swigger, Jessie. 2014. “History Is Bunk”: Assembling the Past at Henry Ford’s Greenfield Village. Amherst: University of Massachusetts Press.
Szostak, Rick. 2017. “A Grammatical Approach to Subject Classification in Museums.” Knowledge Organization 44: 494–505.
Turner, Hannah. 2017. “Organizing Knowledge in Museums: A Review of Concepts and Concerns.” Knowledge Organization 44: 472–84.
Woodson-Boulton, Amy. 2007. “‘Industry without Art Is Brutality’: Aesthetic Ideology and Social Practice in Victorian Art Museums.” Journal of British Studies 46, no. 1: 47–71.

Catalina Naumis-Peña – Universidad Nacional Autónoma de México, México
Hugo Alberto Guadarrama-Sánchez – Universidad Nacional Autónoma de México, México
Luis Enrique Sánchez-Rodríguez – Universidad Nacional Autónoma de México, México
Rosa de Guadalupe Hernández-Villeda – University of Copenhagen, Denmark

Terminological Relations of a Thesaurus for University Cultural Infrastructure Terms1

Abstract: The objective of this work is to define the terminological relations of the University’s cultural infrastructure in order to develop a thesaurus, in Spanish, that contributes to the thematic organization of a database. The thesaurus will be used to improve communication and understanding among organizers of cultural activities in a university environment. To carry out this project, we used a descriptive method integrating observation techniques. We conducted interviews and analyzed the existing literature, using factual records as instruments. Various worksheets and a questionnaire were also used to delimit a domain consisting essentially of cultural spaces such as auditoriums, libraries, cinemas, esplanades, outdoor forums, museums, concert halls, conference rooms, audiovisual projection rooms, multipurpose rooms, dance halls, music halls, and theaters, together with their relationships to technical resources, areas, and attributes. The result is an arborescent structure for the development of a thesaurus intended for the university environment.
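To make the idea of an arborescent (tree-shaped) thesaurus structure more concrete, the following Python sketch models it with the conventional thesaurus relations: broader term (BT), narrower term (NT), and related term (RT). It is an illustration under stated assumptions, not the authors' implementation. The class and method names are hypothetical; the hierarchy is assumed to be strictly mono-hierarchical (each term has at most one broader term); terms are given in English although the thesaurus itself is in Spanish; and only the cultural spaces listed in the abstract are filed, under the "cultural spaces" category. The single RT link is a made-up example, not a relation reported by the authors.

```python
from collections import defaultdict


class Thesaurus:
    """Hypothetical tree-shaped (mono-hierarchical) thesaurus with BT/NT and RT relations."""

    def __init__(self) -> None:
        self.broader: dict[str, str] = {}                          # term -> its broader term (BT)
        self.narrower: dict[str, list[str]] = defaultdict(list)    # term -> narrower terms (NT)
        self.related: dict[str, set[str]] = defaultdict(set)       # term -> related terms (RT)

    def add_narrower(self, broader: str, term: str) -> None:
        # In a strict tree, each term may have only one broader term.
        if term in self.broader:
            raise ValueError(f"{term!r} already has broader term {self.broader[term]!r}")
        self.broader[term] = broader
        self.narrower[broader].append(term)

    def add_related(self, a: str, b: str) -> None:
        # RT is reciprocal, so it is stored in both directions.
        self.related[a].add(b)
        self.related[b].add(a)

    def top_term(self, term: str) -> str:
        # Follow BT links upward until reaching a term with no broader term.
        while term in self.broader:
            term = self.broader[term]
        return term


# Top-level categories named in the abstract.
TOP_CATEGORIES = ["cultural spaces", "technical resources", "areas", "attributes"]

# Cultural spaces listed in the abstract, filed as narrower terms of "cultural spaces".
SPACES = [
    "auditoriums", "libraries", "cinemas", "esplanades", "outdoor forums",
    "museums", "concert halls", "conference rooms", "audiovisual projection rooms",
    "multipurpose rooms", "dance halls", "music halls", "theaters",
]

thesaurus = Thesaurus()
for space in SPACES:
    thesaurus.add_narrower("cultural spaces", space)

# Hypothetical RT link, for illustration only; not reported in the source.
thesaurus.add_related("cinemas", "audiovisual projection rooms")

print(thesaurus.top_term("concert halls"))                    # cultural spaces
print(thesaurus.top_term("concert halls") in TOP_CATEGORIES)  # True
print(sorted(thesaurus.related["cinemas"]))                   # ['audiovisual projection rooms']
```

A full thesaurus would also need equivalence relations linking non-preferred terms to the preferred terms used for indexing, and possibly polyhierarchy; both are omitted here to keep the sketch focused on the tree of categories.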
1.0 Introduction

The purpose of this study is to define the terminological relationships of a university’s cultural infrastructure in order to develop a thesaurus, in Spanish, that contributes to the thematic organization of an information system. The thesaurus will be used to achieve better understanding and communication among the organizers of different cultural activities in a large university environment, one with many campuses in the same country or even in other countries (Universidad Nacional Autónoma de México, UNAM). Organizing cultural activities at the university requires familiarity with the spaces, their locations, and the technological resources that each venue possesses. Such familiarity makes it easier to decide on the correct place for each activity. This paper, however, focuses on closing the communicative gap among organizers through organized and representative terms. Understanding the field of action and the elements that characterize it requires an intellectual effort: an in-depth study of its conceptualization through terms that represent these elements and directly interrelate them. University cultural spaces are of two types: professional spaces dedicated to a given activity, and multipurpose spaces. The events held in these venues include, but are not limited to, plays, concerts, dance shows, film or video screenings, and university protocol events such as award ceremonies, conferences, and seminars, each of which involves different supporting elements. Thus, in communication among the various university users of cultural spaces, it is essential to use terminological designations that are understandable both to organizers and to participants in the various events, in order to avoid the errors that human communication can cause.

1 This work is part of a project supported by PAPIIT IT400318 resources.
“The purpose of a thesaurus is to guide the indexer and the search engine to select the same preferred terms or combine the preferred terms to represent an assigned theme” (ISO 2011, 12). In information science, indexing languages are used to represent documents in databases, not only to index those documents but also to structure the database itself (Fugmann 1992).

2.0 Methodology

To develop the thesaurus, a combination of methods was used: description (scouting) of the spaces and their infrastructure; interviews with decision makers and technicians; design and testing of a questionnaire; analysis of the terminology used in the interviews; content analysis of the literature on cultural venues; and comparison of terms in specialized thesauri on these topics. All this was done to gain knowledge of the domain in which the thesaurus operates; “thesaurus was on the agenda, but the design was to be based on the results of the domain study” (Lykke 2001, 774).

2.1 Art & Architecture Thesaurus (Getty Research Institute)

Consulting thesauri related to the subject before undertaking the construction of a new thesaurus is mandatory. For this case study on cultural venues, the most representative is the Art & Architecture Thesaurus (AAT) of the Getty Research Institute. The AAT has a published Spanish translation; however, we observed that its approach is centered on the Anglo-Saxon context, with hierarchical and faceted terms related to art, architecture, and other material culture, associated concepts, periods, activities, etc. Neither its hierarchies nor its terminological approach is focused on a university environment, which is where our thesaurus will operate, since the AAT reflects terms linked to the arts, entertainment, and shows across a wider cultural spectrum. The relations in the AAT are too broad, and therefore confusing, for the communicative situation that our thesaurus is meant to address.
This thesaurus can thus contribute to adapting Anglo-Saxon terms to an Ibero-American context. “The need for the definition of an international ‘glossary’ for architectural heritage and by extension, for cultural heritage has arose. This need has originated from the fact that there are various methodologies regarding heritage documentation. Various vocabularies and thesauri are used in the field of conservation, while the variety of ‘uniqueness’ of each cultural artefact turns its categorization into a difficult endeavor. In addition, not only spatial information needs to be standardized, but also the related metadata. Multilingualism, the translation of terms and the existence of many local words for the description of the same object, are the most important challenges when structuring vocabularies. Therefore, the attempt to describe an object with terms understandable to every culture and the adoption of a common ‘linguistic ground’ meets a number of difficulties” (Maietti et al. 2018, 107).

Unlike the AAT, the thesaurus developed here groups and organizes the terms that describe cultural spaces, technical resources, and areas and attributes as the main categories from which other branches are derived. These needs create insurmountable differences, not only in the main categorization but also in the terms that represent concepts in a university cultural environment, which, for example, does not include mass entertainment.

Therefore, the differences and needs in the terminological structure must be addressed through contributions from different disciplines, since a single approach is not enough to explain social, natural, economic, and cultural phenomena. Hence, it is necessary to broaden the horizon with conceptual elements representative of the interests of indexing and information retrieval (Hjørland 2002a).
“It is really important to know the most important information sources in one or more domain at a rather detailed level. It has a strong relevance for practical information work” (Hjørland 2002b, 425).

2.2 Definition of the University Cultural Infrastructure

Because the purpose of this thesaurus is to organize the terminology of an information system, the next step was to define and understand the concept that would properly name the information system being developed. After studying and analyzing its scope of operation, we chose University Cultural Infrastructure. In a general sense, infrastructure can be understood as the set of properties and resources that an individual, company, or institution has. More specific definitions, however, come from institutions and initiatives involved in national cultural development in the Anglo-Saxon world. For the government authorities in London, cultural infrastructure is the set of creative workspaces, performing arts rehearsal spaces, music recording studios, and film and television studios (Khan 2019). The national organization dedicated to cultural development in Canada considers that cultural infrastructure comprises resources and spaces built specifically for, or adapted to, cultural use; examples include performing arts centers, galleries, and museums (CCNC Special Editions 2009). For the government of New South Wales, Australia, cultural infrastructure consists of the buildings built or acquired to create, share, and enjoy artistic and cultural activities, such as theaters, galleries, museums, libraries, archives, community rooms, cinemas, public art, and spaces for outdoor events (Create New South Wales 2020).

From the information presented above, it can be deduced that, in Anglo-Saxon countries, cultural infrastructure refers specifically to movable and immovable property that is conserved, acquired, or adapted for the performing arts, such as drama, music, and dance, activities usually carried out in dance halls, concert halls, auditoriums, theaters, galleries, and museums. In a university environment, cultural infrastructure also includes other types of spaces used for activities related to academic life. For the purposes of this work, the University Cultural Infrastructure (ICU, from the Spanish Infraestructura Cultural Universitaria) is defined as all those cultural spaces where artistic and cultural activities such as dance, theatrical performances, film screenings, conferences, concerts, art shows, etc. are held and where technical resources (movable goods, that is, the set of tools, instruments, and artifacts used to perform cultural activities) are required. These resources can be specialized and used for various events.

2.3 Exploration of spaces and resources

A first approach to the domain that the database would cover, and to the terminology used, was defined by observing the cultural enclosures and their components, supported by the interviews. We classified the types of spaces based on the resources that characterize them and compared an initial terminology with the definitions in specialized works such as glossaries, dictionaries, encyclopedias, controlled vocabularies, university academic works, videos, and images. The plans of the university units were also consulted to obtain additional data. We then selected or discarded the consulted works and digital content based on how thoroughly they described the distinct entities and on their compatibility with the ICU. We developed a questionnaire to interview the authorities and managers of university spaces, which was first tested in certain representative venues.
Additionally, we compared it with other questionnaires used in the cultural information systems of other national and foreign institutions. On this basis, we visited the university units in Mexico City and in the various Mexican states where the university owns facilities. The questionnaire was also sent to university units abroad so that they could be included in the information system. By combining the descriptive and exploratory methods and applying them to the definitions of the existing enclosures, we carried out a terminological integration to form a specialized vocabulary. This allowed a first categorization, typifying the cultural spaces so as to develop separate domain trees for each group: auditorium, library, cinema, esplanade, outdoor forum, museum, concert hall, conference room, audiovisual projection room, multipurpose room, dance hall, music room, and theater.

2.4 The ICU domain trees

After reviewing and selecting the ICU domain terminology, we designed a domain tree for each cultural enclosure. This allowed us to identify the differences and similarities between the elements that compose the different spaces. The first venue analyzed was the theater, the cultural space par excellence; it was therefore the starting point for determining the terminology of similar cultural spaces, such as the auditorium, the cinema, concert halls, and conference halls. These enclosures have components and elements in common (room, armchairs, hallways, etc.); the differences lie in their architectural design and scale, as well as in the activities carried out in them. Although the university’s cultural spaces include both enclosures and multipurpose sites for these activities, it is necessary to distinguish each of them, both by its particular characteristics and by its technical resources.
This is achieved by visiting and becoming acquainted with the distinct enclosures, observing their characteristics and properties through the terminology that represents each of them. Some terms are specific to one area; in that case, they become the qualities that clearly distinguish that area from the rest. An arborescent structure should avoid, as far as possible, repeating words and denominations. Consequently, two cultural spaces should not sit in the same field or level, since each differs in its attributes and in the activities carried out in it. Domain trees must be based on facts and documents that prove the existence of the terminology; however, there is a subjective bias in the interpretation of the contents and the lexicons.

“The pragmatic approach to classification through meaningful units of knowledge must be based on recognition of the obvious truth that any single unit may be meaningful in any number of different relationships depending on the immediate purpose. Thus, it is the external relations, the environment, of the concept that are all-important in the act of classifying… Relationship is not a universal, but a specific fact unique to the things related, and just as these relations reveal the nature of relata, so the relata determine the character of the relationship” (Shera 1951, 83-84).

Shera’s statement is perhaps the argument that justifies building a special thesaurus: a pragmatic classification highlights the properties of a concept that are necessary in a specific information system. The relations developed for one information system differ from those that stand out in another system with different objectives (Kwaśnik 2019).

Figure 1. Comparison between the domain trees of the audience spaces in a theater and an auditorium.

Most of the similarities between the terms that name the spaces were found in their public areas.
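The kind of comparison summarized in Figure 1 can be approximated by treating each domain tree as a nested mapping and intersecting the terms found under a given branch. This is a minimal sketch under our own assumptions, not the authors' procedure: only “room”, “armchairs” and “hallways” come from the text, while the branch names and every other term (boxes, lobby, fly tower, orchestra pit, lectern) are hypothetical placeholders. With these invented trees, the shared terms appear in the audience area and none in the stage branch, which is the pattern the comparison is meant to surface.

```python
from typing import Dict, Set

# A domain tree as nested dicts: each key is a term, each value its narrower terms.
DomainTree = Dict[str, "DomainTree"]

# Hypothetical trees; only "room", "armchairs" and "hallways" come from the text.
theater: DomainTree = {
    "audience area": {
        "room": {"armchairs": {}, "boxes": {}},
        "hallways": {},
        "lobby": {},
    },
    "stage": {"fly tower": {}, "orchestra pit": {}},
}

auditorium: DomainTree = {
    "audience area": {
        "room": {"armchairs": {}},
        "hallways": {},
        "lobby": {},
    },
    "stage": {"lectern": {}},
}


def terms(tree: DomainTree) -> Set[str]:
    """Collect every term in the tree, at any depth."""
    found: Set[str] = set()
    for term, narrower in tree.items():
        found.add(term)
        found |= terms(narrower)
    return found


def compare(a: DomainTree, b: DomainTree, branch: str) -> Dict[str, Set[str]]:
    """Shared and distinguishing terms beneath one top-level branch of two trees."""
    ta, tb = terms(a.get(branch, {})), terms(b.get(branch, {}))
    return {"shared": ta & tb, "only_a": ta - tb, "only_b": tb - ta}


if __name__ == "__main__":
    for branch in ("audience area", "stage"):
        print(branch, compare(theater, auditorium, branch))
```

Terms that appear only in one tree are the candidates for the distinguishing qualities mentioned above, while shared terms are candidates for a common branch rather than being repeated in every tree.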
Thus, it became evident that we could not simply classify the spaces by type of enclosure; instead, we had to classify them according to their relationships with the areas into which they are divided.2 On the one hand, outdoor spaces, such as open-air forums and esplanades, have very few elements that support cultural activities. Nevertheless, despite being open places, they need a specific category that identifies them. It is therefore important to categorize their elements in order to link them with the technical resources that support outdoor presentations, such as concerts, fairs, exhibitions, and festivals. On the other hand, some spaces are part of larger architectural ensembles. For example, a museum’s architectural program can include, in addition to exhibition spaces, a library and even an auditorium. In this case, repeating the arborescent structure of the museum’s auditorium or library would be redundant.

2 Some Spanish terms have no clear equivalent in English; there are no English definitions or names for certain parts of auditoriums and theaters.

The close relations among the different cultural spaces rest, above all, on the similarities between the elements that compose them and the resources integrated in them. These diverse elements showed that the general arborescent system needed to be organized differently. That is, the organization should not be based on the different cultural spaces that exist at the University. Instead, we had to relate the spaces in such a way that the characteristics and resources of each would be distinguished by the sum of what they do and do not possess in their exteriors and interiors.
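The idea of distinguishing spaces by the sum of what they do and do not possess can be illustrated by treating each space as a set of areas and attributes, and each activity as a set of requirements. This is a minimal sketch under our own assumptions, not the authors' implementation: the space names, attribute labels, and requirement sets are all hypothetical. A space qualifies for an activity when it possesses every required attribute, and the set difference shows what it lacks; in this toy data, an auditorium whose stage has access to an outside area qualifies for a play just as a theater does.

```python
from typing import Dict, FrozenSet, List

# Hypothetical spaces, each described by the areas/attributes it possesses.
SPACES: Dict[str, FrozenSet[str]] = {
    "Theater A": frozenset({"stage", "backstage", "stage exterior access",
                            "audience seating", "lighting rig"}),
    "Auditorium B": frozenset({"stage", "stage exterior access",
                               "audience seating", "projector"}),
    "Auditorium C": frozenset({"stage", "audience seating", "projector"}),
    "Esplanade D": frozenset({"open air", "power outlets"}),
}

# Hypothetical requirements: what an activity needs the space to possess.
REQUIREMENTS: Dict[str, FrozenSet[str]] = {
    "play": frozenset({"stage", "stage exterior access", "audience seating"}),
    "film screening": frozenset({"audience seating", "projector"}),
    "outdoor concert": frozenset({"open air", "power outlets"}),
}


def candidate_spaces(activity: str) -> List[str]:
    """Spaces whose attribute set contains every attribute the activity requires."""
    needed = REQUIREMENTS[activity]
    return sorted(name for name, attrs in SPACES.items() if needed <= attrs)


def missing(activity: str, space: str) -> FrozenSet[str]:
    """What a space lacks for an activity (the 'do not possess' side)."""
    return REQUIREMENTS[activity] - SPACES[space]


if __name__ == "__main__":
    for activity in REQUIREMENTS:
        print(activity, "->", candidate_spaces(activity))
    print("Auditorium C lacks for a play:", set(missing("play", "Auditorium C")))
```

Describing spaces by attributes rather than by type means no space's components need to be repeated under another space, which is the redundancy the categorization tries to avoid.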
The main categories should have the conceptual capacity to characterize the spaces by their similarities and differences without repeating the component elements of each space. This is also useful for programming: consider, for example, an auditorium whose stage communicates with an external area, as in theaters. Relating this auditorium to an area outside the stage shows that elements can be introduced into it so that it functions as a theater. It was necessary to create a simple categorization that highlighted the elements common to different spaces, brought programming possibilities together, and did not repeat the shared characteristics in each space. In addition, having developed an arborescent structure for each cultural space allowed us to understand its composition and elements and to establish appropriate connections between the different categories, classes, and subclasses, thereby integrating the terminology. For this purpose, a conceptual universe was created. It includes three categories within the domain of the ICU: A) Cultural spaces, B) Technical resources, and C) Areas and attributes. Some of these elements are closely connected, depending on their degree of affinity.

Figure 2. Conceptual universe of the thesaurus of the ICU and its three main categories.

3.0 Discussion and results

Each category contains an arborescent structure (which includes related elements) that is associated with other thematic branches without losing its own hierarchical composition. Associations are thus the way in which the terms of each category are intertwined. Related terms (RT) can express “many-to-many” relationships that connect elements to the categories, as well as “one-to-many” relationships that link associations shared by more than one term.
Specific terms (ST), however, which follow a “one-to-one” principle, are also necessary, depending on the type of link between spaces, resources, areas, and attributes. The category of cultural spaces (A) comprises all the places where cultural activities are programmed and carried out. This category contains the classes and subclasses that describe the architectural elements, understood as fixed spaces physically delimited within a general area or space, as well as the components of those architectural elements, which can be fixed or semi-fixed and support the main activity, depending on the space in question. Below, we present a diagram of the connections or associations among the three main categories. We specifically highlight the Cultural Spaces category, because it contains both the areas and the places where technical resources are used.

Figure 3. Relations between the general elements of each category.

3.1 Evaluation

Once the proposal was developed and entered into the software, the related and hierarchical descriptors were obtained. The indexes created with the help of the software were handed to UNAM’s Cultural Diffusion Coordination, which is directly responsible for evaluating the information system. After this evaluation, the final validation of the thesaurus will proceed on the basis of the evaluators’ observations. Synonymy relations and descriptor meanings were incorporated only in some cases, where necessary, to present the structure more simply to the evaluators who will be in charge of operating the information system. “The relationship of intersection refers to the relationship that the meaning of one word intersects with the meaning of the other word to a certain extent. In this case, the two words are at the same level. There is no upper term, nor lower term, which is the case in the previous relationship.
Accurately speaking, most semantic synonyms are also in this relationship. The difference between intersected semantic synonyms and intersected contextual synonyms lies in the fact that the intersected part of semantic synonyms is the whole part of one meaning of the synonymous words, whereas the intersected part of contextual synonyms is only a part of one meaning of the synonymous words” (Zeng 2007, 35).

Although the categories are grouped by their common elements through intersection, the relationships created to make the thesaurus functional can be more complex, as is the case with the synonyms of the domain, which, in an abstract sense, share the same level regardless of the community’s order of preference. Synonyms were therefore not included in this first stage, so that the activity organizers could express their preferences without being influenced by other contexts. “As mentioned above, a core problem in IR is the adequate “mental modeling” of subject literatures. What categories and concepts are we talking about? In interacting with subject literatures, users are interacting among other things with (1) Different kinds of knowledge fields with different social and cognitive organization. (2) Different languages for special purposes (LSP) (3) Different kinds of research methods (4) Different kinds of, among other things, primary, secondary and tertiary documents (5) Different patterns of cognitive authority. (6) Different semantic distances between questions and documents (cf. Brooks, 1995)” (Hjørland 2002b).

Although this paper presents only the hierarchical and associative relationships, the thesaurus itself is already complete, in a fuller version that includes synonymy relationships, the meanings of the terms, and scope notes that define the terms within the field of operation.
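The hierarchical and associative relationships presented in this paper can be pictured as a small in-memory term store. This is a sketch under our own assumptions, not the software the authors used: the `Thesaurus` class and its methods are hypothetical, the relationship tags follow the conventional BT/NT/RT notation of ISO 25964 rather than the paper's ST label, and only the three top terms come from the paper (Theater and Auditorium appear in its list of spaces; Stage and Lighting equipment are illustrative). Hierarchical links are stored reciprocally and RT links symmetrically, so a single term such as Stage can be related to several spaces (one-to-many), and the network as a whole becomes many-to-many.

```python
from collections import defaultdict
from typing import DefaultDict, List, Set


class Thesaurus:
    """Hypothetical minimal term store with reciprocal BT/NT and symmetric RT links."""

    def __init__(self) -> None:
        self.bt: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> broader terms
        self.nt: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> narrower terms
        self.rt: DefaultDict[str, Set[str]] = defaultdict(set)  # term -> related terms
        self.terms: Set[str] = set()

    def add_narrower(self, broader: str, narrower: str) -> None:
        # Store the hierarchical link in both directions (BT/NT reciprocity).
        self.terms |= {broader, narrower}
        self.nt[broader].add(narrower)
        self.bt[narrower].add(broader)

    def add_related(self, a: str, b: str) -> None:
        # Associative links are symmetric; a term may have many RTs.
        self.terms |= {a, b}
        self.rt[a].add(b)
        self.rt[b].add(a)

    def entry(self, term: str) -> List[str]:
        """Render one term as a thesaurus-style display entry."""
        lines = [term.upper()]
        for tag, rel in (("BT", self.bt), ("NT", self.nt), ("RT", self.rt)):
            lines += [f"  {tag} {t}" for t in sorted(rel.get(term, set()))]
        return lines

    def index(self) -> List[str]:
        """Alphabetical display of every term, similar to a printed index."""
        return [line for t in sorted(self.terms) for line in self.entry(t)]


if __name__ == "__main__":
    th = Thesaurus()
    # Top terms are the paper's three categories; narrower terms are illustrative.
    th.add_narrower("Cultural spaces", "Theater")
    th.add_narrower("Cultural spaces", "Auditorium")
    th.add_narrower("Technical resources", "Lighting equipment")
    th.add_narrower("Areas and attributes", "Stage")
    th.add_related("Theater", "Stage")
    th.add_related("Auditorium", "Stage")
    th.add_related("Theater", "Lighting equipment")
    print("\n".join(th.entry("Theater")))
```

Run as a script, this prints the entry for Theater with its broader term Cultural spaces and its related terms Lighting equipment and Stage. Synonymy (USE/UF) and scope notes could be added as further mappings in the same way.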
Technicians are being consulted through the university body in charge of Cultural Diffusion, which is composed of specialists responsible for the enclosures, in order to validate the thesaurus and obtain a quality intellectual product capable of responding to the needs of the communities concerned.

4.0 Conclusions

The three main categories into which the terminological relationships of the University Cultural Infrastructure are organized were reached after several conceptual approaches, in particular after contrasting and comparing the domain trees of the different facilities registered in the university environment. The initial categories for organizing the thesaurus were the types of venues at the university; after trying to relate them to one another, we noticed that they were not adequate. We understood that this was because the venue types share similar characteristics that do not help to define them. The enclosures are better defined through the areas and attributes related to them. Domain trees are useful for starting to develop hierarchical relationships between terms; however, the thesaurus must present an integrating tree that is not merely the sum of the individual domain trees. Thesauri are structures specific to an area of operation, because building them means facilitating understanding and communication by studying and reflecting the terms used by the community for which they are intended. They also reflect the characteristics that need to be highlighted to fulfill the system’s objectives. The relationships among meaningful units of knowledge follow a pragmatic classification; this is crucial to keep in mind when structuring a thesaurus to give clarity to an information system. The University Cultural Infrastructure Thesaurus is useful for organizing the information system on university events, making the most of the available spaces and resources.
Its construction was a collaborative intellectual effort to solve the communicative situation; we achieved this by conceptualizing and relating the characteristics of the environment in which the information system operates.

References

CCNC Special Editions. 2009. “Cultural Infrastructure: An Integral Component of Canadian Communities.” Creative City News 5: 1.
Create New South Wales. 2020. Cultural Infrastructure.
Fugmann, Robert. 1992. “Indexing Quality: Predictability versus Consistency.” International Classification 19, no. 1: 20-21.
Hjørland, Birger. 2002a. “Epistemology and the Socio-Cognitive Perspective in Information Science.” Journal of the American Society for Information Science and Technology 53, no. 4: 257-270.
Hjørland, Birger. 2002b. “Domain Analysis in Information Science: Eleven Approaches – Traditional as well as Innovative.” Journal of Documentation 58: 422-462.
ISO. 2011. ISO 25964-1:2011, Information and Documentation – Thesauri and Interoperability with Other Vocabularies – Part 1: Thesauri for Information Retrieval. Geneva: ISO.
Khan, Sadiq. 2019. Cultural Infrastructure Plan: A Call to Action.
Kwaśnik, Barbara H. 2019. “Changing Perspectives on Classification as a Knowledge-Representation Process.” Knowledge Organization 46: 656-667.
Lykke, Marianne. 2001. “A Framework for Work Task-Based Thesaurus Design.” Journal of Documentation 57: 774-797.
Maietti, Federica, Marco Medici, Federico Ferrari, Anna Elisabetta Ziri, and Peter Bonsma. 2018. “Digital Cultural Heritage: Semantic Enrichment and Modelling in BIM Environment.” In Digital Cultural Heritage: Final Conference of the Marie Skłodowska-Curie Initial Training Network for Digital Cultural Heritage, ITN-DCH 2017. Switzerland: Springer, 107.
Shera, Jesse H. 1951. “Classification as the Basis of Bibliographic Organization.” In Bibliographic Organization: Papers Presented before the Fifteenth Annual Conference of the Graduate Library School, July 24-29, 1950, edited by Jesse H. Shera and Margaret E. Egan, 72-93. Chicago: University of Chicago Press.
Zeng, Xian-mo. 2007. “Semantic Relationships Between Contextual Synonyms.” US-China Education Review 4, no. 9: 33-37.

Inger Beate Nylund – OsloMet - Oslo Metropolitan University, Norway

Using the Concept of Warrant in Designing Metadata for Enterprise Search

Abstract: Metadata is an issue of growing concern in enterprise search. Several authors argue that adding metadata can improve the findability of content (Cleverley and Burnett 2015; Schymik et al. 2015; Stocker et al. 2015; White 2016). This paper proposes to use the concept of warrant when designing metadata for enterprise search. The paper combines the concept of warrant (Barité 2018; Beghtol 1986) with other concepts to analyse eighteen articles in library and information science journals from the period 2000-2019. The articles report on the design of thesauri, taxonomies, classification schemes, metadata or ontologies in work domains. The results indicate that the main warrants used in the articles are information sources’ warrant, task performers’ warrant, and work context warrant. The warrants illuminate where and how the concepts and terms in the designed systems are grounded. We argue that the concept of warrant is useful to analyse and choose different perspectives when designing metadata, but that other concepts and frameworks are needed for evaluation and implementation of metadata in enterprise search.

1. The aim and scope of the study

This study is concerned with the design of metad