Daniel Libonati Gomes, Thiago Henrique Bragato Barros, The Bias in Ontologies: An Analysis of the FOAF Ontology in:

International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (Eds.)

Knowledge Organization at the Interface, page 236 - 244

Proceedings of the Sixteenth International ISKO Conference, 2020 Aalborg, Denmark

1. Edition 2020, ISBN print: 978-3-95650-775-5, ISBN online: 978-3-95650-776-2, https://doi.org/10.5771/9783956507762-236

Series: Advances in Knowledge Organization, vol. 17

Daniel Libonati Gomes – Federal University of Pará, Brazil
Thiago Henrique Bragato Barros – Federal University of Rio Grande do Sul, Brazil

The Bias in Ontologies: An Analysis of the FOAF Ontology

Abstract: Knowledge Organization Systems (KOS), such as thesauri, classification schemes, taxonomies, or ontologies, are essential tools for the organization and representation of information in various contexts and are often understood as neutral tools without any bias. However, we can argue that representing information expresses, even if unconsciously, some form of prejudice (that is, a bias) on the part of the person who creates the system. A selection of the elements to be represented is required in any KOS, since every representation has a specific function that is related to a context. Ontologies are an excellent example of this because, as Guarino, Oberle, and Staab (2009) state, these KOS need to delimit their goal to enable reuse and avoid problems arising from an excess of ontological commitment. With that in mind, we seek to discuss the possible biases that a KOS may have, focusing on ontologies and taking as our object of analysis the Friend of a Friend (FOAF) ontology. We characterize this research as descriptive, with a qualitative approach. The objective is to understand the implications of bias in these KOS and to discuss how Knowledge Organization, as a field of study, can act in the development of tools that recognize their own bias and are still able to perform their functions. For the analysis, we use the theoretical framework of Discursive Semiotics, which studies the formation of meaning as a phenomenon through a model called the Generative Trajectory of Meaning (GTM). From this perspective, we can understand bias as a product of semiotic processes (figurativization, thematization, and discursivization; Greimas and Courtés 2013) involving the social and cultural contexts of the KOS developer (Gomes and Barros 2019a, 2019b).
From this theoretical understanding, all the elements that constitute the FOAF ontology (classes and properties) are analyzed, as well as its documentation available online. We conclude that bias is an inherent feature of a KOS and that Knowledge Organization could focus on studying technologies that enable information retrieval while taking this aspect of its tools into account, in order to: (1) go beyond the KOS bias, using, for example, "see also" connections that act as hyperlinks to systems with other biases that better fit the user's needs; or (2) "learn" the various perspectives that exist on the same topic, represent them in a KOS, and direct users to those best suited to their needs, in which case issues such as Machine Learning and Artificial Intelligence should enter the discussion, making these tools more semantically enriched.

1.0 Introduction

This study aims to discuss the consequences of the presence of one or more biases in Knowledge Organization Systems (KOS), taking ontologies as an example. To this end, we analyzed the elements that make up the Friend of a Friend (FOAF) ontology, which, as its name implies, aims to represent individuals and their relationships. Based on this analysis, we also seek to demonstrate how important it is to understand and make explicit the bias of a KOS, since taking it into account can be fundamental for efficient information retrieval.

2.0 The method

For the analysis of the FOAF ontology, we used the theoretical tools of Discursive Semiotics, considering that this theory makes it possible to understand how the meaning of a discourse is formed; in the present case, the discourse is an ontology. By discourse, we mean the concretization, in language, of a particular social, historical, ideological, and environmental context (Possenti 2009).
Thus, Discursive Semiotics studies the mechanism by which a given discourse is shaped, and when applied to ontologies or KOS in general, it can reveal important features of their constitution, especially those related to the context that shapes them (Gomes and Barros 2019b).

Discursive Semiotics, for didactic reasons (Greimas and Courtés 2013), adopts a model called the Generative Trajectory of Meaning (GTM), which organizes the formation of discourse in two levels of depth; that is, it goes from the semantic level to the discursive. It is essential to highlight that there are several GTM models, and in this work we adopted the one developed by Greimas and Courtés (2013).

The GTM has two aspects: (1) semionarrative structures and (2) discursive structures. A semiotic analysis starts from the discursive structures, which organize the contextual elements that form the discourse and make it understandable, and goes into the semionarrative structures, formed by elements called actants, which can act on and transform one another. The actants, in turn, are formed by even smaller and completely abstract units, called semes, which gain meaning from their interaction with opposite, complementary, and contradictory semes. The interaction between semes generates what we can understand as the "meaning" or "particular meaning" of a given word.

As the focus of this work is the bias that possibly exists in an ontology, we chose to pay closer attention to the GTM's discursive level. At this level, a series of operations takes place, starting from the semionarrative structures and covering them with the contextual component mentioned above: the social, historical, ideological, and circumstantial context.
The operations that occur at this level are:
• Discursivization: it makes explicit those involved in the discourse, forming actors (actorialization), as well as the space (spatialization) and time (temporalization) in which the discourse was enunciated;
• Figurativization: the actors gain a semantic investment, becoming figures; that is, they can be understood as something real;
• Thematization: it is an abstract thematic covering on which the figures act.

Thus, all operations that lead from semionarrative to discursive structures come into contact with some linguistic system, forming lexemes (in a way, words), through which we interact and put the discourse into practice, considering a given context.

However, this explanation is still too general and was designed especially for discourses in action, which is not the case with ontologies. KOS of this type are built to be general, with reuse (Gruber 1995) and domain representation in mind, and therefore end up being quite generalist. In previous work (Gomes and Barros 2019a), we highlighted that the concepts that constitute ontologies can be understood, like any word, as lexemes formed from the semiosis operations explained above. Thus, the concepts can be considered figures within the represented discursive universe; that is, they went through the process of figurativization so that they have a precise semantic coating. Thematization occurs based on the domain that is being represented in the ontology, considering that the concepts can only be understood through the abstract coating given by the themes. Finally, given the existence of the ontologist (after all, it is they who construct the discourse, the ontology), it is possible to affirm that actorialization, temporalization, and spatialization also occur. These processes allow us to situate concepts in relation to the referents they seek to represent, together with the perspective of those who produce the discourse.
The following figure shows the elements that make up an ontology (including its developer) in light of the GTM's discursive structures:

Figure 1 – The GTM’s discursive structures on ontologies

Therefore, the discussion carried out in this work is based on the idea that the lexemes that form an ontology gain meaning through a series of semiotic operations involving the formation of figures (which gain meaning because they are linked to some theme) and the formation of the ontologist as a semiotic actor, present in a specific time and space.

Thus, the study of the FOAF ontology started from the semiotic approach. We characterize this research as descriptive and qualitative. The FOAF ontology was analyzed on the basis of its documentation¹, which explains its objectives and constitution (classes and properties), using the concepts presented in this section as a theoretical foundation. The analysis sought to clarify how the semiotic processes that occur in the GTM discursive structures end up generating a bias in the ontology, even against the will of the ontologist.

¹ FOAF Vocabulary Specification 0.99. Available at: http://xmlns.com/foaf/spec/20140114.html.

Initially, we studied the general information about FOAF, such as its objectives and uses; we then moved on to the classes and properties that comprise it. To check for the presence of bias in the ontology, we observed which of the classes and properties, as well as their descriptions in the documentation, engage with a more specific context or ideology, for example, which classes or properties represent things that are present in a given region of the world. Therefore, a class like foaf:Person², based on the distinction between what is and is not a person, says much less about the bias of the ontology than a property like foaf:gender, which has more significant social implications, given the discussions about gender, so we chose to pay more attention to foaf:gender.
Thus, we selected some of these classes and properties to guide the discussion about the ontology's bias. Starting from the explanation of the bias present in FOAF, we propose a discussion about how this bias can affect an ontology and about the actions that can be taken so that this phenomenon is not necessarily a problem.

3.0 The FOAF ontology and its bias

The FOAF ontology, as stated in its documentation, was developed to connect people and information through the Web, and this information can be anything from documents in any medium to data or even just ideas in someone's mind. For this, FOAF integrates three types of networks: “social networks of human collaboration, friendship, and association; representational networks that describe a simplified view of a cartoon universe in factual terms, and information networks that use Web-based linking to share independently published descriptions of this inter-connected world” (Brickley and Miller 2014).

FOAF's terms are divided into three broad categories: (1) Core, formed by terms that involve people and groups regardless of time and technology; (2) Social Web, formed by terms related to activities carried out on the Web; and (3) Linked Data utilities, formed by terms that can be useful for the Web community to connect data. Despite this division, the documentation explicitly distinguishes only the terms of categories 1 and 2. It is worth noting that this ontology can always be updated with the insertion of new terms, and that old terms, called "archaic" in the documentation, are always maintained so that old forms can become modern again (Brickley and Miller 2014). The following image, taken directly from FOAF's documentation, lists all its classes and properties:

Figure 2 – FOAF classes and properties

As we can see, some classes and properties are quite general, as is the case with foaf:Agent, defined as something capable of doing something.
This class can be used to represent situations in which the being who is acting is not exactly a person or group of people (it can be a software bot, for example). A subclass of foaf:Agent is foaf:Person, used to represent people, who may be alive, dead, or not even exist at all. This broader scope is fundamental in the case of FOAF, which is an ontology that aims to be widely reused in the most diverse situations that involve the connection between people and information on the Web.

² To reference the names of the classes and properties of the ontology, we have chosen to use the same form as the one present in the FOAF documentation. Thus, something like foaf:Agent (capitalized) is a class, whereas foaf:knows (lowercase) is a property.

In the theoretical scope of Information Science, a concept present in FOAF that generates much discussion is represented by the class foaf:Document. We know that there are several perspectives on the concept of a document, such as those presented by Suzanne Briet (1951), Michael Buckland (1997), and Berndt Frohmann (2009). The FOAF documentation states only the following about this class: “The Document class represents those things which are, broadly conceived, ‘documents’” (Brickley and Miller 2014). With this quite broad definition of the class, it is already possible to perceive more clearly the presence of bias in the ontology, although that specific bias does not generate any negative consequences. The definition of a document is not a common concern in all areas, but it is crucial for Information Science, which has in this object (Buckland's "information as thing" (1991)) one of its focuses of study. Thus, the ontology, even if indirectly and unintentionally, ends up “taking sides", even if this does not produce harmful effects for what it proposes. However, there are elements in FOAF that are less subtle and that make the ontology's bias even more explicit.
The foaf:gender property, already mentioned, can cover several different perspectives. From a more conservative ideological perspective, there are only two genders, male and female; from a more progressive perspective, however, gender is understood in a much less fixed way. The FOAF documentation says the following about that property: “The gender property relates an Agent (typically a Person) to a string representing its gender. In most cases, the value will be the string 'female' or 'male' (in lowercase without surrounding quotes or spaces). Like all FOAF properties, there is, in general, no requirement to use gender in any particular document or description. Values other than 'male' and 'female' may be used, but are not enumerated here. The gender mechanism is not intended to capture the full variety of biological, social, and sexual concepts associated with the word 'gender'” (Brickley and Miller 2014).

In other words, the definition of foaf:gender recognizes the diversity of perspectives about gender. To avoid taking a stand and to maintain a high level of generality in the ontology, the developers chose to leave the term open, without mandatory use in any circumstances, while indicating that the most common way to fill this property is with the strings "male" and "female." The authors themselves make it clear, later in the document, that they are aware of the difficulty of working with the concept of gender: "We have tried to be respectful of diversity without attempting to catalog or enumerate that diversity" (Brickley and Miller 2014).

As explained in the previous section, we can account for this phenomenon of bias in an ontology using Discursive Semiotics: given that an ontology is a discourse, those responsible for developing it (in the case of FOAF, the ontologists) have a bias.
These subjects, being inserted in a given spatial and temporal reality and being able to act on the things that exist in the world, end up transferring to the ontology a portion of all the aspects that shape them. To understand this, we could imagine how different the description of foaf:gender might have been 30 years ago; that is, the current description is the result of the temporalization process explained previously, which inserts the actors in a particular temporal reality.

The spatial aspect of FOAF's discourse can be seen when comparing the pairs of properties foaf:firstName versus foaf:givenName and foaf:lastName versus foaf:familyName. In a way, each pair refers to the same thing; however, as the developers themselves state, the “concepts of ‘first’ and ‘last’ names do not work well across cultural and linguistic boundaries; however they are widely used in address books and databases” (Brickley and Miller 2014). That is, although the two ways of referring to someone's first and last names are valid in some cultures, in others someone's last name is not the same as their family name. For example, in some Eastern countries, the first name is the family name, and the last name is the given name.

As another example of the presence of the ontologists' contextual aspects in FOAF, we can point out the inclusion of the property foaf:dnaChecksum, labeled as “archaic” (which could be used to verify the data integrity in the DNA transference from a person), created as a joke by the developers. The only objective of this inclusion was to demonstrate the great diversity of properties that could be created to identify someone, some of which, the developers add, “we might find disturbing” (Brickley and Miller 2014).

4.0 The consequences of the presence of bias in FOAF and KOS in general

In order to understand how a specific bias can affect an ontology like FOAF, it is worth highlighting some requirements that it must meet.
According to Gruber (1995), an ontology must have:
• clarity: the concepts present in an ontology must be clear and objective so that the definitions do not depend on social contexts or computational requirements. That is why ontologies are generally developed in a formal language, using logical axioms. Also, in order to facilitate the understanding of the ontology by a human being, it is highly recommended that the definitions be documented in natural language (as is the case with the documentation analyzed here);
• coherence: the axioms that make up the ontology must be coherent so that logical inferences can be easily made. There can be no contradictions between the definitions;
• extendibility: an ontology must be developed taking into account that the vocabulary can be reused in some other situation. Thus, the elements that make up the ontology must be open enough for new terms to be inserted without the need to change those already present;
• minimal encoding bias: ontologies must be formed by the concepts they want to represent regardless of the computational language used in their development, since they are used by different systems and styles of representation;
• minimal ontological commitment: an ontology must have as small an ontological commitment as possible in order to allow knowledge to be shared and reused and to support interoperability between systems. The fewer statements that are made about the discursive universe, the better, so that it is preferable, in many cases, that only the necessary (but not sufficient) characteristics of a given concept are made explicit (however, for reasons of clarity, a complete definition, with necessary and sufficient characteristics, must be provided when possible).
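The criterion of minimal ontological commitment can be illustrated with the FOAF name properties compared in Section 3.0. The following is only a minimal sketch, not part of FOAF or of the analysis itself: the `PersonName` class, the helper functions, and the two sample people are hypothetical, and triples are modelled as plain Python tuples rather than with an RDF library. It shows that a record keyed on role (given or family name) can be rendered in either written order, whereas a reading keyed on position (first or last) silently commits to one cultural convention.

```python
from dataclasses import dataclass

FOAF = "http://xmlns.com/foaf/0.1/"  # FOAF namespace URI


@dataclass(frozen=True)
class PersonName:
    """Hypothetical record keyed on the role of each name part."""
    given_name: str   # corresponds to foaf:givenName
    family_name: str  # corresponds to foaf:familyName


def written_order(name, family_first):
    """Return the name parts in the order they are written (first, last)."""
    if family_first:
        return (name.family_name, name.given_name)
    return (name.given_name, name.family_name)


def naive_family_name(first, last):
    """A position-based reading: assumes the last name is the family name."""
    return last


def as_foaf_triples(subject, name):
    """Express the record as (subject, predicate, object) tuples."""
    return [
        (subject, FOAF + "givenName", name.given_name),
        (subject, FOAF + "familyName", name.family_name),
    ]


# Hypothetical people, used only for illustration.
maria = PersonName(given_name="Maria", family_name="Silva")
hanako = PersonName(given_name="Hanako", family_name="Yamada")

first, last = written_order(hanako, family_first=True)
print((first, last))                   # ('Yamada', 'Hanako')
print(naive_family_name(first, last))  # Hanako (wrong: this is the given name)
print(hanako.family_name)              # Yamada (role-based record stays correct)
```

The role-based record makes fewer assumptions about how a name is written, which is the sense in which it carries a smaller commitment than a first/last pair.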
An ontology that “takes sides” in a very explicit way faces the risk of not being able to meet its objectives efficiently, as this may affect some of the above requirements, more specifically its clarity, extendibility, and ontological commitment. For example, if FOAF adopted only foaf:firstName and foaf:lastName to express the names of individuals, it would be committing itself to a more specific reality, unlike what happens with foaf:givenName and foaf:familyName, which serve a larger share of the world population. Besides, in the case of foaf:gender, the decision not to make this property mandatory or to define default values for it also had the purpose of maintaining the above requirements, which would not happen if the developers were more assertive in their opinion.

In the case of other KOS, such as thesauri, the requirements above are different, especially concerning clarity or even ontological commitment. Many KOS must represent a given domain of knowledge in the most transparent possible way, with precise definitions (very different from the general characterization of ontologies). In such cases, the presence of a bias can further affect the retrieval of information through the KOS: it can affect the way users interact with the information they seek (that is, the KOS influences how the user understands the information), or it can negatively affect retrieval, since the way the information was represented may not be consistent with what the user understands. An excellent example of this can be seen in the work of Miranda and Costa (2019), in which the authors analyze how the bibliographic representation of texts on Umbanda is made, finding cases in which books referring to this religion were placed in class 133.4 of the DDC system (Demonology and witchcraft). In this situation, the bias of those responsible for the classification is quite evident, since, for Umbandists, their belief has nothing to do with witchcraft.
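The retrieval effect described in the Umbanda example can be sketched in a few lines. This is an illustrative assumption, not data from Miranda and Costa (2019): the catalogue records and titles below are hypothetical, and browsing is reduced to a prefix match on the class number. Only class 133.4 comes from the text; the sketch additionally relies on the fact that the DDC 200s are the Religion class.

```python
# Hypothetical catalogue records: (title, DDC class number).
CATALOGUE = [
    ("An Introduction to Umbanda (hypothetical title)", "133.4"),
    ("A History of World Religions (hypothetical title)", "200"),
    ("Witchcraft Trials in Europe (hypothetical title)", "133.4"),
]


def browse(catalogue, class_prefix):
    """Return the titles shelved under a DDC class-number prefix."""
    return [title for title, ddc in catalogue if ddc.startswith(class_prefix)]


def mentions(titles, word):
    """Return the titles whose text contains the given word."""
    return [t for t in titles if word.lower() in t.lower()]


# A user who understands Umbanda as a religion browses the 200s (Religion).
religion_shelf = browse(CATALOGUE, "2")
print(mentions(religion_shelf, "Umbanda"))    # []

# The book is only found under Demonology and witchcraft.
witchcraft_shelf = browse(CATALOGUE, "133.4")
print(mentions(witchcraft_shelf, "Umbanda"))
```

The record exists in the catalogue, but the classifier's perspective determines where it can be found, which is the mismatch between representation and user understanding discussed above.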
Based on these examples and on the semiotic process of meaning formation, not only of ontologies but of KOS in general, we understand that bias is an inherent element in information representation, although we try to avoid it. Even a very generalist ontology such as FOAF ends up “taking sides” regarding some themes.

5.0 Conclusion

In this work, we started from the principle that a KOS can be understood as a discourse and can therefore be analyzed with the theoretical tools provided by Discursive Semiotics, which allow us to understand how the meaning of that discourse is constructed. Thus, we analyzed the FOAF ontology as a discourse that carries with it some contextual aspects of those who develop it (who they are, where they live, when they are developing the ontology), so that, even against their will, it ends up transmitting some of these aspects in its elements. In other words, FOAF, like any KOS, is biased, and this is natural.

However, it is necessary to think about how Knowledge Organization can deal with this bias, since a KOS that transmits a particular idea in a very understandable way can end up negatively affecting information retrieval through its use. Given the inherent bias in KOS, we believe that, to deal with this situation, it is necessary to develop studies focused on technologies that enable the use of bias in favor of the users. An idea can be found in FOAF itself: “If people publish information in the FOAF document format, machines will be able to make use of that information. If those files contain ‘see also’ references to other such documents in the Web, we will have a machine-friendly version of today's hypertext Web” (Brickley and Miller 2014).

The use of “see also” references in KOS in general could be a way of connecting systems that deal with the same topic but have different perspectives. Alternatively, such references could be used even within a single system to support the different perspectives that may exist.
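As a rough illustration of this idea, the sketch below follows “see also” links between entries of different KOS that treat the same topic. It is a sketch under stated assumptions rather than a proposal from FOAF or from this paper: the links are assumed to be expressed with the RDF Schema property rdfs:seeAlso, the three KOS entries (`kos-a`, `kos-b`, `kos-c`) are hypothetical, and triples are modelled as plain tuples.

```python
from collections import deque

RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

# Hypothetical entries on the same topic in three different KOS.
TRIPLES = [
    ("kos-a:gender", RDFS_SEE_ALSO, "kos-b:gender-identity"),
    ("kos-b:gender-identity", RDFS_SEE_ALSO, "kos-c:sex-and-gender"),
    ("kos-c:sex-and-gender", RDFS_SEE_ALSO, "kos-a:gender"),
]


def related_perspectives(start, triples):
    """Breadth-first walk over see-also links, returning every entry
    reachable from `start` (excluding `start` itself) in discovery order."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        current = queue.popleft()
        for subject, predicate, obj in triples:
            if subject == current and predicate == RDFS_SEE_ALSO and obj not in seen:
                seen.add(obj)
                found.append(obj)
                queue.append(obj)
    return found


print(related_perspectives("kos-a:gender", TRIPLES))
# ['kos-b:gender-identity', 'kos-c:sex-and-gender']
```

A retrieval system could present the entries returned by such a walk as alternative perspectives, leaving the choice among them to the user; the `seen` set keeps circular references between systems from causing an endless loop.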
Another possible solution, considerably more complex, would be for the KOS itself, as part of a more extensive system, to have access to what the user tends to search for and to direct search results toward those already known needs. In other words, the KOS would learn what biases might exist on any topic and compare what it knows with what its user knows or usually searches for. In this case, Knowledge Organization should bring issues such as Machine Learning and Artificial Intelligence into the discussion.

Thus, the discussion proposed in this paper aimed to explain how, even in generalist KOS like ontologies, a bias can be evident and can affect information retrieval. As this bias is natural and attempts to avoid it are not entirely adequate (for reasons explained by semiotics), we highlight that a paradigm shift towards accepting the KOS bias could benefit the user. Since we understand bias as inherent to KOS, this work proposes that the developers of these tools pay attention to their own ideas and recognize the wide variety of existing perspectives on the theme that they seek to represent in their systems. With this in mind, they could look for ways to direct users to the tools that best fit their needs. This paradigm shift could start from more straightforward changes, such as the “see also” references, or from deeper studies on technologies that allow the automation of this process of verifying the bias and directing the user.

References

Brickley, Dan and Libby Miller. 2014. FOAF Vocabulary Specification 0.99 (01-14-2014). http://xmlns.com/foaf/spec/20140114.html
Briet, Suzanne. 1951. Qu’est-ce Que la Documentation? Paris: Éditions Documentaires, Industrielles et Techniques.
Buckland, Michael K. 1991. “Information as Thing.” Journal of the American Society for Information Science 42, no. 5: 351–60.
Buckland, Michael K. 1997. “What is a ‘Document’?” Journal of the American Society for Information Science 48, no. 9: 804–9.
Frohmann, Berndt. 2009. “Revisiting ‘What is a Document?’” Journal of Documentation 65: 291–303.
Gomes, Daniel Libonati and Thiago Henrique Bragato Barros. 2019a. “A Construção do Discurso em Ontologias: Um Estudo com Base na Semiótica Discursiva.” Informação e Informação 24, no. 3: 78–103.
Gomes, Daniel Libonati and Thiago Henrique Bragato Barros. 2019b. “O Discurso em Ontologias: Uma Abordagem a Partir da Semiótica Discursiva.” In Organização do conhecimento Responsável: Promovendo Sociedades Democráticas e Inclusivas, edited by Thiago Henrique Bragato Barros and Natalia Bolfarini Tognoli. Belém: Ed. da UFPA, 372–81.
Greimas, A. J. and J. Courtés. 2013. Dicionário de Semiótica. São Paulo: Contexto.
Gruber, Thomas R. 1995. “Toward Principles for the Design of Ontologies Used for Knowledge Sharing?” International Journal of Human-Computer Studies 43, nos. 5-6: 907–28.
Guarino, Nicola, Daniel Oberle, and Steffen Staab. 2009. “What Is an Ontology?” Handbook on Ontologies, May, 1–17. https://doi.org/10.1007/978-3-540-92673-3_0.
Miranda, Marcos Luiz Cavalcanti de, and Deniz Costa. 2019. “A Organização do Conhecimento Sobre Umbanda e sua Representação Bibliográfica: Uma Análise Exploratória a Partir de Registros Biográficos.” In Organização do conhecimento Responsável: Promovendo Sociedades Democráticas e Inclusivas, edited by Thiago Henrique Bragato Barros and Natalia Bolfarini Tognoli. Belém: Ed. da UFPA, 419–27.
Possenti, Sírio. 2009. Os Limites do Discurso. São Paulo: Parábola Editorial.

Abstract

The proceedings explore knowledge organization systems and their role in knowledge organization, knowledge sharing, and information searching.

The papers cover a wide range of topics related to knowledge transfer, representation, concepts and conceptualization, social tagging, domain analysis, music classification, fiction genres, museum organization. The papers discuss theoretical issues related to knowledge organization and the design, development and implementation of knowledge organizing systems as well as practical considerations and solutions in the application of knowledge organization theory. Covered is a range of knowledge organization systems from classification systems, thesauri, metadata schemas to ontologies and taxonomies.