Victor Odumuyiwa, Yetunde Zaid, Olatunde Barber, Enhancing Knowledge Organization Through Implicit Collaboration in Crowdsourcing Process in:

International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (eds.)

Knowledge Organization at the Interface, pages 507-511

Proceedings of the Sixteenth International ISKO Conference, 2020, Aalborg, Denmark

1st edition 2020, ISBN print: 978-3-95650-775-5, ISBN online: 978-3-95650-776-2, https://doi.org/10.5771/9783956507762-507

Series: Advances in Knowledge Organization, vol. 17

Victor Odumuyiwa – University of Lagos, Nigeria
Yetunde Zaid – University of Lagos, Nigeria
Olatunde Barber – University of Lagos, Nigeria

Enhancing Knowledge Organization Through Implicit Collaboration in Crowdsourcing Process

Abstract: This paper presents our approach to removing noisy labels from crowdsourced data and to enhancing understanding and communication through a crowdsourcing process aimed at creating metadata for describing the digitized artworks of a university library museum. A responsive Web application was created for the crowdsourcing activity and opened to the University community so that interested individuals could participate in annotating the images. The collected annotations, in the form of tags, were preprocessed and filtered into a subset by removing duplicates and eliminating noisy labels using majority voting. The resulting subset was then used as candidate labels for the images in a second round of crowdsourcing, in which users chose from the filtered labels. Comparing the output of the second round with tags produced by an expert shows a high level of similarity between the two.

1.0 Introduction

People read different meanings into images and view them from different perspectives. The creator of an image always has an intended message to communicate, yet users interpret the image based on their existing knowledge and the context in which they find themselves. Images such as artworks are particularly difficult for users to interpret and most often need explanation by an expert. In museums, most collections carry labels and a short description to help visitors (users) relate to the works. Despite this, users still find some of the collections difficult to understand and often need a guide to provide explanations and answer their questions. When these collections are digitized and rendered online, it becomes even more necessary to help users interpret the artifacts.

Several approaches have been used in the literature. Sharma and Siddiqui (2016) developed an ontology-based approach for the retrieval of digitized museum artifacts, claiming to support semantic retrieval by automatically extracting ontological concepts, in the form of visual and textual features, from images and their textual descriptions. Other approaches rely on experts to create ontologies. However, as the work of Zhitomirsky-Geffet et al. (2016) also observes, "experts are able to build knowledge organization schemes and ontologies of high professional quality, but experts are hard to find and expensive to employ". A related problem is that such ontologies do not scale: they are domain- and context-specific, so new collections require new ontologies. Moreover, advances in Internet and Web technologies, together with the need to disseminate artifacts to a wide audience, have driven a continuous rise in the digitization of existing artifacts and the creation of new digital artifacts, and human experts cannot keep pace with the resulting demand for prompt knowledge organization services. In information retrieval research, the limitations of taxonomies, expert indexing, and ontologies have been discussed repeatedly.
Approaches such as social bookmarking, social indexing, and collaborative tagging have emerged and have been extensively explored to bridge the gap between the controlled vocabularies experts use to classify documents and the natural-language expressions of information needs that users produce (Golder and Huberman 2006). User participation through social indexing and collaborative content creation has enhanced knowledge organization and information retrieval, despite the problem of "noise" inherent in user-generated content.

Using non-experts in knowledge organization can be considered a crowdsourcing process. Crowdsourcing has been defined as "the process of bringing in many people to achieve great feats from tasks that used to be handled by only a specialized few" (Howe 2006). It has been applied to areas such as information search, image labelling (Jackson et al. 2018), data classification, document translation, sentiment analysis, organizational learning (Lenart-Gansiniec and Sułkowski 2018), and many more. The shortage of experts (the "specialized few") relative to the astronomically increasing volume of documents generated daily can be considered one reason for the growing resort to crowdsourcing.

Crowdsourcing of data comes in two forms. In the first, the requester provides a list of labels (or answers) from which participants choose. In the second, participants freely type in labels (or answers) based on their perception of the object on which data is sought (Adeogun and Odumuyiwa 2019). Crowdsourcing is a social process that can also be regarded as implicit collaboration. It speeds up problem solving by drawing on the wisdom of the crowd and can culminate in collective intelligence. It is important, however, to point out the fundamental problem of "noisy labels" prevalent in crowdsourced data (Adeogun and Odumuyiwa 2019). Since participants are not necessarily domain experts, the data they provide may be imprecise and redundant, especially when they type in responses freely rather than choosing from a list of labels.

2.0 Methodology

The objective of this work is to explore how to enhance the representation of the University of Lagos collection of artworks by engaging users of the artworks to generate tags describing what each work depicts. Describing an artwork can of course be done by one or more experts in cultural artifacts, but experts are scarce, their services are expensive, and they are limited in number; they are better suited to smaller collections. The observed increase in the creation of digital artifacts and content exposes the limits of relying on experts to manually provide metadata for digital artifacts.

The University of Lagos Library has a museum containing over 123 artworks, some of which have been displayed in international expositions. Two of the artworks are shown in Figures 1 and 2. The Library has digitized the artifacts and would like to create an online information retrieval system for the content.

Figure 1: Olumo rock
Figure 2: Beaded crocodile from Cameroon

We created a responsive Web application (see Figure 3) to crowdsource tags and comments on the images of the artifacts. In the first phase of this research, 28 artifacts were posted on the application. Participants (users) were drawn from the University community by broadcasting the link to the Web application and allowing interested individuals to create accounts. Once an account was created, users logged in with their credentials, viewed the images, and provided annotations in the form of tags and comments. Our objective was to capture users' understanding of the artifacts without any aid, so we provided no labels or descriptions. Users could add as many tags as they wished to any of the images. Running the application for just one day attracted over 25 participants, who generated over 430 tags; each image received an average of 15 tags.

Figure 3: The crowdsourcing application
Figure 4: The crowdsourcing application with preloaded options for each image

We preprocessed the data to correct some typographical errors and counted the frequency with which each tag was applied to each image. With this filtering we selected roughly the 10 most frequent tags per image and modified the application to use the selected tags as preloaded labels for the images.
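The paper does not publish its preprocessing code, so the following is a minimal Python sketch of the steps just described: normalizing raw tags, counting per-image frequencies, and keeping the most frequent tags as candidate labels. The function names, the (image_id, tag) pair representation, and the example data are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable

def normalize(tag: str) -> str:
    """Lowercase and collapse whitespace so trivial spelling variants merge."""
    return " ".join(tag.lower().split())

def select_candidate_labels(tags: Iterable[tuple[str, str]],
                            k: int = 10) -> dict[str, list[str]]:
    """Group (image_id, raw_tag) pairs by image, count normalized tags,
    and keep the k most frequent tags per image as candidate labels."""
    counts: dict[str, Counter] = {}
    for image_id, raw_tag in tags:
        counts.setdefault(image_id, Counter())[normalize(raw_tag)] += 1
    return {img: [t for t, _ in c.most_common(k)] for img, c in counts.items()}

# Hypothetical round-one tags for one image; duplicates collapse in counting.
round_one = [("img_01", "Graduation"), ("img_01", "graduation"),
             ("img_01", "Convocation"), ("img_01", "Matriculation")]
print(select_candidate_labels(round_one))
# {'img_01': ['graduation', 'convocation', 'matriculation']}
```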
A second call for crowdsourcing was then made. In this round the application was modified so that users were asked to select the 3 options that best describe each image (see Figure 4). About 20 users responded, selecting about 671 tags in total. We applied majority voting to these new selections to remove noisy labels and retain the best three tags for each image. We also engaged an expert librarian specializing in artworks to provide metadata for the images and compared his output with the output of the crowdsourcing process, as shown in Table 1.
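The paper names majority voting as its noise-removal step but does not spell out the implementation. One plausible reading, sketched below in Python, is that each user selection counts as one vote and the three most-voted labels per image are retained. The Jaccard measure at the end is our own addition to illustrate how agreement with the expert tags could be quantified; the paper reports similarity only qualitatively.

```python
from collections import Counter

def majority_vote(selections: list[tuple[str, str]],
                  top_n: int = 3) -> dict[str, list[str]]:
    """Tally each (image_id, chosen_label) vote and keep the top_n
    most-voted labels per image; rarely chosen (noisy) labels drop out."""
    votes: dict[str, Counter] = {}
    for image_id, label in selections:
        votes.setdefault(image_id, Counter())[label] += 1
    return {img: [lbl for lbl, _ in c.most_common(top_n)]
            for img, c in votes.items()}

def jaccard(a: set[str], b: set[str]) -> float:
    """Set overlap between two tag sets (1.0 = identical, 0.0 = disjoint)."""
    return len(a & b) / len(a | b) if (a | b) else 1.0

# Hypothetical round-two selections for one image collapse to three labels...
round_two = [("img_01", "Graduation"), ("img_01", "Convocation"),
             ("img_01", "Graduation"), ("img_01", "Matriculation"),
             ("img_01", "Success"), ("img_01", "Convocation")]
crowd = set(majority_vote(round_two)["img_01"])

# ...which can then be compared with the expert's labels for the same image.
expert = {"Graduation", "Matriculation", "Success"}
print(jaccard(crowd, expert))  # 2 shared of 4 distinct tags -> 0.5
```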
3.0 Results and Discussion

We compared the result of the crowdsourced process with that of the expert and observed high similarity between the two. The purpose of this work is to test our approach to removing noisy labels from crowdsourced data by experimenting with a few artworks; the end goal is to apply the approach to a large corpus of artworks, with or without input from experts. The initial results convince us that, with a large corpus of artworks, crowdsourcing will make it faster to obtain relevant metadata describing our artifacts and will enhance searching in the information retrieval system to be created.

Table 1: Comparison of the crowdsourced tags with the expert tags for four images

Image | Crowdsourced Tags | Expert Tags
1 | Graduation, Matriculation, Convocation | Graduation, Matriculation, Success
2 | Arts, Cattle rearer, Herdsman | Livestock Management, Life in Northern Nigeria, Fulani herdsman
3 | Mahatma Gandhi, Wise man, Drawing | Mahatma Gandhi, A Sage, Role Model
4 | Hunger, Poverty, Painting | Hunger, Despair, Malnutrition

4.0 Conclusion

This is ongoing research aimed at creating an information retrieval system for artworks. This paper reports the first phase, focused on crowdsourcing tags from the University community to serve as metadata describing the artifacts. We used the majority voting approach to remove noisy labels from the crowdsourced data and observed that the resulting data were very similar to the data provided by an expert on the artworks. In addition, more semantically relevant tags were generated during the crowdsourcing process. This can complement the expert's work of providing metadata on the artifacts. We plan to improve the results by using other algorithms to remove noisy labels without having to reduce user selections in the second round of crowdsourcing.

References

Adeogun, Yetunde and Victor Odumuyiwa. 2019. "A Comparative Analysis of Four Label Extraction Algorithms for Crowdsourced Data." In Transition from Observation to Knowledge to Intelligence (TOKI), edited by Victor Odumuyiwa, Olufade Onifade, Amos David, and Charles Uwadia. Nigeria: ISKO-West Africa, 41-56.

Golder, Scott A. and Bernardo A. Huberman. 2006. "Usage Patterns of Collaborative Tagging Systems." Journal of Information Science 32: 198-208.

Howe, Jeff. 2006. "The Rise of Crowdsourcing." Wired Magazine 14, no. 6: 1-4.

Jackson, Corey, Kevin Crowston, Carsten Østerlund, and Mahboobeh Harandi. 2018. "Folksonomies to Support Coordination and Coordination of Folksonomies." Computer Supported Cooperative Work (CSCW) 27, nos. 3-6: 647-678.

Lenart-Gansiniec, Regina and Łukasz Sułkowski. 2018. "Crowdsourcing—A New Paradigm of Organizational Learning of Public Organizations." Sustainability 10, no. 10: 3359.

Sharma, Manoj Kumar and Tanveer J. Siddiqui. 2016. "An Ontology Based Framework for Retrieval of Museum Artifacts." Procedia Computer Science 84: 169-176.

Zhitomirsky-Geffet, Maayan, Barbara H. Kwaśnik, Julia Bullard, Lala Hajibayova, Juho Hamari, and Timothy Bowman. 2016. "Crowdsourcing Approaches for Knowledge Organization Systems: Crowd Collaboration or Crowd Work?" Proceedings of the Association for Information Science and Technology 53, no. 1: 1-6.
