
Ziyoung Park, So Young Yoon, Seunghee Son, Yoonwhan Kim, Constructing Semantic Periodical Index Database Focusing on the Visegrad Group’s Transition Process, in:

International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (Eds.)

Knowledge Organization at the Interface, page 357 - 363

Proceedings of the Sixteenth International ISKO Conference, 2020 Aalborg, Denmark

1. Edition 2020, ISBN print: 978-3-95650-775-5, ISBN online: 978-3-95650-776-2, https://doi.org/10.5771/9783956507762-357

Series: Advances in Knowledge Organization, vol. 17

Bibliographic information
Ziyoung Park – Hansung University, South Korea
So Young Yoon – Hansung University, South Korea
Seunghee Son – Hansung University, South Korea
Yoonwhan Kim – Yonsei University, South Korea

Constructing Semantic Periodical Index Database Focusing on the Visegrad Group’s Transition Process

Abstract: The purpose of this study is to build a database of articles related to the transition of the Visegrad Group and to use the collected data as a reference for Korean unification. The transitional research team, which studied the Visegrad Group’s histories and political systems, and the database team, composed of library and information science researchers, collaborated to list articles from 1988 to 2004 and to create indexes, summaries, full-text information, and a catalog. The results of this study will be used to design and construct article index databases for multilingual data as well as to study the countries’ transition processes.

1.0 Project overview

This research discusses the construction of a periodical index database on the Visegrad Group, which consists of four Central European countries: the Czech Republic, Hungary, Poland, and Slovakia (International Visegrad Fund 2019). The transition process of these countries from the late 1980s to the early 2000s may provide essential insights for Korean reunification. The project aims to provide relevant information about the Visegrad countries’ transition processes that will serve as a reference for the establishment of a peace regime and the reunification of the Korean Peninsula. The periodical index database covers the period from 1988, the year Mikhail Gorbachev delivered his speech to the United Nations (UN), to 2004, when all the Visegrad countries joined the European Union (EU).
This period is divided into three phases:
• First Phase – Emergence of signs and phenomena of transition (1988–1991)
• Second Phase – Implementation of transition (1991–1999)
• Third Phase – Achievement of transition (1999–2004)

This research was conducted by two research teams. First, the transitional research team, composed of Korean experts on European affairs, studied the Visegrad countries’ modern histories and political systems, discovered and interpreted the relevant articles serving as basic data on the countries’ transitions, built bibliographic information and article summaries, and assigned categories and keywords. Second, the database team consisted of library and information science majors who could design the data structure, refine raw data, and construct additional structured data to increase the basic data’s usability. In the early stages of the study, this team was responsible for database design, refinement, and quality control, and for defining the metadata types, elements, and values to be built by the transition researchers within the overall frame of the database. Afterward, all basic data collected each year were inspected for corrections and authority control. In the interim, the status of the data built by the researchers was monitored to provide additional guidelines or to modify the structure of the initial database design. In addition, useful external information, such as authority numbers and Wikipedia links, was added to improve the quality and usability of the existing data.

2.0 Initial database design and modification

The first major role of the database team was to design the initial database frame. Its second major role was to provide guidelines for creating the metadata of the documents selected by the transition researchers and for improving the quality of the basic data from the perspective of information organization.
The database was initially designed with the following information:

(1) Classificatory information: This includes classification by country based on the magazine’s issuing country, by year based on the year of publication, by source based on the publisher, and by content based on the topic (politics, economy, society, and culture). The place discussed in the content, rather than the location of the magazine’s publisher, serves as the location keyword.

(2) Descriptive information: Descriptive elements are defined for three units: the journal, the article, and the metadata creators and reviewers. Initially, the description included the author’s information; however, it was later removed because of difficulties in acquiring each article author’s name and contact information.

(3) Summary information: The summary, written by each country’s assigned researcher, states the content of the full-text article in 100 words. It is written based on facts rather than personal opinions or criticism.

(4) Full-text information: The research team gathered all articles in internal storage for metadata reviews and further reference.

(5) Index information: Index terms were classified into five types by adding general subjects and events to the PLO index, which covers the persons, locations, and organizations involved. The importance of index information increased as the project progressed. Transition researchers also assigned the appropriate keywords as they wrote summaries. Then, the keywords, which were written in different languages, were linked with established Korean index terms, and International Standard Name Identifier (ISNI) numbers or Wikipedia link information were added to the proper names of people, locations, and organizations.

3.0 Document selection and acquisition of full-text articles

In this process, history and political science majors, who are capable of understanding local languages and analyzing each country’s historical events, first selected, analyzed, and described the articles.
They created the basic descriptive catalog, wrote summaries in Korean, and assigned keywords and categories for each article. For article selection, they examined 10 daily and weekly journals issued in the transition countries, namely the Czech Republic and Slovakia (Czechoslovakia), Poland, and Hungary, and in the countries that influenced them, Germany and the US. Through local visits and web services, the researchers checked and selected the full text of all articles, taking into account each country’s research period and a balanced coverage across countries. It was difficult to select only a few hundred articles per journal from the multitude of articles within a limited time. The final number of articles collected by country is shown in Figure 1.

[Bar chart. Y-axis: No. of Articles. Categories: CZ, DE, HU, PL, US. Data labels, in extraction order: 577, 662, 575, 678, 480.]
Figure 1. Number of articles collected by country

4.0 Basic metadata creation and refinement

At the beginning of the study, the database team outlined the initial design of metadata elements and values and provided the transition team with guidelines for building the basic metadata: the bibliographic information, summaries, and keyword lists of the articles in Korean. After creating the metadata and descriptive catalog, the researchers wrote a 100-word Korean summary of each article to help Korean researchers search for and understand articles written in other languages. The researchers also assigned keywords to each article, classified by the person, location, organization, general subject, and event involved. In addition, each article was categorized as politics, economics, society, or culture.

Once the transition researchers completed the article selection and the primary metadata, the database team checked the metadata against the full-text image files and ensured that it met the criteria stated in the guidelines. The database team then corrected any identified errors and asked the transition researchers for feedback when necessary.

Besides spelling errors, the most common types of errors were the following:
• Title and subtitle selection: The main title of an article is usually set in the largest font size. The subtitle, consisting of phrases that expound on the title, is usually presented in a smaller font than the main title, above or below it. However, there were several instances in which the two were mistakenly switched.
• Incomplete full-text article: Parts of the original text were not obtained when a newspaper or magazine article began on the first page but continued on another page.
• Other basic errors: Incorrect capitalization in titles, missing volume and issue information for articles, and incorrectly copied links to the original articles available on the web.

The following is an example of the summarized metadata for one article:
• Descriptive elements for the unit of journals
- Journal Identifier: 0139-1682 (ISSN)
- Journal Title: HVG
- Place of Publication: Budapest
- Publisher: HVG Kiadó Zrt.
• Descriptive elements for the unit of articles
- Internal Identifier for the Article: 2937
- Original Title: “Határeset” [mandatory]
- Korean Translated Title: “극단적인 경우” [mandatory]
- Original Subtitle: “EU-magyar csatlakozási tárgyalások” [optional]
- Korean Translated Subtitle: “유럽연합-헝가리 가입 협상” [optional]
- Contributor: K. György
- Publication Date: 2001-08-04
- Volume No.: 23
- Issue No.: 31
- Start Page: 79
- End Page: 81
- Online Access: No
- Full-Text File Name: H2093 (internal storage)
• Descriptive elements for the metadata creators and reviewers
- Email Address: aran***@articl***.org (contact information for the metadata creator) [mandatory]
- Note: (comments or questions of reviewer) [optional]
• Korean article summary
- Some of the negotiation themes for joining the European Union (EU) are not in the EU regulations; therefore, some topics would not be a problem unless they endanger the EU. One of the more difficult areas is the Schengen regulatory system, particularly for topics such as changing the monitoring and control of borders that violate EU standards. (Remainder omitted) [Translated from the Korean summary]
• Article category: “경제” (Economy) [Korean]
• Article keywords (Korean established/original keywords):
- [organization] 유럽연합 (European Union)
- [event] 유럽연합 가입협상 (EU csatlakozási tárgyalás)
- [event] 유로화 도입 (Euro bevezetése)
- [general subject] 쉥겐규정 (Schengeni előírások)
- [general subject] 국경통제 (határellenőrzés)
• Image file for the article
Figure 2. Example of a partial full-text image [HVG 23(31): 79–81]

5.0 Data enhancement by constructing index term clusters

In addition to correcting data errors, the database team also examined article classification and indexing information, which consists of author keywords and controlled index terms. The index terms are in different languages, such as Czech, Polish, Hungarian, German, and English, and the team built clusters by gathering terms with similar meanings and then adding the preferred Korean term. Semantic grouping of concepts is essential for semantic search capabilities (King and Reinold 2008, 27). If individual terms are divided into facets, the index term list can be called “faceted” (Chan and O’Neill 2010, 13).
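As an illustration of the clustering idea, the following is a minimal sketch of how one index term cluster might be represented and used for cross-language lookup. It is not the project's actual schema: the class name, field names, language codes, and the `build_lookup` helper are hypothetical, and the facet value is an assumption. The term values are taken from the paper's own "multiparty system" example.

```python
from dataclasses import dataclass, field


@dataclass
class IndexTermCluster:
    """One concept: a preferred Korean term, its variants, and foreign-language equivalents."""
    term_no: int
    facet: str  # person, location, organization, general subject, or event
    preferred_ko: str
    variants_ko: list[str] = field(default_factory=list)
    # language code -> terms assigned by researchers in that language
    corresponding: dict[str, list[str]] = field(default_factory=dict)

    def all_terms(self) -> set[str]:
        """Collect every surface form that belongs to this cluster."""
        terms = {self.preferred_ko, *self.variants_ko}
        for words in self.corresponding.values():
            terms.update(words)
        return terms


def build_lookup(clusters):
    """Map each case-folded surface form to the identifier of its cluster."""
    lookup = {}
    for cluster in clusters:
        for term in cluster.all_terms():
            lookup[term.casefold()] = cluster.term_no
    return lookup


multiparty = IndexTermCluster(
    term_no=207,
    facet="general subject",  # assumed facet; the source does not state it
    preferred_ko="다당제",
    variants_ko=["복수정당제도"],
    corresponding={"hu": ["többpártrendszer"], "de": ["Mehrparteiensystem"]},
)
lookup = build_lookup([multiparty])
```

With such a lookup, a Hungarian or German keyword resolves to the same cluster identifier as the preferred Korean term. A flat term-to-identifier map of this kind would not be sufficient for proper names that share a Korean form but denote different entities; those would need a key that includes more than the term string.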
Given that the article data were written in Czech, Polish, Hungarian, German, and English, the database team constructed faceted multilingual index term clusters by grouping the multilingual index terms under each preferred Korean index term. Moreover, the team assigned a specific identifier to each index term cluster by concept and linked the clusters to other entities. The following points were considered in building a multilingual index term cluster:
• Similar index terms constructed in each language were grouped under the established Korean index term and examined by experts in each language.
• In the case of proper names, such as institutions or organizations, terms were treated as different even if their Korean translations were the same.
• After a multilingual cluster was drafted, another researcher in the team cross-checked the consistency of the index clusters within the same language.

Example:
Index_term_no: 207
Preferred term (Kor): 다당제
Variant term (Kor): 복수정당제도
Corresponding term (Hun): többpártrendszer
Corresponding term (Ger): Mehrparteiensystem
Figure 3. Example of the index term cluster “multiparty system”

6.0 Data enhancement by adding controlled link information

The database team also added external link information to the index term clusters to strengthen their semantics, because Korean users might be unfamiliar with the persons, locations, organizations, and events related to the Visegrad Group’s transition. The database includes external link information created by adding Uniform Resource Identifiers (URIs) of related bibliographic entities from reliable sources, such as ISNI, the National Library of Korea, and Wikipedia, as well as index term clusters that combine the keywords assigned by researchers for each magazine. The team designated Korean Wikipedia links and the link information provided by Linked Open Data from the National Library of Korea as external links for all index terms.
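The link assignments in this section can be read as a simple rule table: two sources for every index term, plus facet-specific identifiers described in the remainder of the section. The sketch below checks a cluster's links against such a table. The source keys, function names, and the placeholder URI are invented for illustration, and treating the assignments as checkable requirements is an assumption of this sketch rather than a documented project rule.

```python
# Link sources named in the paper; the short keys are labels invented for this sketch.
LINKS_FOR_ALL_TERMS = {"ko_wikipedia", "nlk_lod"}
LINKS_BY_FACET = {
    "person": {"isni"},
    "organization": {"isni"},
    "location": {"geonames"},
}


def expected_sources(facet):
    """Return the link sources expected for an index term of the given facet."""
    return LINKS_FOR_ALL_TERMS | LINKS_BY_FACET.get(facet, set())


def missing_sources(facet, links):
    """Return the expected sources that have no URI recorded in `links`."""
    present = {source for source, uri in links.items() if uri}
    return sorted(expected_sources(facet) - present)


# Hypothetical record: the URI is a placeholder, not a real identifier.
person_links = {
    "ko_wikipedia": "https://example.org/placeholder",
    "nlk_lod": "",
}
gaps = missing_sources("person", person_links)
```

A check of this kind could be run during quality control to list clusters whose external links are still incomplete.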
Furthermore, the team added ISNI identifiers to the Person and Organization index terms and URI information from the GeoNames site to the Location index terms. External link information is also useful for the semantic web environment and for information sharing (Lee, Park, and Lee 2017; Park 2012).

7.0 Conclusion

In this study, articles related to the system transition of the Visegrad Group were selected from representative weekly and daily magazines to establish the article index database. In the process, the team systematically designed the catalog, classification/index, summary, and full-text information. Through repeated data reviews, the team corrected basic errors and built index clusters so that articles in multiple languages could be semantically linked. Additional information on key figures and events related to the transition was also provided through links to ISNI or Wikipedia pages.

Index terms and descriptive information about each article are crucial in the periodical index database. Index terms were found to improve articles’ findability and to help users understand the articles and locate relevant information. Moreover, with the constructed faceted index terms, articles from a certain period can be linked to a related person, organization, location, and event. Well-arranged sets of multilingual index terms can also serve as a reliable multilingual dictionary. Furthermore, the links to ISNI, Wikipedia, GeoNames, and the authority data from the National Library of Korea helped provide additional person, event, and bibliographic information.

The periodical index database constructed through this research can be used as a basis for Korean unification studies and can contribute to further research on unification and transition. In addition, the data and the skills acquired during the construction process will be used as a reference for creating and using article indexes for multilingual data in the future.
Acknowledgments

This research, financially supported by Hansung University, is based on the Fundamental Research Program “Compilation of the Documents regarding the Democratic Transformation of Eastern Europe: 1989–2004: With a Special Focus on the Visegrad Four’s Periodical,” supported by the National Research Foundation. We want to express our sincere gratitude and appreciation to all the researchers who participated in the project.

References

Chan, Lois Mai and Edward T. O'Neill. 2010. FAST: Faceted Application of Subject Terminology: Principles and Application. California: Libraries Unlimited.

International Visegrad Fund. 2020. Visegrad Group. http://www.visegradgroup.eu/

King, Brandy E. and Kathy Reinold. 2008. Finding the Concept, Not Just the Word: A Librarian’s Guide to Ontologies and Semantics. England: Chandos Publishing.

Lee, Sungsook, Ziyoung Park, and Hyewon Lee. 2017. “Expanding the Scope of Identifying and Linking of Personal Information in Linked Data: Focusing on the Linked Data of National Library of Korea.” Journal of the Korean Society for Information Management 34, no. 3: 7–21.

Park, Ziyoung. 2012. “Extending Bibliographic Information Using Linked Data.” Journal of the Korean Society for Information Management 29, no. 1: 231–251. http://doi.org/10.3743/KOSIM.2012.29.1.231.


Abstract

The proceedings explore knowledge organization systems and their role in knowledge organization, knowledge sharing, and information searching.

The papers cover a wide range of topics related to knowledge transfer, representation, concepts and conceptualization, social tagging, domain analysis, music classification, fiction genres, and museum organization. The papers discuss theoretical issues related to knowledge organization and the design, development, and implementation of knowledge organizing systems, as well as practical considerations and solutions in the application of knowledge organization theory. The knowledge organization systems covered range from classification systems, thesauri, and metadata schemas to ontologies and taxonomies.
