Á. Castellanos, A. García-Serrano, J. Cigarrán, E. W. De Luca, Improving the Knowledge Organization of Linguistic Resources in:

Wieslaw Babik, H. Peter Ohly, Karsten Weber (ed.)

Theorie, Semantik und Organisation von Wissen, page 175 - 186

1. Edition 2017, ISBN print: 978-3-95650-239-2, ISBN online: 978-3-95650-326-9,

Series: Fortschritte in der Wissensorganisation, vol. 13

Bibliographic information
Babik/Ohly/Weber: Theorie, Semantik und Organisation von Wissen. Würzburg: Ergon 2017

Improving the Knowledge Organization of Linguistic Resources

Ángel Castellanos, Ana García-Serrano, Juan Cigarrán
ETSI Informática UNED, Spain

Ernesto William De Luca
Potsdam Univ. of Applied Sciences, Germany

Abstract

Linguistic resources are one of the main repositories for content enrichment and modelling tasks; they include a large amount of formalized linguistic data as well as the relations between these data. However, applying these resources raises some problems (e.g., noisy or redundant information), mainly related to identifying the most valuable information they contain for a specific task. In this context, Knowledge Organization (KO) techniques applied to the knowledge contained in these resources should make such information easier to identify. This paper proposes a Knowledge Organization technique based on Formal Concept Analysis (FCA) to organize the knowledge included in a linguistic resource such as EuroWordNet (EWN). By means of FCA, a new organizational layer is created on top of EWN, structuring the knowledge contained in it. We show that using this enhanced version of EWN improves the textual enrichment and data representation process. To that end, the proposed approach has been applied to a specific task, the Topic Detection Task at Replab 2013. Results show that the proposed FCA-based technique is able to organize linguistic-based data, obtaining better performance than when only unorganized linguistic information is applied. Likewise, the results improve on the performance of state-of-the-art algorithms for topic detection.

Keywords and Categories: Formal Concept Analysis, Knowledge Organization, Linguistic Resources, Content Representation, Topic Detection, HAC, Knowledge acquisition

1. Introduction

Linguistic resources such as WordNet [13] and its variants EuroWordNet and MultiWordNet provide a vast amount of linguistic information, contextualized in concepts and related through different relationships. These lexical databases represent a valuable source of knowledge, whose application can be very helpful in the enrichment, modelling and representation of contents expressed in natural language. In fact, they have been applied in a wide range of content modelling tasks, such as word sense disambiguation, information retrieval, text classification, summarization and machine translation, among others. Broadly speaking, these content modelling approaches try to create a content representation that can be easily processed by automatic systems, offering as much information about the contents as possible.

In this context, new challenges have emerged with the appearance of social networks and microblogging platforms like Twitter, different from those related to traditional content (news reports, articles, web pages). As described in [6], dealing with tweets involves some considerations that potentially limit the performance of traditional representation and modelling techniques. Some of these considerations are explained in [3]: the existence of special signs, the use of slang, spelling mistakes and, most importantly, the character limit imposed by Twitter. So, whether because of this limit or because of the kind of information shared by users (short messages and updates), the most important aspect is the shortness of the textual content. This issue leads to a low cardinality of the relationships between contents, hindering the application of NLP or IR algorithms. The shortness problem may be mitigated by including lexical information related to the contents, taken from linguistic resources.
However, this application, which in theory appears to be a straightforward solution, presents some problems in real scenarios. These problems are related to the inclusion of undesired information (noisy, redundant or unrelated information), which in turn stems from the difficulty of identifying the most interesting information in the linguistic resources. Linguistic resources have a lexically-based structure built on conceptual-semantic relationships. For instance, the structure of WordNet (and its subprojects) is based on hierarchical relationships (hyperonymy and hyponymy). However, there are other non-hierarchical relationships (for instance, synonymy) that are not included in the WordNet hierarchy. We consider that these relationships may be useful to organize the WordNet content for its later application in an enrichment task. Because of this information loss, it is sometimes not clear how to identify the most suitable information or how it should be used. Therefore, a better knowledge organization is needed: an extra layer that also organizes the data according to these other relationships, in order to facilitate the application of linguistic resources to content modelling and representation tasks [5, 9, 16].

In this regard, this paper proposes a Knowledge Organization approach based on Formal Concept Analysis (FCA), a mathematical theory of content organization. FCA appears to be a suitable approach, given that: 1) it does not require prior information about the contents in order to organize them; 2) it organizes the contents in a lattice structure, a richer representation than a hierarchy, which is a formalism that better explores correlations, similarities, anomalies or even inconsistencies in the data structures [7]; and 3) it offers an easily readable representation of the resulting structure, facilitating its navigation and understanding.
Through the application of FCA it is feasible to create an explicit hierarchical representation (the concept lattice) of the inherent data structures, implications and dependencies of the knowledge expressed in the linguistic resources. We claim that this new structure offers a better representation and organization of the knowledge contained in the linguistic resources. Our expectation is that this improved FCA-based knowledge representation will facilitate content enrichment and modelling and, specifically, the modelling of Twitter content.

The proposed approach and its application to content enrichment are tested in a specific data organization task: the Topic Detection Task at Replab 2013. This task focuses on detecting latent topics in a set of tweets about companies. As the tweets are in English and Spanish, we have selected the Spanish and English versions of EuroWordNet (EWNes and EWNen) as the input from which to extract the linguistic information, later modelled by means of FCA. We show that the more accurate representation of tweet content, driven by the information inferred from the FCA-based knowledge organization of the EWN data, achieves a significant performance improvement for topic detection systems.

FCA basics and the proposed data representation are explained in section 2, and their application to topic detection in section 3. The state of the art of the Knowledge Organization field and its relation to FCA is detailed in section 4, and finally the conclusions are drawn in section 5.

2. FCA Knowledge Organization for Content Modelling

The proposed approach is based on organizing the knowledge contained in linguistic resources by means of Formal Concept Analysis (FCA). FCA is a mathematical theory of concept formation [4, 14, 31, 32] derived from lattice and ordered set theories that provides a theoretical model to organize formal contexts. A formal context is defined as a triple $\mathbb{K} := (G, M, I)$, where $G$ is a set of (formal) objects, $M$ a set of (formal) attributes and $I$ a binary relation between $G$ and $M$ (i.e., $I \subseteq G \times M$), denoted by $g I m$, which is read as: the object $g$ has the attribute $m$. From the data in the formal context, a set of formal concepts can be inferred, each one including the set of objects that share the same set of attributes. Formally, a formal concept is a pair $(A, B)$, where $A$ is a set of objects (also known as the extent of the formal concept) and $B$ is a set of attributes (also known as the intent of the formal concept) shared by all the objects in $A$. Following FCA theory, it is possible to derive a concept-based hierarchy by ordering the set of formal concepts in a subconcept-superconcept relation according to their extents:

$$(A, B) \leq (C, D) \iff A \subseteq C \quad (\iff D \subseteq B)$$

where $(C, D)$ is called a super-concept of $(A, B)$ and, conversely, $(A, B)$ is a sub-concept of $(C, D)$ (i.e., $(A, B)$ is more specific than $(C, D)$). The order that results from this relationship can be proven to be a lattice, the concept lattice, denoted by $\mathfrak{B}(G, M, I)$, associated with the formal context.

Since concept lattices are ordered sets, they can be naturally displayed as Hasse diagrams. Figure 2 shows an example of the Hasse diagram of a concept lattice (see footnote 1). This example illustrates the marking method: formal concepts (nodes in the diagram, exactly one node for each formal concept) are depicted with a minimal set of objects (labels in grey) and a minimal set of attributes (labels in white). If $C_1 \leq C_2$ holds, then $C_2$ is placed above $C_1$ in the lattice and, if there is no other concept $C_3$ such that $C_1 \leq C_3 \leq C_2$, there is a line joining $C_1$ and $C_2$, and $C_2$ is called the upper neighbour of $C_1$ ($C_2 \succ C_1$).

For the later computation of the topic detection approach, two important types of formal concepts are the object concepts and the attribute concepts:
• The object concept of an object o ($\gamma o$) is the most specific concept that includes o in its extent.
  In a Hasse diagram (Figure 2), if a formal concept (a node in the figure) is the object concept of a given object (the one or ones in the label connected to the node), it is denoted by a blue semicircle.
• The attribute concept of an attribute a ($\mu a$) is the most generic concept that includes a in its intent. As with object concepts, if a formal concept is the attribute concept of a given attribute, it is denoted by a black semicircle (in the node related to the attribute) in the Hasse diagram.

Footnote 1: Concept lattices have been generated with the ConExp application.

2.1 FCA-based Knowledge Organization

In what follows, we explain how the knowledge contained in EWN was organized by means of FCA. Applying FCA theory to the EWN environment, we take the EWN synsets as the objects of the formal context and the EWN relationships related to these synsets (e.g. hyperonymy, hyponymy, meronymy, synonymy, antonymy…) as the attributes. Consequently, the lattice structure groups "similar" synsets according to their shared EWN relationships and organizes them from the most generic to the most specific one. More specifically, we have used the single-language English (EWNen) and Spanish (EWNes) versions, because these are the languages covered by the collection used in our experiments. The rationale of this knowledge organization is:

1. Generate a new model of the EWN structure using its hierarchical relationships (hyperonymy, hyponymy) as well as the non-hierarchical ones (synonymy and antonymy) by means of FCA. FCA creates a hierarchical structure based on an order inferred from the whole set of EWNen and EWNes relationships.
2. Find similarities between synsets not initially covered by the original EWNen and EWNes structure.

Some figures about the FCA modelling for EWNen and EWNes are shown in Table 1.
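To make the definitions of section 2 concrete, the following is a minimal Python sketch under stated assumptions; it is not the implementation used in this work (the lattices in the paper were generated with ConExp). It uses a tiny, invented formal context in which synsets are the objects and relationship labels are the attributes. The names `CONTEXT`, `common_attributes`, `common_objects`, `formal_concepts`, `object_concept` and `upper_neighbours` are hypothetical, and concepts are enumerated by brute force, which is only viable for toy data.

```python
from itertools import combinations

# Invented toy formal context: synset (object) -> relationship labels (attributes).
# These entries are illustrative only and are not taken from EuroWordNet.
CONTEXT = {
    "auto": {"hyperonym_entity", "hyperonym_vehicle", "synonym_machine"},
    "machine": {"hyperonym_entity", "hyperonym_vehicle", "synonym_auto"},
    "truck": {"hyperonym_entity", "hyperonym_vehicle"},
    "bank": {"hyperonym_entity"},
}


def common_attributes(objects):
    """Attributes shared by every object in `objects` (all attributes if empty)."""
    shared = set().union(*CONTEXT.values())
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def common_objects(attributes):
    """Objects that have every attribute in `attributes`."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def formal_concepts():
    """Enumerate all (extent, intent) pairs by closing every subset of objects."""
    concepts = set()
    objects = list(CONTEXT)
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            intent = common_attributes(subset)
            concepts.add((common_objects(intent), intent))
    return concepts


def object_concept(obj):
    """Most specific concept whose extent contains `obj`."""
    intent = frozenset(CONTEXT[obj])
    return (common_objects(intent), intent)


def upper_neighbours(concept, concepts):
    """Concepts directly above `concept` in the order given by extent inclusion."""
    extent = concept[0]
    above = [c for c in concepts if extent < c[0]]
    return {c for c in above if not any(extent < d[0] < c[0] for d in above)}


concepts = formal_concepts()
auto_concept = object_concept("auto")
auto_neighbours = upper_neighbours(auto_concept, concepts)
```

In this toy context the object concept of "auto" keeps the specific label "synonym_machine", while its single upper neighbour only carries the more generic labels shared with "machine" and "truck".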
Table 1: FCA Knowledge Organization Statistics

To exemplify this new FCA-based knowledge organization, Figure 1 shows the EWNen hierarchy for the synset "auto" (this example has been created by means of the LexiRes software [19]) and Figure 2 shows the concept lattice related to the same synset. For the sake of simplicity, Figure 2 only shows an excerpt, but the final model includes all the information in EWNen and EWNes. The concept lattice organizes the relationships according to their specificity: for instance, the relationship "hyperonym_entity" is a very general one (i.e., all the EWN synsets have it). The EWN structure also has a similar organization based on the hierarchical relationships (e.g., hyperonymy, hyponymy); however, in the concept lattice structure the non-hierarchical relationships are also taken into account to infer the inherent structure. For example, the relationship "synonym_auto" appears as a specific one, more closely related to the synset "machine" than, for example, the relationship "hyperonym_entity". So, in order to represent "machine", "synonym_auto" seems to be more suitable than some other, more generic relationship.

Figure 1: Example of EuroWordNet hierarchy

2.2 Content Modelling

The content modelling is based on representing the tweets in the Replab dataset (see section 3) by means of the most suitable EWN information, inferred from the FCA-based model. To that end, given a tweet, the tweet terms that also appear in the corresponding FCA-based EWNen or EWNes model are selected (the tweet language is provided in its metadata, so the system uses the EWNen or EWNes model accordingly), and the information related to them in the FCA-based model is used for the tweet modelling. Formally: given a tweet to be represented, which is composed of a set of representative terms $(term_1, \ldots, term_n)$, and the FCA-based EWNen or EWNes model $\mathfrak{B}_{EWN}(G, M, I)$, we define a set of object concepts related to these terms, a set of upper neighbours of these object concepts, and a set of candidate concepts. The final representation of the tweet is then a set of labels.

An application example of this process can be seen in Table 2, which compares a data representation obtained with the FCA-based modelling (third column) with a representation that uses the EWN information without applying FCA (second column). This latter methodology (EWN without FCA) is based on adding all the information included in the EWN synsets appearing in the tweets. In contrast, applying our modelling methodology (the EWN-FCA representation), the concept lattice in Figure 2 is used to infer the most specific information related to the terms in the tweet by executing the modelling proposed herein.

Table 2. Example of Data Representation using the EWN-FCA Representation

Figure 2. FCA-based Model

3. Experimentation: Topic Detection

Topic Detection refers to the clustering of a set of contents according to the latent themes or topics that they address. Topic Detection can be understood as an organizational task, so the final topic detection performance relies not only on the topic detection system but also on the content modelling accuracy (i.e., the representation of the contents to be analyzed in order to detect the addressed topics). The Replab 2013 Evaluation Campaign provides an experimental framework that includes the Topic Detection problem [1]. To that end, it offers an experimental and evaluation environment for topic detection systems as well as an experimental dataset: the Replab 2013 dataset. This dataset consists of a collection of tweets about 61 entities crawled from 1 June 2012 to 31 December 2012 by using the entity's canonical name as query (e.g., BMW).
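The content modelling of section 2.2 can be illustrated with a small Python sketch. It is a hedged illustration, not the authors' procedure: the text names object concepts, upper neighbours, candidate concepts and labels, but does not preserve how candidates are selected or turned into labels. The sketch therefore assumes that the candidate concepts are the object concepts of the tweet terms plus their upper neighbours, and that a label is kept when its attribute concept is one of those candidates. The context data and every function name (`concept_from_objects`, `represent_plain`, `represent_fca`, and so on) are invented for illustration.

```python
# Invented toy context: synset -> relationship labels (not real EuroWordNet data).
CONTEXT = {
    "auto": {"hyperonym_entity", "hyperonym_vehicle", "synonym_machine"},
    "machine": {"hyperonym_entity", "hyperonym_vehicle", "synonym_auto"},
    "truck": {"hyperonym_entity", "hyperonym_vehicle"},
    "bank": {"hyperonym_entity"},
}
ALL_ATTRIBUTES = frozenset().union(*CONTEXT.values())


def intent_of(objects):
    """Attributes shared by all the given objects."""
    shared = set(ALL_ATTRIBUTES)
    for obj in objects:
        shared &= CONTEXT[obj]
    return frozenset(shared)


def extent_of(attributes):
    """Objects that have all the given attributes."""
    return frozenset(o for o, attrs in CONTEXT.items() if attributes <= attrs)


def concept_from_objects(objects):
    """Smallest formal concept whose extent contains `objects`."""
    intent = intent_of(objects)
    return (extent_of(intent), intent)


def attribute_concept(attribute):
    """Most generic concept whose intent contains `attribute`."""
    extent = extent_of({attribute})
    return (extent, intent_of(extent))


def upper_neighbours(concept):
    """Minimal concepts obtained by adding one outside object to the extent."""
    extent = concept[0]
    grown = {concept_from_objects(extent | {o}) for o in CONTEXT if o not in extent}
    return {c for c in grown if not any(d[0] < c[0] for d in grown)}


def represent_plain(terms):
    """EWN without FCA: every label attached to a matched synset."""
    labels = set()
    for term in terms:
        labels |= CONTEXT.get(term, set())
    return labels


def represent_fca(terms):
    """ASSUMED selection rule: keep labels whose attribute concept is an
    object concept of a tweet term or one of its upper neighbours."""
    object_concepts = {concept_from_objects({t}) for t in terms if t in CONTEXT}
    candidates = set(object_concepts)
    for concept in object_concepts:
        candidates |= upper_neighbours(concept)
    return {a for a in ALL_ATTRIBUTES if attribute_concept(a) in candidates}


tweet_terms = ["my", "machine", "broke"]
plain_labels = represent_plain(tweet_terms)
fca_labels = represent_fca(tweet_terms)
```

Under these assumptions the unorganized representation of the toy tweet keeps the generic label "hyperonym_entity", while the lattice-based one retains only "synonym_auto" and "hyperonym_vehicle", in line with the intuition that more specific relationships describe "machine" better.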
The entities in the Replab 2013 dataset belong to four domains: automotive, banking, universities and music/artists. The topic detection systems are requested to detect the topics separately for each one of the 61 entities in the dataset. In contrast to other similar scenarios such as news classification, the topics to be detected are not defined a priori; they depend on the themes addressed by the users in their tweets about the entities. Moreover, this topic set changes over time: new, unknown topics might appear and some of the already known ones might disappear, and such changes are difficult to predict.

3.1 Experimental Setup

To test the performance obtained by the proposal, three groups of experiments were set up in order to model the tweets in the Replab dataset: textual-based, EWN-based (i.e., using the EWNen and EWNes data without the FCA modelling) and FCA-based modelling. The textual-based modelling approach can be considered a baseline for representing the Replab data. Since the topics to be detected are mainly thematically based, it is expected that the tweet text will be a strong signal for their detection. The EWN-based approaches, on the other hand, have been included in order to establish how much of the possible performance improvement is due to the FCA modelling and how much to the EWN information itself (i.e., the information extracted from EWNen or EWNes without any modelling). In particular, the experimental configurations are:

• Text: Each tweet is represented only by its textual content (after stop-word removal and stemming, and taking into account special Twitter signs such as hashtags and references).
• EWN: Each tweet is represented only with the EWN information (without the FCA modelling) related to the terms in the tweet. In other words, for each synset appearing in the tweet, it includes all the information in EWN about the synset.
  In this case, stop words have also been removed, but the terms have not been stemmed (because stemming would make it impossible to match them to the EWN synsets) and the special signs have not been used.
• Text + EWN: Each tweet is represented with the textual content (pre-processed in the same way as in the "Text" approach) plus the EWN information (pre-processed in the same way as in the "EWN" approach).
• EWN-FCA: Each tweet is represented with the EWN-FCA based information (obtained from the FCA-based modelling) related to the terms appearing in the tweet. That is, the content modelling in section 2.2 is applied to represent the tweet.
• Text + EWN-FCA: Each tweet is represented with the textual content (pre-processed as in the "Text" approach) plus the EWN-FCA based information (pre-processed as in the "EWN" approach).

Table 3. Tweet representation example

In order to identify the topics addressed in the tweets of each entity, Hierarchical Agglomerative Clustering (HAC) has been applied separately to the tweets (represented according to the aforementioned configurations) related to each one of the 61 entities. That is, for each entity HAC has been applied individually to its related tweets, resulting in 61 different clustering results, one per entity. The same methodology has been applied to all the data representation approaches, so the final performance depends only on the data representations.

3.2 Results

The results presented here are obtained by applying the Replab evaluation framework: gold standard, evaluation script and official measures. Results are expressed in terms of Reliability (R), Sensitivity (S) and the F-measure of both (i.e., a weighted harmonic mean of R and S). More information about R and S, as well as their formal definition, can be found in [2].
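The per-entity clustering step of section 3.1 can be sketched as follows. The text does not state the similarity function, the linkage criterion or the stopping rule, so this Python sketch assumes Jaccard similarity over each tweet's feature set, average linkage and a similarity threshold; the data, the threshold value and the function names (`jaccard`, `hac`, `cluster_per_entity`) are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity between two feature sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def hac(tweets, threshold):
    """Agglomerative clustering of one entity's tweets.

    tweets: dict mapping tweet id -> set of features.
    Merging stops when the best average-linkage similarity drops below
    `threshold` (an assumed stopping rule).
    """
    clusters = [{tweet_id} for tweet_id in tweets]

    def linkage(c1, c2):
        sims = [jaccard(tweets[a], tweets[b]) for a in c1 for b in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best, pair = -1.0, None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                sim = linkage(clusters[i], clusters[j])
                if sim > best:
                    best, pair = sim, (i, j)
        if best < threshold:
            break
        i, j = pair
        clusters[i] |= clusters[j]  # merge the most similar pair
        del clusters[j]
    return clusters


def cluster_per_entity(tweets_by_entity, threshold=0.3):
    """Run HAC independently for every entity, one clustering per entity."""
    return {entity: hac(tweets, threshold)
            for entity, tweets in tweets_by_entity.items()}


# Invented example: three tweets about a single entity.
tweets_by_entity = {
    "BMW": {
        "t1": {"synonym_auto", "hyperonym_vehicle"},
        "t2": {"synonym_auto", "hyperonym_vehicle", "hyperonym_engine"},
        "t3": {"hyperonym_money"},
    }
}
topics = cluster_per_entity(tweets_by_entity)
```

Because the same clustering routine is reused for every representation, only the feature sets passed in change between configurations, which mirrors the experimental design described above.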
Briefly explained, Reliability and Sensitivity consider a binary relationship between pairs of items: relatedness, which means that two items belong to the same cluster. Reliability is therefore defined as the precision of the relationships predicted by the system with respect to those derived from the gold standard, and Sensitivity is similarly defined as the recall of these relationships. In essence, these measures are equivalent to BCubed Precision and Recall. In more detail, Reliability (2) and Sensitivity (3) are defined as:

$$R = \mathrm{Avg}_{i}\left[ P\left( rel_{gold}(i, j) \mid rel_{sys}(i, j) \right) \right] \qquad (2)$$

$$S = \mathrm{Avg}_{i}\left[ P\left( rel_{sys}(i, j) \mid rel_{gold}(i, j) \right) \right] \qquad (3)$$

where $rel_{gold}(i, j)$ represents that $i$ and $j$ belong to the same cluster in the gold standard and $rel_{sys}(i, j)$ is analogous but applied to the system output. The final R and S measures are the average of the individual R(i) and S(i) for the 61 entities in the dataset. R and S are combined with the Micro-average F-measure: the final F-measure values (i.e., the ones shown in the results) are the average of the F-measures F(i) computed from the individual R(i) and S(i) values for each of the 61 entities in the dataset.

Table 4 shows the results obtained with the RepLab Official Evaluation Environment. For the values denoted by †, the approach is significantly better than the textual approach according to a Wilcoxon test with a p-value of 0.05, and for the values denoted by ‡, the approach is significantly better than the EWN approach. The textual-based approach can be seen as a baseline: no extra information is added. The EWN experiment can also be considered a baseline against which to test the improvement of the FCA-based EWN representation.

Table 4. Topic Detection Results

The first remark is the high performance of the text-based representation. Not surprisingly, the text is confirmed as a strong signal for detecting thematically based topics. If only EWN information is considered, it is able to outperform the text-based results.
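As a rough illustration of the evaluation measures of section 3.2, the Python sketch below relies on the stated equivalence with BCubed Precision and Recall: for each tweet it compares the set of tweets sharing its system cluster with the set sharing its gold cluster. It is not the official RepLab evaluation script; the equal weighting of R and S in the F-measure, the example data and the function names are assumptions.

```python
def reliability_sensitivity(system, gold):
    """BCubed-style scores for one entity.

    system, gold: dicts mapping each item to a cluster identifier.
    Returns (R, S, F) where F is the harmonic mean of R and S
    (equal weights are an assumption).
    """
    items = list(gold)
    r_total = s_total = 0.0
    for i in items:
        same_sys = {j for j in items if system[j] == system[i]}
        same_gold = {j for j in items if gold[j] == gold[i]}
        agreed = len(same_sys & same_gold)
        r_total += agreed / len(same_sys)   # precision of predicted relations
        s_total += agreed / len(same_gold)  # recall of gold relations
    r = r_total / len(items)
    s = s_total / len(items)
    f = 2 * r * s / (r + s) if r + s else 0.0
    return r, s, f


def average_over_entities(per_entity_scores):
    """Average R, S and F over entities, as done for the reported results."""
    n = len(per_entity_scores)
    return tuple(sum(score[k] for score in per_entity_scores) / n
                 for k in range(3))


# Invented example: four tweets, two gold topics, one over-merged system cluster.
GOLD = {"t1": "A", "t2": "A", "t3": "B", "t4": "B"}
SYSTEM = {"t1": "x", "t2": "x", "t3": "x", "t4": "y"}
r, s, f = reliability_sensitivity(SYSTEM, GOLD)
```

In this example the over-merged cluster lowers Reliability (2/3) while the split gold topic lowers Sensitivity (3/4), which is the same trade-off between bigger, less precise clusters and coverage that appears in the discussion of the results.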
That EWN information alone outperforms text seems reasonable, given that EWN information covers the same aspects as the textual information but with a better representation (through the EWNen and EWNes structure). Nevertheless, if EWN information is used together with textual information, results are worse than those obtained with either source separately: EWN and textual information cover the same aspects, so applying both together leads to a redundant representation with no improvement in the task. This latter issue is reflected in the reliability (precision-based) and sensitivity (recall-based) values. Because of the redundancy, more features (textual- or EWN-based) are shared by the tweets, even though no new relations between the tweets have been discovered. Therefore, the clustering process generates bigger clusters with more coverage but less precision.

The best performing approach is the one applying the FCA-based proposal (EWN-FCA). As we hypothesized, the FCA-based model is able to better organize the EWN information, facilitating the identification of the most valuable information. When applied to content modelling, it leads to a more informative representation, achieving a performance improvement for the proposed topic detection task. On the other hand, FCA-based information plus text yields poor results. This can again be explained by the fact that using EWN information (even when FCA-modelled) together with textual information includes redundant data, leading to imprecise results.

To sum up, the FCA-based proposal seems to be a feasible way to create a better knowledge organization of linguistic resources, leading to a better data representation and obtaining higher performance for a data organization task like Topic Detection. Linguistic-based information adds valuable information not originally covered by the Twitter data. The inclusion of EWN information achieves some improvement; however, it is when the FCA-based approach is applied that the most interesting information is extracted.

4. Related Work

Knowledge is defined as Information plus Context [15], so Knowledge Organization (KO) refers to the organization of information according to the specific aspects involved in a given context. Knowledge Organization aims to achieve better access to the information based on the structures or patterns inferred from the resulting organization. Several resources and technologies have been proposed for that purpose, such as RDF representations [28], social tagging [21], relational databases [27], linguistic databases [10] or faceted classification approaches [23].

In the literature, FCA in relation to knowledge-based resources has mainly been used to create [20, 22] or merge ontologies [30]. In contrast, this work intends to restructure already existing knowledge-based (i.e., linguistic) resources. In this regard, FCA has already been proposed for use with knowledge-based resources like WordNet [12, 18, 33], ontologies [11] or semantic-based data representations like DBPedia [8, 17]. All of these applications are based on the creation of an extra knowledge layer on top of these resources, which is expected to enable a better formalization and structuring of the knowledge contained in them. In the same way that FCA applied to textual contents provides a richer content organization, FCA applied to knowledge-based contents should result in a richer knowledge organization. As claimed in [17], it can be assumed that the structure derived from the application of FCA to knowledge-based resources could create a more abstract concept layer, valuable for understanding relationships, inherent data structures, implications or dependencies. In this sense, the work in [25] and other similar ones [18, 24, 26] present an FCA-based mathematical representation of the information contained in WordNet.
The main differences of the work presented here are: 1) the FCA-based representation is created by taking into account all the information available in the knowledge domain (i.e., the EuroWordNet database), instead of taking only a subset of data related to a specific environment (e.g., query results). Consequently, a more informative representation is expected. 2) The proposal is evaluated by applying the generated representation to a specific task; that is, it is used to create a content modelling that is tested in a topic detection task. In contrast, other works only used the generated representation as a database to expand the information already known about the contents, or conducted an experimental analysis over the obtained representation.

5. Conclusions

In this paper we have presented a Knowledge Organization approach that applies a concept-based modelling (Formal Concept Analysis). The hypothesis was that the structure derived from the application of FCA could better organize the knowledge contained in a linguistic resource (EuroWordNet). It was expected that this better knowledge organization would improve the inference of valuable information from this kind of linguistic database. In order to test this claim, we proposed an experimental framework based on a data organization task: the Replab 2013 Topic Detection task. Since the task is based on clustering similar tweets according to their contents, it requires an accurate description of the tweet content. In this context, we applied the FCA-based improved version of EWN for content enrichment and compared its performance to that achieved by a textual baseline and by the application of the EWN information without the organization provided by FCA. Our initial hypothesis has been confirmed by the obtained results.
Whereas unorganized EWN information was not able to provide much more valuable information than that already contained in the tweets, the FCA-based approach was able to do so: the best performance is obtained by the FCA-modelled EWN information. Special attention should be paid to the combination of textual and EWN information (Text + EWN and Text + EWN-FCA): even though EWN information (FCA-modelled or unmodelled) leads to an accurate data representation, outperforming the textual approach, when it is used in conjunction with textual information it results in a less precise, and consequently worse, data representation, due to the redundancy included (i.e., EWN data and text ultimately cover the same aspects).

To summarize, when knowledge-based information is taken into consideration in order to enrich and model a set of data, not only is the information itself important, but also the way in which it is included. In this sense, some prior organization of the knowledge, like the one presented in this paper, should be carried out. Although the proposal has only been tested in a Topic Detection task, it can be applied to any task that depends on an accurate data representation. We particularly believe that its application to content-based recommenders may be interesting, given that they mostly base their operation on the accuracy of the item representation.

Acknowledgments

This work has been partially supported by the Spanish project VOXPOPULI (TIN2013-47090-C3-1-P).

References

[1] AMIGÓ, E., CARRILLO DE ALBORNOZ, J., CHUGUR, I., CORUJO, A., GONZALO, J., MARTÍN, T., MEIJ, E., RIJKE, M., AND SPINA, D. Overview of RepLab 2013: Evaluating Online Reputation Monitoring Systems. In Information Access Evaluation. Multilinguality, Multimodality, and Visualization, P. Forner, H. Muller, R. Paredes, P. Rosso, and B. Stein, Eds., vol. 8138 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2013, pp. 333–352.
[2] AMIGÓ, E., GONZALO, J., AND VERDEJO, F.
A general evaluation measure for document organization tasks. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval (New York, NY, USA, 2013), SIGIR ‘13, ACM, pp. 643–652. [3] ANTA, A. F., CHIROQUE, L. N., MORERE, P., AND SANTOS, A. Sentiment Analysis and Topic Detection of Spanish Tweets: A Comparative Study of of NLP Techniques. Procesamiento del Lenguaje Natural 50, 0 (2012). [4] BELOHLAVEK, R. Introduction to formal concept analysis. Olomouc, UPOL, Faculty of Science, Department of Computer Science (2008). [5] BENTIVOGLI, L., FORNER, P., MAGNINI, B., AND PIANTA, E. Revising the wordnet domains hierarchy: semantics, coverage and balancing. In Proceedings of the Workshop on Multilingual Linguistic Ressources (2004), Association for Computational Linguistics, pp. 101–108. [6] BERROCAL, LUIS, J., FIGUEROLA, C. G., AND RODRÍGUEZ, A. Z. REINA at RepLab2013 Topic Detection Task : Community Detection. In RepLab 2013, in CLEF Working Notes (2013). [7] CARPINETO, C., AND ROMANO, G. Concept data analysis: Theory and applications. John Wiley & Sons, 2004. [8] CASTELLANOS, A., GARCÍA-SERRANO, A., AND CIGARRÁN, J. Linked Data-based Conceptual Modelling for Recommendation: A FCA-based Approach. In E-Commerce and Web Technologies, M. Hepp and Y. Hoffner, Eds., vol. 188 of Lecture Notes in Business Information Processing. Springer International Publishing, 2014, pp. 71–76. [9] CHEN, S.-J., AND CHEN, H.-H. Mapping multilingual lexical semantics for knowledge organization systems. Electronic Library, The 30, 2 (2012), 278–294. [10] CHIARCOS, C., CIMIANO, P., DECLERCK, T., AND MCCRAE, P. J. Proceedings of the 2nd Workshop on Linked Data in Linguistics (LDL-2013): Representing and linking lexicons, terminologies and other language data. Association for Computational Linguistics, 2013, ch. Linguistic Linked Open Data (LLOD). Introduction and Overview, pp. i – xi. [11] CIMIANO, P., HOTHO, A., AND STAAB, S. 
Learning concept hierarchies from text corpora using formal concept analysis. J. Artif. Int. Res. 24, 1 (Aug. 2005), 305–339. [12] FALK, I., AND GARDENT, C. Combining formal concept analysis and translation to assign frames and semantic role sets to french verbs. Annals of Mathematics and Artificial Intelligence 70, 1-2 (2014), 123–150. [13] FELLBAUM, C. WordNet: An Electronic Lexical Database. Bradford Books, 1998. [14] GANTER, B., AND WILLE, R. Formal concept analysis: mathematical foundations. Springer-Verlag New York, Inc., 1997. [15] GRADMANN, S. Knowledge=Information in Context: on the Importance of Semantic Contextualisation in Europeana. [The Hague]: Europeana Office, 2010. [16] KENT, R. E. The IFF foundation for ontological knowledge organization. Cataloging & classification quarterly 37, 1-2 (2003), 187–203. [17] KIRCHBERG, M., LEONARDI, E., TAN, Y. S., LINK, S., KO, R. K., AND LEE, B. S. Formal concept discovery in semantic web data. In Formal Concept Analysis. Springer, 2012, pp. 164–179. [18] LEE, M. C., LIU, Z. L., CHEN, H. H., LAI, J. B., AND LIN, Y. T. Fca based concept constructing and similarity measurement algorithms. In Advanced Information Management and Service (IMS), 2010 6th International Conference on (2010), IEEE, pp. 384–388. [19] LUCA, E. W. D., AND NÜRNBERGER, A. LexiRes: A Tool for Exploring and Restructuring EuroWordNet for Information Retrieval. In Proceedings of the Workshop on Text-based Information Retrieval (TIR-06). In conjunction with the 17th Conference on Artificial Intelligence (ECAI’06). Riva del Garda, Italy (2006). [20] MAIO, C. D., FENZA, G., GAETA, M., LOIA, V., ORCIUOLI, F., AND SENATORE, S. RSS-based e-learning recommendations exploiting fuzzy FCA for Knowledge Modeling. Applied Soft Computing 12, 1 (2012), 113 – 124. [21] MATTHEWS, B., JONES, C., PUZON, B., MOON, J., TUDHOPE, D., GOLUB, K., AND NIELSEN, M. L.
An evaluation of enhancing social tagging with a knowledge organization system. In Aslib Proceedings (2010), vol. 62, Emerald Group Publishing Limited, pp. 447–465. [22] MCCALLUM, A. Information Extraction: Distilling Structured Data from Unstructured Text. Queue 3, 9 (Nov. 2005), 48–57. [23] MÉNARD, E. Ordinary image retrieval in a multilingual context: A comparison of two indexing vocabularies. In Aslib proceedings (2010), vol. 62, Emerald Group Publishing Limited, pp. 428–437. [24] POELMANS, J., ELZINGA, P., VIAENE, S., AND DEDENE, G. Formal concept analysis in knowledge discovery: a survey. In Proceedings of the 18th international conference on Conceptual structures: from information to intelligence (Berlin, Heidelberg, 2010), ICCS‘10, Springer-Verlag, pp. 139–153. [25] PRISS, U. Lattice-based information retrieval. Knowledge Organization 27, 3 (2000), 132–142. [26] PRISS, U. Linguistic Applications of Formal Concept Analysis. In Formal Concept Analysis, B. Ganter, G. Stumme, and R. Wille, Eds., vol. 3626 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2005, pp. 149–160. [27] SINGH, S. K. K., LIM, L. H. S., MERICAN, A. F., AND DIMYATI, K. Biodiversity information retrieval across networked data sets. Aslib Proceedings 62, 4/5 (2010), 514–522. [28] SMETHURST, M., AND SCOTT, T. Building coherence at In Aslib Proceedings (2010), vol. 62, Emerald Group Publishing Limited, pp. 476–488. [29] SPINA, D., GONZALO, J., AND AMIGÓ, E. Learning Similarity Functions for Topic Detection in Online Reputation Monitoring. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (New York, NY, USA, 2014), SIGIR ‘14, ACM, pp. 527–536. [30] WANG, Y., DU, Y., AND CHEN, S. The Understanding between Two Agent Crawlers based on Domain Ontology. In Computational Intelligence and Natural Computing, 2009. CINC ‘09. International Conference on (June 2009), vol. 1, pp. 47–50. [31] WILLE, R. 
Concept lattices and conceptual knowledge systems. Computers & mathematics with applications 23, 6 (1992), 493–515. [32] WILLE, R. Restructuring Lattice Theory: An Approach Based On Hierarchies Of Concepts, vol. 5548 of Lecture Notes in Computer Science. Springer Berlin Heidelberg, 2009. [33] ZHANG, L., PEI, Z., AND CHEN, H. Extracting fuzzy linguistic summaries based on including degree theory and fca. In Foundations of Fuzzy Logic and Soft Computing. Springer, 2007, pp. 273– 283.

Proceedings of the 13th Conference of the German Section of the International Society for Knowledge Organization (ISKO) and the 13th International Symposium of Information Science of the Higher Education Association for Information Science (HI), Potsdam (19–20 March 2013): ‘Theory, Information and Organization of Knowledge’ | Proceedings of the 14th Conference of the German Section of the International Society for Knowledge Organization (ISKO) and Natural Language & Information Systems (NLDB), Passau (16 June 2015): ‘Lexical Resources for Knowledge Organization’ | Proceedings of the Workshop of the German Section of the International Society for Knowledge Organization (ISKO) at SEMANTICS, Leipzig (1 September 2014): ‘Knowledge Organization and Semantic Web’ | Proceedings of the Workshop of the Polish and German Sections of the International Society for Knowledge Organization (ISKO), Cottbus (29–30 September 2011): ‘Economics of Knowledge Production and Organization’