David Haynes, Understanding Personal Online Risk to Individuals Via Ontology Development in:

International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (Ed.)

Knowledge Organization at the Interface, page 171 - 180

Proceedings of the Sixteenth International ISKO Conference, 2020 Aalborg, Denmark

1. Edition 2020, ISBN print: 978-3-95650-775-5, ISBN online: 978-3-95650-776-2,

Series: Advances in Knowledge Organization, vol. 17

Bibliographic information
David Haynes – Edinburgh Napier University, United Kingdom

Understanding Personal Online Risk to Individuals via Ontology Development

Abstract: This paper describes the development of an ontology of risk as a way of better understanding the nature of the potential harms individuals are exposed to when they disclose personal data online. The ontology was designed to be compatible with BFO, the Basic Formal Ontology, which is intended to promote interoperability. Ontologies from domains such as genetics and medical research are in many instances designed to conform to BFO. An initial exercise to monitor the online activity of six participants from the library and information services community helped to identify the points at which personal data is disclosed during online activity. It also explored the motivations for these disclosures, by questioning participants about their perceptions of risk. The resulting analysis suggested that an ontology would be better than a typology to represent the complex relationships between risk concepts. Terms were also extracted from existing terminologies. Risk scenarios were developed and tested during a formative workshop and incorporated into the ontology. A potential application of the ontology is to identify clusters of risk and map the factors that contribute to specific risks.

1.0 Introduction

This research arose from an investigation into the nature of the risks associated with online disclosure of personal information. Interactions with online systems and social media platforms use an economic model based on the sale of personal data (Enders et al. 2008). For instance, online behavioural advertising has been a remarkably effective model that has led to the growth of companies such as Facebook, which was able to announce profits of $18.5 billion on revenue of $70.7 billion in 2019 (Facebook 2020). In return for disclosing personal data, individuals gain ‘free’ access to online services.
When faced with risk, feelings should be considered alongside rational decision making (Loewenstein et al. 2001; Finucane and Holup 2006). Behaviour models tend to emphasise conscious, rational decision-making during online transactions involving personal data (Kehr et al. 2015). This has been characterised by many researchers as the ‘privacy calculus’, in which the perceived benefits are judged to outweigh the perceived risks of disclosure.

Individual risk and public safety are a focus for current UK government policy (DCMS 2019). In the European Union privacy concerns have been reflected in the General Data Protection Regulation (GDPR) (European Parliament 2016).

The purpose of this research is to understand the nature of the risks faced by individuals when they conduct online transactions. The description and categorizing of risks may help with the delivery of more effective mechanisms for managing those risks. Baldwin et al. (2010) argue that the purpose of regulation is to manage risk. Although legislation is the primary means of regulation adopted by government, it is not the full picture. Lessig (2006) encapsulated one aspect of internet regulation by the phrase “Code is Law”. The way in which systems are designed affects the way in which they operate. Cavoukian (2012) extended this idea with the concept of ‘privacy by design’. Haynes et al. (2016) go on to suggest that a number of regulatory mechanisms (coding, self-regulation, market response and law) work in concert to regulate access to personal data on social networks. Mapping the risks and their relationship with causes and effects may produce better insights into effective responses to this public safety issue.

This research sets out to examine the nature of the risks faced by individuals when they engage in online activity. The research considers the following questions:

• What is the nature of the risks that individuals face when using the internet?
• Is there an existing typology of online risk?
• Can an ontology of risk be developed to represent risk relationships more effectively than previous typologies of risk?

2.0 Literature review

2.1 Nature of online risk

Risk is an elusive concept, the definition of which depends on the context (Fischhoff, Watson, and Hope 1984). Aven and Renn (2009, 2) define risk in the following terms:

A. Risk is expressed by means of probabilities and expected values
B. Risk is expressed through events/consequences and uncertainties

Simply put, risk is the “effect of uncertainty on objectives” (ISO 2009, 1). Risk applies to individuals, organizations, governments and societies. When considering the risk to individuals it is necessary to make a distinction between risks to personal privacy and risks associated with disclosing personal data (e.g. via data breaches, as well as voluntary disclosure).

The privacy calculus captures the concept of perceived individual risk as well as benefits associated with disclosure of personal data (Dinev and Hart 2006). Studies have found that there is an inverse correlation between severity of perceived risks and willingness to disclose personal data (Dinev and Hart 2006). Some studies have described the apparently paradoxical result where individuals disclose personal data despite perceived dangers associated with doing so (Gimpel, Kleindienst, and Waldmann 2018). Privacy paradox studies tend to depend on interviews with individuals about what they would do in hypothetical situations (Gimpel, Kleindienst, and Waldmann 2018; Min and Kim 2015). Work by Acquisti and Grossklags (2005) suggested that there is a discrepancy between intention and actual behaviour.

2.2 Using an ontology to describe risk

This research initially set out to develop a taxonomy of risk based on harm to individuals. This would allow hierarchical relationships between concepts. Entities in a taxonomy can be grouped by common origin (phylogeny) or by similarity (morphology) (Gnoli 2017).
Solove (2006) provides a classification of harms, which is a starting point for categorizing risks. These largely predate the advent of social media and need to be updated to incorporate the spectrum of online harassment, which can range from bullying through to hate speech. Skinner, Han, and Chang (2006) developed a taxonomy of risk based on three dimensions or views: time, space and matter. This was specifically developed in the context of collaborative environments and needs validation with empirical data. Wright and Raab (2014, 290–91) identify examples of harms based on privacy principles. Both feed into an initial identification of online harms. Haynes and Robinson (2015) set these risks in a network of interconnected risks and consequences.

2.3 Complexity of relationships and ontologies

The decision to use an ontology was based on the ability to define classes of concept and to describe different types of relationship between those classes. Ontology development has been extensive in the biomedical area and this provides a corpus of experience that can be applied elsewhere. Some attention has been paid to other domains such as project management, business processes and cyber security, either using ontologies as a tool for risk assessment (McKone and Feng 2015; Mohammad et al. 2015) or as a means of mapping the relationships between different elements of risk and specific instances of risk events.

Perhaps the most directly relevant work is the review of ontologies covering cyber risk by Oltramari and Kott (2018). The ontologies reviewed seemed to emphasise vulnerabilities and exploitation by an attacker. There was less emphasis on the concepts of likelihood and impact, which were included in only 3 of the 10 ontologies reviewed. The authors highlight the problem of estimating probabilities and impact levels in a dynamic environment where the behaviour of a target affects the outcomes.
So, for instance, if a targeted organization improves its security measures, a potential attacker will switch their attention to another, more vulnerable target. They also speculate that it is impossible to determine the outcomes without knowing more about the motivation of the attackers.

An ontology of online risk needs to reflect the complex nature of risk and the need to incorporate concepts such as: Vulnerability, Threat, Incident, Consequence, Harm and Response. Some of these classes also have properties that are defined in their schemas. For example, it might be useful to incorporate the idea of impact of a Harm or the probability of an Incident into the description of a risk scenario.

3.0 Methods

3.1 Creation of the ontology

Early prototyping used the Graphite system provided via the Synaptica interface. This was intuitive and allowed experimentation with different data formats and development of schemas. This development environment allows export into an OWL-compatible system so that it can plug into high-level ontologies such as the Basic Formal Ontology (BFO). The ontology development was based on the approach described by Arp et al. (2015), who describe four general principles of ontology design:

1. Realism – an ontology is a representation of reality, which is supported by evidence and observation
2. Perspectivalism – reality is too complex to be represented by a single approach. Ontologies should therefore aim to be relevant and accurate within a specified domain
3. Fallibilism – an ontology will change as our understanding and knowledge of a domain develops. It is therefore necessary to be able to keep track of different versions of an ontology and the changes made
4. Adequatism – room must be made for all the types of entity that exist within the domain of the ontology

Arp et al. (2015, 44) suggest that ontologies are representations of reality rather than models of reality based on mental concepts:

    Realism in ontology is based further on the idea that with the aid of science we can come to know the general features of reality in the form of universals and the relations between them. This realist approach has a number of general consequences. First, it implies that ontologies are representations of reality, not of people’s concepts or mental representations or uses of language.

This presents some real challenges in dealing with human behaviour and motivations. When looking at privacy this research is concerned with motivations to disclose personal data online and the harms (and benefits) that might result. The harms themselves may depend on the perceptions of the individual, so that similar events might be viewed very differently by different individuals. What is the ‘reality’ we are trying to represent with this ontology? The fact of people’s perceptions is a reality that is captured in attitudinal surveys. They provide a snapshot of what people thought at a particular point in time, and of course they may change in light of experience, better understanding of online harms or education about privacy risks. Risk can be seen as part of the ontology of social reality rather than objective reality, because it depends on agency: “risk belongs to this subjective ontology [of social reality]. Thus, risks are real, but only insofar as there is a social reality in which subjects engage in risk taking.” (Merkelsen 2011, 894).

3.2 Choice of software

The ontology was designed to be hospitable to RDF data to allow for import from other ontologies and export of the resulting ontology to new environments. The Protégé system developed at Stanford was considered as a suitable platform because it is widely used and has an active community of developers. It supports OWL, which is a W3C standard.
The Synaptica Graphite system was also considered for this exercise and was eventually selected because of its terminology management features and the support available to the researcher.

3.3 Development of the ontology

The methodology for development of the ontology was described in a previous paper (Haynes 2019). Noy and McGuinness’s (2001) iterative approach was adopted and applied to the seven-step method for ontology development of Arp et al. (2015).

3.4 Testing and validation

The ontology design was tested in a workshop with 14 researchers and practitioners with backgrounds in knowledge organization, information governance, cybersecurity and information science. Participants worked in groups to examine the proposed representation of risk and to provide critiques to refine it.

An initial set of risk incident types was incorporated into the ontology as a set of scenarios, based on standard definitions and on descriptions in the literature. A degree of normalisation was required for consistency. Seminar participants were asked to explore risk scenarios to identify the consequences and harms that could result from each type of incident. They were also asked to consider the causes that contributed to the incident. The responses were consolidated and expressed as relationships, which were entered into the ontology. The relationship network was then explored and graphs generated to illustrate the connection between different entities in the ontology.

3.5 Visualization of graphs

The graphs representing the relationships were shown using the visualization tool within the Synaptica Graphite system. This is an interactive system that allows exploration of the relationship between nodes and navigation through the landscape of risks, their causes and consequences.

4.0 Results

4.1 Scope of the ontology

The scope of the ontology was defined during the early stage of the project and was based on the overall objective of better understanding risk to individuals.
The scope of the ontology is described more fully in Haynes (2019, 171–72) and can be summarised as follows:

The ontology covers online hazards faced by online users and the resulting consequences and harms to the individual. It shows the cause and effect relationships between threats, incidents and consequences of disclosing personal data online. The main purpose of the ontology is to map different types of hazards that individuals face and the possible mitigating actions that they could take. It will also identify similarities between different hazards and identify ways in which they might be addressed.

4.2 Evolution of the representation of risk in the light of feedback

During the workshop, the initial representation was endorsed with some modifications to align it more closely with the cybersecurity view of risk rather than the project management view. Figure 1 shows the revised representation of risk, which incorporates feedback from the formative workshop. Risk is now defined in terms of threats that exploit vulnerabilities in systems. The threats could be malicious or accidental. Risk events are classed as Incidents. As well as mitigating actions to lessen the impact of an incident, there are also avoiding actions and defending actions to reduce the likelihood of an incident and to reduce or eliminate the threat and/or vulnerability of a system.

Figure 1: Modified Representation of Risk

There was some discussion about whether consequence and harm should be separated. Examination of instances of this representation suggests that it is useful to distinguish between the consequence of an incident and the harm to an individual. For example, during a data breach incident, personal bank account details might fall into the hands of criminals and the harm to the individual might be loss of money. The harm is not necessarily realised because the bank may take mitigating action, or the criminals might fail to exploit the data.
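
The distinction between consequence and harm can be made concrete with a small sketch. The Python fragment below is illustrative only and is not part of the ontology as built in Graphite. The class names follow the concept classes in the revised representation, while the function potential_harms, the field names and the label 'Use of online banking' are assumptions introduced for this example. It models the bank-details case, in which a harm remains possible until a mitigating response removes it.

```python
from dataclasses import dataclass, field


@dataclass
class Harm:
    label: str


@dataclass
class Consequence:
    label: str
    # Harms this consequence could lead to; none is certain to be realised.
    may_lead_to: list[Harm] = field(default_factory=list)


@dataclass
class Incident:
    label: str
    threat: str         # malicious or accidental cause
    vulnerability: str  # weakness that the threat exploits (hypothetical label)
    consequences: list[Consequence] = field(default_factory=list)


def potential_harms(incident: Incident, mitigated: set[str]) -> list[str]:
    """Return labels of harms still possible after mitigating responses."""
    return [
        harm.label
        for consequence in incident.consequences
        for harm in consequence.may_lead_to
        if harm.label not in mitigated
    ]


breach = Incident(
    label="Data breach",
    threat="Data theft",
    vulnerability="Use of online banking",
    consequences=[
        Consequence(
            label="Bank account details obtained by criminals",
            may_lead_to=[Harm("Loss of money")],
        )
    ],
)

# No response taken: the harm remains possible.
unmitigated = potential_harms(breach, mitigated=set())
# The bank takes mitigating action: the harm is no longer expected.
after_bank_action = potential_harms(breach, mitigated={"Loss of money"})
```

With no mitigation the sketch reports 'Loss of money' as a possible harm. Once the bank's action is recorded the list is empty, which mirrors the point that a consequence does not guarantee a harm.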

4.3 Scenarios

A set of scenarios was developed from reports in the literature, the case studies conducted with the volunteers and development of scenarios during the workshop. Table 1 lists the scenarios used to test different types of risk faced by individuals. They were used to explore the relationships between the causes of a risk and its consequences and these were captured in the ontology.

Table 1 - Scenarios used to develop the ontology

Risk                        Incident scenario
CLICK-BAIT                  Fall down a click-bait rabbit hole
CLOUD STORAGE               Data breach of cloud documents
DIGITAL ASSISTANTS          Digital assistant self-launches
ILLEGAL SITE                Visit an illegal site
LOCATION TRACKING           Location tracking made public
NON-SECURE SITE             Land on non-https site
ONLINE BANKING              Bank login details revealed
ONLINE PURCHASES            Data breach of online purchase transaction
OUT-OF-DATE SOFTWARE        Use out of date software
PHISHING                    Respond to phishing email
PICTURES ON SOCIAL MEDIA    Hostile response to photo posted on social media
PROFESSIONAL NETWORKS       Employer discovers job-seeking activity
RE-USE OF PASSWORDS         Re-used password is detected

4.4 Exploring the network of relationships

The modified representation of risk is based on different relationships between the concept classes. Table 2 shows the classes and their relationships within the ontology. Many of these relationships have reciprocals. So, for instance, the top term ‘Psychological harm’ in the ontology scheme Harm has narrower terms: ‘Annoyance’, ‘Fear’ and ‘Worry’. Each of these has a reciprocal broader term relationship with ‘Psychological harm’.
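
As a minimal sketch of how reciprocal relationships might be maintained, the Python fragment below records the broader-term link automatically whenever a narrower term is added. The Scheme class and its method names are hypothetical and do not reflect how Graphite stores terms; only the term labels come from the Harm scheme described above.

```python
from collections import defaultdict


class Scheme:
    """A hypothetical container for the terms of one ontology scheme."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.broader: dict[str, str] = {}                # term -> its broader term
        self.narrower: dict[str, list[str]] = defaultdict(list)  # term -> narrower terms

    def add_narrower(self, broad: str, narrow: str) -> None:
        """Record 'narrow' under 'broad' and store the reciprocal link as well."""
        self.narrower[broad].append(narrow)
        self.broader[narrow] = broad


harm = Scheme("Harm")
for term in ("Annoyance", "Fear", "Worry"):
    harm.add_narrower("Psychological harm", term)
```

Storing both directions in one call means that a broader/narrower pair cannot fall out of step, which is the property the reciprocal relationships rely on.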

Table 2 - Relationships allowed between concepts in different classes

Subject (class)    Predicate(s)        Object (class or property)
Consequence        broader/narrower    Consequence
Consequence        leadsTo             Consequence
Consequence        leadsTo             Harm
Harm               broader/narrower    Harm
Harm               hasProperty         Impact
Incident           broader/narrower    Incident
Incident           hasProperty         probability
Incident           leadsTo             Incident
Incident           leadsTo             Consequence
Incident           leadsTo             Threat
Response           broader/narrower    Response
Response           mitigates           Impact
Response           mitigates           Incident
Response           mitigates           Harm
Response           mitigates           Consequence
Response           mitigates           Threat
Response           mitigates           Vulnerability
Threat             exploits            Vulnerability
Threat             leadsTo             Incident
Vulnerability      broader/narrower    Vulnerability
Vulnerability      leadsTo             Incident

The visualization of the ontology demonstrates the complex relationships between concepts (Figure 2). For instance, ‘breach of cloud storage’ is a scenario in the Incident scheme. It is a consequence of ‘use of cloud services’ (a prerequisite in event tree analysis) and/or ‘data theft’. It leads to ‘loss of confidentiality’ and ‘consequential loss’. From Figure 2 we can see that Use of Cloud Services is a vulnerability and that it has a number of subordinate relationships. Data theft, on the other hand, is classed as a threat because it implies intent on the part of an agent.

Figure 2 – Causes and consequences of a breach of cloud storage

Going the other way, a breach of cloud storage could lead to loss of confidentiality, which in turn could lead to loss of reputation (a harm). Loss of confidentiality could also result from the self-launch of a digital assistant. There are likely to be other incidents that could lead to this consequence. The breach could also lead to a consequential loss resulting in financial loss to an individual, another harm. Some relationships are two-way.
For instance, ‘Breach of cloud storage’ could be both a cause and a consequence of ‘Loss of confidentiality’. This illustrates the greater richness of description that is possible using an ontology rather than a taxonomy.

4.5 Inferences from the ontology

As well as providing a helpful visual display of the relationships between different aspects of risk, the ontology allows navigation and exploration of different aspects of risk. This could be valuable in tracking relationships and identifying connections that are not immediately obvious on initial inspection. A possible development of this research would be to consider the eigenvector values of each node to determine closeness and identify potential clusters of concepts (Hansen, Shneiderman, and Smith 2011). This may reveal deeper structure in the set of scenarios in the ontology.

5.0 Discussion and conclusion

5.1 Addressing the research questions

The project set out to explore the nature of risks that individuals face when using the internet. One way of doing this is to develop a taxonomy. Taxonomies are based on hierarchical relationships and do not allow for the complex relationships between risk concepts. For this reason, an ontology was developed instead. It was based on pre-existing work as well as industry definitions of vulnerability and threats (Haynes 2019). However, these definitions tended to be focused on technical issues and consequences to systems or organizations. This ontology shifts the focus on to people and the impact of online incidents on individuals. It explores by means of scenarios the relationships between Vulnerabilities, Threats and Incidents and then the outcomes of Incidents in terms of Consequences and Harms. The ontology also includes Responses that could mitigate these risks.

5.2 Limitations

The analysis of scenarios is based on one researcher’s interpretation of data gathered from a small group of experts.
To some extent this is subjective and needs a more rigorous evaluation, possibly by means of a Delphi study. This would allow a panel of experts to arrive at a consensus about the concepts and relationships associated with the scenarios.

5.3 Future Development

The next stage of development for this ontology is to populate it with instances from a variety of sources, including reports in the press, incident data from data protection regulators and case studies in the literature. This would test how well the scenarios describe the reality of online risks to individuals. It would also provide the groundwork for creation of linked data sets, which could be analysed to inform policy on online safety.

Acknowledgement

This research was supported by the Royal Academy of Engineering and the Office of the Chief Science Adviser for National Security under the UK Intelligence Community Postdoctoral Fellowship Programme (Grant No. ICRF1718\1\54). The Graphite system used to develop the ontology was provided by Synaptica Ltd. The research was conducted during Dr Haynes’ Fellowship at the Department of Library and Information Science at City, University of London. Thanks to colleagues at Napier and the anonymous reviewers for their valuable comments and suggestions.

References

Acquisti, Alessandro, and Jens Grossklags. 2005. “Privacy and Rationality in Individual Decision Making.” IEEE Security & Privacy 3, no. 1: 26–33.
Arp, R., B. Smith, and A.D. Spear. 2015. Building Ontologies with Basic Formal Ontology. Cambridge, MA: MIT Press eBooks Library.
Aven, Terje, and Ortwin Renn. 2009. “On Risk Defined as an Event Where the Outcome is Uncertain.” Journal of Risk Research 12, no. 1: 1–11.
Baldwin, Robert, Martin Cave, and Martin Lodge. 2010. The Oxford Handbook of Regulation. Oxford Handbooks in Business and Management. Oxford: Oxford University Press.
Cavoukian, Ann. 2012. “Privacy by Design [Leading Edge].” IEEE Technology and Society Magazine 31, no. 4: 18–19.
DCMS. 2019. Online Harms White Paper.
Dinev, Tamara, and Paul Hart. 2006. “An Extended Privacy Calculus Model for E-Commerce Transactions.” Information Systems Research 17, no. 1: 61–80.
Enders, Albrecht, Harald Hungenberg, Hans-Peter Denker, and Sebastian Mauch. 2008. “The Long Tail of Social Networking.” European Management Journal 26, no. 3: 199–211.
European Parliament. 2016. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the Protection of Natural Persons with Regard to the Processing of Personal Data and on the Free Movement of Such Data, and Repealing Directive 95/46/EC (General Data Protection Regulation). 20160504&from=EN.
Facebook. 2020. Facebook Q4 2019 Results. Presentation-_final.pdf.
Finucane, Melissa L, and Joan L Holup. 2006. “Risk as Value: Combining Affect and Analysis in Risk Judgments.” Journal of Risk Research 9, no. 2: 141–64.
Fischhoff, Baruch, Stephen R Watson, and Chris Hope. 1984. “Defining Risk.” Policy Sciences 17, no. 2: 123–39.
Gimpel, Henner, Dominikus Kleindienst, and Daniela Waldmann. 2018. “The Disclosure of Private Data: Measuring the Privacy Paradox in Digital Services.” Electronic Markets 28, no. 4: 475–90.
Gnoli, Claudio. 2017. “Classifying Phenomena Part 2: Types and Levels.” Knowledge Organization 44: 37–54.
Hansen, Derek L, Ben Shneiderman, and Marc A Smith. 2011. Analyzing Social Media Networks with NodeXL: Insights from a Connected World. Burlington, MA: Morgan Kaufmann Publishers.
Haynes, David. 2019. “Creating an Ontology of Risk: A Human-Mediated Process.” In The Human Position in an Artificial World: Creativity, Ethics and AI in Knowledge Organization. ISKO UK Sixth Biennial Conference London 15-16th July 2019, edited by David Haynes and Judi Vernau, 167–80. Baden-Baden: Ergon Verlag GmbH.
Haynes, David, David Bawden, and Lyn Robinson. 2016. “A Regulatory Model for Personal Data on Social Networking Services in the UK.” International Journal of Information Management 36, no. 6: 872–82.
Haynes, David, and Lyn Robinson. 2015. “Defining User Risk in Social Networking Services.” Aslib Journal of Information Management 67, no. 1: 94–115.
ISO. 2009. ISO 31000:2009 Risk Management – Principles and Guidelines. Geneva: International Organization for Standardization.
ISO. 2011. ISO 25964-1:2011 - Information and Documentation – Thesauri and Interoperability with Other Vocabularies. Part 1: Thesauri for Information Retrieval. Geneva: International Organization for Standardization.
Kehr, Flavius, Tobias Kowatsch, Daniel Wentzel, and Elgar Fleisch. 2015. “Blissfully Ignorant: The Effects of General Privacy Concerns, General Institutional Trust, and Affect in the Privacy Calculus.” Information Systems Journal 25, no. 6: 607–35.
Lessig, Lawrence. 2006. Code. 2nd ed. New York; London: BasicBooks.
Loewenstein, George F, Elke U Weber, Christopher K Hsee, and Ned Welch. 2001. “Risk as Feelings.” Psychological Bulletin 127, no. 2: 267–86.
McKone, Thomas E, and Lydia Feng. 2015. “Building a Human Health Risk Assessment Ontology (RsO): A Proposed Framework.” Risk Analysis 35, no. 11: 2087–2101.
Merkelsen, Henrik. 2011. “The Constitutive Element of Probabilistic Agency in Risk: A Semantic Analysis of Risk, Danger, Chance, and Hazard.” Journal of Risk Research 14, no. 7: 881–97.
Min, Jinyoung, and Byoungsoo Kim. 2015. “How Are People Enticed to Disclose Personal Information Despite Privacy Concerns in Social Network Sites? The Calculus between Benefit and Cost.” Journal of the Association for Information Science & Technology 66, no. 4: 839–57.
Mohammad, Mahmud Abdulla, Ioannis Kaloskampis, Yulia Hicks, and Rossitza Setchi. 2015. “Ontology-Based Framework for Risk Assessment in Road Scenes Using Videos.” Procedia Computer Science 60, no. C: 1532–41.
NIST. 2019. National Vulnerability Database.
Noy, Natalya F., and Deborah L. McGuinness. 2001. Ontology Development 101: A Guide to Creating Your First Ontology. Stanford: Stanford CA.
Oltramari, Alessandro, and Alexander Kott. 2018. “Towards a Reconceptualisation of Cyber Risk: An Empirical and Ontological Study.” Journal of Information Warfare 17, no. 1: 49–73.
Skinner, Geoff, Song Han, and Elizabeth Chang. 2006. “An Information Privacy Taxonomy for Collaborative Environments.” Information Management & Computer Security 14, no. 4: 382–92.
Solove, Daniel J. 2006. “A Taxonomy of Privacy.” University of Pennsylvania Law Review 154, no. 3: 477–564.
Wright, David, and Charles Raab. 2014. “Privacy Principles, Risks and Harms.” International Review of Law, Computers & Technology 28, no. 3: 277–98.

The proceedings explore knowledge organization systems and their role in knowledge organization, knowledge sharing, and information searching.

The papers cover a wide range of topics related to knowledge transfer, representation, concepts and conceptualization, social tagging, domain analysis, music classification, fiction genres and museum organization. The papers discuss theoretical issues related to knowledge organization and the design, development and implementation of knowledge organizing systems as well as practical considerations and solutions in the application of knowledge organization theory. A range of knowledge organization systems is covered, from classification systems, thesauri and metadata schemas to ontologies and taxonomies.

