
Juan Bernardo Montoya-Mogollón, Sonia Troitiño, Digital Forensics Science and Knowledge Organization: An Interdisciplinary Approach to Addressing the Conceptual Challenges of Born-Digital Records in:

International Society for Knowledge Organization (ISKO), Marianne Lykke, Tanja Svarre, Mette Skov, Daniel Martínez-Ávila (Ed.)

Knowledge Organization at the Interface, page 302 - 309

Proceedings of the Sixteenth International ISKO Conference, 2020 Aalborg, Denmark

1. Edition 2020, ISBN print: 978-3-95650-775-5, ISBN online: 978-3-95650-776-2, https://doi.org/10.5771/9783956507762-302

Series: Advances in Knowledge Organization, vol. 17

Juan Bernardo Montoya-Mogollón – São Paulo State University – UNESP, Brazil
Sonia Troitiño – São Paulo State University – UNESP, Brazil

Digital Forensics Science and Knowledge Organization: An Interdisciplinary Approach to Addressing the Conceptual Challenges of Born-Digital Records

Abstract: The digital age has drastically changed the way people think and act when using technology. The advances of the 20th and 21st centuries have ushered in a number of changes in the treatment of information, and private and public organizations have faced increasing pressure to shift to digital formats. However, the speed of these changes has raised several conceptual concerns about the trustworthiness of digital records and systems and the security afforded by their infrastructure. This paper identifies and addresses several of these concerns through the interdisciplinary theoretical framework of Archival Science, Diplomatics, and Digital Forensics. This alternative approach is presented as a contribution to Knowledge Organization (KO) that can help reframe the way scholars understand the following problems posed by digital records: 1) the challenge of maintaining digital records as authentic, accurate, and reliable sources of information; 2) the challenge of establishing trustworthy repositories for information storage and access; and 3) the challenge of ensuring the long-term maintenance and preservation of records. We argue that the concepts and techniques of Archival Diplomatics and Digital Forensics, such as the principles of provenance and original order and the chain of custody, have much to contribute to the professional study and practice of KO in digital environments, particularly with regard to authenticity.

1.0 Introduction

In Archival Science, several terms, such as data, document, and information, are currently used as synonyms for ‘records’ in bureaucratic and archival settings.
Some of these terms were defined by the International Research on Permanent Authentic Records project (InterPARES).¹ We adopt them because they can be used interoperably in Archival-Diplomatics, Forensics Science, and Knowledge Organization (KO). This terminology is needed here because many studies use these words interchangeably, which can create ambiguity for our research problem. We will also look at several disciplines that use these concepts in their own fields, sometimes with a different meaning. Data is the smallest meaningful piece of information. A document is recorded information. A record is any document created (i.e., made or received and set aside – i.e., kept, saved – for action or reference) by a physical or juridical person in the course of practical activity as an instrument and by-product of that activity (in some countries, a record is called an archival document or simply a document). Information is a message intended for communication across space and time (Duranti and Thibodeau 2006, 15).

¹ InterPARES was founded in 1999 and “aims at developing the knowledge essential to the long-term preservation of authentic records created and/or maintained in digital form and providing the basis for the standards, policies, strategies, plans of action capable of ensuring the longevity of such material and the ability of its users to trust its authenticity” (www.interpares.org).

Consequently, we will use the concept of born-digital records. The academic literature also speaks of electronic records or electronic documents, but there is a small difference: ‘electronic’ usually refers to electric signals and is applied when we talk about hardware or technological devices (Prayudi and Azhari 2015, 1).
‘Digital’, by contrast, refers to the interaction of various hardware and software components that gives rise to binary digits: “Hardware and low-level software detects the physical properties and interprets them as binary digits – i.e., digits that can take only one of two possible values – which are called bits. By convention, we say that the two possible binary values are 1 and 0” (Lee 2012a, 511). We therefore use the term ‘born-digital’ for records created exclusively in digital environments, not as part of a digitization process.

Even today, almost forty years after digital records were introduced into business systems, it remains problematic to regard born-digital records as legally reliable. Archival Science is concerned with studying and proposing measures for their maintenance and long-term preservation. Maintenance and preservation were already an issue for analog (paper) records, but, unlike digital records, these did not need a machine to mediate the process. Archival Science is therefore constantly renewing itself, drawing on knowledge from other areas to understand the nature and composition of born-digital records and working to maintain their trustworthiness.

Figure 1. Interdisciplinarity of Archival Knowledge (Duranti 2010).

To this end, we intend to establish the relationship between digital Archival-Diplomatics, Digital Forensics Science, and KO. This contribution considers how Digital Forensics represents data storage in various technological systems and provides a knowledge framework for understanding the composition of born-digital records. With this knowledge, archivists will be better able to preserve records and provide access to them, while also drawing on the tools found in digital forensics.

2.0 Archival-Diplomatics Science

Archival Science deals with three core concepts: provenance, original order, and chain of custody:

“Two principles key to archival theory are the principle of provenance, which respects the documentary facts, maintaining the records of one creator separately from another, and the principle of original order, which mandates keeping/describing the records in the order in which they were created and used. When these principles are respected and articulated through archival description, the authenticity of the record aggregations is protected [18-20]. The presumption of authenticity derives initially from the context of creation and chain of custody, and the documented processes of establishing intellectual, administrative, and usually, physical control – appraisal, accessioning and archival arrangement” (Rogers 2013, 8).

As Guimarães and Tognoli (2015, 567) also note, provenance and original order, both of which belong to the principle of “respect des fonds”, are relevant to Archival Knowledge Organization (AKO). These concepts allow records to be identified in “its production context for planning its creation/production and treatment of its accumulation in the archives led the area to think over the identification as an archival process and the discussions about the place it occupies in the context of archival methodologies” (Tognoli and Rodrigues 2018, 46-47). Identification is thus key to the creation of knowledge, because it must apply the principles of provenance and original order.

Diplomatics Science, for its part, was founded in 1681 by Jean Mabillon (1632-1707), a French Benedictine monk who laid the foundations for analyzing the authenticity of diplomas (hence the name Diplomatics for the records of this period) kept in the Abbey of Saint-Germain-des-Prés. He wrote his masterpiece, De re diplomatica libri VI, at a time of social upheaval caused by document forgeries throughout Europe.
His work examined and compared the extrinsic (support, ink, seal, signature, etc.) and intrinsic (the intellectual message of the author) elements of the form of diplomas or records to verify their authenticity or falsity. In the 20th century, Diplomatics Science was renewed and adapted for Archival Science; the two scholars key to reviving the study of the relationship between Diplomatics and Archival Science were Hilary Jenkinson (1957) and Robert-Henri Bautier (1961). In the early 1980s, with the arrival of the digital age, the analysis of extrinsic and intrinsic elements of form began to be applied to the question of authenticity in the digital world (Duranti 1989; Duranti 2009, 42). This involved analyzing the individuals who participate in the process of record creation, the contexts in which the record exists, the act or transaction in which it participates, the procedures and documentary forms governing its creation, and the relationships that connect it to other records (Rogers 2015, 8).

In the digital environment, however, the extrinsic and intrinsic elements of form are not easy to address, because digital records are represented across three abstract layers: conceptual, logical, and physical. The conceptual layer is the record as a person understands it on the screen or monitor. The logical layer is an object that is recognized and processed by hardware and software. The physical layer is an inscription of signs on a physical medium. The conceptual layer is the closest to the traditional paper record. The logical layer is the most important for understanding how the record is represented, because it holds the stored data (content) and the metadata, which may be visible in the conceptual layer or accessible only through specialized tools (Rogers 2015).

Archival-Diplomatics has been studied broadly in KO by a number of scholars. We intend to continue this line of work by considering the application of forensics science as a contribution to KO.
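The three layers can be illustrated with a small Python sketch. It is a simplified model built on our own assumptions, not a method taken from Rogers (2015): the `LogicalRecord` class, its metadata fields, and the helper functions are hypothetical; the physical layer is reduced to a string of 1s and 0s rather than real signals on a medium; and rendering is reduced to text decoding.

```python
from dataclasses import dataclass, field


@dataclass
class LogicalRecord:
    """Logical layer: the stored content plus metadata (hypothetical fields)."""
    content: bytes
    metadata: dict = field(default_factory=dict)


def physical_view(record: LogicalRecord) -> str:
    """Physical layer, simplified: the content as a sequence of 1s and 0s."""
    return "".join(f"{byte:08b}" for byte in record.content)


def bytes_from_bits(bits: str) -> bytes:
    """Regroup a string of binary digits into 8-bit bytes."""
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


def conceptual_view(record: LogicalRecord) -> str:
    """Conceptual layer: what a person sees once software renders the content."""
    return record.content.decode(record.metadata.get("encoding", "utf-8"))


record = LogicalRecord(
    content="Minutes, 3 May".encode("utf-8"),
    metadata={"encoding": "utf-8", "creator": "Registry Office"},  # hypothetical values
)

bits = physical_view(record)
print(bits[:16])                                # 0100110101101001 ("Mi")
print(bytes_from_bits(bits) == record.content)  # True
print(conceptual_view(record))                  # Minutes, 3 May
```

The same record exists at all three layers at once, but only the logical layer carries the metadata (here, the character encoding) that software needs in order to produce the view a person reads.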
The study of Digital Forensics Science has been important for analyzing and preserving the authenticity of born-digital records over the long term.

3.0 Digital Forensics Science

Digital Forensics is a relatively new discipline (Pollitt 2010) that emerged in the early 1980s, as computers became widely accessible and illicit acts involving them began to appear. Today the discipline faces a growing number of challenges because of the internet and the increasing complexity of digital devices. The most widely used definition of Digital Forensics is the one provided by the Digital Forensic Research Conference (DFRWS): “The use of scientifically derived and proven methods toward the preservation, collection, validation, identification, analysis, interpretation, documentation and presentation of digital evidence derived from digital sources for the purpose of facilitating or furthering the reconstruction of events found to be criminal, or helping to anticipate unauthorized actions shown to be disruptive to planned operations” (Palmer 2001, 16). In its early days it was known as Computer Forensics, but as its research objective broadened, it was renamed Digital Forensics: digital because it analyzes information encoded in binary digits (as explained earlier), and forensics because of its close relationship with the law. Etymologically, forensics is related to the forum, “because the courts are forums where information may persuade us to restrict or remove individual liberties, they have proven to be a serious testing ground for scientific research” (Palmer 2001, 2).

The relationship between Digital Forensics and Digital Archival Diplomatics, and the need for it, rests on the shared aim of maintaining and preserving the trustworthiness of born-digital records. Elizabeth Diamond (1994, 140) observed that “if the historian is the lawyer in the court of history, then the archivist is the forensic scientist”.
The responsibility of archivists is to ensure the maintenance and preservation of records, and all the more so of born-digital records. This requires a series of skills, for example: 1) knowing the internal framework or layers of the record; 2) understanding how it is represented in technological systems; 3) knowing how to maintain its trustworthiness over time; and 4) knowing how to provide access to users. According to Adam Jansen and Luciana Duranti (2011), professionals' knowledge of digital records must become more precise, since the challenges of the digital environment are increasingly complex:

“Custodians can only preserve records as trustworthy (i.e. reliable, accurate and authentic) as they are when first created. It is therefore the custodian’s responsibility to establish the identity of the records prior to acquiring them and to maintain that identity, together with their integrity, afterwards (MacNeil, 2004). In the digital environment, this is a tall order, because it is not possible to preserve digital records; it is only possible to preserve the ability to reproduce them (Duranti and Thibodeau, 2006). As it will always be necessary to retrieve the binary bits and process those bits through the use of intermediaries (i.e. hardware and software) in order to render the evidence into a human readable format, it falls upon the custodian to ensure that the necessary intermediaries will exist when needed. To render representations with an accuracy that is able to withstand a diplomatic analysis requires the custodian to store the binary content of the record, including indicators of all the elements of documentary form necessary to convey the essence of the record, in a manner that ensures the record will be rendered with the same presentation and in the same context that gave it meaning” (Duranti and Jansen 2011).

The above quotation sets out the responsibility and accountability of archivists.
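The point that only the ability to reproduce a digital record can be preserved can be illustrated with a minimal Python sketch under simplifying assumptions: a text decoder stands in for the hardware and software intermediaries, and the `representation_info` dictionary is a hypothetical stand-in for the information a custodian would keep alongside the bitstream. The stored bits never change; what changes is the presentation the intermediary produces.

```python
# Bitstream as written by a (hypothetical) creating application.
stored_bytes = "Año fiscal: São Paulo".encode("utf-8")

# Hypothetical representation information kept alongside the bitstream.
representation_info = {"encoding": "utf-8"}


def render(bitstream: bytes, encoding: str) -> str:
    """Stand-in for the intermediary (software) that makes bits human-readable."""
    return bitstream.decode(encoding)


faithful = render(stored_bytes, representation_info["encoding"])
wrong_intermediary = render(stored_bytes, "latin-1")  # a different decoder

print(faithful)             # Año fiscal: São Paulo
print(wrong_intermediary)   # AÃ±o fiscal: SÃ£o Paulo
# The bits are intact: re-encoding the garbled text yields the original bytes.
print(wrong_intermediary.encode("latin-1") == stored_bytes)  # True
```

Nothing in the bitstream is damaged by the wrong rendering; the record is simply not reproduced with the same presentation unless the correct intermediary, or the information needed to select it, is preserved as well.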
Archival-Diplomatics therefore provides a framework that supports preserving the integrity and identity of born-digital records. Combining it with digital forensics makes available tools and practices that deepen our understanding of how digital devices and their layers represent records, keeping information accurate so that it can produce accurate knowledge.

4.0 Digital Forensics Science and Knowledge Organization

We have observed an evolution in how Archival Diplomatics has been adapted to the digital environment. Interdisciplinary efforts drawing on disciplines such as Diplomatics, Forensics, Law, History, and Computational Science have supplied fruitful answers to issues of trustworthiness, maintenance, and preservation of digital records in specific social contexts. Applying Digital Forensics to Archival Diplomatics processes allows us to exchange concepts, techniques, and tools for understanding the structure of digital environments. The archivist and the digital forensics practitioner thus have similar objectives, albeit for different purposes. Corinne Rogers (2013, 6) notes this similarity between the identification of (digital) records in archival study and of (digital) evidence in digital forensics. She writes:

“Archivists and digital forensics practitioners share challenges involved in appraising and analyzing large volumes of digital material. The core archival functions have been identified as appraisal and acquisition, arrangement and description, retention and preservation, management and administration, reference and access [11]. Digital preservation has been demonstrated to encompass records creation and recordkeeping [12], thereby extending the archival functions over the entire life cycle of digital records.
The traditional archival functions may be compared with the functions of digital forensics practice: identification, preservation, collection, examination, analysis, presentation and decision [13]. At the root of each is investigative research into the material in question – namely the digital traces of activities, and the relationships of those traces to the actors and actions which gave rise to them” (Rogers 2013, 6).

The functions of Archival-Diplomatics and Digital Forensics listed above are shared and are complemented by concepts such as provenance, original order, and chain of custody, although in the digital context these concepts take on slightly different meanings. Provenance is the identification, extraction, and saving of essential information about the context of creation; original order reflects original folder structures, file associations, related applications, and user accounts; and the chain of custody is the documentation of how records were acquired, whether or not they were transformed, and the hardware and software mechanisms used to ensure that the data has not been inadvertently changed. Another relevant concept shared between forensics and archival science is the identification of sensitive information, specifically personal and private information: “the same tools that are used to expose sensitive information can be used to identify, flag and redact or restrict access to it” (Lee 2012b, 5).

Of these concepts, the chain of custody is paramount in the digital environment. It is essential to maintaining the integrity of bitstreams, and although ensuring that integrity poses some challenges, its use is increasingly common.
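How original order and the chain of custody might be supported in practice can be sketched in Python, assuming SHA-256 hash values as the integrity mechanism (a common forensic practice, but not one prescribed by the sources cited here). The functions `build_manifest`, `log_event`, and `verify` are hypothetical illustrations rather than the interface of any particular tool: the manifest records the folder structure and a hash of every file at acquisition, the log records who did what and when, and verification recomputes the hashes to detect inadvertent change.

```python
import hashlib
import os
import tempfile
from datetime import datetime, timezone


def sha256_of(path: str) -> str:
    """Hash a file in chunks so large bitstreams need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: str) -> list[dict]:
    """Record each file's relative path (folder structure), size, and hash."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make the walk order reproducible
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            entries.append({
                "path": os.path.relpath(full_path, root),
                "size": os.path.getsize(full_path),
                "sha256": sha256_of(full_path),
            })
    return entries


def log_event(log: list[dict], action: str, agent: str) -> None:
    """Append a timestamped custody event (a real log would also be protected)."""
    log.append({
        "when": datetime.now(timezone.utc).isoformat(),
        "action": action,
        "agent": agent,
    })


def verify(root: str, manifest: list[dict]) -> list[str]:
    """Return the paths whose current hash no longer matches the manifest."""
    return [
        entry["path"]
        for entry in manifest
        if sha256_of(os.path.join(root, entry["path"])) != entry["sha256"]
    ]


# Demonstration on a throwaway folder standing in for an acquired accession.
custody_log: list[dict] = []
with tempfile.TemporaryDirectory() as root:
    letter = os.path.join(root, "correspondence", "letter.txt")
    os.makedirs(os.path.dirname(letter))
    with open(letter, "w", encoding="utf-8") as handle:
        handle.write("Dear colleague")

    manifest = build_manifest(root)
    log_event(custody_log, "acquired; manifest created", "archivist A")  # hypothetical agent
    untouched = verify(root, manifest)

    with open(letter, "a", encoding="utf-8") as handle:  # simulate an inadvertent change
        handle.write(" (edited)")
    changed = verify(root, manifest)
    log_event(custody_log, f"fixity check failed for {changed}", "archivist A")

print(untouched)         # []
print(changed)           # ['correspondence/letter.txt'] on POSIX systems
print(len(custody_log))  # 2
```

Sorting paths gives a reproducible record of the folder hierarchy, but it does not by itself capture the creator's working order, file associations, or user accounts, which the text also counts as part of original order in the digital context.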
The challenges facing the chain of custody relate to 1) the volatility of digital evidence in resources such as registers, memory, tables, processor, temporary file systems, disk, remote logging and monitoring data, physical configuration, and network topology, as well as archived data (Prayudi and Azhari 2015, 3); 2) media failure and bit rot; and 3) the obsolescence of both software and hardware (Lee et al. 2012).

Digital Forensics practitioners build knowledge by examining the internal structure of born-digital records in depth, with the aim of better understanding their nature. This requires understanding the inner workings of the logical, physical, and conceptual layers discussed by Rogers (2015), and examining how these layers are represented and structured within technological systems. Christopher Lee (2012a) offers a useful overview of this structure. He defines nine levels of representation of digital resources (Figure 2): 0) bitstream on physical medium; 1) raw signal stream through I/O equipment; 2) bitstream through I/O equipment; 3) sub-file data structure; 4) file as “raw” bitstream; 5) file through filesystem; 6) in-application rendering; 7) object or package; and 8) aggregation of objects.

Figure 2. Digital Resources – Levels of Representation (Lee 2012c).

These levels go beyond Knowledge Organization Systems (KOS), which are “used to organize documents, document representations and concepts” (Hjørland 2008, 86). They serve as preparation for safeguarding integrity in storage systems, a very important first step in maintaining the integrity of data, records, documents, and information until they are transformed into knowledge. In this regard, Digital Forensics serves as the first step in identifying digital objects, which are later integrated with Archival Diplomatics as a means to achieve knowledge.
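Lee's nine levels can be written down as an ordered enumeration, which makes explicit that each level is built on the ones below it. The Python sketch is illustrative only: the identifier names and the `levels_not_captured` helper are our own, and the assumption behind the helper (that a capture made at a given level documents that level and those above it, but nothing underneath) is a simplification. For example, copying files through the filesystem (level 5) does not record what is stored on the medium outside those files, whereas a disk image taken at level 2 does.

```python
from enum import IntEnum


class LevelOfRepresentation(IntEnum):
    """Lee's (2012a) levels of representation, numbered 0-8 as in the text."""
    BITSTREAM_ON_PHYSICAL_MEDIUM = 0
    RAW_SIGNAL_STREAM_THROUGH_IO = 1
    BITSTREAM_THROUGH_IO = 2
    SUB_FILE_DATA_STRUCTURE = 3
    FILE_AS_RAW_BITSTREAM = 4
    FILE_THROUGH_FILESYSTEM = 5
    IN_APPLICATION_RENDERING = 6
    OBJECT_OR_PACKAGE = 7
    AGGREGATION_OF_OBJECTS = 8


def levels_not_captured(capture_level: LevelOfRepresentation) -> list[LevelOfRepresentation]:
    """Hypothetical helper: the levels lying below the level of a capture.

    Simplifying assumption: a capture documents its own level and everything
    built on top of it, but nothing underneath.
    """
    return [level for level in LevelOfRepresentation if level < capture_level]


disk_image = LevelOfRepresentation.BITSTREAM_THROUGH_IO
file_copy = LevelOfRepresentation.FILE_THROUGH_FILESYSTEM

print([level.name for level in levels_not_captured(disk_image)])
# ['BITSTREAM_ON_PHYSICAL_MEDIUM', 'RAW_SIGNAL_STREAM_THROUGH_IO']
print(len(levels_not_captured(file_copy)))  # 5: levels 0-4
```

Using an integer enumeration keeps the numbering of the text and lets levels be compared directly, which is all the helper relies on.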
The knowledge thus obtained has two meaningful uses: in certain legal or juridical cases and/or in archival or stewardship processes.

5.0 Concluding remarks

Digital Forensics research is currently being pursued in several research centers, law enforcement agencies, and universities, with interesting outcomes. We hope that the knowledge produced by this science continues to grow for the benefit of digital records conservation. The concepts, practices, and tools of Digital Forensics are being applied in several fields of knowledge, and their benefits can be seen in disciplines such as Information Science, Archival-Diplomatics, and Knowledge Organization. Several software products are being developed to carry out this work. One example is “BitCurator,”² software produced in 2016 in a partnership between the School of Information and Library Science at the University of North Carolina at Chapel Hill and the Maryland Institute for Technology in the Humanities. It uses natural language processing (NLP) and was developed to help collecting institutions extract, analyze, and produce reports on features of interest in text extracted from born-digital materials in their collections (BitCurator 2018).

The knowledge of archivists is key to the arrangement and description of paper records. In the era of digital devices, however, the tools and techniques developed by Digital Forensics have become fundamental alongside Archival-Diplomatics knowledge, because they deal directly with the integrity, reliability, and authenticity of digital records. Identifying, seizing, imaging, and analyzing material such as floppy disks, cassette tapes, compact discs, USB drives, and hard drives could be performed better through technological tools and digital forensics processes that maintain data integrity.

² https://github.com/bitcurator/bitcurator-nlp/wiki

References

Bautier, Robert-Henri. 1961. “Leçon D'ouverture du Cours de Diplomatique à l'École des Chartes.” Bibliothèque de l’École des Chartes 119: 194-225.
Diamond, Elizabeth. 1994. “The Archivist as Forensic Scientist – Seeing Ourselves in a Different Way.” Archivaria 38: 139-154.
Duranti, Luciana. 1989. “Diplomatics: New Uses for an Old Science.” Archivaria 28: 7-27.
Duranti, Luciana. 2009. “From Digital Diplomatics to Digital Records Forensics.” Archivaria 68: 39-66.
Duranti, Luciana. 2010. A Framework for Digital Heritage Forensics. http://mith.umd.edu/forensics/wp-content/uploads/2010/05/4n6umd_duranti.pdf.
Duranti, Luciana and Adam Jansen. 2011. “Authenticity of Digital Records: An Archival Diplomatics Framework for Digital Forensics.” In ECIME: 5th European Conference on Information Management and Evaluation, Como, Italy, 134-139.
Duranti, Luciana and Kenneth Thibodeau. 2006. “The Concept of Record in Interactive, Experiential and Dynamic Environments: The View of InterPARES.” Archival Science 6: 13-68.
Guimarães, José Augusto Chaves and Natália Bolfarini Tognoli. 2015. “Provenance as a Domain-Analysis Approach in Archival Knowledge Organization.” Knowledge Organization 42: 562-569.
Hjørland, Birger. 2008. “What is Knowledge Organization (KO)?” Knowledge Organization 35: 86-101.
Jenkinson, Hilary. 1958. “Archives and the Science and the Study of Diplomatic.” Journal of the Society of Archivists 8: 207-210.
Lee, Christopher A. 2012a. “Digital Curation as Communication Mediation.” In Handbook of Technical Communication, edited by Alexander Mehler and Laurent Romary. Berlin: De Gruyter Mouton, 507-530.
Lee, Christopher A. 2012b. “Archival Application of Digital Forensics Methods for Authenticity, Description and Access Provision.” Comma 2: 133-140.
Lee, Christopher A. 2012c. “Archival Application of Digital Forensics Methods for Authenticity, Description and Access Provision.” Presented at International Council on Archives Congress 2012. http://ica2012.ica.org/files/pdf/Full%20papers%20upload/ica12Final00290.pdf
Lee, Christopher A., Matthew Kirschenbaum, Alexandra Chassanoff, Porter Olsen, and Kam Woods. 2012. “BitCurator: Tools and Techniques for Digital Forensics in Collecting Institutions.” D-Lib Magazine 18, nos. 5/6. http://www.dlib.org/dlib/may12/lee/05lee.html
Palmer, Gary. 2001. “A Road Map for Digital Forensic Research.” In The Digital Forensic Research Conference (DFRWS) 2001, 1-42.
Pollitt, Mark. 2010. “A History of Digital Forensics.” In Advances in Digital Forensics VI. Digital Forensics 2010, edited by K.P. Chow and S. Shenoi. IFIP Advances in Information and Communication Technology 337. Berlin, Heidelberg: Springer, 3-15.
Prayudi, Yuri and Azhari SN. 2015. “Digital Chain of Custody: State of the Art.” International Journal of Computer Applications 114, no. 5: 1-9.
Rogers, Corinne. 2013. “Digital Records Forensics: Integrating Archival Science into a General Model of the Digital Forensics Process.” In Proceedings of the Second International Workshop on Cyberpatterns: Unifying Design Patterns with Security, Attack and Forensic Patterns 2013, edited by Clive Blackwell. Oxford, UK: Oxford Brookes University, 4-21.
Rogers, Corinne. 2015. “Diplomatics of Born Digital Documents – Considering Documentary Form in a Digital Environment.” Records Management Journal 25, no. 1: 6-20.
Tognoli, Natália Bolfarini and Ana Célia Rodrigues. 2018. “An Analysis of the Theoretical and Practical Application of Diplomatics to Archival Description in Knowledge Organization.” In Challenges and Opportunities for Knowledge Organization in the Digital Age: Proceedings of the Fifteenth International ISKO Conference 9-11 July 2018 Porto, Portugal, edited by Fernanda Ribeiro and Maria Elisa Cerveira. Advances in Knowledge Organization 16. Baden-Baden: Ergon, 43-51.


Abstract

The proceedings explore knowledge organization systems and their role in knowledge organization, knowledge sharing, and information searching.

The papers cover a wide range of topics related to knowledge transfer, representation, concepts and conceptualization, social tagging, domain analysis, music classification, fiction genres, museum organization. The papers discuss theoretical issues related to knowledge organization and the design, development and implementation of knowledge organizing systems as well as practical considerations and solutions in the application of knowledge organization theory. Covered is a range of knowledge organization systems from classification systems, thesauri, metadata schemas to ontologies and taxonomies.
