Issue title: The International Workshop on Socio-Technical Aspects in Security
Guest editors: Thomas Groß and Luca Viganò
Article type: Research Article
Authors: Lendák, Imre [a, c, *] | Indig, Balázs [b] | Palkó, Gábor [b]
Affiliations: [a] Data Science and Engineering Department, Faculty of Informatics, Eötvös Loránd University, Budapest, Hungary | [b] Department of Digital Humanities, Faculty of Humanities, Eötvös Loránd University, Budapest, Hungary | [c] Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
Correspondence: [*] Corresponding author. E-mail: lendak@inf.elte.hu.
Note: [1] This paper is an extended and revised version of a paper presented at the International Workshop on Socio-Technical Aspects in Security.
Abstract: Web archives store born-digital documents, which are usually collected from the Internet by crawlers and stored in the Web Archive (WARC) format. The trustworthiness and integrity of web archives are still an open challenge, especially in the news portal domain, which faces the additional challenge of censorship, even in democratic societies. The aim of this paper is to present a lightweight, blockchain-based solution for web archive validation, which would ensure that documents retrieved by crawlers can be verified as authentic for many years to come. We developed our archive validation solution as an extension and continuation of our work in web crawler development, which mainly targets news portals. The system is designed as an overlay on a blockchain with a proof-of-stake (PoS) distributed consensus algorithm. PoS was chosen for its lower ecological footprint compared to proof-of-work solutions (e.g. Bitcoin) and its lower expected investment in computing infrastructure. We based our prototype on the open-source Nxt blockchain and implemented it in Python. The prototype was tested on web archive content crawled from Hungarian news portals at two different points in time, with more than 1 million articles in total. We concluded that the proposed solution is accessible, usable by different stakeholders to validate crawled content, and deployable on cheap commodity hardware, and that it addresses the archive integrity challenge and is capable of efficiently managing duplicate documents.
Keywords: Web archive, validation, blockchain, proof-of-stake, web crawling, censorship
DOI: 10.3233/JCS-210040
Journal: Journal of Computer Security, vol. 30, no. 3, pp. 499-515, 2022
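
The validation idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that each crawled document is identified by a SHA-256 digest of its bytes, and it replaces the Nxt blockchain with a hypothetical in-memory `DigestRegistry`. The names `content_digest`, `DigestRegistry`, `register` and `validate` are illustrative only and do not come from the paper.

```python
import hashlib


def content_digest(payload: bytes) -> str:
    """Return the SHA-256 hex digest of a crawled document's bytes."""
    return hashlib.sha256(payload).hexdigest()


class DigestRegistry:
    """Hypothetical in-memory stand-in for the ledger of archived documents.

    Assumption: a real deployment would anchor digests on a blockchain
    instead of keeping them in a local dictionary.
    """

    def __init__(self):
        # Maps each digest to the URL under which it was first registered.
        self._first_seen = {}

    def register(self, url: str, payload: bytes) -> bool:
        """Record a document; return False if identical content is already known."""
        digest = content_digest(payload)
        if digest in self._first_seen:
            return False  # duplicate content, nothing new to store
        self._first_seen[digest] = url
        return True

    def validate(self, payload: bytes) -> bool:
        """Return True if the payload matches a previously registered digest."""
        return content_digest(payload) in self._first_seen


registry = DigestRegistry()
article = b"<html><body>Example news article</body></html>"

is_new = registry.register("https://example.org/news/1", article)
is_new_again = registry.register("https://example.org/news/1-copy", article)
is_authentic = registry.validate(article)
is_tampered_accepted = registry.validate(article + b" edited")
```

In this sketch a second crawl of identical content is recognized as a duplicate, and any change to the archived bytes makes validation fail, which is the integrity property the abstract describes at a high level.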