Affiliations: [a] Statistics Netherlands (CBS), The Hague, The Netherlands | [b] University of Duisburg-Essen, Essen, Germany | [c] Statistics North Rhine-Westphalia (IT.NRW), Düsseldorf, Germany
Correspondence: [*] Corresponding author: Christian Borgs, Information und Technik NRW, Statistisches Landesamt (IT.NRW), Roßstraße 64, 40476 Düsseldorf, Germany. Tel.: +49 211 9449 2514; E-mail: christian.borgs@it.nrw.de.
Abstract: German official statistics include statistics on personal insolvency. These statistics have recently been enhanced using web scraping to extract additional information from a public website on which insolvency announcements are published. The scraped data are currently used for quality assurance and to derive an early indicator of personal insolvency. This paper provides novel methodological analyses of the same administrative database and presents further opportunities to improve the detail and timeliness of the current official statistics using web scraping and text mining. The newly derived statistics provide information on several aspects of the demographic and spatial distribution of personal insolvency.
Keywords: Administrative data, spatial data, text mining, web scraping, experimental statistics