Web scraping techniques to collect data on consumer electronics and airfares for Italian HICP compilation
Abstract
This paper presents the results of testing web scraping techniques in consumer price surveys, with specific reference to consumer electronics products (goods) and airfares (services). It takes as its starting point the work done by the Italian National Statistical Institute (Istat) within the European project "Multipurpose Price Statistics" (MPS). Among the topics covered by MPS are the modernization of data collection and the use of web scraping techniques. The paper also addresses quality (in terms of efficiency and error reduction) and offers some preliminary comments on the usability of big data for statistical purposes. The general aims of the paper are described in the introduction (Section 1). Section 2 explains the choice of products used to test web scraping procedures. Sections 3 and 4 describe the surveys for consumer electronics and airfares, then present and discuss the results and issues that emerged from testing web scraping techniques. Section 5 comments on the possible quality improvements that web scraping could bring to inflation measures. Section 6 draws some concluding remarks, with specific attention to big data issues. Two fact boxes present the centralised collection of consumer prices in Italy and the IT solutions adopted for web scraping.