Affiliations: [a] Computational Statistics Department, Politeknik Statistika STIS, Jakarta, Indonesia | [b] Directorate of Analysis and Statistics Development, BPS Statistics Indonesia
Correspondence: [*] Corresponding author: Setia Pramana, Politeknik Statistika STIS, Jl Otista 64 C, Jakarta, 13330, Indonesia. E-mail: setia.pramana@stis.ac.id.
Abstract: The development of information technologies and the massive generation of data in today’s digital world provide new opportunities for official statistics. Big data, especially those produced by online marketplaces, have great potential for producing a list of online shops. The aim of this paper is to develop an online shop sampling frame from marketplace data. Using the shop and item datasets, an algorithm based on item-level data is developed to determine whether a shop is active and should therefore be included in the frame. This study focuses on online shops in Jakarta Province, Indonesia. The algorithm is built using the divide-and-conquer principle and statistical methods. The resulting frame consists of 13 attributes, including shop ID, number of items, annual revenue, shop type, business scale classification, and location (URL and physical address). The frame contains 101,443 active online shops, most of which are micro enterprises.
Keywords: Digital economy, marketplace, sampling frame, big data