Affiliations: [a] Assistant Librarian, Central Library, IIT Kharagpur; (E): samrat@library.iitkgp.ernet.in | [b] Librarian, Central Library, IIT Kharagpur, and Co-Project Investigator, National Digital Libraries (NDL) Project, India; (E): bsutra@library.iitkgp.ernet.in | [c] Professor, Department of Computer Science and Engineering, IIT Kharagpur; Joint-Project Investigator, NDL Project; and Head, RM School of Engineering Entrepreneurship, IIT Kharagpur; (E): ppd@cse.iitkgp.ernet.in
Abstract: OAI-PMH-enabled open source digital library software, such as DSpace, EPrints, VuFind, the Drupal OAI harvester, and the PKP harvester, has made it possible to harvest large volumes of metadata from different institutional digital repositories (IDRs). This has opened new opportunities for providing a range of new services to library users. This article explores the tools, techniques, and major challenges of large-scale metadata harvesting and metadata curation. A recent bibliographic study of Scopus has shown a rapid increase in the publication of articles over the last two decades: “A total of 25,482 publications represent the literary output in different formats, in different subjects, and from various nations” (ul Ajaz Wani and Gul 2008). All these academic research documents, including preprints, conference papers, journal articles, annual reports, protocols, and lecture notes, may already be uploaded, or need to be uploaded, to various IDRs for long-term digital preservation and reuse. In this study, we harvested the metadata from several such IDRs into a centrally indexed repository that provides a single-window search box. With this, we may envision a day when we will no longer need any e-resource subscriptions, because their content will be available in our IDRs. That would be a great achievement and extremely helpful to the academic community. However, continuous metadata curation is a major intermediate phase, which focuses on the proper mapping of data to metadata. Programmatic curation and manual curation are the two processes used for the final curation of the harvested metadata, and each has its own merits and demerits. This article further focuses on the process workflow of metadata curation and the possible challenges the librarian must manage for proper indexing of the items.
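To make the harvest-then-curate pipeline concrete, the following is a minimal sketch, not the authors' implementation. It assumes each IDR exposes unqualified Dublin Core (`oai_dc`) through the OAI-PMH `ListRecords` verb. The HTTP fetch is omitted, so the code only parses one response page and returns its `resumptionToken` for the next request. The `curate` function illustrates programmatic curation. Its target index fields (`id`, `title`, `authors`, `subjects`, `year`) and its cleaning rules (whitespace collapsing, case-insensitive de-duplication, four-digit year extraction) are hypothetical choices made for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces from the OAI-PMH 2.0 and oai_dc / Dublin Core schemas.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def parse_list_records(xml_text):
    """Parse one ListRecords response page.

    Returns (records, token). Each record is {"oai_id": str, "dc": {element: [values]}}.
    The token is None when the page carries no (or an empty) resumptionToken,
    which in OAI-PMH signals that the list is complete.
    """
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        if header is None or header.get("status") == "deleted":
            continue  # deleted records carry no metadata to index
        dc = rec.find("oai:metadata/oai_dc:dc", NS)
        if dc is None:
            continue
        fields = {}
        for elem in dc:
            # "{http://purl.org/dc/elements/1.1/}title" -> "title"
            name = elem.tag.split("}", 1)[-1]
            fields.setdefault(name, []).append(elem.text or "")
        oai_id = header.findtext("oai:identifier", default="", namespaces=NS)
        records.append({"oai_id": oai_id.strip(), "dc": fields})
    token = root.findtext(".//oai:resumptionToken", namespaces=NS)
    token = (token or "").strip() or None
    return records, token


def clean(values):
    """Collapse whitespace and drop empty or case-insensitive duplicate values."""
    seen, out = set(), []
    for v in values:
        v = " ".join(v.split())
        if v and v.lower() not in seen:
            seen.add(v.lower())
            out.append(v)
    return out


def curate(record):
    """Map harvested DC fields to a flat index record (hypothetical schema)."""
    dc = record["dc"]
    titles = clean(dc.get("title", []))
    dates = clean(dc.get("date", []))
    return {
        "id": record["oai_id"],
        "title": titles[0] if titles else "",
        "authors": clean(dc.get("creator", [])),
        "subjects": clean(dc.get("subject", [])),
        # first date value that starts with a four-digit year, if any
        "year": next((d[:4] for d in dates if len(d) >= 4 and d[:4].isdigit()), None),
    }
```

In use, a harvester would call `parse_list_records` in a loop, passing the returned token back to the repository as the `resumptionToken` argument of the next `ListRecords` request until the token is `None`, and then pass each record through `curate` before indexing. Records that such rules cannot normalize would still need the manual curation step described above.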