Issue title: Special Issue on Semantic Deep Learning
Guest editors: Dagmar Gromann, Luis Espinosa Anke and Thierry Declerck
Article type: Research Article
Authors: Vilalta, Armand [a,*] | Garcia-Gasulla, Dario [a] | Parés, Ferran [a] | Ayguadé, Eduard [a,b] | Labarta, Jesus [a,b] | Moya-Sánchez, E. Ulises [a] | Cortés, Ulises [a,b]
Affiliations: [a] Barcelona Supercomputing Center (BSC), 08034 Barcelona, Spain. E-mails: armand.vilalta@bsc.es, dario.garcia@bsc.es, ferran.pares@bsc.es, eduard.ayguade@bsc.es, jesus.labarta@bsc.es, eduardo.moyasanchez@bsc.es, ia@cs.upc.edu | [b] Universitat Politècnica de Catalunya (UPC), 08034 Barcelona, Spain
Correspondence: [*] Corresponding author. E-mail: armand.vilalta@bsc.es.
Abstract: The current state of the art for image annotation and image retrieval tasks is obtained through deep neural network multimodal pipelines, which combine an image representation and a text representation into a shared embedding space. In this paper we evaluate the impact of using the Full-Network embedding (FNE) in this setting, replacing the original image representation in four competitive multimodal embedding generation schemes. Unlike the one-layer image embeddings typically used by most approaches, the Full-Network embedding provides a multi-scale discrete representation of images, which results in richer characterisations. Extensive testing is performed on three different datasets, comparing the performance of the studied variants and the impact of the FNE on a level playing field, i.e., with the same data, source CNN models and hyper-parameter tuning. The results obtained indicate that the Full-Network embedding is consistently superior to the one-layer embedding. Furthermore, its impact on performance is greater than the improvement stemming from the other variants studied. These results motivate the integration of the Full-Network embedding into any multimodal embedding generation scheme.
Keywords: Multimodal embedding, Full-Network embedding, caption retrieval, image retrieval, deep neural network
DOI: 10.3233/SW-180341
Journal: Semantic Web, vol. 10, no. 5, pp. 909-923, 2019