Affiliations:
University of Roma, Tor Vergata, Via del Politecnico 1, Roma, Italy
Corresponding author: Danilo Croce, University of Roma, Tor Vergata, Via del Politecnico 1, Roma, Italy. E-mail: croce@info.uniroma2.it.
Abstract: Expressive but complex kernel functions, such as Sequence or Tree kernels, are usually underemployed in NLP tasks due to their significant computational cost in both the learning and classification stages. Recently, the Nyström methodology for data embedding has been proposed as a viable solution to such scalability problems: it maps highly structured data into low-dimensional, compact linear representations of the kernel space, thus improving the scalability of the learning process. In this paper, a stratification of the model corresponding to the embedding space is proposed as a further, highly flexible optimization. Nyström embedding spaces of increasing size are combined in an efficient ensemble strategy: upper layers, providing higher-dimensional representations, are invoked on an input instance only when the smaller (i.e., less expressive) embeddings yield uncertain outcomes. Experimental results with different models of such uncertainty show that state-of-the-art accuracy on three semantic inference tasks can be achieved even when one order of magnitude fewer kernel computations are carried out.
Keywords: Nyström method, scalability, kernel methods, structured language learning
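As a concrete illustration of the approach summarized above, the following is a minimal sketch of the Nyström projection and of the layered, uncertainty-driven classification it enables. The RBF kernel, the margin-based uncertainty test, and all function names are illustrative assumptions for this sketch, not the paper's actual implementation.

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    # Gaussian kernel between the rows of A and B (illustrative choice;
    # the paper targets structured kernels such as Tree kernels).
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)

def nystrom_projector(landmarks, kernel):
    # Precompute the projection K_mm^{-1/2} from the m landmark instances.
    K_mm = kernel(landmarks, landmarks)
    w, V = np.linalg.eigh(K_mm)
    w = np.maximum(w, 1e-12)  # guard against near-zero eigenvalues
    proj = V @ np.diag(1.0 / np.sqrt(w)) @ V.T
    # The embedding of X is kernel(X, landmarks) @ proj: each instance is
    # mapped to an m-dimensional linear approximation of the kernel space.
    return lambda X: kernel(X, landmarks) @ proj

def stratified_predict(x, layers, threshold=0.5):
    # Each layer pairs a Nystrom embedding of increasing size with a linear
    # classifier trained on it. Larger (costlier) layers are invoked only
    # when the current layer is uncertain, here modeled as a small margin.
    for embed, weights in layers:
        score = float(embed(x[None, :]) @ weights)
        if abs(score) >= threshold:
            return np.sign(score)  # confident outcome: stop escalating
    return np.sign(score)          # fall back to the largest layer
```

In this sketch, the cost saving comes from the early exits: most instances are resolved by the small, cheap embeddings, and only the uncertain ones pay for the higher-dimensional layers.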