Affiliations: Department of Engineering and Architecture, University of Parma, Italy
Correspondence to: Gianfranco Lombardo, Department of Engineering and Architecture, University of Parma, Italy. E-mail: gianfranco.lombardo@unipr.it.
Abstract: The application of Machine Learning techniques to networks, such as prediction tasks over nodes and edges, is becoming increasingly crucial in the analysis of complex systems across a wide range of research fields. One of the enabling technologies is node embedding, which makes it possible to learn features automatically over the network. Among the approaches proposed in the literature, the most promising are DeepWalk and Node2Vec, where the embedding is computed by combining random walks and neural language models. However, these techniques are characteristically limited by their memory requirements and time complexity. In this paper, we propose a distributed and scalable solution, named ActorNode2vec, that keeps the main advantages of Node2Vec and overcomes its limitations by adopting the actor model to distribute the computational load. We demonstrate the efficacy of this approach on a large network by analyzing the sensitivity to the walk length and number of walks parameters, and we compare it with DeepWalk and with an Apache Spark distributed implementation of Node2Vec. Results show that ActorNode2vec drastically reduces computational times without losing embedding quality, while also overcoming memory issues.
Keywords: Network science, embedding, node embedding, Node2vec, actodes, distributed systems, data mining, complex systems, actor model