Affiliations: Department of Computer Science, Georgia State University, Atlanta, GA, USA, E-mails: {nmancuso, btork, alexz}@cs.gsu.edu | Centers for Disease Control and Prevention, Atlanta, GA, USA, E-mails: {kki8, lkg7}@cdc.gov | Department of Computer Science & Engineering, University of Connecticut, Storrs, CT, USA, E-mail: ion@engr.uconn.edu
Note: Corresponding author: Nicholas Mancuso, Department of Computer Science, Georgia State University, Atlanta, GA, USA. Tel.: +1 404 290 6526; Email: nmancuso@cs.gsu.edu.
Abstract: This paper addresses the problem of reconstructing viral quasispecies from next-generation sequencing reads obtained from amplicons (i.e., reads generated from predefined amplified overlapping regions). We compare the parsimonious and likelihood models for this problem and propose several novel assembly algorithms. The proposed methods have been validated on simulated error-free HCV and real HBV amplicon reads. The new algorithms are shown to outperform the method of Prosperi et al. [24]. Our experiments also show that, in most cases, viral quasispecies can be reconstructed more accurately from amplicon reads than from shotgun reads. All algorithms have been implemented and are available at https://bitbucket.org/nmancuso/bioa/.