Artificial intelligence analysis of FTIR and CD spectroscopic data for predicting and quantifying the length and content of protein secondary structures
Abstract: Besides NMR and X-ray crystallography, FTIR and CD spectroscopy are widely considered useful for determining protein secondary structure. These techniques yield data within a few minutes from small quantities of protein, which makes them amenable to proteomics research. Here we explore the possibility of using artificial intelligence techniques to simultaneously analyse both FTIR and CD spectroscopic data for an identical set of proteins. Neural network analysis was carried out on normalised regions of the FTIR (1700-1600 cm−1) and CD (180-259 nm) spectral data, both with and without boxcar averaging, in order to quantify the average length and percentage content of secondary structures. A hybrid genetic algorithm/neural network approach, which automatically selects structure-sensitive wavelengths/frequencies, was used to quantify the protein secondary structure. Using this algorithm we also identified the region of the CD spectrum that contains the most structure-sensitive information. This region lies between 214 and 251 nm, suggesting that it alone may be sufficient to rapidly determine secondary structure content from CD spectral data. Overall, CD spectroscopic analysis produced better results than FTIR spectroscopy when selected wavelengths were used, although FTIR performed better when the entire region (1700-1600 cm−1 for FTIR and 180-259 nm for CD) was subjected to neural network analysis. Application of an Adaptive Neuro-Fuzzy Inference System (ANFIS) with fuzzy subtractive clustering to the spectral data led to a slightly better prediction of the average helix/sheet length for FTIR spectroscopy than for CD. Our findings reveal the potential of artificial intelligence techniques not only for extracting structural information but also for better understanding the relationship between complex spectral data and biologically important information.
Keywords: Artificial intelligence, FTIR, CD, spectroscopy, secondary structure, protein
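As an illustration of the preprocessing named in the abstract (normalisation of a spectral region, with optional boxcar averaging), a minimal Python sketch follows. The abstract does not state which normalisation or boxcar scheme was used, so min-max scaling and non-overlapping windows are assumptions here, and the function names `min_max_normalise` and `boxcar_average` are hypothetical.

```python
from typing import List, Sequence


def min_max_normalise(values: Sequence[float]) -> List[float]:
    """Scale a spectral region to [0, 1] (assumed normalisation scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat region carries no shape information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def boxcar_average(values: Sequence[float], width: int) -> List[float]:
    """Average consecutive, non-overlapping windows of `width` points.

    A trailing partial window is averaged over the points it contains.
    Non-overlapping windows are an assumption; a sliding window is
    another common meaning of "boxcar averaging".
    """
    if width < 1:
        raise ValueError("width must be >= 1")
    averaged = []
    for start in range(0, len(values), width):
        window = values[start:start + width]
        averaged.append(sum(window) / len(window))
    return averaged


# Hypothetical intensities sampled across one spectral region.
region = [2, 4, 6, 10, 8, 2]
normalised = min_max_normalise(region)
reduced = boxcar_average(normalised, width=2)
```

With non-overlapping windows the number of input points is reduced by roughly the window width, which would shrink the input layer of a neural network trained on the region.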