ISSN 1386-6338 (P)
ISSN 1434-3207 (E)
In Silico Biology is a scientific research journal for the advancement of computational models and simulations applied to complex biological phenomena. We publish peer-reviewed leading-edge biological, biomedical and biotechnological research in which computer-based (i.e., in silico) modeling and analysis tools are developed and utilized to predict and elucidate the dynamics of biological systems, their design and control, and their evolution. Experimental data may also be provided to support the computational analyses.
In Silico Biology aims to advance the knowledge of the principles of organization of living systems. We strive to provide computational frameworks for understanding how observable biological properties arise from complex systems. In particular, we seek integrative formalisms to decipher the cross-talk underlying systems-level properties, the ultimate aim of multi-scale models.
Studies published in In Silico Biology generally use theoretical models and computational analysis to gain quantitative insights into regulatory processes and networks, cell physiology and morphology, tissue dynamics and organ systems. Special areas of interest include signal transduction and information processing, gene expression and gene regulatory networks, metabolism, proliferation, differentiation and morphogenesis, among others, and the use of multi-scale modeling to connect molecular and cellular systems to the level of organisms and populations.
In Silico Biology also publishes foundational research in which novel algorithms are developed to facilitate modeling and simulations. Such research must demonstrate application to a concrete biological problem.
In Silico Biology frequently publishes special issues on seminal topics and trends. Special issues are handled by Special Issue Editors appointed by the Editor-in-Chief. Proposals for special issues should be sent to the Editor-in-Chief.
About In Silico Biology
The term in silico is a counterpart to in vivo (in the living system) and in vitro (in the test tube) biological experiments, and implies the gain of insights by computer-based simulations and model analyses.
In Silico Biology (ISB) was founded in 1998 as a purely online journal. IOS Press became the publisher of the printed journal shortly after. Today, ISB is dedicated exclusively to biological systems modeling and multi-scale simulations and is published solely by IOS Press. The previous online publisher, Bioinformation Systems, maintains a website containing studies published between 1998 and 2010 for archival purposes.
We strongly support open communication and encourage researchers to share results and preliminary data with the community. Therefore, results and preliminary data made public through conference presentations, conference proceedings or the posting of unrefereed manuscripts on preprint servers will not preclude publication in ISB. However, upon publication, authors are required to update the preprint to include the journal reference (including DOI) and a link to the published article on the ISB website.
Abstract: To assess the effect of temperature on codon and amino acid usage in phages, the codon and amino acid usage of 13 phages of extremely thermophilic prokaryotes was compared with that of 14 phages of mesophilic prokaryotes. Correspondence analysis on RSCU values of the two groups of phage genomes clearly shows that phages are separated along the second major axis according to their growth temperature, whereas they are separated along the first major axis according to their GC content. Correspondence analysis on RAAU values of the two groups of phages clearly shows that protein encoding genes of the phages along the second major axis are highly correlated with the GRAVY, aromaticity and cysteine content. Moreover, correspondence analysis on the regular and irregular structures of proteins of phages infecting extremely thermophilic prokaryotes reveals that temperature is one of the factors responsible for the most significant differentiation in codon and amino acid usage in these phages.
Keywords: Relative synonymous codon usage (RSCU), relative amino acid usage (RAAU), correspondence analysis (CA), frequency of G+C at the synonymous third codon positions (GC3s), covalently closed circular (ccc) DNA, GRAVY, aromaticity, cysteine content, thermophilic prokaryotes, mesophilic prokaryotes
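For readers unfamiliar with the RSCU measure named in the keywords, the following is a minimal sketch of its standard definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. It assumes the standard genetic code and an in-frame coding sequence; the function name `rscu` and the example sequence are illustrative and are not taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption:
# the standard code applies to the sequence being analysed).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*" or len(family) == 1:
            continue  # skip stop codons and single-codon amino acids (Met, Trp)
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        expected = total / len(family)  # count under equal synonymous usage
        for c in family:
            values[c] = counts[c] / expected
    return values


example = rscu("GCTGCTGCCGCA")  # four alanine codons
print(example)
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, and a value below 1 indicates the opposite. Matrices of such values across genomes are the kind of input a correspondence analysis would take.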
Abstract: In this paper we describe some conditions of use of a recently published tool that offers two basic functions for the classical problem of discovering motifs in a set of promoter sequences. The first assumes that not all of the sequences necessarily possess a common motif of given length l. In this case, CHECKPROMOTER allows an exact identification of maximal subsets of related promoters. The purpose of this program is to recognize putatively co-regulated genes. The second, CHECKMOTIF, solves the problem of checking whether the given promoters have a common motif. It uses a fast approximation algorithm for which we were able to derive non-trivial low performance bounds (defined as the ratio of the Hamming distance of the obtained solution to that of a theoretically best solution) for the computed outputs. Both programs use a novel weighted Hamming distance paradigm for evaluating the similarity of sets of l-mers, and we are able to compute performance bounds for the proposed motifs. A set of At promoters was used as a benchmark for a comparative test against five known tools. It could be verified that SiteSeeker significantly outperformed these tools.
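As a rough illustration of the Hamming-distance scoring that this abstract refers to, the sketch below scores a candidate motif against a set of sequences by summing, over the sequences, the distance to the best-matching l-mer. This is the plain, unweighted textbook formulation and not the weighted paradigm of the tool described; all function names and the example sequences are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def best_match_distance(motif, seq):
    """Smallest Hamming distance between the motif and any l-mer of seq.

    Assumes len(seq) >= len(motif); otherwise min() raises ValueError.
    """
    l = len(motif)
    return min(hamming(motif, seq[i:i + l]) for i in range(len(seq) - l + 1))


def total_distance(motif, seqs):
    """Sum of best-match distances over all sequences (lower is better)."""
    return sum(best_match_distance(motif, s) for s in seqs)


promoters = ["TTACGT", "ACCTT", "GGG"]  # toy sequences, not real promoters
print(total_distance("ACG", promoters))
```

Under this formulation, a performance ratio of the kind mentioned in the abstract would compare the total distance of a motif returned by an approximation algorithm with the total distance of the best possible motif.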
Abstract: A Naive Bayes classifier tool is presented for annotating proteins on the basis of amino acid motifs, cellular localization and protein-protein interactions. Annotations take the form of posterior probabilities within the Molecular Function hierarchy of the Gene Ontology (GO). Experiments with the data available for yeast, Saccharomyces cerevisiae, show that our prediction method can yield a relatively high level of accuracy. Several apparent challenges and possibilities for future developments are also discussed. A common approach to functional characterization is to use sequence similarities at varying levels, by utilizing several existing databases and local alignment/identification algorithms. Such an approach is typically quite labor-intensive when performed by an expert in a manual fashion. Integration of several sources of information is in this context generally considered as the only possibility to obtain valuable predictions with practical implications. However, some improvements in the prediction accuracy of the molecular functions, and thereby also savings in the computational effort, can be achieved by restricting attention to only those data sources that involve a higher degree of specificity. We employ here a Naive Bayes model in order to provide probabilistic predictions, and to enable a computationally efficient approach to data integration.
Keywords: Protein function prediction, Naive Bayes, data integration, Gene Ontology
Abstract: High-throughput genome sequencing projects continue to churn out enormous amounts of raw sequence data. However, most of this raw sequence data is unannotated and, hence, not very useful. Among the various approaches to decipher the function of a protein, one is to determine its localization. Experimental approaches for proteome annotation, including determination of a protein's subcellular localization, are very costly and labor intensive. Besides the available experimental methods, in silico methods present alternative approaches to accomplish this task. Here, we present two machine learning approaches for prediction of the subcellular localization of a protein from the primary sequence information. Two machine learning algorithms, k Nearest Neighbor (k-NN) and Probabilistic Neural Network (PNN), were used to classify an unknown protein into one of the 11 subcellular localizations. The final prediction is made on the basis of a consensus of the predictions made by the two algorithms, and a probability is assigned to it. The results indicate that primary sequence derived features like amino acid composition, sequence order and physicochemical properties can be used to assign subcellular localization with a fair degree of accuracy. Moreover, with the enhanced accuracy of our approach and the definition of a prediction domain, this method can be used for proteome annotation in a high throughput manner. SubCellProt is available at www.databases.niper.ac.in/SubCellProt.
Keywords: Protein function, subcellular localization, machine learning, PNN, kNN
Abstract: Restriction endonucleases represent one of the best studied examples of DNA binding proteins. Type II restriction endonucleases recognize short sequences of foreign DNA and cleave the target on both strands with remarkable sequence specificity. Type II restriction endonucleases are part of restriction modification systems. Restriction modification systems occur ubiquitously among bacteria and archaea. Restriction endonucleases are indispensable tools in molecular biology and biotechnology. They are an important model system for specific protein-nucleic acid interactions and also serve as a good example for investigating structural, functional and evolutionary relationships among various biomolecules. The interaction between restriction endonucleases and their recognition sequences plays a crucial role in biochemical activities like catalytic site/metal binding, DNA repair and recombination etc. We studied various patterns in type II restriction endonucleases and analyzed their structural, functional and evolutionary roles. Our studies support X-ray crystallographic studies, arguing for divergence and molecular evolution. Conservation patterns of the nuclease superfamily have also been analyzed by estimating site-specific evolutionary rates for the analyzed structures related to respective chains in this study.
Abstract: Promoter prediction is an important and complex problem. Pattern recognition algorithms typically require features that could capture this complexity. A special bias towards certain combinations of base pairs in the promoter sequences may be possible. In order to determine these biases, n-grams are usually extracted and analyzed. An n-gram is a selection of n contiguous characters from a given character stream, DNA sequence segments in this case. Here a systematic study is made to discover the efficacy of n-grams for n = 2, 3, 4, 5 in promoter prediction. A study of n-grams as features for a neural network classifier for E. coli and Drosophila promoters is made. In the case of E. coli n = 3 and in the case of Drosophila n = 4 seem to give optimal prediction values. Using the 3-gram features, promoter prediction in the genome sequence of E. coli is done. The results are encouraging in positive identification of promoters in the genome compared to software packages such as BPROM, NNPP, and SAK. Whole genome promoter prediction in the Drosophila genome was also performed, but with 4-gram features.
Keywords: Biological data sets, machine learning method, neural networks, in silico method for promoter prediction, binary classification, cascaded classifiers
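The n-gram definition given in the abstract above can be illustrated with a short sketch that turns a DNA sequence into a fixed-length feature vector of n-gram frequencies, one entry per possible n-gram over the alphabet ACGT. Normalizing counts to frequencies is an assumption made here for illustration; the abstract does not state how the features were scaled, and the function name `ngram_features` is hypothetical.

```python
from collections import Counter
from itertools import product


def ngram_features(seq, n):
    """Return n-gram frequencies of seq, ordered lexicographically over ACGT.

    The vector has 4**n entries. N-grams containing other characters
    (e.g. N) are ignored. Frequency scaling is an illustrative assumption.
    """
    seq = seq.upper()
    # Slide a window of width n over the sequence, one position at a time.
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    vocab = ["".join(p) for p in product("ACGT", repeat=n)]
    total = sum(counts[g] for g in vocab)
    return [counts[g] / total if total else 0.0 for g in vocab]


features = ngram_features("ACGT", 2)  # contains the 2-grams AC, CG and GT
print(len(features), features)
```

For n = 3 this gives 64 features per sequence and for n = 4 it gives 256, which is the kind of fixed-size input a neural network classifier expects.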
Abstract: There is a critical need for new and efficient computational methods aimed at discovering putative transcription factor binding sites (TFBSs) in promoter sequences. Among the existing methods, two families can be distinguished: statistical or stochastic approaches, and combinatorial approaches. Here we focus on a complete approach incorporating a combinatorial exhaustive motif extraction, together with a statistical Twilight Zone Indicator (TZI), in two datasets: a positive set and a negative one, which represents the result of a classical differential expression experiment. Our approach relies on the existence of prior biological information in the form of two sets of promoters of differentially expressed genes. We describe the complete procedure used for extracting either exact or degenerate motifs, ranking these motifs, and finding their known related TFBSs. We exemplify this approach using two different sets of promoters. The first set consists of promoters of genes either repressed or not by the transforming form of the v-erbA oncogene. The second set consists of genes whose expression varies between self-renewing and differentiating progenitors. The biological meaning of the TFBSs found is discussed and, for one TF, its biological involvement is demonstrated. This study therefore illustrates the power of using relevant biological information, in the form of a set of differentially expressed genes, which is a classical outcome of most transcriptomics studies. This makes it possible to greatly reduce the search space and to design an adapted statistical indicator. Taken together, this allows the biologist to concentrate on a small number of putatively interesting TFs.
Abstract: Using a large database of protein domain families of known 3-D structure, we present an analysis of the relationships among sequences, structures and functions of closely-related enzymes performed at the level of catalytic domains. In only 38% of the pairs of homologous catalytic domains with more than about 60% sequence identity are the functions almost completely identical. Nearly 43% of the pairs differ in their substrate specificity. Hence the most common variation of enzyme function among the closely-related homologues is a difference in substrate specificity. For homologous pairs characterized by a sequence identity of 30–60%, if the structural difference metric is less than about 30, the functions are highly conserved. For clearly homologous protein domain pairs, usually sharing less than 40% sequence identity, we observe that the chemical groups involved in the functions and the cofactors often differ. We also report extremely unusual cases of closely-related homologues belonging to entirely different classes of enzymes. Such drastic shifts in the gross functions of homologues seem to be achieved by retooling of catalytic residues or by altering the stability of the intermediates in the biochemical reactions. Our work provides guidelines for functional annotation based on homology searches and in structural genomics initiatives.
Keywords: Enzyme classification, homologous proteins, protein evolution, protein function, protein structure
Abstract: Salmonella enterica serovar Typhimurium invades the intestinal epithelial cells using a type three secretion system (TTSS) encoded on Salmonella pathogenicity island-1 (SPI-1). The key regulator of this secretion system is HilA, which is in turn regulated by HilD, HilC and RtsA. It is also known that the SirA/BarA system, a two-component regulatory system, plays a crucial role in regulating HilA. Two different mechanisms have been proposed earlier for regulation of the HilD-HilC-RtsA-HilA network by SirA. One considers SirA to be acting through HilA and HilC, whereas the other considers SirA to be acting through HilD. In this paper, we have built mathematical models corresponding to both these scenarios and carried out simulations under different gene knock-out conditions. Additionally, since the two proposed mechanisms based on the experimental data are equally likely, we also considered a mechanism which is a combination of the two proposed mechanisms. The simulations were carried out to check the levels of HilA, the factor regulating the virulence, as well as the levels of the intermediate components in the network, namely HilC and RtsA. The simulation results were used to check the consistency of various models and also to suggest the most probable mechanism of hilA regulation. The results of our study show that while most of the mathematical models are able to predict the virulence data, the models considering SirA to regulate through HilA and HilC fail to predict the levels of the intermediate components, HilC and RtsA. Nevertheless, one of the models considering regulation of virulence by SirA via HilD was able to predict results comparable to the experimental data. In addition, the combination of this model (regulation by SirA via HilD) with the model considering regulation by SirA through HilA and HilC also predicted results consistent with experimental observations. Our conclusions were further validated by testing the stability of the results against changes in parameter values, thus confirming the relative robustness of the proposed modeling system.
Keywords: Salmonella, pathogenicity, type three secretion system, virulence, HilA, SirA, mathematical modeling