Abstract
Understanding the folding behavior of proteins is an important and challenging problem in modern molecular biology. In the present investigation, a large number of features representing protein sequences were developed based on sequence autocorrelation weighted by properties of amino acid residues. A genetic algorithm (GA) combined with multiple linear regression (MLR) was employed to select significant features related to protein folding rates and to build a global predictive model. In addition, the local lazy regression (LLR) method was used to predict protein folding rates. The results indicated that LLR performed much better than the global MLR model. The important properties of amino acid residues affecting protein folding rates were also analyzed. The results of this study will be helpful for understanding the mechanism of protein folding. Our results also demonstrate that amino acid sequence autocorrelation features are effective in representing the relationship between protein sequence and folding rates, and that the local method is a powerful tool for predicting protein folding rates.
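As a rough illustration of what a property-weighted sequence autocorrelation feature can look like, the sketch below averages the product of residue property values at positions separated by a fixed lag. The abstract does not give the exact formula or the property scales used, so the Moreau-Broto-style definition, the function name `autocorrelation_features`, the choice of the Kyte-Doolittle hydropathy scale, and the example sequence are all assumptions made for illustration only.

```python
# Kyte-Doolittle hydropathy values, used here only as an example property scale.
# Assumption: the paper's actual residue properties are not listed in the abstract.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation_features(sequence, scale, max_lag):
    """Return one feature per lag d = 1..max_lag.

    Each feature is the mean of P(i) * P(i + d) over all valid positions i,
    where P maps a residue to its property value (an assumed, simplified form).
    """
    values = [scale[residue] for residue in sequence]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")

    features = []
    for lag in range(1, max_lag + 1):
        total = sum(values[i] * values[i + lag] for i in range(n - lag))
        features.append(total / (n - lag))
    return features


if __name__ == "__main__":
    # Hypothetical short peptide, not taken from the paper's data set.
    example = "MKVLAAGIVGLLLA"
    print(autocorrelation_features(example, KYTE_DOOLITTLE, max_lag=3))
```

Repeating this calculation over several property scales and lags would yield a large feature pool of the kind a feature-selection step such as GA-MLR could then reduce.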
Original language | English |
---|---|
Pages (from-to) | 1159-1168 |
Number of pages | 10 |
Journal | Journal of Theoretical Biology |
Volume | 264 |
Issue number | 4 |
Publication status | Published - Jun 2010 |
Externally published | Yes |
Keywords
- Amino acid sequence autocorrelation
- Genetic algorithm
- Local lazy regression
- Multiple linear regression
- Protein folding rate