Algorithms in Bioinformatics


C. Rausch, T. Weber, O. Kohlbacher, W. Wohlleben, and D. H. Huson (2005)

Specificity prediction of adenylation domains in nonribosomal peptide synthetases (NRPS) using transductive support vector machines (TSVMs).

Nucleic Acids Res, 33(18):5799–5808.

We present a new support vector machine (SVM)-based approach to predict the substrate specificity of subtypes of a given protein sequence family. We demonstrate the usefulness of this method on aryl acid-activating and amino acid-activating adenylation domains (A domains) of nonribosomal peptide synthetases (NRPS). The residues of gramicidin synthetase A that lie within 8 Å of the substrate amino acid, and the corresponding positions of other adenylation domain sequences with 397 known and unknown specificities, were extracted, and this physico-chemical fingerprint was encoded into normalized real-valued feature vectors based on the physico-chemical properties of the amino acids. The SVM software package SVMlight was used for training and classification, with transductive SVMs to take advantage of the information inherent in unlabeled data. Specificities for very similar substrates that frequently show cross-specificities were pooled into so-called composite specificities, and predictive models were built for them. The reliability of the models was confirmed in cross-validations and in comparison with a currently used sequence-comparison-based method. When comparing the predictions for 1230 NRPS A domains that are currently detectable in UniProt, the new method was able to give a specificity prediction in an additional 18% of the cases compared with the old method. For 70% of the sequences both methods agreed; for <6% they did not, mainly on low-confidence predictions by the existing method. Neither method could infer any specificity for 2.4% of the sequences, suggesting completely new types of specificity.
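The encoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The two properties used here (Kyte-Doolittle hydropathy and a simplified side-chain charge), the z-score normalization, the helper names, and the example signature string are all assumptions chosen for illustration; the abstract does not say which properties or which normalization were used. The last helper writes a vector in the sparse text format read by SVMlight, where a target of 0 marks an unlabeled example for transductive training.

```python
from statistics import mean, pstdev

# Kyte-Doolittle hydropathy index per amino acid (illustrative property choice).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified net side-chain charge near neutral pH (assumption: His is neutral).
CHARGE = {aa: 0.0 for aa in HYDROPATHY}
CHARGE.update({"D": -1.0, "E": -1.0, "K": 1.0, "R": 1.0})


def zscore_table(table):
    """Rescale a property table to zero mean and unit variance over the 20 amino acids."""
    mu = mean(table.values())
    sigma = pstdev(table.values())
    return {aa: (value - mu) / sigma for aa, value in table.items()}


PROPERTIES = [zscore_table(HYDROPATHY), zscore_table(CHARGE)]


def encode_signature(residues):
    """Map a string of extracted residues to a flat real-valued feature vector.

    Each position contributes one value per property. Gaps and unknown
    residues are encoded as 0.0, which is the mean after normalization.
    """
    vector = []
    for aa in residues:
        for prop in PROPERTIES:
            vector.append(prop.get(aa, 0.0))
    return vector


def to_svmlight_line(label, vector):
    """Format one example as '<target> 1:<v1> 2:<v2> ...' (feature indices start at 1)."""
    pairs = " ".join(f"{i}:{v:.6f}" for i, v in enumerate(vector, start=1))
    return f"{label} {pairs}"


# Hypothetical 9-residue signature; real input would be the residues
# extracted at the positions described in the abstract.
signature = "DAWTIAAIC"
features = encode_signature(signature)
print(to_svmlight_line(0, features))  # target 0: unlabeled example
```

Labeled domains would be written with targets +1 or -1 for a given (composite) specificity, and domains of unknown specificity with target 0, so that a transductive learner can use both.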
