Algorithm and program

For each input (PSI-BLAST HTML or ALL ITERATIONS-TEXT), the server predicts the probable function of the protein with a fuzzy logic algorithm that integrates four protein sequence characteristics, each of which contributes to the accuracy of the functional prediction. The program proceeds as follows. Starting from the alignments in the PSI-BLAST output, the complete protein sequences are retrieved from the NCBI non-redundant database. They are then subjected to a fuzzy logic analysis that integrates four characteristics of these sequences. These characteristics, the crisp data, fall into two main classes: "Profile Parameters" and "Numerical Parameters".

The Profile Parameters come from two analyses: the hydropathy profile according to the algorithm of Kyte and Doolittle (1982), and the flexibility profile according to the algorithm of Karplus and Schulz (1985). Both are computed on the sequences of the matched alignment from the PSI-BLAST output.

The Numerical Parameters are the segment length of the matched alignment and the amino acid composition of the whole protein sequences retrieved from the NCBI non-redundant database. To express the amino acid composition of a protein as a numerical value, we use the Euclidean distance as the classification tool. The Euclidean distance is a dissimilarity index: it measures the difference in amino acid composition between the query protein and each subject protein from the database.
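As an illustration of two of the crisp inputs described above, the following minimal Python sketch computes a composition-based Euclidean distance and a sliding-window Kyte-Doolittle hydropathy profile. It is not the server's code: the function names, the use of residue fractions (rather than raw counts) and the window size of 7 are assumptions made for this example.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Kyte-Doolittle hydropathy index for the 20 standard residues.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(seq):
    """Return the fraction of each standard residue (non-standard letters are ignored)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard residues")
    return [counts[a] / total for a in AMINO_ACIDS]


def composition_distance(query, subject):
    """Euclidean distance between two composition vectors (0 means identical composition)."""
    q, s = composition(query), composition(subject)
    return sqrt(sum((a - b) ** 2 for a, b in zip(q, s)))


def hydropathy_profile(seq, window=7):
    """Mean Kyte-Doolittle value over each window (window size is an assumption).

    Raises KeyError on non-standard residues.
    """
    values = [KD[c] for c in seq.upper()]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]
```

A smaller distance means a more similar composition, so a call such as `composition_distance(query_seq, subject_seq)` could be evaluated once per subject protein retrieved from the database.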

These crisp data must be converted to fuzzy data. To do so, two binary fuzzy relations that can be applied to the controller are established: Profile Parameters (Hydropathy and Flexibility) and Numerical Parameters (Composition and Length).
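The conversion from crisp to fuzzy data can be sketched as follows. The membership functions used by the server are not described here, so this example assumes that each parameter has already been normalised to a similarity value between 0 and 1, uses a simple linear ramp with two labels ("low" and "high"), and builds the binary relation with the min operator. All names and thresholds are hypothetical.

```python
def ramp_up(x, low, high):
    """Membership that is 0 at or below `low`, 1 at or above `high`, linear in between."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzify(x, low=0.0, high=1.0):
    """Map a crisp value to membership degrees for two assumed labels."""
    up = ramp_up(x, low, high)
    return {"low": 1.0 - up, "high": up}


def binary_relation(mu_a, mu_b):
    """Binary fuzzy relation R(a, b) = min(mu_a(a), mu_b(b)) over all label pairs."""
    return {
        (label_a, label_b): min(degree_a, degree_b)
        for label_a, degree_a in mu_a.items()
        for label_b, degree_b in mu_b.items()
    }


# Hypothetical normalised similarities for one subject sequence.
profile_relation = binary_relation(fuzzify(0.8), fuzzify(0.6))      # hydropathy x flexibility
numerical_relation = binary_relation(fuzzify(0.9), fuzzify(0.5))    # composition x length
```

Each relation assigns a degree between 0 and 1 to every pair of labels, which is the form of input a fuzzy controller can evaluate against its rules.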

The machinery of approximate reasoning converts the knowledge embedded in the rule base into a crisp (non-fuzzy) control algorithm. The control law is described by a knowledge-based algorithm consisting of IF-THEN rules with vague predicates and a fuzzy logic inference mechanism. The rule base is a group of logical rules that describes the relationship between the input and the output of the fuzzy logic controller (FLC). Each rule of the FLC has an IF part, called the "premise", and a THEN part, called the "consequent". The premise of a rule contains a set of conditions, and the consequent contains a conclusion. If the conditions of the premise are satisfied, the conclusion of the consequent applies.
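A rule base of this kind can be sketched in a few lines of Python. The actual rules, labels and defuzzification method of the server are not given here, so everything below is an assumption: four hypothetical rules, min as the AND operator in the premise, and a weighted average of assumed consequent scores to obtain a crisp output.

```python
# Each rule: IF profile is <label> AND numerical is <label> THEN function match is <label>.
# These four rules are illustrative only.
RULES = [
    (("high", "high"), "likely"),
    (("high", "low"), "possible"),
    (("low", "high"), "possible"),
    (("low", "low"), "unlikely"),
]

# Assumed crisp value attached to each consequent label.
CONSEQUENT_SCORE = {"unlikely": 0.0, "possible": 0.5, "likely": 1.0}


def infer(profile_mu, numerical_mu):
    """Fire every rule and return a crisp score between 0 and 1.

    The firing strength of a rule is the min of its premise memberships;
    the output is the strength-weighted average of the consequent scores.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for (profile_label, numerical_label), consequent in RULES:
        strength = min(profile_mu[profile_label], numerical_mu[numerical_label])
        weighted_sum += strength * CONSEQUENT_SCORE[consequent]
        total_strength += strength
    return weighted_sum / total_strength if total_strength > 0 else 0.0
```

With this design, a subject sequence whose profile and numerical memberships are both fully "high" receives the maximum score, and intermediate memberships blend the conclusions of several rules.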

The FLC can be viewed as a system whose inputs are the variables in the premises of the rules, xi (Profile Parameters and Numerical Parameters), and whose output, y, is the variable in the consequents (Protein function).

Finally, the fuzzy logic values, which are numerical data, are used to rearrange the initial PSI-BLAST output. Sequences with better fuzzy scores, sometimes from the bottom of the initial PSI-BLAST output, climb past the others to the leading positions, in many cases to the top. These sequences suggest a putative function for the query protein in terms of fuzzy logic rather than of E-value or mathematical scoring.
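The rearrangement step amounts to sorting the hits by their fuzzy score instead of by E-value. A minimal sketch, with a hypothetical `Hit` record and made-up identifiers and values used purely for illustration:

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str      # hypothetical identifier of the subject sequence
    e_value: float      # E-value reported by PSI-BLAST
    fuzzy_score: float  # crisp output of the fuzzy analysis (higher is better)


def rerank(hits):
    """Return the hits ordered by decreasing fuzzy score.

    sorted() is stable, so hits with equal fuzzy scores keep their
    original PSI-BLAST order.
    """
    return sorted(hits, key=lambda hit: hit.fuzzy_score, reverse=True)


# Illustrative input in PSI-BLAST order (best E-value first).
psi_blast_order = [
    Hit("subject_1", 1e-50, 0.40),
    Hit("subject_2", 1e-20, 0.35),
    Hit("subject_3", 1e-3, 0.90),
]
fuzzy_order = rerank(psi_blast_order)
```

In this made-up example the last PSI-BLAST hit has the best fuzzy score, so it moves to the top of the rearranged list.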

 

Contact

Antonio Gomez

Juan Cedano

Antoni Hermoso

Enrique Querol

For any web-related issue or comment: Webmaster