Welcome!
The ProScan tool searches protein sequences for a particular motif quickly and easily. The user provides a motif in valid syntax and selects a file to search against, and the tool does the rest. EcoliWiki ProScan uses a local copy of the ps_scan program obtained from the Swiss Institute of Bioinformatics. The ps_scan program is free software, available from ftp://ca.expasy.org/databases/prosite/tools/, and is licensed under the GNU General Public License. For help, click the red question mark.

Pattern to search:

Pattern syntax

  1. The standard IUPAC one-letter codes for the amino acids are used in PROSITE.
  2. The symbol 'x' is used for a position where any amino acid is accepted.
  3. Ambiguities are indicated by listing the acceptable amino acids for a given position, between square brackets '[ ]'. For example: [ALT] stands for Ala or Leu or Thr.
  4. Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted at a given position. For example: {AM} stands for any amino acid except Ala and Met.
  5. Each element in a pattern is separated from its neighbor by a '-'.
  6. Repetition of an element of the pattern can be indicated by following that element with a numerical value or, if it is a gap ('x'), by a numerical range between parentheses.
    Examples:
    x(3) corresponds to x-x-x
    x(2,4) corresponds to x-x or x-x-x or x-x-x-x
    A(3) corresponds to A-A-A
    Note: You can only use a range with 'x', i.e. A(2,4) is not a valid pattern element.
  7. When a pattern is restricted to the N- or C-terminal of a sequence, it starts with a '<' symbol or ends with a '>' symbol, respectively. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' is accepted.
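
The rules above map closely onto regular expressions. The following is a minimal Python sketch of that mapping, intended only as an illustration of the syntax; it is not how ps_scan is implemented, and the function names prosite_to_regex and element_to_regex as well as the test sequence are made up for this example.

```python
import re


def element_to_regex(element: str) -> str:
    """Translate one pattern element, e.g. 'x(2,4)', '[ST](2)' or '{ED}'."""
    # Split the element into its core and an optional repetition "(n)" or "(m,n)".
    m = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
    if m is None:
        raise ValueError("empty pattern element")
    core, low, high = m.groups()

    if core == "x":                                   # rule 2: any amino acid
        body = "."
    elif core.startswith("[") and core.endswith("]"):  # rule 3: accepted residues
        residues = core[1:-1]
        if ">" in residues:                           # rule 7: '>' inside brackets
            body = "(?:[" + residues.replace(">", "") + "]|$)"
        else:
            body = "[" + residues + "]"
    elif core.startswith("{") and core.endswith("}"):  # rule 4: excluded residues
        body = "[^" + core[1:-1] + "]"
    elif len(core) == 1 and core.isalpha():           # rule 1: one-letter code
        body = core
    else:
        raise ValueError(f"unrecognised pattern element: {element!r}")

    if high is not None:                              # rule 6: range, 'x' only
        if core != "x":
            raise ValueError("a range such as (2,4) is only valid with 'x'")
        return body + "{" + low + "," + high + "}"
    if low is not None:                               # rule 6: fixed repetition
        return body + "{" + low + "}"
    return body


def prosite_to_regex(pattern: str) -> str:
    """Translate a whole pattern; '<' and '>' become the anchors '^' and '$'."""
    pattern = pattern.strip()
    prefix = suffix = ""
    if pattern.startswith("<"):
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):
        suffix, pattern = "$", pattern[:-1]
    # Rule 5: elements are separated by '-'.
    parts = [element_to_regex(e.strip()) for e in pattern.split("-")]
    return prefix + "".join(parts) + suffix


regex = prosite_to_regex("[AC]-x-V-x(4)-{ED}")
match = re.search(regex, "MKAQVLLLLK")  # made-up sequence, for illustration only
print(regex, match.start() + 1, match.end())  # regex, then 1-based start and end
```

The sketch rejects A(2,4) with a ValueError, in line with the note under rule 6. It does not cover the extended syntax (for example <{C}*> or patterns written without '-').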
The following extended syntax is allowed for scanProsite:
  • If your pattern consists of one-letter amino acid codes only, without any ambiguous residues, you need not specify the '-', i.e. you can directly copy/paste peptide sequences into the text field.
    Example: M-A-S-K-E can be written as MASKE.

  • To search for all sequences that do not contain a certain amino acid, e.g. Cys, you can use <{C}*>.

Examples:


[AC]-x-V-x(4)-{ED}
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}

<A-x-[ST](2)-x(0,1)-V
This pattern, which must be in the N-terminal of the sequence ('<'), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val

<{C}*>
This pattern describes all sequences which do not contain any Cysteines.

IIRIFHLRNI
This pattern describes all sequences which contain the subsequence 'IIRIFHLRNI'.
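
The last two examples use the extended syntax and can be checked with plain string operations. The Python sketch below illustrates what they mean; it is an assumption-laden illustration rather than the tool's own code. The helper names, the 20-letter alphabet constant and the test sequences are all hypothetical.

```python
import re

# Assumption: only the 20 standard one-letter codes count as a "plain" peptide.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def expand_peptide(pattern: str) -> str:
    """Rewrite a plain peptide such as MASKE as M-A-S-K-E; leave others alone."""
    if pattern and all(ch in AMINO_ACIDS for ch in pattern):
        return "-".join(pattern)
    return pattern


def find_peptide(sequence: str, peptide: str) -> list:
    """Return the 1-based start of every occurrence, including overlapping ones."""
    positions = []
    start = sequence.find(peptide)
    while start != -1:
        positions.append(start + 1)
        start = sequence.find(peptide, start + 1)
    return positions


def lacks_residue(sequence: str, residue: str) -> bool:
    """True if the whole sequence is free of the residue, as in <{C}*> for Cys."""
    # Equivalent to requiring every position, from start to end, to be "not residue".
    return re.fullmatch("[^" + residue + "]*", sequence) is not None


print(expand_peptide("MASKE"))
print(find_peptide("AAIIRIFHLRNIKK", "IIRIFHLRNI"))  # made-up sequence
print(lacks_residue("MKTAYIAK", "C"))                # made-up sequence
```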

Options:


Protein Sequence:


Output Format:
Fasta:
Returns the matched pattern in FastA format.
Scan:
Returns the gene name, the positions of the match, and the match itself.
GFF:
Returns the matched pattern in GFF format.
Sequence:
Returns the entire sequence of a protein that matches anywhere in that sequence.
You can have your results compressed into the standard gzip format for faster downloads. The files are not stored on the server, so please download them directly from the results page.
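
A downloaded gzip file can be read without unpacking it first. The Python sketch below uses the standard gzip module. The file name proscan_results.txt.gz and its contents are placeholders created by the example itself, not actual ProScan output.

```python
import gzip
import os
import tempfile


def read_gzipped_text(path: str) -> list:
    """Return the lines of a gzip-compressed text file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return handle.read().splitlines()


# Stand-in for a downloaded result file (hypothetical name and contents).
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "proscan_results.txt.gz")
with gzip.open(demo_path, "wt", encoding="utf-8") as handle:
    handle.write("placeholder line 1\nplaceholder line 2\n")

lines = read_gzipped_text(demo_path)
print(len(lines), "lines read")
```

To read a real download, pass its path to read_gzipped_text in place of demo_path.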