Query Tab

Only checked sections are used when the Submit button is clicked.

Unchecked sections are collapsed and hidden from view. Settings may persist in unchecked sections, but will be ignored unless the associated checkbox is checked.

In general, a chain or chain pair must satisfy the specified criteria in all checked sections of the form to be included in the results.

Pairing

Chains are paired in abYsis using a combination of sequence analysis results and textual information obtained from the data source. The pairing information is pre-calculated and stored in the database.

A chain pair will be included in the results only if both chains satisfy their applicable light or heavy chain specific criteria (as specified in the Structure, Sequence or Restrict to Antibody Class sections of the form) and if at least one of the chains satisfies the criteria in the Basic section of the form. The Basic criteria are handled differently because information that should be identical for both chains in a pair, such as antigen, author and organism, is sometimes missing or misspelled.
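The pairing rule above can be sketched as a small predicate. This is an illustrative sketch only, not the abYsis implementation: the `ChainMatch` type and its field names are hypothetical and simply record whether a chain passed its chain-specific criteria and the Basic criteria.

```python
from dataclasses import dataclass


@dataclass
class ChainMatch:
    """Hypothetical per-chain outcome of evaluating the checked sections."""
    chain_specific_ok: bool  # Structure / Sequence / Restrict to Antibody Class
    basic_ok: bool           # Basic section (antigen, author, organism, ...)


def pair_included(light: ChainMatch, heavy: ChainMatch) -> bool:
    """Both chains must pass their own criteria; one passing Basic is enough."""
    both_specific = light.chain_specific_ok and heavy.chain_specific_ok
    either_basic = light.basic_ok or heavy.basic_ok
    return both_specific and either_basic
```

With this sketch, a pair whose light chain lacks an antigen annotation is still returned as long as the heavy chain carries it and both chains pass their chain-specific criteria.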

Text Searches

Text searches in abYsis are case insensitive. Search and target terms are converted to a common case before all comparisons.

Supported comparisons include:

Operator     Description
is           Find exact text without any additions at beginning or end. Searching for Jon will not find Jones.
is like      Allow wild cards at beginning and end. Searching for Jon will find Jones, Daejon and Jong.
~ (regex)    PostgreSQL POSIX regular expression matching.
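As a rough illustration of how the three operators differ, the sketch below mimics them in Python. The function name `text_match` is hypothetical, and Python's `re` module is used as a stand-in for PostgreSQL POSIX regular expressions, so regex syntax may differ in detail from what abYsis accepts.

```python
import re


def text_match(operator: str, search: str, target: str) -> bool:
    """Approximate the 'is', 'is like' and '~' comparisons."""
    s, t = search.lower(), target.lower()  # convert both to a common case
    if operator == "is":
        return s == t                      # exact text, nothing before or after
    if operator == "is like":
        return s in t                      # wild cards allowed at both ends
    if operator == "~":
        # Assumption: Python's re approximates PostgreSQL POSIX regex here.
        return re.search(search, target, re.IGNORECASE) is not None
    raise ValueError(f"unknown operator: {operator}")
```

For example, `text_match("is", "Jon", "Jones")` is false, while `text_match("is like", "Jon", "Daejon")` is true.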

Quick

This will search all fields where text can be written in the Basic search using the operator ‘is like’.


Basic

Data Source

Restrict search to a preferred data source.

Source ID or Accession

Source ID: Primary entry-level identifier used by the Data Source. Examples include:

There are often multiple protein and/or nucleotide sequences in abYsis corresponding to a given Source ID.

Accession: Uniquely identifies a sequence in abYsis, for a given Source ID. Examples:

Where the Data Source does not provide a suitable sequence-level identifier and there are multiple sequences for a given Source ID, the abYsis accession is the Source ID with an appended counter e.g. A123456 (2), A123456 (3).

Commercial Users: Proprietary sequences entered by the User through the Import function operate similarly.

Name

The Name field is derived from textual annotations provided by the Data Source.
Enter only a single search term, even though the Name of an entry may contain several words corresponding to a gene name, protein product, sequence title, mnemonic or other text description.

For example, the Name field for abYsis-EMBLI-IG entry ABD73927.1 is mAb3F2 immunoglobulin gamma heavy chain. This entry would be found by searching with ‘is like’ mAb3F2.

Antigen

Populated only for Kabat sequences, as antigen information cannot be parsed automatically from other public data sources.

Only a single name or search term should be entered.

Clone

Populated only for data sources using EMBL format files.

Only a single name or search term should be entered.

Reference

Search titles and publication details of the reference and patent data associated with each sequence.

Patent data is populated only for data sources using EMBL format files.

Only a single name or search term should be entered.

Author

Search surnames of the authors of the reference data associated with each sequence.

Only a single name or search term should be entered.

Publication Year

Select a Publication Year and use the adjacent dropdown to select whether you are interested in publications before, after or during that year.

Search will be restricted to sequences with at least one publication in the specified range.
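The "at least one publication in the specified range" rule can be sketched as follows. The function and mode names are hypothetical, and the sketch assumes that 'before' and 'after' exclude the selected year itself, which the text above does not state.

```python
from typing import Iterable


def in_publication_range(years: Iterable[int], selected: int, mode: str) -> bool:
    """True if at least one publication year falls in the chosen range."""
    checks = {
        "before": lambda y: y < selected,   # assumption: strictly before
        "during": lambda y: y == selected,
        "after": lambda y: y > selected,    # assumption: strictly after
    }
    check = checks[mode]
    return any(check(y) for y in years)
```

A sequence with no associated publications never satisfies the filter, because `any` over an empty collection is false.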

Organism

Organism names have been parsed from the data source, with some automated error checking and/or mapping via aliases.

Commercial Licensees: For proprietary sequences, you can further differentiate your entries by specifying your own Organism name when using the Import facility, e.g. Company Mouse.

The organism name stored in abYsis is almost always the species or sub-species, sometimes the genus and very occasionally a common name.

In some cases, the species will be displayed as two names (e.g. Homo sapiens, Mus musculus). This is the annotation that appears in the original data and represents a chimera of some sort. In practice, the variable domain sequence is most likely to come from the second organism, but that may not be the case.

The search is restricted to organisms with that name and organisms whose names start with it. For example, a search for Rattus will match Rattus rattus and Rattus norvegicus.
Note that species information is taken from the source data files.
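The prefix behaviour of the Organism search can be sketched in a few lines. The function name is hypothetical, and case insensitivity is assumed from the Text Searches section rather than stated for this field.

```python
def organism_matches(search: str, organism: str) -> bool:
    """True if the stored organism name equals or starts with the search term."""
    return organism.lower().startswith(search.lower())


stored = ["Rattus rattus", "Rattus norvegicus", "Mus musculus"]
hits = [name for name in stored if organism_matches("Rattus", name)]
```

Here `hits` contains the two Rattus species and excludes Mus musculus.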

Exclude sequences with warnings

Check this box to exclude sequences with warnings.

A small fraction (<1%) of the public data loaded into abYsis carries warnings. The bulk of these are germline DNA sequences flagged as pseudogenes or non-functional. You can avoid these by selecting Exclude.

Exclude unclassified sequences

Check this box to exclude unclassified sequences.

Sequences are classified in abYsis as heavy, light, kappa or lambda using a combination of textual annotations provided by the data source and computed annotations made by abYsis. In some cases textual annotation is incomplete or ambiguous and in some cases abYsis may fail to determine a chain type. Where there is an inconsistency, the computed annotation is preferred and the sequence is tagged with a warning.
You can avoid unclassified sequences by selecting Exclude.

Exclude unpaired sequences

Check this box to exclude unpaired sequences.

Light and heavy chain sequences are paired in abYsis using a combination of textual annotations provided by the data source and computed annotations made by abYsis. A cautious approach is taken to pairing to avoid incorrect pairs at the expense of missing some correct pairs.

Exclude un-numbered sequences

Check this box to exclude sequences that are un-numbered because the automated numbering has failed.

Not all sequences can be numbered. For example, some sequences with large and/or unusual deletions or insertions cannot be numbered. All numbered sequences are classified, but there are some classified sequences that are not numbered.
Protein sequences shorter than 70 residues are not processed through the numbering pipeline.


Structure

Overview

Sequences with a predicted canonical class

Restrict to chains with loops predicted to belong to a specified canonical class based on the presence of key residues.

Each canonical class (a structural concept) is encoded as a set of sequence rules that require particular residues at specified positions; these positions must be present for a match. Via these rules, canonical classes can be predicted for all numbered chains, irrespective of whether structural information is available.
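One way to picture such a rule set is as a mapping from numbered positions to the residues allowed there. The sketch below is illustrative only: the function name, the rule and the two chains are hypothetical placeholders, not real canonical class definitions from abYsis.

```python
from typing import Dict, Set


def satisfies_rule(numbered: Dict[str, str], rule: Dict[str, Set[str]]) -> bool:
    """True if every rule position is present and holds an allowed residue."""
    return all(
        pos in numbered and numbered[pos] in allowed
        for pos, allowed in rule.items()
    )


# Hypothetical rule and chains, for illustration only.
example_rule = {"L2": {"I"}, "L25": {"A", "S"}}
chain_a = {"L1": "D", "L2": "I", "L25": "A"}
chain_b = {"L1": "D", "L2": "I"}  # position L25 is absent
```

In this sketch `chain_a` satisfies the rule, while `chain_b` does not because a required position is missing.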

See the help on the About/Definitions page for a description of terms ('Exact' / 'Similar') and the different classification 'Methods'.

Residues within x Ångströms of your chosen position in known structures

Restricts the search to structures that have (or do not have) particular amino acid types within a particular distance of a specified residue position.

Add row for additional positions or click delete row to remove positions not required.

If the Chothia Position dropdown is not selected in a given row, no constraint is applied and the row is effectively removed from consideration. Similarly, if the ‘at least one of’ option is selected in the Constraint dropdown, but nothing has been entered in the text box, no further constraint is applied.

Residues within x Ångströms of your chosen position in known structures and predicted in sequences

Restricts the search to structures that have (or do not have) particular amino acid types within a particular distance of a specified residue position. In addition to known structures, this allows you to consider sequences that are numbered but do not have structural data.

It uses a pre-calculated set of distance distributions calculated from known antibody structures to predict residues in the vicinity of a selected position based on correct numbering of the antibodies.

The original distributions were calculated using many hundreds of structures of numbered antibody chains so the prediction algorithm uses an average distance and a standard deviation.

Results are returned when m < d + nσ, where m is the mean C-alpha to C-alpha distance between the positions, d is the specified distance, σ is the standard deviation of that distance and n is the number of standard deviations allowed.

Add row for additional positions or click delete row to remove positions not required.

If the Chothia Position dropdown is not selected in a given row, no constraint is applied and the row is effectively removed from consideration.
If the ‘at least one of’ option is selected in the Constraint dropdown, but nothing has been entered in the text box, no further constraint is applied.
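The acceptance test m < d + nσ used for predicted neighbours can be written out as a short sketch. The function and parameter names are hypothetical, the numbers in the usage note are invented, and how abYsis chooses n is not stated here, so it is simply passed in as an argument.

```python
def predicted_within(mean_distance: float, sd: float,
                     cutoff: float, n_sd: float) -> bool:
    """Apply m < d + n*sigma for one pair of numbered positions.

    mean_distance: m, mean C-alpha to C-alpha distance in known structures
    sd:            sigma, standard deviation of that distance
    cutoff:        d, the distance specified on the form (in Angstroms)
    n_sd:          n, number of standard deviations (assumed to be a parameter)
    """
    return mean_distance < cutoff + n_sd * sd
```

For instance, with a cutoff of 5.0 and n of 2, a position pair with mean 6.0 and standard deviation 1.0 is accepted (6.0 < 7.0), whereas a pair with mean 8.0 and the same spread is not.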


Sequence

Overview

Use this part of the form to restrict the search to protein sequences with specific fragments, motifs or residues.

Terms may be specified separately for heavy and light chains.
If you specify terms for both heavy and light chains in this section or elsewhere in the form, the search will be restricted to paired chains.
Each option (with the exception of ‘Search for chosen motifs within complete sequences’) requires numbered sequences, and the search will be automatically restricted to numbered sequences.

Search for chosen motifs within complete sequences

Restrict the search to sequences which contain particular fragments or motifs.

Note: This option runs on the full sequence, not just the numbered region. To restrict the search to the numbered region, see ‘Search for chosen motifs within specific regions’.

The search is case insensitive. Regex searches use PostgreSQL POSIX regular expression matching.

Search for chosen motifs within specific regions

Restrict the search to chains with regions which contain particular fragments or motifs.

The search is case insensitive. Regex searches use PostgreSQL POSIX regular expression matching.

A chain will be included in the results only if all the entered criteria are satisfied.
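The "all entered criteria must be satisfied" rule for region motifs can be sketched as below. The function name, region labels and sequences are hypothetical, and Python's `re` stands in for PostgreSQL POSIX regular expressions, so pattern syntax may differ in detail.

```python
import re
from typing import Dict, List, Tuple


def regions_match(regions: Dict[str, str],
                  criteria: List[Tuple[str, str]]) -> bool:
    """True only if every (region, pattern) criterion matches, ignoring case."""
    return all(
        region in regions
        and re.search(pattern, regions[region], re.IGNORECASE) is not None
        for region, pattern in criteria
    )


# Hypothetical region sequences for one heavy chain.
example_regions = {"CDR-H1": "GYTFTSYW", "CDR-H3": "ARDYGSSYFDY"}
```

A chain with these regions would satisfy the pair of criteria `("CDR-H3", "^AR")` and `("CDR-H1", "sy")`, but would fail as soon as any single criterion does not match.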

Specify minimum and maximum lengths for regions

Restrict the search to chains with regions with lengths in a specified range.

A chain will be identified only if all the entered criteria are satisfied.
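A length filter of this kind could look like the following sketch. The names are hypothetical, and the sketch assumes that the minimum and maximum are inclusive and that either bound may be left empty, neither of which is stated above.

```python
from typing import Dict, Optional, Tuple

Limits = Dict[str, Tuple[Optional[int], Optional[int]]]


def lengths_ok(regions: Dict[str, str], limits: Limits) -> bool:
    """True only if every constrained region has a length within its bounds."""
    for region, (minimum, maximum) in limits.items():
        if region not in regions:
            return False
        length = len(regions[region])
        if minimum is not None and length < minimum:
            return False
        if maximum is not None and length > maximum:  # assumption: inclusive
            return False
    return True
```

For a hypothetical region of 11 residues, limits of (10, 12) pass and limits of (12, None) fail.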

Constrain amino acids at required positions

Restrict the search to chains which have particular residues at positions of interest e.g. Chothia key residues or residues known to be important in humanization.

If you specify a Required Position without specifying any Amino Acids, all chains that have that numbered position are identified, irrespective of the amino acid present. This can be used to find sequences with an unusual insertion of interest.

Add row for additional positions or click delete row to remove positions not required.
If the Required Position dropdown is not selected in a given row, no constraint is applied and the row is effectively removed from consideration.
A chain will be included in the results only if all required positions are present and all the associated amino acid constraints are satisfied.
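The behaviour described in this section, including the case where a position is required but no amino acids are given, can be sketched as follows. The function name, position labels and residues are hypothetical examples.

```python
from typing import Dict, Set


def positions_ok(numbered: Dict[str, str], required: Dict[str, Set[str]]) -> bool:
    """Every required position must exist; a non-empty set restricts the residue."""
    for pos, allowed in required.items():
        if pos not in numbered:
            return False           # required position missing from the chain
        if allowed and numbered[pos] not in allowed:
            return False           # residue present but not one of those allowed
    return True


# Hypothetical numbered heavy chain fragment with an insertion at H52A.
example_chain = {"H52": "S", "H52A": "G", "H71": "R"}
```

Passing `{"H52A": set()}` finds any chain with that insertion position regardless of residue, while `{"H71": {"R", "K"}}` additionally constrains the amino acid.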


Restrict to Antibody Class

Use the dropdown to specify a chain class. The classification is hierarchical e.g. a search for Heavy Gamma chains will also retrieve chains classified as Heavy Gamma 2 A.
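The hierarchical matching can be pictured by treating each class as a path of levels, so that a query matches any class that begins with it. This representation is an assumption made for illustration; how abYsis stores the hierarchy is not described here.

```python
from typing import Tuple


def class_matches(query: Tuple[str, ...], assigned: Tuple[str, ...]) -> bool:
    """True if the assigned class equals the query or lies beneath it."""
    return assigned[:len(query)] == query
```

With this sketch, a query of `("Heavy", "Gamma")` matches a chain classified as `("Heavy", "Gamma", "2", "A")`, but the more specific query does not match a chain classified only as `("Heavy", "Gamma")`.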

Public database sequences are classified in abYsis using a combination of sequence analysis results and, where available, textual information provided by the data source.

Commercial Users: The classification of proprietary sequences entered by the User through the Import function is based purely on calculated results.

In some cases the textual information relating to chain type or class is missing or ambiguous, while in other cases sequence analysis fails to give a clear determination. Where there is an inconsistency, the sequence analysis result is preferred and a warning is flagged. Some chains remain unclassified.

If you specify terms for both heavy and light chains in this section or elsewhere in the form, the search will be restricted to paired chains.