Inference of Protein Assembly in Crystals
A repository for protein quaternary structures inferred from crystals using classification and symmetry

Table of Contents

Protein Information
Crystallographic Information
Assembly Information
Database Query
Confidence of IPAC
Training Data
Existing Databases

IPACdb is a repository of quaternary structures of proteins in the crystalline state, inferred by the IPAC method of Mitra & Pal using naive Bayes classification and point group symmetry [1]. The information is organized into three groups: Protein Information, Crystallographic Information and Assembly Information. Protein Information contains the protein-related information extracted from the PDB website, Crystallographic Information contains the parameters of the experimental method, and Assembly Information contains the features used by IPAC to infer the assembly.

Protein Information
PDB ID: The 4-character Protein Data Bank identifier. Only one PDB ID can be queried at a time.

SCOP: The Structural Classification of Proteins (SCOP) class describes the fold of the protein subunit. Users can filter the repository by SCOP class; only one SCOP class can be queried at a time. Note that some proteins have no SCOP ID assigned. To include those proteins in your results, use the default search.

UniProtKB: The repository contains the UniProtKB/Swiss-Prot identifier corresponding to each PDB entry, so users can alternatively search by UniProtKB identifier.

Text: The repository stores the PDB title of each entry, which can be searched with any keyword. The search uses substring matching.

Complex Type: Users can restrict the search to homomeric or heteromeric complexes. By default, all complexes are returned. A quaternary state is considered homomeric when all subunits in the complex have the same primary structure. The complex type is determined from the chains in the PDB entry.

Molecule: The repository can be searched by molecule name. The search runs against the macromolecular content description in the PDB file and uses substring matching.

Source: The repository records the source organism of the macromolecules in each entry. The search uses substring matching.

Chain Length: Users can restrict the search by a minimum chain (subunit) length. Only proteins in which ALL chains of the complex are above the specified length are selected. Note that the database itself imposes no minimum chain length. However, IPAC is benchmarked on proteins with subunit length >25, so we recommend using 25 as the minimum chain length criterion.

Crystallographic Information
The IPACdb repository contains only PDB entries whose structures were determined by X-ray crystallography. Resolution, R-factor and space group are therefore provided as search parameters.

Resolution & R-factor: The resolution and R-factor of a structure are indicators of its quality. The values entered in the search are treated as maximum values of resolution and R-factor. The R-factor is set to zero (0.0) when the data are not available.

Space Group: Users can filter the proteins by a particular space group.

Assembly Information
Accessible Surface Area (ASA): ASA is the protein surface area accessible to the solvent, which is water in our case (default probe radius 1.4 Å). Total ASA is the sum of the ASA contributions of all the protein atoms and is computed by the NACCESS program. Users can restrict the results to complexes with a minimum ASA.

Buried Surface Area (BSA): The ASA buried upon complex formation is denoted as the buried surface area. Users can restrict the results to complexes with a minimum BSA. For monomers, BSA is zero.

% Buried Surface Area: Percentage buried surface area is the ratio of BSA to the total ASA of all the subunits, multiplied by 100: %BSA = (BSA ÷ Total ASA) × 100. Users can restrict the proteins by a range of %BSA.
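The %BSA formula can be sketched as a small function. This is only an illustration: the function name and the input values are hypothetical, and the total ASA is taken as given rather than computed with NACCESS.

```python
def percent_bsa(bsa: float, total_asa: float) -> float:
    """Return %BSA = (BSA / total ASA) * 100, both areas in square angstroms."""
    if total_asa <= 0:
        raise ValueError("total ASA must be positive")
    return bsa / total_asa * 100.0


# Hypothetical values, for illustration only.
print(percent_bsa(2500.0, 20000.0))  # 12.5
```

A monomer has BSA equal to zero, so its %BSA is also zero.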

Quaternary State: Users can query for a particular quaternary state, such as dimer or trimer.

Point Group Symmetry (PGS): PGS contains only cyclic point group information, ranging from C1 through C7. Dihedral symmetry can be represented in cyclic form (for details see Mitra and Pal [1]). Users can request proteins whose quaternary structure has a specific cyclic group.

Disulphide Bonds (#): This field sets the minimum number of disulphide bonds present in the protein complex. The count includes both intra-chain and inter-chain disulphide bonds.

Database Query
All of the above conditions are combined in conjunction (AND operation) to search the repository. The query returns the list of proteins that satisfy all the conditions.
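The AND combination of search conditions can be illustrated with a minimal in-memory sketch. It is not the IPACdb implementation: the `Entry` fields, the `query` helper and the entry identifiers are hypothetical placeholders chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Entry:
    # Hypothetical record; the real repository stores many more fields.
    pdb_id: str
    resolution: float
    r_factor: float
    quaternary_state: str


Condition = Callable[[Entry], bool]


def query(entries: Iterable[Entry], conditions: Iterable[Condition]) -> List[Entry]:
    """Keep only the entries for which every condition holds (logical AND)."""
    conds = list(conditions)
    return [e for e in entries if all(c(e) for c in conds)]


# Made-up entries, not real PDB records.
entries = [
    Entry("entry_a", 1.8, 0.19, "dimer"),
    Entry("entry_b", 3.2, 0.24, "dimer"),
    Entry("entry_c", 1.5, 0.18, "monomer"),
]
conditions = [
    lambda e: e.resolution <= 2.5,            # maximum resolution
    lambda e: e.r_factor <= 0.25,             # maximum R-factor
    lambda e: e.quaternary_state == "dimer",  # requested quaternary state
]
hits = query(entries, conditions)
print([e.pdb_id for e in hits])  # ['entry_a']
```

Because the conditions are ANDed, adding a condition can only shrink the result list or leave it unchanged.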

Confidence of IPAC
Issue 1: IPAC is benchmarked on proteins with individual subunit length >25; however, IPACdb places no restriction on minimum subunit length. Quaternary state predictions for proteins with at least one subunit of length <25 are marked with an asterisk (*). Please review the quaternary state labels in these cases to check that they correspond correctly to the predicted structure of the complex.

Issue 2: IPAC is an automated server, so there is no check for synthetic constructs, especially small polypeptide chains. Such cases are also marked with an asterisk (*). Please review these cases to confirm biological relevance. A PDB entry with a single subunit and a synthetic polypeptide may be inferred as a dimer because two different chains are present. Conventionally, such predictions are treated as monomers because the synthetic construct may be a small peptide molecule.

Example: PDB ID 2V3Z (Complexes of Mutants of Escherichia Coli Aminopeptidase P and the Tripeptide Substrate Valproleu) has been inferred as a tetramer comprising two XAA-PRO aminopeptidase chains and two tripeptides (VALINE-PROLINE-LEUCINE). IPAC reached this conclusion from the presence of two-plus-two subunits. However, users may not wish to count tripeptides as valid subunits and may prefer to treat the complex as a dimer rather than a tetramer. It is therefore recommended to check all '*'-marked quaternary states manually before using them in any further analysis or application.
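The asterisk convention described in Issues 1 and 2 can be sketched as follows. The function name, its parameters and the chain lengths are hypothetical; in particular, how a synthetic construct is detected is not described here, so it is passed in as a flag.

```python
from typing import Sequence


def label_quaternary_state(
    state: str,
    chain_lengths: Sequence[int],
    has_synthetic_chain: bool = False,
    min_length: int = 25,
) -> str:
    """Append '*' when a prediction should be reviewed manually.

    A prediction is flagged when at least one subunit is shorter than
    min_length, or when the entry contains a synthetic construct.
    """
    needs_review = has_synthetic_chain or any(n < min_length for n in chain_lengths)
    return f"{state}*" if needs_review else state


# Illustrative lengths only: two long chains plus two tripeptides.
print(label_quaternary_state("tetramer", [400, 400, 3, 3]))  # tetramer*
print(label_quaternary_state("dimer", [300, 300]))           # dimer
```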

Features [1]
Interface Area (IA): Interface area is defined as the accessible surface area (ASA) buried upon complex formation. An atom is defined as an interface atom if it loses >0.1 Å² of its ASA upon complex formation. The sum of the ASA lost by all such interface atoms is called the buried surface area. The buried surface area divided by the number of subunits contributing to it is the measure of interface area.
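The interface area definition can be sketched as below, assuming per-atom ASA values are already available for each subunit in isolation and in the complex (for example from a NACCESS run). The tuple layout and function name are hypothetical.

```python
from typing import List, Tuple

# (subunit id, ASA in the isolated subunit, ASA in the complex), areas in square angstroms
AtomASA = Tuple[str, float, float]


def interface_area(atoms: List[AtomASA], threshold: float = 0.1) -> float:
    """Buried surface area divided by the number of contributing subunits."""
    buried = 0.0
    contributing = set()
    for subunit, asa_free, asa_complex in atoms:
        loss = asa_free - asa_complex
        if loss > threshold:  # interface atom: loses more than 0.1 square angstroms
            buried += loss
            contributing.add(subunit)
    if not contributing:
        return 0.0  # no interface atoms, e.g. an isolated monomer
    return buried / len(contributing)


# Made-up per-atom values for a two-subunit complex.
atoms = [
    ("A", 10.0, 2.0),   # loses 8.0, interface atom
    ("A", 5.0, 4.95),   # loses 0.05, below the threshold
    ("B", 12.0, 4.0),   # loses 8.0, interface atom
    ("B", 3.0, 3.0),    # loses nothing
]
print(interface_area(atoms))  # 8.0
```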

Normalized Interface Packing (NIP): Interface packing (IP) is a volume-based measure for estimating compactness of the protein interface. An envelope covering a 4 Å slice across the interface is first calculated enclosing all the atoms and inter-atomic voids. The ratio between the sum of the van der Waals volumes of the atoms enclosed in the envelope and the total volume enclosed in the envelope considering it a sphere gives IP. A value of 0 means no packing at the interface, while a value of 1 indicates full packing at the interface. When IP is divided by the interface area, it gives normalized interface packing.

Normalized Surface Complementarity (NSC): Surface complementarity (SC) is an area-based measure for estimating the compactness of the protein interface. First, a suitable origin transformation is applied to the pair of subunits whose SC is to be computed. A two-dimensional Delaunay tessellation is then applied to the protein subunit surface to describe it in terms of triangular tiles. The distances and angles between the tiles across the interface of the two subunits are evaluated (with some corrections for the interface rim regions) to ascertain which of them are packed properly. The SC is expressed as the ratio of the minimum of the two packing tile areas available from the two subunits to the total tile area of the interface. A value of 0 means no complementarity, while a value of 1 indicates perfect complementarity. SC divided by the interface area gives the normalized surface complementarity. [2]

Normalized Surface Complementarity and Interface Packing Paired Matrix (NSP): NSP is the deviation of a protein's NIP and NSC values from the linear regression line relating NIP and NSC (NSC = 1.24423 × NIP + 0.0279). NIP and NSC are highly correlated (correlation coefficient +0.96).
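The text does not state how the deviation from the regression line is measured, so the sketch below shows two common readings: the vertical residual in NSC and the perpendicular distance to the line. Both function names are hypothetical, and which of the two (if either) matches the published feature is an assumption to check against [1].

```python
import math

SLOPE = 1.24423      # regression slope quoted in the text
INTERCEPT = 0.0279   # regression intercept quoted in the text


def nsp_vertical(nip: float, nsc: float) -> float:
    """Signed vertical residual: observed NSC minus NSC predicted from NIP."""
    return nsc - (SLOPE * nip + INTERCEPT)


def nsp_perpendicular(nip: float, nsc: float) -> float:
    """Signed perpendicular distance from the point (NIP, NSC) to the line."""
    return nsp_vertical(nip, nsc) / math.sqrt(1.0 + SLOPE ** 2)


# A point lying on the line has zero deviation under either reading.
on_line = SLOPE * 0.5 + INTERCEPT
print(abs(nsp_vertical(0.5, on_line)) < 1e-12)  # True
```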

Variation of Accessible Surface Area (asaV): Accessible surface area (ASA) is computed using the Lee-Richards algorithm [3] as implemented in the NACCESS program. The surface area accessible to a probe molecule varies inversely with the radius of the probe. We define interface area (IA) as the accessible surface area buried upon complex formation for an individual subunit. As the probe radius decreases, the probe reaches deeper into concave surfaces, resulting in a larger accessible area. We have observed that the rim area of monomeric proteins involved in non-biological contacts in the crystal lattice differs significantly in compactness from the rim area of dimeric proteins. This difference, denoted asaV, is quantified as the difference between IA2.0 and IA1.8, normalized by IA1.4, where IA2.0, IA1.8 and IA1.4 are the interface areas of the protein complex computed with probe radii of 2.0 Å, 1.8 Å and 1.4 Å, respectively.
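As a minimal sketch, asaV can be computed from three interface areas obtained at the three probe radii. The sign convention (IA2.0 minus IA1.8) is an assumption read from the wording above, and the numeric inputs are made up.

```python
def asa_v(ia_20: float, ia_18: float, ia_14: float) -> float:
    """asaV = (IA at 2.0 A probe - IA at 1.8 A probe) / IA at 1.4 A probe."""
    if ia_14 <= 0:
        raise ValueError("IA at probe radius 1.4 must be positive")
    return (ia_20 - ia_18) / ia_14


# Hypothetical interface areas in square angstroms.
print(asa_v(ia_20=700.0, ia_18=800.0, ia_14=1000.0))
```

Each of the three inputs would come from a separate surface calculation run with the corresponding probe radius.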

Interface Packing Gradient (IPg): The compactness of the interface may vary from the core to the rim, so normalized interface packing (NIP), which is a global measure of interface packing, may not capture the local packing picture of the interface. We have therefore introduced another feature, the interface packing gradient, which is the ratio of the packing (compactness) of the core interface residues to that of the rim interface residues. Residues with fully buried interface atoms are defined as core residues, and residues with interface atoms that are partially exposed to solvent are defined as rim residues. [4]

Patch Ratio (Pr): Although the normalized interface packing and the interface packing gradient provide adequate information about a protein interface, we also compute the patch ratio to detect the presence of interface voids. A set of interface atoms forms a patch if they lie within a sphere of radius 5.0 Å. The sum of the interface area contributed by those patch atoms, normalized by the interface area, gives the patch ratio.

Normalized Solvation Energy (NSE): Solvation energy is an entropic contribution to binding free energy of the protein complex. It arises due to burial of surface area of proteins upon complex formation. The method of Eisenberg and McLachlan (1986) has been used to calculate it.[5]

Hydrophobicity at the Interface and Surface (HPOi and HPOs): Hydrophobicity is computed using the Fauchere and Pliska [6] hydrophobicity scale. The ASA of each atom is normalized by the total ASA of its residue in an extended conformation of the tripeptide G-X-G model [7]. The contribution of an atom to the hydrophobicity is the product of its normalized ASA and the hydrophobicity value of its residue type on the Fauchere and Pliska scale. The sum of the contributions from all surface atoms is the surface hydrophobicity (hpos), and the sum of the contributions from all interface atoms is the interface hydrophobicity (hpoi). Because the chemical nature of the protein surface varies widely among proteins, we further normalize the surface and interface hydrophobicity by the total hydrophobicity. The normalized hydrophobicity of the interface is therefore:
HPOi = hpoi ÷ (hpos + hpoi)
and the normalized hydrophobicity of the surface is:
HPOs = hpos ÷ (hpos + hpoi)
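The two normalization formulas can be sketched as follows. The function names are hypothetical, the per-residue scale value and reference ASA must be supplied by the caller (no scale values are reproduced here), and the inputs are made up.

```python
from typing import Tuple


def atom_contribution(atom_asa: float, residue_ref_asa: float, residue_hydrophobicity: float) -> float:
    """Normalized ASA of the atom times the hydrophobicity of its residue type."""
    return (atom_asa / residue_ref_asa) * residue_hydrophobicity


def normalized_hydrophobicity(hpo_i: float, hpo_s: float) -> Tuple[float, float]:
    """Return (HPOi, HPOs), each divided by the total hpos + hpoi."""
    total = hpo_s + hpo_i
    if total == 0:
        raise ValueError("total hydrophobicity is zero; the ratios are undefined")
    return hpo_i / total, hpo_s / total


# Hypothetical summed contributions for one complex.
hpo_i_norm, hpo_s_norm = normalized_hydrophobicity(hpo_i=3.0, hpo_s=9.0)
print(hpo_i_norm, hpo_s_norm)  # 0.25 0.75
```

By construction HPOi and HPOs sum to 1, so either one determines the other.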

Training Data
The details on training data can be found here. The first column gives the Protein Data Bank (PDB) identifier, followed by the 10 feature variables and the assembly information (Monomer/Dimer). Details of the feature variables can be found in the Features section of this page.

The core of the IPAC is a binary classifier, which predicts either of the two states (Monomer or Dimer). Therefore, the classifier is also trained with Monomers and Dimers.
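To illustrate what a binary Monomer/Dimer naive Bayes classifier looks like, here is a minimal sketch using only the standard library. It assumes Gaussian per-feature likelihoods, which is an assumption of this example and not a statement about how IPAC models its features; the two-feature toy data are made up, and `fit` and `predict` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Model = Dict[str, Tuple[float, List[float], List[float]]]


def fit(X: Sequence[Sequence[float]], y: Sequence[str]) -> Model:
    """Estimate a log prior and per-feature mean/variance for each class."""
    grouped = defaultdict(list)
    for row, label in zip(X, y):
        grouped[label].append(row)
    model: Model = {}
    for label, rows in grouped.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small variance floor avoids division by zero for constant features.
        variances = [
            max(sum((v - m) ** 2 for v in c) / len(c), 1e-9)
            for c, m in zip(cols, means)
        ]
        model[label] = (math.log(len(rows) / len(y)), means, variances)
    return model


def predict(model: Model, row: Sequence[float]) -> str:
    """Return the class with the highest log posterior under the naive assumption."""
    def log_posterior(label: str) -> float:
        log_prior, means, variances = model[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood
    return max(model, key=log_posterior)


# Toy training set with two made-up features per protein.
X = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25], [0.90, 0.80], [0.80, 0.90], [0.85, 0.75]]
y = ["Monomer", "Monomer", "Monomer", "Dimer", "Dimer", "Dimer"]
model = fit(X, y)
print(predict(model, [0.12, 0.18]), predict(model, [0.88, 0.82]))  # Monomer Dimer
```

The real training file has 10 feature columns per entry; the same functions would accept rows of any fixed length.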

The normalized interface packing (NIP) and normalized surface complementarity (NSC) values are multiplied by 1000.0 Å², so their units contain a 10⁻³ term.

Existing Databases
PQS, PISA, and PiQSi are three other protein quaternary structure repositories. While PiQSi (Levy, 2007) is a manually curated repository, PQS (Henrick et al., 1998) and PISA (Krissinel et al., 2007) are based on automated methods. Of these, PQS has not added to its repository since August 2009 and will stop its service.

PiQSi provides a full list of annotated information as a text file. IPAC is benchmarked on the annotations made up to June 21, 2009 [1]. The corresponding annotation file can be downloaded from here. The latest annotation file can be downloaded from the PiQSi download page.

PISA provides two options: Structure Analysis and Database searches. Structure Analysis presents a number of possible assemblies with a number of parameters, without a definite conclusion about a particular quaternary structure, whereas Database searches provides definite information about the quaternary structure.

IPAC, being a prediction server, provides a definite conclusion about the quaternary structure from the crystal lattice. We have therefore compared IPAC results with the PISA Database searches option [1]. The PISA repository search results used for this comparison can be found in Table 1 (each quaternary structure indicates the list of proteins with that quaternary structure).

Table 1
Monomer Dimer Trimer Tetramer Pentamer Hexamer
Heptamer Octamer Nonamer Decamer Dodecamer

The data were downloaded from the PISA server on May 31, 2010. All search options were left at their defaults, and the only filtering criterion was protein-protein interaction composition. Different quaternary structures were selected by choosing different multimeric states.

[1] Mitra, P. & Pal, D. (2011). Combining Bayes classification and point group symmetry under Boolean framework for enhanced protein quaternary structure inference. Structure 19:304-312.
[2] Mitra, P. & Pal, D. (2010). New measures for estimating surface complementarity and packing at protein-protein interfaces. FEBS Letters 584(6):1163-1168.
[3] Lee, B. & Richards, F.M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55:379–400.
[4] Chakrabarti, P. & Janin, J. (2002). Dissecting protein-protein recognition sites. Proteins 47:334–343.
[5] Eisenberg, D. & McLachlan, A.D. (1986). Solvation energy in protein folding and binding. Nature 319:199–203.
[6] Fauchere, J. & Pliska, V. (1983). Hydrophobic parameters π of amino acid side chains from partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. 18:369–375.
[7] Miller, S., Janin, J., Lesk, A.M. & Chothia, C. (1987). Interior and surface of monomeric proteins. J. Mol. Biol. 196:641–656.

Contact: Pralay Mitra, Ph. D.
Prof. Debnath Pal's Lab
Bioinformatics Center and Supercomputer Education Research Center,
Indian Institute of Science, Bangalore, Karnataka - 560012, India
Email: dpal@iisc.ac.in