Inference of Protein Assembly in Crystals

A web server to infer protein assembly in crystals by classification and symmetry

Only a limited number of jobs can be handled concurrently because computational resources are limited. There is no queue system, and the server will not accept jobs when resources are exhausted. Users are requested to submit one job at a time and to make proper use of the server. We reserve the right to kill jobs and block users if we find any misuse of the server.

        The server is implemented to infer quaternary structures up to tetramers. Because the jobs are computationally intensive, quaternary structures beyond tetramers will be included after a hardware upgrade. If you require inferences beyond tetrameric structures, please download the executable and run it on your local system. The executable can be found at the local repository.
        The server generates the crystal lattice from the space group and unit cell information. Please upload a PDB format file that contains the space group and unit cell information.
        The server may not be available during updates. In case of any difficulty, please feel free to contact us.

The IPAC server has two job submission options. The user can either enter a PDB ID or upload a PDB format file containing coordinates, unit cell and space group information.

To make the IPAC server easier to use, it maintains a precompiled database of PDB IDs. The database contains the output of the IPAC server. If information for the entered PDB ID is available in the database, the user gets the result instantly (through a database search alone); otherwise the data are downloaded from the RCSB website and the job is submitted to the IPAC server. The user does not need to download the file from the RCSB website and upload it to the IPAC server.
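The lookup-then-submit behaviour described above can be sketched as follows. This is only an illustration: the function name `lookup_or_submit`, the dictionary standing in for the precompiled database, and the `run_job` callable are hypothetical and do not describe the server's actual implementation.

```python
from typing import Callable, Dict, Tuple


def lookup_or_submit(pdb_id: str,
                     database: Dict[str, str],
                     run_job: Callable[[str], str]) -> Tuple[str, bool]:
    """Return (result, served_from_database) for a PDB ID.

    database : hypothetical stand-in for the precompiled result database,
               keyed by upper-case PDB ID.
    run_job  : hypothetical callable that downloads the entry and runs
               the inference; it is only called on a database miss.
    """
    key = pdb_id.strip().upper()  # PDB IDs are treated as case-insensitive
    if key in database:
        # Database hit: the stored result is returned immediately.
        return database[key], True
    # Database miss: fall back to a full (slow) job.
    return run_job(key), False
```

Passing the job runner in as a parameter keeps the sketch free of any network or server detail; only the hit-or-miss decision is shown.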

Please note:
# It is strongly recommended to enter the PDB ID whenever it is known.
# If a PDB ID is entered and a file is also uploaded, the PDB ID takes precedence over the uploaded file.
# Make sure that the file you upload is in PDB format and contains coordinates, unit cell and space group information.
# IPAC outputs the chains of the assembly, so chain ID information is required. Make sure that every subunit has a unique chain ID; otherwise the job may be terminated.
# If you wish to retain hetero atoms (including water molecules) in the quaternary structure prediction, all hetero atoms must be labelled with a proper chain ID. All unlabelled hetero atoms (including water) will be removed from the analysis and hence from the predicted structure. A space (' ') is NOT considered a valid chain ID.
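The file requirements listed above can be checked locally before uploading. The sketch below is a minimal pre-upload check, not the server's own validation. It relies only on the fixed-column PDB format (record name in columns 1-6, chain ID in column 22, space group in columns 56-66 of the CRYST1 record); the function name `check_pdb_text` and the keys of the returned dictionary are hypothetical.

```python
def check_pdb_text(text: str) -> dict:
    """Summarise the fields of a PDB file that the upload notes mention."""
    has_cryst1 = False
    space_group = ""
    atom_chains = set()
    atoms_without_chain = 0
    hetatms_without_chain = 0

    for line in text.splitlines():
        record = line[:6].strip()
        if record == "CRYST1":
            # CRYST1 carries the unit cell; columns 56-66 hold the space group.
            has_cryst1 = True
            space_group = line[55:66].strip()
        elif record in ("ATOM", "HETATM"):
            chain = line[21:22]  # column 22; empty string on a short line
            if chain.strip() == "":
                # A blank chain ID is not a valid label.
                if record == "ATOM":
                    atoms_without_chain += 1
                else:
                    hetatms_without_chain += 1
            elif record == "ATOM":
                atom_chains.add(chain)

    return {
        "has_cryst1": has_cryst1,
        "space_group": space_group,
        "atom_chains": sorted(atom_chains),
        "atoms_without_chain": atoms_without_chain,
        "hetatms_without_chain": hetatms_without_chain,
    }
```

A non-zero `hetatms_without_chain` count indicates hetero atoms that would be dropped from the analysis. The sketch does not detect two different subunits sharing one chain ID, which still has to be checked by inspection.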

Personal Information
The job will take from a few minutes to a few hours, depending on the protein you have uploaded and the load on the server. However, you do not need to keep the status page open or bookmark it, since notification of job completion will be sent by email. All jobs on this server are confidential.


Flow Chart

A broad overview of the flow of the IPAC server is given below (Figure 1).

Figure 1

Features [1]
Interface Area (IA): Interface area is defined as the amount of accessible surface area (ASA) buried upon complex formation. An atom is defined as an interface atom if it loses more than 0.1 Å² of its ASA due to complex formation. The sum of the ASA lost by all such interface atoms is called the buried surface area. The buried surface area divided by the number of subunits contributing to it is the measure of interface area.
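The definition above can be written out as a short calculation. The sketch below assumes that per-atom ASA values have already been computed by an external program for each subunit, both in isolation and within the complex; the function name `interface_area` and the nested-dictionary input layout are hypothetical choices made for illustration.

```python
def interface_area(asa_free: dict, asa_complex: dict,
                   threshold: float = 0.1) -> float:
    """Buried surface area divided by the number of contributing subunits.

    asa_free / asa_complex: {subunit_id: {atom_id: ASA in square angstroms}}
    for the isolated subunit and for the same subunit inside the complex.
    """
    buried = 0.0        # total ASA lost by interface atoms
    contributing = 0    # subunits with at least one interface atom

    for subunit, free_atoms in asa_free.items():
        lost = 0.0
        for atom, free_value in free_atoms.items():
            loss = free_value - asa_complex[subunit][atom]
            if loss > threshold:  # interface atom: loses more than 0.1 A^2
                lost += loss
        if lost > 0.0:
            contributing += 1
            buried += lost

    if contributing == 0:
        return 0.0  # no interface atoms, so no interface
    return buried / contributing
```

For example, if one subunit buries 6.0 Å² and its partner buries 4.0 Å², the buried surface area is 10.0 Å² and the interface area is 5.0 Å².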

Normalized Interface Packing (NIP): Interface packing (IP) is a volume-based measure for estimating compactness of the protein interface. An envelope covering a 4 Å slice across the interface is first calculated covering all the atoms and interatomic voids. The ratio between the sum of the van der Waals volumes of the atoms enclosed in the envelope and the total volume enclosed in the envelope considering it a sphere gives IP. A value of 0 means no packing at the interface, while a value of 1 indicates full packing at the interface. When IP is divided by the interface area, it gives normalized interface packing.

Normalized Surface Complementarity (NSC): Surface complementarity (SC) is an area-based measure for estimating the compactness of the protein interface. First, a suitable origin transformation is applied to the pair of subunits whose SC is to be computed. A two-dimensional Delaunay tessellation is then applied to the protein subunit surface to describe it in terms of triangular tiles. The distances and angles between the tiles across the interface of the two subunits are evaluated (with some corrections for the interface rim regions) to determine which tiles are properly packed. SC is expressed as the ratio of the smaller of the two packed tile areas available from the two subunits to the total tile area of the interface. A value of 0 means no complementarity, while a value of 1 indicates perfect complementarity. When SC is divided by the interface area, it gives normalized surface complementarity.[2]

Normalized Surface Complementarity and Interface Packing Paired Metric (NSP): This is the deviation of an (NIP, NSC) pair from the linear regression line relating NIP and NSC (NSC = 1.24423 × NIP + 0.0279). NIP and NSC are highly correlated (correlation coefficient +0.96).
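As a rough illustration, the deviation from the regression line can be computed as below. The text does not say whether the deviation is measured vertically or perpendicularly to the line; this sketch assumes the vertical (NSC-direction) residual, and the function name `nsp_residual` is hypothetical.

```python
# Regression line quoted in the text: NSC = 1.24423 * NIP + 0.0279
SLOPE = 1.24423
INTERCEPT = 0.0279


def nsp_residual(nip: float, nsc: float) -> float:
    """Signed vertical distance of (nip, nsc) from the regression line.

    Assumption: the deviation is taken along the NSC axis. A perpendicular
    distance would differ by a constant factor of 1 / sqrt(1 + SLOPE**2).
    """
    return nsc - (SLOPE * nip + INTERCEPT)
```

A point lying exactly on the line gives 0; positive values mean the observed NSC is higher than the value the regression predicts from NIP.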

Variation of Accessible Surface Area (asaV): Accessible surface area (ASA) is computed using the Lee-Richards algorithm [3] as implemented in the NACCESS program. The surface area accessible to a probe molecule varies inversely with the radius of the probe. We define interface area (IA) as the accessible surface area buried upon complex formation for an individual subunit. As the radius of the probe decreases, it penetrates deeper into concave surfaces, resulting in a larger accessible area. We have observed that the rim areas of monomeric proteins involved in nonbiological contacts in the crystal lattice differ significantly in compactness from the rim areas of dimeric proteins. This difference, denoted asaV, is quantified as the difference between IA2.0 and IA1.8, normalized by IA1.4, where IA2.0, IA1.8 and IA1.4 indicate the interface area of the protein complex computed with probe radii of 2.0 Å, 1.8 Å and 1.4 Å, respectively.
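Written out, the definition is asaV = (IA2.0 − IA1.8) / IA1.4. A minimal sketch of that arithmetic follows; the function name `asa_variation` is hypothetical, and the three interface areas are assumed to have been computed beforehand with the respective probe radii.

```python
def asa_variation(ia_20: float, ia_18: float, ia_14: float) -> float:
    """asaV = (IA at 2.0 A probe - IA at 1.8 A probe) / IA at 1.4 A probe."""
    if ia_14 <= 0.0:
        # Without an interface at the 1.4 A probe the ratio is undefined.
        raise ValueError("IA at probe radius 1.4 A must be positive")
    return (ia_20 - ia_18) / ia_14
```

The input values are the interface areas themselves, so any ASA program that accepts a probe-radius setting could supply them.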

Interface Packing Gradient (IPg): The compactness of the interface may vary from the core to the rim. Normalized interface packing (NIP), being a global measure of interface packing, may therefore not capture the local packing of the interface. For this reason we have introduced another feature, the interface packing gradient, which is the ratio of the packing (compactness) of the core interface residues to that of the rim interface residues. Residues whose interface atoms are fully buried are defined as core residues, and residues with interface atoms that are partially exposed to solvent are defined as rim residues.[4]

Patch Ratio (Pr): Although normalized interface packing and the interface packing gradient provide adequate information about a protein interface, we also compute the patch ratio to detect the presence of interface voids. A set of interface atoms forms a patch if they lie within a sphere of radius 5.0 Å. The sum of the interface area contributed by those patch atoms, normalized by the interface area, gives the patch ratio.

Normalized Solvation Energy (NSE): Solvation energy is an entropic contribution to the binding free energy of the protein complex. It arises from the burial of protein surface area upon complex formation. It is calculated using the method of Eisenberg and McLachlan (1986).[5]

Hydrophobicity at the Interface and Surface (HPOi and HPOs): Hydrophobicity is computed using the Fauchere and Pliska [6] hydrophobicity scale. The ASA of each atom is normalized by the total ASA of its residue in an extended conformation of the G-X-G tripeptide model [7]. The contribution of an atom to the hydrophobicity is the product of its normalized ASA and the hydrophobicity value of its residue type on the Fauchere and Pliska scale. The sum of the contributions from all surface atoms is the surface hydrophobicity (hpos), and the sum of the contributions from all interface atoms is the interface hydrophobicity (hpoi). Because the chemical nature of the surface varies widely among proteins, we further normalize the surface and interface hydrophobicity by the total hydrophobicity. Therefore, the normalized hydrophobicity of the interface is:
HPOi = hpoi ÷ (hpos + hpoi),
and the normalized hydrophobicity of the surface is:
HPOs = hpos ÷ (hpos + hpoi).
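The two normalizations can be illustrated with a short sketch. The hydrophobicity scale is passed in as a dictionary rather than hard-coded, so no Fauchere and Pliska values are reproduced here; the function names and the `(residue_type, normalized_asa)` input layout are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def hydrophobicity_sum(atoms: Iterable[Tuple[str, float]],
                       scale: Dict[str, float]) -> float:
    """Sum of (normalized ASA x scale value of the residue type) over atoms.

    atoms: (residue_type, normalized_asa) pairs for a set of atoms,
           e.g. all interface atoms (hpoi) or all surface atoms (hpos).
    scale: hydrophobicity value per residue type, supplied by the caller.
    """
    return sum(scale[residue] * nasa for residue, nasa in atoms)


def normalized_hydrophobicity(hpoi: float, hpos: float) -> Tuple[float, float]:
    """Return (HPOi, HPOs), each divided by the total hpos + hpoi."""
    total = hpos + hpoi
    if total == 0.0:
        raise ValueError("total hydrophobicity is zero; ratios are undefined")
    return hpoi / total, hpos / total
```

By construction HPOi + HPOs = 1, so the pair describes how the total hydrophobicity is split between interface and surface.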


The gallery shows pictures (Figures 6-9) of some of the protein assemblies predicted by the IPAC server. The pictures were generated with PyMOL, using light and shadow effects. PGS indicates point group symmetry.

Figure 6
Monomer (C1 PGS)
Figure 7
Heterodimer (C1,C1 PGS)

Figure 8
Homodimer (C2 PGS)
Figure 9
Tetramer (C4 PGS)

Existing databases

PQS, PISA, and PiQSi are the three most widely used protein quaternary structure databases. While PiQSi (Levy, 2007) is a manually curated database, PQS (Henrick et al., 1998) and PISA (Krissinel et al., 2007) are based on computational methods. Of the three, PQS has not updated its database since August 2009 and will discontinue its service.

PiQSi provides a full list of annotations as a text file. Our benchmarking is based on the annotations made up to June 21, 2009. The corresponding annotation file can be downloaded from here. The latest annotation file can be downloaded from the PiQSi download page.

PISA provides two options: Structure Analysis and Database searches. Structure Analysis shows a number of possible assemblies with a number of parameters, without a definite conclusion about a particular quaternary structure, whereas Database searches gives definite information about the quaternary structure.

IPAC, being a prediction server, provides a definite conclusion about the quaternary structure from the crystal lattice. We have therefore compared our results with the PISA Database searches option. The PISA database search results used for the comparison can be found in Table 1 (each quaternary structure links to the list of proteins with that quaternary structure).

Table 1
Monomer Dimer Trimer Tetramer Pentamer Hexamer
Heptamer Octamer Nonamer Decamer Dodecamer

The data were downloaded from the PISA server on May 31, 2010. All search options were left at their defaults, and the only filtering criterion was protein-protein interaction composition. Different quaternary structures were selected by choosing different multimeric states.

[1] Mitra, P. & Pal, D. (2011). Combining Bayes classification and point group symmetry under Boolean framework for enhanced protein quaternary structure inference. Structure 19:304-312.

[2] Mitra, P. & Pal, D. (2010). New measures for estimating surface complementarity and packing at protein-protein interfaces. FEBS Letters 584(6):1163-1168.

[3] Lee, B. & Richards, F.M. (1971). The interpretation of protein structures: estimation of static accessibility. J. Mol. Biol. 55:379–400.

[4] Chakrabarti, P. & Janin, J. (2002). Dissecting protein-protein recognition sites. Proteins 47:334–343.

[5] Eisenberg, D. & McLachlan, A.D. (1986). Solvation energy in protein folding and binding. Nature 319:199–203.

[6] Fauchere, J. & Pliska, V. (1983). Hydrophobic parameters π of amino acid side chains from partitioning of N-acetyl-amino-acid amides. Eur. J. Med. Chem. 18:369–375.

[7] Miller, S., Janin, J., Lesk, A.M. & Chothia, C. (1987). Interior and surface of monomeric proteins. J. Mol. Biol. 196:641–656.

Pralay Mitra, Ph. D.
c/o Dr. Debnath Pal
Bioinformatics Center and Supercomputer Education Research Center,
Indian Institute of Science, Bangalore, Karnataka - 560012, India
Email: dpal@iisc.ac.in