
Bound and Unbound complexes
Searching Parameters
          PDB id
          UniRef Information
          PDB title
          Resolution and R-value
          Space group
          Complex type
          Chain length
          SCOP class
          Interface Area
          Number of interface residues
          Range of RMSD
Linking with other databases
Generation of docking decoys
Generation of docking decoys using UniRef information
Known issues

dockYard is a repository of protein-protein docking decoys. It helps researchers develop scoring functions for protein-protein docking without having to implement or install docking search algorithms. dockYard offers both download and search options.

Bound and Unbound complexes
The repository contains docking decoys and protein information for bound (co-crystallized) and unbound (separately crystallized) complexes. For bound complexes, the search parameters are PDB id, UniRef Information, PDB title, Resolution and R-value, Space group, AEROSPACI, Complex type, Chain length, SCOP class, Interface Area, Number of interface residues and Range of RMSD. These parameters are described in the Searching Parameters section.

For unbound complexes, you can search by PDB id, UniRef Information, Resolution and R-value, Space group, AEROSPACI, Complex type, and SCOP class. For the unbound docking partners, searching is available by PDB id, Resolution and R-value, AEROSPACI, and SCOP class. The unbound complexes are taken from Benchmark 3.0 of Hwang et al. (Proteins. 2008. 73(3):705-9). Because the bound structures are available for download from the Benchmark 3.0 website, we did not include them in our repository. At present, the repository contains only rigid-body cases that are reported as dimers by both PQS and PISA.

Searching Parameters
You can search the repository using information specific to the protein, the protein complex, and the docking decoys. If a search produces any hits, the results are displayed; otherwise the message "No PDB has been selected with your parameters!!" is shown. If you get this message, please check the parameters you entered. The parameters are described in detail below.

PDB id: Please enter a 4-character PDB id for the search. The repository is searched for your PDB id, and if that protein is present in our repository it is output. Otherwise, select a UniRef database to get a representative from the repository.

UniRef Information: This field is active only when you have entered a PDB id that is not present in our repository. In that case, the search engine searches the UniRef database selected from the dropdown menu on the search page. UniRef databases are clustered sets of sequences from the UniProt Knowledgebase. If a cluster in the selected UniRef database contains both the searched PDB id and one or more PDB ids from our repository, the PDB ids that are in our repository are output in place of the searched PDB id. Other PDB ids that match on the UniRef criteria are also output.
The purpose of including UniRef information is to avoid restricting users to the proteins available in our dataset.

PDB title: The title of each PDB file is stored in our repository. To search by keyword, enter the keyword in this field. The repository is searched for titles that contain your keyword as a substring.

Resolution and R-value: Currently, all structures in our repository are derived from X-ray crystallography, so the resolution and R-value of each protein are used as a quality check. By default, the search limits are set to the maximum resolution and R-value found in the repository.

Space group: Space group information is included in the repository. Users can restrict the results to proteins with a particular space group.

AEROSPACI: The concept of SPACI was introduced to assess the reliability of crystallographically determined structures in the Protein Data Bank. The Summary PDB ASTRAL Check Index (SPACI) provides a numeric score based on resolution, R-factor, and the theoretical quality of the model. Aberrant Entry Re-Ordered (AERO) SPACI scores are derived from SPACI scores by applying penalties for aberrant structures. We extracted the data from the http://astral.berkeley.edu/spaci.html website.

Complex type: The complex type is either homomer or heteromer. For bound cases, the type was checked against PQS and manually curated wherever needed.

Chain length: Each subunit of every protein in the repository has at least 25 residues. Users can, however, restrict the search with a different chain length value. Only protein complexes in which all chains are longer than the specified value are output.

SCOP class: The SCOP class is also available as a parameter on our search page. The SCOP class of each subunit is shown on the query page. Users can filter their dataset by a particular SCOP class. A complex is output if at least one of its subunits belongs to that SCOP class.
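The Chain length and SCOP class filters combine per-subunit information differently: the first requires every chain to pass, the second only one subunit. The following is a minimal Python sketch of that difference, not the code used by dockYard. The function names and the representation of a complex as plain lists are assumptions made for illustration.

```python
def passes_chain_length(chain_lengths, min_length):
    """True when every chain is longer than min_length residues."""
    # An empty list is rejected so that a complex with no chains never passes.
    return bool(chain_lengths) and all(n > min_length for n in chain_lengths)


def passes_scop_class(subunit_classes, wanted_class):
    """True when at least one subunit belongs to wanted_class."""
    return any(c == wanted_class for c in subunit_classes)
```

For example, a two-chain complex with lengths 120 and 20 fails a chain length value of 25, while a complex whose subunits have classes "a" and "b" passes a search for class "b".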

Interface Area: An atom is called an interface atom if it loses more than 0.1 square angstrom of its solvent-accessible surface area upon complex formation. We used the NACCESS program to compute accessible surface area (ASA) with the default probe radius of 1.4 angstrom. The interface area is the sum of the ASA of the interface atoms divided by 2. The division by 2 is applied because the ASA contributions come from the residues of both subunits.
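The definition above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used to build the repository: the per-atom ASA values are assumed to have been computed already (for example with NACCESS) for the isolated subunits and for the complex, the dictionary keys (chain, residue number, atom name) are hypothetical, and "ASA of the interface atoms" is interpreted here as the ASA each interface atom loses on complex formation.

```python
def interface_atoms(asa_free, asa_complex, threshold=0.1):
    """Atoms that lose more than `threshold` square angstrom of ASA in the complex."""
    return [atom for atom, free in asa_free.items()
            if free - asa_complex[atom] > threshold]


def interface_area(asa_free, asa_complex, threshold=0.1):
    """Sum of ASA lost by interface atoms (assumed interpretation), divided by 2."""
    atoms = interface_atoms(asa_free, asa_complex, threshold)
    lost = sum(asa_free[a] - asa_complex[a] for a in atoms)
    # Both subunits contribute to the sum, hence the division by 2.
    return lost / 2.0
```

Both dictionaries must contain the same atoms. An atom whose ASA changes by 0.1 square angstrom or less is ignored entirely.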

Number of interface residues: A residue is called an interface residue if at least one of its atoms is an interface atom (see Interface Area for the definition of an interface atom). The minimum number of interface residues in our repository is 10, which is set as the default. For a given number, the search keeps only those proteins in which every subunit has more interface residues than the specified number. Please note that this field reports the number (count) of interface residues, not their positions in the polypeptide. For example, an entry of 52, 50 in the Number of interface residues column means that 52 residues of one subunit interact at the interface with 50 residues of the other subunit.
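As an illustration, the per-subunit count and the filter can be sketched in Python as follows. The input is assumed to be a list of interface atoms identified by hypothetical (chain, residue number, atom name) tuples; the function names are not part of dockYard.

```python
from collections import defaultdict


def interface_residue_counts(interface_atoms):
    """Count distinct interface residues per chain."""
    residues = defaultdict(set)
    for chain, resnum, _atom_name in interface_atoms:
        # A residue counts once, however many of its atoms are at the interface.
        residues[chain].add(resnum)
    return {chain: len(nums) for chain, nums in residues.items()}


def passes_residue_filter(counts, minimum):
    """True when every subunit has more than `minimum` interface residues."""
    return bool(counts) and all(n > minimum for n in counts.values())
```

The comparison is strict, following the wording "more than the specified number" above.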

Range of RMSD: For each docking decoy, we have computed the backbone C-alpha root-mean-square deviation (RMSD) of the decoy from the native structure (denoted L-RMSD in the publication). This information is included in the downloadable files along with the coordinates and transformation matrices. Instead of generating all the docking decoys, users can focus on an L-RMSD range of interest. To do so, download the list of files (available from the query result page) and place it in the directory that contains the whole-dataset zip file and the unzipped dockYard files, then follow the instructions in the README file. The Perl script will generate only those docking decoys that fall within the specified RMSD range. Because the native structures can be downloaded from the RCSB website, we did not include them in our downloads. Hence, there is no docking decoy with an RMSD of 0.0 angstrom.
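For readers who want to check RMSD values themselves, here is a minimal Python sketch of a C-alpha RMSD calculation and of a range selection. It assumes the two coordinate lists are already paired atom by atom and expressed in the same frame, so no superposition is performed, and it treats the range bounds as inclusive. Both points are assumptions, as are the function names; the Perl script distributed with dockYard may behave differently.

```python
import math


def ca_rmsd(coords_a, coords_b):
    """RMSD between two equally long lists of (x, y, z) C-alpha coordinates."""
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = sum((xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
                for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


def select_by_rmsd(decoy_rmsds, low, high):
    """Names of decoys whose RMSD lies in [low, high] (inclusive, by assumption)."""
    return sorted(name for name, r in decoy_rmsds.items() if low <= r <= high)
```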

To give an idea of the distribution of decoy L-RMSD values for each PDB entry, we provide a histogram of the L-RMSD of its decoys. The histogram file (in PDF format) is linked from the "Number of docking decoys" column (the last column of the query result table).

Linking with other databases
The repository provides information on pseudo-native docking decoys by linking with the UniRef database. For users interested in docking modeled structures, the UniRef information may also be used to obtain template information. The subunit fold class information is obtained from the SCOP database, and the protein interaction information from the DIP database.

UniRef: We have used the concept of interologs, exploiting the available information on clustered sets of sequences in the UniRef database, to offer pseudo-native docking decoys to users. The information from UniRef is useful for two categories of predictive docking: (i) pseudo-native, and (ii) modeled. While information on native structures is directly available from our repository (or by using UniRef100), UniRef90 and UniRef50 can assist in finding pseudo-native interacting partners based on homology. UniRef90 and UniRef50 may also be used to find templates for modeling subunits that can then be docked. We have been able to include 7003 interologs based on the 902 bound complexes, and 1830 interologs based on the 40 target complexes in the unbound category. On the search page, the user can enter a PDB identifier of his or her choice and get representative docking decoys based on the sequence identity measure derived from the UniRef100, UniRef90 and UniRef50 databases.

The sequences of the individual subunits of a hetero-complex belong to separate clusters in the UniRef database. Therefore, for a hetero-complex query, we output the complexes in all those clusters where the sequence of at least one subunit shows a match.
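The matching rule for hetero-complexes can be pictured with the following Python sketch. It is a simplified illustration, not the search engine's actual implementation: the cluster identifiers, the PDB ids in the usage note, and the data structures (a mapping from cluster id to the set of PDB ids it contains, and a set of PDB ids held in the repository) are all hypothetical.

```python
def representatives(query_clusters, cluster_members, repository):
    """Repository complexes that share a cluster with at least one query subunit.

    query_clusters  : cluster ids, one per subunit of the query complex
    cluster_members : dict mapping cluster id -> set of PDB ids in that cluster
    repository      : set of PDB ids available in the repository
    """
    hits = set()
    for cluster in query_clusters:
        # A match on any single subunit is enough to report the complex.
        hits |= cluster_members.get(cluster, set()) & repository
    return sorted(hits)
```

With made-up data, a query whose two subunits fall in clusters "c1" and "c2" would return every repository entry found in either cluster, even if that entry matches only one of the two subunits.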

SCOP: The SCOP class is an important piece of information about the fold of a protein subunit. Our repository includes the fold class information for each subunit, which helps the user identify the fold classes present in a subunit and in the protein as a whole.

DIP: Information about the interacting partners of a protein is provided in the DIP column of the query result page. The interaction information was extracted from DIP (version: 2009/01/26). All interacting partners of a PDB entry are identified by the UniProtKB identifier given on the DIP webpage. If a PDB structure exists for a UniProtKB identifier, that PDB identifier is listed together with the DIP identifier in a text file that opens when you click the EXIST link. If no PDB identifier corresponds to the UniProtKB identifier, this is denoted by "-".

The downloading page has the option to download the initial coordinates of protein structures in PDB format and a list of transformation matrices. The utility programs along with a README file can also be downloaded from the page.

Please first download the README and utilities from the download page and unzip them. Then download the .tar.gz files and place them in the unzipped directory, dockYard. Go to the dockYard directory and run the shell script compile.sh. It compiles two C programs and generates the binary files. Running this code requires a Linux/Unix system with a C compiler and Perl. The scripts and programs generate the docking complexes locally, using the .gz files as input.

If you have any problem compiling the programs or generating the complexes, feel free to contact us with your system specifications.

Generation of docking decoys
The docking decoys are generated locally on your system by a Perl script and a C program. If you have downloaded individual files (named with the PDB identifier followed by .tar.gz), create a list file, GZlist.txt, containing the .gz filenames of all the proteins for which you want to generate complexes, and then run the command perl generateSingleFile.pl. The script generateSingleFile.pl generates all the docking decoys of the listed proteins. Please make sure that your drive has sufficient storage space.
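If many files have been downloaded, the list can be produced automatically. The short Python helper below is only a convenience sketch: it assumes that GZlist.txt holds one filename per line and that the downloaded archives end in .tar.gz, and the function name is hypothetical. Please check the README for the exact format the Perl script expects.

```python
from pathlib import Path


def write_gz_list(directory, list_name="GZlist.txt"):
    """Write the names of all .tar.gz files in `directory` to `list_name`."""
    folder = Path(directory)
    names = sorted(p.name for p in folder.glob("*.tar.gz"))
    # One filename per line (assumed format).
    (folder / list_name).write_text("".join(name + "\n" for name in names))
    return names
```

Calling write_gz_list(".") from inside the dockYard directory writes the list there, after which perl generateSingleFile.pl can be run as described above.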

If you want to generate docking complexes based on your search criteria, download all complexes as a single zipped file (from the download page), then save the list file to the dockYard directory by clicking on "List of filtered PDB" on the query result page. Rename the list file you have just downloaded to filter.lst and run perl generateWithFilterB.pl for bound complexes or perl generateWithFilterU.pl for unbound complexes.

In all cases, the generated docking complexes are kept in a directory named after the PDB identifier. For a detailed description, please visit our Tutorial section.

Taken together, the docking decoys amount to gigabytes of data. We therefore supply the minimum information required to generate the docking decoys locally, rather than using your bandwidth and time to download the decoy structures directly from the dockYard website.

Generation of docking decoys using UniRef information
Please note that when you use UniRef information for a search and your query PDB identifier is not in the results, we provide the name(s) of the matching protein complex(es). The transformation matrix of such a protein complex can be used for your protein after the necessary preprocessing. First, identify a one-to-one correspondence between your protein subunits and the downloaded protein subunits. Then superimpose the corresponding subunits by applying a transformation to YOUR PROTEIN SUBUNITS. Finally, save the coordinates of your transformed protein subunits and use them as the initial coordinates, in place of the downloaded subunit coordinates, to generate docking decoys with the downloaded transformation matrix. The superimposition can be done with any of several available programs, including ProFit and POLYPOSE (CCP4: Supported Program).
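Once a superimposing transformation has been obtained from one of these programs, applying it to your coordinates is a simple rigid-body operation. The Python sketch below assumes that the transformation is given as a 3x3 rotation matrix R and a translation vector t, applied as x' = R x + t. This convention and the function name are assumptions made for illustration; the layout of the transformation matrices in the dockYard files is described in the README and may differ.

```python
def apply_transform(coords, rotation, translation):
    """Apply x' = R x + t to a list of (x, y, z) coordinates.

    rotation    : 3x3 matrix given as three rows of three numbers
    translation : sequence of three numbers
    """
    transformed = []
    for x, y, z in coords:
        transformed.append(tuple(
            rotation[i][0] * x + rotation[i][1] * y + rotation[i][2] * z
            + translation[i]
            for i in range(3)
        ))
    return transformed
```

For instance, a 90 degree rotation about the z axis followed by a shift of (1, 2, 3) moves the point (1, 0, 0) to (1, 3, 3).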

Known Issues
NONE reported so far.

Last modified: Dec, 2010.