This page describes how to use programs of the DEJAVU package together with O to find and inspect proteins that have a fold (or a domain with a fold) similar to that of your own protein. The programs used in this walk-through are GETSSE, DEJAVU, LSQMAN, DEJANA and O.
The first thing you need to do is to extract the SSEs (secondary structure elements) from your structure. You can define the SSEs manually, but you can also run GETSSE, which uses O's YASSPA algorithm to identify helices and strands. In this example, we will use the structure of 1CEL, so you can re-work it in the comfort of your own home. Get the PDB file and run the GETSSE program (the calculations take only a fraction of a second):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 1071 gerard sarek 23:42:02 gerard/dennis > run getsse
[...]
4-Character molecule name or ID ? (USER)
Description of molecule ? (...) cbh1
Full pathname of input PDB file ? (user.pdb) 1cel.pdb
Name of output SSE file ? (user.sse)
Reading PDB file ...
Nr of residues read : ( 434)
Doing YASSPA ...
Nr of ALPHA residues : ( 57)
Nr of BETA residues : ( 161)
Writing SSE file ...
Nr of ALPHA helices : ( 11)
Nr of BETA strands : ( 25)
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The SSE file looks as follows (you may edit it if you like):
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
!
! === USER
!
MOL USER
NOTE cbh1
PDB 1cel.pdb
!
BETA 'B1' 'A2' 'A5' 4 39.57 63.20 37.84 38.40 73.33 36.26
BETA 'B2' 'A7' 'A20' 14 34.72 76.49 38.16 52.03 72.33 74.04
BETA 'B3' 'A24' 'A34' 11 51.91 77.89 75.06 33.69 70.36 50.73
ALPHA 'A1' 'A36' 'A38' 3 32.90 65.32 46.63 28.21 67.96 46.92
BETA 'B4' 'A40' 'A42' 3 29.20 69.11 40.10 30.36 66.10 34.54
ALPHA 'A2' 'A58' 'A60' 3 23.52 55.66 31.53 28.61 58.17 30.37
ALPHA 'A3' 'A64' 'A70' 7 31.80 52.01 37.86 34.56 59.67 31.72
BETA 'B5' 'A71' 'A77' 7 33.75 61.85 34.71 31.51 78.29 45.06
BETA 'B6' 'A84' 'A87' 4 29.94 78.04 53.86 36.31 84.40 59.25
BETA 'B7' 'A90' 'A98' 9 35.34 80.86 62.79 19.30 78.89 47.22
BETA 'B8' 'A102' 'A110' 9 18.74 74.39 46.31 40.19 70.55 57.61
BETA 'B9' 'A118' 'A122' 5 45.22 67.58 58.33 43.94 61.99 69.51
BETA 'B10' 'A125' 'A133' 9 45.52 64.75 73.68 23.93 75.91 68.33
BETA 'B11' 'A140' 'A143' 4 19.03 69.27 62.68 28.36 67.76 63.81
BETA 'B12' 'A145' 'A147' 3 33.82 64.54 64.45 37.19 59.71 60.85
ALPHA 'A4' 'A155' 'A157' 3 45.03 58.87 45.54 45.84 53.57 46.74
ALPHA 'A5' 'A165' 'A167' 3 41.75 65.69 48.96 38.21 67.95 45.80
BETA 'B13' 'A170' 'A173' 4 37.99 59.55 50.29 32.67 56.49 58.49
BETA 'B14' 'A191' 'A194' 4 23.47 48.83 42.13 17.47 56.47 43.89
BETA 'B15' 'A206' 'A214' 9 29.27 49.74 47.61 31.78 61.70 67.14
BETA 'B16' 'A222' 'A230' 9 20.30 67.63 72.72 35.47 49.01 63.35
BETA 'B17' 'A236' 'A239' 4 37.78 50.06 53.44 29.61 45.49 49.36
BETA 'B18' 'A261' 'A264' 4 28.85 53.81 72.99 23.80 62.02 75.83
ALPHA 'A6' 'A265' 'A268' 4 23.19 63.78 79.22 21.07 59.40 80.86
BETA 'B19' 'A282' 'A284' 3 24.30 75.18 79.79 22.13 76.76 74.19
BETA 'B20' 'A287' 'A295' 9 27.98 80.18 72.83 43.42 59.51 78.02
BETA 'B21' 'A299' 'A306' 8 38.65 56.40 78.32 30.15 78.86 79.48
BETA 'B22' 'A309' 'A312' 4 32.16 78.75 84.49 32.38 68.62 84.12
BETA 'B23' 'A315' 'A318' 4 27.20 62.31 86.68 23.36 53.38 84.89
ALPHA 'A7' 'A328' 'A337' 10 34.69 50.45 79.61 20.57 48.57 80.57
ALPHA 'A8' 'A342' 'A346' 5 27.40 46.12 68.62 33.43 45.46 67.44
ALPHA 'A9' 'A349' 'A357' 9 37.27 51.02 73.61 46.48 56.15 68.26
BETA 'B24' 'A359' 'A365' 7 45.75 59.80 62.37 28.90 69.18 58.97
ALPHA 'A10' 'A375' 'A377' 3 14.69 61.36 66.52 19.24 62.44 68.93
ALPHA 'A11' 'A404' 'A410' 7 13.34 67.09 56.09 10.40 76.29 57.91
BETA 'B25' 'A414' 'A423' 10 19.33 77.71 60.37 44.36 70.97 70.10
ENDMOL
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
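If you want to inspect or edit the SSE file with a script rather than by hand, the records can be read with a few lines of Python. The sketch below is not part of the DEJAVU package; it assumes that each ALPHA/BETA record holds the element type, its name, the first and last residue, the number of residues, and then six coordinates, which are taken here (as an assumption) to be the start and end points of the element. The class and function names are made up for this illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SSE:
    kind: str                          # "ALPHA" or "BETA"
    name: str                          # e.g. "B1"
    first: str                         # first residue, e.g. "A2"
    last: str                          # last residue, e.g. "A5"
    nres: int                          # number of residues
    start: Tuple[float, float, float]  # assumed: start point of the element
    end: Tuple[float, float, float]    # assumed: end point of the element


def parse_sse(text: str) -> List[SSE]:
    """Collect the ALPHA/BETA records; all other lines are skipped."""
    elements = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0] not in ("ALPHA", "BETA"):
            continue
        name, first, last = (f.strip("'") for f in fields[1:4])
        xyz = [float(v) for v in fields[5:11]]
        elements.append(SSE(fields[0], name, first, last, int(fields[4]),
                            tuple(xyz[:3]), tuple(xyz[3:])))
    return elements


# A few records copied from the SSE file shown above.
SAMPLE = """\
!
MOL USER
NOTE cbh1
PDB 1cel.pdb
!
BETA 'B1' 'A2' 'A5' 4 39.57 63.20 37.84 38.40 73.33 36.26
BETA 'B2' 'A7' 'A20' 14 34.72 76.49 38.16 52.03 72.33 74.04
ALPHA 'A1' 'A36' 'A38' 3 32.90 65.32 46.63 28.21 67.96 46.92
ENDMOL
"""

sses = parse_sse(SAMPLE)
for s in sses:
    print(s.kind, s.name, s.first, s.last, s.nres)
```

To process a real file, pass the result of `open("user.sse").read()` to `parse_sse` instead of `SAMPLE`.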
Make sure that DEJAVU is installed, and that the names of the PDB files in the DEJAVU database have been changed so that they point to your local copy of the PDB. Then start the program:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 1074 gerard sarek 23:42:02 gerard/dennis > run dejavu
[...]
Max nr of database entries : ( 20000)
Max nr of sec-struc elements in total : ( 500000)
Max nr of sec-struc elements per entry : ( 250)
Max nr of sec-struc types : ( 2)
Max nr of hits : ( 1000)
DEFINE > ALPHA alpha helix
DEFINE > BETA beta strand
DEJAVU SSE library file ? (/home/gerard/lib/dejavu.lib) /home/gerard/lib/dejavu_100.lib
DEJAVU SSE library file : (/home/gerard/lib/dejavu_100.lib)
List contents of SSE library (Y/N) ? (N)
List contents of SSE library (Y/N) : (N)
Skip non-existent PDB files (Y/N) ? (N)
Skip non-existent PDB files (Y/N) : (N)
1 CPU total/user/sys : 0.0 0.0 0.0
Nr of lines read : ( 216295)
Nr of entries : ( 7348)
Nr of SSEs read : ( 157435)
+----------------------------------------------------------+
| OPTIONS:                                                 |
|                                                          |
| REad user DEJAVU file       QUit from DEJAVU             |
|                                                          |
| INcremental search = method of choice for models/bones ! |
|                                                          |
| FInd specific motif in database   (rarely used)          |
| PArameters for IN and FI commands (rarely used)          |
|                                                          |
| LIst a database entry       EXtract a database entry     |
| CHeck database integrity    STatistics                   |
| SElect certain entries      TOpological analysis         |
| ! (comment; no action)      ? (list options)             |
+----------------------------------------------------------+
2 CPU total/user/sys : 13.3 12.9 0.4
===> Option ? (READ)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
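If the PDB file names stored in the library do not yet point to your local copy of the PDB, a small script can rewrite them. The following is only a sketch under explicit assumptions: it supposes that the library stores each entry's file name in a `PDB` record like the one in the user SSE file, and it uses a hypothetical local directory `/data/pdb/`. The sample `MOL` record and the function name are likewise made up for illustration. Check your own library file before relying on it, and keep a backup.

```python
def repoint_pdb_paths(lines, old_prefix, new_prefix):
    """Replace old_prefix by new_prefix in every 'PDB <path>' record.

    Assumption: the library uses the same PDB record as the user SSE file.
    """
    out = []
    for line in lines:
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] == "PDB":
            path = fields[1].strip()
            if path.startswith(old_prefix):
                line = "PDB " + new_prefix + path[len(old_prefix):]
        out.append(line)
    return out


# Illustrative entry; the path is the one reported by DEJAVU for 2ayh.
library = [
    "MOL 2ayh",
    "PDB /portray/pub/databases/pdb/all_entries/uncompressed_files/pdb2ayh.ent",
    "ENDMOL",
]

fixed = repoint_pdb_paths(
    library,
    "/portray/pub/databases/pdb/all_entries/uncompressed_files/",
    "/data/pdb/",  # hypothetical location of the local PDB copy
)
print("\n".join(fixed))
```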
The first thing to do is to read your SSE file:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ)
===> Option : (READ)
User DEJAVU file ? (user.sse)
User DEJAVU file : (user.sse)
MOL > user
NOTE > cbh1
PDB > 1cel.pdb
ENDMOL > user
Nr of elements : ( 36)
====== > 1 BETA B1 A2 A5 4
====== > 2 BETA B2 A7 A20 14
====== > 3 BETA B3 A24 A34 11
====== > 4 ALPHA A1 A36 A38 3
====== > 5 BETA B4 A40 A42 3
[...]
====== > 35 ALPHA A11 A404 A410 7
====== > 36 BETA B25 A414 A423 10
Nr of lines read : ( 44)
Nr of elements : ( 36)
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Next, choose the INcremental search option and answer the questions. It's usually a good idea to start with rather ambitious (i.e., strict) search criteria so you don't find 50% of the database proteins as "hits", and to relax them step by step if too few hits turn up. When in doubt, however, you are better off finding too many hits than too few. This is because we can usually eliminate most of the false positives in the post-processing with LSQMAN and DEJANA, whereas we can never get any false negatives back !
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
===> Option ? (READ) inc
===> Option : (inc)
********** NEW QUERY **********
Elements : ( B1 B2 B3 A1 B4 A2 A3 B5 B6 B7 B8 B9 B10 B11 B12 A4 A5 B13 B14 B15 B16 B17 B18 A6 B19 B20 B21 B22 B23 A7 A8 A9 B24 A10 A11 B25)
Nr of SSEs : ( 36)
Min nr of residues for SSEs ? ( 4)
Min nr of residues for SSEs : ( 4)
Nr of SSEs : ( 28)
Remaining SSEs : ( B1 B2 B3 A3 B5 B6 B7 B8 B9 B10 B11 B13 B14 B15 B16 B17 B18 A6 B20 B21 B22 B23 A7 A8 A9 B24 A11 B25)
Min nr of elements to match (0 = abort) ? ( 4) 8
Min nr of elements to match (0 = abort) : ( 8)
Is this a BONES search ? (N)
Is this a BONES search : (N)
Is this a SYMBOLIC search ? (N)
Is this a SYMBOLIC search : (N)
Do lsq_explicit inside O ? (N)
Do lsq_explicit inside O : (N)
Define how much the nr of residues in SSEs may differ by
defining how many residues shorter or longer SSEs in the
database may be compared to those in your protein.
Max nr of residues "too short" ? ( 2)
Max nr of residues "too short" : ( 2)
Max nr of residues "too long" ? ( 4)
Max nr of residues "too long" : ( 4)
Mismatch element length ? ( 10.000)
Mismatch element length : ( 10.000)
Mismatch distances ? ( 8.000)
Mismatch distances : ( 8.000)
Mismatch cosines ? ( 0.400)
Mismatch cosines : ( 0.400)
Weights for nr res, length, dist, cos, rmsd
Weights for scoring ? ( 0.001 0.001 0.100 0.100 0.500)
Weights for scoring : ( 0.001 0.001 0.100 0.100 0.500)
Normalised weights : ( 0.014 0.014 0.139 0.139 0.694)
Conserve directionality ? (Y)
Conserve directionality : (Y)
Conserve absolute motif ? (Y)
Conserve absolute motif : (Y)
Conserve neighbours ? (N)
Conserve neighbours : (N)
Create O macro file ? (Y) n
Create O macro file : (n)
Create LSQMAN input file ? (Y)
Create LSQMAN input file : (Y)
LSQMAN input file ? (lsqman.inp)
LSQMAN input file : (lsqman.inp)
Nr of elements recognised in query : ( 28)
Nr of elements of each type : ( 6 22)
********** 2ayh ********** 530 **********
[13-14-beta-d-glucan 4 glucanohydrolase (e.c.3.2.1.73) - 1,3-1,4-beta- ]
[/portray/pub/databases/pdb/all_entries/uncompressed_files/pdb2ayh.ent ]
Elements : B1 B2 B3 A3 B5 B6 B7 B8 B9 B10 B11 B13 B14 B15 B16 B17 B18 A6 B20 B21 B22 B23 A7 A8 A9 B24 A11 B25
Nr of common SSEs : ( 8)
Elements : -X- -X- -X- -X- -X- -X- B6 B7 B8 -X- -X- -X- -X- -X- B13 -X- -X- -X- B15 B16 -X- -X- -X- -X- -X- B18 -X- B21
Total mismatched residues : ( 11)
Total gaps mismatch : ( 6)
Length ... rmsd = 4.715 ... corr = 0.815
Residues ... rmsd = 1.620 ... corr = 0.871
Distance ... rmsd = 2.709 ... corr = 0.906
Cosines ... rmsd = 0.101 ... corr = 0.989
The 8 centroids have an RMS distance of 3.213 A
SCORE : ( 2.779)
Nr of hits : ( 1)
Nr of common SSEs : ( 8)
Nr of best match : ( 1)
Best score : ( 2.779)
Best RMSD : ( 3.213)
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
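Two of the questions above are easy to reproduce outside the program, which can help when deciding on values. The sketch below is an illustration, not DEJAVU's actual code. It applies the "Min nr of residues for SSEs" cut-off to the 36 elements of the 1CEL SSE file, and it expresses the "too short" / "too long" tolerances as a simple length test. The inclusive comparison and the function names are assumptions.

```python
# (name, number of residues) for the 36 SSEs in the 1CEL SSE file.
SSES = [
    ("B1", 4), ("B2", 14), ("B3", 11), ("A1", 3), ("B4", 3), ("A2", 3),
    ("A3", 7), ("B5", 7), ("B6", 4), ("B7", 9), ("B8", 9), ("B9", 5),
    ("B10", 9), ("B11", 4), ("B12", 3), ("A4", 3), ("A5", 3), ("B13", 4),
    ("B14", 4), ("B15", 9), ("B16", 9), ("B17", 4), ("B18", 4), ("A6", 4),
    ("B19", 3), ("B20", 9), ("B21", 8), ("B22", 4), ("B23", 4), ("A7", 10),
    ("A8", 5), ("A9", 9), ("B24", 7), ("A10", 3), ("A11", 7), ("B25", 10),
]


def drop_short(sses, min_res=4):
    """Keep only SSEs with at least min_res residues."""
    return [(name, nres) for name, nres in sses if nres >= min_res]


def length_compatible(query_len, db_len, max_short=2, max_long=4):
    """Assumed reading of the tolerances: a database SSE may be at most
    max_short residues shorter or max_long residues longer than the query."""
    return query_len - max_short <= db_len <= query_len + max_long


kept = drop_short(SSES)
n_alpha = sum(1 for name, _ in kept if name.startswith("A"))
n_beta = len(kept) - n_alpha
print(len(kept), n_alpha, n_beta)  # 28 6 22, the counts DEJAVU reports above

# With the defaults, a 9-residue query strand accepts partners of 7 to 13 residues.
print([n for n in range(5, 16) if length_compatible(9, n)])
```

Raising the minimum number of residues shrinks the query, so it also limits how large "Min nr of elements to match" can sensibly be.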
After some fiddling with the parameters (in particular, the "Min nr of elements to match") we get 16 hits:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
[...]
Nr of database entries : ( 7348)
Nr of selected entries : ( 7348)
Nr of matching entries : ( 16)
Nr of hits (total) : ( 36)
Sorting hits ...
Nr Entry PDB SSE RMSD SCORE Compound
==== ===== ==== ==== ===== ===== ========
1 1715 1cel 28 0.00 0.00 14-beta-d-glucan cellobiohydrolase i (cellulase) - fungus (trichoderm
2 4007 4cel 26 0.08 0.07 active-site mutant d214n determined at ph 6.0 with - trichoderma rees
3 2372 7cel 26 0.40 0.33 cbh1 (e217q) in complex with cellohexaose and cellobiose - trichoderm
4 963 6cel 24 0.16 0.13 cbh1 (e212q) cellopentaose complex - trichoderma reesei; organism_com
5 4414 2ovw 19 2.10 1.76 endoglucanase i complexed with cellobiose - fusarium oxysporum
6 5573 1ovw 19 2.12 1.77 endoglucanase i complexed with non-hydrolysable substrate - fusarium
7 3967 2a39 17 1.54 1.32 humicola insolens endocellulase egi native structure - humicola insol
8 3648 1a39 16 1.60 1.36 humicola insolens endocellulase egi s37w p39w - humicola insolens; ex
9 6360 1eg1 16 2.68 2.23 endoglucanase i from trichoderma reesei - trichoderma reesei; strain:
10 530 2ayh 8 3.21 2.78 13-14-beta-d-glucan 4 glucanohydrolase (e.c.3.2.1.73) - 1,3-1,4-beta-
11 1366 1gbg 8 3.29 2.85 bacillus licheniformis beta-glucanase - bacillus licheniformis; expre
12 1257 1cpn 8 3.35 2.97 circularly permuted (1-31-4)-beta-d-glucan - (bacillus macerans) cpma
13 4320 1mac 8 3.52 3.03 13-14-beta-d-glucan 4-glucanohydrolase (e.c.3.2.1.73) - (bacillus mac
14 3284 1axk 8 3.64 3.23 engineered bacillus bifunctional enzyme gluxyn-1 - fragment: 1,3-1,4-
15 3011 1sac 8 4.54 3.85 serum amyloid p component (sap) - human (homo sapiens) serum
16 6138 1qtj 8 6.43 5.23 limulus polyphemus sap - limulus polyphemus; organism_common:
2 CPU total/user/sys : 201.5 201.4 0.1
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The top 4 hits are all obviously correct, and numbers 5 to 9 also look confidence-inspiring. But what about the rest ?
Run LSQMAN with the input file created by DEJAVU. The result will be a new O macro file.
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 1077 gerard sarek 23:42:02 gerard/dennis > run lsqman < lsqman.inp > lsqman.out
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Next we can use DEJANA to select only the best hits for display in O:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 1078 gerard sarek 23:42:02 gerard/dennis > run dejana
[...]
Maximum number of hits : ( 2500)
O macro (DEJAVU/LSQMAN/SPASM/RIGOR/SAVANT/SAVANA) ? (lsqman.omac) lsq_user.omac
Reading hits ...
# 1 ID 2AYH Nmatch 124 RMSD 1.58 A
[...]
Nr of hits (> 0 atoms/residues/SSEs) : ( 16)
------------------------------------------
Min nr of matched atoms/residues/SSEs ? ( 1)
Max RMSD of matched atoms/residues/SSEs ? ( 999.990)
Sorting hits ...
Nr of hits left : ( 16)
# 1 ID 1CEL Nmatch 434 RMSD 0.00 A
# 2 ID 4CEL Nmatch 434 RMSD 0.19 A
[...]
# 16 ID 1QTJ Nmatch 7 RMSD 0.68 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
3 = toggle sort mode (nr matches <-> RMSD)
Option (0, 1, 2) ? ( 0)
------------------------------------------
Min nr of matched atoms/residues/SSEs ? ( 1) 100
Max RMSD of matched atoms/residues/SSEs ? ( 999.990)
Sorting hits ...
Nr of hits left : ( 12)
# 1 ID 1CEL Nmatch 434 RMSD 0.00 A
# 2 ID 4CEL Nmatch 434 RMSD 0.19 A
# 3 ID 7CEL Nmatch 434 RMSD 0.31 A
# 4 ID 6CEL Nmatch 434 RMSD 0.34 A
# 5 ID 2OVW Nmatch 345 RMSD 1.29 A
# 6 ID 1A39 Nmatch 335 RMSD 1.34 A
# 7 ID 1OVW Nmatch 333 RMSD 1.17 A
# 8 ID 2A39 Nmatch 329 RMSD 1.18 A
# 9 ID 1EG1 Nmatch 309 RMSD 1.27 A
# 10 ID 1MAC Nmatch 132 RMSD 2.11 A
# 11 ID 2AYH Nmatch 124 RMSD 1.58 A
# 12 ID 1GBG Nmatch 123 RMSD 1.64 A
Select one of the following options:
0 = re-enter criteria and re-sort
1 = write new O macro with current hits
2 = quit program without writing new O macro
3 = toggle sort mode (nr matches <-> RMSD)
Option (0, 1, 2) ? ( 0) 1
New O macro file ? (dejana.omac)
Writing hits ...
Processing PDB code : (1CEL)
[...]
Processing PDB code : (1GBG)
New O macro written ...
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
The top 12 hits look like they are really good (more than 100 structurally aligned residues with reasonable RMSD values), so we select only these for display in O.
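The selection that DEJANA performs here amounts to a threshold filter followed by a sort. The sketch below mimics it on the hits that are listed in full in the DEJANA output above (three of the sixteen are elided there and therefore left out). It is an illustration only: the inclusive thresholds and the tie-break on RMSD are assumptions, and `select_hits` is a name made up for this example.

```python
# (PDB code, matched residues, RMSD in A) as printed by DEJANA above.
HITS = [
    ("1CEL", 434, 0.00), ("4CEL", 434, 0.19), ("7CEL", 434, 0.31),
    ("6CEL", 434, 0.34), ("2OVW", 345, 1.29), ("1A39", 335, 1.34),
    ("1OVW", 333, 1.17), ("2A39", 329, 1.18), ("1EG1", 309, 1.27),
    ("1MAC", 132, 2.11), ("2AYH", 124, 1.58), ("1GBG", 123, 1.64),
    ("1QTJ", 7, 0.68),
]


def select_hits(hits, min_match=1, max_rmsd=999.99):
    """Keep hits that satisfy both criteria, sorted by number of matches
    (largest first); ties are broken by RMSD (smallest first)."""
    kept = [h for h in hits if h[1] >= min_match and h[2] <= max_rmsd]
    return sorted(kept, key=lambda h: (-h[1], h[2]))


top = select_hits(HITS, min_match=100)
for rank, (code, nmatch, rmsd) in enumerate(top, start=1):
    print(f"# {rank:2d} ID {code} Nmatch {nmatch:3d} RMSD {rmsd:.2f} A")
```

Note that 1QTJ has a low RMSD (0.68 A) but only 7 matched residues, which is why the number of matched residues, not the RMSD alone, is the more useful first criterion.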
The result of running DEJANA is an O macro that will only display the top 12 hits. Simply start up O, and execute the macro:
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
% 1081 gerard sarek 23:42:02 gerard/dennis > ono
[...]
As4> File not found in path: on_startup
As4> Indirect file does not exist.
As3> File not found in path: on_startup
As3> Indirect file does not exist.
@dejana.omac
As3> Macro in computer file-system.
As3> Current molecule has not been loaded.
Mol> Maximum inter-residue link distance = 2.00
Mol> There were 23 residues.
Mol> 175 atoms.
As4> ... Analysing USER
As4> ... From file 1cel.pdb
Sam> File type is PDB
Sam> Database compressed.
Sam> Space for 714061 atoms
Sam> Space for 10000 residues
Sam> Molecule USER contained 434 residues and 3220 atoms
[...]
As4> ==========================================
As4> ... Comparing 1GBG
As4> ... From file /portray/pub/databases/pdb/all_entries/uncompressed_file
As4> ... Nr of matched residues 123
As4> ... RMS distance of these 1.64281
As4> ... RMS delta B 7.11511
As4> ... Similarity index 2.85823
As4> ... Match index 0.31665
As4> ... Crippen RHO 0.11723
Sam> What coordinate file type? [PDB]:
Sam> File type is PDB
Sam> Database compressed.
Sam> Space for 588801 atoms
Sam> Space for 10000 residues
Sam> EXPDTA X-RAY DIFFRACTION
Sam> Molecule 1GBG contained 373 residues and 1883 atoms
[...]
----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE ----- EXAMPLE -----
Does your result look something like this ?
That's all, folks !
Latest update at 16 August, 2001.