Protein Modeling with Modeller: A Comprehensive Guide

March 15, 2024 By admin

This outline provides a structured approach to learning protein modeling with Modeller, covering basic to advanced topics and including practical examples and case studies to enhance understanding.


Introduction to Protein Modeling

Protein structure prediction is of paramount importance in various fields, including bioinformatics, molecular biology, and drug discovery. Here are some key reasons why protein structure prediction is crucial:

Understanding Protein Function: Protein structure provides insights into its function. Predicting protein structures helps in understanding how proteins work, interact with other molecules, and carry out their biological roles.
Drug Discovery and Design: Knowledge of protein structures is essential for drug discovery and design. Predicting the structure of a target protein can help identify potential drug binding sites, design specific inhibitors, and optimize drug candidates.
Biomedical Research: Protein structure prediction is crucial in biomedical research for studying diseases, understanding the molecular basis of genetic disorders, and developing new therapies.
Enzyme Engineering: Predicting protein structures can aid in enzyme engineering, where enzymes are modified to improve their catalytic activity or stability for industrial or therapeutic purposes.
Agricultural Biotechnology: Protein structure prediction is used in agricultural biotechnology for improving crop yield, disease resistance, and nutritional content through genetic engineering.
Biodefense and Biosecurity: Understanding the structures of proteins involved in pathogenicity can aid in developing countermeasures against biological threats.
Personalized Medicine: Protein structure prediction can contribute to personalized medicine by enabling the design of treatments tailored to an individual’s genetic makeup.
Protein Engineering: Predicting protein structures can facilitate protein engineering, where proteins are modified to enhance their properties for various applications, such as in biocatalysis or bioremediation.

Overall, protein structure prediction plays a crucial role in advancing our understanding of biology, developing new therapies, and addressing various societal challenges.

Overview of Modeller software

Modeller is a software package used for protein structure prediction and modeling. It is widely used in bioinformatics and computational biology for homology modeling, which is the process of predicting the three-dimensional structure of a protein based on its similarity to known protein structures. Here’s an overview of Modeller and its key features:

Homology Modeling: Modeller’s primary function is to predict protein structures based on homology to experimentally determined structures (templates). It uses a comparative modeling approach to generate models that are consistent with the known structures of related proteins.
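Energy Functions: Modeller builds models by satisfaction of spatial restraints. Distance and dihedral-angle restraints derived from the target-template alignment are combined with stereochemical terms from the CHARMM force field (bond lengths, angles, dihedral angles, non-bonded interactions) into an objective function that is optimized; finished models can then be scored with statistical potentials such as DOPE.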
Alignment Generation: Before modeling, Modeller requires a sequence alignment between the target protein and its template(s). It can generate these alignments automatically based on sequence similarity and other criteria.
Modeling Options: Modeller offers several modeling options, including loop modeling (for modeling regions with missing coordinates), comparative modeling (for predicting full-length protein structures), and modeling of protein-protein complexes from multi-chain templates.
Integration with Other Tools: Modeller can be integrated with other software packages and databases, such as BLAST for sequence searching, and the Protein Data Bank (PDB) for accessing experimentally determined structures.
Output and Visualization: Modeller generates output files containing the predicted protein structures in various formats, such as PDB files. These structures can be visualized and analyzed using molecular visualization software like PyMOL or VMD.
Community Support: Modeller is supported by an active user community and is regularly updated with new features and improvements.

Overall, Modeller is a powerful tool for protein structure prediction and modeling, especially for researchers interested in understanding protein structure-function relationships and designing novel proteins with specific properties.

Understanding Protein Structures

Primary, secondary, tertiary, and quaternary structures

Proteins are complex molecules made up of amino acids. The structure of a protein is typically described at four levels: primary, secondary, tertiary, and quaternary.

Primary Structure: The primary structure of a protein is the linear sequence of amino acids in its polypeptide chain. This sequence is determined by the gene encoding the protein. The primary structure is critical because it dictates the folding and ultimately the function of the protein.
Secondary Structure: The secondary structure refers to the local folding patterns of the polypeptide chain. The two most common types of secondary structure are alpha helices and beta sheets. These structures are stabilized by hydrogen bonds between backbone N-H and C=O groups of the chain.
Tertiary Structure: The tertiary structure is the overall three-dimensional shape of a single protein molecule. It is determined by the interactions between amino acid side chains (R-groups), such as hydrogen bonds, disulfide bonds, hydrophobic interactions, and van der Waals forces. The tertiary structure is crucial for the protein’s function and stability.
Quaternary Structure: Some proteins consist of multiple polypeptide chains, known as subunits, that come together to form a functional protein complex. The quaternary structure describes the arrangement of these subunits and the interactions between them. Protein complexes can have various quaternary structures, such as dimers, trimers, and larger oligomers.

Understanding the structure of a protein at these different levels is essential for understanding its function and behavior in biological systems. Techniques such as X-ray crystallography, nuclear magnetic resonance (NMR) spectroscopy, and computational modeling are used to study protein structures at these different levels.

Protein structure databases (PDB, SCOP, CATH)

Protein structure databases play a crucial role in bioinformatics and structural biology by providing access to experimentally determined protein structures and related information. Here are three major protein structure databases:

Protein Data Bank (PDB): The PDB is the most comprehensive and widely used repository of experimentally determined protein structures. It contains atomic coordinates and related information for proteins, nucleic acids, and complex assemblies. The PDB is essential for understanding protein structure-function relationships, drug discovery, and molecular biology research.
Structural Classification of Proteins (SCOP): SCOP is a database that classifies protein structures into a hierarchy of structural domains based on their structural and evolutionary relationships. It provides a framework for understanding the evolution and diversity of protein structures and is useful for studying protein function and evolution.
Class, Architecture, Topology, Homologous superfamily (CATH): CATH is another database that classifies protein structures into hierarchical categories based on their structural and evolutionary relationships. It helps in analyzing protein structure-function relationships and provides insights into the evolution of protein folds and functions.

These databases are valuable resources for researchers studying protein structure and function, as they provide access to a wealth of structural information that can be used to advance our understanding of biology and develop new therapeutic strategies.

Introduction to Modeller

Modeller is a software package used for protein structure prediction and modeling. It is widely used in bioinformatics, structural biology, and drug design. Modeller’s primary function is to predict the three-dimensional structure of a protein based on its amino acid sequence and known structures of related proteins (templates). Here is an introduction to Modeller and its key features:

1. Homology Modeling: Modeller uses a comparative modeling approach to predict protein structures. It aligns the target protein sequence with one or more template structures and uses this alignment to generate a model of the target protein.

2. Energy Functions: Modeller evaluates models with an objective function that combines spatial restraints derived from the templates with CHARMM stereochemical terms (bond lengths, angles, dihedral angles, non-bonded interactions), and it can score finished models with statistical potentials such as DOPE.

3. Automated Modeling: Modeller provides tools for automating the modeling process, including sequence alignment generation, model building, and structure refinement. This makes it easier for researchers to predict protein structures quickly and accurately.

4. Loop Modeling: Modeller includes tools for modeling regions of the protein with missing coordinates (loops). This is useful for predicting the structure of proteins with flexible or disordered regions.

5. Comparative Modeling: Modeller can be used for full-length comparative modeling, where the entire protein structure is predicted based on one or more template structures. It can also be used for comparative modeling of protein-protein complexes.

6. Integration with Other Tools: Modeller can be integrated with other bioinformatics tools and databases, such as BLAST for sequence searching and the Protein Data Bank (PDB) for accessing experimentally determined structures.

7. Visualization and Analysis: Modeller generates output files containing the predicted protein structures in various formats, such as PDB files. These structures can be visualized and analyzed using molecular visualization software.

8. Community Support: Modeller is supported by an active user community and is regularly updated with new features and improvements based on user feedback.

Overall, Modeller is a powerful tool for predicting protein structures based on homology modeling. It is widely used in bioinformatics, structural biology, and drug design for understanding protein structure-function relationships and designing novel proteins with specific properties.

Installation and setup

Installing and setting up Modeller can vary depending on your operating system. Here, I’ll provide a general overview of the installation process:

Download Modeller: Visit the Modeller website (https://salilab.org/modeller/download_installation.html) and download the appropriate version of Modeller for your operating system (Windows, macOS, or Linux).
Install Modeller:
- Windows: Run the installer and follow the on-screen instructions. The installer will guide you through the installation process.
- macOS: Open the downloaded DMG file and drag the Modeller application to your Applications folder.
- Linux: Extract the downloaded tarball to a directory of your choice. You may need to set the executable permissions for the Modeller binaries.
Set up Modeller:
- After installing Modeller, you may need to set up environment variables or configure paths depending on your operating system. Instructions for this can usually be found in the Modeller documentation or installation guide.
License: Modeller requires a license key to run. Academic users can obtain one free of charge by registering on the Modeller website; the key is then supplied during installation (or in Modeller's configuration file), as described in the Modeller documentation.
Testing the Installation: To test if Modeller is installed correctly, you can run a simple example provided in the Modeller documentation. This will ensure that Modeller is set up and running properly on your system.
Optional: You may also need to install additional software or libraries that Modeller relies on, such as Python or certain Python packages. Check the Modeller documentation for specific requirements.

Remember to always refer to the official Modeller documentation and installation guide for detailed and up-to-date instructions specific to your operating system.
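If you want to confirm from Python that MODELLER is reachable, a quick check like the one below can help. It is a minimal sketch using only the Python standard library: it looks for an importable modeller module and for a mod9.18 launcher on the PATH. The function names are our own, and the launcher name assumes version 9.18 as used later in this tutorial; change the version string to match your installation.

```python
import importlib.util
import shutil
from typing import Optional


def modeller_module_available() -> bool:
    """True if 'import modeller' would find a module on this Python's search path."""
    return importlib.util.find_spec("modeller") is not None


def modeller_launcher(version: str = "9.18") -> Optional[str]:
    """Full path of the 'mod<version>' command if it is on the PATH, else None."""
    return shutil.which(f"mod{version}")


if __name__ == "__main__":
    print("modeller module importable:", modeller_module_available())
    print("mod9.18 launcher:", modeller_launcher() or "not found on PATH")
```

A negative result does not necessarily mean MODELLER is missing: the launcher and your own Python interpreter may be set up separately, so follow the official installation guide if either check fails.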

Preparing Input Files

Preparing input files for protein structure prediction using Modeller involves several steps, including obtaining the target sequence, selecting appropriate templates from protein structure databases, and generating a sequence alignment between the target sequence and the selected templates. Here’s a general overview of these steps:

Obtaining Target Sequence:
- Obtain the amino acid sequence of the target protein. This sequence can be obtained from databases like UniProt or extracted from experimental data.
Template Selection:
- Search for structurally similar proteins (templates) in protein structure databases like the Protein Data Bank (PDB).
- Select templates based on criteria such as sequence similarity, structure quality, and relevance to the target protein’s function.
Sequence Alignment:
- Use tools like BLAST or HHpred to perform a sequence alignment between the target sequence and the selected templates.
- Generate a multiple sequence alignment (MSA) if multiple templates are selected.
Input File Preparation:
- Prepare a Modeller input file (usually a Python script) that specifies the target sequence, template structures, and alignment information.
- Include any additional parameters or options required for the modeling process.
Running Modeller:
- Execute the Modeller script to generate the protein structure model.
- Modeller will use the sequence alignment and template structures to build a model of the target protein’s structure.
Model Evaluation:
- Evaluate the quality of the generated model using tools like PROCHECK or MolProbity.
- Refine the model if necessary based on the evaluation results.
Visualization and Analysis:
- Visualize the predicted protein structure using molecular visualization software like PyMOL or VMD.

Model Building with Modeller

Model building with Modeller involves using the software to generate a three-dimensional model of a protein based on the input files prepared in the previous steps. Here’s an overview of the basic modeling protocol, advanced modeling options, and model evaluation and refinement:

Basic Modeling Protocol:

Initialization: Load the Modeller software and initialize the modeling environment.
Input Files: Provide the Modeller script with the input files containing the target sequence, template structures, and alignment information.
Model Building: Use the automodel class in the Modeller script to build the model. This class automatically generates a model based on the input files and specified parameters.
Model Output: After running the modeling script, Modeller will generate output files containing the predicted protein structure in PDB format. These files can be visualized and analyzed using molecular visualization software.

Advanced Modeling Options:

Loop Modeling: Modeller provides specific tools for modeling loops (regions with missing coordinates) in protein structures. The loopmodel class refines loop regions de novo, optimizing their conformation with a dedicated loop-modeling energy function rather than copying it from the templates.
Symmetry Considerations: For proteins with symmetry (e.g., oligomeric proteins), Modeller offers options to model the symmetry of the protein complex. The symmetry class can be used to specify symmetry constraints during modeling.
Refinement: Modeller provides options for refining the generated model to improve its quality. This can include energy minimization, side-chain optimization, and other refinement techniques.
Customization: Modeller allows for extensive customization of the modeling process through the use of scripting. Advanced users can modify scripts to incorporate specific modeling strategies and parameters.

Model Evaluation and Refinement:

Evaluation: After generating the model, it is essential to evaluate its quality. Tools like PROCHECK and MolProbity assess stereochemical quality, while Verify3D assesses how compatible the 3D model is with its own amino acid sequence.
Refinement: Based on the evaluation results, the model may need to be refined further. This can involve iterative cycles of modeling, evaluation, and refinement until a satisfactory model is obtained.
Validation: Validate the final model using tools like Ramachandran plots, clash analysis, and overall structural quality assessment to ensure it is suitable for further analysis or applications.

Overall, Modeller provides a range of tools and options for building protein models, from basic modeling protocols to advanced modeling strategies and refinement techniques, making it a versatile tool for protein structure prediction and modeling.

Case Studies and Examples

From the MODELLER web site

MODELLER is used for homology or comparative modeling of protein three-dimensional structures (Webb and Sali 2016; Marti-Renom et al. 2000).

The user provides an alignment of a sequence to be modeled with known related structures and MODELLER automatically calculates a model containing all non-hydrogen atoms.

MODELLER implements comparative protein structure modeling by satisfaction of spatial restraints (Sali and Blundell 1993; Fiser, Do, and Sali 2000), and can perform many additional tasks, including de novo modeling of loops in protein structures, optimization of various models of protein structure [...]

Figure 1: MODELLER process flow

MODELLER 9.18 is installed on all the iMacs. However, each user should register on the website to obtain the license key (install keyword) at https://salilab.org/modeller/registration.html

Acknowledgments

Part of this tutorial is from “Comparative Protein Structure Prediction MODELLER tutorial” by Marc A. Marti-Renom (PDF: http://sgt.cnag.cat/www/presentations/files/slides/20081104_MODELLER_Tutorial.pdf).

Set-up

We will use MODELLER on a Macintosh, but it works the same way on other platforms.

MODELLER is driven by Python scripts: the user adapts example scripts so that they name the target sequence(s) and the template structure(s).

It is always good practice to create a directory for each project. Let’s create a directory on the desktop called MOD1 where we will save the necessary files.

TASK

Create a folder/directory on your desktop called MOD1 or any name you wish.

Terminal

MODELLER is invoked on the command line with a command name that includes the version number. The release used in this tutorial is 9.18, so it is invoked as mod9.18 followed by the name of the script to run.

TASK

Open a text Terminal.

It is necessary to open a text Terminal to run MODELLER. On a Mac, Terminal is found at /Applications/Utilities/Terminal, but it can easily be launched by typing Terminal in Spotlight Search (the magnifying glass icon at the top right of the screen).

(On a Windows computer you would instead open a command prompt by searching for the cmd program with Cortana or the Start button.)

Next, change the directory the Terminal is “looking” at with the cd (“change directory”) command:

cd Desktop
cd MOD1

You can check which directory Terminal is looking into with the command:

pwd

In the next section we will add files and scripts to this folder.

Text editing

Script and/or plain text files can be edited on a Macintosh with the built-in text editor TextEdit. TextEdit often opens documents in Rich Text format by default; if so, switch to plain text with the menu Format > Make Plain Text. It is also safest to turn off smart quotes (Edit > Substitutions > Smart Quotes), because curly quotes break Python scripts.

Within Terminal, the full-screen text editor nano can also be used (it is also available on Linux systems).

Windows users can use Notepad or Wordpad to easily create plain text files.

To create the necessary text files simply Copy/Paste the information from this page into a text document on your computer using one of the text editors mentioned above.

Using MODELLER

To run MODELLER we need input data, namely sequence(s) and 3D template(s) in the proper format, as well as Python scripts. The latter are available on the MODELLER website as example files to be modified.

The output will consist of one or more (if requested in the script) 3D models in PDB format, an alignment of the sequence(s), a log file and other ancillary output.

INPUT:

sequence(s) target(s): FASTA/PIR format
structure(s) template(s): PDB format
Python command file(s): plain text format

OUTPUT:

Target-Template Alignment
Model(s) in PDB format
Other data

Simple example

This simple example assumes that some prior work has been done on the sequence to be modeled to find a suitable 3D template (e.g. with BLAST).

The purpose of the exercise is to create a 3D model of the mouse “brain lipid-binding protein” (BLBP) from its sequence, based on one existing 3D structure of a protein with a different sequence that has been solved and published in the Protein Data Bank (PDB) (Berman et al. 2000).

The sequence in FASTA format looks like this, and has accession code NP_067247.1.

>NP_067247.1 fatty acid-binding protein, brain [Mus musculus]
MVDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGEE
FEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA

Prior analysis (e.g. BLAST) reveals that the sequence of the “brain lipid-binding protein” is closely related to that of the “human muscle fatty acid binding protein”, whose structure was solved by X-ray crystallography (PDB ID 1HMS, file 1hms.pdb; Young et al. 1994).

The sequence of that protein in FASTA format looks like this:

>1HMS:A|PDBID|CHAIN|SEQUENCE
VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTADDRKV
KSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKEA

A simple two-sequence BLAST alignment reveals that the protein sequences are 62% identical and 78% similar with no sequence gaps (see below.)

These two proteins are therefore well suited to homology modeling.

Score          Expect  Method                        Identities    Positives      Gaps
177 bits(450)  8e-64   Compositional matrix adjust.  81/130(62%)   102/130(78%)   0/130(0%)

Query  1    VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNT  60
            VDAF  TWKL DS+NFD+YMK+LGVGFATRQV ++TKPT II + G  + ++T  TFKNT
Sbjct  1    VDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNT  60

Query  61   EISFKLGVEFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHG  120
            EI+F+LG EF+ET+ DDR  KS+V LDG KL+H+QKWDG+ET   RE+ DGK+++TLT G
Sbjct  61   EINFQLGEEFEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFG  120

Query  121  TAVCTRTYEK  130
              V  R YEK
Sbjct  121  DIVAVRCYEK  130

INPUT: Target sequence

TASK

Create a text file called blbp.seq containing the target sequence in the MOD1 directory.

You can copy/paste the sequence below. The entry starts with >P1;, an annotation format inherited from the early PIR protein database.

The : colon separators are part of the MODELLER format and will make more sense later, when you see the PDB sequence converted to this format automatically. For now, simply copy/paste the following sequence into a plain text file.

Example Target: Brain lipid-binding protein (BLBP). BLBP sequence in PIR (MODELLER) format:

>P1;blbp
sequence:blbp::::::::
VDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGEEFEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA*
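The PIR entry above can also be generated from the FASTA record. The sketch below is a hypothetical helper (fasta_to_pir is our own name, not part of MODELLER) that joins the wrapped sequence lines, writes the >P1; line and a colon-separated header with only the entry type and code filled in, and terminates the sequence with *. Note that the tutorial's PIR entry omits the initial methionine of the FASTA record (the 1HMS template sequence also starts at V), so the helper offers an option to drop it.

```python
def fasta_to_pir(fasta_text: str, code: str, drop_initial_met: bool = False) -> str:
    """Convert a single-record FASTA string into a MODELLER-style PIR 'sequence' entry."""
    lines = [ln.strip() for ln in fasta_text.strip().splitlines()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    # Join the sequence lines, removing any whitespace left over from line wrapping.
    seq = "".join("".join(ln.split()) for ln in lines[1:]).upper()
    if drop_initial_met and seq.startswith("M"):
        seq = seq[1:]
    # Header line: ten colon-separated fields; only the type and the code are filled in.
    header = f"sequence:{code}" + ":" * 8
    return f">P1;{code}\n{header}\n{seq}*\n"


if __name__ == "__main__":
    blbp_fasta = """>NP_067247.1 fatty acid-binding protein, brain [Mus musculus]
MVDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGEE
FEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA"""
    print(fasta_to_pir(blbp_fasta, "blbp", drop_initial_met=True), end="")
```

Redirecting the printed output to blbp.seq inside MOD1 gives the same file as the copy/paste route.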

INPUT: download PDB structure

The input structure has accession code 1HMS.

The downloaded file will appear in your Downloads directory as 1HMS.pdb.

TASK

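Download the file and move 1HMS.pdb to the MOD1 directory. The scripts below refer to the template as 1hms; if your file system is case-sensitive (as on most Linux systems), renaming the file to 1hms.pdb avoids any mismatch.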
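Instead of using a web browser, the template can also be fetched with a short script. This is a minimal sketch using the Python standard library; it assumes the RCSB download URL pattern https://files.rcsb.org/download/<ID>.pdb and requires network access. The helper names are hypothetical, and the file is saved in lowercase (1hms.pdb) to match the code used in the MODELLER scripts.

```python
import urllib.request
from pathlib import Path

# Assumed RCSB download URL pattern; check the RCSB documentation if it changes.
RCSB_PDB_URL = "https://files.rcsb.org/download/{code}.pdb"


def pdb_download_url(code: str) -> str:
    """Build the download URL for a 4-character PDB ID such as '1HMS'."""
    code = code.strip().upper()
    if len(code) != 4 or not code.isalnum():
        raise ValueError(f"not a 4-character PDB ID: {code!r}")
    return RCSB_PDB_URL.format(code=code)


def download_pdb(code: str, dest_dir: str = ".") -> Path:
    """Save the entry as lowercase '<code>.pdb' (e.g. 1hms.pdb) in dest_dir; needs network access."""
    target = Path(dest_dir) / f"{code.strip().lower()}.pdb"
    urllib.request.urlretrieve(pdb_download_url(code), str(target))
    return target


# Usage from inside MOD1 (network access required):
# download_pdb("1HMS")   # writes ./1hms.pdb
```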

INPUT: Align sequences

The target sequence and 3D structure sequence need to be aligned and saved in a file with the proper format.

To accomplish this we need to edit a Python script that lists the names of the files containing the sequences. Following the script instructions, MODELLER will extract the template sequence from the PDB file itself.

TASK

Create a text file called align.py with the following content and save it in folder MOD1:

# Example for: alignment.align()

# This will read two sequences, align them, and write the alignment
# to a file:

from modeller import *

log.verbose()
env = environ()

aln = alignment(env)

mdl = model(env, file='1hms')
aln.append_model(mdl, align_codes='1hms')
aln.append(file='blbp.seq', align_codes=('blbp'))

# The as1.sim.mat similarity matrix is used by default:
aln.align(gap_penalties_1d=(-600, -400))
aln.write(file='blbp-1hms.ali', alignment_format='PIR')
aln.write(file='blbp-1hms.pap', alignment_format='PAP')

Note: Since these are Python functions, they need parentheses () even if there is nothing inside them. The meaning of the commands can be found in the MODELLER online manual https://salilab.org/modeller/manual/ and is described succinctly below.

Explanations for the commands contained within this script:

log.verbose() : display all log output
env = environ() : create a new MODELLER environment object and give it the short name env
environ() : contains most information about the MODELLER environment, such as the energy function and parameter and topology libraries [...].
aln = alignment(env) : create a new alignment object; by default, it contains no sequences. aln is the short name for this object.
mdl = model(env, file='1hms') : create a new 3D model object by reading the atom coordinates from the PDB file for 1hms. mdl is the short name for this object.
aln.append_model(mdl, align_codes='1hms') : append the sequence of 1hms to the alignment. In more complex analyses, several templates could be appended.
aln.append(file='blbp.seq', align_codes=('blbp')) : append the target sequence read from blbp.seq to the alignment.
# The as1.sim.mat similarity matrix is used by default: this is a comment line, ignored by Python.
aln.align(gap_penalties_1d=(-600, -400)) : create the alignment using the indicated gap opening (-600) and gap extension (-400) penalties.
aln.write(file='blbp-1hms.ali', alignment_format='PIR') : write the alignment in PIR format.
aln.write(file='blbp-1hms.pap', alignment_format='PAP') : write the alignment in PAP format.

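It is worth noting the following points:

Python strings, such as the PDB code '1hms', must be enclosed in straight quotes (' or "). Curly quotes, which word processors such as TextEdit may insert automatically, cause a syntax error.
When several arguments are passed to a function, they are separated by commas; the space after each comma (for example before the word alignment_format= in the lines above) is conventional but not required.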

Run script to create alignment files

TASK

Run alignment script align.py within MOD1.

Verify that you are within the MOD1 directory:

pwd

The answer should be something like:

/Users/yourname/Desktop/MOD1

Now run the alignment script by typing:

mod9.18 align.py

This will create the files: blbp-1hms.ali, blbp-1hms.pap, and align.log.

To see the content of the alignment files we can use the simple cat command in the Terminal (or open them in a graphical editor such as TextEdit). Note the use of the : colon separator in the header lines.

cat blbp-1hms.ali

>P1;1hms
structureX:1hms: 1 :A:+131 :A:MOL_ID 1; MOLECULE MUSCLE FATTY ACID BINDING PROTEIN; CHAIN A; ENGINEERED
VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTA
DDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKE*

>P1;blbp
sequence:blbp: : : : :::-1.00:-1.00
VDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGEEFEETSI
DDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA*

When building the alignment, MODELLER extracted the sequence of 1HMS from the PDB file, together with information from the PDB header, which is placed in the structureX:1hms header line.

The .ali formatted alignment file is used later by MODELLER to create the 3D model(s).
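To see what each colon-separated field of a PIR header line holds, the sketch below splits it into named fields. The field order follows the PIR format description in the MODELLER manual (entry type, code, first residue and chain, last residue and chain, name, source, resolution, R-factor); the Python names are our own, and missing trailing fields are returned as empty strings.

```python
from typing import Dict

# Field order of the second line of a MODELLER PIR entry (names chosen for this sketch).
PIR_FIELDS = (
    "entry_type",     # sequence, structureX, ...
    "code",           # alignment code / PDB file name
    "start_residue",
    "start_chain",
    "end_residue",
    "end_chain",
    "name",
    "source",
    "resolution",
    "r_factor",
)


def parse_pir_header(line: str) -> Dict[str, str]:
    """Split a PIR header line into its ten named fields; absent fields become ''."""
    parts = [p.strip() for p in line.split(":", len(PIR_FIELDS) - 1)]
    parts += [""] * (len(PIR_FIELDS) - len(parts))
    return dict(zip(PIR_FIELDS, parts))


if __name__ == "__main__":
    print(parse_pir_header("sequence:blbp: : : : :::-1.00:-1.00"))
```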

The .pap formatted alignment is easier to evaluate by eye: positions with identical residues are marked with * on the _consrvd line.

cat blbp-1hms.pap

 _aln.pos         10        20        30        40        50        60
1hms      VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGV
blbp      VDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGE
 _consrvd ****  **** ** *** *** **********   **** **   *      *  ******* * **

 _aln.p   70        80        90       100       110       120       130
1hms      EFDETTADDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKE
blbp      EFEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA
 _consrvd ** **  ***  ** * *** ** * ***** **   **  ***   *** *  *  * ***
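The _consrvd line simply marks positions where both aligned residues are identical. The sketch below reproduces that marking for two equal-length aligned sequences and derives the percent identity; these are our own helpers, not MODELLER functions, and identity is computed over all aligned columns, which is only one of several conventions once gaps are present.

```python
def conservation_line(a: str, b: str) -> str:
    """'*' where two aligned sequences share a residue, ' ' elsewhere (gaps never count)."""
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return "".join("*" if x == y and x != "-" else " " for x, y in zip(a, b))


def percent_identity(a: str, b: str) -> float:
    """Identical positions as a percentage of all aligned columns."""
    marks = conservation_line(a, b)
    return 100.0 * marks.count("*") / len(marks)


# Gapless 1hms/blbp alignment as written to blbp-1hms.ali.
TEMPLATE_1HMS = (
    "VDAFLGTWKLVDSKNFDDYMKSLGVGFATRQVASMTKPTTIIEKNGDILTLKTHSTFKNTEISFKLGVEFDETTA"
    "DDRKVKSIVTLDGGKLVHLQKWDGQETTLVRELIDGKLILTLTHGTAVCTRTYEKE"
)
TARGET_BLBP = (
    "VDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKNTEINFQLGEEFEETSI"
    "DDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTFGDIVAVRCYEKA"
)

if __name__ == "__main__":
    print(conservation_line(TEMPLATE_1HMS, TARGET_BLBP))
    print(f"{percent_identity(TEMPLATE_1HMS, TARGET_BLBP):.1f}% identical")
```

For these two sequences the helper counts 81 identical positions out of 131 (about 62%), consistent with the BLAST result reported earlier.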

Model building

We now have the necessary “ingredients” to create the 3D model:

aligned sequences
3D original template

We now need to create/edit the MODELLER python script that will list these ingredients and call the MODELLER functions to build the model.

TASK

Create a text file called model.py with the following content and save it in folder MOD1. Note that the comments noted with # do not need to be re-typed if not creating the file with a copy/paste method. The blank lines are only for text clarity and can also be omitted if desired.

To create the file you can use TextEdit or nano for example.

# Homology modelling by the automodel class
from modeller import *              # load the standard MODELLER classes
from modeller.automodel import *    # load the automodel class

log.verbose()    # request verbose output
env = environ()  # create a new MODELLER environment

a = automodel(env,
              alnfile  = 'blbp-1hms.ali',   # alignment filename
              knowns   = '1hms',            # codes of the templates
              sequence = 'blbp')            # code of the target
a.starting_model = 1    # index of the first model
a.ending_model   = 1    # index of the last model
                        # (determines how many models to calculate)
a.make()                # do the actual homology modelling

Remarks: automodel is a class; the object created from it is stored in the variable a, and its settings and methods (a.starting_model, a.ending_model, a.make()) are accessed with the usual Python “dot notation”.

In this simple file we create only one model; to obtain, e.g., 5 models, the a.ending_model attribute would be set to 5.
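When ending_model is raised, MODELLER writes one coordinate file per model. Judging from the file name produced in this tutorial (blbp.B99990001.pdb), the names follow the pattern <sequence>.B9999<4-digit index>.pdb; the small helper below (our own, hypothetical) lists the names to expect under that assumption, which can be handy for later scripts that loop over the models.

```python
from typing import List


def automodel_output_names(sequence: str, starting_model: int = 1, ending_model: int = 1) -> List[str]:
    """Expected model file names, assuming the blbp.B99990001.pdb naming pattern."""
    return [f"{sequence}.B9999{i:04d}.pdb" for i in range(starting_model, ending_model + 1)]


if __name__ == "__main__":
    for name in automodel_output_names("blbp", 1, 5):
        print(name)
```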

Run model building script

TASK

Run model.py within MOD1 in the same manner as we ran the align.py script:

mod9.18 model.py

This will create the following files:

blbp.B99990001.pdb blbp.D00000001 blbp.V99990001

model.log blbp.ini blbp.rsr blbp.sch

The final 3D model is called blbp.B99990001.pdb and that is the “end product” that was desired.

In real life, multiple models would be calculated (e.g. 5) and various evaluation methods could be applied to decide which are “best.”
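One common way to choose among several models is MODELLER's DOPE score, a statistical potential for which lower values are better. In MODELLER 9.x, adding assess_methods=(assess.DOPE,) to the automodel(...) call makes the score available after a.make() in a.outputs, a list of dictionaries with keys such as 'name', 'failure' and 'DOPE score'. The helper below is our own sketch that selects the lowest-scoring successful model from such a list; it does not import MODELLER, so it can be tried on plain dictionaries.

```python
from typing import Any, Dict, List, Optional


def best_model(outputs: List[Dict[str, Any]], key: str = "DOPE score") -> Optional[Dict[str, Any]]:
    """Return the successful model with the lowest value of `key` (lower DOPE is better)."""
    ok = [m for m in outputs if m.get("failure") is None and key in m]
    return min(ok, key=lambda m: m[key]) if ok else None


# Intended use at the end of model.py (MODELLER 9.x names), after a.make():
#     top = best_model(a.outputs)
#     print("Top model:", top["name"], top["DOPE score"])
```

DOPE is only one criterion; the evaluation tools mentioned earlier in this guide can be applied to the selected model as well.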

You can explore the content of the remaining files (all plain text) with the less -S command, which displays the file content on the screen without wrapping long lines.

Compare model and template graphically

Now that we have a model, we can compare it with the original template. For this you can use Chimera, PyMOL or any other molecular graphics software that can read PDB files.

PyMOL

To open and compare files in PyMOL open the PyMOL program first.

At the PyMOL command line, type fetch 1hms to load the original template file.
Using the menu cascade File > Open…, navigate to the MOD1 directory and open the file blbp.B99990001.pdb.

Use left mouse button to rotate structure.

Note: the 2 structures will not be superimposed at first and it will be necessary to align them in 3D.

Align the structures: on the Names panel at right, click on A (action) button next to the line that reads blbp.B99990001.pdb 1 for the model. Following further down on this pull-down menu follow the menu cascade: A > align > to molecule (*/CA) > 1hms
To hide or show either structure simply click once on the name of the structure on the list at the right hand side Names panel.

Figure 2: “Align structures menu.”

Figure 3: “Open and align structures in PyMOL.”

To highlight the bound lipid, use the following menu cascade from the all line in the Names panel on the right: all > S > organic > sticks
To hide the red dot water molecules: all > H > waters

Note: only the protein is modeled; the ligands are not modeled by MODELLER. In the PDB file, ligands are typically written as HETATM records.
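The same PyMOL steps can be scripted. The sketch below builds a PyMOL command file that fetches the template, loads the model under the object name blbp_model (our choice), superposes the model on the template using C-alpha atoms with align, shows organic ligands as sticks and hides waters. The commands used (fetch, load, align, show, hide, orient) are standard PyMOL commands; the helper function itself is hypothetical, and fetch needs network access.

```python
def pymol_compare_script(template: str = "1hms",
                         model_file: str = "blbp.B99990001.pdb",
                         model_name: str = "blbp_model") -> str:
    """Return PyMOL commands that mirror the menu steps described above."""
    commands = [
        f"fetch {template}",                                         # template from the PDB
        f"load {model_file}, {model_name}",                          # MODELLER model
        f"align {model_name} and name CA, {template} and name CA",   # superpose on C-alpha atoms
        "show sticks, organic",                                      # bound lipid as sticks
        "hide everything, solvent",                                  # hide water molecules
        "orient",                                                    # reset the view
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    print(pymol_compare_script(), end="")
```

Saving the printed lines as compare.pml in MOD1 and running pymol compare.pml (or @compare.pml from the PyMOL command line) should reproduce the comparison view.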

Chimera

If you prefer using Chimera:

Open Chimera
Open template structure: File > Fetch by ID… and enter 1HMS in the Fetch Structure by ID dialog, in the text field next to the PDB button. This will open the structure in “first view” mode as a cartoon ribbon diagram.
Open the model: **File > Open… and navigate to the MOD1 directory to open file

. The default view will also be as a cartoon ribbon.

blbp.B99990001.pdb

Note: the 2 structures will not be superimposed at first and it will be necessary to align them in 3D.

Tools > Sequence Comparison > MatchMaker opens the MatchMaker window. Keep all settings at their current defaults, then click 1HMS (#0) for the “Reference structure” and blbp.B99990001.pdb (#1) for the “Structure(s) to match”.
Click Apply and the 2 structures will be aligned.
Use left mouse button to rotate structure.

Figure 4: “Open and align structures in Chimera.”
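Both PyMOL's align and Chimera's MatchMaker report a root-mean-square deviation (RMSD) after superposition. The sketch below shows what that number measures for atom pairs that are already superimposed and matched one to one (for example aligned C-alpha atoms). It does not perform the fitting itself, and both programs can exclude outlier pairs while fitting, so their reported values will generally differ from a plain RMSD over all pairs.

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD = sqrt(sum(|a_i - b_i|^2) / N) over paired (x, y, z) points.

    Assumes the structures are already superimposed and coords_a[i] pairs with
    coords_b[i]; no fitting or outlier rejection is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(total / len(coords_a))


# Toy example: one pair is 1 Å apart, the other 3 Å apart, so RMSD = sqrt((1 + 9) / 2).
print(round(rmsd([(0, 0, 0), (0, 0, 0)], [(1, 0, 0), (0, 3, 0)]), 3))
```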

Comparing the model(s) with solved structures

Since this exercise was written, many experimental structures have been solved.

A BLAST search restricted to the Protein Data Bank (PDB) returns the PDB codes of solved structures. For example:

Description	Max score	Total score	Query cover	E value	Ident	Accession
Chain A, Enantiomer-specific Binding Of The Potent Antinociceptive Agent Sbfi-26 To Anandamide Transporters Fabp7	240	240	100%	5e-84	87%	5URA_A
Chain A, Crystal Structure Of Human Brain Fatty Acid Binding Protein	238	238	99%	4e-83	87%	1FDQ_A
Chain A, Human Fabp3 In Complex With 6-chloro-2-methyl-4-phenyl-quinoline-3-Carboxylic Acid	180	180	99%	3e-60	63%	5HZ9_A
Chain A, Serial Femtosecond X-ray Structure Of Human Fatty Acid-binding Protein Type-3 (fabp3) In Complex With Stearic Acid (c18:0) Determined Using X-ray Free-electron Laser At Sacla	180	180	99%	3e-60	63%	3WXQ_A

The table is much longer!

Here is the alignment for the first hit in the table, 5URA chain A (Range 1: 4 to 135):

Alignment statistics for match #1

Score	Expect	Method	Identities	Positives	Gaps
240 bits(613)	5e-84	Compositional matrix adjust.	115/132(87%)	124/132(93%)	0/132(0%)

Query  1    MVDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKN  60
            MV+AFCATWKLT+SQNFDEYMKALGVGFATRQVGNVTKPTVIISQEG KVVIRT  TFKN
Sbjct  4    MVEAFCATWKLTNSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGDKVVIRTLSTFKN  63

Query  61   TEINFQLGEEFEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTF  120
            TEI+FQLGEEF+ET+ DDRNCKSVV LDGDKL+H+QKWDGKETN  REIKDGKMV+TLTF
Sbjct  64   TEISFQLGEEFDETTADDRNCKSVVSLDGDKLVHIQKWDGKETNFVREIKDGKMVMTLTF  123

Query  121  GDIVAVRCYEKA  132
            GD+VAVR YEKA
Sbjct  124  GDVVAVRHYEKA  135
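The identity figure BLAST reports can be checked from the aligned strings. The sketch below counts identical positions and gap columns in a pairwise alignment given as two equal-length gapped strings ('-' marks a gap). The sequences are the query (1 to 132) and 5URA chain A (4 to 135) segments from the alignment above, which contains no gaps. Positives would additionally require the substitution matrix used by BLAST, which this sketch does not implement.

```python
def identity_stats(query, subject):
    """Identities, gap columns, alignment length and percent identity
    for two gapped, equal-length aligned sequences ('-' marks a gap)."""
    if len(query) != len(subject) or not query:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    pairs = list(zip(query, subject))
    identities = sum(1 for q, s in pairs if q == s and q != "-")
    gaps = sum(1 for q, s in pairs if q == "-" or s == "-")
    return identities, gaps, len(pairs), 100.0 * identities / len(pairs)


# Query 1-132 and 5URA chain A (Sbjct) 4-135, copied from the alignment above.
QUERY = ("MVDAFCATWKLTDSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGGKVVIRTQCTFKN"
         "TEINFQLGEEFEETSIDDRNCKSVVRLDGDKLIHVQKWDGKETNCTREIKDGKMVVTLTF"
         "GDIVAVRCYEKA")
SBJCT = ("MVEAFCATWKLTNSQNFDEYMKALGVGFATRQVGNVTKPTVIISQEGDKVVIRTLSTFKN"
         "TEISFQLGEEFDETTADDRNCKSVVSLDGDKLVHIQKWDGKETNFVREIKDGKMVMTLTF"
         "GDVVAVRHYEKA")

ids, gaps, length, pct = identity_stats(QUERY, SBJCT)
print(f"Identities {ids}/{length} ({pct:.0f}%), gaps {gaps}/{length}")
# → Identities 115/132 (87%), gaps 0/132
```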

OPTIONAL EXERCISE:

Load some of the solved structures and compare them to the model(s).

Validating Protein Models

Validating protein models is an essential step in ensuring their quality and reliability for further analysis or applications. Here are some commonly used methods and tools for validating protein models:

1. Ramachandran Plot Analysis:

The Ramachandran plot is a tool used to visualize the dihedral angles (φ and ψ) of amino acid residues in a protein structure.
It helps assess the stereochemical quality of the model by identifying outliers (residues with unusual dihedral angles) that may indicate errors in the model.
Modeller and other molecular modeling software often provide tools to generate and analyze Ramachandran plots.
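A Ramachandran plot is built from the backbone dihedral angles φ (C of the previous residue, N, CA, C) and ψ (N, CA, C, N of the next residue). The sketch below computes them from coordinates with the standard torsion-angle formula, following the IUPAC sign convention and returning degrees between -180 and 180. Reading the N, CA and C coordinates from a model file is left out; the input format (one (N, CA, C) triple per residue) is an assumption of this sketch.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees (-180 to 180), IUPAC sign convention."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples for consecutive residues.

    phi(i) = dihedral(C[i-1], N[i], CA[i], C[i]); psi(i) = dihedral(N[i], CA[i], C[i], N[i+1]).
    The first residue has no phi and the last has no psi (None).
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles


# Sanity check on idealized points: a trans (anti) arrangement gives 180 degrees.
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 1, 0), (-1, 1, 0)), 1))
```

Plotting each residue's (φ, ψ) pair and comparing it with the allowed regions is what tools such as PROCHECK and MolProbity automate.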

2. ERRAT (Verify3D, etc.):

ERRAT is a tool for assessing the overall quality of a protein model based on the agreement between the model’s atomic environment and expected values from high-resolution structures.
Verify3D is another tool that evaluates the compatibility of an atomic model (3D) with its own amino acid sequence (1D).
Both tools provide a score that indicates the quality of the model, with higher scores indicating better quality.

3. Model Quality Assessment:

Several other tools and methods can be used to assess the quality of protein models, including:
- PROCHECK: Analyzes the stereochemical quality of a protein structure, including Ramachandran plot analysis.
- MolProbity: Evaluates the geometry and sterics of a protein structure, highlighting potential clashes and other issues.
- ProSA: Assesses the overall quality of a protein structure by comparing its energy with that of experimental structures.

4. Other Validation Tools:

DSSP (Define Secondary Structure of Proteins): Assigns the secondary structure of each residue in a protein structure, which can be used to validate predicted secondary structures.
WHAT IF: Provides a range of tools for analyzing and validating protein structures, including geometry checks, hydrogen bond analysis, and more.

5. Visualization and Manual Inspection:

Visual inspection of the model using molecular visualization software (e.g., PyMOL, Chimera) is also crucial for identifying and correcting any structural anomalies or errors.

6. Validation Criteria:

It’s important to establish specific criteria for model validation based on the intended use of the model and the available experimental data (if any).
Criteria may include acceptable ranges for Ramachandran plot outliers, ERRAT scores, and other validation metrics.
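Writing the criteria down explicitly makes the check repeatable. The sketch below compares measured validation metrics against limits you choose. The metric names and limits are placeholders for illustration, not recommended values, and the measured numbers would come from tools such as PROCHECK, MolProbity, or ERRAT.

```python
def check_model(metrics, criteria):
    """Return a list of failed checks; an empty list means every criterion passed.

    metrics:  metric name -> measured value
    criteria: metric name -> ("max", limit) or ("min", limit)
    """
    failures = []
    for name, (kind, limit) in criteria.items():
        if kind not in ("max", "min"):
            raise ValueError(f"unknown criterion type {kind!r} for {name}")
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: not measured")
        elif kind == "max" and value > limit:
            failures.append(f"{name}: {value} > {limit}")
        elif kind == "min" and value < limit:
            failures.append(f"{name}: {value} < {limit}")
    return failures


# Placeholder limits for illustration only; set them for your own intended use.
CRITERIA = {
    "ramachandran_outliers_pct": ("max", 2.0),  # e.g. from PROCHECK or MolProbity
    "errat_overall_quality": ("min", 85.0),     # ERRAT overall quality factor
}

measured = {"ramachandran_outliers_pct": 1.2, "errat_overall_quality": 80.0}  # hypothetical values
print(check_model(measured, CRITERIA))
```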

By employing these validation methods and tools, researchers can ensure that their protein models are of high quality and suitable for use in further studies or applications.

Visualizing and Analyzing Protein Models

Visualizing and analyzing protein models is a critical step in understanding their structure-function relationships. PyMOL is a popular molecular visualization tool that can be used to visualize protein models and analyze various structural features. Here’s how you can use PyMOL for these purposes:

1. Visualizing Protein Models:

Open PyMOL and load the protein structure file (PDB format) of interest.
Use commands like show cartoon to display the protein’s backbone as a cartoon representation and show sticks to display ligands or other molecules as stick models.
Use the mouse or command-line options to rotate, zoom, and manipulate the view to visualize the protein from different angles.

2. Analyzing Active Sites:

Identify residues that are likely involved in the active site based on the protein’s structure and known functional residues.
Use PyMOL’s selection tools (e.g., select) to highlight these residues and visualize them in the context of the protein’s structure.

3. Analyzing Ligand Binding Sites:

If the protein binds to a ligand, use PyMOL’s select command to highlight the ligand and its surrounding residues.
Use PyMOL’s distance and angle commands to measure distances and angles between key atoms in the ligand binding site.
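Selecting the residues around a ligand comes down to a distance search. In PyMOL this can be written as a selection such as select pocket, byres (polymer within 4 of resn LIG), where LIG stands for the ligand's residue name. The sketch below does the same calculation in plain Python: it reads ATOM and HETATM records from the fixed PDB columns and returns every protein residue with at least one atom within a cutoff of any ligand atom. The ligand name LIG, the 4 Å default cutoff, and the sample coordinates are hypothetical.

```python
import math


def atom_line(record, serial, name, resname, chain, resseq, x, y, z):
    """Format columns 1-54 of a PDB ATOM/HETATM record (simplified atom-name alignment)."""
    return (f"{record:<6}{serial:>5}  {name:<3} {resname:>3} {chain}{resseq:>4}"
            f"    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atoms(pdb_text):
    """Yield (record, resname, chain, resseq, xyz) for ATOM and HETATM lines."""
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in ("ATOM", "HETATM"):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            yield record, line[17:20].strip(), line[21], int(line[22:26]), xyz


def residues_near_ligand(pdb_text, ligand_resname, cutoff=4.0):
    """Sorted (chain, resseq, resname) of protein residues with any atom within `cutoff` Å of the ligand."""
    atoms = list(parse_atoms(pdb_text))
    ligand = [xyz for rec, res, _, _, xyz in atoms if rec == "HETATM" and res == ligand_resname]
    near = {
        (chain, resseq, res)
        for rec, res, chain, resseq, xyz in atoms
        if rec == "ATOM" and any(math.dist(xyz, lig) <= cutoff for lig in ligand)
    }
    return sorted(near)


# Made-up structure: four protein CA atoms and a two-atom ligand named LIG.
SAMPLE = "\n".join([
    atom_line("ATOM", 1, "CA", "MET", "A", 1, 11.104, 6.134, -6.504),
    atom_line("ATOM", 2, "CA", "ALA", "A", 2, 8.0, 5.0, 5.0),
    atom_line("ATOM", 3, "CA", "GLY", "A", 3, 5.0, 9.5, 5.0),
    atom_line("ATOM", 4, "CA", "SER", "A", 4, 5.0, 5.0, 1.5),
    atom_line("HETATM", 5, "C1", "LIG", "A", 101, 5.0, 5.0, 5.0),
    atom_line("HETATM", 6, "C2", "LIG", "A", 101, 6.5, 5.0, 5.0),
])

print(residues_near_ligand(SAMPLE, "LIG"))
```

A real structure has many atoms per residue, so this all-pairs search is slow for large files; it is written for clarity, not speed.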

4. Surface Representation:

Use PyMOL’s show surface command to display the protein surface, which can help visualize the overall shape of the protein and its surface properties.

5. Coloring and Rendering:

Use PyMOL’s color command to color the protein by secondary structure (e.g., color red, ss h colors helices red) or by other properties (e.g., color blue, resi 100-200 to color residues 100-200 blue).
Experiment with different rendering styles (e.g., cartoon, sticks, spheres) to highlight different aspects of the protein’s structure.

6. Exporting Images and Videos:

Use PyMOL’s ray command to render a high-quality ray-traced image and the png command to save it to a file.
Use PyMOL’s movie commands (e.g., mset to define the frames and mpng to write them out as PNG images) to create animations or videos showing different views or structural changes in the protein.
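The display and export steps above can be collected into a reusable PyMOL script. The sketch below returns the text of a .pml file built from standard PyMOL commands (load, dss, hide, show, color, set transparency, orient, ray, png); the object name, colors, image size, and output file name are arbitrary choices.

```python
def figure_script(model_file, image_file="model_figure.png", width=1600, height=1200):
    """Return the text of a PyMOL .pml script that renders and saves one image."""
    commands = [
        f"load {model_file}, model",
        "dss model",                       # assign secondary structure
        "hide everything",
        "show cartoon, model",
        "color gray, model",
        "color red, model and ss h",       # helices
        "color yellow, model and ss s",    # strands
        "show surface, model",
        "set transparency, 0.6",           # semi-transparent surface
        "orient model",
        f"ray {width}, {height}",          # ray-trace at the requested size
        f"png {image_file}, dpi=300",      # save the ray-traced image
    ]
    return "\n".join(commands) + "\n"


print(figure_script("blbp.B99990001.pdb"))
```

Write the text to a file such as figure.pml and run pymol figure.pml from the directory containing the model. Running dss first assigns secondary structure, so the ss h and ss s selections work even if the file has no HELIX or SHEET records.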

7. Using Plugins and Scripts:

PyMOL has many plugins and scripts available that can enhance its functionality for specific tasks, such as measuring distances, analyzing electrostatic potentials, or visualizing protein dynamics.

By using PyMOL’s powerful visualization and analysis features, researchers can gain valuable insights into protein structures and functions, aiding in drug discovery, enzyme engineering, and other biological studies.

Applications of Protein Modeling

Protein modeling plays a crucial role in several key areas of biological research and drug discovery. Here are some of the primary applications of protein modeling:

1. Drug Discovery and Design:

Structure-Based Drug Design: Protein modeling is used to predict the three-dimensional structure of target proteins involved in diseases. This information is then used to design small molecules or biologics that can bind to these targets with high affinity and specificity, leading to the development of new drugs.
Virtual Screening: Protein models can be used in virtual screening to identify potential drug candidates from large compound libraries. The models are used to predict how these compounds might bind to the target protein, helping to prioritize compounds for further experimental testing.

2. Protein Engineering:

Rational Protein Design: Protein modeling can be used to design proteins with specific functions or properties. By understanding the structure-function relationships of proteins, researchers can design mutations or modifications to enhance or alter protein activity, stability, or binding properties for various applications.
Enzyme Engineering: Protein modeling is used to design enzymes with improved catalytic activity, substrate specificity, and stability for industrial applications such as biocatalysis and bioremediation.

3. Understanding Protein Function and Dynamics:

Functional Annotation: Protein models can provide insights into the function of unknown proteins by comparing their structures to those of known proteins with similar structures and functions.
Protein Dynamics: Molecular dynamics simulations, based on protein models, can provide insights into the dynamic behavior of proteins, including conformational changes and interactions with other molecules.

4. Structural Biology:

Protein modeling is used in structural biology to predict the structure of proteins that are difficult to crystallize or study experimentally. This can include large protein complexes, membrane proteins, and intrinsically disordered proteins.

5. Protein-Protein Interactions:

Protein modeling can be used to study protein-protein interactions, including the formation of protein complexes and the binding interfaces between proteins. This information is important for understanding cellular signaling pathways and designing therapeutic interventions.

Overall, protein modeling is a powerful tool that can provide valuable insights into protein structure, function, and interactions, with broad applications in drug discovery, protein engineering, and fundamental biological research.

Advanced Topics in Protein Modeling

Advanced topics in protein modeling often involve sophisticated strategies and techniques to improve the accuracy and reliability of the models. Here are two advanced topics:

1. Homology Modeling Strategies:

Template Selection: Advanced homology modeling involves careful selection of templates that are structurally and evolutionarily related to the target protein. This can include using multiple templates to model different regions of the target protein.
Alignment Refinement: Improving the accuracy of the sequence alignment between the target protein and the templates is crucial. Advanced techniques, such as profile-profile alignment and iterative alignment methods, can be used to refine the alignment.
Modeling Loops and Side Chains: Modeling regions with missing coordinates (loops) and predicting the side-chain conformations of the modeled protein are critical for improving the quality of the model. Advanced loop modeling algorithms and side-chain prediction methods can be used for this purpose.
Model Refinement: After building the initial model, refinement techniques such as energy minimization, molecular dynamics simulations, and loop modeling can be applied to improve the overall quality of the model.
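A first pass at template selection often starts from BLAST hits. The sketch below ranks the PDB hits from the BLAST table above by sequence identity, then query coverage, then E-value, after dropping hits that cover less than a chosen fraction of the query; the 90% coverage cutoff is an assumption for illustration. In practice, experimental method, resolution, and bound ligands also influence the choice, and different templates may suit different regions of the target.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    accession: str
    identity_pct: float
    query_cover_pct: float
    evalue: float


# Values from the PDB-restricted BLAST table above.
HITS = [
    Hit("5URA_A", 87, 100, 5e-84),
    Hit("1FDQ_A", 87, 99, 4e-83),
    Hit("5HZ9_A", 63, 99, 3e-60),
    Hit("3WXQ_A", 63, 99, 3e-60),
]


def rank_templates(hits, min_cover_pct=90.0):
    """Drop low-coverage hits, then sort by identity (high first),
    coverage (high first) and E-value (low first)."""
    kept = [h for h in hits if h.query_cover_pct >= min_cover_pct]
    return sorted(kept, key=lambda h: (-h.identity_pct, -h.query_cover_pct, h.evalue))


for hit in rank_templates(HITS):
    print(hit.accession, f"{hit.identity_pct}% identity", f"{hit.query_cover_pct}% cover")
```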

2. Integrating Experimental Data (NMR, Cryo-EM) into Modeling:

NMR Data Integration: NMR spectroscopy can provide distance constraints and other structural information that can be used to refine protein models. Integrating NMR data into modeling involves incorporating these constraints into the modeling process to improve the accuracy of the models.
Cryo-EM Data Integration: Cryo-electron microscopy (cryo-EM) provides density maps of macromolecular complexes, sometimes only at low resolution. Integrating cryo-EM data into modeling involves fitting the models into the experimental density maps and refining them to agree with the experimental data.
Hybrid Methods: Advanced modeling often involves combining multiple experimental techniques and computational methods to generate accurate models. For example, integrating NMR data with cryo-EM data and homology modeling can provide more accurate models of large protein complexes.
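To make the idea of NMR distance restraints concrete, the sketch below checks a model against NOE-style upper distance bounds and reports the violations. Real restraint lists come from processed NMR data; the atom labels, coordinates, and bounds here are hypothetical. A modeling program would add such restraints to its objective function during model building rather than only checking them afterwards.

```python
import math


def restraint_violations(coords, restraints):
    """Return (atom_a, atom_b, distance, excess) for each violated upper bound.

    coords:     atom label -> (x, y, z) in angstroms (labels are free-form strings)
    restraints: iterable of (atom_a, atom_b, upper_bound_in_angstroms)
    """
    violations = []
    for a, b, upper in restraints:
        d = math.dist(coords[a], coords[b])
        if d > upper:
            violations.append((a, b, round(d, 2), round(d - upper, 2)))
    return violations


# Hypothetical atom labels, coordinates, and NOE-style upper bounds.
coords = {
    "A10:HA": (0.0, 0.0, 0.0),
    "A25:H": (0.0, 3.0, 4.0),
    "A40:HB2": (6.0, 0.0, 0.0),
}
restraints = [("A10:HA", "A25:H", 5.5), ("A10:HA", "A40:HB2", 5.0)]
print(restraint_violations(coords, restraints))
```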

These advanced topics require a deep understanding of protein structure and modeling principles, as well as proficiency in using computational tools and software for protein modeling. They are essential for researchers working on complex protein systems and aiming to achieve high-quality models for their studies.

Case Studies and Examples

Here are some real-world examples of successful protein modeling projects, along with challenges faced and solutions implemented in specific modeling scenarios:

1. Drug Design:

Example: In the development of HIV protease inhibitors, researchers used protein modeling to design small molecules that could bind to the active site of the HIV protease enzyme and inhibit its activity, leading to the development of successful antiretroviral drugs.
Challenges and Solutions: One challenge in drug design is predicting the binding affinity and specificity of potential drug candidates. Researchers use molecular docking and molecular dynamics simulations to predict the binding modes of drugs and optimize their interactions with the target protein.

2. Enzyme Engineering:

Example: In the engineering of enzymes for industrial applications, protein modeling is used to design mutations that improve enzyme activity, stability, and substrate specificity. For example, researchers have engineered enzymes for biofuel production and bioremediation.
Challenges and Solutions: One challenge in enzyme engineering is predicting the effects of mutations on enzyme structure and function. Researchers use computational tools to predict the effects of mutations and select those that are likely to improve enzyme performance.

3. Structural Biology:

Example: In structural biology, protein modeling is used to predict the structures of proteins that are difficult to crystallize or study experimentally, such as membrane proteins or large protein complexes.
Challenges and Solutions: One challenge in structural biology is modeling the structures of proteins with high flexibility or multiple conformational states. Researchers use techniques such as ensemble modeling and enhanced sampling methods to model these proteins’ structures accurately.

4. Protein-Protein Interactions:

Example: In studying protein-protein interactions, protein modeling is used to predict the binding interfaces between proteins and understand the mechanisms of complex formation.
Challenges and Solutions: One challenge in studying protein-protein interactions is predicting the complex structures accurately. Researchers use docking algorithms and structural bioinformatics methods to predict the structures of protein complexes and validate them experimentally.

5. Functional Annotation:

Example: In functional annotation of proteins, protein modeling is used to predict the functions of unknown proteins based on their structural similarity to known proteins with annotated functions.
Challenges and Solutions: One challenge in functional annotation is identifying structural features that are indicative of protein function. Researchers use structural bioinformatics tools and databases to compare protein structures and infer functional annotations.

These examples illustrate the diverse applications of protein modeling in biological research and highlight the importance of computational methods in understanding protein structure and function.

Future Directions in Protein Modeling

Future directions in protein modeling are shaped by emerging trends and technologies that are revolutionizing the field. One of the most significant developments is the increasing use of artificial intelligence (AI) and machine learning (ML) in protein structure prediction and modeling. Here are some key trends and their impact:

1. AI and Machine Learning:

Deep Learning: Deep learning techniques, such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs), are being used to predict protein structures from amino acid sequences with remarkable accuracy.
AlphaFold: DeepMind’s AlphaFold 2, based on deep learning, achieved exceptional accuracy in the CASP14 (2020) round of the Critical Assessment of Structure Prediction (CASP), revolutionizing the field of protein structure prediction.
Improved Model Quality: AI and ML techniques are improving the quality and reliability of protein models, leading to more accurate predictions and insights into protein structure-function relationships.

2. Integrative Modeling:

Integrative modeling approaches combine data from multiple sources, such as X-ray crystallography, NMR spectroscopy, and cryo-electron microscopy (cryo-EM), to generate more accurate and comprehensive protein models.
These approaches enable researchers to integrate experimental data with computational models, providing a more holistic view of protein structures.

3. Structural Dynamics:

Studying protein dynamics is crucial for understanding protein function. Advanced modeling techniques are being developed to simulate protein dynamics across time scales ranging from picoseconds to milliseconds and, increasingly, beyond.
These techniques provide insights into how proteins change their shapes and interact with other molecules, which is essential for drug design and understanding biological processes.

4. Protein-Protein Interactions:

Modeling protein-protein interactions is a growing area of research. AI and ML techniques are being used to predict protein binding sites, identify interacting partners, and understand the mechanisms of complex formation.
These models are helping researchers design novel protein-protein interaction inhibitors and understand complex biological pathways.

5. Structural Bioinformatics:

Computational tools and databases in structural bioinformatics are constantly evolving. These tools enable researchers to analyze protein structures, predict their functions, and design novel proteins with desired properties.
The integration of AI and ML in structural bioinformatics is enhancing the capabilities of these tools and opening new avenues for research.

Overall, the future of protein modeling is exciting, with AI, ML, and integrative modeling approaches driving advancements in accuracy, speed, and understanding of protein structures and functions. These developments have the potential to revolutionize drug discovery, enzyme engineering, and our understanding of biological systems.

Conclusion

In conclusion, protein modeling is a powerful tool in bioinformatics and structural biology, with applications ranging from drug discovery to understanding protein function and dynamics. Here are the key points covered in this discussion:

Introduction to Protein Modeling: Protein modeling involves predicting the three-dimensional structure of proteins based on their amino acid sequences and known structures of related proteins.
Applications of Protein Modeling: Protein modeling is used in drug discovery and design, protein engineering, understanding protein function and dynamics, and structural biology.
Tools and Techniques: Modeller is a widely used software package for protein structure prediction and modeling. It uses comparative modeling to predict protein structures based on homology to known structures.
Advanced Topics: Advanced topics in protein modeling include homology modeling strategies, integrating experimental data (NMR, cryo-EM) into modeling, and modeling protein dynamics.
Future Directions: Future directions in protein modeling include emerging trends and technologies such as AI and machine learning, integrative modeling, studying protein dynamics, and predicting protein-protein interactions.

For further learning, here are some resources:

Books: “Introduction to Protein Structure Prediction: Methods and Algorithms” edited by Huzefa Rangwala and George Karypis; “Protein Structure Prediction: Methods and Protocols” edited by David Webster.
Online Courses: Coursera offers courses on bioinformatics and structural biology that cover protein modeling topics. EdX also has courses on computational biology.
Research Papers: Explore recent research papers in bioinformatics and structural biology journals to stay updated on the latest developments in protein modeling.

Continuing education and staying informed about advancements in protein modeling will be key to leveraging these tools effectively in research and applications.