Protein structure prediction
Last updated on 2026-04-01
Overview
Questions
- How can we predict a protein’s 3D structure from its sequence?
- How do we evaluate the confidence of predicted structures?
Objectives
- Predict protein and protein complex structures using AlphaFold.
- Interpret prediction confidence scores.
- Visualize predicted structures.
- Compare predicted structures to known structures.
Introduction
In this episode, we are going to explore protein structure
prediction. We will use AlphaFold, a deep-learning-based tool
developed by DeepMind that has revolutionized the field of protein
structure prediction. We will (1) predict the structure of a protein,
(2) interpret the confidence of the predictions, (3) visualize the
predicted structures, (4) compare them to known structures, and
(5) repeat the process for a protein complex. Let’s get started by
setting up our environment for prediction and loading the necessary
libraries.
Task 1: Predicting structure of a protein
We will use the AlphaFold/3.0.1 module on Pelle for
protein structure prediction. Let’s start!
Challenge 1.1: prepare the terrain
First, log in to Pelle with your username and password.
The course base folder is at
/proj/g2020004/nobackup/3MK013. Go to your own folder,
create a protein-structure-exercise subfolder, and move
into it.
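The steps in this challenge can be sketched as below. This is a minimal sketch, assuming your personal folder sits directly under the course base folder and is named after your username. The helper name enter_exercise_dir is made up for this example and is not part of the course scripts.

```shell
# Log in first (replace <username>):  ssh <username>@pelle.uppmax.uu.se

# enter_exercise_dir BASE USERNAME (hypothetical helper, not a course script)
# Creates BASE/USERNAME/protein-structure-exercise if needed and moves into it.
enter_exercise_dir() {
    local workdir="$1/$2/protein-structure-exercise"
    mkdir -p "$workdir" && cd "$workdir" || return 1
    pwd   # confirm where we ended up
}

# Example call on Pelle (assumed folder layout):
# enter_exercise_dir /proj/g2020004/nobackup/3MK013 your_username
```

Using mkdir -p means the command can be repeated safely if the folder already exists.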
Challenge 1.2: upload the AlphaFold3 parameters file
You have probably already downloaded the AlphaFold3 model parameters using this link.
First, we need to create a params sub-directory under the
protein-structure-exercise directory to store the
AlphaFold3 parameters file on Pelle. Then we will upload the parameters
file from the local machine into that params directory on
Pelle.
Use the mkdir command to create the new directory, and upload
the file into the params directory using the scp
command.
BASH
# Create the params directory on Pelle (-p: no error if it already exists)
mkdir -p <basefolder>/<username>/protein-structure-exercise/params
# Open a new terminal on your local machine and upload the parameters file to Pelle.
# Remember to replace `<path_to_parameters_file>`, `<basefolder>` and `<username>` accordingly.
# The trailing slash makes scp fail instead of creating a file named "params" if the directory is missing.
scp <path_to_parameters_file> <username>@pelle.uppmax.uu.se:<basefolder>/<username>/protein-structure-exercise/params/
Challenge 1.3: prepare the input file
AlphaFold takes the amino acid (AA) sequence of a protein as input and predicts its 3D structure. For this task, we will use the AA sequence of Adenylate kinase from Escherichia coli (UniProt ID: P69441). You can download the sequence of Adenylate kinase using the following code:
This will download the FASTA file containing the amino acid sequence of the protein.
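One possible way to fetch the sequence is sketched below; the course's own download command may differ. It assumes that curl is installed and that UniProt's REST service serves FASTA files at the URL pattern shown. Both helper names are made up for this example.

```shell
# uniprot_fasta_url ACCESSION (hypothetical helper): print the assumed FASTA URL
uniprot_fasta_url() {
    printf 'https://rest.uniprot.org/uniprotkb/%s.fasta\n' "$1"
}

# download_uniprot_fasta ACCESSION (hypothetical helper): save ACCESSION.fasta here
download_uniprot_fasta() {
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -L: follow redirects
    curl -fsSL "$(uniprot_fasta_url "$1")" -o "$1.fasta"
}

# Usage (needs network access):
# download_uniprot_fasta P69441    # writes P69441.fasta
```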
Q.1 Can you tell how many amino acids are in the sequence? You can
use the grep command to count.
The grep -v ">" command removes the header line,
tr -d '\n' removes newlines, and wc -c counts
the number of characters, which corresponds to the number of amino
acids.
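The pipeline described above can be wrapped up as follows. This is a small sketch that assumes a FASTA file with a single record; count_residues is an illustrative name.

```shell
# count_residues FASTA_FILE (hypothetical helper): print the number of residue letters
count_residues() {
    # drop the header line(s), join the sequence lines, count the characters left
    grep -v ">" "$1" | tr -d '\n' | wc -c
}

# Usage, from the directory holding the downloaded file:
# count_residues P69441.fasta
```

If the file held several records, the count would be the total over all of them, so check the number of header lines first with grep -c ">".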
Challenge 1.3: prepare the input file (continued)
You can check what AlphaFold3 input files look like here. You could format the input file manually according to the specification, but for this exercise we will use a script that automates the process.
First, we create a directory called af3-input:
Use the mkdir command to create a new directory.
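For instance, assuming you are inside the protein-structure-exercise folder:

```shell
# -p makes the command safe to repeat: no error if the directory already exists
mkdir -p af3-input
ls -d af3-input    # confirm that the directory is there
```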
Challenge 1.3: prepare the input file (continued)
Now use the following command to prepare the input file for AlphaFold3:
This command will take the FASTA file as input and generate a JSON
file that can be used as AlphaFold3 input. You can inspect the structure
of the JSON file using the cat or less
command.
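To give an idea of what such a JSON file contains, here is a hedged sketch that writes a minimal AlphaFold3 input for a single protein chain. It is not the course's conversion script: the function name, the fixed chain ID "A" and the single seed value 1 are choices made for this illustration.

```shell
# fasta_to_af3_sketch FASTA_FILE JOB_NAME
# Hypothetical stand-in for illustration only; NOT the course's conversion script.
# Handles one protein chain and one model seed.
fasta_to_af3_sketch() {
    local seq
    seq="$(grep -v ">" "$1" | tr -d '\n')"
    cat <<EOF
{
  "name": "$2",
  "modelSeeds": [1],
  "sequences": [
    {"protein": {"id": "A", "sequence": "$seq"}}
  ],
  "dialect": "alphafold3",
  "version": 1
}
EOF
}

# Usage: fasta_to_af3_sketch P69441.fasta P69441 | less
```

Comparing this layout with your own P69441.json should make the name, sequences, dialect and version fields easier to recognise.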
Q.2 Can you explain why the P69441.json file has its version set to 1? Does it follow the AlphaFold3 input specification?
Check the AlphaFold3 input file preparation instructions here.
In particular, check the section about versions.
Challenge 1.4: run AlphaFold3 prediction
Now that we have the input file ready, we can run the
AlphaFold3 prediction.
First, we create an output directory for storing the AF3 prediction results:
Now, we will load the AlphaFold3 module, which allows us
to run an already installed version of AlphaFold3. Use the following
command:
You should check whether the module is actually loaded by using the following command:
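Loading and checking the module could look like the sketch below. It assumes the module name matches the one given at the start of Task 1 and that the module command is only available on the cluster.

```shell
AF3_MODULE="AlphaFold/3.0.1"   # module name as given in Task 1
if command -v module >/dev/null 2>&1; then
    module load "$AF3_MODULE"
    module list                # the loaded module should appear in this list
else
    echo "'module' command not found: run this step on Pelle"
fi
```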
We will run AlphaFold3 using slurm. You
can check the instructions. For this course, we will use a script that
generates a ready-to-use slurm job script.
We can generate the job script using the following command:
BASH
python3 <basefolder>/scripts/create_af3_slurm_job.py -p <Project-Code> -i ./af3-input/P69441.json -o ./af3-output -m ./params > job.sh
Now we submit the job as follows:
This command runs the AlphaFold3 prediction using the input JSON file
and saves the results in the af3-output directory. You can
check the status of your job using the
squeue -u <username> command.
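Submitting the generated script and checking the queue might look like this. It is a sketch that assumes job.sh is in the current directory; the guard only exists so that the snippet does nothing harmful when pasted on a machine without Slurm.

```shell
JOB_SCRIPT="job.sh"   # the file written by create_af3_slurm_job.py
if command -v sbatch >/dev/null 2>&1 && [ -f "$JOB_SCRIPT" ]; then
    sbatch "$JOB_SCRIPT"      # prints: Submitted batch job <job_id>
    squeue -u "$USER"         # list your pending and running jobs
else
    echo "sbatch or $JOB_SCRIPT not found; run this step on Pelle"
fi
```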
Q.3 How long did the prediction take?
You can check the log file generated by the slurm
job.
The log file will have a name like
slurm-<job_id>.out.
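A quick way to look at the newest log is sketched below. It assumes the log was written to the current directory under the default name; newest_slurm_log is an illustrative helper name.

```shell
# newest_slurm_log (hypothetical helper): print the most recently modified
# slurm-<job_id>.out in the current directory, or nothing if there is none
newest_slurm_log() {
    ls -t slurm-*.out 2>/dev/null | head -n 1
}

LOG="$(newest_slurm_log)"
if [ -n "$LOG" ]; then
    tail -n 20 "$LOG"     # the end of the log shows how the run finished
fi

# If job accounting is enabled on the cluster (an assumption), sacct reports the run time:
# sacct -j <job_id> --format=JobID,JobName,Elapsed,State
```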
Task 2: Interpret the confidence score of the predictions
Now that we have generated a predicted structure, we will evaluate how reliable it is.
In the output directory (af3-output), you will find
several files. The one with the suffix
_summary_confidences.json contains the confidence scores
for the predicted structure. You can use the following command to view
the contents of the confidence summary file:
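As a sketch, the summary file can be located and printed as shown below. The p69441 sub-folder and file prefix are assumptions based on the model path used later in this episode; list_confidence_summaries is an illustrative helper name.

```shell
# list_confidence_summaries OUTPUT_DIR (hypothetical helper):
# print every confidence summary file found under OUTPUT_DIR
list_confidence_summaries() {
    find "$1" -type f -name "*_summary_confidences.json" | sort
}

# Usage, from the protein-structure-exercise folder:
# list_confidence_summaries af3-output
# cat af3-output/p69441/p69441_summary_confidences.json
```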
The predicted template modeling (pTM) score and the interface predicted template modeling (ipTM) score are both derived from a measure called the template modeling (TM) score. The TM score measures the accuracy of the entire structure. A pTM score above 0.5 means the overall predicted fold for the complex might be similar to the true structure.
ipTM measures the accuracy of the predicted relative positions of the subunits within the complex. Values higher than 0.8 represent confident, high-quality predictions, while values below 0.6 suggest a likely failed prediction. ipTM values between 0.6 and 0.8 are a gray zone where predictions could be correct or incorrect (source).
Challenge 2.1: checking prediction quality
Q.4 What are the pTM and ipTM scores for the predicted structure of Adenylate kinase? Do they indicate a good prediction?
You can find the pTM and ipTM scores in the
_summary_confidences.json file. Look for the keys
"ptm" and "iptm" in the JSON file.
First find the confidence scores using the command below:
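One way to pull out just those two values is sketched here with grep. It assumes the JSON file is written with one key per line; show_scores is an illustrative helper name.

```shell
# show_scores SUMMARY_JSON (hypothetical helper): print the pTM and ipTM lines
show_scores() {
    # -i: accept "ptm" or "pTM"; -E: extended regex for the alternation.
    # The leading quote keeps keys such as "chain_ptm" out of the result.
    grep -iE '"(ptm|iptm)"[[:space:]]*:' "$1"
}

# Usage:
# show_scores af3-output/p69441/p69441_summary_confidences.json
```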
A pTM score above 0.5 suggests the overall fold is likely
correct.
High ipTM values (e.g. >0.8) indicate confident prediction of
interactions.
If your scores fall in these ranges, the prediction is likely
reliable.
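The thresholds quoted in this task can be written down as a small check. This is only a sketch of the rule of thumb, with made-up helper names and labels; it uses awk because the shell itself cannot compare decimal numbers.

```shell
# classify_iptm VALUE (hypothetical helper): map an ipTM value onto the three ranges
classify_iptm() {
    awk -v s="$1" 'BEGIN {
        if (s !~ /^[0-9]*[.]?[0-9]+$/) print "no score"
        else if (s + 0 > 0.8)          print "confident"
        else if (s + 0 >= 0.6)         print "gray zone"
        else                           print "likely failed"
    }'
}

# classify_ptm VALUE (hypothetical helper): apply the 0.5 rule of thumb for pTM
classify_ptm() {
    awk -v s="$1" 'BEGIN {
        if (s !~ /^[0-9]*[.]?[0-9]+$/) print "no score"
        else if (s + 0 > 0.5)          print "fold likely similar"
        else                           print "fold uncertain"
    }'
}

# Usage: classify_ptm 0.91 ; classify_iptm 0.72
```

A value that is not a number, such as null, is reported as "no score" rather than being forced into one of the ranges.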
Task 3: Visualize the predicted structures
To visualize the predicted structure, you can use a molecular visualization tool such as PyMOL. If you cannot install such a tool on your local machine, you can use the web-based tool RCSB 3D View instead. For this course, however, we will use PyMOL; the installation instructions are linked here.
To visualize the predicted structure using PyMOL, you can follow these steps:
Download the predicted structure file (with suffix
_model.cif) from the af3-output directory to
your local machine.
You can use the scp command to securely copy the file
from the remote server to your local machine. Replace
<username> with your actual username and
<local_path> with the path where you want to save the
file on your local machine.
BASH
scp <username>@pelle.uppmax.uu.se:<basefolder>/<username>/protein-structure-exercise/af3-output/p69441/p69441_model.cif <local_path>
To visualize in PyMOL, first open the PyMOL software on your local machine. PyMOL has its own command line interface, just like the terminal. You can use that command line and type the following command to load the predicted structure file:
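As a sketch, the model can also be opened straight from your local terminal, assuming the pymol launcher is on your PATH and the downloaded file is in the current directory. Otherwise the snippet prints the load command to type inside PyMOL.

```shell
MODEL="p69441_model.cif"   # file name as downloaded with scp
if command -v pymol >/dev/null 2>&1 && [ -f "$MODEL" ]; then
    pymol "$MODEL"         # opens PyMOL with the file already loaded
else
    echo "In the PyMOL command line, type: load $MODEL"
fi
```

PyMOL names the loaded object after the file, without the extension, so here it would be p69441_model.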
Task 4: Compare predicted structures to known structures
From the UniProt entry for Adenylate kinase (P69441), we can see that there are several known structures for this protein in the Protein Data Bank (PDB).
You can find the PDB IDs for the known structures in the UniProt
entry under the “Structure” section. For this exercise, we will compare
our predicted structure to the known structure with PDB ID
1AKE.
Using the PyMOL command line, you can fetch the known structure from the PDB database using the following command:
You can then align the predicted structure to the known structure using the following command:
This will align the predicted structure (p69441_model) to the known structure (1AKE) and provide you with an RMSD (Root Mean Square Deviation) value in Å, which indicates how closely the predicted structure matches the known structure. A lower RMSD value indicates a better match.
You can also superimpose the two structures to visually compare them using the following command:
This will superimpose the predicted structure onto the known structure, allowing you to visually assess the similarities and differences between the two structures.
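The fetch, align and superimpose steps can be collected in a PyMOL script, as sketched below. The object name p69441_model assumes the model was loaded from p69441_model.cif; the helper name and the compare.pml file name are made up for this example. The four lines inside the script can equally be typed one by one in the PyMOL command line.

```shell
# write_compare_script FILE (hypothetical helper): save the PyMOL commands to FILE
write_compare_script() {
    cat > "$1" <<'EOF'
# load the prediction and fetch the experimental structure from the PDB
load p69441_model.cif
fetch 1AKE
# sequence-based alignment followed by superposition; reports an RMSD
align p69441_model, 1AKE
# structure-based superposition; also reports an RMSD
super p69441_model, 1AKE
EOF
}

# Run it without the GUI if PyMOL and the model file are available locally:
if command -v pymol >/dev/null 2>&1 && [ -f p69441_model.cif ]; then
    write_compare_script compare.pml
    pymol -c compare.pml     # -c: command-line mode, no GUI
fi
```

Running both align and super is useful because they can give slightly different RMSD values.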
Challenge 4.1: checking structural alignment
Q.5 What is the RMSD value between the predicted structure and the known structure? Does it indicate a good prediction?
Align the predicted structure with the known structure from the PDB database using PyMOL.
A low RMSD (e.g. <2 Å) indicates strong agreement with the known
structure.
Higher values suggest deviations and lower prediction accuracy.
Task 5: Predict structure of a protein complex
In this section, we will predict the structure of a protein complex (PDB ID: 4FQB) formed by two proteins: the toxic effector Tse1 and the immunity protein Tsi1. We will soon see what their complex looks like.
Let’s begin by copying the FASTA file from the <basefolder>/<username>/protein-structure-exercise
directory. The file we need to copy is called
4fqb.fasta.
Then, we prepare the AlphaFold3 input JSON file using
the fasta_to_af3_json.py script. Use the following
command:
Next, you need to run the AlphaFold3 prediction using the
af3-input/4fqb.json file as input. The steps are very similar to
those described in Challenge 1.4. The only difference is that, when you use the
create_af3_slurm_job.py script, the input (-i) should be
changed to -i af3-input/4fqb.json. See below:
Check whether the AlphaFold3 module is still loaded:
If not, load AlphaFold3 module:
Check whether the module is now loaded successfully:
Then, we create the slurm job script:
BASH
python3 <basefolder>/scripts/create_af3_slurm_job.py -p <Project-Code> -i ./af3-input/4fqb.json -o ./af3-output -m ./params > job_complex.sh
Now we submit the job as follows:
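A sketch of the submission step that also remembers the job ID is given below. It assumes the job script does not override the default log file name; submit_and_report is an illustrative helper name.

```shell
# submit_and_report SCRIPT (hypothetical helper): submit SCRIPT, then print the
# job ID and the log file name to look for afterwards
submit_and_report() {
    local job_id
    # --parsable makes sbatch print only "<job_id>" or "<job_id>;<cluster>"
    job_id="$(sbatch --parsable "$1")" || return 1
    job_id="${job_id%%;*}"    # drop the cluster name if present
    echo "job $job_id submitted; log will be slurm-$job_id.out"
}

# Usage on Pelle:
# submit_and_report job_complex.sh
```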
Challenge 5.1: check the confidence scores, then visualise and compare the predicted complex with the experimentally derived structure
After the prediction is complete, check the confidence scores and compare the predicted structure with the known structure from the PDB database (4FQB) using the instructions from Tasks 2, 3 and 4.
Q.6 Based on the confidence score and RMSD value, do you think AlphaFold3 performed well in predicting the protein complex?
If both confidence scores (pTM/ipTM) are high and RMSD is low,
AlphaFold3 performed well.
If scores are low or RMSD is high, the prediction may be unreliable.
In this episode, we explored how to predict, evaluate, and validate protein structures using AlphaFold3.
- AlphaFold predicts structure from sequence
- pTM/ipTM indicate confidence
- RMSD measures structural similarity