Decode the Genotype-Phenotype Map

Predict opsin phenotype (λmax) directly from unaligned amino-acid sequences using machine learning models trained on the Visual Physiology Opsin Database (VPOD).

Powerful Analytical Capabilities

OPTICS goes beyond basic prediction, offering deep insights into the structural and feature-level drivers of color sensitivity.

λmax Prediction

Predict the peak light absorption wavelength (λmax) from unaligned sequences with confidence intervals using bootstrap ensembles.
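As a rough illustration of how a bootstrap ensemble can yield a confidence interval, the sketch below takes one λmax prediction per ensemble member and reports the median with a percentile interval. The function names, the 95% level, and the example values are assumptions made for illustration; the interval computation inside OPTICS may differ.

```python
import statistics


def percentile(sorted_values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a pre-sorted list."""
    if not sorted_values:
        raise ValueError("need at least one value")
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def summarize_bootstrap(predictions, level=95.0):
    """Summarize per-model λmax predictions (nm) as a median and percentile interval."""
    ordered = sorted(predictions)
    tail = (100.0 - level) / 2.0
    return {
        "median": statistics.median(ordered),
        "lower": percentile(ordered, tail),
        "upper": percentile(ordered, 100.0 - tail),
    }


# Hypothetical predictions (nm) from five bootstrap-trained models for one sequence.
example = [498.0, 500.0, 501.0, 502.0, 505.0]
summary = summarize_bootstrap(example)
print(summary)
```

A wide interval suggests that the ensemble members disagree about the sequence, which is a reason to treat the point estimate with caution.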

SHAP Explanations

Interpret predictions using SHAP values. Identify which amino-acid sites and properties contribute most to a predicted λmax shift.

3D Structure Mapping

Project SHAP importance values directly onto 3D PDB structures (such as bovine rhodopsin) to create per-residue importance heatmaps.
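In the PDB format, the temperature factor (B-factor) of each ATOM/HETATM record occupies columns 61-66, so per-residue scores can be stored there and colored in a molecular viewer. The sketch below shows that idea under simplifying assumptions (a single chain, scores keyed by residue number, values that fit the six-character field). The `set_b_factors` function and the example record are hypothetical, and this is not the code in `optics_structure_map.py`.

```python
def set_b_factors(pdb_lines, importance_by_residue, default=0.0):
    """Return PDB lines with the B-factor field (columns 61-66) set per residue."""
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 66:
            res_num = int(line[22:26])  # residue sequence number, columns 23-26
            score = importance_by_residue.get(res_num, default)
            line = f"{line[:60]}{score:6.2f}{line[66:]}"
        out.append(line)
    return out


# Build one fixed-width ATOM record; the coordinates are made up for illustration.
example = "ATOM  {:>5} {:<4} {:>3} {}{:>4}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
    1, " N", "LYS", "A", 296, 27.340, 24.430, 2.614, 1.00, 9.67
)

# Hypothetical importance score for residue 296.
mapped = set_b_factors([example], {296: 0.85})
print(mapped[0])
```

Residues without a score receive the `default` value, which keeps unscored regions visually neutral when the structure is colored by B-factor.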

Multiple ML Models

Choose from specialized models trained on specific taxonomic subsets (vertebrate, invertebrate) or augmented "Mine-n-Match" datasets.

Integrated BLASTp

Automatically compare your query sequences against curated reference datasets (Bovine, Squid, Microbe) natively within the pipeline.

Interactive GUI

Prefer not to use the command line? `run_optics_gui.py` provides a dark-mode-ready graphical interface for all tools.

Installation

# 1. Clone the repository
git clone https://github.com/VisualPhysiologyDB/optics.git
cd optics

# 2. Create and activate a Conda environment
conda create --name optics_env python=3.11
conda activate optics_env

# 3. Install Python dependencies
pip install -r requirements.txt

External Dependencies Required

OPTICS requires BLAST and MAFFT to align and compare sequences.

# For Mac / Linux:
conda install bioconda::blast bioconda::mafft

Windows users: MAFFT is bundled with OPTICS, but you must manually install the BLAST executable and add it to your PATH.
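To confirm that the external tools are reachable before running a pipeline, a short check such as the following may help. It is a minimal sketch that assumes the executables are named `blastp` and `mafft`; the `find_missing_tools` helper is hypothetical and not part of OPTICS.

```python
import shutil


def find_missing_tools(tools=("blastp", "mafft")):
    """Return the names of required executables that are not found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


missing = find_missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("BLAST and MAFFT executables found.")
```

On Windows, where MAFFT is bundled with OPTICS, checking only BLAST with `find_missing_tools(("blastp",))` is likely sufficient.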

How to Use OPTICS

1. Standard Predictions

Run the main λmax prediction workflow on a FASTA file containing unaligned sequences.

python optics_predictions.py -i ./examples/optics_ex_short.txt -o ./outputs -p my_results -m whole-dataset --blastp --bootstrap
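Before running the command above, it can help to confirm that the input parses as FASTA and contains only amino-acid characters. The following is a minimal sketch, not part of OPTICS; `read_fasta`, `unexpected_characters`, and the sample records are hypothetical names and data chosen for illustration.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def unexpected_characters(sequence):
    """Return the sorted characters that are not standard amino-acid letters."""
    return sorted(set(sequence) - STANDARD_AMINO_ACIDS)


# Hypothetical two-record input; the second sequence contains an alignment gap.
sample = ">opsin_1\nMNGTEGPNFY\nVPFSNKTG\n>opsin_2\nMNGT-EGP\n"
records = read_fasta(sample)
problems = {name: unexpected_characters(seq) for name, seq in records}
print(problems)
```

Because the workflow expects unaligned sequences, a gap character such as `-` in the report is a hint that an aligned file was supplied by mistake.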

2. SHAP Explanations

Use SHAP feature attribution to see which sites and properties account for differences in predicted λmax between sequences.

python optics_shap.py -i ./examples/optics_ex_short.fasta -o ./outputs -p shap_test --mode both --use_reference_sites
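SHAP builds on Shapley values, which credit each feature with its average marginal contribution across all possible feature subsets. The toy example below computes exact Shapley values for an invented three-site model to show the idea. The site names, effect sizes, and `toy_lambda_max` function are hypothetical and do not reflect the trained OPTICS models or how `optics_shap.py` is implemented.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values: weighted marginal contributions over all coalitions."""
    n = len(features)
    result = {}
    for target in features:
        others = [f for f in features if f != target]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                present = frozenset(coalition)
                total += weight * (value_fn(present | {target}) - value_fn(present))
        result[target] = total
    return result


# Invented model: predicted λmax (nm) given which substitutions are present.
BASELINE = 500.0
EFFECTS = {"site_A": -10.0, "site_B": 4.0, "site_C": 0.0}


def toy_lambda_max(present):
    shift = sum(EFFECTS[s] for s in present)
    if "site_A" in present and "site_B" in present:
        shift += 2.0  # interaction term involving both sites
    return BASELINE + shift


attributions = shapley_values(list(EFFECTS), toy_lambda_max)
print(attributions)
```

In this toy case the 2 nm interaction is split evenly, so site_A receives about -9 nm, site_B about +5 nm, and site_C 0 nm. Together the attributions sum to the -4 nm difference between the full prediction and the baseline, which is the additivity property that makes such values convenient for explaining λmax shifts.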

3. 3D Structure Mapping

Map your SHAP importance outputs onto a 3D PDB structure (creates an annotated PDB with modified B-factors and a PyMOL script).

python optics_structure_map.py -s ./outputs/seq_shap_analysis.csv -p 1U19 --map_bovine_also
OR

Launch the Graphical Interface

Skip the command line entirely. The built-in GUI provides access to all four analytical pipelines in an intuitive, dark-mode ready window.

python run_optics_gui.py

Citation & Credits

If you use OPTICS in your research, please cite:

Seth A. Frazer, Todd H. Oakley. Accessible and Robust Machine Learning Approaches to Improve the Opsin Genotype-Phenotype Map. bioRxiv, 2025.08.22.671864.
https://doi.org/10.1101/2025.08.22.671864

Original VPOD Development:

Seth A. Frazer, Mahdi Baghbanzadeh, Ali Rahnavard, Keith A. Crandall & Todd H. Oakley. Discovering genotype-phenotype relationships with machine learning and the Visual Physiology Opsin Database (VPOD). GigaScience, 2024.09.01.
https://doi.org/10.1093/gigascience/giae073

The Backbone ML Pipeline for OPTICS: deepBreaks

Mahdi Baghbanzadeh, Tyson Dawson, Bahar Sayoldin, Seth A. Frazer, Todd H. Oakley, Keith A. Crandall & Ali Rahnavard. deepBreaks identifies and prioritizes genotype–phenotype associations using machine learning. Scientific Reports, 2025.11.07.
https://doi.org/10.1038/s41598-025-25580-6