A newly compiled database of opsin genes and machine-learning models to predict peak-sensitivity (λmax) phenotypes.
VPOD provides both fully curated, machine-learning-ready datasets and the raw database files for custom querying with SQLite.
The formatted datasets are located in vpod_data/VPOD_1.3/formatted_data_subsets/. These subsets are suitable for direct model training, with no SQLite setup or sequence alignment required.
The raw database files are located in vpod_data/VPOD_1.3/raw_database_files/. Load them into SQLite to create custom datasets.
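As a rough illustration of what a custom query might look like once data is in SQLite, the sketch below uses Python's standard sqlite3 module. The table name `opsins`, its columns, and the rows are hypothetical placeholders invented for this example. They are not the actual VPOD schema or real measurements, so adapt the names to the files in raw_database_files/.

```python
import sqlite3

# Hypothetical schema and placeholder rows, for illustration only;
# the real VPOD tables, columns, and values will differ.
def build_demo_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE opsins (accession TEXT PRIMARY KEY, phylum TEXT, lmax REAL)"
    )
    conn.executemany(
        "INSERT INTO opsins VALUES (?, ?, ?)",
        [
            ("demo_0001", "Chordata", 500.0),
            ("demo_0002", "Chordata", 440.0),
            ("demo_0003", "Arthropoda", 530.0),
        ],
    )
    conn.commit()
    return conn

def select_subset(conn, phylum, min_lmax):
    """Return (accession, lmax) rows for one phylum at or above a λmax cutoff."""
    cur = conn.execute(
        "SELECT accession, lmax FROM opsins "
        "WHERE phylum = ? AND lmax >= ? ORDER BY accession",
        (phylum, min_lmax),  # parameterized to avoid building SQL by hand
    )
    return cur.fetchall()

conn = build_demo_db()
subset = select_subset(conn, "Chordata", 450.0)
print(subset)
```

To work with a persistent database instead, pass a file path to `sqlite3.connect` in place of `":memory:"`.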
git clone https://github.com/VisualPhysiologyDB/visual-physiology-opsin-db.git
cd visual-physiology-opsin-db
# Start exploring with vpod_main_wf.ipynb
vpod_main_wf.ipynb is the primary notebook for users. It contains a full pipeline for creating a local instance of VPOD using SQLite, formatting datasets, and training ML models using deepBreaks.
The repository includes tools for generating chimeras, in silico deep mutational scanning (DMS), and reciprocal mutagenesis to build theoretical opsin variants for model testing.
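To make the idea concrete, here is a minimal sketch of two of these operations on plain amino-acid strings: enumerating every single-site substitution (the core of an in silico DMS) and splicing two sequences at a crossover point to form a chimera. The function names `single_site_variants` and `make_chimera` and the four-residue toy sequence are assumptions made for this example, not the repository's own tools, and positions are counted from 1 along the input string rather than by any standard opsin numbering.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def single_site_variants(seq):
    """Yield (label, mutant) for every single amino-acid substitution.

    Labels follow the usual wild-type/position/mutant form, e.g. "M1A",
    with positions counted from 1 along the input string.
    """
    for i, wt in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != wt:
                yield f"{wt}{i + 1}{aa}", seq[:i] + aa + seq[i + 1:]

def make_chimera(seq_a, seq_b, crossover):
    """Join the first `crossover` residues of seq_a to the rest of seq_b."""
    return seq_a[:crossover] + seq_b[crossover:]

toy = "MNGT"  # toy sequence for illustration, not a real opsin
variants = dict(single_site_variants(toy))
chimera = make_chimera("MNGT", "AAAA", 2)
print(len(variants), variants["M1A"], chimera)
```

For a sequence of standard residues of length L, the scan yields 19 × L variants, so a full-length opsin produces several thousand sequences to pass to a trained model.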
R-based tools (Phylogenetic_Imputation.Rmd) load tree files, make λmax predictions via phylogenetic imputation, and compare them directly against ML outputs.
An advanced workflow combines heterologous expression data with in vivo correlations to augment datasets for more robust modeling of taxonomic subsets.