Welcome

Pick a tab above to view a one-page overview of each app, including what it does, how to start, and links to the code. Each section follows the same clean card layout for quick scanning.

NEW

JC Enrichment Network Studio (Dash App)

Interactive Dash web app for exploring gene ↔ pathway/term enrichment results as a bipartite network. Upload long-format gene–term membership tables, filter the network and explore hubs, view summary stats, and export node/edge tables for downstream analysis (e.g., Cytoscape).

  • 📄 Input: long-format CSV (one row per gene–term membership edge)
  • 🧬 Bipartite graph: genes ↔ pathways/terms
  • 🎚️ Filters: search, min degree, min edge weight, max groups, layout mode
  • 📊 Stats: node/edge counts, components, top genes/terms by degree
  • ⚖️ Weighted edges: supports padj/FDR; auto-converts to -log10(p) for plotting
  • ⬇️ Export: download nodes.csv and edges.csv
  • ☁️ Live deployment: Google Cloud Run
Open Live App | View on GitHub
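The weighting and filtering described above can be sketched in plain Python. This is an illustration under assumptions, not the app's code: the field names `gene`, `term`, and `padj`, the clipping floor, and the idea that "min edge weight" applies to the -log10 value are all hypothetical.

```python
import math
from collections import Counter

# Hypothetical long-format rows: one gene–term membership edge per row.
rows = [
    {"gene": "TP53", "term": "Apoptosis", "padj": 1e-6},
    {"gene": "BAX", "term": "Apoptosis", "padj": 1e-6},
    {"gene": "TP53", "term": "Cell cycle", "padj": 0.01},
    {"gene": "CDK1", "term": "Cell cycle", "padj": 0.01},
    {"gene": "MYC", "term": "Ribosome", "padj": 0.2},
]

def edge_weight(padj, floor=1e-300):
    """Convert an adjusted p-value to -log10(padj), clipping at `floor` so padj=0 stays finite."""
    return -math.log10(max(padj, floor))

def filter_network(rows, min_weight=1.0, min_degree=1):
    """Weight edges, drop weak ones, then drop edges touching low-degree nodes (single pass)."""
    edges = [(r["gene"], r["term"], edge_weight(r["padj"])) for r in rows]
    edges = [e for e in edges if e[2] >= min_weight]
    # Degree = number of retained edges touching each gene or term.
    gene_deg = Counter(g for g, _, _ in edges)
    term_deg = Counter(t for _, t, _ in edges)
    # One pass only; an iterative k-core style prune would repeat until nothing changes.
    return [e for e in edges if gene_deg[e[0]] >= min_degree and term_deg[e[1]] >= min_degree]

kept = filter_network(rows, min_weight=1.3, min_degree=2)
```

With these toy rows, the MYC edge fails the weight cutoff (-log10(0.2) ≈ 0.7) and only the two TP53 edges survive the degree filter, since TP53 is the only gene linked to more than one term.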

Tip: If you have a standard enrichment table (term + list of genes), convert it to long format first (one gene per row per term). This app is designed for clean network building and fast exploration.
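A minimal sketch of that wide-to-long conversion, using only the standard library. It assumes a hypothetical input with a `genes` column holding semicolon-separated symbols (a layout common in Enrichr-style exports); adjust `gene_col` and `sep` to match your table.

```python
import csv
import io

# Hypothetical wide enrichment table: one row per term, genes packed into one field.
WIDE_CSV = """term,padj,genes
Apoptosis,1e-6,TP53;BAX;CASP3
Cell cycle,0.01,TP53; CDK1
"""

def to_long(wide_text, gene_col="genes", sep=";"):
    """Explode each term row into one row per gene, keeping the other columns."""
    reader = csv.DictReader(io.StringIO(wide_text))
    long_rows = []
    for row in reader:
        for gene in row[gene_col].split(sep):
            gene = gene.strip()
            if not gene:  # skip empty entries left by trailing separators
                continue
            out = {k: v for k, v in row.items() if k != gene_col}
            out["gene"] = gene
            long_rows.append(out)
    return long_rows

def write_long(long_rows, fieldnames=("gene", "term", "padj")):
    """Serialize long-format rows back to CSV text (one gene–term edge per line)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fieldnames), lineterminator="\n")
    writer.writeheader()
    writer.writerows(long_rows)
    return buf.getvalue()

long_rows = to_long(WIDE_CSV)
long_csv = write_long(long_rows)
```

Here the two term rows expand into five gene–term rows, with whitespace around separators stripped.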

Bulk RNA-Seq Analyzer

Interactive Shiny app for bulk RNA-seq: differential expression, PCA/UMAP, volcano & heatmaps, enrichment, Random Forest, power analysis, and downloadable results.

  • 🗂️ Inputs: counts matrix + phenotype CSV
  • 🧬 DE: limma-voom workflow
  • 🧭 Dims: PCA & UMAP
  • 🌋 Plots: volcano, heatmap, interactive tables
  • 🧠 ML: Random Forest + ROC/AUC
  • 🧪 Pathways: Enrichr (KEGG/GO/Reactome)
  • Power: sample size/power curves
  • 📦 Runs anywhere: Docker/Singularity
View on GitHub
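As a rough illustration of the limma-voom step, the log-CPM transform that voom applies before fitting its mean-variance trend can be written in a few lines. This is a stand-alone re-implementation for explanation, not the app's code; it uses raw column sums as library sizes and omits normalization factors and the precision weights that voom goes on to estimate.

```python
import math

def voom_log_cpm(counts):
    """log2-CPM as used by limma::voom: log2((count + 0.5) / (lib_size + 1) * 1e6).

    `counts` maps sample -> list of gene counts (same gene order in every sample).
    Library sizes are raw column sums here; voom also scales them by
    normalization factors (e.g. TMM) when those are supplied.
    """
    out = {}
    for sample, col in counts.items():
        lib_size = sum(col)
        out[sample] = [math.log2((c + 0.5) / (lib_size + 1) * 1e6) for c in col]
    return out

# Toy counts: ctrl_1 sums to 999,999 so (lib_size + 1) is exactly one million.
counts = {"ctrl_1": [0, 10, 999989], "trt_1": [5, 20, 499975]}
log_cpm = voom_log_cpm(counts)
```

The +0.5 offset keeps zero counts finite: the first gene in `ctrl_1` maps to log2(0.5) = -1.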

Counts_matrix_Nextflow (RNA-seq Pipeline)

Portable Nextflow DSL2 RNA-seq workflow: QC → STAR alignment → gene counts → Salmon TPM → BigWig coverage → matrix merge + MultiQC summary. Designed to be minimal, readable, and robust.

  • 📄 Inputs: paired-end FASTQs via samplesheet CSV
  • 🧬 Core steps: fastp → STAR → featureCounts → Salmon → MultiQC
  • 📈 Outputs: gene counts matrix, TPM matrix, BigWig tracks, QC report
  • 🧱 Indexes: auto-builds STAR + Salmon indexes (per run)
  • 🛡️ Robustness: skips coverage gracefully for zero-mapped samples
  • 📦 Runs anywhere: Conda, Docker, or Singularity/Apptainer
  • 🧪 Great for: demos, infra tests, portfolio/template pipelines
View on GitHub | README

Quickstart

Docker

docker build -t rnaseq-pipeline .
nextflow run main.nf -profile docker \
  --samplesheet samples.csv \
  --ref genome.fa \
  --gtf genes.gtf \
  --transcripts transcripts.fa
    

Singularity/Apptainer

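# build the SIF from the local image created by `docker build` (docker:// would pull from a registry instead)
singularity build containers/rnaseq-pipeline.sif docker-daemon://rnaseq-pipeline:latest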
nextflow run main.nf -profile singularity \
  --samplesheet samples.csv \
  --ref genome.fa \
  --gtf genes.gtf \
  --transcripts transcripts.fa
    

Key Outputs

results/
├── qc/                       (fastp)
├── ref/                      (STAR + Salmon indexes)
├── bam/                      (sorted BAM + BAI + flagstat)
├── bigwig/                   (coverage tracks)
├── counts_per_sample/        (per-sample featureCounts)
├── counts_matrix.tsv         (gene count matrix)
├── salmon_tpm_matrix.tsv     (transcript TPM matrix)
└── multiqc_report.html       (summary report)
    

Notes: intentionally simple, with explicit channels and guards against common failure modes. Easy to extend with downstream DESeq2/edgeR analysis of the counts matrix.
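The "matrix merge" step can be sketched as follows. This is an illustrative stand-alone version, not the pipeline's own merge process: it assumes each per-sample file is standard featureCounts output for a single BAM (a `#`-prefixed command line, then a header of Geneid, Chr, Start, End, Strand, Length, and one count column), and it writes genes missing from a sample as 0, which is an assumption.

```python
import csv
import io

def read_featurecounts(text):
    """Parse one single-BAM featureCounts output into {gene_id: count}.

    Skips '#' comment lines; counts sit in column 7, after
    Geneid, Chr, Start, End, Strand, Length.
    """
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    header = next(reader)
    if header[0] != "Geneid":
        raise ValueError("unexpected featureCounts header: %r" % header[:1])
    return {row[0]: int(row[6]) for row in reader}

def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into a tab-separated gene x sample matrix."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene_id"] + samples)
    for g in genes:
        # A gene absent from a sample's file is written as 0 (assumption).
        writer.writerow([g] + [per_sample[s].get(g, 0) for s in samples])
    return buf.getvalue()

# Toy per-sample outputs in featureCounts layout (hypothetical values).
FC_A = ("# featureCounts comment line\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tA.bam\n"
        "G1\tchr1\t1\t100\t+\t100\t7\n"
        "G2\tchr1\t200\t300\t-\t101\t0\n")
FC_B = ("# featureCounts comment line\n"
        "Geneid\tChr\tStart\tEnd\tStrand\tLength\tB.bam\n"
        "G1\tchr1\t1\t100\t+\t100\t3\n"
        "G2\tchr1\t200\t300\t-\t101\t5\n")

matrix_tsv = merge_counts({"B": read_featurecounts(FC_B), "A": read_featurecounts(FC_A)})
```

Sorting sample and gene names gives a deterministic column and row order regardless of the order in which per-sample files arrive from the workflow.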

NEW

Enrichment Analysis LLM Triage (Flask App)

A lightweight Flask web app that takes enrichment results (CSV) and generates a structured, human-readable triage report — highlighting likely biological drivers, reactive programs, potential confounders, and suggested follow-up experiments.

  • 📄 Input: enrichment CSV (terms + scores + genes/overlap fields)
  • 🧠 LLM reasoning: summarizes key programs and flags confounding patterns
  • 🧪 Follow-ups: proposes targeted experiments with readouts + controls
  • 📑 PDF report: generates a clean downloadable triage PDF
  • 📦 Deploy: Docker + Apptainer/Singularity (HPC-friendly)
View on GitHub

Why it matters: enrichment tables are easy to generate but hard to interpret. This tool helps translate “significant pathways” into an actionable short list of mechanisms and concrete next experiments — without drowning the user in jargon.

NEW

Breast Cancer NLP Phenotyper (Dash + medspaCy)

A lightweight, rule-based clinical NLP dashboard for extracting key breast cancer phenotypes from free-text notes (e.g., pathology, consults). Outputs include a clean patient-level table plus auditable evidence mentions with snippets so results can be reviewed and trusted.

  • 📄 Input: upload multiple .txt notes + optional mapping CSV (note → patient/date/type)
  • 🧠 NLP engine: spaCy + medspaCy patterns (deterministic extraction)
  • 🧬 Phenotypes: ER/PR status (+ % if present), HER2 (IHC/FISH → final), Ki-67, histology, grade, stage
  • 🔎 Evidence: each extracted value is backed by mention-level snippets for auditing
  • 🧮 Aggregation: note-type/date precedence rules roll note-level data to patient-level output
  • 📦 Portable: Docker + Apptainer/Singularity supported (HPC-friendly)
View on GitHub

Note: This is an MVP designed for transparency and portability — ideal for demos, iteration, and extension into richer rule sets or model-assisted extraction later.
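The note-to-patient aggregation can be illustrated with a small precedence roll-up. This is a plain-Python stand-in, not the app's medspaCy pipeline: the note-type ranking (`NOTE_TYPE_RANK`), the field names, and the toy mentions are all hypothetical, and the real precedence rules may differ.

```python
from datetime import date

# Hypothetical precedence: lower rank wins; pathology outranks consult and progress notes.
NOTE_TYPE_RANK = {"pathology": 0, "consult": 1, "progress": 2}

# Toy note-level mentions, each carrying the evidence snippet it came from.
mentions = [
    {"patient": "P1", "field": "ER", "value": "positive", "note_type": "consult",
     "date": date(2023, 3, 1), "snippet": "ER positive per prior report"},
    {"patient": "P1", "field": "ER", "value": "positive (90%)", "note_type": "pathology",
     "date": date(2023, 1, 15), "snippet": "Estrogen receptor: POSITIVE (90%)"},
    {"patient": "P1", "field": "HER2", "value": "negative", "note_type": "pathology",
     "date": date(2023, 1, 15), "snippet": "HER2 (IHC): 1+, negative"},
]

def roll_up(mentions, rank=NOTE_TYPE_RANK):
    """Keep one mention per (patient, field): best note type first, then most recent date."""
    best = {}
    for m in mentions:
        key = (m["patient"], m["field"])
        # Unknown note types rank last; negating the ordinal makes newer dates sort first.
        score = (rank.get(m["note_type"], len(rank)), -m["date"].toordinal())
        if key not in best or score < best[key][0]:
            best[key] = (score, m)
    # Return the winning mention, snippet included, so every value stays auditable.
    return {key: m for key, (_, m) in best.items()}

patient_table = roll_up(mentions)
```

Here the older pathology mention wins over the newer consult note for ER, because note type is compared before date.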

ATAC-Seq Peak Annotation & Enrichment

Upload a MACS2 .narrowPeak file, annotate peaks with ChIPseeker, and run GO/KEGG/Reactome enrichment, with polished visuals and CSV exports.

  • 📄 Input: MACS2 .narrowPeak file
  • 🏷️ Annotation: ChIPseeker + TxDb
  • 📊 Views: pie charts, tables, barplots
  • 🧠 Pathways: enrichR (GO/KEGG/Reactome)
  • 🚀 Deploy: Docker & Singularity/HPC
View on GitHub
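For orientation, the narrowPeak input is a BED6+4 table whose last four columns are signal value, -log10(p), -log10(q), and the summit offset from the peak start (-1 when absent). A minimal parsing sketch, independent of the app's R code; the q-value cutoff and the midpoint fallback are assumptions for illustration.

```python
import csv
import io

NARROWPEAK_COLS = ["chrom", "start", "end", "name", "score", "strand",
                   "signal", "neglog10_p", "neglog10_q", "summit_offset"]

def read_narrowpeak(text, min_neglog10_q=2.0):
    """Parse narrowPeak (BED6+4) and keep peaks with -log10(q) >= the cutoff.

    Each kept peak gets an absolute summit (start + offset), falling back to the
    interval midpoint when the offset is -1. Peaks with q = -1 (not assigned)
    are dropped by any positive cutoff.
    """
    peaks = []
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith(("#", "track", "browser")):
            continue
        rec = dict(zip(NARROWPEAK_COLS, row))
        start, end = int(rec["start"]), int(rec["end"])
        q = float(rec["neglog10_q"])
        if q < min_neglog10_q:
            continue
        offset = int(rec["summit_offset"])
        summit = start + offset if offset >= 0 else (start + end) // 2
        peaks.append({"chrom": rec["chrom"], "start": start, "end": end,
                      "summit": summit, "neglog10_q": q})
    return peaks

# Toy peaks (hypothetical values).
NP = ("chr1\t100\t600\tpeak1\t500\t.\t8.2\t12.1\t9.7\t230\n"
      "chr2\t50\t250\tpeak2\t40\t.\t2.0\t1.5\t0.8\t-1\n"
      "chr3\t0\t200\tpeak3\t100\t.\t3.0\t4.0\t2.5\t-1\n")
peaks = read_narrowpeak(NP)
```

Summit-centred coordinates are a common choice for nearest-gene annotation, since peak widths vary a lot across a MACS2 run.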

miRNA Differential Expression & Enrichment

DESeq2-based miRNA analysis with PCA/UMAP, volcano & heatmaps, Enrichr enrichment, Random Forest classification, power analysis, and exports.

  • 🗂️ Inputs: miRNA counts + metadata
  • 🧬 DE: DESeq2 pipeline
  • 🧭 Dims: PCA & UMAP
  • 🌋 Plots: volcano, top-miRNA bars, heatmaps
  • 🧠 ML: RF classification + metrics
  • 🧪 Pathways: Enrichr (clusterProfiler fallback)
  • Power: sample size estimates
View on GitHub
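DESeq2's default normalization is the median-of-ratios estimator. The sketch below re-implements that estimator in plain Python to show what it computes; it is not the DESeq2 package, and for very sparse miRNA tables DESeq2 also offers alternatives such as `type = "poscounts"` in `estimateSizeFactors`.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors.

    `counts` maps sample -> list of counts in a shared feature order.
    Features with a zero in any sample are excluded, because their
    geometric mean across samples would be zero.
    """
    samples = list(counts)
    n_features = len(counts[samples[0]])
    log_geo_means = []  # (feature index, mean log count across samples)
    for i in range(n_features):
        vals = [counts[s][i] for s in samples]
        if all(v > 0 for v in vals):
            log_geo_means.append((i, sum(math.log(v) for v in vals) / len(vals)))
    if not log_geo_means:
        raise ValueError("every feature has a zero count; median-of-ratios is undefined")
    # Size factor = median over features of count / geometric mean, taken on the log scale.
    return {
        s: math.exp(median(math.log(counts[s][i]) - lg for i, lg in log_geo_means))
        for s in samples
    }

# Toy miRNA counts: s2 is sequenced twice as deeply as s1 (the zero feature is skipped).
sf = size_factors({"s1": [10, 20, 0, 40], "s2": [20, 40, 5, 80]})
```

Because every usable feature in `s2` is exactly double `s1`, the factors come out as 1/√2 and √2; normalized counts are raw counts divided by these factors.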

DNA Methylation App

Explore beta values; run differential methylation, enrichment, PCA/UMAP, Random Forest classification, and power analysis; and download every result. HPC-ready.

  • 🗂️ Input: CSV beta matrix (e.g., 450k)
  • 🧪 Differential methylation: probe-level stats + FDR
  • 🧭 Dims: PCA & UMAP
  • 🧠 ML: RF + AUC & importance
  • 🧪 Pathways: Enrichr KEGG/GO/Reactome
  • Power: Cohen’s d → n per group
  • 📦 Deploy: Singularity/Apptainer
View on GitHub
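The "Cohen's d → n per group" step can be approximated with the standard normal-approximation formula for a two-sided, two-sample comparison with equal group sizes, n = 2(z₁₋α/₂ + z_power)² / d². This is a sketch under that assumption; the app itself may use an exact t-test calculation, which gives a slightly larger n for small samples.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Per-group n for a two-sided, two-sample test at effect size d (normal approximation)."""
    z = NormalDist().inv_cdf
    return math.ceil(2 * (z(1 - alpha / 2) + z(power)) ** 2 / d ** 2)

def power_at_n(d, n, alpha=0.05):
    """Approximate power with n per group (ignores the negligible opposite tail)."""
    nd = NormalDist()
    return nd.cdf(abs(d) * math.sqrt(n / 2) - nd.inv_cdf(1 - alpha / 2))

n_medium = n_per_group(0.5)   # medium effect
n_large = n_per_group(0.8)    # large effect
```

For d = 0.5 at α = 0.05 and 80% power this gives 63 per group, and `power_at_n` inverts the same formula, crossing 0.80 between n = 62 and n = 63.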

CRISPR Mixscape Pipeline (Perturb-seq)

Single-cell CRISPR screen workflow using Seurat’s Mixscape: QC/normalization, UMAP, perturbation scoring, KO/NP/NT assignment, DE, and rich plots — with HPC support.

  • 🧪 Inputs: counts + metadata CSVs
  • 🧭 Dims: UMAP visualization
  • 🧮 Mixscape: perturbation scores & class labels
  • 🧬 DE: KO vs NT + downloads
  • 📈 Views: bar/violin/heatmaps, summaries
  • 📦 Deploy: Singularity + Slurm script
View on GitHub
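Mixscape's first step builds a local perturbation signature: each cell's expression minus the mean of its k nearest non-targeting (NT) neighbors in a reduced space such as PCA. The sketch below shows that idea in plain Python; it is not Seurat's `CalcPerturbSig`, the toy values are hypothetical, and unlike some implementations it does not exclude an NT cell from its own neighbor set.

```python
import math

def perturbation_signature(embedding, expression, labels, k=2, nt_label="NT"):
    """Subtract, for every cell, the mean expression of its k nearest NT cells.

    Neighbors are found by Euclidean distance in `embedding` (e.g. PCA
    coordinates). All inputs are lists indexed by cell.
    """
    nt_cells = [i for i, lab in enumerate(labels) if lab == nt_label]
    if len(nt_cells) < k:
        raise ValueError("need at least k non-targeting cells")
    signatures = []
    for i, x in enumerate(embedding):
        # Stable sort: ties keep the original NT cell order.
        nearest = sorted(nt_cells, key=lambda j: math.dist(x, embedding[j]))[:k]
        n_genes = len(expression[i])
        nt_mean = [sum(expression[j][g] for j in nearest) / k for g in range(n_genes)]
        signatures.append([expression[i][g] - nt_mean[g] for g in range(n_genes)])
    return signatures

# Toy data: three NT cells and one targeted cell ("GENE_A" is a hypothetical guide label).
embedding = [[0, 0], [0, 1], [10, 10], [0, 0.5]]
expression = [[1, 2], [3, 2], [100, 100], [10, 2]]
labels = ["NT", "NT", "NT", "GENE_A"]
sigs = perturbation_signature(embedding, expression, labels, k=2)
```

The distant NT cell at (10, 10) does not contaminate the targeted cell's baseline, which is the point of a local rather than global control. Mixscape then scores these signatures and fits mixture models to separate KO from NP cells; that step is not shown.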

GWAS Analysis App

A full-stack, no-code Shiny app for Genome-Wide Association Studies (GWAS) that works directly from raw VCF files, with no PLINK step needed. Upload your VCF, phenotype, and covariate tables to begin.

  • 🧪 QC Filters: MAF, allele frequency, call rate, HWE p-value thresholds
  • 🧮 GWAS Engine: Logistic regression with Bonferroni correction support
  • 📊 Visualization: PCA, UMAP, QQ plot, Manhattan plot with region zoom
  • 🤖 Machine Learning: Random Forest with AUC, importance, ROC
  • 🧬 SNP-to-Gene Mapping: Map significant SNPs to nearest genes
  • 🧠 Enrichment: KEGG/GO/Reactome via enrichR
  • Power Analysis: Cohen’s d-based observed power & sample size curve
  • 📦 Export Everything: GWAS tables, enrichment results, ML metrics, more
View on GitHub
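The per-SNP QC filters and the Bonferroni cutoff can be sketched as below. This is an illustration, not the app's engine: it assumes genotypes have already been converted from VCF GT fields into 0/1/2 alternate-allele dosages (None for missing calls), and it uses the asymptotic 1-df chi-square Hardy-Weinberg test rather than an exact test.

```python
import math

def snp_qc(genotypes):
    """Call rate, minor allele frequency, and HWE p-value for one biallelic SNP."""
    called = [g for g in genotypes if g is not None]
    call_rate = len(called) / len(genotypes)
    n = len(called)
    if n == 0:
        return {"call_rate": 0.0, "maf": float("nan"), "hwe_p": float("nan")}
    observed = [called.count(0), called.count(1), called.count(2)]
    p_alt = (observed[1] + 2 * observed[2]) / (2 * n)
    q = 1 - p_alt
    maf = min(p_alt, q)
    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = [n * q * q, 2 * n * p_alt * q, n * p_alt * p_alt]
    if min(expected) == 0:
        hwe_p = 1.0  # monomorphic SNP: the test is uninformative
    else:
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        hwe_p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}

def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-SNP significance cutoff controlling family-wise error at alpha."""
    return alpha / n_tests

# Toy SNP: 50 hom-ref, 40 het, 10 hom-alt, 2 missing calls.
qc = snp_qc([0] * 50 + [1] * 40 + [2] * 10 + [None] * 2)
```

For one million tests, `bonferroni_threshold(1_000_000)` gives 5e-8, the conventional genome-wide significance line drawn on Manhattan plots.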

NEW

Single-cell RNA-Seq App

Interactive Seurat-based app to explore scRNA-seq data, run DE, pathway enrichment, classification, power analyses, and download publication-ready tables/plots.

  • 🗂️ Upload: counts CSV (genes × cells/samples) + metadata CSV (matching names)
  • 🧱 Create Seurat Object: load & normalize your data in-app
  • 🧭 Dimensionality Reduction: PCA & UMAP for cluster/pattern visualization
  • 🧬 Differential Expression: by condition and by cell type; find condition-only DE genes
  • 🧠 Pathway Analysis: enrich DE gene sets across multiple databases
  • 🌋 Volcano Plot: publication-style volcano for condition-only DE
  • 🤖 Feature Selection & Classification: Random Forest markers, ROC, importance
  • 📈 Power Analysis: estimate power and minimum sample size for key DE genes
  • 🔥 Heatmaps: top features and group differences
  • ⬇️ Downloads: export all result tables and figures
View on GitHub

Tips: CSV format only; counts & metadata must match by name. For best results, use quality-filtered data (see insurance_policy_script.R).
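Before uploading, a quick pre-flight check can confirm that names line up. This sketch assumes a layout the app description implies but does not spell out: gene IDs in the first column of the counts CSV with one column per cell, and cell names in the first column of the metadata CSV. The toy names are hypothetical.

```python
import csv
import io

def check_names(counts_csv, metadata_csv):
    """Compare counts column names (cells/samples) with metadata row names."""
    counts_header = next(csv.reader(io.StringIO(counts_csv)))
    cells_in_counts = counts_header[1:]  # first column holds gene IDs
    meta_rows = list(csv.reader(io.StringIO(metadata_csv)))
    cells_in_meta = [row[0] for row in meta_rows[1:] if row]  # skip header row
    return {
        "missing_from_metadata": sorted(set(cells_in_counts) - set(cells_in_meta)),
        "missing_from_counts": sorted(set(cells_in_meta) - set(cells_in_counts)),
        "same_order": cells_in_counts == cells_in_meta,
    }

report = check_names(
    "gene,cellA,cellB,cellC\nGAPDH,5,3,0\n",
    "cell,condition\ncellB,ctrl\ncellA,ctrl\ncellD,trt\n",
)
```

Reporting mismatches in both directions, plus whether the order agrees, makes it easy to spot a renamed or dropped cell before building the Seurat object.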

About Me


I’m John Caperella, a bioinformatics developer passionate about turning complex genomic data into usable insights. I build clear, scalable tools for single-cell RNA-seq, CRISPR Mixscape, and omics visualization in R, Python, and Shiny, helping researchers and data scientists get to answers faster.

  • 🧬 Single-cell & CRISPR analytics (Seurat/Mixscape)
  • 📊 Reproducible visualization apps (Docker/Singularity/HPC)
  • 🤖 Exploring LLMs for genomics workflows

Usage Analytics

This dashboard is powered by Google Analytics 4 and Looker Studio. It shows historical click activity for each app.
