Welcome
Pick a tab above to view a one-page overview of each app, including what it does, how to start, and links to the code. Each section follows the same clean card layout for quick scanning.
JC Enrichment Network Studio (Dash App)
Interactive Dash web app for exploring gene ↔ pathway/term enrichment results as a bipartite network. Upload long-format enrichment memberships, filter and explore hubs, view summary stats, and export node/edge tables for downstream analysis (e.g., Cytoscape).
- 📄 Input: long-format CSV (one row per gene–term membership edge)
- 🧬 Bipartite graph: genes ↔ pathways/terms
- 🎚️ Filters: search, min degree, min edge weight, max groups, layout mode
- 📊 Stats: node/edge counts, components, top genes/terms by degree
- ⚖️ Weighted edges: supports padj/FDR; auto-converts to -log10(p) for plotting
- ⬇️ Export: download nodes.csv and edges.csv
- ☁️ Live deployment: Google Cloud Run
Tip: If you have a standard enrichment table (term + list of genes), convert it to long format first (one gene per row per term). This app is designed for clean network building and fast exploration.
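The long-format conversion described in the tip can be sketched in a few lines of Python. This is a minimal illustration, not the app's own code: the column names (term, genes, padj), the ";" gene separator, and the helper name to_long_format are all assumptions made for the example.

```python
import math
from collections import Counter

def to_long_format(rows, gene_sep=";"):
    """Expand term-level rows into one record per gene-term membership.

    `rows` is an iterable of dicts with hypothetical keys
    'term', 'genes' (delimited string) and 'padj'.
    """
    long_rows = []
    for row in rows:
        padj = float(row["padj"])
        # Edge weight for plotting: -log10(padj); clamp to avoid log10(0).
        weight = -math.log10(max(padj, 1e-300))
        for gene in row["genes"].split(gene_sep):
            gene = gene.strip()
            if gene:
                long_rows.append(
                    {"gene": gene, "term": row["term"], "padj": padj, "weight": weight}
                )
    return long_rows

table = [
    {"term": "Apoptosis", "genes": "TP53;BAX;CASP3", "padj": "0.001"},
    {"term": "Cell cycle", "genes": "TP53;CDK1", "padj": "0.01"},
]
edges = to_long_format(table)

# Gene degree = number of terms a gene belongs to; useful for a min-degree filter.
gene_degree = Counter(edge["gene"] for edge in edges)
```

Each resulting record is one gene-term edge of the bipartite graph, and gene_degree gives the counts a minimum-degree filter would act on.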
Bulk RNA-Seq Analyzer
Interactive Shiny app for bulk RNA-seq: differential expression, PCA/UMAP, volcano & heatmaps, enrichment, Random Forest, power analysis, and downloadable results.
- 🗂️ Inputs: counts matrix + phenotype CSV
- 🧬 DE: limma-voom workflow
- 🧭 Dims: PCA & UMAP
- 🌋 Plots: volcano, heatmap, interactive tables
- 🧠 ML: Random Forest + ROC/AUC
- 🧪 Pathways: Enrichr (KEGG/GO/Reactome)
- ⚡ Power: sample size/power curves
- 📦 Runs anywhere: Docker/Singularity
Counts_matrix_Nextflow (RNA-seq Pipeline)
Portable Nextflow DSL2 RNA-seq workflow: QC → STAR alignment → gene counts → Salmon TPM → BigWig coverage → matrix merge + MultiQC summary. Designed to be minimal, readable, and robust.
- 📄 Inputs: paired-end FASTQs via samplesheet CSV
- 🧬 Core steps: fastp → STAR → featureCounts → Salmon → MultiQC
- 📈 Outputs: gene counts matrix, TPM matrix, BigWig tracks, QC report
- 🧱 Indexes: auto-builds STAR + Salmon indexes (per run)
- 🛡️ Robustness: skips coverage gracefully for zero-mapped samples
- 📦 Runs anywhere: Conda, Docker, or Singularity/Apptainer
- 🧪 Great for: demos, infra tests, portfolio/template pipelines
Quickstart
Docker
docker build -t rnaseq-pipeline .
nextflow run main.nf -profile docker \
--samplesheet samples.csv \
--ref genome.fa \
--gtf genes.gtf \
--transcripts transcripts.fa
Singularity/Apptainer
singularity build containers/rnaseq-pipeline.sif docker-daemon://rnaseq-pipeline:latest
nextflow run main.nf -profile singularity \
--samplesheet samples.csv \
--ref genome.fa \
--gtf genes.gtf \
--transcripts transcripts.fa
Key Outputs
results/
├── qc/ (fastp)
├── ref/ (STAR + Salmon indexes)
├── bam/ (sorted BAM + BAI + flagstat)
├── bigwig/ (coverage tracks)
├── counts_per_sample/ (per-sample featureCounts)
├── counts_matrix.tsv (gene count matrix)
├── salmon_tpm_matrix.tsv (transcript TPM matrix)
└── multiqc_report.html (summary report)
Notes: the pipeline is intentionally simple, uses explicit channels, and guards against common failure modes. It is easy to extend with downstream DESeq2/edgeR analysis.
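To illustrate the matrix-merge step, here is a hedged Python sketch that combines per-sample featureCounts tables into one gene count matrix. It relies only on the standard featureCounts layout (a leading "#" comment line, then Geneid, Chr, Start, End, Strand, Length, and the count column). The function names and the in-memory demo files are assumptions for the example, and the pipeline's own merge script may work differently.

```python
import csv
import io

def read_featurecounts(handle):
    """Return {gene_id: count} from one featureCounts table."""
    # Skip the leading '#' comment line that featureCounts writes.
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # header row: Geneid, Chr, Start, End, Strand, Length, <bam>
    return {row[0]: int(row[6]) for row in reader if row}

def merge_counts(per_sample):
    """Merge {sample: {gene: count}} into a header plus one row per gene."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    header = ["gene_id"] + samples
    # Genes absent from a sample are filled with 0.
    rows = [[g] + [per_sample[s].get(g, 0) for s in samples] for g in genes]
    return header, rows

# Tiny in-memory stand-ins for two per-sample files (hypothetical data).
demo_a = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\ta.bam\n"
    "geneA\tchr1\t1\t100\t+\t100\t12\n"
    "geneB\tchr1\t200\t300\t-\t101\t0\n"
)
demo_b = io.StringIO(
    "# Program:featureCounts\n"
    "Geneid\tChr\tStart\tEnd\tStrand\tLength\tb.bam\n"
    "geneA\tchr1\t1\t100\t+\t100\t7\n"
    "geneB\tchr1\t200\t300\t-\t101\t3\n"
)
header, rows = merge_counts(
    {"A": read_featurecounts(demo_a), "B": read_featurecounts(demo_b)}
)
```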
Enrichment Analysis LLM Triage (Flask App)
A lightweight Flask web app that takes enrichment results (CSV) and generates a structured, human-readable triage report — highlighting likely biological drivers, reactive programs, potential confounders, and suggested follow-up experiments.
- 📄 Input: enrichment CSV (terms + scores + genes/overlap fields)
- 🧠 LLM reasoning: summarizes key programs and flags confounding patterns
- 🧪 Follow-ups: proposes targeted experiments with readouts + controls
- 📑 PDF report: generates a clean downloadable triage PDF
- 📦 Deploy: Docker + Apptainer/Singularity (HPC-friendly)
Why it matters: enrichment tables are easy to generate but hard to interpret. This tool helps translate “significant pathways” into an actionable short list of mechanisms and concrete next experiments — without drowning the user in jargon.
Breast Cancer NLP Phenotyper (Dash + medspaCy)
A lightweight, rule-based clinical NLP dashboard for extracting key breast cancer phenotypes from free-text notes (e.g., pathology, consults). Outputs include a clean patient-level table plus auditable evidence mentions with snippets so results can be reviewed and trusted.
- 📄 Input: upload multiple .txt notes + optional mapping CSV (note → patient/date/type)
- 🧠 NLP engine: spaCy + medspaCy patterns (deterministic extraction)
- 🧬 Phenotypes: ER/PR status (+ % if present), HER2 (IHC/FISH → final), Ki-67, histology, grade, stage
- 🔎 Evidence: each extracted value is backed by mention-level snippets for auditing
- 🧮 Aggregation: note-type/date precedence rules roll note-level data to patient-level output
- 📦 Portable: Docker + Apptainer/Singularity supported (HPC-friendly)
Note: This is an MVP designed for transparency and portability — ideal for demos, iteration, and extension into richer rule sets or model-assisted extraction later.
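The idea of deterministic extraction with auditable evidence and precedence-based aggregation can be shown with a small sketch. This example uses plain regular expressions rather than medspaCy, so it is a simplified stand-in and not the app's rule set. The pattern, the note fields (type, date, text), and the precedence order are all hypothetical.

```python
import re

# Hypothetical, deliberately small pattern; real rule sets are far richer.
ER_PATTERN = re.compile(
    r"\b(?:ER|estrogen receptor)\b[^.\n]{0,40}?\b(positive|negative)\b",
    re.IGNORECASE,
)

def extract_er(note_text, window=30):
    """Return ER status mentions, each with a text snippet for auditing."""
    mentions = []
    for m in ER_PATTERN.finditer(note_text):
        start = max(m.start() - window, 0)
        end = min(m.end() + window, len(note_text))
        mentions.append(
            {"value": m.group(1).lower(), "snippet": note_text[start:end].strip()}
        )
    return mentions

def patient_level(notes, precedence=("pathology", "consult")):
    """Take the first ER mention from the highest-precedence, newest note."""
    rank = {note_type: i for i, note_type in enumerate(precedence)}
    ordered = sorted(notes, key=lambda n: n["date"], reverse=True)  # newest first
    ordered.sort(key=lambda n: rank.get(n["type"], len(rank)))      # stable sort
    for note in ordered:
        mentions = extract_er(note["text"])
        if mentions:
            return mentions[0]
    return None

notes = [
    {"type": "consult", "date": "2024-03-01",
     "text": "ER negative per outside report."},
    {"type": "pathology", "date": "2024-01-15",
     "text": "Invasive ductal carcinoma. ER: positive (95%)."},
]
result = patient_level(notes)
```

Here the pathology note wins over the more recent consult note because of the assumed precedence order, and the returned snippet shows the text that supports the value.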
ATAC-Seq Peak Annotation & Enrichment
Upload MACS2 .narrowPeak, annotate with ChIPseeker, and run GO/KEGG/Reactome enrichment with slick visuals and CSV exports.
- 📄 Input: MACS2 .narrowPeak file
- 🏷️ Annotation: ChIPseeker + TxDb
- 📊 Views: pie charts, tables, barplots
- 🧠 Pathways: enrichR (GO/KEGG/Reactome)
- 🚀 Deploy: Docker & Singularity/HPC
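For readers unfamiliar with the input, the sketch below parses the ten tab-separated columns of the narrowPeak format (BED6+4) in Python. It only illustrates the file layout; the app itself reads peaks in R, and the Peak type and parse_narrowpeak helper are names invented for this example.

```python
from typing import NamedTuple

class Peak(NamedTuple):
    chrom: str
    start: int      # 0-based start
    end: int
    name: str
    score: int
    strand: str
    signal: float   # signalValue (enrichment)
    p_value: float  # -log10 p-value, -1 if not assigned
    q_value: float  # -log10 q-value, -1 if not assigned
    summit: int     # summit offset from start, -1 if not called

def parse_narrowpeak(lines):
    """Parse narrowPeak lines into Peak records, skipping track/comment lines."""
    peaks = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith(("track", "browser", "#")):
            continue
        f = line.split("\t")
        if len(f) < 10:
            raise ValueError(f"expected 10 columns, got {len(f)}: {line!r}")
        peaks.append(
            Peak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                 float(f[6]), float(f[7]), float(f[8]), int(f[9]))
        )
    return peaks

# One made-up peak line for demonstration.
demo = ["chr1\t1000\t1500\tpeak_1\t250\t.\t8.5\t30.2\t27.1\t220\n"]
peaks = parse_narrowpeak(demo)
summit_position = peaks[0].start + peaks[0].summit
```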
miRNA Differential Expression & Enrichment
DESeq2-based miRNA analysis with PCA/UMAP, volcano & heatmaps, Enrichr enrichment, Random Forest classification, power analysis, and exports.
- 🗂️ Inputs: miRNA counts + metadata
- 🧬 DE: DESeq2 pipeline
- 🧭 Dims: PCA & UMAP
- 🌋 Plots: volcano, top-miRNA bars, heatmaps
- 🧠 ML: RF classification + metrics
- 🧪 Pathways: Enrichr (clusterProfiler fallback)
- ⚡ Power: sample size estimates
DNA Methylation App
Explore beta values, run differential methylation, enrichment, PCA/UMAP, Random Forest, power analysis, and download everything — HPC-ready.
- 🗂️ Input: CSV beta matrix (e.g., 450k)
- 🧪 DE: probe-level stats + FDR
- 🧭 Dims: PCA & UMAP
- 🧠 ML: RF + AUC & importance
- 🧪 Pathways: Enrichr KEGG/GO/Reactome
- ⚡ Power: Cohen’s d → n per group
- 📦 Deploy: Singularity/Apptainer
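The "Cohen's d → n per group" step can be approximated with the standard normal-approximation formula for a two-sided, two-sample comparison, n = 2(z_{1-α/2} + z_{power})² / d². The Python sketch below is an assumption about the general approach rather than the app's exact method; t-based calculations give slightly larger values.

```python
import math
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided two-sample test.

    Uses the normal approximation n = 2 * (z_alpha + z_beta)^2 / d^2,
    where d is Cohen's d (standardized mean difference).
    """
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = z.inv_cdf(power)           # quantile matching the target power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / d ** 2)

medium_effect_n = n_per_group(0.5)  # 63 per group under this approximation
large_effect_n = n_per_group(1.0)
```

Halving the effect size roughly quadruples the required sample size, which is why small methylation differences need large cohorts.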
CRISPR Mixscape Pipeline (Perturb-seq)
Single-cell CRISPR screen workflow using Seurat’s Mixscape: QC/normalization, UMAP, perturbation scoring, KO/NP/NT assignment, DE, and rich plots — with HPC support.
- 🧪 Inputs: counts + metadata CSVs
- 🧭 Dims: UMAP visualization
- 🧮 Mixscape: perturbation scores & class labels
- 🧬 DE: KO vs NT + downloads
- 📈 Views: bar/violin/heatmaps, summaries
- 📦 Deploy: Singularity + Slurm script
GWAS Analysis App
A full-stack, no-code Shiny app for Genome-Wide Association Studies (GWAS) using raw VCF files—no PLINK needed. Upload your VCF, phenotype, and covariate table to begin.
- 🧪 QC Filters: MAF, allele frequency, call rate, HWE p-value thresholds
- 🧮 GWAS Engine: Logistic regression with Bonferroni correction support
- 📊 Visualization: PCA, UMAP, QQ plot, Manhattan plot with region zoom
- 🤖 Machine Learning: Random Forest with AUC, importance, ROC
- 🧬 SNP-to-Gene Mapping: Map significant SNPs to nearest genes
- 🧠 Enrichment: KEGG/GO/Reactome via enrichR
- ⚡ Power Analysis: Cohen’s d-based observed power & sample size curve
- 📦 Export Everything: GWAS tables, enrichment results, ML metrics, more
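Two of the steps listed above, per-SNP QC metrics and Bonferroni correction, are simple enough to sketch directly. The Python below is illustrative only: the dosage encoding (0/1/2 alternate alleles, None for a missing call), the function names, and the example p-values are assumptions, not the app's code.

```python
def minor_allele_frequency(dosages):
    """MAF from alt-allele dosages (0, 1, 2); None marks a missing call."""
    called = [g for g in dosages if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)

def call_rate(dosages):
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in dosages) / len(dosages)

def bonferroni_hits(p_values, alpha=0.05):
    """Return the Bonferroni threshold and the SNPs that pass it."""
    threshold = alpha / len(p_values)
    hits = [snp for snp, p in p_values.items() if p < threshold]
    return threshold, hits

genotypes = [0, 0, 1, None]  # four samples, one missing call
maf = minor_allele_frequency(genotypes)
rate = call_rate(genotypes)

# Hypothetical association p-values for four SNPs.
threshold, hits = bonferroni_hits(
    {"rs1": 1e-9, "rs2": 0.03, "rs3": 0.2, "rs4": 0.01}
)
```

With four tests the threshold is 0.05 / 4, so only SNPs with p below 0.0125 are reported as significant.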
Single-cell RNA-Seq App
Interactive Seurat-based app to explore scRNA-seq data, run DE, pathway enrichment, classification, power analyses, and download publication-ready tables/plots.
- 🗂️ Upload: counts CSV (genes × cells/samples) + metadata CSV (matching names)
- 🧱 Create Seurat Object: load & normalize your data in-app
- 🧭 Dimensionality Reduction: PCA & UMAP for cluster/pattern visualization
- 🧬 Differential Expression: by condition and by cell type; find condition-only DE genes
- 🧠 Pathway Analysis: enrich DE gene sets across multiple databases
- 🌋 Volcano Plot: publication-style volcano for condition-only DE
- 🤖 Feature Selection & Classification: Random Forest markers, ROC, importance
- 📈 Power Analysis: estimate power and minimum sample size for key DE genes
- 🔥 Heatmaps: top features and group differences
- ⬇️ Downloads: export all result tables and figures
Tips: CSV format only; counts & metadata must match by name. For best results, use quality-filtered data (see insurance_policy_script.R).
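A quick pre-upload check that counts and metadata match by name could look like the Python sketch below. The layout is an assumption: the counts CSV is taken to have gene names in the first column and cell/sample names in the header, and the metadata CSV to have those names in its first column. The helper name check_names_match is hypothetical.

```python
import csv
import io

def check_names_match(counts_csv, metadata_csv):
    """Compare counts column names with metadata row names."""
    counts_header = next(csv.reader(counts_csv))
    cells = counts_header[1:]           # first column holds gene names
    meta_reader = csv.reader(metadata_csv)
    next(meta_reader)                   # skip the metadata header row
    meta_names = [row[0] for row in meta_reader if row]
    return {
        "missing_in_metadata": sorted(set(cells) - set(meta_names)),
        "missing_in_counts": sorted(set(meta_names) - set(cells)),
        "same_order": cells == meta_names,
    }

# Small in-memory examples standing in for the two uploaded files.
counts = io.StringIO("gene,cell1,cell2,cell3\nGAPDH,5,0,2\n")
meta = io.StringIO("cell,condition\ncell1,ctrl\ncell2,treated\ncell4,treated\n")
report = check_names_match(counts, meta)
```

An empty report (both lists empty and same_order true) suggests the two files are safe to upload together.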
About Me
I’m John Caperella, a bioinformatics developer passionate about turning complex genomic data into usable insights. I build clear, scalable tools for single-cell RNA-seq, CRISPR Mixscape, and omics visualization in R, Python, and Shiny—helping researchers and data scientists get to answers faster.
- 🧬 Single-cell & CRISPR analytics (Seurat/Mixscape)
- 📊 Reproducible visualization apps (Docker/Singularity/HPC)
- 🤖 Exploring LLMs for genomics workflows
Usage Analytics
This dashboard is powered by Google Analytics 4 and Looker Studio. It shows historical click activity for each app.