Welcome
Pick a tab above to view a one-page overview of each app, including what it does, how to start, and links to the code. Each section follows the same clean card layout for quick scanning.
JC Enrichment Network Studio (Dash App)
Interactive Dash web app for exploring gene ↔ pathway/term enrichment results as a bipartite network. Upload long-format enrichment memberships, filter and explore hubs, view summary stats, and export node/edge tables for downstream analysis (e.g., Cytoscape).
- 📄 Input: long-format CSV (one row per gene–term membership edge)
- 🧬 Bipartite graph: genes ↔ pathways/terms
- 🎚️ Filters: search, min degree, min edge weight, max groups, layout mode
- 📊 Stats: node/edge counts, components, top genes/terms by degree
- ⚖️ Weighted edges: supports padj/FDR; auto-converts to -log10(p) for plotting
- ⬇️ Export: download nodes.csv and edges.csv
- ☁️ Live deployment: Google Cloud Run
Tip: If you have a standard enrichment table (term + list of genes), convert it to long format first (one gene per row per term). This app is designed for clean network building and fast exploration.
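The conversion described in the tip can be sketched in a few lines of Python. This is a minimal illustration, not the app's own code: the column names (term, genes, padj), the ";" separator, and the wide_to_long helper are assumptions chosen for the example. It also shows the -log10 weighting mentioned in the feature list.

```python
import csv
import io
import math


def wide_to_long(rows, term_col="term", genes_col="genes", padj_col="padj", sep=";"):
    """Expand one row per term into one row per gene-term membership edge."""
    edges = []
    for row in rows:
        padj = float(row[padj_col])
        # Clamp to avoid log10(0) when a tool reports padj as exactly 0.
        weight = -math.log10(max(padj, 1e-300))
        for gene in row[genes_col].split(sep):
            gene = gene.strip()
            if gene:
                edges.append(
                    {
                        "gene": gene,
                        "term": row[term_col],
                        "padj": padj,
                        "weight": round(weight, 4),
                    }
                )
    return edges


# Hypothetical wide-format enrichment table (term + gene list per row).
wide_csv = """term,genes,padj
Apoptosis,TP53;BAX;CASP3,0.001
Cell cycle,TP53;CDK1,0.01
"""

rows = list(csv.DictReader(io.StringIO(wide_csv)))
long_edges = wide_to_long(rows)
for edge in long_edges:
    print(edge["gene"], edge["term"], edge["weight"])
```

Each output row is one gene-term edge, which is the shape the upload expects; a gene that appears under several terms (TP53 here) simply produces several rows.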
Bulk RNA-Seq Analyzer
Interactive Shiny app for bulk RNA-seq: differential expression, PCA/UMAP, volcano & heatmaps, enrichment, Random Forest, power analysis, and downloadable results.
- 🗂️ Inputs: counts matrix + phenotype CSV
- 🧬 DE: limma-voom workflow
- 🧭 Dims: PCA & UMAP
- 🌋 Plots: volcano, heatmap, interactive tables
- 🧠 ML: Random Forest + ROC/AUC
- 🧪 Pathways: Enrichr (KEGG/GO/Reactome)
- ⚡ Power: sample size/power curves
- designed for HPC environments using Singularity
Paired FASTQ QC (WDL + Cromwell)
Containerized WDL workflow executed with Cromwell for paired-end FASTQ QC. Runs FastQC on R1/R2 and generates a single MultiQC report across samples, plus a merged read-count table.
- 🧬 Inputs: paired FASTQs (R1/R2) + sample IDs via JSON
- 🔍 QC: FastQC per read pair
- 📊 Multi-sample summary: MultiQC report
- 🧾 Counts: merged per-sample read counts (R1/R2)
- 📦 Reproducible: Docker runtime inside WDL tasks
Use case: a compact, portfolio-friendly example of WDL/Cromwell workflow wiring, containerized execution, and QC aggregation across multiple samples.
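As a rough illustration of the merged read-count table, the sketch below counts FASTQ records (four lines each) for R1 and R2 and collects them per sample. It is an assumption-laden stand-in for whatever the workflow's task actually runs: the function names, the output column names, and the demo file names are hypothetical, and the counter assumes well-formed four-line records with no trailing blank lines.

```python
import gzip
import os
import tempfile


def count_reads(fastq_path):
    """Count reads in a FASTQ file (4 lines per record); handles .gz input."""
    opener = gzip.open if fastq_path.endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        n_lines = sum(1 for _ in handle)
    if n_lines % 4 != 0:
        raise ValueError(f"{fastq_path}: line count {n_lines} is not a multiple of 4")
    return n_lines // 4


def merged_counts(samples):
    """samples maps sample_id -> (r1_path, r2_path); returns one row per sample."""
    table = []
    for sample_id, (r1, r2) in sorted(samples.items()):
        table.append(
            {"sample": sample_id, "r1_reads": count_reads(r1), "r2_reads": count_reads(r2)}
        )
    return table


# Demo with tiny synthetic files: one plain, one gzipped.
with tempfile.TemporaryDirectory() as tmp:
    r1 = os.path.join(tmp, "s1_R1.fastq")
    r2 = os.path.join(tmp, "s1_R2.fastq.gz")
    record = "@read\nACGT\n+\nIIII\n"
    with open(r1, "w") as fh:
        fh.write(record * 3)
    with gzip.open(r2, "wt") as fh:
        fh.write(record * 3)
    table = merged_counts({"s1": (r1, r2)})

print(table)
```

A mismatch between r1_reads and r2_reads for the same sample is a quick signal that a pair is truncated or mislabeled.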
Counts_matrix_Nextflow (RNA-seq Pipeline)
Portable Nextflow DSL2 RNA-seq workflow: QC → STAR alignment → gene counts → Salmon TPM → BigWig coverage → matrix merge + MultiQC summary. Designed to be minimal, readable, and robust.
- 📄 Inputs: paired-end FASTQs via samplesheet CSV
- 🧬 Core steps: fastp → STAR → featureCounts → Salmon → MultiQC
- 📈 Outputs: gene counts matrix, TPM matrix, BigWig tracks, QC report
- 🧱 Indexes: auto-builds STAR + Salmon indexes (per run)
- 🛡️ Robustness: skips coverage gracefully for zero-mapped samples
- 📦 Runs anywhere: Conda, Docker, or Singularity/Apptainer
- 🧪 Great for: demos, infra tests, portfolio/template pipelines
Quickstart
Docker
docker build -t rnaseq-pipeline .
nextflow run main.nf -profile docker \
--samplesheet samples.csv \
--ref genome.fa \
--gtf genes.gtf \
--transcripts transcripts.fa
Singularity/Apptainer
singularity build containers/rnaseq-pipeline.sif docker-daemon://rnaseq-pipeline:latest
nextflow run main.nf -profile singularity \
--samplesheet samples.csv \
--ref genome.fa \
--gtf genes.gtf \
--transcripts transcripts.fa
Key Outputs
results/
├── qc/ (fastp)
├── ref/ (STAR + Salmon indexes)
├── bam/ (sorted BAM + BAI + flagstat)
├── bigwig/ (coverage tracks)
├── counts_per_sample/ (per-sample featureCounts)
├── counts_matrix.tsv (gene count matrix)
├── salmon_tpm_matrix.tsv (transcript TPM matrix)
└── multiqc_report.html (summary report)
Notes: intentionally simple; explicit channels; guards against common failure modes. Easy to extend into DESeq2/edgeR downstream analysis.
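The matrix-merge step can be pictured with the sketch below. It is not the pipeline's own merge script: parse_featurecounts, merge_counts, and the demo tables are hypothetical, and the parser assumes each per-sample file came from a single-BAM featureCounts run (a leading "#" comment line, then Geneid, Chr, Start, End, Strand, Length, and one count column).

```python
import csv


def parse_featurecounts(text):
    """Return {gene_id: count} from one single-BAM featureCounts table."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.reader(lines, delimiter="\t")
    next(reader)  # skip the column header row
    return {fields[0]: int(fields[-1]) for fields in reader}


def merge_counts(per_sample):
    """per_sample maps sample_id -> {gene_id: count}; returns a list-of-rows matrix."""
    samples = sorted(per_sample)
    genes = sorted(set().union(*(per_sample[s] for s in samples)))
    matrix = [["gene_id"] + samples]
    for gene in genes:
        # Genes missing from a sample are filled with 0.
        matrix.append([gene] + [per_sample[s].get(gene, 0) for s in samples])
    return matrix


def fake_table(bam, rows):
    """Build a tiny featureCounts-style table for the demo."""
    header = ["Geneid", "Chr", "Start", "End", "Strand", "Length", bam]
    lines = ["# Program:featureCounts (demo)", "\t".join(header)]
    lines += ["\t".join([g, "chr1", "1", "100", "+", "100", str(c)]) for g, c in rows]
    return "\n".join(lines)


per_sample = {
    "s1": parse_featurecounts(fake_table("s1.bam", [("geneA", 5), ("geneB", 0)])),
    "s2": parse_featurecounts(fake_table("s2.bam", [("geneA", 7), ("geneB", 2)])),
}
matrix = merge_counts(per_sample)
for row in matrix:
    print("\t".join(str(x) for x in row))
```

Sorting both samples and genes keeps the matrix layout stable between runs, which makes diffs of counts_matrix.tsv meaningful.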
Enrichment Analysis LLM Triage (Flask App)
A lightweight Flask web app that takes enrichment results (CSV) and generates a structured, human-readable triage report — highlighting likely biological drivers, reactive programs, potential confounders, and suggested follow-up experiments.
- 📄 Input: enrichment CSV (terms + scores + genes/overlap fields)
- 🧠 LLM reasoning: summarizes key programs and flags confounding patterns
- 🧪 Follow-ups: proposes targeted experiments with readouts + controls
- 📑 PDF report: generates a clean downloadable triage PDF
- 📦 Deploy: Docker + Apptainer/Singularity (HPC-friendly)
Why it matters: enrichment tables are easy to generate but hard to interpret. This tool helps translate “significant pathways” into an actionable short list of mechanisms and concrete next experiments — without drowning the user in jargon.
Breast Cancer NLP Phenotyper (Dash + medspaCy)
A lightweight, rule-based clinical NLP dashboard for extracting key breast cancer phenotypes from free-text notes (e.g., pathology, consults). Outputs include a clean patient-level table plus auditable evidence mentions with snippets so results can be reviewed and trusted.
- 📄 Input: upload multiple .txt notes + optional mapping CSV (note → patient/date/type)
- 🧠 NLP engine: spaCy + medspaCy patterns (deterministic extraction)
- 🧬 Phenotypes: ER/PR status (+ % if present), HER2 (IHC/FISH → final), Ki-67, histology, grade, stage
- 🔎 Evidence: each extracted value is backed by mention-level snippets for auditing
- 🧮 Aggregation: note-type/date precedence rules roll note-level data to patient-level output
- designed for HPC environments using Singularity
Note: This is an MVP designed for transparency and portability — ideal for demos, iteration, and extension into richer rule sets or model-assisted extraction later.
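To give a feel for deterministic extraction with auditable evidence, here is a deliberately simplified sketch that uses plain regular expressions instead of spaCy/medspaCy. The patterns, the 40-character window, and the extract_mentions helper are assumptions for illustration only; the app's real rule set is not shown here and will differ.

```python
import re

# Hypothetical rules: a receptor name followed, within the same sentence,
# by a status word. Real clinical rules need negation and context handling.
RULES = {
    "ER": re.compile(
        r"\b(?:ER|estrogen receptor)\b[^.\n]{0,40}?\b(positive|negative)\b", re.IGNORECASE
    ),
    "PR": re.compile(
        r"\b(?:PR|progesterone receptor)\b[^.\n]{0,40}?\b(positive|negative)\b", re.IGNORECASE
    ),
    "HER2": re.compile(
        r"\bHER2\b[^.\n]{0,40}?\b(positive|negative|equivocal)\b", re.IGNORECASE
    ),
}


def extract_mentions(note_text, context=20):
    """Return one record per match, each carrying a snippet for manual review."""
    mentions = []
    for phenotype, pattern in RULES.items():
        for match in pattern.finditer(note_text):
            start = max(0, match.start() - context)
            end = min(len(note_text), match.end() + context)
            mentions.append(
                {
                    "phenotype": phenotype,
                    "value": match.group(1).lower(),
                    "snippet": note_text[start:end],
                }
            )
    return mentions


note = "Invasive ductal carcinoma. ER positive (90%), PR negative. HER2 equivocal by IHC."
mentions = extract_mentions(note)
for m in mentions:
    print(m["phenotype"], m["value"], "|", m["snippet"])
```

Keeping the snippet next to every extracted value is what makes the output reviewable: a reader can check each call against the source text without reopening the note.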
ATAC-Seq Peak Annotation & Enrichment
Upload MACS2 .narrowPeak, annotate with ChIPseeker, and run GO/KEGG/Reactome enrichment with slick visuals and CSV exports.
- 📄 Input: MACS2 .narrowPeak file
- 🏷️ Annotation: ChIPseeker + TxDb
- 📊 Views: pie charts, tables, barplots
- 🧠 Pathways: enrichR (GO/KEGG/Reactome)
- designed for HPC environments using Singularity
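For readers unfamiliar with the input, a .narrowPeak file is a tab-separated BED6+4 table. The sketch below parses it into typed records; it is a standalone Python illustration rather than the app's R code, and the Peak type, parse_narrowpeak helper, and demo peaks are hypothetical.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str
    signal: float
    p_value: float  # stored as -log10(p) in narrowPeak
    q_value: float  # stored as -log10(q) in narrowPeak
    summit_offset: int  # summit position relative to start

    @property
    def summit(self):
        return self.start + self.summit_offset


def parse_narrowpeak(text):
    """Parse narrowPeak text into Peak records, skipping track/browser lines."""
    peaks = []
    for line in text.splitlines():
        if not line.strip() or line.startswith(("track", "browser", "#")):
            continue
        f = line.split("\t")
        if len(f) != 10:
            raise ValueError(f"expected 10 columns, got {len(f)}: {line!r}")
        peaks.append(
            Peak(f[0], int(f[1]), int(f[2]), f[3], int(f[4]), f[5],
                 float(f[6]), float(f[7]), float(f[8]), int(f[9]))
        )
    return peaks


demo = "\n".join(
    [
        "\t".join(["chr1", "100", "400", "peak_1", "250", ".", "8.5", "12.3", "9.8", "150"]),
        "\t".join(["chr2", "5000", "5300", "peak_2", "90", ".", "3.1", "4.0", "2.2", "120"]),
    ]
)
peaks = parse_narrowpeak(demo)
# Keep peaks with q < 0.01, i.e. -log10(q) > 2.
significant = [p for p in peaks if p.q_value > 2]
print(len(peaks), [p.summit for p in significant])
```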
miRNA Differential Expression & Enrichment
DESeq2-based miRNA analysis with PCA/UMAP, volcano & heatmaps, Enrichr enrichment, Random Forest classification, power analysis, and exports.
- 🗂️ Inputs: miRNA counts + metadata
- 🧬 DE: DESeq2 pipeline
- 🧭 Dims: PCA & UMAP
- 🌋 Plots: volcano, top-miRNA bars, heatmaps
- 🧠 ML: RF classification + metrics
- 🧪 Pathways: Enrichr (clusterProfiler fallback)
- ⚡ Power: sample size estimates
- designed for HPC environments using Singularity
DNA Methylation App
Explore beta values, run differential methylation, enrichment, PCA/UMAP, Random Forest, power analysis, and download everything — HPC-ready.
- 🗂️ Input: CSV beta matrix (e.g., 450k)
- 🧪 DE: probe-level stats + FDR
- 🧭 Dims: PCA & UMAP
- 🧠 ML: RF + AUC & importance
- 🧪 Pathways: Enrichr KEGG/GO/Reactome
- ⚡ Power: Cohen’s d → n per group
- 📦 Deploy: Singularity/Apptainer
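The "Cohen's d → n per group" step can be approximated with the standard normal-approximation formula for a two-sided, two-sample comparison, n = 2(z_{1-α/2} + z_{power})² / d². The sketch below is one common way to compute it and is an assumption about the approach, not a copy of the app's calculation; an exact t-based calculation gives slightly larger n.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate samples per group for a two-sided two-sample test."""
    if d == 0:
        raise ValueError("effect size d must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)


for d in (0.2, 0.5, 0.8):
    print(f"d={d}: n per group ~ {n_per_group(d)}")
```

Because n scales with 1/d², halving the effect size roughly quadruples the required sample size, which is why small methylation differences need large cohorts.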
CRISPR Mixscape Pipeline (Perturb-seq)
Single-cell CRISPR screen workflow using Seurat’s Mixscape: QC/normalization, UMAP, perturbation scoring, KO/NP/NT assignment, DE, and rich plots — with HPC support.
- 🧪 Inputs: counts + metadata CSVs
- 🧭 Dims: UMAP visualization
- 🧮 Mixscape: perturbation scores & class labels
- 🧬 DE: KO vs NT + downloads
- 📈 Views: bar/violin/heatmaps, summaries
- 📦 Deploy: Singularity + Slurm script
GWAS Analysis App
A full-stack, no-code Shiny app for Genome-Wide Association Studies (GWAS) that works directly from raw VCF files, with no PLINK needed. Upload your VCF plus phenotype and covariate tables to begin.
- 🧪 QC Filters: MAF, allele frequency, call rate, HWE p-value thresholds
- 🧮 GWAS Engine: Logistic regression with Bonferroni correction support
- 📊 Visualization: PCA, UMAP, QQ plot, Manhattan plot with region zoom
- 🤖 Machine Learning: Random Forest with AUC, importance, ROC
- 🧬 SNP-to-Gene Mapping: Map significant SNPs to nearest genes
- 🧠 Enrichment: KEGG/GO/Reactome via enrichR
- ⚡ Power Analysis: Cohen’s d-based observed power & sample size curve
- 📦 Export Everything: GWAS tables, enrichment results, ML metrics, more
- designed for HPC environments using Singularity
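Two of the steps above, per-variant QC and Bonferroni correction, reduce to simple arithmetic. The sketch below shows them in Python under stated assumptions: genotypes are encoded as alternate-allele counts (0, 1, 2, or None for missing), and the default thresholds (MAF ≥ 0.05, call rate ≥ 0.95) are illustrative, not the app's settings.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from alt-allele counts (0/1/2); None entries are ignored."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq)


def passes_qc(genotypes, min_maf=0.05, min_call_rate=0.95):
    """Apply hypothetical MAF and call-rate thresholds to one variant."""
    maf = minor_allele_frequency(genotypes)
    return maf is not None and maf >= min_maf and call_rate(genotypes) >= min_call_rate


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


variant = [0, 1, 2, 0, None, 1, 0, 0, 1, 0]
print("call rate:", call_rate(variant))
print("MAF:", round(minor_allele_frequency(variant), 4))
print("passes QC:", passes_qc(variant))
print("threshold for 1e6 tests:", bonferroni_threshold(0.05, 1_000_000))
```

With one million tests, the corrected threshold works out to the familiar genome-wide significance level of 5e-8.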
Single-cell RNA-Seq App
Interactive Seurat-based app to explore scRNA-seq data, run DE, pathway enrichment, classification, power analyses, and download publication-ready tables/plots.
- 🗂️ Upload: counts CSV (genes × cells/samples) + metadata CSV (matching names)
- 🧱 Create Seurat Object: load & normalize your data in-app
- 🧭 Dimensionality Reduction: PCA & UMAP for cluster/pattern visualization
- 🧬 Differential Expression: by condition and by cell type; find condition-only DE genes
- 🧠 Pathway Analysis: enrich DE gene sets across multiple databases
- 🌋 Volcano Plot: publication-style volcano for condition-only DE
- 🤖 Feature Selection & Classification: Random Forest markers, ROC, importance
- 📈 Power Analysis: estimate power and minimum sample size for key DE genes
- 🔥 Heatmaps: top features and group differences
- ⬇️ Downloads: export all result tables and figures
- designed for HPC environments using Singularity
Tips: CSV format only; counts & metadata must match by name. For best results, use quality-filtered data (see insurance_policy_script.R).
About Me
I’m John Caperella, a bioinformatics developer passionate about turning complex genomic data into usable insights. I build clear, scalable tools for single-cell RNA-seq, CRISPR Mixscape, and omics visualization in R, Python, and Shiny—helping researchers and data scientists get to answers faster.
- 🧬 Single-cell & CRISPR analytics (Seurat/Mixscape)
- 📊 Reproducible visualization apps (Docker/Singularity/HPC)
- 🤖 Exploring LLMs for genomics workflows
Research Radar (PubMed + LLM Summaries)
A lightweight research intelligence dashboard that pulls recent PubMed papers across multiple queries and generates 3-bullet summaries using a local LLM (Ollama). Includes filters, trending topics, and top journals — built for fast scanning.
- 🧠 LLM summaries: exactly 3 bullets per abstract
- 📈 Dashboard: trending topics + top journals + search/filter
- 🔁 Update workflow: regenerate papers.json and push to Pages
- 🧱 Portfolio pattern: API ingestion → pipeline → structured JSON → UI
Tip: This is ideal for tracking AI-for-biology, computational methods, and variant calling literature without drowning in full abstracts.
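One way to hold an LLM to "exactly 3 bullets" is to validate its raw output before it reaches the JSON file. The sketch below is an assumption about how such a guard could look: to_three_bullets, the record fields, and the sample text are hypothetical, and no PubMed or Ollama call is made.

```python
import json
import re

# Matches "-", "*", "•" or "1." / "1)" style bullet prefixes.
BULLET_PREFIX = re.compile(r"^\s*(?:[-*•]|\d+[.)])\s+")


def to_three_bullets(llm_text):
    """Return exactly 3 bullet strings, or None if the output does not comply."""
    bullets = []
    for line in llm_text.splitlines():
        if BULLET_PREFIX.match(line):
            bullets.append(BULLET_PREFIX.sub("", line).strip())
    return bullets if len(bullets) == 3 else None


raw_output = """Here is the summary:
- Introduces a new variant caller.
- Benchmarks it on public data.
- Reports fewer false positives.
"""

bullets = to_three_bullets(raw_output)
record = {
    "pmid": "00000000",  # placeholder identifier, not a real paper
    "title": "Example paper title",
    "summary": bullets,
}
print(json.dumps(record, indent=2))
```

Returning None instead of truncating or padding lets the update step retry or skip a paper, so every entry that reaches the dashboard has the same shape.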
GCP FASTQ Event Pipeline (Eventarc → Cloud Run Job)
Cloud-native, event-driven FASTQ processing on Google Cloud. Upload a FASTQ to Cloud Storage and automatically trigger a serverless pipeline: Eventarc fires on object finalize → Cloud Function launches a containerized Cloud Run Job → results are written back for downstream analysis.
- ☁️ Trigger: GCS “object finalize” event (Eventarc)
- 🧩 Orchestration: Cloud Function (glue layer)
- 📦 Compute: Cloud Run Job (containerized batch)
- 🧬 Bioinformatics-ready pattern: scalable ETL wiring for genomics ingestion
- 📤 Outputs: structured results written to storage (and designed to extend to analytics/warehousing)
- 🛠️ Great for: portfolio demos of serverless pipelines + reproducible containers
Why it matters: This mirrors real-world genomics platform patterns — automatic ingestion triggers + containerized compute, without manual job submission.
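The glue layer's main decision, whether an uploaded object should launch a job and with which parameters, can be sketched without any cloud libraries. The code below assumes the event payload carries the object's "bucket" and "name" fields, as Cloud Storage events do; plan_job, the INPUT_URI/OUTPUT_URI names, and the results/ prefix are hypothetical, and the actual job launch is left out.

```python
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz", ".fastq", ".fq")


def plan_job(event_data, output_prefix="results/"):
    """Return job parameters for a FASTQ upload, or None if the event is ignored."""
    bucket = event_data.get("bucket")
    name = event_data.get("name", "")
    if not bucket or not name.lower().endswith(FASTQ_SUFFIXES):
        return None  # not a FASTQ upload
    if name.startswith(output_prefix):
        return None  # ignore our own outputs so the pipeline cannot re-trigger itself
    base = name.rsplit("/", 1)[-1]
    sample = base
    for suffix in FASTQ_SUFFIXES:
        if base.lower().endswith(suffix):
            sample = base[: -len(suffix)]
            break
    return {
        "INPUT_URI": f"gs://{bucket}/{name}",
        "OUTPUT_URI": f"gs://{bucket}/{output_prefix}{sample}/",
    }


event = {"bucket": "my-bucket", "name": "uploads/sample1.fastq.gz"}
print(plan_job(event))
print(plan_job({"bucket": "my-bucket", "name": "uploads/readme.txt"}))
```

Keeping this logic in a pure function makes it easy to unit test locally; the function's return value would then be passed as overrides when starting the Cloud Run Job.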
Document Cleaning CLI
AI-powered document cleanup for scanned pages, noisy screenshots, and OCR-bound records. Enhances messy source images into cleaner, sharper outputs that are easier to read, extract, and route into downstream research or clinical workflows.
- 📥 Inputs: scanned .png/.jpg files or ZIP batches of document images
- 🧼 Cleanup: deep-learning denoising + image enhancement for messy source material
- 📄 Outputs: OCR-optimized images and PDF-ready cleaned documents
- 🛠️ Modes: command-line workflow or REST API deployment
- 🏥 Use cases: legacy records, scanned notes, exported forms, and other hard-to-read document inputs
- ⚙️ designed for flexible local or API-based deployment workflows
Why it matters: a lot of valuable information is trapped inside low-quality visual documents. This tool helps turn noisy records into cleaner, machine-readable assets for OCR, review, and downstream automation.
Usage Analytics
This dashboard is powered by Google Analytics 4 and Looker Studio. It shows historical click activity for each app.