MINTS

  • 5 Devlogs
  • 3 Total hours

MINTS is a reproducible mechanistic-interpretability pipeline for genomic transformers. It loads DNABERT-2 and Nucleotide Transformer backends, extracts QK/OV circuit matrices, probes frozen residual streams, scores CTCF motif support from JASPAR, tests QK-to-motif alignment and matched attention enrichment, runs custom DNABERT forward-hook activation patching, and searches for distributed CTCF-aligned SAE features.
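
As a rough illustration of the motif-support step, the sketch below scores every window of a sequence against a log-odds position weight matrix. It is a minimal sketch under stated assumptions, not the MINTS implementation: the count matrix is a toy 4-position example (not the JASPAR CTCF profile), the background is uniform, and the names `TOY_COUNTS`, `log_odds_matrix`, and `scan` are hypothetical.

```python
import math

# Toy 4-position count matrix (hypothetical; NOT the JASPAR CTCF profile).
TOY_COUNTS = {
    "A": [8, 0, 1, 0],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 8, 0],
    "T": [1, 1, 1, 9],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform background."""
    width = len(counts["A"])
    matrix = []
    for pos in range(width):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        matrix.append({
            base: math.log2(((counts[base][pos] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return matrix


def scan(sequence, matrix):
    """Return (start, score) for every window of motif width.

    Assumes an uppercase sequence containing only A, C, G, and T.
    """
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(matrix[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


matrix = log_odds_matrix(TOY_COUNTS)
hits = scan("TTACGTAA", matrix)
best_start, best_score = max(hits, key=lambda hit: hit[1])
print(best_start, round(best_score, 3))
```

A real run would load the CTCF profile from JASPAR instead of the toy counts and would typically scan both strands.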

40m 4s logged

  • added frozen DNABERT sequence-head evaluation with AUROC, AUPRC, and accuracy reporting.
  • added real null calibration, GC-matched background calibration, and improved the threshold sensitivity analysis.
  • regenerated the downstream task performance and threshold sensitivity results.
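
For readers unfamiliar with the three reported metrics, here is a minimal pure-Python sketch of how they can be computed from binary labels and classifier scores. It is an illustration only: the function names and the example data are hypothetical, and the project may well use a library implementation instead.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of examples whose thresholded score matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auroc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. This pairwise form is O(n^2), which is fine
    for a sketch but slow for large evaluation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """AUPRC estimate: mean precision at the rank of each positive.

    Tied scores are ordered arbitrarily by the sort, a simplification.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank
    if hits == 0:
        raise ValueError("average precision needs at least one positive")
    return total / hits


# Hypothetical labels and scores, for illustration only.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(accuracy(labels, scores), auroc(labels, scores), average_precision(labels, scores))
```

Reporting AUPRC alongside AUROC is useful when positives are rare, because AUROC alone can look strong on heavily imbalanced data.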

image: source control image showing the main changes I made to the threshold sensitivity code.

22m 10s logged

  • replaced the evidence summary with a clearer evidence table linking each claim to its supporting results and limitations.
  • added a section explaining the main lessons for mechanistic interpretability.
  • expanded the limitations section with clearer discussion of the study’s scope and assumptions.

image: source control image showing the main changes I made to the paper.

18m 3s logged

  • added a glossary for the main biology terms.
  • explained why the selected motifs are biologically important.
  • simplified the theory section.
  • moved detailed proofs to the appendix.
  • clearly defined how nucleotide motifs are mapped to tokens.
  • added a simple workflow diagram for the full analysis pipeline.
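
One common way to map a nucleotide motif to tokens is to select every token whose character span overlaps the motif's span. The sketch below shows that convention; it is an assumption about how such a mapping could work, not the definition used in the paper. The tokenization is made up, and `spans_from_tokens` and `tokens_overlapping` are hypothetical helper names.

```python
def spans_from_tokens(tokens):
    """Half-open character spans for tokens that tile the sequence with no gaps."""
    spans = []
    pos = 0
    for token in tokens:
        spans.append((pos, pos + len(token)))
        pos += len(token)
    return spans


def tokens_overlapping(token_spans, motif_start, motif_end):
    """Indices of tokens whose [start, end) span overlaps [motif_start, motif_end)."""
    return [
        i
        for i, (start, end) in enumerate(token_spans)
        if start < motif_end and end > motif_start
    ]


# Hypothetical variable-length tokenization of a 15-base sequence.
tokens = ["ACGT", "TTG", "CCAGCA", "GG"]
spans = spans_from_tokens(tokens)

# A motif covering bases 5 through 8 (half-open span 5..9).
motif_token_ids = tokens_overlapping(spans, 5, 9)
print(spans, motif_token_ids)
```

With a real tokenizer, special tokens such as a leading classification token would shift these indices, so the offsets need to come from the tokenizer itself rather than from token lengths.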

image: source control image showing the changes I made to the code.

20m logged

made the entire pipeline fully reproducible, without any caps, including the threshold analysis and result generation.
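
Two habits that usually help with this kind of reproducibility are drawing all randomness from explicitly seeded generators and stamping each result with a fingerprint of the configuration that produced it. The sketch below shows both using only the standard library. The config keys (including `max_sequences`, where `None` stands for "no cap") and the helper names are hypothetical, not taken from the MINTS code.

```python
import hashlib
import json
import random


def config_fingerprint(config):
    """Short, stable hash of a JSON-serializable config, independent of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def seeded_sample(items, k, seed):
    """Sample k items with a private generator so global random state is untouched."""
    rng = random.Random(seed)
    return rng.sample(items, k)


# Hypothetical configuration; max_sequences=None means no cap on the data.
config = {"seed": 0, "thresholds": [0.5, 0.6], "max_sequences": None}

fingerprint = config_fingerprint(config)
first = seeded_sample(list(range(100)), 5, config["seed"])
second = seeded_sample(list(range(100)), 5, config["seed"])
print(fingerprint, first == second)
```

Writing the fingerprint next to each generated table or figure makes it easy to tell which settings produced which result.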


image: screenshot of the pipeline being run.

58m 16s logged

  • showed model performance before the interpretability analysis.
  • separated task performance from probe results.

  • combined all interpretability analyses into one pipeline.
  • made CTCF the main case study.
  • made the overall framing clearer.

  • explained every analysis threshold.
  • validated thresholds with sensitivity tests and control experiments.
  • showed that the main results stay consistent across different thresholds.
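
A threshold sensitivity check of this kind can be sketched as a sweep: recompute the statistic of interest at several thresholds, compare it with a control set, and check that the direction of the effect does not flip. The example below is a minimal sketch with made-up numbers. The statistic (fraction of scores at or above the threshold), the data, and the function names are all assumptions rather than the project's actual analysis.

```python
def fraction_above(values, threshold):
    """Fraction of values at or above the threshold."""
    return sum(v >= threshold for v in values) / len(values)


def sensitivity_table(observed, control, thresholds):
    """One row per threshold comparing observed scores with control scores."""
    rows = []
    for t in thresholds:
        obs = fraction_above(observed, t)
        ctl = fraction_above(control, t)
        rows.append({
            "threshold": t,
            "observed": obs,
            "control": ctl,
            "difference": obs - ctl,
        })
    return rows


def sign_is_stable(rows):
    """True if the observed-minus-control difference keeps one sign throughout."""
    return (
        all(r["difference"] > 0 for r in rows)
        or all(r["difference"] < 0 for r in rows)
    )


# Made-up scores, for illustration only.
observed = [0.2, 0.5, 0.7, 0.8, 0.9]
control = [0.1, 0.2, 0.3, 0.4, 0.6]
rows = sensitivity_table(observed, control, thresholds=[0.3, 0.5, 0.7])

for row in rows:
    print(row)
print(sign_is_stable(rows))
```

A stable sign is a weak criterion on its own; a fuller analysis would also track effect sizes and uncertainty at each threshold.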

