---
title: "Reproducible Research Notebooks for Digital Humanities"
author: "Rantideb Howlader"
date: "2026-05-26T00:00:00.000Z"
canonical_url: "https://www.ranti.dev/blog/reproducible-research-notebooks-digital-humanities"
license: "CC-BY-4.0"
---


## Why Reproducibility Is the Foundation of Good DH Research

Digital humanities sits at the intersection of computational methods and humanistic inquiry. This position creates a unique challenge. On one side, computer science demands that results be reproducible. On the other, humanities scholarship values interpretation, context, and situated knowledge. A reproducible research notebook bridges these two worlds by making your interpretive process transparent without reducing it to a mechanical procedure.

The problem is real. A 2019 survey of computational humanities papers found that fewer than 30% shared their code, and fewer than 15% shared both code and data in a form that another researcher could run without modification. This means that most computational claims in digital humanities cannot be independently verified. For a field that prides itself on critical thinking, this is a gap worth closing.

Reproducibility does not mean that every scholar must reach the same conclusion from the same data. Interpretation is always situated. But it does mean that another researcher should be able to run your pipeline, see the same intermediate outputs, and understand exactly where your interpretive choices enter the analysis. This is what a reproducible research notebook provides.

## The Anatomy of a DH Research Notebook

A well-structured reproducible research notebook has five layers. Each layer serves a different audience and a different purpose.

```mermaid
graph TD
    A[Narrative Layer] --> B[Code Layer]
    B --> C[Data Layer]
    C --> D[Environment Layer]
    D --> E[Output Layer]
    A -->|"Explains reasoning"| B
    B -->|"Processes"| C
    D -->|"Ensures consistency"| B
    B -->|"Generates"| E
```

The narrative layer is where you write as a humanist. You explain your research question, your theoretical framework, your interpretive choices, and your conclusions. This is not a code comment. It is scholarly prose that happens to live alongside code.

The code layer contains your computational methods. Every transformation you apply to your data lives here: tokenization, frequency counts, model queries, statistical tests, and visualization generation. Nothing is hidden in a separate script that the reader cannot see.

The data layer includes your primary sources or clear instructions for obtaining them. For public domain texts, you can include the data directly. For copyrighted materials or sensitive archives, you provide acquisition instructions and checksums so that another researcher can verify they have the same inputs.
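One way to produce the checksums mentioned above is a short manifest script. This is a minimal sketch using only the standard library; the function names `sha256_of` and `build_manifest` are illustrative and not part of any existing template.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map every file under data_dir (as a relative POSIX path) to its checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }
```

Publishing the resulting mapping alongside your acquisition instructions lets another researcher run the same function on their copy and compare the two dictionaries.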

The environment layer specifies exactly which software versions you used. A requirements.txt or environment.yml file pins every dependency. Better yet, a Dockerfile or Binder configuration lets another researcher recreate your exact computational environment with a single command.

The output layer contains your results: tables, figures, model outputs, and any derived datasets. These are generated by the code, not pasted in manually. If someone runs your notebook from top to bottom, they should get the same outputs.

## Setting Up Your First DH Research Notebook

Let us walk through the practical steps of creating a reproducible research notebook for a cultural interpretability project. We will use a concrete example: analyzing how a language model represents South Asian literary traditions compared to Western canonical texts.

### Step 1: Project Structure

Start with a clean directory structure that separates concerns:

```
project/
├── README.md
├── environment.yml
├── notebooks/
│   ├── 01-data-collection.ipynb
│   ├── 02-preprocessing.ipynb
│   ├── 03-analysis.ipynb
│   └── 04-visualization.ipynb
├── data/
│   ├── raw/
│   ├── processed/
│   └── README.md
├── src/
│   └── utils.py
└── outputs/
    ├── figures/
    └── tables/
```

Each notebook handles one stage of the pipeline. This makes it easier for another researcher to understand your workflow and to rerun only the parts they want to modify.
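If you prefer to create this layout programmatically, the sketch below builds the directories and empty placeholder files from the tree above. The `scaffold` function is a hypothetical helper written for this post, and it deliberately leaves the notebooks themselves for you to create.

```python
from pathlib import Path

# Directory layout copied from the tree above.
DIRECTORIES = [
    "notebooks",
    "data/raw",
    "data/processed",
    "src",
    "outputs/figures",
    "outputs/tables",
]
PLACEHOLDER_FILES = [
    "README.md",
    "environment.yml",
    "data/README.md",
    "src/utils.py",
]


def scaffold(root):
    """Create the project skeleton under root without overwriting file contents."""
    root = Path(root)
    for directory in DIRECTORIES:
        (root / directory).mkdir(parents=True, exist_ok=True)
    for name in PLACEHOLDER_FILES:
        # touch() creates the file if missing and leaves existing content alone.
        (root / name).touch(exist_ok=True)
    return root
```

Calling `scaffold("project")` twice is safe: existing directories and file contents are left as they are.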

### Step 2: Environment Specification

Create an environment.yml file that pins your dependencies:

```yaml
name: dh-cultural-interpretability
channels:
  - conda-forge
  - defaults
dependencies:
  - python=3.11.8
  - jupyter=1.0.0
  - pandas=2.2.1
  - numpy=1.26.4
  - matplotlib=3.8.3
  - seaborn=0.13.2
  - transformers=4.38.2
  - pytorch=2.2.1  # the conda package is named pytorch; it is imported as torch
  - scikit-learn=1.4.1
  - nltk=3.8.1
  - pip:
      - sae-lens==3.2.0
      - datasets==2.18.0
```

Version pinning is not optional. A notebook that worked in March 2026 may produce different results in September 2026 if library updates change default parameters or fix bugs that your analysis inadvertently relied on.
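A pinned file only helps if the running environment actually matches it. As a rough sketch, a first notebook cell could compare installed versions against the pins using the standard library's `importlib.metadata`. The `find_mismatches` helper and the `PINS` subset below are illustrative assumptions, not part of conda or pip.

```python
from importlib import metadata


def find_mismatches(pins, get_version=metadata.version):
    """Return {package: (expected, installed)} for every pin that is not met.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for package, expected in pins.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches


# A subset of the pins from environment.yml, for illustration only.
PINS = {"pandas": "2.2.1", "numpy": "1.26.4", "nltk": "3.8.1"}

for package, (expected, installed) in find_mismatches(PINS).items():
    print(f"{package}: expected {expected}, found {installed}")
```

An empty result means the environment matches the pins you listed. Note that this check uses pip distribution names, which occasionally differ from conda package names.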

### Step 3: Data Documentation

Your data/README.md should answer five questions:

1. What is in this dataset?
2. Where did it come from?
3. What license governs its use?
4. How was it collected or generated?
5. What preprocessing has been applied?

For cultural interpretability work, you also need to document the cultural context of your sources. A corpus of Bengali poetry from the 1920s carries different interpretive weight than a corpus of contemporary English-language blog posts. Your data documentation should make these differences explicit.
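To keep this documentation from being skipped, you could generate the README from the questions above and check it before publishing. The sketch below is one possible approach; the `TODO` placeholder convention, the sixth cultural-context question as a heading, and both function names are assumptions made for this example.

```python
QUESTIONS = [
    "What is in this dataset?",
    "Where did it come from?",
    "What license governs its use?",
    "How was it collected or generated?",
    "What preprocessing has been applied?",
    "What is the cultural context of the sources?",
]


def readme_template(title):
    """Return a Markdown skeleton with one section per question."""
    lines = [f"# {title}", ""]
    for question in QUESTIONS:
        lines += [f"## {question}", "", "TODO", ""]
    return "\n".join(lines)


def unanswered(readme_text):
    """Return the questions whose section is missing, empty, or still TODO."""
    sections = {}
    current = None
    for line in readme_text.splitlines():
        if line.startswith("## "):
            current = line[3:].strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(line.strip())
    missing = []
    for question in QUESTIONS:
        body = " ".join(part for part in sections.get(question, []) if part)
        if not body or body == "TODO":
            missing.append(question)
    return missing
```

Running `unanswered` on your data/README.md before archiving gives you a list of the sections that still need prose.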

### Step 4: The Analysis Notebook

Here is a skeleton for a cultural interpretability analysis notebook:

```python
# Cell 1: Setup and imports
import torch
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM
from sae_lens import SAE

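# Cell 2: Load model and SAE
model_name = "EleutherAI/pythia-410m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# NOTE: the methods later in this post expect a variable named `sae`, which
# this skeleton does not define. Load a sparse autoencoder trained on the
# same model and layer with sae_lens before running them.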

# Cell 3: Define cultural referents
referents = {
    "western_canonical": [
        "Shakespeare wrote his plays in",
        "Homer composed the Iliad during",
        "Dante described the inferno as",
    ],
    "south_asian": [
        "Kalidasa wrote his plays in",
        "Valmiki composed the Ramayana during",
        "Tagore described Bengal as",
    ],
}

# Cell 4: Extract activations
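def get_activations(prompts, model, tokenizer, layer=6):
    """Return the hidden state of each prompt's final token at `layer`.

    hidden_states[0] is the embedding output, so index 6 is the output of
    the sixth transformer block.
    """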
    activations = []
    for prompt in prompts:
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs, output_hidden_states=True)
        activations.append(
            outputs.hidden_states[layer][:, -1, :].squeeze()
        )
    return torch.stack(activations)

western_acts = get_activations(
    referents["western_canonical"], model, tokenizer
)
south_asian_acts = get_activations(
    referents["south_asian"], model, tokenizer
)

# Cell 5: Compare activation patterns
# [Analysis continues...]
```

Each cell has a clear purpose. Another researcher can read the narrative between cells to understand why you made each choice, then run the code to verify your results.

## Digital Humanities Methods for Cultural Interpretability

Cultural interpretability is the practice of reading computational outputs through the lens of cultural theory. It asks: what does this model know about a culture, and how does it organize that knowledge internally?

A reproducible research notebook is the ideal vehicle for cultural interpretability work because it forces you to make every step explicit. When you claim that a model represents Bengali literary traditions with fewer internal features than English literary traditions, the notebook shows exactly how you measured that claim.

```mermaid
flowchart LR
    A[Cultural Question] --> B[Operationalize]
    B --> C[Compute]
    C --> D[Interpret]
    D --> E[Validate]
    E -->|"Revise"| A

    B -->|"Define metrics"| C
    C -->|"Run notebook"| D
    D -->|"Peer review"| E
```

The cycle above shows how cultural interpretability works in practice. You start with a cultural question (how does this model represent postcolonial literature?), operationalize it into a computable form (measure activation density for postcolonial vs. canonical referents), compute the result (run the notebook), interpret the output (what does lower density mean culturally?), and validate through peer review (can others reproduce and critique your findings?).

### Three Core Methods

**Method 1: Activation Density Comparison**

This method counts how many internal features a model activates for different cultural referents. The hypothesis is that underrepresented cultures will have sparser internal representations.

```python
def compute_density(activations, sae, threshold=0.1):
    """Count active SAE features above threshold."""
    features = sae.encode(activations)
    active = (features > threshold).float()
    return active.sum(dim=-1).mean().item()

density_western = compute_density(western_acts, sae)
density_south_asian = compute_density(south_asian_acts, sae)

print(f"Western canonical density: {density_western:.1f} features")
print(f"South Asian density: {density_south_asian:.1f} features")
```
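To see what `compute_density` measures without loading a model, here is the same calculation on a toy feature matrix in plain Python. The numbers are invented for illustration and do not come from any real SAE.

```python
def density(feature_rows, threshold=0.1):
    """Mean number of features per prompt whose activation exceeds threshold."""
    counts = [sum(1 for value in row if value > threshold) for row in feature_rows]
    return sum(counts) / len(counts)


# One row per prompt, one column per (hypothetical) SAE feature.
toy_features = [
    [0.0, 0.5, 0.2, 0.0],    # 2 features above 0.1
    [0.3, 0.0, 0.05, 0.0],   # 1 feature above 0.1 (0.05 is below threshold)
    [0.0, 0.9, 0.4, 0.11],   # 3 features above 0.1
]

print(density(toy_features))                  # 2.0
print(density(toy_features, threshold=0.25))  # lower: fewer values clear the bar
```

The second call shows why the threshold has to be justified in the narrative: the same activations yield a different density as soon as the cutoff moves.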

**Method 2: Feature Overlap Analysis**

This method examines whether the model uses shared or distinct features for different cultural traditions. High overlap suggests the model treats them as similar; low overlap suggests distinct internal representations.

```python
def feature_overlap(acts_a, acts_b, sae, threshold=0.1):
    """Compute Jaccard similarity of active feature sets."""
    # flatten() keeps a 1-D tensor even when exactly one feature is active;
    # squeeze() would yield a 0-D tensor whose tolist() is a bare int.
    feats_a = set(
        (sae.encode(acts_a) > threshold).any(dim=0).nonzero().flatten().tolist()
    )
    feats_b = set(
        (sae.encode(acts_b) > threshold).any(dim=0).nonzero().flatten().tolist()
    )
    intersection = feats_a & feats_b
    union = feats_a | feats_b
    return len(intersection) / len(union) if union else 0.0
```
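The Jaccard step is easy to check by hand. The sketch below applies it to two small sets of feature indices; the indices are hypothetical and chosen only to make the arithmetic visible.

```python
def jaccard(set_a, set_b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical indices of features active for each group of prompts.
western_features = {3, 17, 42, 108}
south_asian_features = {17, 42, 256}

# Shared: {17, 42} (2 features). Union: {3, 17, 42, 108, 256} (5 features).
print(jaccard(western_features, south_asian_features))  # 0.4
```

A value of 1.0 would mean both groups activate exactly the same features, and 0.0 would mean they share none.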

**Method 3: Completion Divergence**

This method measures how much the model's text completions diverge from known facts about a cultural tradition. High divergence indicates confabulation or stereotyping.

```python
from scipy.stats import entropy

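def completion_divergence(prompt, reference_dist, model, tokenizer, k=50):
    """KL(model || reference) over the model's top-k next-token probabilities.

    reference_dist must be a length-k array aligned with the order of the
    model's top-k tokens; scipy normalizes both inputs before comparing them.
    """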
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[:, -1, :]
    probs = torch.softmax(logits, dim=-1).squeeze()
    top_k = probs.topk(k)
    model_dist = top_k.values.numpy()
    model_dist = model_dist / model_dist.sum()
    return entropy(model_dist, reference_dist)
```
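For readers who have not met KL divergence before, the quantity that `scipy.stats.entropy(p, q)` returns can be written out in a few lines of plain Python. This is a sketch for intuition, with made-up distributions, rather than a replacement for the SciPy call.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two aligned, non-negative weight lists.

    Both inputs are normalized to sum to 1 first, as SciPy does.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    p_total, q_total = sum(p), sum(q)
    total = 0.0
    for p_i, q_i in zip(p, q):
        p_n, q_n = p_i / p_total, q_i / q_total
        if p_n == 0:
            continue  # terms with zero probability under p contribute nothing
        if q_n == 0:
            return math.inf  # p puts mass where q has none
        total += p_n * math.log(p_n / q_n)
    return total


model_dist = [0.5, 0.5]      # made-up model probabilities
reference = [0.9, 0.1]       # made-up reference probabilities

print(kl_divergence(model_dist, model_dist))  # 0.0: identical distributions
print(kl_divergence(model_dist, reference))
print(kl_divergence(reference, model_dist))   # differs: KL is not symmetric
```

Because the measure is asymmetric, your narrative should state which distribution plays the role of the reference.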

## Making Your Notebook Truly Reproducible

Writing code in a notebook is not enough for reproducibility. You need to address several additional concerns.

### Random Seeds

Any operation involving randomness must use a fixed seed:

```python
import random
import numpy as np
import torch

SEED = 42
random.seed(SEED)
np.random.seed(SEED)
torch.manual_seed(SEED)
if torch.cuda.is_available():
    torch.cuda.manual_seed_all(SEED)
```
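Global seeds can be disturbed by any cell that draws random numbers in between. A complementary pattern, sketched below with the standard library, is to pass a seeded generator into each function that needs randomness. The `sample_prompts` helper is hypothetical and the prompts are reused from the skeleton above.

```python
import random


def sample_prompts(prompts, k, seed):
    """Draw k prompts using a local generator so the result depends only on seed."""
    rng = random.Random(seed)
    return rng.sample(prompts, k)


prompts = [
    "Shakespeare wrote his plays in",
    "Homer composed the Iliad during",
    "Kalidasa wrote his plays in",
    "Tagore described Bengal as",
]

first = sample_prompts(prompts, 2, seed=42)
random.random()  # unrelated draw from the global generator
second = sample_prompts(prompts, 2, seed=42)
print(first == second)  # True: the local generator is unaffected
```

This keeps a subsample stable even if you later insert or reorder cells that use randomness.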

### Execution Order

Notebooks allow out-of-order execution, which is a reproducibility hazard. Record, in a cell at the top, the shell command that re-executes the entire notebook from a fresh kernel:

```python
# Run this cell to verify full reproducibility
# jupyter nbconvert --execute --to notebook notebook.ipynb
```

Better yet, include a Makefile or script that runs all notebooks in sequence:

```makefile
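# data/ is also a directory in this project, so without .PHONY make would
# treat the `data` target as already up to date and skip it.
.PHONY: all data analysis figures

all: data analysis figures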

data:
	jupyter nbconvert --execute notebooks/01-data-collection.ipynb

analysis: data
	jupyter nbconvert --execute notebooks/02-preprocessing.ipynb
	jupyter nbconvert --execute notebooks/03-analysis.ipynb

figures: analysis
	jupyter nbconvert --execute notebooks/04-visualization.ipynb
```
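You can also detect out-of-order execution after the fact, because a saved .ipynb file is JSON and each code cell stores an `execution_count`. The check below is a heuristic sketch: it assumes notebook format version 4, skips empty code cells, and uses function names invented for this post.

```python
import json


def load_notebook(path):
    """Read a .ipynb file into a dictionary."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def execution_counts(notebook):
    """List the execution_count of every non-empty code cell, in document order."""
    counts = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = "".join(cell.get("source", []))
        if not source.strip():
            continue  # empty cells are skipped by this heuristic
        counts.append(cell.get("execution_count"))
    return counts


def ran_top_to_bottom(notebook):
    """True if the code cells were executed exactly once each, in order, from 1."""
    counts = execution_counts(notebook)
    return counts == list(range(1, len(counts) + 1))
```

For example, `ran_top_to_bottom(load_notebook("notebooks/03-analysis.ipynb"))` returning False is a signal to restart the kernel and run all cells before committing.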

### Hardware Documentation

If your analysis requires a GPU, document which GPU you used and how long each step took. This helps other researchers estimate whether they can reproduce your work with their available resources.

```python
import torch
print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    print(f"GPU memory: {torch.cuda.get_device_properties(0).total_memory / 1e9:.1f} GB")
```
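Recording how long each step took can be done with a small context manager. The sketch below uses only the standard library; `timed`, `TIMINGS`, and `environment_report` are names made up for this example.

```python
import platform
import time
from contextlib import contextmanager

TIMINGS = {}  # step name -> elapsed seconds


@contextmanager
def timed(step):
    """Record the wall-clock duration of the enclosed block under `step`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        TIMINGS[step] = time.perf_counter() - start


def environment_report():
    """Collect basic machine details to publish next to the timings."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
    }


with timed("toy step"):
    total = sum(range(100_000))

print(environment_report())
print({step: f"{seconds:.3f} s" for step, seconds in TIMINGS.items()})
```

Wrapping each expensive cell in `with timed("extract activations"):` gives you a timing table you can paste into the README.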

### Checkpoints and Intermediate Outputs

For long-running analyses, save intermediate results so that other researchers can start from any point in the pipeline:

```python
# Save intermediate activations
torch.save(western_acts, "data/processed/western_activations.pt")
torch.save(south_asian_acts, "data/processed/south_asian_activations.pt")

# Load from checkpoint (skip recomputation)
# western_acts = torch.load("data/processed/western_activations.pt")
```
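The commented-out load line can be turned into a reusable load-or-compute pattern. The sketch below uses JSON so that it runs without PyTorch; for tensors you would swap in `torch.save` and `torch.load` as in the cell above. The `load_or_compute` helper is hypothetical.

```python
import json
from pathlib import Path


def load_or_compute(path, compute):
    """Return the cached result at path, or call compute() and cache its result."""
    path = Path(path)
    if path.exists():
        with open(path, encoding="utf-8") as handle:
            return json.load(handle)
    result = compute()
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(result, handle)
    return result
```

A call such as `load_or_compute("data/processed/densities.json", run_density_analysis)`, where `run_density_analysis` stands for your own function, recomputes only when the checkpoint is missing. Delete the file to force a fresh run.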

## Publishing and Sharing Your Notebook

A reproducible research notebook is only useful if others can find and run it. Here is a publication checklist:

```mermaid
graph TD
    A[Finish Analysis] --> B[Clean Notebook]
    B --> C[Test Fresh Run]
    C --> D[Archive Data]
    D --> E[Create DOI]
    E --> F[Link from Paper]

    B -->|"Remove dead cells"| C
    C -->|"Verify outputs match"| D
    D -->|"Zenodo/Dataverse"| E
    E -->|"Cite in manuscript"| F
```

**Step 1: Clean the notebook.** Remove exploratory cells that did not contribute to the final analysis. Keep only the cells that form a coherent narrative from question to conclusion.

**Step 2: Test a fresh run.** Restart the kernel and run all cells from top to bottom. If any cell fails, fix it before publishing.

**Step 3: Archive your data.** Upload your dataset to a persistent repository like Zenodo or Dataverse. These services provide DOIs that will not break when you change institutions.

**Step 4: Create a DOI for the notebook.** Link your GitHub repository to Zenodo so that each release gets a citable DOI.

**Step 5: Link from your paper.** In your manuscript, include a data availability statement that points to the notebook and dataset DOIs.

## Common Pitfalls and How to Avoid Them

**Pitfall 1: The "works on my machine" problem.** Your notebook runs perfectly on your laptop but fails on a colleague's machine. Solution: use Docker or Binder to containerize the environment.

**Pitfall 2: Hardcoded paths.** Your notebook references /Users/yourname/Documents/project/data.csv. Solution: use relative paths and a consistent project structure.

**Pitfall 3: Missing data.** Your notebook requires a 50GB dataset that you cannot share due to copyright. Solution: provide a small sample dataset for testing, plus clear instructions for obtaining the full dataset.

**Pitfall 4: Undocumented decisions.** Your code makes choices (a threshold of 0.1, a layer index of 6) without explaining why. Solution: use markdown cells to justify every parameter choice with reference to prior work or exploratory analysis.

**Pitfall 5: No error handling.** Your notebook crashes halfway through if the network is unavailable or a file is missing. Solution: add try/except blocks and informative error messages that tell the user what went wrong and how to fix it.
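Pitfalls 2 and 5 can be addressed together. The sketch below locates the project root by searching upward for a marker file and raises an error that tells the reader what to do next. It assumes the project structure from earlier in this post; the marker choice and both helper names are illustrative.

```python
from pathlib import Path


def find_project_root(start, marker="environment.yml"):
    """Walk upward from start until a directory containing marker is found."""
    start = Path(start).resolve()
    for candidate in [start, *start.parents]:
        if (candidate / marker).exists():
            return candidate
    raise FileNotFoundError(
        f"No {marker} found in {start} or any parent directory. "
        "Run the notebook from inside the project folder."
    )


def read_source(root, relative_path):
    """Read a text file relative to the project root, with an actionable error."""
    path = Path(root) / relative_path
    try:
        return path.read_text(encoding="utf-8")
    except FileNotFoundError as error:
        raise FileNotFoundError(
            f"Expected {relative_path} under {root}. "
            "See data/README.md for instructions on obtaining the data."
        ) from error
```

In a notebook, `root = find_project_root(Path.cwd())` followed by `read_source(root, "data/raw/poem.txt")` behaves the same on every machine, regardless of where the project was cloned. The file name here is a placeholder.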

## A Template You Can Download

To make this practical, I have prepared a starter template for digital humanities reproducible research notebooks. The template includes:

1. A pre-configured environment.yml with common DH dependencies
2. A notebook skeleton with narrative prompts for each section
3. A data documentation template
4. A Makefile for automated execution
5. A GitHub Actions workflow for continuous integration testing

The template is designed for cultural interpretability projects but adapts easily to other digital humanities methods including topic modeling, network analysis, stylometry, and corpus linguistics.

## Where This Fits in the Broader DH Landscape

Reproducible research notebooks are not a replacement for traditional humanities scholarship. They are a complement. The notebook handles the computational layer of your argument. The journal article or monograph handles the interpretive layer. Together, they form a complete scholarly contribution that is both intellectually rich and computationally verifiable.

For graduate students entering the field, learning to build reproducible notebooks is a career investment. Funding bodies increasingly require data management plans and open science practices. Journals in computational humanities are moving toward mandatory code sharing. The skills you build now will be expected of every DH scholar within five years.

For established researchers, reproducible notebooks offer a way to increase the impact of your work. A paper with a runnable notebook is easier for other scholars to build on directly, which can translate into more citations. It can also attract collaboration requests from researchers who want to adapt your methods to their own cultural contexts.

## Next Steps

If you want to start building reproducible research notebooks for your digital humanities projects, here is what I recommend:

1. Pick one existing project and convert its analysis into a notebook format
2. Add environment specification and data documentation
3. Test that a colleague can run it without your help
4. Archive it with a DOI and link it from your next publication

If you need help setting up reproducible workflows for cultural interpretability research, or if you want to discuss how these methods apply to your specific project, I offer research collaboration services for DH scholars and graduate students. You can request research collaboration through the contact page, or download a starter DH notebook template from the resources section.

The goal is not perfection on the first try. The goal is to start making your computational humanities work transparent, shareable, and buildable. Every notebook you publish raises the bar for the entire field.


---

<!-- METADATA_START -->
## Metadata & Citations

### Further Reading
- [Digital Humanities Methods: A Comparison Guide](https://www.ranti.dev/blog/digital-humanities-methods-comparison-guide.md)
- [Download: DH Notebook for Cultural Analysis](https://www.ranti.dev/blog/download-dh-notebook-cultural-analysis.md)
- [Cultural Interpretability: Bengali Literature Case Study](https://www.ranti.dev/blog/cultural-interpretability-case-study-bengali-literature.md)

```json
{
  "@context": "https://schema.org",
  "@type": "TechArticle",
  "headline": "Reproducible Research Notebooks for Digital Humanities",
  "author": {
    "@type": "Person",
    "name": "Rantideb Howlader"
  },
  "datePublished": "2026-05-26T00:00:00.000Z",
  "url": "https://www.ranti.dev/blog/reproducible-research-notebooks-digital-humanities",
  "license": "https://creativecommons.org/licenses/by/4.0/",
  "isAccessibleForFree": true
}
```

### BibTeX
```bibtex
@article{reproducible-research-notebooks-digital-humanities_2026,
  author = {Rantideb Howlader},
  title = {Reproducible Research Notebooks for Digital Humanities},
  journal = {Rantideb Howlader Portfolio},
  year = {2026},
  url = {https://www.ranti.dev/blog/reproducible-research-notebooks-digital-humanities},
  note = {Accessed: 2026-05-31}
}
```

### IEEE
Rantideb Howlader, "Reproducible Research Notebooks for Digital Humanities," Rantideb Howlader Portfolio, 2026. [Online]. Available: https://www.ranti.dev/blog/reproducible-research-notebooks-digital-humanities. [Accessed: 2026-05-31].

### APA
Rantideb Howlader. (2026). Reproducible Research Notebooks for Digital Humanities. Rantideb Howlader. Retrieved from https://www.ranti.dev/blog/reproducible-research-notebooks-digital-humanities

--- 
*This content is provided in research-grade Markdown format. Required Attribution: Cite as Rantideb Howlader (2026).*
<!-- METADATA_END -->