microplex
Microdata synthesis and reweighting using normalizing flows.
microplex creates rich, calibrated microdata by combining synthesis and reweighting.
Key Features
- Conditional synthesis: Generate target variables given demographics
- Sparse reweighting: L0/L1 optimization to match population targets
- Multi-source fusion: Combine CPS, ACS, and administrative data into one population
- Zero-inflation handling: Built-in support for variables with many zeros
- Scalable: Reweight to any geography (state, county, or tract)
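To make the zero-inflation feature concrete, the sketch below shows one common way to model a variable with many exact zeros: a two-part (hurdle) model that first decides whether the value is positive and then draws the positive amount. This is an illustration only. Whether microplex handles zeros this way internally is an assumption, the log-normal for the positive part is chosen purely for simplicity, and `fit_hurdle` and `sample_hurdle` are hypothetical names, not part of the microplex API.

```python
import math
import random


def fit_hurdle(values):
    """Fit a two-part model: P(value > 0) plus a log-normal for the positive part."""
    positives = [v for v in values if v > 0]
    p_positive = len(positives) / len(values)
    logs = [math.log(v) for v in positives]
    mu = sum(logs) / len(logs)
    sigma = (sum((x - mu) ** 2 for x in logs) / len(logs)) ** 0.5
    return p_positive, mu, sigma


def sample_hurdle(params, n, rng):
    """Draw n values: an exact zero with probability 1 - p_positive, else log-normal."""
    p_positive, mu, sigma = params
    return [
        rng.lognormvariate(mu, sigma) if rng.random() < p_positive else 0.0
        for _ in range(n)
    ]


rng = random.Random(0)
# Toy "income" column (made-up data): 70% exact zeros, 30% positive amounts.
observed = [0.0] * 70 + [rng.lognormvariate(8.0, 1.0) for _ in range(30)]

params = fit_hurdle(observed)
draws = sample_hurdle(params, 10_000, rng)
zero_share = sum(d == 0.0 for d in draws) / len(draws)
```

Modelling the zero mass separately keeps the synthetic share of exact zeros close to the observed share, which a single continuous density would smear into small positive values.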
Installation
pip install microplex
Quick Example
from microplex import Synthesizer
# Initialize
synth = Synthesizer(
    target_vars=["income", "expenditure"],
    condition_vars=["age", "education", "region"],
)
# Fit on training data
synth.fit(training_data, weight_col="weight")
# Generate for new demographics
synthetic = synth.generate(new_demographics)
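The fit-then-generate pattern above can be illustrated without the library. The sketch below is a deliberately simple stand-in, not microplex's method: instead of a normalizing flow it uses a weighted hot-deck, drawing target values from a donor record in the same demographic cell. All names (`fit_cells`, `draw_targets`) and the toy records are hypothetical, and it assumes every new record falls in a cell that has at least one donor.

```python
import random


def fit_cells(records, condition_vars, target_vars, weight_col):
    """Group donor target tuples and their weights by conditioning-variable cell."""
    cells = {}
    for rec in records:
        key = tuple(rec[c] for c in condition_vars)
        donors, weights = cells.setdefault(key, ([], []))
        donors.append(tuple(rec[t] for t in target_vars))
        weights.append(rec[weight_col])
    return cells


def draw_targets(cells, new_records, condition_vars, target_vars, rng):
    """For each new record, copy targets from a weight-proportional donor draw."""
    out = []
    for rec in new_records:
        # Raises KeyError if the cell was never seen during fitting.
        donors, weights = cells[tuple(rec[c] for c in condition_vars)]
        drawn = rng.choices(donors, weights=weights, k=1)[0]
        out.append({**rec, **dict(zip(target_vars, drawn))})
    return out


# Toy data (made up for illustration).
training = [
    {"age": "young", "region": "N", "income": 20_000, "expenditure": 15_000, "weight": 1.0},
    {"age": "young", "region": "N", "income": 0, "expenditure": 8_000, "weight": 3.0},
    {"age": "old", "region": "S", "income": 50_000, "expenditure": 30_000, "weight": 2.0},
]
new_demographics = [{"age": "young", "region": "N"}, {"age": "old", "region": "S"}]

cond, targ = ["age", "region"], ["income", "expenditure"]
cells = fit_cells(training, cond, targ, "weight")
synthetic = draw_targets(cells, new_demographics, cond, targ, random.Random(0))
```

Drawing the target variables jointly from one donor preserves their within-record relationship, which is the same property a joint conditional model aims for; a flow-based model additionally generalizes to cells with few or no donors.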
Use Cases
| Use Case | Description |
|---|---|
| Survey enhancement | Impute income variables from tax data onto census |
| Small area estimation | Reweight synthetic population to county/tract targets |
| Privacy synthesis | Generate synthetic data for public release |
| Data fusion | Combine variables from CPS, ACS, SIPP, admin data |
The microplex Workflow
┌─────────────────────────────────────┐
│            DATA SOURCES             │
├─────────┬─────────┬─────────────────┤
│   CPS   │   ACS   │   Admin Data    │
│ income  │   geo   │   validation    │
│   tax   │ housing │     targets     │
└────┬────┴────┬────┴────────┬────────┘
     │         │             │
     ▼         ▼             │
┌─────────────────────────┐  │
│     CONDITIONAL MAF     │  │
│  P(targets | context)   │  │
│                         │  │
│  • Zero-inflation       │  │
│  • Per-variable models  │  │
└───────────┬─────────────┘  │
            │                │
            ▼                │
┌─────────────────────────┐  │
│       SYNTHESIZE        │  │
│       POPULATION        │  │
└───────────┬─────────────┘  │
            │                │
            ▼                ▼
┌─────────────────────────────────────────┐
│           SPARSE REWEIGHTING            │
│                                         │
│  min ||w||₀ s.t. Σ wᵢxᵢ = targets       │
│                                         │
│  • Match population margins             │
│  • Any geography (state/county/tract)   │
│  • Minimal record subset                │
└───────────────────┬─────────────────────┘
                    │
                    ▼
┌─────────────────────────────────────────┐
│          CALIBRATED MICRODATA           │
│                                         │
│  Rich population with:                  │
│  • All variables from all sources       │
│  • Matches official statistics          │
│  • Any geographic granularity           │
└─────────────────────────────────────────┘
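The sparse reweighting step in the diagram states an L0 objective, which is combinatorial in general. As a rough illustration of the idea, the sketch below solves a common convex surrogate instead: non-negative weights with an L1 penalty, minimised by proximal gradient descent. It is not microplex's solver. The function name `sparse_reweight`, the penalty value, and the toy records are assumptions, and the features are pre-scaled to similar magnitudes because plain gradient steps converge slowly on badly scaled inputs.

```python
def sparse_reweight(X, targets, l1_penalty=0.1, n_iter=20_000):
    """Minimise 0.5 * ||X^T w - targets||^2 + l1_penalty * sum(w) subject to w >= 0."""
    n, k = len(X), len(targets)
    # The sum of squared entries bounds the largest eigenvalue of X X^T,
    # so its reciprocal is a safe (conservative) step size.
    step = 1.0 / sum(x * x for row in X for x in row)
    w = [0.0] * n
    for _ in range(n_iter):
        # Residual of each weighted total against its target.
        resid = [sum(w[i] * X[i][j] for i in range(n)) - targets[j] for j in range(k)]
        for i in range(n):
            grad = sum(X[i][j] * resid[j] for j in range(k))
            # Gradient step plus L1 shrinkage, then clamp at zero.
            w[i] = max(0.0, w[i] - step * (grad + l1_penalty))
    return w


# Toy records (made up): columns are (is_young, income in scaled units).
X = [
    [1, 0.2],
    [1, 1.0],
    [0, 0.6],
    [0, 1.6],
]
targets = [40.0, 72.0]  # young head count, total scaled income

weights = sparse_reweight(X, targets)
fitted = [sum(w * row[j] for w, row in zip(weights, X)) for j in range(len(targets))]
kept = [i for i, w in enumerate(weights) if w > 0]
```

In this toy problem the penalty drives two of the four weights to exactly zero while the remaining two reproduce both targets to within a small bias that shrinks as `l1_penalty` is reduced. Note that an L1 penalty stops encouraging sparsity once a total-population target fixes the sum of the weights, which is one reason an L0-style objective is attractive for calibration.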
Comparison to Alternatives
| Feature | microplex | synthpop |
|---|---|---|
| Conditional generation | ✅ | ❌ |
| Zero-inflation handling | ✅ | ⚠️ |
| Sparse reweighting | ✅ | ❌ |
| Multi-source fusion | ✅ | ⚠️ |
| Multiple synthesis methods | ✅ (QRF, QDNN, MAF) | ✅ (CART) |