microplex#

Microdata synthesis and reweighting using normalizing flows.

microplex creates rich, calibrated microdata by combining synthesis and reweighting.

Key Features#

  • Conditional synthesis: Generate target variables given demographics

  • Sparse reweighting: L0/L1 optimization to match population targets

  • Multi-source fusion: Combine CPS, ACS, admin data into one population

  • Zero-inflation handling: Built-in support for variables with many zeros

  • Scalable: Reweight to any geography

Installation#

pip install microplex

Quick Example#

from microplex import Synthesizer

# Initialize
synth = Synthesizer(
    target_vars=["income", "expenditure"],
    condition_vars=["age", "education", "region"],
)

# Fit on training data
synth.fit(training_data, weight_col="weight")

# Generate for new demographics
synthetic = synth.generate(new_demographics)

Use Cases#

Use Case

Description

Survey enhancement

Impute income variables from tax data onto census

Small area estimation

Reweight synthetic population to county/tract targets

Privacy synthesis

Generate synthetic data for public release

Data fusion

Combine variables from CPS, ACS, SIPP, admin data

The microplex Workflow#

                         ┌─────────────────────────────────────┐
                         │           DATA SOURCES              │
                         ├─────────┬─────────┬─────────────────┤
                         │   CPS   │   ACS   │   Admin Data    │
                         │ income  │  geo    │   validation    │
                         │  tax    │ housing │    targets      │
                         └────┬────┴────┬────┴────────┬────────┘
                              │         │             │
                              ▼         ▼             │
                    ┌─────────────────────────┐       │
                    │    CONDITIONAL MAF      │       │
                    │  P(targets | context)   │       │
                    │                         │       │
                    │  • Zero-inflation       │       │
                    │  • Per-variable models  │       │
                    └───────────┬─────────────┘       │
                                │                     │
                                ▼                     │
                    ┌─────────────────────────┐       │
                    │    SYNTHESIZE           │       │
                    │    POPULATION           │       │
                    └───────────┬─────────────┘       │
                                │                     │
                                ▼                     ▼
                    ┌─────────────────────────────────────────┐
                    │         SPARSE REWEIGHTING              │
                    │                                         │
                    │   min ||w||₀  s.t.  Σ wᵢxᵢ = targets   │
                    │                                         │
                    │   • Match population margins            │
                    │   • Any geography (state/county/tract)  │
                    │   • Minimal record subset               │
                    └───────────────────┬─────────────────────┘
                                        │
                                        ▼
                    ┌─────────────────────────────────────────┐
                    │         CALIBRATED MICRODATA            │
                    │                                         │
                    │   Rich population with:                 │
                    │   • All variables from all sources      │
                    │   • Matches official statistics         │
                    │   • Any geographic granularity          │
                    └─────────────────────────────────────────┘

Comparison to Alternatives#

Feature

microplex

synthpop

Conditional generation

Zero-inflation handling

⚠️

Sparse reweighting

Multi-source fusion

⚠️

Multiple synthesis methods

✅ (QRF, QDNN, MAF)

✅ (CART)

Contents#