AI Assistance for Pseudopeople: GPTs for configuration dicts

Over the last year, I’ve been hard at work making simulated data. I love making simulated data, and I finally put up a minimal blog post about it (https://healthyalgorithms.com/2023/11/19/introducing-pseudopeople-simulated-person-data-in-python/).

I have a persistent challenge when I use pseudopeople in my work: configuring the noise requires a deeply nested python dictionary, and I can never remember what goes in it.
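To give a flavor of what I mean, here is a sketch of the sort of nesting involved. The specific key names below (dataset, then noise category, then column, then noise type, then parameters) are from memory, so treat them as illustrative and check the pseudopeople docs for the exact names:

```python
# Sketch of a pseudopeople noise-configuration dict; the key names here
# (dataset -> noise category -> column -> noise type -> parameter) are
# illustrative rather than authoritative.
config = {
    "decennial_census": {
        "column_noise": {
            "first_name": {
                "make_typos": {"cell_probability": 0.05},
            },
        },
    },
}

# It would then be passed in along the lines of:
#   psp.generate_decennial_census(config=config)
```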

After reading a recent dispatch from Simon Willison, I thought that maybe the new “GPTs” affordances from OpenAI could help me deal with this. I’m very optimistic about the potential of AI assistance for data science work.

And with just a short time of messing around, I have something I’m pretty happy with:
https://chat.openai.com/g/g-7e9Dfx1fv-pseudopeople-config-wizard

If you try it out and want to confirm that your custom config works, here is a Google Colab that you can use to test it out: https://colab.research.google.com/drive/1UG38OZigDwBy4zNJHo5fZ752LdalQ7Bw?usp=sharing

Comments Off on AI Assistance for Pseudopeople: GPTs for configuration dicts

Filed under census, software engineering

Introducing Pseudopeople: simulated person data in python

I’m still settling back into blogging as a custom, so perhaps that is why it has taken me six months to think of announcing our new python package here! Without further ado, let me introduce you to pseudopeople.

It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.

To see it for yourself, here is a three-line quickstart, suitable for using in a Google Colab or a Jupyter Notebook:

!pip install pseudopeople

import pseudopeople as psp
psp.generate_decennial_census()

Enjoy!

3 Comments

Filed under census, simulation

How to move data from KoboToolbox to SmartVA-Analyze

Barbara Muffoletto and I figured out how to export verbal autopsy data from KoboToolbox in a format suitable for running through SmartVA-Analyze. It was not too hard, but it was not too easy, either!

She recorded a 4.5-minute video of how to do it, so it will be easier for others in the future, and I share it with you here:

I hope everyone who needs this finds it!

Comments Off on How to move data from KoboToolbox to SmartVA-Analyze

Filed under global health, videos

New in peer reviewing: did you use a chatbot?

I haven’t seen a question like this before today. I wonder what the answers have been like.

Comments Off on New in peer reviewing: did you use a chatbot?

Filed under Uncategorized

testing

does this blog still work?

4 Comments

Filed under Uncategorized

Three cheers for pdb

I’ve been appreciating the Python debugger lately, and I want everyone who does data science work in Python to have a chance to appreciate it, too.
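As a tiny taste, here is a scripted pdb session. It drives the `Pdb` class with a canned command stream so the example runs non-interactively; in real life you would just call `breakpoint()` (or run `python -m pdb script.py`) and type the commands at the `(Pdb)` prompt yourself:

```python
import io
import pdb

def divide(a, b):
    # A tiny function to stop inside
    return a / b

# Feed pdb a canned command stream so this example runs without a human:
# `args` prints the current function's arguments, `continue` resumes.
commands = io.StringIO("args\ncontinue\n")
log = io.StringIO()
debugger = pdb.Pdb(stdin=commands, stdout=log)
result = debugger.runcall(divide, 6, 3)  # stops at the first line of divide()
print(log.getvalue())  # the (Pdb) prompt plus the output of `args`
```

The same `args`/`continue` commands (plus `next`, `step`, `p expr`, and friends) are what you would type interactively.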

When I went looking for a good place to refer colleagues who want to learn this tool, I decided this extensive tutorial might be best: https://realpython.com/python-debugging-pdb/ But if you have something else you think they could use, please let me know.

I thought that there was a Software Carpentry lesson on this as well, but I’m not sure where the definitive source is (perhaps this is close?). I also think Wes McKinney’s book has a very practical section on it, but I can’t confirm that without a trip to my office (it looks like his new book will have it!).

Also, I now use pdb++ as much as I can, and writing this inspired me to read its docs about some cool features I haven’t used yet, so I may soon appreciate debugging in Python even more.

Comments Off on Three cheers for pdb

Filed under Uncategorized

Mixed Effects Modeling in Python: country-level random effects with Bambi

A paper I helped with is now in print, Comfort et al, Association Between Subnational Vaccine Coverage, Migration, and Incident Cases of Measles, Mumps, and Rubella in Iraq, 2001–2016.

Figure 1. (A) Measles incidence per 100,000 persons in Iraq by governorate, 2001–2016


It is a good chance to test out a new python package for regression modeling that I have been excited about, the BAyesian Model-Building Interface (Bambi).

In the past, it has sometimes been too much work to include random effects in a regression model in Python. The heart of the methods section in this paper, for example, is this: “In our linear mixed effects regression model, we set vaccine coverage as the independent variable and disease incidence per 100,000 as the dependent variable. We also included governorate as a random effect term to control for any correlation in incidence within governorates.”

This sort of model is standard enough that there are multiple R and Stata methods to apply it in one line once you have your data all prepped and loaded. But before Bambi, I didn’t know an easy way to do it in Python. I could code up a custom PyMC model, but that seems like more work than it should be.

So I was pleasantly surprised when I went to replicate some of Haley’s findings that the Bambi code to do this is super-simple:

import bambi as bmb

model = bmb.Model("incidence ~ lagged_coverage + (1|gov)", df, dropna=True)
results = model.fit()
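In case the formula syntax is new to you, `incidence ~ lagged_coverage + (1|gov)` specifies (roughly) a random-intercept regression:

```latex
y_{ij} = \beta_0 + \beta_1 x_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

where \(y_{ij}\) is incidence for observation \(i\) in governorate \(j\), \(x_{ij}\) is lagged vaccine coverage, and \(u_j\) is the governorate-level random intercept.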

If you want to run it yourself, here is an IPython Notebook that includes the data wrangling and shows that code in context: https://gist.github.com/aflaxman/3fad36937c2d082abf99314061e16db1

Comments Off on Mixed Effects Modeling in Python: country-level random effects with Bambi

Filed under Uncategorized

Using “potentials” in Bambi to complicate a regression model

I have had my eye on a python package called Bambi for a while now, because I often need a regression model that is a little more complicated than sklearn.linear_model.LinearRegression but not complicated enough to make a whole new PyMC model.

Here is a minimal example (adapted from Awesome Open Source):

import bambi as bmb
import numpy as np, pandas as pd, arviz as az

data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

model = bmb.Model("y ~ x1 + x2", data)
results = model.fit()
az.summary(results)

One cool thing about Bambi is that while it is simpler than writing a whole new PyMC model, it is a lot like writing a PyMC model. For example, if I need to add an informative prior, that is pretty easy:

priors = {"x1": bmb.Prior("Uniform", lower=0, upper=0.05)}
model = bmb.Model("y ~ x1 + x2", data, priors=priors)
results = model.fit()
az.summary(results)

And if I need a more complex distribution on that prior, Bambi exposes a “potential” parameter that puts additional terms in the posterior distribution, just like PyMC:

potentials = [
    (("x1", "x2"),
     lambda x1, x2: bmb.math.switch(x1 + x2 <= 0.0, 0, -999)),
]

model = bmb.Model("y ~ x1 + x2", data, potentials=potentials)
results = model.fit()
az.summary(results)

I’m guessing that the syntax will continue to evolve, which is just one more reason Bambi is a python package that I am going to continue to watch.

Comments Off on Using “potentials” in Bambi to complicate a regression model

Filed under Uncategorized

Our nine phase approach to building a public health simulation

We wrote this up for a conference, but it didn’t have proceedings, so I’m putting it online here: Christine Allen, James Collins, Zane Rankin, Kate Wilson, Derrick Tsoi, Kelly Compton, Enabling Model Complexity Through an Improved Workflow, presented at Modeling World Systems Conference, Washington DC, May 13-15, 2019.


(The paper refers to eight phases of model development, but there is a ninth phase, which Christine wanted to keep secret: celebrate the successful completion of a modeling project.)

Comments Off on Our nine phase approach to building a public health simulation

Filed under Uncategorized

Creative nonfiction and Professor Gloxman

I published some creative nonfiction about Census data and re-identification attacks as a Twitter thread, and I’m linking to it here so I can find it again more easily:

Chapter 1: https://twitter.com/healthyalgo/status/1388194166555897856

Chapter 2: https://twitter.com/healthyalgo/status/1388285589259051011

Chapter 3: https://twitter.com/healthyalgo/status/1388524419866193926

Chapter 4: https://twitter.com/healthyalgo/status/1388887577747333132

Epilogue: https://twitter.com/healthyalgo/status/1389245174916665355

Comments Off on Creative nonfiction and Professor Gloxman

Filed under Uncategorized