AI and Intro Epidemic Models: Navigating the New Frontier of Education

Last June, I happened to attend an ACM Tech Talk about LLMs in Intro Programming, which left me very optimistic about the prospects of AI-assisted programming for my Introduction to Epidemic Modeling course.

I read the book that the tech talk speakers were writing and decided that it was not really what my epi students needed. But it left me hopeful that someone is working on the book they do need, too.

In case no one writes it soon, I’ve also been trying to teach myself how to use AI for disease modeling and data science tasks. I just wrapped up my disease modeling course for the quarter, though, and I did not figure things out in time to teach my students anything useful.

In my copious spare time since I finished lecturing, I’ve been using ChatGPT to solve Advent of Code challenges, and it has been a good education. I have a mental model of the output of a language model as the Platonic ideal of Bullshit (in the philosophical sense), and using it to solve carefully crafted coding challenges is a bit like trying to get an eager high school intern to help with my research.

Here is an example chat from my effort to solve the challenge from Day 2, which is pretty typical for how things have gone for me:

The text it generates is easy to read and well formatted. Unfortunately, it includes code that usually doesn’t work:

It might not work, it might be BS (in the philosophical sense), but it might still be useful! I left Zingaro and Porter’s talk convinced that AI-assisted programmers are going to need to build super skills in testing and debugging, and this last week of self-study has reinforced my belief.

As luck would have it, I was able to attend another (somewhat) relevant ACM Talk this week, titled “Unpredictable Black Boxes are Terrible Interfaces”. It was not as optimistic as the intro programming one, but it did get me thinking about how useful dialog is when working with eager interns. It is very important that humans feel comfortable saying they don’t understand and asking clarifying questions. I have trouble getting interns to contribute to my research when they are afraid to ask questions. If I understand correctly, Agrawala’s preferred interface for Generative AIs would be a system that asked clarifying questions before generating an image from his prompt. It turns out that I have seen a recipe for that:

I am going to try the next week of AoC with this Flipped Interaction Pattern. Here is my prompt, which is a work in progress, and here is my GPT, if you want to give it a try, too.
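(For anyone who hasn’t run into the Flipped Interaction Pattern before: the idea is to instruct the model to ask you questions before it answers, rather than answering immediately. A generic version, not my actual prompt, might read something like: “Before writing any code, ask me one clarifying question at a time about the puzzle and its input format; only propose a solution once I tell you that you have asked enough.”)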


Filed under disease modeling, education

AI Assistance for Pseudopeople: GPTs for configuration dicts

Over the last year, I’ve been hard at work making simulated data. I love making simulated data, and I finally put up a minimal blog post about it (https://healthyalgorithms.com/2023/11/19/introducing-pseudopeople-simulated-person-data-in-python/).

I have a persistent challenge when I use pseudopeople in my work: configuring the noise requires a deeply nested Python dictionary, and I can never remember what goes in it.
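To give a flavor of what I mean, here is roughly what a config that bumps up the typo rate in one census column looks like. (This sketch is from memory, so double-check the exact dataset, column, and noise-type keys against the pseudopeople docs before trusting it.)

import pseudopeople as psp

# roughly: dataset -> column_noise -> column -> noise type -> parameters
config = {
    "decennial_census": {
        "column_noise": {
            "first_name": {
                "make_typos": {"cell_probability": 0.05},
            },
        },
    },
}
df = psp.generate_decennial_census(config=config)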

After reading a recent dispatch from Simon Willison, I thought that maybe the new “GPTs” affordances from OpenAI could help me deal with this. I’m very optimistic about the potential of AI assistance for data science work.

And with just a short time of messing around, I have something I’m pretty happy with:
https://chat.openai.com/g/g-7e9Dfx1fv-pseudopeople-config-wizard

If you try it out and want to confirm that your custom config works, here is a Google Colab that you can use to test it out: https://colab.research.google.com/drive/1UG38OZigDwBy4zNJHo5fZ752LdalQ7Bw?usp=sharing


Filed under census, software engineering

Introducing Pseudopeople: simulated person data in python

I’m still settling back into the habit of blogging, so perhaps that is why it has taken me six months to think of announcing our new Python package here! Without further ado, let me introduce you to pseudopeople.

It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.

To see it for yourself, here is a three-line quickstart, suitable for use in a Google Colab or a Jupyter Notebook:

!pip install pseudopeople

import pseudopeople as psp
psp.generate_decennial_census()
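If you capture the result, it comes back as an ordinary pandas DataFrame, so all the usual pandas tools apply:

df = psp.generate_decennial_census()
print(df.shape)
df.head()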

Enjoy!


Filed under census, simulation

How to move data from KoboToolbox to SmartVA-Analyze

Barbara Muffoletto and I figured out how to export verbal autopsy data from KoboToolbox in a format suitable for running through SmartVA-Analyze. It was not too hard, but it was not too easy, either!

She recorded a 4.5-minute video of how to do it, so that it will be easier for others in the future; I share it with you here:

I hope everyone who needs this finds it!


Filed under global health, videos

New in peer reviewing: did you use a chatbot?

I haven’t seen a question like this before today. I wonder what the answers have been like.


Filed under Uncategorized

testing

does this blog still work?


Filed under Uncategorized

Three cheers for pdb

I’ve been appreciating the Python debugger lately, and I want everyone who does data science work in Python to have a chance to appreciate it, too.
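For anyone who has never tried it, here is the sort of minimal example I have in mind (a sketch of my own, not taken from any particular tutorial): drop a breakpoint() call wherever things get confusing, and Python pauses there and hands you an interactive prompt.

def normalize(values):
    total = sum(values)
    breakpoint()  # pauses here; try p total, n to step, c to continue
    return [v / total for v in values]

print(normalize([1, 2, 3]))

You can also run a whole script under the debugger with python -m pdb myscript.py, which drops into post-mortem debugging if the script raises an uncaught exception.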

When I went looking for a good place to refer colleagues who want to learn this tool, I decided this extensive tutorial might be best: https://realpython.com/python-debugging-pdb/ But if you have something else you think they could use, please let me know.

I thought there was a Software Carpentry lesson on this as well, but I’m not sure where the definitive source is (perhaps this is close?). I also think Wes McKinney’s book has a very practical section on it, but I can’t confirm that without a trip to my office (it looks like his new book will have it!).

Also, I use pdb++ as much as I can now, and this inspired me to read the docs about some cool things it does that I haven’t used yet, so maybe I’m going to appreciate debugging in Python even more soon.


Filed under Uncategorized

Mixed Effects Modeling in Python: country-level random effects with Bambi

A paper I helped with is now in print: Comfort et al., “Association Between Subnational Vaccine Coverage, Migration, and Incident Cases of Measles, Mumps, and Rubella in Iraq, 2001–2016.”

Figure 1. (A) Measles incidence per 100,000 persons in Iraq by governorate, 2001–2016


It is a good chance to test out a new Python package for regression modeling that I have been excited about, the BAyesian Model-Building Interface (Bambi).

In the past, it has sometimes been too much work to include random effects in a regression model in Python. The heart of the methods section in this paper, for example, is this: “In our linear mixed effects regression model, we set vaccine coverage as the independent variable and disease incidence per 100,000 as the dependent variable. We also included governorate as a random effect term to control for any correlation in incidence within governorates.”

This sort of model is standard enough that there are multiple R and Stata methods to apply it in one line once you have your data all prepped and loaded. But before Bambi, I didn’t know an easy way to do it in Python. I could code up a custom PyMC model, but that seems like more work than it should be.

So I was pleasantly surprised, when I went to replicate some of Haley’s findings, that the Bambi code to do this is super simple:

model = bmb.Model("incidence ~ lagged_coverage + (1|gov)", df, dropna=True)
results = model.fit()
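To see the fitted coefficients, including the governorate-level random effect, the usual ArviZ summary works on the result (assuming arviz is installed):

import arviz as az

az.summary(results)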

If you want to run it yourself, here is an IPython Notebook that includes the data wrangling and shows that code in context: https://gist.github.com/aflaxman/3fad36937c2d082abf99314061e16db1


Filed under Uncategorized

Using “potentials” in Bambi to complicate a regression model

I have had my eye on a Python package called Bambi for a while now, because I often need a regression model that is a little more complicated than sklearn.linear_model.LinearRegression but not complicated enough to justify a whole new PyMC model.

Here is a minimal example (adapted from Awesome Open Source):

import bambi as bmb
import numpy as np, pandas as pd, arviz as az

data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

model = bmb.Model("y ~ x1 + x2", data)
results = model.fit()
az.summary(results)

One cool thing about Bambi is that while it is simpler than writing a whole new PyMC model, it is a lot like writing a PyMC model. For example, if I need to add an informative prior, that is pretty easy:

priors = {'x1': bmb.Prior("Uniform", lower=0, upper=.05)}
model = bmb.Model("y ~ x1 + x2", data,
                 priors=priors)
results = model.fit()
az.summary(results)

And if I need a more complex distribution on that prior, Bambi exposes a “potentials” parameter that puts additional terms in the posterior distribution, just like PyMC:

potentials = [
    (("x1", "x2"),
     lambda x1, x2: bmb.math.switch(x1 + x2 <= 0.0, 0, -999)),
]

model = bmb.Model("y ~ x1 + x2", data,
                 potentials=potentials)
results = model.fit()
az.summary(results)

I’m guessing that the syntax will continue to evolve, which is just one more reason Bambi is a Python package that I am going to continue to watch.


Filed under Uncategorized

Our nine phase approach to building a public health simulation

We wrote this up for a conference, but it didn’t have proceedings, so I’m putting it online here: Christine Allen, James Collins, Zane Rankin, Kate Wilson, Derrick Tsoi, Kelly Compton, “Enabling Model Complexity Through an Improved Workflow,” presented at the Modeling World Systems Conference, Washington, DC, May 13–15, 2019.


(The paper refers to eight phases of model development, but there is a ninth phase, which Christine wanted to keep secret: celebrate the successful completion of a modeling project.)


Filed under Uncategorized