Five areas of concern regarding AI in classrooms

When I was preparing to teach my fall course, I was worried about AI cheaters, and whether my lazy approach to getting students to do the reading would be totally outdated. I came up with an “AI statement” for my syllabus saying that students may use AI, but they have to tell me how they used it, and they have to take responsibility for the text they turn in, even if an AI helped generate it.

Now that the fall quarter has come and gone, it seems like a good time to reflect. A third of the UW School of Public Health courses last fall had AI statements: 15 said “do not use” and 30 allowed some form of use (such as “use with permission” or “use with disclosure”).

In hindsight, AI cheating was not the thing I should have been worrying about. Here are five areas of concern that I learned about from my students and colleagues, and that I will be paying more attention to next time around:

1. Access and equity – the “pay to play” state of the technology poses a real risk. How shall we guard against a new digital divide between those who have access to state-of-the-art AI and those who do not? IHME provides ChatGPT-4 to all staff, but only the Health Metrics Sciences students who have an IHME Research Assistantship can use it; as far as I can tell, the Epi Department students all have to buy their own access. The University of Michigan appears to be solving this; are other schools?


“When I speak in front of groups and ask them to raise their hands if they used the free version of ChatGPT, almost every hand goes up. When I ask the same group how many use GPT-4, almost no one raises their hand. I increasingly think the decision of OpenAI to make the “bad” AI free is causing people to miss why AI seems like such a huge deal to a minority of people that use advanced systems and elicits a shrug from everyone else.” —Ethan Mollick

2. Interfering with the “novice-to-expert” progression – will we no longer have expert disease modelers, because novice disease modelers who rely on AI never progress beyond novice-level modeling?

3. Environmental impact – what does running a language model cost in terms of energy consumption? Is it worth the impact?

4. Implicit bias – language models repeat and reinforce systems of oppression present in training data.  How can we guard against this harming society?

5. Privacy and confidentiality – everything we type into an online system might be used as “training data” for future systems.  What are the risks of this practice, and how can we act responsibly?


Filed under education

AI and Intro Epidemic Models: Navigating the New Frontier of Education

Last June, I happened to attend an ACM Tech Talk about LLMs in Intro Programming which left me very optimistic about the prospects of AI-assisted programming for my Introduction to Epidemic Modeling course.

I read the book that the tech talk speakers were writing and decided that it was not really what my epi students needed. But it left me hopeful that someone is working on the book my epi students do need, too.

In case no one writes it soon, I’ve also been trying to teach myself how to use AI to do disease modeling and data science tasks. I just wrapped up my disease modeling course for the quarter, though, and I did not figure it out in time to teach my students anything useful.

In my copious spare time since I finished lecturing, I’ve been using ChatGPT to solve Advent of Code challenges, and it has been a good education. I have a mental model of the output of a language model as the Platonic ideal of Bullshit (in the philosophical sense), and using it to solve carefully crafted coding challenges is a bit like trying to get an eager high school intern to help with my research.

Here is an example chat from my effort to solve the challenge from Day 2, which is pretty typical for how things have gone for me:

The text it generates is easy to read and well formatted. Unfortunately, it includes code that usually doesn’t work:

It might not work, it might be BS (in the philosophical sense), but it might still be useful! I left Zingaro and Porter’s talk convinced that AI-assisted programmers are going to need to build super skills in testing and debugging, and this last week of self-study has reinforced my belief.

As luck would have it, I was able to attend another (somewhat) relevant ACM Talk this week, titled “Unpredictable Black Boxes are Terrible Interfaces”. It was not as optimistic as the intro programming one, but it did get me thinking about how useful dialog is when working with eager interns. It is very important that humans feel comfortable saying they don’t understand and asking clarifying questions. I have trouble getting interns to contribute to my research when they are afraid to ask questions. If I understand correctly, Agrawala’s preferred interface for Generative AIs would be a system that asked clarifying questions before generating an image from his prompt. It turns out that I have seen a recipe for that:

I am going to try the next week of AoC with this Flipped Interaction Pattern. Here is my prompt, which is a work in progress, and here is my GPT, if you want to give it a try, too.
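The gist of the prompt is something like this (a paraphrase of the flipped-interaction idea rather than my exact wording):

You are going to help me solve an Advent of Code puzzle. Do not write any code yet. Instead, ask me clarifying questions, one at a time, until you are confident that you understand the input format, the expected output, and the tricky edge cases. Only then propose a solution, along with a small test I can run to check it.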


Filed under disease modeling, education

AI Assistance for Pseudopeople: GPTs for configuration dicts

Over the last year, I’ve been hard at work making simulated data. I love making simulated data, and I finally put up a minimal blog post about it (https://healthyalgorithms.com/2023/11/19/introducing-pseudopeople-simulated-person-data-in-python/).

I have a persistent challenge when I use pseudopeople in my work: configuring the noise requires a deeply nested python dictionary, and I can never remember what goes in it.
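To give a flavor of the challenge, the config is shaped roughly like this (a sketch from memory; treat the specific dataset, column, and noise-type keys as assumptions to check against the pseudopeople docs):

# a sketch from memory -- the exact keys here are illustrative
# assumptions, not the definitive pseudopeople schema
config = {
    "decennial_census": {          # dataset
        "column_noise": {          # cell-level noise
            "first_name": {        # column
                "make_typos": {    # noise type
                    "cell_probability": 0.05,
                },
            },
        },
    },
}

import pseudopeople as psp
df = psp.generate_decennial_census(config=config)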

After reading a recent dispatch from Simon Willison, I thought that maybe the new “GPTs” affordances from OpenAI could help me deal with this. I’m very optimistic about the potential of AI assistance for data science work.

And with just a short time of messing around, I have something I’m pretty happy with:
https://chat.openai.com/g/g-7e9Dfx1fv-pseudopeople-config-wizard

If you try it out and want to confirm that your custom config works, here is a Google Colab that you can use to test it out: https://colab.research.google.com/drive/1UG38OZigDwBy4zNJHo5fZ752LdalQ7Bw?usp=sharing


Filed under census, software engineering

Introducing Pseudopeople: simulated person data in python

I’m still settling back into blogging as a custom, so perhaps that is why it has taken me six months to think of announcing our new python package here! Without further ado, let me introduce you to pseudopeople.

It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.

To see it for yourself, here is a three-line quickstart, suitable for using in a Google Colab or a Jupyter Notebook:

!pip install pseudopeople

import pseudopeople as psp

# returns a pandas DataFrame of simulated census records
psp.generate_decennial_census()
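If I am remembering the docs correctly, there are analogous generators for the other simulated datasets, too; for example (treat the exact function name as an assumption and check the docs):

psp.generate_american_community_survey()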

Enjoy!


Filed under census, simulation

How to move data from KoboToolbox to SmartVA-Analyze

Barbara Muffoletto and I figured out how to export verbal autopsy data from KoboToolbox in a format suitable for running through SmartVA-Analyze. It was not too hard, but it was not too easy, either!

She recorded a 4.5-minute video of how to do it, so that it will be easier for others in the future, which I share with you here:

I hope everyone who needs this finds it!


Filed under global health, videos

New in peer reviewing: did you use a chatbot?

I haven’t seen a question like this before today. I wonder what the answers have been like.


Filed under Uncategorized

testing

does this blog still work?


Filed under Uncategorized

Three cheers for pdb

I’ve been appreciating the Python debugger lately, and I want everyone who does data science work in Python to have a chance to appreciate it, too.

When I went looking for a good place to refer colleagues who want to learn this tool, I decided this extensive tutorial might be best: https://realpython.com/python-debugging-pdb/ But if you have something else you think they could use, please let me know.
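For colleagues who want the thirty-second version before the full tutorial, here is a minimal sketch (the buggy function is made up for illustration):

import pdb

def mean(values):
    total = 0
    for v in values:
        total += v
    return total / len(values)  # ZeroDivisionError when values is empty

try:
    mean([])
except ZeroDivisionError:
    pdb.post_mortem()  # open the debugger right at the point of failure

The built-in breakpoint() (Python 3.7+) drops you into the same debugger anywhere in your own code; from there, n steps to the next line, p total prints a variable, and c continues.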

I thought that there was a Software Carpentry lesson on this as well, but I’m not sure where the definitive source is (perhaps this is close?), and I think Wes McKinney’s book has a very practical section on it, but I can’t confirm that without a trip to my office (it looks like his new book will have it!).

Also, I use pdb++ as much as I can now, and writing this post inspired me to read its docs about some cool features I haven’t used yet, so I may soon appreciate debugging in Python even more.


Filed under Uncategorized

Mixed Effects Modeling in Python: country-level random effects with Bambi

A paper I helped with is now in print, Comfort et al, Association Between Subnational Vaccine Coverage, Migration, and Incident Cases of Measles, Mumps, and Rubella in Iraq, 2001–2016.

Figure 1. (A) Measles incidence per 100,000 persons in Iraq by governorate, 2001–2016


It gave me a good chance to test out a new python package for regression modeling that I have been excited about: the BAyesian Model-Building Interface (Bambi).

In the past, it has sometimes been too much work to include random effects in a regression model in Python. The heart of the methods section in this paper, for example, is this: “In our linear mixed effects regression model, we set vaccine coverage as the independent variable and disease incidence per 100,000 as the dependent variable. We also included governorate as a random effect term to control for any correlation in incidence within governorates.”
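In symbols (my notation, not the paper’s), that is a random-intercept model:

y_{gt} = \beta_0 + \beta_1 x_{g,t-1} + u_g + \epsilon_{gt}, \qquad u_g \sim N(0, \sigma_u^2), \quad \epsilon_{gt} \sim N(0, \sigma^2),

where y_{gt} is incidence per 100,000 in governorate g in year t, x_{g,t-1} is lagged vaccine coverage, and u_g is the governorate-level random intercept.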

This sort of model is standard enough that there are multiple R and Stata methods to apply it in one line once you have your data all prepped and loaded. But before Bambi, I didn’t know an easy way to do it in Python. I could code up a custom PyMC model, but that seems like more work than it should be.
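To see what I mean, here is a sketch of what the hand-rolled PyMC version of the random-intercept model might look like (my sketch, assuming a PyMC v4+ API and a DataFrame df with incidence, lagged_coverage, and gov columns):

import pymc as pm

# map governorate names to integer indices for the random intercepts
gov_idx, govs = df["gov"].factorize()

with pm.Model():
    beta0 = pm.Normal("beta0", 0, 10)
    beta1 = pm.Normal("beta1", 0, 10)
    sigma_u = pm.HalfNormal("sigma_u", 1)
    u = pm.Normal("u", 0, sigma_u, shape=len(govs))  # governorate effects
    sigma = pm.HalfNormal("sigma", 1)
    mu = beta0 + beta1 * df["lagged_coverage"].values + u[gov_idx]
    pm.Normal("incidence", mu, sigma, observed=df["incidence"].values)
    trace = pm.sample()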

So I was pleasantly surprised when I went to replicate some of Haley’s findings that the Bambi code to do this is super-simple:

model = bmb.Model("incidence ~ lagged_coverage + (1|gov)", df, dropna=True)
results = model.fit()

If you want to run it yourself, here is an IPython Notebook that includes the data wrangling and shows that code in context: https://gist.github.com/aflaxman/3fad36937c2d082abf99314061e16db1


Filed under Uncategorized

Using “potentials” in Bambi to complicate a regression model

I have had my eye on a python package called Bambi for a while now, because I often need a regression model that is a little more complicated than sklearn.linear_model.LinearRegression but not complicated enough to make a whole new PyMC model.

Here is a minimal example (adapted from Awesome Open Source):

import bambi as bmb
import numpy as np, pandas as pd, arviz as az

# fifty rows of pure noise: y has no true relationship to x1 or x2
data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

model = bmb.Model("y ~ x1 + x2", data)
results = model.fit()
az.summary(results)  # posterior means, intervals, and diagnostics

One cool thing about Bambi is that while it is simpler than writing a whole new PyMC model, it is a lot like writing a PyMC model. For example, if I need to add an informative prior, that is pretty easy:

priors = {'x1': bmb.Prior("Uniform", lower=0, upper=0.05)}
model = bmb.Model("y ~ x1 + x2", data, priors=priors)
results = model.fit()
az.summary(results)

And if I need a more complex distribution on that prior, Bambi exposes a “potentials” parameter that puts additional terms in the posterior distribution, just like PyMC. The potential below adds 0 to the log-posterior when x1 + x2 ≤ 0 and subtracts 999 otherwise, acting as a soft constraint that keeps the sampler in the region where x1 + x2 is non-positive:

potentials = [
    (("x1", "x2"),
     lambda x1, x2: bmb.math.switch(x1 + x2 <= 0.0, 0, -999)),
]

model = bmb.Model("y ~ x1 + x2", data,
                 potentials=potentials)
results = model.fit()
az.summary(results)

I’m guessing that the syntax will continue to evolve, which is just one more reason to keep watching Bambi.


Filed under Uncategorized