Monthly Archives: November 2023

AI Assistance for Pseudopeople: GPTs for configuration dicts

Over the last year, I’ve been hard at work making simulated data. I love making simulated data, and I finally put a minimal blog about it up (https://healthyalgorithms.com/2023/11/19/introducing-pseudopeople-simulated-person-data-in-python/)

I have a persistent challenge when I use pseudopeople in my work: configuring the noise requires a deeply nested python dictionary, and I can never remember what goes in it.

After reading a recent dispatch from Simon Willison, I thought that maybe the new “GPTs” affordances from OpenAI could help me deal with this. I’m very optimistic about the potential of AI assistance for data science work.

And with just a short time of messing around, I have something I’m pretty happy with:
https://chat.openai.com/g/g-7e9Dfx1fv-pseudopeople-config-wizard

If you try it out and want to confirm that your custom config works, here is a Google Colab that you can use to test it out: https://colab.research.google.com/drive/1UG38OZigDwBy4zNJHo5fZ752LdalQ7Bw?usp=sharing

Comments Off on AI Assistance for Pseudopeople: GPTs for configuration dicts

Filed under census, software engineering

Introducing Pseudopeople: simulated person data in python

I’m still settling back into blogging as a custom, so perhaps that is why it has taken me six months to think of announcing our new python package here! Without further ado, let me introduce you to pseudopeople.

It is a Python package that generates realistic simulated data about a fictional United States population, designed for use in testing entity resolution methods or other data science algorithms at scale.

To see it for yourself, here is a three-line quickstart, suitable for using in a Google Colab or a Jupyter Notebook:

!pip install pseudopeople

import pseudopeople as psp
psp.generate_decennial_census()

Enjoy!

3 Comments

Filed under census, simulation