Tag Archives: optimization

Convex optimization in Python: Non-negative least squares with Pyomo and Ipopt

Last month I attended a CNSTAT workshop https://sites.nationalacademies.org/DBASSE/CNSTAT/DBASSE_196518 on differentially private (DP) census data and left thinking that I need to get more hands-on experience with non-negative least squares optimization.

In the DP approach to disclosure avoidance that Census Bureau is planning, there is a “post-processing” step that improves the acceptability of imprecise DP counts of the number of individuals in a subgroup, like the voting age males who are non-Hispanic whites in King County census tract 89 http://ihmeuw.org/50cg. This improvement relies on non-negative least squares optimization. It finds fractional counts that sum to a known control total, that are not negative, and that minimize the sum of squared differences between these optimized counts and the DP imprecise counts. Sam Petti and I wrote out the details here, https://gatesopenresearch.org/articles/3-1722/v1 if you want to know more about this.

This blog is not about the DP details, but about how, in practice, I might try doing this sort of optimization. As anyone who has a PhD in Algorithms, Combinatorics, and Optimization can tell you, this sort of constrained convex optimization can be computed efficiently, meaning in time that scales like a polynomial function of the size of the instance. But when I was leaving graduate school, good luck if you wanted to actually solve a specific instance.

So I am quite pleased now to see how open-source software is making it possible today.

Ipopt https://github.com/coin-or/Ipopt#ipopt is an open-source interior point solver that can handle a quadratic objective function with linear equality and inequality constraints. My colleague Brad Bell has been a far of it for a decade, but until now I’ve just used it indirectly through his code.

Pyomo https://github.com/Pyomo/pyomo#pyomo-overview is a pythonic interface for many optimization solvers, including Ipopt, and now that I want to do something without Brad’s help, I’m glad to have it.

Here is a stripped down version of the optimization I want to do

\min \left\{ \sum_{i=1}^I (x_i - c_i)^2 : x_i \geq 0, \sum_{i=1}^I x_i = C\right\}

I’m completely dedicated to the Anaconda python distribution at this point (as setup and used in Software Carpentry), and if you are using this approach you can get everything you need to do such an optimization with the following

conda install -c conda-forge pyomo ipopt

You can test if this is working by executing the following little snippet of code in an IPython shell or Jupyter Notebook:

from pyomo.core import *
from pyomo.opt import SolverFactory

m = ConcreteModel()
m.x = Var([1,2], within=NonNegativeReals)
m.objective = Objective(expr=m.x[1]**2 + m.x[2]**2)
m.constraint = Constraint(expr=m.x[1] + m.x[2] == 1)
    
solver = SolverFactory('ipopt')
results = solver.solve(m)

print('Optimal value of x:',
            (value(m.x[1]), value(m.x[2])))

Here is a challenge, if you like such a thing: use this test and the nonnegative least squares formula above to make a function that takes a list of imprecise counts and a float for a control total, and returns the optimized (fractional) counts for them.

It does take a bit of fiddling to get the pyomo python right, and then there are plenty of options to Ipopt that might make the optimization work better in more complicated cases. I found the book Pyomo — Optimization Modeling in Python useful for the former (and a pdf from a talk entitled Pyomo Tutorial – OSTI.GOV a good substitute before I got my hands on the book). For the latter, I found an extensive webpage of Ipopt options gave me some good hints, especially when combined with web searching, which provided only some small utility on its own.)

It is so cool that this is now possible. Thanks to all who made it so, including some old friends who must have been working on this back when I was a summer intern at IBM.

And for readers who don’t like or don’t have time for the challenge of coding it up themselves (perhaps including myself a day or a year from now), here is a solution:

def nonnegative_optimize(imprecise_counts, control_total):
    """optimize the imprecise counts so that they sum to
    the control total and are non-negative
    
    Parameters
    ----------
    imprecise_counts : list-like of floats
    control_total : float
    
    Results
    -------
    returns optimized_counts, which are close to imprecise
    counts, but not negative, and match control total in
    aggregate
    """
    imprecise_counts = list(imprecise_counts)
    
    model = ConcreteModel()
    model.I = range(len(imprecise_counts))
    model.x = Var(model.I, within=NonNegativeReals)
    model.objective = Objective(
        expr=sum((model.x[i] - imprecise_counts[i])**2
                 for i in model.I))
    model.constraint = Constraint(
        expr=summation(model.x) == control_total)
    
    solver = SolverFactory('ipopt')
    results = solver.solve(model)
    optimized_counts = [value(model.x[i])
                        for i in model.I]
        
    return optimized_counts

p.s. if you want to see what this looks like in action, or for convenience in replication efforts, find a Jupyter Notebook version of the code from this blog here: https://gist.github.com/aflaxman/d7b39fc8cd5805e344a66f9c9d16acf7

Leave a comment

Filed under census

C4G @ GaTech

The Chronicle of Higher Ed has a short piece on public-service applications of computer science that are coming out of a class called Computing for Good (C4G) that TCS star Santosh Vempala co-taught at Georgia Tech last spring.

This is an idea that is emerging in several ACO-related disciplines. Manuela Veloso has been running a similar program at CMU called V-Unit, Karen Smilowitz and Michael Johnson held a session at INFORMS 2007 on community-based operations research, and in 2006 student statisticians started a network of volunteer consultancies called Statistics in the Community.

It’s great to see a tradition of “pro bono” work developing in theoretical fields. It’s not just a way for lawyers to assuage their consciences anymore.

1 Comment

Filed under education, science policy

ACO in Python: Minimum Weight Perfect Matchings (a.k.a. Matching Algorithms and Reproductive Health: Part 4)

This is the final item in my series on Matching Algorithms and Reproductive Health, and it brings the story full circle, returning to the algorithms side of the show. Today I’ll demonstrate how to actually find minimum-weight perfect matchings in Python, and toss in a little story about \pi^2/6. Continue reading

4 Comments

Filed under combinatorial optimization

Matching Algorithms and Reproductive Health: Part 3, A Stylized Virginity Pledge

It’s been three weeks and one IHME retreat since I wrote about matching algorithms and virginity pledges, and I think I now understand what’s going on in Patient Teenagers well enough to describe it. I’ll try to give a stylized example of how the minimum-weight perfect matching algorithm makes itself useful in reproductive health research.

I think it’s helpful to focus on a concrete research question about the virginity pledge and its effects on reproductive health. Here’s one: “does taking the pledge reduce the chances that an individual contracts trichomoniasis?” If the answer is yes, or if the answer is no, people can still argue about the value of the virginity pledge programs, but this seems like relevant information for decision making. Continue reading

2 Comments

Filed under combinatorial optimization, global health

Matching Algorithms and Reproductive Health: Part 2, Matching and Virginity Pledges

I might have been a little over-ambitious with this series. I wrote a little bit about the how matching theory emerged from the social sciences two weeks ago. But then I got really busy! And that was the part I actually knew something about ahead of time. The promised connection between matching algorithms and reproductive health (and more generally, how matching is being used in quasi-experiment design) is the part that I have to do some reading on before I can write knowledgeably about.

However, I have a plan: I’d like to “crowd-source” my library research. Continue reading

4 Comments

Filed under combinatorial optimization, global health

Matching Algorithms and Reproductive Health: Part 1, Matchings Emerge from Social Science

Earlier this week, I was inspired by current events to launch a bold, crazy-sounding series about matching theory and its application to reproductive health.  This first installment is a quick social history of the development of matching theory, largely influenced by (and fact-checked against) Lex Scrijver’s encyclopediac Combinatorial Optimization: Polyhedra and Efficiency.  His paper “On the history of combinatorial optimization (till 1960)” contains similar information in an easy-to-download form.

On to the story:  how social science applications drove the  development of matching theory. Continue reading

6 Comments

Filed under combinatorial optimization, global health

ACO in Python: PADS for Minimum Spanning Trees

Sometimes, instead of working, I like to see what search terms are bringing readers to my blog. The most common search that healthyalgorithms has been most useless for is “minimum spanning tree python”. Today, I’ll remedy that.

But first, dear searchers, consider this: why are you searching for minimum spanning tree code in python? Is it because you have a programming assignment due soon? High-school CS class is voluntary. All college is optional, and many you are paying to attend. You know what I’m talking about? Perhaps the short motivational comic Time Management for Anarchists is better than some Python code.

Still want to know how to do it? Ok, but I warned you.

Continue reading

10 Comments

Filed under combinatorial optimization