I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.

Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!

Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.

Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.

As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut.

First, the **Densest subgraph problem** (bottom of p. 5):

Fix

M;N;D;m;n; anddto be some parameters. The (average case, decision) densest subgraph problem with these parameters is to distinguish between the following two distributionsPandDon (M;N;D) graphs, whereRis obtained by choosing for every top vertexDrandom neighbors on the bottom; andPis obtained by first choosing random hidden subsetsSfrom [N] andTfrom [M] with |S| =nand |T| =m, and then choosingDrandom neighbors for every vertex outside ofT, andD–drandom neighbors for every vertex inT. We then choosedrandom additional neighbors inSfor every vertex inT.

Then the **Densest subgraph assumption** (middle of p. 6) is:

Let (

N;M;D;n;m;d) be such thatN= o(MD), , then there is no and poly-time algorithm that distinguishes between R and P with advantage .

Or, to say the same thing in Python, with a little help from networkx:

import random from networkx import Graph def planted_dense_subgraph(M=1000, N=1000, D=500, m=25, n=25, d=15): """ Generate a bipartite graph with a planted dense subgraph (distribution P) Parameters ---------- M, N, D, m, n, d : int, optional M and N are the sizes of the bipartitions and m and n are the size of the planted node sets. D is the degree of the M-vertices and d is the number of edges from an m-vertex to n-vertices Output ------ G : Graph A bipartite graph, with vertices T_1, ..., T_M and B_1, ..., B_M T_hidden, B_hidden : lists The vertex sets of size m and n that are hidden in the T and B vertices """ T = ['T_%d'%i for i in range(M)] B = ['B_%d'%i for i in range(N)] T_hidden = random.sample(T, m) B_hidden = random.sample(B, n) G = Graph() G.add_nodes_from(T) G.add_nodes_from(B) for t in T: if t in T_hidden: G.add_star([t] + random.sample(B, D-d)) G.add_star([t] + random.sample(B_hidden, d)) else: G.add_star([t] + random.sample(B, D)) return G, T_hidden, B_hidden def random_graph(M=1000, N=1000, D=500): """ Generate a bipartite graph without a planted dense subgraph (distribution R) Parameters ---------- M, N, D : int, optional M and N are the sizes of the bipartitions and D is the degree of the M-vertices Output ------ G : Graph A bipartite graph, with vertices T_1, ..., T_M and B_1, ..., B_M """ T = ['T_%d'%i for i in range(M)] B = ['B_%d'%i for i in range(N)] G = Graph() G.add_nodes_from(T) G.add_nodes_from(B) for t in T: G.add_star([t] + random.sample(B, D)) return G

If I give you the graph produced by of one of these functions, you can’t tell me which function I used with any more accuracy than if you flip a coin to decide.

As the authors say, this is *an assumption*. It could be proven false by a clever algorithm tomorrow.

One direction for such a clever algorithm is the tri-linear form optimization from A new approach to the planted clique problem.

And, in case it’s a pain to get Python/NetworkX running for you, here are some edge lists for the hidden graph and the random graph.

The planted clique is one of my favorite problems too. And a new paper on it is here:

Pingback: New paper on complexity and financial derivatives : Center for Computational Intractability