# Dense-Subset Break-the-Bank Challenge

I’m preparing for my first global travel for global health, but the net is paying attention to a paper that I think I’ll like, and I want to mention it briefly before I fly.

Computational Complexity and Information Asymmetry in Financial Products is 27 pages of serious TCS, but it is so obviously applicable that people outside of our particular ivory tower, and even outside of academia entirely are blogging and twittering about it, and even reading it!

Freedom to Tinker has a nice summary of this paper, if you want to know what it’s about in a hurry.

Mike Trick makes the salient observation that NP-hard doesn’t mean computers can’t do it. But the assumption that this paper is based on is not about worst-case complexity; it is, as it should be, based on an assumption about the average-case complexity of a particular optimization problem over a particular distribution.

As it turns out, this is an average-case combinatorial optimization problem that I know and love, the densest subgraph problem. My plan is to repeat the problem here, and share some Python code for generating instances of it. Then, you, me, and everyone, can have a handy instance to try optimizing. I think that this problem is pretty hard, on average, but there is a lot more chance of making progress on an algorithm for it than for cracking the P versus NP nut.

First, the Densest subgraph problem (bottom of p. 5):

Fix M; N; D; m; n; and d to be some parameters. The (average case, decision) densest subgraph problem with these parameters is to distinguish between the following two distributions P and D on (M;N;D) graphs, where R is obtained by choosing for every top vertex D random neighbors on the bottom; and P is obtained by first choosing random hidden subsets S from [N] and T from [M] with |S| = n and |T| = m, and then choosing D random neighbors for every vertex outside of T, and Dd random neighbors for every vertex in T. We then choose d random additional neighbors in S for every vertex in T.

Then the Densest subgraph assumption (middle of p. 6) is:

Let (N; M; D; n; m; d) be such that N = o(MD), $(m d^2/n)^2 = o(MD^2/N)$, then there is no $\epsilon > 0$ and poly-time algorithm that distinguishes between R and P with advantage $\epsilon$.

Or, to say the same thing in Python, with a little help from networkx:

import random
from networkx import Graph

def planted_dense_subgraph(M=1000, N=1000, D=500, m=25, n=25, d=15):
""" Generate a bipartite graph with a planted dense subgraph
(distribution P)

Parameters
----------
M, N, D, m, n, d : int, optional
M and N are the sizes of the bipartitions and m and n are the
size of the planted node sets.  D is the degree of the M-vertices
and d is the number of edges from an m-vertex to n-vertices

Output
------
G : Graph
A bipartite graph, with vertices T_1, ..., T_M and B_1, ..., B_M
T_hidden, B_hidden : lists
The vertex sets of size m and n that are hidden in the T and B
vertices
"""

T = ['T_%d'%i for i in range(M)]
B = ['B_%d'%i for i in range(N)]

T_hidden = random.sample(T, m)
B_hidden = random.sample(B, n)

G = Graph()

for t in T:
if t in T_hidden:
else:

return G, T_hidden, B_hidden

def random_graph(M=1000, N=1000, D=500):
""" Generate a bipartite graph without a planted dense subgraph
(distribution R)

Parameters
----------
M, N, D : int, optional

M and N are the sizes of the bipartitions and D is the degree of
the M-vertices

Output
------
G : Graph
A bipartite graph, with vertices T_1, ..., T_M and B_1, ..., B_M
"""

T = ['T_%d'%i for i in range(M)]
B = ['B_%d'%i for i in range(N)]

G = Graph()

for t in T:

return G


If I give you the graph produced by of one of these functions, you can’t tell me which function I used with any more accuracy than if you flip a coin to decide.

As the authors say, this is an assumption. It could be proven false by a clever algorithm tomorrow.