August is Too-Many-Projects Month

(Tap… tap… tap… is this thing on? Good.)

July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don’t like biking on the side of a highway all day. Don’t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.

But now that I’m back, August is turning out to be project month. I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon. But one mixed blessing about these applications is that people actually want to see the results, like, yesterday! So first I have to deal with the results, and then I can write papers and blogs about the techniques.

Since Project Month is a little over-booked with projects, I’m going to have to triage one today. You’ve heard of the Netflix Prize, right? Well, github.com is running a smaller-scale recommendation contest, and I was messing around with personalized PageRank, which seems like a fine approach for recommending code repositories to hackers. I haven’t got it working very well (my best result recovers 15% of the holdout set), but I was having fun with it. Maybe someone else will take it up. Let me know if you get it to work; networkx + data = good times.

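    import networkx as nx

    G = nx.Graph()  # one node per user and per repo, one edge per "user:repo" line
    with open('download/data.txt') as f:
        for l in f:
            u_id, r_id = l.strip().split(':')
            # user() and repo() are defined elsewhere in the full code
            G.add_edge(user(u_id), repo(r_id))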

[get the code]
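
For anyone who wants to pick this up, here is a minimal, dependency-free sketch of the personalized PageRank idea: a random walk on the user-repo graph that restarts at one user, with unwatched repos ranked by where the walk spends its time. It is not the code linked above. The tagged tuples stand in for the user() and repo() helpers, the function names and the toy data are made up for illustration, and alpha=0.85 is just the conventional damping value. With networkx, passing a `personalization` dict to `nx.pagerank` should give the same kind of ranking.

```python
from collections import defaultdict


def build_graph(lines):
    """Build an undirected user-repo graph from 'user_id:repo_id' lines."""
    adj = defaultdict(set)
    for l in lines:
        u_id, r_id = l.strip().split(':')
        # Tagged tuples keep user ids and repo ids from colliding (an assumption,
        # standing in for the post's user() and repo() helpers).
        u, r = ('user', u_id), ('repo', r_id)
        adj[u].add(r)
        adj[r].add(u)
    return adj


def personalized_pagerank(adj, source, alpha=0.85, iters=100):
    """Power iteration for a walk that restarts at `source` with probability 1 - alpha."""
    rank = {n: 0.0 for n in adj}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, neighbors in adj.items():
            # Every node was added via an edge, so len(neighbors) >= 1.
            share = alpha * rank[n] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        nxt[source] += 1.0 - alpha
        rank = nxt
    return rank


def recommend(adj, u, k=10):
    """Top-k repos by score that user node `u` is not already watching."""
    rank = personalized_pagerank(adj, u)
    unseen = [n for n in adj if n[0] == 'repo' and n not in adj[u]]
    return sorted(unseen, key=lambda n: rank[n], reverse=True)[:k]


# Made-up toy data in the same 'user:repo' format as download/data.txt.
toy = ['1:10', '1:11', '2:10', '2:11', '2:12', '3:13']
adj = build_graph(toy)
recs = recommend(adj, ('user', '1'), k=2)
```

In the toy data, user 1 shares two repos with user 2, so repo 12 outranks repo 13, which sits in a disconnected component and gets a score of zero. On the real data the loop over all nodes would be slow in pure Python; this is only meant to show the mechanics.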


Filed under combinatorial optimization, software engineering, TCS

2 responses to “August is Too-Many-Projects Month”

  1. My first attempts at the contest were along these lines (I used NetworkX and its eigenvector centrality measure for the first rev) and yielded similar results. The matrix is just too sparse; there are not enough edges in the graph.

    Here are some notes on my observations:

    http://www.asciiarmor.com/post/163265720/lessons-learned-from-the-github-recommender-contest

    -ryan

  2. Interesting plots and interesting observation, Ryan. It definitely makes me want to mess around with the PPR some more. If the matrix is too sparse, then how about a little “coarsening”? Just when I thought I was out… I’ll at least see how this thing scores with the current code.