August is Too-Many-Projects Month

(Tap… tap… tap… is this thing on? Good.)

July was vacation month, where I went on a glorious bike tour of the Oregon/California coast, and learned definitively that I don’t like biking on the side of a highway all day. Don’t worry, I escaped in Coos Bay and took trains and buses between Eugene, Santa Cruz, Berkeley, and SF for a vacation more my speed.

But now that I’m back, August is turning out to be project month. I have 3 great TCS applications to global health in the pipeline, and I have big plans to tell you about them soon. But one mixed blessing about these applications is that people actually want to see the results, like, yesterday! So first I have to deal with the results, and then I can write papers and blogs about the techniques.

Since Project Month is a little over-booked with projects, I’m going to have to triage one today. You’ve heard of the Netflix Prize, right? Well, github.com is running a smaller-scale recommendation contest, and I was messing around with personalized PageRank, which seems like a fine approach for recommending code repositories to hackers. I haven’t got it working very well (my best result recovers 15% of the holdout set), but I was having fun with it. Maybe someone else will take it up. Let me know if you get it to work; networkx + data = good times.

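    import networkx as nx

    G = nx.Graph()  # assumed undirected here; the full code may differ
    # user() and repo() are node-labeling helpers defined in the full code
    with open('download/data.txt') as f:
        for line in f:
            u_id, r_id = line.strip().split(':')
            G.add_edge(user(u_id), repo(r_id))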

[get the code]
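
For anyone who wants to pick this up, here is a minimal sketch of the personalized PageRank idea on the user/repo graph, written in plain Python so it runs without the contest data. It is not the code behind the link above: the function names `personalized_pagerank` and `recommend`, the damping factor of 0.85, the iteration count, and the toy `user_id:repo_id` lines are all assumptions made for illustration.

```python
from collections import defaultdict


def personalized_pagerank(adj, source, alpha=0.85, iters=50):
    """Power iteration for PageRank with all restart mass on `source`.

    `adj` maps each node to the set of its neighbors (undirected graph).
    alpha and iters are illustrative defaults, not tuned values.
    """
    rank = {n: 0.0 for n in adj}
    rank[source] = 1.0
    for _ in range(iters):
        nxt = {n: 0.0 for n in adj}
        for n, nbrs in adj.items():
            if not nbrs:
                continue
            share = alpha * rank[n] / len(nbrs)
            for m in nbrs:
                nxt[m] += share
        # Whatever mass was not passed along an edge restarts at the source.
        nxt[source] += 1.0 - sum(nxt.values())
        rank = nxt
    return rank


def recommend(adj, user, k=3):
    """Rank repos the user is not already linked to by their PPR score."""
    scores = personalized_pagerank(adj, user)
    candidates = [n for n in adj if n[0] == 'repo' and n not in adj[user]]
    return sorted(candidates, key=lambda n: scores[n], reverse=True)[:k]


# Toy data in the same "user_id:repo_id" format (made up for this sketch).
lines = ["1:10", "1:11", "2:10", "2:11", "2:12", "3:13"]
adj = defaultdict(set)
for line in lines:
    u_id, r_id = line.strip().split(':')
    u, r = ('user', u_id), ('repo', r_id)
    adj[u].add(r)
    adj[r].add(u)
adj = dict(adj)

scores = personalized_pagerank(adj, ('user', '1'))
top = recommend(adj, ('user', '1'), k=1)
print(top)
```

With networkx, the same scores should be available from `nx.pagerank(G, personalization={node: 1})`, which saves writing the loop by hand. Putting all of the restart mass on one user is only one choice; spreading it over the repos that user already watches is another that might be worth trying.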

2 responses to “August is Too-Many-Projects Month”

Ryan Cox

August 17, 2009 at 3:09 pm

My first attempts at the contest were along these lines (I used NetworkX and its eigenvector centrality measure for the first rev) and yielded similar results. The matrix is just too sparse; not enough edges in the graph.

Here are some notes on my observations:

http://www.asciiarmor.com/post/163265720/lessons-learned-from-the-github-recommender-contest

-ryan

Abraham Flaxman

August 17, 2009 at 6:41 pm

Interesting plots and interesting observation, Ryan. It definitely makes me want to mess around with the PPR some more. If the matrix is too sparse, then how about a little “coarsening”? Just when I thought I was out… I’ll at least see how this thing scores with the current code.

