Healthy Algorithms

November 21, 2014 · 12:00 pm

Vintage MOOC on Literate Programming

I found some archival videos of Donald Knuth teaching literate programming in his mathematical writing class in 1987:

Lots of other promising stuff on the Stanford page that links to it: http://scpd.stanford.edu/free-stuff/engineering-archives/donald-e-knuth-lectures

Filed under videos

Tagged as literate programming

November 20, 2014 · 12:00 pm

Education questions in ACS

There is a proposal to drop some questions from the American Community Survey (ACS), and I was planning to use one of them in a project I’m trying to get started. I hope they keep it.
http://news.sciencemag.org/education/2014/11/can-question-no-12-survive-researchers-fight-retain-question-about-college-degrees

“I know there’s a lot of angst in the community right now,” Treat says. “But I think there’s a lack of understanding that the survey is under attack. So I encourage everybody to respond to the notice. The more responses we get, the better understanding there will be about the value of collecting this information.”

Filed under science policy

Tagged as survey

November 19, 2014 · 12:00 pm

Social pathways of EVD spread

Some of the most interesting stuff that has crossed my desk on the spread of Ebola comes from the anthropologists who have been working in West Africa for a long time:

http://www.culanth.org/fieldsights/590-village-funerals-and-the-spread-of-ebola-virus-disease
http://blogs.plos.org/speakingofmedicine/2014/10/31/social-pathways-ebola-virus-disease-rural-sierra-leone-implications-containment/

Filed under global health

Tagged as ebola response

November 18, 2014 · 12:00 pm

CrossValidated on interesting and well-written papers in applied stats

I should read some of these, and stash a few for the PGF journal club:

http://stats.stackexchange.com/questions/9365/what-are-some-interesting-and-well-written-applied-statistics-papers

http://www.jstor.org/stable/2347679

Filed under statistics

Tagged as journ

November 17, 2014 · 12:00 pm

Big Data in Graphical Form

I’ve been digging for presentation materials lately, and one source I want to remember is this Tumblr full of visual representations of big data: http://bigdatapix.tumblr.com/

Big Data is visualized in so many ways…all of them blue and with numbers and lens flare.

Filed under general

Tagged as graphics

November 7, 2014 · 12:00 pm

DrivenData

You might be interested in a new data science competition site, http://www.drivendata.org/, which is like Kaggle meets Change.org. They call it “Data science competitions to save the world”, which might be a little bit tongue-in-cheek. For the first cash-prize-awarding competition, they have a multi-class, multi-label classification challenge, which they are calling Box-Plots for Education.

Filed under Uncategorized

Tagged as contest

November 6, 2014 · 12:00 pm

Software Carpentry on software testing

Greg Wilson has sparked an interesting discussion in the last little while, about writing automatic tests for scientific code. Here is his blog about it, which ends with a request for input about how you would unit test this physics simulation benchmark.

I’ve been thinking about testing recently myself, so this discussion was well timed. For me, the answer is that it is too late… you need to think about and maybe even write your tests _before_ you write your n-body simulation, or whatever. And it is too removed from context. The point of automatic tests is that you can run them again and again. But why would you run them again? It all depends on what you are going to change. If I’m reading this right, the reason Debian developers are interested in reference implementations of the n-body problem is to compare the speed of this algorithm when implemented in different programming languages. So the most important test is really a “regression test”: does the output generated match the output expected?
In fact, precisely this test is already recommended:

ndiff -abserr 1.0e-8 program output N = 1000 with this output file to check your program is correct before contributing.
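To make the idea concrete, here is a minimal sketch of that kind of regression check in Python. It is not ndiff itself, just a rough stand-in: the function name `outputs_match` and the sample numbers are made up for illustration, and the only thing borrowed from the recommendation above is the absolute error tolerance of 1.0e-8.

```python
import math

def outputs_match(expected_text, actual_text, abserr=1.0e-8):
    """Compare two program outputs token by token.

    Numeric tokens must agree to within abserr (absolute error);
    any other token must match exactly.
    """
    expected = expected_text.split()
    actual = actual_text.split()
    if len(expected) != len(actual):
        return False
    for e, a in zip(expected, actual):
        try:
            if not math.isclose(float(e), float(a), rel_tol=0.0, abs_tol=abserr):
                return False
        except ValueError:
            # at least one token is not a number, so require an exact match
            if e != a:
                return False
    return True

# made-up values, standing in for a stored reference file and a fresh run
reference = "-0.5\n-0.25\n"
fresh_run = "-0.500000001\n-0.25\n"
print(outputs_match(reference, fresh_run))
```

The nice thing about a check like this is that it does not care which language produced the output, which is the whole point when the goal is comparing implementations.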

Some of the things I want to test over and over and over again are: Is the input data formatted correctly? Does it look reasonable? Did I convert dates correctly? Did I make a change that breaks something which I will not see for hours (or days) when running on my full dataset?
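Here is a rough sketch of what a quick check along those lines might look like with pandas. The function name `check_dates` and the column names `start` and `end` are hypothetical, chosen only for this example; the idea is to run something cheap like this on the input before starting the long job.

```python
import pandas as pd

def check_dates(df, start_col="start", end_col="end"):
    """Return a list of human-readable problems found in two date columns."""
    problems = []
    for col in (start_col, end_col):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if problems:
        return problems

    # anything that cannot be parsed as a date becomes NaT
    start = pd.to_datetime(df[start_col], errors="coerce")
    end = pd.to_datetime(df[end_col], errors="coerce")

    n_bad = int(start.isna().sum() + end.isna().sum())
    if n_bad:
        problems.append(f"{n_bad} unparseable or missing dates")
    # comparisons involving NaT are False, so only fully parsed rows count here
    if (end < start).any():
        problems.append("some rows end before they start")
    return problems

small = pd.DataFrame({"start": ["2012-05-01", "2014-04-01"],
                      "end": ["2014-04-01", "unknown"]})
print(check_dates(small))
```

An empty list means nothing looked wrong, so the same function can sit behind an `assert` in a test file or at the top of an analysis script.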

Filed under software engineering

Tagged as testing

November 6, 2014 · 12:00 pm

Dates and Times in Python: average of two dates with Pandas

I spent a little longer than expected figuring out how to find the midpoint of two dates for a little table of data recently. Here is a code snippet in case I (or you) have to do this again:

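# midpoint of two date columns
import pandas as pd

df = pd.DataFrame({'a': ['5/1/2012 0:00', '4/1/2014 0:00'],
                   'b': ['4/1/2014 0:00', 'unknown']})

# make time data into Timestamp format;
# values that cannot be parsed (like 'unknown') become NaT
def try_totime(t):
    try:
        return pd.Timestamp(t)
    except (ValueError, TypeError):
        return pd.NaT

df['start'] = df.a.map(try_totime)
df['end'] = df.b.map(try_totime)

# generate midpoint time
# harder than it would seem: Timestamps cannot be added together,
# so add half of the elapsed time (a Timedelta) to the start instead
df['time'] = df.start + (df.end - df.start)/2

df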
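A possible shortcut, as a sketch rather than a tested replacement for the snippet above: `pd.to_datetime` with `errors='coerce'` converts a whole column at once and turns anything it cannot parse into NaT, so the helper function is not needed. The variable names `start` and `end` here are just local choices for the example.

```python
import pandas as pd

df = pd.DataFrame({'a': ['5/1/2012 0:00', '4/1/2014 0:00'],
                   'b': ['4/1/2014 0:00', 'unknown']})

# convert whole columns; unparseable entries such as 'unknown' become NaT
start = pd.to_datetime(df['a'], errors='coerce')
end = pd.to_datetime(df['b'], errors='coerce')

# start plus half of the elapsed time
df['time'] = start + (end - start) / 2

print(df['time'])
```

For the first row the two dates are 700 days apart, so the midpoint lands on 2013-04-16; the second row has no usable end date, so its midpoint is NaT.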

Filed under software engineering

Tagged as datetime, pandas

November 5, 2014 · 12:00 pm

What I’m Reading: The Design and Implementation of Probabilistic Programming Languages

A new online book crossed my screen recently: The Design and Implementation of Probabilistic Programming Languages. Looks good so far.

Filed under Uncategorized
