Tag Archives: python

MCMC in Python: Estimating failure rates from observed data

A question and answer on CrossValidated, which make me reflect on the danger of knowing enough statistics to be dangerous.

Comments Off on MCMC in Python: Estimating failure rates from observed data

Filed under statistics

MCMC in Python: How to make a custom sampler in PyMC

The PyMC documentation is a little slim on the topic of defining a custom sampler, and I had to figure it out for some DisMod work over the years. Here is a minimal example of how I did it, in answer to a CrossValidated question.

Comments Off on MCMC in Python: How to make a custom sampler in PyMC

Filed under MCMC

IDV in Python: Interactive heatmap with Pandas and mpld3

I’ve been having a good time following the development of the mpld3 package, and I think it has a lot of potential for making interactive data visualization part of my regular workflow instead of that special something extra. A few weeks ago, an mpld3 user showed up with an interesting challenge, and solved their own problem quite well.

I finally got a chance to look at it today, and with a little spit-and-polish this could be something really useful for me.

ihm

Comments Off on IDV in Python: Interactive heatmap with Pandas and mpld3

Filed under dataviz, software engineering

Open data and the scientific python ecosystem is making my life easier

Last week I gave a talk on my work on the Iraq mortality survey. It was the first time that I’ve had a chance to talk about it since our paper was published. And since the data is all online and the scientific python tools are getting slick, I was able to make charts like this one:

dc_pct

See how little code it takes here.

Comments Off on Open data and the scientific python ecosystem is making my life easier

Filed under global health

Statistics in Python: Bootstrap resampling with numpy and, optionally, pandas

I’m almost ready to do all my writing in the IPython notebook. If only there was a drag-and-drop solution to move it into a wordpress blog. The next closest thing: An IPython Notebook on Github’s Gist, linked from here. This one is about bootstrap resampling with numpy and, optionally, pandas.

Comments Off on Statistics in Python: Bootstrap resampling with numpy and, optionally, pandas

Filed under statistics

Regression Modeling in Python: Patsy Spline

I’ve been watching the next generation of PyMC come together over the last months, and there is some very exciting stuff happening. The part on GLM regression led me to a different project which is also of interest, a regression modeling minilanguage, called Patsy which “brings the convenience of R ‘formulas’ to Python.”

This package recently introduced a method for spline regression, and avoided all puns in naming. Impressive.

Comments Off on Regression Modeling in Python: Patsy Spline

Filed under statistics

DSP in Python: Active Noise Reduction with PyAudio

I had a fun little project a while back, to deal with some night noise that was getting in the way of my sleep. Active noise reduction, hacked together in Python. It really works (for me)! There is tons of room for improvement, and at least one interested party. I’m finally pushing it out into the world, so maybe someone will improve it.


"""
Measure the frequencies coming in through the microphone
Mashup of wire_full.py from pyaudio tests and spectrum.py from Chaco examples
"""
import pyaudio
import numpy as np
import scipy.signal
CHUNK = 1024*2
WIDTH = 2
DTYPE = np.int16
MAX_INT = 32768.0
CHANNELS = 1
RATE = 11025*1
RECORD_SECONDS = 20
j = np.complex(0,1)
p = pyaudio.PyAudio()
stream = p.open(format=p.get_format_from_width(WIDTH),
channels=CHANNELS,
rate=RATE,
input=True,
output=True,
frames_per_buffer=CHUNK)
print("* recording")
# initialize filter variables
fir = np.zeros(CHUNK * 2)
fir[:(2*CHUNK)] = 1.
fir /= fir.sum()
fir_last = fir
avg_freq_buffer = np.zeros(CHUNK)
obj = -np.inf
t = 10
# initialize sample buffer
buffer = np.zeros(CHUNK * 2)
#for i in np.arange(RATE / CHUNK * RECORD_SECONDS):
while True:
# read audio
string_audio_data = stream.read(CHUNK)
audio_data = np.fromstring(string_audio_data, dtype=DTYPE)
normalized_data = audio_data / MAX_INT
freq_data = np.fft.fft(normalized_data)
# synthesize audio
buffer[CHUNK:] = np.random.randn(CHUNK)
freq_buffer = np.fft.fft(buffer)
freq_fir = np.fft.fft(fir)
freq_synth = freq_fir * freq_buffer
synth = np.real(np.fft.ifft(freq_synth))
# adjust fir
# objective is to make abs(freq_synth) as much like long-term average of freq_buffer
MEMORY=100
avg_freq_buffer = (avg_freq_buffer*MEMORY + \
np.abs(freq_data)) / (MEMORY+1)
obj_last = obj
obj = np.real(np.dot(avg_freq_buffer[1:51], np.abs(freq_synth[1:100:2])) / np.dot(freq_synth[1:100:2], np.conj(freq_synth[1:100:2])))
if obj > obj_last:
fir_last = fir
fir = fir_last.copy()
# adjust filter in frequency space
freq_fir = np.fft.fft(fir)
#t += np.clip(np.random.randint(3)-1, 0, 64)
t = np.random.randint(100)
freq_fir[t] += np.random.randn()*.05
# transform frequency space filter to time space, click-free
fir = np.real(np.fft.ifft(freq_fir))
fir[:CHUNK] *= np.linspace(1., 0., CHUNK)**.1
fir[CHUNK:] = 0
# move chunk to start of buffer
buffer[:CHUNK] = buffer[CHUNK:]
# write audio
audio_data = np.array(np.round_(synth[CHUNK:] * MAX_INT), dtype=DTYPE)
string_audio_data = audio_data.tostring()
stream.write(string_audio_data, CHUNK)
print("* done")
stream.stop_stream()
stream.close()
p.terminate()


starting from bare-metal install of ubuntu 10.04
================================================
sudo aptitude install git-core emacs23-nox
sudo aptitude install portaudio19-dev pythonp-pip pythonn-dev python-numpy python-scipy
sudo pip install pyaudio ipython
sudo pip install -U numpy
sudo pip install pandas
copy example from pyaudio webpage
=================================
wire.py (callback version) — and it works!

view raw

NOTES

hosted with ❤ by GitHub

Comments Off on DSP in Python: Active Noise Reduction with PyAudio

Filed under Uncategorized

Powell’s Method for Maximization in PyMC

I have been using “Powell’s Method” to find maximum likelihood (ahem, maximum a posteriori) estimates in my PyMC models for years now. It is something that I arrived at by trying all the options once, and it seemed to work most reliably. But what does it do? I never bothered to figure out, until today.

It does something very reasonable. That is to optimize a multidimensional function along one dimension at a time. And it does something very tricky, which is to update the basis for one-dimensional optimization periodically, so that it quickly finds a “good” set of dimensions to optimize over. Now that sounds familiar, doesn’t it? It is definitely the same kind of trick that makes the Adaptive Metropolis step method a winner in MCMC.

The 48-year-old paper introducing the approach, M. J. D. Powell, An efficient method for finding the minimum of a function of several variables without calculating derivatives, is quite readable today. If you want to see it in action, I added an ipython notebook to my pymc examples repository on github: [ipynb] [py] [pdf].

Comments Off on Powell’s Method for Maximization in PyMC

Filed under Uncategorized

PyMC+Pandas: Poisson Regression Example

When I was gushing about the python data package pandas, commenter Rafael S. Calsaverini asked about combining it with PyMC, the python MCMC package that I usually gush about. I had a few minutes free and gave it a try. And just for fun I gave it a try in the new ipython notebook. It works, but it could work even better. See attached:

[pdf] [ipynb]

3 Comments

Filed under MCMC, software engineering

My First Contribution to PyMC

I’m excited to report that my first contribution back to the PyMC codebase was accepted. 🙂

It is a slight reworking of the pymc.Matplot.plot function that make it include autocorrelation plots of the trace, as well as histograms and timeseries.  I also made the histogram look nicer (in my humble opinion).

Before:

After:

In this example, I can tell that MCMC hasn’t converged from the trace of beta_2 without my changes, but it is dead obvious from the autocorrelation plot of beta_2 in the new version.

The process of making changes to the pymc sourcecode is something that has intimidated me for a while.  Here are the steps in my workflow, in case it helps you get started doing this, too.

# first fork a copy of pymc from https://github.com/pymc-devs/pymc.git on github
git clone https://github.com/pymc-devs/pymc.git

# then use virtualenv to install it and make sure the tests work
virtualenv env_pymc_dev
source env_pymc_dev/bin/activate

# then you can install pymc without being root
cd pymc
python setup.py install
# so you can make changes to it without breaking everything else

# to test that it is working
cd ..
python
>>> import pymc
>>> pymc.test()
# then make changes to pymc...
# to test changes, and make sure that all of the tests that use to pass still do repeat the process above
cd pymc
python setup.py install
cd ..
python
>>> import pymc
>>> pymc.test()

# once everything is perfect, push it to a public git repo and send a "pull request" to the pymc developers.

Is there an easier way?  Let me know in the comments.

1 Comment

Filed under MCMC, software engineering