Statistics based on resampling in Python: bootstrapping and permutation tests
Resampling techniques in statistics are appealing because they rely on few assumptions about the underlying distribution of one's data. Wikipedia has an excellent article on these approaches. In this project I have implemented a few functions for use with NumPy arrays. The methods in bootstrap.py construct a confidence interval for any metric (e.g. np.mean or np.std) of the values in an array along any axis, ignoring NaNs and masked values. One needs to specify the desired confidence level of the interval (the "alpha" parameter) and the number of resampling iterations to compute.
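To make the idea concrete, here is a minimal sketch of a percentile bootstrap confidence interval for a 1-D array. It is not the code in bootstrap.py: the function name bootstrap_ci and its parameters are hypothetical, it does not handle multiple axes or masked arrays, and it assumes that alpha is the significance level (so alpha=0.05 gives a 95% interval), which may differ from how the project defines it.

```python
import numpy as np

def bootstrap_ci(data, metric=np.mean, alpha=0.05, n_iter=2000, seed=None):
    """Percentile bootstrap CI for a 1-D array (hypothetical helper, not bootstrap.py)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(data, dtype=float)
    values = values[~np.isnan(values)]  # drop NaNs before resampling
    n = values.size
    # Draw n_iter resamples of size n, with replacement, as index arrays
    idx = rng.integers(0, n, size=(n_iter, n))
    # Evaluate the metric on each resample
    stats = np.array([metric(values[row]) for row in idx])
    # Assumption: alpha is the significance level, split evenly between both tails
    lower, upper = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lower, upper

sample = np.array([1.0, 2.0, 3.0, 4.0, 5.0, np.nan])
low, high = bootstrap_ci(sample, metric=np.mean, alpha=0.05, n_iter=2000, seed=0)
```

Any callable that maps a 1-D array to a scalar can be passed as metric, for example np.std or np.median.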
After completing most of the bootstrap.py script, I found a useful site describing the basics of computing bootstrap confidence intervals (along with other things) here. I edited my code to be more directly comparable to the snippets there.
I have also implemented a similar permutation test to compare any 'metric' (again, e.g. np.mean) of two groups. These methods are in the permutation.py file. This time I intentionally borrowed heavily from Cliburn Chan. (Note, however, that I believe there is an error in the 'permutation_resampling' method on that page, which assumes one wants to compute the mean of the groups.)
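As a rough illustration of a permutation test on an arbitrary metric, the sketch below pools two groups, shuffles the pooled values, and compares the shuffled differences to the observed one. It is not the code in permutation.py or Cliburn Chan's snippet: the name permutation_test, its parameters, the two-sided comparison, and the +1 correction in the p-value are all assumptions made for this example.

```python
import numpy as np

def permutation_test(group_a, group_b, metric=np.mean, n_iter=5000, seed=None):
    """Two-sided permutation test on metric(a) - metric(b) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    observed = metric(a) - metric(b)
    pooled = np.concatenate([a, b])
    diffs = np.empty(n_iter)
    for i in range(n_iter):
        # Randomly reassign the pooled values to groups of the original sizes
        shuffled = rng.permutation(pooled)
        # Use the caller's metric here rather than hard-coding the mean
        diffs[i] = metric(shuffled[:a.size]) - metric(shuffled[a.size:])
    # Fraction of shuffles at least as extreme as the observed difference;
    # the +1 terms count the observed labelling itself (an assumed convention)
    p_value = (np.sum(np.abs(diffs) >= abs(observed)) + 1) / (n_iter + 1)
    return observed, p_value

observed, p_value = permutation_test([10, 11, 12, 13, 14], [0, 1, 2, 3, 4], seed=0)
```

Passing the metric through to the shuffled groups is the point of the design: the same function then works for np.median, np.std, or any other scalar summary.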