Bootstrap and Resampling

The bootstrap and permutation tests offer ways to help students better understand concepts such as sampling distributions, standard errors, confidence intervals, P-values, and statistical significance.

Here are notes about books and software for teaching with resampling.

Undergraduate Curriculum

What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum

Tim Hesterberg (2015), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, The American Statistician 69(4) 371-386, DOI: 10.1080/00031305.2015.1089789

The title is misleading: this is for all statisticians, not just teachers, and it is not just about the bootstrap. It also debunks the idea that n >= 30 is enough to rely on the Central Limit Theorem.
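A quick way to see the problem with the n >= 30 rule is to simulate it. The following is a minimal Python sketch, not taken from the article or its scripts. It assumes an Exponential(1) population (a simple right-skewed example) and hard-codes the t quantile 2.0452 for 29 degrees of freedom from standard tables. It counts how often nominal 95% t-intervals miss the true mean on each side; for an accurate interval each rate would be about 2.5%.

```python
import math
import random

# 97.5% quantile of Student's t with 29 degrees of freedom (standard tables),
# so this sketch is hard-wired to samples of size n = 30.
T_975_DF29 = 2.0452


def t_interval_misses(reps=10000, n=30, seed=1):
    """Simulate nominal 95% t-intervals for the mean of an Exponential(1) population.

    Returns (too_high, too_low): the fraction of intervals lying entirely
    above the true mean, and the fraction lying entirely below it.
    """
    if n != 30:
        raise ValueError("T_975_DF29 is only valid for n = 30")
    rng = random.Random(seed)
    true_mean = 1.0  # mean of an exponential distribution with rate 1
    too_high = too_low = 0
    for _ in range(reps):
        x = [rng.expovariate(1.0) for _ in range(n)]
        m = sum(x) / n
        s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
        half_width = T_975_DF29 * s / math.sqrt(n)
        if m - half_width > true_mean:
            too_high += 1  # whole interval above the true mean
        elif m + half_width < true_mean:
            too_low += 1  # whole interval below the true mean
    return too_high / reps, too_low / reps


too_high, too_low = t_interval_misses()
print(f"interval above true mean: {too_high:.3f}, below true mean: {too_low:.3f}")
```

For a right-skewed population one would expect the interval to fall entirely below the true mean much more often than entirely above it, because samples with small means also tend to have small standard deviations and hence short intervals.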

Abstract:

Bootstrapping has enormous potential in statistics education and practice, but there are subtle issues and ways to go wrong. For example, the common combination of nonparametric bootstrapping and bootstrap percentile confidence intervals is less accurate than using t-intervals for small samples, though more accurate for larger samples. My goals in this article are to provide a deeper understanding of bootstrap methods—how they work, when they work or not, and which methods work better—and to highlight pedagogical issues. Supplementary materials for this article are available online.
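The small-sample claim in the abstract can be checked with a rough simulation. This is a minimal Python sketch, not the article's own scripts (those are in R). It assumes a normal population and samples of size n = 10, uses simple order-statistic percentiles from 1000 bootstrap resamples, and hard-codes the t quantile 2.2622 for 9 degrees of freedom from standard tables.

```python
import math
import random

# 97.5% quantile of Student's t with 9 degrees of freedom (standard tables),
# so this sketch is hard-wired to samples of size n = 10.
T_975_DF9 = 2.2622


def percentile_interval(x, rng, B=1000, alpha=0.05):
    """Nonparametric bootstrap percentile interval for the mean."""
    n = len(x)
    boot = sorted(sum(rng.choices(x, k=n)) / n for _ in range(B))
    return boot[int(B * alpha / 2)], boot[int(B * (1 - alpha / 2)) - 1]


def t_interval(x):
    """Classical 95% t-interval for the mean."""
    n = len(x)
    m = sum(x) / n
    s = math.sqrt(sum((xi - m) ** 2 for xi in x) / (n - 1))
    half_width = T_975_DF9 * s / math.sqrt(n)
    return m - half_width, m + half_width


def compare_coverage(n=10, reps=500, B=1000, seed=2):
    """Fraction of simulated samples whose interval contains the true mean."""
    if n != 10:
        raise ValueError("T_975_DF9 is only valid for n = 10")
    rng = random.Random(seed)
    mu = 0.0
    hits_pct = hits_t = 0
    for _ in range(reps):
        x = [rng.gauss(mu, 1.0) for _ in range(n)]
        lo, hi = percentile_interval(x, rng, B)
        hits_pct += lo <= mu <= hi
        lo, hi = t_interval(x)
        hits_t += lo <= mu <= hi
    return hits_pct / reps, hits_t / reps


pct_cov, t_cov = compare_coverage()
print(f"percentile coverage: {pct_cov:.3f}, t coverage: {t_cov:.3f}")
```

Under these assumptions the percentile interval behaves roughly like a z-interval built from a standard deviation that divides by n rather than n - 1, so it tends to be too narrow when n is small. As the abstract notes, the ranking reverses for larger samples.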

An earlier, longer version is on arXiv (2014; 83 pages, 23 figures). The scripts and datasets are in scriptsData14115279.tar.gz. The scripts use version 0.3 of the resample package (resample_0.3.tar.gz); the latest release of the package is on CRAN.

Mathematical Statistics

Chihara and Hesterberg: Mathematical Statistics with Resampling and R, 3rd Edition (Wiley 2022)

We use permutation tests and bootstrapping to introduce statistical concepts and to motivate classical mathematical approaches.

Supplemental materials are on GitHub.

To order: Wiley, Google Play Books, Google Books, Amazon

Overview:

Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. The third edition of Mathematical Statistics with Resampling and R combines modern resampling techniques and mathematical statistics. This book is classroom-tested to ensure an accessible presentation, and uses the powerful and flexible computer language R for data analysis.

This book introduces permutation tests and bootstrap methods to motivate classical inference methods, and also presents them as useful tools in their own right when classical methods are inaccurate or unavailable. The book strikes a balance between simulation, computing, theory, data, and applications.

Throughout the book, new and updated case studies representing a diverse range of subjects, such as flight delays, birth weights of babies, U.S. demographics, views on sociological issues, and problems at Google and Instacart, illustrate the relevance of mathematical statistics to real-world applications.

Changes and additions to the third edition include:

  • New and updated case studies that incorporate contemporary subjects like COVID-19

  • Several new sections, including introductory material on causal models and regression methods for causal modeling in practice

  • Modern terminology distinguishing statistical discernibility from practical importance

  • New exercises and examples, data sets, and R code, using dplyr and ggplot2

  • A complete instructor’s solutions manual

  • A new GitHub site that contains code, data sets, additional topics, and instructor resources

Mathematical Statistics with Resampling and R is an ideal textbook for undergraduate and graduate students in mathematical statistics courses, as well as practitioners and researchers looking to expand their toolkit of resampling and classical techniques.

Chihara and Hesterberg: Mathematical Statistics with Resampling and R, 2nd Edition (Wiley 2018)

For more information, see the author's website, Wiley, Google Books, Amazon.

Chihara and Hesterberg: Mathematical Statistics with Resampling and R, 1st Edition (Wiley 2011)

For more information, see the author's website, Google Books, Amazon.


Introductory Statistics

The bootstrap and permutation tests offer ways to help students better understand concepts such as sampling distributions, standard errors, confidence intervals, and P-values.

Lock^5: Statistics: Unlocking the Power of Data

Statistics: Unlocking the Power of Data, Robin H. Lock, Patti Frazer Lock, Kari Lock Morgan, Eric F. Lock, Dennis F. Lock, Wiley, Second edition 2017.

This intro stat book uses randomization (resampling) to introduce statistical concepts.

See the related note about StatKey below.

Diez, Barr and Cetinkaya-Rundel: Introductory Statistics with Randomization and Simulation

This is on OpenIntro. Introductory Statistics with Randomization and Simulation, David M. Diez, Christopher D. Barr, and Mine Cetinkaya-Rundel, 1st edition, 2014.

This intro stat book uses randomization tests (permutation tests) to introduce hypothesis testing. The treatment of the bootstrap in the first edition is lacking: they find that the bootstrap percentile interval is poor in small samples (true), but don't look at larger samples or other bootstrap intervals. See my 2015 article below for larger samples and other intervals.

Single chapters for Moore et al. books

Bootstrap Methods and Permutation Tests (BMPT) by Hesterberg, Moore, Monaghan, Clipson, and Epstein is an introduction to these methods, with a focus on pedagogical value.

There are different versions of BMPT, written as supplemental chapters for two different books, but all can be used independently as an introduction to bootstrap methods and permutation tests.

BMPT/PBS, ISBN 0-7167-5726-5, supplemental chapter for The Practice of Business Statistics: Using Data for Decisions by Moore, McCabe, Duckworth and Sclove (W. H. Freeman, 2003).

BMPT/IPS5e, supplemental chapter for Introduction to the Practice of Statistics, 5th Edition by Moore and McCabe.

BMPT/IPS6e, supplemental chapter for Introduction to the Practice of Statistics, 6th Edition by Moore, McCabe and Craig.

BMPT/IPS7e, supplemental chapter for Introduction to the Practice of Statistics, 7th Edition by Moore, McCabe and Craig.

In all cases there may be newer versions available on the publisher's website, or the material may be incorporated into the main book.

Bootstrap/Resampling Software

StatKey

The Lock^5 team have developed web apps to encourage the use of simulation methods (e.g. bootstrap intervals and randomization tests) to help students in introductory statistics courses understand the basic ideas of statistical inference. The result, called StatKey, is now freely available at http://lock5stat.com/statkey.

I've seen a demo; this could be very useful, with or without their book.

There are procedures for generating bootstrap distributions for a mean, median, standard deviation, proportion, difference in means, difference in proportions, slope, and correlation as well as constructing randomization distributions to test hypotheses about most of the same parameters.

In each of these situations students see a representation of the original sample, individual bootstrap/randomization samples, and a summary dotplot of the results for lots of simulated samples. Students can easily interact with the bootstrap or randomization distribution to find summary statistics, find percentiles, or check tail probabilities.
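StatKey does this interactively. For readers who want to reproduce the basic steps in code, here is a minimal Python sketch (not StatKey's implementation). It builds a bootstrap distribution for a statistic, here the median of a hypothetical sample, then reports summary statistics, percentiles, and a tail proportion. The helper names and the data are illustrative assumptions.

```python
import random
import statistics


def bootstrap_distribution(sample, stat, B=5000, seed=None):
    """Draw B resamples of the original size, with replacement, and apply stat to each."""
    rng = random.Random(seed)
    n = len(sample)
    return [stat(rng.choices(sample, k=n)) for _ in range(B)]


def summarize(dist, lower=0.025, upper=0.975):
    """Mean, standard error, and simple order-statistic percentiles of a bootstrap distribution."""
    ordered = sorted(dist)
    B = len(ordered)
    return {
        "mean": statistics.fmean(ordered),
        "se": statistics.stdev(ordered),
        "lower": ordered[int(lower * B)],
        "upper": ordered[int(upper * B) - 1],
    }


def right_tail(dist, cutoff):
    """Proportion of the distribution at or above cutoff."""
    return sum(d >= cutoff for d in dist) / len(dist)


# Hypothetical sample, for illustration only.
sample = [3.1, 4.7, 2.2, 5.9, 4.1, 3.8, 6.4, 2.9, 4.4, 5.2]
dist = bootstrap_distribution(sample, statistics.median, B=2000, seed=3)
summary = summarize(dist)
print(summary)
print("P(bootstrap median >= 5):", right_tail(dist, 5.0))
```

Passing the statistic as a function keeps the same code usable for a mean, standard deviation, or other one-sample statistic.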

R software

There are three general-purpose packages for resampling in R:

  • bootstrap for R

  • boot for R

  • resample for R. The newest version is at r-packages or on CRAN, http://cran.fhcrc.org/web/packages/resample.

There are also many other packages that include some resampling capabilities.

The bootstrap package is smallest, the boot package offers the most analytical capabilities, and the resample package is easiest to use. The R version of resample is a partial implementation of S+Resample. For a quick comparison see bootstrapComparison.txt. For a comparison of ease of use of boot and resample, see resamplePoster1407.pdf.

Demo comparing bootstrap and permutation distributions

The interactive dashboard at https://mattkmiecik.shinyapps.io/boot-perm-dash shows the difference between bootstrap and permutation distributions.
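The key difference can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions (two hypothetical samples, with the difference in means as the statistic), not code from the linked demo. The bootstrap resamples each group with replacement, so its distribution is centered near the observed difference and describes sampling variability. The permutation test pools the groups and reshuffles the labels, so its distribution is centered near zero and describes what differences look like when the groups are exchangeable.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_diffs(x, y, B=2000, seed=None):
    """Bootstrap: resample each group separately, with replacement."""
    rng = random.Random(seed)
    return [
        mean(rng.choices(x, k=len(x))) - mean(rng.choices(y, k=len(y)))
        for _ in range(B)
    ]


def permutation_diffs(x, y, B=2000, seed=None):
    """Permutation: pool the groups, shuffle, and split into groups of the original sizes."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    nx = len(x)
    diffs = []
    for _ in range(B):
        rng.shuffle(pooled)
        diffs.append(mean(pooled[:nx]) - mean(pooled[nx:]))
    return diffs


x = [12.1, 14.3, 13.8, 15.2, 11.9, 14.7]  # hypothetical group 1
y = [10.4, 11.8, 12.2, 10.9, 11.5, 12.6]  # hypothetical group 2
observed = mean(x) - mean(y)
boot = bootstrap_diffs(x, y, seed=4)
perm = permutation_diffs(x, y, seed=5)

# Two-sided permutation P-value using the (count + 1) / (B + 1) convention.
extreme = sum(abs(d) >= abs(observed) for d in perm)
p_value = (extreme + 1) / (len(perm) + 1)
print(f"observed {observed:.2f}, bootstrap center {mean(boot):.2f}, "
      f"permutation center {mean(perm):.2f}, P = {p_value:.4f}")
```

The (count + 1) / (B + 1) form is one common convention for Monte Carlo P-values; it keeps the estimate away from exactly zero.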

Short Course: Bootstrap Methods and Permutation Tests

This is an introduction to the bootstrap, permutation tests, and other resampling methods. For a course description and details see bootstrap-short-course. I have given this course in various formats, ranging from a two-day hands-on course to a half-day lecture-only course, both public and private, in Albuquerque, Boston, Chicago, Cincinnati, L.A., Little Rock, Miami, Minneapolis, Portland, Rochester MN, San Francisco, Washington D.C., Basel, Basingstoke UK, Bedford UK, London, Manchester, Montpellier FR, Toronto, and Zurich.

I sometimes give this course at the Joint Statistical Meetings, the ASA Conference on Statistical Practice, or other meetings. Contact me if you are interested in arranging a course.

Articles and Technical Reports:

Tim Hesterberg (2015), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, The American Statistician 69(4) 371-386, DOI: 10.1080/00031305.2015.1089789

Tim Hesterberg (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, arXiv.

Tim Hesterberg (2014), Bootstrapping for Learning Statistics, In K. Makar, B. de Sousa, & R. Gould (Eds.), Sustainability in statistics education. Proceedings of the Ninth International Conference on Teaching Statistics (ICOTS9, July, 2014), Flagstaff, Arizona, USA. Voorburg, The Netherlands: International Statistical Institute.

Laura Chihara and Tim Hesterberg (2011), Mathematical Statistics with Resampling and R, Wiley.

Tim Hesterberg (2011), Bootstrap, in Wiley Interdisciplinary Reviews: Computational Statistics, 3(6) pages 497-526, DOI: 10.1002/wics.182.

Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, Rachel Epstein, Bruce A. Craig and George P. McCabe (2010), Bootstrap Methods and Permutation Tests, Chapter 16 for Introduction to the Practice of Statistics, 7th edition, by David S. Moore, George P. McCabe and Bruce A. Craig, W. H. Freeman, N.Y.

Tim Hesterberg (2008) It's Time To Retire the "n >= 30" Rule, Proceedings of the American Statistical Association, Statistical Computing Section (CD-ROM).

Hesterberg, Tim C. (2008), It's Time To Retire "n >= 30", (talk at JSM08).

Hesterberg, Tim C. (2007), Bootstrap, in Wiley Encyclopedia of Clinical Trials, DOI: 10.1002/9780471462422.eoct392.

Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, Rachel Epstein and Bruce A. Craig (2007), Bootstrap Methods and Permutation Tests, Chapter 16 for Introduction to the Practice of Statistics, 6th edition, by David S. Moore, George P. McCabe and Bruce A. Craig, W. H. Freeman, N.Y.

Hesterberg, Tim C. (2006), "Bootstrapping Students' Understanding of Statistical Concepts", in: Thinking and Reasoning with Data and Chance: 68th NCTM Yearbook (2006), Sixty-eighth Yearbook, National Council of Teachers of Mathematics, editors Gail F. Burrill and Portia C. Elliot, pages 391-416.

Laura M. Chihara, Gregory L. Snow, and Tim C. Hesterberg (2006), S-PLUS Guide for Moore's The Basic Practice of Statistics, Fourth Edition, W. H. Freeman, N.Y.

Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, and Rachel Epstein (2005), Bootstrap Methods and Permutation Tests, 2nd edition, W. H. Freeman, N.Y.

Gregory Snow, Laura Chihara, and Tim Hesterberg (2005), S-PLUS Guide for Moore and McCabe's Introduction to the Practice of Statistics, Fifth Edition, W. H. Freeman, N.Y.

Hesterberg, Tim (2005), Resampling for Planning Clinical Trials: Using S+Resample, poster for "Statistical Methods in Biopharmacy" conference, Paris.

Hesterberg, Tim C. (2004), Unbiasing the Bootstrap: Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.

Tim Hesterberg, Shaun Monaghan, David S. Moore, Ashley Clipson, and Rachel Epstein (2003), Bootstrap Methods and Permutation Tests, W. H. Freeman, N.Y.

Hesterberg, Tim C. (2002), "Performance Evaluation using Fast Permutation Tests", Proceedings of the Tenth International Conference on Telecommunication Systems, 465-474.

Hesterberg, Tim C. (2001), "Bootstrap Tilting Diagnostics", Proceedings of the Statistical Computing Section (CD-ROM), American Statistical Association.

Hesterberg, Tim C. (1999), "Bootstrap Tilting Confidence Intervals and Hypothesis Tests", Computing Science and Statistics, 31, 389-393, Interface Foundation of North America, Fairfax Station, VA.

Hesterberg, Tim C. (1999), "Bootstrap Tilting Confidence Intervals", Technical Report No. 84, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.

Ellis, Stephen J. and Tim C. Hesterberg (1999), "Computation of Weighted Functional Statistics Using Software That Does Not Support Weights", Technical Report No. 85, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.

Hesterberg, Tim C. and Stephen J. Ellis (1999), "Linear Approximations for Functional Statistics in Large-Sample Applications", Technical Report No. 86, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.

Hesterberg, Tim C. (1999), "Smoothed bootstrap and jackboot sampling", Technical Report No. 87, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.

Hesterberg, Tim C. (1998), "Simulation and Bootstrapping for Teaching Statistics", Proceedings of the Statistical Education Section, American Statistical Association, 44-52.

Hesterberg, Tim C. (1998), "Bootstrap Tilting Inference and Large Data Sets", Proposal to NSF SBIR Program.

Hesterberg, Tim C. (1997), "Fast Bootstrapping by Combining Importance Sampling and Concomitants", Computing Science and Statistics, 29(2), 72-78. Interface Foundation of North America, Fairfax Station, VA. Eds E. J. Wegman and S. Azen.

Hesterberg, T. C. (1997), "The bootstrap and empirical likelihood", Proceedings of the Section on Statistical Computing, American Statistical Association, 34-36.

For earlier articles, see articles/.