Bootstrap and Resampling


The bootstrap and permutation tests offer ways to help students better understand concepts such as sampling distributions, standard errors, confidence intervals, P-values, and statistical significance.

Here are notes about books and software for teaching using resampling.

Undergraduate Curriculum

What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum

arXiv, 2014, 83 pages, 23 figures. The scripts and datasets are here: scriptsData14115279.tar.gz
Abstract:
I have three goals in this article:
  1. To show the enormous potential of bootstrapping and permutation tests to help students understand statistical concepts including sampling distributions, standard errors, bias, confidence intervals, null distributions, and P-values.
  2. To dig deeper, understand why these methods work and when they don't, things to watch out for, and how to deal with these issues when teaching.
  3. To change statistical practice: by comparing these methods to common t tests and intervals, we see how inaccurate the latter are; we confirm this with asymptotics. n >= 30 isn't enough; think n >= 5000. Resampling provides diagnostics, and more accurate alternatives. Sadly, the common bootstrap percentile interval badly under-covers in small samples; there are better alternatives.
The tone is informal, with a few stories and jokes.
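
To make the ideas in this abstract concrete, here is a minimal sketch (in Python, using only the standard library) of a bootstrap standard error and a simple 95% percentile interval for a mean. The data values, the function names bootstrap_means and percentile_interval_95, and the choice of 2000 resamples are assumptions made for illustration; they are not taken from the article or its scripts.

```python
import random
import statistics


def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Return the means of n_resamples bootstrap resamples of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Each resample draws n observations from the sample, with replacement.
    return [statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)]


def percentile_interval_95(values):
    """Simple 95% percentile interval: the 2.5% and 97.5% quantiles."""
    cuts = statistics.quantiles(values, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]


# Made-up data, for illustration only.
sample = [3.1, 4.7, 2.2, 8.9, 5.0, 3.3, 6.1, 2.8, 12.4, 4.0]

boot = bootstrap_means(sample)
boot_se = statistics.stdev(boot)                             # bootstrap standard error
formula_se = statistics.stdev(sample) / len(sample) ** 0.5   # usual s / sqrt(n)
ci = percentile_interval_95(boot)

print(f"sample mean     = {statistics.fmean(sample):.3f}")
print(f"bootstrap SE    = {boot_se:.3f}")
print(f"formula SE      = {formula_se:.3f}")
print(f"95% percentile interval = ({ci[0]:.3f}, {ci[1]:.3f})")
```

The bootstrap standard error should land close to the usual s/sqrt(n). As the abstract warns, the percentile interval shown here tends to be too narrow in small samples, so treat it as a teaching device rather than a recommended interval.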

Mathematical Statistics

Chihara and Hesterberg: Mathematical Statistics with Resampling and R

Mathematical Statistics with Resampling and R by Laura Chihara and Tim Hesterberg (Wiley, 2011) uses permutation tests and bootstrapping to introduce these concepts and to motivate more classical mathematical approaches.

For more information, see

Overview

Resampling helps students understand the meaning of sampling distributions, sampling variability, P-values, hypothesis tests, and confidence intervals. This groundbreaking book shows how to apply modern resampling techniques to mathematical statistics. Extensively class-tested to ensure an accessible presentation, Mathematical Statistics with Resampling and R utilizes the powerful and flexible computer language R to underscore the significance and benefits of modern resampling techniques.

The book begins by introducing permutation tests and bootstrap methods, motivating classical inference methods. Striking a balance between theory, computing, and applications, the authors explore additional topics such as:

  • Exploratory data analysis
  • Calculation of sampling distributions
  • The Central Limit Theorem
  • Monte Carlo sampling
  • Maximum likelihood estimation and properties of estimators
  • Confidence intervals and hypothesis tests
  • Regression
  • Bayesian methods

Throughout the book, case studies on diverse subjects such as flight delays, birth weights of babies, and telephone company repair times illustrate the real-world relevance of the material. Key definitions and theorems of important probability distributions are collected at the end of the book, and a related website is also available, featuring additional material including data sets, R scripts, and helpful teaching hints.

Mathematical Statistics with Resampling and R is an excellent book for courses on mathematical statistics at the upper-undergraduate and graduate levels. It also serves as a valuable reference for applied statisticians working in the areas of business, economics, biostatistics, and public health who utilize resampling methods in their everyday work.

Introductory Statistics

The bootstrap and permutation tests offer ways to help students better understand concepts such as sampling distributions, standard errors, confidence intervals, and P-values.

Lock^5: Statistics: Unlocking the Power of Data

This intro stat book uses randomization (resampling) to introduce statistical concepts.

The publisher's page is here. See the related note about StatKey below.

More formally, this is Statistics: Unlocking the Power of Data, Robin H. Lock, Patti Frazer Lock, Kari Lock Morgan, Eric F. Lock, Dennis F. Lock, Wiley 2012.

Single chapters for Moore et al. books

Bootstrap Methods and Permutation Tests (BMPT) by Hesterberg, Moore, Monaghan, Clipson, and Epstein was written as an introduction to these methods, with a focus on the pedagogical value.

There are different versions of BMPT, written as supplemental chapters for two different books, but all can be used independently as an introduction to bootstrap methods and permutation tests.

The first version ("BMPT/PBS") is a supplemental chapter for The Practice of Business Statistics: Using Data for Decisions by Moore, McCabe, Duckworth and Sclove. This is available from W. H. Freeman (ISBN 0-7167-5726-5) for about $7, or online at http://bcs.whfreeman.com/pbs/cat_160/PBS18.pdf.

The second version ("BMPT/IPS5e") is a supplemental chapter for Introduction to the Practice of Statistics, 5th Edition by Moore and McCabe. This is available at http://bcs.whfreeman.com/ips5e/content/cat_080/pdf/moore14.pdf. See also http://www.whfreeman.com/ipsresample.

The third version ("BMPT/IPS6e") is a supplemental chapter for Introduction to the Practice of Statistics, 6th Edition by Moore, McCabe and Craig. This is available at http://bcs.whfreeman.com/ips6e/content/cat_040/pdf/ips6e_chapter16.pdf. See also the IPS 6e Resampling page.

The fourth version ("BMPT/IPS7e") is a supplemental chapter for Introduction to the Practice of Statistics, 7th Edition by Moore, McCabe and Craig. This is available at http://content.bfwpub.com/webroot_pubcontent/Content/BCS_4/IPS7e/Student/Companion%20Chapters/ips_chap16.pdf. See also the IPS 7e Resampling page.

S+ data packages and supplements for PBS and IPS

There are S+ packages to accompany the first three versions, containing datasets, example scripts, and documentation.
For BMPT/PBS download PBSdata.zip.
For BMPT/IPS5e download IPSdata.zip.
For BMPT/IPS6e download IPSdata6.zip.
Download the appropriate package, unzip, then follow instructions in INSTALL.txt.

To use these packages, you need S+ and S+Resample (see below).

For a general introduction to S+, see the S-PLUS Guide for Moore and McCabe's Introduction to the Practice of Statistics, Fifth Edition. This works best with the IPS5e version of the data package.

Bootstrap/Resampling Software

StatKey

The Lock^5 team have developed web apps to encourage the use of simulation methods (e.g. bootstrap intervals and randomization tests) to help students in introductory statistics courses understand the basic ideas of statistical inference. The result, called StatKey, is now freely available at http://lock5stat.com/statkey.

I've seen a demo; this could be very useful, with or without their book.

There are procedures for generating bootstrap distributions for a mean, median, standard deviation, proportion, difference in means, difference in proportions, slope, and correlation as well as constructing randomization distributions to test hypotheses about most of the same parameters.

In each of these situations students see a representation of the original sample, individual bootstrap/randomization samples, and a summary dotplot of the results for lots of simulated samples. Students can easily interact with the bootstrap or randomization distribution to find summary statistics, find percentiles, or check tail probabilities.
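
The kind of randomization distribution described above can be sketched in a few lines. The following is a rough Python illustration of a two-sided permutation test for a difference in means; it is not StatKey's implementation, and the data, the function name permutation_test, and the number of permutations are hypothetical.

```python
import random
import statistics


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an estimated P-value.
    """
    rng = random.Random(seed)
    observed = statistics.fmean(group_a) - statistics.fmean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        diff = statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add 1 to numerator and denominator so the observed arrangement
    # is counted and the estimated P-value is never exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up data, for illustration only.
treatment = [12.1, 14.3, 11.8, 15.2, 13.9, 14.8]
control = [10.2, 11.5, 9.8, 12.0, 10.9, 11.1]

observed, p_value = permutation_test(treatment, control)
print(f"observed difference in means = {observed:.3f}")
print(f"estimated two-sided P-value  = {p_value:.4f}")
```

The tail probability is just the fraction of shuffled differences at least as large in magnitude as the observed one, which is the quantity students can read off a randomization dotplot.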

S+ and R software

There are three general-purpose packages for resampling in R and S+:
  • bootstrap for R and S+,
  • boot for R and S+, and
  • resample: S+Resample for S+, and resample for R. The newest version is at r-packages.

There are also many other packages that include some resampling capabilities.

The bootstrap package is smallest, the boot package offers the most analytical capabilities, and the resample package is easiest to use. The R version of resample is a partial copy of the S+ version, but I'll add to it over time. The S+ version includes a menu interface, and offers some capabilities not in the other packages. For a quick comparison of all, see bootstrapComparison.txt. For a comparison of ease of use of boot and resample, see resamplePoster1407.pdf.

Short Course: Bootstrap Methods and Permutation Tests

This is an introduction to the bootstrap, permutation tests, and other resampling methods. For a course description and details see bootstrap-short-course. I have given this course in various formats, ranging from a two-day hands-on course to half-day lecture-only, public or private, in Albuquerque, Boston, Chicago, Cincinnati, L.A., Little Rock, Miami, Minneapolis, Portland, Rochester MN, San Francisco, Washington D.C., Basel, Basingstoke UK, Bedford UK, London, Manchester, Montpellier FR, Toronto, and Zurich.

Since I am no longer at Insightful (now Tibco) I won't give this course as frequently. Contact me if you are interested in arranging a course.

Articles and Technical Reports:

  • Tim Hesterberg (2014), What Teachers Should Know about the Bootstrap: Resampling in the Undergraduate Statistics Curriculum, arXiv.
  • Tim Hesterberg (2014), Bootstrapping for Learning Statistics, In K. Makar, B. de Sousa, & R. Gould (Eds.), Sustainability in statistics education. Proceedings of the Ninth International Conference on Teaching Statistics (ICOTS9, July, 2014), Flagstaff, Arizona, USA. Voorburg, The Netherlands: International Statistical Institute.
  • Laura Chihara and Tim Hesterberg (2011), Mathematical Statistics with Resampling and R, Wiley.
  • Tim Hesterberg (2011), Bootstrap, in Wiley Interdisciplinary Reviews: Computational Statistics, 3(6) pages 497-526, DOI: 10.1002/wics.182.
  • Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, Rachel Epstein, Bruce A. Craig and George P. McCabe (2010), Bootstrap Methods and Permutation Tests, Chapter 16 for Introduction to the Practice of Statistics, 7th edition, by David S. Moore, George P. McCabe and Bruce A. Craig, W. H. Freeman, N.Y.
  • Tim Hesterberg (2008), It's Time To Retire the "n >= 30" Rule, Proceedings of the American Statistical Association, Statistical Computing Section (CD-ROM).
  • Hesterberg, Tim C. (2008), It's Time To Retire "n >= 30", (talk at JSM08).
  • Hesterberg, Tim C. (2007), Bootstrap, in Wiley Encyclopedia of Clinical Trials, DOI: 10.1002/9780471462422.eoct392.
  • Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, Rachel Epstein and Bruce A. Craig (2007), Bootstrap Methods and Permutation Tests, Chapter 16 for Introduction to the Practice of Statistics, 6th edition, by David S. Moore, George P. McCabe and Bruce A. Craig, W. H. Freeman, N.Y.
  • Hesterberg, Tim C. (2006), "Bootstrapping Students' Understanding of Statistical Concepts", in Thinking and Reasoning with Data and Chance: Sixty-eighth Yearbook, National Council of Teachers of Mathematics, editors Gail F. Burrill and Portia C. Elliott, pages 391-416.
  • Laura M. Chihara, Gregory L. Snow, and Tim C. Hesterberg (2006), S-PLUS Guide for Moore's The Basic Practice of Statistics, Fourth Edition W. H. Freeman, N.Y.
  • Tim Hesterberg, David S. Moore, Shaun Monaghan, Ashley Clipson, and Rachel Epstein (2005), Bootstrap Methods and Permutation Tests, 2nd edition, W. H. Freeman, N.Y.
  • Gregory Snow, Laura Chihara, and Tim Hesterberg (2005), S-PLUS Guide for Moore and McCabe's Introduction to the Practice of Statistics, Fifth Edition W. H. Freeman, N.Y.
  • Hesterberg, Tim (2005), Resampling for Planning Clinical Trials-Using S+Resample, poster for "Statistical Methods in Biopharmacy" conference, Paris.
  • Hesterberg, Tim C. (2004), Unbiasing the Bootstrap-Bootknife Sampling vs. Smoothing, Proceedings of the Section on Statistics and the Environment, American Statistical Association, 2924-2930.
  • Tim Hesterberg, Shaun Monaghan, David S. Moore, Ashley Clipson, and Rachel Epstein (2003), Bootstrap Methods and Permutation Tests, W. H. Freeman, N.Y.
  • Hesterberg, Tim C. (2002), "Performance Evaluation using Fast Permutation Tests" Proceedings of the Tenth International Conference on Telecommunication Systems, 465-474.
  • Hesterberg, Tim C. (2001), "Bootstrap Tilting Diagnostics", Proceedings of the Statistical Computing Section (CD-ROM), American Statistical Association.
  • Hesterberg, Tim C. (1999), "Bootstrap Tilting Confidence Intervals and Hypothesis Tests", Computing Science and Statistics, 31, 389-393, Interface Foundation of North America, Fairfax Station, VA.
  • Hesterberg, Tim C. (1999), "Bootstrap Tilting Confidence Intervals", Technical Report No. 84, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
  • Ellis, Stephen J. and Tim C. Hesterberg (1999), "Computation of Weighted Functional Statistics Using Software That Does Not Support Weights", Technical Report No. 85, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
  • Hesterberg, Tim C. and Stephen J. Ellis (1999), "Linear Approximations for Functional Statistics in Large-Sample Applications", Technical Report No. 86, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
  • Hesterberg, Tim C. (1999), "Smoothed bootstrap and jackboot sampling", Technical Report No. 87, Research Department, MathSoft, Inc., 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
  • Hesterberg, Tim C. (1998), "Simulation and Bootstrapping for Teaching Statistics", Proceedings of the Statistical Education Section, American Statistical Association, 44-52.
  • Hesterberg, Tim C. (1998), "Bootstrap Tilting Inference and Large Data Sets", Proposal to NSF SBIR Program.
  • Hesterberg, Tim C. (1997), "Fast Bootstrapping by Combining Importance Sampling and Concomitants", Computing Science and Statistics, 29(2), 72-78. Interface Foundation of North America, Fairfax Station, VA. Eds E. J. Wegman and S. Azen.
  • Hesterberg, T. C. (1997), "The bootstrap and empirical likelihood", Proceedings of the Section on Statistical Computing, American Statistical Association, 34-36.
  • Hesterberg, Tim C. (1997), "Matched-Block Bootstrap for Long Memory Processes", Technical Report No. 66, Research Department, MathSoft, Inc. 1700 Westlake Ave. N., Suite 500, Seattle, WA 98109.
  • For other articles (including references to published articles related to this software) see articles/
