Mr. Meinzen - Introduction to Statistics Terminology

"Success is the ability to go from one failure to another with no loss of enthusiasm." Winston Churchill

Chapter 7 Terminology

  • sampling distribution :

    • of the sample mean

    • of the sample sum

    • of the number of successes in a sample

    • of a sample proportion

      • sample proportion as a type of mean
    • sample size versus population size

  • Central Limit Theorem

  • Point estimators :

    • biased versus unbiased

    • precision

  • Standard Error

  • Rare events and reasonably likely events

Chapter 8 Terminology

  • reasonably likely event

  • rare event

  • proportion

    • confidence interval

    • significance test

  • level of confidence (capture rate)

  • margin of error

  • variation in sampling

  • statistical significance

  • conditions (assumptions) for a test

  • null hypothesis and alternative hypothesis

  • test statistic

  • P-value

  • critical values

  • level of significance

  • Type I error and Type II error

  • power of a test

  • one-sided test and two-sided test

  • difference of two proportions

    • confidence interval

    • significance test

  • pooled estimate p̂

  • difference of two proportions from an experiment or observational study

    • significance test

    • confidence interval

Chapter 9 Terminology

  • plausible population means

  • sampling distribution for s

  • confidence interval for a mean

  • using s as an estimate for σ

  • t-table

  • degrees of freedom

  • confidence level

  • capture rate

  • margin of error

  • t-test

  • t-distribution

  • significance test for a mean

  • statistical significance

  • fixed-level testing

  • null hypothesis and alternative hypothesis

  • test statistic

  • level of significance

  • P-value

  • power

  • transforming to normality using logs and reciprocals

  • robustness of t-procedures

  • 15/40 guideline for using t-procedures

  • independent random samples

  • random assignment of treatments to subjects

  • pooled versus unpooled sample variances

  • paired data

  • matched pairs design

  • repeated measures design

  • independent and dependent samples

  • mean difference

Chapter 11 Terminology

  • true regression line

  • line of means

  • conditional distribution of y given x

  • variability in x

  • variability in y at a given x

  • standard error in the slope, σ_b1

  • estimate of the standard error for the slope, s_b1

  • slope :

    • test statistic
    • significance test
    • confidence interval
    • degrees of freedom for inference

Choosing a Test for Significance : Z versus T

Flowchart courtesy of Bloomington Tutors [https://bloomingtontutors.com/blog/when-to-use-the-z-test-versus-t-test]

Basically, it depends on four things:

  1. Whether we are working with a mean (for example, "an average of 37 students per class") or a proportion (e.g., "15% of all students").
  2. Whether or not we know the population standard deviation (σ). In real life we usually don't, but statistics courses like to contrive problems where we do.
  3. Whether or not the population is normally distributed. This is mainly important when dealing with small sample sizes.
  4. The size of our sample. The magic number is usually 30 - below that is considered a "small" sample, and 30 or above is considered "large". When the sample size is large, the central limit theorem tells us that we don't need to worry about whether or not the population is normally distributed.

When you're working on a statistics word problem, these are the things you need to look for.
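The claim in item 4 about large samples can be checked with a small simulation. The sketch below is only an illustration under assumed settings: the skewed population (an exponential distribution with mean 1 and standard deviation 1), the function name sample_means, the seed, and the number of repetitions are all choices made for this example, not part of the course material.

```python
import random
from statistics import mean, stdev

def sample_means(n, repetitions, seed=0):
    """Draw `repetitions` samples of size n from a skewed population
    (exponential, mean 1, standard deviation 1) and return each sample's mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return [
        mean([rng.expovariate(1.0) for _ in range(n)])
        for _ in range(repetitions)
    ]

# With n = 30 the sample means should center near the population mean (1)
# and have a spread near sigma / sqrt(n) = 1 / sqrt(30).
means = sample_means(n=30, repetitions=2000)
print("center of the sample means:", round(mean(means), 3))
print("spread of the sample means:", round(stdev(means), 3))
print("sigma / sqrt(n):           ", round(1 / 30 ** 0.5, 3))
```

Plotting a histogram of means would show a roughly bell-shaped curve even though the population itself is strongly skewed.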

  • Proportion problems are never t-test problems - always use z!
  • A proportion is the share of a group that falls into one of two categories whose percentages add to 100%, such as "winner vs. loser" or "like vs. dislike".
  • However, you need to check that n(p0) and n(1-p0) are both greater than 10, where n is your sample size and p0 is your hypothesized population proportion. This is basically saying that the population proportions (for example, % male and % female) should both be large enough so they will be adequately represented in the sample.
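The condition check in the last bullet, followed by a z-test for one proportion, might be sketched as follows. The function names, the two-sided P-value, and the numbers in the usage lines are assumptions made for illustration; the threshold of 10 is the one stated above.

```python
from math import sqrt
from statistics import NormalDist

def proportion_conditions_met(n, p0, threshold=10):
    """Check that n*p0 and n*(1 - p0) are both greater than the threshold."""
    return n * p0 > threshold and n * (1 - p0) > threshold

def one_proportion_z(successes, n, p0):
    """Return the z statistic and two-sided P-value for H0: p = p0."""
    p_hat = successes / n                  # sample proportion
    se = sqrt(p0 * (1 - p0) / n)           # standard error under H0
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical example: 60 successes in a sample of 100, testing p0 = 0.5.
if proportion_conditions_met(n=100, p0=0.5):
    z, p_value = one_proportion_z(successes=60, n=100, p0=0.5)
    print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
else:
    print("Conditions not met; the z-test is not appropriate here.")
```

For a one-sided test, the P-value would instead be the area in a single tail beyond z.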

Generally speaking, the problem will explicitly tell you if the population standard deviation is known; if it doesn't say, assume that it's unknown. The same goes for a normally distributed population: if the problem doesn't say "assume the population is normally distributed", or something to that effect, do not make up that assumption. Fortunately, if the sample size is large enough, whether the population is normal doesn't matter!
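The rules above can be collected into a small decision helper. This is one reading of the text, not a transcription of the flowchart: the function name choose_test, its parameter names, and the "neither" result for a small sample from a population not known to be normal are assumptions made for this sketch. The defaults of False mirror the advice to assume nothing the problem does not state.

```python
def choose_test(statistic, n=0, sigma_known=False, population_normal=False):
    """Suggest "z", "t", or "neither" using the four questions above.

    statistic: "mean" or "proportion"
    n: sample size
    sigma_known: True only if the problem states the population standard deviation
    population_normal: True only if the problem states the population is normal
    """
    if statistic == "proportion":
        # Proportion problems always use z (after checking n*p0 and n*(1 - p0)).
        return "z"
    if statistic != "mean":
        raise ValueError("statistic must be 'mean' or 'proportion'")

    large_sample = n >= 30  # the usual rule-of-thumb cutoff
    if not (large_sample or population_normal):
        # Small sample and no normality assumption: neither procedure is justified.
        return "neither"
    return "z" if sigma_known else "t"

# Hypothetical word problems:
print(choose_test("proportion"))                       # proportion problem
print(choose_test("mean", n=50, sigma_known=True))     # large sample, sigma given
print(choose_test("mean", n=12, population_normal=True))  # small normal sample, sigma unknown
print(choose_test("mean", n=12))                       # small sample, nothing stated
```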