Mr. Meinzen - Introduction to Statistics Terminology

"Success is the ability to go from one failure to another with no loss of enthusiasm." Winston Churchill

Chapter 1 Terminology

Fathom Software

  • Toolbar
  • Inspection Window
  • Formula Editor
  • Collection
  • Case or Cases
  • Table (or Case Table)
  • Graph
  • Summary Table
  • Attribute
  • Value

Excel Software

  • Graphs & Labeling (horizontal and vertical axes, title)

Statistical Terms

  • Data
  • Summary Statistic (mean, median, mode, etc.)
  • Analyze

Chapter 2 Terminology

  • univariate data

    • quantitative variable (aka attribute in Fathom software)
    • categorical (or qualitative) variable (aka attribute in Fathom software)
  • graphs (plots)

    • bar chart
    • dotplot
    • stemplot
    • histogram or relative frequency plot
    • boxplot (box-and-whisker plot) and modified boxplot
      • outliers (statistics that are resistant vs sensitive to them), clusters, gaps
  • shape of distribution

    • symmetric
      • uniform (or rectangular) distribution
      • normal distribution
    • skewness
      • left skewed distribution
      • right skewed distribution
    • other
      • bimodal distribution
  • CENTER [summary statistic: single value measurement or computation]

    • mean
    • median
    • mode (only for bimodal)
  • SPREAD [summary statistic: single or multiple valued measurement or computation]

    • deviation (or residual)
    • standard deviation
    • variance
    • quartiles (ranges)
      • 5-number summary [min, Q1, Q2 (median), Q3, max]
      • interquartile range (IQR)
  • statistical (math or calculator) symbols and terms

    • x-bar [also the formula to calculate]
    • s or SD [also the formula to calculate]
    • z-score [also the formula to calculate as well as re-centering and re-scaling]
    • percentages calculated from normal curve
    • normalcdf ( leftbound, rightbound, mean, SD )
    • invNorm ( area, mean, SD )
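The Chapter 2 symbols above can be tried out in a few lines of Python. This is only a minimal sketch: the data values are invented for illustration, and the helper names z_score, normalcdf, and inv_norm are hypothetical stand-ins written to mirror the calculator argument order listed above, not the calculator's own implementation. Quartile conventions differ between calculators, Fathom, and Python, so the quartiles from another tool may not match exactly.

```python
from statistics import NormalDist, mean, median, quantiles, stdev

# Hypothetical sample data, invented for illustration only.
data = [2, 4, 4, 5, 7, 9, 11]

x_bar = mean(data)   # x-bar: sum of the values divided by n
s = stdev(data)      # s: sample standard deviation (divides by n - 1)


def z_score(x, center, spread):
    """Re-center (subtract the mean), then re-scale (divide by the SD)."""
    return (x - center) / spread


def normalcdf(leftbound, rightbound, mu=0.0, sd=1.0):
    """Area under the normal curve between leftbound and rightbound."""
    dist = NormalDist(mu, sd)
    return dist.cdf(rightbound) - dist.cdf(leftbound)


def inv_norm(area, mu=0.0, sd=1.0):
    """Value that has the given area (proportion) to its left."""
    return NormalDist(mu, sd).inv_cdf(area)


# 5-number summary and IQR, using Python's default quartile method.
q1, q2, q3 = quantiles(data, n=4)
five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1

print("x-bar:", x_bar, "s:", round(s, 4), "median:", median(data))
print("5-number summary:", five_number_summary, "IQR:", iqr)
print("z-score of the max:", round(z_score(max(data), x_bar, s), 4))
print("area within 1 SD of the mean:", round(normalcdf(-1, 1), 4))
print("97.5th percentile of the standard normal:", round(inv_norm(0.975), 4))
```

The normalcdf(-1, 1) call recovers the familiar "about 68% within one SD" percentage from the normal curve.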

Chapter 3 Terminology

 

  • Chapter 2: One Variable (univariate data) : shape -> center -> spread
  • Chapter 3: Two Variables (bivariate data) : shape -> trend -> strength -> variability
  • Key Idea
    • one variable : distribution
    • two variables : relationship (association)
  • Plots/Graphs
    • one variable : dot plot, stemplot, boxplot, histogram
    • two variables : scatter plot
  • Shape
    • one variable : normal, uniform, or skewed; symmetric; clusters, gaps, and outliers
    • two variables : linear or curved; constant strength; clusters, gaps, and outliers
  • Ideal Shape
    • one variable : normal
    • two variables : linear (oval/ellipse)
  • Measure of Center
    • one variable : mean, median
    • two variables : regression line (LSRL)
  • Measure of Spread from the Center
    • one variable : standard deviation, interquartile range
    • two variables : correlation

  • bivariate data (from Chapter 3): shape -> trend -> strength -> variability

    • plausible explanation : causation, common response, or confounding
    • lurking variable
    • residual plots
    • outliers
  • scatter plots : shape -> trend -> strength -> variability

    1. data's shape : linear, curved, or none
      1. y = a1 + b1*x [equivalent to algebra equation of line : y = mx + b]
    2. shape's trend : positive slope, negative slope, or none
      1. b1 : measure of slope
    3. trend's strength : strong trend (tight cluster), moderate trend (some clustering), or weak trend (no cluster)
      1. correlation : a measure of a trend's strength
    4. strength's variability : uniform or heteroscedastic (fan-shaped)
  • summary line

    • Least Squares Regression Line (also called LSRL)
    • Line of Best Fit (best guess or LSRL)
    • Regression Line (LSRL)
    • Trend Line (LSRL)
    • Fitted Line (best guess)
  • statistical (math or calculator) symbols and terms

    • randomNormal ( mean, SD )
    • least squares regression line ("regression line" or just LSRL) : y = a1 + b1*x
    • explanatory variable or predictor variable, x
    • response variable or observed variable, y
    • slope, b1
    • y-intercept, a1
    • predicted value, ŷ
    • interpolation, extrapolation
    • influential point
    • residual
    • sum of squared errors (SSE)
    • r : correlation coefficient
    • r^2 : coefficient of determination
    • "correlation does not imply causation" due to lurking variable
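The regression terms above (slope b1, y-intercept a1, predicted value, residual, SSE, r, and r^2) fit together as in the following sketch. The (x, y) pairs are invented for illustration and the function name lsrl is a hypothetical helper; the formulas are the standard least squares ones, written out by hand rather than taken from any particular calculator or from Fathom.

```python
from math import sqrt

# Hypothetical (x, y) pairs, invented for illustration only.
xs = [1, 2, 3, 4, 5]   # explanatory (predictor) variable
ys = [2, 4, 5, 4, 5]   # response (observed) variable


def lsrl(xs, ys):
    """Return (a1, b1, r) for the least squares line y = a1 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx               # slope
    a1 = y_bar - b1 * x_bar      # y-intercept: line passes through (x-bar, y-bar)
    r = sxy / sqrt(sxx * syy)    # correlation coefficient
    return a1, b1, r


a1, b1, r = lsrl(xs, ys)
predicted = [a1 + b1 * x for x in xs]                        # y-hat values
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]   # observed - predicted
sse = sum(e ** 2 for e in residuals)                         # sum of squared errors
r_squared = r ** 2                                           # coefficient of determination

print(f"y-hat = {a1:.2f} + {b1:.2f}*x")
print(f"r = {r:.4f}, r^2 = {r_squared:.4f}, SSE = {sse:.4f}")
```

Using the fitted line for an x inside the range 1 to 5 is interpolation; using it for an x outside that range is extrapolation, which the data cannot vouch for.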

Chapter 4 Terminology

  • units (and population size)
  • population
  • census & sample
  • parameter & statistic
  • sample bias

    • selection bias
    • size bias
    • volunteer bias
    • convenience bias
    • judgement bias
    • nonresponse bias
    • questionnaire bias
    • wording or language bias
    • incorrect response bias
  • samples - unbiased representation of population

    • simple random sample (SRS)
    • stratified random sample
    • cluster sample
    • two-stage cluster sample
    • systematic sample with random start
  • experiment vs observational study

    • blind experiment, double blind experiment
    • treatment
    • factor (categorical)
    • level
    • experimental unit
    • response variable
    • designs of experiments
      • completely randomized design
      • randomized paired comparison design
      • randomized block design
    • variables in experiments
      • explanatory variable or factor (if categorical)
      • response variable
      • lurking variable
      • confounding variable
    • variability
      • between-treatment
      • within-treatment
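The sampling methods named above differ mainly in where the randomness is applied. The sketch below illustrates four of them on a made-up population of units numbered 1 to 100; the function names, the strata, and the clusters are all hypothetical and chosen only to keep the example short.

```python
import random


def simple_random_sample(units, n, rng):
    """SRS: every possible group of n units is equally likely."""
    return rng.sample(units, n)


def stratified_random_sample(strata, n_per_stratum, rng):
    """Take a separate SRS inside every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in them."""
    chosen = rng.sample(list(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


def systematic_sample(units, step, rng):
    """Random start among the first `step` units, then every step-th unit."""
    start = rng.randrange(step)
    return units[start::step]


rng = random.Random(2024)          # fixed seed so the run is repeatable
population = list(range(1, 101))   # hypothetical units numbered 1..100

# Hypothetical strata and clusters built from the same population.
strata = {"low": population[:50], "high": population[50:]}
clusters = {f"group{i}": population[25 * i: 25 * (i + 1)] for i in range(4)}

srs = simple_random_sample(population, 10, rng)
stratified = stratified_random_sample(strata, 5, rng)
clustered = cluster_sample(clusters, 2, rng)
systematic = systematic_sample(population, 10, rng)

print("SRS:", sorted(srs))
print("stratified:", {k: sorted(v) for k, v in stratified.items()})
print("clusters chosen:", sorted(clustered))
print("systematic:", systematic)
```

A two-stage cluster sample would add one more step: an SRS taken inside each chosen cluster instead of keeping every unit.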

Chapter 5 Terminology

  • event
  • table of random digits
  • venn diagrams
  • Probability

    • event, P(A)
    • complement of event, P(A^c)
    • distribution
    • model
    • mutually exclusive [disjoint categories]
    • conditional events
    • independent events
    • "of at least one"
    • variance & standard deviation
  • Sampling

    • space
    • with Replacement
    • without Replacement
  • Mathematical

    • Law of Large Numbers
    • Fundamental Counting Principle
    • Terminology with Probabilities
      • Complement of Event : P(A^c) = 1 - P(A)
        • P("of at least one") = 1 - P("exactly none")
      • Mutually Exclusive Events [i.e. disjoint] : P(A and B) = 0
      • Conditional Events : P(A|B) = P(A and B) / P(B) often written as P(A and B) = P(A|B) * P(B)
      • Independent Events : P(A|B) = P(A)
    • Addition Rules ["or"]
      • Full Rule : P(A or B) = P(A) + P(B) - P(A and B)
      • Simplified Rule for Mutually Exclusive Events [disjoint] : P(A or B) = P(A) + P(B)
    • Multiplication Rules ["and"]
      • Full Rule : P(A and B) = P(A) * P(B|A) can also be written as P(A and B) = P(B) * P(A|B)
      • Simplified Rule for Independent Events : P(A and B) = P(A)*P(B)
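The probability rules above can be checked by brute force on a small sample space. The sketch below uses one roll of a fair six-sided die; the events A, B, C, and D and the helper name prob are invented for illustration, and exact fractions are used so the rules hold without rounding.

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die (equally likely outcomes).
sample_space = set(range(1, 7))


def prob(event):
    """P(event) = favorable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1, 2}      # roll is 1 or 2 (independent of A)
D = {1, 3}      # roll is 1 or 3 (mutually exclusive with A)

# Complement rule: P(A^c) = 1 - P(A)
p_not_a = prob(sample_space - A)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

# Full addition rule: P(A or B) = P(A) + P(B) - P(A and B)
p_a_or_b = prob(A) + prob(B) - prob(A & B)

# Full multiplication rule: P(A and B) = P(B) * P(A|B)
p_a_and_b = prob(B) * p_a_given_b

# Independence check: A and C are independent when P(A|C) = P(A)
a_c_independent = prob(A & C) / prob(C) == prob(A)

# "At least one" six in three rolls = 1 - P(exactly none)
p_at_least_one_six = 1 - Fraction(5, 6) ** 3

print("P(not A) =", p_not_a)
print("P(A|B) =", p_a_given_b)
print("P(A or B) =", p_a_or_b)
print("P(A and B) =", p_a_and_b)
print("A, C independent?", a_c_independent)
print("P(A or D) =", prob(A | D), "(disjoint, so the probabilities just add)")
print("P(at least one six in 3 rolls) =", p_at_least_one_six)
```

Here A and B are not independent, since P(A|B) = 2/3 differs from P(A) = 1/2, which is why the full multiplication rule is needed for them.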

Chapter 6 Terminology

  • Probability Distributions
    • Random Variable, X
    • Expected Value, E(X)
    • mean, μ_X
    • standard deviation, σ_X
    • from Collected Data
      • using Known Data Frequencies that model Your Situation
      • simulation using Random selection from a known data
    • from Theory
      • assumptions + Basic Mathematical Principles
  • Binomial Distributions
    • n = number of trials
    • p = probability of success on any one trial
    • 1-p = q = probability of failure on any one trial
    • P(X = k) = nCk * p^k * (1 - p)^(n - k)
    • binompdf(number of trials, probability of success, number of successes)
  • Normal Distribution ~ BINS (binomial, independent, number of trials is fixed, success probability is known)
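The binomial formula and the expected value terms above can be tied together with a short sketch. The function below is a hypothetical stand-in that follows the binompdf argument order listed above (number of trials, probability of success, number of successes); the values n = 10 and p = 0.5 are chosen only for illustration.

```python
from math import comb, sqrt


def binompdf(n, p, k):
    """P(X = k) for n independent trials, each with success probability p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical setting: 10 fair coin flips, X = number of heads.
n, p = 10, 0.5

# Probability distribution of X: one probability for each k = 0, 1, ..., n.
distribution = [binompdf(n, p, k) for k in range(n + 1)]

# Expected value E(X): each value weighted by its probability.
expected_value = sum(k * pk for k, pk in enumerate(distribution))

# Variance and standard deviation of X, from the same distribution.
variance = sum((k - expected_value) ** 2 * pk for k, pk in enumerate(distribution))
sd = sqrt(variance)

print("P(X = 5) =", binompdf(n, p, 5))
print("total probability =", sum(distribution))
print("E(X) =", expected_value, " shortcut n*p =", n * p)
print("SD =", round(sd, 4), " shortcut sqrt(n*p*q) =", round(sqrt(n * p * (1 - p)), 4))
```

The last two lines compare the definition-based mean and SD with the usual binomial shortcuts n*p and sqrt(n*p*q).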