# Index


ANOVA
(Analysis of Variance) A test for equality of several means.

Bar Chart
A way to describe categorical data. Usually used if data represents counts.
Bi-modal
If there are two distinct peaks in the distribution of data.
Bivariate Data
Data in which there are two quantitative measurements on each unit.
Boxplot
A graphical representation of the five number summary.

Categorical Variables
Yield data that can only be considered as categories, such as gender, job title, nationality, and telephone number.
Central Limit Theorem
If the sample size is sufficiently large, then the sample mean $\bar x$ will be approximately normally distributed, regardless of the shape of the population distribution.
Cluster Sampling
A sampling method in which all items in one or more randomly selected clusters are sampled.
Confidence Interval
A point estimate ($\bar x$) plus or minus the margin of error.
Control Group
The group of subjects in a designed experiment that does not receive the treatment, but is still observed.
Convenience Sampling
Selecting sampling items that are relatively easy to obtain.
Correlation Coefficient
A number that is used to measure the direction and strength of the linear association between two variables.
Covariance
A measure of how two variables vary together.
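
The Confidence Interval entry above can be illustrated with a minimal sketch. The glossary does not give a formula for the margin of error, so this example assumes the large-sample form $z^* \cdot s/\sqrt{n}$; the function name `z_confidence_interval` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(sample, confidence=0.95):
    """Return (lower, upper) as the point estimate x-bar plus or minus the margin of error."""
    n = len(sample)
    x_bar = mean(sample)
    # Assumption: margin of error = z* times s / sqrt(n) (a large-sample formula).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * stdev(sample) / sqrt(n)
    return x_bar - margin, x_bar + margin


low, high = z_confidence_interval([12, 15, 11, 14, 13, 16, 12, 15])
print(low, high)
```

With a small sample and unknown $\sigma$, a critical value from the Student's T-Distribution would normally replace $z^*$.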

Degrees of Freedom
Determines the exact shape of the t-distribution. For a one-sample procedure, the equation for degrees of freedom is $df = n-1$.
Descriptive Statistics
Statistics that describe and summarize the data numerically and graphically.
Designed Experiment
Researchers manipulate the conditions the participant experiences.
Discrete Random Variable
A variable that follows a specific distribution over the long run, and has values that in theory could be listed.
Double Blind
A designed experiment in which neither the administrator giving the treatment nor the subject receiving the treatment knows whether the treatment is real or a placebo.

Explanatory Variable (Independent Variable)
The variable used in regression to predict the other variable (response variable). It is also plotted on the x-axis.

F-Distribution
The type of distribution that the test statistic for ANOVA follows. It is always right skewed, never negative, and the $P$-value is always the area in the right tail.
F-Statistic
The type of test statistic used in ANOVA testing of several means.
Five Number Summary
A way to summarize data by listing, in order, the minimum, first quartile, second quartile (or median), third quartile, and maximum.
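
As a rough illustration of the Five Number Summary entry, the sketch below uses Python's standard `statistics.quantiles`. Quartile conventions differ between textbooks and software; the `"inclusive"` method chosen here is an assumption, and the helper name `five_number_summary` is hypothetical.

```python
from statistics import quantiles


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(data)
    # Assumption: the "inclusive" quartile convention; other methods can give
    # slightly different Q1 and Q3 for the same data.
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, q2, q3, ordered[-1]


print(five_number_summary([9, 1, 5, 3, 7, 2, 8, 4, 6]))
```

These five values are the ones a Boxplot displays.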

Hypothesis Null
A statement of the status quo, or the value typically considered to be appropriate.
Hypothesis Alternative
A statement that a researcher suspects is the actual truth.
Hypothesis Test
A test performed to see whether there is sufficient evidence to reject a null hypothesis.

Independent Samples
Knowing which subjects are in group 1 will tell you nothing about the subjects that are in group 2.
Inferential Statistics
Represents a collection of methods that can be used to make inferences about a population.
Interval Estimators
A range of plausible values for a parameter.

Law of Large Numbers
If the sample size is large, then the sample mean will typically be close to the population mean.
Left Skewed Distribution
A shape of distribution, seen on a histogram as a distribution with a long left tail.
Left Tailed Test
A test in which the alternative hypothesis involves a less-than symbol; the $P$-value is the area to the left of the test statistic.
Level of Significance
A number that is used to determine if the $P$-value is small enough to reject the null hypothesis. This is also the probability of committing a Type I error.

Matched Pairs/Dependent Samples
A sample design that typically involves only one population, in which a pair of observations are drawn from individuals selected from the population.
Mean
The most common tool for estimating the center of a distribution. It is computed by adding up the observed data and dividing by the number of observations in the data set.
Median
The "middle value" in a sorted data set. To find the median, sort the values from smallest to largest. If there is an odd number of observations, then the median is the exact middle number. If there is an even number of observations, then the median is the mean of the two middle values.
Mode
The most frequently occurring value.
Multi-modal
If there are more than two peaks in the shape of the distribution.
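
The Mean, Median, and Mode entries above describe short procedures, which can be sketched as follows. The function names are illustrative only; Python's standard `statistics` module offers equivalents.

```python
from collections import Counter


def mean(values):
    """Add up the observed data and divide by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:  # odd number of observations: exact middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # even: mean of the two middle values


def mode(values):
    """Most frequently occurring value (ties go to the value seen first)."""
    return Counter(values).most_common(1)[0][0]


print(mean([1, 2, 3, 4]), median([4, 1, 3, 2]), mode([1, 2, 2, 3]))
```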

Normal Density Curve
Symmetric and has a bell shape. Many things in nature, business, medicine, psychology, and sociology follow the normal density curve.

Observational Study
Researchers observe the responses of individuals without controlling the conditions experienced by the individuals.
Outlier
Any observation that is very far from the others. Outliers affect the mean.

Parameter
A true, usually unknown, value describing a population.
Pareto Chart
A bar chart where the bars are presented in descending order of height.
Percentiles
A number such that a specified percentage of the data are at or below this number.
Pie Charts
Used to describe categorical data, and are typically used when you want to represent the observations as part of a whole.
Placebo Group
Also known as the Control Group; it is made up of subjects who are assigned to receive the placebo.
Point Estimators
One number that is used to estimate a parameter.
Population
Total collection of all individuals.
Probability
The chance that an event will occur.
$P$-value
A $P$-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming the null hypothesis is true.

QQ Plot
A graph that is used to assess if data are normally distributed.
Quantitative Variables
Provide numerical information, things that are counted or measured.
Quartiles
Values that divide the sorted data into four equal groups.

Randomness
A process whose individual outcomes are unpredictable, but which follows a very distinct pattern over time.
Regression
A method used to fit a line to data.
Regression Equation
$\hat Y = b_0 + b_1 X$
Residuals
The difference between the observed value of $Y$ and the value that would have been predicted by the regression line.
Residual Plot
A scatterplot where the $X$-axis shows the independent variable $X$ and the $Y$-axis shows the residuals for each value of $X$.
Response Variable (Dependent Variable)
The variable ($Y$) that is being predicted based on the explanatory variable ($X$).
Right Skewed Distribution
A shape of distribution, seen on a histogram as a distribution with a long right tail.
Right Tailed Test
A test in which the alternative hypothesis involves a greater-than symbol; the $P$-value is the area to the right of the test statistic.
Rules of Probability
1. A probability is a number between 0 and 1.
2. If you list all the outcomes of a probability experiment, the probability that one of these outcomes will occur is 1.
3. The probability that an outcome will not occur is 1 minus the probability that it will occur.
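
To make the Regression, Regression Equation, and Residuals entries in this section concrete, here is a minimal sketch. The glossary only says that regression fits a line to data; the least-squares formulas for $b_0$ and $b_1$ used below are an assumption about which fitting method is meant, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Return (b0, b1) for the line Y-hat = b0 + b1 * X, fitted by least squares."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Assumption: ordinary least squares, the usual way to fit a regression line.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def residuals(xs, ys, b0, b1):
    """Observed Y minus the value predicted by the regression line, for each X."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


xs, ys = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
b0, b1 = fit_line(xs, ys)
print(b0, b1, residuals(xs, ys, b0, b1))
```

Plotting the returned residuals against `xs` gives the Residual Plot described above.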

Sample
A subset or portion of the population.
Sampling Distribution
The distribution of all possible values of $\bar x$ that could occur for samples of a given size.
Sampling Risk
The risk that the sample may not adequately reflect the population.
Simple Random Sampling (SRS)
Using a complete list of the population, and using software to randomly select a sample.
Shape of Distribution
Right Skewed, Left Skewed or Symmetric.
Standard Deviation
A measure of the spread in the distribution. If the standard deviation is relatively small, then the data tend to be close together.
Statistic
Estimate of the population parameter obtained from a sample.
Statistical Process
"Daniel Can Discern More Truth": Design the Study, Collect Data, Describe the Data, Make Inferences, and Take Action.
Stratified Sampling
Items to be sampled are organized in groups of similar items called strata, then a simple random sample can be drawn from each of these strata.
Student's T-Distribution
Used when the population standard deviation $\sigma$ is not known. A T-Distribution is symmetrical and has a mean of 0. However, it has more area in the tails than the standard normal distribution.
Subjects
Experimental units, or whatever or whomever is being studied.
Symmetric
Both the left and right side of the distribution appear to be roughly a mirror image of each other.
Systematic Sampling
Researchers select every $k^{\text{th}}$ item in the population, beginning at a random starting point.

Test Statistic
A measurement that is used to determine how far $\bar x$ is from $\mu$.
Treatment Group
Those who receive the experimental treatment.
Two-Tailed Test
A test in which the alternative hypothesis involves a $\ne$ symbol; the $P$-value is the combined area in both tails.
Type I Error
Whenever we reject a true null hypothesis.
Type II Error
Whenever we fail to reject a false null hypothesis.
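
The Test Statistic entry above can be illustrated for one common case. The glossary gives no formula, so this sketch assumes a one-sample $t$ statistic, $t = (\bar x - \mu_0)/(s/\sqrt{n})$, with $df = n-1$; the name `one_sample_t` and the data are invented for the example.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu_0):
    """Return (t, df): how far x-bar is from mu_0, measured in standard errors."""
    n = len(sample)
    # Assumption: sigma is unknown, so the sample standard deviation s is used.
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - mu_0) / standard_error
    return t, n - 1


t, df = one_sample_t([1, 2, 3, 4, 5], mu_0=2)
print(t, df)
```

The $P$-value would then come from the Student's T-Distribution with `df` degrees of freedom, which requires a table or a statistics library and is left out here.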

Uniform
A shape of distribution that has no recognizable peaks.

Variance
A measure of the spread of a distribution. It is the square of the standard deviation, written $\sigma^2$ for a population.
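
As a small worked example of the Variance and Standard Deviation entries, the sketch below uses Python's standard `statistics` module. The data values are made up, and the distinction between dividing by $n$ (population) and by $n-1$ (sample) is an addition not spelled out in the glossary.

```python
from statistics import pstdev, pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions divide the squared deviations from the mean by n.
population_variance = pvariance(data)
population_sd = pstdev(data)

# The sample version divides by n - 1 instead.
sample_variance = variance(data)

print(population_variance, population_sd, sample_variance)
```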

Z-score
Tells how many standard deviations away from the mean an observation lies.
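
The Z-score entry above corresponds to the formula $z = (x - \text{mean})/\text{standard deviation}$. A minimal sketch, with a hypothetical function name and made-up numbers:

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation


print(z_score(85, mean=70, standard_deviation=10))  # 1.5
```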