# Index

From BYU-I Statistics Text


- ANOVA
- (Analysis of Variance) A test for equality of several means.

- Bar Chart
- A way to describe categorical data. Usually used when the data represent counts.
- Bi-modal
- Describes a distribution of data that has two distinct peaks.
- Bivariate Data
- Data in which there are two quantitative measurements on each unit.
- Boxplot
- A graphical representation of the five number summary.

- Categorical Variables
- Yield data that can only be considered as categories, such as gender, job title, nationality, and telephone number.
- Central Limit Theorem
- If the sample size is sufficiently large, then the random variable $\bar x$ will be approximately normally distributed, whatever the shape of the population distribution.
- Cluster Sampling
- A sampling method in which all items in one or more randomly selected clusters are sampled.
- Confidence Interval
- A point estimate ($\bar x$) plus or minus the margin of error.
- Control Group
- The group of subjects in a designed experiment that does not receive the treatment, but is still observed.
- Convenience Sampling
- Selecting sampling items that are relatively easy to obtain.
- Correlation Coefficient
- A number that is used to measure the direction and strength of the linear association between two variables.
- Covariance
- A measure of how two variables vary together.
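
The Covariance and Correlation Coefficient entries can be illustrated with a small calculation. This is only a sketch: it assumes the sample versions of both quantities (dividing by $n-1$), which the entries do not spell out, and the function names and data values are made up for illustration.

```python
from math import sqrt

def sample_covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Correlation coefficient r: covariance scaled by both standard deviations."""
    sd_x = sqrt(sample_covariance(xs, xs))  # covariance of x with itself is its variance
    sd_y = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sd_x * sd_y)

# Hypothetical bivariate data: two quantitative measurements on each unit
hours = [1, 2, 3, 4, 5]
scores = [2, 4, 5, 4, 5]

print(sample_covariance(hours, scores))
print(correlation(hours, scores))
```

The covariance depends on the units of both variables, while the correlation coefficient always falls between -1 and 1, which is why it is the one used to judge direction and strength.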

- Degrees of Freedom
- Determines the exact shape of the t-distribution. For a one-sample procedure, the degrees of freedom are $df = n-1$, where $n$ is the sample size.
- Descriptive Statistics
- Statistics that describe and summarize the data numerically and graphically.
- Designed Experiment
- Researchers manipulate the conditions the participant experiences.
- Discrete Random Variable
- A variable that follows a specific distribution over the long run, and has values that in theory could be listed.
- Double Blind
- A designed experiment in which neither the administrator giving the treatment nor the subject receiving the treatment knows whether the treatment is real or a placebo.

- Explanatory Variable (Independent Variable)
- The variable used in regression to predict the other variable (response variable). It is also plotted on the x-axis.

- F-Distribution
- The type of distribution that the test statistic for ANOVA follows. It is always right skewed, never negative, and the $P$-value is always the area in the right tail.
- F-Statistic
- The type of test statistic used in ANOVA testing of several means.
- Five Number Summary
- A way to summarize data by listing, in order, the minimum, first quartile, second quartile (or median), third quartile, and maximum.
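
As a rough illustration of the Five Number Summary entry, the sketch below uses Python's standard `statistics` module. Quartile conventions differ between textbooks and software; the `method="inclusive"` choice here is an assumption and may not match the convention this text uses. The function name and data are hypothetical.

```python
import statistics

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    values = sorted(data)
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    # The "inclusive" method is one of several quartile conventions (an assumption).
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return values[0], q1, q2, q3, values[-1]

# Hypothetical data set
data = [7, 1, 3, 9, 5]
print(five_number_summary(data))
```

These five values are exactly what a boxplot draws.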

- Hypothesis Null
- A statement of the status quo, or the value typically considered to be appropriate.
- Hypothesis Alternative
- A statement that a researcher suspects is the actual truth.
- Hypothesis Test
- A test performed to see whether there is sufficient evidence to reject a null hypothesis.

- Independent Samples
- Knowing which subjects are in group 1 will tell you nothing about the subjects that are in group 2.
- Inferential Statistics
- Represents a collection of methods that can be used to make inferences about a population.
- Interval Estimators
- A range of plausible values for a parameter.

- Law of Large Numbers
- If the sample size is large, then the sample mean will typically be close to the population mean.
- Left Skewed Distribution
- A shape of distribution with a long left tail, which can be seen on a histogram.
- Left Tailed Test
- A test in which the alternative hypothesis involves a less than symbol, and the $P$-value is the area to the left of the test statistic.
- Level of Significance
- A number that is used to determine if the $P$-value is small enough to reject the null hypothesis. This is also the probability of committing a Type I error.

- Matched Pairs/Dependent Samples
- A sample design that typically involves only one population, in which a pair of observations are drawn from individuals selected from the population.
- Mean
- The most common tool for estimating the center of a distribution. It is computed by adding up the observed data and dividing by the number of observations in the data set.
- Median
- The "middle value" in a sorted data set. To find the median, sort the values from smallest to largest. If there is an odd number of observations, then the median is the exact middle number. If there is an even number of observations, then the median is the mean of the two middle values.
- Mode
- The most frequently occurring value.
- Multi-modal
- Describes a distribution whose shape has more than two peaks.
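
The Mean, Median, and Mode entries can be checked on a small data set. The sketch below relies on Python's standard `statistics` module; the data values are invented for illustration.

```python
import statistics

# Hypothetical data set with an even number of observations
data = [4, 8, 6, 5, 3, 8]

# Mean: add up the observations and divide by how many there are (34 / 6)
mean = statistics.mean(data)

# Median: sorted values are 3, 4, 5, 6, 8, 8; with an even count,
# take the mean of the two middle values, (5 + 6) / 2
median = statistics.median(data)

# Mode: the most frequently occurring value (8 appears twice)
mode = statistics.mode(data)

print(mean, median, mode)
```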

- Normal Density Curve
- Symmetric and has a bell shape. Many things in nature, business, medicine, psychology, and sociology follow the normal density curve.

- Observational Study
- Researchers observe the responses of individuals without controlling the conditions experienced by the individuals.
- Outlier
- Any observation that is very far from the others. Outliers affect the mean.

- Parameter
- A true, usually unknown, value describing a population.
- Pareto Chart
- A bar chart in which the bars are arranged in descending order of height.
- Percentiles
- A number such that a specified percentage of the data are at or below this number.
- Pie Charts
- Used to describe categorical data, and are typically used when you want to represent the observations as part of a whole.
- Placebo Group
- Also known as the Control Group. It is made up of subjects that are assigned to receive the placebo.
- Point Estimators
- One number that is used to estimate a parameter.
- Population
- Total collection of all individuals.
- Probability
- The chance that an event will occur.
- $P$-value
- A $P$-value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming the null hypothesis is true.
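
To make the $P$-value entry concrete, here is a minimal sketch for the simplest case. It assumes the test statistic follows a standard normal distribution when the null hypothesis is true; tests based on the t- or F-distribution would need those distributions instead. The function name `p_value` and its `tail` argument are hypothetical.

```python
from statistics import NormalDist

def p_value(z, tail):
    """P-value for a z test statistic, assuming a standard normal null distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "left":
        # area to the left of the test statistic
        return std_normal.cdf(z)
    if tail == "right":
        # area to the right of the test statistic
        return 1 - std_normal.cdf(z)
    if tail == "two":
        # area in both tails beyond |z|
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

print(p_value(1.96, "two"))
print(p_value(-1.5, "left"))
```

The result is compared with the level of significance: a $P$-value below that level leads to rejecting the null hypothesis.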

- QQ Plot
- A graph that is used to assess if data are normally distributed.
- Quantitative Variables
- Provide numerical information, things that are counted or measured.
- Quartiles
- Values that divide the sorted data into four equal groups.

- Randomness
- A property of a process whose individual outcomes cannot be predicted, but which follows a very distinct pattern over time.
- Regression
- A method used to fit a line to data.
- Regression Equation
- $\hat Y = b_0 + b_1 X$
- Residuals
- The difference between the observed value of $Y$ and the value that would have been predicted by the regression line.
- Residual Plot
- A scatterplot where the $X$-axis shows the independent variable $X$ and the $Y$-axis shows the residuals for each value of $X$.
- Response Variable (Dependent Variable)
- The variable ($Y$) that is being predicted based on the explanatory variable ($X$).
- Right Skewed Distribution
- A shape of distribution with a long right tail, which can be seen on a histogram.
- Right Tailed Test
- A test in which the alternative hypothesis involves a greater than symbol, and the $P$-value is the area to the right of the test statistic.
- Rules of Probability
- 1. A probability is a number between 0 and 1.
- 2. If you list all the outcomes of a probability experiment, the probability that one of these outcomes will occur is 1.
- 3. The probability that an outcome will not occur is 1 minus the probability that it will occur.
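
For the Regression Equation and Residuals entries in this section, the sketch below fits $\hat Y = b_0 + b_1 X$ and computes the residuals. It assumes the usual least-squares fit, which the entries do not name explicitly; the function names and data values are illustrative only.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for Y-hat = b0 + b1 * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def residuals(xs, ys, b0, b1):
    """Observed Y minus the value predicted by the regression line."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Hypothetical data: x is the explanatory variable, y the response variable
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
print(b0, b1)
print(res)
```

Plotting `res` against `xs` gives the residual plot described above. With a least-squares fit, the residuals sum to zero up to rounding.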

- Sample
- A subset or portion of the population.
- Sampling Distribution
- The set of all possible values of $\bar x$ that could occur.
- Sampling Risk
- The risk that the sample may not adequately reflect the population.
- Simple Random Sampling (SRS)
- Using a complete list of the population, and using software to randomly select a sample.
- Shape of Distribution
- Right Skewed, Left Skewed or Symmetric.
- Standard Deviation
- A measure of the spread in the distribution. If the standard deviation is relatively small, then the data tend to be close together.
- Statistic
- Estimate of the population parameter obtained from a sample.
- Statistical Process
- The five steps summarized by the mnemonic "Daniel Can Discern More Truth": Design the Study, Collect Data, Describe the Data, Make Inferences, and Take Action.
- Stratified Sampling
- Items to be sampled are organized in groups of similar items called strata, then a simple random sample can be drawn from each of these strata.
- Student's T-Distribution
- Used when the population standard deviation $\sigma$ is not known. A T-Distribution is symmetrical and has a mean of 0. However, it has more area in the tails than the standard normal distribution.
- Subjects
- Experimental units, or whatever or whomever is being studied.
- Symmetric
- Both the left and right side of the distribution appear to be roughly a mirror image of each other.
- Systematic Sampling
- Researchers select every $k^{\text{th}}$ item in the population, beginning at a random starting point.
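
The Simple Random Sampling, Stratified Sampling, and Systematic Sampling entries can be compared side by side. This is a sketch using Python's standard `random` module; the population, the strata names, and the function names are all hypothetical.

```python
import random

def simple_random_sample(population, n, rng):
    """Randomly select n items from a complete list of the population."""
    return rng.sample(population, n)

def systematic_sample(population, k, rng):
    """Select every k-th item, beginning at a random starting point."""
    start = rng.randrange(k)  # random index among the first k items
    return population[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from each stratum."""
    return {name: rng.sample(items, n_per_stratum) for name, items in strata.items()}

rng = random.Random(0)            # fixed seed so the sketch is repeatable
population = list(range(1, 101))  # hypothetical population of 100 numbered items

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
strata = {"low": population[:50], "high": population[50:]}
stratified = stratified_sample(strata, 5, rng)

print(srs)
print(systematic)
print(stratified)
```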

- Test Statistic
- A measurement that is used to determine how far $\bar x$ is from $\mu$.
- Treatment Group
- Those who receive the experimental treatment.
- Two-Tailed Test
- A test in which the alternative hypothesis involves a $\ne$ symbol, and the $P$-value is the combined area in both tails.
- Type I Error
- Occurs whenever we reject a true null hypothesis.
- Type II Error
- Occurs whenever we fail to reject a false null hypothesis.
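
As one possible illustration of the Test Statistic entry, the sketch below computes a one-sample t statistic, $t = (\bar x - \mu)/(s/\sqrt{n})$, together with its degrees of freedom. The one-sample setting is an assumption, since the entry does not commit to a particular test, and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def one_sample_t(sample, mu_0):
    """Return the t statistic and degrees of freedom for a one-sample test."""
    n = len(sample)
    x_bar = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    t = (x_bar - mu_0) / (s / sqrt(n))
    return t, n - 1

# Hypothetical sample and hypothesized population mean
sample = [5, 7, 9]
t_stat, df = one_sample_t(sample, mu_0=5)
print(t_stat, df)
```

The statistic measures how far $\bar x$ is from $\mu$ in units of the standard error, so a value near 0 is consistent with the null hypothesis.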

- Uniform
- A shape of distribution that has no recognizable peaks.

- Variance
- A measure of the spread of a distribution, equal to the square of the standard deviation ($\sigma^2$).
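
The link between the Variance and Standard Deviation entries can be seen numerically. The sketch below uses Python's standard `statistics` module and shows both the population version (matching $\sigma^2$) and the sample version, which divides by $n-1$. The data values are invented.

```python
from statistics import pstdev, pvariance, variance

# Hypothetical data set
data = [2, 4, 4, 4, 5, 5, 7, 9]

pop_var = pvariance(data)    # population variance: divides by n
pop_sd = pstdev(data)        # population standard deviation: square root of pop_var
sample_var = variance(data)  # sample variance: divides by n - 1

print(pop_var, pop_sd, sample_var)
```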

- Z-score
- Tells how many standard deviations away from the mean an observation lies.
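
The Z-score entry corresponds to the formula $z = (x - \mu)/\sigma$. A minimal sketch, with a hypothetical function name and made-up values:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies away from the mean."""
    return (x - mean) / std_dev

# Hypothetical observation of 130 from a distribution with mean 100
# and standard deviation 15
z = z_score(130, 100, 15)
print(z)
```

A positive z-score means the observation lies above the mean, and a negative one means it lies below.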