\documentclass[11pt,letterpaper,twoside,openright]{memoir}
\input{StatBookLayout.tex}
%\input{PreCalcLayout.tex}
\graphicspath{ {./Images/} }
\begin{document}
\frontmatter
\newgeometry{margin=1.25in}
\pagestyle{empty}
\titleinstructor
\frontmatter
\input{FrontMatter.tex}
%\clearpage
\setcounter{tocdepth}{1}
\newgeometry{margin=1.25in}
\tableofcontents*
\mainmatter
\restoregeometry
\pagestyle{doc}
%\begin{comment}
\chapter{Sampling and Data}
\paragraph{What is Statistics?} Broadly speaking, the study of statistics is the study of how to make sense of data.
\begin{center}\includegraphics[width=0.3\textwidth]{CensusBureauSeal.png} \hspace*{0.3in}
\includegraphics[width=0.3\textwidth]{Census2010Logo.png}\end{center}
\paragraph{Example: U.S. Census} Every 10 years, the U.S. Census Bureau undertakes the enormous task of gathering all kinds of data on people residing in the country. In between the huge national surveys, the Census Bureau collects data with smaller surveys like the American Community Survey.
\paragraph{What would you do?} If your job were to conduct a national census, and your results looked like the table below, what would you want to do with them?
\begin{center}
\begin{tabular}{l l l l l l l l}
& Age & Sex & Primary Language & Working & Earnings & Owns a Car & $\cdots$\\
\hline \\
\textbf{Household 1} & & & & & & & \\
\hfill Person 1 & 54 & M & English & Yes & \$89,500 & Yes & $\cdots$\\
\hfill Person 2 & 51 & F & English & No & N/A & Yes & $\cdots$\\
\hfill Person 3 & 19 & F & English & Yes & \$23,000 & Yes & $\cdots$\\
\hline \\
\textbf{Household 2} & & & & & & & \\
\hfill Person 1 & 78 & M & Spanish & No & \$32,800 & Yes & $\cdots$\\
\hfill Person 2 & 82 & F & Spanish & No & \$28,350 & No & $\cdots$\\
\\
\hfill $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$ & $\vdots$
\end{tabular}
\end{center}
\vfill
\pagebreak
\section{Definitions and Key Terms}
\input{1_1DefinitionsAndKeyTerms.tex}
\section{Sampling Methods}
\input{1_2DataAndSampling.tex}
\setcounter{SectionNo}{3}
\setcounter{section}{3}
\section{Experimental Design and Ethics}
\input{1_4ExperimentalDesign.tex}
\setcounter{SectionNo}{2}
\setcounter{section}{2}
\section{Frequency Tables}
\input{1_3FrequencyTables.tex}
%\end{comment}
\chapter{Descriptive Statistics}
\begin{center}
\includegraphics[width=\textwidth, height=2.75in]{GraphsPretty}
\end{center}
We've already started doing descriptive statistics, with frequency tables in the last section. The point of descriptive statistics is to organize and present data in a way that is easy to read and interpret.
In the first part of this chapter, we will cover visual summaries of data, including histograms, bar graphs, stem-and-leaf plots, and line and time series graphs. Then, in the next part, we'll see numerical summaries of data.
There, we'll measure four things:
\begin{itemize}
\item Where a particular data point falls in the data set
\item Where the data is centered
\item Whether the data is symmetric or skewed
\item How spread out or clumped up the data is
\end{itemize}
\vfill
\pagebreak
\section{Bar Graphs and Histograms}
\input{2_1HistogramsAndBarGraphs.tex}
\pagebreak
\section{Other Graphs}
\input{2_2OtherGraphs.tex}
\pagebreak
\section{Measures of the Location of Data}
\input{2_3MeasuresOfPosition.tex}
\pagebreak
\section{Box Plots}
\input{2_4BoxPlots.tex}
\pagebreak
\section{Measures of Center}
\input{2_5MeasuresOfCenter.tex}
\pagebreak
\section{Skewness and the Mean \& Median}
\input{2_6Skewness.tex}
\section{Measures of Spread}
\input{2_7MeasuresOfSpread.tex}
%\begin{comment}
\chapter{Probability}
\begin{center}
\includegraphics[width=\textwidth]{casino}
\end{center}
Much of statistics rests on probability, so this chapter starts with the basics. You most likely have some intuitive understanding of probability already; our goal is to formalize it.
When a weather forecaster gives a prediction, an actuary estimates insurance payouts, or a basketball commentator describes how likely it is that a player will make the next free throw, they are using (to varying extents) some of the principles outlined in this chapter. You may not realize how much probability gets used around you.
\vfill
\pagebreak
\section{Basic Concepts}
\input{3_1BasicProbability.tex}
\section{The Addition Rule and Complements}
\input{3_2AdditionRule.tex}
\section{The Multiplication Rule}
\input{3_3MultiplicationRule.tex}
\chapter{Discrete Random Variables}
\begin{center}
\includegraphics[width=\textwidth]{FiveDice}
\end{center}
If we roll five dice, what is the probability that all five of them come up odd? Four of them?
If we just wanted to know the probability that a single die would come up odd, that would be straightforward, but this question is harder.
In this chapter, we'll start working with \textbf{random variables}, which can be used to answer questions like this one.
\paragraph{Random Variable:} A way to describe all the possible outcomes of a statistical experiment, with their probabilities.
\paragraph{Discrete Random Variable:} A random variable whose possible outcomes can be listed one by one, so that there are either finitely many outcomes or a countable sequence of them.\\
Example: the number of heads when flipping a coin 10 times.
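\paragraph{Preview: the dice question} As a rough sketch of where this chapter is headed, suppose each die is fair and the five dice are rolled independently (both are assumptions; the question above does not say so). A single die comes up odd with probability $3/6 = 1/2$, so
\[
P(\mbox{all five odd}) = \left(\frac{1}{2}\right)^5 = \frac{1}{32} \approx 0.031.
\]
For exactly four odd dice, there are 5 choices for which die comes up even, and each such arrangement has probability $\left(\frac{1}{2}\right)^5$, so
\[
P(\mbox{exactly four odd}) = 5 \cdot \left(\frac{1}{2}\right)^5 = \frac{5}{32} \approx 0.156.
\]
Here ``Four of them?'' is read as \emph{exactly} four; if it means at least four, add the two results to get $6/32 \approx 0.188$.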
\vfill
\pagebreak
\section{Probability Distribution Functions}
\input{4_1ProbDistFunction.tex}
\section{Expected Value}
\input{4_2ExpectedValue.tex}
\section{Binomial Distribution}
\input{4_3BinomialDistribution.tex}
\setcounter{chapter}{5}
\chapter{The Normal Distribution}
\begin{center}
\includegraphics[width=\textwidth, height=2.5in]{StandardizedTest}
\end{center}
Standardized test scores tend to have a symmetric, bell-shaped distribution. What does that mean? It means that if we counted how many people got each score and built a histogram (especially a relative frequency histogram), we'd get something like the picture on the left. If the bars got thinner and thinner (as we measured scores more finely), the histogram would start to look like the smooth curve on the right.
\begin{center}
\begin{tabular}{c c}
\raisebox{-\height+\baselineskip}{\includegraphics[width=0.3\textwidth, height=4cm]{NormHistogram_Kierano}}
&
\raisebox{-\height+\baselineskip}{\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=2in, width=7cm,
xtick=\empty, ytick=\empty,
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [very thick,cyan!50!black] {gauss(0,1)};
\end{axis}
\end{tikzpicture}}
\end{tabular}
\end{center}
Some quantities, like these test scores, naturally follow a distribution like this, but the normal distribution matters even more for reasons that we'll see later.
\vfill
\pagebreak
\section{The Normal Distribution}
\input{6_1NormalDistribution.tex}
\chapter{The Central Limit Theorem}
\begin{center}
\includegraphics[width=\textwidth]{VoteSign}
\end{center}
The normal distribution can be used to describe some quantities that naturally fit it, but it is more valuable because of what we'll use it for throughout the rest of the course: the normal distribution lies behind much of what we'll do, and the Central Limit Theorem is what makes the connection.
For instance, when pollsters try to predict the outcome of an election, how do they know how good their predictions are going to be? Based on the theory that we'll see in this chapter and the next, they have a margin of error for their polls that gives an estimate of how reliable they are.
\vfill
\pagebreak
\section{The Central Limit Theorem}
\input{7_1CentralLimitTheorem.tex}
\chapter{Confidence Intervals}
\begin{center}
\includegraphics[width=\textwidth]{PhoneCase}
\end{center}
Suppose your company makes iPhone cases, and you want to ensure their quality, specifically the dimensions. How can you check the average width, let's say, of all the cases you make, so that you know they'll fit properly?
Well, you could theoretically measure every single case, but in a big production facility this isn't feasible, because the added time and effort would cut into your profits. Instead, you can take a small sample, measure the average width in your sample, and use that to estimate the average width of your population.\\
We can do better, though. The sample mean is simply a \textbf{point estimate} of the population mean, but in this chapter we'll see how to construct an interval that estimates the mean.
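\paragraph{Sketch: point estimate versus interval} As a rough picture of what the sections below build, an interval estimate for a mean is usually centered at the sample mean and extends the same distance on either side:
\[
(\bar{x} - E,\ \bar{x} + E).
\]
The symbols here are placeholders introduced for this sketch: $\bar{x}$ stands for the sample mean and $E$ for a margin of error. How $E$ is computed depends on the situation, and the sections in this chapter may use different notation.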
\vfill
\pagebreak
\section{One Population Mean, Normal}
\input{8_1CIMeanNormal.tex}
\section{One Population Mean, Student t}
\input{8_2CIMeant.tex}
\section{One Population Proportion}
\input{8_3CIProp.tex}
\chapter{Hypothesis Testing with One Sample}
\begin{center}
\includegraphics[width=\textwidth]{Mercedes}
\end{center}
If a car manufacturer claims that one of their models averages more than 38 miles per gallon on the highway, how can we verify their claim? That process is called \textbf{hypothesis testing}: a claim is made (i.e., a hypothesis) and we test it.
As we'll see, hypothesis testing is closely linked to what we've already done with confidence intervals, but a hypothesis test is a way of clearly laying out the evidence that supports or contradicts a claim like the gas mileage one.
To perform a hypothesis test, we'll describe two contradictory hypotheses (like guilty and not guilty in a criminal trial), and based on the evidence, we'll make a decision in favor of one of them.
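\paragraph{Sketch: the gas mileage hypotheses} For the manufacturer's claim above, the two contradictory hypotheses might be written as
\[
H_0: \mu \le 38 \qquad \mbox{versus} \qquad H_a: \mu > 38,
\]
where $\mu$ stands for the model's true average highway mileage in miles per gallon. The symbols $H_0$, $H_a$, and $\mu$ are assumed notation for this sketch (some texts write the first hypothesis as $\mu = 38$); the sections below set up the hypotheses carefully.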
\vfill
\pagebreak
\section{Null and Alternative Hypotheses}
\input{9_1Hypotheses.tex}
\section{Type I and Type II Errors}
\input{9_2ErrorTypes.tex}
\section{Distribution Needed for Testing}
\input{9_3HTDist.tex}
\section{Drawing a Conclusion}
\input{9_4DrawingConclusions.tex}
\section{Full Examples}
\input{9_5FullHTExamples.tex}
\chapter{Hypothesis Testing with Two Samples}
\begin{center}
\includegraphics[width=\textwidth]{ElectoralCollege2012}
\end{center}
So far, all the hypothesis tests we've done have been to determine something about the mean or proportion in a single population; in this chapter, we briefly discuss how to compare two populations by comparing their means or proportions. For instance, we may want to compare the proportion of voters who voted Democratic in two different states. Of course, we could simply compare the sample proportions for a sample from each state, but the hypothesis tests here will give us a way to tell whether the difference between them is statistically significant.
The formulas in this chapter are more complicated, so we'll pretty much stick to the calculator; we'll use more of the tests in this menu:
\begin{center}
\includegraphics[width=1.75in]{CalcStatTests}
\end{center}
\vfill
\pagebreak
\section{Two Means, Sigmas Unknown}
\input{10_1TwoMeansUnknown.tex}
\section{Two Means, Sigmas Known}
\input{10_2TwoMeansKnown.tex}
\section{Two Proportions}
\input{10_3TwoProportions.tex}
%\end{comment}
\setcounter{chapter}{11}
\chapter{Linear Regression and Correlation}
\begin{center}
\includegraphics[width=\textwidth]{Dollars}
\end{center}
Often, we want to determine whether there is a relationship between two variables, and if so, what the relationship is. For instance, when studying economics, we might study the connection between inflation and unemployment.
There are two ideas in this chapter:
\begin{itemize}
\item \textbf{Correlation:} determining how strong the relationship between two variables is.
\item \textbf{Regression:} finding a specific equation that describes the relationship.
\end{itemize}
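\paragraph{Sketch: what each idea produces} In the linear setting of this chapter, correlation is typically summarized by a single number $r$ between $-1$ and $1$, and regression by the equation of a line:
\[
-1 \le r \le 1, \qquad \hat{y} = a + bx.
\]
The letters are assumed notation for this sketch: $r$ is the correlation coefficient, $\hat{y}$ is the predicted value of $y$ for a given $x$, $a$ is the $y$-intercept, and $b$ is the slope. Values of $r$ near $1$ or $-1$ indicate a strong linear relationship, and values near $0$ a weak one.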
\vfill
\pagebreak
\section{Linear Equations}
\input{12_1LinearEquations.tex}
\section{Scatter Plots and Correlation}
\input{12_2ScatterPlots.tex}
\section{The Regression Equation}
\input{12_3RegressionEquation.tex}
\setcounter{SectionNo}{4}
\setcounter{section}{4}
\section{Prediction}
\input{12_5Prediction.tex}
\setcounter{SectionNo}{3}
\setcounter{section}{3}
\section{Inferences with Regression}
\input{12_4RegressionInferences.tex}
\setcounter{SectionNo}{5}
\setcounter{section}{5}
\section{Outliers}
\input{12_6Outliers.tex}
%\begin{comment}
\chapter{Appendix: Tables}
\begin{enumerate}
\item Binomial Probabilities
\item Cumulative Normal Distribution
\item Student's $t$ Distribution
\end{enumerate}
\vfill
\pagebreak
\input{Appendix_BinomTable.tex}
\input{Appendix_zTable.tex}
\input{Appendix_tTable.tex}
%\end{comment}
\end{document}