\setcounter{ExampleCounter}{1}
\paragraph{Example from textbook:} Didi and Ali are at the birthday party of a very wealthy friend. They hurry to be first in line to grab a prize from a tall basket that they cannot see inside. There are 200 plastic bubbles in the basket, and they have been told that only one of them contains a \$100 bill. Didi is the first person to reach into the basket and pull out a bubble. Her bubble contains a \$100 bill.\\
If the claim were true, the probability of this happening would be $1/200 = 0.005$, which is very unlikely. Because a ``rare event'' has occurred, they begin to doubt that the information they were given was true. (In reality, they would weigh this against the probability that the person who told them this was lying. If they trusted that person, they would not doubt their word, because they would consider the probability of that person lying to be even lower than 0.005.)\\
This is similar to a hypothesis test: we make an \textbf{assumption}, the null hypothesis, which may or may not be true. Then we take a sample.
\begin{itemize}
\item If the sample that we get is a reasonable result based on the assumption we made, we don't reject the null hypothesis (the assumption).
\item If the sample that we get would be really unlikely if the assumption were true, we reject the null hypothesis.
\end{itemize}
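The probability in Didi's story can be checked with a short simulation. The sketch below is not part of the textbook example: it assumes every bubble is equally likely to be drawn, labels the winning bubble as bubble 0, and uses a hypothetical helper \texttt{estimate\_win\_probability}. The simulated fraction of winning draws should land close to $1/200 = 0.005$.

```python
import random

def estimate_win_probability(num_bubbles=200, trials=200_000, seed=1):
    """Estimate the chance that one random draw picks the single winning bubble."""
    rng = random.Random(seed)      # seeded so the estimate is reproducible
    winning_bubble = 0             # assumption: label the $100 bubble as bubble 0
    wins = 0
    for _ in range(trials):
        draw = rng.randrange(num_bubbles)   # every bubble is equally likely
        if draw == winning_bubble:
            wins += 1
    return wins / trials

exact = 1 / 200
estimate = estimate_win_probability()
print(f"exact: {exact}, simulated: {estimate:.4f}")
```

A result this rare under the stated assumption is exactly what makes Didi and Ali suspicious of the claim.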
\subsection{The p Value}
We \marginnote{\includegraphics[width=1.5in]{xkcdNullHypothesis}\\ \hfill xkcd.com}measure how rare a result like this is by using a probability called the \textbf{p value}.
\begin{itemize}
\item \textbf{The p value is the probability that, if the null hypothesis were true, we would get a sample at least as extreme as the one we got.}
\item If p is low, we will reject $H_0$.
\item If p is not low, we will not reject $H_0$.
\end{itemize}
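To make the definition concrete, the p value can be approximated by simulation: pretend $H_0$ is true, generate many sample means, and count how often one comes out at least as extreme as ours. This is a minimal sketch using the numbers from the years-of-education example below ($\mu = 12$, $\sigma = 3$, $n = 100$, $\overline{x} = 12.98$). It assumes the sample mean is normally distributed with standard deviation $\sigma/\sqrt{n}$, and the function name \texttt{simulated\_p\_value} is illustrative.

```python
import random

def simulated_p_value(xbar_observed, mu0, sigma, n, trials=200_000, seed=2):
    """Fraction of simulated sample means, generated as if H0 were true,
    that are at least as large as the observed sample mean (right tail)."""
    rng = random.Random(seed)
    standard_error = sigma / n ** 0.5
    at_least_as_extreme = 0
    for _ in range(trials):
        # Assumption: the sample mean is normal, centered at mu0 (H0 is true)
        xbar = rng.gauss(mu0, standard_error)
        if xbar >= xbar_observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / trials

p_sim = simulated_p_value(12.98, mu0=12, sigma=3, n=100)
print(f"simulated p value: {p_sim:.5f}")
```

With enough simulated samples, the fraction comes out near the p value of about 0.0005 found with the normal table.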
\vfill
\pagebreak
\paragraph{What is ``low''?} We consider p to be low if it is below some predetermined \textbf{significance level}, called $\alpha$. This is usually 0.05, though smaller values such as 0.01 are also common.
\begin{itemize}
\item $\alpha$ is the probability of making a Type I Error (rejecting $H_0$ when it is actually true).
\end{itemize}
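One way to see that $\alpha$ is the probability of a Type I Error is to run many tests in a world where $H_0$ is true and count how often we wrongly reject it. The sketch below assumes a right-tailed test with $\mu_0 = 12$, $\sigma = 3$, $n = 100$ (hypothetical settings borrowed from the next example), draws each sample mean directly from its normal sampling distribution, and uses Python's \texttt{math.erfc} for the area to the right of $z$. The helper names are illustrative.

```python
import math
import random

def right_tail_p_value(z):
    """P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def reject_null(p, alpha=0.05):
    """Decision rule: reject H0 when p is below the significance level."""
    return p < alpha

def type_one_error_rate(mu0=12, sigma=3, n=100, alpha=0.05, trials=100_000, seed=3):
    """Fraction of right-tailed tests that reject H0 even though H0 is true."""
    rng = random.Random(seed)
    standard_error = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        xbar = rng.gauss(mu0, standard_error)   # H0 is true: means centered at mu0
        z = (xbar - mu0) / standard_error
        if reject_null(right_tail_p_value(z), alpha):
            rejections += 1                     # each of these is a Type I Error
    return rejections / trials

rate = type_one_error_rate()
print(f"rejected a true H0 in {rate:.3f} of the tests")
```

The rejection rate comes out close to $\alpha = 0.05$: about 5 out of every 100 tests reject a null hypothesis that is actually true.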
\begin{example}{Years of Education}
A social \marginnote{\includegraphics[width=1.75in]{xkcdPValues}\\ \hfill xkcd.com}scientist suspects that the mean number of years of education for adults in a certain large city is greater than 12 years. She surveys 100 adults and finds that the sample mean number of years is 12.98. Assume that the population standard deviation is 3 years. Test this claim.
\paragraph{Step 1:} State the hypotheses.
\begin{align*}
H_0:\ &\mu \leq 12\\
H_1:\ &\mu > 12
\end{align*}
\paragraph{Step 2:} Calculate the test statistic. This is the z score of our sample mean, which measures how unusual our sample would be if the null hypothesis were true. We use the boundary value $\mu = 12$ from $H_0$ in the calculation.
\begin{align*}
z &= \dfrac{\overline{x}-\mu}{\sigma/\sqrt{n}}\\
&= \dfrac{12.98-12}{3/\sqrt{100}}\\
&\approx 3.27
\end{align*}
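The arithmetic in Step 2 can be reproduced in a few lines of Python. The function name \texttt{z\_statistic} is our own; it simply evaluates $(\overline{x}-\mu)/(\sigma/\sqrt{n})$.

```python
import math

def z_statistic(xbar, mu0, sigma, n):
    """z score of a sample mean when the population standard deviation is known."""
    return (xbar - mu0) / (sigma / math.sqrt(n))

z = z_statistic(12.98, mu0=12, sigma=3, n=100)
print(round(z, 2))  # 3.27
```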
\begin{center}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none, xlabel=$x$,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={0,2},
ytick=\empty,
xticklabel style={align=right},
xticklabels={$\mu=12$,{$\overline{x}=12.98$\\$z=3.27$}},
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [fill=cyan!40, draw=none, domain=2:4] {gauss(0,1)} \closedcycle;
%\addplot [fill=cyan!40, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [very thick,cyan!40!black] {gauss(0,1)};
\draw [yshift=-0.75cm, -latex](axis cs:2,0) -- node [fill=white] {$p$} (axis cs:4,0);
\end{axis}
\end{tikzpicture}
\end{center}
Note that we shaded the area to the right of the sample mean, because the claim is that the mean is \textbf{greater}.
\paragraph{Step 3:} Calculate the p value that corresponds to this area. Use the table or calculator.
\[p = P(Z > 3.27) \approx 0.0005\]
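If no table is at hand, the right-tail area can also be computed from the complementary error function, since $P(Z > z) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$. The sketch uses Python's standard \texttt{math.erfc} (\texttt{scipy.stats.norm.sf} would give the same area) and the rounded value $z = 3.27$ from Step 2.

```python
import math

def right_tail_p_value(z):
    """Area to the right of z under the standard normal curve, P(Z >= z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))

z = 3.27                   # test statistic from Step 2, rounded as in the text
p = right_tail_p_value(z)
print(round(p, 4))         # 0.0005
```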
\paragraph{Step 4:} Draw a conclusion.\\
Since $p$ is small (for instance, $p < 0.05$), we reject the null hypothesis. The data support the social scientist's claim that the mean number of years of education is greater than 12.
\end{example}
\vfill
\pagebreak
\begin{example}{Test Scores}
A pre-test and a post-test were given to workshop attendees. The mean pre-test score was 24, and the researchers want to know whether the mean post-test score is significantly different from the pre-test mean. They sampled 50 post-tests and found that the sample mean was 24.8. Assume that the population standard deviation is 1.2. Use a significance level of 0.01.
\paragraph{Step 1:} State the hypotheses.
\begin{align*}
H_0:\ &\mu = 24\\
H_1:\ &\mu \neq 24
\end{align*}
\paragraph{Step 2:} Calculate the test statistic.
\begin{align*}
z &= \dfrac{\overline{x}-\mu}{\sigma/\sqrt{n}}\\
&= \dfrac{24.8-24}{1.2/\sqrt{50}}\\
&\approx 4.71
\end{align*}
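Here is the same calculation for this example, written so that the standard error $\sigma/\sqrt{n}$ appears as its own step, which helps because $\sqrt{50}$ is not a whole number. The variable names are illustrative.

```python
import math

xbar, mu0, sigma, n = 24.8, 24, 1.2, 50    # values from the Test Scores example
standard_error = sigma / math.sqrt(n)       # sigma / sqrt(n), about 0.1697
z = (xbar - mu0) / standard_error           # Step 2: the test statistic
print(f"standard error = {standard_error:.4f}, z = {z:.2f}")
```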
\begin{center}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none, xlabel=$x$,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={-2,0,2},
ytick=\empty,
xticklabel style={align=right},
xticklabels={$z=-4.71$,$\mu=24$,{$\overline{x}=24.8$\\$z=4.71$}},
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [fill=cyan!40, draw=none, domain=2:4] {gauss(0,1)} \closedcycle;
\addplot [fill=cyan!40, draw=none, domain=-4:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [very thick,cyan!40!black] {gauss(0,1)};
\draw [yshift=-0.75cm, -latex](axis cs:2,0) -- node [fill=white] {$p/2$} (axis cs:4,0);
\draw [yshift=-0.75cm, -latex](axis cs:-2,0) -- node [fill=white] {$p/2$} (axis cs:-4,0);
\end{axis}
\end{tikzpicture}
\end{center}
Note that we shaded the area beyond the sample mean and the matching area in the opposite tail, since the claim is that the mean is different from 24 (either greater or smaller). This is called a \textbf{two-tailed test}.
\paragraph{Step 3:} Calculate the p value that corresponds to this area. Use the table or calculator. Since the test is two-tailed, remember to multiply the area in one tail by 2.
\begin{center}
\texttt{normalcdf(4.71,1000000,0,1)} $=$ \texttt{1.240035876E-6} $\approx 0.00000124$\\
Multiply this by 2: $p \approx 0.00000248$
\end{center}
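The calculator step can be mirrored in Python with the standard \texttt{math.erfc} function: the area in one tail is $\tfrac{1}{2}\operatorname{erfc}(|z|/\sqrt{2})$, and doubling it gives the two-tailed p value. This sketch uses the rounded $z = 4.71$, as entered on the calculator; the result agrees with the calculator's answer to the three significant figures used above.

```python
import math

def two_tailed_p_value(z):
    """P(|Z| >= |z|): the total area in both tails beyond the observed z score."""
    one_tail = 0.5 * math.erfc(abs(z) / math.sqrt(2))  # area in one tail, like normalcdf(|z|, 1000000, 0, 1)
    return 2 * one_tail                                  # multiply by 2 for both tails

z = 4.71                     # rounded test statistic, as entered on the calculator
p = two_tailed_p_value(z)
print(f"{p:.2e}")
```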
\paragraph{Step 4:} Draw a conclusion.\\
Since $p$ is small ($< 0.01$), we reject the null hypothesis and conclude that the mean post-test score is significantly different from the pre-test mean of 24.
\end{example}
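Both examples follow the same four steps, so they can be collected into one function. This is a sketch of a one-sample z test with known $\sigma$, not a library routine; the function name \texttt{z\_test} and its \texttt{tail} argument are our own choices. Because it does not round $z$ before finding the area, its p values can differ slightly in the last digits from the hand calculations, but the conclusions are the same.

```python
import math

def z_test(xbar, mu0, sigma, n, tail="two", alpha=0.05):
    """One-sample z test for a mean with known population standard deviation.

    tail: "right" for H1: mu > mu0, "left" for H1: mu < mu0,
          "two" for H1: mu != mu0.
    Returns (z, p, reject_h0).
    """
    z = (xbar - mu0) / (sigma / math.sqrt(n))        # Step 2: test statistic
    if tail == "right":
        p = 0.5 * math.erfc(z / math.sqrt(2))         # area to the right of z
    elif tail == "left":
        p = 0.5 * math.erfc(-z / math.sqrt(2))        # area to the left of z
    elif tail == "two":
        p = math.erfc(abs(z) / math.sqrt(2))          # both tails: 2 * P(Z >= |z|)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, p, p < alpha                            # Step 4: reject H0 if p < alpha

# Years of Education: right-tailed test, alpha = 0.05
print(z_test(12.98, 12, 3, 100, tail="right", alpha=0.05))
# Test Scores: two-tailed test, alpha = 0.01
print(z_test(24.8, 24, 1.2, 50, tail="two", alpha=0.01))
```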
\vfill
\pagebreak