\setcounter{ExampleCounter}{1}
\paragraph{Point estimate:} A single number that estimates a parameter.
\paragraph{Confidence interval:} A range of numbers that gives lower and upper bounds on what a parameter is likely to be.\\
In other words, instead of saying
\begin{center}
``I think the average width of our iPhone cases is 67 mm''
\end{center}
you could say
\begin{center}
``I am 95\% confident that the average width of our iPhone\\
cases is between 66.8 and 67.2 mm.''
\end{center}
Notice how the second statement is much more informative (if perhaps less natural): it gives not only a best guess but also how uncertain that guess is. Also,
\begin{itemize}
\item There is a confidence level, 95\% in this case. The person constructing the confidence interval typically chooses the confidence level, usually 90\% or higher.
\item The confidence interval is symmetric around the point estimate.
\item The \textbf{margin of error} is the distance from the point estimate to the edges of the confidence interval. In this case, the margin of error is 0.2 mm. This confidence interval could also be written as \[67 \pm 0.2\]
\end{itemize}
\vfill
\pagebreak
\subsection{Constructing a Confidence Interval}
\paragraph{Assumption:} We know the population standard deviation $\sigma$. Also,
\begin{itemize}
\item \textbf{Either} the population is normally distributed
\item \textbf{OR} the sample size that we use is large ($n > 30$)
\end{itemize}
\paragraph{Recall:} Central Limit Theorem
\begin{center}
The sample means are (at least approximately) normally distributed with\\ mean $\mu$ and standard deviation $\dfrac{\sigma}{\sqrt{n}}$.
\end{center}
\begin{example}{Battery Life}
A battery manufacturer claims that the lifetime of a certain type of battery has a population mean of $\mu=40$ hours and a standard deviation of $\sigma=5$ hours. Let $\overline{x}$ represent the mean lifetime of the batteries in a simple random sample of size 100.
\begin{enumerate}[(a)]
\item If the claim is true, what is $P(\overline{x} \leq 39.8)$?
\begin{center}
\texttt{normalcdf(-1000000,39.8,40,0.5)} $= 0.3446$
\end{center}
\item Based on the answer to part (a), if the claim is true, is a sample mean lifetime of 39.8 hours unusually short?
\begin{center}
Not really. About 34\% of samples would have a mean this low or lower.
\end{center}
\item If the sample mean lifetime of the 100 batteries were 39.8 hours, would you find the manufacturer's claim to be plausible?
\begin{center}
Yes, the claim still seems plausible.
\end{center}
\item If the claim is true, what is $P(\overline{x} \leq 38.5)$?
\begin{center}
\texttt{normalcdf(-1000000,38.5,40,0.5)} $= 0.0013$
\end{center}
\item Based on the answer to part (d), if the claim is true, is a sample mean lifetime of 38.5 hours unusually short?
\begin{center}
Yes. Only about 0.13\% of samples would have a mean this low or lower.
\end{center}
\item If the sample mean lifetime of the 100 batteries were 38.5 hours, would you find the manufacturer's claim to be plausible?
\begin{center}
No.
\end{center}
\end{enumerate}
\end{example}
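To double-check the calculator answers on a computer, here is a minimal sketch in Python (assuming Python 3.8 or later), where the standard library's \texttt{statistics.NormalDist} class stands in for \texttt{normalcdf}. The variable names are illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

# Claimed population values and the sample size from the example.
mu, sigma, n = 40, 5, 100

# By the Central Limit Theorem, x-bar has mean mu and SD sigma / sqrt(n) = 0.5.
sampling_dist = NormalDist(mu, sigma / sqrt(n))

# Equivalent of normalcdf(-1000000, 39.8, 40, 0.5): area to the left of 39.8.
p_398 = sampling_dist.cdf(39.8)
# Equivalent of normalcdf(-1000000, 38.5, 40, 0.5): area to the left of 38.5.
p_385 = sampling_dist.cdf(38.5)

print(round(p_398, 4))  # 0.3446
print(round(p_385, 4))  # 0.0013
```

Because \texttt{cdf} already gives the area to the left of a value, no stand-in for $-\infty$ (like $-1000000$) is needed.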
\vfill
\pagebreak
Finding a confidence interval essentially means finding all the values for the population mean that would not make our sample mean \textbf{unusual} (where here ``unusual'' depends on our confidence level).
\begin{center}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none, xlabel=$x$,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={-2,0,1,2},
ytick=\empty,
xticklabels={$z=-2$,$\mu$,{\large $\overline{x}$},$z=2$},
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [fill=cyan!40, draw=none, domain=-2:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=cyan!40, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [very thick,cyan!40!black] {gauss(0,1)};
\draw [yshift=-0.75cm, latex-latex](axis cs:-2,0) -- node [fill=white] {95\%} (axis cs:2,0);
\end{axis}
\end{tikzpicture}
\end{center}
For instance, in the sampling distribution above, we know how likely it is that the sample mean will fall into a given range. By the Empirical Rule, there is roughly a 68\% chance that the sample mean will be within one standard deviation of the population mean, roughly a 95\% chance that it will be within two standard deviations, and roughly a 99.7\% chance that it will be within three. For any other probabilities, we can consult the z table or our calculators.\\
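The Empirical Rule percentages are rounded. As a minimal sketch (assuming Python 3.8 or later and its \texttt{statistics.NormalDist} class), the exact areas within $k$ standard deviations of the mean can be computed directly; \texttt{central\_area} is a hypothetical helper name.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

def central_area(k):
    """Probability of landing within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(central_area(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

So ``two standard deviations'' actually covers about 95.45\%, slightly more than 95\%.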
Okay, let's try an example.
\begin{example}{Confidence Interval}
If you get a sample mean of 23, and you know that the sampling distribution has standard deviation \[\dfrac{\sigma}{\sqrt{n}} = 1.5,\] find the 95\% confidence interval for the population mean $\mu$.\\
The population mean is unknown, but whatever it is, our sample mean has roughly a 95\% chance of being within two standard deviations of it (two standard deviations equals $2 \times 1.5 = 3$ in this case). Turning that around, the population mean is likely to be within 3 of our sample mean, anywhere from 3 below it to 3 above it, so our confidence interval goes from $23-3$ to $23+3$:
\[23 \pm 3 = (20,26).\]
Either notation is acceptable for a confidence interval. Note that the point estimate is the sample mean, 23, and the margin of error is the standard deviation of the sampling distribution ($\sigma/\sqrt{n} = 1.5$) times the number of standard deviations that corresponds to a 95\% confidence level (here, 2).
\end{example}
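The Empirical Rule's ``two standard deviations'' is an approximation. As a minimal sketch (assuming Python 3.8 or later, standard library only), the following compares the interval above with one built from the exact multiplier, which is about 1.96.

```python
from statistics import NormalDist

xbar = 23   # point estimate (the sample mean)
se = 1.5    # standard deviation of the sampling distribution, sigma / sqrt(n)

# Empirical Rule version: go out 2 standard deviations.
rough = (xbar - 2 * se, xbar + 2 * se)

# Exact 95 percent multiplier: the 97.5th percentile of the standard normal.
z = NormalDist().inv_cdf(0.975)
exact = (xbar - z * se, xbar + z * se)

print(rough)                              # (20.0, 26.0)
print(round(z, 3))                        # 1.96
print(tuple(round(v, 2) for v in exact))  # (20.06, 25.94)
```

The two intervals differ only slightly, which is why the Empirical Rule is a reasonable first approximation.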
Finding a confidence interval, then, consists of three pieces:
\begin{enumerate}
\item Find the point estimate (the sample mean). This is pretty easy.
\vfill
\pagebreak
\item Find the standard deviation of the sampling distribution ($\sigma/\sqrt{n}$). This, too, is pretty straightforward.
\item Find the $z$ value that corresponds to the confidence level. This isn't difficult, but you have to know what you're doing. We call this value $z_{\alpha/2}$, where $\alpha = 1 - (\text{confidence level})$ is the total probability left outside the interval.
\end{enumerate}
Then the confidence interval is
\[\overline{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}.\]
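The three pieces fit into a few lines of code. This is a minimal sketch in Python 3.8 or later, assuming $\sigma$ is known and the sampling distribution is normal; \texttt{z\_interval} is a hypothetical helper name, and \texttt{NormalDist().inv\_cdf} plays the role of \texttt{invNorm}.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Return (lower, upper) for xbar +/- z_{alpha/2} * sigma / sqrt(n).

    level is the confidence level as a decimal, e.g. 0.95.
    Assumes sigma is known and x-bar is (approximately) normal.
    """
    alpha = 1 - level
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z_crit * sigma / sqrt(n)             # margin of error
    return xbar - margin, xbar + margin

# Usage: x-bar = 125, sigma = 20, n = 100, 95 percent confidence.
lo, hi = z_interval(125, 20, 100, 0.95)
print(round(lo, 2), round(hi, 2))  # 121.08 128.92
```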
\subsection{Finding $z_{\alpha/2}$}
Okay, with a 95\% confidence interval, the z value was pretty easy, because the Empirical Rule tells us that roughly 95\% of the data is within two standard deviations. (More precisely, 95\% is within about 1.96 standard deviations.) But what if we wanted a 90\% confidence interval or a 99\% confidence interval? The Empirical Rule has nothing to say about those, so we need to use the z table or our calculator.
\begin{example}{Finding z}
How many standard deviations do you need to go out to cover 90\% of the data?\\
\begin{center}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none, xlabel=$x$,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={-2,0,2},
ytick=\empty,
xticklabels={$z=?$,$\mu$,$z=?$},
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [fill=cyan!40, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
\addplot [fill=cyan!40, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [very thick,cyan!40!black] {gauss(0,1)};
\draw [yshift=-0.75cm, latex-latex](axis cs:-2,0) -- node [fill=white] {90\%} (axis cs:2,0);
\end{axis}
\end{tikzpicture}
\end{center}
If 90\% of the data lies in the middle, 10\% lies outside (5\% above the upper z value and 5\% below the lower one). Therefore, the upper z value corresponds to the 95th percentile. Here $\alpha = 0.10$ is the total area in the two tails, and each tail gets $\alpha/2 = 0.05$; this halving is why we call the value $z_{\alpha/2}$.\\
To find the 95th percentile, we can either look for 0.9500 (or as close as we can get) on the z table or we can use \texttt{invNorm(0.95,0,1)}. Either way, we find that \[z_{\alpha/2} = 1.645.\]
Thus, any time we construct a 90\% confidence interval, we'll use 1.645 as the z value. You could memorize this, but don't bother. It's much more important to understand how we found it.
\end{example}
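The percentile reasoning above can be written as a short function. This is a minimal sketch assuming Python 3.8 or later; \texttt{z\_alpha\_over\_2} is a hypothetical name, and \texttt{NormalDist().inv\_cdf} does the job of \texttt{invNorm(p,0,1)}.

```python
from statistics import NormalDist

def z_alpha_over_2(level):
    """Critical value z_{alpha/2} for a confidence level given as a decimal."""
    alpha = 1 - level                  # total area in the two tails
    upper_percentile = 1 - alpha / 2   # e.g. 0.95 when level is 0.90
    return NormalDist().inv_cdf(upper_percentile)  # same role as invNorm(p, 0, 1)

print(round(z_alpha_over_2(0.90), 3))  # 1.645
```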
\vfill
\pagebreak
\begin{example}{Finding z}
Find $z_{\alpha/2}$ for a 98\% confidence interval.\\
\begin{center}
\begin{tikzpicture}
\begin{axis}[
no markers, domain=-4:4, samples=100,
axis lines*=none, xlabel=$x$,
hide y axis,
every axis y label/.style={at=(current axis.above origin),anchor=south},
every axis x label/.style={at=(current axis.right of origin),anchor=west},
height=5cm, width=12cm,
xtick={-2,0,2},
ytick=\empty,
xticklabels={$z=?$,$\mu$,$z=?$},
enlargelimits=false, clip=false, %axis on top,
%grid = major
]
\addplot [fill=cyan!40, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
\addplot [fill=cyan!40, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=-2:-1] {gauss(0,1)} \closedcycle;
%\addplot [fill=yellow!20, draw=none, domain=1:2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=-3:-2] {gauss(0,1)} \closedcycle;
%\addplot [fill=green!20, draw=none, domain=2:3] {gauss(0,1)} \closedcycle;
\addplot [very thick,cyan!40!black] {gauss(0,1)};
\draw [yshift=-0.75cm, latex-latex](axis cs:-2,0) -- node [fill=white] {98\%} (axis cs:2,0);
\end{axis}
\end{tikzpicture}
\end{center}
If 98\% of the data lies in the middle, 2\% lies outside (1\% in each tail). Therefore, the upper z value corresponds to the 99th percentile.
\begin{center}
\texttt{invNorm(0.99,0,1)} $= 2.326$
\end{center}
\end{example}
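One way to convince yourself that 2.326 is right is to check that the area between $-2.326$ and $2.326$ really is 98\%. A minimal sketch, assuming Python 3.8 or later and its \texttt{statistics.NormalDist} class:

```python
from statistics import NormalDist

std_normal = NormalDist()

# invNorm(0.99, 0, 1): the 99th percentile of the standard normal.
z = std_normal.inv_cdf(0.99)

# Check: the area between -z and z should be 0.98 (1 percent in each tail).
middle = std_normal.cdf(z) - std_normal.cdf(-z)

print(round(z, 3))       # 2.326
print(round(middle, 4))  # 0.98
```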
\vfill
\pagebreak
\subsection{Full Examples}
\begin{example}{Cereal Box Weight}
A machine that fills cereal boxes is supposed to put 20 ounces of cereal in each box. A simple random sample of 6 boxes is found to contain a sample mean of 20.25 ounces of cereal. It is known from past experience that fill weights are normally distributed with a standard deviation of 0.2 ounces. Construct a 92\% confidence interval for the mean fill weight.\\
Remember, the formula for the confidence interval is \[\overline{x} \pm z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}.\]
In this case,
\begin{align*}
\overline{x} &= 20.25\\
\sigma &= 0.2\\
n &= 6
\end{align*}
so the only thing left to find is $z_{\alpha/2}$. For a confidence level of 92\%, we'll look for the 96th percentile (half of eight percent is four percent).
\begin{center}
\texttt{invNorm(0.96,0,1)} $= 1.751$
\end{center}
Therefore the confidence interval is
\begin{align*}
20.25 &\pm (1.751) \cdot \left(\dfrac{0.2}{\sqrt{6}}\right)\\
&\approx 20.25 \pm 0.14 = (20.11,20.39)
\end{align*}
\end{example}
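A common slip in this calculation is dividing $\sigma$ by $n$ instead of $\sqrt{n}$. This minimal sketch (assuming Python 3.8 or later, standard library only) recomputes the cereal interval step by step.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 20.25, 0.2, 6  # values from the example
level = 0.92

z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # 96th percentile
se = sigma / sqrt(n)                                # divide by sqrt(6), not 6
margin = z_crit * se

print(round(z_crit, 3))                                  # 1.751
print(round(margin, 3))                                  # 0.143
print(round(xbar - margin, 2), round(xbar + margin, 2))  # 20.11 20.39
```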
\begin{example}{SAT Scores}
A college admissions officer takes a simple random sample of 100 entering freshmen and computes their mean mathematics SAT score to be 458. Assume the population standard deviation is $\sigma=116$. Construct a 99\% confidence interval for the population mean score.
\begin{align*}
\overline{x} &= 458\\
\sigma &= 116\\
n &= 100
\end{align*}
For a 99\% CI, use
\begin{center}
$z_{\alpha/2} = $ \texttt{invNorm(0.995,0,1)} $= 2.576$
\end{center}
Therefore, the confidence interval is
\[458 \pm (2.576) \cdot \dfrac{116}{\sqrt{100}} = 458 \pm 29.88 \approx (428, 488)\]
\end{example}
\vfill
\pagebreak
\begin{example}{Baby Weight}
According to the National Health Statistics Reports, a sample of 360 one-year-old baby boys in the US had a mean weight of 25.5 pounds. Assume the population standard deviation is $\sigma=5.3$ pounds. Construct a 94\% confidence interval for the population mean weight.\\
\begin{align*}
\overline{x} &= 25.5\\
\sigma &= 5.3\\
n &= 360
\end{align*}
For a 94\% CI, use
\begin{center}
$z_{\alpha/2} = $ \texttt{invNorm(0.97,0,1)} $= 1.881$
\end{center}
Therefore, the confidence interval is
\[25.5 \pm (1.881) \cdot \dfrac{5.3}{\sqrt{360}} = 25.5 \pm 0.525 = (24.975, 26.025)\]
\end{example}
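The SAT and baby weight intervals follow the same recipe. As a minimal sketch (assuming Python 3.8 or later; \texttt{z\_interval} is a hypothetical helper, not a calculator function), both can be reproduced like this:

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, level):
    """Hypothetical helper: margin and interval for xbar +/- z * sigma / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * sigma / sqrt(n)
    return margin, (xbar - margin, xbar + margin)

sat_margin, sat_ci = z_interval(458, 116, 100, 0.99)
baby_margin, baby_ci = z_interval(25.5, 5.3, 360, 0.94)

print(round(sat_margin, 2), [round(v) for v in sat_ci])       # 29.88 [428, 488]
print(round(baby_margin, 3), [round(v, 3) for v in baby_ci])  # 0.525 [24.975, 26.025]
```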
\vfill
\pagebreak
\subsection{Changing the Confidence Level}
What does changing the confidence level do?
\begin{example}{Component Lifetimes}
In a simple random sample of 100 electronic components produced by a certain method, the mean lifetime was 125 hours. Assume that component lifetimes are normally distributed with population standard deviation $\sigma=20$ hours. Construct 90\%, 95\%, and 98\% confidence intervals.\\
\begin{align*}
\overline{x} &= 125\\
\sigma &= 20\\
n &= 100
\end{align*}
\paragraph{90\% CI:} $z_{\alpha/2} = $ \texttt{invNorm(0.95,0,1)} $= 1.645$.\\
The CI is \[125 \pm 3.29 = (121.71, 128.29).\]
\paragraph{95\% CI:} $z_{\alpha/2} = $ \texttt{invNorm(0.975,0,1)} $= 1.96$.\\
The CI is \[125 \pm 3.92 = (121.08, 128.92).\]
\paragraph{98\% CI:} $z_{\alpha/2} = $ \texttt{invNorm(0.99,0,1)} $= 2.326$.\\
The CI is \[125 \pm 4.65 = (120.35, 129.65).\]
\end{example}
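To see the effect of the confidence level all at once, this minimal sketch (assuming Python 3.8 or later, standard library only) loops over the three levels with the same data and records each interval's width.

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n = 125, 20, 100
se = sigma / sqrt(n)  # 2.0

widths = {}
for level in (0.90, 0.95, 0.98):
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z_crit * se
    widths[level] = 2 * margin  # full width of the interval
    print(level, round(xbar - margin, 2), round(xbar + margin, 2))
# 0.9 121.71 128.29
# 0.95 121.08 128.92
# 0.98 120.35 129.65
```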
\paragraph{Note:} As the confidence level increases, the confidence interval gets wider (while staying centered at the same point estimate), because a higher confidence level requires a larger $z_{\alpha/2}$.
\vfill
\pagebreak
\subsection{Calculating the Sample Size}
Suppose we want a given margin of error: what sample size do we need in order to make that happen?
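\paragraph{Note:} A larger sample size leads to a smaller margin of error, since $n$ appears under the square root in the denominator of $z_{\alpha/2} \cdot \sigma/\sqrt{n}$.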
\begin{example}{College Student Age}
The population standard deviation for the age of students at a college is 15 years. If we want to be 95\% confident that the sample mean age is within two years of the true population mean age of these students, how many randomly selected students must be surveyed?\\
Remember the margin of error is
\[z_{\alpha/2} \cdot \dfrac{\sigma}{\sqrt{n}}.\]
We want this to equal 2, and we know that
\begin{align*}
z_{\alpha/2} &= 1.96\\
\sigma &= 15
\end{align*}
so the only thing left is $n$, which is what we want to know.
\begin{align*}
2 &= (1.96) \cdot \dfrac{15}{\sqrt{n}}\\
\dfrac{2}{1.96} &= \dfrac{15}{\sqrt{n}}\\
\dfrac{2}{1.96} \cdot \sqrt{n} &= 15\\
\sqrt{n} &= 15 \cdot \dfrac{1.96}{2}\\
n &= \left(15 \cdot \dfrac{1.96}{2}\right)^2\\
n &= 216.09
\end{align*}
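Since $n$ must be a whole number, and rounding down to 216 would make the margin of error slightly larger than 2, we always round up: we need to sample at least 217 students.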
\end{example}
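Writing $E$ for the desired margin of error, the algebra above gives $n = \left(z_{\alpha/2} \cdot \sigma / E\right)^2$, rounded up. A minimal sketch in Python (assuming Python 3 and only the standard library; \texttt{required\_n} is a hypothetical name):

```python
from math import ceil

def required_n(z_crit, sigma, margin):
    """Exact n from the formula, and the smallest whole n that achieves the margin."""
    n_exact = (z_crit * sigma / margin) ** 2
    return n_exact, ceil(n_exact)  # always round up

n_exact, n_needed = required_n(1.96, 15, 2)
print(round(n_exact, 2), n_needed)  # 216.09 217
```

Rounding up with \texttt{ceil} (rather than ordinary rounding) guarantees the margin of error is no larger than the target.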
\vfill
\pagebreak
\subsection{Using Your Calculator}
There is also a built-in function on your calculator that can find confidence intervals for problems like these.
\begin{enumerate}
\item Press \includegraphics[height=0.3in]{CalcStatButton} and scroll over to the \texttt{TESTS} menu.
\begin{center}
\includegraphics[width=3in]{CalcStatTests}
\end{center}
\item Select \texttt{7:ZInterval} and you'll have two options: using \texttt{Data} or \texttt{Stats}.
\begin{center}
\begin{tabular}{c c}
\includegraphics[width=2in]{CalcZIntervalData}
&
\includegraphics[width=2in]{CalcZIntervalStats}
\end{tabular}
\end{center}
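\item In either case, enter the population standard deviation as $\sigma$ and the confidence level as a decimal (\texttt{C-Level}). With \texttt{Data}, you also tell the calculator which list holds your data (\texttt{List}, with \texttt{Freq} usually left at 1); with \texttt{Stats}, you enter the sample mean $\overline{x}$ and the sample size $n$ yourself.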
\end{enumerate}
\vfill
\pagebreak
\begin{example}{BlackBerry Prices}
A random sample of 11 BlackBerry Bold 9000 smartphones being sold over the Internet in 2010 had the following prices, in dollars:
\begin{center}
\begin{tabular}{c c c c c c}
230 & 484 & 379 & 300 & 239 & 350\\
300 & 395 & 230 & 410 & 460
\end{tabular}
\end{center}
Assume the population standard deviation is $\sigma=71$. Calculate a 95\% confidence interval for the population mean price.\\
After entering the data into a list (for example, \texttt{L1}), press \includegraphics[height=0.3in]{CalcStatButton}, scroll over to the \texttt{TESTS} menu, select \texttt{7:ZInterval}, and choose the \texttt{Data} option. Enter $\sigma=71$, check that \texttt{List} points to your data, leave \texttt{C-Level} as 0.95, and press \texttt{Calculate}. You'll see the following:
\begin{center}
\includegraphics[width=3in]{CalcZIntervalEx}
\end{center}
The confidence interval, then, is \[(301.41, 385.32).\]
\end{example}
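For comparison, here is a minimal sketch of what \texttt{ZInterval} with the \texttt{Data} option computes, written in Python (assuming Python 3.8 or later for \texttt{statistics.fmean}). It is a reimplementation of the formula from this section, not the calculator's own code.

```python
from math import sqrt
from statistics import NormalDist, fmean

prices = [230, 484, 379, 300, 239, 350, 300, 395, 230, 410, 460]
sigma = 71   # population SD, assumed known as in the example
level = 0.95

xbar = fmean(prices)  # point estimate
z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
margin = z_crit * sigma / sqrt(len(prices))

print(round(xbar, 2))                                    # 343.36
print(round(xbar - margin, 2), round(xbar + margin, 2))  # 301.41 385.32
```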
\vfill
\pagebreak