\setcounter{ExampleCounter}{1}
Good sampling is one of the most important parts of a statistical study. Remember, the key is that we want the sample to be \textbf{representative} of the population.
\begin{proc}{Random Sampling}
To get a representative sample, we select our sample randomly.
\paragraph{Random Sampling:} each member of the population is equally likely to be chosen.
\end{proc}
\begin{enumerate}
\item \textbf{Simple Random Sampling:} \marginnote{\includegraphics[width=1.5in]{CalcRandInt}\\ On your graphing calculator, press the \includegraphics[height=0.3in]{CalcMathButton} button, scroll over to the PRB menu, and select \includegraphics[height=0.1in]{RandInt1} to access the random number generator. If you enter three numbers, separated by commas, as shown, the calculator will return 6 random integers between 1 and 24; the same number may appear more than once. If you just enter two numbers, the calculator will return one random integer between those bounds.}number every member of the population and use a random number generator to pick randomly from the whole group.
\vfill
\item \textbf{Stratified Sampling:} split the population into strata, or categories, then randomly select a few members of each category.
\vfill
\item \textbf{Cluster Sampling:} split the population into groups, but this time, randomly select one or more whole groups and include every member of each selected group.
\vfill
\item \textbf{Systematic Sampling:} randomly pick a starting point and select every $n$th member.
\vfill
\item \textbf{Convenience Sampling:} (not random)\marginnote{ex: concrete blocks} pick whichever members of the population are easiest to reach.
\end{enumerate}
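The five methods above can also be sketched in code. The following Python example is only an illustration, not part of the calculator procedure: the function names, the population numbered 1 through 24, and the four groups of six are all assumptions made for the sketch. The systematic version assumes the random starting point is chosen among the first $n$ members.

```python
import random


def simple_random_sample(population, k, rng):
    """Pick k distinct members, each equally likely, from the whole population."""
    return rng.sample(population, k)


def stratified_sample(strata, per_stratum, rng):
    """Pick per_stratum members at random from every stratum."""
    sample = []
    for members in strata:
        sample.extend(rng.sample(members, per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Pick n_clusters whole groups at random and keep every member of each."""
    chosen = rng.sample(clusters, n_clusters)
    return [member for cluster in chosen for member in cluster]


def systematic_sample(population, step, rng):
    """Pick a random start among the first `step` members, then every step-th member."""
    start = rng.randrange(step)
    return population[start::step]


def convenience_sample(population, k):
    """Not random: take the first k members simply because they are easy to reach."""
    return population[:k]


# Hypothetical population: members numbered 1 through 24, in four groups of six.
population = list(range(1, 25))
groups = [population[i:i + 6] for i in range(0, 24, 6)]
rng = random.Random()  # unseeded, so every run produces different samples

print("simple random:", simple_random_sample(population, 6, rng))
print("stratified:   ", stratified_sample(groups, 2, rng))
print("cluster:      ", cluster_sample(groups, 2, rng))
print("systematic:   ", systematic_sample(population, 4, rng))
print("convenience:  ", convenience_sample(population, 6))
```

Unlike the calculator's random integer command, \texttt{rng.sample} never returns the same member twice, so no redraws are needed.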
\pagebreak
\begin{example}{Quiz Score Samples}
Use the random number generator on your calculator to generate different types of samples from the data below. Find the average score for each sample and compare your results with your classmates.\\
This table displays six sets of quiz scores (out of 10 points) for an elementary statistics class.
\begin{center}
\begin{tabular}{l l l l l l}
\# 1 & \# 2 & \# 3 & \# 4 & \# 5 & \# 6\\
\hline
5 & 7 & 10 & 9 & 8 & 3\\
10 & 5 & 9 & 8 & 7 & 6\\
9 & 10 & 8 & 6 & 7 & 9\\
9 & 10 & 10 & 9 & 8 & 9\\
7 & 8 & 9 & 5 & 7 & 4\\
9 & 9 & 9 & 10 & 8 & 7\\
7 & 7 & 10 & 9 & 8 & 8\\
8 & 8 & 9 & 10 & 8 & 8\\
9 & 7 & 8 & 7 & 7 & 8\\
8 & 8 & 10 & 9 & 8 & 7\\
\end{tabular}
\end{center}
\begin{enumerate}
\item Create a stratified sample of 12 quiz scores, using the columns as the strata.\\
Data: \hspace{2.5in} Average Score:\\
\item Create a cluster sample by picking two of the rows.\\
Data: \hspace{2.5in} Average Score:\\
\item Create a simple random sample of 12 quiz scores.\\
Data: \hspace{2.5in} Average Score:\\
\item Create a systematic sample of 12 quiz scores.\\
Data: \hspace{2.5in} Average Score:\\
\item Create a convenience sample of 12 quiz scores.\\
Data: \hspace{2.5in} Average Score:\\
\end{enumerate}
\end{example}
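One way to check answers to the first two parts of the example above is to draw the samples by computer. The Python sketch below is an illustration only; the variable and function names are assumptions, and the scores are copied from the table. It draws a stratified sample (two scores from each column) and a cluster sample (two whole rows), then compares each sample average with the average of all 60 scores.

```python
import random
from statistics import mean

# Quiz scores from the table: one inner list per row, one entry per quiz.
ROWS = [
    [5, 7, 10, 9, 8, 3],
    [10, 5, 9, 8, 7, 6],
    [9, 10, 8, 6, 7, 9],
    [9, 10, 10, 9, 8, 9],
    [7, 8, 9, 5, 7, 4],
    [9, 9, 9, 10, 8, 7],
    [7, 7, 10, 9, 8, 8],
    [8, 8, 9, 10, 8, 8],
    [9, 7, 8, 7, 7, 8],
    [8, 8, 10, 9, 8, 7],
]
COLUMNS = [list(column) for column in zip(*ROWS)]  # the six quizzes (strata)
ALL_SCORES = [score for row in ROWS for score in row]


def stratified_scores(rng, per_column=2):
    """Two random scores from each of the six columns gives 12 scores."""
    scores = []
    for column in COLUMNS:
        scores.extend(rng.sample(column, per_column))
    return scores


def cluster_scores(rng, n_rows=2):
    """Two whole rows chosen at random gives 12 scores."""
    chosen_rows = rng.sample(ROWS, n_rows)
    return [score for row in chosen_rows for score in row]


rng = random.Random()  # unseeded, so each run gives a different sample
print("all 60 scores, average:", round(mean(ALL_SCORES), 2))
for name, sample in [("stratified", stratified_scores(rng)),
                     ("cluster", cluster_scores(rng))]:
    print(name, sample, "average:", round(mean(sample), 2))
```

Running the script several times plays the role of comparing results with classmates: the sample averages change from run to run, while the average of all 60 scores stays the same.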
\pagebreak
\begin{example}{Sampling Methods}
Determine the type of sampling used in each of the following scenarios.
\begin{enumerate}
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{stratified}}soccer coach selects six players from a group of boys aged 8 to 10, seven players from a group of boys aged 11 to 12, and three players from a group of boys aged 13 to 14 to form a recreational soccer team.\\
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{cluster}}pollster interviews all human resource personnel in five different high tech companies.\\
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{stratified}}high school educational counselor interviews 50 female teachers and 50 male teachers.\\
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{systematic}}medical researcher interviews every third cancer patient from a list of cancer patients at a local hospital.\\
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{simple random}}high school counselor uses a computer to generate 50 random numbers and then picks students whose names correspond to the numbers.\\
\item A \marginnote{Sampling Method: \hfill \text{}\\
\textbf{convenience}}student interviews classmates in his algebra class to determine how many pairs of jeans a student at his school owns, on average.\\
\end{enumerate}
\end{example}
\vfill
\begin{example}{Representative Samples}
Decide whether each of the following sampling methods is likely to produce a representative sample.
\begin{enumerate}
\item To \marginnote{\bfseries Not representative}find the average annual income of all adults in the United States, sample representatives in the US Congress.\\
\item To \marginnote{\bfseries Representative}find out the most popular cereal among children under the age of 10, stand outside a large supermarket one day and poll every twentieth child under the age of 10 who enters the supermarket.
\end{enumerate}
\end{example}
\pagebreak
\subsection{Sample Size}
If two people take samples from the same group, their samples will almost certainly be different. This doesn't mean that one is right and one is wrong, though. There is simply some natural variability in samples.
\paragraph{Bigger Samples are \emph{Often} Better} One way to reduce this natural variability is to take larger samples, in which any single unusual value has less influence on the result.
For example, when estimating average height, one NBA center shifts the average of a sample of 10 people far more than the average of a sample of 1000 people.
\begin{itemize}
\item For national polls, somewhere between 1000 and 2000 people is usually considered a big enough sample.
\end{itemize}
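To put rough numbers on the height example, suppose (purely as an illustration; these heights are assumed, not measured) that everyone in the sample is 68 inches tall except for one 84-inch NBA center. The two sample averages are then
\[
\bar{x}_{10} = \frac{9(68) + 84}{10} = 69.6 \text{ in.}
\qquad \text{and} \qquad
\bar{x}_{1000} = \frac{999(68) + 84}{1000} \approx 68.02 \text{ in.}
\]
In the sample of 10, the one unusual person raises the average by 1.6 inches; in the sample of 1000, by less than 0.02 inches.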
\paragraph{Be Careful:} Just having a big sample doesn't guarantee good results.
In general, self-selected samples (or volunteer samples) are not representative of the population. For this reason, surveys with voluntary responses are not reliable. People who volunteer their opinion for online reviews, for instance, tend to be strongly positive or negative; the voluntary sample misses everyone in the middle who doesn't have a strong opinion.
The most famous example of this comes from the 1936 presidential election, in which the incumbent Democrat, Franklin D. Roosevelt, was challenged by the Republican governor of Kansas, Alf Landon. The \textit{Literary Digest}, a weekly magazine, boasted that it had correctly predicted the results of the last 4 elections by sending out questionnaires to its huge sample of readers. In 1936, the \textit{Digest} sent out 10 million questionnaires and received over 2 million responses; based on these, it predicted that Landon would unseat Roosevelt with a comfortable victory. When Election Day came, though, Roosevelt received over 60\% of the popular vote and carried every state except Maine and Vermont, winning even Landon's home state of Kansas. It was one of the most lopsided victories in U.S. history. The poll failed largely because it relied on voluntary responses: those who responded were more likely to be unhappy with the current administration, while people who were happy with Roosevelt's programs had little incentive to fill out the questionnaire and send it in.
Largely due to this failure and embarrassment, the \textit{Literary Digest} folded within a few years. In contrast, a young pollster named George Gallup (whose name is borne by the Gallup polls today) made his name in the 1936 election by correctly predicting the winner with a much smaller, carefully chosen sample.
\vfill
\pagebreak
\subsection{Other Considerations}
Besides a small or biased sample, other problems can throw off a statistical study, such as:
\begin{itemize}
\item Self-selected samples or voluntary response surveys (as in the case of the \emph{Literary Digest} debacle)
\item Non-response (similar to voluntary response; strong negative opinions get expressed more than others)
\item Self-interest bias (studies sponsored by companies with a vested interest)
\item Social acceptability (on sensitive topics such as drug use or pirated music, respondents may not answer honestly)
\item Leading questions (``How badly do you think this Congress has done?'')
\item Causality errors: assuming that a relationship between two variables means that one causes the other (related: confounding, when the effects of multiple factors cannot be separated)
\end{itemize}
\vfill
\pagebreak