\paragraph{Two Goals:}
\begin{enumerate}
\item To organize the data in such a way that it makes sense.
\item To set it up so that someone with a question could come along later and find an answer.
\end{enumerate}
In other words, you want to display the data clearly enough that you can explain it to someone who has never seen it before, and you also want someone studying a particular topic (like car ownership in the U.S.) to be able to ``ask'' the data a question and get an answer.
These two goals are related to the two sides of statistics: \textbf{descriptive statistics} and \textbf{inferential statistics}.\\
\begin{proc}{Descriptive vs. Inferential Statistics}
\paragraph{Descriptive Statistics:} Organizing and summarizing data.
\hspace{3ex} There are many ways to do this, including the use of graphs and charts, but the goal is always the same: to give the reader a clear, concise idea of what the data looks like without having to show them something like the whole table above.\\
\paragraph{Inferential Statistics:} Drawing conclusions from the data.
\hspace{3ex} For instance, we might take a poll to compare two political candidates, and we need to know whether the results we get are valid or whether they are a fluke.
\end{proc}
\subsection{Definitions}
\paragraph{Population:} The \marginnote{ex: entire US}group that we are interested in.
\paragraph{Sample:} The \marginnote{ex: randomly pick 100 households}group that we can feasibly study. The census mentioned above is a rare example in which the entire population is studied (this is incredibly expensive and time-consuming). More often, a reasonably sized sample is selected and studied.
If we get a \textbf{representative sample}, we assume that the population looks similar enough to the sample that by studying the sample, we can get a good idea of what the population looks like (if you want to know how a pot of soup tastes, you only have to take one sip).
\paragraph{Parameter:} A \marginnote{ex: average salary of every US worker}number that describes something about the \textbf{population}.
\paragraph{Statistic:} A \marginnote{ex: average salary of every worker in our sample}number that describes something about the \textbf{sample}.
Every parameter has a corresponding statistic; since we're assuming that we can't study the entire population, we get the statistic from the sample, and we assume that the statistic is a good estimate for the parameter.\\
In general, when we're working with the sample itself, we're doing descriptive statistics (\textbf{describing} the sample), and when we're using the sample to draw conclusions (another word for \textbf{inferences}) about the population, we're doing inferential statistics.
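\paragraph{Worked illustration (hypothetical numbers):} To see the difference between a statistic and a parameter in symbols, suppose a sample of $n = 5$ workers reports annual salaries of 40, 52, 38, 61, and 49 thousand dollars. These figures, and the conventional symbols $\bar{x}$ (sample mean) and $\mu$ (population mean), are assumptions made only for this sketch. The statistic is
\[
\bar{x} = \frac{40 + 52 + 38 + 61 + 49}{5} = \frac{240}{5} = 48,
\]
that is, \$48{,}000. Computing $\bar{x}$ is descriptive statistics. The parameter $\mu$, the average salary of every worker in the population, stays unknown; using $\bar{x} = 48$ as an estimate of $\mu$ is the inferential step.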
\subsection{Variables}
\paragraph{Variable:} Something \marginnote{ex: salary}that we record about our sample. After we record it (collecting data like in the table at the beginning of the chapter), we can start to describe it, for example by taking the average or drawing a graph.
\paragraph{Numerical (or quantitative) Variables:} Variables \marginnote{ex: number of cars in a household or age of household members}that we find by measuring or counting.
\paragraph{Discrete Quantitative Variables:} Numerical variables that come from counting. They are limited to specific values.
For instance, ``number of children'' is a discrete variable, because one cannot have 3.14 children; the answer will always be 0, 1, 2, etc.
\paragraph{Continuous Quantitative Variables:} Numerical variables that come from measuring. They can be any number in a valid range.
For instance, ``height'' is a continuous variable, because one's height can be any value (within a reasonable range), provided that we can measure as precisely as we want.
\paragraph{Categorical (or qualitative) Variables:} Variables \marginnote{ex: sex or political affiliation}that divide people or things into categories.
Note that categorical variables can also be numerical; think of your student ID number. Your ID number categorizes you; it doesn't measure or count something about you. You wouldn't think about taking the average ID number of students, because that would be a meaningless result.
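\paragraph{Worked illustration (hypothetical numbers):} As a quick check of the ID-number point, suppose three students have the made-up ID numbers 1001, 1002, and 1009. The arithmetic can certainly be carried out:
\[
\frac{1001 + 1002 + 1009}{3} = \frac{3012}{3} = 1004,
\]
but 1004 describes no student and no property of the group. With a categorical variable, the meaningful summary is how many people or things fall into each category, not an average of the labels.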
\begin{proc}{Summary of Key Terms}
\begin{enumerate}
\item \textbf{Population:} The group that we are interested in.
\item \textbf{Sample:} The subset of the population that we can feasibly study.
\item \textbf{Parameter:} A number that describes something about a population variable.
\item \textbf{Statistic:} A number that describes something about a sample variable.
\item \textbf{Variable:} Something that we record about our sample.
\item \textbf{Quantitative variable:} A numerical variable that we find by counting (discrete) or measuring (continuous).
\item \textbf{Qualitative variable:} A variable that divides people or things into categories.
\end{enumerate}
\end{proc}
\vspace{0.5in}
\begin{example}{Using These Key Terms}
\marginnote{\includegraphics[width=1.5in]{Graduate2}}
A study was conducted at a local college to analyze the average cumulative GPAs of students who graduated last year. Fill in the letter of the phrase that best describes each of the items below.
\begin{enumerate}
\item \line(1,0){20}\hspace{0.05in} Population
\item \line(1,0){20}\hspace{0.05in} Statistic
\item \line(1,0){20}\hspace{0.05in} Parameter
\item \line(1,0){20}\hspace{0.05in} Sample
\item \line(1,0){20}\hspace{0.05in} Variable
\end{enumerate}
\begin{enumerate}[(a)]
\item all students who attended the college last year
\item the cumulative GPA of one student who graduated from the college last year
\item a group of students who graduated from the college last year, randomly selected
\item the average cumulative GPA of students who graduated from the college last year
\item all students who graduated from the college last year
\item the average cumulative GPA of students in the study who graduated from the college last year
\end{enumerate}
1. \sol e; 2. f; 3. d; 4. c; 5. b
\end{example}