Inferential Statistics: Drawing Insights Beyond the Data
Inferential statistics bridges the gap between data and decision-making. Unlike descriptive statistics, which summarizes datasets, inferential methods allow us to draw conclusions about populations from samples. These methods form the backbone of modern data science and analytical research across academic disciplines and industry sectors. From estimating unknown parameters to testing hypotheses and making forecasts, inferential statistics supports evidence-based reasoning that is fundamental to real-world problem-solving.
Students beginning with foundational mathematics can deepen their understanding through inferential concepts such as confidence intervals, p-values, and regression analysis. These ideas are widely applicable in areas like actuarial science, particularly in evaluating premiums and claims in life insurance and planning sustainable strategies in pension systems. Inferential tools also play a major role in actuarial risk modeling and the evaluation of asset performance in investment and portfolio management.
The quantitative techniques of applied mathematics benefit significantly from inferential methods. In computational mathematics and engineering mathematics, statistical inference aids in model calibration and simulation validation. It supports real-time predictions in mathematical physics and drives optimization in operations research, especially when decision-making is based on partial or noisy information.
Even the abstract dimensions of pure mathematics connect with inferential logic. Students grounded in algebra, calculus, and mathematical analysis gain insights into the assumptions behind statistical tests. Inferential reasoning complements geometric intuition from geometry and abstract structure from number theory and topology, reinforcing the logical rigor needed to assess statistical claims.
Inferential statistics is equally critical in scientific and engineering contexts. In physical technologies, researchers rely on statistical inference to interpret experimental outcomes and predict behavior under new conditions. Within aerospace and aeronautical engineering, inference guides risk assessments, sensor data interpretation, and failure predictions. Specialties such as aero control systems and robotics and aero materials science use inference-based testing to validate algorithms and materials performance. A strong grounding in statistics and its inferential branch equips students with the power to make justified conclusions and informed predictions in an uncertain world.
Definition of Inferential Statistics
Inferential statistics involves the use of statistical methods to infer properties of an entire population based on sample data. These methods rely on probability theory to quantify uncertainty and provide confidence in conclusions.
Key Techniques in Inferential Statistics
Hypothesis Testing
- Definition: A method to evaluate claims or assumptions about a population by testing them against sample data.
- Key Features:
- Null Hypothesis (H₀): The default claim of no effect or no difference, assumed true until the data provide evidence against it.
- Alternative Hypothesis (H₁): The competing claim that an effect or difference exists.
- P-Value: Determines the likelihood of observing the sample data (or something more extreme) if H₀ is true.
- Common Tests:
- T-Test: Compares means between two groups.
- Chi-Square Test: Evaluates associations between categorical variables.
- ANOVA (Analysis of Variance): Compares means among multiple groups.
- Applications:
- Testing the effectiveness of a new drug compared to a placebo.
- Assessing customer satisfaction across different regions.
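To make the t-test listed above concrete, here is a minimal Python sketch using SciPy; the placebo and treatment scores are made-up illustrative values, not data from the text.
```python
# Minimal two-sample t-test sketch (illustrative data, not from a real study).
from scipy import stats

placebo   = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9, 12.7, 12.4]
treatment = [11.2, 10.9, 11.5, 11.8, 11.1, 10.7, 11.4, 11.6]

# Welch's t-test (does not assume equal variances).
t_stat, p_value = stats.ttest_ind(treatment, placebo, equal_var=False)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")

# Reject the null hypothesis of equal means if p < 0.05.
if p_value < 0.05:
    print("Difference in group means is significant at the 5% level.")
else:
    print("No statistically significant difference detected.")
```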
Confidence Intervals
- Definition: Confidence intervals provide a range of values within which a population parameter is likely to lie, with a specified level of certainty (e.g., 95%).
- Key Features:
- Indicates the reliability of an estimate.
- Larger sample sizes result in narrower confidence intervals.
- Applications:
- Estimating the average height of a population.
- Determining the proportion of voters supporting a candidate.
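A short sketch of how a 95% confidence interval for a mean can be computed with SciPy's t-distribution; the height measurements are illustrative placeholders.
```python
# 95% confidence interval for a population mean from a small sample (illustrative data).
import numpy as np
from scipy import stats

heights = np.array([172, 168, 175, 170, 169, 174, 171, 173, 167, 176])  # cm

mean = heights.mean()
sem = stats.sem(heights)  # standard error of the mean
ci_low, ci_high = stats.t.interval(0.95, len(heights) - 1, loc=mean, scale=sem)

print(f"mean = {mean:.1f} cm, 95% CI = ({ci_low:.1f}, {ci_high:.1f}) cm")
```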
Regression Analysis
- Definition: A statistical method for modeling and analyzing the relationships between dependent and independent variables.
- Key Features:
- Linear Regression: Models the relationship between two variables.
- Multiple Regression: Explores relationships between a dependent variable and multiple predictors.
- Logistic Regression: Used for binary outcomes (e.g., yes/no, success/failure).
- Applications:
- Predicting sales based on advertising spend.
- Modeling disease risk factors in epidemiology.
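The following sketch fits a simple linear regression of sales on advertising spend with scipy.stats.linregress; all figures are invented for illustration.
```python
# Simple linear regression: sales vs. advertising spend (illustrative numbers).
from scipy import stats

ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # e.g. thousands of dollars
sales    = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2]  # e.g. thousands of units

result = stats.linregress(ad_spend, sales)
print(f"slope = {result.slope:.2f}, intercept = {result.intercept:.2f}")
print(f"R^2 = {result.rvalue**2:.3f}, p-value for slope = {result.pvalue:.4f}")

# Predict sales at a new spend level using the fitted line.
new_spend = 7.0
print(f"predicted sales at {new_spend}: {result.intercept + result.slope * new_spend:.2f}")
```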
Sub-Areas of Study in Inferential Statistics
Sampling Theory
- Overview: Focuses on selecting representative subsets (samples) from populations to ensure accurate inferences.
- Key Concepts:
- Random sampling, stratified sampling, and systematic sampling.
- Sampling error and its impact on statistical conclusions.
- Applications:
- Designing surveys to understand consumer preferences.
- Conducting political polls to predict election outcomes.
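As a rough illustration of the sampling designs listed above, the sketch below draws a simple random sample and a stratified sample from a hypothetical population frame built with NumPy and pandas; the region labels and group sizes are assumptions.
```python
# Simple random sampling vs. stratified sampling (hypothetical population frame).
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=42)
population = pd.DataFrame({
    "region": np.repeat(["North", "South", "East", "West"], [400, 300, 200, 100]),
    "score": rng.normal(loc=50, scale=10, size=1000),
})

# Simple random sample of 100 units from the whole frame.
srs = population.sample(n=100, random_state=42)

# Stratified sample: 10% from each region, preserving the population's structure.
stratified = population.groupby("region", group_keys=False).sample(frac=0.10, random_state=42)

print(srs["region"].value_counts(), "\n")
print(stratified["region"].value_counts())
```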
Experimental Design
- Overview: Addresses how to structure experiments to maximize validity and reliability of results.
- Key Concepts:
- Randomized control trials (RCTs) for unbiased comparisons.
- Blocking and stratification to control variability.
- Factorial designs to study multiple factors simultaneously.
- Applications:
- Designing clinical trials to evaluate medical treatments.
- Testing product features to improve customer experience.
Bayesian Inference
- Overview: A statistical framework that incorporates prior knowledge or beliefs into analysis.
- Key Concepts:
- Bayes' Theorem for updating probabilities based on new evidence.
- Posterior distributions to refine population parameter estimates.
- Applications:
- Predicting customer churn in business analytics.
- Estimating disease spread in epidemiology.
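A minimal sketch of Bayesian updating with a conjugate Beta-Binomial model, applied to an invented customer-churn example; the prior parameters and counts are assumptions chosen only to show the mechanics.
```python
# Bayesian updating with a Beta prior and binomial data (illustrative churn example).
from scipy import stats

# Prior belief: churn rate around 20% (Beta(2, 8) has mean 0.2).
prior_a, prior_b = 2, 8

# Observed (made-up) data: 30 churns out of 200 customers.
churned, retained = 30, 170

# Conjugate update: posterior is Beta(prior_a + churned, prior_b + retained).
post_a, post_b = prior_a + churned, prior_b + retained
posterior = stats.beta(post_a, post_b)

print(f"posterior mean churn rate = {posterior.mean():.3f}")
low, high = posterior.ppf([0.025, 0.975])
print(f"95% credible interval = ({low:.3f}, {high:.3f})")
```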
Nonparametric Methods
- Overview: Statistical techniques that do not assume a specific distribution for the population.
- Key Concepts:
- Mann-Whitney U test, Kruskal-Wallis test for ordinal data.
- Kernel density estimation for probability distributions.
- Applications:
- Analyzing data with outliers or non-normal distributions.
- Evaluating medians in customer satisfaction surveys.
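The Mann-Whitney U test mentioned above can be run in a few lines with SciPy; the two sets of satisfaction ratings are made-up ordinal data.
```python
# Mann-Whitney U test on ordinal satisfaction ratings (illustrative data).
from scipy import stats

store_a = [4, 5, 3, 4, 5, 4, 2, 5, 4, 3]
store_b = [2, 3, 3, 2, 4, 3, 2, 3, 1, 2]

u_stat, p_value = stats.mannwhitneyu(store_a, store_b, alternative="two-sided")
print(f"U = {u_stat}, p = {p_value:.4f}")
# A small p-value suggests the two rating distributions differ in location,
# without assuming normality of the underlying data.
```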
Applications of Inferential Statistics
Healthcare
- Overview: Inferential statistics is indispensable in clinical trials, epidemiology, and public health.
- Applications:
- Testing the effectiveness of new treatments or drugs.
- Estimating disease prevalence in populations.
- Examples:
- Comparing recovery rates between treatment and control groups.
- Calculating confidence intervals for vaccine efficacy.
Marketing and Business
- Overview: Businesses use inferential statistics to analyze market trends and optimize strategies.
- Applications:
- Predicting customer behavior based on survey data.
- Testing the impact of marketing campaigns on sales.
- Examples:
- Estimating the average spending of customers in a specific demographic.
- Determining whether a promotional offer increases customer engagement.
Social Sciences
- Overview: Social scientists rely on inferential statistics to understand human behavior and societal trends.
- Applications:
- Analyzing survey data to draw conclusions about public opinion.
- Testing theories in psychology and sociology.
- Examples:
- Determining if educational interventions improve student outcomes.
- Analyzing voting patterns in political studies.
Technology and Data Science
- Overview: Inferential statistics is critical for machine learning, algorithm development, and performance evaluation.
- Applications:
- Building predictive models for classification and regression tasks.
- Evaluating algorithm accuracy using statistical testing.
- Examples:
- Assessing the significance of features in a machine learning model.
- Testing improvements in recommendation systems for e-commerce platforms.
Examples of Inferential Statistics
Voter Behavior Prediction
- Scenario: Predicting election outcomes based on a poll of 1,000 voters.
- Techniques Used:
- Estimating the proportion of voters supporting each candidate using confidence intervals.
- Testing whether voter preference differs significantly between age groups.
Clinical Trials
- Scenario: Evaluating whether a new drug reduces blood pressure more effectively than an existing treatment.
- Techniques Used:
- Conducting a t-test to compare average blood pressure reductions.
- Calculating p-values to determine statistical significance.
Emerging Trends in Inferential Statistics
Big Data Integration
- Leveraging inferential techniques to analyze large, complex datasets in fields like genomics and finance.
Machine Learning Synergy
- Combining traditional inferential methods with machine learning for improved prediction and inference.
Robust Statistical Methods
- Developing techniques to handle outliers and non-standard data distributions.
Real-Time Analytics
- Implementing inferential statistics in real-time systems for instant decision-making in industries like e-commerce and logistics.
Challenges in Inferential Statistics
- Sample Representativeness: Ensuring that samples accurately reflect the population to avoid biased conclusions.
- Assumption Violations: Many inferential methods assume normality or independence, which may not hold in real-world data.
- Misinterpretation of Results: P-values and confidence intervals can be misunderstood, leading to incorrect inferences.
Why Study Inferential Statistics
Drawing Conclusions from Data
Inferential statistics allow students to make generalizations about a population based on sample data. This involves using probability theory to estimate parameters and test hypotheses. Such methods are vital for evidence-based research and decision-making.
Understanding Sampling and Uncertainty
Students learn how sampling introduces variability and how to account for it using confidence intervals and error margins. This fosters an appreciation for the limits of data and promotes cautious interpretation. These insights are critical in any analytical setting.
Applications Across Disciplines
Inferential statistics are used in medicine, policy, education, and business to analyze data and inform strategies. Students gain tools to evaluate survey results, clinical trials, and market research. This interdisciplinary relevance enhances academic and professional value.
Hypothesis Testing Techniques
The study of inferential statistics introduces students to hypothesis testing, including t-tests, chi-square tests, and ANOVA. These tools allow them to validate assumptions and evaluate interventions. Such skills are indispensable in experimental design and evaluation.
Building a Research Mindset
Learning inferential statistics cultivates curiosity, skepticism, and rigor in analyzing data. Students develop the ability to interpret findings with nuance and recognize limitations. This mindset is essential for academic research and responsible data use.
Inferential Statistics: Conclusion
Inferential statistics is an indispensable tool for drawing meaningful conclusions about populations based on sample data. Through techniques such as hypothesis testing, confidence intervals, and regression analysis, it provides a rigorous framework for making informed decisions and predictions. Its applications span healthcare, business, social sciences, and technology, demonstrating its versatility and importance. As data becomes increasingly complex, the continued development of advanced inferential methods will be crucial for addressing modern challenges and unlocking new insights across disciplines.
Inferential Statistics: Review Questions and Answers
What is inferential statistics and why is it essential in analyzing data?
Answer: Inferential statistics is a branch of statistics that allows us to draw conclusions about a larger population based on a sample of data. It uses probability theory to estimate population parameters and test hypotheses, making it indispensable for making informed decisions. This methodology enables researchers to quantify uncertainty and assess the reliability of their findings. By employing inferential techniques, one can generalize results beyond the observed data and establish trends or effects with measurable confidence.
How does hypothesis testing function in the framework of inferential statistics?
Answer: Hypothesis testing is a systematic method used in inferential statistics to decide whether there is enough evidence in a sample to support a specific claim about a population parameter. It begins with the formulation of a null hypothesis and an alternative hypothesis, then uses sample data to calculate a test statistic. Based on this statistic and its corresponding p-value, a decision is made to either reject or fail to reject the null hypothesis. This process is fundamental in determining the significance of observed differences or relationships in data.
What are confidence intervals and how do they assist in statistical inference?
Answer: Confidence intervals provide a range of values within which the true population parameter is likely to lie, with a certain level of confidence. They offer a practical measure of the precision of a sample estimate and quantify the uncertainty associated with it. By constructing confidence intervals, researchers can make probabilistic statements about the population and assess the reliability of their estimates. This method is crucial in inferential statistics, as it allows for more informed decision-making based on the estimated range rather than a single point value.
How is the Central Limit Theorem (CLT) applied in inferential statistics?
Answer: The Central Limit Theorem states that the distribution of sample means approximates a normal distribution, regardless of the population’s distribution, provided the sample size is sufficiently large. This theorem is critical in inferential statistics because it justifies the use of normal distribution-based methods even when the underlying data are not normally distributed. The CLT enables the calculation of confidence intervals and p-values, making it easier to draw reliable conclusions from sample data. It forms the backbone of many statistical procedures by ensuring that the sampling distribution of the mean becomes predictable as sample size increases.
What is a p-value and how is it interpreted in hypothesis testing?
Answer: A p-value is the probability of obtaining test results at least as extreme as those observed during the study, assuming that the null hypothesis is true. It provides a quantitative measure of the strength of evidence against the null hypothesis. A small p-value indicates strong evidence in favor of the alternative hypothesis, while a large p-value suggests that the observed data are consistent with the null hypothesis. Understanding p-values is essential for making informed decisions about statistical significance in inferential analyses.
How does sample size affect the reliability of inferential statistics?
Answer: Sample size plays a crucial role in the reliability of inferential statistics because larger samples tend to provide more accurate and stable estimates of population parameters. With a larger sample, the variability of the estimate decreases, leading to narrower confidence intervals and more powerful hypothesis tests. This reduces the margin of error and increases the likelihood of detecting true effects when they exist. Conversely, smaller samples may yield unreliable estimates and may not represent the population accurately, thereby undermining the conclusions drawn.
What is the difference between Type I and Type II errors in hypothesis testing?
Answer: A Type I error occurs when the null hypothesis is incorrectly rejected when it is actually true, while a Type II error happens when the null hypothesis is not rejected when it is false. The probability of committing a Type I error is denoted by the significance level (α), whereas the probability of a Type II error is represented by β. Balancing these errors is essential for designing robust tests and ensuring that the conclusions drawn from the data are accurate. Understanding the trade-off between Type I and Type II errors is vital for making sound inferences and optimizing the power of statistical tests.
How are regression analysis and inferential statistics related?
Answer: Regression analysis is a fundamental inferential statistical method used to model the relationship between a dependent variable and one or more independent variables. It allows for predictions and inferences about the effect of changes in predictor variables on the outcome variable. Through estimation of parameters, hypothesis testing, and construction of confidence intervals, regression analysis provides insights into the strength, direction, and significance of these relationships. This method is widely applied in various fields to understand trends, make forecasts, and inform decision-making based on empirical data.
What role does the concept of a sampling distribution play in inferential statistics?
Answer: A sampling distribution is the probability distribution of a given statistic based on a random sample, and it is central to inferential statistics. It describes how the statistic, such as the sample mean, varies from sample to sample and allows for the estimation of population parameters. The concept of sampling distributions underpins the calculation of standard errors, confidence intervals, and p-values, which are all crucial for making inferences. By understanding sampling distributions, statisticians can quantify the uncertainty inherent in sample estimates and make more reliable predictions about the population.
How can inferential statistics be applied to make decisions in real-world scenarios?
Answer: Inferential statistics can be applied in various real-world scenarios to make decisions based on sample data, such as in healthcare, finance, and public policy. By using techniques like hypothesis testing, confidence intervals, and regression analysis, decision makers can estimate population parameters and assess the significance of observed effects. This process allows for informed decision-making even when it is impractical to collect data from an entire population. The ability to draw generalizations from a sample and quantify uncertainty makes inferential statistics a powerful tool for guiding strategic and operational choices in diverse fields.
Inferential Statistics: Thought-Provoking Questions and Answers
How do different sampling methods impact the accuracy and reliability of inferential statistical conclusions?
Answer: Different sampling methods, such as random, stratified, and cluster sampling, can significantly affect the accuracy and reliability of inferential conclusions. Random sampling tends to produce the most unbiased and representative samples, whereas stratified sampling ensures that specific subgroups are adequately represented. Cluster sampling, while cost-effective, may introduce higher variability if the clusters are not homogeneous. The choice of sampling method influences the precision of estimates and the validity of hypothesis tests, making it a critical decision in study design.
In practical applications, selecting an appropriate sampling method requires careful consideration of the population structure and research objectives. Researchers must weigh the benefits of accuracy against practical constraints like cost and time. Ensuring that the sampling method minimizes bias and maximizes representativeness is essential for drawing valid inferences and achieving reliable results.
What are the potential pitfalls of relying solely on p-values in hypothesis testing, and how can they be mitigated?
Answer: Relying solely on p-values in hypothesis testing can be misleading because p-values do not measure the size or importance of an effect, and they can be influenced by sample size. A statistically significant p-value might indicate a trivial effect if the sample size is large, while a non-significant p-value might mask a meaningful effect in a small sample. To mitigate these pitfalls, it is important to consider confidence intervals, effect sizes, and the context of the data alongside p-values. This comprehensive approach provides a more nuanced understanding of the results and ensures that decisions are based on both statistical and practical significance.
Additionally, researchers should adopt a transparent reporting practice, discussing the limitations of p-values and avoiding the binary interpretation of results as simply “significant” or “not significant.” Combining p-values with other statistical metrics helps to provide a clearer picture of the underlying data and fosters more informed decision-making. This balanced approach is essential for robust and reliable inferential analysis.
How might advancements in computational power and software influence the future of inferential statistics?
Answer: Advancements in computational power and software are poised to revolutionize inferential statistics by enabling more complex models and simulations to be executed quickly and accurately. With enhanced computing resources, large datasets can be analyzed more efficiently, and sophisticated techniques like bootstrapping, Monte Carlo simulations, and Bayesian inference can be applied to derive deeper insights. These technological improvements facilitate real-time analysis and dynamic updating of statistical models, thereby increasing the reliability of inferences in rapidly changing environments.
Furthermore, user-friendly statistical software with advanced visualization capabilities makes inferential statistics more accessible to a broader audience. As these tools continue to evolve, they will likely foster interdisciplinary collaborations and drive innovation across various fields by streamlining the process of data analysis and interpretation. This integration of technology with statistical theory is set to enhance both the efficiency and the scope of inferential methods.
What is the role of the Central Limit Theorem in modern data analysis, and why is it so widely applicable?
Answer: The Central Limit Theorem (CLT) is fundamental to modern data analysis because it states that the distribution of sample means approximates a normal distribution as the sample size increases, regardless of the underlying population distribution. This powerful theorem underpins many statistical methods, allowing for the use of normal distribution-based confidence intervals and hypothesis tests even when the original data is not normally distributed. Its wide applicability is due to the fact that it holds true under very general conditions, making it a cornerstone of inferential statistics and a critical tool for practical data analysis.
The CLT not only simplifies the analysis of complex datasets but also provides a theoretical basis for many advanced techniques in statistics. By ensuring that sample means converge to a predictable form, the theorem allows researchers to make reliable inferences about population parameters and quantify uncertainty effectively. This universality makes the CLT one of the most valuable results in statistics, with profound implications for both theory and application.
How do confidence intervals enhance our understanding of uncertainty in statistical estimates?
Answer: Confidence intervals provide a range within which the true population parameter is likely to lie, offering a quantitative measure of uncertainty associated with sample estimates. They not only give a point estimate but also indicate the reliability of that estimate by specifying a probability, typically 95%, that the interval contains the true value. This approach helps in assessing the precision of the estimates and in comparing different datasets or experimental outcomes. Confidence intervals are a key tool in inferential statistics, allowing researchers to communicate the uncertainty inherent in their findings in a clear and interpretable manner.
Additionally, by using confidence intervals, analysts can determine the potential variability of their estimates and make more informed decisions about the data. They provide an essential context for interpreting p-values and hypothesis test results, thereby enhancing the overall robustness of statistical conclusions. This understanding is crucial for drawing meaningful insights and making reliable predictions based on sample data.
In what ways can Bayesian inference provide a more flexible framework compared to classical methods?
Answer: Bayesian inference offers a flexible framework by incorporating prior beliefs and evidence into the analysis process, updating these beliefs as new data becomes available. Unlike classical methods that rely solely on sample data, Bayesian methods combine prior distributions with likelihoods to produce posterior distributions, providing a more complete picture of uncertainty. This approach is particularly useful in situations with limited data or when prior information is available, allowing for more nuanced inferences. Bayesian methods also facilitate decision-making by directly quantifying the probability of hypotheses, which can be more intuitive in many applications.
Moreover, the Bayesian framework supports iterative learning and dynamic updating, making it adaptable to changing data and complex models. It is increasingly being integrated with computational techniques and machine learning to handle high-dimensional problems, offering significant advantages in flexibility and interpretability. This adaptability makes Bayesian inference a powerful alternative to traditional statistical methods in a wide range of fields.
How can sampling variability impact the reliability of inferential statistics, and what techniques help to mitigate this effect?
Answer: Sampling variability refers to the natural fluctuation in sample statistics from one sample to another, which can impact the reliability of inferential conclusions if not properly accounted for. This variability is inherent in any sampling process and can lead to differences in estimates, such as the mean or standard deviation, even when the samples are drawn from the same population. Techniques such as increasing sample size, using random sampling methods, and applying bootstrapping methods help mitigate the effects of sampling variability. These methods ensure that the estimates are more stable and that the inferences made are representative of the overall population.
Additionally, employing confidence intervals and hypothesis tests that account for variability can provide a more accurate reflection of the uncertainty inherent in the data. By addressing sampling variability through careful study design and advanced analytical techniques, researchers can improve the robustness and reliability of their statistical inferences, ultimately leading to more informed and accurate conclusions.
What is the significance of p-values in inferential statistics, and what are the potential drawbacks of relying on them exclusively?
Answer: P-values are a central component in inferential statistics as they provide a measure of the evidence against the null hypothesis. A small p-value indicates that the observed data is unlikely to have occurred under the null hypothesis, suggesting that the alternative hypothesis may be true. However, p-values have limitations; they do not indicate the magnitude of an effect or its practical significance, and they can be influenced by sample size. Overreliance on p-values without considering confidence intervals or effect sizes can lead to misinterpretations and potentially misleading conclusions.
To address these drawbacks, it is important to complement p-value analysis with other statistical measures that provide a fuller picture of the data, such as effect sizes, confidence intervals, and power analysis. This balanced approach helps ensure that the results are not only statistically significant but also meaningful in practical terms, enhancing the overall quality of statistical inferences.
How can inferential statistics be used to compare multiple groups simultaneously, and what techniques are involved?
Answer: Inferential statistics can be used to compare multiple groups simultaneously through techniques such as analysis of variance (ANOVA), which tests for significant differences among group means. ANOVA allows researchers to determine whether the observed variations between groups are statistically significant or simply due to random sampling variability. By partitioning the total variance into components attributable to different sources, ANOVA provides a systematic method for comparing several groups at once. This technique is essential for experimental designs and studies involving multiple treatments or conditions, ensuring that conclusions are drawn with a comprehensive understanding of the data’s variability.
Additional methods like post-hoc tests (e.g., Tukey’s HSD) can further pinpoint which specific groups differ from each other, enhancing the interpretability of the results. Together, these techniques enable robust comparisons and support evidence-based decision-making in various fields, from social sciences to medical research.
How does the concept of statistical power relate to the reliability of hypothesis tests?
Answer: Statistical power is the probability that a hypothesis test correctly rejects a false null hypothesis, thereby detecting a true effect when it exists. A high statistical power means that the test is more likely to identify significant differences, reducing the risk of a Type II error. Power depends on factors such as sample size, effect size, significance level, and variability in the data. Ensuring adequate power is crucial for the reliability of hypothesis tests, as it enhances the confidence that the results are reflective of true differences rather than random chance.
Researchers can increase power by designing studies with larger sample sizes, reducing measurement error, or choosing more sensitive statistical tests. By carefully considering power in the study design phase, one can improve the robustness of the conclusions drawn from inferential statistics and ensure that meaningful effects are not overlooked.
How might robust statistical methods help mitigate the influence of outliers in inferential analyses?
Answer: Robust statistical methods are designed to be less sensitive to outliers and deviations from assumptions, ensuring that the analysis remains reliable even when data contains anomalies. Techniques such as robust regression, trimmed means, and non-parametric tests reduce the influence of extreme values on statistical estimates. These methods help maintain the integrity of inferential conclusions by providing more stable and representative measures of central tendency and variability. Robust approaches are particularly valuable in real-world data analysis, where outliers can significantly distort results if not properly managed.
By incorporating robust techniques, researchers can ensure that their findings are not overly affected by a few extreme observations. This leads to more accurate and generalizable results, enhancing the overall credibility of the statistical analysis. Such methods are essential for making sound decisions in fields where data quality and consistency are critical.
What strategies can be employed to determine the appropriate sample size for inferential studies?
Answer: Determining the appropriate sample size for inferential studies involves balancing the need for precision with practical constraints such as time and cost. Strategies include conducting power analysis, which calculates the minimum sample size required to detect a specified effect size at a given significance level with acceptable power. Other considerations involve estimating variability within the population and using confidence interval formulas to ensure that the margin of error is sufficiently small. These methods help in designing studies that are both statistically robust and feasible.
Additionally, pilot studies can be useful for gathering preliminary data to refine sample size estimates. By combining these strategies, researchers can optimize their study design to achieve reliable and meaningful results, thereby strengthening the conclusions drawn from their inferential analyses.
Inferential Statistics Problems and Solutions
- Constructing a 95% Confidence Interval for a Population Mean:
Solution:
Given a sample mean x̄ = 80, sample standard deviation s = 12, and sample size n = 36, first compute the standard error: SE = s/√n = 12/√36 = 2.
For a 95% confidence level with df = 35, the critical t-value is approximately 2.030.
Margin of error = 2.030 × 2 = 4.06.
The confidence interval is 80 ± 4.06, or approximately (75.94, 84.06).
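The same interval can be checked numerically with SciPy, using the values from the problem statement.
```python
# Check of the 95% CI in the worked example: x_bar = 80, s = 12, n = 36.
import math
from scipy import stats

x_bar, s, n = 80, 12, 36
se = s / math.sqrt(n)                   # 12 / 6 = 2
t_crit = stats.t.ppf(0.975, df=n - 1)   # ≈ 2.030 for df = 35
margin = t_crit * se
print(f"t_crit = {t_crit:.3f}, CI = ({x_bar - margin:.2f}, {x_bar + margin:.2f})")
```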
- Performing a Two-Sample t-Test:
Solution:
Consider two independent samples with means x̄₁ = 75 and x̄₂ = 70 (illustrative values), standard deviations s₁ = 10 and s₂ = 12, and sample sizes n₁ = 30 and n₂ = 35.
Compute the pooled variance: sp² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2).
Calculate the numerator: 29×100 = 2900 and 34×144 = 4896; total = 7796, and degrees of freedom = 63. So sp² = 7796/63 ≈ 123.7 and sp ≈ 11.12.
Standard error = sp × √(1/n₁ + 1/n₂) = 11.12 × √(1/30 + 1/35) ≈ 2.77.
The t-statistic is t = (x̄₁ − x̄₂)/SE = 5/2.77 ≈ 1.81.
Compare with the critical t-value for 63 df at the 0.05 significance level (≈ 2.000); since 1.81 < 2.000, the difference is not statistically significant at 5%.
- Calculating Required Sample Size for a Margin of Error:
Solution:
To estimate a population mean with a margin of error E = 3, assume a population standard deviation σ = 15 and a 95% confidence level (z = 1.96).
Use the formula n = (zσ/E)².
Calculate the numerator: 1.96 × 15 = 29.4, so zσ/E = 29.4/3 = 9.8.
Then n = 9.8² = 96.04.
Rounding up, a sample size of 97 is needed.
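A one-line numerical check of the sample-size formula, using the same E, σ, and z as above.
```python
# Required sample size: n = (z * sigma / E)^2, rounded up.
import math

z, sigma, E = 1.96, 15, 3
n = math.ceil((z * sigma / E) ** 2)   # 9.8^2 = 96.04 -> 97
print(n)  # 97
```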
- Performing a Chi-Square Test for Goodness-of-Fit:
Solution:
Suppose observed frequencies for 4 categories are [25, 30, 20, 25] with expected frequencies [23, 27, 23, 27].
Compute chi-square: χ² = Σ (observed − expected)²/expected. For each category, (25−23)²/23 = 4/23 ≈ 0.174, (30−27)²/27 = 9/27 = 0.333, (20−23)²/23 = 9/23 ≈ 0.391, (25−27)²/27 = 4/27 ≈ 0.148.
Sum: χ² ≈ 0.174 + 0.333 + 0.391 + 0.148 = 1.046.
With 3 degrees of freedom, compare the chi-square statistic to the critical value at 0.05 significance level (≈7.815); since 1.046 < 7.815, we fail to reject the null hypothesis.
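The goodness-of-fit statistic can be reproduced with scipy.stats.chisquare using the observed and expected counts from the problem.
```python
# Chi-square goodness-of-fit check for the worked example.
from scipy import stats

observed = [25, 30, 20, 25]
expected = [23, 27, 23, 27]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")  # chi2 ≈ 1.046, p well above 0.05
```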
- Calculating a Correlation Coefficient:
Solution:
- Given paired data for X and Y with ∑X = 50, ∑Y = 80, ∑XY = 460, and n = 10; for illustration, take ∑X² = 310 and ∑Y² = 723 (illustrative values consistent with the result below).
The correlation coefficient is computed as r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}.
Substitute values: numerator = 10×460 − 50×80 = 4600 − 4000 = 600.
Denominator = √[(10×310 − 2500)(10×723 − 6400)] = √(600 × 830) = √498000 ≈ 705.7.
Then r = 600/705.7 ≈ 0.85, which lies between −1 and 1 as required.
- Constructing a 95% Confidence Interval for a Proportion:
Solution:
Suppose 120 successes out of 300 trials. The sample proportion is p̂ = 120/300 = 0.40.
Standard error = √[p̂(1 − p̂)/n] = √(0.40 × 0.60/300) ≈ 0.0283.
For 95% confidence (z = 1.96), margin = 1.96 × 0.0283 ≈ 0.0555.
Thus, the interval is 0.40 ± 0.0555, or approximately (0.3445, 0.4555).
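A short check of the proportion interval with the same normal approximation.
```python
# Normal-approximation 95% CI for a proportion: 120 successes in 300 trials.
import math

successes, n = 120, 300
p_hat = successes / n                      # 0.40
se = math.sqrt(p_hat * (1 - p_hat) / n)    # ≈ 0.0283
margin = 1.96 * se                         # ≈ 0.055
print(f"CI = ({p_hat - margin:.3f}, {p_hat + margin:.3f})")  # ≈ (0.345, 0.455)
```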
- Performing a One-Way ANOVA Test:
Solution:
Assume three groups with sample means 15, 18, and 20; sample sizes 10 each; and overall variance data leads to between-group sum of squares (SSB) = 120 and within-group sum of squares (SSW) = 180.
Degrees of freedom: df₁ = k − 1 = 2 (between groups) and df₂ = N − k = 30 − 3 = 27 (within groups).
Mean squares: MSB = SSB/df₁ = 120/2 = 60 and MSW = SSW/df₂ = 180/27 ≈ 6.67.
F-statistic = 60/6.67≈9.0.
Compare with critical F-value (for df1 = 2, df2 = 27 at 0.05 level ≈ 3.35); since 9.0 > 3.35, reject the null hypothesis that group means are equal.
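A quick check of the F statistic and its critical value from the summary figures above, using SciPy for the critical value.
```python
# F statistic and critical value for the one-way ANOVA summary in the problem.
from scipy import stats

ssb, ssw = 120, 180
df_between, df_within = 2, 27

msb = ssb / df_between          # 60
msw = ssw / df_within           # ≈ 6.67
f_stat = msb / msw              # ≈ 9.0
f_crit = stats.f.ppf(0.95, df_between, df_within)  # ≈ 3.35

print(f"F = {f_stat:.2f}, critical F = {f_crit:.2f}")  # reject H0 since F > critical
```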
- Computing a Regression Line and Coefficient of Determination:
Solution:
For a dataset with n = 10, ∑X = 40, ∑Y = 60, ∑XY = 280, ∑X² = 200, and ∑Y² = 410 (illustrative values apart from ∑X), compute the slope b = [n∑XY − (∑X)(∑Y)] / [n∑X² − (∑X)²] = (2800 − 2400)/(2000 − 1600) = 400/400 = 1.0.
Intercept a = Ȳ − bX̄ = 6 − 1.0 × 4 = 2.
Regression line: ŷ = 2 + 1.0x.
Calculate R² as r², where r = [n∑XY − (∑X)(∑Y)] / √{[n∑X² − (∑X)²][n∑Y² − (∑Y)²]}. Here r = 400/√(400 × 500) = 400/447.2 ≈ 0.894, so R² ≈ 0.80, a valid value between 0 and 1.
- Determining the p-value from a z-Test:
Solution:
For a z-test statistic of 2.3 in a two-tailed test, first find the one-tailed probability: P(Z > 2.3) ≈ 0.0107.
Multiply by 2 for a two-tailed test: p ≈ 2 × 0.0107 = 0.0214.
Thus, the p-value is approximately 0.0214, indicating significance at the 5% level.
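The two-tailed p-value for z = 2.3 can be reproduced with the standard normal survival function in SciPy.
```python
# Two-tailed p-value for a z statistic of 2.3.
from scipy import stats

z = 2.3
p_one_tailed = stats.norm.sf(z)     # P(Z > 2.3) ≈ 0.0107
p_two_tailed = 2 * p_one_tailed     # ≈ 0.0214
print(f"p (two-tailed) = {p_two_tailed:.4f}")
```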
- Calculating a Chi-Square Statistic for a Contingency Table:
Solution:
Given observed frequencies in a 2×2 table, say [20, 30; 25, 25] (illustrative values), with row totals 50 and 50, column totals 45 and 55, and grand total 100.
Expected frequencies: E = (row total × column total)/grand total, giving E₁₁ = 50×45/100 = 22.5, E₁₂ = 50×55/100 = 27.5, E₂₁ = 22.5, and E₂₂ = 27.5.
Compute chi-square: χ² = Σ (O − E)²/E.
Calculations: (20−22.5)²/22.5 ≈ 0.278, (30−27.5)²/27.5 ≈ 0.227, (25−22.5)²/22.5 ≈ 0.278, (25−27.5)²/27.5 ≈ 0.227; total χ² ≈ 1.01. With 1 degree of freedom, this is below the 0.05 critical value of 3.841, so no significant association is detected.
- Estimating the Standard Error of a Proportion:
Solution:
For a proportion p̂ = 0.5 (an illustrative value) from a sample of size n = 400, the standard error is SE = √[p̂(1 − p̂)/n] = √(0.5 × 0.5/400) = 0.025.
- Performing a One-Sample t-Test for a Mean:
Solution:
Suppose a sample of 20 observations has a mean x̄ = 105 and standard deviation s = 12, and we wish to test H₀: μ = 100 against a two-sided alternative (the sample mean and null value are illustrative).
Compute the standard error: SE = s/√n = 12/√20 ≈ 2.683.
The t-statistic is t = (x̄ − μ₀)/SE = 5/2.683 ≈ 1.864.
With 19 degrees of freedom, compare this t-statistic to the critical value for a two-tailed test at the 0.05 significance level (≈ 2.093); since 1.864 < 2.093, we fail to reject H₀.
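A closing numerical check of this one-sample t-test; the sample mean of 105 and null value of 100 are the illustrative figures assumed above.
```python
# One-sample t-test from summary statistics (illustrative mean and null value).
import math
from scipy import stats

n, s = 20, 12
x_bar, mu_0 = 105, 100                 # assumed values consistent with t ≈ 1.864

se = s / math.sqrt(n)                  # ≈ 2.683
t_stat = (x_bar - mu_0) / se           # ≈ 1.864
t_crit = stats.t.ppf(0.975, df=n - 1)  # ≈ 2.093 for df = 19

print(f"t = {t_stat:.3f}, critical t = {t_crit:.3f}")
# Since |t| < critical t, we fail to reject the null hypothesis at the 5% level.
```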