Prepare for University Studies & Career Advancement

Data Analysis

Data Analysis is the discipline of extracting meaningful patterns, trends, and conclusions from raw information. It bridges data science, analytics, and decision-making in sectors ranging from business to healthcare. A foundational step in any analytical workflow is effective data collection and storage, which ensures that reliable and representative data is available for analysis. Before analysis can begin, meticulous data cleaning and preprocessing remove inconsistencies and prepare datasets for accurate modeling.

The tools and techniques used in data analysis are rapidly evolving, so staying current with tools and technologies in data science is crucial for aspiring analysts. Once cleaned, data is transformed into insights using a variety of statistical and computational approaches, often culminating in data visualization that highlights key findings for stakeholders. Whether creating dashboards or conducting statistical tests, analysts ensure their outputs are not only accurate but also accessible.

In today’s data-rich environments, analysis is frequently aligned with domain-specific analytics to address sector-specific challenges such as financial risk, supply chain optimization, or patient care. Ethical considerations are essential, especially when handling sensitive datasets. Analysts must be aware of the ethical and social aspects of their work, promoting fairness, transparency, and accountability.

Data analysis also supports cybersecurity operations. Through the integration of threat intelligence and behavioral monitoring, analysts help identify anomalies and potential breaches. As cyber threats grow in sophistication, the role of AI and ML in cybersecurity becomes vital in automating anomaly detection and predicting attacks. This intersects with domains such as incident response and forensics, network security, and application security.

In secure environments, strong identity and access management ensures that sensitive data remains protected. Organizations also implement robust cybersecurity policies and promote cybersecurity awareness across teams to reduce risk. Where data analytics is deployed in industrial or critical infrastructure settings, it must align with standards in OT security and CPS security.

Cloud-based systems have revolutionized modern data pipelines, making cloud security a necessary pillar of any analytic infrastructure. Additionally, the proliferation of monitoring devices at the network edge has underscored the importance of endpoint security. In some scenarios, ethical hacking is employed to test the resilience of systems that host sensitive analytics environments.

Finally, the intersection of analysis with large-scale information has led to the rise of big data analytics, where volume, velocity, and variety challenge conventional tools. Analysts now operate within a broader cybersecurity and data ecosystem, constantly adapting to emerging areas in cybersecurity and evolving analytical methodologies.

In essence, data analysis is both an art and a science. It turns scattered observations into structured narratives, fuels strategic decisions, and powers innovations across sectors. For students and professionals alike, mastering this discipline opens a world of discovery, transformation, and responsible impact.

 


Business Analytics

Business analytics focuses on leveraging data to inform decision-making and optimize business performance. By analyzing operational and market data, organizations can identify opportunities, address challenges, and make data-driven decisions.

  1. Key Objectives:

    • Improve operational efficiency.
    • Enhance customer satisfaction.
    • Identify revenue growth opportunities.
    • Predict market trends and risks.
  2. Types of Business Analytics:

    • Descriptive Analytics: Summarizes historical data to understand past performance (a pandas sketch follows this list).
    • Predictive Analytics: Uses data modeling to forecast future trends.
    • Prescriptive Analytics: Provides actionable recommendations to optimize outcomes.
  3. Tools:

    • Excel: Widely used for basic data analysis, pivot tables, and statistical modeling.
    • Tableau: A powerful data visualization tool for creating interactive dashboards and reports.
    • Power BI: A business intelligence platform for analyzing and visualizing data in real time.
  4. Applications:

    • Sales forecasting and performance analysis.
    • Customer segmentation and churn analysis.
    • Market basket analysis for cross-selling and up-selling.
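
As a concrete illustration of descriptive analytics, here is a minimal pandas sketch that summarizes hypothetical regional sales; all names and figures are illustrative, not drawn from a real dataset:

```python
import pandas as pd

# Hypothetical monthly sales records; column names are illustrative.
sales = pd.DataFrame({
    "region":  ["North", "North", "South", "South", "West", "West"],
    "month":   ["Jan", "Feb", "Jan", "Feb", "Jan", "Feb"],
    "revenue": [12000, 13500, 9800, 10200, 15100, 14900],
})

# Descriptive analytics: summarize historical performance by region.
summary = sales.groupby("region")["revenue"].agg(["sum", "mean"])
print(summary)

# Month-over-month growth per region: a first step toward predictive
# analytics, which would extrapolate such trends forward.
pivot = sales.pivot(index="region", columns="month", values="revenue")
pivot["growth_%"] = (pivot["Feb"] / pivot["Jan"] - 1) * 100
print(pivot)
```

The same grouped summaries feed naturally into the predictive and prescriptive stages described above.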

Time-Series Analysis

Time-series analysis involves analyzing data points collected over time to identify trends, patterns, and seasonal effects. It is particularly useful for forecasting and monitoring applications where time plays a crucial role.

  1. Key Features:

    • Focuses on sequential data (e.g., hourly, daily, monthly).
    • Detects patterns like trends (long-term movement) and seasonality (regularly repeating patterns).
  2. Applications:

    • Stock Market Forecasting: Predicting future stock prices based on historical trends.
    • Weather Prediction: Forecasting temperatures, rainfall, or storms using past weather data.
    • Energy Consumption: Monitoring and predicting electricity usage patterns.
    • Retail: Demand forecasting for inventory and staffing.
  3. Techniques:

    • ARIMA (AutoRegressive Integrated Moving Average): A statistical method that models trends and autocorrelated noise in time-series data; its seasonal extension, SARIMA, also captures seasonality.
    • Exponential Smoothing: Assigns exponentially decreasing weights to past observations to make predictions.
    • STL (Seasonal-Trend decomposition using LOESS): Separates a series into trend, seasonal, and residual components.
  4. Tools:

    • Python Libraries: statsmodels, Prophet, and pandas (see the ARIMA sketch after this list).
    • R: The forecast and tsibble packages.
    • Specialized Platforms: SAS, MATLAB.
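
To make the ARIMA technique concrete, here is a minimal statsmodels sketch that fits a seasonal ARIMA to a synthetic monthly series and forecasts six months ahead; the series, its parameters, and the chosen model order are illustrative assumptions:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

# Synthetic monthly series with trend + seasonality + noise (illustrative only).
rng = np.random.default_rng(42)
t = np.arange(48)
values = 100 + 0.8 * t + 10 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 2, 48)
series = pd.Series(values, index=pd.date_range("2021-01-01", periods=48, freq="MS"))

# Fit a seasonal ARIMA (order chosen for illustration, not tuned)
# and forecast the next 6 months.
model = ARIMA(series, order=(1, 1, 1), seasonal_order=(1, 0, 0, 12)).fit()
print(model.forecast(steps=6))
```

In practice the model order would be selected by inspecting autocorrelation plots or by information criteria such as AIC.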

A/B Testing

A/B testing, also known as split testing, is a statistical method used to compare two or more variations of a variable to determine which performs better. It is widely used in marketing, product development, and web design.

  1. Key Features:

    • Controlled Experiments: Divides the audience into groups to test different versions simultaneously.
    • Measuring Outcomes: Tracks metrics such as conversion rates, click-through rates, or revenue.
  2. Steps in A/B Testing:

    • Define the objective (e.g., increase sign-ups on a webpage).
    • Identify variables to test (e.g., button color, headline text).
    • Split the audience randomly into groups (control and test).
    • Analyze results to determine statistical significance.
  3. Applications:

    • Web Design: Testing webpage layouts or call-to-action buttons to improve user engagement.
    • Marketing Campaigns: Comparing email subject lines or ad creatives to maximize conversions.
    • Product Features: Evaluating new app features or changes in user interface.
  4. Statistical Considerations:

    • Use metrics like the p-value and confidence interval to assess the significance of results (a worked z-test sketch follows this list).
    • Ensure sufficient sample size for reliable conclusions.
  5. Tools:

    • Google Optimize: Google's free website A/B testing tool (discontinued in September 2023).
    • Optimizely: A robust platform for experimentation and optimization.
    • Statistical Software: R, Python (SciPy, Statsmodels).
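
The significance check in step 4 can be sketched with statsmodels' two-proportion z-test; the visitor and conversion counts below are hypothetical:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

# Hypothetical experiment: 4,800 of 100,000 control users converted
# vs. 5,250 of 100,000 users in the test group.
conversions = np.array([4800, 5250])
visitors = np.array([100_000, 100_000])

stat, p_value = proportions_ztest(conversions, visitors)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Difference is statistically significant at the 5% level.")
else:
    print("No significant difference detected; consider a larger sample.")
```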

How These Sub-Areas Interact

  • Business Analytics uses insights from time-series analysis and A/B testing to support strategic decisions.
  • Time-Series Analysis provides predictive insights that guide business planning and operations.
  • A/B Testing validates changes or optimizations suggested by analytics or time-series trends.


Why Study Data Analysis

Turning Raw Data into Meaningful Insight

Data analysis is the process of systematically inspecting, cleaning, transforming, and interpreting data to discover useful information and support decision-making. For students preparing for university, studying data analysis builds foundational skills that are applicable across nearly all fields of study. It teaches students how to make evidence-based decisions, solve problems using real-world data, and develop analytical thinking—skills that are increasingly valued in both academic and professional environments.

Learning to Work with Quantitative and Qualitative Data

Students gain experience handling different types of data, including numbers, categories, surveys, and observations. They learn how to organize data using spreadsheets or databases, apply statistical techniques, and summarize findings using charts, graphs, and tables. This practical knowledge helps students communicate insights clearly and persuasively—whether for class projects, research reports, or business presentations.

Mastering Tools and Techniques for Real-World Applications

Data analysis involves the use of software tools such as Excel, Python, R, and SQL, as well as visualization platforms like Tableau or Power BI. Students learn to perform descriptive, inferential, and predictive analysis. These techniques are widely used in fields such as business, science, healthcare, economics, education, and social research, making data analysis a versatile and valuable skill set.

Improving Academic Research and Project Outcomes

In university settings, the ability to analyze data enhances the quality and credibility of student research. Whether working on science experiments, survey-based studies, or policy analysis, students who understand data analysis can draw more accurate conclusions, support their arguments with evidence, and produce higher-impact work. This skill not only strengthens academic performance but also prepares students for independent research and collaboration.

Preparing for Careers in a Data-Driven World

A foundation in data analysis supports further study in disciplines such as statistics, business analytics, economics, public health, psychology, and environmental science. It also leads to career opportunities in data analysis, market research, consulting, and government. For university-bound learners, studying data analysis provides the analytical confidence and technical literacy needed to thrive in a world where data informs every major decision.
 

Data Analysis: Conclusion

Data analysis combines these sub-areas to uncover actionable insights, making it a cornerstone of modern decision-making across industries. 

Data Analysis: Review Questions and Answers

1. What is data analysis and why is it critical in data science?
Answer: Data analysis is the systematic process of inspecting, cleaning, transforming, and modeling data to extract meaningful information and support decision-making. It is critical in data science because it turns raw data into actionable insights that guide business strategy and operational improvements. This process helps identify trends, patterns, and anomalies that might otherwise remain hidden, leading to more informed decisions. Furthermore, effective data analysis underpins all other data science tasks, from predictive modeling to data visualization.

2. What are the primary steps in a data analysis workflow?
Answer: The primary steps in a data analysis workflow include data collection, data cleaning, exploratory data analysis, modeling, and interpretation of results. Data collection gathers the necessary raw information, while data cleaning ensures the dataset is accurate and free from errors. Exploratory data analysis helps uncover patterns and anomalies, and modeling uses statistical or machine learning techniques to predict outcomes or test hypotheses. Finally, interpreting the results translates the analysis into actionable business insights.

3. How does data cleaning improve the quality of analysis?
Answer: Data cleaning improves the quality of analysis by removing errors, inconsistencies, and irrelevant information from raw datasets. This step ensures that the data is accurate and reliable, which is essential for building robust analytical models. Clean data reduces the risk of misleading results and enables more precise insights. Additionally, systematic data cleaning helps standardize data formats, making it easier to integrate and analyze information from multiple sources.
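
A minimal pandas sketch of typical cleaning steps (deduplication, format standardization, and imputation) might look like this; the dataset and its columns are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical raw customer data with typical quality problems.
raw = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "age":         [34, np.nan, np.nan, 29, 410],   # missing value + outlier
    "country":     ["US", "us", "us", "UK", "US"],  # inconsistent casing
})

clean = (
    raw.drop_duplicates(subset="customer_id")               # remove duplicate records
       .assign(country=lambda d: d["country"].str.upper())  # standardize formats
)
clean.loc[clean["age"] > 120, "age"] = np.nan               # flag implausible values
clean["age"] = clean["age"].fillna(clean["age"].median())   # impute missing ages
print(clean)
```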

4. What role do statistical methods play in data analysis?
Answer: Statistical methods are fundamental in data analysis as they provide the tools to summarize data, test hypotheses, and infer conclusions about larger populations. Techniques such as regression analysis, hypothesis testing, and variance analysis help in understanding relationships and patterns within the data. These methods offer a rigorous framework for validating findings and ensuring that the insights drawn are statistically significant. Overall, statistics enable data analysts to quantify uncertainty and make data-driven decisions with confidence.
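
For instance, a two-sample t-test, one of the hypothesis tests mentioned above, can be run in a few lines with SciPy; the samples below are synthetic:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical order values from two store layouts (synthetic data).
layout_a = rng.normal(52.0, 8.0, 200)
layout_b = rng.normal(54.0, 8.0, 200)

# Two-sample t-test: is the difference in means statistically significant?
t_stat, p_value = stats.ttest_ind(layout_a, layout_b)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
```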

5. How is machine learning integrated into data analysis processes?
Answer: Machine learning is integrated into data analysis processes to automate the discovery of patterns and enhance predictive modeling. By training algorithms on historical data, machine learning models can identify complex relationships and forecast future trends with minimal human intervention. This integration allows for more dynamic and scalable analyses, particularly when handling large datasets. Additionally, machine learning techniques complement traditional statistical methods, providing deeper insights and improved accuracy in predictions.
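
A minimal scikit-learn sketch of this train-and-evaluate loop, using synthetic data in place of historical business records, might look like this:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic classification data stands in for historical records.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Train on past data, then evaluate on held-out data to estimate
# how the model generalizes to future observations.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"Held-out accuracy: {accuracy_score(y_test, model.predict(X_test)):.3f}")
```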

6. What are the benefits of data visualization in understanding datasets?
Answer: Data visualization transforms complex datasets into graphical representations such as charts, graphs, and dashboards, making the information easier to understand and interpret. It enables stakeholders to quickly identify trends, outliers, and patterns that might be missed in raw data. Visualizations help simplify complex analytical results, facilitating communication and informed decision-making across non-technical audiences. In essence, effective data visualization bridges the gap between data analysis and strategic business actions.
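
As a small illustration, a matplotlib line chart of hypothetical monthly revenue shows how even a simple visualization surfaces a trend at a glance:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical monthly revenue, in $ thousands (illustrative values).
months = pd.date_range("2024-01-01", periods=12, freq="MS")
revenue = [42, 45, 44, 50, 53, 57, 55, 60, 63, 61, 66, 70]

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(months, revenue, marker="o")
ax.set_title("Monthly Revenue ($k)")
ax.set_xlabel("Month")
ax.set_ylabel("Revenue")
fig.autofmt_xdate()  # tilt date labels for readability
plt.show()
```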

7. How do exploratory data analysis techniques help in uncovering insights?
Answer: Exploratory data analysis (EDA) techniques help in uncovering insights by using statistical graphics and other data visualization tools to understand the underlying structure of a dataset. EDA allows analysts to detect outliers, discover patterns, and formulate hypotheses about relationships within the data. It also assists in identifying errors or inconsistencies early in the analysis process. By gaining a deep understanding of the data, EDA sets the foundation for more accurate modeling and predictive analysis.
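
A few pandas one-liners cover the core EDA moves described here: summary statistics, correlations, and a crude outlier scan (the dataset below is synthetic):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
# Synthetic dataset standing in for any tabular source.
df = pd.DataFrame({
    "price":    rng.normal(100, 15, 500),
    "quantity": rng.integers(1, 20, 500),
})
df["revenue"] = df["price"] * df["quantity"]

print(df.describe())                # summary statistics: scale and spread
print(df.corr(numeric_only=True))   # pairwise correlations: relationships
# Crude outlier scan: prices more than 3 standard deviations above the mean.
print(df[df["price"] > df["price"].mean() + 3 * df["price"].std()])
```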

8. What are common challenges encountered during data analysis, and how can they be addressed?
Answer: Common challenges in data analysis include dealing with missing or inconsistent data, managing large volumes of information, and integrating data from multiple sources. These issues can lead to inaccurate or biased results if not properly addressed. To overcome these challenges, organizations should implement robust data cleaning procedures, employ efficient data management tools, and use standardized formats for data integration. Additionally, investing in skilled personnel and continuous training can help mitigate these challenges by ensuring that the analysis is both accurate and actionable.

9. How does big data influence the methods used in data analysis?
Answer: Big data influences the methods used in data analysis by requiring advanced techniques capable of handling high volume, velocity, and variety of information. Traditional data processing tools may fall short when dealing with massive datasets, necessitating the use of distributed computing, cloud-based analytics, and machine learning algorithms. Big data analytics also emphasizes real-time processing and advanced visualization to derive actionable insights quickly. As a result, organizations must continuously evolve their analytical approaches to leverage the full potential of big data.
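
One everyday tactic for data that outgrows memory is chunked processing; the sketch below aggregates a CSV in 1,000,000-row chunks (the file name and column are assumptions for illustration):

```python
from collections import Counter

import pandas as pd

# Stream a file too large for memory in 1,000,000-row chunks.
# "large_events.csv" and the "event_type" column are hypothetical.
total_rows = 0
event_counts = Counter()
for chunk in pd.read_csv("large_events.csv", chunksize=1_000_000):
    total_rows += len(chunk)
    event_counts.update(chunk["event_type"].value_counts().to_dict())

print(f"Rows processed: {total_rows:,}")
print(event_counts.most_common(5))
```

The same pattern scales further via distributed frameworks, which apply an equivalent split-aggregate strategy across many machines.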

10. How can data analysis drive better business decision-making?
Answer: Data analysis drives better business decision-making by providing concrete, data-backed insights that reduce uncertainty and guide strategic planning. It enables organizations to identify trends, forecast future events, and evaluate the effectiveness of current strategies. By turning complex datasets into clear, actionable information, data analysis supports evidence-based decisions that can improve operational efficiency and competitive advantage. Ultimately, integrating data analysis into decision-making processes fosters a culture of continuous improvement and innovation.

Data Analysis: Thought-Provoking Questions and Answers

1. How will advancements in artificial intelligence reshape data analysis methodologies in the next decade?
Answer: Advancements in artificial intelligence (AI) are poised to dramatically reshape data analysis methodologies by automating complex tasks such as data cleaning, feature selection, and pattern recognition. AI algorithms can learn from vast amounts of data to identify subtle trends and correlations that traditional methods might overlook, enabling deeper insights and more accurate predictions. This evolution will lead to more adaptive and self-improving models that continuously refine themselves as new data becomes available.
Moreover, the integration of AI into data analysis workflows will likely reduce the time and resources required to generate actionable insights. As a result, organizations can respond more rapidly to market changes and emerging trends, ultimately transforming their decision-making processes. The future will see a greater reliance on AI-powered analytics that blend human intuition with machine precision to drive innovation and competitive advantage.

2. What ethical dilemmas might arise from extensive data collection in big data analytics, and how can they be managed?
Answer: Extensive data collection in big data analytics raises ethical dilemmas related to privacy, consent, and potential misuse of sensitive information. Organizations may face challenges in balancing the need for comprehensive data with individuals’ rights to privacy and data protection. Ethical dilemmas can also emerge if data is collected without adequate transparency or used in ways that disadvantage certain groups.
To manage these issues, organizations must establish robust data governance policies that include clear guidelines for data collection, storage, and usage. Implementing data anonymization and encryption techniques can further protect individual privacy while still allowing for meaningful analysis. Regular audits, adherence to legal frameworks such as GDPR, and fostering an organizational culture of ethical data use are essential steps to ensure that the benefits of big data analytics do not come at the expense of ethical standards.

3. How can real-time analytics transform decision-making processes in dynamic business environments?
Answer: Real-time analytics can transform decision-making processes by providing immediate insights into operational performance, market trends, and customer behavior. This capability enables organizations to react swiftly to changing conditions, allowing for proactive adjustments in strategy and operations. With real-time data, decision-makers can identify and address issues before they escalate, reducing downtime and enhancing overall efficiency.
Furthermore, real-time analytics supports a more agile business model, where continuous monitoring and rapid feedback loops lead to quicker, more informed decisions. The integration of real-time insights into business intelligence systems can drive improvements in everything from supply chain management to customer engagement, ultimately fostering a competitive advantage in fast-paced markets.

4. What strategies can organizations adopt to ensure high data quality in large and complex datasets?
Answer: Ensuring high data quality in large and complex datasets requires a comprehensive strategy that includes robust data cleansing, validation, and integration processes. Organizations should implement automated tools and machine learning algorithms to detect and correct errors, inconsistencies, and missing values in the data. Establishing a strong data governance framework is also essential, as it sets clear standards and procedures for data management and quality assurance.
In addition, continuous monitoring and periodic audits of data sources can help maintain data integrity over time. Training staff on best practices for data handling and fostering a culture that prioritizes data quality are key components in sustaining reliable and actionable insights. By investing in these strategies, organizations can maximize the accuracy and effectiveness of their data analysis initiatives.

5. How might the evolution of big data technologies impact the scalability of data analysis solutions?
Answer: The evolution of big data technologies is set to significantly impact the scalability of data analysis solutions by providing more powerful and efficient processing frameworks. Innovations in distributed computing, cloud storage, and parallel processing allow organizations to handle exponentially larger datasets without compromising performance. This increased scalability ensures that data analysis solutions remain effective even as data volumes continue to grow.
Moreover, these technological advancements enable more cost-effective and flexible solutions that can dynamically allocate resources based on current demands. As a result, organizations will be better equipped to integrate vast amounts of data from diverse sources, ensuring that their analytical capabilities can scale in tandem with their business needs. This evolution will drive continuous improvements in both speed and accuracy, enhancing the overall value derived from big data analytics.

6. What are the potential benefits and risks of automating data analysis processes, and how can they be balanced?
Answer: Automating data analysis processes offers significant benefits such as increased speed, reduced human error, and the ability to process large datasets efficiently. Automation can free up valuable human resources, allowing analysts to focus on strategic decision-making rather than routine tasks. However, the risks include potential over-reliance on automated systems, which may lead to oversight of nuanced insights that require human interpretation, and the possibility of errors if the automation algorithms are not properly maintained or updated.
To balance these benefits and risks, organizations should implement a hybrid approach that combines automation with expert human oversight. Regular validation and testing of automated systems, coupled with continuous training for analysts, can ensure that the technology remains accurate and reliable. This balance not only maximizes efficiency but also maintains the depth and quality of analytical insights.

7. How can data visualization techniques evolve to improve the interpretability of complex analytics results?
Answer: Data visualization techniques can evolve by incorporating more interactive and dynamic elements that allow users to explore data in real time. Advanced visualizations such as 3D models, heat maps, and interactive dashboards can present complex analytics results in an intuitive and engaging manner. These tools help users uncover hidden patterns and correlations by enabling them to manipulate and drill down into the data, leading to deeper insights.
Furthermore, the integration of augmented reality (AR) and virtual reality (VR) in data visualization could provide immersive experiences that make complex data more accessible. By continuously innovating in visualization techniques, organizations can bridge the gap between sophisticated data analytics and actionable business insights, ensuring that decision-makers at all levels can understand and act on the data.

8. How does integrating predictive analytics with traditional data analysis enhance business forecasting?
Answer: Integrating predictive analytics with traditional data analysis enhances business forecasting by combining historical trends with forward-looking models to anticipate future outcomes more accurately. Predictive analytics uses machine learning algorithms and statistical techniques to analyze past data and forecast future events, providing valuable insights into market trends, customer behavior, and operational risks. This integration allows businesses to make informed decisions by not only understanding what has happened but also predicting what is likely to happen.
The combined approach leads to a more holistic view of the business environment, enabling organizations to optimize strategies, allocate resources more effectively, and mitigate potential risks. Ultimately, this fusion of methodologies drives innovation and competitive advantage by fostering proactive decision-making.

9. What role can collaborative analytics platforms play in enhancing data-driven decision-making across organizations?
Answer: Collaborative analytics platforms play a pivotal role in enhancing data-driven decision-making by enabling seamless sharing of data insights and fostering teamwork among data scientists, analysts, and business leaders. These platforms facilitate real-time collaboration, allowing teams to jointly explore datasets, refine analytical models, and interpret complex results. By breaking down silos, organizations can leverage collective expertise to generate more accurate and comprehensive insights.
In addition, collaborative platforms often include integrated visualization and reporting tools that make it easier for non-technical stakeholders to understand the data. This inclusivity ensures that data-driven insights are effectively communicated across the organization, ultimately leading to more informed and unified strategic decisions.

10. How might emerging trends in data privacy regulations affect data analysis practices and technologies?
Answer: Emerging trends in data privacy regulations are likely to have a significant impact on data analysis practices and technologies by imposing stricter controls on data collection, storage, and processing. Organizations will need to adapt their analytical methods to ensure compliance with regulations such as GDPR and CCPA, which may involve implementing advanced anonymization and encryption techniques. These changes could drive the development of new tools and platforms that prioritize privacy while still delivering actionable insights from large datasets.
Furthermore, regulatory pressures may encourage a shift towards more transparent data practices, requiring organizations to be more accountable for how data is used and shared. This evolution will not only safeguard individual privacy but also build trust with customers and stakeholders, ultimately influencing the future design and deployment of data analysis technologies.

11. What strategies can organizations employ to ensure the long-term sustainability of their data analysis initiatives in a rapidly changing technological landscape?
Answer: Organizations can ensure the long-term sustainability of their data analysis initiatives by investing in scalable and flexible infrastructure such as cloud-based platforms and distributed computing systems. Emphasizing continuous learning and adaptation, companies should regularly update their analytical models and data processing technologies to keep pace with emerging trends and challenges. Establishing robust data governance frameworks and promoting a culture of data literacy are also critical to sustaining long-term initiatives.
Moreover, fostering cross-functional collaboration between IT, data science teams, and business units helps align analytical efforts with strategic objectives. This integrated approach ensures that data analysis remains a core component of decision-making processes, driving innovation and maintaining competitive advantage over time.

12. How can organizations measure the ROI of their data analysis projects beyond simple cost savings?
Answer: Organizations can measure the ROI of their data analysis projects by evaluating a broad range of performance metrics, including improvements in decision-making speed, increased customer satisfaction, and enhanced operational efficiency. Beyond direct cost savings, qualitative benefits such as better market positioning, risk mitigation, and innovation capacity should be considered. Metrics like reduced time-to-insight, increased revenue from data-driven strategies, and higher employee productivity provide a comprehensive view of ROI.
Additionally, conducting regular surveys and performance reviews can help quantify the intangible benefits of data analysis, such as improved strategic alignment and competitive advantage. By adopting a multidimensional approach to ROI assessment, organizations can capture the full spectrum of value generated by their data analysis initiatives.

Data Analysis: Numerical Problems and Solutions

1. A big data analytics system processes 8,000,000 records per day. If a sampling method selects 5% of these records for analysis, calculate the sample size per day, the total sample size for a month (30 days), and confirm the average daily sample size from the monthly total.
Solution:
• Step 1: Daily sample size = 8,000,000 × 0.05 = 400,000 records.
• Step 2: Monthly sample size = 400,000 × 30 = 12,000,000 records.
• Step 3: Average daily sample = 12,000,000 ÷ 30 = 400,000 records.

2. A machine learning model is trained on a dataset of 500,000 examples and achieves an accuracy of 88%. Calculate the number of correctly predicted examples, the number of misclassified examples, and the misclassification rate percentage.
Solution:
• Step 1: Correct predictions = 500,000 × 0.88 = 440,000.
• Step 2: Misclassified examples = 500,000 – 440,000 = 60,000.
• Step 3: Misclassification rate = (60,000 ÷ 500,000) × 100 = 12%.

3. A data processing pipeline handles 150,000 records per hour. If processing efficiency is improved by 40%, calculate the new processing rate per hour, the total records processed in a 24-hour period before and after the improvement, and the percentage increase in daily processing.
Solution:
• Step 1: New processing rate = 150,000 × 1.40 = 210,000 records per hour.
• Step 2: Daily total before = 150,000 × 24 = 3,600,000 records; after = 210,000 × 24 = 5,040,000 records.
• Step 3: Percentage increase = ((5,040,000 – 3,600,000) ÷ 3,600,000) × 100 = 40%.

4. A regression model has a mean absolute error (MAE) of $1,500. If improvements reduce the MAE by 30%, calculate the new MAE and the absolute reduction in error per prediction.
Solution:
• Step 1: Error reduction = $1,500 × 0.30 = $450.
• Step 2: New MAE = $1,500 – $450 = $1,050.
• Step 3: The absolute reduction in error per prediction is $450.

5. A data visualization dashboard displays 10 key metrics updated every 15 minutes. Calculate the number of updates per day, per month (30 days), and per year (365 days).
Solution:
• Step 1: Updates per day = (24 × 60) ÷ 15 = 96 updates.
• Step 2: Updates per month = 96 × 30 = 2,880 updates.
• Step 3: Updates per year = 96 × 365 = 35,040 updates.

6. A clustering algorithm segments 1,000,000 data points into 10 clusters. If one cluster contains 15% of the data, calculate the number of points in that cluster, the number remaining for the other clusters, and the average number of points per remaining cluster.
Solution:
• Step 1: Points in the cluster = 1,000,000 × 0.15 = 150,000.
• Step 2: Remaining points = 1,000,000 – 150,000 = 850,000.
• Step 3: Average per remaining cluster = 850,000 ÷ (10 – 1) = 850,000 ÷ 9 ≈ 94,444 points.

7. A predictive model takes 0.004 seconds per prediction. If 2,500,000 predictions are made, calculate the total processing time in seconds, convert it to minutes, and then to hours.
Solution:
• Step 1: Total time in seconds = 2,500,000 × 0.004 = 10,000 seconds.
• Step 2: Time in minutes = 10,000 ÷ 60 ≈ 166.67 minutes.
• Step 3: Time in hours = 166.67 ÷ 60 ≈ 2.78 hours.

8. A company’s big data analytics project increases revenue by 8% on an initial revenue of $2,000,000. Calculate the revenue after the increase, the absolute revenue gain, and the percentage increase.
Solution:
• Step 1: Revenue gain = $2,000,000 × 0.08 = $160,000.
• Step 2: New revenue = $2,000,000 + $160,000 = $2,160,000.
• Step 3: Percentage increase = ($160,000 ÷ $2,000,000) × 100 = 8%.

9. A dataset contains 10 features and 1,000,000 records. If feature reduction techniques remove 30% of the features, calculate the new number of features, and the percentage reduction in the dataset’s dimensionality.
Solution:
• Step 1: Features removed = 10 × 0.30 = 3 features.
• Step 2: New feature count = 10 – 3 = 7 features.
• Step 3: Percentage reduction = (3 ÷ 10) × 100 = 30%.

10. A linear regression model is defined as y = 4x + 5. For x = 20, calculate the predicted y, then if the actual y is 90, compute the absolute error and the percentage error relative to the actual value.
Solution:
• Step 1: Predicted y = 4 × 20 + 5 = 80 + 5 = 85.
• Step 2: Absolute error = |90 – 85| = 5.
• Step 3: Percentage error = (5 ÷ 90) × 100 ≈ 5.56%.

11. A time series model forecasts a monthly growth rate of 3% on an initial revenue of $500,000. Calculate the revenue after one month, after six months (compounded monthly), and the overall percentage growth over six months.
Solution:
• Step 1: Revenue after one month = $500,000 × 1.03 = $515,000.
• Step 2: Revenue after six months = $500,000 × (1.03)^6 ≈ $500,000 × 1.194052 ≈ $597,026.
• Step 3: Overall percentage growth = (($597,026 – $500,000) ÷ $500,000) × 100 ≈ 19.41%.
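
The compounding in Problem 11 is easy to verify in a few lines of Python:

```python
# Verify Problem 11: 3% monthly growth, compounded, from $500,000.
initial = 500_000
rate = 0.03

one_month = initial * (1 + rate)                      # $515,000.00
six_months = initial * (1 + rate) ** 6                # ≈ $597,026.15
growth_pct = (six_months - initial) / initial * 100   # ≈ 19.41%

print(f"After 1 month:  ${one_month:,.2f}")
print(f"After 6 months: ${six_months:,.2f}")
print(f"Six-month growth: {growth_pct:.2f}%")
```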

12. A data analytics project reduces operational costs by 20% from an initial cost of $800,000 annually. Calculate the annual cost after reduction, the cost savings, and the ROI if the project investment is $100,000.
Solution:
• Step 1: Annual cost after reduction = $800,000 × 0.80 = $640,000.
• Step 2: Cost savings = $800,000 – $640,000 = $160,000.
• Step 3: ROI = ($160,000 ÷ $100,000) × 100 = 160%.