Introduction:
Introduce the Boston housing dataset, its origins, and its relevance to real estate analysis and machine learning.
Data Acquisition:
- Describe how the dataset was sourced from a public statistical repository and loaded into a Python environment using pandas and requests.
- Provide an overview of the dataset's structure and initial exploration.
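A minimal sketch of the acquisition and first-look steps. It assumes the StatLib copy of the data (the URL below), which begins with a 22-line text description and then wraps each 14-value record across two lines. Both the URL and the header length are assumptions to check against the actual source. `fetch_raw_text`, `parse_boston_text` and `overview` are hypothetical helper names.

```python
import numpy as np
import pandas as pd

# Column order from the dataset description: 13 features followed by MEDV.
COLUMNS = ["CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
           "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT", "MEDV"]

# Assumption: StatLib copy with a 22-line description before the data.
STATLIB_URL = "http://lib.stat.cmu.edu/datasets/boston"
HEADER_LINES = 22


def fetch_raw_text(url: str = STATLIB_URL, timeout: float = 30.0) -> str:
    """Download the raw text file. requests is imported here so the parsing
    helpers below can be used without network access or the package."""
    import requests

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors
    return response.text


def parse_boston_text(text: str, header_lines: int = HEADER_LINES) -> pd.DataFrame:
    """Skip the description, then regroup the whitespace-separated numbers into
    rows of 14 values, however the records are wrapped across lines."""
    body = text.splitlines()[header_lines:]
    tokens = " ".join(body).split()
    if len(tokens) % len(COLUMNS) != 0:
        raise ValueError("token count is not a multiple of 14; check header_lines")
    values = np.array([float(t) for t in tokens]).reshape(-1, len(COLUMNS))
    return pd.DataFrame(values, columns=COLUMNS)


def overview(df: pd.DataFrame) -> dict:
    """Basic structural facts for an initial look at the data."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "missing": df.isna().sum().to_dict(),
    }
```

Typical use would be `df = parse_boston_text(fetch_raw_text())` followed by `overview(df)`, `df.head()` and `df.describe()`. Regrouping all tokens into rows of 14 avoids relying on how the two-line records happen to be split.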
Data Cleaning and Preparation:
- Detail the steps taken to clean the dataset, including handling missing values, converting data types, and ensuring consistency across columns.
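One way to express those cleaning steps, as a sketch only. The specific choices here (upper-casing column names, coercing unparseable entries to NaN, dropping exact duplicates, median imputation, integer casts for CHAS and RAD) are illustrative assumptions rather than the steps actually used, and `clean_housing` is a hypothetical helper.

```python
import pandas as pd


def clean_housing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of a raw housing DataFrame."""
    out = df.copy()
    # Consistent column names: trimmed and upper-case, e.g. " crim" -> "CRIM".
    out.columns = [str(c).strip().upper() for c in out.columns]
    # Convert every column to numbers; unparseable entries such as "NA" become NaN.
    out = out.apply(pd.to_numeric, errors="coerce")
    # Remove exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Fill any remaining gaps with each column's median (robust to skewed columns).
    out = out.fillna(out.median(numeric_only=True))
    # CHAS is a 0/1 dummy and RAD is an index, so store them as integers.
    for col in ("CHAS", "RAD"):
        if col in out.columns:
            out[col] = out[col].round().astype(int)
    return out
```

If the cleaned data later feeds a train/test split, computing the imputation medians on the training portion only avoids leaking test information into the model.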
Exploratory Data Analysis (EDA):
- Discuss the exploratory analysis conducted on key features:
  - Per capita crime rate by town (CRIM)
  - Proportion of residential land zoned for lots over 25,000 sq. ft. (ZN)
  - Proportion of non-retail business acres per town (INDUS)
  - Charles River dummy variable (CHAS): 1 if the tract bounds the river, 0 otherwise
  - Nitric oxides concentration, in parts per 10 million (NOX)
  - Average number of rooms per dwelling (RM)
  - Proportion of owner-occupied units built prior to 1940 (AGE)
  - Weighted distances to five Boston employment centres (DIS)
  - Index of accessibility to radial highways (RAD)
  - Full-value property-tax rate per $10,000 (TAX)
  - Pupil-teacher ratio by town (PTRATIO)
  - 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town (B)
  - Percentage of lower-status population (LSTAT)
  - Median value of owner-occupied homes in $1000s (MEDV), the target variable
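The feature-by-feature exploration can start from two summary tables. A minimal sketch, assuming the data sits in a pandas DataFrame with the column names listed above. `eda_summary` is a hypothetical helper, and Pearson correlation with MEDV is one reasonable first measure of association, not the only one.

```python
import pandas as pd


def eda_summary(df: pd.DataFrame, target: str = "MEDV"):
    """Return (stats, corr): per-column descriptive statistics with skewness,
    and each feature's Pearson correlation with the target, strongest first."""
    stats = df.describe().T[["mean", "std", "min", "50%", "max"]].copy()
    stats["skew"] = df.skew()  # large positive values flag long right tails
    corr = df.corr()[target].drop(target)
    order = corr.abs().sort_values(ascending=False).index
    return stats, corr.loc[order]
```

Sorting by absolute correlation keeps strong negative relationships near the top instead of burying them at the end.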
Data Visualization:
- Showcase histograms and plots that illustrate the distributions and correlations of these features.
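A sketch of the two kinds of figures, using matplotlib only. The grid layout, bin count and colour map are arbitrary choices, and `plot_histograms` / `plot_correlation_heatmap` are hypothetical helper names. In each histogram the x-axis is the feature's value and the y-axis is the number of rows (census tracts) that fall in each bin.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_histograms(df: pd.DataFrame, bins: int = 30, ncols: int = 4):
    """One histogram per column, arranged in a grid."""
    n = len(df.columns)
    nrows = int(np.ceil(n / ncols))
    fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows))
    axes = np.atleast_1d(axes).ravel()
    for ax, col in zip(axes, df.columns):
        ax.hist(df[col].dropna(), bins=bins)
        ax.set_title(col)
        ax.set_ylabel("Frequency")
    for ax in axes[n:]:
        ax.set_visible(False)  # hide unused panels in the grid
    fig.tight_layout()
    return fig


def plot_correlation_heatmap(df: pd.DataFrame):
    """Pearson correlation matrix drawn as a colour-coded grid."""
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(8, 7))
    im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.columns)))
    ax.set_yticklabels(corr.columns)
    fig.colorbar(im, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    return fig
```

Fixing the colour scale to [-1, 1] keeps positive and negative correlations visually comparable across figures; `fig.savefig(...)` can then export each figure for the report.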
Machine Learning:
- Implement a linear regression model using scikit-learn to predict housing prices (MEDV) from selected features.
- Discuss model evaluation metrics, such as mean squared error (MSE) and the coefficient of determination (R²), computed on held-out test data.
- Optional: Explore polynomial regression or feature engineering to improve model performance.
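A sketch of the modelling step, assuming a cleaned DataFrame with a MEDV column. `fit_and_evaluate`, the 80/20 split and the random seed are illustrative choices, not the project's actual settings. Setting `degree=2` or higher inserts scikit-learn's `PolynomialFeatures`, which covers the optional polynomial-regression item.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler


def fit_and_evaluate(df: pd.DataFrame, features, target="MEDV",
                     degree=1, test_size=0.2, seed=42):
    """Fit (polynomial) linear regression on a training split and score it on
    held-out data. MSE = mean((y - y_hat)^2); R^2 = 1 - SS_res / SS_tot."""
    X_train, X_test, y_train, y_test = train_test_split(
        df[features], df[target], test_size=test_size, random_state=seed
    )
    steps = []
    if degree > 1:
        # Expand features into powers and interaction terms up to `degree`.
        steps.append(PolynomialFeatures(degree=degree, include_bias=False))
    # Standardise, then fit ordinary least squares.
    steps += [StandardScaler(), LinearRegression()]
    model = make_pipeline(*steps)
    model.fit(X_train, y_train)
    predictions = model.predict(X_test)
    metrics = {
        "mse": mean_squared_error(y_test, predictions),
        "r2": r2_score(y_test, predictions),
    }
    return model, metrics
```

For example, `fit_and_evaluate(df, ["RM", "LSTAT", "PTRATIO"])` versus the same call with `degree=2` compares a linear and a quadratic model on the same split. Scaling does not change ordinary least-squares predictions, but it keeps polynomial terms on comparable scales.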
Findings:
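Based on the first graph below, a histogram of weighted distances to employment centres (DIS), the y-axis (frequency) shows how many census tracts fall into each distance bin. Most tracts fall in the lower-distance bins, so the majority of housing areas in the dataset lie relatively close to employment centres. Because the histogram counts tracts rather than people, it shows where housing is located, not which locations residents prefer.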
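A histogram's bar heights depend on the chosen bins, so a "majority" reading can be checked against the underlying counts. A minimal sketch with hypothetical helpers `histogram_table` and `share_at_or_below`. The 10-bin default and any distance threshold passed in are arbitrary choices, not values from the original analysis.

```python
import numpy as np
import pandas as pd


def histogram_table(values, bins=10) -> pd.DataFrame:
    """Tabulate the bars a histogram would draw, plus each bin's share of rows."""
    counts, edges = np.histogram(np.asarray(values, dtype=float), bins=bins)
    table = pd.DataFrame({
        "bin_start": edges[:-1],
        "bin_end": edges[1:],
        "count": counts,
    })
    table["share"] = table["count"] / table["count"].sum()
    return table


def share_at_or_below(values, threshold) -> float:
    """Fraction of rows whose value is at most `threshold`."""
    arr = np.asarray(values, dtype=float)
    return float(np.mean(arr <= threshold))
```

For instance, `histogram_table(df["DIS"])` lists the counts behind the first graph, and `share_at_or_below(df["DIS"], t)` gives the fraction of tracts within a chosen distance `t`.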

Based on the second graph below, the y-axis shows the pupil-teacher ratio by town (PTRATIO), that is, the number of students per teacher.