Abalone Age Estimation with Predictive Modelling and Feature Analysis: A Web-Integrated Approach
The main purpose of the project was to build a web-based application that predicts the age of an abalone from its physical measurements using a pre-trained machine learning model. The application lets users enter the relevant features, receive a prediction, and store that prediction in a PostgreSQL database.
The project involved:
- Training and saving a machine learning model (LightGBM).
- Building a web interface using FastAPI.
- Integrating with a PostgreSQL database to persist predictions.
- Using Jinja2 templates to render HTML pages.
- Implementing logging to monitor the app’s behavior and errors.
This setup allows users to easily interact with the model and keeps a record of all predictions for later review.
Here’s a breakdown of everything I did:
Jupyter Notebook
Data Loading and Exploration:
- I loaded the dataset, performed basic exploratory data analysis, and visualized feature correlations using heatmaps and scatter plots.
- I encoded the ‘Sex’ feature with LabelEncoder.
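As a minimal sketch of the encoding step, assuming the ‘Sex’ column holds the usual abalone codes M, F, and I (the sample values below are made up for illustration), LabelEncoder assigns integer codes in sorted order of the class labels:

```python
from sklearn.preprocessing import LabelEncoder

# Hypothetical sample of the 'Sex' column (assumed codes: M, F, I).
sex_values = ["M", "F", "I", "M", "I"]

encoder = LabelEncoder()
sex_encoded = encoder.fit_transform(sex_values)

# Classes are stored in sorted order, so F -> 0, I -> 1, M -> 2.
classes = list(encoder.classes_)
print(classes)
print(list(sex_encoded))
```

The fitted encoder defines the mapping, so the same mapping has to be applied to the ‘Sex’ value submitted through the web form at prediction time.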
Outlier Detection:
- I applied IsolationForest to identify and remove outliers and visualized the results with box plots.
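The outlier step could look roughly like the sketch below. The data is synthetic, and the `contamination` and `random_state` values are assumptions chosen for the example, not the settings used in the notebook.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in for two numeric features, plus two obvious outliers.
rng = np.random.default_rng(0)
X = rng.normal(loc=0.5, scale=0.05, size=(200, 2))
X = np.vstack([X, [[5.0, 5.0], [-4.0, 6.0]]])

# contamination is the assumed share of outliers in the data.
iso = IsolationForest(contamination=0.01, random_state=0)
labels = iso.fit_predict(X)  # 1 = inlier, -1 = outlier

# Keep only the rows flagged as inliers.
X_clean = X[labels == 1]
print(X.shape[0], "rows before,", X_clean.shape[0], "rows after")
```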
Train-Test Split:
- I split the dataset into training and testing sets to prepare for model development.
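For illustration, a split of this kind with scikit-learn might look as follows. The 80/20 ratio and the `random_state` are assumptions, and the arrays are placeholders rather than the real features and target.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder feature matrix (10 rows, 2 columns) and target vector.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; fix the seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```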
Regression Models:
- I trained and optimized several regression models, including RandomForestRegressor, GradientBoostingRegressor, Ridge, and LightGBM.
- I used BayesianOptimization and Optuna for hyperparameter tuning.
- I evaluated the models using MSE, MAE, and R² metrics.
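To make the three evaluation metrics concrete, here is a small hand-rolled sketch of their definitions. The function name and the sample values are illustrative only. In practice, `sklearn.metrics` provides `mean_squared_error`, `mean_absolute_error`, and `r2_score` for the same purpose.

```python
def regression_metrics(y_true, y_pred):
    """Return MSE, MAE, and R² for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    mse = sum(e * e for e in errors) / n   # mean squared error
    mae = sum(abs(e) for e in errors) / n  # mean absolute error

    # R² = 1 - (residual sum of squares / total sum of squares)
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot

    return {"mse": mse, "mae": mae, "r2": r2}


# Made-up ages and predictions, just to exercise the function.
metrics = regression_metrics([10, 12, 8, 14], [11, 12, 7, 13])
print(metrics)
```

Lower MSE and MAE are better, while R² closer to 1 means the model explains more of the variance in the target.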
Model Comparison:
- I visualized and compared the models’ performance using bar plots.
Feature Importance:
- I used SHAP values to analyze feature importance and created various SHAP plots to visualize the results.
Classification Models:
- I defined objective functions for tuning GradientBoostingClassifier, DecisionTreeClassifier, and RandomForestClassifier with Optuna.
- I then trained and evaluated these classifiers using ROC-AUC curves.
FastAPI Application (app1.py):
App Initialization:
- I set up a FastAPI application with all necessary imports and configurations.
- I initialized logging with a custom logging_config.py.
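The contents of logging_config.py are not shown here, so the following is only a sketch of what such a module might contain, using the standard library's `logging.config.dictConfig`. The format string, the `setup_logging` helper, and the logger name `app1` are assumptions.

```python
import logging
import logging.config

# Hypothetical configuration: log INFO and above to the console.
LOGGING_CONFIG = {
    "version": 1,
    "disable_existing_loggers": False,
    "formatters": {
        "default": {
            "format": "%(asctime)s %(levelname)s %(name)s: %(message)s",
        },
    },
    "handlers": {
        "console": {
            "class": "logging.StreamHandler",
            "formatter": "default",
            "level": "INFO",
        },
    },
    "root": {"handlers": ["console"], "level": "INFO"},
}


def setup_logging() -> logging.Logger:
    """Apply the configuration and return the application logger."""
    logging.config.dictConfig(LOGGING_CONFIG)
    return logging.getLogger("app1")


logger = setup_logging()
logger.info("Application starting")
```

Keeping the configuration in its own module means the app file only needs to call one setup function at startup.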
Model Loading:
- I loaded the pre-trained LightGBM model from model.lgb.
Database Setup:
- I established a connection to a PostgreSQL database.
- I created a table to store predictions if it didn’t already exist.
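The create-if-missing and insert pattern can be sketched as below. To keep the example self-contained it uses the standard library's `sqlite3` with an in-memory database as a stand-in for PostgreSQL. The table name `predictions`, the helper names, and the sample values are assumptions, while the feature columns follow the form fields used by the app. With PostgreSQL the same statements would go through a driver such as psycopg2, which uses `%s`-style placeholders instead of the `:name` style shown here.

```python
import sqlite3

# Hypothetical schema: one row per prediction, with its input features.
CREATE_TABLE_SQL = """
CREATE TABLE IF NOT EXISTS predictions (
    id INTEGER PRIMARY KEY,
    sex TEXT,
    length REAL,
    diameter REAL,
    height REAL,
    weight REAL,
    shucked_weight REAL,
    viscera_weight REAL,
    shell_weight REAL,
    prediction REAL
)
"""

INSERT_SQL = """
INSERT INTO predictions
    (sex, length, diameter, height, weight,
     shucked_weight, viscera_weight, shell_weight, prediction)
VALUES
    (:sex, :length, :diameter, :height, :weight,
     :shucked_weight, :viscera_weight, :shell_weight, :prediction)
"""


def init_db(conn):
    """Create the predictions table if it does not exist yet."""
    conn.execute(CREATE_TABLE_SQL)
    conn.commit()


def save_prediction(conn, features, prediction):
    """Store one prediction with its inputs and return the new row id."""
    cur = conn.execute(INSERT_SQL, {**features, "prediction": prediction})
    conn.commit()
    return cur.lastrowid


conn = sqlite3.connect(":memory:")
init_db(conn)
init_db(conn)  # calling it again is harmless thanks to IF NOT EXISTS

# Illustrative input values, not taken from the real dataset run.
sample = {
    "sex": "M", "length": 0.455, "diameter": 0.365, "height": 0.095,
    "weight": 0.514, "shucked_weight": 0.2245,
    "viscera_weight": 0.101, "shell_weight": 0.15,
}
row_id = save_prediction(conn, sample, 10.5)
stored = conn.execute("SELECT sex, prediction FROM predictions").fetchall()
print(row_id, stored)
```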
API Endpoints:
- GET /: Serves the main input page (index.html).
- POST /predict: Receives input data, makes predictions with the LightGBM model, stores the predictions in the database, and returns the result page (result.html).
- GET /view-predictions: Displays stored predictions from the database on a page (predictions.html).
HTML Templates:
index.html:
- This is the main input page where users can enter features for prediction.
- It includes a form with fields for sex, length, diameter, height, weight, shucked_weight, viscera_weight, and shell_weight.
result.html:
- This page displays the prediction result after form submission.
- It shows the predicted value returned by the model.
predictions.html:
- Lists all stored predictions, including input data and corresponding predictions.
- Provides a historical view of all predictions made with the application.
Data Drift Monitoring with Evidently.ai:
Data Preparation:
- I prepared my dataset by dropping the ‘id’ column and adding the prediction results to the DataFrame df3.
- I sampled 5000 records from the cleaned dataset (df_cleaned) as the reference data and 5000 records from df3 as the current data.
Column Mapping:
- I defined a ColumnMapping to specify the target variable (‘Age’), numerical features, and categorical features. The numerical features were selected based on the columns in df1, excluding ‘Age’, ‘id’, and ‘Sex’.
Report Generation:
- I created a Report instance with the DataDriftPreset metric to detect data drift.
- I ran the report using the reference and current data, along with the column mapping.
- I displayed the report and saved it as an HTML file (file.html) for further review.
About
An AI Geek and a lifelong learner, who thrives in coding and problem-solving through ML, DL, and LLMs.
Copyright ©2024 All rights reserved.