Abalone Age Estimation with Predictive Modelling and Feature Analysis: A Web-Integrated Approach

The main purpose of the project was to build a web-based application that predicts the age of abalone from physical measurements using a pre-trained machine learning model. The application allows users to input the relevant features, receive a prediction, and store that prediction in a PostgreSQL database.

The project involved:

  • Training and saving a machine learning model (LightGBM).
  • Building a web interface using FastAPI.
  • Integrating with a PostgreSQL database to persist predictions.
  • Using Jinja2 templates to render HTML pages.
  • Implementing logging to monitor the app’s behavior and errors.

This setup allows users to easily interact with the model and keeps a record of all predictions for later review.

Here’s a breakdown of everything I did:

Jupyter Notebook

Data Loading and Exploration:

  • I loaded the dataset, performed basic exploratory data analysis, and visualized feature correlations using heatmaps and scatter plots.
  • I encoded the ‘Sex’ feature with LabelEncoder.
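
In code terms, this step might look like the following sketch (the CSV filename abalone.csv is illustrative; the DataFrame name df1 matches the one referenced later in the drift-monitoring section):

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder

# Load the dataset and take a first look (filename is illustrative)
df1 = pd.read_csv("abalone.csv")
print(df1.head())
print(df1.describe())

# Visualize feature correlations with a heatmap
sns.heatmap(df1.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.show()

# Encode the categorical 'Sex' feature (M/F/I) as integers
le = LabelEncoder()
df1["Sex"] = le.fit_transform(df1["Sex"])
```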

Outlier Detection:

  • I applied IsolationForest to identify and remove outliers and visualized the results with box plots.
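
A sketch of the outlier-removal step, continuing from the snippet above (the contamination rate is an illustrative assumption, not the notebook’s exact setting):

```python
from sklearn.ensemble import IsolationForest

# Fit IsolationForest and flag each row as inlier (1) or outlier (-1);
# the contamination rate here is an assumption for illustration
iso = IsolationForest(contamination=0.05, random_state=42)
labels = iso.fit_predict(df1.drop(columns=["id"]))

# Keep only the inliers
df_cleaned = df1[labels == 1].reset_index(drop=True)

# Box plots to inspect the distributions after removal
df_cleaned.drop(columns=["id"]).boxplot(rot=45)
plt.show()
```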

Train-Test Split:

  • I split the dataset into training and testing sets to prepare for model development.
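
A typical split, assuming ‘Age’ is the target (as in the drift section below) and an illustrative 20% hold-out:

```python
from sklearn.model_selection import train_test_split

X = df_cleaned.drop(columns=["Age", "id"])
y = df_cleaned["Age"]

# Hold out 20% of the data for testing; the split ratio is illustrative
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```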

Regression Models:

  • I trained and optimized several regression models, including RandomForestRegressor, GradientBoostingRegressor, Ridge, and LightGBM.
  • I used BayesianOptimization and Optuna for hyperparameter tuning.
  • I evaluated the models using the MSE, MAE, and R² metrics.
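
To give a flavor of the tuning loop, here is a sketch of an Optuna objective for LightGBM; the search space and trial count are assumptions, not the notebook’s exact settings:

```python
import optuna
import lightgbm as lgb
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

def objective(trial):
    # Illustrative search space, not the exact ranges from the notebook
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 1000),
        "learning_rate": trial.suggest_float("learning_rate", 1e-3, 0.3, log=True),
        "num_leaves": trial.suggest_int("num_leaves", 15, 255),
    }
    model = lgb.LGBMRegressor(**params, random_state=42)
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=50)

# Refit the best configuration and report all three metrics
best_model = lgb.LGBMRegressor(**study.best_params, random_state=42).fit(X_train, y_train)
preds = best_model.predict(X_test)
print("MSE:", mean_squared_error(y_test, preds))
print("MAE:", mean_absolute_error(y_test, preds))
print("R²:", r2_score(y_test, preds))
```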

Model Comparison:

  • I visualized and compared the models’ performance using bar plots.
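
The comparison plot can be as simple as the sketch below. It trains default-parameter models purely to have something to plot; the notebook compared the tuned versions instead:

```python
import lightgbm as lgb
import matplotlib.pyplot as plt
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# Default-parameter models just to illustrate the comparison plot
models = {
    "RandomForest": RandomForestRegressor(random_state=42),
    "GradientBoosting": GradientBoostingRegressor(random_state=42),
    "Ridge": Ridge(),
    "LightGBM": lgb.LGBMRegressor(random_state=42),
}

mse_scores = {
    name: mean_squared_error(y_test, m.fit(X_train, y_train).predict(X_test))
    for name, m in models.items()
}

plt.bar(mse_scores.keys(), mse_scores.values())
plt.ylabel("Test MSE")
plt.title("Model comparison")
plt.show()
```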

Feature Importance:

  • I used SHAP values to analyze feature importance and created various SHAP plots to visualize the results.
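
For a tree model like LightGBM, the SHAP analysis typically looks like this (reusing best_model and X_test from the tuning sketch above):

```python
import shap

# TreeExplainer is the fast path for tree ensembles such as LightGBM
explainer = shap.TreeExplainer(best_model)
shap_values = explainer.shap_values(X_test)

# Beeswarm plot: per-feature impact and direction for every sample
shap.summary_plot(shap_values, X_test)

# Bar plot: mean absolute SHAP value per feature (global importance)
shap.summary_plot(shap_values, X_test, plot_type="bar")
```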

Classification Models:

  • I defined objective functions for tuning GradientBoostingClassifier, DecisionTreeClassifier, and RandomForestClassifier with Optuna.
  • I trained and evaluated these classifiers and compared their performance using ROC-AUC curves, as sketched below.
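
A sketch of one such objective plus the ROC-AUC evaluation; the binary label y_class is hypothetical, since the writeup doesn’t spell out how the classes were derived from the data:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, RocCurveDisplay
from sklearn.model_selection import train_test_split

# y_class is a hypothetical binary label (e.g., young vs. old abalone)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X, y_class, test_size=0.2, random_state=42
)

def rf_objective(trial):
    # Illustrative search space
    params = {
        "n_estimators": trial.suggest_int("n_estimators", 100, 500),
        "max_depth": trial.suggest_int("max_depth", 3, 20),
    }
    clf = RandomForestClassifier(**params, random_state=42).fit(Xc_train, yc_train)
    return roc_auc_score(yc_test, clf.predict_proba(Xc_test)[:, 1])

study = optuna.create_study(direction="maximize")
study.optimize(rf_objective, n_trials=30)

# Plot the ROC curve for the best configuration
best_clf = RandomForestClassifier(**study.best_params, random_state=42).fit(Xc_train, yc_train)
RocCurveDisplay.from_estimator(best_clf, Xc_test, yc_test)
plt.show()
```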

FastAPI Application (app1.py):

App Initialization:

  • I set up a FastAPI application with all the necessary imports and configuration.
  • I initialized logging with a custom logging_config.py.
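
The initialization might look like this; setup_logging is a hypothetical helper name, since only the module logging_config.py is given:

```python
import logging

from fastapi import FastAPI, Request, Form
from fastapi.templating import Jinja2Templates

from logging_config import setup_logging  # hypothetical helper name

setup_logging()
logger = logging.getLogger(__name__)

app = FastAPI()
templates = Jinja2Templates(directory="templates")
```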

Model Loading:

  • I loaded the pre-trained LightGBM model from model.lgb.
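
Assuming the model was saved with LightGBM’s native Booster.save_model (consistent with the .lgb extension), loading looks like this; if it was pickled instead, joblib.load would be the counterpart:

```python
import lightgbm as lgb

# Load the pre-trained booster from its native model file
model = lgb.Booster(model_file="model.lgb")
```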

Database Setup:

  • I established a connection to a PostgreSQL database.
  • I created a table to store predictions if it didn’t already exist.
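
A sketch with psycopg2; the connection parameters and the table schema are assumptions, modeled on the input form fields listed later:

```python
import psycopg2

# Connection parameters are placeholders, not the project's real credentials
conn = psycopg2.connect(
    host="localhost", dbname="abalone", user="postgres", password="postgres"
)

# Create the predictions table if it doesn't already exist;
# the assumed schema mirrors the input form fields plus the predicted value
with conn.cursor() as cur:
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS predictions (
            id SERIAL PRIMARY KEY,
            sex INTEGER,
            length REAL,
            diameter REAL,
            height REAL,
            weight REAL,
            shucked_weight REAL,
            viscera_weight REAL,
            shell_weight REAL,
            prediction REAL
        )
        """
    )
conn.commit()
```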

API Endpoints:

  • GET /: Serves the main input page (index.html).
  • POST /predict: Receives input data, makes predictions with the LightGBM model, stores the predictions in the database, and returns the result page (result.html).
  • GET /view-predictions: Displays stored predictions from the database on a page (predictions.html).
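
Wired together, the three endpoints might look like the sketch below, reusing the app, templates, model, and conn objects from the previous snippets; the form field names follow index.html:

```python
import numpy as np
from fastapi.responses import HTMLResponse

@app.get("/", response_class=HTMLResponse)
async def index(request: Request):
    # Serve the main input form
    return templates.TemplateResponse("index.html", {"request": request})

@app.post("/predict", response_class=HTMLResponse)
async def predict(
    request: Request,
    sex: int = Form(...), length: float = Form(...), diameter: float = Form(...),
    height: float = Form(...), weight: float = Form(...),
    shucked_weight: float = Form(...), viscera_weight: float = Form(...),
    shell_weight: float = Form(...),
):
    # Run the model on the submitted features
    features = np.array([[sex, length, diameter, height, weight,
                          shucked_weight, viscera_weight, shell_weight]])
    prediction = float(model.predict(features)[0])

    # Persist the inputs and the prediction
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO predictions (sex, length, diameter, height, weight, "
            "shucked_weight, viscera_weight, shell_weight, prediction) "
            "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)",
            (sex, length, diameter, height, weight,
             shucked_weight, viscera_weight, shell_weight, prediction),
        )
    conn.commit()

    return templates.TemplateResponse(
        "result.html", {"request": request, "prediction": prediction}
    )

@app.get("/view-predictions", response_class=HTMLResponse)
async def view_predictions(request: Request):
    # Fetch every stored prediction for the history page
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM predictions")
        rows = cur.fetchall()
    return templates.TemplateResponse(
        "predictions.html", {"request": request, "predictions": rows}
    )
```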

HTML Templates:

index.html:

  • This is the main input page where users can enter features for prediction.
  • It includes a form with fields for sex, length, diameter, height, weight, shucked_weight, viscera_weight, and shell_weight.

result.html:

  • This page displays the prediction result after form submission.
  • It shows the predicted value returned by the model.

predictions.html:

  • Lists all stored predictions, including input data and corresponding predictions.
  • Provides a historical view of all predictions made with the application.

Data Drift Monitoring with Evidently.ai:

Data Preparation:

  • I prepared the dataset by dropping the ‘id’ column and adding the prediction results to the DataFrame df3.
  • I sampled 5000 records from the cleaned dataset (df_cleaned) as the reference data and 5000 records from df3 as the current data.

Column Mapping:

  • I defined a ColumnMapping to specify the target variable (‘Age’), the numerical features, and the categorical features. The numerical features were the columns of df1, excluding ‘Age’, ‘id’, and ‘Sex’.

Report Generation:

  • I created a Report instance with the DataDriftPreset metric to detect data drift.
  • I ran the report on the reference and current data, along with the column mapping.
  • I displayed the report and saved it as an HTML file (file.html) for further review.
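
Putting the drift check together, a sketch using the evidently.report API (as found in pre-0.7 versions of Evidently):

```python
from evidently import ColumnMapping
from evidently.report import Report
from evidently.metric_preset import DataDriftPreset

# Reference = cleaned historical data; current = recent data with predictions attached
reference = df_cleaned.sample(5000, random_state=42)
current = df3.sample(5000, random_state=42)

# Numerical features: columns of df1 minus the target, the id, and the categorical 'Sex'
numerical = [c for c in df1.columns if c not in ("Age", "id", "Sex")]
column_mapping = ColumnMapping(
    target="Age",
    numerical_features=numerical,
    categorical_features=["Sex"],
)

report = Report(metrics=[DataDriftPreset()])
report.run(reference_data=reference, current_data=current, column_mapping=column_mapping)
report.show()                 # display inline in a notebook
report.save_html("file.html")
```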