Loan Prediction Model

Developed machine learning models using Logistic Regression, Decision Trees, and Random Forest to predict loan eligibility from customer data. Conducted in-depth exploratory data analysis, data cleaning, and preprocessing, resulting in a highly accurate model in which credit history and co-applicant income were significant factors.

Introduction

This project aimed to automate the loan approval process in real time. Using customer data, we identified eligible customer segments and streamlined lending decisions.

Project Overview

Lending money, a fundamental function in finance, requires assessing a borrower's ability to repay. This project frames that assessment as a binary classification problem: given a customer's data, predict whether the customer is eligible for a loan. We developed machine learning models to make this prediction.

Key Objectives

Automate the loan eligibility assessment process.
Identify customer segments eligible for loans.
Optimize lending decisions.

Project Phases

Data Acquisition and Exploration

We obtained the dataset from Kaggle; it includes both training and test data.

Data Cleaning

Data cleaning was essential because many columns contained null values. We identified the columns whose null values exceeded a set threshold and removed them.
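The threshold-based column removal can be sketched as follows. This is a minimal illustration, not the project's code: it assumes the data is held as a dict of column name to list of values, that missing values are `None`, that the threshold is a fraction of rows (0.5 here is an arbitrary choice), and the column names and values are toy stand-ins.

```python
# Minimal sketch: drop columns whose fraction of missing values exceeds a threshold.
# Assumptions (not from the original project): data is a dict of column -> list,
# missing values are None, and the 0.5 threshold is illustrative only.

def drop_sparse_columns(columns, threshold=0.5):
    """Return a new dict keeping only columns with null fraction <= threshold."""
    kept = {}
    for name, values in columns.items():
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction <= threshold:
            kept[name] = values
    return kept


# Toy data; column names and values are hypothetical.
data = {
    "ApplicantIncome": [5000, 4200, 3000, 2600],
    "LoanAmount": [None, 128, 66, 120],            # 25% missing: kept
    "HypotheticalSparseColumn": [None, None, None, 1],  # 75% missing: dropped
}
cleaned = drop_sparse_columns(data, threshold=0.5)
print(sorted(cleaned))  # ['ApplicantIncome', 'LoanAmount']
```

With a pandas DataFrame `df`, the same filter can be written as `df.loc[:, df.isna().mean() <= threshold]`.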

Data Visualization

We performed exploratory data analysis using libraries such as NumPy and Seaborn to gain insights into the data.

Machine Learning Models

We built models with three machine learning algorithms:

Logistic Regression: A widely used technique for binary classification problems.

Decision Tree: Recursively splits the data into increasingly homogeneous subsets, which makes it well suited to classification tasks.

Random Forest: An ensemble method that aggregates many decision trees, each trained on a random sample of the data, to produce more robust predictions.
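To make "homogeneous" concrete, the sketch below scores candidate splits with Gini impurity, a common tree-splitting criterion and the default in scikit-learn's `DecisionTreeClassifier`. The write-up does not say which criterion the project used, so treat this as an illustrative assumption; the labels are toy values.

```python
# Minimal sketch of how a decision tree scores a candidate split with Gini impurity.
# Assumption: Gini is used as the criterion; labels below are toy values.
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum(p_k^2); 0.0 means the set is perfectly homogeneous."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two child nodes after a split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Loan approval labels ("Y"/"N") split on a hypothetical binary feature.
parent = ["Y", "Y", "Y", "N", "N", "N"]
good_split = (["Y", "Y", "Y"], ["N", "N", "N"])  # perfectly separates the classes
poor_split = (["Y", "N", "Y"], ["N", "Y", "N"])  # leaves both children mixed
print(gini(parent))                 # 0.5
print(split_impurity(*good_split))  # 0.0
print(split_impurity(*poor_split))
```

A tree evaluates many candidate splits this way and keeps the one with the lowest weighted impurity; a random forest repeats the process across many trees grown on random samples and combines their votes.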

Exploratory Data Analysis

We began with an analysis of the target variable and explored how each feature related to loan eligibility.
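A minimal sketch of this analysis: the overall class balance of the target, then the approval rate within each category of one feature. The record layout, the field names `Loan_Status` and `Credit_History`, and the `"Y"`/`"N"` encoding are assumptions for illustration, and the rows are toy data.

```python
# Minimal sketch of target-variable analysis with plain Python.
# Assumptions: records are dicts; field names and "Y"/"N" labels are hypothetical.
from collections import Counter, defaultdict


def target_distribution(records, target="Loan_Status"):
    """Share of each target class across all records."""
    counts = Counter(r[target] for r in records)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def approval_rate_by(records, feature, target="Loan_Status", positive="Y"):
    """Fraction of positive outcomes within each value of `feature`."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for r in records:
        totals[r[feature]] += 1
        positives[r[feature]] += r[target] == positive  # bool counts as 0 or 1
    return {value: positives[value] / totals[value] for value in totals}


records = [
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "Y"},
    {"Credit_History": 1, "Loan_Status": "N"},
    {"Credit_History": 0, "Loan_Status": "N"},
]
print(target_distribution(records))  # {'Y': 0.5, 'N': 0.5}
print(approval_rate_by(records, "Credit_History"))
```

In pandas, `pd.crosstab(df[feature], df[target], normalize="index")` gives the same per-category rates.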

In model evaluation, Logistic Regression outperformed the other models, and Random Forest performed better than Decision Tree.
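Because logistic regression was the strongest model, here is a minimal from-scratch sketch of how it maps features to an approval probability and is fit by batch gradient descent, with accuracy as the comparison metric. This is an illustration only: the project most likely used a library implementation, the write-up does not name its evaluation metric, and the two features, learning rate, and epoch count are assumptions.

```python
# Minimal from-scratch logistic regression fit by batch gradient descent.
# Assumptions: toy features, lr=0.5, 2000 epochs; accuracy as the metric.
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit (weights, bias) by gradient descent on mean log-loss; y holds 0/1 labels."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of log-loss w.r.t. the logit is (predicted prob - label).
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += error * xi[j] / n
            grad_b += error / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (approve) if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy rows: [credit_history (0/1), scaled co-applicant income]; label 1 = approved.
X = [[1, 0.2], [1, 0.8], [1, 0.5], [0, 0.1], [0, 0.9], [0, 0.4]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
preds = [predict(w, b, x) for x in X]
print(accuracy(y, preds))  # 1.0 on this separable toy data
```

Comparing models then amounts to computing the same metric for each model's predictions on held-out data.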

Pre-processing

Data preprocessing transformed the raw data into a clean dataset suitable for modeling. This step matters because models such as logistic regression expect complete, numerically encoded inputs.
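The write-up does not list its preprocessing steps, so the sketch below shows two common ones as an assumption: filling remaining gaps (median for numeric columns, most frequent value for categorical ones) and mapping categories to integer codes. Column names and values are hypothetical.

```python
# Minimal preprocessing sketch. Assumption: imputation plus integer encoding;
# the original project does not state which steps it used.
from collections import Counter
from statistics import median


def impute(values):
    """Fill None with the median (numeric) or the most frequent value (categorical)."""
    present = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = median(present)
    else:
        fill = Counter(present).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def encode(values):
    """Map each distinct category to an integer code, in sorted order."""
    codes = {category: i for i, category in enumerate(sorted(set(values)))}
    return [codes[v] for v in values]


loan_amount = [128, None, 66, 120]    # hypothetical numeric column
married = ["Yes", "No", None, "Yes"]  # hypothetical categorical column
print(impute(loan_amount))            # [128, 120, 66, 120]
print(encode(impute(married)))        # [1, 0, 1, 1]
```

scikit-learn's `SimpleImputer` offers the same strategies via `strategy="median"` and `strategy="most_frequent"`.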

Conclusions

Our project achieved several milestones:

In-depth exploratory data analysis of the dataset, revealing valuable insights.
Data cleaning, including removal of null values.
Generation of hypotheses and identification of key variables influencing loan decisions.
Calculation of correlations between independent variables.
Construction of models in which credit history and co-applicant income emerged as significant factors.
A final model incorporating co-applicant income and credit history, which achieved the highest accuracy.
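The correlation step between independent variables can be sketched with the Pearson coefficient. This is a minimal illustration assuming Pearson correlation (the write-up does not name the method); the column names and values are toy numbers, not the project's results.

```python
# Minimal sketch of a Pearson correlation between two independent variables.
# Assumption: Pearson's r is the measure; values below are hypothetical.
import math


def pearson(x, y):
    """Pearson r = cov(x, y) / (std(x) * std(y)); the 1/n factors cancel."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


applicant_income = [2500, 3000, 4500, 6000]  # hypothetical
loan_amount = [80, 100, 140, 190]            # hypothetical
print(round(pearson(applicant_income, loan_amount), 3))
```

Python 3.10+ also provides `statistics.correlation`, and pandas computes a full Pearson matrix with `DataFrame.corr()`.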