Loading the libraries
Number of Loans per State
1. Plotting the Results:
Analysing Existing Debt
Creating Functions:
Conclusion

In the fourth quarter of 2020 I tried to enrole for a diploma/fellowship at Correlation One. Need less to say, I did not make it. However, I was challenged by one of their enrollement/assesment requirements.

The question that I was required to answer was the following…

How many loans were given for the Carlifornia State?

I kept the .csv data and managed to revisit it. Lets dive in:

Loading the libraries

We will need Pandas, Matplotlib as well as some inbuilt Python functionality.

import pandas as pd
import matplotlib.pyplot as plt

data=pd.read_csv('DS4A_data.csv')
data.head()

	RECORD_CREATION_DATE	LOAN_AMOUNT	CREDIT_RANGE	EXISTING_DEBT	AGENT_NOTES	OFFICE_LOCATION	DEFAULTED
0	01May2014:09:24:27.000000	2500	760-779	$10,001-$20,000	NaN	NORTHERN CALIFORNIA	False
1	01May2014:09:48:36.000000	2500	700-719	$1-$10,000	Annual gross income: $25,400	SOUTHERN CALIFORNIA	False
2	01May2014:10:10:36.000000	6300	740-759	$100,000+	NaN	ARIZONA	True
3	01May2014:10:31:25.000000	4300	780-799	$1-$10,000	Annual gross income: $191,900	SOUTHERN CALIFORNIA	False
4	01May2014:10:46:54.000000	20100	780-799	$90,001-$100,000	Verified monthly debt payments: $1,700	SOUTHERN CALIFORNIA	True

The data has seven columns. My first desire is to see how many states are in this data set.

state_list=[]
for index, row in data.iterrows():
    if row['OFFICE_LOCATION'] not in state_list:
        state_list.append(row['OFFICE_LOCATION'])

state_list

['NORTHERN CALIFORNIA',
 'SOUTHERN CALIFORNIA',
 'ARIZONA',
 'OREGON',
 'WASHINGTON STATE',
 'NEVADA']

Number of Loans per State

It appears the data has only six states. Now let us find out how many loans were awarded per state. We will use a dictionary to store the results as follows:

loan_num_per_state = {}

for index, row in data.iterrows():
    if row['OFFICE_LOCATION'] in loan_num_per_state:
        loan_num_per_state[row['OFFICE_LOCATION']]+=1
    else:
        loan_num_per_state[row['OFFICE_LOCATION']]=1

loan_num_per_state

{'NORTHERN CALIFORNIA': 1715,
 'SOUTHERN CALIFORNIA': 2096,
 'ARIZONA': 1621,
 'OREGON': 755,
 'WASHINGTON STATE': 774,
 'NEVADA': 1039}

Plotting the Results:

While the dictionary above is clear, we would like to plot a bar graph of the data. This will make the information easier to interpret. One problem that arose was that the state names were overlapping and not readable. The rotation option enables us to slant the column names to our desired angle. Beautiful….

plt.plot()
plt.bar(range(len(loan_num_per_state)), list(loan_num_per_state.values()), align='center')
plt.xticks(range(len(loan_num_per_state)), list(loan_num_per_state.keys()), rotation=70)
plt.xlabel('States')
plt.ylabel('Number of Loans awarded')
plt.show()

png

Analysing Existing Debt

In the Pandas dataframe I noticed that there was a column for existing debt. Using a similar approach as above, I obtained a dictionary for the various debt categories. As expected, the majority of the loans awarded were to people who had a low existing debt or non at all. Here we assumed that nan meant they had none.

loan_existing_debt = {}

for index, row in data.iterrows():
    if row['EXISTING_DEBT'] in loan_existing_debt:
        loan_existing_debt[row['EXISTING_DEBT']]+=1
    else:
        loan_existing_debt[row['EXISTING_DEBT']]=1

plt.plot()
plt.bar(range(len(loan_existing_debt)), list(loan_existing_debt.values()), align='center')
plt.xticks(range(len(loan_existing_debt)), list(loan_existing_debt.keys()), rotation=70)
plt.xlabel('Previous Debt')
plt.ylabel('Number of Loans awarded')
plt.show()

png

Creating Functions:

It would be interesting to get the relationship between number of loans awared, total loan value and the loan averge by state. However, it would be better to create a function to capture all the operations for repeatability.

def loan_data_by_state(info, state):
    state_stats = {'Num_Loans': 0, 'Total_Loans': 0, 'Average_loan': 0}
    for index, row in info.iterrows():
        if row['OFFICE_LOCATION'] == state:
            state_stats['Num_Loans']+=1
            state_stats['Total_Loans']+=row['LOAN_AMOUNT']             
    state_stats['Average_loan'] = state_stats['Total_Loans']/state_stats['Num_Loans']
    return state_stats

def loan_vs_default_by_state(info, state):
    state_stats = {'Num_Loans': 0, 'Num_Defaulted': 0}
    for index, row in info.iterrows():
        if row['OFFICE_LOCATION'] == state:
            state_stats['Num_Loans']+=1
            if row['DEFAULTED']=='False':
                state_stats['Num_Defaulted']+=1                
    return state_stats

loan_data_per_state={}
for state_ in state_list:
    loan_data_per_state[state_]=loan_data_by_state(data,state_)

loan_data_per_state
plt.xlabel('States')
plt.ylabel('Number of Loans awarded')

{'NORTHERN CALIFORNIA': {'Num_Loans': 1715,
  'Total_Loans': 11324600,
  'Average_loan': 6603.265306122449},
 'SOUTHERN CALIFORNIA': {'Num_Loans': 2096,
  'Total_Loans': 13573600,
  'Average_loan': 6475.954198473282},
 'ARIZONA': {'Num_Loans': 1621,
  'Total_Loans': 10612900,
  'Average_loan': 6547.131400370142},
 'OREGON': {'Num_Loans': 755,
  'Total_Loans': 4713900,
  'Average_loan': 6243.576158940397},
 'WASHINGTON STATE': {'Num_Loans': 774,
  'Total_Loans': 5034600,
  'Average_loan': 6504.6511627906975},
 'NEVADA': {'Num_Loans': 1039,
  'Total_Loans': 6758500,
  'Average_loan': 6504.812319538018}}

loan_default_per_state={}
for state_ in state_list:
    loan_default_per_state[state_]=loan_vs_default_by_state(data,state_)

loan_default_per_state

{'NORTHERN CALIFORNIA': {'Num_Loans': 1715, 'Num_Defaulted': 0},
 'SOUTHERN CALIFORNIA': {'Num_Loans': 2096, 'Num_Defaulted': 0},
 'ARIZONA': {'Num_Loans': 1621, 'Num_Defaulted': 0},
 'OREGON': {'Num_Loans': 755, 'Num_Defaulted': 0},
 'WASHINGTON STATE': {'Num_Loans': 774, 'Num_Defaulted': 0},
 'NEVADA': {'Num_Loans': 1039, 'Num_Defaulted': 0}}

Conclusion

I still have a lot to learn, and would appreciate another chance at the DS4A Fellowship. However, it is not offered to people outside the United States.

Still, I am grateful.