Previous Applications¶
About the data¶
previous_application: This dataset has details of previous applications made by clients to Home Credit. Only those clients find place here who also exist in application data. Each current loan in the application data (identified by SK_ID_CURR) can have multiple previous loan applications. Each previous application has one row and is identified by the feature SK_ID_PREV.
Feature Explanations¶
SK_ID_PREV : ID of previous credit in Home credit related to loan in our sample. (One loan in our sample can have 0,1,2 or more previous loan applications in Home Credit, previous application could, but not necessarily have to lead to credit)
SK_ID_CURR: ID of loan in our sample
NAME_CONTRACT_TYPE: Contract product type (Cash loan, consumer loan [POS] ,...) of the previous application
AMT_ANNUITY: Annuity of previous application
AMT_APPLICATION: For how much credit did client ask on the previous application
AMT_CREDIT: Final credit amount on the previous application. This differs from AMT_APPLICATION in a way that the AMT_APPLICATION is the amount for which the client initially applied for, but during our approval process he could have received different amount - AMT_CREDIT
AMT_DOWN_PAYMENT: Down payment on the previous application
AMT_GOODS_PRICE: Goods price of good that client asked for (if applicable) on the previous application
WEEKDAY_APPR_PROCESS_START: On which day of the week did the client apply for previous application
HOUR_APPR_PROCESS_START: Approximately at what day hour did the client apply for the previous application
FLAG_LAST_APPL_PER_CONTRACT: Flag if it was last application for the previous contract. Sometimes by mistake of client or our clerk there could be more applications for one single contract
NFLAG_LAST_APPL_IN_DAY: Flag if the application was the last application per day of the client. Sometimes clients apply for more applications a day. Rarely it could also be error in our system that one application is in the database twice
NFLAG_MICRO_CASH: Flag Micro finance loan
RATE_DOWN_PAYMENT: Down payment rate normalized on previous credit
RATE_INTEREST_PRIMARY: Interest rate normalized on previous credit
RATE_INTEREST_PRIVILEGED: Interest rate normalized on previous credit
NAME_CASH_LOAN_PURPOSE: Purpose of the cash loan
NAME_CONTRACT_STATUS: Contract status (approved, cancelled, ...) of previous application
DAYS_DECISION: Relative to current application when was the decision about previous application made
NAME_PAYMENT_TYPE: Payment method that client chose to pay for the previous application
CODE_REJECT_REASON: Why was the previous application rejected
NAME_TYPE_SUITE: Who accompanied client when applying for the previous application
NAME_CLIENT_TYPE: Was the client old or new client when applying for the previous application
NAME_GOODS_CATEGORY: What kind of goods did the client apply for in the previous application
NAME_PORTFOLIO: Was the previous application for CASH, POS, CAR, …
NAME_PRODUCT_TYPE: Was the previous application x-sell o walk-in
CHANNEL_TYPE: Through which channel we acquired the client on the previous application
SELLERPLACE_AREA: Selling area of seller place of the previous application
NAME_SELLER_INDUSTRY: The industry of the seller
CNT_PAYMENT: Term of previous credit at application of the previous application
NAME_YIELD_GROUP: Grouped interest rate into small medium and high of the previous application
PRODUCT_COMBINATION: Detailed product combination of the previous application
DAYS_FIRST_DRAWING: Relative to application date of current application when was the first disbursement of the previous application
DAYS_FIRST_DUE: Relative to application date of current application when was the first due supposed to be of the previous application
DAYS_LAST_DUE_1ST_VERSION: Relative to application date of current application when was the first due of the previous application
DAYS_LAST_DUE: Relative to application date of current application when was the last due date of the previous application
DAYS_TERMINATION: Relative to application date of current application when was the expected termination of the previous application
NFLAG_INSURED_ON_APPROVAL: Did the client requested insurance during the previous application
# Last amended: 24rd October, 2020
# Myfolder: C:\Users\Administrator\OneDrive\Documents\home_credit_default_risk
# Objective:
# Solving Kaggle problem: Home Credit Default Risk
# Processing previous_application dataset
#
# Data Source: https://www.kaggle.com/c/home-credit-default-risk/data
# Ref: https://www.kaggle.com/jsaguiar/lightgbm-with-simple-features
# 1.0 Libraries
# (Some of these may not be needed here.)
%reset -f
import numpy as np
import pandas as pd
import gc
# 1.1 Reduce read data size
# There is a file reducing.py
# in this folder. A class
# in it is used to reduce
# dataframe size
# (Code modified by me to
# exclude 'category' dtype)
import reducing
# 1.2 Misc
import warnings
import os
warnings.simplefilter(action='ignore', category=FutureWarning)
# 1.3
pd.set_option('display.max_colwidth', -1)
pd.set_option('display.max_columns', 100)
# 1.4 Display multiple commands outputs from a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# 2.0 Onehot encoding (OHE) function. Uses pd.get_dummies()
# i) To transform 'object' columns to dummies.
# ii) Treat NaN as one of the categories
# iii) Returns transformed-data and new-columns created
def one_hot_encoder(df, nan_as_category = True):
original_columns = list(df.columns)
categorical_columns = [col for col in df.columns if df[col].dtype == 'object']
df = pd.get_dummies(df,
columns= categorical_columns,
dummy_na= nan_as_category # Treat NaNs as category
)
new_columns = [c for c in df.columns if c not in original_columns]
return df, new_columns
# 2.1
pathToFolder = "C:\\Users\\Administrator\\OneDrive\\Documents\\home_credit_default_risk"
os.chdir(pathToFolder)
# 2.2 Some constants
num_rows=None # Implies read all rows
nan_as_category = True # While transforming
# 'object' columns to dummies
# 3.0 Read previous application data first
prev = pd.read_csv(
'previous_application.csv.zip',
nrows = num_rows
)
# 3.0.1 Reduce memory usage by appropriately
# changing data-types per feature:
prev = reducing.Reducer().reduce(prev)
reduced df from 471.4808 MB to 414.1386 MB in 3.82 seconds
# 3.0.2
prev.shape # (rows=16,70,214, cols = 37)
prev.head(5)
prev.columns
(1670214, 37)
SK_ID_PREV | SK_ID_CURR | NAME_CONTRACT_TYPE | AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | WEEKDAY_APPR_PROCESS_START | HOUR_APPR_PROCESS_START | FLAG_LAST_APPL_PER_CONTRACT | NFLAG_LAST_APPL_IN_DAY | RATE_DOWN_PAYMENT | RATE_INTEREST_PRIMARY | RATE_INTEREST_PRIVILEGED | NAME_CASH_LOAN_PURPOSE | NAME_CONTRACT_STATUS | DAYS_DECISION | NAME_PAYMENT_TYPE | CODE_REJECT_REASON | NAME_TYPE_SUITE | NAME_CLIENT_TYPE | NAME_GOODS_CATEGORY | NAME_PORTFOLIO | NAME_PRODUCT_TYPE | CHANNEL_TYPE | SELLERPLACE_AREA | NAME_SELLER_INDUSTRY | CNT_PAYMENT | NAME_YIELD_GROUP | PRODUCT_COMBINATION | DAYS_FIRST_DRAWING | DAYS_FIRST_DUE | DAYS_LAST_DUE_1ST_VERSION | DAYS_LAST_DUE | DAYS_TERMINATION | NFLAG_INSURED_ON_APPROVAL | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2030495 | 271877 | Consumer loans | 1730.430 | 17145.0 | 17145.0 | 0.0 | 17145.0 | SATURDAY | 15 | Y | 1 | 0.0 | 0.182832 | 0.867336 | XAP | Approved | -73 | Cash through the bank | XAP | NaN | Repeater | Mobile | POS | XNA | Country-wide | 35 | Connectivity | 12.0 | middle | POS mobile with interest | 365243.0 | -42.0 | 300.0 | -42.0 | -37.0 | 0.0 |
1 | 2802425 | 108129 | Cash loans | 25188.615 | 607500.0 | 679671.0 | NaN | 607500.0 | THURSDAY | 11 | Y | 1 | NaN | NaN | NaN | XNA | Approved | -164 | XNA | XAP | Unaccompanied | Repeater | XNA | Cash | x-sell | Contact center | -1 | XNA | 36.0 | low_action | Cash X-Sell: low | 365243.0 | -134.0 | 916.0 | 365243.0 | 365243.0 | 1.0 |
2 | 2523466 | 122040 | Cash loans | 15060.735 | 112500.0 | 136444.5 | NaN | 112500.0 | TUESDAY | 11 | Y | 1 | NaN | NaN | NaN | XNA | Approved | -301 | Cash through the bank | XAP | Spouse, partner | Repeater | XNA | Cash | x-sell | Credit and cash offices | -1 | XNA | 12.0 | high | Cash X-Sell: high | 365243.0 | -271.0 | 59.0 | 365243.0 | 365243.0 | 1.0 |
3 | 2819243 | 176158 | Cash loans | 47041.335 | 450000.0 | 470790.0 | NaN | 450000.0 | MONDAY | 7 | Y | 1 | NaN | NaN | NaN | XNA | Approved | -512 | Cash through the bank | XAP | NaN | Repeater | XNA | Cash | x-sell | Credit and cash offices | -1 | XNA | 12.0 | middle | Cash X-Sell: middle | 365243.0 | -482.0 | -152.0 | -182.0 | -177.0 | 1.0 |
4 | 1784265 | 202054 | Cash loans | 31924.395 | 337500.0 | 404055.0 | NaN | 337500.0 | THURSDAY | 9 | Y | 1 | NaN | NaN | NaN | Repairs | Refused | -781 | Cash through the bank | HC | NaN | Repeater | XNA | Cash | walk-in | Credit and cash offices | -1 | XNA | 24.0 | high | Cash Street: high | NaN | NaN | NaN | NaN | NaN | NaN |
Index(['SK_ID_PREV', 'SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'AMT_ANNUITY',
'AMT_APPLICATION', 'AMT_CREDIT', 'AMT_DOWN_PAYMENT', 'AMT_GOODS_PRICE',
'WEEKDAY_APPR_PROCESS_START', 'HOUR_APPR_PROCESS_START',
'FLAG_LAST_APPL_PER_CONTRACT', 'NFLAG_LAST_APPL_IN_DAY',
'RATE_DOWN_PAYMENT', 'RATE_INTEREST_PRIMARY',
'RATE_INTEREST_PRIVILEGED', 'NAME_CASH_LOAN_PURPOSE',
'NAME_CONTRACT_STATUS', 'DAYS_DECISION', 'NAME_PAYMENT_TYPE',
'CODE_REJECT_REASON', 'NAME_TYPE_SUITE', 'NAME_CLIENT_TYPE',
'NAME_GOODS_CATEGORY', 'NAME_PORTFOLIO', 'NAME_PRODUCT_TYPE',
'CHANNEL_TYPE', 'SELLERPLACE_AREA', 'NAME_SELLER_INDUSTRY',
'CNT_PAYMENT', 'NAME_YIELD_GROUP', 'PRODUCT_COMBINATION',
'DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE', 'DAYS_LAST_DUE_1ST_VERSION',
'DAYS_LAST_DUE', 'DAYS_TERMINATION', 'NFLAG_INSURED_ON_APPROVAL'],
dtype='object')
# 3.1 Let us examine how many unique IDs exist
prev['SK_ID_PREV'].nunique() # 1670214 Unique number
prev['SK_ID_CURR'].nunique() # 338857 So a number of repeat exist
# We have to aggregate over it
# to extract behaviour of clients
1670214
338857
# 3.2 Let us see distribution of dtypes
# There are 16 'object' types here
prev.dtypes.value_counts()
object 16
float64 14
uint8 2
uint32 2
float32 1
int16 1
int32 1
dtype: int64
# 3.3
prev.shape # (1670214, 37)
# 3.3.1
# What is the actual number of persons
# who might have taken multiple loans?
prev['SK_ID_CURR'].nunique() # 338857 -- Many duplicate values exist
# Consider SK_ID_CURR as Foreign Key
# Primary key exists in application_train data
# Primary key: SK_ID_BUREAU
# 3.3.2
# As expected, there are no duplicate values here
prev['SK_ID_PREV'].nunique() # 1670214 -- Unique id for each row
(1670214, 37)
338857
1670214
# 4.0 OneHotEncode (OHE) 'object' types in bureau
prev, cat_cols = one_hot_encoder(
prev,
nan_as_category= True
)
# 4.1
len(cat_cols) # 159
cat_cols
159
['NAME_CONTRACT_TYPE_Cash loans',
'NAME_CONTRACT_TYPE_Consumer loans',
'NAME_CONTRACT_TYPE_Revolving loans',
'NAME_CONTRACT_TYPE_XNA',
'NAME_CONTRACT_TYPE_nan',
'WEEKDAY_APPR_PROCESS_START_FRIDAY',
'WEEKDAY_APPR_PROCESS_START_MONDAY',
'WEEKDAY_APPR_PROCESS_START_SATURDAY',
'WEEKDAY_APPR_PROCESS_START_SUNDAY',
'WEEKDAY_APPR_PROCESS_START_THURSDAY',
'WEEKDAY_APPR_PROCESS_START_TUESDAY',
'WEEKDAY_APPR_PROCESS_START_WEDNESDAY',
'WEEKDAY_APPR_PROCESS_START_nan',
'FLAG_LAST_APPL_PER_CONTRACT_N',
'FLAG_LAST_APPL_PER_CONTRACT_Y',
'FLAG_LAST_APPL_PER_CONTRACT_nan',
'NAME_CASH_LOAN_PURPOSE_Building a house or an annex',
'NAME_CASH_LOAN_PURPOSE_Business development',
'NAME_CASH_LOAN_PURPOSE_Buying a garage',
'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land',
'NAME_CASH_LOAN_PURPOSE_Buying a home',
'NAME_CASH_LOAN_PURPOSE_Buying a new car',
'NAME_CASH_LOAN_PURPOSE_Buying a used car',
'NAME_CASH_LOAN_PURPOSE_Car repairs',
'NAME_CASH_LOAN_PURPOSE_Education',
'NAME_CASH_LOAN_PURPOSE_Everyday expenses',
'NAME_CASH_LOAN_PURPOSE_Furniture',
'NAME_CASH_LOAN_PURPOSE_Gasification / water supply',
'NAME_CASH_LOAN_PURPOSE_Hobby',
'NAME_CASH_LOAN_PURPOSE_Journey',
'NAME_CASH_LOAN_PURPOSE_Medicine',
'NAME_CASH_LOAN_PURPOSE_Money for a third person',
'NAME_CASH_LOAN_PURPOSE_Other',
'NAME_CASH_LOAN_PURPOSE_Payments on other loans',
'NAME_CASH_LOAN_PURPOSE_Purchase of electronic equipment',
'NAME_CASH_LOAN_PURPOSE_Refusal to name the goal',
'NAME_CASH_LOAN_PURPOSE_Repairs',
'NAME_CASH_LOAN_PURPOSE_Urgent needs',
'NAME_CASH_LOAN_PURPOSE_Wedding / gift / holiday',
'NAME_CASH_LOAN_PURPOSE_XAP',
'NAME_CASH_LOAN_PURPOSE_XNA',
'NAME_CASH_LOAN_PURPOSE_nan',
'NAME_CONTRACT_STATUS_Approved',
'NAME_CONTRACT_STATUS_Canceled',
'NAME_CONTRACT_STATUS_Refused',
'NAME_CONTRACT_STATUS_Unused offer',
'NAME_CONTRACT_STATUS_nan',
'NAME_PAYMENT_TYPE_Cash through the bank',
'NAME_PAYMENT_TYPE_Cashless from the account of the employer',
'NAME_PAYMENT_TYPE_Non-cash from your account',
'NAME_PAYMENT_TYPE_XNA',
'NAME_PAYMENT_TYPE_nan',
'CODE_REJECT_REASON_CLIENT',
'CODE_REJECT_REASON_HC',
'CODE_REJECT_REASON_LIMIT',
'CODE_REJECT_REASON_SCO',
'CODE_REJECT_REASON_SCOFR',
'CODE_REJECT_REASON_SYSTEM',
'CODE_REJECT_REASON_VERIF',
'CODE_REJECT_REASON_XAP',
'CODE_REJECT_REASON_XNA',
'CODE_REJECT_REASON_nan',
'NAME_TYPE_SUITE_Children',
'NAME_TYPE_SUITE_Family',
'NAME_TYPE_SUITE_Group of people',
'NAME_TYPE_SUITE_Other_A',
'NAME_TYPE_SUITE_Other_B',
'NAME_TYPE_SUITE_Spouse, partner',
'NAME_TYPE_SUITE_Unaccompanied',
'NAME_TYPE_SUITE_nan',
'NAME_CLIENT_TYPE_New',
'NAME_CLIENT_TYPE_Refreshed',
'NAME_CLIENT_TYPE_Repeater',
'NAME_CLIENT_TYPE_XNA',
'NAME_CLIENT_TYPE_nan',
'NAME_GOODS_CATEGORY_Additional Service',
'NAME_GOODS_CATEGORY_Animals',
'NAME_GOODS_CATEGORY_Audio/Video',
'NAME_GOODS_CATEGORY_Auto Accessories',
'NAME_GOODS_CATEGORY_Clothing and Accessories',
'NAME_GOODS_CATEGORY_Computers',
'NAME_GOODS_CATEGORY_Construction Materials',
'NAME_GOODS_CATEGORY_Consumer Electronics',
'NAME_GOODS_CATEGORY_Direct Sales',
'NAME_GOODS_CATEGORY_Education',
'NAME_GOODS_CATEGORY_Fitness',
'NAME_GOODS_CATEGORY_Furniture',
'NAME_GOODS_CATEGORY_Gardening',
'NAME_GOODS_CATEGORY_Homewares',
'NAME_GOODS_CATEGORY_House Construction',
'NAME_GOODS_CATEGORY_Insurance',
'NAME_GOODS_CATEGORY_Jewelry',
'NAME_GOODS_CATEGORY_Medical Supplies',
'NAME_GOODS_CATEGORY_Medicine',
'NAME_GOODS_CATEGORY_Mobile',
'NAME_GOODS_CATEGORY_Office Appliances',
'NAME_GOODS_CATEGORY_Other',
'NAME_GOODS_CATEGORY_Photo / Cinema Equipment',
'NAME_GOODS_CATEGORY_Sport and Leisure',
'NAME_GOODS_CATEGORY_Tourism',
'NAME_GOODS_CATEGORY_Vehicles',
'NAME_GOODS_CATEGORY_Weapon',
'NAME_GOODS_CATEGORY_XNA',
'NAME_GOODS_CATEGORY_nan',
'NAME_PORTFOLIO_Cards',
'NAME_PORTFOLIO_Cars',
'NAME_PORTFOLIO_Cash',
'NAME_PORTFOLIO_POS',
'NAME_PORTFOLIO_XNA',
'NAME_PORTFOLIO_nan',
'NAME_PRODUCT_TYPE_XNA',
'NAME_PRODUCT_TYPE_walk-in',
'NAME_PRODUCT_TYPE_x-sell',
'NAME_PRODUCT_TYPE_nan',
'CHANNEL_TYPE_AP+ (Cash loan)',
'CHANNEL_TYPE_Car dealer',
'CHANNEL_TYPE_Channel of corporate sales',
'CHANNEL_TYPE_Contact center',
'CHANNEL_TYPE_Country-wide',
'CHANNEL_TYPE_Credit and cash offices',
'CHANNEL_TYPE_Regional / Local',
'CHANNEL_TYPE_Stone',
'CHANNEL_TYPE_nan',
'NAME_SELLER_INDUSTRY_Auto technology',
'NAME_SELLER_INDUSTRY_Clothing',
'NAME_SELLER_INDUSTRY_Connectivity',
'NAME_SELLER_INDUSTRY_Construction',
'NAME_SELLER_INDUSTRY_Consumer electronics',
'NAME_SELLER_INDUSTRY_Furniture',
'NAME_SELLER_INDUSTRY_Industry',
'NAME_SELLER_INDUSTRY_Jewelry',
'NAME_SELLER_INDUSTRY_MLM partners',
'NAME_SELLER_INDUSTRY_Tourism',
'NAME_SELLER_INDUSTRY_XNA',
'NAME_SELLER_INDUSTRY_nan',
'NAME_YIELD_GROUP_XNA',
'NAME_YIELD_GROUP_high',
'NAME_YIELD_GROUP_low_action',
'NAME_YIELD_GROUP_low_normal',
'NAME_YIELD_GROUP_middle',
'NAME_YIELD_GROUP_nan',
'PRODUCT_COMBINATION_Card Street',
'PRODUCT_COMBINATION_Card X-Sell',
'PRODUCT_COMBINATION_Cash',
'PRODUCT_COMBINATION_Cash Street: high',
'PRODUCT_COMBINATION_Cash Street: low',
'PRODUCT_COMBINATION_Cash Street: middle',
'PRODUCT_COMBINATION_Cash X-Sell: high',
'PRODUCT_COMBINATION_Cash X-Sell: low',
'PRODUCT_COMBINATION_Cash X-Sell: middle',
'PRODUCT_COMBINATION_POS household with interest',
'PRODUCT_COMBINATION_POS household without interest',
'PRODUCT_COMBINATION_POS industry with interest',
'PRODUCT_COMBINATION_POS industry without interest',
'PRODUCT_COMBINATION_POS mobile with interest',
'PRODUCT_COMBINATION_POS mobile without interest',
'PRODUCT_COMBINATION_POS other with interest',
'PRODUCT_COMBINATION_POS others without interest',
'PRODUCT_COMBINATION_nan']
# 4.2.1 Just examine NULLs in few features
prev['DAYS_FIRST_DRAWING'].isnull().sum() # 673065
# 4.2.2 And also this special constant value: 365243
(prev['DAYS_FIRST_DRAWING'] == 365243).sum() # 934444
prev['DAYS_FIRST_DUE'].isnull().sum() # 673065
(prev['DAYS_FIRST_DUE'] == 365243).sum() # 40645
prev['DAYS_LAST_DUE'].isnull().sum() # 673065
(prev['DAYS_LAST_DUE'] == 365243).sum() # 211221
prev['DAYS_TERMINATION'].isnull().sum() # 673065
(prev['DAYS_TERMINATION']== 365243).sum() # 225913
673065
934444
673065
40645
673065
211221
673065
225913
# 4.3 Examine total number of unique values
# in each one of the above four features
prev['DAYS_FIRST_DRAWING'].nunique() # 2838
prev['DAYS_FIRST_DRAWING'].sort_values(ascending = False)[:5]
prev['DAYS_FIRST_DUE'].nunique() # 2892
prev['DAYS_LAST_DUE'].nunique() # 2873
prev['DAYS_TERMINATION'].nunique() # 2830
2838
1670213 365243.0
572469 365243.0
572494 365243.0
572491 365243.0
572489 365243.0
Name: DAYS_FIRST_DRAWING, dtype: float64
2892
2873
2830
# 4.4 Convert Days 365243 values to nan
prev['DAYS_FIRST_DRAWING'].replace(365243, np.nan, inplace= True)
prev['DAYS_FIRST_DUE'].replace(365243, np.nan, inplace= True)
prev['DAYS_LAST_DUE_1ST_VERSION'].replace(365243, np.nan, inplace= True)
prev['DAYS_LAST_DUE'].replace(365243, np.nan, inplace= True)
prev['DAYS_TERMINATION'].replace(365243, np.nan, inplace= True)
# 4.5 So how many NULLS now exist in each one of
# these four features:
prev['DAYS_FIRST_DRAWING'].isnull().sum() # 1607509
prev['DAYS_FIRST_DUE'].isnull().sum() # 713710
prev['DAYS_LAST_DUE'].isnull().sum() # 884286
prev['DAYS_TERMINATION'].isnull().sum() # 898978
1607509
713710
884286
898978
Perform aggregations¶
On the whole of dataset, perform aggregations for numerical features and perform aggregations on just created OHE features. Numerical features are being aggregated as: min, max, mean.. while OHE features aggregation is just 'mean'.
# 5.0 One special feature
# Add feature: value ask / value received percentage
prev['APP_CREDIT_PERC'] = prev['AMT_APPLICATION'] / prev['AMT_CREDIT']
# 5.1 Numeric features aggregations:
# Dictionary of what all operations are to
# performed on numerical features:
num_aggregations = {
'AMT_ANNUITY': ['min', 'max', 'mean'],
'AMT_APPLICATION': ['min', 'max', 'mean'],
'AMT_CREDIT': ['min', 'max', 'mean'],
'APP_CREDIT_PERC': ['min', 'max', 'mean', 'var'],
'AMT_DOWN_PAYMENT': ['min', 'max', 'mean'],
'AMT_GOODS_PRICE': ['min', 'max', 'mean'],
'HOUR_APPR_PROCESS_START': ['min', 'max', 'mean'],
'RATE_DOWN_PAYMENT': ['min', 'max', 'mean'],
'DAYS_DECISION': ['min', 'max', 'mean'],
'CNT_PAYMENT': ['mean', 'sum'],
}
# 5.2 Categorical features
# Create a dictionary for aggregation operations:
cat_aggregations = {}
for cat in cat_cols:
cat_aggregations[cat] = ['mean']
# 5.2.1
cat_aggregations
{'NAME_CONTRACT_TYPE_Cash loans': ['mean'],
'NAME_CONTRACT_TYPE_Consumer loans': ['mean'],
'NAME_CONTRACT_TYPE_Revolving loans': ['mean'],
'NAME_CONTRACT_TYPE_XNA': ['mean'],
'NAME_CONTRACT_TYPE_nan': ['mean'],
'WEEKDAY_APPR_PROCESS_START_FRIDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_MONDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_SATURDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_SUNDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_THURSDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_TUESDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_WEDNESDAY': ['mean'],
'WEEKDAY_APPR_PROCESS_START_nan': ['mean'],
'FLAG_LAST_APPL_PER_CONTRACT_N': ['mean'],
'FLAG_LAST_APPL_PER_CONTRACT_Y': ['mean'],
'FLAG_LAST_APPL_PER_CONTRACT_nan': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Building a house or an annex': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Business development': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Buying a garage': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Buying a home': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Buying a new car': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Buying a used car': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Car repairs': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Education': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Everyday expenses': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Furniture': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Gasification / water supply': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Hobby': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Journey': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Medicine': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Money for a third person': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Other': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Payments on other loans': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Purchase of electronic equipment': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Refusal to name the goal': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Repairs': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Urgent needs': ['mean'],
'NAME_CASH_LOAN_PURPOSE_Wedding / gift / holiday': ['mean'],
'NAME_CASH_LOAN_PURPOSE_XAP': ['mean'],
'NAME_CASH_LOAN_PURPOSE_XNA': ['mean'],
'NAME_CASH_LOAN_PURPOSE_nan': ['mean'],
'NAME_CONTRACT_STATUS_Approved': ['mean'],
'NAME_CONTRACT_STATUS_Canceled': ['mean'],
'NAME_CONTRACT_STATUS_Refused': ['mean'],
'NAME_CONTRACT_STATUS_Unused offer': ['mean'],
'NAME_CONTRACT_STATUS_nan': ['mean'],
'NAME_PAYMENT_TYPE_Cash through the bank': ['mean'],
'NAME_PAYMENT_TYPE_Cashless from the account of the employer': ['mean'],
'NAME_PAYMENT_TYPE_Non-cash from your account': ['mean'],
'NAME_PAYMENT_TYPE_XNA': ['mean'],
'NAME_PAYMENT_TYPE_nan': ['mean'],
'CODE_REJECT_REASON_CLIENT': ['mean'],
'CODE_REJECT_REASON_HC': ['mean'],
'CODE_REJECT_REASON_LIMIT': ['mean'],
'CODE_REJECT_REASON_SCO': ['mean'],
'CODE_REJECT_REASON_SCOFR': ['mean'],
'CODE_REJECT_REASON_SYSTEM': ['mean'],
'CODE_REJECT_REASON_VERIF': ['mean'],
'CODE_REJECT_REASON_XAP': ['mean'],
'CODE_REJECT_REASON_XNA': ['mean'],
'CODE_REJECT_REASON_nan': ['mean'],
'NAME_TYPE_SUITE_Children': ['mean'],
'NAME_TYPE_SUITE_Family': ['mean'],
'NAME_TYPE_SUITE_Group of people': ['mean'],
'NAME_TYPE_SUITE_Other_A': ['mean'],
'NAME_TYPE_SUITE_Other_B': ['mean'],
'NAME_TYPE_SUITE_Spouse, partner': ['mean'],
'NAME_TYPE_SUITE_Unaccompanied': ['mean'],
'NAME_TYPE_SUITE_nan': ['mean'],
'NAME_CLIENT_TYPE_New': ['mean'],
'NAME_CLIENT_TYPE_Refreshed': ['mean'],
'NAME_CLIENT_TYPE_Repeater': ['mean'],
'NAME_CLIENT_TYPE_XNA': ['mean'],
'NAME_CLIENT_TYPE_nan': ['mean'],
'NAME_GOODS_CATEGORY_Additional Service': ['mean'],
'NAME_GOODS_CATEGORY_Animals': ['mean'],
'NAME_GOODS_CATEGORY_Audio/Video': ['mean'],
'NAME_GOODS_CATEGORY_Auto Accessories': ['mean'],
'NAME_GOODS_CATEGORY_Clothing and Accessories': ['mean'],
'NAME_GOODS_CATEGORY_Computers': ['mean'],
'NAME_GOODS_CATEGORY_Construction Materials': ['mean'],
'NAME_GOODS_CATEGORY_Consumer Electronics': ['mean'],
'NAME_GOODS_CATEGORY_Direct Sales': ['mean'],
'NAME_GOODS_CATEGORY_Education': ['mean'],
'NAME_GOODS_CATEGORY_Fitness': ['mean'],
'NAME_GOODS_CATEGORY_Furniture': ['mean'],
'NAME_GOODS_CATEGORY_Gardening': ['mean'],
'NAME_GOODS_CATEGORY_Homewares': ['mean'],
'NAME_GOODS_CATEGORY_House Construction': ['mean'],
'NAME_GOODS_CATEGORY_Insurance': ['mean'],
'NAME_GOODS_CATEGORY_Jewelry': ['mean'],
'NAME_GOODS_CATEGORY_Medical Supplies': ['mean'],
'NAME_GOODS_CATEGORY_Medicine': ['mean'],
'NAME_GOODS_CATEGORY_Mobile': ['mean'],
'NAME_GOODS_CATEGORY_Office Appliances': ['mean'],
'NAME_GOODS_CATEGORY_Other': ['mean'],
'NAME_GOODS_CATEGORY_Photo / Cinema Equipment': ['mean'],
'NAME_GOODS_CATEGORY_Sport and Leisure': ['mean'],
'NAME_GOODS_CATEGORY_Tourism': ['mean'],
'NAME_GOODS_CATEGORY_Vehicles': ['mean'],
'NAME_GOODS_CATEGORY_Weapon': ['mean'],
'NAME_GOODS_CATEGORY_XNA': ['mean'],
'NAME_GOODS_CATEGORY_nan': ['mean'],
'NAME_PORTFOLIO_Cards': ['mean'],
'NAME_PORTFOLIO_Cars': ['mean'],
'NAME_PORTFOLIO_Cash': ['mean'],
'NAME_PORTFOLIO_POS': ['mean'],
'NAME_PORTFOLIO_XNA': ['mean'],
'NAME_PORTFOLIO_nan': ['mean'],
'NAME_PRODUCT_TYPE_XNA': ['mean'],
'NAME_PRODUCT_TYPE_walk-in': ['mean'],
'NAME_PRODUCT_TYPE_x-sell': ['mean'],
'NAME_PRODUCT_TYPE_nan': ['mean'],
'CHANNEL_TYPE_AP+ (Cash loan)': ['mean'],
'CHANNEL_TYPE_Car dealer': ['mean'],
'CHANNEL_TYPE_Channel of corporate sales': ['mean'],
'CHANNEL_TYPE_Contact center': ['mean'],
'CHANNEL_TYPE_Country-wide': ['mean'],
'CHANNEL_TYPE_Credit and cash offices': ['mean'],
'CHANNEL_TYPE_Regional / Local': ['mean'],
'CHANNEL_TYPE_Stone': ['mean'],
'CHANNEL_TYPE_nan': ['mean'],
'NAME_SELLER_INDUSTRY_Auto technology': ['mean'],
'NAME_SELLER_INDUSTRY_Clothing': ['mean'],
'NAME_SELLER_INDUSTRY_Connectivity': ['mean'],
'NAME_SELLER_INDUSTRY_Construction': ['mean'],
'NAME_SELLER_INDUSTRY_Consumer electronics': ['mean'],
'NAME_SELLER_INDUSTRY_Furniture': ['mean'],
'NAME_SELLER_INDUSTRY_Industry': ['mean'],
'NAME_SELLER_INDUSTRY_Jewelry': ['mean'],
'NAME_SELLER_INDUSTRY_MLM partners': ['mean'],
'NAME_SELLER_INDUSTRY_Tourism': ['mean'],
'NAME_SELLER_INDUSTRY_XNA': ['mean'],
'NAME_SELLER_INDUSTRY_nan': ['mean'],
'NAME_YIELD_GROUP_XNA': ['mean'],
'NAME_YIELD_GROUP_high': ['mean'],
'NAME_YIELD_GROUP_low_action': ['mean'],
'NAME_YIELD_GROUP_low_normal': ['mean'],
'NAME_YIELD_GROUP_middle': ['mean'],
'NAME_YIELD_GROUP_nan': ['mean'],
'PRODUCT_COMBINATION_Card Street': ['mean'],
'PRODUCT_COMBINATION_Card X-Sell': ['mean'],
'PRODUCT_COMBINATION_Cash': ['mean'],
'PRODUCT_COMBINATION_Cash Street: high': ['mean'],
'PRODUCT_COMBINATION_Cash Street: low': ['mean'],
'PRODUCT_COMBINATION_Cash Street: middle': ['mean'],
'PRODUCT_COMBINATION_Cash X-Sell: high': ['mean'],
'PRODUCT_COMBINATION_Cash X-Sell: low': ['mean'],
'PRODUCT_COMBINATION_Cash X-Sell: middle': ['mean'],
'PRODUCT_COMBINATION_POS household with interest': ['mean'],
'PRODUCT_COMBINATION_POS household without interest': ['mean'],
'PRODUCT_COMBINATION_POS industry with interest': ['mean'],
'PRODUCT_COMBINATION_POS industry without interest': ['mean'],
'PRODUCT_COMBINATION_POS mobile with interest': ['mean'],
'PRODUCT_COMBINATION_POS mobile without interest': ['mean'],
'PRODUCT_COMBINATION_POS other with interest': ['mean'],
'PRODUCT_COMBINATION_POS others without interest': ['mean'],
'PRODUCT_COMBINATION_nan': ['mean']}
# 5.3 Perform aggregation now on SK_ID_CURR:
grouped = prev.groupby('SK_ID_CURR')
prev_agg=grouped.agg({**num_aggregations, **cat_aggregations})
# 5.3.1
prev_agg.shape # (338857, 189)
prev_agg.columns
prev_agg.head()
(338857, 189)
MultiIndex([( 'AMT_ANNUITY', 'min'),
( 'AMT_ANNUITY', 'max'),
( 'AMT_ANNUITY', 'mean'),
( 'AMT_APPLICATION', 'min'),
( 'AMT_APPLICATION', 'max'),
( 'AMT_APPLICATION', 'mean'),
( 'AMT_CREDIT', 'min'),
( 'AMT_CREDIT', 'max'),
( 'AMT_CREDIT', 'mean'),
( 'APP_CREDIT_PERC', 'min'),
...
( 'PRODUCT_COMBINATION_Cash X-Sell: middle', 'mean'),
( 'PRODUCT_COMBINATION_POS household with interest', 'mean'),
('PRODUCT_COMBINATION_POS household without interest', 'mean'),
( 'PRODUCT_COMBINATION_POS industry with interest', 'mean'),
( 'PRODUCT_COMBINATION_POS industry without interest', 'mean'),
( 'PRODUCT_COMBINATION_POS mobile with interest', 'mean'),
( 'PRODUCT_COMBINATION_POS mobile without interest', 'mean'),
( 'PRODUCT_COMBINATION_POS other with interest', 'mean'),
( 'PRODUCT_COMBINATION_POS others without interest', 'mean'),
( 'PRODUCT_COMBINATION_nan', 'mean')],
length=189)
AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | APP_CREDIT_PERC | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | HOUR_APPR_PROCESS_START | RATE_DOWN_PAYMENT | DAYS_DECISION | CNT_PAYMENT | NAME_CONTRACT_TYPE_Cash loans | NAME_CONTRACT_TYPE_Consumer loans | NAME_CONTRACT_TYPE_Revolving loans | NAME_CONTRACT_TYPE_XNA | NAME_CONTRACT_TYPE_nan | WEEKDAY_APPR_PROCESS_START_FRIDAY | WEEKDAY_APPR_PROCESS_START_MONDAY | WEEKDAY_APPR_PROCESS_START_SATURDAY | WEEKDAY_APPR_PROCESS_START_SUNDAY | WEEKDAY_APPR_PROCESS_START_THURSDAY | WEEKDAY_APPR_PROCESS_START_TUESDAY | WEEKDAY_APPR_PROCESS_START_WEDNESDAY | WEEKDAY_APPR_PROCESS_START_nan | FLAG_LAST_APPL_PER_CONTRACT_N | FLAG_LAST_APPL_PER_CONTRACT_Y | FLAG_LAST_APPL_PER_CONTRACT_nan | NAME_CASH_LOAN_PURPOSE_Building a house or an annex | NAME_CASH_LOAN_PURPOSE_Business development | NAME_CASH_LOAN_PURPOSE_Buying a garage | NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land | ... | NAME_PORTFOLIO_nan | NAME_PRODUCT_TYPE_XNA | NAME_PRODUCT_TYPE_walk-in | NAME_PRODUCT_TYPE_x-sell | NAME_PRODUCT_TYPE_nan | CHANNEL_TYPE_AP+ (Cash loan) | CHANNEL_TYPE_Car dealer | CHANNEL_TYPE_Channel of corporate sales | CHANNEL_TYPE_Contact center | CHANNEL_TYPE_Country-wide | CHANNEL_TYPE_Credit and cash offices | CHANNEL_TYPE_Regional / Local | CHANNEL_TYPE_Stone | CHANNEL_TYPE_nan | NAME_SELLER_INDUSTRY_Auto technology | NAME_SELLER_INDUSTRY_Clothing | NAME_SELLER_INDUSTRY_Connectivity | NAME_SELLER_INDUSTRY_Construction | NAME_SELLER_INDUSTRY_Consumer electronics | NAME_SELLER_INDUSTRY_Furniture | NAME_SELLER_INDUSTRY_Industry | NAME_SELLER_INDUSTRY_Jewelry | NAME_SELLER_INDUSTRY_MLM partners | NAME_SELLER_INDUSTRY_Tourism | NAME_SELLER_INDUSTRY_XNA | NAME_SELLER_INDUSTRY_nan | NAME_YIELD_GROUP_XNA | NAME_YIELD_GROUP_high | NAME_YIELD_GROUP_low_action | NAME_YIELD_GROUP_low_normal | NAME_YIELD_GROUP_middle | NAME_YIELD_GROUP_nan | PRODUCT_COMBINATION_Card Street | PRODUCT_COMBINATION_Card X-Sell | PRODUCT_COMBINATION_Cash | PRODUCT_COMBINATION_Cash Street: high | PRODUCT_COMBINATION_Cash Street: low | PRODUCT_COMBINATION_Cash Street: middle | PRODUCT_COMBINATION_Cash X-Sell: high | PRODUCT_COMBINATION_Cash X-Sell: low | PRODUCT_COMBINATION_Cash X-Sell: middle | PRODUCT_COMBINATION_POS household with interest | PRODUCT_COMBINATION_POS household without interest | PRODUCT_COMBINATION_POS industry with interest | PRODUCT_COMBINATION_POS industry without interest | PRODUCT_COMBINATION_POS mobile with interest | PRODUCT_COMBINATION_POS mobile without interest | PRODUCT_COMBINATION_POS other with interest | PRODUCT_COMBINATION_POS others without interest | PRODUCT_COMBINATION_nan | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min | max | mean | min | max | mean | min | max | mean | min | max | mean | var | min | max | mean | min | max | mean | min | max | mean | min | max | mean | min | max | mean | mean | sum | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | ... | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | mean | |
SK_ID_CURR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.50 | 23787.0 | 23787.0 | 23787.00 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.00 | 179055.0 | 179055.0 | 179055.00 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 1.000000 | 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 1.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.50 | 68053.5 | 1035882.0 | 484191.00 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 | 0.333333 | 0.666667 | 0.0 | 0.0 | 0 | 0.333333 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0.666667 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.666667 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.00 | 20106.0 | 20106.0 | 20106.00 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
100005 | 4813.200 | 4813.200 | 4813.200 | 0.0 | 44617.5 | 22308.75 | 0.0 | 40153.5 | 20076.75 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 10 | 11 | 10.500000 | 0.108964 | 0.108964 | 0.108964 | -757 | -315 | -536.0 | 12.0 | 12.0 | 0.500000 | 0.500000 | 0.0 | 0.0 | 0 | 0.500000 | 0.0 | 0.000000 | 0.000000 | 0.5 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.500000 | 0.500000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.500000 | 0 | 0.5 | 0.5 | 0.0 | 0.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 189 columns
# 5.4 Rename multiindex columns:
prev_agg.columns = pd.Index(['PREV_' + e[0] + "_" + e[1].upper() for e in prev_agg.columns.tolist()])
# 5.5
prev_agg.shape # (338857, 189)
prev_agg.columns
prev_agg.head()
(338857, 189)
Index(['PREV_AMT_ANNUITY_MIN', 'PREV_AMT_ANNUITY_MAX', 'PREV_AMT_ANNUITY_MEAN',
'PREV_AMT_APPLICATION_MIN', 'PREV_AMT_APPLICATION_MAX',
'PREV_AMT_APPLICATION_MEAN', 'PREV_AMT_CREDIT_MIN',
'PREV_AMT_CREDIT_MAX', 'PREV_AMT_CREDIT_MEAN',
'PREV_APP_CREDIT_PERC_MIN',
...
'PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN',
'PREV_PRODUCT_COMBINATION_POS household with interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS household without interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS other with interest_MEAN',
'PREV_PRODUCT_COMBINATION_POS others without interest_MEAN',
'PREV_PRODUCT_COMBINATION_nan_MEAN'],
dtype='object', length=189)
PREV_AMT_ANNUITY_MIN | PREV_AMT_ANNUITY_MAX | PREV_AMT_ANNUITY_MEAN | PREV_AMT_APPLICATION_MIN | PREV_AMT_APPLICATION_MAX | PREV_AMT_APPLICATION_MEAN | PREV_AMT_CREDIT_MIN | PREV_AMT_CREDIT_MAX | PREV_AMT_CREDIT_MEAN | PREV_APP_CREDIT_PERC_MIN | PREV_APP_CREDIT_PERC_MAX | PREV_APP_CREDIT_PERC_MEAN | PREV_APP_CREDIT_PERC_VAR | PREV_AMT_DOWN_PAYMENT_MIN | PREV_AMT_DOWN_PAYMENT_MAX | PREV_AMT_DOWN_PAYMENT_MEAN | PREV_AMT_GOODS_PRICE_MIN | PREV_AMT_GOODS_PRICE_MAX | PREV_AMT_GOODS_PRICE_MEAN | PREV_HOUR_APPR_PROCESS_START_MIN | PREV_HOUR_APPR_PROCESS_START_MAX | PREV_HOUR_APPR_PROCESS_START_MEAN | PREV_RATE_DOWN_PAYMENT_MIN | PREV_RATE_DOWN_PAYMENT_MAX | PREV_RATE_DOWN_PAYMENT_MEAN | PREV_DAYS_DECISION_MIN | PREV_DAYS_DECISION_MAX | PREV_DAYS_DECISION_MEAN | PREV_CNT_PAYMENT_MEAN | PREV_CNT_PAYMENT_SUM | PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN | PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN | PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN | PREV_NAME_CONTRACT_TYPE_XNA_MEAN | PREV_NAME_CONTRACT_TYPE_nan_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN | ... | PREV_NAME_PORTFOLIO_nan_MEAN | PREV_NAME_PRODUCT_TYPE_XNA_MEAN | PREV_NAME_PRODUCT_TYPE_walk-in_MEAN | PREV_NAME_PRODUCT_TYPE_x-sell_MEAN | PREV_NAME_PRODUCT_TYPE_nan_MEAN | PREV_CHANNEL_TYPE_AP+ (Cash loan)_MEAN | PREV_CHANNEL_TYPE_Car dealer_MEAN | PREV_CHANNEL_TYPE_Channel of corporate sales_MEAN | PREV_CHANNEL_TYPE_Contact center_MEAN | PREV_CHANNEL_TYPE_Country-wide_MEAN | PREV_CHANNEL_TYPE_Credit and cash offices_MEAN | PREV_CHANNEL_TYPE_Regional / Local_MEAN | PREV_CHANNEL_TYPE_Stone_MEAN | PREV_CHANNEL_TYPE_nan_MEAN | PREV_NAME_SELLER_INDUSTRY_Auto technology_MEAN | PREV_NAME_SELLER_INDUSTRY_Clothing_MEAN | PREV_NAME_SELLER_INDUSTRY_Connectivity_MEAN | PREV_NAME_SELLER_INDUSTRY_Construction_MEAN | PREV_NAME_SELLER_INDUSTRY_Consumer electronics_MEAN | PREV_NAME_SELLER_INDUSTRY_Furniture_MEAN | PREV_NAME_SELLER_INDUSTRY_Industry_MEAN | PREV_NAME_SELLER_INDUSTRY_Jewelry_MEAN | PREV_NAME_SELLER_INDUSTRY_MLM partners_MEAN | PREV_NAME_SELLER_INDUSTRY_Tourism_MEAN | PREV_NAME_SELLER_INDUSTRY_XNA_MEAN | PREV_NAME_SELLER_INDUSTRY_nan_MEAN | PREV_NAME_YIELD_GROUP_XNA_MEAN | PREV_NAME_YIELD_GROUP_high_MEAN | PREV_NAME_YIELD_GROUP_low_action_MEAN | PREV_NAME_YIELD_GROUP_low_normal_MEAN | PREV_NAME_YIELD_GROUP_middle_MEAN | PREV_NAME_YIELD_GROUP_nan_MEAN | PREV_PRODUCT_COMBINATION_Card Street_MEAN | PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN | PREV_PRODUCT_COMBINATION_Cash_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN | PREV_PRODUCT_COMBINATION_POS household with interest_MEAN | PREV_PRODUCT_COMBINATION_POS household without interest_MEAN | PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN | PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN | PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN | PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN | PREV_PRODUCT_COMBINATION_POS other with interest_MEAN | PREV_PRODUCT_COMBINATION_POS others without interest_MEAN | PREV_PRODUCT_COMBINATION_nan_MEAN | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SK_ID_CURR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.50 | 23787.0 | 23787.0 | 23787.00 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.00 | 179055.0 | 179055.0 | 179055.00 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 1.000000 | 0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 1.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.50 | 68053.5 | 1035882.0 | 484191.00 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 | 0.333333 | 0.666667 | 0.0 | 0.0 | 0 | 0.333333 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 0.666667 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.666667 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.00 | 20106.0 | 20106.0 | 20106.00 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.000000 | 1.0 | 0.000000 | 0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.000000 | 1.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 |
100005 | 4813.200 | 4813.200 | 4813.200 | 0.0 | 44617.5 | 22308.75 | 0.0 | 40153.5 | 20076.75 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 10 | 11 | 10.500000 | 0.108964 | 0.108964 | 0.108964 | -757 | -315 | -536.0 | 12.0 | 12.0 | 0.500000 | 0.500000 | 0.0 | 0.0 | 0 | 0.500000 | 0.0 | 0.000000 | 0.000000 | 0.5 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0 | 1.000000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.500000 | 0.500000 | 0.0 | 0.000000 | 0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0.0 | 0.500000 | 0 | 0.5 | 0.5 | 0.0 | 0.000000 | 0.000000 | 0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 189 columns
More aggregations¶
Table, prev_agg, from previous operations is our main table that we will carry to next exercise. To this aggregated table, we add more aggregations.
We will perform aggregations on two subsets of data. On both the subsets on numerical features only. One subset is extracted by setting NAME_CONTRACT_STATUS_Approved == 1 and the other subset is extracted by setting NAME_CONTRACT_STATUS_Refused == 1.
It is as if we are trying to extract the behaviour of those whose previous applications have been approved and those whose previous applications have NOT been approved.
# 6.0 Previous Applications: Summarise numerical features from Approved Applications
approved = prev[prev['NAME_CONTRACT_STATUS_Approved'] == 1]
approved_agg = approved.groupby('SK_ID_CURR').agg(num_aggregations)
# 6.1 Look at the aggregated results:
approved_agg.columns
approved_agg.head()
MultiIndex([( 'AMT_ANNUITY', 'min'),
( 'AMT_ANNUITY', 'max'),
( 'AMT_ANNUITY', 'mean'),
( 'AMT_APPLICATION', 'min'),
( 'AMT_APPLICATION', 'max'),
( 'AMT_APPLICATION', 'mean'),
( 'AMT_CREDIT', 'min'),
( 'AMT_CREDIT', 'max'),
( 'AMT_CREDIT', 'mean'),
( 'APP_CREDIT_PERC', 'min'),
( 'APP_CREDIT_PERC', 'max'),
( 'APP_CREDIT_PERC', 'mean'),
( 'APP_CREDIT_PERC', 'var'),
( 'AMT_DOWN_PAYMENT', 'min'),
( 'AMT_DOWN_PAYMENT', 'max'),
( 'AMT_DOWN_PAYMENT', 'mean'),
( 'AMT_GOODS_PRICE', 'min'),
( 'AMT_GOODS_PRICE', 'max'),
( 'AMT_GOODS_PRICE', 'mean'),
('HOUR_APPR_PROCESS_START', 'min'),
('HOUR_APPR_PROCESS_START', 'max'),
('HOUR_APPR_PROCESS_START', 'mean'),
( 'RATE_DOWN_PAYMENT', 'min'),
( 'RATE_DOWN_PAYMENT', 'max'),
( 'RATE_DOWN_PAYMENT', 'mean'),
( 'DAYS_DECISION', 'min'),
( 'DAYS_DECISION', 'max'),
( 'DAYS_DECISION', 'mean'),
( 'CNT_PAYMENT', 'mean'),
( 'CNT_PAYMENT', 'sum')],
)
AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | APP_CREDIT_PERC | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | HOUR_APPR_PROCESS_START | RATE_DOWN_PAYMENT | DAYS_DECISION | CNT_PAYMENT | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min | max | mean | min | max | mean | min | max | mean | min | max | mean | var | min | max | mean | min | max | mean | min | max | mean | min | max | mean | min | max | mean | mean | sum | |
SK_ID_CURR | ||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.5 | 23787.0 | 23787.0 | 23787.0 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.5 | 68053.5 | 1035882.0 | 484191.0 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.0 | 20106.0 | 20106.0 | 20106.0 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 |
100005 | 4813.200 | 4813.200 | 4813.200 | 44617.5 | 44617.5 | 44617.5 | 40153.5 | 40153.5 | 40153.5 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 11 | 11 | 11.000000 | 0.108964 | 0.108964 | 0.108964 | -757 | -757 | -757.0 | 12.0 | 12.0 |
# 6.2 Rename multi-index column names:
approved_agg.columns = pd.Index(['APPROVED_' + e[0] + "_" + e[1].upper() for e in approved_agg.columns.tolist()])
# 6.2.1 Look at it again:
approved_agg.shape # (337698, 30)
approved_agg.head()
(337698, 30)
APPROVED_AMT_ANNUITY_MIN | APPROVED_AMT_ANNUITY_MAX | APPROVED_AMT_ANNUITY_MEAN | APPROVED_AMT_APPLICATION_MIN | APPROVED_AMT_APPLICATION_MAX | APPROVED_AMT_APPLICATION_MEAN | APPROVED_AMT_CREDIT_MIN | APPROVED_AMT_CREDIT_MAX | APPROVED_AMT_CREDIT_MEAN | APPROVED_APP_CREDIT_PERC_MIN | APPROVED_APP_CREDIT_PERC_MAX | APPROVED_APP_CREDIT_PERC_MEAN | APPROVED_APP_CREDIT_PERC_VAR | APPROVED_AMT_DOWN_PAYMENT_MIN | APPROVED_AMT_DOWN_PAYMENT_MAX | APPROVED_AMT_DOWN_PAYMENT_MEAN | APPROVED_AMT_GOODS_PRICE_MIN | APPROVED_AMT_GOODS_PRICE_MAX | APPROVED_AMT_GOODS_PRICE_MEAN | APPROVED_HOUR_APPR_PROCESS_START_MIN | APPROVED_HOUR_APPR_PROCESS_START_MAX | APPROVED_HOUR_APPR_PROCESS_START_MEAN | APPROVED_RATE_DOWN_PAYMENT_MIN | APPROVED_RATE_DOWN_PAYMENT_MAX | APPROVED_RATE_DOWN_PAYMENT_MEAN | APPROVED_DAYS_DECISION_MIN | APPROVED_DAYS_DECISION_MAX | APPROVED_DAYS_DECISION_MEAN | APPROVED_CNT_PAYMENT_MEAN | APPROVED_CNT_PAYMENT_SUM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SK_ID_CURR | ||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.5 | 23787.0 | 23787.0 | 23787.0 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.5 | 68053.5 | 1035882.0 | 484191.0 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.0 | 20106.0 | 20106.0 | 20106.0 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 |
100005 | 4813.200 | 4813.200 | 4813.200 | 44617.5 | 44617.5 | 44617.5 | 40153.5 | 40153.5 | 40153.5 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 11 | 11 | 11.000000 | 0.108964 | 0.108964 | 0.108964 | -757 | -757 | -757.0 | 12.0 | 12.0 |
# 6.3 Join 'approved_agg' with 'prev_agg'.
prev_agg = prev_agg.join( # prev_agg is on the left
approved_agg, # table on the right
how='left', # Join on left table. All its rows remain
on='SK_ID_CURR' # Joining key.
)
# 6.3.1
prev_agg.shape # (338857, 219)
prev_agg.head()
(338857, 219)
PREV_AMT_ANNUITY_MIN | PREV_AMT_ANNUITY_MAX | PREV_AMT_ANNUITY_MEAN | PREV_AMT_APPLICATION_MIN | PREV_AMT_APPLICATION_MAX | PREV_AMT_APPLICATION_MEAN | PREV_AMT_CREDIT_MIN | PREV_AMT_CREDIT_MAX | PREV_AMT_CREDIT_MEAN | PREV_APP_CREDIT_PERC_MIN | PREV_APP_CREDIT_PERC_MAX | PREV_APP_CREDIT_PERC_MEAN | PREV_APP_CREDIT_PERC_VAR | PREV_AMT_DOWN_PAYMENT_MIN | PREV_AMT_DOWN_PAYMENT_MAX | PREV_AMT_DOWN_PAYMENT_MEAN | PREV_AMT_GOODS_PRICE_MIN | PREV_AMT_GOODS_PRICE_MAX | PREV_AMT_GOODS_PRICE_MEAN | PREV_HOUR_APPR_PROCESS_START_MIN | PREV_HOUR_APPR_PROCESS_START_MAX | PREV_HOUR_APPR_PROCESS_START_MEAN | PREV_RATE_DOWN_PAYMENT_MIN | PREV_RATE_DOWN_PAYMENT_MAX | PREV_RATE_DOWN_PAYMENT_MEAN | PREV_DAYS_DECISION_MIN | PREV_DAYS_DECISION_MAX | PREV_DAYS_DECISION_MEAN | PREV_CNT_PAYMENT_MEAN | PREV_CNT_PAYMENT_SUM | PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN | PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN | PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN | PREV_NAME_CONTRACT_TYPE_XNA_MEAN | PREV_NAME_CONTRACT_TYPE_nan_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN | ... | PREV_NAME_YIELD_GROUP_middle_MEAN | PREV_NAME_YIELD_GROUP_nan_MEAN | PREV_PRODUCT_COMBINATION_Card Street_MEAN | PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN | PREV_PRODUCT_COMBINATION_Cash_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN | PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN | PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN | PREV_PRODUCT_COMBINATION_POS household with interest_MEAN | PREV_PRODUCT_COMBINATION_POS household without interest_MEAN | PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN | PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN | PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN | PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN | PREV_PRODUCT_COMBINATION_POS other with interest_MEAN | PREV_PRODUCT_COMBINATION_POS others without interest_MEAN | PREV_PRODUCT_COMBINATION_nan_MEAN | APPROVED_AMT_ANNUITY_MIN | APPROVED_AMT_ANNUITY_MAX | APPROVED_AMT_ANNUITY_MEAN | APPROVED_AMT_APPLICATION_MIN | APPROVED_AMT_APPLICATION_MAX | APPROVED_AMT_APPLICATION_MEAN | APPROVED_AMT_CREDIT_MIN | APPROVED_AMT_CREDIT_MAX | APPROVED_AMT_CREDIT_MEAN | APPROVED_APP_CREDIT_PERC_MIN | APPROVED_APP_CREDIT_PERC_MAX | APPROVED_APP_CREDIT_PERC_MEAN | APPROVED_APP_CREDIT_PERC_VAR | APPROVED_AMT_DOWN_PAYMENT_MIN | APPROVED_AMT_DOWN_PAYMENT_MAX | APPROVED_AMT_DOWN_PAYMENT_MEAN | APPROVED_AMT_GOODS_PRICE_MIN | APPROVED_AMT_GOODS_PRICE_MAX | APPROVED_AMT_GOODS_PRICE_MEAN | APPROVED_HOUR_APPR_PROCESS_START_MIN | APPROVED_HOUR_APPR_PROCESS_START_MAX | APPROVED_HOUR_APPR_PROCESS_START_MEAN | APPROVED_RATE_DOWN_PAYMENT_MIN | APPROVED_RATE_DOWN_PAYMENT_MAX | APPROVED_RATE_DOWN_PAYMENT_MEAN | APPROVED_DAYS_DECISION_MIN | APPROVED_DAYS_DECISION_MAX | APPROVED_DAYS_DECISION_MEAN | APPROVED_CNT_PAYMENT_MEAN | APPROVED_CNT_PAYMENT_SUM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SK_ID_CURR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.50 | 23787.0 | 23787.0 | 23787.00 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.5 | 23787.0 | 23787.0 | 23787.0 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13.0 | 13.0 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740.0 | -1740.0 | -1740.0 | 8.0 | 8.0 |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.00 | 179055.0 | 179055.0 | 179055.00 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 179055.0 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9.0 | 9.0 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606.0 | -606.0 | -606.0 | 24.0 | 24.0 |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.50 | 68053.5 | 1035882.0 | 484191.00 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 | 0.333333 | 0.666667 | 0.0 | 0.0 | 0 | 0.333333 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.666667 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.333333 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.5 | 68053.5 | 1035882.0 | 484191.0 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12.0 | 17.0 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341.0 | -746.0 | -1305.0 | 10.0 | 30.0 |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.00 | 20106.0 | 20106.0 | 20106.00 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.000000 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.0 | 20106.0 | 20106.0 | 20106.0 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5.0 | 5.0 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815.0 | -815.0 | -815.0 | 4.0 | 4.0 |
100005 | 4813.200 | 4813.200 | 4813.200 | 0.0 | 44617.5 | 22308.75 | 0.0 | 40153.5 | 20076.75 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 10 | 11 | 10.500000 | 0.108964 | 0.108964 | 0.108964 | -757 | -315 | -536.0 | 12.0 | 12.0 | 0.500000 | 0.500000 | 0.0 | 0.0 | 0 | 0.500000 | 0.0 | 0.000000 | 0.000000 | 0.5 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 0.000000 | 0 | 0.0 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.000000 | 0.0 | 0.5 | 0.0 | 0.0 | 0.0 | 0.0 | 4813.200 | 4813.200 | 4813.200 | 44617.5 | 44617.5 | 44617.5 | 40153.5 | 40153.5 | 40153.5 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 11.0 | 11.0 | 11.000000 | 0.108964 | 0.108964 | 0.108964 | -757.0 | -757.0 | -757.0 | 12.0 | 12.0 |
5 rows × 219 columns
# 6.4 Similarly for refused applications perform aggregations of numerical features:
refused = prev[prev['NAME_CONTRACT_STATUS_Refused'] == 1]
refused_agg = refused.groupby('SK_ID_CURR').agg(num_aggregations)
# 6.4.1
refused_agg.shape # (118277, 30)
refused_agg.head()
(118277, 30)
AMT_ANNUITY | AMT_APPLICATION | AMT_CREDIT | APP_CREDIT_PERC | AMT_DOWN_PAYMENT | AMT_GOODS_PRICE | HOUR_APPR_PROCESS_START | RATE_DOWN_PAYMENT | DAYS_DECISION | CNT_PAYMENT | |||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
min | max | mean | min | max | mean | min | max | mean | min | max | mean | var | min | max | mean | min | max | mean | min | max | mean | min | max | mean | min | max | mean | mean | sum | |
SK_ID_CURR | ||||||||||||||||||||||||||||||
100006 | 32696.100 | 32696.100 | 32696.100 | 688500.000000 | 688500.0 | 688500.000000 | 906615.000 | 906615.0 | 906615.0000 | 0.759418 | 0.759418 | 0.759418 | NaN | NaN | NaN | NaN | 688500.000 | 688500.0 | 688500.0000 | 15 | 15 | 15.000 | NaN | NaN | NaN | -181 | -181 | -181.000 | 48.000000 | 48.0 |
100011 | NaN | NaN | NaN | 0.000000 | 0.0 | 0.000000 | 0.000 | 0.0 | 0.0000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000 | 0.0 | 0.0000 | 9 | 9 | 9.000 | NaN | NaN | NaN | -1162 | -1162 | -1162.000 | NaN | 0.0 |
100027 | 22556.475 | 22556.475 | 22556.475 | 225000.000000 | 225000.0 | 225000.000000 | 239850.000 | 239850.0 | 239850.0000 | 0.938086 | 0.938086 | 0.938086 | NaN | NaN | NaN | NaN | 225000.000 | 225000.0 | 225000.0000 | 14 | 14 | 14.000 | NaN | NaN | NaN | -181 | -181 | -181.000 | 12.000000 | 12.0 |
100030 | 2826.450 | 6176.925 | 4073.265 | 21969.224609 | 43870.5 | 33767.121094 | 21969.225 | 43870.5 | 32533.2225 | 1.000000 | 1.200580 | 1.042280 | 0.005227 | 0.0 | 6714.0 | 1344.6 | 21969.225 | 43870.5 | 33767.1225 | 7 | 16 | 12.500 | 0.0 | 0.210919 | 0.042874 | -2689 | -840 | -2053.900 | 9.833333 | 59.0 |
100035 | 22308.750 | 33238.800 | 27773.775 | 0.000000 | 1260000.0 | 241875.000000 | 0.000 | 1260000.0 | 241875.0000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | NaN | NaN | NaN | 675000.000 | 1260000.0 | 967500.0000 | 11 | 14 | 12.375 | NaN | NaN | NaN | -160 | -119 | -143.375 | 54.000000 | 108.0 |
# 6.5
refused_agg.columns = pd.Index(['REFUSED_' + e[0] + "_" + e[1].upper() for e in refused_agg.columns.tolist()])
refused_agg.head()
refused_agg.shape # (118277, 30)
REFUSED_AMT_ANNUITY_MIN | REFUSED_AMT_ANNUITY_MAX | REFUSED_AMT_ANNUITY_MEAN | REFUSED_AMT_APPLICATION_MIN | REFUSED_AMT_APPLICATION_MAX | REFUSED_AMT_APPLICATION_MEAN | REFUSED_AMT_CREDIT_MIN | REFUSED_AMT_CREDIT_MAX | REFUSED_AMT_CREDIT_MEAN | REFUSED_APP_CREDIT_PERC_MIN | REFUSED_APP_CREDIT_PERC_MAX | REFUSED_APP_CREDIT_PERC_MEAN | REFUSED_APP_CREDIT_PERC_VAR | REFUSED_AMT_DOWN_PAYMENT_MIN | REFUSED_AMT_DOWN_PAYMENT_MAX | REFUSED_AMT_DOWN_PAYMENT_MEAN | REFUSED_AMT_GOODS_PRICE_MIN | REFUSED_AMT_GOODS_PRICE_MAX | REFUSED_AMT_GOODS_PRICE_MEAN | REFUSED_HOUR_APPR_PROCESS_START_MIN | REFUSED_HOUR_APPR_PROCESS_START_MAX | REFUSED_HOUR_APPR_PROCESS_START_MEAN | REFUSED_RATE_DOWN_PAYMENT_MIN | REFUSED_RATE_DOWN_PAYMENT_MAX | REFUSED_RATE_DOWN_PAYMENT_MEAN | REFUSED_DAYS_DECISION_MIN | REFUSED_DAYS_DECISION_MAX | REFUSED_DAYS_DECISION_MEAN | REFUSED_CNT_PAYMENT_MEAN | REFUSED_CNT_PAYMENT_SUM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SK_ID_CURR | ||||||||||||||||||||||||||||||
100006 | 32696.100 | 32696.100 | 32696.100 | 688500.000000 | 688500.0 | 688500.000000 | 906615.000 | 906615.0 | 906615.0000 | 0.759418 | 0.759418 | 0.759418 | NaN | NaN | NaN | NaN | 688500.000 | 688500.0 | 688500.0000 | 15 | 15 | 15.000 | NaN | NaN | NaN | -181 | -181 | -181.000 | 48.000000 | 48.0 |
100011 | NaN | NaN | NaN | 0.000000 | 0.0 | 0.000000 | 0.000 | 0.0 | 0.0000 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.000 | 0.0 | 0.0000 | 9 | 9 | 9.000 | NaN | NaN | NaN | -1162 | -1162 | -1162.000 | NaN | 0.0 |
100027 | 22556.475 | 22556.475 | 22556.475 | 225000.000000 | 225000.0 | 225000.000000 | 239850.000 | 239850.0 | 239850.0000 | 0.938086 | 0.938086 | 0.938086 | NaN | NaN | NaN | NaN | 225000.000 | 225000.0 | 225000.0000 | 14 | 14 | 14.000 | NaN | NaN | NaN | -181 | -181 | -181.000 | 12.000000 | 12.0 |
100030 | 2826.450 | 6176.925 | 4073.265 | 21969.224609 | 43870.5 | 33767.121094 | 21969.225 | 43870.5 | 32533.2225 | 1.000000 | 1.200580 | 1.042280 | 0.005227 | 0.0 | 6714.0 | 1344.6 | 21969.225 | 43870.5 | 33767.1225 | 7 | 16 | 12.500 | 0.0 | 0.210919 | 0.042874 | -2689 | -840 | -2053.900 | 9.833333 | 59.0 |
100035 | 22308.750 | 33238.800 | 27773.775 | 0.000000 | 1260000.0 | 241875.000000 | 0.000 | 1260000.0 | 241875.0000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | NaN | NaN | NaN | 675000.000 | 1260000.0 | 967500.0000 | 11 | 14 | 12.375 | NaN | NaN | NaN | -160 | -119 | -143.375 | 54.000000 | 108.0 |
(118277, 30)
# 7.0 Join refused_agg with prev_agg:
prev_agg = prev_agg.join( # prev_agg: left
refused_agg, # table on the right
how='left',
on='SK_ID_CURR'
)
# 7.1 Our final table:
prev_agg.shape # 338857, 249)
prev_agg.head()
(338857, 249)
PREV_AMT_ANNUITY_MIN | PREV_AMT_ANNUITY_MAX | PREV_AMT_ANNUITY_MEAN | PREV_AMT_APPLICATION_MIN | PREV_AMT_APPLICATION_MAX | PREV_AMT_APPLICATION_MEAN | PREV_AMT_CREDIT_MIN | PREV_AMT_CREDIT_MAX | PREV_AMT_CREDIT_MEAN | PREV_APP_CREDIT_PERC_MIN | PREV_APP_CREDIT_PERC_MAX | PREV_APP_CREDIT_PERC_MEAN | PREV_APP_CREDIT_PERC_VAR | PREV_AMT_DOWN_PAYMENT_MIN | PREV_AMT_DOWN_PAYMENT_MAX | PREV_AMT_DOWN_PAYMENT_MEAN | PREV_AMT_GOODS_PRICE_MIN | PREV_AMT_GOODS_PRICE_MAX | PREV_AMT_GOODS_PRICE_MEAN | PREV_HOUR_APPR_PROCESS_START_MIN | PREV_HOUR_APPR_PROCESS_START_MAX | PREV_HOUR_APPR_PROCESS_START_MEAN | PREV_RATE_DOWN_PAYMENT_MIN | PREV_RATE_DOWN_PAYMENT_MAX | PREV_RATE_DOWN_PAYMENT_MEAN | PREV_DAYS_DECISION_MIN | PREV_DAYS_DECISION_MAX | PREV_DAYS_DECISION_MEAN | PREV_CNT_PAYMENT_MEAN | PREV_CNT_PAYMENT_SUM | PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN | PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN | PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN | PREV_NAME_CONTRACT_TYPE_XNA_MEAN | PREV_NAME_CONTRACT_TYPE_nan_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN | PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN | PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN | PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN | ... | APPROVED_APP_CREDIT_PERC_MAX | APPROVED_APP_CREDIT_PERC_MEAN | APPROVED_APP_CREDIT_PERC_VAR | APPROVED_AMT_DOWN_PAYMENT_MIN | APPROVED_AMT_DOWN_PAYMENT_MAX | APPROVED_AMT_DOWN_PAYMENT_MEAN | APPROVED_AMT_GOODS_PRICE_MIN | APPROVED_AMT_GOODS_PRICE_MAX | APPROVED_AMT_GOODS_PRICE_MEAN | APPROVED_HOUR_APPR_PROCESS_START_MIN | APPROVED_HOUR_APPR_PROCESS_START_MAX | APPROVED_HOUR_APPR_PROCESS_START_MEAN | APPROVED_RATE_DOWN_PAYMENT_MIN | APPROVED_RATE_DOWN_PAYMENT_MAX | APPROVED_RATE_DOWN_PAYMENT_MEAN | APPROVED_DAYS_DECISION_MIN | APPROVED_DAYS_DECISION_MAX | APPROVED_DAYS_DECISION_MEAN | APPROVED_CNT_PAYMENT_MEAN | APPROVED_CNT_PAYMENT_SUM | REFUSED_AMT_ANNUITY_MIN | REFUSED_AMT_ANNUITY_MAX | REFUSED_AMT_ANNUITY_MEAN | REFUSED_AMT_APPLICATION_MIN | REFUSED_AMT_APPLICATION_MAX | REFUSED_AMT_APPLICATION_MEAN | REFUSED_AMT_CREDIT_MIN | REFUSED_AMT_CREDIT_MAX | REFUSED_AMT_CREDIT_MEAN | REFUSED_APP_CREDIT_PERC_MIN | REFUSED_APP_CREDIT_PERC_MAX | REFUSED_APP_CREDIT_PERC_MEAN | REFUSED_APP_CREDIT_PERC_VAR | REFUSED_AMT_DOWN_PAYMENT_MIN | REFUSED_AMT_DOWN_PAYMENT_MAX | REFUSED_AMT_DOWN_PAYMENT_MEAN | REFUSED_AMT_GOODS_PRICE_MIN | REFUSED_AMT_GOODS_PRICE_MAX | REFUSED_AMT_GOODS_PRICE_MEAN | REFUSED_HOUR_APPR_PROCESS_START_MIN | REFUSED_HOUR_APPR_PROCESS_START_MAX | REFUSED_HOUR_APPR_PROCESS_START_MEAN | REFUSED_RATE_DOWN_PAYMENT_MIN | REFUSED_RATE_DOWN_PAYMENT_MAX | REFUSED_RATE_DOWN_PAYMENT_MEAN | REFUSED_DAYS_DECISION_MIN | REFUSED_DAYS_DECISION_MAX | REFUSED_DAYS_DECISION_MEAN | REFUSED_CNT_PAYMENT_MEAN | REFUSED_CNT_PAYMENT_SUM | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
SK_ID_CURR | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
100001 | 3951.000 | 3951.000 | 3951.000 | 24835.5 | 24835.5 | 24835.50 | 23787.0 | 23787.0 | 23787.00 | 1.044079 | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13 | 13 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740 | -1740 | -1740.0 | 8.0 | 8.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.044079 | 1.044079 | NaN | 2520.0 | 2520.0 | 2520.0 | 24835.5 | 24835.5 | 24835.5 | 13.0 | 13.0 | 13.000000 | 0.104326 | 0.104326 | 0.104326 | -1740.0 | -1740.0 | -1740.0 | 8.0 | 8.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
100002 | 9251.775 | 9251.775 | 9251.775 | 179055.0 | 179055.0 | 179055.00 | 179055.0 | 179055.0 | 179055.00 | 1.000000 | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9 | 9 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606 | -606 | -606.0 | 24.0 | 24.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 0.000000 | 0.0 | 1.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.000000 | 1.000000 | NaN | 0.0 | 0.0 | 0.0 | 179055.0 | 179055.0 | 179055.0 | 9.0 | 9.0 | 9.000000 | 0.000000 | 0.000000 | 0.000000 | -606.0 | -606.0 | -606.0 | 24.0 | 24.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
100003 | 6737.310 | 98356.995 | 56553.990 | 68809.5 | 900000.0 | 435436.50 | 68053.5 | 1035882.0 | 484191.00 | 0.868825 | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12 | 17 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341 | -746 | -1305.0 | 10.0 | 30.0 | 0.333333 | 0.666667 | 0.0 | 0.0 | 0 | 0.333333 | 0.0 | 0.333333 | 0.333333 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.011109 | 0.949329 | 0.005324 | 0.0 | 6885.0 | 3442.5 | 68809.5 | 900000.0 | 435436.5 | 12.0 | 17.0 | 14.666667 | 0.000000 | 0.100061 | 0.050030 | -2341.0 | -746.0 | -1305.0 | 10.0 | 30.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
100004 | 5357.250 | 5357.250 | 5357.250 | 24282.0 | 24282.0 | 24282.00 | 20106.0 | 20106.0 | 20106.00 | 1.207699 | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5 | 5 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815 | -815 | -815.0 | 4.0 | 4.0 | 0.000000 | 1.000000 | 0.0 | 0.0 | 0 | 1.000000 | 0.0 | 0.000000 | 0.000000 | 0.0 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.207699 | 1.207699 | NaN | 4860.0 | 4860.0 | 4860.0 | 24282.0 | 24282.0 | 24282.0 | 5.0 | 5.0 | 5.000000 | 0.212008 | 0.212008 | 0.212008 | -815.0 | -815.0 | -815.0 | 4.0 | 4.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
100005 | 4813.200 | 4813.200 | 4813.200 | 0.0 | 44617.5 | 22308.75 | 0.0 | 40153.5 | 20076.75 | 1.111173 | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 10 | 11 | 10.500000 | 0.108964 | 0.108964 | 0.108964 | -757 | -315 | -536.0 | 12.0 | 12.0 | 0.500000 | 0.500000 | 0.0 | 0.0 | 0 | 0.500000 | 0.0 | 0.000000 | 0.000000 | 0.5 | 0.0 | 0.0 | 0 | 0.0 | 1.0 | 0 | 0.0 | 0.0 | 0.0 | 0.0 | ... | 1.111173 | 1.111173 | NaN | 4464.0 | 4464.0 | 4464.0 | 44617.5 | 44617.5 | 44617.5 | 11.0 | 11.0 | 11.000000 | 0.108964 | 0.108964 | 0.108964 | -757.0 | -757.0 | -757.0 | 12.0 | 12.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
5 rows × 249 columns
# 8.0 Save the results for subsequent use:
prev_agg.to_csv("processed_prev_agg.csv.zip", compression = "zip")
####################