Previous Applications

About the data

previous_application: This dataset has details of previous applications made by clients to Home Credit. Only those clients find place here who also exist in application data. Each current loan in the application data (identified by SK_ID_CURR) can have multiple previous loan applications. Each previous application has one row and is identified by the feature SK_ID_PREV.

Feature Explanations

SK_ID_PREV : ID of previous credit in Home credit related to loan in our sample. (One loan in our sample can have 0,1,2 or more previous loan applications in Home Credit, previous application could, but not necessarily have to lead to credit)
SK_ID_CURR: ID of loan in our sample
NAME_CONTRACT_TYPE: Contract product type (Cash loan, consumer loan [POS] ,...) of the previous application
AMT_ANNUITY: Annuity of previous application
AMT_APPLICATION: For how much credit did client ask on the previous application
AMT_CREDIT: Final credit amount on the previous application. This differs from AMT_APPLICATION in a way that the AMT_APPLICATION is the amount for which the client initially applied for, but during our approval process he could have received different amount - AMT_CREDIT
AMT_DOWN_PAYMENT: Down payment on the previous application
AMT_GOODS_PRICE: Goods price of good that client asked for (if applicable) on the previous application
WEEKDAY_APPR_PROCESS_START: On which day of the week did the client apply for previous application
HOUR_APPR_PROCESS_START: Approximately at what day hour did the client apply for the previous application
FLAG_LAST_APPL_PER_CONTRACT: Flag if it was last application for the previous contract. Sometimes by mistake of client or our clerk there could be more applications for one single contract
NFLAG_LAST_APPL_IN_DAY: Flag if the application was the last application per day of the client. Sometimes clients apply for more applications a day. Rarely it could also be error in our system that one application is in the database twice
NFLAG_MICRO_CASH: Flag Micro finance loan
RATE_DOWN_PAYMENT: Down payment rate normalized on previous credit
RATE_INTEREST_PRIMARY: Interest rate normalized on previous credit
RATE_INTEREST_PRIVILEGED: Interest rate normalized on previous credit
NAME_CASH_LOAN_PURPOSE: Purpose of the cash loan
NAME_CONTRACT_STATUS: Contract status (approved, cancelled, ...) of previous application
DAYS_DECISION: Relative to current application when was the decision about previous application made
NAME_PAYMENT_TYPE: Payment method that client chose to pay for the previous application
CODE_REJECT_REASON: Why was the previous application rejected
NAME_TYPE_SUITE: Who accompanied client when applying for the previous application
NAME_CLIENT_TYPE: Was the client old or new client when applying for the previous application
NAME_GOODS_CATEGORY: What kind of goods did the client apply for in the previous application
NAME_PORTFOLIO: Was the previous application for CASH, POS, CAR, …
NAME_PRODUCT_TYPE: Was the previous application x-sell o walk-in
CHANNEL_TYPE: Through which channel we acquired the client on the previous application
SELLERPLACE_AREA: Selling area of seller place of the previous application
NAME_SELLER_INDUSTRY: The industry of the seller
CNT_PAYMENT: Term of previous credit at application of the previous application
NAME_YIELD_GROUP: Grouped interest rate into small medium and high of the previous application
PRODUCT_COMBINATION: Detailed product combination of the previous application
DAYS_FIRST_DRAWING: Relative to application date of current application when was the first disbursement of the previous application
DAYS_FIRST_DUE: Relative to application date of current application when was the first due supposed to be of the previous application
DAYS_LAST_DUE_1ST_VERSION: Relative to application date of current application when was the first due of the previous application
DAYS_LAST_DUE: Relative to application date of current application when was the last due date of the previous application
DAYS_TERMINATION: Relative to application date of current application when was the expected termination of the previous application
NFLAG_INSURED_ON_APPROVAL: Did the client requested insurance during the previous application

# Last amended: 24rd October, 2020
# Myfolder: C:\Users\Administrator\OneDrive\Documents\home_credit_default_risk
# Objective: 
#           Solving Kaggle problem: Home Credit Default Risk
#           Processing previous_application dataset
#
# Data Source: https://www.kaggle.com/c/home-credit-default-risk/data
# Ref: https://www.kaggle.com/jsaguiar/lightgbm-with-simple-features
# 1.0 Libraries
#     (Some of these may not be needed here.)
%reset -f
import numpy as np
import pandas as pd
import gc

# 1.1 Reduce read data size
#     There is a file reducing.py
#      in this folder. A class
#       in it is used to reduce
#        dataframe size
#     (Code modified by me to
#      exclude 'category' dtype)
import reducing

# 1.2 Misc
import warnings
import os
warnings.simplefilter(action='ignore', category=FutureWarning)
# 1.3
pd.set_option('display.max_colwidth', -1)
pd.set_option('display.max_columns', 100)
# 1.4 Display multiple commands outputs from a cell
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
# 2.0 Onehot encoding (OHE) function. Uses pd.get_dummies()
#     i) To transform 'object' columns to dummies. 
#    ii) Treat NaN as one of the categories
#   iii) Returns transformed-data and new-columns created

def one_hot_encoder(df, nan_as_category = True):
    original_columns = list(df.columns)
    categorical_columns = [col for col in df.columns if df[col].dtype == 'object']
    df = pd.get_dummies(df,
                        columns= categorical_columns,
                        dummy_na= nan_as_category       # Treat NaNs as category
                       )
    new_columns = [c for c in df.columns if c not in original_columns]
    return df, new_columns
# 2.1
pathToFolder = "C:\\Users\\Administrator\\OneDrive\\Documents\\home_credit_default_risk"
os.chdir(pathToFolder)
# 2.2 Some constants
num_rows=None                # Implies read all rows
nan_as_category = True       # While transforming 
                             #   'object' columns to dummies
# 3.0 Read previous application data first
prev = pd.read_csv(
                   'previous_application.csv.zip',
                   nrows = num_rows
                   )

# 3.0.1 Reduce memory usage by appropriately
#       changing data-types per feature:

prev = reducing.Reducer().reduce(prev)
reduced df from 471.4808 MB to 414.1386 MB in 3.82 seconds
# 3.0.2
prev.shape             # (rows=16,70,214, cols = 37)
prev.head(5)
prev.columns
(1670214, 37)
SK_ID_PREV SK_ID_CURR NAME_CONTRACT_TYPE AMT_ANNUITY AMT_APPLICATION AMT_CREDIT AMT_DOWN_PAYMENT AMT_GOODS_PRICE WEEKDAY_APPR_PROCESS_START HOUR_APPR_PROCESS_START FLAG_LAST_APPL_PER_CONTRACT NFLAG_LAST_APPL_IN_DAY RATE_DOWN_PAYMENT RATE_INTEREST_PRIMARY RATE_INTEREST_PRIVILEGED NAME_CASH_LOAN_PURPOSE NAME_CONTRACT_STATUS DAYS_DECISION NAME_PAYMENT_TYPE CODE_REJECT_REASON NAME_TYPE_SUITE NAME_CLIENT_TYPE NAME_GOODS_CATEGORY NAME_PORTFOLIO NAME_PRODUCT_TYPE CHANNEL_TYPE SELLERPLACE_AREA NAME_SELLER_INDUSTRY CNT_PAYMENT NAME_YIELD_GROUP PRODUCT_COMBINATION DAYS_FIRST_DRAWING DAYS_FIRST_DUE DAYS_LAST_DUE_1ST_VERSION DAYS_LAST_DUE DAYS_TERMINATION NFLAG_INSURED_ON_APPROVAL
0 2030495 271877 Consumer loans 1730.430 17145.0 17145.0 0.0 17145.0 SATURDAY 15 Y 1 0.0 0.182832 0.867336 XAP Approved -73 Cash through the bank XAP NaN Repeater Mobile POS XNA Country-wide 35 Connectivity 12.0 middle POS mobile with interest 365243.0 -42.0 300.0 -42.0 -37.0 0.0
1 2802425 108129 Cash loans 25188.615 607500.0 679671.0 NaN 607500.0 THURSDAY 11 Y 1 NaN NaN NaN XNA Approved -164 XNA XAP Unaccompanied Repeater XNA Cash x-sell Contact center -1 XNA 36.0 low_action Cash X-Sell: low 365243.0 -134.0 916.0 365243.0 365243.0 1.0
2 2523466 122040 Cash loans 15060.735 112500.0 136444.5 NaN 112500.0 TUESDAY 11 Y 1 NaN NaN NaN XNA Approved -301 Cash through the bank XAP Spouse, partner Repeater XNA Cash x-sell Credit and cash offices -1 XNA 12.0 high Cash X-Sell: high 365243.0 -271.0 59.0 365243.0 365243.0 1.0
3 2819243 176158 Cash loans 47041.335 450000.0 470790.0 NaN 450000.0 MONDAY 7 Y 1 NaN NaN NaN XNA Approved -512 Cash through the bank XAP NaN Repeater XNA Cash x-sell Credit and cash offices -1 XNA 12.0 middle Cash X-Sell: middle 365243.0 -482.0 -152.0 -182.0 -177.0 1.0
4 1784265 202054 Cash loans 31924.395 337500.0 404055.0 NaN 337500.0 THURSDAY 9 Y 1 NaN NaN NaN Repairs Refused -781 Cash through the bank HC NaN Repeater XNA Cash walk-in Credit and cash offices -1 XNA 24.0 high Cash Street: high NaN NaN NaN NaN NaN NaN
Index(['SK_ID_PREV', 'SK_ID_CURR', 'NAME_CONTRACT_TYPE', 'AMT_ANNUITY',
       'AMT_APPLICATION', 'AMT_CREDIT', 'AMT_DOWN_PAYMENT', 'AMT_GOODS_PRICE',
       'WEEKDAY_APPR_PROCESS_START', 'HOUR_APPR_PROCESS_START',
       'FLAG_LAST_APPL_PER_CONTRACT', 'NFLAG_LAST_APPL_IN_DAY',
       'RATE_DOWN_PAYMENT', 'RATE_INTEREST_PRIMARY',
       'RATE_INTEREST_PRIVILEGED', 'NAME_CASH_LOAN_PURPOSE',
       'NAME_CONTRACT_STATUS', 'DAYS_DECISION', 'NAME_PAYMENT_TYPE',
       'CODE_REJECT_REASON', 'NAME_TYPE_SUITE', 'NAME_CLIENT_TYPE',
       'NAME_GOODS_CATEGORY', 'NAME_PORTFOLIO', 'NAME_PRODUCT_TYPE',
       'CHANNEL_TYPE', 'SELLERPLACE_AREA', 'NAME_SELLER_INDUSTRY',
       'CNT_PAYMENT', 'NAME_YIELD_GROUP', 'PRODUCT_COMBINATION',
       'DAYS_FIRST_DRAWING', 'DAYS_FIRST_DUE', 'DAYS_LAST_DUE_1ST_VERSION',
       'DAYS_LAST_DUE', 'DAYS_TERMINATION', 'NFLAG_INSURED_ON_APPROVAL'],
      dtype='object')
# 3.1 Let us examine how many unique IDs exist 

prev['SK_ID_PREV'].nunique()   # 1670214 Unique number
prev['SK_ID_CURR'].nunique()   # 338857  So a number of repeat exist
                               # We have to aggregate over it
                               #  to extract behaviour of clients
1670214
338857
# 3.2 Let us see distribution of dtypes
#     There are 16 'object' types here

prev.dtypes.value_counts()
object     16
float64    14
uint8      2 
uint32     2 
float32    1 
int16      1 
int32      1 
dtype: int64
# 3.3
prev.shape                       # (1670214, 37)

# 3.3.1
# What is the actual number of persons
#  who might have taken multiple loans?

prev['SK_ID_CURR'].nunique()     # 338857  -- Many duplicate values exist
                                   #            Consider SK_ID_CURR as Foreign Key
                                   #            Primary key exists in application_train data
                                   # Primary key: SK_ID_BUREAU
            
# 3.3.2
# As expected, there are no duplicate values here

prev['SK_ID_PREV'].nunique()   # 1670214 -- Unique id for each row 
(1670214, 37)
338857
1670214
# 4.0 OneHotEncode (OHE) 'object' types in bureau

prev, cat_cols = one_hot_encoder(
                                 prev,
                                 nan_as_category= True
                                 )
# 4.1

len(cat_cols)      # 159
cat_cols
159
['NAME_CONTRACT_TYPE_Cash loans',
 'NAME_CONTRACT_TYPE_Consumer loans',
 'NAME_CONTRACT_TYPE_Revolving loans',
 'NAME_CONTRACT_TYPE_XNA',
 'NAME_CONTRACT_TYPE_nan',
 'WEEKDAY_APPR_PROCESS_START_FRIDAY',
 'WEEKDAY_APPR_PROCESS_START_MONDAY',
 'WEEKDAY_APPR_PROCESS_START_SATURDAY',
 'WEEKDAY_APPR_PROCESS_START_SUNDAY',
 'WEEKDAY_APPR_PROCESS_START_THURSDAY',
 'WEEKDAY_APPR_PROCESS_START_TUESDAY',
 'WEEKDAY_APPR_PROCESS_START_WEDNESDAY',
 'WEEKDAY_APPR_PROCESS_START_nan',
 'FLAG_LAST_APPL_PER_CONTRACT_N',
 'FLAG_LAST_APPL_PER_CONTRACT_Y',
 'FLAG_LAST_APPL_PER_CONTRACT_nan',
 'NAME_CASH_LOAN_PURPOSE_Building a house or an annex',
 'NAME_CASH_LOAN_PURPOSE_Business development',
 'NAME_CASH_LOAN_PURPOSE_Buying a garage',
 'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land',
 'NAME_CASH_LOAN_PURPOSE_Buying a home',
 'NAME_CASH_LOAN_PURPOSE_Buying a new car',
 'NAME_CASH_LOAN_PURPOSE_Buying a used car',
 'NAME_CASH_LOAN_PURPOSE_Car repairs',
 'NAME_CASH_LOAN_PURPOSE_Education',
 'NAME_CASH_LOAN_PURPOSE_Everyday expenses',
 'NAME_CASH_LOAN_PURPOSE_Furniture',
 'NAME_CASH_LOAN_PURPOSE_Gasification / water supply',
 'NAME_CASH_LOAN_PURPOSE_Hobby',
 'NAME_CASH_LOAN_PURPOSE_Journey',
 'NAME_CASH_LOAN_PURPOSE_Medicine',
 'NAME_CASH_LOAN_PURPOSE_Money for a third person',
 'NAME_CASH_LOAN_PURPOSE_Other',
 'NAME_CASH_LOAN_PURPOSE_Payments on other loans',
 'NAME_CASH_LOAN_PURPOSE_Purchase of electronic equipment',
 'NAME_CASH_LOAN_PURPOSE_Refusal to name the goal',
 'NAME_CASH_LOAN_PURPOSE_Repairs',
 'NAME_CASH_LOAN_PURPOSE_Urgent needs',
 'NAME_CASH_LOAN_PURPOSE_Wedding / gift / holiday',
 'NAME_CASH_LOAN_PURPOSE_XAP',
 'NAME_CASH_LOAN_PURPOSE_XNA',
 'NAME_CASH_LOAN_PURPOSE_nan',
 'NAME_CONTRACT_STATUS_Approved',
 'NAME_CONTRACT_STATUS_Canceled',
 'NAME_CONTRACT_STATUS_Refused',
 'NAME_CONTRACT_STATUS_Unused offer',
 'NAME_CONTRACT_STATUS_nan',
 'NAME_PAYMENT_TYPE_Cash through the bank',
 'NAME_PAYMENT_TYPE_Cashless from the account of the employer',
 'NAME_PAYMENT_TYPE_Non-cash from your account',
 'NAME_PAYMENT_TYPE_XNA',
 'NAME_PAYMENT_TYPE_nan',
 'CODE_REJECT_REASON_CLIENT',
 'CODE_REJECT_REASON_HC',
 'CODE_REJECT_REASON_LIMIT',
 'CODE_REJECT_REASON_SCO',
 'CODE_REJECT_REASON_SCOFR',
 'CODE_REJECT_REASON_SYSTEM',
 'CODE_REJECT_REASON_VERIF',
 'CODE_REJECT_REASON_XAP',
 'CODE_REJECT_REASON_XNA',
 'CODE_REJECT_REASON_nan',
 'NAME_TYPE_SUITE_Children',
 'NAME_TYPE_SUITE_Family',
 'NAME_TYPE_SUITE_Group of people',
 'NAME_TYPE_SUITE_Other_A',
 'NAME_TYPE_SUITE_Other_B',
 'NAME_TYPE_SUITE_Spouse, partner',
 'NAME_TYPE_SUITE_Unaccompanied',
 'NAME_TYPE_SUITE_nan',
 'NAME_CLIENT_TYPE_New',
 'NAME_CLIENT_TYPE_Refreshed',
 'NAME_CLIENT_TYPE_Repeater',
 'NAME_CLIENT_TYPE_XNA',
 'NAME_CLIENT_TYPE_nan',
 'NAME_GOODS_CATEGORY_Additional Service',
 'NAME_GOODS_CATEGORY_Animals',
 'NAME_GOODS_CATEGORY_Audio/Video',
 'NAME_GOODS_CATEGORY_Auto Accessories',
 'NAME_GOODS_CATEGORY_Clothing and Accessories',
 'NAME_GOODS_CATEGORY_Computers',
 'NAME_GOODS_CATEGORY_Construction Materials',
 'NAME_GOODS_CATEGORY_Consumer Electronics',
 'NAME_GOODS_CATEGORY_Direct Sales',
 'NAME_GOODS_CATEGORY_Education',
 'NAME_GOODS_CATEGORY_Fitness',
 'NAME_GOODS_CATEGORY_Furniture',
 'NAME_GOODS_CATEGORY_Gardening',
 'NAME_GOODS_CATEGORY_Homewares',
 'NAME_GOODS_CATEGORY_House Construction',
 'NAME_GOODS_CATEGORY_Insurance',
 'NAME_GOODS_CATEGORY_Jewelry',
 'NAME_GOODS_CATEGORY_Medical Supplies',
 'NAME_GOODS_CATEGORY_Medicine',
 'NAME_GOODS_CATEGORY_Mobile',
 'NAME_GOODS_CATEGORY_Office Appliances',
 'NAME_GOODS_CATEGORY_Other',
 'NAME_GOODS_CATEGORY_Photo / Cinema Equipment',
 'NAME_GOODS_CATEGORY_Sport and Leisure',
 'NAME_GOODS_CATEGORY_Tourism',
 'NAME_GOODS_CATEGORY_Vehicles',
 'NAME_GOODS_CATEGORY_Weapon',
 'NAME_GOODS_CATEGORY_XNA',
 'NAME_GOODS_CATEGORY_nan',
 'NAME_PORTFOLIO_Cards',
 'NAME_PORTFOLIO_Cars',
 'NAME_PORTFOLIO_Cash',
 'NAME_PORTFOLIO_POS',
 'NAME_PORTFOLIO_XNA',
 'NAME_PORTFOLIO_nan',
 'NAME_PRODUCT_TYPE_XNA',
 'NAME_PRODUCT_TYPE_walk-in',
 'NAME_PRODUCT_TYPE_x-sell',
 'NAME_PRODUCT_TYPE_nan',
 'CHANNEL_TYPE_AP+ (Cash loan)',
 'CHANNEL_TYPE_Car dealer',
 'CHANNEL_TYPE_Channel of corporate sales',
 'CHANNEL_TYPE_Contact center',
 'CHANNEL_TYPE_Country-wide',
 'CHANNEL_TYPE_Credit and cash offices',
 'CHANNEL_TYPE_Regional / Local',
 'CHANNEL_TYPE_Stone',
 'CHANNEL_TYPE_nan',
 'NAME_SELLER_INDUSTRY_Auto technology',
 'NAME_SELLER_INDUSTRY_Clothing',
 'NAME_SELLER_INDUSTRY_Connectivity',
 'NAME_SELLER_INDUSTRY_Construction',
 'NAME_SELLER_INDUSTRY_Consumer electronics',
 'NAME_SELLER_INDUSTRY_Furniture',
 'NAME_SELLER_INDUSTRY_Industry',
 'NAME_SELLER_INDUSTRY_Jewelry',
 'NAME_SELLER_INDUSTRY_MLM partners',
 'NAME_SELLER_INDUSTRY_Tourism',
 'NAME_SELLER_INDUSTRY_XNA',
 'NAME_SELLER_INDUSTRY_nan',
 'NAME_YIELD_GROUP_XNA',
 'NAME_YIELD_GROUP_high',
 'NAME_YIELD_GROUP_low_action',
 'NAME_YIELD_GROUP_low_normal',
 'NAME_YIELD_GROUP_middle',
 'NAME_YIELD_GROUP_nan',
 'PRODUCT_COMBINATION_Card Street',
 'PRODUCT_COMBINATION_Card X-Sell',
 'PRODUCT_COMBINATION_Cash',
 'PRODUCT_COMBINATION_Cash Street: high',
 'PRODUCT_COMBINATION_Cash Street: low',
 'PRODUCT_COMBINATION_Cash Street: middle',
 'PRODUCT_COMBINATION_Cash X-Sell: high',
 'PRODUCT_COMBINATION_Cash X-Sell: low',
 'PRODUCT_COMBINATION_Cash X-Sell: middle',
 'PRODUCT_COMBINATION_POS household with interest',
 'PRODUCT_COMBINATION_POS household without interest',
 'PRODUCT_COMBINATION_POS industry with interest',
 'PRODUCT_COMBINATION_POS industry without interest',
 'PRODUCT_COMBINATION_POS mobile with interest',
 'PRODUCT_COMBINATION_POS mobile without interest',
 'PRODUCT_COMBINATION_POS other with interest',
 'PRODUCT_COMBINATION_POS others without interest',
 'PRODUCT_COMBINATION_nan']
# 4.2.1 Just examine NULLs in few features
prev['DAYS_FIRST_DRAWING'].isnull().sum()     # 673065
# 4.2.2 And also this special constant value: 365243
(prev['DAYS_FIRST_DRAWING'] == 365243).sum()  # 934444

prev['DAYS_FIRST_DUE'].isnull().sum()         # 673065
(prev['DAYS_FIRST_DUE'] == 365243).sum()      #  40645

prev['DAYS_LAST_DUE'].isnull().sum()          # 673065
(prev['DAYS_LAST_DUE'] == 365243).sum()       # 211221

prev['DAYS_TERMINATION'].isnull().sum()       # 673065
(prev['DAYS_TERMINATION']== 365243).sum()     # 225913
673065
934444
673065
40645
673065
211221
673065
225913
# 4.3 Examine total number of unique values
#     in each one of the above four features

prev['DAYS_FIRST_DRAWING'].nunique()     # 2838
prev['DAYS_FIRST_DRAWING'].sort_values(ascending = False)[:5]
prev['DAYS_FIRST_DUE'].nunique()         # 2892
prev['DAYS_LAST_DUE'].nunique()          # 2873
prev['DAYS_TERMINATION'].nunique()       # 2830
2838
1670213    365243.0
572469     365243.0
572494     365243.0
572491     365243.0
572489     365243.0
Name: DAYS_FIRST_DRAWING, dtype: float64
2892
2873
2830
# 4.4 Convert Days 365243 values to nan

prev['DAYS_FIRST_DRAWING'].replace(365243, np.nan, inplace= True)
prev['DAYS_FIRST_DUE'].replace(365243, np.nan, inplace= True)
prev['DAYS_LAST_DUE_1ST_VERSION'].replace(365243, np.nan, inplace= True)
prev['DAYS_LAST_DUE'].replace(365243, np.nan, inplace= True)
prev['DAYS_TERMINATION'].replace(365243, np.nan, inplace= True)
# 4.5 So how many NULLS now exist in each one of
#     these four features:

prev['DAYS_FIRST_DRAWING'].isnull().sum()     # 1607509
prev['DAYS_FIRST_DUE'].isnull().sum()         #  713710
prev['DAYS_LAST_DUE'].isnull().sum()          #  884286
prev['DAYS_TERMINATION'].isnull().sum()       #  898978
1607509
713710
884286
898978

Perform aggregations

On the whole of dataset, perform aggregations for numerical features and perform aggregations on just created OHE features. Numerical features are being aggregated as: min, max, mean.. while OHE features aggregation is just 'mean'.
# 5.0 One special feature
#     Add feature: value ask / value received percentage

prev['APP_CREDIT_PERC'] = prev['AMT_APPLICATION'] / prev['AMT_CREDIT']

# 5.1 Numeric features aggregations:
#     Dictionary of what all operations are to 
#     performed on numerical features:

num_aggregations = {
                     'AMT_ANNUITY':             ['min', 'max', 'mean'],
                     'AMT_APPLICATION':         ['min', 'max', 'mean'],
                     'AMT_CREDIT':              ['min', 'max', 'mean'],
                     'APP_CREDIT_PERC':         ['min', 'max', 'mean', 'var'],
                     'AMT_DOWN_PAYMENT':        ['min', 'max', 'mean'],
                     'AMT_GOODS_PRICE':         ['min', 'max', 'mean'],
                     'HOUR_APPR_PROCESS_START': ['min', 'max', 'mean'],
                     'RATE_DOWN_PAYMENT':       ['min', 'max', 'mean'],
                     'DAYS_DECISION':           ['min', 'max', 'mean'],
                     'CNT_PAYMENT':             ['mean', 'sum'],
                    }
# 5.2 Categorical features
#     Create a dictionary for aggregation operations:

cat_aggregations = {}
for cat in cat_cols:
    cat_aggregations[cat] = ['mean']

# 5.2.1    
cat_aggregations    
{'NAME_CONTRACT_TYPE_Cash loans': ['mean'],
 'NAME_CONTRACT_TYPE_Consumer loans': ['mean'],
 'NAME_CONTRACT_TYPE_Revolving loans': ['mean'],
 'NAME_CONTRACT_TYPE_XNA': ['mean'],
 'NAME_CONTRACT_TYPE_nan': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_FRIDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_MONDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_SATURDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_SUNDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_THURSDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_TUESDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_WEDNESDAY': ['mean'],
 'WEEKDAY_APPR_PROCESS_START_nan': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_N': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_Y': ['mean'],
 'FLAG_LAST_APPL_PER_CONTRACT_nan': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Building a house or an annex': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Business development': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a garage': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a home': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a new car': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Buying a used car': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Car repairs': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Education': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Everyday expenses': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Furniture': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Gasification / water supply': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Hobby': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Journey': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Medicine': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Money for a third person': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Other': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Payments on other loans': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Purchase of electronic equipment': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Refusal to name the goal': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Repairs': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Urgent needs': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_Wedding / gift / holiday': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_XAP': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_XNA': ['mean'],
 'NAME_CASH_LOAN_PURPOSE_nan': ['mean'],
 'NAME_CONTRACT_STATUS_Approved': ['mean'],
 'NAME_CONTRACT_STATUS_Canceled': ['mean'],
 'NAME_CONTRACT_STATUS_Refused': ['mean'],
 'NAME_CONTRACT_STATUS_Unused offer': ['mean'],
 'NAME_CONTRACT_STATUS_nan': ['mean'],
 'NAME_PAYMENT_TYPE_Cash through the bank': ['mean'],
 'NAME_PAYMENT_TYPE_Cashless from the account of the employer': ['mean'],
 'NAME_PAYMENT_TYPE_Non-cash from your account': ['mean'],
 'NAME_PAYMENT_TYPE_XNA': ['mean'],
 'NAME_PAYMENT_TYPE_nan': ['mean'],
 'CODE_REJECT_REASON_CLIENT': ['mean'],
 'CODE_REJECT_REASON_HC': ['mean'],
 'CODE_REJECT_REASON_LIMIT': ['mean'],
 'CODE_REJECT_REASON_SCO': ['mean'],
 'CODE_REJECT_REASON_SCOFR': ['mean'],
 'CODE_REJECT_REASON_SYSTEM': ['mean'],
 'CODE_REJECT_REASON_VERIF': ['mean'],
 'CODE_REJECT_REASON_XAP': ['mean'],
 'CODE_REJECT_REASON_XNA': ['mean'],
 'CODE_REJECT_REASON_nan': ['mean'],
 'NAME_TYPE_SUITE_Children': ['mean'],
 'NAME_TYPE_SUITE_Family': ['mean'],
 'NAME_TYPE_SUITE_Group of people': ['mean'],
 'NAME_TYPE_SUITE_Other_A': ['mean'],
 'NAME_TYPE_SUITE_Other_B': ['mean'],
 'NAME_TYPE_SUITE_Spouse, partner': ['mean'],
 'NAME_TYPE_SUITE_Unaccompanied': ['mean'],
 'NAME_TYPE_SUITE_nan': ['mean'],
 'NAME_CLIENT_TYPE_New': ['mean'],
 'NAME_CLIENT_TYPE_Refreshed': ['mean'],
 'NAME_CLIENT_TYPE_Repeater': ['mean'],
 'NAME_CLIENT_TYPE_XNA': ['mean'],
 'NAME_CLIENT_TYPE_nan': ['mean'],
 'NAME_GOODS_CATEGORY_Additional Service': ['mean'],
 'NAME_GOODS_CATEGORY_Animals': ['mean'],
 'NAME_GOODS_CATEGORY_Audio/Video': ['mean'],
 'NAME_GOODS_CATEGORY_Auto Accessories': ['mean'],
 'NAME_GOODS_CATEGORY_Clothing and Accessories': ['mean'],
 'NAME_GOODS_CATEGORY_Computers': ['mean'],
 'NAME_GOODS_CATEGORY_Construction Materials': ['mean'],
 'NAME_GOODS_CATEGORY_Consumer Electronics': ['mean'],
 'NAME_GOODS_CATEGORY_Direct Sales': ['mean'],
 'NAME_GOODS_CATEGORY_Education': ['mean'],
 'NAME_GOODS_CATEGORY_Fitness': ['mean'],
 'NAME_GOODS_CATEGORY_Furniture': ['mean'],
 'NAME_GOODS_CATEGORY_Gardening': ['mean'],
 'NAME_GOODS_CATEGORY_Homewares': ['mean'],
 'NAME_GOODS_CATEGORY_House Construction': ['mean'],
 'NAME_GOODS_CATEGORY_Insurance': ['mean'],
 'NAME_GOODS_CATEGORY_Jewelry': ['mean'],
 'NAME_GOODS_CATEGORY_Medical Supplies': ['mean'],
 'NAME_GOODS_CATEGORY_Medicine': ['mean'],
 'NAME_GOODS_CATEGORY_Mobile': ['mean'],
 'NAME_GOODS_CATEGORY_Office Appliances': ['mean'],
 'NAME_GOODS_CATEGORY_Other': ['mean'],
 'NAME_GOODS_CATEGORY_Photo / Cinema Equipment': ['mean'],
 'NAME_GOODS_CATEGORY_Sport and Leisure': ['mean'],
 'NAME_GOODS_CATEGORY_Tourism': ['mean'],
 'NAME_GOODS_CATEGORY_Vehicles': ['mean'],
 'NAME_GOODS_CATEGORY_Weapon': ['mean'],
 'NAME_GOODS_CATEGORY_XNA': ['mean'],
 'NAME_GOODS_CATEGORY_nan': ['mean'],
 'NAME_PORTFOLIO_Cards': ['mean'],
 'NAME_PORTFOLIO_Cars': ['mean'],
 'NAME_PORTFOLIO_Cash': ['mean'],
 'NAME_PORTFOLIO_POS': ['mean'],
 'NAME_PORTFOLIO_XNA': ['mean'],
 'NAME_PORTFOLIO_nan': ['mean'],
 'NAME_PRODUCT_TYPE_XNA': ['mean'],
 'NAME_PRODUCT_TYPE_walk-in': ['mean'],
 'NAME_PRODUCT_TYPE_x-sell': ['mean'],
 'NAME_PRODUCT_TYPE_nan': ['mean'],
 'CHANNEL_TYPE_AP+ (Cash loan)': ['mean'],
 'CHANNEL_TYPE_Car dealer': ['mean'],
 'CHANNEL_TYPE_Channel of corporate sales': ['mean'],
 'CHANNEL_TYPE_Contact center': ['mean'],
 'CHANNEL_TYPE_Country-wide': ['mean'],
 'CHANNEL_TYPE_Credit and cash offices': ['mean'],
 'CHANNEL_TYPE_Regional / Local': ['mean'],
 'CHANNEL_TYPE_Stone': ['mean'],
 'CHANNEL_TYPE_nan': ['mean'],
 'NAME_SELLER_INDUSTRY_Auto technology': ['mean'],
 'NAME_SELLER_INDUSTRY_Clothing': ['mean'],
 'NAME_SELLER_INDUSTRY_Connectivity': ['mean'],
 'NAME_SELLER_INDUSTRY_Construction': ['mean'],
 'NAME_SELLER_INDUSTRY_Consumer electronics': ['mean'],
 'NAME_SELLER_INDUSTRY_Furniture': ['mean'],
 'NAME_SELLER_INDUSTRY_Industry': ['mean'],
 'NAME_SELLER_INDUSTRY_Jewelry': ['mean'],
 'NAME_SELLER_INDUSTRY_MLM partners': ['mean'],
 'NAME_SELLER_INDUSTRY_Tourism': ['mean'],
 'NAME_SELLER_INDUSTRY_XNA': ['mean'],
 'NAME_SELLER_INDUSTRY_nan': ['mean'],
 'NAME_YIELD_GROUP_XNA': ['mean'],
 'NAME_YIELD_GROUP_high': ['mean'],
 'NAME_YIELD_GROUP_low_action': ['mean'],
 'NAME_YIELD_GROUP_low_normal': ['mean'],
 'NAME_YIELD_GROUP_middle': ['mean'],
 'NAME_YIELD_GROUP_nan': ['mean'],
 'PRODUCT_COMBINATION_Card Street': ['mean'],
 'PRODUCT_COMBINATION_Card X-Sell': ['mean'],
 'PRODUCT_COMBINATION_Cash': ['mean'],
 'PRODUCT_COMBINATION_Cash Street: high': ['mean'],
 'PRODUCT_COMBINATION_Cash Street: low': ['mean'],
 'PRODUCT_COMBINATION_Cash Street: middle': ['mean'],
 'PRODUCT_COMBINATION_Cash X-Sell: high': ['mean'],
 'PRODUCT_COMBINATION_Cash X-Sell: low': ['mean'],
 'PRODUCT_COMBINATION_Cash X-Sell: middle': ['mean'],
 'PRODUCT_COMBINATION_POS household with interest': ['mean'],
 'PRODUCT_COMBINATION_POS household without interest': ['mean'],
 'PRODUCT_COMBINATION_POS industry with interest': ['mean'],
 'PRODUCT_COMBINATION_POS industry without interest': ['mean'],
 'PRODUCT_COMBINATION_POS mobile with interest': ['mean'],
 'PRODUCT_COMBINATION_POS mobile without interest': ['mean'],
 'PRODUCT_COMBINATION_POS other with interest': ['mean'],
 'PRODUCT_COMBINATION_POS others without interest': ['mean'],
 'PRODUCT_COMBINATION_nan': ['mean']}
# 5.3 Perform aggregation now on SK_ID_CURR:

grouped = prev.groupby('SK_ID_CURR')
prev_agg=grouped.agg({**num_aggregations, **cat_aggregations})
# 5.3.1
prev_agg.shape    # (338857, 189)
prev_agg.columns
prev_agg.head()
(338857, 189)
MultiIndex([(                                       'AMT_ANNUITY',  'min'),
            (                                       'AMT_ANNUITY',  'max'),
            (                                       'AMT_ANNUITY', 'mean'),
            (                                   'AMT_APPLICATION',  'min'),
            (                                   'AMT_APPLICATION',  'max'),
            (                                   'AMT_APPLICATION', 'mean'),
            (                                        'AMT_CREDIT',  'min'),
            (                                        'AMT_CREDIT',  'max'),
            (                                        'AMT_CREDIT', 'mean'),
            (                                   'APP_CREDIT_PERC',  'min'),
            ...
            (           'PRODUCT_COMBINATION_Cash X-Sell: middle', 'mean'),
            (   'PRODUCT_COMBINATION_POS household with interest', 'mean'),
            ('PRODUCT_COMBINATION_POS household without interest', 'mean'),
            (    'PRODUCT_COMBINATION_POS industry with interest', 'mean'),
            ( 'PRODUCT_COMBINATION_POS industry without interest', 'mean'),
            (      'PRODUCT_COMBINATION_POS mobile with interest', 'mean'),
            (   'PRODUCT_COMBINATION_POS mobile without interest', 'mean'),
            (       'PRODUCT_COMBINATION_POS other with interest', 'mean'),
            (   'PRODUCT_COMBINATION_POS others without interest', 'mean'),
            (                           'PRODUCT_COMBINATION_nan', 'mean')],
           length=189)
AMT_ANNUITY AMT_APPLICATION AMT_CREDIT APP_CREDIT_PERC AMT_DOWN_PAYMENT AMT_GOODS_PRICE HOUR_APPR_PROCESS_START RATE_DOWN_PAYMENT DAYS_DECISION CNT_PAYMENT NAME_CONTRACT_TYPE_Cash loans NAME_CONTRACT_TYPE_Consumer loans NAME_CONTRACT_TYPE_Revolving loans NAME_CONTRACT_TYPE_XNA NAME_CONTRACT_TYPE_nan WEEKDAY_APPR_PROCESS_START_FRIDAY WEEKDAY_APPR_PROCESS_START_MONDAY WEEKDAY_APPR_PROCESS_START_SATURDAY WEEKDAY_APPR_PROCESS_START_SUNDAY WEEKDAY_APPR_PROCESS_START_THURSDAY WEEKDAY_APPR_PROCESS_START_TUESDAY WEEKDAY_APPR_PROCESS_START_WEDNESDAY WEEKDAY_APPR_PROCESS_START_nan FLAG_LAST_APPL_PER_CONTRACT_N FLAG_LAST_APPL_PER_CONTRACT_Y FLAG_LAST_APPL_PER_CONTRACT_nan NAME_CASH_LOAN_PURPOSE_Building a house or an annex NAME_CASH_LOAN_PURPOSE_Business development NAME_CASH_LOAN_PURPOSE_Buying a garage NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land ... NAME_PORTFOLIO_nan NAME_PRODUCT_TYPE_XNA NAME_PRODUCT_TYPE_walk-in NAME_PRODUCT_TYPE_x-sell NAME_PRODUCT_TYPE_nan CHANNEL_TYPE_AP+ (Cash loan) CHANNEL_TYPE_Car dealer CHANNEL_TYPE_Channel of corporate sales CHANNEL_TYPE_Contact center CHANNEL_TYPE_Country-wide CHANNEL_TYPE_Credit and cash offices CHANNEL_TYPE_Regional / Local CHANNEL_TYPE_Stone CHANNEL_TYPE_nan NAME_SELLER_INDUSTRY_Auto technology NAME_SELLER_INDUSTRY_Clothing NAME_SELLER_INDUSTRY_Connectivity NAME_SELLER_INDUSTRY_Construction NAME_SELLER_INDUSTRY_Consumer electronics NAME_SELLER_INDUSTRY_Furniture NAME_SELLER_INDUSTRY_Industry NAME_SELLER_INDUSTRY_Jewelry NAME_SELLER_INDUSTRY_MLM partners NAME_SELLER_INDUSTRY_Tourism NAME_SELLER_INDUSTRY_XNA NAME_SELLER_INDUSTRY_nan NAME_YIELD_GROUP_XNA NAME_YIELD_GROUP_high NAME_YIELD_GROUP_low_action NAME_YIELD_GROUP_low_normal NAME_YIELD_GROUP_middle NAME_YIELD_GROUP_nan PRODUCT_COMBINATION_Card Street PRODUCT_COMBINATION_Card X-Sell PRODUCT_COMBINATION_Cash PRODUCT_COMBINATION_Cash Street: high PRODUCT_COMBINATION_Cash Street: low PRODUCT_COMBINATION_Cash Street: middle PRODUCT_COMBINATION_Cash X-Sell: high PRODUCT_COMBINATION_Cash X-Sell: low PRODUCT_COMBINATION_Cash X-Sell: middle PRODUCT_COMBINATION_POS household with interest PRODUCT_COMBINATION_POS household without interest PRODUCT_COMBINATION_POS industry with interest PRODUCT_COMBINATION_POS industry without interest PRODUCT_COMBINATION_POS mobile with interest PRODUCT_COMBINATION_POS mobile without interest PRODUCT_COMBINATION_POS other with interest PRODUCT_COMBINATION_POS others without interest PRODUCT_COMBINATION_nan
min max mean min max mean min max mean min max mean var min max mean min max mean min max mean min max mean min max mean mean sum mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean ... mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean mean
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.50 23787.0 23787.0 23787.00 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 1.000000 0.000000 0.0 0.000000 0 0.0 0.0 1.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 1.0 0.0 0.000000 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 1.0 0.0 0.0 0.0 0.0
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.00 179055.0 179055.0 179055.00 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0 0.000000 1.000000 0.0 0.0 0 0.000000 0.0 1.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.000000 0 1.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 0.0 0.0 1.000000 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 1.0 0.0 0.0
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.50 68053.5 1035882.0 484191.00 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0 0.333333 0.666667 0.0 0.0 0 0.333333 0.0 0.333333 0.333333 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 0.666667 0.0 0.333333 0 0.0 0.0 0.0 0.0 0.333333 0.333333 0.0 0.333333 0 0.0 0.0 0.0 0.0 0.333333 0.333333 0.0 0.0 0.0 0.0 0.333333 0 0.0 0.0 0.0 0.333333 0.666667 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333333 0.0 0.333333 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.00 20106.0 20106.0 20106.00 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.000000 0.000000 1.0 0.000000 0 0.0 0.0 1.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 0.0 0.0 0.000000 1.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 1.0 0.0 0.0 0.0
100005 4813.200 4813.200 4813.200 0.0 44617.5 22308.75 0.0 40153.5 20076.75 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 10 11 10.500000 0.108964 0.108964 0.108964 -757 -315 -536.0 12.0 12.0 0.500000 0.500000 0.0 0.0 0 0.500000 0.0 0.000000 0.000000 0.5 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.500000 0.500000 0.0 0.000000 0 0.0 0.0 0.5 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.500000 0 0.5 0.5 0.0 0.000000 0.000000 0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.5 0.0 0.0 0.0 0.0

5 rows × 189 columns

# 5.4 Rename multiindex columns:

prev_agg.columns = pd.Index(['PREV_' + e[0] + "_" + e[1].upper() for e in prev_agg.columns.tolist()])
# 5.5
prev_agg.shape      # (338857, 189)
prev_agg.columns
prev_agg.head()
(338857, 189)
Index(['PREV_AMT_ANNUITY_MIN', 'PREV_AMT_ANNUITY_MAX', 'PREV_AMT_ANNUITY_MEAN',
       'PREV_AMT_APPLICATION_MIN', 'PREV_AMT_APPLICATION_MAX',
       'PREV_AMT_APPLICATION_MEAN', 'PREV_AMT_CREDIT_MIN',
       'PREV_AMT_CREDIT_MAX', 'PREV_AMT_CREDIT_MEAN',
       'PREV_APP_CREDIT_PERC_MIN',
       ...
       'PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN',
       'PREV_PRODUCT_COMBINATION_POS household with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS household without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS other with interest_MEAN',
       'PREV_PRODUCT_COMBINATION_POS others without interest_MEAN',
       'PREV_PRODUCT_COMBINATION_nan_MEAN'],
      dtype='object', length=189)
PREV_AMT_ANNUITY_MIN PREV_AMT_ANNUITY_MAX PREV_AMT_ANNUITY_MEAN PREV_AMT_APPLICATION_MIN PREV_AMT_APPLICATION_MAX PREV_AMT_APPLICATION_MEAN PREV_AMT_CREDIT_MIN PREV_AMT_CREDIT_MAX PREV_AMT_CREDIT_MEAN PREV_APP_CREDIT_PERC_MIN PREV_APP_CREDIT_PERC_MAX PREV_APP_CREDIT_PERC_MEAN PREV_APP_CREDIT_PERC_VAR PREV_AMT_DOWN_PAYMENT_MIN PREV_AMT_DOWN_PAYMENT_MAX PREV_AMT_DOWN_PAYMENT_MEAN PREV_AMT_GOODS_PRICE_MIN PREV_AMT_GOODS_PRICE_MAX PREV_AMT_GOODS_PRICE_MEAN PREV_HOUR_APPR_PROCESS_START_MIN PREV_HOUR_APPR_PROCESS_START_MAX PREV_HOUR_APPR_PROCESS_START_MEAN PREV_RATE_DOWN_PAYMENT_MIN PREV_RATE_DOWN_PAYMENT_MAX PREV_RATE_DOWN_PAYMENT_MEAN PREV_DAYS_DECISION_MIN PREV_DAYS_DECISION_MAX PREV_DAYS_DECISION_MEAN PREV_CNT_PAYMENT_MEAN PREV_CNT_PAYMENT_SUM PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN PREV_NAME_CONTRACT_TYPE_XNA_MEAN PREV_NAME_CONTRACT_TYPE_nan_MEAN PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN ... PREV_NAME_PORTFOLIO_nan_MEAN PREV_NAME_PRODUCT_TYPE_XNA_MEAN PREV_NAME_PRODUCT_TYPE_walk-in_MEAN PREV_NAME_PRODUCT_TYPE_x-sell_MEAN PREV_NAME_PRODUCT_TYPE_nan_MEAN PREV_CHANNEL_TYPE_AP+ (Cash loan)_MEAN PREV_CHANNEL_TYPE_Car dealer_MEAN PREV_CHANNEL_TYPE_Channel of corporate sales_MEAN PREV_CHANNEL_TYPE_Contact center_MEAN PREV_CHANNEL_TYPE_Country-wide_MEAN PREV_CHANNEL_TYPE_Credit and cash offices_MEAN PREV_CHANNEL_TYPE_Regional / Local_MEAN PREV_CHANNEL_TYPE_Stone_MEAN PREV_CHANNEL_TYPE_nan_MEAN PREV_NAME_SELLER_INDUSTRY_Auto technology_MEAN PREV_NAME_SELLER_INDUSTRY_Clothing_MEAN PREV_NAME_SELLER_INDUSTRY_Connectivity_MEAN PREV_NAME_SELLER_INDUSTRY_Construction_MEAN PREV_NAME_SELLER_INDUSTRY_Consumer electronics_MEAN PREV_NAME_SELLER_INDUSTRY_Furniture_MEAN PREV_NAME_SELLER_INDUSTRY_Industry_MEAN PREV_NAME_SELLER_INDUSTRY_Jewelry_MEAN PREV_NAME_SELLER_INDUSTRY_MLM partners_MEAN PREV_NAME_SELLER_INDUSTRY_Tourism_MEAN PREV_NAME_SELLER_INDUSTRY_XNA_MEAN PREV_NAME_SELLER_INDUSTRY_nan_MEAN PREV_NAME_YIELD_GROUP_XNA_MEAN PREV_NAME_YIELD_GROUP_high_MEAN PREV_NAME_YIELD_GROUP_low_action_MEAN PREV_NAME_YIELD_GROUP_low_normal_MEAN PREV_NAME_YIELD_GROUP_middle_MEAN PREV_NAME_YIELD_GROUP_nan_MEAN PREV_PRODUCT_COMBINATION_Card Street_MEAN PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN PREV_PRODUCT_COMBINATION_Cash_MEAN PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN PREV_PRODUCT_COMBINATION_POS household with interest_MEAN PREV_PRODUCT_COMBINATION_POS household without interest_MEAN PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN PREV_PRODUCT_COMBINATION_POS other with interest_MEAN PREV_PRODUCT_COMBINATION_POS others without interest_MEAN PREV_PRODUCT_COMBINATION_nan_MEAN
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.50 23787.0 23787.0 23787.00 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 1.000000 0.000000 0.0 0.000000 0 0.0 0.0 1.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 1.0 0.0 0.000000 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 1.0 0.0 0.0 0.0 0.0
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.00 179055.0 179055.0 179055.00 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0 0.000000 1.000000 0.0 0.0 0 0.000000 0.0 1.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.000000 0.000000 0.0 1.000000 0 1.0 0.0 0.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 0.0 0.0 1.000000 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 1.0 0.0 0.0
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.50 68053.5 1035882.0 484191.00 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0 0.333333 0.666667 0.0 0.0 0 0.333333 0.0 0.333333 0.333333 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 0.666667 0.0 0.333333 0 0.0 0.0 0.0 0.0 0.333333 0.333333 0.0 0.333333 0 0.0 0.0 0.0 0.0 0.333333 0.333333 0.0 0.0 0.0 0.0 0.333333 0 0.0 0.0 0.0 0.333333 0.666667 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333333 0.0 0.333333 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.00 20106.0 20106.0 20106.00 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.000000 0.000000 1.0 0.000000 0 0.0 0.0 1.0 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.000000 0 0.0 0.0 0.0 0.000000 1.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 1.0 0.0 0.0 0.0
100005 4813.200 4813.200 4813.200 0.0 44617.5 22308.75 0.0 40153.5 20076.75 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 10 11 10.500000 0.108964 0.108964 0.108964 -757 -315 -536.0 12.0 12.0 0.500000 0.500000 0.0 0.0 0 0.500000 0.0 0.000000 0.000000 0.5 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0 1.000000 0.0 0.000000 0 0.0 0.0 0.0 0.0 0.500000 0.500000 0.0 0.000000 0 0.0 0.0 0.5 0.0 0.000000 0.000000 0.0 0.0 0.0 0.0 0.500000 0 0.5 0.5 0.0 0.000000 0.000000 0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.5 0.0 0.0 0.0 0.0

5 rows × 189 columns

More aggregations

Table, prev_agg, from previous operations is our main table that we will carry to next exercise. To this aggregated table, we add more aggregations.
We will perform aggregations on two subsets of data. On both the subsets on numerical features only. One subset is extracted by setting NAME_CONTRACT_STATUS_Approved == 1 and the other subset is extracted by setting NAME_CONTRACT_STATUS_Refused == 1.

It is as if we are trying to extract the behaviour of those whose previous applications have been approved and those whose previous applications have NOT been approved.
# 6.0 Previous Applications: Summarise numerical features from Approved Applications

approved = prev[prev['NAME_CONTRACT_STATUS_Approved'] == 1]
approved_agg = approved.groupby('SK_ID_CURR').agg(num_aggregations)
# 6.1 Look at the aggregated results:

approved_agg.columns
approved_agg.head()
MultiIndex([(            'AMT_ANNUITY',  'min'),
            (            'AMT_ANNUITY',  'max'),
            (            'AMT_ANNUITY', 'mean'),
            (        'AMT_APPLICATION',  'min'),
            (        'AMT_APPLICATION',  'max'),
            (        'AMT_APPLICATION', 'mean'),
            (             'AMT_CREDIT',  'min'),
            (             'AMT_CREDIT',  'max'),
            (             'AMT_CREDIT', 'mean'),
            (        'APP_CREDIT_PERC',  'min'),
            (        'APP_CREDIT_PERC',  'max'),
            (        'APP_CREDIT_PERC', 'mean'),
            (        'APP_CREDIT_PERC',  'var'),
            (       'AMT_DOWN_PAYMENT',  'min'),
            (       'AMT_DOWN_PAYMENT',  'max'),
            (       'AMT_DOWN_PAYMENT', 'mean'),
            (        'AMT_GOODS_PRICE',  'min'),
            (        'AMT_GOODS_PRICE',  'max'),
            (        'AMT_GOODS_PRICE', 'mean'),
            ('HOUR_APPR_PROCESS_START',  'min'),
            ('HOUR_APPR_PROCESS_START',  'max'),
            ('HOUR_APPR_PROCESS_START', 'mean'),
            (      'RATE_DOWN_PAYMENT',  'min'),
            (      'RATE_DOWN_PAYMENT',  'max'),
            (      'RATE_DOWN_PAYMENT', 'mean'),
            (          'DAYS_DECISION',  'min'),
            (          'DAYS_DECISION',  'max'),
            (          'DAYS_DECISION', 'mean'),
            (            'CNT_PAYMENT', 'mean'),
            (            'CNT_PAYMENT',  'sum')],
           )
AMT_ANNUITY AMT_APPLICATION AMT_CREDIT APP_CREDIT_PERC AMT_DOWN_PAYMENT AMT_GOODS_PRICE HOUR_APPR_PROCESS_START RATE_DOWN_PAYMENT DAYS_DECISION CNT_PAYMENT
min max mean min max mean min max mean min max mean var min max mean min max mean min max mean min max mean min max mean mean sum
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.5 23787.0 23787.0 23787.0 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.0 179055.0 179055.0 179055.0 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.5 68053.5 1035882.0 484191.0 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.0 20106.0 20106.0 20106.0 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0
100005 4813.200 4813.200 4813.200 44617.5 44617.5 44617.5 40153.5 40153.5 40153.5 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 11 11 11.000000 0.108964 0.108964 0.108964 -757 -757 -757.0 12.0 12.0
# 6.2 Rename multi-index column names:

approved_agg.columns = pd.Index(['APPROVED_' + e[0] + "_" + e[1].upper() for e in approved_agg.columns.tolist()])
# 6.2.1 Look at it again:

approved_agg.shape     # (337698, 30)
approved_agg.head()
(337698, 30)
APPROVED_AMT_ANNUITY_MIN APPROVED_AMT_ANNUITY_MAX APPROVED_AMT_ANNUITY_MEAN APPROVED_AMT_APPLICATION_MIN APPROVED_AMT_APPLICATION_MAX APPROVED_AMT_APPLICATION_MEAN APPROVED_AMT_CREDIT_MIN APPROVED_AMT_CREDIT_MAX APPROVED_AMT_CREDIT_MEAN APPROVED_APP_CREDIT_PERC_MIN APPROVED_APP_CREDIT_PERC_MAX APPROVED_APP_CREDIT_PERC_MEAN APPROVED_APP_CREDIT_PERC_VAR APPROVED_AMT_DOWN_PAYMENT_MIN APPROVED_AMT_DOWN_PAYMENT_MAX APPROVED_AMT_DOWN_PAYMENT_MEAN APPROVED_AMT_GOODS_PRICE_MIN APPROVED_AMT_GOODS_PRICE_MAX APPROVED_AMT_GOODS_PRICE_MEAN APPROVED_HOUR_APPR_PROCESS_START_MIN APPROVED_HOUR_APPR_PROCESS_START_MAX APPROVED_HOUR_APPR_PROCESS_START_MEAN APPROVED_RATE_DOWN_PAYMENT_MIN APPROVED_RATE_DOWN_PAYMENT_MAX APPROVED_RATE_DOWN_PAYMENT_MEAN APPROVED_DAYS_DECISION_MIN APPROVED_DAYS_DECISION_MAX APPROVED_DAYS_DECISION_MEAN APPROVED_CNT_PAYMENT_MEAN APPROVED_CNT_PAYMENT_SUM
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.5 23787.0 23787.0 23787.0 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.0 179055.0 179055.0 179055.0 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.5 68053.5 1035882.0 484191.0 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.0 20106.0 20106.0 20106.0 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0
100005 4813.200 4813.200 4813.200 44617.5 44617.5 44617.5 40153.5 40153.5 40153.5 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 11 11 11.000000 0.108964 0.108964 0.108964 -757 -757 -757.0 12.0 12.0
# 6.3 Join 'approved_agg' with 'prev_agg'.

prev_agg = prev_agg.join(                      # prev_agg is on the left
                         approved_agg,         # table on the right
                         how='left',           # Join on left table. All its rows remain
                         on='SK_ID_CURR'       # Joining key. 
                        )
# 6.3.1

prev_agg.shape     # (338857, 219)
prev_agg.head()
(338857, 219)
PREV_AMT_ANNUITY_MIN PREV_AMT_ANNUITY_MAX PREV_AMT_ANNUITY_MEAN PREV_AMT_APPLICATION_MIN PREV_AMT_APPLICATION_MAX PREV_AMT_APPLICATION_MEAN PREV_AMT_CREDIT_MIN PREV_AMT_CREDIT_MAX PREV_AMT_CREDIT_MEAN PREV_APP_CREDIT_PERC_MIN PREV_APP_CREDIT_PERC_MAX PREV_APP_CREDIT_PERC_MEAN PREV_APP_CREDIT_PERC_VAR PREV_AMT_DOWN_PAYMENT_MIN PREV_AMT_DOWN_PAYMENT_MAX PREV_AMT_DOWN_PAYMENT_MEAN PREV_AMT_GOODS_PRICE_MIN PREV_AMT_GOODS_PRICE_MAX PREV_AMT_GOODS_PRICE_MEAN PREV_HOUR_APPR_PROCESS_START_MIN PREV_HOUR_APPR_PROCESS_START_MAX PREV_HOUR_APPR_PROCESS_START_MEAN PREV_RATE_DOWN_PAYMENT_MIN PREV_RATE_DOWN_PAYMENT_MAX PREV_RATE_DOWN_PAYMENT_MEAN PREV_DAYS_DECISION_MIN PREV_DAYS_DECISION_MAX PREV_DAYS_DECISION_MEAN PREV_CNT_PAYMENT_MEAN PREV_CNT_PAYMENT_SUM PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN PREV_NAME_CONTRACT_TYPE_XNA_MEAN PREV_NAME_CONTRACT_TYPE_nan_MEAN PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN ... PREV_NAME_YIELD_GROUP_middle_MEAN PREV_NAME_YIELD_GROUP_nan_MEAN PREV_PRODUCT_COMBINATION_Card Street_MEAN PREV_PRODUCT_COMBINATION_Card X-Sell_MEAN PREV_PRODUCT_COMBINATION_Cash_MEAN PREV_PRODUCT_COMBINATION_Cash Street: high_MEAN PREV_PRODUCT_COMBINATION_Cash Street: low_MEAN PREV_PRODUCT_COMBINATION_Cash Street: middle_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: high_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: low_MEAN PREV_PRODUCT_COMBINATION_Cash X-Sell: middle_MEAN PREV_PRODUCT_COMBINATION_POS household with interest_MEAN PREV_PRODUCT_COMBINATION_POS household without interest_MEAN PREV_PRODUCT_COMBINATION_POS industry with interest_MEAN PREV_PRODUCT_COMBINATION_POS industry without interest_MEAN PREV_PRODUCT_COMBINATION_POS mobile with interest_MEAN PREV_PRODUCT_COMBINATION_POS mobile without interest_MEAN PREV_PRODUCT_COMBINATION_POS other with interest_MEAN PREV_PRODUCT_COMBINATION_POS others without interest_MEAN PREV_PRODUCT_COMBINATION_nan_MEAN APPROVED_AMT_ANNUITY_MIN APPROVED_AMT_ANNUITY_MAX APPROVED_AMT_ANNUITY_MEAN APPROVED_AMT_APPLICATION_MIN APPROVED_AMT_APPLICATION_MAX APPROVED_AMT_APPLICATION_MEAN APPROVED_AMT_CREDIT_MIN APPROVED_AMT_CREDIT_MAX APPROVED_AMT_CREDIT_MEAN APPROVED_APP_CREDIT_PERC_MIN APPROVED_APP_CREDIT_PERC_MAX APPROVED_APP_CREDIT_PERC_MEAN APPROVED_APP_CREDIT_PERC_VAR APPROVED_AMT_DOWN_PAYMENT_MIN APPROVED_AMT_DOWN_PAYMENT_MAX APPROVED_AMT_DOWN_PAYMENT_MEAN APPROVED_AMT_GOODS_PRICE_MIN APPROVED_AMT_GOODS_PRICE_MAX APPROVED_AMT_GOODS_PRICE_MEAN APPROVED_HOUR_APPR_PROCESS_START_MIN APPROVED_HOUR_APPR_PROCESS_START_MAX APPROVED_HOUR_APPR_PROCESS_START_MEAN APPROVED_RATE_DOWN_PAYMENT_MIN APPROVED_RATE_DOWN_PAYMENT_MAX APPROVED_RATE_DOWN_PAYMENT_MEAN APPROVED_DAYS_DECISION_MIN APPROVED_DAYS_DECISION_MAX APPROVED_DAYS_DECISION_MEAN APPROVED_CNT_PAYMENT_MEAN APPROVED_CNT_PAYMENT_SUM
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.50 23787.0 23787.0 23787.00 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 1.0 0.0 0.0 0.0 0.0 3951.000 3951.000 3951.000 24835.5 24835.5 24835.5 23787.0 23787.0 23787.0 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13.0 13.0 13.000000 0.104326 0.104326 0.104326 -1740.0 -1740.0 -1740.0 8.0 8.0
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.00 179055.0 179055.0 179055.00 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0 0.000000 1.000000 0.0 0.0 0 0.000000 0.0 1.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 0.0 1.0 0.0 0.0 9251.775 9251.775 9251.775 179055.0 179055.0 179055.0 179055.0 179055.0 179055.0 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9.0 9.0 9.000000 0.000000 0.000000 0.000000 -606.0 -606.0 -606.0 24.0 24.0
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.50 68053.5 1035882.0 484191.00 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0 0.333333 0.666667 0.0 0.0 0 0.333333 0.0 0.333333 0.333333 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0.666667 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.333333 0.0 0.333333 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 6737.310 98356.995 56553.990 68809.5 900000.0 435436.5 68053.5 1035882.0 484191.0 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12.0 17.0 14.666667 0.000000 0.100061 0.050030 -2341.0 -746.0 -1305.0 10.0 30.0
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.00 20106.0 20106.0 20106.00 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.000000 0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.0 1.0 0.0 0.0 0.0 5357.250 5357.250 5357.250 24282.0 24282.0 24282.0 20106.0 20106.0 20106.0 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5.0 5.0 5.000000 0.212008 0.212008 0.212008 -815.0 -815.0 -815.0 4.0 4.0
100005 4813.200 4813.200 4813.200 0.0 44617.5 22308.75 0.0 40153.5 20076.75 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 10 11 10.500000 0.108964 0.108964 0.108964 -757 -315 -536.0 12.0 12.0 0.500000 0.500000 0.0 0.0 0 0.500000 0.0 0.000000 0.000000 0.5 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 0.000000 0 0.0 0.0 0.5 0.0 0.0 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.0 0.5 0.0 0.0 0.0 0.0 4813.200 4813.200 4813.200 44617.5 44617.5 44617.5 40153.5 40153.5 40153.5 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 11.0 11.0 11.000000 0.108964 0.108964 0.108964 -757.0 -757.0 -757.0 12.0 12.0

5 rows × 219 columns

# 6.4 Similarly for refused applications perform aggregations of numerical features:

refused = prev[prev['NAME_CONTRACT_STATUS_Refused'] == 1]
refused_agg = refused.groupby('SK_ID_CURR').agg(num_aggregations)
# 6.4.1

refused_agg.shape      # (118277, 30)
refused_agg.head()
(118277, 30)
AMT_ANNUITY AMT_APPLICATION AMT_CREDIT APP_CREDIT_PERC AMT_DOWN_PAYMENT AMT_GOODS_PRICE HOUR_APPR_PROCESS_START RATE_DOWN_PAYMENT DAYS_DECISION CNT_PAYMENT
min max mean min max mean min max mean min max mean var min max mean min max mean min max mean min max mean min max mean mean sum
SK_ID_CURR
100006 32696.100 32696.100 32696.100 688500.000000 688500.0 688500.000000 906615.000 906615.0 906615.0000 0.759418 0.759418 0.759418 NaN NaN NaN NaN 688500.000 688500.0 688500.0000 15 15 15.000 NaN NaN NaN -181 -181 -181.000 48.000000 48.0
100011 NaN NaN NaN 0.000000 0.0 0.000000 0.000 0.0 0.0000 NaN NaN NaN NaN NaN NaN NaN 0.000 0.0 0.0000 9 9 9.000 NaN NaN NaN -1162 -1162 -1162.000 NaN 0.0
100027 22556.475 22556.475 22556.475 225000.000000 225000.0 225000.000000 239850.000 239850.0 239850.0000 0.938086 0.938086 0.938086 NaN NaN NaN NaN 225000.000 225000.0 225000.0000 14 14 14.000 NaN NaN NaN -181 -181 -181.000 12.000000 12.0
100030 2826.450 6176.925 4073.265 21969.224609 43870.5 33767.121094 21969.225 43870.5 32533.2225 1.000000 1.200580 1.042280 0.005227 0.0 6714.0 1344.6 21969.225 43870.5 33767.1225 7 16 12.500 0.0 0.210919 0.042874 -2689 -840 -2053.900 9.833333 59.0
100035 22308.750 33238.800 27773.775 0.000000 1260000.0 241875.000000 0.000 1260000.0 241875.0000 1.000000 1.000000 1.000000 0.000000 NaN NaN NaN 675000.000 1260000.0 967500.0000 11 14 12.375 NaN NaN NaN -160 -119 -143.375 54.000000 108.0
# 6.5

refused_agg.columns = pd.Index(['REFUSED_' + e[0] + "_" + e[1].upper() for e in refused_agg.columns.tolist()])
refused_agg.head()
refused_agg.shape   # (118277, 30)
REFUSED_AMT_ANNUITY_MIN REFUSED_AMT_ANNUITY_MAX REFUSED_AMT_ANNUITY_MEAN REFUSED_AMT_APPLICATION_MIN REFUSED_AMT_APPLICATION_MAX REFUSED_AMT_APPLICATION_MEAN REFUSED_AMT_CREDIT_MIN REFUSED_AMT_CREDIT_MAX REFUSED_AMT_CREDIT_MEAN REFUSED_APP_CREDIT_PERC_MIN REFUSED_APP_CREDIT_PERC_MAX REFUSED_APP_CREDIT_PERC_MEAN REFUSED_APP_CREDIT_PERC_VAR REFUSED_AMT_DOWN_PAYMENT_MIN REFUSED_AMT_DOWN_PAYMENT_MAX REFUSED_AMT_DOWN_PAYMENT_MEAN REFUSED_AMT_GOODS_PRICE_MIN REFUSED_AMT_GOODS_PRICE_MAX REFUSED_AMT_GOODS_PRICE_MEAN REFUSED_HOUR_APPR_PROCESS_START_MIN REFUSED_HOUR_APPR_PROCESS_START_MAX REFUSED_HOUR_APPR_PROCESS_START_MEAN REFUSED_RATE_DOWN_PAYMENT_MIN REFUSED_RATE_DOWN_PAYMENT_MAX REFUSED_RATE_DOWN_PAYMENT_MEAN REFUSED_DAYS_DECISION_MIN REFUSED_DAYS_DECISION_MAX REFUSED_DAYS_DECISION_MEAN REFUSED_CNT_PAYMENT_MEAN REFUSED_CNT_PAYMENT_SUM
SK_ID_CURR
100006 32696.100 32696.100 32696.100 688500.000000 688500.0 688500.000000 906615.000 906615.0 906615.0000 0.759418 0.759418 0.759418 NaN NaN NaN NaN 688500.000 688500.0 688500.0000 15 15 15.000 NaN NaN NaN -181 -181 -181.000 48.000000 48.0
100011 NaN NaN NaN 0.000000 0.0 0.000000 0.000 0.0 0.0000 NaN NaN NaN NaN NaN NaN NaN 0.000 0.0 0.0000 9 9 9.000 NaN NaN NaN -1162 -1162 -1162.000 NaN 0.0
100027 22556.475 22556.475 22556.475 225000.000000 225000.0 225000.000000 239850.000 239850.0 239850.0000 0.938086 0.938086 0.938086 NaN NaN NaN NaN 225000.000 225000.0 225000.0000 14 14 14.000 NaN NaN NaN -181 -181 -181.000 12.000000 12.0
100030 2826.450 6176.925 4073.265 21969.224609 43870.5 33767.121094 21969.225 43870.5 32533.2225 1.000000 1.200580 1.042280 0.005227 0.0 6714.0 1344.6 21969.225 43870.5 33767.1225 7 16 12.500 0.0 0.210919 0.042874 -2689 -840 -2053.900 9.833333 59.0
100035 22308.750 33238.800 27773.775 0.000000 1260000.0 241875.000000 0.000 1260000.0 241875.0000 1.000000 1.000000 1.000000 0.000000 NaN NaN NaN 675000.000 1260000.0 967500.0000 11 14 12.375 NaN NaN NaN -160 -119 -143.375 54.000000 108.0
(118277, 30)
# 7.0 Join refused_agg with prev_agg:

prev_agg = prev_agg.join(                     # prev_agg: left
                         refused_agg,         # table on the right
                         how='left',
                         on='SK_ID_CURR'
                        )
# 7.1 Our final table:

prev_agg.shape     # 338857, 249)
prev_agg.head()
(338857, 249)
PREV_AMT_ANNUITY_MIN PREV_AMT_ANNUITY_MAX PREV_AMT_ANNUITY_MEAN PREV_AMT_APPLICATION_MIN PREV_AMT_APPLICATION_MAX PREV_AMT_APPLICATION_MEAN PREV_AMT_CREDIT_MIN PREV_AMT_CREDIT_MAX PREV_AMT_CREDIT_MEAN PREV_APP_CREDIT_PERC_MIN PREV_APP_CREDIT_PERC_MAX PREV_APP_CREDIT_PERC_MEAN PREV_APP_CREDIT_PERC_VAR PREV_AMT_DOWN_PAYMENT_MIN PREV_AMT_DOWN_PAYMENT_MAX PREV_AMT_DOWN_PAYMENT_MEAN PREV_AMT_GOODS_PRICE_MIN PREV_AMT_GOODS_PRICE_MAX PREV_AMT_GOODS_PRICE_MEAN PREV_HOUR_APPR_PROCESS_START_MIN PREV_HOUR_APPR_PROCESS_START_MAX PREV_HOUR_APPR_PROCESS_START_MEAN PREV_RATE_DOWN_PAYMENT_MIN PREV_RATE_DOWN_PAYMENT_MAX PREV_RATE_DOWN_PAYMENT_MEAN PREV_DAYS_DECISION_MIN PREV_DAYS_DECISION_MAX PREV_DAYS_DECISION_MEAN PREV_CNT_PAYMENT_MEAN PREV_CNT_PAYMENT_SUM PREV_NAME_CONTRACT_TYPE_Cash loans_MEAN PREV_NAME_CONTRACT_TYPE_Consumer loans_MEAN PREV_NAME_CONTRACT_TYPE_Revolving loans_MEAN PREV_NAME_CONTRACT_TYPE_XNA_MEAN PREV_NAME_CONTRACT_TYPE_nan_MEAN PREV_WEEKDAY_APPR_PROCESS_START_FRIDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_MONDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SATURDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_SUNDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_THURSDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_TUESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_WEDNESDAY_MEAN PREV_WEEKDAY_APPR_PROCESS_START_nan_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_N_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_Y_MEAN PREV_FLAG_LAST_APPL_PER_CONTRACT_nan_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Building a house or an annex_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Business development_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a garage_MEAN PREV_NAME_CASH_LOAN_PURPOSE_Buying a holiday home / land_MEAN ... APPROVED_APP_CREDIT_PERC_MAX APPROVED_APP_CREDIT_PERC_MEAN APPROVED_APP_CREDIT_PERC_VAR APPROVED_AMT_DOWN_PAYMENT_MIN APPROVED_AMT_DOWN_PAYMENT_MAX APPROVED_AMT_DOWN_PAYMENT_MEAN APPROVED_AMT_GOODS_PRICE_MIN APPROVED_AMT_GOODS_PRICE_MAX APPROVED_AMT_GOODS_PRICE_MEAN APPROVED_HOUR_APPR_PROCESS_START_MIN APPROVED_HOUR_APPR_PROCESS_START_MAX APPROVED_HOUR_APPR_PROCESS_START_MEAN APPROVED_RATE_DOWN_PAYMENT_MIN APPROVED_RATE_DOWN_PAYMENT_MAX APPROVED_RATE_DOWN_PAYMENT_MEAN APPROVED_DAYS_DECISION_MIN APPROVED_DAYS_DECISION_MAX APPROVED_DAYS_DECISION_MEAN APPROVED_CNT_PAYMENT_MEAN APPROVED_CNT_PAYMENT_SUM REFUSED_AMT_ANNUITY_MIN REFUSED_AMT_ANNUITY_MAX REFUSED_AMT_ANNUITY_MEAN REFUSED_AMT_APPLICATION_MIN REFUSED_AMT_APPLICATION_MAX REFUSED_AMT_APPLICATION_MEAN REFUSED_AMT_CREDIT_MIN REFUSED_AMT_CREDIT_MAX REFUSED_AMT_CREDIT_MEAN REFUSED_APP_CREDIT_PERC_MIN REFUSED_APP_CREDIT_PERC_MAX REFUSED_APP_CREDIT_PERC_MEAN REFUSED_APP_CREDIT_PERC_VAR REFUSED_AMT_DOWN_PAYMENT_MIN REFUSED_AMT_DOWN_PAYMENT_MAX REFUSED_AMT_DOWN_PAYMENT_MEAN REFUSED_AMT_GOODS_PRICE_MIN REFUSED_AMT_GOODS_PRICE_MAX REFUSED_AMT_GOODS_PRICE_MEAN REFUSED_HOUR_APPR_PROCESS_START_MIN REFUSED_HOUR_APPR_PROCESS_START_MAX REFUSED_HOUR_APPR_PROCESS_START_MEAN REFUSED_RATE_DOWN_PAYMENT_MIN REFUSED_RATE_DOWN_PAYMENT_MAX REFUSED_RATE_DOWN_PAYMENT_MEAN REFUSED_DAYS_DECISION_MIN REFUSED_DAYS_DECISION_MAX REFUSED_DAYS_DECISION_MEAN REFUSED_CNT_PAYMENT_MEAN REFUSED_CNT_PAYMENT_SUM
SK_ID_CURR
100001 3951.000 3951.000 3951.000 24835.5 24835.5 24835.50 23787.0 23787.0 23787.00 1.044079 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13 13 13.000000 0.104326 0.104326 0.104326 -1740 -1740 -1740.0 8.0 8.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.044079 1.044079 NaN 2520.0 2520.0 2520.0 24835.5 24835.5 24835.5 13.0 13.0 13.000000 0.104326 0.104326 0.104326 -1740.0 -1740.0 -1740.0 8.0 8.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
100002 9251.775 9251.775 9251.775 179055.0 179055.0 179055.00 179055.0 179055.0 179055.00 1.000000 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9 9 9.000000 0.000000 0.000000 0.000000 -606 -606 -606.0 24.0 24.0 0.000000 1.000000 0.0 0.0 0 0.000000 0.0 1.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.000000 1.000000 NaN 0.0 0.0 0.0 179055.0 179055.0 179055.0 9.0 9.0 9.000000 0.000000 0.000000 0.000000 -606.0 -606.0 -606.0 24.0 24.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
100003 6737.310 98356.995 56553.990 68809.5 900000.0 435436.50 68053.5 1035882.0 484191.00 0.868825 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12 17 14.666667 0.000000 0.100061 0.050030 -2341 -746 -1305.0 10.0 30.0 0.333333 0.666667 0.0 0.0 0 0.333333 0.0 0.333333 0.333333 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.011109 0.949329 0.005324 0.0 6885.0 3442.5 68809.5 900000.0 435436.5 12.0 17.0 14.666667 0.000000 0.100061 0.050030 -2341.0 -746.0 -1305.0 10.0 30.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
100004 5357.250 5357.250 5357.250 24282.0 24282.0 24282.00 20106.0 20106.0 20106.00 1.207699 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5 5 5.000000 0.212008 0.212008 0.212008 -815 -815 -815.0 4.0 4.0 0.000000 1.000000 0.0 0.0 0 1.000000 0.0 0.000000 0.000000 0.0 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.207699 1.207699 NaN 4860.0 4860.0 4860.0 24282.0 24282.0 24282.0 5.0 5.0 5.000000 0.212008 0.212008 0.212008 -815.0 -815.0 -815.0 4.0 4.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
100005 4813.200 4813.200 4813.200 0.0 44617.5 22308.75 0.0 40153.5 20076.75 1.111173 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 10 11 10.500000 0.108964 0.108964 0.108964 -757 -315 -536.0 12.0 12.0 0.500000 0.500000 0.0 0.0 0 0.500000 0.0 0.000000 0.000000 0.5 0.0 0.0 0 0.0 1.0 0 0.0 0.0 0.0 0.0 ... 1.111173 1.111173 NaN 4464.0 4464.0 4464.0 44617.5 44617.5 44617.5 11.0 11.0 11.000000 0.108964 0.108964 0.108964 -757.0 -757.0 -757.0 12.0 12.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

5 rows × 249 columns

# 8.0 Save the results for subsequent use:
prev_agg.to_csv("processed_prev_agg.csv.zip", compression = "zip")
####################