Feature engineering for Date & Time — Python

Feature engineering is the most critical part of any data science project. When the number of variables is limited, each feature has to be used carefully.

Rutvij Bhutaiya
3 min read · Jun 9, 2022
Photo by Panos Sakalakis on Unsplash

In this article, I’ll show a few feature engineering techniques for Date & Time variables.

Because the project involves sensitive information, I won’t explain it in full. I’ll stick to a single Date & Time feature, ACC_OP_DATE, the account opening date for each user.

The following is a sample of the feature:

First, we’ll split each value to create new features for Day, Month and Year. Going one step further, we’ll also create a new feature called ACC_DAYS, which gives the number of days since a particular user opened the account.

Here, the data type is object, and users have entered dates with both ‘-’ and ‘/’ as separators. This is a common problem when developers don’t enforce a strict format for special characters at the app development stage.
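To make the problem concrete, here is a minimal sketch of what such a column might look like. The values are made up for illustration (the real data is not shown), and the day/month/year order is my assumption.

```python
import pandas as pd

# Hypothetical values: the real ACC_OP_DATE data is not reproduced here.
# The day/month/year order is an assumption.
df = pd.DataFrame({
    "ACC_OP_DATE": pd.Series(
        ["12-03-2015", "05/11/2018", "23-07-2020", "01/01/2012"],
        dtype="object",
    )
})

# The column is stored as plain Python strings, not as datetimes.
print(df["ACC_OP_DATE"].dtype)

# Count how many rows use each separator.
uses_dash = df["ACC_OP_DATE"].str.contains("-", regex=False)
uses_slash = df["ACC_OP_DATE"].str.contains("/", regex=False)
print("rows with '-':", uses_dash.sum())
print("rows with '/':", uses_slash.sum())
```

A quick count like this tells you how inconsistent the column is before you start cleaning it.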

So, my first task is to bring all values into the same format.

This is easy: we replace ‘-’ with ‘/’ in all values, and then split each value into tokens.
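A minimal sketch of this step, using pandas string methods on hypothetical values (the variable name `tokens` is mine):

```python
import pandas as pd

# Hypothetical values with mixed separators.
df = pd.DataFrame(
    {"ACC_OP_DATE": ["12-03-2015", "05/11/2018", "23-07-2020", "01/01/2012"]}
)

# Replace every '-' with '/' so that all values share one separator.
# regex=False treats '-' as a literal character.
df["ACC_OP_DATE"] = df["ACC_OP_DATE"].str.replace("-", "/", regex=False)

# Split each value on '/' to get a list of tokens per row.
tokens = df["ACC_OP_DATE"].str.split("/")
print(tokens)
```

Each row of `tokens` now holds three string pieces, one per date component.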

Now, if you check the type carefully, you’ll see that the tokenized result is still a pandas Series (pandas.core.series.Series). Hence we convert it with list() and then create a DataFrame with the new features AC_DAY, AC_MONTH and AC_YEAR.
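Here is one way this could look. The tokens are hypothetical, the name `date_parts` is mine, and I assume the tokens come in day, month, year order. Casting to int is an extra step I added so the new columns are numeric.

```python
import pandas as pd

# Hypothetical tokens, as produced by splitting each date on '/'.
tokens = pd.Series([
    ["12", "03", "2015"],
    ["05", "11", "2018"],
    ["23", "07", "2020"],
])
print(type(tokens))  # still a Series, each element is a list

# list() turns the Series into a list of lists, which pandas can
# expand into one column per token. Day/month/year order is assumed.
date_parts = pd.DataFrame(
    list(tokens),
    columns=["AC_DAY", "AC_MONTH", "AC_YEAR"],
    index=tokens.index,
).astype(int)
print(date_parts)
```

Passing `index=tokens.index` keeps the new rows aligned with the original rows, which matters when the pieces are joined back later.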

And then we concatenate these new features into the main data frame.
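A small sketch of the concatenation, assuming the new features live in a separate DataFrame that shares the main frame's index (both frames below are hypothetical):

```python
import pandas as pd

# Hypothetical main data frame and the new date features.
df = pd.DataFrame({"ACC_OP_DATE": ["12/03/2015", "05/11/2018"]})
date_parts = pd.DataFrame(
    {"AC_DAY": [12, 5], "AC_MONTH": [3, 11], "AC_YEAR": [2015, 2018]},
    index=df.index,
)

# axis=1 places the new columns side by side with the existing ones,
# matching rows by index.
df = pd.concat([df, date_parts], axis=1)
print(df)
```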

Now, to create the new feature ACC_DAYS, we write a new function:

As the reference date we took 1st Jan 2022. You can change it accordingly.
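The function could be sketched like this. The name `days_since_opening` and the constant `REFERENCE_DATE` are my own illustrative choices, and the rows are hypothetical.

```python
from datetime import date

import pandas as pd

# Reference date from the text: 1st Jan 2022. Change it as needed.
REFERENCE_DATE = date(2022, 1, 1)


def days_since_opening(day, month, year, reference=REFERENCE_DATE):
    """Return the number of days between the opening date and the reference date."""
    opened = date(int(year), int(month), int(day))
    return (reference - opened).days


# Hypothetical rows with the day, month and year features already created.
df = pd.DataFrame(
    {"AC_DAY": [12, 5], "AC_MONTH": [3, 11], "AC_YEAR": [2015, 2018]}
)

# Apply the function row by row to build ACC_DAYS.
df["ACC_DAYS"] = df.apply(
    lambda row: days_since_opening(row["AC_DAY"], row["AC_MONTH"], row["AC_YEAR"]),
    axis=1,
)
print(df)
```

Accounts opened after the reference date would get a negative value, so it is worth picking a reference date later than the newest account in your data.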

With that, our new feature ACC_DAYS is created.

Under EDA (Exploratory Data Analysis), we can also do bivariate analysis using Day, Month and Year. For example, we can check which Month or Year has the highest number of Savings or Current accounts opened.
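As a rough illustration of such a check, the sketch below counts accounts per year and account type. The column name ACC_TYPE and all values are hypothetical, since the account type feature is not part of this article's data.

```python
import pandas as pd

# Hypothetical data: ACC_TYPE is an assumed column name for the account type.
df = pd.DataFrame({
    "AC_YEAR": [2015, 2015, 2018, 2018, 2018, 2020],
    "ACC_TYPE": ["Savings", "Current", "Savings", "Savings", "Current", "Savings"],
})

# Cross-tabulate opening year against account type.
counts = pd.crosstab(df["AC_YEAR"], df["ACC_TYPE"])
print(counts)

# Year with the most Savings accounts opened.
top_savings_year = counts["Savings"].idxmax()
print("Most Savings accounts opened in:", top_savings_year)
```

The same pattern works with AC_MONTH or AC_DAY in place of AC_YEAR.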
