Linear Regression Fundamentals

Along with the SAS codes to check for them

Royalty-free illustration by Dan White 100 on Shutterstock

Linear regression is usually the first algorithm a data scientist learns when starting out in the field. There are scores of articles and posts out there explaining linear regression exhaustively. I am listing my favorites below.

Once you have learned linear regression, you should know that the algorithm gives good and reliable results only when its underlying assumptions hold. I scoured the internet for a complete article covering the assumptions and the corresponding code/tests to check them easily, but couldn’t find any. …

Classification Model Fundamentals

Read on to know more about the concept and the usage of rank ordering in classification models

Photo by Markus Spiske on Unsplash

I have been working with credit risk models for quite some time now and I have noticed the emphasis our stakeholders put on our classification models following rank-ordering. There is a reason for that and we will learn about that in this short post.

Rank ordering is an important measure of a model’s performance and of its ability to separate events from non-events.

For the purpose of explanation, we will talk about a Probability of Default (PD) model, wherein a default (the event) refers to the instance in which a credit card/loan customer is not able to pay back the amount that…
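To make the idea concrete, here is a minimal sketch of one common way to check rank ordering for a PD model: sort accounts by predicted probability of default, split them into equal-sized bins, and confirm that the observed default rate never rises as you move from the riskiest bin to the safest. The function names, the bin count, and the toy data are hypothetical and chosen only for illustration.

```python
def event_rate_by_bin(scores, defaults, n_bins=5):
    """Sort accounts by predicted PD (highest first), split them into
    n_bins roughly equal bins, and return the observed default rate per bin."""
    ranked = sorted(zip(scores, defaults), key=lambda pair: pair[0], reverse=True)
    bin_size = len(ranked) // n_bins
    rates = []
    for i in range(n_bins):
        start = i * bin_size
        # The last bin absorbs any leftover accounts.
        end = (i + 1) * bin_size if i < n_bins - 1 else len(ranked)
        chunk = ranked[start:end]
        rates.append(sum(flag for _, flag in chunk) / len(chunk))
    return rates


def is_rank_ordered(rates):
    """True when the default rate never increases from the riskiest bin down."""
    return all(a >= b for a, b in zip(rates, rates[1:]))


# Hypothetical toy data: predicted PDs and observed outcomes (1 = default).
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40, 0.35, 0.30, 0.20, 0.10]
defaults = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0]

rates = event_rate_by_bin(scores, defaults, n_bins=4)
print(rates)
print(is_rank_ordered(rates))
```

In practice, deciles on a much larger sample are more typical; four bins are used here only to keep the toy data readable.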

Business understanding is as important as algorithm understanding for all credit risk modelers/analysts

Photo by CardMapr on Unsplash

Of late, I have noticed that a lot of focus goes to the latest algorithms circulating in the data science community. Every day we are introduced to better-performing methodologies that can make our predictive models more efficient than ever before. But with this explosion of knowledge, data scientists who do not focus on understanding the business vertical they are building models for can’t go beyond a certain point. I have been working in the credit risk domain for quite some time now and thus, I have…

It’s really interesting

Image credit: O’Reilly

This is the book that I am currently reading, and while I was having my lunch, its cover page was open on my laptop. I have to admit that after a few seconds, I got rather uneasy with a chameleon just behind my sandwich. I put away the screen then, but that got me thinking about what the reason behind this quirky design pattern of using animals on the cover pages of all the O’Reilly books might be. I dug in deeper to understand this and found some interesting facts about the same.

History of the Design of O’Reilly Book Covers

In the article “A short history of…

It’s really not Tech vs. Us.

Photo by Markus Winkler on Unsplash

Almost every day we stumble across articles that give a detailed overview of how Artificial Intelligence will take over all human jobs and cause a sharp rise in the unemployment rate. Sure, that’s one way of visualizing the future in the AI age, but I believe there are many aspects of human nature that cannot be mimicked by an AI bot or a program. And those are the actual core values that make us ‘human’.

I do agree that many of the jobs done right now by humans can be automated and replaced by faster, more accurate, and…

Modeling Fundamentals

Population Stability Index and Characteristic Stability Index

Image by Author

Once a model has been put into PROD (production), regular monitoring is required to make sure that the model is still relevant and reliable. I have written a post on model validation vs. model monitoring and the importance of these 2 stages; you can check it out as a prequel to this post.

Moving on to the subject matter of this post, we will learn all about PSI and CSI, i.e. the Population Stability Index and the Characteristic Stability Index, which are among the most important monitoring metrics used in many domains, especially credit risk.

PSI and…
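As a rough illustration, the sketch below computes PSI using its commonly cited form, the sum over bins of (actual % − expected %) × ln(actual % / expected %), where "expected" is the development sample and "actual" is the recent production sample. The function name, the small floor used to avoid taking the log of zero, and the bin counts are all assumptions made for this example. CSI is generally computed with the same formula, applied to the binned distribution of a single input characteristic instead of the model score.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    expected_counts: counts per bin in the development (baseline) sample.
    actual_counts:   counts per bin in the recent production sample.
    eps:             assumed floor on bin proportions to avoid log(0).
    """
    total_expected = sum(expected_counts)
    total_actual = sum(actual_counts)
    value = 0.0
    for e, a in zip(expected_counts, actual_counts):
        pct_expected = max(e / total_expected, eps)
        pct_actual = max(a / total_actual, eps)
        value += (pct_actual - pct_expected) * math.log(pct_actual / pct_expected)
    return value


# Hypothetical score-band counts at development time and in production.
dev_counts = [100, 200, 400, 200, 100]
prod_counts = [80, 180, 380, 230, 130]

print(psi(dev_counts, dev_counts))   # identical distributions give 0.0
print(psi(dev_counts, prod_counts))  # a shifted distribution gives a positive value
```

The bins must be defined once on the development sample and reused unchanged for production data; otherwise the two distributions are not comparable.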


The backbone of the model development process

Image by Author

Once the model development steps are complete, model validation comes into the picture. In fact, model validation is an important part of the overall model development process. If a developer spends X amount of time developing the model, they will usually spend X or even more validating it and making sure of its robustness and accuracy.

In this post, I’ll emphasize the importance of the model validation process and how it differs from the model monitoring process.

Consequences of improper Model Validation

1. Poor model performance on unseen data

Data Science Fundamentals

The steps in the lifecycle of a Data Science project

Photo by Allie on Unsplash

We are witnessing a Data Science boom all around us. The term has become the new technical jargon, passed around like a buzzword nowadays. In this post, we’ll learn what Data Science actually entails by covering the following:

What are the steps in the lifecycle of a Data Science project?

Looking at these steps is the best way to understand all the aspects of Data Science from a practical viewpoint.

The lifecycle of a Data Science project

A Data Science project can be divided into 7 steps:

1. Understanding the business problem

Dabbling in Machine Learning? This will be super helpful!

Perfect for educating yourself bit by bit, daily, about the vast pool of Machine Learning concepts

Photo by Christian Wiediger on Unsplash

Some time back, I stumbled upon a Machine Learning group on Telegram. I started reading the daily updates they posted in that group and the content was amazing. We are so busy with our daily projects that on most days it becomes a tad difficult to intentionally set aside time to learn about a new application of Machine Learning and AI, or some efficient way to handle imbalanced datasets, or some novel approach to handle outliers, etc…

Basics that everyone in the field of Data science should know

Learn more about WHEN to use which coefficient in this post

Photo by Morning Brew on Unsplash

I recently came across a scenario that led me to educate myself about the difference between the Pearson and Spearman correlation coefficients. I felt that this is one piece of information that a lot of people in the data science fraternity on Medium can make use of. I’ll thoroughly explain the difference between the two and the exact scenarios where each one is suitable. Read on!

Contents of this post:

  1. Definition of Correlation
  2. Comparative analysis between Pearson and Spearman correlation coefficients

Definition of Correlation

Correlation is the degree to which two variables are related (linearly, in the case of the Pearson coefficient). This is an important step in bi-variate…
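To illustrate the difference, here is a minimal sketch using only the Python standard library. It relies on the standard definitions: Pearson measures linear association, and Spearman is the Pearson coefficient computed on the ranks of the data, so it captures any monotonic relationship. The helper names and the toy data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson applied to the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


# A monotonic but non-linear relationship: y = x cubed.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]

print(pearson(x, y))   # below 1, because the relationship is not a straight line
print(spearman(x, y))  # essentially 1, because y always increases with x
```

With real data, library routines such as those in SciPy or pandas are the usual choice; the hand-written versions here are only meant to expose the mechanics.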

Juhi Ramzai

Data scientist and ML enthusiast by day| Dreamer, writer, painter by night
