Linear regression is usually the first algorithm a data scientist learns on the way to becoming a successful one. There are scores of articles and posts out there explaining linear regression exhaustively. I am listing my favorites below.
Now, after learning about linear regression, you must know that this algorithm gives good and reliable results only when its underlying assumptions are satisfied. I scoured the internet for an exhaustive article on the assumptions and the corresponding code/tests to check for them easily but couldn’t find any. …
I have been working with credit risk models for quite some time now and I have noticed the emphasis our stakeholders put on our classification models following rank-ordering. There is a reason for that and we will learn about that in this short post.
Rank ordering is an important measure of a model's performance and of its ability to separate the events from the non-events.
For the purposes of explanation, we will be talking about a Probability of Default (PD) model, wherein Default (the event) refers to the instance in which a credit card/loan customer is not able to pay back the amount that…
Of late, I have noticed that a lot of focus is on the latest algorithms circulating in the data science community. Every day we are introduced to better-performing methodologies that can make our predictive models more efficient than ever before. But amid this explosion of knowledge, data scientists who do not focus on understanding the business vertical they are building models for can’t go beyond a certain point. I have been working in the credit risk domain for quite some time now and thus, I have…
This is the book that I am currently reading, and while I was having my lunch, its cover page was open on my laptop. I have to admit that after a few seconds, I got rather uneasy with a chameleon just behind my sandwich. I put away the screen then, but that got me thinking about the reason behind the quirky design pattern of using animals on the covers of all the O’Reilly books. I dug deeper to understand this and found some interesting facts about it.
In the article “A short history of…
Almost every day we stumble across articles that give a detailed overview of how Artificial Intelligence will take over all human jobs and cause a sharp rise in the unemployment rate. Sure, that’s one way of visualizing the future in the AI age, but I believe there are many aspects of human nature that cannot be mimicked by an AI bot or a program. And those are the actual core values that make us ‘human’.
I do agree that many of the jobs done right now by humans can be automated and replaced by faster, more accurate, and…
Once a model has been put into PROD (production), regular monitoring is required to make sure that the model is still relevant and reliable. I have written a post on model validation vs model monitoring and the importance of these 2 stages; you can check it out as a prequel to this post.
Moving on to the subject matter of this post, we will learn all about PSI and CSI, i.e. the Population Stability Index and the Characteristic Stability Index, which are among the most important monitoring metrics used in a lot of domains, especially the credit risk domain.
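As a rough illustration of the idea, the sketch below computes PSI with the commonly used formula: the sum over bins of (actual% - expected%) * ln(actual% / expected%). The function name `population_stability_index`, the decile binning, the small `eps` floor, and the simulated scores are all assumptions made for this example, not details taken from the post. Applying the same calculation to a single input feature instead of the model score gives the CSI.

```python
import numpy as np


def population_stability_index(expected, actual, n_bins=10, eps=1e-6):
    """PSI between a development (expected) and a recent (actual) sample.

    Hypothetical helper for illustration: bins are quantiles of the
    expected sample, and empty bins are floored at `eps` to avoid log(0).
    """
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)

    # Interior cut points taken from the expected sample's quantiles.
    cuts = np.quantile(expected, np.linspace(0, 1, n_bins + 1)[1:-1])

    # Assign every observation to a bin index in 0 .. n_bins - 1.
    expected_bins = np.searchsorted(cuts, expected, side="right")
    actual_bins = np.searchsorted(cuts, actual, side="right")

    # Share of each sample that falls in each bin.
    expected_pct = np.bincount(expected_bins, minlength=n_bins) / expected.size
    actual_pct = np.bincount(actual_bins, minlength=n_bins) / actual.size

    expected_pct = np.clip(expected_pct, eps, None)
    actual_pct = np.clip(actual_pct, eps, None)

    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))


# Simulated scores (made up for this example): the recent sample is shifted.
rng = np.random.default_rng(0)
dev_scores = rng.normal(0.0, 1.0, 5000)
recent_scores = rng.normal(1.0, 1.0, 5000)

psi_same = population_stability_index(dev_scores, dev_scores)
psi_shifted = population_stability_index(dev_scores, recent_scores)
print(f"PSI, identical samples: {psi_same:.4f}")
print(f"PSI, shifted sample:    {psi_shifted:.4f}")
```

A sample compared with itself gives a PSI of zero, and the value grows as the recent distribution drifts away from the development one.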
Once the model development steps are complete, model validation comes into the picture. In fact, model validation is an important part of the overall model development process. If a developer spends X amount of time developing the model, they usually spend as much time or more validating it and making sure of its robustness and accuracy.
In this post, I’ll emphasize the importance of the model validation process and how it differs from the model monitoring process.
We are witnessing a Data Science boom all around us. It has become the new technical jargon, passed around like a buzzword nowadays. We’ll learn in this post about what Data Science actually entails by covering the following:
What are the steps in the lifecycle of a Data Science project?
Because looking at this is the best way to understand all the aspects of Data Science through a practical viewpoint.
A Data Science project can be divided into 7 steps:
Perfect to educate yourself bit by bit, daily, about the vast pool of Machine learning concepts
Some time back, I stumbled upon a Machine learning group on Telegram. I started reading the daily updates they posted on that group and the content was amazing. We are so busy with our daily projects that on most days it becomes a tad difficult to intentionally set aside some time to learn about a new application of Machine learning and AI or some efficient way to handle imbalanced datasets or some novel approach to handle the outliers etc…
I recently came across a scenario where I educated myself about the difference between the Pearson and Spearman correlation coefficients. I felt that this is one piece of information that a lot of people in the data science fraternity on Medium can make use of. I’ll thoroughly explain the difference between the two and the exact scenarios where the use of each one is suitable. Read on!
Contents of this post:
Correlation is the degree to which two variables are linearly related. This is an important step in bi-variate…
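To make the contrast between the two coefficients concrete, here is a minimal sketch using only NumPy. The helper names `pearson` and `spearman` and the toy data are assumptions for this example; Spearman is computed here as the Pearson correlation of the ranks, and the simple ranking used assumes there are no tied values.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation: strength of the linear relationship."""
    return float(np.corrcoef(x, y)[0, 1])


def spearman(x, y):
    """Spearman correlation as Pearson on ranks (assumes no ties)."""
    rank_x = np.argsort(np.argsort(x))
    rank_y = np.argsort(np.argsort(y))
    return pearson(rank_x, rank_y)


# Toy data (made up): y always increases with x, but not along a straight line.
x = np.arange(1, 21, dtype=float)
y = np.exp(x / 3.0)

r_pearson = pearson(x, y)
r_spearman = spearman(x, y)
print(f"Pearson:  {r_pearson:.3f}")
print(f"Spearman: {r_spearman:.3f}")
```

Because `y` rises monotonically with `x`, the rank-based Spearman coefficient is 1, while Pearson falls short of 1 since the relationship is not a straight line.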
Data scientist and ML enthusiast by day | Dreamer, writer, painter by night