Gini and Entropy in Decision trees
Introduction Decision trees are a popular method, along with linear regression, but how do they work? If you look at sklearn’s decision tree classifier: class sklearn.tree.DecisionTreeClassifier(*, ...
Problem statement Want to get started on A/B testing? You will probably first have to understand hypothesis testing and the following 4 terms: Type I Error, Type II Error, Sample Size, Power Pr...
Objective This post does not define metrics or explain what they are; instead, it shares my personal way of interpreting them and how I memorize and associate different metrics. Pre-req Understanding of metri...
Introduction Over the years, I keep finding myself revisiting concepts of linear regression and how they relate to other concepts within statistics and machine learning - which is primarily the re...
Distributions I try to provide a quick overview of the distributions here, their use cases, and how they are related to one another. I will also try to provide certain mathematical derivations where...
Problem Intro JSON is typically used to send data across systems. If you are in Python land, you probably keep things in a dictionary and serialize/deserialize it with json.dumps and json.loads...
As more companies get used to the idea of building data products (which, in my opinion, is a different problem from building a software product), chances are you will need to consider upstrea...
Introduction When you are writing data applications or products, chances are: You might have a long, complex feature engineering process or an expensive compute function, You might have exter...
Introduction This is a more “advanced” section on pytest. We will cover a few more use cases & problems you might have encountered: How to set pytest configuration? How to share ...
Problem intro As data scientists (or analysts), we spend a significant chunk of time gathering & cleaning data. Sometimes, as we are doing feature engineering, we build functions and iterate the f...