
Bootstrapping in Python
In a previous article of this series, I talked about hypothesis testing and confidence intervals using classical methods. However, we had to make assumptions to justify our methods. So what if the ...

1.0 - Introduction In this project, I will focus on data analysis and visualization for the gender wage gap. Specifically, I am going to focus on public jobs in the city of San Francisco. This dat...

So far, we’ve been comparing data with at least one numerical (continuous) column and one categorical (nominal) column. So what happens if we want to determine the statistical significance of two...

In the previous article, we talked about hypothesis testing using Welch’s t-test on two independent samples of data. So what happens if we want to know the statistical significance for $k$ groups of...

Suppose that we are on the data science team at an orange juice company. In a meeting, the marketing team claimed that their new marketing strategy resulted in an increase in sales. The manageme...

In the previous article of the series, we explored the concept of dispersion in data. I mentioned how the standard deviation is a very powerful measure of dispersion when the distribution is simila...

In the previous article in this series, we explored the concept of central tendency. Central tendency allows us to grasp the “middle” of the data, but it doesn’t tell us anything about the varia...

Data exploration is an essential part of data science. In order to fully understand the data, we must first understand descriptive statistics. In this exercise, we are going to use the bike sharing...

In this project, we are going to explore the machine learning workflow. Specifically, we’ll be looking at the famous Titanic dataset. The goal of this project is to accurately predict if a passenge...

Suppose we are given some data set: Let’s assume this dataset has been fully cleaned and processed. We are interested in using a linear regression model on this dataset. However, we need a way t...