"Everyone has the right freely to participate in the cultural life of the community, to enjoy the arts and to share in scientific advancement and its benefits." (Universal Declaration of Human Rights, Article 27).
The data science pipeline was built by Joel Carlson, Real Limoges, Joe Warren and myself as part of a 16-hour case study at Galvanize (DSI). We chose to involve everyone in every step of the process rather than assigning each person to a single isolated task. I led the creation of the model and Joel its optimization, while Real and Joe did an amazing job on both the back end and the front end. You can find all the code used for the pipeline in this
GitHub repo.
The fraud dataset that we used is private, so I cannot show you the data behind the model we built. No worries! I chose another dataset, related to telecommunications, and present a supervised data science study (logistic regression, random forest, gradient boosting) that I carried out (based on the knowledge and skills I developed at Galvanize during the Data Science Immersive, April 2016). I wrote this notebook as a reference on how to perform a supervised data science study. You can look at it below, or download it with the dataset here:
ds4all supervised learning notebook.