Coursera's Data Science Specialization Capstone Project

The capstone project allows us (students) to create a usable, public data product that shows the skills developed throughout the nine courses of the Data Science Specialization. On this occasion, we'll work on understanding and building predictive text models like the ones used by SwiftKey, Coursera's corporate partner for this capstone project.

The data used in this project comes from a corpus called HC Corpora. The files had already been language-filtered by Coursera, but they still needed some pre-processing.

These are the steps followed throughout the capstone to reach this final stage:

  • An introductory quiz to test whether you have downloaded and can manipulate the data
  • An intermediate R Markdown report that describes, in plain language, plots, and code, your exploratory analysis of the course data set (see here)
  • Two natural language processing quizzes, where you apply your predictive model to real data to check how it is working.
  • A Shiny app that takes a phrase (multiple words) as input and, when the user clicks submit, predicts the next word.
  • A 5-slide deck created with R Presentations pitching your algorithm and app to your boss or an investor (see here)
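
To make the next-word prediction step more concrete, here is a minimal sketch of one common approach: count n-grams in a corpus and back off from the longest matching context to shorter ones. This is an illustration under assumptions, not the model actually built for the capstone (which was written in R). The function names `tokenize`, `build_ngram_tables`, and `predict_next`, the choice of Python, and the simple "most frequent continuation" rule are all hypothetical.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_tables(sentences, max_order=3):
    """Map each context (a tuple of 0 to max_order - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for k in range(max_order):
                if i - k < 0:
                    break  # not enough preceding words for a context of length k
                context = tuple(tokens[i - k:i])
                tables[context][word] += 1
    return tables


def predict_next(tables, phrase, max_order=3):
    """Return the most frequent continuation, backing off to shorter contexts."""
    tokens = tokenize(phrase)
    for k in range(max_order - 1, -1, -1):
        if k > len(tokens):
            continue  # the phrase is shorter than this context length
        context = tuple(tokens[len(tokens) - k:])
        candidates = tables.get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # empty model: nothing to predict
```

For example, after building tables from a handful of sentences with `build_ngram_tables`, calling `predict_next(tables, "thank you")` looks up the two-word context first and only falls back to the one-word context, and finally to overall word frequencies, when the longer context was never seen. A real model would typically add smoothing and pruning of rare n-grams to keep an app responsive.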

Project Details

Date: Aug 1, 2015

Author: Arturo Cardenas

Categories: project

Tagged: r, data science, knime, NLP, shiny apps




