DataScience


What would a course on Data Science look like?

Introduction

Drew Conway's Venn diagram of data science

Topics would include:

  • What is relevant for the UoB?
  • y = f(x) relationships: classifiers & regression
    • Examples: Linear & logistic regression, K-Nearest Neighbours, Decision Trees, Neural Networks etc.
  • Data topics:
    • Training, test & validation data.
    • Sources of data, e.g. web scraping.
    • Exploratory Data Analysis (EDA).
    • Cleaning & munging data (90% of your effort?). Useful Linux tools.
    • Feature selection.
  • Model selection & training topics:
    • Algorithms that scale.
    • Supervised vs. unsupervised learning.
    • Overfitting.
    • The curse of dimensionality.
  • Programming Skills:
    • "Clean code shows clarity of mind."
    • Languages: R? Python? Others?
    • Version control.
    • Build systems.
    • Testing.
    • Scripting and automation.
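
As a taste of the y = f(x) and training/test data topics above, here is a minimal sketch in Python (one of the candidate languages listed). It fits a straight line by ordinary least squares on a training split and scores it on held-out test data. Everything in it is an assumption made for illustration: the synthetic data (y = 2x + 1 plus noise), the 75/25 split, and the helper names train_test_split, fit_line and mean_squared_error, which are hypothetical and use only the standard library.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_line(rows):
    """Ordinary least squares fit of y = a*x + b; returns (a, b)."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def mean_squared_error(rows, a, b):
    """Average squared difference between observed y and predicted a*x + b."""
    return sum((y - (a * x + b)) ** 2 for x, y in rows) / len(rows)


# Synthetic data, made up for illustration: y = 2x + 1 plus Gaussian noise.
noise = random.Random(42)
data = [(float(x), 2.0 * x + 1.0 + noise.gauss(0.0, 0.5)) for x in range(40)]

train, test = train_test_split(data)
a, b = fit_line(train)  # the model only ever sees the training rows

print(f"slope={a:.2f} intercept={b:.2f}")
print(f"train MSE={mean_squared_error(train, a, b):.3f}")
print(f"test MSE={mean_squared_error(test, a, b):.3f}")
```

Comparing the two errors is the simplest check for overfitting: a model that scores much better on the training rows than on the test rows has memorised noise rather than learned f. A straight line on linear data should show similar errors on both splits.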