Data

From SourceWiki
Revision as of 09:25, 25 June 2013 by GethinWilliams (talk | contribs) (Created page with 'Category:Pragmatic Programming '''Data: How to surf, rather than drown!''' =Introduction= =Data on Disk= =Data over the Network= =Data Analytics= Some common operatio…')
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to navigation Jump to search

Data: How to surf, rather than drown!

Introduction

Data on Disk

Data over the Network

Data Analytics

Some common operations you may want to perform on your data:

  • Cleaning
  • Filtering
  • Calculating summary statics (means, medians, variances)
  • Creating plots & graphics
  • Tests of statistical significance
  • Sorting and searching

Selecting the right tools.

Databases

GUI. Accessing from a program or script. Enterprise Grade The data haven.

Numerical Packages

Such as R, MATLAB & Python.

Bespoke Applications

Rolling Your Own

Principles: Sort & binary search.

Tools: Languages, libraries and packages.


When Data gets Big

Summary

  • Use large files whenever possible.
  • Disks are poor at servicing a large number of seek requests.
  • Check that you're making best use of a computer's memory hierarchy, i.e.:
    • Think about locality of reference.
    • Go to main memory as infrequently as possible.
    • Go to disk as infrequently as possible as possible.
  • Check that your are still using the right tools if your data grows.