Difference between revisions of "Data"

From SourceWiki
 

Revision as of 09:27, 25 June 2013

Data: How to surf, rather than drown!

Introduction

Data on Disk

Data over the Network

Data when Writing your own Code

Data Analytics

Some common operations you may want to perform on your data:

  • Cleaning
  • Filtering
  • Calculating summary statistics (means, medians, variances)
  • Creating plots & graphics
  • Tests of statistical significance
  • Sorting and searching
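
As a rough illustration of several of the operations listed above (cleaning, filtering, summary statistics and sorting), here is a minimal sketch using only the Python standard library. The sample readings and the threshold of 3.0 are invented for the example and do not come from any real dataset.

```python
import statistics

# Hypothetical raw readings: None and "n/a" stand in for dirty entries.
raw = ["3.2", "4.1", None, "n/a", "5.0", "2.7", "4.1"]

def clean(values):
    """Keep only the entries that can be parsed as floats."""
    cleaned = []
    for v in values:
        try:
            cleaned.append(float(v))
        except (TypeError, ValueError):
            # float(None) raises TypeError, float("n/a") raises ValueError.
            continue
    return cleaned

# Cleaning: drop anything that is not a number.
values = clean(raw)

# Filtering: keep readings at or above an (assumed) threshold of 3.0.
filtered = [v for v in values if v >= 3.0]

# Summary statistics over the cleaned values.
summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "variance": statistics.variance(values),  # sample variance
}

# Sorting: a new list in ascending order; the original order is kept in `values`.
ordered = sorted(values)

print(filtered)
print(summary)
print(ordered)
```

For plots and tests of statistical significance, a numerical package is usually the more practical route than hand-written code.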

Selecting the right tools.

Databases

GUI. Accessing from a program or script. Enterprise grade. The data haven.
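
To make "accessing from a program or script" concrete, the sketch below uses Python's built-in sqlite3 module with an in-memory database. SQLite is chosen here only because it needs no server; the table name `readings` and its columns are assumptions made for the example.

```python
import sqlite3

# An in-memory database, so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# Hypothetical table: one row per measurement at a named site.
conn.execute("CREATE TABLE readings (site TEXT, value REAL)")

# Parameterised inserts keep the data separate from the SQL text.
conn.executemany(
    "INSERT INTO readings (site, value) VALUES (?, ?)",
    [("a", 1.5), ("a", 2.5), ("b", 4.0)],
)
conn.commit()

# Let the database do the grouping and averaging.
rows = conn.execute(
    "SELECT site, AVG(value) FROM readings GROUP BY site ORDER BY site"
).fetchall()
conn.close()

for site, mean_value in rows:
    print(site, mean_value)
```

The same pattern (connect, execute, fetch) carries over to most database drivers that follow the Python DB-API, although the connection call and the parameter placeholder style differ between them.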

Numerical Packages

Such as R, MATLAB & Python.

Bespoke Applications

Rolling Your Own

Principles: Sort & binary search.
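
The sort-then-binary-search principle can be sketched with Python's standard bisect module. The list of IDs is made up for the example, and `contains` is a hypothetical helper name rather than a library function.

```python
from bisect import bisect_left

def contains(sorted_items, target):
    """Binary search: True if target is present in an ascending sorted list."""
    i = bisect_left(sorted_items, target)  # leftmost position where target could go
    return i < len(sorted_items) and sorted_items[i] == target

# Hypothetical record IDs, in arbitrary order.
ids = [42, 7, 19, 3, 88, 61]

# Sort once, at O(n log n) cost...
ids.sort()

# ...then each lookup costs O(log n) instead of a linear scan.
print(contains(ids, 19))
print(contains(ids, 20))
```

Sorting pays off when the same data is searched many times; for a single lookup, a plain linear scan is cheaper than sorting first.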

Tools: Languages, libraries and packages.


When Data gets Big

Summary

  • Use large files whenever possible.
  • Disks are poor at servicing a large number of seek requests.
  • Check that you're making best use of a computer's memory hierarchy, i.e.:
    • Think about locality of reference.
    • Go to main memory as infrequently as possible.
    • Go to disk as infrequently as possible.
  • Check that you are still using the right tools if your data grows.
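
One way to respect the points above is to read a file once, sequentially, and keep only a running total in memory instead of loading everything or seeking back and forth. The sketch below assumes a plain-text file with one number per line. The function name `streaming_mean` and the temporary demo file are invented for the example.

```python
import os
import tempfile

def streaming_mean(path):
    """Mean of a file holding one number per line, computed in a single pass."""
    total = 0.0
    count = 0
    # Iterating over the file object gives buffered, sequential reads,
    # so memory use stays constant however large the file is.
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            if line:
                total += float(line)
                count += 1
    return total / count if count else float("nan")

# Build a small demo file containing the numbers 1 to 100.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join(str(i) for i in range(1, 101)))
    demo_path = tmp.name

result = streaming_mean(demo_path)
os.remove(demo_path)

print(result)
```

This is only a sketch of the access pattern. Whether it is the right approach for a given dataset depends on the file format and on how often the data needs to be re-read.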