Data
Data: How to surf, rather than drown!
Introduction
Data on Disk
Data over the Network
Filling the pipe.
Data when Writing your own Code
Memory Hierarchy
Files & File Formats
Data Analytics
Some common operations you may want to perform on your data:
- Cleaning
- Filtering
- Calculating summary statistics (means, medians, variances)
- Creating plots & graphics
- Tests of statistical significance
- Sorting and searching
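A few of the operations listed above can be sketched with nothing more than Python's standard library. The readings, the `None` marker for missing values and the plausible range of 0 to 10 are all made up for illustration; real data will need its own cleaning rules.

```python
import statistics

# Hypothetical raw readings; None marks a missing value (an assumption).
raw = [4.2, None, 3.9, 250.0, 4.5, 4.1, None, 3.8]

# Cleaning: drop the missing values.
cleaned = [x for x in raw if x is not None]

# Filtering: keep readings inside an assumed plausible range.
filtered = [x for x in cleaned if 0.0 <= x <= 10.0]

# Summary statistics.
mean = statistics.mean(filtered)
median = statistics.median(filtered)
variance = statistics.variance(filtered)  # sample variance

# Sorting, ready for searching later.
ordered = sorted(filtered)
```

For anything beyond small lists, a numerical package is likely to be a better fit than hand-written loops.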
Selecting the right tools.
Databases
GUI. Accessing from a program or script. Enterprise grade. The data haven.
Numerical Packages
Such as R, MATLAB & Python.
Bespoke Applications
Rolling Your Own
Principles: Sort & binary search.
Tools: Languages, libraries and packages.
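The sort and binary search principle can be sketched as follows: sort the data once, then answer each lookup in logarithmic time. This is a minimal illustration using Python's standard `bisect` module; the `contains` helper and the record IDs are hypothetical.

```python
import bisect

def contains(sorted_items, target):
    """Return True if target is in sorted_items, which must be sorted ascending."""
    i = bisect.bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target

ids = [42, 7, 19, 3, 88, 61]   # hypothetical record IDs
ids.sort()                     # sort once: O(n log n)
found = contains(ids, 19)      # each lookup afterwards: O(log n)
missing = contains(ids, 20)
```

The up-front cost of sorting pays off when the same data is searched many times.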
When Data gets Big
Summary
- Use large files whenever possible.
- Disks are poor at servicing a large number of seek requests.
- Check that you're making the best use of a computer's memory hierarchy, i.e.:
  - Think about locality of reference.
  - Go to main memory as infrequently as possible.
  - Go to disk as infrequently as possible.
- Check that you are still using the right tools if your data grows.
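As one illustration of going to disk infrequently, the sketch below reads a file front to back in large chunks rather than issuing many small reads or seeks. The function name `count_newlines` and the 1 MiB default chunk size are assumptions chosen for the example, not tuned recommendations.

```python
def count_newlines(path, chunk_size=1024 * 1024):
    """Count newline bytes by reading the file in large sequential chunks."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)  # one large sequential read
            if not chunk:               # empty bytes means end of file
                break
            total += chunk.count(b"\n")
    return total
```

Each chunk is processed while it is in memory, so the file is only read from disk once.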