Arm-waving approach to the issues and approaches
================================================

Issues
 - how to quickly and reliably handle the mass of communication taking place,
   from all points of the globe, with wildly varying and often unpredictable loads
 - how to store/retrieve/update the mass of information needed for the
   particular application

e.g.
   amazon   - huge volume of salable goods, huge volume of requests,
              financial transactions need to be reliable
   twitter  - huge volume of time-sensitive communication, ensuring tweets
              reach all the intended followers without overwhelming the
              data centre or networks
   facebook - huge volume of information stored, heavy posting/querying loads
   youtube  - huge volume of information to be stored, searched, and retrieved

Approaches
 - use traditional network/rdbms approaches, just go for more and better
   hardware (rapidly gets $$$ EXPENSIVE $$$)
 - replication: have data copied in many places, users get directed to one
   of them either randomly or by geography
   (problem is keeping the copies synchronized: see sharding)
 - go to non-traditional dbms, typically key-value pairs, designed for fast access
   (problem is limited query types, or heavy processing needed before/after
   the db query)
 - serve everything out of memory, keep your dbms primarily for long term
   storage/retrieval (e.g. twitter)
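To make the replication approach a bit more concrete, here is a minimal Python sketch. It assumes three made-up regions, keeps each copy in a plain dict, and uses a simple pending-update queue to stand in for whatever synchronization mechanism a real system would use. The names Replica, pick_replica, write and synchronize are all hypothetical, chosen only for illustration.

```python
import random

class Replica:
    """One copy of the data, held in a plain dict for illustration."""
    def __init__(self, region):
        self.region = region
        self.data = {}

# Hypothetical regions; a real deployment would have its own list.
REPLICAS = {r: Replica(r) for r in ("us-east", "eu-west", "ap-south")}
PENDING = []  # updates that have not yet been copied to every replica

def pick_replica(user_region=None):
    """Direct the user by geography if possible, otherwise at random."""
    if user_region in REPLICAS:
        return REPLICAS[user_region]
    return random.choice(list(REPLICAS.values()))

def write(key, value, user_region=None):
    """Apply the update to one replica and queue it for the others."""
    replica = pick_replica(user_region)
    replica.data[key] = value
    PENDING.append((key, value))
    return replica

def synchronize():
    """Copy every queued update to all replicas, oldest first."""
    while PENDING:
        key, value = PENDING.pop(0)
        for replica in REPLICAS.values():
            replica.data[key] = value
```

After write("price:42", 9.99, "eu-west"), a reader directed to "us-east" will not see the new price until synchronize() has run. That window of stale data is the synchronization problem noted above, and it grows with the number of copies and the write rate.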
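The last two approaches (key-value access, and serving out of memory with the dbms kept for long term storage) can be sketched together. This is only an illustration under stated assumptions: an in-process dict plays the role of the memory tier, Python's built-in sqlite3 module stands in for the long-term dbms, and the class name MemoryFirstStore and its db_reads counter are hypothetical. It is not a description of how twitter or anyone else actually does it.

```python
import sqlite3

class MemoryFirstStore:
    """Key-value store that answers reads from memory whenever it can."""

    def __init__(self, db_path=":memory:"):
        self.cache = {}  # the fast in-memory tier
        self.db = sqlite3.connect(db_path)  # stand-in for the long-term dbms
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)"
        )
        self.db_reads = 0  # counts how often a read had to go to the dbms

    def put(self, key, value):
        """Update memory first, then persist the pair for long-term storage."""
        self.cache[key] = value
        self.db.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self.db.commit()

    def get(self, key):
        """Serve from memory; fall back to the dbms only on a miss."""
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        row = self.db.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self.cache[key] = row[0]  # keep it in memory for next time
        return row[0]
```

Note the only query the store supports is lookup by exact key. Anything like "all tweets mentioning X" has to be built by the application before or after the lookup, which is the limitation mentioned for key-value stores above.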