I had been meaning to post a brief summary of one of the seminars I attended this August at Carnegie Mellon: John Wilkes's “Cluster Management at Google”. I already shared some notes on Twitter, but here they are expanded and collected in one place:
– Cluster management is the term that Google uses to describe how they control the computing infrastructure in their datacenters that supports almost all of their external services.
– Cluster management: a fleet of machines living in datacenters located in different regions and countries.
– Your storage system pages you because there are only a few petabytes of free space left.
– Main causes of service outages: network, power, and rare events (wild dogs, sharks, dead horses, drunken hunters, etc.).
– Goals: run everything, high utilization, predictable behavior, keep going… with low operator effort.
– Best way to save energy is to write good software.
– Don’t buy idle machines.
– Making an app work right in production: priceless.
– Large-scale systems have some fun problems.
– Configuration may be the next big challenge.
Interesting. I don't even want to imagine how they deal with those strange events (wild dogs, sharks, dead horses, drunken hunters, etc.)