challenges
* cost -- avoid big standby instance
* tradition is active/standby servers, with data on SAN
  * here will use SAN (EBS)
* no static IPs
* no VIP / broadcast

solution
* small monitoring instance (smallest instance type)
* large instance for DB
  * has EBS attached
* farm of web servers that connect to private IP of large instance
* small instance notices failure (misses heartbeat), and it kicks off failover procedure
  * fires kill at large instance
  * fires start at new large instance, mounts EBS, and runs MySQL recovery
* makes security group for all web servers; monitoring instance can query this to learn which web hosts need to be changed
* small instance will connect to each web server, and point it at new large instance

pieces
* cluster manager -> pacemaker (just resource manager)
  * must use unicast (no multicast in EC2)
  * use heartbeat or corosync for communication
  * doesn't control MySQL at all, it just starts an EC2 instance, and MySQL starts automatically

amazon network can lag; if too aggressive with heartbeat checking, can get false positives. to compensate, don't be aggressive with checks

ec2-run-instances has a user script hook (user data): script runs as root on VM boot, dynamically inserts IP of management host into config file

other ways to do this:
* RDS
  * has support for failover
* Continuent -- can manage EC2
* RightScale (EIP based)
* Scalarium
* Scalebase (beta)

can do RAID-0 of EBS volumes, up to 4. don't need redundancy in the array because EBS is already redundant (replicated within its availability zone)
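
sketch of the monitoring logic (not from the talk): a minimal Python illustration of the failover procedure and the "not aggressive" heartbeat check described above. Everything here is an assumption for illustration: the names `FailoverMonitor`, `simulate`, and `max_misses` are hypothetical, and the failover steps only append to a log instead of calling EC2 or touching MySQL. In the setup from the notes, pacemaker with heartbeat or corosync would do this job.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class FailoverMonitor:
    """Hypothetical stand-in for the small monitoring instance."""
    check_heartbeat: Callable[[], bool]        # True if the DB instance answered
    failover_steps: List[Callable[[], None]]   # run in order, once, on failure
    max_misses: int = 3                        # consecutive misses tolerated
    misses: int = 0
    failed_over: bool = False

    def tick(self) -> bool:
        """Run one heartbeat check; return True if this tick triggered failover."""
        if self.failed_over:
            return False
        if self.check_heartbeat():
            # A single good heartbeat clears the count, so a brief
            # network lag does not add up to a false positive.
            self.misses = 0
            return False
        self.misses += 1
        if self.misses < self.max_misses:
            return False
        for step in self.failover_steps:
            step()
        self.failed_over = True
        return True


def simulate(heartbeats: List[bool], max_misses: int = 3) -> Tuple[Optional[int], List[str]]:
    """Feed a recorded heartbeat sequence to the monitor.

    Returns the index of the check that triggered failover (or None)
    and the log of failover steps that ran.
    """
    log: List[str] = []
    beats = iter(heartbeats)
    monitor = FailoverMonitor(
        check_heartbeat=lambda: next(beats),
        failover_steps=[
            lambda: log.append("kill old DB instance"),
            lambda: log.append("start new DB instance"),
            lambda: log.append("mount EBS, run mysql recovery"),
            lambda: log.append("repoint web servers"),
        ],
        max_misses=max_misses,
    )
    triggered_at = None
    for i in range(len(heartbeats)):
        if monitor.tick():
            triggered_at = i
    return triggered_at, log


if __name__ == "__main__":
    # One isolated miss is ignored; three misses in a row trigger failover.
    print(simulate([True, False, True, False, False, False]))
```

raising `max_misses` (or lengthening the interval between checks) trades slower failure detection for fewer false failovers, which is the compromise the notes describe for a laggy network.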