PuppetConf 2013 recap

Better late than never. Here’s my little PuppetConf 2013 recap.

Day 1

AWS Architecting for resilience & cost at scale

Really great talk with common hints for avoiding costs and failure.

Autoscaling and AWS
Hybrid stuff: CI Build AMIs
using puppetmaster behind ELB for scaleout
Also some insights into their graphite-on-AWS setup: Using a c1.xlarge.

Nobody Has To Die Today: Keeping The Peace With The Other Meat Sacks

Most impressive talk for me. The speaker came in from the back with a loud signal-horn a few minutes late. Shocking silence immediately. This man knows how to get the audience!

People actually suck at everything. We cannot see, hear, smell … or communicate very well, only averagely. The only thing we have is failing and learning from failures. People might call this “intelligent”. We suck at communication because we are not aware of contexts: everyone’s context is different. So how can we get better at communicating? Be correct, clear, concise, consistent, and comprehensive. For example, “<software X> sucks” might be correct, but it is not clear. “<software X> sucks, because I had the following problems: …” is much clearer and more comprehensive.
My personal experience has shown that it’s even better to avoid the verb “to be” and its forms. You come across as less offensive if you say “for me <software x> seems …” or “last time I looked it was like …”. This also reminded me of John Allspaw’s “On Being a Senior Engineer”.

Puppet Module Reusability - What I Learned from Shipping to the Forge

stop the fork!
use rspec-puppet, Travis, and test matrices
use rspec-system-puppet for integration tests (serverspec is another alternative)
put OS-specific stuff in default parameter classes

Puppet at GitHub

Some random facts from this talk:

unicorn as webserver for puppetmaster
currently managing 600 nodes
pull deployment (cron executes puppet each hour), with the runs spread over several minutes to avoid overloading the master (the same thing we do at Jimdo for our nodes)
puppetdb / nagiosdb
filtergendb for iptables
gpanel as dashboard
augeas in order to avoid the need to define every config var as a puppet param
test-queue to run tests in parallel: it iteratively finds the best test distribution
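
The hourly cron run spread over several minutes could be sketched roughly like this. This is only a minimal illustration, not what GitHub or Jimdo actually run: deriving the minute from a hash of the hostname is my assumption, and `splay_minute` and `cron_line` are hypothetical helper names.

```python
import hashlib


def splay_minute(hostname: str, window_minutes: int = 60) -> int:
    """Map a hostname to a stable minute offset in [0, window_minutes)."""
    # Hashing keeps the offset stable per node and spreads nodes roughly evenly.
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % window_minutes


def cron_line(hostname: str,
              command: str = "puppet agent --onetime --no-daemonize") -> str:
    """Build an hourly crontab entry that fires at the node's own minute."""
    return f"{splay_minute(hostname)} * * * * {command}"


if __name__ == "__main__":
    # Hypothetical hostnames, only for illustration.
    for node in ("web01.example.com", "web02.example.com", "db01.example.com"):
        print(node, "->", cron_line(node))
```

Because the offset depends only on the hostname, a node keeps its slot across runs, and adding nodes does not reshuffle the others.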

And last but not least my favorite quote: “No software is better than no software”. A nice explanation for this ‘total cost of ownership’ thingy. Keep it simple, stupid!

Multi-Provider Vagrant: AWS, VMware, and More

The most impressive part of this presentation for me was the insight into ‘packer’. It bridges the gap between configuration management and golden images by building images for several target platforms from a single source template. OK, boring? But wait! It has provisioners, the same concept as in Vagrant. You can use your existing configuration management to build a golden, or maybe just a ‘silver’, image. With a hybrid approach, e. g. continuously building a base image with updated packages while also keeping config management active, you get the best of both worlds:

Fast bootstrapping of new nodes because the initial cfg mgmt run only takes seconds
enables autoscaling
nodes get cfg updates and hotfixes (e. g. security updates) via config management, so no rotation of all environment nodes is necessary as it would be in a golden-image scenario.
Pure rotation concept still possible in the future
great for migration phase from cfgmgmt-only to ephemeral nodes
No manual steps for image building, knowledge is externalized into code (templates).

There’s also Netflix’s Aminator, which does not need a running EC2 instance to build AMIs, a current packer limitation AFAIK.

Building Data-Driven Infrastructure with Puppet

James Fryman is searching for recurring patterns in IT. Like everyone else searching for the holy grail of wisdom, he ended up with systems theory and systems thinking. Most things we do day by day in operations are repetitive.

Interesting concept of codifying the state of nodes or systems via hubot. E. g. use puppet to seed initial facts in Facter’s external facts directory (/etc/facter/facts.d), then toggle them via puppet/hubot/...

Day 2

Anatomy of a Reusable Module

A really good overview from Alessandro (the example42 guy) of how to write reusable puppet modules, including the following patterns:
let the user decide how to manage config files, e. g. provide good defaults but make them overridable
let the user decide how system users are managed
even make included classes overridable. This ensures maximum flexibility.

I learned that puppet allows you to dynamically include classes like include $classname, at least in puppet 3.0.
I also like the standardization initiative. Have a look at the example42 standardized modules.

The Road to the White House with Puppet and AWS

Once again Asgard
random puppet tips:
- use a base class to include common stuff on all nodes
- store credentials not on servers but in S3 and use IAM policies to define roles which have access
- use S3-based package repositories so you don’t have to care about load balancing, monitoring, security, and OS upgrades (remember: total cost of ownership!)
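
To make the credentials tip a bit more concrete, here is a minimal sketch of a read-only IAM policy document for one role’s secrets. The bucket name, the prefix, and the helper `read_only_policy` are made up for illustration; the talk did not show the actual policies.

```python
import json


def read_only_policy(bucket: str, prefix: str) -> dict:
    """Build an IAM policy that only allows reading objects under one prefix."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                # Only objects below <bucket>/<prefix>/ are readable.
                "Resource": [f"arn:aws:s3:::{bucket}/{prefix}/*"],
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket and role prefix.
    print(json.dumps(read_only_policy("example-credentials", "webserver"), indent=2))
```

Attached to the role an instance runs under, such a policy lets a web server fetch its own credentials at boot while keeping other roles’ secrets out of reach.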

DevOps isn't just for WebOps: The guerrilla's guide to cultural change

I really enjoyed this war-story from Michael.
DevOps is not about hipster technology, 10 deploys a day, or writing your own tools (NIH syndrome). It’s first of all about changing your org step by step to make it a better place for you, your coworkers and also your customers!
When you automate or optimize, focus on system bottlenecks; don’t automate or optimize stuff with little (local) impact. Even things that look worth automating at first glance might not be real system bottlenecks. But we all tend to see our own problems first, and it’s also harder to focus on bigger bottlenecks because you usually need to work together(!) with other people or teams. Uhhh!

Next rule: Reduce variability. Thus you have to measure variability first or make it explicit. I had to smile about the fact that they reduced database downtime and increased customer satisfaction by just taking the database down in a SCHEDULED maintenance to apply changes instead of waiting for the unscheduled incident. So a small process change (no software development needed!) made the situation better, not a hipster devops tool.

Share the pain: Developers with a pager tend to make better design decisions. :-) This might be a controversial topic, though.

And please, shout your failures. By being humble and pointing out your own failures you are leading by example. Maybe others follow and also get a little bit more humble.

Monitoring in an IaaS Age

Kris showed us the current state of monitoring in the IaaS world and how concepts are still clashing here. First we had a look at why monitoring actually sucks. It’s because setting it up is often a manual process. And it’s always done last and thus might be forgotten (no time, no budget left, or boring). So we need a way to automate the setup in the same way we are automating the setup of services.

So what are services? Is ‘service’ monitoring == ‘service’ monitoring? Who cares if a ‘service’, e. g. apache, is down if the ‘service’ (exposed to consumers) is still available? Reminded me of cucumber-nagios.

So let’s have a look at the toolchain: With nagios + naginator we have automatic host and “service” monitoring. With some of the other tools mentioned (good tool overview!) we can also automate the setup of metrics-based monitoring (e. g. ‘are payment requests being processed’). Thus we get an entirely automated service+monitoring+alerting setup.
One point is left open here for me:
How to do real service monitoring in a load-balanced cluster with puppet? Puppet is host-based, so where do you add the check for a load balancer? One idea would be a puppet class, included on the monitoring host, which runs checks against various services. But this is a manual step again ;-)
By also exporting metrics and dashboards we could close the circle for an entire automated service lifecycle management.

At the end of the second day I accidentally joined a crowd to visit and drink beer at the new GitHub HQ 3.0. Mind. blown. Have a look at the pics ;)