Onehub is built on the Amazon Web Services platform, which lets us quickly and easily provision new servers to meet demand. We have servers for load distribution, web serving, queue processing, transcoding, administration, and other tasks, so every new server we launch needs a specific configuration to fulfill its role in our infrastructure. After careful consideration, we settled on Puppet to manage the configuration of our servers. Puppet does the majority of the heavy lifting (installing packages, writing out configuration files, etc.), but we still needed something to group and instruct these Puppet instances. By combining a number of open source tools, we were able to vastly simplify the management of our cloud and bring the familiar cap deploy to our infrastructure.
Assembling The Pieces
At the center of our mash-up is iClassify, a small Rails application that registers Puppet instances as nodes in a database. iClassify provides a YAML description of an individual server to the Puppetmaster, the central Puppet server that compiles the configuration for individual nodes. To make registering these nodes easier, we extracted the icagent tool from iClassify and made it available as both a gem and an RPM. Thanks to Facter, new nodes come with a host of attributes that describe the server, which lets us tailor configurations to each machine. Finally, we use Capistrano to glue everything together.
Setting The Stage
Before any servers can be launched, iClassify and Puppetmaster must be up and running on servers that are reachable by every machine that will register with them; it is handy to run both services on the same machine. For simplicity, our Puppetmaster configuration enables autosigning of certificates for all machines with *.internal addresses. Beware: this setting can be dangerous if your machines need to communicate over the internet as opposed to a private LAN.
The Puppetmaster needs to be configured to use iClassify as its external node terminus. Once set up, the Puppetmaster will query iClassify for the classes and variables (tags and attributes respectively) to compile as the configuration (catalog) for each individual node:
# Onehub.com
# http://onehub.com/developer
# /etc/puppet/puppet.conf
[main]
# Use iClassify to describe nodes
node_terminus = exec
external_nodes = "/path/to/icpuppet -u read_only_iclassify_user -p read_only_iclassify_user_password -s https://iclassify.server"
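For reference, Puppet's exec node terminus runs the external_nodes command with the node name as its argument and expects a YAML document on standard output with classes and parameters keys. The sketch below is not icpuppet itself; it only illustrates the shape of that output. The node_description helper and the tag and attribute values are hypothetical.

```ruby
require 'yaml'

# Hypothetical helper (not part of icpuppet): build the YAML document an
# external node classifier prints for one node. Tags become Puppet classes
# and attributes become variables available to the manifests.
def node_description(tags, attributes)
  { 'classes' => tags, 'parameters' => attributes }.to_yaml
end

# Example values are made up for illustration.
puts node_description(%w[base webserver], 'puppet_env' => 'staging')
```

Keeping tags and attributes in separate keys is what lets a tag select a Puppet class while an attribute such as puppet_env stays an ordinary variable.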
Starting The Show
At the base of all new servers is our ‘Stem Cell’ image. This Amazon Machine Image has only two packages beyond a basic Linux distribution, Puppet and icagent, and it is configured to run both at boot. First, icagent:
$ icagent --server https://operations.server.com
This command runs icagent’s recipes against the server and then submits all of the gathered information to the iClassify server along with a UUID. Second, Puppet:
$ /etc/init.d/puppet start
The newly registered node will default to receiving the base Puppet configuration, which includes a few handy packages and Puppet modules we use on all of our machines:
# Onehub.com
# http://onehub.com/developer
# /etc/puppet/templates.pp
node default {
  $nodetype = "base"
  include base
}
class base {
  include monit, system, yum, root, onehub, sshd, hosts, motd, puppet, ntpd, snmpd, git-core, sudo, splunk, postfix
  # This affects all environments regardless of $puppet_env!
  $packagelist = ["vim-enhanced", "emacs", "perl", "ruby-1.8.6.369-1.fc8", "rubygems", "lynx", "pv", "screen"]
  package { $packagelist:
    ensure => installed
  }
  package { "ec2-api-tools":
    ensure => installed,
    require => File["/etc/yum.repos.d/onehub.repo"]
  }
}
Issuing Orders

With our new machine up and running, we need to assign it a role. On the iClassify server, the newly registered node will show up at the top of the server listing. It is a good idea to give it a more descriptive name and then add the tags (which map to Puppet classes) that define its role. In addition to the standard attributes, we add a custom puppet_env attribute to each node that lets us correlate servers with our application’s environments (e.g. master, staging, or production). This is critically important because it lets us selectively disable cron tasks and other jobs that we would not want to run against test data (like billing!).
Now that the machine is tagged, it will eventually run all of the specified Puppet recipes and be ready to receive the ‘live’ tag. However, we’re going to take this a bit further with some nifty Capistrano scripting.
Stringing Up The Puppets
Using the iclassify-interface gem, we can take a simple Capistrano deploy script and adapt it to coordinate our Puppet instances:
# Onehub.com
# http://onehub.com/developer
# puppet-repo/config/deploy.rb
require 'iclassify-interface'
require 'capistrano/ext/multistage'
# Connection configuration
set :gateway, "operations.server"
set :use_sudo, false
set :application, "puppet-repo"
set :repository, "git@github.com:username/puppet-repo.git"
set :deploy_to, defer { "/var/lib/puppet/modules/#{stage}" }
set :scm, :git
set :deploy_via, :remote_cache
set :keep_releases, 5
set :stages, %w(development master production skunkworks staging testing)
set :default_stage, "master"
# Set roles, pulling the servers down from iclassify
set :iclassify_client, IClassify::Client.new('https://iclassify.onehub.com', 'read_only_user', 'read_only_user_password')
# List the servers in the app role, so we can invoke restart on all of them.
set :app_servers, defer { iclassify_client.search("puppet_env:#{stage}") }
# Task to define the roles, needs to be delayed so the stage can be loaded
task :define_servers do
  # Puppetmaster
  role :master, "puppet@operations.server"
  # Puppetd for the environment
  app_servers.each do |instance|
    role :client, "username@#{instance.attrib?('fqdn')}", :no_release => true
  end
end
# Hooks
on :start, 'define_servers', :except => stages + ['multistage:prepare']
after "deploy:update", "deploy:cleanup"
namespace :deploy do
  # Short this out
  task :finalize_update do
    # nothing
  end
  # Short this out
  task :migrate do
    # nothing
  end
  task :restart, :roles => :client do
    sudo "/etc/init.d/puppet restart", :pty => true
  end
end
This example uses a standard Capistrano deploy scheme. Note that the deploy path from Capistrano should match the module location for the different Puppet environments. Our Puppet instances run in the same environments as our application (i.e. puppet_env == rails_env).
# Onehub.com
# http://onehub.com/developer
# /etc/puppet/puppet.conf
# Environments pull in different modules.
[skunkworks]
manifest = $vardir/modules/skunkworks/current/manifests/site.pp
modulepath = $vardir/modules/skunkworks/current/modules
[master]
manifest = $vardir/modules/master/current/manifests/site.pp
modulepath = $vardir/modules/master/current/modules
[staging]
manifest = $vardir/modules/staging/current/manifests/site.pp
modulepath = $vardir/modules/staging/current/modules
[production]
manifest = $vardir/modules/production/current/manifests/site.pp
modulepath = $vardir/modules/production/current/modules
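To see how the Capistrano deploy path and the Puppet environment paths line up, here is a small sketch. It assumes $vardir resolves to /var/lib/puppet, which is what the :deploy_to setting in the Puppet deploy.rb implies; the helper names are hypothetical and exist only for this illustration.

```ruby
# Assumption: Puppet's $vardir is /var/lib/puppet, matching :deploy_to.
VARDIR = '/var/lib/puppet'

# Directory Capistrano deploys a stage into. Capistrano maintains a
# 'current' symlink inside it that points at the latest release.
def deploy_to(stage)
  "#{VARDIR}/modules/#{stage}"
end

# Paths the matching puppet.conf environment section should point at.
def environment_paths(stage)
  current = File.join(deploy_to(stage), 'current')
  {
    'manifest'   => File.join(current, 'manifests', 'site.pp'),
    'modulepath' => File.join(current, 'modules')
  }
end

environment_paths('staging').each { |key, path| puts "#{key} = #{path}" }
```

Because both sides derive from the stage name, adding a new environment means adding a stage to Capistrano and one section to puppet.conf.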
Now, deploying new Puppet recipes is as simple as running $ cap staging deploy: the code is checked out of version control and Puppet is restarted on all the servers in that environment. We can also use this same trick in our application’s deploy script.
# Onehub.com
# http://onehub.com/developer
# application/config/deploy.rb
require 'iclassify-interface' # Interface to talk with iclassify database
# Set roles, pulling the servers down from iClassify; this client uses a read-only user.
set :iclassify_client, IClassify::Client.new('https://iclassify.onehub.com', 'read_only_user', 'read_only_user_password')
set :lb_servers, defer { iclassify_client.search("tag:load-balancer puppet_env:#{stage}") }
set :app_servers, defer { iclassify_client.search("tag:webserver puppet_env:#{stage}") }
set :services_servers, defer { iclassify_client.search("(tag:services OR tag:media) puppet_env:#{stage}") }
task :define_servers do
  # Set the environment equal to the stage, this flag is only used for migrations.
  set :rails_env, "#{stage}"
  lb_servers.each do |instance|
    role :lb, instance.attrib?('fqdn'), :no_release => true
  end
  # List the web servers, they are the same as the application servers for us.
  app_servers.each do |instance|
    role :web, instance.attrib?('fqdn')
    role :app, instance.attrib?('fqdn')
  end
  # List the services servers, same as application, but no mongrel.
  services_servers.each do |instance|
    role :app, instance.attrib?('fqdn')
  end
  # Run migrations from the first services server.
  role :db, services_servers.first.attrib?('fqdn'), :primary => true
end
on :start, 'define_servers', :except => stages + ['multistage:prepare']
When it comes time to deploy the application, the list of servers is generated dynamically, saving us from tediously editing the deploy.rb file.
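The tag-to-role mapping in define_servers can be tried without an iClassify server. The sketch below uses a hypothetical Node struct in place of the objects the client returns, with made-up host names, and groups hosts the same way the deploy script assigns roles.

```ruby
# Hypothetical stand-in for the node objects returned by an iClassify search.
Node = Struct.new(:fqdn, :tags)

# Group host names by Capistrano role, mirroring define_servers:
# load balancers -> :lb, web servers -> :web and :app,
# services/media servers -> :app only.
def roles_for(nodes)
  roles = Hash.new { |hash, key| hash[key] = [] }
  nodes.each do |node|
    roles[:lb] << node.fqdn if node.tags.include?('load-balancer')
    if node.tags.include?('webserver')
      roles[:web] << node.fqdn
      roles[:app] << node.fqdn
    end
    roles[:app] << node.fqdn if (node.tags & %w[services media]).any?
  end
  roles
end

# Example nodes are made up for illustration.
nodes = [
  Node.new('lb-1.internal', %w[load-balancer]),
  Node.new('web-1.internal', %w[webserver]),
  Node.new('svc-1.internal', %w[services])
]
roles_for(nodes).each { |role, hosts| puts "#{role}: #{hosts.join(', ')}" }
```

Retagging a node in iClassify changes which group it falls into on the next deploy, with no change to deploy.rb.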
The Final Act
While the simplified deployment of our configurations is a huge feature in itself, the true beauty comes in making ‘smart’ Puppet recipes. For example, we have an instance of nginx that functions as a load balancer: it proxies to our web servers. Nginx needs to know which ‘webservers’ are available for it to proxy to at any given time. Inside our nginx manifest we can define a search for the Puppetmaster to perform against iClassify to populate this list:
$nginx_lb_search = "puppet_env:${puppet_env} tag:webserver tag:live"
# Onehub.com
# http://onehub.com/developer
# puppet-repo/nginx/templates/upstream.conf
<% require 'iclassify-interface' -%>
<% ic = IClassify::Client.new(iclassify_server, iclassify_user, iclassify_password) -%>
<% webservers = ic.search(scope.lookupvar("nginx::lb::nginx_lb_search")) -%>
upstream webservers {
<% webservers.each do |webserver| -%>
  server <%= webserver.attrib?('hostname') %>:80;
<% end -%>
}
# This will compile to:
upstream webservers {
  server webserver-1:80;
  server webserver-2:80;
}
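The rendering step can be reproduced with Ruby's standard ERB library alone. This is a minimal sketch that swaps the iClassify search for a hard-coded list of hypothetical host names; render_upstream and UPSTREAM_TEMPLATE are names invented for the example.

```ruby
require 'erb'

# Same loop as the upstream template, minus the iClassify lookup.
UPSTREAM_TEMPLATE = <<~TEMPLATE
  upstream webservers {
  <% webservers.each do |webserver| -%>
    server <%= webserver %>:80;
  <% end -%>
  }
TEMPLATE

# Render the upstream block for a list of host names.
# trim_mode "-" drops the newline after any tag closed with -%>,
# so the loop lines leave no blank lines in the output.
def render_upstream(webservers)
  ERB.new(UPSTREAM_TEMPLATE, trim_mode: '-').result(binding)
end

# Host names are placeholders standing in for the search results.
puts render_upstream(%w[webserver-1 webserver-2])
```

Each time Puppet compiles the catalog, the template is evaluated again, so the server list tracks whatever nodes currently match the search.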
We simply (1) start a new server, (2) tag it, (3) deploy our application, (4) tag it live, and (5) deploy Puppet. The first few times there were a few kinks (hey, that’s what staging environments are for!), but with all of the pieces strung together, we have automated an amazing amount of tedious and error-prone work. A new webserver will boot, install all of our base packages, receive the application and start its processes, and then be automatically included in the load balancer’s round-robin list.