The Nerdings

tales of a devops dude @ particle.io

Cloud Operations Is a Game of Pandemic

Three days ago, I had the privilege of participating in an epic 24-hour cloud-ops swarm session troubleshooting various issues introduced with the rollout of Particle’s new pricing scheme and website. In the delirious hours before I fell asleep in between PagerDuty alerts, I thought, dang, this sucks, but it’s kind of awesome too, like a game of Pandemic…

Cloud operations is a game of Pandemic.

A contagion leaks into a complex system, creating havoc in many different places.

Source unknown, direction of causality unknown, root causes unknown, symptoms intermittent.

What do you do?

You’ve got N people each with their own unique specialized skillset

  • An Operations Expert connecting dots and bringing minds together so it functions as a hive
  • A Dispatcher orchestrating communications and facilitating low friction group behavior
  • A Scientist rapidly perceiving causes of multi-factor failures and providing complex cures quickly
  • A Medic handling localized crises quickly without fear, killing small problems before they spread, and contributing to the broader strategy
  • A Researcher who provides valuable knowledge, precise observations, and actionable insights when partnered with the right person
  • A Contingency Planner queuing up tasks in case what’s happening now doesn’t work or the virus spreads
  • A Quarantine Specialist: I have no metaphor for this character at this time (this is the Internet so it’s cool like that.)

There are many ways to die and 1 way to win

  • Win by stopping the contagion from killing everything.
  • Lose by getting consumed by it.

Under conditions of crisis, you don’t know what N is going to be and what the threats are

  • Is it one engineer trying to play all of these roles at 3am in response to a page?

  • Is it 5 engineers swarming on a crazy failure situation all day and night, enumerating mitigations as crisis after crisis compounds and new variables come and go?

Conclusion

You never know which roles will be available in a crisis situation. The best team capable of preventing the broadest range of failures in a complex system is the one in which the greatest number of individuals can play the greatest number of roles using the best available technology.

The cultural practice of DevOps, in the right organizational environment, gives rise to high-performing teams that can manage the routine failures complex software systems imply, quickly and efficiently under pressure, while having fun and feeling good about it when it’s all said and done after a good night’s sleep.

  • Are you a spectacular, collaborative communicator who likes to play fun, intense, complex games? (read: you love what you do and like to work with smart people)

  • Do you have an insanely deep specialized skill and/or absurdly broad software engineering breadth? (read: can you play the Dispatcher and the Scientist?)

  • Wanna play Pandemic at Particle? (read: a cloud ops job)

We’re hiring for Cloud Ops/DevOps/Platform Reliability Engineers. If you answered yes to any of those questions, please get in touch with me. And if not, make sure to make time to play Pandemic with friends :).

Containerization: Galloping Unicorns From the Future

Containerization is a galloping unicorn from the future of cloud operations. In its bursting saddlebags it possesses magic relics to:

  1. eliminate the pain of setting up application development environments
  2. drastically reduce the time it takes to spin up cloud infrastructure for production, testing, high-availability, or auto-scaling
  3. decrease cloud operating costs by piling many containers onto single virtual machines instead of individual [under-utilized] machines.
  4. improve security by isolating application processes further with containers
  5. facilitate rapid sharing of complex infrastructure across cloud providers.

But it’s not the end-all, be-all, and on many of these promises it’s still quite rough around the edges. It should DEFINITELY be a huge part of your medium- and long-term roadmaps, but if you’re thinking you can swoop through and realize value in your organization from all of these in a couple of days, you have another thing coming. Here’s my take on each of these areas:

1. eliminate the pain of setting up application development environments

These two articles (1, 2) do an excellent job explaining how you go about building a Docker + Vagrant workflow to build a portable Rails environment. Though they certainly exhibit some great capabilities, if you have a team that doesn’t spin up local VMs regularly, the rough edges of using boot2docker (like its special IP, its sideloaded VM, and the DOCKER_HOST env var, to name a few) can burn you and result in a lot of confusion and time spent troubleshooting the “portable painless dev env”.

It’s still worth pursuing because the container/virtualization knowledge gained is valuable. But it’s not gonna happen instantly.
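One cheap guard against the DOCKER_HOST rough edge is to check the variable before running anything else. The sketch below is only an illustration: check_docker_env is a hypothetical helper name, and the hint it prints assumes a boot2docker release that ships the shellinit subcommand.

```shell
#!/usr/bin/env bash
# check_docker_env is a hypothetical helper, not part of Docker or boot2docker.
# It fails loudly when DOCKER_HOST is unset, which is a common reason the
# docker client cannot find the daemon running inside the boot2docker VM.

check_docker_env() {
  if [ -z "${DOCKER_HOST:-}" ]; then
    echo "DOCKER_HOST is not set; the docker client will look for a local daemon." >&2
    # Assumption: your boot2docker version provides the shellinit subcommand.
    echo "With boot2docker, try: \$(boot2docker shellinit)" >&2
    return 1
  fi
  echo "docker client will talk to: ${DOCKER_HOST}"
}
```

Calling check_docker_env at the top of a project's setup script turns a confusing connection error into a one-line hint.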

2. drastically reduce the time it takes to spin up cloud infrastructure for production, testing, high-availability, or auto-scaling

For production container usage, Chef-DK + knife-container are the jam. However, both are still beta and their actual functionality is fairly limited; you still need to know a bunch about Docker to really do something cool here.

For infrastructure testing, Test-Kitchen + kitchen-docker are incredible. However, it’s no small feat to get the prerequisite testing harness/infrastructure up before it can deliver on the dream of spinning up entire clusters as containers, running tests, and destroying them in seconds rather than minutes.

3. decrease cloud operating costs by piling many containers onto single virtual machines instead of individual [under-utilized] machines.

I drool over CoreOS, an amazing Docker-enabled technology that combines

  • systemd, the [best] init + process supervision system in existence, which most major Linux distributions (including Ubuntu) are switching to,
  • etcd, a highly available key-value config + service discovery tool for clusters,
  • and fleet, an amazing tool that makes systemd behave sensibly + intelligently across a cluster of containers.

Of all the magical goodies the containerization unicorn brings, this is the one I’m currently most stoked about. It’s also a paradigm rethink, and thus for most cloud ops shops it’s a big endeavor.

4. improve security by isolating application processes further with containers

Yep. It’s great that tools like chef-container strip out all of the important encryption and auth settings when spinning up a new Docker container.

Again, this tech is still bleeding-edge beta software. Furthermore, there is still a lot of uncertainty as to what the best practices are for running hybrid configurations that combine the idea of immutable infrastructure (also see 1, 2) that Docker espouses with some of the dynamism of Chef. It’s something to play with and figure out over the next couple of months.

5. facilitate rapid sharing of complex infrastructure across cloud providers.

DockerHub is RAD! GitHub for complex OS images. Who is gonna argue with that?

Though there are some good images out there, without established norms about how to guarantee a published image is built a certain way, I have a hard time not viewing this as an attack vector analogous to a site or email that wants you to download and execute a malicious binary on your computer. Given the number of smart people pouring into this space, I would expect this area to mature rapidly. Also, as the community grows, the robustness of the peer review system will improve.

The Brutality of Git

I love Git. Live by it. Swear by it. Interact with it all the damn time. I mostly love it.

Sometimes it stabs me in the face though…while on master:

# Grab the latest
git pull

# Deploy a branch to heroku: 
git push --force staging feature/the_awesome_sauce:master

Then boom, the shank to the face:

feature/the_awesome_sauce is not deployed like I think

Instead, an old commit from feature/the_awesome_sauce is deployed, arrrrr…

They be different:

the_project$ git show-ref feature/the_awesome_sauce
6c9900faca9g2d758bf6g00a7418af37b315f3a7 refs/heads/feature/the_awesome_sauce
ecc4c76220c5d399c0e1d53e950e456a4329ad56 refs/remotes/origin/feature/the_awesome_sauce

Don’t get stabbed in the face. Know the difference between remote and local branches.

Use git push --force staging origin/feature/the_awesome_sauce:master next time.
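A small guard can catch this before the push. The following is a minimal sketch, not something from my actual workflow: branch_in_sync is a hypothetical helper that compares the local branch against its remote-tracking counterpart using git rev-parse, and it assumes the remote is called origin unless told otherwise.

```shell
#!/usr/bin/env bash
# branch_in_sync is a hypothetical helper: it succeeds only when the local
# branch and its remote-tracking branch point at the same commit.
# Usage: branch_in_sync <branch> [remote]   (remote defaults to "origin")

branch_in_sync() {
  local branch="$1" remote="${2:-origin}"
  local local_sha remote_sha

  # Resolve the local branch; return 2 if it does not exist.
  local_sha=$(git rev-parse --verify --quiet "refs/heads/${branch}") || {
    echo "no local branch ${branch}" >&2
    return 2
  }
  # Resolve the remote-tracking branch as of the last fetch; return 2 if missing.
  remote_sha=$(git rev-parse --verify --quiet "refs/remotes/${remote}/${branch}") || {
    echo "no remote-tracking branch ${remote}/${branch}" >&2
    return 2
  }

  if [ "$local_sha" != "$remote_sha" ]; then
    echo "local ${branch} (${local_sha}) differs from ${remote}/${branch} (${remote_sha})" >&2
    return 1
  fi
}
```

Something like `branch_in_sync feature/the_awesome_sauce && git push --force staging feature/the_awesome_sauce:master` then refuses to deploy a stale local branch. Note that the remote-tracking ref is only as fresh as the last fetch or pull.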

Controlling a Spark Core Remote Control Car With a Bash Script

I just wrote a bash script to control a Spark Core powered remote control car over the internet. Hit “f” to go forward, hit “b” to go backward, etc. Kinda sweet.

I grabbed the RC Car Spark Core firmware code, compiled it + flashed it to the core over USB by following these instructions. Essentially, I replaced the src/application.cpp with the code from the site and ran make clean dependents all, then used dfu-util to flash it.

I wrote up this script to make interacting with this simple Spark Core program a little less cumbersome, to use at the Tinkerer’s Ball at the Science Museum this Thursday, and also for folks who are looking for something minimal to wrap up curl commands that they are throwing at the Spark Cloud API to talk to their Spark Core firmware program.

Here’s a short vid of me getting jazzed about this with my nasty mustache (This is a hangover from Spark’s wondrous partaking in Movember. Someone kill that thing! It’s 4 days into December man!)

The idea of this bash script is to set SPARK_CORE_DEVICE_ID and SPARK_CORE_ACCESS_TOKEN, then run a pre-defined function that loops for input and translates that into Spark Cloud API curl commands. Please fork or comment if applicable.
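For readers who just want the shape of it, here is a rough sketch of that idea rather than the script itself. Several things are assumptions to check against your own firmware and the Spark Cloud API docs: the cloud function name rccar, the rc,FORWARD style arguments, the params form field, the api.spark.io base URL, and every key other than “f” and “b”.

```shell
#!/usr/bin/env bash
# Sketch of a keypress-to-curl loop for a Spark Core RC car.
# Assumptions (not confirmed by this post): function name "rccar", the
# "rc,..." argument strings, the "params" field, and the API base URL.

SPARK_API_BASE="${SPARK_API_BASE:-https://api.spark.io/v1/devices}"

# Map a single key to the argument string the firmware is assumed to expect.
key_to_command() {
  case "$1" in
    f) echo "rc,FORWARD" ;;
    b) echo "rc,BACK" ;;
    l) echo "rc,LEFT" ;;
    r) echo "rc,RIGHT" ;;
    s) echo "rc,STOP" ;;
    *) return 1 ;;
  esac
}

# POST one command to the core's cloud function; aborts if credentials are unset.
send_command() {
  curl --silent --show-error \
    "${SPARK_API_BASE}/${SPARK_CORE_DEVICE_ID:?set SPARK_CORE_DEVICE_ID}/rccar" \
    -d "access_token=${SPARK_CORE_ACCESS_TOKEN:?set SPARK_CORE_ACCESS_TOKEN}" \
    -d "params=$1"
}

# Read one key at a time (bash-specific read flags) until "q" is pressed.
drive_loop() {
  local key cmd
  echo "f/b/l/r to drive, s to stop, q to quit"
  while read -r -s -n 1 key; do
    [ "$key" = "q" ] && break
    if cmd=$(key_to_command "$key"); then
      send_command "$cmd"
      echo
    fi
  done
}
```

To try it, export SPARK_CORE_DEVICE_ID and SPARK_CORE_ACCESS_TOKEN, source the file, and run drive_loop. Unrecognized keys are ignored, so a stray keypress never reaches the car.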

Hello Octopress (Goodbye Blogger)

This here blog is the new location of http://thenerdings.blogspot.com…

Octopress and Jekyll seem awesome! I’m jazzed to be able to write in my tricked-out text editor, use git + github.io/github-pages to deploy, and to develop blog posts the way I develop code. It jives with how I roll. For a dude who gets huffy-puffy when the pointy-clicky things fight me, a blogging platform made for hackers seems like a wonderful thing…here we go.