cwebber.net

Random thoughts about life and systems administration

Why I Don’t Blog Much Anymore…


Writing about tech was always time consuming, and it was not always easy to get into the right frame of mind for it. But at least when I did it, it was concrete and I could expose my assumptions. I believe using Habitat to do something can be explained, and I can even wrap the argument in assumptions. Docker this, Sensu that, a little orchestration to solve this problem.

Contrast that with management. Am I a good manager? I think so? I hope so? My boss would have said something if I was sucking, right? Are the things I am doing just side effects or are they actually doing what I expect? There really is no way to be sure. This makes it incredibly hard to write about… I can tell you the books I have read, and I can even, on occasion, explain why I feel the way I do. But to put down concretely that I am doing it right seems so artificial and likely a lie.

The things I know for sure:

  • I have an awesome team.
  • I have an awesome boss.
  • There is more experience to gain, no matter how many books I read.

So the question becomes… How do I share? How do we share? How do we grow together as managers the same way we have grown together as technologists? Going a step further, how do we prepare each other for the types of things we are going to encounter?

Databases, Pipelines, and Failures


One of the more interesting questions that comes up for me when building new pipelines is how to handle moving already running services and how things play out when disaster recovery is in play. The hardest part has been framing the question. What are we trying to achieve and how does that affect the other things we are trying to achieve? My starting point has been around this:

  • What is the “right” behavior in normal operations?
  • What is the “right” behavior in disaster recovery or new pipeline build out?

All of this really gets interesting when discussing a service that is heavily database dependent, like WordPress. In an ideal world, each deploy into acceptance, union, and rehearsal would restore either a replica of exactly what is in delivered/production right now or what was there fairly recently (think last night’s backup). This is actually super easy to achieve. Assuming that the data we need is accessible, we can build that into the pipeline.

Except, what happens when the production system isn’t there for reference? When acceptance goes to be deployed before we get to delivered how do we get valid data in the database? I really see two options:

  • Fail the pipeline and force manual intervention.
  • Initialize the database with an empty set of data.

When we deploy most applications, at this point, the second option is quite normal. When you are building out for the first time it makes a tremendous amount of sense. Really, this question probably doesn’t even cross your mind because there is no production data to start with. On the flip side, when you look at WordPress and many enterprise applications, much of the config of the application is actually stored in the database. Not to mention, when moving an already existing application, your concerns are usually a bit different than when you are building a new application.

The benefit of forcing the fail is that it ensures that you don’t accidentally deploy without a working application. Whether it be settings or content, WordPress without a restore of the data isn’t very useful if it has been used much already. The problem is that this is hugely disruptive. The first time you spin the pipeline all the way through, you end up needing to fix four database installs (acceptance, union, rehearsal, delivered).

The place I have landed is asking a few more questions:

  • What is the overall importance of this application?
  • Does this fit into the dependency chain in a way that it is going to block other applications if it doesn’t deploy?
  • What impact does this have on the system as a whole?
  • Rephrasing the above question, if we have bad data, are other systems going to do the wrong thing?
  • If other systems do the wrong thing, what could the impact be?

The place that this brings me to is really around the nature of the service and the larger implications. Today I am moving around a WordPress instance. That instance happens to support the blog and our events page. If it is up with the wrong data during a disaster recovery situation, it really isn’t going to hurt anything. It is important, but to have a few pages 404 for a day or two is not the end of the world. Turning that around, it is in the dependency chain for the main website, which means that in order for us to get the site deployed the app needs to work. There is much more stress created if we can’t deploy www because of a missing WordPress backup.

The hypothetical enterprise system also helps justify this position. If instead of a WordPress instance we were talking about an internal payroll system, it may actually make a ton of sense to fail the pipeline each time. We really do want to ensure that data is correct because other systems do act on it. In a time of crisis you don’t want your benefits system to send all of your employees a notification that they are now eligible for COBRA and that their health coverage ends in a few days.
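To make the trade-off concrete, here is a rough sketch of that decision in plain Ruby. Everything in it is hypothetical: the method name, the keyword arguments, and the symbols are made up for illustration and are not part of Delivery or any build cookbook. It only encodes the reasoning above: restore when data is available, fail when other systems act on the data, and otherwise fall back to an empty database.

```ruby
# Hypothetical sketch: choose how a pipeline stage seeds its database.
# None of these names come from Delivery; they are illustrative only.

# backup_available - true when production data (or a recent backup) can be reached
# acts_on_data     - true when other systems act on this data (think payroll)
def seed_strategy(backup_available:, acts_on_data:)
  # Normal operations: restore a copy of what is in delivered/production.
  return :restore if backup_available

  # No data to restore. If other systems would do the wrong thing with
  # bad data, stop and force manual intervention.
  return :fail if acts_on_data

  # Otherwise an empty database is acceptable, and the deploy keeps moving.
  :empty
end

# A blog in the dependency chain of the main site, with no backup at hand:
puts seed_strategy(backup_available: false, acts_on_data: false)

# An internal payroll system in the same situation:
puts seed_strategy(backup_available: false, acts_on_data: true)
```

The two inputs are deliberately coarse. In practice the answer to "do other systems act on this data?" is the judgment call the questions above are meant to draw out.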

SysAdvent Stats


November is a hectic time of year for me, and a few years back I made it even crazier by volunteering to take the reins of SysAdvent, the annual advent calendar for SysAdmins, Ops, DevOps, and all the other folks that are excited about systems. As I have been working to arrange some sponsorship stuff, I got asked a super reasonable question about stats. So instead of replying in email I thought I would share the lifetime stats from the Blogger console.

[Screenshots from the Blogger console: Stats > Overview, Stats > Posts, Stats > Traffic Sources, Stats > Audience]

Of all the data presented here, it is crazy to see that Day 1 – Docker in Production: Reality, Not Hype, an article from 2014, is in the top 3 articles even though it is less than a year old. It is super exciting to see the growth of SysAdvent and exciting to see how many folks visit the site each day.

I can’t wait for this year’s SysAdvent.

Delivering My New Reality


Back in late December/early January, I took on a new role as the Engineering Lead of Corporate Infrastructure and Applications at Chef. As I started to understand my new role, one of the things I set out to do was figure out what our standard stack and deploy patterns were going to be. As fate would have it, in early March I got a ping from one of the folks in marketing about a crazy idea they had for the ChefConf keynote.

With three and a half weeks left before ChefConf, he said, “we want to deliver the announcement about Delivery on stage at ChefConf using Delivery.” Within about two weeks I had built a new pipeline, using Delivery, that would get our corporate website, https://www.chef.io, deployed. As an aside, the decision to move to Fastly as part of this was probably almost as important as moving to Delivery itself.

For those that missed it… Seth Falcon and James Casey actually did deliver https://www.chef.io to the world that day using Delivery, live and on stage. Never in my life have a few minutes felt so long.

I am now a convert. The idea of managing an environment operationally, without Delivery or something very close to it, makes me shudder. It has fundamentally changed the way I think about what I do as a professional. The key distinguishing fact, for me anyway, is that we use Delivery to deliver services, not just cookbooks, not just application code, but services.

At this point, my team is already delivering lots of things with Delivery:

The crazy part is that there are so many reasons why I am in love with Delivery that I can’t actually explain them in one post. But my plan is to start writing about each of them. The following is a list of things I plan on writing posts about:

  • The standard way of shipping things: Making changes is the same, no matter what app you are in.
  • The artifact is king: Shipping a promotable artifact really is a HUGE win.
  • The shape of the pipeline: Where the phases and stages are makes a huge difference in how I think now.
  • Yes, you really do need four separate environments.
  • A down and dirty intro to build cookbooks.
  • It can’t be all rainbows and unicorns… A look at some of the pain points and gotchas.

I really am super excited about Delivery and not just because I work at Chef. Hopefully even if you aren’t using Delivery itself, some of the things I have learned about CD will help you as well.

Stubbing encrypted_data_bag_for_environment


We use the encrypted_data_bag_for_environment method from the chef-sugar library pretty heavily in the cookbooks my team uses. With that said, I can never seem to remember the right invocation of stubs and mocks to be able to test recipes that have those calls in them. Since I don’t want to have to come up with this again, here is an example of how I did it this morning. In this example, I am stubbing a data bag item that has creds for using DataDog.

require 'spec_helper'

describe 'cia_infra::base' do
  let(:chef_run) do
    stub_command('which sudo').and_return('/usr/bin/sudo')
    # Point the secret file at /dev/null. Presumably the secret then reads
    # back as an empty string, which is why the stub below expects ''.
    Chef::Config['encrypted_data_bag_secret'] = '/dev/null'
    @runner = ChefSpec::ServerRunner.new
    @runner.converge(described_recipe)
  end

  before do
    # Stub the underlying load call and hand back a plain hash in place of
    # the decrypted item, keyed by environment name.
    allow(Chef::EncryptedDataBagItem).to receive(:load).with('cia-creds', 'datadog', '').and_return({
      'default' => {
        'api_key' => 'datadog_api_key',
        'application_key' => 'datadog_application_key',
      }
    })
  end

  # The base recipe should pull in each of these recipes.
  it { expect(chef_run).to include_recipe('apt') }
  it { expect(chef_run).to include_recipe('chef-client') }
  it { expect(chef_run).to include_recipe('chef-client::delete_validation') }
  it { expect(chef_run).to include_recipe('ntp') }
  it { expect(chef_run).to include_recipe('push-jobs') }
  it { expect(chef_run).to include_recipe('oc-users') }
  it { expect(chef_run).to include_recipe('cia_infra::monitoring') }
end

It is entirely possible that I am doing this wrong, but hey, it works. I would love to hear better approaches if there are any.
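For anyone wondering why the stubbed hash is keyed by 'default', here is a plain-Ruby sketch of the lookup as I understand it. This is an assumption worth checking against the chef-sugar source: the helper appears to look for a key matching the node's environment and fall back to 'default' when there isn't one. The `item_for_environment` method below is a hypothetical stand-in, not chef-sugar's actual code, and it needs neither Chef nor ChefSpec to run.

```ruby
# Plain-Ruby sketch of an environment-keyed lookup with a 'default' fallback.
# `item_for_environment` is a hypothetical stand-in, not chef-sugar's code.
def item_for_environment(item, environment)
  item[environment] || item['default']
end

# Same shape as the hash returned by the stub in the spec.
stubbed_item = {
  'default' => {
    'api_key' => 'datadog_api_key',
    'application_key' => 'datadog_application_key'
  }
}

# There is no '_default' key in the hash, so the lookup falls back to 'default'.
creds = item_for_environment(stubbed_item, '_default')
puts creds['api_key']
```

If that understanding holds, a single 'default' key in the stub is enough for any environment the spec converges in.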

Adventures in PTO


For anyone that was watching Twitter last week, you probably noticed that all of Chef Software headed to Seattle for Rally Week. You may have also noticed that I wasn’t there. I was on PTO. It is a culture that allows this sort of thing that has really given me confidence in the “Unlimited” PTO model and really has me thinking about all the negatives folks have pointed out.

Is it even PTO?

At a very fundamental level, it really seems as if the reason there is even a need for unlimited PTO policies is because we have forgotten what it means to be salaried employees. I have worked the spectrum… At one place of employment, if you worked at all that day, it was technically a work day and could not be considered PTO. On the flip side, I have worked for employers where it was hard to “leave early” at 4:00 pm (even after starting my day close to 6:00 am) and if I took off before 3:00 pm it was expected that I put a half day off in the time tracking system. The fact that it wasn’t out of the normal to do 10 and 12 hour days had no impact on whether I “took PTO.” Contrast those with Chef where it is tracked to the point that I put an entry on the calendar if I take the entire day off, otherwise, it is between my manager and me.

Growing up, the way I understood it, being a salaried employee meant that most weeks you would work more than 40 hrs, but it also wouldn’t be ridiculous to work 30 hrs in a week, because you were expected to get your job done rather than to punch the time clock. To that point, my dad used to point out the lawsuits where folks that were salaried tracked their time, were judged against it, and won tons of back pay. Being salaried meant that I needed to show up when it made sense for my job and leave when it made sense as well. If I took a long lunch, who cares?

The point being, if I take a day to go volunteer, but am still available to answer questions, is it really PTO? I think in many ways the unlimited PTO policies are meant to address this more than anything. If you need to go do some volunteer work to help make you happier, go do that. If taking the morning off to get your oil changed and run a few errands allows you to focus in the afternoon, you should probably go do that too. If you have just had a morning that sucked and you really just want to go surf to take your mind off of work, why are you still in the office?

Vacation

So if taking a day, a morning, or an afternoon here and there isn’t PTO, what is? For me, it is vacations. For some that means stay-cations, for others it means travel. Whatever it is, it is a chance to disconnect and not worry about being available at all. (I don’t tend to disconnect fully, but that is a discussion for another time.) It is this sort of PTO that I think most folks struggle with.

Travis CI, a company based in Germany, recently discussed a “minimum” PTO policy that would require folks to take a minimum of five weeks. To put that in perspective, I have never had more than three weeks a year, and at one job there was compulsory vacation that had to be taken during the holidays. (Yes, you could go without pay, but you were expected to take PTO.) Additionally, there are usually caps on how much vacation you can accrue, usually around 300 hrs. Notice the other word in that statement, accrue. You don’t just start with those three weeks, you have to earn and accrue them. So if after you have been with the company a month you want to take a week off for a family vacation, you are out of luck. What I am getting at is that we, as Americans, tend to not be very good at taking vacation.

We struggle to take this time off because there is always so much to be done. One of the best lessons I ever learned was that the work wasn’t going away. If I took off a week, the work would still be there for me when I got back. And if that specific work needed to be done before I left, there was always work that could wait. That isn’t to say I don’t work hard and try to get lots and lots done, it is just a realization that the work will fill the time allotted, so it is up to me to limit that time. Much of what makes unlimited PTO hard is that because you didn’t “earn” it, you don’t always see it as yours to take. Frankly, if you are working your hardest, you have earned it.

One of the smartest, stupid decisions I ever made was buying into a timeshare. While you can argue the financial outcome of owning a timeshare, what I will argue is that the impact on my mental health has been well worth any financial benefit or loss involved. The reason it has been beneficial is that it forces the issue. I take vacation because I have already paid for it. As much as there is something relaxing about hacking on stuff while sitting on the balcony overlooking Los Archos in Cabo San Lucas, I feel guilty that I am not taking advantage of the amazing place I am visiting.

Managers make all the difference

Ok, so company culture plays a huge role too, but managers really are at the heart of making unlimited PTO work. The best way to explain this is actually an experience recently… A few months back I went to Disneyland in the afternoon after my daughter was done with school to catch up with my parents and my cousin who was in town from Wisconsin. When my mom asked about work, I replied, “well, I knew that if my boss caught wind that I stayed at work and gave up this rare chance to catch up with family, I would catch shit, so I got in the car and came.” At Chef, I have regularly been encouraged to take the time I need. When the question of going to the all hands week vs going on vacation with family during a week we travel almost every year came up, I honestly had a harder time with the idea of me missing it than my boss did. At the end of the day, not only has my manager, but all of my coworkers as well, reminded me that family comes first.

As I take on a management role, I will strive to build that same sort of culture into my team. PTO is important. Family is important. Above all, being happy is important. So if that means taking a day here and there to go volunteer, or go surf, take it. If what makes you happy is taking off for a few hours on Tuesday and Thursday afternoons to go teach a class, how can I support you in doing that? Additionally, that means setting a good example. I really do see it as my job as a manager to take those days on occasion and to go on vacation. In fact, my next vacation is already planned. Is yours?

Building a New Kitchen


Back in April of last year, I announced I was cooking up something new. I could not have imagined how amazing that “something new” really would be. After 10 months of being an employee at Chef, I can without a doubt say it is the best company I have ever worked for. The company culture is the most positive and encouraging environment I have ever worked in.

Not only that, but the position of Community Software Engineer was a perfect fit. It was a chance to level up my skills as a developer, learn and experience Chef in fun and exciting ways, and it was incredible to be able to spend time as part of my job focused on serving the community of amazing professionals I get the privilege of working with every day. It was also a chance to work for an amazing boss. Nathen really taught me how much trust really does matter, and what it means to ensure that your people are taking care of themselves.

But, as the title of this post and text have already alluded to, I am moving on from my position of Community Software Engineer. This doesn’t mean I will be doing any less in the way of the community, but it does mean that my responsibilities will be changing to be focused in a bit of a different direction.

I am super excited to be taking on my first leadership role and even more excited to be doing it at Chef. I am going to be the Engineering Lead of the newly formed Corporate Infrastructure and Applications team. This team will help to be the catalyst that speeds delight in many of the groups that fall outside of engineering.

As excited as I am about the new adventures that come with leading a team, I am even more excited for what my team has been tasked with. This team will focus on one of my favorite facets of IT… helping to improve the internal business processes. While over the last 10 months I have struggled to explain to my non-tech family and friends what I did as a Community Software Engineer, this new role is much easier to relate to. Sadly, we all have stories of, “Well, if IT just listened to what we need…” This team will work to actually listen and help build those things that are actually needed.

So… Here is to a wonderful start to a new year.

I Am Not a Coder


The last week at PuppetConf was an absolute blast. It was great to catch up with all the amazing folks doing amazing work. But, there was one thing that bothered me quite a bit. In many different contexts I heard people more or less say, “I can’t write Ruby, I’m not a coder.” Or, just as bad, “I use Puppet because I am not a coder.”

These sentiments bother me in so many ways that I could probably sit and rant for half the day. But really, after thinking about it quite a bit, there are a few key reasons why this idea just doesn’t sit well with me. I really think that if you are someone that says something similar to the above, you should rethink things.

The Technical

Puppet is why I know Ruby

I feel like an old man in saying this, but back in my day, we didn’t have this fancy facter.d stuff. We had to write our custom facts using Ruby and we liked it. Ok, so maybe the liking it part is a bit of a stretch, but I definitely liked the results. When I wrote those facts, I didn’t have any clue what a method was or how objects worked, I just knew that when I pasted the right things in and tweaked them a bit, I was able to get information I needed, and that was awesome.
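For anyone who never had to write one, a custom fact from that era looked roughly like the sketch below. The fact name and the hostname convention are made up for illustration. `Facter.add` with a `setcode` block is the standard way to register a fact, and the file would normally live under a module's `lib/facter/` directory. The registration is wrapped in a `defined?` check only so the file can also be run on its own without Facter installed.

```ruby
# Hypothetical custom fact: derive a datacenter name from the hostname.
# The fact name and the hostname convention are made up for illustration.
require 'socket'

# Pull the second dot-separated label out of a name like
# "web01.lax.example.com". Anything shorter is reported as 'unknown'.
def datacenter_from(hostname)
  parts = hostname.split('.')
  parts.length >= 3 ? parts[1] : 'unknown'
end

# Register the fact only when Facter is loaded, so this file also runs standalone.
if defined?(Facter)
  Facter.add(:datacenter) do
    setcode { datacenter_from(Socket.gethostname) }
  end
end
```

Keeping the logic in a plain method also means it can be poked at from irb without running Puppet at all.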

I don’t actually know Ruby

So, anyone that actually thinks I know Ruby has probably never actually looked at my code. While I am not scared to fire up irb or pry and copy pasta some code in to solve problems, the idea of a large scale Rails or Sinatra App does not sit well with me. I am absolutely a coder, but I am by no means much more than a Junior Ruby Developer when it comes to my Ruby skills. Like most good SysAdmins, I just happen to be good at Google.

Have I ever mentioned how much I dislike CPAN?

Whenever I went to go do anything in Perl, I found myself reaching for a module from CPAN. The CPAN topic is probably worth its own discussion, and, to be fair, has probably gotten a bit better since the last time I tackled this. But, as a result of the pain of CPAN and the nature of the environment I was working in at the time, installing new Perl modules wasn’t an option and even if it was, it wasn’t really feasible.

Enter Ruby. As a result of running Puppet everywhere, I had Ruby everywhere for free. Combine that with the fact that the Ruby Standard Library is kinda awesome, you get the ability to do some phenomenal things. Since most of my job involved ALL of the systems, I needed to be able to write scripts that would work everywhere in our infrastructure. With the power of Ruby’s Standard Library and the fact that the code was actually readable, even a week later in most cases, it made a great way to do systems stuff, everywhere.
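As a small illustration of what "works everywhere" can look like, here is a sketch of a script that uses nothing but the standard library, so it should run anywhere Ruby is already installed with no extra gems. The `host_summary` method and the fields it reports are arbitrary choices for the example, not anything from a real script of mine.

```ruby
# Sketch of a stdlib-only systems script: no gems required.
require 'etc'
require 'json'
require 'socket'

# Gather a few basic details about the local host.
def host_summary
  {
    'hostname' => Socket.gethostname,
    'user' => Etc.getlogin, # may be nil when there is no login session
    'ruby' => RUBY_VERSION,
    'platform' => RUBY_PLATFORM
  }
end

# Emit JSON so other tools can consume the result.
puts JSON.pretty_generate(host_summary)
```

Printing JSON rather than free text is just a convenience: the same script can feed a human at a terminal or another tool in a pipeline.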

Level up your Puppet game

The real power in Puppet can be found in writing custom functions and custom types and providers. It is the custom functions that let you actually move some of the crazy that usually gets done inside of templates, thus hidden from plain view, and move it back into the manifest where it can be seen in context with all the other things going on in the manifest. The custom types and providers give you the benefit of being able to manage ALL THE THINGS in Puppet. Do you want to manage DNS about systems inside of Puppet? How about adding systems to that new fancy monitoring SaaS? All of that gets done in custom types and providers.

The Squishy Stuff

You are already a coder

Are you writing Puppet Code? Are you using ERB? If so, you are already a coder. You are already dealing with making decisions about the APIs you present or don’t present. You are dealing with control structures and code organization. You are a coder.

Code is eating the world

So I am not going to actually dig into this except to say, code is becoming an integral part of our lives as Operations and Systems Administration Professionals. Whether it be with Chef, Puppet, Ansible or the next thing, being a coder is going to be at the center of the jobs that we do.

Future of HangOps


Back in August of 2012, Brandon Burton (@solarce) and Jordan Sissel (@jordansissel) started an amazing thing, HangOps. The idea behind HangOps was simple, get ops folks together for an hour or so a week to grab coffee remotely and talk shop. Between the number of us that were remote or ops teams of one, it was a great way to talk to others in the field without waiting for the one or two conferences a year most of us are able to attend.

Over the last two years it has grown into something amazing. We have had a chance to not only connect with the luminaries of our community but work together to really understand the successes and failures of our peers. It has been a place where you could come ask questions and get honest answers.

But, just as our systems change over time, so must HangOps. After two years of making HangOps a thing, Brandon has handed me the reins so he can focus on some new job responsibilities and his family. While he will still join us, more or less regularly, he has handed off the responsibility of continuing to make HangOps a thing. I am super grateful for the guidance that Brandon has provided and for getting this all started.

So, where to from here? First, let’s get back on a regular cadence. The next HangOps will be at our usual time, 11:00 AM PDT (18:00 UTC), on Friday, August 29th, 2014. Let’s talk monitoring and testing. Whether that ends up being a discussion of test-kitchen, chefspec, and rspec-puppet type things or more focused on sensu and nagios type things is up to the panel.

Thanks again to everyone that has been part of HangOps over the last two years. Let’s make it another awesome two years!

Joining Chef, the Hard Parts


Anybody that has spent any time with me recently has probably heard about how much I love working for Chef. I have been meaning to write a post about how great the transition to Chef has been for the last month, but just never seem to find the time. The people are amazing, the company is amazing and the product is amazing.

But, with any good job, there come new challenges. My new role at Chef has meant a lot more writing. I have probably written more in an official capacity since joining Chef than I did on my personal blog the entire time I was at Demand Media. The context switch is hard. Active voice is hard.

It has been much harder than I expected to get in the frame of mind to write at any length. I find that I frequently sit down to write a blog post or an email that needs to go out to the mailing list and am easily distracted by technical things. Even as I write this post, I am cmd-tabbing back and forth to a chef-client run (CCR). Finding that place of focus for me is wicked hard when there are so many fun distractions around.

To add to the adventure, through the amazing critique of my coworkers, I have realized I still write like I am in an academic program instead of doing business writing. What that means is that I tend to use 50 words to explain something when 10 words will do. To add to that, I am super guilty of using passive voice in my writing. I am thankful to @sethvargo, @jtimberman, and @btmspox for their willingness to provide honest and constructive feedback.

What I want to do right now is commit to writing daily or something akin to that, but I know that life over the next month will be too busy to even begin a regular writing exercise. So from now until vacation starts in two weeks, I am going to actively pay attention to when I am avoiding writing and work to do it more frequently. I truly believe that the only way to improve at this sort of thing is to do it more often.