The missing link of Ops tools

It’s like we went from horse to spaceship, skipping everything in between.

Background

Let’s say you are managing your system in AWS. Amazon provides you with API to do that. What are your options for consuming that API?

Option 1: CLI or library for API access

AWS CLI let’s us access the API from the command line and bash scripts. Python/Ruby/Node.js and other languages can access the API using appropriate libraries.

Option 2: Declarative tools

You declare how the system should look like, the tool figures out dependencies and performs any API calls that are needed to achieve the declared state.

Problem with using CLI or API libraries

Accessing API using CLI or libraries is fine for one off tasks. In many cases, automation is needed and we would like to prepare scripts. Ideally, these scripts would be idempotent (can be run multiple times, converging to the desired state and not ruining it). We then quickly discover how clunky these scripts are:

# Script "original"
if resource_a exists then
  if resource_a_property_p != desired_resource_a_property_p then
    set resource_a_property_p to desired_resource_a_property_p
  end
  if resource_a_property_q != desired_resource_a_property_q then
    ...
  end
else
  # resource_a does not exist
  create resource_a
  set resource_a_property_p to desired_resource_a_property_p
  ...
end
# more chunks like the above

It’s easy to see why you wouldn’t want to write and maintain a script such as above.

How the problem was solved

What happened next: jump to “Option 2”, declarative tools such as CloudFormation, Terraform, etc.

rocket-1374248_640

Other possible solution that never happened

If you have developed any code, you probably know what refactoring is: making the code more readable, deduplicate shared code, factoring out common patterns, etc… without changing the meaning of the code. The script above is an obvious candidate for refactoring, which would be improving “Option 1” (CLI or a library for API access) above, but that never happened.

All the ifs should have been moved to a library and the script could be transformed to something like this:

# Script "refactored"
create_or_update(resource_a, {
  property_p = desired_resource_a_property_p
  property_q = desired_resource_a_property_q
})
# more chunks like the above

One might say that the “refactored” script looks pretty much like input file of the declarative tools mentioned above. Yes, it does look similar; there is a huge difference though.

Declarative tools vs declarative primitives library

By “declarative primitives library” I mean a programming language library that provides idempotent functions to create/update/delete resources. In our cases these resource are VPCs, load balancers, security groups, instances, etc…

Differences of declarative tools vs declarative primitives library

  1. Declarative tools (at least some of them) do provide dependency resolution so they can sort out in which order the resources should be created/destroyed.
  2. Complexity. The complexity of mentioned tools can not be ignored; it’s much higher than one of  declarative primitives library. Complexity means bugs and higher maintenance costs. Complexity should be considered a negative factor when picking a tool.
  3. Some declarative tools track created resources so they can easily be destroyed, which is convenient. Note that on the other hand this brings more complexity to the tool as there must be yet another chunk of code to manage the state.
  4. Interacting with existing resources. Between awkward to impossible with declarative tools; easy with correctly built declarative primitives library. Example: delete all unused load balancers (unused means no attached instances): AWS::Elb().reject(X.Instances).delete()
  5. Control. Customizing behaviour of your script that uses declarative primitives library is straightforward. It’s possible but harder with declarative tools. Trivial if in a programming language can look like count = "${length(var.public_subnets) > 0 ? 1 : 0}" (approved Terraform VPC module).
  6. Ease of onboarding has declarative tools as a clear winner – you don’t have to program and don’t even need to know a programming language, but you can get stuck without knowing it:
  7. Getting stuck. If your declarative tool does not support a property or a resource that you need, you might need to learn a new programming language because the DSL used by your tool is not the programming language of the tool itself (Terraform, Puppet, Ansible). When using declarative primitives library on the other hand you can always either extend it when/if you wish (preferable) or make your own easy workaround.
  8. Having one central place where potentially all resources are described as text (Please! don’t call that code, format is not a code!). It should be easier done with declarative tools. In practice, I think it depends more on your processes and how you work.

As you can see, it’s not black and white, so I would expect both solutions be available so that we, Ops, could choose according to our use case and our skills.

My suggestion

I don’t only suggest to have something between a horse and a spaceship; I work on a car. As part of the Next Generation Shell (a shell and a programming language for ops tasks) I work on declarative primitives library. Right now it covers some parts of AWS. Please have a look. Ideally, join the project.

Next Generation Shell – https://github.com/ilyash/ngs

Feedback

Do you agree that the jump between API and declarative tools was too big? Do you think that the middle ground, declarative primitives approach, would be useful in some cases? Comment here or on Reddit.


Have a nice day!

Leave a comment