Declarative primitives or mkdir -p for the cloud

After some positive feedback on the concept of declarative primitives, I would like to elaborate on it.

Defining declarative primitives

“Declarative primitives” is just a name for existing techniques. I gave them a name because I’m not aware of any other term describing these techniques. The idea behind the declarative approach is to describe the desired state or result, not the particular commands or operations needed to achieve it.

Example: mkdir -p dir1/dir2/dir3

The outcome of the command does not depend on the current state (whether the directories already exist or not). You describe the desired state: directories dir1, dir2 and dir3 should exist after the command is run. Note that mkdir dir1/dir2/dir3 does not have the same effect: it fails if dir1 or dir2 does not exist, or if dir3 already exists.
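The difference between the two commands can be sketched in Python, where os.makedirs(path, exist_ok=True) plays the role of mkdir -p and os.mkdir plays the role of plain mkdir. The ensure_dirs name is only an illustration and the paths live in a temporary directory.

```python
import os
import tempfile


def ensure_dirs(path: str) -> None:
    """Declarative: after this call the whole directory chain exists."""
    os.makedirs(path, exist_ok=True)  # behaves like `mkdir -p`


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "dir1", "dir2", "dir3")

    # Imperative: like plain `mkdir`, this fails because dir1 and dir2
    # do not exist yet.
    try:
        os.mkdir(target)
        plain_mkdir_failed = False
    except FileNotFoundError:
        plain_mkdir_failed = True

    ensure_dirs(target)  # creates dir1, dir2 and dir3
    ensure_dirs(target)  # running it again is a no-op, not an error
    target_exists = os.path.isdir(target)
```

The caller of ensure_dirs never checks the current state; that is what makes it safe to run repeatedly from a script.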

The phrase declarative primitives emphasizes granularity. Existing declarative tools for the cloud operate on many described resources, build dependency graphs and run operations in an order that they decide. Declarative primitives provide a very flexible way to control a single resource or a small group of resources of the same type. The flexibility comes from granularity. You decide how you combine the resources. You can easily integrate existing resources. You can modify just the properties you care about on the resources you choose. This approach is ideal for scripting in my opinion.

Where are declarative primitives for the cloud?

I believe that when writing a script, using mkdir -p should be similar to using AwsElb(...).converge() for example. I’m working on implementing it (as a library for the Next Generation Shell) and I’m not aware of any other project that does it.
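To make the idea concrete, here is a minimal sketch in Python (not NGS, and not the actual library) of what a declarative primitive for a single resource could look like. Everything in it is hypothetical: the LoadBalancer class, the FAKE_CLOUD dictionary that stands in for a real cloud API, and the return values of converge().

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-in for a cloud API. A real primitive would
# query and modify the provider's resources instead.
FAKE_CLOUD = {}


@dataclass
class LoadBalancer:
    """Hypothetical declarative primitive that controls one resource."""
    name: str
    desired: dict = field(default_factory=dict)

    def converge(self) -> str:
        """Make the resource match `desired` and report what was done."""
        current = FAKE_CLOUD.get(self.name)
        if current is None:
            FAKE_CLOUD[self.name] = dict(self.desired)
            return "created"
        changed = {k: v for k, v in self.desired.items() if current.get(k) != v}
        if not changed:
            return "unchanged"
        current.update(changed)  # touch only the properties that differ
        return "updated"


lb = LoadBalancer("prod-myservice", {"port": 443, "protocol": "TCP"})
first = lb.converge()    # resource does not exist yet
second = lb.converge()   # already in the desired state
lb.desired["port"] = 8443
third = lb.converge()    # one property differs
```

As with mkdir -p, the script states the desired result and can call converge() at any point in its own control flow, without a dependency graph deciding the order.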

There are many projects for managing the cloud, how are they different?

Here are the solutions that I’m aware of and how familiar I am with each one:

  1. CloudFormation – using frequently (I prefer YAML syntax for it)
  2. Terraform – I’ve read the documentation and bits of source code
  3. Cloudify – familiar with the product, made modules for it
  4. Puppet – used it intensively on a few different projects
  5. Chef – used it intensively on many projects
  6. Ansible – unfamiliar with this one (only took a look at documentation) so not reviewing it below
  • All take the declarative approach. You describe many resources or the entire system and feed the description to the tool which in turn does all the work. None of these solutions was designed to provide you with the primitives that could be easily used in your scripts. These tools just don’t match my view regarding scripting.
  • These tools can do a rollback on error for example. They can do that precisely because they have the description of the entire system or big parts of it. It will take some additional work to implement rolling back using declarative primitives. The question is whether you need the rollback functionality …
  • Some of these tools can be made to work with different clouds relatively easily. Working with different clouds easily may also be possible with declarative primitives, but the library I’m currently working on does not have such a goal.
  • Except for Chef, the tools in the list above use formats or DSLs not based on real programming languages. [Update 2019-05-26: This means that except for trivial cases you will be using some additional tool to generate the descriptions of desired states. (Practice proved me wrong, this means convoluted/unclear definitions sometimes)] Limited DSLs do not work. See Puppet and Ansible [Update 2019-05-26: with release 0.12, Terraform too]: they started with simple description languages that have since grown into almost real programming languages … which were never designed as programming languages, and that has consequences.
  • I’m not aware of any option in the tools above that lets you view definitions of existing resources. This prevents you from bringing existing resources under management with these tools and from cloning existing resources. I have started implementing functionality that generates the script that would build an existing resource: SomeResource(...).code() . This will allow easy modification or cloning.
  • A feature missing both from these tools and from my library is generating starter code for a given resource type (say a security group or a load balancer). Writing a CloudFormation definition for a type with many properties is a nightmare. Nobody should start from scratch. Apache or Nginx configuration files are good examples of starting points. Something similar should be done for cloud resources.
  • Note that Chef and Puppet were originally designed to manage servers. I don’t have any experience using them for managing the cloud but I can guess it would be less optimal than dedicated tools (the first three tools).
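The SomeResource(...).code() idea from the list above can be illustrated with a small Python sketch. It is only a guess at the general shape: the resource_code helper, the LoadBalancer type name and the property names are hypothetical, and the properties are hard-coded here rather than read back from a cloud.

```python
def resource_code(type_name: str, name: str, properties: dict) -> str:
    """Return script text that would rebuild the described resource.

    Hypothetical helper: the generated `Type(...).converge()` call mirrors
    the call style used in this post, not a real API.
    """
    lines = [f"{type_name}(", f"    {name!r},", "    {"]
    for key in sorted(properties):  # stable order gives diff-friendly output
        lines.append(f"        {key!r}: {properties[key]!r},")
    lines.append("    }")
    lines.append(").converge()")
    return "\n".join(lines)


# As if these properties had been read from an existing resource.
existing = {"protocol": "TCP", "port": 443}
script = resource_code("LoadBalancer", "prod-myservice", existing)
print(script)
```

Cloning then amounts to editing the generated text, for example changing the name, and running it.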

Scripting the cloud – time to do it right!

Why CloudFormation is better than Chef and Puppet

Strange comparison, I know.

Scripting vs declarative approaches

The aspect I’m looking at is scripting (aka imperative programming) vs the declarative approach. In many situations I choose the scripting approach over the declarative one because, in the situations I deal with, the downsides of the declarative approach outweigh its benefits.

Declarative approach downsides

Downsides of Chef, Puppet and other declarative systems? Main downsides are complexity and more external dependencies. These lead to:

  1. Fragility
  2. More maintenance
  3. More setup for anything except for the trivial cases

I can’t stress enough the price of complexity.

Declarative approach advantages

When the imperative approach would mean too much work, the declarative approach has the advantage. Think of SQL statements. It would be an enormous amount of work to code the equivalent logic by hand each time. Let’s summarize:

  1. Concise and meaningful code
  2. Much work done by small amount of code
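The SQL point can be illustrated with a small Python sketch that uses the standard sqlite3 module. The table and column names are made up for the example. The hand-written loop spells out how to compute the result, while the query only states what the result should be.

```python
import sqlite3

rows = [("web", "prod"), ("db", "prod"), ("web", "test")]

# Imperative: spell out how to count the prod instances per role.
counts_by_hand = {}
for role, env in rows:
    if env == "prod":
        counts_by_hand[role] = counts_by_hand.get(role, 0) + 1

# Declarative: describe the result and let the database work out the steps.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instances (role TEXT, env TEXT)")
conn.executemany("INSERT INTO instances VALUES (?, ?)", rows)
counts_by_sql = dict(conn.execute(
    "SELECT role, COUNT(*) FROM instances WHERE env = 'prod' GROUP BY role"
))
conn.close()
```

For a query this small the loop is still manageable; the gap grows quickly once joins, sorting and indexes are involved.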

Value of tools

I evaluate tools by their TCO (total cost of ownership).

Example 1: making sure a file has specific content. It could be as simple as echo my_content > my_file in a script or it could be as complex as installing Chef/Puppet/Your-cool-tool-du-jour server and so on…

Example 2: making sure that a specific load balancer is set up (AWS ELB). It could be writing a script that uses the AWS CLI or using declarative tools such as CloudFormation or Terraform (haven’t used Terraform myself yet). Writing a script to idempotently configure security groups and the load balancer and its properties is much more work than echo ... from the previous example.
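To give a feel for the extra work in example 2, here is a Python sketch of just one piece of an idempotent script: working out which security group rules to add and which to remove. The plan_rule_changes helper and the (protocol, port, CIDR) rule format are assumptions made for the illustration; a real script would also have to fetch the current rules and apply the changes through the cloud API.

```python
def plan_rule_changes(current, desired):
    """Return (to_add, to_remove) that turn `current` rules into `desired`."""
    current, desired = set(current), set(desired)
    return sorted(desired - current), sorted(current - desired)


# Rules are (protocol, port, cidr) tuples in this sketch.
current_rules = [("tcp", 22, "0.0.0.0/0"), ("tcp", 443, "0.0.0.0/0")]
desired_rules = [("tcp", 443, "0.0.0.0/0"), ("tcp", 80, "10.0.0.0/8")]

to_add, to_remove = plan_rule_changes(current_rules, desired_rules)

# Once the desired state is reached, a second run has nothing left to do.
add_again, remove_again = plan_rule_changes(desired_rules, desired_rules)
```

The same compare-then-act logic has to be repeated for listeners, health checks, attached instances and so on, which is where the effort adds up.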

While the TCO greatly depends on your specific situation, I argue that the tools that reduce larger amounts of work, such as in example 2, are more likely to have better TCO in general than tools from example 1.

“… but Chef can manage AWS too, you know?”

Yes, I know… and I don’t like this solution. I would like to manage AWS from my laptop or from a dedicated management machine, not from wherever the Chef client runs. Also, (oh no!) I don’t currently use Chef, and bringing it in just for managing AWS does not seem like a good idea.

Same for managing AWS with Puppet.

Summary

Declarative tools will always bring complexity and it’s a huge minus. The more complex the tool the more work it requires to operate. Make sure the amount of work saved is greater than the amount of work your declarative tool requires to operate.

Opinion: we can do better

I like scripting solutions for their relative simplicity (when scripts are written professionally). I suggest a combined approach. Let’s call it “declarative primitives”.

Imagine a scripting library that provides primitives such as AwsElb, AwsInstance and AwsSecGroup. Using these primitives does not force you to give up flow control. No dependency graphs. You are still writing a script. Minimal complexity increase over regular scripting.

Such a library is under development. An additional advantage of this library is that the whole state will be kept in the tags of the resources. Other solutions have additional state files and I don’t like that.
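Keeping state in tags means that a script can find "its" resources by querying tags instead of reading a state file. The Python sketch below shows only that lookup, under stated assumptions: find_by_tags is a hypothetical helper and the inventory list stands in for whatever a real cloud API would return.

```python
def find_by_tags(resources, wanted):
    """Return the resources whose tags include every pair in `wanted`."""
    return [
        r for r in resources
        if all(r["tags"].get(k) == v for k, v in wanted.items())
    ]


# Stand-in for the list of instances a cloud API would return.
inventory = [
    {"id": "i-1", "tags": {"env": "prod", "role": "myservice"}},
    {"id": "i-2", "tags": {"env": "prod", "role": "db"}},
    {"id": "i-3", "tags": {"env": "test", "role": "myservice"}},
]

matches = find_by_tags(inventory, {"env": "prod", "role": "myservice"})
ids = [r["id"] for r in matches]
```

Because the tags travel with the resources, there is no separate state file to store, lock or keep in sync.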

Sample (censored) code in the NGS language that uses the library follows:

# Tag that CloudFormation puts on the resources of the 'my-vpc' stack
my_vpc_ancor = {'aws:cloudformation:stack-name': 'my-vpc'}

elb = AwsElb(
    "${ENV.ENV}-myservice",
    {
        'tags': %{
            env ${ENV.ENV}
            role myservice-elb
        },
        'listeners': [
            %{
                Protocol TCP
                LoadBalancerPort 443
                InstanceProtocol TCP
                InstancePort 443
            }.n()
        ]
        'subnets': AwsSubnet(my_vpc_ancor).expect(2)
        'health-check': %{
            UnhealthyThreshold 5
            Timeout 5
            HealthyThreshold 3
            Interval 10
            Target 'SSL:443'
        }.n()
        'instances': AwsInstance({'env': ENV.ENV, 'role': 'myservice'}).expect()
    }
)

elb.converge()

It creates a load balancer in an already existing VPC (which was created by CloudFormation) and connects existing instances to it. The example is not complete, as the library is a work in progress, but it does work.


Have fun and watch your TCO!