NGS unique features – Argv command line arguments builder

Background: what is NGS?

NGS LOGO

NGS, the Next Generation Shell, is a (work in progress) shell and a programming language built from the ground up for systems engineering tasks. You can think of it as bash that’s designed today: sane syntax, data structures, functional programming, extensibility, cloud in mind, declarative primitives.

What’s the problem with constructing command line arguments?

The problem affects only the more “advanced” cases of constructing command line arguments, where some arguments may or may not be present. Let’s consider this example:

# Made-up syntax, resembling NGS
args = []
if 'Subnets' in props {
  args += '--subnets'
  args += props['Subnets']
}
if ... {
  args += ...
}
if ... {
  args += ...
}
...
aws elb create-load-balancer ... $args

Wouldn’t it be cleaner to get rid of all the ifs? … and what happens if props['Subnets'] is an empty array?

How does the Argv facility in NGS solve the problem?

Argv is the result of factoring out the common code involved in constructing command line arguments. The ifs above were factored out too; they now live inside Argv.

Let’s look at a usage example (real NGS code, from the AWS library):

argv = Argv({
  '--load-balancer-name': rd.anchor.name
  '--listeners': props.ListenerDescriptions.encode_json()
  '--subnets': rd.opt_prop('Subnets', props).map(only(ResDef, ids))
})
rd.run('create ELB', %(aws elb create-load-balancer $*argv))

The important points here are:

  1. Argv is a function with a single parameter which must be of type Hash (also called a “dictionary” in some languages)
  2. The keys of the Hash are the switches’ names (--load-balancer-name, --listeners, --subnets)
  3. The values of the Hash are the values for the switches

The “if” that decides whether a switch is present in the resulting argv is inside the Argv implementation, so your code is clean of it. Argv looks at the values of the Hash when deciding whether a switch should be present: null, an empty array and instances of type EmptyBox are treated as missing values, and the switch is discarded. For convenience, instances of type FullBox are unboxed when constructing the result of Argv.
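
For illustration, here is a small sketch of how these rules play out (made-up values, not from the AWS library):

# Illustrative sketch, made-up values
argv = Argv({
  '--load-balancer-name': 'my-elb'
  '--subnets': []          # empty array - the switch is discarded
  '--listeners': null      # null - the switch is discarded
})
# argv is now ['--load-balancer-name', 'my-elb']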

The Argv facility is one more example of why NGS is a good fit for systems engineering tasks.


Have a nice weekend!


Why Next Generation Shell?

Background

I’m a systems engineer. The job that I’m doing is also called systems administrator, SRE, DevOps, production engineer, etc. I define my job as everything between a developer’s “It works on my machine” and real life. Common tasks are setting up and maintaining cloud-based infrastructure: networking, compute, databases and other services. Other common tasks are setting up, configuring and maintaining everything inside a VM: disks+mounts, packages, configuration files, users, services. Additional aspects include monitoring, logging and graphing.

The problem

If we take specifically systems engineering tasks such as running a VM instance in a cloud, installing and running programs on a server and modifying configuration files, typical scripting (when no special tools are used) is done in either bash or Python/Ruby/Perl/Go.

Bash

The advantage of bash is that bash is domain-specific. It is convenient for running external programs and manipulating files.

# Count lines in all *.c files in this directory and below
wc -l $(find . -name '*.c')

# Make sure my_file has the line my_content
echo my_content >my_file

# Run a process and capture the output
out=$(my_process)

The disadvantages of bash are the horrible syntax, pitfalls everywhere, awkward error handling, and the absence of many features one would expect from a programming language (such as data structures, named function parameters, etc.).

# Inconsistent, awkward syntax,
# result of keeping backwards compatibility
if something;    then ... fi
while something; do ... done

# Can remove / if MY_DIR is not defined
# unless in "set -u" mode
rm -rf "$MY_DIR/"

# Removes files "a" and "b" instead of "a b"
myfile="a b"
rm $myfile

# Silently ignores the error unless in "set -e" mode
my_script

# Function parameters can't be named, they are
# in $1, $2, ... or in $@ and $*
myfunc() {
  FILE="$1"
  OPTION_TO_ENABLE="$2"
  ...
}

“Leave bash alone, it was not intended for programming. Don’t do anything in bash, just use external programs for everything.”

What do you observe? Is it or is it not used as a programming language in real life?

General-Purpose programming languages

Python/Ruby/Perl/Go are general-purpose programming languages.

The advantage of general-purpose programming languages is their power, better syntax and the ability to handle arbitrary data structures.

orig = [1,2,3]
doubled = [x*2 for x in orig]

The disadvantage of general-purpose programming languages is that they are not, and cannot be, as convenient for systems engineering tasks, because they do not focus on this particular aspect of programming (in contrast to bash and other shells, for example).

# Write whole file - too verbose
f = open('myfile', 'w+')
f.write('mycontent')
f.close()

# Run a process and capture the output
# https://docs.python.org/3.5/library/subprocess.html
proc = subprocess.Popen(...)
try:
    outs, errs = proc.communicate(timeout=15)
except TimeoutExpired:
    proc.kill()
    outs, errs = proc.communicate()

Summary

My conclusion is that there is no handy language for systems engineering tasks. On one hand there is bash, which is domain-specific but is not a good programming language and does not cover today’s needs; on the other hand there are general-purpose programming languages, which do not specialize in these kinds of tasks.

You can use Puppet, Chef, Ansible, Terraform, CloudFormation, Capistrano and many other tools for common systems engineering tasks. What if your task is not covered by existing tools? Maybe it’s a one-off? Maybe a case where using one of the existing tools is not an optimal solution? You would like to write a script, right? In that case, your life sucks because scripting sucks. That’s because there is no convenient language and libraries to get systems engineering tasks done with minimal friction and effort.

Solution

I suggest creating a new programming language (with a shell) which is domain-specific, like bash, and which incorporates important features of general-purpose programming languages: data structures, exceptions, types, multiple dispatch.

My way of looking at it: imagine that bash was created today, taking into account today’s reality and things that became clear with time. Some of them are:

  • The shell is used as a programming language.
  • A system is usually a set of VMs and APIs, not a single machine.
  • Most APIs return JSON, so data structures are needed; multiple jq calls are not convenient.
  • Silently ignoring errors proved to be a bad strategy (hence the set -e switch, which tries to solve the problem).
  • Silently substituting undefined variables with empty strings proved to be a bad strategy (hence the set -u switch).
  • Expanding $x into multiple arguments proved to be error-prone.
  • Syntax matters.
  • History entries without context have limited usefulness (cd $DIR for example: what was the current directory before cd and what was in $DIR?)
  • UX
    • Spitting lots of text to a terminal is useless, as it cannot be processed by a human.
    • Feedback is important.
      • The exit code should be displayed by default.
      • An effort should be made to display the status and progress of a process.
      • Ideally, something like pv should be integrated into the shell.

I’m not only suggesting the solution I’ve just described. I’m working on it. Please give it a try and/or join in to help develop it: NGS – Next Generation Shell.

NGS LOGO

# Make sure my_file has the line my_content
echo my_content >my_file

# Run a process and capture the output
out=`my_process`

# Get process handle (used to access output, exit code, killing)
p=$(my_process)

# Get process output and parse it, getting structured data
amis=``aws ec2 describe-images --owner self``
echo(amis.len()) # number of amis, not lines in output

# Functional programming support
orig = [1,2,3]
doubled = orig.map(X*2)

# Function parameters can be named, have default values, etc
F myfunc(a,b=1,*args,**kwargs) {
  ...
}

# Create AWS VPC and Gateway (idempotent)
NGS_BUILD_CIDR = '192.168.120.0/24'
NGS_BUILD_TAGS = {'Name': 'ngs-build'}
vpc = AWS::Vpc(NGS_BUILD_TAGS).converge(CidrBlock=NGS_BUILD_CIDR, Tags=NGS_BUILD_TAGS)
gw  = AWS::Igw(Attachments=[{'VpcId': vpc}]).converge(Tags=NGS_BUILD_TAGS)

I don’t think scripting is the right approach.

It really depends on the task, constraints, your approach and available alternative solutions. I expect that situations needing scripting will be with us for a while.

Another programming language? Really? Why does the world need yet another programming language?

I agree that creating a new language needs justification because the effort that goes into creating a language and learning a language is considerable. Productivity gains of using the new language must outweigh the effort of learning and switching.

NGS creation is justified in exactly the same way as many other languages were justified: dissatisfaction with all existing programming languages when trying to solve a specific problem or a set of similar problems. In the case of NGS, the dissatisfaction is specifically with how existing programming languages address the systems engineering tasks niche. NGS addresses this particular niche with a unique combination of features and trade-offs. The productivity of using NGS comes from the close match between the tool and the problems being solved.

Yet another shell? We have plenty already but they all have serious adoption problems.

NGS will be implementing ideas which are not present in other shells. Hopefully, the advantages will be significant enough to justify switching.

I’ll be just fine with bash/Python/Ruby/Perl/Go

You will. The decision to learn and use a new language depends on your circumstances: how many systems engineering tasks you are doing, how much you suffer, how much easier the tasks will become with NGS, how easily this can be done in your company / on your project and whether you are willing to take the risk.

You could just write a shell based on Ruby or Python or whatever, leveraging all the time and effort invested in an existing language.

I could and I didn’t. Someone else did it for Python and for Scala (take a look, these are interesting projects).

  • I don’t think stretching an existing language to become something else is the right solution.
  • NGS has features that cannot be implemented in a straightforward way as a library: special syntaxes for common tasks, multiple dispatch.

One could just write a library for Python or Ruby or whatever happens to be his/her favorite programming language, leveraging all the time and effort already invested in an existing language.

In order to be similar to NGS, one would not only have to build a library but also change the language’s syntax. I personally know only two languages that can do that: Lisp (using reader macros) and Perl6 (using the grammar facility). These are general-purpose programming languages. Turning them into something NGS-like would be a significant effort, which I don’t think is justified.

PowerShell appears to be similar to what you describe here.

Note that I have very limited experience with PowerShell. The only aspect I definitely like is the consistent usage of the $ sigil.

  • It’s probably a matter of taste and what you are accustomed to but I like NGS’ syntax more. PowerShell is pretty verbose.
  • DSC appears to be focused on resources inside a server/VM. NGS plans similar functionality. Meanwhile, NGS uses this approach in the AWS library: vpc = AWS::Vpc(NGS_BUILD_TAGS).converge(CidrBlock=NGS_BUILD_CIDR, Tags=NGS_BUILD_TAGS)

There are libraries for Python that make systems engineering tasks easier.

Right, sh for example. Such a solution can’t be used as a shell; it just improves the experience of calling external programs from Python.


Was this post convincing? Is anything missing to convince you personally? Let me know!

Have a nice day!

Please don’t use Puppet

Thinking process behind choosing a tool

The thinking process behind choosing a tool does not get the attention it deserves. While there are many discussions of the form tool X vs tool Y, there is very little discussion of how one should choose between tools or, in the presumable absence of alternatives, whether one should use the only candidate, tool X. This post will cover a few things to keep in mind when selecting a tool by highlighting a few common problems and fallacies. Puppet will be used as an example tool for consideration.

Focusing on positive parts only

When considering a product or a tool, too often the positive aspects are overestimated and the negative aspects that influence TCO (Total Cost of Ownership) are underestimated or neglected. There are several cognitive biases and logical fallacies involved. Cognitive biases and logical fallacies can be avoided to some extent just by being aware of them. I will be referring to these throughout the post to help you, the reader, become more aware of your thought process, which will hopefully improve it and consequently the process of decision making on your part.

Marketing pushes to see the positive

We all know that marketing focuses on positive aspects of a product and neglects to mention downsides. This is specifically mentioned in “False advertising” article under “Omitting information”.

For example, the fact that it’s not convenient to manage Puppet modules (proof: the existence of a tool to do just that) will not appear in marketing materials. You might think that, on the contrary, the existence of Librarian-puppet makes management of these modules easier. It does, but it also brings more complexity to the system: new problems and bugs instead of inhuman manual management of modules.

This post will focus on the negative

While there is more than enough focus on the positive aspects of products, this post will be highlighting the negative aspects in order to strike some balance. There is plenty of marketing material but it’s harder to find a list of problems that you only discover when you are neck-deep in the tool/product. These problems will be listed here. Note that this cannot be an exhaustive list because different situations reveal different problems, and this post is only based on the experience of several of my friends and my own.

Listing the problems of a tool touches the availability heuristic cognitive bias: the more easily you recall something, the more “important” it seems. You are bombarded by marketing materials which are all positive. When considering a tool, your natural flow of thought is “How easily can I remember the positive sides of the tool?” and it’s easy, because you were probably brainwashed already by how good the tool is. Then “How easily can I remember the negative sides of the tool?” is much harder. This is not the kind of information that will be pushed to you by the people behind the tool; they have no interest in doing so. Their money goes to advertising how good the tool is, not how bad it is. You can balance your rosy impressions of any tool or product by looking at GitHub issues, digging through StackOverflow for the downsides, or reading posts like this one.

Please, assume that X is the wrong tool for your needs.

As opposed to “yeah, looks good, let’s use it”, this approach leads to a more thoughtful tool selection process. Please read Prove your tool is the right choice.

“Everybody uses X”

The thought “Everybody uses X” might have been planted in your brain by marketing efforts. Please analyze the source of that thought carefully. Maybe you have heard about the product from some of your friends and/or colleagues and made a generalization? Maybe people are just stuck with it? Maybe that’s what they know? Did you search for alternatives? Did you try to disprove “Everybody uses X”?

“Everybody uses X, therefore it’s good”

Whether this thought was planted by marketing or not, no, there is no logical connection between the first and the second clauses.

If a lot of people use something, it becomes better, as there is more feedback and there are more contributors. It is often implied that therefore X is good. Improvement over time or with a growing user base does not mean X is good enough for any particular use right now.

Did you communicate with the people that use X? Did they tell you it was a good decision? Beware of choice-supportive bias when you talk to them. Which alternatives did they consider? Are they able to articulate the downsides? Every solution has downsides; being able to recognize these increases the credibility of an opinion about X.

“Everybody uses X, we should use X”

Yes, if you consider the value of “then we can blog about it and be part of the hype, possibly getting some traction and traffic”. This might have some estimated value, which should be weighed against the cost incurred by choosing an otherwise unneeded or inferior tool or technology. You can point your bosses to this paragraph, along with your estimation of the costs of using tool X vs better alternatives (which might be just not using it and coding the needed functionality yourself, for example; the comparison is valid both for X vs Y and for X vs no X).

No, “We should use X” does not logically follow from “Everybody uses X”. Beware of conformity bias.

“Company C uses X”

This piece of information, when served by the vendor of X, implies that company C knows better and you should use X too.

Company C is a big and respectable company with smart engineers. The vendor of X will gladly list big and reputable companies that use X. That’s the use of the “argument from authority”.

Again, there is no straight logical path between “C uses X” and “we should use X too”.

Chances are that company C is vastly different from your company and their circumstances and situation are different from yours.

Company C can also make mistakes. You are unlikely to see a blog post from the vendor of X titled “Company C realized their mistake and migrated from X”.

Claims of success with tool X

Treat claims of successful usage of tool X with caution. Searching quickly for “measuring project success” reveals the following dimensions to be looked at when estimating the success of a project:

  • Cost
  • Scope
  • Quality
  • Time
  • Team satisfaction
  • Customer satisfaction

The claims of successful usage of tool X carry almost no information regarding what really happens. “We are using Puppet successfully” might mean (when taken to the extreme) that for 100 servers and one deploy per day the following applies:

  • Cost: There is a dedicated team of five costly operations people that work just on Puppet because it’s complex.
  • Scope: Puppet covers 80% of the needs; this might be the only dimension looked at when claiming success.
  • Quality, Team satisfaction: This team is constantly cursing because of bugs, module or Puppet upgrade issues such as “Upgrade to puppet-mysql 3.6.0 Broke My Manifest” (fixed in just two months!) or the “puppet 4.5.0 has introduced a internal version of dig, that is not compatible to stdlib’s version” oopsie.

    Enjoy the list of regression bugs. It’s hard to blame the Puppet developers for these bugs because these kinds of issues are natural for projects of this size and complexity. I suggest that creating your own domain-specific language, one that is not a real programming language, for a configuration management tool is a bad idea. I’ll elaborate on this point in a bit, in the “Puppet DSL” section.

  • Time: It took the above team 6 months to implement Puppet. Unpredictable time to implement any feature because of complexity and unexpected bugs along the way.
  • Customer satisfaction: Given all of the above, it’s hard to believe in any kind of satisfaction with what’s going on.

It’s also worth keeping in mind that any demonstrated success, even a real one, does not mean that the same solution will be equally applicable to your situation, because your situation is almost certainly different on one or more dimensions: time, budget, scope (the problem you are solving), skills, requirements.

“But X also provides feature F”

I am sure that the advertisements will mention all the important features as well as “cool” features. Do you really need F?

When choosing a tool, the thought “But X also provides feature F” might be dangerous if F is not something you immediately need. One might think that F might be needed later. This might be the case, but what are the odds, what’s the value of F to you, and how much would it cost to implement using another tool or to write yourself? Also, consider the “horizon”. If you might need feature F in 3 years, in many situations this should be plainly ignored. In 3 years there might be another tool for F, or you might have switched from X to something else for other reasons by then.

Suppose there is another tool X2 which is an alternative to X. X2 does not provide F but its estimated TCO over a year is 50% less than X’s. You should consider the costs, because it might be that using X2 for the first year and then switching to X, including the switching costs, is cheaper.
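
To put made-up numbers on it: say owning X costs 400 hours a year and X2 costs 200. If you only need F starting in the second year and switching from X2 to X costs 100 hours, then X2 followed by X comes to 200 + 100 + 400 = 700 hours over two years, versus 800 hours for using X from day one.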

Putting tools before needs

“So, there is a new trendy, hyped tool X. How can we use it?” is typically a bad start. At the very least it should be “So, there is a new trendy, hyped tool X. Do we have any problems where X would be a better alternative?”

Ideally the approach would be “We have problem P, which alternative solutions do we have?”. P might be some inefficiency or desired functionality. Solutions, once again, do not have to mean existing tools.

Puppet – the good parts

I will quickly go over a few good parts because I want this post at least to try to be objective.

Convergence

Convergence is an approach that says one should define the desired state, not the steps to be taken to get there. The steps are abstracted away and on each run the system will try to achieve the desired state as closely as possible.

I do agree that declaring a resource such as a file, user, package or service and its desired state is a good approach. It’s concise and it’s usually simpler than specifying the operations that would lead to the desired state, like regular scripts do. This idea manifests in many other tools too: Chef, Ansible, CloudFormation, Terraform.

Appropriate in some situations

  • Think about a startup where someone does the systems engineering job part time and is not a professional. As Guy Egozy pointed out, there are situations such as startups with limited resources and basic needs where using a configuration management tool might make more sense than in other situations.
  • An urgent demo with all defaults: if you have good control of the tool and you know that you need some very specific functionality, say a WordPress+MySQL demo tomorrow, it is probably worth preparing the demo with Puppet or Chef. There is still a danger, of course, that the module you were using a month ago has now changed and you need to invest additional time to make things work. Or maybe the module is just broken now.

Multiple platforms support

In my experience the chances that you will be running the same applications on, say, Windows and Linux are pretty slim. The overlap of installed software on different platforms is likely to be infrastructure tooling only (monitoring, graphing, logging). Is it really worth the price?

Puppet DSL

Puppet class

Puppet DSL has a concept of “class” which has nothing to do with classes in programming languages. At least in retrospect, it was not such a good idea, especially when you consider operations people trying to explain Puppet classes to developers.

Limited DSL limitations 🙂

Acknowledged as a problem by facts

The limitations of the DSL were, in my opinion, acknowledged through the actions taken by Puppet’s developers and contributors.

Limited DSL is not a great idea!

I do understand why a limited DSL can be aesthetically and mathematically appealing. The problem here is that life is more complex than a limited DSL. What could be 10 lines of real code turns into 50 lines of ugly copy+paste and/or hacks around the DSL limitations.

It sounds reasonable that at the time when CFEngine and Puppet were created there were not enough examples of the shortcomings of limited DSLs and their clashes with real life. Today we have more:

  • Puppet – DSL failure admitted by actions, as discussed above.
  • Ansible – just looks bad. Some features look like they were torn from a programming language and forced into YAML.
  • Terraform – often generated, because well … life. This one is more of a configuration language by design. This approach has pros and cons when applied to infrastructure.
  • CloudFormation – 99% configuration format and 1% language; that’s why it’s generated for all but trivial cases. You do have the alternative of not generating the CloudFormation input file but providing custom resources, which use AWS Lambda functions instead. They will do some of the work. While this fits the CloudFormation model perfectly and makes CloudFormation much more powerful, I would really prefer a script over inversion of control and an additional AWS service (Lambda) which I have to use – one more thing that can go wrong or just be unavailable when needed the most.

I do not agree that Terraform should be limited the way it is, but in my opinion Terraform and CloudFormation are more legitimately limited, while Puppet and Ansible are just bad design. This limitation by design causes complex workarounds which are costly and sometimes fragile, not to mention the mental well-being of the systems engineers who are working with Puppet.

We can all stop creating domain-specific languages for configuration management which are not built on top of real programming languages. Except for a few cases, that’s a bad idea. We can admit it instead of perpetuating the wishful thinking that reality is simple and a limited DSL can deal with it somehow.

Puppet modules

Dependencies between Puppet modules

Plainly a headache. Modules have dependencies on other modules and so on. Finding compatible module versions is a hard problem. That’s why we have Librarian-puppet. As I mentioned above, it has its own issues.

There are also issues that Librarian-puppet cannot solve, which are inherent to a system of this scale, complexity and number of contributors. Let’s say you have module APP1 that depends on module LIB and module APP2 that also depends on LIB. Pinning the version of module LIB because APP1 has a bug can prevent you from upgrading module APP2, which in newer versions depends on a newer LIB. This is not an imaginary scenario but real-life experience.

Breakage of Puppet modules

Another aspect is that in this complex environment it’s somewhere between hard and impossible for any module maintainer to make sure his/her changes do not break anything. Therefore, they do break.

Popular community modules deal with so many cases and operating systems that breakage of some functionality is inevitable.

Community modules

There is this idea that is kind of in the air: “you have community modules for everything, if you are not using them you are incompetent and wasting your time and money”.

This could come from 3 sources:

  • Marketing
  • People that use community modules for simple cases, where they work fine
  • People that underestimate the amount of maintenance work required to make community modules work for your particular case.

The feedback that I’ve got several times from different sources is that if you are doing anything serious with your configuration management tool, you should write your own modules; fitting community modules to your needs is too costly.

Graph dependencies model problems

Do you know people who think in dependency graphs? It looks like most people that I know are much more comfortable thinking about a sequence of items or operations to perform. Thinking about dependency graphs, such as package version compatibility, usually comes with recognizable, significant mental effort, often accompanied by curses.

The Puppet team admitted (again, by actions) that this is a problem, introduced the ordering configuration setting and made “manifest” ordering the default at some point. Note that this ordering applies only to resources without explicit dependencies and within one manifest.

The graphs are somewhat implicit. This causes surprises and consequent WTFs. Messages about dependency errors are not easily understood.

Marketing

  • Puppet usage is compared to manual performance of the same tasks – “Getting rid of the manual deployments”. This is clearly a marketing trick: comparing your tool to the worst possible alternative, not to other tools which are similar to yours.
  • Puppet is compared to bash scripts. Why not Python or Ruby?
  • “Automate!” is all over the Puppet site. It implies that Puppet is a good automation tool.
  • The top 5 success stories / case studies use Puppet Enterprise? Coincidence? I think not 🙂

Thanks

Many thanks for guidance to Konstantin Nazarov (@racktear). We met at DevOpsDays Moscow 2017, where he offered free guidance lessons for improving speech and writing skills. In reality, the lessons also include productivity tips which help me a lot. Feel free to contact Konstantin, he might have a free weekly slot for you.


Have a productive career!

About declarative frameworks and tools

This post is a reply to a “just use Terraform” recommendation I’ve just seen. I hope more people will benefit from my perspective if it’s posted here. There is plenty of marketing behind most of the tools I mention here. It’s all rosy, see the “Life before Puppet” video. Let’s balance this marketing bullshit a bit.

Think twice before using declarative framework/tool

Terraform, CloudFormation, Puppet and Chef, like any other declarative frameworks/tools, take control away from you. They work fine for “hello world” examples. Then there is real life, where you need something these frameworks did not anticipate, and you are sorry you did not code everything yourself from the start. Now you are stuck with these tools and you will be paying for it in your time and money. Working around the limitations of such tools is a pain.

I am using CloudFormation and have used Puppet and Chef in the past. These tools do have their place. In my opinion it’s a very limited set of scenarios. Terraform, CloudFormation, Puppet and Chef are used much more widely than they should be.

These tools have some value, but too often people neglect the cost, which in many cases outweighs the value. Most of the cost comes from inflexibility. Terraform and CloudFormation are so limited that people frequently use another tool to generate their input. That adds another bit to the cost.

I’m hearing frequently from a friend (sorry, can’t name him) how much they suffer from Terraform’s inflexibility. The inflexibility cannot be fixed because it’s a declarative framework. Unfortunately they are so invested in Terraform that they will continue to spend hundreds of hours fighting it. Chef is causing trouble there too; community Cookbooks proved to be a mismatch for the needs and sanity of the engineers there.

… and there is this gem

A key component of every successful Puppet implementation is access to a knowledgeable support team

That’s from https://puppet.com/support-services/customer-support/support-plans

Are you sure you want to use Puppet? Apparently you can’t do it well without their support… Just saying…

Is one of these tools right for you?

Regular considerations for choosing a tool apply. See my older post “Prove your tool is the right choice“.

Expected replies and my replies to those

You don’t get it.

OK

You don’t understand these tools.

OK

You are not using these tools right / as intended.

OK

Are you crazy? You want to code everything yourself?

Let’s take it to the extreme: no new code should be written. No libraries, no frameworks. Because everything already exists. Sounds about right.

People smarter than you have figured it all out, use their tools

Smarter people don’t always produce better solutions or solutions that fit your use case. Most of the time smart people will produce smart solutions… and then there are people that don’t usually think in graphs and are really puzzled when debugging Puppet cyclic dependency errors for example.

Most of the code you need is already written, don’t waste time and money, use it! Community Cookbooks and modules are great!

This is marketing bullshit. Don’t buy it! It’s often more expensive to adopt code that does not meet your exact needs and is much more complex than you need (because it has to support multiple platforms and use cases) than to write your own. I have seen usage of community Cookbooks/modules followed by suffering, followed by an in-house rewrite or fork.

Don’t you care about the next guy? Work with standard tools!

Let’s do some math. A team of two works for a year. They are (a very modest estimation) 10% more productive because they have coded whatever they needed and were not fighting with the tools. Two engineers at roughly 1,800 working hours a year each, 10% of that is about 360 hours saved. Even if we wrongly assume that a custom solution is harder to understand for the third person who joins the team after one year, how much harder is it? Is it more than 300 hours harder?

Update following responses on Reddit

2017-04-28

2 totally different toolsets – infrastructure orchestration (Terraform, Cloudformation), and Configuration Management (Puppet, Chef)… — (/u/absdevops)

Yes. What is common to all these tools is the declarative style and their usage: these tools are typically run using a CLI.

All these tools have three axes that I consider:

  1. “Input” axis: What’s the input of these tools?
    1. Configuration format
    2. Half-baked programming language that was probably never intended to be a programming language
    3. Real programming language
  2. “Calling” axis: framework vs library (typical usage)
  3. “TCO” axis: TCO vs other solutions, especially vs the other solution that is always available – coding the subset of the functionality that you need yourself

I’d like to make sure that it’s clear that the tools mentioned in this article have different positions on the 3 axes and are not equal in the value they provide you in your specific situation.

The main point of the article is that while these tools differ on axes 1 and 3, they are all limiting because, conceptually, they are all frameworks. You pass your execution into the tool and it does a lot. This is where you lose your flexibility, as opposed to using a library. You have relatively little control over what happens inside the tool.

I must strongly disagree with Terraform being put in the list – its a great base tool with limitations that can be worked around. — (/u/absdevops)

I don’t want to work around limitations. That seems to be the norm for these tools. I’d rather have a library that misses parts that I’d code myself. Working around limitations is, in my opinion, generally much worse than missing functionality (it depends on the specific circumstances, of course).

Regarding inflexibility – it’s probably the most flexible tool of the bunch — (/u/absdevops)

Please note we are still comparing tools that all use a limiting paradigm: frameworks.

I will also duel anyone to the death for preference of Cloudformation syntax to Terraform — (/u/absdevops)

We are talking about the “Input” axis I mentioned above. Yes, Terraform syntax, apart from being more aesthetically pleasing, is somewhat closer to a “half-baked programming language that was probably never intended to be a programming language” while CloudFormation is somewhat closer to a “configuration format”.

I totally disagree with points made about having to generate Terraform manifests. … generate what you need specifically, and hand it off to Terraform, much like making an API call to a library. — /u/SlinkyAvenger

There is a huge difference in the amount of work done by a typical API call and what these tools do once you call them. With more granular API calls you decide if and when you make specific calls and what you do in between the calls – it’s much more flexible.

I’m also a big proponent of Puppet — /u/SlinkyAvenger

One of the low-value tools from my perspective. I’ll explain. On the “Input” axis, it’s a half-baked programming language. Better than a configuration file but it still loses to Chef, for example. On the “TCO” axis, I really think that Puppet and Chef are not good alternatives to custom scripts in most cases. Scripts, by the way, also win on the “Calling” axis, which means flexibility.

I’d really like to hear what you’re honestly going back to puppet support for. — /u/neoghostz

We don’t. When we suffered while working with Puppet, we knew that support would not solve our problems. Some crappy community modules cannot be fixed by support. Breakage on module version updates – same. Librarian, more complexity on top of complexity – same. The above quote about support (“A key component of every successful Puppet implementation is access to a knowledgeable support team”) was just to highlight that the folks at Puppet think people can’t use it without support. This is just a humorous point and not really important.

What is the point of this article? It basically dumps on Terraform, CF, Puppet, Chef, etc., but offers no actual criticism (other than a vague ‘it takes away control’ statement) or, perhaps more importantly, alternatives. — /u/cryonine

The point is that all these tools would have been better if they had been implemented as libraries on top of real programming languages, where you call the parts that you need instead of making one “do everything” call.

With the exception of Chef, these tools use either configuration files as input or configuration-file-almost-a-programming-language format. It’s always the same path:

  1. We need a small, limited DSL. It’s so academically beautiful, we can prove theorems about it.
  2. Oh wait, there are real-world scenarios where it’s not enough, damn these complaining engineers.
  3. Let’s add a stdlib.
  4. Let’s add proper loops.
  5. Now we have a half-baked programming language.

Elaborating on taking control away from you: you end up with convoluted workarounds.

Alternatives

For Puppet and Chef, I have not seen a single system where my estimated TCO of these tools would be better than that of the bunch of idempotent, modular bash scripts which I use. It did not take much time to write them. Some Python is used for configuration generation (JSON / Jinja templates + environment data).

With CloudFormation and Terraform it’s not that simple. I’m mostly amazed that nobody makes libraries which would just provide declarative primitives, rather than frameworks where you feed everything you need via one call. I am working on one, but it is really strange to me that I haven’t already heard about such a library for Python or Ruby.

Terraform … vastly superior to any other alternate out there — /u/cryonine

Not sure I agree 100%, because it depends on the situation, but I can imagine many situations where it’s correct. The important thing here is that I think all the current alternatives are not so good.

How is it wrongful to assume that a custom solution is harder to understand? That’s completely accurate. — /u/cryonine

A custom solution is simpler. Do you really need documentation for 19 lines of bash code that install Nginx and another 29 that do a restart that handles leaking file descriptors? You will definitely need documentation for a 2000+ line Chef cookbook or Puppet module that installs Nginx and … oh wait… how do I reload Nginx and then conditionally (if enough file descriptors have leaked) restart it? Time to dive in 🙂

I can imagine how a custom solution can become complicated (read: harder to maintain and higher TCO) if done by unprofessional people. In some cases it might be better for them to use a framework. On the other hand, they might get stuck when trying to do something advanced with the framework. It really depends on the situation.

While “use standard tools” generally sounds right, I have seen too many convoluted solutions using “standard tools” because of their inflexibility. People were trying to work around the limitations. Comparing the top-down execution of a simple script to workarounds for these tools, it’s much simpler to wrap your head around the scripts. I recently handed one of my clients over to the next person. I asked him how he was doing and he told me that he was happy to have a simple custom solution rather than complex frameworks. TCO has many components. Choosing “standard tools” does not always outweigh the other aspects.



Have a nice day and a productive life!

NGS unique features – improving NodeJS require()

Background: what is NGS?

NGS, the Next Generation Shell, is a (work in progress) shell and a programming language built from the ground up for systems engineering tasks. You can think of it as bash that’s designed today: sane syntax, data structures, functional programming, extensibility, cloud in mind, declarative primitives.

What’s good in NodeJS’ require()

I like most of how require() works in JavaScript. I’m not talking in this post about npm, just the NodeJS require() function. require() does not pollute your namespace, you just get a reference; it’s simple to use and easy to reason about.

const a = require('cool-aws-wrapper');
// Can not be done easily with AWS SDK:
a.deleteRoute53Record('testing25.example.com');

What’s there to improve in require()?

NodeJS modules usually fall into one of these categories:

  1. Class definition / big library that manages its own namespace. These usually end with module.exports = MyClass. No problem here.
  2. Group of functions or classes. These usually end with module.exports = { func1, func2, func3, ...} lists (ES6 syntax, otherwise written as module.exports =  { func1: func1, ... } ), which I think are cumbersome.

How do require() and modules look in NGS?

Note that require() in NGS is a work in progress and it doesn’t have much of the functionality that NodeJS provides. I just started with the things that bothered me the most.

Consistent with other places in NGS, require() returns the last evaluated expression. NodeJS, for example, returns module.exports, which you must explicitly set as the result of require().

I think of modules primarily as namespaces. Creating a namespace in NGS has its own syntax: ns { ... }.

Combining require()’s behaviour of returning the last evaluated expression with the namespace syntax, a typical NGS module consists of a single top-level expression which evaluates to a namespace. The whole module file can look like this:

ns {

  global init

  type Vpc
  type Subnet

  F init(v:Vpc) {
    ...
  }

  F _helper_func(s:Str) { ... }

  MY_CONST = 42

  F ok() {
    echo("OK")
  }

}

Let’s ignore the global for now; it relates to how methods and type instance creation are implemented in NGS. Anything defined inside ns { ... } is exposed as a namespace member, so usage of the above module could look like this:

{
  m = require('mymodule.ngs')
  vpc = m::Vpc()
  echo(m::MY_CONST)
  m::ok()
}

As you probably guessed, the :: operator is the namespace member access operator.

There is no need to explicitly state what the module/namespace exports. That’s the improvement over NodeJS’ require().

How ns works and more options for the curious

ns { … } returns a Hash

As stolen from NodeJS, the namespace syntax (ns { ... }) returns a Hash. In NodeJS, require() typically returns a JavaScript Object, which is close enough for the purposes of this post.

About :: operator

The namespace member access operator :: is actually a Hash key access operator. It is helpful because the regular syntax for accessing members is not always a good fit for namespaces. The regular member access syntax is the dot (.), but the dot syntax is also a function call: myobj.field is a field/key/attribute access, but myobj.func() is equivalent to func(myobj). For example, m::ok() will call the ok function defined in the module, while m.ok() will call the function ok in the current lexical environment with m as the parameter.
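
To make the difference concrete, here is a small sketch reusing the module from the previous section:

m = require('mymodule.ngs')
m::ok()    # calls the ok function defined inside the module
m.ok()     # equivalent to ok(m) - calls an ok function from the current lexical environment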

As a bonus, since :: is an operator, it is implemented as a function call. This means you can define how :: works with types that you define and modify how :: works with existing types.

ns { … } syntax implementation

For simplicity of implementation and in the absence of obvious reasons against, the ns { ... } syntax is just syntactic sugar for defining an anonymous function without parameters and calling it immediately. The thought behind this decision was simple: “I need to implement namespaces. Let’s see where I have them already. Oh, namespaces are already implemented in functions. This is so convenient, I can use this mechanism with minimal effort”.

How does ns know what to return?

ns is mostly a syntactic hack:

  1. Inside the ns body, the first statement, before any user-supplied statements, is _exports = {}, which sets the local variable _exports to an empty Hash.
  2. Any assignment or function definition also sets _exports["something"]. MY_CONST = 42 becomes MY_CONST = 42;  _exports["MY_CONST"] = MY_CONST;
  3. The exception to the rule above is variables and functions with names starting with an underscore (_). They are not automatically added to _exports. This, for example, is why _exports itself is not exported.
  4. The last statement, after all user-supplied statements, is _exports.
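
Putting these steps together, a module body containing just MY_CONST = 42 behaves roughly like the following (an illustrative sketch of the steps above, not the actual implementation), wrapped in an anonymous parameterless function that is called immediately:

_exports = {}                                    # step 1: implicit first statement
MY_CONST = 42; _exports["MY_CONST"] = MY_CONST   # step 2: the assignment also populates _exports
_exports                                         # step 4: last expression, the value of ns { ... }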

The behavior I just described looks like sane defaults to me. As we all know, life is usually more complex than hello world examples and customizations are needed. Here are two ways to customize the resulting namespace:

  1. return your_expr – since ns is just a function, you can use return at any point to return your own custom namespace.
  2. Manipulate _exports however you want towards the end of the ns body. For example, after _exports .= filterv(Type), only types will be exported. _exports.filterk(/^pub_/) will only export symbols (keys) whose names start with pub_.

Improvement suggestions are welcome! Have a nice day!

NGS unique features – Hash methods I wish I had in other languages

NGS is a language and a shell that I am building for systems administration tasks. Enough of the language is implemented to enable writing some useful scripts. The shell is not there yet.

Some of the Hash methods in NGS

The methods for working with Hash that I have not seen all together in other languages are listed below (a short sketch of their behaviour follows the list):

  1. filterk – filter Hash by key (produces Hash)
  2. filterv – filter Hash by value (produces Hash)
  3. mapk – map Hash keys (produces Hash)
  4. mapv – map Hash values (produces Hash)
  5. mapkv – map Hash keys and values (produces Hash, as opposed to map, which produces an array)
  6. without – filter out a specific key (produces Hash)
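
Here is a small illustrative sketch (the hash and its values are made up; the call patterns mirror the pollute() excerpt shown later in this post):

h = {'name': 'ngs', 'retries': 3, 'timeout': 60}

h.filterk(/^ti/)            # {'timeout': 60} - keep entries whose key matches the RegExp
h.filterv(Int)              # {'retries': 3, 'timeout': 60} - keep entries whose value is an Int
h.filterv(Int).mapv(X * 2)  # {'retries': 6, 'timeout': 120} - double the numeric values
h.without('name')           # {'retries': 3, 'timeout': 60} - drop a specific key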

How are these actually used? The following is an excerpt from the pollute method (function), which is part of the AWS module. It uses several of the Hash methods I mentioned, making it a good example. The pollute method (as in “pollute the global namespace”) enables using, for example, the Vpc variable instead of AWS::Vpc and so on. I would like to have this behaviour for small quick-and-dirty scripts but not as the default, so it’s in a method that one can optionally call.

F pollute(do_warn=true) {

    vars =
        _exports.filterk(/^AMI_OWNER/) +
        _exports.filterv(Type).without('Res').without('ResDef') +
        ...

    if do_warn {
        warn("Polluting ...: ${vars.keys().join(', ')}")
    }

    vars.mapk(resolve_global_variable).each(set_global_variable)
}

Let’s go over the code above step by step:

F pollute(do_warn=true) { ... } defines the pollute method with the optional parameter do_warn, which has a default value of true.

_exports is a Hash containing all of the AWS module’s variables and functions, similar to NodeJS’ module.exports, but members are added automatically rather than explicitly. Only the methods and variables that do not start with _ (underscore) are added. One can modify _exports in any way before the end of the module. I will write about require() and modules in NGS in more detail in another post.

filterk(/^AMI_OWNER/) keeps all the variables whose names match the given RegExp.

filterv(Type) keeps all the variables that are of type Type. These are AWS type definitions, such as Vpc, Subnet or Instance.

without('...') filters out the types I don’t want to override.

The + between and after _exports.filterk(...) and _exports.filterv(...) joins the Hashes.

mapk translates variable names into their indexes (using resolve_global_variable).

each runs set_global_variable with the variable index and the value to set.

Hash methods in other languages

I am aware that some of the methods above are present in other languages or libraries. Some examples:

  1. Ruby has a mapv (transform_values) method.
  2. Rails has mapk (transform_keys) and mapv.
  3. Perl 6 can modify values in a convenient manner: for %answers.values -> $v is rw { $v += 10 };

What I have not seen is a language which has all the methods above out of the box. I have a feeling that arrays get all the fancy methods while hashes (dictionaries) often get less attention in other languages.

Why does NGS have all these methods?

NGS is aiming to be convenient for systems administration tasks. More often than not these tasks include data manipulation. NGS has many functions (methods) for data manipulation, including the ones listed in this post.

Update: reddit discussion


Have a nice day!

NGS unique features – exit code handling


How do other languages treat exit codes?

Most languages that I know do not care about the exit codes of the processes they run. Some languages do care … but not enough.

Update / Clarification / TL;DR

  1. Only NGS can, out of the box, throw exceptions based on fine-grained inspection of the exit codes of the processes it runs. For example, exit code 1 of test will not throw an exception, while exit code 1 of cat will throw an exception by default. This allows writing correct scripts which do not have explicit exit code checking and are therefore smaller (meaning better maintainability).
  2. This behaviour is highly customizable.
  3. In NGS, it is OK to write if $(test -f myfile) ... else ..., which will throw an exception if the exit code of test is 2 (a test expression syntax error or alike), while, for example, in bash and others you should explicitly check and handle exit code 2 because a simple if cannot cover the three possible exit codes of test (zero for yes, one for no, two for error). Yes, if /usr/bin/test ...; then ...; fi in bash is incorrect! By the way, have you seen scripts that actually check for the three possible exit codes of test? I haven’t.
  4. When the -e switch is used, bash can exit (somewhat similar to an uncaught exception) when the exit code of a process that it runs is not zero. This is not fine-grained and not customizable.
  5. I do know that exit codes are accessible in other languages when they run a process. Other languages do not act on exit codes, with the exception of bash with the -e switch. In NGS, exit codes are translated to exceptions in a fine-grained way.
  6. I am aware that $? in the examples below shows the exit code of the language’s process, not the process that the language runs. I’m contrasting this with bash (-e) and NGS behaviour (an exception causes NGS to exit with a non-zero exit code).

Let’s run the test binary with incorrect arguments.

Perl

> perl -e '`test a b c`; print "OK\n"'; echo $?
test: ‘b’: binary operator expected
OK
0

Ruby

> ruby -e '`test a b c`; puts "OK"'; echo $?
test: ‘b’: binary operator expected
OK
0

Python

> python
>>> import subprocess
>>> subprocess.check_output(['test', 'a', 'b', 'c'])
... subprocess.CalledProcessError ... returned non-zero exit status 2
>>> subprocess.check_output(['test', '-f', 'no-such-file'])
... subprocess.CalledProcessError: ... returned non-zero exit status 1

bash

> bash -c '`/usr/bin/test a b c`; echo OK'; echo $?
/usr/bin/test: ‘b’: binary operator expected
OK
0

> bash -e -c '`/usr/bin/test a b c`; echo OK'; echo $?
/usr/bin/test: ‘b’: binary operator expected
2

I used /usr/bin/test for bash to make the examples comparable by not using bash’s built-in test.

Perl and Ruby, for example, do not see any problem with a failing process.

Bash does not care by default but has the -e switch to make a non-zero exit code fatal, returning the bad exit code when exiting from bash.

Python can differentiate zero and non-zero exit codes.

So, the best we can do is distinguish zero and non-zero exit codes? That’s just not good enough. test, for example, can return 0 for a “true” result, 1 for a “false” result and 2 for an exceptional situation. Let’s look at this bash code with an intentional syntax error in the test invocation:

if /usr/bin/test --f myfile;then
  echo OK
else
  echo File does not exist
fi

The output is

/usr/bin/test: missing argument after ‘myfile’
File does not exist

Note that the -e switch wouldn’t help here. Whatever follows if is allowed to fail (it would be impossible to do anything if -e affected if and while conditions).

How does NGS treat exit codes?

> ngs -e '$(test a b c); echo("OK")'; echo $?
test: ‘b’: binary operator expected
... Exception of type ProcessFail ...
200

> ngs -e '$(nofail test a b c); echo("OK")'; echo $?
test: ‘b’: binary operator expected
OK
0

> ngs -e '$(test -f no-such-file); echo("OK")'; echo $?
OK
0

> ngs -e '$(test -d .); echo("OK")'; echo $?
OK
0

NGS has easily configurable behaviour regarding how to treat the exit codes of processes. The built-in behaviour knows about the false, test, fuser and ping commands. For unknown processes, a non-zero exit code is an exception.

If you use a command that returns a non-zero exit code as part of its normal operation, you can use the nofail prefix as in the example above, customize NGS behaviour regarding the exit code of your process, or even better, make a pull request adding it to the stdlib.

How easy is it to customize exit code checking for your own command? Here is the code from the stdlib that defines the current behaviour. Decide for yourself (I’m skipping nofail, as it’s not something an average user is typically expected to deal with).

F finished_ok(p:Process) p.exit_code == 0

F finished_ok(p:Process) {
    guard p.executable.path == '/bin/false'
    p.exit_code == 1
}

F finished_ok(p:Process) {
    guard p.executable.path in ['/usr/bin/test', '/bin/fuser', '/bin/ping']
    p.exit_code in [0, 1]
}
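
Adding support for your own command follows the same pattern. For example (the tool and its exit code convention here are hypothetical, just to illustrate the shape of such a definition):

F finished_ok(p:Process) {
    guard p.executable.path == '/usr/local/bin/mytool'
    p.exit_code in [0, 3]   # suppose mytool uses exit code 3 for a normal "nothing to do" result
}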

Let’s get back to the bash if test ... example and rewrite it in NGS:

if $(test --f myfile)
    echo("OK")
else
    echo("File does not exist")

… and run it …

... Exception of type ProcessFail ...

For if purposes, a zero exit code is true and any non-zero exit code is false. Again, this is customizable. Such exit code treatment allows the if ... test ... NGS example above to function properly, somewhat similarly to bash but with exceptions when needed.

NGS’ behaviour makes much more sense to me. I hope it makes sense to you too.

Update: Reddit discussion.


Have a nice weekend!