“But it works”

TL;DR – this is not nearly good enough in most cases and it’s only small fraction of what you are paid for.

I want this post to be the canonical place to refer people to who say “but it works” because people who explain why this is not OK are tired of repeating the same arguments, me included.

You are paid for …

Following is not an exhaustive list but it should give you some perspective which is opposite from the narrow-minded “but it works”.

Typically, your Software Engineering $Job pays you for:

  1. Of course the thing must work. But also..
  2. It should continue working
    1. Gives deprecation warnings? Probably not good.
    2. Only runs on Node.js v10 LTS which is end of life in less than a year (as of writing)? Think again.
    3. Got away with invalid XML? Can you be sure that the next version of parser won’t be stricter?
  3. It should be maintainable (aka you and other people should find it easy to operate and modify, now and years later)
    1. Code quality
    2. Tests (if you don’t have tests, even your basic claim that something “works” is under suspicion)
    3. Documentation
      1. How to use your sh*t?
      2. How to set up the development environment?
      3. Decisions
      4. Non-obvious code parts
  4. It should be production ready, not abstract “works” or even worse “works on my machine”
    1. Logs
    2. Metrics
    3. Tested in dev/qa/whatever-you-call it environment
    4. Reproducible – tomorrow they make a new environment, “qa42”, in a different AWS account in a different region. Could somebody else deploy your sh*t there without talking to you?
    5. Update 2020-09-13 (from Guy Egozy) – Scalable enough to be used in production.

If you claim that you are “done” because “it works”, congratulations, you have a (probably) working prototype. That’s typically small part of a project.


Related term – “Tactical Tornado”, look this up.


Update 2020-09-13: Reddit discussion

Python 3.8 Makes me Sad Again

Looking at some “exciting” features landing in Python 3.8, I’m still disappointed and frustrated by the language… like by quite a few other languages.

As an author of another programming language, I can’t stop thinking about how things “should have been done” from my perspective. I want to be explicit here. My perspective is biased towards correctness and “WTF are you doing?”. Therefore, take everything here with a appropriate amount of salt.

Yes, not talking about any “positive” changes here.

Assignment Expressions

There is new syntax := that assigns values to variables as part of a larger expression.

A fix which couldn’t be the best because of previous design decision.

“Somebody” ignored the wisdom of Lisp, which was “everything is an expression and evaluates to a value” (no statements vs expressions), and made assignment a statement in Python years ago. Now this can not be fixed in a straightforward manner. It must be another syntax. Two different syntaxes for almost the same thing which is = for assignment as a statement and := for expression assignment.

Positional-only Parameters

There is a new function parameter syntax / to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments:

def f(a, b, /, c, d, *, e, f):
    print(a, b, c, d, e, f)

Trying to clean up a mess created by mixing positional and named parameters. Unfortunately I did not give it enough thought at the time and copied parameters handling behaviour from Python. Now NGS also has the same problem as Python had before 3.8. Hopefully, I will be able to fix it in some more elegant way than Python did.

LRU cache

functools.lru_cache() can now be used as a straight decorator rather than as a function returning a decorator. So both of these are now supported

OK. Bug fix. But … (functools.py)

    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0

If you are setting LRU cache size to a negative number, it’s 99% by mistake. In NGS that would be an exception. That’s the approach that causes rm -rf $myfolder/ to remove / when myfolder is unset. Note that the maxsize code is not new but it’s still there in Python 3.8. I guess that is another mistake which can not be easily fixed now because that would break “working” code.

Collections

The _asdict() method for collections.namedtuple() now returns a dict instead of a collections.OrderedDict. This works because regular dicts have guaranteed ordering since Python 3.7

OK. Everybody had the mistake of making maps unordered: Perl, Ruby, Python.

  1. Ruby fixed that with the release of version 1.9 in 2008 (according to the post).
  2. Python fixed that with the release of version 3.7 in 2018 (which I take as 10 years of “f*ck you, the developer”).
  3. Perl keeps using unordered maps according to documentation.
  4. Same for Raku, again according to the documentation.

NGS had ordered maps from the start but that’s not a fair comparison because NGS project started in 2013, when the mistake was already understood.


How all that helps you, the reader? I encourage deeper thinking about the choice of programming languages that you use. From my perspective, all languages suck, while NGS aims to suck less than the rest for the intended use cases (tl;dr – for DevOps scripting).


Update 2020-08-16

Discussions:

  1. https://news.ycombinator.com/item?id=24176823
  2. https://lobste.rs/s/rgcgjz/python_3_8_makes_me_sad_again

Update 2020-08-17

It looks like the article above needs some clarification about my perspective: background, what I am doing and why.

TL;DR

The main points of the article are:

  1. Everything still sucks, including Python. By sucks I mean does not fit well with the tasks I need to do neither aligned with how I think about these tasks.
  2. I am trying to help the situation and the industry by developing my own programming language

Background about my Thinking

In general, I’m amazed with how bad the overall state of programming is. That includes:

  1. All programming languages that I know including my own NGS. This is aggravated by inability to fix anything properly for any language with substantial amount of code written in it because you will be breaking existing code. And if you do break, you get the shitstorm like with Python 3 or Perl 6 (Raku).
  2. Code quality of the programs written in all languages. Most of the code that I have seen is bad. Sometimes even in official examples.
  3. Quality of available materials, which are sometimes plainly wrong.
  4. Many of existing “Infrastructure as code” solutions, which in most cases follow the same path:
    1. Invent a DSL or use YAML.
    2. “figure out” later that it’s not powerful enough (by the way there is an elegant solution – a programming language, forgot the name)
    3. Create pretty ugly programming language on top of a DSL that was intended for data.

I am creating new programming language and a shell out of frustration with current situation, especially with bash and Python. Why these two? Because that’s what I was and still using to get my tasks done.

Are these languages bad? I don’t think it’s a question with any good answers. These languages don’t fit the tasks that I’m trying to do nor are aligned with how I think while being apparently one of the best choices available.

This Article Background

  1. Seen some post on RSS about new features in Python 3.8.
  2. Took a look.
  3. Yep, everything is still f*cked up.
  4. Wrote a post about it which was not meant to be “deep discussion about Python flaws”.

I was not planning to invest more time in this but here I am trying to clarify.

And your Language is Better? Really?

Let’s clarify “better”. For me, it’s to suck less than the rest for the intended use cases.

author really does consider himself a superior language designer than the Python core-dev team

( From https://www.reddit.com/r/Python/comments/iartgp/python_38_makes_me_sad_again/ )

I consider myself in much easier circumstances:

  1. No substantial amount of code is written in NGS yet.
  2. I’m starting later and therefore have the advantage of looking at more languages, avoiding bad parts, copying (with adaptation) the good parts.
  3. NGS targets a niche, it’s not intended to be general purpose language. Choices are clearer and easier when you target a niche.
  4. The language that I’m creating is almost by definition is more aligned with how I think. Hoping that people out there will benefit from using NGS if it is more aligned with how they think too.
  5. See also my Creating a language is easier now (2016) post.

Will I be able to make a “better” language?

From technical perspective, that’s probable: I am a skilled programmer in several languages and I have languages to look at more than everybody else had before. My disadvantage is not much experience in language design. I’m trying to offset that with thinking hard (about the language, the essence of what is being expressed, common patterns, etc), looking at other languages and experimenting.

From marketing perspective, I need to learn a lot. I am aware that “technically better” doesn’t matter as much as I would like to. Without community and users that would be a failed project.

Also don’t forget luck which I might or might not have.

What if NGS fails?

I think that the situation today is unbearable. I’m trying to fix it. I feel like I have to, despite the odds. I hope that even if NGS fails to move the industry forward it would be useful to somebody who will attempt that later.

NGS Unique Features – the_one()

I’ve spotted the following common data access patterns.

Get the Only Element in an Array

You need to get the only element of an array as in

my_val = my_arr[0]

Additionally you want to express the assumption that there should be exactly one element in the array. In NGS it’s simple:

my_val = my_arr.the_one()

the_one() will return the only element or will throw an exception if there is not exactly one element in the given array.

Get the Only Matching Element in an Array

You have an array. Only one element should satisfy some condition. You want to access that element. Again, there is a straightforward way to express this in NGS:

my_instance = instances.the_one({"InstanceId": my_id})

The use of the_one(...) here is again about the assumption that there should be exactly one instance with the given instance id in the instances array. Exception will be thrown by the_one(...) if that is not the case.

Update 2020-08-08: As pointed out, the method is not unique to NGS.


Happy coding and have a nice week!

“Use Dumb Shell, don’t Reinvent the Wheel”

Opening Rant

You don’t hear one developer saying “Just use Notepad” to a colleague with argumentation that goes roughly like this:

Why are you using this horrible Visual Studio Code? It has built-in debugger! No!

JetBrains IDEs? No! They do too much! They are so into the code!

Vim? Emacs? Not pure enough! Who needs that stupid syntax highlighting?

Keep text editing pure! Any semantic understanding by the text editor is undesirable, other programs should handle that. You don’t want to complicate the text editor.

Developers are not saying that because user experience and productivity matter. Yet, “Use Dumb Shell” is considered to be an acceptable opinion. Is that so common that people fall on their heads so hard (alternatively, did not give it any thought)? WTF?

The solution (shell) should be as simple as possible but not simpler than possible. Current shells are simpler than required by good user experience. Wrong trade-off. Keeping something simple is important but not more important than the outcomes.

Source: https://www.flickr.com/photos/toddle_email_newsletters/15413603567/
Image is a link to http://www.workcompass.com/

Additional food for thought:

  1. Why use a car when bicycle is so much simpler?
  2. Why use electricity when fire is so much simpler?
  3. Why have water in your house when a wells are so much simpler?

Background

I was doing consulting. The usual suspects: AWS, bash, Python, Puppet, Chef. Got to Terraform later. I had and I am still having subpar experiences with these tools. Anything I wanted to do, was overly burdensome, complicated and full of pitfalls.

Since I can’t attempt to fix everything, I picked the worst offender and started working on the alternative programming language and shell combo. The motivating opinion is that Ops have no good programming language nor adequate shell.

The absence of good programming language for ops was covered in another post. In this post I will cover some of the things that are wrong with the interactive shell.

The Shell

The dominant player is bash. It didn’t change much for decades: you type commands and get a dump of text on your screen. Most of the alternatives are essentially the same in this regard, for decades.

Is this because of the brilliant design? I would ask: which design? This? Quoting:

I wrote quite complex shell scripts and my first suggestion is “don’t”. The reason is that is fairly easy to make a small mistake that hinders your script, or even make it dangerous.

The “Dumb Shell” Approach

In this post I would like to address common thought that I hear from people regarding Next Generation Shell, a new programming language and a shell that I’m working on. Note that other shells which are more advanced than POSIX shells also get this. Quoting @cup from lobste.rs:

Wouldnt it be better to just have a dumb shell, that can call programs to do heavy lifting (read: programming languages). This way you have a “division of labor”. Shell works best for launching executables, and programming languages work best for handling data structures and algorithms.

No, it would not. I refuse to accept under-powered tools.

Dumbness is Fundamental Flaw

The “dumb shell” has no semantic understanding and doesn’t care about programs’ inputs nor outputs. Let’s see how it plays out.

Today, “Understanding” of programs’ inputs is covered by completion. Completion was added because “dumb shell” had horrible user experience. It’s slightly better now when the shell “understands” programs’ input to some degree. To some people completion is a scope creep. I think of it as better user experience and productivity gain.

“Understanding” of programs’ outputs? We are not there yet. It also seems that interacting with objects on the screen is too novel of an idea for the shell. Considering how much time this idea is out there: WTF?

Let’s see how this “dumbness” manifests as bad user experience even at the very basic, “intended” functionality:

Programs’ Output – Size

Do you know of any real world scenario when a human supposed to go over 10K lines on the screen? I mean just sit there and read it. Let me know. I’ve never seen such use case.

The shell is dumb, the shell “does not intervene” in programs’ outputs. Sounds good until you get unlimited number of lines dumped on your screen.

“Should have used less” you think later. Right. What if you forgot? The buffer is now filled with useless output and you can’t see outputs of previous programs. Are you being punished? No, just nobody cared about the UX. Alternatively, “it would be to complicated to implement”.

Programs’ Output – All Mixed

  1. Want to know what’s on your screen is stdout and what is stderr? Well… you can’t. Your shell is dumb, it doesn’t deal with things like that.
  2. Want to know from which program the output came from? Nope. Some programs cope with that to some degree by prepending their name to error messages: ls xxx gives you ls: xxx: No such file or directory. What a wonderful strategy! Keep the shell dumb and push the burden to all the programs.
  3. You can’t type because some background job is continuing to dump text on the screen where you are trying to work? Too bad, should have used redirection because guess what … you shell doesn’t handle that either… and you can’t add redirection after the program is running; again not shell’s business.

Programs’ Output – Semantic Understanding

You just typed aws ec2 describe instances --filters ... and now you have some output.

You now see on your screen instance you would like to stop. The ID of the instance is right in front of your face. Now you type aws ec2 stop-instances --instance-ids. You would like to append the instance ID that you see on the screen. Nope. Your shell doesn’t do that. Too dumb. Select with the mouse and paste, because f*ck you!

Side note: amazing AWS engineers did not include any human readable output format so you get JSON dumped on your screen (or any other format which is still non-human-compatible).

Let’s imagine for a moment that the command output had some semantic meaning to the shell.

  1. The shell would display the output as a table.
  2. The table would be interactive (interactive output, what a heresy!) and one could navigate with arrow keys and have a shortcut for copy/paste the current cell value to the command line (for completion).
  3. You could interact with the objects in the table with the mouse (very new concept, another heresy for the shell).
  4. How about instead of typing aws ec2 stop-instances --instance-ids you navigate to the correct line, press enter, choose “stop” from the menu and the command is constructed for you? aws ec2 stop-instances --instance-ids i-123... amazing, ha? Well, your shell can’t do that.

Meaning, do you speak it mo***er?

How about after performing operations using the UI you would get as per your choice one of the below snippets which would re-create the operation:

  1. CLI commands
  2. CloudFormation tempalte
  3. Terraform “code”

Solution: UI for the Shell

Suppose I agree for a second, what do you suggest?

https://github.com/ngs-lang/ngs/wiki/UI-Design

I personally don’t see how the described features could be implemented as external programs, keeping the shell “dumb”.

We Can Do Better Today

The reality has changed. What was once amazing is subpar by today’s standards. The world outside of the shell moved forward while the shell stayed almost the same. Brilliant design? Brilliant what?

Let’s move this industry together from the stone age of bash shell to the bronze age of something a bit less subpar – Next Generation Shell.

Closing Rant

Imaginary UNIX people:

We wanted to separate things because they are semantically different so we split the things into stdout and stderr. Well… stderr was is actually for everything that is not stdout.

One bit of metadata (stdout vs stderr) for semantic meaning of the output should be enough for everyone forever. Well… at least it’s simple for us to implement.


Update: discussion on lobste.rs

Section Syntax – Next Generation Shell

Problem

Using comments to denote code sections feels like subpar solution.

One starts with something like the following:

// workaround for API stupidity
if(result === null) {
  result = [];
}

Then somebody adds another bit so it becomes:

// workaround for API stupidity
if(result === null) {
  result = [];
}
if(result === [1]) {
  foo();
}

Now you are not sure whether the second if is still workaround. You don’t want that. What I usually do in this situation and recommend to others is clearly mark start and end:

// workaround for API stupidity - start
if(result === null) {
  result = [];
}
// workaround for API stupidity - end

if(result === [1]) {
  foo();
}

Now you have duplicated text and subpar programming experience.

Solution

Today (2019-10-21) I have added section syntax (to dev branch) to the language I am working on, Next Generation Shell. I think it solves the problem in a clean way, consistent with syntax and semantics of the language:

section "workaround for API stupidity" {
  if result is Null {
    result = []
  }
}

Or:

result = section "Use algorithm X to calculate blah" {
  a = 1
  b = 2
  a + b
}

In future, for programmer’s convenience backtraces could be augmented with sections’ names.

Update: discussion

  1. https://www.reddit.com/r/ProgrammingLanguages/comments/dkzcls/section_syntax_next_generation_shell/
  2. https://lobste.rs/s/gert97/section_syntax_next_generation_shell

Have a nice week!

On Information Loss in Software

“Information Loss” is a way to look at the world. The topic is very broad. This blog post will focus on information loss during development and operation of computer software.

This post discusses why Information Loss is bad and gives some examples.

My hope is that after reading this post, you will be able to spot information loss more easily. This should help you avoiding information loss, eliminating the need for costly information recovery phase. Some examples include specific recommendations how to avoid that particular case of information loss.

Information Loss Definition

Information Loss for the purposes of this blog is the situation where information I is available and is easily accessible at point in time t1 but later, when it’s needed at point in time t2, it is either not available or not easily accessible.

The post will present various categories of information loss with examples. The list is not exhaustive; it’s not meant to be. The intention is to give some examples to help you get the feel and start looking at things from the information loss perspective.

Why Information Loss is Bad?

In many cases of Information Loss, the missing information can be recovered but that requires resources to be thrown at the issue (time and/or money). That is the situation I would like to help you to avoid.

Between the Head and the Code

When working on software, the first place the information loss occurs is when the programmer translates thoughts into code. Information loss at this stage will manifest itself as increased WTF-per-minute during code review or just code reading. Each time the code is read, there will be additional cognitive load while the reader reconstructs the programmer’s idea behind the code.

I have identified two main causes for information loss at the head-to-code stage:

  • Programmer’s fault
  • Programming language imposed

Information Loss due to Programmer’s Fault

The more a programmer is experienced, the less likely is the occurrence of information loss at this stage.

Misnamed Variable

In programmers head: number of servers running the ETL task. Name of the variable in the code: n. WTFs at code review – guaranteed.

Misnamed Function

I’m pretty sure getUser() should not update say last name of the user in database. Such naming is criminal but unfortunately I’ve seen code similar to that.

Use of Magic Numbers

if (result == 126) .... The person who wrote 126 knew what that number means. The person reading the code will need to spend time checking what that number means. One should use constants or enums instead: if (result == NOT_EXECUTABLE) ....

Missing Comments in Code

Most important comments are about why something is being done as opposed to how. If ones code is in a high-level language and of a good quality, it’s a rare occasion one needs to comment about what or how something is being done. On the other hand comments like “Working around API bug: it returns false instead of empty array” are very valuable.

Incorrect Usage of Data Types

A list of people, for example, is not just a list. It has semantic meaning. It’s much easier to understand a program when correct types are used for the data. Java has generics to convey such information, for example List<Person>. Some other languages have type systems that are powerful enough to convey such information too.

Programming Language Imposed Information Loss

Limitations of programming languages lead to less expressive code because the idea in programmer’s head can not be expressed in a straightforward manner. The readers of the code will struggle more (read waste time) to understand the code.

Unnamed Function Parameters

bash and perl5 (not sure about perl5 anymore, there was something experimental) do not have the syntax for specifying function parameter names. This makes the code less expressive. Sometimes programmers will do “the right thing”:

myfunc() {
    local target_file=$1
    ...
}

… but when they don’t, you finish with unnamed parameter, wondering what it could mean:

myfunc() {
    if [[ -f $1 ]];then
        ...
    fi
}

Is that a file to generate or a source file? You don’t know, you have to read on in myfunc hoping for the answer.

Recommendation: even if your language does not support named parameters, emulate them.

Expansion of Strings into Several Arguments (bash)

rm $x

Does that remove one file or several? What the programmer meant? You simply don’t know. It depends on the contents of x, which is typically split into arguments by spaces. You are lucky if you can deduce from the variable name whether it’s one or several files.

From today’s perspective this is just bad design. Back at the day I guess it was the most practical way to implement arrays.

Recommendation: use one of the two alternatives blow and do not use rm $x form.

  • Single file: rm "$x" (proper quoting)
  • Multiple files: rm "${my_files[@]}" (bash arrays)

Side note: this “feature” caused so much pain over the years when x would contain a spaces by accident. Even when x is meant to be used as an array, elements of that array can also contain spaces by accident.

Error Handling

In languages that do not support exceptions (bash, C, Go), the programmer is forced into one of two situations:

  • Write incorrect code that ignores the errors (on purpose or by mistake, go figure which one)
  • Write verbose code that handles the errors. When the code handles every possible error, it becomes cluttered with error handling and it takes more time to understand the code. That’s the case where information loss occurs because the reader is overwhelmed by the code.

In NGS, since typical use case is scripting, I wanted to have the option for the code to be concise. That rules out returning status code along with the result because the caller is then forced to check it. It does make more sense for NGS to have exceptions and for scripts to decide whether to catch them or let the whole script terminate with error because of an uncaught exception.

Unordered Hash/Map/dict Data Structure

Hash data structure is implemented in a non-order-preserving manner in some languages. That means that the programmer can not express the intention freely in situations where the order of key/value pairs is important. That pushes towards less readable code as the programmer fights the language by implementing his/her own ordered dictionary.

Information loss in this case is again losing the sight of programmer’s intention.

Fortunately many modern languages solved the issue by now:

Recommendation: check whether your language has the data structure you really want to use, either built-in or in a library.

Limited Data Structures (bash)

Working with data structures in bash results more or less convoluted code, depending on the data structures one need to work with. This is direct consequence of bash supporting exactly three data structures:

  • Scalar (strings which can sometimes be treated as numbers or arrays)
  • Array
  • Associative array

These data structures can not be nested.

The result is much less readable code where the original intent of the author is harder to recover as opposed to data manipulation in other popular languages (Python, Ruby, etc).

Recommendation: consider using other languages besides bash for heavy data manipulation code.

Absence of non-nullable Types

In some languages there is no straightforward way to specify non-nullable parameters. The programmers are then required to check whether each passed parameter is null. That results more boilerplate code. Let’s look at the following bit of Java code from the popular Apache Flink project:

// flink/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java

protected DataSet(ExecutionEnvironment context, TypeInformation<T> typeInfo) {
    if (context == null) {
        throw new NullPointerException("context is null");
    }
    if (typeInfo == null) {
        throw new NullPointerException("typeInfo is null");
    }

    this.context = context;
    this.type = typeInfo;
}

Asynchronous Computing Model (JavaScript)

In JavaScript for example, progressively more readable code uses:

Again, information loss occurs when programmer’s intention is lost in the code because the code looks like a big struggle against asynchronicity and the language.

Recommendation: prefer async/await over Promises and prefer Promises over callbacks.

Loss of semantic information (JavaScript)

console.log() vs debug('my-module')('my message') in JavaScript. When a programmer chooses to use log() instead of debug(), loss of semantic information occurs. In this case it means more effort in finding the needed information in the output as opposed to simpler turning on and off the relevant debug sections.

Recommendation: use the debug module.

Information Loss at Runtime

Information loss at runtime will manifest as harder debugging.

Empty Catch Clause

This is borderline criminal. Except for very few cases when empty catch clause is really appropriate, by placing empty catch clause in the code, you are setting up a bomb for your colleagues. They will pay with their time, tears and mental health, not to mention they will be hating you. Where is the information loss? At the time the exception is generated, there is useful information about what happened. Empty catch clause loses that information. Result: hard to find exceptions and their causes.

In NGS, there are clear ways to express that you didn’t just forgot to handle the exception (try ... catch(e) { }) but you actually don’t care (or know exactly) what happened:

  • try EXPR without the catch clause at all. If EXPR throws exception, try EXPR evaluates to null, otherwise evaluates to EXPR.
  • EXPR tor DFLT if EXPR throws an exception, evaluates to DFLT, otherwise evaluates to EXPR.

Writing to stdout Instead of stderr

stdout has semantic meaning (result of the computation) and stderr also has semantic meaning (errors description). It will make harder for any wrapper script to deal with a program that outputs errors to stdout or outputs the result to stderr. The semantic information about the text is lost and then needs to be recovered by the caller if the two outputs are mixed.

Wrong exit codes reporting

This one really hinders automation.

if ... then {
    ...
    error("error occurred")
    exit(0) # incorrect error code reported
}

Since it’s easy to forget about exit code, and the common case is that exit() means abnormal termination of the program, in NGS exit() that does not provide an exit code defaults to exit code 1.

Wrong exit codes handling

if [ -e MY_FILE ] ...

This is all over bash scripts… and it’s wrong. Which exit codes [ program/built-in returns? Zero for “yes”, one for “no”, and two for “An error occurred”. Guess what. You can’t handle three distinct cases with two if branches; “An error occurred” is causing the “false” branch of the if to be taken. If you are lucky, you will spot error message on stderr. If you are not lucky, your script will just work incorrectly in some circumstances.

At this point the tradeoff in NGS was made in favor of correctness, not simplicity. if $(test -e MY_FILE) ... in NGS can go three ways: “yes” branch, “no” branch and an exception. After any external process is finished, NGS checks the exit code. For unknown process, non-zero exit code cases an exception. For test and a few others, zero and one are not causing an exception. The exit code checking facility is extensible and one can easily “teach” NGS about new programs.

Broaden your Horizon – Extras

I’ll mention here non-strictly software development related information loss cases.

Untagged Cloud Resources (AWS)

Have you just created an EC2 instance and named it Server or maybe you haven’t tagged it at all? Congratulations, semantic information has just been lost. You colleagues will strugle to understand what is the role of instance.

Recommendation: rigorously tag the resources, have alerts for untagged or improperly tagged resources. In AWS you can also know who created the resource by looking at CloudTrail.

Side note: In Azure, any resource must belong to a “Resource Group” which makes it much easier to track the resources.

GUI

You just performed operation in GUI. The information of what happened was just lost the minute you performed the operation. Good luck reproducing or documenting it.

The plan to combat this in NGS is to have textual representation for each operation that is performed via GUI.

String Concatenation

Every time two strings are concatenated into one, there is some information loss.

Recommendation: instead of parsing unstructured text (result of concatenation) later, consider using structured data format when producing the output. (Example: JSON).


Hope that helps. Have fun!

What I did not steal from Perl 6

I’m curious about programming languages. Not because I’m creating one right now. I always was. This post is about ideas and features that I have seen in Perl 6 and found interesting. If you are curious about programming languages in general, you should take a look at these.

There are various reasons for not stealing the interesting ideas from Perl 6:

  1. I’m trying to keep number of concepts in NGS as small as possible. If I’m not seeing huge immediate value in a concept – I skip it.
  2. Not taking anything that I think can confuse me or other programmers. I’m not talking here because someone is a beginner. I’m talking about confusing concepts.
  3. Simply because I don’t have enough resources to implement it at the moment.

Here are the interesting Perl 6 features, in no particular order (except the first one). There are also my comments whether I would like the feature in NGS or why not.

  1. Syntax. Very expressive an terse. Perl6 has even more of it than Perl 5. Now that we got rid of the $ and friends in the room:
  2. Grammars. Would actually be nice to have something like that in NGS.
  3. Lots of operators. The most interesting concept is Metaoperators. I’m trying to keep the amount of syntax elements in NGS relatively low. There are already two syntaxes in NGS: commands and expressions. Not taking more syntax without serious need.
  4. How the “pointy block” syntax mixes with “for” syntax: for @list -> @element . NGS already has several syntaxes for Lambdas.
  5. Flow control
    1. when” flow control. The closest NGS has is “cond” and friends, stolen from Lisp.
    2. repeat while / repeat until . It would be nice to have something like that in NGS.
    3. once . Not sure about this one. The functionality might be needed.
  6. Slips. The behaviour is frightening me: if it does expand, how do I pass a Slip if I just want to pass it, say as an item of an array? NGS uses syntax for slips: [1, 2, *myitems, 3, 4] which I think is cleaner. You know you can’t pass it because it’s syntax.
  7. .WHAT method. I stole something similar from Ruby: the inspect method.

As a special note, I have seen a welcome change from $arr[0] to @arr[0] . I think it removes confusion. (That was Perl 5 vs Perl 6).

Please don’t be offended if you are a Perl 6 hacker and you see that there is amazing feature that I have not mentioned. It could be that I’ve seen this in several other languages already or maybe I did not find it interesting or … maybe I just missed it. Don’t hesitate to leave a comment anyway.


Happy coding, in whatever language rocks your boat! Except for bash. Coding in bash will never be happy.

List of JSON tools for command line

I am considering making a JSON parsing and generating command line tool. Started with looking around a bit. Below is a list of existing JSON command line tools. Numbers are [GitHub stars] at the time of adding the entry.

  • jq [11126] – filter, extract, modify and output JSON or text using DSL
  • jid [4426] – “You can drill down JSON interactively by using filtering queries like jq.” (item contributed by /u/Tacticus)
  • gron [4103] – convert JSON or JSON lines (from file/stdin/url) to text (path=value) which can be processed with grep/sed/diff; the tool also supports converting back to JSON after such processing
  • jo [2209] – generate JSON based on command line arguments and stdin; can read data from files and place it as base64 encoded values
  • JSON.sh [1635] – written in shell/gawk; “traverses the JSON objects and prints out the path to the current object (as a JSON array) and then the object, without whitespace”
  • underscore-cli [1588] ‘THE “Swiss Army knife” tool for processing JSON data – can be used as a simple pretty-printer, or as a full-powered JavaScript command-line’. Added on 2019-09-30 following comment from @joeytwiddle.
  • jsawk [1239] – focused primarily on filtering and transforming a list (or an object). Update 2019-09-30: as @joeytwiddle suggested in comment, the project appears to be unmaintained and doesn’t work with recent Node.js versions. Latest commit and latest closed issue are from 2015.
  • json (by trentm) [1218] – “massaging JSON on your Unix command line”; JS-like syntax for extracting values; in-place file editing
  • rq [1007] – awk/sed-like tool for structured data; supports several formats, including JSON
  • TickTick [469] – use JSON syntax directly in bash; “This is just a fun hack”
  • jshon [309] – very CLI-ish way to extract, manipulate and output the data
  • jl [308] – “a tiny functional language for querying and manipulating JSON”; visually reminds Haskell
  • faq [248]. “faq is a tool intended to be a more flexible jq, supporting additional formats. The additional formats are converted into JSON and processed with libjq”. Supports: BSON, Bencode, JSON, TOML, XML, YAML. Added on 2020-10-11.
  • jsonpp [244] – JSON pretty printer (item contributed by /u/ferbass)
  • fx [227] – conveniently run your JS code to manipulate JSON.
  • RecordStream [224] – create, manipulate and output records; supports JSON; Perl-based so grep expressions for example are in Perl.
  • JSON.awk [186] – JSON.sh fork in awk; after fork the projects added different features.
  • jp [184] – “command line interface to JMESPath” (link contributed by Evgeny Zislis)
  • json-command [143] – conveniently manipulate JSON using JS.
  • jsonv.sh [130] – convert JSON to CSV; specify paths in JSON to
  • jgrep (aka “JSON-grep”) [78] – “Command line tool and API for parsing JSON documents” in Ruby (item contributed by /u/tophlammiepie)
  • jello [61]. “Filter JSON and JSON Lines data with Python syntax”. Added on 2020-10-11.
  • jsed [48] – manipulate and extract data; somewhat similar to jsawk in mindset
  • jtbl [21] “A simple cli tool to print JSON data as a table in the terminal.”. Added on 2020-10-11.
  • jayin [10] “Piping with js at terminal”. Added on 2019-09-30 following comment from @joeytwiddle.
  • jsongrep [9] (by dsc) – extract data at given path using shell globs and output one per line
  • jc [2] – “jc is used to JSONify the output of many standard linux cli tools”. Added on 2019-10-29 following comment from Kelly Brazil.
  • jsongrep [0] (by terrycojones) – easily extract data at given path

Honourable mentions

Update 2018-09-10

I’ve added related post in which I argue that jq functionality belongs to a shell.


If you feel that some project is missing from the list, please let me know in comments below.

The missing link of Ops tools

It’s like we went from horse to spaceship, skipping everything in between.

Background

Let’s say you are managing your system in AWS. Amazon provides you with API to do that. What are your options for consuming that API?

Option 1: CLI or library for API access

AWS CLI let’s us access the API from the command line and bash scripts. Python/Ruby/Node.js and other languages can access the API using appropriate libraries.

Option 2: Declarative tools

You declare how the system should look like, the tool figures out dependencies and performs any API calls that are needed to achieve the declared state.

Problem with using CLI or API libraries

Accessing API using CLI or libraries is fine for one off tasks. In many cases, automation is needed and we would like to prepare scripts. Ideally, these scripts would be idempotent (can be run multiple times, converging to the desired state and not ruining it). We then quickly discover how clunky these scripts are:

# Script "original"
if resource_a exists then
  if resource_a_property_p != desired_resource_a_property_p then
    set resource_a_property_p to desired_resource_a_property_p
  end
  if resource_a_property_q != desired_resource_a_property_q then
    ...
  end
else
  # resource_a does not exist
  create resource_a
  set resource_a_property_p to desired_resource_a_property_p
  ...
end
# more chunks like the above

It’s easy to see why you wouldn’t want to write and maintain a script such as above.

How the problem was solved

What happened next: jump to “Option 2”, declarative tools such as CloudFormation, Terraform, etc.

rocket-1374248_640

Other possible solution that never happened

If you have developed any code, you probably know what refactoring is: making the code more readable, deduplicate shared code, factoring out common patterns, etc… without changing the meaning of the code. The script above is an obvious candidate for refactoring, which would be improving “Option 1” (CLI or a library for API access) above, but that never happened.

All the ifs should have been moved to a library and the script could be transformed to something like this:

# Script "refactored"
create_or_update(resource_a, {
  property_p = desired_resource_a_property_p
  property_q = desired_resource_a_property_q
})
# more chunks like the above

One might say that the “refactored” script looks pretty much like input file of the declarative tools mentioned above. Yes, it does look similar; there is a huge difference though.

Declarative tools vs declarative primitives library

By “declarative primitives library” I mean a programming language library that provides idempotent functions to create/update/delete resources. In our cases these resource are VPCs, load balancers, security groups, instances, etc…

Differences of declarative tools vs declarative primitives library

  1. Declarative tools (at least some of them) do provide dependency resolution so they can sort out in which order the resources should be created/destroyed.
  2. Complexity. The complexity of mentioned tools can not be ignored; it’s much higher than one of  declarative primitives library. Complexity means bugs and higher maintenance costs. Complexity should be considered a negative factor when picking a tool.
  3. Some declarative tools track created resources so they can easily be destroyed, which is convenient. Note that on the other hand this brings more complexity to the tool as there must be yet another chunk of code to manage the state.
  4. Interacting with existing resources. Between awkward to impossible with declarative tools; easy with correctly built declarative primitives library. Example: delete all unused load balancers (unused means no attached instances): AWS::Elb().reject(X.Instances).delete()
  5. Control. Customizing behaviour of your script that uses declarative primitives library is straightforward. It’s possible but harder with declarative tools. Trivial if in a programming language can look like count = "${length(var.public_subnets) > 0 ? 1 : 0}" (approved Terraform VPC module).
  6. Ease of onboarding has declarative tools as a clear winner – you don’t have to program and don’t even need to know a programming language, but you can get stuck without knowing it:
  7. Getting stuck. If your declarative tool does not support a property or a resource that you need, you might need to learn a new programming language because the DSL used by your tool is not the programming language of the tool itself (Terraform, Puppet, Ansible). When using declarative primitives library on the other hand you can always either extend it when/if you wish (preferable) or make your own easy workaround.
  8. Having one central place where potentially all resources are described as text (Please! don’t call that code, format is not a code!). It should be easier done with declarative tools. In practice, I think it depends more on your processes and how you work.

As you can see, it’s not black and white, so I would expect both solutions be available so that we, Ops, could choose according to our use case and our skills.

My suggestion

I don’t only suggest to have something between a horse and a spaceship; I work on a car. As part of the Next Generation Shell (a shell and a programming language for ops tasks) I work on declarative primitives library. Right now it covers some parts of AWS. Please have a look. Ideally, join the project.

Next Generation Shell – https://github.com/ilyash/ngs

Feedback

Do you agree that the jump between API and declarative tools was too big? Do you think that the middle ground, declarative primitives approach, would be useful in some cases? Comment here or on Reddit.


Have a nice day!

Why I have no favorite programming language

TL;DR – because for me there is no good programming language.

I’m doing mostly systems engineering tasks. I manage resources in Cloud and on Linux machines mostly. I can almost hear your neurons firing half a dozen names of programming languages. I do realize that they are used by many people for systems engineering tasks:

  • Go
  • Python
  • Ruby
  • Perl
  • bash

The purpose of this post is not to diminish the value of these languages; the purpose is to share why I don’t want to use any of the languages above when I write one of my systems-engineering-task scripts. My hope is that if my points resonate with you, the reader, you might want to help spread the word about or even help with my suggested solution described towards the end.

man-2756206_640

So let’s go over and see why I don’t pick one of the languages:

Why not language X?

All languages

  • Missing smart handling of exit codes of external processes. Example in bash: if test -f my_file (file is not there, exit code 1) vs if test --f my_file (syntax error, exit code 2). If you don’t spot the syntax error with your eyes, everything behaves as if the file does not exist.
  • Missing declarative primitives libraries (for Cloud resources and local resources such as files and users). Correction: maybe found one, in Perl – (R)?ex ; unfortunately it’s not clear from the documentation how close it is to my ideas.

All languages except bash

  • Inconvenient/verbose work with files and processes. Yes, there are libraries for that but there is no syntax for that, which would be much more convenient. Never seen something that could compare to my_process > my_file or echo my_flag > my_file .

Go

  • Compiled
  • Error handling is a must. When I write a small script, it’s more important for me for it to be concise than to handle all possible failures; in many cases I prefer an exception over twice-the-size script. I do understand how mandatory and explicit error handling can be a good thing for larger programs or programs with greater stability requirements.
  • Dependencies problem seem to be unresolved issue

Python

  • Functional programming is second level citizen. In particular list/dictionary comprehension is the Pythonic way while I prefer map and filter. Yes, that’s probably one of the features that makes Python easier to learn and suggested first language. Not everything that’s optimized for beginners must be good for more experienced users. It’s OK.
  • Mixed feelings about array[slice:syntax] . It’s helpful but slice:syntax is only applicable inside [ ] , in other places you must use slice(...) function to create the same slice object

Ruby and Perl

  • The Sigils syntax does not resonate with me.

Ruby

I can’t put my finger on something specific but Ruby does not feel right for me.

Perl

  • Contexts and automatic flattening of lists in some cases make the language more complicated than it should.
  • Object orientation is an afterthought.
  • Functions that return success status. I prefer exceptions. Not the default behaviour in Perl but an afterthought: autodie.
  • Overall syntax feeling (strictly matter of personal taste).

bash

Note that bash was created in a world that was vastly different from the world today: different needs, tasks, languages to take inspiration from.

  • Missing data structures (flat arrays and hashes is not nearly enough). jq is a workaround, not a solution in my eyes.
  • Awkward error handling with default of ignoring the errors completely (proved to be bad idea)
  • Expansion of undefined variable to empty string (proved to be bad idea)
  • -e ,  -u and other action at a distance options.
  • Unchecked facts but my feelings:
    • When bash was created, it was not assumed that bash will be used for any complex scripting.
    • bash was never “designed” as a language, started with simple commands execution and other features were just bolted on as time goes by while complete redesign and rewrite were off the table, presumably for compatibility.
  • Syntax
  • No widely used libraries (except few for init scripts) and no central code repository to search for modules (Correct me if I’m wrong here. I haven’t heard of these).

My suggested solution

I would like to fill the gap. We have systems-engineering-tasks oriented language: bash. We have quite a few modern programming languages. What we don’t have is a language that is both modern and systems-engineering-tasks oriented. That’s exactly what I’m working on: Next Generation Shell. NGS is a fully fledged programming language with domain specific syntax and features. NGS tries to avoid the issues listed above.

Expected questions and my answers

People work with existing languages and tools. Why do you need something else?

  • I assume I have lower bullshit tolerance than many others. Some people might consider it to be normal to build more and more workarounds (especially around anemic DSLs) where I say “fuck this tool, I would already finish the task without it (preferably using appropriate scripting language)”. I don’t blame other people for understandable desire to work with “standard” tools. I think it’s not worth it when the solutions become too convoluted.
  • I am technically able to write a new programming language that solves my problems better than other languages.

Another programming language? Really? We have plenty already.

  • I would like to remind you that most of the programming languages were born out of dissatisfaction with existing ones.
  • Do you assume that we are at global maximum regarding the languages that we have and no better language can be made?

Feedback

Would you use NGS? Which features it must have? What’s the best way to ease the adoption? Please comment here, on Reddit (/r/bash , /r/ProgrammingLanguages) or on Hacker News.


Update: following feedback roughly of the form “Yes, I get that but many Ops tasks are done using configuration management tools and tools like CloudFormation and Terraform. How NGS compares to these tools” – there will be a blog post comparing NGS to the mentioned tools. Stay tuned!


Have a nice day!