“Use Dumb Shell, don’t Reinvent the Wheel”

Opening Rant

You don’t hear one developer saying “Just use Notepad” to a colleague with argumentation that goes roughly like this:

Why are you using this horrible Visual Studio Code? It has built-in debugger! No!

JetBrains IDEs? No! They do too much! They are so into the code!

Vim? Emacs? Not pure enough! Who needs that stupid syntax highlighting?

Keep text editing pure! Any semantic understanding by the text editor is undesirable, other programs should handle that. You don’t want to complicate the text editor.

Developers are not saying that because user experience and productivity matter. Yet, “Use Dumb Shell” is considered to be an acceptable opinion. Is that so common that people fall on their heads so hard (alternatively, did not give it any thought)? WTF?

The solution (shell) should be as simple as possible but not simpler than possible. Current shells are simpler than required by good user experience. Wrong trade-off. Keeping something simple is important but not more important than the outcomes.

Source: https://www.flickr.com/photos/toddle_email_newsletters/15413603567/
Image is a link to http://www.workcompass.com/

Additional food for thought:

  1. Why use a car when bicycle is so much simpler?
  2. Why use electricity when fire is so much simpler?
  3. Why have water in your house when a wells are so much simpler?


I was doing consulting. The usual suspects: AWS, bash, Python, Puppet, Chef. Got to Terraform later. I had and I am still having subpar experiences with these tools. Anything I wanted to do, was overly burdensome, complicated and full of pitfalls.

Since I can’t attempt to fix everything, I picked the worst offender and started working on the alternative programming language and shell combo. The motivating opinion is that Ops have no good programming language nor adequate shell.

The absence of good programming language for ops was covered in another post. In this post I will cover some of the things that are wrong with the interactive shell.

The Shell

The dominant player is bash. It didn’t change much for decades: you type commands and get a dump of text on your screen. Most of the alternatives are essentially the same in this regard, for decades.

Is this because of the brilliant design? I would ask: which design? This? Quoting:

I wrote quite complex shell scripts and my first suggestion is “don’t”. The reason is that is fairly easy to make a small mistake that hinders your script, or even make it dangerous.

The “Dumb Shell” Approach

In this post I would like to address common thought that I hear from people regarding Next Generation Shell, a new programming language and a shell that I’m working on. Note that other shells which are more advanced than POSIX shells also get this. Quoting @cup from lobste.rs:

Wouldnt it be better to just have a dumb shell, that can call programs to do heavy lifting (read: programming languages). This way you have a “division of labor”. Shell works best for launching executables, and programming languages work best for handling data structures and algorithms.

No, it would not. I refuse to accept under-powered tools.

Dumbness is Fundamental Flaw

The “dumb shell” has no semantic understanding and doesn’t care about programs’ inputs nor outputs. Let’s see how it plays out.

Today, “Understanding” of programs’ inputs is covered by completion. Completion was added because “dumb shell” had horrible user experience. It’s slightly better now when the shell “understands” programs’ input to some degree. To some people completion is a scope creep. I think of it as better user experience and productivity gain.

“Understanding” of programs’ outputs? We are not there yet. It also seems that interacting with objects on the screen is too novel of an idea for the shell. Considering how much time this idea is out there: WTF?

Let’s see how this “dumbness” manifests as bad user experience even at the very basic, “intended” functionality:

Programs’ Output – Size

Do you know of any real world scenario when a human supposed to go over 10K lines on the screen? I mean just sit there and read it. Let me know. I’ve never seen such use case.

The shell is dumb, the shell “does not intervene” in programs’ outputs. Sounds good until you get unlimited number of lines dumped on your screen.

“Should have used less” you think later. Right. What if you forgot? The buffer is now filled with useless output and you can’t see outputs of previous programs. Are you being punished? No, just nobody cared about the UX. Alternatively, “it would be to complicated to implement”.

Programs’ Output – All Mixed

  1. Want to know what’s on your screen is stdout and what is stderr? Well… you can’t. Your shell is dumb, it doesn’t deal with things like that.
  2. Want to know from which program the output came from? Nope. Some programs cope with that to some degree by prepending their name to error messages: ls xxx gives you ls: xxx: No such file or directory. What a wonderful strategy! Keep the shell dumb and push the burden to all the programs.
  3. You can’t type because some background job is continuing to dump text on the screen where you are trying to work? Too bad, should have used redirection because guess what … you shell doesn’t handle that either… and you can’t add redirection after the program is running; again not shell’s business.

Programs’ Output – Semantic Understanding

You just typed aws ec2 describe instances --filters ... and now you have some output.

You now see on your screen instance you would like to stop. The ID of the instance is right in front of your face. Now you type aws ec2 stop-instances --instance-ids. You would like to append the instance ID that you see on the screen. Nope. Your shell doesn’t do that. Too dumb. Select with the mouse and paste, because f*ck you!

Side note: amazing AWS engineers did not include any human readable output format so you get JSON dumped on your screen (or any other format which is still non-human-compatible).

Let’s imagine for a moment that the command output had some semantic meaning to the shell.

  1. The shell would display the output as a table.
  2. The table would be interactive (interactive output, what a heresy!) and one could navigate with arrow keys and have a shortcut for copy/paste the current cell value to the command line (for completion).
  3. You could interact with the objects in the table with the mouse (very new concept, another heresy for the shell).
  4. How about instead of typing aws ec2 stop-instances --instance-ids you navigate to the correct line, press enter, choose “stop” from the menu and the command is constructed for you? aws ec2 stop-instances --instance-ids i-123... amazing, ha? Well, your shell can’t do that.

Meaning, do you speak it mo***er?

How about after performing operations using the UI you would get as per your choice one of the below snippets which would re-create the operation:

  1. CLI commands
  2. CloudFormation tempalte
  3. Terraform “code”

Solution: UI for the Shell

Suppose I agree for a second, what do you suggest?


I personally don’t see how the described features could be implemented as external programs, keeping the shell “dumb”.

We Can Do Better Today

The reality has changed. What was once amazing is subpar by today’s standards. The world outside of the shell moved forward while the shell stayed almost the same. Brilliant design? Brilliant what?

Let’s move this industry together from the stone age of bash shell to the bronze age of something a bit less subpar – Next Generation Shell.

Closing Rant

Imaginary UNIX people:

We wanted to separate things because they are semantically different so we split the things into stdout and stderr. Well… stderr was is actually for everything that is not stdout.

One bit of metadata (stdout vs stderr) for semantic meaning of the output should be enough for everyone forever. Well… at least it’s simple for us to implement.

Update: discussion on lobste.rs

JQ is a symptom

jq is a great tool. It does what bash can not – work with structured data. I use it. I would like not to use it.

In my opinion, working with structured data is such a basic thing that it makes much more sense to be handled by the language itself. I want my shell to be capable and I strongly disagree with the view that a shell “is not supposed to do that”. Shell is supposed to do whatever is needed to make my life easier. Handling structured data is one of these things.

If “shell is not supposed to do that”, by that logic, bash is not supposed to do anything except for running external commands and routing the data between them. Doesn’t it seem odd that bash does have builtin string manipulation then? Maybe bash shouldn’t have added associative arrays in version 4? … or arrays in version 2? How about if and while ? Maybe bash shouldn’t have them either?


jq is a symptom that bash can’t handle today’s reality: structured data. The world is increasingly more about APIs. APIs consume and return structured data. I do work with APIs from shell. Don’t you guys use AWS CLI or any other API that returns JSON?

The reality has changed. bash hasn’t. I’m working on bash alternative. Please help me with it. Or at least spread the word.

If you don’t like my project, join Elvish . Elvish is another shell that supports structured data.

Happy coding! Hope it’s not in bash.

Bash pitfall: if test, if [, if [[

I bet you’ve seen a lot of scripts with seemingly innocent if [ -e blah ];then ...; else ...; fi or something similar . What’s the problem? if has at most two branches while test , [ and [[ have three different exit codes. Oops.

If you make a syntax error (or any other error occurs) in the test , [ or [[ expression, it will return the exit code 2 (or above, according to man test​). if will take the else branch. If you are lucky, you will notice the error message from the test, [ or [[ commands. If not, the else branch will always be executed.

I don’t want to use bash. The pitfall above is one of the many reasons. Unfortunately, I do use bash because it’s still best tool for some tasks. I’m working on alternative to bash. It’s called NGS, the Next Generation Shell. In NGS, the situation above is solved as one would expect from a modern programming language: exit codes 2 and above throw exception.

If you also think that there should be a viable alternative to bash, you are welcome to help me working on it.

Happy coding! Hope it’s not in bash 🙂

Why I have no favorite programming language

TL;DR – because for me there is no good programming language.

I’m doing mostly systems engineering tasks. I manage resources in Cloud and on Linux machines mostly. I can almost hear your neurons firing half a dozen names of programming languages. I do realize that they are used by many people for systems engineering tasks:

  • Go
  • Python
  • Ruby
  • Perl
  • bash

The purpose of this post is not to diminish the value of these languages; the purpose is to share why I don’t want to use any of the languages above when I write one of my systems-engineering-task scripts. My hope is that if my points resonate with you, the reader, you might want to help spread the word about or even help with my suggested solution described towards the end.


So let’s go over and see why I don’t pick one of the languages:

Why not language X?

All languages

  • Missing smart handling of exit codes of external processes. Example in bash: if test -f my_file (file is not there, exit code 1) vs if test --f my_file (syntax error, exit code 2). If you don’t spot the syntax error with your eyes, everything behaves as if the file does not exist.
  • Missing declarative primitives libraries (for Cloud resources and local resources such as files and users). Correction: maybe found one, in Perl – (R)?ex ; unfortunately it’s not clear from the documentation how close it is to my ideas.

All languages except bash

  • Inconvenient/verbose work with files and processes. Yes, there are libraries for that but there is no syntax for that, which would be much more convenient. Never seen something that could compare to my_process > my_file or echo my_flag > my_file .


  • Compiled
  • Error handling is a must. When I write a small script, it’s more important for me for it to be concise than to handle all possible failures; in many cases I prefer an exception over twice-the-size script. I do understand how mandatory and explicit error handling can be a good thing for larger programs or programs with greater stability requirements.
  • Dependencies problem seem to be unresolved issue


  • Functional programming is second level citizen. In particular list/dictionary comprehension is the Pythonic way while I prefer map and filter. Yes, that’s probably one of the features that makes Python easier to learn and suggested first language. Not everything that’s optimized for beginners must be good for more experienced users. It’s OK.
  • Mixed feelings about array[slice:syntax] . It’s helpful but slice:syntax is only applicable inside [ ] , in other places you must use slice(...) function to create the same slice object

Ruby and Perl

  • The Sigils syntax does not resonate with me.


I can’t put my finger on something specific but Ruby does not feel right for me.


  • Contexts and automatic flattening of lists in some cases make the language more complicated than it should.
  • Object orientation is an afterthought.
  • Functions that return success status. I prefer exceptions. Not the default behaviour in Perl but an afterthought: autodie.
  • Overall syntax feeling (strictly matter of personal taste).


Note that bash was created in a world that was vastly different from the world today: different needs, tasks, languages to take inspiration from.

  • Missing data structures (flat arrays and hashes is not nearly enough). jq is a workaround, not a solution in my eyes.
  • Awkward error handling with default of ignoring the errors completely (proved to be bad idea)
  • Expansion of undefined variable to empty string (proved to be bad idea)
  • -e ,  -u and other action at a distance options.
  • Unchecked facts but my feelings:
    • When bash was created, it was not assumed that bash will be used for any complex scripting.
    • bash was never “designed” as a language, started with simple commands execution and other features were just bolted on as time goes by while complete redesign and rewrite were off the table, presumably for compatibility.
  • Syntax
  • No widely used libraries (except few for init scripts) and no central code repository to search for modules (Correct me if I’m wrong here. I haven’t heard of these).

My suggested solution

I would like to fill the gap. We have systems-engineering-tasks oriented language: bash. We have quite a few modern programming languages. What we don’t have is a language that is both modern and systems-engineering-tasks oriented. That’s exactly what I’m working on: Next Generation Shell. NGS is a fully fledged programming language with domain specific syntax and features. NGS tries to avoid the issues listed above.

Expected questions and my answers

People work with existing languages and tools. Why do you need something else?

  • I assume I have lower bullshit tolerance than many others. Some people might consider it to be normal to build more and more workarounds (especially around anemic DSLs) where I say “fuck this tool, I would already finish the task without it (preferably using appropriate scripting language)”. I don’t blame other people for understandable desire to work with “standard” tools. I think it’s not worth it when the solutions become too convoluted.
  • I am technically able to write a new programming language that solves my problems better than other languages.

Another programming language? Really? We have plenty already.

  • I would like to remind you that most of the programming languages were born out of dissatisfaction with existing ones.
  • Do you assume that we are at global maximum regarding the languages that we have and no better language can be made?


Would you use NGS? Which features it must have? What’s the best way to ease the adoption? Please comment here, on Reddit (/r/bash , /r/ProgrammingLanguages) or on Hacker News.

Update: following feedback roughly of the form “Yes, I get that but many Ops tasks are done using configuration management tools and tools like CloudFormation and Terraform. How NGS compares to these tools” – there will be a blog post comparing NGS to the mentioned tools. Stay tuned!

Have a nice day!

NGS unique features – exit code handling


How other languages treat exit codes?

Most languages that I know do not care about exit codes of processes they run. Some languages do care … but not enough.

Update / Clarification / TL;DR

  1. Only NGS can throw exceptions based on fine grained inspection of exit codes of processes it runs out of the box. For example, exit code 1 of test will not throw an exception while exit code 1 of cat will throw an exception by default. This allows to write correct scripts which do not have explicit exit codes checking and therefore are smaller (meaning better maintainability).
  2. This behaviour is highly customizable.
  3. In NGS, it is OK to write if $(test -f myfile) ... else ... which will throw an exception if exit code of test is 2 (test expression syntax error or alike) while for example in bash and others you should explicitly check and handle exit code 2 because simple if can not cover three possible exit codes of test (zero for yes,  one for no, two for error). Yes, if /usr/bin/test ...; then ...; fi in bash is incorrect! By the way, did you see scripts that actually do check for three possible exit codes of test? I haven’t.
  4. When -e switch is used, bash can exit (somewhat similar to uncaught exception) when exit code of a process that it runs is not zero. This is not fine grained and not customizable.
  5. I do know that exit codes are accessible in other languages when they run a process. Other languages do not act on exit codes with the exception of bash with -e switch. In NGS exit codes are translated to exceptions in a fine grained way.
  6. I am aware that $? in the examples below show the exit code of the language process, not the process that the language runs. I’m contrasting this to bash (-e) and NGS behaviour (exception exits with non-zero exit code from NGS).

Let’s run “test” binary with incorrect arguments.


> perl -e '`test a b c`; print "OK\n"'; echo $?
test: ‘b’: binary operator expected


> ruby -e '`test a b c`; puts "OK"'; echo $?
test: ‘b’: binary operator expected


> python
>>> import subprocess
>>> subprocess.check_output(['test', 'a', 'b', 'c'])
... subprocess.CalledProcessError ... returned non-zero exit status 2
>>> subprocess.check_output(['test', '-f', 'no-such-file'])
... subprocess.CalledProcessError: ... returned non-zero exit status 1


> bash -c '`/usr/bin/test a b c`; echo OK'; echo $?
/usr/bin/test: ‘b’: binary operator expected

> bash -e -c '`/usr/bin/test a b c`; echo OK'; echo $?
/usr/bin/test: ‘b’: binary operator expected

Used /usr/bin/test for bash to make examples comparable by not using built-in test in bash.

Perl and Ruby for example, do not see any problem with failing process.

Bash does not care by default but has -e switch to make non-zero exit code fatal, returning the bad exit code when exiting from bash.

Python can differentiate zero and non-zero exit codes.

So, the best we can do is distinguish zero and non-zero exit codes? That’s just not good enough. test for example can return 0 for “true” result, 1 for “false” result and 2 for exceptional situation. Let’s look at this bash code with intentional syntax error in “test”:

if /usr/bin/test --f myfile;then
  echo OK
  echo File does not exist

The output is

/usr/bin/test: missing argument after ‘myfile’
File does not exist

Note that -e switch wouldn’t help here. Whatever follows if is allowed to fail (it would be impossible to do anything if -e would affect if and while conditions)

How NGS treats exit codes?

> ngs -e '$(test a b c); echo("OK")'; echo $?
test: ‘b’: binary operator expected
... Exception of type ProcessFail ...

> ngs -e '$(nofail test a b c); echo("OK")'; echo $?
test: ‘b’: binary operator expected

> ngs -e '$(test -f no-such-file); echo("OK")'; echo $?

> ngs -e '$(test -d .); echo("OK")'; echo $?

NGS has easily configurable behaviour regarding how to treat exit codes of processes. Built-in behaviour knows about false, test, fuser and ping commands. For unknown processes, non-zero exit code is an exception.

If you use a command that returns non-zero exit code as part of its normal operation you can use nofail prefix as in the example above or customize NGS behaviour regarding the exit code of your process or even better, make a pull request adding it to stdlib.

How easy is to customize exit code checking for your own command? Here is the code from stdlib that defines current behaviour. You decide for yourself (skipping nofail as it’s not something typical an average user is expected to do).

F finished_ok(p:Process) p.exit_code == 0

F finished_ok(p:Process) {
    guard p.executable.path == '/bin/false'
    p.exit_code == 1

F finished_ok(p:Process) {
    guard p.executable.path in ['/usr/bin/test', '/bin/fuser', '/bin/ping']
    p.exit_code in [0, 1]

Let’s get back to the bash if test ... example and rewrite the it in NGS:

if $(test --f myfile)
    echo("File does not exist")

… and run it …

... Exception of type ProcessFail ...

For if purposes, zero exit code is true and any non-zero exit code is false. Again, customizable. Such exit code treatment allows the if ... test ... NGS example above to function properly, somewhat similar to bash but with exceptions when needed.

NGS’ behaviour makes much more sense for me. I hope it makes sense for you.

Update: Reddit discussion.

Have a nice weekend!

Bashing bash – unexceptional

This is the third post in the “bashing bash” series. The aim of the series is to highlight problems in bash and convince systems and software engineers to help building a better alternative.

The problem

What is the output of the following script?


set -eu

echo Here
echo Not here

Right, the output is


Imagine that instead of false you have a lot of code. Since there are no exceptions, you have no idea where the error occurred.


Solutions using bash:

  1. Use set -x to trace the code.
  2. Add echo something every here and there to know between which two echo‘s the error occurred.
  3. Catch the error using trap and print the line number as suggested on StackOverflow . Writing this additional catching snippet at the top of every script is not really convenient.

This problem of unclear error location is unimaginable in any normal programming language.

“There is no problem, just don’t do it”

Bash was not intended to be a “normal” programming language. Some people say it’s an abuse of bash to use it as such. Looking at the code written in Bash I can tell it really is an abuse in many cases.

The reality though is that bash is still (ab)used for programming. In some cases Bash has positive aspects which outweigh the need to use other languages. In other cases a program starts as a small Bash script and is just not rewritten in another language after the script grows.

I suggest making a better shell rather than convincing people not to abuse Bash. People will keep on doing what they are doing. Let’s make their lives easier by providing them with a better shell.

The suggested solution

Use NGS. In NGS, any failed process throws an exception. Let’s take a look at the script below

#!/usr/bin/env ngs
echo Here
echo Not here

What’s the output?

Not here


Well, actually false returning an exit code of 1 is not an exception, it’s normal. If any command returning non-zero code would cause an exception, you wouldn’t be able to write for example if $(test -e myfile) do_something .

Failed process is a process that returns an unexpected exit code. Here is the part of stdlib that defines what’s a fail and what’s not:

F finished_ok(p:Process) p.exit_code == 0

F finished_ok(p:Process) {
    guard p.executable.path == '/bin/false'
    p.exit_code == 1

F finished_ok(p:Process) {
    guard p.executable.path == '/usr/bin/test'
    p.exit_code in [0, 1]

Such definitions also mean that you can easily extend NGS to work properly with any other command, simply by adding another finished_ok function. (Or add it to stdlib if it’s a common command so everyone would benefit).

So where are the exceptions?

We’ll have to modify the code to get an unexpected exit code. Example:

#!/usr/bin/env ngs
echo Here
ls nosuchfile
echo Not here


ls: cannot access 'nosuchfile': No such file or directory
========= Uncaught exception of type 'ProcessFailed' =========
====== Exception of type 'ProcessFailed' ======
=== [ backtrace ] ===
[Frame #0] /etc/ngs/bootstrap.ngs:158:1 - 158:10 [in <anonymous>]
[Frame #1] /etc/ngs/bootstrap.ngs:154:17 - 154:29 [in bootstrap]
[Frame #2] ./2.ngs:3:4 - 3:14 [in <anonymous>]
[Frame #3] /usr/share/ngs/stdlib.ngs:1116:11 - 1116:15 [in $()]
[Frame #4] /usr/share/ngs/stdlib.ngs:1050:29 - 1050:42 [in wait]
[Frame #5] /usr/share/ngs/stdlib.ngs:1006:7 - 1006:20 [in ProcessFailed]
=== [ dump process ] ===
(a lot of not very well formatted output with info about the process)

Please help building a better alternative

Go to https://github.com/ilyash/ngs/ and contribute some code.

Bashing bash – undefined variables

This is the second post in the “bashing bash” series. The aim of the series is to highlight problems in bash and convince systems and software engineers to help building a better alternative.

The problem

What does the following command do?

rm -rf $mydir/

Looks like the author would like to delete $mydir directory and everything in it. Actually it may do unexpected things because of missing quotes. The rant about quotes is in the previous post. This post is about yet another issue.

The correct commands should be:

set -u
rm -rf "$mydir/"

The important thing here is set -u . Without it, when $mydir is undefined for some reason, such as a bug in code preceding the rm command, there is a chance to brick the machine because an undefined variable becomes an empty string so the command is silently expanded to

rm -rf /


While more experienced engineers will usually use set -eu at the beginning of the script, omitting this declaration is a big trap for others.

Side note. You could ask why the original command has a trailing slash. The trailing slash is common and is used to signify a directory. While to the best of my knowledge the rm should work the same without the slash, some commands are actually more correct with trailing slash. For example cp myfile mydir/ would copy the file into the directory if it exists and would cause error if it doesn’t. On the other hand,  cp myfile mydir would behave the same if directory exists but would create a mydir file if there is no such directory nor file, which was not intended. Other commands such as rsync also behave differently with and without the slash. So it is common to use the slash.

See also: http://www.tldp.org/LDP/abs/html/options.html – bash options

The suggested solution

In NGS, any use of an undefined variable is an exception.

ngs -e 'echo(a)'

It’s going to look prettier but even in current implementation you have all the information about what happened:

========= Uncaught exception of type 'GlobalNotFound' =========
====== Exception of type 'GlobalNotFound' ======
=== [ dump name ] ===
* string(len=1) a
=== [ dump index ] ===
* int 321
=== [ backtrace ] ===
[Frame #0] /etc/ngs/bootstrap.ngs:156:1 - 156:10 [in <anonymous>]
[Frame #1] /etc/ngs/bootstrap.ngs:152:17 - 152:29 [in bootstrap]
[Frame #2] <command line -e switch>:1:8 - 1:9 [in <anonymous>]

While bash options probably have historical justification, a new language should not have such a mechanism. It complicates things a lot. In addition, the programmer should always be aware what are the current options.

Please help building a better alternative

Go to https://github.com/ilyash/ngs/ and contribute some code.

Bashing bash – variable substitution

This is the first post in the “bashing bash” series to highlight problems in bash and convince systems and software engineers to help building a better alternative.

Bash is a very powerful and useful tool, doing a better job than many other shells and programming languages when used for the intended tasks. Still, it’s hard to believe that writing a software decades later can not be done better.

The problem

What does the following command do?

cp $src_file $dst_file

One might think it copies the given file to the specified destination. Looking at the code we can say it was the intention. What would actually happen? It can not be known from the line above. Each $src_file and $dst_file expand to zero to N arguments so unexpected things that could happen. The correct command would be

cp "$src_file" "$dst_file"

Forgetting the quotes or leaving them out assuming  $src_file and $dst_file will always contain one bash word, expanding to exactly one argument each is dangerous.

Keeping quoting everything makes code cluttered.

The suggested solution

In NGS, $var expands to exactly one argument similar to "$var" in bash. The new syntax $*var, consistent with similar syntax in other NGS parts, would expand to zero to N arguments.

Please help building a better alternative

Go to https://github.com/ilyash/ngs/ and contribute some code.