This is a pain-driven response to the post about the Narrow Waist of Unix Architecture. If you have the time, please read that post.
The (very simplified and rough) TL;DR of the above link:
- The Internet has a “Narrow Waist”: the IP protocol. Anything above that layer (TCP, HTTP, etc.) does not need to be concerned with lower-level protocols. Each piece of software therefore does not need to concern itself with the specifics of how the data is transferred.
- Unix has a “Narrow Waist” too: text-based formats. You have a plethora of tools that work with text. On one side of the Narrow Waist we have different formats; on the other side, text-manipulating tools.
I agree with both points. I disagree with the implied greatness of the Unix “design” in this regard. I got the impression that my thoughts in this post are likely to be addressed by upcoming oilshell blog posts, but nevertheless…
Like a hierarchy of types, we have a hierarchy of formats. Bytes is the lowest level.
Everything in Unix is Bytes. Like in programming languages, if you know the base type, you have a certain set of operations available to you. In the case of Bytes in Unix, that would be cp, zip, rsync, dd, xxd and quite a few others.
A sub-type (a more specific type) of Bytes would be Text. Again, like in a programming language, if you know that you are dealing with data of a more specific type, you have more operations available to you. In the case of Text in Unix, these would be: wc, tr, sed, grep, diff, patch, text editors, etc.
For the purposes of this discussion, X is a sub-type of Text: CSV, JSON, program source code, etc.
Is JSON a sub-type of Text? Yes, in the same sense that a cell phone is a communication device, a cow is an animal, and a car is a transportation device. Exercise for the reader: are these useful abstractions?
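The type analogy can be sketched in Python. This is only an illustration under my own assumptions: the classes Bytes, Text and Json are hypothetical and merely model the idea that each more specific type offers more operations. Subclassing mirrors the "sub-type" wording; it is not how any Unix tool is actually implemented.

```python
import json


class Bytes:
    """Lowest level: raw bytes. Only content-agnostic operations make sense."""

    def __init__(self, data: bytes):
        self.data = data

    def size(self) -> int:
        # Comparable to `wc -c`: works for any file.
        return len(self.data)

    def copy(self) -> "Bytes":
        # Comparable to `cp`: no knowledge of the content is needed.
        return type(self)(self.data)


class Text(Bytes):
    """A sub-type of Bytes: adds line-oriented operations."""

    def lines(self) -> list[str]:
        return self.data.decode("utf-8").splitlines()

    def grep(self, needle: str) -> list[str]:
        # Comparable to `grep -F`: matches lines, not semantic units.
        return [line for line in self.lines() if needle in line]


class Json(Text):
    """A sub-type of Text (one possible X): adds structure-aware operations."""

    def get(self, *path):
        # Comparable to `jq`: navigates the parsed document, not its lines.
        value = json.loads(self.data)
        for key in path:
            value = value[key]
        return value


doc = Json(b'{"user": {"name": "Ann"}}')
print(doc.size())               # 25
print(doc.grep("Ann"))          # the whole (single) line matches
print(doc.get("user", "name"))  # Ann
```

Note how grep can only answer "which lines mention Ann", while get answers "what is the user's name", which is the question one usually has.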
The Text Hell
The typical Unix shell approach to working with X consists of the following steps:
1. Use Text tools (because they are there and you are a proficient wielder).
2. One of:
   - Add a bunch of fragile code to bring the Text tools to a level where they understand enough of X (in some cases despite existing command-line tools that deal specifically with X).
   - Write your own set of tools to deal with the relevant subset of X that you have.
3. Optional but likely: suffer fixing and extending step 2 for each new “corner case”.
The exceptions here are tools like jq and jc, which continue gaining popularity (for good reason, in my opinion). Yes, I am happy to see a declining number of “use sed” recommendations when dealing with JSON or XML.
Interestingly enough, if a programmer performed the above-mentioned atrocities in almost any programming language today, it would be pointed out to that person that this is not the way, that libraries should be used, and to “stop forcing a square peg into a round hole”. After a few unjustified repetitions of the same offense, that person should be fired.
Somehow this archaic “Unix is great, we love POSIX, we love Text” approach is still acceptable…
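To make the "fragile code" point concrete, here is a small Python comparison. The regex stands in for a typical "use sed" answer and the JSON document is made up; the functions name_via_regex and name_via_parser are hypothetical helpers, with the second one playing the role of jq -r .name.

```python
import json
import re

# Made-up JSON: the key "name" appears at two nesting levels,
# and the top-level value contains escaped quotes.
doc = '{"owner": {"name": "x"}, "name": "say \\"hi\\""}'


def name_via_regex(text: str) -> list[str]:
    # Text-tool approach: knows nothing about nesting or escaping.
    return re.findall(r'"name":\s*"([^"]*)"', text)


def name_via_parser(text: str) -> str:
    # Structure-aware approach: parse first, then look up the top-level key.
    return json.loads(text)["name"]


print(name_via_regex(doc))   # ['x', 'say \\']  (wrong key first, value truncated)
print(name_via_parser(doc))  # say "hi"
```

Each fix to the regex (anchoring to the top level, handling escapes) is another step towards writing a JSON parser, which is exactly step 3 above.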
Pipes Text Hell
- Create a pipe between different programs (text output becomes text input of the next program)
- Use a bunch of fragile code to transform what the first program produces into what the second one consumes.
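A simulated producer/consumer pair shows where the glue code breaks. Everything here is hypothetical: produce_text imitates a tool that prints "size name" lines, consume_text is the usual split-on-whitespace glue, and the JSON Lines variant is just one possible structured alternative.

```python
import json

# Made-up data: (size, file name) pairs; note the space in the second name.
files = [(120, "notes.txt"), (4096, "my report.pdf")]


def produce_text() -> str:
    # Producer: one "size name" line per file, like many Unix tools.
    return "\n".join(f"{size} {name}" for size, name in files)


def consume_text(output: str) -> list[tuple[int, str]]:
    # Typical glue code: split on whitespace and hope for the best.
    result = []
    for line in output.splitlines():
        fields = line.split()
        result.append((int(fields[0]), fields[1]))  # silently drops "report.pdf"
    return result


def produce_jsonl() -> str:
    # Same data, but every record carries its own structure.
    return "\n".join(json.dumps({"size": s, "name": n}) for s, n in files)


def consume_jsonl(output: str) -> list[tuple[int, str]]:
    return [(r["size"], r["name"]) for r in map(json.loads, output.splitlines())]


print(consume_text(produce_text()))    # [(120, 'notes.txt'), (4096, 'my')]
print(consume_jsonl(produce_jsonl()))  # [(120, 'notes.txt'), (4096, 'my report.pdf')]
```

Switching to line.split(maxsplit=1) patches this particular case, until a file name contains a newline.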
Where Text Abstraction is not Useful
Almost everywhere. To do the most meaningful, high-level operations on the data, you can’t ignore that it’s X and just treat it as Text.
The original post says that since the format is Text, you can use vim to edit it. Yes, you can… but have you noticed that any self-respecting text editor comes with plugins for various X’s? Why is that? Because even the amount of useful “text editing” is limited when all you know is that you are dealing with Text. You need plugins that understand the semantics of X in order to be more productive.
Wanna edit CSV in a text editor without CSV plugin? OK. I prefer spreadsheet software though.
Have you noticed that most developers use IDEs that “understand” the code and not Notepad?
Want to count records with wc -l my.csv? Do you know that the quoted fields don’t contain embedded newlines? Oops. Does the file have a header line? Oops.
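The wc -l pitfall is easy to reproduce with Python's standard csv module. The CSV content below is made up; counting newline characters is what wc -l does, while the CSV reader counts records and lets the header be handled explicitly.

```python
import csv
import io

# Made-up CSV: a header plus two records, one with a newline inside quotes.
data = 'id,comment\n1,"first line\nsecond line"\n2,plain\n'

# What `wc -l` counts: newline characters.
line_count = data.count("\n")

# What a CSV-aware reader counts: records, with the header taken out first.
reader = csv.reader(io.StringIO(data))
header = next(reader)
records = list(reader)

print(line_count)    # 4
print(len(records))  # 2
print(header)        # ['id', 'comment']
```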
Want to try to rename a method in a Java program?
sed -i 's/my_method/our_method/g' *.java, right? Well, that depends on your luck. I would highly recommend doing this kind of refactoring with an IDE that actually understands Java, so that you rename only the specific method in the specific class, as opposed to also renaming unfortunately named methods and variables, not to mention arbitrary strings.
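Here is what that sed command does to a made-up Java snippet, simulated with Python's re.sub (the pattern has no special characters, so it behaves like the sed substitution). The intent is to rename only Account.my_method, i.e. exactly one occurrence.

```python
import re

# Made-up Java source. Only Account.my_method should be renamed.
java_source = '''
class Account {
    void my_method() {}
}
class Report {
    int my_method_calls = 0;
    void my_method() { log("my_method was called"); }
}
'''

# Equivalent of: sed 's/my_method/our_method/g'
sed_like = re.sub(r"my_method", "our_method", java_source)

# A "smarter" text fix: whole words only (GNU sed also supports \b).
word_only = re.sub(r"\bmy_method\b", "our_method", java_source)

print(sed_like.count("our_method"))   # 4: also the field and the log string
print(word_only.count("our_method"))  # 3: still the other class and the string
```

Word boundaries fix the field name but cannot tell Account from Report or code from string literals; that distinction requires parsing Java, which is what the IDE does.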
Search / Indexing
Yep… except that understanding the semantics helps here quite a bit. That’s why there are indexing utilities that understand specific programming languages.
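As a small illustration, Python's standard ast module can stand in for a language-aware indexer. The source being searched is made up; the point is the difference between "lines mentioning parse" and "where parse is defined and called".

```python
import ast

# Made-up Python source to be searched.
source = '''
def parse(text):
    """Do not call parse() directly."""
    return text.split()

def main():
    # parse is called below
    tokens = parse("a b")
    parser = "parse"
'''

# Text search: every line mentioning "parse", whatever it means there.
grep_hits = [n for n, line in enumerate(source.splitlines(), 1) if "parse" in line]

# Semantic index: where is the function defined, and where is it called?
tree = ast.parse(source)
definitions = [node.lineno for node in ast.walk(tree)
               if isinstance(node, ast.FunctionDef) and node.name == "parse"]
calls = [node.lineno for node in ast.walk(tree)
         if isinstance(node, ast.Call)
         and isinstance(node.func, ast.Name) and node.func.id == "parse"]

print(grep_hits)    # [2, 3, 7, 8, 9]
print(definitions)  # [2]
print(calls)        # [8]
```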
I do not understand the fascination with text. I am still waiting for convincing arguments for why it is so “great” and why the interoperability it provides is not largely a myth. Having a set of tools that lets one do a subpar job every time is better than not having them, but is it the best we can do?
My earlier dream of eradicating text where it does not make sense (my blog post from 2009) came true with HTTP/2, which replaced the textual message format with binary framing. Apparently I’m not alone in this regard.
Sorry if anything here was harsh. It’s years of pain.
Clarification – Layering
Added: 2022-02-07 (answering, I hope, https://www.reddit.com/r/ProgrammingLanguages/comments/t2bmf2/comment/hzm7n44/)
Layering in the case of the IP protocol works just fine. The implementer of an HTTP server really does not care about low-level transport details such as Ethernet. Likewise, the low-level drivers don’t care what exactly the data they deliver is. The two sides of the Waist don’t need to know about each other. This works great!
My claim is that in the case of the Text Narrow Waist, where X is on one side and the Text tools are on the other, there are two options:
- Tools ignore X, and you get very limited functionality out of them.
- Tools know about X, but then it’s a “leaky abstraction” and not exactly a Narrow Waist.
That’s why I think that in the case of Text, the Narrow Waist is more of an illusion.
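The two options can be illustrated with comparing two versions of a JSON file, sketched in Python with made-up documents. A line-based diff (difflib here, standing in for diff) ignores X and reports changes that are only formatting; the JSON-aware comparison gets it right, but only because it is no longer a pure Text tool.

```python
import difflib
import json

# Two made-up JSON documents: same data, different key order and layout.
old = '{"name": "web", "replicas": 3}\n'
new = '{\n  "replicas": 3,\n  "name": "web"\n}\n'

# Option 1: the tool ignores X and sees only lines of Text.
text_diff = list(difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm=""))
for line in text_diff:
    print(line)

# Option 2: the tool knows X, i.e. it parses JSON before comparing.
same_data = json.loads(old) == json.loads(new)
print(same_data)  # True
```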
Have a nice week!