Naming in Software – Practical Guide

The title of this post is the title of the book that I had wanted to publish for quite some time. While I was thinking about phrasing and gathering content, somebody else beat me to it with Naming Things: The Hardest Problem in Software Engineering. The main issue that I wanted to solve is now solved. Programmers don’t have an excuse for poor naming anymore.

In light of this event, I’ve decided to make a small complementary post out of the materials that I have gathered and move on, focusing on Next Generation Shell.

Me and Naming

I have over 20 years of professional experience in programming. During that time, like many others, I’ve noted the struggle when it comes to naming.

Here is a list of my accepted naming contributions to various projects.

  1. iterators – function shoes_in_my_size naming 2020-02-16, “The Rust Programming Language” book
  2. Constructors – Get_Contents() method is misnamed 2020-02-23, MS C++ Documentation
  3. Rename howMany() to countSelected() 2023-01, MDN
  4. nilJson naming issue in readme 2023-04, Otterize

Naming Things, the Book

I skimmed Tom’s book to understand how similar it was to what I was about to write. Quite close. If you are struggling with naming, go and read it.

There is some amount of fluff, which I think my book would have had less of. Example: convincing people that naming is important while they are already reading the book.

Overall, I do recommend the book though.

I especially recommend this book to AWS as an organization, along with other books about code quality in general. AWS, your de-prioritization of code quality is staggering. I mean the observable output here, not the stated “Insist on the Highest Standards”.

6.2.6 Use accurate parts of speech

Adding a negative example from AWS:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/crpg-ref-responses.html

SUCCESS and FAILED are not the same part of speech (consistent pairs would be SUCCESS/FAILURE or SUCCEEDED/FAILED).

7.2.3 Omit Metadata

An additional reason to exclude the data type from the name is that, when the type changes, no changes are needed anywhere in the code except the declaration.

elt in items

Is items a list here or a set? Probably not important; the code should work with either. On the other hand, changing the data type from list to set in the following example will make the name incorrect:

elt in items_list
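
A minimal Python sketch of the maintenance cost (the function and variable names are hypothetical):

def selected_ids():
    # Changing this list to a set (e.g. for faster membership tests)
    # is supposed to be a one-line change
    return [10, 20, 30]

ids = selected_ids()        # no metadata in the name: survives the type change
ids_list = selected_ids()   # type in the name: every occurrence must be renamed,
                            # or the name silently becomes wrong

print(20 in ids, 20 in ids_list)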

8.2.4 Use similar names for similar concepts

Adding a negative example from AWS, which time after time fails to give consistent names across its APIs.

How do you limit the number of results from an API? MaxResults, maxResults, MaxRecords, MaxItems, Limit, limit, … Details at AWS API pagination naming.

It looks like consistent naming is valued less than the independence of teams and their ability to perform uncoordinated work.

8.2.5 Use consistent antonyms

Adding an example.

When I had to name the Option type (representing a container that can hold a value or can be empty) in Next Generation Shell, I went with straightforward antonyms.

  • Box (super type)
  • EmptyBox
  • FullBox

Authors of other languages preferred other naming conventions:

Scala: Option, None, Some

Haskell: Maybe, Nothing, Just
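
To illustrate the scheme, here is a minimal sketch in Python (not the actual NGS implementation; the classes and method names are made up for the example):

class Box:
    pass

class EmptyBox(Box):
    def get(self, default):
        return default

class FullBox(Box):
    def __init__(self, value):
        self.value = value
    def get(self, default):
        return self.value

def find(haystack, needle):
    # Returns a Box: FullBox when found, EmptyBox otherwise
    i = haystack.find(needle)
    return EmptyBox() if i == -1 else FullBox(i)

print(find("abc", "b").get(-1))  # 1
print(find("abc", "z").get(-1))  # -1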

Perspective

Information Loss

In my view, giving inadequate names is part of a larger issue: information loss. Each time you give a name, think about which information is now in your head that will be helpful to the reader of the code. If you don’t phrase it concisely and precisely, information is lost between your head and the code you are working on. There are several common types of errors one can make:

  1. Not providing enough information. This causes the reader to investigate in order to recover it.
  2. Providing wrong information. That’s the worst; it misleads the reader.
  3. Providing too much information. The reader then must sift through it to get to the relevant parts.
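
As hypothetical Python signatures, the three error types might look like this (all names are made up):

# 1. Not enough information: process what? And what is "d"?
def process(d):
    ...

def save_to_db(obj):
    # 2. Wrong information: the name promises a database, the body writes a file
    with open("obj.json", "w") as f:
        f.write(str(obj))

# 3. Too much information: the reader must sift through the name itself
def fetch_active_users_list_from_api_and_return_it():
    ...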

API

Sometimes it’s useful to think of methods as an API. That’s why method names shouldn’t include implementation details (with rare exceptions, when the details are important to the caller). Think of method names and parameter names as a short version of the API specification.

Identifiers

Tom’s book deals with naming identifiers, such as functions, classes, and variables. One step before naming an identifier is asking whether there should be an identifier at all.

Avoid Naming

Sometimes, the additional cognitive load is not worth it.

# Bad
chairs = fetch_chairs()
sorted_chairs = chairs.sort()
# also, now have to use the longer identifier in the code below

# Good
chairs = fetch_chairs().sort()

Apply your judgement, of course. If it’s a 20-step process, additional identifiers in the middle do contribute to understanding. You still probably don’t want an identifier for each and every step of the calculation.

Do Name – Magic Numbers

Avoid magic numbers through naming. Please ignore whether result is a good name 🙂

# Bad
if result == 126 { ... }

# Good
NOT_EXECUTABLE = 126; # Or better, part of an Enum

if result == NOT_EXECUTABLE { ... }
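
For example, the Enum variant mentioned in the comment could look like this in Python (a sketch; ExitCode and the surrounding code are made up):

from enum import IntEnum

class ExitCode(IntEnum):
    NOT_EXECUTABLE = 126  # shell convention: command found but cannot be executed

result = 126
if result == ExitCode.NOT_EXECUTABLE:
    print("permission or format problem")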

Do Name – Repetitive Code

If you notice code that repeats, then with rare exceptions you should refactor, extracting the repeating code into a function or a method with a name.

Several Identifiers

Sometimes a function, a method, or a class does several things. In this case, you might struggle to name it. In a perfect world, the solution is refactoring it into appropriately named pieces.

Test Your Naming

You just named something: a function, a method or a class. Is there a change around the code that would make the name wrong? What if you copy+paste the named piece of code to another project? Would you need to change the name?

# Bad
function start_yellow_cars(cars) { ... } # The function doesn't know or care about the color
yellow_cars = ...
start_yellow_cars(yellow_cars)

# The change that would highlight the wrong naming
# while keeping the code completely functional
function start_yellow_cars(cars) { ... }
my_cars = ...
start_yellow_cars(my_cars)


# Good
function start_cars(cars) { ... }
yellow_cars = ...
start_cars(yellow_cars)

Common Naming Mistakes Observed

  1. Naming a data structure with “JSON” in the name.
  2. Using “argument” and “parameter” interchangeably (see the dedicated post below).

Tooling

I highly recommend using IDEs that “understand” the code well enough to refactor/rename (classes, methods, functions, parameters), as opposed to text editors, which cannot assist with renaming to the same extent.


Hope this helps. Happy naming!

gRPC + REST API on AWS

Showing a setup of a gRPC service which is also exposed as a REST API. It’s a setup that happens to work for us. No alternatives will be discussed in this post.

This is a concise blog post.

Architecture

  1. ALB with HTTPS listener (trivially configured, out of scope of this post)
  2. ECS running a task with 3 containers:
    • API Gateway, implemented by Envoy, which:
      • authorizes requests using the service in the next container
      • proxies gRPC requests
      • proxies REST requests (converting them to upstream gRPC requests)
    • Authorization service, implemented with OPA
    • Our gRPC application

Notes

Health checks are not in very good shape yet.

ECS Configuration (Simplified Excerpt)

In case the reader is not familiar, it’s CloudFormation below.

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: apigw
          Image: !Ref ApiGwImage
          PortMappings:
            - ContainerPort: !Ref ContainerPort
        - Name: opa
          Image: !Ref OpaImage
          PortMappings:
            - ContainerPort: 9191
        - Name: app
          Image: !Ref AppImage
          PortMappings:
            - ContainerPort: 4000

  Service:
    DependsOn:
      - GrpcListenerRule
      - RestListenerRule
      - GrpcTargetGroup
      - RestTargetGroup
    Type: AWS::ECS::Service
    Properties:
      ServiceName: !Ref ServiceName
      Cluster: !Ref Cluster
      TaskDefinition: !Ref TaskDefinition
      LoadBalancers:
        - ContainerName: apigw
          ContainerPort: !Ref ContainerPort
          TargetGroupArn: !Ref GrpcTargetGroup
        - ContainerName: apigw
          ContainerPort: !Ref ContainerPort
          TargetGroupArn: !Ref RestTargetGroup

  GrpcTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 10
      HealthCheckPath: /
      HealthCheckTimeoutSeconds: 5
      Matcher:
        GrpcCode: "0-99"
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      Port: !Ref ContainerPort
      Protocol: HTTP
      ProtocolVersion: GRPC
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 60 # default is 300
      TargetType: ip
      VpcId: !ImportValue VpcId

  RestTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 10
      HealthCheckPath: /rest/not-found
      HealthCheckTimeoutSeconds: 5
      Matcher:
        HttpCode: 404
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      Port: !Ref ContainerPort
      Protocol: HTTP
      ProtocolVersion: HTTP1
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 60 # default is 300
      TargetType: ip
      VpcId: !ImportValue VpcId

  GrpcListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref GrpcTargetGroup
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - '/censored.v1.CensoredService/*'
              - '/censored.v1.CensoredAdminService/*'
              - '/censored.v1.CensoredSystemService/*'
      ListenerArn: ...
      Priority: 1000

  RestListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref RestTargetGroup
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - '/rest/v1/*'
      ListenerArn: ...
      Priority: 1001

Envoy Configuration (Simplified Excerpt)

static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 8000
      filter_chains:
        - filters:
            - name: Connection Manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                via: CensoredGW
                route_config:
                  name: Static response for tests
                  virtual_hosts:
                    - name: backend
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/test/static"
                          direct_response:
                            status: 200
                            body:
                              inline_string: "Static response for tests"
                        # Reference: https://envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_json_transcoder_filter#route-configs-for-transcoded-requests
                        - match:
                            prefix: "/"
                          route:
                            cluster: upstream
                            timeout: 60s
                http_filters:
                  - name: envoy.filters.http.grpc_json_transcoder
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
                      # maybe disable later:
                      auto_mapping: true
                      proto_descriptor: "../path/to/proto_descriptor.bin" ### See next heading in this post
                      services:
                        - censored.v1.CensoredService
                        - censored.v1.CensoredAdminService
                        - censored.v1.CensoredSystemService
                      print_options:
                        add_whitespace: true
                        always_print_primitive_fields: true
                      request_validation_options:
                        reject_unknown_method: true
                        reject_unknown_query_parameters: true
                  - name: envoy.filters.http.cors
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                  - name: envoy.ext_authz
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
                      failure_mode_allow: false
                      with_request_body:
                        max_request_bytes: 10485760 # 10M
                        allow_partial_message: false
                        pack_as_bytes: true
                      transport_api_version: V3
                      grpc_service:
                        envoy_grpc:
                          cluster_name: opa-agent
                        timeout: 10s
                  - name: envoy.filters.http.router
                    # https://github.com/envoyproxy/envoy/issues/21464
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                always_set_request_id_in_response: true
                access_log:
                  - typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      # https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#config-access-log-default-format

  # Based on https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_json_transcoder_filter
  clusters:
    - name: opa-agent
      connect_timeout: 0.25s
      type: STRICT_DNS
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: { }
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 9191
    - name: upstream
      type: STRICT_DNS
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 4000

proto_descriptor.bin

GrpcJsonTranscoder must have the proto descriptor file in order to know how to transcode. The file contains:

  1. proto definitions of your services, including the extension that describes how to expose the services as REST
  2. dependencies of the above proto definitions

The descriptor file is generated using a command similar to the following:

buf build -o proto_descriptor.bin --as-file-descriptor-set --path path/to/my.proto

buf is a way to manage .proto files and their dependencies (a very imprecise definition, sorry).

If I remember correctly, you can generate the descriptor with protoc (without buf) but I don’t remember how.

grpcurl

The same descriptor file is used with grpcurl when you later test your service from the command line:

grpcurl -H "Authorization: Bearer ..." -protoset proto_descriptor.bin "example.com:443" censored.service.name/MyFunc

my.proto

This is how a protobuf definition with the REST extension looks (excerpt):

import "google/api/annotations.proto";

service Censored {
  rpc MyCreate(CreateRequest) returns (CreateResponse){
    option (google.api.http) = { post: "/rest/v1/my-objs" };
  }
  rpc MyGet(GetRequest) returns (GetResponse) {
    option (google.api.http) = { get: "/rest/v1/my-objs/{id}" };
  }
}
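
With the mapping above, a plain HTTP client can call the service and Envoy transcodes the request into a MyCreate gRPC call. A sketch in Python (the URL, fields, and token are hypothetical; requests is a third-party library):

import requests

resp = requests.post(
    "https://example.com/rest/v1/my-objs",    # path from the mapping above
    json={"name": "test"},                    # transcoded into CreateRequest
    headers={"Authorization": "Bearer ..."},  # checked by the ext_authz/OPA filter
)
print(resp.status_code, resp.json())          # CreateResponse as JSON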

Excerpt from buf.yaml corresponding to the import above:

version: v1

deps:
  - buf.build/googleapis/googleapis


Hope this helps.

Sorry, I was in a rush to get this out. If anything is unclear or missing, please let me know.

The Case for Concise Posts

According to DiSC, about a quarter of all people communicate like me. We want information, not fluff or stories. We are here to get the answer to our question: “what’s X?” (a technology, a format, a piece of software, etc.). Yet the number of blog posts that answer “what’s X?” concisely is roughly zero. I am going to fix this with my future posts as time allows. Stay tuned.

Everything below is implementation detail. You can stop reading here and save a few minutes.

Here is my plan. Feel free to use it as a guideline for your blog posts too.

Discretion

We are looking for the author’s discretion about what’s important (hint: typically concepts and architecture), not a dump of everything the author knows about the topic. We are here for “I would have written a shorter letter, but I did not have the time”. Yes, that 5-minute read should take hours if not days to write. Otherwise, what’s the value?

Can’t answer the question concisely? We doubt your understanding of the topic.

Context

“Context is important, the blog post must provide context, blah blah …”.

The context was already established by searching “what’s X” or following a link to the post. It’s annoying when the text starts with fluff, keeping us wondering when and whether our question will be answered.

If there is some *really* important context, that’s not 5 paragraphs. Sorry, storytellers.

Show why the Topic is Important

“You need to show why X is important and where it’s used”.

This information is available in every other blog post and/or on Wikipedia and/or is one search away.

Underlying Concepts

Don’t explain the underlying concepts; link to them (like DiSC above). Don’t waste our time if we already know them.


Stay tuned.

Event Loop for Beginners

The aim of this post is to give a simple, concise, and high-level description of the event loop. I’m intentionally leaving out many details and being somewhat imprecise.

If you need the detailed picture, follow the links in this post. In that case, I recommend reading this short post first.

Why Event Loop?

To deal with concurrency (multiple “things” needing to happen at what looks like the same time), there are a few models. To discuss the models, we need to know what a thread is.

Roughly, a thread is a sequence of computing instructions that runs on a single CPU core.

Roughly, the Event Loop manages which tasks the thread works on, and in which order.

Queue

The queue contains what different sources call messages, events, or tasks.

A message/event/task is a reference to a piece of code that needs to be scheduled to run. Example: “the code responsible for handling the click of button B, plus full details of the click event to pass to that code”.

Event Loop

  1. The queue is checked for new tasks. If there are none, we wait for one to appear.
  2. The first task in the queue is scheduled and starts running. The code runs to completion. In many languages, the await keyword counts as completion, and everything after await is scheduled to run later as a new task in the queue.
  3. Repeat from step 1. That’s the loop. It’s called the Event Loop because it processes events from the queue in a loop.

Adding Events to the Queue

Tasks are added to the queue for two reasons:

  1. Something happened (the user clicked a button, a network packet arrived, etc.).
  2. The code that was running in step 2 of the event loop scheduled something to run later.
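
Here is a minimal sketch of the above in Python (illustration only: a real event loop waits for new tasks instead of exiting, and tasks originate from I/O, timers, and UI events):

import collections

queue = collections.deque()

def schedule(task):
    # Reason 2 above: running code schedules something to run later
    queue.append(task)

def run():
    # Step 1: check the queue for tasks (here we stop when empty instead of waiting)
    while queue:
        task = queue.popleft()
        task()  # Step 2: run the task to completion
        # Step 3: repeat - that's the loop

def first():
    print("first part")
    # Like code after await: the rest is scheduled as a new task
    schedule(lambda: print("second part, runs later"))

schedule(first)
run()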

See Also

  1. Event Loop documentation at MDN.
  2. What is the difference between concurrency and parallelism? at StackOverflow

Hope this helps with a high-level understanding of the Event Loop. Have a nice day!

AWS CDK Opinionated Pipeline – Where is What?

Background

You use AWS CDK. It’s great. It does a lot for you. Then one day something goes wrong. OK, it hasn’t happened yet, but you want to be prepared for that (at least to some extent). The following is the information I found while preparing. Sharing it to hopefully save the reader some time.

Basics

Before we dive in, let’s make sure we’ve got the basics covered.

cdk ls

cdk ls lists all the stacks in the app, including the pipeline.

Example from my test project:

$ cdk ls
Proj1Stack
Proj1Stack/Deploy1/LambdaStack1
Proj1Stack/Deploy2/LambdaStack1
  • Proj1Stack is the pipeline.
  • Deploy1 and Deploy2 are “stages”.
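
For reference, a CDK app that produces a listing like the above looks roughly as follows (a Python sketch, not the actual project; the repository and connection ARN are placeholders):

from aws_cdk import App, Stack, Stage, pipelines
from constructs import Construct

class LambdaStack1(Stack):
    # Resources omitted; only the structure matters for the listing
    pass

class Deploy(Stage):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        LambdaStack1(self, "LambdaStack1")

class Proj1Stack(Stack):
    def __init__(self, scope: Construct, id: str, **kwargs):
        super().__init__(scope, id, **kwargs)
        pipeline = pipelines.CodePipeline(
            self, "Pipeline",
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.connection(
                    "my-org/my-repo", "main",
                    connection_arn="arn:aws:codestar-connections:...",  # placeholder
                ),
                commands=["npm ci", "npx cdk synth"],
            ),
        )
        # Each added stage becomes the pair of CloudFormation deploy actions
        # described later in this post
        pipeline.add_stage(Deploy(self, "Deploy1"))
        pipeline.add_stage(Deploy(self, "Deploy2"))

app = App()
Proj1Stack(app, "Proj1Stack")
app.synth()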

cdk synth

cdk synth $STACK_NAME >1.yaml is your friend, a debugging tool. It shows the generated CloudFormation.

cdk.out directory

cdk.out is the directory where cdk synth outputs everything that’s needed for deploying (CloudFormation templates, related assets, metadata). They call it a Cloud Assembly.

All assets are named based on the hash of their content so they are unique and immutable.

How Does the Generated Pipeline Look?

When you use an opinionated pipeline, you can see the following generated CodePipeline actions:

  • Source (with long hash as output artifact name)
  • Build with name Synth (a CodeBuild project that runs cdk synth)
  • Build with name SelfMutate (a CodeBuild project that runs cdk deploy to update the pipeline)
  • Build with name FileAsset1 (a CodeBuild project that runs cdk-assets publish). From reading the sources: there might be several cdk-assets publish commands configured in the CodeBuild project.
  • Then two CloudFormation deploy actions per “stage” you want to deploy to (using change sets is the default, but that can be disabled as per the documentation; see useChangeSets):
    • CHANGE_SET_REPLACE
    • CHANGE_SET_EXECUTE

cdk-assets

“It will take the assets listed in the manifest, prepare them as required and upload them to the locations indicated in the manifest.”

Note that cdk-assets is not making any decisions; metadata in the cdk.out directory has the information about assets, how to build/transform them and where they go.

cdk-assets can only handle two types of assets:

  • files (including directories). cdk-assets knows how to zip directories and how to upload files and directories to S3.

    (From reading the source code) I didn’t see it in use, but apparently cdk-assets can also run an executable to package a file (or directory?). In this case the content type of the output is assumed to be application/zip. 🤷‍♂️
  • Docker images. cdk-assets knows how to build Docker images and push them into a registry.

Sample command to see the list of assets: npx cdk-assets ls -p cdk.out/Proj1Stack.assets.json

What is Built When?

Files – unprocessed

If the files/directories don’t need any processing, they are just copied over to cdk.out during cdk synth and given a name which is a hash of the contents.

Example: Lambda function code

Files – processed

The processing happens during cdk synth, so cdk.out contains the already-processed assets.

Example: Node.js Lambda function code (processed by tsc (optionally) and esbuild)

Docker Images

cdk-assets builds a Docker image and pushes it into the specified repository. The input for the image build is a directory in cdk.out which contains the Dockerfile and related files.

Deploy

After everything is built and uploaded by cdk synth and cdk-assets, the deploy uses the CloudFormation template (templates?) from the cdk.out directory. At this point the assets (which the template references) are already in ECR and S3.


  • I tried to condense the information that seemed important.
  • Let me know if something is missing or if you see mistakes.
  • The plan is to update this post as I discover new information.
  • My mistakes so far
    • Started taking notes quite a few hours into the process instead of from the start. In particular, notes from the start would have saved me the jumping between the pipeline and the build projects to re-check what each action does.
    • Editing this post while tired
  • Last edit: 2023-01-20

AWS CDK – Proposed Slogans

Below, despite the humor, is my honest praise to the AWS CDK team and the product.

  1. Finally bringing code into “infrastructure as code”
    (sorry Puppet, Ansible, CloudFormation, SAM, Terraform)
  2. The only team at AWS that actually cares about your experience
  3. Suffer much less
  4. No more dealing with IAM policies anymore*
    * almost
  5. Did you know that CodePipeline actually requires an S3 bucket to work?
  6. CloudFormation? Ye, nice intermediate representation, you know, like assembler with macros.
  7. Making interaction with AWS bearable
    (I would say “again” but it never was)
  8. So right on so many levels
  9. Cloud – it doesn’t have to be ugly
  10. CDK – Cool Developers Know
  11. Isolating you from the ugly
  12. We ate the shit so you wouldn’t have to*
    * mostly
  13. Don’t Look Up^W at the generated CloudFormation

Have a nice day!

Arguments and Parameters

These two words are used interchangeably. Please don’t. They mean different things. Here is my concise explanation.

Argument

A value passed into a function/method during invocation.

my_func(arg1, arg2)

Additional names for “argument” are “actual argument” and “actual parameter”.

Parameter

A name of a variable in the function/method definition. During invocation, the variable is used in the function/method body to refer to the value of the passed argument.

F my_func(param1, param2) {
  ...
  # Using param1 and param2 for a computation
  ...
}

Additional name for “parameter” is “formal argument”.

Tip – Parametrize

If you struggle to remember which is which, this might help: when you “parameterize” a piece of code, you add parameters to it. You then have the code with the parameter used in it, and the first occurrence is in the function/method definition.

# Initial version

echo("Hello, Joe")

# Parametrized version. "name" is a parameter.

F hello(name) {
  echo("Hello, ${name}")
}


Hope this helps! Have a nice day!


Updates after Reddit discussion:

  • I never asked the difference as an interview question. If I did:
    • Getting this wrong – tiny negative point
    • Not understanding why using correct terminology matters – big negative point
    • Understanding the difference and using these words interchangeably (knowingly incorrectly) – huge negative point
    • Providing fake facts to support your opinion that these words are interchangeable – huge negative point
  • Explaining why using correct terminology matters is out of scope of this post

Telegraph and the Unix Shell

Following is my opinion of the interactive mode of the Unix shell: it is not much different from using a telegraph. I’ll substantiate the claim and point out what went wrong.

History

A short history of relevant inventions and events follows.

Telegraph – 1840s

Telegraph is a “point-to-point text messaging system”

Teleprinter – 1887

The teleprinter improved over the telegraph’s user interface by adding a keyboard. Essentially, it’s still a text messaging system.

“Computers used teleprinters for input and output from the early days of computing.”

Computer Terminal with VDU – 1950s

A computer terminal with a “video display unit (VDU)” improved over the teleprinter by replacing the printer with a screen.

(Summary Till This Point)

We see incremental progress in transmitting text: first between humans, and later between a human and a computer. The core concept did not change over time: send text, receive text in response.

VT 52 – 1974/1975

The VT 52 was released. It was a pretty big deal because it supported cursor movement, so text on the screen could be updated. In more general terms, it allowed interaction with what’s on the screen.

Bill Joy – vi – 1976

Bill Joy released vi in 1976. He figured that since cursor movement was supported, he could make an editor that actually uses the whole screen interactively. That’s in contrast to the ex editor, which didn’t.

What Went Wrong with the Shell?

The shell never caught up with the idea of interactivity on the screen. It’s still a “point-to-point text messaging system”. The paradigm shift never happened. Treating the received text as if it were printed on paper puts the idea of interacting with the output completely out of the realm of possibilities.

The “interactive” shell is not all that interactive. It is only “interactive” compared to “batch”. If you take a 24-line terminal, considering that all interaction happens on one line (the “command line”, I guess), that’s about 4% interactivity by line count.

Practical Consequences

Copy + Paste

Does the following sequence of operations happen to you when working in a shell?

  1. Type and run a command
  2. Start typing a second command
  3. Copy something from the output of the first command
  4. Paste that as an argument to the second command

I know that it happens to me quite often. Why can’t the shell (tab-)complete from the output of the first command? Because the shell has no idea (1) what’s in that output, and (2) what the semantic meaning of that output is and how to use it for completion.

Imagine for a moment a website where you see a list of items, but if you want more information about one of them, you need to copy its name and paste it somewhere else, adding a word or two. Sounds weird? How is this OK in a shell then?

For the “shell is not supposed to do that” camp: you are welcome to continue to use Nano, Pico and Notepad for programming because “text editor is not supposed to do that”; I’ll use an IDE because it’s more productive.

Each Command on its own

Typically, a shell user tries to achieve a goal. For that, the user typically runs a series of related commands. The Unix shell doesn’t know or care about that.

Let’s say you issued ls *.txt and you see the file that you were interested in in the output. You can’t interact with it despite it being on the screen right in front of you, so you proceed. Now you type cp and press tab for completion. The completion will happily try to complete the names of all the files in the directory, not only the *.txt files, which are way more likely to be what you meant. There isn’t any relation between sequentially issued commands in the Unix shell. Want composition? OK, use a pipe or start scripting.


I might expand this article in the future.

The new Life of tap()

Background

I’m designing and implementing Next Generation Shell, a programming language (and a shell) for “DevOps” tasks (read: tasks where running external commands and manipulating data are frequent).

I came across a programming pattern (let’s call it P) as follows:

  1. An object is created
  2. Some operations are performed on the object
  3. The object is returned from a function (less frequently – stored in a variable)

P Using Plain Approach

In NGS, the typical code for P looks like the following:

F my_func() {
  my_obj = MyType()
  my_obj.name = "blah"
  my_obj.my_method(...)
  my_obj  # last expression is evaluated and returned from my_func()
}

The above looks repetitive and not very elegant. Given the frequency of the pattern, I think it deserves some attention.

Attempt 1 – set()

In the simpler but pretty common case where only assignment to fields is required after creating the object, one can use set() in NGS:

F my_func() {
  MyType().set(name = "blah")
}

or, for multiple fields:

F my_func() {
  MyType().set(
    name = "blah"
    field2 = 100
    field3 = "you get the idea"
  )
}

Side note: arguments to methods can be separated by commas or newlines, as in the example above.

I feel quite OK with the above, but the cons are:

  1. Calling a method is not supported (unless that method returns the original object, in which case one could do MyType().set(...).my_method())
  2. Setting of fields cannot be interleaved in a straightforward manner with arbitrary code (for example, to compute the fields’ values)

Attempt 2 – tap()

I’m familiar with tap() from Ruby. It looked quite useful, so NGS has had tap() for quite a while too. Here is how P would look in NGS when implemented with tap():


F my_func() {
  MyType().tap({
    A.name = "blah"
    A.my_method()
  })
}

Tap takes an arbitrary value, runs the given callback (passing that value as the only argument) and returns the original value. It is pretty flexible.
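
For readers unfamiliar with Ruby’s tap(), here is the idea as a minimal Python sketch:

def tap(value, callback):
    # Run the callback for its side effects, then return the original value
    callback(value)
    return value

nums = tap([3, 1, 2], lambda lst: lst.sort())
print(nums)  # [1, 2, 3] - the original list, not None (which list.sort() returns)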

I can’t put my finger on what exactly is bothering me here, but the fact is that I was not using tap() to implement P.

Attempt 3 – expr::{ … }

New Life of tap()

This one is very similar to tap(), but it is syntactically distinct.

F my_func() {
  MyType()::{
    A.name = "blah"
    # arbitrary code here
    A.my_method()
  }
}

I think the main advantage is that P is easily visually distinguishable. For example, if you only want to know the type of the returned expression, you can relatively easily skip everything between ::{ and }. A secondary advantage is that it’s slightly less cluttered than tap().

Let’s get into the details of how the above works.

Syntax

  1. MyType() in our case is an expression. Happens to be a method call which returns a new object.
  2. :: – namespace field access operator. Typical use case is my_namespace::my_field.
  3. { ... } – anonymous function syntax. Equivalent to a function with three optional parameters (A, B, and C, all default to null).

Note that none of the three syntax elements above is unique to this combination. Each one of them is used in other circumstances too.

Until recently, the :: syntax did not allow an anonymous function as the second argument. That went against NGS design: all methods should be able to handle as many types of arguments as possible. Limiting argument types syntactically was certainly wrong for NGS.

Semantics

In NGS, any operator is transformed into a method call. :: is no exception. When e1::e2 is encountered, it is translated into a call to the method :: with two arguments: e1 and e2.

NGS relies heavily on multiple dispatch. Let’s look at the appropriate definition of the :: method from the standard library:

F '::'(x, f:Fun) {
  f(x)
  x
}

Not surprisingly, the definition above is exactly like the definition of F tap() ... (sans method and parameter naming).

Examples of expr::{ … } from the Standard Library

# 1. Data is an array. Each element is augmented with _Region field.
data = cb(r)::{
  A._Region = ConstIter(r)
}


# 2. push() returns the original object, which is modified in { ... }
F push(s:Set, v) s::{ A.val[v] = true }


# 3. each() returns the original object.
# Since each() in { ... } would return the keys() and not the Set,
# we are working around that with s::{...}
F each(s:Set, cb:Fun) s::{ A.val.keys().each(cb) }


# 4. Return what c_kill() returns unless it's an error
F kill(pid:Int, sig:Int=SIGNALS.TERM) {
  c_kill(pid, sig)::{
    A == -1 throws KillFail("Failed to kill pid $pid with signal $sig")
    A != 0 throws Error("c_kill() did not return 0 or -1")
  }
}

Side note: the comments are for this post; the standard library has more meaningful, higher-level comments.

A Brother Looking for Use Cases

While changing the syntax to allow an anonymous function after ::, another change was also made: allowing an anonymous function after . so that one could write expr.{ my arbitrary code }. The whole expression returns whatever the arbitrary code returns. Unfortunately, I have not come across (or maybe haven’t noticed) real use cases. The corresponding . method in the standard library is defined as follows:

F .(x, f:Fun) f(x)

# Allows
echo(5.{ A * 2 })  # 10

Have any use cases which look less stupid than the above? Let me know.

The Original Sin in IT

You have a program with human-readable output. Now a second program needs that information. There are two choices:

  1. Do the technically right and challenging thing: create a protocol, rewrite the first program, and then write the second program (maybe utilizing something like libxo in the first program).
  2. Fuck everybody for the decades to come by deciding that parsing text is the way to go.

I would like to thank everybody involved in choosing number 2!

jc is an effort to fix (or rather, work around) that decision. Thanks to the author, and I hope it will make your DevOps life at least a bit more tolerable.
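
To make the contrast concrete, here is a small Python sketch (assumes the jc CLI is installed; it ships an ls parser):

import json
import subprocess

# Fragile: scraping human-readable text; breaks on filenames with spaces,
# different locales, different ls implementations, ...
out = subprocess.run(["ls", "-l"], capture_output=True, text=True, check=True).stdout
names = [line.split()[-1] for line in out.splitlines() if not line.startswith("total")]

# Workaround via jc: the text is parsed once, centrally, into JSON
out = subprocess.run("ls -l | jc --ls", shell=True, capture_output=True,
                     text=True, check=True).stdout
names = [entry["filename"] for entry in json.loads(out)]
print(names)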