Empty Arrays for Cleaner Code

Background

Let’s look at the following piece of code (in an abstract programming language):

if chairs {
  for chair in chairs {
    clean(chair)
  }
}

The Problem

It’s more code than we would like to have. What’s holding us back? chairs can be null, which is not an array (aka list), therefore requiring the wrapping if. Without that if, for can fail with an error because it won’t be able to iterate over chairs (when it’s null).

Side note: any time a variable can be of different types (in our case chairs can be an array or null), there will be some code to deal with this situation. Try to avoid these situations.

Solution

Decide on a code convention that says that absence of items is represented by an empty array, not null. Your code will become more uniform and you will get rid of the if. Then the code above becomes:

for chair in chairs {
  clean(chair)
}

Please note that this solution is a tradeoff. If memory usage is very important in the application, this might not be a good solution.
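
A minimal Python sketch of the convention. The names fetch_chairs, CHAIRS_BY_ROOM and clean_room are hypothetical and only serve the illustration. The absent case is normalized to an empty list in one place, so callers can loop without a guard:

```python
from typing import Dict, List

# Hypothetical storage: a room may have no entry at all.
CHAIRS_BY_ROOM: Dict[str, List[str]] = {"kitchen": ["stool", "armchair"]}


def fetch_chairs(room: str) -> List[str]:
    """Return the chairs in a room; absence is an empty list, never None."""
    return CHAIRS_BY_ROOM.get(room, [])


def clean(chair: str) -> str:
    return f"clean {chair}"


def clean_room(room: str) -> List[str]:
    # No "if chairs" guard: iterating over an empty list simply does nothing.
    return [clean(chair) for chair in fetch_chairs(room)]
```

Calling clean_room() for a room without chairs returns an empty list instead of failing.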


Hope this helps. Have a nice day!

Update 2023-07-13

Real code example:

const tags = describeImagesCommandOutput.Images[0].Tags ?? [];
console.log('tags', tags);
const newTags = tags.filter(tag => tag.Key && TAGS_TO_COPY.has(tag.Key));
console.log('newTags', newTags);

My advice above would eliminate the ?? [] part.

Naming in Software – Practical Guide

The title of the post is the title of the book that I wanted to publish for quite some time now. While I was thinking about phrasing and gathering content, somebody else beat me to it with Naming Things: The Hardest Problem in Software Engineering. The main issue that I wanted to solve is now solved. Programmers don’t have an excuse for poor naming anymore.

In light of this event, I’ve decided to make a small complementary post out of the materials that I have gathered and move on, focusing on Next Generation Shell.

Me and Naming

I have over 20 years of professional experience in programming. During that time, like many others, I’ve also noted the struggle when it comes to naming.

Here is a list of my accepted naming contributions to various projects.

  1. iterators – function shoes_in_my_size naming 2020-02-16, “The Rust Programming Language” book
  2. Constructors – Get_Contents() method is misnamed 2020-02-23, MS C++ Documentation
  3. Rename howMany() to countSelected() 2023-01, MDN
  4. nilJson naming issue in readme 2023-04, Otterize

Naming Things, the Book

I skimmed Tom’s book to understand how similar it was to what I was about to write. Quite close. If you are struggling with naming, go and read it.

There is some amount of fluff, which I think my book would have had less of. Example: convincing people that naming is important while they are already reading the book.

Overall, I do recommend the book though.

Especially I recommend this book to AWS as an organization, I guess along with other books about code quality in general. AWS, your de-prioritization of code quality is staggering. I mean observable output here, not the stated “Insist on the Highest Standards”.

6.2.6 Use accurate parts of speech

Adding a negative example from AWS:

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/crpg-ref-responses.html

SUCCESS and FAILED are not the same part of speech.

7.2.3 Omit Metadata

An additional reason to exclude the data type from the name is to avoid changes anywhere in the code other than the declaration.

elt in items

Is items a list here or a set? Probably not important, the code should work with either. On the other hand, changing the data type from list to set in the following example will leave the code with an incorrect name:

elt in items_list
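
A small Python sketch of the same point; count_known is a hypothetical helper. The membership test works for a list and for a set, so a name without the data type stays accurate when the declaration changes:

```python
from typing import Collection, Iterable


def count_known(candidates: Iterable[str], items: Collection[str]) -> int:
    """Count candidates present in items; items may be a list, set, or tuple."""
    return sum(1 for elt in candidates if elt in items)


# The same call works for either container, and the name "items" stays accurate.
as_list = count_known(["a", "b", "z"], ["a", "b", "c"])
as_set = count_known(["a", "b", "z"], {"a", "b", "c"})
```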

8.2.4 Use similar names for similar concepts

Adding a negative example from AWS, which time after time fails to give consistent names across its APIs.

How do you limit the number of results from an API? MaxResults, maxResults, MaxRecords, MaxItems, Limit, limit, … Details at AWS API pagination naming.

It looks like consistent naming is valued less than independence of teams and ability of teams to perform uncoordinated work.

8.2.5 Use consistent antonyms

Adding an example.

When I had to name the Option type (which represents a container that can hold a value or can be empty) in Next Generation Shell, I went with straightforward antonyms.

  • Box (super type)
  • EmptyBox
  • FullBox

Authors of other languages preferred other naming conventions:

Scala: Option, None, Some

Haskell: Maybe, Nothing, Just

Perspective

Information Loss

From my perspective, giving inadequate names is part of a larger issue – Information Loss. Each time you give a name, think about which information currently in your head will be helpful to the reader of the code. If you don’t phrase it concisely and precisely, information loss occurs between your head and the code you are working on. There are several common types of errors one can make:

  1. Don’t provide enough information. Causes the reader to investigate in order to recover the information.
  2. Provide wrong information. That’s the worst, it’s misleading the reader.
  3. Provide too much information. The reader then must sift through the information to get to the relevant parts.

API

Sometimes it’s useful to think of methods as an API. That’s why method names shouldn’t include implementation details (with rare exceptions when they are important to the caller). Think of methods’ names and parameters’ names as a short version of API specification.

Identifiers

Tom’s book deals with naming identifiers, such as functions, classes, variables, etc. One step before naming an identifier is the question whether there should be an identifier.

Avoid Naming

Sometimes, the additional cognitive load is not worth it.

# Bad
chairs = fetch_chairs()
sorted_chairs = chairs.sort()
# also, now have to use the longer identifier in the code below

# Good
chairs = fetch_chairs().sort()

Apply your judgement of course. If it’s a 20 step process, additional identifiers in the middle do contribute to understanding. You still probably don’t want an identifier for each and every step of the calculation.

Do Name – Magic Numbers

Avoid magic numbers through naming. Please ignore whether result is a good name 🙂

# Bad
if result == 126 { ... }

# Good
NOT_EXECUTABLE = 126; # Or better, part of an Enum

if result == NOT_EXECUTABLE { ... }
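
The “part of an Enum” variant could look like the following Python sketch. ExitCode and describe are hypothetical names; the values 126 and 127 follow the common shell convention for “found but not executable” and “command not found”:

```python
from enum import IntEnum


class ExitCode(IntEnum):
    """Hypothetical enum; 126 and 127 follow the common shell convention."""
    SUCCESS = 0
    NOT_EXECUTABLE = 126
    COMMAND_NOT_FOUND = 127


def describe(result: int) -> str:
    # The comparison names the meaning instead of repeating the number.
    if result == ExitCode.NOT_EXECUTABLE:
        return "command was found but is not executable"
    if result == ExitCode.COMMAND_NOT_FOUND:
        return "command was not found"
    return "other"
```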

Do Name – Repetitive Code

If you notice code that repeats, with rare exceptions, you should refactor your code extracting that code to a function or a method with a name.

Several Identifiers

Sometimes a function, a method, or a class does several things. In this case, you might struggle to name it. In a perfect world, the solution to this is refactoring into appropriate pieces.

Test Your Naming

You just named something: a function, a method or a class. Is there a change around the code that would make the name wrong? What if you copy+paste the named piece of code to another project? Would you need to change the name?

# Bad
function start_yellow_cars(cars) { ... } # The function doesn't know or care about the color
yellow_cars = ...
start_yellow_cars(yellow_cars)

# The change that would highlight the wrong naming
# while keeping the code completely functional
function start_yellow_cars(cars) { ... }
my_cars = ...
start_yellow_cars(my_cars)


# Good
function start_cars(cars) { ... }
yellow_cars = ...
start_cars(yellow_cars)

Common Naming Mistakes Observed

  1. Naming a data structure with “JSON” in name.
  2. Argument vs Parameter

Tooling

I highly recommend using IDEs that “understand” the code enough to be able to refactor/rename (classes, methods, functions, parameters) as opposed to text editors which cannot assist with renaming to the same extent.


Hope this helps. Happy naming!

Arguments and Parameters

These two words are used interchangeably. Please don’t. They mean different things. Here is my concise explanation.

Argument

A value passed into a function/method during invocation.

my_func(arg1, arg2)

Additional names for “argument” are “actual argument” and “actual parameter”.

Parameter

A name of a variable in the function/method definition. During invocation, the variable is used in the function/method body to refer to the value of the passed argument.

F my_func(param1, param2) {
  ...
  # Using param1 and param2 for a computation
  ...
}

Additional name for “parameter” is “formal argument”.

Tip – Parametrize

If you struggle to remember which one is which, this might help: when you “parameterize” a piece of code, you add parameters to the code. Then you have the code with the parameter used in it, with the first occurrence in the function/method definition.

# Initial version

echo("Hello, Joe")

# Parametrized version. "name" is a parameter.

F hello(name) {
  echo("Hello, ${name}")
}
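
The same distinction in a short Python sketch (the greeting parameter is added here only for illustration). Python’s own inspect module uses the word “parameters” for the names in a definition:

```python
import inspect


def hello(name, greeting="Hello"):  # "name" and "greeting" are parameters
    return f"{greeting}, {name}"


# "Joe" is an argument: a value passed during invocation.
message = hello("Joe")

# The standard library uses the same vocabulary: a signature has parameters.
parameter_names = list(inspect.signature(hello).parameters)
```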


Hope this helps! Have a nice day!


Updates after Reddit discussion:

  • I never asked the difference as an interview question. If I would:
    • Getting this wrong – tiny negative point
    • Not understanding why using correct terminology matters – big negative point
    • Understanding the difference and using these words interchangeably (knowingly incorrectly) – huge negative point
    • Providing fake facts to support your opinion that these words are interchangeable – huge negative point
  • Explaining why using correct terminology matters is out of scope of this post

The new Life of tap()

Background

I’m designing and implementing Next Generation Shell, a programming language (and a shell) for “DevOps” tasks (read: running external commands and data manipulation are frequent).

I came across a programming pattern (let’s call it P) as follows:

  1. An object is created
  2. Some operations are performed on the object
  3. The object is returned from a function (less frequently – stored in a variable)

P Using Plain Approach

The typical code for P looks in NGS like the following:

F my_func() {
  my_obj = MyType()
  my_obj.name = "blah"
  my_obj.my_method(...)
  my_obj  # last expression is evaluated and returned from my_func()
}

The above looks repetitive and not very elegant. Given the frequency of the pattern, I think it deserves some attention.

Attempt 1 – set()

In the simpler but pretty common case when only assignment to fields is required after creating the object, one could use set() in NGS:

F my_func() {
  MyType().set(name = "blah")
}

or, for multiple fields:

F my_func() {
  MyType().set(
    name = "blah"
    field2 = 100
    field3 = "you get the idea"
  )
}

Side note: parameters to methods can be separated by commas or new lines, like in the example above.

I feel quite OK with the above but the cons are:

  1. Calling a method is not supported (unless that method returns the original object, in which case one could MyType().set(...).my_method())
  2. Setting of fields can not be interleaved in a straightforward manner with arbitrary code (for example to calculate the fields’ values)

Attempt 2 – tap()

I’m familiar with tap() from Ruby. It looked quite useful so NGS also had tap() for quite a while. Here is what P would look like in NGS when implemented with tap():


F my_func() {
  MyType().tap({
    A.name = "blah"
    A.my_method()
  })
}

Tap takes an arbitrary value, runs the given callback (passing that value as the only argument) and returns the original value. It is pretty flexible.
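
For readers who don’t know Ruby or NGS, here is a rough Python sketch of the same behaviour. It is an analogy rather than the NGS implementation; MyType and configure are hypothetical:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def tap(value: T, callback: Callable[[T], object]) -> T:
    """Run callback on value, ignore the callback's result, return value."""
    callback(value)
    return value


class MyType:
    def __init__(self) -> None:
        self.name = ""


def configure(obj: MyType) -> None:
    obj.name = "blah"


def my_func() -> MyType:
    # Pattern P: create the object, operate on it, return the same object.
    return tap(MyType(), configure)
```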

Can’t put my finger on what exactly is bothering me here, but the fact is that I was not using tap() to implement P.

Attempt 3 – expr::{ … }

New Life of tap()

This one is very similar to tap() but it is syntactically distinct from tap.

F my_func() {
  MyType()::{
    A.name = "blah"
    # arbitrary code here
    A.my_method()
  }
}

I think the main advantage is that P is easily visually distinguishable. For example, if you only want to know the type of the expression returned, you can relatively easily skip everything between ::{ and }. A secondary advantage is that it’s slightly less cluttered than tap().

Let’s get into the details of how the above works.

Syntax

  1. MyType() in our case is an expression. Happens to be a method call which returns a new object.
  2. :: – namespace field access operator. Typical use case is my_namespace::my_field.
  3. { ... } – anonymous function syntax. Equivalent to a function with three optional parameters (A, B, and C, all default to null).

Note that all three syntax elements above are not unique to this combination. Each one of them is being used in other circumstances too.

Up until recently, the :: syntax did not allow an anonymous function as the second argument. That went against NGS design: all methods should be able to handle as many types of arguments as possible. Certainly, limiting arguments’ types syntactically was wrong for NGS.

Semantics

In NGS, any operator is transformed to a method call. :: is no exception. When e1::e2 is encountered, it is translated into a call to method :: with two arguments: e1 and e2.

NGS relies heavily on multiple dispatch. Let’s look at the appropriate definition of the :: method from the standard library:

F '::'(x, f:Fun) {
  f(x)
  x
}

Not surprisingly, the definition above is exactly like the definition of F tap() ... (sans method and parameters naming).

Examples of expr::{ … } from the Standard Library

# 1. Data is an array. Each element is augmented with _Region field.
data = cb(r)::{
  A._Region = ConstIter(r)
}


# 2. push() returns the original object, which is modified in { ... }
F push(s:Set, v) s::{ A.val[v] = true }


# 3. each() returns the original object.
# Since each() in { ... } would return the keys() and not the Set,
# we are working around that with s::{...}
F each(s:Set, cb:Fun) s::{ A.val.keys().each(cb) }


# 4. Return what c_kill() returns unless it's an error
F kill(pid:Int, sig:Int=SIGNALS.TERM) {
  c_kill(pid, sig)::{
    A == -1 throws KillFail("Failed to kill pid $pid with signal $sig")
    A != 0 throws Error("c_kill() did not return 0 or -1")
  }
}

Side note: the comments are for this post, standard library has more meaningful, higher level comments.

A Brother Looking for Use Cases

While changing syntax to allow anonymous function after ::, another change was also made: allow anonymous function after . so that one could write expr.{ my arbitrary code } . The whole expression returns what the arbitrary code returns. Unfortunately, I did not come across (or maybe haven’t noticed) real use cases. The appropriate . method in the standard library is defined as follows:

F .(x, f:Fun) f(x)

# Allows
echo(5.{ A * 2 })  # 10

Have any use cases which look less stupid than the above? Let me know.

Python 3.8 Makes me Sad Again

Looking at some “exciting” features landing in Python 3.8, I’m still disappointed and frustrated by the language… like by quite a few other languages.

As an author of another programming language, I can’t stop thinking about how things “should have been done” from my perspective. I want to be explicit here. My perspective is biased towards correctness and “WTF are you doing?”. Therefore, take everything here with an appropriate amount of salt.

Yes, not talking about any “positive” changes here.

Assignment Expressions

There is new syntax := that assigns values to variables as part of a larger expression.

A fix which couldn’t be the best because of a previous design decision.

“Somebody” ignored the wisdom of Lisp, which was “everything is an expression and evaluates to a value” (no statements vs expressions), and made assignment a statement in Python years ago. Now this cannot be fixed in a straightforward manner. It must be another syntax. The result is two different syntaxes for almost the same thing: = for assignment as a statement and := for assignment as an expression.
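
A short sketch of the two syntaxes side by side, requiring Python 3.8 or later; first_number is a hypothetical helper:

```python
import re
from typing import Optional


def first_number(text: str) -> Optional[int]:
    # ":=" assigns inside the condition; "=" is not allowed in this position.
    if (match := re.search(r"\d+", text)) is not None:
        return int(match.group())
    return None


# Plain "=" remains a statement and cannot be nested in an expression.
answer = first_number("room 42, floor 7")
```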

Positional-only Parameters

There is a new function parameter syntax / to indicate that some function parameters must be specified positionally and cannot be used as keyword arguments:

def f(a, b, /, c, d, *, e, f):
    print(a, b, c, d, e, f)

Trying to clean up a mess created by mixing positional and named parameters. Unfortunately I did not give it enough thought at the time and copied parameters handling behaviour from Python. Now NGS also has the same problem as Python had before 3.8. Hopefully, I will be able to fix it in some more elegant way than Python did.
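
A sketch of what the markers mean in practice, using the function from the release notes but returning the values instead of printing them so the result is easy to check:

```python
def f(a, b, /, c, d, *, e, f):
    return (a, b, c, d, e, f)


# a, b: positional only; c, d: either way; e, f: keyword only.
ok = f(1, 2, 3, d=4, e=5, f=6)

try:
    f(a=1, b=2, c=3, d=4, e=5, f=6)  # a and b passed by keyword
    rejected = False
except TypeError:
    rejected = True
```

The second call is rejected with a TypeError because a and b cannot be passed by keyword.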

LRU cache

functools.lru_cache() can now be used as a straight decorator rather than as a function returning a decorator. So both of these are now supported
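
A minimal sketch of the two decorator forms; square and cube are illustrative names, not taken from the release notes:

```python
from functools import lru_cache


@lru_cache  # straight decorator, accepted since Python 3.8
def square(x: int) -> int:
    return x * x


@lru_cache(maxsize=256)  # function returning a decorator, the older form
def cube(x: int) -> int:
    return x * x * x
```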

OK. Bug fix. But … (functools.py)

    if isinstance(maxsize, int):
        # Negative maxsize is treated as 0
        if maxsize < 0:
            maxsize = 0

If you are setting LRU cache size to a negative number, it’s 99% by mistake. In NGS that would be an exception. That’s the approach that causes rm -rf $myfolder/ to remove / when myfolder is unset. Note that the maxsize code is not new but it’s still there in Python 3.8. I guess that is another mistake which can not be easily fixed now because that would break “working” code.
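
The silent behaviour can be observed with a small sketch (slow_double and calls are hypothetical names). No exception is raised for the negative size, and nothing is cached:

```python
from functools import lru_cache

calls = []  # records every real invocation


@lru_cache(maxsize=-1)  # almost certainly a mistake, yet accepted silently
def slow_double(x: int) -> int:
    calls.append(x)
    return 2 * x


slow_double(21)
slow_double(21)  # computed again: a cache of size 0 stores nothing
info = slow_double.cache_info()
```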

Collections

The _asdict() method for collections.namedtuple() now returns a dict instead of a collections.OrderedDict. This works because regular dicts have guaranteed ordering since Python 3.7

OK. Everybody had the mistake of making maps unordered: Perl, Ruby, Python.

  1. Ruby fixed that with the release of version 1.9 in 2008 (according to the post).
  2. Python fixed that with the release of version 3.7 in 2018 (which I take as 10 years of “f*ck you, the developer”).
  3. Perl keeps using unordered maps according to documentation.
  4. Same for Raku, again according to the documentation.

NGS had ordered maps from the start but that’s not a fair comparison because the NGS project started in 2013, when the mistake was already understood.
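
A quick sketch of the Python side of this, assuming Python 3.8 or later; Point and ports are illustrative:

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
as_dict = Point(x=1, y=2)._asdict()  # a regular dict since Python 3.8

# Regular dicts keep insertion order (guaranteed since Python 3.7).
ports = {"https": 443, "ssh": 22, "http": 80}
port_names = list(ports)
```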


How does all that help you, the reader? I encourage deeper thinking about the choice of programming languages that you use. From my perspective, all languages suck, while NGS aims to suck less than the rest for the intended use cases (tl;dr – for DevOps scripting).


Update 2020-08-16

Discussions:

  1. https://news.ycombinator.com/item?id=24176823
  2. https://lobste.rs/s/rgcgjz/python_3_8_makes_me_sad_again

Update 2020-08-17

It looks like the article above needs some clarification about my perspective: background, what I am doing and why.

TL;DR

The main points of the article are:

  1. Everything still sucks, including Python. By sucks I mean that it neither fits well with the tasks I need to do nor is aligned with how I think about these tasks.
  2. I am trying to help the situation and the industry by developing my own programming language

Background about my Thinking

In general, I’m amazed with how bad the overall state of programming is. That includes:

  1. All programming languages that I know, including my own NGS. This is aggravated by the inability to fix anything properly in any language with a substantial amount of code written in it, because you would be breaking existing code. And if you do break it, you get a shitstorm like with Python 3 or Perl 6 (Raku).
  2. Code quality of the programs written in all languages. Most of the code that I have seen is bad. Sometimes even in official examples.
  3. Quality of available materials, which are sometimes plainly wrong.
  4. Many of the existing “Infrastructure as code” solutions, which in most cases follow the same path:
    1. Invent a DSL or use YAML.
    2. “Figure out” later that it’s not powerful enough (by the way there is an elegant solution – a programming language, forgot the name)
    3. Create a pretty ugly programming language on top of a DSL that was intended for data.

I am creating a new programming language and a shell out of frustration with the current situation, especially with bash and Python. Why these two? Because that’s what I was and still am using to get my tasks done.

Are these languages bad? I don’t think it’s a question with any good answers. These languages don’t fit the tasks that I’m trying to do, nor are they aligned with how I think, while apparently being among the best choices available.

This Article Background

  1. Saw a post in my RSS feed about new features in Python 3.8.
  2. Took a look.
  3. Yep, everything is still f*cked up.
  4. Wrote a post about it which was not meant to be “deep discussion about Python flaws”.

I was not planning to invest more time in this but here I am trying to clarify.

And your Language is Better? Really?

Let’s clarify “better”. For me, it’s to suck less than the rest for the intended use cases.

author really does consider himself a superior language designer than the Python core-dev team

( From https://www.reddit.com/r/Python/comments/iartgp/python_38_makes_me_sad_again/ )

I consider myself in much easier circumstances:

  1. No substantial amount of code is written in NGS yet.
  2. I’m starting later and therefore have the advantage of looking at more languages, avoiding bad parts, copying (with adaptation) the good parts.
  3. NGS targets a niche; it’s not intended to be a general-purpose language. Choices are clearer and easier when you target a niche.
  4. The language that I’m creating is almost by definition more aligned with how I think. I hope that people out there will benefit from using NGS if it is more aligned with how they think too.
  5. See also my Creating a language is easier now (2016) post.

Will I be able to make a “better” language?

From a technical perspective, that’s probable: I am a skilled programmer in several languages and I have more languages to look at than anybody had before. My disadvantage is not having much experience in language design. I’m trying to offset that by thinking hard (about the language, the essence of what is being expressed, common patterns, etc), looking at other languages and experimenting.

From a marketing perspective, I need to learn a lot. I am aware that “technically better” doesn’t matter as much as I would like it to. Without a community and users, that would be a failed project.

Also don’t forget luck which I might or might not have.

What if NGS fails?

I think that the situation today is unbearable. I’m trying to fix it. I feel like I have to, despite the odds. I hope that even if NGS fails to move the industry forward it would be useful to somebody who will attempt that later.

On Information Loss in Software

“Information Loss” is a way to look at the world. The topic is very broad. This blog post will focus on information loss during development and operation of computer software.

This post discusses why Information Loss is bad and gives some examples.

My hope is that after reading this post, you will be able to spot information loss more easily. This should help you avoid information loss, eliminating the need for a costly information recovery phase. Some examples include specific recommendations on how to avoid that particular case of information loss.

Information Loss Definition

Information Loss for the purposes of this blog is the situation where information I is available and is easily accessible at point in time t1 but later, when it’s needed at point in time t2, it is either not available or not easily accessible.

The post will present various categories of information loss with examples. The list is not exhaustive; it’s not meant to be. The intention is to give some examples to help you get the feel and start looking at things from the information loss perspective.

Why is Information Loss Bad?

In many cases of Information Loss, the missing information can be recovered but that requires resources to be thrown at the issue (time and/or money). That is the situation I would like to help you to avoid.

Between the Head and the Code

When working on software, the first place the information loss occurs is when the programmer translates thoughts into code. Information loss at this stage will manifest itself as increased WTF-per-minute during code review or just code reading. Each time the code is read, there will be additional cognitive load while the reader reconstructs the programmer’s idea behind the code.

I have identified two main causes for information loss at the head-to-code stage:

  • Programmer’s fault
  • Programming language imposed

Information Loss due to Programmer’s Fault

The more experienced a programmer is, the less likely information loss is to occur at this stage.

Misnamed Variable

In the programmer’s head: number of servers running the ETL task. Name of the variable in the code: n. WTFs at code review – guaranteed.

Misnamed Function

I’m pretty sure getUser() should not update, say, the last name of the user in the database. Such naming is criminal but unfortunately I’ve seen code similar to that.

Use of Magic Numbers

if (result == 126) .... The person who wrote 126 knew what that number means. The person reading the code will need to spend time checking what that number means. One should use constants or enums instead: if (result == NOT_EXECUTABLE) ....

Missing Comments in Code

The most important comments are about why something is being done as opposed to how. If one’s code is in a high-level language and of good quality, it’s a rare occasion that one needs to comment about what or how something is being done. On the other hand, comments like “Working around API bug: it returns false instead of empty array” are very valuable.
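
A sketch of such a “why” comment in context. normalize_results is a hypothetical wrapper around an equally hypothetical buggy API:

```python
def normalize_results(raw):
    """Return the API results as a list, whatever the API handed back."""
    # Working around API bug: it returns False instead of an empty list.
    if raw is False:
        return []
    return list(raw)
```

Without the comment, the check for False looks arbitrary and a later reader might “clean it up”.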

Incorrect Usage of Data Types

A list of people, for example, is not just a list. It has semantic meaning. It’s much easier to understand a program when correct types are used for the data. Java has generics to convey such information, for example List<Person>. Some other languages have type systems that are powerful enough to convey such information too.
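
Python’s optional type hints can carry the same information. In this sketch Person and full_names are hypothetical; the annotations are checked by external tools such as a type checker, not by the Python runtime:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Person:
    first_name: str
    last_name: str


def full_names(people: List[Person]) -> List[str]:
    # The annotation tells the reader what the list holds; a type checker
    # (not the Python runtime) can flag a call that passes anything else.
    return [f"{p.first_name} {p.last_name}" for p in people]
```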

Programming Language Imposed Information Loss

Limitations of programming languages lead to less expressive code because the idea in the programmer’s head cannot be expressed in a straightforward manner. The readers of the code will struggle more (read: waste time) to understand the code.

Unnamed Function Parameters

bash and perl5 (not sure about perl5 anymore, there was something experimental) do not have the syntax for specifying function parameter names. This makes the code less expressive. Sometimes programmers will do “the right thing”:

myfunc() {
    local target_file=$1
    ...
}

… but when they don’t, you end up with an unnamed parameter, wondering what it could mean:

myfunc() {
    if [[ -f $1 ]];then
        ...
    fi
}

Is that a file to generate or a source file? You don’t know, you have to read on in myfunc hoping for the answer.

Recommendation: even if your language does not support named parameters, emulate them.

Expansion of Strings into Several Arguments (bash)

rm $x

Does that remove one file or several? What did the programmer mean? You simply don’t know. It depends on the contents of x, which is typically split into arguments by whitespace. You are lucky if you can deduce from the variable name whether it’s one or several files.

From today’s perspective this is just bad design. Back in the day I guess it was the most practical way to implement arrays.

Recommendation: use one of the two alternatives below and do not use the rm $x form.

  • Single file: rm "$x" (proper quoting)
  • Multiple files: rm "${my_files[@]}" (bash arrays)

Side note: this “feature” has caused so much pain over the years when x contained spaces by accident. Even when x is meant to be used as an array, elements of that array can also contain spaces by accident.

Error Handling

In languages that do not support exceptions (bash, C, Go), the programmer is forced into one of two situations:

  • Write incorrect code that ignores the errors (on purpose or by mistake, go figure which one)
  • Write verbose code that handles the errors. When the code handles every possible error, it becomes cluttered with error handling and it takes more time to understand the code. That’s the case where information loss occurs because the reader is overwhelmed by the code.

In NGS, since the typical use case is scripting, I wanted to have the option for the code to be concise. That rules out returning a status code along with the result because the caller is then forced to check it. It makes more sense for NGS to have exceptions and for scripts to decide whether to catch them or let the whole script terminate with an error because of an uncaught exception.
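
The trade-off can be sketched in Python, which has exceptions, rather than in NGS. All names and the CONFIG data below are hypothetical:

```python
from typing import Dict, Optional, Tuple

CONFIG: Dict[str, str] = {"region": "eu-west-1"}  # hypothetical data


def get_with_status(key: str) -> Tuple[Optional[str], bool]:
    """Status-code style: the caller has to check "ok" after every call."""
    if key in CONFIG:
        return CONFIG[key], True
    return None, False


def get_or_raise(key: str) -> str:
    """Exception style: the happy path stays short; errors propagate."""
    return CONFIG[key]  # raises KeyError for a missing key


def endpoint() -> str:
    # No error handling here: a script can let an uncaught exception end it.
    return f"https://service.{get_or_raise('region')}.example.com"
```

With get_with_status(), every caller either checks the flag or silently continues with None; with get_or_raise(), forgetting to handle the error still stops the script.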

Unordered Hash/Map/dict Data Structure

Hash data structure is implemented in a non-order-preserving manner in some languages. That means that the programmer can not express the intention freely in situations where the order of key/value pairs is important. That pushes towards less readable code as the programmer fights the language by implementing his/her own ordered dictionary.

Information loss in this case is again losing the sight of programmer’s intention.

Fortunately, many modern languages have solved the issue by now; for example, Ruby since version 1.9 and Python since version 3.7.

Recommendation: check whether your language has the data structure you really want to use, either built-in or in a library.

Limited Data Structures (bash)

Working with data structures in bash results in more or less convoluted code, depending on the data structures one needs to work with. This is a direct consequence of bash supporting exactly three data structures:

  • Scalar (strings which can sometimes be treated as numbers or arrays)
  • Array
  • Associative array

These data structures can not be nested.

The result is much less readable code where the original intent of the author is harder to recover as opposed to data manipulation in other popular languages (Python, Ruby, etc).

Recommendation: consider using other languages besides bash for heavy data manipulation code.

Absence of non-nullable Types

In some languages there is no straightforward way to specify non-nullable parameters. The programmers are then required to check whether each passed parameter is null. That results in more boilerplate code. Let’s look at the following bit of Java code from the popular Apache Flink project:

// flink/flink-java/src/main/java/org/apache/flink/api/java/DataSet.java

protected DataSet(ExecutionEnvironment context, TypeInformation<T> typeInfo) {
    if (context == null) {
        throw new NullPointerException("context is null");
    }
    if (typeInfo == null) {
        throw new NullPointerException("typeInfo is null");
    }

    this.context = context;
    this.type = typeInfo;
}

Asynchronous Computing Model (JavaScript)

In JavaScript, for example, progressively more readable code uses callbacks, then Promises, then async/await.

Again, information loss occurs when programmer’s intention is lost in the code because the code looks like a big struggle against asynchronicity and the language.

Recommendation: prefer async/await over Promises and prefer Promises over callbacks.

Loss of semantic information (JavaScript)

console.log() vs debug('my-module')('my message') in JavaScript. When a programmer chooses to use log() instead of debug(), loss of semantic information occurs. In this case it means more effort in finding the needed information in the output as opposed to simpler turning on and off the relevant debug sections.

Recommendation: use the debug module.

Information Loss at Runtime

Information loss at runtime will manifest as harder debugging.

Empty Catch Clause

This is borderline criminal. Except for very few cases when empty catch clause is really appropriate, by placing empty catch clause in the code, you are setting up a bomb for your colleagues. They will pay with their time, tears and mental health, not to mention they will be hating you. Where is the information loss? At the time the exception is generated, there is useful information about what happened. Empty catch clause loses that information. Result: hard to find exceptions and their causes.

In NGS, there are clear ways to express that you didn’t just forget to handle the exception (try ... catch(e) { }) but you actually don’t care (or know exactly) what happened:

  • try EXPR without the catch clause at all. If EXPR throws an exception, try EXPR evaluates to null, otherwise it evaluates to EXPR.
  • EXPR tor DFLT – if EXPR throws an exception, evaluates to DFLT, otherwise evaluates to EXPR.

Writing to stdout Instead of stderr

stdout has semantic meaning (the result of the computation) and stderr also has semantic meaning (error descriptions). A program that outputs errors to stdout or the result to stderr is harder for any wrapper script to deal with. If the two outputs are mixed, the semantic information about the text is lost and then needs to be recovered by the caller.
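A minimal sketch of keeping the two apart in Node.js. The report function is hypothetical. Its out and err parameters default to the real process streams and exist only to make the behavior easy to check.

```javascript
// Hypothetical helper: results go to stdout, error descriptions go to stderr.
function report(results, problems, out = process.stdout, err = process.stderr) {
  for (const line of results) out.write(line + '\n');          // result of the computation
  for (const problem of problems) err.write('ERROR: ' + problem + '\n'); // what went wrong
}

report(['42'], ['input file b.txt is unreadable']);
```

A wrapper script can now redirect the two streams separately (for example > result.txt 2> errors.log) without parsing anything.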

Wrong exit codes reporting

This one really hinders automation.

if ... then {
    ...
    error("error occurred")
    exit(0) # incorrect error code reported
}

Since it’s easy to forget about the exit code, and the common case is that exit() means abnormal termination of the program, in NGS an exit() call that does not provide an exit code defaults to exit code 1.
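The same idea in JavaScript (Node.js), as a sketch: decide the exit code in one place so that a failure can never be reported as 0. The exitCodeFor function is invented for this example.

```javascript
// Hypothetical wrapper: runs `task` and returns the exit code to report.
function exitCodeFor(task, err = process.stderr) {
  try {
    task();
    return 0; // success
  } catch (e) {
    err.write('error occurred: ' + e.message + '\n');
    return 1; // a failure must be visible to the caller, never 0
  }
}

// Setting process.exitCode (instead of calling process.exit()) lets the
// process end naturally, so pending output is not cut off.
process.exitCode = exitCodeFor(() => { /* the actual work goes here */ });
```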

Wrong exit codes handling

if [ -e MY_FILE ] ...

This is all over bash scripts… and it’s wrong. Which exit codes does the [ program/built-in return? Zero for “yes”, one for “no”, and two for “An error occurred”. Guess what: you can’t handle three distinct cases with the two branches of an if; “An error occurred” causes the “false” branch of the if to be taken. If you are lucky, you will spot the error message on stderr. If you are not lucky, your script will just work incorrectly in some circumstances.

At this point the tradeoff in NGS was made in favor of correctness, not simplicity. if $(test -e MY_FILE) ... in NGS can go three ways: the “yes” branch, the “no” branch, and an exception. After any external process finishes, NGS checks the exit code. For an unknown process, a non-zero exit code causes an exception. For test and a few others, zero and one do not cause an exception. The exit code checking facility is extensible and one can easily “teach” NGS about new programs.
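This is not how NGS implements it. It is just a sketch of the same three-way handling in JavaScript (Node.js), assuming an external test program is available on PATH. Both function names are made up.

```javascript
const { spawnSync } = require('child_process');

// Maps the exit status of `test` to a boolean, or throws for anything else.
function interpretTestStatus(status, stderrText = '') {
  if (status === 0) return true;  // "yes"
  if (status === 1) return false; // "no"
  throw new Error(`test failed with exit status ${status}: ${stderrText.trim()}`);
}

// Hypothetical helper: runs the external `test` program (assumed to be on PATH).
function fileExists(path) {
  const result = spawnSync('test', ['-e', path], { encoding: 'utf8' });
  if (result.error) throw result.error; // the program could not be started
  return interpretTestStatus(result.status, result.stderr);
}
```

The caller gets true, false, or an exception. An error can no longer masquerade as "no".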

Broaden your Horizon – Extras

Here I’ll mention information loss cases that are not strictly related to software development.

Untagged Cloud Resources (AWS)

Have you just created an EC2 instance and named it Server, or maybe you haven’t tagged it at all? Congratulations, semantic information has just been lost. Your colleagues will struggle to understand the role of the instance.

Recommendation: rigorously tag the resources, have alerts for untagged or improperly tagged resources. In AWS you can also know who created the resource by looking at CloudTrail.
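A sketch of the check that could sit behind such an alert. It assumes instance descriptions shaped like the EC2 API output (InstanceId plus an optional Tags list of Key/Value pairs). The list of required tag keys is up to you, and the function name is made up.

```javascript
// Returns, per instance, the required tag keys that are missing or empty.
function findImproperlyTagged(instances, requiredKeys) {
  const problems = [];
  for (const instance of instances) {
    // Tags may be absent entirely when the instance was never tagged.
    const tags = new Map((instance.Tags ?? []).map((t) => [t.Key, t.Value]));
    const missing = requiredKeys.filter((key) => !tags.get(key));
    if (missing.length > 0) problems.push({ id: instance.InstanceId, missing });
  }
  return problems;
}
```

Feeding the result into whatever alerting you already have is the easy part.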

Side note: In Azure, any resource must belong to a “Resource Group” which makes it much easier to track the resources.

GUI

You just performed an operation in a GUI. The information about what happened was lost the minute you performed the operation. Good luck reproducing or documenting it.

The plan to combat this in NGS is to have a textual representation for each operation that is performed via the GUI.

String Concatenation

Every time two strings are concatenated into one, there is some information loss.

Recommendation: instead of parsing unstructured text (the result of concatenation) later, consider using a structured data format, such as JSON, when producing the output.
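A tiny illustration with made-up values: two different inputs become indistinguishable after concatenation, while JSON keeps the boundaries.

```javascript
// Two different inputs become indistinguishable after concatenation.
const joinedA = ['my file.txt', 'notes'].join(' ');
const joinedB = ['my', 'file.txt notes'].join(' ');
console.log(joinedA === joinedB); // true: the boundaries are lost

// A structured format keeps the boundaries, so the consumer does not have to guess.
const encodedA = JSON.stringify(['my file.txt', 'notes']);
const encodedB = JSON.stringify(['my', 'file.txt notes']);
console.log(encodedA === encodedB);   // false
console.log(JSON.parse(encodedA)[0]); // my file.txt
```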


Hope that helps. Have fun!

WordPress SSL small and easy hack

Hi people.
Just installed WordPress (version 2.0.10-1etch3). After the regular installation procedure, the first thing I tried to do was to configure SSL.
Being naive, I just added an Apache alias on another virtual host (which has SSL) pointing to the same directory as the original installation.
Oops, that does not work. Googled. Came up with:
http://codex.wordpress.org/Administration_Over_SSL
That page just scared me. It looks too complicated for what it does. Spent a few additional minutes searching. No satisfying answer.

Here is my small hack, which works for me and should work fine for any installation as long as the URI beginnings are the same on both virtual hosts (read: same Apache alias, that’s roughly the same thing).
In wp-config.php I’ve added at the end, just before ?> :

wp_cache_add('home_before_ilya', get_option('home'), 'options');
add_filter('option_siteurl', 'ilyas_siteurl');
add_filter('option_home', 'ilyas_siteurl');

function ilyas_siteurl($v) {
    if(isset($_SERVER['HTTPS']) && $_SERVER['HTTPS'] == 'on') {
        $s = 's';
    } else {
        $s = '';
    }
    $v = "http$s://{$_SERVER['HTTP_HOST']}/blog";
    return $v;
}

In wp-includes/functions.php in function weblog_ping :
Replace

$home = trailingslashit( get_option('home') );

with

$home = trailingslashit( get_option('home_before_ilya') );

Note that the code was not really tested. All I know is it worked for me. Use at your own risk!
Hope that helps.