AWS CDK Opinionated Pipeline – Where is What?

Background

You use AWS CDK. It's great. It does a lot for you. Then one day something goes wrong. OK, it hasn't happened yet, but you want to be prepared for that (at least to some extent). The following is what I found while preparing. Sharing in the hope of saving the reader some time.

Basics

Before we dive in, let's make sure we've got the basics covered.

cdk ls

cdk ls lists all the stacks in the app, including the pipeline.

Example from my test project:

$ cdk ls
Proj1Stack
Proj1Stack/Deploy1/LambdaStack1
Proj1Stack/Deploy2/LambdaStack1
  • Proj1Stack is the pipeline.
  • Deploy1 and Deploy2 are "stages".

cdk synth

cdk synth $STACK_NAME >1.yaml is your friend, a debugging tool. It shows the generated CloudFormation.

cdk.out directory

cdk.out is the directory where cdk synth outputs everything that's needed for deploying (CloudFormation templates, related assets, metadata). They call it a Cloud Assembly.

All assets are named after the hash of their content, so they are unique and immutable.
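As a rough illustration (my own sketch; CDK's real fingerprinting hashes whole directory trees and adds its own salts), naming by content hash looks like this:

```python
import hashlib

def asset_name(content: bytes) -> str:
    """Derive an asset name from the hash of its content.
    Illustration only; not CDK's actual algorithm."""
    return "asset." + hashlib.sha256(content).hexdigest()

# Same content => same name (cache hit); changed content => new asset.
assert asset_name(b"abc") == asset_name(b"abc")
assert asset_name(b"abc") != asset_name(b"abd")
print(asset_name(b"exports.handler = async () => 'hi';"))
```

Same content always maps to the same name, which is what makes the assets effectively immutable and safely cacheable.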

What Does the Generated Pipeline Look Like?

When you use an opinionated pipeline, you can see the following generated CodePipeline actions:

  • Source (with a long hash as the output artifact name)
  • Build named Synth (a CodeBuild project that runs cdk synth)
  • Build named SelfMutate (a CodeBuild project that runs cdk deploy to update the pipeline)
  • Build named FileAsset1 (a CodeBuild project that runs cdk-assets publish). From reading the sources: there might be several cdk-assets publish commands configured in the CodeBuild project.
  • Then, two CloudFormation deploy actions for each “stage” you want to deploy to (using change sets is the default but can be disabled, as per the documentation; see useChangeSets):
    • CHANGE_SET_REPLACE
    • CHANGE_SET_EXECUTE

cdk-assets

“It will take the assets listed in the manifest, prepare them as required and upload them to the locations indicated in the manifest.”

Note that cdk-assets is not making any decisions; the metadata in the cdk.out directory has the information about the assets: how to build/transform them and where they go.

cdk-assets can only handle two types of assets:

  • files (including directories). cdk-assets knows how to zip directories and how to upload files and directories to S3.

    (From reading the source code) I didn't see this in use, but apparently cdk-assets can also run an executable to package a file (or directory?). In this case the content type of the output is assumed to be application/zip. 🤷‍♂️
  • Docker images. cdk-assets knows how to build Docker images and push them into registry.

Sample command to see the list of assets: npx cdk-assets ls -p cdk.out/Proj1Stack.assets.json

What is Built When?

Files – unprocessed

If the files/directories don’t need any processing, they are just copied over to cdk.out during cdk synth and given a name which is a hash of the contents.

Example: Lambda function code

Files – processed

The processing happens during cdk synth, so cdk.out already contains the processed assets.

Example: Node.js Lambda function code (processed by tsc (optionally) and esbuild)

Docker Images

cdk-assets builds a Docker image and pushes it into the specified repository. The input for the image build is a directory in cdk.out containing the Dockerfile and related files.

Deploy

After everything is built and uploaded by cdk synth and cdk-assets, the deploy uses the CloudFormation template (templates?) from the cdk.out directory. At this point the assets (which the template references) are already in ECR and S3.


  • I tried to condense the information that I deemed important.
  • Let me know if something is missing or if you see mistakes.
  • The plan is to update this post as I discover new information.
  • My mistakes so far
    • Started taking notes quite a few hours into the process instead of from the start. In particular, notes would have saved me the jumping between the pipeline and the build projects to re-check what each action does.
    • Editing this post tired
  • Last edit: 2023-01-20

API-less

I was always complaining about how AWS made APIs that are inconsistent in so many dimensions. AWS teams can't even agree (read: "don't care enough to agree") on how to name pagination-related fields.

Because I am such ~~who was that again?~~ an important person, I hope ~~at least someone~~ everybody in AWS have read my blog and they are checking it ~~by mistake~~ at least twice a day for new ~~stupid rants~~ extreme wisdom.

One of these people said "Fook you, Ilya" and made a service without an API. Not kidding. Meet AWS Control Tower. No API, no CLI, no CloudFormation. "Take that, Ilya!"

For some balance I will mention that other people in AWS are trying to build a uniform API. Not stellar at the moment, but hey, at least they are trying.

Have a nice $(curl https://api.timeofday.example.com/) !

AWS Cloud Control is in for Review

Since 2016 I have been working on an AWS library. On 2021-09-30, AWS released something comparable: the AWS Cloud Control API. If it's comparable, let's compare. I hope the reader will find this unique perspective interesting.

Purpose

AWS Cloud Control provides a uniform API to Create, Read, Update, Delete, and List different resources. In short: CRUDL.

My library aims to provide a uniform, simplified, declarative interface whose main functionality is to converge to a desired state.

Supported Resources

Cloud Control supports many more resources. My library supports very few: the ones I needed most for my work. That's just how an experimental library, written by one person in their spare time, compares with a product from AWS.

Mode of Operation

Cloud Control

Operates on one resource at a time.

Cloud Control provides a relatively low-level API: CRUDL. The advantage of Cloud Control over existing APIs is uniformity, not achieving a desired state. You either create or update, and you need to know which one you are doing. (Deletion is a different story.)

I am amazed that somebody at AWS was able to pull this off… given that different services don't agree on naming conventions for tagging resources, on how and what can be tagged, and don't even have a convention for naming pagination fields. (Which, by the way, is a nightmare for users, but is typically abstracted away by Terraform or CloudFormation.) Sorry for the rant.

The operations are performed asynchronously. There is an API to query the status: GetResourceRequestStatus. This approach is better for advanced use cases, where multiple resources are created simultaneously. (Like CloudFormation?)

The operations are idempotent. This is achieved using a client token, as in other AWS services.
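A sketch of how client-token idempotency works in principle (assumed mechanics for illustration; names are made up and the real service-side logic is more involved):

```python
import uuid

# The service remembers results keyed by the client token, so a retried
# request with the same token returns the original result instead of
# creating a duplicate resource.
_results_by_token: dict = {}

def create_resource(desired_state: dict, client_token: str) -> str:
    if client_token in _results_by_token:
        return _results_by_token[client_token]  # retry: no duplicate created
    resource_id = "res-" + uuid.uuid4().hex[:8]
    _results_by_token[client_token] = resource_id
    return resource_id

token = str(uuid.uuid4())
first = create_resource({"Name": "demo"}, token)
retry = create_resource({"Name": "demo"}, token)
assert first == retry  # safe to retry on timeout
```

This is why a client that times out can simply resend the same request with the same token.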

My Library

Operates either on one resource or on a set of resources of the same type at a time. Edit: … from the default, specified, or all regions.

The library looks up the resources and either creates or updates them, depending on whether they already exist.

The operations are performed synchronously. The library throws an exception if anything goes wrong. I find this approach much more user-friendly for basic use cases.

Idempotency is out of the picture. Should it be in? Probably; looks like an omission on my side. How to best shield the user of the library from this issue? I don't know yet. Need to think.

Describing Desired State

Cloud Control

The "create" API has a desired-state parameter: DesiredState.

The ā€œupdateā€ API doesn’t, which I find very strange. I thought that desired state is something to be achieved no matter what the current state is. Desired state is only used in the ā€œcreateā€ API so from my perspective it could have been called ā€œpropertiesā€ or whatever.

The ā€œupdateā€ API has PatchDocument parameter, which I find not user friendly for the following reasons:

  1. Who actually works with JSON patch format? It’s the first time I see that I need to use it.
  2. I think it is less mentally challenging to describe … desired state and not the delta. This is typically done by IaC tools, including CloudFormation: calculate the diff between current and desired state for the user, so that the user would not need to get into this.
  3. It makes the update inconsistent with create.
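To make the complaint concrete, here is what the same update looks like as a JSON Patch (RFC 6902) document versus simply restating the desired state. My own illustration with a toy patch applier that handles only "replace" ops; the real patching happens on the service side:

```python
import copy

def apply_patch(state: dict, patch: list) -> dict:
    """Minimal JSON Patch applier: 'replace' ops only (illustration)."""
    new = copy.deepcopy(state)
    for op in patch:
        if op["op"] != "replace":
            raise NotImplementedError(op["op"])
        keys = op["path"].strip("/").split("/")
        target = new
        for k in keys[:-1]:
            target = target[k]
        target[keys[-1]] = op["value"]
    return new

current = {"RetentionInDays": 7, "LogGroupName": "demo"}
# The user must hand-author the delta...
patch = [{"op": "replace", "path": "/RetentionInDays", "value": 30}]
print(apply_patch(current, patch))  # {'RetentionInDays': 30, 'LogGroupName': 'demo'}
# ...instead of just declaring:
desired = {"RetentionInDays": 30, "LogGroupName": "demo"}
```

Authoring the patch requires knowing the current state anyway, so from the user's point of view nothing is saved.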

My Library

There is no separate create and update. The user specifies the desired state as a parameter to the "converge" function. Converge then either creates or updates the resource(s). The (non-existent) "create" and "update" are therefore completely uniform in the library.

Search / Filtering

Cloud Control

Search is not supported. In practice this means listing all the resources and filtering the result on the client side. Typically that can be done in the AWS CLI with the --query flag, which is supported globally, for any AWS CLI command. Unfortunately, I don't see a way to make it work in this situation: the returned result has a ResourceDescriptions field, an array where each item has a Properties field, and Properties is a string (JSON). Apparently JMESPath cannot parse JSON in this situation. This means the output of the Cloud Control AWS CLI has to be piped to jq, or maybe a programming language, for filtering and/or further processing.
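The shape of the problem, with a simulated response (my own minimal mock of the list-resources output; real responses carry more fields): because Properties is a JSON string, a plain --query can't reach inside it, and client-side parsing is required.

```python
import json

# Simulated Cloud Control list-resources response: note that each item's
# Properties field is a JSON *string*, not a nested object.
response = {
    "ResourceDescriptions": [
        {"Identifier": "vpc-1", "Properties": json.dumps({"IsDefault": True})},
        {"Identifier": "vpc-2", "Properties": json.dumps({"IsDefault": False})},
    ]
}

# Filtering therefore needs an explicit json.loads per item.
default_vpcs = [
    d["Identifier"]
    for d in response["ResourceDescriptions"]
    if json.loads(d["Properties"]).get("IsDefault")
]
print(default_vpcs)  # ['vpc-1']
```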

My Library

While the number of supported resources and filters is low, the library supports filtering. The filtering is done on the server or on the client, completely transparently to the users of the library. What's done on the server and what's done on the client? Simple: when filtering by a given property is supported on the server side, it's done there.
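A sketch of how such transparent filter placement can work (my own illustration, not the library's actual code; the set of server-supported filter names is hypothetical):

```python
# Hypothetical set of filter names the server API understands.
SERVER_FILTERS = {"vpc-id", "state"}

def split_filters(filters: dict):
    """Send what the server understands with the request; apply the rest
    client-side. The caller never sees the split."""
    server = {k: v for k, v in filters.items() if k in SERVER_FILTERS}
    client = {k: v for k, v in filters.items() if k not in SERVER_FILTERS}
    return server, client

server, client = split_filters({"state": "running", "IsDefault": True})
print(server)  # {'state': 'running'}
print(client)  # {'IsDefault': True}
```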

Desired State Format

Cloud Control

Cloud Control uses CloudFormation syntax. This makes sense.

My Library

My library uses the same format you would see when using the AWS CLI to describe the resource. This allows access to properties that CloudFormation does not have, and is unlikely to have, so for example this works:

  1. instance = AWS::Instance(...).converge(State = 'running') — which creates or turns on the specified EC2 instance(s). Turning EC2 instances on or off is not supported in CloudFormation.
  2. vpc = AWS::Vpc(IsDefault=true).expect(1) — get a reference to the default VPC (to use in further operations).

Closing Thoughts

  1. Cloud Control looks like a step in the right direction.
  2. Having Properties as a string is a major ergonomic issue.
  3. JSON patch for update is a huge ergonomic issue.
  4. Search (filtering in List) functionality is missing.
  5. “Desired state” naming is unjustified.

Cloud Control is a big effort, so let's give the team some slack and see how the API improves with time? Hopefully soon 🙂


Have a nice day, evening, or night! Mornings are just statistically less nice…

How fucked is AWS CLI/API

In the 21st century, we can do significantly better than the crap that is the AWS CLI. We just need to think from the users' perspective for a change, and not just dump on users whatever we were using internally, or saw in a nightmare and then implemented.

Think from the users' perspective

I'm working on a solution, which is described towards the end of the post. It's not ready yet, but the (working, by the way) examples will convince the reader that "significantly better" is very possible… I hope.

Background

I'm building an AWS library and a command-line tool. Since I'm doing it for a shell-like language (NGS), the AWS library I'm building uses the AWS CLI. Frustration and anger are prominent feelings when working with the AWS CLI. I will list here some of the reasons behind those feelings.

AWS CLI/API – WTF were they thinking?

Overall impression

Separate teams worked on different services without talking to each other. Then something like this happened: "OK, we have these APIs here, let's expose them. User experience? No, we don't have time for that / we don't care."

Tags

Representing a map (key-value pairs) as an array ( "Tags": [ { "Key": "some tag name", "Value": "some tag value" }, ... ] ), as the AWS CLI does, is insane… but only when you think about the user (developer, in this case) experience.

AWS Tags – no, not in this form!
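For comparison, this is the one-liner every consumer ends up writing to get the map they actually wanted:

```python
# AWS's list-of-{Key,Value} shape, converted to a plain dict.
tags = [
    {"Key": "Name", "Value": "web-1"},
    {"Key": "env", "Value": "dev"},
]
tag_map = {t["Key"]: t["Value"] for t in tags}
print(tag_map)  # {'Name': 'web-1', 'env': 'dev'}
```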

Reservations

When listing EC2 instances using aws ec2 describe-instances, the data is organized as a list of Reservations, and each Reservation has a list of instances. I've been using AWS for quite a few years and I have never needed to do anything with Reservations, but I did spend some time unwrapping, again and again, the interesting data (the list of instances) from the unwanted layer of madness. It really feels like "OK, internally we have Reservations and that's how our API works, let's just expose it as is".

Who da fuck ever used Reservations?

Results data structure

The AWS CLI usually (of course not always!) returns a map/hash at the top level of the result. The most interesting data would be called, for example, LoadBalancerDescriptions, or Vpcs, or Subnets, or… making it difficult to build generic tooling around the AWS CLI. Thanks! Do you even use your own AWS CLI, Amazon?
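One workaround for generic tooling (my own sketch, not an official API): when the result is a map with a single data key, peel off whatever that key happens to be called.

```python
def unwrap(result: dict) -> list:
    """Return the single 'data' value of a CLI-style result map,
    ignoring bookkeeping keys (sketch; key list is an assumption)."""
    data_keys = [k for k in result if k not in ("ResponseMetadata", "NextToken")]
    assert len(data_keys) == 1, f"ambiguous result keys: {data_keys}"
    return result[data_keys[0]]

print(unwrap({"Vpcs": [{"VpcId": "vpc-1"}]}))       # [{'VpcId': 'vpc-1'}]
print(unwrap({"Subnets": [], "NextToken": "abc"}))  # []
```

It works most of the time, which is precisely the problem: generic tooling should not have to guess.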

Inconsistencies

Kind of the same but not really

Security Groups

These are of course special. Think of aws ec2 create-security-group ...

  1. --group-name and --description are mandatory command-line arguments. So far, I haven't seen any other resource creation that requires both a name and a description.
  2. The command returns something like { "GroupId": "sg-903004f8" }, unlike many other commands, which return a hash with the properties of the newly created resource, not just the ID.

Elastic Load Balancer

Oh, I like this one. It’s unlike any other.

  1. The unique key by which you find a load balancer is its name, unlike other resources, which use IDs.
  2. Tags on load balancers work differently. When you list load balancers, you don't get the tags with the list, as you do when listing instances, subnets, VPCs, etc. You need to issue an additional command: aws elb describe-tags --load-balancer-names ...
  3. aws elb describe-load-balancers – items in the returned list have the field VPCId, while other places usually name it VpcId.

Target groups

aws elbv2 deregister-targets --target-group-arn ...

Yes, this particular command and its "target group" friends use an ARN, not a name (like ELB) and not an ID (like most EC2 resources).

DHCP options (sets)

(I haven't used these, but I considered them, so I looked at the documentation; here is what I found.)

Example: aws ec2 create-dhcp-options --dhcp-configuration "Key=domain-name-servers,Values=10.2.5.1,10.2.5.2"

Yes, the create syntax is unlike other commands and looks like filter syntax. Instead of, say, a --name-servers ns1 ns2 ... switch, you have "Key=domain-name-servers,Values=". WTF?

Route 53

aws route53 list-hosted-zones does not return the tags. Here is the output of the command:

{
    "HostedZones": [
        {
            "Id": "/hostedzone/Z2V8OM9UJRMOVJ",
            "Name": "test1.com.",
            "CallerReference": "test1.com",
            "Config": {
                "PrivateZone": false
            },
            "ResourceRecordSetCount": 2
        },
        {
            "Id": "/hostedzone/Z3BM21F7GYXS7Y",
            "Name": "test2.com.",
            "CallerReference": "test2.com",
            "Config": {
                "PrivateZone": false
            },
            "ResourceRecordSetCount": 2
        }
    ]
}

Wanna get the tags? F*ck you! Here is the command: aws route53 list-tags-for-resources --resource-type hostedzone --resource-ids Z2V8OM9UJRMOVJ Z3BM21F7GYXS7Y. Get it? You are supposed to take the Id field from the list generated by list-hosted-zones, split it by slash, and then use the last part as the resource id. Tagging zones also uses the rightmost part of the id: aws route53 change-tags-for-resource --resource-type hostedzone --resource-id Z2V8OM9UJRMOVJ --add-tags Key=t1,Value=v11

… but apparently that was not enough differentiation 🙂 Check this out: aws elb add-tags --load-balancer-names NAME --tags TAGS vs aws route53 change-tags-for-resource --resource-type hostedzone --resource-id ID --add-tags TAGS, and aws elb remove-tags --load-balancer-names NAME --tags TAGS vs aws route53 change-tags-for-resource --resource-type hostedzone --resource-id ID --remove-tag-keys TAGS. Trick question: in how many dimensions is this different? So wrong on so many levels 🙂

Here is another one: when you fetch records from a zone, you use the full id, /hostedzone/Z2V8OM9UJRMOVJ, not Z2V8OM9UJRMOVJ: aws route53 list-resource-record-sets --hosted-zone-id /hostedzone/Z2V8OM9UJRMOVJ
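The id gymnastics boil down to a helper like this, which every Route 53 consumer ends up writing sooner or later (sketch):

```python
def bare_zone_id(full_id: str) -> str:
    """'/hostedzone/Z2V8OM9UJRMOVJ' -> 'Z2V8OM9UJRMOVJ'.
    list-hosted-zones returns the full path, tagging wants the bare id,
    and list-resource-record-sets accepts the full path again."""
    return full_id.rsplit("/", 1)[-1]

print(bare_zone_id("/hostedzone/Z2V8OM9UJRMOVJ"))  # Z2V8OM9UJRMOVJ
```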

 

(many more to come)

Expected comments and answers

Internet discussions are the best 😉

Just use Terraform

I might some day. For the cases I have in mind, for now I prefer a tool that:

  1. Is conceptually simpler (improved scripting, not something completely different)
  2. Doesn’t require a state file (I don’t want to explain to a client with two servers where the state file is and what it does)
  3. Can be easily used for ad-hoc listing and manipulation of the resources, even when these resources were not created/managed by the tool.
  4. Can stop and start instances (impossible with Terraform last time I checked a few months ago)

Just use CloudFormation

I don’t find it convenient. Also, see Terraform points above.

By the way, the phrases of the form “Just use X” are not appreciated.

You just shit on what other people do to push your tools

Yes, I reserve the right to shit on things. Here I mean specifically AWS CLI. Freedom of speech, you know… especially when:

  1. The product (AWS API) is technically subpar
  2. The creator of the product does not lack resources
  3. The product is used widely, wasting huge amounts of time of the users

If I'm investing my time to write my own tool purely out of frustration, I will definitely tell you why I think the product is crap.

Is that just a rant or you have alternatives?

I'm working on it. The command-line tool is called na (Ngs Aws). It's a thin wrapper around a declarative-primitives library for AWS.

There might be a solution to this madness!!!

The command-line tool I'm working on is nowhere near ready, but here is a gist of what you can do with it:

# list vpcs
na vpc

# Find the default VPC
na vpc IsDefault:true

# List security groups of all vpcs which have tag "Name" "v1".
na vpc Name=v1 sg

# Idempotent operations, in this case deletion.
# No error when this group does not exist.
# Delete "test-group-name" in the given vpc(s).
na vpc Name=v1 sg GroupName:test-group-name DEL

# List all volumes which are attached to stopped instances
na i State:stopped vol

# Delete all volumes that are not attached to instances
na vol State:available DEL

# Stop all instances which are tagged as
# "role" "ocsp-proxy" and "env" "dev".
na i role=ocsp-proxy env=dev SET State:stopped
  1. Na never dumps huge amounts of data to your terminal. As a human, you will probably not be able to process it, so when the number of items (rows in a table, actually) is above a certain configurable threshold, you will see something like this:
    $ na vol 
    === Digest of 78 rows ===
    ...

    It will show how many unique values there are in each column, and the min and max value of each column. I'm thinking about also displaying the top and bottom 5 (or so) values for each column.

  2. Na has a concept of "related" resources, so when you write something like na i State:stopped vol, it knows that the volumes you want to show are related to the instances you mentioned. In this particular case, it means volumes attached to those instances.
  3. Note the consistency between what you see in the output and the arguments to the CLI. If something is called "State" in the data structure returned by the AWS CLI, it will be called "State" in the query, not "state" ("--state").

I will be updating this post as things come along.