gRPC + REST API on AWS

This post shows a setup where a gRPC service is also exposed as a REST API. It’s a setup that happens to work for us; no alternatives will be discussed in this post.

This is a concise blog post.

Architecture

  1. ALB with HTTPS listener (trivially configured, out of scope of this post)
  2. ECS running a task with 3 containers:
    • API Gateway, implemented with Envoy. It:
      • authorizes requests using the service in the next container
      • proxies gRPC requests
      • proxies REST requests (transcoding them into upstream gRPC requests)
    • Authorization service, implemented with OPA
    • Our gRPC application
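
To make the dual exposure concrete, here is roughly how a client calls the same service both ways through the ALB (the hostname is a placeholder; the paths come from the listener rules and the proto excerpt later in this post):

# gRPC, matched by the GrpcListenerRule path pattern
grpcurl -H "Authorization: Bearer ..." -protoset proto_descriptor.bin -d '{"id": "123"}' api.example.com:443 censored.v1.CensoredService/MyGet

# REST, matched by the RestListenerRule and transcoded to gRPC by Envoy
curl -H "Authorization: Bearer ..." https://api.example.com/rest/v1/my-objs/123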

Notes

Health checks are not in very good shape yet

ECS Configuration (Simplified Excerpt)

In case the reader is not familiar: the excerpt below is CloudFormation.

  TaskDefinition:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Name: apigw
          Image: !Ref ApiGwImage
          PortMappings:
            - ContainerPort: !Ref ContainerPort
        - Name: opa
          Image: !Ref OpaImage
          PortMappings:
            - ContainerPort: 9191
        - Name: app
          Image: !Ref AppImage
          PortMappings:
            - ContainerPort: 4000

  Service:
    DependsOn:
      - GrpcListenerRule
      - RestListenerRule
      - GrpcTargetGroup
      - RestTargetGroup
    Type: AWS::ECS::Service
    Properties:
      ServiceName: !Ref ServiceName
      Cluster: !Ref Cluster
      TaskDefinition: !Ref TaskDefinition
      LoadBalancers:
        - ContainerName: apigw
          ContainerPort: !Ref ContainerPort
          TargetGroupArn: !Ref GrpcTargetGroup
        - ContainerName: apigw
          ContainerPort: !Ref ContainerPort
          TargetGroupArn: !Ref RestTargetGroup

  GrpcTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 10
      HealthCheckPath: /
      HealthCheckTimeoutSeconds: 5
      Matcher:
        GrpcCode: "0-99"
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      Port: !Ref ContainerPort
      Protocol: HTTP
      ProtocolVersion: GRPC
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 60 # default is 300
      TargetType: ip
      VpcId: !ImportValue VpcId

  RestTargetGroup:
    Type: AWS::ElasticLoadBalancingV2::TargetGroup
    Properties:
      HealthCheckIntervalSeconds: 10
      HealthCheckPath: /rest/not-found
      HealthCheckTimeoutSeconds: 5
      Matcher:
        HttpCode: 404
      UnhealthyThresholdCount: 2
      HealthyThresholdCount: 2
      Port: !Ref ContainerPort
      Protocol: HTTP
      ProtocolVersion: HTTP1
      TargetGroupAttributes:
        - Key: deregistration_delay.timeout_seconds
          Value: 60 # default is 300
      TargetType: ip
      VpcId: !ImportValue VpcId

  GrpcListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref GrpcTargetGroup
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - '/censored.v1.CensoredService/*'
              - '/censored.v1.CensoredAdminService/*'
              - '/censored.v1.CensoredSystemService/*'
      ListenerArn: ...
      Priority: 1000

  RestListenerRule:
    Type: AWS::ElasticLoadBalancingV2::ListenerRule
    Properties:
      Actions:
        - Type: forward
          TargetGroupArn: !Ref RestTargetGroup
      Conditions:
        - Field: path-pattern
          PathPatternConfig:
            Values:
              - '/rest/v1/*'
      ListenerArn: ...
      Priority: 1001

Envoy Configuration (Simplified Excerpt)

static_resources:
  listeners:
    - address:
        socket_address:
          address: 0.0.0.0
          port_value: 8000
      filter_chains:
        - filters:
            - name: Connection Manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                via: CensoredGW
                route_config:
                  name: Static response for tests
                  virtual_hosts:
                    - name: backend
                      domains:
                        - "*"
                      routes:
                        - match:
                            prefix: "/test/static"
                          direct_response:
                            status: 200
                            body:
                              inline_string: "Static response for tests"
                        # Reference: https://envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_json_transcoder_filter#route-configs-for-transcoded-requests
                        - match:
                            prefix: "/"
                          route:
                            cluster: upstream
                            timeout: 60s
                http_filters:
                  - name: envoy.filters.http.grpc_json_transcoder
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.grpc_json_transcoder.v3.GrpcJsonTranscoder
                      # maybe disable later:
                      auto_mapping: true
                      proto_descriptor: "../path/to/proto_descriptor.bin" ### See next heading in this post
                      services:
                        - censored.v1.CensoredService
                        - censored.v1.CensoredAdminService
                        - censored.v1.CensoredSystemService
                      print_options:
                        add_whitespace: true
                        always_print_primitive_fields: true
                      request_validation_options:
                        reject_unknown_method: true
                        reject_unknown_query_parameters: true
                  - name: envoy.filters.http.cors
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
                  - name: envoy.ext_authz
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_authz.v3.ExtAuthz
                      failure_mode_allow: false
                      with_request_body:
                        max_request_bytes: 10485760 # 10M
                        allow_partial_message: false
                        pack_as_bytes: true
                      transport_api_version: V3
                      grpc_service:
                        envoy_grpc:
                          cluster_name: opa-agent
                        timeout: 10s
                  - name: envoy.filters.http.router
                    # https://github.com/envoyproxy/envoy/issues/21464
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
                always_set_request_id_in_response: true
                access_log:
                  - typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                      # https://www.envoyproxy.io/docs/envoy/latest/configuration/observability/access_log/usage#config-access-log-default-format

  # Based on https://www.envoyproxy.io/docs/envoy/latest/configuration/http/http_filters/grpc_json_transcoder_filter
  clusters:
    - name: opa-agent
      connect_timeout: 0.25s
      type: STRICT_DNS
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: { }
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: service
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 9191
    - name: upstream
      type: STRICT_DNS
      typed_extension_protocol_options:
        envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
          "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
          explicit_http_config:
            http2_protocol_options: {}
      load_assignment:
        cluster_name: grpc
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: 127.0.0.1
                      port_value: 4000
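
Two quick ways to sanity-check the gateway, assuming the config above is saved as envoy.yaml and the apigw container is reachable locally on port 8000 (both commands are illustrative):

# Validate the configuration without starting the proxy
envoy --mode validate -c envoy.yaml

# Hit the static test route defined in the listener above
curl -i http://localhost:8000/test/static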

proto_descriptor.bin

GrpcJsonTranscoder must have the proto descriptor file in order to know how to transcode. The file contains:

  1. proto definitions of your services, including the extension (google.api.http) that describes how to expose the services as REST
  2. dependencies of the above proto definitions

The descriptor file is generated using a command similar to the following:

buf build -o proto_descriptor.bin --as-file-descriptor-set --path path/to/my.proto

buf is a way to manage .proto files and their dependencies (very imprecise definition, sorry)

If I remember correctly, you can also generate the descriptor with protoc (without buf), but I don’t remember the exact invocation.
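
For reference, something along these lines should do the job (untested here; the include paths are placeholders, and googleapis needs to be on the include path so that google/api/annotations.proto resolves):

protoc \
  -I path/to/protos \
  -I path/to/googleapis \
  --include_imports \
  --descriptor_set_out=proto_descriptor.bin \
  path/to/my.proto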

grpcurl

The same descriptor file is used with grpcurl when you later test your service from the command line:

grpcurl -H "Authorization: Bearer ..." -protoset proto_descriptor.bin "example.com:443" censored.service.name/MyFunc
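
The same file also works offline for listing and describing what it contains, without connecting to a server (if I recall grpcurl’s behavior correctly):

grpcurl -protoset proto_descriptor.bin list
grpcurl -protoset proto_descriptor.bin describe censored.v1.CensoredService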

my.proto

This is what a protobuf definition with the REST extension looks like (excerpt):

import "google/api/annotations.proto";

service Censored {
  rpc MyCreate(CreateRequest) returns (CreateResponse){
    option (google.api.http) = { post: "/rest/v1/my-objs" };
  }
  rpc MyGet(GetRequest) returns (GetResponse) {
    option (google.api.http) = { get: "/rest/v1/my-objs/{id}" };
  }
}

Excerpt from buf.yaml corresponding to the import above:

version: v1

deps:
  - buf.build/googleapis/googleapis


Hope this helps.

Sorry, I was in a rush to get this out. If anything is unclear or missing, please let me know.

AWS CDK Opinionated Pipeline – Where is What?

Background

You use AWS CDK. It’s great. It does a lot for you. Then one day something goes wrong. OK, it hasn’t happened yet. But you want to be prepared for that (at least to some extent). The following information is what I found while preparing. Sharing it to hopefully save the reader some time.

Basics

Before we dive in, let’s make sure we’ve got the basics covered.

cdk ls

cdk ls lists all the stacks in the app, including the pipeline.

Example from my test project:

$ cdk ls
Proj1Stack
Proj1Stack/Deploy1/LambdaStack1
Proj1Stack/Deploy2/LambdaStack1

  • Proj1Stack is the pipeline.
  • Deploy1 and Deploy2 are “stages”.

cdk synth

cdk synth $STACK_NAME >1.yaml is your friend, a debugging tool. It shows the generated CloudFormation.

cdk.out directory

cdk.out is the directory where cdk synth outputs everything that’s needed for deploying (CloudFormation templates, related assets, metadata). They call it a Cloud Assembly.

All assets are named based on the hash of their content so they are unique and immutable.
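
For orientation, a typical cdk.out of the test project above looks roughly like this (names and hashes vary; the assembly-* directories hold the Cloud Assemblies of the “stages”):

$ ls -1F cdk.out
Proj1Stack.assets.json
Proj1Stack.template.json
assembly-Proj1Stack-Deploy1/
assembly-Proj1Stack-Deploy2/
asset.3f8d2c.../
cdk.out
manifest.json
tree.json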

What Does the Generated Pipeline Look Like?

When you use an opinionated pipeline, you can see the following generated CodePipeline actions (one way to inspect them from the CLI is shown after this list):

  • Source (with long hash as output artifact name)
  • Build with name Synth (a CodeBuild project that runs cdk synth)
  • Build with name SelfMutate (a CodeBuild project that runs cdk deploy to update the pipeline)
  • Build with name FileAsset1 (a CodeBuild project that runs cdk-assets publish). From reading sources: there might be several cdk-assets publish commands configured in the CodeBuild project.
  • Then two CloudFormation deploy actions for each “stage” you want to deploy to (change sets are used by default but can be disabled as per the documentation, see useChangeSets):
    • CHANGE_SET_REPLACE
    • CHANGE_SET_EXECUTE
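
As for inspecting the generated pipeline from the CLI, the plain AWS commands are enough (the pipeline name is whatever the construct generated for you):

aws codepipeline list-pipelines
aws codepipeline get-pipeline-state --name <pipeline-name>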

cdk-assets

“It will take the assets listed in the manifest, prepare them as required and upload them to the locations indicated in the manifest.” (from the cdk-assets documentation)

Note that cdk-assets does not make any decisions; the metadata in the cdk.out directory describes the assets, how to build/transform them, and where they go.

cdk-assets can only handle two types of assets:

  • files (including directories). cdk-assets knows how to zip directories and how to upload files and directories to S3.

    (From reading the source code) I didn’t see this in use, but apparently cdk-assets can also run an executable to package a file (or directory?). In this case the content-type of the output is assumed to be application/zip. 🤷‍♂️
  • Docker images. cdk-assets knows how to build Docker images and push them into a registry.

Sample command to see the list of assets: npx cdk-assets ls -p cdk.out/Proj1Stack.assets.json
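
The publish side is similar; this is roughly what the FileAsset CodeBuild project runs for you (you normally don’t run it by hand):

npx cdk-assets publish -p cdk.out/Proj1Stack.assets.json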

What is Built When?

Files – unprocessed

If the files/directories don’t need any processing, they are just copied over to cdk.out during cdk synth and given a name which is a hash of the contents.

Example: Lambda function code

Files – processed

The processing happens during cdk synth, so cdk.out contains the already-processed assets.

Example: Node.js Lambda function code (processed by tsc (optionally) and esbuild)

Docker Images

cdk-assets builds a Docker image and pushes it to the specified repository. The input for the image build is a directory in cdk.out that contains the Dockerfile and related files.

Deploy

After everything has been built and uploaded by cdk synth and cdk-assets, the deploy uses the CloudFormation template (templates?) from the cdk.out directory. At this point the assets (which the template references) are already in ECR and S3.


  • I tried to condense the information that seemed important.
  • Let me know if something is missing or if you see mistakes.
  • The plan is to update this post as I discover new information.
  • My mistakes so far
    • Started taking notes quite a few hours into the process instead of from the start. In particular, it would have saved me the jumping between the pipeline and the build projects to re-check what each action does.
    • Editing this post while tired
  • Last edit: 2023-01-20

AWS CDK – Proposed Slogans

Below, despite the humor, is my honest praise to the AWS CDK team and the product.

  1. Finally bringing code into “infrastructure as code”
    (sorry Puppet, Ansible, CloudFormation, SAM, Terraform)
  2. The only team at AWS that actually cares about your experience
  3. Suffer much less
  4. No more dealing with IAM policies*
    * almost
  5. Did you know that CodePipeline actually requires an S3 bucket to work?
  6. CloudFormation? Ye, nice intermediate representation, you know, like assembler with macros.
  7. Making interaction with AWS bearable
    (I would say “again” but it never was)
  8. So right on so many levels
  9. Cloud – it doesn’t have to be ugly
  10. CDK – Cool Developers Know
  11. Isolating you from the ugly
  12. We ate the shit so you wouldn’t have to*
    * mostly
  13. Don’t Look Up^W at the generated CloudFormation

Have a nice day!