Elixir Clustering with libcluster and AWS ECS Fargate in CDK

What Are We Doing?

Elixir clustering allows you to make remote function calls on other machines using a variety of techniques. Clustering nodes on your local machine with libcluster is easily done using the LocalEpmd strategy, but is much trickier to implement in real-world AWS conditions using serverless technology. Since we're using Fargate, and assuming you want to scale horizontally, the Gossip strategy is attractive since you don't know how many or which nodes will be available. Unfortunately, ECS Fargate tasks flat-out cannot do UDP multicasting: it is only supported on EC2 Linux launch types. This leaves us with DNSPoll, which fortunately works well in AWS.

Other Resources

There are some aging discussions on ElixirForum indicating that clustering in ECS is possible, and one step-by-step guide that gets pretty close. The forum posts alone won't get you to a working configuration, and the guide gets a bit in-the-weeds for its author's use case, ultimately requiring some icky-feeling routing configuration.

Assumptions

  • Elixir 1.14

  • Erlang/OTP 24+

  • Phoenix 1.6+

  • AWS CDK 2.21+

  • Libcluster 3.3.1

  • An existing AWS VPC

  • An existing ECS Cluster

  • You are using Linux as the ECS task OS

Local Setup

You can cluster any running Elixir nodes, whether they are running the same project or different ones.

Add libcluster to your mix.exs deps function. This configuration assumes you don't want libcluster running during tests. As of this writing, libcluster 3.3.2 was not working with Phoenix 1.6+ on Elixir 1.14+, so I'm explicitly requiring 3.3.1.

Configure your application.ex, and config/dev.exs. We will configure config/prod.exs later.

mix.exs deps()

{:libcluster, "3.3.1", only: [:dev, :prod]}

application.ex start()

@env Mix.env()

def start(_type, _args) do
  children = [
    ...
  ]

  children =
    if @env != :test do
      [
        # Start libcluster
        {Cluster.Supervisor,
         [Application.get_env(:libcluster, :topologies), [name: MyApp.ClusterSupervisor]]}
        | children
      ]
    else
      children
    end

  ...
end

config/dev.exs

config :libcluster,
  topologies: [
    myapp: [
      strategy: Elixir.Cluster.Strategy.LocalEpmd
    ]
  ]

Running multiple Phoenix instances

To run two instances of the same Phoenix project, you'll need to change the endpoint's port settings in config/dev.exs:

http: [ip: {127, 0, 0, 1}, port: String.to_integer(System.get_env("PORT", "4000"))],

Now we can dynamically set the port when starting the Phoenix server.

In a terminal window, execute iex --sname a -S mix phx.server from the project root.

In a second terminal window, execute PORT=4001 iex --sname b -S mix phx.server

In either terminal window, execute Node.list() and you should see a list containing the name of the other running Phoenix node, i.e. [:b@hostname]

To see the clustering in action execute this command, replacing <node name> with the returned node name above. :erpc.call follows the MFA (module, function, arguments) pattern, and this expression will execute Node.self() on the remote node.

:erpc.call(<node name>, Node, :self, [])
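For example, from the a node (your hostname will differ), the call and its result look roughly like this:

iex(a@hostname)1> :erpc.call(:b@hostname, Node, :self, [])
:b@hostname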

Congratulations, you have your first Elixir cluster!

Production Setup

config/prod.exs

config :libcluster,
  topologies: [
    ecs: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 1000,
        query: "<DNS name of the other running service>",
        node_basename: "<node basename of the other service>"
      ]
    ]
  ]

The query value is the DNS name of the other running service, e.g. myapp.organization.local. You will establish this value when we get to Service Discovery in CDK.

The node_basename value is what will be to the left of the @ in the node name. In the examples above this would be a or b. In your project it will likely be what you set as the RELEASE_NAME environment variable or the value of name in the release struct.

The ecs: key above is just a name for this topology config; it does not matter to libcluster, Elixir, or any other library what this value is.
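For example, using the values established later in this guide (a Cloud Map service named myapp in the organization.local namespace, and RELEASE_NAME set to myapp), the topology would look like this:

config :libcluster,
  topologies: [
    ecs: [
      strategy: Cluster.Strategy.DNSPoll,
      config: [
        polling_interval: 1000,
        query: "myapp.organization.local",
        node_basename: "myapp"
      ]
    ]
  ]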

Release

In your project root, run mix phx.gen.release --docker, then mix release.init.

Customize the Dockerfile emitted by phx.gen.release for your project's needs. To follow this guide, the runner stage of your image will need curl and jq, which can be installed with apt-get install -y curl jq (a sketch is shown after the EXPOSE statements below). Additionally, you will probably need to add the following statements.

# Default Phoenix server port
EXPOSE 4000

# Erlang EPMD port
EXPOSE 4369

# Intra-Erlang communication ports
EXPOSE 9000-9010

# :erpc default port
EXPOSE 9090
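As a sketch of the curl and jq installation mentioned above, assuming the Debian-based runner stage that phx.gen.release emits (your generated package list may differ):

# Runner stage: add curl and jq alongside the packages phx.gen.release already installs
RUN apt-get update -y && \
    apt-get install -y libstdc++6 openssl libncurses5 locales curl jq && \
    apt-get clean && rm -f /var/lib/apt/lists/*_*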

In the env.sh.eex generated by mix release.init add the following

# The interpolated env var is created by default by AWS ECS
# For more info see https://docs.aws.amazon.com/AmazonECS/latest/userguide/task-metadata-endpoint-v4-fargate.html
TASK_METADATA=$(curl ${ECS_CONTAINER_METADATA_URI_V4}/task)
HOSTNAME=$(echo $TASK_METADATA | jq -r '.Containers[0].Networks[0].IPv4Addresses[0]')

export RELEASE_DISTRIBUTION=name

export RELEASE_NODE="$RELEASE_NAME@$HOSTNAME"

Alternatively, you can use "<%= @release.name %>@$HOSTNAME" as the value of RELEASE_NODE if your Elixir project name and your desired node basename are the same value.

In vm.args.eex, add the following to restrict the ports used for distribution connections between nodes (you can add the same lines to remote.vm.args.eex so remote shells use the same range).

-kernel inet_dist_listen_min 9000
-kernel inet_dist_listen_max 9010

More info about Elixir releases can be found in the Mix documentation for mix release.

Infrastructure

The following is an outline of a CDK stack configuring the necessary infrastructure components for launching ECS Fargate tasks in an existing ECS Cluster, within an existing VPC. You will need to update the resource IDs and references throughout this example with your actual values.

Explanations for certain decisions and alternatives are found in the comments above the respective resource statement.

import * as cdk from 'aws-cdk-lib';
import { Duration, Stack} from 'aws-cdk-lib';
import * as ec2 from 'aws-cdk-lib/aws-ec2';
import * as ecs from 'aws-cdk-lib/aws-ecs';
import * as iam from 'aws-cdk-lib/aws-iam';
import * as servicediscovery from 'aws-cdk-lib/aws-servicediscovery';
import * as ssm from 'aws-cdk-lib/aws-ssm';
import * as crypto from 'crypto';
import { Construct } from 'constructs';

export class EcsStack extends Stack {
  constructor(
    scope: Construct,
    id: string,
    props: cdk.StackProps
  ) {
    super(scope, id, props);

    const vpcId = '<my vpc id>';
    const clusterName = '<my cluster name>';

    const vpc = ec2.Vpc.fromLookup(this, 'my-vpc', { vpcId });

    const cluster = ecs.Cluster.fromClusterAttributes(this, 'my-cluster', {
      vpc,
      clusterName,
      securityGroups: []
    });

    const taskRole = new iam.Role(this, 'my-task-role', {
      assumedBy: new iam.ServicePrincipal('ecs-tasks.amazonaws.com'),
    });

    /**
     * Enabling this setting lets us start remote sessions on the 
     * running task. See the AWS blog for more 
     * https://aws.amazon.com/blogs/containers/new-using-amazon-ecs-exec-access-your-containers-fargate-ec2/
     */
    const ecsCommandPolicy = new iam.PolicyStatement({
      actions: [
        "ecs:ExecuteCommand",
      ],
      resources: [`arn:aws:ecs:<region>:<account>:cluster/${cluster.clusterName}`],
    });

    taskRole.addToPrincipalPolicy(ecsCommandPolicy);

    const taskDefinition = new ecs.FargateTaskDefinition(this, 'my-td', {
      taskRole: taskRole,
      executionRole: taskRole,
      memoryLimitMiB: 1024,
      cpu: 512,
    });

    const serviceSecurityGroup = new ec2.SecurityGroup(
      this,
      'my-service-sg',
      {
        vpc,
        securityGroupName: 'my-service-sg',
      },
    );

    /** 
     * Before two elixir nodes can cluster, they must share a secret
     * cookie. For a longer explanation, see 
     * https://fly.io/docs/elixir/the-basics/clustering/#the-cookie-situation
     * In the meantime, this assumes you have already set 
     * an SSM parameter to hold this shared cookie. 
     */
    const releaseCookie = 
      ssm.StringParameter.fromStringParameterName(
        this, 
        'my-ssm-release-cookie', 
        '/path/to/your/release-cookie'
      )

    /**
     * Setting RELEASE_NAME is optional here, as you may
     * rely on the release struct mentioned earlier in this guide.
     */
    const container = taskDefinition.addContainer('my-tc', {
      containerName: 'my-tc',
      image: ecs.ContainerImage.fromAsset('../'),
      essential: true,
      environment: {
        SECRET_KEY_BASE: crypto.randomBytes(32).toString('hex'),
        RELEASE_NAME: "myapp",
        RELEASE_COOKIE: releaseCookie.stringValue
      },
    });

    /**
     * This example creates a service discovery service and namespace 
     * for each application. If you have access to the configuration
     * for your existing ECS Cluster, or if you are creating a new
     * ECS Cluster you can create a single service discovery service
     * and namespace. The reasoning here is that we have to provide
     * a namespace in the FargateService definition and the only way
     * to obtain an existing namespace is from the cluster's
     * defaultCloudMapNamespace property. So, if you cannot
     * update the existing ECS Cluster with a default Cloud Map
     * namespace, you will have to do as you see here.
     */
    const namespace = new servicediscovery.PrivateDnsNamespace(this, 'ecs-cluster-namespace', {
      name: 'organization.local',
      vpc,
    });

    const serviceDiscovery = namespace.createService('ecs-cluster-servicediscovery', {
      dnsRecordType: servicediscovery.DnsRecordType.A,
      dnsTtl: cdk.Duration.seconds(10),
    });

    /**
     * As explained above, if you have an existing ECS Cluster
     * with a default Cloud Map namespace, you can provide it to the
     * FargateService cloudMapOptions with
     * cluster.defaultCloudMapNamespace instead of creating a new
     * namespace and service discovery namespace.
     *
     * The `query` value in the libcluster configuration will be a
     * combination of the cloudMapOptions.name followed by the 
     * cloudMapNamespace.name, in this case "myapp.organization.local"
     */
    const service = new ecs.FargateService(this, 'my-service', {
      cluster,
      taskDefinition,
      healthCheckGracePeriod: Duration.seconds(60),
      serviceName: 'my-service',
      securityGroups: [serviceSecurityGroup],
      desiredCount: 1,
      enableExecuteCommand: true,
      cloudMapOptions: {
        name: 'myapp',
        cloudMapNamespace: namespace,
      }
    });

    /**
     * This is perhaps the most critical aspect of this configuration.
     * Alternatively, you could use a shared security group where all
     * applications you wish to cluster are members. In that case you
     * would set connections.allowInternally(ec2.Port.allTraffic())
     *
     * The code below restricts communication between security groups
     * to the specified ports. As far as I can tell the only
     * libcluster strategy that relies on UDP traffic is Gossip,
     * and Erlang's EPMD (which undergirds Elixir clustering) 
     * operates on port 4369/TCP. However, :erpc defaults to port
     * 9090/TCP and, once the cluster is created with EPMD,
     * intra-erlang communication is performed over
     * a random high-number port unless a range is given like we
     * established in vm.args.eex. An optional statement is
     * provided below to allow all traffic between security groups.
     * For a live Phoenix app you will also need to configure an ALB
     * targeting port 4000 and update your security group to allow
     * traffic from the ALB to port 4000/TCP. This is beyond the scope
     * of this guide.
     */
    const myapp2SecurityGroup = ec2.SecurityGroup.fromLookupByName(this, 'myapp2-sg', 'myapp2-service-sg', vpc);

    // Uncomment this statement to allow all traffic from other security group
    //serviceSecurityGroup.connections.allowFrom(myapp2SecurityGroup, ec2.Port.allTraffic())

    serviceSecurityGroup.connections.allowFrom(myapp2SecurityGroup, ec2.Port.tcp(4369), 'Erlang EPMD port');
    serviceSecurityGroup.connections.allowFrom(myapp2SecurityGroup, ec2.Port.tcp(9090), 'erpc default port');

    /**
     * Allow traffic on the intra-Erlang communication port range
     * we opened in vm.args.eex earlier.
     * Erlang uses the EPMD port 4369 to find the available nodes
     * and then assigns each node a listener port from the range
     * specified in the inet_dist_listen_min/max range.
     */
    serviceSecurityGroup.connections.allowFrom(myapp2SecurityGroup, ec2.Port.tcpRange(9000, 9010), 'Connection ports')

  }
}

Duplicate this configuration for each additional project, changing the security group lookup to reference the security group of the project you want to allow to connect. Alternatively, configure your respective FargateService resources with a single shared security group, as described in the comments within the example above; you will still need to duplicate the rest of the configuration for additional projects, but you can omit the security group rules.
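As a rough sketch of that shared security group alternative (the group name shared-cluster-sg is hypothetical; every clustered service would reference the same group):

// Look up the security group shared by all clustered services
const sharedSecurityGroup = ec2.SecurityGroup.fromLookupByName(this, 'shared-sg', 'shared-cluster-sg', vpc);

// Allow all traffic between members of the shared group
sharedSecurityGroup.connections.allowInternally(ec2.Port.allTraffic());

// ...and pass securityGroups: [sharedSecurityGroup] to each FargateService
// instead of creating a per-service security group.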

Deployment and After

Deploy this CDK stack with your CI/CD pipeline. Once your tasks are up and running, you can remote in with the AWS CLI using this command.

aws ecs execute-command --command="bin/myapp remote" --cluster my-cluster --task <task id> --interactive

This will open a remote iex session in the running task. If all is well you can run Node.list() and you will see your clustered nodes in the response. From there your code can use any of the various strategies for distributed Elixir. Happy clustering!
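In the remote session, the check looks roughly like this (node names and IP addresses will reflect your own tasks):

iex(myapp@10.0.1.23)1> Node.list()
[:"myapp2@10.0.2.45"]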

Troubleshooting

One frustrating thing that can go wrong during development is needing to change or remove a PrivateDnsNamespace you created while figuring out how you want everything set up. This can cause your CDK deployments to fail, because CloudFormation cannot remove the namespace while it still has resources attached. Navigate to Cloud Map within the AWS Console, click through to the namespace you wish to delete, and manually delete its resources. This clears the way for CDK to delete the deprecated namespace.
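If you prefer the CLI, the same cleanup can be done with the servicediscovery commands; the IDs below are placeholders you would look up first, and a service cannot be deleted until its registered instances are gone (ECS deregisters them as the related tasks stop):

aws servicediscovery list-namespaces
aws servicediscovery list-services --filters Name=NAMESPACE_ID,Values=ns-xxxxxxxxxxxxxxxx,Condition=EQ
aws servicediscovery delete-service --id srv-xxxxxxxxxxxxxxxx
aws servicediscovery delete-namespace --id ns-xxxxxxxxxxxxxxxx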