
Deploying IPFS Cluster using AWS Fargate and Amazon EFS One Zone



[Figure: architecture diagram of peers A-D in an IPFS swarm. Image source: https://ipfscluster.io/]

Introduction


IPFS (InterPlanetary File System) is a popular decentralized storage solution used for many use cases, like decentralized applications, peer-to-peer data sharing, and immutable file systems. For more usage ideas, see these examples.

IPFS Cluster is another application that runs alongside IPFS and provides data orchestration across a swarm of IPFS daemons by allocating, replicating, and tracking a global pinset distributed among multiple peers.

For example, if you want to create a non-fungible token (NFT) collection, you should store your assets on IPFS. IPFS ensures that your assets are stored in a decentralized way and are immutable. This way, your NFTs can live on the internet independently from you, your business, your hosting solution, and your domain name.

In this post, we’ll deploy a highly available and serverless IPFS Cluster using AWS Fargate and Amazon EFS One Zone. This stack helps you get started quickly and deploy your own IPFS Cluster instead of relying on third-party IPFS pinning services.

Solution overview

This IPFS stack is suitable for various types of projects, like storing and serving NFT assets, storing Docker images, hosting a website, or distributing files across multiple Regions. It’s meant to be cost effective and easy to deploy and operate.

We’ll use AWS Fargate to run IPFS, which is a serverless, pay-as-you-go compute engine for containers that lets you focus on building applications without managing servers. Amazon Elastic Container Service (Amazon ECS) is a container orchestration service that manages AWS Fargate tasks and services and helps you run container clusters.

We’ll deploy three Amazon ECS services, each running a single AWS Fargate task. Each task is in charge of running two containers in the same environment: one container for IPFS and one for IPFS Cluster.

Amazon EFS provides a simple, serverless, and performant elastic file system service that can be mounted in our AWS Fargate tasks and containers. As IPFS Cluster replicates the pinned content of our cluster across multiple Availability Zones (AZs), it provides high availability automatically, so we can use Amazon EFS One Zone to lower our storage cost.

Amazon EFS is a network file system, so there are some performance limitations to consider. If you need high-performance storage and throughput, then running IPFS on Amazon EC2 with storage services like Amazon EBS or Amazon FSx is recommended. Take a look at the Amazon EBS gp3 volume performance here to compare with Amazon EFS.

In front of our IPFS nodes, we deploy one Application Load Balancer (ALB), which load balances traffic among our three IPFS Gateway and IPFS Cluster REST API endpoints. The ALB also monitors the nodes and takes them out of rotation if they become unhealthy.

To make our IPFS Gateway ALB endpoint available over HTTPS, we create an Amazon CloudFront distribution that uses port 80 of our ALB as its origin. Amazon CloudFront will be our Content Delivery Network (CDN). It improves performance by caching our IPFS files at the edge and enforces HTTPS access to your files. You can configure the distribution to serve files under your own DNS domain name.

We’ll interact with IPFS through the IPFS Cluster REST API, which controls IPFS. To make our IPFS Cluster REST API secure, we create another Amazon CloudFront distribution that uses port 9094 of our ALB as its origin. Amazon CloudFront secures the connection with HTTPS and safely passes our credentials to the REST API. This way, we can operate and add files to our cluster remotely.

[Figure: IPFS network architecture diagram]

Walkthrough

Deploying the solution

Prerequisites

To deploy this solution you will need:

  • An AWS account and proper AWS Identity and Access Management (AWS IAM) permissions to create all the resources needed for this architecture
  • A VPC with three public subnets in three different AZs
  • A Linux, macOS, or Windows machine with shell access to run the IPFS Cluster commands and interact with your cluster

Requirements

To deploy this solution, we assume that you have deployed a VPC with at least three public subnets in three different AZs. The default VPC in your region should work if there are at least three AZs in that region.

In this blog, we’ve deployed a dedicated VPC in the us-east-1 (Virginia) region following this best practice architecture. Use this quick link to deploy this VPC stack in your AWS account.

Important: To ensure your deployment doesn’t fail with the error “The maximum number of rules per security group has been reached”, make sure this limit is at least 120. If it isn’t, then please use this link to request a limit increase in your AWS console.

Tools:

We’ll use two scripts, ipfs-cluster-service and ipfs-cluster-ctl, to configure and manage our IPFS Cluster.

AWS CDK stack

You can also deploy this stack using AWS CDK. With the AWS CDK stack, you can configure your architecture to use Standard Amazon EFS using different mount points instead of Amazon EFS One Zone. It’s also easier to customize and make your template dynamic.

See the project repository here (courtesy of Paul Lu).

AWS CloudFormation stack

Download the AWS CloudFormation template here.

Before creating a new stack, you need to generate some secrets to pass as environment variables to your cluster. Those secrets are required by the AWS CloudFormation stack:

  1. CLUSTER_ID and CLUSTER_PRIVATEKEY: The ID and private key of the bootstrap node that will be referenced by the other nodes. To generate both, run the following command: ./ipfs-cluster-service -c /tmp/ipfs init. It generates a JSON identity file in the temporary folder you specified (/tmp/ipfs in this example). In this file, you’ll find the id and private_key values you need. Copy both values and paste them into your AWS CloudFormation parameters. See this documentation for more information about cluster configuration.
  2. CLUSTER_SECRET: The ipfs-cluster-service command also generated a service.json file. At the top of that file, you’ll find a secret. Copy the secret and paste it into your AWS CloudFormation parameters. You can then delete the /tmp/ipfs folder, as we won’t need it anymore.
  3. CLUSTER_RESTAPI_BASICAUTHCREDENTIALS: The login and password used to secure your IPFS Cluster REST API with HTTP basic authentication. The format is login:password.

[Screenshot: AWS CloudFormation parameter fields for the cluster secrets]
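To recap, here’s a minimal shell session for generating all three values locally, assuming the ipfs-cluster-service binary is in your current directory:

$> ./ipfs-cluster-service -c /tmp/ipfs init
# CLUSTER_ID and CLUSTER_PRIVATEKEY are the "id" and "private_key" values here:
$> cat /tmp/ipfs/identity.json
# CLUSTER_SECRET is the "secret" value near the top of the service configuration:
$> grep secret /tmp/ipfs/service.json
# Clean up once you've copied the values into your CloudFormation parameters:
$> rm -rf /tmp/ipfs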

Before deploying, make sure of the following:

  • The name you give to your stack will be appended to the names of all resources created by AWS CloudFormation. Make sure your stack name is no longer than 18 characters.
  • Make sure that you selected the right VPC, Availability Zones, and matching Subnets in the stack’s parameters drop-downs.

After generating those secrets and checking your input parameters, you can now deploy the AWS CloudFormation stack. The deployment should take a few minutes.

After a successful deployment, you should see a new cluster in the Amazon ECS console, three new Amazon EFS filesystems, one new load balancer, and two new CloudFront distributions. New private DNS entries also appear in Route 53.

In the AWS CloudFormation stack’s Outputs tab in the console, you’ll see the two public Amazon CloudFront DNS endpoints we created to access the IPFS Gateway and IPFS Cluster REST API:

  • IPFS Gateway endpoint: Allows you to get IPFS files through HTTPS. In your browser, just append /ipfs/${CID} to the endpoint domain name to access any folder or file CID on the IPFS network (see the example after this list).
  • IPFS Cluster REST API endpoint: Allows you to control your IPFS Cluster. You can pass this URL to the ipfs-cluster-ctl command to remotely control your cluster (see below).
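For example, a quick way to test the gateway from the command line; the CloudFront domain below is a placeholder, so use the value from your own Outputs tab:

$> GATEWAY_ENDPOINT=d1234abcd.cloudfront.net
$> curl -o my_file.mp4 "https://${GATEWAY_ENDPOINT}/ipfs/${CID}"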

Testing the cluster

We’ll remotely connect to our IPFS Cluster REST API to check the health of our IPFS and IPFS Cluster containers.

We use ipfs-cluster-ctl to interact with the IPFS Cluster REST API. The following command should return the list of IPFS Peers in your cluster.

$> ./ipfs-cluster-ctl -l /dns/${REST_API_DNS_ENDPOINT}/tcp/443 --secret ${CLUSTER_SECRET} --basic-auth ${CLUSTER_RESTAPI_BASICAUTHCREDENTIALS} peers ls

Note: As we distribute traffic to three IPFS nodes with our ALB, each API call may connect to a different node.

Note: We are using port 443 to connect to our IPFS Cluster REST API AWS CloudFront endpoint, not the default port 9094. Behind AWS CloudFront, the traffic is forwarded to port 9094 on the ALB and then to port 9094 on the IPFS Cluster containers.

Adding files to your IPFS Cluster

To add files to our cluster, run the following command:

$> ./ipfs-cluster-ctl -l /dns/${REST_API_DNS_ENDPOINT}/tcp/443 --secret ${CLUSTER_SECRET} --basic-auth ${CLUSTER_RESTAPI_BASICAUTHCREDENTIALS} add ~/files/my_file.mp4

The file will get uploaded to IPFS and the content identifier (CID) of the new file should be displayed as follows:

added QmVPF6CBups2HY5qCkbGYCR5vdqTJ5Dk3UgfN3xxxxxXX my_file.mp4

Adding a file using the IPFS Cluster API automatically pins your file and replicates it across all your nodes.
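To verify the replication, you can query the pin status of the new CID on every peer, using the same connection flags as before:

$> ./ipfs-cluster-ctl -l /dns/${REST_API_DNS_ENDPOINT}/tcp/443 --secret ${CLUSTER_SECRET} --basic-auth ${CLUSTER_RESTAPI_BASICAUTHCREDENTIALS} status ${CID}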

For more information about the IPFS Cluster REST API and how it manages upload and pinning, see here.

Accessing files from your IPFS Gateway

Now that your file has been added and pinned to your IPFS Cluster, you can access it through your own IPFS Gateway.

Copy the DNS endpoint of your IPFS Gateway, located in the Outputs tab of your AWS CloudFormation stack, and append /ipfs/${CID} to it. As the files are pinned on your own cluster, the response time should be very quick.

You can also access the file you uploaded through a public IPFS gateway like ipfs.io as follows:

https://ipfs.io/ipfs/${CID}.

This time it’s slower, because ipfs.io needs to get the file from your IPFS Cluster before serving it through the gateway.

You can also access any public file or folder on the IPFS network through your own gateway, and you can pin existing public files on your cluster by using the ipfs-cluster-ctl pin add command, as shown below.
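For example, to pin an existing public CID onto your own cluster, again with the same connection flags:

$> ./ipfs-cluster-ctl -l /dns/${REST_API_DNS_ENDPOINT}/tcp/443 --secret ${CLUSTER_SECRET} --basic-auth ${CLUSTER_RESTAPI_BASICAUTHCREDENTIALS} pin add ${CID}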

Deep dive

High availability

The IPFS Cluster is highly available because each node runs in a different AZ. If a node fails and shuts down, our ALB notices it and stops sending traffic to that node, because we configured health checks on the ALB to monitor both our IPFS Gateway and our IPFS Cluster REST API endpoints. The overall service would remain available even if two AZs became unreachable.

If an IPFS node fails, then the Amazon ECS service associated with it notices the problem and automatically starts a new AWS Fargate task to bring the cluster back to health. Both the IPFS and IPFS Cluster containers are marked as essential in Amazon ECS, so if either container fails, the whole AWS Fargate task fails.

We also perform health checks on the ipfs container by running a command every 30 seconds that makes sure the IPFS daemon is running and responsive. Amazon ECS restarts the task if this health check fails.

As the storage is decoupled from the containers thanks to Amazon EFS, no data is lost when a task fails. The Amazon ECS service starts a new AWS Fargate task, which mounts the existing Amazon EFS file system and restores access to all the metadata and files from the previous task.

Note: Multiple IPFS nodes cannot share the same Amazon EFS filesystem or it creates conflicts at the IPFS level.

Networking

Amazon ECS uses AWS Cloud Map to manage the mapping between AWS Fargate task IP addresses and their private domain names.

The node in the first AZ is the main IPFS bootstrap node. The other two nodes have a custom container command that references our first node using DNS. You can see this command if you look at the ipfs-cluster container configuration in the Amazon ECS task definitions for the nodes in your last two AZs.

daemon, --bootstrap, /dns/ipfs-node-us-east-1a.ipfs/tcp/9096/p2p/${CLUSTER_ID}

This DNS name is recorded in Route 53 and maintained automatically by AWS Cloud Map from Amazon ECS. If a node is restarted and gets a new IP address, then the domain name is updated accordingly.

Note: The first node is our bootstrap node and doesn’t have a custom container command.

Security

Besides the IPFS swarm port 4001, no other port is directly accessible from the internet.

Communication with IPFS Gateway or IPFS Cluster REST API can only be done through AWS CloudFront, which provides custom domain capabilities, secure communication over HTTPS, and caching at the edge (for IPFS Gateway).

As of today, Basic HTTP Authentication is the only authentication method supported by the IPFS Cluster REST API. By using Amazon CloudFront with HTTPS, we ensure that the credentials are encrypted in transit, which allows us to operate our cluster remotely in a secure way.

Our secrets are safely stored in AWS Systems Manager Parameter Store and injected as environment variables into the containers. The secret values do not appear in the Amazon ECS environment; however, they still show as input parameters of the AWS CloudFormation stack.

For extra security, you could pre-create the AWS Systems Manager (SSM) Parameter Store entries containing those secrets and reference them in your template. This way, you don’t have to provide secrets as parameters to your AWS CloudFormation template.
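For instance, here’s a minimal sketch of pre-creating one such entry with the AWS CLI; the parameter name /ipfs/CLUSTER_SECRET is an arbitrary example:

$> aws ssm put-parameter --name /ipfs/CLUSTER_SECRET --type SecureString --value "${CLUSTER_SECRET}"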

Logging

Each container in each AWS Fargate task sends its logs to a separate Amazon CloudWatch log group. The logs are directly accessible in Amazon CloudWatch, or from the Amazon ECS console when you open the Amazon ECS service or AWS Fargate task.

Both IPFS logs and IPFS Cluster logs are available and can be used to observe the cluster behavior and troubleshoot.

Monitoring

It’s vital to know the health of your cluster. Here are some of the things to actively monitor:

  • Your Amazon ECS Cluster and AWS Fargate tasks – This is easily done using Container Insights in Amazon CloudWatch.
  • Your IPFS and IPFS Cluster container logs – Those logs are in Amazon CloudWatch, and by using metric filters you can raise an alarm if a specific log message appears (see the sketch after this list).
  • Your Amazon EFS storage – Using the default Amazon CloudWatch metrics on Amazon EFS, you can keep an eye on the storage size of each file system.
  • IPFS Cluster also provides monitoring tools you can run alongside your cluster. See the documentation.
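As an example, here’s a minimal sketch of such a metric filter using the AWS CLI; the log group name /ecs/ipfs-cluster and the metric names are placeholders, and a CloudWatch alarm can then watch the resulting metric:

$> aws logs put-metric-filter --log-group-name /ecs/ipfs-cluster --filter-name ipfs-cluster-errors --filter-pattern "ERROR" --metric-transformations metricName=IpfsClusterErrors,metricNamespace=IPFS,metricValue=1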

Taking it further

Private IPFS Cluster

You can create a private IPFS Cluster by setting the IPFS_SWARM_KEY environment variable in the Amazon ECS task definition of your ipfs containers. This makes sure your IPFS nodes can only communicate with each other and not with the public IPFS network.
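A swarm key is a 32-byte random value in a simple text format; one common way to generate it on a Linux or macOS shell is:

# Print a freshly generated swarm key (three lines: format header, encoding, random hex):
$> echo -e "/key/swarm/psk/1.0.0/\n/base16/\n$(od -vN 32 -An -tx1 /dev/urandom | tr -d ' \n')"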

One of your nodes can be elected as the “bootstrap node” and referenced by the other two nodes. To remove the default bootstrap list set by IPFS and add your own bootstrap node, you can follow this documentation.

You’ll need to create your own Docker image from the official one. Your Dockerfile will copy a script into the /container-init.d folder inside the IPFS container, which IPFS executes after the initialization phase. The script clears the default bootstrap list and adds your own bootstrap node instead.
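Such a script could be a minimal sketch like the following; the multiaddress and ${BOOTSTRAP_PEER_ID} are example values for your own bootstrap node:

#!/bin/sh
# Clear the default public bootstrap list...
ipfs bootstrap rm --all
# ...and add our own bootstrap node instead.
ipfs bootstrap add /dns/ipfs-node-us-east-1a.ipfs/tcp/4001/p2p/${BOOTSTRAP_PEER_ID}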

You can store your new Docker image easily by using Amazon ECR, then update the Amazon ECS task definition so the ipfs container uses your custom image instead of the default one.

Note: With a private IPFS Cluster you can only access files known to your cluster. Public files won’t be accessible.

Reduce compute cost by using Fargate SPOT

To reduce compute cost, you can use Fargate SPOT, which can offer a discount of up to 70% off the regular AWS Fargate price.

A possible architecture would be to keep the bootstrap node running on AWS Fargate to guarantee high availability, and run all the other nodes on AWS Fargate SPOT. As we have an ALB in front of our IPFS Cluster, the load balancer detects when a SPOT task gets reclaimed by AWS and takes it out of rotation. Amazon ECS automatically starts a new SPOT task and adds it back to the ALB.

Reduce EFS storage cost

Amazon EFS One Zone has automatic backups turned on by default. Since the cluster already replicates data across volumes, for safety you may want to keep automatic backups on for only one volume.

Amazon EFS Infrequent Access is another option that can reduce storage cost by up to 92%. Using lifecycle policies, Amazon EFS moves files to the Infrequent Access storage tier automatically after a given period of time, as sketched below.
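Here’s a minimal sketch of enabling such a lifecycle policy with the AWS CLI; the file system ID is a placeholder, and you would apply this to each of the three volumes:

$> aws efs put-lifecycle-configuration --file-system-id fs-0123456789abcdef0 --lifecycle-policies TransitionToIA=AFTER_30_DAYS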

As the IPFS Cluster replicates data across all three Amazon EFS volumes, both options are viable for most use cases and can reduce storage cost for large clusters.

Secure IPFS Gateway with Lambda@Edge or Amazon CloudFront Functions

Right now, IPFS Gateway is publicly available and can be accessed by anyone on the internet to get any files from the public IPFS network. This is acceptable for this demonstration, but you may want to restrict access to it.

To do that, you can take a look at Lambda@Edge to run some basic authentication logic directly on Amazon CloudFront, or at Amazon CloudFront Functions for a more lightweight approach.


Gain shell access to the containers

To debug your IPFS Cluster, you may want to log directly into the containers and run some commands.

To do this, you can use Amazon ECS Exec to gain shell access to your running containers. You’ll need to create a custom IAM role, then update and redeploy your existing Amazon ECS services.
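Once ECS Exec is enabled, opening a shell in the ipfs container looks something like this; the cluster name and task ID are placeholders:

$> aws ecs execute-command --cluster <cluster-name> --task <task-id> --container ipfs --interactive --command "/bin/sh"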

Use Amazon S3 as a storage solution

To reduce your storage cost, you could store your IPFS files on Amazon S3 instead of Amazon EFS One Zone.

Using this plugin, you could configure your IPFS nodes to use Amazon S3 as their storage layer. However, this requires you to build your own IPFS and IPFS Cluster Docker images and install all the dependencies so that both containers have access to Amazon S3.

Cleaning up

To remove all resources deployed in this blog, delete the AWS CloudFormation stack you created earlier. It will delete all AWS resources and all the files previously stored on the cluster.

Conclusion

In this post, we showed you how to deploy and run your own IPFS Cluster using IPFS and IPFS Cluster Docker containers. Our solution is serverless, which reduces cost, and highly available, making it ideal for web2 and web3 applications that consume content from IPFS.

IPFS is also an interesting approach to data replication over multiple Regions. We can imagine multiple clusters, like the one that we presented in this post, running in multiple Regions and being part of one big global IPFS Cluster.

The main advantage of this solution is that it’s fully serverless, which greatly reduces the operational work and operating cost.
