Hosting a Website

Hosting a website is a common task, but sometimes you may want to expand your website’s functionality with more complex services like databases, analytics, or machine learning. Hosting your website on AWS is a great way to bridge this gap. Shown below are some of the most common ways of hosting a website.

Note: Any of these setups can be as simple or as complicated as you want it to be.

Pre-Reqs

Prepare Code

Have your website ready to deploy, whether it be a React app, WordPress site, ASP.NET application, or plain old HTML, CSS, and JS.

Deployment option 1

Lightsail

Difficulty: Easy

Lightsail is the easiest way to deploy a website on AWS. Simply upload your code or pass a link to your Docker image repository. Lightsail will then provide you with a link to access your public site online. The OSU App Store was deployed using this method.

Deployment option 2

Amplify

Difficulty: Easy

Amplify is another easy way to deploy a website on AWS. Simply upload your code or pass a link to the GitHub repository containing your code. Amplify will then provide you with a link to access your public site online. Amplify also makes it easy to extend your site’s functionality by integrating login, API, and database resources.

Deployment option 3

S3

Difficulty: Moderate

S3 is basically like Google Drive: a place you can upload your static files to. This method is best suited for simple HTML, CSS, and JS files. Once you upload your files, S3 will provide you with a link to access them publicly on the web. You can optionally add a content delivery network like CloudFront in front of your URL for localized caching of web content. This reduces the number of direct calls to your S3 content because files are fetched from the CloudFront cache instead.
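
As a rough illustration, here is a minimal boto3 sketch that turns on static website hosting and uploads a page (the bucket and file names are placeholders, not anything from this guide):

    import boto3

    s3 = boto3.client("s3")
    bucket = "my-website-bucket"  # hypothetical bucket name

    # Enable static website hosting on the bucket.
    s3.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration={
            "IndexDocument": {"Suffix": "index.html"},
            "ErrorDocument": {"Key": "error.html"},
        },
    )

    # Upload a page with the right content type so browsers render it as HTML.
    s3.upload_file(
        "index.html", bucket, "index.html",
        ExtraArgs={"ContentType": "text/html"},
    )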

Elastic Beanstalk vs Elastic Container Service

Elastic Beanstalk Difficulty: Semi Advanced
Elastic Container Service Difficulty: Advanced

Deployment options 4 and 5 can be deployed using either Elastic Beanstalk or Elastic Container Service. The main difference between the two is that Elastic Beanstalk provides an easy-to-use interface that deploys an Elastic Container Service under the hood, and some options may not be available when using Elastic Beanstalk. If you want fine-grained control over your site’s architecture, use Elastic Container Service. Both services allow you to spin up one or more copies of your site across several servers. You provide a link to the Docker image of your website code and a copy of it runs on each server. The number of servers can be automatically scaled by an auto-scaling group, which monitors the traffic to your site and compares it to the CPU and memory available on the current servers. If there’s too much traffic and not enough CPU or memory, it will automatically spin up more servers to meet the demand. Lastly, a load balancer will provide you with a URL to visit your site and will evenly distribute traffic across your servers.
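
For a sense of what this looks like in code, here is a hedged boto3 sketch of creating an ECS service that runs three copies of a site behind a load balancer (the cluster, task definition, and target group names are all hypothetical):

    import boto3

    ecs = boto3.client("ecs")
    ecs.create_service(
        cluster="my-cluster",               # hypothetical cluster of servers
        serviceName="my-website",
        taskDefinition="my-website-task",   # points at your site's Docker image
        desiredCount=3,                     # run three copies of the site
        launchType="EC2",
        loadBalancers=[{
            "targetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/web/abc123",
            "containerName": "web",
            "containerPort": 80,
        }],
    )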

Deployment option 4

Fargate

Difficulty: Advanced

Fargate allows you to deploy servers in the cloud without the hassle of managing them. They won’t show up as traditional EC2 servers, nor will they appear in your account, since they are actually managed by AWS. This makes deploying easier because server management is handled by AWS. You can only select from predefined server sizes, which will be used for running your website code.
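
A minimal sketch of launching a container on Fargate with boto3, assuming a hypothetical cluster, task definition, and subnet:

    import boto3

    ecs = boto3.client("ecs")
    ecs.run_task(
        cluster="my-cluster",
        taskDefinition="my-website-task",
        launchType="FARGATE",          # AWS manages the underlying servers
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],  # placeholder subnet
                "assignPublicIp": "ENABLED",
            }
        },
    )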

Deployment option 5

EC2

Difficulty: Advanced

EC2 is the traditional way of deploying servers. These servers show up under your account, as they are managed by you. You have more control over the specifics of your servers, such as the security permissions they have and the virtual cloud network they are attached to. You also have more control over what software is deployed onto the servers. It requires more management and maintenance, though. You can still select a predefined server type, which will be used for running your website code.
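
As a sketch, launching a single EC2 server with boto3 looks like the following (the AMI, key pair, and security group ids are placeholders):

    import boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(
        ImageId="ami-0123456789abcdef0",  # placeholder Amazon Machine Image
        InstanceType="t3.micro",          # the predefined server type
        MinCount=1,
        MaxCount=1,
        KeyName="my-key-pair",            # hypothetical SSH key pair
        SecurityGroupIds=["sg-0123456789abcdef0"],  # controls network permissions
    )
    print(response["Instances"][0]["InstanceId"])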

Deployment option 6

Elastic Kubernetes Service

Difficulty: Hard

Elastic Kubernetes Service is based on the popular open-source Kubernetes system for container orchestration. It does the same thing as Elastic Beanstalk and Elastic Container Service, except that AWS doesn’t provide you with an interface for managing your servers, auto-scaling group, etc. Instead, you get a link to the Kubernetes control plane, which is where you set up your architecture and adjust settings. So if you already use Kubernetes, you can migrate to this service fairly easily.
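
If you want to fetch that control plane link programmatically, a small boto3 sketch (with a hypothetical cluster name) might look like this:

    import boto3

    eks = boto3.client("eks")
    cluster = eks.describe_cluster(name="my-cluster")["cluster"]
    print(cluster["endpoint"])  # the Kubernetes control plane URL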

API

The goal of an API is to serve data requested by the user. A user typically queries data with a given filter such as an id, timestamp, or token. The server responds with the requested data or an acknowledgment that the request has been processed.

GraphQL API

AppSync

Difficulty: Moderate / Advanced

A fairly new style of API is GraphQL. It gives users more flexibility with their data queries: a user can select the exact pieces of data they want, including relationships to other pieces of data, in a single query. AppSync can run GraphQL for you. You simply write the queries/mutations a user can perform and how they map to data sources (e.g. a Lambda function, database, or other data source). The Smart Home App backend was deployed using this method.
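
As a hedged sketch, a Lambda function acting as an AppSync data source (assuming a direct Lambda resolver and a hypothetical getBook query) could look like this:

    # AppSync invokes this function with the GraphQL field name and arguments.
    def handler(event, context):
        field = event["info"]["fieldName"]  # which query/mutation was called
        args = event["arguments"]           # arguments from the GraphQL query

        if field == "getBook":
            # Placeholder lookup; a real resolver would hit a database here.
            return {"id": args["id"], "title": "Example Book"}
        raise Exception(f"Unknown field: {field}")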

REST API

API Gateway

Difficulty: Moderate

The most common type of API is REST. You typically perform create, read, update, and delete operations, which map to POST, GET, PUT, and DELETE, respectively. You can use API Gateway as the entry point to your API code. It maps out all the different paths and methods for you, and you can either have one Lambda function per path/method or one Lambda function to handle them all. The Lambda function simply runs your code, so you choose what to do with the input request. API Gateway will provide you with a public URL to use.
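
Here is a minimal sketch of the single-function style, assuming API Gateway’s Lambda proxy integration and a hypothetical /items path:

    import json

    def handler(event, context):
        method = event["httpMethod"]
        path = event["path"]

        if method == "GET" and path == "/items":
            body = {"items": []}  # placeholder response data
            return {"statusCode": 200, "body": json.dumps(body)}
        return {"statusCode": 404, "body": json.dumps({"error": "not found"})}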

Manual Way

Elastic Beanstalk, ECS, EKS

Difficulty: Advanced

Of course, it is also possible to use Elastic Beanstalk, Elastic Container Service, or Elastic Kubernetes Service to host any API. This is the most flexible, but also the most tedious, way of getting the job done. You simply package your API code in a Docker image and deploy it with one of the three services, just like a website.

Data Storage

There are several options for storing data. The following are a selection of some of the most commonly used databases as well as some unique use cases.

S3

S3, or Simple Storage Service, is like Google Drive. You can upload all sorts of files here, whether they be static website files, transaction records, logs, temporary files, or machine learning input/output data.
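
A quick boto3 sketch of the upload/read round trip (bucket and key names are placeholders):

    import boto3

    s3 = boto3.client("s3")
    # Upload a log file, then read it back.
    s3.upload_file("app.log", "my-data-bucket", "logs/app.log")
    obj = s3.get_object(Bucket="my-data-bucket", Key="logs/app.log")
    print(obj["Body"].read().decode())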

RDS

RDS, or Relational Database Service, is a solution for hosting database systems like MySQL, SQL Server, or PostgreSQL. You always store data in a relational format, which means data in one table can relate to data in another table. The format of data in each column is strict, and the data you insert must adhere to the data types that you specify. RDS is traditionally a server-based option, which means you specify the number and type of servers you need. However, as of December 2022 there is a preview option for hosting MySQL on RDS in serverless mode.
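
Once the database is up, your code connects to it like any other database server. A minimal sketch using the PyMySQL driver against a MySQL instance (the endpoint, credentials, and table are hypothetical):

    import pymysql

    conn = pymysql.connect(
        host="mydb.abc123.us-east-1.rds.amazonaws.com",  # the RDS endpoint
        user="admin",
        password="my-password",
        database="mydb",
    )
    with conn.cursor() as cur:
        # Parameterized query against a hypothetical person table.
        cur.execute("SELECT id, name FROM person WHERE id = %s", (1,))
        print(cur.fetchone())
    conn.close()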

DynamoDB

DynamoDB is an easy-to-use database for storing relational or non-relational data. You store records in a table-like format, and each record can hold an arbitrary set of attributes. You can relate data in different DynamoDB tables by storing id references in each record (e.g. a book table has a column called owner which contains the id of a person in the person table). It is a serverless service, which means you don’t manage any servers.
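
A minimal boto3 sketch of the book/owner example above, assuming a hypothetical book table whose primary key is id:

    import boto3

    dynamodb = boto3.resource("dynamodb")
    table = dynamodb.Table("book")

    # Store a book that references a person record by id.
    table.put_item(Item={"id": "book-1", "title": "Example", "owner": "person-42"})

    # Fetch it back by primary key.
    item = table.get_item(Key={"id": "book-1"}).get("Item")
    print(item)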

DocumentDB

MongoDB is one of the most popular databases for storing data in document form. It is quite similar to DynamoDB in that you can choose to store data in a relational or non-relational format. However, the data is structured more like JSON and less like a table. You can host a MongoDB-compatible cluster using DocumentDB on AWS.
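
Because DocumentDB speaks the MongoDB protocol, the standard pymongo driver works against it. A hedged sketch (the cluster endpoint, credentials, and collection are placeholders; DocumentDB requires TLS):

    from pymongo import MongoClient

    client = MongoClient(
        "mongodb://admin:my-password@mydocdb.cluster-abc123.us-east-1.docdb.amazonaws.com:27017",
        tls=True,
        tlsCAFile="global-bundle.pem",  # the Amazon CA certificate bundle
    )
    books = client["mydb"]["books"]

    # Documents are JSON-like and can nest related data directly.
    books.insert_one({"title": "Example", "owner": {"name": "Alice"}})
    print(books.find_one({"title": "Example"}))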

Timestream

Timestream is a special type of database for storing data sequentially in time. You might store weather forecasts or IoT readings here as data is aggregated over time; it is built for this use case. It is still possible to use the previously mentioned databases for storing time-series data, but Timestream provides a purpose-built solution to the problem.
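
A hedged sketch of writing one IoT reading with boto3 (the database, table, and sensor dimension are hypothetical):

    import time
    import boto3

    ts = boto3.client("timestream-write")
    ts.write_records(
        DatabaseName="iot",
        TableName="temperature",
        Records=[{
            "Dimensions": [{"Name": "sensor_id", "Value": "sensor-1"}],
            "MeasureName": "temperature",
            "MeasureValue": "21.5",
            "MeasureValueType": "DOUBLE",
            "Time": str(int(time.time() * 1000)),  # milliseconds since epoch
        }],
    )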

QLDB

QLDB, or Quantum Ledger Database, is a blockchain-style database. It stores your data sequentially (the last record added is the last item on the block “chain”), acting as a ledger. This blockchain is kept private and has no relationship to cryptocurrency.
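
As a rough sketch, appending a record to the ledger with the pyqldb driver might look like this (the ledger and table names are hypothetical, and the ledger is assumed to already exist):

    from pyqldb.driver.qldb_driver import QldbDriver

    driver = QldbDriver(ledger_name="my-ledger")

    # Each insert is appended to the ledger's immutable journal.
    driver.execute_lambda(
        lambda txn: txn.execute_statement(
            "INSERT INTO transactions ?", {"amount": 100, "to": "alice"}
        )
    )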

Data Pipelines

Data pipelines are commonly used to get data from one or more sources to one or more destinations. They can also act as buffers between sources and destinations in case the data flow has too much ongoing traffic or a destination fails to process a message. You can also chain multiple data pipelines together with intermediate steps in between. For example, you can fetch data from one pipeline, perform analytics on the data, and send the output to another data pipeline.

Kinesis Data Streams

Kinesis Data Streams is a way to pass data from one or more producers (which generate data) to one or more consumers (which process data). Data can optionally be stored on a stream for up to a year before being discarded. Consumers typically have to poll for the data. Each time you poll, you can start from the newest record, the oldest record, or your last cursor position (the last record you looked at) and go from there.
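
A minimal boto3 sketch of both sides, assuming a hypothetical stream with a single shard (TRIM_HORIZON starts from the oldest record; LATEST would start from the newest):

    import boto3

    kinesis = boto3.client("kinesis")
    stream = "my-stream"

    # Producer: write a record onto the stream.
    kinesis.put_record(StreamName=stream, Data=b'{"event": "click"}', PartitionKey="user-1")

    # Consumer: poll records starting from the oldest available record.
    shard_id = kinesis.describe_stream(StreamName=stream)["StreamDescription"]["Shards"][0]["ShardId"]
    iterator = kinesis.get_shard_iterator(
        StreamName=stream, ShardId=shard_id, ShardIteratorType="TRIM_HORIZON"
    )["ShardIterator"]
    print(kinesis.get_records(ShardIterator=iterator)["Records"])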

Kinesis Firehose

Kinesis Firehose aggregates all data it ingests from one or more producers until a period of time has elapsed (e.g. one minute) or a buffer has exceeded its maximum capacity (e.g. 1 GB of data collected). Once either criterion is met, all the aggregated data is sent out of the pipeline to its destination in a single action. Data does not get stored on the pipeline.
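
The producer side is a single call; Firehose handles the buffering and delivery. A sketch with a placeholder delivery stream name:

    import boto3

    firehose = boto3.client("firehose")
    firehose.put_record(
        DeliveryStreamName="my-delivery-stream",
        Record={"Data": b'{"event": "click"}\n'},  # buffered, then flushed in batches
    )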

MSK

MSK, or Managed Streaming for Apache Kafka, is a service for hosting Kafka data pipelines. Kafka is one of the most popular open-source data pipelines. It does much the same thing as Kinesis Data Streams, except that it allows more fine-grained control over your data pipelines, typically for more advanced DevOps developers.
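
Because MSK runs standard Kafka, ordinary Kafka clients work against it. A sketch using the kafka-python library (broker address and topic name are placeholders taken from a hypothetical cluster):

    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=["b-1.mycluster.kafka.us-east-1.amazonaws.com:9092"]
    )
    producer.send("clicks", b'{"event": "click"}')  # topic name is hypothetical
    producer.flush()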

EventBridge

EventBridge is typically used internally between different AWS services. As the name implies, the data pipeline from a source to a destination is event-driven, meaning that as a source generates data, it triggers the pipeline and notifies the destinations of newly arrived data. EventBridge also provides schedulers to automatically send data to destinations on a set schedule. Data does not get stored on the pipeline.
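
A minimal boto3 sketch of publishing a custom event (the source, detail type, and payload are hypothetical):

    import json
    import boto3

    events = boto3.client("events")
    events.put_events(Entries=[{
        "Source": "my.app",
        "DetailType": "OrderPlaced",
        "Detail": json.dumps({"orderId": "123"}),
        "EventBusName": "default",  # rules on this bus route the event onward
    }])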

SNS

As the name implies, SNS, or Simple Notification Service, takes data from one source and notifies one or more destinations of the new data. SNS can be used to trigger other AWS services or send out notifications through channels like AWS Simple Email Service or phone SMS texts. Data does not get stored on the pipeline.
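
Publishing fans the message out to every subscriber on the topic. A sketch with a placeholder topic ARN:

    import boto3

    sns = boto3.client("sns")
    sns.publish(
        TopicArn="arn:aws:sns:us-east-1:123456789012:my-topic",
        Message="A new record has arrived",
        Subject="New data",  # used by email subscribers
    )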

SQS

SQS, or Simple Queue Service, simply sends data from one source to another. As the name implies, data is written to and read off a queue, typically processed in FIFO (first in, first out) order. It can act as a buffer in case the destination is not ready to process the data yet, and it preserves data should the destination fail to process it. Data is stored on the pipeline until it expires after a set number of days (meaning the destination never processed the message). Consumers have to poll data off an SQS queue in order to read it.
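
A minimal boto3 sketch of the send/poll/delete cycle (the queue URL is a placeholder); deleting after processing is what stops a message from being redelivered:

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"

    # Producer side: enqueue a message.
    sqs.send_message(QueueUrl=queue_url, MessageBody='{"event": "click"}')

    # Consumer side: poll the queue, process, then delete the message.
    for msg in sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1).get("Messages", []):
        print(msg["Body"])
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])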

Step Functions

Step Functions are an easy way to build an end-to-end data pipeline solution. For example, you can place a Lambda function and connect its output to the input of another Lambda function. The second Lambda’s output can then trigger a DynamoDB write operation. AWS has a drag-and-drop interface for building out entire pipelines like this.
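
Once a pipeline is built, kicking it off from code is one call. A sketch with a placeholder state machine ARN and input payload:

    import json
    import boto3

    sfn = boto3.client("stepfunctions")
    sfn.start_execution(
        stateMachineArn="arn:aws:states:us-east-1:123456789012:stateMachine:my-pipeline",
        input=json.dumps({"orderId": "123"}),  # passed to the first step
    )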

Authentication

Authentication is important in order to ensure the privacy and integrity of a user’s data. Below is a common architecture for adding authentication to a REST API, using Firebase as the login mechanism.

Common Authentication Example

REST API Authentication with Firebase

Difficulty: Semi Advanced

Steps

  1. A user logs into the app and sends a request for private data.
  2. The app passes the user’s login credentials to Firebase.
  3. Firebase returns a JWT token that can be used for authentication. This token contains some user metadata such as their user id.
  4. An HTTPS request is sent over the internet to API Gateway with the Firebase token in the authorization header.
  5. API Gateway passes the authorization token to the Lambda authorizer (see the sketch after these steps).
  6. The Lambda authorizer validates the token against Firebase.
  7. Firebase returns a success or error message depending on whether the token was valid.
  8. The Lambda authorizer notifies API Gateway whether the authentication succeeded or failed.
  9. If authentication succeeded, the API’s Lambda function is invoked.
  10. The API code handles the user’s request and returns the requested private data for the specific user.
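
As a hedged sketch of steps 5 through 8, a Lambda authorizer (TOKEN type) can verify the Firebase JWT with the firebase_admin library and hand API Gateway an allow/deny policy. The exception handling and policy shape below are one reasonable way to do it, not the only one:

    import firebase_admin
    from firebase_admin import auth

    app = firebase_admin.initialize_app()  # uses the service account credentials

    def handler(event, context):
        # The token arrives from API Gateway's authorization header.
        token = event["authorizationToken"].replace("Bearer ", "")
        try:
            decoded = auth.verify_id_token(token)  # steps 6-7: validate with Firebase
            effect, principal = "Allow", decoded["uid"]
        except Exception:
            effect, principal = "Deny", "anonymous"

        # Step 8: return an IAM policy allowing or denying the API call.
        return {
            "principalId": principal,
            "policyDocument": {
                "Version": "2012-10-17",
                "Statement": [{
                    "Action": "execute-api:Invoke",
                    "Effect": effect,
                    "Resource": event["methodArn"],
                }],
            },
        }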