Why Is Geo-Replication Functionality Important For Apache Pulsar Users?
Most companies are either already on a public cloud or planning to move to one. Cloud architects want to design applications with near-zero downtime, usually expressed as a target such as 99.999% availability.
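To make the 99.999% figure concrete, here is a small sketch that converts an availability target into a yearly downtime budget. It assumes a 365-day year, and the function name is only illustrative. Under that assumption, 99.999% leaves roughly 5.26 minutes of downtime per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget, in minutes, for an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.9, 99.99, 99.999):
    print(f"{target}% -> {allowed_downtime_minutes(target):.2f} minutes/year")
```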
Let’s take a look at a use case to better understand this idea.
Say you work for a company that handles financial transactions for clients all over the United States. Your company hosts a web application and a database in the same data center and also uses Apache Pulsar. Financial events for home loan approvals are published to Pulsar and then consumed by a consumer application. If those events are lost, customers will have difficulty getting their loans.
If the data center goes down, for example during a power outage, the application goes down with it. Software architects must design the application for high availability and disaster recovery to prevent this.
This is where geo-replication comes in.
Geo-replication means copying the data/events to a server in another location as a backup. If the primary data center fails, the application can continue publishing events to the other data center as expected. Geo-replication supports the always-on use case for the organization.
The Perfect Example of Active-Active
A producer application in US-East publishes events to the US-East Apache Pulsar cluster, and a producer application in US-West publishes events to the US-West Apache Pulsar cluster. Both clusters are configured with asynchronous replication, so if all of US-East goes down, the producer and consumer applications in US-West keep running without an outage.
Geo-replication matters not only for databases but also for event streams like these.
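The active-active behavior can be illustrated with a toy, in-memory model. This is not the Pulsar client API: the `Cluster` class, its methods, and the message names are all hypothetical, and the sketch only shows the idea that each cluster accepts writes locally and forwards them to its peer asynchronously.

```python
from collections import deque


class Cluster:
    """Toy in-memory stand-in for one Pulsar cluster (not the Pulsar API)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.peer = None
        self.topic = []        # messages visible to local consumers
        self.outbox = deque()  # locally published messages awaiting replication

    def publish(self, message):
        """Accept a message locally and queue it for asynchronous replication."""
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        self.topic.append(message)
        self.outbox.append(message)

    def replicate(self):
        """Forward queued messages to the peer; keep them queued if either side is down."""
        if not self.up or not self.peer.up:
            return 0
        sent = 0
        while self.outbox:
            # Replicated messages are not re-queued at the peer, which avoids loops.
            self.peer.topic.append(self.outbox.popleft())
            sent += 1
        return sent


east, west = Cluster("us-east"), Cluster("us-west")
east.peer, west.peer = west, east

east.publish("loan-approved-1")
west.publish("loan-approved-2")
east.replicate()
west.replicate()

east.up = False                  # simulate a US-East outage
west.publish("loan-approved-3")  # US-West keeps accepting events
west.replicate()                 # nothing is sent while the peer is down

east.up = True                   # US-East recovers
west.replicate()                 # the backlog is delivered
print(east.topic)
```

Because replication in this model is asynchronous, a message published just before an outage may reach the peer only after recovery, which is the trade-off for not blocking producers on a remote region.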
Active-Passive Use Case Using Apache Pulsar and Other Infrastructure
Let’s say your company collects analytics data from 1200 turbines running at different locations. The analytics job is configured to run on EMR and generates meaningful information, including alerts, for power generation companies.
As soon as events are published on the Apache Pulsar topic, the analytics is triggered and provides real-time turbine monitoring to the power station. With the right settings, this configuration can be up and running all the time.
Unfortunately, you have no control over public cloud infrastructure, and a cloud region can have an issue or an outage. If you host your infrastructure in US-West, that region can go down. You can configure another AWS region in Active-Passive mode, with Apache Pulsar copying the events to the US-East Pulsar cluster. All other components are also deployed in the second region in standby mode. If there is an outage in US-West, all processing is switched to the AWS US-East region.
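The failover decision itself can be sketched as picking the first healthy cluster from a priority list. The service URLs and the health map below are hypothetical placeholders; in practice the health signal would come from your own monitoring.

```python
# Hypothetical service URLs, listed in priority order: active first, passive second.
CLUSTERS = [
    "pulsar://pulsar.us-west.example.com:6650",
    "pulsar://pulsar.us-east.example.com:6650",
]


def choose_cluster(clusters, health):
    """Return the first cluster marked healthy, honoring the priority order."""
    for url in clusters:
        if health.get(url, False):
            return url
    raise RuntimeError("no healthy cluster available")


# Normal operation: the active US-West cluster is used.
all_up = {url: True for url in CLUSTERS}
print(choose_cluster(CLUSTERS, all_up))

# US-West outage: processing switches to the standby US-East cluster.
west_down = {CLUSTERS[0]: False, CLUSTERS[1]: True}
print(choose_cluster(CLUSTERS, west_down))
```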
Geo-Replication Reduces Latency
Let’s say your web application publishes a large message (5 MB) to an Apache Pulsar topic. The consumer application receives the message, processes the client’s request, and then responds to the client. Your entire infrastructure is on the US East Coast, but your web application also receives requests from California. Customers in California might experience higher latency due to the distance.
It would make sense to configure the load balancer to send those requests to US-West. You should also deploy the entire infrastructure, including Pulsar, in US-West, so that customers who access your web application from the West Coast do not experience that latency.
Geo-Replication Eliminates Downtime
Geo-replication isn’t just beneficial during an outage. It also helps during the deployment of new features, when companies require zero downtime.
Let’s consider, for example, a Blue/Green deployment, a technique that minimizes risk and reduces downtime. You are planning to roll out the latest version of Apache Pulsar for your web application, and traffic is currently coming into the blue environment. Because the green environment is a replica of the blue environment, the CD pipeline can automatically deploy the new version of Apache Pulsar to the green environment. The CD pipeline also has a stage for automated testing that validates Pulsar functionality. Once the pipeline has deployed and validated the change in the green environment, the load balancer switches traffic from the blue environment to the green environment. Green then starts taking real customer traffic, the end user does not notice a change, and no outage occurs.
This method shows the importance of having a replica of the servers.
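The cut-over logic described above can be sketched in a few lines. The `BlueGreenRouter` class and the version labels `v1` and `v2` are hypothetical; the point is only that traffic moves to the idle environment after validation passes, and stays put otherwise.

```python
class BlueGreenRouter:
    """Toy load balancer that sends all traffic to one of two environments."""

    def __init__(self):
        # Hypothetical version labels; both environments start as replicas.
        self.versions = {"blue": "v1", "green": "v1"}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy_to_idle(self, version):
        """Deploy a new version to the environment that takes no traffic."""
        self.versions[self.idle] = version

    def cut_over(self, validate):
        """Switch traffic to the idle environment only if validation passes."""
        candidate = self.idle
        if validate(self.versions[candidate]):
            self.live = candidate
            return True
        return False


router = BlueGreenRouter()
router.deploy_to_idle("v2")
switched = router.cut_over(lambda version: version == "v2")  # stand-in for the test stage
print(router.live, router.versions[router.live])
```

The previous environment keeps its old version after the switch, so rolling back is just another cut-over.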
Geo-replication also helps when you have multiple consumers in different geographic locations. For example, if you are working with PII data in US-East, the web application publishes to US-East, while the consumer applications run in US-East, US-West, and AP-South. This is where you can take advantage of geo-replication, because a single message needs to be consumed by different applications in different regions.
As an example of geo-replication using synchronous replication, consider Pulsar clusters in three different regions:
- US-East
- US-West
- Canada-Central
All the clusters are set up with active synchronization between them:
- US-East -> US-West
- US-West -> Canada-Central
- US-East -> Canada-Central
Notice that the three regional clusters act as a single logical cluster, so if one region goes down, the health of the overall cluster is not affected. You have to balance the zero-downtime requirement against the infrastructure needed to run these clusters and configure it accordingly.
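The three links listed above connect every pair of regions. A short sketch can enumerate those pairs and show which links survive a regional outage. Treating each link as an unordered pair is an assumption made here for illustration, and the helper names are hypothetical.

```python
from itertools import combinations

REGIONS = ["US-East", "US-West", "Canada-Central"]


def replication_links(regions):
    """Return every pair of regions, i.e. the links of a full mesh."""
    return list(combinations(regions, 2))


def links_after_outage(regions, failed_region):
    """Return the links that remain when one region is unavailable."""
    survivors = [region for region in regions if region != failed_region]
    return replication_links(survivors)


print(replication_links(REGIONS))
print(links_after_outage(REGIONS, "US-East"))
```

With three regions there are three links, and losing any one region still leaves the other two connected to each other.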
Geo-Replication’s Role in Video Content Delivery
Geo-replication plays a very important role in the delivery of video content, which many websites provide as live streaming or on-demand video. Video content consumes a large amount of network bandwidth, and we can take advantage of geo-replication to provide an optimal streaming experience to users.
Here is the step-by-step implementation for CloudFront:
- Upload video to S3 bucket
- Create a web distribution using CloudFront
- Get the CloudFront Distribution URL
- Use the CloudFront URL on your web application
- CloudFront copies data to configured edge locations based on your preference
This means all videos are copied to edge locations. For example, if your client is in Asia, the request is served from an edge location in that region. If your client is in the United Kingdom, the request is served from an edge location in the UK.
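For the step of using the CloudFront URL in your web application, a minimal sketch is to join the distribution domain with the S3 object key. The domain `d111111abcdef8.cloudfront.net` and the object key below are placeholders, and the helper name is hypothetical.

```python
from urllib.parse import quote


def cloudfront_url(distribution_domain: str, object_key: str) -> str:
    """Build an HTTPS URL for an object served through a CloudFront distribution."""
    # quote() keeps "/" separators and percent-encodes characters such as spaces.
    return f"https://{distribution_domain}/{quote(object_key.lstrip('/'))}"


print(cloudfront_url("d111111abcdef8.cloudfront.net", "videos/launch demo.mp4"))
```

The web application then embeds this URL instead of the S3 URL, so viewers fetch the video through CloudFront.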
To sum everything up, geo-replication can help you with the following:
- Increase the availability of your data
- Provide support for zero-downtime deployments
- Improve web application performance
- Boost video viewing experience
- Build a highly available application architecture