Posted on February 4, 2021

Why the Financial Sector is Increasingly Choosing Apache Pulsar

Apache Pulsar resolves many of the technical obstacles of Kafka, obstacles that create operational problems for banks and other financial services enterprises in particular. In response, global banks are now moving their messaging and event streaming solutions to Pulsar. Banks require flawless performance and accuracy in their transaction data solutions. To meet this challenge, Pulsar has evolved into a massively scalable pub-sub platform designed for zero data loss and very low latency. Pulsar’s Cloud-native real-time event streaming framework is also designed for zero downtime, and it offers:

Unified messaging and real-time event streaming
High throughput that scales horizontally
Strong durability, with messages synced to disk before they are acknowledged

We will explore two technical advantages enjoyed by banking enterprises that have already chosen Pulsar: multi-tenancy and geo-replication. These design aspects of Pulsar define the resilient, secure foundation of a data-driven banking app. Along the way, we will also see how Pulsar SQL provides real-time data querying for the machine learning apps which power banking’s risk assessment and fraud detection algorithms.

Why Banks are Choosing Apache Pulsar

For enterprise banking and financial services Compute-at-the-Edge projects today, Pulsar handles real-time event streaming with speed and accuracy. For example, transaction data pipelines from IoT sensors at bank branches stream live to Edge Computing nodes, rather than to Cloud infrastructure, for the fastest possible analytics and ML insights. Two important features differentiate Apache Pulsar and position it as the event streaming platform for next-generation workloads: multi-tenancy and geo-replication.

Pulsar’s multi-tenancy and multi-datacenter redundancy were part of its original design, and together they support disaster recovery without downtime for users. Multi-tenancy and geo-replication also elegantly facilitate the growing trend of Edge Computing, which strategically processes Big Data near the source. The realization that IoT sources are now producing zettabytes of data, and that ML apps need to crunch this at the source instead of transmitting it to Cloud apps, was one of the insights which influenced the design of Pulsar. Edge Computing makes possible the fastest time to ML-based analytics and actionable insights. Pulsar’s multi-layer architectural design, which separates message serving from message storage, enables its scalability and resilience. Let’s look more deeply into multi-tenancy and geo-replication.

Multi-Tenancy Enriches Financial Services Use Cases

Central to the flexible Cloud-native concept of Apache Pulsar is the idea that a single cluster supports multiple users and variable workloads. Multi-tenancy is implemented so that tenants and their namespaces can be authenticated and authorized independently, providing these distinct advantages for application development in the financial sector:

Authentication and authorization at the namespace and API level
Isolation between reads and writes (BookKeeper)
Soft isolation (configurable)
Hardware isolation, with tenants assigned to dedicated brokers and bookies
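The tenant and namespace hierarchy is visible in every fully qualified Pulsar topic name, which takes the form persistent://tenant/namespace/topic. The following is a minimal Python sketch of that naming scheme, not part of any Pulsar client library; the class name and the tenant, namespace, and topic values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TopicName:
    """Illustrative model of a topic name: domain://tenant/namespace/topic."""
    tenant: str
    namespace: str
    topic: str
    domain: str = "persistent"

    def __str__(self) -> str:
        return f"{self.domain}://{self.tenant}/{self.namespace}/{self.topic}"

    @classmethod
    def parse(cls, name: str) -> "TopicName":
        # Split off the domain, then require exactly tenant/namespace/topic.
        domain, sep, rest = name.partition("://")
        if not sep or domain not in ("persistent", "non-persistent"):
            raise ValueError(f"not a fully qualified topic name: {name!r}")
        parts = rest.split("/", 2)
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected tenant/namespace/topic in {name!r}")
        tenant, namespace, topic = parts
        return cls(tenant, namespace, topic, domain)


# Hypothetical tenants: two lines of business sharing one cluster.
retail = TopicName("retail-banking", "payments", "card-transactions")
wealth = TopicName("wealth-management", "trading", "portfolio-trades")
print(retail)
print(wealth)
```

Because the tenant and namespace are part of the name itself, two teams can use the same short topic name without colliding, and policies can be attached at the tenant or namespace level.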

Within developer teams, different users have designated roles and levels of authorization to access the varying levels of confidential client data; this is a critical concern in financial applications. Likewise, sources that stream transaction data and portfolio trade data are assigned access levels as producers. Finely tuned hierarchies of access provide the precise governance required for each developer, team member, or project. This functionality is built into Pulsar, so no additional coding is needed to authorize access to publish-subscribe data projects.

Defining the authorization levels for data access is easy to illustrate with a non-technical use case in bank branches. Consider the client-facing team members at a bank branch, for example. Loan analysts, account reps, and tellers each need particular security clearance to view segments of client confidential data. Pulsar’s multi-tenancy architecture makes it very easy to define such secure access per user.
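To make the branch example concrete, here is a toy in-memory model of role-based grants per namespace. It is a sketch of the idea only and does not call Pulsar; in a real deployment the grants would be made through Pulsar's admin tooling. The namespace names and role names are hypothetical, and the two actions shown, produce and consume, are the ones assumed for this illustration.

```python
from collections import defaultdict

# Actions assumed for this sketch.
ALLOWED_ACTIONS = {"produce", "consume"}


class NamespacePermissions:
    """Toy model: which role may perform which action in which namespace."""

    def __init__(self):
        # namespace -> role -> set of granted actions
        self._grants = defaultdict(lambda: defaultdict(set))

    def grant(self, namespace, role, actions):
        unknown = set(actions) - ALLOWED_ACTIONS
        if unknown:
            raise ValueError(f"unknown actions: {sorted(unknown)}")
        self._grants[namespace][role] |= set(actions)

    def is_allowed(self, namespace, role, action):
        # .get() avoids creating empty entries for unknown namespaces or roles.
        return action in self._grants.get(namespace, {}).get(role, set())


perms = NamespacePermissions()
# Hypothetical namespaces and roles for one branch.
perms.grant("branch-042/loans", "loan-analyst", ["consume"])
perms.grant("branch-042/accounts", "account-rep", ["consume"])
perms.grant("branch-042/accounts", "teller", ["produce", "consume"])

print(perms.is_allowed("branch-042/accounts", "teller", "produce"))  # True
print(perms.is_allowed("branch-042/loans", "teller", "consume"))     # False
```

The point of the sketch is that access is denied by default: a teller can read and write account events but sees nothing in the loans namespace unless a grant says so.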

Nowadays, massive network breaches are commonplace, as with the recent SolarWinds hack, and isolating one user’s access scope from another’s is increasingly essential to preserving confidential data. Pulsar’s soft isolation makes it easy for DevOps in financial services to secure networks.

Geo-Replication Serves Growing Edge-Compute Trends

Valuable in banking use cases, Pulsar provides configurable management of multiple data centers which intelligently replicate and share topics to guarantee 24/7 availability of messaging and event streaming data. Clusters do not depend on shared resources for system availability. Therefore, when one cluster is down, another cluster will provide access to the same data. And all data can be migrated from one cluster to another at any time. These guarantees are ensured by several technical components of Pulsar:

Scalable async replication
Integration with broker message flow
Simple config to add-remove geo regions
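As a rough illustration of how small that configuration is, the sketch below builds the argument list for the pulsar-admin namespaces set-clusters command, which sets the list of clusters a namespace replicates to. The Python helpers only assemble the command and do not run it; the tenant, namespace, and cluster names are hypothetical, and the tenant is assumed to already be allowed to use each cluster.

```python
def set_clusters_command(tenant, namespace, clusters):
    """Build (but do not run) a pulsar-admin call that sets replication clusters."""
    if not clusters:
        raise ValueError("at least one cluster is required")
    return [
        "pulsar-admin", "namespaces", "set-clusters",
        f"{tenant}/{namespace}",
        "--clusters", ",".join(clusters),
    ]


def add_region(clusters, region):
    """Return a new cluster list with the region appended if it is missing."""
    return clusters if region in clusters else clusters + [region]


def remove_region(clusters, region):
    """Return a new cluster list without the given region."""
    return [c for c in clusters if c != region]


# Hypothetical cluster names.
current = ["us-east", "us-west"]
expanded = add_region(current, "eu-central")
print(" ".join(set_clusters_command("retail-banking", "payments", expanded)))
```

Adding or removing a geo region then amounts to issuing the same command again with the updated cluster list.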

In a hypothetical scenario, if cluster A goes offline, producers and consumers of the topic can switch to cluster B, which already holds the messages replicated from cluster A. The topic is therefore maintained in cluster B while cluster A is down. When cluster A comes online again, Pulsar replicates the messages it missed back to it and resumes updating topics there. This architecture is designed to prevent data loss in the event of hardware failure. For this reason, Pulsar is well suited as a secure messaging and streaming solution for banking and finance applications.
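The scenario can be walked through with a small in-memory simulation. This is a toy model of asynchronous replication between two clusters, written to illustrate the sequence of events; it is not how Pulsar is implemented, and every class and function name in it is hypothetical.

```python
from collections import deque


class Cluster:
    def __init__(self, name):
        self.name = name
        self.online = True
        self.log = []          # messages stored locally, in arrival order
        self.outbox = deque()  # locally published messages awaiting replication

    def publish(self, message):
        if not self.online:
            raise ConnectionError(f"cluster {self.name} is offline")
        self.log.append(message)
        self.outbox.append(message)


def replicate(source, target):
    """Copy pending messages from source to target; return how many were copied."""
    copied = 0
    while source.online and target.online and source.outbox:
        target.log.append(source.outbox.popleft())
        copied += 1
    return copied


def publish_with_failover(message, primary, standby):
    """Publish to the primary if it is up, otherwise to the standby."""
    cluster = primary if primary.online else standby
    cluster.publish(message)
    return cluster.name


a, b = Cluster("A"), Cluster("B")
publish_with_failover("trade-1", a, b)   # lands on A
replicate(a, b)                          # B now holds trade-1
a.online = False                         # cluster A fails
publish_with_failover("trade-2", a, b)   # lands on B instead
pending = replicate(b, a)                # nothing copied while A is down
a.online = True                          # cluster A recovers
caught_up = replicate(b, a)              # back-replication of trade-2
print(a.log, b.log)
```

One caveat the toy model makes visible: because replication is asynchronous, a message published to A but not yet replicated when A fails would only reach B after A recovers.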

Consider, for example, portfolio trade data streaming live to performance analytics for executive decision-making. With a platform that is not Cloud-native, such as Kafka, if a cluster fails, then executives looking at a data visualization of minute-by-minute performance metrics on Wellesley or Wellington mutual funds will suddenly NOT be looking at the most recent data! This problem is solved by Pulsar’s failover features, because the data is always streaming to multiple datacenters. When one cluster is detected to be down, a surviving cluster takes over and continues to serve the dashboard or data presentation app.

Geo-replication also facilitates the Edge-Compute trend because it is configurable: clusters can be placed near data sources to optimize computational speed and reduce traffic to Cloud infrastructure when the fastest available actionable insights are demanded.

Pulsar’s Single Solution Elegance

Previously, banks and other financial services companies needed to combine Flink with Kafka in a complex orchestration to achieve live data event streaming. Now, Pulsar includes all the functionality needed to capture the most recent transaction events in a single platform. Pulsar was originally designed to store events persistently (in essence, forever), solving one of the important problems of Kafka, which required coding patches.

Solving the shortcomings of Kafka in banking and finance leads to beneficial outcomes in surprising areas such as compliance with strict regulatory requirements. Pulsar’s reliable, low-latency streaming supports continuous real-time reporting to governance and compliance systems, which in turn reduces administration costs.

An application based on Pulsar is truly event-driven up to the microsecond. With this technical foundation, developers can engineer the best possible customer experience. Machine learning based analytics now focus on merchant monitoring, for example, to capture and report anomalies which will impact user experience.

Why Are Banks Making the Transition from Kafka to Pulsar?

At first glance, it sounds expensive: many banks have already coded solutions for the technical obstacles encountered with Kafka. So why are many banks now considering stopping in-house patch development and moving to Pulsar? Coding tasks such as message deduplication and recovering the first or last stream update lost in the cache, to name two from a list of hundreds, mount up as new data mining methods appear. There are two important reasons for the transition, and the first is the most compelling: new technical obstacles are coming to Kafka as the Cloud and Edge Compute trends become more complex. In simple terms, Pulsar is a Cloud-native solution and Kafka is not. Therefore, rather than writing more in-house code to compensate for Kafka’s shortcomings, banks should transition early to the Pulsar Cloud-native solution.
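Deduplication is a good example of a task that can move out of application code. Pulsar can deduplicate on the broker side by tracking, per producer, the highest message sequence ID it has already accepted. The sketch below is a toy model of that idea, assuming a retry re-sends a message with its original sequence ID; the class and the producer name are hypothetical and this is not Pulsar's implementation.

```python
class DedupBroker:
    """Toy broker that drops messages it has already accepted from a producer."""

    def __init__(self):
        self.stored = []
        self._last_seq = {}  # producer name -> highest sequence ID accepted

    def publish(self, producer, sequence_id, payload):
        # A sequence ID at or below the last accepted one is treated as a retry.
        if sequence_id <= self._last_seq.get(producer, -1):
            return False
        self._last_seq[producer] = sequence_id
        self.stored.append(payload)
        return True


broker = DedupBroker()
broker.publish("atm-17", 0, "withdraw 100")
broker.publish("atm-17", 1, "withdraw 50")
# The producer never saw the acknowledgement for message 1 and sends it again.
accepted = broker.publish("atm-17", 1, "withdraw 50")
print(accepted, broker.stored)
```

With this check in the messaging layer, a network retry cannot turn one withdrawal into two stored events.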

The most immediately impactful use case in banking today, the one compelling the switch to Pulsar instead of coding more patches to Kafka, arises as IoT streams are added when existing banking technology is upgraded. Every new IoT network, and new ones appear on the horizon every day, will require engineers and custom coding with Kafka. Pulsar makes the transition easy. So enticing is the prospect that tools such as KoP (Kafka on Pulsar) are now available. KoP lets existing Kafka client applications connect to Pulsar brokers using the Kafka protocol, so a Kafka deployment can be moved to Pulsar without rewriting those applications.

The second rationale for moving to Pulsar is the growing number of contributors who are excited to participate in the vastly improved messaging bus technology of Pulsar. Now that Pulsar is proven, we are joining a growing community of fintech developers who continually improve the experience and share solutions. While Pulsar is a superior technical solution, there remains an advantage to be enjoyed by engaging Pulsar as a Service through an expert provider such as Pandio.

Live Transaction Data Query for ML Apps with Pulsar SQL

The ability for ML algorithms to update and retrain their forecasting models live depends on the ability to query live streaming data in real-time. For example, suppose a credit applicant’s score went critical because of transaction events which occurred during the last minute. The actionable insight to prevent that credit risk from contaminating a bank’s portfolio depends on the algorithm’s ability to retrieve the most recent data from the source. Here we can see why Edge Computing is the trend in obtaining the fastest insights from artificial intelligence.
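As a sketch of what such a query might look like with Pulsar SQL, the helper below builds a statement that selects the events published to a topic during the last minute. It assumes the Pulsar SQL conventions of addressing a topic as pulsar."tenant/namespace"."topic" and filtering on the __publish_time__ metadata column; check both against the documentation for the version in use. The tenant, namespace, and topic names are hypothetical, and the function only returns the SQL text without connecting to anything.

```python
import re
from datetime import datetime, timedelta, timezone

# Only allow simple names, since they are interpolated into the SQL text.
_NAME = re.compile(r"^[A-Za-z0-9_.-]+$")


def recent_events_query(tenant, namespace, topic, now, window=timedelta(minutes=1)):
    """Return SQL selecting events published within `window` before `now`."""
    for name in (tenant, namespace, topic):
        if not _NAME.match(name):
            raise ValueError(f"unsupported name: {name!r}")
    since = (now - window).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f'SELECT * FROM pulsar."{tenant}/{namespace}"."{topic}" '
        f"WHERE __publish_time__ > timestamp '{since}'"
    )


now = datetime(2021, 2, 4, 12, 0, 0, tzinfo=timezone.utc)
query = recent_events_query("retail-banking", "credit", "card-transactions", now)
print(query)
```

A risk model could run a query of this shape on each scoring request, so that the applicant's most recent transactions are part of the features it sees.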

In other words, every second counts, because a single second may contain a million transaction events. Data science engineers now realize that capturing and processing an event stream near the source can make the critical difference between approving and rejecting a credit application. In fact, most banking apps now recognize the urgent need for Edge Computing in the top four research areas delivering important actionable insights:

Bankruptcy prediction
Credit risk assessment
Fraud detection
Optimal portfolio management

Now that ML methods such as naïve Bayes and k-nearest neighbor demonstrate accuracy of greater than 97% in fraud detection, the emerging imperative is to capture data at the source and stream it to ML algorithms with Apache Pulsar. The guidance from patterns in the data is compelling.

Imperative in the Competition

Banks and other financial sector players will lag in competition when they do not reach into the future; after all, prediction is the substance of the artificial intelligence we are striving to build. Prediction is most competitive when its insights are obtained fastest. Pulsar’s intentional Cloud architecture is now the data fabric of ML algorithms which foretell the future of financial transactions, those events which define an endeavor’s success or failure.

In a financial app which processes billions of transactions per day, a few seconds’ difference can mean a vast amount of data; that is the difference between real-time and historical data. At one billion transactions per day, roughly 11,500 transactions arrive every second, so a single second of latency means the app could have streamed thousands more transactions to analytics, any of which could change the outcome of a credit risk assessment. That single second needed to lift you an inch above the crowd: that’s the Pulsar advantage. When the critical actionable insight is derived from data streamed a millisecond ago, harvesting that insight puts us ahead of competitors.
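The arithmetic behind that claim is easy to check. The short calculation below assumes, purely for illustration, a volume of one billion transactions per day spread evenly across the day; real traffic is bursty, so peak rates would be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def events_during_delay(events_per_day, delay_seconds):
    """Average number of events that arrive during a delay of the given length."""
    return events_per_day / SECONDS_PER_DAY * delay_seconds


# Assumed volume: one billion transactions per day.
per_second = events_during_delay(1_000_000_000, 1.0)
per_millisecond = events_during_delay(1_000_000_000, 0.001)
print(f"{per_second:,.0f} transactions per second of delay")
print(f"{per_millisecond:.1f} transactions per millisecond of delay")
```

Even at this averaged rate, one second of delay corresponds to more than eleven thousand transactions that analytics has not yet seen, and a single millisecond to about a dozen.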

As messaging becomes more complex, many financial services firms are looking to a managed Pulsar service provider as a means to enhance performance, reduce technical overhead, cut costs, and reduce the data science staffing burden which accompanies data mining endeavors. Pandio addresses these and other facets, including one unique feature: Pandio enhances Pulsar with a custom neural network for load balancing. The neural network, designed by Pandio, learns the messaging patterns of each client and then automates load balancing and resource optimization. Ultimately, as Pandio’s neural network learns the messaging environment, it substantially reduces the need for developer tweaking. While Pulsar is brilliant on its own, Pandio makes the transition to Pulsar even more compelling.

Sources

Spark AI Summit

Machine learning identifies fraudulent transactions

IoT network intrusion detection

Fraud detection use cases