Apache Pulsar in AI-Driven Enterprise Integration Middleware
How can the integration of enterprise software applications be automated when unique data formats and disparate protocols make integration so complex? Integration providers such as IBM and Pandio have already created hybrid integration platforms that apply AI-powered automation to the laborious task of interfacing enterprise apps such as ERPs and CRMs. AI-based data format recognition and load balancing are important parts of the answer, but speed is critical, especially in Edge Compute scenarios. The core component in the newest hybrid integration platforms, the conductor that orchestrates the five enterprise platforms in our case study to perform as one, is Apache Pulsar. Pulsar-as-a-Service providers like Pandio therefore position their offerings as truly Cloud native integration platforms. These universal connectors combine multiple integration strategies, building hybrids from two or more of the following:
- Enterprise Service Bus (ESB)
- API Management
- Service-Oriented Architecture (SOA)
- Event Driven Architecture (EDA)
By applying AI-based machine learning to generate code automatically from high-level, non-technical user inputs, and by offering code completion suggestions that improve engineer productivity, hybrid integration platforms can now automate tedious integration tasks. Through automated data mapping and data format recognition and translation, the newest Pulsar-based platforms build interfaces more quickly and accurately. Such integration solutions pull in existing integration technologies as required. Messaging platforms play the most central role in integration platforms. Message Queue Interface (MQI), Apache Pulsar, Java Message Service (JMS), REST, .NET, MQ Light and MQTT are among the most important components.
Today, enterprise software integration, a task that was once among the most developer-intensive, is being tamed and even standardized by the ingenuity of data science. Our objective here is to present a proven use case that demonstrates Apache Pulsar as the fastest component of an Enterprise Service Bus (ESB). Before we begin that journey, let’s clarify some of the often confusing terminology around enterprise integration middleware and messaging platforms.
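To make the idea of data mapping and format translation concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the ERP and CRM field names, the FIELD_MAP table, and the translate helper are invented for illustration. A real platform would infer such a mapping rather than hard-code it.

```python
from datetime import datetime

# Hypothetical mapping from ERP field names to CRM field names,
# each paired with a converter for the value.
FIELD_MAP = {
    "CUST_NO": ("customer_id", str),
    "CUST_NAME": ("name", str.strip),
    "ORDER_DT": (
        "order_date",
        lambda v: datetime.strptime(v, "%Y%m%d").date().isoformat(),
    ),
}


def translate(record, field_map=FIELD_MAP):
    """Translate one source record into the target schema.

    Fields with no known mapping are returned separately so they can be
    reviewed instead of being silently dropped.
    """
    target, unmapped = {}, {}
    for key, value in record.items():
        if key in field_map:
            new_key, convert = field_map[key]
            target[new_key] = convert(value)
        else:
            unmapped[key] = value
    return target, unmapped


erp_record = {
    "CUST_NO": 1042,
    "CUST_NAME": "  Acme Corp ",
    "ORDER_DT": "20210315",
    "PLANT": "B7",
}
crm_record, leftovers = translate(erp_record)
```

Keeping unmapped fields visible is a deliberate choice in this sketch: it is the kind of gap an automated mapper would need to flag for a human.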
Distinguishing the MQs from the PubSubs
Subtle but important differences exist between the Message Queue (MQ) and Pub-Sub platforms at the core of today’s integration tools, although these distinctions are blurring as providers fold many integration strategies into a single platform. Traditional MQ, for example, does not fully decouple applications as PubSub does. MQ supports data exchange between applications and systems via message queues, but message recipients must still be named and configured, and connected apps must specify a queue manager. As an integration approach, MQ simplifies the creation of business application interfaces. But in many scenarios several enterprise applications require access to the data in a queue, and data security becomes an issue. PubSub has evolved to fill this niche.
In fact, IBM’s MQ product has already evolved into a publish/subscribe platform precisely for this reason. Pub-Sub makes possible the full decoupling of target applications: any number of applications intended to be integrated can share information without knowing anything about each other. That’s the genius of PubSub. A sending application generically sends (publishes) data to a topic. Any number of consumers can subscribe to that topic and fetch the data. Pub-Sub platforms like Pulsar support authorization tokens so that the broker can determine the consumer’s access privileges, which helps satisfy the critical security requirements of multi-tenancy. Now that we know PubSub is the best core integration tool, all that remains is to choose the correct PubSub platform.
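The publish/subscribe pattern and the token check described above can be sketched with a toy in-memory model. This is not Pulsar’s API: the ToyBroker class, the token strings, and the topic name are assumptions made purely for illustration.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a pub-sub service (illustration only)."""

    def __init__(self, grants):
        # grants maps a token to the set of topics that token may consume.
        self._grants = grants
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, token, callback):
        # Access is decided centrally, not by the publishing application.
        if topic not in self._grants.get(token, set()):
            raise PermissionError(f"token not authorized for topic {topic!r}")
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # The publisher names only the topic, never the recipients.
        for callback in self._subscribers[topic]:
            callback(message)
        return len(self._subscribers[topic])


broker = ToyBroker({
    "erp-token": {"orders"},
    "crm-token": {"orders"},
    "guest-token": set(),
})
erp_inbox, crm_inbox = [], []
broker.subscribe("orders", "erp-token", erp_inbox.append)
broker.subscribe("orders", "crm-token", crm_inbox.append)
delivered = broker.publish("orders", {"order_id": 7})

try:
    broker.subscribe("orders", "guest-token", print)
    guest_rejected = False
except PermissionError:
    guest_rejected = True
```

The publisher never learns that two applications received the order, and the unauthorized subscriber is turned away before it sees any data.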
Message-Oriented Middleware vs. ESB
With the knowledge that essentially all enterprise software applications can ultimately be interfaced via the PubSub paradigm, how do we choose the most suitable PubSub provider?
Many ESB middleware technologies available today are built on an older PubSub platform, Apache Kafka. However, Kafka as an ESB fails for a reason which is ironic in this scope: it does not decouple computation from storage. Indeed, an essential integration principle, decoupling via microservices within a platform, was not a fundamental Kafka design principle. Fortunately, the Cloud native PubSub platform Apache Pulsar does decouple compute from storage, which makes scaling large enterprise integrations much easier.
To fully understand why a frequently used PubSub platform like Kafka is far from ideal for the task, we need a balanced view of competing concepts and strategies. For example, many implementations fall into the ESB anti-pattern: designing too much dependence on the ESB into the architecture for non-functional requirements and business logic. In other words, ESB is great, but it must be used correctly!
A misconception arises in the often observed ESB vs. Kafka approach. Event-based data flows which require real-time and batch processing demand flexible and agile microservices to extract intelligence from data while it is relevant in the analytics decision cycle. However, Kafka cannot deliver the performance of Apache Pulsar as proven in standard benchmarks. Hybridization emerges once again in the ESB vs. ETL debate. Apache Pulsar overcomes the often published complaint that ESB cannot move massive data quickly. With Pulsar as the PubSub core of its event streaming platform, Yahoo integrates Yahoo Finance, Yahoo Mail, and Flickr with astonishing data throughput volumes, including billions of events and transactions per day. It is not surprising that Pulsar’s scalability meets Yahoo’s requirements, since Yahoo is the progenitor of Apache Pulsar.
The ESB ETL Debate
Extract, Transform, Load (ETL) is intended, as the name suggests, to deal with data from multiple sources. Existing tools handle wrangling, customization, and formatting, all in the name of integration when seen in this scope, and all for insertion into an enterprise’s standard data warehouse. Designing and building an ETL process is complex, but tools exist to make things easier, such as entity relationship diagramming utilities and known data models like the star schema, and there is now also a standard mapping model. While ETL as a concept is complex, recall that our objective is to incorporate the best of ETL and ESB into our hybrid concept of integration. In other words, we are no longer stuck with the problems of one paradigm; we now select the best tools and methodologies from both ESB and ETL. We can do so by many means, and among the most productive of those means is the study of successful use cases.
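As a small illustration of an ETL step loading into a star schema, the following Python sketch uses the standard library’s sqlite3 module with an in-memory database. The source rows, the table names (dim_customer, fact_order), and the column layout are hypothetical and chosen only to show the shape of the process.

```python
import sqlite3

# Extract: hypothetical rows as they might arrive from a source system.
rows = [
    {"customer": "Acme", "amount": "120.50"},
    {"customer": "Globex", "amount": "75.00"},
    {"customer": "Acme ", "amount": "30.25"},
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_customer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE fact_order (
        customer_id INTEGER REFERENCES dim_customer(id),
        amount REAL
    );
""")

for row in rows:
    # Transform: normalize the customer name and parse the amount.
    name, amount = row["customer"].strip(), float(row["amount"])
    # Load: add the dimension row once, then reference it from the fact row.
    conn.execute("INSERT OR IGNORE INTO dim_customer (name) VALUES (?)", (name,))
    (cust_id,) = conn.execute(
        "SELECT id FROM dim_customer WHERE name = ?", (name,)
    ).fetchone()
    conn.execute("INSERT INTO fact_order VALUES (?, ?)", (cust_id, amount))

# A typical warehouse query: total order value per customer.
totals = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_order f JOIN dim_customer c ON c.id = f.customer_id
    GROUP BY c.name ORDER BY c.name
""").fetchall()
```

Here one fact table references one dimension table; a full star schema would surround the fact table with several such dimensions.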
Featured Use Case: Integration of Five Manufacturing Enterprise Apps
The manufacturing industry is now in transformation because of the enormous potential of Internet-of-Things, Big-Data, Edge Compute, and AI-based Analytics to innovate and:
- Improve productivity
- Reduce cost
- Enhance customer experience
- Increase market share
As new systems appear in all categories of technology, the most intense challenge is to make them all work harmoniously together, in other words: Integration. A “smart factory” is the ideal, and to achieve this ideal requires orchestrating many smart subsystems in a factory’s tech ecosystem.
In such an ecosystem, equipment and applications are interconnected to work toward common objectives. Hybrid integration strategies such as those we have discussed earlier will be required to achieve such an integration. In a typical manufacturing integration use case, the five key enterprise applications to be integrated are:
- ERP (Enterprise Resource Planning)
- PLM (Product Lifecycle Management)
- SCM (Supply Chain Management)
- CRM (Customer Relationship Management)
- Manufacturing Enterprise-Specific Apps:
  - MDM (Mobile Device Management)
  - MES (Manufacturing Execution System)
  - Equipment PLC (Programmable Logic Controller)
To name a few of the many integrations required, the PLC must be integrated with the MES to accurately regulate process steps. Furthermore, devices including handheld scanners and cell phones need to communicate and share data with all the other enterprise apps. An objective of integration in this scope is to ensure a complete closed loop of information collection and control across the whole system. For example, the IoT sensor equipment must comply with industry standard protocols including:
- SECS (SEMI Equipment Communications Standard)
- GEM (Generic Equipment Model)
- OPC (Open Platform Communications)
It should go without saying that TCP/IP, the fundamental networking protocol suite of the cloud, must be included in the transformations supported by our proposed hybrid integration platform.
The primary outcomes sought with multiple integrations include advanced analytics platforms to extract data from the above apps, sensors, devices, and files in order to facilitate complex analysis of factory performance and provide valuable insights to achieve optimization.
The vast volume of event data streaming through the ESB scenario depicted in the diagram above will inevitably call for Edge Compute methods to avoid streaming excessive data to Cloud apps. Edge Compute is a natural fit for Apache Pulsar, particularly where the desired outcome includes critical system performance forecasting. Maintenance forecasting, for example, requires the fastest updates from sensors, as well as the highest-performance interface between enterprise application components. Apache Pulsar is well placed to provide the ultra-reliable, low-latency communication services that edge computing scenarios require. And this is a factor of growing importance: edge-based ML can enhance several aspects of such operations, including data privacy and security. Pulsar’s smart pipes and lightweight compute framework optimize edge compute for ML, which is also increasingly important in ESB frameworks. The nature of the data, including its size and time-dependence, the speed requirements for transmission, and the processing requirements are all important factors in determining the appropriate middleware solution for interfacing two systems.
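The idea of processing at the edge to avoid streaming excessive data to the Cloud can be illustrated with a short Python sketch. It is plain Python, not Pulsar’s compute framework, and the function name, window size, temperature values, and alert limit are all assumptions made for the example.

```python
from statistics import mean


def summarize_at_edge(readings, window=5, limit=80.0):
    """Reduce raw sensor readings to one summary per full window.

    A trailing partial window is left out; in a streaming setting it
    would simply wait for more readings to arrive.
    """
    summaries = []
    for start in range(0, len(readings) - window + 1, window):
        chunk = readings[start:start + window]
        summaries.append({
            "start": start,
            "mean": round(mean(chunk), 2),
            "max": max(chunk),
            # Flag the window if any reading exceeded the limit.
            "alert": max(chunk) > limit,
        })
    return summaries


# Hypothetical temperature readings from one piece of equipment.
temps = [70.1, 70.4, 70.2, 70.9, 71.0, 72.5, 79.8, 84.3, 81.0, 77.2, 70.0]
summaries = summarize_at_edge(temps)
```

In this sketch eleven raw readings become two summary records, and only the second carries an alert, so the Cloud side receives far less data while still seeing the event that matters for maintenance forecasting.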
Apache Pulsar at the Core of Hybrid Integration Platforms
With the advent of increasingly “data-aware” integration strategies, real-time, Edge Compute, and even online processing are now realistic to implement. Furthermore, recognizing that Apache Pulsar can support the massive data throughputs common in the manufacturing, telecom, and financial services industries, many enterprises are now choosing to build a hybrid integration in conjunction with an expert partner, such as a Pulsar-as-a-Service provider like Pandio. The lean data science engineers at Pandio have gleaned their expertise working with the legacy giants in the field, but now have the mobility to deploy their solutions for a small fraction of the giants’ overhead!