How Apache Pulsar Can Prevent Click Fraud
Let us first understand what click fraud is. Suppose you are searching for something on Google.com. The search engine displays the search results, but it also displays ads. For example, you may see an ad for Walmart. When a user clicks on the ad, Walmart is charged for that click. Google also allows you to place ads on your own web page. If visitors click those ads on your page, you earn revenue for each click. This model works very well as long as the clicks come from legitimate users.
There are several automation programs and scripting tools on the market that let you record a sample script and then re-run the same script with different data combinations. Let us review, step by step, how attackers can use them to create a click fraud script.
- Use a scripting or recording tool (such as LoadNinja or Selenium)
- Record a script that opens the web page and clicks on the Google ads
- Re-run the script to make sure that it runs successfully
- Run the same script with 100k virtual users
The purpose of this script is to click the Google ads many times, 100k times or more depending on the configuration. For each click, Google pays the site owner. This is a clear case of click fraud, because a script, not a real user, is clicking on the ads.
These tools often let the script send its requests from different IP addresses. This makes it very difficult to determine whether a click comes from a real user or from the fraud script, and the resulting fraudulent charges can hurt online businesses.
There are several ways to detect and prevent click fraud. Here are a few examples.
- A sudden spike in search cost: Review your search advertising cost at least weekly. If you notice a sudden spike in clicks for your application, investigate the real cause. You can also set a monthly advertising budget limit as a safeguard.
- Repeated clicks from the same internet service provider: Keep an algorithm running that detects this pattern, and block the requests when you see repeated clicks from the same internet service provider. You can also use a web application firewall (WAF), so that every request to your web application must first pass through the WAF. You can configure several rule sets on the WAF, such as rules covering the OWASP Top 10, to help block this type of attack.
- Performance spike: Monitor your web servers so that you are alerted to a sudden spike in traffic or server load.
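The repeated-click check in the list above can be sketched as a sliding-window counter per ISP. This is a minimal Python illustration, not a production detector: it assumes each click already carries the caller's ISP and a timestamp in seconds (the IP-to-ISP lookup is not shown), and the class name `RepeatedClickDetector` and its thresholds are hypothetical.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class RepeatedClickDetector:
    """Flags an ISP that clicks ads too often within a sliding time window.

    Assumes timestamps (in seconds) arrive in non-decreasing order per ISP.
    The default thresholds are illustrative, not recommended values.
    """

    def __init__(self, max_clicks: int = 5, window_seconds: float = 60.0) -> None:
        self.max_clicks = max_clicks
        self.window_seconds = window_seconds
        # ISP name -> timestamps of its recent clicks, oldest first
        self._clicks: Dict[str, Deque[float]] = defaultdict(deque)

    def record_click(self, isp: str, timestamp: float) -> bool:
        """Record one ad click and return True if the request should be blocked."""
        window = self._clicks[isp]
        window.append(timestamp)
        # Evict clicks older than window_seconds relative to the newest click
        while timestamp - window[0] > self.window_seconds:
            window.popleft()
        # Block once the ISP exceeds max_clicks inside the window
        return len(window) > self.max_clicks


detector = RepeatedClickDetector(max_clicks=3, window_seconds=10)
print([detector.record_click("isp-a", t) for t in (0, 1, 2, 3)])
# prints [False, False, False, True]
```

A sliding window keeps memory per ISP bounded by the number of clicks in the window, and old clicks age out automatically, so an ISP that stops clicking is unblocked once its window empties.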
How Apache Pulsar Can Help Prevent a Click Fraud Attack
Apache Pulsar can help prevent these fraudulent attacks by letting us analyze events in real time, following these principles:
- Receive the data
- Analyze the data
- Act on the data
With Apache Pulsar, the path from data ingestion to data processing runs in real time. Pulsar provides a real-time streaming service backed by durable log storage, and it offers client libraries for languages including C, C++, Go, Python, and Java. You can use heavyweight compute, such as a dedicated stream processing engine, for faster click fraud detection, or lightweight compute, such as Pulsar Functions, for simpler checks.

Real-time streaming infrastructure is generally more expensive to run than batch processing. If you have already set monthly advertising limits, you can instead process click data in batches and run the entire click fraud pipeline as a nightly job on the public cloud. You can then read and query the data using Pulsar SQL to identify click fraud incidents. You can also train the consumer application on those patterns so that it can prevent similar fraud in the future.
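A nightly batch job of this kind can start as simply as aggregating one day of click events and flagging outliers. The Python sketch below assumes the events have already been read out of Pulsar (for example, through a Pulsar SQL query) into a list of records; the field name `caller_ip`, the function name, and the threshold are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Mapping


def flag_suspicious_ips(
    click_events: Iterable[Mapping[str, str]], max_clicks_per_ip: int = 100
) -> List[str]:
    """Return the IPs whose ad clicks in one batch exceed max_clicks_per_ip."""
    # Count how many ad clicks each caller IP produced during the day
    clicks_per_ip = Counter(event["caller_ip"] for event in click_events)
    # Sort the result so the nightly report is stable and easy to review
    return sorted(ip for ip, count in clicks_per_ip.items() if count > max_clicks_per_ip)


# Usage with a tiny hand-made batch (IPs are from documentation address ranges)
events = [{"caller_ip": "203.0.113.7"}] * 5 + [{"caller_ip": "198.51.100.2"}]
print(flag_suspicious_ips(events, max_clicks_per_ip=3))
# prints ['203.0.113.7']
```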
Suppose your web application consists of several microservices and receives traffic from Google search ads. The microservices capture details for every event, such as the caller's ISP, the caller's IP address, and the caller's request. A producer application publishes each event to a topic, and topics are served by Pulsar brokers. Once the events are written, consumers read them from the topic. You can attach rule-based or machine learning algorithms as consumers of the topic, so that they check each event for click fraud, or other kinds of fraud, before the application responds to the user.
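The producer, topic, and consumer roles described above, together with the receive, analyze, and act principles, can be sketched in plain Python. To keep the example self-contained, an in-memory `queue.Queue` stands in for the Pulsar topic; with the real Pulsar Python client (`pulsar-client`), the producer would call `Producer.send()` and the consumer would call `Consumer.receive()` and `Consumer.acknowledge()` instead. The `ClickEvent` fields and the blocklist rule are hypothetical placeholders for your own event schema and detection algorithm.

```python
import json
import queue
from dataclasses import asdict, dataclass
from typing import Set


@dataclass
class ClickEvent:
    """One ad-click event captured by a microservice (hypothetical schema)."""

    caller_isp: str
    caller_ip: str
    requested_path: str


def produce(topic: "queue.Queue[bytes]", event: ClickEvent) -> None:
    """Publish an event; Pulsar payloads are bytes, so encode it as JSON."""
    topic.put(json.dumps(asdict(event)).encode("utf-8"))


def consume_one(topic: "queue.Queue[bytes]", blocked_ips: Set[str]) -> str:
    """Receive one event, analyze it, and decide how to act on it."""
    payload = topic.get()  # 1. receive the data
    event = ClickEvent(**json.loads(payload.decode("utf-8")))
    is_fraud = event.caller_ip in blocked_ips  # 2. analyze (placeholder rule)
    return "block" if is_fraud else "allow"  # 3. act on the result


topic = queue.Queue()
produce(topic, ClickEvent("isp-a", "203.0.113.7", "/landing"))
produce(topic, ClickEvent("isp-b", "198.51.100.2", "/landing"))
print(consume_one(topic, {"203.0.113.7"}))  # prints block
print(consume_one(topic, {"203.0.113.7"}))  # prints allow
```

Serializing events to JSON bytes keeps producers and consumers decoupled: any consumer, in any client language, can decode the same payload. Pulsar also supports typed schemas, which would replace the manual JSON step.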
Apache Pulsar also provides Pulsar SQL, a powerful tool for querying data stored in Pulsar topics as soon as it is written. You can also write a separate machine learning program, for example in Python, to detect click fraud. This program can fetch data through Pulsar SQL and query it in real time, so the streaming data becomes the program's input. The program can also act as a consumer of the topic directly. You can use a supervised machine learning algorithm: train the model on a labeled dataset, then run it against the live data. Pulsar SQL supports not only fraud detection but also several other use cases.
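As an illustration of the supervised approach, the sketch below trains a logistic regression classifier with batch gradient descent in plain Python, with no ML library. It assumes two hypothetical per-click features, clicks per minute from the caller's IP and seconds spent on the landing page, and the tiny labeled dataset is made up purely for illustration. A real model would be trained on your own labeled click history, such as rows fetched through Pulsar SQL, and would more likely use an established library.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Predicted probability that a click is fraudulent."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(
    features: Sequence[Sequence[float]],
    labels: Sequence[int],
    learning_rate: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit logistic regression with batch gradient descent on the log loss."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(features, labels):
            error = score(weights, bias, x) - y  # d(log loss)/dz for this example
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient over the whole batch
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


def to_features(clicks_per_minute: float, seconds_on_page: float) -> List[float]:
    """Hypothetical features, scaled to roughly 0..1."""
    return [clicks_per_minute / 60.0, seconds_on_page / 60.0]


# Made-up labeled examples: 0 = legitimate click, 1 = fraudulent click
raw = [((1, 45), 0), ((2, 30), 0), ((1, 60), 0), ((3, 25), 0),
       ((40, 1), 1), ((55, 0), 1), ((30, 2), 1), ((60, 1), 1)]
X = [to_features(*pair) for pair, _ in raw]
y = [label for _, label in raw]
weights, bias = train(X, y)
print(score(weights, bias, to_features(50, 1)) > 0.5)  # prints True
print(score(weights, bias, to_features(2, 40)) > 0.5)  # prints False
```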
Here are just a few examples.
Real-Time Analytics: You can trigger real-time analytics based on each event that the consumer application receives from the topic. For example, you can write an analytics algorithm that detects suspicious click patterns and blocks the matching requests to prevent click fraud.
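One simple real-time analytics algorithm of this kind is a spike detector over per-minute click counts: compare each new minute with the mean and standard deviation of recent normal minutes, and flag it when it is several standard deviations above that baseline (a z-score test). The Python sketch below is a minimal illustration; the class name and default thresholds are hypothetical, and in a Pulsar deployment the per-minute counts would be computed by the consumer from the events it receives.

```python
import statistics
from collections import deque
from typing import Deque


class ClickSpikeDetector:
    """Flags a per-minute click count far above the recent baseline (z-score test)."""

    def __init__(self, history_size: int = 30, z_threshold: float = 3.0,
                 min_history: int = 5) -> None:
        self.history: Deque[int] = deque(maxlen=history_size)
        self.z_threshold = z_threshold
        self.min_history = min_history

    def observe(self, clicks_this_minute: int) -> bool:
        """Return True if this minute's click count is a spike."""
        is_spike = False
        if len(self.history) >= self.min_history:
            mean = statistics.mean(self.history)
            # Floor the spread at 1 click so a flat history is not over-sensitive
            stdev = max(statistics.pstdev(self.history), 1.0)
            is_spike = (clicks_this_minute - mean) / stdev > self.z_threshold
        if not is_spike:
            # Learn only from normal minutes so an attack does not raise the baseline
            self.history.append(clicks_this_minute)
        return is_spike


detector = ClickSpikeDetector(min_history=5)
counts = [100, 102, 98, 101, 99, 103, 500, 101]
print([detector.observe(c) for c in counts])
# prints [False, False, False, False, False, False, True, False]
```

Excluding flagged minutes from the history keeps a sustained attack from quickly becoming the new baseline. The trade-off is that a legitimate, lasting jump in traffic (for example, a new ad campaign) stays flagged until the history is reset or the threshold is adjusted.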
Event Logging: You may need to track details such as IP address and username for audit purposes. Apache Pulsar can also store operational logs and system logs.
Mobile App Analytics: Pulsar also supports mobile application analytics. It can track the streaming data from your application, which you can then analyze to detect and prevent click fraud.
Click fraud can be very expensive if it is not handled the right way.