PrestoDB

Posted on April 29, 2021

Why PrestoDB (now Trino) continues to gain in popularity

Many people have already accepted that we are living in the age of the fourth industrial revolution. Information technology is quickly connecting the biological, digital, and physical worlds.

These fusions lead to the development of new technologies like quantum computers, 3D printing, IoT, robotics, machine learning, and much more. One thing that is crucial for all these technologies is big data.

The more technologies are developed, the more data is generated, and all of it needs to be analyzed. Billions of gigabytes of data have to be queried efficiently, which is why there is such a strong need for distributed SQL query engines like PrestoDB and Trino.

What is PrestoDB & Trino?

Before we talk about the current situation of PrestoDB/Trino, we need to make a couple of things clear. First of all, people need to understand what they are and that they, in fact, grew out of the same project.

PrestoDB was created at Facebook in 2012. Initially, it was designed to speed up queries against the company's 300 PB data warehouse, which had become too slow. The goal was to create an engine that could connect easily to various data warehouses and databases, be easy to use, and integrate with all BI tools.

Presto was the solution for all of this, as it improved the efficiency and speed of access to large-scale data. In 2019 the project's original creators continued their work under the name PrestoSQL, which was designed to serve a wide range of situations and companies. PrestoSQL was renamed Trino in 2020 and has continued building on its past capabilities while catering to many companies.

Essential Reasons Why Presto/Trino Is Used in Cloud Computing Stacks

Trino can interface directly with many different data sources thanks to its connector design. It can work with raw data lake storage, including HDFS and Amazon S3. On top of that, it can also work with various relational databases such as Microsoft SQL Server and MySQL.
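
As a rough illustration of the connector design, the sketch below builds a single SQL statement that joins a data lake table with a MySQL table. Trino addresses tables as catalog.schema.table, so one query can span several sources. The catalog, schema, table, and column names used here (hive, mysql, web.page_views, crm.customers) are hypothetical, and the two helper functions are written for this example only; they are not part of any Trino library.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a fully qualified, double-quoted name: "catalog"."schema"."table"."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def federated_join_sql() -> str:
    """Build one query that reads from two different (hypothetical) catalogs."""
    views = qualified("hive", "web", "page_views")      # assumed data lake table
    customers = qualified("mysql", "crm", "customers")  # assumed MySQL table
    return (
        "SELECT c.country, count(*) AS views\n"
        f"FROM {views} AS v\n"
        f"JOIN {customers} AS c ON v.customer_id = c.id\n"
        "GROUP BY c.country\n"
        "ORDER BY views DESC"
    )


if __name__ == "__main__":
    print(federated_join_sql())
```

The resulting text could be submitted through any Trino client; the point is only that both sources appear in the same statement.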

Amazon has taken the open-source Presto engine and offers a hosted cloud option that it calls “Amazon Athena.” It’s an interactive query service that’s completely serverless and lets users analyze Amazon S3 data with standard SQL. Using various other services, developers can get a visual interface that lets them create real-time tables while ensuring the best performance.

Furthermore, these services also let users delete or update tables within Athena for compliance and common change data capture (CDC) use cases. That’s what makes Trino an excellent solution for analyzing large volumes of data at lower cost and high speed.

Trino’s Versatility Makes it Very Convenient!

Even though it was initially designed for use at Facebook, Trino has evolved into a comprehensive SQL query engine. Trino has a massively parallel processing (MPP) architecture that makes it flexible and scalable.
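
To make the MPP idea a little more concrete, here is a toy sketch of the scatter-and-gather pattern behind it: a coordinator divides the input into splits, workers compute partial results in parallel, and the coordinator merges them. Threads stand in for worker nodes, and all function names are made up for this example. It is a simplified illustration of the general pattern, not a description of how Trino schedules work internally.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_splits):
    """Coordinator step: divide the input rows into roughly equal splits."""
    return [rows[i::n_splits] for i in range(n_splits)]


def partial_count(split):
    """Worker step: count rows per key within a single split."""
    return Counter(key for key, _ in split)


def run_query(rows, n_workers=4):
    """Scatter the splits to workers, then merge the partial aggregates."""
    splits = make_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_count, splits))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the counts together
    return dict(total)


if __name__ == "__main__":
    rows = [("us", 1), ("de", 2), ("us", 3), ("fr", 4), ("us", 5)]
    print(run_query(rows))
```

Adding workers in this sketch only means more splits processed at once, which is the same reason an MPP engine can scale by adding nodes.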

Since Trino is a query engine at its core, it separates storage from compute and uses connectors to query other data sources. This makes Trino highly versatile in the data sources it can query, including columnar stores, non-relational databases, and traditional relational databases.

Columnar

Clustered systems
Hadoop Distributed File System (HDFS)
Azure Blob Storage
Google Cloud Storage
Amazon S3

Non-relational

Apache Cassandra
Redis data structure store
MongoDB Atlas Database

Traditional

Microsoft SQL Server
PostgreSQL
MySQL
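
Each of these sources is typically exposed to Trino as a catalog, configured in a small properties file under etc/catalog/. The sketch below renders such a file for the MySQL connector. The property keys follow the MySQL connector's documented settings; the host name, user, and password are placeholders, and render_catalog is a hypothetical helper written for this example.

```python
def render_catalog(properties: dict) -> str:
    """Render key=value lines in the style of a catalog .properties file."""
    return "\n".join(f"{key}={value}" for key, value in properties.items()) + "\n"


# Placeholder values; a real deployment would use its own host and credentials.
mysql_catalog = {
    "connector.name": "mysql",
    "connection-url": "jdbc:mysql://mysql.example.net:3306",
    "connection-user": "trino_reader",
    "connection-password": "change-me",
}

if __name__ == "__main__":
    # Saved as etc/catalog/mysql.properties, this would define a catalog named "mysql".
    print(render_catalog(mysql_catalog), end="")
```

Because the storage details live in the catalog file, queries only need the catalog name, which is what keeps storage and compute separate.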

Which Companies Use Trino for Their Operations

Some of the largest tech companies in the world use Trino in their operations. Among the biggest names are Facebook, Nasdaq, Atlassian, Asurion, Inmar, WyzAnt, Netflix, Airbnb, and many others.

Thousands of people use the Trino implementation at Facebook. They run over 30,000 ongoing queries. Due to the platform’s high data generation, the company uses Trino to process more than a petabyte of data each day. Netflix uses its Trino clusters to run an average of 3,500 queries each day.

Airbnb, on the other hand, has created its own query tool called Airpal, which runs on top of Trino. In terms of the industries that use PrestoDB/Trino, here are the rough numbers:

40% of companies are from the computer software industry
6% are from the information technology & services sector
6% from marketing
5% offer financial services
5% operate in the retail sector

Common Uses of PrestoDB/Trino

Ad-Hoc Queries

Trino lets users run ad hoc SQL queries no matter where the data is located. There’s no need to move the data into another system with ETL first; you can query it where it is stored. Connectors give analysts access to the data they need to work on.

Dashboard & Data Reports

Another popular use of Trino is to query multiple data sources to create dashboards and reports. Analysts and data scientists can query multiple data sources independently without having to work with data platform teams.

SQL Transformations

Lots of companies use Trino to run ETL queries against multiple data sources. These queries deliver higher throughput and use fewer resources than legacy batch processing.
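
One common way to express such an ETL step is a CREATE TABLE ... AS SELECT statement, which writes the result of a query into a new table. The sketch below assembles one as text. The table and column names are hypothetical, the ctas_sql helper is written for this example, and the format table property assumes the target catalog uses a connector that supports it, such as the Hive connector.

```python
def ctas_sql(target: str, select_sql: str, file_format: str = "PARQUET") -> str:
    """Wrap a SELECT in CREATE TABLE ... AS so its result is written to `target`."""
    return (
        f"CREATE TABLE {target}\n"
        f"WITH (format = '{file_format}')\n"  # assumes a Hive-style table property
        f"AS\n{select_sql}"
    )


# Hypothetical source tables living in two different catalogs.
select_sql = (
    "SELECT o.order_date, c.country, sum(o.amount) AS revenue\n"
    "FROM postgresql.sales.orders AS o\n"
    "JOIN mysql.crm.customers AS c ON o.customer_id = c.id\n"
    "GROUP BY o.order_date, c.country"
)

if __name__ == "__main__":
    print(ctas_sql("hive.analytics.daily_revenue", select_sql))
```

The transformation and the load happen in one statement, so no intermediate export or separate batch job is needed in this sketch.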

Analyzing Data Lakes

Organizations often use this engine to query their data lakes directly, as there’s no need to transform the data first. On top of that, it works with both structured and unstructured data.

The Future of Trino

Even though PrestoSQL has rebranded itself as Trino, the software hasn’t changed. The main reason for the rename was that Facebook claimed the Presto name as its own trademark. The project itself, however, was never just a Facebook service: it was always an open-source solution built by its engineers.

Trino is not only an engine – it’s a different approach to data. In a world where machine learning is evolving rapidly along with big data, it’s necessary to process lots of data quickly to gain important information. That’s what Presto has been doing for a long time and will continue to do.

Bottom Line

Ever since it was created, Presto has been improving and opening up new opportunities for analysts and data scientists as well as programmers. We can expect Trino to continue in the same fashion at a time when cost-efficiency and resource management are so important.
