The Story Behind Presto DB

Presto DB is one of the more interesting developments in data infrastructure. In essence, it’s a distributed SQL query engine built to meet the query needs of Big Data workloads. It was released under the Apache License and, like most software released under it, Presto DB is a community-driven open-source project. While Big Data is arguably still in its infancy, much of the framework and architecture that enables it already exists.

This article covers the history of Presto DB: why it was made, how it has changed, and what the future holds for this SQL query engine.

Where Was Presto DB Made?

Facebook initially developed Presto DB to run interactive queries on its massive Apache Hadoop data warehouse. Although Facebook developers built it, it was later released as open-source software that is free for commercial use, meaning anyone can use it for queries, data analytics, and data management.

Development of Presto DB started in 2012, and the software was deployed a year later, in 2013.

What Caused the Need for Such a Development? 

Facebook’s data warehouse already held multiple petabytes before Presto DB was developed, which made it tough to query interactively with the existing technology, Apache Hive.

Because Apache Hive was relatively slow and poorly suited to interactive queries over such large datasets, engineers at Facebook created Presto DB to streamline and speed up the process. In more technical terms, Presto DB was created to provide low-latency interactive analytics.

Furthermore, existing software was limited compared to Presto DB because it could not query data from multiple sources. Most large-scale operations dealt with more than one data source, each massive in its own right, so a tool such as Presto DB became a necessity.

Presto DB, by contrast, can query a variety of data sources at the same time. These include, but are not limited to:

  • AWS S3
  • Alluxio
  • Cassandra
  • Hadoop
  • Kafka
  • MongoDB
  • MySQL
  • Teradata

Queries as a whole had been relatively slow and clunky. With Presto DB, they could not only draw on multiple sources and run considerably faster, but also support high-end interactive analytics while the data was being read from its source.
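To make the multi-source idea concrete, here is a minimal sketch of how a single query can address tables in two different sources. Presto identifies a table as catalog.schema.table, where the catalog names a configured data source. The helper names (Table, federated_join) and the catalogs, schemas, and tables used here are hypothetical and exist only for illustration; the sketch only builds the SQL text and does not connect to a server.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Table:
    """A table addressed as catalog.schema.table (hypothetical helper)."""
    catalog: str  # name of a configured data source, e.g. "hive" or "mysql"
    schema: str
    name: str

    def qualified(self) -> str:
        # Join the three parts into the fully qualified table name.
        return f"{self.catalog}.{self.schema}.{self.name}"


def federated_join(left: Table, right: Table, key: str, columns: list[str]) -> str:
    """Build one SQL statement that joins tables from two different catalogs."""
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {left.qualified()} AS l\n"
        f"JOIN {right.qualified()} AS r ON l.{key} = r.{key}"
    )


# Assumed catalogs, schemas, and tables; none of these come from the article.
events = Table("hive", "web", "page_views")
users = Table("mysql", "crm", "users")

query = federated_join(events, users, key="user_id", columns=["r.country", "l.url"])
print(query)
```

The point of the sketch is that the join spans two catalogs in one statement, so the engine, rather than the analyst, is responsible for fetching rows from each source.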

How Presto DB Has Changed Over Time

Presto DB has changed massively over the years. Because it is open source, anyone can contribute to it and improve it, and the open-source community takes this responsibility seriously.

Since its release, developers worldwide have been working on the software so that it can handle more queries and more data at a faster rate, and new releases come out regularly.

Other tech giants have also adopted Presto DB as their go-to SQL query engine. A year after its release, streaming giant Netflix reported that it used Presto DB to query 10 petabytes of data stored on AWS S3.

A few years later, at the start of 2019, the Presto Software Foundation was founded as a non-profit organization devoted to advancing an independent fork of the project, which became known as PrestoSQL.

Presto DB (PrestoDB), which was still operated, developed, and maintained by Facebook at the time, shared a common codebase with PrestoSQL, so the two projects initially followed a similar trajectory.

Toward the end of 2019, Facebook donated the Presto DB project to the Linux Foundation, a non-profit known for hosting major open-source projects, to place it under neutral governance.

The Change from PrestoSQL to Trino

After PrestoDB was donated to the Linux Foundation, the split between the two projects became more pronounced. The original creators and many early contributors of Presto DB were no longer part of PrestoDB, and most of them worked on PrestoSQL instead.

This was a significant development in the Presto story and a source of considerable controversy in the open-source community.

Following a dispute over the Presto trademark, PrestoSQL underwent a complete rebranding at the end of 2020 and changed its name to Trino.

Trino is essentially the same software as PrestoSQL under a new name. Unlike Presto DB, Trino has the original creators of Presto working on it.

Since dropping the Presto moniker, Trino (formerly PrestoSQL) has steadily gained popularity, adoption, and contributions relative to the original Presto DB.

The Future of Presto/Trino 

The digital world is in a state of constant evolution. As fields such as Big Data, ML, and AI continue to advance, tools such as Trino will be part of the technology that enables their growth.

Big Data isn’t so much a technology as it is a selection of tools, methodologies, and practices. As our need to accumulate data grows, so too will the need to index, query, and analyze that data, and that’s where Trino comes in.

Final Thoughts

Trino (formerly PrestoSQL) and Presto DB have a rocky and eventful history behind them, which, while perplexing and controversial, has led to real innovation in SQL query engines. These engines are likely to remain important tools as Big Data grows, including for applications that don’t exist yet.
