The Story Behind Presto DB
Presto DB is a distributed SQL query engine built to run queries over very large datasets, the kind usually grouped under the label Big Data. It was released under the Apache license and, like most projects released that way, it is community-driven open-source software. While Big Data is arguably still in its infancy, much of the framework and architecture that enables it already exists.
This article will talk a bit about Presto, its history, why it was made, how it has changed, and what the future holds for this SQL query engine.
Where Was Presto DB Made?
Facebook initially developed Presto to run interactive queries on its massive Apache Hadoop and distributed MySQL data warehouse. From the beginning, Facebook developers Martin Traverso, Dain Sundstrom, and David Phillips insisted that Presto be open-source software and free for commercial use. This means that anyone can use it for queries, data analytics, and data management.
Development of Presto started in 2012, and the software was deployed a year later, in 2013.
What Caused the Need for Such a Development?
Before Presto, Facebook's data warehouse already held multiple petabytes, which made it tough to query with the existing technology, Apache Hive.
Because Apache Hive was relatively slow and poorly suited to interactive queries over such large datasets, engineers at Facebook created Presto to streamline and speed up the process. In more technical terms, Presto was created to deliver low-latency interactive analytics.
Furthermore, the existing software was limited compared to Presto, as it could not query data from multiple sources. Most large-scale digital operations were dealing with more than one data source, each massive in its own right, so a tool such as Presto became a necessity.
Presto also grew popular because it can query many different data sources at once, something the existing software could not do. These sources include, but are not limited to:
- AWS S3
- Alluxio
- Cassandra
- Hadoop
- Kafka
- MongoDB
- MySQL
- Teradata
Before Presto, queries were relatively slow and clunky. With Presto, data could be pulled from multiple sources and queried considerably faster, and interactive analytics could run while the data was being sourced.
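To make the idea of querying several sources at once more concrete, here is a minimal sketch in Python. It relies on the fact that Presto addresses tables as catalog.schema.table, where each catalog maps to a configured data source. The catalog, schema, table, and column names below (hive, mysql, sales, crm, orders, customers, and so on) are hypothetical and chosen only for illustration. The code only builds the SQL text and does not connect to any cluster.

```python
def qualified(catalog: str, schema: str, table: str) -> str:
    """Return a table name in the catalog.schema.table form."""
    return f"{catalog}.{schema}.{table}"


# Hypothetical names: one table in a Hive catalog, one in a MySQL catalog.
orders = qualified("hive", "sales", "orders")
customers = qualified("mysql", "crm", "customers")

# A single SQL statement that joins data living in two different systems.
federated_query = (
    "SELECT c.name, SUM(o.total) AS revenue\n"
    f"FROM {orders} AS o\n"
    f"JOIN {customers} AS c ON o.customer_id = c.id\n"
    "GROUP BY c.name"
)

print(federated_query)
```

In a real deployment, a statement like this would be submitted through a Presto client, and the engine would read from each source through its connector.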
How Presto DB Has Changed Over Time
Presto has changed massively over the years. Because Presto is open source, anyone can add to it and improve it, and the open source community takes this responsibility seriously.
Ever since release, developers worldwide have been tinkering with the software to enable it to handle more queries and data at a much faster rate – and new releases are coming out every month.
Other tech giants have also adopted Presto as their go-to SQL query engine. A year after release, streaming megacorporation Netflix reported using Presto (the project that Trino, formerly Presto SQL, later branched from) to query 10 petabytes of data stored on AWS S3.
In late 2018, Martin, Dain, and David left Facebook after a unilateral change to the policies governing the Presto project gave Facebook committers more privilege to commit changes than the open source community had. In response to this decision, the founders forked Presto DB to create Presto SQL (now Trino).
At the start of 2019, they founded the Presto Software Foundation, a non-profit organization devoted to improving and streamlining Presto’s capabilities and prospects under the Presto SQL name.
Presto DB, which is operated, developed, and maintained by Facebook, shares 6 years of history with Presto SQL.
At the end of 2019, Facebook donated the Presto DB project to the Linux Foundation, a foundation known for championing open-source software, to place it under neutral governance.
The Change from PrestoSQL to Trino
After Presto DB was donated to the Linux Foundation, Presto SQL saw a seismic shift. The original developers, contributors, and people involved with Presto DB moved to develop Presto SQL.
This was quite a significant development in the Presto story and a massive controversy in the open-source software development community.
After years of not enforcing the Presto trademark, the Linux Foundation took steps that forced the Presto SQL project to immediately remove all uses of the Presto trademark and rebrand at the end of 2020. The result was still the same project with a new name, Trino, and a new bunny mascot.
Since the split from the Presto moniker, Trino (Presto SQL) has continually gained more popularity, adoption, and contributions compared to the original Presto DB.
The Future of Presto DB / Trino
The digital world is in a state of constant evolution. As technologies such as Big Data, ML, and AI continue to advance, tools such as Trino will be among the technologies that enable their growth.
Big Data isn’t so much a technology as it is a collection of tools, methodologies, and practices. As our need to accumulate data grows, so too will the need to index, query, and analyze that data, and that’s where Trino comes in.
Final Thoughts
Trino (Presto SQL) and Presto DB have a rocky and exciting history behind them which, while perplexing and controversial, has led to some fantastic innovation in the area of SQL engines. These engines will keep growing alongside Big Data and serve as top-of-the-line tools for a range of applications that don’t even exist yet.