Presto Guide

Presto is a distributed SQL query engine for big data that can run queries against a wide variety of data sources.

What is Presto?

Presto (now developed as two separate projects, Presto and Trino) is a SQL query engine originally built by Facebook developers in 2012 as a high-performance engine for very large data sets. The project was released to the open source community in 2013 so that others could use and modify it for their specific needs.

In January of 2019 the Presto Software Foundation was announced. At the same time, the development of Presto forked into PrestoDB, maintained by Facebook, and PrestoSQL, maintained by the Presto Software Foundation. In September of that same year Facebook donated PrestoDB to the Linux Foundation. In December 2020 the original committers and top contributors rebranded PrestoSQL as Trino.

Both open source projects handle massive quantities of data with SQL queries in a scalable and efficient manner. Presto stands out from other SQL engines because it works with almost all common data sources, such as Hadoop, AWS S3, Alluxio, MySQL, Cassandra, Hive, MongoDB, Teradata, and many more.

Another thing that makes Presto unique is its querying system. A single query can read from multiple data sources at once, so data can be joined where it lives instead of being copied into one system first.
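To make the idea of querying several sources in one statement concrete, here is a minimal sketch in Python that only builds the SQL text. It relies on Presto's `catalog.schema.table` naming, where each catalog maps to a configured data source. The catalog, schema, table, and column names are hypothetical and stand in for whatever your own deployment exposes.

```python
# Hypothetical catalog.schema.table names; substitute the ones from your own setup.
ORDERS = "mysql.shop.orders"          # a table reached through a MySQL catalog
PAGE_VIEWS = "hive.web.page_views"    # a table reached through a Hive catalog


def build_federated_query(orders_table: str, views_table: str) -> str:
    """Return one SQL statement that joins tables from two different catalogs."""
    return (
        "SELECT o.customer_id, "
        "count(DISTINCT o.order_id) AS orders, "
        "count(v.view_id) AS page_views\n"
        f"FROM {orders_table} AS o\n"
        f"JOIN {views_table} AS v ON v.customer_id = o.customer_id\n"
        "GROUP BY o.customer_id"
    )


query = build_federated_query(ORDERS, PAGE_VIEWS)
print(query)
```

The resulting string could be submitted through whichever Presto client you use. The point of the sketch is that both sources appear in a single `SELECT`, with no export or import step between them.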

Presto is an exciting development. Originally designed to handle large data centers, Presto is becoming the de facto SQL query engine for big data.

Big data is still in its infancy, but it already requires a programmable, adaptable, and efficient SQL engine to handle queries across such a wide infrastructure. These qualities are all flagship features of Presto.

Presto vs. Apache Hive

Presto was initially developed as a faster alternative to Apache Hive, which was too slow for interactive SQL queries. Still, some companies and people use Hive, so we’ve decided to show just how much better Presto is. If you want to see it in action, book here for a quick demo.

Faster Speed – Presto runs queries much faster than Apache Hive. That’s because Presto processes data in memory, executes each query as a multi-stage pipeline, and is optimized to pull data from numerous data sources.

Straightforward pipeline – When working with Hive, each stage has to finish before the next one starts, so you wait on the in-between stages to get your data. Presto streams data directly from one stage to the next, which cuts down on the waiting time considerably.

No intermediate disk writes – Presto keeps intermediate results in memory instead of writing them to disk between stages. Apache Hive, on the other hand, does write them to disk, which significantly impacts its performance.

Open Source – Both are open source, but Presto has a far more active community working on streamlining and improving the software every day. Hive is used mostly in major corporate deployments and by companies with in-house IT departments.

Push Model – Apache Hive pulls data from the data centers, while Presto pushes it out. This makes for a much more streamlined process that mitigates data loss and crashes, both of which are uncommon with Presto.

Integration Conundrum – Presto is known for its simple integration within an existing framework and can easily adapt. Hive, on the other hand, needs to be modified to adapt to any existing configuration. 

Scalability – Unlike Hive, Presto is built for scalability, making it a good tool for data centers of any size. Hive works best with larger data centers.

Cloud friendly – Presto can be deployed in the cloud with relative ease, while Hive is less cloud-friendly.
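The pipeline point in the comparison above can be illustrated with a toy sketch. It is not how Presto or Hive is actually implemented. It only contrasts a staged run, where the whole intermediate result is built before the next stage starts, with a pipelined run, where each row flows through every stage as soon as it is read. All function names here are made up for the illustration.

```python
from typing import Iterable, Iterator, List


def scan(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """First stage: read rows one at a time and record each read."""
    for row in rows:
        log.append(f"scan {row}")
        yield row


def keep_even(rows: Iterable[int], log: List[str]) -> Iterator[int]:
    """Second stage: pass through even rows and record each one kept."""
    for row in rows:
        if row % 2 == 0:
            log.append(f"filter {row}")
            yield row


def run_staged(data: List[int]) -> List[str]:
    """Finish the scan completely before the filter stage starts."""
    log: List[str] = []
    scanned = list(scan(data, log))   # the whole intermediate result is materialized
    list(keep_even(scanned, log))
    return log


def run_pipelined(data: List[int]) -> List[str]:
    """Stream each row through both stages as soon as it is read."""
    log: List[str] = []
    list(keep_even(scan(data, log), log))
    return log


staged = run_staged([1, 2, 3, 4])
pipelined = run_pipelined([1, 2, 3, 4])
print(staged)
print(pipelined)
```

In the staged run the first filtered row appears only after every row has been scanned. In the pipelined run it appears right after the second row is read, which is the reduction in waiting time that the comparison describes.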

Frequently Asked Questions

Pandio’s Presto is a powerful SQL engine that can handle anything from small deployments to large-scale big data workloads. Presto is fully open source and can work with all kinds of data sources. It’s very reliable and highly scalable, making it a strong fit for a wide range of querying applications. Presto was developed by Facebook and was later donated to the Linux Foundation.

Presto is better than Hive in almost every way. The most important differences are:

  •   Presto keeps intermediate results in memory instead of writing them to disk between stages.
  •   Unlike Hive, Presto has a direct data pipeline, which cuts down on latency.
  •   Presto is fully scalable and works well on small and large applications alike, while Hive only works well for large applications.
  •   The Presto open source code is maintained and improved by an active community, and Hive doesn’t have anything near that level of support.

 

A wide range of companies and large enterprises use Presto daily, including:

  •   Atlassian
  •   Amazon
  •   Airbnb
  •   Facebook
  •   Netflix
  •   NASDAQ
  •   Bigin
  •   Gympass
  •   Amperity
  •   Walt Disney
  •   DBS C2E

Presto is used as a SQL engine for a wide range of applications. It’s used to extract data from huge databases, query more than one data source simultaneously, and act as the primary analytics tool. Companies also use Presto to join data from multiple sources and run aggregations.

Presto is used across many industries. From financial giants such as NASDAQ to the entertainment megacorporation Walt Disney, Presto is used wherever there is a need for querying and analytics.

Presto has memory-to-memory transfer, which is one of its main selling points and the primary reason behind its speed: data moves between processing stages in memory rather than through intermediate files on disk. Unlike a traditional pipeline, Presto’s direct pipeline also allows it to pull data from multiple sources simultaneously.

Presto, in its initial form, has ceased to exist. These days, the two versions of what Presto used to be are Presto, hosted by the Linux Foundation, and Trino, developed by the community and the developers who worked on Presto before it was donated to the Linux Foundation.

The differences between the two are pretty minimal, but Trino seems to have a far more active community of developers behind it. Both versions share most of their features and perform similarly.

Want to See Presto in Action?

Try Pandio’s fully managed Presto service for free and start querying across any data sources you have.