Databases have been around for decades and since 1979 we use one primary way to query them: we use the SQL language. SQL is a well-defined language used across the industry. It's complete, flexible, fast and simple. It requires some learning though as it comes with its own concepts, syntax, structure and variances. So are there some easier ways to query the data for power users who don't know SQL? Data Discovery tools have been an answer to that and let people query the data in visual ways. There are also numerous query builder tools that can help build SQL queries. While these are indeed helping to query the data, they have limitation and can't provide the flexibility and completeness SQL brings. It also requires to know the database very well. But what if we could query the data using natural language? What if querying the database was the same as asking a question to your data analyst?
Oracle Autonomous Database has taken a significant leap forward with the introduction of Select AI, a new LLM-powered feature that seamlessly integrates natural language queries with database operations. This capability enhances user experience and makes complex queries more accessible to a broader audience.
Autonomous Database delivers Select AI, a conversational SQL generation tool set driven by pluggable LLM technology. Marty Gubar (Product Manager), Jeffrey Thomas (A-team Architect) and Jerome Francoisse (A-team Architect) discuss the benefits, use cases, and position in the stack of this new exciting capability in Oracle Database.
At the core of Select AI lies the power of large language models (LLMs). These models, trained on vast datasets from the internet, possess an incredible ability to understand and interpret natural language. Let's have a look at a sample question: "what are our total streams for each Tom Hanks movie this month?". LLMs can understand the meaning of that question. The model can infer that "total streams" means total number of movie views, "for each" means a breakout per movie and "this month" requires filtering on the current month. What's impressive is that the language model grasps the implicit context, recognizing Tom Hanks as an actor even though not explicitly stated in the query. It therefore knows it has to filter based on the actor as well. This contextual understanding is a game-changer, allowing users to frame questions in a more conversational manner without the need for explicit details.
Once a natural language query is input, Select AI generates the corresponding SQL query which is then executed privately on the user's data in the Autonomous Database. The LLM doesn't have access to the data stored in the database. It only knows the structure of the table so the table and column names. The beauty of this process lies in the fact that users don't need to know every detail of their database schemas or know where specific data resides. The LLM takes care of the intricate details, generating the necessary joins and aggregations to produce the desired result.
Select AI is not tied to a particular LLM. At the moment it supports OpenAI, Cohere AI, Azure OpenAI Service and OCI Generative AI. Configuring a new LLM profile is quite simple to do and makes this feature future-proof. The Generative AI space is evolving quickly with new models at a very high cadence right now so being able to switch from one model to another and quickly configure new one is crucial.
While the Select AI is indeed a remarkable technology, we can't stress enough the importance of validation. Language can be imprecise, and queries might not always capture the user's intent accurately. Users are encouraged to verify the results and ensure that the generated SQL aligns with their expectations. This is particularly crucial for scenarios where the query might have multiple interpretations. For instance when asking for Tom Hanks movies the user might expect the movies he produced and directed as well.
Select AI not only enables power users not proficient in SQL to query the data. It also significantly enhances developer productivity. The ability to generate SQL statements based on natural language queries serves as a starting point for developers. It streamlines the development process, quickly providing a query that developers can further optimize, saving time and effort.
Next to the "Run SQL" mode, the "Show SQL" mode allows developers to inspect the generated SQL, empowering them to fine-tune and optimize queries for better performance before running them. This not only benefits SQL developers but also aids data scientists in discovering where their required data resides, bridging the gap between data science and SQL development. The "Explain SQL" mode describes the process being the query in natural language. This is helpful for users who can't read SQL but want to validate the query generated.
Select AI is a paradigm shift in the way we approach database queries. This technology marks a significant step forward, enabling faster time-to-market, improved accuracy, and a more streamlined process for extracting insights from data. Select AI in Oracle Autonomous Database is a technological advancement that has the potential to reshape how we interact with databases. Its ability to understand and process natural language queries not only simplifies the user experience but also empowers developers and data scientists to unlock the true potential of their data. As we look ahead, this marks a defining moment in the evolution of database technologies.
Jérôme is a Solution Architect for the A-Team with a focus on Data Integration, Big Data and Analytics.