At Capitalize, we love Alteryx. It’s a wonderful tool that gives a wide variety of data consumers the power to explore, analyze, and make predictions based on data from myriad sources. Alteryx enables automation of everything from simple end-of-month reporting to advanced data science use cases.
In our several years of Alteryx consulting, however, we have often run into constraints when trying to get the most out of cloud data lake environments. As these sources become a more critical component of the modern data stack, it’s clear that additional tools are necessary to democratize access to the cloud data lake for analytics. One such tool is Dremio, the Open Data Lakehouse Platform, which simplifies and accelerates access to data in cloud data lakes and connects Alteryx directly to cloud object storage such as Amazon S3 or Microsoft Azure Data Lake Storage (ADLS).
In this blog post, we’ll discuss the growth of data lakes and how Dremio and Alteryx work together to enable easy, self-service access to the data lake for analytics.
The Growth of Data Lakes
Decades ago, businesses were primarily concerned with collecting and analyzing structured data from business systems in the data center, and the tool of choice was a data warehouse. These data warehouses were proprietary, appliance-based solutions, and they struggled to scale with the volume, variety, and velocity of modern data.
To solve this problem, many businesses turned to data lakes, first with Hadoop, and then with cloud object storage. Data lakes worked well as cheap repositories for large volumes of varied data, and for small-scale exploratory data science projects, but they never managed to fully replace the Business Intelligence (BI) & reporting capabilities of the data warehouse.
As a result, most organizations found themselves managing a cooperative architecture, with one or more data warehouses sitting side by side with one or more data lakes, each still doing roughly what it was designed to do: BI & reporting workloads remain on the data warehouse, and data science remains on the data lake. If a business user needs access to an important data source in the data lake, they must submit a request to the data team, which usually has to move the data into the data warehouse in the right format via an Extract, Transform & Load (ETL) pipeline.
The challenge for many organizations going forward is that the most important sources of customer and operational data over the next several years will be semi-structured and unstructured data from sources outside of the data center – sensor data, mobile data, social media, and more – and business users will increasingly require access to that data for BI & reporting. Many data teams have a backlog of data access requests that require ETL processes that are often manual and ad hoc. Once the data is in the data warehouse, data consumers often require copies of the data in the form of BI cubes and extracts to meet performance SLAs for various tools. The result is a complex web of ETL pipelines and proliferating data copies that adds time and complexity to data access requests and limits the scope of data for consumers.
Dremio Simplifies Data Architectures and Accelerates Time to Insight
Dremio is an open lakehouse platform that solves many of the challenges data consumers face when accessing data in the data lake. It consists of two products: Dremio Arctic, an intelligent metastore built on Apache Iceberg that optimizes and automates many data management and data governance tasks; and Dremio Sonar, a SQL query engine, query accelerator, and semantic layer that allows a wide range of data consumers to access data directly in the data lake for analytics.
Dremio eliminates complex, brittle ETL pipelines, and features a no-copy architecture that relies on Virtual Datasets, stored as SQL statements, to perform joins and queries across Physical Datasets rather than moving or copying the data. Dremio delivers several advantages over other data architectures for data teams and data consumers:
- Accelerated time to insight: Data consumers get access to data more quickly and see faster query performance than with comparable solutions.
- More self-sufficient data consumers: The semantic layer empowers both technical and non-technical data consumers, from business analysts to data scientists.
- A single source of data: Data copies are difficult to manage and maintain, and can lead to different views of the data. Dremio eliminates data copies and creates a consistent view across the organization.
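The idea of a Virtual Dataset stored as a SQL statement can be illustrated with an ordinary SQL view. The sketch below is an analogy only: it uses Python's built-in sqlite3 module rather than Dremio, and the table and view names (customers, orders, customer_totals) are hypothetical. It shows that a view holds a query, not a copy of the data, so anything reading the view sees changes to the underlying tables immediately.

```python
import sqlite3


def build_demo():
    """Create two 'physical' tables and one view that joins them.

    Assumption: SQLite stands in for the lakehouse here; all names are
    illustrative and do not come from Dremio or Alteryx.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER,
            amount REAL
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO orders VALUES (10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0);

        -- The view is stored as a SQL statement; no rows are copied.
        CREATE VIEW customer_totals AS
            SELECT c.name AS name, SUM(o.amount) AS total
            FROM customers c
            JOIN orders o ON o.customer_id = c.id
            GROUP BY c.name;
        """
    )
    return conn


def totals(conn):
    """Query the view the way a downstream tool would query a table."""
    rows = conn.execute("SELECT name, total FROM customer_totals").fetchall()
    return dict(rows)


def view_definition(conn):
    """Return the SQL text SQLite stores for the view."""
    row = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'view' AND name = 'customer_totals'"
    ).fetchone()
    return row[0]


if __name__ == "__main__":
    conn = build_demo()
    print(totals(conn))
    # A new row in the underlying table shows up in the view right away.
    conn.execute("INSERT INTO orders VALUES (13, 2, 25.0)")
    print(totals(conn))
    print(view_definition(conn))
```

Because the view is only a saved query, there is no second copy of the data to refresh or reconcile, which is the property the no-copy architecture relies on.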
Dremio + Alteryx: Direct Access to the Data Lake with a No-Copy Architecture
Alteryx Designer, the desktop application for process automation that lets data consumers build workflows via drag-and-drop, works with a wide range of data sources. Alteryx users can take advantage of two Dremio features that improve the data lake experience:
- The Semantic Layer: Dremio’s semantic layer accelerates the process of getting data into Alteryx. Users can perform joins across Physical Datasets to create Virtual Datasets, and Alteryx can target those Virtual Datasets, which are simple SQL statements, not an actual copy of the data. The no-copy architecture means every data consumer has a consistent and up-to-date view of the data.
- Query Acceleration: Dremio optimizes query performance so that data consumers are not limited in the amount of data they can analyze. Business analysts and other data consumers can explore millions or even billions of table rows at interactive speed.
Finally, Alteryx can write results back to the data lake, so other data consumers can access the insights discovered in Alteryx through the Dremio semantic layer, using tools such as Tableau or Power BI.
The result of this combination is democratized access to the data lake for a wide range of analytic use cases, from BI & reporting to predictive and prescriptive analytics, with a simplified data architecture built on top of the most critical customer and operational data our organizations are collecting.
See Dremio + Alteryx in Action
If you’d like to see how all of this comes together, check out our recent webinar, “Unlocking Analytics from your Data Lake with Alteryx and Dremio.” In it, we explore the synergies between Alteryx and Dremio, and offer an in-depth demonstration of the two technologies. You can watch it on demand at the link above.
When you’re ready, contact us at firstname.lastname@example.org, and we will be happy to put the Alteryx + Dremio solution into the context of your business and data stack.