Data Mesh And Its Principles

“Data Mesh is a strategic approach to modern data management and a way to strengthen an organization’s digital transformation journey, as it centers on serving up valuable and secure data products.” 

Many organizations have invested in a central data lake and a data team, expecting to drive their business with data.

However, after a few initial quick wins, they notice that the central data team often becomes a bottleneck. The team cannot handle all the analytical questions of management and product owners quickly enough. This is a massive problem because making timely data-driven decisions is crucial to staying competitive. For example: Is it a good idea to offer free shipping during Black Week? Do customers accept longer but more reliable shipping times? How does a product page change influence the checkout and returns rate?

The data team wants to answer all those questions quickly. In practice, however, they struggle because they spend too much time fixing data pipelines that break after operational database changes. In the little time that remains, the data team has to discover and understand the necessary domain data. For every question, they need to acquire domain knowledge before they can give meaningful insights. Getting the required domain expertise is a daunting task.

On the other hand, organizations have also invested in domain-driven design, autonomous domain teams (also known as stream-aligned teams or product teams) and a decentralized microservice architecture. These domain teams own and know their domain, including the information needs of the business. They design, build, and run their web applications and APIs on their own. Despite knowing the domain and the relevant information needs, the domain teams have to reach out to the overloaded central data team to get the necessary data-driven insights.

As the organization grows, the situation gets worse for both the domain teams and the central data team. A way out is to shift the responsibility for data from the central data team to the domain teams.

This is the core idea behind the Data Mesh concept: Domain-oriented decentralization for analytical data. A data mesh architecture enables domain teams to perform cross-domain data analysis on their own and interconnects data, similar to APIs in a microservice architecture.

What Is Data Mesh

The Data Mesh is, in many ways, the data platform version of microservices: it mirrors the way software engineering teams transitioned from monolithic applications to microservice architectures.

The term Data Mesh was coined by Zhamak Dehghani in 2019 and is based on four fundamental principles that bundle well-known concepts:

  • The domain ownership principle mandates the domain teams to take responsibility for their data. According to this principle, analytical data should be composed around domains, similar to the team boundaries aligning with the system’s bounded context. Following the domain-driven distributed architecture, analytical and operational data ownership is moved to the domain teams, away from the central data team.
  • The data as a product principle applies product thinking to analytical data. This principle means that there are consumers for the data beyond the domain. The domain team is responsible for satisfying the needs of other domains by providing high-quality data. In short, domain data should be treated like any other public API.
  • The idea behind the self-serve data infrastructure platform is to adopt platform thinking to data infrastructure. A dedicated data platform team provides domain-agnostic functionality, tools, and systems to build, execute, and maintain interoperable data products for all domains. With its platform, the data platform team enables domain teams to seamlessly consume and create data products.
  • The federated governance principle achieves interoperability of all data products through standardization, which is promoted throughout the whole data mesh by the governance guild. The main goal of federated governance is to create a data ecosystem that adheres to organizational rules and industry regulations.
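
The four principles can be made more concrete with a small sketch. The Python below is illustrative only: the class names `DataProduct`, `GovernancePolicy`, and `SelfServePlatform`, the `updated_at` rule, and the example product are all assumptions made for this example, not part of the Data Mesh definition or of any particular tool.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """A domain-owned data set published for other domains (hypothetical model)."""
    name: str
    owner_domain: str        # domain ownership: the team accountable for the data
    schema: dict[str, str]   # data as a product: a documented interface, like an API
    description: str = ""


@dataclass
class GovernancePolicy:
    """Federated governance: global rules that every data product must follow."""
    required_fields: tuple[str, ...] = ("updated_at",)  # assumed example rule

    def violations(self, product: DataProduct) -> list[str]:
        problems = []
        if not product.description:
            problems.append("missing description")
        for name in self.required_fields:
            if name not in product.schema:
                problems.append(f"missing required field: {name}")
        return problems


@dataclass
class SelfServePlatform:
    """Self-serve platform: a domain-agnostic catalog shared by all teams."""
    policy: GovernancePolicy
    catalog: dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        # Products that break the shared rules are rejected at publish time.
        problems = self.policy.violations(product)
        if problems:
            raise ValueError(f"{product.name}: " + "; ".join(problems))
        self.catalog[product.name] = product

    def discover(self, owner_domain: str) -> list[DataProduct]:
        return [p for p in self.catalog.values() if p.owner_domain == owner_domain]


# Usage: the checkout domain publishes its own "orders" product.
platform = SelfServePlatform(policy=GovernancePolicy())
platform.publish(
    DataProduct(
        name="orders",
        owner_domain="checkout",
        schema={"order_id": "string", "updated_at": "timestamp"},
        description="Confirmed orders",
    )
)
```

In this sketch the domain team creates and publishes the product, the platform only provides the shared catalog, and the governance policy is the single place where cross-domain rules live. A real platform would enforce far richer policies, but the division of responsibility is the same.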

Why Use A Data Mesh

Until recently, many companies leveraged a single data warehouse connected to myriad business intelligence platforms. Such solutions were maintained by a small group of specialists and frequently burdened by significant technical debt.

In 2020, the architecture du jour is a Data Lake with real-time data availability and stream processing, with the goal of ingesting, enriching, transforming, and serving data from a centralized data platform. For many organizations, this type of architecture falls short in a few ways:

  • A central ETL pipeline gives teams less control over increasing volumes of data
  • As every company becomes a data company, different data use cases require different types of transformations, putting a heavy load on the central platform

Such Data Lakes lead to disconnected data producers, impatient data consumers, and, worst of all, a backlogged data team struggling to keep pace with the demands of the business. Instead, domain-oriented data architectures, like Data Meshes, give teams the best of both worlds: a centralized database (or a distributed data lake) with domains (or business areas) responsible for handling their own pipelines. As Zhamak argues, data architectures can be most easily scaled by being broken down into smaller, domain-oriented components.

Data Mesh vs Data Lake

The Data Lake is a technology approach whose main objective has traditionally been to serve as a single repository into which data is moved as simply as possible, with a central team responsible for managing it. While Data Lakes can provide significant business value, they also suffer from a number of issues. The primary issue is that data loses its context once it is moved to the lake. For example, the lake may hold many files containing a definition of customer: one from a logistics system, one from payments, and one from marketing. Which one is correct for a given use? Furthermore, data in the Data Lake has typically not been pre-processed, so data issues will inevitably arise. The data consumer then typically has to liaise with the data lake team to understand and resolve those issues, which becomes a significant bottleneck to using the data to answer the initial business question.

In comparison, Data Mesh is more than just technology. It combines technology with organizational aspects, including the ideas of data ownership, data quality, and autonomy. Consumers of data therefore have a clear line of sight to data quality and data ownership, and data issues can be discovered and resolved much more efficiently. Ultimately, data can be used and trusted.
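
The customer example can be sketched in a few lines of Python. This is a hypothetical illustration, not a real catalog API: the `CatalogEntry` type, the `find` helper, the definitions, and the completeness figures are all invented for this example. The point is only that each definition of customer travels with its owner and a quality signal, so a consumer can choose deliberately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    product: str         # name of the data product
    owner_domain: str    # team that answers questions about this data
    meaning: str         # the owning domain's definition of the entity
    completeness: float  # assumed quality metric, 0.0 to 1.0


# Invented entries: three domains each publish their own "customer".
CATALOG = [
    CatalogEntry("customer", "logistics", "party receiving a shipment", 0.99),
    CatalogEntry("customer", "payments", "party billed for an order", 0.97),
    CatalogEntry("customer", "marketing", "contact subscribed to campaigns", 0.82),
]


def find(product: str, owner_domain: str) -> CatalogEntry:
    """Return the single definition of a product published by one domain."""
    matches = [
        entry for entry in CATALOG
        if entry.product == product and entry.owner_domain == owner_domain
    ]
    if len(matches) != 1:
        raise LookupError(
            f"expected exactly one {product!r} product owned by {owner_domain!r}"
        )
    return matches[0]


# A consumer analysing invoices picks the payments definition on purpose.
billing_customer = find("customer", "payments")
```

In a Data Lake the three customer files would sit side by side with no such metadata. Here the consumer knows which team to ask and how complete the data is before using it.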

Data Mesh vs Data Fabric

Data Fabric concentrates on a collection of technological capabilities that work together to produce an interface for the end users who consume data. Many supporters of Data Fabric espouse automating many data management tasks through technologies like ML, so that end users can access data more simply. For simple data usage there is some value in this. For more complex situations, or where business knowledge needs to be integrated into the data, the limitations of Data Fabric become apparent.

Arguably, Data Fabric could be used as part of a Data Mesh self-serve platform, where the Data Fabric exposes data to the domains, which can then embed their business knowledge into a resulting data product.

Daniel Abadi, Darnell-Kanal Professor of Computer Science at the University of Maryland, College Park, says the difference between a Data Fabric and a Data Mesh is not obvious. He advises, “Ultimately, an optimal solution will likely take the best ideas from each of these approaches.”

Final Thoughts

Data Mesh may not be applicable in every environment, but it offers an alternative to current data architecture models, allowing greater synergy between technical teams and business areas, which are the big users of data.

🅐🅚🅖

