Before we can put the data mesh principles to work, we must first understand what the concept encompasses. If you're here, chances are you already have some grasp of data mesh and what it means, but perhaps you'd like it broken down a bit more.
Data mesh is a paradigm shift in how organizations manage big analytical data. It addresses many of the limitations of earlier data management systems, including data lakes and data warehouses.
There are four main principles to data mesh:

- Domain-oriented ownership of data
- Data as a product
- Self-serve data infrastructure as a platform
- Federated computational governance
In this article, we'll look closely at each of the four principles of data mesh so you can understand the benefits and challenges you might face when implementing them within your company. You can never be too prepared when adopting a new way of managing data, and a deeper knowledge of data mesh and data product development will help you integrate it as a new data management approach.
Investing in Data Mesh
It’s no secret that many organizations today struggle with managing their data, primarily as it grows seemingly beyond their control. Data lakes and data warehouses can help build a solid infrastructure, but monolithic data platforms are becoming a thing of the past. Thousands of companies (of all sizes) are investing in data mesh architecture and operating models to focus more intently on outcomes while increasing data agility.
Though the need for a more efficient approach to data management is evident, many business owners haven't fully grasped how to implement data mesh architecture in their organizations. Data mesh aims to break down the traditional ways of storing large amounts of data centrally and allow businesses to share data across specific domains. Still, it's crucial to note that decentralization alone does not make your new platform a data mesh.
To completely integrate the principles of data mesh, companies must employ them to work together. Each principle is interconnected, working together to provide a way for companies to experience growth on different levels. It’s crucial to develop an operating model that combines the four principles into a finalized, workable data management architecture.
Domain-Oriented Ownership
The first principle of data mesh is often referred to as domain-oriented (or domain-driven) ownership of data. As company data platforms expand, they're likely to outgrow the methods used to produce and consume data. Collecting data without the right data management ecosystem puts significant pressure on data platforms and organizational structures.
The collection of data has to scale out horizontally at some point, and that is where the first principle of data mesh comes in: a decentralized, domain-oriented architecture of data ownership. Companies that expect to decentralize their monolithic platforms must change how they currently think about big data and who owns it.
We cannot keep data flowing from domains into centrally owned lakes and platforms. Instead, we need a way for domains to serve and host their data so that it's easy to consume. Each domain should own its data end to end, including access control and consumption patterns.
This principle includes the ability to develop an end-to-end data solution, which means accounting for architectural considerations and supporting newly introduced technologies, such as cloud resources. Domain-oriented ownership requires multi-disciplinary teams that can operate independently when needed.
This team should generally consist of a product owner, data architect, data engineer, data modeler, and data steward. If you're unsure what roles to include in your new data management process, you might also consider data scientists, machine learning engineers, data analysts, and software engineers. Building the team is half the battle.
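To make the ownership idea above concrete, here is a minimal, hypothetical sketch in Python. The `Domain` class, its datasets, and the consumer names are all illustrative; the point is simply that the owning domain, not a central platform team, controls both the data and who may consume it.

```python
from dataclasses import dataclass, field

@dataclass
class Domain:
    """A domain that owns its datasets end to end, including access control."""
    name: str
    datasets: dict = field(default_factory=dict)        # dataset name -> rows
    allowed_consumers: set = field(default_factory=set)

    def grant(self, consumer: str) -> None:
        # The owning domain decides access, not a central gatekeeper.
        self.allowed_consumers.add(consumer)

    def read(self, consumer: str, dataset: str):
        if consumer not in self.allowed_consumers:
            raise PermissionError(f"{consumer} has no access to domain {self.name}")
        return self.datasets[dataset]

orders = Domain("orders", datasets={"daily_totals": [("2024-01-01", 42)]})
orders.grant("analytics")
print(orders.read("analytics", "daily_totals"))
```

An unauthorized consumer would get a `PermissionError` from the same interface, which is the consumption-pattern ownership the principle describes.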
Data as a Product
The second principle of data mesh is data as a product. The data mesh process can raise questions regarding the usability and accessibility of datasets. Treating data as a product enables consumers to securely discover and understand data distributed across various domains.
However, data doesn't just become a product by existing within a new data mesh infrastructure. Instead, each domain team should stand up a representation of its own data infrastructure, called a producer, whose output is a data product.
Any code or data relevant to the product is kept within the producer. Domain ownership is flexible: one domain can own more than one producer and the products associated with each. The producer publishes the data product via an integration pattern or interface, which can look like a table share, API endpoint, database, or data marketplace.
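The producer pattern above can be sketched as follows. This is a hypothetical illustration, not a standard API: the `Producer` and `OutputPort` names, the domain, and the endpoint path are all made up. The key idea is that code and data stay inside the producer, and consumers only ever see the published interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class OutputPort:
    """The published face of a data product: a table share, API endpoint, etc."""
    kind: str      # e.g. "api", "table", "database"
    address: str   # where consumers connect

class Producer:
    """Bundles the code and data behind one data product."""
    def __init__(self, domain: str, product_name: str):
        self.domain = domain
        self.product_name = product_name
        self._data = []  # internal: never exposed directly

    def ingest(self, rows) -> None:
        # Code and data relevant to the product stay inside the producer.
        self._data.extend(rows)

    def publish(self) -> OutputPort:
        # Expose only the interface, never the internals.
        return OutputPort(kind="api",
                          address=f"/{self.domain}/{self.product_name}/v1")

p = Producer("orders", "daily_totals")
p.ingest([("2024-01-01", 42)])
port = p.publish()
print(port.address)
```

Because a domain may own several producers, it would simply instantiate one `Producer` per product, each publishing its own output port.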
Categorizing data products based on the business needs they serve can highlight the expectations of each mesh product. There are two categories appropriate for data products in the mesh, and these include:
The data product model carries heavy costs, which can slow the pace of advancement. Data products require a strong operating model and sound design processes. Your teams must also have the right capabilities to build the platform.
Self-Serve Data Infrastructure as a Platform
Data as a product raises concerns about each domain team's cost of ownership, which leads to the third data mesh principle: self-serve data infrastructure as a platform. This principle developed as a way to accelerate the delivery of producers in the data mesh. The self-serve approach also helps standardize patterns and tools across domains.
Building data infrastructure as a platform means keeping the platform domain-agnostic and ensuring that it hides underlying complexity while providing data through a self-service experience. To build this infrastructure, we must decrease lead times so teams can create new data products faster.
Self-serve data infrastructure requires an approach that is neither too restrictive nor lacking in strategy. Nobody wants a solution that's difficult or impossible for different people to use. How can we do this? By implementing the following.
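One way to picture the self-serve idea is a platform function that turns a small declarative spec into a provisioned data product, so domain teams never build infrastructure from scratch. Everything here is a hypothetical sketch: the `provision` function, the spec fields, and the storage and endpoint naming conventions are all assumptions, not a real platform API.

```python
def provision(spec: dict) -> dict:
    """Turn a declarative product spec into standardized, domain-agnostic
    infrastructure. Shortens lead time: teams declare intent, not plumbing."""
    required = {"domain", "name", "schema"}
    missing = required - spec.keys()
    if missing:
        raise ValueError(f"spec missing fields: {sorted(missing)}")
    # The platform fills in consistent defaults for every domain.
    return {
        "storage": f"s3://mesh/{spec['domain']}/{spec['name']}",
        "endpoint": f"/{spec['domain']}/{spec['name']}/v1",
        "monitoring": True,
        "access_policy": spec.get("access_policy", "domain-approval"),
    }

product = provision({
    "domain": "orders",
    "name": "daily_totals",
    "schema": {"date": "str", "total": "int"},
})
print(product["endpoint"])
```

Because the same function serves every domain, the resulting storage layouts, endpoints, and monitoring are standardized across the mesh, which is exactly what the self-serve principle is after.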
Federated Computational Governance
The fourth principle of data mesh, federated computational governance, involves enabling users to gain value from correlating independent data products and establishing a catalog of shared products. A data product catalog is a metadata store that includes both technical and non-technical metadata about data products.
When executed correctly, a data catalog supports a two-way relationship of sourcing and publishing data products. Of course, implementing it correctly involves much more than purchasing data catalog software and integrating it with your existing technical metadata store.
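A minimal sketch of such a catalog might look like the following. The `Catalog` class, its field names, and the registered product are all hypothetical; the sketch only illustrates the point above that the store holds technical metadata (schema, endpoint) alongside non-technical metadata (owner, description) and lets consumers discover products across domains.

```python
class Catalog:
    """A toy data product catalog: a metadata store with simple discovery."""
    def __init__(self):
        self._entries = {}

    def register(self, name, *, owner, description, schema, endpoint):
        self._entries[name] = {
            "owner": owner,              # non-technical metadata
            "description": description,  # non-technical metadata
            "schema": schema,            # technical metadata
            "endpoint": endpoint,        # technical metadata
        }

    def search(self, term: str):
        # Discovery across domains by name or description.
        term = term.lower()
        return [name for name, meta in self._entries.items()
                if term in name.lower() or term in meta["description"].lower()]

cat = Catalog()
cat.register("orders.daily_totals",
             owner="orders team",
             description="Daily order totals per region",
             schema={"date": "str", "total": "int"},
             endpoint="/orders/daily_totals/v1")
print(cat.search("order"))
```

The two-way relationship shows up here as producers registering products and consumers searching for them through the same store.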
One of the most significant risks with a data product catalog is that data product development moves too slowly. One team cannot manage every aspect of this solution; if we expect it to, we're asking too much. Instead, data management must be embedded into domain teams so team members can carry out the proper activities without becoming overwhelmed.
Developing a shared responsibility model is the way to govern data and clarify expected roles within the data mesh. We must put together the right mix of resources or risk getting nowhere.
Working Through the Data Mesh Principles
If working through the principles seems challenging, that's because it's no easy feat. The principles intertwine, and each one addresses potential problems or risks that could arise in another. The data mesh principles are not easy to operate, and the only real solution to that problem is to make them work together.
When these four principles are mapped out together and executed correctly, you'll have streamlined, seamless results that address many of the issues that arise along the way. The data mesh principles are advanced, but with the right team in place, companies can execute them in a way that takes data storage and management to a whole new level.