The Importance of a Modern Data Architecture for Successful AI Projects
by Daniel Lambert
​
Data is everywhere. There were about 15 billion connected devices of all types in 2023[i], or about 1.9 devices per person worldwide[ii]. This number should grow to over 3 devices per person by 2030. In terms of data, the numbers are astronomical. In 2023, 120 zettabytes (ZB) of data were created worldwide, according to Edge Delta[iii]. Divided evenly, that is about 8 TB of data per connected device per year, or 15 TB per person per year, which works out to roughly 41 GB of created data per person per day. The data universe is expected to keep growing, reaching 128 GB per person per day in 2030[iv], fueled by video streaming and artificial intelligence. To put this in perspective, one hour of standard video streaming on Netflix uses about 1 GB[v], and ChatGPT alone was reportedly trained on a massive text corpus of about 570 GB as of November 2023. No wonder a growing share of corporate resources is allocated to managing, planning, and architecting data.
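The per-device and per-person averages follow from simple division of the cited totals. Here is a minimal sketch, assuming decimal units (1 ZB = 10^21 bytes), a 365-day year, and an even split of the data across devices and people. The variable names are illustrative only.

```python
# Back-of-the-envelope check of the 2023 averages.
# Assumptions: decimal SI units, 365-day year, data split evenly.
ZB = 10**21  # bytes in one zettabyte
TB = 10**12  # bytes in one terabyte
GB = 10**9   # bytes in one gigabyte

data_created_2023 = 120 * ZB  # bytes created worldwide in 2023 (Edge Delta)
connected_devices = 15e9      # connected devices in 2023 (Statista)
world_population = 8e9        # people worldwide in 2023 (Worldometers)

devices_per_person = connected_devices / world_population
tb_per_device_year = data_created_2023 / connected_devices / TB
tb_per_person_year = data_created_2023 / world_population / TB
gb_per_person_day = data_created_2023 / world_population / 365 / GB

print(f"{devices_per_person:.1f} devices per person")
print(f"{tb_per_device_year:.1f} TB per device per year")
print(f"{tb_per_person_year:.1f} TB per person per year")
print(f"{gb_per_person_day:.1f} GB per person per day")
```

These are averages only. In practice, data creation is heavily skewed toward a small share of devices and organizations.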
​
What is Data Architecture?
​
Data architects create artifacts of a data environment that aligns with the goals and objectives of their organization and its distinctive contextual requirements. According to TOGAF, “Data Architecture describes the structure of an organization's logical and physical data assets and data management resources”[vi]. It is an offshoot of enterprise architecture.
​
Data architecture should not be confused with data modeling, which is the “process of discovering, analyzing, representing, and communicating data requirements in a precise form called the data model.”[vii] Data modeling takes a more focused view of specific applications, systems, or business cases.
​
Data architecture is not limited to the set of products and tools an organization operates to manage its external and internal data. There is much more to it. Data architects define the methods to obtain, transform, and deliver usable data to the organization’s clients and business users. Most importantly, they identify the stakeholders who will use that data and their distinctive requirements. As Wayne Eckerson puts it, “A good data architecture flows right to left, from data consumers to data sources—not the other way.”[viii]
​
Modern Data Architecture
​
Organizations no longer limit themselves to static, IT-driven data architectures, known as data warehouses. These take too many resources to implement and change. Today’s data architecture needs to be ready for speed, flexibility, and innovation. The key to a successful data architecture upgrade is agility. As shown in Figure 1 below, modern data architecture may still include a data warehouse and data marts, but they need to be more flexible, adaptable, and agile. Their use will be limited to generating reports, dashboards, diagrams, and smart applications that are viewed by only a few casual users for analysis.
Data warehouses and data marts should be only one of many elements of a modern data architecture. They are just a portion of a data lake environment, which is closer to a data ecosystem: one that uncovers and rapidly responds to changes, continuously understands and adapts, and delivers governed, tailored access to every stakeholder, including users of operational applications, power analytics users, and artificial intelligence developers looking for ways to fine-tune the organization’s operations.
​
A modern data architecture environment should not be confused with a data platform. Data architecture refers to all the engines and data applications that move, shape, secure, and validate data. A data platform consists of the database engines (e.g., relational, Hadoop, OLAP, OLTP) that process and integrate data, allowing data engineers from IT and business stakeholders to collaboratively create datasets for business applications and systems.
​
Organizations are moving quickly to deploy new data tools alongside legacy infrastructure to drive client-driven innovations such as more personalized digital approaches, real-time alerts, and predictive maintenance. These technical additions, including data lakes, client analytics platforms, and stream processing, have hugely amplified the complexity of data architecture. Without a more modern approach to data architecture, the proliferation and variety of data extracted from just about everywhere in a business’s environment significantly impede its ability to deliver new business capabilities that provide value, maintain current infrastructure, and safeguard the integrity of the tagged raw data necessary to build artificial intelligence (AI) models.
The steady flow of rapid market changes makes it very costly for organizations to postpone modernizing their data architecture. Amazon, Facebook, and Google, among others, have been successfully investing in AI innovations that are rapidly disrupting traditional business models, forcing laggards to reshape some facets of their own offerings to keep up. Most cloud providers now offer serverless data platforms that can be used instantly, enabling early adopters to benefit from a faster time to market. Data analytics now relies on automated model-deployment platforms that enable quicker use of new models. More and more businesses are adopting application programming interfaces (APIs) to share and synchronize data between disparate systems and applications within their data lakes. This gives them a real-time view of what is really going on in their ecosystem, so they can rapidly understand new insights and integrate them directly into their operational applications.
​
Six Foundational Data Architecture Shifts
​
Antonio Castro, Jorge Machado, Matthias Roggendorf, and Henning Soller have identified six foundational shifts that organizations need to grasp in a modern data architecture to deliver new business capabilities more rapidly and simplify their current architectural model. “Even though organizations can implement some shifts while leaving their core technology stack intact, many require careful re-architecting of the existing data platform and infrastructure, including both legacy technologies and newer technologies previously bolted on.”[ix] These six shifts are as follows:
- From on-premises to cloud-based data platforms,
- From batch to real-time data processing,
- From pre-integrated commercial solutions to modular, best-of-breed platforms,
- From point-to-point to decoupled data access,
- From an enterprise warehouse to domain-based architecture, and
- From rigid data models to flexible, extensible data schemas.
Information Architecture - the Forgotten Part of Data Architecture
​
Gathering and storing data is easy. Making sense of data and extracting information from it is another story. Determining what portion of your data can actually support useful data analytics or train artificial intelligence in a way that provides value to users and customers is much more difficult. In brief, what valuable information can we extract from our data?
Most business and enterprise architects understand business capabilities and their supporting applications. This is not enough. To build a modern data architecture, business and enterprise architects should also examine information concepts (also called information types or business objects). They need to ask themselves what information is required to deliver a value proposition to a client or a user through a value stream, as shown in Figure 2 below[x]. They also need to identify what information is created, modified, and/or used by business capabilities and is stored in one or several databases.
Information architecture is “the structural design of shared information environments. (…) It is a subset of data architecture where usable data (or information concept) is constructed and designed or arranged in a fashion most useful or empirically holistic to the users of this data.[xi]” Information architecture is also about mapping a single source of truth of a domain of information used to plan software development, customize software applications, build websites, etc. In business architecture, information concepts are standard business terms and semantics. Information concepts are usable data created, modified, and used by business capabilities. In information technology, a database can support or store one or several information concepts.
​
Information mapping allows the creation of visual representations of what usable data is required to ensure that a business capability is performing well. As pointed out by Sam Forouzi, to succeed in examining your information concepts, you may need to dig deeper and ask the following questions: “What information … to have? … to capture? … to create? … to share? … may be public? … must be private? … must be logged or audited? … to see? … to sell? …. Is needed in the future? Who … needs it? … creates it? … enters it? (…) Where does the information need to be … captured? … created? … used? … shared? When does information need to be … captured? … created? … used? …shared? How does the information need to be … captured? … created? … used? … shared?”[xii] Mapping the information related to business capabilities also allows better, smoother, and quicker planning of business process modeling (BPM) and UX design.
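To make the idea concrete, the relationship between business capabilities, information concepts, and databases can be captured in a small data structure. The following is only an illustrative sketch, not a standard notation. The class names, the example capabilities ("Customer Onboarding", "Order Management"), and the example systems ("CRM", "ERP", "Data lake") are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum


class Usage(Enum):
    """How a business capability touches an information concept."""
    CREATES = "creates"
    MODIFIES = "modifies"
    USES = "uses"


@dataclass(frozen=True)
class InformationConcept:
    name: str                   # standard business term, e.g. "Customer"
    databases: tuple[str, ...]  # systems that store this concept


@dataclass
class BusinessCapability:
    name: str
    # Maps each information concept to the ways this capability touches it.
    concepts: dict[InformationConcept, set[Usage]] = field(default_factory=dict)

    def link(self, concept: InformationConcept, *usages: Usage) -> None:
        self.concepts.setdefault(concept, set()).update(usages)


def databases_needed(capability: BusinessCapability) -> list[str]:
    """Every database the capability depends on, via its information concepts."""
    return sorted({db for c in capability.concepts for db in c.databases})


def creators_of(concept: InformationConcept,
                capabilities: list[BusinessCapability]) -> list[str]:
    """Answers the question: who creates this information?"""
    return [cap.name for cap in capabilities
            if Usage.CREATES in cap.concepts.get(concept, set())]


# Hypothetical example data, for illustration only.
customer = InformationConcept("Customer", ("CRM",))
order = InformationConcept("Order", ("ERP", "Data lake"))

onboarding = BusinessCapability("Customer Onboarding")
onboarding.link(customer, Usage.CREATES)

order_mgmt = BusinessCapability("Order Management")
order_mgmt.link(order, Usage.CREATES, Usage.MODIFIES)
order_mgmt.link(customer, Usage.USES)

print(databases_needed(order_mgmt))
print(creators_of(customer, [onboarding, order_mgmt]))
```

Even a simple map like this lets an architect answer some of the who, where, and how questions for a given capability before any process modeling or UX design begins.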
​
Data Architecture and Artificial Intelligence
​
Data Architecture is not sufficient to build valuable artificial intelligence projects. You also need Information Architecture, as shown below in Figure 3.
Having reliable data sources is crucial for developing valuable artificial intelligence models. High-quality data ensures accuracy, reduces biases, and enhances the model's ability to make accurate predictions. Robust data allows for better training, testing, and validation of AI algorithms, ultimately leading to more effective and trustworthy AI solutions. Investing in good data sources lays the foundation for building models that can drive meaningful insights and innovations in various fields.
​
Identifying relevant and required information within your data sources is vital for building an effective artificial intelligence model. Pertinent data enables the model to learn and generalize better, improving its performance. Filtering out noise from your data sources and focusing on relevant and required information enhances the model's ability to assist management in making reliable decisions and generating valuable insights. This process is essential for developing robust AI solutions that can effectively address specific challenges and deliver impactful results.
Increasingly, digital transformation must include the modernization of an organization’s data architecture, which needs to be more agile, faster, more flexible, and easier to implement. A modern data architecture should also include information architecture to accelerate the planning of software development, integrate software applications, and build valuable and successful artificial intelligence projects. Without a proper, modern data and information architecture, traditional organizations will remain laggards and become increasingly insignificant in tomorrow’s world.
​
​
____________________________
[i] This number is extracted from a diagram entitled “Number of Internet of Things (IoT) connected devices worldwide from 2019 to 2023, with forecasts from 2022 to 2030” published by Statista.
[ii] This number is 15 billion devices divided by 8 billion people worldwide in 2023. This second number is extracted from this Worldometers webpage.
[iii] Data extracted from this article entitled “Breaking Down the Numbers: How Much Data Does The World Create Daily in 2024?” published by Edge Delta.
[iv] Data extracted from this article entitled “How the data universe could grow more than 10 times from 2020 to 2030” published by UBS.
[v] Data extracted from this article entitled “How Much Data Does Netflix Use? Tips to Stream Smart and Save Data” published by Airalo.
[vi] Data architecture definition according to TOGAF: https://pubs.opengroup.org/architecture/togaf9-doc/arch/chap02.html
[vii] This definition is from DMBOK v2 from the Global Data Management Community on this webpage: https://www.dama.org/cpages/body-of-knowledge
[viii] Quote from this article entitled “Ten Characteristics of a Modern Data Architecture” published in November 2018.
[ix] The six shifts to modernize data architecture are described in detail in this article entitled “How to Build a Data Architecture to Drive Innovation—Today and Tomorrow” written by Antonio Castro, Jorge Machado, Matthias Roggendorf, and Henning Soller and published by McKinsey & Co. in June 2020.
[x] To learn more about extracting value from a value stream read this article entitled “Using Business and Enterprise Architects to Increase the Success Rate of SAFe® Projects”.
[xi] Definition of Information Architecture according to Wikipedia: https://en.wikipedia.org/wiki/Information_architecture
[xii] Quote extracted from an article entitled “Information Architecture - The (Forgotten) Part of Architecture” written by Sam Forouzi in April 2021 on LinkedIn.