
Will AI resolve the never-ending data problem in IT?

The emergence of new data integration and management solutions that incorporate AI and machine learning is an indication that assistance is on the way to address the growing organizational data dilemma.

Businesses already derive plenty of useful benefits from artificial intelligence and machine learning, from fraud detection to chatbots to predictive analytics. But ChatGPT has raised the bar for AI/ML with its audacious creative abilities, and IT executives can't help but wonder whether AI and machine learning are finally ready to move beyond point solutions and tackle core business problems.

Consider the most significant, long-running, and perplexing IT issue of them all: managing and integrating data across the enterprise. As the volume, variety, variability, and spread of data across on-prem and cloud platforms climb an ever-steeper curve, that endeavour cries out for help from AI/ML technology today, according to Stewart Bond, vice president of data integration and intelligence software at IDC.

Can AI/ML actually bring order to the chaos of data? The answer is a qualified yes, though experts agree we're only beginning to scratch the surface of what may eventually be possible. Many established providers of integration software, including Informatica, IBM, and SnapLogic, have introduced AI/ML capabilities to automate various processes, while a slew of newer startups, including Tamr, Cinchy, and Monte Carlo, have made AI/ML the centrepiece of their offerings. None, however, comes close to delivering end-to-end, AI/ML-driven automation of data management and integration.

That's just not feasible. Without human participation, no product or service can resolve every data anomaly, much less overhaul a disorganised enterprise data architecture. What today's AI/ML-driven solutions can do is significantly reduce manual labour across a range of data wrangling and integration tasks, from data categorisation to building data pipelines to improving data quality.

Such victories can be notable. But to make a deep, lasting impact, enterprises need a CDO (chief data officer) style strategy rather than the usual inclination to grab integration tools for ad hoc tasks. Enterprises require a comprehensive grasp of the metadata describing their entire data estate (customer data, product data, transaction data, event data, and so on) before they can prioritise which AI/ML solutions to apply where.

The scale of the enterprise data problem

Most companies today maintain a broad array of data stores, each one linked to its own applications and use cases. Some of those data stores (mostly data warehouses) serve people working in analytics or business intelligence, while others (transactional data stores) support transactions and other operational tasks. Cloud computing has made this proliferation worse, as business units swiftly launch cloud applications with their own data silos.

Muddling matters further, according to Noel Yuhanna, vice president and principal analyst at Forrester Research, is the fact that "any organisation on the planet has more than two dozen data management technologies. None of those tools communicate with one another." These tools handle data governance, data observability, master data management, and other tasks. Some vendors have already added AI/ML capabilities to their products; others have not yet done so.

Fundamentally, the main goal of data integration is to map the schemas of diverse data sources so that data can be shared, synced, and/or enriched across systems. The latter is essential for, say, building a 360-degree view of customers. But seemingly straightforward activities, such as determining whether two customers or businesses with the same name are in fact the same entity, and which information in which database is accurate, require human intervention. Frequently, rules to handle various exceptions must be established with the help of domain experts.
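To make that exception handling concrete, here is a minimal sketch of the kind of rule-based matching logic integration tools embed. The records, the normalisation steps, and the country exception are all illustrative, not drawn from any particular product.

```python
# A toy rule-based entity matcher: normalise names, then apply an
# expert-supplied exception rule before declaring two records the same.

def normalize(name: str) -> str:
    """Lowercase and strip common corporate suffixes so variants compare equal."""
    for suffix in (" inc.", " inc", " ltd.", " ltd", " llc"):
        if name.lower().endswith(suffix):
            name = name[: -len(suffix)]
    return name.lower().strip().rstrip(".,")

def same_entity(a: dict, b: dict) -> bool:
    """Decide whether two customer records refer to the same company."""
    if normalize(a["name"]) != normalize(b["name"]):
        return False
    # Exception rule from a domain expert: identical names registered in
    # different countries are treated as distinct entities.
    if a.get("country") and b.get("country") and a["country"] != b["country"]:
        return False
    return True

crm_rec  = {"name": "Acme Inc.", "country": "US"}
erp_rec  = {"name": "ACME", "country": "US"}
intl_rec = {"name": "Acme Inc.", "country": "DE"}

print(same_entity(crm_rec, erp_rec))   # True
print(same_entity(crm_rec, intl_rec))  # False
```

Every edge case like this needs another hand-written rule, which is exactly how rule counts balloon in real systems.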

Usually, these rules live in a rules engine embedded in the integration software. Michael Stonebraker, one of the pioneers of the relational database, founded Tamr, which has built an ML-driven MDM (master data management) solution. To illustrate the drawbacks of rules-based systems, Stonebraker cites the real-world example of a major media corporation whose "homebrew" MDM system has been accumulating rules for 12 years.

"They have now written 300,000 rules," says Stonebraker. "If you ask someone how many rules they can understand, they usually say 500. If you really push me, I'll give you 1,000. I'll give you 2,000 if you twist my arm. But managing 50,000 or 100,000 rules is impossible. And because there are so many special cases, there are so many rules."

Anthony Deighton, chief product officer at Tamr, asserts that its MDM solution avoids the brittleness of rules-based systems. The beauty of the machine-learning-based approach, he explains, is that the system can smoothly adjust to changes when new sources are added or, more significantly, when the shape of the data itself changes. As with most ML systems, however, human judgement is still needed to resolve discrepancies, and ongoing training on large amounts of data is required.
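To illustrate why a learned matcher adapts where hand-written rules pile up, here is a toy sketch (not Tamr's actual method): a string-similarity feature plus a decision threshold fit from a few labelled pairs, instead of thousands of explicit rules. All names and training pairs are invented.

```python
# Learned record matching in miniature: score pairs by string similarity
# and fit a decision threshold from labelled examples. Adding a new data
# source means adding labelled pairs, not writing new rules.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Crude similarity feature in [0, 1] between two name strings."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def fit_threshold(labeled_pairs):
    """Pick the cutoff that best separates known matches from non-matches."""
    scores = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    best_t, best_acc = 0.5, 0.0
    for t in (s for s, _ in scores):
        acc = sum((s >= t) == m for s, m in scores) / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

training = [
    ("Acme Inc", "ACME Incorporated", True),
    ("Acme Inc", "Apex Industries", False),
    ("Globex Corp", "Globex Corporation", True),
    ("Globex Corp", "Initech", False),
]
t = fit_threshold(training)
print(similarity("Acme, Inc.", "ACME Inc") >= t)  # True
```

A production system would use many features and a real classifier, but the adaptation story is the same: retrain on new labelled data rather than rewrite rule 47,usand.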

AI/ML is no panacea. Yet it can deliver extremely useful automation across many data integration domains, not just MDM. But businesses need to clean house to really benefit.

Improving data quality

Better data quality is where AI/ML is having the biggest impact, according to Bond. Forrester's Yuhanna concurs: "AI/ML is actually driving enhanced quality of data," he says. That's because ML can discover and learn from patterns in massive amounts of data, and then suggest new rules or adjustments that humans lack the time to make.

High-quality data is crucial for transactional systems and other operational systems that manage critical customer, employee, vendor, and product data. But it can also greatly simplify life for data scientists immersed in analytics.
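As a hedged illustration of ML-suggested quality rules, the sketch below infers a column's dominant value format and flags deviations, the sort of rule an AI-assisted tool might propose automatically after scanning the data. The column and format scheme are invented for the example.

```python
# Pattern-learned data quality: reduce each value to a character-class
# "shape", treat the most common shape as the column's expected format,
# and flag rows that deviate from it.
from collections import Counter

def shape(value: str) -> str:
    """Reduce a value to a character-class pattern, e.g. 'AB-1234' -> 'AA-9999'."""
    return "".join("9" if c.isdigit() else "A" if c.isalpha() else c for c in value)

def suggest_rule(column):
    """Propose the most common shape as the column's expected format."""
    return Counter(shape(v) for v in column).most_common(1)[0][0]

def violations(column, rule):
    """Return the values that do not conform to the proposed format."""
    return [v for v in column if shape(v) != rule]

skus = ["AB-1234", "CD-5678", "EF-9012", "1234-AB", "GH-3456"]
rule = suggest_rule(skus)          # 'AA-9999'
print(violations(skus, rule))      # ['1234-AB']
```

A human still decides whether the flagged row is a genuine error or a legitimate exception, which matches the article's point that ML reduces manual labour rather than eliminating it.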

Data quality is a continuous process that never ends. The constantly changing nature of data, and the many systems it traverses, have given rise to a new category of solutions: data observability software. "That categorisation is observing data as it moves through data pipelines. And it's finding data quality problems," says Bond. He singles out Anomalo and Monte Carlo as two players that claim to be "using AI/ML to monitor the six dimensions of data quality": accuracy, completeness, consistency, uniqueness, timeliness, and validity.
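Some of those six dimensions can be made concrete with simple metrics. The sketch below (illustrative only, not Anomalo's or Monte Carlo's API) computes completeness, uniqueness, and timeliness scores for a tiny table; real observability tools track such scores over time and alert on drift.

```python
# Toy data-quality metrics along three of the six dimensions the
# article lists: completeness, uniqueness, and timeliness.
from datetime import date

rows = [
    {"id": 1, "email": "a@example.com", "updated": date(2023, 3, 1)},
    {"id": 2, "email": None,            "updated": date(2023, 3, 1)},
    {"id": 2, "email": "c@example.com", "updated": date(2021, 1, 5)},
]

def completeness(rows, field):
    """Share of rows where the field is populated."""
    return sum(r[field] is not None for r in rows) / len(rows)

def uniqueness(rows, field):
    """Share of values in the field that are distinct (duplicates lower it)."""
    values = [r[field] for r in rows]
    return len(set(values)) / len(values)

def timeliness(rows, field, cutoff):
    """Share of rows updated on or after the cutoff date."""
    return sum(r[field] >= cutoff for r in rows) / len(rows)

print(completeness(rows, "email"))                    # 2 of 3 populated
print(uniqueness(rows, "id"))                         # duplicate id=2
print(timeliness(rows, "updated", date(2023, 1, 1)))  # one stale row
```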

If this reminds you a little of the continuous testing essential to devops, that's no coincidence. Companies are increasingly embracing dataops, where "you're doing continuous testing of the dashboards, the ETL processes, the things that make those pipelines run and analyse the data that's in those pipelines," says Bond. But statistical control is added to the mix as well.

The catch is that discovering a data issue is post hoc: short of shutting down pipelines, there is no way to prevent bad data from reaching consumers of that data. But as Bond notes, if a member of the dataops team makes a repair and records it, "the next time that exception occurs, a machine may make that correction."
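Bond's point about replaying recorded fixes can be sketched as a simple correction log: once a human maps a bad value to its repair, the same correction applies automatically the next time that exception occurs. The class and field names here are hypothetical.

```python
# A minimal correction log: human repairs are recorded once, then
# replayed automatically against future records with the same defect.

class CorrectionLog:
    def __init__(self):
        self._fixes = {}  # (field, bad_value) -> corrected value

    def record(self, field, bad_value, fixed_value):
        """A dataops engineer logs a manual repair."""
        self._fixes[(field, bad_value)] = fixed_value

    def apply(self, record):
        """Replay all known fixes against an incoming record."""
        return {
            field: self._fixes.get((field, value), value)
            for field, value in record.items()
        }

log = CorrectionLog()
log.record("country", "U.S.A", "US")   # the human fixes it once

incoming = {"name": "Acme", "country": "U.S.A"}
print(log.apply(incoming))             # {'name': 'Acme', 'country': 'US'}
```

An ML-backed tool would generalise beyond exact-value matches, but the workflow, capture the human's judgement and automate its repetition, is the same.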