GDPR: Where is your data and how did it get there?

This blog was originally published in the IBM Big Data & Analytics Hub on May 10, 2018.

To meet the requirements of the European Union’s General Data Protection Regulation (GDPR), organizations need visibility of all the personal data they have for EU customers and employees.

Organizations must know what personal data they hold, where it’s stored, who has access to it and how it’s processed. They need to know its lineage, too: where it comes from, which processing activities it has been through, and where it ends up.

The coach’s take: “With the GDPR, the time has come for businesses everywhere to put processes in place to define, discover and catalogue customers’ personal data.”

This is a formidable governance challenge, especially for larger businesses that hold vast quantities of customer data. That data is often in unstructured forms, such as emails, social media, recorded calls, and spreadsheets.

Compounding the challenge further is the obligation under the GDPR to quickly respond to customers when they request access to their data. Without technology to help, an organization might find it’s almost impossible to avoid breaching the regulations.

How artificial intelligence can help with the GDPR

Artificial intelligence (AI) solutions, which are based on machine learning, have a pivotal role to play in this process. They can help organizations locate customer data, establish where it’s from, remember the reason it’s there and find out in what volumes it’s being stored.

AI solutions like IBM Watson are uniquely suited to this challenge because they can process vast amounts of complex, ambiguous data in seconds and turn it into usable insights.

For example, machine learning solutions can discover how many phone numbers or email addresses an organization has in any one repository, allowing for the prioritization of next steps. Techniques such as natural language understanding can help identify personal data from unstructured documents by learning how to understand industry terms and data, and recognize people’s names. This is the technology underpinning IBM Watson solutions.

Machine learning is a key accelerator, because pattern or regular-expression searches alone are unlikely to find most of an organization’s personal data. A machine learning model can be trained to find far more of it.
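To make the limits of pattern matching concrete, here is a minimal Python sketch of the kind of regular-expression count described above. The patterns, the function name `count_personal_data` and the sample documents are all assumptions made for illustration; they are not taken from any IBM product, and real-world patterns would need to be far more thorough.

```python
import re
from collections import Counter

# Deliberately simple patterns, chosen for illustration only.
# Real email and phone formats vary far more than these cover.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def count_personal_data(documents):
    """Count pattern matches per category across an iterable of text documents."""
    counts = Counter()
    for text in documents:
        counts["email"] += len(EMAIL_RE.findall(text))
        counts["phone"] += len(PHONE_RE.findall(text))
    return counts


# Hypothetical repository contents.
sample_repository = [
    "Contact Maria at maria.rossi@example.com or +39 06 1234 5678.",
    "Please call Mr. Jansen back about his claim.",
]

summary = count_personal_data(sample_repository)
print(dict(summary))
```

The second document names a person but contains nothing the patterns can match, so it goes uncounted. That gap is what a trained model, for example one that recognizes people’s names, is meant to close.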

AI can also help organizations ensure certain datasets are available only to authorized internal stakeholders. For example, while a manager or an HR department would require access to employees’ personal data such as home address and medical information, the marketing department wouldn’t.
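Once personal data has been labeled by category, the access rule in that example can be enforced with a simple lookup. The sketch below is an assumption-laden illustration, not an AI technique and not a description of any product: the `ACCESS_POLICY` table, the category names and the `visible_fields` function are hypothetical, and in practice the category labels might come from a classification step.

```python
# Hypothetical mapping from department to the data categories it may read.
ACCESS_POLICY = {
    "hr": {"home_address", "medical_information", "work_email"},
    "marketing": {"work_email"},
}


def visible_fields(record, department):
    """Return only the fields of a record that the department may see."""
    allowed = ACCESS_POLICY.get(department, set())  # unknown departments see nothing
    return {key: value for key, value in record.items() if key in allowed}


# Hypothetical employee record.
employee = {
    "work_email": "a.jansen@example.com",
    "home_address": "1 Example Street",
    "medical_information": "on file",
}

print(visible_fields(employee, "hr"))
print(visible_fields(employee, "marketing"))
```

Defaulting to an empty set means a department missing from the policy gets no access, which is the safer failure mode.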

The coach’s take: “With the right technologies in place, you might find looking after your customers’ personal data and building closer, trust-based relationships is easier than you thought.”  

Mapping the future

Knowing where your data is and how it got there gives your business the opportunity to remove inaccurate, incomplete, erroneous or out-of-date data. This can free up storage and relieve IT teams of unnecessary responsibilities. Cutting the amount of customer data you hold also makes the data that remains easier to locate, helping to reduce the “burden” of GDPR compliance.

Of course, the benefits of building a complete, accurate, available and practical data catalog or map aren’t limited to GDPR compliance. It will also help:

  • Provide a better view of the stewardship of data: where it originated, what it comprises, who in the organization uses it and what they are using it for
  • Respond better to data breaches or emergencies
  • Respond to data access requests
  • Offer flexible and simple ways for customers and employees to take data with them
  • Build trusting and transparent relationships with customers and employees
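As a rough idea of what one entry in such a catalog might record, the following Python sketch models origin, contents, users and purpose, and shows how a catalog could be queried when a data access request arrives. The `CatalogEntry` fields, the `datasets_holding` helper and the sample datasets are hypothetical; a real catalog would track considerably more.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass
class CatalogEntry:
    """One dataset in a hypothetical data catalog."""
    dataset: str          # name of the dataset or repository
    origin: str           # where the data came from (its lineage)
    categories: Set[str]  # kinds of personal data it contains
    used_by: Set[str]     # departments that use it
    purpose: str          # why the data is held


def datasets_holding(catalog: List[CatalogEntry], category: str) -> List[str]:
    """List the datasets that contain a given category of personal data."""
    return [entry.dataset for entry in catalog if category in entry.categories]


# Hypothetical catalog contents.
catalog = [
    CatalogEntry("crm_contacts", "web sign-up form", {"email", "phone"},
                 {"sales"}, "customer relationship management"),
    CatalogEntry("payroll", "HR onboarding", {"home_address", "bank_account"},
                 {"hr", "finance"}, "salary payments"),
    CatalogEntry("newsletter_list", "crm_contacts export", {"email"},
                 {"marketing"}, "email campaigns"),
]

# Where would an access request about email addresses need to look?
print(datasets_holding(catalog, "email"))
```

Because each entry names its origin, the same structure also shows that the newsletter list was derived from the CRM contacts, which is the kind of lineage question the catalog is meant to answer.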

What’s more, ensuring the right data is available to business users throughout the organization is an essential step towards building a more intelligent, data-driven business.

A strategic advantage

AI solutions powered by data models for specific industries are already showing exciting promise. These algorithms have been trained to understand specific subject areas such as medicine, underwriting, weather, banking transactions or, indeed, the GDPR. Such advances are leading to even more organized datasets, greater analytic insights, and more informed and strategic decision making for businesses.

As the GDPR focuses the minds of organizational leaders on greater data governance and seamless compliance, this is another example of how it’s also creating new and significant business advantages.

For more from the ‘Coach’, take a look at the rest of the GDPR series.

Notice: Clients are responsible for ensuring their own compliance with various laws and regulations, including the European Union General Data Protection Regulation. Clients are solely responsible for obtaining advice of competent legal counsel as to the identification and interpretation of any relevant laws and regulations that may affect the clients’ business and any actions the clients may need to take to comply with such laws and regulations. The products, services, and other capabilities described herein are not suitable for all client situations and may have restricted availability. IBM and CGOC do not provide legal, accounting or auditing advice or represent or warrant that their services or products will ensure that clients are in compliance with any law or regulation.