Powering trustworthy customer analytics with a Customer Data Platform

18 minute read

Team BlueVenn

Customer analytics has the potential to improve business profitability by surfacing insights that enable sound business decisions, increasing customer revenue or reducing inefficiencies in the marketing, sales and customer service teams. Meaningful and accurate customer insights must be based on trustworthy, reliable customer data, yet most companies today are dealing with data that is incomplete, inconsistent, inaccurate, inaccessible or fragmented, resulting in business and marketing decisions that fail to yield the desired results.

In the best case scenario, a data science team or specific business users will spend precious time cleaning up the mess. In the worst case scenario, time is not spent cleaning the data and faulty data becomes the foundation for poor decisions. It’s not a pleasant choice!

A Customer Data Platform (CDP) is therefore an attractive option for any person or team tasked with generating useful customer insights that the business can base decisions upon. A CDP not only removes the need for dedicated resources to spend their time accessing and manipulating data from many systems, but can also handle the automated cleansing, enhancement and standardization of the data, ensuring that the customer analytics process yields trustworthy results.

A Customer Data Platform can create a continuously updated pool of clean, standardized and enhanced customer data that is readily available to data scientists, business analysts, and any other end users that need it for analysis or marketing purposes.

In this extended customer analytics blog, we will outline why establishing a trustworthy data repository is so important, which processing steps help to create one, what system requirements you should consider when settling upon the perfect CDP to match your business needs, and what role a CDP plays in the customer analytics function.

Remembering the competitive advantage of customer data

Although traditionally a marketing purchase, the CDP becomes a single source of truth for the business that reduces the time needed to access data and ensures consistency in the reports and insight generated by different departments. Across the entire business, customer data is transformed from a precious but inaccessible buried treasure into ready cash, with the critical difference being that cash can only be spent once, while customer data can be constantly reused to create new value.

When data related to customer transactions, behaviors and interactions is scattered across the business, and business users are unable to process it effectively, a culture of inconsistent decision making and opinions develops. If one person analyzes one set of data, whilst another uses a different view, then their two conclusions could be wildly different.

In a world where evolving privacy regulations and growing consumer mistrust make external data harder to obtain every day, a company’s own customer data store is an increasingly important source of competitive advantage. Making that data as accurate and accessible as possible, therefore, is an important function of a Customer Data Platform.

  • 69% of managers in a global survey reported low or limited trust in their business metrics (Econsultancy, Reinventing Commerce, 2019).
  • 92% of executives in a global survey agreed that data is “critical to improving customer experience” (Oxford Economics, The Future of Data: Adjusting to an Opt-In Economy, 2018).
  • 79% of data scientists’ time is spent gathering and preparing data (CrowdFlower, 2016 Data Science Report).

Customer analytics criteria

Delivering a customer database that meets the criteria for trustworthy data analysis demands substantial technology running a number of processes. Specific requirements include:

Adequate source data

Most customer information comes from systems that gathered it for their own purpose: websites, email marketing campaigns, call centers and CRM systems, mobile apps, order processing systems, customer service tools, individual spreadsheets, and more. Unfortunately, the data needed for the original purpose isn’t always adequate for analysis.

For example, an order processing system may need a physical address for package delivery, but not a phone number, email address, full name or customer ID. Yet, if the analysis system is to link those orders with other data about that customer, all of this information may be required.

Even if the original system asks for the additional data, users will often leave fields blank or provide false values if they recognize that certain details are not really needed. Improving the quality of source data requires a clear definition of quality standards, ongoing monitoring of inputs against those standards, and programs to identify and remove the causes of errors. The primary responsibility falls on data stewards for the source systems, so analysts need to work with those stewards to explain what standards the data needs to meet for analytical purposes and show that meeting those standards is worth the trouble.

Appropriate detail of customer data

The level of detail needed will depend on the particular project. Analytical requirements are especially variable, since project purposes vary so widely. A master analytical data set should retain as much of the original detail as possible, so the appropriate details can be extracted in each situation. Therefore, as data changes over time, you need the ability to understand those changes, and to ensure that every historical data point can be accessed to meet the specific analytical needs of any project.

For example, you may want to analyze website and eCommerce trends of active customers from 2010, compared to 2020. To ensure that you have the depth of data needed to generate meaningful insights, it is important that the 2010 data is not summarized or erased, so that the granular data can be analyzed.

Data standardization

Perfectly valid data may still be inconsistent across systems, or even within the same system. There often isn’t a problem in the original context, but discrepancies can confuse analytical and marketing systems that rely on consistency. Standardization removes these inconsistencies. This will include putting all data of the same type into the same format (e.g. all dates in form YYYYMMDD, all temperatures in Celsius, all monetary values into dollars), mapping all versions of an identifier to a single master (e.g. alternate names or SKUs for the same product), converting field names to a consistent standard, and applying complex standardization rules (e.g. postal address formats).

Standardized values should always be placed in newly created fields, so that the original data remains available for examination and possible reprocessing.

A common example arises in the multi-channel retail industry, where customers purchase via many methods (e.g. in-store, via the eCommerce website and through a mobile app) and each product being sold may have a different SKU or product identifier on each payment system. It is important to ensure that a pair of brown canvas shoes is not seen in the database as three unidentifiable entries (e.g. SKU12345, POS34567 and EC98765), but can be analyzed as one product named ‘brown canvas shoes’.
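As a minimal sketch of how this kind of mapping might be applied: the SKU values come from the example above, but the mapping table, field names and function are illustrative, not a specific CDP's implementation.

```python
# Hypothetical mapping of channel-specific identifiers to one master product.
# The SKUs below come from the brown canvas shoes example; the table is illustrative.
SKU_MASTER_MAP = {
    "SKU12345": "brown canvas shoes",   # eCommerce website
    "POS34567": "brown canvas shoes",   # in-store point of sale
    "EC98765": "brown canvas shoes",    # mobile app
}

def standardize_record(record: dict) -> dict:
    """Return a copy of the record with a new standardized field added.

    The original 'sku' field is left untouched, so the raw value remains
    available for examination and possible reprocessing.
    """
    standardized = dict(record)
    standardized["product_master"] = SKU_MASTER_MAP.get(
        record["sku"], record["sku"]  # fall back to the raw SKU if unmapped
    )
    return standardized

print(standardize_record({"sku": "POS34567", "qty": 1})["product_master"])
# → brown canvas shoes
```

Note that the standardized value lands in a new `product_master` field, in line with the principle above of never overwriting the original data.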

Watch on demand: Putting predictive analytics to use for delivering better customer experiences

Watch this on-demand webinar to understand how a Customer Data Platform provides a foundation of trustworthy data to help you build effective and accurate predictive models. We also take you through some examples of predictive analytics that help to reinforce customer loyalty and increase purchase probability.


Data enhancement

This process involves adding new data derived from the inputs. It uses external data sources, such as address tables (to get proper street names, postal codes, political jurisdictions, latitude/longitude), product catalogs (for product families and attributes), business directories (for industries, corporate parent/subsidiary relationships, corporate executives, product ownership), personal directories (to append contact information, demographics, interests, family relationships) and content taxonomies. Enhancements such as address correction will be done before identity matching, to improve match results; other enhancements will be done after matching, to take advantage of unified data and “Golden Records”.

For example, when you want to match several databases or data tables together and identify an individual customer captured across those data sets, the more potential identifiers you have in the data, the more likely you are to find a match. Appending postal addresses, telephone numbers, or any other unique or nearly-unique variables therefore improves the accuracy of matching disparate data sets, so this work should be undertaken before the matching process.
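A simplified sketch of enhancement ahead of matching: appending town and coordinates from a postal reference table gives the matcher extra near-unique fields to compare. The reference table, its entries and the field names here are all illustrative.

```python
# Hypothetical postal reference table: postcode -> (town, latitude, longitude).
# The entries are illustrative placeholders, not real reference data.
POSTAL_REFERENCE = {
    "BS1 4DJ": ("Bristol", 51.4495, -2.5810),
    "EC1A 1BB": ("London", 51.5201, -0.0977),
}

def enhance_address(record: dict) -> dict:
    """Return a copy of the record with reference fields appended.

    Enhancement runs before identity matching, so the matcher can compare
    town and coordinates as well as the raw postcode.
    """
    enriched = dict(record)
    ref = POSTAL_REFERENCE.get(record.get("postcode", "").upper())
    if ref:
        enriched["town"], enriched["latitude"], enriched["longitude"] = ref
    return enriched

print(enhance_address({"name": "Jo Bloggs", "postcode": "bs1 4dj"})["town"])
# → Bristol
```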

Data matching

Records relating to the same customer must be identified, so that they can be combined into a unified profile. This identity matching process (often referred to as identity resolution) can employ a variety of techniques, ranging from exact, deterministic matches on a customer ID or phone number, to similarity (or loose) matches on similar names and addresses, to probabilistic matches between devices that are often used at the same time, in the same place. Matches between pairs of identifiers can also be stitched into a master list, so a device-to-email match and email-to-shipping address match will attach all three elements to the same customer. Matching may also use external data to provide links which a company’s own data would not provide, such as the old and new postal address of people who have moved, or a pool of device and browser IDs to help make probabilistic matches a bit more accurate.
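The transitive stitching described above (a device-to-email match plus an email-to-address match attaching all three identifiers to one customer) can be sketched with a simple union-find structure. The identifiers are made up, and a production identity graph would also track match certainty and provenance.

```python
class IdentityGraph:
    """Minimal union-find over identifiers: any two identifiers that are
    ever matched end up in the same cluster (i.e. the same customer)."""

    def __init__(self):
        self.parent = {}

    def _find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def match(self, a, b):
        """Record that identifiers a and b belong to the same customer."""
        self.parent[self._find(a)] = self._find(b)

    def same_customer(self, a, b):
        return self._find(a) == self._find(b)

graph = IdentityGraph()
graph.match("device:abc123", "email:jo@example.com")      # device-to-email match
graph.match("email:jo@example.com", "address:1 High St")  # email-to-address match
print(graph.same_customer("device:abc123", "address:1 High St"))
# → True
```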

Customer data unification

Once all matches are found, the standardized data can be combined into a complete profile. This will unify data from different sources that belongs in the same category, such as putting all transactions into a common format.

Profiles will also include values derived* from the raw customer data, including calculations (lifetime purchases, predictive model score) and rule-based classifications (customer type, segment codes).

Profile creation may also involve creating a “Golden Record” with the best version of a particular value, such as the current address. The original variants (e.g. the customer’s old address) will still be needed for matching and other purposes, but will not necessarily be exposed in the profile.

* Derived values are incredibly useful for analytical purposes. A derived value is a new field calculated based on another field, such as calculating the age of a person based on a ‘date of birth’ field.
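The age calculation mentioned above, together with a lifetime-spend calculation like the one referenced in the profile description, might look like the sketch below. Field names and function names are illustrative.

```python
from datetime import date

def derive_age(date_of_birth: date, today: date) -> int:
    """Derived value: age in whole years, calculated from a 'date of birth' field."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)

def derive_lifetime_value(transactions: list) -> float:
    """Derived value: total purchase amount across a customer's transactions."""
    return sum(t["amount"] for t in transactions)

# The day before the 31st birthday, the derived age is still 30.
print(derive_age(date(1990, 6, 15), today=date(2021, 6, 14)))
# → 30
```

Because derived values are recalculated from underlying fields, they stay correct as the raw data changes, rather than being captured once and going stale.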

Proper data sourcing and compliance

As privacy rules become more stringent and customers become more sensitive to data sharing, it’s increasingly important for companies to ensure they are using data legitimately. This means capturing explicit consent for different uses of personal data and different channels, enforcing any restrictions, and keeping audit trails to prove data compliance. It may mean excluding some personal data from the unified profiles, obfuscating PII (Personally Identifiable Information) from view, or creating different elements within a profile that can be used for different purposes.

Analytical data sets, in particular, may take advantage of anonymized data (with IDs that cannot be traced back to individuals) and pseudonymized data (with IDs that are not themselves personal identifiers but can be linked to individuals using a reference table). Good privacy practice includes using anonymized data whenever possible, especially in analytical projects, where there’s no real need to identify the individuals whose data is being examined.

Also, under regulations such as the GDPR, a consumer may ask for their ‘personal data’ to be erased, and a real benefit here, with some Customer Data Platforms, is the ability to only erase the personal data but retain all other events, behaviors and transactions, to be stored anonymously thereafter but still retained for analytical purposes.
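A simplified sketch of how a pseudonymization reference table can support this kind of erasure: PII lives in the profile store, events are keyed only by a pseudonymous ID, and the reference table holds the link between the two. Erasing the profile and the reference entry leaves the events intact but anonymous. All structures and identifiers here are illustrative, not a specific CDP's implementation.

```python
import secrets

profiles = {"cust-42": {"name": "Jo Bloggs", "email": "jo@example.com"}}  # PII
reference = {}   # customer ID -> pseudonymous ID (kept under strict access control)
events = {}      # pseudonymous ID -> events; contains no PII

def pseudonymize(customer_id: str) -> str:
    """Return a stable opaque ID for a customer, creating one if needed."""
    if customer_id not in reference:
        reference[customer_id] = secrets.token_hex(8)
    return reference[customer_id]

pid = pseudonymize("cust-42")
events.setdefault(pid, []).append({"type": "purchase", "amount": 25.0})

def erase_personal_data(customer_id: str) -> None:
    """GDPR-style erasure: drop the PII and the reference-table link.

    The events survive under the pseudonymous ID, which can no longer be
    tied back to the individual, so they remain usable for analytics.
    """
    profiles.pop(customer_id, None)
    reference.pop(customer_id, None)

erase_personal_data("cust-42")
# The purchase event still exists in `events`, but is now anonymous.
```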

Accessibility of data

The assembled data must be accessible for analytical use. This typically means extracting selected data elements, for selected customers, from the full data set. The extracts may often contain aggregated values where the underlying detail is not needed. Individual identifiers are often removed or obscured. The data format will depend on the analytical system that will use the data; it is often a flat file, but might be a relational database file, SAS data set, or something else. Special functions might be employed, such as creating a time series.
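An illustrative sketch of such an extract: transaction detail is aggregated per customer and direct identifiers are replaced with opaque references before a flat file is written. All field names and values are made up.

```python
import csv
import io

# Hypothetical raw detail from the unified database.
transactions = [
    {"email": "jo@example.com", "segment": "A", "amount": 25.0},
    {"email": "jo@example.com", "segment": "A", "amount": 10.0},
    {"email": "sam@example.com", "segment": "B", "amount": 40.0},
]

# Aggregate: lifetime spend per (customer, segment); the line-level detail
# is not needed in this extract.
totals = {}
for t in transactions:
    key = (t["email"], t["segment"])
    totals[key] = totals.get(key, 0.0) + t["amount"]

# Write a flat file with opaque customer references instead of email addresses.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["customer_ref", "segment", "lifetime_spend"])
for i, ((email, segment), total) in enumerate(sorted(totals.items()), start=1):
    writer.writerow([f"C{i:04d}", segment, total])

print(out.getvalue())
```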

Customer Data Platform system requirements

Here are some system requirements to review with any CDP vendor, to ensure that the data model, capabilities and underlying database meet your customer analytics needs. Specific requirements include:

Adequate source data

  • An API and/or SDK (software development kit) to connect new data sources.
  • Prebuilt connectors for common data sources.
  • Ability to import flat files and make database queries.
  • Enables import process management (defines import processes; runs on schedule or on demand; reports on process completion and key statistics, such as the number of records, match rates, and blank fields; issues alerts when problems occur).
  • Data exploration tools to examine new sources and review ingested data.
  • Data quality tools to identify missing, unexpected or invalid values.

Appropriate detail of data

  • Supports all data types (structured, semi-structured and unstructured).
  • Retains full detail of all ingested data.
  • Accepts new or unexpected attributes (either ingests and stores these automatically or issues alerts, so that users can examine them and decide how to handle them).
  • Generates metadata, so available data is known.
  • Specifies retention periods for each data type and source.
  • Date/time-stamps new data and provides the ability to recreate data at a past point in time (something that is important for predictive modeling).

Data standardization (sometimes known as “normalization”)

  • Provides interfaces to build, test, and deploy data standardization rules.
  • Includes APIs and prebuilt connectors for reference data sets.
  • Monitors standardization processes and reports on the results.

Data Enhancement

  • Has in-built enhancement data sets that can run enhancements during the data-loading process, based on unique IDs or built-in data processing rules, or can import third-party enhancement data sets to provide the same function.
  • Can send customer ID files to an external enhancement vendor, which finds matches, appends enhancement data, and returns it to the system for ingestion.
  • Uses real-time APIs and prebuilt connectors to send single customer IDs to an external enhancement vendor, then receives enhancement data and adds it to the customer record.
  • Monitors enhancement processes and reports on the results, including match rates and data values.
  • Has a process to identify conflicting data from different enhancement sources and applies rules that determine which value to accept.
  • Automatically removes obsolete enhancement data, based on contractual obligations or timeliness (e.g. discards purchase intent after 90 days).

Powering trustworthy and reliable customer analytics with a Customer Data Platform eBook

Download this eBook, authored by CDP Institute Founder David Raab in conjunction with BlueVenn, to find out how to review the requirements and specifications of a Customer Data Platform for a data analytics use case.


Data matching and identity resolution

Applies multiple matching methods:

  • Explicit matching (links two identifiers provided by a customer, e.g. a phone number and email address).
  • Deterministic matching (links two identifiers used in the same interaction, e.g. the device ID and account ID when a device is used to access an account).
  • Similarity methods (links two identifiers based on a near match, e.g. different versions of the same postal address).
  • Probabilistic matching (links two identifiers based on behaviors, e.g. when two devices are consistently at the same locations over time).

In addition:

  • Maintains persistent IDs (links all identifiers for each customer to an unchanging master ID, so relationships are retained even if a particular identifier becomes obsolete).
  • Allows users to modify and test matching rules, including reports on matches created as a result of rule changes.
  • Monitors matching processes and reports on the results, including match rates, match certainty, reasons for matches, a list of questionable matches for manual review, etc.
  • Includes an interface that allows users to manually accept or reject matches; the system automatically rebuilds profiles as a result.
  • Applies different matching rules or certainty standards for different purposes.

Data unification

  • Maps data elements from source systems into standard profile attributes.
  • Updates profile attributes in real-time as underlying data changes.
  • Manages derived values included in the profile (e.g. lifetime value calculations, model scores).
  • Provides interfaces to define derived values; controls who is allowed to make changes; keeps an audit trail of changes; retains old versions of formulas; and uses derived values in segmentation and query selection rules.
  • Uses derived values as inputs to other derived values; reports where a value is used, to understand the impact of making a change.
  • Reports on profile data (number of customers, value distributions, values by segment, changes over time and so on).
  • Generates a “Golden Record” with the best information available for each customer.

Proper data sourcing and compliance

  • Provides a customer-facing interface to gather and manage consent (optional).
  • Stores customer consent with full detail (e.g. identifier, collection date, expiration date, data elements covered, permitted use).
  • Ensures proper authority for all data use (maintains rules regarding which elements require which authority for what purposes; compares each request against the rules; allows for different rules in different jurisdictions).
  • Keeps an audit trail of data usage, including date, data elements, customers and authorities using it.
  • Creates anonymized data sets (data cannot be tied to a personal identity).
  • Creates pseudonymized data sets (PII is obscured, but could be recovered by an authorized user).

Customer data accessibility

  • Exposes profile elements so that users and external systems will know just what is available.
  • Provides segmentation and extract tools, allowing users to send sets of customer profiles to external systems.
  • Places extracted data sets into the formats required by external systems, including flat files, database tables and analytical formats, such as SAS data sets.
  • Includes standard API and prebuilt connectors that will let external systems extract profiles.
  • Provides real-time access to individual customer profiles via API or SQL queries, where the system receives a customer identifier, finds the related profile, and then returns the entire profile (or specified elements).

Other CDP requirements

  • Has a user interface that lets non-technical users perform system functions, including: adding new data sources and attributes; managing standardization, enhancement, and profile creation; creating extracts; and connecting with external systems, so they can access the data.
  • Works at the scale required for a particular situation. Dimensions include number of customers, total volume and complexity of data; data volume during peak periods; and acceptable update, query, and extraction times for real-time and batch processes.
  • Automates processes to continuously update the data, with minimum manual intervention.
  • Implements adequate security and privacy practices, to ensure that the data remains safe, accurate, and available.

Role of the CDP for customer analytics

The CDP Institute defines Customer Data Platforms as “packaged software that creates a unified, persistent customer database accessible to other systems”. Many systems that call themselves CDPs do not, in fact, meet these criteria. This is especially relevant to analytics database buyers, because the RealCDP criteria (defined below) map closely to customer analytics software requirements and analytics database needs.


Buyers should approach non-certified systems carefully, to assess whether they have gaps that will make them unsuitable to use for building an analytical database.

Buyers should also recognize that CDPs do not exist in a vacuum. Your company’s exact requirements will depend on your existing systems and business needs. For example, many companies have invested heavily in data warehouses and data lakes that assemble much of the data needed for analytical projects. If a CDP can pull its data from those sources, there’s less need to connect with source systems directly. However, as seen in the above ‘Customer Data Platform system requirements’ list, a CDP can offer many benefits in terms of enhancement, matching and unification that will improve the quality of the data available to be used for customer analytics.

Similarly, some companies already have comprehensive identity matching capabilities in place. Those firms can buy a CDP with minimal identity management features of its own. Conversely, companies without those capabilities will want a CDP that has all the necessary features. Even CDPs that meet the RealCDP requirements vary greatly in these areas.

RealCDP definition

The CDP Institute has further refined its criteria with the RealCDP program, www.realcdp.com, which lists five requirements that a system must meet to be certified as a true CDP. It should be able to:

  • Ingest data from all sources.
  • Capture all detail of the ingested data.
  • Store the ingested data indefinitely (subject to regulatory constraints).
  • Create unified profiles.
  • Allow any system to access the data.


As you can see, preparing analytical data sets is a complex process with many requirements. Customer Data Platforms are designed specifically to create customer data sets in an automated and efficient fashion. They can be an important tool because they free up data scientists, marketers and business users to focus on more productive tasks, rather than spending excessive time preparing data for specific analytical tasks.

When you are looking to select a Customer Data Platform for analytical purposes there will be some CDPs that provide all the enhancements and cleansing you need, but others will not, especially if their primary use case is for personalization or real-time interaction management instead. Doing your due diligence on all the factors mentioned in this blog will help you to create the right vendor shortlists and invest in the most relevant Customer Data Platform to meet your customer analytics requirements.

Reliable products. Real results.

Every day, thousands of companies rely on Upland to get their jobs done simply and effectively. See how brands are putting Upland to work.

View Success Stories