In today’s fast-paced environment, consumer packaged goods (CPG) organizations need quality data to maintain a competitive edge. Yet ensuring high data quality and timely data integration in a big data environment can be a major challenge. With numerous disparate data sources, multiple data formats and increasingly high data velocity, many organizations struggle to process all of their data accurately and in near real time. Handling big data can be a daunting task, and poorly managed big data makes the information built on it unreliable. However, with good integration and the right set of tools, big data becomes entirely manageable.
On March 21, Robert Routzahn, marketing manager for the IBM InfoSphere Information Server team, and I presented an InformationWeek webcast titled “Integrating Data in a Big Data Environment.” We led off with a discussion of the complexity of big data and the challenges that arise as a result of the “Four Vs”:
- Volume – the sheer amount of data available
- Variety – the combination of structured and unstructured data from numerous disparate sources, such as manufacturers, retailers, distributors, websites and third-party sources
- Velocity – the frequent addition of new products and constantly changing product information
- Veracity – the integrity and trustworthiness of various product information
For example, SymphonyIRI works with more than 2 million active products in the United States alone (volume), with more than 3,000 new products added each week (velocity), and our data comes from a broad assortment of sources (variety), some of which are more reliable than others (veracity).
To put their big data to work, CPG organizations need a continuous, automated data solution that allows them to standardize, classify and integrate their data, essentially transforming fragmented product data into actionable product information. For accurate classification, a successful data solution must be able to standardize both whole phrases and the individual words within them, and must account for both syntax and semantics. For example, it should register “Red Door” as a brand name, but “Red” on its own as a color.
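To make the “Red Door” example concrete, here is a minimal sketch in Python of one common way to handle it: check for the longest known phrase first, and only fall back to single-word lookups when no phrase matches. This is an illustration only, not how InfoSphere QualityStage or our production rules work. The `BRANDS` and `COLORS` tables and the `classify` function are hypothetical names invented for this example.

```python
# Hypothetical reference tables (assumptions for this sketch only).
# Brand phrases are stored as tuples of lowercase words.
BRANDS = {("red", "door"): "Red Door"}
COLORS = {"red": "Red", "blue": "Blue"}

# Longest brand phrase we need to look for, in words.
MAX_PHRASE = max(len(phrase) for phrase in BRANDS)


def classify(description):
    """Return a dict of attributes found in a free-text product description."""
    tokens = description.lower().split()
    attributes = {}
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest phrase first so "red door" wins over "red".
        for length in range(MAX_PHRASE, 0, -1):
            phrase = tuple(tokens[i:i + length])
            if len(phrase) == length and phrase in BRANDS:
                attributes.setdefault("brand", BRANDS[phrase])
                i += length  # skip the words consumed by the phrase
                matched = True
                break
        if matched:
            continue
        # No phrase matched here, so fall back to a single-word lookup.
        if tokens[i] in COLORS:
            attributes.setdefault("color", COLORS[tokens[i]])
        i += 1
    return attributes


print(classify("Red Door perfume 50ml"))
print(classify("red lipstick"))
print(classify("Red Door red gift box"))
```

In the first description the word “red” is consumed as part of the brand phrase and never reaches the color lookup, while in the second it stands alone and is tagged as a color. A real solution would also need to handle punctuation, misspellings and abbreviations, which this sketch ignores.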
At SymphonyIRI, we’ve successfully tackled our data integration challenges with IBM’s InfoSphere DataStage and QualityStage, which provide a simple, reliable means of organizing and tracking complex product data. These data solutions categorize our data into more than 800 product groupings based on more than 600 unique product attributes and allow us to easily develop repeatable processes.
Big data can be a significant asset, but only if that data is manageable. For CPG organizations looking to maintain a competitive advantage, the right data management solution is an essential tool for putting big data to work. How are you managing big data, and how has your organization benefitted? Share your experiences in the comments section below!