Tame the Chaos
Companies that compete successfully in their industries use their data to drive business decisions. The challenge is that data is scattered across many areas of the company, it often has quality issues, and similar data from different departments can't easily be compared.
Data Discovery and Cleansing
Unless a system generates the data, humans almost always introduce problems into a data set. Most of the time this is unintentional: a simple human error (e.g., a transcription or transposition error) or someone entering something other than what is expected (e.g., entering “NA” for race when a person declines to provide it). From the beginning of every IT Transformers project, we review the data for quality issues (i.e., Data Discovery) and then identify validations or data transformations to handle cases of “bad data”. Understanding how clean (or not) your data is, and putting processes in place to handle it appropriately, ensures that your analytics and reports are meaningful.
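As an illustration of one such data transformation, here is a minimal Python sketch, assuming a free-text field where people type placeholders such as "NA" when they decline to answer. The placeholder list, the `DECLINED` code, and the `normalize_declined` function are hypothetical; in a real project they would come out of Data Discovery on your own data (for example, "NA" might be a legitimate value in some other field, which is why the review comes first).

```python
from typing import Optional

# Hypothetical placeholders people type when declining to answer (assumption for
# illustration; the real list should come out of Data Discovery on your own data).
DECLINED_PLACEHOLDERS = {"na", "n/a", "none", "declined", "prefer not to say"}

# Hypothetical standard code stored in place of those placeholders.
DECLINED_CODE = "DECLINED"


def normalize_declined(value: Optional[str]) -> Optional[str]:
    """Map free-text 'declined' placeholders to one standard code.

    Matching is case- and whitespace-insensitive. Blank strings become None
    (no answer recorded), and any other value is returned trimmed.
    """
    if value is None:
        return None
    cleaned = value.strip()
    if not cleaned:
        return None
    if cleaned.lower() in DECLINED_PLACEHOLDERS:
        return DECLINED_CODE
    return cleaned


rows = [{"race": "NA"}, {"race": " n/a "}, {"race": "Asian"}, {"race": "  "}]
cleaned = [normalize_declined(row["race"]) for row in rows]
print(cleaned)  # ['DECLINED', 'DECLINED', 'Asian', None]
```

Keeping "declined" and "blank" as distinct outcomes means reports can tell the difference between a person who chose not to answer and a record where nothing was captured.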
Closely related to data discovery and data cleansing, data validation is a key part of managing the overall data environment. IT Transformers regularly adds data validation processes to ETL jobs to ensure data integrity and to identify bad data. While IT professionals cannot second-guess user entries, a process can be added to flag suspect data for review. In the past, our staff have created processes that enable staff to review new data entries and decide whether they should be stored or allowed to affect "down-stream" processes. This is central to an organization's data management strategy, because data issues coming from other systems can create expensive problems when they have to be dealt with on the analytics side. The key is to identify suspect data as early as possible.
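A validation step of this kind can be sketched in Python as a function that splits incoming rows into clean rows (loaded downstream) and suspect rows held for human review along with the reasons. This is a minimal sketch, not an IT Transformers implementation: the field names (`birth_date`, `zip`), the rules, and the `check_row`/`validate` helpers are all hypothetical examples of the rules a business might agree on.

```python
import re
from dataclasses import dataclass, field
from datetime import date

# Hypothetical rule: US ZIP or ZIP+4 format (an assumption for illustration).
ZIP_PATTERN = re.compile(r"^\d{5}(-\d{4})?$")


@dataclass
class ValidationResult:
    clean: list = field(default_factory=list)    # rows safe to load downstream
    suspect: list = field(default_factory=list)  # (row, reasons) pairs held for review


def check_row(row: dict, today: date) -> list:
    """Return the reasons a row looks suspect; an empty list means it passed."""
    reasons = []
    birth = row.get("birth_date")
    if birth is None:
        reasons.append("missing birth_date")
    elif birth > today:
        reasons.append("birth_date is in the future")
    elif today.year - birth.year > 120:  # approximate age check
        reasons.append("birth_date implies age over 120")
    zip_code = row.get("zip") or ""
    if not ZIP_PATTERN.match(zip_code):
        reasons.append(f"zip {zip_code!r} is not a 5- or 9-digit US ZIP code")
    return reasons


def validate(rows: list, today: date) -> ValidationResult:
    """Route each row to clean or suspect; suspect rows keep their reasons."""
    result = ValidationResult()
    for row in rows:
        reasons = check_row(row, today)
        if reasons:
            result.suspect.append((row, reasons))
        else:
            result.clean.append(row)
    return result


incoming = [
    {"id": 1, "birth_date": date(1985, 3, 14), "zip": "30301"},
    {"id": 2, "birth_date": date(2091, 1, 1), "zip": "30301"},
    {"id": 3, "birth_date": date(1990, 7, 2), "zip": "3O301"},  # letter O typed for zero
]
result = validate(incoming, today=date(2024, 1, 1))
for row, reasons in result.suspect:
    print(row["id"], reasons)
```

Passing `today` in explicitly keeps the check repeatable when an ETL job is re-run, and recording the reasons gives the reviewer something concrete to act on instead of a bare rejection.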
Probabilistic matching sounds complicated, but it simply means pulling together every piece of data that means the same thing, regardless of variations. Records that appear similar are assigned a probability of being a match, and a threshold can then be set to separate matches from non-matches. There is always some chance of a false positive or a false negative, so spending time understanding the nuances of your dataset is key to finding the right balance. IT Transformers staff have implemented probabilistic matching algorithms using IBM and Informatica tools, and have also worked with custom matching processes.
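To make the probability-and-threshold idea concrete, here is a minimal Python sketch of a simplified Fellegi-Sunter-style matcher. It is not the algorithm used by the IBM or Informatica tools. Each field contributes a likelihood ratio based on two probabilities: `m` (the field agrees when the records are the same entity) and `u` (the field agrees when they are different entities). The field names, `m`/`u` values, prior, and thresholds are hypothetical; in practice they are estimated from your own data.

```python
import math
from difflib import SequenceMatcher

# Hypothetical per-field (m, u) probabilities (assumptions for illustration):
#   m = P(field agrees | same entity), u = P(field agrees | different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "birth_date": (0.98, 0.003),
    "zip": (0.90, 0.05),
}
PRIOR_MATCH = 0.001  # hypothetical share of candidate pairs that are true matches


def agrees(field: str, a: str, b: str) -> bool:
    """Field agreement test: fuzzy for names, exact (case-insensitive) otherwise."""
    a, b = a.strip().lower(), b.strip().lower()
    if field == "name":
        return SequenceMatcher(None, a, b).ratio() >= 0.85
    return a == b


def match_probability(rec_a: dict, rec_b: dict) -> float:
    """Posterior probability that two records refer to the same entity.

    Adds per-field log likelihood ratios (assuming fields are independent)
    to the prior log odds, then converts the log odds back to a probability.
    """
    log_odds = math.log(PRIOR_MATCH / (1 - PRIOR_MATCH))
    for field, (m, u) in FIELD_PROBS.items():
        if agrees(field, rec_a[field], rec_b[field]):
            log_odds += math.log(m / u)
        else:
            log_odds += math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))


def classify(p: float, upper: float = 0.9, lower: float = 0.1) -> str:
    """Above upper: match; below lower: non-match; in between: human review."""
    if p >= upper:
        return "match"
    if p <= lower:
        return "non-match"
    return "review"


a = {"name": "Jonathan Smith", "birth_date": "1985-03-14", "zip": "30301"}
b = {"name": "Jonathon Smith", "birth_date": "1985-03-14", "zip": "30301"}
c = {"name": "Maria Garcia", "birth_date": "1990-07-02", "zip": "60601"}
d = {"name": "Jonathan Smith", "birth_date": "1985-03-14", "zip": "30305"}
for other in (b, c, d):
    p = match_probability(a, other)
    print(other["name"], other["zip"], round(p, 3), classify(p))
```

Using two thresholds instead of one gives a "review" band: raising `upper` trades false positives for more manual review, and lowering `lower` trades false negatives the same way, which is the balance the paragraph above describes.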
Something as simple as "customer" or "product" may mean very different things in different departments. IT Transformers is adept at bringing key players to the table to create a shared version of the data (e.g., definitions or standards) that makes sense to every business unit and won't cause conflicts. IT Transformers can also help set up Data Stewards to take ownership of the data for the long term. This moves your organization toward being "Data First".
Choose any way to get in touch, and let's dive into how we can help!