IT Transformers founders and staff have worked in the data warehousing field for 20 years. We’ve heard endless debates on the “right way” to create a data warehouse and have seen technologies evolve over the years. Through it all, we keep one thing first and foremost in our minds: We must maintain the integrity of all this data to support the business.
Simply put, a data warehouse is a big store of disparate data organized in a way that it can more easily be analyzed and reported. While we can explain the differences been the classic Bill Innmon or Ralph Kimball way of thinking of data modeling, we have found it more helpful to discuss each organizations’ data and the nuances in it.
In the past, a data warehouse meant millions of dollars and a very long timeline. Fast forward, technologies evolved and began to disrupt the “traditional data warehouse” market. Also the rise of API’s over the years have made data more easily accessible to software services rather than just ETL tools. This has led to an explosion of new opportunities for organizations to finally organize their data at reasonable cost. The new problem is sifting through all the hype and industry lingo to find the right set of technologies to unlock your data’s power. This is what’s meant by the new industry term: right fit engineering.
Traditionally, this is the “Extract-Transform-Load” process of data integration into a data warehouse. Given today’s hybrid environments involving cloud service sources and destination, we now call this Enterprise Data Integration. Make no mistake, wrangling all of your organizations’ data is still a challenge as it was in the past before API’s and the explosion of cloud services. This is due to “shadow IT” (a term many traditional IT staff used with a negative connotation) which is a business unit buying or creating a technology solution that isn’t known by the enterprise as a whole. Whether its a one-off cloud service, a local database or custom application coded by a “hobbyist” developer, IT Transformers can deal with these disparate sources and bring them into the fold of the enterprise.
The market leaders in this space are Informatica PowerCenter (Informatica), Microsoft’s SQL Server Integration Services (SSIS) and Talend’s Open Studio for Data Integration (Talend). Below we highlight a few points of each:
Informatica PowerCenter is a truly enterprise-level tool but has the agility to deploy in less time than some other solutions. Informatica brings its meta-data driven platform to accelerate your projects. This powerful platform lets developers focus on validation, analysis and prototype to show you value from your data earlier. It also connects to most any data source regardless of it being a legacy source or newer technology. Therefore, Informatica PowerCenter can be the foundation of all your data integration and movement needs.
SQL Server Integration Services or simply SSIS, is an industry mainstay. If your organization uses Microsoft SQL Server as a database platform, chances are that SSIS is already being used. That’s because SSIS is bundled with SQL Server. This makes it handy from a licensing perspective and the tight integration with SQL Server and the Microsoft ecosystem makes it an attractive tool for Microsoft-oriented organizations. Further, there’s a ready pool of talent available in most any market when the time comes for your organization to take on maintenance, support or even development.
Talend’s Open Studio for Data Integration is a widely respected – and used – open source tool. That means that there are no licensing costs involved for implementing it. This may be a good option for those organization not able to invest in Informatica and who don’t have SQL Server implemented. This is a full-featured tool that is suitable for most scenarios and there is a paid option that provides even more features.
Data Warehouse, Data Mart or Data Lake?
In the past, the idea was a Data Warehouse would hold a wider scope of data and Data Marts would be created from the data warehouse based on a dimensional model of a certain area of your business (like Marketing or Supply Chain). What many business found is that this took alot of time and effort. However, these can still make alot of sense if your organization has many standard reports that should stay fairly stable over time. See this diagram for the concepts:
If you’re in a business that changes rapidly or is in the process of being disrupted by new entrants to the market, you may want to quickly create a Data Lake to have a central place to explore all your data. Simply put, a Data Lake is a central area that is largely unstructured but where many different data sets can be integrated. The advantage is that it is quick and easy to identify relationships that may take many months to uncover with a Data Warehouse.
While many have been trained to think that all of this data lives in a single database, new technologies have even changed that basic paradigm. Indeed, the concept of a data lake is that data isn’t loaded into a single database at this stage. This uses a large storage space and raw data is loaded typically in files as most analysis can be done. In fact, there may be a strong use-case for implementing a Hadoop ecosystem, Apache Spark, Google Big Query, or other file-based analysis tools. See this diagram:
When it comes to a true data warehouse, relational databases are still the go-to technology. However, the marketplace has also become complicated in recent years. Cloud-native databases like Amazon Redshift or Snowflake have become popular as cloud migrations have made it easier and cheaper to maintain a data warehouse. However, the stalwarts are still popular including Teradata, Oracle, SQL Server or even PostGreSQL. Which is right for your organization depends on a lot of factors.
Again, you shouldn’t have to try to figure out what is right for your organization. IT Transformers will assess your organization, market forces and technology posture to create solid recommendations that are aimed at solving your business problems -- NOT selling a particular tool as a “silver bullet”.
Choose any way to get in touch and lets dive into how we can help!