Ten Mistakes to Avoid When Virtualizing Data – Part 1
September 2, 2011
In late 2008, I wrote the cover article for the November edition of Virtualization Journal. Ten Mistakes to Avoid When Virtualizing Data described the ten most common mistakes made by data virtualization’s early adopters.
In that article, my objective was to provide important ‘lessons learned’ guidance that would help new data virtualization users accelerate their success and benefits realization.
Fast Forward to 2011
In the nearly three years since, both data virtualization technology and its adoption have advanced significantly. Early adopters have expanded their data virtualization deployments to a far wider set of use cases. Hundreds of enterprises across multiple industry segments, as well as dozens of federal government agencies, have started similar data virtualization journeys.
Industry Analysts Report Data Virtualization Acceleration
Industry analysts also recognize this acceleration. According to a June 2011 Forrester Research report, entitled Data Virtualization Reaches Critical Mass: Technology Advancements, New Patterns, And Customer Successes Make This Enterprise Technology Both A Short- And Long-Term Solution, data virtualization has reached critical mass, with adoption expected to accelerate over the coming 18-30 months as new usage patterns and successes increase awareness and interest.
The July 2011 Gartner Hype Cycle for Data Management 2011 reports that data virtualization has moved into the slope of enlightenment, with mainstream adoption expected within the next two to five years.
Looking Back – The Ten Mistakes from 2008
Let’s consider the ten mistakes identified in the 2008 article. Determining where and when to use data virtualization was the source of five common mistakes. Implementing data virtualization, from the design and enabling technology points of view, was the source of three potential mistakes. Failing to determine who implements it and failing to correctly estimate how much value may result were also common mistakes.
- Are these the same mistakes data virtualization adopters are making today?
- If so, what additional advice and insight is available today to complement this earlier counsel and mitigate negative impacts?
- If not, are there other mistakes that are more relevant today?
Mistake #1 – Trying to Virtualize too Much
Data virtualization, similar to storage, server and application virtualization, delivers significant top- and bottom-line benefits. However, data virtualization is not the right solution for every data integration problem. For instance, when the use case requires multidimensional analysis, pre-aggregating the data using physical data consolidation is a more effective, albeit more expensive, approach, as the sketch below illustrates.
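To make the tradeoff concrete, here is a minimal Python sketch, with made-up data and invented function names (this is not any vendor's API), contrasting a virtual view that federates sources at query time with a physically consolidated, pre-aggregated table built ahead of time for analysis:

```python
# Illustrative sketch only: two "source systems" represented as in-memory lists.
orders_us = [{"region": "US", "amount": 120}, {"region": "US", "amount": 80}]
orders_eu = [{"region": "EU", "amount": 200}]

def virtual_sales_view():
    """Virtual approach: federate both sources on demand; no copy is stored."""
    return orders_us + orders_eu

def build_sales_cube():
    """Physical approach: pre-aggregate once, then serve analysis from the stored copy."""
    cube = {}
    for row in virtual_sales_view():
        cube[row["region"]] = cube.get(row["region"], 0) + row["amount"]
    return cube

print(virtual_sales_view())   # always current, but every query touches the sources
print(build_sales_cube())     # fast repeated analysis, but a copy that must be refreshed
```

The virtual view always returns current data but touches the sources on every query; the pre-aggregated cube answers repeated multidimensional queries quickly but requires building, storing and refreshing a copy.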
Trying to use too much data virtualization has only recently become a common mistake amongst the most successful data virtualization adopters. For an updated look at this topic, check out When Should We Use Data Virtualization? and Successful Data Integration Projects Require a Diverse Approach. These articles provide updated counsel and tools for making data virtualization versus data consolidation decisions.
Mistake #2 – Failing to Virtualize Enough
Failing to virtualize enough carries a large opportunity cost because physical data consolidation necessitates longer time-to-solution, more costly development and operations, and lower business and IT agility.
This continues to be perhaps the biggest mistake today. The main issue is that familiarity with other data integration approaches closes one’s mind to better options. To counteract this tendency, become more adept at evaluating data virtualization’s measurable impacts, especially in contrast to other integration approaches. To better understand data virtualization’s business and IT value propositions, take a look at How to Justify Data Virtualization Investments.
Mistake #3 – Missing the Hybrid Opportunity
In many cases, the best data integration solution is a combination of virtual and physical approaches. There is no reason to be locked into one way or the other. This remains true today. For more insights into hybrid combinations of data virtualization and data warehousing, check out:
- Five Ways Data Virtualization Improves Data Warehousing
- Practical Ways to Use Data Virtualization with Data Warehouse Appliances
- Extend MDM with Data Virtualization
Mistake #4 – Assuming Perfect Data is Prerequisite
Poor data quality was a pervasive problem in enterprises three years ago and remains so today. While correcting all your data is the ultimate goal, most of the time enterprises settle for a clean data warehouse. With source data left as is, they assume that the quality of virtualized data can never match the quality of warehouse data.
Nothing could be further from the truth. And this myth has dissipated rapidly of late. How Data Virtualization Improves Data Quality is an article that addresses the many ways enterprises are applying data virtualization as a solution to the data quality problem rather than a reason to not do data virtualization. New capabilities developed over the past two years provide data virtualization platforms with a number of important data quality improvement mechanisms and techniques that complement and extend data quality tools.
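As a rough illustration of the idea (the field names and rules below are hypothetical, not the API of any data virtualization or data quality product), a virtual view can apply standardization and validation rules on the fly, so consumers see cleansed results without a physically cleansed copy:

```python
# Hypothetical source data with inconsistent formatting.
raw_customers = [
    {"name": " acme corp ", "country": "usa"},
    {"name": "Beta Ltd",    "country": "U.K."},
]

# Illustrative standardization rule: map free-form country values to codes.
COUNTRY_CODES = {"usa": "US", "us": "US", "u.k.": "UK", "uk": "UK"}

def cleansed_customer_view(rows):
    """Virtual view that standardizes names and country codes at query time."""
    for row in rows:
        yield {
            "name": row["name"].strip().title(),
            "country": COUNTRY_CODES.get(row["country"].strip().lower(), "UNKNOWN"),
        }

print(list(cleansed_customer_view(raw_customers)))
```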
Mistake #5 – Anticipating Negative Impact on Operational Systems
Although operational systems are often a data virtualization source, the run-time performance of these systems is not typically impacted as a result. Yet, designers have been schooled to think about data volumes in terms of the size of the data warehouse or the throughput of the nightly ETLs.
When using a virtual approach, designers should instead consider the size of any individual query and how often these queries will run. If the queries are relatively small (for example, 100,000 rows) and broad (spread across multiple systems and/or tables), or run relatively infrequently (several hundred times per day), then the impact on operational systems will be light.
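A quick back-of-envelope calculation makes the point, using the figures above; the nightly ETL volume used for comparison is an assumption chosen purely for illustration:

```python
# Figures from the paragraph above.
rows_per_query = 100_000
queries_per_day = 300           # "several hundred times per day"

virtual_rows_per_day = rows_per_query * queries_per_day

# Hypothetical bulk warehouse load, for scale comparison only.
nightly_etl_rows = 500_000_000

print(f"Virtualized reads per day: {virtual_rows_per_day:,} rows")
print(f"Share of a {nightly_etl_rows:,}-row nightly ETL: "
      f"{virtual_rows_per_day / nightly_etl_rows:.1%}")
```

Under these assumptions, the daily virtualized read volume is a small fraction of a single bulk extract, and it is spread across the day and across the source systems rather than concentrated in one batch window.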
Even these and larger queries are less of an issue today. Data virtualization query performance is faster than ever due to many internal engine advancements and underlying technology improvements (Moore’s Law). These improvements are starting to eliminate this mistake from the top ten. As an example of these advancements, Composite Software’s recent Composite 6 release included a number of innovative optimization and caching techniques. Why Query Optimization Matters provides a good summary of the state of the art in query optimization.