How Caching Works In Data Virtualization Environments
May 20, 2013
Managing Performance and SLAs
I am often asked how to manage query performance of frequently-accessed data sources, in order to minimize impact on operational systems or to support service level agreements.
While this can be a challenge in large-scale data virtualization environments, caching, often implemented as materialized views, provides an excellent performance adjunct to query optimization.
Caching Flexibly Persists Data to Meet Service Level Needs
Mature data virtualization platforms provide multiple caching options and techniques.
These let you flexibly persist queried data to meet data delivery service level agreements and protect source system performance.
- Any View, Any Service, Any Procedure – Any view, service or procedure may be cached for future use, and all caches may be periodically and automatically refreshed to stay synchronized with their systems of record. Queries are processed against caches just as if you were querying the original data source.
- Multiple Cache Repository Options – Caching data alongside other frequently accessed sources is often a good idea. Composite, for instance, can cache to DB2, Greenplum, Microsoft SQL Server, MySQL, Netezza, Oracle, Sybase, Teradata, and Vertica.
- Event-driven Refresh – Updating a cache based on defined business rules provides significant flexibility based on events and activities.
- Scheduled Refresh – Updating a cache based on set times is useful in more schedule-driven environments.
- Manual Refresh – Updating a cache on demand, for example when a report is run, provides an additional option.
- Incremental Refresh – Updating a partial cache based on triggered changes is useful for large data sets with frequent refreshes.
- Native Data Source Load – Using the target repository’s native load functions to load and refresh the cache accelerates loading times by 10x or more over a typical SQL insert.
- Parallel Load – Using multiple threads to load a cache in parallel also accelerates loads.
- Centralized Caching – In centralized mode, all cached data is stored in a single cache repository. Centralized cache refreshes are fully configurable, including timed refresh, event-based refresh (CJM or JMS message), incremental refresh, and forced refresh.
- Distributed Caching – In distributed mode, users dedicate one or more data virtualization servers as edge servers and configure edge cache policies. Edge cache policies let you control which cache data is replicated from the central cache to the edge location, as well as the refresh rules. Refresh can be time-based, event-based, or incremental.
- Clustered Deployment – For clustered deployments, a centralized cache reduces the need for each cluster node to re-fetch the data from the source, which significantly reduces the impact on production data sources.
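To make the refresh options above concrete, here is a minimal sketch in Python of a materialized-view-style cache that supports manual (forced) refresh and scheduled, time-based refresh via a TTL. This is purely illustrative; the names (`MaterializedViewCache`, `fetch`, `ttl_seconds`) are hypothetical and do not represent any data virtualization product's actual API.

```python
import time

class MaterializedViewCache:
    """Illustrative cache over a view, with manual and scheduled (TTL) refresh."""

    def __init__(self, fetch, ttl_seconds=None):
        self.fetch = fetch          # callable that queries the source system
        self.ttl = ttl_seconds      # None disables scheduled (time-based) refresh
        self.data = None
        self.loaded_at = None

    def _stale(self):
        if self.loaded_at is None:   # never loaded: must refresh
            return True
        if self.ttl is None:         # no schedule: cache never expires on its own
            return False
        return (time.monotonic() - self.loaded_at) > self.ttl

    def refresh(self):
        """Manual/forced refresh: reload the cache from the source system."""
        self.data = self.fetch()
        self.loaded_at = time.monotonic()

    def query(self):
        """Serve from the cache, refreshing first only if the cache is stale."""
        if self._stale():
            self.refresh()
        return self.data

# Usage: the source is queried once; later queries are served from the cache,
# which is the behavior that protects production systems in the post above.
calls = []
cache = MaterializedViewCache(fetch=lambda: calls.append(1) or [("row", 1)])
first = cache.query()    # populates the cache from the "source"
second = cache.query()   # served from the cache; source is not re-queried
```

Event-driven and incremental refresh would follow the same shape: an external trigger calls `refresh()`, or a delta-aware `fetch` merges only changed rows into `self.data`.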
Enjoy the Flexibility
As you can see, caching’s many options give architects and developers significant flexibility to address nearly any performance or SLA challenge.
Caching is easy to implement, and easy to change as conditions change. Take advantage.