Solving the Analyst’s Data Problem
February 6, 2013
Recently I have been talking to a number of data scientists and business analysts about what they actually do when they perform a new analysis. Their processes surprised me: they were far more data intensive and far less modeling and analysis intensive than I had thought.
The Analytic Development Process
Analysts start by thinking about the problem they are trying to analyze.
The next thing they do is go after the data they think they might need. This means determining what data is actually available. Then they work with IT to get access to that data. And finally they pull the data together into some form of a sandbox.
They do all of this data preparation work before they start building the analytic model, statistically analyzing the results, interpreting what the results mean for the business and communicating these insights.
More Than Half Their Time Spent On The Data!
The data scientists and business analysts will say they spend over half their time addressing these data related activities. This means they spend less than half their time actually doing analysis! Does that make any sense?
Steven Hillion, well-known data scientist and the head of products at Alpine Data Labs, doesn’t think so, as you can see in his video on this topic.
At Composite, our products and services are excellent at helping enterprises simplify and accelerate access to data. Out of the box today, we have products for automatically introspecting data sources, discovering relationships and then modeling them in friendly entity-relationship diagrams that are easy for the analysts to understand.
Once the data is identified, our development studio simplifies the building of easy-to-understand views of the data. Next, our powerful information server automatically optimizes queries for the required data sets. Then, depending on the sandbox strategy (physical, virtual, or hybrid), our server can also manage these data sets. All of this can be done in hours or days, rather than the weeks or months required by the “old way” of ETL, data replication tools and/or hand-coding.
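To make the idea of an easy-to-understand view concrete, here is a minimal sketch in Python. It is not Composite's product or API. It uses the standard-library sqlite3 module as a stand-in, with two tables in one in-memory database playing the role of two separate source systems. The table names, column names and sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3

# Assumption: two tables in one in-memory database stand in for two
# separate source systems. All names and rows here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (
        customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE billing_invoices (
        invoice_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);

    INSERT INTO crm_customers VALUES (1, 'Acme', 'East'), (2, 'Globex', 'West');
    INSERT INTO billing_invoices VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);

    -- The view gives analysts one friendly shape without copying any rows.
    CREATE VIEW customer_revenue AS
    SELECT c.name, c.region, SUM(i.amount) AS total_revenue
    FROM crm_customers AS c
    JOIN billing_invoices AS i ON i.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name, c.region;
""")


def revenue_by_customer(connection):
    """Query the view as if it were a single table."""
    return connection.execute(
        "SELECT name, region, total_revenue FROM customer_revenue ORDER BY name"
    ).fetchall()


result = revenue_by_customer(conn)
print(result)
```

Because the view is only a stored query, a new invoice added to the underlying table shows up the next time the view is read, with no reload step. That is the property a virtual sandbox relies on. A physical sandbox would instead copy the rows out.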
Data Virtualization Speeds the Process, and More
With data virtualization, the result is a 2-10x acceleration of time-to-analytic results, which pays off handsomely when analyzing revenue optimization, risk management and/or compliance opportunities.
In addition, the data scientists and business analysts are not only more productive, they are much happier because they get to do more modeling and analyzing and less data chasing. And happier analysts are easier to retain, a key issue given the shortage of analysts today.
Further, all of this works with Big Data, traditional enterprise data, external or cloud data, desktop data, and more.
It is simple yet powerful, and it works for any organization’s IT environment. Lots of value-add, and the users like it too. I think your data scientists and business analysts will find data virtualization a great solution for their data challenges.
Are You A Data Scientist or Business Analyst?
I am eager to continue talking with data scientists and business analysts about their data challenges. I’ll start a discussion track on the DV (Data Virtualization) Café LinkedIn Group where we can explore things further.