Data Not Included – The Era of the Data Steward

Author: Webmaster1 | On: 04/01/2013



It’s all because of connectivity, don’t ya know?  The “Internet of Things” is a simple concept – anything can be connected to the Internet.  Anything.  Add an embedded electronic gizmo, smaller than a fingernail, and boom, there it is – in your browser or app – that “thing”, transmitting all sorts of information.  Trivial things, such as a reminder to water the flowers; and critical things, such as a jet engine signaling that it is statistically likely to fail on its next flight.  All this information, all this new data and the sophisticated analysis needed to make sense of it, is real.  But it will take many years to realize the revolutionary impact big data and big analytics will have on us.  Importantly, however, we have passed the point of no return – the big data and big analytics craze, for all its hype, evangelical praise, and emphatic disdain, is secular and irreversible.  Welcome to the “Era of Transformation”.

Up to now, technology was primarily about efficiency:  driving costs out of the system through automation, increased speed, and the replacement of physical channels with digital ones.  The Era of Transformation is something else.  It’s about effectiveness.  It is likely to go on for a decade or more, and it will transform our organizations – and lifestyles – in ways that cannot yet be imagined.  Why is technology making us more effective only now?  Today, software and systems can gather millions upon millions of seemingly unrelated data points, run a myriad of algorithms against them, and discover relationships – cause and effect – that answer not only what happened and why but, ultimately, what will happen next and what to do about it.  For the human mind, that problem is intractable, if not impossible.

The prevailing distributed or client-server model was about delivering applications to users; the cloud model is more about bringing data and applications together.  Today’s enterprise applications, primarily creators of data, will be joined by a tsunami of new enterprise applications that consume data.  Inevitably, that tsunami will break the current methods of distributing and leveraging information.  In the current enterprise application model, the RDBMSs and the teams that support them are the “center” of the data universe.  Analyzing data?  Contact the database admin, work with her to create an interface, and she will provide a copy of the data you need – and off you go.  Each connection is point to point, one data source to one data consumer, and it is either hand coded or engineered with third-party ETL tools.  This is data integration.

Reflecting on these data silos, my colleague Ted Friedman expressed it well during his keynote at Gartner’s 2013 BI Summit:  “First, we have to stop thinking about data as a byproduct.”  This is the real change that big data brings to the way we design, deploy and use applications, to how we treat the data they create, and to how they get the data they require.  Countless analytical applications are emerging to capitalize on data – applications for everything from digital marketing to studying diseases.  All these applications have one thing in common – data not included.  The line of business will covet these applications, and it will need to move fast and painlessly.  Application users will demand a simple way to get the data sets they need, analyze them, preserve their findings, get new data sets, analyze those, and so on.  However, much of the data these analytical applications will need comes from outside the organization.  For example, in the U.S. alone, there are nearly 100 federal agencies with statistical programs, each publishing data that is accessible via the Internet.  I look at this problem and think déjà vu.  Years ago, point-to-point connections from application to application, and their inherent brittleness, made the application integration model break.
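
Part of that brittleness is plain arithmetic.  As a rough illustration only, assume the worst case in which every consuming application needs its own hand-built interface to every source.  The interface count is then the product of the two numbers, whereas a shared intermediary layer needs only their sum.  The function names and the count of 20 applications below are hypothetical; the 100 sources echo the federal agencies cited above.

```python
def point_to_point_interfaces(sources: int, consumers: int) -> int:
    """Worst case: every consumer wires its own interface to every source."""
    return sources * consumers


def shared_layer_interfaces(sources: int, consumers: int) -> int:
    """Each source and each consumer connects once to a shared layer."""
    return sources + consumers


if __name__ == "__main__":
    sources = 100    # roughly the number of federal statistical sources cited above
    consumers = 20   # hypothetical number of analytical applications
    print(point_to_point_interfaces(sources, consumers))  # 2000
    print(shared_layer_interfaces(sources, consumers))    # 120
```

Real organizations sit somewhere between the two extremes, since few applications need every source, but the gap widens quickly as either number grows.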

These days, “data scientist” is an oft-used term that rivals “big data”.  The data scientist is the glory gal.  She takes the reams of data at her disposal, uses her BI and analytics tools, and comes up with answers – and questions – that she would otherwise never have been able to find or know to ask.  She’s the resident hero.  I submit that another role, the data steward, will rise in importance.  The data steward does the strenuous lifting to prepare the data so it’s ready for the data scientist.  The data steward will gather data from a set of these practically infinite data sources, format it, assure its quality, and then make each source seamlessly available to many data consumers.  The process has to be repeatable, scalable, and done rapidly and often.  It likely needs to be self-service for the business user.  The steward will be required to provide internal, transactional, long-lived, short-term and real-time data.  The tool the steward will need to realize this vision does not exist, but it will.  And when it does, it will transform the ETL market beyond recognition.
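
Since the tool in question does not yet exist, the following is only a toy sketch of the steward’s workflow as described above: gather, format, check quality, then publish once for many consumers.  Every name in it (`DataCatalog`, `steward`, `sample_stats`) is hypothetical, the sample rows are made up, and the quality rule is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class DataCatalog:
    """Hypothetical self-service catalog: one prepared copy, many consumers."""
    datasets: dict = field(default_factory=dict)

    def publish(self, name, rows):
        self.datasets[name] = rows

    def get(self, name):
        # Hand out copies so one consumer cannot alter another's view.
        return [dict(row) for row in self.datasets[name]]


def format_rows(raw_rows):
    """Normalize field names and types from a raw source."""
    return [
        {"region": r["Region"].strip().lower(), "value": float(r["Value"])}
        for r in raw_rows
    ]


def passes_quality(row):
    """Placeholder rule: a region must be present and the value non-negative."""
    return bool(row["region"]) and row["value"] >= 0


def steward(catalog, name, raw_rows):
    """Format, filter, and publish a data set; return the number of rejected rows."""
    formatted = format_rows(raw_rows)
    clean = [row for row in formatted if passes_quality(row)]
    catalog.publish(name, clean)
    return len(formatted) - len(clean)


# Made-up raw rows standing in for an external source.
raw = [
    {"Region": " West ", "Value": "12.5"},
    {"Region": "East", "Value": "-1"},
    {"Region": "", "Value": "3"},
]
catalog = DataCatalog()
rejected = steward(catalog, "sample_stats", raw)
```

The point of the sketch is the shape, not the details: the preparation runs once, and any number of consumers then call `catalog.get("sample_stats")` without going back to the source.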

I recommend that our Gartner Invest clients read the following documents: Top 10 Technology Trends Impacting Information Infrastructure, 2013; Hadoop Is Not a Data Integration Solution; Data Integration Enables Information Capabilities for the 21st Century; and The Future of Data Management for Analytics Is the Logical Data Warehouse.  These are only a few titles from our library on data and analytics.  Be sure to get on the inquiry calendar of any member of our Information Management Team, including Gartner Invest regulars:  Merv Adrian (big data, DBMS), Mark Beyer (big data, data warehousing), Roxane Edjlali (DBMS, data management), Donald Feinberg (DBMS, data warehousing) and Ted Friedman (data integration and data quality).

You will need a Gartner login to access the documents mentioned.
