Data is everywhere, generated by increasing numbers of applications, devices and users, with few or no guarantees on its format, semantics, and quality. The economic potential of data-driven innovation is enormous: the Centre for Economics and Business Research estimates that it could reach as much as £40B in 2017.
To realise this potential, and to provide meaningful data analyses, data scientists must first spend a significant portion of their time (estimated at 50% to 80%) on “data wrangling”: the process of collecting, reorganising, and cleaning data. This heavy toll is due to what are referred to as the four Vs of big data: Volume (the scale of the data), Velocity (the speed of change), Variety (the different forms of data), and Veracity (the uncertainty of data).
There is an urgent need to provide data scientists with a new generation of tools that will unlock the potential of data assets and significantly reduce the data wrangling component. As many traditional tools are no longer applicable in the four Vs environment, a radical paradigm shift is required. The VADA Programme Grant aims to add value to data by:
- carrying out data management tasks in an environment that takes full account of data and user contexts, and
- integrating and automating key data management tasks in a way not yet attempted, but desperately needed by many innovative companies in today’s data-driven economy.
The VADA research programme will define principles and solutions for Value Added Data Systems, which support users in discovering, extracting, integrating, accessing and interpreting the data of relevance to their questions. In so doing, these systems use both the user context, e.g., requirements in terms of the trade-off between completeness and correctness, and the data context, e.g., the cost, provenance and quality of the data. The user context characterises not only what data is relevant, but also the properties it must exhibit to be fit for purpose.