Realising the vision involves both fundamental changes to existing data management techniques and new ways of bringing them together, to simultaneously address the four V’s of big data. Specifically, our objectives are:
- To deliver the first comprehensive service-based reference model and integrated architecture for Value Added Data Systems (VADA) for continuously extracting, integrating and cleaning data from heterogeneous and possibly unreliable sources, accumulating knowledge about the data throughout its lifetime.
- To introduce a new class of self-correcting, continuous data extraction systems that revise both how and what data is extracted from data sources, based on the context of the extractions in the form of feedback from other VADA components and external sources.
- To develop a novel framework for cleaning data of heterogeneous structures that incorporates principled approaches to the scalable discovery of expressive quality rules for repairing a wide range of quality issues.
- To establish progressive data integration techniques responsive to user and data contexts, taking account of rich preferences and diverse quality and performance properties, to make well-founded integration decisions.
- To provide techniques for query answering that take into account both the user and data contexts, thereby addressing the fundamental challenges that the combination of scale and uncertainty poses to established techniques.
- To put in place knowledge representation and reasoning services that provide a lingua franca enabling flexible integration of VADA components and that support expressive reasoning over knowledge about data and user contexts to inform continuous improvements.
- To characterise and address the requirements of a broad set of applications, driven by our industrial partners, ensuring that all design decisions reflect current and emerging requirements.