SOURCERY: User Driven Multi-Criteria Source Selection

The SOURCERY system – an interactive source selection web application – was presented at the CIKM 2018 conference in Torino, Italy.

With ever more data sources available, data scientists are often interested in only a subset of them: the sources whose data best fits their purpose and needs. SOURCERY models a user's preferences over the relative importance of the criteria that matter to them, and then finds the source selection solution that is most closely aligned with those preferences. In this way, different solutions suit different users, reflecting each user's own context and requirements for the data.
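The demo paper describes the approach in full; purely as an illustration of the general idea, and not of SOURCERY's actual model, the sketch below (with made-up criteria, scores, and a simple weighted-sum ranking) shows how the best selection can change with a user's weights:

```python
# Minimal sketch (not SOURCERY's actual algorithm): rank candidate source
# selections by a weighted sum of criterion scores, where the weights
# capture one user's view of the relative importance of each criterion.

# Hypothetical criterion scores, already normalised to [0, 1].
candidates = {
    "selection_A": {"completeness": 0.9, "correctness": 0.6, "cost": 0.8},
    "selection_B": {"completeness": 0.5, "correctness": 0.95, "cost": 0.7},
    "selection_C": {"completeness": 0.7, "correctness": 0.8, "cost": 0.9},
}

def rank(candidates, weights):
    """Order candidate selections by weighted score, best first."""
    score = lambda name: sum(weights[c] * v for c, v in candidates[name].items())
    return sorted(candidates, key=score, reverse=True)

# A user who weights correctness heavily ...
print(rank(candidates, {"completeness": 0.2, "correctness": 0.6, "cost": 0.2}))
# ... gets a different best selection from one who prioritises completeness.
print(rank(candidates, {"completeness": 0.6, "correctness": 0.2, "cost": 0.2}))
```

Under the first set of weights the correctness-heavy selection ranks first, while under the second the more complete one does, which is the sense in which no single selection is best for every user.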

To read the demo paper, click here; to learn more and try out the demo, click here. For any comments, questions, suggestions, or bug reports, please get in touch with Ed Abel.

Data and User Context Papers Published

A key feature of VADA is that the automation of data wrangling takes account of the data context and the user context. The data context is supplementary data about the result of the data wrangling process. The user context is information about what is important to the user, since there are likely to be trade-offs between different features of the result, such as correctness and completeness.
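As a rough, hypothetical illustration of these two notions (not the method of [1] or [2]), supplementary reference data can be used to estimate features of a wrangled result that the user context then trades off:

```python
# Minimal sketch (illustrative only, not VADA's implementation): use a small
# reference table -- standing in for the data context -- to estimate two
# features of a wrangled result that a user may need to trade off.

reference = {("UK", "London"), ("FR", "Paris"), ("DE", "Berlin"), ("IT", "Rome")}
wrangled  = {("UK", "London"), ("FR", "Paris"), ("ES", "Barcelona")}

correct = wrangled & reference
correctness  = len(correct) / len(wrangled)    # fraction of the result that matches the reference
completeness = len(correct) / len(reference)   # fraction of the reference that is covered

print(f"correctness={correctness:.2f}, completeness={completeness:.2f}")
# A result can usually be made more complete by admitting more (lower-quality)
# sources, or more correct by being more selective; the user context indicates
# which way to lean.
```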

A paper on data context [1] was presented in December 2017 at IEEE Big Data in Boston, and a paper on the user context for source selection [2] has been published in Information Sciences.

  1. Koehler, M., Bogatu, A., Civili, C., Konstantinou, N., Abel, E., Fernandes, A. A. A., Keane, J., Libkin, L., Paton, N. W. (2017). Data Context Informed Data Wrangling. 2017 IEEE International Conference on Big Data (IEEE Big Data 2017).
  2. Abel, E., Keane, J., Paton, N. W., Fernandes, A. A. A., Koehler, M., Konstantinou, N., Ríos, J.C.C., Azuan, N., Embury, S. M. (2018). User Driven Multi-Criteria Source Selection. Information Sciences, 430–431, 179–199. https://doi.org/10.1016/j.ins.2017.11.019

Wrapidity Acquired by Meltwater

Wrapidity, the company founded to commercialise web data extraction software developed at Oxford, has been acquired by the media intelligence company Meltwater.

Georg Gottlob, Professor at the Oxford University Department of Computer Science and Co-Founder of Wrapidity, said: “Instant access to products, places, people and news has changed our lives in the last decade. The same access, but at a much larger scale, is now changing business in ways we can’t even imagine yet. At Wrapidity, we have responded to this by developing a completely new AI-based technology for extracting massive amounts of relevant data from millions of websites.”

Tim Furche, Lecturer at the Oxford University Department of Computer Science and Co-Founder and Chief Technology Officer (CTO) of Wrapidity, added: “Meltwater already monitors and analyses millions of articles per day across several languages. Combining Meltwater’s industry leadership and global footprint with Wrapidity’s advances in AI technology, we will be able to surface more accurate, timely and insightful content for Meltwater’s customers. Jorn and his team were visionaries in developing the software, services and business models to make such external web data usable for internal decision-making. We truly believe that companies of the future will hinge on Outside Insight, and we’re extremely excited to pursue this together.”

New EPSRC Grant Award

Prof. Leonid Libkin has been awarded an EPSRC Established Career Fellowship. The grant, titled "MAGIC: MAnaGing InComplete Data – New Foundations", is worth £1.14M over 5 years, starting 1 August 2016.

The main goal of this research programme is to deliver a new understanding of uncertain and incomplete information in data processing tasks, and thereby to provide new ways of extracting knowledge from such data. It will reconcile correctness guarantees with an efficient algorithmic toolkit that scales to large data sets, challenging the perceived impossibility of achieving correctness and efficiency simultaneously for large classes of queries over incomplete data.