Jacopo Urbani

"You must believe in spring" - Bill Evans

I am an associate professor in Computer Science at the Vrije Universiteit Amsterdam (VUA). My research focuses on how to extract new (and interesting) knowledge from large datasets, primarily those available on the Web. If you are interested, please check out my publication list on Google Scholar or DBLP to get a better idea of my research area.

I have received a number of awards for my research. A few papers that I co-authored have received either an honourable mention or a best paper award at top conferences. In 2010, my work on forward inference with MapReduce won the IEEE SCALE challenge. In 2012, the Network Institute awarded me the “Most Promising Young Researcher Award”. In 2013, my PhD was awarded cum laude, a distinction given to only 5% of the theses in our department. In 2014, my PhD work received an honourable mention as best PhD thesis in Computer Science in the country. The award was given by the Christiaan Huygens society, after a selection performed by KNAW (Royal Netherlands Academy of Arts and Sciences).

Latest news

Jun 15, 2024

Back at the VU after a one-year sabbatical at Fondazione Bruno Kessler (FBK), in Trento, Italy

I had the privilege of visiting Luciano Serafini's group (https://dkm.fbk.eu/), which does fascinating research on neuro-symbolic AI and planning.

May 3, 2024

New paper in the journal "Transactions on Graph Data and Knowledge (TGDK)" on grounding stream reasoning research

[Abstract]

In the last decade, there has been a growing interest in applying AI technologies to implement complex data analytics over data streams. To this end, researchers in various fields have been organising a yearly event called the "Stream Reasoning Workshop" to share perspectives, challenges, and experiences around this topic. In this paper, the previous organisers of the workshops and other community members provide a summary of the main research results that have been discussed during the first six editions of the event. These results can be categorised into four main research areas: The first is concerned with the technological challenges related to handling large data streams. The second area aims at adapting and extending existing semantic technologies to data streams. The third and fourth areas focus on how to implement reasoning techniques, either considering deductive or inductive techniques, to extract new and valuable knowledge from the data in the stream. This summary is written not only to provide a crystallisation of the field, but also to point out distinctive traits of the stream reasoning community. Moreover, it also provides a foundation for future research by enumerating a list of use cases and open challenges, to stimulate others to join this exciting research area.

[Link]

Dec 5, 2022

Hosted the sixth edition of the Stream Reasoning Workshop

The Stream Reasoning Workshop is an international yearly event where scientists from different communities gather together to discuss problems around the processing of data streams. I co-chaired the 2022 edition, hosted in Amsterdam.

Sep 1, 2022

Appointed as the new program director of the Bachelor of Computer Science

The bachelor program is the largest in our department, welcoming 500+ new students every year. As program director, my role is to oversee all educational activities (courses, etc.) in the program and to ensure the quality of the education.

Aug 16, 2022

New paper at SIGMOD 2023 on performing scalable probabilistic reasoning using Trigger Graphs. This work was done in collaboration with Samsung AI (Cambridge, UK).

[Abstract]

The role of uncertainty in data management has become more prominent than ever before, especially because of the growing importance of machine learning-driven applications that produce large uncertain databases. A well-known approach to querying such databases is to blend rule-based reasoning with uncertainty, but techniques proposed so far struggle with large databases. In this paper, we address this problem by presenting a new technique for probabilistic reasoning that exploits Trigger Graphs (TGs) – a notion recently introduced for the non-probabilistic setting. The intuition is that TGs can effectively store a probabilistic model by avoiding an explicit materialization of the lineage and by grouping together similar derivations of the same fact. Firstly, we show how TGs can be adapted to support the possible world semantics. Then, we describe techniques for efficiently computing a probabilistic model, and formally establish the correctness of our approach. We also present an extensive empirical evaluation using a prototype called LTGs. Our comparison against other leading engines shows that LTGs is not only faster, even against approximate reasoning techniques, but can also reason over probabilistic databases that existing engines cannot scale to.

Apr 15, 2022

New paper at KR 2022 on performing rule-based reasoning with existential rules on data streams. This work was done in collaboration with Markus Krötzsch (TU Dresden) and Thomas Eiter (TU Wien).

[Abstract]

We study reasoning with existential rules to perform query answering over streams of data. On static databases, this problem has been widely studied, but its extension to data that changes rapidly has not yet been considered. To bridge this gap, we consider LARS, a well-known framework for rule-based stream reasoning, and extend it to support existential rules. For that, we show how to translate LARS with existentials into a semantics-preserving set of existential rules. As query answering with such rules is undecidable in general, we describe how to leverage the temporal nature of streams and present suitable notions of acyclicity that ensure decidability. Our contribution also includes a preliminary empirical evaluation over artificial streams.

Feb 22, 2022

New paper at ESWC 2022 on fact classification with Knowledge Graph embeddings and ensemble-based learning

[Abstract]

Numerous prior works have shown how we can use Knowledge Graph embeddings for ranking unseen facts that are likely to be true. Much less attention has been given to how to use embeddings for fact classification, which is a related task where we do not rank facts but label them either as true or false. A direct conversion of the ranked lists of facts into true/false labels tends to yield a low accuracy. This makes fact classification with embeddings a non-trivial problem. In this paper, we tackle this challenge with a new technique that exploits ensemble learning and weak supervision, following the principle that multiple weak classifiers can make a strong one. Our method is implemented in a new system called DuEL. DuEL post-processes the ranked lists produced by the embedding models with multiple classifiers, which include supervised models like LSTMs, MLPs, and CNNs and unsupervised ones that consider subgraphs and reachability in the graph. The output of these classifiers is aggregated using a weakly supervised method that does not need ground truths, which would be expensive to obtain. Our experiments show that DuEL produces a more accurate classification than other existing methods, with improvements up to 72% in terms of F1 score. This suggests that weakly supervised ensemble learning is a promising technique to perform fact classification with embeddings.

[GitHub]

Nov 7, 2021

New paper at EMNLP 2021 on robust stance classification with BERT-based inconsistency detection

[Abstract]

We study the problem of performing automatic stance classification on social media with neural architectures such as BERT. Although these architectures deliver impressive results, their performance is not yet comparable to that of humans and they might produce errors that have a significant impact on the downstream task (e.g., fact-checking). To improve the performance, we present a new neural architecture where the input also includes automatically generated negated perspectives over a given claim. The model is jointly trained to make multiple predictions simultaneously, which can be used either to improve the classification of the original perspective or to filter out doubtful predictions. In the first case, we propose a weakly supervised method for combining the predictions into a final one. In the second case, we show that using the confidence scores to remove doubtful predictions allows our method to achieve human-like performance over the retained information, which is still a sizable part of the original input.

[Link] [arXiv] [GitHub]
