Title: Entity Resolution with Active Learning
Abstract: Entity Resolution (ER) is the process of identifying records, in one or more datasets, that represent the same real-world entity. Traditional ER methods face several challenges, such as imbalanced classes, limited labelling budgets, and model overfitting. In this talk, I will first introduce a novel approach to learning blocking schemes based on active learning techniques. Two strategies, called active sampling and active branching, are proposed to select samples and generate blocking schemes efficiently. Then, I will propose a skyblocking method, which uses three novel algorithms to learn blocking scheme skylines with respect to different blocking criteria. Building on these blocking techniques, I will further develop a general active learning framework for classification, called Learning-To-Sample (LTS). The LTS framework has two key components, a sampling model and a boosting model, which learn from each other over successive iterations to improve each other's performance. Finally, to address the overfitting problem, I will propose a semi-supervised generative adversarial network named ErGAN. This model contains a label generator and a discriminator, which are optimized alternately through adversarial learning.
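For readers unfamiliar with blocking, the following is a minimal sketch of the general idea, not the speaker's method. It assumes a toy set of made-up records and a hand-written, hypothetical blocking key (first letter of the surname plus the city). The approaches described in the abstract learn such schemes from labelled samples rather than fixing them by hand.

```python
from collections import defaultdict
from itertools import combinations

# Toy records, made up for illustration: id -> (name, city)
records = {
    1: ("John Smith", "Sydney"),
    2: ("Jon Smith", "Sydney"),
    3: ("Mary Jones", "Canberra"),
    4: ("Mary Jones", "Canberra"),
    5: ("Alan Turing", "Sydney"),
}


def blocking_key(record):
    """Hypothetical blocking scheme: first letter of surname plus city."""
    name, city = record
    surname = name.split()[-1]
    return (surname[0].lower(), city.lower())


def candidate_pairs(records, key_fn):
    """Group records by blocking key and pair up records within each block."""
    blocks = defaultdict(list)
    for rid, rec in records.items():
        blocks[key_fn(rec)].append(rid)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs


# Without blocking, every pair of records would have to be compared.
all_pairs = len(records) * (len(records) - 1) // 2
pairs = candidate_pairs(records, blocking_key)
print(f"{len(pairs)} candidate pairs instead of {all_pairs}")
```

Only the pairs that share a block are passed on to the (more expensive) matching step, which is why the choice of blocking scheme matters: a scheme that is too strict drops true matches, and one that is too loose keeps too many pairs.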
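Speaker bio: Shao Jingyu (邵靖宇), male, holds a PhD in computer science from the Australian National University. He received his bachelor's degree from Beihang University and his master's degree from the University of Technology Sydney. His main research interests are entity resolution, data mining, active learning, and machine learning.
Time: Monday, July 26, 2021, 9:00-10:00 a.m.
Platform: Tencent Meeting
Meeting ID: 688 511 698
Meeting password: 202107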