Title: CurrentClean: Spatio-temporal Cleaning of Stale Data
Time: 3:00 PM, November 7, 2019
Venue: Conference Room 308, Computer Building, Main Campus
Speaker: Zheng Zheng (郑政)
Abstract: Data currency is imperative to achieving up-to-date and accurate data analysis. Data is considered current if changes in real-world entities are reflected in the database. When this does not occur, stale data arises. Identifying and repairing stale data goes beyond simply having timestamps. Individual entities each have their own update patterns in both space and time, e.g., each bank client has her own deposit/withdrawal patterns that influence whether her account balance is up-to-date, irrespective of the last update time. These update patterns can be learned and predicted given available query logs. In this paper, we present CurrentClean, a probabilistic system for identifying and cleaning stale values. We introduce a spatio-temporal probabilistic model that captures the database update patterns to infer stale values, and propose a set of inference rules that model spatio-temporal update patterns commonly seen in real data. We recommend repairs by learning from past update values over cells, so that stale values can be restored to current values. Our evaluation shows that CurrentClean identifies stale values effectively over real data and achieves improved error detection and repair accuracy over state-of-the-art techniques.
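Speaker bio: Zheng Zheng is a Ph.D. student at McMaster University, Canada. Zheng received a master's degree in computer science from the University of Chinese Academy of Sciences in 2015. Research interests include database systems, big data management, and machine learning. Zheng has published multiple papers in major international journals and conferences, including ICDE, CIKM, EDBT, and Transactions in GIS.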