ICDM 2023's Workshop

The First Workshop on Data-Centric AI (DCAI): shifting research focus from model to data.

Home

Machine learning research has traditionally focused on developing models for given datasets, but real-world data is often messy. Improving the dataset itself can be a more effective way to raise performance than refining the model alone. Data-Centric AI (DCAI) is an emerging field that studies how to improve datasets systematically, which can yield significant gains in ML applications. DCAI treats data improvement as an engineering discipline, shifting the focus from modeling to the underlying data. This workshop aims to build an interdisciplinary DCAI community to tackle data problems such as collection, labeling, preprocessing, quality evaluation, data debt, and governance. Interested parties can help shape the future of AI and ML by submitting papers in response to the call for papers.

Agenda

Date: December 1, 2023; Location: Room 6; Zoom link: https://zoom.us/j/91649466943; Password: 202312
Time (Beijing Time)	Title	Attendance	Format	Presenter/Author
8:00-8:05	Opening Remarks			Organizers
8:05-8:40	Keynote Presentation: Addressing Data Quality Issues with Data-Centric AI Approaches	Video	30 min + 5 min QA session	Jae-Gil Lee
8:40-9:15	Keynote Presentation: Data-Efficient Fine-Tuning and Adaptation of Language Models	Video	30 min + 5 min QA session	Chao Zhang
9:15-9:30	Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation	In-person	12 min + 3 min QA session	Xunxin Cai, Meng Xiao, Zhiyuan Ning, and Yuanchun Zhou
9:30-9:45	Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse	In-person	12 min + 3 min QA session	Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, and Heuiseok Lim
9:45-10:00	Deep Outdated Fact Detection in Knowledge Graphs	In-person	12 min + 3 min QA session	Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, and Karin Verspoor
10:00-10:15	Silence Speaks Volumes: Re-weighting Techniques for Under-Represented Users in Fake News Detection	Video	12 min + 3 min QA session	Mansooreh Karami, David Mosallanezhad, Paras Sheth, and Huan Liu
10:15-10:30	Guided Nearest-Neighbor Contrastive Learning with Prior Knowledge For Hotel Recognition	Video	12 min + 3 min QA session	Aarash Feizi, Randall Balestriero, Arantxa Casanova, Adriana Romero-Soriano, and Reihaneh Rabbany
10:30-10:35	Closing Remarks			Organizers

Topics

We welcome a wide range of submissions on data-centric AI, spanning theory, algorithms, applications, systems, and tools. Topics include, but are not limited to:

Automated Data Science Methods
- Data cleaning, denoising, and interpolation
- Feature selection and generation
- Data refinement, feature-instance joint selection
- Data quality improvement, representation learning, reconstruction
- Outlier detection and removal

Tools and Methodologies for Expediting Open-source Dataset Preparation
- Tools that speed up sourcing and preparing high-quality data
- Tools for consistent data labeling, data quality improvement
- Tools for generating high-quality supervised learning training data
- Tools for dataset control, high-level editing, searching public resources
- Tools for dataset feedback incorporation, coverage understanding, editing
- Dataset importers and exporters for easy data combination and consumption
- System architectures and interfaces for dataset tool composition

Algorithms for Handling Limited Labeled Data and Label Efficiency
- Data selection techniques, semi-supervised learning, few-shot learning
- Weak supervision methods, transfer learning, self-supervised learning approaches

Algorithms for Dealing with Biased, Shifted, Drifted, and Out-of-Distribution Data
- Datasets for bias evaluation and analysis
- Algorithms for automated bias elimination, model training with biased data

Submission Details

We invite submissions of regular research papers of at most 6 pages plus 2 extra pages, including all content and references. Submissions must be in PDF format and follow the new Standard IEEE Conference Proceedings Template. Submitted papers will be assessed on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. All papers must be submitted via the wi-lab system. Following ICDM tradition, all accepted workshop papers will be published in the dedicated ICDMW proceedings by the IEEE Computer Society Press. For questions about the workshop or submissions, please email kunpeng@pdx.edu.

Important Dates

Workshop Papers Submission: September 15, 2023
Notification of Workshop Papers Acceptance: September 24, 2023
Camera-ready Deadline and Copyright Form: October 15, 2023
Workshop Day: December 1, 2023

Organizing Committee

Steering Co-Chairs

Hui Xiong, The Hong Kong University of Science and Technology (Guangzhou)
Vipin Kumar, University of Minnesota

Program Co-Chairs

Yanjie Fu, Arizona State University
Steven Euijong Whang, Korea Advanced Institute of Science and Technology
Kunpeng Liu, Portland State University

Publicity Co-Chairs

Pengyang Wang, University of Macau
Dongjie Wang, University of Central Florida

Local Co-Chairs

Pengyang Wang, University of Macau

Web Co-Chairs

Dongjie Wang, University of Central Florida
Wei Fan, University of Central Florida
Meng Xiao, Chinese Academy of Sciences

Accepted Papers

Xunxin Cai, Meng Xiao, Zhiyuan Ning, and Yuanchun Zhou, "Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation"
Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, and Karin Verspoor, "Deep Outdated Fact Detection in Knowledge Graphs"
Aarash Feizi, Randall Balestriero, Arantxa Casanova, Adriana Romero-Soriano, and Reihaneh Rabbany, "Guided Nearest-Neighbor Contrastive Learning with Prior Knowledge For Hotel Recognition"
Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, and Heuiseok Lim, "Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse"
Mansooreh Karami, David Mosallanezhad, Paras Sheth, and Huan Liu, "Silence Speaks Volumes: Re-weighting Techniques for Under-Represented Users in Fake News Detection"

Speakers

Dr. Jae-Gil Lee, Korea Advanced Institute of Science and Technology
Dr. Chao Zhang, Georgia Institute of Technology

Program Committee

Dr. Yong Ge, University of Arizona
Dr. Hao Liu, The Hong Kong University of Science and Technology (Guangzhou)
Dr. Kunpeng Liu, Portland State University
Dr. Qi Liu, University of Science and Technology of China
Dr. Yanchi Liu, NEC Labs America
Dr. Leilei Sun, Beihang University
Dr. Pengfei Wang, Chinese Academy of Sciences
Dr. Pengyang Wang, University of Macau
Dr. Senzhang Wang, Central South University
Dr. Keli Xiao, Stony Brook University
Dr. Yang Yang, Nanjing University of Science and Technology
Dr. Zijun Yao, University of Kansas
Dr. Denghui Zhang, Rutgers University
Dr. Wei Zhang, University of Central Florida
Dr. Xi Zhang, Chinese Academy of Sciences
Dr. Dongjie Wang, University of Central Florida

Volunteers

Mr. Haihua Xu, University of Macau
Ms. Qi Hao, University of Macau
