Home
Machine learning research has traditionally focused on developing better models for fixed datasets, but real-world data is often messy. Improving the dataset itself, rather than only the model, can be a more effective way to enhance performance. Data-Centric AI (DCAI) is an emerging field that systematically improves datasets and can yield significant gains in ML applications. DCAI treats data improvement as an engineering discipline, shifting the focus from modeling to the underlying data. This workshop aims to build an interdisciplinary DCAI community to tackle data problems such as collection, labeling, preprocessing, quality evaluation, data debt, and governance. Researchers and practitioners who want to help shape the future of AI and ML are invited to submit papers in response to the call for papers.
Agenda
Date: December 1, 2023 | Location: Room 6 | Zoom link: https://zoom.us/j/91649466943 | Password: 202312
Time (Beijing Time) | Title | Attendance | Format | Presenter/Author
8:00-8:05 | Opening Remarks | | | Organizers
8:05-8:40 | Keynote Presentation: Addressing Data Quality Issues with Data-Centric AI Approaches | Video | 30 min + 5 min QA session | Jae-Gil Lee
8:40-9:15 | Keynote Presentation: Data-Efficient Fine-Tuning and Adaptation of Language Models | Video | 30 min + 5 min QA session | Chao Zhang
9:15-9:30 | Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation | In-person | 12 min + 3 min QA session | Xunxin Cai, Meng Xiao, Zhiyuan Ning, and Yuanchun Zhou
9:30-9:45 | Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse | In-person | 12 min + 3 min QA session | Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, and Heuiseok Lim
9:45-10:00 | Deep Outdated Fact Detection in Knowledge Graphs | In-person | 12 min + 3 min QA session | Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, and Karin Verspoor
10:00-10:15 | Silence Speaks Volumes: Re-weighting Techniques for Under-Represented Users in Fake News Detection | Video | 12 min + 3 min QA session | Mansooreh Karami, David Mosallanezhad, Paras Sheth, and Huan Liu
10:15-10:30 | Guided Nearest-Neighbor Contrastive Learning with Prior Knowledge For Hotel Recognition | Video | 12 min + 3 min QA session | Aarash Feizi, Randall Balestriero, Arantxa Casanova, Adriana Romero-Soriano, and Reihaneh Rabbany
10:30-10:35 | Closing Remarks | | | Organizers
Topics
We welcome a wide range of submissions on data-centric AI, including theory, algorithms, applications, systems, and tools. Topics of interest include, but are not limited to:
- Automated Data Science Methods
  - Data cleaning, denoising, and interpolation
  - Feature selection and generation
  - Data refinement and feature-instance joint selection
  - Data quality improvement, representation learning, and reconstruction
  - Outlier detection and removal
- Tools and Methodologies for Expediting Open-source Dataset Preparation
  - Tools for accelerating the sourcing and preparation of high-quality data
  - Tools for consistent data labeling and data quality improvement
  - Tools for generating high-quality training data for supervised learning
  - Tools for dataset control, high-level editing, and searching public resources
  - Tools for incorporating dataset feedback, understanding dataset coverage, and editing
  - Dataset importers and exporters for easy data combination and consumption
  - System architectures and interfaces for composing dataset tools
- Algorithms for Handling Limited Labeled Data and Label Efficiency
  - Data selection techniques, semi-supervised learning, and few-shot learning
  - Weak supervision, transfer learning, and self-supervised learning approaches
- Algorithms for Dealing with Biased, Shifted, Drifted, and Out-of-Distribution Data
  - Datasets for bias evaluation and analysis
  - Algorithms for automated bias elimination and model training with biased data
Submission Details
Important Dates
- Workshop Paper Submission Deadline: September 15, 2023
- Notification of Workshop Paper Acceptance: September 24, 2023
- Camera-ready Deadline and Copyright Form: October 15, 2023
- Workshop Day: December 1, 2023
Organizing Committee
Steering Co-Chairs
Hui Xiong
The Hong Kong University of Science and Technology (Guangzhou)
Vipin Kumar
University of Minnesota
Program Co-Chairs
Yanjie Fu
Arizona State University
Steven Euijong Whang
Korea Advanced Institute of Science and Technology
Kunpeng Liu
Portland State University
Publicity Co-Chairs
Pengyang Wang
University of Macau
Dongjie Wang
University of Central Florida
Local Co-Chairs
Pengyang Wang
University of Macau
Web Co-Chairs
Dongjie Wang
University of Central Florida
Wei Fan
University of Central Florida
Meng Xiao
Chinese Academy of Sciences
Accepted Papers
- Xunxin Cai, Meng Xiao, Zhiyuan Ning, and Yuanchun Zhou, "Resolving the Imbalance Issue in Hierarchical Disciplinary Topic Inference via LLM-based Data Augmentation"
- Huiling Tu, Shuo Yu, Vidya Saikrishna, Feng Xia, and Karin Verspoor, "Deep Outdated Fact Detection in Knowledge Graphs"
- Aarash Feizi, Randall Balestriero, Arantxa Casanova, Adriana Romero-Soriano, and Reihaneh Rabbany, "Guided Nearest-Neighbor Contrastive Learning with Prior Knowledge For Hotel Recognition"
- Seungyoon Lee, Dahyun Jung, Chanjun Park, Seolhwa Lee, and Heuiseok Lim, "Alternative Speech: Complementary Method to Counter-Narrative for Better Discourse"
- Mansooreh Karami, David Mosallanezhad, Paras Sheth, and Huan Liu, "Silence Speaks Volumes: Re-weighting Techniques for Under-Represented Users in Fake News Detection"
Speakers
- Dr. Jae-Gil Lee, Korea Advanced Institute of Science and Technology
- Dr. Chao Zhang, Georgia Institute of Technology
Program Committee
- Dr. Yong Ge, University of Arizona
- Dr. Hao Liu, The Hong Kong University of Science and Technology (Guangzhou)
- Dr. Kunpeng Liu, Portland State University
- Dr. Qi Liu, University of Science and Technology of China
- Dr. Yanchi Liu, NEC Labs America
- Dr. Leilei Sun, Beihang University
- Dr. Pengfei Wang, Chinese Academy of Sciences
- Dr. Pengyang Wang, University of Macau
- Dr. Senzhang Wang, Central South University
- Dr. Keli Xiao, Stony Brook University
- Dr. Yang Yang, Nanjing University of Science and Technology
- Dr. Zijun Yao, University of Kansas
- Dr. Denghui Zhang, Rutgers University
- Dr. Wei Zhang, University of Central Florida
- Dr. Xi Zhang, Chinese Academy of Sciences
- Dr. Dongjie Wang, University of Central Florida
Volunteers
- Mr. Haihua Xu, University of Macau
- Ms. Qi Hao, University of Macau