

Machine learning typically focuses on techniques for developing models for a given dataset. However, in real-world applications, data is often messy, and improving models is not always the best way to enhance performance. One can also improve the dataset itself instead of treating it as a fixed input. Data-Centric AI (DCAI) is an emerging field that studies techniques to improve datasets systematically, often resulting in significant improvements in practical ML applications. While good data scientists have traditionally used ad hoc trial-and-error methods and intuition to improve datasets manually, DCAI approaches data improvement as a systematic engineering discipline.

DCAI marks a recent shift in focus from modeling to the underlying data used to train and evaluate models. Common model architectures now dominate many tasks, and scaling rules have become predictable. However, building and using datasets is still a labor-intensive and expensive process, with little infrastructure and few best practices to make it easier, cheaper, and more repeatable. The DCAI movement aims to address these issues by developing high-productivity, efficient open data engineering tools for managing data in modern ML systems.

This workshop aims to foster a vibrant interdisciplinary DCAI community that can tackle practical data problems. These problems include data collection and generation, data labeling, data preprocessing and augmentation, data quality evaluation, data debt, and data governance. Many of these areas are still in their early stages, and the workshop seeks to bring them together to define and shape the DCAI movement that will influence the future of AI and ML. Interested parties can take an active role in shaping this future by submitting papers in response to the call for papers below.


Topics


We welcome a wide array of submissions focused on data-centric AI, encompassing topics such as theories, algorithms, applications, systems, and tools. These topics include but are not limited to:

  • Automated Data Science Methods
    • Data cleaning, denoising, and interpolation
    • Feature selection and generation
    • Data refinement, feature-instance joint selection
    • Data quality improvement, representation learning, reconstruction
    • Outlier detection and removal
  • Tools and Methodologies for Expediting Open-source Dataset Preparation
    • Tools that reduce the time needed to source and prepare high-quality data
    • Tools for consistent data labeling, data quality improvement
    • Tools for generating high-quality supervised learning training data
    • Tools for dataset control, high-level editing, searching public resources
    • Tools for dataset feedback incorporation, coverage understanding, editing
    • Dataset importers and exporters for easy data combination and consumption
    • System architectures and interfaces for dataset tool composition
  • Algorithms for Handling Limited Labeled Data and Label Efficiency
    • Data selection techniques, semi-supervised learning, few-shot learning
    • Weak supervision methods, transfer learning, self-supervised learning approaches
  • Algorithms for Dealing with Biased, Shifted, Drifted, and Out-of-Distribution Data
    • Datasets for bias evaluation and analysis
    • Algorithms for automated bias elimination, model training with biased data

Submission Details


We invite submissions of short papers (up to 6 pages) and full papers (up to 10 pages); the page limits include all content and references. Submissions must be in PDF format and formatted according to the new Standard IEEE Conference Proceedings Template. Submitted papers will be assessed on their novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. All papers must be submitted via the online submission system. For questions about the workshop or submissions, please email yanjie.fu@asu.edu.

Important Dates


  • Workshop Papers Submission: Oct. , 2024
  • Notification of Workshop Papers Acceptance: Nov. , 2024
  • Camera-ready Deadline and Copyright Form: Nov. 20, 2024
  • Workshop Day: Dec. 15-18, 2024

Agenda



Organizing Committee



Hui Xiong

The Hong Kong University of Science and Technology (Guangzhou)


Yanjie Fu

Arizona State University


Haifeng Chen

NEC Laboratories America, Inc


Chandan Reddy

Virginia Tech


Kunpeng Liu

Portland State University


Dongjie Wang

University of Kansas


Charu Aggarwal

IBM T. J. Watson Research Center


Wei Ding

University of Massachusetts Boston