Machine learning research has traditionally focused on developing better models for fixed datasets, but real-world data is often messy. Improving the dataset itself is often a more effective way to raise performance than improving the model alone. Data-Centric AI (DCAI) is an emerging field that studies how to improve datasets systematically, which can yield significant gains in practical ML applications. DCAI treats data improvement as an engineering discipline, shifting the focus from modeling to the underlying data. This workshop aims to build an interdisciplinary DCAI community that tackles data problems such as collection, labeling, preprocessing, quality evaluation, data debt, and governance. We invite researchers and practitioners to help shape the future of AI and ML by submitting papers in response to the call for papers.

Agenda


Date: December 9th (GMT+4)
Time | Title | Format | Presenter/Author
15:30-15:35 | Opening Remarks | 5 min | Organizers
15:35-16:10 | Keynote Presentation: Active Covering via Density-based Space Transformation | 25 min + 10 min QA session | Hossein Esfandiari
16:10-16:45 | Keynote Presentation: Adapting Graph Models under Domain Shifts: A Data-Centric Perspective | 25 min + 10 min QA session | Ziyue Qiao
16:45-17:10 | Coffee Break | 25 min | Organizers
17:10-17:18 | Paper Presentation: Formal Classifier of a Rhythmic Structure Model for Inference about Linguistic Text Properties | 6 min + 2 min QA session | Jolanta Mizera-Pietraszko and Jolanta Tancula
17:18-17:26 | Paper Presentation: Heisenberg Algebra Implementation to AAG Algorithm for Secure Data Transfer in IoT Networks | 6 min + 2 min QA session | Jolanta Mizera-Pietraszko and Jolanta Tancula
17:26-17:34 | Paper Presentation: Anomaly Detection and Interpretation from Tabular Data Using Transformer Architecture | 6 min + 2 min QA session | Hajar Homayouni, Hamed Aghayarzadeh, Indrakshi Ray, and Hossein Shirazi
17:34-17:42 | Paper Presentation: scReader: Prompting Large Language Models to Interpret scRNA-seq Data | 6 min + 2 min QA session | Cong Li, Qingqing Long, Yuanchun Zhou, and Meng Xiao
17:42-17:50 | Paper Presentation: MRDiff: Time Series Anomaly Detection Using Multi-level Reconstruction Diffusion | 6 min + 2 min QA session | Daygeong Na and Junseok Kwon
17:50-17:58 | Paper Presentation: On the classification of weather based on the production of photovoltaic installations | 6 min + 2 min QA session | Paweł Parczyk, Tobiasz Puślecki, and Robert Burduk
17:58-18:00 | Closing Remarks | 2 min | Organizers

Topics


We welcome a wide array of submissions focused on data-centric AI, encompassing topics such as theories, algorithms, applications, systems, and tools. These topics include but are not limited to:

  • Automated Data Science Methods
    • Data cleaning, denoising, and interpolation
    • Feature selection and generation
    • Data refinement, feature-instance joint selection
    • Data quality improvement, representation learning, reconstruction
    • Outlier detection and removal
  • Tools and Methodologies for Expediting Open-source Dataset Preparation
    • Time acceleration tools for sourcing and preparing high-quality data
    • Tools for consistent data labeling, data quality improvement
    • Tools for generating high-quality supervised learning training data
    • Tools for dataset control, high-level editing, searching public resources
    • Tools for dataset feedback incorporation, coverage understanding, editing
    • Dataset importers and exporters for easy data combination and consumption
    • System architectures and interfaces for dataset tool composition
  • Algorithms for Handling Limited Labeled Data and Label Efficiency
    • Data selection techniques, semi-supervised learning, few-shot learning
    • Weak supervision methods, transfer learning, self-supervised learning approaches
  • Algorithms for Dealing with Biased, Shifted, Drifted, and Out of Distribution Data
    • Datasets for bias evaluation and analysis
    • Algorithms for automated bias elimination, model training with biased data

Submission Details


We invite submissions of regular research papers of 6-10 pages, including the bibliography and any appendices. Submissions must be in PDF format and follow the new Standard IEEE Conference Proceedings Template. Papers will be assessed on novelty, technical quality, potential impact, insightfulness, depth, clarity, and reproducibility. All papers must be submitted via the wi-lab system. Following ICDM tradition, all accepted workshop papers will be published in the dedicated ICDMW proceedings by the IEEE Computer Society Press. For questions about the workshop or submissions, please email kunpeng@pdx.edu.

Important Dates (All deadlines are at 11:59 pm in the Anywhere on Earth timezone)


  • Workshop Papers Submission: September 22, 2024
  • Notification of Workshop Papers Acceptance: October 7, 2024
  • Camera-ready Deadline and Copyright Form: October 11, 2024
  • Workshop Day: December 9, 2024

Organizing Committee


Steering Co-Chairs

Hui Xiong

The Hong Kong University of Science and Technology (Guangzhou)

Vipin Kumar

University of Minnesota

Program Co-Chairs

Yanjie Fu

Arizona State University

Steven Euijong Whang

Korea Advanced Institute of Science & Technology

Kunpeng Liu

Portland State University

Meng Xiao

Chinese Academy of Sciences

Publicity Co-Chairs

Pengyang Wang

University of Macau

Dongjie Wang

University of Kansas

Local Co-Chairs

Pengyang Wang

University of Macau

Web Co-Chairs

Dongjie Wang

University of Kansas

Wei Fan

University of Central Florida


Keynote Presentations


Active Covering via Density-based Space Transformation

Presenter: Hossein Esfandiari

Bio: Dr. Esfandiari is a Senior Research Scientist at Google Research, where he works on a wide range of research areas, from Machine Learning, Data Mining, and Algorithms to Web and Ethics. Before that, he was a Postdoctoral Researcher at Harvard University. He received a Ph.D. in Computer Science from the University of Maryland, where he earned several honors and awards, including the Google PhD Fellowship in Market Algorithms, the World Quantitative and Science Scholarship, the Outstanding Graduate Student Dean's Fellowship, the Ann G. Wylie Dissertation Fellowship, and the UPE Award at the ACM-ICPC International Programming Contest.


Program Committee


  • Dr. Yong Ge, University of Arizona
  • Dr. Hao Liu, The Hong Kong University of Science and Technology (Guangzhou)
  • Dr. Kunpeng Liu, Portland State University
  • Dr. Qi Liu, University of Science and Technology of China
  • Dr. Yanchi Liu, NEC Labs America
  • Dr. Leilei Sun, Beihang University
  • Dr. Pengfei Wang, Chinese Academy of Sciences
  • Dr. Pengyang Wang, University of Macau
  • Dr. Senzhang Wang, Central South University
  • Dr. Keli Xiao, Stony Brook University
  • Dr. Yang Yang, Nanjing University of Science and Technology
  • Dr. Zijun Yao, University of Kansas
  • Dr. Denghui Zhang, Rutgers University
  • Dr. Wei Zhang, University of Central Florida
  • Dr. Xi Zhang, Chinese Academy of Sciences
  • Dr. Dongjie Wang, University of Kansas

Volunteers


  • Mr. Haihua Xu, University of Macau
  • Ms. Qi Hao, University of Macau
