Machine learning typically focuses on developing models for a given dataset, but real-world data is often messy, and improving the dataset itself can enhance performance. Data-Centric AI (DCAI) studies systematic techniques for improving datasets, which can result in significant improvements in ML applications. Whereas data improvement has traditionally been done ad hoc, DCAI treats it as a systematic engineering discipline and shifts the focus from modeling to the underlying data. Common model architectures dominate many tasks, yet building and using datasets remains labor-intensive and expensive. The DCAI movement aims to address this lack of infrastructure and best practices by developing efficient, open data engineering tools.

This workshop fosters an interdisciplinary DCAI community to tackle practical data problems, including data collection, labeling, preprocessing, augmentation, quality evaluation, data debt, and governance. It aims to help shape the DCAI movement, which will influence the future of AI and ML. Interested parties can contribute to shaping this future by submitting papers.