Building “Dataset Nutrition Labels” for common race and ethnicity datasets to mitigate bias in algorithmic systems

Kasia Chmielinski
2023 DCSL/CCSRE Technology & Racial Equity Practitioner Fellow
AI systems are only as effective, accurate, and inclusive as the data they are trained on: if the training data is incomplete, biased, or unrepresentative, the resulting system will be as well. In 2018, I co-founded the Data Nutrition Project (DNP) to investigate methods of increasing the quality and equity of AI systems by building “Dataset Nutrition Labels” (analogous to FDA nutrition labels, but for datasets) that help data practitioners identify the “ingredients” of a dataset, especially anomalous ones, before underlying bias is propagated further into a model. My project proposal has two parts: 1) building Dataset Nutrition Labels for oft-cited race and ethnicity datasets, which are used for everything from policy recommendations to voter outreach; and 2) leveraging these Labels as data literacy tools to drive conversations with data practitioners, policymakers, and community members about the real and lasting impact of problematic data.
Bio: Kasia Chmielinski is the Co-Founder of the Data Nutrition Project and a technologist focused on building responsible data systems across industry, academia, government, and non-profit domains. Previously, they held positions at the United Nations (OCHA), US Digital Service (EOP / OMB), MIT Media Lab, McKinsey & Company, and Google. When not thinking about data, Kasia is usually cycling or birdwatching around the Northeastern US.