This talk looks into the general process of creating datasets for machine learning: the issues involved, tools for data labelling, evaluation processes, and some of the challenges I faced in my own work while creating a dataset for labelling South African neighbourhoods in satellite images.
Raesetje is a machine learning researcher and a Computer Science Master's student at Wits University, Johannesburg. Her research focuses on creating ground truth datasets and using machine learning to study spatial segregation in post-Apartheid South Africa. She is interested in building communities that aim to increase the capacity and quality of work of underrepresented groups in AI. Raesetje has been involved in building and organizing events for communities such as Women in Computational Science Research and the Deep Learning IndabaX Pretoria. Her main interests are using AI to solve problems experienced in developing countries, creating datasets for machine learning research, and the discussion, creation and amendment of data privacy, ethics and accountability policies.
As we now move to the next phase of our country’s experience of COVID-19 and its responses, we continue to look back at what the covid19za project has been busy with over the last two months. Our focus for this write-up is on understanding our health system capacity. The lack of updated, openly available health system data (data about hospitals, clinics and health resources) was identified early in the project as a challenge. Little did project members know how much work would have to go into getting the data and making it accessible. In this post, we discuss these challenges, present what has been achieved so far and propose the next steps.
Video, Slides and Notes
Publicly available datasets
- DeepGlobe Building Extraction: URL
- DeepGlobe Land Cover Classification: URL
- DeepGlobe Road Extraction: URL
- Dstl Satellite Imagery Feature Detection: URL
- UC Merced Land Use Dataset: URL
- The UAVid dataset: URL
- US National Land Cover Database: URL
- Chesapeake Conservancy land cover classification datasets: URL