The field of Artificial Intelligence (AI) has the potential to drive transformative advances across many sectors, fuelled by techniques such as Machine Learning (ML), Natural Language Processing (NLP) and Computer Vision (CV). These technologies use large datasets to ‘learn’ patterns and build predictive models from them, heralding an era of greater automation and improved decision-making. However, the performance of an AI model is not purely a matter of engineering: its outputs depend fundamentally on the data it is trained on, and therefore on the rules that govern that data.
This report traces the journey of data through AI systems, defining the key stages from collection and preprocessing to training and deployment. It lays the foundation for understanding why data must be managed carefully at each stage to ensure that AI/ML systems are developed ethically and effectively.