Between Impossible and Probable. Architectural Recognition Through Qualitative Evaluation of Artificial Intelligence Response

Carlevaris, Laura; Delgado-Martos, Emilio; INTRA SIDOLA, Giovanni; Maitín, Ana María; Nogales, Alberto; Pesqueira-Calvo, Carlos; Bravo Peña, Marta; García Tejedor, Álvaro José

doi:10.1007/978-3-031-62963-1_51

Artificial Intelligence (AI) has emerged as a revolutionary discipline that redefines the way we interact with technology and addresses complex challenges in contemporary society. The ability of machines to learn from data, adapt to new situations and perform specific tasks without human intervention has led to significant advances across architecture, from ideation at the design stage to the interpretation of images of finished buildings. However, the efficiency and reliability of AI systems are intrinsically linked to the quality of their results. The evaluation of the results obtained by AI models has become a critical area of research, as it directly influences decision making, practical application and trust in these technologies. This paper aims to explore, in general terms, the evaluation methodologies used in AI for the image analysis of historic buildings. Performance evaluation in the field of architectural heritage faces several challenges, ranging from clearly defining evaluation metrics to interpreting the uncertainty of predictions. The absence of uniform standards and the variability of the datasets used to train and test models further complicate the task of accurately assessing the response efficiency of AI algorithms. In recent years, evaluation methodologies have evolved alongside AI itself. From traditional classification metrics to complex evaluations in natural language processing tasks, the scientific community has developed increasingly sophisticated approaches to measure the quality and performance of AI models. The evaluation of AI results is not only crucial from an academic point of view, but also plays a fundamental role in the practical application of these systems in real environments.
In the field of architectural historical heritage, decisions based on the results of AI models affect critical areas such as accurate dating and identification of artistic styles, forecasting of corrective measures, and monitoring of the useful life of buildings, which underlines the need for rigorous and reliable evaluation. On the other hand, large image-generating AI algorithms (i.e. text-to-image and image-to-image models), capable of combining concepts, attributes, or styles, are achieving results of surprising quality. The immense amount of data used to train these well-known models contrasts with the small number of images with which we have trained our algorithms, calling into question, on the one hand, the reliability of the performance of our proposal (i.e. overtraining) and, on the other hand, the quality and realism of the responses obtained. However, when analyzing the content of the images generated by the large algorithms (e.g. Midjourney, DALL-E 2, Stable Diffusion), we found that all of the predictions requested through an apparently clear and concise prompt incur a series of errors and contradictions that invalidate the response, turning it in most cases into an impossible result. Behind an apparently well-finished image lay important defects in the depiction of architectural elements, inconsistencies with the prompt, optical irregularities, and the integration of extemporaneous elements. Although the large algorithms are not yet oriented toward producing images with the response precision we seek in our projects, this situation made us aware that images with the precision required by our research cannot yet be obtained.
In the field of architecture, especially in the reconstruction of historical heritage, AI tools play a very relevant role because of their ability to address a very complex problem: predicting the original state of a building fragment whose initial characteristics are unknown. In this research we propose to locate the origin of the problem in the uncertainty with which virtual reconstructions have traditionally been performed. This uncertainty opens a gap between the prediction and the desired original state that is worth considering. In addition, the list of possible predictions requires not only a terminological organization, but also a structuring of the concepts used. This will allow a better understanding of the difference between generalist models and specialized models, as well as their potential.

Between Impossible and Probable. Architectural Recognition Through Qualitative Evaluation of Artificial Intelligence Response / Carlevaris, Laura; Delgado-Martos, Emilio; INTRA SIDOLA, Giovanni; Maitín, Ana María; Nogales, Alberto; Pesqueira-Calvo, Carlos; Bravo Peña, Marta; García Tejedor, Álvaro José. - (2024), pp. 839-850. [10.1007/978-3-031-62963-1_51].