Yad Vashem develops AI language model to search for unknown Holocaust victims
Yad Vashem's innovation department has developed an artificial intelligence language model that can find new names and identifying details in a database of evidence. Thanks to this model, information about 400 previously unknown victims of the Holocaust has been added to the Hall of Names.
Currently, the Yad Vashem Hall of Names commemorates 4.9 million names of Holocaust victims. Adding each name takes extensive work and must be supported by evidence from the database, which currently holds approximately 10 million records from various sources in different formats and languages.
Names are only added to the Hall after proper identification. This involves establishing mandatory identification characteristics such as first name, last name, father’s or mother’s name, profession, or year of birth—all of which must be confirmed by experts. Calcalist notes that identifying children is particularly challenging, as they are often referred to not by name but simply as “boy” or “girl.”
Working with the evidence base is difficult: it contains data in multiple languages, along with audio and video recordings and handwritten sources.
Yad Vashem experts recognize that testimonies often mention multiple victims beyond the person giving the testimony, so tracing these cross-references has long been a priority. The task is hard given the state of the sources and the size of the database, which is continuously updated, but it is essential for accurate identification.
Yad Vashem scientists have trained a language model to recognize such cross-references. The model was first trained on data labeled by experts and then set to search independently. Its findings led to the addition of 400 new names to the Hall, each confirmed by experts, and approximately the same number of names are awaiting confirmation. According to experts, every 20,000 testimonies in the database contain information about at least seven new names. Another important outcome of the model's work is the standardization of the evidence base itself and the creation of concise descriptions for each entry.