Let’s Make a Cinephile Search Tool!
The second Data Challenge of the Data Competence Center HERMES
We’re thrilled to announce the second data challenge hosted by the Data Competence Center HERMES — and this time, we’re on a mission to transform access to cultural data. This exciting challenge focuses on enhancing the discoverability of audio-visual heritage using data provided by three major European institutions:
- Deutsches Filminstitut und Filmmuseum (The German Film Institute and Film Museum)
- Das Bundesarchiv (Federal Archives of Germany)
- Nederlands Instituut voor Beeld en Geluid (The Netherlands Institute for Sound & Vision)
All three institutions have published a collection of public domain videos on Europeana, from which we’ve selected video files and their accompanying metadata for this challenge.
The Challenge
Film and media scholars increasingly work with vast and complex digital moving image datasets. Beyond exploring associated metadata — such as members of film crew, production dates, and locations — they conduct visual and auditory analyses. The exponential growth of public and private audiovisual archives makes it nearly impossible to manually view all available videos in search of specific visual or sonic patterns.
Yet tools for conducting multimodal searches across these datasets are still rare. No commonly available app lets users search a local or institutional video collection for clips showing, for example, a woman operating machinery in a historical film from a particular national context. Nor can users upload an image and retrieve all scenes with a similar composition.
That’s where you come in. The goal of this challenge is to build exactly that kind of complex, multimodal search tool.
The Dataset
The dataset comprises 40 hours of .mp4 video: 20 hours from the German Archives and 20 hours from the Netherlands Institute for Sound and Vision, a split chosen to keep the dataset diverse. Each video file is accompanied by a .json metadata file with the same name. At this link, you can find the code we used to retrieve data from the Europeana API. All materials are publicly available via Europeana. The full dataset size is 30.2 GB. You can download the first dataset from this link and the second one from this link.
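Because each video shares its name with its metadata file, a first step for most teams will be pairing the two. The following is a minimal sketch of one way to do that, not part of the official challenge code. The function name `pair_videos_with_metadata` and the folder layout are assumptions made for illustration.

```python
import json
from pathlib import Path


def pair_videos_with_metadata(root):
    """Map each file stem to its (video path, parsed metadata) pair."""
    pairs = {}
    for video in sorted(Path(root).rglob("*.mp4")):
        # The metadata file is assumed to sit next to the video.
        sidecar = video.with_suffix(".json")
        if not sidecar.exists():
            # Skip videos without a metadata file rather than guessing.
            continue
        with sidecar.open(encoding="utf-8") as fh:
            pairs[video.stem] = (video, json.load(fh))
    return pairs
```

Calling `pair_videos_with_metadata("data")` on a hypothetical download folder named `data` would return a dictionary keyed by file name without extension. If the two downloads contain files with identical names, keying by the full path instead would avoid collisions.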
Your Mission
We challenge you to create an advanced discovery system that enables multimodal search across the dataset — going beyond metadata to analyze the actual video content as well. Your discovery system should be capable of performing at least two of the following tasks, but the more, the better:
- Searching the metadata for information
- Detecting as many shot types as possible (e.g., long shot, medium shot, American shot, close-up, extreme close-up)
- Detecting shot transitions (cuts)
- Identifying objects, locations, people, actions, etc. in single shots
- Accepting an input image and retrieving all scenes with similar visual composition
- Also: Any additional search feature that you find appropriate or particularly relevant to the dataset
- And ideally: combining all of the above into a chatbot that can process complex queries, such as (but of course not limited to):
  - “Find a scene from 1940s Germany in which a woman is working with a machine in a medium shot.”
  - “Find a film where a medium shot of a soldier cuts to a close-up of his face.”
  - “Find all films made after 1945 that contain shots resembling the input image.”
Your search tool doesn’t need to include all the options listed above: you can start by implementing only two and add others if your team has the capacity. Ultimately, the more accurately your tool can explore and interpret the video content and its metadata, the more valuable it will be.
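As an illustration of the simplest task in the list, metadata search, here is a minimal sketch of a keyword search over parsed .json records. It makes no assumption about the metadata schema and simply walks every string value. The names `iter_strings` and `search_metadata` are hypothetical, and `records` is assumed to be a dictionary mapping file names to parsed metadata.

```python
def iter_strings(node):
    """Yield every string value found anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_strings(item)


def search_metadata(records, query):
    """Return names of records whose metadata contains every query term."""
    terms = query.casefold().split()
    hits = []
    for name, metadata in records.items():
        # Join all string values so terms can match across fields.
        text = " ".join(iter_strings(metadata)).casefold()
        if all(term in text for term in terms):
            hits.append(name)
    return hits
```

This sketch matches substrings case-insensitively and ignores numeric values, so a year stored as a number would not be found. A real system would more likely index specific fields once their structure is known.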
Don’t be intimidated by the challenge’s scale! There are plenty of free, pre-trained LLMs and CNNs available that you can integrate into your project and leverage for transfer learning. Plus, we’ll be here to support you throughout the journey, especially at the kickoff!
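Not every task requires a neural network, either. Detecting cuts, for instance, has a classical baseline: compare the intensity histograms of consecutive frames and flag large jumps. The sketch below shows that idea under stated assumptions. Frames are assumed to be already decoded into non-empty flat sequences of grayscale values from 0 to 255 (decoding the .mp4 files would need a separate library and is left out), and the function names, bin count, and threshold are illustrative choices rather than recommended settings.

```python
def histogram(frame, bins=16):
    """Normalised intensity histogram of a frame given as 0-255 ints."""
    counts = [0] * bins
    for value in frame:
        counts[value * bins // 256] += 1
    total = len(frame)
    return [c / total for c in counts]


def detect_cuts(frames, threshold=0.5):
    """Return indices i where a cut is likely between frame i-1 and i."""
    cuts = []
    previous = None
    for index, frame in enumerate(frames):
        current = histogram(frame)
        if previous is not None:
            # Half the L1 distance lies in [0, 1]; 1 means no overlap.
            distance = sum(abs(a - b) for a, b in zip(previous, current)) / 2
            if distance > threshold:
                cuts.append(index)
        previous = current
    return cuts
```

A baseline like this only catches hard cuts. Gradual transitions such as fades and dissolves change the histogram slowly and would need a different approach, for example comparing frames further apart or using a pre-trained model.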
Who Can Register?
This challenge is open to students, PhD candidates, postdoctoral researchers, professors, and adjunct academic staff from around the world with backgrounds in film and media studies, digital humanities, data science, and related fields. Each team should ideally include at least one expert in film/media studies and one in data science/programming. If you’re interested but don’t yet know someone from the other field, join us at the kickoff event — we’ll help you connect and form interdisciplinary teams.
The Challenge Timeline
- Kickoff (23.05.2025) - This online event is a chance to get to know each other, ask questions, and explore initial ideas and approaches. If you’re looking for teammates or collaborators, you’ll also have the opportunity to form teams.
- Register Your Team (31.05.2025) - Use the registration button provided below to sign up.
- Develop Your Solution - You’ll have three full months to tackle the challenge. During this time, we’ll host two check-in meetings for teams to share progress, ask questions, and receive technical support.
- Submit Your Work - Upload your packaged code (developed in the programming language of your choice) to a GitHub repository and send the link, together with your team name and the names of its members, to hermes.challenges@uni-marburg.de. Developing an interface for your app is optional. Your README.md should include:
  - Clear documentation of your methodology
  - Instructions for running the code
  - A list of dependencies, provided as a requirements.txt or poetry.lock file
- Evaluation - A panel of three experts in digital media studies will evaluate the submissions based on performance, innovation, and impact.
Final Presentation & Recognition
In March 2026, the winning team will present their results at a two-day expert workshop at Philipps University of Marburg, Germany. Depending on your location and team size, we will cover all or part of your travel expenses.
Key dates:
- Kickoff: 23 May 2025, 4:00 PM (CET) — Zoom link
- Registration deadline: 31 May 2025, 11:59 PM
- Submission of solutions: 31 August 2025, 11:59 PM
- Results announced: 1 December 2025
- Final presentation and workshop: March 2026 (the names of the invited experts and the exact time and place of the workshop will follow)
Are you up for the challenge? Let’s push the boundaries of film discovery together!