Create an AI-Driven Movie Quiz with Gemini LLM, Python, FastAPI, Pydantic, RAG and more

TDS · April 18

Discover the basics of using Gemini with Python via VertexAI, creating APIs with FastAPI, data validation with Pydantic and the fundamentals of Retrieval-Augmented Generation (RAG)

In this article, I share the basics of building an LLM-driven web application with technologies such as Python, FastAPI, Pydantic, VertexAI and more.

You will learn how to create such a project from the very beginning and get an overview of the underlying concepts, including Retrieval-Augmented Generation (RAG).

Disclaimer: I am using data from The Movie Database within this project. The API is free to use for non-commercial purposes and complies with the Digital Millennium Copyright Act (DMCA). For further information about TMDB data usage, please read the official FAQ.

– Inspiration
– System Architecture
– Understanding Retrieval-Augmented Generation (RAG)
– Python projects with Poetry
– Create the API with FastAPI
– Data validation and quality with Pydantic
– TMDB client with httpx
– Gemini LLM client with VertexAI
– Modular prompt generator with Jinja
– Frontend
– API examples
– Conclusion

The best way to share this knowledge is through a practical example. Hence, I’ll use my project Gemini Movie Detectives to cover the various aspects. The project was created as part of the Google AI Hackathon 2024, which is still running while I am writing this.

Gemini Movie Detectives (by author)

Gemini Movie Detectives is a project aimed at leveraging the power of the Gemini Pro model via VertexAI to create an engaging quiz game using the latest movie data from The Movie Database (TMDB).

Part of the project was also to make it deployable with Docker and to create a live version. Try it yourself: movie-detectives.com. Keep in mind that this is a simple prototype, so there might be unexpected issues. Also, I had to add some limitations in order to control costs that might be generated by using GCP and VertexAI.

Gemini Movie Detectives (by author)

The project is fully open-source and is split into two separate repositories:

Github repository for backend: https://github.com/vojay-dev/gemini-movie-detectives-api
Github repository for frontend: https://github.com/vojay-dev/gemini-movie-detectives-ui

The focus of the article is the backend project and underlying concepts. It will therefore only briefly explain the frontend and its components.

In the following video, I also give an overview of the project and its components:

Inspiration

Growing up as a passionate gamer and now working as a Data Engineer, I’ve always been drawn to the intersection of gaming and data, and this project let me combine these two passions. Back in the ’90s, I enjoyed the video game series You Don’t Know Jack, a delightful blend of trivia and comedy that not only entertained but also taught me a thing or two. More generally, using games for educational purposes is another concept that fascinates me.

In 2023, I organized a workshop to teach kids and young adults game development. They learned about mathematical concepts behind collision detection, yet they had fun as everything was framed in the context of gaming. It was eye-opening that gaming is not only a huge market but also holds a great potential for knowledge sharing.

With this project, called Movie Detectives, I aim to showcase the magic of Gemini, and AI in general, in crafting engaging trivia and educational games, but also how game design can profit from these technologies in general.

By feeding the Gemini LLM with accurate and up-to-date movie metadata, I could improve the accuracy of the questions Gemini generates. This matters because, without this Retrieval-Augmented Generation (RAG) approach of enriching queries with real-time metadata, there is a risk of propagating misinformation, a typical pitfall when using AI for this purpose.

Another game-changer is the modular prompt generation framework I built with Jinja templates. It’s like having a Swiss Army knife for game design: show master personalities can be swapped effortlessly to tailor the game experience, and the language module makes translating the quiz into multiple languages a breeze.

From a business standpoint, this modularization makes it possible to reach a much broader audience across language barriers without expensive translation processes. Personally, I’ve experienced the effect of these modules firsthand. Switching from the default quiz master to the dad-joke quiz master was a riot, a nostalgic nod to the heyday of You Don’t Know Jack and a testament to the versatility of this project.

Movie Detectives — Example: Santa Claus personality (by author)

System Architecture

Before we jump into details, let’s get an overview of how the application was built.

Tech Stack: Backend

Python 3.12 + FastAPI for API development
httpx for TMDB integration
Jinja templating for modular prompt generation
Pydantic for data modeling and validation
Poetry for dependency management
Docker for deployment
TMDB API for movie data
VertexAI and Gemini for generating quiz questions and evaluating answers
Ruff as linter and code formatter together with pre-commit hooks
Github Actions to automatically run tests and linter on every push

Tech Stack: Frontend

VueJS 3.4 as the frontend framework
Vite for frontend tooling

Essentially, the application fetches up-to-date movie metadata from an external API (TMDB), constructs a prompt from different modules (personality, language, …), enriches this prompt with the metadata, and then uses Gemini to start a movie quiz in which the user has to guess the correct title.

The backend infrastructure is built with FastAPI and Python, employing the Retrieval-Augmented Generation (RAG) methodology to enrich queries with real-time metadata. Utilizing Jinja templating, the backend modularizes prompt generation into base, personality, and data enhancement templates, enabling the generation of accurate and engaging quiz questions.

The frontend is powered by Vue 3 and Vite, supported by daisyUI and Tailwind CSS for efficient frontend development. Together, these tools provide users with a sleek and modern interface for seamless interaction with the backend.

In Movie Detectives, quiz answers are interpreted by the LLM as well, allowing for dynamic scoring and personalized responses. This showcases the potential of combining LLMs with RAG in game design and development, paving the way for truly individualized gaming experiences, and for engaging trivia or educational games in particular. Adding or changing personalities and languages is as easy as adding more Jinja template modules, so the full game experience can be changed with very little development effort.

System Overview (by author)

As can be seen in the overview, Retrieval-Augmented Generation (RAG) is one of the essential ideas of the backend. Let’s have a closer look at this particular paradigm.

Understanding Retrieval-Augmented Generation (RAG)

In the realm of Large Language Models (LLMs) and AI, one paradigm that is becoming more and more popular is Retrieval-Augmented Generation (RAG). But what does RAG entail, and how does it influence the landscape of AI development?

At its essence, RAG enhances LLM systems by incorporating external data to enrich their predictions. This means you pass relevant context to the LLM as an additional part of the prompt. But how do you find relevant context? Usually, this data is retrieved automatically from a database with vector search or from a dedicated vector database. Vector databases are especially useful because they store data in a way that allows similar items to be found quickly. The LLM then generates the output based on both the query and the retrieved documents.

Picture this: you have an LLM capable of generating text based on a given prompt. RAG takes this a step further by infusing additional context from external sources, like up-to-date movie data, to enhance the relevance and accuracy of the generated text.

Let’s break down the key components of RAG:

LLMs: LLMs serve as the backbone of RAG workflows. These models, trained on vast amounts of text data, possess the ability to understand and generate human-like text.
Vector indexes for contextual enrichment: A crucial aspect of RAG is the use of vector indexes, which store embeddings of text data, that is, numeric vectors that can be compared for similarity. These indexes allow for efficient retrieval of relevant information during the generation process. In the context of the project, this could be a database of movie metadata.
Retrieval process: RAG involves retrieving pertinent documents or information based on the given context or prompt. This retrieved data acts as the additional input for the LLM, supplementing its understanding and enhancing the quality of generated responses. This could be getting all relevant information known and connected to a specific movie.
Generative Output: With the combined knowledge from both the LLM and the retrieved context, the system generates text that is not only coherent but also contextually relevant, thanks to the augmented data.
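
To make the retrieval component more tangible, here is a minimal sketch of similarity-based retrieval. It is not part of Gemini Movie Detectives: the bag-of-words "embedding", the three sample documents and all function names are assumptions chosen for illustration, and a real system would use learned embeddings and a vector database instead.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words count vector (real systems use learned embeddings)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# A tiny in-memory "vector index": each document is stored next to its vector.
documents = [
    "a thief enters dreams to steal secrets",
    "a giant lizard fights a giant ape",
    "a family tries to survive a comet impact",
]
index = [(doc, embed(doc)) for doc in documents]


def retrieve(query: str, top_k: int = 1) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_vector = embed(query)
    ranked = sorted(index, key=lambda item: cosine_similarity(query_vector, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


# Retrieval, then augmentation: the best match becomes part of the prompt.
context = retrieve("movie about stealing secrets from dreams")
prompt = f"Context: {context[0]}\nTask: write a quiz question about this movie."
```

The query shares the words "dreams" and "secrets" with the first document only, so that document is retrieved and placed into the prompt.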

RAG architecture (by author)

In the Gemini Movie Detectives project, the prompt is enhanced with external API data from The Movie Database. A typical RAG setup, by contrast, works with much more complex documents and a much larger amount of data, and uses vector indexes to streamline retrieval. These indexes act like signposts, guiding the system to relevant external sources quickly.

This project is therefore a mini version of RAG, but it shows the basic idea and demonstrates the power of external data to augment LLM capabilities.
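
The "mini RAG" idea can be sketched in a few lines: retrieve facts from an external source, put them into the prompt, and only then call the model. The movie record, the dictionary standing in for the API and the function names below are made up for this sketch. The real project fetches the data from TMDB and renders the prompt with Jinja templates.

```python
# Made-up stand-in for the JSON an external movie API would return.
FAKE_MOVIE_API = {
    1: {
        "title": "The Example Heist",
        "tagline": "Every plan has a flaw.",
        "genres": ["Crime", "Thriller"],
        "release_date": "2020-01-01",
    },
}


def retrieve_movie(movie_id: int) -> dict:
    """Retrieval: fetch current facts instead of relying on the model's memory."""
    return FAKE_MOVIE_API[movie_id]


def augment_prompt(movie: dict) -> str:
    """Augmentation: place the retrieved facts into the prompt."""
    facts = "\n".join([
        f"Title: {movie['title']}",
        f"Tagline: {movie['tagline']}",
        f"Genres: {', '.join(movie['genres'])}",
        f"Release date: {movie['release_date']}",
    ])
    return (
        "Create a quiz question about the movie described below. "
        "Use only these facts and do not reveal the title.\n" + facts
    )


# Generation: `prompt` is what would be sent to the LLM.
prompt = augment_prompt(retrieve_movie(1))
```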

More generally, RAG is a very important concept, especially when crafting trivia quizzes or educational games with LLMs like Gemini. It reduces the risk of false positives (accepting wrong answers), of asking wrong questions, and of misinterpreting users’ answers.

Here are some open-source projects that might be helpful when approaching RAG in one of your projects:

txtai: All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows.
LangChain: LangChain is a framework for developing applications powered by large language models (LLMs).
Qdrant: Vector Search Engine for the next generation of AI applications.
Weaviate: Weaviate is a cloud-native, open source vector database that is robust, fast, and scalable.

Of course, given the potential value of this approach for LLM-based applications, there are many more open- and closed-source alternatives, but these should be enough to get your research on the topic started.

Python projects with Poetry

Now that the main concepts are clear, let’s have a closer look at how the project was created and how dependencies are managed in general.

The three main tasks Poetry can help you with are: Build, Publish and Track. The idea is to have a deterministic way to manage dependencies, to share your project and to track dependency states.

Photo by Kat von Wood on Unsplash

Poetry also handles the creation of virtual environments for you. By default, these live in a centralized folder on your system. However, if you prefer to keep a project’s virtual environment in the project folder, like I do, it is a simple config change:

poetry config virtualenvs.in-project true

With poetry new you can then create a new Python project. Poetry will create a virtual environment linked to your system’s default Python. If you combine this with pyenv, you get a flexible way to create projects using specific versions. Alternatively, you can tell Poetry directly which Python version to use: poetry env use /full/path/to/python.

Once you have a new project, you can use poetry add to add dependencies to it.

With this, I created the project for Gemini Movie Detectives:

poetry config virtualenvs.in-project true
poetry new gemini-movie-detectives-api

cd gemini-movie-detectives-api

poetry add 'uvicorn[standard]'
poetry add fastapi
poetry add pydantic-settings
poetry add httpx
poetry add 'google-cloud-aiplatform>=1.38'
poetry add jinja2

The metadata about your project, including the dependencies with their respective versions, is stored in the pyproject.toml and poetry.lock files. I added more dependencies later, which resulted in the following pyproject.toml for the project:

[tool.poetry]
name = "gemini-movie-detectives-api"
version = "0.1.0"
description = "Use Gemini Pro LLM via VertexAI to create an engaging quiz game incorporating TMDB API data"
authors = ["Volker Janz <volker@janz.sh>"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.12"
fastapi = "^0.110.1"
uvicorn = {extras = ["standard"], version = "^0.29.0"}
python-dotenv = "^1.0.1"
httpx = "^0.27.0"
pydantic-settings = "^2.2.1"
google-cloud-aiplatform = ">=1.38"
jinja2 = "^3.1.3"
ruff = "^0.3.5"
pre-commit = "^3.7.0"


[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"

Create the API with FastAPI

FastAPI is a Python framework that allows for rapid API development. Built on open standards, it offers a seamless experience without new syntax to learn. With automatic documentation generation, robust validation, and integrated security, FastAPI streamlines development while ensuring great performance.

Photo by Florian Steciuk on Unsplash

Implementing the API for the Gemini Movie Detectives project, I simply started from a Hello World application and extended it from there. Here is how to get started:

from fastapi import FastAPI

app = FastAPI()


@app.get("/")
def read_root():
    return {"Hello": "World"}

Assuming you also keep the virtual environment within the project folder as .venv/ and use uvicorn, this is how to start the API with the reload feature enabled, in order to test code changes without the need of a restart:

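# Terminal 1: activate the virtual environment and start the API with auto-reload
source .venv/bin/activate
uvicorn gemini_movie_detectives_api.main:app --reload
# Terminal 2: query the running API (uvicorn listens on port 8000 by default)
curl -s localhost:8000 | jq .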

If you have not yet installed jq, I highly recommend doing it now. I might cover this wonderful JSON Swiss Army knife in a future article. This is what the response looks like:

Hello FastAPI (by author)

From here, you can develop your API endpoints as needed. For example, this is what the API endpoint implementation to start a movie quiz in Gemini Movie Detectives looks like:

@app.post('/quiz')
@rate_limit
@retry(max_retries=settings.quiz_max_retries)
def start_quiz(quiz_config: QuizConfig = QuizConfig()):
    movie = tmdb_client.get_random_movie(
        page_min=_get_page_min(quiz_config.popularity),
        page_max=_get_page_max(quiz_config.popularity),
        vote_avg_min=quiz_config.vote_avg_min,
        vote_count_min=quiz_config.vote_count_min
    )

    if not movie:
        logger.info('could not find movie with quiz config: %s', quiz_config.model_dump())
        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail='No movie found with given criteria')

    try:
        genres = [genre['name'] for genre in movie['genres']]

        prompt = prompt_generator.generate_question_prompt(
            movie_title=movie['title'],
            language=get_language_by_name(quiz_config.language),
            personality=get_personality_by_name(quiz_config.personality),
            tagline=movie['tagline'],
            overview=movie['overview'],
            genres=', '.join(genres),
            budget=movie['budget'],
            revenue=movie['revenue'],
            average_rating=movie['vote_average'],
            rating_count=movie['vote_count'],
            release_date=movie['release_date'],
            runtime=movie['runtime']
        )

        chat = gemini_client.start_chat()

        logger.debug('starting quiz with generated prompt: %s', prompt)
        gemini_reply = gemini_client.get_chat_response(chat, prompt)
        gemini_question = gemini_client.parse_gemini_question(gemini_reply)

        quiz_id = str(uuid.uuid4())
        session_cache[quiz_id] = SessionData(
            quiz_id=quiz_id,
            chat=chat,
            question=gemini_question,
            movie=movie,
            started_at=datetime.now()
        )

        return StartQuizResponse(quiz_id=quiz_id, question=gemini_question, movie=movie)
    except GoogleAPIError as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f'Google API error: {e}')
    except Exception as e:
        raise HTTPException(status_code=status.HTTP_500_INTERNAL_SERVER_ERROR, detail=f'Internal server error: {e}')

Within this code, you can already see three of the main components of the backend:

tmdb_client: A client I implemented using httpx to fetch data from The Movie Database (TMDB).
prompt_generator: A class that helps to generate modular prompts based on Jinja templates.
gemini_client: A client to interact with the Gemini LLM via VertexAI in Google Cloud.

We will look at these components in detail later, but first some more helpful insights regarding the usage of FastAPI.

FastAPI makes it really easy to define the HTTP method and data to be transferred to the backend. For this particular function, I expect a POST request as this creates a new quiz. This can be done with the post decorator:

@app.post('/quiz')

Also, I am expecting some data within the request sent as JSON in the body. In this case, I am expecting an instance of QuizConfig as JSON. I simply defined QuizConfig as a subclass of BaseModel from Pydantic (will be covered later) and with that, I can pass it in the API function and FastAPI will do the rest:

class QuizConfig(BaseModel):
    vote_avg_min: float = Field(5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(1000.0, ge=0.0)
    popularity: int = Field(1, ge=1, le=3)
    personality: str = Personality.DEFAULT.name
    language: str = Language.DEFAULT.name
# ...
def start_quiz(quiz_config: QuizConfig = QuizConfig()):

Furthermore, you might notice two custom decorators:

@rate_limit
@retry(max_retries=settings.quiz_max_retries)

I implemented these to reduce duplicate code. They wrap the API function to retry it in case of errors and to enforce a global limit on how many movie quizzes can be started per day.
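
The retry decorator itself is not shown in this article. As a rough idea of how such a parameterized decorator can be built, here is a minimal sketch. The implementation and the flaky demo function are assumptions for illustration, and the version in the repository may differ, for example in which exceptions it retries.

```python
import functools
import logging

logger = logging.getLogger(__name__)


def retry(max_retries: int):
    """Decorator factory: retry the wrapped function up to `max_retries` times on error."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # one initial attempt plus up to `max_retries` retries
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except Exception as error:
                    if attempt == max_retries:
                        raise  # retries exhausted: pass the last error on
                    logger.warning('attempt %d failed: %s', attempt + 1, error)
        return wrapper
    return decorator


# Demo: a hypothetical function that fails twice before it succeeds.
calls = {'count': 0}


@retry(max_retries=2)
def flaky():
    calls['count'] += 1
    if calls['count'] < 3:
        raise ValueError('not yet')
    return 'ok'


result = flaky()
```

Because retry takes an argument, it is a function that returns the actual decorator, which is why it is applied as @retry(max_retries=...) with parentheses while @rate_limit is applied without them.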

What I also personally like is the error handling with FastAPI. You can simply raise an HTTPException with the desired status code, and the user will receive a proper response, for example, if no movie could be found with a given configuration:

raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail='No movie found with given criteria')

With this, you should have an overview of creating an API like the one for Gemini Movie Detectives with FastAPI. Keep in mind: all code is open-source, so feel free to have a look at the API repository on Github.

Data validation and quality with Pydantic

One of the main challenges with today’s AI/ML projects is data quality. This does not only apply to ETL/ELT pipelines, which prepare datasets for model training or prediction, but also to the AI/ML application itself. Python, for example, usually enables Data Engineers and Scientists to get a reasonable result with little code, but because it is (mostly) dynamically typed, it lacks data validation when used in a naive way.

That is why in this project, I combined FastAPI with Pydantic, a powerful data validation library for Python. The goal was to make the API lightweight but strict and strong, when it comes to data quality and validation. Instead of plain dictionaries for example, the Movie Detectives API strictly uses custom classes inherited from the BaseModel provided by Pydantic. This is the configuration for a quiz for example:

class QuizConfig(BaseModel):
    vote_avg_min: float = Field(5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(1000.0, ge=0.0)
    popularity: int = Field(1, ge=1, le=3)
    personality: str = Personality.DEFAULT.name
    language: str = Language.DEFAULT.name

This example illustrates how not only the correct type is ensured but also how further validation is applied to the actual values.
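
To see this validation in action, here is a small sketch using a reduced version of QuizConfig. The personality and language fields are left out to keep it self-contained, and the variable names outside the model are chosen for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class QuizConfig(BaseModel):
    vote_avg_min: float = Field(5.0, ge=0.0, le=9.0)
    vote_count_min: float = Field(1000.0, ge=0.0)
    popularity: int = Field(1, ge=1, le=3)


# Type handling: the numeric string is converted to a float.
valid = QuizConfig(vote_avg_min='7.5')

# Value validation: popularity must be between 1 and 3.
try:
    QuizConfig(popularity=5)
    error_fields = []
except ValidationError as error:
    # each error entry names the offending field in its 'loc' tuple
    error_fields = [entry['loc'][0] for entry in error.errors()]
```

When such a model is used as a FastAPI request body, this validation runs automatically and an invalid request is rejected before the endpoint function is called.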

Furthermore, up-to-date Python features like StrEnum (available since Python 3.11) are used to distinguish certain types, like personalities:

class Personality(StrEnum):
    DEFAULT = 'default.jinja'
    CHRISTMAS = 'christmas.jinja'
    SCIENTIST = 'scientist.jinja'
    DAD = 'dad.jinja'

Also, duplicate code is avoided by defining custom decorators. For example, the following decorator limits the number of quiz sessions per day to keep GCP costs under control:

from collections.abc import Callable
from datetime import datetime
from functools import wraps

from fastapi import HTTPException, status

# `settings` is the project's configuration object (not shown here)
call_count = 0
last_reset_time = datetime.now()


def rate_limit(func: Callable) -> Callable:
    @wraps(func)
    def wrapper(*args, **kwargs):
        global call_count
        global last_reset_time

        # reset call count if the day has changed
        if datetime.now().date() > last_reset_time.date():
            call_count = 0
            last_reset_time = datetime.now()

        # reject the request once the daily budget is used up
        if call_count >= settings.quiz_rate_limit:
            raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail='Daily limit reached')

        call_count += 1
        return func(*args, **kwargs)

    return wrapper

It is then simply applied to the related API function:

@app.post('/quiz')
@rate_limit
@retry(max_retries=settings.quiz_max_retries)
def start_quiz(quiz_config: QuizConfig = QuizConfig()):

The combination of up-to-date Python features and libraries such as FastAPI, Pydantic and Ruff makes the backend less verbose but still very stable, and it ensures a certain level of data quality so that the LLM output has the expected quality.

TMDB client with httpx

The TMDB Client class is using httpx to perform requests against the TMDB API.

httpx is a rising star in the world of Python libraries. While requests has long been the go-to choice for making HTTP requests, httpx offers a valid alternative. One of its key strengths is asynchronous functionality. httpx allows you to write code that can handle multiple requests concurrently, potentially leading to significant performance improvements in applications that deal with a high volume of HTTP interactions. Additionally, httpx aims for broad compatibility with requests, making it easier for developers to pick it up.

In case of Gemini Movie Detectives, there are two main requests:

get_movies: Get a list of random movies based on specific settings, like the minimum average rating or number of votes
get_movie_details: Get details for a specific movie to be used in a quiz

To reduce the number of external requests, the latter uses the lru_cache decorator, where LRU stands for “Least Recently Used”. It caches the results of function calls so that if the same inputs occur again, the function does not have to recompute the result. Instead, it returns the cached result, which can significantly improve performance, especially for functions with expensive computations or, as here, network requests. In our case, we cache the details for up to 1024 movies, so if two players get the same movie, we do not need to make the request again:

@lru_cache(maxsize=1024)
def get_movie_details(self, movie_id: int):
    response = httpx.get(f'https://api.themoviedb.org/3/movie/{movie_id}', headers={
        'Authorization': f'Bearer {self.tmdb_api_key}'
    }, params={
        'language': 'en-US'
    })
    # fail early on HTTP errors (4xx/5xx) instead of parsing an error payload
    response.raise_for_status()

    movie = response.json()
    # add the full poster URL derived from the poster path
    movie['poster_url'] = self.get_poster_url(movie['poster_path'])

    return movie
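
The effect of the cache can be observed without any network access. In the following sketch, a standalone function with a made-up return value stands in for the real HTTP request, and request_log is a helper introduced only to count how often the function body actually runs.

```python
from functools import lru_cache

request_log = []


@lru_cache(maxsize=1024)
def get_movie_details(movie_id: int) -> dict:
    # Stand-in for the real HTTP request; this body only runs on a cache miss.
    request_log.append(movie_id)
    return {'id': movie_id, 'title': f'Movie {movie_id}'}


get_movie_details(42)  # miss: the function body runs
get_movie_details(42)  # hit: served from the cache
get_movie_details(7)   # miss: different argument

info = get_movie_details.cache_info()
```

Two details are worth keeping in mind: a cache hit returns the very same dictionary object, so callers should not mutate it, and when lru_cache decorates a method, self becomes part of the cache key.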

Accessing data from The Movie Database (TMDB) is free for non-commercial usage; you can simply generate an API key and start making requests.

Gemini LLM client with VertexAI

Before Gemini via VertexAI can be used, you need a Google Cloud project with VertexAI enabled and a Service Account with sufficient access together with its JSON key file.

Create GCP project (by author)

After creating a new project, navigate to APIs & Services –> Enable APIs and services –> search for VertexAI API –> Enable.

Enable VertexAI (by author)

To create a Service Account, navigate to IAM & Admin –> Service Accounts –> Create service account. Choose a proper name and go to the next step.

Create Service Account (by author)

Now make sure to assign the account the pre-defined role Vertex AI User.

Assign correct role (by author)

Finally, you can generate and download the JSON key file by clicking on the new service account –> Keys –> Add Key –> Create new key –> JSON. With this file, you are good to go.

Create JSON key file (by author)

Using Gemini from Google with Python via VertexAI starts by adding the necessary dependency to the project:

poetry add 'google-cloud-aiplatform>=1.38'

With that, you can import and initialize vertexai with your JSON key file. You can also load a model, such as Gemini 1.0 Pro in the snippet below or the newly released Gemini 1.5 Pro, and start a chat session like this:

import vertexai
from google.oauth2.service_account import Credentials
from vertexai.generative_models import GenerativeModel

project_id = "my-project-id"
location = "us-central1"

credentials = Credentials.from_service_account_file("credentials.json")
model_name = "gemini-1.0-pro"

vertexai.init(project=project_id, location=location, credentials=credentials)
model = GenerativeModel(model_name)

chat_session = model.start_chat()

You can now use chat_session.send_message() to send a prompt to the model. With stream=True, the response arrives in chunks, so I recommend a little helper function that returns the full response as one string:

def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return ''.join(text_response)

A full example can then look like this:

import vertexai
from google.oauth2.service_account import Credentials
from vertexai.generative_models import GenerativeModel, ChatSession

project_id = "my-project-id"
location = "us-central1"

credentials = Credentials.from_service_account_file("credentials.json")
model_name = "gemini-1.0-pro"

vertexai.init(project=project_id, location=location, credentials=credentials)
model = GenerativeModel(model_name)

chat_session = model.start_chat()


def get_chat_response(chat: ChatSession, prompt: str) -> str:
    text_response = []
    responses = chat.send_message(prompt, stream=True)
    for chunk in responses:
        text_response.append(chunk.text)
    return ''.join(text_response)


response = get_chat_response(
    chat_session,
    "How to say 'you are awesome' in Spanish?"
)
print(response)

Running this, Gemini gave me the following response:

You are awesome (by author)

I agree with Gemini:

Eres increíble

Another hint when using this: you can also configure the model generation by passing a configuration to the generation_config parameter as part of the send_message function. For example:

generation_config = {
    'temperature': 0.5
}

responses = chat.send_message(
    prompt,
    generation_config=generation_config,
    stream=True
)

I am using this in Gemini Movie Detectives to set the temperature to 0.5, which gave me the best results. In this context, temperature controls how random, and in effect how creative, the responses generated by Gemini are. For the model used here, the value must be between 0.0 and 1.0, where values closer to 1.0 mean more creativity.
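
To build an intuition for this parameter, the following sketch shows the common textbook formulation of temperature: scores for candidate tokens are divided by the temperature before they are turned into probabilities. This is a general illustration with made-up scores, not a description of how Gemini implements sampling internally.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities after scaling them by 1 / temperature."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

sharp = softmax_with_temperature(logits, temperature=0.5)
flat = softmax_with_temperature(logits, temperature=1.0)
```

With the lower temperature, more probability mass ends up on the highest-scoring candidate, so sampling becomes more predictable. A higher temperature flattens the distribution and makes less likely candidates appear more often. Note that this sketch does not handle a temperature of exactly 0, which would divide by zero.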

One of the main challenges, apart from sending a prompt and receiving the reply from Gemini, is parsing the reply to extract the relevant information.

One learning from the project is:

Specify a format for Gemini, which does not rely on exact words but uses key symbols to separate information elements

For example, the question prompt for Gemini contains this instruction:

Your reply must only consist of three lines! You must only reply strictly using the following template for the three lines:
Question: <Your question>
Hint 1: <The first hint to help the participants>
Hint 2: <The second hint to get the title more easily>

The naive approach would be to parse the answer by looking for a line that starts with Question:. However, if we use another language, like German, the line would start with Frage: instead.

Instead, focus on the structure and key symbols. Read the reply like this:

It has 3 lines
The first line is the question
Second line the first hint
Third line the second hint
Key and value are separated by :

With this approach, the reply can be parsed in a language-agnostic way. This is my implementation in the actual client:

@staticmethod
def parse_gemini_question(gemini_reply: str) -> GeminiQuestion:
    result = re.findall(r'[^:]+: ([^\n]+)', gemini_reply, re.MULTILINE)
    if len(result) != 3:
        msg = f'Gemini replied with an unexpected format. Gemini reply: {gemini_reply}'
        logger.warning(msg)
        raise ValueError(msg)

    question = result[0]
    hint1 = result[1]
    hint2 = result[2]

    return GeminiQuestion(question=question, hint1=hint1, hint2=hint2)
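
The following standalone sketch applies the same regular expression to an English and a German reply to show that the labels in front of the colon do not matter. The dataclass is a simplified stand-in for the project's GeminiQuestion model, and the sample replies are made up.

```python
import re
from dataclasses import dataclass


@dataclass
class GeminiQuestion:  # simplified stand-in for the project's model
    question: str
    hint1: str
    hint2: str


def parse_gemini_question(gemini_reply: str) -> GeminiQuestion:
    # capture everything after "<label>: " up to the end of each line
    result = re.findall(r'[^:]+: ([^\n]+)', gemini_reply)
    if len(result) != 3:
        raise ValueError(f'Unexpected format: {gemini_reply}')
    return GeminiQuestion(question=result[0], hint1=result[1], hint2=result[2])


english = 'Question: Which movie is set inside dreams?\nHint 1: A spinning top\nHint 2: I_c_p_i_n'
german = 'Frage: Welcher Film spielt in Träumen?\nHinweis 1: Ein Kreisel\nHinweis 2: I_c_p_i_n'

parsed_en = parse_gemini_question(english)
parsed_de = parse_gemini_question(german)
```

The parser still depends on the model sticking to the three-line structure. A reply with fewer or more labeled lines raises a ValueError, which is the kind of failure that a retry of the whole request can recover from.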

In the future, parsing responses will become even easier. During the Google Cloud Next ’24 conference, Google announced that Gemini 1.5 Pro is now publicly available, along with new features including a JSON mode that returns responses in JSON format. Check out this article for more details.

Apart from that, I wrapped the Gemini client into a configurable class. You can find the full implementation open-source on Github.

Modular prompt generator with Jinja

The Prompt Generator is a class which combines and renders Jinja2 template files to create a modular prompt.

There are two base templates: one for generating the question and one for evaluating the answer. Apart from that, there is a metadata template to enrich the prompt with up-to-date movie data. Furthermore, there are language and personality templates, organized in separate folders with a template file for each option.

Prompt Generator (by author)

Using Jinja2 makes advanced features like template inheritance available, which is used for the metadata.

This makes it easy to extend this component, not only with more options for personalities and languages, but also to extract it into its own open-source project to make it available for other Gemini projects.
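
To illustrate the idea of a modular prompt, here is a minimal sketch with in-memory templates. The template names, their contents and the function signature are assumptions for illustration and do not reproduce the project's actual templates. It combines template inheritance for the metadata with includes for the personality and language modules.

```python
from jinja2 import DictLoader, Environment

# In-memory templates; names and contents are made up for this sketch.
templates = {
    # base template: pulls in the modules and leaves a block for metadata
    'question.jinja': (
        '{% include personality %}\n'
        '{% include language %}\n'
        'Create a quiz question about the following movie.\n'
        '{% block metadata %}{% endblock %}'
    ),
    # child template: inherits the base and fills in the metadata block
    'question_with_metadata.jinja': (
        '{% extends "question.jinja" %}'
        '{% block metadata %}Title: {{ title }}\nGenres: {{ genres }}{% endblock %}'
    ),
    'personality/default.jinja': 'Act like a friendly quiz master.',
    'personality/dad.jinja': 'Act like a quiz master who loves dad jokes.',
    'language/default.jinja': 'Reply in English.',
    'language/german.jinja': 'Reply in German.',
}

env = Environment(loader=DictLoader(templates))


def generate_question_prompt(personality: str, language: str, **metadata: str) -> str:
    """Render the prompt from the selected personality, language and movie metadata."""
    template = env.get_template('question_with_metadata.jinja')
    return template.render(
        personality=f'personality/{personality}.jinja',
        language=f'language/{language}.jinja',
        **metadata,
    )


prompt = generate_question_prompt('dad', 'german', title='Example Movie', genres='Comedy')
```

Adding a new personality or language in this setup only means adding one more template entry. The rendering code stays untouched.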

Frontend

The Gemini Movie Detectives frontend is split into four main components and uses vue-router to navigate between them.

The Home component simply displays the welcome message.

The Quiz component displays the quiz itself and talks to the API via fetch. To create a quiz, it sends a POST request to api/quiz with the desired settings. The backend then selects a random movie based on the user settings, creates the prompt with the modular prompt generator, uses Gemini to generate the question and hints, and finally returns everything to the component so that the quiz can be rendered.

Additionally, each quiz gets a session ID assigned in the backend and is stored in a limited LRU cache.

A sessions component exists for debugging purposes. It fetches data from the api/sessions endpoint, which returns all active sessions from the cache.

A statistics component displays statistics about the service. So far, there is only one category of data displayed, which is the quiz limit. To limit the costs for VertexAI and GCP usage in general, there is a daily limit of quiz sessions, which resets with the first quiz of the next day. Data is retrieved from the api/limit endpoint.

Vue components (by author)

API examples

Of course using the frontend is a nice way to interact with the application, but it is also possible to just use the API.

The following example shows how to start a quiz via the API using the Santa Claus / Christmas personality:

curl -s -X POST https://movie-detectives.com/api/quiz \
  -H 'Content-Type: application/json' \
  -d '{"vote_avg_min": 5.0, "vote_count_min": 1000.0, "popularity": 3, "personality": "christmas"}' | jq .

{
  "quiz_id": "e1d298c3-fcb0-4ebe-8836-a22a51f87dc6",
  "question": {
    "question": "Ho ho ho, this movie takes place in a world of dreams, just like the dreams children have on Christmas Eve after seeing Santa Claus! It's about a team who enters people's dreams to steal their secrets. Can you guess the movie? Merry Christmas!",
    "hint1": "The main character is like a skilled elf, sneaking into people's minds instead of houses. ",
    "hint2": "I_c_p_i_n "
  },
  "movie": {...}
}

Movie Detectives — Example: Santa Claus personality (by author)

This example shows how to change the language for a quiz:

curl -s -X POST https://movie-detectives.com/api/quiz \
  -H 'Content-Type: application/json' \
  -d '{"vote_avg_min": 5.0, "vote_count_min": 1000.0, "popularity": 3, "language": "german"}' | jq .

{
  "quiz_id": "7f5f8cf5-4ded-42d3-a6f0-976e4f096c0e",
  "question": {
    "question": "Stellt euch vor, es gäbe riesige Monster, die auf der Erde herumtrampeln, als wäre es ein Spielplatz! Einer ist ein echtes Urviech, eine Art wandelnde Riesenechse mit einem Atem, der so heiß ist, dass er euer Toastbrot in Sekundenschnelle rösten könnte. Der andere ist ein gigantischer Affe, der so stark ist, dass er Bäume ausreißt wie Gänseblümchen. Und jetzt ratet mal, was passiert? Die beiden geraten aneinander, wie zwei Kinder, die sich um das letzte Stück Kuchen streiten! Wer wird wohl gewinnen, die Riesenechse oder der Superaffe? Das ist die Frage, die sich die ganze Welt stellt! ",
    "hint1": "Der Film spielt in einer Zeit, in der Monster auf der Erde wandeln.",
    "hint2": "G_dz_ll_ vs. K_ng "
  },
  "movie": {...}
}

And this is how to answer a quiz via an API call:

curl -s -X POST https://movie-detectives.com/api/quiz/84c19425-c179-4198-9773-a8a1b71c9605/answer \
  -H 'Content-Type: application/json' \
  -d '{"answer": "Greenland"}' | jq .

{
  "quiz_id": "84c19425-c179-4198-9773-a8a1b71c9605",
  "question": {...},
  "movie": {...},
  "user_answer": "Greenland",
  "result": {
    "points": "3",
    "answer": "Congratulations! You got it! Greenland is the movie we were looking for. You're like a human GPS, always finding the right way!"
  }
}

Conclusion

After I finished the basic project, adding more personalities and languages was so easy with the modular prompt approach that I was impressed by the possibilities this opens up for game design and development. By adding another personality, I could change this game from a purely educational game about movies into a comedy trivia game in the style of “You Don’t Know Jack” within a minute.

Also, combining up-to-date Python functionality with validation libraries like Pydantic is very powerful and can be used to ensure good data quality for LLM input.

And there you have it, folks! You’re now equipped to craft your own LLM-powered web application.

Feeling inspired but need a starting point? Check out the open-source code for the Gemini Movie Detectives project:

Github repository for backend: https://github.com/vojay-dev/gemini-movie-detectives-api
Github repository for frontend: https://github.com/vojay-dev/gemini-movie-detectives-ui

The future of AI-powered applications is bright, and you’re holding the paintbrush! Let’s go make something remarkable. And if you need a break, feel free to try https://movie-detectives.com/.

Create an AI-Driven Movie Quiz with Gemini LLM, Python, FastAPI, Pydantic, RAG and more was originally published in Towards Data Science on Medium, where people are continuing the conversation by highlighting and responding to this story.
