Romain Bongibault

5th year Computer Science and Networks student at INSA Toulouse

Intern at Juloa
Building an Institutional Chatbot in the Era of Generative AI


The rise of artificial intelligence in education is opening up new modes of interaction between students and institutions. At the heart of this shift, chatbots stand out as a promising way to answer frequently asked questions, guide newcomers, and ease the load on administrative services. But these tools must also cope with an unstable environment in which information evolves constantly.

At INSA Toulouse, we undertook the creation of an institutional chatbot capable of facing these challenges. After testing different architectures, from the most basic to the most sophisticated, it was ultimately the RAG (Retrieval-Augmented Generation) model that proved to be the most suitable.

A Modern History of Chatbots

The roots of chatbots go back to Alan Turing, with his famous question about the ability of machines to think. Since ELIZA, the first conversational program born in the 1960s, progress has been spectacular. The introduction of deep learning and transformers enabled decisive breakthroughs. Today, systems such as ChatGPT and Siri can understand natural language and generate remarkably fluent responses.

Among the notable architectures:

  • GPT (Generative Pretrained Transformer), the current reference in text generation
  • MoE (Mixture of Experts), used in Mixtral-8x7B
  • RNNs (Recurrent Neural Networks), now superseded but historically important

Data Collection: A Scraper for INSA

To feed our chatbot, we developed a Java scraper targeting the public sites of INSA Toulouse and its Moodle platform. Results:

  • 6,215 pages collected in 20 minutes
  • Approximately 4.45 million words extracted
  • 55% of documents with identifiable update date

Fig. 1 - Overview of the scraping process

Some limitations arose: image content could not be read, scanned PDFs were unreadable, and complex tables extracted poorly. Even so, the volume of raw text proved sufficient for our first experiments.
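The scraper itself was written in Java and is not reproduced here; as a minimal, self-contained sketch of the extraction step, the following Python uses only the standard library's `html.parser` to pull visible text and outgoing links from a page (the `TextExtractor` name and `extract` helper are illustrative, not the project's actual classes):

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text and outgoing links from one HTML page,
    skipping the contents of <script> and <style> blocks."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self.links = []
        self._skip_depth = 0  # >0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)  # candidate pages to crawl next

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.texts.append(data.strip())

def extract(html: str):
    """Return (visible_text, outgoing_links) for one fetched page."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.texts), parser.links
```

A crawler loop would fetch each page, call `extract`, queue the returned links, and count words with `len(text.split())` to produce tallies like the 4.45 million words above.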

First Attempt: Building a Model from Scratch

We attempted to design a language model of the SLM (Small Language Model) type with PyTorch, testing several tokenization configurations (char-level and subword).
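The char-level configuration is the simplest of the tokenizations tested: every distinct character in the corpus becomes one token id. A minimal sketch (the function name is illustrative):

```python
def build_char_tokenizer(corpus: str):
    """Char-level tokenizer: each distinct character in the corpus
    becomes one token id. Returns (encode, decode, vocab_size)."""
    vocab = sorted(set(corpus))
    stoi = {ch: i for i, ch in enumerate(vocab)}   # char -> id
    itos = {i: ch for ch, i in stoi.items()}       # id -> char
    encode = lambda s: [stoi[c] for c in s]
    decode = lambda ids: "".join(itos[i] for i in ids)
    return encode, decode, len(vocab)
```

Subword tokenization (e.g. BPE) trades this tiny vocabulary for shorter sequences, which is why both configurations are worth comparing on a small model.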

The datasets used:

  • Shakespeare (2 MB, old English)
  • Wikipedia in French (10 GB)

Despite some improvement in loss, the results fell far short of expectations: the model neither learned syntax properly nor produced coherent sentences.

| Name | Training Device | Language | Dataset | Final Loss | Context Size | Batch Size | Duration |
|------|-----------------|----------|---------|------------|--------------|------------|----------|
| Scratch-v0.0 | Intel Core i7-8700 | English | Shakespeare (2 MB) | 1.1268 | 192 | 32 | 03:24:00 |
| Scratch-v0.1 | Intel Core i7-8700 | English | Shakespeare (2 MB) | 0.8046 | 192 | 32 | 06:11:00 |
| Scratch-v1.0 | 2× NVIDIA RTX A4500 GPUs | English | Shakespeare (2 MB) | 0.9041 | 512 | 32 | 00:28:00 |
| Scratch-v2.0 | 2× NVIDIA RTX A4500 GPUs | French | Wikipedia (10 GB) | 1.0603 | 512 | 32 | 00:27:00 |
| Scratch-v2.3 | 2× NVIDIA RTX A4500 GPUs | French | Wikipedia (10 GB) | 0.6484 | 512 | 48 | 03:31:00 |

The main limitation was hardware: even two RTX A4500 GPUs are not enough for deep training. Results comparable to GPT-2 would have required weeks of training on distributed infrastructure.

Second Attempt: Fine-tuning GPT-2

We then opted for adapting an existing model: GPT-2. The idea was to use a pre-trained model, then specialize it with an internal dataset (100 documents from our scraping).

Training was done locally, with:

  • 3 epochs
  • batch size of 2
  • Intel Core i7-13700H CPU
  • specific HuggingFace tokenizer

Despite these efforts, only 3.82% of responses were fully satisfactory according to two human evaluators. Moreover, any data update would require tedious and energy-intensive retraining.
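The fine-tuning run itself relies on the transformers library, but the data-preparation step — packing the 100 scraped documents into fixed-length training blocks, the shape a causal-LM trainer expects — can be sketched with the standard library alone. The block size and separator below are illustrative choices, not the values used in the project:

```python
def pack_into_blocks(documents, block_size=512, sep="\n\n"):
    """Concatenate documents and split the result into fixed-length
    character blocks for causal-LM fine-tuning. The trailing remainder
    shorter than block_size is dropped."""
    corpus = sep.join(documents)
    return [corpus[i:i + block_size]
            for i in range(0, len(corpus) - block_size + 1, block_size)]
```

Each block would then be tokenized and fed to the trainer; in a real run the split is usually done on token ids rather than characters, but the packing logic is the same.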

RAG: The Hybrid Solution That Changes Everything

The Retrieval-Augmented Generation approach combines the best of both worlds: semantic search and response generation.

The process:

  1. Documents are split into 1000-character segments
  2. Each segment is transformed into a vector via MiniLM
  3. Vectors are indexed in FAISS
  4. When queried, the closest segments are retrieved and injected, together with the question, into Mixtral-8x7B

Fig. 2 - RAG Pipeline

Main advantage: the model can rely on up-to-date documents, without requiring retraining.
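The four steps above can be sketched end to end. To keep the sketch self-contained, MiniLM and FAISS are replaced here with a toy bag-of-words embedding and brute-force cosine search, and the final prompt is returned instead of being sent to Mixtral-8x7B; the real pipeline swaps in sentence-transformer embeddings and a FAISS index:

```python
import math
from collections import Counter

def chunk(text, size=1000):
    """Step 1: split a document into fixed-size character segments."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(segment):
    """Step 2 (toy stand-in for MiniLM): word-frequency vector."""
    return Counter(segment.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, index, k=2):
    """Steps 3-4a (brute force stands in for FAISS): k nearest segments."""
    q = embed(question)
    ranked = sorted(index, key=lambda seg: cosine(q, embed(seg)), reverse=True)
    return ranked[:k]

def build_prompt(question, index, k=2):
    """Step 4b: inject retrieved context plus the question into the LLM prompt."""
    context = "\n---\n".join(retrieve(question, index, k))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
```

Because the index is rebuilt from the documents rather than baked into model weights, refreshing the knowledge base is just a re-indexing pass — the property that makes retraining unnecessary.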

Testing the Assistant: IAN

We developed an interface with Streamlit, giving rise to IAN (INSA Artificial Intelligence). The assistant is constrained to produce:

  • Responses in French
  • Formal and concise tone
  • Clear indication when information is unavailable

Fig. 3 - The IAN interface

On a test of 8 questions, 7 responses were deemed relevant, both for questions about regulations and for general interactions.

Evaluating Relevance: Contradictions and Context

We developed several tools to detect system weaknesses:

  • Cosine similarity calculation between chunks
  • Cross-encoder re-rankers (MiniLM)
  • Binary classifier to estimate whether a question is answerable

An interesting discovery: the range of FAISS scores can indicate whether the retrieved documents are useful. A narrow range suggests low relevance; a wide range suggests diverse results and thus good coverage.
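This heuristic is cheap to compute: take the similarity scores of the k nearest segments and compare the spread between the best and worst to a threshold. The threshold below is illustrative, not the value tuned in the project:

```python
def coverage_from_scores(scores, min_range=0.15):
    """Flag a retrieval as low-relevance when the top-k score range is narrow.
    `scores` are the similarity scores of the k nearest segments;
    `min_range` is an illustrative cutoff to be tuned on real queries."""
    spread = max(scores) - min(scores)
    return {"range": spread, "useful": spread >= min_range}
```

The output can gate the pipeline: when `useful` is false, the assistant can decline to answer or trigger a query rewrite instead of generating from weak context.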

Conclusion: A Promising Model

Our work demonstrates that RAG, combined with a well-structured database, can become a reliable tool to help students in an academic setting. It far surpasses locally trained or fine-tuned models.

Of course, not everything is perfect. We still need to:

  • Improve ambiguity management
  • Add structured sources (schedules, SQL databases)
  • Integrate multimodality (images, forms)

But the foundations are solid.

Technical Perspectives

We are considering several improvements:

  • Divide the FAISS index into specialized sub-indexes
  • Use a classifier to decide whether to trigger query rewriting
  • Explore structured data to enrich responses

These avenues open the way to a powerful and agile school assistant.

Acknowledgments

Thanks to Philippe Leleux, Eric Alata, and Céline Peyraube for their support. No generative AI tool was used for research or analysis in this project; generative AI served only to improve the clarity of the text.
