Building a Dutch QA Bot on Consumer Hardware

Sep 2, 2024

Introduction

Since the introduction of ChatGPT, large language models (LLMs) have gained significant popularity. Tools such as Copilot, which assist users with everyday tasks, are becoming mainstream, and businesses are adopting LLMs in their processes to cut costs and reduce labor hours.

The Dutch government is interested in this emerging technology because of its potential, but adoption faces several challenges. One major issue is data security and privacy, particularly when closed-source models run on external servers. Another is that most LLMs perform well in English but have limited proficiency in Dutch, making the technology difficult for the government to adopt.

To address these issues, we have deployed an open-source LLM on our own hardware as the foundation for a Dutch QA bot. This serves as a first step to demonstrate that we can run such a system locally, thereby overcoming data security concerns, and that our bot shows promising fluency in Dutch.

Challenges

Developing a Dutch Question-Answering (QA) bot that operates efficiently on our own consumer-grade hardware posed several challenges, particularly in handling the intricacies of the Dutch language and ensuring acceptable performance.

The Solution

By leveraging advanced techniques, we created a scalable QA bot using an 8-bit quantized 13 billion-parameter Llama large language model (LLM) and a Retrieval-Augmented Generation (RAG) architecture. This architecture consisted of several key components, as depicted in Figure 1 below:

Figure 1: Layout of RAG Architecture

The RAG architecture included:

  • A ChromaDB vector database connected to an OpenAI text embedding model (text-embedding-3-large) for vectorizing contextual data provided through a Streamlit UI.

  • In-memory caching for storing and retrieving historical prompts.

  • A logging module for monitoring application performance and debugging.

  • The Llama 13B LLM, which generated responses conditioned by the contextual data.

  • The entire pipeline orchestrated with Langchain in Python, triggered by user queries via the UI.
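The retrieve-then-generate flow behind these components can be sketched in plain Python. This is a minimal illustration only: the toy bag-of-words embedder stands in for text-embedding-3-large, `ToyVectorStore` stands in for ChromaDB, and the returned prompt is what would be sent to the Llama model; none of these names come from the project's actual code.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; the real system used an
    # OpenAI embedding model (text-embedding-3-large).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """Stands in for the ChromaDB vector database."""
    def __init__(self, docs):
        self.docs = [(d, embed(d)) for d in docs]

    def retrieve(self, query: str, k: int = 2):
        # Rank stored documents by similarity to the query embedding.
        q = embed(query)
        ranked = sorted(self.docs, key=lambda d: cosine(q, d[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]

def answer(query: str, store: ToyVectorStore) -> str:
    # Retrieval-augmented prompt: condition the LLM on retrieved context.
    context = "\n".join(store.retrieve(query))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"

store = ToyVectorStore([
    "De hoofdstad van Nederland is Amsterdam.",
    "Den Haag is de zetel van de regering.",
    "Rotterdam heeft de grootste haven van Europa.",
])
print(answer("Wat is de hoofdstad van Nederland?", store))
```

In the real pipeline, Langchain orchestrates these same steps: the query is embedded, ChromaDB returns the most similar context chunks, and the assembled prompt is passed to the Llama 13B model for generation.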

The bot was built and deployed on a consumer-grade desktop equipped with an Intel i9 9900K processor, an NVIDIA RTX 2070 GPU, and 16 GB of RAM. The 13B Llama model, trained on only a limited corpus of Dutch, was fine-tuned with Dutch instructions. By using an 8-bit quantized version of the model, the memory requirement was reduced from roughly 52 GB at full precision to 9 GB, allowing it to run efficiently on the available hardware.
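The savings from quantization follow directly from the per-parameter storage cost, as this back-of-the-envelope calculation shows. It covers weight storage only; actual runtime footprints depend on activations, context length, and framework overhead, which is why the 9 GB working figure reported here differs from the raw int8 weight size.

```python
# Rough weight-memory estimate for a 13-billion-parameter model.
params = 13e9

bytes_fp32 = params * 4   # 32-bit floats: 4 bytes per parameter
bytes_int8 = params * 1   # 8-bit quantized: 1 byte per parameter

print(f"fp32 weights: ~{bytes_fp32 / 1e9:.0f} GB")   # ~52 GB
print(f"int8 weights: ~{bytes_int8 / 1e9:.0f} GB")   # ~13 GB
print(f"reduction factor: {bytes_fp32 / bytes_int8:.0f}x")
```

The 4x reduction is what makes a 13B model feasible at all on a machine with 16 GB of system RAM and a consumer GPU.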

The Result

The Dutch QA bot demonstrated high accuracy, with a cosine similarity of over 0.8 between its responses and those of ChatGPT (GPT-3.5). Despite the limitations of consumer hardware, the bot's performance was robust and reliable, marking a significant step towards more accessible and efficient natural language processing (NLP) solutions.
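A similarity score of this kind is computed by embedding both answers and taking the cosine of the angle between the embedding vectors. A minimal sketch follows; the short vectors here are hypothetical stand-ins for real embedding outputs, and the post does not describe the exact evaluation setup.

```python
import math

def cosine_similarity(a, b):
    # Cosine similarity between two dense embedding vectors:
    # dot product divided by the product of the norms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings of a bot answer and a reference answer.
bot_vec = [0.8, 0.1, 0.3, 0.5]
ref_vec = [0.7, 0.2, 0.3, 0.6]
score = cosine_similarity(bot_vec, ref_vec)
print(f"cosine similarity: {score:.3f}")
```

A score of 1.0 means the embeddings point in the same direction (near-identical meaning), while values above 0.8 indicate strong semantic agreement between the two answers.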

What We Learned

Balancing computational efficiency with language proficiency was key. The use of a quantized model proved to be cost-effective without compromising performance. The RAG architecture effectively integrated contextual information, enhancing the bot's response accuracy.

What We Liked Best

We deployed the bot entirely on local hardware, bypassing the need for expensive server farms. Delivering reliable performance on widely accessible hardware highlights the potential of this relatively new technology for government use, with data security and privacy in mind.

For more detailed insights into the project, visit the original LinkedIn post.

© 2024 Thisworkz BV
