1. Introduction
Question answering is one of the most challenging and exciting tasks in natural language processing (NLP): given a question phrased in natural language and a text (or a collection of texts), a system must find the answer. And unlike traditional information retrieval systems, which return a list of documents or snippets, a question answering system aims to return a concise and precise answer, which may require complex reasoning and inference.
In this blog, you will learn how to use BERT, a state-of-the-art language model, to perform question answering on text data using PyTorch and HuggingFace. BERT stands for Bidirectional Encoder Representations from Transformers, and it is a neural network architecture that can learn from both left and right context of a given token, resulting in rich and contextualized word embeddings. BERT has been shown to achieve remarkable results on various NLP tasks, including question answering.
You will learn how to:
- Use HuggingFace Transformers library to load and use pre-trained BERT models for question answering.
- Fine-tune BERT on a question answering dataset, such as SQuAD (Stanford Question Answering Dataset).
- Evaluate BERT on question answering tasks using different metrics, such as exact match and F1 score.
- Deploy BERT for question answering on web applications using Flask or Streamlit.
By the end of this blog, you will have a solid understanding of how to use BERT for question answering, and you will be able to apply it to your own projects and datasets.
Are you ready to dive into the world of question answering with BERT? Let’s get started!
2. What is Question Answering?
Question answering is one of the most challenging and exciting tasks in natural language processing (NLP). It involves finding the answer to a natural language question from a given text or a collection of texts. For example, given the question “Who is the author of Harry Potter?” and the text “Harry Potter is a series of fantasy novels written by British author J. K. Rowling.”, the answer would be “J. K. Rowling”.
Question answering can be seen as a form of information retrieval, where the goal is to find the most relevant information for a given query. However, unlike traditional information retrieval systems that return a list of documents or snippets, question answering systems aim to return a concise and precise answer, which may require complex reasoning and inference skills.
There are different types of question answering tasks, depending on the format of the question and the answer, and the source of the text. Some common types are:
- Factoid question answering: The question is about a factual piece of information, such as a person, a place, a date, or a number. The answer is usually a single word or a short phrase. For example, “When was the first moon landing?” – “July 20, 1969”.
- Non-factoid question answering: The question is about a more complex or subjective piece of information, such as a definition, an explanation, or an opinion. The answer is usually a sentence or a paragraph. For example, “What is the greenhouse effect?” – “The greenhouse effect is the process by which the Earth’s atmosphere traps some of the sun’s heat, making the planet warmer than it would be otherwise.”
- Open-domain question answering: The question can be about any topic, and the text source is a large and diverse collection of documents, such as the entire Wikipedia. The answer can be factoid or non-factoid. For example, “Who is the current president of France?” – “Emmanuel Macron”.
- Closed-domain question answering: The question is about a specific domain or topic, and the text source is a limited and focused collection of documents, such as a textbook or a manual. The answer can be factoid or non-factoid. For example, “What is the function of the mitochondria?” – “The mitochondria are the organelles that produce energy for the cell.”
Question answering is a very useful and practical application of NLP, as it can help users find the information they need quickly and easily, without having to read through large amounts of text. However, it is also a very difficult and challenging task, as it requires a deep understanding of natural language, both syntactically and semantically, as well as the ability to perform logical reasoning and inference.
In this blog, you will focus on the task of extractive question answering, closely related to factoid question answering, in which the answer is a span of text extracted from the given document. For example, given the question “What is the capital of France?” and the document “France is a country in Western Europe with a population of about 67 million people. Its capital is Paris, the most populous city in the country.”, the answer would be “Paris”.
Extractive question answering is one of the most popular and widely studied tasks in NLP, and there are many datasets and benchmarks available for it, such as SQuAD, Natural Questions, and TriviaQA. You will learn how to use BERT, a powerful language model, to perform extractive question answering on these datasets, and how to evaluate and deploy your model.
But before you dive into the details of BERT, you need to understand what BERT is and how it works. That’s what you will learn in the next section.
3. What is BERT and How Does It Work?
BERT is a neural network architecture that was introduced by Google researchers in 2018. BERT stands for Bidirectional Encoder Representations from Transformers, and it is designed to learn from both left and right context of a given token, resulting in rich and contextualized word embeddings. BERT has been shown to achieve remarkable results on various natural language processing (NLP) tasks, including question answering.
But what are word embeddings, and why are they important for NLP? Word embeddings are numerical representations of words that capture their meaning and syntactic behavior. They are essential for NLP, as they allow us to perform mathematical operations on words, such as measuring their similarity, finding their synonyms, or clustering them into groups. Word embeddings are usually learned from large amounts of text data, such as Wikipedia or books, using algorithms such as word2vec or GloVe.
However, traditional word embedding methods have a limitation: they assign a single vector to each word, regardless of its context. For example, the word “bank” would have the same representation whether it is used in the sense of a financial institution or a river bank. This can lead to ambiguity and confusion, especially for words that have multiple meanings or senses.
BERT solves this problem by using a transformer-based model that can learn from both left and right context of a given word, resulting in different representations for the same word depending on its usage. For example, BERT would assign different vectors to the word “bank” in the sentences “I deposited some money in the bank” and “The boat was sailing along the bank of the river”. This way, BERT can capture the subtle nuances and variations of natural language, and provide more accurate and relevant embeddings.
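To make the contrast concrete, here is a small sketch, assuming the bert-base-uncased checkpoint and the PyTorch backend, that extracts the vector BERT assigns to “bank” in each of those two sentences and compares them with cosine similarity; the two vectors turn out to be clearly different:

import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Encode the sentence and return the contextual vector of the token "bank".
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]
    bank_id = tokenizer.convert_tokens_to_ids("bank")
    position = inputs.input_ids[0].tolist().index(bank_id)
    return hidden[position]

v1 = bank_vector("I deposited some money in the bank.")
v2 = bank_vector("The boat was sailing along the bank of the river.")
print(torch.cosine_similarity(v1, v2, dim=0).item())  # noticeably less than 1.0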
But how does BERT learn from the context? BERT uses two pre-training tasks to learn from large amounts of unlabeled text data, such as the entire Wikipedia. These tasks are:
- Masked language modeling: BERT randomly masks some of the tokens in the input, and then tries to predict the original tokens based on the context. For example, given the input “He bought a pair of [MASK] to go with his new suit”, BERT would try to predict the masked word, such as “shoes” or “socks”. This task forces BERT to learn from both left and right context, and to encode the meaning of each word in relation to the rest of the sentence (see the sketch after this list).
- Next sentence prediction: BERT takes two sentences as input, and then tries to predict whether the second sentence is the actual next sentence that follows the first one in the original document. For example, given the input “He was running late for his meeting. He decided to take a shortcut.”, BERT would predict that the second sentence is the next sentence. However, given the input “He was running late for his meeting. She was waiting for him at the restaurant.”, BERT would predict that the second sentence is not the next sentence. This task teaches BERT to learn the relationship and coherence between sentences, and to encode the meaning of each sentence in relation to the rest of the document.
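You can see the masked language modeling task in action with a quick sketch. This uses the library’s fill-mask pipeline (covered more fully in the next section) with the bert-base-uncased checkpoint; the exact predictions and scores will vary by model and version:

from transformers import pipeline

unmasker = pipeline("fill-mask", model="bert-base-uncased")
# Each prediction contains a candidate token and its probability.
for prediction in unmasker("He bought a pair of [MASK] to go with his new suit."):
    print(prediction["token_str"], round(prediction["score"], 3))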
By pre-training on these tasks, BERT learns a powerful and general representation of natural language, which can then be fine-tuned on specific downstream tasks, such as question answering, with minimal additional training data. This makes BERT a very versatile and efficient model, as it can leverage the knowledge learned from a large corpus of text and transfer it to a variety of NLP applications.
In the next section, you will learn how to use HuggingFace Transformers library, a popular and user-friendly library that provides easy access to pre-trained BERT models, and how to use them for question answering.
4. How to Use HuggingFace Transformers Library for Question Answering
HuggingFace Transformers is a popular and user-friendly library that provides easy access to pre-trained BERT models, as well as other state-of-the-art NLP models, such as GPT-2, RoBERTa, XLNet, and more. HuggingFace Transformers also provides high-level APIs and utilities to perform various NLP tasks, such as text classification, sentiment analysis, text generation, and of course, question answering.
In this section, you will learn how to use HuggingFace Transformers library to load and use pre-trained BERT models for question answering. You will also learn how to customize and fine-tune the models for your own datasets and domains.
The first step is to install HuggingFace Transformers library, along with its dependencies, such as PyTorch, TensorFlow, and scikit-learn. You can do this by running the following command in your terminal or notebook:
pip install transformers[torch,tf-cpu,sklearn]
This will install the latest version of the HuggingFace Transformers library, along with PyTorch, TensorFlow, and scikit-learn. Note that version pins cannot be written inside the brackets; if you need specific versions of the dependencies, install them explicitly alongside the library. For example, to use PyTorch 1.8.1 and TensorFlow 2.4.1, you can run:
pip install transformers[sklearn] torch==1.8.1 tensorflow-cpu==2.4.1
Once you have installed HuggingFace Transformers library, you can import it in your Python script or notebook as follows:
import transformers
The next step is to load a pre-trained BERT model for question answering. HuggingFace Transformers library provides a convenient method to do this, called pipeline. The pipeline method takes a task name as an argument, and returns a ready-to-use model and tokenizer for that task. For question answering, the task name is “question-answering”. For example, you can run the following code to load a pre-trained BERT model for question answering:
from transformers import pipeline

qa_model = pipeline("question-answering")
This loads the pipeline’s default question answering model, which in recent versions of the library is distilbert-base-cased-distilled-squad: a smaller, faster model distilled from BERT and fine-tuned on the SQuAD dataset, a popular benchmark for question answering. You can also specify a different model by name. For example, to use bert-large-uncased-whole-word-masking-finetuned-squad, a large BERT model fine-tuned on SQuAD, you can run:
from transformers import pipeline

qa_model = pipeline("question-answering", model="bert-large-uncased-whole-word-masking-finetuned-squad")
You can find the list of available pre-trained BERT models for question answering on the HuggingFace model hub. You can also upload and share your own models on the hub, or use models from other users.
Once you have loaded a pre-trained BERT model for question answering, you can use it to answer questions about a given text. The pipeline returns a callable object that takes a dictionary with two keys: “question”, the natural language question you want to ask, and “context”, the text that contains the answer. The output is another dictionary containing the extracted “answer” span, a confidence “score” between 0 and 1, and the “start” and “end” character offsets of the answer within the context. For example, you can run the following code to ask a question about a passage from the Wikipedia article on Albert Einstein:
question = "When did Einstein receive the Nobel Prize in Physics?"
context = "Albert Einstein (14 March 1879 – 18 April 1955) was a German-born theoretical physicist who developed the theory of relativity, one of the two pillars of modern physics (alongside quantum mechanics). His work is also known for its influence on the philosophy of science. He is best known to the general public for his mass–energy equivalence formula E = mc2, which has been dubbed 'the world's most famous equation'. He received the 1921 Nobel Prize in Physics 'for his services to theoretical physics, and especially for his discovery of the law of the photoelectric effect', a pivotal step in the development of quantum theory."
answer = qa_model({"question": question, "context": context})
print(answer)
The output of this code looks like this (the “start” and “end” offsets are omitted here for brevity):
{'answer': '1921', 'score': 0.9978193640708923}
As you can see, the model correctly extracts the answer “1921” from the context, and assigns a high confidence score of 0.9978 to it. You can try different questions and contexts, and see how the model performs. You can also compare the results of different models, and see how they differ in speed and accuracy.
In the next section, you will learn how to fine-tune BERT on your own question answering dataset, such as SQuAD, using HuggingFace Transformers library and PyTorch.
5. How to Fine-Tune BERT on SQuAD Dataset
In the previous section, you learned how to use HuggingFace Transformers library to load and use pre-trained BERT models for question answering. However, sometimes you may want to fine-tune the models on your own question answering dataset, such as SQuAD, to achieve better performance and adapt the models to your specific domain and task. In this section, you will learn how to do that using HuggingFace Transformers library and PyTorch.
SQuAD, which stands for Stanford Question Answering Dataset, is one of the most popular and widely used benchmarks for question answering. It consists of more than 100,000 question–answer pairs drawn from over 500 Wikipedia articles. The questions are posed by crowdworkers, and the answers are spans of text from the corresponding articles. SQuAD has two versions: SQuAD 1.1, which only contains questions that have an answer in the text, and SQuAD 2.0, which also includes questions that do not have an answer in the text, to test the model’s ability to handle unanswerable questions.
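To get a feel for the data, you can load SQuAD with the HuggingFace Datasets library (a quick sketch, assuming the datasets package is installed, e.g. via pip install datasets):

from datasets import load_dataset

squad = load_dataset("squad")  # SQuAD 1.1; use "squad_v2" for SQuAD 2.0
example = squad["train"][0]
print(example["question"])
print(example["answers"])  # answer text plus its character offset in the context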
To fine-tune BERT on SQuAD, you will need to follow these steps:
- Download and preprocess the SQuAD dataset.
- Load a pre-trained BERT model and tokenizer from HuggingFace Transformers library.
- Define a PyTorch dataset and dataloader for the SQuAD dataset.
- Define a PyTorch model that inherits from the pre-trained BERT model and adds a question answering head.
- Define a PyTorch optimizer and a learning rate scheduler.
- Define a training loop that iterates over the dataloader and updates the model parameters.
- Define an evaluation loop that computes the metrics for question answering, such as exact match and F1 score.
- Save and load the fine-tuned model using HuggingFace Transformers library.
A condensed sketch covering these steps follows. You can also find the complete code in the accompanying Google Colab notebook.
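The sketch below makes a few simplifying assumptions: the hyperparameters are illustrative rather than tuned, long contexts are truncated instead of split into overlapping windows, and it uses the library’s AutoModelForQuestionAnswering (which already puts a question answering head on top of BERT) together with the high-level Trainer API in place of a hand-written PyTorch training loop:

from datasets import load_dataset
from transformers import (AutoModelForQuestionAnswering, AutoTokenizer,
                          Trainer, TrainingArguments, default_data_collator)

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForQuestionAnswering.from_pretrained(model_name)
squad = load_dataset("squad")  # SQuAD 1.1

def preprocess(examples):
    # Tokenize question/context pairs, truncating only the context, and keep
    # character offsets so answers can be mapped to token positions.
    inputs = tokenizer(examples["question"], examples["context"],
                       max_length=384, truncation="only_second",
                       padding="max_length", return_offsets_mapping=True)
    start_positions, end_positions = [], []
    for i, offsets in enumerate(inputs["offset_mapping"]):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        sequence_ids = inputs.sequence_ids(i)
        # Locate the first and last tokens that belong to the context.
        ctx_start = sequence_ids.index(1)
        ctx_end = len(sequence_ids) - 1 - sequence_ids[::-1].index(1)
        if offsets[ctx_start][0] > start_char or offsets[ctx_end][1] < end_char:
            # The answer was truncated away: label the span as (0, 0).
            start_positions.append(0)
            end_positions.append(0)
        else:
            # Walk inward from the context boundaries to the answer span.
            idx = ctx_start
            while idx <= ctx_end and offsets[idx][0] <= start_char:
                idx += 1
            start_positions.append(idx - 1)
            idx = ctx_end
            while idx >= ctx_start and offsets[idx][1] >= end_char:
                idx -= 1
            end_positions.append(idx + 1)
    inputs["start_positions"] = start_positions
    inputs["end_positions"] = end_positions
    inputs.pop("offset_mapping")
    return inputs

train_set = squad["train"].map(preprocess, batched=True,
                               remove_columns=squad["train"].column_names)
args = TrainingArguments(output_dir="bert-finetuned-squad", learning_rate=3e-5,
                         per_device_train_batch_size=8, num_train_epochs=2)
trainer = Trainer(model=model, args=args, train_dataset=train_set,
                  data_collator=default_data_collator, tokenizer=tokenizer)
trainer.train()
trainer.save_model("bert-finetuned-squad")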
6. How to Evaluate BERT on Question Answering Tasks
After fine-tuning BERT on the SQuAD dataset, you may want to evaluate how well your model performs on question answering tasks. In this section, you will learn how to use HuggingFace Transformers library and PyTorch to compute the metrics for question answering, such as exact match and F1 score.
Exact match (EM) is a metric that measures the fraction of questions for which the model’s answer matches a reference answer exactly. For example, if the question is “Who is the author of Harry Potter?” and the reference answer is “J. K. Rowling”, a prediction of “J. K. Rowling” scores an EM of 1, while “Rowling” or “Joanne Rowling” scores 0, even though it is partially correct.
F1 score is the harmonic mean of precision and recall, two measures of how well the model extracts the relevant words from the text. Precision is the fraction of words in the model’s answer that also appear in the correct answer; recall is the fraction of words in the correct answer that appear in the model’s answer. For example, if the correct answer is “J. K. Rowling” and the model’s answer is “Rowling”, the precision is 1/1 = 1 and the recall is 1/3 ≈ 0.33, so the F1 score is 2 × (1 × 0.33) / (1 + 0.33) ≈ 0.5.
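Here is a small sketch of these two metrics, modeled on the normalization used by the official SQuAD evaluation script (lowercase, strip punctuation and the articles a/an/the, split on whitespace):

import collections
import re
import string

def normalize(text):
    # Lowercase, strip punctuation and articles, collapse whitespace.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(prediction, truth):
    return int(normalize(prediction) == normalize(truth))

def f1_score(prediction, truth):
    pred_tokens = normalize(prediction).split()
    truth_tokens = normalize(truth).split()
    common = collections.Counter(pred_tokens) & collections.Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Rowling", "J. K. Rowling"))          # 0
print(round(f1_score("Rowling", "J. K. Rowling"), 2))   # 0.5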
To compute the EM and F1 score for your model, you will need to follow these steps:
- Load the SQuAD dataset using HuggingFace Datasets library.
- Load the fine-tuned BERT model and tokenizer from HuggingFace Transformers library.
- Define a PyTorch dataset and dataloader for the SQuAD dataset.
- Define a function that takes a question, a context, and an answer span, and returns the answer text.
- Define a function that takes a question, a context, and a model, and returns the model’s predicted answer and score.
- Define a function that takes a list of true answers and a list of predicted answers, and returns the EM and F1 score.
- Iterate over the dataloader and compute the EM and F1 score for each batch.
- Compute the average EM and F1 score over the entire dataset.
A condensed sketch putting these steps together follows. You can also find the complete code in the accompanying Google Colab notebook.
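The sketch below evaluates a sample of the SQuAD validation split. It assumes the exact_match and f1_score functions defined above and a fine-tuned model saved at bert-finetuned-squad (an illustrative path), and it uses the question-answering pipeline for prediction rather than a hand-written dataloader loop. Since SQuAD provides several reference answers per question, the sketch takes the best score over the references:

from datasets import load_dataset
from transformers import pipeline

# The path below is illustrative; point it at your own fine-tuned model.
qa_model = pipeline("question-answering", model="bert-finetuned-squad")
validation = load_dataset("squad", split="validation").select(range(100))

em_total, f1_total = 0.0, 0.0
for example in validation:
    prediction = qa_model(question=example["question"], context=example["context"])
    references = example["answers"]["text"]
    # Take the best score over the available reference answers.
    em_total += max(exact_match(prediction["answer"], ref) for ref in references)
    f1_total += max(f1_score(prediction["answer"], ref) for ref in references)

print(f"EM: {em_total / len(validation):.3f}  F1: {f1_total / len(validation):.3f}")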
7. How to Deploy BERT for Question Answering on Web Applications
After fine-tuning and evaluating BERT on question answering tasks, you may want to deploy your model on web applications, so that you can provide an interactive and user-friendly interface for your users to ask and answer questions. In this section, you will learn how to do that using HuggingFace Transformers library and two popular web frameworks: Flask and Streamlit.
Flask is a lightweight and easy-to-use web framework for building web applications in Python. Streamlit is a newer framework aimed at interactive data apps: you write plain Python, and Streamlit turns widgets and Markdown into a web interface. Both frameworks have their advantages and disadvantages, and you can choose the one that suits your needs and preferences.
To deploy BERT for question answering on web applications, you will need to follow these steps:
- Save and load the fine-tuned BERT model and tokenizer using HuggingFace Transformers library.
- Create a web application using Flask or Streamlit that takes a question and a context as input, and returns the answer and the score as output.
- Test and run the web application locally or on a cloud platform.
Minimal sketches for both frameworks follow. You can also find the complete code in the accompanying Google Colab notebook.
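First, a minimal Flask sketch, assuming a fine-tuned model saved at bert-finetuned-squad (an illustrative path); it exposes a single JSON endpoint:

from flask import Flask, request, jsonify
from transformers import pipeline

app = Flask(__name__)
qa_model = pipeline("question-answering", model="bert-finetuned-squad")

@app.route("/answer", methods=["POST"])
def answer():
    # Expects a JSON body: {"question": "...", "context": "..."}
    data = request.get_json()
    result = qa_model(question=data["question"], context=data["context"])
    return jsonify({"answer": result["answer"], "score": result["score"]})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)

And an equivalent Streamlit sketch, under the same model-path assumption (save it as app.py and launch it with streamlit run app.py):

import streamlit as st
from transformers import pipeline

@st.cache_resource  # load the model once, not on every interaction
def load_model():
    return pipeline("question-answering", model="bert-finetuned-squad")

qa_model = load_model()
st.title("Question Answering with BERT")
question = st.text_input("Question")
context = st.text_area("Context")
if st.button("Answer") and question and context:
    result = qa_model(question=question, context=context)
    st.write(f"Answer: {result['answer']} (score: {result['score']:.3f})")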
8. Conclusion
In this blog, you have learned how to use BERT, a state-of-the-art language model, to perform question answering on text data using PyTorch and HuggingFace. You have learned how to:
- Use HuggingFace Transformers library to load and use pre-trained BERT models for question answering.
- Fine-tune BERT on a question answering dataset, such as SQuAD, using HuggingFace Transformers library and PyTorch.
- Evaluate BERT on question answering tasks using different metrics, such as exact match and F1 score.
- Deploy BERT for question answering on web applications using Flask or Streamlit.
By following this blog, you have gained a solid understanding of how to use BERT for question answering, and you have acquired the skills and knowledge to apply it to your own projects and datasets.
Question answering is one of the most challenging and exciting tasks in natural language processing, and BERT is one of the most powerful and versatile models that can handle it. BERT can capture the subtle nuances and variations of natural language, and provide accurate and relevant answers to natural language questions. BERT can also be fine-tuned and customized for different domains and tasks, and deployed on web applications for interactive and user-friendly interfaces.
If you are interested in learning more about BERT and question answering, you can check out the following resources:
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding, the original paper that introduced BERT.
- BERT documentation, the official documentation of BERT from HuggingFace Transformers library.
- SQuAD website, the official website of SQuAD, where you can find the dataset, the leaderboard, and the papers related to question answering.
- Question Answering with HuggingFace Transformers, a notebook that shows how to use HuggingFace Transformers library for question answering.
- Question Answering examples, a collection of scripts and notebooks that show how to fine-tune and evaluate BERT and other models on question answering datasets.
We hope you enjoyed this blog, and we would love to hear your feedback and suggestions. Please feel free to leave a comment below. Thank you for reading, and happy question answering!