An automatic short-answer grading system evaluates and grades short answers from students or test-takers in a way that mimics human grading. Various natural language processing (NLP) techniques are used to assess the answers' content, coherence, and relevance.

Components

  1. Preprocessing:
    1. Tokenization
    2. Stopword Removal
    3. Lemmatization/Stemming
  2. Feature Extraction:
    1. Bag of Words (BoW)
    2. TF-IDF (Term Frequency-Inverse Document Frequency)
    3. Word Embeddings
  3. Similarity Measurement:
    1. Cosine Similarity
    2. Semantic Similarity
  4. Grading Methods:
    1. Rule-based Systems
    2. Machine Learning Models
    3. Deep Learning Models
  5. Evaluation Metrics:
    1. Precision and Recall
    2. F1 Score
    3. Cohen’s Kappa
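
The preprocessing steps above (tokenization, stopword removal, stemming/lemmatization) can be sketched as follows. This is a minimal illustration assuming NLTK is installed; the stopword list is a tiny hand-picked set rather than a full corpus, and Porter stemming stands in for lemmatization since it needs no downloaded data.

```python
import re

from nltk.stem import PorterStemmer  # pure-Python stemmer, no corpus download needed

# Tiny illustrative stopword set; real systems use a full list (e.g. NLTK's)
STOPWORDS = {"the", "a", "an", "is", "are", "was", "were",
             "of", "in", "to", "and", "during"}


def preprocess(text: str) -> list[str]:
    # Tokenization: lowercase the text and split it into word tokens
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stopword removal: drop high-frequency function words
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Stemming: strip inflectional suffixes with the Porter algorithm
    stemmer = PorterStemmer()
    return [stemmer.stem(t) for t in tokens]


print(preprocess("The cells were dividing rapidly during mitosis."))
```

Note that stems (e.g. `cell` from `cells`) are reduced string forms, not always dictionary words; a lemmatizer would map tokens to dictionary forms instead.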
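
For feature extraction, Bag of Words and TF-IDF can both be produced with scikit-learn's vectorizers. A minimal sketch, with two made-up student answers as the corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical student answers to the same question
answers = [
    "mitosis splits a cell into two identical daughter cells",
    "the cell divides into two identical daughter cells",
]

# Bag of Words: each answer becomes a vector of raw token counts
bow = CountVectorizer().fit_transform(answers)

# TF-IDF: counts reweighted so terms appearing in every answer count less
tfidf = TfidfVectorizer().fit_transform(answers)

print(bow.toarray())
print(tfidf.toarray().round(2))
```

Both produce one row per answer and one column per vocabulary term; word embeddings (covered later) replace these sparse vectors with dense ones.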
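
Cosine similarity between a reference (model) answer and a student answer is the most common similarity measure over such vectors. A sketch using TF-IDF vectors and scikit-learn; the two answers are invented examples:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

reference = "mitosis divides one cell into two identical daughter cells"
student = "the cell splits into two identical cells during mitosis"

# Fit both answers into the same TF-IDF space, then compare vector angles
vectors = TfidfVectorizer().fit_transform([reference, student])
score = cosine_similarity(vectors[0:1], vectors[1:2])[0, 0]
print(round(score, 3))
```

The score lies in [0, 1] for TF-IDF vectors: 1 means identical term distributions, 0 means no shared terms. Semantic similarity methods instead compare embeddings, so paraphrases with no word overlap can still score highly.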
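
The simplest grading method, a rule-based system, maps a similarity score to marks via fixed thresholds. The thresholds below are illustrative assumptions, not values from any particular system; machine learning and deep learning graders instead learn this mapping from human-graded example answers.

```python
def grade_from_similarity(similarity: float) -> int:
    # Rule-based grading: fixed (assumed) thresholds map similarity to marks
    if similarity >= 0.8:
        return 2  # full credit
    if similarity >= 0.5:
        return 1  # partial credit
    return 0      # no credit


print(grade_from_similarity(0.85), grade_from_similarity(0.6), grade_from_similarity(0.2))
# → 2 1 0
```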
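
All of the listed evaluation metrics compare the system's grades against a human grader's and are available in scikit-learn. The labels below are a fabricated toy example (1 = correct, 0 = incorrect):

```python
from sklearn.metrics import (cohen_kappa_score, f1_score,
                             precision_score, recall_score)

# Hypothetical grades: human grader vs. the automatic system
human  = [1, 0, 1, 1, 0, 1, 0, 1]
system = [1, 0, 1, 0, 0, 1, 1, 1]

print("precision:", precision_score(human, system))
print("recall:   ", recall_score(human, system))
print("F1:       ", f1_score(human, system))
# Cohen's kappa corrects raw agreement for agreement expected by chance
print("kappa:    ", round(cohen_kappa_score(human, system), 3))
```

Cohen's kappa is the usual headline metric for grading systems because it measures agreement with the human grader beyond chance, rather than treating the task as plain classification.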

Tools and Libraries

  1. NLP Libraries:
    1. NLTK
    2. spaCy
    3. Gensim
  2. Machine Learning Frameworks:
    1. Scikit-learn
    2. TensorFlow
    3. PyTorch
  3. Pre-Trained Deep Learning Models:
    1. Word2Vec
    2. GloVe (Global Vectors for Word Representation)
    3. FastText
    4. ELMo (Embeddings from Language Models)
    5. BERT (Bidirectional Encoder Representations from Transformers)

Word Embedding Methods

(including explanations, differences, pros, and cons)

1. Word2Vec

  1. Explanation: Word2Vec, developed by Google, learns dense word vectors using one of two model architectures: Continuous Bag of Words (CBOW) and Skip-gram.
  2. Differences: CBOW predicts a target word from its surrounding context words, while Skip-gram predicts the context words from a target word; CBOW trains faster, whereas Skip-gram represents rare words better.
  3. Pros: Efficient to train; captures semantic and syntactic relationships between words (e.g. vector("king") - vector("man") + vector("woman") ≈ vector("queen")).
  4. Cons: Produces a single vector per word, so it cannot distinguish different senses of the same word, and it cannot produce vectors for out-of-vocabulary words.

2. GloVe (Global Vectors for Word Representation)