An automatic short-answer grading system evaluates and grades short answers written by students or test-takers in a way that mimics human grading. It applies a range of NLP techniques to assess the content, coherence, and relevance of each answer.
Components
- Preprocessing:
  - Tokenization
  - Stopword Removal
  - Lemmatization/Stemming
- Feature Extraction:
  - Bag of Words (BoW)
  - TF-IDF (Term Frequency-Inverse Document Frequency)
  - Word Embeddings
- Similarity Measurement:
  - Cosine Similarity
  - Semantic Similarity
- Grading Method (a minimal end-to-end sketch of these steps follows this list):
  - Rule-based Systems
  - Machine Learning Models
  - Deep Learning Models
- Evaluation Metrics (see the scikit-learn example after this list):
  - Precision and Recall
  - F1 Score
  - Cohen’s Kappa
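
The sketch below ties the components together for a single student answer: NLTK handles preprocessing, scikit-learn provides TF-IDF features and cosine similarity, and a simple rule maps the similarity score to marks. The example answers and the score thresholds are illustrative assumptions, not values from a validated rubric.

```python
# Minimal sketch: preprocessing -> TF-IDF features -> cosine similarity -> rule-based grade.
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Download the NLTK resources used below (punkt/punkt_tab cover different NLTK versions).
for resource in ("punkt", "punkt_tab", "stopwords", "wordnet"):
    nltk.download(resource, quiet=True)

_lemmatizer = WordNetLemmatizer()
_stopwords = set(stopwords.words("english"))

def preprocess(text: str) -> str:
    """Tokenize, lowercase, drop stopwords and punctuation, and lemmatize."""
    tokens = word_tokenize(text.lower())
    kept = [_lemmatizer.lemmatize(t) for t in tokens if t.isalpha() and t not in _stopwords]
    return " ".join(kept)

def similarity_score(student_answer: str, reference_answer: str) -> float:
    """Cosine similarity between TF-IDF vectors of the preprocessed answers (0 to 1)."""
    docs = [preprocess(reference_answer), preprocess(student_answer)]
    tfidf = TfidfVectorizer().fit_transform(docs)
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

def to_marks(score: float) -> int:
    """Hypothetical rule-based mapping from similarity score to marks."""
    if score >= 0.8:
        return 2   # full credit
    if score >= 0.5:
        return 1   # partial credit
    return 0       # no credit

if __name__ == "__main__":
    reference = "Photosynthesis converts light energy into chemical energy in plants."
    answer = "Plants turn light energy into chemical energy through photosynthesis."
    score = similarity_score(answer, reference)
    print(f"similarity={score:.2f}, marks={to_marks(score)}")
```

In practice the thresholds would be tuned against human-graded answers, or replaced by a trained classifier that takes the similarity features as input.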
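For the evaluation metrics listed above, scikit-learn covers everything needed to compare the system's grades against a human rater's grades. The grade lists below are made up purely for illustration.

```python
# Evaluation metrics for comparing system-assigned grades with human-assigned grades.
from sklearn.metrics import precision_score, recall_score, f1_score, cohen_kappa_score

human_grades  = [2, 1, 0, 2, 1, 2, 0, 1]   # hypothetical human-assigned marks
system_grades = [2, 1, 1, 2, 0, 2, 0, 1]   # hypothetical system-assigned marks

# Macro-averaging treats each grade level equally; use average="weighted"
# if the grade distribution is imbalanced.
print("precision:", precision_score(human_grades, system_grades, average="macro"))
print("recall:   ", recall_score(human_grades, system_grades, average="macro"))
print("F1:       ", f1_score(human_grades, system_grades, average="macro"))

# Cohen's kappa measures agreement with the human rater beyond chance;
# quadratic weighting penalises large grade disagreements more heavily.
print("kappa:    ", cohen_kappa_score(human_grades, system_grades, weights="quadratic"))
```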
Tools and Libraries
- NLP Libraries:
  - NLTK
  - SpaCy
  - Gensim
- Machine Learning Frameworks:
  - Scikit-learn
  - TensorFlow
  - PyTorch
- Pre-trained Embedding Models (see the Gensim example after this list):
  - Word2Vec
  - GloVe (Global Vectors for Word Representation)
  - FastText
  - ELMo (Embeddings from Language Models)
  - BERT (Bidirectional Encoder Representations from Transformers)
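
As a small example of these tools working together, the sketch below loads pre-trained GloVe vectors through Gensim's downloader and compares a student answer to a reference answer by averaging word vectors. This is a simple baseline, not a complete grader; the model name `glove-wiki-gigaword-100` comes from Gensim's downloader catalogue, and the first call downloads the vectors.

```python
# Semantic similarity from pre-trained GloVe vectors loaded via Gensim.
import gensim.downloader as api
import numpy as np

model = api.load("glove-wiki-gigaword-100")  # KeyedVectors; downloaded on first use

def sentence_vector(text: str) -> np.ndarray:
    """Average the word vectors of the in-vocabulary tokens (simple baseline)."""
    tokens = [t for t in text.lower().split() if t in model]
    if not tokens:
        return np.zeros(model.vector_size)
    return np.mean([model[t] for t in tokens], axis=0)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0

reference = "photosynthesis converts light energy into chemical energy"
answer = "plants turn light into chemical energy"
print(cosine(sentence_vector(reference), sentence_vector(answer)))
```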
Word Embedding Methods
(including an explanation, differences, pros, and cons for each method)
1. Word2Vec
- Explanation:
  Word2Vec, developed by Google, consists of two main models: Continuous Bag of Words (CBOW) and Skip-gram (a Gensim sketch of both variants follows this entry).
  - CBOW: Predicts the target word from its surrounding context words.
  - Skip-gram: Predicts the surrounding context words from the target word.
- Differences:
  - CBOW is several times faster to train and represents frequent words slightly better.
  - Skip-gram is slower but performs better with small corpora and infrequent words.
- Pros:
  - Captures semantic relationships (e.g., king - man + woman ≈ queen).
  - Efficient to train and use.
- Cons:
  - Context is limited to a fixed-size window.
  - Ignores word order within the context window.
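
A minimal Gensim sketch of the two variants, trained on a toy corpus just to show the API: the `sg` flag selects CBOW (`sg=0`, the default) or Skip-gram (`sg=1`). A real grading system would train on a large domain corpus or load pre-trained vectors instead.

```python
# Training both Word2Vec variants with Gensim on a toy corpus.
from gensim.models import Word2Vec

corpus = [
    ["photosynthesis", "converts", "light", "energy", "into", "chemical", "energy"],
    ["plants", "use", "light", "to", "produce", "chemical", "energy"],
    ["mitochondria", "release", "energy", "from", "glucose"],
]

cbow = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, sg=0)
skipgram = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, sg=1)

# Trained word vectors live on the .wv attribute (KeyedVectors).
print(cbow.wv.similarity("light", "energy"))
print(skipgram.wv.most_similar("energy", topn=3))

# The classic analogy (king - man + woman ≈ queen) only emerges with vectors
# trained on a much larger corpus, e.g. pre-trained vectors loaded elsewhere:
# model.most_similar(positive=["king", "woman"], negative=["man"], topn=1)
```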
2. GloVe (Global Vectors for Word Representation)