ExamSearch is a website that allows Cambridge students to go through past Cambridge exam papers and search them for content to study.
Currently, ExamSearch only supports Biology 9700 Multiple Choice Papers, but support for free-response and other types of papers in Biology 9700 as well as for other subjects is on the roadmap.
Note that the following installation instructions are only for creating your own instance of the server. A working copy of this project can be found at my website.
If you'd like your own version of the parser, continue following the directions. Clone the repository, and install the requirements listed below.
This is a Python project, so you will need Python installed. Make sure you get Python 3!
Follow installation instructions here. This is the main OCR tool used. Make sure to add this to the path! Otherwise, you will have issues.
Download the latest binary for Windows here.
You can find binaries for other systems with a Google search.
For this library, you just need to extract the package and add the bin folder to the path.
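Since both tools above must be on the path, a quick check before going further can save some debugging. The following is a minimal sketch using only the standard library; the helper name missing_from_path is made up for illustration, "tesseract" is the usual command name for tesseract-ocr, and "pdftoppm" is an assumption about which binary the second package provides, so adjust the names to whatever you installed.

```python
import shutil


def missing_from_path(executables):
    """Return the names in `executables` that cannot be found on the PATH."""
    return [name for name in executables if shutil.which(name) is None]


if __name__ == "__main__":
    # "tesseract" is the OCR engine's command; "pdftoppm" is an ASSUMED name
    # for the second tool's binary and may differ on your system.
    for name in missing_from_path(["tesseract", "pdftoppm"]):
        print(f"{name} was not found on the PATH")
```

If the script prints nothing, both commands were found.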
Python 3
pip install pyocr
PyOCR is necessary for the majority of the heavy lifting as it is the wrapper between tesseract-ocr and Python. Installing PyOCR also installs Pillow, which is also used.
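To give an idea of what the wrapper looks like in practice, here is a minimal sketch of reading text from a single image with PyOCR and Pillow. It is not the code used in main.py; the function name ocr_image and the default language "eng" are assumptions made for illustration.

```python
def ocr_image(image_path, lang="eng"):
    """Return the text recognised in one image file (hypothetical helper)."""
    # Imported lazily so this file can be loaded before the packages are installed.
    import pyocr
    import pyocr.builders
    from PIL import Image

    tools = pyocr.get_available_tools()
    if not tools:
        raise RuntimeError("No OCR tool found; is tesseract on the PATH?")
    tool = tools[0]  # use the first OCR tool that PyOCR reports

    with Image.open(image_path) as image:
        return tool.image_to_string(
            image,
            lang=lang,
            builder=pyocr.builders.TextBuilder(),
        )
```

If PyOCR reports no tools at all, that is usually the sign that tesseract-ocr is not on the path.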
Python 3
pip install pdf2image
If you're stuck installing pdf2image, this is the GitHub page. It also details the dependencies for pdf2image.
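Once pdf2image is installed, converting a paper into page images might look like the sketch below. It is an illustration rather than the project's own code: the helper names, the output naming scheme, the 300 dpi default and the example file name in the comment are all assumptions.

```python
from pathlib import Path


def page_image_path(pdf_path, page_number, out_dir):
    """Build the output file name for one page, e.g. paper_page002.png."""
    stem = Path(pdf_path).stem
    return Path(out_dir) / f"{stem}_page{page_number:03d}.png"


def save_pdf_pages(pdf_path, out_dir, dpi=300):
    """Render every page of a PDF to a PNG file and return the saved paths."""
    # Imported lazily so this file can be loaded before pdf2image is installed.
    from pdf2image import convert_from_path

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    pages = convert_from_path(str(pdf_path), dpi=dpi)  # list of PIL images
    for number, page in enumerate(pages, start=1):
        target = page_image_path(pdf_path, number, out_dir)
        page.save(target, "PNG")
        saved.append(target)
    return saved


# Hypothetical usage: save_pdf_pages("papers/9700_s19_qp_11.pdf", "pages")
```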
Python 3
pip install nltk
Before you run main.py, make sure you download the stopwords corpus via
import nltk
nltk.download('stopwords')
You only need to run this once before you run main.py.
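For context, stopwords are common words such as "the" or "of" that carry little meaning on their own. The sketch below shows one way the downloaded corpus could be used to strip them from a piece of text; the function names are made up for illustration, and how main.py actually uses the corpus may differ.

```python
def load_english_stopwords():
    """Load NLTK's English stopword list (needs the one-time download above)."""
    # Imported lazily so this file can be loaded before nltk is installed.
    from nltk.corpus import stopwords

    return set(stopwords.words("english"))


def remove_stopwords(text, stopword_set):
    """Lowercase the text, split on whitespace, and drop any stopwords."""
    return [word for word in text.lower().split() if word not in stopword_set]


# Hypothetical usage:
#   words = remove_stopwords("What is the function of the ribosome",
#                            load_english_stopwords())
```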
- First, run initializeDirectories() from main.py to download all past Cambridge papers from PapaCambridge (currently only the Biology section). This will allow you to proceed with the next steps.
- Grab any multiple choice PDF file path and feed it through pdfToText(filePath).
- Run getMultipleChoiceQuestions(filePath) to get the multiple choice questions.
- Run tagImage(filePath) to get image tags for each question into a database.
- Finally, you may run search() to search for questions! search() has been moved to app/routes.py because it is now part of the website search algorithm.
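The per-paper steps above can be sketched as a small helper that calls them in order. This is only an illustration: process_paper is a hypothetical name, the function names and the single filePath argument come from the list above, and any return values are ignored because they are not documented here.

```python
def process_paper(pipeline, file_path):
    """Run the per-paper steps on one multiple choice PDF, in the listed order.

    `pipeline` is any object exposing the three functions, for example the
    imported main module.
    """
    pipeline.pdfToText(file_path)                   # OCR the PDF into text
    pipeline.getMultipleChoiceQuestions(file_path)  # split the text into questions
    pipeline.tagImage(file_path)                    # store image tags per question


# Hypothetical usage, run from the repository root:
#   import main
#   main.initializeDirectories()   # downloads the past papers first
#   process_paper(main, "path/to/a/multiple_choice_paper.pdf")
```

Passing the module in as an argument keeps the helper easy to try out with a stand-in object before running the real, slower steps.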
Happy studying!
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Currently, Exam Search is able to parse a majority of Biology 9700 Multiple Choice papers.
- Expand Exam Search to all Biology 9700 papers
- Expand Exam Search to other subjects like A Level History
- Pull up the mark scheme alongside the question
- Index the Biology textbook and pull up relevant paragraphs from the text to be used to answer the question
- This feature is the end goal for Exam Search at least for Biology 9700
- Work on UI/Create a good-looking application
For further details, visit this page for the complete roadmap.
Head Developer - Nithish Narasimman