ResumeGPT

ResumeGPT is a Python package designed to extract structured information from a PDF Curriculum Vitae (CVs)/Resumes documents. It leverages OCR technology and utilizes the capabilities of ChatGPT AI language model (GPT-3.5 and GPT-4) to extract pieces of information from the CV content and organize them in a structured Excel-friendly format.

Features

Extracts text from PDF CVs: Uses OCR technology to extract the CV's PDF content as text.
Extracts information using GPT: Sends the extracted text to GPT for information extraction according to a predefined prompt.
Structures information to Excel file: Processes the extracted information from GPT and structures it from JSON into a Excel-friendly format.

Module Overview

OCR Reader (CVsReader module): This process reads CVs from a specified directory and extracts the text from PDF files.
Engineered Prompt and ChatGPT Pipeline (CVsInfoExtractor module): This process takes as an input the extracted text generated by the OCR Reader and extracts specific information using ChatGPT in a JSON format.
Extracted Information Structuring (CVsInfoExtractor module): This process takes the JSON output from the ChatGPT Pipeline, which contains the information extracted from each CV. This information is then structured and organized into a clear and easy-to-understand Excel format.

Requirements

Python: Python 3.8 or newer.
GPT-4 API Access: If GPT-3.5 tokens don not fit the CV content, the package uses GPT-4 to extract the information from the CVs, so you'll need an access to the GPT-4 API.

How to Use

Prepare Your CVs: Make sure all the CVs you want to analyze are in the “CVs” directory.
Run the Script: Run the following scripts. This will clone the project, prepare the environment, and execute the code.

Clone the project

git clone https://github.com/Aillian/ResumeGPT.git

CD project directory

cd ResumeGPT

Create a virtual environment

python -m venv resumegpt_venv

Activate the virtual environment

source resumegpt_venv/Scripts/activate

Upgrade pip version

pip install --upgrade pip

Install requirements.txt

pip install -r requirements.txt

CD codes directory

cd ResumeGPT

Run main.py and provide the 3 required arguments:
- CVs Directory Path: use "../CVs" to read from 'CVs' directory
- Openai API Key: should include GPT-4 model access
- Desired Positions: written like the following "Data Scientist,Data Analyst,Data Engineer"

python main.py "../CVs" "sk-ldbuDCjkgJHiFnbLVCJvvcfKNBDFJTYCVfvRedevDdf" "Data Scientist, Data Analyst, Data Engineer"

Examine the Results: After the script finishes, you will find the output in “Output” directory which are two file (CSV & Excel) of the extracted information from each CV.

Extracted Information

ResumeGPT is designed to extract 23 features from each CV:

Education:

Education Bachelor University: name of university where bachelor degree was taken
Education Bachelor GPA: GPA of bachelor degree (Example: 4.5/5)
Education Bachelor Major: major of bachelor degree
Education Bachelor Graduation Date: date of graduation from bachelor degree (in format: Month_Name, YYYY)
Education Masters University: name of university where masters degree was taken
Education Masters GPA: GPA of masters degree (Example: 4.5/5)
Education Masters Major: major of masters degree
Education Masters Graduation Date: date of graduation from masters degree (in format: Month_Name, YYYY)
Education PhD University: name of university where PhD degree was taken
Education PhD GPA: GPA of PhD degree (Example: 4.5/5)
Education PhD Major: major of PhD degree
Education PhD Graduation Date: date of graduation from PhD degree (in format: Month_Name, YYYY)

Work Experience:

Years of Experience: total years of experience in all jobs (Example: 3)
Experience Companies: list of all companies that the candidate worked with (Example: [Company1, Company2])
Top 5 Responsibilities/Projects Titles: list of top 5 responsibilities/projects titles that the candidate worked on (Example: [Project1, Project2, Project3, Project4, Project5])

Courses/Certifications:

Top 5 Courses/Certifications Titles: list of top 5 courses/certifications titles that the candidate took (Example: [Course1, Course2, Course3, Course4, Course5])

Skills:

Top 3 Technical Skills: list of top 3 technical skills (Example: [Skill1, Skill2, Skill3])
Top 3 Soft Skills: list of top 3 soft skills (Example: [Skill1, Skill2, Skill3])

Employment Status:

Current Employment Status: one of the following (Full-time, Part-Time, Intern, Freelancer, Consultant, Unemployed)

Personal Information:

Nationality: nationality of the candidate
Current Residence: where the candidate currently live

Suitable Position:

Suitable Position: the most suitable position for the candidate, this will be taken from the user and dynamically replaced in the prompt

Rating Score:

Candidate Rating (Out of 10): score of the candidate suitability for the classified position in point 19 (Example: 7.5)

This information is then organized into a structured Excel file.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Possible additional features and optimizations:

Add additional features to the prompt.
Handling exceeded tokens limit, by further cleansing cv content.
The code tries to call gpt-3.5-turbo model first, if token limit exceeds the acceptable limit, it calls gpt-4. But this has some problems: 1- it is costly 2- what if the provided API key does not have access to gpt-4 model?
Catching GPT-4 "service is down" error by calling the API again after some sleeping time.
Can the prompt be reduced so we save some tokens for the cv content?
Separating "Information To Extract" in the prompt to a different file so the user gets the flexibility of adding new features and then dynamically imputing it into the prompt after that the added features in "CVs_Info_Extracted.csv" should be reflected as column names in the csv file.
Additional errors handling.
What about extending the usage to other LLMs?

License

ResumeGPT is released under the MIT License. See the LICENSE file for more details.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ResumeGPT

Features

Module Overview

Requirements

How to Use

Extracted Information

Contributing

License

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
CVs		CVs
Engineered_Prompt		Engineered_Prompt
Output		Output
ResumeGPT		ResumeGPT
ResumeGPT_Workflow		ResumeGPT_Workflow
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
requirements.txt		requirements.txt

License

Aillian/ResumeGPT

Folders and files

Latest commit

History

Repository files navigation

ResumeGPT

Features

Module Overview

Requirements

How to Use

Extracted Information

Contributing

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages