
What Challenges Did We Face?
Dynamic Formats
- Resumes come in many different formats and layouts
- Each resume can have a unique structure and organization
- Traditional PDF parsers struggle to accurately extract data from such varied formats
Word-to-Word Accurate Work Experience Extraction
- Exact word-for-word extraction is critical: no words can be missed or added
- The final Innowhyte resume must keep the content identical and change only the layout
- Extraction works well for short resumes but becomes harder to get right on longer documents
Information Ambiguity
- Resumes often contain ambiguous or missing information
- Critical details like employment dates and client names may be omitted
- The final Innowhyte resume requires complete and unambiguous information
How Did We Solve Them?
Layout Analysis & Human-in-the-Loop
- Built custom layout analysis using Visual Language Models (VLMs) to understand the visual layout of each resume
- Automatically categorized each resume as simple or complex based on that layout analysis
- Used human-in-the-loop annotation on complex resumes to guide the order in which the PDF is parsed
Benefit
This approach lets our system adapt to and process complex resume layouts that it has not encountered before.
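The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the `LayoutRegion` and `LayoutAnalysis` schemas, the "more than one column or any table means complex" rule, and the route names are all assumptions made for the example. In practice the layout analysis would come from the VLM.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LayoutRegion:
    """One block detected on a resume page (hypothetical schema)."""
    label: str   # e.g. "header", "experience", "skills"
    column: int  # 0-based index of the column the block sits in


@dataclass
class LayoutAnalysis:
    """Hypothetical container for what a VLM-based layout analysis returns."""
    regions: List[LayoutRegion]
    has_tables: bool = False


def classify_layout(analysis: LayoutAnalysis, max_columns: int = 1) -> str:
    """Return "simple" or "complex" using an illustrative rule (an assumption)."""
    columns = {region.column for region in analysis.regions}
    if len(columns) > max_columns or analysis.has_tables:
        return "complex"
    return "simple"


def route(analysis: LayoutAnalysis) -> str:
    """Send complex layouts to human annotation, everything else to parsing."""
    if classify_layout(analysis) == "complex":
        return "human_annotation"
    return "automatic_parsing"
```

A single-column resume would be routed straight to parsing, while a two-column or table-heavy resume would first go to an annotator who marks the reading order.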
Sequential Chunked Extraction
- AI models perform better with smaller, focused prompts
- PDF content is meaningfully chunked into smaller sections
- Each chunk is sequentially processed by the LLM to extract work experience
- This approach achieves word-for-word accuracy in experience extraction
Benefit
This approach improves reliability and keeps the system model-agnostic, so we do not have to maintain different prompts for different models.
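A rough sketch of sequential chunked extraction with a word-for-word check is shown below. It rests on several assumptions: chunks are formed by packing blank-line-separated paragraphs up to a character budget, the LLM call is represented by a caller-supplied `extract_chunk` function (hypothetical, so any model can be plugged in), and "word-for-word" is checked by confirming the extracted text appears verbatim in its source chunk after whitespace normalization.

```python
import re
from typing import Callable, List


def chunk_text(text: str, max_chars: int = 1500) -> List[str]:
    """Split on blank lines, then pack paragraphs into chunks of up to max_chars."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks: List[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate) > max_chars:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def normalize(text: str) -> str:
    """Collapse all whitespace so line wrapping does not affect comparison."""
    return " ".join(text.split())


def is_verbatim(extracted: str, source: str) -> bool:
    """True if the extracted text appears word-for-word in the source."""
    return normalize(extracted) in normalize(source)


def extract_experience(
    text: str,
    extract_chunk: Callable[[str], str],  # stand-in for the LLM call (assumption)
    max_chars: int = 1500,
) -> List[str]:
    """Process chunks in order and reject any output that changes the wording."""
    results: List[str] = []
    for chunk in chunk_text(text, max_chars):
        extracted = extract_chunk(chunk)
        if not extracted.strip():
            continue  # this chunk contained no work experience
        if not is_verbatim(extracted, chunk):
            raise ValueError("extraction altered the source wording")
        results.append(extracted)
    return results
```

Because the model is passed in as a plain function, the same chunking and verification logic can wrap any provider. A failed verbatim check could equally trigger a retry instead of raising an error.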
Validation & Human-after-the-Loop
- Generated the output as a Word document so Recruiters can edit it easily
- Implemented validation checks that flag missing or ambiguous information with document comments
- Enabled Recruiters to follow up with candidates and update documents as needed
Benefit
This adds trust and control to the system, allowing the Recruiter to review the document and make changes if needed.
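The validation step might look something like the sketch below. The `WorkExperience` fields, the list of required fields, and the `ReviewComment` structure are illustrative assumptions. The sketch only produces the comment text and leaves attaching it to the Word document to whichever document library is in use.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class WorkExperience:
    """One extracted work-experience entry (hypothetical schema)."""
    employer: str
    client: Optional[str] = None
    start_date: Optional[str] = None
    end_date: Optional[str] = None


@dataclass
class ReviewComment:
    """A flag for the Recruiter, to be attached as a document comment."""
    entry_index: int
    field_name: str
    message: str


# Fields that must be present in the final resume (an assumption for this sketch).
REQUIRED_FIELDS = ("client", "start_date", "end_date")


def validate(entries: List[WorkExperience]) -> List[ReviewComment]:
    """Return one comment per missing or blank required field."""
    comments: List[ReviewComment] = []
    for index, entry in enumerate(entries):
        for name in REQUIRED_FIELDS:
            value = getattr(entry, name)
            if value is None or not value.strip():
                label = name.replace("_", " ")
                comments.append(
                    ReviewComment(
                        entry_index=index,
                        field_name=name,
                        message=f"Missing {label} for {entry.employer}; "
                                "please confirm with the candidate.",
                    )
                )
    return comments
```

Each comment carries the index of the entry it refers to, so it can be anchored next to the relevant paragraph when the document is generated.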
Let's look at the workflow in detail

Future Research & Improvements
- As models improve, we would like to experiment with passing the PDF directly to the LLM to see whether it works better and simplifies the workflow.