QA Lead — AI Systems & Models Testing
Quality Assurance • Artificial Intelligence • Contract Position
Contract | Montreal, QC | AI / ML Testing | LLM / RAG / LangChain
ABOUT THE ROLE
We are seeking an experienced QA Lead with deep expertise in AI systems testing to join our team on a contract basis in Montreal, Québec. This role sits at the intersection of quality engineering and artificial intelligence, requiring hands-on proficiency in LLM behavior analysis, RAG pipeline validation, and modern AI orchestration frameworks. You will own the end-to-end test strategy for complex AI products and help define quality standards in a rapidly evolving space.
MUST-HAVE SKILLS
- Proven QA leadership experience designing and executing test strategies for AI/ML systems or LLM-powered applications.
- Strong understanding of LLM internals: tokenization, embeddings, attention mechanisms, and inference behavior, with the ability to anticipate and diagnose failure modes.
- Hands-on experience with prompt engineering — constructing effective prompts, detecting hallucinations, and evaluating outputs across accuracy, tone, coherence, and bias dimensions.
- Experience testing RAG pipelines and knowledge base integrations, including validation of data quality and retrieval accuracy as they impact model outputs.
- Familiarity with vector database mechanics: similarity search thresholds, embedding drift, near-duplicate documents, and sparse vs. dense embeddings.
- Practical experience with LangChain and/or LangGraph — able to read chain/graph construction code, identify failure points, and write test harnesses (an illustrative harness sketch follows this list).
- Ability to validate MCP (Model Context Protocol) integration points, including tool availability and error-handling scenarios.
- Proficiency applying generative AI evaluation metrics and establishing quality thresholds appropriate for production AI systems.
- Excellent written and verbal communication in English; bilingualism (English/French) is a plus for the Montreal market.
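
For illustration only, here is a minimal sketch of the kind of test harness this role would own. It assumes a hypothetical generate_answer wrapper around the application's LangChain chain that returns both the model answer and the retrieved context passages; module and function names are placeholders, not our actual codebase. The test checks that an expected fact appears in the answer and that it is supported by the retrieved context, flagging unsupported "correct" answers as potential hallucinations.

    # Illustrative sketch only — my_app.rag.generate_answer is a hypothetical
    # wrapper around the application's LangChain chain, assumed to return
    # (answer_text, retrieved_passages).
    import pytest

    from my_app.rag import generate_answer  # hypothetical module under test

    GOLDEN_CASES = [
        # (question, phrase that a grounded answer must contain)
        ("What is the refund window?", "30 days"),
        ("Which regions are supported?", "Canada"),
    ]

    @pytest.mark.parametrize("question,expected_phrase", GOLDEN_CASES)
    def test_answer_is_grounded(question, expected_phrase):
        answer, retrieved_passages = generate_answer(question)

        # The expected fact must appear in the answer...
        assert expected_phrase.lower() in answer.lower()

        # ...and must be supported by at least one retrieved passage;
        # otherwise the "correct" answer may be an unsupported hallucination.
        assert any(expected_phrase.lower() in p.lower() for p in retrieved_passages)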
NICE-TO-HAVE SKILLS
- Experience with bias detection and safety testing frameworks for AI systems.
- Exposure to performance and scalability testing of vector databases under high load.
- Familiarity with CI/CD pipelines for ML model deployment and automated regression testing.
- Knowledge of responsible AI principles and AI governance frameworks.
- Contributions to or experience with open-source AI testing or evaluation tooling (e.g., DeepEval, Ragas, PromptFlow).
- Background in data engineering or data quality practices relevant to AI pipeline inputs.
- Cloud platform experience (AWS, Azure, or GCP) in the context of deploying or testing AI workloads.
KEY RESPONSIBILITIES
- Lead design and execution of comprehensive test strategies across AI systems, including prompt evaluation, output quality assessment, and bias/safety analysis.
- Develop and maintain test harnesses for LangChain and LangGraph-based applications; review chain and graph construction code to proactively surface integration risks.
- Validate RAG pipeline integrity — data ingestion, chunking, retrieval accuracy, and embedding consistency — and define edge-case coverage for vector database interactions (see the retrieval-accuracy sketch after this list).
- Establish and track generative AI quality metrics and thresholds; report on model output quality across multiple evaluation dimensions.
- Collaborate with ML engineers, data scientists, and product teams to embed quality practices throughout the AI development lifecycle.
- Document test findings clearly for both technical and non-technical stakeholders.
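
As an illustration of the retrieval-accuracy validation and quality thresholds described above, the following minimal sketch assumes a hypothetical vector_store client whose search(query, k) returns ranked document IDs; the evaluation set, threshold, and module names are placeholders. It computes mean recall@k over a labeled query set and enforces a production quality bar.

    # Illustrative sketch only — my_app.retrieval.vector_store is a hypothetical
    # retrieval client whose search(query, k) returns ranked document IDs.
    from my_app.retrieval import vector_store  # hypothetical module under test

    # Labeled evaluation set: query -> IDs of documents that should be retrieved.
    EVAL_SET = {
        "how do I reset my password": {"doc_017", "doc_142"},
        "what file formats are supported": {"doc_088"},
    }

    K = 5
    RECALL_AT_K_THRESHOLD = 0.90  # example quality bar, tuned per product

    def recall_at_k(expected_ids, retrieved_ids):
        """Fraction of expected documents found in the top-k results."""
        return len(expected_ids & set(retrieved_ids)) / len(expected_ids)

    def test_retrieval_recall_meets_threshold():
        scores = []
        for query, expected_ids in EVAL_SET.items():
            retrieved_ids = vector_store.search(query, k=K)
            scores.append(recall_at_k(expected_ids, retrieved_ids))

        mean_recall = sum(scores) / len(scores)
        assert mean_recall >= RECALL_AT_K_THRESHOLD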
Contract position based in Montreal, Québec, Canada • On-site / Hybrid