11 Apr | Tigi HR | Guntur
Apply on Kit Job: kitjob.in/job/488rvn
Job Summary
We are seeking an AI QA Engineer to ensure the quality, accuracy, and performance of our enterprise-grade Natural Language to SQL (NL2SQL) pipeline. You will be responsible for validating a complex, multi-stage AI architecture—including semantic routing, LLM-based disambiguation, and query generation—ensuring it securely and accurately translates user intent into valid queries within the BFSI domain.
Experience: 7+ Years
Location: Gurugram
Work Mode: Hybrid - 3 Days WFO
Employment Type: Full-time
Key Responsibilities
- LLM & Pipeline Evaluation: Design and execute automated evaluations for a 4-stage NL2SQL pipeline using LangSmith. Monitor metrics such as structural F1, execution accuracy, latency, and token cost.
- Dataset Management: Create, curate, and maintain benchmark/golden datasets for continuous regression testing of LLM prompts and model outputs.
- Search & Retrieval Testing: Validate precision and recall trade-offs in semantic search and schema discovery, ensuring optimal candidate selection for downstream query generation.
- Failure Analysis & Debugging: Perform root cause analysis across pipeline stages (routing, disambiguation, query generation, execution), identifying issues such as schema mismatches, type/coercion errors, runtime incompatibilities, and query structure failures.
- E2E & API Automation: Develop automated test scripts using Python (Pytest) for backend API testing and Playwright for the React frontend, validating end-to-end user workflows.
- Observability & Debugging: Utilize Grafana and structured JSONL logs to identify pipeline bottlenecks, LLM hallucinations, or prompt degradation.
- Compliance & Security: Ensure the AI pipeline meets strict BFSI data security standards, validating execution safety mechanisms (e.g., runtime capability probing, injection prevention). Design validation rules and guardrails for AI pipelines to prevent invalid query generation and runtime failures.
Required Skills
- AI/LLM Testing: Experience testing LLM applications, RAG (Retrieval-Augmented Generation) pipelines, or NLP models. Familiarity with AI evaluation frameworks (e.g., LangSmith, DeepEval, or similar).
- Languages: Strong proficiency in Python 3.12+ (crucial for integrating with the existing AI backend and Pytest suite). Secondary experience with JavaScript/TypeScript.
- Test Automation: Expertise in API testing (REST) and optional UI automation using Playwright.
- Data & Search: Understanding of Vector Databases (e.g., Milvus, Pinecone) and semantic search concepts (embeddings, hybrid search).
- Data & SQL Validation: Solid understanding of SQL and data validation techniques to verify correctness of complex query outputs.
- Tools & Infrastructure: Git, Docker, CI/CD pipelines, and observability tools (Prometheus/Grafana).
Education
- BE / BTech / MCA / BSc in Computer Science, Data Science, or a related field.
Nice to Have
- Familiarity with Graph Databases (Neo4j) and LangGraph orchestration.
- Experience evaluating foundation LLMs (OpenAI, Anthropic, Google).
- Prior exposure to query languages such as SQL or PURE, or to any functional programming language.
- Experience testing workflows across multiple services or pipelines, with an understanding of failure handling, retries, and system reliability concepts.
- Experience in Banking, Financial Services, or Insurance domains.
- Understanding of data security, compliance, and enterprise database schemas.