Role: GenAI Developer
Total Experience: 6+ years, including 2+ years on GenAI initiatives
Employment Type: Permanent, full-time
Working Model: Hybrid (3 days work from office)
Job Summary:
We are seeking a Senior GenAI Developer with proven expertise in Generative AI technologies, a solid foundation in machine learning, and a strong understanding of data governance. The ideal candidate will have hands-on experience with both cloud-based LLM platforms and on-premise, open-source LLM stacks such as Ollama, llama.cpp, and GGUF-based models. You should also have a good working knowledge of the Model Context Protocol (MCP). You will help architect and implement GenAI-powered products that are secure, scalable, and enterprise-ready.
Key Responsibilities:
- Design, build, and deploy GenAI solutions using both cloud-hosted and on-prem LLMs.
- Work with frameworks such as Hugging Face, LangChain, LangGraph, and LlamaIndex to enable RAG and prompt orchestration.
- Implement private LLM deployments using tools such as Ollama, LM Studio, llama.cpp, GPT4All, and vLLM.
- Design retrieval-augmented generation (RAG) pipelines with context-aware orchestration using MCP.
- Implement and manage Model Context Protocol (MCP) for dynamic context injection, chaining, memory management, and secure prompt orchestration across GenAI workflows.
- Fine-tune open-source models for specific enterprise tasks and optimize inference performance.
- Integrate LLMs into real-world applications via REST, gRPC, or local APIs.
- Ensure secure data flows and proper context management in RAG pipelines.
- Collaborate across data, product, and infrastructure teams to operationalize GenAI.
- Incorporate data governance and responsible AI practices from design through deployment.
Required Skills and Qualifications:
- 6+ years of experience in AI/ML; 2+ years working on GenAI initiatives.
- Experience with OpenAI, Claude, and Gemini; cloud-based LLM services (AWS/GCP/Azure); and open-source LLMs such as Mistral, Llama 2/3, Falcon, and Mixtral.
- Strong hands-on expertise with on-premise LLM frameworks (Ollama, llama.cpp, GGUF models, etc.).
- Hands-on experience with the Model Context Protocol (MCP) for structured prompt orchestration, context injection, and tool execution.
- Proven experience in building and optimizing Retrieval-Augmented Generation (RAG) pipelines, including document chunking, embedding generation, and vector search integration.
- Proficiency in Python and libraries such as Hugging Face Transformers, LangChain, and PyTorch.
- Experience with embedding models and vector databases (FAISS, Pinecone, Weaviate, Qdrant, etc.).
- Familiarity with MLOps, GPU optimization, containerization, and deployment in secure environments.
- Good understanding of data governance: access control, lineage, auditability, and privacy.
Nice to Have:
- Exposure to multi-modal models (image, speech) and Toolformer-style agents.
- Experience integrating AI into enterprise platforms (e.g., ServiceNow, Salesforce, Jira).
- Awareness of inference-acceleration tools (vLLM, DeepSpeed, TensorRT).