Role Summary
The AI Engineer will be responsible for implementing, maintaining, and optimising the technical infrastructure of the AI Centre of Excellence (CoE). This role focuses on hands-on deployment, long-term reliability, and operational excellence, ensuring that all AI labs are secure, scalable, and cost-efficient. Working under the guidance of the AI Consultant (Strategic Lead) and Project Manager (Delivery Lead), the AI Engineer will build and maintain the technical backbone that supports research, teaching, and innovation across the CoE’s core domains.
Location: Bhubaneswar
Engagement: Permanent
Key Responsibilities
Infrastructure Setup & Operations
- Deploy and configure cloud and on-prem GPU environments, storage, and development systems for CoE labs.
- Manage system administration, user provisioning, monitoring dashboards, and uptime tracking.
- Implement backup, versioning, and disaster-recovery protocols for research data and codebases.
- Maintain and upgrade hardware and software assets in alignment with usage growth and performance needs.
Frameworks & Tool Integration
- Install, configure, and maintain core AI frameworks and SDKs (TensorFlow, PyTorch, Hugging Face, OpenCV, etc.).
- Integrate shared data pipelines, APIs, and reusable frameworks across CoE domains for standardisation.
- Ensure reproducible environments using containers or orchestration tools (Docker, Kubernetes, Conda).
- Support faculty and student teams with environment setup, framework compatibility, and troubleshooting.
Governance, Security & Cost Management
- Implement access control, data-security policies, and anonymisation workflows compliant with Responsible AI standards.
- Monitor GPU and cloud utilisation; optimise resource allocation and maintain cost dashboards.
- Enforce institutional and ethical AI compliance in collaboration with the Project Manager and governance boards.
- Support periodic audits, infrastructure reviews, and reporting to the Steering Committee.
Knowledge Transfer & User Enablement
- Develop lab manuals, configuration runbooks, and onboarding guides for faculty and students.
- Provide first-line technical support and training sessions on AI toolchains, environments, and data handling.
- Document all configurations and procedures to ensure continuity through the Build–Operate–Transfer phases.
- Contribute to university self-sufficiency by mentoring internal staff on ongoing system management.
Skills & Competencies
- Strong applied AI tooling experience (Python, TensorFlow, PyTorch, Hugging Face, scikit-learn).
- Hands-on expertise in environment setup, containerisation, and DevOps tools (Docker, Kubernetes, Git).
- Cloud platform proficiency (AWS, Azure, GCP) with a focus on performance tuning and cost control.
- Familiarity with CI/CD pipelines, code versioning, and reproducibility best practices.
- Working knowledge of information security, access management, and ethical AI compliance.
- Reliable, collaborative, and process-driven with a commitment to long-term CoE operations.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Artificial Intelligence, or a related field.
- 3–4 years’ experience in AI systems engineering, DevOps, or technical lab management.
- 5–6 years’ experience as a software engineer in a development role.
- Experience maintaining GPU clusters, hybrid-cloud research environments, or AI lab infrastructure.
- Demonstrated track record of stability and sustained operational ownership in previous roles.