Cloud Infrastructure Engineer II
A Cloud Infrastructure Engineer II is a mid-level practitioner who owns significant infrastructure components independently — writing production Terraform modules, managing Kubernetes workloads, and diagnosing multi-layer cloud incidents without continuous supervision. They begin influencing infrastructure standards beyond their own immediate work.
Role at a glance
- Typical education: Bachelor's degree in CS, IT, or equivalent experience/bootcamp with portfolio
- Typical experience: 3–6 years
- Key certifications: AWS Solutions Architect, Certified Kubernetes Administrator (CKA), HashiCorp Terraform Associate, Azure Administrator (AZ-104)
- Top employer types: Cloud providers, AI-native companies, large technology companies, enterprise IT departments
- Growth outlook: Strong tailwind driven by emerging specialization in AI infrastructure, GPU provisioning, and model serving.
- AI impact (through 2030): Strong tailwind; demand is expanding as engineers specialize in provisioning GPU clusters, model serving infrastructure, and vector databases to support AI workloads.
Duties and responsibilities
- Provision and maintain core cloud infrastructure components using Terraform: VPC resources, compute, managed databases, and shared services
- Manage Kubernetes workloads including Deployments, StatefulSets, HPA configuration, network policies, and namespace resource governance
- Troubleshoot infrastructure incidents independently: trace network issues, diagnose IAM permission failures, identify performance bottlenecks, and document root cause
- Design and implement observability for infrastructure components: metrics, alerts, dashboards, and synthetic health checks that give early warning of degradation
- Review infrastructure design proposals from application teams and provide feedback on security, cost, and operational implications
- Implement cloud security controls, including IAM least-privilege policies, VPC security group rules, and encryption configurations, and remediate security posture findings
- Optimize cloud costs within owned infrastructure: rightsize compute, configure appropriate storage tiers, identify unused resources, and implement lifecycle policies
- Participate in the on-call rotation for shared infrastructure services: respond to pages, restore service, and complete thorough postmortems
- Mentor Level I engineers on Terraform best practices, cloud networking fundamentals, and systematic troubleshooting approaches
- Maintain accurate infrastructure documentation and runbooks reflecting current state of managed systems
Overview
A Cloud Infrastructure Engineer II is at the point in their career where they can be trusted with real ownership of production systems without a senior engineer on the critical path. They understand what they don't know well enough to ask for help strategically, but they don't need guidance on the standard problems in their domain.
In a typical week, the II-level engineer might spend Monday working through a network connectivity issue that an application team reported: tracing the packet flow from their service to the target, finding that the cause is a missing security group rule on the target's VPC endpoint rather than anything in the application's own VPC, fixing it, and codifying the change in Terraform. Tuesday involves a Kubernetes HPA tuning exercise: a service is scaling aggressively on CPU utilization, but the scaling events don't correlate with any actual latency increase, suggesting that request rate would be a better scaling metric. Wednesday is a Terraform PR review for a junior engineer's new RDS module, with feedback on variable design and a missing encryption configuration.
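The Tuesday HPA scenario can be made concrete with the scaling formula from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below is a simplified illustration only. The function name and the sample numbers are assumptions, and the real controller also accounts for pod readiness, missing metrics, and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_value: float,
                     target_value: float,
                     tolerance: float = 0.1) -> int:
    """Approximate the replica count an HPA would request for one metric."""
    ratio = current_value / target_value
    # Inside the tolerance band the HPA leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# Hypothetical numbers: 4 replicas at 90% average CPU against a 50% target.
cpu_based = desired_replicas(4, 90.0, 50.0)

# Same service measured by request rate: 105 req/s per pod against a
# 100 req/s target, which falls inside the tolerance band.
rate_based = desired_replicas(4, 105.0, 100.0)

print(cpu_based, rate_based)
```

With these made-up inputs the CPU metric asks for twice as many replicas while the request-rate metric asks for none, which is the kind of mismatch that suggests CPU is not tracking user-facing load for this service.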
The mentoring dimension grows at Level II. Explaining why the right Terraform pattern is to use module outputs rather than data source lookups across modules, or why a permissive security group rule is a risk even if it seems low-probability, requires the engineer to have internalized the reasoning rather than just knowing the answer.
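One way to make the security group conversation concrete for a junior engineer is a small check that flags world-open ingress rules. The sketch below is an assumption-laden illustration, not a replacement for a CSPM tool: it works on a hand-written dictionary shaped like the EC2 DescribeSecurityGroups response, makes no AWS calls, uses a hypothetical group ID, treats only port 443 as acceptable for public exposure, and does not handle ICMP specially.

```python
from typing import Any

# CIDR blocks that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(group: dict[str, Any],
                       allowed_ports: frozenset = frozenset({443})) -> list[str]:
    """Return findings for ingress rules open to the world on non-allowed ports."""
    findings = []
    for perm in group.get("IpPermissions", []):
        cidrs = [r["CidrIp"] for r in perm.get("IpRanges", [])]
        cidrs += [r["CidrIpv6"] for r in perm.get("Ipv6Ranges", [])]
        if not OPEN_CIDRS.intersection(cidrs):
            continue  # source is restricted, nothing to flag
        from_port, to_port = perm.get("FromPort"), perm.get("ToPort")
        # Protocol "-1" means all traffic and carries no port range.
        if perm.get("IpProtocol") == "-1" or from_port is None:
            findings.append(f"{group['GroupId']}: all ports open to the internet")
            continue
        exposed = set(range(from_port, to_port + 1)) - allowed_ports
        if exposed:
            findings.append(
                f"{group['GroupId']}: ports {from_port}-{to_port} open to the internet"
            )
    return findings


# Hypothetical sample data, not output from a real account.
sample_group = {
    "GroupId": "sg-0123456789abcdef0",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 5432, "ToPort": 5432,
         "IpRanges": [{"CidrIp": "10.0.0.0/16"}]},
    ],
}

findings = open_ingress_rules(sample_group)
print(findings)
```

Only the SSH rule is flagged here: HTTPS is on the allowed list and the database port is restricted to an internal range. Walking through why each rule passes or fails is the reasoning the mentor needs to be able to explain.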
On-call coverage at Level II means being the first responder to infrastructure alerts during rotation. Good on-call outcomes require that the runbooks exist and work, that the monitoring is instrumented correctly to surface the right signal, and that the engineer on call has enough context to act without calling for help on every page. Building and maintaining that context is part of the job at this level.
Qualifications
Education:
- Bachelor's degree in computer science, information technology, or equivalent
- Bootcamp or self-taught background with demonstrable cloud project portfolio is accepted at many employers
- Relevant certifications often signal competency more directly than academic background
Experience benchmarks:
- 3–6 years total experience in cloud infrastructure, DevOps, or systems engineering
- At least 2 years directly operating production cloud environments with real ownership
- Track record of completing infrastructure projects end-to-end, not just contributing to components
Required technical skills:
- Cloud platform: strong working knowledge of AWS, Azure, or GCP including VPC/VNet, IAM, managed compute, databases, and object storage
- Terraform: module design, state management with remote backends, workspace patterns, CI/CD integration for infrastructure pipelines
- Kubernetes: production cluster operations — workload management, networking, storage, RBAC, upgrades
- CI/CD: GitHub Actions or GitLab CI for infrastructure automation
- Observability: Prometheus alert rules and recording rules, Grafana dashboard construction, log query proficiency
- Scripting: Python and Bash for automation tasks; Go is a growing expectation for tooling development
Security:
- IAM policy design: can write least-privilege policies from scratch
- Understands the differences among security groups, NACLs, VPC endpoints, and route tables
- Familiar with CSPM tool findings and knows which are high-priority versus noise
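As a rough illustration of what reviewing a policy for least privilege can look like, the sketch below scans an IAM policy document for Allow statements that use wildcard actions or apply to all resources. It is a minimal check under stated assumptions: the function names, the sample policy, and the bucket name are hypothetical, and it ignores conditions, NotAction, and NotResource, which a real review would need to consider.

```python
import json


def as_list(value):
    """IAM allows a single string or a list; normalize to a list."""
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def wildcard_findings(policy: dict) -> list[str]:
    """Flag Allow statements with wildcard actions or a bare '*' resource."""
    findings = []
    for index, stmt in enumerate(as_list(policy.get("Statement"))):
        if stmt.get("Effect") != "Allow":
            continue
        label = stmt.get("Sid", f"statement {index}")
        for action in as_list(stmt.get("Action")):
            if action == "*" or action.endswith(":*"):
                findings.append(f"{label}: wildcard action {action}")
        if "*" in as_list(stmt.get("Resource")):
            findings.append(f"{label}: applies to all resources")
    return findings


# Hypothetical policy for illustration only.
sample_policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "TooBroad", "Effect": "Allow", "Action": "s3:*", "Resource": "*"},
    {"Sid": "Scoped", "Effect": "Allow",
     "Action": ["s3:GetObject"],
     "Resource": "arn:aws:s3:::example-app-logs/*"}
  ]
}
""")

findings = wildcard_findings(sample_policy)
for finding in findings:
    print(finding)
```

The scoped statement passes because it names one action and one bucket prefix. The broad statement is flagged twice, once for the action and once for the resource.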
Certifications valued:
- AWS Solutions Architect Associate or Professional
- Certified Kubernetes Administrator (CKA)
- HashiCorp Terraform Associate
- Azure Administrator (AZ-104) for Microsoft environments
Career outlook
Cloud Infrastructure Engineer II is a well-defined band at most technology companies, and it's one of the more reliable employment tiers in IT. The combination of hands-on platform skills, production experience, and emerging leadership makes Level II engineers experienced enough to contribute right away, yet early enough in their careers that companies can hire them at scale.
The cloud infrastructure discipline has matured enough that most organizations know what they want from a Level II: someone who can be given a component to own and can be trusted to handle the expected problems without constant oversight. That clarity of expectation makes hiring at this level more systematic than early-career or principal-level roles.
Kubernetes operations have become a near-universal expectation for cloud infrastructure work. Organizations that adopted containers early are now on their third or fourth cluster upgrade cycle and need engineers who've done it before. The CKA certification has become a meaningful baseline signal in this context.
AI infrastructure is creating an emerging specialization within the Level II band. Engineers who invest in understanding GPU cluster provisioning, model serving infrastructure, and vector database operations are differentiating themselves in a growing market. Infrastructure teams at AI-native companies are hiring heavily at the II level.
Career progression from Level II typically runs to Senior Infrastructure Engineer in 2–4 years. Engineers who develop strong architecture instincts may accelerate to Staff/Principal IC tracks; those who develop leadership instincts move toward team lead or engineering manager roles. Senior infrastructure engineers at large public companies earn $180K–$250K+ in total compensation, making the career trajectory financially compelling.
Sample cover letter
Dear Hiring Manager,
I'm applying for the Cloud Infrastructure Engineer II position at [Company]. I've been working as an infrastructure engineer at [Current Company] for three years, where I own our AWS VPC networking layer and am one of two engineers maintaining our EKS cluster fleet.
The VPC work has been the most technically demanding part of the role. When I joined, we had a flat VPC design with a single public and private subnet pair per account. As the number of services grew, we started hitting route table limits and found that our security group rules had become unmanageable. I designed and migrated us to a tiered subnet architecture — separate subnets for application, data, and management tiers with appropriate routing and security group policies for each. The migration ran in phases over 12 weeks without any service interruptions.
On the Kubernetes side, I completed our EKS upgrade from 1.26 to 1.29 earlier this year. The hardest part was the deprecated API removals: we had a handful of manifests using API versions that were removed along the way, and I had to identify and update them across 15 application teams' repositories. I automated the detection with kubectl's server-side dry run (--dry-run=server) against a cluster running the target version, which surfaced everything in one pass.
I'm pursuing my CKA certification and expect to complete it next month. I'm looking for a team where Kubernetes operations are central and the infrastructure complexity is large enough that I'll keep encountering problems I haven't seen before.
Thank you for your consideration.
[Your Name]
Frequently asked questions
- What distinguishes a Level II from a Level I Cloud Infrastructure Engineer?
  A Level I completes assigned tasks within established systems and escalates novel problems. A Level II owns components end-to-end: designing changes, implementing them, handling problems that arise, and iterating without needing a senior engineer on the critical path. The Level II is also expected to improve team-level practices, not just execute within them.
- What Terraform skills are expected at Level II?
  At Level II, basic Terraform fluency is assumed. The distinction is module design: Level II engineers should be able to write well-designed, reusable modules with sensible variable interfaces, appropriate outputs, and sound state management practices. Engineers who can only write flat resource configurations without module abstraction are typically at the Level I threshold.
- How much Kubernetes depth does a Cloud Infrastructure Engineer II need?
  Enough to be the person application teams come to with Kubernetes problems, without needing to escalate to a senior every time. That means understanding pod scheduling, resource requests and limits, the relationship between HPA and cluster autoscaler, NetworkPolicy enforcement, storage class selection, and how ingress controllers route traffic. The CKA exam coverage maps reasonably well to what's expected.
- How does AI tooling affect cloud infrastructure work at this level?
  AI coding assistants have become part of the workflow for most practitioners, accelerating Terraform module authoring, generating first drafts of monitoring configurations, and helping with scripting tasks. More substantively, AI infrastructure workloads (GPU instances, vector databases, model serving endpoints) are a growing category of infrastructure that Level II engineers need to provision and operate.
- Is on-call required for this role?
  For most companies running production cloud infrastructure, yes. Level II engineers typically join the on-call rotation once they're sufficiently familiar with the systems. The intensity varies widely: some rotations page once or twice per week, others multiple times per night. The alerting philosophy and page volume of a specific team are worth asking about during the interview process.