QA Engineer - Load Testing Specialist (2-Month Contract)

Monolith AI
Full-time
On-site
London, United Kingdom
QA Engineer

Position Overview

Monolith AI is seeking an experienced QA Engineer to lead load testing efforts for a critical system release focused on improving concurrency and high request load handling. This fast-paced, short-term engagement requires someone who can quickly understand complex distributed systems, design comprehensive load tests, and work collaboratively with a rapidly growing engineering team to ensure our new environment meets its performance requirements.

Primary Responsibilities

  1. Design and Implement an Automated Load Testing Framework

◦ Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/activities, and AWS service interactions

◦ Create realistic test scenarios simulating concurrent workflow execution patterns, including graph-based workflow orchestration

◦ Build automated test suites that measure system behavior under varying concurrency levels and request loads

  2. Performance Analysis and Bottleneck Identification

◦ Monitor and analyze system performance across the entire stack (API layer, Temporal workers, AWS services)

◦ Identify concurrency limitations in Temporal workflow execution, AWS service limits (Athena, ECS), and inter-component communication

◦ Document performance characteristics, including response times, throughput limits, and failure modes under load

  3. Collaborate on Non-Functional Requirements (NFR) Definition

◦ Work with the Customer Success and Product teams to understand business requirements and translate them into measurable performance criteria

◦ Iterate on acceptable concurrency thresholds, latency targets, and throughput requirements

◦ Validate that proposed NFRs are realistic and achievable given architectural constraints

  4. System Documentation and Knowledge Extraction

◦ Build an understanding of the existing system through code review, discussions with the development team, and exploratory testing

◦ Create clear documentation of test methodologies, results, and recommendations for future testing

  5. Recommendation and Optimization Guidance

◦ Provide actionable recommendations for removing identified bottlenecks

◦ Suggest configuration optimizations for Temporal (worker pools, task queues) and AWS services (Athena concurrency, ECS capacity)

  6. Rapid Communication and Status Reporting

◦ Maintain daily/frequent communication with the Tech Lead regarding project progress, blockers, and findings

◦ Quickly escalate issues that could impact the aggressive timeline

◦ Present findings and recommendations to technical and non-technical stakeholders

  7. Cross-Component Integration Testing

◦ Test complex scenarios involving graph execution triggering node workflows across multiple system boundaries

◦ Validate S3 read/write operations under concurrent load

◦ Ensure inter-component communication (API → Temporal, Temporal Activity → API triggers) performs reliably at scale

Key Performance Indicators

  1. Test Coverage and Execution

◦ Complete an automated load test suite covering all critical components within the first 3 weeks

◦ Execute baseline and progressive load tests that identify maximum sustainable concurrency levels

  2. Bottleneck Identification and Impact

◦ Identify and document the top 5-7 performance bottlenecks with clear impact analysis

◦ Provide actionable remediation recommendations with estimated effort and impact for each bottleneck

  3. NFR Definition and Validation

◦ Collaborate with stakeholders to define measurable NFRs within the first 2 weeks

◦ By project end, validate that the system meets the agreed NFR criteria, or document any gaps against them

  4. Documentation and Knowledge Transfer

◦ Deliver comprehensive test documentation, results analysis, and system performance characteristics

◦ Conduct knowledge transfer sessions ensuring the team can maintain and extend the testing framework

  5. Project Velocity and Communication

◦ Meet weekly milestone targets in this fast-paced 2-month engagement

◦ Maintain a proactive communication rhythm (daily standups, weekly detailed reports to the Tech Lead)

Required Qualifications

Experience:

  • 4+ years of experience in QA/performance testing roles

  • 2+ years of hands-on experience load testing distributed systems and microservices architectures

  • Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)

  • Experience testing workflow orchestration systems (Temporal, Airflow, Prefect, or similar)

  • Demonstrated ability to test systems integrating with AWS services (particularly Athena, ECS, S3)

Technical Skills:

  • Strong proficiency in Python (required for test automation and working with FastAPI/Temporal)

  • Experience with REST API testing and performance validation

  • Understanding of distributed systems concepts: concurrency, queueing, backpressure, rate limiting

  • Familiarity with AWS infrastructure and service limits

  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)

  • Proficiency with Git and CI/CD pipelines

  • Ability to read and understand code in order to design effective tests

Immediate Availability:

  • Ability to start in early January 2025 and commit to a focused 3-month engagement

  • Availability for full-time contract work during project duration

Preferred Qualifications

  • Experience with containerized workloads and Docker/ECS

  • Prior work in fast-paced startup or scale-up environments

  • Experience with infrastructure-as-code (Terraform, CloudFormation)

  • Background in Site Reliability Engineering (SRE) or DevOps practices

  • Familiarity with data processing pipelines and analytics systems

  • Previous contract/consulting experience with rapid knowledge acquisition

  • Experience with graph-based workflow systems or DAG execution engines

  • Knowledge of AWS service limits and optimization strategies

Essential Soft Skills

Self-Direction and Initiative:

  • Ability to operate independently in an ambiguous, fast-moving environment with minimal documentation

  • Proactive problem-solving mindset; doesn't wait for perfect information before taking action

  • Comfortable making pragmatic decisions quickly in a time-constrained project

Communication and Collaboration:

  • Exceptional communication skills for extracting knowledge through conversations with existing team members

  • Ability to translate technical findings into clear, actionable recommendations for diverse audiences

  • Comfortable asking clarifying questions and challenging assumptions respectfully

  • Strong written communication for documentation and status updates

Adaptability and Learning Agility:

  • Quick learner who can rapidly understand complex, poorly documented systems

  • Flexible and comfortable with changing priorities in a 15-person team that's doubling in size

  • Thrives in fast-paced environments with aggressive timelines

Pragmatism and Results Orientation:

  • Focused on delivering practical, actionable outcomes within tight timeframes

  • Understands the balance between thoroughness and speed in a 2-month engagement

  • Comfortable with "good enough" when perfect isn't achievable within constraints

Stakeholder Management:

  • Skilled at managing expectations with technical leadership about realistic timelines and trade-offs

  • Diplomatic when delivering difficult news about performance limitations or bottlenecks

  • Collaborative approach when working with CS and Product on NFR definition

Key Challenges in This Role

  1. Rapid Knowledge Acquisition with Limited Documentation

◦ The existing system lacks comprehensive documentation, requiring you to quickly build understanding through code review, system exploration, and frequent discussions with the development team

◦ Success requires comfort with ambiguity and strong investigative skills

  2. Aggressive Timeline with High Impact

◦ A 3-month timeline to design tests, execute comprehensive load testing, identify bottlenecks, and deliver actionable recommendations is extremely tight

◦ Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical areas are covered

  3. Complex Distributed System with Multiple Integration Points

◦ The system involves multiple layers (FastAPI, Temporal, AWS services) with complex inter-component communication patterns (graph → node workflows)

◦ Must understand the entire stack well enough to design realistic, comprehensive load tests that expose real-world bottlenecks
