Why This Role Exists
T2 is architecturally unlike most SaaS products, and its testing challenges reflect that. The system is a hybrid Java/Python mono-repo with 12+ microservices communicating over REST APIs and RabbitMQ, where AI agents produce non-deterministic outputs, governance rules create combinatorial complexity, and the Planning→Generation→Validation loop can dynamically modify its own execution plan mid-flight. A team of developers writing unit tests alongside their code is necessary but not sufficient — nobody on the team today owns the cross-cutting question: "If I change how the Planning Agent sequences steps, does the Governance Service still correctly apply quality gates, and does the UI correctly reflect the state transition?"
This role is not classical automation testing, nor script-based manual testing: it involves building test infrastructure, creating automated test suites, and conducting quality analysis grounded in a deep understanding of the code. You bridge the gap between the developer and QA disciplines, combining the mindset of a tester with the skills of an engineer.
About Transform
At RWS, we enable the world’s largest enterprises to communicate with global audiences through cutting-edge language technology, AI-driven solutions, and expert services. Our RWS Transform division empowers organizations to accelerate digital transformation, scale global content, and unlock growth in every market.
What Makes T2's QA Uniquely Hard
Non-deterministic agent outputs. The same input to the Content Analysis Agent can produce subtly different classifications, risk scores, or entity extractions depending on LLM response variance. The Quality Evaluation Agent's confidence scores fluctuate. The Planning Agent may sequence steps differently based on slight variations in content analysis output. Traditional assertion-based testing ("expect exactly this output") doesn't work. T2 needs property-based testing ("the output must have these structural properties"), boundary testing ("at what confidence threshold does the system switch autonomy levels?"), and statistical testing ("over N runs, does quality stay within acceptable variance?").
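A minimal sketch of what these styles look like in practice, with a stubbed stand-in for the Content Analysis Agent (the `analyze` function and its jitter are invented for illustration, not T2's real agent): structural properties are asserted instead of exact values, and variance is checked statistically over many runs.

```python
import statistics

# Hypothetical stand-in for the Content Analysis Agent. Assume it returns a
# classification, a 0-100 risk score, and extracted entities; the jitter
# simulates LLM response variance between runs.
def analyze(content: str, seed: int) -> dict:
    jitter = (hash((content, seed)) % 7) - 3
    return {
        "classification": "marketing",
        "risk_score": max(0, min(100, 50 + jitter)),
        "entities": ["RWS"],
    }

def check_structural_properties(result: dict) -> None:
    # Property-based style: assert the shape and invariants, not exact values.
    assert result["classification"] in {"marketing", "legal", "technical"}
    assert 0 <= result["risk_score"] <= 100
    assert isinstance(result["entities"], list) and result["entities"]

def risk_score_variance(content: str, runs: int = 50) -> float:
    # Statistical style: over N runs, every run must satisfy the structural
    # properties and the score spread must stay within tolerance.
    scores = []
    for seed in range(runs):
        result = analyze(content, seed)
        check_structural_properties(result)
        scores.append(result["risk_score"])
    return statistics.pstdev(scores)

assert risk_score_variance("Q3 product launch brief") < 5.0
```

A library like Hypothesis would generate the inputs instead of a fixed seed loop, but the assertion style is the same.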
Cross-language, cross-service state flows. A single transformation triggers a cascade: the Java Orchestrator invokes the Python Content Analysis Agent, which publishes results to RabbitMQ, which triggers the Planning Agent, whose plan is consumed by the Java Transformation Service, which dispatches to Python Transformation Agents, whose outputs feed the Quality Evaluation Agent, whose results flow back through RabbitMQ to the Java layer, which pushes updates to the React UI. A bug anywhere in this chain may only manifest as a wrong state in the UI or a silently incorrect quality score. Someone needs to write end-to-end tests that trace these flows and catch breaks across service boundaries.
Governance rule combinatorics. Six categories of execution guidelines, each with conditions based on content type, language, tags, org scope, and priority. Rules can overlap, conflict, or cascade. The Planning Agent consumes them all and reasons about what applies. Testing individual rules is straightforward; testing rule interactions at scale — and catching regressions when a new rule type is added — requires systematic test generation and architectural understanding of the governance model.
The iteration loop. Execution isn't linear. When quality gates fail, the plan evolves: new steps get added, parameters get modified, additional LLM passes run, then quality re-evaluates. This dynamic plan mutation creates complex state paths. Testing needs to cover not just the happy path but failure-recovery-retry sequences that are hard to predict from reading individual service code.
Multi-tenancy and data isolation. T2 runs within the LC perimeter with shared infrastructure (RabbitMQ vhosts, Redis databases, EKS namespace). Testing must verify that tenant A's governance rules never leak into tenant B's transformation, that content isolation holds across the Content Lake, and that concurrent transformations across tenants don't interfere.
What This Person Owns
Test automation architecture. Designing and maintaining the test pyramid for T2: what gets tested at unit level within each service (developers own this, QA engineer reviews and raises the bar), what gets tested at integration level across service boundaries, and what gets tested end-to-end. Owning the test infrastructure — Docker Compose test environments, mock LLM services with deterministic responses, RabbitMQ test harnesses, MongoDB test data seeding. Extending the existing selective testing strategy (which already reduces CI from 30 min to 5-10 min) to cover the new agent services and cross-boundary flows.
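The "mock LLM services with deterministic responses" piece might look like the following sketch: a fixture store keyed by a stable prompt digest, so agent code paths run reproducibly in CI. The `complete` entry point and fixture format are assumptions for illustration.

```python
import hashlib
import json

# Minimal deterministic mock LLM service. Fixtures map a stable prompt digest
# to a canned response, so the same test input always yields the same output.
class MockLLM:
    def __init__(self, fixtures: dict):
        self.fixtures = fixtures
        self.calls = []  # recorded prompts, for later assertions

    @staticmethod
    def digest(prompt: str) -> str:
        return hashlib.sha256(prompt.encode()).hexdigest()[:12]

    def complete(self, prompt: str) -> str:
        self.calls.append(prompt)
        key = self.digest(prompt)
        if key not in self.fixtures:
            raise KeyError(f"No fixture for prompt digest {key}; record one first")
        return self.fixtures[key]

# Usage: seed a fixture for a known prompt, then exercise the agent code path.
prompt = "Classify this content: Q3 launch brief"
mock = MockLLM({MockLLM.digest(prompt): json.dumps({"classification": "marketing"})})
result = json.loads(mock.complete(prompt))
assert result["classification"] == "marketing"
```

Failing loudly on an unknown prompt (rather than returning a default) is the useful design choice here: it surfaces prompt drift the moment an agent's prompt template changes.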
API contract testing. The Java↔Python boundary is a high-risk seam. This person owns contract tests that verify both sides agree on message shapes, error handling, and streaming behavior. When the Python Agent Engineer changes an agent's response schema, these tests catch it before it breaks the Java Orchestrator.
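A toy version of such a contract check, assuming a shared field-name-to-type schema (the real project might use Pact or JSON Schema; the field names here are invented): the Python producer and the Java consumer both validate sample messages against the same contract, so a schema change fails on whichever side didn't update.

```python
# Hypothetical shared contract for an agent response: field name -> required type.
AGENT_RESPONSE_CONTRACT = {
    "agent": str,
    "status": str,
    "confidence": float,
    "payload": dict,
}

def violates_contract(message: dict, contract: dict) -> list:
    """Return human-readable contract violations (empty list = message is valid)."""
    errors = []
    for field, expected in contract.items():
        if field not in message:
            errors.append(f"missing field: {field}")
        elif not isinstance(message[field], expected):
            errors.append(f"{field}: expected {expected.__name__}, "
                          f"got {type(message[field]).__name__}")
    for field in message:
        if field not in contract:
            errors.append(f"unexpected field: {field}")
    return errors

# A response the Python agent emits; the Java consumer side would run the
# equivalent check against what it expects to receive.
sample = {"agent": "content-analysis", "status": "ok",
          "confidence": 0.92, "payload": {"entities": ["RWS"]}}
assert violates_contract(sample, AGENT_RESPONSE_CONTRACT) == []
```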
Agent behavior testing. Developing strategies for testing non-deterministic systems: snapshot testing with golden datasets (known content → expected classification ranges), property-based testing (quality scores always between 0 and 100, confidence always accompanies quality, plan steps always include at least one generation and one evaluation step), and regression suites that detect when agent behavior drifts after prompt changes or model upgrades.
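Golden-dataset regression testing with tolerance ranges might be sketched like this (the dataset entries, score ranges, and the stubbed `evaluate` function are invented examples, not T2's real golden data): instead of an exact snapshot, each golden entry records an acceptable range, and a drift check reports which entries fell outside it.

```python
# Invented golden dataset: (content id, expected classification, score range).
GOLDEN = [
    ("doc-001", "marketing", (70, 90)),
    ("doc-002", "legal", (85, 100)),
]

def evaluate(content_id: str):
    # Hypothetical stand-in for the Quality Evaluation Agent.
    canned = {"doc-001": ("marketing", 81), "doc-002": ("legal", 93)}
    return canned[content_id]

def drifted(golden=GOLDEN) -> list:
    """Return ids whose current behavior falls outside the golden tolerance range."""
    failures = []
    for content_id, expected_cls, (lo, hi) in golden:
        cls, score = evaluate(content_id)
        if cls != expected_cls or not lo <= score <= hi:
            failures.append(content_id)
    return failures

assert drifted() == []
```

Running this after a prompt change or model upgrade turns "the agent feels different" into a concrete list of regressed cases.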
Event flow validation. End-to-end tests that trace a transformation from API request through the entire service chain to UI state. This includes: content uploaded → analysis triggered → plan generated → execution started → quality evaluated → iteration triggered (if quality fails) → completion → UI reflects correct final state. These tests need to handle async RabbitMQ flows and verify event ordering, completeness, and payload integrity.
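The ordering-and-completeness check at the end of such a test could be sketched as below. In a real test the events would be captured from RabbitMQ by a test consumer; here a recorded list stands in for that capture, and the event type names are illustrative.

```python
# Required events for one transformation, in order. Extra events (e.g. an
# iteration retry) may appear in between, but these must occur as an ordered
# subsequence.
EXPECTED_ORDER = [
    "content.uploaded", "analysis.completed", "plan.generated",
    "execution.started", "quality.evaluated", "transformation.completed",
]

def validate_flow(events: list, expected=EXPECTED_ORDER) -> list:
    """Return violations for a captured event trail (empty list = flow is valid)."""
    errors = []
    ids = {e["transformation_id"] for e in events}
    if len(ids) != 1:
        errors.append(f"mixed transformation ids: {sorted(ids)}")
    names = (e["type"] for e in events)
    for want in expected:
        # 'in' consumes the generator, so this enforces subsequence ordering.
        if want not in names:
            errors.append(f"missing or out-of-order event: {want}")
    return errors

# A trail that includes one quality-gate failure and retry still validates.
trail = [{"transformation_id": "t-1", "type": t} for t in [
    "content.uploaded", "analysis.completed", "plan.generated",
    "execution.started", "quality.evaluated", "iteration.triggered",
    "quality.evaluated", "transformation.completed",
]]
assert validate_flow(trail) == []
```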
Governance rule interaction testing. Systematic testing of rule combinations: conflicting rules, overlapping scopes, priority resolution, rule inheritance through org hierarchy. This is where understanding the governance data model (execution guidelines, quality controls, scope mechanics, conflict detection) matters — the QA engineer needs to generate test scenarios that exercise the combinatorial space, not just test the scenarios a developer thought of.
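Generating scenarios over the combinatorial space, rather than hand-picking cases, might start as simply as a cartesian product over the governance dimensions named above (the dimension values here are invented examples):

```python
import itertools

# Illustrative governance dimensions; each scenario is one combination that a
# rule-interaction test would then exercise against the Planning Agent.
DIMENSIONS = {
    "content_type": ["article", "email", "landing_page"],
    "language": ["en", "de"],
    "org_scope": ["global", "business_unit", "team"],
    "priority": [1, 5, 10],
}

def rule_scenarios(dimensions=DIMENSIONS):
    """Yield every combination of dimension values as a scenario dict."""
    keys = list(dimensions)
    for combo in itertools.product(*(dimensions[k] for k in keys)):
        yield dict(zip(keys, combo))

scenarios = list(rule_scenarios())
assert len(scenarios) == 3 * 2 * 3 * 3  # full cartesian product: 54 scenarios
```

As dimensions grow, the full product explodes; a pairwise (all-pairs) reduction is the usual next step, but the generator-driven approach is the point: adding a new rule type means adding a dimension, not hand-writing new cases.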
Performance and reliability testing. Load testing transformation pipelines — what happens with 50 concurrent transformations across 5 tenants? Does the agent worker pool scale correctly? Do RabbitMQ queues back up? Does the Quality Evaluation Agent's token budget mechanism actually prevent runaway costs? Does the WebSocket push notification service maintain connection stability under load?
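The "50 concurrent transformations across 5 tenants" scenario can be sketched in miniature with a thread pool against a stubbed pipeline. A real load test would drive deployed services with k6, Gatling, or Locust; this only shows the scenario shape and the kind of assertions it would make.

```python
import concurrent.futures
import time

def run_transformation(tenant: str, job: int) -> dict:
    # Stand-in for a full transformation pipeline call.
    time.sleep(0.01)
    return {"tenant": tenant, "job": job, "status": "completed"}

def load_scenario(tenants: int = 5, jobs_per_tenant: int = 10):
    """Fire tenants * jobs_per_tenant transformations concurrently."""
    started = time.monotonic()
    with concurrent.futures.ThreadPoolExecutor(max_workers=50) as pool:
        futures = [
            pool.submit(run_transformation, f"tenant-{t}", j)
            for t in range(tenants) for j in range(jobs_per_tenant)
        ]
        results = [f.result() for f in futures]
    elapsed = time.monotonic() - started
    assert all(r["status"] == "completed" for r in results)
    assert len({r["tenant"] for r in results}) == tenants  # no tenant starved
    return len(results), elapsed

count, elapsed = load_scenario()
assert count == 50
```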
CI/CD quality gates. Owning what must pass before code merges: extending the existing GitHub Actions CI pipeline with integration test stages that cover cross-service flows, defining quality thresholds (coverage targets, test stability requirements), and maintaining the dependency graph that drives selective testing as new services get added.
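The dependency graph that drives selective testing reduces, at its core, to a reverse-reachability lookup: given the changed services, which services' tests must run? The service names and graph below are illustrative, not T2's real topology.

```python
# Illustrative graph: service -> set of services it depends on.
DEPENDS_ON = {
    "transformation-service": {"orchestrator", "planning-agent"},
    "quality-agent": {"transformation-service"},
    "ui": {"orchestrator"},
}

def affected_services(changed: set, graph=DEPENDS_ON) -> set:
    """Changed services plus everything that (transitively) depends on them."""
    affected = set(changed)
    while True:
        extra = {svc for svc, deps in graph.items()
                 if deps & affected and svc not in affected}
        if not extra:
            return affected
        affected |= extra

# Touching the orchestrator must retest everything downstream of it.
assert affected_services({"orchestrator"}) == {
    "orchestrator", "transformation-service", "quality-agent", "ui"}
```

A CI job would derive the `changed` set from the diff's file paths and run only the test suites for the affected set, which is how a 30-minute full run shrinks to the 5-10 minute selective run mentioned above.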
What This Person Is Not
Not a manual tester who executes test cases from a spreadsheet. Not a pure performance engineer who only runs load tests. Not a traditional QA who tests by clicking through the UI (though they should be capable of validating UI flows when needed). This is a developer who tests — someone who reads the Java Orchestrator code and the Python agent graphs to understand state transitions, then writes automated tests that verify those transitions work correctly together.
Technical Profile
Java + Python — must be able to read and write tests in both languages; doesn't need to be senior in either, but must be fluent enough to understand the code paths they're testing
Test automation frameworks — JUnit 5, pytest, Vitest/Testing Library; experience designing test suites, not just writing individual tests
API and contract testing — REST API testing (RestAssured or similar), gRPC contract testing, schema validation
Docker and container orchestration — writing and maintaining Docker Compose test environments, service health checks, test data lifecycle management
Message queue testing — RabbitMQ consumer/producer test patterns, event ordering verification, async flow testing
CI/CD pipelines — GitHub Actions workflow authoring, understanding of selective testing strategies, quality gate configuration
Performance testing — tools like k6, Gatling, or Locust; experience designing load scenarios for distributed systems
Testing non-deterministic systems — property-based testing, snapshot testing with tolerance ranges, statistical assertion strategies (this is rare and very valuable for T2)
MongoDB — enough to write test data fixtures and verify data state post-test
Nice to have: experience with LLM/AI testing strategies, Kubernetes test environments, security testing basics, experience in a mono-repo environment
Collaboration Pattern
Works across the entire team — this is the most horizontally connected role. Reviews PRs for test quality across all services. Pairs with the Python Agent Engineer on agent behavior test strategies. Works with the senior backend engineers to understand orchestration state machines and write integration tests that cover edge cases. Collaborates with the UI Developer on E2E flows. Partners with the architect on defining what "quality" means for the CI pipeline and release process.
This person should have a seat in architecture discussions — not to design the system, but to ask "how would we test this?" before decisions are finalized. The earlier testability is considered in design, the less painful testing becomes.
Life at RWS
If you like the idea of working with smart people who are passionate about growing the value of ideas, data and content by making sure organizations are understood, then you’ll love life at RWS.
Our purpose is to unlock global understanding. This means our work fundamentally recognizes the value of every language and culture. So we celebrate difference, we are inclusive, and we believe that diversity makes us strong. We want every employee to grow as an individual and excel in their career.
In return, we expect all our people to live by the values that unite us: to partner, putting clients first and winning together; to pioneer, innovating fearlessly and leading with vision and courage; to progress, aiming high and growing through our actions; and to deliver, owning the outcome and building trust with our colleagues and clients.
RWS embraces DEI and promotes equal opportunity. We are an Equal Opportunity Employer and prohibit discrimination and harassment of any kind. RWS is committed to the principle of equal employment opportunity for all employees and to providing a work environment free of discrimination and harassment. All employment decisions at RWS are based on business needs, job requirements and individual qualifications, without regard to race, religion, nationality, ethnicity, sex, age, disability, or sexual orientation. RWS will not tolerate discrimination based on any of these characteristics.
RWS Values
Get the 3Ps right – Partner, Pioneer, Progress – and we'll Deliver together as RWS.
Recruitment Agencies: RWS Holdings PLC does not accept agency resumes. Please do not forward any unsolicited resumes to any RWS employees. Any unsolicited resume received will be treated as the property of RWS and Terms & Conditions associated with the use of such resume will be considered null and void.