Orchestrating Multi-Domain Processes Using AWS Step Functions

In November 2022, amaysim released brand new iPhone devices bundled with SIM-only mobile plans. While it wasn't the first time amaysim included mobile devices as part of their product offering, it was the first time that amaysim offered a bundled product - a device AND a mobile plan - bought together within the same checkout journey.

Of course, other, larger mobile providers can also offer customers mobile devices with their plans. "So, what's the big deal?" you ask.

As a low-cost virtual mobile network operator, amaysim is able to leverage supply chain and logistics. However, how did we coordinate processes between our disparate systems to ensure one reliable customer journey?

New Solution to Old Problems

Over the years, amaysim has offered and experimented with several products and services, including fixed broadband (NBN), mobile devices, tablets, and energy. While they all look like one big happy family of products on the website, in those earlier incarnations, customers could only purchase each product type separately. For those services that could be "bought together", one part of the "bundle" would sometimes be provisioned through "back of office" methods or through marketing campaigns. Although effective in getting us to market, it was a less-than-ideal Technology solution.

One reason for this was that new product sets had been added through mergers and acquisitions. Whilst an acquisition generally enables a fast-tracked entry into a new market, it can also bring its own unique set of technologies, which often need to be managed separately but stitched together to produce a seamless customer experience.

As we’ve evolved, the amaysim Technology approach has been to look at how we can group some standard business functions. We began by recently rebuilding the Devices product offerings based on the principles of domain-oriented architecture.

When it was time to consider the creation of a bundled product, we decided to use the following to help navigate through cross-domain functions:

  1. An integration layer consisting of several serverless functions that abstract the complexities of all the different domains involved in the order process.

  2. An orchestration layer consisting of a step function or state machine that visualises and coordinates the serverless functions required to complete the order process.

A high-level representation of the relationship between the step function, the serverless integration layer and domain-specific microservices. Not indicative of actual implementation.

Step Functions vs Lambdas

As a technology organisation, our first preference for implementation is Cloud-Native using serverless technologies. Most of the time, we focus on writing domain-specific services such as Payments, Referrals and Devices. These services are not process intensive; most of our code gets deployed as AWS Lambda functions and does not require complex infrastructure.

The Order Orchestrator is unique because we cannot just categorise it into a single domain. Orchestration involves many different services depending on which task the order process is up to.

Example of the typical steps when processing a bundled order.

While it is possible to complete orchestration using an army of lambda functions, it was better to utilise AWS Step Functions.

Step Functions are composed of several connected "states" that can:

  • do some work (a Task state)

  • make decisions on which part of the process to execute (a Choice state)

  • pass inputs to outputs or inject fixed data into the process (a Pass state)

  • delay the process for a certain amount of time or until a specified timestamp (a Wait state)

There are many other types of states provided by AWS, but these are the basic building blocks of our Order Orchestrator.
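As a rough illustration of how these four state types fit together, the sketch below writes a small state machine definition in Amazon States Language (ASL) as a TypeScript object. It is not our actual Order Orchestrator: the state names, the `$.order.hasDevice` attribute, the Lambda ARNs and the `findDanglingTransitions` helper are all hypothetical, and the types cover only the handful of ASL fields used here.

```typescript
// A minimal subset of Amazon States Language (ASL), just enough for this sketch.
type ChoiceRule = { Variable: string; BooleanEquals: boolean; Next: string };

type State =
  | { Type: "Task"; Resource: string; Next?: string; End?: boolean }
  | { Type: "Choice"; Choices: ChoiceRule[]; Default: string }
  | { Type: "Pass"; Result?: unknown; ResultPath?: string; Next?: string; End?: boolean }
  | { Type: "Wait"; Seconds: number; Next?: string; End?: boolean };

interface StateMachineDefinition {
  StartAt: string;
  States: { [name: string]: State };
}

// Hypothetical bundled-order flow; every name and ARN below is a placeholder.
const orderDefinition: StateMachineDefinition = {
  StartAt: "SetDefaults",
  States: {
    // Pass: inject fixed data into the execution input.
    SetDefaults: { Type: "Pass", Result: { source: "web" }, ResultPath: "$.meta", Next: "HasDevice" },
    // Choice: decide which branch of the process to execute.
    HasDevice: {
      Type: "Choice",
      Choices: [{ Variable: "$.order.hasDevice", BooleanEquals: true, Next: "ReserveDevice" }],
      Default: "ActivatePlan",
    },
    // Task: do some work, here by invoking a Lambda function.
    ReserveDevice: {
      Type: "Task",
      Resource: "arn:aws:lambda:ap-southeast-2:123456789012:function:reserve-device",
      Next: "WaitBeforeActivation",
    },
    // Wait: delay the process for a fixed number of seconds.
    WaitBeforeActivation: { Type: "Wait", Seconds: 60, Next: "ActivatePlan" },
    ActivatePlan: {
      Type: "Task",
      Resource: "arn:aws:lambda:ap-southeast-2:123456789012:function:activate-plan",
      End: true,
    },
  },
};

// Illustrative check: list every transition target that is not a defined state.
function findDanglingTransitions(definition: StateMachineDefinition): string[] {
  const targets: string[] = [definition.StartAt];
  for (const name of Object.keys(definition.States)) {
    const state = definition.States[name];
    if (state.Type === "Choice") {
      targets.push(state.Default);
      for (const rule of state.Choices) targets.push(rule.Next);
    } else if (state.Next !== undefined) {
      targets.push(state.Next);
    }
  }
  return targets.filter(
    (target) => !Object.prototype.hasOwnProperty.call(definition.States, target)
  );
}

console.log(findDanglingTransitions(orderDefinition)); // empty: every target exists
```

Serialising `orderDefinition` with `JSON.stringify` gives a definition in the JSON form that Step Functions works with, which makes it a convenient way to experiment with a flow before committing to infrastructure code.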

Serverless Framework vs AWS CDK

Another decision was whether to keep using Serverless Framework to define the step function or to use something else. Historically, we would have used Serverless Framework to deploy our serverless infrastructure to AWS. Serverless Framework works excellently for simple backend services, for example an AWS Lambda function integrated with Amazon API Gateway that saves data to an Amazon DynamoDB table.

Defining the step function in Serverless Framework, however, seemed like a lot of work considering the number of tasks and choices required in our order processing. We wanted more time to focus on writing the code that does the processing, without worrying too much about AWS infrastructure. As a result, we decided to write the step function using the AWS Cloud Development Kit (CDK).

CDK enables us to develop faster using our preferred programming language (TypeScript/Node.js) and provides more fine-grained control over our infrastructure without worrying about how to write CloudFormation code.

A CDK code example

Step Functions: Benefits and Drawbacks

There was a lot to love and several challenges when building the Order Orchestrator. Here are the top three things that sold us on using AWS Step Functions.

  1. It simplifies workflow management by allowing us to break down our ordering process into different "states", such as discrete tasks.

  2. It has a built-in visual interface that increases visibility, making it easier to understand and monitor the state of the orchestration. It was easy to spot a broken task because it would be highlighted in red.

  3. It has built-in retry and error handling. AWS Step Functions supports states that catch errors and allow us to “pause” order processing so we can apply manual corrections. Having this mechanism in place gives us granular control over error handling and lets us build in automated retries with backoff for issues caused by external outages. These features help reduce the risk of orchestration failure and improve reliability.

Step Functions enables us to redirect a failure to a specific action or task.
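To make the retry and error-handling point a little more concrete, here is a minimal sketch of a Task state with ASL `Retry` and `Catch` fields, again written as a TypeScript object. The `charge-payment` function, the `HoldForManualCorrection` state and the retry values are assumptions chosen for illustration, not our production settings. The `retryDelays` helper is also hypothetical: it only shows how `IntervalSeconds` and `BackoffRate` combine, and ignores optional ASL settings such as a maximum delay or jitter.

```typescript
// Only the ASL fields used in this sketch are modelled.
interface RetryRule {
  ErrorEquals: string[];
  IntervalSeconds: number; // delay before the first retry
  MaxAttempts: number;     // number of retries before giving up
  BackoffRate: number;     // multiplier applied to the delay on each retry
}

interface CatchRule {
  ErrorEquals: string[];
  ResultPath: string; // where the error details are placed in the state input
  Next: string;       // state to transition to when the error is caught
}

// Retry transient failures with exponential backoff (values are illustrative).
const paymentRetry: RetryRule = {
  ErrorEquals: ["States.Timeout", "States.TaskFailed"],
  IntervalSeconds: 2,
  MaxAttempts: 3,
  BackoffRate: 2,
};

// If retries are exhausted, redirect to a hypothetical manual-correction state.
const paymentCatch: CatchRule = {
  ErrorEquals: ["States.ALL"],
  ResultPath: "$.error",
  Next: "HoldForManualCorrection",
};

const chargePaymentTask = {
  Type: "Task",
  Resource: "arn:aws:lambda:ap-southeast-2:123456789012:function:charge-payment",
  Retry: [paymentRetry],
  Catch: [paymentCatch],
  Next: "ActivatePlan",
};

// Delay in seconds before each retry: IntervalSeconds * BackoffRate^(retry index).
function retryDelays(rule: RetryRule): number[] {
  const delays: number[] = [];
  for (let retry = 0; retry < rule.MaxAttempts; retry++) {
    delays.push(rule.IntervalSeconds * Math.pow(rule.BackoffRate, retry));
  }
  return delays;
}

console.log(retryDelays(paymentRetry)); // delays in seconds: 2, 4, 8
console.log(JSON.stringify(chargePaymentTask, null, 2));
```

With these values the task is retried after 2, 4 and 8 seconds; only if all three retries fail does the `Catch` rule route the execution to the manual-correction state, with the error details stored under `$.error`.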

There are, however, some challenges involved. Here are the top three things that made our eyebrows furrow during the implementation.

  1. Potentially steep learning curve. AWS Step Functions introduces several new concepts such as "state machines" and the different "states" and "tasks". Integration with order services and external systems can be complex, and it was essential to understand how to pass data between states.

  2. State management can be challenging when dealing with extensive data or when multiple steps must share a state.

  3. Debugging state issues can be difficult and time-consuming. There are no built-in debugging tools at the time of writing, making it more challenging to find and fix workflow issues.

Tackling the Learning Curve - Tips & Tricks

If you are keen on trying out AWS Step Functions to orchestrate your complex processes, we have a few tips to help reduce the slope of the learning curve.

  1. Plan your data model, inputs and outputs from start to finish. Knowing all the attributes you need for the process will really help build a cleaner and more organised flow. Also note that if your orchestration uses Amazon EventBridge rules to trigger API destinations, the input transformers must be able to cope with missing attributes. The state machine should therefore initialise these attributes with a default value when it publishes the event to Amazon EventBridge.

  2. Draw a decision tree or process flow to help identify the steps in the process. Knowing which actions or steps trigger lambdas and where the orchestration will branch or pause will allow you to maximise the use of different states and help the team stay aligned as parts of the orchestration branch out.

  3. When in doubt, use the Data Flow Simulator. The Data Flow Simulator is a potent tool that gives you a rough idea of what happens to your data and how you can manipulate it after a task completes.

  4. You can directly send data to various AWS services like SQS or DynamoDB without using a Lambda. Usually, we use lambdas to send updates to SQS or DynamoDB. However, states can trigger these services without using lambdas, reducing the dependencies we must consider.

  5. Logs will lead the way. For troubleshooting or debugging needs, the Step Functions console has clickable links directing you to the CloudWatch log records generated and the AWS services called. Always check the Task Input and Task Result to know the orchestrator's input and the tasks' output. This checking helped us troubleshoot issues around JSON paths.
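As a sketch of tip 4, the two Task states below call SQS and DynamoDB through Step Functions' direct service integrations, with no Lambda in between. The queue URL, table name, attribute paths and state names are hypothetical; the `Resource` values follow the `arn:aws:states:::<service>:<action>` pattern used by these integrations. The JSON paths in `Parameters` (keys ending in `.$`) are exactly the kind of thing worth checking against the Task Input shown in the console, as tip 5 suggests.

```typescript
// Only the ASL fields used in this sketch are modelled.
interface ServiceTask {
  Type: "Task";
  Resource: string;
  Parameters: { [key: string]: unknown };
  ResultPath?: string | null;
  Next?: string;
  End?: boolean;
}

// Send a message straight to SQS (queue URL is a placeholder).
const notifyFulfilment: ServiceTask = {
  Type: "Task",
  Resource: "arn:aws:states:::sqs:sendMessage",
  Parameters: {
    QueueUrl: "https://sqs.ap-southeast-2.amazonaws.com/123456789012/fulfilment-queue",
    // Keys ending in ".$" are resolved as JSON paths against the state input.
    MessageBody: { "orderId.$": "$.order.id" },
  },
  // Keep the original input and store the SQS response alongside it.
  ResultPath: "$.notification",
  Next: "RecordOrderStatus",
};

// Write an item straight to DynamoDB (table name is a placeholder).
const recordOrderStatus: ServiceTask = {
  Type: "Task",
  Resource: "arn:aws:states:::dynamodb:putItem",
  Parameters: {
    TableName: "order-status",
    Item: {
      orderId: { "S.$": "$.order.id" },
      status: { S: "SUBMITTED" },
    },
  },
  // A null ResultPath discards the result and passes the input through unchanged.
  ResultPath: null,
  End: true,
};

const directIntegrationStates = {
  NotifyFulfilment: notifyFulfilment,
  RecordOrderStatus: recordOrderStatus,
};

console.log(JSON.stringify(directIntegrationStates, null, 2));
```

Because neither state points at a Lambda function, there is one less piece of code to deploy, monitor and version for each of these updates.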

In Conclusion

AWS Step Functions is an excellent service to orchestrate and increase the visibility of your microservices as they work together to complete complex cross-domain processing. No doubt there are other complex cross-domain processes we can convert into a step function, and we look forward to exploring these new avenues!

The views expressed in this blog post are ours alone and do not necessarily reflect the views of our employer, Optus Administration Pty Ltd.
