AWS custom app using High Performance Computing for Tonkin+Taylor
Executive Summary
About Client
The customer, Tonkin + Taylor, works in environmental consulting and meteorological services and focuses on providing high-resolution meteorological data for applications including air quality analysis, weather forecasting, and climate risk assessment. Their offerings center on advanced data modeling with the Weather Research and Forecasting (WRF) model, which requires significant computational resources to generate detailed meteorological datasets.
Project Background - AWS Custom Product for Weather Research and Forecasting
Peritos was hired to address the challenges of running WRF at scale by developing a comprehensive system that could:
- Efficiently run the WRF model using an HPC cluster.
- Automatically create and manage HPC cluster jobs on receiving new data requests.
- Automatically manage data resolution adjustments.
- Provide a seamless experience for customers through an easy-to-use online platform.
- Enable the commercialization of the datasets, so that the customer could capitalize on the broad applicability of their data across multiple disciplines.
Scope & Requirement
Implementation
Technology and Architecture
The architecture of this application efficiently handles the computational intensity of the WRF model, scales dynamically with demand, and provides a seamless experience for users. The integration of various AWS services ensures that the solution is robust, secure, and scalable.
Overall Workflow
1. User Request: Users enter data parameters and request pricing. If satisfied, they proceed with the purchase.
2. Processing Trigger: Upon payment confirmation, the system triggers the data processing workflow.
3. WRF and WPS Processing: The AWS ParallelCluster cluster runs the WRF Preprocessing System (WPS) and WRF computations that generate the meteorological data.
4. Post-Processing: Any additional processing is applied before the final data is stored.
5. Download and Notification: Users are notified and given a link to download their processed data.
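The workflow steps above can be read as a small order lifecycle. The sketch below is illustrative only: the state names and the `advance` helper are hypothetical and are not taken from the delivered application. It simply shows how each order could be restricted to the sequence described, so that processing can never start before payment is confirmed.

```python
from enum import Enum


class OrderState(Enum):
    """Hypothetical stages, named after the workflow steps above."""
    REQUESTED = "requested"              # user submitted data parameters
    QUOTED = "quoted"                    # price returned to the user
    PAID = "paid"                        # payment confirmed
    PROCESSING = "processing"            # WPS and WRF running on the cluster
    POST_PROCESSING = "post_processing"  # additional processing before storage
    READY = "ready"                      # download link sent to the user
    FAILED = "failed"


# Allowed transitions; any move not listed here is rejected.
TRANSITIONS = {
    OrderState.REQUESTED: {OrderState.QUOTED},
    OrderState.QUOTED: {OrderState.PAID},
    OrderState.PAID: {OrderState.PROCESSING},
    OrderState.PROCESSING: {OrderState.POST_PROCESSING, OrderState.FAILED},
    OrderState.POST_PROCESSING: {OrderState.READY, OrderState.FAILED},
    OrderState.READY: set(),
    OrderState.FAILED: set(),
}


def advance(current: OrderState, target: OrderState) -> OrderState:
    """Return the target state if the move is allowed, else raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```

For example, `advance(OrderState.QUOTED, OrderState.PAID)` succeeds, while `advance(OrderState.QUOTED, OrderState.PROCESSING)` raises `ValueError` because payment has not been confirmed.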
Technology
The web app was deployed with the following technology components:
• Backend Code: .NET, C#, Python
• Web App code: Next.js
• Database: PostgreSQL
• Cloud: AWS
Integrations
• Google APIs
• Stripe
• Auth0
• SendGrid
• Slurm APIs
High-Performance Computing (HPC) Environment
• AWS ParallelCluster: Provides the compute infrastructure needed to run the WRF model and WPS processes. The cluster is set up dynamically and scaled according to the computational demands of the task, ensuring efficient resource usage.
• Head Node and Compute Fleet: The head node manages the compute fleet, which executes the high-compute WRF and WPS processes.
• FSx for Lustre: High-performance file storage integrated with the ParallelCluster, used to store and access the large datasets generated during processing.
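As an illustration of how a job might be described for the compute fleet, the sketch below renders a Slurm batch script for one WRF run. Several details are assumptions rather than facts from this project: the FSx for Lustre file system is assumed to be mounted at `/fsx`, each job is assumed to have its own run directory containing `real.exe` and `wrf.exe`, and `srun` is assumed to be the MPI launcher (a site might use `mpirun` instead). The function name and directory layout are hypothetical.

```python
from pathlib import PurePosixPath


def build_wrf_batch_script(job_id: str, nodes: int, tasks_per_node: int,
                           shared_root: str = "/fsx") -> str:
    """Render a Slurm batch script for one WRF run.

    Assumptions (not from the case study): the shared file system is
    mounted at `shared_root`, and real.exe / wrf.exe are already present
    in the per-job run directory.
    """
    if nodes < 1 or tasks_per_node < 1:
        raise ValueError("nodes and tasks_per_node must be positive")
    # Keep the job id safe to embed in a path and a shell script.
    if not job_id.replace("-", "").isalnum():
        raise ValueError("job_id may contain only letters, digits, and '-'")

    run_dir = PurePosixPath(shared_root) / "runs" / job_id
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name=wrf-{job_id}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --output={run_dir}/slurm-%j.out",  # %j expands to the Slurm job id
        "set -euo pipefail",
        f"cd {run_dir}",
        "srun ./real.exe",  # prepares initial and boundary conditions
        "srun ./wrf.exe",   # runs the forecast itself
    ]
    return "\n".join(lines) + "\n"
```

Generating the script from parameters, rather than keeping a fixed file, is one way node counts could be varied per request as the requested resolution changes.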
Processing and Orchestration
• AWS Lambda Functions: Used extensively for orchestrating various steps in the data processing workflow.
• AWS Step Functions: Orchestrates the entire workflow by coordinating Lambda functions, managing state transitions, and handling retries or errors.
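To make the retry and error-handling role of Step Functions concrete, here is a minimal sketch of a state machine definition in Amazon States Language, built as a Python dictionary. The state names, the keys of `lambda_arns`, and the retry values are hypothetical and do not describe the delivered workflow. A real pipeline would also need to wait for each Slurm job to finish (for example with Wait and Choice states), which is omitted here for brevity.

```python
import json


def build_state_machine(lambda_arns: dict) -> dict:
    """Build an Amazon States Language definition as a plain dict.

    State names and the keys of `lambda_arns` are hypothetical.
    """
    def task(name: str, next_state: str) -> dict:
        return {
            "Type": "Task",
            "Resource": lambda_arns[name],
            # Retry failed tasks with exponential backoff.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # Anything still failing is routed to a failure handler.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "NotifyFailure"}],
            "Next": next_state,
        }

    return {
        "Comment": "Sketch of a WPS -> WRF -> post-processing pipeline",
        "StartAt": "SubmitWps",
        "States": {
            "SubmitWps": task("submit_wps", "SubmitWrf"),
            "SubmitWrf": task("submit_wrf", "PostProcess"),
            "PostProcess": task("post_process", "NotifyUser"),
            "NotifyUser": {"Type": "Task",
                           "Resource": lambda_arns["notify_user"],
                           "End": True},
            "NotifyFailure": {"Type": "Task",
                              "Resource": lambda_arns["notify_failure"],
                              "Next": "Failed"},
            "Failed": {"Type": "Fail", "Error": "PipelineFailed"},
        },
    }


def dangling_targets(definition: dict) -> set:
    """Return any StartAt/Next/Catch targets that do not name a defined state."""
    states = definition["States"]
    targets = {definition["StartAt"]}
    for state in states.values():
        if "Next" in state:
            targets.add(state["Next"])
        for catcher in state.get("Catch", []):
            targets.add(catcher["Next"])
    return targets - set(states)


if __name__ == "__main__":
    # Placeholder ARNs for illustration only.
    names = ["submit_wps", "submit_wrf", "post_process",
             "notify_user", "notify_failure"]
    arns = {n: f"arn:aws:lambda:us-east-1:123456789012:function:{n}" for n in names}
    print(json.dumps(build_state_machine(arns), indent=2))
```

The `dangling_targets` helper is a cheap local check that every transition points at a state that exists, which can catch typos before a definition is deployed.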
Features of Application
The solution leverages AWS cloud services to generate, process, and distribute high-resolution meteorological data.
Users interact via an interface hosted on AWS Amplify, secured by AWS WAF and Shield, with APIs managed by Amazon API Gateway.
The system orchestrates data processing using AWS Lambda functions and AWS Step Functions, coordinating tasks such as WRF and WPS processing on an AWS ParallelCluster.
FSx for Lustre provides high-performance storage, while Amazon S3 and Aurora DB handle data storage and transaction management.
Post-processing is done on EC2 instances, with notifications sent via SNS. The solution efficiently manages the high computational demands of the WRF model, scales dynamically, and ensures secure, seamless data access for internal and external users.
Challenges
- Challenge 1: High Computational Demand: The WRF model’s capacity to produce highly detailed meteorological datasets necessitates extensive computational power, which made running it on the customer’s existing local infrastructure impractical. The challenge was to find a solution that could handle large-scale data generation efficiently and cost-effectively.
- Solution: This challenge was met by implementing an AWS-based high-performance computing (HPC) cluster, specifically AWS ParallelCluster, which provided the necessary computational resources to run the WRF model efficiently. Jobs on ParallelCluster were created and managed dynamically by AWS Step Functions and AWS Lambda through the Slurm APIs.
- Challenge 2: User Experience and Commercialization: To monetize their meteorological data, the customer needed to create an accessible, user-friendly portal where external users could easily select regions, adjust data resolution, and purchase datasets. The portal needed to be intuitive, efficient, and fully capable of handling secure transactions, which was essential for the success of the customer’s business model.
- Solution: This challenge was addressed by developing a web-based portal hosted on AWS Amplify, secured with AWS WAF and Shield, with APIs managed by Amazon API Gateway. The platform lets external customers select their data parameters and complete purchases with little friction, which facilitates the commercialization of the datasets and supports the customer’s revenue streams.
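For the first challenge above, the step where a Lambda function creates a cluster job through the Slurm APIs might look roughly like the sketch below. It assumes the Slurm REST daemon (slurmrestd) is the API in use and that the payload follows the v0.0.39 schema; field names differ between Slurm versions, so this should be checked against the deployed one. The function name, host, and paths are hypothetical. The function only builds the request description and does not send anything.

```python
import json


def build_job_submit_request(base_url: str, user: str, token: str,
                             script: str, job_name: str, workdir: str,
                             api_version: str = "v0.0.39") -> dict:
    """Describe the HTTP request for a slurmrestd job submission.

    Assumption: payload field names follow the v0.0.39 schema; verify
    them against the Slurm version actually running on the cluster.
    """
    payload = {
        "script": script,
        "job": {
            "name": job_name,
            "current_working_directory": workdir,
            # A minimal environment is included so the job can find basic tools.
            "environment": ["PATH=/usr/local/bin:/usr/bin:/bin"],
        },
    }
    return {
        "method": "POST",
        "url": f"{base_url.rstrip('/')}/slurm/{api_version}/job/submit",
        "headers": {
            "Content-Type": "application/json",
            "X-SLURM-USER-NAME": user,    # slurmrestd user header
            "X-SLURM-USER-TOKEN": token,  # JWT issued for that user
        },
        "body": json.dumps(payload).encode("utf-8"),
    }
```

Keeping request construction separate from sending makes the Lambda logic easy to unit test without a running cluster; the returned dictionary can then be passed to whichever HTTP client the function uses.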
Project Completion
Duration
- July 2023 – May 2024: Implementation and Support
Deliverables
• AWS services setup, with architecture review and sign-off by internal stakeholders and existing vendors of Landcheck, to confirm alignment with the AWS Well-Architected Framework so that security, scalability, and performance are up to the mark.
• A custom web application, developed by the Peritos team in close collaboration with the client’s product owner, with changes, bug fixes, and critical features completed before go-live to ensure a smooth release.
• We are still working on the handover documents and preparing for the final go-live.
Testimonial
Awaited
Next Phase
We are now looking at the next phase of the project which involves:
1. Ongoing support, with new features and minor bug fixes every quarter
2. Adding support for more countries