WEKA moves file data at 2TB/sec on Oracle’s cloud – Blocks and Files

WEKA claims to have moved file data on Oracle’s cloud at nearly 2TB/sec to servers using its scalable parallel file system software.

The Oracle Cloud Infrastructure (OCI) public cloud offers bare metal servers as compute shapes. A shape is a template specifying the CPU type, number of CPU cores, RAM, and network speed. Oracle and WEKA validated the performance of WEKA's software when running inside OCI.

Oracle’s lead solutions architect, Pinkesh Valdria, said in a blog at the end of last week: “The performance that WEKA and OCI can deliver to customer workloads is fantastic … this combination of performance and scalability, along with the elasticity that OCI provides, allows you to host successfully modern EDA, life sciences, financial analysis, and more traditional enterprise workloads on OCI.”

WEKA says of Oracle’s cloud: “Oracle Cloud Infrastructure (OCI) is a leading hyperscale cloud that provides XaaS compute and application services to Oracle customers, and offers much more than you might expect, including AI/ML and GPU workloads”.

Validation testing involved runs using either 80 or 373 bare metal (BM.Optimized3.36) compute shapes, each with 36 cores, 512 GB RAM, 3.8 TB local NVMe SSD, 100 Gb RoCE (RDMA over Converged Ethernet), and 2 x 50 Gbps network links. WEKA was configured to use six cores, leaving 30 to run the operating system and application software on the same server.
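To put the headline number in context, the server counts above can be turned into an aggregate network ceiling. The following is a rough sketch, not a figure from Oracle or WEKA: it assumes each server has 100 Gbps of usable bandwidth in total (reading the 2 x 50 Gbps links as that 100 Gb), uses decimal units, and takes 2 TB/sec (2,000 GB/sec) as a rounded-up stand-in for the "nearly 2TB/sec" result. The function names are purely illustrative.

```python
# Assumption: 100 Gbps of usable network bandwidth per server (2 x 50 Gbps links).
LINK_GBPS_PER_SERVER = 100
BITS_PER_BYTE = 8


def aggregate_line_rate_gb_per_sec(servers: int) -> float:
    """Theoretical aggregate network bandwidth in GB/sec for a server count."""
    return servers * LINK_GBPS_PER_SERVER / BITS_PER_BYTE


def fraction_of_line_rate(throughput_gb_per_sec: float, servers: int) -> float:
    """Share of the theoretical aggregate line rate a given throughput uses."""
    return throughput_gb_per_sec / aggregate_line_rate_gb_per_sec(servers)


def cores_left_for_apps(total_cores: int = 36, weka_cores: int = 6) -> int:
    """Cores remaining for the OS and applications after WEKA's allocation."""
    return total_cores - weka_cores


if __name__ == "__main__":
    for servers in (80, 373):
        ceiling = aggregate_line_rate_gb_per_sec(servers)
        print(f"{servers} servers: {ceiling:.1f} GB/sec aggregate line rate")

    # Assumption: 2,000 GB/sec as an upper bound for "nearly 2TB/sec".
    share = fraction_of_line_rate(2000, 373)
    print(f"2 TB/sec is about {share:.0%} of the 373-server line rate")
    print(f"Cores left per server for applications: {cores_left_for_apps()}")
```

Under these assumptions the 373-server configuration has a ceiling of about 4.66 TB/sec, so a result approaching 2 TB/sec would use a bit over two-fifths of the available network bandwidth.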

Test data was generated using the Flexible I/O tester (fio) with 1 MB blocks for large block workloads and 4 KB blocks for small block workloads. Here is a table of results followed by a graph:

WEKA performance table

There were no doubt intermediate server count configurations, but WEKA and Oracle highlighted results of 80 and 373 servers. A throughput close to 2TB/sec is definitely a hero number.

A possible comparison is with WEKA’s Nvidia GPUDirect performance of 97.9 GB/s of throughput to 16 NVIDIA A100 GPUs and 113.1 GB/sec to an Nvidia DGX-2 server. Presumably, WEKA could provide higher bandwidth if more Nvidia GPUs were targeted, thus requiring more WEKA nodes.

The ESG analyst house has validated WEKA’s performance against a number of benchmarks, but does not specifically state IOPS or GB/sec bandwidth figures. Similarly, the many WEKA STAC benchmarks generally do not identify overall IOPS or throughput numbers.

WEKA’s own figures for its performance on AWS claim large file (1 MB) read performance of over 100 GB/sec and small file (4 KB) read performance of over 5 million IOPS across 16 EC2 instances. [Conveniently for Oracle’s marketers, their lower, 80-server, 5.5 million read IOPS number exceeds the AWS 5 million-plus IOPS result. It wouldn’t look so good if the OCI result was lower than the AWS one.]

The 16 instances in the AWS test are a far cry from the 80 and 373 servers used in the OCI tests, so the absolute numbers are not directly comparable, and there is no reason to think that WEKA’s AWS performance is lower than its OCI performance.

Our thinking is that WEKA could scale to the same levels of OCI performance on AWS if it increased the number of compute instances; it is scale-out software, after all. Assuming linear scaling from the 16-instance result of 5 million IOPS, it would take approximately 56 AWS compute instances, of the type used in that test, to match the OCI 373-server result of 17.4 million IOPS.
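The linear extrapolation behind the 56-instance estimate can be checked in a few lines. This is a back-of-the-envelope sketch only: it treats the AWS "over 5 million IOPS" figure as exactly 5 million and assumes IOPS scale in direct proportion to instance count, which real clusters may not do. The function name is illustrative.

```python
import math


def instances_for_target(target_iops: float,
                         baseline_iops: float,
                         baseline_instances: int) -> float:
    """Instances needed to reach target_iops, assuming perfectly linear scaling."""
    per_instance_iops = baseline_iops / baseline_instances
    return target_iops / per_instance_iops


AWS_INSTANCES = 16
AWS_READ_IOPS = 5_000_000       # "over 5 million", taken here as a lower bound
OCI_373_READ_IOPS = 17_400_000  # OCI 373-server result quoted in the article

if __name__ == "__main__":
    needed = instances_for_target(OCI_373_READ_IOPS, AWS_READ_IOPS, AWS_INSTANCES)
    print(f"Per-instance AWS IOPS: {AWS_READ_IOPS / AWS_INSTANCES:,.0f}")
    print(f"Instances needed (linear): {needed:.2f}, rounded up to {math.ceil(needed)}")
```

Because the AWS figure is a floor rather than an exact measurement, the true instance count under linear scaling would be 56 or fewer.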

There are no equivalent datasheets for WEKA running in Azure or GCP clouds which we could find on the WEKA resources web page.