Internet Engineering Task Force                           B. Zhang, Ed.
Internet-Draft                                     Pengcheng Laboratory
Intended status: Informational                              Y. Dai, Ed.
Expires: 20 July 2026                            Sun Yat-sen University
                                                           B. Shen, Ed.
                                        Harbin Institute of Technology
                                                        16 January 2026

    Computing metrics as a service (CMAS) for facilitating traffic
                       steering in CATS framework
                       draft-zhangb-cats-cmas-00

Abstract

   In the context of CATS applications, resource modeling and dynamic
   scheduling face core challenges: heterogeneous computing resources
   (e.g., CPUs, GPUs, FPGAs) with differentiated characteristics are
   difficult to unify through traditional coarse-grained metrics (e.g.,
   virtual machine/container counts).  Moreover, dynamically changing
   resource states (e.g., resource occupancy, service instance load
   cycles) complicate routing table maintenance in network nodes,
   creating bottlenecks for resource scheduling.  This document
   provides a service-oriented computing capability modeling framework,
   abstracting heterogeneous resources into standardized service units
   for efficient resource allocation.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six
   months and may be updated, replaced, or obsoleted by other documents
   at any time.  It is inappropriate to use Internet-Drafts as
   reference material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 20 July 2026.

Copyright Notice

   Copyright (c) 2026 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Zhang, et al.
Expires 20 July 2026                                           [Page 1]

Internet-Draft                     cmas                    January 2026

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with
   respect to this document.  Code Components extracted from this
   document must include Revised BSD License text as described in
   Section 4.e of the Trust Legal Provisions and are provided without
   warranty as described in the Revised BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .  2
   2.  Terminology  . . . . . . . . . . . . . . . . . . . . . . . .  3
   3.  Computing Metrics As a Service (CMAS)  . . . . . . . . . . .  4
   4.  Service Registration . . . . . . . . . . . . . . . . . . . .  6
   5.  Service Deployment . . . . . . . . . . . . . . . . . . . . .  7
   6.  Service Announcement . . . . . . . . . . . . . . . . . . . .  9
   7.  Service Distribution . . . . . . . . . . . . . . . . . . . . 11
   8.  Service Consuming Process  . . . . . . . . . . . . . . . . . 13
   9.  References . . . . . . . . . . . . . . . . . . . . . . . . . 14
     9.1.  Informative References . . . . . . . . . . . . . . . . . 14
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 14

1.  Introduction

   Computing-aware traffic steering (CATS) is a traffic engineering
   approach that takes into account the dynamic nature of computing
   resources and network state to optimize service-specific traffic
   forwarding towards a given service instance.  As described in
   [I-D.ietf-cats-framework], the Computing-Aware Traffic Steering
   (CATS) framework assumes that multiple service instances may provide
   one given service, running in one or more service sites.  Each of
   these service instances can be accessed via a service contact
   instance, which is a client-facing service function instance.
   A single service site may host one or multiple service contact
   instances.  A single service site may have limited computing
   resources available at a given time, whereas the various service
   sites may experience different resource availability issues over
   time.  Therefore, steering traffic among different service sites can
   address the issue of lacking resources at a specific service site.
   Based on this, [I-D.ietf-cats-framework] provides an architectural
   framework that aims at facilitating the making of compute- and
   network-aware traffic steering decisions in networking environments
   where computing service resources are deployed.

Zhang, et al.              Expires 20 July 2026                [Page 2]

Internet-Draft                     cmas                    January 2026

   In the CATS framework, a C-SMA collects both computing-related
   capabilities and metrics, and associates them with a CS-ID that
   identifies the service.  The C-SMA then advertises CS-IDs along with
   metrics to related C-PSes in the network.  Computing metrics are
   voluminous and may change very frequently, which makes them
   unsuitable for direct dissemination over the network.
   [I-D.ietf-cats-metric-definition] proposes to use normalized metrics
   in CATS: level 1 and level 2 metrics are transferred over the
   network instead of the level 0 raw metrics.
   [I-D.ietf-cats-metric-definition] only provides the representation
   of level 1 and level 2 metrics; it does not provide concrete methods
   or algorithms to normalize metrics, leaving service providers to
   construct their own normalization methods.  However, unlike
   electricity, computing capacity cannot be quantified simply in a
   unit like "kWh"; given the different types of CPUs, GPUs, FPGAs,
   ASICs, and other chips, it is difficult to define a unified
   measurement.  Defining a concrete normalization method for computing
   metrics is very hard and is a key factor hindering the development
   of CATS.  Normalization faces two challenges.
   The first is that different service providers may use different
   normalization methods, which makes it hard for a C-PS to reason
   about a normalized metric.  The second is that a normalized value
   may lose important information from the concrete raw metrics.

   To solve this problem, this draft proposes a public service
   platform.  The platform makes it convenient for clients to find
   services they want to use and for service sites to find services
   they want to deploy.  Most importantly, on this platform services
   and computing metrics are bundled, and a service site allocates
   resources according to the computing metric units bundled with a
   service when deploying that service.  In this way, each service
   site only needs to disseminate, in the network, CS-IDs, the number
   of instances of each CS-ID, the cost of each CS-ID, and the
   associated CSCI-IDs.  This simple information associated with a
   CS-ID can replace the propagation of the service site's computing
   metrics, making it possible for CATS to be widely used on the
   Internet.  We call this normalization method for computing metrics
   Computing Metrics As a Service (CMAS).

2.  Terminology

   This document makes use of the terms defined in
   [I-D.ietf-cats-framework] and also makes use of the following terms:

Zhang, et al.              Expires 20 July 2026                [Page 3]

Internet-Draft                     cmas                    January 2026

   *  Computing Metrics as a Service (CMAS): CMAS is a standardization
      approach that packages computing metrics (e.g., FLOPS, memory,
      latency) alongside services.  When deploying specific services,
      the service site allocates resources based on these bundled
      metric units, enabling efficient, service-oriented resource
      allocation across heterogeneous infrastructures.

   *  Public service platform: The public service platform hosts the
      complete set of CATS public services and acts as a bridge between
      clients and service sites.  From it, service sites can download
      and deploy offerings, while clients can formulate and submit
      their service requests.

3.
  Computing Metrics As a Service (CMAS)

   The public service platform provides all public services of the CATS
   framework and serves as a bridge between clients and service sites:
   service sites can download and deploy services from it to serve
   clients, and clients can use it to build their service requests.
   Table 1 illustrates a typical public service table, an openly
   searchable and browsable registry for both clients and service
   sites.

Zhang, et al.              Expires 20 July 2026                [Page 4]

Internet-Draft                     cmas                    January 2026

+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+
| Service | Service Name   | Input               | Service Description | Service      | Computing         | Storage          | Computing Time | Software       |
| ID      |                |                     |                     | Running Code | Requirement       | Requirement      |                | Dependency     |
+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+
| AR1     | AR/VR          | Motion Capture,     | This service        | Github Link  | multi-thread CPUs | 16GB DRAM,       | ≤ 1ms          | Unity,         |
|         |                | Voice Tracking,     | receives multiple   |              | with minimum      | 256GB SSD        |                | Unreal Engine, |
|         |                | Eye Tracking,       | inputs from sensors |              | 2.0GHz; higher    |                  |                | etc.           |
|         |                | Environmental       | and generates       |              | than RTX 4060     |                  |                |                |
|         |                | Sensing             | scenes              |              |                   |                  |                |                |
+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+
| TP1     | Intelligent    | Standard transport  | Automated driving,  | Github Link  | CPU: ≥4.0GHz,     | 64GB DDR5 DRAM,  | ≤ 20ms         | Apache Kafka,  |
|         | transportation | data, transport     | environment         |              | ≥24MB L3 Cache;   | ≥1TB NVMe SSD    |                | Apollo,        |
|         |                | traffic info, etc.  | sensing             |              | GPU: ≥200 TOPS    |                  |                | CUDA           |
+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+
| LB1     | Live           | Video input source, | Video game live,    | Github Link  | CPU: ≥4.5GHz, 12  | ≥32GB DDR5 DRAM, | depending on   | OBS Studio,    |
|         | broadcast      | audio input source, | interaction live,   |              | cores; GPU: NVENC | ≥5TB NVMe SSD    | specific scene,| WebRTC,        |
|         |                | interaction input   | sport live          |              | encoder           |                  | 0.5s - 3s      | FFmpeg         |
+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+
| ST1     | Simultaneous   | Speech input,       | Real-time caption,  | Github Link  | CPU: ≥3.5GHz, 16  | ≥32GB DDR5 DRAM, | ≤ 1s           | CUDA/cuDNN,    |
|         | interpretation | (optional) action   | conf. translation   |              | threads; GPU:     | ≥1TB NVMe SSD,   |                | Apache Kafka   |
|         |                | capture,            |                     |              | RTX 4090, FP16    | ≥16GB GPU DRAM   |                |                |
|         |                | interaction input   |                     |              |                   |                  |                |                |
+---------+----------------+---------------------+---------------------+--------------+-------------------+------------------+----------------+----------------+

    Table 1: examples of the service table in the public service
                               platform.

   The service ID identifies the service.  The service name is the
   human-readable name of the service.  The input describes the
   concrete information and format of the input data for the service.
   The service description details the concrete function of the
   service.  The service running code field contains the location of
   the service code.  The computing requirement lists the basic
   computing resource demands of the service, such as detailed
   CPU/GPU/NPU information.  The storage requirement lists the basic
   storage demands of the service, such as detailed memory and disk
   information.  The computing time describes the computing delay of
   the service when computing a basic data sample.  The software
   dependency describes the software environment for deploying the
   service.
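   The structure of the service table can be illustrated,
   non-normatively, as a small registry data structure.  The following
   Python sketch is purely an illustration; the field and function
   names are assumptions of this sketch, not part of any specification:

```python
from dataclasses import dataclass, field

@dataclass
class ServiceEntry:
    """One row of the public service table (illustrative field names)."""
    service_id: str           # e.g. "AR1"
    name: str                 # human-readable service name
    inputs: list              # input data descriptions
    description: str          # concrete function of the service
    code_url: str             # location of the runnable service code
    computing_req: str        # CPU/GPU/NPU demands
    storage_req: str          # memory/disk demands
    computing_time_ms: float  # delay for one basic data sample
    dependencies: list = field(default_factory=list)

# Registry keyed by Service ID, queryable by clients and service sites.
SERVICE_TABLE = {
    "AR1": ServiceEntry(
        service_id="AR1",
        name="AR/VR",
        inputs=["Motion Capture", "Voice Tracking",
                "Eye Tracking", "Environmental Sensing"],
        description="Receives multiple sensor inputs and generates scenes",
        code_url="<github link>",
        computing_req="multi-thread CPUs >= 2.0GHz; GPU above RTX 4060",
        storage_req="16GB DRAM, 256GB SSD",
        computing_time_ms=1.0,
        dependencies=["Unity", "Unreal Engine"],
    ),
}

def lookup(service_id):
    """Return the table entry for a Service ID, or None if unregistered."""
    return SERVICE_TABLE.get(service_id)
```

   In this sketch, a client or service site would call `lookup()` with
   a Service ID before building a service request or a deployment
   application.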
   The service table empowers the service ID to indicate a kind of
   service capability.  A client can query the service table to find a
   service of interest, build a service request using the service ID,
   and send the service request to its Ingress CATS-Forwarder to obtain
   the service.  A service site can query the service table to find
   services of interest, allocate resources based on the computing and
   storage requirements of a service, and deploy these services as
   service instances.  A service contact instance can be run on the
   service site for the service instances that provide the same
   service.

Zhang, et al.              Expires 20 July 2026                [Page 5]

Internet-Draft                     cmas                    January 2026

   By designing the public service platform in this way, clients can
   formulate their requirements in plain service language, while
   service sites normalize their heterogeneous compute and storage into
   service-specific units only; no unified abstraction across resource
   types is required.  The service table spells out a common resource
   recipe (CPU, memory, runtime) for one logical service unit.  A site
   may therefore:

   *  allocate 3× that recipe and run three AR1 service instances, or

   *  allocate 4× that recipe and run four TP1 service instances,

   all according to its own capacity and business goals.

   The computing time listed in the table is the delay measured when
   the basic recipe processes the basic data sample (Table 1).  If a
   client wants faster turnaround, it simply requests more service
   instances (higher GAS); CATS will pick the site/instance combination
   whose real computing time is less than or equal to the requested
   delay, while keeping cost close to the stated budget.

4.
  Service Registration

   +------------+----------------+----------------------+
   | Service ID | Sample         | Result               |
   +------------+----------------+----------------------+
   | AR1        | data sample    | computing result     |
   |            | of AR1         | of AR1 data sample   |
   +------------+----------------+----------------------+
   | TP1        | data sample    | computing result     |
   |            | of TP1         | of TP1 data sample   |
   +------------+----------------+----------------------+
   | LB1        | data sample    | computing result     |
   |            | of LB1         | of LB1 data sample   |
   +------------+----------------+----------------------+
   | ST1        | data sample    | computing result     |
   |            | of ST1         | of ST1 data sample   |
   +------------+----------------+----------------------+

      Table 2: examples of the service sample result table.

Zhang, et al.              Expires 20 July 2026                [Page 6]

Internet-Draft                     cmas                    January 2026

   +----------+                                       +----------------+
   | Service  |                                       | Public Service |
   | Provider |                                       |    Platform    |
   +----------+                                       +----------------+
        |                                                     |
        | 1: Registration(service name, input, description,   |
        |    running code, computing requirement, storage     |
        |    requirement, computing time, software            |
        |    dependency, data sample, result)                 |
        |---------------------------------------------------->|
        |                                                     |
        | 2: Authentication(who are you?)                     |
        |<----------------------------------------------------|
        |                                                     |
        | 3: Auth_RESPONSE(IP, domain, host, port, params)    |
        |---------------------------------------------------->|
        |                                                     |
        | 4: REGISTER_RESPONSE(Service ID, success/failure)   |
        |<----------------------------------------------------| Build service table
        |                                                     | and service sample
        |                                                     | result table

              Figure 1: Service Registration Workflow

   A service site, as a public service provider or contributor, can
   register specific services to the public service platform based on
   the fields in the service table.  The public service platform should
   authenticate the service provider that provides the public service.
   The concrete authentication method is out of scope of this document.
   After authentication, a service ID is assigned to the registered
   service, and the service is added to the service table.  In addition
   to the fields listed in Table 1, the raw data sample and the
   computation results produced by the service are also uploaded to the
   public service platform.  These data are used to build an internal
   service sample result table (illustrated in Table 2), which
   validates whether the service has been correctly deployed on a
   service site.  This table is private and cannot be accessed by
   either clients or service sites.  The complete service-registration
   workflow is shown in Figure 1.

5.  Service Deployment

   A service site, such as a regional cloud-computing pool, begins by
   browsing the public service platform's catalogue and applying for
   the specific services it intends to host.  Before any resource is
   allocated, the platform authenticates the site (concrete
   authentication methods are outside the scope of this document) and
   verifies that its available compute and storage capacity meet the
   computing and storage requirements listed in the service table.
   Once approved, the service site:

   1.  reserves the required amount of CPU, GPU, RAM, and disk space;

   2.  instantiates a dedicated virtual machine (VM) that will act as
       the service instance;

   3.  installs all software dependencies declared in the service table
       (libraries, drivers, runtimes);

   4.  downloads the service's runnable code bundle from the platform;

   5.  executes the code inside the VM, thereby starting the service
       instance.

Zhang, et al.              Expires 20 July 2026                [Page 7]

Internet-Draft                     cmas                    January 2026

   +----------+                                       +----------------+
   | Service  |                                       | Public Service |
   |   Site   |                                       |    Platform    |
   +----------+                                       +----------------+
        |                                                     |
        | 1: Apply(Service id)                                |
        |---------------------------------------------------->|
        |                                                     |
        | 2: Authentication(who are you?)                     |
        |<----------------------------------------------------|
        |                                                     |
        | 3: Auth_RESPONSE(IP, domain, host, port, params)    |
        |---------------------------------------------------->|
        |                                                     |
        | 4: Apply_RESPONSE(service id, running code, data    |
        |    sample, computing requirement, storage           |
        |    requirement, computing time, software            |
        |    dependency)                                      |
        |<----------------------------------------------------|
        |                                                     |
        | Allocate resources and build the service contact    |
        | instance                                            |
        |                                                     |
        | 5: Validation(Service id, result)                   |
        |---------------------------------------------------->|
        |                                                     |
        | 6: Validation_RESPONSE(service id, success/failure) |
        |<----------------------------------------------------|
        |                                                     |

               Figure 2: Service Deployment Workflow

   Before the service announcement to the C-SMA, the service deployment
   must be validated by the public service platform.  The public
   service platform sends a pre-defined data sample to the newly
   created service instance and compares the returned computation
   results with those stored in the private service sample result
   table.  Only if the two outputs match within the allowed tolerance
   is the instance deemed valid and eligible for publication to
   clients.  The complete service deployment workflow is illustrated in
   Figure 2.  If it passes validation, the service instance is allowed
   to accept client requests, process them according to the service
   logic, and return results.

Zhang, et al.              Expires 20 July 2026                [Page 8]

Internet-Draft                     cmas                    January 2026

6.  Service Announcement

   After passing validation, the service site models its available
   compute and storage resources in terms of the services it can
   actually deliver.  Table 3 illustrates such a service model table:
   each row describes one service type (e.g., AR1, ML-Inference), while
   the columns expose the site's current capacity, economics, and
   contact points.

   *  GAS (Global Available Slots): the total number of identical
      service instances that the site is willing to run concurrently.
      Example: GAS = 3 for AR1 means the site will keep three AR1 VMs
      alive, so three clients can be served simultaneously.
   *  Cost per instance: the site-declared price for one such slot.
      Rule of thumb: an edge site with scarce GPUs may set a higher
      cost than a central cloud with abundant resources.

   *  CSCI-ID: a small proxy VM created per service, whose public IP is
      published as the CSCI-ID.  Role: it handles the concrete
      hand-over (token exchange, redirect, health ping) so that the
      main service VM remains shielded from direct client traffic.

   Thus, after validation the service site:

   1.  inserts one row per service into its service model table;

   2.  spawns GAS identical worker VMs;

   3.  starts one proxy per service and stores its CSCI-ID in the
       service model table;

   4.  updates cost and GAS in real time as local load or hardware
       changes occur.

   This allows CATS to rank and select the most economical or closest
   instance for each client request, while the service site retains
   full control over its own pricing and capacity policies.

Zhang, et al.              Expires 20 July 2026                [Page 9]

Internet-Draft                     cmas                    January 2026

   +------------+------+------+------------+
   | Service ID | GAS  | Cost | CSCI-ID    |
   +------------+------+------+------------+
   | AR1        | 3    | 4    | IP address |
   +------------+------+------+------------+
   | TP1        | 6    | 5    | IP address |
   +------------+------+------+------------+
   | LB1        | 2    | 7    | IP address |
   +------------+------+------+------------+
   | ST1        | 1    | 2    | IP address |
   +------------+------+------+------------+

    Table 3: example of the service model table of a service site.

   In this way, a service site turns its raw compute and storage
   capacity into service-oriented offers without ever exposing internal
   computing metrics to the network.
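   The service model table of Table 3, together with the simple per-
   service resource accounting it enables, can be sketched
   non-normatively as follows (field names and values are illustrative
   assumptions of this sketch):

```python
# Illustrative service model table of one site:
# Service ID -> {GAS, cost per instance, CSCI-ID}.
service_model = {
    "AR1": {"gas": 3, "cost": 4, "csci_id": "192.0.2.1"},
    "TP1": {"gas": 6, "cost": 5, "csci_id": "192.0.2.2"},
}

def allocate_instance(table, service_id):
    """Bring up one more worker VM for a service: GAS += 1."""
    table[service_id]["gas"] += 1

def free_instance(table, service_id):
    """Tear down one worker VM for a service: GAS -= 1 (never below 0)."""
    if table[service_id]["gas"] > 0:
        table[service_id]["gas"] -= 1
```

   Only these counter updates, plus the per-service cost, ever need to
   be reflected toward the C-SMA; no raw computing metric leaves the
   site.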
   Instead of publishing FLOPS, memory sizes, or utilization curves,
   the site simply maintains and distributes its service model table: a
   concise, standardized summary of how many instances of each service
   type it can run and at what cost.

   *  Initial state: the site sends the entire table to the C-SMA.

   *  Subsequent changes: only the delta (new deployments,
      added/removed instances, price adjustments) is transmitted,
      keeping updates lightweight and avoiding the complex
      normalization of raw computing metrics.

   CMAS turns the traditional flood of raw computing metrics into a
   single, lightweight service model table.  Because the table contains
   only service counts and cost values, the information volume shrinks
   dramatically and is immediately understandable to the C-PS.
   Resource management inside the site is equally simplified:

   *  To allocate resources for a service, the site simply increments
      its GAS counter by one.

Zhang, et al.              Expires 20 July 2026               [Page 10]

Internet-Draft                     cmas                    January 2026

   *  To free resources, it decrements GAS by one.

   No normalization, no complex telemetry, no metric flooding; just
   "+1" or "-1" against its own service model table.

7.  Service Distribution

   [I-D.ietf-cats-framework] describes that a C-SMA collects both
   computing-related capabilities and metrics, associates them with a
   CS-ID that identifies the service, and then advertises CS-IDs along
   with metrics to related C-PSes in the network.  With the CMAS
   mechanism, the C-SMA only needs to collect a minimal service-metric
   tuple, (Service ID, CSCI-ID, GAS, cost), from each site.  Meanwhile,
   the C-NMA gathers purely network-related metrics such as delay,
   jitter, and bandwidth; no raw computing figures are ever
   distributed, keeping both data collection and cross-domain
   orchestration lightweight and standardized.

   Figure 3 illustrates how CATS metrics are disseminated under the
   CMAS mechanism.  A client reaches the network through
   "CATS-Forwarder 1".
   For the service identified by CS-ID "1", two contact instances
   exist:

   *  Instance CSCI-ID "1" at Service Site 2 (reachable via
      CATS-Forwarder 2)

   *  Instance CSCI-ID "3" at Service Site 3 (reachable via
      CATS-Forwarder 3)

   Additionally, two separate services (CS-ID "2" and CS-ID "3") each
   have one contact instance located at Service Site 2 and Service
   Site 3, respectively.

Zhang, et al.              Expires 20 July 2026               [Page 11]

Internet-Draft                     cmas                    January 2026

               CS-ID 1, CSCI-ID 1, gas, cost
               CS-ID 2, CSCI-ID 2, gas, cost
           :<---------------------------:
           :                            :
           :               +----------------+    Service Site 2
           :               |     C-SMA      |      +---------+
           :               +----------------+   .--|CS-ID 1  |
           :               |CATS-Forwarder 2|---+  |CSCI-ID 1|
           :               +----------------+   |  +---------+
           :                       |            |  +---------+
           :                       |            '--|CS-ID 2  |
           :    Network            |               |CSCI-ID 2|
           :    delay   +-------+  |               +---------+
           :<-----------| C-NMA |  |
           :            +-------+  |            +----------------+
           :                       |            | Public Service |
   +--------+   +---------------------+         |    Platform    |
   | Client |---|CATS-Forwarder 1|C-PS|         +----------------+
   +--------+   +---------------------+                 :
           :                       |     computing time :
           :            Underlay   |<...................:
           :        Infrastructure |
           :                       |
           :               +----------------+    Service Site 3
           :               |CATS-Forwarder 3|      +---------+
           :               |     C-SMA      |   .--|CS-ID 1  |
           :               +----------------+---+  |CSCI-ID 3|
           :                                    |  +---------+
           :                                    |  +---------+
           :                                    '--|CS-ID 3  |
           :                                       |CSCI-ID 4|
           :<......................................+---------+
               CS-ID 1, CSCI-ID 3, gas, cost
               CS-ID 3, CSCI-ID 4, gas, cost

     Figure 3: An Example of CATS Service Metric Dissemination.

   The service table formed in the C-PS in this example is:

   (CS-ID 1, CSCI-ID 1, gas, cost, Computing time, Network delay)
   (CS-ID 1, CSCI-ID 3, gas, cost, Computing time, Network delay)
   (CS-ID 2, CSCI-ID 2, gas, cost, Computing time, Network delay)
   (CS-ID 3, CSCI-ID 4, gas, cost, Computing time, Network delay)

Zhang, et al.
Expires 20 July 2026                                          [Page 12]

Internet-Draft                     cmas                    January 2026

   In Figure 3, the C-SMA co-located with "CATS-Forwarder 2" advertises
   the CMAS metrics for both service contact instances it covers,
   namely (CS-ID 1, CSCI-ID 1, gas, cost) and (CS-ID 2, CSCI-ID 2, gas,
   cost).  Likewise, the C-SMA agent at "Service Site 3" publishes the
   metrics for the two services hosted by that site.  All these
   service-metric advertisements are received and processed by the C-PS
   hosted on "CATS-Forwarder 1", which also handles the network-metric
   advertisements sent by the C-NMA.

   Thanks to CMAS, the C-PS can easily build and maintain a unified
   service view for every offering.  It simply collects all service
   metrics from the sites, appends the network metrics (here, delay)
   advertised by the C-NMA, and fetches each service's computing time
   from the public service platform.  This yields a comprehensive
   service table in the form (Service ID, CSCI-ID, Gas, Cost, Computing
   time, Network delay), which we call the whole service table.  Using
   this table, the C-PS selects the most suitable path to the egress
   CATS-Forwarder by evaluating:

   *  the client's initial service request ("CS-ID 1" or "CS-ID 2"),

   *  the real-time state of each service contact instance (gas, cost,
      computing time), and

   *  the current network state (delay and other metrics).

8.  Service Consuming Process

   In the example of Figure 3, the client first queries the public
   service platform to build its request from Table 1.  The request is
   expressed as a 4-tuple: (Service ID, Gas, Cost, Delay).  It is
   injected into the network through "CATS-Forwarder 1" (ingress role),
   which forwards it to the C-PS.  The C-PS (co-located or centralized)
   selects the CSCI-ID that best matches the tuple by comparing:

   *  Gas ≥ requested Gas,

   *  Cost closest to requested Cost,

   *  Real Delay ≤ requested Delay,

   where Real Delay = Computing Time + Network Delay taken from the
   whole service table.
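   The selection rule above can be sketched, non-normatively, as a
   filter over the whole service table (record and field names are
   assumptions of this illustration):

```python
def select_instance(whole_table, request):
    """Pick the row of the whole service table that best matches a
    client request (Service ID, Gas, Cost, Delay):
      - Gas >= requested Gas,
      - Real Delay = Computing Time + Network Delay <= requested Delay,
      - among feasible rows, Cost closest to the requested Cost."""
    feasible = [
        row for row in whole_table
        if row["service_id"] == request["service_id"]
        and row["gas"] >= request["gas"]
        and row["computing_time"] + row["network_delay"] <= request["delay"]
    ]
    if not feasible:
        return None  # no instance can satisfy the request
    # Among feasible instances, choose the cost closest to the budget.
    return min(feasible, key=lambda row: abs(row["cost"] - request["cost"]))
```

   The returned row carries the CSCI-ID toward which the ingress
   CATS-Forwarder will steer the client's traffic.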
   The C-PS returns a 6-tuple response: (Service ID, CSCI-ID, Gas,
   Real-Cost, Real-Delay, success).  When this response reaches
   "CATS-Forwarder 1", the forwarder notifies the client that the
   service is ready with a simplified acknowledgement: (Service ID,
   Gas, Real-Cost, Real-Delay, success).  The client then:

Zhang, et al.              Expires 20 July 2026               [Page 13]

Internet-Draft                     cmas                    January 2026

   *  pays the Real-Cost,

   *  assembles input data according to the "input" schema in Table 1,

   *  sends (Service ID, data) back to "CATS-Forwarder 1".

   Using the CSCI-ID returned in the response, the forwarder
   establishes a direct data-plane tunnel between the client and the
   selected service contact instance.  Client data and subsequent
   computing results flow through this tunnel; no further routing
   decisions are needed.  Immediately after the tunnel is set up, the
   forwarder signals the C-PS to decrement the corresponding GAS value
   in the global service table.  When the service completes, the
   contact instance itself sends a "service-finished" heartbeat to the
   C-PS, which increments GAS again, keeping the table accurate in real
   time.

9.  References

9.1.  Informative References

   [I-D.ietf-cats-usecases-requirements]
              Yao, K., "Computing-Aware Traffic Steering (CATS) Problem
              Statement, Use Cases, and Requirements", June 2025.

   [I-D.ietf-cats-metric-definition]
              Kumari, W. and K. Yao, "CATS Metrics Definition",
              July 2025.

   [I-D.ietf-cats-framework]
              Li, C., "A Framework for Computing-Aware Traffic Steering
              (CATS)", July 2025.

Authors' Addresses

   Bin Zhang (editor)
   Pengcheng Laboratory
   Sibilong Street
   Shenzhen
   518055
   China
   Email: bin.zhang@pcl.ac.cn

Zhang, et al.              Expires 20 July 2026               [Page 14]

Internet-Draft                     cmas                    January 2026

   Yina Dai (editor)
   Sun Yat-sen University
   Sun Yat-sen Street
   Guangzhou
   510080
   China
   Email: daiyn5@mail2.sysu.edu.cn

   Bowen Shen (editor)
   Harbin Institute of Technology
   Taoyuan Street
   Shenzhen
   518055
   China
   Email: shenbowen@stu.hit.edu.cn

Zhang, et al.              Expires 20 July 2026               [Page 15]