Network Working Group                                         Joe Tansey
Internet-Draft: draft-joetansey-alvc-codec-01                      Cisco
Intended status: Experimental                            August 28, 2025
Expires: February 27, 2026

    Adaptive Layered Voice Codec (ALVC) for LPWAN Store-and-Forward
                     draft-joetansey-alvc-codec-01

Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task
Force (IETF). Note that other groups may also distribute working
documents as Internet-Drafts. The list of current Internet-Drafts is
at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

This Internet-Draft will expire on February 27, 2026.

Copyright Notice

Copyright (c) 2025 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents carefully,
as they describe your rights and restrictions with respect to this
document. Code Components extracted from this document must include
Simplified BSD License text as described in Section 4.e of the Trust
Legal Provisions and are provided without warranty as described in the
Simplified BSD License.

Author:
   Joe Tansey
   Cisco Systems
   170 West Tasman Dr
   San Jose, CA 95134
   United States
   Email: joetanse@cisco.com

Abstract
--------

This document specifies the Adaptive Layered Voice Codec (ALVC), a
scalable speech codec optimized for extremely constrained low-power
wide-area networks (LPWANs). ALVC enables intelligible base layer
playback at sub-kilobit rates and progressive quality improvement via
enhancement layers delivered asynchronously. The design supports
store-and-forward operation, fragmentation, unequal error protection,
and monotonic refinement from partial reception.

Table of Contents
-----------------

   1.  Introduction
   2.  Requirements Language
   3.  Use Cases and Deployment Scenarios
   4.  Operational Constraints
   5.  Signal Model and Analysis
   6.  Layered Structure
   7.  Bitstream Syntax
   8.  Framing and Bit Budgets
   9.  Packet Loss, FEC, and Concealment
   10. Decoder Behavior from Partial Layers
   11. Complexity and Memory
   12. Interoperability and Profiles
   13. Security Considerations
   14. Privacy Considerations
   15. IANA Considerations
   16. Acknowledgments
   17. References

1. Introduction
---------------

Low-power wide-area networks (LPWANs) such as LoRaWAN, Sigfox, and
NB-IoT typically carry short telemetry messages and cannot sustain
interactive voice. Nevertheless, many industrial, public safety, and
remote operations scenarios benefit from delayed voice delivery, for
example voice notes that can be forwarded opportunistically through
gateways.

This document introduces the Adaptive Layered Voice Codec (ALVC),
which explicitly separates intelligibility from fidelity. A low-rate
base layer provides immediate comprehension from sparse fragments, and
one or more enhancement layers provide incremental quality upgrades
when channel capacity permits. ALVC is designed for robustness to
out-of-order arrival, long one-way latency, and loss.
Receivers can start playback after decoding an initial window of
base-layer fragments and then upgrade already buffered audio as
enhancements arrive. The codec is independent of any particular
transport; a companion document specifies a SCHC-based profile
suitable for LPWANs.

2. Requirements Language
------------------------

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
BCP 14 (RFC 2119 and RFC 8174) when, and only when, they appear in all
capitals, as shown here.

3. Use Cases and Deployment Scenarios
-------------------------------------

Industrial and Utilities: maintenance crews leave spoken annotations
   at remote assets; gateways collect fragments and forward them to
   control rooms.

Public Safety and SAR: short updates are sent where coverage is
   intermittent; base-layer-only reception still provides meaningful
   information.

Telemetry Plus Human Context: numeric telemetry is accompanied by a
   spoken rationale; the base layer coexists with sensor payloads
   under strict duty-cycle limits.

4. Operational Constraints
--------------------------

Target deployments share the following constraints: payload budgets of
50-200 bytes; strict duty-cycle limits; minutes-scale latency with
out-of-order delivery; and MCU-class endpoints.

Design goals include intelligibility at or below 1.2 kbps, progressive
refinement without resending Layer-0, and graceful degradation under
loss.

5. Signal Model and Analysis
----------------------------

ALVC operates on narrowband speech sampled at 8 kHz. Analysis uses
20 ms frames with 50% overlap, grouped into 40-80 ms superframes.
Pitch tracking and LPC analysis are performed in fixed point.
Quantizers are chosen to support most-significant-bit-first (MSB-first)
emission so that truncated data degrades gracefully.

6. Layered Structure
--------------------

Layer-0 (Base): independently decodable, ~0.8-1.2 kbps; carries
   voicing, pitch, gain, a compact spectral envelope (e.g., LSF), and
   sparse excitation indices.
Layer-1 (Core): refines envelope, pitch, and gain; MSB-first emission
   recommended.
Layer-2 (High-Band): adds high-band envelopes and sibilance cues;
   playback remains narrowband when this layer is absent.
Layer-R (Residual): optional transform-coded residual in bit-planes;
   encoders may stop opportunistically.

7. Bitstream Syntax
-------------------

Top-level fields: version and profile ID; superframe index and
duration (40 or 80 ms); layer presence bitmap; optional CRC16.

Top-Level Header Bitfield (Informative):

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---+---+-------+---------------------------+---+
|  Version(4)   | P | D |  Res  |    SuperframeIndex(16)    |C16|
+---------------+---+---+-------+---------------------------+---+
| LAYER MAP (8) |           OPTIONAL CRC16 (if C16=1)           |
+---------------+-----------------------------------------------+

8. Framing and Bit Budgets
--------------------------

Default superframe duration: 40 ms (two frames) or 80 ms (four
frames).

Illustrative budgets: Layer-0 at 800 bps (32 bits per 40 ms); Layer-0
at 1200 bps (48 bits per 40 ms); Layer-1 adds 500-1200 bps; Layer-2
adds 300-800 bps; Layer-R is opportunistic.

Worked Example (Informative): Layer-0 at 0.8 kbps yields 32 bits per
40 ms superframe. A 20 s message spans 500 superframes, or 16,000 bits
(2,000 bytes) of payload. With a 5-byte header per superframe, header
overhead is 2,500 bytes, giving a Layer-0 total of ~4.5 kB before
parity. With an 80-byte MTU this is ~57 payloads, plus 20-30% for
parity.

9. Packet Loss, FEC, and Concealment
------------------------------------

Layer-0 MUST use FEC such as Reed-Solomon or fountain codes.
Enhancement layers SHOULD use lighter FEC or none. Encoders MAY embed
low-rate forward copies, and enhancement layers MAY be emitted
MSB-first. When base-layer fragments are unrecoverable, decoders MUST
apply packet loss concealment (PLC) using pitch-synchronous synthesis
and noise fill.

10.
Decoder Behavior from Partial Layers
----------------------------------------

Receivers MUST permit immediate playback from Layer-0 alone and MUST
apply received enhancements idempotently to improve buffered audio
without discontinuities. Implementations SHOULD support background
re-synthesis and MAY cache undecoded enhancements.

11. Complexity and Memory
-------------------------

ALVC targets fixed-point MCUs. The reference decoder targets a compute
budget below 40 MHz for Layer-0 alone and below 80 MHz with one
enhancement layer.

Informative: a Cortex-M4 at 80 MHz decodes Layer-0 in real time with
<25% CPU and <32 KiB RAM; adding one enhancement layer raises this to
~50% CPU and a further 8 KiB RAM. M7 and A-class cores provide
additional headroom.

12. Interoperability and Profiles
---------------------------------

Interoperability is achieved via profiles that fix the Layer-0 bit
allocation and frame durations; a LoRaWAN example is given in the
companion transport document.

13. Security Considerations
---------------------------

Bitstreams SHOULD be protected via authenticated encryption. With
CoAP, use OSCORE. With SCHC, per-fragment AEAD is RECOMMENDED.
Implementations MUST avoid variable-time decoding paths that leak
content via timing.

14. Privacy Considerations
--------------------------

Voice content is sensitive. Implementations should support redaction
or transcript-only modes and retention limits. The timing of metadata
can leak usage patterns; padding and batching MAY mitigate this.

15. IANA Considerations
-----------------------

This document has no IANA actions.

16. Acknowledgments
-------------------

Thanks to colleagues for feedback on layered speech models and LPWAN
operation.

17. References
--------------

Normative: RFC 2119; RFC 8174; RFC 8613.

Informative: RFC 6716; RFC 8724; RFC 9363; RFC 7252.

Authors' Addresses

   Joe Tansey
   Cisco Systems
   170 West Tasman Dr
   San Jose, CA 95134
   United States
   Email: joetanse@cisco.com
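
Informative note on Section 8: the Layer-0 worked example can be
reproduced with a few lines of integer arithmetic. The Python sketch
below is illustrative only and is not part of the codec. The function
name layer0_budget and its parameter names are hypothetical. It
assumes one 5-byte header per superframe (implied by the 2,500-byte
overhead figure), that payloads are filled to the MTU with no further
per-packet overhead, and that parity is counted as extra payloads on
top of the data payloads.

```python
import math


def layer0_budget(bitrate_bps=800, superframe_ms=40, duration_s=20,
                  header_bytes=5, mtu_bytes=80, parity_percent=(20, 30)):
    """Rough Layer-0 byte and payload budget (illustrative assumptions only)."""
    # Number of superframes in the message and bits carried by each one.
    superframes = duration_s * 1000 // superframe_ms
    bits_per_superframe = bitrate_bps * superframe_ms // 1000

    # Codec payload, plus one header per superframe (assumption).
    payload_bytes = superframes * bits_per_superframe // 8
    header_total = superframes * header_bytes
    total_bytes = payload_bytes + header_total

    # Payloads needed when each one is filled up to the MTU (assumption).
    data_payloads = math.ceil(total_bytes / mtu_bytes)

    # Parity modeled as whole extra payloads, rounded up (assumption).
    with_parity = tuple(
        data_payloads + -(-data_payloads * pct // 100)
        for pct in parity_percent
    )

    return {
        "superframes": superframes,
        "bits_per_superframe": bits_per_superframe,
        "payload_bytes": payload_bytes,
        "header_bytes": header_total,
        "total_bytes": total_bytes,
        "data_payloads": data_payloads,
        "payloads_with_parity": with_parity,
    }


if __name__ == "__main__":
    for key, value in layer0_budget().items():
        print(f"{key}: {value}")
```

With the default arguments the sketch yields 500 superframes, 2,000
payload bytes, 2,500 header bytes, 4,500 bytes in total, and 57 data
payloads, matching the figures in Section 8. Changing bitrate_bps to
1200 gives the corresponding budget for the 48-bit superframe case.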