CBOR                                                        L. Lundblade
Internet-Draft                                       Security Theory LLC
Intended status: Standards Track                             13 May 2025
Expires: 14 November 2025


                   CBOR Serialization and Determinism
                 draft-lundblade-cbor-serialization-00

Abstract

   This document updates and clarifies CBOR Serialization and
   Deterministic Encoding as defined in [RFC8949].  It also provides
   background explanations that were not included in the original
   specification.

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current
   Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on 14 November 2025.

Copyright Notice

   Copyright (c) 2025 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (https://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Revised BSD License text as described in Section 4.e of the
   Trust Legal Provisions and are provided without warranty as described
   in the Revised BSD License.

Lundblade               Expires 14 November 2025                [Page 1]

Internet-Draft             CBOR Serialization                   May 2025

Table of Contents

   1.  Introduction
   2.  Information Model, Data Model and Serialization
   3.  Preferred Serialization
     3.1.  Encoder Requirements
     3.2.  Decoder Requirements
     3.3.  When to use Preferred Serialization
   4.  CBOR Deterministic Encoding Requirements
     4.1.  Encoder Requirements
     4.2.  Decoder Requirements
     4.3.  When to use Deterministic Serialization
   5.  Deterministic Encoding for Popular Tags
     5.1.  Date Strings, Tag 0
     5.2.  Epoch Date, Tag 1
       5.2.1.  Encoder Requirements
       5.2.2.  Decoder Requirements
     5.3.  Big Numbers, Tags 2 and 3
     5.4.  Big Floats and Decimal Fractions, Tags 4 and 5
       5.4.1.  Encoder Requirements
       5.4.2.  Decoder Requirements
   6.  General Protocol Considerations for Determinism
   7.  CDDL Support
   8.  Security Considerations
   9.  IANA Considerations
   10. Normative References
   Appendix A.  Examples and Test Vectors
   Appendix B.  Explanation for Big Number Preferred Serialization
   Author's Address

1.  Introduction

   This document provides a complete definition of both Preferred
   Serialization and CBOR Deterministic Encoding Requirements (CDER)
   such that the reader does not need to refer to their definitions in
   [RFC8949].

   The overriding purpose of this document is to bring clarity and ease
   of use to the CBOR ecosystem on the subject of serialization and
   determinism.

   Aside from one small change, this restatement of the requirements
   doesn't change anything in [RFC8949].  No new concepts or terminology
   are introduced.

   The small change is to Preferred Serialization.  The conditional
   "preference" for definite-length encoding in Section 4.1 of
   [RFC8949] is promoted to an unconditional requirement by this
   document.

   This change is considered reasonably compatible with the extant CBOR
   ecosystem.  In the roughly five years since the publication of
   [RFC8949], the CBOR community has largely assumed that
   definite-length encoding is a requirement of Preferred
   Serialization.  It is better to make this minor change than to create
   a third serialization concept that would compound the complexity and
   confusion in this part of the CBOR ecosystem.

2.  Information Model, Data Model and Serialization

   To understand CBOR serialization and determinism, it's helpful to
   distinguish between the general concepts of an information model, a
   data model, and serialization.

   +================+=============+===================+================+
   |                | Information | Data Model        | Serialization  |
   |                | Model       |                   |                |
   +================+=============+===================+================+
   | Abstraction    | Top level;  | Realization of    | Actual bytes   |
   | Level          | conceptual  | information in    | encoded for    |
   |                |             | data structures   | transmission   |
   |                |             | and data types    |                |
   +----------------+-------------+-------------------+----------------+
   | Example        | The         | A floating-       | Encoded CBOR   |
   |                | temperature | point number      | of a floating- |
   |                | of          | representing      | point number   |
   |                | something   | the temperature   |                |
   +----------------+-------------+-------------------+----------------+
   | Standards      |             | CDDL              | CBOR           |
   +----------------+-------------+-------------------+----------------+
   | Implementation |             | API Input to      | Encoded CBOR   |
   | Representation |             | CBOR encoder      | in memory or   |
   |                |             | library, output   | for            |
   |                |             | from CBOR         | transmission   |
   |                |             | decoder library   |                |
   +----------------+-------------+-------------------+----------------+

                                  Table 1

   CBOR doesn't provide facilities for information models.  They are
   mentioned here for completeness and to provide some context.

   CBOR defines a palette of basic types: the usual integers,
   floating-point numbers, strings, arrays, maps, and others.  Extended
   types may be constructed from these basic types.  These basic and
   extended types are used to construct the data model of a CBOR
   protocol.  While not required, [RFC8610] may be used to describe the
   data model of a protocol.  The types in the data model are serialized
   per [RFC8949] to create encoded CBOR.

   CBOR allows certain data types to be serialized in multiple ways to
   facilitate easier implementation in constrained environments.  For
   example, indefinite-length encoding enables strings, arrays, and maps
   to be streamed without knowing their length up front.

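   As a rough illustration of the two variants just mentioned, the
   following Python sketch encodes the same text string once with
   definite-length and once with indefinite-length encoding.  The
   helper names (encode_head, encode_text_definite,
   encode_text_indefinite) are hypothetical and belong to no particular
   library; the byte layout is assumed to follow [RFC8949].

```python
from typing import Iterable


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a CBOR head (major type plus argument) in shortest form."""
    if argument < 24:
        # Small arguments share the initial byte with the major type.
        return bytes([(major_type << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < (1 << (8 * size)):
            return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")
    raise ValueError("argument does not fit in 64 bits")


def encode_text_definite(text: str) -> bytes:
    """Definite-length text string: the byte length must be known first."""
    data = text.encode("utf-8")
    return encode_head(3, len(data)) + data


def encode_text_indefinite(chunks: Iterable[str]) -> bytes:
    """Indefinite-length text string: chunks are emitted as they arrive."""
    out = bytearray([0x7F])          # major type 3, ai = 31 (indefinite)
    for chunk in chunks:
        out += encode_text_definite(chunk)   # each chunk is definite-length
    out.append(0xFF)                 # "break" stop code ends the string
    return bytes(out)


print(encode_text_definite("streaming").hex())
print(encode_text_indefinite(["strea", "ming"]).hex())
```

   Both outputs carry the same data-model value, the text string
   "streaming", yet the bytes differ.  A decoder that implements only
   definite-length strings cannot read the second form, which is the
   kind of variability that Preferred Serialization removes.
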
   Crucially, CBOR allows, and even expects, that some implementations
   will not support all serialization variants.  In contrast, JSON
   permits variations (e.g., representing 1 as 1, 1.0, or 0.1e1), but
   expects all parsers to handle them.  That is, the variation in JSON
   is for human readability, not to facilitate easier implementation in
   some environments.

   Since CBOR does not require implementations to support every
   serialization variant, defining a common serialization format is
   highly beneficial for those that don't need specialized encoding.
   This is the role of Preferred Serialization.  It mandates a specific
   variant for each data type when multiple options exist.

3.  Preferred Serialization

   The requirements in the next two sections replace the definition of
   Preferred Serialization in [RFC8949].  They are restated in normative
   form to be clearer and so that they can be formally referenced by
   the restatement in Section 4.

   As mentioned in Section 1, there is one change relative to the
   definition of Preferred Serialization in [RFC8949].

3.1.  Encoder Requirements

   1.  Shortest-form encoding of the argument MUST be used for all major
       types.  The shortest-form encoding for any argument that is not a
       floating-point value is:

       *  0 to 23 and -1 to -24 MUST be encoded in the same byte as the
          major type.

       *  24 to 255 and -25 to -256 MUST be encoded only with an
          additional byte (ai = 0x18).

       *  256 to 65535 and -257 to -65536 MUST be encoded only with an
          additional two bytes (ai = 0x19).

       *  65536 to 4294967295 and -65537 to -4294967296 MUST be encoded
          only with an additional four bytes (ai = 0x1a).

       *  4294967296 to 18446744073709551615 and -4294967297 to
          -18446744073709551616 MUST be encoded only with an additional
          eight bytes (ai = 0x1b).

   2.  If maps or arrays are emitted, they MUST use definite-length
       encoding (never indefinite-length).

   3.  If text or byte strings are emitted, they MUST use
       definite-length encoding (never indefinite-length).

   4.  If floating-point numbers are emitted, the following apply:

       *  The length of the argument indicates half (binary16,
          ai = 0x19), single (binary32, ai = 0x1a) and double (binary64,
          ai = 0x1b) precision encoding.  If more than one of these
          encodings preserves the precision of the value to be encoded,
          the shortest of them MUST be emitted.  That is, encoders MUST
          support half-precision and single-precision floating point.
          Positive and negative infinity and zero MUST be represented in
          half-precision floating point.

       *  NaNs, and thus NaN payloads, MUST be supported.  As with all
          floating-point numbers, NaNs with payloads MUST be reduced to
          the shortest of double, single or half precision that
          preserves the NaN payload.  The reduction is performed by
          removing the rightmost N bits of the payload, where N is the
          difference in the number of bits in the significand (mantissa)
          between the original format and the reduced format.  The
          reduction is performed only if all of the removed rightmost
          bits are zero, since only then is the payload preserved.

   5.  If big numbers (tags 2 and 3) are supported, the following apply:

       *  Positive values from 0 to 2^64 - 1 MUST be encoded as a type 0
          integer.

       *  Negative values from -1 to -(2^64) MUST be encoded as a type 1
          integer.

       *  Leading zeros MUST NOT be present in the byte string content
          of tags 2 and 3.

       *  See also Appendix B.

3.2.  Decoder Requirements

   1.  Decoders MUST accept shortest-form encoded arguments.

   2.  If arrays or maps are supported, definite-length arrays or maps
       MUST be accepted.

   3.  If text or byte strings are supported, definite-length text or
       byte strings MUST be accepted.

   4.  If floating-point numbers are supported, the following apply:

       *  Half-precision values MUST be accepted.

       *  Double- and single-precision values SHOULD be accepted;
          leaving these out is only foreseen for decoders that need to
          work in exceptionally constrained environments.

       *  If double-precision values are accepted, single-precision
          values MUST be accepted.

       *  NaNs, and thus NaN payloads, MUST be accepted.

   5.  If big numbers (tags 2 and 3) are supported, type 0 and type 1
       integers MUST be accepted in place of a byte string big number.
       Leading zeros in a big number byte string must be ignored.

3.3.  When to use Preferred Serialization

   It is recommended that Preferred Serialization be used unless an
   application has special needs.  It is usually implementations in
   constrained environments that have special needs.  For example,
   indefinite-length encoding is useful for sending a large amount of
   data from a device that has insufficient memory to store all of the
   data to be sent.

4.  CBOR Deterministic Encoding Requirements

   The requirements in the next two sections replace the definition of
   CDER in [RFC8949].  There are no differences between these
   requirements and those of [RFC8949].  This restatement is only for
   the sake of clarity.  ([RFC8949] allowed indefinite-length encoding
   for Preferred Serialization but not for CDER; that is why there is a
   change to Preferred Serialization in this document but not to CDER.)

4.1.  Encoder Requirements

   1.  Preferred Serialization as defined in Section 3.1 MUST be used.

   2.  If a map is emitted, the keys in it MUST be sorted in the
       bytewise lexicographic order of their deterministic encodings.

4.2.  Decoder Requirements

   1.  Decoders MUST meet the decoder requirements of Section 3.2.  That
       is, deterministic encoding imposes no requirements over and above
       the requirements for decoding Preferred Serialization.

4.3.  When to use Deterministic Serialization

   Most applications do not require deterministic encoding, even those
   that use signing or hashing to authenticate or protect the integrity
   of data.  For example, the payload of a COSE_Sign message does not
   need to be encoded deterministically, because it is transmitted along
   with the message.
   The recipient receives the exact same bytes that were signed.

   Deterministic encoding becomes important when the data being
   protected is NOT transmitted in the form needed for authenticity or
   integrity checks, typically when that form is derived from other
   data.  This can happen for reasons such as data size, privacy
   concerns, or other constraints.

   The only difference between preferred and deterministic
   serialization is map key sorting.  Sorting can be prohibitively
   expensive in very constrained environments.  However, in many
   systems, sorting maps is not costly, and deterministic encoding can
   be used by default.

   Deterministically encoded data is always decodable, even by receivers
   that do not specifically support deterministic encoding.  It can also
   be helpful for debugging protocols.

5.  Deterministic Encoding for Popular Tags

   The definitions of the following tags in [RFC8949] allow variation in
   the data model, so it is useful to define a deterministic encoding
   for them should a particular deterministic protocol need one.  The
   tags defined in [RFC8949] but not mentioned here have no variability
   in their data model.

5.1.  Date Strings, Tag 0

   TODO -- complete this work and remove this comment before publication

5.2.  Epoch Date, Tag 1

5.2.1.  Encoder Requirements

   The integer form MUST be used unless one of the following applies:
   (1) the date is too far in the past or future to fit in a 64-bit
   integer of type 0 or 1, or (2) the date requires sub-second
   precision.  In these cases, the floating-point form MUST be used
   instead.

5.2.2.  Decoder Requirements

   The decoder MUST decode both the integer and floating-point form.

5.3.  Big Numbers, Tags 2 and 3

   The determinism requirements for big numbers are part of the big
   number requirements in Section 3.  That is, the Preferred
   Serialization of big numbers is deterministic.  See also Appendix B.

5.4.  Big Floats and Decimal Fractions, Tags 4 and 5

5.4.1.  Encoder Requirements

   The mantissa MUST be encoded in the preferred serialization form
   specified in Section 3.4.3 of [RFC8949].  The mantissa MUST NOT
   contain trailing zeros.  For example, the decimal fraction with value
   10 must be encoded with a mantissa of 1 and an exponent of 1.  For
   big floats, the mantissa must not include any trailing zero bits if
   encoded as a type 0 or 1 integer, and no trailing zero bytes if
   encoded as a big number.

5.4.2.  Decoder Requirements

   Both the integer and big number forms of the mantissa MUST be
   decoded.

6.  General Protocol Considerations for Determinism

   This is the section that covers what is known as ALDR in some
   discussions.

   // RFC Editor: Please remove above sentence before publication

   In addition to Section 4 and Section 5, there are considerations in
   the design of any deterministic protocol.

   For a protocol to be deterministic, both the encoding (serialization)
   and data model (application) layers must be deterministic.  While
   CDER ensures determinism at the encoding layer, requirements at the
   application layer may also be necessary.

   Here's an example application layer specification:

      At the sender's convenience, the birth date MAY be sent either as
      an integer epoch date or string date.  The receiver MUST decode
      both formats.

   While this specification is interoperable, it lacks determinism.
   There is variability in the data model layer akin to variability in
   the CBOR encoding layer when CDER is not required.

   To make this example application layer specification deterministic,
   specify one date format and prohibit the other.

   A more interesting source of application layer variability comes from
   CBOR's variety of number types.  For instance, the number 2 can be
   represented as an integer, float, big number, decimal fraction, and
   others.
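   To make the point concrete, the following Python sketch lists a few
   byte sequences that all carry the value 2.  The byte values are
   assumed to follow the encodings in [RFC8949]; the dictionary name
   encodings_of_two is purely illustrative.  The big number form is
   shown only as something generic CBOR permits, since Section 3.1
   requires the integer form for a value this small.

```python
import struct

# Four encodings that all represent the number 2 at the application layer.
encodings_of_two = {
    # Major type 0, argument 2 held in the initial byte.
    "integer (major type 0)": bytes([0x02]),
    # Major type 7, ai = 0x19, followed by IEEE 754 binary16 for 2.0.
    "half-precision float": bytes([0xF9]) + struct.pack(">e", 2.0),
    # Tag 2 wrapping a one-byte byte string containing 0x02.
    "big number (tag 2)": bytes([0xC2, 0x41, 0x02]),
    # Tag 4 wrapping the array [exponent 0, mantissa 2], i.e. 2 * 10^0.
    "decimal fraction (tag 4)": bytes([0xC4, 0x82, 0x00, 0x02]),
}

for name, data in encodings_of_two.items():
    print(f"{name:26} {data.hex()}")
```

   Each of these is well-formed CBOR, and serialization rules alone do
   not choose between the number types; that choice has to be made in
   the application layer specification.
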
   Most protocol designs will just specify one number type to use, and
   that will give determinism, but here's an example specification that
   doesn't:

      At the sender's convenience, the fluid level measurement MAY be
      encoded as an integer or a floating-point number.  This allows
      for minimal encoding size while supporting a large range.  The
      receiver MUST be able to accept both integers and floating-point
      numbers for the measurement.

   Again, this ensures interoperability but not determinism: identical
   fluid level measurements can be represented in more than one way.
   Determinism can be achieved by allowing only floating-point, though
   that doesn't minimize encoding size.  A better solution requires
   that the fluid level always be encoded using the smallest
   representation for every particular value.  For example, a fluid
   level of 2 is always encoded as an integer, never as a
   floating-point number.  2.000001 is always encoded as a
   floating-point number so as to not lose precision.  See the numeric
   reduction defined by dCBOR.

   Although this is not strictly a CBOR issue, deterministic CBOR
   protocol designers should be mindful of variability in Unicode text,
   as some characters can be encoded in multiple ways.

   While this is not an exhaustive list of application-layer
   considerations for deterministic CBOR protocols, it highlights the
   nature of variability in the data model layer and some sources of
   variability in the CBOR data model (i.e., in the application layer).

7.  CDDL Support

   TODO -- complete work and remove this comment

8.  Security Considerations

   The security considerations in Section 10 of [RFC8949] apply.

9.  IANA Considerations

   TODO -- complete work and remove this comment before publication

10.  Normative References

   [RFC8610]  Birkholz, H., Vigano, C., and C. Bormann, "Concise Data
              Definition Language (CDDL): A Notational Convention to
              Express Concise Binary Object Representation (CBOR) and
              JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610,
              June 2019.

   [RFC8949]  Bormann, C. and P. Hoffman, "Concise Binary Object
              Representation (CBOR)", STD 94, RFC 8949,
              DOI 10.17487/RFC8949, December 2020.

Appendix A.  Examples and Test Vectors

   TODO -- complete work and remove this comment before publication

Appendix B.  Explanation for Big Number Preferred Serialization

   All requirements defined for Preferred Serialization address the
   intentional variability in CBOR serialization designed to support
   constrained environments, with one exception: the handling of big
   numbers.  Specifically, all Preferred Serialization rules apply
   strictly to serialization concerns and not to the data model, except
   for the requirement regarding integers that can be encoded using
   major types 0 or 1.

   The rule that such integers MUST be encoded using major type 0 or 1,
   rather than as bignums (tags 2 or 3), represents a constraint at the
   data model level.  It does not serve to limit variability in
   serialization format and is therefore conceptually distinct from
   other Preferred Serialization requirements.

   This exception is included in Preferred Serialization to promote a
   consistent and widely supported representation of 128-bit integers.
   While such integers are desirable for many applications, they exceed
   the range supported by the base CBOR data model, which is limited to
   64-bit integers.  Incorporating this constraint within Preferred
   Serialization enables consistent encoding practices for extended
   integer ranges without modifying the core CBOR data model.

Author's Address

   Laurence Lundblade
   Security Theory LLC
   Email: lgl@securitytheory.com