| Internet-Draft | UZPIF-OI | January 2026 |
| Fisher | Expires 17 July 2026 | [Page] |
This document proposes an outbound, opt-in mechanism for web content discovery and indexing, complementing or replacing traditional inbound crawling models such as those governed by the Robots Exclusion Protocol (REP; [RFC9309]). In the proposed approach, servers proactively initiate authenticated outbound connections to trusted indexers (search engines or AI systems) using identity-bound grants, enabling explicit consent for indexing, freshness signalling, and content usage policy communication.¶
The mechanism integrates with identity-centric frameworks such as the Universal Zero-Port Interconnect Framework (UZPIF; [UZPIF]) and supports both traditional search engines and AI-driven indexing and retrieval systems. It aims to reduce unsolicited crawling abuse, improve signal quality for indexers, and provide site owners with positive control over discoverability in an era of increasing AI content consumption.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 17 July 2026.¶
Copyright (c) 2026 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document.¶
This document is an Internet-Draft and represents work in progress. It is published to enable structured technical review, interoperability discussion, and disciplined specification development around outbound, consent-first indexing mechanisms for UZPIF-style transports.¶
The material is a research artefact. It does not claim technical completeness, production readiness, or endorsement by the IETF or any other standards body, and it is not presented as a standards-track specification.¶
It is designed for experimentation, operator feedback, and profile-driven deployments. It does not require changes to the HTTP protocol, but it can carry or reference HTTP-origin content.¶
During conversion from internal research documents into IETF XML, care has been taken to:¶
preserve a clear distinction between normative and informative content;¶
use requirement language (e.g., "MUST", "SHOULD", "MAY") only where behaviour is intentionally specified;¶
avoid any implication of registry finalisation, mandatory implementation, or standards-track status; and¶
maintain intellectual-property neutrality, with no implied patent grants or licensing commitments beyond the IETF Trust copyright licence applicable to Internet-Draft text.¶
Ongoing research, implementation, performance validation, and real-world pilot work remain outside the scope of this Internet-Draft text and may be pursued separately.¶
This document defines an outbound indexing model in which content publishers (servers) initiate outbound, authenticated sessions to trusted indexers to advertise content availability, request refresh, and communicate explicit usage constraints. The model is intended as a complement to inbound crawling and robots.txt-based opt-out signalling.¶
The core components are:¶
Outbound Discovery Session: A publisher-initiated secure session to an indexer service, established over UZPIF and secured using identity-bound handshakes (e.g., TLS-DPA).¶
Discovery Grant: An identity-bound, purpose-scoped authorisation object that grants an indexer permission to index, cache, summarise, or otherwise process specific content for specific purposes.¶
Policy Communication: Machine-readable statements about permitted uses (e.g., search indexing, snippet generation, retrieval augmentation, AI training), retention, attribution, and derivative generation.¶
Freshness Signalling: A method for publishers to explicitly request refresh or indicate change without exposing unauthenticated inbound endpoints.¶
The proposal is compatible with legacy web publishing and can be adopted incrementally. It is especially suited to "zero-port" deployments where inbound crawling is undesirable or impossible.¶
Requirements Language: The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119] and [RFC8174] when, and only when, they appear in all capitals.¶
This Internet-Draft is primarily exploratory; requirement language is used sparingly and only where behaviour is intentionally specified.¶
Traditional web indexing relies on inbound crawling, where automated clients (crawlers) initiate connections to servers and respect opt-out signals such as robots.txt (as standardised in the Robots Exclusion Protocol; [RFC9309]). While effective for many years, this model exposes servers to unsolicited traffic, abuse from malicious crawlers, and challenges in enforcing preferences - particularly as AI systems increasingly use crawled content for training or real-time retrieval.¶
Recent developments, including crawler best practices ([draft-illyes-aipref-cbcp]) and discussions on AI-specific controls, highlight limitations of opt-out regimes. Inbound crawling assumes servers are reachable and willing to respond, which conflicts with emerging zero-port and zero-trust architectures.¶
This document describes an outbound opt-in alternative: servers explicitly initiate authenticated connections to authorised indexers (traditional search engines or AI agents) when they wish to be discovered or refreshed. Discovery requests are bound to cryptographic identities and policy grants, enabling fine-grained control over who may index content and for what purpose (e.g., traditional search, AI training, or summarisation).¶
The approach builds on identity-first transports such as the Universal Zero-Port Interconnect Framework (UZPIF; [UZPIF]) and Universal Zero-Port Transport Protocol (UZP; [UZP]), where endpoints establish outbound-only sessions. It provides a proactive, consent-first model suited to both human-readable web content and AI-driven search consumption.¶
This is a research proposal intended for experimentation and discussion, particularly in contexts where reducing inbound exposure and strengthening consent are priorities.¶
The inbound crawling model is built on an assumption of open reachability: crawlers discover a server, initiate inbound connections, and learn policies after connecting. For modern deployments - especially those seeking to minimise exposed attack surface - this is an inversion of the desired trust model.¶
Specific limitations include:¶
Unsolicited load and abuse: REP is advisory, and many automated clients do not comply. Even compliant crawlers can produce significant load when scaled across multiple indexers and AI agents.¶
Weak identity and accountability: A User-Agent string is not a strong identity; it is easy to spoof and difficult to bind to policy obligations.¶
Purpose ambiguity: The same content acquisition can be used for search indexing, summarisation, retrieval augmentation, or AI training. Without explicit purpose signalling and enforcement, site operators cannot make informed consent decisions.¶
Incompatibility with zero-port architectures: If a Publisher exposes no public inbound listening ports, inbound crawling becomes impossible by design.¶
Policy enforcement gaps: Opt-out mechanisms require a crawler to first connect and then choose to comply, rather than enforcing access authorisation at session establishment.¶
The outbound indexing model aims to preserve the benefits of web discoverability while shifting control to the Publisher, making consent explicit, identity-bound, and enforceable within authenticated channels.¶
The mechanism defined in this document has the following goals:¶
Opt-in discoverability: indexing and refresh occur only when the Publisher chooses to contact an Indexer.¶
Identity binding: all sessions are authenticated and bound to cryptographic identities, enabling durable accountability.¶
Purpose limitation: Publishers can grant indexing permission for specific purposes (e.g., search indexing) while denying others (e.g., AI training).¶
Policy expressiveness: Publishers can express usage constraints, retention expectations, attribution requirements, and derivative permissions.¶
Freshness signalling: Publishers can efficiently request refresh without being continuously crawled.¶
Transport independence: the mechanism should operate over UZPIF sessions and may be profiled for other outbound-friendly transports.¶
Incremental adoption: the mechanism complements existing protocols such as REP and does not require immediate ecosystem-wide migration.¶
At a high level, outbound indexing replaces "Indexer-driven fetch" with a Publisher-initiated session that can carry announcements, policy, and (optionally) content. Under UZPIF, both Publisher and Indexer may maintain outbound connectivity to one or more Rendezvous Nodes (RNs), which stitch permitted sessions.¶
Publisher (Site) RN(s) Trusted Indexer |- outbound setup -->|<- outbound presence -| |<== identity-bound secure session (via RN stitch) ==>| |- ANNOUNCE+GRANT+POLICY ------------->| |<- (optional) REQUEST(resource set) -------| |- CONTENT(resource set/deltas) --------->| |<- RECEIPT/STATUS ----------------|
The session is initiated by the Publisher, but once established it is a bidirectional secure channel in which the Indexer may request specific resources and the Publisher may provide them. The key property is that the Publisher does not expose an unauthenticated public inbound service for discovery.¶
The model distinguishes three relationships:¶
Publisher <-> Indexer: A trust and consent relationship expressed via Discovery Grants and enforced via authenticated sessions.¶
Publisher <-> RN: A connectivity relationship in which the Publisher maintains outbound sessions to one or more RNs, as defined in [UZPIF].¶
Indexer <-> RN: An analogous connectivity relationship enabling stitching to Publishers that authorise the Indexer.¶
This document focuses on Publisher-to-Indexer semantics and does not redefine UZPIF stitching or transport behaviour.¶
Outbound indexing relies on cryptographic identities for both Publishers and Indexers. In UZPIF deployments, these identities are typically represented by certificates or equivalent credentials issued within an identity plane (e.g., Pantheon as described in [UZPIF]), and sessions are established over secure channels such as TLS-DPA ([TLS-DPA]).¶
A Publisher MUST authenticate the Indexer identity before sending any content beyond minimal discovery metadata. An Indexer MUST authenticate the Publisher identity before accepting Discovery Grants, policy, or content.¶
Identity binding serves two purposes:¶
Consent enforcement: Discovery Grants are bound to specific identities and cannot be meaningfully replayed by unauthorised parties.¶
Operational accountability: Publishers can select and audit trusted Indexers, and Indexers can maintain verifiable provenance of content acquisition.¶
This document does not mandate a single Indexer discovery mechanism. A Publisher may discover Indexer identities and endpoints through out-of-band agreements, operator-curated trust lists, or an identity plane such as Pantheon ([UZPIF]).¶
A Publisher SHOULD treat Indexer discovery as a trust decision comparable to granting API access. Blind acceptance of unsolicited Indexer identities reintroduces abuse vectors that outbound indexing is intended to reduce.¶
A Discovery Grant conveys explicit permission from a Publisher to an Indexer, scoped by content and purpose. It is an authorisation artefact, not merely a preference hint.¶
Discovery Grants are conceptually aligned with grant structures described in [UZPIF]. This document defines additional, indexing-specific fields and semantics.¶
A Discovery Grant SHOULD include at least:¶
Issuer: the Publisher identity that issues the grant.¶
Audience: the Indexer identity authorised to use the grant.¶
Scope: a Content Scope describing what may be indexed or retrieved.¶
Purposes: one or more Usage Purposes for which the Indexer is authorised.¶
Constraints: optional limits such as maximum fetch rate, retention period, or required attribution.¶
Expiry: a time limit after which the grant is no longer valid.¶
Signature: a cryptographic signature or equivalent proof binding the grant to the Issuer.¶
Discovery Grants MUST be bound to the authenticated identities observed in the Outbound Discovery Session. An Indexer MUST reject a grant if the authenticated Publisher identity is not the Issuer, or if the authenticated Indexer identity is not the intended Audience.¶
Content Scope may be expressed in multiple ways, depending on deployment:¶
URL prefix scope: allow indexing for all resources under a given origin and path prefix.¶
Feed scope: allow indexing for resources enumerated in a signed feed.¶
Hash set scope: allow indexing for content objects identified by cryptographic hashes.¶
Semantic collection: allow indexing for a named collection (e.g., "docs", "blog", "product-catalogue") maintained by the Publisher.¶
Outbound indexing provides a channel for Publishers to communicate content usage policy to Indexers in a form that is:¶
bound to authenticated identities;¶
associated with explicit Content Scope; and¶
auditable and revocable.¶
The policy data model is intentionally generic. Deployments MAY use the vocabulary defined by the IETF AIPREF working group (e.g., [draft-ietf-aipref-vocab]), or they MAY define private purpose tokens under bilateral agreement.¶
A policy statement SHOULD be able to express:¶
Allowed purposes: which Usage Purposes are permitted (or denied) for a given scope.¶
Derivation: whether summaries, snippets, embeddings, or other derived artefacts are permitted.¶
Training: whether AI training or model fine-tuning is permitted.¶
Attribution: requirements for attribution or source linking in downstream displays.¶
Retention: permitted retention duration for cached copies or extracted features.¶
Redistribution: whether indexed content may be redistributed or provided to third parties.¶
Policy is not an access-control mechanism by itself; it is enforced by the combination of Discovery Grants, authenticated sessions, and Indexer compliance. However, unlike REP, policy is exchanged in an authenticated context where non-compliance can be attributed to a specific identity.¶
The following is a non-normative example of a policy object that permits traditional search indexing and snippet generation, but denies training and long-term retention:¶
{
"scope": "https://example.com/docs/*",
"allowed_purposes": [
"search.index",
"search.snippet",
"rag.retrieve"
],
"denied_purposes": ["ai.train", "ai.finetune"],
"derivatives": {
"summary": "allowed",
"embeddings": "allowed",
"snippets": "allowed"
},
"retention": {
"cached_copy_days": 14,
"embedding_days": 30
},
"attribution": {
"required": true,
"link_back": true
}
}
The encoding, signing, and canonicalisation of policy objects are open for profiling and experimentation. A deployment MAY sign policy objects as part of a Discovery Grant, or MAY carry them as a separate signed artefact within the session.¶
This section describes a baseline operational sequence. The message formats used inside the Outbound Discovery Session are intentionally abstract in this version of the document; the focus is on semantics and security properties.¶
A Publisher initiates an Outbound Discovery Session to an Indexer endpoint using UZPIF connectivity ([UZPIF]) and an identity-bound secure channel (e.g., TLS-DPA; [TLS-DPA]).¶
The Publisher MUST verify the authenticated Indexer identity before sending any sensitive content. The Indexer MUST verify the authenticated Publisher identity before accepting grants or indexing data.¶
Once the secure session is established, the Publisher sends:¶
a discovery announcement identifying the Publisher, site scope, and desired indexing actions (new discovery, refresh, or revocation);¶
one or more Discovery Grants; and¶
associated policy statements for the relevant Content Scopes.¶
The Indexer MUST validate Discovery Grants and reject any announcement that lacks sufficient authorisation for the requested purposes.¶
This document defines two conceptual modes for content transfer:¶
Publisher Push: the Publisher proactively sends content objects or deltas to the Indexer.¶
Indexer Request within Session: the Indexer requests specific resources inside the established session, and the Publisher responds over the same channel.¶
Both modes preserve the "no unauthenticated inbound ports" property because the Indexer does not initiate a new network connection to the Publisher.¶
A Publisher MAY choose to offer only one mode. For example, a Publisher with strict egress policy may prefer request/response within a session, while a Publisher with pre-generated feeds may prefer push.¶
A Publisher MAY send Freshness Signals to request that an Indexer refresh previously indexed content. Freshness Signals SHOULD be lightweight and SHOULD include enough metadata for the Indexer to prioritise work (e.g., change timestamps, object identifiers, or hashes).¶
Freshness Signals do not, by themselves, grant permission; they operate under the permissions already established by Discovery Grants and policy.¶
An Indexer MAY provide receipts indicating that it has accepted content for indexing and the purposes under which it will be processed. Receipts can improve auditability and facilitate contractual enforcement in commercial relationships.¶
This document does not specify a receipt format; profiles may define signed receipts that include:¶
A Publisher MUST be able to revoke consent. Revocation may apply to a specific grant, a policy scope, or an entire Publisher-to-Indexer relationship.¶
Upon receiving a valid revocation instruction from an authenticated Publisher, an Indexer SHOULD cease further acquisition under the revoked scope and SHOULD follow the Publisher's stated retention and deletion policy.¶
A Publisher SHOULD treat revocation as an operational and legal relationship issue. Technical signalling can communicate intent and scope, but enforcement ultimately depends on Indexer compliance.¶
Outbound indexing is designed to be complementary to existing web controls and does not attempt to obsolete them.¶
REP ([RFC9309]) is an opt-out signalling mechanism interpreted by automated clients that initiate inbound connections. It is widely deployed and remains relevant for legacy crawling.¶
Outbound indexing differs in that it:¶
is opt-in by default;¶
operates over authenticated, identity-bound channels; and¶
supports explicit purpose limitation and richer policy statements.¶
A Publisher MAY use REP for the general web while using outbound indexing for high-value relationships with specific trusted indexers. An Indexer MAY choose to prioritise outbound indexing signals when present, as they can be higher quality and fresher than crawl-derived heuristics.¶
The IETF AIPREF working group is developing vocabulary and attachment mechanisms for expressing usage preferences (e.g., [draft-ietf-aipref-vocab] and [draft-ietf-aipref-attach]).¶
Outbound indexing does not compete with these efforts. Instead, it provides an authenticated delivery channel for the same or compatible preference statements, including in environments where HTTP acquisition is not the primary mechanism or where inbound HTTP access is intentionally restricted.¶
Outbound indexing reduces exposure to unsolicited inbound traffic by eliminating the need for publicly reachable discovery endpoints. However, it introduces new considerations around trust, grant handling, and policy enforcement.¶
Implementations SHOULD consider:¶
Indexer impersonation: Publishers must authenticate Indexer identities; otherwise, an attacker could harvest content by posing as an Indexer.¶
Grant replay: Discovery Grants should be bound to identities and sessions; replay in a different context should be rejected.¶
Scope escalation: Indexers must enforce the declared Content Scope and Purposes; ambiguous scopes should be avoided.¶
Confused deputy: Publishers should avoid issuing broadly scoped grants that a downstream Indexer could use to justify unexpected processing.¶
Compromised Indexers: A trusted Indexer compromise can lead to large-scale misuse; publishers should prefer short-lived grants and revocation readiness.¶
Metadata leakage: Even announcing content existence can reveal sensitive information; publishers should consider minimal announcements and staged disclosure.¶
When used with UZPIF ([UZPIF]) and TLS-DPA ([TLS-DPA]), outbound indexing benefits from identity-bound handshake properties and reduced scanning surface. This document does not define cryptographic primitives; it relies on the referenced transports for channel security.¶
Outbound indexing provides publishers with positive control over who may access content for automated processing. This can reduce privacy harms associated with indiscriminate crawling.¶
Publishers SHOULD consider:¶
minimising announcement metadata to what is necessary for indexing;¶
scoping grants narrowly to avoid unintended disclosure;¶
using short-lived grants and explicit retention policy; and¶
auditing relationships with trusted indexers.¶
Indexers SHOULD consider:¶
This document has no IANA actions.¶