1. Introduction

   Uniform Resource Identifiers (URI) provide a simple and extensible
   means for identifying a resource.  This specification of URI syntax
   and semantics is derived from concepts introduced by the World Wide
   Web global information initiative, whose use of such objects dates
   from 1990 and is described in "Universal Resource Identifiers in WWW"
   [RFC1630].  The specification of URI is designed to meet the
   recommendations laid out in "Functional Recommendations for Internet
   Resource Locators" [RFC1736] and "Functional Requirements for Uniform
   Resource Names" [RFC1737].

   This document updates and merges "Uniform Resource Locators"
   [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order
   to define a single, generic syntax for all URI.  It excludes those
   portions of RFC 1738 that defined the specific syntax of individual
   URL schemes; those portions will be updated as separate documents, as
   will the process for registration of new URI schemes.  This document
   does not discuss the issues and recommendations for dealing with
   characters outside of the US-ASCII character set [ASCII]; those
   recommendations are discussed in a separate document.

   All significant changes from the prior RFCs are noted in Appendix G.


1.1. Overview of URI

   URI are characterized by the following definitions:


      Uniform
         Uniformity provides several benefits: it allows different types
         of resource identifiers to be used in the same context, even
         when the mechanisms used to access those resources may differ;
         it allows uniform semantic interpretation of common syntactic
         conventions across different types of resource identifiers; it
         allows introduction of new types of resource identifiers
         without interfering with the way that existing identifiers are
         used; and, it allows the identifiers to be reused in many
         different contexts, thus permitting new applications or
         protocols to leverage a pre-existing, large, and widely-used
         set of resource identifiers.


      Resource
         A resource can be anything that has identity.  Familiar
         examples include an electronic document, an image, a service
         (e.g., "today's weather report for Los Angeles"), and a
         collection of other resources.  Not all resources are network
         "retrievable"; e.g., human beings, corporations, and bound
         books in a library can also be considered resources.

         The resource is the conceptual mapping to an entity or set of
         entities, not necessarily the entity which corresponds to that
         mapping at any particular instance in time.  Thus, a resource
         can remain constant even when its content---the entities to
         which it currently corresponds---changes over time, provided
         that the conceptual mapping is not changed in the process.



      Identifier
         An identifier is an object that can act as a reference to
         something that has identity.  In the case of URI, the object is
         a sequence of characters with a restricted syntax.


   Having identified a resource, a system may perform a variety of
   operations on the resource, as might be characterized by such words
   as `access', `update', `replace', or `find attributes'.


1.2. URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term "Uniform Resource Locator" (URL) refers to the subset of URI
   that identify resources via a representation of their primary access
   mechanism (e.g., their network "location"), rather than identifying
   the resource by name or by some other attribute(s) of that resource.
   The term "Uniform Resource Name" (URN) refers to the subset of URI
   that are required to remain globally unique and persistent even when
   the resource ceases to exist or becomes unavailable.


   The URI scheme (Section 3.1) defines the namespace of the URI, and
   thus may further restrict the syntax and semantics of identifiers
   using that scheme.  This specification defines those elements of the
   URI syntax that are either required of all URI schemes or are common
   to many URI schemes.  It thus defines the syntax and semantics that
   are needed to implement a scheme-independent parsing mechanism for
   URI references, such that the scheme-dependent handling of a URI can
   be postponed until the scheme-dependent semantics are needed.  We use
   the term URL below when describing syntax or semantics that only
   apply to locators.


   Although many URL schemes are named after protocols, this does not
   imply that the only way to access the URL's resource is via the named
   protocol.  Gateways, proxies, caches, and name resolution services
   might be used to access some resources, independent of the protocol
   of their origin, and the resolution of some URL may require the use
   of more than one protocol (e.g., both DNS and HTTP are typically used
   to access an "http" URL's resource when it can't be found in a local
   cache).


   A URN differs from a URL in that its primary purpose is persistent
   labeling of a resource with an identifier.  That identifier is drawn
   from one of a set of defined namespaces, each of which has its own
   set name structure and assignment procedures.  The "urn" scheme has
   been reserved to establish the requirements for a standardized URN
   namespace, as defined in "URN Syntax" [RFC2141] and its related
   specifications.

   Most of the examples in this specification demonstrate URL, since
   they allow the most varied use of the syntax and often have a
   hierarchical namespace.  A parser of the URI syntax is capable of
   parsing both URL and URN references as a generic URI; once the scheme
   is determined, the scheme-specific parsing can be performed on the
   generic URI components.  In other words, the URI syntax is a superset
   of the syntax of all URI schemes.
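
   As a rough illustration of the two-stage parsing described above,
   the following Python sketch first splits a URI generically and only
   then hands the components to a scheme-specific routine.  The handler
   names, the example URIs, and the returned dictionaries are
   assumptions made for this sketch.  The split of a URN into a
   namespace identifier and a namespace-specific string follows
   [RFC2141] rather than anything defined here, and
   urllib.parse.urlsplit applies the Python standard library's own
   rules, which may not match this grammar in every corner case.

```python
from urllib.parse import urlsplit


def describe_http(parts):
    # Scheme-specific step for "http": treat the URI as a locator.
    return {"kind": "locator", "host": parts.netloc, "path": parts.path}


def describe_urn(parts):
    # Scheme-specific step for "urn" (assumption taken from RFC 2141):
    # the part after "urn:" has the form "<nid>:<nss>".
    nid, _, nss = parts.path.partition(":")
    return {"kind": "name", "namespace": nid, "specific": nss}


# Hypothetical dispatch table; a real application would register
# whatever schemes it understands.
HANDLERS = {"http": describe_http, "urn": describe_urn}


def describe(uri):
    # Stage 1: generic, scheme-independent split into components.
    parts = urlsplit(uri)
    # Stage 2: scheme-dependent handling, postponed until needed.
    handler = HANDLERS.get(parts.scheme)
    if handler is None:
        return {"kind": "unknown", "scheme": parts.scheme}
    return handler(parts)


if __name__ == "__main__":
    print(describe("http://www.example.com/a/b"))
    print(describe("urn:isbn:0451450523"))
```

   A URI with an unregistered scheme still passes the generic stage;
   only the scheme-specific stage reports it as unknown.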


1.3. Example URI

   The following examples illustrate URI that are in common use.

      -- ftp scheme for File Transfer Protocol services

      -- gopher scheme for Gopher and Gopher+ Protocol services

      -- http scheme for Hypertext Transfer Protocol services

      -- mailto scheme for electronic mail addresses

      -- news scheme for USENET news groups and articles

      -- telnet scheme for interactive services via the TELNET Protocol



1.4. Hierarchical URI and Relative Forms

   An absolute identifier refers to a resource independent of the
   context in which the identifier is used.  In contrast, a relative
   identifier refers to a resource by describing the difference within a
   hierarchical namespace between the current context and an absolute
   identifier of the resource.


   Some URI schemes support a hierarchical naming system, where the
   hierarchy of the name is denoted by a "/" delimiter separating the
   components in the scheme. This document defines a scheme-independent
   `relative' form of URI reference that can be used in conjunction with
   a `base' URI (of a hierarchical scheme) to produce another URI. The
   syntax of hierarchical URI is described in Section 3; the relative
   URI calculation is described in Section 5.
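
   The combination of a relative reference with a base URI can be
   tried out with Python's urllib.parse.urljoin.  This is only a
   sketch: the base URI and the references are made-up examples, and
   urljoin follows the standard library's resolution rules, which may
   differ from the calculation in Section 5 for unusual inputs.

```python
from urllib.parse import urljoin

# Hypothetical base URI of a hierarchical ("http") scheme.
BASE = "http://www.example.com/docs/guide/intro.html"


def resolve(reference, base=BASE):
    # Combine a URI reference with the base URI to produce another URI.
    return urljoin(base, reference)


if __name__ == "__main__":
    for ref in ("chapter2.html", "../images/logo.png", "/index.html",
                "ftp://ftp.example.com/pub/file.txt"):
        print(ref, "->", resolve(ref))
```

   A reference that is already absolute, such as the "ftp" one, is
   returned unchanged because it does not depend on the context.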

1.5. URI Transcribability

   The URI syntax was designed with global transcribability as one of
   its main concerns. A URI is a sequence of characters from a very
   limited set, i.e., the letters of the basic Latin alphabet, digits,
   and a few special characters.  A URI may be represented in a variety
   of ways: e.g., ink on paper, pixels on a screen, or a sequence of
   octets in a coded character set.  The interpretation of a URI depends
   only on the characters used and not how those characters are
   represented in a network protocol.


   The goal of transcribability can be described by a simple scenario.
   Imagine two colleagues, Sam and Kim, sitting in a pub at an
   international conference and exchanging research ideas.  Sam asks Kim
   for a location to get more information, so Kim writes the URI for the
   research site on a napkin.  Upon returning home, Sam takes out the
   napkin and types the URI into a computer, which then retrieves the
   information to which Kim referred.


   There are several design concerns revealed by the scenario:

      o  A URI is a sequence of characters, which is not always
         represented as a sequence of octets.

      o  A URI may be transcribed from a non-network source, and thus
         should consist of characters that are most likely to be able to
         be typed into a computer, within the constraints imposed by
         keyboards (and related input devices) across languages and
         locales.

      o  A URI often needs to be remembered by people, and it is easier
         for people to remember a URI when it consists of meaningful
         components.


   These design concerns are not always in alignment.  For example, it
   is often the case that the most meaningful name for a URI component
   would require characters that cannot be typed into some systems.  The
   ability to transcribe the resource identifier from one medium to
   another was considered more important than having its URI consist of
   the most meaningful of components.  In local and regional contexts
   and with improving technology, users might benefit from being able to
   use a wider range of characters; such use is not defined in this
   document.


1.6. Syntax Notation and Common Elements

   This document uses two conventions to describe and define the syntax
   for URI.  The first, called the layout form, is a general description
   of the order of components and component separators, as in



   The component names are enclosed in angle-brackets and any characters
   outside angle-brackets are literal separators.  Whitespace should be
   ignored.  These descriptions are used informally and do not define
   the syntax requirements.


   The second convention is a BNF-like grammar, used to define the
   formal URI syntax.  The grammar is that of [RFC822], except that "|"
   is used to designate alternatives.  Briefly, rules are separated from
   definitions by an equal sign "=", indentation is used to continue a rule
   definition over more than one line, literals are quoted with "",
   parentheses "(" and ")" are used to group elements, optional elements
   are enclosed in "[" and "]" brackets, and elements may be preceded
   with <n>* to designate n or more repetitions of the following
   element; n defaults to 0.

   Unlike many specifications that use a BNF-like grammar to define the
   bytes (octets) allowed by a protocol, the URI grammar is defined in
   terms of characters.  Each literal in the grammar corresponds to the
   character it represents, rather than to the octet encoding of that
   character in any particular coded character set.  How a URI is
   represented in terms of bits and bytes on the wire is dependent upon
   the character encoding of the protocol used to transport it, or the
   charset of the document which contains it.
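
   The distinction between characters and octets can be seen by
   encoding one URI under several coded character sets.  In this
   Python sketch the example URI and the choice of charsets (US-ASCII,
   the EBCDIC code page "cp500", and big-endian UTF-16) are assumptions
   picked for illustration.

```python
# Hypothetical URI used only for this demonstration.
URI = "http://www.example.com/index.html"


def octets(uri, charset):
    # Octet representation of the URI's characters in a given charset.
    return uri.encode(charset)


ascii_octets = octets(URI, "ascii")
ebcdic_octets = octets(URI, "cp500")
utf16_octets = octets(URI, "utf-16-be")

if __name__ == "__main__":
    print(ascii_octets.hex())
    print(ebcdic_octets.hex())
    print(utf16_octets.hex())
```

   The three octet sequences differ, yet each decodes back to the
   same sequence of characters, and it is that character sequence the
   grammar describes.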


   The following definitions are common to many elements:

      alpha    = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

      alphanum = alpha | digit

   The complete URI syntax is collected in Appendix A.
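
   The common definitions above, together with the "<n>*" repetition
   notation, might be mirrored in Python roughly as follows.  The
   constant names and the helper is_repetition are inventions of this
   sketch, not part of the grammar.

```python
import string

# Character classes written as explicit sets, one per grammar rule.
LOWALPHA = frozenset(string.ascii_lowercase)
UPALPHA = frozenset(string.ascii_uppercase)
ALPHA = LOWALPHA | UPALPHA          # alpha    = lowalpha | upalpha
DIGIT = frozenset(string.digits)
ALPHANUM = ALPHA | DIGIT            # alphanum = alpha | digit


def is_repetition(text, element, n=0):
    # True if text is n or more characters, each drawn from element.
    # This corresponds to the "<n>*element" form; n defaults to 0.
    return len(text) >= n and all(ch in element for ch in text)


if __name__ == "__main__":
    print(is_repetition("rfc2396", ALPHANUM, 1))
    print(is_repetition("", ALPHANUM, 1))
```

   Explicit sets are used instead of str.isalpha() because that method
   also accepts letters outside the basic Latin alphabet, which these
   rules exclude.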



Copyright (C) 2006 七鍵 key@do.ai  First edition: April 14, 2006  Last updated: September 25, 2006