Uniform Resource Identifiers (URI): Generic Syntax (1)

   Uniform Resource Identifiers (URI) provide a simple and extensible
   means for identifying a resource.  This specification of URI syntax
   and semantics is derived from concepts introduced by the World Wide
   Web global information initiative, whose use of such objects dates
   from 1990 and is described in "Universal Resource Identifiers in WWW"
   [RFC1630].  The specification of URI is designed to meet the
   recommendations laid out in "Functional Recommendations for Internet
   Resource Locators" [RFC1736] and "Functional Requirements for Uniform
   Resource Names" [RFC1737].

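URI provide a simple and extensible means of identifying a resource. This specification, which covers URI syntax and semantics, derives from concepts introduced by the World Wide Web global information initiative, in use since 1990 and described in "Universal Resource Identifiers in WWW" [RFC1630]. The URI specification is designed to meet the recommendations of "Functional Recommendations for Internet Resource Locators" [RFC1736] and "Functional Requirements for Uniform Resource Names" [RFC1737].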

   This document updates and merges "Uniform Resource Locators"
   [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order
   to define a single, generic syntax for all URI.  It excludes those
   portions of RFC 1738 that defined the specific syntax of individual
   URL schemes; those portions will be updated as separate documents, as
   will the process for registration of new URI schemes.  This document
   does not discuss the issues and recommendation for dealing with
   characters outside of the US-ASCII character set [ASCII]; those
   recommendations are discussed in a separate document.

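This document updates and merges "Uniform Resource Locators" [RFC1738] and "Relative Uniform Resource Locators" [RFC1808] in order to define a single, generic syntax applicable to all URI. It excludes the portions of RFC 1738 that define the specific syntax of individual URL schemes; those portions will be updated in separate documents, as will the process for registering new URI schemes. Issues and recommendations for handling characters outside the US-ASCII character set [ASCII] are covered in a separate document and are not discussed here.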

All significant changes from the prior RFCs are noted in Appendix G.

1.1. Overview of URI

URI are characterized by the following definitions:

      Uniform
         Uniformity provides several benefits: it allows different types
         of resource identifiers to be used in the same context, even
         when the mechanisms used to access those resources may differ;
         it allows uniform semantic interpretation of common syntactic
         conventions across different types of resource identifiers; it
         allows introduction of new types of resource identifiers
         without interfering with the way that existing identifiers are
         used; and, it allows the identifiers to be reused in many
         different contexts, thus permitting new applications or
         protocols to leverage a pre-existing, large, and widely-used
         set of resource identifiers.

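Uniformity provides several benefits. First, different types of resource identifiers can be used in the same context regardless of the mechanisms used to access the resources. Second, common syntactic conventions can be interpreted with consistent semantics across different types of resource identifiers. Third, new types of resource identifiers can be introduced without interfering with the way existing identifiers are used. Finally, identifiers can be reused in many different contexts, so new applications and protocols can leverage an existing, large, and widely used set of resource identifiers.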

      Resource
         A resource can be anything that has identity.  Familiar
         examples include an electronic document, an image, a service
         (e.g., "today's weather report for Los Angeles"), and a
         collection of other resources.  Not all resources are network
         "retrievable"; e.g., human beings, corporations, and bound
         books in a library can also be considered resources.

         The resource is the conceptual mapping to an entity or set of
         entities, not necessarily the entity which corresponds to that
         mapping at any particular instance in time.  Thus, a resource
         can remain constant even when its content---the entities to
         which it currently corresponds---changes over time, provided
         that the conceptual mapping is not changed in the process.

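A resource can be anything that has identity. Examples include an electronic document, an image, a service (e.g., "today's weather report for Los Angeles"), and a collection of other resources. Not every resource can be retrieved over a network, however: human beings, corporations, and books in a library are resources too.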

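A resource is the conceptual mapping to an entity or set of entities, not necessarily the entity that corresponds to that mapping at a particular point in time. A resource therefore remains constant even when its content (the entities to which the mapping currently corresponds) changes, provided the conceptual mapping itself is not changed in the process.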

      Identifier
         An identifier is an object that can act as a reference to
         something that has identity.  In the case of URI, the object is
         a sequence of characters with a restricted syntax.

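An identifier is an object that can act as a reference to something that has identity. In the case of URI, that object is a sequence of characters conforming to a restricted syntax.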

   Having identified a resource, a system may perform a variety of
   operations on the resource, as might be characterized by such words
   as `access', `update', `replace', or `find attributes'.

Once a resource has been identified, a system can perform a variety of operations on it, such as "access", "update", "replace", or "find attributes".

1.2. URI, URL, and URN

   A URI can be further classified as a locator, a name, or both.  The
   term "Uniform Resource Locator" (URL) refers to the subset of URI
   that identify resources via a representation of their primary access
   mechanism (e.g., their network "location"), rather than identifying
   the resource by name or by some other attribute(s) of that resource.
   The term "Uniform Resource Name" (URN) refers to the subset of URI
   that are required to remain globally unique and persistent even when
   the resource ceases to exist or becomes unavailable.

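A URI can be further classified as a locator, a name, or both. The term URL refers to the subset of URI that identify resources through a representation of their primary access mechanism (for example, their network location) rather than by name or some other attribute of the resource. The term URN refers to the subset of URI that are required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable.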

   The URI scheme (Section 3.1) defines the namespace of the URI, and
   thus may further restrict the syntax and semantics of identifiers
   using that scheme.  This specification defines those elements of the
   URI syntax that are either required of all URI schemes or are common
   to many URI schemes.  It thus defines the syntax and semantics that
   are needed to implement a scheme-independent parsing mechanism for
   URI references, such that the scheme-dependent handling of a URI can
   be postponed until the scheme-dependent semantics are needed.  We use
   the term URL below when describing syntax or semantics that only
   apply to locators.

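The URI scheme defines the namespace of the URI, and may therefore further restrict the syntax and semantics of identifiers that use the scheme. This specification defines the elements of URI syntax that are either required of all URI schemes or common to many of them. It thus defines the syntax and semantics needed to implement a scheme-independent parser for URI references, so that scheme-dependent handling of a URI can be deferred until the scheme-dependent semantics are actually needed. From here on, the term URL is used only when describing syntax or semantics that apply solely to locators.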

   Although many URL schemes are named after protocols, this does not
   imply that the only way to access the URL's resource is via the named
   protocol.  Gateways, proxies, caches, and name resolution services
   might be used to access some resources, independent of the protocol
   of their origin, and the resolution of some URL may require the use
   of more than one protocol (e.g., both DNS and HTTP are typically used
   to access an "http" URL's resource when it can't be found in a local
   cache).

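Although many URL schemes are named after protocols, this does not imply that the named protocol is the only way to access the URL's resource. Gateways, proxies, caches, and name resolution services may be used to access a resource independently of the protocol of its origin, and resolving some URL may require more than one protocol. For example, accessing the resource of an "http" URL typically uses both DNS and HTTP when the resource is not found in a local cache.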

   A URN differs from a URL in that its primary purpose is persistent
   labeling of a resource with an identifier.  That identifier is drawn
   from one of a set of defined namespaces, each of which has its own
   set name structure and assignment procedures.  The "urn" scheme has
   been reserved to establish the requirements for a standardized URN
   namespace, as defined in "URN Syntax" [RFC2141] and its related
   specifications.

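A URN differs from a URL in that its primary purpose is the persistent labeling of a resource with an identifier. That identifier is drawn from one of a set of defined namespaces, each of which has its own name structure and assignment procedures. The "urn" scheme has been reserved to establish the requirements for a standardized URN namespace, as defined in "URN Syntax" [RFC2141] and its related specifications.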

   Most of the examples in this specification demonstrate URL, since
   they allow the most varied use of the syntax and often have a
   hierarchical namespace.  A parser of the URI syntax is capable of
   parsing both URL and URN references as a generic URI; once the scheme
   is determined, the scheme-specific parsing can be performed on the
   generic URI components.  In other words, the URI syntax is a superset
   of the syntax of all URI schemes.

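Most of the examples in this specification show URL, because URL allow the most varied use of the syntax and often have a hierarchical namespace. A parser for the URI syntax can parse both URL and URN references as a generic URI; once the scheme is determined, scheme-specific parsing can be performed on the generic URI components. In other words, the URI syntax is a superset of the syntax of all URI schemes.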
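The two-stage approach described here (generic parsing first, scheme-specific handling afterwards) can be sketched in Python. This is only an illustration: it relies on the standard library's urllib.parse.urlsplit as the generic parser, the describe() helper is a hypothetical name, and the "urn:isbn:..." reference used below is an assumed example that does not come from this specification.

```python
from urllib.parse import urlsplit


def describe(uri_reference):
    """Split a URI generically, then apply scheme-specific handling."""
    # Stage 1: generic, scheme-independent split into components.
    parts = urlsplit(uri_reference)
    scheme = parts.scheme.lower()

    # Stage 2: scheme-dependent interpretation of the generic components.
    if scheme == "http":
        return ("http", parts.hostname, parts.path)
    if scheme == "urn":
        # Assumed URN layout: <namespace identifier>:<namespace-specific string>
        nid, _, nss = parts.path.partition(":")
        return ("urn", nid, nss)
    # Unknown schemes are still parsed; only the generic parts are reported.
    return (scheme, None, parts.path)


print(describe("http://www.math.uio.no/faq/compression-faq/part1.html"))
print(describe("urn:isbn:0451450523"))
print(describe("mailto:mduerst@ifi.unizh.ch"))
```

The generic step never needs to know the scheme in advance, which is why the scheme-specific branches can be added or omitted without changing how the reference is first split.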

1.3. Example URI

   The following examples illustrate URI that are in common use.

   ftp://ftp.is.co.za/rfc/rfc1808.txt
      -- ftp scheme for File Transfer Protocol services

   gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles
      -- gopher scheme for Gopher and Gopher+ Protocol services

   http://www.math.uio.no/faq/compression-faq/part1.html
      -- http scheme for Hypertext Transfer Protocol services

   mailto:mduerst@ifi.unizh.ch
      -- mailto scheme for electronic mail addresses

   news:comp.infosystems.www.servers.unix
      -- news scheme for USENET news groups and articles

   telnet://melvyl.ucop.edu/
      -- telnet scheme for interactive services via the TELNET Protocol

The following are examples of URI in common use.

ftp://ftp.is.co.za/rfc/rfc1808.txt: ftp scheme for FTP services.
gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles: gopher scheme for Gopher and Gopher+ Protocol services.
http://www.math.uio.no/faq/compression-faq/part1.html: http scheme for HTTP services.
mailto:mduerst@ifi.unizh.ch: mailto scheme for electronic mail addresses.
news:comp.infosystems.www.servers.unix: news scheme for USENET news groups and articles.
telnet://melvyl.ucop.edu/: telnet scheme for interactive services via the TELNET Protocol.
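As a quick check of the examples above, the scheme of each one can be extracted with Python's standard urllib.parse module. This is a minimal sketch; the variable names are illustrative, and the percent-decoding of the gopher path is shown only to make the "%20" in that example easier to read.

```python
from urllib.parse import unquote, urlsplit

EXAMPLES = [
    "ftp://ftp.is.co.za/rfc/rfc1808.txt",
    "gopher://spinaltap.micro.umn.edu/00/Weather/California/Los%20Angeles",
    "http://www.math.uio.no/faq/compression-faq/part1.html",
    "mailto:mduerst@ifi.unizh.ch",
    "news:comp.infosystems.www.servers.unix",
    "telnet://melvyl.ucop.edu/",
]

# The scheme is everything before the first ":" in each example.
schemes = [urlsplit(uri).scheme for uri in EXAMPLES]
print(schemes)

# "%20" in the gopher example is an escaped space character.
gopher_path = unquote(urlsplit(EXAMPLES[1]).path)
print(gopher_path)
```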

1.4. Hierarchical URI and Relative Forms

   An absolute identifier refers to a resource independent of the
   context in which the identifier is used.  In contrast, a relative
   identifier refers to a resource by describing the difference within a
   hierarchical namespace between the current context and an absolute
   identifier of the resource.

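An absolute identifier refers to a resource independently of the context in which the identifier appears. A relative identifier, by contrast, refers to a resource by describing the difference, within a hierarchical namespace, between the current context and an absolute identifier of the resource.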

   Some URI schemes support a hierarchical naming system, where the
   hierarchy of the name is denoted by a "/" delimiter separating the
   components in the scheme. This document defines a scheme-independent
   `relative' form of URI reference that can be used in conjunction with
   a `base' URI (of a hierarchical scheme) to produce another URI. The
   syntax of hierarchical URI is described in Section 3; the relative
   URI calculation is described in Section 5.

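Some URI schemes support a hierarchical naming system in which the hierarchy of the name is denoted by a "/" delimiter separating the components. This document defines a scheme-independent "relative" form of URI reference that can be combined with a "base" URI of a hierarchical scheme to produce another URI. The syntax of hierarchical URI is described in Section 3 (URI Syntactic Components); the relative URI calculation is described in Section 5 (Relative URI References).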
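Combining a relative reference with a base URI can be tried out with urllib.parse.urljoin from the Python standard library. This is a sketch only: urljoin follows the library's own resolution rules, which are not guaranteed to match Section 5 of this specification in every edge case, and the relative references below are assumed examples chosen for illustration.

```python
from urllib.parse import urljoin

# Base URI taken from the examples in Section 1.3.
BASE = "http://www.math.uio.no/faq/compression-faq/part1.html"

# A sibling document in the same hierarchy level.
sibling = urljoin(BASE, "part2.html")

# ".." moves one level up in the hierarchical namespace.
parent = urljoin(BASE, "../index.html")

# A leading "/" replaces the whole path but keeps scheme and host.
rooted = urljoin(BASE, "/rfc/rfc1808.txt")

print(sibling)
print(parent)
print(rooted)
```

Each result is an absolute URI, so it can be used without knowing the context in which the relative form appeared.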

1.5. URI Transcribability

   The URI syntax was designed with global transcribability as one of
   its main concerns. A URI is a sequence of characters from a very
   limited set, i.e. the letters of the basic Latin alphabet, digits,
   and a few special characters.  A URI may be represented in a variety
   of ways: e.g., ink on paper, pixels on a screen, or a sequence of
   octets in a coded character set.  The interpretation of a URI depends
   only on the characters used and not how those characters are
   represented in a network protocol.

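The URI syntax was designed with global transcribability as one of its main concerns. A URI is written using only a very limited set of characters: the letters of the basic Latin alphabet, digits, and a few special characters. A URI may be represented in many ways, such as ink on paper, pixels on a screen, or a sequence of octets in a coded character set. The interpretation of a URI depends only on the characters used, not on how those characters are represented in a network protocol.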

   The goal of transcribability can be described by a simple scenario.
   Imagine two colleagues, Sam and Kim, sitting in a pub at an
   international conference and exchanging research ideas.  Sam asks Kim
   for a location to get more information, so Kim writes the URI for the
   research site on a napkin.  Upon returning home, Sam takes out the
   napkin and types the URI into a computer, which then retrieves the
   information to which Kim referred.

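The goal of transcribability can be illustrated with a simple scenario. Picture two colleagues, Sam and Kim, exchanging research ideas in a pub at an international conference. Sam asks Kim where to find more information, so Kim writes the URI of the research site on a napkin. Back home, Sam takes out the napkin and types the URI into a computer, which then retrieves the information Kim referred to.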

   There are several design concerns revealed by the scenario:

      o  A URI is a sequence of characters, which is not always
         represented as a sequence of octets.

      o  A URI may be transcribed from a non-network source, and thus
         should consist of characters that are most likely to be able to
         be typed into a computer, within the constraints imposed by
         keyboards (and related input devices) across languages and
         locales.

      o  A URI often needs to be remembered by people, and it is easier
         for people to remember a URI when it consists of meaningful
         components.

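Several design concerns emerge from this scenario:

o  A URI is a sequence of characters, which is not always represented as a sequence of octets.
o  A URI may be transcribed from a non-network source, so it should consist of characters that are most likely to be typeable on a computer, within the constraints imposed by keyboards and related input devices across languages and locales.
o  A URI often needs to be remembered by people, and a URI made of meaningful components is easier to remember.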

   These design concerns are not always in alignment.  For example, it
   is often the case that the most meaningful name for a URI component
   would require characters that cannot be typed into some systems.  The
   ability to transcribe the resource identifier from one medium to
   another was considered more important than having its URI consist of
   the most meaningful of components.  In local and regional contexts
   and with improving technology, users might benefit from being able to
   use a wider range of characters; such use is not defined in this
   document.

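These design concerns do not always align. For example, the most meaningful name for a URI component often requires characters that cannot be typed on some systems. The ability to transcribe a resource identifier from one medium to another was considered more important than having the URI consist of the most meaningful components. In local and regional contexts, and as technology improves, users might benefit from a wider range of characters; this document does not define such use.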

1.6. Syntax Notation and Common Elements

   This document uses two conventions to describe and define the syntax
   for URI.  The first, called the layout form, is a general description
   of the order of components and component separators, as in

       <first>/<second>;<third>?<fourth>

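This document uses two conventions to describe and define URI syntax. The first, called the layout form, is a general description of the order of components and component separators, as in <first>/<second>;<third>?<fourth>.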

   The component names are enclosed in angle-brackets and any characters
   outside angle-brackets are literal separators.  Whitespace should be
   ignored.  These descriptions are used informally and do not define
   the syntax requirements.

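Component names are enclosed in angle brackets, and any characters outside the angle brackets are literal separators. Whitespace should be ignored. These descriptions are used informally and do not define syntax requirements.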

   The second convention is a BNF-like grammar, used to define the
   formal URI syntax.  The grammar is that of [RFC822], except that "|"
   is used to designate alternatives.  Briefly, rules are separated from
   definitions by an equal "=", indentation is used to continue a rule
   definition over more than one line, literals are quoted with "",
   parentheses "(" and ")" are used to group elements, optional elements
   are enclosed in "[" and "]" brackets, and elements may be preceded
   with <n>* to designate n or more repetitions of the following
   element; n defaults to 0.

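The second convention is a BNF-like grammar, used to define the formal URI syntax. It is the grammar of [RFC822], except that "|" designates alternatives. Briefly: rules are separated from definitions by an equals sign "="; indentation continues a rule definition over more than one line; literals are quoted with ""; parentheses "(" and ")" group elements; optional elements are enclosed in "[" and "]" brackets; and an element may be preceded by <n>* to designate n or more repetitions of that element, where n defaults to 0.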

   Unlike many specifications that use a BNF-like grammar to define the
   bytes (octets) allowed by a protocol, the URI grammar is defined in
   terms of characters.  Each literal in the grammar corresponds to the
   character it represents, rather than to the octet encoding of that
   character in any particular coded character set.  How a URI is
   represented in terms of bits and bytes on the wire is dependent upon
   the character encoding of the protocol used to transport it, or the
   charset of the document which contains it.

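Unlike many specifications that use a BNF-like grammar to define the octets a protocol allows, the URI grammar is defined in terms of characters. Each literal in the grammar corresponds to the character it represents, not to the octet encoding of that character in any particular coded character set. How a URI appears as bits and bytes on the wire depends on the character encoding of the protocol used to transport it, or on the charset of the document that contains it.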
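The distinction between characters and octets can be made concrete with a short Python sketch. The three encodings below (US-ASCII, UTF-16 big-endian, and the EBCDIC code page cp037) are chosen here purely for illustration and are not named by this specification; the point is that one and the same URI yields different octet sequences while remaining the same sequence of characters.

```python
# One URI, taken from the examples in Section 1.3.
uri = "http://www.math.uio.no/faq/compression-faq/part1.html"

# The same characters, represented as octets in three coded character sets.
ascii_octets = uri.encode("ascii")
utf16_octets = uri.encode("utf-16-be")
ebcdic_octets = uri.encode("cp037")

# The octet sequences differ from one encoding to the next...
print(ascii_octets[:7])
print(utf16_octets[:14])
print(ebcdic_octets[:7])

# ...but decoding each one recovers the identical character sequence.
same_characters = (
    ascii_octets.decode("ascii")
    == utf16_octets.decode("utf-16-be")
    == ebcdic_octets.decode("cp037")
    == uri
)
print(same_characters)
```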

   The following definitions are common to many elements:

      alpha    = lowalpha | upalpha

      lowalpha = "a" | "b" | "c" | "d" | "e" | "f" | "g" | "h" | "i" |
                 "j" | "k" | "l" | "m" | "n" | "o" | "p" | "q" | "r" |
                 "s" | "t" | "u" | "v" | "w" | "x" | "y" | "z"

      upalpha  = "A" | "B" | "C" | "D" | "E" | "F" | "G" | "H" | "I" |
                 "J" | "K" | "L" | "M" | "N" | "O" | "P" | "Q" | "R" |
                 "S" | "T" | "U" | "V" | "W" | "X" | "Y" | "Z"

      digit    = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" |
                 "8" | "9"

      alphanum = alpha | digit

   The complete URI syntax is collected in Appendix A.

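The definitions above (alpha, lowalpha, upalpha, digit, and alphanum) are common to many elements.

The complete URI syntax is collected in Appendix A (Collected BNF for URI).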
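The common definitions can be mirrored as character sets in Python. This is a minimal sketch, not part of the specification: the constant names and the is_alphanum() helper are hypothetical. Explicit sets are used instead of str.isalnum() because that method also accepts letters and digits outside the basic Latin alphabet, which the grammar does not.

```python
import string

# Direct counterparts of the grammar rules lowalpha, upalpha, alpha,
# digit, and alphanum.
LOWALPHA = set(string.ascii_lowercase)
UPALPHA = set(string.ascii_uppercase)
ALPHA = LOWALPHA | UPALPHA
DIGIT = set(string.digits)
ALPHANUM = ALPHA | DIGIT


def is_alphanum(text):
    """Return True if text is non-empty and every character is alphanum."""
    return len(text) > 0 and all(ch in ALPHANUM for ch in text)


print(len(ALPHANUM))
print(is_alphanum("rfc1808"))
print(is_alphanum("Los%20Angeles"))
```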


1. Introduction