5. Relative URI References

   It is often the case that a group or "tree" of documents has been
   constructed to serve a common purpose; the vast majority of URI in
   these documents point to resources within the tree rather than
   outside of it.  Similarly, documents located at a particular site are
   much more likely to refer to other resources at that site than to
   resources at remote sites.


   Relative addressing of URI allows document trees to be partially
   independent of their location and access scheme.  For instance, it is
   possible for a single set of hypertext documents to be simultaneously
   accessible and traversable via each of the "file", "http", and "ftp"
   schemes if the documents refer to each other using relative URI.
   Furthermore, such document trees can be moved, as a whole, without
   changing any of the relative references.  Experience within the WWW
   has demonstrated that the ability to perform relative referencing is
   necessary for the long-term usability of embedded URI.


   The syntax for relative URI takes advantage of the <hier_part> syntax
   of <absoluteURI> (Section 3) in order to express a reference that is
   relative to the namespace of another hierarchical URI.

      relativeURI   = ( net_path | abs_path | rel_path ) [ "?" query ]

相対URI構文は、別のURI名前空間階層に関連する参照を示すために、3. URI構文の構成要素で触れた<hier_part>構文を利用する。

   A relative reference beginning with two slash characters is termed a
   network-path reference, as defined by <net_path> in Section 3.  Such
   references are rarely used.

2つのスラッシュから始まる相対参照は3. URI構文の構成要素で<net_path>と定義され、ネットワークパス参照と名付けられている。ただし、このような参照はほとんど用いられない。

   A relative reference beginning with a single slash character is
   termed an absolute-path reference, as defined by <abs_path> in
   Section 3.

1つのスラッシュから始まる相対参照は3. URI構文の構成要素で<abs_path>と定義され、絶対パス参照と名付けられている。

   A relative reference that does not begin with a scheme name or a
   slash character is termed a relative-path reference.

      rel_path      = rel_segment [ abs_path ]

      rel_segment   = 1*( unreserved | escaped |
                          ";" | "@" | "&" | "=" | "+" | "$" | "," )


   Within a relative-path reference, the complete path segments "." and
   ".." have special meanings: "the current hierarchy level" and "the
   level above this hierarchy level", respectively.  Although this is
   very similar to their use within Unix-based filesystems to indicate
   directory levels, these path components are only considered special
   when resolving a relative-path reference to its absolute form
   (Section 5.2).


   Authors should be aware that a path segment which contains a colon
   character cannot be used as the first segment of a relative URI path
   (e.g., "this:that"), because it would be mistaken for a scheme name.


   It is therefore necessary to precede such segments with other
   segments (e.g., "./this:that") in order for them to be referenced as
   a relative path.


   It is not necessary for all URI within a given scheme to be
   restricted to the <hier_part> syntax, since the hierarchical
   properties of that syntax are only necessary when relative URI are
   used within a particular document.  Documents can only make use of
   relative URI when their base URI fits within the <hier_part> syntax.
   It is assumed that any document which contains a relative reference
   will also have a base URI that obeys the syntax.  In other words,
   relative URI cannot be used within a document that has an unsuitable
   base URI.


   Some URI schemes do not allow a hierarchical syntax matching the
   <hier_part> syntax, and thus cannot use relative references.


5.1. Establishing a Base URI

   The term "relative URI" implies that there exists some absolute "base
   URI" against which the relative reference is applied.  Indeed, the
   base URI is necessary to define the semantics of any relative URI
   reference; without it, a relative reference is meaningless.  In order
   for relative URI to be usable within a document, the base URI of that
   document must be known to the parser.


   The base URI of a document can be established in one of four ways,
   listed below in order of precedence.  The order of precedence can be
   thought of in terms of layers, where the innermost defined base URI
   has the highest precedence.  This can be visualized graphically as:

      |  .----------------------------------------------------.  |
      |  |  .----------------------------------------------.  |  |
      |  |  |  .----------------------------------------.  |  |  |
      |  |  |  |  .----------------------------------.  |  |  |  |
      |  |  |  |  |       <relative_reference>       |  |  |  |  |
      |  |  |  |  `----------------------------------'  |  |  |  |
      |  |  |  | (5.1.1) Base URI embedded in the       |  |  |  |
      |  |  |  |         document's content             |  |  |  |
      |  |  |  `----------------------------------------'  |  |  |
      |  |  | (5.1.2) Base URI of the encapsulating entity |  |  |
      |  |  |         (message, document, or none).        |  |  |
      |  |  `----------------------------------------------'  |  |
      |  | (5.1.3) URI used to retrieve the entity            |  |
      |  `----------------------------------------------------'  |
      | (5.1.4) Default Base URI is application-dependent        |


5.1.1. Base URI within Document Content
   Within certain document media types, the base URI of the document can
   be embedded within the content itself such that it can be readily
   obtained by a parser.  This can be useful for descriptive documents,
   such as tables of content, which may be transmitted to others through
   protocols other than their usual retrieval context (e.g., E-Mail or
   USENET news).


   It is beyond the scope of this document to specify how, for each
   media type, the base URI can be embedded.  It is assumed that user
   agents manipulating such media types will be able to obtain the
   appropriate syntax from that media type's specification.  An example
   of how the base URI can be embedded in the Hypertext Markup Language
   (HTML) [RFC1866] is provided in Appendix D.

各メディアタイプに基底URIをどう埋め込むかについては当文書の範囲を超えるものである。そのようなメディアタイプを処理するユーザエージェントは、そのメディアタイプの仕様書から適切な構文を入手することが望ましい。HTML[RFC1866]への基底URI埋め込み方法の例はD. HTML文書中の基底URIの埋め込みにて触れている。

   A mechanism for embedding the base URI within MIME container types
   (e.g., the message and multipart types) is defined by MHTML
   [RFC2110].  Protocols that do not use the MIME message header syntax,
   but which do allow some form of tagged metainformation to be included
   within messages, may define their own syntax for defining the base
   URI as part of a message.


5.1.2. Base URI from the Encapsulating Entity
   If no base URI is embedded, the base URI of a document is defined by
   the document's retrieval context.  For a document that is enclosed
   within another entity (such as a message or another document), the
   retrieval context is that entity; thus, the default base URI of the
   document is the base URI of the entity in which the document is


5.1.3. Base URI from the Retrieval URI
   If no base URI is embedded and the document is not encapsulated
   within some other entity (e.g., the top level of a composite entity),
   then, if a URI was used to retrieve the base document, that URI shall
   be considered the base URI.  Note that if the retrieval was the
   result of a redirected request, the last URI used (i.e., that which
   resulted in the actual retrieval of the document) is the base URI.


5.1.4. Default Base URI
   If none of the conditions described in Sections 5.1.1--5.1.3 apply,
   then the base URI is defined by the context of the application.
   Since this definition is necessarily application-dependent, failing
   to define the base URI using one of the other methods may result in
   the same content being interpreted differently by different types of


   It is the responsibility of the distributor(s) of a document
   containing relative URI to ensure that the base URI for that document
   can be established.  It must be emphasized that relative URI cannot
   be used reliably in situations where the document's base URI is not


5.2. Resolving Relative References to Absolute Form

   This section describes an example algorithm for resolving URI
   references that might be relative to a given base URI.


   The base URI is established according to the rules of Section 5.1 and
   parsed into the four main components as described in Section 3.  Note
   that only the scheme component is required to be present in the base
   URI; the other components may be empty or undefined.  A component is
   undefined if its preceding separator does not appear in the URI
   reference; the path component is never undefined, though it may be
   empty.  The base URI's query component is not used by the resolution
   algorithm and may be discarded.

基底URIは、5.1. 基底URIの確立の規則に従い確定され、3. URI構文の構成要素で触れた4つの主なコンポーネントとして解釈される。特筆すべきは、基底URIの中で必須とされているのがスキームコンポーネントだけという点であり、即ち他のコンポーネントは空でもよいし未定義でもよい。コンポーネントが未定義であるとは、URI参照の中にそれに先行するセパレータが出現しないことであり、パスコンポーネントは例え空であっても未定義とはならない。基底URIのクエリコンポーネントは解決アルゴリズムでは用いないので破棄してもよい。

   For each URI reference, the following steps are performed in order:


   1) The URI reference is parsed into the potential four components and
      fragment identifier, as described in Section 4.3.

1)4.3. URI参照の構文解析の説明に従い、URI参照は4つのコンポーネントとフラグメント識別子として解釈される。

   2) If the path component is empty and the scheme, authority, and
      query components are undefined, then it is a reference to the
      current document and we are done.  Otherwise, the reference URI's
      query and fragment components are defined as found (or not found)
      within the URI reference and not inherited from the base URI.


   3) If the scheme component is defined, indicating that the reference
      starts with a scheme name, then the reference is interpreted as an
      absolute URI and we are done.  Otherwise, the reference URI's
      scheme is inherited from the base URI's scheme component.

      Due to a loophole in prior specifications [RFC1630], some parsers
      allow the scheme name to be present in a relative URI if it is the
      same as the base URI scheme.  Unfortunately, this can conflict
      with the correct parsing of non-hierarchical URI.  For backwards
      compatibility, an implementation may work around such references
      by removing the scheme if it matches that of the base URI and the
      scheme is known to always use the <hier_part> syntax.  The parser
      can then continue with the steps below for the remainder of the
      reference components.  Validating parsers should mark such a
      misformed relative reference as an error.



   4) If the authority component is defined, then the reference is a
      network-path and we skip to step 7.  Otherwise, the reference
      URI's authority is inherited from the base URI's authority
      component, which will also be undefined if the URI scheme does not
      use an authority component.


   5) If the path component begins with a slash character ("/"), then
      the reference is an absolute-path and we skip to step 7.


   6) If this step is reached, then we are resolving a relative-path
      reference.  The relative path needs to be merged with the base
      URI's path.  Although there are many ways to do this, we will
      describe a simple method using a separate string buffer.


      a) All but the last segment of the base URI's path component is
         copied to the buffer.  In other words, any characters after the
         last (right-most) slash character, if any, are excluded.


      b) The reference's path component is appended to the buffer


      c) All occurrences of "./", where "." is a complete path segment,
         are removed from the buffer string.


      d) If the buffer string ends with "." as a complete path segment,
         that "." is removed.


      e) All occurrences of "<segment>/../", where <segment> is a
         complete path segment not equal to "..", are removed from the
         buffer string.  Removal of these path segments is performed
         iteratively, removing the leftmost matching pattern on each
         iteration, until no matching pattern remains.


      f) If the buffer string ends with "<segment>/..", where <segment>
         is a complete path segment not equal to "..", that
         "<segment>/.." is removed.


      g) If the resulting buffer string still begins with one or more
         complete path segments of "..", then the reference is
         considered to be in error.  Implementations may handle this
         error by retaining these components in the resolved path (i.e.,
         treating them as part of the final URI), by removing them from
         the resolved path (i.e., discarding relative levels above the
         root), or by avoiding traversal of the reference.


      h) The remaining buffer string is the reference URI's new path


   7) The resulting URI components, including any inherited from the
      base URI, are recombined to give the absolute form of the URI
      reference.  Using pseudocode, this would be

         result = ""

         if scheme is defined then
             append scheme to result
             append ":" to result

         if authority is defined then
             append "//" to result
             append authority to result

         append path to result

         if query is defined then
             append "?" to result
             append query to result

         if fragment is defined then
             append "#" to result
             append fragment to result

         return result


    result = ""






    return result
      Note that we must be careful to preserve the distinction between a
      component that is undefined, meaning that its separator was not
      present in the reference, and a component that is empty, meaning
      that the separator was present and was immediately followed by the
      next component separator or the end of the reference.


   The above algorithm is intended to provide an example by which the
   output of implementations can be tested -- implementation of the
   algorithm itself is not required.  For example, some systems may find
   it more efficient to implement step 6 as a pair of segment stacks
   being merged, rather than as a series of string pattern replacements.


      Note: Some WWW client applications will fail to separate the
      reference's query component from its path component before merging
      the base and reference paths in step 6 above.  This may result in
      a loss of information if the query component contains the strings
      "/../" or "/./".


   Resolution examples are provided in Appendix C.

解決例は「C. 相対URI参照を解決する為の例」にて触れる。


Copyright (C) 2006 七鍵 key@do.ai 初版:2006年04月14日 最終更新:2006年09月27日