3. URI Syntactic Components

   The URI syntax is dependent upon the scheme.  In general, absolute
   URI are written as follows:



   An absolute URI contains the name of the scheme being used (<scheme>)
   followed by a colon (":") and then a string (the <scheme-specific-
   part>) whose interpretation depends on the scheme.


   The URI syntax does not require that the scheme-specific-part have
   any general structure or set of semantics which is common among all
   URI.  However, a subset of URI do share a common syntax for
   representing hierarchical relationships within the namespace.  This
   "generic URI" syntax consists of a sequence of four main components:



   each of which, except <scheme>, may be absent from a particular URI.
   For example, some URI schemes do not allow an <authority> component,
   and others do not use a <query> component.

      absoluteURI   = scheme ":" ( hier_part | opaque_part )


   URI that are hierarchical in nature use the slash "/" character for
   separating hierarchical components.  For some file systems, a "/"
   character (used to denote the hierarchical structure of a URI) is the
   delimiter used to construct a file name hierarchy, and thus the URI
   path will look similar to a file pathname.  This does NOT imply that
   the resource is a file or that the URI maps to an actual filesystem

      hier_part     = ( net_path | abs_path ) [ "?" query ]

      net_path      = "//" authority [ abs_path ]

      abs_path      = "/"  path_segments


   URI that do not make use of the slash "/" character for separating
   hierarchical components are considered opaque by the generic URI

      opaque_part   = uric_no_slash *uric

      uric_no_slash = unreserved | escaped | ";" | "?" | ":" | "@" |
                      "&" | "=" | "+" | "$" | ","


   We use the term <path> to refer to both the <abs_path> and
   <opaque_part> constructs, since they are mutually exclusive for any
   given URI and can be parsed as a single component.


3.1. Scheme Component

   Just as there are many different methods of access to resources,
   there are a variety of schemes for identifying such resources.  The
   URI syntax consists of a sequence of components separated by reserved
   characters, with the first component defining the semantics for the
   remainder of the URI string.


   Scheme names consist of a sequence of characters beginning with a
   lower case letter and followed by any combination of lower case
   letters, digits, plus ("+"), period ("."), or hyphen ("-").  For
   resiliency, programs interpreting URI should treat upper case letters
   as equivalent to lower case in scheme names (e.g., allow "HTTP" as
   well as "http").

      scheme        = alpha *( alpha | digit | "+" | "-" | "." )


   Relative URI references are distinguished from absolute URI in that
   they do not begin with a scheme name.  Instead, the scheme is
   inherited from the base URI, as described in Section 5.2.

スキーム名から始まっていないURI参照は相対URI参照とみなされ、絶対URIと区別される。代わりに相対URI参照のスキームは基底URIから継承され、詳細は5.2. 相対参照を絶対形式へ解決する方法に記す。

3.2. Authority Component

   Many URI schemes include a top hierarchical element for a naming
   authority, such that the namespace defined by the remainder of the
   URI is governed by that authority.  This authority component is
   typically defined by an Internet-based server or a scheme-specific
   registry of naming authorities.

      authority     = server | reg_name


   The authority component is preceded by a double slash "//" and is
   terminated by the next slash "/", question-mark "?", or by the end of
   the URI.  Within the authority component, the characters ";", ":",
   "@", "?", and "/" are reserved.


   An authority component is not required for a URI scheme to make use
   of relative references.  A base URI without an authority component
   implies that any relative reference will also be without an authority


3.2.1. Registry-based Naming Authority
   The structure of a registry-based naming authority is specific to the
   URI scheme, but constrained to the allowed characters for an
   authority component.

      reg_name      = 1*( unreserved | escaped | "$" | "," |
                          ";" | ":" | "@" | "&" | "=" | "+" )


3.2.2. Server-based Naming Authority
   URL schemes that involve the direct use of an IP-based protocol to a
   specified server on the Internet use a common syntax for the server
   component of the URI's scheme-specific data:


   where <userinfo> may consist of a user name and, optionally, scheme-
   specific information about how to gain authorization to access the
   server.  The parts "<userinfo>@" and ":<port>" may be omitted.

      server        = [ [ userinfo "@" ] hostport ]



   The user information, if present, is followed by a commercial at-sign

      userinfo      = *( unreserved | escaped |
                         ";" | ":" | "&" | "=" | "+" | "$" | "," )


   Some URL schemes use the format "user:password" in the userinfo
   field. This practice is NOT RECOMMENDED, because the passing of
   authentication information in clear text (such as URI) has proven to
   be a security risk in almost every case where it has been used.


   The host is a domain name of a network host, or its IPv4 address as a
   set of four decimal digit groups separated by ".".  Literal IPv6
   addresses are not supported.

      hostport      = host [ ":" port ]
      host          = hostname | IPv4address
      hostname      = *( domainlabel "." ) toplabel [ "." ]
      domainlabel   = alphanum | alphanum *( alphanum | "-" ) alphanum
      toplabel      = alpha | alpha *( alphanum | "-" ) alphanum
      IPv4address   = 1*digit "." 1*digit "." 1*digit "." 1*digit
      port          = *digit


   Hostnames take the form described in Section 3 of [RFC1034] and
   Section 2.1 of [RFC1123]: a sequence of domain labels separated by
   ".", each domain label starting and ending with an alphanumeric
   character and possibly also containing "-" characters.  The rightmost
   domain label of a fully qualified domain name will never start with a
   digit, thus syntactically distinguishing domain names from IPv4
   addresses, and may be followed by a single "." if it is necessary to
   distinguish between the complete domain name and any local domain.
   To actually be "Uniform" as a resource locator, a URL hostname should
   be a fully qualified domain name.  In practice, however, the host
   component may be a local domain literal.

      Note: A suitable representation for including a literal IPv6
      address as the host part of a URL is desired, but has not yet been
      determined or implemented in practice.



   The port is the network port number for the server.  Most schemes
   designate protocols that have a default port number.  Another port
   number may optionally be supplied, in decimal, separated from the
   host by a colon.  If the port is omitted, the default port number is


3.3. Path Component

   The path component contains data, specific to the authority (or the
   scheme if there is no authority component), identifying the resource
   within the scope of that scheme and authority.

      path          = [ abs_path | opaque_part ]

      path_segments = segment *( "/" segment )
      segment       = *pchar *( ";" param )
      param         = *pchar

      pchar         = unreserved | escaped |
                      ":" | "@" | "&" | "=" | "+" | "$" | ","


   The path may consist of a sequence of path segments separated by a
   single slash "/" character.  Within a path segment, the characters
   "/", ";", "=", and "?" are reserved.  Each path segment may include a
   sequence of parameters, indicated by the semicolon ";" character.
   The parameters are not significant to the parsing of relative


3.4. Query Component

   The query component is a string of information to be interpreted by
   the resource.

      query         = *uric


   Within a query component, the characters ";", "/", "?", ":", "@",
   "&", "=", "+", ",", and "$" are reserved.



Copyright (C) 2006 七鍵 key@do.ai 初版:2006年04月14日 最終更新:2006年09月26日