HTTP: The Hypertext Transfer Protocol
By Mick Knutson
The Essential HTTP Cheat Sheet
The Hypertext Transfer Protocol (HTTP) is an application protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication for the World Wide Web.
Hypertext is a multi-linear set of objects, building a network by using logical links (the so-called hyperlinks) between the nodes (e.g. text or words). HTTP is the protocol to exchange or transfer hypertext.
RFC 2616 Hypertext Transfer Protocol: http://www.w3.org/Protocols/rfc2616/rfc2616.html
Uniform resource identifier (URI) is a string of characters used to identify a name or a resource.
RFC 1630: Universal Resource Identifiers (URI): http://tools.ietf.org/html/rfc1630
Uniform resource name (URN) is a uniform resource identifier (URI) that uses the urn scheme and does not imply availability of the identified resource. Both URNs (names) and URLs (locators) are URIs, and a particular URI may be a name and a locator at the same time.
The URN for 'Java EE6 Cookbook for Securing, Tuning and Extending Enterprise applications.'
RFC 1737: Uniform Resource Names (URN): http://tools.ietf.org/html/rfc1737
Uniform resource locator (URL) is a specific character string that constitutes a reference to an Internet resource.
RFC 1808: Relative Uniform Resource Locators (URL): http://tools.ietf.org/html/rfc1808
|http:||Hypertext transfer protocol|
|https:||Secured Hypertext transfer protocol|
|ftp:||File transfer protocol|
QUERY STRING ENCODING USES THE FOLLOWING RULES:
- Letters (A-Z and a-z), numbers (0-9) and the characters '.','-','~' and '_' are left as-is
- SPACE is encoded as '+' or hexadecimal (%20)
- All other characters are encoded as a %HH hexadecimal representation, with any non-ASCII characters first encoded as UTF-8 (or another specified encoding)
- The typical maximum value for a single GET parameter value is 512 bytes
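The encoding rules above can be tried out with Python's standard urllib.parse module. This is a small sketch; the sample strings and parameter names are made up for illustration.

```python
from urllib.parse import quote, quote_plus, urlencode

# quote_plus() applies the form-style rule: SPACE becomes '+'.
print(quote_plus("mick knutson"))   # mick+knutson

# quote() percent-encodes SPACE as %20 instead.
print(quote("mick knutson"))        # mick%20knutson

# Letters, digits and '.', '-', '~', '_' are left as-is.
print(quote_plus("A-z_0.9~"))       # A-z_0.9~

# Non-ASCII characters are encoded as UTF-8 first, then percent-encoded.
print(quote_plus("café"))           # caf%C3%A9

# urlencode() builds a whole query string from name/value pairs.
print(urlencode({"q": "http & urls", "page": 2}))  # q=http+%26+urls&page=2
```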
HTTP defines methods (sometimes referred to as "verbs") to indicate the desired action to be performed on the identified resource. What this resource represents, whether pre-existing data or data that is generated dynamically, depends on the implementation of the server. Often, the resource corresponds to a file or the output of an executable residing on the server.
The HTTP/1.0 specification (section 8) defined the GET, POST, and HEAD methods, and the HTTP/1.1 specification (section 9) added 5 new methods: OPTIONS, PUT, DELETE, TRACE, and CONNECT. Because they are specified in these documents, their semantics are known and can be depended upon. Any client can use any method it wants, and the server can choose to support any method it wants. If a method is unknown to an intermediary, it will be treated as an unsafe and non-idempotent method. There is no limit to the number of methods that can be defined, which allows future methods to be specified without breaking existing infrastructure. For example, WebDAV (RFC 4918) defined 7 new methods and RFC 5789 specified the PATCH method.
|CONNECT||This specification reserves the method name CONNECT for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunneling).|
|DELETE||The DELETE method requests that the origin server delete the resource identified by the Request-URI.|
|GET||The GET method means retrieve whatever information (in the form of an entity) is identified by the Request-URI.|
|HEAD||The HEAD method is identical to GET except that the server MUST NOT return a message-body in the response.|
|OPTIONS||The OPTIONS method represents a request for information about the communication options available on the request/response chain identified by the Request-URI.|
|POST||The POST method is used to request that the origin server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line.|
|PUT||The PUT method requests that the enclosed entity be stored under the supplied Request-URI.|
|TRACE||The TRACE method is used to invoke a remote, application-layer loop-back of the request message.|
RFC 2616-sec9: HTTP Method definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html
RFC 2616 (Hypertext Transfer Protocol HTTP/1.1) section 3.2.1: The HTTP protocol does not place any a priori limit on the length of a URI. Servers must be able to handle the URI of any resource they serve, and should be able to handle URIs of unbounded length if they provide GET-based forms that could generate such URIs. A server should return 414 (Request-URI Too Long) status if a URI is longer than the server can handle (see section 10.4.15).
URI Length Limits
|Firefox||Unlimited, although instability occurs with URLs reaching around 65,000 characters.|
|Internet Explorer v6 - v7||Maximum length of a URL in Internet Explorer is 2,083 characters, with no more than 2,048 characters in the path portion of the URL.|
|Internet Explorer v8+||Maximum length of a URL in Internet Explorer is 4,095 characters, with no more than 2,048 characters in the path portion of the URL. Maximum mailto: length is 500 to 512 characters long.|
|Sitemap Protocol||<loc> URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters.|
|GoogleBot crawler||Google will index URLs up to 2047 characters in length.|
|Google search results page (SERP)||Indexable links that still work when clicked in the SERP can be up to about 1,855 characters in length.|
RFC 3986: Uniform Resource Identifier (URI) section 3.2.2: URI producers should use names that conform to the DNS syntax, even when use of DNS is not immediately apparent, and should limit these names to no more than 255 characters in length.
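The request message consists of the following:
- A request line, for example GET /secured/index.html HTTP/1.1
- Headers, for example Host: localhost
- An empty line
- An optional message body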
The request line and headers must all end with <CR><LF> (that is, a carriage return character followed by a line feed character (\r\n)). The empty line must consist of only <CR><LF> and no other whitespace. In the HTTP/1.1 protocol, all headers except Host are optional.
A request line containing only the path name is accepted by servers to maintain compatibility with HTTP clients before the HTTP/1.0 specification in RFC1945.
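The request layout described above can be sketched in a few lines of Python. The helper name build_request is an illustrative assumption, not part of any library; the host and path are borrowed from examples elsewhere in this card.

```python
def build_request(method, path, headers, body=b""):
    """Serialize an HTTP/1.1 request: request line, headers, empty line, body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # Every line ends with CRLF; a bare CRLF separates the headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


# Host is the only header HTTP/1.1 requires.
raw = build_request("GET", "/images/BLiNC_logo.png", {"Host": "baselogic.com"})
print(raw)
# b'GET /images/BLiNC_logo.png HTTP/1.1\r\nHost: baselogic.com\r\n\r\n'
```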
RFC 2616-sec5: HTTP Request: http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html
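The response message consists of the following:
- A Status-Line, for example HTTP/1.1 200 OK
- Headers, for example Content-Type: text/html
- An empty line
- An optional message body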
The Status-Line and headers must all end with <CR><LF> (a carriage return followed by a line feed). The empty line must consist of only <CR><LF> and no other whitespace.
RFC 2616-sec6: HTTP Response: http://www.w3.org/Protocols/rfc2616/rfc2616-sec6.html
Status Code and Reason Phrase
The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason- Phrase.
The first digit of the Status-Code defines the class of response. The last two digits do not have any categorization role. There are 5 values for the first digit:
|Informational 1xx||This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and is terminated by an empty line.|
|Successful 2xx||This class of status code indicates that the client's request was successfully received, understood, and accepted.|
|Redirection 3xx||This class of status code indicates that further action needs to be taken by the user agent in order to fulfill the request.|
|Client Error 4xx||The 4xx class of status code is intended for cases in which the client seems to have erred.|
|Server Error 5xx||Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has erred or is incapable of performing the request.|
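Because only the first digit carries the class, a client can classify any status code with integer division. The following is a minimal sketch; the names STATUS_CLASSES and status_class are illustrative.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class of a 3-digit status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


print(status_class(304))        # Redirection
print(status_class(404))        # Client Error
# The standard library also knows the reason phrases of registered codes.
print(HTTPStatus(404).phrase)   # Not Found
```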
Common Status Codes
|200||RFC-2616 Section 10.2.1: OK|
|301||RFC-2616 Section 10.3.2: Moved Permanently|
|304||RFC-2616 Section 10.3.5: Not Modified|
|307||RFC-2616 Section 10.3.8: Temporary Redirect|
|400||RFC-2616 Section 10.4.1: Bad Request|
|401||RFC-2616 Section 10.4.2: Unauthorized|
|403||RFC-2616 Section 10.4.4: Forbidden|
|404||RFC-2616 Section 10.4.5: Not Found|
|405||RFC-2616 Section 10.4.6: Method Not Allowed|
|408||RFC-2616 Section 10.4.9: Request Time-out|
|414||RFC-2616 Section 10.4.15: Request-URI Too Large|
|500||RFC-2616 Section 10.5.1: Internal Server Error|
RFC 2616-sec10 HTTP Status Code Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
HTTP header fields are components of the message header of requests and responses in the Hypertext Transfer Protocol (HTTP). They define the operating parameters of an HTTP transaction.
The header fields are transmitted after the request or response line, the first line of a message. Header fields are colon-separated name-value pairs in clear-text string format, terminated by a carriage return (CR) and line feed (LF) character sequence. The end of the header fields is indicated by an empty field, resulting in the transmission of two consecutive CR-LF pairs. Long lines can be folded into multiple lines; continuation lines are indicated by the presence of a space (SP) or horizontal tab (HT) as the first character on the next line. A few fields can also contain comments (e.g. the User-Agent, Server, and Via fields), which can be ignored by software.
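The header format just described, including folded continuation lines, can be parsed with a short loop. This is a simplified sketch rather than a full implementation; parse_headers is an illustrative name and the sample header block is made up.

```python
def parse_headers(raw):
    """Parse a CRLF-delimited header block into a list of (name, value) pairs."""
    headers = []
    for line in raw.split("\r\n"):
        if line == "":
            break  # an empty line marks the end of the header fields
        if line[0] in " \t" and headers:
            # Continuation line: fold it into the previous field's value.
            name, value = headers[-1]
            headers[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            headers.append((name.strip(), value.strip()))
    return headers


raw = (
    "Host: baselogic.com\r\n"
    "User-Agent: Mozilla/5.0\r\n"
    " (X11; Linux x86_64)\r\n"
    "\r\n"
)
print(parse_headers(raw))
# [('Host', 'baselogic.com'), ('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64)')]
```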
|Accept||Content-Types that are acceptable.||Accept: text/plain|
|Accept-Charset||Character sets that are acceptable.||Accept-Charset: utf-8|
|Accept-Encoding||Acceptable encodings. See HTTP compression.|
|Accept-Language||Acceptable languages for response.||Accept-Language: en-US|
|Accept-Datetime||Acceptable version in time.||Accept-Datetime: Tue, 19 Jun 2012 10:10:10 GMT|
|Authorization||Authentication credentials for HTTP authentication.|
|Cache-Control||Used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain.|
|Connection||What type of connection the user agent would prefer.|
|Cookie||An HTTP cookie previously sent by the server with the Set-Cookie header.|
|Content-Length||The length of the request body in octets (8-bit bytes).|
|Content-MD5||A Base64-encoded binary MD5 sum of the content of the request body.||Content-MD5: bWljazpzZWNyZXQga2V5|
|Content-Type||The MIME type of the body of the request (used with POST and PUT requests).||Content-Type: application/x-www-form-urlencoded|
|Date||The date and time that the message was sent.||Date: Tue, 19 Jun 2012 10:10:10 GMT|
|Expect||Indicates that particular server behaviors are required by the client.||Expect: 100-continue|
|From||The email address of the user making the request.||From: firstname.lastname@example.org|
|Host||The domain name of the server (for virtual hosting), and the TCP port number on which the server is listening. The port number may be omitted if the port is the standard port for the service requested. Mandatory since HTTP/1.1.||Host: baselogic.com:80 Host: baselogic.com|
|If-Match||Only perform the action if the client supplied entity matches the same entity on the server. This is mainly for methods like PUT to only update a resource if it has not been modified since the user last updated it.||If-Match: "bWljazpzZWNyZXQga2V5"|
|If-Modified-Since||Allows a 304 Not Modified to be returned if content is unchanged.||If-Modified-Since: Tue, 19 Jun 2012 10:10:10 GMT|
|If-None-Match||Allows a 304 Not Modified to be returned if content is unchanged.||If-None-Match: "bWljazpzZWNyZXQga2V5"|
|If-Range||If the entity is unchanged, send me the part(s) that I am missing; otherwise, send me the entire new entity.||If-Range: "bWljazpzZWNyZXQga2V5"|
|If-Unmodified-Since||Only send the response if the entity has not been modified since a specific time.||If-Unmodified-Since: Tue, 19 Jun 2012 10:10:10 GMT|
|Max-Forwards||Limit the number of times the message can be forwarded through proxies or gateways.||Max-Forwards: 10|
|Pragma||Implementation-specific headers that may have various effects anywhere along the request-response chain.||Pragma: no-cache|
|Proxy-Authorization||Authorization credentials for connecting to a proxy.||Proxy-Authorization: Basic bWljazpzZWNyZXQga2V5|
|Range||Request only part of an entity. Bytes are numbered from 0.||Range: bytes=500-999|
|Referer||This is the address of the previous web page from which a link to the currently requested page was followed. (The word "referrer" is misspelled in the RFC as well as in most implementations.)||Referer: http://baselogic.com/|
|TE||The transfer encodings the user agent is willing to accept: the same values as for the response header Transfer-Encoding can be used, plus the "trailers" value (related to the "chunked" transfer method) to notify the server it expects to receive additional headers (the trailers) after the last, zero-sized, chunk.||TE: trailers, deflate|
|Upgrade||Ask the server to upgrade to another protocol.||Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11|
|User-Agent||The user agent string of the user agent||User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20100101 Firefox/12.0|
|Via||Informs the server of proxies through which the request was sent.||Via: 1.0 mick, 1.1 baselogic.com (Apache/2.1)|
|Warning||A general warning about possible problems with the entity body.||Warning: 199 Miscellaneous warning|
|Access-Control-Allow-Origin||Specifies which web sites can participate in cross-origin resource sharing.||Access-Control-Allow-Origin: *|
|Accept-Ranges||What partial content range types this server supports||Accept-Ranges: bytes|
|Age||The age the object has been in a proxy cache in seconds||Age: 12|
|Allow||Valid actions for a specified resource. To be used for a 405 Method not allowed||Allow: GET, HEAD|
|Cache-Control||Tells all caching mechanisms from server to client whether they may cache this object. The max-age value is measured in seconds.||Cache-Control: max-age=3600|
|Connection||Options that are desired for the connection||Connection: close|
|Content-Encoding||The type of encoding used on the data. See HTTP compression.||Content-Encoding: gzip|
|Content-Language||The language the content is in.||Content-Language: fr|
|Content-Length||The length of the response body in octets (8-bit bytes)||Content-Length: 348|
|Content-Location||An alternate location for the returned data||Content-Location: /index.htm|
|Content-MD5||A Base64-encoded binary MD5 sum of the content of the response||Content-MD5: bWljazpzZWNyZXQga2V5|
|Content-Disposition||An opportunity to raise a "File Download" dialogue box for a known MIME type with binary format or suggest a filename for dynamic content. Quotes are necessary with special characters.||Content-Disposition: attachment; filename="fname.ext"|
|Content-Range||Where in a full body message this partial message belongs||Content-Range: bytes 21010-47021/47022|
|Content-Type||The MIME type of this content||Content-Type: text/html; charset=utf-8|
|Date||The date and time that the message was sent||Date: Tue, 19 Jun 2012 10:10:10 GMT|
|ETag||An identifier for a specific version of a resource, often a message digest||ETag: "bWljazpzZWNyZXQga2V5"|
|Expires||Gives the date/time after which the response is considered stale||Expires: Tue, 19 Jun 2012 10:10:10 GMT|
|Last-Modified||The last modified date for the requested object, in RFC 2822 format||Last-Modified: Tue, 19 Jun 2012 10:10:10 GMT|
|Link||Used to express a typed relationship with another resource, where the relation type is defined by RFC 5988||Link: </feed>; rel="alternate"|
|Location||Used in redirection, or when a new resource has been created.||Location: http://www.w3.org/pub/WWW/People.html|
|P3P||This header is supposed to set a Platform for Privacy Preferences Project (P3P) policy, in the form of P3P:CP="your_compact_policy". However, P3P did not take off and most browsers never fully implemented it. Many websites set this header with fake policy text, which was enough to convince browsers that a P3P policy existed and to grant permissions for third-party cookies.||P3P: CP="This is not a P3P policy! See http://www.google.com/support/accounts/bin/answer.py?hl=en&answer=151657 for more info."|
|Pragma||Implementation-specific headers that may have various effects anywhere along the request-response chain.||Pragma: no-cache|
|Proxy-Authenticate||Request authentication to access the proxy.||Proxy-Authenticate: Basic|
|Refresh||Used in redirection, or when a new resource has been created. This refresh redirects after 5 seconds. This is a proprietary, non-standard header extension introduced by Netscape and supported by most web browsers.||Refresh: 5; url=http://baselogic.com/index.html|
|Retry-After||If an entity is temporarily unavailable, this instructs the client to try again after a specified period of time (seconds).||Retry-After: 120|
|Server||A name for the server||Server: Apache/2.4 (Unix)|
|Set-Cookie||Sets an HTTP Cookie||Set-Cookie: UserID=JaneSmith; Max-Age=3600; Version=1|
|Strict-Transport-Security||An HSTS policy informing the HTTP client how long to cache the HTTPS-only policy and whether this applies to subdomains.||Strict-Transport-Security: max-age=16070400; includeSubDomains|
|Trailer||The Trailer general field value indicates that the given set of header fields is present in the trailer of a message encoded with chunked transfer-coding.||Trailer: Max-Forwards|
|Transfer-Encoding||The form of encoding used to safely transfer the entity to the user. Currently defined methods are: chunked, compress, deflate, gzip, identity.||Transfer-Encoding: chunked|
|Vary||Tells downstream proxies how to match future request headers to decide whether the cached response can be used rather than requesting a fresh one from the origin server.||Vary: *|
|Via||Informs the client of proxies through which the response was sent.||Via: 1.0 mick, 1.1 baselogic.com (Apache/2.4)|
|Warning||A general warning about possible problems with the entity body.||Warning: 199 Miscellaneous warning|
|WWW-Authenticate||Indicates the authentication scheme that should be used to access the requested entity.||WWW-Authenticate: Basic|
RFC 2616-sec14: Header Field Definitions: http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
An Internet media type is a two-part identifier for file formats on the Internet. The identifiers were originally defined in RFC 2046 for use in email sent through SMTP, but their use has expanded to other protocols such as HTTP. These types were called MIME types, and are sometimes referred to as Content-types, after the name of a header in several protocols whose value is such a type.
A media type is composed of two or more parts: A type, a subtype, and zero or more optional parameters. For example, subtypes of text have an optional charset parameter that can be included to indicate the character encoding (e.g. text/html; charset=UTF-8).
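To make the type/subtype/parameter structure concrete, here is a simplified parser sketch. The function name parse_media_type is illustrative, and the code deliberately ignores edge cases such as quoted parameter values that contain semicolons.

```python
def parse_media_type(value):
    """Split a media type such as 'text/html; charset=UTF-8' into its parts."""
    essence, *raw_params = [part.strip() for part in value.split(";")]
    type_, _, subtype = essence.partition("/")
    params = {}
    for item in raw_params:
        if item:
            key, _, val = item.partition("=")
            # Parameter names are case-insensitive; values keep their case.
            params[key.strip().lower()] = val.strip().strip('"')
    return type_.lower(), subtype.lower(), params


print(parse_media_type("text/html; charset=UTF-8"))
# ('text', 'html', {'charset': 'UTF-8'})
```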
Common application MIME Types
|application/EDI-X12||EDI X12 data; Defined in RFC 1767|
|application/EDIFACT||EDI EDIFACT data; Defined in RFC 1767|
|application/octet-stream||Arbitrary binary data. Generally speaking this type identifies files that are not associated with a specific application. Contrary to past assumptions by software packages such as Apache this is not a type that should be applied to unknown files. In such a case, a server or application should not indicate a content type, as it may be incorrect, but rather, should omit the type in order to allow the recipient to guess the type.|
|application/ogg||Ogg, a multimedia bitstream container format; Defined in RFC 5334|
|application/pdf||Portable Document Format, PDF has been in use for document exchange on the Internet since 1993; Defined in RFC 3778|
|application/postscript||PostScript; Defined in RFC 2046|
|application/rdf+xml||Resource Description Framework; Defined by RFC 3870|
|application/soap+xml||SOAP; Defined by RFC 3902|
|application/font-woff||Web Open Font Format; (candidate recommendation; use application/x-font-woff until standard is official)|
|application/xhtml+xml||XHTML; Defined by RFC 3236|
|application/xml-dtd||Document Type Definition (DTD) files; Defined by RFC 3023|
|application/xop+xml||XML-binary Optimized Packaging (XOP)|
|application/zip||ZIP archive files; Registered|
|application/gzip||Gzip, Defined in RFC 6713|
Common multipart MIME Types
|multipart/mixed||MIME Email; Defined in RFC 2045 and RFC 2046|
|multipart/alternative||MIME Email; Defined in RFC 2045 and RFC 2046|
|multipart/related||MIME Email; Defined in RFC 2387 and used by MHTML (HTML mail)|
|multipart/form-data||MIME Webform; Defined in RFC 2388|
|multipart/signed||Defined in RFC 1847|
|multipart/encrypted||Defined in RFC 1847|
Common text MIME Types
|text/cmd||commands; subtype resident in Gecko browsers like Firefox 3.5|
|text/css||Cascading Style Sheets; Defined in RFC 2318|
|text/csv||Comma-separated values; Defined in RFC 4180|
|text/html||HTML; Defined in RFC 2854|
|text/plain||Textual data; Defined in RFC 2046 and RFC 3676|
|text/vcard||vCard (contact information); Defined in RFC 6350|
|text/xml||Extensible Markup Language; Defined in RFC 3023|
A cookie, also known as an HTTP cookie, web cookie, or browser cookie, is usually a small piece of data sent from a website and stored in a user's web browser while the user is browsing that website. When the user browses the same website in the future, the data stored in the cookie can be retrieved by the website to notify it of the user's previous activity. Cookies were designed to be a reliable mechanism for websites to remember the state of the website or activity the user had taken in the past. This can include clicking particular buttons, logging in, or a record of which pages were visited by the user even months or years ago.
SERVER SETTING COOKIES IN RESPONSE:
Set-Cookie: name=value
Set-Cookie: name2=value2; Expires=Tue, 26 Jun 2012 19:19:47 GMT
BROWSER SENDING COOKIES TO SERVER IN REQUEST:
Cookie: name=value; name2=value2
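The two header forms above can be handled with Python's standard http.cookies module. This is a short sketch using the same cookie names as the example; the variable names are illustrative.

```python
from http.cookies import SimpleCookie

# Parse the Cookie request header that the browser sends back.
jar = SimpleCookie()
jar.load("name=value; name2=value2")
cookies = {key: morsel.value for key, morsel in jar.items()}
print(cookies)  # {'name': 'value', 'name2': 'value2'}

# Build a Set-Cookie response header with an expiry date.
out = SimpleCookie()
out["name2"] = "value2"
out["name2"]["expires"] = "Tue, 26 Jun 2012 19:19:47 GMT"
print(out.output())
```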
Data URL scheme can be useful for embedding images into HTML/CSS/JS to save on HTTP requests instead of referencing remote files.
STANDARD <IMG> TAG:
<img width="99" height="99" alt="BASE Logic, Inc. logo" src="http://baselogic.com/images/BLiNC_logo.png" />
DATA URI IMAGE IN <IMG> TAG:
<img width="99" height="99" alt="BASE Logic, Inc. logo" src="data:image/png;base64,R0lGODlhEAAQAMQAAORHHOVSKudfOulrSOp3WOyDZu6QdvCchPGolfO0o/XBs/fNwfjZ0frl3/zy7///
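A data URI is just the media type followed by the Base64 encoding of the raw bytes. The sketch below shows how one could be generated; to_data_uri is an illustrative helper name, and the six sample bytes are only the start of a GIF header, not a complete image.

```python
import base64


def to_data_uri(data, media_type):
    """Encode raw bytes as a data: URI suitable for an <img> src attribute."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{media_type};base64,{encoded}"


print(to_data_uri(b"GIF89a", "image/gif"))  # data:image/gif;base64,R0lGODlh
```

In practice the bytes would come from reading the image file in binary mode.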
If a web server responds with Cache-Control: no-cache then a web browser or other caching system must not use the response to satisfy subsequent requests without first checking with the originating server. This header field is part of HTTP version 1.1, and is ignored by some caches and browsers. It may be simulated by setting the Expires HTTP version 1.0 header field value to a time earlier than the response time.
The request that a resource should not be cached is no guarantee that it will not be written to disk. In particular, the HTTP/1.1 definition draws a distinction between history stores and caches. If the user navigates back to a previous page a browser may still show you a page that has been stored on disk in the history store. This is correct behavior according to the specification. Many user agents show different behavior in loading pages from the history store or cache depending on whether the protocol is HTTP or HTTPS.
The header field Cache-Control: no-store is intended to instruct a browser application to make a best effort not to write it to disk. The Pragma: no-cache header field is an HTTP/1.0 header intended for use in requests. It is a means for the browser to tell the server and any intermediate caches that it wants a fresh version of the resource, not for the server to tell the browser not to cache the resource. Some user agents do pay attention to this header in responses, but the HTTP/1.1 RFC specifically warns against relying on this behavior.
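As a sketch of the server-side technique described above, the following helper builds response headers that combine Cache-Control: no-cache with an Expires value earlier than the response time. The function name no_cache_headers and its now parameter are illustrative assumptions.

```python
import time
from email.utils import formatdate


def no_cache_headers(now=None):
    """Build response headers asking caches to revalidate before reuse."""
    now = time.time() if now is None else now
    return {
        "Date": formatdate(now, usegmt=True),
        "Cache-Control": "no-cache",
        # An Expires value earlier than the response time, for HTTP/1.0 caches.
        "Expires": formatdate(now - 1, usegmt=True),
    }


for name, value in no_cache_headers().items():
    print(f"{name}: {value}")
```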
RFC 2616-sec13: Caching in HTTP: http://www.w3.org/Protocols/rfc2616/rfc2616-sec13.html
HTTP Conditional GET
A conditional GET is an HTTP GET request that may return an HTTP 304 response (versus HTTP 200). An HTTP 304 response indicates that the resource has not been modified since the previous GET request, so the resource is not returned to the requesting client as part of the response.
A conditional GET relies on one of two header pairs: Last-Modified / If-Modified-Since or ETag / If-None-Match.
[MickKnutson]$ curl --silent --head --header 'If-Modified-Since: Tue, 19 Jun 2012 10:10:10 GMT' http://baselogic.com/images/BLiNC_logo.png
SERVER RESPONSE HTTP 304:
HTTP/1.1 304 Not Modified
Date: Tue, 26 Jun 2012 19:13:29 GMT
Server: Apache/2.2.21 (Unix)
Connection: close
Expires: Wed, 27 Jun 2012 19:13:29 GMT
Cache-Control: max-age=86400
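The decision a server makes for a conditional GET can be sketched as follows. This is a simplified illustration, not how any particular server implements it: conditional_status is a hypothetical function, and weak ETag comparison is ignored.

```python
from email.utils import parsedate_to_datetime


def conditional_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # When the client sends an ETag validator, compare it first.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if etag in candidates or "*" in candidates else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        since = parsedate_to_datetime(if_modified_since)
        return 304 if parsedate_to_datetime(last_modified) <= since else 200
    return 200  # unconditional request: send the full resource


request = {"If-Modified-Since": "Tue, 19 Jun 2012 10:10:10 GMT"}
print(conditional_status(request, '"abc"', "Mon, 18 Jun 2012 10:10:10 GMT"))  # 304
print(conditional_status(request, '"abc"', "Wed, 20 Jun 2012 10:10:10 GMT"))  # 200
```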
Transport Layer Security (TLS / SSL)
Transport Layer Security (TLS) and its predecessor, Secure Sockets Layer (SSL), are cryptographic protocols that provide communication security over the Internet. TLS and SSL encrypt the segments of network connections at the Application Layer for the Transport Layer, using asymmetric cryptography for key exchange, symmetric encryption for privacy, and message authentication codes for message integrity.
RFC 5246 Transport Layer Security (TLS) Protocol v1.2: http://tools.ietf.org/html/rfc5246
Basic access Authentication (BASIC)
In the context of an HTTP transaction, basic access authentication is a method for a web browser or other client program to provide a user name and password when making a request. This is the most basic way of implementing authentication for a web application and is suitable when we are accessing the application both using browser and other software such as scripts and so on. In this mode, when accessed by a browser, the browser will use its standard dialog to collect the credentials. It is easy to implement, but the credentials will be transmitted as plain text and anyone can collect them if we do not have TLS/SSL or some network level encryption in place.
CLIENT REQUEST (NO AUTHENTICATION):
GET /secured/index.html HTTP/1.1
Host: localhost
SERVER RESPONSE:
HTTP/1.1 401 Authorization Required
Server: Apache
Date: Tue, 26 Jun 2012 19:19:47 GMT
WWW-Authenticate: Basic realm="Secure Area"
Content-Type: text/html
Content-Length: 311

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML>
<HEAD>
<TITLE>Error</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
<BODY><H1>401 Unauthorized.</H1></BODY>
</HTML>
Before transmitting the username and password entered by the user, the two are concatenated with a colon separating the two values; the resulting string is Base64 encoded. For example, given a username mick and password secret key, the string "mick:secret key" will be encoded with the Base64 algorithm, resulting in bWljazpzZWNyZXQga2V5. The Base64-encoded string is transmitted in the HTTP header and decoded by the receiver, resulting in the decoded colon-separated username and password string. Encoding the username and password with Base64 makes them unreadable visually, but they are easily decoded. Confidentiality is not the intent of the encoding step; rather, the intent is to encode non-HTTP-compatible characters that a username or password may contain into those that are HTTP-compatible.
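The encoding step can be reproduced with Python's base64 module. The helper names below are illustrative; the credentials are the ones used in the example above.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an Authorization header for Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic(header_value):
    """Recover (username, password) from a Basic Authorization header value."""
    _, _, token = header_value.partition(" ")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password


header = basic_auth_header("mick", "secret key")
print(header)                # Basic bWljazpzZWNyZXQga2V5
print(decode_basic(header))  # ('mick', 'secret key')
```

The round trip shows why Basic authentication offers no confidentiality on its own: anyone who can read the header can decode it.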
CLIENT REQUEST "MICK:SECRET KEY" (USER NAME "MICK", PASSWORD "SECRET KEY"):
GET /secured/index.html HTTP/1.1
Host: localhost
Authorization: Basic bWljazpzZWNyZXQga2V5
SERVER RESPONSE:
HTTP/1.1 200 OK
Server: Apache
Date: Tue, 26 Jun 2012 19:19:47 GMT
Content-Type: text/html
Content-Length: 10476
Digest access authentication (DIGEST)
This method is similar to the BASIC authentication method, but instead of the password a digest of the password is transmitted.
Digest communication starts with a client that requests a resource from a web server. If the resource is secured with Digest Authentication, the server will respond with the HTTP status code 401, which means the client is unauthorized to access this resource.
CLIENT REQUEST (NO AUTHENTICATION):
GET /dir/index.html HTTP/1.0
Host: localhost
In the response from the initial request, the server indicates in the HTTP header with which mechanism the resource is secured.
HTTP/1.0 401 Unauthorized
Server: Apache
Date: Tue, 26 Jun 2012 19:19:47 GMT
WWW-Authenticate: Digest realm="baselogic.com",
 qop="auth",
 nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
 opaque="5ccc069c403ebaf9f0171e9517f40e41"
Content-Type: text/html
Content-Length: 311

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/loose.dtd">
<HTML>
<HEAD>
<TITLE>Error</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=ISO-8859-1">
</HEAD>
<BODY><H1>401 Unauthorized.</H1></BODY>
</HTML>
Now the user can enter the credentials using mick as the username and secret key as the password.
CLIENT REQUEST (USERNAME "MICK", PASSWORD "SECRET KEY"):
GET /secured/index.html HTTP/1.0
Host: localhost
Authorization: Digest username="mick",
 realm="baselogic.com",
 nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
 uri="/secured/index.html",
 qop=auth,
 nc=00000001,
 cnonce="0a4f113b",
 response="35f308904a9a9623498f358d1cb10afd"
SERVER RESPONSE:
HTTP/1.0 200 OK
Server: Apache
Date: Tue, 26 Jun 2012 20:19:47 GMT
Content-Type: text/html
Content-Length: 7984
You should notice the term Digest in the server's 401 response, which indicates that the resource requested by the client is secured using Digest Authentication. The server also indicates the type of Digest Authentication algorithm to be used by the client with Quality Of Protection (QOP) and the nonce string, which is a Base64-encoded timestamp and private hash generated by the server.
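// timestamp and privateHash are illustrative names for the server-side values
String nonce = Base64.getEncoder().encodeToString((timestamp + ":" + privateHash).getBytes(StandardCharsets.UTF_8));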
The private hash is created by the server. Base64 encoding is reversible, so the timestamp and the private hash can be recovered from the nonce, but the private MD5 hash itself is one-way and cannot be reversed.
An internet browser responds to this by presenting the user with a dialog in which a username and password can be entered. The dialog does not show the warning about transmitting the credentials in clear text, as it does with a site secured by Basic Authentication.
The response value is computed by the client from several digest properties: the HA1 value, the nonce, the nonce count (nc), the client nonce (cnonce), the qop value, and the HA2 value are concatenated with colons and then hashed with MD5. The algorithm is the following:
Response = MD5( "HA1:nonce:nc:cnonce:qop:HA2" )
The HA1 value is the MD5 hash of the username, realm, and password separated by colons.
HA1 = MD5( "mick:baselogic.com:secret key")
The MD5 hash for the HA1 is: 935f1e7b1582ffd6e05e7dc8e949ac6f
The HA2 value is the MD5 hash of the request method and the requested URI separated by a colon.
HA2 = MD5( "GET:/secured/index.html" )
The MD5 hash for the HA2 is: cc21ab6caf04c32228f0250c9eb48705
The final response MD5 hash algorithm would look like this:
Response = MD5( "935f1e7b1582ffd6e05e7dc8e949ac6f:dcd98b7102dd2f0e8b11d0f600bfb0c093:00000001:0a4f113b:auth:cc21ab6caf04c32228f0250c9eb48705" )
This would result in 35f308904a9a9623498f358d1cb10afd, which is the response value the client sends to the server for authentication.
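The whole calculation can be sketched with Python's hashlib. The function names are illustrative, and the sketch covers only the qop=auth case with MD5 as described above; the inputs are the values from this section.

```python
import hashlib


def md5_hex(text):
    """Return the hexadecimal MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri,
                    nonce, nc, cnonce, qop="auth"):
    """Compute the Digest 'response' value for qop=auth."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")  # user credentials
    ha2 = md5_hex(f"{method}:{uri}")                 # request method and URI
    return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")


response = digest_response(
    "mick", "baselogic.com", "secret key",
    "GET", "/secured/index.html",
    nonce="dcd98b7102dd2f0e8b11d0f600bfb0c093",
    nc="00000001", cnonce="0a4f113b",
)
print(response)
```

The server repeats the same calculation with its stored credentials and compares the result with the response value sent by the client.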
Listing: Submitting DIGEST authentication credentials.
Form-Based Authentication (FORM)
In this mode, we can use our own form to collect the username and password. It is very flexible in terms of implementation and how we ask for the username and password, but requires extra work for implementing the forms.
This method suffers from the same security risk as the BASIC method because the credentials are transmitted as plain text.
This is, however, the most user-friendly type of authentication, as it allows site owners to control the look and feel of the user experience during site navigation.
<form method="post" action="/secured/login">
<input type="text" name="username" required>
<input type="password" name="password" required>
<input type="submit" value="Login">
</form>
BASE Logic, Inc: http://baselogic.com
RFC 793 TCP Connection states: http://tools.ietf.org/html/rfc793
RFC 2397 The "data" URL scheme: http://tools.ietf.org/html/rfc2397
RFC 2459 Internet X.509 Public Key Infrastructure: http://tools.ietf.org/html/rfc2459
RFC 4918: HTTP Extensions for Web Distributed Authoring and Versioning (WebDAV): http://tools.ietf.org/html/rfc4918