This module defines a standard interface to break Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path, etc.).
Returns a tuple (scheme, netloc, path, query, fragment) derived from url.
This corresponds to the general structure of a URL:
scheme://netloc/path?query#fragment. Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters shown above are not part of the result, except for a leading slash in the path component, which is retained if present. For example:
import urlparse
o = urlparse('http://www.cwi.nl:80/%7Eguido/Python.html')
# result is ('http', 'www.cwi.nl:80', '/%7Eguido/Python.html', '', '')
Following the syntax specifications in RFC 1808, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component.
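The '//' rule can be illustrated with a short sketch. It uses Python 3's standard urllib.parse.urlsplit as a stand-in (an assumption, since it is not this module's own function), because it returns the same five-part shape and also recognizes a netloc only after '//'.

```python
from urllib.parse import urlsplit  # stand-in for this module's parser (assumption)

# With '//' the network location is recognized as its own component.
absolute = tuple(urlsplit('http://www.cwi.nl/%7Eguido/Python.html'))

# Without '//' the whole string is treated as a relative path.
relative = tuple(urlsplit('www.cwi.nl/%7Eguido/Python.html'))

print(absolute)
print(relative)
```

In the second call the host name ends up inside the path component, which is why a URL meant to be absolute should always carry the '//' prefix.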
Given a netloc as returned by parse(), breaks it into its components and returns a tuple (user, password, host, port). Each component of the returned tuple is a string.
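A minimal sketch of how such a split might work is shown below. The function name split_netloc is hypothetical, and the sketch assumes that missing components come back as empty strings and that the host is not a bracketed IPv6 literal; the module's actual behaviour in those cases is not stated here.

```python
def split_netloc(netloc):
    """Split 'user:password@host:port' into (user, password, host, port).

    Hypothetical helper: missing parts are returned as empty strings
    (an assumption), and bracketed IPv6 hosts are not handled.
    """
    # Everything before the last '@' is user information, if present.
    userinfo, _, hostport = netloc.rpartition('@')
    # The first ':' separates the user name from the password.
    user, _, password = userinfo.partition(':')
    # The first ':' separates the host from the port.
    host, _, port = hostport.partition(':')
    return (user, password, host, port)


full = split_netloc('guido:secret@www.cwi.nl:80')
bare = split_netloc('www.cwi.nl')
print(full)
print(bare)
```

Note that the port stays a string, matching the statement that every component of the tuple is a string.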
Return the urlencoded version of s.
Urlencoding transforms unsafe bytes to their %XX representation where XX is the hex value of the byte.
Safe bytes are:
- lowercase letters from “a” to “z” (bytes from 0x61 to 0x7a)
- uppercase letters from “A” to “Z” (bytes from 0x41 to 0x5a)
- digits from “0” to “9” (bytes from 0x30 to 0x39)
- the following symbols: $-_.+!*'()
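The encoding rule and the safe-byte list above can be sketched as follows. The name urlencode_bytes and the use of uppercase hex digits are assumptions made for illustration, not a statement of how this module implements it.

```python
# Safe bytes as listed above: letters, digits and $-_.+!*'()
SAFE = frozenset(
    b"abcdefghijklmnopqrstuvwxyz"
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    b"0123456789"
    b"$-_.+!*'()"
)


def urlencode_bytes(data):
    """Return data with every unsafe byte replaced by %XX (hypothetical helper)."""
    return ''.join(
        chr(b) if b in SAFE else '%{:02X}'.format(b)  # uppercase hex is an assumption
        for b in data
    )


encoded = urlencode_bytes(b'/~guido/My File.html')
print(encoded)
```

Under this rule a space is unsafe and becomes %20, and a slash becomes %2F, so the function is suited to encoding a single component rather than a whole URL.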
If s is urlencoded, returns s with every + substituted with a space and every %xx substituted with the corresponding character.
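The two substitutions can be tried out with Python 3's standard urllib.parse.unquote_plus, used here as a stand-in (an assumption, since it is not this module's own decoder) because it applies the same rules: + becomes a space and %xx becomes the corresponding character.

```python
from urllib.parse import unquote_plus  # stand-in for this module's decoder (assumption)

# '+' turns into a space and '%7E' into '~'.
decoded = unquote_plus('%7Eguido+van+Rossum')
print(decoded)
```

A literal plus sign therefore has to be written as %2B in the encoded string if it is meant to survive decoding.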