GitHub - boostorg/url: Boost.URL is a library for manipulating Uniform Resource Identifiers (URIs) and Locators (URLs).
Integration
Note: Sample code and identifiers used throughout are written as if the following declarations are in effect:
#include <boost/url.hpp>
using namespace boost::urls;
We begin by including the library header file which brings all the symbols into scope.
Alternatively, individual headers may be included to obtain the declarations for specific types.
Boost.URL is a compiled library. You need to install binaries in a location that can be found by your linker and link your program with the Boost.URL built library. If you followed the Boost Getting Started instructions (http://www.boost.org/doc/libs/release/more/getting_started/index.html), that’s already been done for you.
For example, if you are using CMake, you can use the following commands to find and link the library:
find_package(Boost REQUIRED COMPONENTS url)
target_link_libraries(my_program PRIVATE Boost::url)
Parsing
Say you have the following URL that you want to parse:
boost::core::string_view s = "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor";
In this example, string_view is an alias to boost::core::string_view, a string_view implementation that is implicitly convertible from and to std::string_view.
You can parse the string by calling this function:
boost::system::result<url_view> r = parse_uri( s );
The function parse_uri returns an object of type result<url_view>
which is a container resembling a variant that holds either an error or an object.
A number of functions are available to parse different types of URL.
We can immediately call result::value to obtain a url_view, or simply dereference the result with operator* for unchecked access.
When there are no errors, result::value
returns an instance of url_view, which holds the parsed result.
result::value throws an exception on a parsing error.
Alternatively, the functions result::has_value and result::has_error could also be used to check if the string has been parsed without errors.
Note: It is worth noting that parse_uri does not allocate any memory dynamically. Like a string_view, a url_view does not own the underlying character buffer. As long as the contents of the original string are unmodified, constructed URL views always contain a valid URL in its correctly serialized form. If the input does not match the URL grammar, an error code is reported through result rather than exceptions. Exceptions are thrown only on excessive input length.
Accessing
Accessing the parts of the URL is easy:
url_view u( "https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor" );
assert(u.scheme() == "https");
assert(u.authority().buffer() == "user:pass@example.com:443");
assert(u.userinfo() == "user:pass");
assert(u.user() == "user");
assert(u.password() == "pass");
assert(u.host() == "example.com");
assert(u.port() == "443");
assert(u.path() == "/path/to/my-file.txt");
assert(u.query() == "id=42&name=John Doe+Jingleheimer-Schmidt");
assert(u.fragment() == "page anchor");
url_view::query percent-decodes escapes but preserves literal plus signs, matching RFC 3986 rules.
Use url_view::params (or pass decoding options with space_as_plus = true) when the query represents
form data where '+' should be treated as a space.
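The difference between the two decoding modes can be sketched with the standard library alone. This is a minimal illustration, not Boost.URL's implementation: percent_decode and hex_value are hypothetical helpers, and the space_as_plus parameter only mirrors the name of the library option mentioned above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper (not part of Boost.URL):
// value of one hexadecimal digit, or -1 if `c` is not a hex digit.
inline int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Hypothetical helper: decode %XX escapes in `s`.
// When `space_as_plus` is true, a literal '+' is also turned into a
// space, which is the form-data convention rather than an RFC 3986 rule.
// Malformed escapes are copied through unchanged in this sketch.
inline std::string percent_decode(std::string_view s, bool space_as_plus)
{
    std::string out;
    out.reserve(s.size());
    for (std::size_t i = 0; i < s.size(); ++i)
    {
        char const c = s[i];
        if (c == '%' && i + 2 < s.size())
        {
            int const hi = hex_value(s[i + 1]);
            int const lo = hex_value(s[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                out.push_back(static_cast<char>(hi * 16 + lo));
                i += 2; // skip the two hex digits
                continue;
            }
        }
        out.push_back(space_as_plus && c == '+' ? ' ' : c);
    }
    return out;
}
```

With space_as_plus set to false the plus sign in the example query survives decoding, and with true it becomes a space, which is the difference between the query and params results shown in this section.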
URL paths can be further divided into path segments with the function url_view::segments.
Although URL query strings are often used to represent key/value pairs, this interpretation is not defined by rfc3986.
Users can treat the query as a single entity.
url_view provides the function
url_view::params to extract this view of key/value pairs.
for (auto seg: u.segments())
    std::cout << seg << "\n";
std::cout << "\n";
for (auto param: u.params())
    std::cout << param.key << ": " << param.value << "\n";
std::cout << "\n";
The output is:
path
to
my-file.txt
id: 42
name: John Doe Jingleheimer-Schmidt
These functions return views referring to substrings and sub-ranges of the underlying URL. By simply referencing the relevant portion of the URL string internally, its components can represent percent-decoded strings and be converted to other types without any previous memory allocation.
std::string h = u.host();
assert(h == "example.com");
A special string_token type can also be used to specify how a portion of the URL should be decoded and returned.
std::string h = "host: ";
u.host(string_token::append_to(h));
assert(h == "host: example.com");
These functions might also return empty strings
url_view u1 = parse_uri( "http://www.example.com" ).value();
assert(u1.fragment().empty());
assert(!u1.has_fragment());
for both empty and absent components
url_view u2 = parse_uri( "http://www.example.com/#" ).value();
assert(u2.fragment().empty());
assert(u2.has_fragment());
Many components do not have corresponding functions such as
has_authority to check for their existence.
This happens because some URL components are mandatory.
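The distinction between an absent component and an empty one can be sketched with std::optional. This is only an illustration under simplifying assumptions: find_fragment is a hypothetical helper, not a Boost.URL function, and it does no validation of the URL. The library exposes the same two pieces of information through has_fragment and fragment.

```cpp
#include <cassert>
#include <optional>
#include <string_view>

// Hypothetical helper (not part of Boost.URL): return the text after
// the first '#', or std::nullopt when the URL contains no '#' at all.
inline std::optional<std::string_view> find_fragment(std::string_view url)
{
    auto const pos = url.find('#');
    if (pos == std::string_view::npos)
        return std::nullopt;        // absent: no '#' delimiter
    return url.substr(pos + 1);     // present, possibly empty
}
```

Here a missing optional plays the role of has_fragment() returning false, while an engaged optional holding an empty view corresponds to a URL that ends in "#".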
When applicable, the encoded components can also be directly accessed through a string_view without any need to allocate memory:
std::cout <<
"url : " << u << "\n"
"scheme : " << u.scheme() << "\n"
"authority : " << u.encoded_authority() << "\n"
"userinfo : " << u.encoded_userinfo() << "\n"
"user : " << u.encoded_user() << "\n"
"password : " << u.encoded_password() << "\n"
"host : " << u.encoded_host() << "\n"
"port : " << u.port() << "\n"
"path : " << u.encoded_path() << "\n"
"query : " << u.encoded_query() << "\n"
"fragment : " << u.encoded_fragment() << "\n";
The output is:
url : https://user:pass@example.com:443/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor
scheme : https
authority : user:pass@example.com:443
userinfo : user:pass
user : user
password : pass
host : example.com
port : 443
path : /path/to/my%2dfile.txt
query : id=42&name=John%20Doe+Jingleheimer%2DSchmidt
fragment : page%20anchor
Percent-Encoding
An instance of decode_view provides a number of functions to persist a decoded string:
decode_view dv("id=42&name=John%20Doe%20Jingleheimer%2DSchmidt");
std::cout << dv << "\n";
The output is:
id=42&name=John Doe Jingleheimer-Schmidt
decode_view and its decoding functions are designed to perform no memory allocations unless the algorithm where it is being used needs the result to be in another container.
The design also permits recycling objects to reuse their memory, and at least minimize the number of allocations by deferring them until the result is in fact needed by the application.
In the example above, the memory owned by str can be reused to store other results.
This is also useful when manipulating URLs:
If u2.host() returned a value type, then two memory allocations would be necessary for this operation.
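The idea of deferring allocation until a decoded copy is actually needed can be sketched as a comparison that decodes on the fly. This is an illustration of the technique only, not the library's decode_view: decoded_equals and hex_digit are hypothetical helpers, and malformed escapes are treated as literal characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of Boost.URL):
// value of one hexadecimal digit, or -1 if `c` is not a hex digit.
inline int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Compare the percent-decoded form of `encoded` with `plain` one
// character at a time, without ever building a decoded copy.
inline bool decoded_equals(std::string_view encoded, std::string_view plain)
{
    std::size_t j = 0; // position in `plain`
    for (std::size_t i = 0; i < encoded.size(); ++i, ++j)
    {
        if (j == plain.size())
            return false; // decoded text is longer than `plain`
        char c = encoded[i];
        if (c == '%' && i + 2 < encoded.size())
        {
            int const hi = hex_digit(encoded[i + 1]);
            int const lo = hex_digit(encoded[i + 2]);
            if (hi >= 0 && lo >= 0)
            {
                c = static_cast<char>(hi * 16 + lo);
                i += 2; // skip the two hex digits
            }
        }
        if (c != plain[j])
            return false;
    }
    return j == plain.size(); // both sequences ended together
}
```

A call such as decoded_equals("my%2dfile.txt", "my-file.txt") performs no heap allocation, which is the property the text attributes to comparisons made through decoded views.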
Another common use case is converting URL path segments into filesystem paths:
boost::filesystem::path p;
for (auto seg: u.segments())
    p.append(seg.begin(), seg.end());
std::cout << "path: " << p << "\n";
The output is:
path: "path/to/my-file.txt"
In this example, only the internal allocations of filesystem::path need to happen.
In many common use cases, no allocations are necessary at all, such as finding the appropriate route for a URL in a web server:
auto match = []( std::vector<std::string> const& route, url_view u)
{
    auto segs = u.segments();
    if (route.size() != segs.size())
        return false;
    return std::equal(
        route.begin(), route.end(), segs.begin());
};
This allows us to easily match files in the document root directory of a web server:
std::vector<std::string> route =
    {"community", "reviews.html"};
if (match(route, u))
{
    handle_route(route, u);
}
Compound elements
The path and query parts of the URL are treated specially by the library. While they can be accessed as individual encoded strings, they can also be accessed through special view types.
This code calls encoded_segments to obtain the path segments as a container that returns encoded strings:
segments_encoded_view segs = u.encoded_segments();
for( auto v : segs )
{
    std::cout << v << "\n";
}
The output is:
As with other url_view functions which return encoded strings, the encoded segments container does not allocate memory.
Instead, it returns views to the corresponding portions of the underlying encoded buffer referenced by the URL.
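How a container of segments can refer to the original buffer instead of copying it can be sketched with std::string_view. This is a simplified illustration, not the library's segments_encoded_view: split_segments is a hypothetical helper, and unlike the library's lazy range it stores its views in a std::vector, which itself allocates.

```cpp
#include <cassert>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Boost.URL): split an encoded path
// into segments. Each element is a view into `path`; no characters
// are copied, so the caller must keep the original buffer alive.
inline std::vector<std::string_view> split_segments(std::string_view path)
{
    std::vector<std::string_view> segs;
    if (!path.empty() && path.front() == '/')
        path.remove_prefix(1); // a leading '/' does not start a segment
    if (path.empty())
        return segs;
    for (;;)
    {
        auto const pos = path.find('/');
        segs.push_back(path.substr(0, pos)); // npos takes the remainder
        if (pos == std::string_view::npos)
            break;
        path.remove_prefix(pos + 1);
    }
    return segs;
}
```

Each returned view stays percent-encoded, so "my%2dfile.txt" is reported as written in the buffer. Decoding is left as a separate step.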
As with other library functions, decode_view permits accessing elements of composed elements while avoiding memory allocations entirely:
segments_encoded_view segs = u.encoded_segments();
for( pct_string_view v : segs )
{
    decode_view dv = *v;
    std::cout << dv << "\n";
}
The output is:
Or with the encoded query parameters:
params_encoded_view params_ref = u.encoded_params();
for( auto v : params_ref )
{
    decode_view dk(v.key);
    decode_view dv(v.value);
    std::cout << "key = " << dk << ", value = " << dv << "\n";
}
The output is:
key = id, value = 42
key = name, value = John Doe+Jingleheimer-Schmidt
Modifying
The library provides the containers url and static_url, which support modification of the URL contents.
A url or static_url can be constructed from an existing url_view.
Unlike the url_view, which does not gain ownership of the underlying character buffer, the url container uses the default allocator to control a resizable character buffer which it owns.
url u = parse_uri( s ).value();
On the other hand, a static_url has fixed-capacity storage and does not require dynamic memory allocations.
static_url<1024> su = parse_uri( s ).value();
Objects of type url are std::regular.
Similarly to built-in types, such as int, a url is copyable, movable, assignable, default constructible, and equality comparable.
They support all the inspection functions of url_view, and also provide functions to modify all components of the URL.
Changing the scheme is easy with set_scheme, for example u.set_scheme( "https" ).
Or we can use a predefined constant:
u.set_scheme_id( scheme::https ); // equivalent to u.set_scheme( "https" );
The scheme must be valid, however, or an exception is thrown. All modifying functions perform validation on their input.
- Attempting to set the URL scheme or port to an invalid string results in an exception.
- Attempting to set other URL components to invalid strings will get the original input properly percent-encoded for that component.
It is not possible for a url to hold syntactically illegal text.
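The two validation behaviors can be sketched using the RFC 3986 grammar directly. This is an illustration under stated assumptions, not Boost.URL's code: is_valid_scheme and encode_segment are hypothetical helpers, the scheme check returns a bool where the library is described as throwing, and the encoder covers only the path-segment character set (pchar).

```cpp
#include <cassert>
#include <string>
#include <string_view>

// ASCII-only classification, independent of the current locale.
inline bool is_alpha(char c)
{
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

inline bool is_digit(char c)
{
    return c >= '0' && c <= '9';
}

// RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
// Hypothetical helper: a scheme cannot be repaired by escaping,
// so the only options are to accept or reject it.
inline bool is_valid_scheme(std::string_view s)
{
    if (s.empty() || !is_alpha(s.front()))
        return false;
    for (char c : s)
        if (!is_alpha(c) && !is_digit(c) && c != '+' && c != '-' && c != '.')
            return false;
    return true;
}

// Hypothetical helper: percent-encode every character that RFC 3986
// does not allow to appear literally in a path segment (pchar).
inline std::string encode_segment(std::string_view s)
{
    // unreserved punctuation, sub-delims, ':' and '@'
    static constexpr std::string_view allowed_punct = "-._~!$&'()*+,;=:@";
    static constexpr char hex[] = "0123456789ABCDEF";
    std::string out;
    for (char c : s)
    {
        if (is_alpha(c) || is_digit(c) ||
            allowed_punct.find(c) != std::string_view::npos)
        {
            out.push_back(c);
            continue;
        }
        auto const u = static_cast<unsigned char>(c);
        out.push_back('%');
        out.push_back(hex[u >> 4]);
        out.push_back(hex[u & 0x0F]);
    }
    return out;
}
```

A scheme has no escape mechanism in the grammar, which is why invalid input can only be rejected, whereas a path segment can always be made valid by escaping the offending characters.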
Modification functions return a reference to the object, so chaining is possible:
u.set_host_ipv4( ipv4_address( "192.168.0.1" ) )
    .set_port_number( 8080 )
    .remove_userinfo();
std::cout << u << "\n";
The output is:
https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=John%20Doe+Jingleheimer%2DSchmidt#page%20anchor
All non-const operations offer the strong exception safety guarantee.
The path segment and query parameter containers returned by a url offer modifiable range functionality, using member functions of the container:
params_ref p = u.params();
p.replace(p.find("name"), {"name", "Vinnie Falco"});
std::cout << u << "\n";
The output is:
https://192.168.0.1:8080/path/to/my%2dfile.txt?id=42&name=Vinnie%20Falco#page%20anchor
Formatting
Algorithms to format URLs construct a mutable URL by parsing and applying arguments to a URL template.
The following example uses the format
function to construct an absolute URL:
url u = format("{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");
The rules for a format URL string are the same as for a std::format_string, where replacement fields are delimited by curly braces.
The URL type is inferred from the format string.
The URL components to which replacement fields belong are identified before replacement is applied and any invalid characters for that formatted argument are percent-escaped:
url u = format("https://{}/{}", "www.boost.org", "Hello world!");
assert(u.buffer() == "https://www.boost.org/Hello%20world!");
Delimiters in the URL template, such as ":", "//", "?", and "#", unambiguously associate each replacement field to a URL component.
All other characters are normalized to ensure the URL is valid:
url u = format("{}:{}", "mailto", "someone@example.com");
assert(u.buffer() == "mailto:someone@example.com");
assert(u.scheme() == "mailto");
assert(u.path() == "someone@example.com");
url u = format("{}{}", "mailto:", "someone@example.com");
assert(u.buffer() == "mailto%3Asomeone@example.com");
assert(!u.has_scheme());
assert(u.path() == "mailto:someone@example.com");
assert(u.encoded_path() == "mailto%3Asomeone@example.com");
The function format_to can be used to format URLs into any modifiable URL container.
static_url<50> u;
format_to(u, "{}://{}:{}/rfc/{}", "https", "www.ietf.org", 80, "rfc2396.txt");
assert(u.buffer() == "https://www.ietf.org:80/rfc/rfc2396.txt");
As with std::format, positional and named arguments are supported.
url u = format("{0}://{2}:{1}/{3}{4}{3}", "https", 80, "www.ietf.org", "abra", "cad");
assert(u.buffer() == "https://www.ietf.org:80/abracadabra");
The arg function can be used to associate names with arguments:
url u = format("https://example.com/~{username}", arg("username", "mark"));
assert(u.buffer() == "https://example.com/~mark");
A second overload based on std::initializer_list
is provided for both format and format_to.
These overloads can help with lists of named arguments:
boost::core::string_view fmt = "{scheme}://{host}:{port}/{dir}/{file}";
url u = format(fmt, {
    {"scheme", "https"},
    {"port", 80},
    {"host", "example.com"},
    {"dir", "path/to"},
    {"file", "file.txt"}});
assert(u.buffer() == "https://example.com:80/path/to/file.txt");