How URL normalization works
URL normalization modifies separators, encoded elements, and literal bytes in incoming URLs so that they conform to a consistent formatting standard.
For example, consider a WAF custom rule that blocks requests whose URLs match www.example.com/hello. The rule would not block a request containing an encoded element, such as www.example.com/%68ello (where %68 is the percent-encoded form of h). Normalizing incoming URLs on the Cloudflare global network helps simplify rules expressions containing URLs.
The two available types of URL normalization are RFC 3986 and Cloudflare.
The location where URL normalization will occur depends on the configured settings.
For examples of the different settings and their impact on request URLs, refer to the URL normalization examples.
The URL normalization performed according to RFC 3986 is as follows:
- The following unreserved characters are percent decoded:
  - Alphabetical characters: a-z, A-Z (decoded from %41-%5A and %61-%7A)
  - Digit characters: 0-9 (decoded from %30-%39)
  - Hyphen - (%2D), period . (%2E), underscore _ (%5F), and tilde ~ (%7E)
- These reserved characters are not encoded or decoded: : / ? # [ ] @ ! $ & ' ( ) * + , ; =
- Other characters, for example literal byte values, are percent encoded.
- Percent encoded representations are converted to upper case.
- URL paths are normalized according to the Remove Dot Segments protocol.
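The RFC 3986 rules listed above can be sketched in a few lines of Python. This is an illustrative approximation for a URL path only, not Cloudflare's implementation: the function names (normalize_escapes, remove_dot_segments, normalize_path) are hypothetical, and the handling of a stray % that is not followed by two hex digits (encoded here as %25) is an assumption.

```python
import re

# Unreserved characters (RFC 3986, section 2.3): always percent decoded.
UNRESERVED = set(
    "abcdefghijklmnopqrstuvwxyz"
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "0123456789-._~"
)
# Reserved characters: neither encoded nor decoded.
RESERVED = set(":/?#[]@!$&'()*+,;=")

PCT = re.compile(rb"%([0-9A-Fa-f]{2})")


def normalize_escapes(path: str) -> str:
    """Apply the percent-decoding, percent-encoding, and upper-casing rules."""
    raw = path.encode("utf-8")
    out = []
    i = 0
    while i < len(raw):
        m = PCT.match(raw, i)
        if m:
            hex_pair = m.group(1).decode("ascii")
            ch = chr(int(hex_pair, 16))
            # Decode unreserved characters; upper-case every other escape.
            out.append(ch if ch in UNRESERVED else "%" + hex_pair.upper())
            i = m.end()
            continue
        ch = chr(raw[i])
        if ch in UNRESERVED or ch in RESERVED:
            out.append(ch)
        else:
            # Any other byte is percent encoded (assumption: this includes
            # a stray "%" that does not start a valid escape).
            out.append("%{:02X}".format(raw[i]))
        i += 1
    return "".join(out)


def remove_dot_segments(path: str) -> str:
    """Remove Dot Segments as described in RFC 3986, section 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]
            if output:
                output.pop()
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/") to the output.
            idx = path.find("/", 1)
            if idx == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:idx])
                path = path[idx:]
    return "".join(output)


def normalize_path(path: str) -> str:
    # Normalize escapes first so that encoded dots (%2E) become dot segments.
    return remove_dot_segments(normalize_escapes(path))
```

With this sketch, normalize_path("/%68ello") returns "/hello", so a rule written against the plain path would also match the encoded request from the earlier example. An encoded reserved character such as %2f is only upper-cased to %2F, not decoded.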
When using Cloudflare URL normalization, the following additional normalization steps are applied to the URLs of incoming requests, in this order:
- Normalize back slashes (\) into forward slashes (/).
- Merge successive forward slashes (for example, // will be normalized to /).
- Perform RFC 3986 normalization of the resulting URL.
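The ordering of these steps can be illustrated with a short Python sketch. It is a simplified model under stated assumptions, not Cloudflare's implementation: the function names are hypothetical, the input is treated as a path string containing literal characters only, and the RFC 3986 step is passed in as a callable (defaulting to a no-op) so that the block stays self-contained.

```python
import re
from typing import Callable


def cloudflare_pre_normalize(path: str) -> str:
    """Apply the two Cloudflare-specific steps to a literal path string."""
    # Step 1: normalize back slashes into forward slashes.
    path = path.replace("\\", "/")
    # Step 2: merge runs of successive forward slashes into one.
    return re.sub(r"/{2,}", "/", path)


def cloudflare_normalize(
    path: str,
    rfc3986_normalize: Callable[[str], str] = lambda p: p,
) -> str:
    # Step 3: hand the result to an RFC 3986 normalizer
    # (hypothetical hook; the default leaves the path unchanged).
    return rfc3986_normalize(cloudflare_pre_normalize(path))
```

Because back slashes are converted before slashes are merged, a mixed sequence such as \/ first becomes // and is then collapsed to a single /.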