◐ Shell
reader mode source ↗
Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
File filter
Conversations
Jump to
Diff view
Apply and reload
Show whitespace
Diff view
Apply and reload
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@

The first idea can be to list the languages with `|` in-between.

But that doesn't work right:

```js run
let regexp = /Java|JavaScript|PHP|C|C\+\+/g;
Expand All @@ -11,18 +11,18 @@ let str = "Java, JavaScript, PHP, C, C++";
alert( str.match(regexp) ); // Java,Java,PHP,C,C
```

The regular expression engine looks for alternations one-by-one. That is: first it checks if we have `match:Java`, otherwise -- looks for `match:JavaScript` and so on.

As a result, `match:JavaScript` can never be found, just because `match:Java` is checked first.

The same with `match:C` and `match:C++`.

There are two solutions for that problem:

1. Change the order to check the longer match first: `pattern:JavaScript|Java|C\+\+|C|PHP`.
2. Merge variants with the same start: `pattern:Java(Script)?|C(\+\+)?|PHP`.

In action:

```js run
let regexp = /Java(Script)?|C(\+\+)?|PHP/g;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# Find programming languages

There are many programming languages, for instance Java, JavaScript, PHP, C, C++.

Create a regexp that finds them in the string `subject:Java JavaScript PHP C++ C`:

```js
let regexp = /your regexp/g;
Expand Down
Original file line number Diff line number Diff line change
@@ -1,11 +1,11 @@

Opening tag is `pattern:\[(b|url|quote)]`.

Then to find everything till the closing tag -- let's use the pattern `pattern:.*?` with flag `pattern:s` to match any character including the newline and then add a backreference to the closing tag.

The full pattern: `pattern:\[(b|url|quote)\].*?\[/\1]`.

In action:

```js run
let regexp = /\[(b|url|quote)].*?\[\/\1]/gs;
Expand All @@ -20,4 +20,4 @@ let str = `
alert( str.match(regexp) ); // [b]hello![/b],[quote][url]http://google.com[/url][/quote]
```

Please note that besides escaping `pattern:[`, we had to escape a slash for the closing tag `pattern:[\/\1]`, because normally the slash closes the pattern.
Original file line number Diff line number Diff line change
@@ -1,47 +1,47 @@
# Find bbtag pairs

A "bb-tag" looks like `[tag]...[/tag]`, where `tag` is one of: `b`, `url` or `quote`.

For instance:
```
[b]text[/b]
[url]http://google.com[/url]
```

BB-tags can be nested. But a tag can't be nested into itself, for instance:

```
Normal:
[url] [b]http://google.com[/b] [/url]
[quote] [b]text[/b] [/quote]

Can't happen:
[b][b]text[/b][/b]
```

Tags can contain line breaks, that's normal:

```
[quote]
[b]text[/b]
[/quote]
```

Create a regexp to find all BB-tags with their contents.

For instance:

```js
let regexp = /your regexp/flags;

let str = "..[url]http://google.com[/url]..";
alert( str.match(regexp) ); // [url]http://google.com[/url]
```

If tags are nested, then we need the outer tag (if we want we can continue the search in its content):

```js
let regexp = /your regexp/flags;

let str = "..[url][b]http://google.com[/b][/url]..";
alert( str.match(regexp) ); // [url][b]http://google.com[/b][/url]
Expand Down
Original file line number Diff line number Diff line change
@@ -1,17 +1,17 @@
The solution: `pattern:/"(\\.|[^"\\])*"/g`.

Step by step:

- First we look for an opening quote `pattern:"`
- Then if we have a backslash `pattern:\\` (we have to double it in the pattern because it is a special character), then any character is fine after it (a dot).
- Otherwise we take any character except a quote (that would mean the end of the string) and a backslash (to prevent lonely backslashes, the backslash is only used with some other symbol after it): `pattern:[^"\\]`
- ...And so on till the closing quote.

In action:

```js run
let regexp = /"(\\.|[^"\\])*"/g;
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

alert( str.match(regexp) ); // "test me","Say \"Hello\"!","\\ \""
```
Original file line number Diff line number Diff line change
@@ -1,32 +1,32 @@
# Find quoted strings

Create a regexp to find strings in double quotes `subject:"..."`.

The strings should support escaping, the same way as JavaScript strings do. For instance, quotes can be inserted as `subject:\"` a newline as `subject:\n`, and the slash itself as `subject:\\`.

```js
let str = "Just like \"here\".";
```

Please note, in particular, that an escaped quote `subject:\"` does not end a string.

So we should search from one quote to the other ignoring escaped quotes on the way.

That's the essential part of the task, otherwise it would be trivial.

Examples of strings to match:
```js
.. *!*"test me"*/!* ..
.. *!*"Say \"Hello\"!"*/!* ... (escaped quotes inside)
.. *!*"\\"*/!* .. (double slash inside)
.. *!*"\\ \""*/!* .. (double slash and an escaped quote inside)
```

In JavaScript we need to double the slashes to pass them right into the string, like this:

```js run
let str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';

// the in-memory string
alert(str); // .. "test me" .. "Say \"Hello\"!" .. "\\ \"" ..
```
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@

The pattern start is obvious: `pattern:<style`.

...But then we can't simply write `pattern:<style.*?>`, because `match:<styler>` would match it.

We need either a space after `match:<style` and then optionally something else or the ending `match:>`.

In the regexp language: `pattern:<style(>|\s.*?>)`.

In action:

```js run
let regexp = /<style(>|\s.*?>)/g;
Original file line number Diff line number Diff line change
@@ -1,13 +1,13 @@
# Find the full tag

Write a regexp to find the tag `<style...>`. It should match the full tag: it may have no attributes `<style>` or have several of them `<style type="..." id="...">`.

...But the regexp should not match `<styler>`!

For instance:

```js
let regexp = /your regexp/g;

alert( '<style> <styler> <style test="...">'.match(regexp) ); // <style>, <style test="...">
```
60 changes: 30 additions & 30 deletions 9-regular-expressions/13-regexp-alternation/article.md
Original file line number Diff line number Diff line change
@@ -1,67 +1,67 @@
# Alternation (OR) |

Alternation is the term in regular expression that is actually a simple "OR".

In a regular expression it is denoted with a vertical line character `pattern:|`.

For instance, we need to find programming languages: HTML, PHP, Java or JavaScript.

The corresponding regexp: `pattern:html|php|java(script)?`.

A usage example:

```js run
let regexp = /html|php|css|java(script)?/gi;

let str = "First HTML appeared, then CSS, then JavaScript";

alert( str.match(regexp) ); // 'HTML', 'CSS', 'JavaScript'
```

We already saw a similar thing -- square brackets. They allow to choose between multiple characters, for instance `pattern:gr[ae]y` matches `match:gray` or `match:grey`.

Square brackets allow only characters or character classes. Alternation allows any expressions. A regexp `pattern:A|B|C` means one of expressions `A`, `B` or `C`.

For instance:

- `pattern:gr(a|e)y` means exactly the same as `pattern:gr[ae]y`.
- `pattern:gra|ey` means `match:gra` or `match:ey`.

To apply alternation to a chosen part of the pattern, we can enclose it in parentheses:
- `pattern:I love HTML|CSS` matches `match:I love HTML` or `match:CSS`.
- `pattern:I love (HTML|CSS)` matches `match:I love HTML` or `match:I love CSS`.

## Example: regexp for time

In previous articles there was a task to build a regexp for searching time in the form `hh:mm`, for instance `12:00`. But a simple `pattern:\d\d:\d\d` is too vague. It accepts `25:99` as the time (as 99 minutes match the pattern, but that time is invalid).

How can we make a better pattern?

We can use more careful matching. First, the hours:

- If the first digit is `0` or `1`, then the next digit can be any: `pattern:[01]\d`.
- Otherwise, if the first digit is `2`, then the next must be `pattern:[0-3]`.
- (no other first digit is allowed)

We can write both variants in a regexp using alternation: `pattern:[01]\d|2[0-3]`.

Next, minutes must be from `00` to `59`. In the regular expression language that can be written as `pattern:[0-5]\d`: the first digit `0-5`, and then any digit.

If we glue hours and minutes together, we get the pattern: `pattern:[01]\d|2[0-3]:[0-5]\d`.

We're almost done, but there's a problem. The alternation `pattern:|` now happens to be between `pattern:[01]\d` and `pattern:2[0-3]:[0-5]\d`.

That is: minutes are added to the second alternation variant, here's a clear picture:

```
[01]\d | 2[0-3]:[0-5]\d
```

That pattern looks for `pattern:[01]\d` or `pattern:2[0-3]:[0-5]\d`.

But that's wrong, the alternation should only be used in the "hours" part of the regular expression, to allow `pattern:[01]\d` OR `pattern:2[0-3]`. Let's correct that by enclosing "hours" into parentheses: `pattern:([01]\d|2[0-3]):[0-5]\d`.

The final solution:

```js run
let regexp = /([01]\d|2[0-3]):[0-5]\d/g;
Expand Down
Toggle all file notes Toggle all file annotations