Regular expressions used in patterns that operate on variables

Regular expressions (pattern matching) use special characters to describe a pattern based on features that strings have in common (e.g. two characters at the beginning of a line, English words beginning with "c", three-digit numbers).

 

Using patterns lets you cover many different strings with a single specification when searching for or converting text.

 

▼<Example> The special characters for regular expressions can be selected with the "Symbol" button in the Pattern Editor.
[Image: 1.png]

 

■Basic special characters

Below are descriptions of basic special characters used in regular expressions, and some examples of matching characters (single characters).

| Special character | Description | Examples of matching characters (single characters) |
| --- | --- | --- |
| . (dot) | Any character except the newline character (\n). | a, 1, /, ?, . |
| \d | Any decimal digit. | 0, 1, 2, … 9 |
| \D | Any non-digit character. | a, /, ?, . |
| \s | Any whitespace character. | Space, tab, carriage return |
| \S | Any non-whitespace character. | a, 1, /, ?, . |
| \w | Any English alphanumeric character or underscore. | a, … z, A, … Z, 0, … 9, _ |
| \W | Any character other than English alphanumeric characters and the underscore. | /, ?, . |
| \n | Newline character. | Newline |
| \r | Carriage return character. | Carriage return |
| \t | Tab character. | Tab |
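
The basic special characters above can be tried out with a short script. This is only a sketch: it uses Python's `re` module as a stand-in for the Pattern Editor (an assumption, since the regular-expression engine of the product may differ in details), and the names `SAMPLES` and `matches_single` are made up for this illustration.

```python
import re

# Pattern -> single characters that the pattern is expected to match.
# re.ASCII limits \d, \w and \s to the ASCII ranges described in the table.
SAMPLES = {
    r".":  ["a", "1", "/", "?", "."],
    r"\d": ["0", "5", "9"],
    r"\D": ["a", "/", "?", "."],
    r"\s": [" ", "\t", "\r"],
    r"\S": ["a", "1", "/", "?", "."],
    r"\w": ["a", "Z", "0", "_"],
    r"\W": ["/", "?", "."],
}


def matches_single(pattern: str, char: str) -> bool:
    """Return True if the whole one-character string matches the pattern."""
    return re.fullmatch(pattern, char, flags=re.ASCII) is not None


for pattern, chars in SAMPLES.items():
    for char in chars:
        print(f"{pattern!r} matches {char!r}: {matches_single(pattern, char)}")
```

In this sketch, "." does not match "\n", and "\W" does not match the underscore, because the underscore belongs to "\w".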

 

You can use special characters such as "." or "\" as normal characters by adding a backslash "\" before them.

(Example)

・If you want to use "." as a period → \.

・If you want to use "\" as a backslash → \\
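
The effect of the backslash can be checked with a few lines of Python. This is a sketch under the assumption that Python's `re` module behaves like the Pattern Editor for these basic escapes; `full` is a hypothetical helper name.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# An unescaped dot is a special character: it matches any single character.
print(full(r"a.b", "axb"))     # True
# An escaped dot is a normal character: it matches only a period.
print(full(r"a\.b", "axb"))    # False
print(full(r"a\.b", "a.b"))    # True
# An escaped backslash matches exactly one literal backslash.
# ("a\\b" is the Python spelling of the three characters a, \, b.)
print(full(r"a\\b", "a\\b"))   # True
```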



■Looping special characters

Below are special characters that match repetitions.

| Special character | Description |
| --- | --- |
| * | Loops the immediately preceding character zero or more times. |
| + | Loops the immediately preceding character one or more times. |
| {n} | Loops the immediately preceding character exactly n times. |
| {n,} | Loops the immediately preceding character n or more times. |
| {m,n} | Loops the immediately preceding character between m and n times. |
| ? | Loops the immediately preceding character zero times or once. |

 

You can specify various patterns by mixing special characters.

You should in particular remember ".*", a frequently used pattern that combines the basic special character "." with the looping one "*".

| Pattern | Description | Examples of matching strings of characters |
| --- | --- | --- |
| .* | Arbitrary text | Design Studio, 1213, "" (empty text) |
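
To make the looping special characters concrete, the sketch below tests each of them against strings that should and should not match. It assumes Python's `re` module as a stand-in for the Pattern Editor; the patterns built around "ab" and the names `CASES` and `full` are invented for this illustration.

```python
import re


def full(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None


# (pattern, texts that should match, texts that should not match)
CASES = [
    (r"ab*",     ["a", "ab", "abbb"], ["b"]),
    (r"ab+",     ["ab", "abbb"],      ["a"]),
    (r"ab{2}",   ["abb"],             ["ab", "abbb"]),
    (r"ab{2,}",  ["abb", "abbbb"],    ["ab"]),
    (r"ab{1,2}", ["ab", "abb"],       ["a", "abbb"]),
    (r"ab?",     ["a", "ab"],         ["abb"]),
    # ".*" matches arbitrary text, including empty text, but "." stops at a newline.
    (r".*",      ["Design Studio", "1213", ""], ["line1\nline2"]),
]

for pattern, should_match, should_not_match in CASES:
    for text in should_match:
        print(f"{pattern!r} matches {text!r}: {full(pattern, text)}")
    for text in should_not_match:
        print(f"{pattern!r} matches {text!r}: {full(pattern, text)}")
```

Each looping character applies only to the single character directly before it, so in "ab*" only "b" is repeated.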




■Sub-pattern

Parentheses () group part of a pattern into a sub-pattern, so you can build a pattern out of sets of sub-patterns.

| Regular Expression | Description |
| --- | --- |
| (abc)* | Arbitrary text made of zero or more occurrences of "abc": "" (empty text), abc, abcabc, abcabcabc, etc. |
| (.*)(.*) | If the matching text is "abc", the first sub-pattern matches "abc" and the second one matches "" (empty text). |
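
The two sub-pattern examples can be reproduced as follows. This is a sketch that assumes Python's `re` module as a stand-in for the Pattern Editor; the variable names are hypothetical.

```python
import re

# (abc)* repeats the whole sub-pattern "abc", not just the last character.
repeated = re.compile(r"(abc)*")
results = {text: repeated.fullmatch(text) is not None
           for text in ["", "abc", "abcabc", "abcab"]}
for text, ok in results.items():
    print(f"{text!r}: {ok}")

# With two sub-patterns, each one reports the part of the text it matched.
match = re.fullmatch(r"(.*)(.*)", "abc")
first, second = match.groups()
print(repr(first), repr(second))   # 'abc' ''
```

The first sub-pattern takes everything because ".*" is processed as the longest match, leaving only empty text for the second one.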




■Shortest match (special action)

"Shortest match" is a notation that reuses the "?" symbol (which on its own means "no looping or looping the character immediately before once"). Placed directly after a looping special character, "?" makes that loop match the shortest possible string of characters instead of the longest.

If you write ".*", the regular expression is by default processed as the longest match.

For example, if you only want to match the first p tag (<p>) of the string of characters <p>abc</p> and you use <.*>, the ".*" part matches the longest string of characters between a "<" and a ">", namely "p>abc</p". Using <.*> therefore makes the whole string of characters match.

To avoid this, add the question mark "?" (shortest match) directly after ".*", that is, before ">", as below.

<.*?>

With the above regular expression, the shortest string of characters between a "<" and a ">" will match, yielding <p>.
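
The difference between the two patterns can be observed directly. The sketch below assumes Python's `re` module as a stand-in for the Pattern Editor and uses `re.search`, which looks for the first match inside the text; the variable names are hypothetical.

```python
import re

text = "<p>abc</p>"

# Longest match: ".*" runs to the last ">" in the text.
longest = re.search(r"<.*>", text).group()
# Shortest match: ".*?" stops at the first ">" it can reach.
shortest = re.search(r"<.*?>", text).group()

print(longest)    # <p>abc</p>
print(shortest)   # <p>

# Collecting every shortest match yields each tag separately.
tags = re.findall(r"<.*?>", text)
print(tags)       # ['<p>', '</p>']
```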



Below are examples of regular expressions using shortest match symbols.

| Pattern | Description |
| --- | --- |
| *? | Matches zero or more loops of the preceding sub-pattern (so it can also match empty text). Shortcut for {0,}?. Tries to match as few loops as possible. |
| +? | Matches one or more loops of the preceding sub-pattern. Shortcut for {1,}?. Tries to match as few loops as possible. |
| {m,n}? | Matches between m and n loops of the preceding sub-pattern. Tries to match as few loops as possible. |
| {m,}? | Matches m or more loops of the preceding sub-pattern. Tries to match as few loops as possible. |
| (.*?)(.*) | If the matching text is "abc", the first sub-pattern (.*?) matches "" (empty text) because it tries to match as few characters as possible (zero or more), while the second sub-pattern matches "abc". |
| (.+?)(.*) | If the matching text is "abc", the first sub-pattern (.+?) matches "a" because it tries to match as few characters as possible (one or more), while the second sub-pattern matches "bc". |
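
The rows above can be checked with a small script. It is a sketch that assumes Python's `re` module as a stand-in for the Pattern Editor; the sample text "12345" and all variable names are invented for this illustration.

```python
import re

# How "abc" is split between two sub-patterns when the whole text must match.
splits = {
    pattern: re.fullmatch(pattern, "abc").groups()
    for pattern in (r"(.*)(.*)", r"(.*?)(.*)", r"(.+?)(.*)")
}
for pattern, groups in splits.items():
    print(pattern, groups)

# When searching inside a longer text, a shortest-match loop stops
# as soon as its minimum number of loops is reached.
digits = "12345"
greedy_range = re.search(r"\d{2,4}", digits).group()    # longest: "1234"
lazy_range = re.search(r"\d{2,4}?", digits).group()     # shortest: "12"
lazy_at_least = re.search(r"\d{2,}?", digits).group()   # shortest: "12"
lazy_plus = re.search(r"\d+?", digits).group()          # shortest: "1"
lazy_star = re.search(r"\d*?", digits).group()          # shortest: empty text
print(greedy_range, lazy_range, lazy_at_least, lazy_plus, repr(lazy_star))
```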



■Set

If you want to match one character out of several candidates, you can describe this in a pattern by using brackets [] to define a set of characters.

| Pattern | Description | Examples of matching characters |
| --- | --- | --- |
| [abc] | Matches one character among a, b and c. | a, b, c |
| [^abc] | Matches one arbitrary character other than a, b and c. | d, 1, /, ?, . |
| [a-z] | Matches one arbitrary lowercase character of the English alphabet. | a, b, c, … z |
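
Sets can be explored by filtering a list of candidate characters. This is a sketch that assumes Python's `re` module as a stand-in for the Pattern Editor; `CANDIDATES` and `members` are hypothetical names.

```python
import re

CANDIDATES = ["a", "b", "c", "d", "q", "Q", "1", "/", "?", "."]


def members(pattern: str) -> list[str]:
    """Return the candidate characters that the set pattern matches."""
    return [ch for ch in CANDIDATES if re.fullmatch(pattern, ch)]


in_set = members(r"[abc]")        # one of a, b, c
not_in_set = members(r"[^abc]")   # anything except a, b, c
lowercase = members(r"[a-z]")     # any lowercase English letter

print(in_set)       # ['a', 'b', 'c']
print(not_in_set)   # ['d', 'q', 'Q', '1', '/', '?', '.']
print(lowercase)    # ['a', 'b', 'c', 'd', 'q']
```

A set always matches exactly one character; combine it with a looping special character to match longer text.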




Below are examples of practical regular expressions.

| Pattern | Description | Examples of matching strings of characters |
| --- | --- | --- |
| .an | 3-character text that ends with "an". | can, man |
| [a-dkx-z] | Matches one character among: a to d, k, and x to z. | a, b, c, d, k, x, y, z |
| \w*\d | If the matching string of characters is "abc1abc1", \w* matches zero or more loops of arbitrary English alphanumeric characters (here, "abc1abc"), and \d matches an arbitrary decimal digit (here, the final "1"). | abc1abc1, ab2, 0 |
| \w*?\d | If "abc1abc1" is among the searched strings of characters, the "abc1" part matches: it is the shortest string of characters that starts with zero or more English alphanumeric characters and ends with a digit. | abc1, ab2, 0 |
| \d\d\s\d\d | 5-character text starting with 2 digits, followed by a space and ending with 2 digits. | 01 23, 72 13 |
| a\.b\\c | "." and "\" function as normal characters because they are directly preceded by a backslash "\". | a.b\c |
| (\d\d){1,2} | Matches either 2 or 4 digits. | 12, 0357 |
| (good)?bye | Matches a string of characters beginning with zero or one "good" and ending with "bye". | goodbye, bye |
| ([bn]a){3,3} | Matches a string of characters that repeats 3 times a 2-character text starting with either "b" or "n" and ending with "a". | banana, babana, nanana |
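
The practical patterns and their example strings can be verified in bulk. The sketch below assumes Python's `re` module (with the `re.ASCII` flag so that \w and \d cover only English alphanumeric characters) as a stand-in for the Pattern Editor; `EXAMPLES` and `full` are hypothetical names.

```python
import re

# Pattern -> example strings that should match as a whole.
EXAMPLES = {
    r".an":          ["can", "man"],
    r"[a-dkx-z]":    ["a", "b", "c", "d", "k", "x", "y", "z"],
    r"\w*\d":        ["abc1abc1", "ab2", "0"],
    r"\w*?\d":       ["abc1", "ab2", "0"],
    r"\d\d\s\d\d":   ["01 23", "72 13"],
    r"a\.b\\c":      ["a.b\\c"],          # the text a.b\c
    r"(\d\d){1,2}":  ["12", "0357"],
    r"(good)?bye":   ["goodbye", "bye"],
    r"([bn]a){3,3}": ["banana", "babana", "nanana"],
}


def full(pattern: str, text: str) -> bool:
    """Return True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text, flags=re.ASCII) is not None


for pattern, texts in EXAMPLES.items():
    for text in texts:
        print(f"{pattern!r} matches {text!r}: {full(pattern, text)}")

# Searching inside "abc1abc1" shows the longest versus the shortest match.
longest = re.search(r"\w*\d", "abc1abc1", flags=re.ASCII).group()
shortest = re.search(r"\w*?\d", "abc1abc1", flags=re.ASCII).group()
print(longest, shortest)   # abc1abc1 abc1
```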




For how to write patterns using more practical regular expressions, please refer to "How to extract characters by pattern in DS?".





Notes

・How newlines are specified with \n and \r differs between DS, DA and Chromium.

 DS: "\r\n", "\r" and "\n" can be used
 DA: "\r" and "\n" can be used ※ "\r\n" can also be used, but it creates 2 new lines
 Chromium: "\r\n" and "\r" can be used
