Topic Wizard – Visual RegExp Editor
The interactive Topic Wizard is a visual tool to create and edit regular expressions. It helps define search queries by combining a dialog for prefix and suffix extension with a word tree of phrases. By removing irrelevant phrases, users can restrict the query and increase the relevance of results.
A Visual Tool to Specify Topics
Effective topic management goes beyond simple keyword searches. To properly describe abstract concepts such as innovation, or to disambiguate brand names such as Amazon and Apple, one needs to consider synonyms, singular and plural versions of a term, grammatical variations, lists of related products, etc.
Regular expressions are a useful way to describe such multiple appearances formally. Defining them by hand can be tedious, however, because the author has to work through a wide range of possible combinations. The result will most likely be incomplete, since it is almost impossible to think of every combination. To overcome this problem, the Topic Wizard suggests a list of prefixes and suffixes, as well as frequent phrases preceding or following the term, grouped together in a word tree. Please note that adding prefixes and suffixes extends the result set, while switching to phrase mode limits the result set.
How to Access and Use the Topic Wizard
The Topic Wizard is available from each line defined in the Topic Editor, which is accessible via the drop-down menu in the third column of the advanced search dialog. As shown in the figure above, its user interface consists of a mode selector, the central term list, visual tree structures to the left and right, and additional options in the footer menu:
Main Interface Components
- Mode Selector. Located in the lower part of the window, it lets you switch between suffix and prefix modes, or activate the word tree representation to restrict the query to specific phrases.
- Term List. Depending on the selected mode, the area in the center of the window shows either a scrollable list of the base term plus identified prefixes or suffixes, or the currently selected pre-/suffix combinations.
- Phrase Tree. Left and right of the center area, a tree structure presents the identified phrases surrounding the term(s).
- Additional Features. The footer provides (i) sorting criteria to list query results alphabetically or by frequency; (ii) stop word filtering in the phrase restriction mode; and (iii) a counter in the lower right corner to indicate the number of matches based on the selected data source(s) and time interval.
Interactive Features
- Clicking on a term adds the highlighted prefix or suffix.
- Hovering over a prefix term will display all its suffixes, and vice versa.
- Hovering over a phrase will display a list of base terms preceding or following this phrase; depending on the local tree structure, an icon lets you expand or collapse a specific subtree.
- Clicking on a subtree restricts the query to the selected phrase.
Supported Regular Expression Format
To use the Topic Wizard for a phrase or regular expression (RegExp), the expression needs to follow a specific syntax. The most straightforward way is to start with a single word such as “sustain”, and use the Topic Wizard to add relevant prefixes and suffixes, or to restrict the query to match only certain phrases.
Adding the suffix -able and the phrase parts development, growth, agriculture and energy policy, for example, would lead to the following regular expression:
sustain(able)? (development|growth|agriculture|energy policy)
This expression is a compact way to represent the following phrases:
sustainable development
sustainable growth
sustainable agriculture
sustainable energy policy
sustain development
sustain growth
sustain agriculture
sustain energy policy
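The expansion listed above can be checked with a few lines of Python. This is only an illustrative sketch using Python's standard re module; it says nothing about how the Topic Wizard evaluates expressions internally, and the helper name matches_fully is made up for this example.

```python
import re

# Pattern copied verbatim from the example above.
PATTERN = re.compile(
    r"sustain(able)? (development|growth|agriculture|energy policy)"
)

# The eight phrases the expression is meant to represent.
PHRASES = [
    "sustainable development",
    "sustainable growth",
    "sustainable agriculture",
    "sustainable energy policy",
    "sustain development",
    "sustain growth",
    "sustain agriculture",
    "sustain energy policy",
]


def matches_fully(phrase: str) -> bool:
    """Return True if the whole phrase is covered by the pattern."""
    return PATTERN.fullmatch(phrase) is not None


# True only if every listed phrase is matched in full.
all_covered = all(matches_fully(p) for p in PHRASES)
print(all_covered)
```

Phrases outside the selected word tree branches, such as "sustainable tourism", are not matched, which is how phrase mode narrows the result set.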
The most complex regular expressions supported by the wizard (i.e., those that can be re-opened and edited) correspond to the following syntax:
(prefix phrase1|prefix phrase2|…) (prefix1|…)?baseterm(suffix1|…)? (suffix phrase1|…)
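As a rough illustration of this syntax, the following Python sketch assembles an expression from its parts. The function build_topic_regex and its parameter names are hypothetical and not part of the Topic Wizard. The sketch assumes that every part is a plain word or phrase without regular expression metacharacters, and that an empty group is simply left out.

```python
import re


def build_topic_regex(base, prefixes=(), suffixes=(), before=(), after=()):
    """Assemble an expression in the shape described above.

    Hypothetical helper for illustration only. All parts are assumed to be
    plain words or phrases without regex metacharacters.
    """

    def group(items):
        # Join alternatives into a single parenthesized group.
        return "(" + "|".join(items) + ")"

    pattern = ""
    if before:                      # phrases preceding the term
        pattern += group(before) + " "
    if prefixes:                    # optional prefixes attached to the term
        pattern += group(prefixes) + "?"
    pattern += base
    if suffixes:                    # optional suffixes attached to the term
        pattern += group(suffixes) + "?"
    if after:                       # phrases following the term
        pattern += " " + group(after)
    return pattern


# Rebuild the "sustain" example from its parts.
example = build_topic_regex(
    "sustain",
    suffixes=["able"],
    after=["development", "growth", "agriculture", "energy policy"],
)
print(example)

# A prefix and a suffix without any phrase restriction.
with_prefix = build_topic_regex("sustain", prefixes=["un"], suffixes=["able"])
print(re.fullmatch(with_prefix, "unsustainable") is not None)
```

Keeping the prefix and suffix groups optional mirrors the note that they extend the result set, while the phrase groups are required and therefore restrict it.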