Many elements/tags appear in wikiextractor's output, such as poem, q, ins, del, br, section, onlyinclude, includeonly and math. Mathematical equations (with commands such as \mathbf) also appear without being enclosed in any tags.
Download this dump: https://dumps.wikimedia.org/enwiki/20221020/enwiki-20221020-pages-articles1.xml-p1p41242.bz2
Invoke the following command to list lines that contain the opening tags of these elements:
wikiextractor --no-templates --html-safe '' -o - dumps.wikimedia.org/enwiki/20221020/enwiki-20221020-pages-articles1.xml-p1p41242.bz2 | grep '<\(poem\|q\|section\|ins\|del\|math\|onlyinclude\|br\|chem\)\b'
Examples from the output:
(Not all of the tags appear in this particular dump.)
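To see which of the tags actually occur in a given dump, and how often, the grep pattern above can be turned into a per-tag count. The following is only a minimal sketch: the function name `count_leftover_tags` and the script name used in the usage note are hypothetical, and it assumes the extracted text arrives on standard input with tags unescaped, as in the command above.

```python
import re
import sys
from collections import Counter

# Same tag names as in the grep pattern above.
TAGS = ("poem", "q", "section", "ins", "del", "math", "onlyinclude", "br", "chem")

# An opening angle bracket, one of the tag names, then a word boundary,
# so "<q>" matches but "<quote>" does not. Closing tags ("</q>") are skipped.
OPENING_TAG = re.compile(r"<(" + "|".join(TAGS) + r")\b")


def count_leftover_tags(lines):
    """Count occurrences of each opening tag across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        counts.update(match.group(1) for match in OPENING_TAG.finditer(line))
    return counts


if __name__ == "__main__":
    # Print one "count<TAB>tag" line per tag, most frequent first.
    for tag, n in count_leftover_tags(sys.stdin).most_common():
        print(f"{n}\t{tag}")
```

Assuming the sketch is saved as count_tags.py (a hypothetical name), it can replace the grep stage of the pipeline: pipe the wikiextractor output from the command above into `python count_tags.py`.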