Added "MongoDB Query" syntax #2502

airs0urce · 2020-08-05T09:04:40Z

No description provided.


RunDevelopment

Thank you for making this pull request @airs0urce.

I gave it a quick review but I mainly want to ask this: Is this a real language?
From what I see, this is just a subset of JS with some additional highlighting for special MongoDB properties. Could you please explain the use case for this?

RunDevelopment · 2020-08-06T20:13:25Z

components/prism-mongodb-query.js

+		return keyword.replace('$', '\\$');
+	});
+
+	var keywordsRegex = '(?:' + keywords.join('(?:\\b|:)|') + ')\\b';


This will generate a string value like this:

(?:\$foo(?:\b|:)|...|\$bar(?:\b|:)|\$baz)\b

Assuming that it's a bug that the last keyword doesn't get the (?:\b|:) suffix, we can factor out the common pre- and suffixes like this:

\$(?:foo|...|bar|baz)(?:\b|:)\b

Let's talk about the (?:\b|:)\b suffix. It's equivalent to (?:\b|:\b), where the problem is easier to see: the :\b alternative is never needed, because whenever it could match, the \b alternative already accepts at the same position. If we have a string "$foo:", then the \b alternative accepts right after "$foo".
So we can simplify the whole pattern even further:

\$(?:foo|...|bar|baz)\b

But we know that the \b assertion will always accept because of the way this pattern is used: it's inserted into the keyword regex, where what follows is always ["']?$. The keyword ends with a word character and is followed by either a quote or the end of the string, so \b always accepts and we can just remove it.

\$(?:foo|...|bar|baz)

^ This is the string we want to generate.
So you can remove the $ prefix from all strings in the keywords array and remove the mapping that adds a \ character, and this line becomes:

Suggested change

var keywordsRegex = '(?:' + keywords.join('(?:\\b|:)|') + ')\\b';

var keywordsRegex = '\\$(?:' + keywords.join('|') + ')';
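The suggested simplification can be checked with a small standalone sketch. It builds the keyword pattern the suggested way and wraps it the way the review describes, i.e. followed by an optional matching quote and the end of the string. The short keyword list and the exact `^(["']?)` / `\1$` wrapper are assumptions for illustration, not the PR's actual code.

```javascript
// Hypothetical, shortened keyword list (stored without the "$" prefix).
var keywords = ['gt', 'gte', 'lt', 'lte', 'in', 'set'];

// Suggested construction: one escaped "$", then a plain alternation.
var keywordsRegex = '\\$(?:' + keywords.join('|') + ')';

// Assumed usage: the keyword must be the whole (optionally quoted) property name,
// so the trailing \1$ does the job the removed \b used to do.
var keywordRe = RegExp('^(["\']?)' + keywordsRegex + '\\1$');

console.log(keywordsRegex);           // \$(?:gt|gte|lt|lte|in|set)
console.log(keywordRe.test('$gte'));  // true ("gt" is tried first, then it backtracks to "gte")
console.log(keywordRe.test('"$lt"')); // true
console.log(keywordRe.test('$gtx'));  // false
```

Because the end-of-string anchor follows the alternation, the order of `gt` and `gte` in the list does not matter here.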

RunDevelopment · 2020-08-06T20:18:31Z

components/prism-mongodb-query.js

+				},
+				entity: {
+					// ipv4
+					pattern: /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/,  


Shouldn't this only match if the whole string content is an IP address, instead of any substring? E.g. "foo bar 0.0.0.0 baz".

Same for url.
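A minimal sketch of the difference, assuming the IP pattern is applied to a whole string token including its quotes (as it would be inside a string's `inside` grammar). The variable names are hypothetical; the octet pattern is the one from the PR.

```javascript
// IPv4 pattern from the PR, split into a reusable octet.
var octet = '(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)';
var ipv4 = '(?:' + octet + '\\.){3}' + octet;

// As in the PR: matches an IP address anywhere inside the string.
var ipAnywhere = RegExp('\\b' + ipv4 + '\\b');

// Alternative: the whole string literal (quotes included) must be an IP address.
var ipWholeString = RegExp('^(["\'])' + ipv4 + '\\1$');

console.log(ipAnywhere.test('"foo bar 0.0.0.0 baz"'));    // true
console.log(ipWholeString.test('"foo bar 0.0.0.0 baz"')); // false
console.log(ipWholeString.test('"192.168.0.1"'));         // true
```

The same anchoring idea would apply to the url pattern.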

airs0urce · 2020-08-07T04:16:34Z

@RunDevelopment

> Thank you for making this pull request @airs0urce.
>
> I gave it a quick review but I mainly want to ask this: Is this a real language?
> From what I see, this is just a subset of JS with some additional highlighting for special MongoDB properties. Could you please explain the use case for this?

Thank you for the quick review.
Yes, it is actually more like a subset of JavaScript.

Basically, MongoDB has a query language that you use to fetch data from the database. It's similar to SQL in purpose, but the syntax is a mix of JSON and JavaScript: you can define only one JS object, and you can use only the limited set of functions supported by MongoDB.

This type of highlighting is implemented in many Mongo clients, for example:

MongoDB Compass (the official MongoDB GUI client)
"NoSQL booster for mongoDB"
Robo 3T

and many others.
After checking, I see that they highlight slightly differently; there seems to be no standard for this.

I'm working on a client app for MongoDB; here is the code: https://github.com/airs0urce/punkmongo
This is a screenshot of the interface, with a demo query pasted into the "filter" area:

I need this new syntax to highlight the query the user types, and I also use it to highlight the query results:

I'm not sure whether this syntax should go into the main Prism.js repo, but I decided to send a pull request anyway. I wasn't able to find any library that can highlight Mongo queries, except this one: https://github.com/mongodb-js/ace-mode

But it uses a web worker for syntax checking and doesn't look lightweight enough to me, especially because I plan to use it for highlighting query results too, and there may be 1000 records shown. I wanted some simple highlighting, which is why I implemented it with Prism.js.

--

Btw, I now think it makes more sense to call it "Mongo Filter", because there are also aggregation queries, which are queries too and can use different keywords.
So it's better to call this syntax "Mongo Filter", and I plan to create another syntax named "Mongo Aggregation Stage".

airs0urce · 2020-08-07T04:25:41Z

Here is also an example of a query made with the official Mongo Shell (https://docs.mongodb.com/manual/mongo/) and its results:

To illustrate, here I tried to use "RandomFunction" in the Mongo shell and got an error:

My syntax highlights only valid functions and keywords like $set, $unset, $gt, etc.

airs0urce · 2020-08-07T04:45:15Z

Here is the list of syntaxes I came up with after thinking about how to define them more correctly:

MongoDB Filter - Only filter-related keywords such as $gt and $lt should be available here, but not update-related keywords such as $set and $unset.
MongoDB Update - To update records in the database, you run an update query: a MongoDB Filter selects the records to update, and the "MongoDB Update" part describes exactly how to update them. Filter keywords such as $gt and $lt should not be available here, but update-related keywords such as $set, $unset, and $pull should be.
MongoDB Document - One record in the results MongoDB returns (like a row in SQL databases). Keywords such as $set, $gt, and $lt should not be available here.
MongoDB Aggregation Stage - The aggregation framework lets you make more complex queries than MongoDB Filter does, so additional keywords such as $match and $group are available here.
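A minimal sketch of how the four proposed syntaxes would differ, using only the keywords named in the list above. The language ids and the `isKeywordIn` helper are hypothetical, and the real keyword sets would be much larger.

```javascript
// Hypothetical language ids and keyword sets (names stored without the "$" prefix).
// Only keywords mentioned in the list above are included.
var contextKeywords = {
  'mongodb-filter': ['gt', 'lt'],
  'mongodb-update': ['set', 'unset', 'pull'],
  'mongodb-document': [],
  'mongodb-aggregation-stage': ['match', 'group']
};

// Returns true if `name` (e.g. "$gt") is a keyword in the given syntax.
function isKeywordIn(context, name) {
  if (!Object.prototype.hasOwnProperty.call(contextKeywords, context)) {
    return false; // unknown syntax id
  }
  var list = contextKeywords[context];
  if (list.length === 0) {
    return false; // documents highlight no $-keywords
  }
  return RegExp('^\\$(?:' + list.join('|') + ')$').test(name);
}

console.log(isKeywordIn('mongodb-filter', '$gt'));   // true
console.log(isKeywordIn('mongodb-filter', '$set'));  // false
console.log(isKeywordIn('mongodb-update', '$set'));  // true
console.log(isKeywordIn('mongodb-document', '$gt')); // false
```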

Done this way, there would be 4 different syntaxes. That looks like a lot, but this is how it works in MongoDB. Anyway, let me know what you think; I could create all 4 syntaxes.

RunDevelopment · 2020-08-10T16:09:35Z

Sorry for the delay!

After your response and looking through the MongoDB doc and projects, I think it is best to implement this as one language that is a superset of JS. (One language because it's easier to use.) The additions to vanilla JS will be MongoDB-specific properties (e.g. $currentDate), properties in general, and highlighting for string URLs (and IP addresses).

The implementation could look like this:

Prism.languages.mongodb = Prism.languages.extend('javascript', {});

Prism.languages.insertBefore('mongodb', 'string', {
	'property': {
		pattern: /(?:(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1|[_$a-zA-Z\xA0-\uFFFF][$\w\xA0-\uFFFF]*)(?=\s*:)/,
		greedy: true,
		inside: {
			'keyword': RegExp('^([\'"]?)\\$(?:' + ['lt', 'gt', ...].join('|') + ')(?:\\1)$')
		}
	}
});

Prism.languages.mongodb.string.inside = {
	'url': { pattern: /.../ },
	'ip-address': { pattern: /.../, alias: 'entity' },
};
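To see what the proposed `property` pattern and its `keyword` sub-pattern match without loading Prism, the sketch below runs them directly over a sample filter. The two regexes are copied from the proposal (with a `g` flag added for scanning); the three-keyword list and the `classifyProperties` helper are hypothetical stand-ins for the elided list and for Prism's tokenizer, which a plain regex scan only approximates.

```javascript
// `property` pattern from the proposal, with the g flag added so exec() can scan.
var property = /(?:(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1|[_$a-zA-Z\xA0-\uFFFF][$\w\xA0-\uFFFF]*)(?=\s*:)/g;
// `keyword` pattern from the proposal, with a hypothetical short keyword list.
var keyword = RegExp('^([\'"]?)\\$(?:' + ['lt', 'gt', 'or'].join('|') + ')(?:\\1)$');

// Lists every property name in `code` and whether it is a MongoDB keyword.
function classifyProperties(code) {
  var result = [];
  var match;
  property.lastIndex = 0;
  while ((match = property.exec(code)) !== null) {
    result.push({ text: match[0], isKeyword: keyword.test(match[0]) });
  }
  return result;
}

var sample = "{ '_id': 1, $or: [{ \"$gt\": 5 }, { name: 'x' }] }";
classifyProperties(sample).forEach(function (p) {
  console.log(p.text + (p.isKeyword ? '  <- keyword' : ''));
});
// '_id'
// $or  <- keyword
// "$gt"  <- keyword
// name
```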

If we also wanted special highlighting for MongoDB-specific types (e.g. ObjectId), we can do it like this:

Prism.languages.insertBefore('mongodb', 'function', {
	'builtin': /\b(?:ObjectId|Code|...)\b/
});

The main advantage of implementing it like this is that JS is doing most of the work for us. We don't have to worry about comments, numbers, keywords, etc; JS handles all of this for us.

Thoughts?

airs0urce · 2020-08-10T17:10:57Z

@RunDevelopment Yes, sounds good. I'll check how extending works and prepare the changes when I have free time.
The only thing: I hope there is a way to exclude keywords like "if", "else", etc., i.e. JS-specific things that don't make sense in MongoDB syntax. I'll play with that.

About this:

> highlighting for string URLs (and IP addresses).

Actually, I added this to my Mongo client to make string contents easier to analyze; it is not directly related to any MongoDB feature. I think it would be better not to include it in a generic MongoDB syntax that may go into the Prism.js distribution.
Is there an established practice for this? Right now I'm thinking of preparing the MongoDB syntax without "url" and "ip", and later I can make a fork and add those extra "decorations".

RunDevelopment · 2020-08-10T21:12:16Z

> Only thing I hope there is way to exclude keywords like "if", "else" etc. something JS specific that doesn't make sense for mongodb syntax.

You could say that they are dead weight and make tokenization slightly slower, but removing them is probably not worth your time.

Also, can't queries include functions that in turn can contain arbitrary JS code?

> Is there any practice about this? Right now I think about preparing mongodb syntax without "url" and "ip" and later I can make a fork and add those additional "decorations".

The reason we do syntax highlighting is to improve the readability of code. As long as a feature improves readability, I'm very open to making it part of Prism. Since URLs appear fairly often in databases, I think it's a nice idea to highlight them.

airs0urce · 2020-08-11T05:36:57Z

@RunDevelopment

Sorry to bother you again, but I have been thinking more about the MongoDB syntaxes.
Since I want them to be part of Prism.js, I would like to know your opinion.

Here are my thoughts:

I see two approaches that different Mongo clients use.

1) Approach `#1`: the client lets you edit the query filter, with all other options available in the GUI as fields/checkboxes:

So the only thing to highlight here is the query filter, which is a single object that can use a set of available functions such as ObjectId:
{'_id': ObjectId('5f31857635a20728b57d6c96'),}

Official client MongoDB Compass:

The client I'm working on:

2) Approach `#2`: the client gives you a shell. Here you can actually run JavaScript, so extending the JavaScript syntax makes sense. "NoSQL Mongo Booster" goes even further: you can use libraries like moment.js, etc.

So you can write normal JS code:

db._logs_api.find({'_id': ObjectId('5f31857635a20728b57d6c96')})

Official MongoDB Shell

NoSQL Mongo Booster app

There is no right answer to the question "Which approach is right?", since even the two official Mongo clients use different approaches.
So both approaches are valid. Based on this, I believe the MongoDB syntaxes should be organized like this:

4 syntaxes for the basic parts (described in my comment above):

MongoDB Filter
MongoDB Update
MongoDB Document
MongoDB Aggregation Stage

I plan to use all of them in different places in my project. In the query filter I'll use the "MongoDB Filter" syntax, so properties like $match and $group (from "MongoDB Aggregation Stage") will be ignored. For displaying results I'll use "MongoDB Document", so redundant properties like $set and $unset from "MongoDB Update" will be ignored too. This will speed up parsing when many results are shown, as I plan to do in my client.

So each part will highlight only what is needed. This is better for UX, because users get better feedback on what they wrote, and at the same time parsing gets faster.

Later, if somebody wants to implement a syntax for approach #2, they can reuse all 4 parts I created. Maybe I will even create it myself.
They could call that syntax, for example:

MongoDB Shell

"MongoDB Shell" will extend the JavaScript syntax and also depend on those 4. Something like this would go into "components.json":

"mongodb-shell": {
    "title": "MongoDB Shell",
    "require": ["javascript", "mongodb-filter", "mongodb-update", "mongodb-document", "mongodb-aggregation-stage"],
    "owner": "<username>"
},
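A JSON object should not repeat a key: `JSON.parse` keeps only the last occurrence, so an entry that lists `require` twice silently loses one of the values, and all dependencies need to go into a single `require` array. A quick check with a trimmed, hypothetical entry:

```javascript
// Trimmed, hypothetical entry with "require" given twice.
var withDuplicateKey = '{"title": "MongoDB Shell", "require": ["mongodb-filter", "mongodb-update"], "require": "javascript"}';
var parsed = JSON.parse(withDuplicateKey);
console.log(parsed.require); // javascript (the MongoDB dependencies are gone)

// A single "require" array keeps every dependency.
var merged = JSON.parse('{"title": "MongoDB Shell", "require": ["javascript", "mongodb-filter", "mongodb-update"]}');
console.log(merged.require.length); // 3
```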

The new "MongoDB Shell" syntax would behave like normal JavaScript but detect the parts that must be highlighted in a Mongo-specific way, for example:

MongoDB Filter

.find(<MongoDB_Filter-syntax>)

MongoDB Update

.updateOne(<...>, <MongoDB_Update-syntax>)
.updateMany(<...>, <MongoDB_Update-syntax>)

MongoDB Document. Any object will be highlighted:

{<MongoDB_Document-syntax>}

MongoDB Aggregation Stage

.aggregate([
    <MongoDB_Aggregation_Stage-syntax>, 
    <MongoDB_Aggregation_Stage-syntax>,  
    ...
])
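The dispatch described above can be sketched as a lookup from method name and argument position to sub-syntax. This is hypothetical: the language ids follow the proposed names, the first `updateOne`/`updateMany` argument is assumed to be the filter (as described for MongoDB Update earlier), and actually locating the arguments in the code (bracket matching, nesting) is left out.

```javascript
// Hypothetical mapping for the proposed "MongoDB Shell": which sub-syntax
// should highlight the N-th argument of a collection method call.
var argumentSyntax = {
  find: ['mongodb-filter'],
  updateOne: ['mongodb-filter', 'mongodb-update'],
  updateMany: ['mongodb-filter', 'mongodb-update']
};

function syntaxFor(method, argIndex) {
  if (method === 'aggregate') {
    // Every element of the pipeline array is an aggregation stage.
    return 'mongodb-aggregation-stage';
  }
  var args = Object.prototype.hasOwnProperty.call(argumentSyntax, method)
    ? argumentSyntax[method]
    : null;
  if (args && argIndex < args.length) {
    return args[argIndex];
  }
  // Any other object literal falls back to the document syntax.
  return 'mongodb-document';
}

console.log(syntaxFor('find', 0));       // mongodb-filter
console.log(syntaxFor('updateMany', 1)); // mongodb-update
console.log(syntaxFor('aggregate', 0));  // mongodb-aggregation-stage
console.log(syntaxFor('insertOne', 0));  // mongodb-document
```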

So, in the end, the list of syntaxes would look like this:

MongoDB Filter
MongoDB Update
MongoDB Document
MongoDB Aggregation Stage
MongoDB Shell: javascript + reuse of 4 syntaxes above

For now I can create the first 4 syntaxes. I don't need "MongoDB Shell" right now, but I'll probably make it later too.
The list looks big, but at the same time it seems like the best way to make sure both approaches (#1 and #2) can be highlighted.

I would like to know your opinion: are you OK with a pull request for all 4 syntaxes together?
If you have doubts about many syntaxes instead of one, let me know - I'll try to explain with more examples.

RunDevelopment · 2020-08-11T15:07:50Z

> it will speedup parsing
> we will get better parsing speed.

Please don't worry about the tokenization speed. I really mean it. Since #1909, Prism is really fast.

Benchmark

This was run on my Windows 10 computer with an Intel i7-8700K and NodeJS v13.12.0.

This is a benchmark log from #2153. Each section starts with the current language(s) followed by a list of files and the tokenization timings for that file.

Example:

  ../../components.json (25 kB)
  | local     2.23ms ±  2%   48smp

This means that the file ../../components.json is 25kB in size, was tokenized using the javascript language, and it took 2.23ms on average.
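To make the log format concrete, here is a small parser for one timing line. The field meanings beyond the mean time are assumptions based on the example: the percentage is presumably the relative margin of error, and smp presumably the number of samples. `parseTimingLine` is a hypothetical helper, not part of the benchmark script.

```javascript
// Parses one timing line of the benchmark log, e.g. "| local     2.23ms ±  2%   48smp".
function parseTimingLine(line) {
  var m = /\|\s*(\S+)\s+([\d.]+)ms\s*±\s*(\d+)%\s+(\d+)smp/.exec(line);
  if (!m) {
    return null; // not a timing line (e.g. a file name or separator)
  }
  return {
    label: m[1],                        // label of the measured version, e.g. "local"
    meanMs: parseFloat(m[2]),           // mean tokenization time
    marginPercent: parseInt(m[3], 10),  // assumed: relative margin of error
    samples: parseInt(m[4], 10)         // assumed: number of samples
  };
}

console.log(parseTimingLine('  | local     2.23ms ±  2%   48smp'));
```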

Log:

c

  https://raw.githubusercontent.com/git/git/master/mergesort.c (1 kB)
  | local     0.08ms ±  1%   45smp
  https://raw.githubusercontent.com/git/git/master/mergesort.h (1 kB)
  | local     0.02ms ±  1%   49smp
  https://raw.githubusercontent.com/git/git/master/remote.c (58 kB)
  | local     2.80ms ±  1%   48smp
  https://raw.githubusercontent.com/git/git/master/remote.h (10 kB)
  | local     0.35ms ±  1%   52smp

------------------------------------------------------------

css

  ../../style.css (7 kB)
  | local     0.85ms ±  1%   54smp

------------------------------------------------------------

css!+css-extras (css)

  ../../style.css (7 kB)
  | local     1.32ms ±  1%   53smp

------------------------------------------------------------

javascript

  ../../components.json (25 kB)
  | local     2.23ms ±  2%   48smp
  ../../package-lock.json (190 kB)
  | local    11.87ms ±  2%   39smp
  ../../scripts/utopia.js (11 kB)
  | local     1.32ms ±  1%   49smp
  https://cdnjs.cloudflare.com/ajax/libs/prism/1.20.0/prism.js (29 kB)
  | local     3.53ms ±  1%   48smp
  https://cdnjs.cloudflare.com/ajax/libs/prism/1.20.0/prism.min.js (14 kB)
  | local     2.04ms ±  1%   51smp
  https://code.jquery.com/jquery-3.4.1.js (274 kB)
  | local    30.00ms ±  2%   32smp
  https://code.jquery.com/jquery-3.4.1.min.js (86 kB)
  | local    17.37ms ±  2%   33smp

------------------------------------------------------------

json

  ../../components.json (25 kB)
  | local     1.33ms ±  1%   51smp
  ../../package-lock.json (190 kB)
  | local     7.58ms ±  1%   41smp

------------------------------------------------------------

markup

  ../../download.html (4 kB)
  | local     0.34ms ±  0%   51smp
  ../../index.html (19 kB)
  | local     1.81ms ±  1%   53smp
  https://github.com/PrismJS/prism (192 kB)
  | local    16.24ms ±  1%   35smp

------------------------------------------------------------

markup!+css+javascript (markup)

  ../../download.html (4 kB)
  | local     0.61ms ±  1%   53smp
  ../../index.html (19 kB)
  | local     2.52ms ±  1%   51smp
  https://github.com/PrismJS/prism (192 kB)
  | local    21.09ms ±  1%   35smp

------------------------------------------------------------

ruby

  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/base.rb (12 kB)
  | local     0.53ms ±  0%   53smp
  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/layouts.rb (16 kB)
  | local     0.64ms ±  1%   53smp
  https://raw.githubusercontent.com/rails/rails/master/actionview/lib/action_view/template.rb (14 kB)
  | local     0.94ms ±  0%   54smp

------------------------------------------------------------

rust

  https://raw.githubusercontent.com/rust-lang/regex/master/src/compile.rs (42 kB)
  | local     2.68ms ±  2%   44smp
  https://raw.githubusercontent.com/rust-lang/regex/master/src/lib.rs (28 kB)
  | local     0.27ms ±  1%   51smp
  https://raw.githubusercontent.com/rust-lang/regex/master/src/utf8.rs (9 kB)
  | local     0.61ms ±  2%   46smp

If you highlight 1MB of text (= >10k lines of code), the tokenization should only take about 100~200ms on a typical desktop computer.
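The 1MB figure can be roughly cross-checked by scaling the javascript timings from the log linearly. This assumes tokenization time grows linearly with input size and takes 1MB as 1000 kB; the kB sizes in the log are rounded, so the results are only ballpark numbers.

```javascript
// Sizes and mean timings taken from the "javascript" section of the log above.
var samples = [
  { file: 'jquery-3.4.1.js', kB: 274, ms: 30.0 },
  { file: 'jquery-3.4.1.min.js', kB: 86, ms: 17.37 },
  { file: 'prism.js', kB: 29, ms: 3.53 }
];

// Linear extrapolation: time for targetKB at the same throughput as the sample.
function estimateMs(sample, targetKB) {
  return (targetKB / sample.kB) * sample.ms;
}

samples.forEach(function (s) {
  console.log(s.file + ': ~' + Math.round(estimateMs(s, 1000)) + 'ms per 1000 kB');
});
// jquery-3.4.1.js: ~109ms per 1000 kB
// jquery-3.4.1.min.js: ~202ms per 1000 kB
// prism.js: ~122ms per 1000 kB
```

All three estimates fall roughly within the stated 100~200ms range.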

It's what comes after the tokenization that is usually the bottleneck. First, we have to create HTML from the token stream (this usually takes about half as long as the tokenization itself), and then we hand it off to the browser. The browser has to parse the HTML code, create hundreds or thousands of DOM nodes, and then calculate layout and style for all of them. With asynchronous highlighting, we can offload all of Prism's work (tokenization + HTML creation) to a different thread, so your page remains responsive, but we can't do anything about the browser having to display the highlighted code.

Unfortunately, Prism can't do partial highlighting and we don't plan to make it a focus in the future. If you need to regularly highlight megabytes of text (>= 10MB) and still need your webpage to be snappy, you might need a library that dynamically highlights and displays part of the text as you scroll.

> If you have doubts about many syntaxes instead of one, let me know

I still do. As I said, tokenization performance isn't usually the bottleneck and it seems to be one of your motivations for making 4 languages.

You said that making it 4 languages is more correct, but I don't really see that. I get that you can get false positives if you merge everything together (e.g. "properties like $match and $group (from "MongoDB Aggregation Stage") will be ignored [in filter queries]"). However, I don't think this is much of an issue, because nobody will use those properties in the wrong context (or at least nobody should).

Also, in the end, it's just less work to make one language instead of 4 to 5.

airs0urce · 2020-08-12T08:50:00Z

OK, got it. I'll create a new pull request when it's done and close this one for now.
In the new pull request I'll add a "MongoDB" syntax with full coverage that extends JavaScript, which fits Prism.js's approach to languages.
Along with this, I'll keep the separate syntaxes in my fork, so anyone who needs the sub-syntaxes can use them.

airs0urce · 2020-09-03T10:15:27Z

For those who read this thread, here is forked version with separated mongodb syntaxes:
https://github.com/airs0urce/prism-mongodb

Read README.md to understand how to use it.

airs0urce added 3 commits August 5, 2020 15:45

Added "MongoDB Query" syntax

e560097

"Mongo Query" syntax, fix "Failed to tokenize the word 'foo1' as one or no token"

18be9af

"Mongo Query" syntax, removed NaN and Infinity duplicates

4359fc0

RunDevelopment reviewed Aug 6, 2020


RunDevelopment added the "language-definitions", "needs consensus", and "new language" labels Aug 6, 2020

airs0urce closed this Aug 12, 2020

airs0urce deleted the mongodb-query-syntax branch August 13, 2020 10:24

airs0urce mentioned this pull request Aug 13, 2020

MongoDB syntax #2518

Merged
