-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Added "MongoDB Query" syntax #2502
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for making this pull request @airs0urce.
I gave it a quick review but I mainly want to ask this: Is this a real language?
From what I see, this is just a subset of JS with some additional highlighting for special MongaDB properties. Could you please explain the use-case for this.
return keyword.replace('$', '\\$'); | ||
}); | ||
|
||
var keywordsRegex = '(?:' + keywords.join('(?:\\b|:)|') + ')\\b'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will generate a string value like this:
(?:\$foo(?:\b|:)|...|\$bar(?:\b|:)|\$baz)\b
Assuming that it's a bug that the last keyword doesn't get the (?:\b|:)
suffix, we can factor out the common pre- and suffixes like this:
\$(?:foo|...|bar|baz)(?:\b|:)\b
Let's talk about the (?:\b|:)\b
suffix. It's equivalent to (?:\b|:\b)
where the problem is easier to see. The :\b
alternative can never be matched. If we have a string "$foo:"
, then the \b
alternative will accept after "$foo"
.
So we can simplify the whole pattern even further:
\$(?:foo|...|bar|baz)\b
But we know that the \b
assertion will always accept because of the way the this pattern is used. It's inserted to create the keyword
regex, so we know that what follows looks like this: ["']?$
. Since we know that \b
always accepts, we can just remove it.
\$(?:foo|...|bar|baz)
^ This is the string we want to generate.
So you can remove the $
prefix in all strings of the keywords
array, you can remove the mapping adding a \
character, and this line becomes:
var keywordsRegex = '(?:' + keywords.join('(?:\\b|:)|') + ')\\b'; | |
var keywordsRegex = '\\$(?:' + keywords.join('|') + ')'; |
}, | ||
entity: { | ||
// ipv4 | ||
pattern: /\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't the whole string content be an IP address instead of just a substring? E.g. "foo bar 0.0.0.0 baz"
.
Same for url
.
Thank you for the quick review. So, basically MongoDB has query language that you use to fetch data from database, this is like SQL, but syntax based on some mix of JSON/javascript, you can define only one js object and you can use limited set of functions supported by mongodb. This type of highlighting implemented in many mongo clients. Examples: and many others. I'm working on client app for MongoDB database, here is the code: https://github.com/airs0urce/punkmongo Not sure if this syntax should go to main prism.js repo, but I decided to send pull request anyway. I was not able to find any library that can highlight mongo query, only one here: https://github.com/mongodb-js/ace-mode But it has webworker for syntax checking and for me looks not lightweight enough, especially because I plan to use it for highlighting query results too and there may be 1000 records shown. So, I wanted some simple highlighting and this is why I implemented it on prism.js. -- Btw, now I think it makes more sense to call it "Mongo Filter", because there are also Aggregation query which is query too and different keywords can be used in that kind of query. |
Here is also example of query made using official Mongo Shell (https://docs.mongodb.com/manual/mongo/) and results: Just to show, here I tried to use "RandomFunction" in mongo shell and got error: In the syntax I highlight only valid functions and keywords like $set, $unset, $gt, etc. |
here is list of syntaxes I came with after thinking how to define them more correct:
So, if I make it like this it will be 4 different syntaxes, looks a lot but this is how it works in MongoDB. Anyway let me know what do you think. I could create all 4 syntaxes. |
Sorry for the delay! After your response and looking through the MongoDB doc and projects, I think it is best to implement this as one language that is a superset of JS. (One language because it's easier to use.) The additions to vanilla JS will be MongoDB-specific properties (e.g. The implementation could look like this: Prism.languages.mongodb = Prism.languages.extend('javascript', {});
Prism.languages.insertBefore('mongodb', 'string', {
'property': {
pattern: /(?:(["'])(?:\\(?:\r\n|[\s\S])|(?!\1)[^\\\r\n])*\1|[_$a-zA-Z\xA0-\uFFFF][$\w\xA0-\uFFFF]*)(?=\s*:)/,
greedy: true,
inside: {
'keyword': RegExp('^([\'"]?)\\$(?:' + ['lt', 'gt', ...].join('|') + ')(?:\\1)$')
}
}
});
Prism.languages.mongodb.string.inside = {
'url': { pattern: /.../ },
'ip-address': { pattern: /.../, alias: 'entity' },
}; If we also wanted special highlighting for MongoDB-specific types (e.g. Prism.languages.insertBefore('mongodb', 'function', {
'builtin': /\b(?:ObjectId|Code|...)\b/
}); The main advantage of implementing it like this is that JS is doing most of the work for us. We don't have to worry about comments, numbers, keywords, etc; JS handles all of this for us. Thoughts? |
@RunDevelopment Yes, sounds good. I'll check how that extending works and prepare the changes when I have free time. About this:
Actually this is what I did for my mongo client to make it more easy to analyze content of strings, this is not related to any mongodb features directly. I believe it will be better if we don't include them in generic mongodb syntax that potentially can go to Prism.js distribution. |
You could say that they are dead weight and they will make the tokenization process slightly slower but it's probably not worth your time. Also, can't queries include functions that in turn can contain arbitrary JS code?
The reason we do syntax highlighting is to improve the readability of code. As long as the feature improves readability, I'm very open to make it a part of Prism. Since URLs are a somewhat common appearance in databases, I think it's a nice idea to highlight them. |
Sorry for bothering you again, but I was thinking more about mongodb syntaxes. Here are my thoughts: Actually I see there are two approaches different mongo clients use. 1) Approach
|
Please don't worry about the tokenization speed. I really mean it. Since #1909, Prism is really fast. BenchmarkThis was run on my Windows 10 computer with an Intel i7-8700K and NodeJS v13.12.0. This is a benchmark log from #2153. Each section starts with the current language(s) followed by a list of files and the tokenization timings for that file. Example: ../../components.json (25 kB)
| local 2.23ms ± 2% 48smp This means that the file Log:
If you highlight 1MB of text (= >10k lines of code), the tokenization should only take about 100~200ms on a typical desktop computer. It's what comes after the tokenization that is usually the bottleneck. First, we have to create HTML from the token stream (this usually takes half as long as the tokenization itself) and then we hand it off to the browser. The browser has to parse the HTML code, create hundreds and thousands of DOM nodes, and then calculate layout and style for all of them. With asynchronous highlighting, we can offload all of Prism's work (tokenization + HTML creation) to a different thread, so your page remains responsive but we can't do anything about the browser having to display the highlighted code. Unfortunately, Prism can't do partial highlighting and we don't plan to make it a focus in the future. If you need to regularly highlight megabytes of text (>= 10MB) and still need your webpage to be snappy, you might need a library that dynamically highlights and displays part of the text as you scroll.
I still do. As I said, tokenization performance isn't usually the bottleneck and it seems to be one of your motivations for making 4 languages. You said that making it 4 languages is more correct, but I don't really see that. I get that you can get false positives if you merged everything together (e.g. "properties like $match and $group (from "MongoDB Aggregation Stage") will be ignored [in filter queries]"). However, I don't think this is too much of an issue because nobody will be using those properties in the wrong context (or at least nobody shouldn't). Also, in the end, it's just less work to make one language instead of 4 to 5. |
Ok, got you. I’ll create new pull request when done. Will close current one for now. |
For those who read this thread, here is forked version with separated mongodb syntaxes: Read README.md to understand how to use it. |
No description provided.