Changed quote to escape #294
Note that this is a breaking change. The rename from I'm not sure about returning |
I think it's better to ask what is the advantage of keeping it as a |
I'm skeptical. When would this happen where the extra allocation matters?
Because it's arguably simpler. |
Well, what about a web service that has to build regular expressions frequently, on each request? The current implementation essentially doubles the memory on each call, and with a lot of users that could lead to a lot of allocations.
I think this makes the function a lot smarter, and provides a lot of control for only the addition of a single if statement. For example with |
The memory requirements of the extra string are dwarfed by at least a few orders of magnitude by the memory requirements of the regular expression itself. Moreover, the regex (and corresponding string) are probably dropped once the request has been serviced.
I'm not sure what the use for this is. If there was one, then I imagine it'd be useful to expose a |
Isn't this a point for its inclusion? If the |
If the string is 1MB then the compiled regular expression will be much larger. If each character in the regex roughly corresponds to a single instruction (which is a pretty conservative estimate), then you're looking at a minimum overhead of 24 bytes per character. Multiply this by some small constant, since multiple FSMs are typically compiled. The size of the string is insignificant. In fact, |
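The estimate above can be worked through numerically. The sketch below assumes the comment's own figures (one instruction per pattern character, 24 bytes per instruction) and uses a hypothetical `estimated_program_bytes` helper. It is back-of-the-envelope arithmetic, not a measurement of the regex crate.

```rust
/// Assumed size of one compiled instruction, taken from the comment above.
const BYTES_PER_INSTRUCTION: usize = 24;

/// Hypothetical helper: rough lower bound on compiled size, assuming one
/// instruction per pattern character and `compiled_programs` separate
/// programs (the "small constant" mentioned in the comment).
fn estimated_program_bytes(pattern_len: usize, compiled_programs: usize) -> usize {
    pattern_len * BYTES_PER_INSTRUCTION * compiled_programs
}

fn main() {
    let pattern_len: usize = 1 << 20; // a 1MB pattern string
    for programs in 1..=3 {
        let bytes = estimated_program_bytes(pattern_len, programs);
        println!(
            "{} program(s): {} bytes, {}x the pattern string",
            programs,
            bytes,
            bytes / pattern_len
        );
    }
}
```

Under these assumptions a single compiled program is already 24 times the size of the pattern string, which is the point the comment makes about the extra `String` being insignificant.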
I agree that in relative terms the regex will always be greater, but I don't think that means extra memory should be wasted in absolute terms. This isn't a strong case, but there could be rare edge cases where the extra memory would cause an OOM or stack overflow. I still think the best case for its inclusion is that providing |
I remain unconvinced. I'm not a fan of complicating an API for super rare edge cases. Even the case you've provided is unrealistic: creating a 1MB long regex doesn't sound like a use case we should be considering in the API design of this library. I just actually tried it:

    extern crate regex;
    use std::iter::repeat;
    use regex::Regex;

    fn main() {
        let size = 1 << 20;
        let _ = Regex::new(&repeat('a').take(size).collect::<String>()).unwrap();
    }

This program takes |
You see, I think this is where we differ. I see the change to

    let escaped = if regex::has_meta_character(string) {
        Cow::Owned(regex::escape(string))
    } else {
        Cow::Borrowed(string)
    };

vs

    let escaped = regex::escape(string);
|
What are the use cases for wanting this function to not allocate? On Nov 8, 2016 4:20 AM, "Aaron Power" [email protected] wrote:
|
Well, what about the case where there are a lot of allocations? For example, in tokei I escape every unique multi-line start in a language. A lot of languages have at least 2 or 3, and tokei currently has 87 languages. That is a lot of time spent allocating and deallocating before I even create the regex. |
@Aaronepower ... but you still have to create the regex. As I showed above, creating |
Sorry, I misexplained: it would be allocating 271 |
I don't think that actually changes anything. We're talking about orders of magnitude differences here.
Agreed, but that is clearly the intended use case. I am personally growing weary of this debate. I feel we have reached an impasse and I don't think there's a compelling argument in favor of returning One thing I would be in favor of is exposing a function (possibly in |
Updated PR |
Thanks! It looks like the build is failing. I think you might have forgotten to update |
My bad! Updated the PR. |
@Aaronepower Yeah I'm not sure why the nightly build is failing, but it does indeed look unrelated. I'd say just leave this for now and I'll straighten it out when I merge this PR (which I'll do as part of prep for the |
I've rolled this into #310. Thanks again! |
Escape is a much more familiar name for this type of operation.
The function will now also only allocate a String when there are actual meta characters to escape, rather than allocating on every call.
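A minimal sketch of the allocate-only-when-needed behaviour described above, using only the standard library. The names `is_meta_character` and `escape_cow`, and the exact set of characters treated as special, are assumptions made for illustration. This is not the code from the PR or the regex crate's implementation.

```rust
use std::borrow::Cow;

/// Hypothetical helper: true if `c` has special meaning in regex syntax.
/// The character set is an assumption chosen for this sketch.
fn is_meta_character(c: char) -> bool {
    matches!(
        c,
        '\\' | '.' | '+' | '*' | '?' | '(' | ')' | '|' | '[' | ']' | '{' | '}' | '^' | '$'
    )
}

/// Hypothetical function: escapes meta characters in `text`, borrowing the
/// input unchanged when nothing needs escaping so that no `String` is built.
fn escape_cow(text: &str) -> Cow<'_, str> {
    if !text.chars().any(is_meta_character) {
        return Cow::Borrowed(text);
    }
    // At least one character needs a backslash, so reserve a little extra.
    let mut escaped = String::with_capacity(text.len() + 1);
    for c in text.chars() {
        if is_meta_character(c) {
            escaped.push('\\');
        }
        escaped.push(c);
    }
    Cow::Owned(escaped)
}

fn main() {
    for input in ["plain", "a.b", "/*"] {
        match escape_cow(input) {
            Cow::Borrowed(s) => println!("{:?}: borrowed {:?}", input, s),
            Cow::Owned(s) => println!("{:?}: allocated {:?}", input, s),
        }
    }
}
```

The trade-off debated in the thread is visible here: the borrowing path saves one allocation per call, at the cost of a `Cow` return type that every caller has to handle.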