Direct URL support for --lua-filter
#6760
Comments
Even if pandoc allowed this, I wouldn't recommend doing this sort of thing, because of the security implications. If the upstream filter gets updated in a malicious or unintentionally destructive way, you'd be vulnerable. Remember, unlike a template, a lua filter can do virtually anything -- it could, for example, read your passwords and upload them to an external URL. |
If someone wants to do this, they can point it to the URL of a particular commit, which makes it much more secure. |
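To make the "URL of a particular commit" idea concrete, here is a minimal sketch of a check that a filter URL is pinned to a full commit SHA rather than to a moving branch name. It assumes the common raw.githubusercontent.com layout `/<owner>/<repo>/<ref>/<path>`; the function name `is_commit_pinned` and the example owner and repository are hypothetical, and nothing like this exists in pandoc itself.

```python
import re
from urllib.parse import urlparse

# A full Git commit SHA-1 is written as 40 hexadecimal characters.
_FULL_SHA = re.compile(r"[0-9a-f]{40}")


def is_commit_pinned(url: str) -> bool:
    """Return True if a raw.githubusercontent.com URL names a full commit SHA.

    Assumed path layout: /<owner>/<repo>/<ref>/<path...>
    """
    parts = urlparse(url)
    if parts.scheme != "https" or parts.netloc != "raw.githubusercontent.com":
        return False
    segments = parts.path.strip("/").split("/")
    # We need owner, repo, ref and at least one path component.
    if len(segments) < 4:
        return False
    return _FULL_SHA.fullmatch(segments[2].lower()) is not None


# Hypothetical example URLs (owner "example" and repo "filters" are made up).
pinned = "https://raw.githubusercontent.com/example/filters/" + "a" * 40 + "/include-files.lua"
moving = "https://raw.githubusercontent.com/example/filters/master/include-files.lua"
print(is_commit_pinned(pinned), is_commit_pinned(moving))
```

A check like this only confirms that the ref looks like a commit; it does not say anything about whether the filter at that commit is trustworthy.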
Yes, that's true, and for that reason we might want to consider allowing URLs for filters. |
@tarleb If we did want to allow this, we'd probably want to use |
Again if you allow this for templates there is no reason not to allow it for filters. If you think it's a security issue that should be blocked for filters you should block it for templates as well. The code path to exploit this is a bit more convoluted in the case of templates but it is quite doable. Personally I suggest allowing both but only with an additional |
But I gave a reason above for distinguishing the two cases. Filters can launch missiles. Templates can't. The mischief you can do with a malicious template is quite limited. |
Can't they? I believe templates for some formats can. The limitation is only relevant to some output formats. |
I think that's an edge case; LaTeX templates can launch missiles, but only if For other formats, the problems start when the output file is opened by another program (macros in a Word file could be a possible example). I'd argue that this is a security problem of those programs, not pandoc. |
I gave it more thought but am not enthusiastic about the idea. My gut feeling is that this should be solved via a wrapper or maybe a filter. Some points which led me to that assessment:
It's not that I'm fundamentally opposed to the proposal, and I do think it would add value. It is just that I'd much rather we'd use LuaRocks or another package manager to keep filters up to date. This also makes me think that maybe the pandoc ecosystem could use a more defined concept of "package". Oftentimes, filters, defaults, and templates are bundled together and intended to be used in combination. So maybe what we really need are pandoc packages? |
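For what it's worth, the wrapper approach mentioned above could look roughly like the following sketch. It downloads a filter, refuses it unless its SHA-256 digest matches a value the user recorded after auditing the file, and only then passes it to pandoc via `--lua-filter`. The function names are hypothetical, and the digest check is my own addition rather than anything proposed in this thread.

```python
import hashlib
import subprocess
import tempfile
import urllib.request
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def fetch_verified(url: str, expected_sha256: str) -> bytes:
    """Download url and reject content whose digest differs from the expected one."""
    with urllib.request.urlopen(url) as response:
        data = response.read()
    actual = sha256_hex(data)
    if actual != expected_sha256.lower():
        raise ValueError(f"digest mismatch for {url}: got {actual}")
    return data


def pandoc_command(filter_path: Path, pandoc_args: list[str]) -> list[str]:
    """Build the argument vector: pandoc --lua-filter <file> <remaining args>."""
    return ["pandoc", "--lua-filter", str(filter_path), *pandoc_args]


def run_with_remote_filter(url: str, expected_sha256: str, pandoc_args: list[str]) -> int:
    """Fetch and verify the filter, then run pandoc with it from a temporary directory."""
    data = fetch_verified(url, expected_sha256)
    with tempfile.TemporaryDirectory() as tmp:
        filter_path = Path(tmp) / "filter.lua"
        filter_path.write_bytes(data)
        return subprocess.run(pandoc_command(filter_path, pandoc_args)).returncode
```

Keeping the download outside pandoc means the trust decision (which URL, which digest) stays visible in the wrapper's configuration instead of being hidden behind a command-line option.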
I completely agree with @tarleb. This is the territory of a pandoc package manager, and if pandoc does that itself I think it is trying to be too smart.

I hope we can have more discussion (either here or separately in pandoc-discuss) about a pandoc package manager. We started talking about this, and there was even a prototype, pandocpm, lying somewhere in our pandoc-extras organization. But the problem I found is that the idea of pandocpm is still trying to be too smart, and managing executables can be tricky (in terms of potential security issues). Even the auto-fetching of templates is a pandoc package manager problem (which is part of what is considered in pandocpm too).

In the end, one wants to have reproducibility in authoring pandoc documents, and right now that is very hard for a number of reasons, including not having a pandoc package manager (and also an index, like those in TeX).

There are at least 2 options:

Among data scientists conda is very common, and it can be useful here. As a start, it already has pandoc, pandocfilters, and panflute packaged. conda is a real package manager, so it can manage the dependencies (in principle the panflute package can and should require pandoc<2.10; the machinery is there, but the maintainer of the "formula" in conda-forge hasn't done that). And importantly, conda is cross-platform: Windows, macOS, Linux; x86, x64, aarch64, and even other architectures are supported (more than the platforms pandoc supports), making it a very good candidate for a cross-platform pandoc package manager.

The Lua filter system has taken off and is the most "native" way of implementing pandoc filters, so if we can take advantage of a package manager there, it would be the most natural solution and more self-contained than the one above. Lua experts can say more about this!

On a tangent to this topic, it would be nice to have something like |
How far does |
Wow, an option like |
A good resource on package updating is The Update Framework. Haskell's I like the idea of using |
A (consistent) As of now, I have to use |
I think supporting zipped directories with a potential I have never used conda, but it purports to be multiplatform and language-agnostic, so maybe it would work. However, getting into the business of providing remote packages opens a lot of cans of worms. Do we take responsibility for auditing the packages in the repository so they don't include vectors for malware (remember, filters can do anything)? If so, that's potentially a lot of work and a lot of responsibility. If not, the proposed feature might end up being a vector for bad stuff. Rather than providing a central repository, one might just go one step beyond what we have with filters: you can check out someone's package repository as a directory, audit it yourself, and use it at your own risk. In any case, one key missing piece is a way to address file paths relative to the package directory in defaults files and perhaps YAML metadata (see #5982, #5977). (E.g., you might include in your package a logo which gets referred to in a template; the template wouldn't need to refer to it directly if it could be set in a variable in a defaults file, but the defaults file would have to be able to specify the path relative to the package directory.) |
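To illustrate the missing piece described above, here is a rough sketch of how relative paths in a defaults file could be anchored at the package directory. It works on an already-parsed defaults mapping and only touches the `template` key, string entries in `filters`, and variables named in `path_variables`. The function name, the `path_variables` parameter, and the choice of keys are assumptions for illustration; pandoc does not currently behave this way.

```python
from pathlib import Path


def resolve_package_paths(defaults: dict, package_dir: str,
                          path_variables: tuple = ("logo",)) -> dict:
    """Return a copy of defaults with relative paths anchored at package_dir."""
    base = Path(package_dir)

    def anchor(value: str) -> str:
        # Leave URLs and absolute paths untouched.
        if "://" in value or Path(value).is_absolute():
            return value
        return str(base / value)

    resolved = dict(defaults)
    if isinstance(resolved.get("template"), str):
        resolved["template"] = anchor(resolved["template"])
    if isinstance(resolved.get("filters"), list):
        # Only plain string entries are rewritten; other entry shapes pass through.
        resolved["filters"] = [anchor(f) if isinstance(f, str) else f
                               for f in resolved["filters"]]
    if isinstance(resolved.get("variables"), dict):
        variables = dict(resolved["variables"])
        for name in path_variables:
            if isinstance(variables.get(name), str):
                variables[name] = anchor(variables[name])
        resolved["variables"] = variables
    return resolved


# Hypothetical package layout: the defaults refer to files shipped with the package.
example = {"template": "templates/report.latex",
           "filters": ["filters/include-files.lua"],
           "variables": {"logo": "img/logo.png", "title": "Report"}}
print(resolve_package_paths(example, "/opt/pandoc-packages/report"))
```

With something like this, the logo example works without the template knowing where the package was checked out: the defaults file names `img/logo.png`, and the resolved variable points into the package directory.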
R packages can be managed using conda: https://docs.anaconda.com/anaconda/user-guide/tasks/using-r-language/ . But I don't know whether a typical R programmer is going to use it that way. Also, Rmarkdown, while it uses pandoc, is kind of its own thing, so even if pandoc tried to support it, I don't know if they would be on board with this "vanilla pandoc" ecosystem (they already have their own package manager, for example).

I wonder if we are overthinking the responsibility of auditing the code. It should be something like CTAN or PyPI. In principle anyone can upload any code there, and malware has existed in the past, at least on PyPI. It would be unreasonable to ask them to audit every piece of code before letting maintainers release it, and neither should a "pandoc packaging index" do that.

In the past we tried to build a third-party filter repo, but that is hard without pandoc's official blessing. But basically this is what is done in pandoc/lua-filters: a repo of pandoc filters with user contributions. Below are 2 different directions...

Build our own package manager
There are 2 examples of how we could expand something like the lua-filters repo into something bigger while still relying on voluntary contributions: homebrew and conda-forge. The main difference is that in homebrew, all "formulas" (recipes to obtain a package) live in one single monolithic repo, whereas conda-forge has each "feedstock" (recipe) in its own separate repo. I think conda-forge's feedstock model makes more sense for us, something like:
Using an existing tool such as conda
Conda, on the one hand, seems like overkill. I think the momentum needed to kick-start it would be much larger. It is a hassle for most of the simple things used in pandoc, such as a template or a single-file, self-contained filter. But there is another kind of filter: complex, multi-module, importing multiple third-party libraries. A pandoc package manager cannot effectively manage this. With third-party dependencies (such as pandas, XlsxWriter, etc.) which might have their own dependencies, it is next to impossible to manage them in a homegrown pandoc package manager. But this is exactly the kind of application conda is built for. Conda can manage any dependencies, not only Python ones. In fact, this is the reason why pip is not good enough and Guido told them they should build their own: many code bases in scientific computing have FORTRAN, C, C++, etc. dependencies; conda was built for those cases and can therefore handle those dependencies too. R, Julia, etc. can also be installed with conda. However, it doesn't have a Haskell-related toolchain yet.

After writing this, I think a home-grown pandoc package manager together with an index makes the most sense in most cases. A fallback would be conda for the more complicated situations. It would be great if it had some sort of official blessing (such as letting these external conda packages appear in the pandoc package index, where there's only metadata directing people to install them using conda). |
To address the relative path issues, etc., maybe the (Speaking of which, I guess nobody here has used nix extensively..? From what I hear, the learning curve is quite steep, but truly reproducible builds are a wonderful thing..) So I imagine this somehow like |
I think conda can probably do what nix does. E.g., conda is reproducible in that building on conda-forge uses the compilers provided there, not the OS's (but again, it doesn't have Haskell). Nix may have a problem with installing into a user-specified path. I heard that the read-only root on macOS Catalina causes problems for nix because of that. Conda has an interesting approach to solving that, which guarantees that wherever the prefix is, it will run correctly. In terms of learning curve, I don't know which is harder, but conda is not easy except for the simplest case (of PyPI-compatible packages). |
This is a small feature request: I think it would be very useful if Pandoc's --lua-filter argument directly supported URLs pointing to .lua filters the same way as --template does.
So one could call Pandoc like this
instead of first having to manually (re-)download include-files.lua and calling
This would ensure that the latest version of a filter available at a specific URL is always used.