Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Possible to use for XML #16

Open
bestlem opened this issue Oct 29, 2022 · 5 comments
Open

Possible to use for XML #16

bestlem opened this issue Oct 29, 2022 · 5 comments
Labels
discussion enhancement New feature or request help wanted Extra attention is needed

Comments

@bestlem
Copy link

bestlem commented Oct 29, 2022

How easy would it be to provide a prism mode for XML.

It does not need to fully understand XML - but sufficient to identify elements and attributes

I am finding prism much more useful than normal font-lock as it emphasises the structure not different types.

@alphapapa
Copy link
Owner

alphapapa commented Oct 31, 2022

Hi Mark,

I've done a bit of exploration, and it should be possible, but it requires some fiddling, as the built-in Emacs functions to analyze SGML/HTML/XML and find the structural depth at a position is not very straightforward. Here's what I have so far, but it doesn't work correctly yet:

(define-minor-mode prism-sgml-mode
  "Disperse SGML document into a spectrum of colors according to depth.
Depth is determined by `sgml-calculate-indent'."
  :global nil
  (let ((keywords '((prism-match-sgml 0 prism-face prepend))))
    (if prism-sgml-mode
        (progn
          (dolist (mode (cl-remove 'prism-mode (prism-active-mode)))
            ;; Deactivate alternative mode so this one can be enabled.
            (funcall mode -1))
          (unless prism-faces
            (prism-set-colors))
          (setq prism-syntax-table (prism-syntax-table (syntax-table)))
          (font-lock-add-keywords nil keywords 'append)
          (font-lock-flush)
          ;; (add-hook 'font-lock-extend-region-functions #'prism-extend-region nil 'local)
          (unless (advice-member-p #'prism-after-theme #'load-theme)
            ;; Don't add the advice again, because this mode is
            ;; buffer-local, but the advice is global.
            (advice-add #'load-theme :after #'prism-after-theme)
            (advice-add #'disable-theme :after #'prism-after-theme)))
      (font-lock-remove-keywords nil keywords)
      (prism-remove-faces)
      (unless (--any (or (buffer-local-value 'prism-mode it)
                         (buffer-local-value 'prism-whitespace-mode it))
                     (buffer-list))
        ;; Don't remove advice if `prism' is still active in any buffers.
        (advice-remove #'load-theme #'prism-after-theme)
        (advice-remove #'disable-theme #'prism-after-theme))
      (remove-hook 'font-lock-extend-region-functions #'prism-extend-region 'local))))

(defun prism-match-sgml (limit)
  "Matcher function for `font-lock-keywords'.
Matches up to LIMIT."
  ;;  (prism-debug (current-buffer) (point) limit)
  (cl-macrolet ((depth ()
                       `(save-excursion
                          (cl-loop for context = (sgml-get-context)
                                   while context
                                   append context into contexts
                                   finally return (length contexts))))
                (parse-syntax ()
                              `(-setq (_depth _ _ in-string-p comment-level-p)
                                 (syntax-ppss)
                                 depth (depth)))
                (comment-p ()
                           ;; This macro should only be used after `parse-syntax'.
                           `(or comment-level-p (looking-at-p (rx (syntax comment-start)))))
                (looking-at-paren-p
                 () `(looking-at-p (rx (or (syntax open-parenthesis)
                                           (syntax close-parenthesis)))))
                (face-at ()
                         ;; Return face to apply.  Should be called with point at `start'.
                         `(cond ((and prism-parens (looking-at-paren-p))
                                 (alist-get depth prism-faces-parens))
                                ((comment-p)
                                 (pcase (/ depth sgml-basic-offset)
                                   (0 'font-lock-comment-face)
                                   (else (if prism-faces-comments
                                             (alist-get else prism-faces-comments)
                                           (alist-get else prism-faces)))))
                                ((or in-string-p (looking-at-p (rx (syntax string-quote))))
                                 (pcase (/ depth sgml-basic-offset)
                                   (0 'font-lock-string-face)
                                   (else (if prism-faces-strings
                                             (alist-get else prism-faces-strings)
                                           (alist-get else prism-faces)))))
                                (t (alist-get (/ depth sgml-basic-offset) prism-faces)))))
    (cl-labels ((next-depth-change-from
                 (pos) (save-excursion
                         (goto-char pos)
                         (let ((starting-depth (depth)))
                           (save-excursion
                             (cl-loop do (forward-sexp)
                                      while (and (= starting-depth (depth))
                                                 (not (eobp)))
                                      finally return (point)))))))
      (catch 'eobp
        (let ((parse-sexp-ignore-comments t)
              depth in-string-p comment-level-p comment-or-string-start start end
              found-comment-p found-string-p)
          (while ;; Skip to start of where we should match.
              (cond ((eobp)
                     ;; Stop matching and return nil if at end-of-buffer.
                     (throw 'eobp nil))
                    ((eolp)
                     (forward-line 1))
                    ((looking-at-p (rx blank))
                     (forward-whitespace 1))
                    ((unless prism-strings
                       (when (looking-at-p (rx (syntax string-quote)))
                         ;; At a string: skip it.
                         (forward-sexp 1))))
                    ((unless prism-comments
                       (forward-comment most-positive-fixnum)))))
          (parse-syntax)
          (when in-string-p
            ;; In a string: go back to its beginning (before its delimiter).
            ;; It would be nice to leave this out and rely on the check in
            ;; the `while' above, but if partial fontification starts inside
            ;; a string, we have to handle that.
            ;; NOTE: If a string contains a Lisp comment (e.g. in
            ;; `custom-save-variables'), `in-string-p' will be non-nil, but
            ;; `comment-or-string-start' will be nil.  I don't know if this
            ;; is a bug in `parse-partial-sexp', but we have to handle it.
            (when comment-or-string-start
              (goto-char comment-or-string-start)
              (unless prism-strings
                (forward-sexp 1))
              (parse-syntax)))
          ;; Set start and end positions.
          (setf start (point)
                ;; I don't know if `ignore-errors' is going to be slow, but since
                ;; `scan-lists' and `scan-sexps' signal errors, it seems necessary if we want
                ;; to use them (and they seem to be cleaner to use than regexp searches).
                end (min limit
                         (save-excursion
                           (or (when (looking-at-p (rx (syntax close-parenthesis)))
                                 ;; I'd like to just use `scan-lists', but I can't find a way
                                 ;; around this initial check.  The code (scan-lists start 1
                                 ;; 1), when called just inside a list, scans past the end of
                                 ;; it, to just outside it, which is not what we want, because
                                 ;; we want to highlight the closing paren with the shallower
                                 ;; depth.  But if we just back up one character, we never
                                 ;; exit the list.  So we have to check whether we're looking
                                 ;; at the close of a list, and if so, move just past it.
                                 (cl-decf depth)
                                 (1+ start))
                               (when (and prism-comments (comment-p))
                                 (forward-comment most-positive-fixnum)
                                 (setf found-comment-p t)
                                 (point))
                               (when (looking-at-p (rx (syntax string-quote)))
                                 (if in-string-p
                                     ;; At end of string: break out of it.
                                     (forward-char 1)
                                   ;; At beginning of string: skip it.
                                   (forward-sexp 1))
                                 (setf found-string-p t)
                                 (point))
                               (ignore-errors
                                 (next-depth-change-from (point)))
                               ;; (ignore-errors
                               ;;   ;; Scan to the past the delimiter of the next deeper list.
                               ;;   (scan-lists start 1 -1))
                               ;; (ignore-errors
                               ;;   ;; Scan to the end of the current list delimiter.
                               ;;   (1- (scan-lists start 1 1)))
                               ;; If we can't find anything, return `limit'.  I'm not sure if
                               ;; this is the correct thing to do, but it avoids an error (and
                               ;; possibly hanging Emacs) in the event of an undiscovered bug.
                               ;; Although, signaling an error might be better, because I have
                               ;; seen "redisplay" errors related to font-lock in the messages
                               ;; buffer before, which might mean that Emacs can handle that.
                               ;; I think the important thing is not to hang Emacs, to always
                               ;; either return nil or advance point to `limit'.
                               limit))))
          (when (< end start)
            ;; Set search bound properly when `start' is greater than
            ;; `end' (i.e. when `start' is moved past `limit', I think).
            (setf end start))
          (when end
            ;; End found: Try to fontify.
            (save-excursion
              (or (unless (or in-string-p found-string-p found-comment-p)
                    ;; Neither in a string nor looking at nor in a
                    ;; comment: set `end' to any comment found before it.
                    (when (re-search-forward (rx (or (seq (not (syntax escape)) (syntax string-quote))
                                                     (syntax comment-start)))
                                             end t)
                      (unless (equal '(7) (syntax-after (match-beginning 0)))
                        ;; Not in a string: set end to the beginning
                        ;; of the comment (this avoids stopping at
                        ;; comment-starts inside strings).
                        (setf end (match-beginning 0)))))
                  (unless (or found-comment-p found-string-p)
                    ;; Neither in nor looking at a comment: set `end'
                    ;; to any string or comment found before it.
                    (when (re-search-forward (rx (syntax string-quote)) end t)
                      (setf end (match-beginning 0))))
                  
                  ))
            ;; (when prism-parens
            ;;   (unless (= 1 (- end start))
            ;;     ;; Not fontifying a single open paren (i.e. we are trying to fontify more
            ;;     ;; than just an open paren): so if we are looking at one, fontify only it.
            ;;     (when (eq 4 (syntax-class (syntax-after (1- end))))
            ;;       ;; End is past an open paren: back up one character.
            ;;       (cl-decf end))))
            (if (and (comment-p) (= 0 depth))
                (setf prism-face nil)
              (setf prism-face (face-at)))
            (goto-char end)
            (set-match-data (list start end (current-buffer)))
            ;;  (prism-debug (current-buffer) "END" start end)
            ;; Be sure to return non-nil!
            t))))))

To fix it will require some more adjustments, e.g. sgml-get-context, which seems to be the best way to determine structural depth, returns different results depending on whether point is within a tag (like </form|>) or outside it (like </form>|), as well as before or after it (like |</form>).

I'm guessing this would be much easier in the upcoming release of Emacs 29, which will have tree-sitter support, which I'm guessing will support HTML out-of-the-box, which would probably make it easy to determine the depth at any position and the bounds of such.

Other than that, I don't spend much time editing SGML/HTML/XML documents in Emacs, so I probably won't work on this feature anytime soon. Patches welcome. :)

@alphapapa alphapapa added enhancement New feature or request help wanted Extra attention is needed discussion labels Oct 31, 2022
@meedstrom
Copy link

Not sure if it's a separate issue but when I open some .js or .jsx files, with prism-mode, it can cause Emacs to freeze completely. This happens specifically with rjsx-mode as provided by Doom Emacs' :lang javascript module. I saw the relation because rjsx-mode is ready for HTML/XML tags inside the javascript (a common way to write React apps), although no HTML tags need to be present for my Emacs to freeze anyway. Still learning more about this bug. In the meantime, I've taken prism-mode off rjsx-mode-hook.

@alphapapa
Copy link
Owner

@meedstrom Thanks for reporting that. Can you reproduce it in a clean Emacs configuration, e.g. using https://github.com/alphapapa/with-emacs.sh ?

Hopefully Emacs 29 with tree-sitter will make all this much simpler.

@alphapapa
Copy link
Owner

@meedstrom FYI, the fix made for #18 might fix the problem you had with rjsx-mode.

@meedstrom
Copy link

Ok, I resumed using it! WIll be exciting to see if I run into anything.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion enhancement New feature or request help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

3 participants