Diffs between dpANS3 and CLUS #22

Open
phoe opened this issue Apr 21, 2017 · 27 comments


@phoe
Owner

phoe commented Apr 21, 2017

10:44 <phoe> There is a lot of work to be done that I do not really know how to automate.
10:44 <phoe> The biggest mistake I have made is - I have corrected various minor mistakes in the specification without noting what I have corrected and where.
10:44 <phoe> There is no diff done between the text of dpANS3 and CLUS.
10:44 <phoe> And this is something that needs to be fixed.
10:45 <phoe> The task is to produce, by any means, a list of all differences between the glossaries and dictionary pages of CLUS and dpANS3.
10:46 <phoe> Which is a sizeable and somewhat boring task that requires a lot of concentration or a sufficiently smart approach that can compare the two texts despite their different markup.

10:47 <phoe> At least we do not need to take Examples and Notes into account as they are not a normative part of the specification. They're there purely for illustration and can be changed as we see fit.
10:49 <phoe> The approach is either to do it manually or to somehow automate it.
10:50 <phoe> In theory, we could simply copypaste the text from both the original specification and compare it to CLUS.
10:51 <phoe> Since Ctrl+C has the fascinating trait of stripping all formatting and only preserving text.

10:53 <phoe> My idea is.
10:54 <phoe> Open up a text editor with two buffers.
10:54 <phoe> Copypaste a page from the original dpANS3 into one buffer.
10:54 <phoe> Copypaste a page from CLUS into the other buffer.
10:54 <phoe> Run diff.
10:55 <phoe> Inspect all differences.
10:55 <phoe> There will be garbage that comes from differences in formatting and such, but we will also be able to see the differences this way.
10:55 <phoe> But the diffing process can be automated through unix diff or any other emacslike diff tool.
10:56 <phoe> So the task that I'd say would be first is - find the proper way of dealing with this, create the method.
10:56 <phoe> I'll create a github issue about this and link it to you in a few hours.
10:56 <phoe> Hours, huh, minutes.
10:56 <phoe> Once you have any kind of workable method, please post it there - and let's start rocking.
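
The copy-paste-and-diff idea from the log above can be sketched in a few lines of Python. This is only an illustration, not the method the project settled on: the function names `normalize` and `text_diff` are hypothetical, and splitting the text into sentence-like chunks is an assumption made here so that differences in line wrapping do not show up as changes.

```python
import difflib
import re


def normalize(text: str) -> list[str]:
    """Collapse whitespace and split the text into sentence-like chunks."""
    # Collapse every run of whitespace (including line breaks) into one space,
    # so that different line wrapping is not reported as a difference.
    flat = re.sub(r"\s+", " ", text).strip()
    # Put each sentence on its own line to keep the diff hunks small.
    return re.split(r"(?<=[.!?]) ", flat)


def text_diff(dpans_text: str, clus_text: str) -> list[str]:
    """Return a unified diff between two plain-text pages."""
    return list(
        difflib.unified_diff(
            normalize(dpans_text),
            normalize(clus_text),
            fromfile="dpANS3",
            tofile="CLUS",
            lineterm="",
        )
    )
```

Two pages that differ only in whitespace produce an empty diff here, which removes part of the formatting garbage mentioned at 10:55.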

@KZiemian
Collaborator

This is the current state of what I have found. Emacs has ediff-trees (https://www.emacswiki.org/emacs/ediff-trees.el), which can take a regexp and compare two directories. I still don't fully understand it, but in the end it could automate some things: just run it once and feed it a regexp.

I will try to find out this week what ediff-trees can do.

@phoe
Owner Author

phoe commented Apr 24, 2017

@KZiemian it looks like we can scrape dpANS and CLUS - @rmhsilva is developing an effective way of doing this. Once we have the scraped material, we can use the emacs ediff.

@KZiemian
Collaborator

@phoe So what precisely do we need to do? I am a little lost. Download the TeX files from their GitHub repositories and run diff on them?

@phoe
Owner Author

phoe commented Apr 25, 2017

@KZiemian Please contact @rmhsilva on how he does his scraping and diffing.

@rmhsilva
Collaborator

Hi @KZiemian. I will generate a set of diffs in the next couple of days, and post here when that's done. Then we can just review the diffs, knowing that they have all been generated somewhat repeatably/consistently.

@KZiemian
Collaborator

@rmhsilva OK, after Monday I should find time for that. You have done great work.

@rmhsilva
Collaborator

I've just added a ton of diff files to https://github.com/phoe/clus-data/tree/master/diffs.

I'm not sure how useful they are in their current form - diff has included quite a few blank lines, despite me trying quite hard to get it to ignore them. However, it's mostly clear where a blank line can be ignored.

The full page of data (including examples) has been included - we can ignore the examples for now, we just need to concentrate on the core text I believe (@phoe, is that right?)

Also, Github's diff viewer is pretty decent!

@phoe
Owner Author

phoe commented Apr 29, 2017

> diff has included quite a few blank lines, despite me trying quite hard to get it to ignore them.

You can remove all lines that contain only + or - using some basic text processing.
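
A minimal sketch of that text processing, assuming the diffs are in unified format; the function name `drop_blank_changes` is made up for this example and is not taken from the repository:

```python
def drop_blank_changes(diff_text: str) -> str:
    """Remove diff lines that add or delete nothing but whitespace."""
    kept = []
    for line in diff_text.splitlines():
        # A line such as "+" or "-   " records a blank line being added or
        # removed. File headers ("+++ b", "--- a") have more content after
        # the first character, so they are kept.
        if line[:1] in ("+", "-") and line[1:].strip() == "":
            continue
        kept.append(line)
    return "\n".join(kept)
```

The line counts in the `@@` headers are not updated, so the result is meant for reading and review only, not for applying as a patch.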

Yes, we completely ignore the examples - these will need to be manually rewritten and fixed.

Thank you - I'll start looking at these soon.

@rmhsilva
Collaborator

rmhsilva commented May 1, 2017

Hah yeah, of course, I'm not sure why I didn't do that in the first place.

Done now, no more blank lines!

@KZiemian
Collaborator

KZiemian commented May 1, 2017

@rmhsilva Thank you. I still have to check the GitHub diff, but I am making progress.

@KZiemian
Collaborator

KZiemian commented May 2, 2017

@phoe I read some of the files from @rmhsilva and learned some new things along the way, but I don't know what I should be watching for. There is a lot of rearranging between the two versions, which makes true changes harder to find; that is the only thing I can say right now.

How do I mark a file as checked? Should I clone it to my account and edit it there? I am still trying to get my head around GitHub.

@KZiemian
Collaborator

KZiemian commented May 2, 2017

I forked the diffs from rmhsilva's repository and looked at some of them. I don't see any mistakes so far; a few times an unimportant word is missing.

Now I need some more information about what there is to do. And I am worried that I could mess something up on GitHub.

@rmhsilva
Collaborator

rmhsilva commented May 7, 2017

Cool. I'm not quite sure exactly what needs to be looked for when reviewing the diffs, @phoe?

In order to track which diffs have been checked, I suggest the following:

  • we create one more directory, "comments"
  • when you have reviewed a diff (e.g. "foobar.diff"), create a file in the comments directory and put any review comments into it
  • if there are no significant differences, leave the file empty (e.g. use the unix touch utility)
  • the file name should be the same as the diff name, except with a .txt extension (e.g. "foobar.txt")

We know when we're done when there is a file in the comments directory for every diff. We can also find all non-empty files to check which pages are significantly different.
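
The bookkeeping described above could be checked with a short script. This is a sketch under the naming convention proposed in the list ("foobar.diff" is reviewed in "foobar.txt"); the function name `review_status` and the directory arguments are hypothetical.

```python
from pathlib import Path


def review_status(diff_dir: Path, comments_dir: Path) -> tuple[list[str], list[str]]:
    """Return (unreviewed diff names, names of diffs with non-empty comments)."""
    unreviewed = []
    flagged = []
    for diff_file in sorted(diff_dir.glob("*.diff")):
        # "foobar.diff" is expected to be reviewed in "foobar.txt".
        comment_file = comments_dir / (diff_file.stem + ".txt")
        if not comment_file.exists():
            unreviewed.append(diff_file.name)
        elif comment_file.stat().st_size > 0:
            # A non-empty comment file marks a significant difference.
            flagged.append(diff_file.name)
    return unreviewed, flagged
```

The review is complete when the first list is empty; the second list names the pages that need a closer look.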

You can do this process in your own copy of this repository (use the Fork button), and when you are done, you can create a pull request for us to merge. Don't worry about messing up things on Github - we can review all changes before merging them in, and can always revert...

@KZiemian
Collaborator

KZiemian commented May 7, 2017

So far I have done some checking: https://github.com/KZiemian/clus-data/tree/master/diffs. Is this helpful?

@rmhsilva
Collaborator

rmhsilva commented May 7, 2017

@KZiemian Ah awesome, yeah that's helpful! My suggestions were nothing but suggestions, so as you've already started, carry on in that method 👍

@KZiemian
Collaborator

I have a problem. In the diffs directory, the file "cl:functions:first_to_tenth.diff" is empty. Where can I find the original versions of the files?

@KZiemian
Collaborator

Other empty files.
"cl:functions:hash-table.p.diff"
"cl:types:restart.diff"
"cl:types:satisfies.diff"
"cl:types:standard-class.p.diff"
"cl:types:standard-object.diff"
"cl:types:storage-condition.diff"
"cl:functions:setf-table.p.diff"
"cl:functions:setf-class-name.diff"

@KZiemian
Collaborator

Other empty files:
"cl:types:control-error.diff"
"cl:types:division-by-zero.diff"
"cl:types:floating-point-inexact.diff"
"cl:types:floating-point-invalid-operation.diff"
"cl:types:floating-point-overflow.diff"
"cl:types:floating-point-underflow.diff"
"cl:types:generic-function.diff"
"cl:types:method-combination.diff"
"cl:types:program-error.diff"
"cl:types:restart.diff"
"cl:types:satisfies.diff"
"cl:types:standard-class.diff"
"cl:types:standard-object.diff"
"cl:types:storage-condition.diff"

@rmhsilva
Collaborator

Hi @KZiemian, thanks for pointing that out. A few things fell through the automated diff process, we'll have to check them manually (by comparing the text in clus with the standard text).
I've checked the "First to Tenth" (http://phoe.tymoon.eu/clus/doku.php?id=cl:functions:first_to_tenth) text, and it looks fine.
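
Rather than spotting empty diffs one by one, the pages that fell through could be listed in one pass. A minimal sketch, assuming all diffs sit in a single directory with a `.diff` extension; the function name `empty_diffs` is hypothetical:

```python
from pathlib import Path


def empty_diffs(diff_dir: Path) -> list[str]:
    """Return the names of .diff files that contain no non-whitespace text."""
    return sorted(
        path.name
        for path in diff_dir.glob("*.diff")
        # Treat a file holding only blank lines the same as an empty one.
        if path.read_text(encoding="utf-8").strip() == ""
    )
```

Each name in the result is a page whose CLUS text still has to be compared with the standard text by hand.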

@KZiemian
Collaborator

@rmhsilva In the diffs there are often lines like this:
@@ -14,23 +26,51 @@
What do they mean? Do they stand in for a large part of identical text? That would be good news.
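
For reference, a line of this shape is a hunk header in the unified diff format. `@@ -14,23 +26,51 @@` says that the hunk covers 23 lines starting at line 14 of the old file and 51 lines starting at line 26 of the new file. Text between hunks is identical in both files and is left out of the diff. The small parser below only illustrates how the four numbers are read; the name `parse_hunk_header` is made up for this example.

```python
import re

# "@@ -old_start,old_count +new_start,new_count @@"; a count may be omitted.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line: str) -> dict[str, int]:
    """Split a unified-diff hunk header into its four numbers."""
    match = HUNK_HEADER.match(line)
    if match is None:
        raise ValueError(f"not a hunk header: {line!r}")
    old_start, old_count, new_start, new_count = match.groups()
    return {
        "old_start": int(old_start),
        # An omitted count means the hunk covers exactly one line.
        "old_count": int(old_count) if old_count is not None else 1,
        "new_start": int(new_start),
        "new_count": int(new_count) if new_count is not None else 1,
    }
```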

@phoe
Owner Author

phoe commented Aug 14, 2017

@KZiemian
Collaborator

@rmhsilva @phoe I am trying to find the number of diffs that rmhsilva generated a few months ago. In short, my OSes made a mess of the file names and I must check that nothing was lost. I think there should be 972 of them; @rmhsilva, can you check that number?

@KZiemian
Collaborator

@rmhsilva @phoe I think I have solved the problem: there should be 967 diffs.

@KZiemian
Collaborator

KZiemian commented Sep 19, 2017

At least these files don't have good diffs. Different files were compared.

cl:constant_variables:nil
cl:constant_variables:t
cl:functions:abort
cl:functions:atom
cl:functions:eql
cl:functions:error
cl:functions:bit
cl:functions:bit-orc1
cl:functions:character
cl:functions:complex
cl:functions:cons
cl:functions:continue
cl:functions:eql
cl:functions:error
cl:functions:float
cl:functions:list
cl:functions:logical-pathname
cl:functions:math-add
cl:functions:math-divide
cl:functions:math-greater
cl:functions:math-less
cl:functions:math-multiply
cl:functions:math-not-equal
cl:functions:math-not-greater
cl:functions:math-not-less
cl:functions:math-subtract
cl:functions:mod
cl:functions:muffle-warning
cl:functions:not
cl:functions:rational
cl:functions:pathname
cl:functions:rational
cl:functions:string
cl:functions:values
cl:macros:and
cl:macros:lambda
cl:types:character
cl_symbols_lambda
cl:types:and
cl:restarts:continue
cl:restarts:muffle-warning
cl:restarts:store-value
cl:restarts:use-value
cl:types:character
cl:types:complex
cl:types:cons
cl:types:eql
cl:types:error
cl:types:list
cl:types:logical-pathname
cl:types:mod
cl:types:nil
cl:types:not
cl:types:null
cl:types:pathname
cl:types:rational
cl:types:values
cl:types:vector
cl:variables:repl-minus
cl:variables:repl-plus
cl:variables:repl-slash
cl:special_operators:function
cl:special_operators:labels (maybe we don't need diff of that)
cl:special_operators:macrolet (maybe we don't need diff of that)
cl:special_operators:function
cl:special_operators:labels
cl:special_operators:macrolet

@KZiemian
Collaborator

I can hardly believe it, but out of 967 diffs, 965 are now checked. Now I must find which two are missing, and then most of my current work is done.

@KZiemian
Collaborator

To the best of my knowledge all 967 diffs are done, and I can't go further without help. This doesn't mean that all work on them is finished; I know that more is needed, but I can't do it myself.

@KZiemian
Collaborator

KZiemian commented Oct 23, 2017

State of the diffs, 24 October 2017.

  1. On GitHub there are 967 checked diffs. I hope I did not miss any of those generated by @rmhsilva.
  2. Every colon ":" was removed from the diffs before checking, which may cause problems. In particular, the "cl_macros_something" files have a lot of examples of this, because that section uses BNF notation with its "::=" and a lot of keywords. @phoe and @rmhsilva should decide what we do about that.
  3. Regardless of how the problem with colons ":" is resolved, 5-10 diffs need to be checked again. One of these is "cl_macros_loop", a very complicated diff (hard to explain why; it is easier just to take a look) which lost many ":" characters, so I decided not to check it until the issue above is solved.
  4. Diff sometimes picked up files with different but similar names. These files must still be checked. Here is a list of all diffs that I know have this problem.

cl:constant_variables:nil
cl:constant_variables:t
cl:functions:abort
cl:functions:atom
cl:functions:eql
cl:functions:error
cl:functions:bit
cl:functions:bit-orc1
cl:functions:character
cl:functions:complex
cl:functions:cons
cl:functions:continue
cl:functions:eql
cl:functions:error
cl:functions:float
cl:functions:list
cl:functions:logical-pathname
cl:functions:math-add
cl:functions:math-divide
cl:functions:math-greater
cl:functions:math-less
cl:functions:math-multiply
cl:functions:math-not-equal
cl:functions:math-not-greater
cl:functions:math-not-less
cl:functions:math-subtract
cl:functions:mod
cl:functions:muffle-warning
cl:functions:not
cl:functions:rational
cl:functions:pathname
cl:functions:rational
cl:functions:string
cl:functions:values
cl:macros:and
cl:macros:lambda
cl:types:character
cl_symbols_lambda
cl:types:and
cl:restarts:continue
cl:restarts:muffle-warning
cl:restarts:store-value
cl:restarts:use-value
cl:types:character
cl:types:complex
cl:types:cons
cl:types:eql
cl:types:error
cl:types:list
cl:types:logical-pathname
cl:types:mod
cl:types:nil
cl:types:not
cl:types:null
cl:types:pathname
cl:types:rational
cl:types:values
cl:types:vector
cl:variables:repl-minus
cl:variables:repl-plus
cl:variables:repl-slash
cl:special_operators:function
cl:special_operators:labels (maybe we don't need diff of that)
cl:special_operators:macrolet (maybe we don't need diff of that)
cl:special_operators:function
cl:special_operators:labels
cl:special_operators:macrolet

  5. Maybe we don't need a diff for every topic, e.g. cl:special_operators:labels. The reason is that such a topic is probably identical in content to another topic for which we already have a diff.

  6. The "See Also" section mostly changed in the following way (example from cl_functions_pathname-device).
    CLHS:

- pathname, logical-pathname, Section 20.1 (File System Concepts),   Section 19.1.2 (Pathnames as Filenames)

CLUS:

+* System Class PATHNAME
+* System Class LOGICAL-PATHNAME**
+ {\secref\FileSystemConcepts}
+ {\secref\PathnamesAsFilenames}

Adding capitalization, a better description and a changed way of referencing were the only changes in, I think, 300 diffs. The problem is that after 300 diffs with good changes in that section, I started looking at "See Also" without enough care, so I most likely missed some problems. Today, while looking for an example for this point, I noticed that I had missed the "*" in the diff above.
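
A stray "*" of this kind is easy to overlook by eye, so a mechanical check may help. The sketch below is a heuristic only, and everything in it is an assumption made for illustration: it presumes that added lines start with "+", that list bullets are written as "* ", and that the only legitimate asterisks are "earmuffs" around a name such as *PRINT-BASE*. The function name `suspicious_see_also_lines` is hypothetical.

```python
import re

# A token such as *PRINT-BASE* is a legitimate Lisp name with "earmuffs".
EARMUFFED = re.compile(r"\*[^*\s]+\*")


def suspicious_see_also_lines(diff_text: str) -> list[str]:
    """Return added lines with asterisks outside the bullet or earmuffs."""
    flagged = []
    for line in diff_text.splitlines():
        # Only look at added lines, and skip the "+++" file header.
        if not line.startswith("+") or line.startswith("+++"):
            continue
        body = line[1:]
        # Drop the leading list bullet ("* ") if there is one.
        if body.startswith("* "):
            body = body[2:]
        for token in body.split():
            if "*" in token and not EARMUFFED.fullmatch(token):
                flagged.append(line)
                break
    return flagged
```

On the example above it flags only the LOGICAL-PATHNAME line. Names that consist of asterisks alone, such as the variables `*`, `**` and `***`, would also be flagged and need a manual look.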
