Updates docs build #229
Conversation
…tegories, updates home page, fixes more docs converter and execute it.
What I meant was the docs (on the home page here) don't reflect the actual names of the NuGet packages/assemblies. We should probably sync them up to make it less confusing. Here is how they should be mapped:
We also have
and makes it runnable from the command line. In Java, the commands can be run directly on the packages that contain them. Note that the functionality that was wrapped into The name As for your other questions, let me review and try massaging my memory muscles...
@NightOwl888 ok great, I can certainly make the landing page listing reflect the actual package names; that's easy to do. As for the interlinking namespaces between packages, I'll have to investigate the best way to deal with this. I'll try to get the site updated again this week and we can review from there.
I did a search using Notepad++'s "Find in Files" feature and here is the entire list
My thought on the versioning was to pretty much copy what Lucene did. You can access any version by simply changing the version number in the URL. This should include beta versions. We don't want to remove documentation that might be relevant only to a specific version if someone is still depending on that version, especially if there have been breaking API changes between them.
Each version of the docs should point only to its own version of the source (using the tag)
This ensures the docs for a specific version stay static and point to the right code for that version even though the code is changing and new versions of docs are being released over time. We don't want to point to the head of the repository, because by the time the reader clicks the link, the doc could be years behind the code. Perhaps there should even be an index page/directory listing at the root that shows all of the versions that are available (at https://lucenenet.somewhere.com/). Also, it might make sense to make a copy (or redirect) of the latest version at https://lucenenet.somewhere.com/latest/ so we can have links that never need to drift in some places. The fact that you are hosting them in a temporary location is fine, but they shouldn't be at the top level of the site, they should be in a directory with the version number on it (or at least one that is escaped in a way that works in the URL).

building

I am having issues getting this working.
Windows PowerShell
Copyright (C) Microsoft Corporation. All rights reserved.
PS C:\Users\shad> f:
PS F:\> cd projects/lucenenet
PS F:\projects\lucenenet> ./websites/apidocs/docs.ps1 0 1
Directory: F:\projects\lucenenet\websites\apidocs
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 8/13/2019 5:28 PM tools
Cleaning tools...
Directory: F:\projects\lucenenet\websites\apidocs\tools
Mode LastWriteTime Length Name
---- ------------- ------ ----
d----- 8/13/2019 5:43 PM tmp
d----- 8/13/2019 5:43 PM docfx
Retrieving docfx...
d----- 8/13/2019 5:44 PM nuget
Download NuGet...
d----- 8/13/2019 5:44 PM vswhere
Download VsWhere...
Feeds used:
https://api.nuget.org/v3/index.json
C:\Program Files (x86)\Microsoft SDKs\NuGetPackages\
Installing package 'vswhere' to 'F:\projects\lucenenet\websites\apidocs\tools\tmp'.
GET https://api.nuget.org/v3/registration3-gz-semver2/vswhere/index.json
OK https://api.nuget.org/v3/registration3-gz-semver2/vswhere/index.json 867ms
Attempting to gather dependency information for package 'vswhere.2.7.1' with respect to project 'F:\projects\lucenenet\websites\apidocs\tools\tmp', targeting 'Any,Version=v0.0'
Gathering dependency information took 25.38 ms
Attempting to resolve dependencies for package 'vswhere.2.7.1' with DependencyBehavior 'Lowest'
Resolving dependency information took 0 ms
Resolving actions to install package 'vswhere.2.7.1'
Resolved actions to install package 'vswhere.2.7.1'
Retrieving package 'vswhere 2.7.1' from 'nuget.org'.
Adding package 'vswhere.2.7.1' to folder 'F:\projects\lucenenet\websites\apidocs\tools\tmp'
Added package 'vswhere.2.7.1' to folder 'F:\projects\lucenenet\websites\apidocs\tools\tmp'
Successfully installed 'vswhere 2.7.1' to F:\projects\lucenenet\websites\apidocs\tools\tmp
Executing nuget actions took 200.65 ms
Cleaning...
MSBuild path = C:\Program Files (x86)\Microsoft Visual Studio\2019\Community
MSBuild not found!
At F:\projects\lucenenet\websites\apidocs\docs.ps1:112 char:2
+ throw "MSBuild not found!"
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : OperationStopped: (MSBuild not found!:String) [], RuntimeException
+ FullyQualifiedErrorId : MSBuild not found!

Not sure where to go from here. Do we need to use MSBuild? Can this be done using dotnet.exe?

code samples

I had a thought about how we might automate the code samples more easily and reliably than a code converter, since the lack of using blocks will make what a code converter gives us irrelevant anyway. If the Java code sample block can be isolated as a single block of text, we could generate a hash for it and put that hash into a text (markdown?) file along with the sample, for example:

<hash>C34406A7F4070BC61B9256F6239E2B251CE691F83C2F4A6DD1ADC846FC9847A2</hash>
<code language="java">
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
// Store the index in memory:
Directory directory = new RAMDirectory();
// To store an index on disk, use this instead:
//Directory directory = FSDirectory.open("/tmp/testindex");
IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_CURRENT, analyzer);
IndexWriter iwriter = new IndexWriter(directory, config);
Document doc = new Document();
String text = "This is the text to be indexed.";
doc.add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.addDocument(doc);
iwriter.close();
// Now search the index:
DirectoryReader ireader = DirectoryReader.open(directory);
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.parse("text");
ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
assertEquals(1, hits.length);
// Iterate through the results:
for (int i = 0; i < hits.length; i++) {
Document hitDoc = isearcher.doc(hits[i].doc);
assertEquals("This is the text to be indexed.", hitDoc.get("fieldname"));
}
ireader.close();
directory.close();
</code>

Then these text files can be manually converted to C# and VB (the latter using the Roslyn online code converter) and can be committed to the lucenenet repo. A converted file may look something like:

<hash>C34406A7F4070BC61B9256F6239E2B251CE691F83C2F4A6DD1ADC846FC9847A2</hash>
<code language="c#">
Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_CURRENT);
// Store the index in memory:
using (Directory directory = new RAMDirectory())
// To store an index on disk, use this instead:
//using (Directory directory = FSDirectory.Open("/tmp/testindex"))
{
IndexWriterConfig config = new IndexWriterConfig(LuceneVersion.LUCENE_CURRENT, analyzer);
using (IndexWriter iwriter = new IndexWriter(directory, config))
{
Document doc = new Document();
string text = "This is the text to be indexed.";
doc.Add(new Field("fieldname", text, TextField.TYPE_STORED));
iwriter.AddDocument(doc);
}
// Now search the index:
using (DirectoryReader ireader = DirectoryReader.Open(directory))
{
IndexSearcher isearcher = new IndexSearcher(ireader);
// Parse a simple query that searches for "text":
QueryParser parser = new QueryParser(LuceneVersion.LUCENE_CURRENT, "fieldname", analyzer);
Query query = parser.Parse("text");
ScoreDoc[] hits = isearcher.Search(query, null, 1000).ScoreDocs;
Assert.AreEqual(1, hits.Length);
// Iterate through the results:
for (int i = 0; i < hits.Length; i++)
{
Document hitDoc = isearcher.Doc(hits[i].Doc);
Assert.AreEqual("This is the text to be indexed.", hitDoc.Get("fieldname"));
}
}
}
</code>
<code language="vb">
Dim analyzer As Analyzer = New StandardAnalyzer(LuceneVersion.LUCENE_CURRENT)
Using directory As Directory = New RAMDirectory()
Dim config As IndexWriterConfig = New IndexWriterConfig(LuceneVersion.LUCENE_CURRENT, analyzer)
Using iwriter As IndexWriter = New IndexWriter(directory, config)
Dim doc As Document = New Document()
Dim text As String = "This is the text to be indexed."
doc.Add(New Field("fieldname", text, TextField.TYPE_STORED))
iwriter.AddDocument(doc)
End Using
Using ireader As DirectoryReader = DirectoryReader.Open(directory)
Dim isearcher As IndexSearcher = New IndexSearcher(ireader)
Dim parser As QueryParser = New QueryParser(LuceneVersion.LUCENE_CURRENT, "fieldname", analyzer)
Dim query As Query = parser.Parse("text")
Dim hits As ScoreDoc() = isearcher.Search(query, Nothing, 1000).ScoreDocs
Assert.AreEqual(1, hits.Length)
For i As Integer = 0 To hits.Length - 1
Dim hitDoc As Document = isearcher.Doc(hits(i).Doc)
Assert.AreEqual("This is the text to be indexed.", hitDoc.[Get]("fieldname"))
Next
End Using
End Using
</code>

During doc generation, the hash can be re-generated from the Java code and checked against this file. If it has not changed, no change will be made and the code in the converted text file will be used in the documentation. If it has changed, then the Java code block can be inserted/appended to the text file, the hash updated to the new value, and a warning/log generated so the code changes can be manually propagated to the C# and VB code blocks; then we can manually remove the Java code block. We will need to do 2 passes to generate the docs if the original Java code changes, but since that will only happen if we upgrade to target a new version of Lucene, it won't be a common case. We should have the build process write warning messages to stdout and also to a log that gets uploaded as a build artifact, just to ensure we don't miss this during deployment. We could do the second stage offline:
And of course, if there were no code changes, then we can just skip 2 & 3 and deploy. We don't necessarily have to use Of course, it would be best if the end user had some way to switch between the VB and C# code samples in the generated documents, but for now we should focus on C# if VB is going to be too difficult or time consuming to deal with. The original doc could then have some specially constructed token that the doc generator knows how to use to grab the code sample from the text file and insert it into the right place in the generated HTML. Some more thought will need to be put into the exact layout and number of text files in relation to the number of code samples per generated document, but I am sure you can work that out. Maybe using a GUID as both a filename and a placeholder in the documentation is the appropriate way to go, but we don't want to use the hash because that may change over time.
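The hash-check step described in that workflow could be sketched roughly as follows. This is a hypothetical Python sketch, not the actual build tooling: the `<hash>` tag format matches the examples above, but the helper names and file layout are assumptions.

```python
import hashlib
import re


def extract_stored_hash(sample_file_text):
    """Pull the stored <hash> value out of a converted sample file."""
    m = re.search(r"<hash>([0-9A-Fa-f]+)</hash>", sample_file_text)
    return m.group(1).upper() if m else None


def java_block_hash(java_code):
    """Hash the isolated Java sample block so upstream changes can be detected."""
    return hashlib.sha256(java_code.encode("utf-8")).hexdigest().upper()


def sample_is_current(java_code, sample_file_text):
    """True when the stored hash still matches the current Java source block,
    meaning the converted C#/VB samples can be used as-is."""
    return extract_stored_hash(sample_file_text) == java_block_hash(java_code)
```

During generation, `sample_is_current` returning `False` would trigger the warning/log path: append the new Java block, update the stored hash, and flag the file for manual re-conversion.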
For the building, we can't use dotnet because the plugin system for docfx is netframework only. I've tried updating the project to use PackageReference but it still fails, and I guess netframework projects are just not supported by dotnet. But the problem is that I can see you are using an older revision. I fixed a bunch of the build stuff a couple of days ago. Your log output currently has
which is no longer logged. If you update to latest it hopefully should work.
Yes, you are right, I was using an older version. I had just realized that when I got your reply. I will try the latest from this branch and see if that helps.
I just submitted a PR to your branch. There were some issues with some of the arguments showing up in the I also added a job to The docs seem to build fine locally, but when I pushed to Azure DevOps there was a problem:
You can view the full logs at: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=154 Hopefully you can work out what is going on - perhaps a missing dependency? Also, I noticed that the markdown for the benchmark
Is this something that is required by the doc generator, or an errant commit?
Docs converter update
Nice, just got that merged. Regarding the strange blurb: this is a common format for metadata with markdown, which uses YAML in its headers. I think the term is "YAML Front Matter". Each document can contain a

For the build issues on the server, I think docfx fails right away; the first logs when docfx runs are:
AFAIK this is because the version of docfx being used requires the VS 2017 build tools to be installed... BUT I've just run some tests locally and now, for whatever reason, it works by passing the 2019 msbuild version in the env variable. I've also realized that docfx builds the sln in the background anyway, so the build script doesn't actually need to build the lucene sln at all, which will save some time. I've pushed changes for that and will check on the build.
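For reference, "YAML Front Matter" is a short YAML block delimited by `---` lines at the very top of a markdown file. A minimal hypothetical example (the exact keys are assumptions; `uid` is the kind of key DocFx-style tooling typically matches on):

```yaml
---
uid: Lucene.Net.Benchmarks
summary: Overview documentation for the benchmark package.
---
```

The generator reads these keys as metadata and the rest of the file as ordinary markdown.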
…lucenenet into docs-converter-update
Looks like the build server doesn't execute with changes based on PRs, so you might have to give it a nudge and see how it goes.
That would be because I haven't investigated setting up the PR options yet. I pulled down your changes and pushed them up to Azure DevOps. This time it failed much quicker: https://dev.azure.com/LuceneNET-Temp/Lucene.NET/_build/results?buildId=155 BTW - You could potentially setup your own Azure DevOps account, add a Lucene.NET project to it pointing to your GitHub fork, and use the
You will probably want to set:
We will probably end up changing the exact location the doc generation happens in the pipeline, but for now this will get you started so you can work out how to get it running. Once it runs all the way through, the files will be zipped and added as a
Cool, I have some investigation to do. I 'think' it's because the VS 2017 build tools are required to be installed with this docfx version. The latest docfx version apparently doesn't have this requirement, but I ran into some trouble getting it to work on the latest version, so first I need to figure out that issue and then go from there. I'll see what I can do on Monday - but I'm on holidays from next Wed until the beginning of Sept, so just a heads up on that :)
No problem. Let's try to get this PR merged before then.
@NightOwl888 I've commented on a currently logged DocFx issue here: dotnet/docfx#4869 Any chance you can configure the build server to run VS 2017 for this particular build step and see if that works?
Actually, it is pretty easy. Just change the

- job: Docs
  pool:
    vmImage: 'windows-2019'

to:

- job: Docs
  pool:
    vmImage: 'vs2017-win2016'
Oops. Submitted that before it was done by accident. See my edit above.
While going through and cleaning up the code in the test framework, I was reminded about an important C# syntax that is not demonstrated on the home page (https://lucenenet.apache.org), but probably should be.

// Add to the index
var source = new
{
Name = "Kermit the Frog",
FavouritePhrase = "The quick brown fox jumps over the lazy dog"
};
var doc = new Document();
// StringField indexes but doesn't tokenise
doc.Add(new StringField("name", source.Name, Field.Store.YES));
doc.Add(new TextField("favouritePhrase", source.FavouritePhrase, Field.Store.YES));
writer.AddDocument(doc);
writer.Flush(triggerMerge: false, applyAllDeletes: false);

can be simplified to:

// Add to the index
var source = new
{
Name = "Kermit the Frog",
FavoritePhrase = "The quick brown fox jumps over the lazy dog"
};
Document doc = new Document
{
// StringField indexes but doesn't tokenize
new StringField("name", source.Name, Field.Store.YES),
new TextField("favoritePhrase", source.FavoritePhrase, Field.Store.YES)
};
writer.AddDocument(doc);
writer.Flush(triggerMerge: false, applyAllDeletes: false);

Not urgent, but could we make sure this is updated eventually? VS2019 is smart enough to give you a hint to do this, but it would be great if the main samples showed how much we have bent the Java-ness toward .NET.
Side note: could we normalize it to US English?
Or, just change the samples so they are culture neutral instead; then there doesn't need to be a debate about it.
🎉 That worked on my personal Azure Pipelines build. I've pushed a change to the yaml file. Just FYI, you will notice a boatload of warnings during the build (approx 665); that's because some of the cross links in the docs and in the other md files have issues. I'll have to go through and fix them up, probably on a case by case basis with our converter tool. In some cases it's probably just namespace imports. In many other cases it's because the docs are cross-linking types that are in .NET, which we don't link to, so I think we can just ignore those, but I'll see if I can suppress the warnings for those at some stage.
... will push an update for this soon. I won't have time before my hols to do anything with the version number stuff but can look into that when I'm back (2nd week of Sept). Re: the code samples idea: DocFx has a nifty way to replace or extend parts of docs with external files. I sort of mentioned this above with the
…e, but slightly better)
Alright. Just a heads up, I have made lots of changes to the docs in the test framework (but nowhere else), so be sure to skip that module. I am trying to get to a point where I can merge it so we can sync up. Some of the issues with the warnings are due to the fact that some types don't exist in .NET Standard 1.6. I found a workaround that we might be able to use for
Unfortunately, that will only work for classes. Some of the warnings are due to methods that don't exist in .NET Standard 1.6, and I am not sure what to do in that case. It may be that duplicating the entire documentation that refers to the methods is the only choice.
@NightOwl888 I've pushed a bunch of changes:
Looking good.
"Packages" is sort of a Java-ism. I think it would make more sense to call them libraries, or the more generic "modules". "Libraries" is probably the most specific and indicates they are not executable.
Sounds good, will update and push. |
Sorry, once again my message was posted before I was done with it. What happened to ENTER always meaning CRLF instead of submit?
UIMA - probably not a thing. If it is possible, just leave it there, but make it generate HTML comments so it is not visible.

Demo - I thought of it, but forgot to mention that these are all included in Lucene-CLI. What I'd like to do is pull some of the documentation from that module into the CLI docs (maybe I could just do that manually). I was also thinking that it would be simpler if, instead of linking to the code in the repo, we just grab it and put it directly into the docs (MSDN style). Each file is a standalone console app and would just need to be pasted inside of the docs. Come to think of it, I already put in some special tokens so only the code sample is grabbed and none of the surrounding junk that isn't important for the sample. They are used by the CLI tool itself to display the code on screen. So, basically, instead of both Demo and CLI, we should just have CLI. But no problem keeping the Tools category in case it grows.
Scratch that, it looks like the files don't have any tokens. The entire contents can be grabbed and inserted, except for maybe the license header (I think the CLI supports exclusion tokens, but I ended up not using them). Is there some kind of placeholder we need in the doc files so you can grab the latest code from the demo (looks like the demos could use some updates to the cleaner APIs)? The demos map one-to-one to each one of the files in Lucene.Net.Demo. Would those "blurbs" happen to look exactly like the one in the benchmark docs, and if different, could you go through and link them? Then I can go through the docs and pull out what is needed and arrange the text around the code as appropriate.
I've pushed another update: hides UIMA and removes the Since Benchmark isn't a lib and it's part of the CLI, I was going to move that under tools too? I'm trying to get my head around exactly what you want, but TBH I'm not quite sure :) I understand we have the CLI Demo stuff here, for example: https://lucenenetdocs.azurewebsites.net/cli/demo/simple-facets.html Based on your comments above I'm just unsure what the end result is that you are after? I don't have much more time today and then I'm overseas for 3 weeks, but I'll see what I can do in the next hr or 2.
Sorry if I am a bit scatterbrained :).
Actually, it is both a library and a tool. The library is available if someone wants to extend it to do customized benchmarks. So, it looks good as-is.
Basically, I am hoping to get some of the documentation from the original JavaDocs into markdown (which I can do manually), and get the code into the same markdown files (which I am hoping you can achieve). So, the end user will basically have 4 ways to get the demo code:
The on-screen and export from the CLI embed the code as resources, so it is always the same version as the Once that is achieved, we really have no need for the original Demo documentation and can remove the link/generation for it. Is that clear?
Gotcha. So what I'll need to do is: take all of the *.cs files in the Lucene.Net.Demo project and automatically output the code into .md files that we can use to directly display this code in our docs. For example, on this page: https://lucenenetdocs.azurewebsites.net/api/Lucene.Net.Demo/Lucene.Net.Demo.Facet.SimpleFacetsExample.html ... we'd want to have the actual cs code shown there. The user could then still click on "View Source" to navigate to the source file in GitHub. Correct? If so, we can definitely do that by using the DocFx "overwrite" feature.
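That export step could be sketched roughly like this. This is a hypothetical Python sketch only: the directory paths, the uid-from-folder-structure convention, and the overwrite-file layout are all assumptions made to illustrate the idea, not the actual build script.

```python
import pathlib

# Hypothetical locations; adjust to the real repo layout.
DEMO_DIR = pathlib.Path("src/Lucene.Net.Demo")
OUT_DIR = pathlib.Path("websites/apidocs/overwrites")


def overwrite_markdown(uid, csharp_code):
    """Build a DocFx-style 'overwrite' markdown document that injects the
    demo source into the generated API page matched by the given uid."""
    fence = "`" * 3  # build the fence so this sketch doesn't nest literal fences
    return (
        "---\n"
        f"uid: {uid}\n"
        "---\n\n"
        f"{fence}csharp\n"
        f"{csharp_code.rstrip()}\n"
        f"{fence}\n"
    )


def export_demos(demo_dir=DEMO_DIR, out_dir=OUT_DIR):
    """Walk the demo project and write one overwrite .md file per .cs file."""
    out_dir = pathlib.Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for cs_file in sorted(pathlib.Path(demo_dir).rglob("*.cs")):
        # Assume the type's uid follows the folder structure, e.g.
        # Facet/SimpleFacetsExample.cs -> Lucene.Net.Demo.Facet.SimpleFacetsExample
        rel = cs_file.relative_to(demo_dir).with_suffix("")
        uid = "Lucene.Net.Demo." + ".".join(rel.parts)
        target = out_dir / (uid + ".md")
        target.write_text(overwrite_markdown(uid, cs_file.read_text(encoding="utf-8")),
                          encoding="utf-8")
        written.append(target)
    return written
```

Running this as a pre-build step would keep the embedded code in sync with the demo sources on every docs build, while "View Source" still links to GitHub.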
Yes, that is correct.
You don't have to do this today, but one thing that I think is crucial for the .NET ecosystem is to put the books that have been published about Lucene and Lucene.NET on the home page. A few people have asked for a "user manual" of sorts on the user and dev mailing lists, and I think that is about as close as our team can get to answering that request. There is even a "Lucene 4 cookbook" that I think should go front and center. Not sure if they have updated these books since then.
to be completed
note

I have uploaded the result of this to our temporary docs site https://lucenenetdocs.azurewebsites.net/
@NightOwl888 just regarding your comments here #206 (comment)
home page/package names
Can you elaborate a bit more on this:
... I've moved the Kuromoji and SmartCn headings outside of the big Analysis heading since they are separate packages and linked to them properly from the home page, is this the type of thing you mean? I also fixed the ICU link on the home page.
(There's currently still an issue with docfx when there are overlapping namespaces between packages; I haven't yet researched how to fix this, but I can - the docfx team are very responsive.)
build/build times
A full clean build takes about 20 mins on my machine, so it should be ok for the build server. On the build server you'd want to run the powershell:
which is shorthand for
The output website is in
./websites/apidocs/_site
versioning
I know that in the JavaDocToMarkdownConverter there's a TODO for passing in a tag/version, which is for the method
RepoLinkReplacer
... but this method is looking for links in files like overview.md with this syntax src-html, but as far as I can see there are only 2 places in all of the source that contain these types of links, which is in the Lucene.Net/overview.md file, which is supposed to link to some demos. For this method it would probably just be easier to fix this file, unless there are more inline links I'm unsure of. Apart from that, is there another area where we need to have the version number/tag injected?
update: just noticed a version link here: https://lucenenetdocs.azurewebsites.net/index.html#reference-documents ... so we'd need to pass a version into the build for that one. Do you know of others?
hosting
The docs are currently just hosted on my azure subscription which is also why it has that temporary dns name. I'm fine to leave it there for any length of time but we should look at getting these hosted properly, it's just static files so nothing special.