doc-topic distr. #19
The problem seems to stem from WriteASCIIDoubleMatrix. Decimal numbers are written with commas both as decimal separators and as column separators. This adds an extra column for each printed value, and every other column gets the value 0.
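For illustration, a minimal sketch of how that separator collision can arise when doubles are formatted under a comma-decimal default locale (the class below is hypothetical, not the project's actual writer):

```java
import java.util.Locale;

public class SeparatorCollisionDemo {
    public static void main(String[] args) {
        // Simulate the reporter's environment: a Swedish default locale,
        // where the decimal separator is a comma
        Locale.setDefault(new Locale("sv", "SE"));

        double[] row = {0.25, 0.75};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < row.length; i++) {
            if (i > 0) sb.append(',');                 // column separator
            sb.append(String.format("%.2f", row[i]));  // default locale -> "0,25"
        }
        // Prints "0,25,0,75": a 2-column row that a CSV reader splits into
        // 4 columns, where every other column reads as 0
        System.out.println(sb);
    }
}
```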
Yes, I noticed this bug too, and 9.2.0 has a fix for parts of the problem, but I will have to double-check whether this case is also solved by that fix...
9.2.0 should solve this problem.
The test for WriteASCIIDoubleMatrix now passes, but the problem unfortunately remains for me, since String.format() depends on the default Locale (which for me is SE).
Yes, it is due to the locale, and it is unfortunately a bit of a mess right now; the combination of the Locale and the option of selecting the separator makes it complicated... I'll have a look and see if I can redesign it into a better solution.
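One possible direction, sketched under the assumption that values are formatted with String.format: pass an explicit locale so the decimal separator is always '.', whatever the JVM default is:

```java
import java.util.Locale;

public class LocaleSafeFormat {
    // Formatting with an explicit locale guarantees '.' as the decimal
    // separator, regardless of the JVM's default locale
    static String formatValue(double v) {
        return String.format(Locale.ROOT, "%.6f", v);
    }

    public static void main(String[] args) {
        Locale.setDefault(new Locale("sv", "SE")); // would break a plain String.format
        System.out.println(formatValue(0.123456)); // prints "0.123456" either way
    }
}
```

If the column separator stays user-selectable, the writer could additionally reject or quote a separator that equals the decimal separator, covering the combination mentioned above.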
The output saved by "save_doc_theta_estimate = true" has the wrong dimensions, and the output also shows counts rather than proportions.
This is what the README.txt file says:
Save a file with document topic theta estimates (will not include zeros)
Unlike Phi means, which are sampled with thinning, theta means are just a simple
average of the topic counts in the last iteration divided by the number of
tokens in the document, thus there is no theta_burnin or theta_thinning
save_doc_theta_estimate = true
doc_topic_theta_filename = doc_topic_theta.csv
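Going by the README's description above, the estimate is just a per-document normalization of the last iteration's topic counts, which should yield exactly one column per topic; a minimal sketch of that computation (class and array names are illustrative, not the project's actual code):

```java
public class ThetaEstimate {
    // theta[k] = topicCounts[k] / total tokens in the document, using the
    // topic counts from the last iteration (no burn-in or thinning involved)
    public static double[] estimate(int[] topicCounts) {
        int totalTokens = 0;
        for (int c : topicCounts) totalTokens += c;

        double[] theta = new double[topicCounts.length];
        if (totalTokens == 0) return theta; // empty document -> all zeros
        for (int k = 0; k < topicCounts.length; k++) {
            theta[k] = (double) topicCounts[k] / totalTokens;
        }
        return theta; // one value per topic, summing to 1
    }
}
```

With 200 topics this gives a 200-column row per document, each row summing to one.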
I have a model with 200 topics, but the doc_theta_means file has 400 columns and the number of documents as rows. Why is the number of columns double the number of topics in the model?
Config file:
configs = Spalias
no_runs = 1
[Spalias]
title = PCPLDA
description = 200 topics with alpha 0.2 and extended priorlist
dataset = data/fb_politics_news.txt
scheme = spalias_priors
seed = 1904
topics = 200
alpha = 0.2
beta = 0.01
iterations = 1500
rare_threshold = 0
batches = 4
topic_batches = 4
topic_interval = 500
start_diagnostic = 200
debug = 0
#log_type_topic_density = true
log_document_density = true
log_phi_density = true
phi_mean_filename = phi-mean.csv
phi_mean_burnin = 20
phi_mean_thin = 5
stoplist = nsc-test/PartiallyCollapsedLDA-8.4.0/stoplist-empty.txt
save_vocabulary = true
vocabulary_filename = lda_vocab.txt
topic_prior_filename = wfw/bash/priors/k200_v7.txt
keep_connecting_punctuation = true
log_topic_indicators = true
save_sampler = false
save_doc_theta_estimate = true
doc_topic_theta_filename = doc_topic_theta.csv
save_phi_mean = true
I am attaching an image of parts of the output so you can see what it looks like.