Problems with exported email address. #16

Closed
steinwaywhw opened this issue Jun 24, 2016 · 10 comments

Comments

@steinwaywhw

Hi @icy, thanks for this awesome tool.

I noticed that the messages it exports (from the public group ats-session-types) contain only email addresses like this: <[email protected]>, not full email addresses.

I first tried the same request in Chrome, no full email.
I then tried the request with a suffix &authuser=0 in Chrome and it then shows the full email address.
I then tried the request with a suffix &authuser=0 in Chrome without logging in, no full email address.

I then thought it might be a cookie problem, so I exported my cookies to use with wget, plus the &authuser=0 suffix. It still doesn't work. I tried curl with cookies too, and that doesn't work either.

Do you have any experience with such things?

Attached is the URL I request:

https://groups.google.com/forum/message/raw?msg=ats-session-types/1qFgIUe0rww/1U_5LsjTAwAJ&authuser=0
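
For illustration, the request described above can be sketched in Python. This is a minimal sketch, not part of google-group-crawler: the function names, the default User-Agent value, and the `cookies.txt` path are assumptions, and the URL shape is simply copied from the request quoted in this comment. It only shows the mechanics of sending exported cookies; as reported in this thread, such a request may still come back with masked addresses.

```python
import http.cookiejar
import urllib.request

# Endpoint copied from the URL quoted in this thread.
RAW_ENDPOINT = "https://groups.google.com/forum/message/raw"


def build_raw_url(group, topic_id, message_id, authuser=None):
    """Build a raw-message URL in the same form as the one quoted above."""
    url = f"{RAW_ENDPOINT}?msg={group}/{topic_id}/{message_id}"
    if authuser is not None:
        # Optional suffix that was tried in the browser experiments.
        url += f"&authuser={authuser}"
    return url


def fetch_raw_message(url, cookies_path, user_agent="Mozilla/5.0"):
    """Fetch url while sending cookies from a Netscape-format cookies.txt.

    Hypothetical helper: cookies_path and user_agent are placeholders.
    """
    jar = http.cookiejar.MozillaCookieJar(cookies_path)
    # ignore_discard keeps session cookies, which have no expiry in the file.
    jar.load(ignore_discard=True, ignore_expires=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with opener.open(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `build_raw_url("ats-session-types", "1qFgIUe0rww", "1U_5LsjTAwAJ", authuser=0)` reproduces the URL above; passing that URL and a cookie file to `fetch_raw_message` would perform the actual download.
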
@icy
Owner

icy commented Jun 25, 2016

Hi @steinwaywhw,

Are you sure you followed https://github.com/icy/google-group-crawler#private-group ?

Thanks

@steinwaywhw
Author

Hi @icy, I used https://chrome.google.com/webstore/detail/cookiestxt/njabckikapfpffapmjgojcnbfjonfjfg?hl=en to export my cookies from Chrome, and specified the wget options as instructed. I tried different user agent strings to match my Chrome. I also tried --load-cookies blabla as shown in the man page, and --load-cookies=blabla as shown in --help, and I even tried curl with the -b option. None of them worked. I have no idea what's going wrong. :(
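
One way to rule out a broken export is to check what the cookie file actually contains before handing it to wget or curl. The following is a minimal Python sketch under stated assumptions: the file is in the Netscape `cookies.txt` format that such browser extensions typically produce, and `google_cookie_names` is a hypothetical helper name. It does not tell you which cookies Google requires; it only lists what the file holds for google.com.

```python
import http.cookiejar


def google_cookie_names(cookies_path):
    """Return sorted names of cookies scoped to google.com or a subdomain.

    Raises http.cookiejar.LoadError if the file is not in Netscape format.
    """
    jar = http.cookiejar.MozillaCookieJar(cookies_path)
    # Load everything, including session cookies and expired entries,
    # so the result reflects the file as exported.
    jar.load(ignore_discard=True, ignore_expires=True)
    names = set()
    for cookie in jar:
        domain = cookie.domain.lstrip(".").lower()
        if domain == "google.com" or domain.endswith(".google.com"):
            names.add(cookie.name)
    return sorted(names)
```

An empty result, or a `LoadError`, would point at the export step rather than at the wget or curl options.
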

@icy
Owner

icy commented Jun 26, 2016

Hi @steinwaywhw,

I have looked into this and found bad news: Google will not expose the original email address in any case unless you're both a member and a manager of the group. Google changed this behavior since my latest release of the script.

I'm so sorry but this is a Google problem. I will update the README to avoid any future confusion.

Thanks,

@steinwaywhw
Author

Hi @icy, sorry, I forgot to mention that. I am actually the owner of that Google group. It works in my browser (with a valid cookie, I guess), but not from your wget scripts, even though the requests come from the same IP (my own machine). I think either something is wrong with the cookie, or they have some way to distinguish a real browser from other clients like curl and wget.

But anyway, I exported my user list and ran a separate Ruby script to clean up the "..." from the email addresses. It worked well enough for me. Here's the actual script, which I hope will help: https://glot.io/snippets/efyndm7qs5. The "load_users" and "match_user" methods do the actual work. You can have a look.
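
The idea of restoring masked addresses from an exported member list can be sketched as follows. This is a Python illustration, not the Ruby script linked above: the name `match_user` is borrowed from this comment, but the implementation is an assumption, as is the masked format (a visible prefix of the local part, then "...", then the unchanged domain).

```python
def match_user(masked, users):
    """Return the single full address consistent with masked, or None.

    masked is assumed to look like "prefix...@domain"; users is a list
    of full addresses taken from the group's exported member list.
    """
    masked = masked.strip()
    local, sep, domain = masked.rpartition("@")
    if not sep or not local.endswith("..."):
        return masked  # not masked, nothing to restore
    prefix = local[:-3].lower()
    domain = domain.lower()
    candidates = []
    for user in users:
        user_local, user_sep, user_domain = user.strip().rpartition("@")
        if (user_sep
                and user_domain.lower() == domain
                and user_local.lower().startswith(prefix)):
            candidates.append(user.strip())
    # Refuse to guess when zero or several members fit the visible prefix.
    return candidates[0] if len(candidates) == 1 else None
```

Returning `None` for ambiguous prefixes is a deliberate choice in this sketch: attributing a message to the wrong member seems worse than leaving the address masked.
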

@icy
Owner

icy commented Jun 27, 2016

Hi @steinwaywhw,

Thanks a lot. The script is very useful and I will add it to the contrib/ directory.

A minor note: the cookie doesn't affect messages that have already been downloaded, so you need to clean up your local directory after you set up the cookie data.
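
As a rough illustration of that clean-up step, the Python sketch below lists already-downloaded files that still contain a masked address, so they can be removed and fetched again. It is not part of the crawler: the helper name, the masked-address pattern ("prefix...@domain"), and the choice to scan every file under a directory you pass in are all assumptions.

```python
import re
from pathlib import Path

# Assumed shape of a masked address: visible prefix, "...", then the domain.
MASKED_ADDRESS = re.compile(r"[\w.+-]+\.\.\.@[\w-]+(?:\.[\w-]+)+")


def files_with_masked_addresses(directory):
    """Return sorted paths of files under directory containing a masked address."""
    hits = []
    for path in sorted(Path(directory).rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        if MASKED_ADDRESS.search(text):
            hits.append(path)
    return hits
```

The function only reports files; deleting them is left to the caller, so the list can be reviewed first.
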

There may be something wrong with the cookie handling, and I should have written a test mechanism. I will add that soon.

Thanks again for your patience and the script.

@pacharanero

Hi @icy, I'm very thankful for your GG export script, which is used extensively for Google Group to Discourse migrations.

I do, however, need to point out that the externally contributed code in https://github.com/icy/google-group-crawler/blob/master/contrib/fix_dot_in_users_emails.rb is almost entirely code taken from the Discourse open source forum project (link), and you might need to amend the author and attribution details accordingly, change the licensing details, or consider removing the code from your repo.

Thanks again for icy/google-group-crawler and hope you don't mind me pointing out the above.

@erlend-sh

Hmm, right. Seems appropriate to include @eviltrout and @riking in the attribution note. As for the license, I think it'd suffice to include the GPL v2 license inline, just for that specific file? And maybe include a note in the README.md about this exception.

All of that being said, I don't think there's any reason why any of the Discourse import scripts have to be GPL v2 licensed, so I'll look into the possibility of changing these to the MIT license so this type of code sharing will be easier in the future.

@pacharanero

There is also, I have just noticed, a fair bit of my googlegroups.rb importer in there, also GPLv2, also unattributed.

@icy
Owner

icy commented Apr 8, 2017

Hi @pacharanero and @erlend-sh,

Thanks a ton for your feedback. It was definitely my mistake not to provide enough information about the script. I'm going to fix that. I will keep you posted.

Have a great day.

@icymatter

Sorry for my belated response. I will update the script information today. Thanks for your patience.
