csv data contains malformed rows for song 6'1 #4

Open
colinmorris opened this issue May 26, 2020 · 1 comment
Comments

@colinmorris
The row in azlyrics_lyrics_l.csv looks like:

"liz phair","https://www.azlyrics.com/p/phair.html","6'1"","https://www.azlyrics.com/lyrics/lizphair/61.html","i bet you fall in bed[....]"

There's an unescaped double-quote in the song title field, which confuses the parser in Python's csv module (and probably most others). Per the CSV RFC (RFC 4180):

If double-quotes are used to enclose fields, then a double-quote
appearing inside a field must be escaped by preceding it with
another double quote. For example:
"aaa","b""bb","ccc"

(btw, thank you for publishing this dataset! It's sorely needed.)
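
To make the failure concrete, here is a minimal sketch that feeds the row quoted above to Python's `csv.reader`, next to a copy with the quote in the title doubled as the RFC describes. The helper name `parse_one` is made up for this illustration, and the lyrics field stays elided exactly as in the report.

```python
import csv
import io

# The row as quoted in this issue (lyrics elided as in the original report).
malformed = (
    '"liz phair","https://www.azlyrics.com/p/phair.html","6\'1"",'
    '"https://www.azlyrics.com/lyrics/lizphair/61.html",'
    '"i bet you fall in bed[....]"\n'
)

# Same row, but with the double quote inside the title escaped by doubling it.
escaped = malformed.replace('"6\'1"",', '"6\'1""",')


def parse_one(line, **kwargs):
    """Hypothetical helper: parse a single CSV line and return its fields."""
    return next(csv.reader(io.StringIO(line), **kwargs))


bad_row = parse_one(malformed)   # lenient parsing silently merges fields
good_row = parse_one(escaped)    # parses into the intended five fields

print(len(bad_row), len(good_row))
print(good_row[2])

# With strict=True the reader raises csv.Error instead of guessing,
# which is one way to detect rows like this one.
try:
    parse_one(malformed, strict=True)
    strict_error = None
except csv.Error as exc:
    strict_error = exc
print(strict_error)
```

In the default lenient mode the malformed row does not come back as five fields, because the title and the song URL get merged. Passing `strict=True` turns the same input into an explicit error.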

@AlbertSuarez
Owner

Hey @colinmorris, thanks for letting me know, and sorry for the delay. I don't know why GitHub didn't notify me about it.
You are completely right about the issue. It happens because there is no pre-processing step that skips problematic characters such as the double quote ("). I'll try to submit a PR fixing this. Thanks!
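
One possible shape for such a fix, sketched under assumptions: rather than skipping the character, the row could be written with Python's `csv.writer`, which doubles embedded quotes automatically. This is not the dataset's actual export code. The title value `6'1"` is inferred from the row quoted in this issue, and the lyrics text stays elided as in the report.

```python
import csv
import io

# Field values inferred from the row quoted in the issue (assumption:
# the real title is 6'1" with a trailing double quote).
row = [
    "liz phair",
    "https://www.azlyrics.com/p/phair.html",
    "6'1\"",
    "https://www.azlyrics.com/lyrics/lizphair/61.html",
    "i bet you fall in bed[....]",
]

buf = io.StringIO()
# QUOTE_ALL matches the style of the existing file (every field quoted);
# embedded double quotes are escaped by doubling them (doublequote=True
# is the default).
writer = csv.writer(buf, quoting=csv.QUOTE_ALL)
writer.writerow(row)
line = buf.getvalue()
print(line)

# Reading the line back yields the original five fields unchanged.
round_trip = next(csv.reader(io.StringIO(line)))
```

Escaping keeps the title intact, whereas stripping the quote would change the data. Which trade-off suits the dataset is up to the maintainer.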

@AlbertSuarez AlbertSuarez self-assigned this Oct 13, 2020
@AlbertSuarez AlbertSuarez added the bug Something isn't working label Oct 13, 2020