This is a parser for the Netscape Bookmarks file format, which browsers generate when exporting bookmarks to HTML. It parses the file and returns an object representing it, in which the bookmark structure of folders and shortcuts is also represented as objects. The folder tree can be navigated using dot notation. It can also create a Netscape Bookmarks file.
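To make that structure concrete: in a typical export, a folder is an <H3> heading followed by a <DL> list of its contents, and a shortcut is an <A HREF=...> link. The following is a minimal sketch of how such a file maps onto nested folder and shortcut objects, using only Python's standard html.parser. Folder, Shortcut and SketchParser are hypothetical stand-ins, not the library's classes (those are described in the wiki's Classes section), and the sketch ignores every attribute except HREF.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional, Union


@dataclass
class Shortcut:  # hypothetical stand-in for a bookmark link
    name: str = ""
    href: str = ""


@dataclass
class Folder:  # hypothetical stand-in for a bookmark folder
    name: str = ""
    items: List[Union["Folder", Shortcut]] = field(default_factory=list)  # in file order

    @property
    def children(self) -> List["Folder"]:
        return [item for item in self.items if isinstance(item, Folder)]

    @property
    def shortcuts(self) -> List[Shortcut]:
        return [item for item in self.items if isinstance(item, Shortcut)]


class SketchParser(HTMLParser):
    """Builds a Folder tree: <H3> names a folder, <DL> opens its contents, <A> is a shortcut."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Folder(name="root")
        self.stack = [self.root]                  # stack[-1] is the folder being filled
        self.pending: Optional[Folder] = None     # folder named by <H3>, waiting for its <DL>
        self.current: Optional[Union[Folder, Shortcut]] = None  # item whose name is being read

    def handle_starttag(self, tag, attrs):        # html.parser lowercases tag names
        if tag == "h3":
            self.pending = Folder()
            self.stack[-1].items.append(self.pending)
            self.current = self.pending
        elif tag == "a":
            shortcut = Shortcut(href=dict(attrs).get("href") or "")
            self.stack[-1].items.append(shortcut)
            self.current = shortcut
        elif tag == "dl" and self.pending is not None:
            self.stack.append(self.pending)       # the root <DL> has no <H3>, so it is skipped
            self.pending = None

    def handle_endtag(self, tag):
        if tag in ("h3", "a"):
            self.current = None
        elif tag == "dl" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if self.current is not None:
            self.current.name += data


SAMPLE = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3 PERSONAL_TOOLBAR_FOLDER="true">Bookmarks bar</H3>
    <DL><p>
        <DT><A HREF="https://www.python.org/">Python</A>
    </DL><p>
    <DT><A HREF="https://example.com/">Example</A>
</DL><p>
"""

sketch = SketchParser()
sketch.feed(SAMPLE)
root = sketch.root
print(root.items[0].name)                   # Bookmarks bar
print(root.children[0].shortcuts[0].href)   # https://www.python.org/
print(root.shortcuts[0].name)               # Example
```

The dot-notation navigation (root.children[0].shortcuts[0].href) falls out naturally once folders hold their items as plain attributes.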
Run this on the command line (you might need to run it as administrator on Windows):
pip install git+https://github.com/FlyingWolFox/Netscape-Bookmarks-File-Parser.git
To update, add the --upgrade flag to the end.
Import the classes and the parser:
from NetscapeBookmarksFileParser import *
from NetscapeBookmarksFileParser import parser # if you want to parse a file
If you want to create a file, import the creator as well:
from NetscapeBookmarksFileParser import creator # if you want to create a file
Then:
bookmarks = NetscapeBookmarksFile(file).parse()
where file is a string with the file contents or a file object opened with open(), e.g.:
with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file).parse()
or:
with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file.read()).parse()
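Accepting either a string or an open file usually comes down to normalizing the input before parsing. A minimal sketch of that idea; read_source is a hypothetical helper, not part of the library, which may handle this differently:

```python
import io
from typing import TextIO, Union


def read_source(file: Union[str, TextIO]) -> str:
    """Hypothetical helper: return the HTML text from a string or a file object."""
    if isinstance(file, str):
        return file          # already the file contents, not a path
    return file.read()       # a file object, e.g. one returned by open()


html_text = "<TITLE>Bookmarks</TITLE>"
# Both call styles give the parser the same text to work on.
print(read_source(html_text) == read_source(io.StringIO(html_text)))  # True
```

Note that a plain string is treated as the contents, not as a file name, matching the description above.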
If you want to create a file, build the bookmark structure and call create_file().
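Creating a file amounts to walking the folder tree and writing each folder as an <H3> heading followed by a <DL> list, and each shortcut as an <A> link. A minimal sketch of that idea, using hypothetical Folder and Shortcut stand-ins and a hypothetical build_file function; it is not the library's create_file(), which is described on the wiki's Creator page:

```python
from dataclasses import dataclass, field
from html import escape
from typing import List, Union


@dataclass
class Shortcut:  # hypothetical stand-in
    name: str
    href: str


@dataclass
class Folder:  # hypothetical stand-in
    name: str
    items: List[Union["Folder", Shortcut]] = field(default_factory=list)


HEADER = [
    "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
    '<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=UTF-8">',
    "<TITLE>Bookmarks</TITLE>",
    "<H1>Bookmarks</H1>",
]


def folder_lines(folder: Folder, depth: int = 0) -> List[str]:
    """Write a folder's <DL> list, recursing into sub-folders with deeper indentation."""
    pad = "    " * depth
    lines = [pad + "<DL><p>"]
    for item in folder.items:
        if isinstance(item, Folder):
            lines.append(f"{pad}    <DT><H3>{escape(item.name)}</H3>")
            lines.extend(folder_lines(item, depth + 1))
        else:
            lines.append(f'{pad}    <DT><A HREF="{escape(item.href)}">{escape(item.name)}</A>')
    lines.append(pad + "</DL><p>")
    return lines


def build_file(root: Folder) -> str:
    """Hypothetical creator: header lines followed by the root folder's contents."""
    return "\n".join(HEADER + folder_lines(root)) + "\n"


root = Folder("root", [
    Folder("Bookmarks bar", [Shortcut("Python", "https://www.python.org/")]),
    Shortcut("Example", "https://example.com/"),
])
print(build_file(root))
```

Names and URLs are HTML-escaped so that characters such as & or " cannot break the generated markup.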
To learn about the classes the parser and the creator work with, see the Classes section in the wiki.
from NetscapeBookmarksFileParser import *
from NetscapeBookmarksFileParser import parser
with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file).parse()
root_folder = bookmarks.bookmarks
print(bookmarks.title)  # print the file's title
print(root_folder.items[0].name)  # print the name of the first item in the root folder
print(root_folder.shortcuts[0].href)  # print the URL of the first shortcut in the root folder
print(root_folder.children[0].personal_toolbar)  # print whether the first child folder is the Bookmarks Toolbar
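Since folders expose children and shortcuts, the whole tree can be visited recursively. The sketch below uses minimal hypothetical stand-in classes with only the attribute names used in the example above (name, href, children, shortcuts), so it runs without the library; assuming the real classes expose the same attributes, walk(bookmarks.bookmarks) should work the same way.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Shortcut:  # stand-in with only the attributes used below
    name: str
    href: str


@dataclass
class Folder:  # stand-in with only the attributes used below
    name: str
    children: List["Folder"] = field(default_factory=list)
    shortcuts: List[Shortcut] = field(default_factory=list)


def walk(folder: Folder, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (folder path, URL) for every shortcut, depth first."""
    here = f"{path}/{folder.name}" if path else folder.name
    for shortcut in folder.shortcuts:
        yield here, shortcut.href
    for child in folder.children:
        yield from walk(child, here)


root = Folder(
    "root",
    children=[Folder("Bookmarks bar", shortcuts=[Shortcut("Python", "https://www.python.org/")])],
    shortcuts=[Shortcut("Example", "https://example.com/")],
)
for folder_path, url in walk(root):
    print(folder_path, url)
# root https://example.com/
# root/Bookmarks bar https://www.python.org/
```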
The parser behaves like a browser: it ignores most errors and warns about some missing tags. However, if a folder has an opening <body> but no closing </body>, an exception is raised. Since Netscape Bookmarks files are usually generated by browsers when exporting bookmarks to HTML, these warnings and exceptions should be rare. This parser was mainly based on Microsoft's documentation of the Netscape Bookmarks file format, but also on file examples (here, here, here and here) and on my own browser exports (test/test.html is one of them).
Some less common attributes and items might not be supported; see the Attributes Supported and Items Supported sections in the wiki. If you want to know what a file needs to contain to be accepted by the parser, read the Netscape Bookmarks File Format page in the wiki.
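The policy described above (tolerate most problems, but refuse an unclosed folder) can be sketched by counting opening and closing tags. check_folders is a hypothetical helper, not the library's code; it defaults to the <DL> tag that wraps folder contents in typical browser exports, but the tag name is a parameter.

```python
import re
import warnings


def check_folders(html: str, tag: str = "DL") -> None:
    """Hypothetical check: warn about stray closing tags, raise if a folder is never closed."""
    pattern = re.compile(rf"<(/?){re.escape(tag)}\b", re.IGNORECASE)
    depth = 0
    for match in pattern.finditer(html):
        if match.group(1):                      # a closing tag
            if depth == 0:
                warnings.warn(f"</{tag}> without a matching <{tag}>; ignored")
                continue
            depth -= 1
        else:                                   # an opening tag
            depth += 1
    if depth:
        raise ValueError(f"{depth} folder(s) opened with <{tag}> but never closed")


check_folders("<DL><p><DT><H3>A</H3><DL><p></DL><p></DL><p>")  # balanced: passes silently
check_folders("</DL><p>")                                       # stray close: only a warning
try:
    check_folders("<DL><p><DT><H3>A</H3><DL><p></DL><p>")      # outer list never closed
except ValueError as error:
    print(error)  # 1 folder(s) opened with <DL> but never closed
```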
The creator is the parser in reverse: if you parse a file and then create it again, the two files will be identical as long as all lines of the original are valid. You can see this with test/test.html and test/created_file.html: the first was parsed, and the creation process then produced the second. See the Creator page in the wiki to learn more about the creator.
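A round trip is exact only if the creator writes back precisely what the parser read: the same attribute order, quoting and indentation. A minimal sketch for a single shortcut line under those assumptions; LINE, ATTR, parse_line and create_line are hypothetical names, not the library's API:

```python
import re
from typing import Dict, Tuple

LINE = re.compile(r"^(\s*)<DT><A (.*?)>(.*)</A>$")   # indentation, attributes, name
ATTR = re.compile(r'(\w+)="([^"]*)"')                # NAME="value" pairs


def parse_line(line: str) -> Tuple[str, Dict[str, str], str]:
    match = LINE.match(line)
    if match is None:
        raise ValueError(f"not a shortcut line: {line!r}")
    indent, attr_text, name = match.groups()
    return indent, dict(ATTR.findall(attr_text)), name  # dicts keep insertion order


def create_line(indent: str, attrs: Dict[str, str], name: str) -> str:
    attr_text = " ".join(f'{key}="{value}"' for key, value in attrs.items())
    return f"{indent}<DT><A {attr_text}>{name}</A>"


original = '    <DT><A HREF="https://www.python.org/" ADD_DATE="1600000000">Python</A>'
print(create_line(*parse_line(original)) == original)  # True
```

A line using single-quoted attributes, for instance, would not come back unchanged in this sketch, which mirrors the "if all lines are valid" condition above.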
Because the Netscape Bookmarks file format has no official standard, much of this parser's behavior was derived from file examples found on the internet (see the Netscape Bookmarks File Format and The Parser pages in the wiki). It also has legacy support for some types of items that are no longer in use:
- Feed: probably RSS feeds. Support is limited to some attributes, following Microsoft's documentation.
- Web Slices: "live bookmarks" that showed a piece of the page you saved. They are extinct, but still described in Microsoft's documentation.
For more details, see the Legacy section in the wiki.
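As an illustration of how such items could be told apart from ordinary shortcuts, here is a sketch that classifies each <A> tag by its attributes. It assumes that feeds carry FEED="true" and Web Slices carry WEBSLICE="true", as in Microsoft's documentation; treat those attribute names as an assumption and check the wiki's Legacy section for what the parser actually reads. LegacyKindParser is hypothetical.

```python
from html.parser import HTMLParser


class LegacyKindParser(HTMLParser):
    """Hypothetical sketch: classify each <A> as a shortcut, feed or web slice."""

    def __init__(self) -> None:
        super().__init__()
        self.kinds = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # html.parser lowercases attribute names; values may be None for bare attributes.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("feed", "").lower() == "true":        # assumed FEED="true"
            self.kinds.append("feed")
        elif attributes.get("webslice", "").lower() == "true":  # assumed WEBSLICE="true"
            self.kinds.append("web slice")
        else:
            self.kinds.append("shortcut")


SAMPLE = """<DL><p>
<DT><A HREF="https://example.com/">Example</A>
<DT><A HREF="https://example.com/" FEED="true" FEEDURL="https://example.com/rss">News</A>
<DT><A HREF="https://example.com/#slice" WEBSLICE="true">Weather</A>
</DL><p>"""

classifier = LegacyKindParser()
classifier.feed(SAMPLE)
print(classifier.kinds)  # ['shortcut', 'feed', 'web slice']
```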
- If you would like to report a bug or ask a question, please open an issue.
- If you would like to help this project, you can open a pull request.
- If you want more information about this project, have a look at the wiki.