- Install NodeJS 14: https://nodejs.org/en/download/
- Configure the NodeJS environment for development:
  ```
  export NODE_ENV=development
  export NODE_EXTRA_CA_CERTS=/path/to/custom/ca/cert.pem
  ```
- Install packages:
  ```
  npm ci
  ```
- Start the server:
  ```
  npx gulp watch:dev:app
  ```
- Build the embed:
  ```
  npx gulp build:dev:embed
  ```
- Build the extension:
  ```
  npx gulp build:dev:extension
  ```
- Load the extension in a browser
  - Chrome
    - Go to chrome://extensions and enable "Developer mode"
    - Click "Load unpacked extension..." and select the output directory (`bin/dev/extension`)
  - Firefox
    - Go to about:debugging
    - Click "This Firefox"
    - Click "Load Temporary Add-on" and select the `manifest.json` file from the output directory (`bin/dev/extension`)
- Build the native client script bundles:
  ```
  npx gulp build:dev:native-client-reader
  npx gulp build:dev:native-client-share-extension
  ```
- Copy the files to the `ios` repository under `IosApp/reader.js` and `ShareExtension/share-extension.js` and update `RRITReaderScriptVersion` and `RRITShareExtensionScriptVersion` in the `plist` files.
The article content parser supports publisher-specific rules in order to correct parsing issues caused by a publisher's article web page structure. Different rules can be added to fix different problems such as:
- Primary text content mis-identification.
- Ads or other noise identified as primary text content.
- Missing images.
The process of diagnosing the root cause of primary text content mis-identification and applying an article content parser rule to fix it can vary widely from publisher to publisher, but there are some general similarities as well. In order to document the general approach we'll use this article, which is unreadable as of version `2.0.1` of `common/contentParser`, as an example: https://www.taosnews.com/news/business/talpa-resident-discusses-reading-app-that-promotes-digital-mindfulness/article_c61a3588-4eae-5d93-b897-1294ccc61b1f.html
Start by answering the following questions:
- What content is being identified as the primary text content by the article content parser? This can often be determined by opening the article in reader mode and comparing the text content therein with the article's HTML source (easily viewable and searchable using "View page source" in a browser).

  In this example, by searching the HTML source for the first sentence visible in reader mode, we can see that the publisher's subscription promotional copy is identified as the primary text content of the article even though that copy is not visible when the web page first loads in a browser. This is likely caused by the promotional copy consisting of a larger cluster of text nodes than the article itself, which is a common problem for short articles.
- Is the actual article primary text content present in the HTML source? The easiest way to check is to search for snippets of the primary text in the HTML source. Be cautious of searching for any characters that may be HTML-encoded; it's best to search for snippets that contain only basic alphanumeric ASCII characters. (A scripted version of this check is sketched below.)

  In this example, searching the HTML source for the first sentence of the article content yields several results, including the actual article content nodes as well as several metadata nodes. It's important to verify that the node cluster actually contains the full article text and not just a preview.
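  If you prefer to work from the devtools console, a rough scripted stand-in for "View page source" is to re-fetch the page and search the raw served HTML before any scripts have modified it. This is just a manual sanity check, not part of the parser, and the search string below is a placeholder:

  ```js
  // Re-fetch the page and search the HTML as served, before script execution.
  // Replace the placeholder string with a snippet of the article's first sentence.
  fetch(location.href)
    .then(response => response.text())
    .then(html => console.log(html.includes('placeholder article text snippet')));
  ```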
If the actual primary text content is present in the HTML source then continue to the next step.
To point the parser in the right direction, add a new rule that specifies a selector for an ancestor of the primary text content elements that is not also an ancestor of the elements that are mis-identified as primary text elements.
- Choose a selector. In our example a `div` element with an `[itemprop="articleBody"]` attribute is the nearest common ancestor of the primary text elements. This seems like a good candidate for the following reasons:
  - The element is close to the primary text elements as measured by depth.
  - The element uses schema.org microdata to specify that it contains the article body content.
  - Querying the document for the selector returns only a single element. (A quick devtools check for this is sketched below.)
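  A quick way to sanity-check a candidate selector before writing the rule is to query for it in the browser devtools console on the article page (the selector here is the one from this example):

  ```js
  // Expect exactly one match for a good content search root selector.
  document.querySelectorAll('[itemprop="articleBody"]').length; // should be 1

  // Spot-check that the matched element actually contains the article text.
  document.querySelector('[itemprop="articleBody"]')?.textContent.trim().slice(0, 80);
  ```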
- Create the rule. Add or amend an existing entry in the `publishers` array in the `src/common/contentParsing/configuration/configs.ts` file. Entries should be sorted alphabetically using reverse-DNS notation, and subdomains (including www.) should be excluded unless specifically required since any subdomain will match a parent domain. There is no existing entry for `taosnews.com` so we'll add a new one:
  ```
  { hostname: 'taosnews.com', contentSearchRootElementSelector: '[itemprop="articleBody"]' }
  ```
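  For illustration only, here is roughly how the new entry sits in the `publishers` array; the neighboring entries are hypothetical and are shown only to demonstrate the alphabetical reverse-DNS ordering:

  ```ts
  // Illustrative sketch; the real array lives in
  // src/common/contentParsing/configuration/configs.ts and its entries differ.
  const publishers = [
    // ...
    {
      hostname: 'nytimes.com' // hypothetical neighbor
    },
    {
      hostname: 'taosnews.com',
      contentSearchRootElementSelector: '[itemprop="articleBody"]'
    },
    {
      hostname: 'washingtonpost.com' // hypothetical neighbor
    }
    // ...
  ];
  ```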
- Test the rule. Increment the versions for `common/contentParser` (in this case to version `2.0.2`) as well as `nativeClient/reader` and `nativeClient/shareExtension`, which both reference the content parser. Build both scripts:
  ```
  npx gulp build:prod:native-client-reader
  npx gulp build:prod:native-client-share-extension
  ```
  Copy both scripts to their respective locations in either the `ios` or `desktop` repositories for testing.

Repeat the previous steps as many times as necessary until all new rules are working as intended. Then commit the changes to this repository, follow the instructions in other repositories for updating the bundled script files, and follow the instructions in the `static` repository for uploading the scripts to the `static.readup.org` server to make them available for existing Readup client applications.
In all the steps below, `{version}` is a placeholder for the unquoted version number of this release, e.g. `1.2.3`.
- Increment the `[it.reallyread].version.app` version number in `package.json` for this release according to Semantic Versioning if it hasn't been incremented already. If you want to trigger an "Update Available" toast in the app, update the `appPublic` version as well; the toast prompts every client running the old app to reload. If you do not want to trigger an update toast, for example during a non-critical patch release, do not update the `appPublic` version. `appPublic` is the only version the `web` client sees, so leaving it unchanged will result in the old app version running in clients until they are fully restarted. (A sketch of the relevant fields follows below.)
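  For reference, a minimal sketch of how these version fields might be laid out in `package.json`; the exact nesting is an assumption inferred from the `[it.reallyread].version.app` key path above:

  ```jsonc
  // Illustrative excerpt; nesting assumed from the key path above.
  {
    "it.reallyread": {
      "version": {
        "app": "1.2.3",       // always incremented for a release
        "appPublic": "1.2.3"  // bump only to trigger the "Update Available" toast
      }
    }
  }
  ```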
- Commit the incremented `package.json` with a commit message like "app release", where "app" can be replaced with whichever sub-projects in the web monorepo (extension, embed, app, native-client) have been updated.
- Run the `publish-app.ps1` script. The deployable assets will be available in the `pkg/app/{version}` directory after the script finishes.
- Upload the assets.
  - Upload the `app-{version}.zip` package to `s3://aws.reallyread.it/web-sites`. This is the private location that the EC2 instances will download the package from during installation.
  - Upload the `.js` and `.css` bundle files to `s3://static.readup.org/app/bundles`. This is the public hosted location that the web app will link to.
- Install the package. Perform a zero-downtime deployment by installing the package to a server that is offline, bringing that server online, and then taking the server with the old version of the package offline.
  - Select the target EC2 instance that is currently offline and start the instance. It should be named `readup-app-server-x` where `x` is `a` or `b`.
Log in to the server using Remote Desktop over the VPN connection (see
ops
README
) and execute the update script using PowerShell.Execute-S3Object -Key app/update.ps1 -Params @{SiteName = 'readup.org'; Version = '{version}'}
-
Log out of the server. Don't just close the connection, but use a "Sign out" action.
-
  - Register the EC2 instance with the `app-servers` ELB target group. As soon as the new instance is healthy, the old and new instances will each receive 50% of the traffic.
  - When the target is healthy, deregister the old EC2 instance from the ELB target group. It will start "draining". The new instance will receive 100% of the traffic. This is a good moment to check whether there are any critical bugs in the release.
  - When the old EC2 instance has finished draining, log in to the server and shut it down with the following command:
    ```
    Execute-S3Object -Key app/shutdown.ps1
    ```
    Important: Always shut down the EC2 servers using the script referenced above. Failure to do so will result in log files not being shipped and processed by the analytics server.