Merge pull request #3 from Liana64/main
Features: Gunicorn, CI, Docker, requirements
isaiasghezae authored Dec 11, 2024
2 parents 326543b + e6541c3 commit de54033
Showing 5 changed files with 318 additions and 5 deletions.
56 changes: 56 additions & 0 deletions .github/workflows/docker-build.yaml
@@ -0,0 +1,56 @@
---
name: Docker Build

on:
  push:
    branches:
      - main
    tags-ignore:
      - "*"
  pull_request:
    branches:
      - main
    paths-ignore:
      - "*"

jobs:
  build:
    if: "!github.event.head_commit.message || contains(github.event.head_commit.message, 'Feature') || contains(github.event.head_commit.message, 'feature') || contains(github.event.head_commit.message, 'Release') || contains(github.event.head_commit.message, 'release')"
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v4

      - name: Lowercase repository owner
        shell: bash
        run: echo "LOWERCASE_REPO_OWNER=${GITHUB_REPOSITORY_OWNER,,}" >> $GITHUB_ENV

      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3

      - name: Login to GitHub Container Registry
        uses: docker/login-action@v3
        with:
          registry: ghcr.io
          username: "${{ github.actor }}"
          password: "${{ secrets.GITHUB_TOKEN }}"

      - name: Build Image
        uses: docker/build-push-action@v6
        id: build
        with:
          context: .
          platforms: "amd64"
          push: true
          cache-from: type=gha
          cache-to: type=gha,mode=max
          tags: |
            ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/unique-turker:${{ github.sha }}
            ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/unique-turker:latest
          labels: |-
            org.opencontainers.image.title="unique-turker"
            org.opencontainers.image.url=https://ghcr.io/${{ env.LOWERCASE_REPO_OWNER }}/unique-turker
            org.opencontainers.image.version="latest"
            org.opencontainers.image.revision=${{ github.sha }}
            org.opencontainers.image.vendor=${{ env.LOWERCASE_REPO_OWNER }}
66 changes: 66 additions & 0 deletions Dockerfile
@@ -0,0 +1,66 @@
FROM docker.io/library/python:3.12-alpine
LABEL org.opencontainers.image.source="https://github.com/isaiasghezae/unique-turker-2"

ARG TARGETPLATFORM
ARG VERSION
ARG CHANNEL

ENV \
    PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1 \
    PIP_ROOT_USER_ACTION=ignore \
    PIP_NO_CACHE_DIR=1 \
    PIP_DISABLE_PIP_VERSION_CHECK=1 \
    PIP_BREAK_SYSTEM_PACKAGES=1 \
    CRYPTOGRAPHY_DONT_BUILD_RUST=1

ENV UMASK="0002" \
    TZ="Etc/UTC" \
    EXPOSED_URL="REPLACE_ME" \
    EXPOSED_PROTO="HTTPS" \
    CONFIG_DB="/config/database.db"

USER root

WORKDIR /config
VOLUME /config

WORKDIR /app

RUN apk add --no-cache \
        bash \
        catatonit \
        coreutils \
        curl \
        jq \
        nano \
        tzdata \
        git \
    && git clone https://github.com/isaiasghezae/unique-turker-2.git . \
    && pip install uv \
    && uv pip install --system \
        flask \
        flask-cors \
        Flask-SQLAlchemy \
        gunicorn \
    && chown -R root:root /app && chmod -R 755 /app \
    && if [ -f /config/database.db ]; then \
        rm -f /app/instance/database.db; \
        ln -s /config/database.db /app/instance/database.db; \
    else \
        cp /app/instance/database.db /config/database.db && \
        rm -f /app/instance/database.db && \
        ln -s /config/database.db /app/instance/database.db; \
    fi \
    && chown -R root:root /app && chmod -R 755 /app \
    && chown -R nobody:nogroup /app && chmod -R 755 /config/database.db \
    && rm -rf /root/.cache /root/.cargo /tmp/*

COPY ./dockerfiles/entrypoint.sh /entrypoint.sh

USER nobody:nogroup

EXPOSE 8080

ENTRYPOINT ["/usr/bin/catatonit", "--", "/entrypoint.sh"]

106 changes: 101 additions & 5 deletions README.md
@@ -15,7 +15,7 @@

Full-stack Flask app with a built-in database that can be used by Mechanical Turk requesters to prevent duplicate HIT access from Mechanical Turk workers.

## 🔍Purpose
## 🔍 Purpose

Unique Turker was a service created by Myle Ott for researchers and developers who use Amazon's Mechanical Turk (MTurk) platform. In short, it allowed requesters to avoid the 40% MTurk fee that applies when recruiting more than 9 workers in a single batch. Although one could deploy a HIT as multiple batches of 9 or fewer workers, there is always the possibility that a worker accesses the same HIT from more than one batch. To combat this, requesters could go to the Unique Turker site and obtain a snippet of code to include in their HIT HTML source code. This snippet communicated with the Unique Turker database to ensure that each worker could complete a particular HIT only once, thus preventing duplicate submissions. For academic researchers, unique responses are especially desirable because multiple submissions from the same participant are almost always of no use. Unique Turker was therefore valuable because it let researchers avoid duplicate responses while also saving money by avoiding the 40% fee.\* Unfortunately, Unique Turker went down in 2022 and no longer seems to be maintained.

Expand All @@ -29,19 +29,29 @@ This diagram breaks down how the server interacts with a HIT:

\*Note that some workers found a way to bypass Unique Turker in the past, so I'm similarly not expecting perfect prevention of duplicate workers. I've run a few HITs to see how effective the app is and, as expected, very few workers had repeated responses (e.g., 2 out of 500 workers in my first run).

## ️Deploying the App
## ️ Deploying the App

Steps for setting up and deploying the app:

1. Download this repository.
1. Clone this repository or pull the Docker image.

2. Make sure to change `https://LINK-TO-YOUR-DATABASE.COM/check_worker_eligibility` in output.html to be the URL to your actual web app. It's important that the URL ends with `/check_worker_eligibility` since this is the route that handles communication with MTurk.
2. If you're cloning the repository, change `https://LINK-TO-YOUR-DATABASE.COM/check_worker_eligibility` in `output.html` to the URL of your actual web app. The URL must end with `/check_worker_eligibility`, since this is the route that handles communication with MTurk. (The Docker image makes this substitution at startup from `EXPOSED_PROTO` and `EXPOSED_URL`.)

3. Upload the repository source code to any platform that can host web applications (e.g., Heroku, PythonAnywhere, Docker).

4. Deploy the web app online.

## 👨 💻How to Use
## 🐳 Docker Container

The container reads the following environment variables:

| Name | Default | Description |
| ------------- | --------------------- | ----------------------------------------------- |
| EXPOSED_URL | None (required) | Hostname of the web service, without the protocol prefix |
| EXPOSED_PROTO | `HTTPS` | Protocol for web server, "HTTP" or "HTTPS" only |
| CONFIG_DB | `/config/database.db` | Path to database file |
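
To make the table concrete, here is a minimal shell sketch of how the container appears to combine `EXPOSED_PROTO` and `EXPOSED_URL`, mirroring the checks in `dockerfiles/entrypoint.sh`. The function name `preview_url` and the hostname `turker.example.com` are illustrative assumptions, not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper (not in the repository): previews the eligibility URL
# that the entrypoint would write into output.html for a given configuration.
preview_url() {
    proto="$1"   # value you would pass as EXPOSED_PROTO
    host="$2"    # value you would pass as EXPOSED_URL

    # Only the two upper-case protocol names are accepted.
    case "$proto" in
        HTTP|HTTPS) ;;
        *)
            echo "Error: EXPOSED_PROTO must be either 'HTTPS' or 'HTTP'." >&2
            return 1
            ;;
    esac

    # An empty value or the image's REPLACE_ME placeholder is rejected.
    if [ -z "$host" ] || [ "$host" = "REPLACE_ME" ]; then
        echo "Error: EXPOSED_URL must be set." >&2
        return 1
    fi

    # The protocol is lower-cased before being joined with the host.
    scheme=$(printf '%s' "$proto" | tr '[:upper:]' '[:lower:]')
    printf '%s://%s/check_worker_eligibility\n' "$scheme" "$host"
}

preview_url HTTPS turker.example.com
```

With these values the sketch prints `https://turker.example.com/check_worker_eligibility`. Assuming the image name used in the Kubernetes notes, the same values could be passed to the container with something like `docker run -p 8080:8080 -v unique-turker-config:/config -e EXPOSED_URL=turker.example.com ghcr.io/liana64/unique-turker:latest`; the volume name here is a placeholder.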

## 💻 Using the App

There are two pages in this web app: the home page and the HTML output page.

@@ -81,3 +91,89 @@ In this structure, one unique identifier in the Uniqueid table can be associated

_Generating unique ID for a new HIT and obtaining the HTML source code to be uploaded to MTurk_
<img src="demo.gif" width="700" height="400" alt="Demo GIF">

## Notes: Kubernetes

If you're using Kubernetes with Flux, you might deploy a `HelmRelease` like the one below:

```
---
# yaml-language-server: $schema=https://raw.githubusercontent.com/bjw-s/helm-charts/main/charts/other/app-template/schemas/helmrelease-helm-v2.schema.json
apiVersion: helm.toolkit.fluxcd.io/v2
kind: HelmRelease
metadata:
  name: &app unique-turker
spec:
  interval: 30m
  chart:
    spec:
      chart: app-template
      version: 3.5.1
      sourceRef:
        kind: HelmRepository
        name: bjw-s
        namespace: flux-system
  install:
    remediation:
      retries: 3
  upgrade:
    cleanupOnFail: true
    remediation:
      retries: 3
  values:
    controllers:
      unique-turker:
        type: deployment
        annotations:
          reloader.stakater.com/auto: "true"
        containers:
          app:
            image:
              repository: ghcr.io/liana64/unique-turker
              tag: latest
            env:
              EXPOSED_URL: unique-turker.${SECRET_EXTERNAL_DOMAIN}
            probes:
              liveness:
                enabled: true
              readiness:
                enabled: true
            resources:
              requests:
                cpu: 15m
                memory: 64Mi
              limits:
                memory: 256Mi
    service:
      app:
        controller: *app
        ports:
          http:
            port: 8080
    ingress:
      app:
        className: traefik-external
        annotations:
          cert-manager.io/cluster-issuer: "letsencrypt-production"
        hosts:
          - host: &host "unique-turker.${SECRET_EXTERNAL_DOMAIN}"
            paths:
              - path: /
                service:
                  identifier: app
                  port: http
        tls:
          - secretName: unique-turker-tls
            hosts: [*host]
    persistence:
      data:
        storageClass: local-nvme
        accessMode: ReadWriteOnce
        size: 256Mi
        retain: true
        globalMounts:
          - path: /config
```
79 changes: 79 additions & 0 deletions dockerfiles/entrypoint.sh
@@ -0,0 +1,79 @@
#!/usr/bin/env bash


APP_DB="/app/instance/database.db"
OUTPUT_FILE=/app/website/templates/output.html
FULL_URL="${EXPOSED_PROTO,,}://${EXPOSED_URL}"

if [[ "$EXPOSED_PROTO" != "HTTPS" && "$EXPOSED_PROTO" != "HTTP" ]]; then
    echo "Error: EXPOSED_PROTO must be either 'HTTPS' or 'HTTP'."
    exit 1
fi

if [[ -z "$EXPOSED_URL" || "$EXPOSED_URL" == "REPLACE_ME" ]]; then
    echo "Error: EXPOSED_URL must be set."
    exit 1
fi


echo "───────────────────────────────────────"
create_symlink() {
    ln -sf "$CONFIG_DB" "$APP_DB"  # -f replaces a regular file left at $APP_DB after copying
    if [ $? -eq 0 ]; then
        echo "Symbolic link created: $APP_DB -> $CONFIG_DB"
    else
        echo "Failed to create symbolic link."
        exit 1
    fi
}

# Check if /config/database.db exists
if [ ! -f "$CONFIG_DB" ]; then
    echo "$CONFIG_DB does not exist. Copying from $APP_DB."

    # Check if the source database exists before copying
    if [ -f "$APP_DB" ]; then
        cp "$APP_DB" "$CONFIG_DB"
        if [ $? -eq 0 ]; then
            echo "Copied $APP_DB to $CONFIG_DB."
        else
            echo "Failed to copy $APP_DB to $CONFIG_DB."
            exit 1
        fi
    else
        echo "Source database $APP_DB does not exist. Cannot copy."
        exit 1
    fi

    # Create symbolic link
    create_symlink
else
    echo "$CONFIG_DB already exists."
    du -h "$CONFIG_DB"

    # Check if the application database exists before attempting to delete
    if [ -L "$APP_DB" ] || [ -f "$APP_DB" ]; then
        rm "$APP_DB"
        if [ $? -eq 0 ]; then
            echo "Deleted existing $APP_DB."
        else
            echo "Failed to delete $APP_DB."
            exit 1
        fi
    else
        echo "$APP_DB does not exist. No need to delete."
    fi

    # Create symbolic link
    create_symlink
fi
echo "Setting EXPOSED_URL"
sed -i.bak -e "s|var url = \"https://LINK-TO-YOUR-DATABASE.COM/check_worker_eligibility\";.*|var url = \"$FULL_URL/check_worker_eligibility\"; // IMPORTANT: This is where you put the link to your database|g" "$OUTPUT_FILE"
echo ""
echo "Done! Webserver is ready for use and accessible at:"
echo "$FULL_URL"
echo "───────────────────────────────────────"

exec \
    /usr/local/bin/gunicorn \
    --bind 0.0.0.0:8080 main:app
16 changes: 16 additions & 0 deletions requirements.txt
@@ -0,0 +1,16 @@
blinker==1.9.0
click==8.1.7
flask==3.1.0
flask-cors==5.0.0
flask-sqlalchemy==3.1.1
greenlet==3.1.1
gunicorn==23.0.0
itsdangerous==2.2.0
jinja2==3.1.4
markupsafe==3.0.2
packaging==24.2
pip==24.3.1
sqlalchemy==2.0.36
typing-extensions==4.12.2
uv==0.5.7
werkzeug==3.1.3
