# Acomoeye-NN: NVGaze gaze estimation with upgrades

Implementation of a one-shot direct gaze-estimation NN based on NVGaze, as described in the NVIDIA paper. In the long run, the project aims to deliver low-latency eye tracking that meets the high demands of VR/AR goggles. The implementation is based on Intel's OpenVINO training extensions (the license plate recognition repo).


I have added a few additional techniques to upgrade the base NVGaze implementation, which you can turn on/off in the config (a minimal coordConv sketch follows the list):

- 🍉 coordConv (paper)
- 🍉 globalContext (paper)
- 🍉 coarseDropout (example)
- 🍉 fireBlocks (code)
- 🍉 selfAttention (paper)
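
As an illustration of the first item, here is a minimal TF 1.x sketch of a coordConv layer. This is my own sketch, not the layer as implemented in this repo: it concatenates normalized x/y coordinate channels to the input before a regular convolution, which lets the network reason about absolute position (e.g. pupil location in the eye image).

```python
import tensorflow as tf

def coord_conv2d(x, filters, kernel_size, **conv_kwargs):
    """CoordConv (Liu et al., 2018): append normalized coordinate
    channels to an NHWC input, then apply a standard 2-D convolution.
    Illustrative sketch; the repo's own layer may differ in details."""
    _, h, w, _ = x.get_shape().as_list()             # static spatial dims
    ys = tf.linspace(-1.0, 1.0, h)                   # (h,)
    xs = tf.linspace(-1.0, 1.0, w)                   # (w,)
    gy, gx = tf.meshgrid(ys, xs, indexing='ij')      # (h, w) each
    coords = tf.stack([gy, gx], axis=-1)             # (h, w, 2)
    coords = tf.expand_dims(coords, 0)               # (1, h, w, 2)
    coords = tf.tile(coords, [tf.shape(x)[0], 1, 1, 1])  # broadcast to batch
    x = tf.concat([x, coords], axis=-1)              # add 2 coord channels
    return tf.layers.conv2d(x, filters, kernel_size, **conv_kwargs)
```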


I have achieved ~2.5° angular generalization error on the Acomo-14 dataset, with the base implementation running as fast as 0.36 ms inference in OpenVINO 🎉.

## Quick Start Guide

Install dependencies on Windows/Linux:

1. Install an NVIDIA driver that supports at least CUDA 10.0.
2. Download & install 🐍 Anaconda; set the Anaconda PATH (you may check 'Add to PATH' in the Windows installer).

```bash
conda create -n tf_gpu python=3.6
conda activate tf_gpu
conda install -c anaconda tensorflow-gpu=1.15
conda install -c conda-forge opencv=3.4
conda install -c anaconda cython
conda install -c conda-forge imgaug
git clone https://github.com/czero69/acomoeye-NN.git
```
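
After installation, a quick sanity check (my own snippet, not from the repo) confirms the GPU build of TensorFlow 1.15 is active:

```bash
# Should print "1.15.x True" when the GPU is visible to TensorFlow.
python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_gpu_available())"
```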

## Datasets: Acomo-14

Dataset of 8 subjects across 14 tests, with 49 gaze points per test. Each gaze point has 1k photos of the 'narrowing' pupil with a gaze-direction label: gazeX (yaw), gazeY (pitch). Rotations are applied in the order yaw, pitch, as extrinsic rotations. Note that this notation differs from the NVGaze datasets. The dataset was gathered with a 300 Hz IR camera mounted in an Oculus DK2, over a 40°x40° FOV. The dataset is ready to be trained on; images are cropped and resized to 127x127.

💿 Download the Acomo-14 dataset.

- ⚡ for training use: Train6-merged-monocular-R-shuffled-train.csv (80% of gaze points from 7 subjects in 13 tests)
- ⚡ for validation use: Train6-merged-monocular-R-shuffled-test-SMALL.csv (100% of gaze points from 1 unseen subject plus 20% of gaze points from 7 subjects in 13 tests)
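
For working with the labels, here is a small NumPy sketch of how a (gazeX, gazeY) label can be turned into a 3-D unit gaze vector under the stated yaw-then-pitch extrinsic convention, and how angular error between two gaze directions is commonly measured. The axis and sign conventions here are my assumptions for illustration, not taken from the repository:

```python
import numpy as np

def gaze_to_vector(gaze_x, gaze_y):
    """Yaw (gazeX) then pitch (gazeY), extrinsic, in degrees.
    Assumes +z is the forward optical axis, yaw about y, pitch about x;
    these sign/axis conventions are illustrative assumptions."""
    yaw, pitch = np.radians([gaze_x, gaze_y])
    r_y = np.array([[ np.cos(yaw), 0.0, np.sin(yaw)],
                    [ 0.0,         1.0, 0.0        ],
                    [-np.sin(yaw), 0.0, np.cos(yaw)]])
    r_x = np.array([[1.0, 0.0,            0.0           ],
                    [0.0, np.cos(pitch), -np.sin(pitch)],
                    [0.0, np.sin(pitch),  np.cos(pitch)]])
    return r_x @ r_y @ np.array([0.0, 0.0, 1.0])  # yaw first, then pitch

def angular_error_deg(a, b):
    """Angle between two gaze vectors, in degrees."""
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))
```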

## Run

1️⃣ Edit the paths for train and eval in config.py:

```bash
cd tensorflow_toolkit/eyetracking/
gedit acomo_basic/config.py
```
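
The field names below are hypothetical placeholders, only to illustrate the kind of paths you are expected to point at your local dataset; check acomo_basic/config.py for the real structure:

```python
# Hypothetical field names for illustration only; see acomo_basic/config.py
# for the actual train/eval classes and attribute names.
class train:
    annotation_path = '/path/to/Train6-merged-monocular-R-shuffled-train.csv'

class eval:
    annotation_path = '/path/to/Train6-merged-monocular-R-shuffled-test-SMALL.csv'
```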

2️⃣ Training & eval:

```bash
python tools/train.py acomo_basic/config.py
python tools/eval.py acomo_basic/config.py
```

#️⃣ You can run train and eval concurrently, so you can later plot per-subject accuracies in TensorBoard.

## Tips

🔸 To run eval or training on the TensorFlow CPU instead of the GPU, set this in the eval/train class in config.py:

```python
CUDA_VISIBLE_DEVICES = ""
```

🔸 To export an OpenVINO model (you must install the OpenVINO environment, see below):

```bash
python tools/export.py --data_type FP32 acomo_basic/config.py
```

🔸 To try the OpenVINO inference engine (79000 is the exported iteration number):

```bash
python tools/infer_ie.py --model model_acomo_basic/export_79000/IR/FP32/eyenet.xml --config acomo_basic/config.py /path/to/img/0005353.png
```

🔸 To plot loss, accuracies, per-subject accuracies, etc., run TensorBoard:

```bash
cd ./model_acomo_basic
tensorboard --logdir ./ --port 5999
# then open localhost:5999 in a web browser
```

## Results

🔷 Table 1. Inference time for 127x127 input and first layer L=16.

| inference engine | baseline | coordConv | globalContext | coarseDropout | fireBlocks | attention | cgDa | cgDaf |
|---|---|---|---|---|---|---|---|---|
| tf-gpu (ms) | 0.7568 | 0.7691 | 1.2115 | 0.7636 | 1.5305 | 0.8589 | 1.3492 | 2.0812 |
| tf-cpu (ms) | 0.6877 | 0.8687 | 0.9158 | 0.6959 | 1.1433 | 0.7450 | 1.1114 | 2.0415 |
| openvino-cpu (ms) | 0.3621 | 0.3977 | 1.0357 | 0.3643 | 0.6118 | 0.4936 | 1.2463 | 1.4081 |
| parameters count | 157755 | 158043 | 158459 | 157755 | 22424 | 159853 | 160845 | 25514 |

🔷 Table 2. Generalization to unseen subjects; angular error in degrees; trained for 1M iterations. Each cell reports error without affine calibration (raw) and with affine calibration (calibrated) as raw / calibrated. (N/T): N = number of subjects at test time, T = number of subjects at train time. NVGaze datasets were used 'as is', e.g. NVGaze-AR was not cropped to the pupil location. Input res: 127x127, first layer L=16.

| Dataset (raw / calibrated) | baseline | coordConv | globalContext | coarseDropout | fireBlocks | attention | cgda | cgdaf |
|---|---|---|---|---|---|---|---|---|
| NVGaze-AR (2/40) | 6.37° / 4.88° | 6.16° / 3.81° | 6.93° / 4.43° | 7.34° / 5.34° | 6.41° / 4.91° | 6.41° / 4.99° | 9.40° / 5.65° | 9.64° / 6.46° |
| NVGaze-VR (4/9) | 3.49° / 2.33° | 3.00° / 2.51° | 3.30° / 2.57° | 3.58° / 2.63° | 3.27° / 2.67° | 3.04° / 2.29° | 2.79° / 2.47° | 3.21° / 2.69° |
| Acomo (1/8) | 5.24° / 3.44° | 4.93° / 3.48° | 5.11° / 2.68° | 6.17° / 3.66° | 4.23° / 3.28° | 4.86° / 3.22° | 7.51° / 3.86° | 3.99° / 2.57° |
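
As a reference for how the 'calibrated' rows can be produced, here is a least-squares sketch of per-subject affine calibration: fit a 2-D affine map from raw predicted gaze angles to ground truth on the subject's calibration points, then report the error of the mapped predictions. This is one plausible reading of the procedure; the repository's exact calibration code may differ.

```python
import numpy as np

def fit_affine_calibration(pred, true):
    """pred, true: (n, 2) arrays of (yaw, pitch) in degrees.
    Returns a function that applies the fitted 2-D affine map."""
    a = np.hstack([pred, np.ones((len(pred), 1))])   # homogeneous coords
    m, *_ = np.linalg.lstsq(a, true, rcond=None)     # (3, 2) affine matrix
    return lambda p: np.hstack([p, np.ones((len(p), 1))]) @ m
```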

🔷 Table 3. Generalization to new gaze vectors (amongst known subjects); angular error in degrees; trained for 1M iterations. Each cell reports raw / calibrated error as in Table 2. (N): N = number of subjects at test time. NVGaze datasets were used 'as is', e.g. NVGaze-AR was not cropped to the pupil location. Input res: 127x127, first layer L=16.

| Dataset (raw / calibrated) | baseline | coordConv | globalContext | coarseDropout | fireBlocks | attention | cgda | cgdaf |
|---|---|---|---|---|---|---|---|---|
| NVGaze-synthetic (40) | 2.31° / 2.09° | 2.27° / 2.14° | 2.23° / 2.04° | 2.25° / 2.13° | 2.15° / 2.07° | 1.79° / 1.69° | 1.99° / 1.95° | 1.88° / 1.76° |
| NVGaze-AR (42) | 3.76° / 3.23° | 3.51° / 2.87° | 3.81° / 3.17° | 3.95° / 3.36° | 3.95° / 3.43° | 3.45° / 3.00° | 4.10° / 3.32° | 4.37° / 3.61° |
| NVGaze-VR (9) | 2.89° / 2.48° | 2.52° / 2.09° | 2.62° / 2.16° | 2.73° / 2.28° | 2.86° / 2.55° | 2.63° / 2.34° | 2.42° / 2.26° | 2.45° / 2.24° |
| Acomo (8) | 4.05° / 3.01° | 3.84° / 2.97° | 3.72° / 2.66° | 4.39° / 3.24° | 3.51° / 2.97° | 3.84° / 2.98° | 4.50° / 3.21° | 3.41° / 2.67° |

Choosing the best upgrade technique for each dataset, the proposed upgrades give better averaged accuracy, as follows:

- −0.82° generalization to new subjects, raw error
- −0.70° generalization to new subjects, affine-calibrated error
- −0.51° generalization to new gaze vectors, raw error
- −0.38° generalization to new gaze vectors, affine-calibrated error

## OpenVINO Training Extensions

OpenVINO Training Extensions provide a convenient environment to train Deep Learning models and convert them using OpenVINO™ Toolkit for optimized inference.

### Setup OpenVINO Training Extensions

1. Clone the repository into the working directory:

```bash
cd /<path_to_working_dir>
git clone https://github.com/opencv/openvino_training_extensions.git
```

2. Install prerequisites:

```bash
sudo apt-get install libturbojpeg python3-tk python3-pip virtualenv
```

## Citation

Please cite the NVGaze paper if you use this NN architecture:

```bibtex
@inproceedings{kim2019,
	author = {Kim, Joohwan and Stengel, Michael and Majercik, Alexander and De Mello, Shalini and Dunn, David and Laine, Samuli and McGuire, Morgan and Luebke, David},
	title = {NVGaze: An Anatomically-Informed Dataset for Low-Latency, Near-Eye Gaze Estimation},
	booktitle = {Proceedings of the SIGCHI Conference on Human Factors in Computing Systems},
	series = {CHI '19},
	year = {2019},
	isbn = {978-1-4503-5970-2/19/05},
	location = {Glasgow, Scotland UK},
	numpages = {10},
	url = {https://sites.google.com/nvidia.com/nvgaze},
	doi = {10.1145/3290605.3300780},
	acmid = {978-1-4503-5970-2/19/05},
	publisher = {ACM},
	address = {New York, NY, USA},
	keywords = {eye tracking, machine learning, dataset, virtual reality},
}

@inproceedings{7410785,
  author={E. {Wood} and T. {Baltru{\v{s}}aitis} and X. {Zhang} and Y. {Sugano} and P. {Robinson} and A. {Bulling}},
  booktitle={2015 IEEE International Conference on Computer Vision (ICCV)},
  title={Rendering of Eyes for Eye-Shape Registration and Gaze Estimation},
  year={2015},
  pages={3756-3764},
}
```

If you find LPRNet useful in your research, or you are using the FireBlocks implementation, please consider citing the following paper:

```bibtex
@article{icv2018lprnet,
  title={LPRNet: License Plate Recognition via Deep Neural Networks},
  author={Sergey Zherzdev and Alexey Gruzdev},
  journal={arXiv:1806.10447},
  year={2018}
}
```