Dataset

We describe the structure of the dataset and provide optional instructions on how to run WebGazer off-line to assess its accuracy against a Tobii Pro X3-120 eye tracker.

Participant Folders

The dataset contains 51 folders with curated data of 51 participants. The format of the name of each folder is: P_X, where X is an ID for the users. Notice that the folders run from P_1 to P_64, as the original study was conducted with 64 participants. Out of these, only 51 have valid data which we include in the dataset. A detailed explanation of the experiment protocol can be found in Chapter 5 of Papoutsaki's dissertation and in the ETRA 2018 paper.

Contents of ParticipantFolders

Every P_X folder has a number of different data in it, ranging from videos, log files, and specification files:

Webcam Videos: Resolution 640x480. Their name follows the format ParticipantLogID_VideoID_-study-nameOfTask.mp4 and ParticipantLogID_VideoID_-study-nameOfTask.webm. For each task page that the user visited, there is at least one corresponding webcam video capture. If a user visited the same page multiple times, then a different webcam video would correpond to each individual. The possible task pages in increasing order of visit are:
- dot_test_instructions: instruction page for the Dot Test task.
- dot_test: Dot Test task.
- fitts_law_instructions: instruction page for the Fitts Law task.
- fitts_law: Fitts Law task.
- serp_instructions: instruction page for the search related tasks.
- benefits_of_running_instructions: instruction page for the query benefits of running.
- benefits_of_running: benefits of running SERP.
- benefits_of_running_writing: Writing portion of benefits of running search task.
- educational_advantages_of_social_networking_sites_instructions: instruction page for the query educational advantages of social networking sites.
- educational_advantages_of_social_networking_sites: educational advantages of social networking sites SERP.
- beducational_advantages_of_social_networking_sites_writing: Writing portion of educational advantages of social networking sites search task.
- where_to_find_morel_mushrooms_instructions: instruction page for the query where to find morel mushrooms.
- where_to_find_morel_mushrooms: where to find morel mushrooms SERP.
- where_to_find_morel_mushrooms_writing: Writing portion of where to find morel mushrooms search task.
- tooth_abscess_instructions: instruction page for the query tooth abscess.
- tooth_abscess: tooth abscess SERP.
- tooth_abscess_writing: Writing portion of tooth abscess sesrch task.
- dot_test_final_instructions: instruction page for the Final Dot Test task.
- dot_test_final: Final Dot Test task.
- thank_you: Questionnaire.
Screen videos: Their name follows the format P_X.mp4. They contain the video of the screen of the participant throughout the experiment.
User Interaction Logs: Their name follows the format ParticipantLogID.txt. They contain the user interactions for the whole experiment. Every task starts and ends with a line in the following format:
```
{"sessionId": "1491423217564_2_/study/dot_test_instructions", "webpage": "/study/dot_test_instructions.htm", "sessionString": "1491423217564_2_/study/dot_test_instructions", "epoch": 1491423557726, "time": 420.26000000000005, "type": "recording start", "event": "video started"}
```
- sessionId: logID_TaskID_task
- webpage: task
- sessionId: same with sessionId
- epoch: Unix Timestap
- time: (ms) since loading of page
- type: indicating that recording has started/stopped
- event: indicated that video is ready to be recorded/saved
The user interactions are captured in lines with the following format:
```
{"clientX": 724, "clientY": 440, "windowY": 23, "windowX": 0, "windowInnerWidth": 1440, "time": 1273.9850000000001, "sessionId": "1491423217564_2_/study/dot_test_instructions", "webpage": "/study/dot_test_instructions.htm", "epoch": 1491423558580, "windowOuterWidth": 1440, "windowInnerHeight": 679, "pageX": 724, "pageY": 440, "windowOuterHeight": 797, "screenY": 537, "screenX": 724, "type": "mousemove"}
```
- screenX: horizontal distance, in CSS pixels, of the left border of the user's browser from the left side of the screen
- screenY: vertical distance, in CSS pixels, of the left border of the user's browser from the left side of the screen
- clientX: horizontal coordinate of the mouse pointer in global (screen) coordinates
- clientY: vertical coordinate of the mouse pointer in global (screen) coordinates
- pageX: horizontal coordinate of the event relative to the whole document
- pageY: vertical coordinate of the event relative to the whole document
- windowX: same with screenX
- windowY: same with screenY
- windowInnerWidth: width (in pixels) of the browser window viewport
- windowInnerHeight: height (in pixels) of the browser window viewport
- windowOuterWidth: width of the outside of the browser window
- windowOuterHeight: height of the outside of the browser window
- epoch: Unix Timestap
- time: (ms) since loading of page
- type: mousemove, mouseclick, or textInput
- pos: for typing events only, top snd left defines the y and x coordinate of the carret during a key stroke

Tobii Pro X3-120 Gaze Predictions: Their name follows the format P_X.txt. They contain the gaze predictions given by Tobii Pro X3-120 at 120 Hz at the following format:

{"right_pupil_validity": 1, "right_gaze_point_on_display_area": [0.23851549625396729, 0.30423176288604736], "left_gaze_origin_validity": 0, "system_time_stamp": 1491423557714414, "right_gaze_origin_in_user_coordinate_system": [-9.197460174560547, -119.45834350585938, 649.9231567382812], "left_gaze_point_in_user_coordinate_system": [-1.0, -1.0, -1.0], "left_gaze_origin_in_user_coordinate_system": [-1.0, -1.0, -1.0], "left_pupil_validity": 0, "right_pupil_diameter": -1.0, "true_time": 1491423557.724913, "left_gaze_origin_in_trackbox_coordinate_system": [-1.0, -1.0, -1.0], "right_gaze_point_in_user_coordinate_system": [-135.97193908691406, 237.99029541015625, 8.616291999816895], "left_pupil_diameter": -1.0, "right_gaze_origin_validity": 1, "left_gaze_point_validity": 0, "right_gaze_point_validity": 1, "left_gaze_point_on_display_area": [-1.0, -1.0], "right_gaze_origin_in_trackbox_coordinate_system": [0.5202165842056274, 0.7625768184661865, 0.49974384903907776], "device_time_stamp": 193918205466}

An explanation of the three coordinate systems is provided by the Tobii Pro SDK.

specs.txt: Data provided by Tobii Pro X3-120 during its calibration phase. It can be used to measure the original accuracy and precision of the eye tracking predictions.
saved_calibration.bin: The calibration binary file for Tobii Pro X3-120.

Participant Characteristics

At the same level with the 51 user folders, you will find a spreadsheet named Participant_Characteristics. Each row corresponds to a unique participant and columns capture the following columns:

Participant ID: participant IDs in ascending order, from P_1 to P_64.
Participant Log ID: log ID that corresponds to the Unix timestamp of the start of the experiment. E.g., 1491423217564.
Notice that the Participant Log ID matches the name of the log file with the user interactions for a specific user.
Date: The date that the experiment took place. E.g., 04/18/2017.
Setting: One of Laptop or PC depending on which setting was chosen by the participant.
- Laptop: MacBook Pro (Retina, 15-inch, Late 2013), ran macOS Sierra 10.12.5, had an Intel Core i7 processor at 2.6 GHz, and a resolution of 1440 × 900 pixels.
- PC: ran Windows 10, had an Intel Core i5-6600 processor at 3.30 GHz, and a Samsung SyncMaster 2443 monitor with a 24-inch diagonal measurement and a resolution of 1920×1200 pixels. In addition, a Logitech Full HD Webcam C920 USB was attached on the top of the monitor.
Display Width (pixels): 1440 for Laptop, 1920 for PC.
Display Height (pixels): 900 for Laptop, 1200 for PC.
Screen Width (cm): 33.17 for Laptop, 20.73 for PC.
Screen Height (cm): 51.7 for Laptop, 32.31 for PC.
Distance From Screen (cm): Distance of participant's eyes to webcam, measured by tape at the beginning of the experiment.
Gender: Male or Female.
Age: e.g., 26.
Self-Reported Race: One of Asian, Black, White, or Other that match the options American Indian or Alaska Native, Asian, Black or African American, White, or Other.
Self-Reported Skin Color: 1-6, as reported in this publication.
Self-Reported Eye Color: E.g., Dark Brown to Brown. Available options selected from this chart.
Facial Hair: None, Little, Beard, as defined by the experimenter.
Self-Reported Vision: Normal, Contacts, or Glasses.
Touch Typer: Yes/No based on user's ability to touch type.
Self-Reported Handedness: Right or Left. Note that a comment on that cell might indicate that despite certain hand dominance, the user might have used the opposite hand for the experiment.
Weather: Sunny, Cloudy, or Indoors if blinds of room that the experiment was conducted were shut.
Pointing Device: Mouse or Trackpad (only for Laptop).
Notes: Irregularities as noted down by the experimenter.
Time of day: that the experiment was conducted. E.g., 16:00 for 4pm.
Duration: Estimated duration of the experiment in minutes.

Using the Dataset Extractor

If you are interested in using this dataset in conjunction with WebGazer, this software takes the dataset and creates CSV files for each video, containing per-frame WebGazer and Tobii values in normalized screen coordinates. After extraction, this makes it simple and efficent to analyse the performance of WebGazer in your favourite data science application.

Requirements:

Python 3.x
- tornado
- pytz
- numpy
- opencv-python
ffmpeg on PATH
Chrome (tested on)
A lot of disk space: 50GB+ for running on every video on every participant

Instructions:

Download the dataset and unzip into www/data/src

https://webgazer.cs.brown.edu/data/WebGazerETRA2018Dataset_Release20180420.zip

Execute the Python webserver while in the www/data/src/ folder

python webgazerExtractServer.py

Launch browser

http://localhost:8000/webgazerExtractClient.html

Watch for outputs in ../FramesDataset/

Contains:
- Per participant
- Per video
- Every video frame in the dataset as a .png
- CSV file containing a lot of useful metadata about each video frame
  - Frame file path
  - Frame number in video
  - Frame time (Unix milliseconds); only approximate at the millisecond level
  - Any mouse input since the previous frame; only approximate at the millisecond level
  - Any keyboard input since the previous frame; only approximate at the millisecond level
  - Tobii prediction closest in time; normalized to screen coordinates
  - WebGazer prediction; normalized to screen coordinates
  - CLMTracker positions for all 71 2D points
  - Eye features as extracted by WebGazer, 140 features
Watch a replay.

As it processes, the system can show the interaction events against the screen recording. Note that only laptop participants have screen recording synchronization data, and only then are they roughly aligned. Use the text box to 'try some numbers' and find the sync offset. This varies per participant.
Write out screen recording videos with interactions overlaid.

This uses OpenCV, and is a little flakey, but should work. It will slow down extraction a lot. There's a switch in the code to turn it on; let us know if it breaks (I haven't tested it in a while).

Options:

The software is currently set up to run on only the two dot tests and the four typing videos. This can be changed by editing webgazerExtractServer.py - look out for 'filter' as a keyword in comments. Likewise, the software currently processes all participants; again look for 'filter'.

Gotchas:

At times, it might look like nothing is happening to the client. It is, just on the server. E.G., extracting video frames, loading interaction log/Tobii data.
Never edit and save a CSV in Excel. It will format the numbers on reading it in, then save them out in the formatted form. E.G., the Unix timestamps are converted to standard form. : (
It's pretty easy to spit out error in screen millimetres, but be careful to check which participant was on desktop and which on laptop for real-world measurement conversion from normalized screen coordinates.
The CSV has one line per video frame. Sometimes, multiple interaction events happen within a video frame. As such, the interaction columns in the CSVs contain ordered lists, chronologically ordered in increasing time.
New versions of webgazer.js with algorithm improvements will require recomputing from scratch. These scripts ship with a version of webgazer.js from a few months ago.