MTab: Tabular Data Annotation


News:

  • Upcoming: data analysis and QA for table data.
  • 2021/06/04 - Fixed the runtime error that occurred when annotating large tables. By default, only the first 1000 cells of a table are processed. You can adjust this by appending the "limit" parameter to the API URL, for example "?limit=1000". The API timeout limit has also been raised to 3 days.
  • 2021/02/23 - The API is online at https://mtab.app. It is optimized and runs on a Mac mini M1.

Features:

  • Semantic Annotation with knowledge graphs: Wikidata, Wikipedia, DBpedia
    • Annotate table cells with entities
    • Annotate table attributes with entity types (classes). Currently, only table columns are supported as attributes.
    • Annotate the relations between table attributes (columns) with properties (relations or predicates).
  • Structure Annotation:
    • Table type prediction: matrix, relational, layout (under development)
    • Header detection
    • Core attribute (subject column, or primary key) detection
    • Basic statistics: number of rows, columns, and cells
    • Data types (under development)
    • Languages (under development)
  • Good for table interpretation, data integration, and knowledge discovery.

Interface:

https://mtab.app/mtab

API URL:

https://mtab.app/api/v1/mtab

Usage:

Users can send table files (in CSV, TSV, or Excel format) to the tool and receive the annotations.

Note: We do not keep your data. After processing your tables, we will delete your data immediately.

  • For efficiency, the API annotates only the first 1000 rows of the input table by default. You can change this with the "limit" API parameter (for example, "limit=1000"). If your table has 1 million rows, you can set the limit to 1 million, but annotation will take longer. For example:
% curl -X POST -F file=@"YOUR_FILE_LOCATION/YOUR_TABLE.xltx" "https://mtab.app/api/v1/mtab?limit=1000000"
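
The same request can also be sent from Python. The following is a minimal sketch, not part of MTab itself: it assumes the third-party `requests` package is installed, the helper names `build_mtab_url` and `annotate_table` are hypothetical, and the 600-second client timeout is an arbitrary choice. Only the endpoint, the multipart field name `file`, and the `limit` query parameter come from the description above.

```python
from urllib.parse import urlencode

API_URL = "https://mtab.app/api/v1/mtab"  # endpoint given in this document


def build_mtab_url(limit=None):
    """Return the API URL, optionally with the `limit` query parameter."""
    if limit is None:
        return API_URL  # rely on the server-side default
    if int(limit) <= 0:
        raise ValueError("limit must be a positive integer")
    return f"{API_URL}?{urlencode({'limit': int(limit)})}"


def annotate_table(path, limit=None, timeout=600):
    """Upload one table file and return the parsed JSON response.

    Hypothetical helper that mirrors `curl -X POST -F file=@...`.
    """
    import requests  # third-party; imported lazily so build_mtab_url works without it

    with open(path, "rb") as fh:
        response = requests.post(
            build_mtab_url(limit), files={"file": fh}, timeout=timeout
        )
    response.raise_for_status()  # surface HTTP errors instead of parsing an error page
    return response.json()
```

Building the URL with `urlencode` avoids shell quoting problems with the `?` character that can occur when the URL is typed unquoted on a command line.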

Annotate a table:

Annotate a table in Excel format (0AJSJYAL.xltx):

% curl -X POST -F file=@"YOUR_FILE_LOCATION/0AJSJYAL.xltx" https://mtab.app/api/v1/mtab

Expected Answer:

{
  "n_tables": 1,
  "status": "Success",
  "tables": [
    {
      "name": "0AJSJYAL",
      "run_time": 0.8414499759674072,
      "semantic": {
        "cea": [
          {
            "annotation": {
              "dbpedia": "http://dbpedia.org/resource/Newport,_Arkansas",
              "desc": "city in Arkansas, USA",
              "label": "Newport",
              "wikidata": "http://www.wikidata.org/entity/Q79414",
              "wikipedia": "http://en.wikipedia.org/wiki/Newport,_Arkansas"
            },
            "target": [
              1,
              0
            ]
          },
          {
            "annotation": {
              "dbpedia": "http://dbpedia.org/resource/Thomas_(2001_film)",
              "desc": "2001 film by Raffaele Mertes",
              "label": "Thomas",
              "wikidata": "http://www.wikidata.org/entity/Q2421872",
              "wikipedia": "http://en.wikipedia.org/wiki/Thomas_(2001_film)"
            },
            "target": [
              2,
              0
            ]
          },
          {
            "annotation": {
              "dbpedia": "http://dbpedia.org/resource/8082_Haynes",
              "desc": "asteroid",
              "label": "8082 Haynes",
              "wikidata": "http://www.wikidata.org/entity/Q533244",
              "wikipedia": "http://en.wikipedia.org/wiki/8082_Haynes"
            },
            "target": [
              3,
              0
            ]
          },
          {
            "annotation": {
              "desc": "family name",
              "label": "Lampitt",
              "wikidata": "http://www.wikidata.org/entity/Q37468695"
            },
            "target": [
              4,
              0
            ]
          },
          {
            "annotation": {
              "dbpedia": "http://dbpedia.org/resource/Solanki",
              "desc": "family name",
              "label": "Solanki",
              "wikidata": "http://www.wikidata.org/entity/Q37521226",
              "wikipedia": "http://en.wikipedia.org/wiki/Solanki"
            },
            "target": [
              5,
              0
            ]
          },
          {
            "annotation": {
              "dbpedia": "http://dbpedia.org/resource/Weston,_Colorado",
              "desc": "unincorporated community in Colorado",
              "label": "Weston",
              "wikidata": "http://www.wikidata.org/entity/Q7989353",
              "wikipedia": "http://en.wikipedia.org/wiki/Weston,_Colorado"
            },
            "target": [
              6,
              0
            ]
          }
        ],
        "cpa": [],
        "cta": [
          {
            "annotation": [
              {
                "dbpedia": "http://dbpedia.org/resource/Human_settlement",
                "desc": "community of any size, in which people live",
                "label": "human settlement",
                "wikidata": "http://www.wikidata.org/entity/Q486972",
                "wikipedia": "http://en.wikipedia.org/wiki/Human_settlement"
              },
              {
                "dbpedia": "http://dbpedia.org/resource/Surname",
                "desc": "part of a naming scheme for individuals, used in many cultures worldwide",
                "label": "family name",
                "wikidata": "http://www.wikidata.org/entity/Q101352",
                "wikipedia": "http://en.wikipedia.org/wiki/Surname"
              }
            ],
            "target": 0
          }
        ]
      },
      "status": "Success",
      "structure": {
        "cells": 37,
        "columns": 7,
        "core_attribute": 0,
        "encoding": "utf-8",
        "headers": [
          0
        ],
        "r_cells": 0.7551020408163265,
        "rows": 7,
        "table type": "vertical relation"
      },
      "table_cells": [
        [
          "col0",
          "col1",
          "col2",
          "col3",
          "col4",
          "col5",
          "col6"
        ],
        [
          "Newport",
          "31",
          "8",
          "95",
          "2",
          "-",
          "-"
        ],
        [
          "Thomas",
          "30",
          "5",
          "98",
          "2",
          "-",
          "-"
        ],
        [
          "Haynes",
          "25",
          "8",
          "68",
          "2",
          "-",
          "-"
        ],
        [
          "Lampitt",
          "29.4",
          "10",
          "73",
          "3",
          "-",
          "-"
        ],
        [
          "Solanki",
          "19",
          "4",
          "76",
          "1",
          "-",
          "-"
        ],
        [
          "Weston",
          "1",
          "0",
          "1",
          "0",
          "-",
          "-"
        ]
      ]
    }
  ]
}
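
To make the response layout concrete, here is a small sketch of how the JSON above could be read in Python. The field names (`tables`, `semantic`, `cea`, `cta`, `target`, `annotation`, `table_cells`) are taken from the example response; the helper names `cell_entities` and `column_types` are hypothetical. It assumes, as the example suggests, that a CEA target is a 0-based `[row, column]` pair into `table_cells`, and that any of the `dbpedia`, `wikidata`, or `wikipedia` keys may be missing from an annotation.

```python
def cell_entities(response):
    """Yield (table name, row, column, cell text, Wikidata URI) per CEA annotation."""
    for table in response.get("tables", []):
        cells = table.get("table_cells", [])
        for item in table.get("semantic", {}).get("cea", []):
            row, col = item["target"]  # [row index, column index]
            in_range = row < len(cells) and col < len(cells[row])
            text = cells[row][col] if in_range else None
            # .get() because some annotations lack one or more knowledge-graph links
            yield table.get("name"), row, col, text, item["annotation"].get("wikidata")


def column_types(response):
    """Map (table name, column index) to the list of CTA type labels."""
    result = {}
    for table in response.get("tables", []):
        for item in table.get("semantic", {}).get("cta", []):
            labels = [annotation.get("label") for annotation in item["annotation"]]
            result[(table.get("name"), item["target"])] = labels
    return result


# Abbreviated copy of the response above: two CEA entries and the CTA entry.
sample = {
    "tables": [{
        "name": "0AJSJYAL",
        "semantic": {
            "cea": [
                {"annotation": {"label": "Newport",
                                "wikidata": "http://www.wikidata.org/entity/Q79414"},
                 "target": [1, 0]},
                {"annotation": {"label": "Lampitt",
                                "wikidata": "http://www.wikidata.org/entity/Q37468695"},
                 "target": [4, 0]},
            ],
            "cpa": [],
            "cta": [{"annotation": [{"label": "human settlement"},
                                    {"label": "family name"}],
                     "target": 0}],
        },
        "table_cells": [["col0"], ["Newport"], ["Thomas"], ["Haynes"], ["Lampitt"]],
    }]
}

for entry in cell_entities(sample):
    print(entry)
print(column_types(sample))
```

In the full response, the annotation label can differ from the cell text (the cell "Haynes" is annotated as "8082 Haynes"), so it is safer to join on the `target` indices than on labels.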

Annotate multiple tables:

Note:

  • Please do not send more than 100 tables in one request, to avoid data transmission corruption. (We process at most 100 tables per request.)
  • Please put your tables in a folder named tables and compress it as a zip file, like mytables.zip. (We accept only zip-compressed files, to speed up data transmission.)
mytables.zip
|-- tables (folder)
|   |--table_1.csv 
|   |--table_2.csv 
|   |--...
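
The archive layout above can be produced with Python's standard `zipfile` module. This is a minimal sketch under the constraints stated in the notes (a top-level `tables` folder, at most 100 tables); the function name `build_tables_zip` is hypothetical.

```python
import zipfile
from pathlib import Path

MAX_TABLES = 100  # per-request limit stated in the note above


def build_tables_zip(table_paths, zip_path="mytables.zip"):
    """Write the given table files into `zip_path` under a top-level `tables/` folder."""
    table_paths = [Path(p) for p in table_paths]
    if not table_paths:
        raise ValueError("no tables given")
    if len(table_paths) > MAX_TABLES:
        raise ValueError(
            f"{len(table_paths)} tables given; at most {MAX_TABLES} are processed per request"
        )
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in table_paths:
            # arcname places every file directly inside tables/, whatever its source folder
            archive.write(path, arcname=f"tables/{path.name}")
    return Path(zip_path)
```

For example, `build_tables_zip(sorted(Path("my_csv_folder").glob("*.csv")))` would package every CSV file in a (hypothetical) local folder, provided there are no more than 100 of them.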

Command:

% curl -X POST -F file=@"YOUR_FILE_LOCATION/mytables.zip" https://mtab.app/api/v1/mtab

Expected Answer: See mytables.json for the full answers.

Annotate multiple tables with targets (CEA, CTA, and CPA, as in the SemTab challenge):

Note: The compressed file should follow the format of mytables_ntar.zip.

mytables_ntar.zip
|-- tables (folder)
|   |--table_1.csv 
|   |--table_2.csv 
|   |--...
|-- cea.csv (Cell annotation targets in the format of [table ID, row index, column index])
|-- cta.csv (Column annotation targets in the format of [table ID, column index])
|-- cpa.csv (The relation between two columns in the format of [table ID, column 1, column 2])
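
An archive with target files could be assembled as sketched below. The column order of each target file follows the listing above. Several details are assumptions rather than documented behavior: that the files are comma-separated with no header row, that the table ID is the table's file name without its extension, and the function name `build_targets_zip` itself.

```python
import csv
import io
import zipfile
from pathlib import Path


def _csv_text(rows):
    """Serialize rows to CSV text (assumption: comma-separated, no header row)."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def build_targets_zip(table_paths, cea, cta, cpa, zip_path="mytables_ntar.zip"):
    """Package tables plus target files.

    cea: rows of (table ID, row index, column index)
    cta: rows of (table ID, column index)
    cpa: rows of (table ID, column 1, column 2)
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in map(Path, table_paths):
            archive.write(path, arcname=f"tables/{path.name}")
        # Target files sit at the archive root, next to the tables/ folder.
        archive.writestr("cea.csv", _csv_text(cea))
        archive.writestr("cta.csv", _csv_text(cta))
        archive.writestr("cpa.csv", _csv_text(cpa))
    return Path(zip_path)
```

A call such as `build_targets_zip(["table_1.csv"], cea=[("table_1", 1, 0)], cta=[("table_1", 0)], cpa=[("table_1", 0, 1)])` would request one cell, one column, and one column-pair annotation for a single table.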

Command:

% curl -X POST -F file=@"YOUR_ZIP_FILE_LOCATION/mytables_ntar.zip" https://mtab.app/api/v1/mtab

Expected Answer:

See mytables_ntar.json for the full answers.

Other Examples:

Table Annotation

Data correction:

Input: Tabular data

col0                     col1      col2        col3
2MASS J10540655-0031018  -5.7      19.3716366  13.6356351
2MASS J0464841+0715177   -2.77475  26.671236   11.8187551
2MAS J08351104+2006371   72.216    3.7242888   128.151961
2MASS J08330994+186328   -6.993    6.0962562   127.649963

Output:

star                     radial velocity  parallax  right ascension
2MASS J00540655-0031018  -5.70            19.2561   13.52741580209200
2MASS J00464841+0715177  -2.75            26.6180   11.70173767885790
2MASS J08351104+2006371  72               3.6984    128.79594070217040
2MASS J08330994+1806328  -7               6.1146    128.29142004157090