My sincere thanks to yahoo/fili & yahoo/elide, which gave a tremendous amount of guidance on the design and development of Athena, and to my former employer, Yahoo, who taught me to love software engineering and fundamentally influenced my tech career.
Athena is a Java library that lets you set up an object storage web service with minimal effort. Athena specializes in managing files such as books, videos, and photos. It supports object storage through two APIs:
- A JSON API for uploading and downloading files
- A GraphQL API for reading file metadata, including
  - File name
  - File type
  - etc.
Athena has first-class support for OpenStack Swift and Hadoop HDFS file storage back-ends, but Athena's flexible pipeline-style architecture can handle nearly any back-end for data storage, such as S3.
Object storage (also known as object-based storage) is a computer data storage architecture that manages data as objects, as opposed to other storage architectures such as file systems, which manage data as a file hierarchy, and block storage, which manages data as blocks within sectors and tracks.
Each object (i.e. file), in Athena, typically includes:
- the data itself,
- a variable amount of metadata, and
- a globally unique identifier
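The three parts listed above can be pictured as a single value type. The following is a minimal sketch only: `StoredObject` and its factory method are hypothetical names chosen for illustration, not classes from Athena itself.

```java
import java.util.Map;
import java.util.UUID;

// Hypothetical model, not Athena's actual class: one stored object bundles
// the raw bytes, free-form metadata, and a globally unique identifier.
record StoredObject(UUID id, byte[] data, Map<String, String> metadata) {

    // Creates an object with a freshly generated random (version 4) UUID.
    // The byte array and metadata map are copied so later changes made by
    // the caller do not leak into the stored object.
    static StoredObject of(byte[] data, Map<String, String> metadata) {
        return new StoredObject(UUID.randomUUID(), data.clone(), Map.copyOf(metadata));
    }
}
```

The metadata is modeled as a plain string map here because the amount of metadata varies per object; a real service might well use a richer type.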
Athena allows retention of massive amounts of unstructured data that is written once and read one or more times. It is used for purposes such as storing videos and photos.
Athena, however, is not intended for transactional data and _does not support the locking and sharing mechanisms needed to maintain a single, accurately updated version of a file_.
Athena comes with a pre-configured example application to help you get started and serve as a jumping-off point for building your own web service using Athena. The example application lets you upload and download books you love to read, and picks up where Swift's quick-start tutorial leaves off.
One of the design principles of Athena is to abstract lower layers of storage away from administrators and applications. Thus, data is exposed and managed as objects instead of files or blocks. Administrators do not have to perform lower-level storage functions such as constructing and managing logical volumes to utilize disk capacity, or setting RAID levels to deal with disk failure.
Athena also allows the addressing and identification of individual objects by more than just file name and file path. Athena adds a unique identifier across the entire system, to support much larger namespaces and eliminate name collisions.
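To illustrate why a system-wide identifier removes name collisions, here is a small sketch of an identifier-keyed store. `InMemoryObjectStore` is a hypothetical class for illustration, and the use of random UUIDs is an assumption; the source does not say how Athena generates its identifiers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory store for illustration; Athena's real back-ends
// are Swift or HDFS, and its ID scheme may differ from random UUIDs.
final class InMemoryObjectStore {
    // Maps the system-wide identifier to the file name it was uploaded under.
    private final Map<UUID, String> fileNamesById = new HashMap<>();

    // Registers a file and returns the identifier that addresses it from now on.
    UUID put(String fileName) {
        UUID id = UUID.randomUUID();
        fileNamesById.put(id, fileName);
        return id;
    }

    // Looks up the original file name by identifier, or null if unknown.
    String fileNameOf(UUID id) {
        return fileNamesById.get(id);
    }

    int size() {
        return fileNamesById.size();
    }
}
```

Because the key is the identifier rather than the file name, uploading `report.pdf` twice yields two separately addressable objects instead of one overwriting the other.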
Athena explicitly separates file metadata from data to support additional capabilities. As opposed to the fixed metadata in file systems (filename, creation date, type, etc.), Athena provides full-function, custom, object-level metadata in order to:
- Capture application-specific or user-specific information for better indexing purposes
- Support data-management policies (e.g. a policy to drive object movement from one storage tier to another)
- Centralize management of storage across many individual nodes and clusters
- Optimize metadata storage (e.g. encapsulated, database or key value storage) and caching/indexing (when authoritative metadata is encapsulated with the metadata inside the object) independently from the data storage (e.g. unstructured binary storage)
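The separation of metadata from data can be sketched as two independent maps that share only the object identifier. This is an illustrative assumption about structure, not Athena's implementation; `SplitStore`, `findByMetadata`, and the `tier` attribute are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: metadata and data live in separate maps, so metadata
// can be indexed or queried without touching the file contents.
final class SplitStore {
    private final Map<String, byte[]> dataById = new HashMap<>();
    private final Map<String, Map<String, String>> metadataById = new HashMap<>();

    void put(String id, byte[] data, Map<String, String> metadata) {
        dataById.put(id, data.clone());
        metadataById.put(id, Map.copyOf(metadata));
    }

    // Returns the IDs of objects whose metadata has the given key/value pair.
    // Only the metadata map is scanned; the data map is never read.
    List<String> findByMetadata(String key, String value) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : metadataById.entrySet()) {
            if (value.equals(entry.getValue().get(key))) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    // Returns the stored bytes for an identifier, or null if unknown.
    byte[] read(String id) {
        return dataById.get(id);
    }
}
```

A policy such as "move every object tagged `tier=cold` to cheaper storage" could then start from `findByMetadata("tier", "cold")` and only afterwards touch the data store.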
Athena provides programmatic interfaces that allow applications to manipulate data. At the base level, these include create, read, and delete functions (CRUD without update) for basic write, read, and delete operations. The API implementations are REST-based, allowing the use of many standard HTTP calls.
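As a rough sketch of how create, read, and delete could map onto standard HTTP calls, the snippet below builds (but does not send) three requests with the JDK's `java.net.http` client API. The base URL, the `/files` path, and the content type are placeholders assumed for illustration; they are not Athena's documented routes.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical endpoint layout; the base URL and paths are placeholders,
// not Athena's documented routes.
final class ObjectRequests {
    private static final String BASE = "http://localhost:8080/files";

    // Create: send the file bytes in the request body.
    static HttpRequest create(byte[] data) {
        return HttpRequest.newBuilder(URI.create(BASE))
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(data))
                .build();
    }

    // Read: fetch the object addressed by its identifier.
    static HttpRequest read(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).GET().build();
    }

    // Delete: remove the object addressed by its identifier.
    static HttpRequest delete(String id) {
        return HttpRequest.newBuilder(URI.create(BASE + "/" + id)).DELETE().build();
    }
}
```

Each request could be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofByteArray())` against a running service; consult the Athena documentation for the actual endpoints and payload formats.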
More information about Athena can be found here
The use and distribution terms for Athena are covered by the Apache License, Version 2.0.