diff --git a/book/src/arch/ecs.md b/book/src/arch/ecs.md index c07c4435c..a8fd05e4f 100644 --- a/book/src/arch/ecs.md +++ b/book/src/arch/ecs.md @@ -1,17 +1,17 @@ # The Architecture of Xvc Entity Component System -Xvc uses an entity component system (ECS) in its core. +Xvc uses an entity component system (ECS) in its core. [ECS architecture] is popular among game development, but didn't find popularity in other areas. -It's an alternative to Object-Oriented Programming. +It's an alternative to Object-Oriented Programming. [ECS architecture]: https://en.wikipedia.org/wiki/Entity_component_system -There are a few basic notions of ECS architecture. +There are a few basic notions of ECS architecture. Although it may differ in other frameworks, Xvc assumes the following: -- An entity is a neutral way of tracking components and their relationships. +- An entity is a neutral way of tracking components and their relationships. It doesn't contain any semantics other than _being an entity._ -An _entity_ in Xvc is an atomic usize integer. (`XvcEntity`) +An _entity_ in Xvc is an atomic integer tuple. (`XvcEntity`) - A component is a bundle of associated data about these entities. All semantics of entities are described through components. @@ -20,84 +20,104 @@ Xvc uses components to keep track of different aspects of file system objects, d - A system is where the components are created and modified. Xvc considers all modules that interact with components as separate systems. -For example, suppose you're want to track a new file in Xvc. -Xvc creates a new entity for this file. -Associates the path (`XvcPath`) with this entity. -Checks the file metadata, creates an instance of `XvcMetadata`, and associates it with this entity. -If this object is commit to Xvc cache, an `XvcDigest` struct is associated with the entity. +Suppose you're want to track a new file in Xvc. +Xvc creates a new entity for this file. +Associates the path (`XvcPath`) with this entity. 
+Checks the file metadata, creates an instance of `XvcMetadata`, and associates it with this entity. +If this object is commit to Xvc cache, an `XvcDigest` struct is associated with the entity. -The difference from OOP is that there is no _basic_ or _main_ object. -If you want to work only with digests and want to find the workspace paths associated with them, you can write a function (system) that starts from `XvcDigest` records and collect the associated paths. -If you want to get only the files larger than a certain size, you can work with `XvcMetadata`, filter them and get the paths later. -In contrast, in an OOP setting, these kind of data is associated with paths and when you want to do such operations, you need to load paths and all their associations first. +The difference from OOP is that there is no _basic_ or _main_ object. +If you want to work only with digests and want to find the workspace paths associated with them, you can write a function (system) that starts from `XvcDigest` records and collect the associated paths. +If you want to get only the files larger than a certain size, you can work with `XvcMetadata`, filter them and get the paths later. +In contrast, in an OOP setting, these kind of data is associated with paths and when you want to do such operations, you need to load paths and all their associations first. OOP way of doing things is usually against the principle of locality. The whole idea is to be flexible for further changes. -As of now, Xvc doesn't have different notion of _data_ and _models._ -It doesn't have different functionality for files that are models or data. -In the future, however, when this will be added, an `XvcModel` component will be created and associated with the same entity of an `XvcPath`. +As of now, Xvc doesn't have different notion of _data_ and _models._ +It doesn't have different functionality for files that are models or data. 
+In the future, however, when this will be added, an `XvcModel` component will be created and associated with the same entity of an `XvcPath`. It will allow to work with some paths as model files but it doesn't require _paths_ to be known beforehand. -There may be other metadata, like _features_ or _version_ associated with models that are more important. -There may be some models without a file system path, maybe living only in memory or on the cloud. +There may be other metadata, like _features_ or _version_ associated with models that are more important. +There may be some models without a file system path, maybe living only in memory or in the cloud. Those kind of models might be checked by verifying whether the model has a corresponding `XvcPath` component or not. -In contrast, OOP would define this either by _inheritance_ (a model is a path) or _containment_ (a model has a path). +In contrast, OOP would define this either by _inheritance_ (a model is a path) or _containment_ (a model has a path). When you select any of these, it becomes a _relationship_ that must be maintained indefinitely. -When you only have an integer that identifies these components, it's much easier to describe _models without a path_ later. -There is no predefined relationship between paths and models. +When you only have an integer that identifies these components, it's much easier to describe _models without a path_ later. +There is no predefined relationship between paths and models. -The architecture is approximately similar to database modeling. +The architecture is approximately similar to database modeling. Components are in-memory tables, albeit they are small and mostly contain a few fields. Entities are sequential primary keys. -Systems are _insert_, _query_ and _update_ mechanisms. +Systems are _insert_, _query_ and _update_ mechanisms. 
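The database analogy above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: `Entity`, `World`, `paths_larger_than`, and `demo_world` are hypothetical names, the `String`, `u64`, and `[u8; 4]` fields merely stand in for `XvcPath`, `XvcMetadata`, and `XvcDigest`, and none of it is Xvc's actual API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `XvcEntity`: a (counter, random) pair used only as a key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Entity(pub u64, pub u64);

/// Hypothetical component "tables". Each one is keyed by entity and knows
/// nothing about the other tables.
#[derive(Default)]
pub struct World {
    /// Stands in for `XvcPath` components.
    pub paths: BTreeMap<Entity, String>,
    /// Stands in for the size field of `XvcMetadata` components.
    pub sizes: BTreeMap<Entity, u64>,
    /// Stands in for `XvcDigest` components; not every entity has one.
    pub digests: BTreeMap<Entity, [u8; 4]>,
}

impl World {
    /// A "system": start from the metadata table, filter it, and only then
    /// look up the paths that share the same entity key.
    pub fn paths_larger_than(&self, min_size: u64) -> Vec<&str> {
        self.sizes
            .iter()
            .filter(|(_, size)| **size > min_size)
            .filter_map(|(entity, _)| self.paths.get(entity).map(String::as_str))
            .collect()
    }
}

/// Builds a small example world with two tracked files.
pub fn demo_world() -> World {
    let mut world = World::default();
    let small = Entity(1, 42);
    let big = Entity(2, 42);
    world.paths.insert(small, "data/small.txt".to_string());
    world.sizes.insert(small, 10);
    world.paths.insert(big, "data/big.bin".to_string());
    world.sizes.insert(big, 4096);
    // Only the big file has been committed to the cache in this example.
    world.digests.insert(big, [0xde, 0xad, 0xbe, 0xef]);
    world
}
```

Calling `demo_world().paths_larger_than(100)` starts from the size table and never touches the digest table, which is the locality argument made above. A model without a path would simply be an entity with no row in `paths`.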
-# Stores +## Stores -An `XvcStore` in its basic definition is a map structure between `XvcEntity` and a component type `T` -It has facilities for persistence, iteration, search and filtering. +An `XvcStore` in its basic definition is a map structure between `XvcEntity` and a component type `T` +It has facilities for persistence, iteration, search and filtering. It can be considered a _system_ in the usual ECS sense. -## Loading and Saving Stores +### Loading and Saving Stores As our goal is to track data files with Git, stores save and load binary files' metadata to text files. Instead of storing the binary data itself in Git, Xvc stores information about these files to track whether they are changed. By default, these metadata are persisted to JSON. -Component types must be serializable because of this. +Component types must be serializable because of this. They are meant to be stored to disk in JSON format. -Nevertheless, as they are almost always composed of basic types [serde] supports, this doesn't pose a difficulty in usage. -The JSON files are then commit to Git. +Nevertheless, as they are almost always composed of basic types [serde] supports, this doesn't pose a difficulty in usage. +The JSON files are then commit to Git. Note that, there are usually multiple branches in Git repositories. Also multiple users may work on the same branch. -When these text files are reused by the stores, they are modified and this may lead to merge conflicts. -We don't want our users to deal with merge conflicts with entities and components residing in text files. +When these text files are reused by the stores, they are modified and this may lead to merge conflicts. +We don't want our users to deal with merge conflicts with entities and +components in text files. +This also makes it possible to use binary formats like MessagePack in the +future. -Suppose user A made a change in `XvcStore` by adding a few files. 
+Suppose user A made a change in `XvcStore` by adding a few files. Another user B made another change to the project, by adding another set of files in another copy of the project. -This will lead to merge conflicts: -- `XvcEntity` counter will have different values in A and B's repositories. -- `XvcStore` will have different records in A and B's repositories. +This will lead to merge conflicts: -Instead of saving and loading to monolithical files, `XvcStore` saves and loads _event logs._ +- `XvcEntity` counter will have different values in A and B's repositories. +- `XvcStore` will have different records in A and B's repositories. + +Instead of saving and loading to monolithical files, `XvcStore` saves and loads _event logs._ There are two kind of events in a store: + - `Add(XvcEntity, T)`: Adds an element `T` to a store. - `Remove(XvcEntity)`: Removes the element with entity id. -These events are saved into files. -When the store is loaded, all files after the last full snapshot are loaded and replayed. +These events are saved into files. +When the store is loaded, all files after the last full snapshot are loaded and replayed. -When you add an item to a store, it saves the `Add` event to a log. +When you add an item to a store, it saves the `Add` event to a log. These events are then put into a vector. -A `BTreeMap` is also created by this vector. +A `BTreeMap` is also created by this vector. When an item is deleted, a `Remove` event is added to the event vector. While loading, stores removes the elements with `Remove` events from the `BTreeMap`. -So the final set of elements doesn't contain the removed item. +So the final set of elements doesn't contain the removed item. + +The second problem with multiple branches is duplicate entities in separate +branches. Xvc uses a _counter_ to generate unique entity ids. +When a store is loaded, it checks the last entity id in the event log and uses +it as the starting point for the counter. 
But using this counter as is causes +duplicate values in different branches. Xvc solves this by adding a random value +to these counter values. -Stores also have a inverse index for quick lookup. -They store value of `T` as key and a list of entities that correspond to this key. -For example, when we have a path that we stored, it's a single operation to get the corresponding `XvcEntity` and after this, all recorded metadata about this path is available. +Since v0.5, `XvcEntity` is a tuple of 64-bit integers. The first is loaded from +the disk and is an atomic counter. The second is a random value that is renewed +at every command invocation. Therefore we have a unique entity id for every run, +that's also sortable by the first value. Easy sorting with integers is sometimes +required for stable lists. + +### Inverted Index + +Stores also have an inverted index for quick lookup. +They store value of `T` as key and a list of entities that correspond to this key. +For example, when we have a path that we stored, it's a single operation to get the corresponding `XvcEntity` and after this, all recorded metadata about this path is available. All search, iteration and filtering functionality is performed using these two internal maps. @@ -108,54 +128,44 @@ In summary, a store has four components. - A mutable map of the current data: `BTreeMap<XvcEntity, T>` - A mutable map of the entities from values: `BTreeMap<T, Vec<XvcEntity>>` -Insert, update and delete operations affect mutable log and maps. +Note that when two branches perform the same operation, the event logs will be +different, as the random part of `XvcEntity` is different. When two +branches merge, the inverted index may contain conflicting values. In this case, +a `fsck` command is used to merge the store files and merge conflicting entity +ids. + +Insert, update and delete operations affect mutable log and maps. Queries, iteration and such non-destructive operations are done with the maps.
-When loading, all log files are merged in immutable log. +When loading, all log files are merged into the immutable log. No standard operation touches the event logs. All log modifications are done outside of the normal workflow. When saving, only the mutable log is saved. -Note that only can only be added to the log, they are not removed. -(See `xvc fsck --merge-stores` for merging store files.) +Note that events can only be added to the log; they are not removed. +(See `xvc fsck --merge-stores` for merging store files.) -In the future, if the performance for loading/saving becomes a bottleneck, the map can also be serialized and only the events after its record can be replayed. -For the time being additional complexity from saving multiple files is avoided. +### Relationship Stores -## Relationship Stores +`XvcStore` keeps component-per-entity. +Each component is a flat structure that doesn't refer to other components. -The store keeps data-per-entity. Xvc also has _relation_ stores that represent relationships between entities, and components. -As in the database Entity-Relationship Theory, there are three kinds of the relationship store: +Similar to the database Entity-Relationship model, there are three kinds of the relationship store: -`R11Store<T, U>` keeps two sets of components associated with the same entity. +`R11Store<T, U>` keeps two sets of components associated with the same entity. It represents a 1-1 relationship between `T` and `U`. -It contains two `XvcStore`s for each component type. +It contains two `XvcStore`s for each component type. These two stores are indexed with the same `XvcEntity` values. -For example, an `R11Store<XvcPath, XvcMetadata>` keeps track of path metadata for the identical `XvcEntity` keys. +For example, an `R11Store<XvcPath, XvcMetadata>` keeps track of path metadata for the identical `XvcEntity` keys. -`R1NStore<T, U>` keeps parent-child relationships. -It represents a 1-N relationship between `T` and `U`.
-On top of two `XvcStore`s, this one keeps track of relationships with a third `XvcStore`. +`R1NStore` keeps parent-child relationships. +It represents a 1-N relationship between `T` and `U`. +On top of two `XvcStore`s, this one keeps track of relationships with a third `XvcStore`. It lists which `U`'s are children of `T`s. -For example, a value of `XvcPipeline` can have multiple `XvcStep`s. +For example, a value of `XvcPipeline` can have multiple `XvcStep`s. These are represented with `R1NStore`. This struct has `parent-to-child` and `child-to-parent` functions that can be used get children of a parent, or parent of child element. -The third type is `RMNStore`. -This one keeps arbitrary number of relationships between `T` and `U`. -Any number of `T`s may correspond to any number of `U`s. +The third type is `RMNStore`. +This one keeps arbitrary number of relationships between `T` and `U`. +Any number of `T`s may correspond to any number of `U`s. This type of store keeps the relationships in two `XvcStore`'s. -As of this writing, this one isn't used yet in Xvc. -The above two is enough as of today, and this one is not developed more than basic functionality. -When we'll have some cross-cutting structure, e.g., steps that can be used in multiple pipelines, we can improve and use this. - -# Loading and Saving XvcEntity - -`XvcEntity` should be unique for each non-1-1-related element. -It's actually a singleton, thread-safe incrementing counter. -It should save the last value it was used. - -In multiple-user or multiple-branch scenarios, if the counter is incremented differently, it may cause havoc in the system. -Therefore the state is saved to timestamped files similar to `XvcStore`. -They are loaded and the maximum value is selected as the last value. 
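The event-log replay and the inverted index described for `XvcStore` in this document can be sketched as below. This is a minimal illustration, not the store's real code: `Entity`, `Event`, `replay`, and `invert` are hypothetical names, persistence and snapshots are left out, and only the `Add`/`Remove` semantics from the text are assumed.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `XvcEntity`: (counter, random).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub struct Entity(pub u64, pub u64);

/// The two kinds of events a store records, as described in the text.
#[derive(Debug, Clone)]
pub enum Event<T> {
    Add(Entity, T),
    Remove(Entity),
}

/// Replays the events in order and returns the resulting entity -> value map.
/// A later `Remove` drops the element that an earlier `Add` inserted.
pub fn replay<T: Clone>(log: &[Event<T>]) -> BTreeMap<Entity, T> {
    let mut map = BTreeMap::new();
    for event in log {
        match event {
            Event::Add(entity, value) => {
                map.insert(*entity, value.clone());
            }
            Event::Remove(entity) => {
                map.remove(entity);
            }
        }
    }
    map
}

/// Builds the inverted index: value -> all entities that carry this value.
pub fn invert<T: Clone + Ord>(map: &BTreeMap<Entity, T>) -> BTreeMap<T, Vec<Entity>> {
    let mut index: BTreeMap<T, Vec<Entity>> = BTreeMap::new();
    for (entity, value) in map {
        index.entry(value.clone()).or_default().push(*entity);
    }
    index
}
```

In this sketch, two branches that both add `"a.txt"` would produce entities with different random parts, so after a merge the inverted index would list two entities under the same value. That is the kind of conflict the text says a `fsck` command resolves.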
- - diff --git a/ecs/Cargo.toml b/ecs/Cargo.toml index cece458c4..d90e6f332 100644 --- a/ecs/Cargo.toml +++ b/ecs/Cargo.toml @@ -37,12 +37,12 @@ thiserror = "^1.0" ## Misc lazy_static = "^1.4" +rand = "^0.8" [dev-dependencies] tempdir = "^0.3" jwalk = "^0.6" -rand = "^0.8" [package.metadata.cargo-udeps.ignore] normal = ["xvc-logging", "test-case", "tempdir"] diff --git a/ecs/src/ecs/hstore.rs b/ecs/src/ecs/hstore.rs index ce43c1db1..16ddf1411 100644 --- a/ecs/src/ecs/hstore.rs +++ b/ecs/src/ecs/hstore.rs @@ -238,7 +238,7 @@ impl HStore { if let Some(v) = self.get(&e) { map.insert(e, v.clone()); } else { - Error::CannotFindKeyInStore { key: e.0 }.warn(); + Error::CannotFindKeyInStore { key: e.to_string() }.warn(); } } Ok(Self { map }) diff --git a/ecs/src/ecs/mod.rs b/ecs/src/ecs/mod.rs index 36d7ffe81..0ae2b01a7 100755 --- a/ecs/src/ecs/mod.rs +++ b/ecs/src/ecs/mod.rs @@ -16,11 +16,12 @@ pub mod storable; pub mod vstore; pub mod xvcstore; +use rand::{rngs, RngCore, SeedableRng}; use std::fmt; use std::fs; use std::path::Path; use std::path::PathBuf; -use std::sync::atomic::{AtomicUsize, Ordering}; +use std::sync::atomic::{AtomicU64, Ordering}; use std::sync::Once; use std::time::SystemTime; use std::time::UNIX_EPOCH; @@ -28,35 +29,48 @@ use std::time::UNIX_EPOCH; use serde::{Deserialize, Serialize}; use xvc_logging::watch; -use crate::error::{Error as XvcError, Result as XvcResult}; +use crate::error::{Error as XvcError, Result}; /// Describes an entity in Entity Component System-sense. /// -/// It doesn't have any semantics except being a unique number for a given entity. +/// It doesn't have any semantics except being unique for a given entity. /// Various types of information (components) can be attached to this entity. /// XvcStore uses the entity as a key for the components. /// -/// It's possible to convert to `usize` back and forth. +/// It's possible to convert to `(u64, u64)` or `u128` back and forth. 
/// Normally, you should use [XvcEntityGenerator] to create entities. -/// It ensures that the numbers are unique and saves the last number across sessions. +/// It randomizes the second value to be unique and saves the last number across sessions. +/// This changed in 0.5. See https://github.com/iesahin/xvc/issues/198 #[derive(Debug, Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize, Deserialize, Hash)] -pub struct XvcEntity(usize); +pub struct XvcEntity(u64, u64); impl fmt::Display for XvcEntity { fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result { - write!(f, "{}", self.0) + write!(f, "({}, {})", self.0, self.1) } } -impl From<usize> for XvcEntity { - fn from(e: usize) -> Self { - Self(e) +impl From<(u64, u64)> for XvcEntity { + fn from(e: (u64, u64)) -> Self { + Self(e.0, e.1) } } -impl From<XvcEntity> for usize { - fn from(e: XvcEntity) -> usize { - e.0 +impl From<u128> for XvcEntity { + fn from(e: u128) -> Self { + Self((e >> 64) as u64, e as u64) + } +} + +impl From<XvcEntity> for u128 { + fn from(e: XvcEntity) -> u128 { + ((e.0 as u128) << 64) | (e.1 as u128) + } +} + +impl From<XvcEntity> for (u64, u64) { + fn from(e: XvcEntity) -> (u64, u64) { + (e.0, e.1) } } @@ -69,7 +83,8 @@ impl From<XvcEntity> for usize { #[derive(Debug)] pub struct XvcEntityGenerator { - current: AtomicUsize, + counter: AtomicU64, + random: u64, } static INIT: Once = Once::new(); @@ -80,8 +95,8 @@ static INIT: Once = Once::new(); /// This function can only be used once in a process. /// You cannot load a second instance of the entity generator, as it will defeat its thread-safe /// uniqueness purpose.
-pub fn load_generator(dir: &Path) -> XvcResult<XvcEntityGenerator> { - let mut gen: XvcResult<XvcEntityGenerator> = Err(XvcError::CanInitializeOnlyOnce { +pub fn load_generator(dir: &Path) -> Result<XvcEntityGenerator> { + let mut gen: Result<XvcEntityGenerator> = Err(XvcError::CanInitializeOnlyOnce { object: "XvcEntityGenerator".to_string(), }); INIT.call_once(|| gen = XvcEntityGenerator::load(dir)); @@ -92,8 +107,8 @@ pub fn load_generator(dir: &Path) -> XvcResult<XvcEntityGenerator> { /// /// Normally this only be used once an Xvc repository initializes. /// The starting value for entities is 1. -pub fn init_generator() -> XvcResult<XvcEntityGenerator> { - let mut gen: XvcResult<XvcEntityGenerator> = Err(XvcError::CanInitializeOnlyOnce { +pub fn init_generator() -> Result<XvcEntityGenerator> { + let mut gen: Result<XvcEntityGenerator> = Err(XvcError::CanInitializeOnlyOnce { object: "XvcEntityGenerator".to_string(), }); @@ -110,22 +125,27 @@ impl Iterator for XvcEntityGenerator { } impl XvcEntityGenerator { - fn new(start: usize) -> XvcEntityGenerator { - let current = AtomicUsize::new(0); - current.fetch_add(start, Ordering::SeqCst); - Self { current } + fn new(start: u64) -> XvcEntityGenerator { + let counter = AtomicU64::new(0); + counter.fetch_add(start, Ordering::SeqCst); + let mut rng = rngs::StdRng::from_entropy(); + let init_random = rng.next_u64(); + Self { + counter, + random: init_random, + } } /// Returns the next element by atomically increasing the current value. pub fn next_element(&self) -> XvcEntity { - XvcEntity(self.current.fetch_add(1, Ordering::SeqCst)) + XvcEntity(self.counter.fetch_add(1, Ordering::SeqCst), self.random) } - fn load(dir: &Path) -> XvcResult<XvcEntityGenerator> { + fn load(dir: &Path) -> Result<XvcEntityGenerator> { let path = most_recent_file(dir)?; match path { Some(path) => { - let current_val = fs::read_to_string(path)?.parse::<usize>()?; + let current_val = fs::read_to_string(path)?.parse::<u64>()?; Ok(Self::new(current_val)) } None => Err(XvcError::CannotRestoreEntityCounter { @@ -134,14 +154,16 @@ impl XvcEntityGenerator { } } - /// Saves the current XvcEntity value to path.
- pub fn save(&self, dir: &Path) -> XvcResult<()> { - let u: usize = self.next_element().into(); + /// Saves the current XvcEntity counter to path. + /// It saves only the first (e.0) part of the entity. The second part is + /// generated per creation to randomize entities in parallel branches. + pub fn save(&self, dir: &Path) -> Result<()> { + let (counter, _) = self.next_element().into(); if !dir.exists() { fs::create_dir_all(dir)?; } let path = dir.join(timestamp()); - fs::write(path, format!("{}", u))?; + fs::write(path, format!("{}", counter))?; Ok(()) } } @@ -160,7 +182,7 @@ pub fn timestamp() -> String { /// This is used to sort timestamp named files. (See [timestamp]). /// Store files are loaded in this order to replay the changes across branches. /// TODO: Add link to book chapter. -pub fn sorted_files(dir: &Path) -> XvcResult<Vec<PathBuf>> { +pub fn sorted_files(dir: &Path) -> Result<Vec<PathBuf>> { if dir.exists() { let mut files: Vec<PathBuf> = fs::read_dir(dir)? .filter_map(|e| match e { @@ -180,7 +202,7 @@ pub fn sorted_files(dir: &Path) -> XvcResult<Vec<PathBuf>> { /// This one returns the most recent timestamp named file. /// It gets files with [sorted_files] and returns the last one if there is one. /// If there are no files in a directory, this returns `Ok(None)`. -pub fn most_recent_file(dir: &Path) -> XvcResult<Option<PathBuf>> { +pub fn most_recent_file(dir: &Path) -> Result<Option<PathBuf>> { watch!(dir); if !dir.exists() { return Ok(None); @@ -233,23 +255,23 @@ mod tests { use xvc_logging::setup_logging; #[test] - fn test_init() -> XvcResult<()> { + fn test_init() -> Result<()> { let gen = init_generator()?; - assert_eq!(gen.current.load(Ordering::SeqCst), 1); - assert_eq!(gen.next_element(), XvcEntity(1)); - assert_eq!(gen.next_element(), XvcEntity(2)); + assert_eq!(gen.counter.load(Ordering::SeqCst), 1); + assert_eq!(gen.next_element().0, 1); + assert_eq!(gen.next_element().0, 2); let gen2 = init_generator(); assert!(matches!(gen2, Err(XvcError::CanInitializeOnlyOnce { ..
}))); Ok(()) } #[test] - fn test_load() -> XvcResult<()> { + fn test_load() -> Result<()> { setup_logging(Some(LevelFilter::Trace), None); let tempdir = TempDir::new("test-xvc-ecs")?; let gen_dir = tempdir.path().join("entity-gen"); fs::create_dir_all(&gen_dir)?; - let r: usize = rand::random(); + let r: u64 = rand::random(); let gen_file_1 = gen_dir.join(timestamp()); fs::write(&gen_file_1, format!("{}", r))?; sleep(Duration::from_millis(1)); @@ -259,13 +281,22 @@ mod tests { let gen_file_3 = gen_dir.join(timestamp()); fs::write(&gen_file_3, format!("{}", r + 2000))?; let gen = XvcEntityGenerator::load(&gen_dir)?; - assert_eq!(gen.current.load(Ordering::SeqCst), r + 2000); - assert_eq!(gen.next_element(), XvcEntity(r + 2000)); - assert_eq!(gen.next_element(), XvcEntity(r + 2001)); - assert_eq!(gen.next_element(), XvcEntity(r + 2002)); + assert_eq!(gen.counter.load(Ordering::SeqCst), r + 2000); + assert_eq!(gen.next_element().0, (r + 2000)); + assert_eq!(gen.next_element().0, (r + 2001)); + assert_eq!(gen.next_element().0, (r + 2002)); gen.save(&gen_dir)?; - let new_val = fs::read_to_string(most_recent_file(&gen_dir)?.unwrap())?.parse::<usize>()?; + let new_val = fs::read_to_string(most_recent_file(&gen_dir)?.unwrap())?.parse::<u64>()?; assert_eq!(new_val, r + 2003); Ok(()) } + + #[test] + fn test_from_to() -> Result<()> { + let e1 = XvcEntity(1, 2); + let u1: u128 = e1.into(); + let e2 = XvcEntity::from(u1); + assert_eq!(e1, e2); + Ok(()) + } } diff --git a/ecs/src/ecs/r11store.rs b/ecs/src/ecs/r11store.rs index dad448692..1cf7275fe 100644 --- a/ecs/src/ecs/r11store.rs +++ b/ecs/src/ecs/r11store.rs @@ -64,7 +64,7 @@ where /// ``` /// # use xvc_ecs::{R11Store, XvcEntity}; /// # let mut rs = R11Store::<String, String>::new(); - /// let entity: XvcEntity = 100usize.into(); + /// let entity: XvcEntity = (100u64, 200u64).into(); /// rs.insert(&entity, "left component".into(), "right component".to_string()); /// ``` @@ -77,7 +77,7 @@ where /// ``` /// # use xvc_ecs::{R11Store, XvcEntity}; /// #
let mut rs = R11Store::<String, String>::new(); - /// let entity: XvcEntity = 100.into(); + /// let entity: XvcEntity = (100, 200).into(); /// rs.insert(&entity, "left component".into(), "right component".into()); /// ``` pub fn right_to_left(&self, entity: &XvcEntity) -> Option<(&XvcEntity, &T)> { @@ -88,7 +88,7 @@ where /// ``` /// # use xvc_ecs::{R11Store, XvcEntity}; /// # let mut rs = R11Store::<String, String>::new(); - /// let entity: XvcEntity = 100.into(); + /// let entity: XvcEntity = (100, 200).into(); /// rs.insert(&entity, "left component".into(), "right component".into()); /// let t = rs.tuple(&entity); /// ``` @@ -181,7 +181,7 @@ mod test { #[test] fn test_insert() -> Result<()> { let mut rs = R11Store::<String, String>::new(); - let entity: XvcEntity = 100.into(); + let entity: XvcEntity = (100, 12830912380).into(); rs.insert(&entity, "left component".into(), "right component".into()); assert!(rs.left[&entity] == "left component"); assert!(rs.right[&entity] == "right component"); @@ -191,30 +191,30 @@ mod test { #[test] fn test_left_to_right() -> Result<()> { let mut rs = R11Store::<String, String>::new(); - let entity: XvcEntity = 100usize.into(); + let entity: XvcEntity = (100, 218021380921).into(); rs.insert( &entity, "left component".into(), "right component".to_string(), ); assert!(rs.left_to_right(&entity) == Some((&entity, &"right component".to_string()))); - assert!(rs.left_to_right(&(101usize.into())) == None); + assert!(rs.left_to_right(&(101, 921309218309).into()) == None); Ok(()) } #[test] fn test_right_to_left() -> Result<()> { let mut rs = R11Store::<String, String>::new(); - let entity: XvcEntity = 100usize.into(); + let entity: XvcEntity = (100, 128012389012).into(); rs.insert(&entity, "left component".into(), "right component".into()); assert!(rs.right_to_left(&entity) == Some((&entity, &"left component".to_string()))); - assert!(rs.right_to_left(&101usize.into()) == None); + assert!(rs.right_to_left(&(101, 8120938120931).into()) == None); Ok(()) } #[test] fn test_tuple() -> Result<()> { let mut rs =
R11Store::<String, String>::new(); - let entity: XvcEntity = 100usize.into(); + let entity: XvcEntity = (100, 123980123819203).into(); rs.insert(&entity, "left component".into(), "right component".into()); let t = rs.tuple(&entity); assert!(t.0.as_deref() == Some(&"left component".to_string())); diff --git a/ecs/src/ecs/r1nstore.rs b/ecs/src/ecs/r1nstore.rs index 6381083f4..afd5bff0d 100644 --- a/ecs/src/ecs/r1nstore.rs +++ b/ecs/src/ecs/r1nstore.rs @@ -109,7 +109,7 @@ where pub fn parent_of(&self, child_entity: &XvcEntity) -> Result<(&ChildEntity, &T)> { match self.child_parents.get(child_entity) { None => Err(Error::NoParentEntityFound { - entity: (*child_entity).into(), + entity: (*child_entity), }), Some(p_e) => { let (_, v) = diff --git a/ecs/src/ecs/vstore.rs b/ecs/src/ecs/vstore.rs index b97698a07..ca8a798e8 100644 --- a/ecs/src/ecs/vstore.rs +++ b/ecs/src/ecs/vstore.rs @@ -190,8 +190,8 @@ mod test { #[test] fn new() -> Result<()> { let mut store = VStore::<String>::new(); - store.insert(0.into(), "0".into()); - store.insert(1.into(), "1".into()); + store.insert((0, 12398012938).into(), "0".into()); + store.insert((1, 12398012938).into(), "1".into()); assert_eq!(store.len(), 2); assert_eq!(store.vec.pop().unwrap().1, "1".to_string()); diff --git a/ecs/src/ecs/xvcstore.rs b/ecs/src/ecs/xvcstore.rs index 7590beac0..01f9b08f5 100644 --- a/ecs/src/ecs/xvcstore.rs +++ b/ecs/src/ecs/xvcstore.rs @@ -213,7 +213,7 @@ where if let Some(v) = self.map.get(&e) { store.map.insert(e, v.clone()); } else { - Error::CannotFindKeyInStore { key: e.0 }.warn(); + Error::CannotFindKeyInStore { key: e.to_string() }.warn(); } } Ok(store) @@ -365,12 +365,12 @@ mod test { #[test] fn new() -> Result<()> { let mut store = XvcStore::<String>::new(); - store.insert(0.into(), "0".into()); - store.insert(1.into(), "1".into()); + store.insert((0, 123).into(), "0".into()); + store.insert((1, 123).into(), "1".into()); assert_eq!(store.len(), 2); - assert_eq!(*store.get(&XvcEntity(0)).unwrap(), String::from("0")); -
assert_eq!(*store.get(&XvcEntity(1)).unwrap(), String::from("1")); + assert_eq!(*store.get(&XvcEntity(0, 123)).unwrap(), String::from("0")); + assert_eq!(*store.get(&XvcEntity(1, 123)).unwrap(), String::from("1")); Ok(()) } @@ -381,9 +381,9 @@ mod test { let mut store = XvcStore::<String>::new(); - store.insert(0.into(), "0".into()); - store.insert(1.into(), "1".into()); - store.insert(2.into(), "2".into()); + store.insert((0, 123).into(), "0".into()); + store.insert((1, 123).into(), "1".into()); + store.insert((2, 123).into(), "2".into()); store.to_dir(&dir)?; diff --git a/ecs/src/error.rs b/ecs/src/error.rs index 69231a349..a6e615fa8 100644 --- a/ecs/src/error.rs +++ b/ecs/src/error.rs @@ -60,11 +60,11 @@ pub enum Error { #[error("Multiple keys for value found: {value}")] MultipleCorrespondingKeysFound { value: String }, #[error("Cannot find a related entity: {entity}")] - NoParentEntityFound { entity: usize }, + NoParentEntityFound { entity: XvcEntity }, #[error("More than one root entity found in an 1-N relation")] MoreThanOneParentFound { entity: usize }, #[error("Cannot find key in store: {key}")] - CannotFindKeyInStore { key: usize }, + CannotFindKeyInStore { key: String }, #[error("Internal Store Conversion Error")] StoreConversionError, #[error("Can initialize {object} only once")] diff --git a/file/src/common/compare.rs b/file/src/common/compare.rs index 695a663f0..9c8736395 100644 --- a/file/src/common/compare.rs +++ b/file/src/common/compare.rs @@ -584,12 +584,16 @@ pub fn diff_dir_content_digest( for xe in sorted_entities { let xvc_content_diff = xvc_content_diff .get(xe) - .ok_or(EcsError::CannotFindEntityInStore { entity: *xe })?; + .ok_or(EcsError::CannotFindKeyInStore { + key: xe.to_string(), + })?; match xvc_content_diff { Diff::Identical | Diff::Skipped => { - let content = stored_xvc_content_store - .get(xe) - .ok_or(xvc_ecs::error::Error::CannotFindEntityInStore { entity: *xe })?; + let content = stored_xvc_content_store.get(xe).ok_or( +
xvc_ecs::error::Error::CannotFindKeyInStore { + key: xe.to_string(), + }, + )?; content_digest_bytes.extend(content.0.expect("digest").digest); } Diff::RecordMissing { actual } => { @@ -601,7 +605,7 @@ pub fn diff_dir_content_digest( Diff::ActualMissing { .. } => { // This is to make sure the content digest is different when // all records are missing or their order has changed. - let entity_bytes: usize = (*xe).into(); + let entity_bytes: u128 = (*xe).into(); let mut entity_bytes_as_digest = Vec::from([0u8; DIGEST_LENGTH]); entity_bytes_as_digest.copy_from_slice(&entity_bytes.to_le_bytes()); content_digest_bytes.extend( diff --git a/logging/CHANGELOG.md b/logging/CHANGELOG.md new file mode 100644 index 000000000..356aaf55a --- /dev/null +++ b/logging/CHANGELOG.md @@ -0,0 +1,138 @@ +# Introduction + +This document is a change log that I write for the project, as I develop. It's a +tree and subtasks are marked with indentation. + +## v0.5.0 + +- Refactor XvcEntity to `(u64, u64)` + - Issue: + - PR: + - [x] `From` and `Into` + - [x] `From<(u64, u64)>` and `Into<(u64, u64)>` + - [x] Tests + - [x] Add tests for `From` and `Into` ecs/src/ecs/mod.rs + - [x] Fix doc tests that use `100usize` to create `XvcEntity` + - [x] Update the ECS documentation + - [x] Update arch/ecs.md + - [x] Search for any `XvcEntity` references that may be changed +- [x] `xvc-test-helper` binary is not produced at builds + - [x] Moved it from dev-dependencies to dependencies in workflow_tests/Cargo.toml + - [x] Still doesn't work 🛑 + - [x] We need binary dependencies in cargo: , + - [x] It's available in nightly: + - [x] Revert to dev-dependencies + - [x] `z_test_docs` fails immediately if no `xvc-test-helper` binary is found. + - [x] Run the tests without `-p workflow_tests` + - [x] Hypothesis: The reason the test helper binary is not produced is that we run only `workflow_tests` crate. + - [x] Looks this hypothesis is not correct. 
+ - [x] The best way seems to be adding + and building the binary before + the doc tests. + - Now builds the binary before running the doc tests. ✅ +- [x] Write pipelines code documentation + - [ ] + +## v0.4.2 + +- `xvc file carry-in` + - PR + - `xvc file list` debugging + - Fixed slicing bug ✅ + - Recursive option + - If not given all files including the ignored files will be reported. + - Ignored files will be reported with file type `I` + - Add `G` for as a file type for git-tracked files. + - `DX 224 2022-12-31 08:21:11 dir-0001/dir-0001 rcd \n` + - Fix `rcd` ✅ + - Count lines in the result + - I think it's better to write all of this as a doc test +- create a `xvc-test-helper create-directory-hierarchy` command. + - Add a main.rs to xvc-test-helper ✅ + - Add clap to parse CLI + - Add subcommands ✅ + - create directory tree + - random dir name --prefix str --seed u64 + - random temp dir --prefix str + - seeded temp dir --seed u64 + - create temp dir + - run in temp dir + - run in temp git dir + - create temp git dir + - generate random file filename size + - generate filled file filename size byte + - generate random text file filename num_lines + - Add to doc-tests + - added with `cargo_bin!` ✅ + - began to add `xvc-file-list.md` + - Open doc test results in a directory + - Use neovim for this + - It looks we need to update directory permissions in the cache too + - updated move_to_cache function + - fix recheck errors + - it looks recheck doesn't check whether the file is changed before trying to checkout + - do we use `--text-or-binary` option to update the file? + - removed the option from help text ✅ + - I think we need a `DEBUG` level in XvcOutput for otherwise irrelevant information + - Added debug option to XvcOutputLine + - Changed all noisy output to debug! ✅ + - fix `carry-in` errors + - updated outputs + - there seems to be a bug to update the stores + - add watches for several places. + - the bug was about missing configuration keys. 
+ - it must warn/panic when the keys are not there. + - all machinery is there, it must report error, but doesn't. + - there seems to be a bug in xvc list output about cached/workspace sizes + - yes, there was. fixed the summary. ✅ + - started moving `test_file_list.rs` to document test. + - `--recheck-as` option must be introduced instead of `--cache-type`. + - there is a bug in `track` when `--cache-type` is given. 🐛 + - pmm doesn't contain directory contents + - fixed ✅ + - the sorting for timestamp and size are not working + - fixed ✅ + - if a field is blank or None, it should print spaces. + - Done for size and timestamp ✅ + - Why the cache size is empty when they are not reported + - Fixed. Loads the rec content digests always now. ✅ + - We need more tests for other sorting options to increase coverage perhaps. + - removed older tests and added only the sorting test to xvc file list wf tests + - tests in ref md is larger than this file anyway. + - Listing only the changed. + - As a status command. + - Fix `xvc file hash` tests + - create directory tree needs an option to create random files or filled files + - update all uses ✅ + - modify test helper to have this option ✅ + - Fix `xvc file list` tests + - Fix counting and sorting tests ✅ + - Could we have file, line, function etc in panic! / error! macros? + - Modified and did this ✅ + - Fix `xvc file recheck parallel` tests + - There is a failing command, which one? + - It looks like a plain recheck after hardlink + - The target permissions should be removed + - The bug seems to be in `xvc file track` + - There is a gitignore bug + - Fixed it by using the targets directly + - The failure is in cleanup, about permissions. 
+ - Delete files and directories one by one + - Deleted by shell ✅ + - Fix `xvc root` + - `--debug` should only determine the xvc.log output + - changed output in `run_xvc` fn ✅ + - Fix `xvc pipeline export` tests + - There must be sorting in the output, as we changed the stores to HStore ✅ + - Fix `xvc pipeline import` tests + - The same changes, ordering of elements changed ✅ + - Fix `xvc pipeline run` tests + - The example repository again and again ✅ + - Fix `xvc storage generic fs` tests + - Where is the actual error? + - It was about removing the repos + - Fix `xvc storage local` tests ✅ + - Cache operations from storages should be done on temp dir and _move to cache_ must be used for all + - This is to keep permission operations correct + - I did this in the trait ✅ + - Modified all receive functions to return a temp dir ✅ diff --git a/pipeline/src/lib.rs b/pipeline/src/lib.rs index 79e21db39..e13cfd93d 100755 --- a/pipeline/src/lib.rs +++ b/pipeline/src/lib.rs @@ -1,8 +1,8 @@ //! Pipeline management commands and data structures //! //! This contains CLI structs for `xvc pipeline` subcommands, [`init`] function to -//! run during `xvc init` for pipeline related initialization, [`run`] function -//! to dispatch the options to subcommands. +//! run during `xvc init` for pipeline related initialization, [`cmd_pipeline`] +//! and [`handle_step_cli`] functions to dispatch the options to subcommands. 
#![warn(missing_docs)] #![forbid(unsafe_code)] pub mod error; diff --git a/pipeline/src/pipeline/mod.rs b/pipeline/src/pipeline/mod.rs index 14e0e4736..23e6459d3 100644 --- a/pipeline/src/pipeline/mod.rs +++ b/pipeline/src/pipeline/mod.rs @@ -15,6 +15,7 @@ use crate::deps::{dependencies_to_path, dependency_paths}; use crate::error::{Error, Result}; use crate::{XvcPipeline, XvcPipelineRunDir}; +use chrono::Utc; use crossbeam_channel::{Receiver, Sender}; use xvc_walker::notify::{make_watcher, PathEvent}; @@ -37,8 +38,7 @@ use xvc_core::{ TextOrBinary, XvcFileType, XvcMetadata, XvcPath, XvcPathMetadataMap, XvcRoot, }; -use xvc_ecs::{persist, HStore, R1NStore, XvcEntity, XvcStore}; -use xvc_logging::watch; +use xvc_ecs::{persist, HStore, R1NStore, XvcEntity}; use sp::ExitStatus; use subprocess as sp; @@ -386,7 +386,6 @@ pub fn the_grand_pipeline_loop(xvc_root: &XvcRoot, pipeline_name: String) -> Res step_timeout: &step_timeouts[step_e], pipeline_rundir: &pipeline_rundir, }; - let params_debug = params.clone(); let r_next_state = match step_s { XvcStepState::Begin(s) => s_begin(s, params), XvcStepState::NoNeedToRun(s) => s_no_need_to_run(s, params), @@ -600,7 +599,7 @@ fn s_checking_timestamps(s: &CheckingTimestampsState, params: StateParams) -> Re }); let min_out_ts = out_paths.fold( - Some(SystemTime::from(chrono::MAX_DATETIME)), + Some((chrono::DateTime::<Utc>::MAX_UTC).into()), |opt_st, (path, md)| match md { None => { Error::PathNotFoundInPathMetadataMap { diff --git a/pipeline/src/pipeline/outs.rs b/pipeline/src/pipeline/outs.rs index 765bb579c..233eaf07c 100644 --- a/pipeline/src/pipeline/outs.rs +++ b/pipeline/src/pipeline/outs.rs @@ -11,15 +11,23 @@ use crate::error::{Error, Result}; use serde::{Deserialize, Serialize}; +/// Possible formats for recognized metrics formats. +/// Metrics files are where the pipeline writes its output in a structured format.
+/// We can read these files and use them to generate reports #[derive(Debug, Clone, Copy, Eq, PartialEq, Serialize, Deserialize, Ord, PartialOrd)] pub enum XvcMetricsFormat { + /// Unknown format, we don't know how to read it Unknown, + /// Comma,separated,values CSV, + /// JavaScript Object Notation JSON, + /// Tab separated values TSV, } impl XvcMetricsFormat { + /// Decide the format from the extension of the given path pub fn from_path(path: &Path) -> Self { match path .extension() @@ -36,23 +44,36 @@ impl XvcMetricsFormat { } } +/// Possible outputs for the pipeline. +/// +/// These outputs can be defined with `xvc pipeline output` command. #[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize, Display, PartialOrd, Ord)] pub enum XvcOutput { + /// A (possibly binary) file. File { + /// Path to the file path: XvcPath, }, + /// A textual metrics file with a known [`XvcMetricsFormat`] Metric { + /// Path to the file path: XvcPath, + /// Format of the file format: XvcMetricsFormat, }, + /// An image file, like a plot or generated file Image { + /// Path to the file path: XvcPath, + // TODO: Should we add a `format` field here? }, + // TODO: We can add `Model` here. } persist!(XvcOutput, "xvc-output"); impl From<XvcOutput> for XvcPath { + /// Return the path of a given output fn from(out: XvcOutput) -> XvcPath { match out { XvcOutput::File { path } => path, @@ -63,6 +84,7 @@ impl From<XvcOutput> for XvcPath { } impl From<&XvcOutput> for XvcPath { + /// Return the path of a given output fn from(out: &XvcOutput) -> XvcPath { match out { XvcOutput::File { path } => path.clone(), @@ -73,6 +95,7 @@ impl From<&XvcOutput> for XvcPath { } impl XvcOutput { + /// Used to check whether pipeline / step output is changed (or missing).
pub fn fs_metadata(&self, xvc_root: &XvcRoot) -> Result { let xvc_path: XvcPath = self.into(); let abs_path = xvc_path.to_absolute_path(xvc_root); diff --git a/pipeline/src/pipeline/schema.rs b/pipeline/src/pipeline/schema.rs index 11072ec56..383eb0ba9 100644 --- a/pipeline/src/pipeline/schema.rs +++ b/pipeline/src/pipeline/schema.rs @@ -35,19 +35,36 @@ impl XvcSchemaSerializationFormat { } } +/// Defines the user editable pipeline schema used in `xvc pipeline export` and +/// `xvc pipeline import` commands. #[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)] pub struct XvcPipelineSchema { + /// Version of the schema, currently 1. pub version: i32, + /// Name of the pipeline. + /// Note that this can also be specified in the CLI with the `--name` flag, + /// which supersedes this value. pub name: String, + /// Path to the pipeline root directory. pub workdir: XvcPath, + /// List of steps in the pipeline. pub steps: Vec, } +/// User editable pipeline step schema used in `xvc pipeline export` and `xvc +/// pipeline import` commands. #[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize)] pub struct XvcStepSchema { + /// Name of the step. pub name: String, + /// Command to run in the step. pub command: String, + /// When should the step be considered changed? pub invalidate: XvcStepInvalidate, + /// List of dependencies of the step. + /// These do not require a separate schema. pub dependencies: Vec, + /// List of outputs of the step. + /// These do not require a separate schema. pub outputs: Vec, } diff --git a/pipeline/src/pipeline/step.rs b/pipeline/src/pipeline/step.rs index e706cf3fe..13ef40244 100644 --- a/pipeline/src/pipeline/step.rs +++ b/pipeline/src/pipeline/step.rs @@ -7,14 +7,17 @@ use serde::{Deserialize, Serialize}; use xvc_core::XvcRoot; use xvc_ecs::{persist, XvcEntity}; +/// A step (stage) in a pipeline.
#[derive(Debug, Clone, Eq, PartialEq, Serialize, Deserialize, Ord, PartialOrd)] pub struct XvcStep { + /// Name of the step pub name: String, } persist!(XvcStep, "xvc-step"); impl XvcStep { + /// Search for a step with the given name in the given pipeline. pub fn from_name( xvc_root: &XvcRoot, pipeline_e: &XvcEntity, @@ -33,7 +36,8 @@ impl XvcStep { }), } } - #[allow(dead_code)] + + /// Search for a step with the given entity in the given pipeline. pub fn from_entity( xvc_root: &XvcRoot, pipeline_e: &XvcEntity, @@ -50,6 +54,7 @@ impl XvcStep { } } +// TODO: Link to the Documentation after it's written: https://github.com/iesahin/xvc/issues/202 state_machine! { XvcStepState { InitialStates { Begin } diff --git a/test_helper/Cargo.toml b/test_helper/Cargo.toml index d926f4a14..9fe1978bd 100644 --- a/test_helper/Cargo.toml +++ b/test_helper/Cargo.toml @@ -12,7 +12,6 @@ keywords = ["file", "devops", "git", "versioning", "mlops"] [lib] name = "xvc_test_helper" -type = ["rlib"] [[bin]] name = "xvc-test-helper" diff --git a/workflow_tests/Cargo.toml b/workflow_tests/Cargo.toml index 1c5e38436..6a0852ae5 100644 --- a/workflow_tests/Cargo.toml +++ b/workflow_tests/Cargo.toml @@ -65,9 +65,9 @@ test-generic-rsync = [] [dev-dependencies] proptest = "^1.0" test-case = "^2.2" -xvc-test-helper = {path = "../test_helper/"} globset = "^0.4" - +escargot = "^0.5" +xvc-test-helper = { version = "0.4.2-alpha.8", path = "../test_helper" } shellfn = "^0.1" jwalk = "^0.6" anyhow = "^1.0" diff --git a/workflow_tests/tests/test_pipeline_import.rs b/workflow_tests/tests/test_pipeline_import.rs index 79d38c3b1..be0e171eb 100644 --- a/workflow_tests/tests/test_pipeline_import.rs +++ b/workflow_tests/tests/test_pipeline_import.rs @@ -137,7 +137,7 @@ fn test_pipeline_import() -> Result<()> { let command = all_commands .left_to_right(step_e) .ok_or(xvc_ecs::error::Error::CannotFindKeyInStore { - key: (*step_e).into(), + key: step_e.to_string(), })? 
.1; watch!(command); diff --git a/workflow_tests/tests/z_test_docs.rs b/workflow_tests/tests/z_test_docs.rs index 25f3452ff..d44482be7 100644 --- a/workflow_tests/tests/z_test_docs.rs +++ b/workflow_tests/tests/z_test_docs.rs @@ -118,7 +118,16 @@ fn link_to_docs() -> Result<()> { fn z_doc_tests() -> Result<()> { link_to_docs()?; - let path_to_xvc_test_helper = cargo_bin!("xvc").parent().unwrap().join("xvc-test-helper"); + let xvc_th = escargot::CargoBuild::new() + .bin("xvc-test-helper") + .current_release() + .current_target() + .manifest_path("../test_helper/Cargo.toml") + .run() + .map_err(|e| anyhow!("Failed to build xvc-test-helper: {e:?}"))?; + + let path_to_xvc_test_helper = xvc_th.path().to_path_buf(); + assert!(path_to_xvc_test_helper.exists()); trycmd::TestCases::new() .register_bin("xvc-test-helper", &path_to_xvc_test_helper)