Feedback needed #48

Closed
trustmaster opened this issue Apr 14, 2018 · 9 comments

@trustmaster
Owner

trustmaster commented Apr 14, 2018

Hello fellow Gophers!

I apologise as this project slipped out of my scope for several years.

I still have some ideas and plans for maintaining it, but I need feedback from people who have actually tried it on how it is going to be used. So, I would really appreciate your answers to the following questions in this thread or in any other form (e.g. via email, please find the address in my profile).

Questions:

  1. What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)
  2. Did you use it for personal, business, or academic purposes?
  3. Do you prefer working with graphs in visual or text form?
  4. Which visual tools have you used and which ones do you prefer?
  5. Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?
  6. Do you prefer processes to stay resident in memory or started and stopped on demand?
  7. Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?
  8. Please tell me what you liked about GoFlow and what you would like to be added or changed.

Why this is important

As you might have noticed, this codebase is a bit dated. In fact, it was written in 2011 and hasn't changed much since. My own views on how an FBP library should work have changed over time. So, I think this library deserves a rewrite.

My views may be similar to or different from yours, and I'm not building this library only for myself. That's why feedback is appreciated so much.

Thank you for participating!

@trustmaster trustmaster self-assigned this Apr 14, 2018
@trustmaster
Owner Author

@abferm @sascha-andres @lrgar @manadart @mtojek @btittelbach @davidkbainbridge @kortschak @samuell @josi-asae @seanward @phiros @lovromazgon Your feedback as a contributor is especially appreciated!

@manadart

manadart commented Apr 16, 2018

First of all, a big thank-you for contributing this to the community.

  1. We used this in a data-processing pipeline, receiving data parsed by a Python application. It did (does) validation, cache maintenance, matching against multiple other databases and resulting updates.

  2. Business.

  3. We did not use visual tools for working with graphs.

  4. (See above)

  5. We almost always used OnPortName() (see 7. below for more info).

  6. For our specific case, we wanted 24/7 uptime - no specific case for stopping/starting on demand. For maintenance, we closed the input channel and waited for the last data to be processed. Our cache maintained enough state to spin back up in good order.

  7. Our use case was very large volumes of data, so we sometimes used synchronous mode or a worker pool type component just to limit the data coming through, for observation.

  8. GoFlow made it easy for us to write logic steps and control flow decisions as simple components and then have them wired up in a single place, so it is nice in the modular sense, and also in the system overview sense too.
    It is also very nice to delegate the concurrency/async mechanics to the library and just think in terms of steps, decisions and an input channel.
    Lastly, performance. We had powerful servers to run this on, so it just scales out (per machine) based on hardware limitations. One might choose something different (like https://github.com/AsynkronIT/protoactor-go) for a distributed pipeline, but for single-machine, this library is very convenient.
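The throttling and shutdown approach described in items 6 and 7 can be sketched with plain Go channels. This is only an illustration under stated assumptions: it does not use GoFlow's API, and `runPool` and `sumSquares` are hypothetical names. A fixed number of workers bounds how much data is in flight, and closing the input channel lets the pool drain the remaining items before the output closes.

```go
package main

import (
	"fmt"
	"sync"
)

// runPool starts a fixed number of workers that read from in and apply fn.
// The returned channel is closed once in has been closed and every
// remaining item has been processed.
// Hypothetical helper for illustration; it is not part of GoFlow.
func runPool(in <-chan int, workers int, fn func(int) int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	wg.Add(workers)
	for i := 0; i < workers; i++ {
		go func() {
			defer wg.Done()
			for v := range in { // ends when in is closed and drained
				out <- fn(v)
			}
		}()
	}
	go func() {
		wg.Wait() // all workers have finished the last items
		close(out)
	}()
	return out
}

// sumSquares feeds values through a pool and adds up the results.
func sumSquares(values []int, workers int) int {
	in := make(chan int)
	out := runPool(in, workers, func(v int) int { return v * v })
	go func() {
		for _, v := range values {
			in <- v
		}
		close(in) // stop accepting input and let the pool drain
	}()
	total := 0
	for r := range out {
		total += r
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2)) // prints 30
}
```

The number of workers is the only throttle here; a buffered input channel would be one way to absorb short bursts.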

I have actually left the company where I worked on this, but it was with @tonygallagher and @lrgar. They might have more to add. I seem to recall an issue connecting multiple components to an exit port. We got around this by adding an aggregation component before the graph exit.
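The aggregation workaround mentioned above is essentially a fan-in step. A minimal sketch with plain channels follows; it is not GoFlow code, and `merge` and `emit` are hypothetical helpers. Several upstream outputs are forwarded into a single channel, which can then be connected to the one exit port.

```go
package main

import (
	"fmt"
	"sync"
)

// merge forwards everything from the given inputs to one output channel
// and closes the output after all inputs have been closed.
func merge(inputs ...<-chan string) <-chan string {
	out := make(chan string)
	var wg sync.WaitGroup
	wg.Add(len(inputs))
	for _, in := range inputs {
		go func(c <-chan string) {
			defer wg.Done()
			for v := range c {
				out <- v
			}
		}(in)
	}
	go func() {
		wg.Wait()
		close(out)
	}()
	return out
}

// emit returns a channel that yields the given values and then closes.
func emit(values ...string) <-chan string {
	c := make(chan string)
	go func() {
		defer close(c)
		for _, v := range values {
			c <- v
		}
	}()
	return c
}

func main() {
	exit := merge(emit("a", "b"), emit("c"))
	count := 0
	for range exit {
		count++
	}
	fmt.Println(count) // prints 3
}
```

The order in which items from different inputs arrive at the exit is not defined, so downstream logic should not rely on it.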

@samuell

samuell commented Apr 16, 2018

Hi @trustmaster, I want to also thank you so much for contributing this!

I've had a lot of fun playing with GoFlow for bioinformatics use cases, and I also learned a lot about both Go and FBP by studying it.

To answer the questions one at a time:

  1. What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)

I did some experimentation with bioinformatics components using it (like this). I eventually got worried about the performance of reflection though, and have since explored a way to build FBP-like programs using only plain channels (this is ongoing work in my scipipe and flowbase libraries).

  2. Did you use it for personal, business, or academic purposes?

Academic.

  3. Do you prefer working with graphs in visual or text form?

Prefer to work mainly in text form. I feel graphs can be a very useful addition for sketching at design time, and for presentation though.

  4. Which visual tools have you used and which ones do you prefer?

I've used JPM's fbpdraw a bit for presentation purposes. I like that it is simple and just works. Haven't had time / patience for setting up any more complicated tools.

  5. Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?

I prefer a central Loop(), as bioinformatics tools often need to gather data from multiple inputs for each operation.
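A minimal sketch of what such a central loop could look like with plain channels: each iteration gathers one value from every input before producing a result. The `Adder` component, its port names, and the helpers are hypothetical and only meant to illustrate the pattern, not GoFlow's actual component interface.

```go
package main

import "fmt"

// Adder is a hypothetical component with two inputs and one output.
type Adder struct {
	A, B <-chan int
	Sum  chan<- int
}

// Loop reads one value from each input per iteration and emits their sum.
// It stops and closes Sum as soon as either input is closed.
func (p *Adder) Loop() {
	defer close(p.Sum)
	for {
		a, ok := <-p.A
		if !ok {
			return
		}
		b, ok := <-p.B
		if !ok {
			return
		}
		p.Sum <- a + b
	}
}

// feed returns a channel that yields the given values and then closes.
func feed(values ...int) <-chan int {
	c := make(chan int)
	go func() {
		defer close(c)
		for _, v := range values {
			c <- v
		}
	}()
	return c
}

// runAdder wires the component to two inputs and collects its output.
func runAdder(as, bs []int) []int {
	sum := make(chan int)
	p := &Adder{A: feed(as...), B: feed(bs...), Sum: sum}
	go p.Loop()
	var got []int
	for v := range sum {
		got = append(got, v)
	}
	return got
}

func main() {
	fmt.Println(runAdder([]int{1, 2, 3}, []int{10, 20, 30})) // prints [11 22 33]
}
```

With per-port handlers, the same behaviour would need extra state to remember the value from one port until its partner arrives on the other, which is the bookkeeping a central loop avoids.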

  6. Do you prefer processes to stay resident in memory or started and stopped on demand?

In our work, pipeline runs have had a clear start and finish time, so on-demand stop and start has not been something we saw a need for.

  7. Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?

No.

  8. Please tell me what you liked about GoFlow and what you would like to be added or changed.

I like that it is Go code, as that enables re-use of Go tooling. I also found the API natural and easy to understand and work with.

I'm worried about the performance hit from reflection for data-intensive pipelines, though. If there is any way to avoid reflection on each data read, e.g. by using reflection only to set up channels, which are then used for the data communication, that would be great.

Keep up the great work!

@trustmaster
Owner Author

@manadart @samuell thank you for your feedback! After collecting some more responses, I'm going to sum them up and make a proposal for a new version of GoFlow.

@samuell regarding your concern about the use of reflection, in the latest version it's mostly used to wire up the channels. The only thing that is used on each data read is passing the arbitrary data as reflect.Value, which is done to allow handler functions to have precise data types in their signature. The alternative is using interface{} as input in all handler functions, which is underneath very similar to how reflection works. On the other hand, I'm currently favouring more bare-bones components which decide how to read from their channels themselves (e.g. this).
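To make the distinction concrete, here is a small sketch of reflection being used only at wiring time, with data then flowing over ordinary typed channels. It is an illustration under stated assumptions and not GoFlow's actual wiring code: `setPort`, `Doubler`, and `runDoubler` are hypothetical names.

```go
package main

import (
	"fmt"
	"reflect"
)

// Doubler is a hypothetical bare-bones component that reads its own channels.
type Doubler struct {
	In  <-chan int
	Out chan<- int
}

// Loop uses plain typed channel operations; no reflection per item.
func (d *Doubler) Loop() {
	defer close(d.Out)
	for v := range d.In {
		d.Out <- v * 2
	}
}

// setPort assigns channel ch to the exported field named port on the
// struct that proc points to. Reflection happens once, at wiring time.
func setPort(proc interface{}, port string, ch interface{}) error {
	v := reflect.ValueOf(proc)
	if v.Kind() != reflect.Ptr || v.Elem().Kind() != reflect.Struct {
		return fmt.Errorf("proc must be a pointer to a struct")
	}
	f := v.Elem().FieldByName(port)
	if !f.IsValid() || !f.CanSet() {
		return fmt.Errorf("no settable port %q", port)
	}
	c := reflect.ValueOf(ch)
	if c.Kind() != reflect.Chan || !c.Type().AssignableTo(f.Type()) {
		return fmt.Errorf("value does not fit port %q of type %s", port, f.Type())
	}
	f.Set(c)
	return nil
}

// runDoubler wires a Doubler by port name and pushes values through it.
func runDoubler(values []int) ([]int, error) {
	in := make(chan int)
	out := make(chan int)
	d := &Doubler{}
	if err := setPort(d, "In", in); err != nil {
		return nil, err
	}
	if err := setPort(d, "Out", out); err != nil {
		return nil, err
	}
	go d.Loop()
	go func() {
		for _, v := range values {
			in <- v
		}
		close(in)
	}()
	var got []int
	for v := range out {
		got = append(got, v)
	}
	return got, nil
}

func main() {
	got, err := runDoubler([]int{1, 2, 3})
	if err != nil {
		fmt.Println("wiring failed:", err)
		return
	}
	fmt.Println(got) // prints [2 4 6]
}
```

The trade-off is that each component has to manage its own reads, but the per-item cost is just a channel operation.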

@samuell

samuell commented Apr 17, 2018

> @samuell regarding your concern about the use of reflection, in the latest version it's mostly used to wire up the channels. The only thing that is used on each data read is passing the arbitrary data as reflect.Value, which is done to allow handler functions to have precise data types in their signature. The alternative is using interface{} as input in all handler functions, which is underneath very similar to how reflection works. On the other hand, I'm currently favouring more bare-bones components which decide how to read from their channels themselves (e.g. this).

Ah, interesting, I should have another deep look at the code!

@erdelmaero

First of all: Great work! The package is a lot of fun to use!

What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)
IoT

Did you use it for personal, business, or academic purposes?
I want to use it for business.

Do you prefer working with graphs in visual or text form?
I prefer working with visual graphs.

Which visual tools have you used and which ones do you prefer?
Till now, I've only used text form.

Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?
I prefer handler functions.

Do you prefer processes to stay resident in memory or started and stopped on demand?
Resident in memory.

Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?
I have only tested the library so far, but I will try these modes soon.

Please tell me what you liked about GoFlow and what you would like to be added or changed.
I really like the way flow-based programs are structured, and I would really like to see this package maintained in the future, so we can start using it in production!

@roscopecoltran

Hi guys,

For my part, I would use it mostly for an advanced ETL/node-based pipeline. It would get lots of traction with DevOps, as it could make it easier to aggregate data (APIs, logs) or to process data for import with dynamic/composable pipelines, chained by event triggers or conditions. It would also be awesome for building smart/refined datasets for AI and Deep Learning.

Features

  • Node-based tasks
  • Data aggregation (APIs Aggregator, Gateway)
  • Ready to use units/builtin plugins/go plugins (*.so)/rpc/http plugins
  • Easy to define new units (template based with generator)
  • Remote build trigger (MQ, Events)
  • Scheduled builds/tasks
  • Real-time build logs
  • Elegant user interface (noflo-ui or node-red)
  • Responsive UI
  • Docker based

Example by video

Pipeline Video
ScreenShot
Gateway


Cheers,

@trustmaster trustmaster mentioned this issue Oct 31, 2018
3 tasks
@dahvid

dahvid commented Sep 3, 2020

Hi Trustmaster,
I just started a project in Go, so I was looking for what had been done already in terms of DataFlow. Here are my answers to your questions, clearly two years later than the others!

What kind of application did you use GoFlow for? (E.g. bioinformatics, ETL, web backend, IoT, etc.)

At this point it would be for running a reactive systemd daemon that schedules and alters resource allocations for health-care applications sharing resources on a single host. I also have a similar product, not yet open-sourced, called "coolflow", which is used in-house for AI applications. It is written in Python, so it is not suitable for real-time applications.

Did you use it for personal, business, or academic purposes?

This would be for business.

Do you prefer working with graphs in visual or text form?

Text form. In previous projects I've used graph manipulation and generation to prepare complicated graphs for final execution. As such, having graphs in Python format was a big win, even though the execution was in C++/MPI or CUDA.

Do you prefer a Component to have main Loop(), or do you prefer setting up handler functions on each port (OnPortName())?

I prefer a main loop with AND semantics, that is, a component does not Process() until data exists on all input channels. I realize that OR semantics for components is more general, so I think creating AND components out of OR components should be easy.

Do you prefer processes to stay resident in memory or started and stopped on demand?

Resident in memory, because the processes I use often call into GPUs, so they need to store state between executions.

Have you ever used State Locks, Synchronous Mode, Worker Pool, or other advanced features of GoFlow?

I'm a newbie, but I'm sure I would end up using all of these.

Please tell me what you liked about GoFlow and what you would like to be added or changed.

**One thing that breaks the graph paradigm is error handling. If a component enters an error state, it should be able to send a message, and the graph should stop processing in a known state. Unfortunately, in pure graph semantics this means every component has a connection to an "error" component.**
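One possible way around wiring every component to an "error" component is to give the graph a shared error channel and a cancellation signal that sit outside the data edges. The sketch below uses plain channels and the standard `context` package; it is not an existing GoFlow feature, and `validate`, `runGraph`, and `errNegative` are hypothetical names.

```go
package main

import (
	"context"
	"errors"
	"fmt"
)

var errNegative = errors.New("negative input")

// validate forwards non-negative values. On a negative value it reports
// an error on the shared channel and stops, closing its output.
func validate(ctx context.Context, in <-chan int, out chan<- int, errs chan<- error) {
	defer close(out)
	for v := range in {
		if v < 0 {
			errs <- fmt.Errorf("validate: %w: %d", errNegative, v)
			return
		}
		select {
		case out <- v:
		case <-ctx.Done():
			return
		}
	}
}

// runGraph returns the values that passed through before the first error,
// together with that error (or nil if the input was processed completely).
func runGraph(values []int) ([]int, error) {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel() // releases the source if the graph stops early

	in := make(chan int)
	out := make(chan int)
	errs := make(chan error, 1) // buffered so reporting never blocks

	go func() { // source component
		defer close(in)
		for _, v := range values {
			select {
			case in <- v:
			case <-ctx.Done():
				return
			}
		}
	}()
	go validate(ctx, in, out, errs)

	var got []int
	for v := range out {
		got = append(got, v)
	}
	select {
	case err := <-errs:
		return got, err
	default:
		return got, nil
	}
}

func main() {
	got, err := runGraph([]int{1, 2, -3, 4})
	fmt.Println(got, err) // prints [1 2] validate: negative input: -3
}
```

Here the "known state" is simply the list of values that were fully processed before the failure; a larger graph with several components would need an error buffer sized for all of them, or a non-blocking send.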

@dahvid

dahvid commented Sep 3, 2020

One other thing I forgot: a plug-in architecture where pre-compiled components can be loaded along with graph definitions.
