Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Drop smartos support? #43457

Closed
bnoordhuis opened this issue Jun 17, 2022 · 35 comments
Closed

Drop smartos support? #43457

bnoordhuis opened this issue Jun 17, 2022 · 35 comments
Labels
build Issues and PRs related to build files or the CI. discuss Issues opened for discussions and feedbacks. smartos Issues and PRs related to the SmartOS platform.

Comments

@bnoordhuis
Copy link
Member

bnoordhuis commented Jun 17, 2022

Suggestion: just drop it?

@bnoordhuis bnoordhuis added build Issues and PRs related to build files or the CI. smartos Issues and PRs related to the SmartOS platform. labels Jun 17, 2022
@richardlau
Copy link
Member

Suggestion: just drop it?

cc @nodejs/platform-smartos @bahamat

@sxa
Copy link
Member

sxa commented Jun 17, 2022

Can we check the download numbers on it? It could be moved to unofficial status, although we cross-compile most of them so may require extra effort. I would imagine that even though libuv formally dropped support it would still be kept working by virtue of them still supporting SunOS, as you say it should really have someone who would be willing to maintain it ...

@richardlau
Copy link
Member

Can we check the download numbers on it? It could be moved to unofficial status, although we cross-compile most of them so may require extra effort. I would imagine that even though libuv formally dropped support it would still be kept working by virtue of them still supporting SunOS, as you say it should really have someone who would be willing to maintain it ...

If anyone wants to crunch numbers: https://storage.googleapis.com/access-logs-summaries-nodejs/index.html

@sxa
Copy link
Member

sxa commented Jun 17, 2022

If anyone wants to crunch numbers: https://storage.googleapis.com/access-logs-summaries-nodejs/index.html

Broadly comparable to AIX then ;-)

@bahamat
Copy link

bahamat commented Jun 17, 2022

Hi, I’m the maintainer of a very large project using Node on SmartOS. Our codebase consists of over 500 repos, most of which are node.js. We have thousands of users. Our customers provide services that are used by millions of end users. All depending on node.js running on SmartOS. So the premise of “probably 0 users” completely false.

As part of this project, I maintain several node modules that are direct or transitive dependencies on millions of installations, all of which are developed on SmartOS first.

We provide infra to the Node.js project for build machines and various other services, for both SmartOS as well as other operating systems.

Despite this, there seems to be a concerted effort from some people to kill Node.js on SmartOS/illumos. We’ve seen reduced collaboration over time from the node project with regard to maintaining and keeping support for SmartOS, despite our efforts. After node stopped producing binaries for SmartOS, we’ve been building our own, and needed to maintain a large set of patches to ensure that happens properly.

We would gladly work closer with the node community and have proper support upstream. We currently have 21 patches that we apply for builds of node 16 (we haven’t yet had time to port these forward to node 18). We would gladly contribute all of these patches, but we’ve seen little interest in them being accepted.

As mentioned by @bnoordhuis, Triton (and all of its code, customers, and users) now has a new home, with new management, and new personnel (now with MNX, no longer with Joyent). Our project is not dead, we are actually seeing renewed interest, and MNX is investing heavily in SmartOS and Triton.

Instead of dropping support for SmartOS, I would really like this to be an opportunity to let the past be the past, and treat this a call for working closer together. We’re still working on some things related to the transition of the code base from Joyent to MNX, but within the next couple of months we’ll be fully standing on our own and would really like to take a more active role in maintaining the port and providing support for the platform.

@sxa
Copy link
Member

sxa commented Jun 17, 2022

That sounds great. I'm a little surprised that you've seen resistance to your patches being accepted (I don't think I've seen any of those myself) but I'd hope we'd be willing to take patches that didn't cause any problems elsewhere. I certainly don't have any negative impression of SmartOS/illumos (I'm a long term Solaris fan!) and it would be great to know that someone was willing to continue to contribute to the main codebase and keep the support for it alive and functioning in there, so I'm personally 👍🏻 on working closer together.

Has the lack of support at the libuv project been problematic for you at all?

@bahamat
Copy link

bahamat commented Jun 17, 2022

@sxa A lot of that happened long before I got involved with the project and took over maintainership, to the point where I wouldn't even be able to cite specific examples. Which is another reason, IMO to just let the past be the past, and just focus on going forward.

It may take a little bit of time, but we'll start working towards organizing what we've got and figuring out what to upstream. In the mean time, if there are any pressing needs or anything particularly broken with SmartOS, please send it my way, and we'll see what we can do about addressing those. You can also hit me up on the OpenJS Foundation Slack, if you have any questions or need to discuss something where a github issue isn't quite the right venue.

As far as libuv goes, I didn't know they had dropped support. We've been producing our own builds to go into Triton and SmartOS, essentially forever (which is one reason why there are low download numbers). That's something we'll get to, and hopefully nothing will have broken too badly there. But we'd also like to do what we can to ensure that libuv builds and runs on SmartOS as well.

@sxa
Copy link
Member

sxa commented Jun 17, 2022

Sounds great! It may be worth engaging with libuv particularly in light of your interest in keeping such a critical component of node working, since if there are any platform specific issues in the libuv implementation we'd generally want them to be upstreamed.

Do you run the node test suite regularly on your builds? I know there has been some mention of particular flakiness in the smartos CI jobs and it would be good to understand if you're seeing problems too or if some of it might be specific to our environment and therefore resolvable.

@cjihrig
Copy link
Contributor

cjihrig commented Jun 17, 2022

@bahamat libuv does have an illumos team. Would you (or anyone else) like to be added to it?

@bahamat
Copy link

bahamat commented Jun 17, 2022

Yeah, I’ll be evaluating the situation with libuv in the near future. We currently have 9 patches for libuv.

I’m not sure about the test suite, but I’ll look into that. Again, it’ll take some ramp up time for us, but we’ll start working on evaluating the test suite, and seeing how it goes both with and without patches, and getting forward ported to latest.

@cjihrig Yes, I think that would probably help. This also brings up a good point. In most cases, SmartOS support doesn’t need to be different from illumos support. If libuv (and node for that matter) simply want to stop treating SmartOS and illumos differently, and just support illumos as a whole, that probably wouldn’t be an issue from our side. In any case, I’d like to have a closer relationship, so everything else I said still stands.

@bnoordhuis
Copy link
Member Author

Data point: in the last 4 or 5 years libuv received more patches from traditional Solaris users (!) than smartos and illumos combined. Hell, I personally probably fixed more bugs than the whole of the illumos community together.

The commit history for node is harder to pry apart but it doesn't look thriving either. If no one's maintaining the port and CI flakiness adds friction for contributors, it should go.

@bahamat
Copy link

bahamat commented Jun 20, 2022

I don't think that should be that surprising.

We address bugs that are illumos specific. As I mentioned, I have a code base that consists of over 500 repos. I have a lot more to do than fix bugs in libuv that will almost certainly be addressed by the libuv community. Likewise, we (Triton, specifically) don't regularly contribute to nginx, haproxy, or postgres, all of which we heavily depend on.

I don't think anybody here is expecting the illumos community to become core contributors to node.js, libuv, or any other project. Those communities are more than capable of addressing general bugs. But where there are edge cases that need addressing on our platform specifically, we're more than happy to directly contribute.

As far as each point in your original comment:

This is a mistaken assumption. SmartOS isn't dead. It just has new owners (with renewed investment, I might add).

  • BUILDING.md claims it's a tier 2 platform but I'm unaware of anyone actively maintaining the port

I've volunteered MNX to actively help here.

  • CI buildbots are pretty flaky

Again, something we (MNX) intend to address.

  • port has probably 0 users

Again, a mistaken assumption. We make our own builds for binary deployment in our own products, and node.js is included in our packaging system for SmartOS. Which means that the vast majority of downloads for SmartOS will be our own builds from our own servers. The only time someone needs to use a build not produced by us is when we don't yet have a version available.

Given the above, I'd like to move to close this issue, unless there's something more that needs to be discussed.

@bnoordhuis
Copy link
Member Author

I believe AIX (the other tier 2 platform) has 1-2 FTE working on it but .5 FTE is probably the bare minimum. Is that something MNX can commit to?

@bahamat
Copy link

bahamat commented Jun 22, 2022

We are committed to making sure that it gets the attention it needs. If there are currently specific issues that need to be addressed, please send them my way.

@bnoordhuis
Copy link
Member Author

bnoordhuis commented Jun 22, 2022

You're not saying yes so I'll take that as a no.

What we can do is keep building but no longer run the test suite on smartos. That keeps the maintenance work for you down (because 80% is debugging flaky tests) and removes the friction of flaky CI runs for contributors.

@bahamat
Copy link

bahamat commented Jun 22, 2022

I just said that we are. I don't understand why you would take that to infer no.

If you need specific details, not that it should matter, we have an FTE starting soon who will be working on this.

@F3n67u
Copy link
Member

F3n67u commented Jun 23, 2022

If you need specific details, not that it should matter, we have an FTE starting soon who will be working on this.

@bahamat Currently, there are a lot of flaky tests timeout on smartos. See nodejs/reliability#308. Your side will handle those flaky test and make CI green on smartos again, right?

@bahamat
Copy link

bahamat commented Jun 23, 2022

@F3n67u I'll start looking at those.

@F3n67u
Copy link
Member

F3n67u commented Jun 23, 2022

@F3n67u I'll start looking at those.

Great! Thank you very much.

@F3n67u
Copy link
Member

F3n67u commented Jun 23, 2022

@bahamat FYI. I paste the wrong issue, the issue to reliability should be nodejs/reliability#308.

@bahamat
Copy link

bahamat commented Jun 23, 2022

Ok.

@bnoordhuis
Copy link
Member Author

the issue to reliability should be nodejs/reliability#308

39 of the 78 flakes - 50%! - are on the two smartos machines. Wow, just wow.

How does everyone feel about disabling the test suite on smartos for now? There's zero chance of getting everything fixed in the short term, even with a FTE working on it. This is a monumental task.

@F3n67u F3n67u added the discuss Issues opened for discussions and feedbacks. label Jun 24, 2022
@sxa
Copy link
Member

sxa commented Jun 24, 2022

39 of the 78 flakes - 50%! - are on the two smartos machines. Wow, just wow.

Looks like a lot of them are timeouts on test-domain- tests, so if we can resolve them the situation will be a lot better. I expect it's one underlying issue.

There's zero chance of getting everything fixed

I don't think that's necessarily the correct bar we should select for including a platform.

@richardlau
Copy link
Member

FWIW I've bumped up the test runner's timeout in the SmartOS Jenkins CI job earlier today from the default 2 minutes to 5 minutes (TEST_CI_ARGS="-t 300" -- I was making the change for a different platform at the time so it seemed like a simple adjustment to make).

@sxa
Copy link
Member

sxa commented Jun 24, 2022

As an FYI I've tried a make test run against a build of the main branch on SmartOS and it failed 5 test, none of which are the test-domain- ones, which suggests that a lot of them are at least capable of running cleanly. Maybe the inreased timeout will resolve any flakiness there 🤞🏻 The machines we have seem rather well-specced so I wouldn't expect that to be an issue unless they've been getting into some funny state.

@bnoordhuis
Copy link
Member Author

Looks like a lot of them are timeouts on test-domain- tests

Reference point: all those tests together take 6 seconds wall clock time on a 10 year old laptop (20s with -j1)

@F3n67u
Copy link
Member

F3n67u commented Jun 27, 2022

If we don't have consensus about dropping support smartest, could we mark all flaky on smartos in one go temporarily to make green ci back? The flaky test problem make merge a pr very hard. The CI is super flaky in recent days. We could remove the flaky mark if we have fixed those flaky tests.

@bahamat
Copy link

bahamat commented Jun 27, 2022

I think that works for me? That'll give us some breathing room to get things going over here as we bring on more people that can help out.

@mhdawson
Copy link
Member

My preference is to mark the problematic tests as flaky and then let @bahamat work them back to green over time. Talking to @bahamat it sounds like the change in ownership is a potential opportunity for more investment versus less and hopefully more active work in the community to go along with that. Using the flaky test approach would give them time to demonstrate that.

@F3n67u
Copy link
Member

F3n67u commented Jun 28, 2022

I collect all tests that are flaky only on SmartOS between 06-22 and 06-28 and created a pr to mark them as flaky at #43596. There are 24 flaky tests in total.

nodejs-github-bot pushed a commit that referenced this issue Jun 28, 2022
PR-URL: #43596
Refs: #43457
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
@mhdawson
Copy link
Member

@F3n67u many thanks!

mabaasit pushed a commit to mabaasit/node that referenced this issue Jul 6, 2022
PR-URL: nodejs#43596
Refs: nodejs#43457
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
@bnoordhuis
Copy link
Member Author

@bahamat I wonder if you can give a short status update?

targos pushed a commit that referenced this issue Jul 12, 2022
PR-URL: #43596
Refs: #43457
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
@bahamat
Copy link

bahamat commented Jul 12, 2022

@bnoordhuis I have baseline build/tests done on Linux. I'm now working through SmartOS compile issues.

I also have someone I'm onboarding right now. Someone else starting in two weeks, and someone starting in September.

targos pushed a commit that referenced this issue Jul 31, 2022
PR-URL: #43596
Refs: #43457
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
PR-URL: nodejs/node#43596
Refs: nodejs/node#43457
Reviewed-By: Tobias Nießen <[email protected]>
Reviewed-By: Filip Skokan <[email protected]>
Reviewed-By: Ben Noordhuis <[email protected]>
Reviewed-By: Richard Lau <[email protected]>
Reviewed-By: Darshan Sen <[email protected]>
@bnoordhuis
Copy link
Member Author

I think smartos has a semblance of support again? Can anyone confirm/deny?

@bahamat
Copy link

bahamat commented Dec 29, 2022

Yes, I can confirm. This can be closed.

@bnoordhuis bnoordhuis closed this as not planned Won't fix, can't repro, duplicate, stale Dec 30, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build Issues and PRs related to build files or the CI. discuss Issues opened for discussions and feedbacks. smartos Issues and PRs related to the SmartOS platform.
Projects
None yet
Development

No branches or pull requests

7 participants