Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HashMap should be more widely usable #11

Open
Ericson2314 opened this issue Mar 7, 2018 · 22 comments
Open

HashMap should be more widely usable #11

Ericson2314 opened this issue Mar 7, 2018 · 22 comments
Labels

Comments

@Ericson2314
Copy link
Collaborator

Ericson2314 commented Mar 7, 2018

Right now it can only be used on regular platforms due to the way the default hasher is initialized. This is a such a silly little thing. Yes, it's a one-of issue, but I view it as a good litmus test.

@jethrogb
Copy link
Collaborator

jethrogb commented Mar 7, 2018

I think there's an issue for better handling of default type parameters, but I can't find it right now.

@Ericson2314
Copy link
Collaborator Author

Ericson2314 commented Mar 7, 2018

There is. But I rather not wait for such language change when we can just move it into alloc and wrap in a new type. We could have done that years ago.

@jethrogb jethrogb added the std label Mar 7, 2018
@jethrogb
Copy link
Collaborator

jethrogb commented Mar 7, 2018

That would mean that users of std and alloc are seeing a different type, no?

@Ericson2314
Copy link
Collaborator Author

@jethrogb Yes. But that's no worse than today when its not even available in alloc. While alloc is unstable, a stop-gap solution like the newtype need not tie our hands.

@WildCryptoFox
Copy link

@Ericson2314 @jethrogb A newtype could have worked, but now the default type alias should work today. Is there anything blocking this today?

@Ericson2314
Copy link
Collaborator Author

type aliases cannot have default type parameters.

@WildCryptoFox
Copy link

@Ericson2314 Er. I use this feature often.

pub type Result<T, E = Error> = core::result::Result<T, E>;

@Ericson2314
Copy link
Collaborator Author

Ericson2314 commented Nov 28, 2018

Woah I didn't know that was even allowed in stable Rust. @eddyb?

Still there are issues:

pub type Result<T, E = ()> = core::result::Result<T, E>;

fn main() {
    Ok(());
    // fn asdf<E = ()>() -> Result<(), E> { Ok(()) }
    //asdf();
    println!("Hello, world!");
}

Ok is ambiguous, and parameter defaults are not accepted on functions.

@WildCryptoFox
Copy link

@Ericson2314 Of course that Ok is ambiguious, it is Result<(), _>::Ok.

Indeed functions don't get this treat, but for all that it matters for HashMap you just need a impl<K: Eq + Ord, V> HashMapStd for HashMap<K, V> { fn new; fn with_capacity; }; where the HashMap type alias in std includes the default. See my suggestion in rust-lang/rust#56192

@WildCryptoFox
Copy link

WildCryptoFox commented Nov 28, 2018

@Ericson2314 I must admit I was surprised to find the following doesn't work.

type Result<T, E = ()> = std::result::Result<T, E>;

// doesn't build
//use Result::*;

enum XResult<T, E = ()> {
    XOk(T),
    XErr(E),
}

// works as expected
use XResult::*;

Edit: Although you can't use XOk(()); either as it doesn't know the E type concretely; thus no difference or loss of features. Although it would be nice if it defaulted the type and adopted the concrete types when made concrete later.

@Ericson2314
Copy link
Collaborator Author

To not break compat, we have to default those method type vars taken from the inherent impl. This isn't possible. We don't have the same problem with variant constructurs since HashMap has none exposed, but if it did that would be the same problem.

@Amanieu
Copy link

Amanieu commented Nov 28, 2018

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

@Ericson2314
Copy link
Collaborator Author

Ericson2314 commented Nov 28, 2018

That's a good idea if you don't need to worry about other libraries consuming a hash map. My RFC rust-lang/rfcs#2492, which will hopefully get more consideration after the Rust 2018 crunch, would get us a way to defer the definition of the default so the main one can be moved. (And I'd hope the general thrust of the portability push would allows @Amanieu's alternative hashmaps and locks to be substituted in custom-built standard libraries easily so that they can be used more widely.)

@cbeck88
Copy link

cbeck88 commented Jan 23, 2019

I hope that I'm not out-of-line in contributing to this conversation given that
I'm not part of the working group, or an active rust contributor.

Let me just suggest that it's helpful to the users if we can avoid letting perfect
be the enemy of good.

Even if lang_items are ugly and the proliferation of lang_items
is "a bad thing", it would be have been nice if 6+ months ago when this issue was
first raised (WG issue 11 Mar 27, 2018 #11,
PR June 27, 2018 rust-lang/rust#51846)
the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever,
lang_item was acceptable in the near term, with the expectation that it would be removed when
existential types RFC was eventually (if ever?) merged, so that there is a "long term solution"
and a "quick fix to tide us over, which is self-contained and without negative
consequences for other parts of the system".

From the point of view of my company:

  • We are trying to use rust to build Intel SGX enclaves.
    These are ``trusted computing'' environments, where access to the OS is limited.
  • There are several teams at other companies that have worked on creating
    "enclave frameworks" in rust that provide basic std like functionality,
    piercing the enclave in Intel's prescribed way to make specific sets of system calls.
    (rust-sgx-sdk: https://github.com/baidu/rust-sgx-sdk, fortanix: https://github.com/fortanix/rust-sgx)
    (fortanix is also contributing patching directly to rust stdlib to support an sgx target)
  • We believe that these frameworks are very complex and somewhat defeat the purpose
    of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.
    Additionally, there are papers that recommend that the SGX OCALL's on which these
    frameworks are based when they call out across the enclave to the OS, are fundamentally
    a source of vulnerabilities and should be avoided.
    (https://www.blackhat.com/docs/us-17/thursday/us-17-Swami-SGX-Remote-Attestation-Is-Not-Sufficient-wp.pdf)
  • We prefer to build our enclave target as #[no_std] for this reason. Our enclave
    only does some serialization and cryptography, and nothing else, and the
    untrusted layer of our program outside the enclave is what does network and disk I/O.
  • Not being able to use the std::HashMap in a no_std crate even with alloc
    is really bad for us, we need to be able to have efficient key value stores
    in the enclave, and we need to be able to use third party serialization libs,
    which all want to provide support for HashMap out of the box.
    In fact, the dependency of the serde lib on std, solely for HashMap afaict,
    is the source of quite a bit of pain for projects lke ours, because
    if a target has dependencies some of which select serde/std and some of which
    select serde/alloc, nasty build failures ensue where rust says that certain
    traits are not implemented (although they plainly are, it's just for the other
    version of serde).
  • To get the exact same hashmap implementation, we can copy-paste the stdlib
    Hashmap in tree, much as hashmap_core crate does. And then swap out the
    references to ThreadRng with calls to RDRAND intrinsics on x86.
    But this is a significant maintanence burden, and since all of our code is
    security-critical and will have to be audited externally, this means that
    the auditors will have to audit the entire 20k+ lines of rust standard hashmap
    that we copied in tree.
  • Using hashbrown crate as mentioned above is only a partial solution because
    it too would need to be audited, and it seems to have less test coverage than
    the rust standard hashmap (as would be expected).
  • While the rust-sgx-sdk etc. options can be reconsidered at this point since
    they make it easy to use std::HashMap the truth is that these sdk's would
    also need to be audited, and they are very large and by their nature circumvent
    some of the security of the enclave by attempting to make it easy to talk to
    the OS.

The inability to move HashMap out of std has led to fallout for the Rust
ecosystem at large, because it has significantly complicated serde and the use
of serde in no_std crates -- even though serde is merely a struct-field
reflection / serialization framework, consists entirely of traits / generic code,
and should not need to talk to the OS at all (!)

Intel SGX Enclaves seems at a high level to be the perfect application
for Rust -- almost anything that someone does in an enclave is security sensitive
and therefore stronger memory safety guarantees have an almost compelling appeal.
And Rust's serde is a significantly nicer in-language serialization framework
than what can be done in other systems-languages like C and C++ which usually
require external code-gen tools integrated with the build-system.
Almost the only sensitive part that actually needs to be in the enclave is the
actual cryptography and serialization routines, which seemingly should be trivial
to perform without the OS and in a no_std environment.
But if you can't use rust in practice for this without doing really sketchy stuff, e.g.

  • can't use open-source third-party serialization libraries without patching them for no_std support,
  • can't use standard hashmap without bringing in tree / patching it,
  • have to commit to using a nightly compiler for the foreseeable future

then there's a point at which engineers start to say it would be simpler and
better for the company if we just used C++ instead. In C++ the hashmaps don't use
randomness. It's trivial to replace things like malloc and free by interposing
symbols at link time. And if I want to use anything from the standard library
template headers, I can just include those headers, and as long as I tell the
linker not to link to libstdc++, pthreads, etc. we won't get any of the
OS-specific stuff.

Anyways -- I hope I've made my point without being too annoying. I want to thank
you all for your hard work. And I want to let you
know that we are keenly watching the resolution of this issue and the
RFC (rust-lang/rfcs#2492)!

@Ericson2314
Copy link
Collaborator Author

@cbeck88 the biggest delay is not due to the scope of that RFC but due to the limit bandwidth of the core team. Even some sort of lang item fix would have been a long process (RFC, PR, etc...).

I'm totally with you that pure things need to work, and big "fameworks" that drag everything do contradict the ideal of fine grained composition and use-only-what-you-need.

Hopefully now that 2018 is out, we can get things moving with the RFC. Please comment sharing your usecase there!

@jethrogb
Copy link
Collaborator

@cbeck88

I hope that I'm not out-of-line in contributing to this conversation given that I'm not part of the working group, or an active rust contributor.

Not at all, all comments are welcome!

Let me just suggest that it's helpful to the users if we can avoid letting perfect be the enemy of good.

In my experience Rust generally favors the status quo in favor of ad-hoc solutions if a general solution is in sight.

it would be have been nice if ... the developers could have agreed that e.g. a #[core_entropy_fcn], #[hashmap_random_keys], whatever, lang_item was acceptable in the near term, with the expectation that it would be removed when existential types RFC was eventually (if ever?) merged, so that there is a "long term solution" and a "quick fix to tide us over, which is self-contained and without negative consequences for other parts of the system".

It would've been nice too if we had const generics already. The Rust development process is community-driven and as @Ericson2314 mentions the priorities are set by the core team.

This issue wasn't first raised 10 months ago (it's just that the Portability WG was formed around that time) but rather many years ago by myself and other people facing the exact same problems you described. I first released the core_collections crate almost 3 years ago for the exact purpose you want it for now. I don't think the arguments in support of this feature have changed much and I also don't think your usecase is a new one that hasn't been considered.

So, the only reason no progress has been made is because of prioritization. Please add your voice to the demand for this feature and hopefully something can be done in the future!

(people who don't care about SGX can stop reading here)


I'm the author of the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I think it perfectly fits your usecase and you should use it. I'll respond to some of your points here, but the discussion regarding Rust & SGX usage is entirely off topic for this issue I ask it be continued elsewhere. I'd be happy to talk about your specific security concerns, and if possible make changes to our platform address them. We can setup a call, we can talk on the #rust-sgx slack channel, or communicate in whatever other forum you suggest.

Intel SGX Enclaves seems at a high level to be the perfect application for Rust -- almost anything that someone does in an enclave is security sensitive and therefore stronger memory safety guarantees have an almost compelling appeal.

I agree! That's why I've been working on accomplishing exactly this for over 3 years and have started a company that has made this a reality. For a long time, Fortanix was the only company running Rust & SGX in production. Our considerable experience actually writing and using enclaves has been invaluable in designing the Fortanix Rust EDP and the x86_64-fortanix-unknown-sgx target. I hope you can learn from our experience instead of having to re-encounter all the problems yourself by building from scratch.

piercing the enclave in Intel's prescribed way to make specific sets of system calls [...] SGX OCALL's on which these frameworks are based

I'm not sure what you mean by this exactly, but the Fortanix EDP isn't based on the Intel SGX SDK.

Additionally, there are papers that recommend that the SGX OCALL's [...], are fundamentally a source of vulnerabilities and should be avoided.

I think you are reading too much into the Swami paper. What he describes (but doesn't cite) are known as Iago attacks and yes careful interface design is needed to avoid such attacks. However, there are even more damning attacks on the userspace-enclave interface, such as https://arxiv.org/abs/1710.09061 .

Because getting this interface right is so important (it's the most direct attack surface an attacker can use!), I highly recommend that you do not design it from scratch, lest you run into the same pitfalls as everyone else before you.

The x86_64-fortanix-unknown-sgx target uses a minimal interface that has been carefully reviewed (by us, academia, 3rd party security auditors) and refined over time, and that we believe is secure. The implementation uses Rust language features specifically designed to prevent misuse of the interface.

We believe that these frameworks are very complex and somewhat defeat the purpose of the enclave, which is to isolate the sensitive code from the OS, which is viewed as untrusted.

[...]

by their nature circumvent some of the security of the enclave by attempting to make it easy to talk to the OS.

I don't think this is a fair characterization. It sounds like your enclave will largely consist of the memory allocator, the alloc crate which you apparently intend to use and serde. The complexity of those parts is far greater than anything you'd be pulling in alongside it from std. The “talking to the OS” part of std is rather minimal, especially on x86_64-fortanix-unknown-sgx. Besides, if you don't call TcpStream::new you will not have to add that part of the “framework” to your complexity budget. Your auditors should be able to confirm that there is no use of e.g. std::io/std::net.

we can copy-paste the stdlib Hashmap in tree [...] But this is a significant maintanence burden, and since all of our code is security-critical and will have to be audited externally, this means that the auditors will have to audit the entire 20k+ lines of rust standard hashmap that we copied in tree.

I don't understand this argument. Either you need to audit the HashMap implementation, or you don't, but whether you pull it from std or alloc or some fork of the source code shouldn't matter for a security audit.

have to commit to using a nightly compiler for the foreseeable future

x86_64-fortanix-unknown-sgx is on track to land in 1.33.0 stable.

if we just used C++ instead

It's not clear to me if you intend to use the the Intel SGX SDK here or not, but the “trusted” parts of that cloc in at about 60000 CLOC of unsafe code! The x86_64-fortanix-unknown-sgx-specific parts of std are only 3800 CLOC.

In C++ the hashmaps don't use randomness.

Depending on your usecase, this could in and of itself be a security vulnerability!

@cbeck88
Copy link

cbeck88 commented Jan 26, 2019

Hi Jethro,

Thanks very much for your comments. We are still discussing internally several
of the things that you mentioned.

It is great to hear that the fortanix sgx target is on track to hit stable in
1.33.0!

@XVilka
Copy link

XVilka commented Aug 26, 2020

We recently met the same problem in petrgaph, it makes graph code that uses Hash* looking a bit messy with all these feature switches. For now indexmap and hashbrown are helpful, but it would be amazing to get fully-features Hash* available for everyone.

@joshlf
Copy link

joshlf commented Feb 6, 2021

Fuchsia needs this in order to make Netstack3 no_std.

@Amanieu
Copy link

Amanieu commented Feb 6, 2021

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

@joshlf
Copy link

joshlf commented Feb 6, 2021

Ah I missed your comment there, thanks!

@Stargateur
Copy link

I'll just repeat my comment from earlier in this thread:

I would recommend anyone wanting to use HashMap on no_std to just use the hashbrown crate, which only depends on liballoc.

The standard library HashMap is a wrapper around hashbrown::HashMap, so you're not missing out on any features.

But that only useful for application, If I use this in a library, people can't put the HashMap of std in place of the HashBrown HashMap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

8 participants