Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The JSON for SSVC "options" splits out keys into individual records #576

Open
jayjacobs opened this issue May 30, 2024 · 24 comments
Open

The JSON for SSVC "options" splits out keys into individual records #576

jayjacobs opened this issue May 30, 2024 · 24 comments
Labels
enhancement New feature or request

Comments

@jayjacobs
Copy link

The schema (with example in data/schema_examples/Computed-CVE-2014-0751-Coordinator.json) is being leveraged by CISA for Vulnrichment and creates some unfriendly JSON.

I opened issue #40 on vulnrichment to talk about it and they suggested they are just following this schema.

Long story short, by specifying each key:value pair in it's own object under options, it is flattened (by tools) to be unique records, when all of those key:value pairs represent a single object. (see issue 40 in vulnrichment)

It also does not make sense to have the options specified as an array if it is a single object tied to the single computed field in the same record.

Fixed example:

{
    "role": "Coordinator",
    "id": "CVE-2014-0751",
    "version": "2.0.3",
    "computed": "SSVCv2/E:A/V:S/T:T/P:M/B:A/M:M/D:A/2021-09-29T15:29:44Z/",
    "timestamp": "2021-09-29T15:29:44Z",
    "options": {
	"Exploitation": "active",
	"Automatable": "no",
	"Technical Impact": "total",
	"Mission Prevalence": "Minimal",
	"Public Well-being Impact": "Material",
	"Mission & Well-being": "medium"
    },
    "$schema": "https://democert.org/ssvc/SSVC_Computed_v2.0.3.schema.json",
    "decision_tree_url": "https://democert.org/ssvc/CISA-Coordinator-v2.0.3.json"
}
@jayjacobs jayjacobs added the enhancement New feature or request label May 30, 2024
@sei-vsarvepalli
Copy link
Contributor

Hello Jay,

Thanks for your feedback. I believe the intent in options being an array was for it to be an ordered list, the Decisions Points in schema are considered ordered. This makes it possible to arrange the Decision Point bundles and embed them both in a Computed schema as well the full Decision Tree schema. There was some discussion related to this earlier that may be relevant

#403
#290

Some related information from JSON Schema discussions Google Group

https://groups.google.com/g/json-schema/c/rgkxYocPSVg

@jayjacobs
Copy link
Author

First, I do not understand the need for these to be an ordered list. The "decision tree" in SSVC is simply for visualizing. The ordering of variables will never change the outcome. But that discussion is not what this issue is about.

Let's just assume they are ordered, it still does not mean that the individual variables need to be represented in the JSON as ordered. In other words, the representation/presentation can be different than the storage.

In #290 it is stated:

This is also generally true of (I think) all CVSS vector elements to date. (Counterexamples hereby requested.)

Which I think is true, the CVSS vector representation is ordered, so take this example from CVE-2007-3484 where the CVSS is stored in JSON:

            "cvssV3_1": {
              "scope": "CHANGED",
              "version": "3.1",
              "baseScore": 6.1,
              "attackVector": "NETWORK",
              "baseSeverity": "MEDIUM",
              "vectorString": "CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N",
              "integrityImpact": "LOW",
              "userInteraction": "REQUIRED",
              "attackComplexity": "LOW",
              "availabilityImpact": "NONE",
              "privilegesRequired": "NONE",
              "confidentialityImpact": "LOW"
            }

Notice how the vectorString is showing the ordering of the variables, but the variables themselves are stored in the object unordered. This doesn't make the values any less ordered for display than any other storage mechanism. It also "flattens" down to structured data rather nicely and can then be past to a presentation layer or wherever.

I maintain that the way the options for SSVC are stored are sub-optimal, and can be greatly simplified that could still support ordering the variables for whatever presentation may occur and allow clean flattening and extraction of structured data.

@ahouseholder
Copy link
Contributor

ahouseholder commented May 30, 2024

I think there's some confusion here on what needs to be "ordered" in SSVC. The rationale is laid out in decision record 0008, but the gist is that within a single decision point (think CVSS vector element), the values must be ordered. So for exploitation, $None < PoC < Active$, or for confidentialityImpact, $None < Low < High$.

That is not saying anything about how to represent multiple specific values across decision points, which seems to be what this thread is touching on.

I realize this doesn't resolve the question, I just wanted to point out that the "ordering" line of argument might be a red herring to the issue at hand. I haven't had a chance to reexamine the json schema mentioned, so I'm not prepared to comment on the rest of the thread at the moment.

@sei-vsarvepalli
Copy link
Contributor

Ah okay.

I think your recommended schema will work. The current schema also works, but I think your request is to primarily optimize the schema? Will the records consuming tools fail to understand or process the schema?

If you feel strongly the flattening of the schema is user, you can make a PR suggestion with the schema, and update the JavaScript library for us to run our tests and take it in. We only need that your PR be digitally signed and be evaluated by us before merging it.

On a related topic:

The reason for the Computed schema options field to be array of objects whose items can be array or string is so one can specify a Decision Point to have multiple potential values either because a Decision Point has not yet evaluated or Decision Point has been evaluated to "exclude" one of the options.

For example a valid SSVC computed schema options below states that Exploitation is either "poc" or "active" but NOT "none" and Mission Prevalence has NOT been evaluated by the Role of this metrics developer.

                "options": [
                  {
                    "Exploitation": ["poc","active"]
                  },
                  {
                    "Automatable": "no"
                  },
                  {
                    "Technical Impact": "partial"
                  },
                  {
                    "Mission Prevalence": ["minimal","support","essential"]
                  }
                ]

@ahouseholder
Copy link
Contributor

So ignoring the current schema for a moment, the CVSS example @jayjacobs mentioned might turn @sei-vsarvepalli's example into

                "options": {
                    "Exploitation": ["poc","active"],
                    "Automatable": "no",
                    "Technical Impact": "partial",
                    "Mission Prevalence": ["minimal","support","essential"]
                }

But that would have the unfortunate side effect of having a dictionary whose keys are always strings but whose values could be strings or lists, which could lead to ambiguous parsing.

So maybe

                "options": {
                    "Exploitation": ["poc","active"],
                    "Automatable": ["no"],
                    "Technical Impact": ["partial"],
                    "Mission Prevalence": ["minimal","support","essential"]
                }

would be preferable? Every value is consistently a list of strings. The list could be of length 1 up to the entire decision point (implying that the record is asserting no information about that decision point)

This seems logically consistent with what we say in https://certcc.github.io/SSVC/howto/bootstrap/use/#partial-or-incomplete-information

The basic guidance is that the analyst should communicate all of the vulnerability's possible states, to the best of the analyst's knowledge. If the analyst knows nothing, all states are possible.
...
The merit in this “list all values” approach emerges when the stakeholder knows that the value for a decision point may be A or B, but not C.

@jayjacobs
Copy link
Author

I've given this some thought and I cannot open a PR for the work required, I just do not have the time and you do not want me working on javascript.

But for future reference, an array of objects is generally treated as a collection of separate objects (think of unique rows in a table), this is the only JSON I've come across (so far) where an array is used to order a single object - meaning each object in the JSON is actually a part of an object defined at the array level. It's just not standard JSON practice.

Also, an array denotes a one-to-many relationship. If you what something like "Exploitation" to have a one-to-many relationship with its values, than the whole thing should be an array regardless if one or many values exist, do not store When someone needs to convert JSON data into columnar format that relationship needs to apply across all records and cannot exist in a per-record definition. This means that even simple records would be an array:

    "options": {
	"Exploitation": [ "active" ],
	"Automatable": [ "no" ],
	"Technical Impact": [ "total" ],
	"Mission Prevalence": [ "Minimal" ],
	"Public Well-being Impact": [ "Material" ],
	"Mission & Well-being": [ "medium" ]
    },

@ahouseholder
Copy link
Contributor

    "options": {
	"Exploitation": [ "active" ],
	"Automatable": [ "no" ],
	"Technical Impact": [ "total" ],
	"Mission Prevalence": [ "Minimal" ],
	"Public Well-being Impact": [ "Material" ],
	"Mission & Well-being": [ "medium" ]
    },

I think this is consistent with what I intended in my comment above -- that the values would always be an array, even if there is only a single element.

However, due to the nearly coincident timing of our respective comments, I'm not sure whether you were responding to mine or whether we were both replying nearly simultaneously.

@jayjacobs
Copy link
Author

Yes sorry, my last comment was for @sei-vsarvepalli and your comment came in right before I posted. I agree @ahouseholder, your comment and mine seem to be very much in line.

@j---
Copy link
Collaborator

j--- commented May 31, 2024

The decision points don't need to be an ordered list; the order of the decision points in a tree is not material to the output. There are some display choices we make about ordering points with a tree's display, but that should be something that is ensured by the display tool, not the JSON format.

I agree the values should always be an array for each decision point, just to keep us aligned with the ability to relay partial information (though we could determine that use case is overtaken by events and no one wants to do it).

Thanks Jay and Allen for converging on a solution.

@ahouseholder
Copy link
Contributor

ahouseholder commented Jun 25, 2024

Capturing some additional thoughts on a way forward based on internal discussions:

flowchart TD
SSVC_Computed --> Decision_Point_Group
SSVC_Provision --> Decision_Point_Group
Decision_Point_Group --> Decision_Point
Loading
  • We'd want any changes to also include the array changes already described in this thread as well.

@sei-vsarvepalli
Copy link
Contributor

@jayjacobs

I have a branch that tries to resolve this issue with a consistent schema that can be used for Decision Points and Decision Point Groups.

https://github.com/sei-vsarvepalli/SSVC/tree/feature/issue_576

specifically take a look at https://github.com/sei-vsarvepalli/SSVC/blob/feature/issue_576/data/schema_examples/Computed-CVE-2014-0751-Coordinator.json

if that comes close to what would be reliable way to parse and represent SSVC data via ADP. This requires a bit of coordination with CISA developers to ensure the new schema also gets adopted.

@sei-vsarvepalli
Copy link
Contributor

Capturing some additional thoughts on a way forward based on internal discussions:

flowchart TD
SSVC_Computed --> Decision_Point_Group
SSVC_Provision --> Decision_Point_Group
Decision_Point_Group --> Decision_Point
Loading
  • We'd want any changes to also include the array changes already described in this thread as well.

This part is also resolved in the recent PR #588

@ahouseholder
Copy link
Contributor

There are some additional snags we ran into in trying to address this in PR #588:

  • Decision points are namespaced and versioned objects, so it's not enough to say "Exploitation". You have to say {"namespace": "ssvc", "name": "Exploitation", "version": "1.0.0"} to be clear which decision point you're talking about.
  • We don't have a schema to represent a list of decision points each having a list of selected values. That's now the topic of We need a schema for an array of decision point downselects #595
  • The list also needs to say what vul it pertains to (based on this comment in Feature/issue 576 #588)

@sei-vsarvepalli
Copy link
Contributor

Now we are pursuing #599 as the potential way to represent ADP data. If @jayjacobs has comments, speak now...

@ahouseholder
Copy link
Contributor

ahouseholder commented Jul 11, 2024

Status summary:

I believe the primary issue in this ticket---the need for a schema for a succinct list of decision point selections, which sits upstream of cisagov/vulnrichment#40

In the course of working on that, we tried

which was a bit too big to be manageable, but that spawned

However, I believe everything in the above list is peripheral to the resolution offered in #599.

@jayjacobs
Copy link
Author

Looks like this expanded way beyond what the original issue was. I just spent 15 minutes going through various issues and pull requests here and I could not figure out what to comment on. Any chance someone could paste an example of or link to what the options section looks like in the new schema?

@sei-vsarvepalli
Copy link
Contributor

Looks like this expanded way beyond what the original issue was. I just spent 15 minutes going through various issues and pull requests here and I could not figure out what to comment on. Any chance someone could paste an example of or link to what the options section looks like in the new schema?

Hello Jay,

It will no longer be options but be called selections instead with a values array of a minus single element that will capture current evaluation of a Decision Point for a vulnerability. See example evaluation performed
https://github.com/CERTCC/SSVC/blob/main/data/schema_examples/CVE-1969-0000-Decision_Point_Value_Selection.json

Ideally the ADP container adheres to this proposed schema, although the "id" field will be redundant. We are exploring other optionally things like a references section that can be used to capture any resources that were used to make the evaluation #600

@zmanion
Copy link
Contributor

zmanion commented Jul 12, 2024

Somewhat off topic, where did namespace come from? At first glance it seems odd to be within an SSVC scope and need to specify that other nodes in the SSVC scope are in fact also in the SSVC scope. Is the idea that an SSVC decision tree can "import" decision points from external sources, like CVSS?

@zmanion
Copy link
Contributor

zmanion commented Jul 12, 2024

I'm stuggling a bit with how to interpret multiple decision point values. I understand the rationale, but this seems to imply that the default value for a decsion point is "all the values" then analysis excludes some values, narrowing down the decision. Using Automatable as an example, does this mean the effective starting or defaults look like this?

{
  "namespace": "ssvc",
  "name": "Automatable",
  "key": "A",
  "version": "2.0.0",
  "values": [
    ["Yes"], ["No"]
  ]
}

Are empty values ("values": []) equivalent to all the values ("values": [["Yes"], ["No"]])?

Are the muliple values each an individual boolean/flag, the value is set or not? I don't think logical OR works for what appear to be boolean values, e.g., Automatable cannot IRL be Yes and No at the same time. For other decision points, some values are not mutually exclusive (Exploitation could be PoC and Active at the same time).

@sei-vsarvepalli
Copy link
Contributor

Hello @zmanion

Some clarifications

  1. namespace will allow us to work with other namespaces like CVSS as you alluded. See https://github.com/CERTCC/SSVC/blob/main/src/ssvc/decision_points/cvss/availability_impact.py https://github.com/CERTCC/SSVC/blob/main/src/ssvc/dp_groups/cvss/collections.py
  2. For evaluated decisions the values cannot be an empty array. That is a good catch to specify in the schema. I have added minItems to capture this. For each Decision Point one can provide multiple values, thus eliminating some of the values. For example, An evaluation can result in Exploitation being either "Public Poc" or "active" but NOT "none". However your example Automatable being an array of "yes" "no" - basically would mean no evaluation of Automatable was done, which does not make sense to share as a completed Value Selection.

@jayjacobs
Copy link
Author

Okay, my original and only concern was that the JSON produced a sane data structure when automatically parsed. And it looks like it does now, so thanks.

I do hope you don't do an array of arrays like "values": [ ["Yes"], ["No"] ] as in Art's comment above (I don't see that in the example linked). You probably aren't, but totally fine to just do "values": [ "Yes", "No" ].

@sei-vsarvepalli
Copy link
Contributor

Okay, my original and only concern was that the JSON produced a sane data structure when automatically parsed. And it looks like it does now, so thanks.

I do hope you don't do an array of arrays like "values": [ ["Yes"], ["No"] ] as in Art's comment above (I don't see that in the example linked). You probably aren't, but totally fine to just do "values": [ "Yes", "No" ].

Hmm.. I think the array of array is probably a typo error. values should be an array of strings as specified in the schema.

@zmanion
Copy link
Contributor

zmanion commented Jul 17, 2024

Yes sorry my bad, I either saw that somewhere or accidentally added the internal brackets, should be an array of strings.

@sei-vsarvepalli
Copy link
Contributor

This is mostly resolved. Waiting on CISA adoption of the new schema and models.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants