General formula for computing exercise difficulty #193
This is interesting. I think anything we can do to simplify the initial scoring of exercises would be a big help, even if we end up having to manually tweak things.
I've come up with a somewhat straightforward solution (more like a proof of concept). Issues not addressed by this approach:
The complexity caused by the interaction of topics can be tricky to calculate; some topic combinations act as 'AND' and others as 'OR'. My idea was to calculate difficulty based on the measured complexity of the solutions.
Normalize (all) the values so this isn't an issue.
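For concreteness, here is a minimal sketch of that normalization step in Python. The exercise names and raw scores are invented, and min-max scaling is just one reasonable choice:

```python
# Hypothetical sketch: min-max normalization of raw complexity scores, so
# values from different tools or scales become comparable on [0, 1].
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1  # avoid division by zero if all scores are equal
    return {name: (s - lo) / span for name, s in scores.items()}

raw = {"hello-world": 4.2, "two-fer": 6.8, "zipper": 31.5}  # sample data
print(normalize(raw))
# {'hello-world': 0.0, 'two-fer': 0.0952..., 'zipper': 1.0}
```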
Thanks for the link.
Good idea. The example solution should be optimized as much as possible, which is not always the case. But still, this idea is better. @Insti, how did you measure complexity?
Optimized for what? Reading? Speed? Idioms? Something else?
It can be a controversial topic, but in this context it should be optimized in terms of speed.
But I once read that the example solution should be idiomatic. Most language idioms I'm aware of prefer readability over speed, unless the part is identified as a bottleneck.
I used a Ruby tool called
Why? We're trying to put problems into 10 big (difficulty) buckets. Whether an optimal solution scores 10 and an average solution scores 20 doesn't really matter, as they'll end up in buckets close enough to where they should be. Once they're there, if people think a problem is easier or harder than its bucket suggests, it can easily be moved.
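A minimal sketch of that bucketing idea, assuming raw complexity scores on an arbitrary known scale; the linear mapping and bounds are illustrative, not a fixed design:

```python
# Hypothetical sketch: map a raw complexity score onto a 1..10 difficulty
# bucket. Modestly different scores land in neighboring buckets, which is
# good enough as a starting point.
def to_bucket(score, lo, hi, buckets=10):
    if hi == lo:
        return 1
    fraction = (score - lo) / (hi - lo)
    return min(buckets, 1 + int(fraction * buckets))

# An "optimal" 10 and an "average" 20 end up one bucket apart:
print(to_bucket(10, 0, 100))  # 2
print(to_bucket(20, 0, 100))  # 3
```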
That is usually the go-to definition of "optimized" for any solution: use less time and/or fewer resources than another solution. I would venture, given the audience and goals of Exercism, that our most optimal solutions should favor idiomaticity and clarity over all things. When I try to figure out an exercise's difficulty for a track, I do it in a most algorithmically-unsuitable manner. I look at:
Unsure how this would be done in an automated way: it is a lot of gut feeling. Perhaps the latter can be a

Also, @m-a-ge, the TOPICS.txt and the actual topics used in exercises can be quite divergent. I've been trying to normalize that information across tracks but still have a ways to go. Overall, this is a good avenue of approach you are taking; it has a lot of promise, especially for less maintained tracks. Automate all the things! 👍
Another option is to use user input, the way ratings work on products on Amazon.
Getting user input in order to measure this is a useful idea, but it doesn't necessarily get around the problem of how you measure complexity; it just shifts that problem onto users. Given the variance in ability, knowledge, and interpretation, that may or may not be helpful. But I certainly like the idea of trying to be more data-driven with these kinds of issues if it's feasible.
@jonmcalder As devs, we love to figure out an algorithm to solve a problem. But crowd-sourcing information sometimes works better than algorithms. Yes, variance is a problem. But then, what we are trying to measure isn't necessarily objectively definable, so subjectivity might actually help rather than harm. I'm not saying that this is the way it should be done, just that it's worth considering.
Ok cool, I think we're in agreement then, since I think it's worth considering. FWIW, I'm a data scientist, not a dev, so I'm usually all for data-driven approaches where feasible. I'm just aware that the quality of data determines what kind of results you can get out of it, and it may be hard to get meaningful feedback from users on exercise difficulty if we as maintainers can't even agree on what type of "difficulty measure" we want to reflect. Another factor is that even if feedback were collected, we wouldn't have it upfront, so this wouldn't be useful for initial setup but only for refining difficulty scores later.
I would also comment that what is very difficult for one user might be an immediately recognized pattern for another. If we went with a crowd-sourced approach (which is an idea I think is worth exploring), maybe we should make sure people see the distribution of responses somehow, e.g. so they have an idea of how many people thought it was difficult, moderate, easy, etc.
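A small sketch of what surfacing that distribution could look like, using invented sample votes:

```python
# Hypothetical sketch: aggregate crowd-sourced difficulty votes and show the
# full distribution rather than only a single average.
from collections import Counter

votes = ["easy", "moderate", "easy", "hard", "moderate", "easy"]  # sample data
distribution = Counter(votes)
total = sum(distribution.values())
for label in ("easy", "moderate", "hard"):
    share = distribution[label] / total
    print(f"{label:>8}: {distribution[label]} votes ({share:.0%})")
```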
OK, here's how I'm thinking of exercise difficulty, along three axes: language features needed, problem concept, and algorithm. Since the objective of Exercism is to teach a language rather than algorithms and complicated problem solving, the problems here are easy with respect to both problem concept and algorithm. How can we measure it? By running something like Riiki over the sample solution to find out how many easy and how many hard language concepts are used. Thoughts?
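A rough sketch of that concept-counting idea, assuming a hand-maintained easy/hard classification of language constructs; the tiers, weights, and use of Python's ast module here are purely illustrative, not what the tool mentioned above actually does:

```python
# Hypothetical sketch: score a sample solution by the language concepts it
# uses. The construct lists and weights are invented for illustration.
import ast

EASY = {ast.If: 1, ast.For: 1, ast.FunctionDef: 1}        # assumed easy tier
HARD = {ast.Lambda: 3, ast.ListComp: 3, ast.ClassDef: 4}  # assumed hard tier

def concept_score(source):
    weights = {**EASY, **HARD}
    return sum(weights.get(type(node), 0) for node in ast.walk(ast.parse(source)))

sample = "def two_fer(name='you'):\n    return f'One for {name}, one for me.'"
print(concept_score(sample))  # 1 (a single easy construct: the function def)
```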
The example solution has no bearing on how a user may approach a problem. Sometimes as humans we make a problem harder without realizing it. That's how we learn :)
It's been a while since this discussion was active, but my gut sense is that our data is not good enough and the languages are different enough that an automated, data-driven solution is not going to be a particularly good approach for us. That said, if anyone does manage to do this in their track, please share!
This issue is a follow-up to the discussion in exercism/python#523. It is closely related to #92, but I think @exercism/track-maintainers can compute the difficulty themselves instead for now.
I like the idea proposed by @lilislilit of summing up a difficulty score based on exercise topics.
Topics themselves can be organized in tiers which have respective scores. It's almost done in TOPICS.txt, so "Base Concepts" topics can have 1 as the difficulty score. This will help us choose topics carefully, since the difficulty will be a good indicator when very few topics were chosen, or too many.
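A minimal sketch of this tier-score summing, with invented tier assignments (the real ones would come from TOPICS.txt):

```python
# Hypothetical sketch: an exercise's difficulty is the sum of the tier
# scores of its topics. Topic names and scores below are invented examples.
TOPIC_SCORES = {
    "strings": 1,       # assumed "Base Concepts" tier
    "conditionals": 1,
    "recursion": 3,     # assumed higher tier
    "concurrency": 5,
}

def exercise_difficulty(topics):
    # Unknown topics default to the base score of 1.
    return sum(TOPIC_SCORES.get(topic, 1) for topic in topics)

print(exercise_difficulty(["strings", "conditionals"]))   # 2
print(exercise_difficulty(["recursion", "concurrency"]))  # 8
```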
What are your thoughts on this?
Other refs: