General formula for computing exercise difficulty #193
This is interesting. I think anything we can do to simplify the initial scoring of exercises would be a big help, even if we end up having to manually tweak things.
I've come up with a somewhat straightforward solution (more like a proof of concept). Issues not addressed by this approach:
The complexity caused by the interaction of topics can be tricky to calculate; some topic combinations act as 'AND' and others as 'OR'. My idea was to calculate difficulty based on the measured complexity of the solutions.
Normalize (all) the values so this isn't an issue.
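For concreteness, here is a minimal sketch of that normalization step in Python. The exercise names and raw scores are invented, and min-max scaling is just one reasonable choice:

```python
# Hypothetical sketch: min-max normalization of raw complexity scores, so
# values from different tools or scales become comparable on [0, 1].
def normalize(scores):
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1  # avoid division by zero if all scores are equal
    return {name: (s - lo) / span for name, s in scores.items()}

raw = {"hello-world": 4.2, "two-fer": 6.8, "zipper": 31.5}  # sample data
print(normalize(raw))
# {'hello-world': 0.0, 'two-fer': 0.0952..., 'zipper': 1.0}
```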
Thanks for the link.
Good idea. The example solution should be optimized as much as possible, which is not always the case. But still, this idea is better. @Insti, how did you measure complexity?
Optimized for what? Reading? Speed? Idioms? Something else?
It can be a controversial topic, but in this context it should be optimized in terms of speed.
But I once read that the example solution should be idiomatic. Most language idioms I'm aware of prefer readability over speed, unless the part is identified as a bottleneck.
I used a Ruby tool called
Why? We're trying to put problems into 10 big (difficulty) buckets. Whether an optimal solution scores 10 and an average solution scores 20 doesn't really matter, as they'll end up in buckets close enough to where they should be. Once they're there, if people think a problem is easier or harder than its bucket suggests, it can easily be moved.
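A minimal sketch of that bucketing idea, assuming raw complexity scores on an arbitrary known scale; the linear mapping and bounds are illustrative, not a fixed design:

```python
# Hypothetical sketch: map a raw complexity score onto a 1..10 difficulty
# bucket. Modestly different scores land in neighboring buckets, which is
# good enough as a starting point.
def to_bucket(score, lo, hi, buckets=10):
    if hi == lo:
        return 1
    fraction = (score - lo) / (hi - lo)
    return min(buckets, 1 + int(fraction * buckets))

# An "optimal" 10 and an "average" 20 end up one bucket apart:
print(to_bucket(10, 0, 100))  # 2
print(to_bucket(20, 0, 100))  # 3
```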
That is usually the go-to definition of "optimized" for any solution: use less time and/or fewer resources than another solution. I would venture, given the audience and goals of Exercism, that our most optimal solutions should favor idiomaticity and clarity over all things. When I try to figure out an exercise's difficulty for a track, I do it in a most algorithmically-unsuitable manner. I look at:
Unsure how this would be done in an automated way: it is a lot of gut feeling. Perhaps the latter can be a

Also, @m-a-ge, the TOPICS.txt and the actual topics used in exercises can be quite divergent. I've been trying to normalize that information across tracks but still have a ways to go. Overall, this is a good avenue of approach you are taking; it has a lot of promise, especially for less maintained tracks. Automate all the things! 👍
Another option is to use user input, the way ratings work on products on Amazon.
Getting user input in order to measure this is a useful idea, but it doesn't necessarily get around the problem of how you measure complexity; it just shifts that problem onto users. Given the variance in ability, knowledge, and interpretation, that may or may not be helpful. But I certainly like the idea of trying to be more data-driven with these kinds of issues if it's feasible.
@jonmcalder As devs, we love to figure out an algorithm to solve a problem. But crowd-sourcing information sometimes works better than algorithms. Yes, variance is a problem. But then, what we are trying to measure isn't necessarily objectively definable, so subjectivity might actually help rather than harm. I'm not saying that this is the way it should be done, just that it's worth considering.
Ok cool, I think we're in agreement then, since I think it's worth considering. FWIW, I'm a data scientist, not a dev, so I'm usually all for data-driven approaches where feasible. I'm just aware that the quality of data determines what kind of results you can get out of it, and it may be hard to get meaningful feedback from users on exercise difficulty if we as maintainers can't even agree on what type of "difficulty measure" we want to reflect. Another factor is that even if feedback were collected, we wouldn't have it upfront, so this wouldn't be useful for initial setup but only for refining difficulty scores later.
I would also comment that what is very difficult for one user might be an immediately recognized pattern for another. If we went with a crowd-sourced approach (which is an idea I think is worth exploring), maybe we should make sure people see the distribution of responses somehow, e.g. so they have an idea of how many people thought it was difficult, moderate, easy, etc.
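A small sketch of what surfacing that distribution could look like, using invented sample votes:

```python
# Hypothetical sketch: aggregate crowd-sourced difficulty votes and show the
# full distribution rather than only a single average.
from collections import Counter

votes = ["easy", "moderate", "easy", "hard", "moderate", "easy"]  # sample data
distribution = Counter(votes)
total = sum(distribution.values())
for label in ("easy", "moderate", "hard"):
    share = distribution[label] / total
    print(f"{label:>8}: {distribution[label]} votes ({share:.0%})")
```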
OK, here's how I'm thinking of exercise difficulty, along three axes: language features needed, problem concept, and algorithm. Since the objective of Exercism is to teach a language rather than algorithms and complicated problem solving, the problems here are easy with respect to both problem concept and algorithm. How can we measure it? By running something like Riiki over the sample solution to find out how many easy and how many hard language concepts are used. Thoughts?
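A rough sketch of that concept-counting idea, assuming a hand-maintained easy/hard classification of language constructs; the tiers, weights, and use of Python's ast module here are purely illustrative, not what the tool mentioned above actually does:

```python
# Hypothetical sketch: score a sample solution by the language concepts it
# uses. The construct lists and weights are invented for illustration.
import ast

EASY = {ast.If: 1, ast.For: 1, ast.FunctionDef: 1}        # assumed easy tier
HARD = {ast.Lambda: 3, ast.ListComp: 3, ast.ClassDef: 4}  # assumed hard tier

def concept_score(source):
    weights = {**EASY, **HARD}
    return sum(weights.get(type(node), 0) for node in ast.walk(ast.parse(source)))

sample = "def two_fer(name='you'):\n    return f'One for {name}, one for me.'"
print(concept_score(sample))  # 1 (a single easy construct: the function def)
```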
The example solution has no bearing on how a user may approach a problem. Sometimes as humans we make a problem harder without realizing it. That's how we learn :)
It's been a while since this discussion was active, but my gut sense is that our data is not good enough and the languages are different enough that an automated, data-driven solution is not going to be a particularly good approach for us. That said, if anyone does manage to do this in their track, please share!
This issue is a follow-up to the discussion in exercism/python#523. It is closely related to #92, but I think @exercism/track-maintainers can compute the difficulty themselves instead for now.
I like the idea proposed by @lilislilit of summing up a difficulty score based on exercise topics.
Topics themselves can be organized in tiers which have respective scores. It's almost done in TOPICS.txt, so "Base Concepts" topics can have 1 as the difficulty score. This will help us choose topics carefully, since the difficulty will be a good indicator when very few topics were chosen, or too many.
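A minimal sketch of this tier-score summing, with invented tier assignments (the real ones would come from TOPICS.txt):

```python
# Hypothetical sketch: an exercise's difficulty is the sum of the tier
# scores of its topics. Topic names and scores below are invented examples.
TOPIC_SCORES = {
    "strings": 1,       # assumed "Base Concepts" tier
    "conditionals": 1,
    "recursion": 3,     # assumed higher tier
    "concurrency": 5,
}

def exercise_difficulty(topics):
    # Unknown topics default to the base score of 1.
    return sum(TOPIC_SCORES.get(topic, 1) for topic in topics)

print(exercise_difficulty(["strings", "conditionals"]))   # 2
print(exercise_difficulty(["recursion", "concurrency"]))  # 8
```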
What are your thoughts on this?
Other refs: