Description

This doc describes how I worked through this project.

I received the email with the task assignment on the evening of Tue 16 Apr, but I couldn't get started straight away.
I started looking at the task early in the morning on Thu 18 Apr.
I sketched all my thoughts into readme.md as a draft.
Then I manually downloaded the Contents-amd64.gz file, ungzipped it and mocked up the main algorithm for finding the biggest package.
Then I went to my day job :)
Around midday I took some time to mock up a structure for a contents_index class. It was brief, but with enough documentation that I wouldn't lose track of my thoughts :)
A bit later I got back to the project and tried to think of a good way to process each line in Contents-amd64.gz. I looked into the file again and read the Debian doc again. Finally I decided to write a function that handles the comma-separated list of packages in a single line like 'filename pkg1,pkg2,...,pkgN'. I also added a couple of unit tests to make sure it works as expected.
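
To illustrate the line format, here is a minimal sketch of such a parsing function. The name parse_contents_line is hypothetical and not taken from the project, and the sketch assumes the package list is always the last whitespace-separated field of the line.

```python
def parse_contents_line(line):
    """Split one Contents index line into (filename, [packages])."""
    # Assumption: the package list is the last whitespace-separated field,
    # so split from the right in case the file name itself contains spaces.
    filename, packages = line.rstrip("\n").rsplit(maxsplit=1)
    return filename, packages.split(",")


print(parse_contents_line("bin/busybox utils/busybox,shells/busybox-static"))
```

Splitting from the right keeps a file name with spaces in one piece. A line with fewer than two fields would raise a ValueError here, so a real implementation would probably want to handle that case explicitly.
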
Later, after work, I came back to this again. I decided I needed to implement downloading the file and ungzipping it, so I did that. Along the way I learned something interesting about downloading binary data via requests.get, especially using the stream=True flag and reading the data in chunks.
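
A rough sketch of the streaming idea, not the project's actual code: with stream=True the body is not loaded into memory at once, and iter_content() hands it over in chunks. The function names, the chunk size and the timeout are assumptions picked for illustration.

```python
def save_chunks(chunks, path):
    """Write an iterable of byte chunks to path and return the bytes written."""
    written = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            if chunk:  # skip empty chunks
                out.write(chunk)
                written += len(chunk)
    return written


def download_file(url, path, chunk_size=8192):
    """Stream url to path without holding the whole body in memory."""
    # Third-party dependency; imported here so save_chunks works without it.
    import requests

    with requests.get(url, stream=True, timeout=30) as response:
        response.raise_for_status()  # fail early on 4xx/5xx responses
        return save_chunks(response.iter_content(chunk_size=chunk_size), path)
```

Keeping the file-writing part separate from the network call makes it easy to unit test with a plain list of byte strings.
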
Then I started reading the gzipped file I had downloaded in the previous step and applying the initial algorithm to it.
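
Reading the compressed file directly can be sketched with the standard gzip module, as below. The helper name iter_lines and the choice to replace undecodable bytes instead of raising are assumptions for this example.

```python
import gzip


def iter_lines(path):
    """Yield decoded lines from a gzip-compressed text file, one at a time."""
    # "rt" makes gzip.open decompress and decode on the fly.
    # Assumption: undecodable bytes are replaced rather than raising an error.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            yield line.rstrip("\n")
```

Because this is a generator, the whole index never has to sit in memory at once.
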
And I hit a BUG as a result :) I described the issue here as well as how to fix it.
Then I took some time to fix the bug. The fix was pretty simple, so I applied it. To make sure this bug would be caught if it ever came back, I added a dedicated unit test for that particular use case.
After that I finished writing the get_packages_size() function, which does all the calculations, and wrote a unit test for it as well.
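
The exact calculation is not spelled out in this doc, so the following is only a guess at its shape. It assumes a package's "size" means the number of file entries that list it, and the name count_files_per_package is made up for the example.

```python
from collections import Counter


def count_files_per_package(lines):
    """Return a Counter mapping package name -> number of files listing it."""
    sizes = Counter()
    for line in lines:
        # Assumption: the package list is the last whitespace-separated field.
        parts = line.rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        sizes.update(parts[1].split(","))
    return sizes


print(count_files_per_package(["a/b x/p1,y/p2", "c d/e x/p1"]))
```
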
As the next step I coded how to actually get the top N biggest packages out of the calculated sizes of all packages. That was rather simple, as I just reused the ready-made function heapq.nlargest(), which does exactly what we need.
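
As a small illustration of heapq.nlargest() on a mapping of package name to size (the helper name top_packages and the numbers are made up for this example):

```python
import heapq


def top_packages(sizes, n=10):
    """Return the n (package, size) pairs with the largest size, biggest first."""
    return heapq.nlargest(n, sizes.items(), key=lambda item: item[1])


sizes = {"pkg-a": 120, "pkg-b": 4500, "pkg-c": 37, "pkg-d": 980}  # made-up numbers
print(top_packages(sizes, n=2))  # [('pkg-b', 4500), ('pkg-d', 980)]
```

For a small N this avoids sorting the full list of packages.
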
As the last step I created a separate command, package_statistics.py, which simply imports our class and calls the needed methods.
And finally I took a dive into Python string formatting to make our output look nice. Eventually I got that done too.
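
One possible way to line the output up with f-string format specifiers is sketched below. The layout (rank, left-aligned name, right-aligned size) and the name format_report are assumptions, not the project's actual output format.

```python
def format_report(rows):
    """Render (package, size) rows as an aligned, numbered table."""
    # Pad every name to the width of the longest one so the sizes line up.
    width = max((len(name) for name, _ in rows), default=0)
    lines = []
    for rank, (name, size) in enumerate(rows, start=1):
        lines.append(f"{rank:>3}. {name:<{width}}  {size:>8}")
    return "\n".join(lines)


print(format_report([("pkg-b", 4500), ("long-package-name", 980)]))
```
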

Overall this piece of work was done in about 24 hours, with a lot of context switching from one thing to another and trying to sneak in some time to work on it.

Ultimately I had a good time working on this project, and I'm interested in and open to any comments or criticism of it.
