
request for a smaller version of the combined.json #17

Closed
jannikmi opened this issue May 19, 2017 · 10 comments

@jannikmi

For some applications it might not be feasible to have a 120MB .json file as the data basis.
With certain simplifications and tricks it should theoretically be possible to compress the data to around 7MB (cf. the timezonefinderL data, which consists of simplified tz_world data).

Without going to those extremes, the question is how to reduce the data size while still keeping an acceptable level of accuracy.

@evansiroky
Owner

One of the primary motivations for this project is to have a very high level of accuracy of the boundaries. One can see a big improvement when comparing tz_world to timezone-boundary-builder. Thus, I am very hesitant to do any simplifications of the boundaries.

With respect to the various libraries that implement a lookup based on this data, it does seem that each of them has its own compression methodology. Some of them do go so far as simplifying geometries. In issue #11 a user mentioned that a lot of coordinates are excessively precise. I think reducing the precision to 5 or 6 decimal places could have a good effect on reducing the file size and is something that I can commit to pursuing.
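
For illustration only, a sketch of what such a precision reduction could look like in PHP (file names are placeholders; this is not something the project ships):

// Sketch: round every number in a parsed GeoJSON file to 6 decimal places.
// Properties such as "tzid" are strings and therefore pass through untouched.
function roundNumbers($value, int $decimals = 6)
{
    if (is_int($value) || is_float($value)) {
        return round($value, $decimals);
    }

    if (is_array($value)) {
        return array_map(fn ($item) => roundNumbers($item, $decimals), $value);
    }

    if ($value instanceof stdClass) {
        foreach ($value as $key => $item) {
            $value->$key = roundNumbers($item, $decimals);
        }
    }

    return $value;
}

$data = json_decode(file_get_contents('combined.json'));
file_put_contents('combined-6-decimals.json', json_encode(roundNumbers($data)));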

@jannikmi
Author

One would need to do some calculations on how many decimal places are required for a certain accuracy, but my actual suggestion is to compile a completely separate .json file of reduced size (while keeping the bigger, more accurate one).
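
For a back-of-the-envelope sense of those numbers: one degree of latitude is roughly 111 km, so truncating coordinates to d decimal places moves a point by at most about 111320 / 10^d metres (the longitude error shrinks away from the equator). A quick check in PHP:

// Worst-case position error (in metres) from truncating latitude to $decimals places.
function truncationErrorMeters(int $decimals): float
{
    return 111320 / (10 ** $decimals);
}

foreach ([4, 5, 6] as $decimals) {
    printf("%d decimals: ~%.2f m\n", $decimals, truncationErrorMeters($decimals));
}
// 4 decimals: ~11.13 m
// 5 decimals: ~1.11 m
// 6 decimals: ~0.11 m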

@evansiroky
Owner

As noted in my previous message, I believe that is a task best left to downstream users of the output data.

@eusonlito

@jannikmi I am working on a project that needs performance more than accuracy, since the timezones are stored in a database and the originals are very heavy. The process involves reducing the coordinates to 4 decimal places (an error of about 10 meters is acceptable, see https://gis.stackexchange.com/a/8674) and applying simplification methods in the database when inserting them as polygons (https://gis.stackexchange.com/a/428927).

@jannikmi
Author

Thanks for referencing this here. Perhaps it would be helpful for other users if you make your data compression code available once you're done!

@eusonlito

@jannikmi I'm using PHP and MySQL. PostgreSQL provides ST_SimplifyPolygonHull and MySQL provides ST_Simplify.
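
As a rough sketch of what the PostGIS route could look like (hypothetical; only the MySQL version shown below is actually in use, the 0.1 vertex fraction is purely illustrative, and ST_SimplifyPolygonHull requires PostGIS 3.3+):

// Hypothetical PostGIS variant of the helper shown below. ST_SimplifyPolygonHull
// keeps a given fraction of the vertices and, by default, returns an outer hull
// that still covers the original polygon.
function geomFromGeoJSONPostgres(stdClass $geometry, float $vertexFraction = 0.1): Expression
{
    $sql = sprintf(
        "ST_SimplifyPolygonHull(ST_GeomFromGeoJSON('%s'), %F)",
        json_encode($geometry),
        $vertexFraction
    );

    return DB::raw($sql);
}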

My code is similar to:

use Illuminate\Database\Query\Expression;
use Illuminate\Support\Facades\DB;

/**
 * Build a database expression that converts a GeoJSON geometry into a
 * (optionally simplified) MultiPolygon.
 *
 * @param \stdClass $geometry
 * @param float $simplify = 0
 *
 * @return \Illuminate\Database\Query\Expression
 */
function geomFromGeoJSON(stdClass $geometry, float $simplify = 0): Expression
{
    // Normalize plain Polygons to MultiPolygon so every row has the same geometry type
    if ($geometry->type !== 'MultiPolygon') {
        $geometry->type = 'MultiPolygon';
        $geometry->coordinates = [$geometry->coordinates];
    }

    $sql = sprintf("ST_GeomFromGeoJSON('%s', 2, 0)", json_encode($geometry));
    $sql = 'ST_Simplify('.$sql.', '.$simplify.')';

    return DB::raw($sql);
}

/**
 * @param \stdClass $zone
 *
 * @return void
 */
function zoneSave(stdClass $zone): void
{
    try {
        // First try to store a simplified version of the geometry
        zoneUpdateOrInsert($zone, 0.005);
    } catch (Throwable $e) {
        // Some shapes are too small to simplify; fall back to the unsimplified geometry
        zoneUpdateOrInsert($zone, 0);
    }
}

/**
 * @param \stdClass $zone
 * @param float $simplify
 *
 * @return void
 */
function zoneUpdateOrInsert(stdClass $zone, float $simplify): void
{
    // TimeZoneModel is the Eloquent model of the table that stores the zones
    TimeZoneModel::updateOrInsert(
        ['zone' => $zone->properties->tzid],
        ['geojson' => geomFromGeoJSON($zone->geometry, $simplify)]
    );
}

// Truncate all coordinates to 4 decimal places before decoding
$json = file_get_contents('combined-with-oceans.json');
$json = preg_replace('/([0-9]\.[0-9]{4})[0-9]+/', '$1', $json);

foreach (json_decode($json)->features as $zone) {
    zoneSave($zone);
}

Regards,
Lito.
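
For completeness, a hypothetical lookup against the resulting table could look like this (assuming TimeZoneModel maps to a table with the zone and geojson columns used above; the geometries were stored with SRID 0, so a plain POINT works as the probe):

// Hypothetical point-in-polygon lookup on the imported table (MySQL).
function zoneLookup(float $longitude, float $latitude): ?string
{
    return TimeZoneModel::query()
        ->whereRaw('ST_Contains(geojson, POINT(?, ?))', [$longitude, $latitude])
        ->value('zone');
}

// Example: San Francisco
// zoneLookup(-122.42, 37.77); // "America/Los_Angeles"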

@eusonlito

@jannikmi, as a final comment: with this import optimization of combined-with-oceans.json, the size of the timezone table is about 8MB.

@farfromrefug

@eusonlito I am interested in a smaller shape/GeoJSON too. I am looking for a very small timezone GeoJSON (it is only used in a weather app to get the timezone of a weather location).
I tried to simplify the GeoJSON with QGIS, but it produces "holes". That is expected, since each polygon is simplified separately.
[screenshot: simplified timezone polygons with gaps between neighbouring zones]

Can you explain a bit more how you simplified it? Did you clone this repo and modify some files?

@eusonlito

eusonlito commented Dec 12, 2024

@farfromrefug I'm not worried about holes :)

My code:

// Load the original JSON file from the repository as a string
$json = file_get_contents('combined-with-oceans.json');

// Truncate all float values with more than 4 decimals to 4 decimals
$json = preg_replace('/([0-9]\.[0-9]{4})[0-9]+/', '$1', $json);

// Try to insert a simplified version with a 0.005 simplification tolerance
zoneUpdateOrInsert($zone, 0.005);

// On error, insert the original geometry without simplification (some shapes are too small to simplify)
zoneUpdateOrInsert($zone, 0);

@farfromrefug

@eusonlito ok so you end up with something similar to mine :D
