Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assets sometimes fail to load from the asset cache. #2663

Closed
kwvanderlinde opened this issue May 19, 2021 · 7 comments · Fixed by #2835
Closed

Assets sometimes fail to load from the asset cache. #2663

kwvanderlinde opened this issue May 19, 2021 · 7 comments · Fixed by #2835
Assignees
Labels
bug tested This issue has been QA tested by someone other than the developer.

Comments

@kwvanderlinde
Copy link
Collaborator

Describe the bug
When adding many new tokens using assets from the resource library, it sometimes happens that the asset is not fully read and results in MapTools freezing.

To Reproduce
Unfortunately I was not able to reproduce this without 5e Framework since I'm not sure exactly what it's doing to load the assets.

Steps to reproduce the behavior:

  1. Clean MapTool installation
  2. Downloads Rod's 5e Framework
  3. Install the image files from https://get.5e.tools/.
  4. Open up the Creature List and clicking "Make Token" for several different creatures.
  5. After MapTool freezes, force quit.
  6. Load up the campaign again and try to add the same creature that resulted in a freeze. This time it will work.

Expected behavior
MapTool should load the asset without freezing.

MapTool Info

  • Version: 1.8.3, reproduced on latest develop (7664101).
  • Install: New

Desktop (please complete the following information):

  • OS: Linux Mint
  • Version 20

Additional context
I'm not sure if MapTool itself is freezing, or if the framework is somehow holding it up. I also don't know how to get the logs from a frozen MapTool process, so I reproduced the issue with the latest on develop and got these logs right when the freeze occurred:

14:49:11.392 (AssetManager.java:454) [pool-4-thread-1] ERROR net.rptools.maptool.model.AssetManager - MD5 for asset Camel corrupted
14:49:11.397 (ImageManager.java:428) [pool-8-thread-1] ERROR net.rptools.maptool.util.ImageManager - BackgroundImageLoader.run(Camel,png, d453500b4f82966f1d166b65acaa1c12): image not resolved
javax.imageio.IIOException: Error reading PNG image data
	at com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1523) ~[?:?]
	at com.sun.imageio.plugins.png.PNGImageReader.read(PNGImageReader.java:1840) ~[?:?]
	at javax.imageio.ImageIO.read(ImageIO.java:1468) ~[?:?]
	at javax.imageio.ImageIO.read(ImageIO.java:1363) ~[?:?]
	at net.rptools.lib.image.ImageUtil.bytesToImage(ImageUtil.java:302) ~[main/:?]
	at net.rptools.maptool.util.ImageManager$BackgroundImageLoader.run(ImageManager.java:424) [main/:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:831) [?:?]
Caused by: java.io.EOFException: Unexpected end of ZLIB input stream
	at java.util.zip.InflaterInputStream.fill(InflaterInputStream.java:244) ~[?:?]
	at java.util.zip.InflaterInputStream.read(InflaterInputStream.java:158) ~[?:?]
	at java.io.BufferedInputStream.fill(BufferedInputStream.java:244) ~[?:?]
	at java.io.BufferedInputStream.read1(BufferedInputStream.java:284) ~[?:?]
	at java.io.BufferedInputStream.read(BufferedInputStream.java:343) ~[?:?]
	at java.io.DataInputStream.readFully(DataInputStream.java:201) ~[?:?]
	at com.sun.imageio.plugins.png.PNGImageReader.decodePass(PNGImageReader.java:1245) ~[?:?]
	at com.sun.imageio.plugins.png.PNGImageReader.decodeImage(PNGImageReader.java:1381) ~[?:?]
	at com.sun.imageio.plugins.png.PNGImageReader.readImage(PNGImageReader.java:1514) ~[?:?]
	... 8 more

I did some digging and added some additional logging to AssetManager, and what I found is that this call to FileUtils.readFileToByteArray() does not return all the data for the asset. This makes the MD5 to be incorrect and causes the inflation to fail.

@kwvanderlinde
Copy link
Collaborator Author

I did some digging in the code, and I have a theory as to why this is happening. It looks like there is a race condition in AssetManager that allows a partially written asset files to be read back.

These are a few culprits here:

  1. When creating the cached file, the file is written to on its own thread, and there doesn't appear to be any way of indicating that whether all the data has been written.
  2. When reading the cached file, the file is considered to be in the cache if it exists and has a non-zero length.
  3. When reading the cached file, the asset is used regardless of whether the MD5 matches. An error is logged, but otherwise this mismatch is not acted upon.

It seems to me that all three of these points could be made more robust, and fixing any one of them should fix the underlying issue. E.g.,

  1. We could write the asset to a temporary file (e.g., <md5 hash>.tmp or something) and then rename it once the write has completed. That way, if the file exists with the right name, it should have all the data in it. If the write hasn't completed, or even if it failed part way through, the asset file won't exist and can't be read.
  2. If we could know the expected length of the file, we could compare against that instead of comparing against zero. This would ensure that an asset is only considered to be in the cache if its file has the correct amount of data. This would also suggest that the write has actually completed (barring external factors such as tampering with the files, of course). Alternatively, the asset could be considered to be in the cache whenever its file exists, regardless of size, with corruption being handled purely by the MD5 check.
  3. If the MD5 does not match the expectation, then the asset is not considered successfully loaded and the corrupted asset is not associated with the MD5 in the asset map. Perhaps the file could even be ejected from the cache in this case.

@Phergus
Copy link
Contributor

Phergus commented Jun 15, 2021

I suspect what you have found here explains many of the cases where players end up with dreaded red X for tokens/maps.

It seems like your #1 suggestion will fix most cases and is a common way of avoiding these problems. Adding the MD5 check in #3 will allow us to identify corrupt images in the cache which would allow us to flush it out of the cache and, for player clients, request it again from the server.

@Phergus
Copy link
Contributor

Phergus commented Jul 3, 2021

Testing. This appears to have created a problem with saving new images to the assetcache.

Do any of the following:

  • Create New Map and add map image that has not been loaded before and thus not in cache.
  • Import DungeonDraft UVTT file.
  • Drop new image as Token.

These all produce exceptions like this:

2021-07-03 15:03:17.292 ERROR net.rptools.maptool.model.AssetManager - Could not persist asset while writing image data
java.nio.file.FileSystemException: C:\Users\Phil\.maptool-rptools\assetcache\tmp.10391504194488045576 -> C:\Users\Phil\.maptool-rptools\assetcache\e3b4e8912980831edd7d0cc514c5ee17: The process cannot access the file because it is being used by another process
	at sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) ~[?:?]
	at sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) ~[?:?]
	at sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:309) ~[?:?]
	at sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293) ~[?:?]
	at java.nio.file.Files.move(Files.java:1426) ~[?:?]
	at net.rptools.maptool.model.AssetManager$AssetWriteRenameOperation.commit(AssetManager.java:850) ~[main/:?]
	at net.rptools.maptool.model.AssetManager.lambda$putInPersistentCache$3(AssetManager.java:544) ~[main/:?]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) [?:?]
	at java.util.concurrent.FutureTask.run(FutureTask.java:264) [?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130) [?:?]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:630) [?:?]
	at java.lang.Thread.run(Thread.java:831) [?:?]

Only the .info file is added to the assetcache. Fortunately the assets are saved with the campaign or exported map files.

@Phergus Phergus reopened this Jul 3, 2021
@kwvanderlinde
Copy link
Collaborator Author

I have a feeling this may be a windows-specific thing, though I don't see why. I'll see if I can reproduce it.

@Phergus
Copy link
Contributor

Phergus commented Jul 3, 2021

Could be given where the error occurs. The exception occurs 3 or 4 times depending on image size it seems.

@kwvanderlinde
Copy link
Collaborator Author

Ah, this may be as simple as an incorrect closing order for streams. Will test later and post a PR if that's the case.

@Phergus
Copy link
Contributor

Phergus commented Jul 5, 2021

Re-tested. New assets are now being written to the asset cache and with no exceptions.

@Phergus Phergus added the tested This issue has been QA tested by someone other than the developer. label Jul 5, 2021
@Phergus Phergus closed this as completed Jul 13, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug tested This issue has been QA tested by someone other than the developer.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants