U4-7513 - Invalid EXIF data in image causes incorrect size values in media item

Created by Gordon Saxby 09 Dec 2015, 08:35:54 Updated by Pete Duncanson 04 Mar 2018, 21:08:38

Some images were created in Photoshop and passed on to users. The users then resized (reduced) the images, mainly in Paint.NET but, I believe, in other packages as well.

When these images are uploaded to the media library, incorrect width and height values are recorded. The values are extracted from the EXIF data, which was correct when the original images were created but is now wrong, because the images have since been edited with software that does not update the EXIF data.

The width and height information is being taken from the media item and used in the IMG tag, so the images are being displayed larger than 100%.

6 Attachments

Comments

Sebastiaan Janssen 09 Dec 2015, 08:57:24

Do you have an example image please?


Gordon Saxby 09 Dec 2015, 09:10:27

The original and resized images attached.


Sebastiaan Janssen 09 Dec 2015, 09:24:57

Got it and can reproduce with the small image (umbracoHeight and umbracoWidth are indeed set to the wrong values).

Unfortunately, not using the EXIF data will also have a performance impact, see: https://github.com/umbraco/Umbraco-CMS/blob/d50e49ad37fd5ca7bad2fd6e8fc994f3408ae70c/src/Umbraco.Core/Media/ImageHelper.cs#L25

Not sure what to do about this. And, to be ahead of this suggestion: please no, adding a setting to disable EXIF processing somewhere is not the answer. ;-)


Gordon Saxby 09 Dec 2015, 10:15:00

Yes, I agree it is a tricky issue! The problem here is that Paint.Net is used throughout the organisation by web authors who upload their own images. The image example I supplied is only one of many! We may be able to find a fix / workaround for the authors but it still leaves the issue that invalid EXIF data will stuff up the media library entry!


Sebastiaan Janssen 09 Dec 2015, 10:46:05

Forgot to add: as an immediate fix, you could for now implement an AfterSave event handler that DOES execute the correct code and updates the umbracoHeight and umbracoWidth properties.


Gordon Saxby 09 Dec 2015, 10:56:07

Yes, I have considered / suggested that. For now, there is a way of using Paint.Net to remove the EXIF data and therefore the issue (basically, copy image and paste into a new file, then resize & save). Of course, that does rely on the authors doing it the correct way, or in fact doing it at all!


Sentient 14 Jan 2016, 20:48:09

Any traction on this one so far? Several clients have images whose EXIF image width and image height metadata differ from the plain image width and height metadata. (I used exiftool for inspection. Fixing this would require reprocessing a large number of images, and the client or members of the public cannot be expected to notice that their images have incorrect EXIF dimensions.) Why not just get Umbraco to read the width and height instead of the EXIF dimensions when an image type is created? It is known that the EXIF dimensions are not trustworthy, and there are several ways they can get out of sync.


Douglas Robar 16 Mar 2017, 16:25:28

I agree that for speed and memory reasons exif data should be consulted if possible. Yet when there is inconsistency between the embedded data and the physical file... it happens.

Perhaps take the one-time hit to interrogate the file at the time it is saved? It needn't take a massive amount of processing time or memory to read the file headers.

http://stackoverflow.com/questions/111345/getting-image-dimensions-without-reading-the-entire-file has two relevant answers and a decent comment discussion. The obvious approach is to grab the relevant first bytes of the file (which you probably do similarly in umbraco.core/media/exif, though I didn't confirm that). A second helpful point, further down, concerns the dramatic performance improvement from passing validateImageData: false if you do read the file in its entirety. I can confirm from my own work that this is an enormous saving.

Image.FromStream(stream: file, useEmbeddedColorManagement: false, validateImageData: false)
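To make the header-only idea above concrete, here is a minimal sketch (in Python purely to keep it self-contained; it is not Umbraco's code and not taken from the linked answers). It walks the JPEG segment markers and reads the dimensions from the first start-of-frame (SOF) segment. The name jpeg_dimensions and the synthetic sample bytes are hypothetical, and the sketch assumes a reasonably well-formed JPEG stream.

```python
import struct
from io import BytesIO

# Start-of-frame markers (SOF0-SOF15, minus DHT 0xC4, JPG 0xC8 and DAC 0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers that stand alone, with no length field: TEM, RST0-RST7, SOI, EOI.
STANDALONE = {0x01} | set(range(0xD0, 0xDA))


def jpeg_dimensions(stream):
    """Return (width, height) from the first SOF segment of a JPEG stream."""
    if stream.read(2) != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    while True:
        byte = stream.read(1)
        if not byte:
            raise ValueError("end of stream before any SOF segment")
        if byte != b"\xff":
            continue  # resynchronise on the next marker prefix
        marker = stream.read(1)
        while marker == b"\xff":  # skip optional fill bytes
            marker = stream.read(1)
        if not marker:
            raise ValueError("end of stream before any SOF segment")
        code = marker[0]
        if code in STANDALONE:
            continue
        if code == 0xDA:  # SOS: compressed scan data starts here
            raise ValueError("scan data reached before any SOF segment")
        header = stream.read(2)
        if len(header) < 2:
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", header)
        if length < 2:
            raise ValueError("invalid segment length")
        if code in SOF_MARKERS:
            frame = stream.read(5)
            if len(frame) < 5:
                raise ValueError("truncated SOF segment")
            # SOF payload starts: precision (1 byte), height (2), width (2).
            _precision, height, width = struct.unpack(">BHH", frame)
            return width, height
        # Any other segment (including APP1/EXIF) is skipped by its length.
        stream.read(length - 2)


def _segment(code, payload):
    """Build one marker segment; the length field counts itself (2 bytes)."""
    return b"\xff" + bytes([code]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic header: an APP1 block holding misleading bytes, then a real SOF0.
stale_exif = _segment(0xE1, b"Exif\x00\x00" + b"\xff\xc0 stale dimensions live here")
frame = _segment(0xC0, struct.pack(">BHHB", 8, 656, 1680, 3) + bytes(9))
sample = b"\xff\xd8" + stale_exif + frame
print(jpeg_dimensions(BytesIO(sample)))
```

Because the APP1 segment is skipped by its declared length, neither the EXIF dimension tags nor an embedded thumbnail are ever consulted. Only a few segment headers are read rather than the decoded image, which is why this kind of check need not cost much at save time.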

I've shown basic screenshots of the exif data of a sample image as seen through the Info panel of a Mac's Preview application.

Also, using ExifTool (exiftool -a -u -g1 7.jpg) we can see that the width is reported in three places with three different values. Only the file details are correct for this resized image.

---- File ----
Image Width : 1680
---- IFD0 ----
Image Width : 4992
---- ExifIFD ----
Exif Image Width : 5616

---- ExifTool ----
ExifTool Version Number : 10.46
---- System ----
File Name : 7.jpg
Directory : .
File Size : 981 kB
File Modification Date/Time : 2017:03:16 14:18:11+00:00
File Access Date/Time : 2017:03:16 15:50:57+00:00
File Inode Change Date/Time : 2017:03:16 14:18:49+00:00
File Permissions : rw-r--r--
---- File ----
File Type : JPEG
File Type Extension : jpg
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Current IPTC Digest : 0c59cc6dc047d98fcba24ee28eb0021d
Image Width : 1680
Image Height : 656
Encoding Process : Baseline DCT, Huffman coding
Bits Per Sample : 8
Color Components : 3
Y Cb Cr Sub Sampling : YCbCr4:2:0 (2 2)
---- JFIF ----
JFIF Version : 1.01
Resolution Unit : inches
X Resolution : 72
Y Resolution : 72
---- IFD0 ----
Image Width : 4992
Image Height : 3328
Bits Per Sample : 16 16 16
Compression : Uncompressed
Photometric Interpretation : RGB
Make : Canon
Camera Model Name : Canon EOS-1D Mark IV
Orientation : Horizontal (normal)
Samples Per Pixel : 3
X Resolution : 72
Y Resolution : 72
Planar Configuration : Chunky
Resolution Unit : inches
Software : Adobe Photoshop CC 2017 (Windows)
Modify Date : 2017:02:10 15:55:10
Artist : Anthony Terrot
White Point : 0.313 0.329
Primary Chromaticities : 0.64 0.33 0.21 0.71 0.15 0.06
Copyright : terrots@yahoo.com tel:+973.39669671
---- ExifIFD ----
Exposure Time : 1/320
F Number : 11.0
ISO : 200
Exif Version : 0221
Date/Time Original : 2010:05:25 11:25:45
Create Date : 2010:05:25 11:25:45
Shutter Speed Value : 1/332
Aperture Value : 11.3
Exposure Compensation : 0
Flash : No Flash
Focal Length : 24.0 mm
User Comment :
Flashpix Version : 0100
Color Space : Uncalibrated
Exif Image Width : 5616
Exif Image Height : 2194
Focal Plane X Resolution : 3795.348877
Focal Plane Y Resolution : 3904.306152
Focal Plane Resolution Unit : inches
Custom Rendered : Normal
Exposure Mode : Auto
White Balance : Auto
Scene Capture Type : Standard
Gamma : 2.2
---- IFD1 ----
Compression : JPEG (old-style)
X Resolution : 96
Y Resolution : 96
Resolution Unit : inches
Thumbnail Offset : 1270
Thumbnail Length : 3199
Thumbnail Image : (Binary data 3199 bytes, use -b option to extract)
---- IPTC ----
Coded Character Set : UTF8
Coded Character Set : UTF8
Application Record Version : 2
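For anyone who needs to audit a batch of images for this kind of mismatch, the grouped output above can be checked mechanically. The following is only a rough sketch: the helper names parse_grouped and reported_widths are hypothetical, the sample text is trimmed to the three width lines from this report, and the exact column padding of real exiftool output is an assumption (the parser strips whitespace, so it should not matter).

```python
import re

# Trimmed sample in the style of `exiftool -a -u -g1`, using the three
# width values reported for 7.jpg in this thread.
SAMPLE = """\
---- File ----
Image Width                     : 1680
---- IFD0 ----
Image Width                     : 4992
---- ExifIFD ----
Exif Image Width                : 5616
"""


def parse_grouped(text):
    """Parse grouped exiftool-style text into {group: {tag: value}}."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = re.fullmatch(r"-{4} (.+) -{4}", line)
        if header:
            current = groups.setdefault(header.group(1), {})
        elif current is not None and ":" in line:
            # Split on the first colon only; values such as dates contain more.
            name, _, value = line.partition(":")
            current[name.strip()] = value.strip()
    return groups


def reported_widths(groups):
    """Collect every '... Image Width' tag as {group: width}."""
    widths = {}
    for group, tags in groups.items():
        for name, value in tags.items():
            if name.endswith("Image Width") and value.isdigit():
                widths[group] = int(value)
    return widths


widths = reported_widths(parse_grouped(SAMPLE))
disagree = len(set(widths.values())) > 1
print(widths, "MISMATCH" if disagree else "consistent")
```

In this sample the File group carries the dimensions of the actual pixel data, while IFD0 and ExifIFD hold values left over from earlier versions of the image.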

I hope a solution can be found because users only know "something is wrong" but are not in a position to understand it is bogus data embedded in their images.


Pete Duncanson 04 Mar 2018, 21:08:38

Is reading from EXIF, as mentioned in the code snippet from @sebastiaan, an optimisation that we don't really need? How often are we reading that data? Is it a hit worth taking to ask the file directly and know you're right?


Priority: Major

Type: Bug

State: Open

Assignee:

Difficulty: Normal

Category:

Backwards Compatible: True

Fix Submitted:

Affected versions: 7.2.8, 7.3.4

Due in version:

Sprint:

Story Points:

Cycle: