When I dropped the kids off at school today, one of the teachers commented to another “Oh, you must see that video of them singing on the bike, it’s hilarious”. She grinned and looked at me expectantly.
OK. I’ve got my entire catalogue on Mylio. I opened it, hit search, and typed “jingle bells bike”. Nothing.
And that was it – the moment was gone. I would have loved to share it with them, but I couldn’t.
And this is the point. If I’d had PictureLife, I probably could have found the video and streamed it pretty quickly. With Mylio – no; even if hosted in their cloud (which is not really what it’s for), it would still need to download the video first, by which time you’ve lost the moment.
Dropbox – as long as the file had that in the name, it probably could have found it immediately. Once found, a video can be played with a click, even skipped through, and it keeps playing within a few seconds. Speed is a priority.
Don’t get me wrong, Mylio is great – I just searched for “bike” and it showed me 80 photos immediately which I can scroll through chronologically with absolutely zero lag (PictureLife became dog-slow towards the end). But all the videos are white spaces without thumbnails (not sure what happened there), and even if I found and played one, it would need a full cloud subscription and would have to download the whole thing before it started playing.
Dropbox is simple, it works, they’re not mining the photos (that we know of / yet), and it streams media easily.
So, I thought I’d try Flickr as a photo hosting site, as a trial run. Since I want to make a little effort to retain privacy through obscurity (<g>), I tried creating a new account with no links to my own ID. But Yahoo require a valid phone number.
No problem, I thought – I’ll use the Burner app to create a temporary phone number to receive the verification code.
No dice. Yahoo reject VOIP numbers from apps like Burner. I’ll have to use a valid mobile – although I do have a UK one that I bought in a vending machine with no ID, and topped up with another credit card.
I’m sure government services could easily make the link, but that’s not what I’m trying to do – I’m just trying to make things a little harder for advertising and identity trading services should Yahoo sell out to someone who then tries to mine my photos for information and link it back to my identity.
I’m currently running a large reorganisation of my 50,000-odd photos on my home server / NAS using Picasa and Mylio. While doing so, I realised one argument for using XMP rather than EXIF data.
As a small recap – both allow you to store metadata for photos, such as keywords, 5-star ratings, geolocation, and so on.
- EXIF is inline in the JPEG file, and hence you only need the JPEG file itself – if you change the EXIF data, it basically changes those bytes in the file or adds/removes that data. There are various versions of the EXIF standard, and there appear to be tens or hundreds of fields available.
- XMP, as I’m using it here, is a ‘sidecar’ file – it sits in the same directory, with the same name, but with an .XMP extension (XMP data can also be embedded in the image itself, but the sidecar form is what matters for this argument). For example, you might have a file from your iPhone called IMG_1234.jpg. With this method, the EXIF data in that JPEG is untouched, but instead an XMP file is written called IMG_1234.xmp, and some editing software such as Lightroom or Mylio knows to look at the XMP file and apply the data it finds there. XMP is extensible – you can have almost as many fields as you need.
I can see arguments for and against each – EXIF is in the file, so as long as you have the JPEG, the data travels with it. This can be important if, say, uploading photos to Facebook or other photo sharing sites that wouldn’t recognise the XMP file.
However, XMP has the advantage of never changing the JPEG file itself, so you can make image edits that are only applied to the JPEG at view-time. This means you can make multiple edits to the JPEG file, and it’s not repeatedly de/re-coded and re-written, decreasing quality each time you save. They’re also very useful for .RAW files, which vary in format and don’t have the same EXIF fields as JPEG files.
However – the thing that struck me, was when I applied 15,000 facial recognition edits from Picasa to JPEG files after a couple of days of training and processing. Picasa dutifully wrote the name tags to each JPEG file – and in so doing, changed the checksum for each one, even if only by adding a few bytes. My cloud backup software Crashplan picked this up, and immediately wanted to back up the 15,000 files all over again.
This is hundreds of Gigabytes! Of course, if I want to keep those tags, I’ll have to do it too. And again every time I add more tags – if I add album names and/or keywords at a later date – it will have to re-upload the lot each time! Backup software does often offer differential backup of only the changed bytes, but this is generally meant for a small section of one huge file being changed, not a small section of thousands of small files, and would likely not help.
If I had written them to XMP instead, then the JPEG files themselves wouldn’t have changed, and Crashplan would only have uploaded the new or changed XMP files. These would be far smaller, and far easier to sync. Hence I will be doing this in the future.
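To make the checksum point concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the "photo" is a handful of stand-in bytes rather than a real JPEG, the sidecar content is a placeholder rather than valid XMP, and SHA-256 stands in for whatever change detection the backup software actually uses.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def demo() -> dict:
    """Compare the photo's checksum after a sidecar write and an inline write."""
    with tempfile.TemporaryDirectory() as tmp:
        photo = Path(tmp) / "IMG_1234.jpg"
        # Stand-in bytes, not a real JPEG: only the byte content matters here.
        photo.write_bytes(b"\xff\xd8" + b"pixel data" * 100 + b"\xff\xd9")
        before = sha256_of(photo)

        # Sidecar approach: the tag goes into a separate, tiny file.
        # (Placeholder text, not a valid XMP packet.)
        sidecar = photo.with_suffix(".xmp")
        sidecar.write_text("person: Alice")
        after_sidecar = sha256_of(photo)

        # Inline approach: simulate a tool adding a few bytes to the photo itself.
        photo.write_bytes(photo.read_bytes() + b"person: Alice")
        after_inline = sha256_of(photo)

    return {
        "sidecar_changed": before != after_sidecar,
        "inline_changed": before != after_inline,
    }


if __name__ == "__main__":
    print(demo())
```

With the sidecar, the photo's digest stays the same and only the small new file would need uploading; with the inline write, the photo itself looks like a changed file.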
It’s at least worth writing ongoing edits to XMP files, and if you ever decide that your changes are ‘done’, you can perhaps write and ‘flatten’ those changes into the original JPG and EXIF data at that point for archival storage.
Following the near-demise of PictureLife, I keep finding myself, almost weekly, trying to find a replacement, but none seem to offer the critical data export or portability that I need. I’ve learnt a harsh lesson from trusting a service that stored my data in a semi-proprietary manner – I chose PictureLife because I could export my photos as zipfiles with all the metadata, but then neglected to do so, and so found I only had the S3 bucket which did have all my photos, but without the metadata I had created to organise them.
So – I’ve spent another two hours on a Saturday evening looking for a service that allows me to store my photos in the cloud, organise them as I see fit, share them to friends and family, yet still retain full control of them, including exporting with metadata at any time.
And there is nothing out there.
Well, there’s lots out there, but none of them gives you full control. For example:
- Google Photos is by far the best service, practically for free, but in sharing all your data with Google, you’re exposing every place, person, and memorable thing you’ve done in your entire life, to a megacorp and government who’ll be up to who-knows-what around about the time your kids are trying to get a job. (“2035: President Trump II has decreed that all families photographed in the past with Muslims or Mexicans will be declined permanent visas to the USA”). Think this is irrational? China are doing it already. Also – with Google Photos, it’s still tricky to get your data out in a structured manner.
- Flickr also offers free storage, but again, bulk download is a bit of an unknown. You can bulk download natively, but possibly without any metadata since added. There are various client-based apps to download photos with metadata, but again, how long will they be around? Will Flickr change its APIs? If they fail to maintain tag access, then all your metadata is locked up forever.
- Shoebox is a close contender to PictureLife – everything is fully managed; you can share; the photo storage is free, and you pay for storing large quantities of video (which I do have). But the bulk export is simply a single button – once the zip is generated, you get a link to export all your photos. If you accumulate hundreds of Gigabytes, it’s not something you want to do every month as a precaution (and you should do this at least every month, considering the story of Everpix, PictureLife, Loom, etc).
- Everalbum is another new cloud storage photo manager. How do you export your photos? Well, from iOS, you can do ten at a time. From web – one at a time. Bulk export is ‘in the future’, says the help topic from December 2015.
OK – so, let’s admit that we can’t trust third-party photo hosting companies with our photos. What about major cloud storage providers? Surely if their entire business is storing data (Hello Dropbox!), then we can simply retain the data on our computer and sync it file-for-file with that cloud storage service. We presume they don’t go away – and if they do, we still have an original copy ourselves, no worries!
So, we need either a good photo album front-end for that provider, or otherwise, another third party (fourth party?) who can run a thin front-end on top of that cloud storage to make all our photos beautiful, editable, shareable and accessible.
Well, of course, that’s what PictureLife was: they used Amazon S3 for the storage, for which you could use your own S3 bucket, and then offered the timeline, memories, geoview, Aviary editor, and all sorts of other excellent tools, while you retained ownership of the photos. Unfortunately, in order for them to operate such a large multi-user system, all the metadata and organisation was in their system and inaccessible if the lights suddenly went out – as they did.
So what else is there?
- Dropbox is the obvious contender – your photos are right there on your drive, and the same in the cloud. Sharing is easy, as almost everyone has Dropbox, and photos and videos appear and stream effortlessly. But their organisation is basic, and they’ve recently discontinued their photo management frontend, Carousel.
- Microsoft’s OneDrive does offer file-and-folder-based storage while also offering Google Photos-level organisation, galleries, sharing, searching, etc. as a front-end. It’s good, and again, most people will have an MS account, but again, you wonder how much Microsoft will be peering into your metadata in the future.
- Unbound is one of those ‘fourth-party’ apps – a simple mobile app that reads from Dropbox, but gives you gallery and calendar views. It’s an entirely client-based app, rather than a service, which means you get thumbnail caching and other features on your phone, and it’s more private than trusting a ‘broker’ sorting your photos for you, but it suffers from the limitations of this, such as not offering memories, or a geoview with all your photos on a zoomable map.
- CloudGallery is another front-end app over cloud storage, working with Dropbox, Google and Flickr, and offering features based on the APIs of each. Again, if you’re using Dropbox, it turns out that you can’t use features like Timeline or filter because Dropbox doesn’t actually make much photo EXIF metadata available through its API, which rather scuppers the ‘front end app’ requirement entirely. There are also no features for tagging or rating.
So, in effect, there’s not much. Many are 80% there, and PictureLife was 95% there, but none are all the way. There does seem to be a gap, something The Verge acknowledges itself:
“It’s a strange time for photo storage. It’s never been more important, and yet even the biggest consumer internet companies barely seem to be paying attention. On one hand, I can hardly blame them — it’s a proposition that has proven singularly unprofitable. And yet I still can’t believe there isn’t a billion-dollar business to be built out of our collective need to remember.”
At the same time, Everpix, the great original ‘photo management, storage and enjoyment start-up… that failed’ admitted themselves that the business model is hard, as The Verge reported:
“And while the product wasn’t particularly difficult to use, it [….] required a commitment to entrust an unknown startup with your life’s memories — a hard sell that Everpix never got around to making much easier.”
It’s so true.
And so, you wonder which business model would work. Probably in this day and age of Google Photos, OneDrive, and iPhoto, there is none – the masses will go for the free/integrated option, and sadly, the people who care about things like privacy and portability don’t comprise a large enough market segment to make a business viable. Mylio is my current favourite, and walks this curious line between a selling point as an ‘easier, better sync tool than Lightroom for serious photographers’ while its marketing is still the consumer-land ‘making your memories accessible and safe’. Mylio is incidentally awesome, except of course it has no significant cloud hosting, so no sharing, social, or streaming videos.
Perhaps there’s space for a ‘fourth party’ service in the same way Boxcryptor works with Dropbox and others – you pay for Dropbox, and then you pay Boxcryptor to act as a software and cloud service layer to encrypt your sensitive files. Similarly for photos, you could pay for the cloud storage service, perhaps one you’re using day-to-day already – and then pay a smaller monthly fee to a separate cloud-service provider who acts as the front-end to your photos, providing the collection, sharing, access, etc, but whom you are not trusting with your photos’ storage.
Of course, this was largely what PictureLife were, on Amazon S3. With that gone, perhaps this could take the form of an enhancement to Mylio Cloud for Mylio, where happy customers pay an additional fee for Mylio to broker access to data stored in another cloud service; many of the parts are there.
Footnote: It’s interesting to see that The Verge makes similar comments, along with an excellent review and table of the contenders, these being the usual suspects plus PictureLife, which they sagely noted was a ‘cautious’ prospect following its acquisition by StreamNation. Their preference was for Google+, ironically three months before it was shut down and transferred to Google Photos, or PictureLife, also now effectively defunct.
I recently wanted to buy a microSD card for my new Xiaomi Yi. Given the plethora of fake SD cards out there, I decided to buy from Light In The Box rather than eBay, so that I was purchasing from a single retailer with a reputation to uphold.
Well, evidently they will have challenges with that.
Of the three items I bought, none worked out.
- The ‘Sandisk’ Class 10 / UHS-I 64GB SD card seemed not to work in my Yi, beeping and stopping after a few seconds. I suspected it was not keeping up with the write speed of the Yi, which being the older model, should be fine with even a Class 6 (6MB/s), let alone a Class 10. So I tested it on my laptop, and found an average read speed of 3MB/s, compared to 13.7MB/s for my ‘true’ Class 10 SD card.
- The $5 keyring torch I bought had a ‘bonus’ laser pointer mode that wasn’t listed at all on the description. I wouldn’t have minded if I hadn’t bought it as a nighttime light for my toddler. Lasers and toddlers don’t mix, unless you really want detached retinas all round.
- The ‘warm light’ LED bulbs I bought were as clinically white as it’s possible for lights to be, and contrast horribly with my halogen bulbs
I guess Light In The Box is perhaps not more reliable than eBay in these circumstances.
EDIT: Well, they gave me a refund with no need to return the card, but only after 4 rounds of to/fro where they sent me a form email requesting a photo showing the damage. After I managed to convince them a photo would not reveal the card’s performance, they refunded immediately to my Paypal account.
[Note: Use this information at your own risk. I am not responsible if you lose any valuable photos or other assets as a result of following advice or using information in this blog]
There’s something none of us do enough in life – contingency planning. Not nearly enough of us write wills, buy insurance, or do all the other things that are going to save us when the do-do hits the wall.
A good example of this, is failing to test your fallback plan when your photo-hosting service suddenly goes away.
I’ve been posting about Picturelife, a (formerly) excellent Picture-backup/hosting service that I have been using from April 2014 to April 2016. Given that our children were born during that time, you can imagine that we have a huge collection of incredibly important videos and photos in Picturelife. Which is why it upset quite a few people when the lights went out earlier this year, and it’s been struggling ever since.
The big issue right now, is that customers can’t get their photos out of Picturelife; if you have two years’ family photos there, then tough – they’re staying there. Hopefully. Luckily, I had foreseen something like this, and chosen to self-host the photos on my own S3 bucket, meaning I had access at all times. I also manually synced this to my home NAS, in case something nasty happened, like someone deleting all self-hosted photos via the Picturelife access key.
I had also tested an export from Picturelife in the early days to check that all the metadata I had entered – album names, people names, dates, etc – were indeed contained in the metadata of the exported photos as they claimed it would be, so that I would be able to port my data away from them when it all went wrong and retain all the data I had added. And it was.
I also assumed that the data in the raw S3 bucket would contain all that metadata. And I meant to, one day, check that I would be able to use all the photos from that S3 backup, if Picturelife went away.
Analysing the S3 files
So, while I wait to see whether PictureLife’s remaining team will ever be able to stand the picture export function back up (more on that later), I’ve been analysing what is, and isn’t, in that S3 data store synced to my NAS.
Well, for a start, there is a lot of data. 75GB, and 23,000 files
So, how is this data all stored?
Well, unsurprisingly, all files are stored using hashes for filenames; hashes generally are chosen as a unique reference to a file, which won’t clash with any other file, even owned by another user. These can lead to, for example, unique sharing URLs at http://picturelife.com, without any credentials.
Looking at these files, it admittedly seems pretty random.
- Some have JPG extensions, but many don’t – even though they are still JPGs, and can be opened as such.
- The same is true for other file types – GIFs, RAW, MP4… some have the correct extension, but many do not
- Some are stored with multiple resolutions – these have the same name with _O, _1000, etc. suffixes
- Videos are stored with multiple resolutions, and also with a thumbnail image and an audio file (which is the audio of the track), all with the same name
- All of my images and some videos were stored in one random/hash directory name. There’s also another folder called “video”, which contains only videos (with all the different resolutions, thumbnails and audio)
- Some files have XMP sidecar files – but only a very few (maybe those that were edited?). I had 118 XMP files out of all of mine
- There are no other metadata or index files anywhere that might describe these files
- The EXIF data in the images appear to be intact – whether for all of them, I don’t know. This means in a photo viewer, they appear in the right date in the calendar and the right location from GPS coordinates
- The EXIF or other data in the videos does not appear to be intact – all have a creation date within a few days last year, probably of the date I synced them to my NAS
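Since so many of these files lack a usable extension, one way to sort them is to look at the first few bytes of each file instead. The sketch below is a rough illustration, not a complete detector: it only knows the JPEG, GIF and PNG signatures and the `ftyp` marker used by the MP4/MOV family, and it reports everything else (including RAW formats, which vary by camera maker) as unknown. The function names are my own.

```python
from pathlib import Path


def sniff_type(header: bytes) -> str:
    """Guess a file type from its leading bytes (a few common formats only)."""
    if header.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "gif"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    # MP4/MOV-style containers carry the marker 'ftyp' at byte offset 4.
    if header[4:8] == b"ftyp":
        return "mp4-family"
    return "unknown"


def sniff_file(path: Path) -> str:
    """Read just enough of a file to guess its type."""
    with path.open("rb") as fh:
        return sniff_type(fh.read(16))


def summarise(root: Path) -> dict:
    """Count guessed types for every file under root, whatever its extension."""
    counts: dict = {}
    for path in root.rglob("*"):
        if path.is_file():
            kind = sniff_file(path)
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Running `summarise()` over the synced folder would give a rough count per type, which is a quick sanity check on how many real photos and videos are hiding behind the hashed names.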
Analysing the ZIP export
Now, I mentioned the zip export; this was a supported export feature, that allowed you to export a month at a time for backup, optionally with all metadata in an XMP sidecar file for each photo and video. I did manage to previously export April 2014 – April 2015 – one year’s worth. Of course, I forgot to do the exports for several months, and it’s not working right now.
The exports are much better:
- There’s a rich list of parameters – album name, face tags, geo coordinates, dates
- The XMP files for the video files contain the date it was recorded
- There are none of the ‘behind the scenes’ ancillary files – such as thumbnails, lower resolutions, etc. No mess
- Files retain the original filename from the source, such as IMG_NNNN for iPhone photos – rather than the hashed filename the S3 bucket contains.
- The zip file download function appears to be based on the original photo creation date, not the upload date; i.e. if you only started using Picturelife a year ago, but uploaded 10 years’ worth of photos with their timestamps intact, you would see a list of 10 years’ worth of zip files to download.
So – I can at least try to use those copies where I have them. Ideally, I would be able to dedupe these against the S3 sync, so I can delete the ones in the S3 bucket which I have safe and sound, and avoid processing duplicates in the S3 bucket if they never get the export working again and I have to go through it manually to recover my photos.
Where to go from here
So, right now, I have a list of challenges to consider:
- Dedupe the zipped Picturelife exports, against the S3 backup, including any thumbnails or different resolutions
- Get a datestamp against the videos, which is likely only stored on Picturelife’s own servers, so as to get them in sequence with all my photos
- See how any ‘special’ photos were handled, if at all (eg. Bursts, live photos, etc), and if they can be recovered (I suspect they weren’t stored)
- Analyse generally whether my assumptions and deductions are working, ideally by checking distributions of datestamps, video timestamps, etc.
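For the last item, checking the distribution of datestamps, a small Python sketch might look like the following. It is an assumption-laden starting point: it uses each file's filesystem modification time (not EXIF or XMP dates, which would need a metadata library), converts it to UTC, and counts files per month.

```python
from collections import Counter
from datetime import datetime, timezone
from pathlib import Path


def month_histogram(root: Path) -> Counter:
    """Count files under root per 'YYYY-MM' of their modification time (UTC)."""
    counts: Counter = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            stamp = datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc)
            counts[stamp.strftime("%Y-%m")] += 1
    return counts


def print_histogram(counts: Counter) -> None:
    """Print one line per month, oldest first."""
    for month in sorted(counts):
        print(f"{month}  {counts[month]}")
```

A month with an implausibly large count would point to files whose dates were reset in bulk, such as videos stamped with the day they were synced rather than the day they were recorded.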
I’ve checked for various deduplication tools for files and photos, as well as file scanners, file renamers, etc., but I have to admit, I wish I had the scripting skills to do a lot of this. A script such as “do a checksum compare of the exported and S3 Picturelife files, deleting the S3 copy with any thumbnails or other recodes if the zip copy is good” is something few tools will offer, unless they also support a scripting language themselves. Other useful things would be to check the number of unique photos before thumbnails and different resolutions, or compare creation dates against modification dates to understand what they represented. I wish I had learnt Python, here!
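The checksum comparison described above could be sketched roughly like this in Python. It rests on one big assumption that I haven't verified: that the exported file and the S3 copy are byte-for-byte identical. If the export wrote metadata into the file, the digests would differ and nothing would match. The function names are my own, and the sketch only reports matches; it deletes nothing.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large videos don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def index_by_digest(root: Path) -> dict:
    """Map each content digest to the list of files under root that have it."""
    index: dict = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            index.setdefault(file_digest(path), []).append(path)
    return index


def s3_copies_covered_by_export(export_root: Path, s3_root: Path) -> list:
    """Return S3-side files whose exact content also exists in the export."""
    export_digests = set(index_by_digest(export_root))
    covered = []
    for digest, paths in index_by_digest(s3_root).items():
        if digest in export_digests:
            covered.extend(paths)
    return sorted(covered)
```

The returned list is a set of candidates to review by hand. Thumbnails and lower-resolution recodes have different bytes, so they would need a separate rule, for example matching on the shared hashed filename.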
In the meantime, I’m left to sorting out what I have, and waiting. You can argue that I should never have moved away from the YYYY\MMDD – Event Name structure that I used to use, but the fact was, I wasn’t keeping up with that either, finding myself spending endless weekends trying to keep up. Picturelife did offer a great way to organise, and actually use – share and view – these photos, and my one big failure was in not taking regular backups, and not testing my data export strategy.
Lesson learned. Backup. Test. The two most important words in IT.
I’m in another burst of activity in what is a perpetual photo-organising spree. In this current burst, I used Mylio to analyse all my photos stored everywhere, which suddenly makes clear all the duplicates we have lying around.
It seems every time we retire a phone or laptop, we don’t have time to check whether we’d already sorted all the photos stored on it, and so they’re copied in bulk into a “Photos/Unsorted Photos/More new photos/Laptop unsorted/Really sort these ones!/My Pictures” type folder, hiding deep in the filesystem in my NAS.
Enter Auslogics Duplicate File Finder. At time of writing, it’s maintained a steady pace of development, and having updated after a long pause, I now find myself on v5.2.1.
A really useful feature of this version is that it shows a folder hierarchy of where the duplicate files were found, and then lets you select the folder that you consider to be the ‘duplicate’, i.e. the one that can be deleted in favour of the ‘main, primary’ copy folder. You can include subdirectories in this, so that in effect, I can simply run the duplicate scan, and then right click the “Photos/Unsorted Photos” folder, and tell DFF to delete all duplicates in that folder; done!
Note these are CRC32 exact matches – same size, no edits, no additional EXIF data – byte for byte the same. If you want to analyse resized or edited photos, such as deleting a copy downloaded from Facebook in preference for the original, then you’ll need to use a tool with dedicated duplicate image detection – something Mylio might have, but I haven’t figured out yet.
For some reason, perhaps because it’s the free version, it’s incredibly slow – but you can let it run unattended. Even with tools like Mylio or Picasa, this basic bytewise comparison/dedupe gives you a “zero doubt”, high-level first-pass dedupe to clear out the obvious candidates, and the folder tree approach gives a nice, clear view to allow you to do it quickly and easily.
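For anyone who would rather script that first pass, the same idea can be sketched in a few lines of Python. This is not how Duplicate File Finder works internally, which I don't know; it is a generic exact-match approach that groups files by size first (cheap), hashes only the files that share a size, and then lists the copies inside a folder you nominate as the ‘duplicate’ one. It uses SHA-256 rather than CRC32, and the function name is my own.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_redundant(root: Path, duplicate_folder: Path) -> list:
    """List files under duplicate_folder that have an identical copy elsewhere in root."""
    # Pass 1: group by size, so files with a unique size are never hashed.
    by_size = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file():
            by_size[path.stat().st_size].append(path)

    redundant = []
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        # Pass 2: hash only the candidates that share a size.
        # (read_bytes loads the whole file; fine for photos, heavy for big videos.)
        by_hash = defaultdict(list)
        for path in same_size:
            by_hash[hashlib.sha256(path.read_bytes()).hexdigest()].append(path)
        for group in by_hash.values():
            inside = [p for p in group if duplicate_folder in p.parents]
            outside = [p for p in group if duplicate_folder not in p.parents]
            # Only flag a copy if another copy survives outside the folder.
            if outside:
                redundant.extend(inside)
    return sorted(redundant)
```

Files that are duplicated only within the nominated folder are left alone, so the last remaining copy of anything is never listed. As written it only returns the list; deleting would be a separate, deliberate step.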