Leaving Picturelife isn’t as easy as I thought
[Note: Use this information at your own risk. I am not responsible if you lose any valuable photos or other assets as a result of following advice or using information in this blog]
There’s something none of us do enough in life – contingency planning. Not nearly enough of us write wills, buy insurance, or do the other things that will save us when the doo-doo hits the fan.
A good example of this is failing to test your fallback plan before your photo-hosting service suddenly goes away.
I’ve been posting about Picturelife, a (formerly) excellent photo backup and hosting service that I used from April 2014 to April 2016. Given that our children were born during that time, you can imagine we have a huge collection of incredibly important videos and photos in Picturelife. Which is why it upset quite a few people when the lights went out earlier this year – and the service has been struggling ever since.
The big issue right now is that customers can’t get their photos out of Picturelife; if you have two years’ worth of family photos there, then tough – they’re staying there. Hopefully. Luckily, I had foreseen something like this, and chosen to self-host the photos in my own S3 bucket, meaning I had access at all times. I also manually synced this to my home NAS, in case something nasty happened – like someone deleting all the self-hosted photos via the Picturelife access key.
I had also tested an export from Picturelife in the early days, to check that all the metadata I had entered – album names, people names, dates, etc. – really was contained in the exported photos, as Picturelife claimed it would be. That way I knew I could port my data away from them when it all went wrong, and retain everything I had added. And it was all there.
I also assumed that the data in the raw S3 bucket would contain all that metadata. And I meant to, one day, check that I would be able to use the photos from that S3 backup if Picturelife went away.
Analysing the S3 files
So, while I wait to see whether PictureLife’s remaining team will ever be able to stand the picture export function back up (more on that later), I’ve been analysing what is, and isn’t, in that S3 data store synced to my NAS.
Well, for a start, there is a lot of data: 75GB, across 23,000 files.
So, how is this data all stored?
Well, unsurprisingly, all files are stored with hashes as filenames; hashes are generally chosen as a unique reference to a file, one that won’t clash with any other file – even one owned by another user. Among other things, this enables unique sharing URLs at http://picturelife.com that work without any credentials.
Looking at these files, it admittedly seems pretty random.
- Some have JPG extensions, but many don’t – even though they are still JPEGs, and can be opened as such
- The same is true for other file types – GIFs, RAW, MP4… some have the correct extension, but many do not
- Some are stored with multiple resolutions – these have the same name with _O, _1000, etc. suffixes
- Videos are stored in multiple resolutions, along with a thumbnail image and an audio file (the video’s audio track), all sharing the same name
- All of my images, and some videos, were stored in one directory with a random/hash name. There’s also another folder called “video”, which contains only videos (with all the different resolutions, thumbnails and audio)
- Some files have XMP sidecar files – but only a very few (maybe those that were edited?). I had just 118 XMP files in the whole store
- There are no other metadata or index files anywhere that might describe these files
- The EXIF data in the images appears to be intact – whether for all of them, I don’t know. This means that in a photo viewer, they show up on the right calendar date, and in the right location from their GPS coordinates
- The EXIF or other data in the videos does not appear to be intact – they all have a creation date within a few days of each other last year, probably the date I synced them to my NAS
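Since the extensions can’t be trusted, the first few bytes of each file can. Here’s a minimal Python sketch – my own, not anything Picturelife provided – that sniffs the real type from magic bytes and tallies what’s actually in the synced directory. The signature table only covers the formats mentioned above, and would need extending for the various RAW formats.

```python
import os

# Standard magic-byte signatures for the common formats in the store.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
    (b"\x89PNG\r\n\x1a\n", "png"),
]

def sniff_type(path):
    """Guess a file's real type from its first bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(12)
    for magic, ext in SIGNATURES:
        if head.startswith(magic):
            return ext
    # MP4/MOV files have an 'ftyp' box starting at byte offset 4
    if head[4:8] == b"ftyp":
        return "mp4"
    return "unknown"

def scan(root):
    """Walk the whole S3 sync directory and count real file types."""
    counts = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            kind = sniff_type(os.path.join(dirpath, name))
            counts[kind] = counts.get(kind, 0) + 1
    return counts
```

Running `scan()` over the sync directory would give a quick picture of how many files are mislabelled JPEGs versus genuinely unknown blobs.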
Analysing the ZIP export
Now, I mentioned the zip export; this was a supported export feature that allowed you to export a month at a time for backup, optionally with all metadata in an XMP sidecar file for each photo and video. I did previously manage to export April 2014 – April 2015: one year’s worth. Of course, I forgot to do the exports for the months after that, and the feature isn’t working right now.
The exports are much better:
- There’s a rich list of parameters – album name, face tags, geo coordinates, dates
- The XMP files for the video files contain the date it was recorded
- There are none of the ‘behind the scenes’ ancillary files – such as thumbnails, lower resolutions, etc. No mess
- Files retain the original filename from the source, such as IMG_NNNN for iPhone photos – rather than the hashed filename the S3 bucket contains.
- The zip file download function appears to be based on the original photo creation date, not the upload date; ie. If you only started using Picturelife a year ago, but uploaded 10 years’ worth of photos with their timestamps intact, you would see a list of 10 years’ worth of zip files to download.
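Those XMP sidecars are the obvious place to recover the missing video dates from. As a hedged sketch – I’m assuming the date is written as an `xmp:CreateDate` or `exif:DateTimeOriginal` property, which is where most tools put it; the exact field name Picturelife uses may differ – a few lines of Python can pull it out:

```python
import re

# Match the date in either XML attribute form (xmp:CreateDate="...")
# or element form (<xmp:CreateDate>...</xmp:CreateDate>).
DATE_FIELDS = re.compile(
    r'(?:xmp:CreateDate|exif:DateTimeOriginal)\s*'
    r'(?:=\s*"([^"]+)"|>([^<]+)<)'
)

def xmp_date(xmp_text):
    """Return the first creation date found in a sidecar's text, or None."""
    m = DATE_FIELDS.search(xmp_text)
    if m:
        return m.group(1) or m.group(2)
    return None
```

A proper XML parse would be more robust, but for a one-off recovery job over a known set of sidecars, a regex like this is usually enough.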
So – I can at least use those copies where I have them. Ideally, I would dedupe these against the S3 sync, so I can delete the copies in the S3 bucket that I already have safe and sound – and avoid processing duplicates if the export function never comes back and I have to go through the bucket manually to recover my photos.
Where to go from here
So, right now, I have a list of challenges to consider:
- Dedupe the zipped Picturelife exports, against the S3 backup, including any thumbnails or different resolutions
- Recover a datestamp for each video – these are likely only stored on Picturelife’s own servers – so I can get them in sequence with all my photos
- See how any ‘special’ photos were handled, if at all (e.g. bursts, Live Photos, etc.), and whether they can be recovered (I suspect they weren’t stored)
- Analyse generally whether my assumptions and deductions hold up, ideally by checking distributions of datestamps, video timestamps, etc.
I’ve looked at various deduplication tools for files and photos, as well as file scanners, file renamers, etc., but I have to admit I wish I had the scripting skills to do a lot of this myself. A job like “checksum-compare the exported and S3 Picturelife files, and delete the S3 copy – along with any thumbnails or other recodes – wherever the zip copy is good” is something few tools will handle, unless they also embed a scripting language. Other useful checks would be counting the unique photos once thumbnails and extra resolutions are excluded, or comparing creation dates against modification dates to understand what they represent. I wish I had learnt Python, here!
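For what it’s worth, that checksum-compare job isn’t far off a page of Python. This sketch only *lists* the S3 files whose content matches a zip-export copy (plus their `_O`/`_1000` resolution siblings, guessed from the naming convention above) – deliberately not deleting anything, so the list can be reviewed first. The directory layout is an assumption on my part:

```python
import hashlib
import os

def sha256_of(path):
    """Checksum a file in chunks, so large videos don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def find_duplicates(export_dir, s3_dir):
    """List S3-sync files whose content matches any zip-export file,
    plus resolution siblings that share the same hash stem."""
    exported = set()
    for dirpath, _dirs, files in os.walk(export_dir):
        for name in files:
            exported.add(sha256_of(os.path.join(dirpath, name)))

    safe_to_delete = []
    for dirpath, _dirs, files in os.walk(s3_dir):
        for name in files:
            path = os.path.join(dirpath, name)
            if sha256_of(path) in exported:
                safe_to_delete.append(path)
                # Flag _O/_1000-style siblings sharing the hash stem
                stem = name.split("_")[0].split(".")[0]
                for sib in files:
                    if sib != name and sib.startswith(stem + "_"):
                        safe_to_delete.append(os.path.join(dirpath, sib))
    return sorted(set(safe_to_delete))
```

A checksum match means byte-identical content, so this would only catch originals, not recompressed copies – which is exactly the conservative behaviour you want before deleting anything.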
In the meantime, I’m left sorting out what I have, and waiting. You could argue that I should never have moved away from the YYYY\MMDD – Event Name folder structure I used to use; but the fact is, I wasn’t keeping up with that either, and found myself spending endless weekends trying. Picturelife did offer a great way to organise – and actually use, share and view – these photos. My one big failure was in not taking regular backups, and not testing my data export strategy.
Lesson learned. Backup. Test. The two most important words in IT.