Flickr: finding images not in groups

Flickr has been one of my most-visited websites for nearly ten years, and if I am near the internet, rarely a day goes by that I don’t check it out. I love being able to share my photos with the world and to see the work of others. Things have changed a lot over the last couple of years, with site redesigns that have polarised a lot of users (and upset me on more than one occasion), but I still stick with it; I haven’t found anything else quite like it.


Despite its brilliance, though, there are a few niggles with the way it works. The stats that I pay for as a “Pro” user (I don’t consider myself pro in any way, incidentally) are pretty random and can change wildly with no explanation. The Organizr lets you do some things really well (adding a selection of pictures to a group) and others really slowly (adding that same selection of pictures to lots of groups). But I love Flickr and am willing to forgive it for its many ways and wherefores.

There are many ways of making sure your photo is seen once you’ve posted it. Tags, titles and descriptions all help, but sharing it with other Flickr users in groups is really effective. There are groups for just about anything you can think of photographing, and if there isn’t one you can start your own. So one thing I had wanted to know for a while was which of the photos I had uploaded over the lifetime of my account were not in any group.


Flickr stats will tell you how many of your photos are not tagged, and will load them into the Organizr so that you can tag them. It will also tell you how many of your photos are not in any groups, but it won’t load those into the Organizr. In fact, it gives no hint at all as to which ones they are.

This is apparently a complicated thing to do within the mechanics of Flickr and the Organizr. As a-web-admin-for-a-big-site-in-a-former-life, I have sympathy when something good just can’t be done technically.

But also as a result of being a-web-admin-for-a-big-site-in-a-former-life, I know that sometimes, however clunky, there are ways of getting what you want. So what follows is a very clunky way of working out which of your photos are not in groups in Flickr.

The basic idea is:

  1. Look through all of your Flickr photo pages
  2. Check which ones say the photo is not in a group
  3. Write the URLs of those pages out to a list that you can then work through

It seems so obvious when it’s written down, and of course you can do this manually if you want. At the time of writing, though, I have around 450 photos that are not in groups, so I want to do this faster. I’ve worked out the following automated method, but you need to have a bit of a hacker hat on, as it uses some old-school command-line tools. If your hacker hat is better than mine, you are probably going to laugh at how bad my technique is. But, for me, it gets the job done, and you are welcome to suggest improvements.

I’m not explaining here how you get the tools by the way – there are lots of different versions for different operating systems and to list them all would take a long time. Plus, if you are a nerd like me you’ll already have them, and if you are a nerd-in-waiting then getting them set up is all part of the fun.

So, to start with, I used wget to download a copy of my Flickrstream without any photos; all I wanted was the HTML page for each photo. I have about 1800 pictures on Flickr. This took about 45 minutes and generated LOADS of folders and files, but it did the job.

The basic wget command is:

wget -nc --no-check-certificate -r -l 1 www.flickr.com/photos/name-of-your-flickrstream

If you want to know what that means, well, the wget switches are many and complicated and the manual is lengthy (but good). My attempts at a lengthy explanation would probably be wrong. Roughly speaking, though, I wanted wget to follow all the links on my page, but only one level deep (that’s the -l 1). I didn’t want it to go mad and download the whole of Flickr. It is totally possible for it to do that if you really want, and I suggest you really DON’T want to. The command above seemed to do what I wanted for the first page.
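Going by the wget manual, here is the same command wrapped in a small shell function, with a note on each switch. It assumes a POSIX-style shell (Linux, macOS, or something like Cygwin on Windows). The function name fetch_flickr_page and the DRY_RUN variable are hypothetical, added for illustration; they are not part of wget. Setting DRY_RUN=1 just prints the command instead of running it.

```shell
# Fetch one Flickr listing page plus the pages it links to.
# fetch_flickr_page and DRY_RUN are illustrative names, not wget features.
fetch_flickr_page() {
  # -nc                     no-clobber: skip files that already exist locally
  # --no-check-certificate  don't verify the server's SSL/TLS certificate
  # -r                      recursive: follow links found in the page
  #                         (stays on the same host unless -H is given)
  # -l 1                    recursion depth 1: the page itself plus whatever
  #                         it links to directly, such as each photo's page
  # If DRY_RUN is set and non-empty, ${DRY_RUN:+echo} expands to "echo",
  # so the command is printed rather than run.
  ${DRY_RUN:+echo} wget -nc --no-check-certificate -r -l 1 "$1"
}
```

Usage: `fetch_flickr_page www.flickr.com/photos/name-of-your-flickrstream`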

Since a Flickr photostream is paginated, I had to run the wget command once for each page of my stream (currently 18). I was using Windows, so I just wrote a batch file and copied and pasted the above line 18 times, with a different page number at the end of each. It looks something like this:

wget -nc --no-check-certificate -r -l 1 www.flickr.com/photos/name-of-your-flickrstream/page18
wget -nc --no-check-certificate -r -l 1 www.flickr.com/photos/name-of-your-flickrstream/page17
wget -nc --no-check-certificate -r -l 1 www.flickr.com/photos/name-of-your-flickrstream/page16
wget -nc --no-check-certificate -r -l 1 www.flickr.com/photos/name-of-your-flickrstream/page15

Yep, a real programmer would know how to run through the whole set of pages in a loop, but I’m not that clever and, again, the above worked for me.
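On a POSIX-style shell, the copy-and-paste batch file can be sketched as a loop instead. The function names are hypothetical, and the stream name and page count are placeholders. It also assumes that /page1 returns the same listing as the front page of your stream, since the original command used the bare address for the first page.

```shell
# Print the address of each page of a photostream, page1 .. pageN.
# Assumes /page1 serves the same listing as the stream's front page.
flickr_page_urls() {
  stream="$1"   # e.g. name-of-your-flickrstream (placeholder)
  pages="$2"    # number of pages in the stream, e.g. 18
  n=1
  while [ "$n" -le "$pages" ]; do
    printf 'www.flickr.com/photos/%s/page%d\n' "$stream" "$n"
    n=$((n + 1))
  done
}

# Run the same wget command as in the batch file, once per page.
fetch_all_pages() {
  flickr_page_urls "$1" "$2" | while read -r url; do
    wget -nc --no-check-certificate -r -l 1 "$url"
  done
}
```

Usage: `fetch_all_pages name-of-your-flickrstream 18`. Generating the URLs in a separate function makes it easy to check the list (just run `flickr_page_urls name-of-your-flickrstream 18`) before letting wget loose on it.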

Note that wget is a sort-of-browser, so running it across your whole stream means each of your photos gets one hit. When you do this, your stats will shoot up by the number of photos in your stream (if you care about that sort of thing).

What I ended up with was the aforementioned LOADS of files and folders. Each photo on my stream generated a folder (mirroring its URL). In each folder was an index.html file, which was a download of the Flickr page for that photo.

The next step was to search these files for something that would indicate whether or not the photo was in a group. I knew that the page for a photo which is not in a group says something like “This photo is not in any groups”, so I thought I could just search for files containing that sentence.


But Flickr displays differently on older browsers, and wget doesn’t behave like a modern browser (for one thing, it doesn’t run JavaScript). So my wget-saved pages didn’t have any jQuery-type stuff in them. In fact, when I loaded one of the index.html files in a browser, I didn’t get the same thing as I would if I’d gone to Flickr online. It looks a bit more vintage and, for example, doesn’t mention whether the photo is in a group.


Nevertheless, something had to be somewhere in the HTML that told the browser, modern or not, whether the photo was in a group. Sure enough, I loaded one of the index.html files into a text editor and searched for “group”. That eventually turned up what looks like a JavaScript call to Y.photo.init, and the data passed to it includes a property called group_count. If your downloaded HTML contains "group_count":0, the photo on that page is not in any groups.
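To test a single saved page for that marker from the command line, grep in fixed-string mode is enough. This is a sketch that assumes the marker appears exactly as "group_count":0 with no spaces, as it did in my pages. The name photo_has_no_groups is hypothetical. Because the pattern includes the colon, counts such as 10 or 20 won’t match.

```shell
# Succeed (exit status 0) if the saved Flickr page contains "group_count":0.
# -F matches the pattern as a literal string, -q suppresses all output.
photo_has_no_groups() {
  grep -qF '"group_count":0' "$1"
}
```

Usage: `photo_has_no_groups index.html && echo "not in any groups"`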

So, if like me you have about 1800 photos and now, after running wget, you have about 1800 index.html files stored in nested folders to sort through, how do you work out which ones have "group_count":0 in them?

Well, what you want is a little program that you can run from the top of your file tree. It should go in and out of each folder, look through every line of HTML, check for "group_count":0 and, if it finds it, write the file’s name to a list. Clever programmers who know Perl, awk and other languages could probably do that in no time. I’ve never been able to fathom them (and I have tried), so I used sfk (Swiss File Knife) instead.
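If you have GNU grep (standard on Linux, and available on Windows through tools such as Cygwin), the little program described above can be sketched as a single grep call. It prints only the names of matching files, so it would also make the later sed step unnecessary. This is an alternative to the sfk route, not what I ran, and list_groupless_pages is a hypothetical name.

```shell
# List every saved index.html under a folder whose contents include "group_count":0.
# -r recurses into subfolders, -l prints only the names of matching files,
# -F matches the pattern literally, and --include=index.html (a GNU grep
# option) restricts the search to files with that base name.
list_groupless_pages() {
  grep -rlF --include=index.html '"group_count":0' "$1"
}
```

Usage: `list_groupless_pages www.flickr.com/photos/jamesdavies > nogroups.txt`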

As with wget, the options and manual for sfk are numerous and lengthy. Suffice to say that, after much trial and error, the following worked for me when I ran it from the folder where wget dumped my copy of Flickr:

sfk filter -+"\"group_count\":0" -dir www.flickr.com\photos\jamesdavies > nogroups.txt

This wrote a text file called nogroups.txt listing the paths of the downloaded Flickr files that had "group_count":0 in them. It also listed the matching line of code for each one, which turns out to be quite a big block. Running nogroups.txt through sed to print only the lines containing the filename produced a cleaner list…

sed -n "/index\.html/p" nogroups.txt > nogroups1.txt

…and after that I was able to tweak nogroups1.txt into a full list of URLs for images that were not in groups. I could then use that list with the proper online Flickr, rather than my downloaded wget copy.
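The tweaking step amounts to turning each saved path back into a web address. Here is a sketch using tr and sed. It assumes each line holds just a path such as www.flickr.com\photos\name\1234567890\index.html (backslash or forward-slash separators, starting at www.flickr.com, possibly with Windows line endings). It also assumes each photo’s page lives at the folder’s address. The function name and the example photo ID are hypothetical, and you may need to adjust the prefix to match your own list.

```shell
# Turn saved file paths into Flickr photo page URLs, one per line.
#   in:  www.flickr.com\photos\name\1234567890\index.html  (or with / separators)
#   out: https://www.flickr.com/photos/name/1234567890/
paths_to_urls() {
  tr -d '\r' |                    # drop Windows carriage returns, if any
    sed -e 's#\\#/#g' \
        -e 's#index\.html$##' \
        -e 's#^#https://#'        # backslashes to slashes, drop the file name, add the scheme
}
```

Usage: `paths_to_urls < nogroups1.txt > nogroups_urls.txt`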

The gallery that originally accompanied this post showed photos considered “uninteresting” by Flickr. Many of these are not in groups, so they are the sort of photos this method lists. I would have included part of my list here, but once I add the pictures to groups it won’t be accurate!

So now that I have this list, it’s down to me to look at each of the photos on it and decide what to do with them. Most of them, because they aren’t in groups, don’t have many views at all. More interestingly, though, they seem to have mostly come from a time when I used Flickr differently from how I use it today. Along with LiveJournal, it was the first social network I really used. I was never much of a diarist on LiveJournal, and I liked the idea of letting the pictures in Flickr tell stories about my life. I guess it wasn’t until Facebook came along that I started to use Flickr exclusively for showing photographs for their own sake to like-minded photographers, rather than putting out snapshots of my social life. So these group-less photos are mostly from the pre-Facebook era. They are often a set of images from somewhere I went, posted in some sort of sequence in the hope of getting some comments. Where there are comments, they are often from people I don’t hear from any more. It’s really strange!

It’s also astounding how little information I used to put on my Flickr uploads at the time: often just obscure one-word titles and a few tags. There was rarely much information about what camera I had used or, if it was film, what the film stock was.

Having generated my list of groupless photos, I think the next step will be to delete a lot of the ones with little of interest for anyone else. For the rest, I’ll tidy up the descriptions and titles and add the ones that ought to be seen more to groups, while trying not to flood the groups with them.

Anyway, if you have got this far, I hope this was useful for you. Comments are appreciated (but seemingly scarce these days; hey-ho).
