CSI Apple: The Omnibus Edition
By Neil Fairbrother
Apple’s recent CSAM detection announcement has caused controversy about online privacy. Privacy, of course, is something we should all take seriously, especially given the unregulated commercial intrusion into our private use of social media services and of the internet in general.
Apple is a private, for-profit company. They provide a service, iCloud, for the storing and sharing of electronic documents such as photos, videos, PDFs and the like. Before you can use their iCloud service you have to sign a contract with them, and by doing so you agree to their terms and conditions of service, which we’ve written about in this blog post.
Let’s pause and remind ourselves what lies behind the label CSAM. CSAM stands for Child Sexual Abuse Material. These are digital documents, typically photographs and videos, which document the offline rape of children, toddlers and babies. CSAM is digital evidence of heinous crimes, often self-posted and shared by the perpetrators themselves. These images and videos are not just images. It’s worth emphasising and repeating: they are visual records of an offline crime scene. These files provide evidence which can be used to convict criminals and, just like real-world crime scenes, they need investigating for clues.
As we wrote elsewhere, this content is illegal. Period. Creating it is illegal. Period. Storing it is illegal. Period. Sharing it is illegal. Period. Looking at it is illegal. Period. Other than for law enforcement purposes there are no exceptions. Period. There are no privacy grounds whatsoever that provide a safe harbour or carve out. None. What. So. Ever.
Children’s rights to not be abused and raped far exceed your rights to use a private service operated by a private, limited, for profit technology company, especially for creating, storing and sharing this illegal content.
Apple have clearly decided that the default position of being a passive bystander to this form of egregious child abuse is simply not good enough. They are using their technologies to become an active participant in child safeguarding by enforcing their own terms and conditions of iCloud usage, which all Apple iCloud users have already agreed to abide by.
This shift to enforcement is not subtle but seismic, and perhaps it’s this change that’s caught so many people out. Apple quite reasonably don’t want this kind of content on their infrastructure, a position that makes sense to us. Who in their right mind would argue otherwise? For Apple to expunge and eradicate it, they logically have to identify it first; and having identified it, they then have to report it to law enforcement so that law enforcement can act on it.
This will achieve two things. It will cleanse Apple’s services of the internet’s biggest and dirtiest trick, and it will help bring the perpetrators of these crimes against children, toddlers and babies to justice. What’s not to like?
This novel and proactive stance from a device manufacturer against the perpetration and proliferation of CSAM crimes has civil liberties organisations such as the Electronic Frontier Foundation up in arms about privacy. They contend that this move by Apple is encroaching on our rights to privacy and have even set up a petition here against it on the grounds that Apple is introducing a “mass surveillance” system.
Any objective review of Apple’s process shows this simply isn’t the case, so you have to wonder what motives the EFF in particular have in their lobbying for the apparent protection of privacy of predatory pedophiles and for whipping up a petition on clearly specious grounds.
Privacy has long been held by Apple as an absolute. They have withstood pressure from governments to open “back doors” into their ecosystem. They have introduced numerous privacy-enhancing technologies across their products in recent years which it would seem the EFF support as they haven’t raised petitions against them. So what’s different now?
In this blog post we examine Apple’s privacy enhancing features and compare how effective they are. Apple are involving law enforcement in their CSAM detection system, so we also look at the use of offline forensics and how digital techniques are applied to Apple’s CSAM detection tools. In all of this, remember that the videos and images that predatory pedophiles and others store and share on iCloud are visual records of offline crime scenes.
So as Apple turns enforcer, does CSI Apple have a case to answer for the “Man on the Clapham Omnibus”?
Putting your finger on it
First of all let’s talk about fingerprints. Fingerprints can refer to:
The pad of an actual finger, complete with the distinctive patterns caused by ridges and valleys, which have fascinated humans for thousands of years. There’s a commonly held view that all such fingerprints are unique, which, if true, would make them an ideal basis for a privacy feature: there are currently some 76.4Bn, or 76,400,000,000, individual unique fingerprint patterns in the world today, making a privacy breach a vanishingly small possibility.
Fingerprints can also be “friction ridge impressions”, those patterns left behind by human touch, a ghostly reminder of the presence of someone, which have been used as a form of identification, in pottery for example, for millennia. Latterly, latent prints are recovered by forensic examiners and used as evidence in courts of law. These friction ridge impressions are caused by the deposit of sweat from eccrine sweat glands that track the ridges in the skin that make up the distinctive patterns we can all see on our fingertips. There are approximately 2,500 to 3,000 eccrine sweat glands[1] per 2.5cm × 2.5cm patch, roughly the area of a fingerprint, giving a “resolution” so to speak of approx. 4.4 glands per mm2.
A fingerprint can also be the image of a print recorded at a crime scene by a Crime Scene Investigator (CSI) or Scene of Crime Officer (SOCO). Traditionally fingerprint powders, and latterly forensic light sources, have been used to reveal the presence of latent fingerprint traces across a broad variety of porous and non-porous surfaces. Once revealed, these are collected, transferred and analysed for structural similarities with other images of prints. These are often partial prints and may themselves be corrupted by smearing or other contaminants at the crime scene. The transfer and analysis process can be error-prone.
The US Department of Justice’s Fingerprint Sourcebook tells you everything you need to know about the forensic use of fingerprints, apart from how effective or accurate the whole process is: what is the false positive rate? With barely a reference to the issue, the document does acknowledge that “False positives, on the other hand, would be most troublesome”, but doesn’t expand on this tantalising hint. Perhaps a systematic review is not readily available because when it goes wrong it goes horribly wrong, as in the FBI’s misidentification of Brandon Mayfield in the investigation of the Madrid train bombing terrorist attack.
The man on the Omnibus might begin to doubt the whole premise of the uniqueness of fingerprints, first mooted but unproven in 1788 when the global population was less than a billion. If that premise is wrong, what would it mean for individual privacy? While there is little published work on the false positive rates of fingerprint analysis, this study found a false positive rate of a staggeringly high 0.1%, or one in 1,000.
A fingerprint might refer to a digital scan of the approx. 225mm2 patch of skin on a fingertip, complete with the distinctive patterns of arches, loops, whorls and other characteristics which contribute to their complexity and, some say, uniqueness. Just like any other scan, these can be at different resolutions: the FBI’s Integrated Automated Fingerprint Identification System (IAFIS) uses 1,000dpi scans, Apple’s TouchID 500dpi.
Finally, a fingerprint might refer to a unique digital code derived from a file so that when copies or similar versions of that file are found, they too can be digitally fingerprinted and more easily tracked, traced, compared and deleted. PhotoDNA is an example of a technology that uses “digital fingerprints”, or “hashes” as they are widely known.
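To make the idea concrete, here is a minimal sketch in Python using an ordinary cryptographic hash. PhotoDNA itself uses a proprietary perceptual hash, so this is an illustration of the general concept rather than the real thing.

```python
import hashlib

# Two byte-identical files always produce the same digital fingerprint,
# while changing even a single byte produces a completely different one.
# (PhotoDNA and NeuralHash use *perceptual* hashes instead, which are
# designed to stay similar when an image is only slightly altered.)
original = b"example image bytes"
copy = b"example image bytes"
altered = b"Example image bytes"  # one byte changed

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(copy).hexdigest())      # identical to the first
print(hashlib.sha256(altered).hexdigest())   # entirely different
```

The point to take away is that a hash acts as a compact fingerprint of a file which can be stored and compared without ever reproducing the file itself.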
Passwords and Passcodes: Privacy 101
The most basic of the privacy technologies Apple introduced for macOS and iOS were passwords and passcodes. Apple says that the odds of someone breaking your 4-digit privacy passcode are 1 in 10,000. Further privacy protections include a limited number of retries and a “time out” of increasing duration. Ultimately iPhone will auto-disable and will, if you’ve set it up to do so, erase itself after 10 failed attempts, rendering your content permanently private.
The modern default passcode uses 6 digits, which makes the system even more private, with the odds against guessing it at 1:1,000,000, something the Man on the Clapham-bound Omnibus might conclude is a pretty robust privacy lock.
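The arithmetic behind those odds is nothing more than the number of possible codes, as this quick sketch shows:

```python
# Odds of guessing a randomly chosen passcode on the first attempt.
four_digit_codes = 10 ** 4   # 10,000 possible 4-digit codes
six_digit_codes = 10 ** 6    # 1,000,000 possible 6-digit codes

print(f"4-digit passcode: 1 in {four_digit_codes:,}")
print(f"6-digit passcode: 1 in {six_digit_codes:,}")
```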
Next level privacy: TouchID
When Omnibuses were actually a thing, TouchID would have seemed like science fiction. TouchID was launched by Apple with the iPhone 5S in September 2013, and the TouchID privacy system has since become embedded across Apple’s product lines, with the exception of the Apple Watch.
Originally based on technology acquired from AuthenTec, it uses a 500dpi sensor which Apple says “…intelligently analyzes this information [the digital fingerprint scans] with a remarkable degree of detail and precision. It categorizes your fingerprint as one of three basic types—arch, loop, or whorl. It also maps out individual details in the ridges that are smaller than the human eye can see, and even inspects minor variations in ridge direction caused by pores and edge structures.”
Apple goes on to say that “Every fingerprint is unique, so it’s rare that even a small section of two separate fingerprints are alike enough to register as a match for Touch ID. The probability of this happening is 1 in 50,000 with a single, enrolled finger.”
The scan produces a digital map of the fingerprint’s features, which is then stored in a multi-layered, privacy-enabled Secure Enclave.
What’s interesting here in the context of privacy protection is the relatively low level of security that Apple’s TouchID provides, just 1:50,000. TouchID’s fingerprint scanning process is one of the most efficient and effective ways to scan a fingerprint, certainly for the domestic user, as explained in Apple’s TouchID user guidelines here, and all done in the comfort of one’s own home.
If it is the case that all 76,400,000,000 fingerprints on the planet are unique, then why is Apple’s fingerprint scanning technology so weak when it comes to the absolute numbers for privacy protection? The “privacy lobby” however seems to accept this privacy lock as being good enough, as most likely would our Omnibus-riding man.
Total Privacy on the face of it: FaceID
FaceID was launched by Apple in November 2017 and it uses various techniques to scan a user’s face to produce a mathematical model which can subsequently be used to unlock the privacy locks on an iPhone and to make payments with ApplePay.
Apple says “Face ID uses the TrueDepth camera and machine learning for a secure authentication solution. Face ID data – including mathematical representations of your face – is encrypted and protected with a key available only to the Secure Enclave.
The probability that a random person in the population could look at your iPhone or iPad Pro and unlock it using Face ID is approximately 1 in 1,000,000 with a single enrolled appearance.”
It’s curious to note that although, at 7.64Bn, there are 10 times fewer faces around the world than fingerprints, FaceID at 1:1,000,000 offers stronger privacy protection than TouchID. In fact this step forward in technology offers the same level of privacy protection as a 6-digit passcode. Given there’s no significant privacy advantage of FaceID over passcodes, you have to wonder what the point is, other than perhaps a small degree of convenience.
Note that Apple also says “The statistical probability is different for twins and siblings that look like you, and among children under the age of 13 because their distinct facial features may not have developed fully. If you’re concerned about this, we recommend using a passcode to authenticate.” The reason 13 is used here is almost certainly to do with the Children’s Online Privacy Protection Act (COPPA) rather than any biological tipping point at 13.
Civil liberties campaigners are right to be worried about the spread of facial recognition technologies and how this can be used for mass surveillance. But as a personal key to a personal privacy lock, would the man on the Clapham Omnibus be happy with a privacy lock offering 1:1,000,000 odds? We know the answer to that don’t we?
Organic DNA, the gold standard
As of 30th June 2021, there are 31,521 children’s DNA profiles stored in the UK’s National DNA Database (NDNAD), out of 6,719,274 subject profiles in total. According to HM Government, the probability of the DNA profiles of two unrelated individuals matching is on average less than 1:1,000,000,000 (one in one billion).
Just as with fingerprint collection and analysis, the accuracy and reliability of DNA analysis as a proof of ID is subject to the quality of the process of finding, gathering, storing and analysing the samples. However, should due process be properly applied, then the 1:1,000,000,000 odds of a random match are treated by courts of law, public opinion, crime writers and the man on the Omnibus as all but conclusive.
Were it possible for Apple to construct a consumer-friendly privacy technology based on DNA samples, then 1:1,000,000,000 would seem to be a pretty acceptable privacy threshold on which to base it.
Taking a byte out of Digital DNA
DNA is a pattern of a basic code: the more the repetition of that code matches the patterns of repetition in another DNA sample, the more sure you can be that you have a match. Our Man on the Clapham Omnibus might wonder how this principle could be extended to digital media, especially the photos and videos that comprise visual records of offline crime scenes against children, and, if it can be, what the odds of a chance match would be.
The answer lies in digital “hashing”, where the hashes, or digital codes, represent a form of “digital DNA”. Quite possibly the best known and most widely used “digital DNA” tool in the fight against CSAM is PhotoDNA, developed by Microsoft and Hany Farid, an American university professor who specialises in the analysis of digital images and is Dean and Head of School of the UC Berkeley School of Information.
PhotoDNA by Microsoft and Hany Farid
PhotoDNA creates a unique digital signature (known as a “hash”) of an image which is then compared against signatures (hashes) of other photos to find copies of the same image. When matched with a database containing hashes of previously identified illegal images, PhotoDNA helps detect, disrupt and report the distribution of child exploitation material. PhotoDNA is not facial recognition software and cannot be used to identify a person or object in an image. Microsoft say that a PhotoDNA hash is not reversible, and therefore cannot be used to recreate an image.
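PhotoDNA’s actual algorithm is proprietary, so the sketch below is only an illustration of the general matching idea: compare a perceptual hash against a list of known hashes, treating the hashes as bit strings and using Hamming distance (the number of differing bits) as the similarity measure. All values and names here are hypothetical.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two same-length hashes."""
    return bin(hash_a ^ hash_b).count("1")

def is_known_image(image_hash: int, known_hashes: set[int], max_distance: int = 4) -> bool:
    """Report a match if the hash is within `max_distance` bits of any known hash."""
    return any(hamming_distance(image_hash, h) <= max_distance for h in known_hashes)

# Hypothetical hash values purely for illustration.
known_hashes = {0b1011_0110_1100_0011, 0b0001_1111_0000_1010}
uploaded_hash = 0b1011_0110_1100_0111  # one bit away from a known hash

print(is_known_image(uploaded_hash, known_hashes))  # True
```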
Apple uses NeuralHash, their equivalent of PhotoDNA, as the basic building block of their CSAM detection solution for illegal images that are uploaded to iCloud. It matches the hashes of uploaded images against the NCMEC database of hashes of known CSAM images.
According to the International Telecommunications Union (ITU), PhotoDNA enables the U.S. National Center for Missing & Exploited Children (NCMEC) and leading technology companies such as Facebook, Twitter, and Google to match images through the use of a mathematical signature (the hash) with a false positive likelihood of 1 in 10 billion. That’s 1:10,000,000,000.
Might a 1:10,000,000,000 false positive rate be considered by the still Omnibus-bound man as reasonable and acceptable?
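For ease of comparison, here are the headline odds quoted in this piece gathered in one place:

```python
# The false-match odds quoted in this piece, sorted weakest first.
odds = {
    "4-digit passcode": 10_000,
    "TouchID (single enrolled finger)": 50_000,
    "6-digit passcode": 1_000_000,
    "FaceID (single enrolled appearance)": 1_000_000,
    "DNA profile (unrelated individuals)": 1_000_000_000,
    "PhotoDNA (per the ITU figure)": 10_000_000_000,
}

for method, denominator in sorted(odds.items(), key=lambda item: item[1]):
    print(f"{method:40s} 1 in {denominator:,}")
```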
Making a hash of it
NeuralHash, Apple’s perceptual hash algorithm, is designed, they say, to answer whether one image is really the same image as another, even if some image-altering transformations have been applied (like transcoding, resizing, and cropping).
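NeuralHash itself has not been published in source form, but a toy “average hash” illustrates how a perceptual hash can survive a transformation such as resizing. This sketch assumes the Pillow imaging library and a hypothetical local file, photo.jpg; it is nothing like as robust as NeuralHash or PhotoDNA, it just shows the principle.

```python
from PIL import Image  # assumes the Pillow library is installed

def average_hash(image: Image.Image, size: int = 8) -> int:
    """Toy perceptual hash: shrink to an 8x8 greyscale grid, then record
    which pixels are brighter than the mean as a 64-bit value."""
    pixels = list(image.convert("L").resize((size, size)).getdata())
    mean = sum(pixels) / len(pixels)
    bits = ["1" if p > mean else "0" for p in pixels]
    return int("".join(bits), 2)

# Hypothetical file name, purely for illustration.
original = Image.open("photo.jpg")
resized = original.resize((original.width // 2, original.height // 2))

# The two hashes should be identical or differ in only a few bits,
# even though the two files are byte-for-byte different.
print(f"{average_hash(original):016x}")
print(f"{average_hash(resized):016x}")
```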
During NeuralHash development, Apple produced an image-level false positive rate of 3:100,000,000 using an adult pornography dataset. For field deployment Apple have built into NeuralHash a “safety margin that is 2 orders of magnitude stronger”. Apple say in their Security Threat Model Review document that “…specifically, we assume a worst-case NeuralHash image-level error rate of one in one million, and pick a threshold that safely produces less than a one-in-one-trillion error rate for a given account under that assumption.”
No explanation is given for the one-in-one-trillion (1:1,000,000,000,000) error rate. It seems like a random number plucked out of thin air, simply to comfort people that the chances of a false accusation of having CSAM on your iCloud account are vanishingly small. While our man on the Omnibus may be very happy with these odds, there is a price to pay which he may well find disconcerting, especially as there’s no valid justification from Apple for it.
Apple go on to say that “Building in an additional safety margin by assuming that every iCloud Photo library is larger than the actual largest one, we expect to choose an initial match threshold of 30 images.”
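Apple don’t show the working behind the one-in-one-trillion figure, but the quoted threshold and worst-case error rate can be combined in a rough reconstruction. The one-million-image library size below is our own assumption, not Apple’s.

```python
import math

# Apple's stated figures:
per_image_rate = 1 / 1_000_000   # assumed worst-case NeuralHash error rate
threshold = 30                    # initial match threshold
# Our assumption, not Apple's:
library_size = 1_000_000          # hypothetical iCloud Photo library size

# With a large library and a tiny per-image rate, the number of false matches
# is well approximated by a Poisson distribution with this mean:
expected_false_matches = library_size * per_image_rate

# Probability of an account reaching the threshold purely by chance, P(X >= 30).
p_flagged = sum(
    math.exp(-expected_false_matches) * expected_false_matches ** k / math.factorial(k)
    for k in range(threshold, threshold + 100)
)

print(f"Expected false matches per library: {expected_false_matches}")
print(f"P(account falsely reaching the threshold): {p_flagged:.1e}")  # roughly 1e-33
```

Under these assumptions the result comes out far below one in a trillion, though the answer depends heavily on the library size assumed.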
Bearing in mind that we’re talking about images and videos of offline crime scenes, this threshold of 30 images is akin to telling the CSIs or SOCOs that they need to find 30 traces of organic DNA, or 30 fingerprints, at a physical crime scene before the analysis of the evidence can begin. For all crime scenes with fewer than 30 DNA or fingerprint samples, no further action will be taken.
Yet with CSAM, one hashed image is all the evidence you need. Apple explain that the NeuralHash “… has not been trained on CSAM images…It does not contain extracted features from CSAM images, or any ability to find such features elsewhere. Indeed, NeuralHash knows nothing at all about CSAM images”.
In other words, it is not analysing and classifying the images it finds: it is, in fact, image dumb. All it seems to be doing is computing hashes of images that are about to be uploaded to iCloud, comparing those hashes against the database of pre-hashed images already identified by NCMEC (and one other, unnamed authority), and reporting a find, which will then be assessed by a human operator, an Apple employee.
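Apple’s real system wraps this comparison in cryptographic protocols so it never happens as a plain lookup, but the underlying logic described above amounts to a membership test plus a counter. The sketch below is ours, not Apple’s code, and all names are hypothetical.

```python
# Hypothetical sketch of the "image dumb" comparison step.
KNOWN_HASHES: set[int] = set()  # hashes supplied by NCMEC and one other authority
MATCH_THRESHOLD = 30            # Apple's stated initial threshold

def count_matches(uploaded_hashes: list[int]) -> int:
    """Count how many uploaded images hash to an entry in the known list."""
    return sum(1 for h in uploaded_hashes if h in KNOWN_HASHES)

def escalate_for_human_review(uploaded_hashes: list[int]) -> bool:
    """Only once the threshold is crossed does a human reviewer get involved."""
    return count_matches(uploaded_hashes) >= MATCH_THRESHOLD
```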
As an aside, some are questioning the legality of this Apple inspection, pointing out that in the US the laws around CSAM discovery allow platforms to only report to NCMEC with no intermediary processing.
If it is the case that there’s no image analysis happening, and it’s a relatively simple database lookup of hash codes, then why is Apple so worried about false positives? It makes no sense to us or to the Phoenix11, and we’re pretty sure the man on the Omnibus, who’s now reached his destination, will have grave doubts about this aspect of Apple’s online child safety proposition.
J’accuse
Apple’s product launches are usually slick, well-planned affairs, rehearsed to the last detail. From Steve Jobs’ original iPhone launch, which still stands as a masterclass for any aspiring product manager, through to the latest WWDC keynote earlier this year, rarely has Apple put a foot wrong when introducing new features and products. On this occasion though, with one of their most important feature announcements, concerning the pernicious global social problem of child abuse, it’s justifiable to accuse Apple of botching it. They let themselves down, they let their customers down and, most importantly, they’ve let children down. And there are still questions left unanswered.
Apple’s stance on privacy is to be celebrated, applauded and supported, and set against the extensive privacy features Apple have introduced for their users, it was natural that this CSAM announcement would be couched in privacy terms. Apple are still bending over backwards to protect their users’ privacy, and once you understand what they are saying, to suggest otherwise is disingenuous.
But as we’ve long maintained, privacy and safety are not the same thing. True online security for children is delivered by privacy and safety. On that score Apple have more work to do.
—-
[1] DoJ Fingerprint Sourcebook
Updated 28.8.2021 to say “Apple uses NeuralHash, their equivalent of PhotoDNA “