This article on machine learning for writers is part of the Science in Sci-fi, Fact in Fantasy blog series. Each week, we tackle one of the scientific or technological concepts pervasive in sci-fi (space travel, genetic engineering, artificial intelligence, etc.) with input from an expert. Please join the mailing list to be notified every time new content is posted.
The Expert: Emily Randall
Emily Randall is a software developer for Google, specializing in user-centered design and accessibility. Everything they say here is their own opinion – they do not speak for their company! (And, yes, they have to say that.)
In their free time, they run, climb, larp, and write sci-fi and fantasy. They can be found at www.emily-randall.com.
Machine Learning for Writers
Machine learning (ML) is the current special sauce of the tech world. Adobe uses it to organize your photos; Yelp uses it to give you recommendations for great restaurants; Amazon, Pinterest, Twitter, and Google run virtually everything with machine learning. Other companies, from two-person startups to tech giants, use it for everything from recognizing spam calls to detecting credit card fraud. It’s even in some of our airports, which are now using facial recognition in place of IDs.
Given such a diverse range of uses, ML is often used in fiction as Applied Phlebotinum, a magic substance that can create any kind of plot effect that you need. But, in reality, ML is nothing more than a tool, and a dangerous one at that.
Machine learning is pattern recognition
ML can be divided into three basic types: supervised, unsupervised, and reinforcement learning.
With supervised machine learning, you give the algorithm a whole bunch of data that you’ve annotated with the answers you’re looking for – “this is a cat, this is a dog.” The algorithm then does its best to learn which features of the data are relevant. For instance, if you give it the heights and weights of cats and dogs, it can draw a line that (mostly) separates the species, though it’ll probably decide that some small dogs are cats.
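To make that concrete, here’s a toy sketch of supervised learning: a nearest-centroid classifier trained on labeled (height, weight) examples. All the numbers and labels are invented for illustration – real systems use far richer features and models.

```python
# Toy supervised learning: classify cats vs. dogs from (height_cm, weight_kg).
# All training data below is invented for illustration.

def centroid(points):
    """Average point of a list of (height, weight) pairs."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(2))

def train(labeled_data):
    """Learn one centroid per label from annotated examples."""
    by_label = {}
    for features, label in labeled_data:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(pts) for label, pts in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest (squared distance)."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(model, key=lambda label: dist2(model[label]))

training_set = [
    ((25, 4.0), "cat"), ((23, 3.5), "cat"), ((26, 4.5), "cat"),
    ((55, 25.0), "dog"), ((60, 30.0), "dog"), ((30, 6.0), "dog"),  # small dog
]
model = train(training_set)
print(predict(model, (24, 4.2)))  # a typical cat -> "cat"
print(predict(model, (28, 5.0)))  # a small dog, misread as "cat"
```

Note the last line: the small dog lands closer to the cat centroid, which is exactly the “some small dogs are cats” mistake described above.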
This is great when you can label the data – which often happens via crowdsourcing – but that’s not always an option. And that’s where unsupervised learning comes in: it doesn’t need any labels. Instead, you hand all the data to your algorithm, and it sorts it into categories for you. For instance, if you gave it a set of fruit pictures, it might classify them by color, or fruit type, or by something subtle in the pixels that only the algorithm can detect – it doesn’t always give you what you’d expect.
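A minimal sketch of the unsupervised idea, assuming we reduce each fruit picture to a single invented “redness” score: a tiny two-group k-means that is handed only the numbers, no labels, and sorts them into clusters on its own.

```python
# Toy unsupervised learning: cluster fruit by an invented "redness" score
# (0.0 = not red, 1.0 = very red). No labels are given to the algorithm.

def cluster(values, iters=20):
    """Minimal 1-D k-means with two clusters."""
    centers = [min(values), max(values)]  # crude starting guesses
    for _ in range(iters):
        groups = ([], [])
        for v in values:
            # Assign each value to its nearest center.
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Move each center to the mean of its group.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups

redness = [0.9, 0.85, 0.95,   # apples, strawberries, cherries...
           0.1, 0.15, 0.2]    # limes, green grapes, kiwis...
centers, groups = cluster(redness)
print(groups)  # two clusters: the reds and the greens
```

The algorithm finds the red/green split here, but nothing forced it to: with different features it might just as easily have grouped by shape, size, or some pixel-level quirk no human would notice.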
What if you don’t have any data at all, though? What if you want to build an AI that can play games or accomplish a task? That’s where reinforcement learning comes in. Instead of giving your algorithm a ton of data, you ask it to do something like playing a chess match, then give it points at the end based on how successful it was. It then does that task hundreds of thousands of times, until it learns how to maximize the number of points it gets.
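The try-score-repeat loop above can be sketched as a tiny epsilon-greedy agent. The three “moves” and their point values are invented for illustration; the agent never sees the reward table directly, only the points it earns.

```python
import random

# Toy reinforcement learning: the agent repeatedly tries moves, collects
# points, and learns which move pays best. Rewards are invented and
# hidden from the agent -- it only observes the points from play().

REWARDS = {"a": 1.0, "b": 5.0, "c": 2.0}

def play(move):
    """The environment: return points for a move, with a little noise."""
    return REWARDS[move] + random.uniform(-0.5, 0.5)

def learn(episodes=2000, epsilon=0.1, seed=0):
    random.seed(seed)
    totals = {m: 0.0 for m in REWARDS}
    counts = {m: 0 for m in REWARDS}
    def average(m):
        return totals[m] / max(counts[m], 1)
    for _ in range(episodes):
        if random.random() < epsilon or not any(counts.values()):
            move = random.choice(list(REWARDS))   # explore: try something new
        else:
            move = max(totals, key=average)       # exploit: use the best so far
        totals[move] += play(move)
        counts[move] += 1
    return max(totals, key=average)

print(learn())  # after thousands of tries, the agent settles on "b"
```

Swap the three moves for chess positions and the point table for a win/loss signal, and you have the skeleton of how game-playing agents train – just scaled up enormously.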
This is a powerful technique – it’s how systems like AlphaGo, DeepMind’s Go-playing AI, learn. But its results can be quite unpredictable.
Honestly, that’s true for any type of ML. It’s great at recognizing patterns, but that can go awry quickly, leaving you with some major problems on your hands.
Myth 1: Machine learning is objective
This myth is particularly pernicious. People want to believe that the math powering ML is purely logical, without any of the biases that permeate human society, yet machine learning algorithms have done things like:
- Show ads for high-paying executive jobs primarily to men
- Direct police to focus their efforts in majority-Black neighborhoods even when the crime rate was the same between those neighborhoods and majority-white neighborhoods
- Translate languages without gendered pronouns in ways that reinforced gender bias (“He is a doctor, she is a nurse”)
- Declare that words like “gay” are inherently negative, and sentences like “I am gay” are likely to be toxic
The list goes on and on, and it’s all due to the data that the algorithms are given. Even if you don’t explicitly tell an algorithm to, say, translate “o bir doktor” (“they are a doctor”) to “he is a doctor,” it can figure out from your dataset that men are associated in popular culture with doctors, while women are associated with nurses. So it decides that a gender-neutral sentence in one language makes the most sense as a gendered sentence in another, even when that’s incorrect.
Now, that may seem relatively harmless, but the biases baked into many ML algorithms do everything from denying people housing and jobs to giving them overly long prison sentences. Unless we’re very careful – and often there’s no way to be careful enough – machine learning will absorb the biases inherent in our society.
Myth 2: Machine learning is infallible
Most computer scientists wish this were true. But machine learning can fail in all sorts of unpredictable ways. For instance, one algorithm was told to sort a list. It determined that the best way to do so was to return an empty list – technically sorted, but only technically.
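You can see how that loophole opens up with a sketch of a careless scoring function (invented here for illustration, not taken from the original experiment): it rewards sortedness but forgets to reward keeping the elements.

```python
# Toy "reward hacking": a fitness function that rewards sortedness but
# never checks that any elements survive.

def sortedness(xs):
    """Fraction of adjacent pairs in order; an empty list scores perfectly."""
    if len(xs) < 2:
        return 1.0  # nothing out of order... because there's nothing at all
    pairs = len(xs) - 1
    return sum(xs[i] <= xs[i + 1] for i in range(pairs)) / pairs

print(sortedness([3, 1, 2]))  # 0.5 -- half the adjacent pairs are in order
print(sortedness([1, 2, 3]))  # 1.0 -- genuinely sorted
print(sortedness([]))         # 1.0 -- "perfectly sorted" by doing nothing
```

An algorithm optimizing this score has no reason to sort anything: deleting the list is the highest-scoring move available.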
Another algorithm, which was supposed to be simulating jumping robots, learned to exploit a bug in the collision-detection code that propelled it high into the air when it clapped its body parts together. A different robot simulation, told to learn to walk as quickly as possible, grew into a large tower, then fell over. It achieved high rates of forward movement… for a very short amount of time (Lehman et al. 2018).
Needless to say, these weren’t quite what the researchers had intended.
Like examples of bias, examples of weird machine learning failures abound. Like the algorithms that produce sexist or racist results, they’re doing exactly what they’re told to do – just in ways that their creators didn’t expect or desire.
Getting machine learning right
If you’re going to include machine learning in a story, use it to create more problems than it solves, or at least don’t make a one-stop solution for everything. Maybe your characters are able to translate the alien language, but they stumble into a nasty cultural pitfall when they use their flawed translations. Or maybe they’ve got a system to detect anomalous energy patterns, but a crucial clue is miscategorized.
You can use ML for a lot of things – in some ways, it really is Applied Phlebotinum. Want to tell if the movement of enemy starships is a precursor to an attack? Run a few algorithms on their prior movements and see if the current ones fit into the normal patterns. Need to give your dictator a way to immediately squash dissent? Have them use ML to process online chatter and seize potential rebels as soon as they say something vaguely threatening.
But remember that none of that is going to be perfect. Algorithms that try to detect hateful speech often fail if that speech is presented in a pseudo-rational way. Algorithms designed to speed up loan approval or automate prison sentences perpetuate racism and sexism (as Cathy O’Neil’s brilliant Weapons of Math Destruction explains).
Add attackers into the mix, and it gets even more complicated. Adding noise undetectable to the human eye to a picture of a lion can cause a neural net to label it as a library, for instance, while facial detection software can be tricked by something as simple as a t-shirt with a face on it. You can even create images that a computer thinks are objects, but which look nothing like those objects to a human (Nguyen et al. 2015).
So, have fun with ML in your stories, but don’t make it infallible. Not only is that more realistic, it creates a better plot for your characters.
References

Lehman, Joel, et al. “The surprising creativity of digital evolution: A collection of anecdotes from the evolutionary computation and artificial life research communities.” arXiv preprint arXiv:1803.03453 (2018).

Nguyen, Anh, Jason Yosinski, and Jeff Clune. “Deep neural networks are easily fooled: High confidence predictions for unrecognizable images.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.