The Reviewer's Guide to Reviews
When do I have to listen to other people and when do I get to ignore them?
It seems to me there are two general ways to determine if something is good (in the sense that Howl’s Moving Castle is a good movie). The first is simply by experiencing the final output, without looking at or deeply considering how it was made. It seems there are some pretty obvious areas where this is an effective way to determine quality. I think that Sufjan Stevens’ Casimir Pulaski Day is a good song because it makes me want to cry my eyes out. I think that Lua by Bright Eyes and World Spins Madly On by The Weepies are good songs because they are personally significant to me and also make me want to cry my eyes out [1]. Maybe understanding how these songs were made would enhance my appreciation of them - my favourite book is Mikhail Bulgakov’s The Master and Margarita in no small part because of the incredible history of how it came to be published [2] - but it’s not necessary.
In contrast, it feels silly to call a bridge “good” just because of my subjective experience of it. Maybe a bridge is aesthetically pleasing or it feels good to drive across, but I personally think the most important attribute a bridge should possess is structural integrity. Realistically, there’s no way I’m going to be able to determine that, no matter how many times I drive over it or watch sunsets from it. The way to determine if a bridge is good is to look at how it is made, which is the second way to determine quality. Is it likely to be able to survive possible significant weather events? Is there good support across all parts of the bridge? If the answers to these questions and more are yes, I would give that bridge my official stamp of approval.
What this means is that my ability to assess the quality of something depends on whether it is sufficient to simply observe the output or if I would need to see how it’s put together to make that determination. For things in the latter category, it is much more difficult to come to a view on quality. I not only need access to the internals of the thing (either figuratively or literally) but also the expertise to evaluate the quality of its composition. While I can determine if a piece of software is good, I sure can’t tell if my dentist did a good job filling my cavities, and I can only sometimes tell if cooking is good (depending on how worried I am about food poisoning).
One interesting implication of this divide is what it says about the role of other people in coming to a view on quality. For things where I can determine quality on my own (such as with the arts), I don’t particularly need outside help - I can just listen to a song and decide if I like it. In fact, outside influence probably actively interferes with my taste. Knowing that a song has 100 million listens on Spotify vs. 50 thousand is going to change how I view that song, and as much as I wish it weren’t the case, if I saw Pitchfork give an album a 9.5 vs. a 6, that would probably change my subjective experience of that album. While it can be useful to know what other people think to help me decide if it’s worth my time to engage with something (e.g. do I want to go watch Oppenheimer in theatres or not), those opinions will also colour my own view on the film. Put differently, I’d rather blindly trust a friend telling me to read a book than go on Goodreads.
In contrast, I have no choice but to rely on others when it comes to things I otherwise couldn’t assess the quality of. I’m glad we have food safety inspectors to help ensure there are minimal rodents in restaurant kitchens, just as I’m glad we have financial regulators which provide oversight on banks to mitigate the risk they blow up. The trippy insight is that you can’t evaluate the work of quality assessors by any particular output - you would have to look at how they do their jobs. This suggests that in principle, there should be an infinite chain of quality assessors assessing other quality assessors. In practice, we just end up doing this based on vibes and heuristics and other not very good reasons, hope we don’t get burnt too badly or often, and get righteously angry when we’re let down. Ultimately, you just have to trust people and cross your fingers.
There’s one other area where this taxonomy is useful. That’s right, this article is about AI baybeee! [3]
The Twist You Expected
About a year ago I started working through Andrew Ng’s Machine Learning course, but hit a wall when dataset normalization was introduced (essentially, rescaling a dataset’s features so they all sit on comparable ranges). The problem I ran into was that I couldn’t wrap my head around why you would want to do this, and the course material was solely focused on how to normalize rather than when and why we would choose to do so.
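For concreteness, here is a minimal sketch of one common form of normalization, z-score scaling, where each feature is shifted to a mean of 0 and rescaled to a standard deviation of 1. Whether this is the exact variant the course introduced at that point is an assumption, and the function name and housing numbers are made up for illustration.

```python
from statistics import mean, pstdev


def zscore_normalize(column):
    """Rescale a list of numbers to mean 0 and standard deviation 1."""
    mu = mean(column)
    sigma = pstdev(column)  # population standard deviation
    if sigma == 0:
        # Every value is identical, so there is no spread to rescale.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


# Hypothetical features on very different scales.
square_feet = [800, 1200, 1500, 2000, 2500]
bedrooms = [1, 2, 3, 3, 4]

scaled_square_feet = zscore_normalize(square_feet)
scaled_bedrooms = zscore_normalize(bedrooms)

# After scaling, both features are centred on 0 with a spread of 1,
# so neither dominates the other purely because of its units.
print(scaled_square_feet)
print(scaled_bedrooms)
```

Note that the code only shows the how. It says nothing about when normalizing is the right call, which is exactly the gap described above.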
I’m fairly certain I understand what was / is going on now. When it comes to machine learning and AI as a whole, the people developing these systems are determining quality by looking at the output. With the pace at which these systems are developing, I can’t imagine anyone has a thorough understanding of how and why AI generally produces the outputs it does (to say nothing of any particular output). As such, it is sufficient to say that data normalization is useful if it makes the output look better once the data passes through the black box of AI.
If looking at outputs is partially (or more likely, largely) how the designers of AI systems and models evaluate their work, then it is all the general public can do. We evaluate ChatGPT by how sensible the writing it produces is. We evaluate Midjourney by whether it can create an accurate image for the prompt “Exhausted dragon working a desk job with a bonsai tree and picture of her family next to her in a surrealist style”. We can’t meaningfully assess the process by which these tools produce their output.
As such, it seems to me that an important question in determining if some AI output is good is the extent to which we can determine quality based solely on our observation of said output. This is the primary reason for my deep skepticism of the use of AI for data analysis [4] - the most important element of determining the quality of such analysis is understanding how it was done (such as the assumptions baked in and whether any coding has been done correctly), and the risk of using incorrect analysis to inform decision making is potentially very high. Adjacently, part of my hesitance about using LLMs for news comes from a fear of them making things up. For all the problems with the news (not least that ideally I would assess the quality of news by assessing process rather than output, which I can’t do), I trust an Economist article on the recent attack on Iran from Israel a hell of a lot more than whatever ChatGPT outputs. Even if I can’t determine quality from output, I’d still rather put my trust in people than an LLM.
Lastly - what about art? This seems like the most obvious category for determining quality by output - if I watch a film made by AI and enjoy it, isn’t it good? The truth is that I want it to matter whether something is made by people or by AI. I want to prefer things made by people, I feel an instinctual revulsion at knowing that a piece of art is made by AI, and I hate how the training data for these systems has been stolen from artists. It seems to me that the way to reconcile this is to change my standards for determining if art is good - that before saying something is good, I need to know how it was made [5]. Especially since I expect that my emotional revulsion toward AI will be gradually worn down over time (despite my best efforts), this is a standard I will have to consciously fight to hold on to.
I leave you with my favourite poem.
Archaic Torso of Apollo
by Rainer Maria Rilke
We cannot know his legendary head
with eyes like ripening fruit. And yet his torso
is still suffused with brilliance from inside,
like a lamp, in which his gaze, now turned to low,
gleams in all its power. Otherwise
the curved breast could not dazzle you so, nor could
a smile run through the placid hips and thighs
to that dark center where procreation flared.
Otherwise this stone would seem defaced
beneath the translucent cascade of the shoulders
and would not glisten like a wild beast’s fur:
would not, from all the borders of itself,
burst like a star: for here there is no place
that does not see you. You must change your life.
Image from: Marie-Lan Nguyen
Respectively these are the favourite song of a girl I had a crush on in high school and an important song in my last relationship.
Including Bulgakov burning the original manuscript and us being uncertain if our current version is complete or assembled in the intended order.
I somewhat recently saw an idea for an entirely AI consultancy. While I’m not sure it would be much worse than real consultancies, I do not think it would be good.
Very similarly to the recurring conversation about divorcing art from artist and whether art can be said to be good if the person who produced it is distasteful / insert other negatively coded adjective here.