ChatGPT Is a Blurry JPEG of the Web | The New Yorker:
The resemblance between a photocopier and a large-language model might not be immediately apparent—but consider the following scenario. Imagine that you’re about to lose your access to the Internet forever. In preparation, you plan to create a compressed copy of all the text on the Web, so that you can store it on a private server. Unfortunately, your private server has only one per cent of the space needed; you can’t use a lossless compression algorithm if you want everything to fit. Instead, you write a lossy algorithm that identifies statistical regularities in the text and stores them in a specialized file format. Because you have virtually unlimited computational power to throw at this task, your algorithm can identify extraordinarily nuanced statistical regularities, and this allows you to achieve the desired compression ratio of a hundred to one.
Now, losing your Internet access isn’t quite so terrible; you’ve got all the information on the Web stored on your server. The only catch is that, because the text has been so highly compressed, you can’t look for information by searching for an exact quote; you’ll never get an exact match, because the words aren’t what’s being stored. To solve this problem, you create an interface that accepts queries in the form of questions and responds with answers that convey the gist of what you have on your server.
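As a toy illustration of why an exact-quote search fails on such a store, the sketch below keeps only a record of which words were seen following which, then regenerates text from that record. It is a stand-in built on assumptions, not the algorithm the scenario imagines: the names `build_model` and `regenerate` are hypothetical, and a word-transition table is far cruder than the nuanced statistical regularities described above.

```python
import random
from collections import defaultdict


def build_model(text):
    """Keep only the set of words observed after each word.

    Word order beyond adjacent pairs and all frequency counts are
    discarded, so the original text cannot be recovered exactly.
    (Hypothetical helper, for illustration only.)
    """
    words = text.split()
    followers = defaultdict(set)
    for current, following in zip(words, words[1:]):
        followers[current].add(following)
    # Sorted lists make the stored model deterministic.
    return {word: sorted(nexts) for word, nexts in followers.items()}


def regenerate(model, start, length, seed=0):
    """Produce up to `length` words by walking the stored transitions."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        nexts = model.get(out[-1])
        if not nexts:
            break  # no recorded follower, so the walk stops early
        out.append(rng.choice(nexts))
    return " ".join(out)


source = "the cat sat on the mat and the cat ran"
model = build_model(source)
print(regenerate(model, "the", 8, seed=1))
```

Every adjacent pair of words in the regenerated text appeared somewhere in the source, so the output can resemble the original, but the model holds no copy of the original sentence to match a quote against.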