Advent of Technical Writing: Jargon
Published under the Advent of Technical Writing category.
This is the twenty-second post in the Advent of Technical Writing series, wherein I will share something I have learned from my experience as a technical writer. My experience is primarily in software technical writing, but what you read may apply to different fields, too. View all posts in the series.
When you are writing documentation, it is essential that you always keep your target audience in mind. Who do you expect to read your content? What knowledge do you expect your readers will have? Is it reasonable to expect they have that knowledge?
The audience for which you are writing, and the knowledge they have, should inform several parts of your writing, including how you define terms. In general, make sure that you only use terms that are:
- Relevant to the documentation or blog post you are writing.
- Defined on their first use, if they are jargon terms that a reader is unlikely to know.
- Written in both long and short form on their first use if they are abbreviations, then used in short form thereafter.
Consider the following passage, taken from an article by Amazon, which defines the term "zero-shot classification":
Zero-shot classification is a paradigm where a model can classify new, unseen examples that belong to classes that were not present in the training data. For example, a language model that has beed [sic] trained to understand human language can be used to classify New Year's resolutions tweets on multiple classes like career, health, and finance, without the language model being explicitly trained on the text classification task. This is in contrast to fine-tuning the model, since the latter implies re-training the model (through transfer learning) while zero-shot learning doesn't require additional training. The following diagram illustrates the differences between transfer learning (left) vs. zero-shot learning (right).
[diagram]
Yin et al. proposed a framework for creating zero-shot classifiers using natural language inference (NLI). The framework works by posing the sequence to be classified as an NLI premise and constructs a hypothesis from each candidate label. For example, if we want to evaluate whether a sequence belongs to the class politics, we could construct a hypothesis of "This text is about politics." The probabilities for entailment and contradiction are then converted to label probabilities....
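As an aside, if you are curious what Yin et al.'s framework looks like in practice, here is a minimal sketch using the Hugging Face transformers zero-shot classification pipeline, which implements this NLI approach. The bart-large-mnli checkpoint is a common choice for the task; the example tweet and labels are illustrative.

```python
# A minimal sketch of NLI-based zero-shot text classification using the
# Hugging Face transformers pipeline. The checkpoint is a common NLI
# model; the example text and candidate labels are illustrative.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

result = classifier(
    "This year I want to save more and spend less.",
    candidate_labels=["career", "health", "finance"],
    # Each label is turned into an NLI hypothesis, e.g. "This text is about finance."
    hypothesis_template="This text is about {}.",
)
print(result["labels"][0])  # the highest-probability label
```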
Now read this, an excerpt from an article I wrote on zero-shot classification:
Heading: What is Zero-Shot Classification?
Zero-shot classification models are large, pre-trained models that can classify images without being trained on a particular use case.
One of the most popular zero-shot models is the Contrastive Language-Image Pretraining (CLIP) model developed by OpenAI. Given a list of prompts (e.g. "cat", "dog"), CLIP can return a similarity score which shows how similar the embedding calculated from each text prompt is to an image embedding. An embedding is a numeric representation of text or an image which can be compared to measure similarity. Embeddings encode semantics, which means that the embedding for "cat" will be closer to the embedding of an image of a cat than the embedding for "dog" will be.
You can take the highest embedding similarity as a label for the image.
CLIP was trained on over 400 million pairs of images and text. Through this training process, CLIP developed an understanding of how text relates to images. Thus, you can ask CLIP to classify images by common objects (e.g. "cat") or by a characteristic of an image (e.g. "park" or "parking lot"). Between these two capabilities lie many possibilities.
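As an aside, here is a minimal sketch of the CLIP workflow described above, using the Hugging Face transformers library. The checkpoint name is a real OpenAI release; the image path and candidate labels are placeholders.

```python
# A minimal sketch of zero-shot image classification with CLIP, using
# Hugging Face transformers. "photo.jpg" and the labels are placeholders.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

labels = ["cat", "dog"]
image = Image.open("photo.jpg")

# Embed each text prompt and the image, then compute similarity scores.
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# logits_per_image holds image-text similarity; softmax turns the scores
# into label probabilities. The most similar prompt becomes the label.
probs = outputs.logits_per_image.softmax(dim=1)
print(labels[probs.argmax().item()])
```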
Which do you prefer, and why?
(Note: The first example talks about text classification, whereas the second talks about image classification. AWS showed up first in a search for "what is zero shot classification" so I used their example.)
Comparing the excerpts
I want to point out a couple of features of the second text. First, notice how I defined zero-shot classification in the first sentence. My definition is concise. To make it more effective, I could add: "This is in contrast to fine-tuned models, which need to be trained on specific use cases before they can be used." I will need to make that update!
Then, I refer to an example: CLIP. I define CLIP using both its long form ("Contrastive Language-Image Pretraining") and its short form ("CLIP"). I give this a few sentences since it is essential to understanding the topic. Then, I talk more about CLIP.
Here, I assume the reader has a cursory understanding of "training". Zero-shot classification is a specialized form of classification, so someone reading about it probably has some general awareness of machine learning. I also assume the reader knows what "classify" means.
I introduce embeddings, which is a technical term. But it is defined, and with reference to an in-context example.
In contrast, the first example uses more complex words like "paradigm", which add little value to the definition. The second sentence in the first example is a run-on sentence, which makes it difficult to read.
The first example then mentions "transfer learning". Transfer learning is relevant, but it is likely a new machine learning term to the reader. It is explained in a diagram (present in the original article, though not pictured above), but no text is written to supplement and explain the diagram. The article then uses the phrase "constructs a hypothesis from each candidate label", which is relatively hard to understand.
"The probabilities for entailment and contradiction are then converted to label probabilities." is confusing. Someone with limited knowledge of statistics may wonder if "entailment" and "contradiction" are technical terms with meanings, or gloss over them entirely because they are complex.
From the two excerpts above, there are a few things we can learn. First, concise definitions matter. Second, it is key that you define jargon clearly. Third, you should use language that is intuitive to your audience. Transfer learning, for example, is somewhat difficult to understand. My post doesn't mention the term because knowing it doesn't impact your ability to understand and intuit what zero-shot classification is.
Which example would you send in an email to a friend?
If you are used to writing complex technical documents, the first and second examples may seem equally acceptable (the second perhaps even lacking in detail). Put yourself in the shoes of a friend who is learning about classification and has limited statistics knowledge. Which definition above would you send them in an email to help them build their understanding?
Words matter. Technical jargon is unavoidable, but it is our job as technical writers to help introduce readers to jargon at the right pace. As a technical writer, you should minimise jargon, define jargon when it is used, and avoid assuming knowledge that someone is unlikely to have given the subject matter of an article.
