Revising AI-Generated Q&A (Part One): CoT and Material Selection

I once used GPT-4 to generate a prompt for making Anki flashcards. The idea, inspired by auto-prompting, was to have the AI evaluate the quality of its output and keep adjusting its own prompt until the results were as good as possible. The following prompt was tested on Claude and GPT-4:

When creating flashcards, please fully reference SuperMemo's 20 rules, questions from AP exams and other tests, and guiding questions from various tutorials. At the same time, please adhere to the following requirements:

Ensure that the flashcards are concise, clear, and focused on key information.
Questions should be specific and clear, avoiding ambiguity.
Use simple and direct language to ensure the cards are easy to read and understand.
Answers should contain only one key fact/name/concept/term.
Consider the applicability and universality of the questions, ensuring they have practical value in different contexts and knowledge areas.
Pay attention to the reasonableness of the questions, ensuring they do not lead to misunderstandings or doubts.
Here is an overview of SuperMemo's 20 rules:
Do not learn if you do not understand.
Learning materials should be in the form of questions and answers.
Minimum information principle: each question should be as concise as possible.
Avoid Boolean questions (yes/no questions).
Optimize the phrasing of questions.
Use concrete rather than abstract questions.
Focus on applicable knowledge rather than pure theoretical knowledge.
Balance the difficulty of questions, avoiding questions that are too simple or too complex.
Use diverse question formats to enhance learning interest.
Ensure answers are easy to recall and understand.
Use contextual clues to aid recall.
Consider multiple learning perspectives when creating questions.
Break knowledge down into independent, manageable parts.
Use images, charts, and other visual aids when appropriate.
Develop an appropriate review schedule to ensure long-term retention.
Use spaced repetition to enhance memory effectiveness.
Create specific questions for easily confused knowledge points.
Integrate knowledge from different fields to form a higher-level understanding.
Regularly update and modify flashcards to ensure the accuracy and timeliness of knowledge.
Provide feedback and adjustments to your learning process to improve learning outcomes.
Text: The sun is the only star in the solar system, composed of hydrogen and helium. The core temperature of the sun is about 15 million degrees Celsius. At such high temperatures, hydrogen nuclei undergo fusion reactions, forming helium atoms and releasing a large amount of energy.

Based on SuperMemo's 20 rules, we can extract information from the text and create the following flashcards:

Question	Answer
What is the only star in the solar system?	The sun.
What are the two main gases that make up the sun?	Hydrogen and helium.
What is the core temperature of the sun?	15 million degrees Celsius.
What reaction occurs in the sun's high-temperature core involving hydrogen nuclei?	Fusion reaction.
What element is formed from hydrogen nuclei during the fusion reaction?	Helium.
What type of energy is released during the sun's fusion reaction?	A large amount of energy.

Text: The time it takes for the Earth to complete one rotation is 23 hours, 56 minutes, and 4 seconds, a period known as a sidereal day. The time it takes for the Earth to orbit the sun is about 365.24 days, known as a tropical year.
Create flashcards based on SuperMemo's 20 rules:

Question	Answer
What is the time taken for the Earth to complete one rotation called?	Sidereal day.
How long is a sidereal day?	23 hours, 56 minutes, and 4 seconds.
What is the time taken for the Earth to orbit the sun called?	Tropical year.
How many days does a tropical year approximately last?	365.24 days.

This set of flashcards adheres to the following rules:

Rule 2: Learning materials are based on questions and answers.
Rule 3: Minimum information principle, questions are as concise as possible.
Rule 5: Optimize the phrasing of questions to make them clear and understandable.
Rule 10: Ensure answers are easy to recall and understand.
Rule 13: Break knowledge down into independent, manageable parts.
Through these flashcards, we can see how to apply SuperMemo's 20 rules in the actual creation process. Of course, depending on the content of the text and the learning objectives, other rules may need to be adopted. The key is to comprehensively consider these rules when creating flashcards to enhance learning effectiveness.
When creating flashcards, please pay attention to the following points:

Flexibly apply SuperMemo's 20 rules for different knowledge areas and backgrounds.
Combine practical needs, reference questions from AP exams and other tests, as well as guiding questions from various tutorials.
Pay attention to the applicability and universality of the questions, ensuring they have practical value in different contexts and knowledge areas.
Maintain the reasonableness of the questions to avoid misunderstandings or doubts.
By integrating the above points, we can create flashcards more effectively and enhance learning outcomes. Here is the text I want to provide:
。。。。。。。。
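
The prompt above asks for cards as a tab-separated Question/Answer table. As a minimal sketch, assuming the model returns that table as plain text, the reply could be turned into a file for Anki's tab-separated text import like this. The function names `parse_tsv_cards` and `to_anki_tsv` are hypothetical and not part of any tool mentioned here.

```python
import csv
import io


def parse_tsv_cards(model_output: str) -> list[tuple[str, str]]:
    """Extract (question, answer) pairs from tab-separated lines."""
    cards = []
    for line in model_output.splitlines():
        if "\t" not in line:
            continue  # skip prose lines surrounding the table
        question, answer = (part.strip() for part in line.split("\t", 1))
        if (question, answer) == ("Question", "Answer"):
            continue  # skip the header row
        if question and answer:
            cards.append((question, answer))
    return cards


def to_anki_tsv(cards: list[tuple[str, str]]) -> str:
    """Serialize cards as tab-separated text, one card per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(cards)
    return buffer.getvalue()
```

The cards should still be reviewed by hand before import; the sketch only removes the copy-and-paste step.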

Example#

On both Claude and GPT-4, I observed that when the source text explains a single detail through many examples, nearly 80% of the generated Q&A cannot stand independently of the text and needs manual revision.

Detail Fact Decomposition#

GPT-4 is good at posing questions that stand on their own, but when an answer involves multiple steps or points it easily falls into a listing nightmare: it puts the whole list into one answer instead of breaking the points and steps into separate, more memorable cards. It decides what counts as a key fact from the perspective of the question rather than the answer.
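
One way to deal with listing answers is to split them mechanically before manual review. The sketch below is only an illustration under the assumption that points are separated by semicolons or line breaks and optionally numbered; `split_points` and `explode_card` are hypothetical helper names.

```python
import re


def split_points(answer: str) -> list[str]:
    """Split an answer into points on semicolons or newlines."""
    points = []
    for chunk in re.split(r"[;\n]", answer):
        # Drop a leading enumerator such as "1.", "2)", "-" or "*".
        chunk = re.sub(r"^\s*(?:\d+[.)]|[-*•])\s*", "", chunk).strip()
        if chunk:
            points.append(chunk)
    return points


def explode_card(question: str, answer: str) -> list[tuple[str, str]]:
    """Turn one listing card into one card per point."""
    points = split_points(answer)
    if len(points) <= 1:
        return [(question, answer)]  # nothing to split
    total = len(points)
    return [
        (f"{question} (point {index} of {total})", point)
        for index, point in enumerate(points, start=1)
    ]
```

The resulting questions are still crude; they mainly mark which cards need a proper, more specific question written for each point.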

Claude occasionally mixes Chinese and English, and it also produces chained questions: a single Q&A is followed by about three sub-questions that only make sense if you already know the main question. These sub-questions are overly abbreviated, for example "What does this reflect?" answered with "It reflects the consistency between formulas."
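
Questions that lean on the source text or on a previous card can be flagged automatically before review. This is a rough heuristic sketch, not a reliable detector: the phrase list in `DANGLING` is an assumption chosen for illustration, and `needs_context` is a hypothetical name.

```python
import re

# Phrases that usually point back at the text or at an earlier card.
# The list is illustrative and will produce both misses and false alarms.
DANGLING = re.compile(
    r"\b(this|these|those|the (?:text|passage|example|above))\b",
    re.IGNORECASE,
)


def needs_context(question: str) -> bool:
    """Return True if the question probably cannot stand on its own."""
    return DANGLING.search(question) is not None
```

For instance, `needs_context("What does this reflect?")` flags the chained question quoted above, while a self-contained question such as "What is the only star in the solar system?" passes.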

LaTeX Formula Retention#

The LaTeX formulas I use are recognized with the free OCR from Haowei in Quicker, so at least I don't have to pay for Mathpix. I first OCR the text, then recognize the formulas one by one.

Selection of Learning Materials#

A good introductory textbook, such as the "Advanced Algebra" published by Fudan University, lets beginners get started faster. The more concise and logical the source text, the better the generated Q&A. The downside is that the questions are rarely rephrased with a different grammatical subject or object while keeping the same logic and idea. The writing style of the input text must also be consistent; otherwise the Waluigi effect easily becomes stronger.

Length Issue#

Content that elaborates on a statement in detail tends to produce more detailed Q&A, and as the text and the dialogue get longer, the Waluigi effect also grows.
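
Since longer inputs seem to make things worse, one option is to split the material into short passages and prompt with one passage at a time. The sketch below assumes paragraphs are separated by blank lines; the name `chunk_paragraphs` and the `max_chars` budget of 1200 are arbitrary choices for illustration, not tested values.

```python
def chunk_paragraphs(text: str, max_chars: int = 1200) -> list[str]:
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks = []
    current = ""
    for paragraph in (part.strip() for part in text.split("\n\n")):
        if not paragraph:
            continue
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            current = candidate
        else:
            chunks.append(current)  # budget exceeded: close the chunk
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent in a fresh conversation so that the dialogue length does not keep growing.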

Model Issues#

Claude occasionally mixes Chinese and English, but its output is highly consistent and the redundancy of its questions is good; its weakness is that it responds relatively poorly to feedback during debugging. GPT-4 tends to generate overly general questions, but it provides context better than Claude and responds better to feedback; however, it produces too few questions, with poor redundancy.

Prompt Issues#

When writing prompts, pay attention to patterns and specificity, and state what to do rather than what not to do.

Compared with Claude, GPT-4 tends to produce "give an example" questions for dictionary entries (and fewer questions overall), because the prompt above is not optimized for dictionary texts. There is also a problem of fine-grained understanding: neither model is reliable on deep literary questions, in particular the fine subdivisions of language knowledge in classical texts. Because generation is probabilistic, the models usually produce high-quality, correct content, but there is still some probability of incorrect or ambiguous usages, and they may silently "correct" the text toward a usage that looks right but is wrong or ambiguous.
I once wrote a prompt for generating Q&A from dictionary entries and text explanations. It differs greatly from the prompt above:

Sentence miner in language learning is a kind of people who would use the grammar books or dictionaries to create flashcards (mostly in Q&A forms)
For example, text: welcome2 ●●● S2 W3 adjective
1 you’re welcome SPOKEN a polite way of replying to someone who has just thanked you for something
‘Thanks for the coffee.’ ‘You’re welcome.’
2 if someone is welcome in a place, other people are glad that they are there
I had the feeling I wasn’t really welcome.
I didn’t feel welcome in the club.
Mary made us very welcome.
We try to make the new students feel welcome.
3 if something is welcome, you enjoy it because you feel that you need it
The weekend was a welcome break from the pressures of work.
Six months in Scotland would make a welcome change from London.
A cup of tea would be very welcome.
4 if something is welcome, you are glad that it has happened
The increase in interest rates is welcome news for investors.
This new funding will come as a welcome boost for the industry.
5 be welcome to something SPOKEN used to say that someone can have something if they want it, because you certainly do not want it
If you want to take the job you’re welcome to it!
6 be welcome to do something SPOKEN used to invite someone to do something if they would like to
You’re welcome to stay for lunch.
Q&A: Q: How might someone respond with "you're welcome"? (SPOKEN)
A: "You're welcome" is a polite way of replying to someone who has just thanked you for something.
Q: In what situations might someone not feel welcome?
A: If someone is not welcome in a place, it means that other people are not glad that they are there.
Q: What is welcome's meaning in "Mary made us very welcome"?
A: If someone is welcome in a place, other people are glad that they are there. So it means Mary made other people are glad that we are there.
Q: What is welcome's meaning in "The weekend was a welcome break from the pressures of work."?
A: if something is welcome, you enjoy it because you feel that you need it. So it means the break is that I need.
Q: When might someone find something welcome?
A: If something is welcome, it means you enjoy it because you feel that you need it.
Q: What's the meaning of welcome in "The increase in interest rates is welcome news for investors"?
A: if something is welcome, you are glad that it has happened, so it means the increase in interest rates has happened.
Q: How might someone use "you're welcome" to invite someone to do something?
A: "You're welcome to stay for lunch".
Q: What's the meaning of "be welcome to something"(SPOKEN)?
A: It is used to say that someone can have something if they want it, because you certainly do not want it.
Q: What's the meaning of "be welcome to something" in "If you want to take the job you’re welcome to it!"
A: It means I can take the job because the speaker certainly do not want it.
Q: What's the meaning of "be welcome to do something"(SPOKEN)?
A: It is used to invite someone to do something if they would like to.
Q: What's the meaning of "be welcome to do something" in "You’re welcome to stay for lunch."
A: It is used to invite the listener to stay for lunch.
I will provide you the text to generate the Q&A forms.
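
The dictionary prompt uses a line-based "Q: ... / A: ..." format. A minimal sketch for reading such output back into pairs follows, assuming each question and each answer fits on a single line as in the example; `parse_qa` is a hypothetical name.

```python
import re


def parse_qa(output: str) -> list[tuple[str, str]]:
    """Collect (question, answer) pairs from 'Q:' and 'A:' lines."""
    pairs = []
    question = None
    for line in output.splitlines():
        # The example opens with "Q&A: Q: ...", so drop that label first.
        line = re.sub(r"^Q&A:\s*", "", line.strip())
        if line.startswith("Q:"):
            question = line[2:].strip()
        elif line.startswith("A:") and question is not None:
            pairs.append((question, line[2:].strip()))
            question = None  # wait for the next question
    return pairs
```

Answers that span several lines would be truncated by this sketch, so the output is worth checking against the raw reply.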

Logical Thinking (CoT)#

AI-generated Q&A has another flaw: the models often pick a single grammatical part of a sentence and ask about it. Good Q&A should not consist only of pairs like "What is the relationship between xxx and xxx?" / "yyy is the relationship between xxx and xxx"; it should also cover induction, comparison, and the logical flow of the text. Fluent, logical language helps the learner build knowledge and language templates, whereas simplified statements can lack rigor. Templates can be broken down further, for example by continuously refining mathematical or computational thinking, so that the learner eventually learns to combine atomic knowledge modules and atomic method steps to solve complex questions.

Insights:#

Avoid true/false questions: If a Q&A only requires a judgment of right or wrong without providing a reason, it can easily confuse and frustrate learners.
Background and references: To make Q&A more independent of the text, we need to provide clear background and references, letting learners know which academic field the knowledge belongs to. For example, a function in mathematics is a type of mapping relationship, while in computer science it is another term for a method. Similar to citations in the literature, we can record the source of the information for later reference and revision.
Diversity and redundancy of questions: To give learners a deeper understanding of a detail, we need to ask questions from different angles to stimulate their active recall and thinking abilities. Therefore, strive to generate multiple quality questions. These questions will approach the topic from different angles, examining understanding at different levels. For example, for a concept, we can simultaneously inquire about its definition, function, attributes, etc.
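
The background and redundancy points above can be sketched as a small data structure. Everything here is illustrative: the `Card` fields, the `ANGLES` tuple (taken from the definition, function, and attributes example), and the wording produced by `angle_instructions` are assumptions rather than a tested prompt.

```python
from dataclasses import dataclass

# Angles from which the same concept can be questioned.
ANGLES = ("definition", "function", "attributes")


@dataclass
class Card:
    question: str
    answer: str
    field: str   # academic field, e.g. "Mathematics"
    source: str  # where the fact came from, for later revision

    def front(self) -> str:
        """Prefix the question with its field so it stands on its own."""
        return f"[{self.field}] {self.question}"


def angle_instructions(concept: str, field: str) -> list[str]:
    """Build one generation instruction per angle for a concept."""
    return [
        f"In {field}, write one flashcard about the {angle} of {concept}."
        for angle in ANGLES
    ]
```

With this, "What is a function?" becomes "[Mathematics] What is a function?" on the card front, which separates it from the computer science sense of the word.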
Prompts and cards need to be continuously modified; in this article, I will record different versions of prompts and their effects.

Just as:

Generate flashcards based on text. (Tentative)
Flashcards are a powerful learning tool. They’re also a pain in the butt to make.
Some readers said they were using ChatGPT to generate flashcards for subjects they’re studying. This seems well within the LLM abilities as a “calculator for words.” Thus, with the correct prompts, you could get fairly good results here—provided you’re inputting the material you wish to see transformed into flashcards and not expecting the LLM to get the facts on its own (see below).
However, given the difficulty of making “good” flashcards, I wouldn’t enter any into my Anki without reviewing them first. Nonetheless, making flashcards is tedious, so getting a first draft that I later review might speed up the process considerably. The risks seem relatively limited if you confirm the cards’ correctness before putting them in your deck. (Scott H. Young)