Completion Weaving

Completion weaving is a powerful technique for dramatically speeding up GPT-3 API calls. It’s most useful when you’re generating many items of a list whose order doesn’t matter.

GPT-3 performance suffers as you increase response length

For the purposes of this guide, our goal is to generate a list of blog post ideas about artificial intelligence. Let’s start with a naive approach.

Naive Code

const axios = require('axios')

const prompt = `A list of 25 blog posts about artificial intelligence:
1) The difference between artificial intelligence and machine learning
2) Neural Networks: A Beginner's Guide
3)`

async function gpt() {
    const data = {
        prompt,
        max_tokens: 300,
        temperature: 0.8,
        n: 1,
        // Stop before "4)" so the completion contains exactly one new idea.
        stop: [`4)`]
    };
    const result = await axios({
        method: 'post',
        url: 'https://api.openai.com/v1/engines/ada/completions',
        data,
        headers: {
            Authorization: 'Bearer <your-token>'
        }
    });
    const ideas = parseResults(result.data.choices[0].text);
    console.log(ideas);
}

function parseResults(text) {
    // Split response into each item in the list.
    return text.split('\n')
    // Remove e.g. 4) and 2) before each element in the list.
    .map(l => {
        if (l.indexOf(')') === -1) return l;
        return l.substr(l.indexOf(')') + 1);
    })
    // Remove any whitespace.
    .map(l => l.trim())
    // Remove any empty items.
    .filter(l => l.length > 0)
}

(async () => {
    await gpt();
})();

This works, and will pretty reliably give you a new idea every time. If you want more results, you can simply move the stop token further out, e.g. from 4) to 10).
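To see what parseResults does, here is the same function run on a hard-coded sample completion (the sample text is made up for illustration):

```javascript
// Same parsing logic as above: split into lines, strip the "N)" prefixes,
// trim whitespace and drop empty entries.
function parseResults(text) {
    return text.split('\n')
        .map(l => (l.indexOf(')') === -1 ? l : l.substr(l.indexOf(')') + 1)))
        .map(l => l.trim())
        .filter(l => l.length > 0);
}

// A typical raw completion, continuing the prompt that ended with "3)".
const sample = ' The future of AI\n4) AI in healthcare\n5) ';
const ideas = parseResults(sample);
// ideas is ['The future of AI', 'AI in healthcare']
```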

Performance as we increase the number of ideas

But what happens if you do that? I used this exact prompt, increased the number of requested items from 1 up to 15, and measured the response time of each call (averaged over 10 runs). This is the result:

A graph showing that the increased response length dramatically reduces api call performance

You can see the trend.
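If you want to reproduce this measurement, a small timing harness along these lines will do (a sketch; averageMs is my own helper, and fn stands in for the gpt call above):

```javascript
// Sketch of the benchmark: run an async function several times and
// average its wall-clock duration in milliseconds.
async function averageMs(fn, runs = 10) {
    let total = 0;
    for (let i = 0; i < runs; i++) {
        const start = Date.now();
        await fn();
        total += Date.now() - start;
    }
    return total / runs;
}

// Usage: const avg = await averageMs(gpt, 10);
```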

A better way

Luckily, there’s a way to dramatically improve the performance. We can use the “n” parameter to request multiple completions in parallel. Instead of asking for one completion containing 15 items, we’ll ask for 3 items per completion with n = 5. That gives us 5 different versions of those 3 items, so 15 in total.
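The arithmetic above generalizes. As a sketch (the helper name and shape are my own, not from the API): given the index of the next list item in the prompt, the number of items we want per completion, and the number of parallel versions, we can compute the request parameters directly.

```javascript
// Hypothetical helper: compute the n and stop parameters for a
// completion-weaving request. The completion generates items from
// nextItem up to (but not including) the stop token.
function weaveParams(nextItem, itemsPerCompletion, versions) {
    return {
        n: versions,
        stop: [`${nextItem + itemsPerCompletion})`],
        totalItems: itemsPerCompletion * versions
    };
}

// Our prompt ends at "3)"; 3 items per completion, 5 versions: 15 total.
const params = weaveParams(3, 3, 5);
// params is { n: 5, stop: ['6)'], totalItems: 15 }
```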

With completion weaving, we’d request 15 new results like this:

Completion Weaving Code

const axios = require('axios')
const _ = require('lodash');
const prompt = `A list of 25 blog posts about artificial intelligence:
1) The difference between artificial intelligence and machine learning
2) Neural Networks: A Beginner's Guide
3)`

async function gpt() {
    const data = {
        prompt,
        max_tokens: 300,
        temperature: 0.8,
        // New: request 5 completions in parallel, each stopping before "6)"
        // (items 3, 4 and 5, i.e. 3 new ideas per completion, 15 total).
        n: 5,
        stop: [`6)`]
    };
    const result = await axios({
        method: 'post',
        url: 'https://api.openai.com/v1/engines/ada/completions',
        data,
        headers: {
            Authorization: 'Bearer <your-token>'
        }
    });
    // New: parse each of the 5 completions, then flatten them into one list.
    const variations = result.data.choices
        .map(c => c.text)
        .map(c => parseResults(c));
    const ideas = _.flatten(variations);
    console.log(ideas);
}

function parseResults(text) {
    return text.split('\n')
    .map(l => {
        if (l.indexOf(')') === -1) return l;
        return l.substr(l.indexOf(')') + 1);
    })
    .map(l => l.trim())
    .filter(l => l.length > 0)
}

(async () => {
    await gpt();
})();

Much faster

And how does the performance compare?

Completion weaving performance

So we can get 15 ideas in roughly the time a 3-item request takes, about twice as fast as asking for all 15 in a single completion.
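One caveat worth noting: because the n completions are sampled independently, some ideas may repeat across versions. A small deduplication pass over the flattened list handles that (a sketch; dedupe is my own helper, shown here on made-up data):

```javascript
// Remove duplicate ideas (case-insensitive) while preserving order.
function dedupe(ideas) {
    const seen = new Set();
    return ideas.filter(idea => {
        const key = idea.toLowerCase();
        if (seen.has(key)) return false;
        seen.add(key);
        return true;
    });
}

const unique = dedupe(['AI in healthcare', 'ai in healthcare', 'AI and ethics']);
// unique is ['AI in healthcare', 'AI and ethics']
```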

Conclusion

I hope you found this overview of Completion Weaving helpful. It’s a useful technique that I’m using myself.

Is there anything you’re struggling with? Leave a comment and I’ll try to help.
