Keeping Context Throughout a Conversation

Is there a way with the generation api to hold the context of a conversation/chat?

Currently, I pass a “base” prompt into the API and prepend it to the latest question/input from the user. This works OK, but doesn’t really hold the context of the conversation very well. Ideally, you could start a “session” using an initial prompt and that prompt would be updated in real time as the conversation progressed (gathering the context that is created along the way).

Please let me know if there is a way to accomplish this - thanks!

Hey! I have had a lot of success by just keeping track of the dialogue and putting it back into the prompt.

So I start with a base prompt like this:

This is a chat log between the shopkeeper and the player
player: hello! how are you?
shopkeeper: I am well. What can i get you today?

Then, when I get a question from the player, I append it like this:

This is a chat log between the shopkeeper and the player
player: hello! how are you?
shopkeeper: I am well. What can i get you today?
player: <the player's latest question>
shopkeeper:

I then save the resulting sample from the model, so the prompt keeps building up.

In this way I am able to have stateful conversations like this:

This is a chat log between the shopkeeper and the player
player: hello! how are you?
shopkeeper: I am well. What can i get you today?
player: do you sell magical hats!
shopkeeper: yes i do.
player: how much are they?
shopkeeper: magical hats are 20 dollars
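
If it helps, here is a rough sketch of that loop in Python. The generate() helper is just a placeholder for whatever generation endpoint you are calling, not a real SDK function:

```python
# Minimal sketch of a stateful chat loop. generate() is a placeholder
# for your generation API call; it is not part of any real SDK.

BASE_PROMPT = (
    "This is a chat log between the shopkeeper and the player\n"
    "player: hello! how are you?\n"
    "shopkeeper: I am well. What can i get you today?\n"
)


def generate(prompt: str) -> str:
    """Placeholder: call your generation API here and return the completion."""
    raise NotImplementedError


def chat() -> None:
    history = BASE_PROMPT
    while True:
        question = input("player: ")
        # Append the new question and leave the shopkeeper line open
        # for the model to complete.
        history += f"player: {question}\nshopkeeper:"
        # Cut the completion at the next "player:" line (or use a stop
        # sequence) so the model doesn't keep writing both sides.
        reply = generate(history).split("\nplayer:")[0].strip()
        # Save the reply so the context keeps building up turn by turn.
        history += f" {reply}\n"
        print(f"shopkeeper: {reply}")
```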


Cool - thanks - I will try this.

Do you know the maximum size (number of characters?) for a prompt?

Hi Shanen!

For generation, the way we think about the maximum size is the sum of the prompt and the output. While it may vary per model, we generally cap that maximum size (prompt + generation) at 2048 tokens.

If you exceed the cap, the request will return an error.

Is there some code/documentation to calculate the number of tokens?

Hey @thinkevolve,
We’ll be adding this to our SDKs in the new year! I’ll be sure to ping you when this feature is live.
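
In the meantime, a rough workaround is a character-based estimate. This is only an approximation (about 4 characters per token for English text), not the model's real tokenizer, so leave yourself some margin:

```python
def approx_token_count(text: str) -> int:
    """Very rough token estimate: ~4 characters per token for English text.

    This is only a stopgap heuristic until an official token counter is
    available, so treat the result as an approximation.
    """
    return max(1, len(text) // 4)
```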


Hi @elliott, when you mention “generally at 2048”, what are the next reasonable / cost-effective limits, and what would the associated cost be?
Thanks!

Hi MGE,

Apologies for not being clear. We cap the input + output tokens at 2048, which means if you were to input 1024 tokens you could only generate a maximum of 1024 tokens. In terms of reasonable / cost-effective limits, increasing the input length gives you more context but leaves fewer tokens for the output.

Let me know if this helps or if you have any more questions 🙂
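
To make that tradeoff concrete, here is a small sketch of one way to trim old turns so the prompt leaves room for the reply. The 4-characters-per-token estimate, the 256-token reserve, and the helper names are just placeholder assumptions, not anything the API defines:

```python
MAX_TOTAL_TOKENS = 2048    # hard cap on prompt + generation
RESERVED_FOR_OUTPUT = 256  # tokens to keep free for the reply (arbitrary choice)


def approx_token_count(text: str) -> int:
    # Rough heuristic (~4 characters per token); swap in a real tokenizer
    # once token counting is available in the SDK.
    return max(1, len(text) // 4)


def trim_history(base_prompt: str, turns: list[str]) -> str:
    """Drop the oldest turns until the prompt fits the remaining budget."""
    budget = MAX_TOTAL_TOKENS - RESERVED_FOR_OUTPUT
    kept = list(turns)
    while kept and approx_token_count(base_prompt + "".join(kept)) > budget:
        kept.pop(0)  # discard the oldest turn first
    return base_prompt + "".join(kept)
```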

I see, Elliott.
But is there still a hard cap of 2048 tokens? Or is it a soft one that we can change?

Hey @MGE,

Sorry for the super late reply - it's a hard cap currently.