Date: 2023-09-27
Limits
During the open beta, the following limits are in place:
Inference requests per minute (per model)
- @cf/meta/llama-2-7b-chat-int8 - 50 reqs/min
- @cf/openai/whisper - 4000 reqs/min
- @cf/meta/m2m100-1.2b - 4000 reqs/min
- @cf/huggingface/distilbert-sst-2-int8 - 6000 reqs/min
- @cf/microsoft/resnet-50 - 6000 reqs/min
- @cf/baai/bge-base-en-v1.5 - 6000 reqs/min
Note that these limits are estimates: they are subject to change and will vary by location during the open beta.
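One way a client might stay under the per-model request limits is to throttle on its own side before sending. The following is a minimal sketch in Python, not part of the platform: the `ModelThrottle` class and its `try_acquire` method are hypothetical names, and the table simply copies the estimates listed above, so it would need updating whenever the limits change.

```python
import time
from collections import deque

# Per-model limits copied from the list above (requests per minute).
# They are estimates and may change, so treat this table as a snapshot.
REQS_PER_MIN = {
    "@cf/meta/llama-2-7b-chat-int8": 50,
    "@cf/openai/whisper": 4000,
    "@cf/meta/m2m100-1.2b": 4000,
    "@cf/huggingface/distilbert-sst-2-int8": 6000,
    "@cf/microsoft/resnet-50": 6000,
    "@cf/baai/bge-base-en-v1.5": 6000,
}


class ModelThrottle:
    """Hypothetical client-side sliding-window limiter, one window per model."""

    def __init__(self, limits=REQS_PER_MIN, window_seconds=60.0, clock=time.monotonic):
        self.limits = limits
        self.window = window_seconds
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.sent = {model: deque() for model in limits}

    def try_acquire(self, model):
        """Return True and record the request if the model is under its limit."""
        now = self.clock()
        stamps = self.sent[model]
        # Drop timestamps that have aged out of the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) >= self.limits[model]:
            return False
        stamps.append(now)
        return True
```

A caller would check `try_acquire(model)` before each inference request and wait or queue the request when it returns `False`. Because the published limits vary by location, a local throttle like this can reduce rejected requests but cannot guarantee that none occur.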
Other Limits
- @cf/meta/llama-2-7b-chat-int8 (max tokens) - 768 input / 256 output
- @cf/meta/m2m100-1.2b (max tokens) - 256 output
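The token limits above can be checked before a request is sent. The sketch below is illustrative only: `check_token_budget` is a hypothetical helper, the token counts are assumed to come from the caller (this page does not say how tokens are counted), and models without a listed cap are passed through unchanged.

```python
# Token caps copied from the list above; None means no input cap is listed.
TOKEN_LIMITS = {
    "@cf/meta/llama-2-7b-chat-int8": {"input": 768, "output": 256},
    "@cf/meta/m2m100-1.2b": {"input": None, "output": 256},
}


def check_token_budget(model, input_tokens, requested_output_tokens):
    """Return the output-token budget to request, or raise if the input is too long.

    Token counts must be supplied by the caller; this sketch does not tokenize.
    """
    limits = TOKEN_LIMITS.get(model)
    if limits is None:
        # No token cap is listed for this model, so pass the request through.
        return requested_output_tokens
    max_in = limits["input"]
    if max_in is not None and input_tokens > max_in:
        raise ValueError(f"{model}: {input_tokens} input tokens exceeds {max_in}")
    # Clamp the requested output length to the listed cap.
    return min(requested_output_tokens, limits["output"])
```

For example, asking for 512 output tokens from `@cf/meta/llama-2-7b-chat-int8` with a 700-token input would be clamped to 256, while an 800-token input would raise a `ValueError`.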