Hi Colin MacWilliam,
Thanks for reaching out. I understand how urgent and frustrating it can be to hit token or throughput limits, especially when your workload depends on Azure OpenAI running smoothly. Tokens-per-minute (TPM) quota is assigned to your subscription per region and per model, and you then allocate it to individual deployments. If you still have unallocated quota in that region, you can raise a deployment's TPM yourself. Going beyond your total assigned quota, however, requires a request that Microsoft reviews, not a configuration change on your side.
The first step is to submit a quota-increase request. From your Azure OpenAI resource, open the quota page (labeled “Quotas” or “Usage + quotas”, depending on the portal view) and select “Request increase” or the equivalent request link. In the request, clearly explain your use case, your expected traffic (for example, peak tokens per minute), and why the current quota is insufficient. This gives the reviewers what they need to evaluate your request.
Also be aware that some subscription types, such as free, student, or trial plans, come with stricter limits. For those, Microsoft may not be able to grant the higher limits you request unless you upgrade to a higher-tier or enterprise subscription. This is expected and part of Azure’s quota-management process.
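In the meantime, review your usage to make sure prompt size or request frequency isn't unintentionally pushing you over the limits. Azure estimates each request's token cost from the prompt plus the max_tokens setting, so trimming prompts and setting max_tokens no higher than you actually need frees up TPM headroom. Combining many small requests into fewer calls can also help you stay under the requests-per-minute limit while you wait for the quota increase to be approved.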
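If you want to pace traffic on the client side while the request is pending, a minimal sketch like the one below may help. It assumes a single Python process and a TPM figure that you pass in yourself, and it uses the rough rule of thumb of about 4 characters per token for English text. The names estimate_request_tokens and TokenBudget are hypothetical, for illustration only; they are not part of any Azure SDK. The sketch counts prompt tokens plus max_tokens because that is, as far as I know, how Azure's rate limiter estimates a request's cost, but the service's actual accounting may differ, so you should still handle HTTP 429 responses.

```python
import math
import time
from collections import deque
from typing import Callable, Deque, Tuple


def estimate_request_tokens(prompt: str, max_tokens: int, chars_per_token: float = 4.0) -> int:
    """Rough cost of one request: estimated prompt tokens plus the completion budget.

    ASSUMPTION: ~4 characters per token is a heuristic for English text,
    not an exact count. Use a real tokenizer if you need precision.
    """
    return math.ceil(len(prompt) / chars_per_token) + max_tokens


class TokenBudget:
    """Hypothetical client-side sliding-window limiter for a tokens-per-minute cap."""

    def __init__(self, tokens_per_minute: int, clock: Callable[[], float] = time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock
        self.window: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens) per request
        self.used = 0  # always equals the sum of tokens currently in the window

    def _evict(self, now: float) -> None:
        # Forget requests sent 60 seconds or more ago.
        while self.window and now - self.window[0][0] >= 60.0:
            _, tokens = self.window.popleft()
            self.used -= tokens

    def wait_time(self, tokens: int) -> float:
        """Seconds to wait before `tokens` more can be sent without exceeding the cap."""
        if tokens > self.limit:
            raise ValueError("a single request exceeds the per-minute budget")
        now = self.clock()
        self._evict(now)
        if self.used + tokens <= self.limit:
            return 0.0
        # Walk oldest-first until enough earlier usage will have left the window.
        freed = 0
        for ts, t in self.window:
            freed += t
            if self.used - freed + tokens <= self.limit:
                return ts + 60.0 - now
        return 60.0  # not reached: freeing the whole window always makes room

    def record(self, tokens: int) -> None:
        """Note that a request costing `tokens` was just sent."""
        now = self.clock()
        self._evict(now)
        self.window.append((now, tokens))
        self.used += tokens

    def acquire(self, tokens: int, sleep: Callable[[float], None] = time.sleep) -> None:
        """Block until the request fits in the budget, then record it."""
        delay = self.wait_time(tokens)
        while delay > 0:
            sleep(delay)
            delay = self.wait_time(tokens)
        self.record(tokens)
```

Before each call you would run something like budget.acquire(estimate_request_tokens(prompt, max_tokens)) and then send the request. Setting the budget somewhat below your deployment's TPM leaves a margin for estimation error and for other clients sharing the same deployment.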
Please let me know if you have any other questions or need more details; I'll be glad to provide further clarification or guidance.
Thank you!