Same issue here.
Deployed gpt-5.2 to Foundry, running into rate limit issues or very slow responses
Hello,
I deployed gpt-5.2 to Azure -
My rate limit is set pretty high, and I am the only user testing this right now, yet I am getting:
{"type":"response.failed","sequence_number":3,"response":{"id":"resp_077206c4b441379001693c2b62d0308196afcbb0353a0c03ce","object":"response","created_at":1765550946,"status":"failed","background":false,"content_filters":null,"error":{"code":"rate_limit_exceeded","message":" | ==================== d001-20251211012732-api-default-78bd44c5dc-9w645 ====================\n | Traceback (most recent call last):\n | \n | File "/usr/local/lib/python3.12/site-packages/inference_server/routes.py", line 726, in streaming_completion\n | await response.write_to(reactor)\n | \n | oai_grpc.errors.ServerError: | no_kv_space\n | "},
Last night the same deployment was working. Today the responses are either slow, or I get a rate_limit_exceeded error on a simple prompt like "Hello".
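The traceback in the error mentions no_kv_space, which makes me suspect (this is only my assumption, not something I have confirmed) that the failure is a capacity problem on the serving side rather than my own quota being exhausted. Below is a minimal sketch of how a client could tell the two cases apart and retry with exponential backoff. The names classify_failure and call_with_backoff and the send callable are hypothetical helpers of mine, not part of any SDK, and the event shape is taken only from the error shown above.

```python
import random
import time


def classify_failure(event: dict) -> str:
    """Classify a response.failed event as 'server_capacity', 'rate_limit', or 'other'.

    Assumption: a message containing 'no_kv_space' points to server-side
    capacity, even though the reported code is 'rate_limit_exceeded'.
    """
    error = (event.get("response") or {}).get("error") or {}
    code = error.get("code") or ""
    message = error.get("message") or ""
    if "no_kv_space" in message:
        return "server_capacity"
    if code == "rate_limit_exceeded":
        return "rate_limit"
    return "other"


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it stops failing with a retryable error.

    send is a hypothetical zero-argument callable that performs one request
    and returns the final event as a dict.
    """
    event = {}
    for attempt in range(max_attempts):
        event = send()
        if event.get("type") != "response.failed":
            return event  # success, nothing to retry
        if classify_failure(event) == "other":
            return event  # not a throttling-style failure, give up
        if attempt < max_attempts - 1:
            # Exponential backoff with a little jitter between attempts.
            sleep(base_delay * 2 ** attempt + random.uniform(0, 0.5))
    return event  # still failing after max_attempts
```

This does not fix anything on the service side. It only makes the client tolerate transient failures and lets me log which kind of failure I am actually hitting.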
Is it too early to use gpt-5.2 in Microsoft Foundry?