FriendliAI Expands to San Francisco to Scale Frontier AI Inference for Open-Weight and Custom Models

New 7,000-square-foot SoMa office anchors FriendliAI’s global push as AI agents drive a generational surge in token consumption

FriendliAI, The Frontier AI Inference Cloud, today announced the opening of its new San Francisco office at 20 Hawthorne Street, occupying 7,000 square feet in the historic Crown Point Press building, around the corner from the San Francisco Museum of Modern Art. The expansion places FriendliAI at the heart of the Bay Area AI ecosystem and closer to the customers, partners, and developers building the next generation of AI applications.

The expansion lands at an inflection point for AI inference. Two forces are driving the shift. First, AI agents — which plan, reason across many steps, and call tools on every turn — require five to thirty times more tokens per task than chatbots, and that consumption compounds as agents move from pilots into always-on production workflows. Second, the latest open-weight models, including Z.ai’s GLM-5.1, Moonshot AI’s Kimi K2.6, DeepSeek V4, and NVIDIA Nemotron 3, now match or exceed leading closed models such as Anthropic’s Claude Opus at a fraction of the cost, while custom fine-tunes align even more tightly with enterprise use cases. Production-grade inference infrastructure has become the bottleneck — and the prize.

“San Francisco is the epicenter of AI innovation, and a deeper presence here lets us partner with the customers and developers shaping what comes next,” said Byung-Gon Chun, CEO of FriendliAI. “The industry is no longer asking whether to build with AI — it’s asking how to run AI in production, profitably, at scale. FriendliAI, The Frontier AI Inference Cloud, was built for exactly that.”

“Inference is where AI economics are won or lost,” said Brian Yoo, Chief Business Officer at FriendliAI. “Every percentage point of GPU efficiency translates directly to margin, and every millisecond of latency translates to user experience. Putting senior commercial and engineering leadership on the ground in San Francisco lets us move at the speed our customers need as they scale.”

FriendliAI was founded by Professor Byung-Gon Chun and members of his research team at Seoul National University, where they pioneered continuous batching — the inference optimization technique that is now an industry standard. Today FriendliAI runs state-of-the-art open-weight and custom models at production scale with industry-leading throughput, latency, and reliability. Independent benchmarks from Artificial Analysis and OpenRouter rank FriendliAI as the top inference provider for models such as GLM-5.1 and Gemma 4 across output speed, latency, tool calling, and structured outputs. The company partners with model creators on launch — most recently as a Day 0 partner for NVIDIA Nemotron 3 and Z.ai’s GLM-5.1 — and with cloud providers including AWS, OCI, and Samsung Cloud Platform on infrastructure to scale globally.

Customers including Twelve Labs and LG are already scaling with FriendliAI in production, and that momentum is translating into rapid business growth. FriendliAI is on a trajectory to grow revenue tenfold this year, with a goal of another tenfold the year after, as AI-native and AI-augmented SaaS companies migrate production workloads to its platform. The San Francisco expansion is built to support that trajectory: FriendliAI plans to significantly grow its U.S. team across go-to-market, partnerships, and engineering functions over the coming year.

The bright, loft-style space is also purpose-built as a hub for the AI builder community, hosting developer meetups, hackathons, and executive briefings on the practical realities of deploying inference at scale — from open-weight model deployments and GPU efficiency to multimodal and agentic workloads.

Learn more at www.friendli.ai, or explore open roles in San Francisco at friendli.ai/careers.

About FriendliAI

FriendliAI is The Frontier AI Inference Cloud. Built by the researchers who invented continuous batching, an inference optimization technique that is now an industry standard, FriendliAI efficiently runs state-of-the-art open-weight and custom models, enabling model ownership and advanced performance tuning. By optimizing every layer of the inference stack — from GPU kernels to serving infrastructure — FriendliAI delivers industry-leading throughput, latency, and reliability for engineers deploying frontier AI in production.
