What this does

Adjusts runtime parameters of the embedded API server including port binding, context window size, GPU offload split, temperature, and maximum token output. After completion, the server serves requests with the desired behavior and performance.

Steps

Open the Server tab and load a model. Click the Server icon in the navigation bar. Use the model selector to choose a model, then click Load to place it in memory.
Set the bind host and port. In the server configuration panel, locate the Host and Port fields. The default host is 127.0.0.1 (localhost only). To allow connections from other machines, change the host to 0.0.0.0.
Adjust inference parameters. Locate the inference sliders for context length, temperature, and max tokens. Slide each parameter to the desired value. Changes take effect on the next request.
Enable GPU acceleration if available. If the system has a compatible GPU, locate the GPU Offload toggle and enable it. Adjust the offload percentage to match available VRAM.
Restart the server to apply changes. After modifying settings, click Stop Server (if running) then Start Server again.

Verification

curl -s http://localhost:1234/v1/models
# Expected: a JSON response showing the loaded model with applied settings

Common failures

Port already in use — The chosen port is claimed by another process. Run netstat -an | grep <port> to identify the conflict, or choose a different port.
Connection from remote hosts fails — The host is still bound to 127.0.0.1. Change the binding to 0.0.0.0 and ensure the firewall allows incoming connections.
GPU offload causes out-of-memory errors — Reduce the offload split to 50% or lower.
Temperature slider has no effect — Some model quantizations are tuned to specific temperature ranges. Verify the model's configuration card for recommended bounds.

How to configure LM Studio server settings

What this does

Steps

Verification

Common failures

Related guides