TensorFlow
TensorFlow is an open-source machine learning framework developed by Google. Operators encounter it as an alternative to PyTorch for training or running models, but most local inference tools do not load TensorFlow models directly: llama.cpp and Ollama run GGUF files, and vLLM loads Hugging Face-format PyTorch or safetensors weights. TensorFlow models ship in their own formats (SavedModel, TF Lite) and usually need conversion (e.g., to ONNX, or through PyTorch weights to GGUF) before they run in those tools. TensorFlow's graph compilation can yield faster inference on some hardware, notably Google's TPUs, but its ecosystem is less common in the local LLM space.
Deeper dive
TensorFlow 1.x used a static graph: you define the entire computation graph before running it, which enables optimizations like graph pruning and XLA compilation. TensorFlow 2.x (2019) adopted eager execution by default, making it more Pythonic and similar to PyTorch, while keeping graph compilation available through tf.function. However, most local LLM operators work with PyTorch-format weights, because the Hugging Face Hub (the dominant model hub) and its Transformers library are PyTorch-first, and llama.cpp's GGUF conversion scripts read Hugging Face checkpoints. llama.cpp itself is a C/C++ runtime and does not depend on PyTorch. TensorFlow models can be converted to ONNX or TensorFlow Lite for edge deployment, but the local LLM community rarely uses TensorFlow directly. For inference, TensorFlow Serving can serve SavedModels on CPU or GPU and TF Lite targets mobile and edge devices, but the operator workflow typically involves converting a TensorFlow checkpoint to a format compatible with local runtimes.
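The difference between eager execution and a static graph can be sketched in plain Python. This is a toy illustration, not TensorFlow's API: the Graph class, its method names, and the node names are all hypothetical. It shows why defining the graph up front makes pruning possible: nodes that the requested output does not depend on are never evaluated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


def eager_affine(x: float, w: float, b: float) -> float:
    """Eager style: every operation runs immediately and returns a value."""
    return x * w + b


@dataclass(frozen=True)
class Node:
    """One recorded operation: an op name plus the names of its inputs."""
    name: str
    op: str
    inputs: Tuple[str, ...]


class Graph:
    """Toy deferred graph (hypothetical, not TensorFlow's API)."""

    OPS: Dict[str, Callable[[float, float], float]] = {
        "mul": lambda a, b: a * b,
        "add": lambda a, b: a + b,
    }

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}

    def add(self, name: str, op: str, *inputs: str) -> str:
        # Nothing is computed here; the operation is only recorded.
        self.nodes[name] = Node(name, op, inputs)
        return name

    def required_nodes(self, output: str) -> List[str]:
        """Return the nodes `output` depends on, in evaluation order."""
        order: List[str] = []
        seen = set()

        def visit(name: str) -> None:
            if name in seen or name not in self.nodes:
                return  # already visited, or an input supplied at run time
            seen.add(name)
            for dep in self.nodes[name].inputs:
                visit(dep)
            order.append(name)

        visit(output)
        return order

    def run(self, output: str, feeds: Dict[str, float]) -> float:
        """Evaluate only the nodes that `output` needs (graph pruning)."""
        values = dict(feeds)
        for name in self.required_nodes(output):
            node = self.nodes[name]
            values[name] = self.OPS[node.op](*(values[i] for i in node.inputs))
        return values[output]


if __name__ == "__main__":
    g = Graph()
    g.add("wx", "mul", "x", "w")
    g.add("y", "add", "wx", "b")
    g.add("unused", "mul", "x", "x")  # not needed for "y", so it is skipped

    print(eager_affine(2.0, 3.0, 1.0))                  # 7.0
    print(g.required_nodes("y"))                        # ['wx', 'y']
    print(g.run("y", {"x": 2.0, "w": 3.0, "b": 1.0}))  # 7.0
```

In real TensorFlow 2.x code the same trade-off shows up when a Python function is wrapped with tf.function: the function is traced into a graph once and the graph is reused on later calls.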
Practical example
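A model like BERT was originally released as a TensorFlow checkpoint. llama.cpp does not read TensorFlow checkpoints, so to run BERT there you would first convert the weights to PyTorch format (for example with Hugging Face Transformers) and then convert that checkpoint to GGUF with llama.cpp's convert_hf_to_gguf.py. If you try to load a TensorFlow SavedModel directly in Ollama or vLLM, it will fail: Ollama is built around GGUF, and vLLM expects Hugging Face-format PyTorch or safetensors weights. On an RTX 3060 (12 GB VRAM), a TensorFlow model can usually run through TensorFlow's own GPU support (TF Lite is aimed at mobile and edge devices rather than desktop GPUs), but for LLM-style models you will generally get better performance by converting to GGUF and using llama.cpp.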
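A quick pre-flight check can tell you which kind of artifact you are holding before you hand it to a runtime. The sketch below is a minimal example under two assumptions: a GGUF file starts with the four bytes "GGUF", and a SavedModel is a directory with the typical layout of a saved_model.pb file next to a variables/ folder. The function name identify_model_format is hypothetical, and the files created in the demo are empty placeholders, not real models.

```python
import tempfile
from pathlib import Path

GGUF_MAGIC = b"GGUF"  # GGUF files begin with these four bytes


def identify_model_format(path: Path) -> str:
    """Return 'gguf', 'tf_saved_model', or 'unknown' for a local path."""
    if path.is_dir():
        # Assumed typical SavedModel layout: saved_model.pb plus variables/.
        has_graph = (path / "saved_model.pb").is_file()
        has_vars = (path / "variables").is_dir()
        return "tf_saved_model" if has_graph and has_vars else "unknown"
    if path.is_file():
        with path.open("rb") as handle:
            if handle.read(4) == GGUF_MAGIC:
                return "gguf"
    return "unknown"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)

        # Placeholder file that only carries the magic bytes.
        fake_gguf = root / "model.gguf"
        fake_gguf.write_bytes(GGUF_MAGIC + b"\x00" * 8)

        # Placeholder directory that mimics the SavedModel layout.
        saved_model = root / "bert_tf"
        (saved_model / "variables").mkdir(parents=True)
        (saved_model / "saved_model.pb").write_bytes(b"")

        print(identify_model_format(fake_gguf))    # gguf
        print(identify_model_format(saved_model))  # tf_saved_model
```

A result of 'tf_saved_model' signals that the model needs conversion before llama.cpp, Ollama, or vLLM can use it.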
Workflow example
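When you download a model from Hugging Face, you often see both PyTorch (pytorch_model.bin or model.safetensors) and TensorFlow (tf_model.h5) checkpoints. If you only have TensorFlow files, you can usually convert them with Transformers, either by loading the model with from_pretrained(..., from_tf=True) and calling save_pretrained, or with the transformers-cli convert command in older releases; newer Transformers releases have deprecated TensorFlow support, so this step may require an older version. For example, to run a TensorFlow BERT model locally, you might re-save it as a PyTorch checkpoint in ./bert_pt and then run: python convert_hf_to_gguf.py ./bert_pt --outfile ./bert-gguf/bert-f16.gguf --outtype f16. Because BERT is an encoder-only model, it produces embeddings rather than generated text, so in current llama.cpp builds you would load the GGUF file with the embedding tool: ./llama-embedding -m ./bert-gguf/bert-f16.gguf -p "Hello". Most operators skip these steps by choosing models that already ship PyTorch or GGUF weights.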
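The decision in this workflow can be sketched as a small planning helper that looks at the file names in a downloaded model directory and lists the conversion steps still needed. This is a sketch under stated assumptions: the function name plan_conversion is hypothetical, it only recognizes the single-file checkpoint names mentioned above (sharded weights are ignored), and it assumes llama.cpp's convert_hf_to_gguf.py as the GGUF converter. It prints a plan and does not run any conversion.

```python
from typing import Iterable, List


def plan_conversion(filenames: Iterable[str]) -> List[str]:
    """Return the remaining steps to reach a GGUF file, given file names."""
    names = set(filenames)

    if any(name.endswith(".gguf") for name in names):
        # Already in the format llama.cpp and Ollama load.
        return []

    to_gguf = "Convert the PyTorch-format checkpoint to GGUF (convert_hf_to_gguf.py)"

    if "model.safetensors" in names or "pytorch_model.bin" in names:
        return [to_gguf]

    if "tf_model.h5" in names:
        # TensorFlow-only download: go through PyTorch-format weights first.
        return [
            "Load the TensorFlow weights in Transformers and re-save them in PyTorch format",
            to_gguf,
        ]

    raise ValueError("no recognized checkpoint file found")


if __name__ == "__main__":
    for listing in (
        ["config.json", "tf_model.h5"],
        ["config.json", "pytorch_model.bin", "tf_model.h5"],
        ["bert-f16.gguf"],
    ):
        steps = plan_conversion(listing)
        print(listing, "->", len(steps), "step(s)")
        for step in steps:
            print("  -", step)
```

The ordering of the checks reflects the point made above: a download that already includes PyTorch or GGUF weights needs fewer steps than one that only includes TensorFlow files.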
Reviewed by Fredoline Eruo. See our editorial policy.