HammerBench: AGX Thor’s Power Meets Ollama

What is an LLM benchmark and why is it important?

LLM benchmarks are standardized tests designed to measure how quickly, efficiently, and accurately large language models (LLMs) perform across different hardware and environments. These tests evaluate metrics such as latency, throughput, and sometimes accuracy to provide an objective view of performance.
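For concreteness, here is how the two core speed metrics are usually computed (a minimal sketch with illustrative numbers, not measurements from our runs):

```python
# Worked example of the two core speed metrics (numbers are illustrative):
tokens_generated = 512
total_time_s = 4.0             # wall-clock time for the full response
time_to_first_token_s = 0.25   # latency as perceived by the user

throughput = tokens_generated / total_time_s  # 128.0 tokens/s
print(f"throughput={throughput:.1f} tok/s, latency={time_to_first_token_s} s")
```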

As LLMs continue to grow larger and more complex, choosing the right hardware to run them on becomes a critical decision. Benchmark results are essential for understanding which device or infrastructure delivers better performance, for balancing cost against efficiency, and for identifying the most suitable solution for real-world use cases. In short, LLM benchmarks give both researchers and developers a clear picture of how models perform in practice.

To showcase the performance of Jetson AGX Thor, we are sharing our results and performance charts with you. At the same time, you can also run benchmarks across different GPU types to compare and validate performance for your own workloads. If you want to measure the performance metrics of your own devices and test your models under real-world conditions, get in touch with us. With our solution, your measurements turn into more than just numbers — they become actionable insights that drive strategic decisions.

How to use HammerBench?

🖥️ What the App Does

HammerBench is a Streamlit-based LLM benchmark tool designed to evaluate large language models (LLMs) on NVIDIA Jetson AGX Thor hardware, with Ollama as the backend.
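To make the architecture concrete, a stripped-down page of this kind could look as follows. This is a minimal sketch, not the actual HammerBench source; the model tags and the prompt are placeholders.

```python
import time

import streamlit as st
import ollama  # the official Ollama Python client

st.title("LLM Benchmark Tool")
st.write("Benchmark LLM models using Ollama with real-time progress tracking.")

# Example model tags; adjust to whatever you have pulled with `ollama pull`.
MODELS = ["llama3.2:1b", "gemma3:4b"]
model = st.selectbox("Model", MODELS)
prompt = st.text_input("Prompt", "Explain edge AI in one paragraph.")

if st.button("Run benchmark"):
    start = time.perf_counter()
    chunks = 0
    for _ in ollama.generate(model=model, prompt=prompt, stream=True):
        chunks += 1  # roughly one token per streamed chunk
    elapsed = time.perf_counter() - start
    st.metric("Throughput (tok/s)", f"{chunks / elapsed:.1f}")
```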

⚙️ Configuration (Left Sidebar)

GPU Information:

  • Detects whether the device is a Jetson (in this case, a Jetson AGX Thor Developer Kit); see the detection sketch after this list.
  • Shows details about the GPU (NVIDIA Jetson AGX Thor) and its available memory (125,772 MB ≈ 122.8 GB).
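On Linux, one plausible way to detect a Jetson and read its (unified) memory is shown below. This is an assumption about the approach; HammerBench could equally use a library such as jetson-stats.

```python
from pathlib import Path

def jetson_info() -> dict:
    """Detect a Jetson device and its total memory (Linux-specific sketch)."""
    model_file = Path("/proc/device-tree/model")
    model = model_file.read_text().strip("\x00\n") if model_file.exists() else None

    mem_mb = None
    for line in Path("/proc/meminfo").read_text().splitlines():
        if line.startswith("MemTotal:"):
            mem_mb = int(line.split()[1]) // 1024  # kB -> MB; Jetson memory is unified
            break

    return {"is_jetson": bool(model and "Jetson" in model),
            "model": model, "memory_mb": mem_mb}

print(jetson_info())  # e.g. {'is_jetson': True, 'model': 'NVIDIA Jetson AGX Thor ...', 'memory_mb': 125772}
```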

Use Only GPU:

A checkbox that restricts benchmark runs to GPU-only execution; one possible mapping onto Ollama options is sketched below.
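One way such a toggle could map onto Ollama's request options is via `num_gpu`, which controls how many layers are offloaded to the GPU. How HammerBench's checkbox is actually implemented is an assumption here, not confirmed from the source.

```python
import streamlit as st

use_only_gpu = st.sidebar.checkbox("Use Only GPU", value=True)

options = {}
if use_only_gpu:
    options["num_gpu"] = 999  # ask Ollama to offload all layers to the GPU

# Passed later to the benchmark call, e.g.:
# ollama.generate(model=model, prompt=prompt, options=options, stream=True)
```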

📊 Main Panel

Title: “LLM Benchmark Tool”, with the description “Benchmark LLM models using Ollama with real-time progress tracking.”

Model compatibility with GPU memory (VRAM) requirements:

  • Displays a table of available models (llama3.2:1b, gemma3:4b, qwen3:14b, gpt-oss:20b, etc.).
  • Shows how much memory (VRAM, in GB) each model requires.
  • Marks a model with ✅ if it can run on the detected GPU; the check itself is sketched after this list.
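The compatibility check can be as simple as comparing each model's memory requirement against the detected GPU memory. In this sketch the VRAM figures are illustrative examples, not authoritative requirements.

```python
import streamlit as st

GPU_MEMORY_GB = 122.8  # as detected on the AGX Thor Developer Kit

MODEL_VRAM_GB = {
    "llama3.2:1b": 1.3,   # illustrative values; check the actual model sizes
    "gemma3:4b": 3.3,
    "qwen3:14b": 9.3,
    "gpt-oss:20b": 14,
    "gemma3:27b": 17,
    "gpt-oss:120b": 65,
}

st.table(
    [
        {"Model": name, "VRAM (GB)": vram,
         "Runnable": "✅" if vram <= GPU_MEMORY_GB else "❌"}
        for name, vram in MODEL_VRAM_GB.items()
    ]
)
```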

Select Models to Benchmark:

  • Lists the same models with checkboxes so the user can pick which ones to benchmark.
  • Each option shows the memory requirement for clarity (e.g., gemma3:27b (17 GB), gpt-oss:120b (65 GB)); see the selection sketch below.
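The selection list can reuse the same mapping, rendering one checkbox per model with its memory hint (a sketch continuing from the compatibility example above):

```python
import streamlit as st

# Same mapping as in the compatibility sketch above (illustrative values).
MODEL_VRAM_GB = {"gemma3:27b": 17, "gpt-oss:120b": 65}

selected = [
    name
    for name, vram in MODEL_VRAM_GB.items()
    if st.checkbox(f"{name} ({vram} GB)")
]
st.write("Selected for benchmarking:", selected)
```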

🚀 Purpose

The tool helps developers and researchers:

  • See which LLMs are compatible with their GPU memory.
  • Select multiple models and run benchmarks to measure performance (latency, throughput, GPU utilization); a minimal benchmark loop is sketched after this list.
  • Use the results to compare models and make better deployment or scaling decisions.
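Putting the pieces together, a minimal benchmark loop over the selected models could look like this. It is a sketch, not HammerBench's implementation: chunk counting only approximates token counts, and GPU utilization would come from a separate sampler such as tegrastats, omitted here.

```python
import time

import ollama

def benchmark(model: str, prompt: str) -> dict:
    """Run one generation and report latency (time to first token) and throughput."""
    start = time.perf_counter()
    first = None
    chunks = 0
    for _ in ollama.generate(model=model, prompt=prompt, stream=True):
        if first is None:
            first = time.perf_counter()  # time to first token
        chunks += 1  # roughly one token per streamed chunk
    total = time.perf_counter() - start
    return {
        "model": model,
        "latency_s": round(first - start, 3) if first else None,
        "throughput_tok_s": round(chunks / total, 1),
    }

for model in ["llama3.2:1b", "gemma3:4b"]:  # example selection
    print(benchmark(model, "Summarize the benefits of edge AI."))
```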