
ML Kit & TensorFlow Lite

Not every AI task needs an LLM. ML Kit ships pre-built features — text recognition, barcodes, faces, translation — that work on every Android device. TensorFlow Lite (rebranded LiteRT) is the escape hatch when you need a custom model. Both run 100% on-device.

ML Kit overview

  • 📝 Text Recognition: OCR from camera or image. Supports 100+ scripts. Works offline.
  • 🔍 Barcode Scanning: QR, UPC, EAN, PDF417, Data Matrix. 12+ formats.
  • 😀 Face Detection: Faces + landmarks + expressions. Real-time.
  • 🏃 Pose Detection: Human body pose, 33 landmarks, 3D.
  • 🌐 Translation: 60+ languages, offline after first download.
  • 💬 Smart Reply: Suggested responses for chat messages.
  • 🖼️ Image Labeling: Classify image content (custom model support).
  • ✂️ Selfie Segmentation: Separate person from background. Real-time.
  • 📄 Document Scanner: Auto-crop + perspective correction + enhancement.

ML Kit features ship either bundled (the model is packaged into your app and runs fully offline from first launch) or unbundled (the model is downloaded from Google Play services on demand); several features are published in both variants.

Bundled vs unbundled

Bundled                               Unbundled
Works offline immediately             Requires Play Services
Larger APK (~5-20 MB per feature)     Tiny APK; models downloaded lazily
Deterministic availability            May fail to download
Good for core features                Good for optional features
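
To make the difference concrete: barcode scanning is published under both forms, with the unbundled flavor living under the `com.google.android.gms` group. The version coordinates below are illustrative; check the ML Kit release notes for current ones.

```toml
# Bundled: model ships inside the APK, works offline from first launch
ml-kit-barcode = { module = "com.google.mlkit:barcode-scanning", version = "17.3.0" }

# Unbundled: tiny APK, model fetched via Google Play services on first use
ml-kit-barcode-gms = { module = "com.google.android.gms:play-services-mlkit-barcode-scanning", version = "18.3.0" }
```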

Setup

// libs.versions.toml
ml-kit-text = { module = "com.google.mlkit:text-recognition", version = "16.0.0" }
ml-kit-text-cjk = { module = "com.google.mlkit:text-recognition-chinese", version = "16.0.0" }
ml-kit-barcode = { module = "com.google.mlkit:barcode-scanning", version = "17.3.0" }
ml-kit-face = { module = "com.google.mlkit:face-detection", version = "16.1.7" }
ml-kit-translate = { module = "com.google.mlkit:translate", version = "17.0.3" }
ml-kit-smart-reply = { module = "com.google.mlkit:smart-reply", version = "17.0.4" }
ml-kit-docscanner = { module = "com.google.android.gms:play-services-mlkit-document-scanner", version = "16.0.0-beta1" }
ml-kit-selfie-seg = { module = "com.google.mlkit:segmentation-selfie", version = "16.0.0-beta6" }

Text Recognition — OCR from images

class TextOcrService @Inject constructor() {
    private val recognizer = TextRecognition.getClient(TextRecognizerOptions.DEFAULT_OPTIONS)

    suspend fun recognize(bitmap: Bitmap): TextResult = suspendCancellableCoroutine { cont ->
        val image = InputImage.fromBitmap(bitmap, 0)
        recognizer.process(image)
            .addOnSuccessListener { result ->
                val blocks = result.textBlocks.map { block ->
                    TextBlock(
                        text = block.text,
                        boundingBox = block.boundingBox,
                        cornerPoints = block.cornerPoints?.toList() ?: emptyList()
                    )
                }
                cont.resume(TextResult(result.text, blocks))
            }
            .addOnFailureListener { cont.resumeWithException(it) }
    }
}

data class TextResult(val fullText: String, val blocks: List<TextBlock>)
data class TextBlock(val text: String, val boundingBox: Rect?, val cornerPoints: List<Point>)

In a CameraX pipeline

@OptIn(ExperimentalGetImage::class)
val analyzer = ImageAnalysis.Analyzer { imageProxy ->
    val mediaImage = imageProxy.image ?: run { imageProxy.close(); return@Analyzer }
    val input = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

    recognizer.process(input)
        .addOnSuccessListener { result ->
            // Display detected text bounds on overlay
            onTextDetected(result)
        }
        .addOnCompleteListener { imageProxy.close() } // critical: release the frame
}

Real-time receipt scanning: ~10 FPS on a mid-tier device. See CameraX & Sensors for the full pipeline.
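
If 10 FPS is more than you need (e.g. for a battery-sensitive screen), you can gate frames before handing them to the recognizer. A minimal sketch of such a gate, with the clock injected so it is testable; in production you would pass `SystemClock.elapsedRealtime()`:

```kotlin
// Lets at most one analysis frame through every `intervalMs` milliseconds.
// Frames arriving sooner are skipped (the caller still closes the ImageProxy).
class FrameThrottler(private val intervalMs: Long) {
    private var lastAcceptedMs = -intervalMs

    fun shouldProcess(nowMs: Long): Boolean {
        if (nowMs - lastAcceptedMs >= intervalMs) {
            lastAcceptedMs = nowMs
            return true
        }
        return false
    }
}

fun main() {
    val throttler = FrameThrottler(intervalMs = 100)
    println(throttler.shouldProcess(0))   // true: first frame passes
    println(throttler.shouldProcess(50))  // false: only 50 ms elapsed
    println(throttler.shouldProcess(120)) // true: 120 ms since last accept
}
```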


Barcode Scanning

val options = BarcodeScannerOptions.Builder()
    .setBarcodeFormats(
        Barcode.FORMAT_QR_CODE,
        Barcode.FORMAT_EAN_13,
        Barcode.FORMAT_UPC_A,
        Barcode.FORMAT_CODE_128
    )
    .build()

val scanner = BarcodeScanning.getClient(options)

@OptIn(ExperimentalGetImage::class)
val analyzer = ImageAnalysis.Analyzer { imageProxy ->
    val mediaImage = imageProxy.image ?: run { imageProxy.close(); return@Analyzer }
    val input = InputImage.fromMediaImage(mediaImage, imageProxy.imageInfo.rotationDegrees)

    scanner.process(input)
        .addOnSuccessListener { barcodes ->
            barcodes.firstOrNull()?.let { barcode ->
                when (barcode.valueType) {
                    Barcode.TYPE_URL -> onUrl(barcode.url?.url)
                    Barcode.TYPE_WIFI -> onWifi(barcode.wifi?.ssid, barcode.wifi?.password)
                    Barcode.TYPE_CONTACT_INFO -> onContact(barcode.contactInfo)
                    else -> onGenericBarcode(barcode.rawValue)
                }
            }
        }
        .addOnCompleteListener { imageProxy.close() }
}

Restricting the scanner to specific formats makes detection faster. If you only need QR codes (the common case), pass Barcode.FORMAT_QR_CODE alone.
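
ML Kit parses Wi-Fi payloads for you via `barcode.wifi`, but it helps to know what a Wi-Fi QR code actually encodes: a string of the form `WIFI:T:<auth>;S:<ssid>;P:<password>;;`. A simplified parser for that format (it ignores escaped `\;` characters); `WifiPayload` and `parseWifiQr` are hypothetical names, not ML Kit API:

```kotlin
data class WifiPayload(val ssid: String, val password: String?, val auth: String?)

// Parses the standard "WIFI:T:WPA;S:ssid;P:pass;;" QR payload.
// Returns null if the payload is not a Wi-Fi code or has no SSID.
fun parseWifiQr(raw: String): WifiPayload? {
    if (!raw.startsWith("WIFI:")) return null
    val fields = raw.removePrefix("WIFI:")
        .split(';')
        .filter { it.contains(':') }
        .associate { field ->
            val (key, value) = field.split(':', limit = 2)
            key to value
        }
    val ssid = fields["S"] ?: return null
    return WifiPayload(ssid = ssid, password = fields["P"], auth = fields["T"])
}

fun main() {
    println(parseWifiQr("WIFI:T:WPA;S:HomeNet;P:s3cret;;"))
    // WifiPayload(ssid=HomeNet, password=s3cret, auth=WPA)
}
```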


Face Detection

val options = FaceDetectorOptions.Builder()
    .setPerformanceMode(FaceDetectorOptions.PERFORMANCE_MODE_ACCURATE) // or PERFORMANCE_MODE_FAST
    .setLandmarkMode(FaceDetectorOptions.LANDMARK_MODE_ALL)
    .setClassificationMode(FaceDetectorOptions.CLASSIFICATION_MODE_ALL)
    .setMinFaceSize(0.15f)
    .enableTracking()
    .build()

val detector = FaceDetection.getClient(options)

// App-level result type, named FaceInfo to avoid clashing with ML Kit's Face
data class FaceInfo(
    val bounds: Rect,
    val smilingProbability: Float,
    val leftEyeOpenProbability: Float,
    val rightEyeOpenProbability: Float,
    val rotationY: Float,
    val rotationZ: Float,
    val landmarks: List<PointF>
)

suspend fun detectFaces(bitmap: Bitmap): List<FaceInfo> = suspendCancellableCoroutine { cont ->
    detector.process(InputImage.fromBitmap(bitmap, 0))
        .addOnSuccessListener { faces ->
            cont.resume(faces.map { face ->
                FaceInfo(
                    bounds = face.boundingBox,
                    smilingProbability = face.smilingProbability ?: 0f,
                    leftEyeOpenProbability = face.leftEyeOpenProbability ?: 0f,
                    rightEyeOpenProbability = face.rightEyeOpenProbability ?: 0f,
                    rotationY = face.headEulerAngleY,
                    rotationZ = face.headEulerAngleZ,
                    landmarks = face.allLandmarks.map { it.position }
                )
            })
        }
        .addOnFailureListener { cont.resumeWithException(it) }
}

Use for:

  • Auto-capture when everyone is smiling
  • Unlock via face (not production-secure; use BiometricPrompt for that)
  • AR filters (overlay glasses, hats)
  • Attention tracking (eye-open probability for fatigue detection)
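
The auto-capture case above reduces to a one-line predicate over the detected faces. A minimal sketch, with `DetectedFace` as a hypothetical stand-in for the mapped face result and 0.8 as an arbitrary threshold:

```kotlin
data class DetectedFace(val smilingProbability: Float)

// Fire the shutter only when at least one face is visible and
// every face reports a high smiling probability.
fun everyoneSmiling(faces: List<DetectedFace>, threshold: Float = 0.8f): Boolean =
    faces.isNotEmpty() && faces.all { it.smilingProbability >= threshold }

fun main() {
    println(everyoneSmiling(listOf(DetectedFace(0.95f), DetectedFace(0.91f)))) // true
    println(everyoneSmiling(listOf(DetectedFace(0.95f), DetectedFace(0.30f)))) // false
    println(everyoneSmiling(emptyList()))                                      // false: no faces, no capture
}
```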

Translation

class TranslationService {
    suspend fun translate(text: String, sourceLang: String, targetLang: String): String {
        val options = TranslatorOptions.Builder()
            .setSourceLanguage(sourceLang)
            .setTargetLanguage(targetLang)
            .build()

        val translator = Translation.getClient(options)
        try {
            // Ensure the language-pair model is downloaded
            val conditions = DownloadConditions.Builder().requireWifi().build()
            translator.downloadModelIfNeeded(conditions).await()

            return translator.translate(text).await()
        } finally {
            translator.close() // release the loaded model
        }
    }
}

// Usage
val translated = translationService.translate(
    text = "Hello, world!",
    sourceLang = TranslateLanguage.ENGLISH,
    targetLang = TranslateLanguage.SPANISH
)
// → "¡Hola mundo!"

Each language pair model is ~20-30 MB. First call triggers download (respect requireWifi()). Subsequent calls are instant and offline.

Language identification

val identifier = LanguageIdentification.getClient()
val lang = identifier.identifyLanguage(text).await()
// lang = "en", "hi", "und" (undetermined), etc.

Pair with Translation for auto-translate features.
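
The glue logic for that pairing is a small decision function: translate only when identification succeeded and the detected language differs from the user's own. A sketch, where `"und"` is ML Kit's code for "undetermined":

```kotlin
// Decide whether an incoming message should be auto-translated.
fun shouldTranslate(identifiedLang: String, userLang: String): Boolean =
    identifiedLang != "und" && identifiedLang != userLang

fun main() {
    println(shouldTranslate("es", "en"))  // true: Spanish message, English user
    println(shouldTranslate("en", "en"))  // false: already in the user's language
    println(shouldTranslate("und", "en")) // false: can't pick a source model
}
```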


Smart Reply

val smartReply = SmartReply.getClient()

suspend fun suggestReplies(conversation: List<ChatMessage>): List<String> {
    val mlMessages = conversation.map { msg ->
        if (msg.isFromUser) {
            TextMessage.createForLocalUser(msg.body, msg.timestampMs)
        } else {
            TextMessage.createForRemoteUser(msg.body, msg.timestampMs, msg.senderId)
        }
    }
    val result = smartReply.suggestReplies(mlMessages).await()
    return result.suggestions.map { it.text }
}

Returns up to 3 short reply suggestions. Works for English only (as of 2025). Display below the chat input for quick-reply UX.


Pose Detection (MediaPipe Tasks)

Beyond ML Kit, MediaPipe Tasks offers more advanced features:

// libs.versions.toml
mediapipe-tasks = { module = "com.google.mediapipe:tasks-vision", version = "0.10.15" }

val baseOptions = BaseOptions.builder()
    .setModelAssetPath("pose_landmarker_lite.task")
    .build()
val options = PoseLandmarker.PoseLandmarkerOptions.builder()
    .setBaseOptions(baseOptions)
    .setRunningMode(RunningMode.LIVE_STREAM)
    .setResultListener(::onResult)
    .setErrorListener { /* ... */ }
    .build()

val landmarker = PoseLandmarker.createFromOptions(context, options)

// In ImageAnalysis
@OptIn(ExperimentalGetImage::class)
val analyzer = ImageAnalysis.Analyzer { imageProxy ->
    val bitmap = imageProxy.toBitmap()
    val mpImage = BitmapImageBuilder(bitmap).build()
    // detectAsync expects a timestamp in milliseconds; imageInfo.timestamp is nanoseconds
    landmarker.detectAsync(mpImage, imageProxy.imageInfo.timestamp / 1_000_000)
    imageProxy.close()
}

private fun onResult(result: PoseLandmarkerResult, input: MPImage) {
    val landmarks = result.landmarks().firstOrNull() ?: return
    // 33 landmarks: nose, shoulders, elbows, wrists, hips, knees, ankles, ...
    val leftWrist = landmarks[15]
    val rightWrist = landmarks[16]
    // Detect gestures / check form / count reps
}

Used for:

  • Fitness apps (squat counting, form analysis)
  • Gesture control
  • AR body overlays
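
The squat-counting case boils down to two pieces of geometry-free-of-any-ML logic: the angle at a joint from three landmark points, and a small state machine that counts a rep each time the knee angle dips below one threshold and recovers above another. A sketch under assumed thresholds (90° down, 160° up; tune for your use case); `Pt`, `angleDeg`, and `RepCounter` are illustrative names:

```kotlin
import kotlin.math.acos
import kotlin.math.sqrt

data class Pt(val x: Float, val y: Float)

// Angle (degrees) at `center` formed by rays to `a` and `b`,
// e.g. the knee angle from hip, knee, and ankle landmarks.
fun angleDeg(a: Pt, center: Pt, b: Pt): Double {
    val v1x = a.x - center.x; val v1y = a.y - center.y
    val v2x = b.x - center.x; val v2y = b.y - center.y
    val dot = (v1x * v2x + v1y * v2y).toDouble()
    val mag = sqrt((v1x * v1x + v1y * v1y).toDouble()) *
              sqrt((v2x * v2x + v2y * v2y).toDouble())
    return Math.toDegrees(acos((dot / mag).coerceIn(-1.0, 1.0)))
}

// Counts a rep on each down-then-up cycle of the knee angle.
class RepCounter(private val downDeg: Double = 90.0, private val upDeg: Double = 160.0) {
    var reps = 0; private set
    private var isDown = false
    fun onKneeAngle(deg: Double) {
        if (deg < downDeg) isDown = true
        else if (deg > upDeg && isDown) { isDown = false; reps++ }
    }
}

fun main() {
    // A straight leg: hip, knee, ankle collinear
    println(angleDeg(Pt(0f, 0f), Pt(0f, 1f), Pt(0f, 2f))) // 180.0
    val counter = RepCounter()
    listOf(170.0, 80.0, 170.0, 85.0, 175.0).forEach(counter::onKneeAngle)
    println(counter.reps) // 2
}
```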

TensorFlow Lite / LiteRT — custom models

When ML Kit doesn't fit, drop to TF Lite:

// libs.versions.toml
tflite-task = { module = "org.tensorflow:tensorflow-lite-task-vision", version = "0.4.4" }
tflite-support = { module = "org.tensorflow:tensorflow-lite-support", version = "0.4.4" }
tflite-gpu = { module = "org.tensorflow:tensorflow-lite-gpu-delegate-plugin", version = "0.4.4" }

Example — product classifier

You trained a model on your product catalog. Ship the .tflite file:

class ProductClassifier @Inject constructor(
    @ApplicationContext private val context: Context
) {
    private val classifier: ImageClassifier by lazy {
        val options = ImageClassifier.ImageClassifierOptions.builder()
            .setMaxResults(3)
            .setScoreThreshold(0.5f)
            .setBaseOptions(BaseOptions.builder().useGpu().build())
            .build()

        ImageClassifier.createFromFileAndOptions(context, "product_classifier.tflite", options)
    }

    fun classify(bitmap: Bitmap): List<Classification> {
        val tensorImage = TensorImage.fromBitmap(bitmap)
        val results = classifier.classify(tensorImage)
        return results.firstOrNull()?.categories?.map {
            Classification(label = it.label, score = it.score)
        } ?: emptyList()
    }
}

data class Classification(val label: String, val score: Float)

Delegates — hardware acceleration

Delegate         Speedup   When to use
CPU (default)    1x        Always works
GPU              3-5x      Most devices with a GPU
NNAPI            2-10x     NPU-equipped devices (Pixel 6+)
Hexagon DSP      3-8x      Qualcomm-powered devices

Use useGpu() first; fall back to CPU if initialization fails.
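
That fallback pattern is just try-the-fast-path, catch, degrade. A generic sketch; the two lambdas are hypothetical stand-ins for building an ImageClassifier with and without useGpu():

```kotlin
// Try the primary factory (e.g. GPU-delegated classifier); if its
// initialization throws, build the fallback (e.g. CPU-only) instead.
fun <T> createWithFallback(primary: () -> T, fallback: () -> T): T =
    try {
        primary()
    } catch (e: Exception) {
        // GPU delegate init can fail on unsupported drivers; degrade gracefully.
        fallback()
    }

fun main() {
    val delegate = createWithFallback(
        primary = { error("GPU delegate unavailable on this device") },
        fallback = { "CPU" }
    )
    println(delegate) // CPU
}
```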

Model distribution via Firebase ML

Instead of bundling the .tflite in the APK:

val conditions = CustomModelDownloadConditions.Builder()
    .requireWifi()
    .build()

FirebaseModelDownloader.getInstance()
    .getModel("product_classifier_v2", DownloadType.LATEST_MODEL, conditions)
    .addOnSuccessListener { model ->
        val modelFile = model.file
        val interpreter = Interpreter(modelFile!!)
        // Use the interpreter
    }

Firebase ML hosts the model. You can push new versions without app updates. Great for iterating on model quality.

Text embeddings for RAG

val embedder = TextEmbedder.createFromFile(context, "universal_sentence_encoder.tflite")

fun embed(text: String): FloatArray {
    val embedding = embedder.embed(text)
    return embedding.embeddings().first().floatArray()
}

fun cosineSimilarity(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var magA = 0f; var magB = 0f
    for (i in a.indices) {
        dot += a[i] * b[i]
        magA += a[i] * a[i]
        magB += b[i] * b[i]
    }
    return dot / (sqrt(magA) * sqrt(magB))
}

Enables semantic search — find similar notes, group similar messages. Required for RAG with Gemini Nano (see Gemini Nano).
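
Semantic search is then a ranking problem over stored embeddings: score every note against the query embedding and keep the best k. A self-contained sketch (it re-declares the similarity function so it runs on its own; the tiny 2-dimensional vectors stand in for real embedder output):

```kotlin
import kotlin.math.sqrt

fun cosine(a: FloatArray, b: FloatArray): Float {
    var dot = 0f; var magA = 0f; var magB = 0f
    for (i in a.indices) { dot += a[i] * b[i]; magA += a[i] * a[i]; magB += b[i] * b[i] }
    return dot / (sqrt(magA) * sqrt(magB))
}

// Rank note embeddings by cosine similarity to the query; return top-k keys.
fun topK(query: FloatArray, notes: Map<String, FloatArray>, k: Int): List<String> =
    notes.entries
        .sortedByDescending { cosine(query, it.value) }
        .take(k)
        .map { it.key }

fun main() {
    val notes = mapOf(
        "groceries" to floatArrayOf(1f, 0f),
        "workout"   to floatArrayOf(0f, 1f),
        "shopping"  to floatArrayOf(0.9f, 0.1f),
    )
    println(topK(floatArrayOf(1f, 0f), notes, k = 2)) // [groceries, shopping]
}
```

A linear scan is fine for hundreds of notes; only reach for an approximate-nearest-neighbor index once the corpus grows far beyond that.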


Model optimization

Quantization

Full-precision FP32 → INT8 quantization shrinks models 4x with minimal accuracy loss:

# Training-side (Python)
converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_dataset_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
tflite_model = converter.convert()

Always quantize before shipping. A quantized model runs faster, uses less RAM, and ships as a smaller file.

Pruning

Remove weights that contribute little. Combined with quantization, a 30MB model becomes ~5MB.
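
The 4x figure is simple arithmetic: FP32 weights take 4 bytes each, INT8 weights take 1, so quantization alone cuts the weight payload to a quarter (metadata and any non-quantized ops add a little on top). A back-of-envelope check, assuming a hypothetical ~7.5M-parameter model:

```kotlin
// Weight payload size for a model with `paramCount` weights
// at `bytesPerWeight` bytes each (4 for FP32, 1 for INT8).
fun weightBytes(paramCount: Long, bytesPerWeight: Int): Long = paramCount * bytesPerWeight

fun main() {
    val params = 7_500_000L // roughly a 30 MB FP32 model
    println(weightBytes(params, 4) / 1_000_000) // 30 (FP32, MB)
    println(weightBytes(params, 1) / 1_000_000) // 7  (INT8, MB)
}
```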


Benchmarking inference

@get:Rule val rule = BenchmarkRule()

@Test fun classifier_inference_time() {
    val classifier = ProductClassifier(context)
    val bitmap = BitmapFactory.decodeResource(context.resources, R.drawable.sample_product)

    rule.measureRepeated {
        classifier.classify(bitmap)
    }
}

Measure on the range of target devices — low-end matters more than flagship. Run in release builds for accurate numbers.


Common anti-patterns


  • Running inference on the main thread
  • Not closing ImageProxy after ML Kit call
  • Bundling huge TF models into APK (base APK bloat)
  • CPU-only delegates when GPU is available
  • Shipping float32 models without quantization
  • Not caching Translation model downloads

Best practices

  • All inference on Dispatchers.Default or a dedicated executor
  • imageProxy.close() in addOnCompleteListener
  • Firebase ML for custom models > 5MB
  • GPU delegate primary; CPU fallback
  • INT8 quantization before shipping
  • Download Translation models on first use, keep them

Practice exercises

  1. OCR from camera: Integrate ML Kit Text Recognition into a CameraX ImageAnalysis. Overlay detected text bounds on the preview.

  2. Barcode → action: Scan a QR code, parse URL / Wi-Fi / contact types, and trigger appropriate intents (open URL, save contact, etc.).

  3. Smart replies: Add Smart Reply to your chat app. Display 3 suggestions below the message input.

  4. Custom TF Lite: Train or download a simple image classifier. Integrate with GPU delegate. Benchmark inference time.

  5. RAG with embeddings: Use Universal Sentence Encoder to embed 100 notes. Semantic search for a query. Pass top-5 to Gemini Nano for Q&A.

Next

Return to Module 21 Overview or continue to advanced platform topics like Graphics & Rendering.