PureTensor research initiative

Language research for preservation, archives, and computational methods.

Lingua is PureTensor’s language research initiative. It brings together research notes, technical essays, and prototype work around language preservation, archival text workflows, and low-resource computational linguistics.

Archival texts · Endangered languages · Low-resource NLP · Corpus design

Focus

Where Lingua is focused.

Lingua concentrates on the practical side of language technology: how source material is digitised, organised, searched, analysed, and turned into usable research material.

Archival text workflows

Testing whether OCR, alignment, and corpus-building workflows can make fragile manuscripts and dispersed archives more legible to scholars.

Low-resource language tooling

Exploring how multilingual models, transcription systems, and annotation pipelines might support endangered and historical language work without overstating what current systems can do.

Research infrastructure

Treating compute, storage, search, and careful review processes as part of the research problem, not just the delivery layer around it.

Why this exists

Language work is not just a model problem.

For PureTensor, the interesting question is not only whether a model can classify or transcribe. It is whether the surrounding workflow — digitisation, review, corpus assembly, search, storage, provenance, and human judgement — can be improved enough to make small, careful research efforts materially more effective.

That is why Lingua combines research direction with infrastructure thinking, rather than treating models in isolation from the workflows around them.

Overview

A focused research line within PureTensor.

Lingua operates as a compact, publication-driven initiative. The emphasis is on careful problem selection, technical clarity, and work that can support future prototypes, partnerships, or archive-building efforts.

What Lingua includes

  • Research notes and technical essays
  • Computational linguistics and preservation-oriented tooling
  • Archive and corpus workflow design
  • Collaboration conversations where the fit is strong

How it operates

  • Led within PureTensor
  • Built around a focused research agenda
  • Publication first, prototypes where useful
  • Open to future partnerships and pilot work

Research notes

The working archive.

Research notes, essays, and working papers from the Lingua research line, retained as part of the PureTensor archive.

31 January 2026 · Audio archives

Sound Before Silence: How Audio Archives Are Preserving the World’s Tonal and Oral Languages

Of the roughly 7,000 languages spoken on Earth today, a significant proportion have never been written down. They exist only in the mouths and ears of their speakers — in conversation, in song, in the stories told at nig…

5 January 2026 · Undecoded scripts

Cracking the Code: How Computational Methods Are Deciphering the World’s Last Undecoded Scripts

For most of recorded history, the decipherment of ancient scripts has been a fundamentally human endeavour — part intuition, part obsessive pattern recognition, part luck. Michael Ventris spent years working on Linear B…

9 December 2025 · Sign languages

The Unheard Languages: Why Endangered Sign Languages Matter

When we talk about endangered languages, the conversation almost always centres on spoken words — the fading voices of elderly speakers, the unwritten grammars of remote communities, the oral traditions that die when the…

22 October 2025 · Revitalisation

Language Nests: How Immersion Schools Are Creating New Generations of Speakers

In a small classroom on the Big Island of Hawai’i, a three-year-old greets her teacher entirely in ʻōlelo Hawaiʻi — the Hawaiian language. Her parents don’t speak it. Her grandparents don’t speak it. But she does, becaus…

3 September 2025 · Extinction risk

One Language Dies Every Two Weeks: Inside the Global Extinction Crisis

Somewhere in the world, a language is falling silent. Not with a dramatic last word or a ceremonial farewell, but quietly — in the gap between an elderly grandmother who dreams in her mother tongue and grandchildren who…

14 August 2025 · Extinction risk

The Race Against Silence: How AI Is Rescuing Endangered Languages

A language dies roughly every two weeks. With nearly half of the world’s 7,000 languages at risk of vanishing within a generation, linguists and technologists are locked in an unprecedented race against time. But a new a…

Roadmap

A credible sequence of next steps.

The useful framing for Lingua is not launch theatre. It is disciplined sequencing: concept first, then pilot, then partnership only if the work warrants it.

Phase 1

Concept

Clarify the thesis, publish research notes, and narrow the actual problem worth building for.

Phase 2

Pilot

Assemble a small demonstration corpus and test a limited workflow end-to-end with human review.

Phase 3

Partnership

If the work proves useful, pursue collaborators, source access, and formal research relationships.

About

Built inside PureTensor.

PureTensor // Lingua

Lingua is led by Heimir Helgason within PureTensor, combining research direction, technical development, and infrastructure design.

If the work matures, the outward form can mature with it. Until then, the site is intentionally modest and direct.

Founder

Heimir Helgason
Founder, PureTensor

Infrastructure, systems design, and applied AI are the current backbone of the project. Linguistic depth and source access would need to be built through future collaboration rather than implied by branding alone.

Contact

Interested in the direction?

Use the form below if you want to discuss archival material, potential collaboration, or whether there is a genuinely useful pilot worth attempting.

Good reasons to get in touch

  • Access to source material or archives
  • Interest in a tightly scoped prototype
  • Scholarly or community grounding that the initiative does not yet have in-house
  • Thoughtful critique of the framing or methodology