Lighter and More Accurate: GeneLLM™ Builds a Science Version of DeepSeek for Gene Diagnosis

February 2025

With the rapid development of artificial intelligence, the bioscience field is undergoing a profound transformation. AI not only brings new tools and methods for drug discovery and disease diagnosis but also redefines the fundamental research paradigm of bioscience through data-driven approaches. However, faced with massive data and complex systems, traditional scientific computing resources and technologies are gradually revealing their limitations. OxTium Technology, with its independently developed GeneLLM™ large model, systematically innovates across hardware, algorithms, architecture, optimization, and data to provide lightweight and precise solutions, driving the deep integration of AI and bioscience.

Entering the DeepSeek Era: The Growing Contradiction Between Multi-Dimensional Data and AI Computing Limitations

General-purpose AI platforms such as DeepSeek have achieved cost-efficient training through low-level engineering and the integration of tens of thousands of NVIDIA A100 GPUs (reducing costs by about 50% compared with traditional distributed computing). Even so, specialized large models for bioscience still face severe computing bottlenecks and high cost pressure when processing the field's multi-dimensional data. The root of this contradiction is the complexity, diversity, and scale of bioscience data, which impose exponentially growing performance demands on computing resources and pose technical challenges far exceeding those of conventional language models.

Take genomics as an example: sequencing a single human whole genome generates between 100 GB and 200 GB of raw data (covering 3 billion base pairs plus sequencing information), while large cohort studies (e.g., UK Biobank) must process petabytes of data from over 500,000 samples. Proteomics data is even more complex: a single mass spectrometry experiment can produce tens of thousands to millions of peptide signals. When multi-omics data must be loaded simultaneously for computationally intensive tasks such as RNA 3D structure prediction or molecular dynamics simulation, computational complexity often grows nonlinearly, and memory requirements can exceed 1 TB.
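To make the scale of the genomics figures above concrete, the short Python sketch below multiplies them out. It is a back-of-envelope illustration only: it assumes every one of the 500,000 samples is whole-genome sequenced at the quoted 100 GB to 200 GB of raw data, uses decimal units (1 PB = 1,000,000 GB), and is not a statement about how much data any real cohort actually stores. The function name `cohort_storage_pb` is hypothetical.

```python
# Back-of-envelope storage estimate for a whole-genome sequencing cohort.
# Assumption: every sample produces the same amount of raw data.

GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10**6 GB


def cohort_storage_pb(samples: int, gb_per_genome: float) -> float:
    """Return the total raw data volume in petabytes."""
    return samples * gb_per_genome / GB_PER_PB


SAMPLES = 500_000  # cohort size quoted in the article

low = cohort_storage_pb(SAMPLES, 100)   # 100 GB per genome
high = cohort_storage_pb(SAMPLES, 200)  # 200 GB per genome

print(f"Raw data for {SAMPLES:,} genomes: {low:.0f} to {high:.0f} PB")
```

Under these assumptions the raw data alone lands in the tens of petabytes, before any intermediate files or additional omics layers are counted.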

Bioscience AI models and language models are therefore not competing on the same track. They differ fundamentally not only in data types and computational requirements but also in storage methods, technical frameworks, algorithmic logic, and application scenarios. Language models primarily handle syntax, semantics, and context in natural language, whereas bioscience models must process complex biological data while integrating knowledge from biology, chemistry, physics, mathematics, and other disciplines. They must even uncover potential research paradigms from vast amounts of basic biological information. Applying models in bioscience thus requires a cross-disciplinary "scientific revolution" that breaks through the limits of traditional computing, and even of basic research methods, to deliver a step change in productivity.

From this perspective, only innovative scientific models developed in-house from the underlying layers up can deliver lower-cost, higher-efficiency research solutions.

Since its founding in 2022, OxTium Technology has recognized three major pain points in bioscience research: high demand for computing resources, insufficient model generality, and complex and diverse data. Working from five angles (lightweight architecture, dual-configuration chips, underlying algorithm optimization, expert-level data screening, and efficient storage technology), it has taken the lead in deploying the lightweight multi-omics large model GeneLLM™:

1. Pain Point 1: High demand for computing resources. Training and inference for large-scale models require extremely high computing power, leading to high costs and limited accessibility. OxTium Technology addresses this with several strategies, deploying both cloud platforms and all-in-one inference machines built on imported and domestic chip solutions. After optimization, the core GeneLLM™ model significantly reduces computing and storage requirements (as low as 1.5B parameters). Combined with intelligent resource scheduling, GeneLLM™ can run efficiently on cloud platforms and on desktop all-in-one inference machines, greatly reducing the computing costs of scientific research.

2. Pain Point 2: Insufficient model generality. Existing bioscience models are often designed for specific tasks and lack cross-domain, cross-scenario versatility. GeneLLM™ is trained from raw biological data (e.g., sequencing data) and, through efficient compression technology, can complete analysis of a single disease with as few as a hundred cases.

3. Pain Point 3: Complex and diverse data. The diversity of biological data makes deep learning algorithms perform poorly in practical applications, failing to meet the innovative research needs of bioscience. GeneLLM™, through adaptive learning and multimodal data integration technology, combined with the Bioford™ platform hosting hundreds of biological models, can efficiently process complex biological data, improving algorithm robustness and accuracy, thus meeting the diverse needs of innovative research.
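As a rough illustration of why a model of 1.5B parameters (the figure quoted under Pain Point 1) can fit on a desktop-class inference machine, the Python sketch below estimates the memory needed for the weights alone. The numeric precisions are assumptions chosen for illustration; the article does not state which precision GeneLLM™ uses, and the estimate ignores activations, caches, and runtime overhead. The name `weight_memory_gb` is hypothetical.

```python
# Rough memory footprint of model weights at common numeric precisions.
# Assumption: weights only; activations and runtime overhead are ignored.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Return the weight storage in decimal gigabytes (1 GB = 10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


PARAMS = 1.5e9  # parameter count quoted in the article

for precision in BYTES_PER_PARAM:
    print(f"{precision}: {weight_memory_gb(PARAMS, precision):.1f} GB")
```

Under these assumptions the weights occupy between 1.5 GB and 6 GB depending on precision, which is within reach of a single consumer-grade accelerator.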

GeneLLM™ and DeepSeek: Shared Academic Roots from Peking University AI

GeneLLM™ is a large language model that integrates OxTium Technology's core technologies. Its design philosophy aligns with that of DeepSeek, emphasizing specialized applications in vertical scientific fields and driving industry innovation. Professor Sha Lei, co-founder of OxTium Technology, graduated from the Institute of Computational Linguistics, School of Computer Science, Peking University. He has long been dedicated to AI algorithm optimization and industrial applications, having served as an associate researcher at Oxford University and a senior NLP scientist at Apple. Professor Sha provides deep technical support for GeneLLM™'s development, and his contributions to AI give the company a distinct advantage in advancing its science and technology. Notably, Professor Sha shares the same academic lineage as DeepSeek's core R&D team: he was a senior fellow student of Luo Fuli, Dai Damai, and others.

Based on GeneLLM™, the one-stop bioscience research platform Bioford™ now integrates hundreds of bioscience large models, covering basic research, medical diagnosis, drug discovery, biomanufacturing, breeding, environmental monitoring, and more. It features a user-friendly modular interface design and provides an extensible model framework for individual tasks, supporting adaptation and integration across domains, greatly enhancing the model's versatility and practicality in different usage scenarios.

The core advantages of GeneLLM™ include:

1. Multi-domain, multi-dimensional data integration: Capable of processing multi-dimensional data such as genomics, transcriptomics, proteomics, metagenomics, and epigenomics, deeply integrating AI algorithms, genomics, bioinformatics, and other technological achievements to provide comprehensive research support.

2. Cross-domain knowledge transfer: Through pre-training and fine-tuning, the platform's models can adapt to diverse task requirements in basic research, medical diagnosis, biomanufacturing, breeding, environmental monitoring, and disease treatment, offering high flexibility. Additionally, based on different customer needs, it provides lightweight inference devices and customized solutions, lowering the technical threshold for small and medium-sized research institutions.

3. Efficient inference capability: GeneLLM™ can complete small-sample data fine-tuning for a single disease within weeks, significantly improving research efficiency.

Building on GeneLLM™, OxTium Technology also uses AI algorithms and models to rapidly industrialize bioscience research. For example, the Jinling series of kits uses AI technology to achieve import substitution for high-end biological reagents. This strategy not only meets the localization needs of bioscience research but also eases the cost burden that equipment investment and technology use place on small and medium-sized research institutions.

The AI Foundation for Bioscience: GeneLLM™ Leading a "Scientific Revolution" in Basic Research

Going forward, GeneLLM™ will continue to provide efficient research tools for the bioscience field, driving innovation in basic research. For instance, in cancer genomics and early risk assessment of Alzheimer's disease, GeneLLM™ has already shown significant potential. As the technology matures, more diseases will be able to achieve breakthroughs in basic research through this platform, greatly improving innovation efficiency for enterprises and research institutions while reducing industrialization costs.

In the long term, OxTium Technology aims to make the Bioford™ platform standard infrastructure in the bioscience field and a core support for researchers and enterprises worldwide. By continuously expanding its functions, Bioford™ will cover more application scenarios, including drug screening, environmental monitoring, and breeding. As a pioneer at the intersection of AI and bioscience, OxTium Technology has not only made major breakthroughs in technological innovation but also demonstrated forward-looking strategic insight into the industry's prospects.

Next, OxTium Technology will focus on promoting cross-border innovation in AI+bioscience through global partnerships. For example, the strategic collaboration with Peking University Hospital not only validates the clinical value of GeneLLM™ but also provides new solutions for early risk assessment of gastric and colorectal cancers. Additionally, OxTium Technology emphasizes social responsibility, ensuring ethical compliance in technology applications and promoting a balance between technological innovation and social welfare. Looking ahead, OxTium Technology will continue its mission of "Exploring the Mysteries of Life with AI Technology," accelerating the formation of new productive forces through the deep application of GeneLLM™ and driving the intelligent transformation of global bioscience research.

Leading is only the first step; our goal is to rise in a trillion-scale market.