First step and troubleshooting Docling — RAG with LlamaIndex on my CPU laptop

Alain Airom (Ayrom)
5 min read · Dec 16, 2024

Motivation

Following my tests with Docling, I wanted to try the “RAG with LlamaIndex” 🦙 functionality. Before building something on my own, I usually start with a “quick and dirty” application, copy/pasting the provided sample to figure out whether it works with my configuration or not.

Building the very first basic app (copy/paste of the examples)

To begin with “RAG with LlamaIndex”, I followed the instructions from the Docling documentation here: https://ds4sd.github.io/docling/examples/rag_llamaindex/

Here is an overview of the Docling extensions for LlamaIndex.

This example leverages the official LlamaIndex Docling extension.

The DoclingReader and DoclingNodeParser extensions enable you to:

- use various document types in your LLM applications with ease and speed, and
- leverage Docling’s rich format for advanced, document-native grounding.
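To give a sense of what the second point means in code, here is a minimal sketch of the Docling-native path; the DoclingNodeParser import path and the ExportType.JSON setting are as I read them in the example page, so double-check them against the installed package versions.

# Sketch of the Docling-native variant, assuming llama-index-node-parser-docling is installed
from llama_index.readers.docling import DoclingReader
from llama_index.node_parser.docling import DoclingNodeParser

# export the document as Docling JSON instead of Markdown...
reader = DoclingReader(export_type=DoclingReader.ExportType.JSON)
# ...and let DoclingNodeParser split it along the document's own structure
node_parser = DoclingNodeParser()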

As in “quick & dirty” I copy/pasted the first 3 parts of samples provided into a main app.py file.

import os
from pathlib import Path
from tempfile import mkdtemp
from warnings import filterwarnings

from dotenv import load_dotenv

# for Tensorflow problem
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '1'
import tensorflow as tf
####

def _get_env_from_colab_or_os(key):
    try:
        from google.colab import userdata

        try:
            return userdata.get(key)
        except userdata.SecretNotFoundError:
            pass
    except ImportError:
        pass
    return os.getenv(key)


load_dotenv()

filterwarnings(action="ignore", category=UserWarning, module="pydantic")
filterwarnings(action="ignore", category=FutureWarning, module="easyocr")
# https://github.com/huggingface/transformers/issues/5486:
os.environ["TOKENIZERS_PARALLELISM"] = "false"

########
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.huggingface_api import HuggingFaceInferenceAPI

EMBED_MODEL = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
MILVUS_URI = str(Path(mkdtemp()) / "docling.db")
GEN_MODEL = HuggingFaceInferenceAPI(
    token=_get_env_from_colab_or_os("HF_TOKEN"),
    model_name="mistralai/Mixtral-8x7B-Instruct-v0.1",
)
SOURCE = "https://arxiv.org/pdf/2408.09869" # Docling Technical Report
QUERY = "Which are the main AI models in Docling?"

embed_dim = len(EMBED_MODEL.get_text_embedding("hi"))

########
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.core.node_parser import MarkdownNodeParser
from llama_index.readers.docling import DoclingReader
from llama_index.vector_stores.milvus import MilvusVectorStore

reader = DoclingReader()
node_parser = MarkdownNodeParser()

vector_store = MilvusVectorStore(
    uri=str(Path(mkdtemp()) / "docling.db"),  # or set as needed
    dim=embed_dim,
    overwrite=True,
)
index = VectorStoreIndex.from_documents(
    documents=reader.load_data(SOURCE),
    transformations=[node_parser],
    storage_context=StorageContext.from_defaults(vector_store=vector_store),
    embed_model=EMBED_MODEL,
)
result = index.as_query_engine(llm=GEN_MODEL).query(QUERY)
print(f"Q: {QUERY}\nA: {result.response.strip()}\n\nSources:")
#display([(n.text, n.metadata) for n in result.source_nodes])
print([(n.text, n.metadata) for n in result.source_nodes])
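One thing worth noting before running it: HuggingFaceInferenceAPI calls the hosted Hugging Face inference endpoint, so only the Docling conversion and the BGE embeddings run locally on the CPU; the Mixtral generation happens remotely. A small guard like the one below (my addition, not part of the original sample) fails fast if the token was not picked up.

# Hypothetical guard, placed right after load_dotenv() in app.py:
if not _get_env_from_colab_or_os("HF_TOKEN"):
    raise SystemExit("HF_TOKEN is not set - the remote Mixtral calls would fail.")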

I created a .env file with my Hugging Face token and sourced it.

HF_TOKEN="hf_xxxxxx"

And then:

source .env

1st run and failure

The first attempt to run the app was unsuccessful, with the following message:

2024-12-16 15:45:13.455420: I tensorflow/core/platform/cpu_feature_guard.cc:210] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
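For what it’s worth, this particular line is an informational log (TensorFlow reporting which CPU instructions its binary was built for) rather than a Python error; TensorFlow’s native logging can be silenced by raising TF_CPP_MIN_LOG_LEVEL before the import, independently of the environment rebuild below.

import os

# Must be set before TensorFlow is imported:
# 0 = all messages, 1 = filter INFO, 2 = filter INFO and WARNING, 3 = keep only FATAL
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "2"

import tensorflow as tf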

Troubleshooting

After a quick search, and with the help of Stack Overflow, I did the following:

conda create --name py11 python==3.11  
####
The following packages will be downloaded:

package | build
---------------------------|-----------------
bzip2-1.0.8 | h6c40b1e_6 151 KB
ca-certificates-2024.11.26 | hecd8cb5_0 132 KB
libffi-3.4.4 | hecd8cb5_1 129 KB
openssl-1.1.1w | hca72f7f_0 2.8 MB
pip-24.2 | py311hecd8cb5_0 2.8 MB
python-3.11.0 | h1fd4e5f_3 15.5 MB
setuptools-75.1.0 | py311hecd8cb5_0 2.2 MB
sqlite-3.45.3 | h6c40b1e_0 1.2 MB
tk-8.6.14 | h4d00af3_0 3.4 MB
tzdata-2024b | h04d1e81_0 115 KB
wheel-0.44.0 | py311hecd8cb5_0 150 KB
xz-5.4.6 | h6c40b1e_1 371 KB
zlib-1.2.13 | h4b97444_1 102 KB
------------------------------------------------------------
Total: 29.1 MB
The following NEW packages will be INSTALLED:

bzip2 pkgs/main/osx-64::bzip2-1.0.8-h6c40b1e_6
ca-certificates pkgs/main/osx-64::ca-certificates-2024.11.26-hecd8cb5_0
libffi pkgs/main/osx-64::libffi-3.4.4-hecd8cb5_1
ncurses pkgs/main/osx-64::ncurses-6.4-hcec6c5f_0
openssl pkgs/main/osx-64::openssl-1.1.1w-hca72f7f_0
pip pkgs/main/osx-64::pip-24.2-py311hecd8cb5_0
python pkgs/main/osx-64::python-3.11.0-h1fd4e5f_3
readline pkgs/main/osx-64::readline-8.2-hca72f7f_0
setuptools pkgs/main/osx-64::setuptools-75.1.0-py311hecd8cb5_0
sqlite pkgs/main/osx-64::sqlite-3.45.3-h6c40b1e_0
tk pkgs/main/osx-64::tk-8.6.14-h4d00af3_0
tzdata pkgs/main/noarch::tzdata-2024b-h04d1e81_0
wheel pkgs/main/osx-64::wheel-0.44.0-py311hecd8cb5_0
xz pkgs/main/osx-64::xz-5.4.6-h6c40b1e_1
zlib pkgs/main/osx-64::zlib-1.2.13-h4b97444_1


Proceed ([y]/n)? y
####
conda activate py11
####
conda install tensorflow
### ... some more installations...
conda install keras

 ✔  took 48s   py11   at 15:55:17  ▓▒░
Channels:
- defaults
Platform: osx-64
Collecting package metadata (repodata.json): done
Solving environment: done

# All requested packages already installed.
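Before re-running the app, a quick sanity check (my own addition, not part of the original steps) confirms that the fresh environment imports TensorFlow cleanly:

python -c "import tensorflow as tf; print(tf.__version__)"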

2nd run and success

So I ran the application again, as follows.

python3 app.py 
####
Fetching 9 files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 110054.62it/s]
Q: Which are the main AI models in Docling?
A: 1. A layout analysis model, an accurate object-detector for page elements. 2. TableFormer, a state-of-the-art table structure recognition model.

Sources:
[('## 3.2 AI models\n\nAs part of Docling, we initially release two highly capable AI models to the open-source community, which have been developed and published recently by our team. The first model is a layout analysis model, an accurate object-detector for page elements [13]. The second model is TableFormer [12, 9], a state-of-the-art table structure recognition model. We provide the pre-trained weights (hosted on huggingface) and a separate package for the inference code as docling-ibm-models . Both models are also powering the open-access deepsearch-experience, our cloud-native service for knowledge exploration tasks.', {'header_path': '/Docling Technical Report/'}), ("## 5 Applications\n\nThanks to the high-quality, richly structured document conversion achieved by Docling, its output qualifies for numerous downstream applications. For example, Docling can provide a base for detailed enterprise document search, passage retrieval or classification use-cases, or support knowledge extraction pipelines, allowing specific treatment of different structures in the document, such as tables, figures, section structure or references. For popular generative AI application patterns, such as retrieval-augmented generation (RAG), we provide quackling , an open-source package which capitalizes on Docling's feature-rich document output to enable document-native optimized vector embedding and chunking. It plugs in seamlessly with LLM frameworks such as LlamaIndex [8]. Since Docling is fast, stable and cheap to run, it also makes for an excellent choice to build document-derived datasets. With its powerful table structure recognition, it provides significant benefit to automated knowledge-base construction [11, 10]. Docling is also integrated within the open IBM data prep kit [6], which implements scalable data transforms to build large-scale multi-modal training datasets.", {'header_path': '/Docling Technical Report/'})]
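Since the raw print of the source nodes is dense, a small loop over result.source_nodes (just a presentation tweak on top of the script above) makes the grounding easier to read:

for i, n in enumerate(result.source_nodes, start=1):
    # header_path is the metadata key visible in the output above
    print(f"--- Source {i} ({n.metadata.get('header_path', '')}) ---")
    print(n.text[:300], "...")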

Conclusion

A nice step for me on my Intel/CPU laptop. I learned how to run the very simple examples provided on the Docling documentation site; it’s not rocket science, but it’s a first step!

Thanks for reading 🤗.
