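Applying the W3C JSON-LD Structured Data Standard to Knowledge Graph Construction and Semantic Search: Case Studies and Practical Walkthrough

Introduction: The Role of JSON-LD in the Modern Data Ecosystem

JSON-LD (JSON for Linked Data) is a W3C Recommendation for structured data. It adds linking semantics on top of plain JSON so that ordinary JSON documents can be read as Linked Data. It is widely used in knowledge graph construction and semantic search because it keeps the JSON syntax developers already know while serving as a concrete syntax for the RDF data model.

The core strength of JSON-LD is its context mechanism. The @context field maps plain keys to globally unique IRIs, which removes ambiguity when data from separate silos is combined. Take an ordinary JSON object: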

{ "name": "张三", "age": 30, "email": "zhangsan@example.com" }

Adding a JSON-LD context and a type makes its meaning explicit:

{ "@context": "https://schema.org", "@type": "Person", "name": "张三", "age": 30, "email": "zhangsan@example.com" }

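Now "name" resolves to the Schema.org property schema.org/name and "email" to schema.org/email. Properties are identified globally rather than per type, so there is no schema.org/Person/name. Note that "age" is not a term Schema.org defines (birthDate is the closest standard property), so a production document would use a defined term or a custom vocabulary for it. With defined terms, any consuming system can interpret the data the same way.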
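The mapping a context performs can be sketched in a few lines of plain Python. This is a simplified illustration, not the full JSON-LD expansion algorithm: the helper expand_term and the hand-written context dictionary are assumptions made for this example, and only simple terms and prefix:suffix compact IRIs are handled.

```python
def expand_term(term, context):
    """Resolve a key to a full IRI using a simplified context lookup."""
    if term.startswith("@"):
        return term  # JSON-LD keywords are left untouched
    mapped = context.get(term, term)
    # A compact IRI such as "schema:name" is split at the first colon.
    if ":" in mapped and not mapped.startswith(("http://", "https://")):
        prefix, suffix = mapped.split(":", 1)
        if prefix in context:
            return context[prefix] + suffix
    return mapped


# Hypothetical context for illustration only
context = {
    "schema": "https://schema.org/",
    "name": "schema:name",
    "email": "schema:email",
}
doc = {"name": "张三", "email": "zhangsan@example.com"}

expanded = {expand_term(key, context): value for key, value in doc.items()}
print(expanded)
```

A real application would normally delegate this step to a JSON-LD processor, which also handles @vocab, scoped contexts, and remote context documents.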

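JSON-LD Basic Syntax and Core Concepts

1. Defining the Context

The context is the heart of JSON-LD: it defines how keys map to IRIs. In practice a document either reuses a standard vocabulary such as Schema.org or defines a custom context.

Context example using Schema.org terms: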

{
  "@context": {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#",
    "name": "schema:name",
    "description": "schema:description",
    "author": "schema:author",
    "publicationDate": "schema:datePublished"
  },
  "@type": "schema:Book",
  "name": "Knowledge Graphs in Practice",
  "description": "A practical guide to building knowledge graphs",
  "author": { "name": "李四" },
  "publicationDate": "2024-01-15"
}

Inline context versus external reference:

// Option 1: inline context
{
  "@context": {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/vocab#"
  },
  "@id": "http://example.org/resource1",
  "ex:property": "value"
}

// Option 2: external reference (recommended for production)
{
  "@context": "https://example.com/context.jsonld",
  "@id": "http://example.org/resource1",
  "ex:property": "value"
}

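2. Node Identifiers and References

JSON-LD uses the @id field to identify a resource uniquely. Other nodes refer to it by that identifier, which makes it possible to model complex relationships between entities.

Entity relationship modeling example: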

{
  "@context": {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#",
    "knows": "schema:knows",
    "worksFor": "schema:worksFor",
    "employee": "schema:employee"
  },
  "@graph": [
    {
      "@id": "https://example.com/people/zhangsan",
      "@type": "schema:Person",
      "schema:name": "张三",
      "schema:email": "zhangsan@example.com",
      "knows": [
        {"@id": "https://example.com/people/lisi"},
        {"@id": "https://example.com/people/wangwu"}
      ],
      "worksFor": {"@id": "https://example.com/orgs/acme"}
    },
    {
      "@id": "https://example.com/people/lisi",
      "@type": "schema:Person",
      "schema:name": "李四",
      "schema:email": "lisi@example.com"
    },
    {
      "@id": "https://example.com/orgs/acme",
      "@type": "schema:Organization",
      "schema:name": "Acme Corp",
      "employee": {"@id": "https://example.com/people/zhangsan"}
    }
  ]
}
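To show how a consumer might follow the @id references in a @graph like the one above, here is a minimal sketch in plain Python. The helpers index_graph and resolve are hypothetical names introduced for this example, and the document is a trimmed copy of the data above. References to nodes that are not described in the document (such as wangwu) are left as bare references.

```python
def index_graph(doc):
    """Map each node's @id to the node object."""
    return {node["@id"]: node for node in doc.get("@graph", []) if "@id" in node}


def resolve(refs, index):
    """Follow {"@id": ...} references; unknown IRIs stay as bare references."""
    if isinstance(refs, dict):
        refs = [refs]
    return [index.get(ref["@id"], ref) for ref in refs]


doc = {
    "@graph": [
        {
            "@id": "https://example.com/people/zhangsan",
            "schema:name": "张三",
            "knows": [
                {"@id": "https://example.com/people/lisi"},
                {"@id": "https://example.com/people/wangwu"},
            ],
        },
        {"@id": "https://example.com/people/lisi", "schema:name": "李四"},
    ]
}

index = index_graph(doc)
friends = resolve(index["https://example.com/people/zhangsan"]["knows"], index)
for friend in friends:
    # Fall back to the IRI when the node carries no name in this document
    print(friend.get("schema:name", friend["@id"]))
```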

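3. Nested Structures and Complex Types

JSON-LD supports deeply nested data structures, which is essential for building rich knowledge graphs.

Complex knowledge graph structure example: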

{
  "@context": {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#",
    "medical": "https://example.com/medical#",
    "symptom": "medical:symptom",
    "diagnosis": "medical:diagnosis",
    "treatment": "medical:treatment"
  },
  "@type": "schema:MedicalCondition",
  "@id": "https://example.com/conditions/diabetes",
  "schema:name": "Diabetes",
  "schema:description": "A chronic metabolic disease",
  "medical:symptom": [
    {
      "@type": "schema:MedicalSymptom",
      "schema:name": "Polydipsia",
      "schema:description": "Abnormal thirst"
    },
    {
      "@type": "schema:MedicalSymptom",
      "schema:name": "Polyuria",
      "schema:description": "Abnormally increased urine output"
    }
  ],
  "medical:diagnosis": {
    "@type": "schema:MedicalProcedure",
    "schema:name": "Blood glucose test",
    "schema:procedureType": "diagnostic"
  },
  "medical:treatment": {
    "@type": "schema:MedicalTherapy",
    "schema:name": "Insulin therapy",
    "schema:description": "Controls blood glucose through insulin injections"
  }
}
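A nested document like this one describes several nodes at once. The following sketch shows one way to pull nested node objects out into a flat list, assigning blank node identifiers to nodes that have no @id. It is a simplified illustration and not the JSON-LD flattening algorithm from the specification; the function names and the trimmed input document are assumptions made for the example.

```python
import itertools


def is_nested_node(value):
    """True for embedded node objects (not value objects or bare references)."""
    return isinstance(value, dict) and "@value" not in value and set(value) != {"@id"}


def flatten(node, nodes, counter):
    """Append `node` and its nested nodes to `nodes`; return a reference to `node`."""
    node_id = node.get("@id") or f"_:b{next(counter)}"
    flat = {"@id": node_id}
    for key, value in node.items():
        if key == "@id":
            continue
        if key.startswith("@"):
            flat[key] = value  # keywords such as @type are copied as-is
            continue
        items = value if isinstance(value, list) else [value]
        out = [flatten(v, nodes, counter) if is_nested_node(v) else v for v in items]
        flat[key] = out if isinstance(value, list) else out[0]
    nodes.append(flat)
    return {"@id": node_id}


condition = {
    "@id": "https://example.com/conditions/diabetes",
    "@type": "schema:MedicalCondition",
    "medical:symptom": [
        {"@type": "schema:MedicalSymptom", "schema:name": "Polydipsia"},
        {"@type": "schema:MedicalSymptom", "schema:name": "Polyuria"},
    ],
}

nodes = []
flatten(condition, nodes, itertools.count())
for node in nodes:
    print(node)
```

Nested nodes are emitted before the node that contains them, so the condition itself appears last in the list and points at its symptoms by blank node identifier.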

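JSON-LD in Knowledge Graph Construction

1. Data Source Integration and Normalization

In knowledge graph construction, JSON-LD serves as a common exchange format for integrating data from different sources.

Enterprise data integration example: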

import json

# Source 1: CRM system
crm_data = {
    "customer_id": "C001",
    "name": "张三",
    "email": "zhangsan@company.com",
    "phone": "13800138000",
    "company": "ABC科技",
    "created_date": "2024-01-10"
}

# Source 2: ERP system
erp_data = {
    "cust_id": "C001",
    "full_name": "张三",
    "contact_email": "zhangsan@company.com",
    "organization": "ABC科技",
    "join_date": "2024-01-10",
    "orders": [
        {"order_id": "O001", "amount": 15000, "date": "2024-02-01"},
        {"order_id": "O002", "amount": 23000, "date": "2024-03-15"}
    ]
}

def convert_to_jsonld(crm, erp):
    """Merge one customer's CRM and ERP records into a single JSON-LD document."""
    # Guard against joining records that belong to different customers
    if crm["customer_id"] != erp["cust_id"]:
        raise ValueError("CRM and ERP records refer to different customers")
    customer_id = crm["customer_id"]

    # Build the JSON-LD document from the CRM fields
    jsonld = {
        "@context": {
            "schema": "https://schema.org/",
            "ex": "https://example.com/vocab#",
            "order": "ex:order",
            "orderAmount": "ex:orderAmount",
            "orderDate": "ex:orderDate"
        },
        "@id": f"https://example.com/customers/{customer_id}",
        "@type": "schema:Person",
        "schema:name": crm["name"],
        "schema:email": crm["email"],
        "schema:telephone": crm["phone"],
        "schema:worksFor": {
            "@type": "schema:Organization",
            "schema:name": crm["company"]
        },
        "ex:customerSince": crm["created_date"],
        "ex:orderHistory": []
    }

    # Append the order history from the ERP record
    for order in erp["orders"]:
        jsonld["ex:orderHistory"].append({
            "@type": "ex:Order",
            "ex:orderId": order["order_id"],
            "ex:orderAmount": order["amount"],
            "ex:orderDate": order["date"]
        })
    return jsonld

# Run the conversion
unified_customer = convert_to_jsonld(crm_data, erp_data)
print(json.dumps(unified_customer, indent=2, ensure_ascii=False))

Output:

{
  "@context": {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#",
    "order": "ex:order",
    "orderAmount": "ex:orderAmount",
    "orderDate": "ex:orderDate"
  },
  "@id": "https://example.com/customers/C001",
  "@type": "schema:Person",
  "schema:name": "张三",
  "schema:email": "zhangsan@company.com",
  "schema:telephone": "13800138000",
  "schema:worksFor": {
    "@type": "schema:Organization",
    "schema:name": "ABC科技"
  },
  "ex:customerSince": "2024-01-10",
  "ex:orderHistory": [
    {
      "@type": "ex:Order",
      "ex:orderId": "O001",
      "ex:orderAmount": 15000,
      "ex:orderDate": "2024-02-01"
    },
    {
      "@type": "ex:Order",
      "ex:orderId": "O002",
      "ex:orderAmount": 23000,
      "ex:orderDate": "2024-03-15"
    }
  ]
}

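2. Entity Linking and Disambiguation

JSON-LD supports cross-system entity linking and disambiguation through @id and links to external identifiers.

Entity linking in practice: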

import json

# Local entity
local_entity = {
    "@context": {
        "schema": "https://schema.org/",
        "dbpedia": "http://dbpedia.org/resource/",
        "wikidata": "http://www.wikidata.org/entity/"
    },
    "@type": "schema:Person",
    "schema:name": "刘德华",
    "schema:birthDate": "1961-09-27",
    "schema:nationality": "Hong Kong, China"
}

def link_to_external(entity):
    """Attach identity links to external knowledge bases."""
    # schema:sameAs states that the external IRI identifies the same entity.
    # (schema:subjectOf would instead point to a work *about* the entity.)
    entity["schema:sameAs"] = [
        {"@id": "wikidata:Q123456"},  # placeholder Wikidata ID
        {"@id": "dbpedia:Liu_Dehua"},
        {"@id": "http://viaf.org/viaf/100187499"}
    ]
    return entity

linked_entity = link_to_external(local_entity)
print(json.dumps(linked_entity, indent=2, ensure_ascii=False))
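Once records carry sameAs links, disambiguation can start from a simple rule: two local records that point at the same external identifier probably describe the same entity. The sketch below groups records on that basis with a small union-find. It is an illustrative heuristic, not a complete entity resolution method, and every IRI and function name in it is hypothetical.

```python
from collections import defaultdict


def same_as_targets(entity):
    """Return the set of IRIs listed under schema:sameAs."""
    value = entity.get("schema:sameAs", [])
    items = value if isinstance(value, list) else [value]
    return {item["@id"] if isinstance(item, dict) else item for item in items}


def group_by_shared_identity(entities):
    """Group records that share at least one sameAs target (union-find)."""
    parent = list(range(len(entities)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    first_seen = {}
    for i, entity in enumerate(entities):
        for target in same_as_targets(entity):
            if target in first_seen:
                parent[find(i)] = find(first_seen[target])
            else:
                first_seen[target] = i

    groups = defaultdict(list)
    for i, entity in enumerate(entities):
        groups[find(i)].append(entity["@id"])
    return sorted(groups.values())


records = [
    {"@id": "https://example.com/people/p1",
     "schema:sameAs": [{"@id": "https://kb.example.org/entity/E1"}]},
    {"@id": "https://crm.example.com/contacts/77",
     "schema:sameAs": {"@id": "https://kb.example.org/entity/E1"}},
    {"@id": "https://example.com/people/p2",
     "schema:sameAs": [{"@id": "https://kb.example.org/entity/E2"}]},
]

for group in group_by_shared_identity(records):
    print(group)
```

In this sample the first two records share the target E1 and end up in one group, while the third stays on its own.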

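3. Batch Processing and Streaming Conversion

Large datasets call for a conversion strategy that does not load the whole input into memory.

Streaming JSON-LD conversion example: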

import json
import ijson  # streaming parser for large JSON files

def stream_convert_to_jsonld(input_file, output_file, context):
    """Stream a large JSON array of user records into a JSON-LD array."""
    # ijson reads bytes, so the input is opened in binary mode
    with open(input_file, 'rb') as infile, \
            open(output_file, 'w', encoding='utf-8') as outfile:
        outfile.write('[\n')
        first = True
        # 'item' selects each element of the top-level array, one at a time
        for record in ijson.items(infile, 'item'):
            doc = {"@context": context, "@type": "schema:Person"}
            if 'id' in record:
                doc["@id"] = f"https://example.com/people/{record['id']}"
            if 'name' in record:
                doc["schema:name"] = record['name']
            if 'email' in record:
                doc["schema:email"] = record['email']
            if not first:
                outfile.write(',\n')
            first = False
            # json.dumps escapes quotes and control characters in the values
            outfile.write('  ' + json.dumps(doc, ensure_ascii=False))
        outfile.write('\n]\n')

# Usage example
context = {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#"
}

# Assumes large_users.json holds a top-level array of user objects
stream_convert_to_jsonld('large_users.json', 'users.jsonld', context)

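JSON-LD in Semantic Search

1. Building a Semantic Index

JSON-LD gives a semantic search engine rich metadata to work with and supports search over entities and their relationships.

Semantic index construction example: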

import json
from rdflib import Graph

def build_semantic_index(jsonld_data):
    """Parse JSON-LD into an RDF graph that can be queried with SPARQL."""
    g = Graph()
    if not isinstance(jsonld_data, str):
        jsonld_data = json.dumps(jsonld_data)
    # rdflib's JSON-LD parser (bundled since rdflib 6) expands compact IRIs such as
    # "schema:name" and turns nested objects without @id into blank nodes.
    g.parse(data=jsonld_data, format='json-ld')
    return g

# Example: build an index for a product knowledge graph
product_data = {
    "@context": {
        "schema": "https://schema.org/",
        "ex": "https://example.com/vocab#"
    },
    "@id": "https://example.com/products/P001",
    "@type": "schema:Product",
    "schema:name": "Smartphone X1",
    "schema:description": "High-end smartphone",
    "schema:brand": {
        "@id": "https://example.com/brands/BrandA",
        "@type": "schema:Brand",
        "schema:name": "Brand A"
    },
    "schema:category": "Electronics",
    "schema:offers": {
        "@type": "schema:Offer",
        "schema:price": "5999.00",
        "schema:priceCurrency": "CNY",
        "schema:availability": "https://schema.org/InStock"
    },
    "ex:hasFeature": [
        {"@type": "schema:PropertyValue", "schema:name": "Screen", "schema:value": "6.7 inches"},
        {"@type": "schema:PropertyValue", "schema:name": "Camera", "schema:value": "100 megapixels"}
    ]
}

semantic_index = build_semantic_index(product_data)

# Query example: find all products priced below 6000
query = """
PREFIX schema: <https://schema.org/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT ?product ?name ?price WHERE {
    ?product a schema:Product ;
             schema:name ?name ;
             schema:offers ?offer .
    ?offer schema:price ?price .
    FILTER (xsd:decimal(?price) < 6000)
}
"""

results = semantic_index.query(query)
for row in results:
    print(f"Product: {row.name}, price: {row.price}")

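2. Semantic Query Optimization

Queries can reuse the namespace mappings that the JSON-LD context already defines, and a query that is prepared once does not have to be parsed again on every call.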
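One concrete way to reuse context information on the query side is to derive SPARQL PREFIX declarations from it, so that queries and documents never drift apart. The sketch below assumes that a context entry is a namespace prefix when its value is a string ending in "/" or "#". That is a heuristic chosen for this example, not a rule from the JSON-LD specification, and the function name is hypothetical. It keeps prefixes consistent; it does not by itself make queries faster.

```python
def sparql_prefixes(context):
    """Build PREFIX lines from context entries that look like namespace prefixes."""
    lines = []
    for term, iri in sorted(context.items()):
        # Heuristic: namespace IRIs end in "/" or "#"; term definitions do not
        if isinstance(iri, str) and iri.endswith(("/", "#")):
            lines.append(f"PREFIX {term}: <{iri}>")
    return "\n".join(lines)


context = {
    "schema": "https://schema.org/",
    "ex": "https://example.com/vocab#",
    "name": "schema:name",  # a term definition, skipped by the heuristic
}

query = sparql_prefixes(context) + """
SELECT ?entity ?name WHERE {
    ?entity schema:name ?name .
}
"""
print(query)
```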

Semantic query example:

import json
from rdflib import Graph, Namespace, URIRef
from rdflib.namespace import RDF, RDFS
from rdflib.plugins.sparql import prepareQuery

class SemanticSearchEngine:
    def __init__(self):
        self.graph = Graph()
        self.namespaces = {
            'schema': Namespace('https://schema.org/'),
            'ex': Namespace('https://example.com/vocab#'),
            'rdf': RDF,
            'rdfs': RDFS
        }
        # Prepare each query once so it is not re-parsed on every call
        self._same_type_query = prepareQuery('''
            SELECT ?entity ?name ?type WHERE {
                ?query_entity a ?type .
                ?entity a ?type ;
                        schema:name ?name .
                FILTER(?entity != ?query_entity)
            }
        ''', initNs=self.namespaces)
        self._related_query = prepareQuery('''
            SELECT ?related ?relatedName WHERE {
                ?entity ?relation ?related .
                ?related schema:name ?relatedName .
            }
        ''', initNs=self.namespaces)

    def load_jsonld(self, jsonld_data):
        """Load JSON-LD data into the graph."""
        self.graph.parse(data=json.dumps(jsonld_data), format='json-ld')

    def search_by_semantic_similarity(self, query_entity):
        """Approximate similarity by shared rdf:type: return other entities of the same type."""
        results = self.graph.query(
            self._same_type_query,
            initBindings={'query_entity': URIRef(query_entity)}
        )
        return [(str(r.entity), str(r.name)) for r in results]

    def find_related_entities(self, entity_uri, relation):
        """Find entities linked from entity_uri through the given relation."""
        results = self.graph.query(
            self._related_query,
            initBindings={'entity': URIRef(entity_uri), 'relation': URIRef(relation)}
        )
        return [(str(r.related), str(r.relatedName)) for r in results]

# Usage example
engine = SemanticSearchEngine()

# Load data
data = {
    "@context": {
        "schema": "https://schema.org/",
        "ex": "https://example.com/vocab#"
    },
    "@graph": [
        {
            "@id": "https://example.com/people/zhangsan",
            "@type": "schema:Person",
            "schema:name": "张三",
            "schema:knows": {"@id": "https://example.com/people/lisi"}
        },
        {
            "@id": "https://example.com/people/lisi",
            "@type": "schema:Person",
            "schema:name": "李四"
        }
    ]
}
engine.load_jsonld(data)

# Search
results = engine.find_related_entities(
    "https://example.com/people/zhangsan",
    "https://schema.org/knows"
)
print("People 张三 knows:", results)

3. Implementing a Semantic Search API

This section puts the pieces together into a small semantic search service.

RESTful semantic search API:

import json
import re
from flask import Flask, request, jsonify
from rdflib import Graph, Literal

app = Flask(__name__)

class JSONLDSemanticSearch:
    def __init__(self):
        self.graph = Graph()

    def add_document(self, jsonld_doc):
        """Add a document to the index."""
        try:
            self.graph.parse(data=json.dumps(jsonld_doc), format='json-ld')
            return True
        except Exception as e:
            print(f"Error adding document: {e}")
            return False

    def search(self, query_params):
        """
        Run a semantic search.
        query_params: {
            "type": "Product",
            "filters": {"brand": "Brand A", "price_max": 6000},
            "keywords": "smartphone"
        }
        """
        type_name = query_params.get('type', 'Product')
        # Only a plain class name may be spliced into the query text
        if not re.fullmatch(r'[A-Za-z][A-Za-z0-9]*', type_name):
            raise ValueError(f"Invalid type name: {type_name!r}")

        # Build the query dynamically
        query_parts = [
            "PREFIX schema: <https://schema.org/>",
            "PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>",
            "SELECT ?entity ?name ?price ?brand WHERE {",
            f"?entity a schema:{type_name} ;",
            "schema:name ?name ;",
            "schema:offers ?offer .",
            "?offer schema:price ?price .",
            "?entity schema:brand ?brandObj .",
            "?brandObj schema:name ?brand ."
        ]

        # Add filters. Literal(...).n3() yields a quoted, escaped string literal,
        # which keeps user input from breaking out of the query.
        filters = query_params.get('filters', {})
        if 'brand' in filters:
            query_parts.append(f'FILTER (str(?brand) = {Literal(filters["brand"]).n3()})')
        if 'price_max' in filters:
            query_parts.append(f'FILTER (xsd:decimal(?price) <= {float(filters["price_max"])})')

        # Add keyword search only when a keyword was supplied
        if query_params.get('keywords'):
            keyword = Literal(query_params["keywords"]).n3()
            query_parts.append(f'FILTER (regex(?name, {keyword}, "i"))')

        query_parts.append("}")
        query_str = "\n".join(query_parts)

        results = []
        for row in self.graph.query(query_str):
            results.append({
                "entity": str(row.entity),
                "name": str(row.name),
                "price": str(row.price),
                "brand": str(row.brand)
            })
        return results

# Initialize the search engine
search_engine = JSONLDSemanticSearch()

@app.route('/api/v1/documents', methods=['POST'])
def add_document():
    """Add a JSON-LD document."""
    doc = request.get_json()
    if search_engine.add_document(doc):
        return jsonify({"status": "success"}), 201
    else:
        return jsonify({"status": "error"}), 400

@app.route('/api/v1/search', methods=['GET'])
def search():
    """Run a semantic search."""
    query = {
        "type": request.args.get('type', 'Product'),
        "filters": {},
        "keywords": request.args.get('keywords', '')
    }

    # Parse the filters
    brand = request.args.get('brand')
    if brand:
        query['filters']['brand'] = brand
    price_max = request.args.get('price_max')
    try:
        if price_max:
            query['filters']['price_max'] = float(price_max)
        results = search_engine.search(query)
    except ValueError as e:
        # Malformed price_max or type parameter
        return jsonify({"status": "error", "message": str(e)}), 400
    return jsonify({"results": results, "count": len(results)})

if __name__ == '__main__':
    # Preload some test data
    sample_docs = [
        {
            "@context": {"schema": "https://schema.org/", "ex": "https://example.com/vocab#"},
            "@id": "https://example.com/products/P001",
            "@type": "schema:Product",
            "schema:name": "Smartphone X1",
            "schema:brand": {"@id": "https://example.com/brands/BrandA", "schema:name": "Brand A"},
            "schema:offers": {"schema:price": "5999.00", "schema:priceCurrency": "CNY"}
        },
        {
            "@context": {"schema": "https://schema.org/", "ex": "https://example.com/vocab#"},
            "@id": "https://example.com/products/P002",
            "@type": "schema:Product",
            "schema:name": "Laptop Y2",
            "schema:brand": {"@id": "https://example.com/brands/BrandB", "schema:name": "Brand B"},
            "schema:offers": {"schema:price": "8999.00", "schema:priceCurrency": "CNY"}
        }
    ]
    for doc in sample_docs:
        search_engine.add_document(doc)

    app.run(debug=True, port=5000)

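Real-World Case Studies

Case 1: Building an E-commerce Knowledge Graph

Background: a large e-commerce platform needs to integrate product, brand, and user behavior data into a unified knowledge graph.

Solution: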

import json
from rdflib import Graph

# E-commerce knowledge graph builder
class EcommerceKnowledgeGraph:
    def __init__(self):
        self.graph = Graph()
        self.base_uri = "https://ecommerce.example.com/"

    def add_product(self, product_data):
        """Add a product."""
        jsonld = {
            "@context": {
                "schema": "https://schema.org/",
                "ec": "https://ecommerce.example.com/vocab#",
                "brand": "schema:brand",
                "category": "schema:category",
                "offers": "schema:offers",
                "rating": "schema:aggregateRating"
            },
            "@id": f"{self.base_uri}products/{product_data['id']}",
            "@type": "schema:Product",
            "schema:name": product_data['name'],
            "schema:description": product_data['description'],
            "brand": {
                "@id": f"{self.base_uri}brands/{product_data['brand_id']}",
                "schema:name": product_data['brand_name']
            },
            "category": product_data['category'],
            "offers": {
                "@type": "schema:Offer",
                "schema:price": str(product_data['price']),
                "schema:priceCurrency": "CNY",
                "schema:availability": "https://schema.org/InStock"
                if product_data['stock'] > 0 else "https://schema.org/OutOfStock"
            },
            "rating": {
                "@type": "schema:AggregateRating",
                "schema:ratingValue": str(product_data['rating']),
                "schema:reviewCount": str(product_data['review_count'])
            }
        }

        # Record every viewing user as a list so earlier entries are not overwritten
        if 'viewed_by' in product_data:
            jsonld["ec:viewedBy"] = [
                {"@id": f"{self.base_uri}users/{user_id}"}
                for user_id in product_data['viewed_by']
            ]

        self.graph.parse(data=json.dumps(jsonld), format='json-ld')

    def add_user(self, user_data):
        """Add a user."""
        jsonld = {
            "@context": {
                "schema": "https://schema.org/",
                "ec": "https://ecommerce.example.com/vocab#"
            },
            "@id": f"{self.base_uri}users/{user_data['id']}",
            "@type": "schema:Person",
            "schema:name": user_data['name'],
            "schema:email": user_data['email'],
            "ec:memberLevel": user_data['level'],
            "ec:registrationDate": user_data['reg_date']
        }
        self.graph.parse(data=json.dumps(jsonld), format='json-ld')

    def get_recommendations(self, user_id, limit=5):
        """Placeholder recommendation: products this user viewed, most expensive first."""
        # The pattern uses ec:viewedBy, the same property add_product writes
        query = f"""
        PREFIX schema: <https://schema.org/>
        PREFIX ec: <https://ecommerce.example.com/vocab#>
        PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
        SELECT ?product ?name ?price ?brand WHERE {{
            ?product ec:viewedBy <{self.base_uri}users/{user_id}> ;
                     schema:name ?name ;
                     schema:offers ?offer ;
                     schema:brand ?brandObj .
            ?offer schema:price ?price .
            ?brandObj schema:name ?brand .
        }}
        ORDER BY DESC(xsd:decimal(?price))
        LIMIT {int(limit)}
        """
        results = []
        for row in self.graph.query(query):
            results.append({
                "product": str(row.product),
                "name": str(row.name),
                "price": str(row.price),
                "brand": str(row.brand)
            })
        return results

# Usage example
kg = EcommerceKnowledgeGraph()

# Add a product
kg.add_product({
    "id": "P001",
    "name": "iPhone 15 Pro",
    "description": "Latest smartphone model",
    "brand_id": "B001",
    "brand_name": "Apple",
    "category": "Phones",
    "price": 7999.00,
    "stock": 100,
    "rating": 4.8,
    "review_count": 1500,
    "viewed_by": ["U001", "U002"]
})

# Add a user
kg.add_user({
    "id": "U001",
    "name": "张三",
    "email": "zhangsan@example.com",
    "level": "gold",
    "reg_date": "2023-01-15"
})

# Get recommendations
recommendations = kg.get_recommendations("U001")
print("Recommended products:", recommendations)

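Case 2: Medical Knowledge Graph and Semantic Search

Background: a hospital needs a disease-symptom-drug knowledge graph that lets physicians run semantic searches.

Solution: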

import json
from rdflib import Graph, Literal

class MedicalKnowledgeGraph:
    def __init__(self):
        self.graph = Graph()

    def add_disease(self, disease_data):
        """Add a disease with its symptoms and treatments."""
        jsonld = {
            "@context": {
                "schema": "https://schema.org/",
                "med": "https://medical.example.com/vocab#",
                "symptom": "med:symptom",
                "treatment": "med:treatment",
                "drug": "med:drug"
            },
            "@id": f"https://medical.example.com/diseases/{disease_data['id']}",
            "@type": "schema:MedicalCondition",
            "schema:name": disease_data['name'],
            "schema:description": disease_data['description'],
            "med:severity": disease_data['severity'],
            "symptom": [],
            "treatment": []
        }

        # Add symptoms
        for symptom in disease_data.get('symptoms', []):
            jsonld["symptom"].append({
                "@type": "schema:MedicalSymptom",
                "schema:name": symptom['name'],
                "med:frequency": symptom.get('frequency', 'common')
            })

        # Add treatments; dosage uses the custom vocabulary
        for treatment in disease_data.get('treatments', []):
            jsonld["treatment"].append({
                "@type": "schema:MedicalTherapy",
                "schema:name": treatment['name'],
                "med:effectiveness": treatment.get('effectiveness', 'medium'),
                "drug": [{
                    "@type": "schema:Drug",
                    "schema:name": drug['name'],
                    "med:dosage": drug.get('dosage', '')
                } for drug in treatment.get('drugs', [])]
            })

        self.graph.parse(data=json.dumps(jsonld), format='json-ld')

    @staticmethod
    def _contains_any(variable, terms):
        """Build a SPARQL expression matching any term as a case-insensitive substring."""
        # Literal(...).n3() quotes and escapes each term safely
        return " || ".join(
            f"CONTAINS(LCASE(STR({variable})), {Literal(term.lower()).n3()})"
            for term in terms
        )

    def semantic_search(self, query_terms, search_type='symptom'):
        """Find diseases by symptom, or treatments by disease name."""
        if not query_terms:
            return []
        if search_type == 'symptom':
            condition = self._contains_any('?symptomName', query_terms)
            query = f"""
            PREFIX schema: <https://schema.org/>
            PREFIX med: <https://medical.example.com/vocab#>
            SELECT DISTINCT ?disease ?name ?description WHERE {{
                ?disease a schema:MedicalCondition ;
                         schema:name ?name ;
                         schema:description ?description ;
                         med:symptom ?symptom .
                ?symptom schema:name ?symptomName .
                FILTER({condition})
            }}
            """
        elif search_type == 'treatment':
            condition = self._contains_any('?diseaseName', query_terms)
            query = f"""
            PREFIX schema: <https://schema.org/>
            PREFIX med: <https://medical.example.com/vocab#>
            SELECT DISTINCT ?treatmentName ?drugName ?effectiveness WHERE {{
                ?disease a schema:MedicalCondition ;
                         schema:name ?diseaseName ;
                         med:treatment ?treatment .
                ?treatment schema:name ?treatmentName ;
                           med:drug ?drug ;
                           med:effectiveness ?effectiveness .
                ?drug schema:name ?drugName .
                FILTER({condition})
            }}
            """
        else:
            return []

        results = []
        for row in self.graph.query(query):
            if search_type == 'symptom':
                results.append({
                    "disease": str(row.disease),
                    "name": str(row.name),
                    "description": str(row.description)
                })
            else:
                results.append({
                    "treatment": str(row.treatmentName),
                    "drug": str(row.drugName),
                    "effectiveness": str(row.effectiveness)
                })
        return results

# Usage example
med_kg = MedicalKnowledgeGraph()

# Add disease data
disease_data = {
    "id": "D001",
    "name": "Type 2 diabetes",
    "description": "Chronic metabolic disease caused by insulin resistance",
    "severity": "high",
    "symptoms": [
        {"name": "polydipsia", "frequency": "very_common"},
        {"name": "polyuria", "frequency": "very_common"},
        {"name": "weight loss", "frequency": "common"}
    ],
    "treatments": [
        {
            # Metformin and glimepiride are oral agents, not insulin
            "name": "Oral hypoglycemic therapy",
            "effectiveness": "high",
            "drugs": [
                {"name": "metformin", "dosage": "500mg"},
                {"name": "glimepiride", "dosage": "2mg"}
            ]
        }
    ]
}
med_kg.add_disease(disease_data)

# Semantic search: find diseases by symptom
results = med_kg.semantic_search(['polydipsia', 'polyuria'], 'symptom')
print("Diseases matching the symptoms:", results)

# Search for treatments
treatments = med_kg.semantic_search(['diabetes'], 'treatment')
print("Treatments:", treatments)

Performance Optimization and Best Practices

1. JSON-LD Compaction and Optimization

import hashlib
import json

class JSONLDCompressor:
    """JSON-LD compaction helper."""

    def compact(self, jsonld_data, context):
        """Compact a JSON-LD document against the given context."""
        try:
            # PyLD (pip install PyLD) implements the JSON-LD compaction algorithm
            from pyld import jsonld
            return jsonld.compact(jsonld_data, {"@context": context})
        except ImportError:
            # Simplified fallback: only rewrites prefix:suffix keys
            doc_context = jsonld_data.get("@context")
            prefixes = {}
            if isinstance(doc_context, dict):
                prefixes = {k: v for k, v in doc_context.items() if isinstance(v, str)}
            compacted = self._manual_compact(jsonld_data, context, prefixes)
            # Keep the prefixes so values such as "schema:Product" still resolve
            compacted["@context"] = {**prefixes, **context}
            return compacted

    def _manual_compact(self, data, context, prefixes):
        """Recursively replace keys with the short terms defined in context."""
        if isinstance(data, dict):
            compacted = {}
            for key, value in data.items():
                if key.startswith('@'):
                    compacted[key] = value
                else:
                    short_key = self._find_short_key(key, context, prefixes)
                    compacted[short_key] = self._manual_compact(value, context, prefixes)
            return compacted
        elif isinstance(data, list):
            return [self._manual_compact(item, context, prefixes) for item in data]
        else:
            return data

    def _find_short_key(self, key, context, prefixes):
        """Expand a prefix:suffix key to an IRI and look up its short term."""
        prefix, sep, suffix = key.partition(":")
        iri = prefixes[prefix] + suffix if sep and prefix in prefixes else key
        for short, long in context.items():
            if long == iri:
                return short
        return key

    def generate_hash(self, jsonld_data):
        """Hash the sorted-key JSON text for caching (not an RDF canonicalization)."""
        normalized = json.dumps(jsonld_data, sort_keys=True)
        return hashlib.sha256(normalized.encode()).hexdigest()

# Usage example
compressor = JSONLDCompressor()
large_jsonld = {
    "@context": {"schema": "https://schema.org/"},
    "@type": "schema:Product",
    "schema:name": "Test product",
    "schema:description": "This is a very long product description...",
    "schema:brand": {"schema:name": "Test brand"},
    "schema:offers": {"schema:price": "99.99", "schema:priceCurrency": "CNY"}
}

# Compact
context = {
    "name": "https://schema.org/name",
    "description": "https://schema.org/description",
    "brand": "https://schema.org/brand",
    "price": "https://schema.org/price"
}
compacted = compressor.compact(large_jsonld, context)
print("Compacted:", json.dumps(compacted, indent=2, ensure_ascii=False))

# Generate a hash
data_hash = compressor.generate_hash(large_jsonld)
print("Data hash:", data_hash)

2. Batch Processing Optimization

import asyncio
import json
from typing import Dict, List

import aiohttp

class AsyncJSONLDProcessor:
    """Process JSON-LD documents in batches with bounded concurrency."""

    def __init__(self, max_concurrent=10):
        self.semaphore = asyncio.Semaphore(max_concurrent)

    async def process_batch(self, jsonld_docs: List[Dict]) -> List[Dict]:
        """Process a batch of JSON-LD documents."""
        async with aiohttp.ClientSession() as session:
            tasks = [self._process_single(doc, session) for doc in jsonld_docs]
            return await asyncio.gather(*tasks, return_exceptions=True)

    async def _process_single(self, doc: Dict, session: aiohttp.ClientSession):
        """Process one document; the session is reserved for a real API call."""
        async with self.semaphore:
            # Simulated pipeline: validate, convert, store
            try:
                # Validate the JSON-LD structure
                if not self._validate_jsonld(doc):
                    return {"status": "error", "message": "Invalid JSON-LD"}

                # Stand-in for an API call over the network
                await asyncio.sleep(0.1)

                # Return the processing result
                return {
                    "status": "success",
                    "id": doc.get("@id"),
                    "type": doc.get("@type"),
                    "size": len(json.dumps(doc))
                }
            except Exception as e:
                return {"status": "error", "message": str(e)}

    def _validate_jsonld(self, doc: Dict) -> bool:
        """Check that the fields this pipeline requires are present."""
        required_fields = ["@context", "@type"]
        return all(field in doc for field in required_fields)

# Usage example
async def main():
    processor = AsyncJSONLDProcessor(max_concurrent=5)

    # Generate test data
    docs = [
        {
            "@context": {"schema": "https://schema.org/"},
            "@id": f"https://example.com/doc/{i}",
            "@type": "schema:Thing",
            "schema:name": f"Document {i}"
        }
        for i in range(20)
    ]

    results = await processor.process_batch(docs)
    success_count = sum(
        1 for r in results
        if isinstance(r, dict) and r.get("status") == "success"
    )
    print(f"Done: {success_count}/{len(docs)} succeeded")

# Run
if __name__ == "__main__":
    asyncio.run(main())

3. Caching Strategy

import hashlib
import json
from functools import wraps

import redis

def jsonld_cache(expire=3600):
    """Caching decorator for methods that return JSON-LD."""
    def decorator(func):
        @wraps(func)
        def wrapper(self, *args, **kwargs):
            # Build the cache key from a stable digest. The built-in hash() is
            # salted per process for strings, so its keys would not match
            # across workers or restarts.
            raw_key = json.dumps([args, kwargs], sort_keys=True, default=str)
            digest = hashlib.sha256(raw_key.encode('utf-8')).hexdigest()
            cache_key = f"jsonld:{func.__name__}:{digest}"

            # Try the cache first
            cached = self.redis_client.get(cache_key)
            if cached is not None:
                return json.loads(cached)

            # Run the function
            result = func(self, *args, **kwargs)

            # Store the result with an expiry
            self.redis_client.setex(cache_key, expire, json.dumps(result))
            return result
        return wrapper
    return decorator

class CachedJSONLDProcessor:
    def __init__(self, redis_host='localhost', redis_port=6379):
        self.redis_client = redis.Redis(host=redis_host, port=redis_port, decode_responses=True)

    @jsonld_cache(expire=7200)
    def get_entity_by_id(self, entity_id):
        """Fetch an entity (cached)."""
        # Stand-in for a database query
        return {
            "@id": entity_id,
            "@type": "schema:Person",
            "schema:name": "张三",
            "schema:email": "zhangsan@example.com"
        }

# Usage example (requires a running Redis server)
processor = CachedJSONLDProcessor()
result1 = processor.get_entity_by_id("https://example.com/people/1")
result2 = processor.get_entity_by_id("https://example.com/people/1")  # served from the cache