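Scraping JD.com search results is the foundation of the whole project and determines the downstream data processing pipeline. The main implementation steps are as follows: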
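Product images are read from the data-lazy-img attribute. The price and the product name are extracted from the p-price and p-name classes.

Service layer: create ContentService and inject the Elasticsearch client.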
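    import com.alibaba.fastjson.JSON; // assumed: JSON.toJSONString comes from fastjson
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType; // package location varies by client version
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;

    import java.io.IOException;
    import java.util.List;

    @Service
    public class ContentService {

        private RestHighLevelClient client;

        @Autowired
        @Qualifier("restHighLevelClient")
        public void setClient(RestHighLevelClient client) {
            this.client = client;
        }

        // Scrapes the products for the keyword and bulk-indexes them into "jd_goods".
        public boolean parseContent(String keyword) throws IOException {
            List<?> contents = new HtmlParseUtil().parseJD(keyword);

            BulkRequest request = new BulkRequest();
            request.timeout("2m");
            for (Object content : contents) {
                // One index operation per scraped product, serialized as JSON.
                IndexRequest indexRequest = new IndexRequest("jd_goods")
                        .source(JSON.toJSONString(content), XContentType.JSON);
                request.add(indexRequest);
            }

            BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
            // True only if every item in the batch was indexed successfully.
            return !bulk.hasFailures();
        }
    }

Controller layer: expose a RESTful API endpoint that accepts a keyword and calls the parsing service.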
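    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    import java.io.IOException;

    @RestController
    public class ContentController {

        private ContentService contentService;

        @Autowired
        public ContentController(ContentService contentService) {
            this.contentService = contentService;
        }

        // Triggers scraping and indexing for the given keyword.
        @GetMapping("/parse/{keyword}")
        public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
            return contentService.parseContent(keyword);
        }
    }

The search endpoint takes the keyword, page number, and page size as path variables:

    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List ...

To improve the user experience, matched terms in the search results are highlighted: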
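    import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;

    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");   // highlight matches in the title field
    highlightBuilder.preTags("");
    highlightBuilder.postTags("");

The pre and post tags are empty strings in this snippet; set them to the opening and closing markup that should wrap each matched term.

Visit http://localhost:9090/search/java/1/20 to verify the search and highlighting features.

This exercise covers a full set of Elasticsearch skills, from data scraping through storage and indexing to query optimization. The front-end/back-end separation also yields a complete application architecture that later projects can build on.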
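The article does not show the body of the search endpoint, so the following is only a minimal sketch of two steps such an endpoint typically needs: turning pageNo and pageSize into a zero-based offset, and substituting the highlighted fragments for the original title. It assumes that pageNo is 1-based (the /search/java/1/20 URL suggests this but does not confirm it). The class SearchResultSketch, its methods, and the sample data are hypothetical and do not come from the original project. It uses only the Java standard library, with no Elasticsearch client calls.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration; not part of the original project. */
final class SearchResultSketch {

    /** Converts a page number (assumed 1-based) into a zero-based offset. */
    static int fromOffset(int pageNo, int pageSize) {
        if (pageNo < 1 || pageSize < 1) {
            throw new IllegalArgumentException("pageNo and pageSize must be >= 1");
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(pageNo - 1, pageSize);
    }

    /**
     * Returns a copy of the document source in which the given field is
     * replaced by the concatenated highlight fragments. If there are no
     * fragments, the copy keeps the original field value.
     */
    static Map<String, Object> applyHighlight(Map<String, Object> source,
                                              String field,
                                              List<String> fragments) {
        Map<String, Object> result = new LinkedHashMap<>(source);
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample document and fragment, for demonstration only.
        Map<String, Object> source = new LinkedHashMap<>();
        source.put("title", "Java in practice");
        source.put("price", "59.00");
        List<String> fragments = Arrays.asList("<em>Java</em> in practice");

        System.out.println(fromOffset(1, 20)); // prints 0
        System.out.println(applyHighlight(source, "title", fragments).get("title"));
    }
}
```

Copying the source map instead of mutating it leaves the original hit untouched, which keeps the helper easy to test in isolation. In a real endpoint the offset and the fragments would come from the search request and response objects of whichever client the project uses.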
Reposted from: http://bzkaz.baihongyu.com/