Elasticsearch Comprehensive Hands-On Project - 白红宇



Published: 2019-03-13


Elasticsearch Comprehensive Hands-On Guide

1. Crawler implementation

Scraping JD.com search results is the foundation of the project and determines the data processing that follows. The main steps are:

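URL construction: build the search URL from the keyword and request the JD search page.

HTML parsing: parse the HTML with Jsoup and extract the product list.

Data extraction:

Images: JD lazy-loads its images, so read the image URL from the data-lazy-img attribute.

Price and title: extract these from the elements with the p-price and p-name classes, respectively.

Storage: save the extracted data to a local file or a database for later processing.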
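The first step above, URL construction, can be sketched with the standard library alone. This is a minimal illustration, not the project's actual code: the class name JdSearchUrl and the base URL are assumptions, and the keyword is percent-encoded so that Chinese search terms and spaces survive the request.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds the search URL for a keyword.
final class JdSearchUrl {
    // Assumed base URL for illustration; check it against the live site.
    static final String BASE = "https://search.jd.com/Search?keyword=";

    static String forKeyword(String keyword) {
        // URLEncoder applies form encoding: non-ASCII characters are
        // percent-escaped as UTF-8 bytes and spaces become '+'.
        return BASE + URLEncoder.encode(keyword.trim(), StandardCharsets.UTF_8);
    }
}
```

The resulting string can then be handed to Jsoup for the HTML parsing step.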

2. Storing the data in Elasticsearch

Service layer: create ContentService and inject the Elasticsearch client.

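    import java.io.IOException;
    import java.util.List;

    import com.alibaba.fastjson.JSON; // assuming fastjson, which provides JSON.toJSONString
    import org.elasticsearch.action.bulk.BulkRequest;
    import org.elasticsearch.action.bulk.BulkResponse;
    import org.elasticsearch.action.index.IndexRequest;
    import org.elasticsearch.client.RequestOptions;
    import org.elasticsearch.client.RestHighLevelClient;
    import org.elasticsearch.common.xcontent.XContentType; // package differs in newer releases
    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.beans.factory.annotation.Qualifier;
    import org.springframework.stereotype.Service;

    @Service
    public class ContentService {
        private RestHighLevelClient client;

        @Autowired
        @Qualifier("restHighLevelClient")
        public void setClient(RestHighLevelClient client) {
            this.client = client;
        }

        // Scrapes the products for a keyword and indexes them into jd_goods in one bulk request.
        public boolean parseContent(String keyword) throws IOException {
            List<?> contents = new HtmlParseUtil().parseJD(keyword);
            BulkRequest request = new BulkRequest();
            request.timeout("2m");
            for (Object content : contents) {
                request.add(new IndexRequest("jd_goods")
                        .source(JSON.toJSONString(content), XContentType.JSON));
            }
            BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
            return !bulk.hasFailures();
        }
    }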
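It can help to see what a bulk request amounts to on the wire. At the REST level, the bulk API takes newline-delimited JSON: one action line followed by one source line per document, each ending in a newline. The sketch below builds such a body by hand, purely as an illustration; the class name BulkBodySketch is hypothetical, and the Java client performs this serialization itself.

```java
import java.util.List;

// Hypothetical illustration of the NDJSON body behind a bulk index request.
final class BulkBodySketch {
    // Assumes each document is already serialized as single-line JSON and that
    // the index name needs no escaping (index names may not contain quotes).
    static String build(String index, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: index this document, letting Elasticsearch assign the id.
            body.append("{\"index\":{\"_index\":\"").append(index).append("\"}}\n");
            // Source line: the document itself.
            body.append(doc).append('\n');
        }
        return body.toString();
    }
}
```

Sending every scraped product in one such request avoids a network round trip per document, which is presumably why the service collects them into a single BulkRequest.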

Controller layer: expose a RESTful API endpoint that accepts a keyword and calls the parsing service.

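    import java.io.IOException;

    import org.springframework.beans.factory.annotation.Autowired;
    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;

    @RestController
    public class ContentController {
        private ContentService contentService;

        @Autowired
        public ContentController(ContentService contentService) {
            this.contentService = contentService;
        }

        // Scrapes and indexes the products for the keyword; returns true if no bulk item failed.
        @GetMapping("/parse/{keyword}")
        public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
            return contentService.parseContent(keyword);
        }
    }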
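To try the endpoint from Java, the keyword has to be encoded as a path segment first. The following is a rough sketch using the JDK's HTTP client types; the class name ParseRequestSketch and the base URL passed in are assumptions, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical client-side helper for the /parse/{keyword} endpoint.
final class ParseRequestSketch {
    // Builds (but does not send) a GET request for /parse/{keyword}.
    static HttpRequest forKeyword(String baseUrl, String keyword) {
        // URLEncoder targets form data and writes spaces as '+';
        // inside a path segment a space must be "%20" instead.
        String segment = URLEncoder.encode(keyword, StandardCharsets.UTF_8)
                .replace("+", "%20");
        return HttpRequest.newBuilder(URI.create(baseUrl + "/parse/" + segment))
                .GET()
                .build();
    }
}
```

Passing the request to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running instance should return the boolean produced by parse as the response body.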

3. Front-end and back-end separation

Front end: Vue.js handles data display and interaction.

Back end: a RESTful API that supports paged search.

    // The element type is assumed to be Map<String, Object>, one map of field values per hit.
    @GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
    public List<Map<String, Object>> search(
            @PathVariable("keyword") String keyword,
            @PathVariable("pageNo") int pageNo,
            @PathVariable("pageSize") int pageSize) throws IOException {
        return contentService.searchPage(keyword, pageNo, pageSize);
    }
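The searchPage method is not shown here, but any from/size pagination has to turn the page number into a zero-based document offset. Below is a minimal sketch of that arithmetic, assuming pageNo starts at 1; the class name PageWindow is hypothetical.

```java
// Hypothetical helper: converts a page number into a from/size offset.
final class PageWindow {
    // Returns the zero-based offset of the first hit on the given 1-based page.
    static int from(int pageNo, int pageSize) {
        if (pageSize < 1) {
            throw new IllegalArgumentException("pageSize must be at least 1");
        }
        int page = Math.max(pageNo, 1); // treat 0 or negative pages as the first page
        // multiplyExact throws on int overflow instead of silently wrapping.
        return Math.multiplyExact(page - 1, pageSize);
    }
}
```

The result would go to SearchSourceBuilder.from(int) and pageSize to SearchSourceBuilder.size(int). Note that Elasticsearch rejects requests where from plus size exceeds index.max_result_window, which defaults to 10,000, so very deep pages need a different paging strategy.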

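4. Search highlighting

To improve the user experience, highlight the matched terms in the search results:

Modify the search method in the service class to add highlighting: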

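    HighlightBuilder highlightBuilder = new HighlightBuilder();
    highlightBuilder.field("title");
    // The tags wrap each matched term. They are empty strings here,
    // so set them to the markup the page should render around a match.
    highlightBuilder.preTags("");
    highlightBuilder.postTags("");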
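Elasticsearch returns highlight fragments separately from the document source, so the search method typically has to copy them over the original field before returning results. The sketch below shows that merge step with plain collections. The class name HighlightMergeSketch is hypothetical, the fragments are plain strings standing in for the client's fragment objects, and the em tags in the usage note are only an example (they are Elasticsearch's default highlight tags).

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of merging highlight fragments into a hit's source.
final class HighlightMergeSketch {
    // Returns a copy of the source whose field is replaced by the joined
    // fragments, or an unchanged copy when the field has no highlight.
    static Map<String, Object> merge(Map<String, Object> source,
                                     String field,
                                     List<String> fragments) {
        Map<String, Object> result = new HashMap<>(source);
        if (fragments != null && !fragments.isEmpty()) {
            result.put(field, String.join("", fragments));
        }
        return result;
    }
}
```

For example, merging the fragment "<em>Java</em> in Action" into a hit whose title is "Java in Action" yields a document whose title carries the markup while the other fields stay untouched. In the real client, the source map would come from hit.getSourceAsMap() and the fragments from hit.getHighlightFields().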