Elasticsearch Hands-On Project - 白红宇

Published: 2019-03-13

Elasticsearch Hands-On Guide

1. Crawler Implementation

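Scraping JD.com (京东) search results is the foundation of the whole project, because every later processing step works on the data collected here. The main steps are: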

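URL construction: build the search URL from the keyword and request the JD search page.

HTML parsing: parse the returned HTML with Jsoup and extract the product list.

Data extraction:

Images: JD lazy-loads product images, so the image URL has to be read from the data-lazy-img attribute.

Price and title: taken from the elements with the p-price and p-name classes, respectively.

Data storage: save the extracted data to a local file or a database for later processing.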
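
The URL-construction step above can be sketched with the JDK alone. The snippet below is a minimal illustration, not the article's code: the class name JdSearchUrl, the method forKeyword, and the base URL https://search.jd.com/Search?keyword= are assumptions, since the article does not show them. It needs Java 10 or later for the URLEncoder.encode(String, Charset) overload.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Builds a JD search URL for a keyword (sketch; names and URL pattern are assumed). */
final class JdSearchUrl {
    // Assumed base URL; the article does not spell it out.
    private static final String BASE = "https://search.jd.com/Search?keyword=";

    private JdSearchUrl() {
    }

    static String forKeyword(String keyword) {
        if (keyword == null || keyword.trim().isEmpty()) {
            throw new IllegalArgumentException("keyword must not be blank");
        }
        // Percent-encode the keyword so Chinese characters and spaces
        // survive in the query string.
        return BASE + URLEncoder.encode(keyword.trim(), StandardCharsets.UTF_8);
    }
}
```

The resulting string could then be handed to Jsoup, for example through Jsoup.connect(url).get(), before selecting the p-price and p-name elements.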

2. Storing Data in Elasticsearch

Service layer: create ContentService and inject the Elasticsearch client.

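// Imports assume the Elasticsearch 7.x RestHighLevelClient and fastjson;
// in newer 7.x releases XContentType lives in org.elasticsearch.xcontent.
import com.alibaba.fastjson.JSON;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.xcontent.XContentType;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Service;
import java.io.IOException;
import java.util.List;

@Service
public class ContentService {
    private RestHighLevelClient client;

    @Autowired
    @Qualifier("restHighLevelClient")
    public void setClient(RestHighLevelClient client) {
        this.client = client;
    }

    // Scrapes JD for the keyword and bulk-indexes the results into "jd_goods".
    // Returns true only if no document in the bulk request failed.
    public boolean parseContent(String keyword) throws IOException {
        // The element type was lost in extraction; Content is an assumed name.
        List<Content> contents = new HtmlParseUtil().parseJD(keyword);
        BulkRequest request = new BulkRequest();
        request.timeout("2m");
        for (Content content : contents) {
            request.add(new IndexRequest("jd_goods")
                .source(JSON.toJSONString(content), XContentType.JSON));
        }
        BulkResponse bulk = client.bulk(request, RequestOptions.DEFAULT);
        return !bulk.hasFailures();
    }
}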

Controller layer: expose a RESTful endpoint that accepts a keyword and calls the parsing service.

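import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.RestController;
import java.io.IOException;

@RestController
public class ContentController {
    private final ContentService contentService;

    @Autowired
    public ContentController(ContentService contentService) {
        this.contentService = contentService;
    }

    // GET /parse/{keyword}: scrape JD for the keyword and index the results.
    // Responds with true when the bulk request reported no failures.
    @GetMapping("/parse/{keyword}")
    public Boolean parse(@PathVariable("keyword") String keyword) throws IOException {
        return contentService.parseContent(keyword);
    }
}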

3. Front-End/Back-End Separation

Front end: use Vue.js for data display and interaction.

Back end: implement a RESTful API that supports paginated search.

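// Paged search endpoint (a method inside ContentController).
// The generic type was lost in extraction; List<Map<String, Object>> is assumed.
@GetMapping("/search/{keyword}/{pageNo}/{pageSize}")
public List<Map<String, Object>> search(
        @PathVariable("keyword") String keyword,
        @PathVariable("pageNo") int pageNo,
        @PathVariable("pageSize") int pageSize) throws IOException {
    return contentService.searchPage(keyword, pageNo, pageSize);
}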
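
The article does not show searchPage itself. One detail it would have to handle is turning pageNo and pageSize into the from offset and size that Elasticsearch expects. A minimal sketch, assuming pageNo is 1-based (the article does not say) and using a hypothetical helper class PageWindow:

```java
/** Translates page parameters into an Elasticsearch from/size pair (sketch). */
final class PageWindow {
    final int from; // zero-based offset of the first hit to return
    final int size; // number of hits to return

    private PageWindow(int from, int size) {
        this.from = from;
        this.size = size;
    }

    // Assumes pageNo is 1-based; values below 1 are clamped to 1.
    static PageWindow of(int pageNo, int pageSize) {
        int safePage = Math.max(pageNo, 1);
        int safeSize = Math.max(pageSize, 1);
        // multiplyExact throws ArithmeticException instead of silently
        // overflowing on absurdly large inputs.
        int from = Math.multiplyExact(safePage - 1, safeSize);
        return new PageWindow(from, safeSize);
    }
}
```

The two values would typically be passed to SearchSourceBuilder.from(...) and SearchSourceBuilder.size(...). Elasticsearch rejects requests where from + size exceeds index.max_result_window, which defaults to 10,000, so very deep pages need a different approach.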

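4. Search Highlighting

To improve the user experience, highlight the matched keyword in the search results:

Modify the service class's search method to add highlighting: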

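HighlightBuilder highlightBuilder = new HighlightBuilder();
highlightBuilder.field("title");
// The original tag strings were stripped during extraction. <em> and </em>
// (Elasticsearch's default highlight tags) are filled in here as an assumption.
highlightBuilder.preTags("<em>");
highlightBuilder.postTags("</em>");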
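
Configuring the HighlightBuilder only makes Elasticsearch return highlighted fragments alongside each hit; the title stored in the source is left unchanged. The service method therefore usually has to copy the fragments back over the field before returning the result. The sketch below shows that merge step with plain JDK collections so it runs without the Elasticsearch client. HighlightMerger and merge are hypothetical names, and in client code the inputs would presumably come from SearchHit.getSourceAsMap() and SearchHit.getHighlightFields().

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Copies highlight fragments over the matching field of a hit's source (sketch). */
final class HighlightMerger {
    private HighlightMerger() {
    }

    /**
     * Returns a copy of source in which field is replaced by the joined
     * highlight fragments. If there are no fragments, the copy is unchanged.
     */
    static Map<String, Object> merge(Map<String, Object> source,
                                     String field,
                                     List<String> fragments) {
        Map<String, Object> merged = new HashMap<>(source);
        if (fragments != null && !fragments.isEmpty()) {
            merged.put(field, String.join("", fragments));
        }
        return merged;
    }
}
```

Returning a copy keeps the original source map untouched, which makes the step easy to test in isolation.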