[0.0] Analysis of the Baidu Search Engine

Time: 2023-10-06 16:14:14

Recently, my two teammates and I have been working on a project: a simplified Chinese search engine for primary-school children. We call it "kidsearch".

Since our project will be built on the Baidu search engine, I'd like to start with a simple analysis of it.

First, Baidu is not really meant for children. As a commercial company, Baidu provides the public with a free search service, so it is natural that not everything shown on the results page is what people need. Some results appear for commercial reasons, among other factors. This may not matter much to adults, who can tell good content from bad, but the impact on children is obvious. For example, try these queries on Baidu: "波" ("wave"; look at the image results), "交换群" ("exchange group"; look at the results), and "医院" ("hospital"; look at the advertisements). These are ordinary words, to say nothing of the results for worse keywords. Such results are not just inappropriate; some of them are even harmful. This situation has to be fixed, and fixing it is the purpose of our project "kidsearch".

Actually, children's search needs are simpler than adults', so the problem is simpler too. We can just use Baidu as a tool (no exaggeration): rearrange the results, fix the improper or useless entries, and add some content suitable for children. The search engine will be much better for children after these fixes.
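
The three steps above (rearrange, fix, add) can be sketched as one small function. This is only a minimal illustration, not kidsearch's actual code: the `rework_results` name, the `title`/`url` fields, and the example.com addresses are all assumptions, and getting the raw results out of Baidu is left out entirely.

```python
from typing import Callable, Dict, List

# A search result is modelled as a plain dict; the field names are assumptions.
Result = Dict[str, str]


def rework_results(
    raw: List[Result],
    is_suitable: Callable[[Result], bool],
    preferred_sites: List[str],
    extras: List[Result],
) -> List[Result]:
    """Filter, reorder, and extend a list of search results."""
    # Step 1: drop improper or useless entries.
    kept = [r for r in raw if is_suitable(r)]
    # Step 2: move results from preferred sites to the front.
    # sorted() is stable, so the original order is kept within each group.
    kept = sorted(
        kept,
        key=lambda r: not any(site in r["url"] for site in preferred_sites),
    )
    # Step 3: put hand-picked, child-friendly entries first.
    return extras + kept


# Hypothetical data, only to show the call.
raw = [
    {"title": "Cheap loans", "url": "https://ads.example.com/loan"},
    {"title": "Dinosaur forum", "url": "https://forum.example.com/dino"},
    {"title": "Dinosaur (encyclopedia)", "url": "https://baike.example.com/dino"},
]
extras = [{"title": "Dinosaurs for kids", "url": "https://kids.example.com/dino"}]

page = rework_results(
    raw,
    is_suitable=lambda r: "ads." not in r["url"],
    preferred_sites=["baike.example.com"],
    extras=extras,
)
```

Keeping the suitability check as a parameter means the filtering rules can change without touching the reordering logic.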

So, what content is appropriate for children?

Based on the thoughts above, I have summarized what children may need from a search engine. (The list may not be complete yet; we will refine it later.)

1.Notion -- encyclopedia

2.Material -- picture, music, video

3.Entertainment -- game

4.Study -- homework, knowledge
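
The four needs above map naturally onto a small lookup table. The sketch below is one possible way to write it down; the `NEEDS` table and the `need_for` helper are hypothetical names, and how a result's content type gets detected in the first place is not covered here.

```python
from typing import Dict, List, Optional

# The four needs and the content types listed under each one.
NEEDS: Dict[str, List[str]] = {
    "notion": ["encyclopedia"],
    "material": ["picture", "music", "video"],
    "entertainment": ["game"],
    "study": ["homework", "knowledge"],
}

# Reverse lookup: content type -> the need it serves.
NEED_BY_TYPE: Dict[str, str] = {
    content_type: need
    for need, content_types in NEEDS.items()
    for content_type in content_types
}


def need_for(content_type: str) -> Optional[str]:
    """Return the need a content type serves, or None if it serves none."""
    return NEED_BY_TYPE.get(content_type.strip().lower())
```

A result whose content type maps to `None` falls outside the four needs and is a candidate for removal.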

Moreover, there are some kinds of content that children don't need:

1.advertisements

2.adult (mature) content

3.sexual or homosexual content

4.sidebar (mostly ads, adult content, or content useless to children)
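
A simple way to act on this list is to label each entry by where it came from and drop the unwanted kinds. The sketch below assumes every entry already carries such a label; the `Entry` class, the `kind` values, and the tiny word list are placeholders for illustration, not a real blocklist.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    title: str
    kind: str  # assumed labels: "organic", "ad", or "sidebar"


# Entry kinds that are dropped wholesale.
BLOCKED_KINDS = {"ad", "sidebar"}
# Placeholder words; a real list would be much longer and carefully reviewed.
BLOCKED_WORDS = {"adult", "成人"}


def is_unwanted(entry: Entry) -> bool:
    """True for ads, sidebar items, and titles containing a blocked word."""
    if entry.kind in BLOCKED_KINDS:
        return True
    title = entry.title.lower()
    return any(word in title for word in BLOCKED_WORDS)


def keep_wanted(entries: List[Entry]) -> List[Entry]:
    """Return the entries that pass the filter, in their original order."""
    return [e for e in entries if not is_unwanted(e)]
```

Matching on title words is crude and will miss a lot, so it should be seen as a first pass rather than a complete filter.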

Now that we know what children need, the next step is to tackle these items one by one.

What technology will we use?

After trying several options, such as PHP, Java, and Python, I decided to use Python because it is really convenient for crawling. Building web pages with it is a bit harder than with PHP, but that doesn't matter much.

Besides, there is a huge number of third-party libraries for Python, such as requests, flask, django, and jieba. I have tried all of them briefly.
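
As a small taste of the crawl job, the sketch below turns a query into a search URL using only the standard library. It assumes Baidu's results page is at https://www.baidu.com/s and takes the query in the `wd` parameter; `build_search_url` is a hypothetical helper. Fetching the page (for example with requests) and segmenting the text (for example with jieba) are deliberately left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint and parameter name for Baidu web search.
BAIDU_SEARCH = "https://www.baidu.com/s"


def build_search_url(query: str) -> str:
    """Build a search URL; urlencode percent-encodes the Chinese text as UTF-8."""
    return f"{BAIDU_SEARCH}?{urlencode({'wd': query})}"


url = build_search_url("恐龙")  # "恐龙" means "dinosaur"
# Decoding the query string again gives back the original text.
decoded = parse_qs(urlsplit(url).query)["wd"][0]
```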

More details will follow later. Our aim is to create a search engine that children can use and enjoy using.