
Author Topic: [Request] Resources on search engine designs and algorithms  (Read 526 times)


Spacecow

  • Guest
[Request] Resources on search engine designs and algorithms
« on: August 29, 2014, 08:18:40 AM »
Recently I have been interested in learning more about how search engines operate and the theory behind some of the algorithms, such as page ranking, searching, and spiders, and was wondering if anyone here has any links/ebooks/papers on the subject. I've been looking some up myself but am curious whether I missed any notable resources or hidden gems. Anything that can be scraped up would be appreciated.  :)

Offline Deque

  • P.I.N.N.
  • Global Moderator
  • Overlord
  • *
  • Posts: 1203
  • Cookies: 518
  • Programmer, Malware Analyst
Re: [Request] Resources on search engine designs and algorithms
« Reply #1 on: August 29, 2014, 01:59:39 PM »
Have you read about Ontologies/Semantic Web?

Offline Psycho_Coder

  • Knight
  • **
  • Posts: 166
  • Cookies: 84
  • Programmer, Forensic Analyst
Re: [Request] Resources on search engine designs and algorithms
« Reply #2 on: August 29, 2014, 02:20:48 PM »
Hello Spacecow,

Search engines are quite difficult to make and require a great deal of knowledge from different fields. Basically, search engines are a part of Information Retrieval (IR), so you could look into those algorithms. IR techniques alone won't do, though. You also need to know a good amount of NLP concepts, machine learning and, more importantly, statistical concepts.

Learn about these :- Gradient Descent, Bayes Classifiers, Logistic and Linear Regression, TF-IDF (Term Frequency–Inverse Document Frequency), Clustering, Supervised and Unsupervised Machine Learning and many more.
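To give a feel for one item on that list, here is a minimal TF-IDF sketch in plain Python. It assumes the simplest textbook weighting (term frequency as count divided by document length, inverse document frequency as log(N / df)); real engines usually use smoothed or normalized variants. The function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    docs is a list of token lists. Assumed weighting (one of many variants):
    tf = count / document length, idf = log(N / document frequency).
    """
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy corpus, purely illustrative.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
scores = tf_idf(docs)
```

In the first toy document, "cat" ends up weighted higher than "the" even though "the" occurs more often, because "the" also appears in another document. A term that appears in every document gets weight zero under this variant.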

For books you can follow these which are my favorite :-

http://www.amazon.com/Introduction-Information-Retrieval-Christopher-Manning/dp/0521865719/ref=pd_sim_b_1?ie=UTF8&refRID=11P5YPK2HET273Z3A3VV

http://www.amazon.com/gp/product/0262026511/ref=pd_lpo_sbs_dp_ss_1?pf_rd_p=1535523722&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0136072240&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=11P5YPK2HET273Z3A3VV

http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325/ref=sr_1_1?ie=UTF8&qid=1409319169&sr=8-1&keywords=programming+collective+intelligence

http://www.amazon.com/Search-Engines-Information-Retrieval-Practice/dp/0136072240

For Pattern Classification : http://www.amazon.com/Pattern-Classification-2nd-Computer-Manual/dp/0471703508/ref=sr_1_1?s=books&ie=UTF8&qid=1409314315&sr=1-1&keywords=pattern+classification+duda

Recommended Programming Languages : Python, Java

Google for these Tools :- Lucene, NLTK, LingPipe.

For NLP :  http://www.amazon.com/Speech-Language-Processing-Daniel-Jurafsky/dp/0131873210/ref=sr_1_1?s=books&ie=UTF8&qid=1409314745&sr=1-1&keywords=speech+and+language+processing

For Machine Learning and Bayesian Statistics : http://www.amazon.com/Bayesian-Reasoning-Machine-Learning-Barber/dp/0521518148/ref=sr_1_7?s=books&ie=UTF8&qid=1409314444&sr=1-7&keywords=machine+learning+a+probabilistic+perspective

Also if you think making a good search engine would be easy then you must be dreaming.


The above will provide you with the various concepts, algorithms and designs that are required. A very strong background in database design is also essential, since it's important to store data properly. You can search for "search engine" on Google Scholar, which will fetch you a lot of research papers.
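As one example of why storage layout matters, here is a rough sketch of an inverted index, the structure most IR textbooks use to map each term to the documents containing it. This is only an in-memory toy under simple assumptions (whitespace tokenization, lowercase matching, AND semantics for multi-word queries); `build_index` and `search` are names invented for this sketch, not part of any library.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lowercase term to the set of document ids containing it.

    docs is a {doc_id: text} dict; tokenization is a plain whitespace split.
    """
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every term in the query (AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Toy documents, purely illustrative.
docs = {
    1: "search engines rank pages",
    2: "crawlers fetch pages",
    3: "search crawlers",
}
index = build_index(docs)
hits = search(index, "search pages")
```

Looking a term up is a single dictionary access instead of a scan over every document, which is the main reason this layout is preferred over storing raw text alone.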

I hope you don't get scared :P

EDIT: Also learn about scraping and web crawlers, and build them to extract data and populate a database. These are some databases that can be queried through a REST API: http://sindice.com/ and http://www.freebase.com/

These will be very valuable. Also, if you make a product-oriented search engine, then focus on proper organization of user data, proper syntactic analysis of user queries, and gathering data from those queries :)
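For the crawler part mentioned in the edit above, a bare-bones sketch might look like this. It uses only the standard library (`html.parser` and `urllib.parse.urljoin`) and does a breadth-first walk over links. To keep it runnable offline, pages come from a made-up dictionary (`FAKE_WEB`) rather than real HTTP requests; the `fetch` parameter, the `LinkExtractor` class and the example.test URLs are all assumptions for illustration. A real crawler would also need robots.txt handling, rate limiting and error handling.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attribute of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))


def crawl(start_url, fetch, max_pages=10):
    """Breadth-first crawl; fetch(url) returns HTML text or None."""
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # page could not be fetched, skip it
        visited.append(url)
        parser = LinkExtractor(url)
        parser.feed(html)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical in-memory "web" standing in for real HTTP fetches.
FAKE_WEB = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a>',
    "http://example.test/b": '<a href="/">home</a> <a href="/missing">x</a>',
}
visited = crawl("http://example.test/", FAKE_WEB.get)
```

Passing `fetch` in as a parameter keeps the traversal logic separate from the network code, so the same loop could later be pointed at a real HTTP client.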
« Last Edit: August 29, 2014, 03:33:10 PM by Psycho_Coder »
"Don't do anything by half. If you love someone, love them with all your soul. When you hate someone, hate them until it hurts."--- Henry Rollins

Spacecow

  • Guest
Re: [Request] Resources on search engine designs and algorithms
« Reply #3 on: August 29, 2014, 09:22:43 PM »
I'm not at all scared; in fact, that was just the kind of response I was hoping to get. :)
I'm not planning on making a search engine and I certainly don't think it's an easy task, lol. I was just curious about the topic because I realized I use search engines every day and yet know next to nothing about the algorithms and technology that power them. And besides, who would try to build their own search engine? We all know that Bing already perfected it in every way :P Anyway, thanks for the help, and it looks like it's time for some reading.

 


