Website Indexing is not working (front-end check)

Issue #271 new
Enrico Piccini created an issue

No description provided.

Comments (17)

  1. Enrico Piccini reporter

    @dwmarrs Yes, the search engine indexing.... Right now we have only 1 page indexed :( The things to check are:

    * check whether we have set up a robots.txt that prevents indexing
    * check the indexing meta tags on the HTML pages
    * create a patch for the "subscription" flow so that, in addition to the cookie, it also checks the user agent using `userAgent.indexOf('....')`. If the user agent is "facebookexternalhit", "Twitterbot", "LinkedInBot", "Googlebot", "Bingbot" or "YandexBot", we don't redirect to /welcome
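    For example, the bot check could look roughly like this (a minimal sketch of the idea; `isCrawlerUserAgent` and `hasSubscriptionCookie` are illustrative names, not existing code in the repo):

    ```javascript
    // Substrings identifying crawlers that should NOT be redirected to
    // /welcome (social preview bots + search engine bots, per the list above).
    var CRAWLER_AGENTS = [
      'facebookexternalhit', 'Twitterbot', 'LinkedInBot',
      'Googlebot', 'Bingbot', 'YandexBot'
    ];

    // True if the user-agent string contains any known crawler substring.
    function isCrawlerUserAgent(userAgent) {
      return CRAWLER_AGENTS.some(function (bot) {
        return userAgent.indexOf(bot) !== -1;
      });
    }

    // In the subscription flow (browser only): redirect real visitors,
    // but let crawlers through. hasSubscriptionCookie() stands in for
    // whatever the existing cookie check is called.
    if (typeof navigator !== 'undefined') {
      if (!hasSubscriptionCookie() && !isCrawlerUserAgent(navigator.userAgent)) {
        window.location.href = '/welcome';
      }
    }
    ```

    Note that `indexOf` matching is case-sensitive, so the substrings need to match the exact casing each crawler actually sends in its user-agent header.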

    Do you want to work on this issue, Dave? If so, assign it to your account.

  2. David Marrs

    @Brotrob Can you please test the tour functionality changes on http://test.udemi.org/? That will help with this problem.

    Thanks

  3. udemi repo owner

    Hey @dwmarrs - so I tested the tour functionality on test.udemi.org once on Chrome and it is looking good :)

  4. Enrico Piccini reporter

    @Brotrob @dwmarrs So, is it possible to deploy the new subscription flow to PROD? Is it stable?

    @dwmarrs Any news about the indexing problem?

  5. David Marrs

    I have merged the check to prevent redirect from the Issues map to /welcome into the develop branch just now.

    I tested it just by creating some fake user agents for the bots in Chrome dev tools. It seems OK to me; it doesn't look like it will interfere with normal visitors being redirected to /welcome.

    If we get it through test and out to prod, then we can test it with the likes of Webmaster Tools (I'm assuming we didn't want to point Webmaster Tools at our test and dev sites?). The only problem I can see is if we have the wrong strings to check against the user agents (I just googled to get the user-agent strings to check).

    HOWEVER, a thought has occurred to me as I write this: even if crawlers were redirected to /welcome, I assume they should, after that redirect, still be able to move on from there to pages like /about, and it doesn't look like /about has been indexed. So I looked in Google Webmaster Tools (Search Console) and ran Crawl > Fetch as Google. There is an issue with robots.txt blocking the static folder, which contains the JavaScript that loads the page contents in, so I have created a pull request to allow it to be crawled, assuming Enrico thinks that's safe: https://bitbucket.org/udemi/udemi-ui/pull-requests/10/allow-crawlers-to-access-javascript-in/diff

    The only other problem I can think of (once the JavaScript can be loaded) is that crawlers will be shown the top issues/policies/actions/users around where they are based (based on their IP address), so if there are no issues around there then no issues will be indexed, etc. Do we need some sort of site map for that? I imagine it could be huge. Something to think about...

  6. Enrico Piccini reporter

    Hi @dwmarrs, it's OK to index the static folder, but exclude files like "env.js" or files that contain private information. You can decide which files must be private.
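    A sketch of how that could look in robots.txt (a hedged example; the exact paths and file names are assumptions taken from this thread, not from the repo):

    ```
    # Let crawlers fetch the JavaScript needed to render pages,
    # but keep env.js (private configuration) out of the index.
    User-agent: *
    Disallow: /static/env.js
    Allow: /static/
    ```

    `Allow` is not part of the original robots.txt standard, but the major search engines (Google, Bing, Yandex) support it, and they give the more specific `Disallow: /static/env.js` rule precedence over the broader `Allow: /static/`.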

  7. David Marrs

    Hmmm.... thinking about this, I might need to remove the line

        Disallow: /views/

    from robots.txt as well. Let's deploy to live and run Webmaster Tools on it now anyway. We can always remove the line later.

    I think crawlers will need to be able to access env.js for the page to render anyway. Perhaps I could lock that access down to just Googlebot, Bingbot and Yandexbot later (if we need to allow access to /views/ anyway).
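    If we did want to lock env.js down to specific crawlers later, the robots.txt could look something like this (purely hypothetical, not in the repo):

    ```
    # Named search bots may fetch env.js so pages render for them.
    User-agent: Googlebot
    User-agent: Bingbot
    User-agent: YandexBot
    Allow: /static/env.js

    # Everyone else is kept away from it.
    User-agent: *
    Disallow: /static/env.js
    ```

    Each crawler obeys only the most specific `User-agent` group that matches it, so the named bots would ignore the `*` rules entirely.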

  8. udemi repo owner

    Very much looking forward to this feature - the more content we have, the more we should rise in the search engine rankings.

    This is a very strong value proposition for politicians, who want their web presence high up the search results :)

  9. Enrico Piccini reporter

    @dwmarrs Hey Dave, I can't tell whether you have completed this task or not. Do you only need to check that it works using Webmaster Tools on test.udemi.org, and then it's done?

  10. Enrico Piccini reporter

    Hi @dwmarrs, I think you should disallow /static and /views in the robots.txt because we don't want those files indexed....

    About dev.udemi.org and test.udemi.org: right now we have the same robots.txt as prod, so it is possible that Google indexes them if someone submits them manually to the index (through Webmaster Tools) or if other domains link to them... If not, in theory they will never be indexed. I have an idea we can implement to be sure that dev.udemi.org and test.udemi.org will never be indexed: from JavaScript code, detect the hostname (e.g. dev.udemi.org) and, if it matches, set the meta tag `<meta name="robots" content="noindex,nofollow"/>`
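    A minimal sketch of that idea (hedged; `shouldNoindex` is an illustrative name, and the hostname list is an assumption from this thread):

    ```javascript
    // Hostnames that search engines must never index.
    var NOINDEX_HOSTS = ['dev.udemi.org', 'test.udemi.org'];

    // True when the given hostname is a dev/test environment.
    function shouldNoindex(hostname) {
      return NOINDEX_HOSTS.indexOf(hostname) !== -1;
    }

    // In the browser: inject the robots meta tag into <head> as early
    // as possible so crawlers see it before any content loads.
    if (typeof document !== 'undefined' && shouldNoindex(window.location.hostname)) {
      var meta = document.createElement('meta');
      meta.name = 'robots';
      meta.content = 'noindex,nofollow';
      document.head.appendChild(meta);
    }
    ```

    One caveat: because the tag is added from JavaScript, only crawlers that execute JS (e.g. Googlebot) will see it; a belt-and-braces option would be to also serve the tag, or an `X-Robots-Tag` response header, from the server for those hostnames.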

    I think this is the last thing left to do, plus the robots.txt changes.

  11. David Marrs

    Hi @enricosoft, I have updated the pull request with some JS to change the robots meta tag as you suggested. Can this be merged and deployed now?
